CorrectionOps
CorrectionOps is a workflow pattern that compares predictions with later human corrections.
Instead of retraining the model, CorrectionOps improves the workflow around the model. It stores predictions at decision time, compares them with later trusted human truth, and uses that evidence to update instructions, routing, thresholds, and rollout decisions.
The basic loop is simple:
- Save what the workflow predicted
- Collect what humans later decided
- Use the difference to improve the workflow
When to Use CorrectionOps
Use CorrectionOps when you want to turn a human decision process into an agentic workflow iteratively rather than all at once.
It is a good fit when humans still make or correct the real decision, but you want the workflow to improve over time by updating instructions, routing, thresholds, or rollout state.
Typical fits include labeling and classification, routing and prioritization, moderation and approvals, and summaries or recommendations that humans later correct.
It is especially useful when the rollout path is gradual:
- Start with `staged: true`
- Keep evaluation and reporting in Ops
- Use later corrections to improve the workflow
- Promote to direct writes only when the evidence is strong enough
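The promotion step in the list above can be expressed as a small deterministic gate. The following is a minimal sketch, not part of any CorrectionOps API: the function name `decideRollout`, the `compared` and `agreed` counters, and the default thresholds are all hypothetical and would need to be chosen per workflow.

```javascript
// Hypothetical rollout gate. Field names and thresholds are illustrative assumptions.
function decideRollout(stats, gate = { minSamples: 50, minAgreement: 0.95 }) {
  const { compared, agreed } = stats;

  // Too little evidence: stay staged regardless of the agreement rate.
  if (compared < gate.minSamples) {
    return { staged: true, reason: 'not enough compared predictions yet' };
  }

  const agreement = agreed / compared;
  if (agreement < gate.minAgreement) {
    return { staged: true, reason: `agreement ${agreement.toFixed(2)} is below the gate` };
  }

  // Enough evidence and high enough agreement: eligible for direct writes.
  return { staged: false, reason: `agreement ${agreement.toFixed(2)} meets the gate` };
}
```

Keeping the gate as plain code means the promotion decision can be reviewed and reproduced from the stored counts, rather than depending on an agent's judgment.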
How It Works
A clean CorrectionOps setup has two long-lived surfaces. Production stays authoritative. Ops is the long-lived home for prediction, correction intake, reporting, instruction updates, and rollout control.
That means the workflows usually stay in Ops. Early on they report, compare, and adapt from Ops without writing back to production. After promotion they can write directly to production.
Most implementations reduce to three workflow classes: a thin relay that forwards stable facts into ops, a prediction workflow that persists snapshots and writes safely, and a compare/report/decide workflow that checks later human truth and updates the system when the evidence is strong enough.
The important rule is to keep relays, snapshot resolution, diffing, and grouping deterministic. Use the agent for semantic judgment, not for reconstructing event history or inferring provenance after the fact.
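To illustrate what "deterministic diffing and grouping" might look like, here is a minimal sketch. It assumes hypothetical record shapes (`{ itemId, label }` for both predictions and corrections, with numeric `itemId` values and corrections supplied in chronological order); real snapshots would carry more fields.

```javascript
// Hypothetical record shapes: { itemId, label } for predictions and corrections.
function buildDiffs(predictions, corrections) {
  // Index corrections by item. Assumes chronological input, so the latest wins.
  const truthByItem = new Map();
  for (const c of corrections) truthByItem.set(c.itemId, c.label);

  return predictions
    // Only items that have received a correction can be compared.
    .filter((p) => truthByItem.has(p.itemId))
    .map((p) => ({
      itemId: p.itemId,
      predicted: p.label,
      actual: truthByItem.get(p.itemId),
      match: p.label === truthByItem.get(p.itemId),
    }))
    // Sort by item id so identical inputs always produce identical output.
    .sort((a, b) => a.itemId - b.itemId);
}

// Count mismatches per "predicted -> actual" pair for reporting.
function groupMismatches(diffs) {
  const counts = {};
  for (const d of diffs) {
    if (d.match) continue;
    const key = `${d.predicted} -> ${d.actual}`;
    counts[key] = (counts[key] || 0) + 1;
  }
  return counts;
}
```

Only after an artifact like this exists would the agent be asked to interpret the grouped mismatches, for example to propose an instruction change for the most frequent confusion.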
Example: Issue Labeling
    flowchart TB
      subgraph ProductionRepo[Production Repo]
        A[Issue or item in production]
        D[Later human correction in production]
        B[Thin relay]
      end
      subgraph OpsRepo[Ops Repo]
        C[Store prediction snapshot]
        E[Collect correction evidence]
        F[Build deterministic diff]
        G[Publish report or open instruction PR]
        H[Make rollout decision]
      end
      A -->|item-created event| B
      B --> C
      D -->|truth-feedback event| E
      C --> F
      E --> F
      F --> G
      G --> H
      H -.->|improves next run| A
In this shape, production stays authoritative. Ops records the original prediction, collects later human corrections, builds the diff, and decides whether the workflow should stay staged, update its instructions, or graduate to direct writes.
    ---
    on:
      schedule: daily
      workflow_dispatch:
      repository_dispatch:
        types: [truth-feedback]
    permissions:
      contents: read
      issues: read
    safe-outputs:
      create-issue:
      create-pull-request:
    ---

    # CorrectionOps Worker

    Read persisted predictions and later trusted truth, compare them deterministically, then either publish a health report or open a draft PR updating instructions.

CorrectionOps solves a different problem than model training. Reinforcement Learning from Human Feedback (RLHF) updates model weights from human feedback. CorrectionOps updates the workflow system around the model. In practice that usually means changing instruction files, routing rules, deterministic checks, thresholds, or rollout decisions rather than trying to retrain the engine.
In a healthy CorrectionOps loop, production truth stays authoritative, predictions are saved explicitly, corrections include provenance, and diffs are built deterministically before the agent is asked to reason about them.
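One way to enforce "corrections include provenance" is to filter correction events before they reach the diff step. The sketch below reuses the field names from the relay payload shown later in this document (`actor`, `actor_type`, `occurred_at`); the trust rule itself (human actor plus a parseable timestamp) is an assumption for illustration, and `trustedCorrections` is a hypothetical helper.

```javascript
// Keep only correction events with complete provenance from a human actor.
// The trust rule here is an illustrative assumption, not a fixed policy.
function trustedCorrections(events) {
  return events.filter((e) =>
    e.actor_type === 'human' &&
    typeof e.actor === 'string' && e.actor.length > 0 &&
    typeof e.occurred_at === 'string' &&
    !Number.isNaN(Date.parse(e.occurred_at))
  );
}
```

Events that fail the filter are better logged than silently dropped, so that gaps in provenance show up in reporting.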
CorrectionOps does not require a separate evaluation repository. The normal progression is to start with `staged: true`, then use ops-managed adaptation and gated review, then enable direct production writes once the evidence is strong enough.
Full Workflow Pieces
If you want the explicit workflow split, the same example usually breaks into four pieces.
1. Relay In The Source Repo
The relay only forwards stable facts and provenance into ops. It should not compute diffs, infer human intent, or decide whether the workflow was correct.

    name: Relay Correction Signals

    on:
      issues:
        types: [opened, labeled, unlabeled]

    jobs:
      relay:
        runs-on: ubuntu-latest
        steps:
          - name: Forward stable facts to ops
            uses: actions/github-script@v8
            with:
              github-token: ${{ secrets.OPS_DISPATCH_TOKEN }}
              script: |
                await github.rest.repos.createDispatchEvent({
                  owner: 'org',
                  repo: 'ops-repo',
                  event_type: context.payload.action === 'opened' ? 'item-created' : 'truth-feedback',
                  client_payload: {
                    data: {
                      source_repository: `${context.repo.owner}/${context.repo.repo}`,
                      source_type: 'issue',
                      item_number: context.payload.issue.number,
                      item_title: context.payload.issue.title,
                      item_url: context.payload.issue.html_url,
                      event_type: context.payload.action,
                      label: context.payload.label?.name || null,
                      actor: context.actor,
                      actor_type: context.actor.endsWith('[bot]') ? 'bot' : 'human',
                      occurred_at: new Date().toISOString(),
                    },
                  },
                });

2. Prediction Workflow In Ops
The prediction workflow consumes normalized inputs, applies the current instructions, and persists a durable snapshot that can be compared later.
    ---
    name: Predict Items

    on:
      schedule: daily
      workflow_dispatch:
      repository_dispatch:
        types: [item-created]

    tools:
      github:
        toolsets: [issues, repos]

    safe-outputs:
      create-issue:
      update-issue:
    ---

    # Predict Items

    Read prepared items from `/tmp/gh-aw/agent/item-scan`, apply the current instructions, write review artifacts through safe outputs in Ops, and append a prediction snapshot containing the source identifier, predicted action, instruction version, and timestamp.

3. Compare, Report, And Decide In Ops
The review workflow reads persisted predictions and later human truth, builds deterministic diffs first, and only then asks the agent to summarize patterns or propose instruction updates.
    ---
    name: Review Corrections

    on:
      schedule: weekly
      workflow_dispatch:
        inputs:
          mode:
            description: report or adaptation
            required: false
            default: report
            type: choice
            options: [report, adaptation]

    safe-outputs:
      create-issue:
      create-pull-request:
    ---

    # Review Corrections

    Read `correction-diffs.json` from `/tmp/gh-aw/agent/correction-review`. In `report` mode, publish a health summary. In `adaptation` mode, open a draft PR updating the instruction file only when the grouped evidence is strong enough.

4. Optional Deterministic Collector
Add a separate collector only when the later-truth boundary deserves its own trigger, permissions, or serialized write path.
    name: Collect Corrections

    on:
      repository_dispatch:
        types: [truth-feedback]

    jobs:
      collect:
        runs-on: ubuntu-latest
        steps:
          - name: Resolve authoritative truth and store correction evidence
            run: ./scripts/store-correction-evidence.sh

Stable Contracts To Define First
Before adding rollout logic or adaptation prompts, define four small deterministic contracts:
- relay payload: the minimal source identity, object identity, event type, actor facts, and timestamps forwarded into ops
- prediction snapshot: the durable record of what the workflow predicted and under which instruction version
- correction review input: the deterministic diff artifact used by reporting and adaptation
- rollout gate contract: what evidence or approvals are required before direct production writes are enabled
Discussion labeling, routing, moderation, prioritization, approvals, and summaries can all reuse this shape. The production object changes, but the CorrectionOps setup does not.
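As an illustration of what one of these contracts could look like in code, the sketch below validates and serializes a prediction snapshot. The field names (`source_id`, `predicted_action`, `instruction_version`, `predicted_at`) are hypothetical, chosen to mirror the snapshot contents described in the prediction workflow; they are not a schema defined by CorrectionOps.

```javascript
// Hypothetical field names for the prediction snapshot contract.
const SNAPSHOT_FIELDS = ['source_id', 'predicted_action', 'instruction_version', 'predicted_at'];

// Report which required fields are missing or empty.
function validateSnapshot(snapshot) {
  const missing = SNAPSHOT_FIELDS.filter(
    (f) => typeof snapshot[f] !== 'string' || snapshot[f] === ''
  );
  return { ok: missing.length === 0, missing };
}

// Serialize with a fixed key order so the same snapshot always yields the same line.
function toSnapshotLine(snapshot) {
  const { ok, missing } = validateSnapshot(snapshot);
  if (!ok) throw new Error(`snapshot missing fields: ${missing.join(', ')}`);
  const ordered = {};
  for (const f of SNAPSHOT_FIELDS) ordered[f] = snapshot[f];
  return JSON.stringify(ordered);
}
```

Rejecting incomplete snapshots at write time keeps the later comparison step from having to guess which instruction version produced a prediction.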
Related Documentation
- Staged Mode for the optional safe-write rollout guidance inside CorrectionOps
- SideRepoOps for separating workflow infrastructure from the production repository
- MultiRepoOps for coordinating workflows across repository boundaries
- Safe Outputs Reference for controlling write targets and protections
- GitHub Tools for cross-repository reads and operations