GitHub Agentic Workflows

Outcomes

Outcomes describe what happened after a safe output landed in a repository. Safe outputs record what a workflow did. Outcomes record the repository state that can be observed afterward.

For example, a pull request can be merged or closed, an issue can remain relevant or be dismissed, and a comment can lead to follow-up activity or be ignored. Outcome data is based on repository state, not on the workflow’s self-assessment.

This page defines the common outcome states, summarizes what accepted means across safe output types, and lists the telemetry and cost rollups built from that data.

Token and cost data are necessary, but they are not enough. A workflow can become cheaper because it became more efficient, or because it simply did less useful work. Outcomes make that difference visible by relating effective tokens to accepted results.

Outcome efficiency is measured as effective tokens divided by accepted outcomes. Lower is better: a lower value means the workflow spent fewer effective tokens per accepted result.
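The ratio can be sketched in a few lines of Python. This is only an illustration: the function name and the choice to return `None` when there are no accepted outcomes are assumptions, not documented behavior.

```python
from typing import Optional


def tokens_per_accepted_outcome(effective_tokens: int,
                                accepted_outcomes: int) -> Optional[float]:
    """Effective tokens divided by accepted outcomes; lower is better."""
    if accepted_outcomes <= 0:
        # Assumption: with no accepted results the ratio is undefined,
        # so report None instead of zero or infinity.
        return None
    return effective_tokens / accepted_outcomes


# A workflow that got cheaper in total but did less useful work.
before = tokens_per_accepted_outcome(120_000, 4)
after = tokens_per_accepted_outcome(90_000, 2)
print(before, after)
```

In this made-up example total spend falls from 120,000 to 90,000 effective tokens, yet the cost per accepted outcome rises, which is the case that token totals alone would hide.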

To support that measurement, every evaluated output is classified into an outcome state. These states provide the base vocabulary for the rest of the page.

| Outcome | Meaning |
| --- | --- |
| accepted | The result was kept, merged, completed, or otherwise accepted by the repository state. |
| rejected | The result was explicitly undone, closed, removed, or not accepted. |
| pending | The result exists, but has not reached a terminal state yet. |
| ignored | The result received no meaningful follow-up within the evaluation window. |
| lifecycle | Closed or removed by the workflow itself as part of its normal operation (for example, a close-older-issues workflow). This is not a rejection. |
| lifecycle_close | A close_issue or close_pull_request output where the close actor was a lifecycle bot (for example, a stale bot) and no visible non-bot actor has since reopened it. |
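As a rough illustration, the six states above can be modeled as a closed vocabulary and tallied. The `Outcome` class and the `tally` helper are hypothetical names used only for this sketch; they are not part of any documented API.

```python
from collections import Counter
from enum import Enum
from typing import Iterable


class Outcome(str, Enum):
    """The six outcome states listed above."""
    ACCEPTED = "accepted"
    REJECTED = "rejected"
    PENDING = "pending"
    IGNORED = "ignored"
    LIFECYCLE = "lifecycle"
    LIFECYCLE_CLOSE = "lifecycle_close"


def tally(states: Iterable[str]) -> Counter:
    """Count evaluated outputs per state."""
    # Outcome(value) raises ValueError for any string outside the six states.
    return Counter(Outcome(state) for state in states)


counts = tally(["accepted", "pending", "accepted", "lifecycle"])
print(counts[Outcome.ACCEPTED])
```

Keeping the lifecycle states separate from rejected means a workflow that closes its own older issues is not counted as having been rejected.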

An accepted outcome is the simplest useful unit for measuring workflow effectiveness. Typical examples include merged pull requests, issues that remained relevant and were completed, and labels or comments that stuck and were acted on.

Accepted outcomes are intentionally simpler than a full value model. They do not try to rank one accepted result as inherently more important than another.

The table below is the quick lookup for what accepted currently means for each safe output type and whether that meaning comes from a dedicated rule, a fallback rule, a limited check, or no implemented rule yet.

Rows marked fallback rule use a generic existence check, not a type-specific rule. For exact rules, edge cases, and conformance details, see the Safe Output Outcome Evaluation Specification.

Outcome evaluation is based on visible repository state and visible actor identity. A non-bot actor may still be AI-assisted; the lookup reflects what the system can observe, not hidden authoring provenance.

| Safe output type | accepted at a glance | Current rule source |
| --- | --- | --- |
| create_pull_request | merged | dedicated rule |
| create_issue | completed/closed | dedicated rule |
| add_comment | reacted to or replied to | dedicated rule |
| add_labels | label retention | limited check |
| add_reviewer | reviewer acted or request remained/was removed | dedicated rule |
| update_issue | intended edit still matches current issue state | dedicated rule |
| update_pull_request | intended edit still matches current PR state | dedicated rule |
| close_issue | still closed | dedicated rule |
| close_pull_request | still closed | dedicated rule |
| close_discussion | none yet | no implemented rule yet |
| create_discussion | none yet | no implemented rule yet |
| update_discussion | discussion target exists | fallback rule |
| create_pull_request_review_comment | none yet | no implemented rule yet |
| submit_pull_request_review | review affected PR lifecycle | dedicated rule |
| reply_to_pull_request_review_comment | review target exists | fallback rule |
| resolve_pull_request_review_thread | none yet | no implemented rule yet |
| push_to_pull_request_branch | merged | dedicated rule |
| mark_pull_request_as_ready_for_review | reviewed | dedicated rule |
| assign_to_agent | merged or completed | dedicated rule |
| dispatch_workflow | dispatch target exists | fallback rule |
| autofix_code_scanning_alert | alert target exists | fallback rule |
| create_code_scanning_alert | alert target exists | fallback rule |
| link_sub_issue | sub-issue link target exists | fallback rule |
| hide_comment | none yet | no implemented rule yet |
| assign_milestone | milestone still set | dedicated rule |
| update_project | project target exists | fallback rule |
| update_release | release target exists | fallback rule |
| noop | skipped | skipped |
| missing_tool | skipped | skipped |

Outcome data is derived from safe outputs and later checked against repository state. The system records the safe output produced by the workflow, looks up the affected repository object later, and classifies the observed state into an outcome.
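The record, look up, classify sequence can be sketched for a single output type. The example below assumes a create_pull_request output and a simplified view of the observed pull request; the `ObservedPullRequest` type, its fields, and the three-way mapping are illustrative assumptions. The exact rules, including the ignored and lifecycle states, are defined in the evaluation specification and are not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObservedPullRequest:
    """Hypothetical snapshot of a pull request, looked up after the run."""
    state: str    # "open" or "closed"
    merged: bool


def classify_pull_request(observed: ObservedPullRequest) -> str:
    """Map observed repository state to an outcome state (simplified)."""
    if observed.merged:
        return "accepted"    # merged is what accepted means for this type
    if observed.state == "closed":
        return "rejected"    # closed without being merged
    return "pending"         # still open, so no terminal state yet


print(classify_pull_request(ObservedPullRequest(state="closed", merged=True)))
```

The classifier only reads the observed state. Nothing the workflow reported about itself is an input.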

This makes outcome evaluation external and observable. The workflow does not decide whether it succeeded; the repository state does.

Outcome information appears in OpenTelemetry spans and related artifacts. Workflow-level rollups such as accepted counts and acceptance rate are emitted on outcome summary or conclusion spans, and per-item spans can carry more detailed fields such as object type, URL, comments, review activity, and zero-touch acceptance.

For the span-level attribute inventory, see OpenTelemetry.

Outcomes are most useful when read together with cost data. At the workflow level, the basic questions are how many effective tokens a workflow spent, how many accepted outcomes it produced, and how many effective tokens each accepted outcome cost.

The basic dashboard for outcomes is therefore intentionally small: total effective tokens, total accepted outcomes, effective tokens per accepted outcome, a trend over time, and a workflow ranking by effective tokens per accepted outcome.
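The workflow ranking could be computed along these lines. This is a minimal sketch: the input shape, the workflow names, and the choice to place workflows with no accepted outcomes last (by treating their ratio as infinite) are assumptions made for illustration.

```python
import math
from typing import Dict, List, Tuple


def rank_workflows(totals: Dict[str, Tuple[int, int]]) -> List[Tuple[str, float]]:
    """Rank workflows by effective tokens per accepted outcome, best first.

    `totals` maps a workflow name to (effective_tokens, accepted_outcomes).
    """
    ranked = []
    for name, (effective_tokens, accepted) in totals.items():
        # Assumption: zero accepted outcomes sorts last via an infinite ratio.
        ratio = effective_tokens / accepted if accepted > 0 else math.inf
        ranked.append((name, ratio))
    return sorted(ranked, key=lambda item: item[1])


ranking = rank_workflows({
    "fixer": (200_000, 4),
    "triage": (50_000, 10),
    "idle": (10_000, 0),
})
for name, ratio in ranking:
    print(name, ratio)
```

In this invented data the idle workflow spends the fewest tokens overall but ranks last, because it produced no accepted outcomes.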

For simple workflows, a single run is usually the right unit for outcome measurement.

For orchestrated workflows, multiple runs can belong to one logical execution. In that case, the more meaningful unit is the episode. Outcome and cost totals can be rolled up from runs into episodes using simple sums, and then from episodes into workflow totals and repository totals.
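Because the rollup uses simple sums, it can be sketched as a grouped addition. The `RunTotals` record and the `episode_id` field are hypothetical names; how runs are actually associated with episodes is not specified here.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class RunTotals:
    """Hypothetical per-run totals, tagged with the episode they belong to."""
    episode_id: str
    effective_tokens: int
    accepted_outcomes: int


def roll_up_by_episode(runs: Iterable[RunTotals]) -> Dict[str, Tuple[int, int]]:
    """Sum run totals into (effective_tokens, accepted_outcomes) per episode."""
    sums = defaultdict(lambda: [0, 0])
    for run in runs:
        sums[run.episode_id][0] += run.effective_tokens
        sums[run.episode_id][1] += run.accepted_outcomes
    return {episode: (tokens, accepted) for episode, (tokens, accepted) in sums.items()}


episodes = roll_up_by_episode([
    RunTotals("ep-1", 40_000, 0),   # orchestrator run
    RunTotals("ep-1", 60_000, 2),   # worker run in the same episode
    RunTotals("ep-2", 30_000, 1),
])
print(episodes)
```

The same summation applies one level up, from episodes to workflow totals and from workflows to repository totals.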

The outcomes model is deliberately narrow. It does not try to estimate the full business value of a workflow, replace human judgment for nuanced quality questions, combine deterministic compute cost and inference cost into one synthetic score, or solve overlap and duplicate-work analysis in the first version.

Those questions may matter later, but they are separate from the base outcomes model described here.