GitHub Agentic Workflows

Effective Tokens Specification

Version: 0.2.0
Status: Draft
Publication Date: 2026-04-02
Editor: GitHub Agentic Workflows Team
This Version: effective-tokens-specification
Latest Published Version: This document


This specification defines Effective Tokens (ET), a normalized unit for measuring Large Language Model (LLM) usage across token classes, model-relative computational intensity, and multi-invocation execution graphs. ET provides a single unified metric for composite LLM workloads including multi-step pipelines, tool-augmented calls, sub-agent orchestration, and recursive inference.

This section describes the status of this document at the time of publication. This is a draft specification and may be updated, replaced, or made obsolete by other documents at any time.

This document is governed by the GitHub Agentic Workflows project specifications process.

  1. Introduction
  2. Conformance
  3. Terminology
  4. Token Accounting Model
  5. Multi-Invocation Aggregation
  6. Execution Graph Requirements
  7. Reporting
  8. Implementation Requirements
  9. Extensibility
  10. Compliance Testing
  11. Appendices
  12. Model Multiplier Registry
  13. Sync Notes
  14. References
  15. Change Log

Token counts reported by LLM APIs are not directly comparable: different token classes (input, cached, output, reasoning) carry different computational costs, and different models have different relative costs. Effective Tokens normalizes these variables into a single scalar that reflects relative computational intensity, enabling consistent measurement and comparison across complex multi-agent systems.

This specification covers:

  • Definition of token classes and their default weights
  • The per-invocation ET computation formula
  • Aggregation across multi-invocation execution graphs
  • Structural requirements for invocation nodes and summary reports

This specification does NOT cover:

  • Billing, pricing, or cost allocation
  • Model selection or routing strategies
  • Streaming or partial token reporting

An ET implementation:

  1. Preserves raw token counts per invocation
  2. Normalizes across token classes using disclosed weights
  3. Normalizes across models using per-model multipliers
  4. Supports aggregation across any number of invocations
  5. Produces a single reproducible metric from identical inputs
  6. Carries no dependency on billing or pricing systems

Conforming implementation: An implementation that satisfies all MUST/SHALL requirements in this specification.

Partially conforming implementation: An implementation that satisfies core accounting requirements (Sections 4–5) but omits optional fields or extensions.

The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “NOT RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC 2119.

  • Level 1 – Basic: Single-invocation ET computation (Section 4)
  • Level 2 – Standard: Multi-invocation aggregation and execution graph (Sections 5–6)
  • Level 3 – Complete: Full reporting and extensibility support (Sections 7–9)

Class                 Symbol   Description
Input Tokens          I        Tokens newly processed by the model
Cached Input Tokens   C        Tokens served via cache or prefix reuse
Output Tokens         O        Tokens generated by the model
Reasoning Tokens      R        Internal tokens used during inference (optional)

The Copilot Multiplier (m) is a scalar representing the relative computational intensity of a model versus a defined baseline. Its value is model-specific and MUST be disclosed by the implementation.

Invocation: A single LLM request-response cycle. Each invocation produces one set of token counts and yields one ET value.

Sub-agent invocation: Any invocation triggered by another LLM call or orchestration layer. Examples include tool-using agents, retrieval-augmented calls, planning/execution agents, and recursively delegated LLM calls.

Execution graph: A directed structure representing all invocations associated with a single top-level request. The root node has no parent; sub-agents reference their triggering invocation as their parent.


For each invocation, the raw total is:

raw_total_tokens = I + C + O + R

Default weights for the four token classes are:

Token Class    Symbol     Default Weight
Input          w_in       1.0
Cached Input   w_cache    0.1
Output         w_out      4.0
Reasoning      w_reason   4.0

Implementations MAY override these values but MUST disclose the weights used in any reported output.

Per invocation:

base_weighted_tokens = (w_in × I) + (w_cache × C) + (w_out × O) + (w_reason × R)
effective_tokens = m × base_weighted_tokens
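
The per-invocation formula above can be sketched in Python as follows. This is a minimal illustration, not the gh-aw implementation: the names `Usage`, `DEFAULT_WEIGHTS`, and the two functions are hypothetical, and the weights are the defaults listed earlier in this section.

```python
from dataclasses import dataclass

# Default class weights from this section. An implementation may override
# them, provided the weights actually used are disclosed in its output.
DEFAULT_WEIGHTS = {"input": 1.0, "cached_input": 0.1, "output": 4.0, "reasoning": 4.0}


@dataclass(frozen=True)
class Usage:
    """Raw token counts for one invocation (hypothetical container type)."""
    input_tokens: int = 0
    cached_input_tokens: int = 0
    output_tokens: int = 0
    reasoning_tokens: int = 0


def base_weighted_tokens(usage: Usage, weights=DEFAULT_WEIGHTS) -> float:
    """Weighted sum over the four token classes (I, C, O, R)."""
    return (
        weights["input"] * usage.input_tokens
        + weights["cached_input"] * usage.cached_input_tokens
        + weights["output"] * usage.output_tokens
        + weights["reasoning"] * usage.reasoning_tokens
    )


def effective_tokens(usage: Usage, multiplier: float, weights=DEFAULT_WEIGHTS) -> float:
    """ET for one invocation: model multiplier m times the weighted sum."""
    return multiplier * base_weighted_tokens(usage, weights)
```

For example, `effective_tokens(Usage(500, 200, 150, 0), 2.0)` corresponds to a base of 500 + 20 + 600 = 1120 and an ET of 2240 under the default weights.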

For a request involving N invocations:

ET_total = Σ (m_i × base_weighted_tokens_i)

Each invocation MAY use a different model and multiplier.

raw_total_tokens = Σ (I_i + C_i + O_i + R_i)
total_invocations = N

This count MUST include the root call, all sub-agent calls, and all tool-triggered LLM calls.
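
A sketch of the aggregation rules above, assuming each invocation is a dictionary shaped like the node schema in Section 6. The function name `aggregate` and the returned key names are illustrative choices, not part of this specification.

```python
DEFAULT_WEIGHTS = {"input": 1.0, "cached_input": 0.1, "output": 4.0, "reasoning": 4.0}


def aggregate(invocations, weights=DEFAULT_WEIGHTS):
    """Sum ET and raw tokens over a flat list of invocation nodes.

    Every LLM call (root, sub-agent, tool-triggered) is expected to appear
    in `invocations`, so total_invocations is simply the list length.
    """
    et_total = 0.0
    raw_total = 0
    for node in invocations:
        usage = node["usage"]
        i = usage["input_tokens"]
        c = usage["cached_input_tokens"]
        o = usage["output_tokens"]
        r = usage["reasoning_tokens"]
        base = (
            weights["input"] * i
            + weights["cached_input"] * c
            + weights["output"] * o
            + weights["reasoning"] * r
        )
        # Each invocation carries its own multiplier m_i.
        et_total += node["model"]["copilot_multiplier"] * base
        raw_total += i + c + o + r
    return {
        "effective_tokens": et_total,
        "raw_total_tokens": raw_total,
        "total_invocations": len(invocations),
    }
```

Because the multiplier is applied per node before summing, invocations that use different models can be mixed freely in the same list.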


Implementations MUST represent multi-call workflows as a directed execution graph.

Each node (invocation) MUST conform to:

{
  "id": "string",
  "parent_id": "string | null",
  "model": {
    "name": "string",
    "copilot_multiplier": number
  },
  "usage": {
    "input_tokens": number,
    "cached_input_tokens": number,
    "output_tokens": number,
    "reasoning_tokens": number
  },
  "derived": {
    "base_weighted_tokens": number,
    "effective_tokens": number
  },
  "flagged": {
    "code": "string",
    "reason": "string"
  }
}

The root invocation MUST have parent_id = null. It represents the user-facing request that initiates the execution graph.

Each sub-agent invocation MUST reference a valid parent_id. Sub-agent invocations MAY recursively spawn further invocations.
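
The root and parent rules above can be checked mechanically. The sketch below is one possible validator, with hypothetical names. It assumes three things that this specification implies but does not state outright: invocation ids are unique, a graph has exactly one root, and parent references never form a cycle.

```python
def graph_problems(invocations):
    """Return a list of structural problems; an empty list means the graph is valid."""
    problems = []
    ids = [node["id"] for node in invocations]
    known = set(ids)
    if len(known) != len(ids):
        problems.append("duplicate invocation ids")  # assumption: ids are unique

    # The root is the node whose parent_id is null (None in Python).
    roots = [node["id"] for node in invocations if node["parent_id"] is None]
    if len(roots) != 1:
        problems.append(f"expected exactly one root invocation, found {len(roots)}")

    # Every non-root node must reference a parent that exists in the graph.
    for node in invocations:
        parent_id = node["parent_id"]
        if parent_id is not None and parent_id not in known:
            problems.append(f"{node['id']}: unknown parent_id {parent_id!r}")

    # Walk upward from each node; revisiting a node means a cycle.
    parent_of = {node["id"]: node["parent_id"] for node in invocations}
    for start in ids:
        seen = set()
        current = start
        while current is not None and current in parent_of:
            if current in seen:
                problems.append(f"{start}: parent chain contains a cycle")
                break
            seen.add(current)
            current = parent_of[current]
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every structural issue in a single pass.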

For execution graphs deeper than two levels, implementations MUST aggregate descendant Effective Tokens in stable post-order: fully observed leaf descendants first, then their nearest observed ancestors, and finally the parent node’s local invocation cost. When a parent has incomplete or unobservable descendants, the implementation MUST report the partial sum accumulated from the deepest observed descendants before adding any shallower fallback estimates, and SHOULD keep the parent node flagged until all known descendants are either observed or explicitly marked unobservable. Repeated computations over the same partially observed graph MUST produce the same partial-ordering and subtotal sequence.
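
One way to obtain a stable post-order is sketched below. It is an interpretation, not a normative algorithm: siblings are visited in lexicographic id order (an assumed tie-break that this specification does not mandate), and a subtree is reported as `incomplete` whenever any node in it carries a `flagged` object. Function and field names are hypothetical.

```python
def subtree_totals(nodes):
    """Return [(id, subtree_effective_tokens, incomplete), ...] in post-order.

    Each node is a dict with "id", "parent_id", "derived.effective_tokens",
    and an optional "flagged" object. The graph is assumed to have one root.
    """
    by_id = {}
    children = {}
    root_id = None
    for node in nodes:
        by_id[node["id"]] = node
        if node["parent_id"] is None:
            root_id = node["id"]
        else:
            children.setdefault(node["parent_id"], []).append(node["id"])
    for kids in children.values():
        kids.sort()  # assumed tie-break so repeated runs visit siblings identically

    totals = {}
    incomplete = {}
    sequence = []
    # Iterative traversal, so very deep agent chains cannot hit the recursion limit.
    stack = [(root_id, False)]
    while stack:
        node_id, expanded = stack.pop()
        kids = children.get(node_id, [])
        if not expanded:
            stack.append((node_id, True))
            for kid in reversed(kids):
                stack.append((kid, False))
            continue
        # Descendant subtotals first, then the node's own local cost.
        subtotal = sum(totals[kid] for kid in kids)
        subtotal += by_id[node_id]["derived"]["effective_tokens"]
        totals[node_id] = subtotal
        incomplete[node_id] = "flagged" in by_id[node_id] or any(incomplete[k] for k in kids)
        sequence.append((node_id, subtotal, incomplete[node_id]))
    return sequence
```

Since the visiting order depends only on ids and parent links, running the function twice on the same partially observed graph yields the same subtotal sequence.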


A conforming response MUST include a summary object alongside the invocations array:

{
  "summary": {
    "total_invocations": number,
    "raw_total_tokens": number,
    "base_weighted_tokens": number,
    "effective_tokens": number
  },
  "invocations": [ ... ]
}
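
A consumer may want to confirm that the summary agrees with the invocations array it accompanies. The following sketch, with hypothetical names, recomputes each summary field from the per-node `usage` and `derived` values and lists the fields that disagree; the tolerance values are arbitrary illustrative choices.

```python
import math

USAGE_KEYS = ("input_tokens", "cached_input_tokens", "output_tokens", "reasoning_tokens")


def summary_mismatches(report):
    """Return the names of summary fields inconsistent with the invocations array."""
    nodes = report["invocations"]
    expected = {
        "total_invocations": len(nodes),
        "raw_total_tokens": sum(n["usage"][k] for n in nodes for k in USAGE_KEYS),
        "base_weighted_tokens": sum(n["derived"]["base_weighted_tokens"] for n in nodes),
        "effective_tokens": sum(n["derived"]["effective_tokens"] for n in nodes),
    }
    summary = report["summary"]
    mismatched = []
    for field, value in expected.items():
        # A missing field compares as NaN, which is never close to anything.
        reported = summary.get(field, float("nan"))
        if not math.isclose(reported, value, rel_tol=1e-9, abs_tol=1e-9):
            mismatched.append(field)
    return mismatched
```

An empty result means all four summary fields match the per-invocation data.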

All LLM calls MUST be included in the execution graph. Hidden or system-triggered calls MUST be counted.

Given identical inputs and multipliers, ET MUST be reproducible. Implementations SHOULD NOT introduce non-deterministic factors into the computation.

Implementations SHOULD version their token weights and model multipliers so that historical reports remain interpretable.

When sub-agents are not fully observable, implementations MUST still report aggregate totals. Invocation nodes with incomplete data SHOULD be flagged to indicate missing information.

Implementations must prevent unbounded ET accumulation from producing non-finite or non-interoperable outputs.

R-SAFE-001: ET aggregation logic MUST detect overflow and non-finite arithmetic states (NaN, +Inf, -Inf) before serializing output.

R-SAFE-002: Implementations MUST enforce a maximum ET ceiling of 9007199254740991 (2^53 - 1) for serialized numeric fields to preserve JavaScript-safe integer interoperability in cross-language pipelines.

R-SAFE-003: When computed ET exceeds the ceiling, implementations MUST clamp the reported summary.effective_tokens value to the ceiling and MUST emit a warning indicating that capping occurred.

R-SAFE-003A: When ET capping occurs, implementations MUST record a deterministic overflow condition using either flagged.code = "ET_OVERFLOW" on the affected root/subtree node or a deterministic error when no structured flag channel is available. The error/flag payload MUST include the ceiling value 9007199254740991 so operators can distinguish overflow from missing usage data.

R-SAFE-004: For long multi-agent chains, implementations SHOULD aggregate ET in a streaming manner (incremental updates per invocation) and SHOULD emit an early warning when running totals exceed 80% of the ceiling.

R-SAFE-005: For invocation nodes with incomplete usage payloads (unobservable sub-agents), implementations MUST serialize usage.input_tokens, usage.cached_input_tokens, usage.output_tokens, usage.reasoning_tokens, derived.base_weighted_tokens, and derived.effective_tokens as numeric zero (0) rather than omitting those fields.

R-SAFE-006: For invocation nodes that are incomplete/unobservable, implementations MUST include a flagged object with schema { "code": "UNOBSERVABLE_INVOCATION", "reason": string }. For fully observed invocation nodes, implementations MAY omit flagged.
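
The ceiling, non-finite checks, and early warning described in R-SAFE-001 through R-SAFE-004 could be combined in a small streaming accumulator. The class below is a sketch under assumptions: `ETAccumulator`, its method names, and the warning strings are hypothetical, and raising an exception is only one way to surface a non-finite state.

```python
import math

ET_CEILING = 9007199254740991  # 2**53 - 1, the JavaScript-safe integer limit
EARLY_WARNING_FRACTION = 0.8


class ETAccumulator:
    """Streaming ET aggregation with ceiling and non-finite checks."""

    def __init__(self):
        self.total = 0.0
        self.warnings = []
        self._early_warned = False

    def add(self, effective_tokens):
        """Add one invocation's ET, rejecting NaN and infinite values."""
        if not math.isfinite(effective_tokens):
            raise ValueError(f"non-finite effective_tokens: {effective_tokens!r}")
        self.total += effective_tokens
        if not math.isfinite(self.total):
            raise OverflowError("running ET total is no longer finite")
        if not self._early_warned and self.total > EARLY_WARNING_FRACTION * ET_CEILING:
            self._early_warned = True
            self.warnings.append("running ET total exceeds 80% of the ceiling")

    def finalize(self):
        """Return the value to serialize, clamped to the ceiling if needed."""
        value = self.total
        flagged = None
        warnings = list(self.warnings)
        if value > ET_CEILING:
            value = ET_CEILING
            warnings.append(f"effective_tokens capped at {ET_CEILING}")
            # The ceiling appears in the payload so overflow can be told
            # apart from missing usage data.
            flagged = {"code": "ET_OVERFLOW", "reason": f"ET exceeded ceiling {ET_CEILING}"}
        return {"effective_tokens": value, "flagged": flagged, "warnings": warnings}
```

Calling `add` once per invocation keeps memory use constant for long chains, and `finalize` can be called at any point to obtain a serializable snapshot.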


Implementations MAY:

  • Add new token classes (e.g., tool_tokens)
  • Add latency or compute metadata per invocation node
  • Support streaming or partial progress updates

Extensions MUST NOT alter the core ET definition or the default weight values without disclosure.


  • T-ET-001: Single invocation with all four token classes produces correct base_weighted_tokens
  • T-ET-002: Single invocation ET equals m × base_weighted_tokens
  • T-ET-003: Zero-value token classes do not affect the result
  • T-ET-004: Custom weights are applied when default weights are overridden
  • T-ET-010: Multi-invocation ET_total equals the sum of per-invocation ET values
  • T-ET-011: raw_total_tokens equals the sum of all raw tokens across all invocations
  • T-ET-012: total_invocations count includes root, sub-agents, and tool-triggered calls
  • T-ET-020: Root node has parent_id = null
  • T-ET-021: All sub-agent nodes reference a valid parent_id
  • T-ET-022: Node schema includes all required fields
  • T-ET-030: Summary object is present in all conforming responses
  • T-ET-031: Summary values are consistent with per-invocation data

Category              Count
Total tests defined   12
Required tests        12
Optional tests        0

Count method: unique T-ET-* IDs in §10.1 (001–004, 010–012, 020–022, 030–031).

Requirement                           Test ID        Level   Status
Per-invocation base weighted tokens   T-ET-001–004   1       Implemented
Per-invocation ET computation         T-ET-002       1       Implemented
Multi-invocation aggregation          T-ET-010–012   2       Implemented
Execution graph node schema           T-ET-020–022   2       Implemented
Summary reporting                     T-ET-030–031   3       Implemented
Custom weight disclosure              T-ET-004       1       Implemented
Versioning of weights/multipliers     -              3       Recommended
Partial visibility flagging           -              2       Recommended

A request triggers three invocations: a root call, a retrieval sub-agent, and a final synthesis call.

{
  "invocations": [
    {
      "id": "root",
      "parent_id": null,
      "model": { "name": "model-a", "copilot_multiplier": 2.0 },
      "usage": {
        "input_tokens": 500,
        "cached_input_tokens": 200,
        "output_tokens": 150,
        "reasoning_tokens": 0
      }
    },
    {
      "id": "retrieval",
      "parent_id": "root",
      "model": { "name": "model-b", "copilot_multiplier": 1.0 },
      "usage": {
        "input_tokens": 300,
        "cached_input_tokens": 0,
        "output_tokens": 100,
        "reasoning_tokens": 0
      }
    },
    {
      "id": "synthesis",
      "parent_id": "root",
      "model": { "name": "model-a", "copilot_multiplier": 2.0 },
      "usage": {
        "input_tokens": 200,
        "cached_input_tokens": 100,
        "output_tokens": 250,
        "reasoning_tokens": 0
      }
    }
  ]
}

Per-invocation computation with the default weights:

root:
  base = (1.0 × 500) + (0.1 × 200) + (4.0 × 150) = 500 + 20 + 600 = 1120
  ET = 2.0 × 1120 = 2240
retrieval:
  base = (1.0 × 300) + (4.0 × 100) = 300 + 400 = 700
  ET = 1.0 × 700 = 700
synthesis:
  base = (1.0 × 200) + (0.1 × 100) + (4.0 × 250) = 200 + 10 + 1000 = 1210
  ET = 2.0 × 1210 = 2420

Resulting summary:

{
  "summary": {
    "total_invocations": 3,
    "raw_total_tokens": 1800,
    "base_weighted_tokens": 3030,
    "effective_tokens": 5360
  }
}

ET_total = Σ [ m_i × (w_in × I_i + w_cache × C_i + w_out × O_i + w_reason × R_i) ]

With default weights:

ET_total = Σ [ m_i × (I_i + 0.1 C_i + 4 O_i + 4 R_i) ]

ET values are derived from token usage metadata. Implementations SHOULD treat per-invocation token data as potentially sensitive since usage patterns may reveal information about system prompts, model configurations, or user behavior. Aggregate ET values suitable for observability dashboards SHOULD be separated from detailed per-invocation data in access-controlled reporting systems.


The Copilot Multiplier (m) used in the ET formula is a per-model scalar that represents each model’s computational cost relative to the reference model. To ensure reproducibility and transparency, multiplier values MUST be sourced from a disclosed, versioned registry.

The authoritative registry for copilot_multiplier values in this implementation is the file:

pkg/cli/data/model_multipliers.json

This file is embedded at compile time into the gh-aw binary using a Go //go:embed directive in pkg/cli/effective_tokens.go. The registry format is:

{
  "version": "string",
  "description": "string",
  "reference_model": "string",
  "token_class_weights": {
    "input": number,
    "cached_input": number,
    "output": number,
    "reasoning": number,
    "cache_write": number
  },
  "multipliers": {
    "<model-name>": number
  }
}

R-REG-001: The registry MUST declare a version field that changes whenever any multiplier value is added, removed, or modified.

R-REG-002: The registry MUST declare a reference_model field identifying the baseline model whose multiplier equals 1.0. All other multipliers are relative to this baseline.

R-REG-003: The registry MUST include token_class_weights for all four standard token classes: input, cached_input, output, and reasoning. A conforming implementation MUST use these weights as the default values for Section 4.2.

R-REG-004: Implementations MUST embed or bundle the registry at build time. Runtime fetching of multiplier values from an external source requires disclosure in reported output.

R-REG-005: When a model name is not present in the registry, implementations MUST treat the multiplier as 1.0 and SHOULD emit a warning noting that the model is unrecognized.

R-REG-006: Custom multipliers supplied by the caller (e.g., via API or configuration) MUST be merged with registry multipliers. Custom values take precedence and MUST be disclosed in any report that uses them.

R-REG-007: The registry MUST NOT contain placeholder values such as TBD, null, or empty strings for any model multiplier entry. Each declared model key MUST map to a numeric multiplier value.

R-REG-008: When adding support for a new model, maintainers MUST register the model in pkg/cli/data/model_multipliers.json with a concrete numeric multiplier before release. If calibration is incomplete, the model MUST be omitted from the registry and the implementation fallback behavior in R-REG-005 applies.

R-REG-009: When a model is scheduled for removal from the registry, it MUST remain in pkg/cli/data/model_multipliers.json with a deprecated marker in a comment or companion metadata field for at least one minor version before it is deleted. Implementations SHOULD emit a warning when a deprecated model is encountered at runtime, advising callers to migrate to a supported model. A model entry MUST NOT be silently removed between consecutive minor versions; removal without the one-version deprecation notice is a breaking change and MUST be accompanied by a major version bump of the registry version field.
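
The lookup behavior in R-REG-005 and R-REG-006 can be sketched as below. This is an illustration with hypothetical names, not the code in pkg/cli/effective_tokens.go. It returns the source of the value alongside the multiplier so that custom and fallback values can be disclosed in reports.

```python
import warnings


def resolve_multiplier(model, registry_multipliers, custom_multipliers=None):
    """Return (multiplier, source), where source is 'custom', 'registry', or 'fallback'."""
    custom = custom_multipliers or {}
    # Caller-supplied values take precedence over registry values (R-REG-006).
    if model in custom:
        return float(custom[model]), "custom"
    if model in registry_multipliers:
        return float(registry_multipliers[model]), "registry"
    # Unknown model: treat the multiplier as 1.0 and warn (R-REG-005).
    warnings.warn(f"model {model!r} is not in the multiplier registry; using 1.0", stacklevel=2)
    return 1.0, "fallback"
```

With a registry of `{"model-a": 2.0, "model-b": 1.0}`, resolving `"model-a"` yields `(2.0, "registry")`, while an unlisted model yields `(1.0, "fallback")` together with a warning.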

The version field in model_multipliers.json corresponds to the registry schema version, not the gh-aw binary version. Implementations SHOULD include the registry version in all ET summary reports to enable historical reconstruction.
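
Several of the registry requirements lend themselves to an automated check on the parsed JSON. The sketch below covers R-REG-001, R-REG-002, R-REG-003, and R-REG-007. The function names are hypothetical, and the check that the reference model appears in `multipliers` with the value 1.0 is an assumption about how R-REG-002 is represented in the file.

```python
from numbers import Real

REQUIRED_WEIGHTS = ("input", "cached_input", "output", "reasoning")


def _is_number(value):
    """True for ints and floats, but not for booleans, strings, or None."""
    return isinstance(value, Real) and not isinstance(value, bool)


def registry_problems(registry):
    """Return a list of requirement violations found in a parsed registry dict."""
    problems = []
    version = registry.get("version")
    if not isinstance(version, str) or not version:
        problems.append("R-REG-001: version must be a non-empty string")

    multipliers = registry.get("multipliers", {})
    reference = registry.get("reference_model")
    if not isinstance(reference, str) or not reference:
        problems.append("R-REG-002: reference_model must be declared")
    elif multipliers.get(reference) != 1.0:
        # Assumption: the baseline model is listed in multipliers with value 1.0.
        problems.append("R-REG-002: reference_model multiplier must equal 1.0")

    weights = registry.get("token_class_weights", {})
    for name in REQUIRED_WEIGHTS:
        if not _is_number(weights.get(name)):
            problems.append(f"R-REG-003: token_class_weights.{name} must be numeric")

    for model, value in multipliers.items():
        # Placeholders such as "TBD", null, or "" fail this check.
        if not _is_number(value):
            problems.append(f"R-REG-007: multiplier for {model!r} must be numeric")
    return problems
```

A check of this kind could run in continuous integration against the registry file so that placeholder entries are caught before release.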


The Effective Tokens registry is maintained in pkg/cli/data/model_multipliers.json and loaded by pkg/cli/effective_tokens.go.

To keep specification and implementation synchronized:

  1. Update this specification’s registry requirements when adding, removing, or re-scaling model multipliers.
  2. Update pkg/cli/data/model_multipliers.json in the same change.
  3. When deprecating a model, add a deprecated comment alongside the entry and keep it in the registry for at least one minor version before removal (R-REG-009). Update the registry version field on removal.
  4. Verify loading and fallback behavior in pkg/cli/effective_tokens_test.go (TestModelMultipliersJSONEmbedded, TestResolveEffectiveWeightsDefault, and inventory checks).
  5. Run make build so the embedded registry is rebuilt into the gh-aw binary.

Conforming releases SHOULD include a test assertion for newly added model multipliers to ensure implementation-registry parity.



  • Added: Model Multiplier Registry section with normative requirements R-REG-001 through R-REG-009
  • Added: R-REG-009: model deprecation/sunset lifecycle norm (models must carry a deprecated marker for one minor version before removal)
  • Added: Compliance test skeleton file pkg/cli/effective_tokens_compliance_test.go with Go test stubs for T-ET-001..T-ET-031
  • Updated: Compliance checklist §10.2 status column from “Required” to “Implemented” for all test IDs T-ET-001–T-ET-031 (all tests now implemented and passing)
  • Audit (Appendix C — Security): Verified Appendix C requirements against pkg/cli/effective_tokens.go and pkg/cli/data/model_multipliers.json. Findings:
    • Sensitive usage patterns (Appendix C §1): Per-invocation token data is not exposed directly by the CLI; only aggregate TotalEffectiveTokens is surfaced in the audit output. Access control is delegated to GitHub repository permissions. No gaps found.
    • Aggregate vs. detailed data separation (Appendix C §2): The TokenUsageSummary.ByModel map contains per-model breakdowns but is only logged at DEBUG level, not included in default CLI output. No gaps found.
    • Registry exposure: The embedded model_multipliers.json contains only multiplier coefficients, not secrets or PII. No gaps found.
    • Follow-up: The spec does not address token data leakage via OTEL attributes. This is tracked as a separate concern (see §7.3 of the Experiments Specification for precedent).
  • Adopted W3C-style specification format
  • Added conformance levels (Basic, Standard, Complete)
  • Added compliance testing section with test IDs
  • Added Appendix C: Security Considerations
  • Clarified partial visibility requirements
  • Initial definition of Effective Tokens metric
  • Defined four token classes and default weights
  • Defined per-invocation and multi-invocation formulas
  • Defined execution graph node schema

Copyright 2026 GitHub Agentic Workflows Team. All rights reserved.