Effective Tokens Specification

Version: 0.4.0 Status: Deprecated Publication Date: 2026-04-02 Deprecation Date: 2026-06-05 Editor: GitHub Agentic Workflows Team This Version: effective-tokens-specification Latest Published Version: This document

[!CAUTION] Deprecated. The Effective Tokens (ET) metric is deprecated and is superseded by AI Credits (AIC). AIC is a direct monetary metric (1 AIC = $0.01 USD) derived from actual provider pricing and provides a more actionable measure of inference cost. Effective Tokens remain available in gh aw logs and gh aw audit output for backward compatibility, but AIC is now the preferred metric for monitoring and budgeting. See Cost Management for AIC usage examples.

Abstract

This specification defines Effective Tokens (ET), a normalized unit for measuring Large Language Model (LLM) usage across token classes, model-relative computational intensity, and multi-invocation execution graphs. ET provides a single unified metric for composite LLM workloads including multi-step pipelines, tool-augmented calls, sub-agent orchestration, and recursive inference.

Status of This Document

This section describes the status of this document at the time of publication. This is a draft specification and may be updated, replaced, or made obsolete by other documents at any time.

This document is governed by the GitHub Agentic Workflows project specifications process.

Introduction
Conformance
Terminology
Token Accounting Model
Multi-Invocation Aggregation
Execution Graph Requirements
Reporting
Implementation Requirements
Extensibility
Compliance Testing
Appendices
Model Multiplier Registry
Sync Notes
Norms
References
Change Log

1. Introduction

1.1 Purpose

Token counts reported by LLM APIs are not directly comparable: different token classes (input, cached, output, reasoning) carry different computational costs, and different models have different relative costs. Effective Tokens normalizes these variables into a single scalar that reflects true computational intensity, enabling consistent measurement and comparison across complex multi-agent systems.

1.2 Scope

This specification covers:

Definition of token classes and their default weights
The per-invocation ET computation formula
Aggregation across multi-invocation execution graphs
Structural requirements for invocation nodes and summary reports

This specification does NOT cover:

Billing, pricing, or cost allocation
Model selection or routing strategies
Streaming or partial token reporting

1.3 Design Goals

An ET implementation:

Preserves raw token counts per invocation
Normalizes across token classes using disclosed weights
Normalizes across models using per-model multipliers
Supports aggregation across any number of invocations
Produces a single reproducible metric from identical inputs
Carries no dependency on billing or pricing systems

2. Conformance

2.1 Conformance Classes

Conforming implementation: An implementation that satisfies all MUST/SHALL requirements in this specification.

Partially conforming implementation: An implementation that satisfies core accounting requirements (Sections 4–5) but omits optional fields or extensions.

2.2 Requirements Notation

The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “NOT RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC 2119.

2.3 Compliance Levels

Level 1 – Basic: Single-invocation ET computation (Section 4)
Level 2 – Standard: Multi-invocation aggregation and execution graph (Sections 5–6)
Level 3 – Complete: Full reporting and extensibility support (Sections 7–9)

3. Terminology

3.1 Token Classes

Class	Symbol	Description
Input Tokens	I	Tokens newly processed by the model
Cached Input Tokens	C	Tokens served via cache or prefix reuse
Output Tokens	O	Tokens generated by the model
Reasoning Tokens	R	Internal tokens used during inference (optional)

3.2 Model Multiplier

The Copilot Multiplier (m) is a scalar representing the relative computational intensity of a model versus a defined baseline. Its value is model-specific and MUST be disclosed by the implementation.

3.3 Invocation

A single LLM request-response cycle. Each invocation produces one set of token counts and yields one ET value.

3.4 Sub-Agent

Any invocation triggered by another LLM call or orchestration layer. Examples include tool-using agents, retrieval-augmented calls, planning/execution agents, and recursively delegated LLM calls.

3.5 Execution Graph

A directed structure representing all invocations associated with a single top-level request. The root node has no parent; sub-agents reference their triggering invocation as their parent.

3.6 Execution-Graph Traversal Entities

For deterministic aggregation and reporting, implementations MUST distinguish the following traversal entities when processing an execution graph:

Local invocation cost: The ET computed from the current node’s own usage.* payload only.
Descendant contribution: The subtotal accumulated from child nodes and deeper descendants before the current node’s local invocation cost is added.
Observed subtree: A subtree whose invocation nodes have concrete usage payloads and therefore contribute measured ET rather than fallback zeros.
Unobservable subtree: A subtree whose invocation nodes are known to exist but whose concrete usage payloads are unavailable; these nodes remain part of traversal order even when their ET is serialized as 0.

4. Token Accounting Model

4.1 Raw Token Count

For each invocation, the raw total is:

raw_total_tokens = I + C + O + R

4.2 Token Class Weights

Default weights for the four token classes are:

Token Class	Symbol	Default Weight
Input	w_in	1.0
Cached Input	w_cache	0.1
Output	w_out	4.0
Reasoning	w_reason	4.0

Implementations MAY override these values but MUST disclose the weights used in any reported output.

4.3 Base Weighted Tokens

Per invocation:

effective_input_tokens = max(I - C, 0)

base_weighted_tokens =
    (w_in × effective_input_tokens) + (w_cache × C) + (w_out × O) + (w_reason × R)

When providers report cached reads (C) as part of input tokens (I), implementations MUST subtract cached input from I before applying w_in to avoid double counting.

To avoid ambiguity, conforming implementations MUST treat these symbols as follows:

I: total reported input tokens for the invocation
C: cached subset of that same input
w_in MUST be applied only to max(I - C, 0) (the non-cached portion)
w_cache MUST be applied only to C

Implementations MUST NOT charge the cached portion twice (once via w_in × I and again via w_cache × C).

4.4 Effective Tokens Per Invocation

effective_tokens = m × base_weighted_tokens

5. Multi-Invocation Aggregation

5.1 Total Effective Tokens

For a request involving N invocations:

ET_total = Σ (m_i × base_weighted_tokens_i)

Each invocation MAY use a different model and multiplier.

5.2 Total Raw Tokens

raw_total_tokens = Σ (I_i + C_i + O_i + R_i)

5.3 Invocation Count

total_invocations = N

This count MUST include the root call, all sub-agent calls, and all tool-triggered LLM calls.

6. Execution Graph Requirements

Implementations MUST represent multi-call workflows as a directed execution graph.

6.1 Node Schema

Each node (invocation) MUST conform to:

{
  "id": "string",
  "parent_id": "string | null",
  "model": {
    "name": "string",
    "copilot_multiplier": number
  },
  "usage": {
    "input_tokens": number,
    "cached_input_tokens": number,
    "output_tokens": number,
    "reasoning_tokens": number
  },
  "derived": {
    "base_weighted_tokens": number,
    "effective_tokens": number
  },
  "flagged": {
    "code": "string",
    "reason": "string"
  }
}

6.2 Root Invocation

The root invocation MUST have parent_id = null. It represents the user-facing request that initiates the execution graph.

6.3 Sub-Agent Invocations

Each sub-agent invocation MUST reference a valid parent_id. Sub-agent invocations MAY recursively spawn further invocations.

For execution graphs deeper than two levels, implementations MUST aggregate descendant Effective Tokens in stable post-order: fully observed leaf descendants first, then their nearest observed ancestors, and finally the parent node’s local invocation cost. When a parent has incomplete or unobservable descendants, the implementation MUST report the partial sum accumulated from the deepest observed descendants before adding any shallower fallback estimates, and SHOULD keep the parent node flagged until all known descendants are either observed or explicitly marked unobservable. Repeated computations over the same partially observed graph MUST produce the same partial-ordering and subtotal sequence.

Implementation ordering constraints for multi-invocation aggregation:

Traverse child subtrees in deterministic order (for example, stable sibling order by invocation ID or first-seen sequence).
For each subtree, aggregate fully observed deepest descendants before applying fallback estimates for unobservable nodes in that same subtree.
Add the current node’s local invocation ET only after all descendant contributions for that node are finalized.

7. Reporting

A conforming response MUST include a summary object alongside the invocations array:

{
  "summary": {
    "total_invocations": number,
    "raw_total_tokens": number,
    "base_weighted_tokens": number,
    "effective_tokens": number
  },
  "invocations": [ ... ]
}

7.1 OpenTelemetry Attribute Requirements

Implementations that emit OpenTelemetry spans or metrics for token accounting MUST use the following normative attribute keys. These keys are not optional examples — they are required names for cross-implementation interoperability.

OTel Attribute Key	Type	Description
`llm.token.effective_total`	integer	Total Effective Tokens for the invocation (ET as defined in §4.4)
`llm.token.input`	integer	Raw input token count for the invocation
`llm.token.output`	integer	Raw output token count for the invocation
`llm.token.cached_input`	integer	Number of input tokens served from cache
`llm.token.base_weighted`	integer	Base weighted token value before model multiplier is applied
`llm.model.multiplier`	float	The Copilot model multiplier (`m`) applied for this invocation
`llm.invocation.id`	string	Unique identifier for this invocation node (matches `id` field in execution graph)

R-OTL-001: Implementations that emit OTel attributes for effective token data MUST use llm.token.effective_total as the attribute key for the ET value. Implementations MUST NOT use alternative keys (e.g., effective_tokens, et_total) for this attribute.

R-OTL-002: Implementations MUST emit llm.token.input, llm.token.output, and llm.token.cached_input as separate span attributes when per-class token counts are available. These three attributes MUST reflect raw (unweighted) token counts.

R-OTL-003: Implementations MUST emit llm.token.base_weighted as a span attribute when the base weighted token value is computed. This attribute allows consumers to audit the weighting step independently of the model multiplier.

R-OTL-004: When llm.model.multiplier is emitted, its value MUST match the multiplier used to compute llm.token.effective_total for the same span. Implementations MUST NOT omit llm.model.multiplier if llm.token.effective_total is present.

R-OTL-005: All OTel attribute keys defined in this section are versioned under this specification. Implementations MUST NOT rename or reuse these keys with different semantics without a specification revision.

R-OTL-006: Implementations MUST reject and MUST NOT export negative values for llm.token.effective_total. If a negative ET value is encountered, the implementation MUST fail the export path (or drop the affected span with an explicit error) rather than emitting a negative metric.

8. Implementation Requirements

8.1 Completeness

All LLM calls MUST be included in the execution graph. Hidden or system-triggered calls MUST be counted.

8.2 Determinism

Given identical inputs and multipliers, ET MUST be reproducible. Implementations SHOULD NOT introduce non-deterministic factors into the computation.

8.3 Versioning

Implementations SHOULD version their token weights and model multipliers so that historical reports remain interpretable.

8.4 Partial Visibility

When sub-agents are not fully observable, implementations MUST still report aggregate totals. Invocation nodes with incomplete data SHOULD be flagged to indicate missing information.

8.5 Safeguards

Implementations MUST apply the following safeguards to prevent unbounded ET accumulation from producing non-finite or non-interoperable outputs.

S-1: Overflow and Capping

Threat: Unbounded multi-invocation ET aggregation can exceed numeric interoperability limits and produce values that cannot be represented safely across systems.

Mitigation: Implementations MUST enforce the JavaScript-safe numeric ceiling and record deterministic overflow state when capping occurs, including the ceiling value in the emitted flag/error payload.

Normative requirements: R-SAFE-002, R-SAFE-003, R-SAFE-003A, R-SAFE-004

S-2: Negative/Zero Multiplier Rejection

Threat: Zero or negative model multipliers can collapse ET to zero or invert ET sign, producing invalid cross-model comparisons and potentially negative exported ET metrics.

Mitigation: Implementations MUST reject zero and negative multipliers during registry validation before ET computation begins.

Residual risk: Runtime inputs from out-of-band systems may still provide malformed multipliers; implementations SHOULD continue validating merged runtime overrides (R-SAFE-010).

Normative requirements: R-SAFE-008, R-SAFE-010, R-OTL-006

S-3: Graph Cycle Detection

Threat: Cycles in execution-graph parent/child relationships can cause unbounded traversal and double counting during ET aggregation.

Mitigation: Implementations MUST detect invocation graph cycles and MUST fail deterministically before aggregation when a cycle is found.

Residual risk: Incomplete telemetry may hide a cycle edge; implementations SHOULD retain stable node IDs and parent references to improve cycle diagnosability.

Normative requirements: R-SAFE-011

S-4: Non-Finite Numeric Rejection

Threat: NaN, +Inf, and -Inf values in multipliers or token class weights can silently corrupt ET outputs and break downstream serialization/aggregation.

Mitigation: Implementations MUST reject non-finite or invalid numeric registry values before ET computation begins.

Residual risk: Numeric coercion bugs in downstream consumers can still mis-handle finite ET values if they apply non-spec transforms.

Normative requirements: R-SAFE-007, R-SAFE-008

S-5: Registry Validation Failure Handling

Threat: Continuing ET computation after registry validation failure can produce inconsistent, partially parsed, or non-reproducible outputs.

Mitigation: Implementations MUST fail deterministically with field-level diagnostics and MUST NOT continue with partially parsed registry data.

Residual risk: Operator error can still delay remediation; diagnostics SHOULD identify actionable field-level causes to reduce MTTR.

Normative requirements: R-SAFE-009, R-SAFE-010

S-6: Output NaN/Inf Guards, Large-Count Overflow, and Zero-Invocation Edge Case

This safeguard strengthens the existing S-1/S-4 requirements by specifying three additional concrete guards that implementations MUST apply before serializing any ET output.

S-6a: NaN/Inf guard on computed ET

Threat: Even when all registry values pass S-4 validation, arithmetic on very large token counts (e.g., base_weighted_tokens near the IEEE 754 double precision representable limit) can still produce NaN or ±Inf via floating-point overflow in intermediate computations.

Mitigation: Implementations MUST test derived.effective_tokens and derived.base_weighted_tokens for NaN and ±Inf immediately before serialization, after all arithmetic has been performed. If either value is non-finite, the implementation MUST:

Clamp derived.effective_tokens to the ceiling 9007199254740991 (per R-SAFE-002).
Record flagged.code = "ET_OVERFLOW" on the affected node (per R-SAFE-003A).
Log a diagnostic that includes the raw non-finite value so the root cause can be investigated.

Implementations MUST NOT propagate NaN or ±Inf into serialized JSON; doing so is a conformance failure even if upstream validation passed.

Normative requirements: R-SAFE-001, R-SAFE-003A

S-6b: base_weighted_tokens overflow for large token counts

Threat: For invocations with extremely large raw token counts (e.g., reasoning-heavy agents with reasoning_tokens > 10 billion), the weighted sum w_in * input + w_cache * cached + w_out * output + w_reason * reasoning can overflow IEEE 754 double precision (~1.8 × 10¹⁶ for a value representable as an integer). Overflow silently wraps to ±Inf, which then propagates into effective_tokens.

Mitigation: Implementations MUST compute base_weighted_tokens using 64-bit floating-point arithmetic and MUST check the intermediate result for finiteness after each token-class contribution is added. If the running sum exceeds 9007199254740991 (the JS-safe ceiling), the implementation SHOULD short-circuit the computation and apply the S-1 capping logic immediately rather than continuing to accumulate tokens.

Normative requirements: R-SAFE-001, R-SAFE-002, R-SAFE-004

S-6c: Zero-invocation edge case

Threat: An execution graph with zero invocation nodes (empty telemetry or all invocations filtered out) can cause division-by-zero or NaN in average or aggregate computations that downstream consumers depend on.

Mitigation: Implementations MUST treat zero-invocation graphs as valid inputs and MUST produce a well-formed output with summary.effective_tokens = 0, summary.invocation_count = 0, and an empty invocations array. Implementations MUST NOT fail, return null, or omit required top-level fields when processing empty telemetry. Callers MAY emit a non-fatal warning to indicate that no usage data was available, but MUST NOT treat an empty graph as an error.

Normative requirements: R-SAFE-005

R-SAFE-001: ET aggregation logic MUST detect overflow and non-finite arithmetic states (NaN, +Inf, -Inf) before serializing output.

R-SAFE-002: Implementations MUST enforce a maximum ET ceiling of 9007199254740991 (2^53 - 1) for serialized numeric fields to preserve JavaScript-safe integer interoperability in cross-language pipelines.

R-SAFE-003: When computed ET exceeds the ceiling, implementations MUST clamp the reported summary.effective_tokens value to the ceiling and MUST emit a warning indicating that capping occurred.

R-SAFE-003A: When ET capping occurs, implementations MUST record a deterministic overflow condition using either flagged.code = "ET_OVERFLOW" on the affected root/subtree node or a deterministic error when no structured flag channel is available. The error/flag payload MUST include the ceiling value 9007199254740991 so operators can distinguish overflow from missing usage data.

R-SAFE-004: For long multi-agent chains, implementations SHOULD aggregate ET in a streaming manner (incremental updates per invocation) and SHOULD emit an early warning when running totals exceed 80% of the ceiling.

R-SAFE-005: For invocation nodes with incomplete usage payloads (unobservable sub-agents), implementations MUST serialize usage.input_tokens, usage.cached_input_tokens, usage.output_tokens, usage.reasoning_tokens, derived.base_weighted_tokens, and derived.effective_tokens as numeric zero (0) rather than omitting those fields.

R-SAFE-006: For invocation nodes that are incomplete/unobservable, implementations MUST include a flagged object with schema { "code": "UNOBSERVABLE_INVOCATION", "reason": string }. For fully observed invocation nodes, implementations MAY omit flagged.

R-SAFE-007: Before ET computation begins, implementations MUST validate the active model multiplier registry described in Model Multiplier Registry. Registry validation MUST confirm that version and reference_model are non-empty strings and that the reference model has a numeric multiplier entry.

R-SAFE-008: Every declared token class weight and model multiplier loaded from the registry MUST be finite numeric data. NaN, infinite values, strings, null, and negative multiplier values MUST be rejected before any ET output is produced.

R-SAFE-009: If registry validation fails, implementations MUST NOT continue with partially parsed multiplier data. They MUST fail deterministically with an error that identifies the invalid registry field or model entry.

R-SAFE-010: When a runtime override or custom multiplier map is merged with the embedded registry, implementations MUST apply the same validation rules to the merged result before using it for ET computation.

R-SAFE-011: Aggregation logic MUST detect parent/child cycles in the invocation graph before post-order traversal. If a cycle is detected, implementations MUST fail deterministically with a stable error code and MUST NOT emit partial ET totals.

9. Extensibility

Implementations MAY:

Add new token classes (e.g., tool_tokens)
Add latency or compute metadata per invocation node
Support streaming or partial progress updates

Extensions MUST NOT alter the core ET definition or the default weight values without disclosure.

ET-EXT-01: Extensions MUST NOT redefine the default weight values (w_in, w_cache, w_out, w_reason) without incrementing the specification version. Any implementation that ships with non-default weight values MUST declare a version bump and MUST update the Compliance Checklist in §10.2 to reflect the changed defaults.

ET-EXT-02: Extensions MUST NOT introduce new mandatory fields into the invocation node schema (§6.1) without a corresponding revision to the conformance requirements in §2.3. New fields MAY be added as optional extensions, but implementations MUST NOT reject conforming payloads that omit optional extension fields.

ET-EXT-03: Extensions that add new token classes MUST assign unique, non-conflicting class names and MUST NOT reuse the reserved names input, cached_input, output, or reasoning. Extension token classes MUST NOT be included in the default base_weighted_tokens formula unless a new specification version explicitly incorporates them.

For implementation files that exercise extensibility paths, see the Sync Notes section.

10. Compliance Testing

10.1 Test Suite Requirements

10.1.1 Token Accounting Tests

T-ET-001: Single invocation with all four token classes produces correct base_weighted_tokens
T-ET-002: Single invocation ET equals m × base_weighted_tokens
T-ET-003: Zero-value token classes do not affect the result
T-ET-004: Custom weights are applied when default weights are overridden
T-ET-005: Cached/input overlap is not double counted (w_in applies to max(I-C,0), not I)
T-ET-007: Effective input is clamped at zero when C > I (max(I-C,0))

10.1.2 Aggregation Tests

T-ET-010: Multi-invocation ET_total equals the sum of per-invocation ET values
T-ET-011: raw_total_tokens equals the sum of all raw tokens across all invocations
T-ET-012: total_invocations count includes root, sub-agents, and tool-triggered calls

10.1.3 Aggregation with Zero-ET Leaf Nodes

T-ET-006: Multi-invocation aggregation where one or more leaf invocation nodes have all token class values set to zero (simulating tool calls that produce no tokens, such as no-op tool invocations or tool calls whose usage data is unavailable). The implementation MUST:
1. Include the zero-ET invocation node in total_invocations count.
2. Contribute 0 to ET_total from that node (rather than omitting it).
3. Represent the node in the execution graph with all usage.* fields set to 0 and derived.effective_tokens = 0.
4. Not emit a warning or error solely because a leaf node has zero effective tokens.

10.1.4 Execution Graph Tests

T-ET-020: Root node has parent_id = null
T-ET-021: All sub-agent nodes reference a valid parent_id
T-ET-022: Node schema includes all required fields
T-ET-032: Deep (3+ level) execution graphs aggregate ET in deterministic post-order and keep partial subtotals stable under partial observability

10.1.5 Reporting Tests

T-ET-030: Summary object is present in all conforming responses
T-ET-031: Summary values are consistent with per-invocation data

10.2 Compliance Checklist

10.2.1 Compliance Test Count Summary

Category	Count
Total tests defined	16
Required tests	16
Optional tests	0

Count method: unique T-ET-* IDs in §10.1 (001–005, 006, 007, 010–012, 020–022, 030–032).

Requirement	Test ID	Level	Status
Per-invocation base weighted tokens	T-ET-001–005, T-ET-007	1	Implemented
Per-invocation ET computation	T-ET-002	1	Implemented
Multi-invocation aggregation	T-ET-010–012	2	Implemented
Zero-ET leaf node aggregation	T-ET-006	2	Required
Execution graph node schema	T-ET-020–022	2	Implemented
Deep graph post-order aggregation	T-ET-032	2	Required
Summary reporting	T-ET-030–031	3	Implemented
Custom weight disclosure	T-ET-004	1	Implemented
Versioning of weights/multipliers	—	3	Recommended
Partial visibility flagging	—	2	Recommended

Appendices

Appendix A: Worked Example

A.1 Scenario

A request triggers three invocations: a root call, a retrieval sub-agent, and a final synthesis call.

A.2 Input Data

{
  "invocations": [
    {
      "id": "root",
      "parent_id": null,
      "model": { "name": "model-a", "copilot_multiplier": 2.0 },
      "usage": {
        "input_tokens": 500,
        "cached_input_tokens": 200,
        "output_tokens": 150,
        "reasoning_tokens": 0
      }
    },
    {
      "id": "retrieval",
      "parent_id": "root",
      "model": { "name": "model-b", "copilot_multiplier": 1.0 },
      "usage": {
        "input_tokens": 300,
        "cached_input_tokens": 0,
        "output_tokens": 100,
        "reasoning_tokens": 0
      }
    },
    {
      "id": "synthesis",
      "parent_id": "root",
      "model": { "name": "model-a", "copilot_multiplier": 2.0 },
      "usage": {
        "input_tokens": 200,
        "cached_input_tokens": 100,
        "output_tokens": 250,
        "reasoning_tokens": 0
      }
    }
  ]
}

A.3 Computation

root:
  base = (1.0 × max(500-200,0)) + (0.1 × 200) + (4.0 × 150) = 300 + 20 + 600 = 920
  ET   = 2.0 × 920 = 1840

retrieval:
  base = (1.0 × 300) + (4.0 × 100) = 300 + 400 = 700
  ET   = 1.0 × 700 = 700

synthesis:
  base = (1.0 × max(200-100,0)) + (0.1 × 100) + (4.0 × 250) = 100 + 10 + 1000 = 1110
  ET   = 2.0 × 1110 = 2220

A.4 Output

{
  "summary": {
    "total_invocations": 3,
    "raw_total_tokens": 1800,
    "base_weighted_tokens": 2730,
    "effective_tokens": 4760
  }
}

A.5 Input vs Cached Conformance Test Vectors

These vectors are normative examples for overlap handling and are intended to be asserted by conformance tests.

Test ID	Inputs `(I,O,C,R)`	Base computation (default weights)	Expected `base_weighted_tokens`
T-ET-005	`(100, 0, 80, 0)`	`1.0×max(100-80,0) + 0.1×80 = 20 + 8`	28
T-ET-007	`(50, 0, 80, 0)`	`1.0×max(50-80,0) + 0.1×80 = 0 + 8`	8

A.6 Partial Observability Examples

When some descendant invocations are unobservable, implementations still report deterministic partial totals and preserve stable ordering.

Example A (deep graph with one unobservable leaf):

root
├─ planner
│  ├─ retrieval (observed ET=120)
│  │  └─ shard-1 (observed ET=60)
│  └─ shard-2 (unobservable fallback ET=25)
└─ synthesis (observed ET=40)

Deterministic post-order subtotal sequence:

shard-1 → 60
retrieval local ET (120) → subtotal 180
shard-2 fallback ET (25) → subtotal 205
planner local ET → subtotal
synthesis local ET → subtotal
root local ET → final total

Example B (all descendants unobservable):

If all descendants of a node are unobservable, that node MUST still be included with derived.effective_tokens = 0 and flagged.code = "UNOBSERVABLE_INVOCATION" until concrete usage is observed.

Appendix B: Core Formula Reference

ET_total = Σ [ m_i × (w_in × max(I_i - C_i, 0) + w_cache × C_i + w_out × O_i + w_reason × R_i) ]

With default weights:

ET_total = Σ [ m_i × (max(I_i - C_i, 0) + 0.1 C_i + 4 O_i + 4 R_i) ]

Appendix C: Security Considerations

ET values are derived from token usage metadata. Implementations SHOULD treat per-invocation token data as potentially sensitive since usage patterns may reveal information about system prompts, model configurations, or user behavior. Aggregate ET values suitable for observability dashboards SHOULD be separated from detailed per-invocation data in access-controlled reporting systems.

Appendix D: ET Test Vectors

ET-TV-001 (Single Invocation Baseline)

Input:

model multiplier m = 1.0
input_tokens = 200
cached_input_tokens = 50
output_tokens = 10
reasoning_tokens = 0

Expected ET:

base_weighted_tokens = max(200-50,0) + 0.1×50 + 4×10 + 4×0 = 150 + 5 + 40 + 0 = 195
effective_tokens = 1.0 × 195 = 195

ET-TV-002 (Three-Node Graph, Mixed Cached/Output Tokens)

Input invocation set:

Root: m=2.0, I=500, C=200, O=120, R=0
Sub-agent A: m=1.0, I=300, C=0, O=90, R=10
Sub-agent B: m=2.0, I=150, C=50, O=80, R=0

Expected ET:

Root:      base = max(500-200,0) + 0.1×200 + 4×120 + 4×0  = 300 + 20 + 480 + 0 = 800;  ET = 2.0×800 = 1600
Sub-agent: base = max(300-0,0)   + 0.1×0   + 4×90  + 4×10 = 300 + 0  + 360 + 40 = 700;  ET = 1.0×700 = 700
Sub-agent: base = max(150-50,0)  + 0.1×50  + 4×80  + 4×0  = 100 + 5  + 320 + 0  = 425;  ET = 2.0×425 = 850
ET_total = 1600 + 700 + 850 = 3150

Model Multiplier Registry

Registry Purpose

The Copilot Multiplier (m) used in the ET formula is a per-model scalar that represents each model’s computational cost relative to the reference model. To ensure reproducibility and transparency, multiplier values MUST be sourced from a disclosed, versioned registry.

Normative Registry Source

The authoritative registry for copilot_multiplier values in this implementation is the file:

pkg/cli/data/model_multipliers.json

This file is embedded at compile time into the gh-aw binary using a Go //go:embed directive in pkg/cli/effective_tokens.go. The registry format is:

{
  "version": "string",
  "description": "string",
  "reference_model": "string",
  "token_class_weights": {
    "input": number,
    "cached_input": number,
    "output": number,
    "reasoning": number,
    "cache_write": number
  },
  "multipliers": {
    "<model-name>": number
  }
}

Registry Requirements

R-REG-001: The registry MUST declare a version field that changes whenever any multiplier value is added, removed, or modified.

R-REG-002: The registry MUST declare a reference_model field identifying the baseline model whose multiplier equals 1.0. All other multipliers are relative to this baseline.

R-REG-003: The registry MUST include token_class_weights for all four standard token classes: input, cached_input, output, and reasoning. A conforming implementation MUST use these weights as the default values for Section 4.2.

R-REG-004: Implementations MUST embed or bundle the registry at build time. Runtime fetching of multiplier values from an external source requires disclosure in reported output.

R-REG-005: When a model name is not present in the registry, implementations MUST treat the multiplier as 1.0 and SHOULD emit a warning noting that the model is unrecognized.

R-REG-006: Custom multipliers supplied by the caller (e.g., via API or configuration) MUST be merged with registry multipliers. Custom values take precedence and MUST be disclosed in any report that uses them.

R-REG-007: The registry MUST NOT contain placeholder values such as TBD, null, or empty strings for any model multiplier entry. Each declared model key MUST map to a numeric multiplier value.

R-REG-008: When adding support for a new model, maintainers MUST register the model in pkg/cli/data/model_multipliers.json with a concrete numeric multiplier before release. If calibration is incomplete, the model MUST be omitted from the registry and the implementation fallback behavior in R-REG-005 applies.

R-REG-009: The registry MUST preserve complete model history. Models that are no longer returned by current provider inventories MUST remain in pkg/cli/data/model_multipliers.json unless maintainers manually delete them in an explicit change.

Registry Versioning

The version field in model_multipliers.json corresponds to the registry schema version, not the gh-aw binary version. Implementations SHOULD include the registry version in all ET summary reports to enable historical reconstruction.

Sync Notes

§4–§8 Implementation File Mapping

The table below maps the normative sections of this specification to the implementation files that realize each requirement. Use this mapping to identify which files must be updated when specification sections change.

Spec Section	Description	Implementation File(s)
§4 Token Accounting Model	Per-invocation ET computation (`base_weighted_tokens`, ET formula)	`pkg/cli/effective_tokens.go` (`populateEffectiveTokens`, `computeBaseWeightedTokens`)
§5 Multi-Invocation Aggregation	`ET_total`, `raw_total_tokens`, `total_invocations`	`pkg/cli/effective_tokens.go` (`AggregateEffectiveTokens`)
§6 Execution Graph Requirements	Node schema, root/sub-agent linkage, graph traversal	`pkg/cli/logs_models.go`, `pkg/cli/logs_episode.go`, `pkg/cli/logs_orchestrator.go`
§7 Reporting	Console and JSON output of ET summaries and per-model breakdowns	`pkg/cli/audit_report.go`, `pkg/cli/audit_report_render_tools.go`, `pkg/cli/audit_diff.go`, `pkg/cli/logs_report.go`
§7.1 OTel Attribute Requirements	OpenTelemetry span attribute emission for ET metrics	`pkg/cli/token_usage.go`, `pkg/cli/logs_run_processor.go`
§8 Implementation Requirements	Completeness, determinism, versioning, partial visibility safeguards	`pkg/cli/effective_tokens.go`, `pkg/cli/forecast_montecarlo.go`

§7.1 OTel Attribute Row-to-Code Mapping

§7.1 Attribute Key	Implementation Mapping
`llm.token.effective_total`	`pkg/cli/token_usage.go` → `TokenUsageSummary.TotalEffectiveTokens`, populated by `populateEffectiveTokensWithCustomWeights`
`llm.token.input`	`pkg/cli/token_usage.go` → `TokenUsageEntry.InputTokens` and `ModelTokenUsage.InputTokens`, aggregated in `parseTokenUsageFile`
`llm.token.output`	`pkg/cli/token_usage.go` → `TokenUsageEntry.OutputTokens` and `ModelTokenUsage.OutputTokens`, aggregated in `parseTokenUsageFile`
`llm.token.cached_input`	`pkg/cli/token_usage.go` → `TokenUsageEntry.CacheReadTokens` and `ModelTokenUsage.CacheReadTokens`, aggregated in `parseTokenUsageFile`
`llm.token.base_weighted`	`pkg/cli/effective_tokens.go` → base token weighting in `computeModelEffectiveTokensWithWeights` (pre-multiplier term)
`llm.model.multiplier`	`pkg/cli/effective_tokens.go` → multiplier resolution in `computeModelEffectiveTokensWithWeights` (`mult` selection by model key/prefix)

§4–§8 Sync Procedure

To keep the specification and implementation synchronized:

When changing the ET formula or token class weights (§4), update pkg/cli/effective_tokens.go and update the Compliance Checklist in §10.2.
When changing aggregation semantics (§5), update pkg/cli/effective_tokens.go and rerun tests T-ET-010–T-ET-012 and T-ET-006.
When changing the execution graph node schema (§6), update pkg/cli/logs_models.go and pkg/cli/logs_episode.go in the same change.
When changing reporting format or field names (§7), update the affected render files in pkg/cli/ and run go test ./pkg/cli/ -run TestAudit.
When changing OTel attribute names (§7.1), update pkg/cli/token_usage.go and verify attribute names with grep -r "effective_tokens" pkg/.
After any §8 change affecting determinism or partial visibility, re-run go test ./pkg/cli/ -run TestEffectiveTokens and go test ./pkg/cli/ -run TestRunMonteCarlo.

Run grep -r "effective_tokens" pkg/ to confirm all implementation files are captured in the table above.

Model Multiplier Registry Sync

The Effective Tokens registry is maintained in pkg/cli/data/model_multipliers.json and loaded by pkg/cli/effective_tokens.go.

To keep specification and implementation synchronized:

Update this specification’s registry requirements when adding, removing, or re-scaling model multipliers.
Update pkg/cli/data/model_multipliers.json in the same change.
Keep historical model entries in the registry by default. Only remove entries via explicit manual deletion when needed (R-REG-009), and update the registry version field on removal.
Verify loading and fallback behavior in pkg/cli/effective_tokens_test.go (TestModelMultipliersJSONEmbedded, TestResolveEffectiveWeightsDefault, and inventory checks).
Run make build so the embedded registry is rebuilt into the gh-aw binary.
Re-run registry validation coverage after any registry edit so malformed multiplier entries fail before ET computation paths are exercised. Use make validate-registry to run the R-REG-007 staleness check (TestModelMultipliersNoPlaceholders) independently of the full test suite.

Conforming releases SHOULD include a test assertion for newly added model multipliers to ensure implementation-registry parity.

R-REG-007 automation (2026-05-26): The TestModelMultipliersNoPlaceholders test in pkg/cli/effective_tokens_test.go and the make validate-registry target provide machine-verified enforcement of R-REG-007. These run as part of CI and reject null, string “TBD”, empty-string-keyed, NaN, or Inf multiplier values before any release.

Norms

This section provides a normative reference table for all R-OTL-*, R-SAFE-*, and R-REG-* identifiers used in this specification. Each table maps an identifier to the section where it is defined and summarizes the MUST/SHALL clause it represents. Use this section as a quick-reference index for verifying implementation compliance, tracking which rules are exercised by a given test, or identifying the definitive section for a given requirement.

OTel Attribute Norms (§7.1)

ID	Section	Normative Requirement
R-OTL-001	§7.1	MUST use `llm.token.effective_total` as the OTel attribute key for ET; MUST NOT use alternative key names
R-OTL-002	§7.1	MUST emit `llm.token.input`, `llm.token.output`, and `llm.token.cached_input` as separate span attributes when per-class counts are available; these MUST reflect raw (unweighted) counts
R-OTL-003	§7.1	MUST emit `llm.token.base_weighted` when base weighted token value is computed
R-OTL-004	§7.1	When `llm.model.multiplier` is emitted, its value MUST match the multiplier used for `llm.token.effective_total`; MUST NOT omit `llm.model.multiplier` if `llm.token.effective_total` is present
R-OTL-005	§7.1	MUST NOT rename or reuse defined OTel attribute keys with different semantics without a specification revision
R-OTL-006	§7.1	MUST reject and MUST NOT export negative values for `llm.token.effective_total`; MUST fail or drop the span with an explicit error

Safeguard Norms (§8.5)

ID	Section	Normative Requirement
R-SAFE-001	§8.5 S-1, S-6a	MUST detect overflow and non-finite arithmetic states (`NaN`, `+Inf`, `-Inf`) before serializing output
R-SAFE-002	§8.5 S-1, S-6b	MUST enforce a maximum ET ceiling of `9007199254740991` (`2^53 − 1`) for serialized numeric fields
R-SAFE-003	§8.5 S-1	When computed ET exceeds the ceiling, MUST clamp `summary.effective_tokens` to the ceiling and MUST emit a warning
R-SAFE-003A	§8.5 S-1, S-6a	When ET capping occurs, MUST record a deterministic overflow condition (`flagged.code = "ET_OVERFLOW"`) including the ceiling value
R-SAFE-004	§8.5 S-1, S-6b	For long multi-agent chains, SHOULD aggregate ET incrementally and SHOULD emit an early warning at 80% of ceiling
R-SAFE-005	§8.5 S-6c	MUST serialize unobservable invocation node token fields as numeric zero; MUST NOT omit fields
R-SAFE-006	§8.5	MUST include a `flagged` object with `code = "UNOBSERVABLE_INVOCATION"` for incomplete/unobservable nodes
R-SAFE-007	§8.5 S-4	MUST validate the active model multiplier registry before ET computation begins
R-SAFE-008	§8.5 S-2, S-4	Every token class weight and model multiplier MUST be finite numeric data; `NaN`, infinite values, strings, `null`, and negative multipliers MUST be rejected
R-SAFE-009	§8.5 S-5	If registry validation fails, MUST NOT continue with partially parsed data; MUST fail deterministically with field-level diagnostics
R-SAFE-010	§8.5 S-2, S-5	Runtime override/custom multiplier merges MUST pass the same validation rules as registry values
R-SAFE-011	§8.5 S-3	Aggregation logic MUST detect parent/child cycles before post-order traversal; MUST fail deterministically and MUST NOT emit partial ET totals

Registry Norms (Model Multiplier Registry)

ID	Section	Normative Requirement
R-REG-001	Registry	MUST declare a `version` field that changes whenever any multiplier value is added, removed, or modified
R-REG-002	Registry	MUST declare a `reference_model` field identifying the baseline model whose multiplier equals 1.0
R-REG-003	Registry	MUST include `token_class_weights` for all four standard token classes; conforming implementations MUST use these as the default weights for §4.2
R-REG-004	Registry	Implementations MUST embed or bundle the registry at build time
R-REG-005	Registry	Unknown model name MUST use multiplier 1.0 and SHOULD emit a warning
R-REG-006	Registry	Custom multipliers MUST be merged with registry values; custom values take precedence and MUST be disclosed in reports
R-REG-007	Registry	MUST NOT contain placeholder values (`TBD`, `null`, empty strings) for any model multiplier entry
R-REG-008	Registry	New model support MUST register the model with a concrete numeric multiplier before release; incomplete calibration MUST result in omission from the registry
R-REG-009	Registry	MUST preserve complete model history; deprecated models MUST remain unless explicitly deleted

References

Normative References

[RFC 2119] Bradner, S., “Key words for use in RFCs to Indicate Requirement Levels”, BCP 14, RFC 2119, March 1997. https://www.ietf.org/rfc/rfc2119.txt

Informative References

[OPENAI-USAGE] OpenAI API Reference — Usage Objects. https://platform.openai.com/docs/api-reference
[ANTHROPIC-USAGE] Anthropic API Reference — Token Usage. https://docs.anthropic.com/en/api/getting-started

Change Log

Version 0.4.0 (Deprecated)

Status: Marked as Deprecated. Effective Tokens (ET) is superseded by AI Credits (AIC) as the preferred cost metric for agentic workflows. 1 AIC = $0.01 USD. AIC is computed directly from provider pricing and is exposed in gh aw logs, gh aw audit, and workflow footers.
Added: Deprecation notice banner pointing to Cost Management documentation.

Version 0.3.0 (Draft)

Added: Model Multiplier Registry section with normative requirements R-REG-001 through R-REG-009
Updated: R-REG-009 to require complete model history retention and explicit manual deletion instead of deprecated-model lifecycle markers
Added: Compliance test skeleton file pkg/cli/effective_tokens_compliance_test.go with Go test stubs for T-ET-001..T-ET-031
Added: T-ET-032 requirement for deterministic post-order aggregation in deep (3+ level) partially observed execution graphs
Updated: Compliance checklist §10.2 status column from “Required” to “Implemented” for all test IDs T-ET-001–T-ET-031 (all tests now implemented and passing)
Added: Norms section with reference tables for R-OTL-001–R-OTL-006 (§7.1), R-SAFE-001–R-SAFE-011 (§8.5), and R-REG-001–R-REG-009 (Model Multiplier Registry).
Updated: Table of Contents to include the new Norms section.
Audit (Appendix C — Security): Verified Appendix C requirements against pkg/cli/effective_tokens.go and pkg/cli/data/model_multipliers.json. Findings:
- Sensitive usage patterns (Appendix C §1): Per-invocation token data is not exposed directly by the CLI; only aggregate TotalEffectiveTokens is surfaced in the audit output. Access control is delegated to GitHub repository permissions. No gaps found.
- Aggregate vs. detailed data separation (Appendix C §2): The TokenUsageSummary.ByModel map contains per-model breakdowns but is only logged at DEBUG level, not included in default CLI output. No gaps found.
- Registry exposure: The embedded model_multipliers.json contains only multiplier coefficients, not secrets or PII. No gaps found.
- Follow-up: The spec does not address token data leakage via OTEL attributes. This is tracked as a separate concern (see §7.3 of the Experiments Specification for precedent).

Version 0.2.0 (Draft)

Adopted W3C-style specification format
Added conformance levels (Basic, Standard, Complete)
Added compliance testing section with test IDs
Added Appendix C: Security Considerations
Clarified partial visibility requirements

Version 0.1.0 (Draft)

Initial definition of Effective Tokens metric
Defined four token classes and default weights
Defined per-invocation and multi-invocation formulas
Defined execution graph node schema

Effective Tokens Specification

Effective Tokens Specification

Abstract

Status of This Document

Table of Contents

1. Introduction

1.1 Purpose

1.2 Scope

1.3 Design Goals

2. Conformance

2.1 Conformance Classes

2.2 Requirements Notation

2.3 Compliance Levels

3. Terminology

3.1 Token Classes

3.2 Model Multiplier

3.3 Invocation

3.4 Sub-Agent

3.5 Execution Graph

3.6 Execution-Graph Traversal Entities

4. Token Accounting Model

4.1 Raw Token Count

4.2 Token Class Weights

4.3 Base Weighted Tokens

4.4 Effective Tokens Per Invocation

5. Multi-Invocation Aggregation

5.1 Total Effective Tokens

5.2 Total Raw Tokens

5.3 Invocation Count

6. Execution Graph Requirements

6.1 Node Schema

6.2 Root Invocation

6.3 Sub-Agent Invocations

7. Reporting

7.1 OpenTelemetry Attribute Requirements

8. Implementation Requirements

8.1 Completeness

8.2 Determinism

8.3 Versioning

8.4 Partial Visibility

8.5 Safeguards

S-1: Overflow and Capping

S-2: Negative/Zero Multiplier Rejection

S-3: Graph Cycle Detection

S-4: Non-Finite Numeric Rejection

S-5: Registry Validation Failure Handling

S-6: Output NaN/Inf Guards, Large-Count Overflow, and Zero-Invocation Edge Case

9. Extensibility

10. Compliance Testing

10.1 Test Suite Requirements

10.1.1 Token Accounting Tests

10.1.2 Aggregation Tests

10.1.3 Aggregation with Zero-ET Leaf Nodes

10.1.4 Execution Graph Tests

10.1.5 Reporting Tests

10.2 Compliance Checklist

10.2.1 Compliance Test Count Summary

Appendices

Appendix A: Worked Example

A.1 Scenario

A.2 Input Data

A.3 Computation

A.4 Output

A.5 Input vs Cached Conformance Test Vectors

A.6 Partial Observability Examples

Appendix B: Core Formula Reference

Appendix C: Security Considerations

Appendix D: ET Test Vectors

ET-TV-001 (Single Invocation Baseline)

ET-TV-002 (Three-Node Graph, Mixed Cached/Output Tokens)

Model Multiplier Registry

Registry Purpose

Normative Registry Source

Registry Requirements

Registry Versioning

Sync Notes

§4–§8 Implementation File Mapping

§7.1 OTel Attribute Row-to-Code Mapping

§4–§8 Sync Procedure

Model Multiplier Registry Sync