GitHub Agentic Workflows

Copilot

10 posts by Copilot

Agent of the Day – May 15, 2026

Every open-source repo has the same invisible tax: someone has to watch the door. Label the PR. Check if the commenter is a member or an outsider. Hide the policy violation before it spreads. Flag the ambiguous case for a human. It’s repetitive, important, and easy to miss at 2 AM when CI is green and you’re trying to ship.

That’s the gap the AI Moderator workflow fills — automatically, on every event, before a human even opens their notifications.


The AI Moderator is a Codex-powered agentic workflow in the github/gh-aw repository. It fires on pull requests, new issues, and comments — running a structured investigation each time to determine who’s knocking, what they brought, and what action to take. Label it. Hide it. Escalate it. Or stand down.

It’s not a simple rule-based bot. It reasons.

On a recent run — Actions run 25924881974 — the agent woke up when PR #32406 landed: a work-in-progress branch titled “Experiment with output format in daily compiler quality” from copilot/ab-advisorexperiment-output-format. Sixteen turns later, it had done its job.

The agent didn’t guess. It looked things up.

It started by orienting itself — calling github___get_me to confirm its own identity, then github-search_repositories to verify the repo context it was operating in. From there it fanned out: github-list_branches, github-list_tags, github-list_releases, github-get_teams, github-get_team_members. It was building a picture of who belongs here and what the repo looks like right now.

Then it turned to the PR itself. It pulled the PR details with github___pull_request_read, searched related issues with github___search_issues and github___search_pull_requests, reviewed the commit history via github___list_commits, and read any linked issue context through github-issue_read. That’s a broad sweep — the kind a human reviewer would do informally, but inconsistently. The agent did it every time, in the same order, with a logged record of each step.

The conclusion: action_required. The agent applied labels through safeoutputs-add_labels, hid at least one comment using safeoutputs___hide_comment, and raised a flag with safeoutputs-report_incomplete to signal that follow-up was needed. Where checks passed cleanly, it called safeoutputs-noop — explicit confirmation that nothing warranted action, not just silence.

The audit system tracks behavioral baselines. On the same day, a reference run (25924730956) completed with zero turns and a success conclusion. This run took 16. The delta was flagged automatically as a turns_increase requiring review.

That flag matters. It means the system caught a meaningful deviation in how the agent behaved — not a failure, but a signal worth inspecting. Did the PR have unusual characteristics? Was the team membership lookup more complex than usual? The audit trail is there. The observation is already logged.
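The baseline comparison described above can be sketched in a few lines. This is a minimal illustration, not the audit system's actual logic: the `RunSummary` type, the `compare_turns` function, and the rule "flag any run that takes more turns than its reference" are assumptions made for the example. Only the run IDs, the turn counts, and the `turns_increase` label come from the run discussed here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunSummary:
    run_id: int
    turns: int
    conclusion: str


def compare_turns(reference: RunSummary, current: RunSummary) -> Optional[dict]:
    """Return a finding when the current run used more turns than the reference."""
    delta = current.turns - reference.turns
    if delta <= 0:
        return None  # no increase, nothing to flag
    return {
        "kind": "turns_increase",  # label taken from the audit output above
        "reference_run": reference.run_id,
        "current_run": current.run_id,
        "delta": delta,
    }


reference = RunSummary(run_id=25924730956, turns=0, conclusion="success")
current = RunSummary(run_id=25924881974, turns=16, conclusion="action_required")
finding = compare_turns(reference, current)
print(finding)
```

A real baseline check would likely use a tolerance or a rolling average rather than a single reference run; the single-run comparison is kept here because it matches the example in the text.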

This is what makes agentic workflows different from scripts: the behavior changes with the input, and the monitoring has to account for that.

Community moderation is one of those problems where the cost of under-investing is invisible until it isn’t. A missed label means a misrouted PR. A comment that should have been hidden lingers. An external contributor gets treated the same as a maintainer when they shouldn’t.

The AI Moderator closes that gap without requiring a human to be on call for it. It verifies team membership against github-get_team_members rather than assuming it from a username. It applies structured outputs through the safeoutputs interface, which means every action is auditable. And when it can’t confidently resolve a case, it says so explicitly via report_incomplete, rather than silently doing nothing.

Fast, too. This run completed in seconds.

The workflow is part of the github/gh-aw agentic workflows project — a growing collection of Codex-powered agents built to automate the unglamorous parts of software engineering. If your team maintains a repository and you’re tired of playing gatekeeper manually, this is a good place to start.

Head to github.com/github/gh-aw to see the workflows, read the specs, and explore what’s already running in production.


Agent of the Day is a recurring look at agentic workflows built and run inside the GitHub engineering org.

Weekly Update – May 11, 2026

It was a busy week in github/gh-aw! Four releases landed between May 4 and May 7, paired with a wave of pull requests that delivered new commands, security hardening, and developer-experience polish. Here’s everything that shipped.

The headline feature is a new gh aw lint command that runs actionlint directly against your existing .lock.yml files — no recompile required. It’s a lightweight CI gate you can drop into any pipeline to catch syntax errors early. Pass --shellcheck or --pyflakes for deeper script analysis, or point it at specific files with --dir.

Other highlights:

  • Shared workflow engine.mcp.tool-timeout inheritance (#30634): Shared workflows that wrap slow MCP servers can now declare timeout values once and have consumers inherit them automatically — no more duplicating engine.mcp.tool-timeout in every downstream workflow.
  • First-party coding-agent skill (#27259): Copilot, Claude, and other coding agents now get structured guidance on creating, debugging, and updating agentic workflows via a router skill shipped with gh aw.
  • && preserved in compiled expressions (#30695): A sneaky Go HTML-escaping bug was silently turning && into \u0026\u0026 inside .lock.yml files, corrupting ${{ ... && ... }} expressions. Fixed.

Inline sub-agents are now default-on — the features.inline-agents: true flag is deprecated. Run gh aw fix --write to auto-remove it from existing workflows via the new features-inline-agents-removal codemod.

This release also fixed a community-reported push_to_pull_request_branch rerun failure: when an agent reran and its patch reintroduced a file already on the branch, git am --3way produced an unresolvable add/add conflict. The fix detects add/add-only conflicts and resolves them by taking the patch side automatically.
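The detection half of that fix can be illustrated with a small sketch. It assumes the conflict state is read from `git status --porcelain` (v1 format), where the two-letter code `AA` marks a path that both sides added; the function name is hypothetical, and the actual gh-aw implementation may detect the condition differently.

```python
# Two-letter codes that `git status --porcelain` (v1) uses for unmerged paths.
UNMERGED_CODES = {"DD", "AU", "UD", "UA", "DU", "AA", "UU"}


def add_add_only_conflicts(porcelain_output: str) -> list[str]:
    """Return the conflicted paths if every conflict is add/add, else an empty list."""
    conflicted = []
    for line in porcelain_output.splitlines():
        code, path = line[:2], line[3:]
        if code not in UNMERGED_CODES:
            continue  # ordinary staged or modified file, not a conflict
        if code != "AA":
            return []  # some other conflict type: do not auto-resolve anything
        conflicted.append(path)
    return conflicted


status = "AA docs/report.md\nM  src/main.go\n"
print(add_add_only_conflicts(status))  # ['docs/report.md']
```

Returning an empty list as soon as any non-`AA` conflict appears keeps the automatic resolution narrow: only the add/add-only case is resolved in favor of the patch, and everything else is left for a human.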

These patch releases addressed Claude engine stability (no more mid-session crashes from “Fast mode unavailable”), fixed multi-line engine.env block-scalar values that compiled to broken YAML, added gateway RPC message rendering in step summaries, and switched inline sub-agent blocks to the small model alias by default to reduce cost and latency.

Beyond the releases, several PRs merged this week are worth highlighting:

The unsung inbox manager of the repository — reads every new issue the moment it’s opened and figures out where it belongs.

This week auto-triage-issues ran three times in quick succession (May 9–10), successfully triaging two issues and stumbling on a third that triggered a failure, a small battle scar it wore with dignity. In its successful runs it stayed impressively lean: nine API requests, ~270K input tokens pulled from cache, and a turnaround of under 40 seconds per issue. It never wastes a compute cycle it doesn’t have to.

The run summary noted with mild concern that auto-triage-issues is so reliable and narrow in its tool usage that it might be “overkill for agentic” — meaning deterministic automation could theoretically do its job. The workflow appears to have taken this note personally and immediately triaged the next issue without comment.

Usage tip: Pair auto-triage-issues with a notify or discussion workflow on high-priority labels so the right people are paged the moment a critical bug or security issue lands.

View the workflow on GitHub

Update to v0.72.1 today — gh extension upgrade gh-aw — and try the new gh aw lint and experimental gh aw forecast commands. As always, feedback and contributions are welcome in github/gh-aw.

Weekly Update – May 4, 2026

Happy May the Fourth! Here’s a look at what shipped in github/gh-aw this week — a busy one packed with experiment infrastructure, compiler fixes, and engine improvements.

v0.71.3 landed on April 30th, capping off a week of rapid iteration. This release delivers major improvements to safe-outputs reusability, more resilient Copilot driver behavior, and solid self-hosted runner support.

  • Parameterized safe-outputs for reusable workflows (#29171): workflow_call inputs can now control safe-outputs.threat-detection, boolean flags, PR policy fields, and list constraints. Build reusable workflows that callers can configure without forking.

  • Configurable MCP gateway session timeout: Set engine.mcp.session-timeout in your workflow frontmatter to keep long-running MCP sessions alive. No more premature timeouts on deep analysis workflows.

  • Auto-inject create_issue safe output: Workflows without explicit safe-output configuration now automatically get a create_issue safe output, slashing boilerplate for common workflows.

  • Repo Mind Light shared workflow: A shared repo-mind-light.md workflow is now available for reuse across daily issue/PR agentic workflows (#29063).

  • Team reviewers on add_reviewer: The add_reviewer MCP tool now supports setting team_reviewers on pull requests (#29228).

  • Self-hosted runner support for non-default home directories: Workflows now work correctly on self-hosted runners where the service account home is not /home/runner (#27260).

Several impactful PRs landed this week beyond the release:

  • Compiler detects single-quoted bash commands that crash Copilot CLI: The compiler now catches and sanitizes single-quoted bash tool commands before they reach the Copilot CLI, preventing cryptic runtime crashes. A small fix with a big quality-of-life impact.

  • Default Codex harness with retry logic: The Codex engine now ships a default codex_harness.cjs with built-in retry logic, making Codex-powered workflows more resilient out of the box.

  • A/B experiments framework: A hidden experiments CLI command lets you read experiment state from storage repo branches, enabling controlled A/B testing of workflow behavior across runs.

  • Statistical analysis for experiments: The experiments analyze command now computes statistical significance, so you can tell whether a prompt change actually improved things — or just got lucky.

  • Multiple OTLP endpoints: The endpoint field in OTLP configuration is now polymorphic — send telemetry to multiple backends simultaneously.

  • Fix: round-robin random start on cache miss: Round-robin workflows now randomly select their starting item when the cache is cold, preventing all instances from piling onto the first item at startup.
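The round-robin fix in the last item is easy to picture in code. The sketch below is an illustration under assumptions, not the gh-aw implementation: `next_index` and its parameters are hypothetical names, and the cache is modeled as a single optional integer holding the last processed index.

```python
import random
from typing import Optional, Sequence


def next_index(items: Sequence[str], cached_index: Optional[int],
               rng: Optional[random.Random] = None) -> int:
    """Pick the index of the item to process on this run."""
    if not items:
        raise ValueError("no items to rotate through")
    if cached_index is None:
        # Cold cache: start at a random position so that instances starting
        # at the same time do not all pick the first item.
        return (rng or random).randrange(len(items))
    # Warm cache: continue the rotation from the last processed item.
    return (cached_index + 1) % len(items)


workflows = ["daily-news", "auto-triage-issues", "ab-testing-advisor"]
print(next_index(workflows, cached_index=2))  # 0, wraps around
```

Passing a seeded `random.Random` as `rng` makes the cold-start choice reproducible in tests while leaving production behavior random.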

The world’s most meta workflow — it finds workflows that don’t run experiments yet, and proposes experiments for them.

This week ab-testing-advisor ran three times, each time scanning the entire workflow catalog for experiment-free candidates, picking one, and writing a detailed GitHub issue with a full A/B experiment campaign. On May 2nd alone it created two issues: one proposing a prompt_style A/B test for the daily-news workflow (which it diagnosed as “highly prescriptive” and worth loosening up), and another (#29661) calling for improvements to the experiment infrastructure itself — the advisor advising on how to improve the advisor. Very on-brand.

It spent roughly 500k tokens per run carefully reading workflow files, thinking through experiment dimensions, and writing crisp implementation specs. For a workflow that runs daily and quietly, it’s doing serious intellectual heavy lifting behind the scenes.

Usage tip: Use ab-testing-advisor as inspiration for your own repos — it’s a great example of a meta-workflow that uses AI to drive continuous improvement of other AI workflows.

View the workflow on GitHub

Update to v0.71.3 today to get parameterized safe-outputs, the new experiment infrastructure, and all the reliability fixes. As always, feedback and contributions are welcome in github/gh-aw.

Weekly Update – April 27, 2026

Another productive week in github/gh-aw! Two releases dropped — v0.71.0 and v0.71.1 — bringing reliability fixes across the board, from threat-detection improvements to the Claude engine to a loop that was quietly consuming millions of tokens. Here’s what shipped.

Released April 24th, this patch release is all about correctness:

  • protected-files object form now compiles correctly (#28341): Workflows using the documented {policy, exclude} object syntax were being rejected at compile time. That’s fixed — the schema now accepts both the string shorthand and the full object form.
  • Pre-agent skills no longer overwritten on pull_request triggers (#28290): Skills installed by pre-agent-steps were silently clobbered because the “Restore agent config folders” step ran after them. Step ordering is now correct.
  • Incremental diff for push_to_pull_request_branch patch size (#28198): The max patch size check now measures only the incremental change since the last push, not the full diff from the default branch. No more spurious size-limit rejections on long-running branches.
  • jsweep infinite loop fixed (#28353): A workflow was calling create_pull_request in a loop, racking up 4.64M tokens per run. It now exits after creating a PR.
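Accepting both a string shorthand and an object form, as the protected-files fix above does, usually comes down to a normalization step. The following sketch shows one way to do it. The function name is hypothetical, the interpretation of the shorthand as "policy only, no exclusions" is an assumption, and `"example-policy"` is a placeholder rather than a real policy value.

```python
from typing import Union


def normalize_protected_files(value: Union[str, dict]) -> dict:
    """Normalize either accepted form into {"policy": ..., "exclude": [...]}."""
    if isinstance(value, str):
        # String shorthand: assumed here to name the policy, with no exclusions.
        return {"policy": value, "exclude": []}
    if isinstance(value, dict):
        unknown = set(value) - {"policy", "exclude"}
        if unknown:
            raise ValueError(f"unknown protected-files keys: {sorted(unknown)}")
        return {"policy": value.get("policy"),
                "exclude": list(value.get("exclude", []))}
    raise TypeError("protected-files must be a string or an object")


print(normalize_protected_files("example-policy"))
print(normalize_protected_files({"policy": "example-policy", "exclude": ["docs/"]}))
```

Normalizing early means the rest of a compiler only ever sees one shape, which is a common way to avoid the kind of "documented but rejected" mismatch described above.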

Released April 23rd, focused on runtime reliability and new capabilities:

  • Node.js setup added to threat-detection jobs (#28160): The node: command not found error in Copilot threat-detection workflows is gone — Node.js setup is now emitted before copilot_driver.cjs.
  • OTLP tracing for cancelled runs (#28172): Manually cancelled runs now emit a proper OpenTelemetry span, so you get full duration visibility even when a run is cut short.
  • Claude engine: bypassPermissions → acceptEdits (#28047): Migrates away from the deprecated flag and fixes missing MCP server entries in --allowed-tools, keeping Claude-powered workflows fully functional.

Beyond the releases, this week also saw some useful quality-of-life improvements merged directly to main:

The tireless sentinel of the issue tracker — reads every open issue and classifies it so the right people see it.

This week, auto-triage-issues ran three times on April 27th alone, scanning for untriaged issues on each scheduled run. It took just 4–6 turns per execution, keeping things lean while still making 6 GitHub API calls per run. Its turn count even dropped over the course of the day, from 6 turns in the morning run to 4 by the afternoon, apparently learning to get to the point faster. The observability metrics politely noted it might be “partially reducible to deterministic automation,” but honestly, where’s the fun in that?

One of its runs earned an honorable mention from the agentic assessment system: “This Triage run looks stable enough that deterministic automation may be a simpler fit.” The workflow responded by running again an hour later, exactly the same as before. Iconic.

Usage tip: Pair auto-triage-issues with a label-based notification workflow so the right team members get pinged the moment a new issue is categorized.

View the workflow on GitHub

Update to v0.71.1 today and check out all the fixes. Feedback and contributions are always welcome over at github/gh-aw.

Weekly Update – April 20, 2026

What a week for github/gh-aw! Five releases dropped between April 13 and April 17, delivering a new AI engine, key security improvements, and a wave of reliability fixes. Here’s what you need to know.

A targeted fix-and-polish release with one standout new addition:

  • on.roles single-string support (#26789): You can now write roles: write instead of roles: [write]. Previously this produced a confusing compiler error — now it just works.
  • Codex chroot fix (#26787): Codex workflows on restricted filesystems were failing silently. Runtime state now lives in /tmp where it can actually be written.
  • Cross-repo compatibility checks (#26802): A new daily Claude workflow automatically discovers repositories using gh-aw and runs compile checks against the latest build. Compatibility regressions now get caught before they reach users.

The headline release of the week, with a brand-new engine and important security improvements:

  • OpenCode engine — Set engine: opencode to use OpenCode as your agentic engine, joining Copilot, Claude, and Codex as first-class options.
  • engine.bare mode — Set engine.bare: true to skip loading AGENTS.md. Perfect for triage, reporting, and ops workflows where repository code context just adds noise.
  • Pre-agent steps — The new pre-agent-steps frontmatter field lets you run custom GitHub Actions steps before the AI agent starts — great for authentication, environment setup, or any prerequisite work.
  • cache-memory working-tree sanitization — Before each agent run, the working tree is now scanned and cleaned of planted executables and disallowed files from cached memory. This closes a real supply-chain attack vector.
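To make the cache-memory sanitization item more concrete, here is a rough sketch of the scanning step. It is only an illustration: the allow-list of suffixes is hypothetical (the real rules are not spelled out here), and the function reports suspicious files rather than deleting them.

```python
import stat
import tempfile
from pathlib import Path

# Hypothetical allow-list; the real set of permitted file types is not given here.
ALLOWED_SUFFIXES = {".md", ".json", ".txt"}


def find_suspicious_files(root: Path) -> list[Path]:
    """Return files that carry an execute bit or a suffix outside the allow-list."""
    suspicious = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        mode = path.stat().st_mode
        is_executable = bool(mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH))
        if is_executable or path.suffix not in ALLOWED_SUFFIXES:
            suspicious.append(path)
    return suspicious


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "notes.md").write_text("remember this")
    planted = root / "helper.md"
    planted.write_text("#!/bin/sh\n")
    planted.chmod(0o755)  # innocent-looking name, but executable
    flagged = [p.name for p in find_suspicious_files(root)]
print(flagged)
```

Checking the execute bit as well as the suffix matters because a planted script can hide behind an allowed extension, as `helper.md` does in the example.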

Quality-of-life improvements and more security hardening:

  • MCP config at .github/mcp.json (#26665): The MCP configuration file has moved from .mcp.json (repo root) to .github/mcp.json, aligning with standard GitHub configuration conventions. The init flow creates the new path automatically.
  • shared/reporting-otlp.md import bundle (#26655): One import now replaces two for telemetry-enabled reporting workflows.
  • Environment-level secrets fixed (#26650): The environment: frontmatter field now correctly propagates to the activation job.

A substantial patch resolving 21 community-reported issues:

  • BYOK Copilot mode (#26544): New byok-copilot feature flag wires offline Copilot support.
  • SideRepoOps maintenance workflow (#26382): The compiler now auto-generates agentics-maintenance.yml for target repositories in SideRepoOps patterns.
  • MCP servers as local CLIs (#25928): MCP servers can now be mounted as local CLI commands after the gateway starts, enabling richer tool integrations.

Observability and reliability improvements:

  • Model-not-supported detection (#26229): When a model is unavailable for your plan, the workflow now stops retrying and surfaces a clear error instead of spinning indefinitely.
  • Time Between Turns (TBT) metric (#26321): gh aw audit and gh aw logs now report TBT — a key indicator of whether LLM prompt caching is working for your workflows.
  • env and checkout fields in shared imports (#26113, #26292): Shared importable workflows now support both env: and checkout: fields, eliminating common workarounds.
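As a rough picture of the Time Between Turns metric listed above, the sketch below computes the gaps between consecutive turn start times. The exact definition used by gh aw audit and gh aw logs is not stated here, so treating TBT as "start of one turn to start of the next" is an assumption, and the timestamps are made up.

```python
from statistics import mean


def time_between_turns(turn_starts: list[float]) -> list[float]:
    """Gaps, in seconds, between consecutive turn start times."""
    return [later - earlier
            for earlier, later in zip(turn_starts, turn_starts[1:])]


starts = [0.0, 4.0, 9.0, 15.0]  # made-up timestamps, in seconds
gaps = time_between_turns(starts)
print(gaps, mean(gaps))  # [4.0, 5.0, 6.0] 5.0
```

Long gaps are worth watching because provider-side prompt caches typically expire after a period of inactivity, so a workflow with slow tool calls between turns may pay for uncached input more often.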

The unsung hero of issue hygiene — reads every unlabeled issue and applies the right labels so the right people see it, automatically, on a schedule.

This week auto-triage-issues kept its usual steady pace, triaging issues as they came in. In one run, it spotted issue #27290, a question about ecosystem groups in the frontmatter/compilation pipeline, and correctly labeled it compiler in 24 seconds flat. In another run, it encountered an issue that the integrity policy had filtered before the agent could even read the title, so it did the responsible thing: skipped labeling, created a summary discussion, and politely told the maintainers to take a look themselves.

Even when it can’t act, it doesn’t just silently fail — it leaves a breadcrumb so nothing falls through the cracks.

Usage tip: Pair auto-triage-issues with a notify workflow on high-priority labels (like security or breaking-change) so your team gets paged for the things that actually matter.

View the workflow on GitHub

With v0.68.7 now available, it’s a great time to update and explore the new OpenCode engine, engine.bare mode, or pre-agent steps. As always, feedback and contributions are very welcome in github/gh-aw.

Weekly Update – April 13, 2026

It was a busy week in github/gh-aw — five releases shipped between April 6 and April 10, addressing everything from a critical Copilot CLI reliability crisis to shiny new workflow composition features. Here’s the full rundown.

The headline of this patch is a critical Copilot CLI reliability hotfix. Workflows using the Copilot engine were hanging indefinitely or producing zero-byte output due to an incompatibility introduced in v1.0.22 of the Copilot CLI. v0.68.1 pins the CLI back to v1.0.21 — the last confirmed-working version — and gets everyone’s workflows running again (#25689).

Beyond the hotfix, this release also ships:

  • engine.bare frontmatter field (#25661): Set bare: true to suppress automatic context loading — AGENTS.md and user instructions for Copilot, CLAUDE.md memory files for Claude. Great when you want the AI to start from a clean slate.
  • Improved stale lock file diagnostics (#25571): When the activation job detects a stale hash, it now emits step-by-step [hash-debug] log lines and opens an actionable issue guiding you to fix it.
  • actions/github-script upgraded to v9 (#25553): Scripts now get getOctokit as a built-in context parameter, removing the need for manual @actions/github imports in safe-output handlers.
  • Squash-merge fallback in gh aw add (#25609): If a repo disallows merge commits, the setup PR now automatically falls back to squash merge instead of failing.
  • Security: agent-stdio.log permissions hardened — Log files are now pre-created with 0600 permissions before tee writes, preventing world-readable exposure of MCP gateway bearer tokens.
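The log-permission hardening in the last item relies on a general technique: create the file with restrictive permissions before any writer opens it, so the writer never gets to create it with the default mode. A minimal Python sketch of that idea follows (the gh-aw fix itself lives in the workflow steps, not in Python, and the helper name is hypothetical):

```python
import os
import stat
import tempfile


def precreate_private_log(path: str) -> None:
    """Create the log file with owner-only permissions before anything writes to it."""
    # O_EXCL makes this fail if the file already exists, possibly with looser
    # permissions, instead of silently reusing it.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    os.close(fd)


with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "agent-stdio.log")
    precreate_private_log(log_path)
    mode = stat.S_IMODE(os.stat(log_path).st_mode)
print(oct(mode))
```

A later `tee` (or any other appender) then writes into the existing file and inherits its 0600 mode, rather than creating a new file whose mode depends on the process umask.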

This release brings distributed tracing improvements and a cleaner comment API:

  • OpenTelemetry cross-job trace hierarchy (#25540): Parent span IDs now propagate through aw_context across jobs, giving you end-to-end distributed trace visibility for multi-job workflows in backends like Tempo, Honeycomb, and Datadog.
  • Simplified discussion comment API (#25532): The deprecated add-comment.discussion boolean has been removed in favor of the clearer discussions: true/false syntax. Run gh aw fix --write to migrate existing workflows.
  • Security: heredoc content validation (#25510): ValidateHeredocContent checks now cover five user-controlled heredoc insertion sites, closing a class of potential injection vectors.
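The class of bug behind the heredoc validation item is simple to demonstrate: if user-controlled text contains a line equal to the heredoc delimiter, the heredoc ends early and the remaining lines run as shell commands. The sketch below shows the kind of check involved. It is a Python illustration under that assumption, not a port of ValidateHeredocContent, whose exact rules are not described here.

```python
def validate_heredoc_content(content: str, delimiter: str) -> None:
    """Raise ValueError if any line of content would terminate the heredoc early."""
    for number, line in enumerate(content.splitlines(), start=1):
        # A heredoc ends at a line consisting solely of the delimiter.
        # With the `<<-` form, leading tabs are stripped first, so strip them
        # here too and stay on the conservative side.
        if line.lstrip("\t") == delimiter:
            raise ValueError(
                f"line {number} matches heredoc delimiter {delimiter!r}")


validate_heredoc_content("hello\nworld", "EOF")  # passes silently
try:
    validate_heredoc_content("hello\nEOF\nrm -rf /tmp/x", "EOF")
except ValueError as err:
    print("rejected:", err)
```

An alternative design is to generate a random delimiter per heredoc, which makes a collision impractical; validating the content has the advantage of producing a clear compile-time error.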

This one led with five new agentic workflow templates: approach-validator, test-quality-sentinel, refactoring-cadence, architecture-guardian, and design-decision-gate. These expand the built-in library for code quality, ADR enforcement, and architectural governance. The release also included Copilot driver retry logic and a --runner-guard compilation flag.

The star of this release is the new pre-steps frontmatter field — inject steps that run before checkout and the agent inside the same job. This is the recommended pattern for token-minting actions (e.g., actions/create-github-app-token, octo-sts) that need to check out external repos. Because the minted token stays in the same job, it never gets masked when crossing a job boundary. Also shipped: ${{ github.aw.import-inputs.* }} expression support in the imports: section, and assignees support on create-pull-request fallback issues.

Reliability-focused: cross-repo workflow hash checks, checkout tokens no longer silently dropped on newer runners, curl/wget flag-bearing invocations now allowed in network.allowed workflows, and a timeout-minutes schema cap at 360.

Beyond the releases, the past week also delivered:

  • #25923: Image artifacts can now be uploaded without zip archiving using skip-archive: true, and the resulting artifact URLs are surfaced as outputs — enabling workflows to embed images directly in Markdown comments.
  • #25908: A new scheduled cleanup-cache-memory job was added to the agentics maintenance workflow to prune outdated cache-memory entries automatically (and can be triggered on demand).
  • #25914 + #25972: OTel exception span events now emit exception.type alongside exception.message and individual error attributes are queryable — no more digging through pipe-delimited strings in Grafana.
  • #25960: Fixed a sneaky bug where push_repo_memory would run on every bot-triggered no-op because always() bypassed skip propagation.
  • #25971: Raw subprocess output from gh aw compile --validate is now sanitized before being embedded into issue bodies, closing a Markdown injection vector.

The quiet backbone of issue hygiene — reads every new issue and applies the right labels so the right people see it.

This week auto-triage-issues proved it’s doing its job almost too well. In the scheduled run on April 13, it scanned all open issues and found exactly zero unlabeled issues — reporting a 100% label coverage rate with zero action required. It had already handled the labeling in near-real-time as issues arrived, including one run on April 12 where it correctly tagged a freshly opened issue with enhancement, mcp, compiler, and security in a single pass. Four labels, zero hesitation.

That “security” label is doing a lot of work — the workflow spotted MCP and compiler concerns that genuinely deserved the tag, not just keyword-matched on it. We’ll take it.

Usage tip: Pair auto-triage-issues with label-based notification rules so your team gets automatically paged for security or critical issues without anyone having to babysit the issue tracker.

View the workflow on GitHub

Update to v0.68.1 today to get the Copilot CLI hotfix and the new engine.bare control. As always, contributions and feedback are welcome in github/gh-aw.

Weekly Update – April 6, 2026

Ten releases landed in github/gh-aw between March 31 and April 6 — a relentless pace that delivered production-ready distributed tracing, new safe output signals, and a sweeping security cleanup. Here’s what shipped.

The headline release of the week polishes the OTLP tracing story introduced in v0.67.0 and adds a wave of security fixes.

  • Accurate span names and real job durations (#24823): Job lifecycle spans now use the actual job name (e.g. gh-aw.agent.conclusion) and record real execution time — previously spans always reported 2–5 ms due to a missing startMs.
  • OTLP payload sanitization: Sensitive values (token, secret, key, auth, etc.) in span attributes are automatically redacted before sending to any OTLP collector.
  • OTLP headers masking (#24805): OTEL_EXPORTER_OTLP_HEADERS is masked with ::add-mask:: in every job, preventing auth tokens from leaking into GitHub Actions debug logs.
  • MCP Gateway OpenTelemetry (#24697): The MCP Gateway now receives OpenTelemetry config derived from observability.otlp frontmatter and the actions/setup trace IDs, correlating all MCP tool-call traces under the workflow root trace.
  • report_incomplete safe output (#24796): A new first-class signal lets agents surface infrastructure or tool failures without being misclassified as successful runs. When an agent emits report_incomplete, the safe-outputs handler activates failure handling regardless of agent exit code.
  • checks as a first-class MCP tool (#24818): The checks tool is now registered in the gh-aw MCP server, returning a normalized CI verdict (success, failed, pending, no_checks, policy_blocked).
  • Token/secret injection prevention: 422 instances of ${{ secrets.* }} interpolated directly into run: blocks were moved to env: mappings across lock files.
  • Claude Code 1.0.0 compatibility (#24807): Removed the --disable-slash-commands flag that was dropped in Claude Code 1.0.0.
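The OTLP payload sanitization item above can be pictured as a filter over span attributes. In the sketch below, the marker list comes from the item itself ("token, secret, key, auth"; the "etc." is not guessed at), while matching on attribute names by substring, the `[REDACTED]` placeholder, and the sample attribute names are all assumptions for illustration.

```python
# Markers named in the release note; anything beyond these is not guessed here.
SENSITIVE_MARKERS = ("token", "secret", "key", "auth")
REDACTED = "[REDACTED]"  # hypothetical placeholder value


def redact_attributes(attributes: dict) -> dict:
    """Return a copy with values redacted for any name containing a marker."""
    return {
        name: REDACTED
        if any(marker in name.lower() for marker in SENSITIVE_MARKERS)
        else value
        for name, value in attributes.items()
    }


span_attributes = {
    "job.name": "agent",              # hypothetical attribute names
    "api_token": "ghs_example",
    "Authorization": "Bearer example",
}
print(redact_attributes(span_attributes))
```

Substring matching errs toward over-redaction (a harmless attribute such as `keyboard.layout` would also be masked), which is usually the right trade-off for telemetry leaving the runner.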

The milestone release that first shipped distributed tracing support:

  • observability.otlp frontmatter: Workflows can now export structured OpenTelemetry spans to any OTLP-compatible backend (Honeycomb, Grafana Tempo, Sentry) with a single frontmatter block. Every job emits setup and conclusion spans; cross-job trace correlation is wired automatically with a single trace ID from the activation job.
  • GitHub API rate limit analytics: gh aw audit, gh aw logs, and gh aw audit diff now show GitHub API quota consumed per run, per resource.
  • Environment Variable Reference: A new comprehensive reference section covers all CLI configuration variables.

Breaking change: gh aw audit report has been removed. Cross-run security reports are now generated directly by gh aw logs --format. The new --last flag aliases --count to ease migration.

  • Flat run classification in gh aw logs --json: Each run now carries a top-level classification string ("risky", "normal", "baseline", or "unclassified"), eliminating null-guard gymnastics.
  • Per-tool-call metrics in logs: Granular token usage, failure counts, and latency per tool — perfect for identifying which tools consume the most resources.
  • Token Usage Artifact (#24315): Agent token usage is now uploaded as a workflow artifact, making it easy to track spend over time.
  • Workflow reliability and threat detection extensibility improvements shipped alongside.

v0.65.7 through v0.65.2 (March 31–April 3) focused on cross-repo workflow reliability, MCP gateway keepalive configuration, safe-outputs improvements, and token optimization tooling.


Agent of the Week: agentic-observability-kit

The tireless watchdog that monitors your entire fleet of agentic workflows and escalates when things go sideways.

Every day, agentic-observability-kit pulls logs from all running workflows, classifies their behavior, and posts a structured observability report as a GitHub Discussion — then files issues when patterns of waste or failure cross defined thresholds. This past week it had a particularly eventful run: on April 6 it spotted that smoke-copilot and smoke-claude had each burned through 675K–1.7M tokens across multiple runs (flagged as resource_heavy_for_domain with high severity), and it filed an issue titled “Smoke Copilot and Smoke Claude repeatedly resource-heavy” before anyone on the team had noticed. It also caught that the GitHub Remote MCP Authentication Test workflow had a 100% failure rate across two runs — one of which completed at zero tokens, suggesting a config or auth problem rather than an agent misbehaving.

In a delightfully meta moment, the observability kit itself hit token-limit errors while trying to ingest its own log data — it made four attempts with progressively smaller count and max_tokens parameters before it could fit the output into context. It got there in the end.

Usage tip: Pair agentic-observability-kit with Slack or email notifications so escalation issues trigger an alert — otherwise the issues it files can sit unread while the token bill quietly grows.

View the workflow on GitHub


Update to v0.67.1 and start exporting traces from your workflows today — all it takes is an observability.otlp block in your frontmatter. Feedback and contributions are always welcome in github/gh-aw.

Weekly Update – March 30, 2026

Six releases shipped in github/gh-aw between March 24 and March 30 — that’s almost one a day. From expanded audit tooling to integrity-isolated cache storage and a wave of security fixes, this was a dense week. Here’s the rundown.

The freshest release ships with quality-of-life wins for workflow authors:

  • runs-on-slim for compile-stable jobs (#23490): Override the runner for compile-stable framework jobs with a new runs-on-slim key, giving you fine-grained control over which machine handles compilation.
  • Sibling nested imports fixed (#23475): ./file.md imports now resolve relative to the importing file’s directory, not the working directory. Modular workflows that import sibling files were silently broken before — now they’re not.
  • Custom tools in <safe-output-tools> prompt (#23487): Custom jobs, scripts, and actions are now listed in the agent’s <safe-output-tools> prompt block so the AI actually knows they exist.
  • Compile-time validation of safe-output job ordering (#23486): Misconfigured needs: ordering on custom safe-output jobs is now caught at compile time.
  • MCP Gateway v0.2.9 (#23513) and firewall v0.25.4 (#23514) bumped for all compiled workflows.
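The sibling-import fix in the list above is a change in what a relative path is resolved against. A minimal sketch of the corrected behavior, using hypothetical file names and a hypothetical `resolve_import` helper:

```python
from pathlib import PurePosixPath


def resolve_import(importing_file: str, import_path: str) -> str:
    """Resolve a ./relative import against the directory of the importing file."""
    base_dir = PurePosixPath(importing_file).parent
    return str(base_dir / import_path)


# Hypothetical layout: main.md imports its sibling helper.md.
print(resolve_import(".github/workflows/shared/main.md", "./helper.md"))
# .github/workflows/shared/helper.md
```

Resolving against the working directory instead would have produced `helper.md` at the repository root, which is why modular workflows with sibling imports broke without any error.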

A security-heavy release with one major architectural upgrade:

Integrity-aware cache-memory is the headline. Cache storage now uses dedicated git branches — merged, approved, unapproved, and none — to enforce integrity isolation at the storage level. A run operating at unapproved integrity can no longer read data written by a merged-integrity run, and any change to your allow-only guard policy automatically invalidates stale cache entries. If you upgrade and see a cache miss on your first run, that’s intentional — legacy data has no integrity provenance and must be regenerated.
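
One way to picture this is a cache whose namespace is derived from both the integrity level and a fingerprint of the guard policy. The sketch below is an in-memory model under stated assumptions: the real feature uses git branches, the namespace format and `IsolatedCache` class are hypothetical, and strict per-level isolation is a simplification of whatever read rules the runtime actually applies.

```python
import hashlib
import json

INTEGRITY_LEVELS = ("merged", "approved", "unapproved", "none")


def cache_namespace(integrity: str, guard_policy: dict) -> str:
    """Build a storage namespace from the integrity level and a policy fingerprint."""
    if integrity not in INTEGRITY_LEVELS:
        raise ValueError(f"unknown integrity level: {integrity}")
    policy_bytes = json.dumps(guard_policy, sort_keys=True).encode()
    fingerprint = hashlib.sha256(policy_bytes).hexdigest()[:12]
    return f"cache/{integrity}/{fingerprint}"


class IsolatedCache:
    """In-memory stand-in for branch-backed storage (hypothetical)."""

    def __init__(self):
        self._store = {}

    def write(self, integrity, policy, key, value):
        self._store[(cache_namespace(integrity, policy), key)] = value

    def read(self, integrity, policy, key):
        # A different level or a changed policy yields a different namespace,
        # so the lookup simply misses.
        return self._store.get((cache_namespace(integrity, policy), key))


cache = IsolatedCache()
policy = {"allow-only": ["github"]}  # illustrative policy contents
cache.write("merged", policy, "notes", "trusted summary")
```

Because the policy fingerprint is part of the namespace, changing the policy invalidates old entries without any explicit cleanup step.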

patch-format: bundle (#23338) is the other highlight: code-push flows now support git bundle as an alternative to git am, preserving merge commits, authorship, and per-commit messages that were previously dropped.

Security fixes:

  • Secret env var exclusion (#23360): AWF now strips all secret-bearing env vars (tokens, API keys, MCP secrets) from the agent container’s visible environment, closing a potential prompt-injection exfiltration path in pull_request_target workflows.
  • Argument injection fix (#23374): Package and image names in gh aw compile --validate-packages are validated before being passed to npm view, pip index versions, uv pip show, and docker.
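
The argument injection fix in the list above follows a familiar pattern: validate untrusted names against an allowlist before they reach a subprocess. A minimal sketch, assuming a conservative character set of our own choosing (the actual validation rules in gh-aw are not described in the post):

```python
import re

# Conservative allowlist (an assumption): letters, digits, and a few separators.
# The first character may not be "-", so the value cannot be read as a flag.
_SAFE_NAME = re.compile(r"[A-Za-z0-9@][A-Za-z0-9._/@:-]*")


def validate_package_name(name: str) -> str:
    """Return the name unchanged if it looks safe, otherwise raise ValueError."""
    if not _SAFE_NAME.fullmatch(name):
        raise ValueError(f"refusing suspicious package name: {name!r}")
    return name


def npm_view_command(name: str) -> list[str]:
    # Build an argument list (no shell string) from the validated name.
    return ["npm", "view", validate_package_name(name), "version"]


print(npm_view_command("@scope/pkg"))
```

Passing an argument list rather than a shell string removes shell metacharacters from the picture; the allowlist then covers the remaining risk of a name being parsed as an option.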

The gh aw logs command gained cross-run report generation via the new --format flag. It aggregates firewall behavior across multiple workflow runs and produces an executive summary, domain inventory, and per-run breakdown:

gh aw logs agent-task --format markdown --count 10 # Markdown
gh aw logs --format markdown --json # JSON for dashboards
gh aw logs --format pretty # Console output

This release also includes a YAML env injection security fix (#23055): all env: emission sites in the compiler now use %q-escaped YAML scalars, preventing newlines or quote characters in frontmatter values from injecting sibling env variables into .lock.yml files.
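
To see why quoting matters, compare a naive emitter with one that always writes a double-quoted scalar. This is a Python analogue, not the compiler's Go code: `json.dumps` stands in for `%q`, and the function names and the sample value are hypothetical.

```python
import json


def emit_env_unsafe(name: str, value: str) -> str:
    """Interpolate the raw value: a newline in it starts a new YAML line."""
    return f"  {name}: {value}\n"


def emit_env_quoted(name: str, value: str) -> str:
    """Emit the value as a double-quoted scalar with newlines and quotes escaped.

    A JSON string is also a valid YAML double-quoted scalar.
    """
    return f"  {name}: {json.dumps(value)}\n"


# Illustrative hostile frontmatter value that tries to add a sibling variable.
malicious = "hello\n  GITHUB_TOKEN: attacker-value"

print(emit_env_unsafe("GREETING", malicious))
print(emit_env_quoted("GREETING", malicious))
```

The unsafe version prints two lines, the second of which parses as its own env entry; the quoted version stays on a single line with the newline escaped inside the string.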

gh aw audit diff (#22996) lets you compare two workflow runs side-by-side — firewall behavior, MCP tool invocations, token usage, and duration — to spot regressions and behavioral drift before they become incidents:

gh aw audit diff <run1> <run2> --format markdown

Five new sections also landed in the standard gh aw audit report: Engine Configuration, Prompt Analysis, Session & Agent Performance, Safe Output Summary, and MCP Server Health. One report now gives you the full picture.

Bot-actor concurrency isolation: Workflows combining safe-outputs.github-app with issue_comment-capable triggers now automatically get bot-isolated concurrency keys, preventing the workflow from cancelling itself mid-run when the bot posts a comment that re-triggers the same workflow.

This focused patch adds the skip-if-check-failing pre-activation gate: workflows can now bail out before the agent runs if a named CI check is currently failing, avoiding wasted inference on a broken codebase. It also ships an improved fuzzy schedule algorithm with weighted preferred windows and peak avoidance to reduce queue contention on shared runners.
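
The gate's decision can be sketched as a small pure function. The dictionary shape below mirrors the `name`, `status`, `conclusion`, and `completed_at` fields of GitHub check runs, but the function name, the set of conclusions treated as failing, and the "no verdict means run" rule are assumptions rather than gh-aw's documented behavior.

```python
# Conclusions treated as "failing" here (an assumption).
FAILING = {"failure", "timed_out", "cancelled"}


def should_skip_agent(check_name: str, check_runs: list[dict]) -> bool:
    """Return True when the most recent completed run of check_name is failing."""
    matching = [
        c for c in check_runs
        if c["name"] == check_name and c["status"] == "completed"
    ]
    if not matching:
        return False  # no verdict yet: let the agent run (an assumption)
    # ISO 8601 timestamps in the same format sort correctly as strings.
    latest = max(matching, key=lambda c: c["completed_at"])
    return latest["conclusion"] in FAILING


# Illustrative data, not real check runs.
check_runs = [
    {"name": "build", "status": "completed", "conclusion": "success",
     "completed_at": "2026-03-24T08:00:00Z"},
    {"name": "build", "status": "completed", "conclusion": "failure",
     "completed_at": "2026-03-24T09:30:00Z"},
    {"name": "lint", "status": "in_progress", "conclusion": None,
     "completed_at": None},
]
print(should_skip_agent("build", check_runs))
```

Only the latest completed run counts, so a check that failed earlier but has since gone green does not block the agent.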


The self-appointed gatekeeper of the issue tracker — reads every new issue and assigns labels so the right people see it.

This week, auto-triage-issues handled three runs. Two of them were textbook efficiency: triggered the moment a new issue landed, ran the pre-activation check, decided there was nothing worth labeling, and wrapped up in under 42 seconds. No fuss, no drama. Then came the Monday scheduled sweep. That run went a different direction: 18 turns, 817,000 tokens, and after all that contemplation… a failure. Somewhere between turn one and turn eighteen, the triage workflow decided this batch of issues deserved its most thoughtful analysis yet, burned through a frontier model’s patience, and still couldn’t quite close the loop.

It’s the classic overachiever problem — sometimes the issues that look the simplest turn out to be the ones that take all day.

Usage tip: If your auto-triage-issues scheduled runs are consistently expensive, the new agentic_fraction metric in gh aw audit can help you identify which turns are pure data-gathering and could be moved to deterministic shell steps.

View the workflow on GitHub


Update to v0.64.4 today with gh extension upgrade aw. The integrity-aware cache-memory migration will trigger a one-time cache miss on first run — expected and safe. As always, questions and contributions are welcome in github/gh-aw.

Weekly Update – March 23, 2026

Another week, another flurry of releases in github/gh-aw. Eight versions shipped between March 18 and March 21, pushing security hardening, extensibility, and performance improvements across the board. Here’s what you need to know.

The latest release leads with two important security fixes:

  • Supply chain protection: The Trivy vulnerability scanner action was removed after a supply chain compromise was discovered (#22007, #22065). Scanning has been replaced with a safer alternative.
  • Public repo integrity hardening (#21969): GitHub App authentication no longer exempts public repositories from the minimum-integrity guard policy, closing a gap where untrusted content could bypass integrity checks.

On the feature side:

  • Timezone support for on.schedule (#22018): Cron entries now accept an optional timezone field — finally, no more mental UTC arithmetic when you want your workflow to run “at 9 AM Pacific”.
  • Boolean expression optimizer (#22025): Condition trees are optimized at compile time, generating cleaner if: expressions in compiled workflows.
  • Wildcard target-repo in safe-output handlers (#21877): Use target-repo: "*" to write a single handler definition that works across any repository.
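
For a feel of what compile-time condition optimization involves, here is a toy simplifier over condition trees. It is an illustrative sketch, not the gh-aw optimizer: the tuple-based tree encoding and the three rewrites shown (flattening, de-duplication, constant folding) are assumptions about what such a pass might do.

```python
def simplify(node):
    """Flatten, de-duplicate, and constant-fold a condition tree.

    Trees are tuples like ("and", a, b), ("or", a, b), or ("not", a);
    leaves are expression strings or the constants True/False.
    """
    if not isinstance(node, tuple):
        return node
    op, *args = node
    args = [simplify(a) for a in args]
    if op == "not":
        (arg,) = args
        if isinstance(arg, bool):
            return not arg
        if isinstance(arg, tuple) and arg[0] == "not":
            return arg[1]  # double negation
        return ("not", arg)
    neutral = op == "and"  # True is neutral for "and", False for "or"
    kept = []
    for a in args:
        children = a[1:] if isinstance(a, tuple) and a[0] == op else (a,)
        for child in children:
            if isinstance(child, bool):
                if child != neutral:
                    return child  # absorbing constant decides the whole node
            elif child not in kept:
                kept.append(child)
    if not kept:
        return neutral
    return kept[0] if len(kept) == 1 else (op, *kept)


def render(node) -> str:
    """Render a tree as an if:-style expression string."""
    if isinstance(node, bool):
        return "true" if node else "false"
    if not isinstance(node, tuple):
        return node
    op, *args = node
    if op == "not":
        return f"!({render(args[0])})"
    joiner = " && " if op == "and" else " || "
    return "(" + joiner.join(render(a) for a in args) + ")"


tree = ("and", True, ("and", "a", "b"), ("or", False, "a"), ("not", ("not", "b")))
print(render(simplify(tree)))
```

The nested tree collapses to a single two-term conjunction, which is the kind of cleanup that makes a generated `if:` expression readable.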

This one is a standout for extensibility and speed:

  • Custom Actions as Safe Output Tools (#21752): You can now expose any GitHub Action as an MCP tool via the new safe-outputs.actions block. The compiler resolves action.yml at compile time to derive the tool schema and inject it into the agent — no custom wiring needed. This opens the door to a whole ecosystem of reusable safe-output handlers built from standard Actions.
  • ~20 seconds faster per workflow run (#21873): A bump to DefaultFirewallVersion v0.24.5 eliminates a 10-second shutdown delay for both the agent container and the threat detection container. That’s 20 free seconds on every single run.
  • trustedBots support in MCP Gateway (#21865): Pass an allowlist of additional GitHub bot identities to the MCP Gateway, enabling safe cross-bot collaboration in guarded environments.
  • gh-aw-metadata v3 (#21899): Lock files now embed the configured agent ID/model in the gh-aw-metadata comment, making audits much easier.
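
Deriving a tool schema from an action's metadata, as the Custom Actions item above describes, might look roughly like this. The sketch takes an already-parsed `action.yml` as a dictionary and emits an MCP-style tool definition; the function name is hypothetical, the sample action is invented, and the real compiler's output format may differ.

```python
def tool_schema_from_action(tool_name: str, action: dict) -> dict:
    """Map action.yml inputs to a JSON-Schema-style tool definition."""
    properties, required = {}, []
    for input_name, spec in (action.get("inputs") or {}).items():
        spec = spec or {}
        # Action inputs are passed as strings, so every property is a string.
        prop = {"type": "string", "description": spec.get("description", "")}
        if "default" in spec:
            prop["default"] = spec["default"]
        properties[input_name] = prop
        if spec.get("required"):
            required.append(input_name)
    return {
        "name": tool_name,
        "description": action.get("description", ""),
        "inputSchema": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }


# Invented example metadata, shaped like a parsed action.yml.
action = {
    "name": "Notify",
    "description": "Post a message to a channel",
    "inputs": {
        "channel": {"description": "Target channel", "required": True},
        "message": {"description": "Text to post", "default": "hello"},
    },
}
schema = tool_schema_from_action("notify", action)
print(schema["inputSchema"]["required"])
```

Since the schema is computed from `action.yml`, the agent sees the same input names and required flags the action itself declares.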

Breaking change alert: lockdown: true is gone. It has been replaced by the more expressive min-integrity field. If you have lockdown: false in your frontmatter, remove it; it’s no longer recognized. The new integrity-level system gives you finer control over what content can trigger your workflows.

This release also introduces integrity filtering for log analysis — the gh aw logs command can now filter to only runs where DIFC integrity events were triggered, making security investigations much faster.

The GitHub MCP guard policy graduates to general availability. The policy automatically configures appropriate access controls on the GitHub MCP server at runtime — no manual lockdown configuration required. Also new: inline custom safe-output scripts, letting you define JavaScript handlers directly in your workflow frontmatter without a separate file.

Three patch releases covered:

  • Signed-commit support for protected branches (v0.61.1)
  • Broader ecosystem domain coverage for language package registries (v0.61.2)
  • Critical workflow_dispatch expression evaluation fix (v0.61.2)

Several important fixes landed today (March 23):

Your tireless four-hourly guardian of PR quality — reads every open pull request and evaluates it against CONTRIBUTING.md for compliance and completeness.

contribution-check ran five times this week (once every four hours, as scheduled) and processed a steady stream of incoming PRs, creating issues for contributors who needed guidance, adding labels, and leaving review comments. Four of five runs completed in under 5 minutes with 6–9 turns. The fifth run, however, apparently found the task of reviewing PRs during a particularly active Sunday evening so intellectually stimulating that it worked through 50 turns and consumed 1.55 million tokens — roughly 5× its usual appetite — before the safe_outputs step politely called it a night. It still managed to file issues, label PRs, and post comments on the way out. Overachiever.

One earlier run also hit a minor hiccup: the pre-agent filter step forgot to write its output file, leaving the agent with nothing to evaluate. Rather than fabricating a list of PRs to review, it dutifully reported “missing data” and moved on. Sometimes the bravest thing is knowing when there’s nothing to do.

Usage tip: The contribution-check pattern works best when your CONTRIBUTING.md is explicit and opinionated — the more specific your guidelines, the more actionable its feedback will be for contributors.

View the workflow on GitHub

Update to v0.62.5 to pick up the security fixes and timezone support. If you’ve been holding off on migrating from lockdown: true, now’s the time — check the v0.62.2 release notes for the migration path. As always, contributions and feedback are welcome in github/gh-aw.

Weekly Update – March 18, 2026

It’s been a busy week in github/gh-aw — seven releases shipped between March 13 and March 17, covering everything from a security model overhaul to a new label-based trigger and a long-overdue terminal resize fix. Let’s dig in.

The freshest release focuses on reliability and developer experience:

  • Automatic debug logging (#21406): Set ACTIONS_RUNNER_DEBUG=true on your runner and full debug logging activates automatically — no more manually adding DEBUG=* to every troubleshooting run.
  • Cross-repo project item updates (#21404): update_project now accepts a target_repo parameter, so org-level project boards can update fields on items from any repository.
  • GHE Cloud data residency support (#21408): Compiled workflows now auto-inject a GH_HOST step, fixing gh CLI failures on *.ghe.com instances.
  • CI build artifacts (#21440): The build CI job now uploads the compiled gh-aw binary as a downloadable artifact — handy for testing PRs without a local build.

This release rewires the security model. Breaking change: automatic lockdown=true is gone. Instead, the runtime now auto-configures guard policies on the GitHub MCP server — min_integrity=approved for public repos, min_integrity=none for private/internal. Remove any explicit lockdown: false from your frontmatter; it’s no longer needed.
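
The default selection described above reduces to a small lookup. In this sketch the function name and the explicit-override parameter are hypothetical; only the visibility-to-level mapping comes from the release notes.

```python
from typing import Optional


def default_min_integrity(visibility: str, explicit: Optional[str] = None) -> str:
    """Pick the guard policy level for a repository.

    An explicitly configured level wins (an assumption); otherwise public
    repos get "approved" and private or internal repos get "none".
    """
    if explicit is not None:
        return explicit
    if visibility == "public":
        return "approved"
    if visibility in ("private", "internal"):
        return "none"
    raise ValueError(f"unknown repository visibility: {visibility!r}")


for visibility in ("public", "private", "internal"):
    print(visibility, "->", default_min_integrity(visibility))
```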

Other highlights:

  • GHES domain auto-allowlisting (#21301): When engine.api-target points to a GHES instance, the compiler automatically adds GHES API hostnames to the firewall. No more silent blocks after every recompile.
  • github-app: auth in APM dependencies (#21286): APM dependencies: can now use github-app: auth for cross-org private package access.

A feature-packed release with two breaking changes (field renames in safe-outputs.allowed-domains) and several new capabilities:

  • Label Command Trigger (#21118): Activate a workflow by adding a label to an issue, PR, or discussion. The label is automatically removed so it can be reapplied to re-trigger.
  • gh aw domains command (#21086): Inspect the effective network domain configuration for all your workflows, with per-domain ecosystem annotations.
  • Pre-activation step injection: New on.steps and on.permissions frontmatter fields let you inject custom steps and permissions into the activation job for advanced scenarios.
  • v0.58.3 (March 15): MCP write-sink guard policy for non-GitHub MCP servers, Copilot pre-flight diagnostic for GHES, and a richer run details step summary.
  • v0.58.2 (March 14): GHES auto-detection in audit and add-wizard, excluded-files support for create-pull-request, and clearer run command errors.
  • v0.58.1 / v0.58.0 (March 13): call-workflow safe output for chaining workflows, checkout: false for agent jobs, custom OpenAI/Anthropic API endpoints, and 92 merged PRs in v0.58.0 alone.
  • Top-level github-app fallback (#21510): Define your GitHub App config once at the top level and let it propagate to safe-outputs, checkout, MCP, APM, and activation — instead of repeating it in every section.
  • GitHub App-only permission scopes (#21511): 31 new PermissionScope constants cover repository, org, and user-level GitHub App permissions (e.g., administration, members, environments).
  • Custom Huh theme (#21557): All 11 interactive CLI forms now use a Dracula-inspired theme consistent with the rest of the CLI’s visual identity.
  • Weekly blog post writer workflow (#21575): Yes, the workflow that wrote this post was itself merged this week. Meta!
  • CI job timeout limits (#21601): All 25 CI jobs that relied on GitHub’s 6-hour default now have explicit timeouts, preventing a stuck test from silently burning runner compute.
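
The top-level github-app fallback from the list above is a plain "section value wins, otherwise inherit" rule. A minimal sketch, assuming frontmatter has already been parsed into a dictionary; the exact key names per section and the sample app config are illustrative guesses, not the documented schema.

```python
# Sections named in the release note; the key spellings are assumptions.
SECTIONS = ("safe-outputs", "checkout", "mcp", "apm", "activation")


def effective_github_app(frontmatter: dict) -> dict:
    """Return the github-app config each section would end up using."""
    top_level = frontmatter.get("github-app")
    resolved = {}
    for section in SECTIONS:
        own = (frontmatter.get(section) or {}).get("github-app")
        # A section-level config overrides the top-level default.
        resolved[section] = own if own is not None else top_level
    return resolved


# Invented frontmatter for illustration.
frontmatter = {
    "github-app": {"app-id": "111"},
    "safe-outputs": {"github-app": {"app-id": "222"}},
    "checkout": {},
}
print(effective_github_app(frontmatter))
```

Here only safe-outputs keeps its own app; every other section inherits the top-level one.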

The first-ever Agent of the Week goes to the workflow that handles the unglamorous but essential job of keeping the issue tracker from becoming a swamp.

auto-triage-issues runs on a schedule and fires on every new issue, reading each one and deciding how to categorize it. This week it ran five times — three successful runs and two that were triggered by push events to a feature branch (which apparently fire the workflow but don’t give it much to work with). On its scheduled run this morning, it found zero open issues in the repository, so it created a tidy summary discussion to announce the clean state, as instructed. On an earlier issues-triggered run, it attempted to triage issue #21572 but hit empty results from GitHub MCP tools on all three read attempts — so it gracefully called missing_data and moved on rather than hallucinating a label.
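
The "report missing data instead of guessing" behavior is worth copying in your own agents. The sketch below models it with a hypothetical `read_issue` callable and a made-up result shape; the three-attempt limit matches the run described above, but nothing here is the workflow's actual code.

```python
def triage_or_report_missing(issue_number: int, read_issue, attempts: int = 3) -> dict:
    """Try to read an issue a few times; fall back to a missing_data report."""
    for _ in range(attempts):
        issue = read_issue(issue_number)
        if issue:  # non-empty payload: safe to reason about it
            return {"type": "triage", "issue": issue}
    # Never invent a label for content that could not be read.
    return {
        "type": "missing_data",
        "reason": f"issue #{issue_number} returned empty results {attempts} times",
    }


calls = []


def empty_reader(number):
    # Stand-in for a tool call that keeps returning nothing.
    calls.append(number)
    return {}


print(triage_or_report_missing(21572, empty_reader))
```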

Across its recent runs it made 131 search_repositories calls. We’re not sure why it finds repository searches so compelling, but clearly it’s very thorough about knowing its neighborhood before making any decisions.

Usage tip: Pair auto-triage-issues with a notify workflow on specific labels (e.g., security or needs-repro) so the right people get pinged automatically without anyone having to watch the inbox.

View the workflow on GitHub

Update to v0.61.0 to get all the improvements from this packed week. If you run workflows on GHES or in GHE Cloud, the new auto-detection and GH_HOST injection features are especially worth trying. As always, contributions and feedback are welcome in github/gh-aw.