GitHub Agentic Workflows

Blog

Agent of the Day – May 20, 2026

You know that sinking feeling when your CI pipeline kicks off a full build-test-deploy cycle because someone fixed a typo in the README? Or when your security scanner churns through every line of code at 2 AM, finds nothing new, and emails you a 47-page report that’s identical to yesterday’s?

Yeah, we’ve all been there. The robot dutifully did its job. You dutifully archived the notification. Nobody won.

Enter Architecture Guardian, a scheduled workflow that’s learned the ancient DevOps virtue of knowing when not to run.

This workflow runs every weekday around 14:00 UTC with a straightforward mission: scan Go and JavaScript source files for architecture drift, naming violations, or structural anti-patterns that might’ve slipped through code review. It’s the kind of governance check that should run regularly—but doesn’t need to re-analyze the entire codebase when nothing has changed.

On run 26171885477, Architecture Guardian demonstrated exactly how a smart agent should behave: it showed up, looked around, realized there was no work to do, and gracefully bowed out.

The Smart Skip: 5.5 Minutes of Doing Nothing (Efficiently)


Here’s what happened under the hood:

The workflow spun up, spent three agent turns checking for recent changes, and concluded: zero Go or JavaScript files modified in the last 24 hours. Instead of proceeding with the full architecture scan—parsing files, running static analysis, generating reports—it called safeoutputs.noop with a clear message:

“No Go or JavaScript source files changed in the last 24 hours. Architecture scan skipped.”

Total runtime? 5.5 minutes. Token usage? 123k—mostly spent confirming the skip was valid. No unnecessary compute, no noise in the logs, no pointless notifications.

Compare that to a naïve scheduled job that runs the full analysis every single day regardless of activity. The skip itself is not free (this one cost 123k tokens), but over a month of weekdays (roughly 22 runs), skip-when-idle logic could still save hours of compute time and the tokens a full scan would otherwise burn on quiet days.
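The skip decision described above can be sketched in a few lines. This is a minimal illustration, not the workflow's actual logic: the function names and the extension list are assumptions, and the list of changed paths is assumed to come from something like `git log --since="24 hours ago" --name-only`.

```python
from pathlib import PurePosixPath

# Assumed extension list; the post only says "Go and JavaScript source files".
SOURCE_SUFFIXES = {".go", ".js", ".cjs", ".mjs"}

SKIP_MESSAGE = (
    "No Go or JavaScript source files changed in the last 24 hours. "
    "Architecture scan skipped."
)


def relevant_changes(changed_paths):
    """Return the changed paths that look like Go or JavaScript sources."""
    return [p for p in changed_paths if PurePosixPath(p).suffix in SOURCE_SUFFIXES]


def decide(changed_paths):
    """Return ("noop", message) when nothing relevant changed, else ("scan", files)."""
    hits = relevant_changes(changed_paths)
    if not hits:
        return ("noop", SKIP_MESSAGE)
    return ("scan", hits)
```

A README-only day yields the `noop` branch, which is where a call such as safeoutputs.noop would go; any Go or JavaScript change routes to the full scan.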

The Read-Only Posture: Analysis, Not Automation Chaos


Architecture Guardian operates in read-only mode—it never writes back to GitHub, never auto-fixes violations, never opens PRs. It’s pure analysis. When it does find issues, it surfaces them cleanly for human review. When it finds nothing (or nothing new), it stays silent.

This run hit some network friction (3 blocked requests out of 8 total, a block rate of about 38%) but still completed successfully. The agent adapted, worked within constraints, and delivered its finding: nothing to report.

Two anomalous event patterns flagged during the run suggest the reliability monitoring is working as intended, catching edge cases for future iteration.

Why This Matters: Respecting Developer Time


The real win isn’t the compute saved on one run. It’s the cognitive load reduction. When your scheduled jobs only notify you about actual changes, you start trusting them again. The alert fatigue drops. The “mark all as read” reflex fades.

Architecture Guardian isn’t trying to impress you with how much work it can do. It’s trying to impress you by doing only the work that matters.

That’s automation maturity.

[Figure: Architecture Guardian workflow metrics]


Want workflows that know when to quit while they’re ahead? Check out the gh-aw project on GitHub and see how agentic workflows can respect your time as much as your architecture.

Agent of the Day – May 15, 2026

Every open-source repo has the same invisible tax: someone has to watch the door. Label the PR. Check if the commenter is a member or an outsider. Hide the policy violation before it spreads. Flag the ambiguous case for a human. It’s repetitive, important, and easy to miss at 2 AM when CI is green and you’re trying to ship.

That’s the gap the AI Moderator workflow fills — automatically, on every event, before a human even opens their notifications.


The AI Moderator is a Codex-powered agentic workflow in the github/gh-aw repository. It fires on pull requests, new issues, and comments — running a structured investigation each time to determine who’s knocking, what they brought, and what action to take. Label it. Hide it. Escalate it. Or stand down.

It’s not a simple rule-based bot. It reasons.

On a recent run — Actions run 25924881974 — the agent woke up when PR #32406 landed: a work-in-progress branch titled “Experiment with output format in daily compiler quality” from copilot/ab-advisorexperiment-output-format. Sixteen turns later, it had done its job.

The agent didn’t guess. It looked things up.

It started by orienting itself — calling github___get_me to confirm its own identity, then github-search_repositories to verify the repo context it was operating in. From there it fanned out: github-list_branches, github-list_tags, github-list_releases, github-get_teams, github-get_team_members. It was building a picture of who belongs here and what the repo looks like right now.

Then it turned to the PR itself. It pulled the PR details with github___pull_request_read, searched related issues with github___search_issues and github___search_pull_requests, reviewed the commit history via github___list_commits, and read any linked issue context through github-issue_read. That’s a broad sweep — the kind a human reviewer would do informally, but inconsistently. The agent did it every time, in the same order, with a logged record of each step.

The conclusion: action_required. The agent applied labels through safeoutputs-add_labels, hid at least one comment using safeoutputs___hide_comment, and raised a flag with safeoutputs-report_incomplete to signal that follow-up was needed. Where checks passed cleanly, it called safeoutputs-noop — explicit confirmation that nothing warranted action, not just silence.

The audit system tracks behavioral baselines. On the same day, a reference run (25924730956) completed with zero turns and a success conclusion. This run took 16. The delta was flagged automatically as a turns_increase requiring review.

That flag matters. It means the system caught a meaningful deviation in how the agent behaved — not a failure, but a signal worth inspecting. Did the PR have unusual characteristics? Was the team membership lookup more complex than usual? The audit trail is there. The observation is already logged.

This is what makes agentic workflows different from scripts: the behavior changes with the input, and the monitoring has to account for that.
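The baseline comparison above can be pictured as a simple delta check. The sketch below is illustrative only: the function name, the returned fields, and the `min_delta` threshold are assumptions, since the post does not describe how the audit system decides that a deviation is worth flagging.

```python
def flag_turns_increase(baseline_turns, current_turns, min_delta=5):
    """Return an observation dict when turns rose by at least min_delta, else None.

    min_delta is a hypothetical threshold chosen for this sketch.
    """
    delta = current_turns - baseline_turns
    if delta >= min_delta:
        return {
            "kind": "turns_increase",
            "baseline": baseline_turns,
            "current": current_turns,
            "delta": delta,
        }
    return None
```

With the numbers from this run (a baseline of 0 turns against 16), the check produces a `turns_increase` observation with a delta of 16. A small wobble, such as 4 turns against 6, stays below the assumed threshold and produces nothing.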

Community moderation is one of those problems where the cost of under-investing is invisible until it isn’t. A missed label means a misrouted PR. A comment that should have been hidden lingers. An external contributor gets treated the same as a maintainer when they shouldn’t.

The AI Moderator closes that gap without requiring a human to be on-call for it. It checks team membership — not just assumed from a username, but verified against github-get_team_members. It applies structured outputs through the safeoutputs interface, which means every action is auditable. And when it can’t confidently resolve a case, it says so explicitly via report_incomplete, rather than silently doing nothing.

Fast, too. This run completed in seconds.

The workflow is part of the github/gh-aw agentic workflows project — a growing collection of Codex-powered agents built to automate the unglamorous parts of software engineering. If your team maintains a repository and you’re tired of playing gatekeeper manually, this is a good place to start.

Head to github.com/github/gh-aw to see the workflows, read the specs, and explore what’s already running in production.


Agent of the Day is a recurring look at agentic workflows built and run inside the GitHub engineering org.

Weekly Update – May 11, 2026

It was a busy week in github/gh-aw! Four releases landed between May 4 and May 7, paired with a wave of pull requests that delivered new commands, security hardening, and developer-experience polish. Here’s everything that shipped.

The headline feature is a new gh aw lint command that runs actionlint directly against your existing .lock.yml files — no recompile required. It’s a lightweight CI gate you can drop into any pipeline to catch syntax errors early. Pass --shellcheck or --pyflakes for deeper script analysis, or point it at specific files with --dir.

Other highlights:

  • Shared workflow engine.mcp.tool-timeout inheritance (#30634): Shared workflows that wrap slow MCP servers can now declare timeout values once and have consumers inherit them automatically — no more duplicating engine.mcp.tool-timeout in every downstream workflow.
  • First-party coding-agent skill (#27259): Copilot, Claude, and other coding agents now get structured guidance on creating, debugging, and updating agentic workflows via a router skill shipped with gh aw.
  • && preserved in compiled expressions (#30695): A sneaky Go HTML-escaping bug was silently turning && into \u0026\u0026 inside .lock.yml files, corrupting ${{ ... && ... }} expressions. Fixed.
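To check your own compiled files for the `&&` corruption described in the last item, a small scanner is enough. This is a hedged sketch rather than anything shipped with gh aw: the function names are made up for illustration, and the regular expression assumes expressions never contain a nested `}}`.

```python
import re

# Matches GitHub Actions expressions such as ${{ a && b }}.
EXPRESSION = re.compile(r"\$\{\{.*?\}\}", re.DOTALL)

# The six literal characters backslash, u, 0, 0, 2, 6.
ESCAPED_AMP = "\\u0026"


def find_escaped_expressions(lock_text):
    """Return every ${{ ... }} expression that contains a literal \\u0026 sequence."""
    return [
        m.group(0)
        for m in EXPRESSION.finditer(lock_text)
        if ESCAPED_AMP in m.group(0)
    ]


def repair(lock_text):
    """Replace \\u0026 with & inside expressions only, leaving other text untouched."""
    return EXPRESSION.sub(
        lambda m: m.group(0).replace(ESCAPED_AMP, "&"), lock_text
    )
```

Restricting the repair to `${{ ... }}` spans is a deliberate choice: the same escape sequence elsewhere in the file may be intentional, so the sketch leaves it alone.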

Inline sub-agents are now default-on — the features.inline-agents: true flag is deprecated. Run gh aw fix --write to auto-remove it from existing workflows via the new features-inline-agents-removal codemod.

This release also fixed a community-reported push_to_pull_request_branch rerun failure: when an agent reran and its patch reintroduced a file already on the branch, git am --3way produced an unresolvable add/add conflict. The fix detects add/add-only conflicts and resolves them by taking the patch side automatically.
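The detection half of that fix can be sketched by reading `git status --porcelain` output, where unmerged paths carry two-letter codes and `AA` means both sides added the file. The helper names below are hypothetical and this is not the gh aw implementation, only an illustration of the "add/add-only" test.

```python
# Two-letter codes that `git status --porcelain` (v1) uses for unmerged paths.
UNMERGED_CODES = {"DD", "AU", "UD", "UA", "DU", "AA", "UU"}


def unmerged_entries(porcelain_output):
    """Parse porcelain v1 text into (code, path) pairs for unmerged paths only."""
    entries = []
    for line in porcelain_output.splitlines():
        code, path = line[:2], line[3:]
        if code in UNMERGED_CODES:
            entries.append((code, path))
    return entries


def is_add_add_only(porcelain_output):
    """True when there is at least one conflict and every conflict is add/add (AA)."""
    entries = unmerged_entries(porcelain_output)
    return bool(entries) and all(code == "AA" for code, _ in entries)
```

Only when this returns true would it be reasonable to resolve automatically by taking the patch side; any other conflict type in the mix should still stop and ask for a human.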

These patch releases addressed Claude engine stability (no more mid-session crashes from “Fast mode unavailable”), fixed multi-line engine.env block-scalar values that compiled to broken YAML, added gateway RPC message rendering in step summaries, and switched inline sub-agent blocks to the small model alias by default to reduce cost and latency.

Beyond the releases, several PRs merged this week are worth highlighting:

The unsung inbox manager of the repository — reads every new issue the moment it’s opened and figures out where it belongs.

This week auto-triage-issues ran three times in quick succession (May 9–10), successfully triaging two issues and stumbling on a third that triggered a failure, a small battle scar it wore with dignity. In its successful runs it stayed impressively lean: nine API requests, ~270k input tokens pulled from cache, and a turnaround of under 40 seconds per issue. It never wastes a compute cycle it doesn’t have to.

The run summary noted with mild concern that auto-triage-issues is so reliable and narrow in its tool usage that it might be “overkill for agentic” — meaning deterministic automation could theoretically do its job. The workflow appears to have taken this note personally and immediately triaged the next issue without comment.

Usage tip: Pair auto-triage-issues with a notify or discussion workflow on high-priority labels so the right people are paged the moment a critical bug or security issue lands.

View the workflow on GitHub

Update to v0.72.1 today — gh extension upgrade gh-aw — and try the new gh aw lint and experimental gh aw forecast commands. As always, feedback and contributions are welcome in github/gh-aw.

Weekly Update – May 4, 2026

Happy May the Fourth! Here’s a look at what shipped in github/gh-aw this week — a busy one packed with experiment infrastructure, compiler fixes, and engine improvements.

v0.71.3 landed on April 30th, capping off a week of rapid iteration. This release delivers major improvements to safe-outputs reusability, more resilient Copilot driver behavior, and solid self-hosted runner support.

  • Parameterized safe-outputs for reusable workflows (#29171): workflow_call inputs can now control safe-outputs.threat-detection, boolean flags, PR policy fields, and list constraints. Build reusable workflows that callers can configure without forking.

  • Configurable MCP gateway session timeout: Set engine.mcp.session-timeout in your workflow frontmatter to keep long-running MCP sessions alive. No more premature timeouts on deep analysis workflows.

  • Auto-inject create_issue safe output: Workflows without explicit safe-output configuration now automatically get a create_issue safe output, slashing boilerplate for common workflows.

  • Repo Mind Light shared workflow: A shared repo-mind-light.md workflow is now available for reuse across daily issue/PR agentic workflows (#29063).

  • Team reviewers on add_reviewer: The add_reviewer MCP tool now supports setting team_reviewers on pull requests (#29228).

  • Self-hosted runner support for non-default home directories: Workflows now work correctly on self-hosted runners where the service account home is not /home/runner (#27260).
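The self-hosted runner item above comes down to not hard-coding /home/runner. A minimal sketch of that idea follows; the function names, the fallback, and the `.cache/agent` subdirectory are all illustrative assumptions, not paths used by gh aw.

```python
import posixpath

# Hypothetical fallback used only when HOME is unset; not taken from gh-aw.
DEFAULT_HOME = "/home/runner"


def resolve_home(env):
    """Return the home directory from an environment mapping, else DEFAULT_HOME."""
    return env.get("HOME") or DEFAULT_HOME


def tool_cache_dir(env, name=".cache/agent"):
    """Build a per-user path under the resolved home (the name is illustrative)."""
    return posixpath.join(resolve_home(env), name)
```

Passing the environment in as a mapping, instead of reading `os.environ` directly, keeps the helper easy to exercise with a service account whose home lives somewhere like /srv/ci.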

Several impactful PRs landed this week beyond the release:

  • Compiler detects single-quoted bash commands that crash Copilot CLI: The compiler now catches and sanitizes single-quoted bash tool commands before they reach the Copilot CLI, preventing cryptic runtime crashes. A small fix with a big quality-of-life impact.

  • Default Codex harness with retry logic: The Codex engine now ships a default codex_harness.cjs with built-in retry logic, making Codex-powered workflows more resilient out of the box.

  • A/B experiments framework: A hidden experiments CLI command lets you read experiment state from storage repo branches, enabling controlled A/B testing of workflow behavior across runs.

  • Statistical analysis for experiments: The experiments analyze command now computes statistical significance, so you can tell whether a prompt change actually improved things — or just got lucky.

  • Multiple OTLP endpoints: The endpoint field in OTLP configuration is now polymorphic — send telemetry to multiple backends simultaneously.

  • Fix: round-robin random start on cache miss: Round-robin workflows now randomly select their starting item when the cache is cold, preventing all instances from piling onto the first item at startup.
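The round-robin fix in the last item can be illustrated with a small selector. This is a sketch under assumptions: the function name and the shape of the cached state (a single "last index" value) are invented for the example, and the real workflows may store their position differently.

```python
import random


def next_index(item_count, cached_last_index=None, rng=random):
    """Pick the next round-robin index.

    cached_last_index is the index handled on the previous run,
    or None when the cache is cold (a cache miss).
    """
    if item_count <= 0:
        raise ValueError("item_count must be positive")
    if cached_last_index is None:
        # Cold cache: start somewhere random so parallel instances spread out
        # instead of all picking item 0.
        return rng.randrange(item_count)
    # Warm cache: continue from where the previous run left off, wrapping around.
    return (cached_last_index + 1) % item_count
```

Accepting `rng` as a parameter lets a caller pass a seeded `random.Random` for reproducible tests while production code uses the module-level generator.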

The world’s most meta workflow — it finds workflows that don’t run experiments yet, and proposes experiments for them.

This week ab-testing-advisor ran three times, each time scanning the entire workflow catalog for experiment-free candidates, picking one, and writing a detailed GitHub issue with a full A/B experiment campaign. On May 2nd alone it created two issues: one proposing a prompt_style A/B test for the daily-news workflow (which it diagnosed as “highly prescriptive” and worth loosening up), and another (#29661) calling for improvements to the experiment infrastructure itself — the advisor advising on how to improve the advisor. Very on-brand.

It spent roughly 500k tokens per run carefully reading workflow files, thinking through experiment dimensions, and writing crisp implementation specs. For a workflow that runs daily and quietly, it’s doing serious intellectual heavy lifting behind the scenes.

Usage tip: Use ab-testing-advisor as inspiration for your own repos — it’s a great example of a meta-workflow that uses AI to drive continuous improvement of other AI workflows.

View the workflow on GitHub

Update to v0.71.3 today to get parameterized safe-outputs, the new experiment infrastructure, and all the reliability fixes. As always, feedback and contributions are welcome in github/gh-aw.

Weekly Update – April 27, 2026

Another productive week in github/gh-aw! Two releases dropped — v0.71.0 and v0.71.1 — bringing reliability fixes across the board, from threat-detection improvements to the Claude engine to a loop that was quietly consuming millions of tokens. Here’s what shipped.

Released April 24th, v0.71.1 is a patch release that is all about correctness:

  • protected-files object form now compiles correctly (#28341): Workflows using the documented {policy, exclude} object syntax were being rejected at compile time. That’s fixed — the schema now accepts both the string shorthand and the full object form.
  • Pre-agent skills no longer overwritten on pull_request triggers (#28290): Skills installed by pre-agent-steps were silently clobbered because the “Restore agent config folders” step ran after them. Step ordering is now correct.
  • Incremental diff for push_to_pull_request_branch patch size (#28198): The max patch size check now measures only the incremental change since the last push, not the full diff from the default branch. No more spurious size-limit rejections on long-running branches.
  • jsweep infinite loop fixed (#28353): A workflow was calling create_pull_request in a loop, racking up 4.64M tokens per run. It now exits after creating a PR.
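The incremental patch-size item above is easiest to see with numbers. The sketch below is a deliberate simplification: it models a branch as a list of per-commit patch sizes and sums them, which is an assumption made for illustration (a real diff is not the sum of its commits), and the function names are hypothetical.

```python
def patch_bytes(commits, since_index=0):
    """Total patch bytes for commits[since_index:]; each commit is (sha, size)."""
    return sum(size for _, size in commits[since_index:])


def within_limit(commits, last_pushed_count, max_bytes):
    """Measure only the commits added since the last push against the limit."""
    return patch_bytes(commits, last_pushed_count) <= max_bytes


# A long-running branch: two large commits already pushed, one small new one.
branch = [("a1", 900), ("b2", 800), ("c3", 100)]
```

Measured from the default branch, this branch weighs 1800 bytes and would fail a 1024-byte limit; measured incrementally after the first two commits were pushed, only the 100-byte commit counts and the push goes through.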

Released April 23rd, v0.71.0 focused on runtime reliability and new capabilities:

  • Node.js setup added to threat-detection jobs (#28160): The node: command not found error in Copilot threat-detection workflows is gone — Node.js setup is now emitted before copilot_driver.cjs.
  • OTLP tracing for cancelled runs (#28172): Manually cancelled runs now emit a proper OpenTelemetry span, so you get full duration visibility even when a run is cut short.
  • Claude engine: bypassPermissions → acceptEdits (#28047): Migrates away from the deprecated flag and fixes missing MCP server entries in --allowed-tools, keeping Claude-powered workflows fully functional.

Beyond the releases, this week also saw some useful quality-of-life improvements merged directly to main:

The tireless sentinel of the issue tracker — reads every open issue and classifies it so the right people see it.

This week, auto-triage-issues ran three times on April 27th alone, scanning for untriaged issues on each scheduled run. It averaged just 4–6 turns per execution and made 6 GitHub API calls per run, keeping things lean. The workflow even improved its own efficiency mid-day, dropping from 6 turns in the morning run to 4 turns by afternoon, apparently learning to get to the point faster. The observability metrics politely noted it might be “partially reducible to deterministic automation,” but honestly, where’s the fun in that?

One of its runs earned an honorable mention from the agentic assessment system: “This Triage run looks stable enough that deterministic automation may be a simpler fit.” The workflow responded by running again an hour later, exactly the same as before. Iconic.

Usage tip: Pair auto-triage-issues with a label-based notification workflow so the right team members get pinged the moment a new issue is categorized.

View the workflow on GitHub

Update to v0.71.1 today and check out all the fixes. Feedback and contributions are always welcome over at github/gh-aw.