Blog

Weekly Update – July 13, 2026

Jul 13, 2026

Another active week in github/gh-aw! We shipped v0.82.8, landed several impactful features, and squashed a frustrating Docker authentication bug that had been interrupting sbx-runtime workflows.

Release: v0.82.8

v0.82.8 was published on July 11 with a broad set of reliability and security improvements.

What’s New

gVisor container runtime (#44796): Set sandbox.agent.runtime: gvisor in your workflow frontmatter to run the agent inside a gVisor sandbox for stronger isolation — great for workflows processing untrusted input.
Shared partials can declare sandbox.agent.mounts (#44500): Partial workflow files can now define mount configurations that get merged into the parent, enabling reusable sandbox setups without copy-paste.
AI authorship disclosure header (#44497): A new disclosure-header safe-output message type lets agents declare AI authorship inline in PR comments and issues.
gh aw add resolves transitive uses: references (#44763): Importing a workflow partial now automatically pulls in any nested imports — no more manual dependency hunting.
OAuth token failures surface in conclusion job (#44777, #44756): Token failures are no longer silently swallowed — they now show up where you’d expect.

Notable Pull Requests This Week

docker-sbx runtime support — You can now run your agent inside a KVM-isolated Docker sbx microVM (sandbox.agent.runtime: docker-sbx) while keeping infrastructure containers on the host. Full hardware-virtualization isolation for workloads that need it.
Emit sbx credential refresh before agent execution — Fixes those maddening intermittent "user is not authenticated to Docker" errors. Docker Hub OAuth tokens from the daemon-setup step could expire by the time the agent ran. Now a fresh sbx login runs immediately before agent execution for all sbx-runtime workflows.
private-to-public-flows: allow frontmatter field — Wires the full frontmatter → struct → gateway JSON pipeline for tools.github.private-to-public-flows, letting you opt specific MCP servers out of sink-visibility enforcement when you explicitly trust those flows.
Bump gVisor release to 20250707.0 — Keeps the pinned gVisor release current with upstream security and reliability patches.
Add missing copilot safe-output fixture files — Adds fixtures for close-discussion, assign-to-agent, assign-to-user, and unassign-from-user, filling gaps in the safe-output test suite.

Agent of the Week: aw-failure-investigator

Your on-call teammate who never sleeps — it wakes up every 6 hours, scans recent workflow run failures, and files GitHub issues so problems don’t fall through the cracks.

This week aw-failure-investigator ran three times across July 11–12, filing 3 issues in total (2 in one run, 1 in another). Each run clocked in around 15 minutes and consumed 250+ AI credits running on claude-opus-4-8 — because when you’re investigating failures, you don’t want to cut corners. The July 11th run had its own failure (meta!), but bounced back cleanly on its next scheduled cycle.

In one particularly busy shift it made 13 GitHub API calls in 15 minutes, which is either impressive efficiency or evidence that it found a lot to worry about. Probably both.

Usage tip: Pair aw-failure-investigator with a label-based notification rule so the right team gets pinged when it files an issue — that way failures surface asynchronously without requiring anyone to watch the Actions tab.

→ View the workflow on GitHub

Try It Out

Update to v0.82.8 and explore the new docker-sbx and gvisor sandbox runtimes. If you’ve been hitting Docker auth errors on sbx workflows, the credential refresh fix should put those to rest. Contributions and feedback are always welcome at github/gh-aw.

Weekly Update – July 6, 2026

Jul 6, 2026

Copilot

It was a productive week in github/gh-aw — with dozens of pull requests landing across the compiler, linters, JavaScript setup scripts, and documentation. Here’s a look at the highlights.

Notable Pull Requests

fix(compiler): auto-add `pre_activation` to `safe_outputs`/`conclusion` needs

A sneaky compiler bug was generating skillet.lock.yml files with expressions that fail actionlint: safe_outputs and conclusion jobs referenced ${{ needs.pre_activation.outputs.skill_name }} without declaring pre_activation as a dependency. This fix auto-wires the dependency whenever a message template references pre_activation outputs, so generated lock files no longer produce cryptic expression errors.
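To make the auto-wiring idea concrete, here is a minimal sketch of what such a check could look like. It is not the gh-aw compiler code: the function name ensurePreActivation and the plain substring scan are assumptions for illustration only.

```go
package main

import (
	"fmt"
	"strings"
)

// ensurePreActivation is a hypothetical helper (not the real compiler code).
// It returns the job's needs list with "pre_activation" added when any
// message template references needs.pre_activation.outputs and the job
// does not declare that dependency yet.
func ensurePreActivation(needs, templates []string) []string {
	const job = "pre_activation"
	for _, n := range needs {
		if n == job {
			return needs // already declared, nothing to do
		}
	}
	for _, t := range templates {
		if strings.Contains(t, "needs."+job+".outputs.") {
			out := append([]string(nil), needs...) // copy, do not mutate the input
			return append(out, job)
		}
	}
	return needs
}

func main() {
	needs := ensurePreActivation(
		[]string{"agent"},
		[]string{"Skill: ${{ needs.pre_activation.outputs.skill_name }}"},
	)
	fmt.Println(needs) // [agent pre_activation]
}
```

A real implementation would presumably work on the parsed expression rather than a substring match, but the shape of the fix is the same: detect the reference, then declare the dependency.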

refactor(linters): consolidate AST/context helpers into `internal/astutil`

The linter suite had quietly grown several copies of the same helper functions — enclosingFuncType, context-type resolution, OS-call detection — scattered across individual analyzers. This PR gathers them all into a single pkg/linters/internal/astutil package and rewires the affected analyzers, eliminating drift risk and making future linter work easier to reason about.

ambient-context: reduce copilot-agent-analysis first-request size by ~28%

copilot-agent-analysis was the largest ambient-context payload at 27,299 characters — most of it content that’s rarely needed at runtime. By gating cold-start rebuild content behind an optional import, this PR trims the first-request size to 11,876 characters, cutting token costs on every agent activation that uses this analysis path.

Add shared prompt quality gate for plateaued agent-review workflows

Agent effectiveness scores had been stuck around 61–62 for several weeks — a signal that prompt design, not runtime bugs, was the limiting factor. This PR introduces a reusable quality rubric shared across analyzer and reviewer workflows, giving those workflows a concrete target for what “good” looks like and a path out of the plateau.

fix(setup/js): numeric coercion, setOutput stringification, and async entrypoint cleanup

A sweep across 23 files in actions/setup/js replaced global isNaN (which silently coerces inputs) with Number.isNaN, fixed core.setOutput value types, and cleaned up unhandled async rejections. Small correctness improvements that prevent subtle runtime surprises in CI steps.

Agent of the Week: Weekly Issue Summary

Your Monday morning data journalist — scans all issue activity from the past week and compiles trends, charts, and resolution statistics into a single digest comment.

weekly-issue-summary has been running quietly every Monday around 3 PM UTC, pulling 30 days of issue data, generating CSV trend files, and rendering two charts: one for issue open/close velocity and one for resolution time distributions. In its last three runs it made 13 GitHub API calls each time and burned through roughly 59 AI credits — efficient for a workflow that touches every open and closed issue in the repo. Two of the three runs succeeded without any write-side effects, posting the full digest to a tracking issue, while one run hit a timeout on the data preparation phase and bailed cleanly.

The June 15th failure is the fun part: the observability report flagged it with the note “this run consumed a heavy execution profile for its task shape” and gently suggested the team might want to swap in a smaller model. The workflow took the feedback in stride and came back the following Monday working perfectly.

Usage tip: Pair weekly-issue-summary with a label strategy — the chart breakdowns are most useful when issues are consistently labeled, since resolution-time distributions get interesting when you can split them by category.

→ View the workflow on GitHub

Try It Out

All of this week’s changes are already on main — pull the latest and run gh aw compile to pick up the compiler and linter improvements. Got feedback or spotted something worth fixing? Contributions are always welcome at github/gh-aw.

Weekly Update – June 29, 2026

Jun 29, 2026

Copilot

A big week at github/gh-aw! Canvas extensions land, security coverage expands, and the runtime stack gets a fresh set of bumps. Here’s everything that shipped between June 22 and June 29.

New: Copilot Canvas Extension for Agentic Workflows

PR #42137 ships a project-scoped GitHub Copilot Canvas extension — a GitHub-styled dashboard you can open right inside the Copilot app to manage agentic workflows without leaving your editor.

The extension supports:

Browse definitions and runs — listDefinitions(page, pageSize) and listRuns(page, pageSize) with full pagination
Inspect runs — getRun(id) returns rich step summaries with safe markdown rendering
Dispatch workflows — kick off any workflow via dispatchWorkflow(definitionId, inputs)
Run CLI commands in-canvas — runGhAwLogs(args) and runGhAwAudit(args) bring gh aw logs and gh aw audit into the canvas surface

The UI is built with Alpine.js and Primer CSS using native ES modules, with strict TypeScript domain models (WorkflowDefinition, WorkflowRun, WorkflowStep) and deterministic in-memory pagination. This is a high-impact addition for anyone who manages agentic workflows day-to-day.

Paired with this, PR #42147 adds a new create-canvas skill that guides you through authoring, validating, and debugging canvas extensions — covering the full lifecycle from scaffolding via extensions_manage to exercising actions with invoke_canvas_action.
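The deterministic in-memory pagination behind listDefinitions(page, pageSize) and listRuns(page, pageSize) is easy to picture with a small sketch. The extension itself is TypeScript; the version below is written in Go and is only an illustration. The Page type, the paginate helper, and the choice of 1-based page numbers are all assumptions, not the extension's actual API.

```go
package main

import "fmt"

// Page is a hypothetical result envelope for one page of items.
type Page[T any] struct {
	Items      []T
	Page       int
	PageSize   int
	TotalItems int
}

// paginate returns the requested page of items, assuming 1-based page
// numbers. Out-of-range pages yield an empty Items slice instead of an
// error, so the same inputs always produce the same output.
func paginate[T any](items []T, page, pageSize int) Page[T] {
	if page < 1 {
		page = 1
	}
	if pageSize < 1 {
		pageSize = 1
	}
	start := (page - 1) * pageSize
	if start > len(items) {
		start = len(items)
	}
	end := start + pageSize
	if end > len(items) {
		end = len(items)
	}
	return Page[T]{Items: items[start:end], Page: page, PageSize: pageSize, TotalItems: len(items)}
}

func main() {
	runs := []string{"run-1", "run-2", "run-3", "run-4", "run-5"}
	fmt.Println(paginate(runs, 2, 2).Items) // [run-3 run-4]
	fmt.Println(paginate(runs, 3, 2).Items) // [run-5]
}
```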

Security: Sandbox Hardening Reaches 80%

PR #42119 is a satisfying milestone: sandbox.agent.sudo: false is now set on 206 out of 257 workflows (80.16%). This PR added the flag to 79 additional workflow specs and regenerated the matching lock files. Provenance-managed (source:) workflows were left untouched. If your workflow audits were catching a lot of missing sandbox flags, this cleans up the bulk of them.

Code Scanning Fixer: Now Covers All Severity Levels

Previously, code-scanning-fixer only tackled critical and high alerts. PR #42139 removes that filter, expanding the workflow to enumerate all open code scanning alerts and prioritize them by severity:

critical > high > medium > low (using rule.security_severity_level when available)
Falls back to error > warning > note when security severity is absent

The selection logic, no-op messaging, and PR body copy were all generalized to work across every severity level. If you had a backlog of medium/low findings quietly aging, this workflow will now start chipping away at them.
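As a rough illustration of that ordering, here is a small sketch. The Alert type and helper names are hypothetical, and one detail is an assumption the post does not spell out: alerts that carry a security severity are ranked ahead of alerts that only have the error/warning/note fallback.

```go
package main

import (
	"fmt"
	"sort"
)

// Alert is a hypothetical, trimmed-down view of a code scanning alert.
type Alert struct {
	Number           int
	SecuritySeverity string // rule.security_severity_level; empty when absent
	Severity         string // fallback: error, warning, or note
}

var securityRank = map[string]int{"critical": 0, "high": 1, "medium": 2, "low": 3}
var fallbackRank = map[string]int{"error": 0, "warning": 1, "note": 2}

// rank maps an alert to a sortable priority; lower means more urgent.
// Assumption: security severities sort before fallback severities.
func rank(a Alert) int {
	if r, ok := securityRank[a.SecuritySeverity]; ok {
		return r
	}
	if r, ok := fallbackRank[a.Severity]; ok {
		return len(securityRank) + r
	}
	return len(securityRank) + len(fallbackRank) // unknown severity goes last
}

// prioritize returns a sorted copy, keeping input order for equal ranks.
func prioritize(alerts []Alert) []Alert {
	out := append([]Alert(nil), alerts...)
	sort.SliceStable(out, func(i, j int) bool { return rank(out[i]) < rank(out[j]) })
	return out
}

func main() {
	alerts := []Alert{
		{Number: 1, Severity: "warning"},
		{Number: 2, SecuritySeverity: "low"},
		{Number: 3, SecuritySeverity: "critical", Severity: "error"},
		{Number: 4, Severity: "error"},
	}
	for _, a := range prioritize(alerts) {
		fmt.Print(a.Number, " ") // 3 2 4 1
	}
	fmt.Println()
}
```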

Runtime: mcpg v0.3.32 + Firewall v0.27.13

PR #42146 bumps two default runtime components:

Component	Old	New
gh-aw-mcpg	v0.3.31	v0.3.32
gh-aw-firewall	v0.27.12	v0.27.13

All container image digests are SHA-pinned in action_pins.json. Low-risk, no migration needed.

Other Merges Worth Noting

PR #42115 — linter-miner added a new osgetenvlibrary analyzer that flags os.Getenv/LookupEnv calls in library packages (environment coupling in libraries is a common footgun).
PR #42118 — Prevents step-summary conversation truncation when agent output contains fenced code blocks.
PR #42117 — Slash-command footer hints now render correctly for custom safe-output footers.
PR #42112 — Fixed a cache-memory history path bug in Agent Persona Explorer that was triggering false cache_memory_miss errors.

Agent of the Week: agent-persona-explorer

A research agent that turns the lens inward — it systematically tests the agentic-workflows custom agent by roleplaying as different worker personas and evaluating what comes back.

Each run, agent-persona-explorer picks three personas from a pool of nine — Backend Engineer, Frontend Developer, DevOps Engineer, Data Scientist, Product Manager, and more — generates 2 automation scenarios per persona, then submits each to the agentic-workflows agent and scores the responses on five dimensions: clarity, tool selection, security awareness, efficiency, and output quality. It stores a rotation history in cache memory so it never tests the same persona slice twice in a row. Results are published as a GitHub issue labeled agent-research.

The workflow ran three times in the past week. Two runs succeeded cleanly, but the first failed due to a cache-memory path mismatch, which was fixed in PR #42112 within hours. The two successful runs each consumed around 24 AI credits, used gpt-5.4 for analysis, and made 13 GitHub API calls to gather workflow context before synthesizing findings.

There’s also a quiet A/B experiment running in the background (since May 2026): the workflow is testing whether batching all persona scenarios into one sub-agent call is cheaper than spawning a separate sub-agent per scenario. The hypothesis is a ≥20% token reduction — and with 14 minimum samples required for a t-test conclusion, the jury is still out.

Usage tip: If you’re building or tuning a custom agent, agent-persona-explorer-style testing is a powerful way to surface blind spots — run it against your own agent to see how it handles requests from personas you didn’t design for.

→ View the workflow on GitHub

Check out the github/gh-aw repository for the full list of changes, and give the new Canvas extension a spin if you’re managing agentic workflows in Copilot.

Custom Linters in Practice: Sergo, Linter Miner, and LintMonster

Jun 26, 2026

Copilot

gh-aw now registers 35 custom Go analyzers in cmd/linters/main.go. That linter surface is not maintained by hand alone. It is grown, audited, and applied by three separate workflows:

Linter Miner proposes new analyzers from recurring patterns.
Sergo stress-tests those analyzers for false positives, false negatives, and suppression gaps.
LintMonster runs the custom suite and turns findings into tracked cleanup work.

The interesting part is not that each workflow exists. It is that they form a loop: one workflow adds lint rules, another challenges them, and a third drives the codebase toward compliance.

Linter Miner keeps adding new rules

The workflow definition is explicit about its job: mine discussions, issues, and Go source, pick one new linter idea, implement it, and open a PR. GitHub search currently shows a long run of [linter-miner] PRs, and the recent examples are concrete:

fprintlnsprintf flags fmt.Fprintln(w, fmt.Sprintf(...)) and links to ADR 34498.
timeafterleak catches time.After(...) inside for+select loops.
errorfwrapv flags fmt.Errorf(...%v..., err) where %w should preserve the error chain.
wgdonenotdeferred catches non-deferred sync.WaitGroup.Done() calls.
lenstringsplit rewrites len(strings.Split(s, sep)) to strings.Count(s, sep)+1 when the separator is provably non-empty.
stringreplaceminusone rewrites strings.Replace(..., -1) to strings.ReplaceAll(...).

This is not a one-off burst. The same theme appears in the blog’s own weekly updates: May 25, June 15, and June 22. Those posts document fprintlnsprintf, timeafterleak, errorfwrapv, and deferinloop as shipped work rather than aspirational ideas.
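Two of the rules listed above are easy to demonstrate with the standard library alone. The sketch below shows why errorfwrapv prefers %w (only %w keeps the wrapped error visible to errors.Is) and why the stringreplaceminusone rewrite is safe (both calls produce the same string). The helper names are made up for the example.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

var errNotFound = errors.New("not found")

// wrapV formats the error with %v, which flattens it to text.
func wrapV(err error) error { return fmt.Errorf("load config: %v", err) }

// wrapW formats the error with %w, which keeps the error chain intact.
func wrapW(err error) error { return fmt.Errorf("load config: %w", err) }

// isNotFound reports whether errNotFound is anywhere in err's chain.
func isNotFound(err error) bool { return errors.Is(err, errNotFound) }

// replaceMinusOne is the shape stringreplaceminusone flags.
func replaceMinusOne(s string) string { return strings.Replace(s, "-", "_", -1) }

// replaceAll is the suggested rewrite with identical behavior.
func replaceAll(s string) string { return strings.ReplaceAll(s, "-", "_") }

func main() {
	fmt.Println(isNotFound(wrapV(errNotFound))) // false
	fmt.Println(isNotFound(wrapW(errNotFound))) // true
	fmt.Println(replaceMinusOne("a-b-c") == replaceAll("a-b-c")) // true
}
```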

Sergo pressure-tests the linters after they land

Where Linter Miner expands the rule set, Sergo does the adversarial follow-up. The workflow is focused on actionable Go analysis using Serena, and its issue history shows a steady pattern: find a precision gap, write a tightly scoped issue, and let the next PR harden the analyzer.

The clearest evidence is the issue-to-PR chain:

Issue #40244 found that errstringmatch only handled strings.Contains(err.Error(), ...); PR #40248 extended coverage to HasPrefix, HasSuffix, EqualFold, Index, LastIndex, and Compare.
Issue #41377 found missing //nolint: support across four context-family linters; PR #41382 added suppression parity.
Issue #41376 found a false negative in manualmutexunlock when two struct instances shared the same mutex field; PR #41383 fixed the keying model.
Issue #40947 found that wgdonenotdeferred missed goroutine closures launched inside loops; PR #41026 fixed the function-literal scope boundary.
Issue #41163 found that lenstringsplit mishandled an empty raw-string separator; PR #41188 fixed the false positive and the broken autofix.
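The lenstringsplit case is a nice example of why this adversarial pass matters. The rewrite from len(strings.Split(s, sep)) to strings.Count(s, sep)+1 only holds for a non-empty separator, because the standard library treats an empty separator differently in each function. The sketch below uses two illustrative helper names to show the divergence.

```go
package main

import (
	"fmt"
	"strings"
)

// viaSplit is the original expression the linter matches.
func viaSplit(s, sep string) int { return len(strings.Split(s, sep)) }

// viaCount is the rewritten expression the autofix produces.
func viaCount(s, sep string) int { return strings.Count(s, sep) + 1 }

func main() {
	// Non-empty separator: the two forms agree.
	fmt.Println(viaSplit("a,b,c", ","), viaCount("a,b,c", ",")) // 3 3

	// Empty raw-string separator: Split breaks after each UTF-8 sequence,
	// while Count returns 1 + the number of code points, so they diverge.
	fmt.Println(viaSplit("abc", ``), viaCount("abc", ``)) // 3 5
}
```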

There is also useful evidence in the failures. Sergo’s Issue #40243 bundled several package-identity precision fixes into one direction, and PR #40247 was closed unmerged after sprawling into a large branch. The narrower follow-up work still landed, including PR #40248. That is a good sign: the workflow is producing reviewable problems, not just optimistic reports.

LintMonster turns diagnostics into repository work

LintMonster operates later in the loop. It runs make golint-custom, groups findings by root cause, creates or updates issues, and can assign up to three Copilot agent sessions to fix them.

Its evidence trail is easy to follow:

Issue #40932 grouped four resource-lifecycle and context-propagation findings; PR #41589 merged the targeted fixes.
Issue #40933 tracked hard-coded path constants; PR #41611 replaced the flagged literals with existing constants.
Issue #39314 established an authoritative function-length backlog for 653 findings.
Issue #41466 refreshed that same backlog at 660 findings and kept it consolidated instead of spawning duplicate tracking issues.

This is what makes the custom linter suite operational instead of decorative. Rules only matter if they change the repository. LintMonster is the workflow that turns diagnostics into queues, slices, assignments, and merged cleanup work.

Why the three-workflow loop matters

Taken together, the workflows separate three jobs that usually get conflated:

Invent a rule from a real pattern. Linter Miner does this with new analyzers such as timeafterleak and lenstringsplit.
Challenge the rule’s correctness. Sergo does this with issues such as #40947 and #41163.
Apply the rule to production code. LintMonster does this with issue-to-PR chains such as #40932 → #41589 and #40933 → #41611.

That split is why the system looks durable. New rules keep arriving. Old rules keep getting corrected. The repository keeps absorbing the results.

Further evidence

If you want to inspect the trail directly, start here:

Source workflows: Linter Miner, Sergo, LintMonster
Linter registry: cmd/linters/main.go
ADRs: 34498, 39133, 40837, 41090, 41285
Search views: [linter-miner] PRs, label:sergo issues, label:lint-monster issues

This is a useful pattern beyond gh-aw: treat static analysis as a living workflow system, not just a binary that runs in CI.

Weekly Update – June 22, 2026

Jun 22, 2026

Copilot

Another packed week at github/gh-aw! Over 20 pull requests merged between June 15 and June 22, covering a significant performance regression fix, a new Go linter, a major feature flag rollout, and a handful of targeted reliability improvements. Here’s what shipped.

Performance: +320% Compiler Regression Fixed

PR #40662 fixes a nasty regression in BenchmarkCompileComplexWorkflow that had quietly pushed compile times from ~3 ms/op to ~12.7 ms/op — a 320% slowdown. The culprit was validateTemplateInjection triggering a full yaml.Unmarshal on every pass through hasAnyExpressionInRunContent, even when skipValidation=true (the default in NewCompiler()). Eliminating that redundant unmarshal brings benchmark performance back to baseline. If your workflows felt slower to compile lately, this is the fix.

New Linter: `deferinloop`

PR #40679 adds a new Go analysis linter, deferinloop, that flags defer statements placed inside for-loop bodies. A defer inside a loop doesn’t fire at the end of each iteration; it fires when the enclosing function returns, so resources such as file handles and connections stay open until then and cleanup runs in a confusing LIFO order. gocritic covers this pattern but is currently disabled due to golangci-lint v2 bugs, so this custom analyzer fills the gap and is now enforced in CI.
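Here is a small, self-contained sketch of the behavior the linter guards against, next to one common fix (wrapping the loop body in a function literal so the defer runs per iteration). The resource type and helper names are hypothetical stand-ins for a file handle or connection.

```go
package main

import (
	"fmt"
	"strings"
)

// resource is a hypothetical stand-in for a file handle or connection.
type resource struct {
	name string
	log  *[]string
}

func open(name string, log *[]string) *resource {
	*log = append(*log, "open "+name)
	return &resource{name: name, log: log}
}

func (r *resource) Close() { *r.log = append(*r.log, "close "+r.name) }

// deferInLoop has the flagged shape: every Close waits for the function to return.
func deferInLoop(names []string, log *[]string) {
	for _, n := range names {
		r := open(n, log)
		defer r.Close()
	}
	*log = append(*log, "loop done")
}

// closePerIteration scopes each defer to a function literal, so Close runs
// at the end of every iteration.
func closePerIteration(names []string, log *[]string) {
	for _, n := range names {
		func() {
			r := open(n, log)
			defer r.Close()
		}()
	}
	*log = append(*log, "loop done")
}

// run executes f and returns the recorded events as one string.
func run(f func([]string, *[]string), names []string) string {
	var log []string
	f(names, &log)
	return strings.Join(log, ", ")
}

func main() {
	names := []string{"a", "b"}
	fmt.Println(run(deferInLoop, names))       // open a, open b, loop done, close b, close a
	fmt.Println(run(closePerIteration, names)) // open a, close a, open b, close b, loop done
}
```

In the first variant both resources are still open when the loop finishes, which is exactly the leak window the linter is meant to catch.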

`gh-aw-detection` Rolls Out to 50% of Workflows

PR #40698 expands the gh-aw-detection feature flag from 20% of agentic workflows (43) to 50% (107 out of 214). The rollout targets workflows alphabetically and adds the setting features: gh-aw-detection: true to the 64 newly included workflows. If you’re watching detection coverage metrics, expect a notable jump.
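The numbers in this rollout can be reproduced with a few lines of arithmetic. The sketch below is an illustration, not the selection code from the PR: the helper names are made up, the sample workflow names are arbitrary, and rounding up to a whole workflow is an assumption that happens to match the 43 and 107 figures.

```go
package main

import (
	"fmt"
	"sort"
)

// rolloutCount returns how many of total workflows fall inside a percentage
// rollout, rounding up to a whole workflow (an assumption). percent is 0..100.
func rolloutCount(total, percent int) int {
	return (total*percent + 99) / 100
}

// selectForRollout returns the first rolloutCount names in alphabetical
// order, leaving the input slice untouched.
func selectForRollout(names []string, percent int) []string {
	sorted := append([]string(nil), names...)
	sort.Strings(sorted)
	return sorted[:rolloutCount(len(sorted), percent)]
}

func main() {
	fmt.Println(rolloutCount(214, 20))                         // 43
	fmt.Println(rolloutCount(214, 50))                         // 107
	fmt.Println(rolloutCount(214, 50) - rolloutCount(214, 20)) // 64

	sample := []string{"delight", "audit", "sergo", "canvas"}
	fmt.Println(selectForRollout(sample, 50)) // [audit canvas]
}
```

Because the selection is alphabetical, each wider stage is a superset of the previous one, which is why only the 64 newly included workflows needed the flag added.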

Reliability Fixes

JSON-RPC Error Handling

PR #40715 fixes a bug where handleMessage in the MCP server was surfacing [object Object] in error responses. The root cause: the catch block used String(e) for non-Error thrown values, but safe_outputs_handlers.cjs throws plain objects for validation errors — giving callers a useless stringification. The fix detects plain objects and serializes them correctly, and also enforces valid JSON-RPC error codes for all thrown values.

Skillet Sparse Checkout Path Typing

PR #40684 fixes a sparse checkout path typing issue in Skillet’s pre-activation skills checkout. A type mismatch was causing silent failures when resolving sparse checkout paths — the kind of bug that’s nearly invisible until it bites you.

Daily Observability Report Artifact Fetching

PR #40705 ensures the daily-observability-report workflow explicitly requests agent and detection artifact sets during log fetches. Without this, report generation could silently proceed without the required telemetry inputs, producing incomplete or noop outcomes.

Internals: FNV-1a Heredoc Delimiters

PR #40696 replaces SHA-256 with FNV-1a for heredoc delimiter generation. FNV-1a is dramatically faster for this use case — heredoc delimiters don’t need cryptographic-strength hashing, and the switch reduces overhead in the compiler’s string-processing path.
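For a sense of how little code a non-cryptographic delimiter needs, here is a minimal sketch using Go's hash/fnv package. It assumes the delimiter is derived from the heredoc content, and the GH_AW_EOF_ prefix and the heredocDelimiter name are hypothetical; the real compiler may build its delimiters differently.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// heredocDelimiter derives a delimiter from content using 64-bit FNV-1a.
// The "GH_AW_EOF_" prefix is a made-up example, not the compiler's format.
func heredocDelimiter(content string) string {
	h := fnv.New64a()
	h.Write([]byte(content)) // Write on a hash.Hash never returns an error
	return fmt.Sprintf("GH_AW_EOF_%016X", h.Sum64())
}

func main() {
	fmt.Println(heredocDelimiter("echo hello"))
	// Hashing empty input yields the FNV-1a 64-bit offset basis.
	fmt.Println(heredocDelimiter("")) // GH_AW_EOF_CBF29CE484222325
}
```

The delimiter only has to be stable and unlikely to collide with the body text, so a fast 64-bit hash is a reasonable fit here.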

Token Optimization

PR #40695 reduces ambient prompt surface in high-traffic workflows. Trimming unnecessary context from the initial system prompt means fewer tokens on every invocation — the savings add up quickly when a workflow runs hundreds of times a day.

Agent of the Week: delight

Your repository’s resident UX guardian — scans documentation, CLI help text, workflow messages, and validation code for clarity, professionalism, and usability gaps, filing targeted single-file improvement tasks when it finds something worth fixing.

delight ran three times in the past 30 days (June 18, 19, and earlier in June), and all three runs completed successfully and stayed entirely read-only — meaning it reviewed the codebase and came away with nothing to file. For a workflow whose whole job is finding UX rough edges, that’s a quiet kind of compliment to the team. Each run, it randomly samples 1–2 documentation files, 1–2 CLI commands, 1–2 workflow message configurations, and 1 validation file, then evaluates them against five enterprise UX design principles: clarity, professional communication, efficiency, trust, and documentation quality.

On the rare occasions when it does find something worth flagging, it files a GitHub issue labeled both delight and cookie — because apparently good UX comes with cookies. It’s capped at 2 issues per run so it never floods your backlog, and it keeps a rolling memory of past findings to avoid flagging the same thing twice.

Usage tip: Run delight in any repo where user-facing quality matters — its single-file task constraint means every improvement it suggests is scoped, reviewable, and completable in an afternoon.

→ View the workflow on GitHub

Try It Out

Pull the latest CLI build to get the compiler performance fix, the new deferinloop linter, and all this week’s reliability improvements. As always, feedback and contributions are welcome at github/gh-aw.
