Excellent journey! Now it’s time to plunge into the observatory - the nerve center of Peli’s Agent Factory! Where we watch everything and know all!
In our previous post, we explored quality and hygiene workflows - the vigilant caretakers that investigate failed CI runs, detect schema drift, and catch breaking changes before users do. These workflows maintain codebase health by spotting problems before they escalate.
But here’s a question: when you’re running dozens of AI agents, how do you know if they’re actually working well? How do you spot performance issues, cost problems, or quality degradation? That’s where metrics and analytics workflows come in - they’re the agents that monitor other agents, turning raw activity data into actionable insights. This is where we got meta and built our central nervous system.
Audit Workflows - A meta-agent that audits all the other agents’ runs - very Inception
Here’s where things got meta: we built agents to monitor agents. The Metrics Collector became our central nervous system, gathering performance data that feeds into higher-level orchestrators. What we learned: you can’t optimize what you don’t measure. The Portfolio Analyst was eye-opening - it identified workflows that were costing us money unnecessarily (turns out some agents were way too chatty with their LLM calls).
These workflows taught us that observability isn’t optional when you’re running dozens of AI agents - it’s the difference between a well-oiled machine and an expensive black box.
Then edit and remix the workflow specifications to meet your needs, recompile using gh aw compile, and push to your repository. See our Quick Start for further installation and setup instructions.
Ah, splendid! Welcome back to Peli’s Agent Factory! Come, let me show you the chamber where vigilant caretakers investigate faults before they escalate!
In our previous post, we explored issue and PR management workflows.
Now let’s shift from collaboration ceremony to fault investigation.
While issue workflows help us handle what comes in, fault investigation workflows act as vigilant caretakers - spotting problems before they escalate and keeping our codebase healthy. These are the agents that investigate failed CI runs, detect schema drift, and catch breaking changes before users do.
The CI Doctor was one of our most important workflows. Instead of drowning in CI failure notifications, we now get timely, investigated failures with actual diagnostic insights. The agent doesn’t just tell us something broke - it analyzes logs, identifies patterns, searches for similar past issues, and even suggests fixes - often before a human has read the failure notification. We learned that agents excel at the tedious investigation work that humans find draining.
The Schema Consistency Checker caught drift that would have taken us days to notice manually.
These “hygiene” workflows became our first line of defense, catching issues before they reached users.
The CI Doctor has inspired a growing range of similar workflows inside GitHub, where agents proactively perform in-depth investigations of site incidents and failures. This is the future of operational excellence: AI agents kicking in immediately to investigate in depth, enabling a faster organizational response.
Then edit and remix the workflow specifications to meet your needs, recompile using gh aw compile, and push to your repository. See our Quick Start for further installation and setup instructions.
Next up, we look at workflows that help us understand whether the agent collection as a whole is working well. That’s where metrics and analytics workflows come in.
Ah! Let’s discuss the art of managing issues and pull requests at Peli’s Agent Factory! A most delicious topic indeed!
In our previous post, we explored documentation and content workflows - agents that maintain glossaries, technical docs, slide decks, and blog content. We learned how we took a heterogeneous approach to documentation agents - some workflows generate content, others maintain it, and still others validate it.
Now let’s talk about the daily rituals of software development: managing issues and pull requests. GitHub provides excellent primitives for collaboration, but there’s ceremony involved - linking related issues, merging main into PR branches, assigning work, closing completed sub-issues, optimizing templates. These are small papercuts individually, but they can add up to significant friction.
The Issue Arborist automatically links related issues, building a dependency tree we’d never maintain manually.
The Issue Monster became our task dispatcher for AI agents - it assigns one issue at a time to Copilot agents, preventing the chaos of parallel work on the same codebase.
Mergefest eliminates the “please merge main” dance that happens on long-lived PRs.
The Issue Template Optimizer analyzes which fields in our templates actually get filled out and suggests improvements (“nobody uses the ‘Expected behavior’ field, remove it”).
Issue and PR management workflows don’t replace GitHub’s features; they enhance them, removing ceremony and making collaboration feel smoother.
Then edit and remix the workflow specifications to meet your needs, recompile using gh aw compile, and push to your repository. See our Quick Start for further installation and setup instructions.
Step right up, step right up, and enter the documentation chamber of Peli’s Agent Factory! Pure imagination meets technical accuracy in this most delightful corner of our establishment!
In our previous posts, we explored autonomous cleanup agents - workflows that continuously improve code quality by simplifying complexity, refactoring structure, polishing style, and maintaining overall repository health. These agents never take a day off, quietly working to make our codebase better.
Now let’s address one of software development’s eternal challenges: keeping documentation accurate and up-to-date. Code evolves rapidly; docs… not so much. Terminology drifts, API examples become outdated, slide decks grow stale, and blog posts reference deprecated features. The question isn’t “can AI agents write good documentation?” but rather “can they maintain it as code changes?” Documentation and content workflows challenge conventional wisdom about AI-generated technical content. Spoiler: the answer involves human review, but it’s way better than the alternative (no docs at all).
Blog Auditor - Verifies blog posts are accessible and contain expected content
Documentation is where we challenged conventional wisdom. Can AI agents write good documentation?
The Technical Doc Writer generates API docs from code, but more importantly, it maintains them - updating docs when code changes. The Glossary Maintainer caught terminology drift (“we’re using three different terms for the same concept”).
The Slide Deck Maintainer keeps our presentation materials current without manual updates.
The Multi-device Docs Tester uses Playwright to verify our documentation site works across phones, tablets, and desktops - testing responsive layouts, accessibility, and interactive elements. It catches visual regressions and layout issues that only appear on specific screen sizes.
The Blog Auditor ensures our blog posts stay accurate as the codebase evolves - it flags outdated code examples and broken links.
AI-generated docs need human/agent review, but they’re dramatically better than no docs (which is often the alternative). Validation can be automated to a large extent, freeing writers to focus on topic selection, content shaping, clarity, tone, and accuracy.
In this collection of agents, we took a heterogeneous approach - some workflows generate content, others maintain it, and still others validate it. Other approaches are possible - all of these tasks could be rolled into a single agent. We found it easier to explore the space with multiple agents that separate concerns, and that encouraged us to use agents for other communication outputs such as blogs and slides.
Then edit and remix the workflow specifications to meet your needs, recompile using gh aw compile, and push to your repository. See our Quick Start for further installation and setup instructions.
In our previous posts, we’ve explored autonomous cleanup agents that continuously improve code: simplifying complexity, refactoring structure, and polishing style. Now we complete the picture with agents that take a holistic view - analyzing dependencies, type safety patterns, and overall repository quality.
The Go Fan is perhaps the most uniquely characterized workflow in the factory - an “enthusiastic Go module expert” who performs daily deep-dive reviews of the project’s Go dependencies. This isn’t just dependency scanning - it’s thoughtful analysis of how well we’re using the tools we’ve chosen.
Most dependency tools focus on vulnerabilities or outdated versions. Go Fan asks deeper and more positive questions: Are we using this module’s best features? Have recent updates introduced better patterns we should adopt? Could we use a more appropriate module for this use case? Are we following the module’s recommended practices?
Go Fan uses an intelligent selection algorithm. It extracts direct dependencies from go.mod, fetches GitHub metadata for each dependency including last update time, sorts by recency to prioritize recently updated modules, uses round-robin selection to cycle through modules ensuring comprehensive coverage, and maintains persistent memory through cache-memory to track which modules were recently reviewed.
This ensures recently updated modules get reviewed first since new features might be relevant, all modules eventually get reviewed so nothing is forgotten, and reviews don’t repeat unnecessarily thanks to cache tracking.
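The workflow itself is specified in natural language, but the selection logic is easy to picture in ordinary code. Here’s a minimal Go sketch of the idea, with hypothetical modInfo and recentlyReviewed names standing in for the go.mod data and cache-memory state the real workflow uses:

```go
// Sketch only: a recency-sorted, round-robin pick over direct dependencies.
package gofan

import (
	"sort"
	"time"
)

// modInfo is a hypothetical record combining a go.mod module path with the
// last-update timestamp fetched from GitHub metadata.
type modInfo struct {
	Path      string
	UpdatedAt time.Time
}

// pickNextModule returns the most recently updated module that hasn't been
// reviewed lately; if everything has been reviewed, it wraps around so every
// module eventually comes up again.
func pickNextModule(mods []modInfo, recentlyReviewed map[string]bool) (modInfo, bool) {
	if len(mods) == 0 {
		return modInfo{}, false
	}
	// Recently updated modules first: fresh releases are the most likely to
	// contain new features or changed recommendations worth reviewing.
	sort.Slice(mods, func(i, j int) bool {
		return mods[i].UpdatedAt.After(mods[j].UpdatedAt)
	})
	for _, m := range mods {
		if !recentlyReviewed[m.Path] {
			return m, true
		}
	}
	// Everything was reviewed recently; start the cycle over.
	return mods[0], true
}
```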
For each selected module, Go Fan researches the module’s repository including recent releases and changelog entries, documentation and best practices, and example usage patterns. It analyzes the project’s actual usage by using Serena to find all imports and usage, examining actual code patterns, and identifying gaps between best practices and current usage. Then it generates recommendations suggesting better usage patterns, highlighting new features worth adopting, and identifying potential issues or anti-patterns. Finally, it saves summaries under scratchpad/mods/ and opens GitHub Discussions with findings, complete with specific code examples and recommendations.
The kinds of insights Go Fan produces are quite specific: “The Lipgloss update added adaptive color support - we’re still using fixed colors in 12 places,” or “Cobra now recommends using ValidArgsFunction instead of ValidArgs - we should migrate,” or “We’re using low-level HTTP client code - the go-gh module we already have provides better abstractions.”
The 30-minute timeout gives Go Fan substantial time to do deep research, making each review thorough and actionable.
The Typist analyzes Go type usage patterns with a singular focus: improving type safety. It hunts for untyped code that should be strongly typed, and identifies duplicated type definitions that create confusion.
Typist looks for untyped usages: interface{} or any where specific types would be better, untyped constants that should have explicit types, and type assertions that could be eliminated with better design. It also hunts for duplicated type definitions - the same types defined in multiple packages, similar types with different names, and type aliases that could be unified.
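To make that concrete, here’s an illustrative (not real) example of the kind of finding Typist raises - an untyped map replaced by a named struct:

```go
package typist

// Before: an untyped bag of values. Callers must remember which keys exist
// and what types they hold, and mistakes only surface at runtime.
func buildReportLoose(status string, attempts int) map[string]any {
	return map[string]any{"status": status, "attempts": attempts}
}

// After: the shape is explicit, typos become compile errors, and the type
// itself documents intent.
type Report struct {
	Status   string
	Attempts int
}

func buildReport(status string, attempts int) Report {
	return Report{Status: status, Attempts: attempts}
}
```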
Using grep patterns to find type definitions, interface{} usage, and any usage combined with Serena’s semantic analysis, Typist discovers type definitions across the codebase, identifies semantic duplicates that are structurally similar, analyzes usage patterns where untyped code appears, and generates specific actionable refactoring recommendations.
Strong typing catches bugs at compile time, documents intent, and makes code easier to understand. But as codebases evolve, quick prototypes use any for flexibility, similar types emerge in different packages, and type information gets lost in translation.
Typist trails behind development, systematically identifying opportunities to strengthen type safety without slowing down feature development.
Typist creates discussions rather than issues because type safety improvements often involve architectural decisions that benefit from team conversation. Each discussion includes specific file references and line numbers, current problematic patterns, suggested type definitions, and migration path recommendations.
Today’s hybrid languages like Go, C#, and F# support both strong and dynamic typing. Treating strong typing as an area of continuous improvement is a particularly novel insight: rather than enforcing strict typing upfront, we can develop quickly with flexibility, then let autonomous agents like Typist trail behind, strengthening type safety over time.
The Functional Pragmatist systematically identifies opportunities to apply moderate, tasteful functional programming techniques to improve code clarity, safety, and maintainability. Unlike dogmatic functional approaches, this workflow balances pragmatism with functional purity.
The workflow focuses on seven key patterns: immutability (making data immutable where there’s no existing mutation), functional initialization (using composite literals and declarative patterns), transformative operations (leveraging map/filter/reduce approaches), functional options pattern (using option functions for flexible configuration), avoiding shared mutable state (eliminating global variables), pure functions (extracting calculations from side effects), and reusable logic wrappers (creating higher-order functions for retry, logging, caching).
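As one example among these patterns, the functional options pattern looks roughly like the following sketch; the Server and WithTimeout names are illustrative, not taken from the codebase:

```go
package pragmatist

import "time"

// Server is a hypothetical type configured through functional options.
type Server struct {
	addr    string
	timeout time.Duration
}

// Option mutates a Server during construction only; afterwards the value can
// be treated as effectively immutable.
type Option func(*Server)

func WithTimeout(d time.Duration) Option {
	return func(s *Server) { s.timeout = d }
}

// NewServer keeps a stable signature: new configuration knobs become new
// Option functions rather than breaking changes to the constructor.
func NewServer(addr string, opts ...Option) *Server {
	s := &Server{addr: addr, timeout: 30 * time.Second}
	for _, opt := range opts {
		opt(s)
	}
	return s
}
```

A caller might then write NewServer("localhost:8080", WithTimeout(5*time.Second)) and never be broken when new options are added later.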
The enhancement process follows a structured approach. During discovery, it searches for variables that could be immutable, imperative loops that could be transformative, initialization anti-patterns, constructors that could use functional options, shared mutable state (global variables and mutexes), functions with side effects that could be pure, and repeated logic patterns that could use wrappers.
For each opportunity, it scores by safety improvement (reduces mutation risk), clarity improvement (makes code more readable), testability improvement (makes code easier to test), and risk level (lower risk gets higher priority). Using Serena for deep analysis, it understands full context, identifies dependencies and side effects, verifies no hidden mutations, and designs specific improvements.
Implementation examples include converting mutable initialization to immutable patterns (using composite literals instead of incremental building), transforming constructors to use functional options (allowing extensible APIs without breaking changes), eliminating global state through explicit parameter passing, extracting pure functions from impure code (separating calculations from I/O), and creating reusable wrappers like Retry[T] with exponential backoff, WithTiming[T] for performance logging, and Memoize[K,V] for caching expensive computations.
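A reusable wrapper of this kind might look like the following sketch - a generic retry helper with exponential backoff; the exact signature of the workflow’s Retry[T] is an assumption here:

```go
package pragmatist

import "time"

// Retry calls fn up to attempts times, doubling the wait between tries, and
// returns the first successful result or the last error seen.
func Retry[T any](attempts int, initialWait time.Duration, fn func() (T, error)) (T, error) {
	var lastErr error
	wait := initialWait
	for i := 0; i < attempts; i++ {
		v, err := fn()
		if err == nil {
			return v, nil
		}
		lastErr = err
		time.Sleep(wait)
		wait *= 2 // exponential backoff between attempts
	}
	var zero T
	return zero, lastErr
}
```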
The workflow applies principles of immutability first (variables are immutable unless mutation is necessary), declarative over imperative (initialization expresses “what” not “how”), transformative over iterative (data transformations use functional patterns), explicit parameters (pass dependencies rather than using globals), pure over impure (separate calculations from side effects), and composition over complexity (build behavior from simple wrappers).
What makes this workflow particularly effective is its pragmatism. It doesn’t force functional purity at the cost of clarity. Go’s simple, imperative style is respected - sometimes a for-loop is clearer than a functional helper. The workflow only adds abstraction where it genuinely improves code, focusing on low-risk changes like converting var x T; x = value to x := value, using composite literals, and extracting pure helper functions.
The result is code that’s safer (reduced mutation surface area), more testable (pure functions need no mocks), more maintainable (functional patterns are easier to reason about), and more extensible (functional options allow API evolution). The workflow runs on a schedule (Tuesday and Thursday mornings), systematically improving functional patterns across the entire codebase over time.
The Repository Quality Improver takes the widest view of any workflow we’ve discussed. Rather than focusing on a specific aspect (simplicity, refactoring, styling, types), it selects a focus area each day and analyzes the repository from that perspective.
The workflow uses cache memory to track which areas it has recently analyzed, ensuring diverse coverage through a careful distribution: roughly 60% custom areas exploring repository-specific concerns that emerge from analysis, 30% standard categories covering fundamentals like code quality, documentation, testing, security, and performance, and 10% reuse occasionally revisiting areas for consistency. This distribution ensures novel insights from creative focus areas, systematic coverage of fundamental concerns, and periodic verification that previous improvements held.
Standard categories include code quality and static analysis, documentation completeness, testing coverage and quality, security best practices, and performance optimization. Custom areas are repository-specific: error message consistency, CLI flag naming conventions, workflow YAML generation patterns, console output formatting, and configuration file validation.
The analysis workflow loads history by checking cache for recent focus areas, selects the next area based on rotation strategy, spends 20 minutes on deep analysis from that perspective, generates discussions with actionable recommendations, and saves state by updating cache with this run’s focus area.
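The rotation itself can be pictured as a small weighted choice. Here’s a rough Go sketch, assuming hypothetical focusArea values and a history loaded from cache memory; the real workflow expresses this in its natural-language prompt, not in code:

```go
package quality

import "math/rand"

type focusArea string

var standardAreas = []focusArea{
	"code quality", "documentation", "testing", "security", "performance",
}

// nextFocusArea approximates the 60/30/10 split: mostly repository-specific
// custom areas, sometimes a standard category, occasionally a revisit of the
// most recent area to verify that earlier improvements held.
// customAreas is assumed to be non-empty.
func nextFocusArea(history []focusArea, customAreas []focusArea) focusArea {
	switch r := rand.Float64(); {
	case r < 0.10 && len(history) > 0:
		return history[len(history)-1] // ~10%: revisit for consistency
	case r < 0.40:
		return standardAreas[rand.Intn(len(standardAreas))] // ~30%: standard
	default:
		return customAreas[rand.Intn(len(customAreas))] // ~60%: custom
	}
}
```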
A repository is more than the sum of its parts. Individual workflows optimize specific concerns, but quality emerges from balance. Is error handling consistent across the codebase? Do naming conventions align throughout? Are architectural patterns coherent? Does the overall structure make sense?
The Repository Quality Improver looks for these cross-cutting concerns that don’t fit neatly into “simplify” or “refactor” but nonetheless impact overall quality.
Together, these workflows complete the autonomous improvement picture. Go Fan ensures our dependencies stay fresh and well-used, Typist systematically strengthens type safety, Functional Pragmatist applies moderate functional techniques for clarity and safety, and Repository Quality Improver maintains overall coherence.
Combined with our earlier workflows covering simplicity, refactoring, and style, we now have agents that continuously improve code at every level: the Terminal Stylist ensures beautiful output at the line level, Code Simplifier removes complexity at the function level, Semantic Function Refactor improves organization at the file level, Go Pattern Detector enforces consistency at the pattern level, Functional Pragmatist applies functional patterns for clarity and safety, Typist strengthens type safety at the type level, Go Fan optimizes dependencies at the module level, and Repository Quality Improver maintains coherence at the repository level.
This is the future of code quality: not periodic cleanup sprints, but continuous autonomous improvement across every dimension simultaneously.
Then edit and remix the workflow specifications to meet your needs, recompile using gh aw compile, and push to your repository. See our Quick Start for further installation and setup instructions.