Stu Kennedy · stu.kennedy@multiverse.io · May 2026

AI Engineering
at Scale

What happens when cost stops being the constraint —
and agents become your entire engineering org.

~100
Parallel Agents
Every
Commit Reviewed
6mo
Stale Issues Closed
Zero
Security Gaps Missed
The Premise
"How would we build software in the future
if tokens don't matter?" — Peter Steinberger, on building OpenClaw

This isn't speculative. One team is running ~100 cloud agents continuously —
reviewing code, triaging issues, hunting bugs, shipping fixes.
Here are the patterns they use — and how any AI engineering team can adopt them.

10 Production Patterns

The agent fleet running a modern open-source project.

01

Continuous Code Review

Every PR, every commit — agents review before humans touch it.

02

Security Gate

Dedicated security review on every commit. Humans miss things.

03

Stale Issue Resolution

6-month-old issues matched to recent fixes and auto-closed.

04

Issue Triage & Clustering

Deduplicate issues, find clusters, surface the pressing ones.

05

Autonomous Fix PRs

New issues matched to project vision → auto-generated PRs.

06

Spam & Abuse Defense

Scanning comments, blocking bad actors, keeping signal high.

07

Performance Regression

Benchmark agents that catch regressions and alert the team.

08

Meeting-Driven Agents

Listen in meetings, start work on discussed features in real-time.

09

Ephemeral Reproduction

Spin up disposable machines, reproduce bugs, record evidence.

10

Functional Decomposition

Split projects into units for targeted bug & security scanning.

Pattern 01

Continuous Code Review

Every PR and every commit reviewed by agents — before a human ever opens the file.

Traditional Review

  • Humans review when they have time
  • Bottleneck at senior engineers
  • Context-switching cost is high
  • Inconsistent depth across reviewers
  • Security review is a separate process

Agent-Fleet Review

  • Every commit reviewed within seconds
  • Consistent standards, no fatigue
  • Parallel review — ~100 agents at once
  • Security baked into every review pass
  • Humans approve, agents surface issues
The key insight: This isn't replacing human review — it's augmenting it. Humans make final decisions. Agents ensure nothing is missed.
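The fan-out step can be sketched in a few lines. This is a minimal illustration, not the team's actual implementation: the agent names and the `run_agent` body are placeholders for calls into whatever agent runtime you use (Codex, Claude Code, or custom).

```python
# Sketch: fan one commit out to several review agents in parallel.
from concurrent.futures import ThreadPoolExecutor

# Hypothetical agent roster -- a real fleet would be far larger.
REVIEW_AGENTS = ["style", "correctness", "security", "performance"]

def run_agent(agent: str, commit_sha: str) -> dict:
    # Placeholder: a real implementation would hand the diff to the
    # agent runtime and collect its findings.
    return {"agent": agent, "commit": commit_sha, "findings": []}

def review_commit(commit_sha: str) -> list[dict]:
    """Run every review agent against one commit, in parallel."""
    with ThreadPoolExecutor(max_workers=len(REVIEW_AGENTS)) as pool:
        futures = [pool.submit(run_agent, a, commit_sha) for a in REVIEW_AGENTS]
        return [f.result() for f in futures]

results = review_commit("abc1234")
```

The point of the structure: each agent gets the whole diff with a narrow mandate, so adding a new review dimension is one more entry in the roster, not a process change.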
Pattern 02

Security Gate

Dedicated security-focused agents scan every commit — because it's far too easy to miss things.

🔍 Deep Security Scanning

Agents trained specifically on security patterns review every diff. Not "also check security" — dedicated security agents with full context of the codebase's threat model.

🛡️ Dual-Layer Approach

Using both custom agents and tools like Vercel's deepsec + Codex Security in parallel — catching regressions and new vulnerabilities that either system alone would miss.

TRIGGER
git push
SECURITY AGENTS
Codex Sec
deepsec
Custom Rules
OUTPUT
PR Comment
Block Merge
Alert Team
Pattern 03

Stale Issue Resolution

When a fix lands on main, agents find the 6-month-old issue and close it with an exact reference.

EVENT
Commit merges to main
AGENT: ISSUE MATCHER
Diff Analysis
Issue Search
Semantic Match
ACTION
Close Issue
+
Reference Commit
Why this matters: Issue backlogs are entropy. Every unresolved issue is a drag on velocity and morale. Agents turn the backlog into a self-cleaning system — the more you ship, the cleaner it gets.
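The matching step can start far simpler than embeddings. A hedged sketch using token overlap as a stand-in for semantic similarity (the threshold and issue data are illustrative; a production matcher would use the diff itself plus embedding search):

```python
# Sketch: match a merged commit to open issues it likely resolves.
def tokens(text: str) -> set:
    return set(text.lower().split())

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

def match_issues(commit_msg: str, open_issues: dict[int, str],
                 threshold: float = 0.3) -> list[int]:
    """Return issue numbers whose text overlaps the commit message enough to flag."""
    commit_toks = tokens(commit_msg)
    return [n for n, body in open_issues.items()
            if jaccard(commit_toks, tokens(body)) >= threshold]

issues = {101: "crash when websocket reconnects after timeout",
          102: "add dark mode to settings page"}
matched = match_issues("fix crash on websocket reconnect timeout", issues)
```

Candidates above the threshold would then go to an agent for a real semantic check before any auto-close, with the commit SHA included in the closing comment.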
Pattern 04

Issue Triage & Clustering

Deduplicate reports, find patterns, and surface what actually matters.

🔗 Deduplication

When 15 people report the same bug, the agent recognizes the pattern, merges the issues, and preserves unique context from each report.

NLP similarity stack trace match

📊 Cluster Detection

Groups issues by root cause, not symptom. Three different error messages might all trace back to one race condition.

root cause graph analysis

🚨 Priority Reports

Generates weekly reports of the most pressing clusters — ranked by user impact, frequency, and alignment to roadmap.

impact scoring roadmap align
For your team: Start with deduplication — it's the highest-ROI agent you can build. One agent that merges duplicate GitHub issues saves hours per week immediately.
Pattern 05

Autonomous Fix PRs

Watch new issues. If the fix aligns with the documented vision — generate a PR automatically.

📋 The Pipeline

  • New issue opened
  • Agent reads issue + project vision docs
  • Semantic alignment check — does this fit?
  • Agent generates code fix
  • Agent opens PR with issue reference
  • Another agent reviews the PR
  • Human does final approval

🧠 Design Principles

  • Vision-aligned: Only acts when the fix matches documented project direction
  • Dual-agent: Creator and reviewer are separate agents — no self-approval
  • Human gate: Final merge always requires human approval
  • Full context: Agents have access to codebase, tests, docs
The critical guardrail: "Documented vision" is the constraint. Without it, agents will "fix" things that shouldn't exist. The vision doc is the alignment layer between human intent and agent action.
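The vision gate itself can be a small, explicit check in front of the generation step. A sketch under stated assumptions: `VISION_SCOPE` is a hypothetical distillation of the vision doc, and the actual agent calls are stubbed out.

```python
# Sketch: gate auto-fix generation on documented project vision.
# VISION_SCOPE is an illustrative stand-in for the real vision doc.
VISION_SCOPE = {"in": {"cli", "websocket", "performance"},
                "out": {"mobile", "gui", "windows"}}

def vision_aligned(issue_text: str) -> bool:
    words = set(issue_text.lower().split())
    if words & VISION_SCOPE["out"]:
        return False  # explicitly out of scope: never auto-fix
    return bool(words & VISION_SCOPE["in"])

def maybe_open_fix_pr(issue_text: str) -> str:
    if not vision_aligned(issue_text):
        return "skipped: not vision-aligned"
    # A real pipeline would now call the code-generation agent, then
    # hand the PR to a *separate* reviewer agent -- never self-approval.
    return "draft PR opened (pending second-agent review + human approval)"
```

In practice the alignment check would be an agent reading the vision doc, not a keyword set; the structural point is that the gate runs before any code is generated, not after.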
Patterns 06–07

Spam Defense & Performance

Two always-on agents that protect quality from opposite directions.

🛡️ Spam & Abuse Defense

Agents continuously scan issue comments, PR comments, and discussion threads for spam, abuse, and off-topic content.

Action: auto-hide, auto-block, flag for human review. Keeps the signal-to-noise ratio high on public repos.

comment scanning auto-block moderation queue

⚡ Performance Regression Watch

Agents run benchmarks on every meaningful change and compare against baselines.

Regressions get reported to Discord immediately — with the specific commit, the metric that regressed, and the magnitude.

benchmark suite regression alert Discord webhook
Common thread: These are always-on background agents — not CI jobs that run on schedule. They're part of the environment, like an immune system for your codebase.
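The regression half reduces to a baseline-versus-current comparison. A minimal sketch, assuming lower-is-better (latency-style) metrics and an illustrative 5% tolerance; the alert strings are what you'd post to a Discord webhook:

```python
# Sketch: flag benchmark metrics that regressed past a tolerance.
def detect_regressions(baseline: dict[str, float], current: dict[str, float],
                       tolerance: float = 0.05) -> list[str]:
    """Report metrics that got worse than `tolerance` vs. baseline.

    Assumes lower is better (e.g. latency in ms)."""
    alerts = []
    for metric, base in baseline.items():
        now = current.get(metric)
        if now is not None and now > base * (1 + tolerance):
            pct = (now - base) / base * 100
            alerts.append(f"{metric}: {base:.1f} -> {now:.1f} (+{pct:.0f}%)")
    return alerts

baseline = {"p50_latency_ms": 12.0, "startup_ms": 300.0}
current  = {"p50_latency_ms": 15.0, "startup_ms": 301.0}
alerts = detect_regressions(baseline, current)
```

Each alert already carries the metric and the magnitude; attaching the offending commit SHA is what turns it into an actionable report rather than a graph someone has to go read.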
Patterns 08–09

Meeting-Driven & Ephemeral

Agents that act in real-time during discussions, and disposable environments for reproduction.

🎙️ Meeting-Driven Agents

Agents listen to team meetings (via transcription). When a feature is discussed, they proactively start work — creating PRs while the discussion is still happening.

The team finishes the call and the first draft is already waiting.

real-time transcription intent extraction draft PR

🖥️ Ephemeral Reproduction

Agents spin up disposable environments (crabbox.sh machines), reproduce complex bugs, log into services, record before/after videos, and post evidence on the PR.

Full reproduction pipeline: environment → bug → fix → video proof. All automated.

ephemeral VMs video evidence repro pipeline
Pattern 10

Functional Decomposition

Split the entire project into functional units — then scan each one independently for bugs, regressions, and vulnerabilities.

MONOLITH REPO
Full Codebase
↓ clawpatch.ai
FUNCTIONAL UNITS
Auth
API
WebSocket
Storage
CLI
PARALLEL AGENT SCAN
Security
Bug Hunt
Regression
Performance
OUTPUT
Report per unit
Auto-fix PRs
Priority ranking
Why decomposition matters: Agents have finite context windows. Scanning a 100k-file repo as one blob misses deep issues. Splitting into functional units gives each agent focused scope — and focused scope means deeper analysis.

The Full Architecture

How all 10 patterns connect into a single agent-powered engineering system.

EVENT SOURCES
git push
new issue
new comment
meeting
schedule
AGENT FLEET (~100 parallel)
Code Review
Security
Issue Match
Triage
Auto-Fix
Benchmarks
CROSS-CUTTING CAPABILITIES
Ephemeral VMs
Video Recording
Functional Split
Vision Docs
OUTPUTS
PR Reviews
Auto PRs
Issue Closes
Discord Alerts
Human Approval

Adoption Roadmap

How to bring these patterns to your team — in priority order.

🔴 Week 1–2: Code Review Agent

Start with automated PR review on every commit. Highest immediate ROI. Use existing tools: Codex, Claude Code, or custom agents triggered by GitHub webhooks.

start here webhook → agent → PR comment

🔴 Week 2–3: Security Gate

Add a dedicated security review agent. Run alongside code review — different system prompt, different focus. Block merge on critical findings.

security-first deepsec + custom rules

🟢 Month 1: Issue Triage + Stale Cleanup

Build the issue matcher. When commits land, search open issues for semantic matches. Start with keyword matching, evolve to embedding-based similarity.

high ROI embeddings + heuristics

🟣 Month 2: Auto-Fix Pipeline

The big one. Issue → vision check → code generation → PR → review by second agent → human approval. Requires documented project vision to work safely.

most complex vision doc required

🟢 Month 2–3: Performance Benchmarks

Always-on benchmark agent. Run on every merge to main. Alert on regression. Start with your existing test suite — just wrap it in an agent loop.

easy win baseline + delta alert

🟡 Month 3+: Advanced Patterns

Meeting-driven agents, ephemeral reproduction, functional decomposition. These require more infrastructure but deliver compound returns over time.

infrastructure needed highest long-term value
The Economics

"But what about cost?"

The question isn't whether you can afford to run 100 agents.
It's whether you can afford not to.

💰 Without Agent Fleet

  • 3 senior engineers doing code review (expensive)
  • Security review is intermittent at best
  • Issue backlog grows monotonically
  • Bugs found in production, not in review
  • Manual reproduction of every reported bug
  • Performance regressions found by users

🤖 With Agent Fleet

  • Senior engineers focus on architecture decisions
  • Every commit reviewed for security, always
  • Issue backlog is self-cleaning
  • Bugs caught pre-merge by parallel review
  • Automated reproduction with video evidence
  • Performance regressions caught instantly
The shift: Stop thinking of AI spend as a cost center. It's an engineering multiplier. One senior engineer + 100 agents outperforms a team of 20 working traditionally.

What You Actually Need

The infrastructure requirements are simpler than you think.

📝 Documented Vision

A written, version-controlled document describing what the project is, what it's not, and where it's going. This is the alignment layer for every autonomous agent.

required first

🔗 Webhook Infrastructure

GitHub webhooks → agent dispatch. Every event (push, issue, PR, comment) triggers the right agent. Can be as simple as a Cloudflare Worker.

GH webhooks CF Worker
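The dispatch layer is mostly a routing table. A sketch assuming GitHub's standard webhook event names (`push`, `issues`, `issue_comment`); the agent names are placeholders and a real handler would enqueue jobs rather than return them:

```python
# Sketch: route incoming webhook events to the agents that handle them.
# Agent names are hypothetical; event names follow GitHub webhook types.
AGENT_ROUTES = {
    "push":          ["code-review", "security-gate", "issue-matcher"],
    "issues":        ["triage", "auto-fix"],
    "issue_comment": ["spam-defense"],
}

def dispatch(event_type: str, payload: dict) -> list[str]:
    """Return the agent jobs one webhook event should trigger."""
    agents = AGENT_ROUTES.get(event_type, [])
    ref = payload.get("ref", payload.get("action", "?"))
    # A real handler (e.g. a Cloudflare Worker) would forward each job
    # to the agent runtime; here we just return the dispatch plan.
    return [f"{agent}:{ref}" for agent in agents]

jobs = dispatch("push", {"ref": "refs/heads/main"})
```

Unrecognized events fall through to an empty job list, which keeps the Worker safe to point at every repo event from day one.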

🧠 Agent Runtime

Cloud agents that can run code, read repos, and open PRs. Codex, Claude Code, or custom. Need: code execution, git access, PR creation.

Codex / Claude git + GH API

📊 Baseline Metrics

Benchmark suite for your critical paths. Without baselines, you can't detect regressions. Start with 5–10 key benchmarks, not 500.

benchmarks delta detection

💬 Alert Channel

Discord/Slack channel for agent reports. Agents need somewhere to surface findings. Keep it high-signal — only actionable alerts.

Discord Slack GH comments

🛡️ Human Gate

Final approval always requires a human. Agents propose, humans decide. This is the safety rail that makes autonomy safe.

non-negotiable merge protection
"All that automation allows us
to run this project extremely lean." — Peter Steinberger

The future isn't replacing engineers.
It's multiplying them.

One engineer + 100 agents > 20 engineers working traditionally.
The teams that figure this out first will move impossibly fast.

OpenClaw Codex Claude Code crabbox.sh clawpatch.ai