Stu Kennedy · stu.kennedy@multiverse.io · May 2026
AI Engineering
at Scale
What happens when cost stops being the constraint —
and agents become your entire engineering org.
The goal:
Zero security gaps missed
The Premise
"How would we build software in the future
if tokens don't matter?"
— Peter Steinberger, on building OpenClaw
This isn't speculative. One team is running ~100 cloud agents continuously —
reviewing code, triaging issues, hunting bugs, shipping fixes.
Here are the patterns they use — and how any AI engineering team can adopt them.
10 Production Patterns
The agent fleet running a modern open-source project.
01
Continuous Code Review
Every PR, every commit — agents review before humans touch it.
02
Security Gate
Dedicated security review on every commit. Humans miss things.
03
Stale Issue Resolution
6-month-old issues matched to recent fixes and auto-closed.
04
Issue Triage & Clustering
Deduplicate issues, find clusters, surface the pressing ones.
05
Autonomous Fix PRs
New issues matched to project vision → auto-generated PRs.
06
Spam & Abuse Defense
Scanning comments, blocking bad actors, keeping signal high.
07
Performance Regression
Benchmark agents that catch regressions and alert the team.
08
Meeting-Driven Agents
Listen in meetings, start work on discussed features in real-time.
09
Ephemeral Reproduction
Spin up disposable machines, reproduce bugs, record evidence.
10
Functional Decomposition
Split projects into units for targeted bug & security scanning.
Pattern 01
Continuous Code Review
Every PR and every commit reviewed by agents — before a human ever opens the file.
Traditional Review
- Humans review when they have time
- Bottleneck at senior engineers
- Context-switching cost is high
- Inconsistent depth across reviewers
- Security review is a separate process
Agent-Fleet Review
- Every commit reviewed within seconds
- Consistent standards, no fatigue
- Parallel review — ~100 agents at once
- Security baked into every review pass
- Humans approve, agents surface issues
The key insight: This isn't replacing human review — it's augmenting it. Humans make final decisions. Agents ensure nothing is missed.
Pattern 02
Security Gate
Dedicated security-focused agents scan every commit — because it's far too easy to miss things.
🔍 Deep Security Scanning
Agents trained specifically on security patterns review every diff. Not "also check security" — dedicated security agents with full context of the codebase's threat model.
🛡️ Dual-Layer Approach
Using both custom agents and tools like Vercel's deepsec + Codex Security in parallel — catching regressions and new vulnerabilities that either system alone would miss.
TRIGGER
↓
SECURITY AGENTS
Codex Sec
deepsec
Custom Rules
↓
OUTPUT
PR Comment
Block Merge
Alert Team
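In code, the merge step of this gate might look like the following sketch. The severity levels, scanner names, and block-on-critical policy are illustrative assumptions, not the team's actual rules:

```python
from dataclasses import dataclass

@dataclass
class Finding:
    scanner: str   # which security agent reported it, e.g. "deepsec"
    severity: str  # "critical" | "high" | "medium" | "low" (assumed scale)
    message: str

def gate(findings: list[Finding]) -> dict:
    """Merge findings from parallel security agents into one gate decision.

    Policy (an assumption): any critical finding blocks the merge and
    alerts the team; everything else becomes a PR comment.
    """
    critical = [f for f in findings if f.severity == "critical"]
    return {
        "block_merge": bool(critical),
        "pr_comment": [f"[{f.scanner}] {f.severity}: {f.message}" for f in findings],
        "alert_team": [f.message for f in critical],
    }
```

The point of the merge step is that each scanner stays independent; the gate only aggregates, so adding a third scanner changes nothing upstream.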
Pattern 03
Stale Issue Resolution
When a fix lands on main, agents find the 6-month-old issue and close it with an exact reference.
EVENT
↓
AGENT: ISSUE MATCHER
Diff Analysis
→
Issue Search
→
Semantic Match
↓
ACTION
Close Issue
+
Reference Commit
Why this matters: Issue backlogs are entropy. Every unresolved issue is a drag on velocity and morale. Agents turn the backlog into a self-cleaning system — the more you ship, the cleaner it gets.
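A minimal issue matcher can start with keyword overlap before graduating to embeddings. The tokenizer, threshold, and issue texts below are illustrative assumptions:

```python
import re

def tokens(text: str) -> set[str]:
    # Keep lowercase words of 3+ characters as crude keywords.
    return set(re.findall(r"[a-z0-9_]{3,}", text.lower()))

def match_issues(commit_message: str, diff_paths: list[str],
                 open_issues: dict[int, str], threshold: float = 0.3) -> list[int]:
    """Return issue numbers whose text overlaps the commit enough to flag.

    Jaccard similarity over keyword sets is the baseline; a production
    matcher would use embeddings. The 0.3 threshold is an assumption.
    """
    commit_toks = tokens(commit_message) | tokens(" ".join(diff_paths))
    hits = []
    for number, text in open_issues.items():
        issue_toks = tokens(text)
        if not issue_toks:
            continue
        overlap = len(commit_toks & issue_toks) / len(commit_toks | issue_toks)
        if overlap >= threshold:
            hits.append(number)
    return hits
```

An agent wrapping this would post the candidate matches for confirmation rather than closing issues on keyword evidence alone.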
Pattern 04
Issue Triage & Clustering
Deduplicate reports, find patterns, and surface what actually matters.
🔗 Deduplication
When 15 people report the same bug, the agent recognizes the pattern, merges the issues, and preserves unique context from each report.
NLP similarity
stack trace match
📊 Cluster Detection
Groups issues by root cause, not symptom. Three different error messages might all trace back to one race condition.
root cause
graph analysis
🚨 Priority Reports
Generates weekly reports of the most pressing clusters — ranked by user impact, frequency, and alignment to roadmap.
impact scoring
roadmap align
For your team: Start with deduplication — it's the highest-ROI agent you can build. One agent that merges duplicate GitHub issues saves hours per week immediately.
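A minimal sketch of the stack-trace half of deduplication, assuming Python-style tracebacks; the normalization rules are illustrative:

```python
import hashlib
import re

def trace_fingerprint(stack_trace: str) -> str:
    """Fingerprint a stack trace by its frame names only, ignoring line
    numbers, so variants of the same crash collapse to one key."""
    frames = re.findall(r'File "[^"]+", line \d+, in (\w+)', stack_trace)
    return hashlib.sha256("|".join(frames).encode()).hexdigest()[:12]

def dedupe(issues: list[dict]) -> dict[str, list[int]]:
    """Group issue numbers by fingerprint; any group larger than one is
    a duplicate cluster to merge (preserving each report's context)."""
    groups: dict[str, list[int]] = {}
    for issue in issues:
        key = trace_fingerprint(issue["trace"])
        groups.setdefault(key, []).append(issue["number"])
    return groups
```

Text-similarity matching (for reports without traces) layers on top of this, but trace fingerprinting alone already catches the "15 people report the same crash" case.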
Pattern 05
Autonomous Fix PRs
Watch new issues. If the fix aligns with the documented vision — generate a PR automatically.
📋 The Pipeline
- New issue opened
- Agent reads issue + project vision docs
- Semantic alignment check — does this fit?
- Agent generates code fix
- Agent opens PR with issue reference
- Another agent reviews the PR
- Human does final approval
🧠 Design Principles
- Vision-aligned: Only acts when the fix matches documented project direction
- Dual-agent: Creator and reviewer are separate agents — no self-approval
- Human gate: Final merge always requires human approval
- Full context: Agents have access to codebase, tests, docs
The critical guardrail: "Documented vision" is the constraint. Without it, agents will "fix" things that shouldn't exist. The vision doc is the alignment layer between human intent and agent action.
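The pipeline and guardrails above can be sketched as a small orchestrator. The three callables stand in for separate agents; every name here is hypothetical:

```python
from typing import Callable

def autofix_pipeline(issue: str,
                     vision_check: Callable[[str], bool],
                     creator: Callable[[str], str],
                     reviewer: Callable[[str], bool]) -> dict:
    """Run the auto-fix flow: vision gate, then creator agent, then an
    independent reviewer agent. The human approval gate is a terminal
    status; the pipeline itself can never merge.
    """
    if not vision_check(issue):
        return {"status": "skipped", "reason": "not aligned with vision doc"}
    patch = creator(issue)
    if not reviewer(patch):  # separate agent: no self-approval
        return {"status": "rejected-by-review", "patch": patch}
    return {"status": "awaiting-human-approval", "patch": patch}
```

Keeping the creator and reviewer as distinct callables makes the no-self-approval rule structural rather than a prompt instruction.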
Patterns 06–07
Spam Defense & Performance
Two always-on agents that protect quality from opposite directions.
🛡️ Spam & Abuse Defense
Agents continuously scan issue comments, PR comments, and discussion threads for spam, abuse, and off-topic content.
Action: auto-hide, auto-block, flag for human review. Keeps the signal-to-noise ratio high on public repos.
comment scanning
auto-block
moderation queue
⚡ Performance Regression Watch
Agents run benchmarks on every meaningful change and compare against baselines.
Regressions get reported to Discord immediately — with the specific commit, the metric that regressed, and the magnitude.
benchmark suite
regression alert
Discord webhook
Common thread: These are always-on background agents — not CI jobs that run on a schedule. They're part of the environment, like an immune system for your codebase.
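The core of the regression watch is a baseline-plus-delta check, sketched below; the 5% tolerance and metric names are assumptions:

```python
def detect_regressions(baseline: dict[str, float],
                       current: dict[str, float],
                       tolerance: float = 0.05) -> list[str]:
    """Compare current benchmark timings (ms) against the baseline.

    A metric regresses when it is more than `tolerance` slower than
    baseline (5% is an illustrative default). Returns alert strings
    ready to post to a Discord/Slack webhook, including the metric
    and the magnitude, per the pattern above.
    """
    alerts = []
    for name, base in baseline.items():
        now = current.get(name)
        if now is None:
            continue  # metric missing from this run; skip, don't alert
        delta = (now - base) / base
        if delta > tolerance:
            alerts.append(f"{name}: {base:.1f}ms -> {now:.1f}ms (+{delta:.0%})")
    return alerts
```

The agent part is just running this after every merge to main and posting non-empty results; the specific commit comes from the event that triggered the run.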
Patterns 08–09
Meeting-Driven & Ephemeral
Agents that act in real-time during discussions, and disposable environments for reproduction.
🎙️ Meeting-Driven Agents
Agents listen to team meetings (via transcription). When a feature is discussed, they proactively start work — creating PRs while the discussion is still happening.
The team finishes the call and the first draft is already waiting.
real-time transcription
intent extraction
draft PR
🖥️ Ephemeral Reproduction
Agents spin up disposable environments (crabbox.sh machines), reproduce complex bugs, log into services, record before/after videos, and post evidence on the PR.
Full reproduction pipeline: environment → bug → fix → video proof. All automated.
ephemeral VMs
video evidence
repro pipeline
Pattern 10
Functional Decomposition
Split the entire project into functional units — then scan each one independently for bugs, regressions, and vulnerabilities.
MONOLITH REPO
↓ clawpatch.ai
FUNCTIONAL UNITS
Auth
API
WebSocket
Storage
CLI
↓
PARALLEL AGENT SCAN
Security
Bug Hunt
Regression
Performance
↓
OUTPUT
Report per unit
Auto-fix PRs
Priority ranking
Why decomposition matters: Agents have context windows. Scanning a 100k-file repo as one blob misses deep issues. Splitting into functional units gives each agent focused scope — and focused scope means deeper analysis.
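A minimal sketch of the splitting step, assuming simple path-prefix rules stand in for whatever analysis clawpatch.ai actually performs:

```python
# Path-prefix rules are illustrative; a real decomposition would use
# dependency analysis, not string prefixes.
UNIT_RULES = {
    "src/auth/": "Auth",
    "src/api/": "API",
    "src/ws/": "WebSocket",
    "src/storage/": "Storage",
    "cli/": "CLI",
}

def decompose(paths: list[str]) -> dict[str, list[str]]:
    """Assign each file to a functional unit so each scan agent gets a
    focused scope (and a focused context window) instead of the repo."""
    units: dict[str, list[str]] = {}
    for path in paths:
        unit = next((name for prefix, name in UNIT_RULES.items()
                     if path.startswith(prefix)), "Unassigned")
        units.setdefault(unit, []).append(path)
    return units
```

Each unit's file list then becomes the scope for one parallel security, bug-hunt, or regression scan.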
The Full Architecture
How all 10 patterns connect into a single agent-powered engineering system.
EVENT SOURCES
git push
new issue
new comment
meeting
schedule
↓
AGENT FLEET (~100 parallel)
Code Review
Security
Issue Match
Triage
Auto-Fix
Benchmarks
↓
CROSS-CUTTING CAPABILITIES
Ephemeral VMs
Video Recording
Functional Split
Vision Docs
↓
OUTPUTS
PR Reviews
Auto PRs
Issue Closes
Discord Alerts
Human Approval
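The event-to-agent wiring can be sketched as a routing table. The mapping below is an assumption based on the diagram, not the team's actual configuration:

```python
# Which agents to fan out to for each incoming event type.
# Event names loosely follow GitHub webhook conventions; the
# meeting.transcript event is hypothetical.
ROUTES = {
    "push": ["code-review", "security", "issue-matcher", "benchmarks"],
    "issues.opened": ["triage", "auto-fix"],
    "issue_comment.created": ["spam-defense"],
    "meeting.transcript": ["meeting-agent"],
}

def dispatch(event_type: str) -> list[str]:
    """Return the agents to launch for an incoming webhook event.
    Unknown events route nowhere rather than raising."""
    return ROUTES.get(event_type, [])
```

A thin webhook receiver (the Cloudflare Worker mentioned later) only needs this lookup plus a queue; all intelligence lives in the agents themselves.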
Adoption Roadmap
How to bring these patterns to your team — in priority order.
🔴 Week 1–2: Code Review Agent
Start with automated PR review on every commit. Highest immediate ROI. Use existing tools: Codex, Claude Code, or custom agents triggered by GitHub webhooks.
start here
webhook → agent → PR comment
🔴 Week 2–3: Security Gate
Add a dedicated security review agent. Run alongside code review — different system prompt, different focus. Block merge on critical findings.
security-first
deepsec + custom rules
🟢 Month 1: Issue Triage + Stale Cleanup
Build the issue matcher. When commits land, search open issues for semantic matches. Start with keyword matching, evolve to embedding-based similarity.
high ROI
embeddings + heuristics
🟣 Month 2: Auto-Fix Pipeline
The big one. Issue → vision check → code generation → PR → review by second agent → human approval. Requires documented project vision to work safely.
most complex
vision doc required
🟢 Month 2–3: Performance Benchmarks
Always-on benchmark agent. Run on every merge to main. Alert on regression. Start with your existing test suite — just wrap it in an agent loop.
easy win
baseline + delta alert
🟡 Month 3+: Advanced Patterns
Meeting-driven agents, ephemeral reproduction, functional decomposition. These require more infrastructure but deliver compound returns over time.
infrastructure needed
highest long-term value
The Economics
"But what about cost?"
The question isn't whether you can afford to run 100 agents.
It's whether you can afford not to.
💰 Without Agent Fleet
- 3 senior engineers doing code review (expensive)
- Security review is intermittent at best
- Issue backlog grows monotonically
- Bugs found in production, not in review
- Manual reproduction of every reported bug
- Performance regressions found by users
🤖 With Agent Fleet
- Senior engineers focus on architecture decisions
- Every commit reviewed for security, always
- Issue backlog is self-cleaning
- Bugs caught pre-merge by parallel review
- Automated reproduction with video evidence
- Performance regressions caught instantly
The shift: Stop thinking of AI spend as a cost center. It's an engineering multiplier. One senior engineer + 100 agents outperforms a team of 20 working traditionally.
What You Actually Need
The infrastructure requirements are simpler than you think.
📝 Documented Vision
A written, version-controlled document describing what the project is, what it's not, and where it's going. This is the alignment layer for every autonomous agent.
required first
🔗 Webhook Infrastructure
GitHub webhooks → agent dispatch. Every event (push, issue, PR, comment) triggers the right agent. Can be as simple as a Cloudflare Worker.
GH webhooks
CF Worker
🧠 Agent Runtime
Cloud agents that can run code, read repos, and open PRs. Codex, Claude Code, or custom. Need: code execution, git access, PR creation.
Codex / Claude
git + GH API
📊 Baseline Metrics
Benchmark suite for your critical paths. Without baselines, you can't detect regressions. Start with 5–10 key benchmarks, not 500.
benchmarks
delta detection
💬 Alert Channel
Discord/Slack channel for agent reports. Agents need somewhere to surface findings. Keep it high-signal — only actionable alerts.
Discord
Slack
GH comments
🛡️ Human Gate
Final approval always requires a human. Agents propose, humans decide. This is the safety rail that makes autonomy safe.
non-negotiable
merge protection
"All that automation allows us
to run this project extremely lean."
— Peter Steinberger
The future isn't replacing engineers.
It's multiplying them.
One engineer + 100 agents > 20 engineers working traditionally.
The teams that figure this out first will move impossibly fast.
OpenClaw
Codex
Claude Code
crabbox.sh
clawpatch.ai