Stu Kennedy · stu.kennedy@multiverse.io · May 2026
AI Engineering
at Scale
What happens when cost stops being the constraint —
and agents become your entire engineering org.
The goal:
Zero security gaps missed
The Premise
"How would we build software in the future
if tokens don't matter?"
— Peter Steinberger, on building OpenClaw
This isn't speculative. One team is running ~100 cloud agents continuously —
reviewing code, triaging issues, hunting bugs, shipping fixes.
Here are the patterns they use — and how any AI engineering team can adopt them.
10 Production Patterns
The agent fleet running a modern open-source project.
01
Continuous Code Review
Every PR, every commit — agents review before humans touch it.
02
Security Gate
Dedicated security review on every commit. Humans miss things.
03
Stale Issue Resolution
6-month-old issues matched to recent fixes and auto-closed.
04
Issue Triage & Clustering
Deduplicate issues, find clusters, surface the pressing ones.
05
Autonomous Fix PRs
New issues matched to project vision → auto-generated PRs.
06
Spam & Abuse Defense
Scanning comments, blocking bad actors, keeping signal high.
07
Performance Regression
Benchmark agents that catch regressions and alert the team.
08
Meeting-Driven Agents
Listen in meetings, start work on discussed features in real-time.
09
Ephemeral Reproduction
Spin up disposable machines, reproduce bugs, record evidence.
10
Functional Decomposition
Split projects into units for targeted bug & security scanning.
Pattern 01
Continuous Code Review
Every PR and every commit reviewed by agents — before a human ever opens the file.
Traditional Review
- Humans review when they have time
- Bottleneck at senior engineers
- Context-switching cost is high
- Inconsistent depth across reviewers
- Security review is a separate process
Agent-Fleet Review
- Every commit reviewed within seconds
- Consistent standards, no fatigue
- Parallel review — ~100 agents at once
- Security baked into every review pass
- Humans approve, agents surface issues
The key insight: This isn't replacing human review — it's augmenting it. Humans make final decisions. Agents ensure nothing is missed.
Pattern 02
Security Gate
Dedicated security-focused agents scan every commit — because it's far too easy to miss things.
🔍 Deep Security Scanning
Agents trained specifically on security patterns review every diff. Not "also check security" — dedicated security agents with full context of the codebase's threat model.
🛡️ Dual-Layer Approach
Using both custom agents and tools like Vercel's deepsec + Codex Security in parallel — catching regressions and new vulnerabilities that either system alone would miss.
TRIGGER
↓
SECURITY AGENTS
Codex Sec
deepsec
Custom Rules
↓
OUTPUT
PR Comment
Block Merge
Alert Team
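In code, the merge step of this gate might look like the following sketch. The severity levels, scanner names, and block-on-critical policy are illustrative assumptions, not the team's actual rules:

```python
from dataclasses import dataclass

@dataclass
class Finding:
    scanner: str   # which security agent reported it, e.g. "deepsec"
    severity: str  # "critical" | "high" | "medium" | "low" (assumed scale)
    message: str

def gate(findings: list[Finding]) -> dict:
    """Merge findings from parallel security agents into one gate decision.

    Policy (an assumption): any critical finding blocks the merge and
    alerts the team; everything else becomes a PR comment.
    """
    critical = [f for f in findings if f.severity == "critical"]
    return {
        "block_merge": bool(critical),
        "pr_comment": [f"[{f.scanner}] {f.severity}: {f.message}" for f in findings],
        "alert_team": [f.message for f in critical],
    }
```

The point of the merge step is that each scanner stays independent; the gate only aggregates, so adding a third scanner changes nothing upstream.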
Pattern 03
Stale Issue Resolution
When a fix lands on main, agents find the 6-month-old issue and close it with an exact reference.
EVENT
↓
AGENT: ISSUE MATCHER
Diff Analysis
→
Issue Search
→
Semantic Match
↓
ACTION
Close Issue
+
Reference Commit
Why this matters: Issue backlogs are entropy. Every unresolved issue is a drag on velocity and morale. Agents turn the backlog into a self-cleaning system — the more you ship, the cleaner it gets.
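A minimal issue matcher can start with keyword overlap before graduating to embeddings. The tokenizer, threshold, and issue texts below are illustrative assumptions:

```python
import re

def tokens(text: str) -> set[str]:
    # Keep lowercase words of 3+ characters as crude keywords.
    return set(re.findall(r"[a-z0-9_]{3,}", text.lower()))

def match_issues(commit_message: str, diff_paths: list[str],
                 open_issues: dict[int, str], threshold: float = 0.3) -> list[int]:
    """Return issue numbers whose text overlaps the commit enough to flag.

    Jaccard similarity over keyword sets is the baseline; a production
    matcher would use embeddings. The 0.3 threshold is an assumption.
    """
    commit_toks = tokens(commit_message) | tokens(" ".join(diff_paths))
    hits = []
    for number, text in open_issues.items():
        issue_toks = tokens(text)
        if not issue_toks:
            continue
        overlap = len(commit_toks & issue_toks) / len(commit_toks | issue_toks)
        if overlap >= threshold:
            hits.append(number)
    return hits
```

An agent wrapping this would post the candidate matches for confirmation rather than closing issues on keyword evidence alone.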
Pattern 04
Issue Triage & Clustering
Deduplicate reports, find patterns, and surface what actually matters.
🔗 Deduplication
When 15 people report the same bug, the agent recognizes the pattern, merges the issues, and preserves unique context from each report.
NLP similarity
stack trace match
📊 Cluster Detection
Groups issues by root cause, not symptom. Three different error messages might all trace back to one race condition.
root cause
graph analysis
🚨 Priority Reports
Generates weekly reports of the most pressing clusters — ranked by user impact, frequency, and alignment to roadmap.
impact scoring
roadmap align
For your team: Start with deduplication — it's the highest-ROI agent you can build. One agent that merges duplicate GitHub issues saves hours per week immediately.
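A minimal sketch of the stack-trace half of deduplication, assuming Python-style tracebacks; the normalization rules are illustrative:

```python
import hashlib
import re

def trace_fingerprint(stack_trace: str) -> str:
    """Fingerprint a stack trace by its frame names only, ignoring line
    numbers, so variants of the same crash collapse to one key."""
    frames = re.findall(r'File "[^"]+", line \d+, in (\w+)', stack_trace)
    return hashlib.sha256("|".join(frames).encode()).hexdigest()[:12]

def dedupe(issues: list[dict]) -> dict[str, list[int]]:
    """Group issue numbers by fingerprint; any group larger than one is
    a duplicate cluster to merge (preserving each report's context)."""
    groups: dict[str, list[int]] = {}
    for issue in issues:
        key = trace_fingerprint(issue["trace"])
        groups.setdefault(key, []).append(issue["number"])
    return groups
```

Text-similarity matching (for reports without traces) layers on top of this, but trace fingerprinting alone already catches the "15 people report the same crash" case.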
Pattern 05
Autonomous Fix PRs
Watch new issues. If the fix aligns with the documented vision — generate a PR automatically.
📋 The Pipeline
- New issue opened
- Agent reads issue + project vision docs
- Semantic alignment check — does this fit?
- Agent generates code fix
- Agent opens PR with issue reference
- Another agent reviews the PR
- Human does final approval
🧠 Design Principles
- Vision-aligned: Only acts when the fix matches documented project direction
- Dual-agent: Creator and reviewer are separate agents — no self-approval
- Human gate: Final merge always requires human approval
- Full context: Agents have access to codebase, tests, docs
The critical guardrail: "Documented vision" is the constraint. Without it, agents will "fix" things that shouldn't exist. The vision doc is the alignment layer between human intent and agent action.
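The pipeline and guardrails above can be sketched as a small orchestrator. The three callables stand in for separate agents; every name here is hypothetical:

```python
from typing import Callable

def autofix_pipeline(issue: str,
                     vision_check: Callable[[str], bool],
                     creator: Callable[[str], str],
                     reviewer: Callable[[str], bool]) -> dict:
    """Run the auto-fix flow: vision gate, then creator agent, then an
    independent reviewer agent. The human approval gate is a terminal
    status; the pipeline itself can never merge.
    """
    if not vision_check(issue):
        return {"status": "skipped", "reason": "not aligned with vision doc"}
    patch = creator(issue)
    if not reviewer(patch):  # separate agent: no self-approval
        return {"status": "rejected-by-review", "patch": patch}
    return {"status": "awaiting-human-approval", "patch": patch}
```

Keeping the creator and reviewer as distinct callables makes the no-self-approval rule structural rather than a prompt instruction.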
Patterns 06–07
Spam Defense & Performance
Two always-on agents that protect quality from opposite directions.
🛡️ Spam & Abuse Defense
Agents continuously scan issue comments, PR comments, and discussion threads for spam, abuse, and off-topic content.
Action: auto-hide, auto-block, flag for human review. Keeps the signal-to-noise ratio high on public repos.
comment scanning
auto-block
moderation queue
⚡ Performance Regression Watch
Agents run benchmarks on every meaningful change and compare against baselines.
Regressions get reported to Discord immediately — with the specific commit, the metric that regressed, and the magnitude.
benchmark suite
regression alert
Discord webhook
Common thread: These are always-on background agents — not CI jobs that run on a schedule. They're part of the environment, like an immune system for your codebase.
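The core of the regression watch is a baseline-plus-delta check, sketched below; the 5% tolerance and metric names are assumptions:

```python
def detect_regressions(baseline: dict[str, float],
                       current: dict[str, float],
                       tolerance: float = 0.05) -> list[str]:
    """Compare current benchmark timings (ms) against the baseline.

    A metric regresses when it is more than `tolerance` slower than
    baseline (5% is an illustrative default). Returns alert strings
    ready to post to a Discord/Slack webhook, including the metric
    and the magnitude, per the pattern above.
    """
    alerts = []
    for name, base in baseline.items():
        now = current.get(name)
        if now is None:
            continue  # metric missing from this run; skip, don't alert
        delta = (now - base) / base
        if delta > tolerance:
            alerts.append(f"{name}: {base:.1f}ms -> {now:.1f}ms (+{delta:.0%})")
    return alerts
```

The agent part is just running this after every merge to main and posting non-empty results; the specific commit comes from the event that triggered the run.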
Patterns 08–09
Meeting-Driven & Ephemeral
Agents that act in real-time during discussions, and disposable environments for reproduction.
🎙️ Meeting-Driven Agents
Agents listen to team meetings (via transcription). When a feature is discussed, they proactively start work — creating PRs while the discussion is still happening.
The team finishes the call and the first draft is already waiting.
real-time transcription
intent extraction
draft PR
🖥️ Ephemeral Reproduction
Agents spin up disposable environments (crabbox.sh machines), reproduce complex bugs, log into services, record before/after videos, and post evidence on the PR.
Full reproduction pipeline: environment → bug → fix → video proof. All automated.
ephemeral VMs
video evidence
repro pipeline
Pattern 10
Functional Decomposition
Split the entire project into functional units — then scan each one independently for bugs, regressions, and vulnerabilities.
MONOLITH REPO
↓ clawpatch.ai
FUNCTIONAL UNITS
Auth
API
WebSocket
Storage
CLI
↓
PARALLEL AGENT SCAN
Security
Bug Hunt
Regression
Performance
↓
OUTPUT
Report per unit
Auto-fix PRs
Priority ranking
Why decomposition matters: Agents have context windows. Scanning a 100k-file repo as one blob misses deep issues. Splitting into functional units gives each agent focused scope — and focused scope means deeper analysis.
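A minimal sketch of the splitting step, assuming simple path-prefix rules stand in for whatever analysis clawpatch.ai actually performs:

```python
# Path-prefix rules are illustrative; a real decomposition would use
# dependency analysis, not string prefixes.
UNIT_RULES = {
    "src/auth/": "Auth",
    "src/api/": "API",
    "src/ws/": "WebSocket",
    "src/storage/": "Storage",
    "cli/": "CLI",
}

def decompose(paths: list[str]) -> dict[str, list[str]]:
    """Assign each file to a functional unit so each scan agent gets a
    focused scope (and a focused context window) instead of the repo."""
    units: dict[str, list[str]] = {}
    for path in paths:
        unit = next((name for prefix, name in UNIT_RULES.items()
                     if path.startswith(prefix)), "Unassigned")
        units.setdefault(unit, []).append(path)
    return units
```

Each unit's file list then becomes the scope for one parallel security, bug-hunt, or regression scan.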
The Full Architecture
How all 10 patterns connect into a single agent-powered engineering system.
EVENT SOURCES
git push
new issue
new comment
meeting
schedule
↓
AGENT FLEET (~100 parallel)
Code Review
Security
Issue Match
Triage
Auto-Fix
Benchmarks
↓
CROSS-CUTTING CAPABILITIES
Ephemeral VMs
Video Recording
Functional Split
Vision Docs
↓
OUTPUTS
PR Reviews
Auto PRs
Issue Closes
Discord Alerts
Human Approval
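The event-to-agent wiring can be sketched as a routing table. The mapping below is an assumption based on the diagram, not the team's actual configuration:

```python
# Which agents to fan out to for each incoming event type.
# Event names loosely follow GitHub webhook conventions; the
# meeting.transcript event is hypothetical.
ROUTES = {
    "push": ["code-review", "security", "issue-matcher", "benchmarks"],
    "issues.opened": ["triage", "auto-fix"],
    "issue_comment.created": ["spam-defense"],
    "meeting.transcript": ["meeting-agent"],
}

def dispatch(event_type: str) -> list[str]:
    """Return the agents to launch for an incoming webhook event.
    Unknown events route nowhere rather than raising."""
    return ROUTES.get(event_type, [])
```

A thin webhook receiver (the Cloudflare Worker mentioned later) only needs this lookup plus a queue; all intelligence lives in the agents themselves.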
Adoption Roadmap
How to bring these patterns to your team — in priority order.
🔴 Week 1–2: Code Review Agent
Start with automated PR review on every commit. Highest immediate ROI. Use existing tools: Codex, Claude Code, or custom agents triggered by GitHub webhooks.
start here
webhook → agent → PR comment
🔴 Week 2–3: Security Gate
Add a dedicated security review agent. Run alongside code review — different system prompt, different focus. Block merge on critical findings.
security-first
deepsec + custom rules
🟢 Month 1: Issue Triage + Stale Cleanup
Build the issue matcher. When commits land, search open issues for semantic matches. Start with keyword matching, evolve to embedding-based similarity.
high ROI
embeddings + heuristics
🟣 Month 2: Auto-Fix Pipeline
The big one. Issue → vision check → code generation → PR → review by second agent → human approval. Requires documented project vision to work safely.
most complex
vision doc required
🟢 Month 2–3: Performance Benchmarks
Always-on benchmark agent. Run on every merge to main. Alert on regression. Start with your existing test suite — just wrap it in an agent loop.
easy win
baseline + delta alert
🟡 Month 3+: Advanced Patterns
Meeting-driven agents, ephemeral reproduction, functional decomposition. These require more infrastructure but deliver compound returns over time.
infrastructure needed
highest long-term value
The Economics
"But what about cost?"
The question isn't whether you can afford to run 100 agents.
It's whether you can afford not to.
💰 Without Agent Fleet
- 3 senior engineers doing code review (expensive)
- Security review is intermittent at best
- Issue backlog grows monotonically
- Bugs found in production, not in review
- Manual reproduction of every reported bug
- Performance regressions found by users
🤖 With Agent Fleet
- Senior engineers focus on architecture decisions
- Every commit reviewed for security, always
- Issue backlog is self-cleaning
- Bugs caught pre-merge by parallel review
- Automated reproduction with video evidence
- Performance regressions caught instantly
The shift: Stop thinking of AI spend as a cost center. It's an engineering multiplier. One senior engineer + 100 agents outperforms a team of 20 working traditionally.
What You Actually Need
The infrastructure requirements are simpler than you think.
📝 Documented Vision
A written, version-controlled document describing what the project is, what it's not, and where it's going. This is the alignment layer for every autonomous agent.
required first
🔗 Webhook Infrastructure
GitHub webhooks → agent dispatch. Every event (push, issue, PR, comment) triggers the right agent. Can be as simple as a Cloudflare Worker.
GH webhooks
CF Worker
🧠 Agent Runtime
Cloud agents that can run code, read repos, and open PRs. Codex, Claude Code, or custom. Need: code execution, git access, PR creation.
Codex / Claude
git + GH API
📊 Baseline Metrics
Benchmark suite for your critical paths. Without baselines, you can't detect regressions. Start with 5–10 key benchmarks, not 500.
benchmarks
delta detection
💬 Alert Channel
Discord/Slack channel for agent reports. Agents need somewhere to surface findings. Keep it high-signal — only actionable alerts.
Discord
Slack
GH comments
🛡️ Human Gate
Final approval always requires a human. Agents propose, humans decide. This is the safety rail that makes autonomy safe.
non-negotiable
merge protection
"All that automation allows us
to run this project extremely lean."
— Peter Steinberger
The future isn't replacing engineers.
It's multiplying them.
One engineer + 100 agents > 20 engineers working traditionally.
The teams that figure this out first will move impossibly fast.
OpenClaw
Codex
Claude Code
crabbox.sh
clawpatch.ai