
May 7, 2026 · Anton Grishko

When one agent isn't enough — fanning out work across an agent fleet

Most AI-for-DevOps demos show one agent. Real DevOps work is embarrassingly parallel — and the fleet shape is its own design problem.

TL;DR — Single-agent demos don't scale to real DevOps. Embarrassingly parallel work (cross-env Terragrunt rewrites, mass PR opens, ExternalSecret audits) wants an orchestrator + N short-lived workers coordinated via a small MCP server. We run this in production. Here's the shape, a worked example, and the three failure classes we hit.

The single-agent ceiling

Most "AI for DevOps" demos show one agent. It reads a PR, writes a comment. It debugs an alert, posts a diagnosis. One question, one agent, one answer.

Real DevOps work isn't shaped like that. It's shaped like: "rename this Terragrunt input across 23 environments and re-plan every one of them." Or: "audit every ExternalSecret in seven clusters and tell me which ones reference a key that no longer exists in AWS Secrets Manager." Or: "open 12 PRs, one per service, bumping the Karpenter NodePool selector."

These are embarrassingly parallel. A single agent does them serially. A fleet does them in parallel, in a fraction of the wall-clock time.

The fleet shape

                    ┌────────────────┐
                    │  orchestrator  │   ← the planner, runs once
                    └────────┬───────┘
              ┌────────┬─────┴────┬────────┐
              ▼        ▼          ▼        ▼
          ┌──────┐ ┌──────┐   ┌──────┐ ┌──────┐
          │ wkr1 │ │ wkr2 │   │ wkr3 │ │ wkr4 │   ← N Claude Code processes,
          └───┬──┘ └───┬──┘   └───┬──┘ └───┬──┘     each in a Cmux tab
              │        │          │        │
              └────────┴──────────┴────────┘
                            │
                            ▼
                ┌────────────────────────┐
                │   custom MCP server    │   ← coordinates state
                └────────────────────────┘

The orchestrator is one agent that decomposes the task. It writes a list of sub-tasks to a small MCP server. Each worker is a Claude Code (or OpenCode, or whatever) process running in its own Cmux tab or tmux pane. Workers poll the MCP server for the next task, do it, and post the result. For our take on the multiplexer side, see Cmux vs tmux for AI agent fleets.

This is not a research idea. It's how we run at-scale operations on customer repos. We've used it for cross-env Terragrunt rewrites, audits, mass-PR opens, and "make this change in N places" jobs.

The MCP server is the only new piece

The fan-out coordinator is small. Five tools:

claim_task() → {task_id, payload, parent_id}
report_progress(task_id, status, log_chunk)
report_done(task_id, result, artifact_paths)
report_blocked(task_id, reason)
list_siblings(parent_id) → [{task_id, status, summary}]

list_siblings is the load-bearing tool. When a worker reports done, it can ask what its siblings are doing. If three siblings already ran into the same issue, the worker reports blocked instead of repeating. Fan-out without coordination is just N times the cost.
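As a rough illustration of the five tools and the sibling check, here is a minimal in-memory sketch. It is not the production server and it skips the MCP wiring entirely. The `Task` fields, the status strings, the `exclude` argument on `list_siblings`, and the `should_report_blocked` helper are all assumptions made for the example. The threshold of three comes from the paragraph above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Task:
    task_id: str
    parent_id: str
    payload: dict
    status: str = "pending"  # assumed states: pending | claimed | done | blocked
    summary: str = ""
    log: list = field(default_factory=list)
    result: Optional[dict] = None
    artifact_paths: list = field(default_factory=list)


class Coordinator:
    """In-memory stand-in for the coordinator's state (single process, no locking)."""

    def __init__(self, tasks):
        self.tasks = {t.task_id: t for t in tasks}

    def claim_task(self):
        # Hand out the first pending task, or None when nothing is left.
        for task in self.tasks.values():
            if task.status == "pending":
                task.status = "claimed"
                return {"task_id": task.task_id, "payload": task.payload,
                        "parent_id": task.parent_id}
        return None

    def report_progress(self, task_id, status, log_chunk):
        # `status` is treated here as a free-text progress note (assumption).
        task = self.tasks[task_id]
        task.summary = status
        task.log.append(log_chunk)

    def report_done(self, task_id, result, artifact_paths):
        task = self.tasks[task_id]
        task.status, task.result = "done", result
        task.artifact_paths = list(artifact_paths)

    def report_blocked(self, task_id, reason):
        task = self.tasks[task_id]
        task.status, task.summary = "blocked", reason

    def list_siblings(self, parent_id, exclude=None):
        # `exclude` lets a worker leave itself out of the answer (assumption).
        return [{"task_id": t.task_id, "status": t.status, "summary": t.summary}
                for t in self.tasks.values()
                if t.parent_id == parent_id and t.task_id != exclude]


def should_report_blocked(siblings, reason, threshold=3):
    """True when at least `threshold` siblings are already blocked on `reason`."""
    same = [s for s in siblings
            if s["status"] == "blocked" and s["summary"] == reason]
    return len(same) >= threshold
```

A worker that hits a problem would call `list_siblings(parent_id, exclude=own_id)` and, if `should_report_blocked` returns true, call `report_blocked` rather than burn tokens rediscovering the same failure. Matching on an exact reason string is the crudest possible comparison; a real worker would likely need something fuzzier.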

The orchestrator separately owns:

plan(task_description) → [{sub_task_id, payload}, ...]
review(parent_id) → {summary, artifacts, recommendations}

The plan tool is where most of the work happens. We give the orchestrator access to kuberly-graph (the MCP-readable infra graph we wrote about in One graph, every source) and a system prompt that says "decompose the task into independent sub-tasks; if you can't, say so." About 80% of the requests we throw at it produce a clean fan-out plan. For the other 20%, it says "this isn't fan-out-shaped" and runs the work serially. Honesty is a feature here.
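The "independent sub-tasks, or say so" rule can also be checked mechanically after the planner answers. The sketch below assumes the planner annotates each sub-task with a hypothetical `depends_on` list, which is not part of the `plan` signature shown above. Any dependency at all makes the plan serial.

```python
def plan_shape(sub_tasks):
    """Classify a plan as "fan-out" or "serial".

    Each sub-task is a dict with "sub_task_id" and "payload", plus an
    optional, hypothetical "depends_on" list of other sub_task_ids.
    """
    ids = {t["sub_task_id"] for t in sub_tasks}
    has_deps = False
    for task in sub_tasks:
        deps = set(task.get("depends_on", []))
        unknown = deps - ids
        if unknown:
            # A dependency on something outside the plan is a planner bug.
            raise ValueError(
                f"{task['sub_task_id']} depends on unknown {sorted(unknown)}")
        has_deps = has_deps or bool(deps)
    # Any edge between sub-tasks means they cannot all run at once.
    return "serial" if has_deps else "fan-out"
```

This is deliberately strict. A plan with a few dependencies could still be run in waves, but treating it as serial keeps the fleet path reserved for work that is parallel all the way through.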

A worked-out run

The ask: "Bump the cnpg operator from 1.22 to 1.24 across all clusters, but only in the staging-or-lower envs first. Open one PR per cluster."

The orchestrator's plan:

parent_id = job-2026-05-07-cnpg-bump
sub-tasks (8):
  - cluster=dev-us-east-1
  - cluster=dev-eu-west-1
  - cluster=staging-us-east-1
  - cluster=staging-eu-west-1
  - cluster=staging-ap-south-1
  - cluster=qa-us-east-1
  - cluster=qa-eu-west-1
  - cluster=qa-ap-south-1
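A plan like this one can be pictured as a filter over a cluster inventory. The sketch below assumes the tier is the first segment of the cluster name and that "staging-or-lower" means dev, qa, and staging. Both are guesses for illustration, and the `sub_task_id` format is made up.

```python
LOWER_TIERS = {"dev", "qa", "staging"}  # assumption: what "staging-or-lower" covers


def plan_cluster_bump(job_id, clusters):
    """Return one sub-task per cluster whose tier is staging or lower."""
    sub_tasks = []
    for cluster in clusters:
        tier = cluster.split("-", 1)[0]  # "staging-eu-west-1" -> "staging"
        if tier in LOWER_TIERS:
            sub_tasks.append({
                "sub_task_id": f"{job_id}/{cluster}",  # hypothetical id format
                "payload": {"cluster": cluster},
            })
    return sub_tasks
```

Called with the eight clusters listed above plus any production clusters, it returns eight sub-tasks in inventory order and leaves production out.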

Four workers spin up in Cmux. Each claims a task. Per worker, the loop is:

1. Check out the IaC repo into a git worktree
2. Edit the cnpg values for the assigned cluster
3. Run terragrunt run-all plan --terragrunt-include-dir live/<cluster>/cnpg
4. Open a PR with the plan output in the description
5. report_done with the PR URL
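The loop above can be sketched as a list of commands per worker. Nothing here is executed; the function only builds `(cwd, argv)` pairs. The repo and worktree paths, the `fleet/<job>/<cluster>` branch naming, the commit and push steps, and the `plan.txt` file are assumptions added so the sequence hangs together. The terragrunt command is the one given above.

```python
from pathlib import PurePosixPath


def worker_steps(job_id, cluster, repo="/srv/iac",
                 worktrees="/srv/worktrees", base="origin/main"):
    """Return (cwd, argv) pairs for one worker's run; nothing is executed."""
    branch = f"fleet/{job_id}/{cluster}"  # hypothetical per-worker namespace
    tree = str(PurePosixPath(worktrees) / f"{job_id}-{cluster}")
    title = f"Bump cnpg operator on {cluster}"
    return [
        # Isolated checkout: a new worktree on a new, worker-specific branch.
        (repo, ["git", "worktree", "add", "-b", branch, tree, base]),
        # (The agent edits the cnpg values inside `tree`; not a fixed command.)
        # Plan only the assigned cluster's cnpg unit.
        (tree, ["terragrunt", "run-all", "plan",
                "--terragrunt-include-dir", f"live/{cluster}/cnpg"]),
        # Assumed glue: commit and push so the PR has a remote branch.
        (tree, ["git", "commit", "-am", title]),
        (tree, ["git", "push", "-u", "origin", branch]),
        # Open the PR; plan.txt is assumed to hold the captured plan output.
        (tree, ["gh", "pr", "create", "--head", branch,
                "--title", title, "--body-file", "plan.txt"]),
    ]
```

A runner would pass each pair to something like `subprocess.run(argv, cwd=cwd, check=True)`, capture the plan's stdout into `plan.txt` before the PR step, and finish with `report_done`.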

Eight PRs open in roughly 90 seconds. The orchestrator runs review(parent_id) and writes a summary comment on each PR linking to the others — so reviewers know "this is one of eight; the others all show the same plan output."

If one cluster's plan diverges, the orchestrator flags it. (One did in this run: staging-eu-west-1 had a stale CRD that needed an extra step, and the orchestrator caught it because that cluster's plan differed from the other seven.)
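One simple way to spot a divergent plan is to normalize each plan output, hash it, and flag anything that differs from the majority. This is a sketch of that idea, not necessarily how the orchestrator does it. The normalization (swapping the cluster name for a placeholder) is an assumption, and real plan output would likely need more scrubbing, such as timestamps and resource IDs.

```python
import hashlib
from collections import Counter


def normalize(plan_output, cluster):
    # Replace the cluster name so otherwise-identical plans compare equal.
    return plan_output.replace(cluster, "<cluster>").strip()


def divergent_clusters(plans):
    """plans: {cluster: plan_output}. Return clusters that differ from the majority."""
    if not plans:
        return []
    digests = {
        cluster: hashlib.sha256(normalize(output, cluster).encode()).hexdigest()
        for cluster, output in plans.items()
    }
    # The most common digest is taken to be the "expected" plan.
    majority, _count = Counter(digests.values()).most_common(1)[0]
    return sorted(c for c, d in digests.items() if d != majority)
```

With seven matching plans and one that differs, the function returns just the odd one out. If the plans split evenly there is no meaningful majority, and a human should look at all of them.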

What broke when we scaled it

Three classes of failure, in order of how often we hit them:

1. Lock contention on the IaC repo. Eight workers cloning, branching, and pushing simultaneously will fight each other on the same remote. We solved it by giving each worker its own git worktree and a unique branch namespace.
2. Rate limits. Eight Claude Code processes in parallel will hit your Anthropic API rate limit faster than you'd expect. We added exponential backoff at the worker level and a global token bucket in the MCP server. If the bucket is empty, claim_task returns immediately without handing out a task.
3. Coordination drift. Two workers picking the same task is a race we didn't initially handle. The fix is on the MCP server: claim_task is now a transactional read-and-update, so only one worker can claim a given task. Boring, but required.
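The rate-limit and coordination-drift fixes can be sketched together, since both live in `claim_task`. The version below uses SQLite as the task store, which is an assumption (the post does not say what the server stores state in). The table layout, the `worker` column, and the bucket sizes are illustrative too. `BEGIN IMMEDIATE` takes the write lock before the read, so two callers cannot both select the same pending row.

```python
import sqlite3
import time


class TokenBucket:
    """Global budget: `capacity` tokens, refilled at `refill_per_sec`."""

    def __init__(self, capacity, refill_per_sec, now=time.monotonic):
        self.capacity, self.refill, self.now = capacity, refill_per_sec, now
        self.tokens, self.stamp = float(capacity), now()

    def try_take(self):
        t = self.now()
        self.tokens = min(self.capacity,
                          self.tokens + (t - self.stamp) * self.refill)
        self.stamp = t
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def open_store(path=":memory:"):
    # isolation_level=None: autocommit, so transactions are managed by hand.
    db = sqlite3.connect(path, isolation_level=None)
    db.execute(
        "CREATE TABLE IF NOT EXISTS tasks ("
        "task_id TEXT PRIMARY KEY, parent_id TEXT, payload TEXT, "
        "status TEXT DEFAULT 'pending', worker TEXT)")
    return db


def claim_task(db, bucket, worker):
    if not bucket.try_take():
        return None  # bucket empty: return at once, the caller backs off
    db.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        row = db.execute(
            "SELECT task_id, payload, parent_id FROM tasks "
            "WHERE status = 'pending' ORDER BY task_id LIMIT 1").fetchone()
        if row is not None:
            db.execute(
                "UPDATE tasks SET status = 'claimed', worker = ? "
                "WHERE task_id = ?", (worker, row[0]))
        db.execute("COMMIT")
    except Exception:
        db.execute("ROLLBACK")
        raise
    if row is None:
        return None
    return {"task_id": row[0], "payload": row[1], "parent_id": row[2]}


def backoff_delay(attempt, base=1.0, cap=60.0):
    """Worker-side exponential backoff, in seconds, capped at `cap`."""
    return min(cap, base * 2 ** attempt)
```

A worker that gets `None` back sleeps for `backoff_delay(attempt)` and tries again. Adding random jitter to that delay is a common refinement so that workers do not all retry at the same instant.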

Why this beats one big agent with one big context

We tried "one agent, the whole task in one prompt, all 8 clusters in context" first. It works for small fan-outs (2–3 sub-tasks). It falls apart at scale because:

- Context grows quadratically. Each sub-task wants to see the others' state. The prompt gets enormous.
- Failure of one sub-task pollutes the others. The agent gets confused mid-run.
- You can't cancel and restart one branch of the work without losing all of it.

The fleet shape — one orchestrator, N short-lived workers, an MCP server as the only shared state — sidesteps all three. Each worker has a clean context. Failures isolate. You can kill and restart any worker. The token-cost reasoning is in Cheap agents — four token-saving moves.
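A back-of-envelope model makes the context argument concrete. It is a simplification under stated assumptions, not a measurement: the single agent is assumed to re-read all N sub-task states for each of its N sub-task steps, while each fleet worker holds its own state plus a short summary per sibling. The token figures in the usage note are made up.

```python
def single_agent_tokens(n, state_tokens):
    # One context holding all n states, re-read once per sub-task step.
    return n * (n * state_tokens)


def fleet_tokens(n, state_tokens, summary_tokens):
    # Each worker: its own state plus short summaries of its n - 1 siblings.
    return n * (state_tokens + (n - 1) * summary_tokens)
```

For example, with 8 sub-tasks, 4,000 tokens of state each, and 50-token sibling summaries, the model gives 256,000 tokens for the single agent and 34,800 for the fleet. The fleet still has a quadratic term, but it scales with the summary size rather than the full state.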

What you can run today

The pattern is what's portable, not our specific code. You need three things:

- A planner agent (we use Claude Sonnet for the orchestrator; Opus for sub-tasks that need deep reasoning)
- A worker pool (Cmux is what we run; tmux is fine; Kubernetes Jobs for fully headless setups)
- A small MCP server with five tools: claim, progress, done, blocked, siblings

The hard part isn't the agent loop; it's deciding which tasks are fan-out-shaped and being honest about which aren't. We've watched teams spend weeks trying to fan out tasks that have linear dependencies. Don't.

One agent is fine for thinking. A fleet is what you need for doing.