DevOps with AI leverage — one engineer plus an AI fleet shipping many customers.

March 5, 2026 · Anton Grishko

DevOps with AI is just DevOps with leverage

Three years in, the question stopped being 'will AI replace DevOps' and became 'where does AI give the most leverage.' Here's our list.

TL;DR — Three years in, AI is not replacing DevOps — it's giving each engineer more leverage. Tier 1 wins (already paid back): PR risk review, incident triage via MCP, dashboard self-service. Tier 3 (not there yet): greenfield architecture, chaos engineering, compliance attestation. The pattern: live data + structured tools + human judgment.

The replacement frame is dead

Nobody who actually runs a platform team thinks AI is replacing DevOps in 2026. The interesting question is narrower: where does AI give the most leverage per engineer-hour?

Our list, ranked by ROI we've measured:

Tier 1 — already paid back its cost

1. Plan review on PRs. Every IaC PR gets an AI-generated risk summary posted as a comment. The agent reads the terraform plan output, classifies the change (additive / mutative / destructive), and flags risk patterns (deleting an IAM role that is still in use, deleting a non-empty S3 bucket, narrowing a security group, etc.). The human reviewer then reads both the AI summary and the raw plan. We catch 2–3 risky changes per week that we wouldn't have caught otherwise.
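The classification step can be sketched in a few lines. This is a minimal illustration, not our production agent: it assumes the plan has been exported with terraform show -json, which lists each resource under resource_changes with a change.actions array. The function names and the sample resources are made up for the example.

```python
import json

def classify(actions):
    """Map a Terraform action list to one of the review classes."""
    if "delete" in actions:      # plain delete, or replace (delete + create)
        return "destructive"
    if "update" in actions:
        return "mutative"
    if "create" in actions:
        return "additive"
    return "no-op"               # covers "no-op" and "read"

def summarize_plan(plan):
    """Group resource addresses by review class, skipping no-ops."""
    summary = {"additive": [], "mutative": [], "destructive": []}
    for rc in plan.get("resource_changes", []):
        kind = classify(rc["change"]["actions"])
        if kind in summary:
            summary[kind].append(rc["address"])
    return summary

# Illustrative plan fragment; real plans carry many more fields.
plan = json.loads("""
{"resource_changes": [
  {"address": "aws_s3_bucket.logs",     "change": {"actions": ["create"]}},
  {"address": "aws_security_group.web", "change": {"actions": ["update"]}},
  {"address": "aws_iam_role.deploy",    "change": {"actions": ["delete"]}},
  {"address": "aws_db_instance.main",   "change": {"actions": ["delete", "create"]}},
  {"address": "aws_vpc.main",           "change": {"actions": ["no-op"]}}
]}
""")

print(summarize_plan(plan))
```

Treating a replace as destructive is a deliberate choice in this sketch: for stateful resources such as databases, delete-then-create loses data just as a plain delete does.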

2. Incident triage. First-line incident response is "what changed and what's broken?" The agent can answer both: query the audit log for recent changes, query Prometheus for the on-fire metric, query Loki for related errors. It's not solving the incident — but it's getting the engineer to the right hypothesis 5 minutes faster, and at 3am that matters. The architecture is in MCP for DevOps.
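As a rough sketch of that "what changed and what's broken?" step, the function below gathers all three answers for a fixed window before the alert. The three fetchers are hypothetical stand-ins: in a real setup they would wrap the audit log, a Prometheus query, and a Loki query, and the canned data here is invented for illustration.

```python
from datetime import datetime, timedelta, timezone

def triage(alert_time, fetch_changes, fetch_metric, fetch_errors,
           lookback=timedelta(minutes=30)):
    """Collect changes, metric state, and errors for the window before an alert."""
    start = alert_time - lookback
    # Keep only changes inside the window, newest first.
    changes = sorted(
        (c for c in fetch_changes() if start <= c["at"] <= alert_time),
        key=lambda c: c["at"],
        reverse=True,
    )
    return {
        "window": (start.isoformat(), alert_time.isoformat()),
        "recent_changes": changes,
        "metric": fetch_metric(start, alert_time),
        "errors": fetch_errors(start, alert_time),
        # The most recent change is only a starting hypothesis for the engineer.
        "first_hypothesis": changes[0]["what"] if changes else None,
    }

alert = datetime(2026, 3, 5, 3, 0, tzinfo=timezone.utc)

def fake_changes():  # stand-in for an audit-log query
    return [
        {"at": alert - timedelta(hours=6), "what": "rotate TLS cert"},
        {"at": alert - timedelta(minutes=12), "what": "deploy checkout v142"},
        {"at": alert - timedelta(minutes=25), "what": "scale node group"},
    ]

report = triage(
    alert,
    fake_changes,
    lambda start, end: {"name": "http_5xx_rate", "value": 0.18},    # stand-in for Prometheus
    lambda start, end: ["checkout: connection refused to payments"],  # stand-in for Loki
)
print(report["first_hypothesis"])
```

The output is a structured report rather than a verdict, which matches the division of labor described above: the agent narrows the search, the engineer decides.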

3. Self-service Q&A in the dashboard. "Why is my checkout pod restarting?" The agent queries pod events, recent commits to the workload, and resource limits. Most of the time the answer is "OOMKilled, you set the memory limit too low." That conversation never reaches your DevOps engineer's queue.
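The restart check itself is mostly a lookup. A container is OOMKilled when it exceeds its memory limit, and the Kubernetes API records that in status.containerStatuses[].lastState.terminated.reason. The sketch below assumes the pod object has already been fetched as a dict (for example from kubectl get pod -o json); the function name and the sample pod are illustrative.

```python
def diagnose_restarts(pod):
    """Explain container restarts from a pod object in Kubernetes API shape."""
    # Memory limit per container, taken from the pod spec.
    limits = {
        c["name"]: c.get("resources", {}).get("limits", {}).get("memory")
        for c in pod["spec"]["containers"]
    }
    findings = []
    for cs in pod.get("status", {}).get("containerStatuses", []):
        terminated = cs.get("lastState", {}).get("terminated")
        if cs.get("restartCount", 0) == 0 or not terminated:
            continue
        name, reason = cs["name"], terminated.get("reason")
        if reason == "OOMKilled":
            findings.append(
                f"{name}: OOMKilled {cs['restartCount']}x; memory limit is {limits.get(name)}"
            )
        else:
            findings.append(
                f"{name}: last exit reason {reason} (code {terminated.get('exitCode')})"
            )
    return findings

# Illustrative pod, trimmed to the fields the check reads.
pod = {
    "spec": {"containers": [
        {"name": "checkout", "resources": {"limits": {"memory": "256Mi"}}},
    ]},
    "status": {"containerStatuses": [
        {"name": "checkout", "restartCount": 4,
         "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}}},
    ]},
}

print(diagnose_restarts(pod))
```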

Tier 2 — works but needs supervision

4. Generating new IaC modules. AI is great at writing the boilerplate of a new Terraform module — variables, outputs, README. It's bad at the security-critical defaults (force_destroy, deletion protection, public ingress). We let it draft, we review carefully. For the layout we ship, see Why we ship Terragrunt, not raw Terraform.
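Part of that careful review can be mechanized as a backstop. The sketch below scans the planned values (change.after in the terraform show -json output) for the three defaults named above. It is an assumption-laden illustration: it covers only three AWS resource types, and the function name and sample resources are invented.

```python
def risky_defaults(resource_changes):
    """Flag security-critical settings in planned resource values."""
    findings = []
    for rc in resource_changes:
        after = rc["change"].get("after") or {}   # "after" is null for deletes
        rtype, addr = rc["type"], rc["address"]
        if rtype == "aws_s3_bucket" and after.get("force_destroy") is True:
            findings.append((addr, "force_destroy is true"))
        if rtype == "aws_db_instance" and after.get("deletion_protection") is not True:
            findings.append((addr, "deletion_protection is not enabled"))
        if rtype == "aws_security_group":
            for rule in after.get("ingress") or []:
                if "0.0.0.0/0" in (rule.get("cidr_blocks") or []):
                    findings.append(
                        (addr, f"ingress open to 0.0.0.0/0 on port {rule.get('from_port')}")
                    )
    return findings

# Illustrative planned values for an AI-drafted module.
changes = [
    {"address": "aws_s3_bucket.assets", "type": "aws_s3_bucket",
     "change": {"after": {"force_destroy": True}}},
    {"address": "aws_db_instance.main", "type": "aws_db_instance",
     "change": {"after": {"deletion_protection": False}}},
    {"address": "aws_security_group.web", "type": "aws_security_group",
     "change": {"after": {"ingress": [
         {"from_port": 22, "to_port": 22, "cidr_blocks": ["0.0.0.0/0"]},
         {"from_port": 443, "to_port": 443, "cidr_blocks": ["10.0.0.0/8"]},
     ]}}},
]

for address, problem in risky_defaults(changes):
    print(f"{address}: {problem}")
```

A check like this does not replace the human review; it only makes sure the most common unsafe defaults never reach the reviewer unflagged.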

5. Cost analysis. "Why did our AWS bill go up 30% this month?" The agent digests Cost Explorer data, broken down by resource tag. Useful for the first cut. Final attribution still needs a human to check that the agent didn't miss a transitive cost (NAT Gateway data transfer is the classic).
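The first cut is simple arithmetic: diff two months of per-service totals and rank the movers. The sketch below assumes the totals have already been pulled from Cost Explorer into plain dicts; the figures are invented. NAT Gateway charges typically land under the "EC2 - Other" service, which is one reason they are easy to miss.

```python
def cost_deltas(previous, current, top=3):
    """Return the services with the largest month-over-month increase."""
    services = set(previous) | set(current)
    deltas = {
        s: round(current.get(s, 0.0) - previous.get(s, 0.0), 2) for s in services
    }
    # Largest increase first; service name breaks ties deterministically.
    return sorted(deltas.items(), key=lambda kv: (-kv[1], kv[0]))[:top]

# Invented monthly totals in USD, keyed by service.
previous = {"EC2 - Compute": 4000.0, "RDS": 2500.0, "EC2 - Other": 900.0, "S3": 600.0}
current = {"EC2 - Compute": 4110.0, "RDS": 2560.0, "EC2 - Other": 3100.0, "S3": 630.0}

growth = (sum(current.values()) - sum(previous.values())) / sum(previous.values())
print(f"total growth: {growth:.0%}")
for service, delta in cost_deltas(previous, current):
    print(f"{service}: +${delta:,.2f}")
```

In this made-up example almost the whole increase sits in "EC2 - Other", which is the cue for a human to break that line down by usage type rather than accept the service-level answer.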

6. Runbook generation from incidents. Take the postmortem, summarize the steps, generate a checklist for next time. Useful, but the resulting runbook still needs a human pass — AI tends to over-include "obvious" steps.

Tier 3 — not there yet

7. Greenfield architecture decisions. "Should we use Aurora or Postgres-on-CNPG?" requires reading your roadmap, your team's expertise, and your cost ceiling. AI doesn't have that context, even with RAG. We don't ask it.

8. Chaos engineering / fault injection. AI can describe LitmusChaos / Chaos Mesh experiments but doesn't understand which ones are safe in your specific cluster topology. Manual still.

9. Compliance attestation. AI can map controls to evidence but the auditor wants a human sign-off. Same workflow as before, AI just speeds up the document gathering for SOC 2 and PCI DSS.

The pattern that works

Three things are true at once:

Give the AI read access to live data (logs, metrics, IaC). Without that, it's a chatbot.
Give the AI structured tools, not free text (MCP, function calling, JSON schemas). Free-text instructions and outputs get less reliable as the context grows.
Keep humans on the judgment calls (apply, security review, customer comms). The AI surfaces options; the human picks.
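A minimal sketch of how the second and third points fit together: every tool call is checked against a declared parameter schema, and any tool marked as mutating stops at a human approval gate. The tool names, the registry layout, and the dispatch function are all hypothetical; a real deployment would express the same idea through MCP tool definitions or a function-calling API.

```python
# Hypothetical tool registry: parameter types plus a flag for state-changing tools.
TOOLS = {
    "query_metrics": {"mutating": False, "parameters": {"query": str, "minutes": int}},
    "apply_plan": {"mutating": True, "parameters": {"plan_id": str}},
}

def dispatch(call, approved_by=None):
    """Validate a structured tool call and gate mutating tools on human approval."""
    spec = TOOLS.get(call.get("tool"))
    if spec is None:
        return {"status": "rejected", "reason": "unknown tool"}
    args = call.get("arguments", {})
    params = spec["parameters"]
    # Require exactly the declared parameters, each with the declared type.
    if set(args) != set(params) or not all(
        isinstance(args[name], kind) for name, kind in params.items()
    ):
        return {"status": "rejected", "reason": "arguments do not match schema"}
    if spec["mutating"] and approved_by is None:
        return {"status": "needs_approval", "tool": call["tool"]}
    return {"status": "accepted", "tool": call["tool"], "approved_by": approved_by}

print(dispatch({"tool": "query_metrics", "arguments": {"query": "up", "minutes": 15}}))
print(dispatch({"tool": "apply_plan", "arguments": {"plan_id": "pr-example"}}))
print(dispatch({"tool": "apply_plan", "arguments": {"plan_id": "pr-example"}},
               approved_by="on-call engineer"))
```

Read-only calls go straight through, malformed calls are rejected before they touch anything, and the apply step cannot run until a named human has approved it.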

That's the leverage shape. One Kuberly DevOps engineer comfortably handles tens of customers across hundreds of clusters because the AI handles the toil. The judgment calls — which we've decided not to automate — still take the same 15 minutes they always did. The toil is what scaled out. The fleet pattern that makes it work is in When one agent isn't enough.

If you're a solo founder running on AWS and you're hesitating to hire a DevOps engineer because the cost-benefit doesn't pencil out at your stage: it doesn't have to. The autopilot does the toil; the engineer (yours or ours) does the judgment.