Field notes
Notes from the platform.
Engineering, infrastructure, and operating-the-thing-in-prod posts from the Kuberly team.
May 15, 2026 · Anton Grishko
AI Dark Factory: A 15-Persona Multi-Agent System on the Model Context Protocol
A 15-persona multi-agent system on the Model Context Protocol (MCP): six tiers, hierarchical with parallel fan-out, and a Tier-0 router that makes the bypass observable. Includes benchmarks from the 2025–2026 literature and concrete code from the Kuberly platform.
May 14, 2026 · Anton Grishko
Teaching an Agent to Think in Graphs
Why the next leap for AI platform agents isn't more tools — it's a graph to walk, a notebook to scribble in, and heuristics that pick which tool to load next.
May 14, 2026 · Anton Grishko
The Tiny Language Trick: Embedded Scripting for AI Agents
Tool calls return huge JSON. The wrong fix is another MCP tool. The right fix is giving the agent a sandboxed scripting language — Kavun, CEL, Starlark, Expr, Risor — and one query tool, not a hundred.
May 7, 2026 · Anton Grishko
OpenCode, Claude Code, Cursor — picking the right one for your IaC repo
All three work with the autopilot. That's a bad answer. Here's how we actually pick when a customer asks.
May 7, 2026 · Anton Grishko
When one agent isn't enough — fanning out work across an agent fleet
Most AI-for-DevOps demos show one agent. Real DevOps work is embarrassingly parallel — and the fleet shape is its own design problem.
May 7, 2026 · Anton Grishko
One graph, every source — what kuberly-graph sees now
A few months ago we shipped a knowledge graph for the IaC repo. It now indexes six sources — Terraform code and state, live Kubernetes, ArgoCD, CUE, and the docs. Here's what that unlocks.
May 7, 2026 · Anton Grishko
Cmux is what tmux would be if it were designed for AI agents
tmux is one of the great pieces of software. But for running multiple AI agents in parallel on a laptop, we moved to Cmux. Here's why.
May 7, 2026 · Anton Grishko
Cheap agents — four moves that keep our token bill from eating us
Default agentic-DevOps setups burn tokens like firewood. Four moves keep our bill roughly 10x lower than a naive setup: graph-backed MCP, the caveman skill loaded first, smart per-session tool loading, and the orchestrator pattern.
May 3, 2026 · Anton Grishko
Monitoring MCP servers — what HolmesGPT got right, what's still missing
MCP turns Loki, Prometheus, and Grafana into first-class tools for AI agents. HolmesGPT was first; we built our own. Here's what worked, what didn't, and the design we settled on.
May 2, 2026 · Anton Grishko
Per-NodePool cost in Karpenter — a Grafana panel that pays for itself
Karpenter ships excellent node metrics but no per-NodePool cost. A Prometheus recording rule plus a Grafana panel cover most FinOps questions without a separate operator.
May 1, 2026 · Anton Grishko
Terragrunt 1.0 — what changed and why we bet on it
Terragrunt 1.0 stabilized the CLI surface, renamed commands, and added Stacks. Here's what we use, what we ignore, and what it means for IaC repos that already exist.
Apr 30, 2026 · Anton Grishko
ArgoCD or Flux — and why we picked ArgoCD
Both projects do GitOps well. After running both in production for years, here's the honest breakdown of where each one lands.
Apr 28, 2026 · Anton Grishko
DevOps on autopilot
How a single repo and a Dockerfile become production AWS in hours — without anyone writing Terraform by hand.
Apr 22, 2026 · Anton Grishko
Karpenter is the most underrated EKS upgrade in years
If you're still running Cluster Autoscaler on EKS in 2026, you're paying 30–40% more for compute and getting slower scale-up. Here's the case for Karpenter.
Apr 15, 2026 · Anton Grishko
Production AWS in hours, not weeks
Most teams budget 3–6 weeks for the EKS-to-production journey. Here's how a typical Kuberly customer ships in 2–3 hours.
Apr 10, 2026 · Anton Grishko
Why we ship Terragrunt, not raw Terraform
Terraform without Terragrunt at scale is copy-paste with extra steps. Here's what Terragrunt adds and where it bites.
Apr 2, 2026 · Anton Grishko
You own the IaC. You own the infra.
On the difference between a managed PaaS and a managed service. Why the eject path matters more than any feature.
Mar 25, 2026 · Anton Grishko
MCP for DevOps: pulling your live cluster into Claude
Model Context Protocol turns 'AI knows about my infra' into 'AI queries my infra.' Here's how we use it.
Mar 15, 2026 · Anton Grishko
Knowledge graphs are the missing piece for AI in your infra
Vector search is great for unstructured docs. Infrastructure isn't unstructured. Here's why a graph beats embeddings for IaC.
Mar 5, 2026 · Anton Grishko
DevOps with AI is just DevOps with leverage
Three years in, the question stopped being 'will AI replace DevOps' and became 'where does AI give the most leverage.' Here's our list.