MCP for DevOps — Claude querying Loki, Prometheus, k8s, Terraform via cluster and IaC MCP servers.

March 25, 2026 · Anton Grishko

MCP for DevOps: pulling your live cluster into Claude

Model Context Protocol turns 'AI knows about my infra' into 'AI queries my infra.' Here's how we use it.

TL;DR — The Model Context Protocol lets AI agents call structured tools instead of reading screenshots. We ship two MCP servers with every Kuberly cluster: one for the IaC graph, one for the monitoring stack (Loki, Prometheus, Tempo, pod events). Every query is fresh, every answer is grounded.

What MCP changed for us

A year ago, helping a customer debug production meant tailing their logs, reading their dashboards, and telling them what to type. Now you ask the AI tool open in their IDE, "why is checkout-api returning 5xx," and it queries Loki directly via MCP. The data comes back, the AI grounds its answer in the actual logs, and it tells you (and the user) what's happening.

The shift is small but real: AI for DevOps stops being a chatbot that knows generic patterns and becomes a colleague that has access to your stack. The wider design philosophy is in Teaching an Agent to Think in Graphs.

What MCP actually is

MCP is a protocol for exposing tools and resources to LLMs. An MCP server registers a list of tools (each with a JSON schema), the host (Claude, Cursor, Claude Code, etc.) calls them, and the server returns structured data. That's it. The protocol itself is small.
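To make the register / call / return loop concrete, here is a minimal stdlib-only sketch. It is not the MCP SDK and not our server code: the registry, `register`, `list_tools`, and `call_tool` are hypothetical names, and a real server would speak JSON-RPC through an MCP library. The field names `inputSchema`, `content`, and `isError` are meant to mirror the shapes used in the MCP specification.

```python
import json
from typing import Any, Callable

# Hypothetical in-memory registry; a real server would use an MCP SDK
# and speak JSON-RPC over stdio or HTTP.
REGISTRY: dict[str, dict[str, Any]] = {}


def register(name: str, description: str, input_schema: dict[str, Any],
             handler: Callable[..., Any]) -> None:
    """Record a tool's metadata next to the function that implements it."""
    REGISTRY[name] = {
        "name": name,
        "description": description,
        "inputSchema": input_schema,
        "handler": handler,
    }


def list_tools() -> list[dict[str, Any]]:
    """What the host sees when it asks which tools exist (no handlers)."""
    return [{k: v for k, v in tool.items() if k != "handler"}
            for tool in REGISTRY.values()]


def _text(message: str, is_error: bool) -> dict[str, Any]:
    return {"isError": is_error, "content": [{"type": "text", "text": message}]}


def call_tool(name: str, arguments: dict[str, Any]) -> dict[str, Any]:
    """Check required arguments, run the handler, return structured data."""
    tool = REGISTRY.get(name)
    if tool is None:
        return _text(f"unknown tool: {name}", True)
    missing = [key for key in tool["inputSchema"].get("required", [])
               if key not in arguments]
    if missing:
        return _text(f"missing arguments: {missing}", True)
    return _text(json.dumps(tool["handler"](**arguments)), False)


# Placeholder handler: echoes its input instead of querying a real cluster.
def events(namespace: str, since: str) -> dict[str, Any]:
    return {"namespace": namespace, "since": since, "events": []}


register(
    "events",
    "List pod events in a namespace",
    {
        "type": "object",
        "properties": {"namespace": {"type": "string"},
                       "since": {"type": "string"}},
        "required": ["namespace", "since"],
    },
    events,
)
```

The host only ever sees what `list_tools()` returns, so the schema is the whole contract: anything the agent can do is something you chose to register.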

The interesting part is what you expose.

The two MCP servers we ship

kuberly-graph — exposes the IaC repo as a knowledge graph. Tools include:

blast_radius(resource: str)
  → "what breaks if I change this?"

drift(env_a: str, env_b: str)
  → "what's different between prod and dev?"

shortest_path(from: str, to: str)
  → "how is loki connected to vpc?"

The data is your repo. The graph is built from Terragrunt's dependency declarations and Kubernetes labels. The agent reads the graph and reasons about your specific stack, not a generic one. For why graphs beat embedding search on infra, see Knowledge graphs are the missing piece for AI in your infra.
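As a rough illustration of what graph tools like these compute, the sketch below runs two breadth-first searches over a toy dependency map. Everything in it is an assumption for illustration: the edge list is invented (`eks` and `s3-logs` are made-up nodes), "blast radius" is taken to mean the transitive dependents of a resource, and `shortest_path` treats edges as undirected. The real kuberly-graph tools may define both differently.

```python
from collections import deque

# Hypothetical edges meaning "key depends on each value".
DEPENDS_ON = {
    "api": ["cnpg-pooler"],
    "cnpg-pooler": ["cnpg-cluster"],
    "cnpg-cluster": ["vpc"],
    "loki": ["s3-logs", "eks"],
    "eks": ["vpc"],
}


def blast_radius(resource, graph=DEPENDS_ON):
    """Return every node that transitively depends on `resource`."""
    dependents = {}
    for node, deps in graph.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(node)
    seen, queue = set(), deque([resource])
    while queue:
        for parent in dependents.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return sorted(seen)


def shortest_path(src, dst, graph=DEPENDS_ON):
    """Fewest-hop path between two nodes, ignoring edge direction."""
    neighbours = {}
    for node, deps in graph.items():
        for dep in deps:
            neighbours.setdefault(node, set()).add(dep)
            neighbours.setdefault(dep, set()).add(node)
    previous, queue = {src: None}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:  # walk back to the start
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in sorted(neighbours.get(node, ())):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None  # no connection found
```

With this toy map, `blast_radius("cnpg-cluster")` returns `["api", "cnpg-pooler"]` and `shortest_path("loki", "vpc")` returns `["loki", "eks", "vpc"]`.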

kuberly-monitor — scoped to your cluster's monitoring stack. Tools include:

loki(query: LogQL, range: str)
prometheus(query: PromQL, step: str)
tempo(trace_id: str)
events(namespace: str, since: str)

The IAM-scoped client lives inside your VPC. Outbound traffic from the LLM goes to your monitoring stack, not the other way around. No data leaves your environment unless the AI quotes it back to the user.
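For a sense of what a tool like `loki(query, range)` might translate to on the wire, here is a sketch that builds range-query URLs for the Loki and Prometheus HTTP APIs. The endpoints and parameter names (`/loki/api/v1/query_range`, `/api/v1/query_range`, `query`, `start`, `end`, `step`, `limit`) come from those projects' public APIs. The helper names, the `range_` and `now_s` parameters, the default limit, and the base URLs are assumptions, and the sketch only builds URLs; it sends nothing.

```python
import re
from urllib.parse import urlencode

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_range(text: str) -> int:
    """Turn '15m' or '2h' into seconds; reject anything else."""
    match = re.fullmatch(r"(\d+)([smhd])", text)
    if match is None:
        raise ValueError(f"unsupported range: {text!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def loki_url(base: str, query: str, range_: str, now_s: int,
             limit: int = 100) -> str:
    """Loki range query; start and end are Unix epoch nanoseconds."""
    end_ns = now_s * 1_000_000_000
    start_ns = end_ns - parse_range(range_) * 1_000_000_000
    params = {"query": query, "start": start_ns, "end": end_ns, "limit": limit}
    return f"{base}/loki/api/v1/query_range?{urlencode(params)}"


def prometheus_url(base: str, query: str, step: str, range_: str,
                   now_s: int) -> str:
    """Prometheus range query; start and end are Unix epoch seconds."""
    params = {
        "query": query,
        "start": now_s - parse_range(range_),
        "end": now_s,
        "step": step,
    }
    return f"{base}/api/v1/query_range?{urlencode(params)}"
```

Passing the clock in as `now_s` keeps the helpers deterministic, which makes it easy to log or audit exactly which window a given agent query covered.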

How developers use it

Common ask: "the api is slow, why?"

What happens:

Agent calls prometheus(p99_latency by (route)) → finds /v1/checkout is hot
Agent calls loki({app="api"} |= "/v1/checkout" |= "error") → finds connection-pool errors
Agent calls blast_radius("api") → finds api → cnpg-pooler → cnpg-cluster
Agent calls prometheus(pg_active_connections{cluster="cnpg"}) → pool maxed out
Agent answers: "Checkout p99 is up because the connection pool to CNPG is exhausted. Pool max is 20, current usage 19. Bump pool size or increase replica count."

That's a four-call sequence across three tools. Each call returns structured data. The agent's job is composition, not generation.
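The sequence above can be sketched as plain function composition. This is a hard-coded stand-in, not how the agent works internally: a real agent chooses each next call from the previous result. The `fake_*` tools return canned values, the latency numbers are invented, and only the 19/20 pool figures come from the example above.

```python
from typing import Any, Callable

calls: list[str] = []  # order in which tools were invoked


def traced(name: str, fn: Callable[..., Any]) -> Callable[..., Any]:
    """Wrap a tool so every invocation is recorded by name."""
    def wrapper(**kwargs: Any) -> Any:
        calls.append(name)
        return fn(**kwargs)
    return wrapper


# Stand-in tools with canned responses (no cluster involved).
def fake_prometheus(query: str) -> dict[str, float]:
    if query.startswith("p99_latency"):
        return {"/v1/checkout": 2.4, "/v1/cart": 0.2}  # seconds, made up
    return {"active": 19, "max": 20}  # pool figures from the example


def fake_loki(query: str) -> list[str]:
    return ["connection pool exhausted"]


def fake_blast_radius(resource: str) -> list[str]:
    return [resource, "cnpg-pooler", "cnpg-cluster"]


def diagnose(tools: dict[str, Callable[..., Any]]) -> str:
    """Run the four calls in order and stitch the results into one answer."""
    latency = tools["prometheus"](query="p99_latency by (route)")
    route = max(latency, key=latency.get)  # slowest route
    errors = tools["loki"](query=f'{{app="api"}} |= "{route}" |= "error"')
    chain = tools["blast_radius"](resource="api")
    pool = tools["prometheus"](query='pg_active_connections{cluster="cnpg"}')
    return (f"{route} is slow: {errors[0]}; path {' -> '.join(chain)}; "
            f"pool {pool['active']}/{pool['max']}")


FAKE_TOOLS = {
    "prometheus": traced("prometheus", fake_prometheus),
    "loki": traced("loki", fake_loki),
    "blast_radius": traced("blast_radius", fake_blast_radius),
}
```

Calling `diagnose(FAKE_TOOLS)` leaves `calls` holding the four invocations in order, which is the kind of trace worth keeping for audits.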

Why this beats RAG

We tried RAG over a dump of dashboards, alerts, and runbooks for ~18 months. It works for questions like "how do I deploy to staging" because those answers are static. It fails for "what's wrong right now" because the data is stale by the time it's indexed.

MCP is the opposite — every query is fresh. The AI doesn't memorize the system, it queries it. Same way a human engineer would.

What's still hard

Permissions. The MCP server can read everything its IAM role allows. Audit what it touches. We default to read-only.
Quoting back full logs. Some logs contain PII. Add a redaction filter before MCP returns the data.
Cost. A loop of "let me check Loki one more time" can run up Loki query costs. Set rate limits.
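Two of these guardrails are easy to sketch: a redaction pass over log lines and a token-bucket limit on query calls. This is a minimal illustration, not our production filter. The two regexes catch only emails and IPv4 addresses and will miss plenty of real PII, and `TokenBucket` with its capacity and refill rate is a hypothetical helper.

```python
import re
import time

# Illustrative patterns only; real redaction needs a broader rule set.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def redact(line: str) -> str:
    """Mask emails and IPv4 addresses before a log line leaves the server."""
    line = EMAIL.sub("[email]", line)
    return IPV4.sub("[ip]", line)


class TokenBucket:
    """Allow short bursts up to `capacity`, refilled at `refill_per_s`."""

    def __init__(self, capacity: int, refill_per_s: float,
                 clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_s = refill_per_s
        self.clock = clock  # injectable so the limiter is testable
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self) -> bool:
        """Spend one token if available; otherwise refuse the call."""
        now = self.clock()
        elapsed = now - self.last
        self.tokens = min(self.capacity,
                          self.tokens + elapsed * self.refill_per_s)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A tool handler would check `bucket.allow()` before querying Loki and run every returned line through `redact()`; when the bucket is empty, returning a structured error lets the agent stop instead of retrying blindly.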

But the model — AI that actually queries your infra rather than describing infra in general — is here. Every Kuberly cluster ships both servers wired and ready. For how this looks when applied to an alert and an OOMKill, read Monitoring MCP servers — HolmesGPT and the design we shipped.