May 7, 2026 · Anton Grishko
One graph, every source — what kuberly-graph sees now
A few months ago we shipped a knowledge graph for the IaC repo. It now indexes six sources — Terraform code and state, live Kubernetes, ArgoCD, CUE, and the docs. Here's what that unlocks.
TL;DR —
kuberly-graph started as an index of the Terragrunt repo. It now spans Terraform code and state, live Kubernetes, ArgoCD, CUE, and docs, all readable by AI agents over MCP. The same handful of tools (owners, consumers, depends_on, docs_for, path, drift) work across all six layers.
The recap
We argued earlier, in "Knowledge graphs are the missing piece for AI in your infra", that infrastructure is a graph, not a document. We shipped kuberly-graph, an MCP-readable index of the Terragrunt repo with blast_radius, consumers, and path queries. Agents could finally answer "what does this change touch" with a real transitive walk instead of a vector lookup.
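Here is a minimal sketch of a blast_radius-style query that shows what "a real transitive walk" means. It assumes the graph is an in-memory list of (source, relation, target) triples, where `A depends_on B` means A is affected when B changes. The node IDs and the function are hypothetical and are not kuberly-graph's implementation, which runs against a graph database.

```python
from collections import deque

# Hypothetical triple shape: (source_id, relation, target_id).
Edge = tuple[str, str, str]


def blast_radius(edges: list[Edge], changed: str, relation: str = "depends_on") -> set[str]:
    """Return every node that transitively depends on `changed` (breadth-first)."""
    # Reverse index: target -> nodes with a `relation` edge pointing at it.
    dependents: dict[str, set[str]] = {}
    for src, rel, dst in edges:
        if rel == relation:
            dependents.setdefault(dst, set()).add(src)

    affected: set[str] = set()
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for dependent in dependents.get(node, set()):
            if dependent not in affected:  # also guards against cycles
                affected.add(dependent)
                queue.append(dependent)
    return affected


# Hypothetical module graph: eks and rds sit on vpc, app sits on eks.
edges = [
    ("tf:module:eks", "depends_on", "tf:module:vpc"),
    ("tf:module:rds", "depends_on", "tf:module:vpc"),
    ("tf:module:app", "depends_on", "tf:module:eks"),
]
print(sorted(blast_radius(edges, "tf:module:vpc")))
# ['tf:module:app', 'tf:module:eks', 'tf:module:rds']
```

The walk returns tf:module:app even though app never references vpc directly. A similarity lookup on "vpc" has no reason to surface it.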
That was one source. The graph now indexes six:
- Terraform / OpenTofu code — modules, variables, outputs, dependency declarations
- Terraform state — what's actually deployed, with the parameters it was deployed with
- Live Kubernetes — Deployments, Services, ConfigMaps, ExternalSecrets, ownership chains, namespace boundaries
- ArgoCD — Applications, ApplicationSets, sync status, target clusters, source repos
- CUE — package imports, value references, schema constraints
- Docs — design docs, runbooks, postmortems, with edges back to the resources they describe
Six sources. One graph. One MCP endpoint. Same handful of tools the agent already knew. For the storage shape (we use Memgraph and Cypher), see "Teaching an Agent to Think in Graphs".
What one actually looks like
Numbers from one production customer install (anonymized — names omitted, structure is real):
Total 1,339 nodes 4,186 edges
Environments 2 (dev, prod)
Modules 60
Components 43
Applications 4 (2 dev · 2 prod)
Docs indexed 31
Nodes by source layer
─────────────────────
Live Kubernetes 864 (39 namespaces, 67 Deployments,
110 Services, 207 ConfigMaps,
279 Secrets, 160 ServiceAccounts)
Terraform state 324 (top types: kubectl_manifest 81,
helm_release 23, aws_iam_role 14)
Terraform code 89 (modules, resources, variables, outputs)
Docs 31 (26 runbooks/design docs · 5 OpenSpec)
ArgoCD rendered 21
CUE schemas 5
CI/CD workflows 5
Top edge relations
──────────────────
depends_on 3,128 (75% of all edges)
contains 350
reads_configmap 99
selects 97
uses_sa 92
provides 60
configures_module 59
mentions 59 (docs → resources)
reads_secret 59
owns 53
That's a mid-sized stack. Bigger customers run 3–5x these numbers.
The shape matters more than the size. K8s is by far the biggest source layer (864 nodes, about 65% of the graph) because live cluster state has the most resources. Terraform state is second. Docs and CUE are small but punch above their weight: a 31-node doc layer produces 59 mentions edges that link postmortems and runbooks back to the exact resources they're about. That layer is what makes docs_for(resource) work.
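The docs layer can back docs_for(resource) in a simple way. Index the mentions edges (doc → resource) by target once, and every lookup after that is a dictionary read. The sketch below is hypothetical in its ID format and edge tuples. Only the mentions relation name comes from the edge table above.

```python
from collections import defaultdict

Edge = tuple[str, str, str]  # (source_id, relation, target_id), hypothetical shape


def build_mentions_index(edges: list[Edge]) -> dict[str, set[str]]:
    """Map each resource ID to the set of doc IDs that mention it."""
    index: defaultdict[str, set[str]] = defaultdict(set)
    for src, rel, dst in edges:
        if rel == "mentions":  # docs -> resources
            index[dst].add(src)
    return dict(index)


def docs_for(index: dict[str, set[str]], resource: str) -> list[str]:
    """Docs that mention `resource`, sorted for stable output; empty if none."""
    return sorted(index.get(resource, set()))


edges = [
    ("doc:runbook:strapi-rollouts", "mentions", "k8s:Deployment:kuberly:kuberly-web-cms"),
    ("doc:postmortem:2026-02-14", "mentions", "k8s:Deployment:kuberly:kuberly-web-cms"),
    ("doc:runbook:vpc", "mentions", "tf:module:vpc"),
    ("k8s:Deployment:kuberly:kuberly-web-cms", "uses_sa", "k8s:ServiceAccount:kuberly:cms"),
]
index = build_mentions_index(edges)
print(docs_for(index, "k8s:Deployment:kuberly:kuberly-web-cms"))
# ['doc:postmortem:2026-02-14', 'doc:runbook:strapi-rollouts']
```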
Why six beat one
The single-source graph already beat vector RAG for "what does this change touch", but it described intent, not reality. The repo says vpc consumes shared-vpc.json. State says vpc is at version 1.4.2 with enable_nat_gateway = true. Live Kubernetes says ten Deployments restarted in the last hour. ArgoCD says one app is OutOfSync. A postmortem from February says "we tried this and broke webhook signing keys."
Each source answers a different question. Stitching them together is the point: agents stop reasoning about what should be true and start reasoning about what is true.
A walked-through query
This is a composite of a real query we ran yesterday. The user asked: "I want to bump the Strapi image. What's the impact?"
1. find(kind=workload, name~"strapi")
→ Deployment kuberly-web-cms (namespace: kuberly)
2. owners(Deployment kuberly-web-cms)
→ ArgoCD App kuberly-web (env: prod, sync: Synced, health: Healthy)
→ ExternalSecret kuberly-web-secrets (last refresh: 1h ago, OK)
3. parents(ArgoCD App kuberly-web)
→ ApplicationSet kuberly-web-envs (3 clusters: prod, staging, dev)
4. consumers(Service kuberly-web-cms)
→ Deployment kuberly-web-frontend (reads STRAPI_URL)
→ Ingress kuberly-web (host: cms.kuberly.com)
5. docs_for(Deployment kuberly-web-cms)
→ runbook: "Strapi rollouts must drain admin sessions first"
→ postmortem 2026-02-14: "image bump broke webhook signing keys"
6. drift(state="kuberly-web-cms", live="kuberly-web-cms")
→ no drift; image tag in state matches Deployment
The agent's answer: "A Strapi image bump propagates to three envs via the ApplicationSet. The frontend reads the URL via STRAPI_URL — expect a brief 5xx during rollout. The runbook says drain admin sessions first. The February postmortem says verify webhook signing keys are valid in the new image. ExternalSecret is healthy. No state drift."
Six tool calls. Six structured queries. Every node ID quoted so a human can click through and verify.
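Step 6 is the only call that compares two sources directly. Below is a minimal sketch of that comparison for container images. It makes two assumptions. First, state records one image per container name, which is a hypothetical flattened shape; real state for helm_release or kubectl_manifest stores this differently. Second, the live side is a Deployment manifest in the standard apps/v1 layout.

```python
from __future__ import annotations


def live_images(deployment: dict) -> dict[str, str]:
    """Container name -> image, read from spec.template.spec.containers of a Deployment."""
    containers = deployment["spec"]["template"]["spec"]["containers"]
    return {c["name"]: c["image"] for c in containers}


def image_drift(state_images: dict[str, str], deployment: dict) -> dict[str, tuple[str | None, str | None]]:
    """Return {container: (state_image, live_image)} for every container whose images differ.

    A container present on only one side shows up with None on the other side.
    """
    live = live_images(deployment)
    names = sorted(state_images.keys() | live.keys())
    return {
        name: (state_images.get(name), live.get(name))
        for name in names
        if state_images.get(name) != live.get(name)
    }


# Hypothetical snapshots for the Strapi Deployment from the walkthrough.
deployment = {
    "spec": {"template": {"spec": {"containers": [
        {"name": "strapi", "image": "registry.example.com/strapi:4.25.1"},
    ]}}}
}
print(image_drift({"strapi": "registry.example.com/strapi:4.25.1"}, deployment))  # {}
print(image_drift({"strapi": "registry.example.com/strapi:4.24.0"}, deployment))
# {'strapi': ('registry.example.com/strapi:4.24.0', 'registry.example.com/strapi:4.25.1')}
```

An empty result corresponds to the "no drift" answer in step 6.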
What's wired and how often
Source Refresh
────── ───────
Terraform code on push
Terraform state on apply
K8s live ~5s informer
ArgoCD ~30s poll
CUE on push
Docs on commit
Edges that span sources (e.g. ArgoCD App → Deployment → Terraform module that deploys it) are computed at query time, not pre-materialized. Each source is small enough that the join cost is in the low tens of milliseconds. We initially tried materializing every cross-source edge into Neo4j and it was a maintenance disaster — invalidations everywhere, drift between materialized and live, hours of debugging per release. The query-time approach is dumber and faster.
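One way to read "computed at query time": each source keeps its own small index, and a cross-source question is answered by chaining lookups through shared identities rather than by reading a stored edge. The sketch below assumes a hypothetical setup in which Terraform creates ArgoCD Applications as kubectl_manifest resources and ArgoCD then deploys the workloads. The index shapes, the "argocd" namespace default, and the address parsing are all illustrative assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field

Ref = tuple[str, str, str]  # (kind, namespace, name): the identity shared across sources


@dataclass
class SourceIndexes:
    # ArgoCD layer: Application name -> K8s objects it manages.
    argocd_resources: dict[str, list[Ref]] = field(default_factory=dict)
    # Terraform state layer: resource address -> K8s object that resource created.
    tf_manifests: dict[str, Ref] = field(default_factory=dict)
    argocd_namespace: str = "argocd"


def module_of(address: str) -> str:
    """'module.a.kubectl_manifest.x' -> 'module.a'; root resources -> '(root)'.

    Simplification: assumes no dots inside for_each keys.
    """
    return ".".join(address.split(".")[:-2]) or "(root)"


def lineage(idx: SourceIndexes, workload: Ref) -> list[tuple[str, str | None]]:
    """Workload -> owning ArgoCD Application -> Terraform module that created it.

    Nothing here is pre-materialized: every hop is a lookup against the
    current per-source index, so a refreshed source is visible on the next call.
    """
    chain = []
    for app, managed in idx.argocd_resources.items():
        if workload not in managed:
            continue
        app_ref = ("Application", idx.argocd_namespace, app)
        module = next(
            (module_of(addr) for addr, ref in idx.tf_manifests.items() if ref == app_ref),
            None,  # the App exists in ArgoCD but no state resource created it
        )
        chain.append((app, module))
    return chain


idx = SourceIndexes(
    argocd_resources={"kuberly-web": [("Deployment", "kuberly", "kuberly-web-cms")]},
    tf_manifests={
        "module.argocd_apps.kubectl_manifest.kuberly_web": ("Application", "argocd", "kuberly-web"),
    },
)
print(lineage(idx, ("Deployment", "kuberly", "kuberly-web-cms")))
# [('kuberly-web', 'module.argocd_apps')]
```

There is no cached cross-source edge here, so there is nothing to invalidate when one source refreshes. That avoids the maintenance problem described above.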
What we're not doing
- No graph DSL. The agent has a fixed set of tools: find, owners, parents, depends_on, consumers, docs_for, path, drift, and diff. We tried exposing a Cypher-style surface, and the agent generated queries that were either too narrow or too broad. A few well-shaped tools beat one expressive one.
- No mutations. The graph is read-only from the agent's perspective. Changes happen via PRs the autopilot opens against the IaC repo, the same trust model kuberly-graph always had (see "DevOps on autopilot").
- No public hosting. The graph runs in your VPC with the same IAM scope as kuberly-monitor. Data does not leave your environment.
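Here is a sketch of the "fixed tools, no mutations" shape. A small allowlist of named, read-only queries runs over an immutable edge tuple, and anything outside the list is rejected. The tool bodies, edge data, and `call` dispatcher are hypothetical. A real server would register the same functions with an MCP SDK rather than a plain dict.

```python
from typing import Callable

# Immutable edge set: (source_id, relation, target_id). Hypothetical IDs.
EDGES: tuple[tuple[str, str, str], ...] = (
    ("argocd:Application:argocd:kuberly-web", "owns", "k8s:Deployment:kuberly:kuberly-web-cms"),
    ("k8s:Deployment:kuberly:kuberly-web-frontend", "depends_on", "k8s:Service:kuberly:kuberly-web-cms"),
)

TOOLS: dict[str, Callable[..., list[str]]] = {}


def tool(fn: Callable[..., list[str]]) -> Callable[..., list[str]]:
    """Register a read-only query under its function name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def owners(node: str) -> list[str]:
    """Nodes with an `owns` edge pointing at `node`."""
    return sorted(src for src, rel, dst in EDGES if rel == "owns" and dst == node)


@tool
def consumers(node: str) -> list[str]:
    """Nodes with a `depends_on` edge pointing at `node`."""
    return sorted(src for src, rel, dst in EDGES if rel == "depends_on" and dst == node)


def call(name: str, **kwargs: str) -> list[str]:
    """Dispatch an agent tool call; unknown names (including any mutation) are rejected."""
    if name not in TOOLS:
        raise ValueError(f"unknown tool {name!r}; allowed: {sorted(TOOLS)}")
    return TOOLS[name](**kwargs)


print(call("owners", node="k8s:Deployment:kuberly:kuberly-web-cms"))
# ['argocd:Application:argocd:kuberly-web']
```

Each tool has a narrow, typed input and a predictable output. That keeps the agent from writing queries that are too narrow or too broad.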
What it unlocks for the autopilot
- PRs ship with impact pre-computed. When the autopilot opens a PR, the graph queries are already inlined in the body. Reviewers don't have to ask "what does this touch."
- Cross-source drift becomes a tool, not a panic. The repo says one thing, the live cluster says another, state says a third, and the graph spots it. We catch a handful of these per month that nobody had noticed before.
- Postmortems become active context. docs_for(resource) runs before every proposed change. If a February postmortem said "we tried this and it broke X," the agent flags it in the PR. This was the surprise: postmortems used to be dead text. They're now load-bearing. In the install above, 59 mentions edges quietly connect 31 docs to the resources they describe.
What's available
If you're a Kuberly customer: this is on. Open Cursor, Claude Code, Copilot, or OpenCode in your IaC repo and the MCP server registers automatically. The agent has six-source graph access. No setup.
If you're not: the design is repeatable. Pick your sources. Write a thin extractor per source. Normalize into nodes and edges with stable IDs. Expose a small set of tools over MCP. The hard part isn't the storage — it's deciding which tools to expose. Start with owners, consumers, depends_on, path, find, and docs_for. You'll add more later, but those six cover most of the "what does this touch" surface.
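For the Kubernetes layer, the "thin extractor, stable IDs" step might look like the sketch below. IDs are built from natural keys (source, kind, namespace, name) rather than the object's uid. This rests on the assumption that a recreated Deployment should keep the same graph identity. ownerReferences, which is real Kubernetes metadata (trimmed here to the fields the sketch reads), becomes owns edges. The ID format and the Node type are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    id: str
    source: str
    kind: str
    name: str


def stable_id(source: str, kind: str, *path: str) -> str:
    """Deterministic ID: re-extracting the same object yields the same string."""
    return ":".join((source, kind, *path))


def extract_k8s(objects: list[dict]) -> tuple[list[Node], list[tuple[str, str, str]]]:
    """K8s objects (as dicts) -> nodes, plus ownerReferences as `owns` edges.

    Assumes owners live in the object's namespace; cluster-scoped owners
    would need an empty namespace instead.
    """
    nodes: list[Node] = []
    edges: list[tuple[str, str, str]] = []
    for obj in objects:
        meta = obj["metadata"]
        namespace = meta.get("namespace", "")
        node_id = stable_id("k8s", obj["kind"], namespace, meta["name"])
        nodes.append(Node(node_id, "k8s", obj["kind"], meta["name"]))
        for ref in meta.get("ownerReferences", []):
            owner_id = stable_id("k8s", ref["kind"], namespace, ref["name"])
            edges.append((owner_id, "owns", node_id))
    return nodes, edges


objects = [
    {"kind": "Deployment", "metadata": {"name": "kuberly-web-cms", "namespace": "kuberly"}},
    {"kind": "ReplicaSet", "metadata": {
        "name": "kuberly-web-cms-7d9f", "namespace": "kuberly",
        "ownerReferences": [{"kind": "Deployment", "name": "kuberly-web-cms"}],
    }},
]
nodes, edges = extract_k8s(objects)
print(edges)
# [('k8s:Deployment:kuberly:kuberly-web-cms', 'owns', 'k8s:ReplicaSet:kuberly:kuberly-web-cms-7d9f')]
```

Each other source (state, ArgoCD, docs) would get its own extractor that emits the same Node and edge shape. Cross-source queries can then join on IDs.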
The graph stops being a feature when it becomes the substrate. That's where we are now.
Further reading
- Memgraph documentation — in-memory graph database with Cypher.
- openCypher — the query language spec.
- Kubernetes API concepts — what the live K8s extractor reads.
- CUE language overview — the schema layer.
- Google SRE — Postmortem culture — why docs deserve edges.
- Knowledge graphs are the missing piece for AI in your infra — the first principles.
- Teaching an Agent to Think in Graphs — agent architecture.
Want a six-source graph indexing your stack? Talk to us.