[Figure: knowledge graphs for infra, a flat vector list vs. a structured graph with relationships preserved.]

March 15, 2026 · Anton Grishko

Knowledge graphs are the missing piece for AI in your infra

Vector search is great for unstructured docs. Infrastructure isn't unstructured. Here's why a graph beats embeddings for IaC.

TL;DR — Vector search and RAG are great for unstructured docs. Infrastructure isn't unstructured — it's a graph. Modules depend on modules, services on services. We model the IaC repo as nodes and edges so agents can do real transitive walks instead of similarity guesses.

The bet most infra-AI products are making is wrong

The dominant pattern: dump your repo, runbooks, and Confluence into a vector store. When the user asks a question, retrieve the top-k similar chunks and stuff them into the LLM's context. Call it RAG.

This works well for unstructured docs. It fails for infrastructure.

The reason: infrastructure is a graph, not a document. Modules depend on modules. Services depend on services. Network policies declare allowed traffic. IAM roles grant access to specific resources. Helm charts compose values from other charts. When you ask "what breaks if I change shared-vpc.json," the answer is a transitive dependency walk, not a similarity search.

What a graph buys you

Three operations vector search can't do:

1. Transitive dependency. "What modules depend on vpc?" — that's a graph traversal of one or more hops. Vector search returns "modules that mention vpc," which is a different question.

2. Path queries. "How is payments-api connected to cnpg-cluster?" — the answer is a sequence of edges (Service → ServiceMonitor → Prometheus → Alert → Runbook → Database). A graph returns the path. Vector search returns documents containing both names, which usually misses intermediate hops.

3. Symmetry/asymmetry checks. "Is prod-vpc-cidr configured the same way as dev-vpc-cidr?" — a graph compares structured fields. Vector search compares text similarity, which conflates "the same value" with "wording is similar."
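The three operations above can be sketched in a few lines of Python. This is a minimal illustration, not the kuberly-graph implementation: the edge list, the function names (`dependents_of`, `find_path`, `field_diff`), and the CIDR values are all hypothetical, and a real graph store would answer these with queries rather than in-memory loops.

```python
from collections import deque

# Hypothetical edges, written as (dependent, dependency). Illustrative only.
DEPENDS_ON = [
    ("eks", "vpc"),
    ("rds", "vpc"),
    ("payments-api", "rds"),
    ("docs-site", "cdn"),
]

def dependents_of(target, edges):
    """1. Transitive dependency: everything that depends on `target`, at any depth."""
    reverse = {}
    for dependent, dependency in edges:
        reverse.setdefault(dependency, []).append(dependent)
    seen, queue = set(), deque([target])
    while queue:
        node = queue.popleft()
        for dependent in reverse.get(node, []):
            if dependent not in seen:  # visit each node once, so cycles terminate
                seen.add(dependent)
                queue.append(dependent)
    return seen

def find_path(start, goal, edges):
    """2. Path query: one shortest chain of edges from `start` to `goal`, or None."""
    forward = {}
    for src, dst in edges:
        forward.setdefault(src, []).append(dst)
    parents, queue = {start: None}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk parent links back to the start
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in forward.get(node, []):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None

def field_diff(a, b):
    """3. Symmetry check: fields whose values differ between two structured configs."""
    return {k: (a.get(k), b.get(k)) for k in a.keys() | b.keys() if a.get(k) != b.get(k)}

print(sorted(dependents_of("vpc", DEPENDS_ON)))
print(find_path("payments-api", "vpc", DEPENDS_ON))
print(field_diff({"cidr": "10.0.0.0/16", "azs": 3}, {"cidr": "10.1.0.0/16", "azs": 3}))
```

Note that `payments-api` shows up as a dependent of `vpc` even though no single edge connects them, and that `field_diff` compares values exactly instead of scoring how similar two pieces of text look.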

The kuberly-graph schema

We model an IaC repo as nodes and edges:

Nodes:
  module(name, source, version)
  resource(type, name, module)
  output(name, value_type, module)
  variable(name, type, module)
  config_file(path, env)
  workload(name, namespace, kind)

Edges:
  depends_on (module → module, from terragrunt dependency blocks)
  produces (module → output)
  consumes (module → output, from inputs entries of the form var = dependency.X.outputs.Y)
  configured_by (workload → config_file)
  declared_in (resource → module)
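One way to read this schema is as a small set of typed records with rules about which node kinds each edge may connect. The sketch below encodes the node fields and edge endpoints listed above. The Python names (`Node`, `Edge`, `make_node`, `make_edge`) and the sample property values are assumptions made for illustration, not the actual kuberly-graph data model.

```python
from dataclasses import dataclass

# Fields per node kind, copied from the schema listing.
NODE_FIELDS = {
    "module": ("name", "source", "version"),
    "resource": ("type", "name", "module"),
    "output": ("name", "value_type", "module"),
    "variable": ("name", "type", "module"),
    "config_file": ("path", "env"),
    "workload": ("name", "namespace", "kind"),
}

# (source kind, target kind) each edge kind is allowed to connect.
EDGE_ENDPOINTS = {
    "depends_on": ("module", "module"),
    "produces": ("module", "output"),
    "consumes": ("module", "output"),
    "configured_by": ("workload", "config_file"),
    "declared_in": ("resource", "module"),
}

@dataclass(frozen=True)
class Node:
    kind: str
    props: tuple  # sorted (field, value) pairs, so nodes stay hashable

@dataclass(frozen=True)
class Edge:
    kind: str
    src: Node
    dst: Node

def make_node(kind, **props):
    """Build a node, rejecting fields the schema does not list for this kind."""
    if set(props) != set(NODE_FIELDS[kind]):
        raise ValueError(f"{kind} expects fields {NODE_FIELDS[kind]}")
    return Node(kind, tuple(sorted(props.items())))

def make_edge(kind, src, dst):
    """Build an edge, rejecting endpoint kinds the schema does not allow."""
    if (src.kind, dst.kind) != EDGE_ENDPOINTS[kind]:
        raise ValueError(f"{kind} must connect {EDGE_ENDPOINTS[kind]}")
    return Edge(kind, src, dst)

# Hypothetical sample values.
vpc = make_node("module", name="vpc", source="./modules/vpc", version="1.0.0")
eks = make_node("module", name="eks", source="./modules/eks", version="1.0.0")
edge = make_edge("depends_on", eks, vpc)
```

Validating endpoints at construction time keeps malformed edges (say, a `produces` edge between two modules) out of the graph before any traversal runs.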

The graph builds from:

Parsing terragrunt.hcl files for dependency blocks → depends_on
Parsing the inputs block for dependency.X.outputs.Y references → consumes
Parsing Terraform plan JSON for resource and output declarations
Parsing rendered Helm output for workload and label-based associations
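The first two parsing steps can be approximated with regular expressions. This is a rough sketch under stated assumptions: it treats the dependency block label as the upstream module name (in Terragrunt the real target is given by `config_path`), it would also match text inside comments, and the function name `extract_edges` and the sample file are hypothetical. A production build would more likely use a proper HCL parser.

```python
import re

# Matches:  dependency "vpc" {
DEPENDENCY_BLOCK = re.compile(r'^\s*dependency\s+"([^"]+)"\s*\{', re.MULTILINE)
# Matches:  dependency.vpc.outputs.vpc_id
OUTPUT_REF = re.compile(r'\bdependency\.([A-Za-z0-9_-]+)\.outputs\.([A-Za-z0-9_]+)')

def extract_edges(module_name, hcl_text):
    """Return (depends_on, consumes) edge lists for one terragrunt.hcl file."""
    depends_on = [(module_name, dep) for dep in DEPENDENCY_BLOCK.findall(hcl_text)]
    # A set removes repeated references to the same output; sorting keeps results stable.
    consumes = sorted(
        {(module_name, f"{dep}.{out}") for dep, out in OUTPUT_REF.findall(hcl_text)}
    )
    return depends_on, consumes

# Hypothetical terragrunt.hcl for an "eks" module.
SAMPLE = '''
dependency "vpc" {
  config_path = "../vpc"
}

inputs = {
  vpc_id     = dependency.vpc.outputs.vpc_id
  subnet_ids = dependency.vpc.outputs.private_subnets
}
'''

depends_on, consumes = extract_edges("eks", SAMPLE)
print(depends_on)
print(consumes)
```

Each `dependency` block yields one `depends_on` edge, and each distinct `dependency.X.outputs.Y` reference yields one `consumes` edge pointing at a specific output.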

The graph refreshes on every IaC commit, and lookups are cached for about a minute. For the storage and query shape (we use Memgraph and Cypher), see Teaching an Agent to Think in Graphs.
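The caching behavior can be pictured as a small time-to-live cache that is cleared whenever a commit triggers a rebuild. The sketch below is an assumption about one reasonable shape, not a description of the real service: the class name `TTLCache`, the 60-second default, and the injectable `clock` parameter are all hypothetical.

```python
import time

class TTLCache:
    """Cache lookup results for a fixed number of seconds (hypothetical sketch)."""

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so tests can control time
        self._entries = {}         # key -> (expires_at, value)

    def get(self, key, compute):
        """Return the cached value for `key`, calling `compute()` on a miss or expiry."""
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]
        value = compute()
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self):
        """Drop every entry, for example when a new IaC commit rebuilds the graph."""
        self._entries.clear()

# Usage with a stand-in for an expensive graph query.
cache = TTLCache(ttl_seconds=60.0)
result = cache.get("dependents:vpc", lambda: ["eks", "rds"])
```

Using a monotonic clock avoids surprises when the wall clock is adjusted, and passing the clock in makes the expiry logic easy to exercise without sleeping.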

What agents do with it

> show blast_radius for shared-infra.json

  shared-infra.json
  ↳ vpc (consumes)
    ↳ eks (consumes vpc.outputs)
      ↳ argo-rollouts (deployed to eks)
      ↳ external-secrets (deployed to eks)
      ↳ kuberly-web (deployed to eks)
    ↳ rds (consumes vpc.outputs)
      ↳ cnpg-pooler (uses rds endpoint)
        ↳ payments-api (configured_by cnpg-pooler)

  Total downstream resources: 47
  Direct downstream modules: 5
  Risk: HIGH (VPC change requires rescheduling all EKS nodes)

That's a query over the graph. The agent didn't just describe the blast radius — it returned the actual list. The user can act on it. For the broader topology that now spans Terraform code, state, live Kubernetes, ArgoCD, CUE, and docs, see One graph, every source.
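A tree like the one above falls out of a depth-first walk over downstream edges. The sketch below uses a hypothetical adjacency map covering only part of the example, and the function name `blast_radius` is an assumption. It produces the indented listing and the set of affected nodes, and it does not attempt the resource counts or the risk rating.

```python
def blast_radius(root, downstream):
    """Walk `downstream` (node -> list of (child, label)) depth-first from `root`.

    Returns (lines, reached): an indented tree and the set of affected nodes.
    """
    lines, seen = [root], {root}

    def visit(node, depth):
        for child, label in downstream.get(node, []):
            lines.append(f"{'  ' * depth}↳ {child} ({label})")
            if child not in seen:  # expand each node once; guards against cycles
                seen.add(child)
                visit(child, depth + 1)

    visit(root, 0)
    return lines, seen - {root}

# Hypothetical subset of the edges behind the example output.
DOWNSTREAM = {
    "shared-infra.json": [("vpc", "consumes")],
    "vpc": [("eks", "consumes vpc.outputs"), ("rds", "consumes vpc.outputs")],
    "eks": [("kuberly-web", "deployed to eks")],
    "rds": [("cnpg-pooler", "uses rds endpoint")],
}

lines, reached = blast_radius("shared-infra.json", DOWNSTREAM)
print("\n".join(lines))
print(f"Affected nodes: {len(reached)}")
```

Because the result is a concrete set of nodes rather than generated prose, it can be diffed, filtered, or passed to a follow-up step such as a plan review.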

When vector search is the right tool

Unstructured docs: postmortems, runbooks, design docs
Searching for "the time we had this same issue 6 months ago"
Onboarding lookup: "where is the deploy guide"

For those, embeddings + RAG are great. We use them.

But for "what does this infra change touch" and "how is X reachable from Y," nothing beats a real graph. That's the difference between an AI that talks about your infra and an AI that reasons about your specific infra.