May 14, 2026 · Anton Grishko

Teaching an Agent to Think in Graphs

Why the next leap for AI platform agents isn't more tools — it's a graph to walk, a notebook to scribble in, and heuristics that pick which tool to load next.

TL;DR — Stop adding tools to your AI agent. Add a graph, short memory, and a way to pick the right tool at the right time. We use Memgraph for the platform graph, Postgres for short memory, and a small ranking heuristic to load only the MCP tools the agent needs right now.

The tool-bloat trap

The instinct, when an AI agent doesn't know how to do something, is to give it another tool. Another MCP server. Another function. Another endpoint. Six months in, your agent has 200 tools, a 40k-token system prompt, and worse answers than it gave with 20.

Context windows are finite. Attention is more finite. Every tool you bolt on dilutes the rest. Past a threshold, you're not making the agent more capable — you're making it more distractible.

The interesting question isn't "how many tools can my agent hold?" It's "how does my agent decide which tools it needs right now?" That question pulls you, fairly quickly, into graphs.

A graph instead of a flat catalog

Most agent tool catalogs are flat. A list. Maybe tagged. Maybe vector-indexed. You ask "which tools relate to deployments?" and you get a similarity score over names and descriptions.

That works until your platform has structure — and platforms always have structure. A Kubernetes module belongs to a cluster. A cluster belongs to an organization. A skill modifies a module. A runbook references a skill. A capability is exposed by a tool. These aren't tags. They're edges.

A graph database makes those edges first-class. In Memgraph, using the Cypher query language, the platform looks like this:

(:Org {id: 'acme'})-[:OWNS]->(:Cluster {name: 'prod-eu'})
(:Cluster)-[:HAS_MODULE]->(:IaCModule {name: 'eks-cluster'})
(:Skill {name: 'rotate-iam-key'})-[:TOUCHES]->(:IaCModule)
(:Tool {name: 'k8s_logs'})-[:EXPOSES]->(:Capability {name: 'read_logs'})
(:Capability)-[:RELEVANT_TO]->(:Skill)

Now "which tools relate to deployments" is no longer a search problem. It's a traversal:

MATCH (c:Cluster {name: $cluster})-[:HAS_MODULE]->(m:IaCModule)
      <-[:TOUCHES]-(s:Skill)<-[:RELEVANT_TO]-(cap:Capability)
      <-[:EXPOSES]-(t:Tool)
WHERE m.kind IN ['deployment', 'workload']
// count(*) is the number of matched paths that reach each tool
RETURN t.name AS tool, count(*) AS proximity
ORDER BY proximity DESC
LIMIT 10
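To make the traversal concrete without a running Memgraph instance, here is a minimal in-memory sketch in Python of what the query computes: follow HAS_MODULE out of the cluster, keep modules whose kind matches, walk TOUCHES, RELEVANT_TO and EXPOSES backwards, and count matched paths per tool. The toy graph, the `kind` values, and names such as `api-deploy`, `restart-workload` and `k8s_rollout` are hypothetical; only the relationship types come from the schema above.

```python
from collections import Counter, defaultdict

# Hypothetical toy graph: typed, directed edges (src, rel, dst) that follow the
# relationship directions in the Cypher schema above.
EDGES = [
    ("prod-eu", "HAS_MODULE", "eks-cluster"),
    ("prod-eu", "HAS_MODULE", "api-deploy"),
    ("rotate-iam-key", "TOUCHES", "eks-cluster"),
    ("restart-workload", "TOUCHES", "api-deploy"),
    ("scale-workload", "TOUCHES", "api-deploy"),
    ("read_logs", "RELEVANT_TO", "rotate-iam-key"),
    ("read_logs", "RELEVANT_TO", "restart-workload"),
    ("read_logs", "RELEVANT_TO", "scale-workload"),
    ("rollout", "RELEVANT_TO", "restart-workload"),
    ("k8s_logs", "EXPOSES", "read_logs"),
    ("k8s_rollout", "EXPOSES", "rollout"),
]
# Hypothetical module kinds: the `m.kind` property the WHERE clause filters on.
KIND = {"eks-cluster": "cluster", "api-deploy": "deployment"}

out_edges = defaultdict(list)  # (src, rel) -> [dst, ...]
in_edges = defaultdict(list)   # (dst, rel) -> [src, ...]
for src, rel, dst in EDGES:
    out_edges[(src, rel)].append(dst)
    in_edges[(dst, rel)].append(src)


def tools_near(cluster, kinds=("deployment", "workload"), limit=10):
    """Count cluster->module<-skill<-capability<-tool paths per tool."""
    counts = Counter()
    for m in out_edges[(cluster, "HAS_MODULE")]:
        if KIND.get(m) not in kinds:
            continue  # WHERE m.kind IN [...]
        for s in in_edges[(m, "TOUCHES")]:
            for cap in in_edges[(s, "RELEVANT_TO")]:
                for t in in_edges[(cap, "EXPOSES")]:
                    counts[t] += 1  # count(*), grouped by tool
    return counts.most_common(limit)  # ORDER BY ... DESC LIMIT
```

On this toy graph `k8s_logs` outranks `k8s_rollout` because two skills on the deployment module lead to it, while the `rotate-iam-key` path is dropped because its module is not a deployment or workload. Path count is only one way to read "proximity"; hop distance is another.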

The agent isn't asking "what's near the word deploy". It's asking "what's near me, in the graph, right now". That's a different and much sharper question.

Heuristics over pure search

Vector search is a hammer. It will find you something. It will not always find you the right thing, because cosine similarity has no opinion about where you're standing.

A better answer is a hybrid: vector recall to widen the funnel, then a graph-based boost to rerank. We call ours graph_boost, and the shape is unromantic:

final_score = semantic_similarity * w_sem
            + graph_proximity   * w_graph
            + recency_decay     * w_recent
            + usage_prior       * w_usage

semantic_similarity: cosine similarity between the embeddings of the user's intent and the candidate's description. Higher means closer.
graph_proximity: inverse hop count from the current task node to the candidate. Two hops beats four hops.
recency_decay: was this tool used successfully in the last N minutes on a similar node? Slight nudge up.
usage_prior: how often has this tool actually helped here historically? Heavy nudge.
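A minimal sketch of the weighted sum, assuming every signal is normalized to [0, 1]. The weights, the 30-minute half-life used for recency_decay, and the Laplace-smoothed success rate used for usage_prior are all illustrative assumptions; the post names the four signals but not their exact formulas or weights.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical weights: recency gets a small weight ("slight nudge"),
# usage history a larger one ("heavy nudge").
W_SEM, W_GRAPH, W_RECENT, W_USAGE = 0.35, 0.30, 0.10, 0.25


@dataclass
class Candidate:
    name: str
    semantic_similarity: float              # cosine similarity, assumed in [0, 1]
    hops: int                               # shortest path from the task node
    minutes_since_success: Optional[float]  # None = no recent successful use
    successes: int = 0                      # past successful uses on similar nodes
    attempts: int = 0                       # past uses on similar nodes


def graph_proximity(hops: int) -> float:
    """Inverse hop count: 1 hop -> 1.0, 2 hops -> 0.5, 4 hops -> 0.25."""
    return 1.0 / max(hops, 1)


def recency_decay(minutes: Optional[float], half_life_min: float = 30.0) -> float:
    """Exponential decay (an assumption): 1.0 just now, 0.5 after one half-life."""
    if minutes is None:
        return 0.0
    return 0.5 ** (minutes / half_life_min)


def usage_prior(successes: int, attempts: int) -> float:
    """Laplace-smoothed success rate, so a never-tried tool starts at 0.5."""
    return (successes + 1) / (attempts + 2)


def graph_boost(c: Candidate) -> float:
    """final_score as the weighted sum of the four signals."""
    return (c.semantic_similarity * W_SEM
            + graph_proximity(c.hops) * W_GRAPH
            + recency_decay(c.minutes_since_success) * W_RECENT
            + usage_prior(c.successes, c.attempts) * W_USAGE)


# Identical descriptions, different distance from where the agent stands.
generic = Candidate("generic_deploy", 0.82, hops=7, minutes_since_success=None)
scoped = Candidate("cluster_deploy", 0.82, hops=1, minutes_since_success=None)
ranked = sorted([generic, scoped], key=graph_boost, reverse=True)
print([c.name for c in ranked])  # ['cluster_deploy', 'generic_deploy']
```

With equal semantic scores, the only difference is the graph term (1.0 versus 1/7), which is enough to put the cluster-scoped tool first.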

Nothing exotic. The point is that semantic similarity is one signal of four, not the whole answer. When the agent is standing on a cluster node and asks for help, a generic "deployment" tool that's seven hops away loses to a cluster-scoped tool that's one hop away — even if their descriptions read identically.

This is what we mean by heuristics: cheap, explainable scoring functions that exploit the structure you already have. They are not ML. They do not require training. They are a dozen lines of Cypher and a weighted sum, and in our experience they beat pure embedding search by a wide margin on any platform with real structure.
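The hop counts behind graph_proximity have to come from somewhere. In Memgraph this would be a shortest-path query; to keep the sketch runnable, here it is as a plain breadth-first search in Python. Treating edges as undirected and capping the search radius at `max_hops` are assumptions, and `cluster_status` is a made-up cluster-scoped tool.

```python
from collections import defaultdict, deque

# Hypothetical toy graph using node names from the schema above, plus one
# made-up tool attached directly to the cluster. Edges are treated as
# undirected for hop counting (an assumption).
EDGES = [
    ("acme", "prod-eu"),
    ("prod-eu", "eks-cluster"),
    ("rotate-iam-key", "eks-cluster"),
    ("read_logs", "rotate-iam-key"),
    ("k8s_logs", "read_logs"),
    ("cluster_status", "prod-eu"),
]

adjacency = defaultdict(set)
for a, b in EDGES:
    adjacency[a].add(b)
    adjacency[b].add(a)


def hop_distances(anchor, max_hops=6):
    """Breadth-first search: shortest hop count from anchor to each node."""
    dist = {anchor: 0}
    queue = deque([anchor])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand past the search radius
        for nxt in adjacency[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def graph_proximity(anchor, candidate, max_hops=6):
    """1 / hops; 0.0 if the candidate is not reachable within max_hops."""
    d = hop_distances(anchor, max_hops).get(candidate)
    if d is None:
        return 0.0
    return 1.0 / max(d, 1)
```

Standing on `prod-eu`, the directly attached tool scores 1.0 and `k8s_logs`, four hops away, scores 0.25; shrink the radius to three hops and it drops out entirely.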

Short memory, persisted

The second leg of the stool is memory. Not the long-term, embedding-based "remember everything forever" kind — the short, scoped, task-bounded kind.

The agent learns things mid-session: the user clarified they only care about staging; a previous tool call returned an error worth not repeating; the cluster they're investigating is in eu-west-1. None of that belongs in the system prompt forever. All of it needs to survive until the task is done.

We keep that in Postgres, scoped per org and per cluster, exposed as MCP tools the agent itself can call:

{
  "name": "memory_write",
  "input": {
    "scope": "cluster:prod-eu",
    "kind": "observation",
    "key": "recent_error.iam_rotation",
    "value": "AccessDeniedException when rotating user `ci-runner`; suspect SCP",
    "ttl_seconds": 3600
  }
}

{
  "name": "memory_read",
  "input": {
    "scope": "cluster:prod-eu",
    "kind": "observation",
    "since_minutes": 60
  }
}

The table is boring — (scope, kind, key, value, expires_at) — and that's the point. The system prompt stays small. The agent reads what's relevant at the start of each turn and writes back what mattered. A 1-hour TTL on "observation" keeps the working set tight. A longer TTL on "decision" or "preference" lets the agent remember something the user explicitly told it.
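A minimal sketch of that table and the two tools, assuming Python's built-in sqlite3 as a stand-in for Postgres so it runs anywhere. The `written_at` column (needed to answer `since_minutes`) and the TTLs for "decision" and "preference" are assumptions; the post only gives one hour for "observation" and says the others are longer.

```python
import sqlite3
import time

# Hypothetical default TTLs per kind, in seconds.
DEFAULT_TTL = {"observation": 3600, "decision": 7 * 86400, "preference": 30 * 86400}


def connect():
    """In-memory sqlite3 database standing in for Postgres."""
    db = sqlite3.connect(":memory:")
    db.execute("""
        CREATE TABLE memory (
            scope      TEXT NOT NULL,
            kind       TEXT NOT NULL,
            key        TEXT NOT NULL,
            value      TEXT NOT NULL,
            written_at REAL NOT NULL,   -- assumption: needed for since_minutes
            expires_at REAL NOT NULL,
            PRIMARY KEY (scope, kind, key)
        )""")
    db.execute("CREATE INDEX memory_lookup ON memory (scope, kind, expires_at)")
    return db


def memory_write(db, scope, kind, key, value, ttl_seconds=None, now=None):
    """Upsert one entry; a rewrite of the same key refreshes value and TTL."""
    now = time.time() if now is None else now
    ttl = ttl_seconds if ttl_seconds is not None else DEFAULT_TTL.get(kind, 3600)
    db.execute("INSERT OR REPLACE INTO memory VALUES (?, ?, ?, ?, ?, ?)",
               (scope, kind, key, value, now, now + ttl))
    db.commit()


def memory_read(db, scope, kind, since_minutes, now=None):
    """Unexpired entries in one scope and kind, written in the last N minutes."""
    now = time.time() if now is None else now
    rows = db.execute(
        """SELECT key, value FROM memory
           WHERE scope = ? AND kind = ? AND expires_at > ? AND written_at >= ?
           ORDER BY written_at""",
        (scope, kind, now, now - since_minutes * 60))
    return dict(rows.fetchall())


def purge_expired(db, now=None):
    """Delete expired rows; returns how many were removed."""
    now = time.time() if now is None else now
    cur = db.execute("DELETE FROM memory WHERE expires_at <= ?", (now,))
    db.commit()
    return cur.rowcount
```

In Postgres the same upsert would be written with `INSERT ... ON CONFLICT (scope, kind, key) DO UPDATE`, and the timestamps would naturally be `timestamptz` columns.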

This is the notebook. It pairs with the map (the graph). The agent uses one to figure out where it is and the other to remember what it just learned.

Smart tool loading

The third leg is dynamic tool loading. Instead of declaring 200 tools at session start, the agent declares a handful of meta-tools — an atlas — and asks the atlas which capabilities it needs right now.

The loop looks like this:

01  Agent receives task
02  Agent calls atlas.locate(task) → returns a graph anchor (a node in Memgraph)
03  Agent calls atlas.recommend(anchor, k=8) → returns top-K tools by graph_boost
04  Agent loads those tool schemas into context, executes
05  On dead-end, agent calls atlas.expand(anchor, hops=2) → widens the candidate set
06  Successful tool + outcome → memory_write(usage_prior)
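The loop above can be sketched as a small driver. `ToyAtlas` is a hypothetical in-memory stand-in for the atlas MCP server, its substring-matching `locate` is a placeholder for real anchoring, the `execute` callback stands in for loading a tool schema and calling the tool (returning None on a dead end), and `usage_log` stands in for `memory_write`.

```python
from dataclasses import dataclass


@dataclass
class ToyAtlas:
    """Hypothetical in-memory stand-in for the atlas MCP server."""
    # anchor -> {radius in hops: ranked tool names at that radius}
    ranked: dict

    def locate(self, task):
        # Placeholder anchoring: first known node name mentioned in the task.
        for anchor in self.ranked:
            if anchor in task:
                return anchor
        raise LookupError(f"no graph anchor for task: {task!r}")

    def recommend(self, anchor, k=8):
        return self.ranked[anchor].get(1, [])[:k]

    def expand(self, anchor, hops=2):
        return self.ranked[anchor].get(hops, [])


def run_task(atlas, task, execute, usage_log, k=8):
    """Steps 01-06: locate, recommend, execute, expand on a dead end, record."""
    anchor = atlas.locate(task)                      # 02

    def rounds():
        yield atlas.recommend(anchor, k=k)           # 03
        yield atlas.expand(anchor, hops=2)           # 05, only reached on a dead end

    tried = set()
    for candidates in rounds():
        for tool in candidates:
            if tool in tried:
                continue                             # don't repeat a failed call
            tried.add(tool)
            outcome = execute(tool, task)            # 04: load schema, call tool
            if outcome is not None:
                usage_log.append((anchor, tool, outcome))  # 06: stand-in for memory_write
                return tool, outcome
    return None, None
```

Because `rounds()` is a generator, `expand` is only called when every recommended tool has failed, and tools already tried in the first round are skipped in the second.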

The atlas itself is implemented as a small MCP server with three tools: locate, recommend, expand. The catalog of real tools stays in a registry; only the active subset is materialized into the LLM context per turn.
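A minimal sketch of the registry/materialization split, assuming MCP-style tool definitions (`name`, `description`, and a JSON Schema `inputSchema`). The 200 placeholder tools and the two named ones are hypothetical; the point is only that the full catalog stays server-side and each turn sends the ranked subset.

```python
import json


def tool_def(name, description):
    """MCP-style tool definition: name, description, JSON Schema for input."""
    return {
        "name": name,
        "description": description,
        "inputSchema": {
            "type": "object",
            "properties": {"cluster": {"type": "string"}},
            "required": ["cluster"],
        },
    }


# Hypothetical registry: 200 placeholder tools plus two named ones. It lives
# server-side; the model never sees it whole.
REGISTRY = {f"tool_{i:03d}": tool_def(f"tool_{i:03d}", f"Placeholder tool {i}.")
            for i in range(200)}
REGISTRY["k8s_logs"] = tool_def("k8s_logs", "Read recent logs for a workload.")
REGISTRY["k8s_rollout"] = tool_def("k8s_rollout", "Restart or roll back a rollout.")


def materialize(active):
    """Return definitions for the active subset only, in ranked order."""
    missing = [name for name in active if name not in REGISTRY]
    if missing:
        raise KeyError(f"unknown tools: {missing}")
    return [REGISTRY[name] for name in active]


active = materialize(["k8s_logs", "k8s_rollout"])
full_size = len(json.dumps(list(REGISTRY.values())))
active_size = len(json.dumps(active))
print(f"{active_size} of {full_size} characters of tool schema sent this turn")
```

Character counts are only a rough proxy for tokens, but the ratio makes the design choice visible: the cost of a turn scales with the active subset, not with the size of the registry.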

This pattern — tools as apps on a phone, not buttons on a cockpit — is the most underrated agent architecture decision I've seen this year. It collapses token cost, it improves latency, and — the part most people miss — it improves answer quality, because the model isn't being asked to ignore 190 irrelevant function signatures while reasoning about the 10 that matter.

Why graphs, why now

Graph databases aren't new. Knowledge graphs aren't new. What's new is the workload: agents loop. Agents traverse. Agents make dozens of small decisions about what to look at next, and each of those decisions benefits from cheap, structured proximity queries.

The combination that's been working for us:

Memgraph for the platform graph — sub-millisecond Cypher, in-memory, easy to embed alongside an MCP server.
Postgres for short memory — boring, transactional, indexed on (scope, kind, expires_at).
Heuristics, not models, for ranking — a weighted sum is auditable; a fine-tune is not.
Dynamic tool loading via an atlas — keep the prompt small, keep the choice space close to where the agent is standing.

None of these pieces is exotic on its own. The leverage comes from connecting them: the graph gives the agent a place to be, the memory gives it a place to remember, the heuristics give it a way to choose, and dynamic loading keeps the cockpit clean enough to actually fly.

How we do this at Kuberly

The agent that runs against Kuberly platforms uses exactly this shape. Our Memgraph instance holds the IaC graph for every managed cluster — modules, dependencies, blast radius, owning team. The MCP atlas server exposes recommend_skills and a handful of context bundlers; the rest of the tool surface is loaded on demand. Short memory lives in our Postgres, scoped per org and per cluster, and the agent reads it at the start of every turn before it picks a tool.

The practical effect: a platform agent that handles tens of customers across hundreds of clusters from a single context window, without the system prompt growing past a few thousand tokens. Most of the agent's competence isn't in the prompt at all — it's in the graph it walks and the notebook it keeps.

If you want to see this idea applied end-to-end, read our companion piece, "DevOps on autopilot", which walks through what happens when this agent meets a real PR on a real cluster.