
May 14, 2026 · Anton Grishko

The Tiny Language Trick: Embedded Scripting for AI Agents

Tool calls return huge JSON. The wrong fix is another MCP tool. The right fix is giving the agent a sandboxed scripting language — Kavun, CEL, Starlark, Expr, Risor — and one query tool, not a hundred.

TL;DR — When a tool call returns 80k tokens of JSON, don't add another MCP tool to filter it. Embed a tiny scripting language (Kavun, Tengo, Starlark, CEL, Expr, Risor) and expose one tool that runs an expression over the cached result. The agent learns the language once; the tool catalog stops growing; the token bill stops climbing.

The problem: tool results are too big

Real-world AI agent workloads have an uncomfortable shape. The agent calls list_pods, gets back 320 pods × 40 fields each. It calls describe_deployment, gets back the full spec, status, events, and conditions. It calls loki_query, gets back 4,000 log lines. Each of those responses lands in the LLM context, where it crowds out everything else.

The naïve fix is to add another tool. list_pods_by_namespace. describe_deployment_short. loki_query_filtered. Six months in, you have 200 MCP tools and the agent still doesn't have the one you'd actually want, because the one you'd actually want is whatever filter the agent invents in the moment.

The better shape: cache the big result on the server side, give it an id, and let the agent write a small expression against it. This is the jq pattern, but embedded inside your service so the data never leaves the host.
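A minimal sketch of that server-side cache, assuming an in-memory map and random opaque ids. The `ResultCache` type and its methods are hypothetical names chosen for illustration. A production version would also need expiry and size limits, which are left out here.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"sync"
)

// ResultCache keeps large tool results on the server, keyed by an opaque id.
// Hypothetical type for illustration only: no TTL, no size limit.
type ResultCache struct {
	mu      sync.RWMutex
	entries map[string]any
}

func NewResultCache() *ResultCache {
	return &ResultCache{entries: make(map[string]any)}
}

// Put stores a decoded tool result and returns the id handed to the agent.
func (c *ResultCache) Put(value any) (string, error) {
	buf := make([]byte, 8)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	id := "toolres_" + hex.EncodeToString(buf)
	c.mu.Lock()
	c.entries[id] = value
	c.mu.Unlock()
	return id, nil
}

// Get resolves an id back to the cached value.
func (c *ResultCache) Get(id string) (any, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	v, ok := c.entries[id]
	return v, ok
}

func main() {
	cache := NewResultCache()
	id, err := cache.Put(map[string]any{"items": []any{"pod-a", "pod-b"}})
	if err != nil {
		panic(err)
	}
	_, ok := cache.Get(id)
	fmt.Println(id, ok)
}
```

The payload stays in the host process. Only the id crosses into the model's context.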

Why Kavun is a clean fit

Kavun is a small, embeddable bytecode-VM scripting language for Go. It descends from Tengo (whose author it credits), and it bets on pipeline-style expressions:

result = pods
  .filter(p => p.status.phase == "Running")
  .map(p => { name: p.metadata.name, node: p.spec.nodeName })
  .filter(p => p.node == "ip-10-0-3-14")

Three things about that snippet matter for agent use:

It reads like data, not control flow. The agent doesn't have to invent a for loop and an accumulator. The expression is the query.
It is sandboxable. Kavun runs on a VM you control. You decide which modules are importable. The script can't exec your shell or open sockets unless you wired that up.
It is fast. Bytecode-compiled and executed in-process. No JSON marshalling round-trips. No remote calls. On the published benchmarks, Kavun sits at the top of the Go-embedded-language perf chart alongside Tengo, beating GopherLua, Starlark, and Goja on CPU geomean.

For an agent host, that combination — expression-oriented, sandboxable, fast — is exactly the trifecta you want.

The landscape: similar projects

Kavun isn't alone. The embedded-Go-scripting space is unusually rich. The right choice depends on what you're using it for.

              expression       full        sandbox         speed       host
              vs language?     stdlib?     story           (geomean)   language
───────────────────────────────────────────────────────────────────────────────────────
Kavun         pipeline DSL     small       strong          very fast   Go
Tengo         small language   small       strong          very fast   Go
Starlark      Python-like      none*       deterministic   medium      Go / Java / Rust
CEL           expressions      none        very strong     fast        Go / C++ / Java
Expr          expressions      none        strong          very fast   Go
Risor         small language   batteries   strong          fast        Go
GopherLua     full Lua         full Lua    medium          medium      Go
Goja          full JS (ES5)    full JS     medium          medium      Go
Yaegi         full Go          full Go     weak            slow        Go

* Starlark intentionally omits I/O.

Quick mental model of when to reach for each:

Tengo — Kavun's predecessor. Same idea, slightly less pipeline-friendly syntax, very mature.
Starlark — when you want deterministic execution. Famously used in Bazel and Buck build files. Excellent if your agent expressions need to be reproducible.
CEL (spec) — Google's Common Expression Language. One-expression-per-eval, no loops. Lives inside Kubernetes admission policies, Envoy, and gRPC auth. Pick it when expressions must be auditable.
Expr — Go's lighter answer to CEL. Single expressions, very fast, great for policy and feature flag rules.
Risor — newer, batteries-included scripting language for Go. More like a small Python with a stdlib. Reach for it when you want the agent (or a power user) to write multi-line scripts with file/HTTP modules behind a feature flag.
GopherLua / Goja — full languages. Bigger attack surface, bigger stdlib. Right if your agent already speaks Lua/JS or you need ecosystem packages.
Yaegi — a Go interpreter. Wonderful for plugins, overkill (and unsafe by default) for agent expressions.

This is a wide spectrum, but the design space collapses fast once you state the actual requirements: must run inside a Go service, must be sandboxable, agents must learn it in a single example, expressions must be cheap to run a thousand times a day. That tends to leave you in Kavun / Tengo / Expr / CEL territory.

The concept: one query tool, not a hundred

Once you have an embedded scripting language, the right architecture for tool results becomes obvious:

01  Agent calls a heavy tool         → server caches the result, returns an id
02  Result is too big to inline?     → return only a summary + the cached_id
03  Agent calls inspect(id, expr)    → server runs `expr` against the cached value
04  Server returns the small result  → fits in 200 tokens instead of 20,000

The inspect tool is the only extra tool you ever need. The expression is whatever the agent decides it needs.
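The four steps can be sketched in Go roughly as follows. Everything here is an assumption made for illustration: the `Server`, `ToolResponse`, and `Evaluator` names, the byte threshold for inlining, and the sequential ids. The stand-in `countOnly` evaluator is not a real language binding. In a real host the `Evaluator` hook would wrap the embedded language's VM.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Evaluator runs an expression against cached data. Hypothetical hook: a real
// host would back it with the embedded language's VM.
type Evaluator func(expr string, data any) (any, error)

// ToolResponse is what a heavy tool hands back to the agent.
type ToolResponse struct {
	CachedID string          `json:"cached_id"`
	Inline   json.RawMessage `json:"inline,omitempty"`
	Summary  string          `json:"summary,omitempty"`
}

// Server holds the cache. Not goroutine-safe; kept minimal for the sketch.
type Server struct {
	cache       map[string]any
	nextID      int
	inlineLimit int // assumed byte threshold for inlining a result
	eval        Evaluator
}

func NewServer(inlineLimit int, eval Evaluator) *Server {
	return &Server{cache: map[string]any{}, inlineLimit: inlineLimit, eval: eval}
}

// Respond covers steps 01 and 02: always cache, inline only small results.
func (s *Server) Respond(result any) (ToolResponse, error) {
	raw, err := json.Marshal(result)
	if err != nil {
		return ToolResponse{}, err
	}
	s.nextID++
	id := fmt.Sprintf("toolres_%04d", s.nextID)
	s.cache[id] = result
	if len(raw) <= s.inlineLimit {
		return ToolResponse{CachedID: id, Inline: raw}, nil
	}
	summary := fmt.Sprintf("result cached (%d bytes), query it with inspect", len(raw))
	return ToolResponse{CachedID: id, Summary: summary}, nil
}

// Inspect covers steps 03 and 04: resolve the id, run the expression.
func (s *Server) Inspect(id, expr string) (any, error) {
	data, ok := s.cache[id]
	if !ok {
		return nil, errors.New("unknown cached_id: " + id)
	}
	return s.eval(expr, data)
}

// countOnly is a stand-in evaluator that understands a single expression,
// "count", over a list. It exists only so the sketch runs end to end.
func countOnly(expr string, data any) (any, error) {
	items, ok := data.([]any)
	if !ok || expr != "count" {
		return nil, errors.New(`stand-in evaluator supports only "count" over a list`)
	}
	return len(items), nil
}

func main() {
	s := NewServer(32, countOnly)
	big := make([]any, 100)
	for i := range big {
		big[i] = fmt.Sprintf("pod-%d", i)
	}
	resp, _ := s.Respond(big)
	n, err := s.Inspect(resp.CachedID, "count")
	fmt.Println(resp.Summary, n, err)
}
```

Swapping the language means swapping the `Evaluator`. The cache and the tool surface stay the same.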

A worked example. Suppose the agent just ran list_pods and got back a cached_id. To get the names of all CrashLoopBackOff pods on a specific node, the agent calls inspect:

{
  "name": "inspect",
  "input": {
    "cached_id": "toolres_8f1c...",
    "language": "kavun",
    "expr": "data.items.filter(p => p.status.phase == \"Running\" && p.spec.nodeName == \"ip-10-0-3-14\" && p.status.containerStatuses.any(c => c.state.waiting != null && c.state.waiting.reason == \"CrashLoopBackOff\")).map(p => p.metadata.name)"
  }
}

The server resolves cached_id, runs the expression on the cached payload, and returns:

{ "value": ["api-gateway-7c4f...", "worker-2-pq91..."], "truncated": false }

Two names. Eighty bytes. The agent didn't make a second tool call. The cluster wasn't queried a second time. The result was filtered in the cache, in-process, in microseconds.
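To make explicit what that expression computes, here is the same filter written by hand in Go. This is an illustrative equivalent, not how the inspect tool is implemented. The struct models only the fields the expression touches, and `samplePayload` is made-up data with placeholder pod names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// pod models only the fields the expression reads.
type pod struct {
	Metadata struct {
		Name string `json:"name"`
	} `json:"metadata"`
	Spec struct {
		NodeName string `json:"nodeName"`
	} `json:"spec"`
	Status struct {
		Phase             string `json:"phase"`
		ContainerStatuses []struct {
			State struct {
				Waiting *struct {
					Reason string `json:"reason"`
				} `json:"waiting"`
			} `json:"state"`
		} `json:"containerStatuses"`
	} `json:"status"`
}

// CrashLoopingOn returns the names of Running pods on the given node that
// have at least one container waiting in CrashLoopBackOff.
func CrashLoopingOn(raw []byte, node string) ([]string, error) {
	var list struct {
		Items []pod `json:"items"`
	}
	if err := json.Unmarshal(raw, &list); err != nil {
		return nil, err
	}
	names := []string{}
	for _, p := range list.Items {
		if p.Status.Phase != "Running" || p.Spec.NodeName != node {
			continue
		}
		for _, c := range p.Status.ContainerStatuses {
			if c.State.Waiting != nil && c.State.Waiting.Reason == "CrashLoopBackOff" {
				names = append(names, p.Metadata.Name)
				break
			}
		}
	}
	return names, nil
}

// samplePayload is made-up data in the shape of a pod list.
const samplePayload = `{"items":[
 {"metadata":{"name":"pod-a"},"spec":{"nodeName":"ip-10-0-3-14"},"status":{"phase":"Running","containerStatuses":[{"state":{"waiting":{"reason":"CrashLoopBackOff"}}}]}},
 {"metadata":{"name":"pod-b"},"spec":{"nodeName":"ip-10-0-3-14"},"status":{"phase":"Running","containerStatuses":[{"state":{"running":{}}}]}},
 {"metadata":{"name":"pod-c"},"spec":{"nodeName":"ip-10-0-9-99"},"status":{"phase":"Running","containerStatuses":[{"state":{"waiting":{"reason":"CrashLoopBackOff"}}}]}}
]}`

func main() {
	names, err := CrashLoopingOn([]byte(samplePayload), "ip-10-0-3-14")
	fmt.Println(names, err)
}
```

Every distinct question of this kind would need another function like `CrashLoopingOn`, plus a tool to expose it. The expression form moves that work into a string the agent writes on demand.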

Do this across an agent loop and you're routinely 10× cheaper. Often more — because the alternative isn't just "large response", it's "large response, then a reflection turn, then a second tool call, then another large response". You're killing the whole spiral.

What you give up

Embedded scripting isn't free. The tradeoffs that show up in production:

Prompt overhead. The agent needs to know the language exists and how to use it. We keep a one-paragraph crib in the system prompt and rely on the model's general programming literacy. Expect that paragraph to cost a small, fixed number of context tokens on every turn.
Sandbox discipline. "Embeddable" is not the same as "safe by default". Whitelist the importable modules. Cap CPU and memory (Kavun's VM gives you levers for both). Treat any expression that touches I/O as suspect.
Error surface. Bad expressions still consume a turn. Return crisp errors — line number, the offending token, the type the expression expected — so the agent's next attempt is one shot, not three.
Determinism. Sort outputs. Cap result sizes. Two identical inspect calls should return identical bytes. Agents reason terribly about "sometimes I get this, sometimes I get that".
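The error-surface and determinism points could look something like this on the host side. The `ExprError` fields and the `Shape` helper are hypothetical, chosen to match the advice above. The `value` / `truncated` result shape follows the response shown in the worked example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// ExprError is a hypothetical error payload carrying what the agent needs to
// fix its expression in one retry.
type ExprError struct {
	Line     int    `json:"line"`
	Token    string `json:"token"`
	Expected string `json:"expected"`
	Message  string `json:"message"`
}

func (e *ExprError) Error() string {
	return fmt.Sprintf("line %d near %q: %s (expected %s)", e.Line, e.Token, e.Message, e.Expected)
}

// InspectResult mirrors the {"value": ..., "truncated": ...} response shape.
type InspectResult struct {
	Value     []string `json:"value"`
	Truncated bool     `json:"truncated"`
}

// Shape sorts and caps the values so that two identical inspect calls
// serialize to identical bytes.
func Shape(values []string, maxItems int) InspectResult {
	sorted := make([]string, len(values))
	copy(sorted, values) // leave the caller's slice untouched
	sort.Strings(sorted)
	truncated := false
	if len(sorted) > maxItems {
		sorted = sorted[:maxItems]
		truncated = true
	}
	return InspectResult{Value: sorted, Truncated: truncated}
}

func main() {
	out, _ := json.Marshal(Shape([]string{"worker-2", "api-gateway", "cron-1"}, 2))
	fmt.Println(string(out))

	var err error = &ExprError{Line: 1, Token: "=>", Expected: "bool", Message: "type mismatch"}
	fmt.Println(err)
}
```

Sorting before truncating matters: cap first and the surviving items depend on whatever order the upstream tool happened to return.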

None of these is a deal-breaker. All of them are easier than fighting tool-catalog bloat.
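For the sandbox-discipline point, a host-side sketch of two of the guards: an import allowlist and a wall-clock budget. All names here (`allowedModules`, `CheckImports`, `EvalWithBudget`, `Script`) are hypothetical, and the two sample scripts are stand-ins for real expressions. This does not cap memory, and it cannot stop a runaway script by itself. Both need support from the VM, so treat this as an outer safety net rather than the sandbox.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// allowedModules is a hypothetical allowlist of importable module names.
var allowedModules = map[string]bool{"math": true, "text": true}

// CheckImports rejects any requested module that is not on the allowlist.
func CheckImports(requested []string) error {
	for _, m := range requested {
		if !allowedModules[m] {
			return fmt.Errorf("module %q is not importable", m)
		}
	}
	return nil
}

// ErrBudgetExceeded is returned when an expression runs past its time budget.
var ErrBudgetExceeded = errors.New("expression exceeded its time budget")

// Script stands in for one compiled expression run.
type Script func(ctx context.Context) (any, error)

// defaultBudget is an assumed per-expression wall-clock limit.
const defaultBudget = 20 * time.Millisecond

// EvalWithBudget runs the script in a goroutine and stops waiting once the
// budget is spent. The goroutine itself only stops if the script honors ctx.
func EvalWithBudget(budget time.Duration, run Script) (any, error) {
	ctx, cancel := context.WithTimeout(context.Background(), budget)
	defer cancel()

	type outcome struct {
		value any
		err   error
	}
	done := make(chan outcome, 1) // buffered so a late result never blocks
	go func() {
		v, err := run(ctx)
		done <- outcome{v, err}
	}()

	select {
	case o := <-done:
		return o.value, o.err
	case <-ctx.Done():
		return nil, ErrBudgetExceeded
	}
}

// quick simulates a well-behaved expression.
func quick(ctx context.Context) (any, error) { return 42, nil }

// stuck simulates a slow script that ignores cancellation.
func stuck(ctx context.Context) (any, error) {
	time.Sleep(500 * time.Millisecond)
	return nil, nil
}

func main() {
	fmt.Println(CheckImports([]string{"math", "os"}))
	fmt.Println(EvalWithBudget(defaultBudget, quick))
	fmt.Println(EvalWithBudget(defaultBudget, stuck))
}
```

The `stuck` case shows the limitation: the caller gets its error on time, but the goroutine keeps running until the script returns. That is why an abort mechanism inside the VM is still needed.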

Why this matters for agent design

If you've read our piece on thinking in graphs, this is the same shape of argument from a different angle. Smart loading of tools says the agent shouldn't carry 200 function schemas in context. Embedded scripting says it shouldn't need 200 in the first place. One inspect tool replaces a long tail of get_X_filtered_by_Y siblings.

This composes especially well with smart tool loading: if the meta-tool catalog is small and stable, you can keep inspect always-loaded and load everything else on demand. The agent's working surface gets dramatically smaller without losing reach.

The deeper pattern: agents prefer composable primitives over named recipes. A function that filters by phase, namespace, and node is a recipe. A scripting language is a primitive. You only build the primitive once, and the agent figures out the recipes as needed.

How we do this at Kuberly

Kuberly's MCP server exposes a tool result cache and a Kavun-backed inspect tool exactly like the example above. Heavy queries — listing pods across a fleet, dumping a Loki tail, walking a Terraform state tree — go through the cache. The agent gets back a small summary and a cached_id. From there it writes whatever filter or projection it wants in Kavun, and the result lands in two or three hundred tokens instead of twenty or thirty thousand.

The practical effect: an agent can investigate a Kubernetes incident across hundreds of pods, dozens of services, and tens of thousands of log lines within a single context window, because the bulk of the data never has to round-trip through the model. We picked Kavun for the readable pipeline syntax and the strong sandbox story, but the architecture would work just as well with Tengo, Expr, or CEL: the language is replaceable, the pattern is what matters.

If you're thinking through this for your own platform, our companion pieces on agent fleets and graph-backed MCP cover the other two legs of the stool.