
Causal Retrieval: The Memory Problem Vector Search Can't Solve

knowledge-graph, architecture, neuroscience

Ask your AI agent: "Why did we decide to use Rust for the backend?"

A retrieval-augmented system embeds that question, searches its vector store for the nearest passages, and hands back whatever text is semantically similar to the words "decide," "Rust," and "backend." Sometimes that works, if the decision and its reasoning happen to sit in one passage that looks like the question. Often it does not, because the reason for a decision is rarely phrased like the decision itself, and the chain that led to it is scattered across many memories laid down at different times.

This is not a tuning problem. It is a structural one. Semantic similarity answers a specific question, "what looks like this?", and it answers it well. But "why did this happen?" is a different question entirely, and no amount of better embeddings will turn a nearest-neighbor search into a causal explanation. Recent surveys of agent memory name this gap directly, listing hybrid retrievers that blend semantic similarity with temporal ordering and causal graph traversal as one of the field's largely unexplored frontiers. This post is about why that gap exists and what it actually takes to close it.

Two different questions

It helps to be precise about the distinction:

"What looks like this?" is a similarity query. Given a point, find the nearest points. This is what vector search does. It is associative, content-addressable, and order-free: it does not care when things happened or what led to what.
"What caused this?" is a causal query. Given an outcome, find the chain of antecedents that produced it: the decision behind the result, the incident behind the decision, the observation behind the incident. This is a walk backward through a structure, not a lookup of nearby points.

These two questions need two different data structures. Similarity needs a metric space, a place where "near" is defined. Causality needs a directed graph, a place where "X led to Y" is an edge you can follow in a direction. A vector store is a metric space. It has no edges and no direction. You cannot traverse a causal chain in a structure that has no chains in it.
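
To make the contrast concrete, here is a minimal Python sketch. The embeddings and node names are made up for illustration. The only operation the vector side supports is ranking by cosine similarity, which is symmetric, so it cannot express which of two memories led to the other; the set of directed edges can.

```python
import math

# Toy 3-d "embeddings": made-up numbers, for illustration only.
EMBEDDINGS = {
    "observation": (0.1, 0.9, 0.2),
    "incident": (0.2, 0.7, 0.6),
    "decision": (0.9, 0.1, 0.3),
}

def cosine(a, b):
    """Cosine similarity. It is symmetric: cosine(a, b) == cosine(b, a)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def nearest(query, k=2):
    """Similarity query: rank stored points by closeness to the query, nothing more."""
    ranked = sorted(EMBEDDINGS, key=lambda key: cosine(query, EMBEDDINGS[key]), reverse=True)
    return ranked[:k]

# Causal structure: directed (cause, effect) edges. Reversing an edge changes its meaning.
CAUSAL_EDGES = {("observation", "incident"), ("incident", "decision")}

def caused(cause, effect):
    return (cause, effect) in CAUSAL_EDGES

print(caused("incident", "decision"), caused("decision", "incident"))  # True False
```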

Why flat retrieval cannot fake it

A natural objection: can't you just retrieve a big enough neighborhood and let the language model reconstruct the causality from the text? Sometimes, at small scale, with a strong model and a short history — yes. But it degrades exactly where it matters:

The links are not local. Cause and effect are often many hops and many sessions apart. "We chose Rust because the Go garbage-collector pauses hurt the edge latency we'd measured during the incident in March." The cause (a latency measurement), the context (an incident), and the decision (Rust) may be three separate memories with little surface similarity. A neighborhood around the question reaches the decision and misses the chain.
Similarity actively misleads. The passages most similar to "why did we choose Rust" are often other discussions of Rust, not the reasoning that preceded the choice. Similarity pulls toward the topic and away from the cause.
It does not scale. Reconstructing causality by stuffing a large neighborhood into a model's context is expensive, slow, and gets worse as the memory grows and the relevant chain gets buried deeper in the pool.
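
A toy illustration of the second point, offered as a sketch rather than a benchmark. It uses word-overlap (Jaccard) similarity as a simple stand-in for embedding similarity, and four hypothetical memories. The memories that share the question's vocabulary win, while the actual antecedents rank last.

```python
import re

# Hypothetical memories. Only "decision" and "rust_talk" share much vocabulary with the question.
MEMORIES = {
    "decision": "We chose Rust for the backend",
    "rust_talk": "Rust async runtimes for the backend: a comparison",
    "incident": "Incident in March: edge latency spiked",
    "measurement": "Measured Go garbage collector pauses during the incident",
}

def tokens(text):
    return set(re.findall(r"[a-z]+", text.lower()))

def jaccard(a, b):
    """Word-overlap similarity, standing in here for embedding similarity."""
    return len(a & b) / len(a | b)

def top_k(query, k=2):
    q = tokens(query)
    return sorted(MEMORIES, key=lambda key: jaccard(q, tokens(MEMORIES[key])), reverse=True)[:k]

print(top_k("Why did we decide to use Rust for the backend?"))
# ['decision', 'rust_talk']: the cause and its context are not retrieved
```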

The honest summary: flat retrieval can sometimes surface a cause when it happens to be similar and nearby. It cannot reliably reconstruct a causal chain, because it has no representation of causality to reconstruct from.

What causal retrieval actually requires

To answer "what caused this?" you need three things that a vector store does not have:

1. Typed, directed edges. Not a generic "these two memories are related," but "this memory caused that one": a relation with a direction. The arrow matters: "the incident caused the decision" is true; "the decision caused the incident" is false. Direction is the whole point of causality.
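
One way to represent this, sketched in Python under assumed names (`Relation`, `Edge`, and `MemoryGraph` are hypothetical, not shodh-memory's API): an untyped association is stored in both directions, while a causal edge is stored only from cause to effect.

```python
from dataclasses import dataclass
from enum import Enum

class Relation(Enum):
    RELATED_TO = "related_to"  # generic association: no meaningful direction
    CAUSED = "caused"          # directed: source caused target

@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    relation: Relation

class MemoryGraph:
    def __init__(self) -> None:
        self.edges: set[Edge] = set()

    def add(self, source: str, target: str, relation: Relation) -> None:
        self.edges.add(Edge(source, target, relation))
        if relation is Relation.RELATED_TO:
            # Association is symmetric, so record it both ways.
            self.edges.add(Edge(target, source, relation))

    def caused(self, cause: str, effect: str) -> bool:
        # Only the stored direction counts: the arrow is part of the fact.
        return Edge(cause, effect, Relation.CAUSED) in self.edges

graph = MemoryGraph()
graph.add("incident", "decision", Relation.CAUSED)
print(graph.caused("incident", "decision"))  # True
print(graph.caused("decision", "incident"))  # False
```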

2. A graph you can walk backward. Given an outcome node, follow its incoming causal edges to antecedents, then their antecedents, building the chain. This is a graph traversal, and it requires the graph to exist.
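
A minimal sketch of the backward walk, assuming a hypothetical `CAUSED_BY` index that maps each effect to its direct causes. It is a plain breadth-first traversal over incoming causal edges, with a visited set so that a cycle in a noisy history cannot loop forever.

```python
from collections import deque

# Hypothetical causal index: effect -> list of its direct causes.
CAUSED_BY = {
    "result": ["decision"],
    "decision": ["incident"],
    "incident": ["observation"],
}

def antecedents(outcome, caused_by):
    """Breadth-first walk backward from an outcome over incoming causal edges.

    Returns (ancestor, hops) pairs, nearest causes first.
    """
    seen = {outcome}
    queue = deque([(outcome, 0)])
    chain = []
    while queue:
        node, hops = queue.popleft()
        for cause in caused_by.get(node, []):
            if cause not in seen:  # the visited set keeps cycles from looping forever
                seen.add(cause)
                chain.append((cause, hops + 1))
                queue.append((cause, hops + 1))
    return chain

print(antecedents("result", CAUSED_BY))
# [('decision', 1), ('incident', 2), ('observation', 3)]
```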

3. A way to score the walk. Real histories are noisy; not every "because" is load-bearing. The walk has to be scored and bounded (favoring the strongest causal paths, decaying with distance, stopping before it floods) so that it returns the origin, not every weakly connected ancestor.
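
Here is a sketch of one way to score and bound the walk. It is not shodh-memory's actual algorithm. Each edge carries a hypothetical causal strength in (0, 1], a path's score is the product of its strengths times a `decay` factor per hop, and a best-first search settles each ancestor at its strongest score. Paths that fall below `min_score` are pruned, and only the `top_k` origins are returned. The parameter values are illustrative assumptions.

```python
import heapq

# Hypothetical weighted causal index: effect -> [(cause, strength in (0, 1])].
CAUSED_BY = {
    "result": [("decision", 0.9), ("weather", 0.1)],
    "decision": [("incident", 0.8)],
    "incident": [("observation", 0.9), ("rumor", 0.2)],
}

def causal_origins(outcome, caused_by, decay=0.7, min_score=0.05, top_k=3):
    """Best-first backward walk returning the top_k (ancestor, score) pairs.

    Every factor is at most 1, so scores never grow along a path and the first
    time a node is popped it holds its strongest score (as in Dijkstra).
    """
    best = {}
    heap = [(-1.0, outcome)]  # heapq is a min-heap, so scores are negated
    while heap:
        neg_score, node = heapq.heappop(heap)
        if node in best:
            continue  # already settled at an equal or higher score
        best[node] = -neg_score
        for cause, strength in caused_by.get(node, []):
            score = best[node] * strength * decay  # attenuate with each hop
            # Prune weak paths so low-confidence links cannot flood the result.
            if cause not in best and score >= min_score:
                heapq.heappush(heap, (-score, cause))
    del best[outcome]
    ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]

print([name for name, _ in causal_origins("result", CAUSED_BY)])
# ['decision', 'incident', 'observation']
```

With these numbers, `weather` passes the threshold but is cut by `top_k`, while `rumor` is pruned outright because its path score falls below `min_score`.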

How shodh-memory does it

shodh-memory is built on a knowledge graph where memories and the entities in them are nodes, and the relationships between them are typed, directed edges — including causal ones. When an experience carries causal structure (an outcome, a "because," a decision following an incident), that structure is recorded as directed edges at ingest, without sending anything to a large language model.
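
The post does not say how shodh-memory detects causal structure, so the following is only a hypothetical illustration of LLM-free extraction: a single regular expression for an explicit "because" cue, turning "effect because cause" into a directed (cause, effect) pair. A real extractor would need many more cues and far more care.

```python
import re

# Hypothetical cue: "<effect> because <cause>". Only one of many possible patterns.
BECAUSE = re.compile(r"^(?P<effect>.+?)\s+because\s+(?P<cause>.+?)\.?$", re.IGNORECASE)

def extract_causal_edge(sentence):
    """Return a directed (cause, effect) pair if the sentence has an explicit
    'because' cue, else None. Pure pattern matching: no language model call."""
    match = BECAUSE.match(sentence.strip())
    if match is None:
        return None
    return match.group("cause"), match.group("effect")

print(extract_causal_edge(
    "We chose Rust because the Go garbage-collector pauses hurt the edge latency."
))
# ('the Go garbage-collector pauses hurt the edge latency', 'We chose Rust')
```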

Recall over that graph includes a causal-origin walk: given a memory or a query, the system walks backward along causal edges to reconstruct the chain of antecedents that led to it. The walk is a scored, bounded traversal: it follows the strongest causal paths, attenuates with each hop, and returns the top origins rather than every distant ancestor. The result is not "here are some passages that look like your question." It is "here is the decision, here is the incident behind it, here is the observation behind that": the actual chain.
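
To show what a returned chain might look like, here is a simplified greedy variant of the walk. It is an assumption for illustration, not the system's traversal. At each hop it follows the strongest incoming causal edge and attenuates a running score. It stops at a root, on a cycle, or once the score drops below a threshold. The edge strengths are made up.

```python
# Hypothetical weighted causal index: effect -> [(cause, strength in (0, 1])].
CAUSED_BY = {
    "result": [("decision", 0.9)],
    "decision": [("incident", 0.8), ("hallway_chat", 0.1)],
    "incident": [("observation", 0.9)],
}

def strongest_chain(outcome, caused_by, decay=0.7, min_score=0.05):
    """Greedy backward walk that returns one readable chain, outcome first."""
    chain, score, node = [outcome], 1.0, outcome
    while caused_by.get(node):
        # Follow the strongest incoming causal edge; break ties by name for determinism.
        cause, strength = max(caused_by[node], key=lambda edge: (edge[1], edge[0]))
        score *= strength * decay  # attenuate with each hop
        if score < min_score or cause in chain:
            break  # too weak to trust, or a cycle
        chain.append(cause)
        node = cause
    return chain

print(" <- ".join(strongest_chain("result", CAUSED_BY)))
# result <- decision <- incident <- observation
```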

Two properties fall out of doing it this way. First, it is explainable: the answer to "why did I decide X" is a path you can read, not a ranking you have to trust. Second, it is deterministic and auditable: the same query against the same memory walks the same chain every time, which is exactly what a regulated or safety-critical deployment needs when it asks "show me what the decision was based on, and prove it."
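
Reproducibility is only checkable if the answer is recorded. Here is a minimal sketch, assuming the walk itself is deterministic (fixed tie-breaking, no sampling). It hashes a canonical JSON encoding of the query and the returned chain with SHA-256 and stores the digest, so an auditor can re-run the query and compare. The record layout and function names are hypothetical.

```python
import hashlib
import json

def fingerprint(query: str, chain: list[str]) -> str:
    """SHA-256 of a canonical JSON encoding of the query and the chain it returned."""
    payload = json.dumps({"query": query, "chain": chain}, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def audit_record(query: str, chain: list[str]) -> dict:
    # Hypothetical record layout, stored alongside the answer at query time.
    return {"query": query, "chain": chain, "sha256": fingerprint(query, chain)}

def verify(record: dict, rerun_chain: list[str]) -> bool:
    """True only if re-running the same query reproduced exactly the same chain."""
    return fingerprint(record["query"], rerun_chain) == record["sha256"]

record = audit_record("Why did we decide to use Rust?", ["decision", "incident", "observation"])
print(verify(record, ["decision", "incident", "observation"]))  # True
print(verify(record, ["decision", "observation"]))              # False
```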

The honest scope

Causal retrieval is a capability, not a leaderboard number. It is the thing flat retrieval structurally cannot do, and the thing the literature flags as underexplored. It is not a claim that our overall recall beats every system on every benchmark, which is a separate and ongoing effort we report honestly elsewhere. What causal retrieval gives you is a different kind of answer, the why behind the what, that a nearest-neighbor search cannot produce at any quality. The two are complementary: similarity finds what is relevant; the causal walk explains how it came to be.
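
Here is a toy sketch of that complementarity. Word-overlap (Jaccard) similarity stands in for vector search and picks the most relevant memory as a seed. A backward walk over a hypothetical `CAUSED_BY` index then explains how that memory came to be. The memories and edges are made up.

```python
import re

# Hypothetical memories and causal index: effect -> list of direct causes.
MEMORIES = {
    "decision": "We chose Rust for the backend",
    "incident": "Incident in March: edge latency spiked",
    "observation": "Measured Go garbage collector pauses at the edge",
}
CAUSED_BY = {"decision": ["incident"], "incident": ["observation"]}

def tokens(text):
    return set(re.findall(r"[a-z]+", text.lower()))

def most_similar(query):
    """Similarity step: Jaccard word overlap stands in for vector search here."""
    q = tokens(query)

    def score(key):
        t = tokens(MEMORIES[key])
        return len(q & t) / len(q | t)

    return max(MEMORIES, key=score)

def explain(query):
    """Seed with the most relevant memory, then walk backward along causal edges."""
    seed = most_similar(query)
    chain, frontier = [seed], [seed]
    while frontier:
        node = frontier.pop()
        for cause in CAUSED_BY.get(node, []):
            if cause not in chain:
                chain.append(cause)
                frontier.append(cause)
    return chain

print(explain("Why did we decide to use Rust for the backend?"))
# ['decision', 'incident', 'observation']
```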

Where this matters

Agent debugging and trust. "Why did the agent do that?" answered by the chain of memories it acted on, not a post-hoc guess.
Decision provenance. Months later, reconstruct why a choice was made and what it ruled out — across many sessions.
Root-cause analysis. From a failure, walk back to the precursor conditions, the way a good engineer reconstructs an incident.
Regulated and safety-critical systems. An inspectable, reproducible causal lineage is something an opaque, stochastic retrieval pipeline cannot offer.

The takeaway

"What looks like this?" and "what caused this?" are different questions that need different machinery. The first is a similarity search; the second is a backward walk through a directed, typed graph. Most AI memory systems only have the first, which is why "why did we decide X?" so often returns a plausible-looking near-miss. Causal retrieval is the missing half — and building it does not take a bigger model, it takes a graph with causality in it and a walk that knows which way the arrows point.

Related reading: [Knowledge Graphs and Spreading Activation](/blog/knowledge-graph-spreading-activation) · [RAG Is Not Memory](/blog/rag-is-not-memory) · [Why Not Just Vector Search?](/blog/why-not-just-vector-search) · [LLM-Free Memory](/llm-free-memory).
