AI Memory Is a Database Problem

Jun 01, 2026

There’s a pattern I keep running into with AI systems. We keep treating data storage and retrieval like it’s a new problem. We build context windows, vector stores, graphs, and markdown wikis, and then we try to layer Artificial Intelligence on top of them. The result is predictable. The system slows down, answers become inconsistent, and the reliability never quite reaches the bar we expect from production software.

So instead of continuing to debate architecture in the abstract, I ran an experiment. I wanted to isolate one variable: how context is stored and retrieved. Not how it is modeled or how it is interpreted, but how the system actually gets the data it needs to answer a question.

“One of the miseries of life is that everybody names things a little bit wrong. And so it makes everything a little harder to understand in the world than it would be if it were named differently. A computer does not primarily compute in the sense of doing arithmetic. […] They primarily are filing systems.”
— Richard Feynman, Idiosyncratic Thinking seminar (1985)

I think that framing is closer to what we are actually building with AI systems than most modern discussions about “memory” or “reasoning.” The dominant problem in practical AI systems is retrieval, not intelligence. In a chat we want the model to look something up; in an agent we want it to accomplish a task using the resources it has been given. Models only know what they were trained on, and agents can only pull in what the system gives them access to. The quality, structure, and accessibility of that context determine whether the system feels fast, reliable, and useful, or slow, inconsistent, and impossible to trust.

This isn’t a model problem, it’s a retrieval problem.

The instinct for most AI users is to blame the model. If the answer is wrong, we assume the model misunderstood the task. In practice, the model is almost always doing exactly what we told it to do with the data it was given. The failure is upstream.

If we provide too much context, latency increases and signal gets diluted. If we provide the wrong context, the model hallucinates. If the context is inconsistent across sources, the model produces answers that appear coherent but are fundamentally incorrect. These are not model failures; they are retrieval failures.

Once you accept that, the problem shifts. The system you are designing is not primarily an AI system. It is a data retrieval system that happens to use an AI model at the end.

Not all memory systems are solving the same problem.

I evaluated three approaches to storing and retrieving context from a codebase. These systems were published recently to help with the Context Graph Trillion-dollar Problem.[1]

One stored context as wiki-style markdown, one materialized a full graph of the codebase, and one used a database-style indexing approach with incremental updates. Each system was given the same corpus, the same queries, and the same interface.

The important constraint is that although the representation of the data was allowed to differ, the evaluation focused on how each system retrieved information at query time. This is where I find “AI memory” becomes the most confused. We conflate how data is structured with how it is accessed, and those are not the same problem.

Query: "Where is authentication handled?"

Markdown (LLMWiki):
- search across documents using grep/index

Graph (Graphify):
- load entire graph
- construct traversal structure
- traverse nodes

Database (MemPalace):
- lookup indexed entities
- follow relationships
- return scoped results

Each system answers the same question, but the path it takes is fundamentally different. That path is what determines performance.

Loading everything is the actual bottleneck.

The initial assumption was that graph-based systems would perform well. Graph traversal is computationally efficient when the graph is already in memory. The problem is that real systems do not operate under that assumption. The graph must be loaded, parsed, and constructed before any traversal occurs.

That cost dominates the query lifecycle. It is not an implementation detail. It is the system.

Graphify query path:

Load 3.7 GB graph from disk
Construct in-memory graph
Execute traversal

In contrast, a system that relies on indexed retrieval avoids this entirely.

MemPalace query path:

Query indexed store
Retrieve matching rows
Return scoped context

The difference between these systems is not theoretical. It is the difference between global state and selective access. One system assumes it needs everything to answer a question. The other assumes it only needs a subset.
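
To make the contrast concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. The `facts` table, its columns, and the sample rows are assumptions for illustration, not the actual schema of MemPalace or Graphify. The first function mimics the load-everything path by reading every row into memory before filtering; the second asks an indexed store for only the matching rows.

```python
import sqlite3

def build_store() -> sqlite3.Connection:
    """Create a small in-memory store; schema and rows are hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE facts (subject TEXT, predicate TEXT, object TEXT)")
    conn.execute("CREATE INDEX idx_facts_subject ON facts (subject)")
    rows = [(f"service_{i}", "depends_on", f"service_{i + 1}") for i in range(10_000)]
    rows.append(("auth_middleware", "implements", "user_authorization"))
    conn.executemany("INSERT INTO facts VALUES (?, ?, ?)", rows)
    return conn

def global_state_lookup(conn, subject):
    """Load every row into memory, then filter: work grows with corpus size."""
    everything = conn.execute("SELECT subject, predicate, object FROM facts").fetchall()
    matches = [row for row in everything if row[0] == subject]
    return matches, len(everything)  # second value: rows materialized in Python

def selective_lookup(conn, subject):
    """Ask the index for matching rows only: work grows with result size."""
    matches = conn.execute(
        "SELECT subject, predicate, object FROM facts WHERE subject = ?", (subject,)
    ).fetchall()
    return matches, len(matches)  # second value: rows materialized in Python

conn = build_store()
full_result, rows_loaded_full = global_state_lookup(conn, "auth_middleware")
scoped_result, rows_loaded_scoped = selective_lookup(conn, "auth_middleware")
print(rows_loaded_full, rows_loaded_scoped)
```

Both paths return the same single fact. The first materializes 10,001 rows to find it; the second materializes one.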

Database behavior beats representation.

Once evaluated under system-level conditions, the results were not ambiguous. The database-style system was roughly 65x faster than the markdown approach and roughly 1,000x faster than the graph-based approach, with a storage footprint about 4x and 400x smaller, respectively.

Latency (system-level):

MemPalace: ~34 ms
LLMWiki: ~2.2 s
Graphify: ~35 s

Storage:

MemPalace: 8.8 MB
LLMWiki: 33.9 MB
Graphify: 3.7 GB

This is not an incremental improvement. It is a structural difference in how the system behaves. The database-style system is faster not because it is simpler or more optimized, but because it avoids unnecessary work. It does not attempt to load or reason over the entire dataset.
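
The size of the gap can be checked directly from the figures above. This short Python snippet only does arithmetic on the reported numbers; it assumes 1 GB = 1000 MB and treats the approximate latencies as exact.

```python
# Reported figures from the measurements above (latencies are approximate).
latency_ms = {"MemPalace": 34, "LLMWiki": 2_200, "Graphify": 35_000}
storage_mb = {"MemPalace": 8.8, "LLMWiki": 33.9, "Graphify": 3_700}  # assumes 1 GB = 1000 MB

def ratios_vs_baseline(values, baseline="MemPalace"):
    """Return how many times larger each value is than the baseline's value."""
    return {
        name: value / values[baseline]
        for name, value in values.items()
        if name != baseline
    }

latency_ratios = ratios_vs_baseline(latency_ms)
storage_ratios = ratios_vs_baseline(storage_mb)

for name in latency_ratios:
    print(f"{name}: {latency_ratios[name]:.0f}x slower, {storage_ratios[name]:.1f}x larger")
```

On these numbers the latency gap is roughly 65x for the markdown wiki and about 1,000x for the graph; the storage gap is roughly 4x and 420x.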

The winning systems all follow database principles.

The system that performed best was not the one with the richest representation. It was the one that enforced constraints on how data could be accessed. It indexed entities, narrowed the search space before retrieval, and avoided full scans of the dataset.

These are not novel ideas. They are foundational database principles that have been well understood for decades. What is notable is how quickly AI systems expose the consequences of ignoring them.

Example lookup pattern:

SELECT entity_id
FROM entities
WHERE name LIKE '%auth%';

SELECT *
FROM relationships
WHERE subject_id IN (...)
AND predicate IN ('implements', 'depends_on');

The system does not attempt to infer everything at query time. It retrieves a bounded, relevant subset of data and passes that to the model. This constraint is what enables both performance and reliability.
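
As a hedged illustration, the lookup pattern above can be run end to end with Python's built-in sqlite3 module. The schema (an `entities` table with `entity_id` and `name`, a `relationships` table with `subject_id`, `predicate`, and `object_id`), the `scoped_context` helper, and the sample rows are all assumptions chosen to match the queries; the real store may look different.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema inferred from the column names in the queries above.
conn.executescript("""
CREATE TABLE entities (entity_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE relationships (
    subject_id INTEGER NOT NULL REFERENCES entities(entity_id),
    predicate  TEXT NOT NULL,
    object_id  INTEGER NOT NULL REFERENCES entities(entity_id)
);
CREATE INDEX idx_rel_subject ON relationships (subject_id, predicate);
""")
conn.executemany("INSERT INTO entities VALUES (?, ?)", [
    (1, "auth_middleware"), (2, "user_authorization"),
    (3, "payment_service"), (4, "auth_service"),
])
conn.executemany("INSERT INTO relationships VALUES (?, ?, ?)", [
    (1, "implements", 2),   # auth_middleware implements user_authorization
    (3, "depends_on", 4),   # subject is payment_service, which does not match "auth"
    (3, "calls", 1),        # illustrative row with a predicate that is not requested
])

def scoped_context(conn, term, predicates):
    """Step 1: narrow to matching entities. Step 2: fetch only their relationships."""
    ids = [row[0] for row in conn.execute(
        "SELECT entity_id FROM entities WHERE name LIKE ?", (f"%{term}%",))]
    if not ids:
        return []
    # Build one "?" placeholder per value so the IN lists stay parameterized.
    id_marks = ", ".join("?" * len(ids))
    pred_marks = ", ".join("?" * len(predicates))
    query = f"""
        SELECT s.name, r.predicate, o.name
        FROM relationships r
        JOIN entities s ON s.entity_id = r.subject_id
        JOIN entities o ON o.entity_id = r.object_id
        WHERE r.subject_id IN ({id_marks}) AND r.predicate IN ({pred_marks})
        ORDER BY r.subject_id
    """
    return conn.execute(query, [*ids, *predicates]).fetchall()

context = scoped_context(conn, "auth", ["implements", "depends_on"])
print(context)
```

One caveat on the sketch: in SQLite, a LIKE pattern with a leading wildcard cannot use an ordinary index, so the first step scans the entities table. A full-text index is one common way to avoid that.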

Context stores are an engineering primitive, not an AI feature.

This work was motivated by a practical need. I was watching an entire code directory across multiple repositories, capturing changes as they occurred. From those changes, I extracted patterns, tracked dependencies, and stored reusable context.

The goal was not to build an abstract memory system. It was to answer questions that come up in real engineering workflows, such as identifying prior implementations or understanding how services interact.

Captured context:

auth_middleware implements user_authorization

source: /internal/auth/middleware.go

payment_service depends_on auth_service

source: /services/payment/main.go

This is structured data derived from the system itself. The AI model is not responsible for generating this context. It is responsible for using it. That distinction is important.
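
A minimal sketch of how captured facts like these might be stored and refreshed incrementally, in Python with the standard-library sqlite3 module. The table layout and the `replace_facts_for_source` helper are hypothetical, and `ledger_service` is a made-up name used only to simulate a change. The idea is that when one file changes, only the facts derived from that file are rewritten, rather than rebuilding the whole store.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical layout: one row per captured fact, keyed by the file it came from.
conn.execute("""
    CREATE TABLE facts (
        subject   TEXT NOT NULL,
        predicate TEXT NOT NULL,
        object    TEXT NOT NULL,
        source    TEXT NOT NULL,
        UNIQUE (subject, predicate, object, source)
    )
""")
conn.execute("CREATE INDEX idx_facts_source ON facts (source)")

def replace_facts_for_source(conn, source, triples):
    """Swap in the facts for one changed file inside a single transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("DELETE FROM facts WHERE source = ?", (source,))
        conn.executemany(
            "INSERT INTO facts (subject, predicate, object, source) VALUES (?, ?, ?, ?)",
            [(s, p, o, source) for s, p, o in triples],
        )

# Initial capture, using the two facts shown above.
replace_facts_for_source(conn, "/internal/auth/middleware.go",
                         [("auth_middleware", "implements", "user_authorization")])
replace_facts_for_source(conn, "/services/payment/main.go",
                         [("payment_service", "depends_on", "auth_service")])

# Simulated change: the payment file gains a made-up dependency.
replace_facts_for_source(conn, "/services/payment/main.go",
                         [("payment_service", "depends_on", "auth_service"),
                          ("payment_service", "depends_on", "ledger_service")])

total = conn.execute("SELECT COUNT(*) FROM facts").fetchone()[0]
auth_rows = conn.execute(
    "SELECT subject, predicate, object FROM facts WHERE source = ?",
    ("/internal/auth/middleware.go",),
).fetchall()
print(total, auth_rows)
```

Only the rows for the changed file are touched; the fact captured from the auth middleware file is left as it was.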

If retrieval is slow, everything built on top of it breaks.

It is tempting to optimize prompts, tune models, or improve agent logic. None of those changes address the core issue if the retrieval layer is inefficient. Slow retrieval increases latency across the system. Inconsistent retrieval leads to unreliable outputs. Incorrect retrieval results in hallucinations.

The system fails because it lacks discipline in how it accesses data. The model becomes the visible surface of a deeper architectural problem.

We are rediscovering databases in real time.
