Language Models Are Few-Shot Learners — But They're Amnesiacs

In 2020, the GPT-3 paper carried one of the most quietly consequential titles in machine learning: "Language Models are Few-Shot Learners". The claim was that a large enough model could learn a new task from a handful of examples placed directly in its prompt: no fine-tuning, no gradient updates, just a few demonstrations and a question. It was true, it was surprising, and it reorganized the field around in-context learning.

It is also only half a sentence. The full version is: language models are few-shot learners who forget everything the moment the context window scrolls.

Few-shot learning is real, and it is ephemeral

In-context learning is genuine learning in one narrow sense: the model's behavior adapts to the examples you show it. But that adaptation has the lifespan of a single context window. Scroll past the examples, start a new session, or exceed the token budget, and the learning is gone. The model does not retain what it learned from you. It re-derives your context from scratch every time, or it forgets it entirely.

This is the central limitation that the scaling story papers over. A bigger model is a better few-shot learner (it needs fewer examples and handles harder tasks), but it is not a more persistent one. Doubling the parameters does not give the model a memory of last Tuesday. The few-shot learner is brilliant and amnesiac at the same time, and the amnesia is structural, not a matter of scale.
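To make the structural forgetting concrete, here is a minimal sketch of how a fixed token budget pushes early demonstrations out of the prompt. The function name `fit_to_window` and the whitespace-based token count are illustrative assumptions, not how any particular model or tokenizer works.

```python
from typing import List


def fit_to_window(messages: List[str], budget: int) -> List[str]:
    """Keep the most recent messages that fit in `budget` tokens.

    Assumption: one whitespace-separated word counts as one token.
    """
    kept: List[str] = []
    used = 0
    # Walk backwards from the newest message; stop at the first one that
    # would overflow the budget, so everything older is dropped too.
    for message in reversed(messages):
        cost = len(message.split())
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    return list(reversed(kept))


history = [
    "Example: 2+2 -> 4",
    "Example: 3+3 -> 6",
    "chat about weather today",
    "more unrelated chat here",
    "Question: 5+5 -> ?",
]

visible = fit_to_window(history, budget=12)
```

With a budget of 12 tokens, `visible` holds only the last three messages: the question survives, but both demonstrations have scrolled out of view, and nothing in the model itself retains them.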

Why this matters more in the agent era

For a chatbot answering one-off questions, ephemeral few-shot learning is fine. For an agent, something that acts over days, coordinates sub-tasks, and accumulates context about a codebase or a customer or a household, it is crippling. An agent that re-learns its entire context every session is doing the equivalent of the movie Groundhog Day: maximally capable in the moment, incapable of compounding. The intelligence is real but it does not accumulate, and accumulation is the entire point of an agent.

The instinct in the field has been to fix this with more model: longer context windows, retrieval that stuffs more text into the prompt, and various schemes that try to bake experience back into the weights. These are serious and interesting. They also share an assumption worth questioning: that the place experience should live is inside the model.

The alternative: put the learning in the memory, not the model

There is a different architecture. Instead of making the model remember, give the agent a memory: a persistent, structured store that accumulates experience and retrieves the right fragment when it is needed. The model stays a few-shot learner; the memory is what makes the few-shot context persist across sessions, so the agent learns you once and keeps it.

This reframes the famous title. Language models are few-shot learners, so feed them the right few shots, every time, from a memory that has been watching. The few examples that make in-context learning work do not have to be re-supplied by a human each session; they can be recalled from an accumulated memory of everything the agent has experienced. Few-shot learning plus memory is what ephemeral few-shot learning was always supposed to become.
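As a rough illustration of recalling demonstrations instead of re-supplying them, the sketch below persists examples to a JSON file and picks the closest ones for each new query. Everything here is hypothetical: `FewShotMemory`, `Example`, and `build_prompt` are made-up names, the word-overlap ranking stands in for whatever retrieval a real memory would use, and none of it reflects shodh-memory's actual API.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class Example:
    task_input: str
    task_output: str


class FewShotMemory:
    """Hypothetical persistent store of demonstrations, backed by a JSON file."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.examples: List[Example] = []
        # Reload whatever earlier sessions wrote to disk.
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                self.examples = [Example(**row) for row in json.load(f)]

    def remember(self, example: Example) -> None:
        self.examples.append(example)
        with open(self.path, "w", encoding="utf-8") as f:
            json.dump([asdict(e) for e in self.examples], f)

    def recall(self, query: str, k: int = 2) -> List[Example]:
        """Return the k stored examples with the highest word overlap (Jaccard)."""
        query_words = set(query.lower().split())

        def overlap(example: Example) -> float:
            words = set(example.task_input.lower().split())
            union = query_words | words
            return len(query_words & words) / len(union) if union else 0.0

        return sorted(self.examples, key=overlap, reverse=True)[:k]


def build_prompt(memory: FewShotMemory, query: str, k: int = 2) -> str:
    """Assemble a few-shot prompt from recalled examples plus the new query."""
    shots = [f"Input: {e.task_input}\nOutput: {e.task_output}"
             for e in memory.recall(query, k)]
    shots.append(f"Input: {query}\nOutput:")
    return "\n\n".join(shots)
```

A second `FewShotMemory` opened on the same path in a later session sees every example the first one stored, so the prompt can be rebuilt without a human pasting the demonstrations back in.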

Memory that learns, without an LLM in the loop

Here is the key design choice. shodh-memory's faculties are designed as a frozen seed plus a small online adapter: the entity recognizer, the relation typer, even the embedder are conceived as a pre-trained seed that personalizes to a specific user, agent, or robot from the memory's own accumulated experience. The memory has a feedback loop, a dopaminergic prediction-error signal in the spirit of Schultz's reward-learning work, that scales how much it learns from a retrieval by how surprising the outcome was. The memory learns from the consequences of its own recalls.

That is a form of self-improvement from experience — but located in the memory substrate rather than in a large model's weights, and crucially with no LLM in the loop. The recognizers are small. The whole thing fits in tens of megabytes of RAM and runs on-device. The learning is private, it is yours, and it compounds with use in a way a re-reading model never can, because the model re-derives you while the memory has been adapting to you.
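The surprise-scaled feedback loop described above can be sketched with a simple delta rule, in which the size of each update is proportional to the prediction error. This is an assumption made for illustration, not shodh-memory's actual update rule; `MemoryTrace`, `update_from_outcome`, and the `learning_rate` default are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MemoryTrace:
    content: str
    expected_value: float = 0.5  # predicted usefulness of recalling this trace


def update_from_outcome(trace: MemoryTrace, reward: float,
                        learning_rate: float = 0.5) -> float:
    """Delta-rule update: move the expectation toward the observed reward.

    Returns the prediction error, i.e. how surprising the outcome was.
    """
    prediction_error = reward - trace.expected_value
    trace.expected_value += learning_rate * prediction_error
    return prediction_error


trace = MemoryTrace("user prefers metric units")
first_error = update_from_outcome(trace, reward=1.0)   # recall helped, and that was surprising
second_error = update_from_outcome(trace, reward=1.0)  # recall helped again, less surprising
```

The second helpful recall produces a smaller error than the first, so it moves the expectation less: an outcome the memory already predicted teaches it little.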

Why "no LLM in the loop" is the unlock, not a constraint

If your strategy for persistence is to bake experience into a large model's weights, you inherit the large model's costs: you pay tokens or training compute per unit of experience, you cannot run it on the edge, and the learned state is coupled to one specific model that will be obsolete in a year. If instead the learning lives in a small, LLM-free memory, the cost scales with CPU, it runs anywhere, and the accumulated memory outlives the model — you can swap your generation model and keep everything the agent has learned about you. The memory is the durable asset; the few-shot learner is the swappable part.
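The "swappable model, durable memory" split can be sketched as an agent that owns its memory and treats the generator as a plain function. The `Agent` class, the `GenerateFn` type, and the two stub models are hypothetical stand-ins; a real generator would call an actual model, and a real memory would be far richer than a list of strings.

```python
from typing import Callable, List

# Assumption: any generation model can be wrapped as "prompt in, text out".
GenerateFn = Callable[[str], str]


class Agent:
    def __init__(self, generate: GenerateFn, memory: List[str]) -> None:
        self.generate = generate  # the swappable part
        self.memory = memory      # the durable part, held outside the model

    def ask(self, question: str) -> str:
        # Every call sees the accumulated memory, whichever model is plugged in.
        prompt = "\n".join(self.memory + [question])
        answer = self.generate(prompt)
        self.memory.append(f"{question} -> {answer}")
        return answer


def model_a(prompt: str) -> str:
    """Stub generator that reports how much context it received."""
    return f"A saw {len(prompt.splitlines())} lines"


def model_b(prompt: str) -> str:
    """A second stub, standing in for next year's replacement model."""
    return f"B saw {len(prompt.splitlines())} lines"


agent = Agent(model_a, memory=["User prefers metric units."])
first = agent.ask("How tall is Everest?")

agent.generate = model_b  # swap the model; the memory is untouched
second = agent.ask("And K2?")
```

After the swap, `model_b` still receives the original preference and the first exchange, because both live in `agent.memory` rather than in either model.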

The takeaway

Language models are few-shot learners. That sentence is true and it is famous for good reason. But the missing half — that the learning evaporates without memory — is where the next frontier actually is. The agents that compound will not be the ones with the biggest few-shot learner; they will be the ones with the best memory feeding it. And the best memory is small, local, learning, and has no language model anywhere in its loop.

Related reading: [Why Your AI Agent's Memory Is Broken](/blog/why-your-ai-agents-memory-is-broken) · [Hebbian Learning for AI Agents](/blog/hebbian-learning-ai-agents) · [RAG Is Not Memory](/blog/rag-is-not-memory) · [LLM-Free Memory](/llm-free-memory).
