Types of Memory in AI: From Working Memory to Long-Term Potentiation
The human brain stores memories in at least five fundamentally different systems. Each has different capacity, duration, encoding mechanism, and retrieval pattern. Neuroscience has spent a century mapping these systems — and AI is only now starting to catch up.
Most AI systems have one type of memory: a context window. That is like having only working memory — you can hold a few things in mind right now, but the moment you look away, they are gone. No consolidation. No long-term storage. No learning.
This guide covers every type of memory — how brains implement them, how AI systems should implement them, and where the field is headed.
┌─────────────────────────────────────────────────────┐
│ THE MEMORY TAXONOMY │
│ │
│ MEMORY │
│ │ │
│ ┌─────────┼──────────┐ │
│ │ │ │ │
│ Sensory Short-Term Long-Term │
│ (<1s) (seconds) (permanent) │
│ │ │ │ │
│ ┌──┴──┐ Working ├── Explicit │
│ │ │ Memory │ ├── Episodic │
│ Iconic Echoic (4-7 │ └── Semantic │
│ (vis) (aud) items) │ │
│ └── Implicit │
│ ├── Procedural │
│ ├── Priming │
│ └── Conditioning │
└─────────────────────────────────────────────────────┘
---
Sensory Memory
Duration: Less than 1 second
Capacity: Large but unprocessed
Brain region: Primary sensory cortices
Sensory memory is the raw buffer. Everything your senses detect hits sensory memory first — a brief, high-fidelity snapshot of the world before any processing occurs.
┌──────────────────────────────────────────────────┐
│ SENSORY MEMORY │
│ │
│ Input Stream ──→ ┌──────────┐ ──→ Decay │
│ │ Raw │ (< 1 sec) │
│ Visual ─────────→│ Buffer │ │
│ Auditory ───────→│ │──→ Attention ──→ │
│ Tactile ────────→│ (large │ Filter │
│ │ capacity)│ │ │
│ └──────────┘ ↓ │
│ Working │
│ Memory │
│ │
│ Iconic (visual): ~250ms │
│ Echoic (auditory): ~3-4 seconds │
└──────────────────────────────────────────────────┘
George Sperling (1960) proved sensory memory exists with his partial report paradigm: subjects saw a grid of letters for 50ms and could report any row if cued immediately, but only 4-5 letters if asked to report everything. The information was there — it just decayed before they could report it all.
In AI systems: The closest analogue is the token input buffer — the raw text or image that arrives at the model before any attention is applied. In transformer architectures, the initial token embeddings before self-attention are sensory memory. The information exists but has not been processed.
In shodh-memory: The raw input to the remember endpoint — the full text before entity extraction, embedding computation, and graph integration. It exists briefly as unprocessed data before the cognitive pipeline processes it.
---
Short-Term Memory / Working Memory
Duration: Seconds to minutes
Capacity: 4 plus or minus 1 items (Cowan 2001), or 7 plus or minus 2 chunks (Miller 1956)
Brain region: Prefrontal cortex, parietal cortex
Working memory is where active thinking happens. It is the mental workspace where you hold information while manipulating it — doing arithmetic in your head, following a conversation, debugging code.
┌──────────────────────────────────────────────────────┐
│ WORKING MEMORY — BADDELEY'S MODEL (1974) │
│ │
│ ┌──────────────────────┐ │
│ │ Central Executive │ │
│ │ (attention control)│ │
│ └──────┬───────┬───────┘ │
│ │ │ │
│ ┌────────┘ └────────┐ │
│ ↓ ↓ │
│ ┌───────────────┐ ┌────────────────┐ │
│ │ Phonological │ │ Visuospatial │ │
│ │ Loop │ │ Sketchpad │ │
│ │ (inner voice) │ │ (inner eye) │ │
│ │ ~2 sec buffer │ │ ~3-4 items │ │
│ └───────────────┘ └────────────────┘ │
│ │ │ │
│ └──────────┬──────────────┘ │
│ ↓ │
│ ┌────────────────┐ │
│ │ Episodic Buffer│ │
│ │ (integration) │ │
│ └────────────────┘ │
│ │
│ Capacity: 4 +/- 1 chunks (Cowan 2001) │
│ Duration: ~20 seconds without rehearsal │
└──────────────────────────────────────────────────────┘
The critical distinction between short-term memory and working memory: short-term memory is passive storage (holding a phone number), while working memory is active manipulation (rearranging that phone number's digits). Baddeley's (1974) model formalized this with specialized subsystems — a phonological loop for verbal information, a visuospatial sketchpad for spatial information, and a central executive that controls attention.
Cowan (2001) refined the capacity estimate. Miller's (1956) famous "7 plus or minus 2" was measuring chunks, not items. When you control for chunking, the true capacity of the focus of attention is about 4 items. This has profound implications for AI: context windows are not working memory. A 200K token context window is more like a very large short-term buffer — the active working set at any moment is much smaller.
In AI systems: The context window is often called "working memory," but this is imprecise. The context window is the total buffer. The model's actual working memory is the set of tokens actively influencing the current generation — determined by attention patterns, not window size.
In shodh-memory: The working memory tier holds the most recent 4-7 memories with highest activation. These are the items in the agent's immediate focus — capacity-limited, rapidly accessible, and the first to be checked during recall. Memories that leave working memory decay into session memory, then potentially consolidate into long-term storage.
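A capacity-limited tier like this can be sketched in a few lines. This is an illustrative model, not shodh-memory's actual implementation — the class name, the eviction policy, and the capacity default (taken from Cowan's 4-item estimate) are assumptions:

```python
from collections import OrderedDict

class WorkingMemory:
    """Illustrative capacity-limited tier: holds the few most recently
    activated items. Evicted items would fall through to a session tier
    in a fuller system."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.items = OrderedDict()  # key -> payload, most recent last

    def activate(self, key, payload):
        # Re-activation moves an item back into the focus of attention.
        if key in self.items:
            self.items.move_to_end(key)
        else:
            self.items[key] = payload
        evicted = []
        while len(self.items) > self.capacity:
            evicted.append(self.items.popitem(last=False))  # oldest first
        return evicted  # these would decay into session memory

wm = WorkingMemory()
for k in ["a", "b", "c", "d"]:
    wm.activate(k, k.upper())
wm.activate("a", "A")            # refreshing "a" keeps it in focus
dropped = wm.activate("e", "E")  # capacity exceeded → oldest ("b") evicted
```

The key property is that re-access protects an item from eviction — frequently touched memories stay in focus, untouched ones fall out, with no explicit bookkeeping.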
---
Long-Term Memory
Duration: Days to a lifetime
Capacity: Effectively unlimited
Brain region: Distributed (hippocampus for consolidation, neocortex for storage)
Long-term memory is everything you know that is not currently in your conscious awareness. Your name, how to ride a bicycle, what happened on your birthday last year, the meaning of the word "epistemology" — all long-term memory, but fundamentally different types.
┌──────────────────────────────────────────────────────┐
│ LONG-TERM MEMORY — THE FULL TAXONOMY │
│ │
│ Long-Term Memory │
│ │ │
│ ┌───────────┴───────────┐ │
│ │ │ │
│ Explicit Implicit │
│ (Declarative) (Non-declarative) │
│ "knowing that" "knowing how" │
│ │ │ │
│ ┌─────┴─────┐ ┌──────┴──────┐ │
│ │ │ │ │ │ │
│ Episodic Semantic Proced Priming Condit │
│ "events" "facts" "skills" "cues" "assoc" │
│ │
│ "I debugged "JWT uses "type faster anxiety │
│ a seg fault RS256 by faster when before │
│ yesterday" default" now" primed" deploys │
└──────────────────────────────────────────────────────┘
Episodic Memory — Events and Experiences
Episodic memory stores specific events anchored in time and place. Endel Tulving (1972) identified it as a distinct system — it is not just facts, but the experience of having been there.
┌──────────────────────────────────────────────────┐
│ EPISODIC MEMORY EXAMPLES │
│ │
│ "Yesterday at 3pm I debugged a segfault in │
│ the memory allocator. The fix was to check │
│ alignment before the mmap call. I was │
│ frustrated because it took 4 hours." │
│ │
│ Components: │
│ ├── What: segfault fix (alignment check) │
│ ├── When: yesterday, 3pm │
│ ├── Where: memory allocator module │
│ ├── Context: 4 hours of debugging │
│ └── Emotion: frustration │
│ │
│ Key property: autonoetic consciousness │
│ (mental time travel — re-experiencing) │
└──────────────────────────────────────────────────┘
Episodic memories are rich, contextual, and decay fastest. You remember yesterday's lunch vividly. Last week's lunch is vague. Last month's is gone — unless something unusual happened. This decay is not a flaw. It is a feature. Most episodic details are irrelevant long-term.
In shodh-memory: Every memory stored with context (timestamp, source, emotional valence) is episodic. The system preserves the full context of when and why information was stored. Episodic memories form the backbone of the knowledge graph — entities extracted from episodes become nodes, relationships become edges.
Semantic Memory — Facts and Concepts
Semantic memory stores general knowledge stripped of episodic context. You know that Paris is the capital of France, but you probably do not remember the specific moment you learned it. The fact has been decontextualized — extracted from its episodic origin and stored as pure knowledge.
┌──────────────────────────────────────────────────┐
│ EPISODIC → SEMANTIC TRANSFORMATION │
│ │
│ Episodic: "In the code review on March 3, │
│ Sarah said JWT tokens should use RS256 │
│ because HS256 has a known vulnerability." │
│ │ │
│ │ consolidation │
│ │ (repeated access) │
│ ↓ │
│ Semantic: "JWT tokens should use RS256." │
│ │
│ The fact persists. The episode fades. │
└──────────────────────────────────────────────────┘
This episodic-to-semantic transformation is one of the most important processes in memory. Through repeated access and consolidation, specific experiences become general knowledge. The brain does this during sleep (memory consolidation). AI systems need an explicit mechanism for it.
In shodh-memory: The fact extraction pipeline identifies factual statements in memories and stores them separately with confidence scores and support counts. Facts with high support (multiple episodic sources) gain permanence. Facts with low support decay. This mirrors the biological process — a fact mentioned once is fragile; a fact reinforced across sessions becomes durable knowledge.
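The support-count mechanism can be sketched minimally. The class name, the confidence formula, and the promotion threshold here are all illustrative assumptions, not shodh-memory's actual values:

```python
from collections import defaultdict

class FactStore:
    """Illustrative episodic-to-semantic consolidation via support counts.

    Each time a fact is re-extracted from a new episode, its support
    grows and its confidence saturates toward 1.0."""

    PROMOTE_AT = 3  # support needed to treat a fact as durable (assumed)

    def __init__(self):
        self.support = defaultdict(int)

    def observe(self, fact):
        self.support[fact] += 1

    def confidence(self, fact):
        n = self.support[fact]
        return n / (n + 1)  # 1 obs → 0.5, 3 obs → 0.75, saturating

    def durable_facts(self):
        return {f for f, n in self.support.items() if n >= self.PROMOTE_AT}

store = FactStore()
for _ in range(3):  # the same fact, extracted from three separate episodes
    store.observe("JWT tokens should use RS256")
store.observe("use tabs for indentation")  # mentioned once — stays fragile
```

A fact observed once never reaches the durable set; a fact reinforced across sessions does — the episodic sources can then decay without losing the knowledge.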
Procedural Memory — Skills and Habits
Procedural memory stores how to do things — motor skills, cognitive procedures, habitual behaviors. You do not consciously remember learning to type, but your fingers know where the keys are. This knowledge is implicit: it influences behavior without conscious recall.
┌──────────────────────────────────────────────────┐
│ PROCEDURAL MEMORY IN AI AGENTS │
│ │
│ Explicit: "Always run tests before commits" │
│ (you can state the rule) │
│ │
│ Procedural: The agent automatically runs │
│ tests before every commit without │
│ being prompted — it learned the │
│ pattern from repeated behavior. │
│ │
│ ┌─────────────┐ │
│ │ Repetition │ │
│ │ x50 sessions│──→ Automatic behavior │
│ │ │ (no explicit recall needed) │
│ └─────────────┘ │
└──────────────────────────────────────────────────┘
Procedural memory is the least explored area in AI memory systems. Current agents rely on explicit instructions (system prompts, tool definitions) rather than learned procedures. But the potential is enormous — an agent that has learned your workflow through repetition, not instruction, is fundamentally more capable.
In shodh-memory: Pattern detection identifies repeated behaviors across sessions. When the system observes the same sequence of actions occurring reliably (e.g., running tests before commits), it strengthens those associations via Hebbian learning. The pattern becomes a strong edge in the knowledge graph — not as an explicit rule, but as a learned association that naturally surfaces during recall.
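One simple way to detect such patterns is to count recurring action pairs across sessions. This sketch is an assumption about how pattern detection could work — the function name, the pair-based representation, and the support threshold are all illustrative:

```python
from collections import Counter

def detect_procedures(sessions, min_support=0.8):
    """Illustrative: find consecutive action pairs (a → b) that recur
    in at least min_support of all sessions. Such a pair — e.g.
    ("run_tests", "commit") — would be strengthened as a learned
    association rather than stored as an explicit rule."""
    pair_sessions = Counter()
    for actions in sessions:
        seen = set(zip(actions, actions[1:]))  # consecutive pairs, deduped
        pair_sessions.update(seen)
    n = len(sessions)
    return {pair for pair, c in pair_sessions.items() if c / n >= min_support}

sessions = [
    ["edit", "run_tests", "commit"],
    ["run_tests", "commit", "push"],
    ["edit", "run_tests", "commit", "push"],
]
procs = detect_procedures(sessions)  # {("run_tests", "commit")}
```

Only "run tests, then commit" appears in every session, so only that pair crosses the threshold — the agent learns the habit without anyone stating the rule.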
---
How Memory Moves Between Types
Memory is not static. Information flows between systems through encoding, consolidation, and retrieval — each a distinct process with its own mechanisms and failure modes.
┌──────────────────────────────────────────────────────────┐
│ THE MEMORY PIPELINE │
│ │
│ Sensory ──→ Working ──→ Long-Term ──→ Retrieval │
│ Buffer Memory Memory (recall) │
│ │ │ │ │ │
│ │ attention │ encoding │ consolid. │ cue-dependent │
│ │ filter │ rehearsal │ (sleep/rest) │ reconstruction│
│ │ │ │ │ │
│ ↓ ↓ ↓ ↓ │
│ ~99% ~80% ~40% Variable │
│ lost lost in lost in (depends on │
│ in <1s 30 sec 24 hours cue quality) │
│ │
│ ┌─────────────────────────────────────────────────┐ │
│ │ Cowan's Embedded-Processes Model (2001) │ │
│ │ │ │
│ │ ┌───────────────────────────────────────┐ │ │
│ │ │ Long-Term Memory (activated portion) │ │ │
│ │ │ ┌──────────────────────────────┐ │ │ │
│ │ │ │ Short-Term Store │ │ │ │
│ │ │ │ ┌────────────────────┐ │ │ │ │
│ │ │ │ │ Focus of Attention │ │ │ │ │
│ │ │ │ │ (4 items) │ │ │ │ │
│ │ │ │ └────────────────────┘ │ │ │ │
│ │ │ └──────────────────────────────┘ │ │ │
│ │ └───────────────────────────────────────┘ │ │
│ └─────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────┘
Cowan's (2001) embedded-processes model is the most accurate picture of how these systems relate. Working memory is not a separate box — it is the activated portion of long-term memory, with a sharply capacity-limited focus of attention sitting inside it. This is the model shodh-memory implements.
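The embedded-processes idea reduces to a simple computation: the short-term store is whatever subset of long-term memory is currently activated, and the focus of attention is the top few items within it. This sketch is illustrative — the activation values, floor, and names are assumptions:

```python
def focus_of_attention(ltm, k=4, activation_floor=0.1):
    """Illustrative embedded-processes model (Cowan 2001).

    ltm maps item -> activation level. The short-term store is the
    activated portion of long-term memory; the focus of attention is
    the top-k items inside it."""
    activated = {m: a for m, a in ltm.items() if a >= activation_floor}
    focus = sorted(activated, key=activated.get, reverse=True)[:k]
    return activated, focus

# Hypothetical activation levels for a coding agent's memories:
ltm = {"segfault fix": 0.9, "RS256": 0.7, "lunch": 0.05,
       "deploy steps": 0.4, "branch name": 0.6, "old bug": 0.2}
activated, focus = focus_of_attention(ltm)
# "lunch" never activates; only 4 items make it into the focus
```

Note that nothing is copied anywhere — "working memory" is just a view over long-term storage, selected by activation. That is the architectural point.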
---
Forgetting: Not a Bug, a Feature
Hermann Ebbinghaus (1885) discovered the forgetting curve — memory strength drops exponentially in the first hours, then levels off into a slower decline. Over a century of research has confirmed and refined this finding.
┌─────────────────────────────────────────────────────┐
│ THE FORGETTING CURVE │
│ │
│ Retention │
│ 100% │\ │
│ │ \ │
│ 80% │ \ │
│ │ \ │
│ 60% │ \ │
│ │ \_ │
│ 40% │ \__ │
│ │ \___ │
│ 20% │ \______ │
│ │ \________________ │
│ 0% │──────────────────────────────────────── │
│ 0 1h 6h 1d 2d 6d 31d │
│ │
│ Phase 1 (0-3 days): Exponential decay │
│ Phase 2 (3+ days): Power-law decay │
│ │
│ Wixted (2004): Two-phase model explains why │
│ some memories last years while most fade in days │
└─────────────────────────────────────────────────────┘
Forgetting is essential. Without it, memory becomes an undifferentiated pile of information with no signal in the noise. The brain forgets strategically — discarding low-value information while preserving high-value knowledge. This is why you forget what you ate for lunch on an ordinary Tuesday but remember your wedding day.
Wixted (2004) showed that forgetting follows two distinct phases: rapid exponential decay in the first few days, then a much slower power-law decline. Memories that survive the initial exponential drop are relatively durable — they have been partially consolidated and resist further decay.
In shodh-memory: The decay engine implements this exact two-phase model:
┌──────────────────────────────────────────────────┐
│ SHODH DECAY MODEL │
│ │
│ Days 0-3: Exponential decay │
│ strength= e^(-lambda t) │
│ Fast drop. Weak memories die here. │
│ │
│ Days 3+: Power-law decay │
│ strength *= (t/t0)^(-alpha) │
│ Slow decline. Strong memories persist.│
│ │
│ Result: Frequently accessed memories become │
│ nearly permanent. Rarely accessed │
│ memories fade naturally. │
└──────────────────────────────────────────────────┘
This is biologically faithful. It is also computationally efficient — decay is calculated lazily at access time, not continuously. The system does not waste cycles decaying memories nobody is asking about.
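A lazy two-phase decay function can be written directly from the formulas in the box above. The parameter values (lambda, alpha) below are illustrative assumptions, not shodh-memory's actual constants; the phase-2 curve is scaled from the day-3 value so the two phases join continuously:

```python
import math

def decayed_strength(initial, age_days, lam=0.3, alpha=0.5, t0=3.0):
    """Illustrative lazy two-phase decay, computed only at access time.

    Days 0-t0:  exponential,  strength = initial * e^(-lam * t)
    Days t0+ :  power-law tail, continued from where phase 1 ended."""
    if age_days <= t0:
        return initial * math.exp(-lam * age_days)
    at_t0 = initial * math.exp(-lam * t0)       # value where phase 1 ends
    return at_t0 * (age_days / t0) ** (-alpha)  # slow power-law decline

s1 = decayed_strength(1.0, 1)    # still in the fast exponential phase
s7 = decayed_strength(1.0, 7)    # survived into the power-law tail
s31 = decayed_strength(1.0, 31)  # a month out: faded, but not gone
```

Because decay is a pure function of age, nothing needs a background job — the current strength is computed on read, which is the "lazy" efficiency the text describes.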
---
Strengthening: Hebbian Learning and LTP
"Neurons that fire together wire together." Donald Hebb (1949) proposed this rule decades before neuroscience could prove it. The idea is simple: when two neurons are active simultaneously, the connection between them strengthens. This is the basis of all learning.
┌──────────────────────────────────────────────────────┐
│ HEBBIAN LEARNING — SYNAPSE STRENGTHENING │
│ │
│ Before co-activation: │
│ │
│ [Neuron A] ───weak (0.2)──→ [Neuron B] │
│ │
│ After repeated co-activation: │
│ │
│ [Neuron A] ═══STRONG (0.8)══→ [Neuron B] │
│ │
│ The rule: │
│ Δw = η activation_A activation_B │
│ │
│ Co-access → stronger connection → easier recall │
│ No co-access → decay → weaker connection │
└──────────────────────────────────────────────────────┘
Long-Term Potentiation (LTP) is the biological mechanism behind Hebb's rule. Discovered by Bliss and Lomo (1973), LTP is a persistent strengthening of synapses based on recent patterns of activity. It has three stages:
┌──────────────────────────────────────────────────────┐
│ LONG-TERM POTENTIATION — THREE TIERS │
│ │
│ Tier 1: Early LTP (minutes → hours) │
│ ├── Requires: single burst of co-activation │
│ ├── Mechanism: existing protein modification │
│ └── Duration: fades in hours without reinforcement │
│ │
│ Tier 2: Late LTP (hours → weeks) │
│ ├── Requires: repeated bursts over hours │
│ ├── Mechanism: new protein synthesis │
│ └── Duration: persists for days to weeks │
│ │
│ Tier 3: Structural LTP (weeks → permanent) │
│ ├── Requires: sustained activity over days │
│ ├── Mechanism: new synapse growth │
│ └── Duration: permanent (physically new structure) │
│ │
│ Progression: │
│ Burst → Early → Weekly → Full │
│ (minutes) (hours) (days) (permanent) │
└──────────────────────────────────────────────────────┘
In shodh-memory: The knowledge graph implements Hebbian learning with a 3-tier LTP scheme that mirrors the biological system.
The asymmetry is deliberate: additive strengthening (+0.025 per co-access), multiplicative weakening (*0.90 per decay cycle). Connections build slowly through repeated use but decay gently — exactly like biological synapses.
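The asymmetric update rule is small enough to state as code. The +0.025 and *0.90 constants come from the text above; the function names and the weight cap are illustrative:

```python
def co_access(weight, delta=0.025, w_max=1.0):
    """Additive strengthening on co-access (+0.025, per the text)."""
    return min(w_max, weight + delta)

def decay_cycle(weight, factor=0.90):
    """Multiplicative weakening per decay cycle (*0.90, per the text)."""
    return weight * factor

w = 0.2
for _ in range(10):      # ten co-accesses in a row
    w = co_access(w)     # climbs linearly: 0.2 → 0.45
strong = w
for _ in range(10):      # ten decay cycles with no access
    w = decay_cycle(w)   # 0.45 * 0.9^10 ≈ 0.157 — weakened, not erased
```

Additive growth means no single co-access can create a strong edge; multiplicative decay means an unused edge approaches zero asymptotically rather than being cut off — both properties the paragraph above attributes to biological synapses.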
---
Knowledge Graphs as Semantic Memory
Semantic memory in the brain is a distributed network of concepts connected by associations. "Dog" connects to "animal," "pet," "bark," "fur" — and through those connections, to "cat," "veterinarian," "leash." Retrieval is not lookup — it is spreading activation through a web of associations.
┌──────────────────────────────────────────────────────┐
│ KNOWLEDGE GRAPH — SEMANTIC NETWORK │
│ │
│ [PostgreSQL] │
│ / \ │
│ 0.85 / \ 0.72 │
│ / \ │
│ [Database]────0.91────[SQL] │
│ │ │ │
│ 0.67 │ │ 0.58 │
│ │ │ │
│ [RocksDB] [Queries] │
│ │ │ │
│ 0.78 │ │ 0.43 │
│ │ │ │
│ [Embedded] [Performance] │
│ \ / │
│ 0.55 \ / 0.62 │
│ \ / │
│ [Optimization] │
│ │
│ Edge weights = Hebbian strength │
│ Stronger edges = more frequent co-access │
│ Query activates node → spreads to neighbors │
└──────────────────────────────────────────────────────┘
Spreading Activation
When you hear "database," you do not just retrieve the concept "database" — activation spreads to related concepts. "SQL" lights up, then "queries," then "performance." This is how you can answer questions you have never explicitly been asked — by traversing the association network.
Anderson's (1983) spreading activation theory models this precisely: activation originates at a source node and propagates along edges, decaying with distance. Nodes receive activation proportional to the edge strength connecting them to the source.
┌──────────────────────────────────────────────────────┐
│ SPREADING ACTIVATION │
│ │
│ Query: "database performance" │
│ │
│ Step 1: [Database]=1.0 [Performance]=1.0 │
│ │
│ Step 2: [RocksDB]=0.67 [Optimization]=0.62 │
│ [SQL]=0.91 [Queries]=0.43 │
│ [PostgreSQL]=0.85 │
│ │
│ Step 3: [Embedded]=0.52 [Queries]=0.82 │
│ (from RocksDB) (combined from SQL + │
│ Performance) │
│ │
│ Result: retrieves memories about RocksDB │
│ optimization and PostgreSQL query tuning — │
│ neither mentioned in the query, but reachable │
│ through the association network. │
└──────────────────────────────────────────────────────┘
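The propagation step sketched in the box can be written as a short graph traversal. This is a generic Anderson-style sketch, not shodh-memory's actual algorithm — the per-hop decay factor and the max-combination rule are assumptions:

```python
from collections import defaultdict

def spread_activation(edges, sources, hops=2, decay=0.8):
    """Illustrative spreading activation over a weighted undirected graph.

    edges: {(a, b): weight}. Activation starts at 1.0 on each source
    node and propagates hop by hop, scaled by edge weight and a per-hop
    decay factor (assumed here to be 0.8)."""
    neighbors = defaultdict(list)
    for (a, b), w in edges.items():
        neighbors[a].append((b, w))
        neighbors[b].append((a, w))

    activation = {s: 1.0 for s in sources}
    frontier = dict(activation)
    for _ in range(hops):
        nxt = defaultdict(float)
        for node, energy in frontier.items():
            for nb, w in neighbors[node]:
                nxt[nb] += energy * w * decay
        for node, energy in nxt.items():
            # keep the strongest activation seen for each node
            activation[node] = max(activation.get(node, 0.0), energy)
        frontier = nxt
    return activation

edges = {("database", "sql"): 0.91, ("database", "rocksdb"): 0.67,
         ("sql", "queries"): 0.58, ("performance", "optimization"): 0.62}
act = spread_activation(edges, ["database", "performance"])
# "queries" receives activation via sql, though it was never in the query
```

The payoff is exactly what the diagram shows: nodes never mentioned in the query still light up if they are reachable through strong learned edges.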
This is fundamentally different from vector similarity search. Vectors find memories that look like the query. Spreading activation finds memories that are connected to the query through learned associations. Both are valuable. The combination is more powerful than either alone.
In shodh-memory: The 5-layer retrieval pipeline combines vector search, knowledge graph traversal with spreading activation, temporal context, and recency — then fuses results using reciprocal rank fusion (RRF). Memories that score high on both vector similarity and graph connectivity rank highest. This is why shodh retrieves relevant context that pure vector databases miss.
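Reciprocal rank fusion itself is a few lines. The k=60 constant is the conventional value from the RRF literature; the layer outputs and memory ids below are hypothetical:

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion over several ranked lists of memory ids.

    score(d) = sum over lists containing d of 1 / (k + rank).
    An item ranked well by multiple retrieval layers beats an item
    ranked at the top of only one."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["m1", "m2", "m3"]   # hypothetical vector-search ranking
graph_hits = ["m3", "m1", "m4"]    # hypothetical graph-traversal ranking
fused = rrf_fuse([vector_hits, graph_hits])
# m1 and m3 appear in both lists, so they outrank m2 and m4
```

RRF needs only ranks, not comparable scores, which is why it works for fusing layers as different as cosine similarity and graph activation.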
---
Human Memory vs shodh-memory: The Complete Map
┌────────────────────┬─────────────────────────────────┐
│ Human Memory │ shodh-memory Implementation │
├────────────────────┼─────────────────────────────────┤
│ Sensory buffer │ Raw input to remember endpoint │
│ (<1 second) │ (before processing) │
├────────────────────┼─────────────────────────────────┤
│ Working memory │ Working memory tier │
│ (4-7 items) │ (capacity-limited focus) │
├────────────────────┼─────────────────────────────────┤
│ Short-term store │ Session memory tier │
│ (seconds-minutes) │ (current session context) │
├────────────────────┼─────────────────────────────────┤
│ Episodic memory │ Memories with full context │
│ (events) │ (timestamp, source, emotion) │
├────────────────────┼─────────────────────────────────┤
│ Semantic memory │ Extracted facts + knowledge │
│ (facts) │ graph entities and edges │
├────────────────────┼─────────────────────────────────┤
│ Procedural memory │ Pattern detection via Hebbian │
│ (skills) │ edge strengthening │
├────────────────────┼─────────────────────────────────┤
│ Forgetting curve │ Exponential (0-3d) then │
│ (Ebbinghaus) │ power-law (3d+) decay │
├────────────────────┼─────────────────────────────────┤
│ Hebbian learning │ +0.025 additive per co-access │
│ (Hebb 1949) │ *0.90 multiplicative decay │
├────────────────────┼─────────────────────────────────┤
│ LTP tiers │ L1 Working → L2 Episodic → │
│ (Bliss 1973) │ L3 Semantic (3-tier promotion) │
├────────────────────┼─────────────────────────────────┤
│ Spreading │ Graph traversal with energy │
│ activation │ propagation and edge-weighted │
│ (Anderson 1983) │ decay across hops │
├────────────────────┼─────────────────────────────────┤
│ Consolidation │ Periodic maintenance: promote │
│ (sleep cycles) │ tiers, extract facts, prune │
│ │ weak edges, detect orphans │
└────────────────────┴─────────────────────────────────┘
---
Why This Matters for AI
Every serious AI system will need a memory architecture. Not a vector database. Not a chat history buffer. A memory architecture — with distinct types, encoding mechanisms, consolidation processes, and retrieval strategies.
The context window is not memory. It is the computational workspace where memory gets used. Confusing the two is like confusing RAM with a hard drive — they serve fundamentally different purposes, and a system with only one will always be crippled.
The AI systems that will dominate the next decade are the ones that remember like brains do: selectively, associatively, with graceful forgetting and continuous learning. Not because neuroscience is always right, but because a billion years of evolution produced the only general intelligence we have evidence for. The architecture is worth copying.
Give your AI agents real memory
npx @shodh/memory-mcp@latest
shodh-memory implements the complete memory taxonomy in a single Rust binary. Working memory, session memory, long-term storage, episodic context, semantic facts, Hebbian learning, biologically plausible decay, spreading activation, and 3-tier LTP. 100% offline. No API keys. Apache 2.0.
References
1. Atkinson, R.C. & Shiffrin, R.M. (1968). Human Memory: A Proposed System and its Control Processes. Psychology of Learning and Motivation, 2, 89-195.
2. Baddeley, A.D. & Hitch, G. (1974). Working Memory. Psychology of Learning and Motivation, 8, 47-89.
3. Miller, G.A. (1956). The Magical Number Seven, Plus or Minus Two. Psychological Review, 63(2), 81-97.
4. Cowan, N. (2001). The Magical Number 4 in Short-Term Memory. Behavioral and Brain Sciences, 24(1), 87-114.
5. Tulving, E. (1972). Episodic and Semantic Memory. In Organization of Memory, Academic Press.
6. Sperling, G. (1960). The Information Available in Brief Visual Presentations. Psychological Monographs, 74(11), 1-29.
7. Ebbinghaus, H. (1885). Memory: A Contribution to Experimental Psychology.
8. Wixted, J.T. (2004). The Psychology and Neuroscience of Forgetting. Annual Review of Psychology, 55, 235-269.
9. Hebb, D.O. (1949). The Organization of Behavior. Wiley.
10. Bliss, T.V.P. & Lomo, T. (1973). Long-Lasting Potentiation of Synaptic Transmission. Journal of Physiology, 232(2), 331-356.
11. Anderson, J.R. (1983). A Spreading Activation Theory of Memory. Journal of Verbal Learning and Verbal Behavior, 22(3), 261-295.
12. Squire, L.R. (2004). Memory Systems of the Brain. Neurobiology of Learning and Memory, 82(3), 171-177.