2026-03-18 · 12 min read

Learning Without Backpropagation: Why Local Rules Beat Global Gradients for Memory

neuroscience · learning · algorithms


Every neural network you've ever used learns through backpropagation. Every one. GPT-4, Stable Diffusion, AlphaFold, your MNIST classifier from a tutorial. Backprop is so dominant that most practitioners have never considered an alternative.

But backpropagation has a fundamental problem: **the brain can't do it.**

And in December 2022, Geoffrey Hinton — the person most responsible for making backprop dominant — published a paper proposing an alternative. His motivation? Backpropagation's biological implausibility.

This matters for memory systems because the question isn't academic. The choice between global gradients and local learning rules determines whether a memory system can learn in real-time, on edge devices, without a training pipeline. The answer has practical consequences for every AI agent that needs to remember.

What Backpropagation Actually Requires

To understand why local rules matter, you need to understand what backprop demands:

**1. A global loss function.** The entire network must agree on a single objective. Every weight update depends on how that weight contributed to the final output error. This means the system needs a centralized error signal that propagates through every layer.

**2. Reverse information flow.** Error gradients flow backward through the network — from output layer to input layer, through every connection. This requires either storing all intermediate activations (memory-expensive) or recomputing them (compute-expensive).

**3. Symmetric weights.** The backward pass uses the transpose of the forward weights. In biology, synapses are unidirectional — neuron A's connection to neuron B has no relationship to B's connection to A. Backprop requires them to be mathematically linked.

**4. Synchronous updates.** All layers must wait for the forward pass to complete, then the backward pass to complete, before any weights update. The entire network is locked in lockstep.

```

Backpropagation requires:

  Input → Layer 1 → Layer 2 → ... → Output → Loss
                                               │
  Update ← ∂L/∂w₁ ← ∂L/∂w₂ ← ... ← ∂L/∂wₙ ←───┘

Problems for biological / real-time systems:

  • Needs global error signal          (brain has no central loss)
  • Needs backward pass, same weights  (synapses are one-way)
  • Needs stored activations           (brain doesn't cache the forward pass)
  • Needs synchronous lock-step        (neurons fire asynchronously)

```
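All four requirements are easy to spot in code. Below is a toy scalar-chain backprop sketch (illustrative only, not any library's implementation; `forward`/`backward` are names we chose), with each requirement marked in comments:

```python
import math

def forward(ws, x):
    """Forward pass through a chain of scalar weights with tanh units.

    Every intermediate activation must be cached for the backward
    pass (requirement 2: reverse information flow needs them).
    """
    acts = [x]
    for w in ws:
        acts.append(math.tanh(w * acts[-1]))
    return acts

def backward(ws, acts, target, lr=0.1):
    """Backward pass over the whole chain.

    The error is computed once at the output (requirement 1: global
    loss), propagated back through the *same* weights used forward
    (requirement 3: symmetric weights), and no weight can change
    until the full sweep completes (requirement 4: synchrony).
    """
    grad = acts[-1] - target                    # global error signal
    new_ws = list(ws)
    for i in reversed(range(len(ws))):
        pre, post = acts[i], acts[i + 1]
        d_preact = grad * (1 - post * post)     # back through tanh
        new_ws[i] = ws[i] - lr * d_preact * pre
        grad = d_preact * ws[i]                 # reuses the forward weight
    return new_ws
```

Even in this two-weight toy, notice that updating the first weight requires information computed at the output, two steps away. A local rule never needs that.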

None of these requirements are met by biological neural systems. The brain has no global loss function. Synapses transmit in one direction. Neurons fire asynchronously. There is no backward pass.

Yet the brain learns. It learns faster than any artificial system at many tasks, consumes 20 watts, and runs on wetware that operates at millisecond timescales — a million times slower than silicon.

How the Brain Actually Learns

Biological learning uses **local rules**. Each synapse updates based only on information available at that synapse — the activity of the presynaptic neuron, the activity of the postsynaptic neuron, and neuromodulatory signals. No global error. No backward pass. No weight transport.

Hebb's Rule (1949)

The simplest local rule: if neuron A consistently fires before neuron B, strengthen the connection from A to B. "Cells that fire together, wire together."

This remained a theoretical proposal for nearly 50 years, until Bi & Poo (1998) measured it directly in hippocampal neurons. The strengthening rate: 3-7% per co-activation. The weakening rate under uncorrelated activation: similar magnitude, slightly faster.

```

Hebb's rule:  Δwᵢⱼ = η × xᵢ × xⱼ

  xᵢ = presynaptic activity
  xⱼ = postsynaptic activity
  η  = learning rate

No error signal needed. No backward pass needed.
Only local information needed.
Updates happen in real-time, at the synapse.

```
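In code, the rule is a one-liner. A minimal sketch (function and variable names are illustrative):

```python
def hebbian_update(w, x_pre, x_post, eta=0.01):
    """Hebb's rule: strengthen w in proportion to correlated activity.

    Uses only information local to the synapse: presynaptic activity
    x_pre, postsynaptic activity x_post, and the current weight w.
    No error signal, no backward pass.
    """
    return w + eta * x_pre * x_post

# Correlated activity strengthens the connection...
w = hebbian_update(0.5, x_pre=1.0, x_post=1.0)   # w grows past 0.5
# ...while silence on either side leaves it unchanged.
w_idle = hebbian_update(0.5, x_pre=0.0, x_post=1.0)
```

Contrast this with the backprop update: no quantity in `hebbian_update` depends on any other synapse, layer, or output.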

Spike-Timing Dependent Plasticity (STDP)

The timing-sensitive refinement of Hebb's rule, documented by Markram et al. If neuron A fires *before* neuron B (within ~20ms), the synapse from A to B strengthens. If A fires *after* B, it weakens. The precise timing window creates directional, causal learning.

This is how the brain learns sequences, temporal patterns, and cause-effect relationships — all without backpropagation.
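The STDP window can be sketched as a function of the spike-timing difference Δt = t_post − t_pre. The exponential shape is the standard model; the amplitudes and time constant below are illustrative, not measured values:

```python
import math

def stdp_delta_w(dt_ms, a_plus=0.05, a_minus=0.055, tau_ms=20.0):
    """Weight change as a function of spike timing (dt_ms = t_post - t_pre).

    Positive dt (pre fires first) potentiates; negative dt (post fires
    first) depresses. Both effects shrink exponentially as the timing
    gap grows, giving the ~20 ms window its causal directionality.
    """
    if dt_ms > 0:        # pre before post: causal pairing, strengthen
        return a_plus * math.exp(-dt_ms / tau_ms)
    elif dt_ms < 0:      # post before pre: anti-causal, weaken
        return -a_minus * math.exp(dt_ms / tau_ms)
    return 0.0
```

The sign flip at Δt = 0 is what makes the rule directional: A→B pairings strengthen A→B, not B→A.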

Long-Term Potentiation (LTP)

Repeated Hebbian strengthening triggers a cascade of molecular changes that make the synaptic change permanent. Magee & Grienberger (2020) documented three phases:

1. **Early LTP** (minutes): Temporary phosphorylation of existing receptors. Easily reversed.

2. **Late LTP** (hours-days): New protein synthesis. More receptors inserted into the synapse.

3. **Structural LTP** (permanent): Physical growth of new synaptic connections. The change is anatomical.

This is why you can remember your childhood home but not what you had for lunch three Tuesdays ago. Some memories undergo full LTP. Most don't.

Hinton's Forward-Forward Algorithm (2022)

Geoffrey Hinton — who later shared the 2024 Nobel Prize in Physics for his foundational work on neural networks — is the person most responsible for backpropagation's dominance. And in late 2022, he published a paper trying to replace it.

The [Forward-Forward algorithm](https://www.cs.toronto.edu/~hinton/FFA13.pdf) replaces the forward-backward passes of backprop with two forward passes:

**Positive pass:** Real data flows through the network. Each layer adjusts its weights to *increase* a local "goodness" measure (the sum of squared activations).

**Negative pass:** Synthetic "negative" data flows through. Each layer adjusts weights to *decrease* the same goodness measure.

```

Backpropagation:                    Forward-Forward:

  Forward:  x → h₁ → h₂ → ŷ         Positive: real data → h₁↑ → h₂↑ → h₃↑
  Loss:     L = f(ŷ, y)             Negative: fake data → h₁↓ → h₂↓ → h₃↓
  Backward: ∂L/∂w propagated

  Global error signal.              Each layer has its own local objective.
  Synchronous.                      No backward pass. No weight transport.
                                    Can pipeline and learn while streaming.

```

The critical property: **each layer learns independently using only local information.** There is no backward pass. No global loss. No weight transport problem. The network can learn while data is still streaming through it.
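A single Forward-Forward layer can be sketched in a few lines. This is a toy pure-Python simplification: the goodness measure (sum of squared activations) is from the paper, but the learning rate and the bare gradient-on-goodness update are ours — the paper wraps goodness in a logistic loss around a threshold:

```python
def goodness(h):
    """Hinton's layer-local 'goodness': sum of squared activations."""
    return sum(a * a for a in h)

def ff_layer_update(w, x, positive, lr=0.03):
    """One local learning step for a single linear + ReLU layer.

    Pushes goodness up on real data (positive pass) and down on
    negative data, following the local gradient of goodness with
    respect to this layer's own weights only. No signal from any
    other layer is needed.

    Returns (updated_weights, goodness_before_update).
    """
    h = [max(0.0, sum(wi * xi for wi, xi in zip(row, x))) for row in w]
    g = goodness(h)
    sign = 1.0 if positive else -1.0
    # d(goodness)/dw[j][i] = 2 * h[j] * x[i]  (zero where ReLU is off)
    new_w = [[wi + sign * lr * 2.0 * hj * xi for wi, xi in zip(row, x)]
             for row, hj in zip(w, h)]
    return new_w, g
```

Because each layer's update reads only its own input and output, layers can be trained in a pipeline while data is still flowing.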

On MNIST, the Forward-Forward algorithm performed "only slightly worse" than backpropagation. On its first attempt. With minimal optimization. Hinton explicitly stated this was motivated by backprop's biological implausibility.

Why This Matters for Memory Systems

The choice between global gradients and local learning rules has direct engineering consequences for memory systems:

Real-Time Learning

Backpropagation requires a training phase separate from inference. You collect data, compute gradients, update weights, then deploy. This doesn't work for memory that needs to learn from every interaction.

Local rules learn in real-time. When a user accesses two memories together, the edge between them strengthens immediately. No training pipeline. No batch updates. No gradient computation. The system learns from use, at the speed of use.

Edge Deployment

Backpropagation requires storing all intermediate activations for the backward pass. For a large model, this means gigabytes of memory and significant compute. You can't run backprop-based learning on a Raspberry Pi or a drone's onboard computer.

Local rules have O(1) memory overhead per update. Strengthening an edge requires only the two nodes' activation values and the current weight. shodh-memory runs Hebbian learning at 20W on ARM devices.

No LLM Dependencies

Most AI memory systems use LLM calls for learning — "extract the key insight from this conversation" or "summarize what changed." This is outsourcing learning to a global model (the LLM) rather than implementing local learning rules.

The result: 2+ API calls per memory store, 20+ seconds latency, cloud dependency, and costs that scale with usage. Mem0, Cognee, and Zep all follow this pattern.

shodh-memory uses zero LLM calls. Learning happens through Hebbian edge updates, activation decay, and spreading activation — all local rules that execute in microseconds, offline, on any hardware.

```

Memory systems using global learning (LLM calls):

  Store: content → LLM extract → embeddings → vector DB
  Time: ~20 seconds | Cost: 2+ API calls | Requires: cloud

shodh-memory using local learning rules:

  Store:  content → local embed → graph + vector index
  Learn:  co-access → Hebbian strengthen (+0.025)
  Forget: idle → exponential/power-law decay
  Time: 55ms | Cost: $0 | Requires: nothing

```

shodh-memory's Local Learning Implementation

Every learning mechanism in shodh-memory is a local rule. No global objective. No backward pass. No external model calls.

Hebbian Edge Strengthening

When two memories are accessed in the same session, the edge weight increases by +0.025 (additive). When an edge goes unused through a consolidation cycle, it decays by ×0.90 (multiplicative). The asymmetry is deliberate: building takes 40 co-accesses to reach maximum; decay takes 22 idle cycles to return to minimum.

This mirrors the biological asymmetry documented by Bi & Poo (1998): potentiation and depression have different rates and mechanisms.
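A toy version of the asymmetric rule: the +0.025 additive step and ×0.90 decay factor are the values described above, while the [0, 1] weight bounds are an assumption for illustration (not shodh-memory's actual code):

```python
def strengthen(w, delta=0.025, w_max=1.0):
    """Co-access in a session: additive strengthening, clamped at max."""
    return min(w_max, w + delta)

def decay(w, factor=0.90, w_min=0.0):
    """Unused through a consolidation cycle: multiplicative decay."""
    return max(w_min, w * factor)

# Additive build-up is deliberate and slow: 40 co-accesses from 0 to max.
w = 0.0
for _ in range(40):
    w = strengthen(w)
# Multiplicative decay is faster: the weight halves every ~7 idle cycles.
```

The additive/multiplicative asymmetry means associations are hard to earn and comparatively easy to lose — the same shape Bi & Poo measured.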

Three-Tier LTP

Edges in shodh-memory progress through tiers that directly model LTP phases:

**L1 Working** (early-LTP): New edges. Temporary. Decay rapidly if unused. Minutes to hours.

**L2 Episodic** (late-LTP): Edges that survived initial decay. More resistant. Hours to days.

**L3 Semantic** (structural-LTP): Edges that have been consistently strengthened. Near-permanent. Resist decay almost completely.

Promotion is automatic and based on usage patterns — exactly like biological LTP, where repeated stimulation triggers progressively more permanent molecular changes.
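Promotion can be sketched as a threshold rule on accumulated reinforcement. The promotion thresholds and per-tier decay rates below are illustrative assumptions, not shodh-memory's actual values:

```python
from dataclasses import dataclass

# Illustrative per-tier decay: higher tiers resist decay more.
TIER_DECAY = {"L1": 0.90, "L2": 0.97, "L3": 0.999}
PROMOTE_AT = {"L1": 5, "L2": 20}   # reinforcements needed to move up

@dataclass
class Edge:
    weight: float = 0.1
    tier: str = "L1"
    reinforcements: int = 0

    def strengthen(self, delta=0.025):
        """Co-access strengthens the edge and counts toward promotion."""
        self.weight = min(1.0, self.weight + delta)
        self.reinforcements += 1
        # Repeated strengthening promotes the edge, like LTP phases.
        if self.tier == "L1" and self.reinforcements >= PROMOTE_AT["L1"]:
            self.tier = "L2"
        elif self.tier == "L2" and self.reinforcements >= PROMOTE_AT["L2"]:
            self.tier = "L3"

    def decay_cycle(self):
        """Idle cycle: decay at the current tier's (slower) rate."""
        self.weight *= TIER_DECAY[self.tier]
```

The key property carried over from biology: promotion is driven entirely by the edge's own usage history, a local signal.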

Activation Decay

Every memory has a strength that decays over time following a hybrid curve: exponential for the first 3 days (rapid initial drop, matching synaptic consolidation timelines from Wixted 2004), then power-law for longer periods (slow fade, matching long-term forgetting curves from Ebbinghaus and Anderson & Schooler 1991).

Decay is a local operation. Each memory's strength depends only on its own access history and the elapsed time. No global recomputation needed.
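A sketch of such a hybrid curve. The 3-day crossover comes from the description above; the time constant and power-law exponent are illustrative, and the two branches are matched at the crossover so the curve is continuous:

```python
import math

def activation(initial, days_since_access,
               tau_days=1.5, crossover_days=3.0, alpha=0.5):
    """Hybrid forgetting curve: exponential early, power-law later.

    For the first `crossover_days`, strength drops exponentially with
    time constant `tau_days`. After that it fades as a power law,
    anchored to the exponential's value at the crossover.
    """
    if days_since_access <= crossover_days:
        return initial * math.exp(-days_since_access / tau_days)
    at_crossover = initial * math.exp(-crossover_days / tau_days)
    return at_crossover * (days_since_access / crossover_days) ** -alpha
```

Note the inputs: the memory's own initial strength and its own elapsed idle time. Nothing else — which is what makes decay O(1) per memory.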

Spreading Activation

When a memory is retrieved, activation spreads to connected memories through the knowledge graph, weighted by edge strength and decaying by 0.7× per hop. This surfaces associated context without an explicit query — the network's structure determines what's relevant.

This is a local computation: each node activates its neighbors proportionally to edge weight. No global coordination. The graph structure — shaped by Hebbian learning — determines the activation pattern.
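The traversal itself is short. In this sketch the 0.7 per-hop decay is from the description above, while the hop limit, activation threshold, and example graph are illustrative assumptions:

```python
from collections import defaultdict

def spread_activation(graph, source, initial=1.0,
                      hop_decay=0.7, max_hops=2, threshold=0.05):
    """Spread activation from a retrieved memory through weighted edges.

    graph: {node: {neighbor: edge_weight}}. Each node activates its
    neighbors proportionally to edge weight, attenuated by hop_decay
    per hop - a purely local computation at every step.
    """
    activations = defaultdict(float)
    activations[source] = initial
    frontier = {source: initial}
    for _ in range(max_hops):
        next_frontier = {}
        for node, act in frontier.items():
            for neighbor, weight in graph.get(node, {}).items():
                passed = act * weight * hop_decay
                if passed > threshold and passed > activations[neighbor]:
                    activations[neighbor] = passed
                    next_frontier[neighbor] = passed
        frontier = next_frontier
    return dict(activations)

# Retrieving one memory surfaces its strongly linked neighbors
# (node names here are made up for the example).
g = {"rust-async": {"tokio-bug": 0.9, "old-note": 0.1},
     "tokio-bug": {"fix-commit": 0.8}}
acts = spread_activation(g, "rust-async")
```

Strong edges (built by Hebbian co-access) pass activation two hops out; weak edges barely clear the threshold at one hop.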

The Bigger Picture

The AI field chose backpropagation in the 1980s because it was effective and easy to implement on the hardware available. It remains the best method for training large models on static datasets.

But for systems that need to learn continuously, in real-time, on constrained hardware, without a training pipeline — local rules are not just an alternative. They're the only viable approach.

Hinton's Forward-Forward algorithm, the renewed interest in Hopfield networks, Intel's Loihi neuromorphic chip, IBM's TrueNorth — all point to the same conclusion: the field is beginning to question whether backpropagation's dominance is a fundamental truth or an accident of GPU architecture.

For memory systems, the answer is already clear. You can't run backprop every time a user stores a memory. You can't call an LLM every time you want to strengthen an association. You can't require a training pipeline for a system that needs to learn from every interaction.

Local rules — Hebbian learning, activation decay, spreading activation, long-term potentiation — are how biological memory works. They're also how practical AI memory systems will work. Not because biology is always optimal, but because the engineering constraints of real-time, always-on memory are the same constraints that biological memory evolved to satisfy.

The brain doesn't use backpropagation. It uses local rules. And it remembers just fine.

References

1. Hinton, G. (2022). The Forward-Forward Algorithm: Some Preliminary Investigations. *arXiv:2212.13345*.

2. Hebb, D.O. (1949). The Organization of Behavior. Wiley.

3. Bi, G.Q. & Poo, M.M. (1998). Synaptic Modifications in Cultured Hippocampal Neurons. *J. Neuroscience*, 18(24), 10464-10472.

4. Magee, J.C. & Grienberger, C. (2020). Synaptic Plasticity Forms and Functions. *Annual Review of Neuroscience*, 43, 95-117.

5. Markram, H. et al. (2011). A History of Spike-Timing-Dependent Plasticity. *Frontiers in Synaptic Neuroscience*, 3, 4.

6. Wixted, J.T. (2004). The Psychology and Neuroscience of Forgetting. *Annual Review of Psychology*, 55, 235-269.

7. Anderson, J.R. & Schooler, L.J. (1991). Reflections of the Environment in Memory. *Psychological Science*, 2(6), 396-408.
