2026-03-18 · 12 min read

Learning Without Backpropagation: Why Local Rules Beat Global Gradients for Memory

neuroscience · learning · algorithms


Every neural network you've ever used learns through backpropagation. Every one. GPT-4, Stable Diffusion, AlphaFold, your MNIST classifier from a tutorial. Backprop is so dominant that most practitioners have never considered an alternative.

But backpropagation has a fundamental problem: **the brain can't do it.**

And in December 2022, Geoffrey Hinton — the person most responsible for making backprop dominant — published a paper proposing an alternative. His motivation? Backpropagation's biological implausibility.

This matters for memory systems because the question isn't academic. The choice between global gradients and local learning rules determines whether a memory system can learn in real-time, on edge devices, without a training pipeline. The answer has practical consequences for every AI agent that needs to remember.

What Backpropagation Actually Requires

To understand why local rules matter, you need to understand what backprop demands:

**1. A global loss function.** The entire network must agree on a single objective. Every weight update depends on how that weight contributed to the final output error. This means the system needs a centralized error signal that propagates through every layer.

**2. Reverse information flow.** Error gradients flow backward through the network — from output layer to input layer, through every connection. This requires either storing all intermediate activations (memory-expensive) or recomputing them (compute-expensive).

**3. Symmetric weights.** The backward pass uses the transpose of the forward weights. In biology, synapses are unidirectional — neuron A's connection to neuron B has no relationship to B's connection to A. Backprop requires them to be mathematically linked.

**4. Synchronous updates.** All layers must wait for the forward pass to complete, then the backward pass to complete, before any weights update. The entire network is locked in lockstep.

```

Backpropagation requires:

  Input → Layer 1 → Layer 2 → ... → Output → Loss
                                               │
  Update ← ∂L/∂w₁ ← ∂L/∂w₂ ← ... ← ∂L/∂wₙ ←───┘

Problems for biological / real-time systems:

  • Needs global error signal          (brain has no central loss)
  • Needs backward pass, same weights  (synapses are one-way)
  • Needs stored activations           (brain doesn't cache the forward pass)
  • Needs synchronous lock-step        (neurons fire asynchronously)

```
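All four requirements are easy to spot in code. Below is a toy scalar-chain backprop sketch (illustrative only, not any library's implementation; `forward`/`backward` are names we chose), with each requirement marked in comments:

```python
import math

def forward(ws, x):
    """Forward pass through a chain of scalar weights with tanh units.

    Every intermediate activation must be cached for the backward
    pass (requirement 2: reverse information flow needs them).
    """
    acts = [x]
    for w in ws:
        acts.append(math.tanh(w * acts[-1]))
    return acts

def backward(ws, acts, target, lr=0.1):
    """Backward pass over the whole chain.

    The error is computed once at the output (requirement 1: global
    loss), propagated back through the *same* weights used forward
    (requirement 3: symmetric weights), and no weight can change
    until the full sweep completes (requirement 4: synchrony).
    """
    grad = acts[-1] - target                    # global error signal
    new_ws = list(ws)
    for i in reversed(range(len(ws))):
        pre, post = acts[i], acts[i + 1]
        d_preact = grad * (1 - post * post)     # back through tanh
        new_ws[i] = ws[i] - lr * d_preact * pre
        grad = d_preact * ws[i]                 # reuses the forward weight
    return new_ws
```

Even in this two-weight toy, notice that updating the first weight requires information computed at the output, two steps away. A local rule never needs that.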

None of these requirements are met by biological neural systems. The brain has no global loss function. Synapses transmit in one direction. Neurons fire asynchronously. There is no backward pass.

Yet the brain learns. It learns faster than any artificial system at many tasks, consumes 20 watts, and runs on wetware that operates at millisecond timescales — a million times slower than silicon.

How the Brain Actually Learns

Biological learning uses **local rules**. Each synapse updates based only on information available at that synapse — the activity of the presynaptic neuron, the activity of the postsynaptic neuron, and neuromodulatory signals. No global error. No backward pass. No weight transport.

Hebb's Rule (1949)

The simplest local rule: if neuron A consistently fires before neuron B, strengthen the connection from A to B. "Cells that fire together, wire together."

This remained a theoretical proposal for nearly 50 years, until Bi & Poo (1998) measured it directly in hippocampal neurons. The strengthening rate: 3-7% per co-activation. The weakening rate under uncorrelated activation: similar magnitude, slightly faster.

```

Hebb's rule:  Δwᵢⱼ = η × xᵢ × xⱼ

  xᵢ = presynaptic activity
  xⱼ = postsynaptic activity
  η  = learning rate

No error signal needed. No backward pass needed.
Only local information needed.
Updates happen in real-time, at the synapse.

```
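In code, the rule is a one-liner. A minimal sketch (function and variable names are illustrative):

```python
def hebbian_update(w, x_pre, x_post, eta=0.01):
    """Hebb's rule: strengthen w in proportion to correlated activity.

    Uses only information local to the synapse: presynaptic activity
    x_pre, postsynaptic activity x_post, and the current weight w.
    No error signal, no backward pass.
    """
    return w + eta * x_pre * x_post

# Correlated activity strengthens the connection...
w = hebbian_update(0.5, x_pre=1.0, x_post=1.0)   # w grows past 0.5
# ...while silence on either side leaves it unchanged.
w_idle = hebbian_update(0.5, x_pre=0.0, x_post=1.0)
```

Contrast this with the backprop update: no quantity in `hebbian_update` depends on any other synapse, layer, or output.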

Spike-Timing Dependent Plasticity (STDP)

The timing-sensitive refinement of Hebb's rule, documented by Markram et al. If neuron A fires *before* neuron B (within ~20ms), the synapse from A to B strengthens. If A fires *after* B, it weakens. The precise timing window creates directional, causal learning.

This is how the brain learns sequences, temporal patterns, and cause-effect relationships — all without backpropagation.
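The STDP window can be sketched as a function of the spike-timing difference Δt = t_post − t_pre. The exponential shape is the standard model; the amplitudes and time constant below are illustrative, not measured values:

```python
import math

def stdp_delta_w(dt_ms, a_plus=0.05, a_minus=0.055, tau_ms=20.0):
    """Weight change as a function of spike timing (dt_ms = t_post - t_pre).

    Positive dt (pre fires first) potentiates; negative dt (post fires
    first) depresses. Both effects shrink exponentially as the timing
    gap grows, giving the ~20 ms window its causal directionality.
    """
    if dt_ms > 0:        # pre before post: causal pairing, strengthen
        return a_plus * math.exp(-dt_ms / tau_ms)
    elif dt_ms < 0:      # post before pre: anti-causal, weaken
        return -a_minus * math.exp(dt_ms / tau_ms)
    return 0.0
```

The sign flip at Δt = 0 is what makes the rule directional: A→B pairings strengthen A→B, not B→A.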

Long-Term Potentiation (LTP)

Repeated Hebbian strengthening triggers a cascade of molecular changes that make the synaptic change permanent. Magee & Grienberger (2020) documented three phases:

1. **Early LTP** (minutes): Temporary phosphorylation of existing receptors. Easily reversed.

2. **Late LTP** (hours-days): New protein synthesis. More receptors inserted into the synapse.

3. **Structural LTP** (permanent): Physical growth of new synaptic connections. The change is anatomical.

This is why you can remember your childhood home but not what you had for lunch three Tuesdays ago. Some memories undergo full LTP. Most don't.

Hinton's Forward-Forward Algorithm (2022)

Geoffrey Hinton — who later shared the 2024 Nobel Prize in Physics for his foundational work on neural networks — is the person most responsible for backpropagation's dominance. And in late 2022, he published a paper trying to replace it.

The [Forward-Forward algorithm](https://www.cs.toronto.edu/~hinton/FFA13.pdf) replaces the forward-backward passes of backprop with two forward passes:

**Positive pass:** Real data flows through the network. Each layer adjusts its weights to *increase* a local "goodness" measure (the sum of squared activations).

**Negative pass:** Synthetic "negative" data flows through. Each layer adjusts weights to *decrease* the same goodness measure.

```

Backpropagation:                    Forward-Forward:

  Forward:  x → h₁ → h₂ → ŷ         Positive: real data → h₁↑ → h₂↑ → h₃↑
  Loss:     L = f(ŷ, y)             Negative: fake data → h₁↓ → h₂↓ → h₃↓
  Backward: ∂L/∂w propagated

  Global error signal.              Each layer has its own local objective.
  Synchronous.                      No backward pass. No weight transport.
                                    Can pipeline and learn while streaming.

```

The critical property: **each layer learns independently using only local information.** There is no backward pass. No global loss. No weight transport problem. The network can learn while data is still streaming through it.
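A single Forward-Forward layer can be sketched in a few lines. This is a toy pure-Python simplification: the goodness measure (sum of squared activations) is from the paper, but the learning rate and the bare gradient-on-goodness update are ours — the paper wraps goodness in a logistic loss around a threshold:

```python
def goodness(h):
    """Hinton's layer-local 'goodness': sum of squared activations."""
    return sum(a * a for a in h)

def ff_layer_update(w, x, positive, lr=0.03):
    """One local learning step for a single linear + ReLU layer.

    Pushes goodness up on real data (positive pass) and down on
    negative data, following the local gradient of goodness with
    respect to this layer's own weights only. No signal from any
    other layer is needed.

    Returns (updated_weights, goodness_before_update).
    """
    h = [max(0.0, sum(wi * xi for wi, xi in zip(row, x))) for row in w]
    g = goodness(h)
    sign = 1.0 if positive else -1.0
    # d(goodness)/dw[j][i] = 2 * h[j] * x[i]  (zero where ReLU is off)
    new_w = [[wi + sign * lr * 2.0 * hj * xi for wi, xi in zip(row, x)]
             for row, hj in zip(w, h)]
    return new_w, g
```

Because each layer's update reads only its own input and output, layers can be trained in a pipeline while data is still flowing.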

On MNIST, the Forward-Forward algorithm performed "only slightly worse" than backpropagation. On its first attempt. With minimal optimization. Hinton explicitly stated this was motivated by backprop's biological implausibility.

Why This Matters for Memory Systems

The choice between global gradients and local learning rules has direct engineering consequences for memory systems:

Real-Time Learning

Backpropagation requires a training phase separate from inference. You collect data, compute gradients, update weights, then deploy. This doesn't work for memory that needs to learn from every interaction.

Local rules learn in real-time. When a user accesses two memories together, the edge between them strengthens immediately. No training pipeline. No batch updates. No gradient computation. The system learns from use, at the speed of use.

Edge Deployment

Backpropagation requires storing all intermediate activations for the backward pass. For a large model, this means gigabytes of memory and significant compute. You can't run backprop-based learning on a Raspberry Pi or a drone's onboard computer.

Local rules have O(1) memory overhead per update. Strengthening an edge requires only the two nodes' activation values and the current weight. shodh-memory runs Hebbian learning at 20W on ARM devices.

No LLM Dependencies

Most AI memory systems use LLM calls for learning — "extract the key insight from this conversation" or "summarize what changed." This is outsourcing learning to a global model (the LLM) rather than implementing local learning rules.

The result: 2+ API calls per memory store, 20+ seconds latency, cloud dependency, and costs that scale with usage. Mem0, Cognee, and Zep all follow this pattern.

shodh-memory uses zero LLM calls. Learning happens through Hebbian edge updates, activation decay, and spreading activation — all local rules that execute in microseconds, offline, on any hardware.

```

Memory systems using global learning (LLM calls):

  Store: content → LLM extract → embeddings → vector DB
  Time: ~20 seconds | Cost: 2+ API calls | Requires: cloud

shodh-memory using local learning rules:

  Store:  content → local embed → graph + vector index
  Learn:  co-access → Hebbian strengthen (+0.025)
  Forget: idle → exponential/power-law decay
  Time: 55ms | Cost: $0 | Requires: nothing

```

shodh-memory's Local Learning Implementation

Every learning mechanism in shodh-memory is a local rule. No global objective. No backward pass. No external model calls.

Hebbian Edge Strengthening

When two memories are accessed in the same session, the edge weight increases by +0.025 (additive). When an edge goes unused through a consolidation cycle, it decays by ×0.90 (multiplicative). The asymmetry is deliberate: building takes 40 co-accesses to reach maximum; decay takes 22 idle cycles to return to minimum.

This mirrors the biological asymmetry documented by Bi & Poo (1998): potentiation and depression have different rates and mechanisms.
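A toy version of the asymmetric rule: the +0.025 additive step and ×0.90 decay factor are the values described above, while the [0, 1] weight bounds are an assumption for illustration (not shodh-memory's actual code):

```python
def strengthen(w, delta=0.025, w_max=1.0):
    """Co-access in a session: additive strengthening, clamped at max."""
    return min(w_max, w + delta)

def decay(w, factor=0.90, w_min=0.0):
    """Unused through a consolidation cycle: multiplicative decay."""
    return max(w_min, w * factor)

# Additive build-up is deliberate and slow: 40 co-accesses from 0 to max.
w = 0.0
for _ in range(40):
    w = strengthen(w)
# Multiplicative decay is faster: the weight halves every ~7 idle cycles.
```

The additive/multiplicative asymmetry means associations are hard to earn and comparatively easy to lose — the same shape Bi & Poo measured.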

Three-Tier LTP

Edges in shodh-memory progress through tiers that directly model LTP phases:

**L1 Working** (early-LTP): New edges. Temporary. Decay rapidly if unused. Minutes to hours.

**L2 Episodic** (late-LTP): Edges that survived initial decay. More resistant. Hours to days.

**L3 Semantic** (structural-LTP): Edges that have been consistently strengthened. Near-permanent. Resist decay almost completely.

Promotion is automatic and based on usage patterns — exactly like biological LTP, where repeated stimulation triggers progressively more permanent molecular changes.
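Promotion can be sketched as a threshold rule on accumulated reinforcement. The promotion thresholds and per-tier decay rates below are illustrative assumptions, not shodh-memory's actual values:

```python
from dataclasses import dataclass

# Illustrative per-tier decay: higher tiers resist decay more.
TIER_DECAY = {"L1": 0.90, "L2": 0.97, "L3": 0.999}
PROMOTE_AT = {"L1": 5, "L2": 20}   # reinforcements needed to move up

@dataclass
class Edge:
    weight: float = 0.1
    tier: str = "L1"
    reinforcements: int = 0

    def strengthen(self, delta=0.025):
        """Co-access strengthens the edge and counts toward promotion."""
        self.weight = min(1.0, self.weight + delta)
        self.reinforcements += 1
        # Repeated strengthening promotes the edge, like LTP phases.
        if self.tier == "L1" and self.reinforcements >= PROMOTE_AT["L1"]:
            self.tier = "L2"
        elif self.tier == "L2" and self.reinforcements >= PROMOTE_AT["L2"]:
            self.tier = "L3"

    def decay_cycle(self):
        """Idle cycle: decay at the current tier's (slower) rate."""
        self.weight *= TIER_DECAY[self.tier]
```

The key property carried over from biology: promotion is driven entirely by the edge's own usage history, a local signal.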

Activation Decay

Every memory has a strength that decays over time following a hybrid curve: exponential for the first 3 days (rapid initial drop, matching synaptic consolidation timelines from Wixted 2004), then power-law for longer periods (slow fade, matching long-term forgetting curves from Ebbinghaus and Anderson & Schooler 1991).

Decay is a local operation. Each memory's strength depends only on its own access history and the elapsed time. No global recomputation needed.
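A sketch of such a hybrid curve. The 3-day crossover comes from the description above; the time constant and power-law exponent are illustrative, and the two branches are matched at the crossover so the curve is continuous:

```python
import math

def activation(initial, days_since_access,
               tau_days=1.5, crossover_days=3.0, alpha=0.5):
    """Hybrid forgetting curve: exponential early, power-law later.

    For the first `crossover_days`, strength drops exponentially with
    time constant `tau_days`. After that it fades as a power law,
    anchored to the exponential's value at the crossover.
    """
    if days_since_access <= crossover_days:
        return initial * math.exp(-days_since_access / tau_days)
    at_crossover = initial * math.exp(-crossover_days / tau_days)
    return at_crossover * (days_since_access / crossover_days) ** -alpha
```

Note the inputs: the memory's own initial strength and its own elapsed idle time. Nothing else — which is what makes decay O(1) per memory.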

Spreading Activation

When a memory is retrieved, activation spreads to connected memories through the knowledge graph, weighted by edge strength and decaying by 0.7× per hop. This surfaces associated context without an explicit query — the network's structure determines what's relevant.

This is a local computation: each node activates its neighbors proportionally to edge weight. No global coordination. The graph structure — shaped by Hebbian learning — determines the activation pattern.
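The traversal itself is short. In this sketch the 0.7 per-hop decay is from the description above, while the hop limit, activation threshold, and example graph are illustrative assumptions:

```python
from collections import defaultdict

def spread_activation(graph, source, initial=1.0,
                      hop_decay=0.7, max_hops=2, threshold=0.05):
    """Spread activation from a retrieved memory through weighted edges.

    graph: {node: {neighbor: edge_weight}}. Each node activates its
    neighbors proportionally to edge weight, attenuated by hop_decay
    per hop - a purely local computation at every step.
    """
    activations = defaultdict(float)
    activations[source] = initial
    frontier = {source: initial}
    for _ in range(max_hops):
        next_frontier = {}
        for node, act in frontier.items():
            for neighbor, weight in graph.get(node, {}).items():
                passed = act * weight * hop_decay
                if passed > threshold and passed > activations[neighbor]:
                    activations[neighbor] = passed
                    next_frontier[neighbor] = passed
        frontier = next_frontier
    return dict(activations)

# Retrieving one memory surfaces its strongly linked neighbors
# (node names here are made up for the example).
g = {"rust-async": {"tokio-bug": 0.9, "old-note": 0.1},
     "tokio-bug": {"fix-commit": 0.8}}
acts = spread_activation(g, "rust-async")
```

Strong edges (built by Hebbian co-access) pass activation two hops out; weak edges barely clear the threshold at one hop.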

The Bigger Picture

The AI field chose backpropagation in the 1980s because it was effective and easy to implement on the hardware available. It remains the best method for training large models on static datasets.

But for systems that need to learn continuously, in real-time, on constrained hardware, without a training pipeline — local rules are not just an alternative. They're the only viable approach.

Hinton's Forward-Forward algorithm, the renewed interest in Hopfield networks, Intel's Loihi neuromorphic chip, IBM's TrueNorth — all point to the same conclusion: the field is beginning to question whether backpropagation's dominance is a fundamental truth or an accident of GPU architecture.

For memory systems, the answer is already clear. You can't run backprop every time a user stores a memory. You can't call an LLM every time you want to strengthen an association. You can't require a training pipeline for a system that needs to learn from every interaction.

Local rules — Hebbian learning, activation decay, spreading activation, long-term potentiation — are how biological memory works. They're also how practical AI memory systems will work. Not because biology is always optimal, but because the engineering constraints of real-time, always-on memory are the same constraints that biological memory evolved to satisfy.

The brain doesn't use backpropagation. It uses local rules. And it remembers just fine.

References

1. Hinton, G. (2022). The Forward-Forward Algorithm: Some Preliminary Investigations. *arXiv:2212.13345*.

2. Hebb, D.O. (1949). The Organization of Behavior. Wiley.

3. Bi, G.Q. & Poo, M.M. (1998). Synaptic Modifications in Cultured Hippocampal Neurons. *J. Neuroscience*, 18(24), 10464-10472.

4. Magee, J.C. & Grienberger, C. (2020). Synaptic Plasticity Forms and Functions. *Annual Review of Neuroscience*, 43, 95-117.

5. Markram, H. et al. (2011). A History of Spike-Timing-Dependent Plasticity. *Frontiers in Synaptic Neuroscience*, 3, 4.

6. Wixted, J.T. (2004). The Psychology and Neuroscience of Forgetting. *Annual Review of Psychology*, 55, 235-269.

7. Anderson, J.R. & Schooler, L.J. (1991). Reflections of the Environment in Memory. *Psychological Science*, 2(6), 396-408.
