ChatGPT Memory Is Full? Here's Unlimited AI Memory That Never Fills Up
You've seen the message. Everyone has.
"Memory full. ChatGPT may not remember new information until you manage your saved memories."
You're mid-conversation. You've been teaching your assistant about your codebase, your preferences, your project conventions. And then it stops learning. Not because the model ran out of capability. Because it ran out of storage slots.
You open Settings, scroll through a list of facts, and start manually deleting memories about your Kubernetes namespace preferences so that ChatGPT can remember your new API endpoint. This is the state of AI memory in 2026.
It doesn't have to be.
How ChatGPT Memory Actually Works
Let's be precise about what ChatGPT's memory system does. It's simpler than most people think:
┌─────────────────────────────────────────────────┐
│ ChatGPT Memory Architecture │
├─────────────────────────────────────────────────┤
│ │
│ Conversation │
│ │ │
│ ▼ │
│ ┌─────────────────────┐ │
│ │ LLM decides what │ │
│ │ "seems important" │ │
│ └────────┬────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────┐ ┌──────────────┐ │
│ │ Flat list of facts │───▶│ HARD CAP: │ │
│ │ (key-value pairs) │ │ ~6000 tokens │ │
│ └─────────────────────┘ └──────────────┘ │
│ │ │ │
│ │ ▼ │
│ │ ┌──────────────┐ │
│ │ │ FULL? Sorry. │ │
│ │ │ Delete some. │ │
│ ▼ └──────────────┘ │
│ ┌─────────────────────┐ │
│ │ Injected into next │ │
│ │ conversation system │ │
│ │ prompt (verbatim) │ │
│ └─────────────────────┘ │
│ │
└─────────────────────────────────────────────────┘
That's it. There is no vector search. No knowledge graph. No decay model. No relationship between memories. It's a flat list with a hard ceiling, and the LLM itself decides what to extract and what to discard.
When the list fills up, you become the garbage collector.
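The whole architecture fits in a few lines. Here is a hypothetical sketch of a ChatGPT-style flat memory (the class, the 4-characters-per-token estimate, and the cap value are illustrative assumptions, not OpenAI's implementation):

```python
class FlatMemory:
    """A flat, capped list of facts: no retrieval, no decay, no relationships."""

    def __init__(self, token_cap=6000):
        self.token_cap = token_cap
        self.facts = []  # plain strings, no structure

    def _tokens(self, text):
        # Rough estimate: ~1 token per 4 characters
        return max(1, len(text) // 4)

    def used(self):
        return sum(self._tokens(f) for f in self.facts)

    def save(self, fact):
        if self.used() + self._tokens(fact) > self.token_cap:
            return False  # "Memory full" -- the user must delete facts
        self.facts.append(fact)
        return True

    def system_prompt(self):
        # Every fact is injected verbatim into every conversation,
        # regardless of relevance to the current query.
        return "\n".join(self.facts)

mem = FlatMemory(token_cap=40)  # tiny cap to show the failure mode
mem.save("Prefers TypeScript with strict mode")
ok = mem.save("Deploys on Vercel" * 20)  # pushes past the cap
print(ok)  # False: nothing new is learned until you prune
```

Once `save` returns `False`, the system stops learning entirely. That is the "manage your saved memories" screen, expressed as code.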
Why the Hard Cap Exists
This isn't a technical limitation. OpenAI can store terabytes. The cap exists because of a design choice: every saved memory gets injected verbatim into your system prompt. More memories means more tokens consumed before you even type a message.
┌──────────────── Context Window ─────────────────┐
│ │
│ ┌──────────────────────┐ ◀── System prompt │
│ │ Saved memories │ (~6000 tokens) │
│ │ (all of them, │ │
│ │ every time) │ │
│ └──────────────────────┘ │
│ ┌──────────────────────┐ ◀── Your conversation │
│ │ User messages │ (remaining space) │
│ │ + Assistant replies │ │
│ └──────────────────────┘ │
│ ┌──────────────────────┐ │
│ │ Tool calls, output │ ◀── What's left │
│ └──────────────────────┘ │
│ │
└──────────────────────────────────────────────────┘
The more memories you store, the less room you have for actual conversation. So OpenAI caps it. A reasonable engineering trade-off for a general-purpose chatbot. A terrible architecture for anyone who needs their AI to actually learn.
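The trade-off is simple arithmetic. A minimal sketch, assuming a 128k-token context window and a 2,000-token base system prompt (both assumptions; the ~6,000-token memory figure is from the diagram above):

```python
# Back-of-envelope context budget for a single conversation turn.
context_window = 128_000   # assumed model context window
memories = 6_000           # saved memories, injected verbatim every time
system_overhead = 2_000    # assumed base system prompt

remaining = context_window - memories - system_overhead
print(remaining)  # 120000 tokens left for conversation and tool output
```

Every token of stored memory is a token taken from every future conversation, whether or not that memory is relevant to it.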
The Real Problem: Flat Lists Don't Scale
Even if OpenAI doubled or tripled the cap, the architecture is fundamentally broken for power users. Here's why:
1. No retrieval. Every memory loads every time. A memory about your Kubernetes config loads when you're asking about recipes. There's no relevance filtering.
2. No relationships. Memories are isolated facts. The system can't link "prefers TypeScript" to "uses Next.js" to "deploys on Vercel" into a coherent understanding.
3. No decay. A memory from six months ago has the same weight as one from five minutes ago. You manually curate or it clutters forever.
4. No learning. Accessing a memory doesn't strengthen it. Frequently-used knowledge and rarely-used knowledge are treated identically.
5. LLM-decided extraction. The model guesses what's worth remembering. It often guesses wrong, storing trivia while missing critical instructions.
What Cognitive Memory Looks Like
What if memory worked the way your brain does? Not a clipboard with a page limit, but a system that:

- retrieves only what's relevant to the current query
- links related facts into a coherent graph
- lets unused memories fade while strengthening the ones you use
- never asks you to delete something old to learn something new
This is not a hypothetical. This is how shodh-memory works.
┌───────────────────────────────────────────────┐
│ Cognitive Decay vs Hard Cap │
├──────────────────┬────────────────────────────┤
│ ChatGPT │ shodh-memory │
├──────────────────┼────────────────────────────┤
│ │ │
│ Strength │ Strength │
│ ▓▓▓▓▓▓▓▓▓▓▓▓▓▓ │ ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓ │
│ ▓▓▓▓▓▓▓▓▓▓▓▓▓▓ │ ▓▓▓▓▓▓▓▓▓▓▓▓▓░░░░░ │
│ ▓▓▓▓▓▓▓▓▓▓▓▓▓▓ │ ▓▓▓▓▓▓▓▓▓░░░░░░░░░ │
│ ▓▓▓▓▓▓▓▓▓▓▓▓▓▓ │ ▓▓▓▓▓░░░░░░░░░░░░░ │
│ ──────FULL────── │ ▓▓░░░░░░░░░░░░░░░░░ │
│ (can't add more) │ ░░░░░░░░░░░░░░░░░░░ │
│ │ ↑ │
│ │ Old memories decay │
│ │ naturally, making room │
│ │ for new ones. │
│ │ Used memories stay strong. │
│ │ │
└──────────────────┴────────────────────────────┘
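The decay side of that diagram can be modeled in a few lines. This is a toy exponential-decay model with use-based reinforcement; the half-life, reinforcement amount, and cap are illustrative assumptions, not shodh-memory's actual constants:

```python
import time

class DecayingMemory:
    def __init__(self, content, half_life=86_400.0):
        self.content = content
        self.strength = 1.0
        self.half_life = half_life       # seconds until strength halves
        self.last_access = time.time()

    def current_strength(self, now=None):
        now = time.time() if now is None else now
        elapsed = now - self.last_access
        # Exponential decay: strength halves every half_life seconds
        return self.strength * 0.5 ** (elapsed / self.half_life)

    def access(self, now=None):
        # Using a memory resets the clock and reinforces it,
        # so frequently used knowledge stays strong.
        now = time.time() if now is None else now
        self.strength = min(2.0, self.current_strength(now) + 0.5)
        self.last_access = now

m = DecayingMemory("Use RS256 for JWT signing")
one_day_later = time.time() + 86_400   # one half-life later
print(round(m.current_strength(one_day_later), 2))  # ~0.5
```

Unused memories asymptotically approach zero and stop competing for retrieval; no one ever has to open a settings page and delete them by hand.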
How shodh-memory Works
shodh-memory is an open-source cognitive memory system written in Rust. It runs as a single binary on your machine. No cloud. No API keys. No Docker required. Here's what happens when you store a memory:
┌──────────────────────────────────────────────┐
│ What happens when you remember something │
├──────────────────────────────────────────────┤
│ │
│ "Use RS256 for JWT signing in auth svc" │
│ │ │
│ ▼ │
│ ┌──────────────────────┐ │
│ │ 1. Embed locally │ MiniLM-L6-v2 │
│ │ (384-dim, <5ms) │ via ONNX Runtime │
│ └──────────┬───────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────┐ │
│ │ 2. Store in RocksDB │ Async write <1ms │
│ └──────────┬───────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────┐ │
│ │ 3. Index in Vamana │ Graph-based ANN │
│ │ vector graph │ (auto → SPANN │
│ │ │ at 100k vectors) │
│ └──────────┬───────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────┐ │
│ │ 4. Extract entities │ "JWT", "RS256", │
│ │ (NER pipeline) │ "auth service" │
│ └──────────┬───────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────┐ │
│ │ 5. Update knowledge │ Edges form: │
│ │ graph │ JWT ──▶ RS256 │
│ │ │ JWT ──▶ auth svc │
│ └──────────┬───────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────┐ │
│ │ 6. Hebbian learning │ Co-accessed │
│ │ strengthens edges │ memories wire │
│ │ │ together (+0.025) │
│ └──────────────────────┘ │
│ │
└──────────────────────────────────────────────┘
When you recall, it's not just a keyword lookup:
┌──────────────────────────────────────────────┐
│ Multi-Stage Retrieval Pipeline │
├──────────────────────────────────────────────┤
│ │
│ Query: "How does auth work?" │
│ │ │
│ ├──▶ Vector search (semantic) │
│ │ finds: auth memories │
│ │ │
│ ├──▶ BM25 reranking (lexical) │
│ │ boosts: exact matches │
│ │ │
│ ├──▶ Temporal boost │
│ │ recent > old │
│ │ │
│ ├──▶ Graph expansion │
│ │ auth → JWT → RS256 │
│ │ auth → signing keys │
│ │ auth → rotation schedule │
│ │ │
│ └──▶ Hebbian boost │
│ frequently accessed = │
│ higher relevance │
│ │
│ ▼ │
│ Ranked results (composite score) │
│ │
└──────────────────────────────────────────────┘
ChatGPT Memory vs shodh-memory: Feature Comparison
┌──────────────┬────────────────────┬────────────────────────┐
│              │ ChatGPT            │ shodh-memory           │
├──────────────┼────────────────────┼────────────────────────┤
│ Capacity     │ ~6000-token cap    │ Unbounded, decay-based │
│ Retrieval    │ All, every time    │ Relevance-ranked       │
│ Structure    │ Flat fact list     │ Knowledge graph        │
│ Decay        │ None (manual)      │ Natural, use-based     │
│ Learning     │ None               │ Hebbian + LTP          │
│ Storage      │ OpenAI servers     │ Local disk             │
└──────────────┴────────────────────┴────────────────────────┘
Three-Tier Memory Architecture
ChatGPT treats all memories the same. shodh-memory uses a three-tier model based on Cowan's embedded-processes theory from cognitive science:
┌─────────────────────────────────────────┐
│ ┌─────────────────────────────────┐ │
│ │ ┌─────────────────────────┐ │ │
│ │ │ Working Memory │ │ │
│ │ │ (seconds, 4-7 items) │ │ │
│ │ │ Current focus only │ │ │
│ │ └─────────────────────────┘ │ │
│ │ Session Memory │ │
│ │ (hours, current convo) │ │
│ │ Promotes after 30 min │ │
│ └─────────────────────────────────┘ │
│ Long-Term Memory │
│ (permanent, consolidated) │
│ Promotes after 24 hours │
│ Resists decay via LTP │
└─────────────────────────────────────────┘
Memories start in working memory and promote through tiers based on usage and time. Frequently accessed knowledge reaches long-term memory, where Long-Term Potentiation (LTP) makes it resistant to decay. Rarely accessed memories naturally fade, just like in the human brain. No manual pruning required.
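The promotion thresholds from the diagram (30 minutes to session, 24 hours to long-term) are enough to sketch the tiering logic. The access-count condition is an assumption; the real consolidation criteria aren't specified here:

```python
WORKING, SESSION, LONG_TERM = "working", "session", "long_term"

def tier(age_seconds, access_count):
    """Classify a memory by age and usage (thresholds from the diagram)."""
    if age_seconds >= 24 * 3600 and access_count >= 2:
        return LONG_TERM   # consolidated; LTP makes it resist decay
    if age_seconds >= 30 * 60:
        return SESSION     # survives the current conversation
    return WORKING         # current focus only

print(tier(age_seconds=60, access_count=1))         # working
print(tier(age_seconds=3600, access_count=1))       # session
print(tier(age_seconds=2 * 86400, access_count=5))  # long_term
```

An old memory that was never accessed again simply never meets the consolidation bar, so it decays out of session memory instead of being promoted.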
Getting Started: 3 Ways to Connect
Option 1: MCP Server (Claude Code, Cursor, Windsurf)
The fastest path. One command gives your coding assistant persistent memory:
npx @shodh/memory-mcp@latest
Add to your Claude Code config (~/.claude/settings.json):
{
"mcpServers": {
"shodh-memory": {
"command": "npx",
"args": ["-y", "@shodh/memory-mcp@latest"]
}
}
}
That's it. 45 MCP tools are now available: remember, recall, proactive_context, add_todo, set_reminder, and more. Your AI assistant now has memory that persists between sessions, strengthens with use, and surfaces relevant context automatically.
Option 2: Python SDK
pip install shodh-memory
from shodh_memory import ShodhMemory
memory = ShodhMemory()
# Store a memory
memory.remember(
    content="User prefers TypeScript with strict mode",
    tags=["preference", "typescript"],
)

# Recall relevant memories
results = memory.recall("What language does the user prefer?")
for r in results:
    print(r.content, r.relevance)
Option 3: REST API
shodh-memory exposes 60+ HTTP endpoints on localhost:3030:
# Store a memory
curl -X POST http://localhost:3030/api/remember \
-H 'Content-Type: application/json' \
-d '{"content": "Deploy to staging before prod", "tags": ["workflow"]}'
# Recall memories
curl -X POST http://localhost:3030/api/recall \
-H 'Content-Type: application/json' \
-d '{"query": "deployment process"}'
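The same calls work from any language with an HTTP client. A small Python helper using only the standard library (the endpoints are from the curl examples above; response field names are not specified here, so the reply is returned as raw JSON):

```python
import json
from urllib import request

BASE = "http://localhost:3030/api"

def post(path, payload):
    """POST JSON to the local shodh-memory server and decode the reply."""
    req = request.Request(
        BASE + path,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())

if __name__ == "__main__":
    post("/remember", {"content": "Deploy to staging before prod",
                       "tags": ["workflow"]})
    print(post("/recall", {"query": "deployment process"}))
```

Because everything is on localhost, there is no auth handshake and no API key: the network boundary is your own loopback interface.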
Your Data Never Leaves Your Machine
This is not a marketing claim. It's an architectural fact.
┌──────────────────────────────────────────────┐
│ Where Your Data Lives │
├──────────────────┬───────────────────────────┤
│ ChatGPT │ shodh-memory │
├──────────────────┼───────────────────────────┤
│ │ │
│ Your machine │ Your machine │
│ │ │ │ │
│ ▼ │ ▼ │
│ ┌──────────┐ │ ┌─────────────────────┐ │
│ │ OpenAI │ │ │ Local RocksDB │ │
│ │ servers │ │ │ + Vamana index │ │
│ │ (US) │ │ │ + Knowledge graph │ │
│ └──────────┘ │ │ │ │
│ │ │ │ No network. │ │
│ ▼ │ │ No API keys. │ │
│ ┌──────────┐ │ │ No cloud. │ │
│ │ Stored │ │ │ │ │
│ │ on their │ │ │ Runs on: │ │
│ │ infra │ │ │ - Mac/Win/Linux │ │
│ └──────────┘ │ │ - Raspberry Pi │ │
│ │ │ - Air-gapped nets │ │
│ You trust │ │ │ │
│ OpenAI with │ └─────────────────────┘ │
│ your agent's │ │
│ knowledge. │ You own everything. │
│ │ │
└──────────────────┴───────────────────────────┘
shodh-memory embeds text locally using MiniLM-L6-v2 via ONNX Runtime. The model ships with the binary. No API calls, no internet required. Your memories, your preferences, your code context, your private data -- all of it stays on your disk, indexed locally, queried locally.
This matters for healthcare, defense, finance, legal, and anyone who takes data sovereignty seriously. It also matters for anyone who's tired of paying $20/month for a memory system that fills up in a day.
The Numbers
Pulling together the figures from the sections above:

- ChatGPT memory cap: ~6,000 tokens, injected verbatim into every conversation
- Local embedding: MiniLM-L6-v2 via ONNX Runtime, 384 dimensions, under 5ms
- Storage: async RocksDB write, under 1ms
- Vector index: Vamana graph ANN, switching to SPANN at 100k vectors
- Hebbian edge strengthening: +0.025 per co-access
- Tier promotion: 30 minutes to session memory, 24 hours to long-term
- Integration surface: 45 MCP tools, 60+ REST endpoints
Your Memory Shouldn't Have Someone Else's Storage Quota
ChatGPT memory is a product feature designed for casual users. It was never meant to be a memory system. It's a notepad with a page limit, managed by an LLM that guesses what you consider important.
If you're building AI agents, coding assistants, research tools, or robotic systems that need to learn and remember, you need something that was designed from the ground up as a cognitive memory system.
shodh-memory is that system. It's open source. It runs locally. It never fills up. And your data stays yours.
Start remembering
npx @shodh/memory-mcp@latest