
OpenAI Killed the Assistants API. Here's How to Own Your Memory Layer


On March 31, 2026, OpenAI announced the deprecation of the Assistants API. The deadline is August 2026. After that, threads, runs, and file search — the stateful layer that thousands of production applications depend on — will stop working.

If you built on the Assistants API, you're now in one of two positions: migrating to the Responses API under deadline pressure, or rethinking your architecture from the ground up.

This post is for the second group.

The Deprecation Timeline

```
┌───────────────────────────────────────────────────┐
│            Assistants API Deprecation             │
├───────────────────────────────────────────────────┤
│                                                   │
│  2023-11  Assistants API launched (DevDay)        │
│     │     "Persistent threads! File search!"      │
│     │     "No more managing conversation state!"  │
│     ▼                                             │
│  2024-04  Assistants API v2                       │
│     │     Vector stores, streaming, tooling       │
│     ▼                                             │
│  2025-03  Responses API launched                  │
│     │     "The future of OpenAI APIs"             │
│     │     Assistants API: "legacy"                │
│     ▼                                             │
│  2026-03  Deprecation announced                   │
│     │     "Migrate by August 2026"                │
│     ▼                                             │
│  2026-08  ████████████████████████                │
│           ██    API SHUTDOWN    ██                │
│           ████████████████████████                │
│           Threads deleted.                        │
│           Runs terminated.                        │
│           Conversation state: gone.               │
│                                                   │
└───────────────────────────────────────────────────┘
```

Two and a half years. That's how long the Assistants API lasted. For anyone who invested months building on threads and runs, this is a painful lesson in platform risk.

What You're Losing

The Assistants API gave you three things that the Responses API does not:

1. Persistent Threads

Conversations that maintained state across API calls. You could create a thread, append messages over days or weeks, and the assistant remembered everything in that thread. No token management. No context window juggling.

2. Managed Conversation State

OpenAI handled the conversation history. You didn't need to store messages, truncate context, or decide what to include. The API managed it all.

3. Built-in File Search

Upload files, create vector stores, and the assistant could search across them. Retrieval-augmented generation without building a RAG pipeline.

The Responses API replaces the inference part — it's a better API for calling the model. But it explicitly does not replace the state management part. That's now your problem.
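To make that concrete: every Responses call is stateless, so whatever history the model should see, your application assembles and passes on each request. A minimal sketch of what that means in practice (`build_input` is a hypothetical helper, not part of any SDK):

```python
# With the Responses API, conversation state is the caller's job: the model
# only sees what you pass it on each request. `build_input` below is a
# hypothetical helper, not an SDK function.

def build_input(history: list[dict], user_message: str) -> list[dict]:
    """Combine stored turns with the new message into a list of
    {role, content} items to send as the request input."""
    return history + [{"role": "user", "content": user_message}]

# History your app persisted from earlier turns -- nobody stores it for you.
history = [
    {"role": "user", "content": "My name is Priya."},
    {"role": "assistant", "content": "Nice to meet you, Priya!"},
]

messages = build_input(history, "What's my name?")
print(len(messages))  # → 3: the model knows only these three turns
```

Forget to pass the history and the model has never heard of you. That bookkeeping is exactly what a memory layer exists to do.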

```
┌──────────────────────────────────────────────────┐
│         What the Responses API Gives You         │
├──────────────────────────────────────────────────┤
│                                                  │
│  ✓ Model inference (better than before)          │
│  ✓ Tool calling (same)                           │
│  ✓ Streaming (improved)                          │
│  ✓ Multi-modal inputs                            │
│                                                  │
│  ✗ Persistent threads    ← GONE                  │
│  ✗ Managed conversation  ← YOUR PROBLEM NOW      │
│  ✗ Built-in file search  ← BUILD YOUR OWN        │
│  ✗ Stateful sessions     ← GONE                  │
│  ✗ Cross-session memory  ← NEVER EXISTED         │
│                                                  │
└──────────────────────────────────────────────────┘
```

Why Coupling Memory to a Provider Was Always Risky

Let's be honest: the Assistants API was a convenience trap.

It was easy. You didn't have to think about memory. OpenAI handled it. But "handled it" meant:

- Your conversation state lived on OpenAI's servers
- You had no access to the underlying storage layer
- You couldn't query across threads (no semantic search over all conversations)
- You couldn't port your state to another provider
- You couldn't inspect, export, or back up your threads
- And now, you can't keep them at all

This is the provider-coupled memory anti-pattern. It looks like this:

```
┌──────────────────────────────────────────────────┐
│          Provider-Coupled Architecture           │
│          (what you had with Assistants)          │
├──────────────────────────────────────────────────┤
│                                                  │
│              ┌──────────────┐                    │
│              │   Your App   │                    │
│              └──────┬───────┘                    │
│                     │                            │
│                     ▼                            │
│    ┌──────────────────────────────────────┐      │
│    │        OpenAI Assistants API         │      │
│    │  ┌─────────┐ ┌─────────┐ ┌──────┐    │      │
│    │  │ Threads │ │  Runs   │ │ Files│    │      │
│    │  │ (state) │ │ (logic) │ │ (RAG)│    │      │
│    │  └─────────┘ └─────────┘ └──────┘    │      │
│    │                                      │      │
│    │  Model + Memory + State = ONE vendor │      │
│    └──────────────────────────────────────┘      │
│                                                  │
│  Problem: vendor deprecates API                  │
│  Result:  you lose EVERYTHING                    │
│                                                  │
└──────────────────────────────────────────────────┘
```

The fix is architectural separation. Memory should be a standalone layer that you own and control, independent of whichever LLM provider you use today.

The Standalone Memory Layer Pattern

```
┌──────────────────────────────────────────────────┐
│              Decoupled Architecture              │
│             (what you should build)              │
├──────────────────────────────────────────────────┤
│                                                  │
│              ┌──────────────┐                    │
│              │   Your App   │                    │
│              └──────┬───────┘                    │
│                     │                            │
│       ┌─────────────┴─────────┐                  │
│       │                       │                  │
│       ▼                       ▼                  │
│  ┌─────────┐   ┌──────────────────────────────┐  │
│  │  LLM    │   │ Memory Layer (shodh-memory)  │  │
│  │  API    │   │                              │  │
│  │         │   │  ┌────────────────────────┐  │  │
│  │ OpenAI  │   │  │ Vector search          │  │  │
│  │Anthropic│   │  │ Knowledge graph        │  │  │
│  │ Google  │   │  │ Hebbian learning       │  │  │
│  │ Local   │   │  │ Memory decay           │  │  │
│  │         │   │  │ 3-tier promotion       │  │  │
│  └─────────┘   │  │ Entity extraction      │  │  │
│       ▲        │  └────────────────────────┘  │  │
│       │        │                              │  │
│  Swap any      │  Runs locally. You own it.   │  │
│  time.         │  Survives any deprecation.   │  │
│                └──────────────────────────────┘  │
│                                                  │
└──────────────────────────────────────────────────┘
```

When the LLM provider is just the inference layer and memory is a separate system you control, deprecation becomes a non-event. Switch from OpenAI to Anthropic? Your memory stays. Switch from Anthropic to a local model? Your memory stays. The knowledge your agent accumulated over months of use belongs to you, not to the provider.
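The same point in miniature: if inference is just a `complete(prompt) -> str` callable and memory is a store you own, switching providers changes one argument. (The providers below are fake stand-ins for real SDK clients, and `agent_turn` is an illustrative sketch, not shodh-memory's API.)

```python
# Toy sketch of the decoupled pattern: inference is any callable, memory is
# a plain list you own. The "providers" here are fakes standing in for real
# SDK clients; swapping them never touches the memory.

def agent_turn(complete, memory: list[str], user_message: str) -> str:
    context = "\n".join(memory)                      # recall what we know
    reply = complete(f"{context}\n{user_message}")   # inference, any vendor
    memory.append(user_message)                      # remember for next time
    return reply

def provider_a(prompt: str) -> str:
    return f"[provider-a] answered ({len(prompt)} chars of context)"

def provider_b(prompt: str) -> str:
    return f"[provider-b] answered ({len(prompt)} chars of context)"

memory: list[str] = []
agent_turn(provider_a, memory, "I deploy on Fridays")

# Provider A gets deprecated? Swap the callable. Memory carries over intact.
print(agent_turn(provider_b, memory, "When do I deploy?"))
```

The second call sees the first call's context even though the provider changed, which is the whole argument in nine lines.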

Assistants API Threads vs shodh-memory

| Feature | Assistants API Threads | shodh-memory |
| --- | --- | --- |
| Persistence | OpenAI servers (gone Aug 2026) | Your local disk (forever) |
| Cross-thread search | No | Yes (semantic + graph) |
| Knowledge graph | No | Yes (spreading activation) |
| Learning from access | No | Yes (Hebbian learning) |
| Memory decay | No | Yes (exponential + power-law) |
| Memory tiers | No | Working → Session → Long-term |
| Entity extraction | No | Yes (NER pipeline) |
| File search / RAG | Yes (built-in) | Via vector index + embeddings |
| Offline capable | No | Yes, fully offline |
| Provider lock-in | 100% OpenAI | Provider-agnostic |
| Data portability | None (no export) | Full (RocksDB, backup/restore) |
| Open source | No | Yes (Apache 2.0) |
| Cost | Per-thread storage fees | Free |
| API surface | Threads/Runs/Messages | 60+ REST endpoints + MCP |

Migration Guide: Assistants to Responses + shodh-memory

Here's the practical migration path, step by step.

Step 1: Export Your Thread Data

Before August 2026, pull everything out of the Assistants API:

```python
import json

import openai

client = openai.OpenAI()

# Collect your thread IDs (the Assistants API has no "list threads"
# endpoint, so pull them from your own application records), then
# store each thread's messages locally before the shutdown.
threads = []  # your thread IDs

for thread_id in threads:
    messages = client.beta.threads.messages.list(thread_id)
    with open(f"thread_{thread_id}.json", "w") as f:
        json.dump([m.model_dump() for m in messages], f)
```

Step 2: Install shodh-memory

```bash
# Option A: Python bindings
pip install shodh-memory

# Option B: MCP server (for Claude Code, Cursor)
npx @shodh/memory-mcp@latest

# Option C: Rust crate
cargo install shodh-memory
```

Step 3: Import Thread History into shodh-memory

```python
import glob
import json

import requests

SHODH_URL = "http://localhost:3030"

def import_thread(thread_file: str):
    with open(thread_file) as f:
        messages = json.load(f)
    for msg in messages:
        if msg["role"] == "assistant":
            continue  # store user context, not LLM outputs
        content = msg["content"][0]["text"]["value"]
        requests.post(f"{SHODH_URL}/api/remember", json={
            "content": content,
            "tags": ["imported", "assistants-api"],
            "metadata": {
                "source_thread": msg.get("thread_id", ""),
                "original_timestamp": msg.get("created_at", ""),
            },
        })

# Import all your exported threads
for f in glob.glob("thread_*.json"):
    import_thread(f)
    print(f"Imported {f}")
```

Step 4: Replace Assistants API with Responses + shodh-memory

Here's the before and after:

```
┌──────────────────────────────────────────────────┐
│               BEFORE (Assistants)                │
├──────────────────────────────────────────────────┤
│                                                  │
│  User message                                    │
│      │                                           │
│      ▼                                           │
│  thread.messages.create(thread_id, content)      │
│      │                                           │
│      ▼                                           │
│  thread.runs.create(thread_id, assistant_id)     │
│      │                                           │
│      ▼                                           │
│  OpenAI manages everything:                      │
│    context, history, file search, state          │
│      │                                           │
│      ▼                                           │
│  Response (includes full thread context)         │
│                                                  │
└──────────────────────────────────────────────────┘

┌──────────────────────────────────────────────────┐
│            AFTER (Responses + shodh)             │
├──────────────────────────────────────────────────┤
│                                                  │
│  User message                                    │
│      │                                           │
│      ├──▶ shodh: recall(query)                   │
│      │    returns relevant memories              │
│      │                                           │
│      ▼                                           │
│  Build prompt:                                   │
│    system = base_instructions + memories         │
│    user   = current message                      │
│      │                                           │
│      ▼                                           │
│  openai.responses.create(                        │
│    model, system, user                           │
│  )                                               │
│      │                                           │
│      ├──▶ shodh: remember(user_msg)              │
│      │    store for future recall                │
│      │                                           │
│      ▼                                           │
│  Response (with full context from memory)        │
│                                                  │
└──────────────────────────────────────────────────┘
```

Step 5: The Code

Here's a complete working example of an agent that uses the Responses API for inference and shodh-memory for persistent state:

```python
import openai
import requests

client = openai.OpenAI()
SHODH = "http://localhost:3030"

def chat(user_message: str, user_id: str = "default") -> str:
    # 1. Recall relevant memories
    recall_resp = requests.post(
        f"{SHODH}/api/recall",
        json={"query": user_message, "limit": 5},
        headers={"X-User-Id": user_id},
    )
    memories = recall_resp.json().get("memories", [])

    # 2. Build context from memories
    memory_context = ""
    if memories:
        memory_lines = [m["content"] for m in memories]
        memory_context = (
            "\nRelevant context from memory:\n"
            + "\n".join(f"- {line}" for line in memory_lines)
        )

    # 3. Call Responses API with memory context
    response = client.responses.create(
        model="gpt-4.1",
        instructions=f"You are a helpful assistant.{memory_context}",
        input=user_message,
    )
    answer = response.output_text

    # 4. Store the interaction in memory
    requests.post(
        f"{SHODH}/api/remember",
        json={
            "content": user_message,
            "tags": ["user-message"],
        },
        headers={"X-User-Id": user_id},
    )
    return answer

# Usage
print(chat("I prefer Python for data work and Rust for systems"))

# Later, in a new session:
print(chat("What language should I use for this parser?"))
# shodh recalls the preference automatically
```

What You Gain by Decoupling

The migration isn't just about surviving a deprecation. A standalone memory layer gives you capabilities the Assistants API never had:

Cross-Conversation Intelligence

Assistants API threads were isolated silos. You couldn't search across threads. With shodh-memory, every memory is indexed and searchable. A preference stated in conversation #1 surfaces automatically in conversation #500.

Knowledge That Strengthens With Use

Assistants API treated all messages equally. shodh-memory uses Hebbian learning: memories accessed together form stronger connections. Over time, your agent develops genuine expertise in the topics it encounters most.
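The rule itself is simple: memories recalled together get a stronger edge between them, with diminishing returns as the weight saturates. A toy illustration of the principle, with invented constants, not shodh-memory's internal code:

```python
# Toy Hebbian update: each co-access nudges the connection weight toward
# a ceiling of 1.0, so frequently co-recalled memories end up strongly
# linked. Constants are invented for illustration.

def hebbian_update(weight: float, lr: float = 0.1) -> float:
    """Strengthen a connection on co-access, saturating at 1.0."""
    return weight + lr * (1.0 - weight)

w = 0.0
for _ in range(5):  # five co-accesses of the same pair of memories
    w = hebbian_update(w)

print(round(w, 3))  # → 0.41: stronger with each use, never past 1.0
```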

Natural Forgetting

Assistants API threads grew forever until you deleted them. shodh-memory implements biologically plausible decay (Wixted 2004): exponential decay in the first 3 days, then power-law decay for older memories. Knowledge that matters gets reinforced. Trivia fades naturally.
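As a rough sketch of that curve's shape (the constants below are invented for illustration and are not shodh-memory's actual parameters):

```python
import math

# Toy retention curve in the spirit of Wixted (2004): exponential forgetting
# for the first 3 days, then a slower power-law tail, joined continuously at
# the boundary. Constants are invented for illustration.

def retention(age_days: float, tau: float = 2.0, alpha: float = 0.5) -> float:
    if age_days <= 3.0:
        return math.exp(-age_days / tau)              # fast early decay
    boundary = math.exp(-3.0 / tau)                   # value at day 3
    return boundary * (age_days / 3.0) ** -alpha      # slow power-law tail

print(round(retention(1), 3))   # → 0.607  (fresh: mostly retained)
print(round(retention(30), 3))  # → 0.071  (month-old: faded, not gone)
```

Reinforcement resets a memory toward the fresh end of this curve, which is how important knowledge survives while trivia slides down the tail.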

Provider Freedom

```
┌──────────────────────────────────────────────────┐
│             Provider-Agnostic Memory             │
├──────────────────────────────────────────────────┤
│                                                  │
│       ┌────────────────────────────────────┐     │
│       │     shodh-memory (your data)       │     │
│       │    ┌─────────────────────────┐     │     │
│       │    │ vectors + graph + decay │     │     │
│       │    └─────────────────────────┘     │     │
│       └──────────┬─────────────────────────┘     │
│                  │                               │
│        ┌─────────┼─────────┬──────────┐          │
│        │         │         │          │          │
│        ▼         ▼         ▼          ▼          │
│     OpenAI  Anthropic   Google    Local LLM      │
│    GPT-4.1   Claude     Gemini  Llama/Mistral    │
│                                                  │
│     Switch providers.  Memory stays.             │
│     Provider goes down.  Memory stays.           │
│     API deprecated.  Memory stays.               │
│                                                  │
└──────────────────────────────────────────────────┘
```

The Architectural Lesson

The Assistants API deprecation is not an isolated event. It's a pattern. Every cloud provider will eventually deprecate, sunset, or "upgrade" the services you build on. If your agent's intelligence is stored inside a provider's API, you will lose it.

The architectural lesson is simple: separate inference from memory.

- Inference is a commodity. Models get better, cheaper, and more interchangeable every quarter. Use whichever provider gives you the best price-performance today. Switch tomorrow.
- Memory is your moat. The knowledge your agent accumulates over months of use is irreplaceable. It should live on infrastructure you control, in formats you own, with no single point of failure.

```
┌──────────────────────────────────────────────────┐
│              Separation of Concerns              │
├─────────────────────┬────────────────────────────┤
│                     │                            │
│   Inference Layer   │  Memory Layer              │
│   (commodity)       │  (your competitive moat)   │
│  ─────────────────  │  ──────────────────────    │
│  OpenAI / Anthropic │  shodh-memory              │
│  Google / Local     │  localhost:3030            │
│                     │                            │
│  Swap any time      │  Persists forever          │
│  No state needed    │  All the state             │
│  Stateless calls    │  Knowledge graph           │
│  Pay per token      │  Free, open source         │
│                     │                            │
│  Monthly cost:      │  Monthly cost:             │
│  $20-500            │  $0                        │
│                     │                            │
└─────────────────────┴────────────────────────────┘
```

Getting Started Today

Don't wait for the August deadline. Start decoupling now.

```bash
# Install shodh-memory
pip install shodh-memory

# Or use the MCP server with Claude Code / Cursor
npx @shodh/memory-mcp@latest
```

shodh-memory is a single Rust binary (~30MB). It runs 100% offline. No Docker, no cloud, no API keys. Apache 2.0 licensed. 1089 tests. Published on crates.io, npm, and PyPI.

Your agent's memory is too important to live on someone else's infrastructure. Own it.
