RocksDB for AI Workloads: Lessons from Building a Memory Engine
When we started building shodh-memory, the storage question seemed straightforward. SQLite? PostgreSQL? Custom B-tree? We ended up with RocksDB, and the reasons are worth understanding.
Why Not SQLite
SQLite is excellent for structured relational data. But AI memory workloads are weird.
SQLite handles most of that weirdness adequately. The dealbreaker is access-pattern isolation: in SQLite, everything shares one B-tree namespace. In RocksDB, column families are independent LSM trees with separate compaction, separate bloom filters, and separate block caches.
When you're scanning the entity-episodes index (prefix scan, sequential reads), you don't want that competing with random-read point lookups on the embeddings column family. Column families give you this isolation.
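As a rough sketch with the rust-rocksdb crate (column family names from our schema, options left at defaults here, per-family tuning shown further down), opening the store looks like this:

use rocksdb::{ColumnFamilyDescriptor, DB, Options};

fn open_store(path: &str) -> Result<DB, rocksdb::Error> {
    let mut db_opts = Options::default();
    db_opts.create_if_missing(true);
    db_opts.create_missing_column_families(true);

    // Each descriptor becomes its own LSM tree: separate memtables,
    // separate SST files, separate compaction schedule.
    let cfs = ["memories", "embeddings", "entities", "entity_episodes"]
        .into_iter()
        .map(|name| ColumnFamilyDescriptor::new(name, Options::default()))
        .collect::<Vec<_>>();

    DB::open_cf_descriptors(&db_opts, path, cfs)
}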
Why Not PostgreSQL
Two words: single binary. shodh-memory ships as one 28MB executable. Adding a PostgreSQL dependency means your users need a database server. On a Raspberry Pi. In an air-gapped factory.
No.
The RocksDB Architecture
shodh-memory uses 12+ column families:
memories — Core memory records (MessagePack)
embeddings — 384-dim float32 vectors
entities — Knowledge graph nodes
edges — Knowledge graph relationships
entity_episodes — Entity-to-episode index
todos — GTD task records
projects — Todo project metadata
reminders — Prospective memory triggers
facts — Extracted factual assertions
files — File access records
feedback — Implicit feedback signals
audit — Operation audit log
Each column family has tuned options. Embeddings use larger block sizes (64KB) because reads are always full-vector. The entity_episodes index uses prefix bloom filters for fast prefix scans. The audit log uses FIFO compaction to auto-prune old entries.
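A sketch of that per-family tuning, again assuming the rust-rocksdb crate; the prefix length and the exact values are illustrative, not our production config:

use rocksdb::{BlockBasedOptions, ColumnFamilyDescriptor, DBCompactionStyle, Options, SliceTransform};

fn tuned_cf_descriptors() -> Vec<ColumnFamilyDescriptor> {
    // embeddings: reads are always full-vector, so use 64KB blocks.
    let mut embed_block = BlockBasedOptions::default();
    embed_block.set_block_size(64 * 1024);
    let mut embed_opts = Options::default();
    embed_opts.set_block_based_table_factory(&embed_block);

    // entity_episodes: keys are "{entity_uuid}:{episode_uuid}", so a fixed-length
    // prefix extractor plus memtable prefix blooms keeps prefix seeks cheap.
    let mut index_opts = Options::default();
    index_opts.set_prefix_extractor(SliceTransform::create_fixed_prefix(37)); // 36-char UUID + ':'
    index_opts.set_memtable_prefix_bloom_ratio(0.1);

    // audit: FIFO compaction drops the oldest SST files once the size cap is hit.
    let mut audit_opts = Options::default();
    audit_opts.set_compaction_style(DBCompactionStyle::Fifo);

    vec![
        ColumnFamilyDescriptor::new("embeddings", embed_opts),
        ColumnFamilyDescriptor::new("entity_episodes", index_opts),
        ColumnFamilyDescriptor::new("audit", audit_opts),
    ]
}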
MessagePack Over JSON
We serialize memory records with MessagePack instead of JSON.
For backward compatibility, we have a 4-level deserialization fallback: MessagePack → JSON (legacy) → bincode (historical) → raw bytes. Migrations happen lazily — records are upgraded when they're next written.
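The fallback chain itself is just a cascade of attempted decodes. A minimal sketch, assuming rmp-serde, serde_json, and bincode as the codec crates and a simplified record type; the real decoder may differ in detail:

use serde::{Deserialize, Serialize};

// Simplified stand-in for the real record type.
#[derive(Serialize, Deserialize)]
struct MemoryRecord {
    id: String,
    content: String,
}

enum Decoded {
    Record(MemoryRecord),
    Raw(Vec<u8>), // level 4: keep the bytes, decide later
}

fn decode(bytes: &[u8]) -> Decoded {
    if let Ok(rec) = rmp_serde::from_slice::<MemoryRecord>(bytes) {
        return Decoded::Record(rec); // level 1: current MessagePack format
    }
    if let Ok(rec) = serde_json::from_slice::<MemoryRecord>(bytes) {
        return Decoded::Record(rec); // level 2: legacy JSON
    }
    if let Ok(rec) = bincode::deserialize::<MemoryRecord>(bytes) {
        return Decoded::Record(rec); // level 3: historical bincode
    }
    Decoded::Raw(bytes.to_vec())
}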
Write-Ahead Logging
Every memory write goes through RocksDB's WAL before it's acknowledged. This means a power failure can't corrupt your memory database. On edge devices where power is unreliable (robots, IoT), this is non-negotiable.
We default to async writes (<1ms latency) for normal operations and sync writes (2-10ms) for critical paths like backup. The async mode doesn't skip the WAL — it just doesn't wait for the OS to flush to disk. In practice, you lose at most the last few milliseconds of writes on a hard crash.
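The split is just two WriteOptions profiles. A rough sketch with the rust-rocksdb crate; the keys are made up:

use rocksdb::{DB, WriteOptions};

fn write_paths(db: &DB) -> Result<(), rocksdb::Error> {
    // Normal path: the WAL is still written, we just don't fsync before acknowledging.
    let mut async_opts = WriteOptions::default();
    async_opts.set_sync(false);

    // Critical path (e.g. backup markers): fsync the WAL before returning.
    let mut sync_opts = WriteOptions::default();
    sync_opts.set_sync(true);

    db.put_opt(b"memory:123", b"...", &async_opts)?;
    db.put_opt(b"backup:marker", b"...", &sync_opts)?;
    Ok(())
}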
Prefix Iterators for Graph Traversal
The knowledge graph's entity-episode index is keyed as `{entity_uuid}:{episode_uuid}`. To find all episodes for an entity, we use RocksDB's prefix iterator:
// All episodes for an entity share the "{entity_uuid}:" key prefix,
// so a single seek lands at the start of that entity's range.
let prefix = format!("{entity_uuid}:");
let iter = db.prefix_iterator(prefix.as_bytes());
This is a seek + sequential scan, hitting only the relevant key range. With prefix bloom filters enabled, the seek is O(1) amortized. Compare this to a SQL query that would scan an index and then do random page reads for each row.
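Put together, a traversal looks roughly like the sketch below, scoped to the entity_episodes column family from the list above; the key parsing is illustrative and assumes the rust-rocksdb crate:

fn episodes_for_entity(db: &rocksdb::DB, entity_uuid: &str) -> Result<Vec<String>, rocksdb::Error> {
    let cf = db.cf_handle("entity_episodes").expect("column family exists");
    let prefix = format!("{entity_uuid}:");

    let mut episodes = Vec::new();
    for item in db.prefix_iterator_cf(cf, prefix.as_bytes()) {
        let (key, _value) = item?;
        // The iterator can run past the requested prefix once its range ends,
        // so check the prefix explicitly before trusting the key.
        if !key.starts_with(prefix.as_bytes()) {
            break;
        }
        // Everything after "{entity_uuid}:" is the episode UUID.
        episodes.push(String::from_utf8_lossy(&key[prefix.len()..]).into_owned());
    }
    Ok(episodes)
}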