RocksDB for AI Workloads: Lessons from Building a Memory Engine

When we started building shodh-memory, the storage question seemed straightforward. SQLite? PostgreSQL? Custom B-tree? We ended up with RocksDB, and the reasons are worth understanding.

Why Not SQLite

SQLite is excellent for structured relational data. But AI memory workloads are weird:

- Variable-length blobs (serialized memory records, embeddings)
- High write throughput (every interaction generates memories)
- Prefix scans ("find all episodes for entity X")
- Column-family isolation (memories, embeddings, entities, edges — each with different access patterns)

SQLite handles the first three adequately. The fourth is the dealbreaker. In SQLite, everything shares one B-tree namespace. In RocksDB, column families are independent LSM trees with separate compaction, separate bloom filters, and separate block caches.

When you're scanning the entity-episodes index (prefix scan, sequential reads), you don't want that competing with random-read point lookups on the embeddings column family. Column families give you this isolation.
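For illustration, here is roughly what opening a database with independent column families looks like with the rust-rocksdb crate. This is a minimal sketch, not the actual shodh-memory code; the path and the choice of families shown are placeholders.

```
use rocksdb::{ColumnFamilyDescriptor, DB, Options};

// Each column family gets its own Options, so compaction, bloom filters,
// and table settings are tuned independently per family.
let mut db_opts = Options::default();
db_opts.create_if_missing(true);
db_opts.create_missing_column_families(true);

let cfs = vec![
    ColumnFamilyDescriptor::new("memories", Options::default()),
    ColumnFamilyDescriptor::new("embeddings", Options::default()),
    ColumnFamilyDescriptor::new("entity_episodes", Options::default()),
];

let db = DB::open_cf_descriptors(&db_opts, "/var/lib/memory-db", cfs)?;
```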

Why Not PostgreSQL

Two words: single binary. shodh-memory ships as one 28MB executable. Adding a PostgreSQL dependency means your users need a database server. On a Raspberry Pi. In an air-gapped factory.

No.

The RocksDB Architecture

shodh-memory uses 12+ column families:

```
memories         — Core memory records (MessagePack)
embeddings       — 384-dim float32 vectors
entities         — Knowledge graph nodes
edges            — Knowledge graph relationships
entity_episodes  — Entity-to-episode index
todos            — GTD task records
projects         — Todo project metadata
reminders        — Prospective memory triggers
facts            — Extracted factual assertions
files            — File access records
feedback         — Implicit feedback signals
audit            — Operation audit log
```

Each column family has tuned options. Embeddings use larger block sizes (64KB) because reads are always full-vector. The entity_episodes index uses prefix bloom filters for fast prefix scans. The audit log uses FIFO compaction to auto-prune old entries.
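A sketch of what that per-family tuning looks like with a recent rust-rocksdb; the specific values here are illustrative rather than the exact production settings.

```
use rocksdb::{BlockBasedOptions, DBCompactionStyle, Options, SliceTransform};

// embeddings: reads always pull the full vector, so larger blocks amortize I/O.
let mut embedding_block = BlockBasedOptions::default();
embedding_block.set_block_size(64 * 1024);
let mut embedding_opts = Options::default();
embedding_opts.set_block_based_table_factory(&embedding_block);

// entity_episodes: keys are "{entity_uuid}:{episode_uuid}", so a fixed-length
// prefix extractor (36-char UUID plus ':') enables prefix bloom filters.
let mut index_block = BlockBasedOptions::default();
index_block.set_bloom_filter(10.0, false);
let mut index_opts = Options::default();
index_opts.set_prefix_extractor(SliceTransform::create_fixed_prefix(37));
index_opts.set_block_based_table_factory(&index_block);

// audit: append-only log, FIFO compaction drops the oldest files automatically.
let mut audit_opts = Options::default();
audit_opts.set_compaction_style(DBCompactionStyle::Fifo);
```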

MessagePack Over JSON

We serialize memory records with MessagePack instead of JSON. The reasons:

- 30-40% smaller on disk (no key repetition, binary encoding)
- 2-3x faster serialization/deserialization
- Native binary support (embeddings don't need base64 encoding)

For backward compatibility, we have a 4-level deserialization fallback: MessagePack → JSON (legacy) → bincode (historical) → raw bytes. Migrations happen lazily — records are upgraded when they're next written.
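A minimal sketch of that fallback chain, assuming the rmp-serde, serde_json, and bincode crates; the function name is illustrative, not the real API.

```
use serde::de::DeserializeOwned;

// Try each historical format in order; the raw-bytes case is left to the caller.
fn deserialize_record<T: DeserializeOwned>(bytes: &[u8]) -> Option<T> {
    if let Ok(v) = rmp_serde::from_slice::<T>(bytes) {
        return Some(v); // current format: MessagePack
    }
    if let Ok(v) = serde_json::from_slice::<T>(bytes) {
        return Some(v); // legacy format: JSON
    }
    if let Ok(v) = bincode::deserialize::<T>(bytes) {
        return Some(v); // historical format: bincode
    }
    None // fall back to treating the value as raw bytes
}
```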

Write-Ahead Logging

Every memory write goes through RocksDB's WAL before it's acknowledged. This means a power failure can't corrupt your memory database. On edge devices where power is unreliable (robots, IoT), this is non-negotiable.

We default to async writes (<1ms latency) for normal operations and sync writes (2-10ms) for critical paths like backup. The async mode doesn't skip the WAL — it just doesn't wait for the OS to flush to disk. In practice, you lose at most the last few milliseconds of writes on a hard crash.
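With the rust-rocksdb crate, the difference between the two paths comes down to a single flag on WriteOptions. In this sketch, db and payload are placeholders for whatever is in scope.

```
use rocksdb::WriteOptions;

// Normal path: the write still hits the WAL, but we don't wait for fsync.
let mut async_opts = WriteOptions::default();
async_opts.set_sync(false);
db.put_opt(b"memory:01HXAMPLE", &payload, &async_opts)?;

// Critical path (e.g. right before a backup): wait for the WAL to reach disk.
let mut sync_opts = WriteOptions::default();
sync_opts.set_sync(true);
db.put_opt(b"memory:01HXAMPLE", &payload, &sync_opts)?;
```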

Prefix Iterators for Graph Traversal

The knowledge graph's entity-episode index is keyed as {entity_uuid}:{episode_uuid}. To find all episodes for an entity, we use RocksDB's prefix iterator:

```
let prefix = format!("{entity_uuid}:");
let iter = db.prefix_iterator(prefix.as_bytes());
```

This is a seek + sequential scan, hitting only the relevant key range. With prefix bloom filters enabled, the seek is O(1) amortized. Compare this to a SQL query that would scan an index and then do random page reads for each row.
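Consuming the iterator is just a forward scan. One caveat: depending on how the prefix extractor is configured, prefix_iterator can walk past the prefix, so an explicit guard is cheap insurance. This sketch assumes a recent rust-rocksdb where the iterator yields Results; handle_episode stands in for whatever processes each hit.

```
for item in db.prefix_iterator(prefix.as_bytes()) {
    let (key, value) = item?;
    // Stop as soon as we leave this entity's key range.
    if !key.starts_with(prefix.as_bytes()) {
        break;
    }
    // key is "{entity_uuid}:{episode_uuid}"; value is the stored episode reference.
    handle_episode(&key, &value);
}
```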

Lessons Learned

- Tune per column family. Default settings are mediocre for everything. Profile your actual access patterns.
- Use multi_get for batch reads. A single multi_get call with 50 keys is 5-10x faster than 50 individual get calls. We learned this the hard way (issue #20: 48-second cold-start queries). See the sketch after this list.
- Compaction matters. Level compaction for frequently-read data. FIFO for append-only logs. Universal compaction for write-heavy workloads.
- Don't forget bloom filters. A 10-bit bloom filter reduces negative lookups from O(log N) to O(1). For a memory system where "not found" is a common result, this is transformative.
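A rough sketch of the multi_get lesson with the rust-rocksdb crate; memory_ids is a placeholder for whatever key set you're resolving.

```
// One call amortizes seeks and block-cache lookups across all keys.
let keys: Vec<&[u8]> = memory_ids.iter().map(|id| id.as_bytes()).collect();
let values = db.multi_get(keys); // Vec<Result<Option<Vec<u8>>, Error>>

// The slow path this replaced: one full trip through the read path per key.
// for id in &memory_ids { let _ = db.get(id.as_bytes())?; }
```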
