
Vector Databases Explained: How Semantic Search Powers AI Memory


Ctrl+F finds exact matches. Vector search finds meaning. Here is how.

When you search your codebase for connection_pool_timeout, you find every file that contains that exact string. But what about the file that discusses "how long to wait before dropping idle database connections"? That is the same concept, described in different words. Traditional search misses it. Vector search finds it.

Vector databases are the backbone of modern AI applications — RAG pipelines, semantic search, recommendation engines, and memory systems. This post explains what vectors are, how similarity search works, why indexing algorithms matter, and how shodh-memory uses Vamana and SPANN to power its semantic memory layer.

---

What Is a Vector? What Is an Embedding?

A vector is a list of numbers. An embedding is a vector produced by a neural network that captures the meaning of its input. Texts with similar meanings produce vectors that are close together in vector space.

```
┌─────────────────────────────────────────────────────────────────┐
│                 TEXT → EMBEDDING → VECTOR SPACE                 │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  "The database connection timed out"                            │
│                  │                                              │
│                  ▼                                              │
│   ┌──────────────────────┐                                      │
│   │   Embedding Model    │  (MiniLM-L6-v2, 384 dimensions)      │
│   │   (neural network)   │                                      │
│   └──────────┬───────────┘                                      │
│              │                                                  │
│              ▼                                                  │
│  [0.12, -0.45, 0.78, 0.03, ..., -0.22]   (384 numbers)          │
│                                                                 │
│  Vector Space (simplified to 2D):                               │
│                                                                 │
│    ▲ dim 2                                                      │
│    │                                                            │
│    │   ● "DB connection failed"                                 │
│    │    ● "connection timed out"                                │
│    │   ● "database timeout error"                               │
│    │                                                            │
│    │                     ● "deploy to staging"                  │
│    │                      ● "push to production"                │
│    │                                                            │
│    │                                   ● "the weather is nice"  │
│    │                                                            │
│    └──────────────────────────────────▶ dim 1                   │
│                                                                 │
│  Nearby points = semantically similar text                      │
└─────────────────────────────────────────────────────────────────┘
```

shodh-memory uses MiniLM-L6-v2 (via ONNX Runtime), which produces 384-dimensional embeddings — no GPU required, no API calls, everything runs locally.

How Vector Search Works

Given a query, vector search has three steps:

1. Embed the query — convert query text to a vector using the same model

2. Find nearest neighbors — search the database for vectors closest to the query vector

3. Return results — return the original texts associated with those vectors

"Closest" is measured by a distance metric:

```
┌────────────────────┬────────────────────────────┬────────────────────┐
│ Metric             │ Formula                    │ Best For           │
├────────────────────┼────────────────────────────┼────────────────────┤
│ Cosine Similarity  │ dot(A,B) / (|A| × |B|)     │ Text similarity    │
│ Dot Product        │ Σ(Aᵢ × Bᵢ)                 │ Normalized vectors │
│ L2 (Euclidean)     │ √Σ(Aᵢ - Bᵢ)²               │ Dense clustering   │
└────────────────────┴────────────────────────────┴────────────────────┘
```

Cosine similarity is the most common for text. It measures the angle between vectors, ignoring magnitude.
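All three metrics are one-liners in NumPy. A quick sketch; the vectors below are made up for illustration, not real embeddings:

```python
import numpy as np

a = np.array([0.12, -0.45, 0.78, 0.03])  # toy 4-dim "embeddings"
b = np.array([0.10, -0.40, 0.80, 0.00])  # (real ones have 384 dims)

cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))  # angle only
dot = np.dot(a, b)                                               # magnitude-sensitive
l2 = np.linalg.norm(a - b)                                       # straight-line distance

print(round(float(cosine), 3))  # 0.997: nearly identical direction
```

For unit-length vectors (sentence embeddings are typically normalized), cosine and dot product produce identical rankings, which is why libraries often use the cheaper dot product internally.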

Indexing Algorithms: From Brute Force to Vamana

The naive approach is to compare the query against every stored vector: O(n) work per query. At 1,000,000 vectors that is a million distance computations per search, far too slow for interactive use. This is where approximate nearest neighbor (ANN) indexing comes in.
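For intuition, brute-force search really is just one dot product per stored vector. A sketch over random data (any array library would do):

```python
import numpy as np

def brute_force_top_k(query: np.ndarray, corpus: np.ndarray, k: int = 3):
    """Exact nearest-neighbor search: O(n) dot products per query."""
    # Normalize rows so a dot product equals cosine similarity.
    q = query / np.linalg.norm(query)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    scores = c @ q                        # one score per stored vector
    top = np.argsort(scores)[::-1][:k]    # indices of the k best matches
    return top, scores[top]

rng = np.random.default_rng(0)
corpus = rng.normal(size=(10_000, 384))   # 10k fake 384-dim embeddings
query = corpus[42]                        # the query is a known row...
top, scores = brute_force_top_k(query, corpus)
print(top[0])                             # ...so index 42 must rank first
```

This is exact and fine at small scale; the indexing algorithms below exist to avoid touching all n vectors as the corpus grows.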

```
┌─────────────────────────────────────────────────────────────────┐
│             NEAREST NEIGHBOR SEARCH (2D simplified)             │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│    ▲                                                            │
│    │        ·                                                   │
│    │    ·       ·                                               │
│    │       ·        ·                                           │
│    │          · ★ query                                         │
│    │    ·     ◉ ← nearest        ·                              │
│    │        ◉ ← 2nd nearest                                     │
│    │   ·            ·                                           │
│    │      ·    ·        ·   ·                                   │
│    └──────────────────────────────────▶                         │
│                                                                 │
│  With index: only compare against candidates from graph/clusters│
│  Accuracy: ~95-99%        Speed: 10-1000× faster                │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```

IVF — Partition vectors into clusters. Only search nearest clusters.

HNSW — Multi-layer navigable graph. Very fast, very memory-hungry.

Vamana — Single-layer graph with aggressive pruning. HNSW-quality recall with less memory. Developed by Microsoft for DiskANN.

SPANN — Combines IVF with disk-based storage and product quantization. Designed for billion-scale.

```
┌─────────────────────────────────────────────────────────────────┐
│                VAMANA GRAPH STRUCTURE (simplified)              │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Each node = one vector. Edges connect nearby vectors.          │
│  Search: greedy walk from entry point toward query.             │
│                                                                 │
│              ┌─────┐                                            │
│        ┌───▶ │  A  │ ◀───┐                                      │
│        │     └──┬──┘     │                                      │
│        │        │        │                                      │
│    ┌───┴───┐    │    ┌───┴───┐                                  │
│    │   B   │◀───┘    │   C   │                                  │
│    └───┬───┘         └───┬───┘                                  │
│        │                 │                                      │
│    ┌───┴───┐         ┌───┴───┐                                  │
│    │   D   │         │   E   │                                  │
│    └───────┘         └───────┘                                  │
│                                                                 │
│  Max out-degree R (e.g., 64) keeps memory low                   │
│  RobustPrune removes redundant edges                            │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```
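The greedy walk described in the diagram can be sketched on a toy version of that five-node graph. The 2D coordinates are invented for illustration, and real Vamana keeps a beam of candidates rather than a single current node, but the navigation idea is the same:

```python
import numpy as np

# Adjacency list mirroring the diagram: A links to B and C, etc.
graph = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "E"],
         "D": ["B"], "E": ["C"]}

# Invented 2D positions standing in for each node's embedding.
vectors = {"A": np.array([0.0, 0.0]), "B": np.array([-1.0, -1.0]),
           "C": np.array([1.0, -1.0]), "D": np.array([-1.5, -2.0]),
           "E": np.array([1.5, -2.0])}

def greedy_search(query: np.ndarray, entry: str = "A") -> str:
    """Hop to whichever neighbor is closest to the query; stop at a local minimum."""
    current = entry
    while True:
        candidates = graph[current] + [current]
        best = min(candidates, key=lambda n: np.linalg.norm(vectors[n] - query))
        if best == current:
            return current
        current = best

print(greedy_search(np.array([1.4, -1.9])))  # walks A -> C -> E
```

Each hop only examines the current node's neighbors (at most R of them), which is how the search visits a tiny fraction of the graph instead of all n vectors.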

shodh-memory's Approach: Adaptive Vamana + SPANN

shodh-memory automatically switches based on dataset size:

```
┌─────────────────────────────────────────────────────────────────┐
│             shodh-memory ADAPTIVE INDEX SELECTION               │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   Memory count    Index           Storage       Query latency   │
│   ────────────    ────────────    ──────────    ─────────────   │
│   1 - 1K          Brute force     In-memory     < 0.1ms         │
│   1K - 100K       Vamana graph    In-memory     < 1ms           │
│   100K - 10M+     SPANN + PQ      Disk + RAM    < 5ms           │
│                                                                 │
│   PQ compression: 384 × f32 (1,536 bytes) → 48 bytes (~32×)     │
│   Recall@10: > 95% across all tiers                             │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```
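The compression figure in the table is simple arithmetic. The 48-subvector layout below is an assumed PQ configuration (48 chunks of 8 dimensions, each with a 256-entry codebook); shodh-memory's exact parameters may differ:

```python
dims = 384
raw_bytes = dims * 4          # f32 = 4 bytes per dim -> 1,536 bytes per vector
subvectors = 48               # assumed layout: 48 chunks of 8 dims each
code_bytes = subvectors * 1   # 1 byte per chunk indexes a 256-centroid codebook
ratio = raw_bytes / code_bytes

print(raw_bytes, code_bytes, ratio)  # 1536 48 32.0
```

At that size, codes for 10M vectors fit in under half a gigabyte of RAM, while the full-precision vectors stay on disk for reranking.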

Vector Database Comparison

```
┌────────────────────┬───────────┬───────────┬───────────┬───────────┬──────────────┐
│ Feature            │ Pinecone  │ Milvus    │ Weaviate  │ Qdrant    │ shodh-memory │
├────────────────────┼───────────┼───────────┼───────────┼───────────┼──────────────┤
│ Deployment         │ Cloud     │ Self-host │ Self-host │ Self-host │ Embedded     │
│ Index types        │ Propriet. │ IVF,HNSW  │ HNSW      │ HNSW      │ Vamana,SPANN │
│ Embedding included │ No        │ No        │ Yes       │ No        │ Yes (MiniLM) │
│ Knowledge graph    │ No        │ No        │ No        │ No        │ Yes (Hebbian)│
│ Memory decay       │ No        │ No        │ No        │ No        │ Yes (Wixted) │
│ Latency (p99)      │ ~20ms     │ ~10ms     │ ~15ms     │ ~5ms      │ < 1ms        │
│ Privacy            │ Cloud     │ Optional  │ Optional  │ Optional  │ 100% local   │
│ Pricing            │ Per-query │ Open src  │ Open src  │ Open src  │ Free (Apache)│
│ Purpose            │ General   │ General   │ General   │ General   │ AI memory    │
└────────────────────┴───────────┴───────────┴───────────┴───────────┴──────────────┘
```

Why Vector Search Alone Is Not Memory

Vector databases solve retrieval but miss three critical aspects:

1. Decay

A memory stored a year ago has the same retrieval probability as one stored today. shodh-memory applies hybrid decay (Wixted 2004) that deprioritizes stale memories without deleting them.
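Wixted's analysis favors power-law forgetting; a hybrid weighting can blend a fast exponential component with that long tail. The function below is an illustrative sketch with made-up parameters, not shodh-memory's actual curve:

```python
import math

def decay_weight(age_days: float, tau: float = 7.0,
                 beta: float = 0.5, mix: float = 0.5) -> float:
    """Retrieval weight in [0, 1]: exponential short-term forgetting
    blended with a power-law long tail. All parameters are illustrative."""
    exponential = math.exp(-age_days / tau)    # fades fast over ~a week
    power_law = (1.0 + age_days) ** -beta      # fades slowly, never reaches 0
    return mix * exponential + (1.0 - mix) * power_law

for age in (0, 30, 365):
    print(age, round(decay_weight(age), 3))
```

The key property is the long tail: a year-old memory scores far below a fresh one, but its weight stays positive, so it is deprioritized rather than deleted.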

2. Strengthening

When you retrieve a memory, it should become easier to retrieve next time. Vector databases treat retrieval as read-only. shodh-memory updates access counts, timestamps, and graph edge weights on every recall.
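In code, "retrieval is a write" looks something like the sketch below. The types and fields are hypothetical, illustrating the bookkeeping rather than shodh-memory's internals:

```python
import time
from dataclasses import dataclass, field

@dataclass
class Memory:
    text: str
    access_count: int = 0  # how often this memory has been recalled
    last_accessed: float = field(default_factory=time.time)

def recall(memory: Memory) -> str:
    """Each retrieval strengthens the memory: the updated stats feed the
    decay and ranking layers, so frequently used memories surface sooner."""
    memory.access_count += 1
    memory.last_accessed = time.time()
    return memory.text

m = Memory("increase the connection pool to 50")
recall(m)
recall(m)
print(m.access_count)  # 2
```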

3. Relationships

Vector similarity captures semantic relatedness but not causal, temporal, or structural relationships. Only a knowledge graph can capture "the server crashed" → "we increased the connection pool."

The Full Stack: Vectors + Graph + Decay

```
┌─────────────────────────────────────────────────────────────────┐
│                   THE COGNITIVE MEMORY STACK                    │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │ Layer 5: RANKED RESULTS                                   │  │
│  │ Final ranked memories returned to the agent               │  │
│  └─────────────────────────────┬─────────────────────────────┘  │
│                                │                                │
│  ┌─────────────────────────────┴─────────────────────────────┐  │
│  │ Layer 4: FUSION (Reciprocal Rank Fusion)                  │  │
│  │ Merge vector results + graph results + fact boosts        │  │
│  └──────────────┬─────────────────────────────┬──────────────┘  │
│                 │                             │                 │
│    ┌────────────┴────────────┐   ┌────────────┴────────────┐    │
│    │ Layer 3: GRAPH          │   │ Layer 3: DECAY          │    │
│    │ Spreading activation    │   │ Time-based weighting    │    │
│    └────────────┬────────────┘   └────────────┬────────────┘    │
│                 │                             │                 │
│  ┌──────────────┴─────────────────────────────┴──────────────┐  │
│  │ Layer 2: VECTOR SEARCH                                    │  │
│  │ Vamana / SPANN nearest neighbor search                    │  │
│  └─────────────────────────────┬─────────────────────────────┘  │
│                                │                                │
│  ┌─────────────────────────────┴─────────────────────────────┐  │
│  │ Layer 1: EMBEDDING                                        │  │
│  │ MiniLM-L6-v2 (ONNX, local, no API calls)                  │  │
│  └───────────────────────────────────────────────────────────┘  │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```

Vector search is Layer 2 of a 5-layer retrieval pipeline. Each layer adds intelligence that a standalone vector database cannot provide.
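Layer 4 names Reciprocal Rank Fusion, whose standard formula scores each item as the sum of 1/(k + rank) over every list that contains it. A minimal sketch (k = 60 is the conventional constant; shodh-memory's exact merge may differ):

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists; items ranked well by multiple
    retrievers accumulate the highest fused score."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["pool timeout fix", "deploy checklist", "retry policy"]
graph_hits = ["retry policy", "pool timeout fix"]
print(rrf([vector_hits, graph_hits])[0])  # an item both lists rank highly wins
```

RRF needs only ranks, not raw scores, which is why it can merge cosine similarities, graph activation levels, and fact boosts without normalizing them onto a common scale.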

Getting Started

shodh-memory includes its own vector database — no external service required:

```bash
# Docker
docker run -d -p 3030:3030 -v shodh-data:/data ghcr.io/varun29ankus/shodh-memory:latest

# npm (MCP server)
npm install -g @shodh/memory-mcp

# Rust crate
cargo install shodh-memory

# Python
pip install shodh-memory
```

---

Vector databases answer "what is similar?" That is necessary but not sufficient for memory. Memory requires decay, strengthening, and relationships. shodh-memory gives you all three — vectors, graph, and cognitive decay — in a single embedded system.
