Vector databases are specialised systems for storing and querying high-dimensional vectors (embeddings). They enable semantic search by finding vectors similar to a query vector.
Core Concepts
Vector Search
Given a query vector, find the K most similar vectors in the database.
Similarity metrics:
- Cosine similarity — Angle between vectors (most common for text)
- Euclidean distance — Straight-line distance
- Dot product — Magnitude-aware similarity
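The three metrics can be compared on a toy example. This is a minimal sketch in plain Python; the function names and sample vectors are illustrative, and production systems use optimised vector kernels.

```python
import math

def dot(a, b):
    """Dot product: grows with both alignment and magnitude."""
    return sum(x * y for x, y in zip(a, b))

def euclidean_distance(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between a and b; ignores magnitude."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

a = [1.0, 0.0]
b = [2.0, 0.0]  # same direction as a, twice the length
c = [0.0, 1.0]  # orthogonal to a

print(cosine_similarity(a, b))   # 1.0: same direction
print(euclidean_distance(a, b))  # 1.0: the lengths differ
print(dot(a, b))                 # 2.0: scales with magnitude
print(cosine_similarity(a, c))   # 0.0: orthogonal
```

For vectors normalised to unit length, ranking by cosine similarity and ranking by dot product give the same order.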
Exact vs Approximate Search
Exact (brute force):
- Compare query to every vector
- O(n) distance computations per query
- Perfect accuracy
- Only viable for small datasets
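Brute-force search is short enough to write out in full. The sketch below assumes Euclidean distance and an in-memory list of vectors; `exact_top_k` is an illustrative name, not a library function.

```python
import heapq
import math

def exact_top_k(query, vectors, k):
    """Compare the query to every vector and keep the k closest.

    Returns (distance, index) pairs, nearest first.
    """
    scored = ((math.dist(query, vec), i) for i, vec in enumerate(vectors))
    return heapq.nsmallest(k, scored)

vectors = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.9, 1.2]]
result = exact_top_k([1.0, 1.0], vectors, k=2)
print([index for _, index in result])  # [1, 3]
```

The same function is useful later as the ground truth when measuring the recall of an approximate index.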
Approximate Nearest Neighbour (ANN):
- Trade accuracy for speed
- Sublinear query time in practice (roughly O(log n) for graph indexes such as HNSW)
- 95-99% recall is usually acceptable
- Essential for production scale
ANN Algorithms
HNSW (Hierarchical Navigable Small World)
Graph-based algorithm. Builds a multi-layer graph of connections.
- Fast queries
- Good recall
- Higher memory usage
- Most popular choice
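The graph walk that HNSW relies on can be illustrated with a greedy search over a single layer. This is a simplified sketch, not HNSW itself: the real algorithm builds several layers and tracks a list of candidates rather than one current node. The toy graph and the name `greedy_search` are assumptions made for illustration.

```python
import math

def greedy_search(query, vectors, neighbours, entry):
    """Walk the graph, always moving to the neighbour closest to the query.

    Stops when no neighbour is closer than the current node.
    """
    current = entry
    while True:
        best = min(
            neighbours[current],
            key=lambda n: math.dist(query, vectors[n]),
            default=current,
        )
        if math.dist(query, vectors[best]) >= math.dist(query, vectors[current]):
            return current
        current = best

# Five points on a line, each linked to its immediate neighbours.
vectors = [[0.0], [1.0], [2.0], [3.0], [4.0]]
neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

found = greedy_search([3.2], vectors, neighbours, entry=0)
print(found)  # 3
```

A greedy walk can stop at a local minimum, which is one reason graph indexes return approximate rather than exact results.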
IVF (Inverted File Index)
Clusters vectors, searches only relevant clusters.
- Lower memory than HNSW
- Tuneable speed/recall trade-off
- Works well with quantisation
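A minimal IVF sketch, assuming the centroids are already known (in practice they are usually learned with k-means). The parameter name `nprobe`, the number of clusters searched per query, follows a common convention; everything else here is illustrative.

```python
import math
from collections import defaultdict

def nearest(point, candidates):
    """Index of the candidate closest to point."""
    return min(range(len(candidates)), key=lambda i: math.dist(point, candidates[i]))

def build_ivf(vectors, centroids):
    """Assign every vector id to the inverted list of its nearest centroid."""
    lists = defaultdict(list)
    for vec_id, vec in enumerate(vectors):
        lists[nearest(vec, centroids)].append(vec_id)
    return lists

def ivf_search(query, vectors, centroids, lists, k, nprobe):
    """Search only the nprobe clusters whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda c: math.dist(query, centroids[c]))
    candidates = [vec_id for c in order[:nprobe] for vec_id in lists.get(c, [])]
    candidates.sort(key=lambda vec_id: math.dist(query, vectors[vec_id]))
    return candidates[:k]

centroids = [[0.0, 0.0], [10.0, 10.0]]
vectors = [[0.5, 0.2], [9.0, 9.5], [1.0, 1.0], [4.9, 4.9], [10.0, 11.0]]
lists = build_ivf(vectors, centroids)

query = [5.2, 5.2]
print(ivf_search(query, vectors, centroids, lists, k=1, nprobe=1))  # [1]
print(ivf_search(query, vectors, centroids, lists, k=1, nprobe=2))  # [3]
```

With `nprobe=1` the true nearest neighbour (id 3) is missed because it sits in the other cluster, close to the boundary. Raising `nprobe` recovers it at the cost of scanning more vectors, which is the speed/recall trade-off in miniature.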
Product Quantisation (PQ)
Compresses vectors by encoding subvectors to centroids.
- Dramatic memory reduction
- Some accuracy loss
- Often combined with IVF (IVF-PQ)
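The encoding step can be sketched with hand-written codebooks. Real systems learn one codebook per subspace (typically with k-means) and use far more centroids; the codebooks, vector, and function names below are assumptions chosen to keep the example small.

```python
import math

def pq_encode(vec, codebooks):
    """Split vec into equal subvectors and store the nearest centroid id for each."""
    sub_dim = len(vec) // len(codebooks)
    codes = []
    for m, codebook in enumerate(codebooks):
        sub = vec[m * sub_dim:(m + 1) * sub_dim]
        codes.append(min(range(len(codebook)), key=lambda j: math.dist(sub, codebook[j])))
    return codes

def pq_decode(codes, codebooks):
    """Rebuild an approximate vector by concatenating the chosen centroids."""
    approx = []
    for code, codebook in zip(codes, codebooks):
        approx.extend(codebook[code])
    return approx

codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],  # centroids for dimensions 0-1
    [[0.0, 1.0], [1.0, 0.0]],  # centroids for dimensions 2-3
]
vec = [0.9, 1.1, 0.1, 0.8]

codes = pq_encode(vec, codebooks)
approx = pq_decode(codes, codebooks)
print(codes)   # [1, 0]
print(approx)  # [1.0, 1.0, 0.0, 1.0]
```

The four floats are stored as two small integers, and the gap between `vec` and `approx` is the accuracy loss mentioned above.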
Flat
No indexing, brute force search.
- Perfect recall
- Use for small datasets or ground truth
Vector Databases
Purpose-built
| Database | Highlights |
|---|---|
| Pinecone | Fully managed, serverless option, easy to use |
| Weaviate | Open-source, GraphQL API, hybrid search |
| Qdrant | Open-source, Rust-based, filtering |
| Milvus | Open-source, highly scalable |
| Chroma | Lightweight, embedded, great for prototyping |
| LanceDB | Embedded, columnar, serverless |
Extensions to Existing Databases
| Database | Extension |
|---|---|
| PostgreSQL | pgvector, pgvecto.rs |
| Redis | Redis Stack |
| Elasticsearch | Dense vector field |
| MongoDB | Atlas Vector Search |
| SQLite | sqlite-vss |
When to Use What
Pinecone/Weaviate/Qdrant — Production workloads, need scale
Chroma/LanceDB — Prototyping, embedded use cases
pgvector — Already using PostgreSQL, moderate scale
Redis — Need caching alongside vectors
Key Features
Filtering (Metadata)
Query with both vector similarity AND attribute filters:
Find vectors similar to query
WHERE category = "electronics"
AND price < 100
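Pre-filtering applies the attribute filter before the vector search; post-filtering applies it to the search results. The choice affects both performance and recall.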
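A toy comparison of the two filtering orders, using brute-force search over an in-memory list. The data and helper names are illustrative only; real databases integrate the filter with the index in various ways.

```python
import math

items = [
    {"id": "a", "vector": [0.0, 0.0], "category": "books"},
    {"id": "b", "vector": [0.1, 0.0], "category": "books"},
    {"id": "c", "vector": [0.9, 0.9], "category": "electronics"},
    {"id": "d", "vector": [1.0, 1.0], "category": "electronics"},
]

def top_k(query, candidates, k):
    """Brute-force k nearest items by Euclidean distance."""
    return sorted(candidates, key=lambda item: math.dist(query, item["vector"]))[:k]

def pre_filter(query, k, predicate):
    """Filter first, then search only the matching items."""
    return top_k(query, [item for item in items if predicate(item)], k)

def post_filter(query, k, predicate):
    """Search everything, then drop results that fail the filter."""
    return [item for item in top_k(query, items, k) if predicate(item)]

def is_electronics(item):
    return item["category"] == "electronics"

query = [0.0, 0.0]
pre = [item["id"] for item in pre_filter(query, 2, is_electronics)]
post = [item["id"] for item in post_filter(query, 2, is_electronics)]
print(pre)   # ['c', 'd']
print(post)  # []
```

Post-filtering returns nothing here because both of the two nearest items fail the filter. Pre-filtering always fills the result set when enough items match, but it has to evaluate the filter before the index can narrow the search.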
Hybrid Search
Combine vector search with keyword search (BM25). Usually merged with Reciprocal Rank Fusion (RRF).
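RRF needs only the rank positions from each result list, not the raw scores. A minimal sketch, assuming the conventional smoothing constant k = 60 and illustrative document ids:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists: each document scores the sum of 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["d1", "d2", "d3"]   # from vector search, best first
keyword_hits = ["d3", "d1", "d4"]  # from BM25, best first

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)  # ['d1', 'd3', 'd2', 'd4']
```

Documents that appear in both lists rise to the top, and because only ranks are used there is no need to normalise BM25 scores against vector similarities.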
Multi-tenancy
Isolate data between users/organisations:
- Namespace-based isolation
- Metadata filtering
- Separate collections
Sparse Vectors
Some databases support sparse vectors for keyword matching alongside dense vectors.
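A sparse vector stores only its non-zero entries, so scoring touches only the terms two vectors share. In this sketch a sparse vector is a dict from term to weight; that representation and the weights are assumptions for illustration (real systems typically use integer term ids).

```python
def sparse_dot(a, b):
    """Dot product of two sparse vectors stored as {term: weight} dicts."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(weight * b[term] for term, weight in a.items() if term in b)

query = {"vector": 1.0, "database": 0.5}
doc = {"database": 2.0, "index": 1.0, "vector": 1.0}

score = sparse_dot(query, doc)
print(score)  # 2.0
```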
Architecture Considerations
Embedding Dimensions
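Storage and the cost of each distance computation grow linearly with dimension, so higher-dimensional embeddings take more space and are slower to query.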
- 384d: Lightweight, good for simple use cases
- 768-1024d: Common middle ground
- 1536-3072d: High quality, more resources
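A back-of-the-envelope storage estimate, assuming 32-bit floats (4 bytes per value) and ignoring index overhead and metadata:

```python
def raw_vector_bytes(num_vectors, dims, bytes_per_value=4):
    """Bytes needed to store the raw vectors alone."""
    return num_vectors * dims * bytes_per_value

for dims in (384, 768, 1536, 3072):
    gib = raw_vector_bytes(1_000_000, dims) / 2**30
    print(f"{dims}d: {gib:.2f} GiB per million vectors")
```

Doubling the dimension doubles the raw storage, before any index structures are added on top.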
Index Build Time
HNSW index building is expensive. Plan for:
- Initial bulk load time
- Incremental updates
- Index rebuild strategies
Memory vs Disk
- HNSW typically memory-resident
- IVF can use disk with memory-mapped files
- Quantisation reduces memory requirements
Sharding
For large-scale deployments:
- Partition by ID range or hash
- Partition by metadata
- Managed services handle this automatically
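Hash partitioning can be sketched in a few lines. The function name and ID format are hypothetical. A stable hash from `hashlib` is used because Python's built-in `hash()` is randomised per process for strings, so it would route the same ID to different shards across restarts.

```python
import hashlib

def shard_for(vector_id, num_shards):
    """Map a vector ID to a shard using a hash that is stable across processes."""
    digest = hashlib.sha256(vector_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

ids = [f"doc-{i}" for i in range(8)]
placement = {vector_id: shard_for(vector_id, num_shards=4) for vector_id in ids}
print(placement)
```

Changing `num_shards` remaps most IDs under this simple modulo scheme, so resharding needs to be planned for.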
Operations
Indexing Pipeline
Documents → Chunking → Embedding → Vector DB
↓
Metadata extraction
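The pipeline above can be sketched end to end. Everything here is a stand-in: `chunk_text` uses fixed-size character windows, `toy_embed` is a placeholder for a real embedding model, and the record layout is an assumption rather than any database's schema.

```python
def chunk_text(text, size, overlap):
    """Split text into windows of `size` characters that overlap by `overlap`."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    if not text:
        return []
    step = size - overlap
    # Stop once a window would contain nothing beyond the previous overlap.
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def toy_embed(text):
    """Placeholder embedding: NOT a real model, just two simple features."""
    return [float(len(text)), float(text.count(" "))]

def build_records(doc_id, text, metadata, embed, size=200, overlap=20):
    """Chunk a document, embed each chunk, and attach metadata."""
    return [
        {
            "id": f"{doc_id}-{n}",
            "text": chunk,
            "embedding": embed(chunk),
            "metadata": dict(metadata, chunk=n),
        }
        for n, chunk in enumerate(chunk_text(text, size, overlap))
    ]

records = build_records("doc1", "abcdefghij", {"source": "example"}, toy_embed,
                        size=4, overlap=1)
print([r["text"] for r in records])  # ['abcd', 'defg', 'ghij']
```

The resulting records are what would be written to the vector database in the final step.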
Update Strategies
- Full reindex — Simple but slow
- Incremental — Add/delete individual vectors
- Batch upsert — Efficient bulk updates
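Batch upserts amount to grouping records and writing each group in one call. In this sketch the store is a plain dict standing in for a database client; `batched` and `upsert_batch` are illustrative helpers, not part of any SDK.

```python
from itertools import islice

def batched(items, batch_size):
    """Yield successive lists of at most batch_size items."""
    iterator = iter(items)
    while batch := list(islice(iterator, batch_size)):
        yield batch

def upsert_batch(store, batch):
    """Insert new records or overwrite existing ones by id (in-memory stand-in)."""
    for record in batch:
        store[record["id"]] = record

store = {}
records = [{"id": f"v{i}", "embedding": [float(i)]} for i in range(5)]

for batch in batched(records, batch_size=2):
    upsert_batch(store, batch)  # a real client would send one request per batch

print(len(store))  # 5
```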
Backup & Recovery
- Point-in-time snapshots
- Continuous replication (managed services)
- Export/import functionality
Evaluation
Recall@K
What fraction of true nearest neighbours are returned in top K results?
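A minimal sketch of the calculation, assuming the ground truth comes from an exact (flat) search over the same data and both lists are ordered best first:

```python
def recall_at_k(retrieved, ground_truth, k):
    """Fraction of the true top-k neighbours that appear in the retrieved top-k."""
    truth = set(ground_truth[:k])
    if not truth:
        return 0.0
    hits = len(truth.intersection(retrieved[:k]))
    return hits / len(truth)

ann_result = ["a", "b", "x"]    # ids returned by the approximate index
exact_result = ["a", "b", "c"]  # ids returned by brute-force search

print(recall_at_k(ann_result, exact_result, k=3))  # 2 of 3 true neighbours found
```

In practice the value is averaged over a set of representative queries.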
Query Latency
p50, p95, p99 latencies under load.
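Percentiles can be computed from recorded per-query latencies with the standard library. The sample values below are synthetic, and the choice of the "inclusive" interpolation method is an assumption; other percentile definitions give slightly different numbers.

```python
import statistics

def latency_percentiles(samples_ms):
    """Return p50, p95 and p99 from a list of latency samples in milliseconds."""
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}

samples_ms = list(range(1, 102))  # synthetic latencies: 1 ms to 101 ms
summary = latency_percentiles(samples_ms)
print(summary)
```

Tail percentiles matter because a small share of slow queries can dominate user-facing latency even when the median looks healthy.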
Throughput
Queries per second at acceptable latency.
Build Time
Time to index N vectors.
Memory Usage
RAM required for index + vectors.
Benchmarks
- ANN Benchmarks — Algorithm comparison
- Vector DB Benchmark — Database comparison
Integration
Frameworks
- RAG implementations (LangChain, LlamaIndex)
- Direct SDK usage
Example (pgvector)
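-- Enable the pgvector extension (once per database)
CREATE EXTENSION IF NOT EXISTS vector;

-- One row per document, with a 1536-dimensional embedding
CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    content TEXT,
    embedding vector(1536)
);

-- HNSW index using cosine distance
CREATE INDEX ON documents
    USING hnsw (embedding vector_cosine_ops);

-- <=> is the cosine distance operator; replace the placeholder with a vector literal
SELECT content
FROM documents
ORDER BY embedding <=> '[query_vector]'
LIMIT 10;
Example (Chroma)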
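import chromadb

# In-memory client; data is not persisted between runs
client = chromadb.Client()
collection = client.create_collection("docs")

# No embeddings are supplied, so Chroma's default embedding function is used
collection.add(
    documents=["Document 1", "Document 2"],
    ids=["id1", "id2"],
)

# Only two documents are stored, so ask for at most two results
results = collection.query(
    query_texts=["search query"],
    n_results=2,
)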