Vector databases are specialised systems for storing and querying high-dimensional vectors (embeddings). They enable semantic search by finding vectors similar to a query vector.

Core Concepts

The fundamental operation is K-nearest-neighbour search: given a query vector, find the K most similar vectors in the database.

Similarity metrics:

  • Cosine similarity — Angle between vectors (most common for text)
  • Euclidean distance — Straight-line distance
  • Dot product — Magnitude-aware similarity
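Written out, the three metrics are only a few lines each (a minimal NumPy sketch; production systems compute these in batch):

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine of the angle between a and b; 1.0 means identical direction.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def euclidean_distance(a, b):
    # Straight-line (L2) distance; 0.0 means identical vectors.
    return float(np.linalg.norm(a - b))

def dot_product(a, b):
    # Unnormalised similarity; sensitive to vector magnitude.
    return float(np.dot(a, b))

a = np.array([1.0, 0.0])
b = np.array([1.0, 1.0])
print(cosine_similarity(a, b))  # ≈ 0.707
```

On unit-normalised vectors, cosine similarity and dot product coincide, which is why many indexes normalise at ingest time.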

Exact (brute force):

  • Compare query to every vector
  • O(n) complexity
  • Perfect accuracy
  • Only viable for small datasets
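The points above amount to scoring every stored vector and keeping the best K; a brute-force sketch in NumPy:

```python
import numpy as np

def exact_top_k(query, vectors, k):
    """Brute-force K-nearest-neighbour search by cosine similarity.

    O(n): every stored vector is scored against the query.
    Returns indices of the k most similar vectors, best first.
    """
    # Normalise rows so a plain dot product equals cosine similarity.
    vectors = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    query = query / np.linalg.norm(query)
    scores = vectors @ query
    # argpartition selects the top k in O(n); sorting only those k is cheap.
    top = np.argpartition(-scores, k)[:k]
    return top[np.argsort(-scores[top])]

rng = np.random.default_rng(0)
db = rng.normal(size=(1000, 64))
query = db[42] + 0.01 * rng.normal(size=64)
print(exact_top_k(query, db, k=3))  # db[42] itself should rank first
```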

Approximate Nearest Neighbour (ANN):

  • Trade accuracy for speed
  • O(log n) or O(1) typical
  • 95-99% recall is usually acceptable
  • Essential for production scale

ANN Algorithms

HNSW (Hierarchical Navigable Small World)

Graph-based algorithm. Builds a multi-layer graph of connections; queries descend from sparse upper layers to the dense bottom layer.

  • Fast queries
  • Good recall
  • Higher memory usage
  • Most popular choice

IVF (Inverted File Index)

Clusters vectors, searches only relevant clusters.

  • Lower memory than HNSW
  • Tuneable speed/recall trade-off
  • Works well with quantisation
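A toy IVF index can be sketched in a few dozen lines (illustrative only; real implementations such as FAISS's IVF indexes are far more optimised):

```python
import numpy as np

class TinyIVF:
    """Minimal inverted-file index sketch (illustrative, not production).

    Vectors are clustered; a query scans only the nprobe closest clusters,
    trading a little recall for a large reduction in comparisons.
    """

    def __init__(self, vectors, n_clusters=16, iters=10, seed=0):
        rng = np.random.default_rng(seed)
        self.vectors = vectors
        # Plain k-means to learn the coarse centroids.
        self.centroids = vectors[rng.choice(len(vectors), n_clusters, replace=False)]
        for _ in range(iters):
            assign = self._nearest_centroid(vectors)
            for c in range(n_clusters):
                members = vectors[assign == c]
                if len(members):
                    self.centroids[c] = members.mean(axis=0)
        assign = self._nearest_centroid(vectors)
        # Inverted lists: cluster id -> indices of member vectors.
        self.lists = {c: np.where(assign == c)[0] for c in range(n_clusters)}

    def _nearest_centroid(self, x):
        d = np.linalg.norm(x[:, None, :] - self.centroids[None, :, :], axis=2)
        return d.argmin(axis=1)

    def search(self, query, k=5, nprobe=4):
        # Probe only the nprobe clusters whose centroids are closest.
        d = np.linalg.norm(self.centroids - query, axis=1)
        candidates = np.concatenate([self.lists[c] for c in d.argsort()[:nprobe]])
        dists = np.linalg.norm(self.vectors[candidates] - query, axis=1)
        return candidates[dists.argsort()[:k]]

rng = np.random.default_rng(3)
db = rng.normal(size=(200, 16))
index = TinyIVF(db, n_clusters=8)
print(index.search(db[7], k=3))  # db[7] itself ranks first
```

Raising nprobe is the speed/recall dial: more clusters probed means more comparisons but fewer missed neighbours.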

Product Quantisation (PQ)

Compresses vectors by splitting them into subvectors and encoding each subvector as the ID of its nearest centroid.

  • Dramatic memory reduction
  • Some accuracy loss
  • Often combined with IVF (IVF-PQ)
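A sketch of the encode/decode round trip (codebooks are normally learned with k-means per subspace; here they are assumed given):

```python
import numpy as np

def pq_encode(vectors, codebooks):
    """Product quantisation sketch: split each vector into M subvectors and
    replace each subvector with the id of its nearest centroid.

    codebooks has shape (M, n_centroids, sub_dim); output is (n, M) uint8,
    so storage drops from d floats to M single-byte codes per vector.
    """
    m, _, sub_dim = codebooks.shape
    parts = vectors.reshape(len(vectors), m, sub_dim)
    codes = np.empty((len(vectors), m), dtype=np.uint8)
    for i in range(m):
        d = np.linalg.norm(parts[:, i, None, :] - codebooks[i][None, :, :], axis=2)
        codes[:, i] = d.argmin(axis=1)
    return codes

def pq_decode(codes, codebooks):
    # Approximate reconstruction: concatenate the chosen centroids.
    return np.concatenate(
        [codebooks[i][codes[:, i]] for i in range(codes.shape[1])], axis=1
    )

rng = np.random.default_rng(1)
codebooks = rng.normal(size=(4, 16, 2))   # M=4 subspaces, 16 centroids each
vectors = rng.normal(size=(10, 8))
codes = pq_encode(vectors, codebooks)
print(codes.shape, codes.dtype)           # (10, 4) uint8
```

Here 8 floats (32 bytes) per vector compress to 4 bytes; the accuracy loss is the distance between each subvector and its chosen centroid.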

Flat

No indexing, brute force search.

  • Perfect recall
  • Use for small datasets or ground truth

Vector Databases

Purpose-built

  Database   Highlights
  Pinecone   Fully managed, serverless option, easy to use
  Weaviate   Open-source, GraphQL API, hybrid search
  Qdrant     Open-source, Rust-based, filtering
  Milvus     Open-source, highly scalable
  Chroma     Lightweight, embedded, great for prototyping
  LanceDB    Embedded, columnar, serverless

Extensions to Existing Databases

  Database       Extension
  PostgreSQL     pgvector, pgvecto.rs
  Redis          Redis Stack
  Elasticsearch  dense_vector field type
  MongoDB        Atlas Vector Search
  SQLite         sqlite-vss

When to Use What

  • Pinecone/Weaviate/Qdrant — Production workloads, need scale
  • Chroma/LanceDB — Prototyping, embedded use cases
  • pgvector — Already using PostgreSQL, moderate scale
  • Redis — Need caching alongside vectors

Key Features

Filtering (Metadata)

Query with both vector similarity AND attribute filters:

Find vectors similar to query
WHERE category = "electronics"
AND price < 100

Pre-filtering (restricting candidates before the vector search) vs post-filtering (dropping non-matching results afterwards) affects both performance and recall.
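The two strategies can be sketched as follows (a brute-force illustration; `pred` is any metadata predicate):

```python
import numpy as np

def post_filter_search(query, vectors, meta, pred, k, overfetch=4):
    # Post-filtering: rank first, then drop non-matching results.
    # Over-fetch (k * overfetch) because filtering discards hits; a very
    # selective filter can still leave fewer than k results.
    order = np.argsort(np.linalg.norm(vectors - query, axis=1))[: k * overfetch]
    return [int(i) for i in order if pred(meta[i])][:k]

def pre_filter_search(query, vectors, meta, pred, k):
    # Pre-filtering: restrict the candidate set first, then rank only those.
    # Always yields k matches when they exist, at the cost of scanning the filter.
    idx = np.array([i for i in range(len(vectors)) if pred(meta[i])], dtype=int)
    order = np.argsort(np.linalg.norm(vectors[idx] - query, axis=1))
    return [int(i) for i in idx[order][:k]]

vectors = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.2, 0.0]])
meta = [{"price": 10}, {"price": 200}, {"price": 50}, {"price": 300}]
cheap = lambda m: m["price"] < 100
print(pre_filter_search([0.0, 0.0], vectors, meta, cheap, k=2))  # → [0, 2]
```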

Hybrid Search

Combine vector search with keyword search (BM25). Results are usually merged with Reciprocal Rank Fusion (RRF).
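RRF scores each item by summing 1/(k + rank) across the result lists; a minimal sketch (k = 60 is the conventional constant):

```python
def rrf_merge(rankings, k=60):
    """Merge ranked result lists with Reciprocal Rank Fusion.

    Each list contributes 1 / (k + rank) per item (rank starting at 1);
    items ranked highly in several lists float to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Dense (vector) results and sparse (BM25) results, best first:
merged = rrf_merge([["a", "b", "c"], ["b", "d", "a"]])
print(merged)  # → ['b', 'a', 'd', 'c']
```

Because RRF uses only ranks, it needs no score normalisation between the vector and keyword retrievers.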

Multi-tenancy

Isolate data between users/organisations:

  • Namespace-based isolation
  • Metadata filtering
  • Separate collections

Sparse Vectors

Some databases support sparse vectors for keyword matching alongside dense vectors.

Architecture Considerations

Embedding Dimensions

Higher dimensions = more storage and slower queries.

  • 384d: Lightweight, good for simple use cases
  • 768-1024d: Common middle ground
  • 1536-3072d: High quality, more resources

Index Build Time

HNSW index building is expensive. Plan for:

  • Initial bulk load time
  • Incremental updates
  • Index rebuild strategies

Memory vs Disk

  • HNSW typically memory-resident
  • IVF can use disk with memory-mapped files
  • Quantisation reduces memory requirements

Sharding

For large-scale deployments:

  • Partition by ID range or hash
  • Partition by metadata
  • Managed services handle this automatically
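Hash partitioning can be sketched as below; note that, unlike a range query, a similarity query must fan out to every shard and merge the results, since similarity gives no hint about which shard holds the neighbours:

```python
import hashlib

def shard_for(vector_id: str, num_shards: int) -> int:
    # Stable hash-based assignment of ids to shards: the same id always
    # lands on the same shard, independent of insertion order.
    digest = hashlib.md5(vector_id.encode()).hexdigest()
    return int(digest, 16) % num_shards

print(shard_for("doc-123", 8))  # deterministic shard in [0, 8)
```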

Operations

Indexing Pipeline

Documents → Chunking → Embedding → Vector DB
              ↓
         Metadata extraction

Update Strategies

  • Full reindex — Simple but slow
  • Incremental — Add/delete individual vectors
  • Batch upsert — Efficient bulk updates

Backup & Recovery

  • Point-in-time snapshots
  • Continuous replication (managed services)
  • Export/import functionality

Evaluation

Recall@K

What fraction of true nearest neighbours are returned in top K results?
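A minimal implementation, comparing against ground truth from an exact (flat) search:

```python
def recall_at_k(retrieved, ground_truth, k):
    """Fraction of the true top-k neighbours present in the returned top k."""
    return len(set(retrieved[:k]) & set(ground_truth[:k])) / k

print(recall_at_k([1, 5, 3, 9], [1, 2, 3, 4], k=4))  # → 0.5
```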

Query Latency

p50, p95, p99 latencies under load.

Throughput

Queries per second at acceptable latency.

Build Time

Time to index N vectors.

Memory Usage

RAM required for index + vectors.

Benchmarks

ANN-Benchmarks offers standardised recall/throughput comparisons of ANN libraries; results vary with dimensionality and data distribution, so benchmark on your own data before committing.

Integration

Frameworks

  • RAG implementations (LangChain, LlamaIndex)
  • Direct SDK usage

Example (pgvector)

-- Enable the pgvector extension
CREATE EXTENSION vector;
 
CREATE TABLE documents (
  id SERIAL PRIMARY KEY,
  content TEXT,
  embedding vector(1536)
);
 
-- HNSW index using cosine distance
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops);
 
-- <=> is pgvector's cosine distance operator (smaller = more similar)
SELECT content
FROM documents
ORDER BY embedding <=> '[query_vector]'
LIMIT 10;

Example (Chroma)

import chromadb
 
# In-memory client; use chromadb.PersistentClient(path=...) to persist to disk
client = chromadb.Client()
collection = client.create_collection("docs")
 
# Chroma embeds the documents with its default embedding function
collection.add(
    documents=["Document 1", "Document 2"],
    ids=["id1", "id2"]
)
 
results = collection.query(
    query_texts=["search query"],
    n_results=5
)

Resources