Vector databases are specialised systems for storing and querying high-dimensional vectors (embeddings). They enable semantic search by finding vectors similar to a query vector.

Core Concepts

The fundamental operation is K-nearest-neighbour search: given a query vector, find the K most similar vectors in the database.

Similarity metrics:

  • Cosine similarity — Angle between vectors (most common for text)
  • Euclidean distance — Straight-line distance
  • Dot product — Magnitude-aware similarity
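Written out, the three metrics are only a few lines each (a minimal NumPy sketch; production systems compute these in batch):

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine of the angle between a and b; 1.0 means identical direction.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def euclidean_distance(a, b):
    # Straight-line (L2) distance; 0.0 means identical vectors.
    return float(np.linalg.norm(a - b))

def dot_product(a, b):
    # Unnormalised similarity; sensitive to vector magnitude.
    return float(np.dot(a, b))

a = np.array([1.0, 0.0])
b = np.array([1.0, 1.0])
print(cosine_similarity(a, b))  # ≈ 0.707
```

On unit-normalised vectors, cosine similarity and dot product coincide, which is why many indexes normalise at ingest time.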

Exact (brute force):

  • Compare query to every vector
  • O(n) complexity
  • Perfect accuracy
  • Only viable for small datasets
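The points above amount to scoring every stored vector and keeping the best K; a brute-force sketch in NumPy:

```python
import numpy as np

def exact_top_k(query, vectors, k):
    """Brute-force K-nearest-neighbour search by cosine similarity.

    O(n): every stored vector is scored against the query.
    Returns indices of the k most similar vectors, best first.
    """
    # Normalise rows so a plain dot product equals cosine similarity.
    vectors = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    query = query / np.linalg.norm(query)
    scores = vectors @ query
    # argpartition selects the top k in O(n); sorting only those k is cheap.
    top = np.argpartition(-scores, k)[:k]
    return top[np.argsort(-scores[top])]

rng = np.random.default_rng(0)
db = rng.normal(size=(1000, 64))
query = db[42] + 0.01 * rng.normal(size=64)
print(exact_top_k(query, db, k=3))  # db[42] itself should rank first
```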

Approximate Nearest Neighbour (ANN):

  • Trade accuracy for speed
  • O(log n) or O(1) typical
  • 95-99% recall is usually acceptable
  • Essential for production scale

ANN Algorithms

HNSW (Hierarchical Navigable Small World)

Graph-based algorithm. Builds a multi-layer graph of connections; queries descend from sparse upper layers to the dense bottom layer.

  • Fast queries
  • Good recall
  • Higher memory usage
  • Most popular choice

IVF (Inverted File Index)

Clusters vectors, searches only relevant clusters.

  • Lower memory than HNSW
  • Tuneable speed/recall trade-off
  • Works well with quantisation
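A toy IVF index can be sketched in a few dozen lines (illustrative only; real implementations such as FAISS's IVF indexes are far more optimised):

```python
import numpy as np

class TinyIVF:
    """Minimal inverted-file index sketch (illustrative, not production).

    Vectors are clustered; a query scans only the nprobe closest clusters,
    trading a little recall for a large reduction in comparisons.
    """

    def __init__(self, vectors, n_clusters=16, iters=10, seed=0):
        rng = np.random.default_rng(seed)
        self.vectors = vectors
        # Plain k-means to learn the coarse centroids.
        self.centroids = vectors[rng.choice(len(vectors), n_clusters, replace=False)]
        for _ in range(iters):
            assign = self._nearest_centroid(vectors)
            for c in range(n_clusters):
                members = vectors[assign == c]
                if len(members):
                    self.centroids[c] = members.mean(axis=0)
        assign = self._nearest_centroid(vectors)
        # Inverted lists: cluster id -> indices of member vectors.
        self.lists = {c: np.where(assign == c)[0] for c in range(n_clusters)}

    def _nearest_centroid(self, x):
        d = np.linalg.norm(x[:, None, :] - self.centroids[None, :, :], axis=2)
        return d.argmin(axis=1)

    def search(self, query, k=5, nprobe=4):
        # Probe only the nprobe clusters whose centroids are closest.
        d = np.linalg.norm(self.centroids - query, axis=1)
        candidates = np.concatenate([self.lists[c] for c in d.argsort()[:nprobe]])
        dists = np.linalg.norm(self.vectors[candidates] - query, axis=1)
        return candidates[dists.argsort()[:k]]

rng = np.random.default_rng(3)
db = rng.normal(size=(200, 16))
index = TinyIVF(db, n_clusters=8)
print(index.search(db[7], k=3))  # db[7] itself ranks first
```

Raising nprobe is the speed/recall dial: more clusters probed means more comparisons but fewer missed neighbours.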

Product Quantisation (PQ)

Compresses vectors by splitting them into subvectors and encoding each subvector as the ID of its nearest centroid.

  • Dramatic memory reduction
  • Some accuracy loss
  • Often combined with IVF (IVF-PQ)
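A sketch of the encode/decode round trip (codebooks are normally learned with k-means per subspace; here they are assumed given):

```python
import numpy as np

def pq_encode(vectors, codebooks):
    """Product quantisation sketch: split each vector into M subvectors and
    replace each subvector with the id of its nearest centroid.

    codebooks has shape (M, n_centroids, sub_dim); output is (n, M) uint8,
    so storage drops from d floats to M single-byte codes per vector.
    """
    m, _, sub_dim = codebooks.shape
    parts = vectors.reshape(len(vectors), m, sub_dim)
    codes = np.empty((len(vectors), m), dtype=np.uint8)
    for i in range(m):
        d = np.linalg.norm(parts[:, i, None, :] - codebooks[i][None, :, :], axis=2)
        codes[:, i] = d.argmin(axis=1)
    return codes

def pq_decode(codes, codebooks):
    # Approximate reconstruction: concatenate the chosen centroids.
    return np.concatenate(
        [codebooks[i][codes[:, i]] for i in range(codes.shape[1])], axis=1
    )

rng = np.random.default_rng(1)
codebooks = rng.normal(size=(4, 16, 2))   # M=4 subspaces, 16 centroids each
vectors = rng.normal(size=(10, 8))
codes = pq_encode(vectors, codebooks)
print(codes.shape, codes.dtype)           # (10, 4) uint8
```

Here 8 floats (32 bytes) per vector compress to 4 bytes; the accuracy loss is the distance between each subvector and its chosen centroid.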

Flat

No indexing, brute force search.

  • Perfect recall
  • Use for small datasets or ground truth

Vector Databases

Purpose-built

  Database   Highlights
  Pinecone   Fully managed, serverless option, easy to use
  Weaviate   Open-source, GraphQL API, hybrid search
  Qdrant     Open-source, Rust-based, filtering
  Milvus     Open-source, highly scalable
  Chroma     Lightweight, embedded, great for prototyping
  LanceDB    Embedded, columnar, serverless

Extensions to Existing Databases

  Database       Extension
  PostgreSQL     pgvector, pgvecto.rs
  Redis          Redis Stack
  Elasticsearch  dense_vector field type
  MongoDB        Atlas Vector Search
  SQLite         sqlite-vss

When to Use What

  • Pinecone/Weaviate/Qdrant — Production workloads, need scale
  • Chroma/LanceDB — Prototyping, embedded use cases
  • pgvector — Already using PostgreSQL, moderate scale
  • Redis — Need caching alongside vectors

Key Features

Filtering (Metadata)

Query with both vector similarity AND attribute filters:

Find vectors similar to query
WHERE category = "electronics"
AND price < 100

Pre-filtering (restricting candidates before the vector search) vs post-filtering (dropping non-matching results afterwards) affects both performance and recall.
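The two strategies can be sketched as follows (a brute-force illustration; `pred` is any metadata predicate):

```python
import numpy as np

def post_filter_search(query, vectors, meta, pred, k, overfetch=4):
    # Post-filtering: rank first, then drop non-matching results.
    # Over-fetch (k * overfetch) because filtering discards hits; a very
    # selective filter can still leave fewer than k results.
    order = np.argsort(np.linalg.norm(vectors - query, axis=1))[: k * overfetch]
    return [int(i) for i in order if pred(meta[i])][:k]

def pre_filter_search(query, vectors, meta, pred, k):
    # Pre-filtering: restrict the candidate set first, then rank only those.
    # Always yields k matches when they exist, at the cost of scanning the filter.
    idx = np.array([i for i in range(len(vectors)) if pred(meta[i])], dtype=int)
    order = np.argsort(np.linalg.norm(vectors[idx] - query, axis=1))
    return [int(i) for i in idx[order][:k]]

vectors = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.2, 0.0]])
meta = [{"price": 10}, {"price": 200}, {"price": 50}, {"price": 300}]
cheap = lambda m: m["price"] < 100
print(pre_filter_search([0.0, 0.0], vectors, meta, cheap, k=2))  # → [0, 2]
```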

Hybrid Search

Combine vector search with keyword search (BM25). Results are usually merged with Reciprocal Rank Fusion (RRF).
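RRF scores each item by summing 1/(k + rank) across the result lists; a minimal sketch (k = 60 is the conventional constant):

```python
def rrf_merge(rankings, k=60):
    """Merge ranked result lists with Reciprocal Rank Fusion.

    Each list contributes 1 / (k + rank) per item (rank starting at 1);
    items ranked highly in several lists float to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Dense (vector) results and sparse (BM25) results, best first:
merged = rrf_merge([["a", "b", "c"], ["b", "d", "a"]])
print(merged)  # → ['b', 'a', 'd', 'c']
```

Because RRF uses only ranks, it needs no score normalisation between the vector and keyword retrievers.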

Multi-tenancy

Isolate data between users/organisations:

  • Namespace-based isolation
  • Metadata filtering
  • Separate collections

Sparse Vectors

Some databases support sparse vectors for keyword matching alongside dense vectors.

Architecture Considerations

Embedding Dimensions

Higher dimensions = more storage and slower queries.

  • 384d: Lightweight, good for simple use cases
  • 768-1024d: Common middle ground
  • 1536-3072d: High quality, more resources

Index Build Time

HNSW index building is expensive. Plan for:

  • Initial bulk load time
  • Incremental updates
  • Index rebuild strategies

Memory vs Disk

  • HNSW typically memory-resident
  • IVF can use disk with memory-mapped files
  • Quantisation reduces memory requirements

Sharding

For large-scale deployments:

  • Partition by ID range or hash
  • Partition by metadata
  • Managed services handle this automatically
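Hash partitioning can be sketched as below; note that, unlike a range query, a similarity query must fan out to every shard and merge the results, since similarity gives no hint about which shard holds the neighbours:

```python
import hashlib

def shard_for(vector_id: str, num_shards: int) -> int:
    # Stable hash-based assignment of ids to shards: the same id always
    # lands on the same shard, independent of insertion order.
    digest = hashlib.md5(vector_id.encode()).hexdigest()
    return int(digest, 16) % num_shards

print(shard_for("doc-123", 8))  # deterministic shard in [0, 8)
```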

Operations

Indexing Pipeline

Documents → Chunking → Embedding → Vector DB
              ↓
         Metadata extraction

Update Strategies

  • Full reindex — Simple but slow
  • Incremental — Add/delete individual vectors
  • Batch upsert — Efficient bulk updates

Backup & Recovery

  • Point-in-time snapshots
  • Continuous replication (managed services)
  • Export/import functionality

Evaluation

Recall@K

What fraction of true nearest neighbours are returned in top K results?
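A minimal implementation, comparing against ground truth from an exact (flat) search:

```python
def recall_at_k(retrieved, ground_truth, k):
    """Fraction of the true top-k neighbours present in the returned top k."""
    return len(set(retrieved[:k]) & set(ground_truth[:k])) / k

print(recall_at_k([1, 5, 3, 9], [1, 2, 3, 4], k=4))  # → 0.5
```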

Query Latency

p50, p95, p99 latencies under load.

Throughput

Queries per second at acceptable latency.

Build Time

Time to index N vectors.

Memory Usage

RAM required for index + vectors.

Benchmarks

ANN-Benchmarks offers standardised recall/throughput comparisons of ANN libraries; results vary with dimensionality and data distribution, so benchmark on your own data before committing.

Integration

Frameworks

  • RAG implementations (LangChain, LlamaIndex)
  • Direct SDK usage

Example (pgvector)

-- Enable the pgvector extension
CREATE EXTENSION vector;
 
CREATE TABLE documents (
  id SERIAL PRIMARY KEY,
  content TEXT,
  embedding vector(1536)
);
 
-- HNSW index using cosine distance
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops);
 
-- <=> is pgvector's cosine distance operator (smaller = more similar)
SELECT content
FROM documents
ORDER BY embedding <=> '[query_vector]'
LIMIT 10;

Example (Chroma)

import chromadb
 
# In-memory client; use chromadb.PersistentClient(path=...) to persist to disk
client = chromadb.Client()
collection = client.create_collection("docs")
 
# Chroma embeds the documents with its default embedding function
collection.add(
    documents=["Document 1", "Document 2"],
    ids=["id1", "id2"]
)
 
results = collection.query(
    query_texts=["search query"],
    n_results=5
)

Resources