Databases

A database is an organised collection of data stored and accessed electronically. Databases are fundamental to nearly all software applications, from simple websites to complex distributed systems.

Types of Databases

Relational (SQL)

Store data in tables with rows and columns, using structured query language (SQL) for queries. Data is organised with strict schemas and relationships between tables via foreign keys.

Examples: PostgreSQL, MySQL, MariaDB, SQLite, Microsoft SQL Server, Oracle

Document

Store data as semi-structured documents (typically JSON or BSON). Flexible schemas allow different fields per document within the same collection.

Examples: MongoDB, CouchDB, Amazon DocumentDB

Key-Value

Simple storage model mapping keys to values. Extremely fast for lookups but limited query capabilities. Often used for caching and session storage.

Examples: Redis, Amazon DynamoDB, Memcached, etcd

Wide-Column

Store data in tables with rows and dynamic columns. Each row can have different columns, suited for large-scale analytical workloads.

Examples: Apache Cassandra, Apache HBase, Google Bigtable, ScyllaDB

Graph

Optimised for storing and querying relationships between entities. Use nodes, edges, and properties to represent and traverse connected data.

Examples: Neo4j, Amazon Neptune, ArangoDB, JanusGraph

Time-Series

Optimised for time-stamped data with efficient storage and queries over time ranges. Used for metrics, monitoring, and IoT data.

Examples: InfluxDB, TimescaleDB, Prometheus, QuestDB

Vector

Store and search high-dimensional vectors for similarity queries. Essential for AI/ML applications, semantic search, and recommendation systems.

Examples: Pinecone, Weaviate, Milvus, Chroma, pgvector (PostgreSQL extension)

See: Vector Databases

Key Concepts

ACID Properties

Guarantees for reliable database transactions:

Atomicity - Transactions are all-or-nothing
Consistency - Data remains valid after transactions
Isolation - Concurrent transactions don’t interfere
Durability - Committed data survives failures

CAP Theorem

In distributed systems, you can only guarantee two of three properties:

Consistency - All nodes see the same data simultaneously
Availability - Every request receives a response
Partition Tolerance - System operates despite network failures

Since network partitions are inevitable in distributed systems, the practical choice is between CP (consistency + partition tolerance) or AP (availability + partition tolerance) systems.

Type	Examples
CP	MongoDB, Redis, HBase
AP	Cassandra, CouchDB, DynamoDB

Indexing

Data structures that speed up read queries at the cost of write performance and storage. Common types:

B-tree - General-purpose, good for range queries
Hash - Fast exact-match lookups
GIN/GiST - Full-text and complex data types
Bitmap - Low-cardinality columns

Transactions

Logical units of work that group multiple operations. Support varies:

Full ACID transactions (PostgreSQL, MySQL)
Single-document transactions (MongoDB, DynamoDB)
Eventual consistency (Cassandra)

Replication

Copying data across multiple nodes for fault tolerance and read scaling:

Primary-replica - One write node, multiple read nodes
Multi-primary - Multiple write nodes with conflict resolution
Synchronous - Wait for replicas before confirming writes
Asynchronous - Confirm writes immediately, replicate later

Sharding

Horizontal partitioning of data across multiple servers to scale beyond single-node limits. Strategies include:

Hash-based - Distribute by hash of shard key
Range-based - Distribute by value ranges
Geographic - Distribute by location

When to Use What

Use Case	Recommended Type	Examples
Complex queries, joins, transactions	Relational	PostgreSQL, MySQL
Flexible schema, rapid iteration	Document	MongoDB, CouchDB
Caching, sessions, real-time data	Key-Value	Redis, Memcached
High write throughput, time-series	Wide-Column / Time-Series	Cassandra, InfluxDB
Relationship-heavy data	Graph	Neo4j
AI/ML, semantic search	Vector	Pinecone, pgvector
Serverless, auto-scaling	Managed NoSQL	DynamoDB

Performance Considerations

Query Optimisation

Use EXPLAIN / EXPLAIN ANALYZE to understand query plans
Index columns used in WHERE, JOIN, and ORDER BY clauses
Avoid SELECT * - fetch only needed columns
Use connection pooling (PgBouncer, ProxySQL)

Scaling Strategies

Vertical - Increase CPU, RAM, storage (simpler but limited)
Horizontal - Add more nodes via sharding/replication (complex but scalable)
Read replicas - Offload read traffic from primary
Caching layer - Redis/Memcached for frequently accessed data

Common Pitfalls

N+1 query problem - Use eager loading or batch queries
Missing indexes on foreign keys
Over-indexing (slows writes)
Not using prepared statements (SQL injection risk, no query plan caching)

Popular Databases Compared

Database	Type	Best For	Licence
PostgreSQL	Relational	General purpose, complex queries, extensibility	PostgreSQL (permissive)
MySQL	Relational	Web applications, read-heavy workloads	GPL / Commercial
MongoDB	Document	Flexible schemas, rapid development	SSPL
Redis	Key-Value	Caching, sessions, real-time features	BSD-3
DynamoDB	Key-Value/Document	Serverless, auto-scaling workloads	Proprietary (AWS)
Cassandra	Wide-Column	High availability, massive scale	Apache 2.0
Neo4j	Graph	Social networks, recommendations, fraud detection	GPL / Commercial

External Resources

DB-Engines Ranking - Database popularity rankings
Use The Index, Luke - SQL indexing and tuning
PostgreSQL Documentation
MongoDB Manual
Redis Documentation

Rai Notes

Explorer

Databases

Types of Databases

Relational (SQL)

Document

Key-Value

Wide-Column

Graph

Time-Series

Vector

Key Concepts

ACID Properties

CAP Theorem

Indexing

Transactions

Replication

Sharding

When to Use What

Performance Considerations

Query Optimisation

Scaling Strategies

Common Pitfalls

Popular Databases Compared

External Resources

Graph View

Table of Contents

Backlinks

Rai Notes

Explorer

Databases

Types of Databases

Relational (SQL)

Document

Key-Value

Wide-Column

Graph

Time-Series

Vector

Key Concepts

ACID Properties

CAP Theorem

Indexing

Transactions

Replication

Sharding

When to Use What

Performance Considerations

Query Optimisation

Scaling Strategies

Common Pitfalls

Popular Databases Compared

Related Pages

External Resources

Graph View

Table of Contents

Backlinks