A database is an organised collection of data stored and accessed electronically. Databases are fundamental to nearly all software applications, from simple websites to complex distributed systems.

Types of Databases

Relational (SQL)

Store data in tables with rows and columns, using structured query language (SQL) for queries. Data is organised with strict schemas and relationships between tables via foreign keys.

Examples: PostgreSQL, MySQL, MariaDB, SQLite, Microsoft SQL Server, Oracle

Document

Store data as semi-structured documents (typically JSON or BSON). Flexible schemas allow different fields per document within the same collection.

Examples: MongoDB, CouchDB, Amazon DocumentDB

Key-Value

Simple storage model mapping keys to values. Extremely fast for lookups but limited query capabilities. Often used for caching and session storage.

Examples: Redis, Amazon DynamoDB, Memcached, etcd

Wide-Column

Store data in tables with rows and dynamic columns. Each row can have different columns, suited for large-scale analytical workloads.

Examples: Apache Cassandra, Apache HBase, Google Bigtable, ScyllaDB

Graph

Optimised for storing and querying relationships between entities. Use nodes, edges, and properties to represent and traverse connected data.

Examples: Neo4j, Amazon Neptune, ArangoDB, JanusGraph

Time-Series

Optimised for time-stamped data with efficient storage and queries over time ranges. Used for metrics, monitoring, and IoT data.

Examples: InfluxDB, TimescaleDB, Prometheus, QuestDB

Vector

Store and search high-dimensional vectors for similarity queries. Essential for AI/ML applications, semantic search, and recommendation systems.

Examples: Pinecone, Weaviate, Milvus, Chroma, pgvector (PostgreSQL extension)

See: Vector Databases

Key Concepts

ACID Properties

Guarantees for reliable database transactions:

  • Atomicity - Transactions are all-or-nothing
  • Consistency - Data remains valid after transactions
  • Isolation - Concurrent transactions don’t interfere
  • Durability - Committed data survives failures

CAP Theorem

In distributed systems, you can only guarantee two of three properties:

  • Consistency - All nodes see the same data simultaneously
  • Availability - Every request receives a response
  • Partition Tolerance - System operates despite network failures

Since network partitions are inevitable in distributed systems, the practical choice is between CP (consistency + partition tolerance) or AP (availability + partition tolerance) systems.

TypeExamples
CPMongoDB, Redis, HBase
APCassandra, CouchDB, DynamoDB

Indexing

Data structures that speed up read queries at the cost of write performance and storage. Common types:

  • B-tree - General-purpose, good for range queries
  • Hash - Fast exact-match lookups
  • GIN/GiST - Full-text and complex data types
  • Bitmap - Low-cardinality columns

Transactions

Logical units of work that group multiple operations. Support varies:

  • Full ACID transactions (PostgreSQL, MySQL)
  • Single-document transactions (MongoDB, DynamoDB)
  • Eventual consistency (Cassandra)

Replication

Copying data across multiple nodes for fault tolerance and read scaling:

  • Primary-replica - One write node, multiple read nodes
  • Multi-primary - Multiple write nodes with conflict resolution
  • Synchronous - Wait for replicas before confirming writes
  • Asynchronous - Confirm writes immediately, replicate later

Sharding

Horizontal partitioning of data across multiple servers to scale beyond single-node limits. Strategies include:

  • Hash-based - Distribute by hash of shard key
  • Range-based - Distribute by value ranges
  • Geographic - Distribute by location

When to Use What

Use CaseRecommended TypeExamples
Complex queries, joins, transactionsRelationalPostgreSQL, MySQL
Flexible schema, rapid iterationDocumentMongoDB, CouchDB
Caching, sessions, real-time dataKey-ValueRedis, Memcached
High write throughput, time-seriesWide-Column / Time-SeriesCassandra, InfluxDB
Relationship-heavy dataGraphNeo4j
AI/ML, semantic searchVectorPinecone, pgvector
Serverless, auto-scalingManaged NoSQLDynamoDB

Performance Considerations

Query Optimisation

  • Use EXPLAIN / EXPLAIN ANALYZE to understand query plans
  • Index columns used in WHERE, JOIN, and ORDER BY clauses
  • Avoid SELECT * - fetch only needed columns
  • Use connection pooling (PgBouncer, ProxySQL)

Scaling Strategies

  • Vertical - Increase CPU, RAM, storage (simpler but limited)
  • Horizontal - Add more nodes via sharding/replication (complex but scalable)
  • Read replicas - Offload read traffic from primary
  • Caching layer - Redis/Memcached for frequently accessed data

Common Pitfalls

  • N+1 query problem - Use eager loading or batch queries
  • Missing indexes on foreign keys
  • Over-indexing (slows writes)
  • Not using prepared statements (SQL injection risk, no query plan caching)
DatabaseTypeBest ForLicence
PostgreSQLRelationalGeneral purpose, complex queries, extensibilityPostgreSQL (permissive)
MySQLRelationalWeb applications, read-heavy workloadsGPL / Commercial
MongoDBDocumentFlexible schemas, rapid developmentSSPL
RedisKey-ValueCaching, sessions, real-time featuresBSD-3
DynamoDBKey-Value/DocumentServerless, auto-scaling workloadsProprietary (AWS)
CassandraWide-ColumnHigh availability, massive scaleApache 2.0
Neo4jGraphSocial networks, recommendations, fraud detectionGPL / Commercial

External Resources