Database Fundamentals¶
From AWS Re/Start (August 2025) and apprenticeship coursework.
Ultra-Short Summary¶
Databases organise data for efficient storage and retrieval. The two main families are relational (structured tables, SQL, ACID transactions) and non-relational (flexible schemas, optimised for specific access patterns). Knowing when to use which — and why — is one of the most recurring decision points in cloud architecture.
Relational vs Non-Relational¶
Relational (SQL)
→ Data in tables with rows and columns
→ Relationships via foreign keys
→ Schema defined upfront
→ ACID transactions (Atomicity, Consistency, Isolation, Durability)
→ Best for: structured data, complex queries, financial records
Non-Relational (NoSQL)
→ Flexible schema — each record can have different fields
→ Optimised for specific access patterns
→ Trades ACID for speed/scale in most cases
→ Types: key-value, document, wide-column, graph
→ Best for: high-throughput, variable structure, caching
SQL Fundamentals¶
-- Basic query
SELECT name, email FROM users WHERE created_at > '2025-01-01';
-- Join two tables
SELECT orders.id, users.name
FROM orders
JOIN users ON orders.user_id = users.id;
-- Insert
INSERT INTO users (name, email) VALUES ('Cain', 'cain@example.com');
-- Update
UPDATE users SET email = 'new@email.com' WHERE id = 42;
-- Delete
DELETE FROM users WHERE id = 42;
ACID Transactions¶
The guarantee that makes relational databases safe for financial and critical data:
| Property | What It Means | Example |
|---|---|---|
| Atomicity | All-or-nothing — if any part fails, the whole transaction rolls back | Bank transfer: debit AND credit must both succeed |
| Consistency | Database always goes from one valid state to another | A constraint (e.g. balance >= 0) is never violated |
| Isolation | Concurrent transactions don't interfere with each other | Two people booking the last seat can't both succeed |
| Durability | Once committed, data survives crashes | Written to disk, not just memory |
Database Types¶
| Type | Structure | AWS Service | Best For |
|---|---|---|---|
| Relational | Tables, SQL | RDS, Aurora | Transactions, reporting, structured data |
| Key-Value | Key → Value | DynamoDB, ElastiCache | Session data, caching, leaderboards |
| Document | JSON-like documents | DynamoDB, DocumentDB | User profiles, product catalogs |
| Wide-Column | Rows with many columns | Keyspaces (Cassandra) | Time-series, IoT |
| Graph | Nodes and edges | Neptune | Social networks, fraud detection |
| In-Memory | RAM-based | ElastiCache (Redis/Memcached) | Caching, session state, real-time |
| Data Warehouse | Analytics, columnar | Redshift | Business intelligence, large-scale queries |
| Time-Series | Timestamped metrics | Timestream | IoT, monitoring, metrics |
The Hashing Connection¶
Hashing appears in databases too:
Password storage:
NEVER store plaintext password
Hash it: bcrypt("password123") → $2b$12$7HVD...
On login: hash the input, compare hashes
Index hashing:
Hash indexes for fast equality lookups
B-tree indexes for range queries
Content addressing:
S3 ETags = MD5 hash of the object
Used to check if content changed
AWS Database Services¶
RDS (Relational Database Service)¶
Managed relational database — AWS handles patching, backups, failover.
Supported engines: MySQL, PostgreSQL, MariaDB, Oracle, SQL Server, Db2
Key features:
- Multi-AZ — synchronous standby replica in another AZ for HA and automatic failover
- Read Replicas — asynchronous copies for read scaling (up to 15 for Aurora)
- Automated backups — daily snapshot + transaction logs → point-in-time recovery
- Encryption — at rest (KMS) and in transit (SSL/TLS)
Aurora¶
AWS's own relational engine — MySQL and PostgreSQL compatible but designed for cloud:
Aurora architecture:
Shared distributed storage layer (6 copies across 3 AZs)
Up to 15 read replicas
Writer endpoint → points to primary
Reader endpoint → load-balances across read replicas
Auto-scales storage from 10GB to 128TB
Up to 5x faster than MySQL RDS
Aurora Serverless v2 — scales capacity automatically (no fixed instance size).
DynamoDB¶
AWS's fully managed NoSQL service — key-value and document:
Table → Items (like rows) → Attributes (like columns, but flexible)
Every item needs a Partition Key
Optional Sort Key for range queries within a partition
Key concepts:
| Concept | What It Is |
|---|---|
| Partition Key | Determines which partition stores the item (choose high-cardinality values) |
| Sort Key | Secondary key for ordering within a partition |
| Global Secondary Index (GSI) | Query on non-key attributes |
| DAX | In-memory cache for DynamoDB — microsecond response |
| DynamoDB Streams | Change data capture — trigger Lambda on record changes |
ElastiCache¶
Managed in-memory caching:
| Engine | Best For |
|---|---|
| Redis | Persistent cache, pub/sub, sorted sets, geospatial, Lua scripts |
| Memcached | Simple cache, multi-threaded, no persistence |
Pattern: App queries ElastiCache first → if cache miss → query RDS/DynamoDB → store result in cache.
Redshift¶
Data warehouse for analytics — columnar storage, petabyte scale, SQL interface.
OLTP (transactions) → RDS / Aurora / DynamoDB
OLAP (analytics) → Redshift
Don't use RDS for "how much revenue did we make last year across all orders"
— use Redshift or Athena (S3 query engine)
The Sakila Database¶
A sample MySQL database used throughout Re/Start for SQL practice. Models a fictional DVD rental company: film, actor, customer, rental, payment tables with realistic relationships.
Good for practicing: JOINs, aggregations, subqueries, stored procedures.
Mental Model¶
Relational DB = spreadsheet with rules
→ Rows are related across tables via IDs
→ Transactions ensure nothing is "half-done"
→ Good when data structure is known and stable
NoSQL DB = optimised filing cabinet
→ Fast because it's designed for your specific access pattern
→ Flexible because schema is not enforced
→ Bad for complex queries across many different attributes
Cache = whiteboard by your desk
→ Fastest possible access
→ Temporary — wiped when power goes off (or TTL expires)
→ Only store what you read frequently
SAA Patterns¶
| Scenario | Answer |
|---|---|
| Transactional app, need ACID | RDS (MySQL/PostgreSQL) or Aurora |
| Best performance relational DB, MySQL compatible | Aurora |
| Serverless NoSQL, millisecond latency | DynamoDB |
| Reduce RDS read load | RDS Read Replicas |
| Session storage, sub-millisecond | ElastiCache Redis |
| Analytics on terabytes of data | Redshift |
| RDS Multi-AZ vs Read Replicas | Multi-AZ = HA/failover, Read Replica = read scaling |
| Social graph queries | Neptune (graph) |
Self-Quiz¶
- What does ACID stand for and why does "Atomicity" matter for a bank transfer?
- What's the difference between RDS Multi-AZ and Read Replicas?
- When would you choose DynamoDB over RDS?
- Why do you never store plaintext passwords? What do you store instead?
- What's the Partition Key in DynamoDB and why should it be high-cardinality?
- What's the difference between OLTP and OLAP workloads?
- Why is columnar storage faster for analytics than row storage?
- What does ElastiCache solve that RDS can't on its own?