DynamoDB
From AWS re/Start (August 2025) and apprenticeship coursework.
Ultra-Short Summary
DynamoDB is AWS's fully managed NoSQL database. It is serverless, gives single-digit-millisecond latency at virtually any scale, and has no infrastructure to manage. It requires a fundamentally different mental model from SQL: you model your data around your access patterns upfront, rather than designing normalised tables and joining them at query time. Choose a partition key that spreads traffic evenly and DynamoDB can scale out across many partitions. Choose one that concentrates traffic and you hit hot partitions.
Core Concepts
Table -> top-level container (like a SQL table, but flexible schema)
Item -> a single record (like a row) -- each can have different attributes
Attribute -> a field on an item (like a column, but not required on every item)
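Every item must include the table's primary key:
Partition Key -> required; DynamoDB hashes it to choose the partition that stores the item
Sort Key -> optional when you design the table; if the table defines one, every item must have it, and it orders items within a partition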
Partition Key and Sort Key
Partition Key only (simple primary key):
Table: Users
PK: user_id
Each user_id maps to exactly one item
Partition Key + Sort Key (composite primary key):
Table: Orders
PK: user_id (partition)
SK: order_date (sort)
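One user can have many orders, sorted by date within their partition. The PK + SK pair must be unique, so two orders placed on the same date would overwrite each other. In practice, use a finer-grained SK such as a timestamp plus an order ID.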
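High cardinality matters: DynamoDB hashes the partition key to decide which physical partition holds the item. Each partition has a throughput ceiling of roughly 3,000 RCUs and 1,000 WCUs. If many items share the same partition key value (e.g. country = "GB"), one partition receives most of the traffic. This is a "hot partition": requests get throttled even when the table as a whole has spare capacity. Use high-cardinality values like user_id, device_id, or a UUID.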
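To see why cardinality matters, the sketch below assigns requests to partitions by hashing the partition key. DynamoDB's real hash function and partition count are internal. MD5 and a fixed count of 4 partitions (`NUM_PARTITIONS`) are stand-in assumptions, and only the resulting distribution pattern is the point.

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 4  # assumption: real partition counts are managed by DynamoDB


def partition_for(partition_key: str) -> int:
    """Map a partition key to a partition index (MD5 is an illustrative stand-in)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


def partition_load(keys):
    """Count how many requests land on each partition."""
    return Counter(partition_for(k) for k in keys)


# Low cardinality: every request uses the same key value, so one partition takes it all.
low = partition_load(["GB"] * 1000)

# High cardinality: 1,000 distinct user IDs spread across all partitions.
high = partition_load([f"user#{i}" for i in range(1000)])

print("country key:", dict(low))
print("user_id key:", dict(high))
```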
Data Model Example
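Table: OrderHistory
| PK (user_id) | SK (order#<ISO 8601 timestamp>) | order_id | total | status | items |
|---|---|---|---|---|---|
| user#123 | order#2025-01-15T10:30:00Z | 2025-001 | 49.99 | shipped | [{...}] |
| user#123 | order#2025-02-03T09:12:00Z | 2025-002 | 12.00 | pending | [{...}] |
| user#456 | order#2025-03-20T16:45:00Z | 2025-003 | 99.00 | complete | [{...}] |
Query: "Get all orders for user#123"
-> KeyConditionExpression: PK = "user#123"
-> Returns both user#123 items, sorted by SK (ascending by default)
Query: "Get all orders between Jan and Mar 2025 for user#123"
-> KeyConditionExpression: PK = "user#123" AND SK BETWEEN "order#2025-01-01" AND "order#2025-03-31T23:59:59Z"
-> This works because string sort keys are compared as strings, and ISO 8601 dates sort chronologically as strings. An SK built from a sequence number (e.g. order#2025-001) could not answer a date-range query.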
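The two queries above can be imitated locally to see how a partition key lookup combined with a sort key range behaves. This is a toy model, not the DynamoDB API. The `ToyTable` class and its sample items are hypothetical. Each partition keeps its sort keys in ascending order, and `between` is inclusive on both ends, like `BETWEEN`. The sample sort keys embed an ISO 8601 timestamp, because a range condition only selects a date range when the date is part of the sort key string.

```python
import bisect
from collections import defaultdict


class ToyTable:
    """Toy composite-key table: partition key -> sort keys kept in ascending order."""

    def __init__(self):
        self._sort_keys = defaultdict(list)  # pk -> sorted list of sort keys
        self._items = {}                     # (pk, sk) -> attributes

    def put_item(self, pk, sk, attrs):
        # Writing the same PK + SK again replaces the existing item, as PutItem does.
        if (pk, sk) not in self._items:
            bisect.insort(self._sort_keys[pk], sk)
        self._items[(pk, sk)] = attrs

    def query(self, pk, between=None):
        """Items in one partition, ascending by sort key.

        between=(low, high) mimics `SK BETWEEN low AND high` (inclusive bounds).
        """
        keys = self._sort_keys.get(pk, [])
        if between is not None:
            low, high = between
            keys = keys[bisect.bisect_left(keys, low):bisect.bisect_right(keys, high)]
        return [(sk, self._items[(pk, sk)]) for sk in keys]


orders = ToyTable()
orders.put_item("user#123", "order#2025-01-15T10:30:00Z", {"total": 49.99})
orders.put_item("user#123", "order#2025-04-02T08:00:00Z", {"total": 12.00})
orders.put_item("user#123", "order#2025-02-03T09:12:00Z", {"total": 20.00})
orders.put_item("user#456", "order#2025-03-20T16:45:00Z", {"total": 99.00})

all_orders = orders.query("user#123")
q1 = orders.query("user#123", between=("order#2025-01-01", "order#2025-03-31T23:59:59Z"))
print([sk for sk, _ in q1])
```

Only one partition is ever touched per query, which is what keeps real Query calls cheap compared with a Scan.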
Reads and Writes
Read Consistency
| Mode | What It Means | Cost (per read of up to 4KB) |
|---|---|---|
| Eventually Consistent | May return slightly stale data (copies are usually consistent within a second) | 0.5 RCU |
| Strongly Consistent | Returns the result of the most recent successful write | 1 RCU |
For most reads, eventually consistent is fine and costs half as much. Use strong consistency when your app logic depends on reading data it has just written. GSIs support only eventually consistent reads.
Capacity Modes
| Mode | How It Works | Best For |
|---|---|---|
| On-Demand | Auto-scales, pay per request | Unpredictable traffic, new tables |
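| Provisioned | You set RCU/WCU (optionally with Auto Scaling) and pay hourly for that capacity, used or not | Predictable traffic, cost optimisation |
RCU = Read Capacity Unit = 1 strongly consistent read per second of an item up to 4KB (or 2 eventually consistent reads)
WCU = Write Capacity Unit = 1 write per second of an item up to 1KB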
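The unit definitions above reduce to simple arithmetic. The sketch below covers single-item reads and writes (GetItem/PutItem), and the helper names are hypothetical. Query operations differ slightly, because DynamoDB sums the sizes of all items read before rounding up. Transactional reads and writes, which cost double, are not modelled.

```python
import math


def read_capacity_units(item_size_kb, strongly_consistent):
    """RCUs for one read: size rounded up to whole 4KB blocks; eventual reads cost half."""
    blocks = max(1, math.ceil(item_size_kb / 4))
    return blocks if strongly_consistent else blocks / 2


def write_capacity_units(item_size_kb):
    """WCUs for one write: size rounded up to whole 1KB blocks."""
    return max(1, math.ceil(item_size_kb))


# Sizing a provisioned table: 6KB items read 100 times/sec, 2.5KB items written 50 times/sec.
print(100 * read_capacity_units(6, strongly_consistent=True))   # 200 (2 RCU per read)
print(100 * read_capacity_units(6, strongly_consistent=False))  # 100.0 (1 RCU per read)
print(50 * write_capacity_units(2.5))                           # 150 (3 WCU per write)
```

Rounding happens per operation, so a 4.1KB item costs as much to read as an 8KB one. This is one reason to keep frequently read items small.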
Secondary Indexes
What if you need to query on something other than the primary key?
| Index Type | Partition Key | Sort Key | Scope | When Created | Max per Table |
|---|---|---|---|---|---|
| GSI (Global Secondary Index) | Any attribute | Any attribute (optional) | Whole table | Any time | 20 (default quota) |
| LSI (Local Secondary Index) | Same as table PK | Different attribute | Items sharing one partition key value | Table creation only | 5 |
Table: Orders (PK: user_id, SK: order_id)
GSI example: "Get all orders with status = 'pending'"
GSI PK: status
GSI SK: created_at
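-> Query the GSI, not the main table
-> Has its own RCU/WCU, separate from the table's (set independently in provisioned mode)
-> Caveat: status has only a handful of distinct values, so at high volume this GSI is prone to the hot-partition problem described above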
LSI example: "Get all orders for user#123 sorted by total price"
LSI SK: total
-> Must be created at table creation time
-> Shares RCU/WCU with the main table
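For the GSI example, a boto3 `Table.query` call selects the index through `IndexName`. There is one catch: `status` is a DynamoDB reserved word, so it has to be aliased through `ExpressionAttributeNames`. The sketch below only builds the keyword arguments, so it runs without AWS credentials. The index name `status-created_at-index` and the helper `pending_orders_query` are hypothetical.

```python
def pending_orders_query(since_iso, index_name="status-created_at-index"):
    """Build kwargs for boto3's Table.query against a hypothetical status/created_at GSI."""
    return {
        "IndexName": index_name,
        # '#st' aliases the reserved word 'status'; ':st' and ':since' are value placeholders.
        "KeyConditionExpression": "#st = :st AND created_at >= :since",
        "ExpressionAttributeNames": {"#st": "status"},
        "ExpressionAttributeValues": {":st": "pending", ":since": since_iso},
        "ScanIndexForward": False,  # newest first
    }


params = pending_orders_query("2025-08-01T00:00:00Z")
print(params["KeyConditionExpression"])
```

Against a real table, this would be passed as `boto3.resource("dynamodb").Table("Orders").query(**params)`. Because it targets a GSI, the read is always eventually consistent.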
DynamoDB Streams
Change data capture: with Streams enabled on a table, each item-level change generates a stream record, and records are kept for 24 hours:
Item written/updated/deleted -> Stream record created
The stream's view type decides what each record contains:
-> KEYS_ONLY: just the key attributes (PK, plus SK if the table has one)
-> NEW_IMAGE: the full item after the change
-> OLD_IMAGE: the full item before the change
-> NEW_AND_OLD_IMAGES: both
Common patterns:
-> Trigger Lambda on every order created (NEW_IMAGE)
-> Replicate changes to another table (cross-region)
-> Audit log of all changes
-> Search index sync (DynamoDB -> OpenSearch)
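Lambda receives stream records in batches under `event["Records"]`. Each record carries an `eventName` (`INSERT`, `MODIFY` or `REMOVE`) and, under `dynamodb`, the keys and images in DynamoDB's typed JSON (`{"S": "..."}`, `{"N": "..."}`). The following is a hedged sketch of the "trigger Lambda on every order created" pattern. Collecting the new items is a placeholder for real work such as sending a confirmation, and the minimal `from_attr` converter is an illustration, not a complete deserializer.

```python
from decimal import Decimal


def from_attr(value):
    """Convert one DynamoDB-JSON attribute value to a Python value.

    Minimal sketch: handles S, N and BOOL only. boto3's
    boto3.dynamodb.types.TypeDeserializer covers every type.
    """
    if "S" in value:
        return value["S"]
    if "N" in value:
        return Decimal(value["N"])  # numbers arrive as strings
    if "BOOL" in value:
        return value["BOOL"]
    raise ValueError(f"unsupported attribute type: {sorted(value)}")


def handler(event, context):
    """Lambda entry point: return the new images of inserted items.

    Assumes the stream view type includes new images (NEW_IMAGE or NEW_AND_OLD_IMAGES).
    """
    inserted = []
    for record in event["Records"]:
        if record["eventName"] != "INSERT":
            continue  # MODIFY and REMOVE events are ignored in this sketch
        new_image = record["dynamodb"]["NewImage"]
        inserted.append({name: from_attr(attr) for name, attr in new_image.items()})
    return inserted
```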
DAX (DynamoDB Accelerator)
In-memory, write-through cache in front of DynamoDB, with microsecond response times for cache hits:
Without DAX:
App -> DynamoDB (single-digit milliseconds)
With DAX:
App -> DAX cache (microseconds for cache hits)
-> DynamoDB (cache miss only)
Use when: read-heavy workloads, same item queried repeatedly (e.g. product catalogue)
Not useful for: write-heavy workloads, or strongly consistent reads (DAX passes these straight through to DynamoDB without caching them)
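DAX's item cache behaves like a read-through cache with a time-to-live. The toy below illustrates only that idea. It is not the DAX client, which real applications use in place of the regular DynamoDB client. `fetch_from_dynamodb` is a hypothetical stand-in for a GetItem call, and the 60-second TTL is an arbitrary assumption.

```python
import time


class ReadThroughCache:
    """Toy read-through cache with a TTL, illustrating the DAX item-cache idea."""

    def __init__(self, fetch, ttl_seconds=300.0, clock=time.monotonic):
        self._fetch = fetch        # called only on a cache miss
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}         # key -> (value, expiry time)

    def get(self, key, strongly_consistent=False):
        if strongly_consistent:
            # Like DAX, pass strongly consistent reads straight through, uncached.
            return self._fetch(key)
        entry = self._entries.get(key)
        now = self._clock()
        if entry is not None and entry[1] > now:
            return entry[0]        # cache hit
        value = self._fetch(key)   # cache miss: read from the backing store
        self._entries[key] = (value, now + self._ttl)
        return value


calls = []


def fetch_from_dynamodb(key):  # hypothetical stand-in for GetItem
    calls.append(key)
    return {"product_id": key}


cache = ReadThroughCache(fetch_from_dynamodb, ttl_seconds=60)
cache.get("product#abc")
cache.get("product#abc")
print(len(calls))  # 1: the second read was a cache hit
```

The pattern pays off only when the same keys are read repeatedly within the TTL, which is why a product catalogue suits DAX and a write-heavy log does not.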
Key Design Patterns
Single-Table Design
Advanced pattern: store multiple entity types in one table using composite keys:
| PK | SK | type | data... |
|---|---|---|---|
| USER#123 | USER#123 | user | {name, email} |
| USER#123 | ORDER#2025-001 | order | {total, status} |
| USER#123 | ORDER#2025-002 | order | {total, status} |
| PRODUCT#abc | PRODUCT#abc | product | {name, price} |
"Get user + all their orders":
Query: PK = "USER#123"
Returns user item AND all order items in one query -- no JOIN needed
This saves round trips but makes the schema harder to understand and evolve. Start with separate tables unless your access patterns demand it.
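When the single `PK = "USER#123"` query returns, the application has to separate the entity types itself, typically by the `type` attribute or the SK prefix. The following is a minimal sketch. It assumes the query result has already been converted to plain Python dicts. The sample rows follow the layout of the table above, and the `split_user_and_orders` helper is hypothetical.

```python
def split_user_and_orders(items):
    """Separate one partition's items into the user profile and its orders, by SK prefix."""
    user = None
    orders = []
    for item in items:
        if item["SK"].startswith("USER#"):
            user = item
        elif item["SK"].startswith("ORDER#"):
            orders.append(item)
    return user, orders


# Items as a Query on PK = "USER#123" would return them: ascending by SK.
result = [
    {"PK": "USER#123", "SK": "ORDER#2025-001", "type": "order", "total": 49.99},
    {"PK": "USER#123", "SK": "ORDER#2025-002", "type": "order", "total": 12.00},
    {"PK": "USER#123", "SK": "USER#123", "type": "user", "name": "Ana"},
]
user, orders = split_user_and_orders(result)
print(user["name"], len(orders))  # Ana 2
```

Because "ORDER#" sorts before "USER#", the profile item arrives last. Some designs pick SK prefixes so that the parent item sorts first instead.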
Time-Series Pattern
PK: device_id
SK: timestamp (ISO 8601, e.g. "2025-08-01T12:00:00Z")
Query: "Get all readings for device#xyz in the last hour"
KeyConditionExpression: PK = "device#xyz" AND SK > "2025-08-01T11:00:00Z"
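The "last hour" bound has to be computed in the same string format as the stored sort keys. The string `>` only matches chronological order when every timestamp is in UTC, zero-padded and fixed-width. This sketch builds boto3 `Table.query` kwargs under the assumption that the key attributes are literally named `device_id` and `timestamp`. Both names are aliased with `#` placeholders to avoid reserved-word clashes, and the helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def iso_utc(dt):
    """Fixed-width ISO 8601 in UTC, e.g. '2025-08-01T12:00:00Z'."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def last_hour_query(device_id, now):
    """Build boto3 Table.query kwargs for readings newer than one hour before `now`."""
    return {
        "KeyConditionExpression": "#pk = :d AND #ts > :cutoff",
        "ExpressionAttributeNames": {"#pk": "device_id", "#ts": "timestamp"},
        "ExpressionAttributeValues": {
            ":d": device_id,
            ":cutoff": iso_utc(now - timedelta(hours=1)),
        },
    }


now = datetime(2025, 8, 1, 12, 0, tzinfo=timezone.utc)
print(last_hour_query("device#xyz", now)["ExpressionAttributeValues"][":cutoff"])
# 2025-08-01T11:00:00Z
```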
Limits to Know
| Limit | Value |
|---|---|
| Max item size | 400KB (attribute names + values) |
| Max partition key length | 2048 bytes |
| Max sort key length | 1024 bytes |
| GSIs per table | 20 (default quota, adjustable) |
| LSIs per table | 5 |
| Table names | Must be unique per account per region |
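DynamoDB counts an item's size as the UTF-8 byte length of every attribute name plus its value. The rough estimator below handles string-only items. Numbers, binary, lists and maps have their own sizing rules that are not modelled here. Treating 400KB as 400 × 1024 bytes is an assumption, and the helper names are hypothetical.

```python
MAX_ITEM_BYTES = 400 * 1024  # assumption: 400KB read as 400 x 1024 bytes


def estimate_string_item_size(item):
    """Approximate size of an item whose attributes are all strings:
    UTF-8 bytes of each attribute name plus each value."""
    total = 0
    for name, value in item.items():
        if not isinstance(value, str):
            raise TypeError(f"{name}: only string attributes are modelled here")
        total += len(name.encode("utf-8")) + len(value.encode("utf-8"))
    return total


def fits_in_one_item(item):
    return estimate_string_item_size(item) <= MAX_ITEM_BYTES


order = {"PK": "user#123", "SK": "order#2025-001", "status": "shipped"}
print(estimate_string_item_size(order))  # 39
```

Two design consequences follow from this. Attribute names are stored on every item, so long names cost bytes (and capacity) at scale. Large blobs are usually kept in S3, with only a pointer stored in the item.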
Mental Model
DynamoDB = a distributed hash map at cloud scale
SQL thinking: "What data do I have? Design tables, then write queries."
DynamoDB thinking: "What queries will my app make? Design the keys to answer them."
The access pattern IS the data model.
If you know: "I'll always look up orders by user_id, sorted by date"
Then: PK = user_id, SK = order_date -- done, it's fast
If you later need a query your key design doesn't support, you need a GSI or a full table scan
(full scans are slow and expensive -- avoid them)
SAA Patterns
| Scenario | Answer |
|---|---|
| Serverless NoSQL, millisecond latency | DynamoDB |
| Reduce DynamoDB read latency further | DAX (microseconds) |
| Trigger Lambda when record changes | DynamoDB Streams + Lambda |
| Query on non-key attribute | Global Secondary Index (GSI) |
| Store session state for web app | DynamoDB (or ElastiCache) |
| IoT device telemetry, high write throughput | DynamoDB with time-series key design |
| Cost-optimise unpredictable DynamoDB traffic | On-Demand capacity mode |
| Cost-optimise predictable DynamoDB traffic | Provisioned + Auto Scaling |
Self-Quiz
- What's the difference between a Partition Key and a Sort Key?
- Why should a partition key have high cardinality?
- What's the difference between a GSI and an LSI?
- When would you choose strongly consistent reads over eventually consistent?
- What does DynamoDB Streams capture and what can you do with it?
- When would you use DAX and when wouldn't you?
- You need to query all orders by a customer's email (not the PK). What do you add?
- What's the maximum item size and why does it matter for design?