DynamoDB¶

From AWS re/Start (August 2025) and apprenticeship coursework.

Ultra-Short Summary¶

DynamoDB is AWS's fully managed, serverless NoSQL database, with single-digit millisecond latency at any scale and no infrastructure to manage. It requires a fundamentally different mental model from SQL: you model your data around access patterns upfront rather than around tables and joins. Get the partition key right and DynamoDB spreads load evenly as the table grows; get it wrong and you hit hot partitions.

Core Concepts¶

Table      -> top-level container (like a SQL table, but flexible schema)
Item       -> a single record (like a row) -- each can have different attributes
Attribute  -> a field on an item (like a column, but not required on every item)

Every item is identified by a primary key:
  Partition Key  -> required, determines which partition stores the item
  Sort Key       -> optional per table (if the table defines one, every item must include it), used to sort items within a partition

Partition Key and Sort Key¶

Partition Key only (simple primary key):
  Table: Users
  PK: user_id
  Each user_id maps to exactly one item

Partition Key + Sort Key (composite primary key):
  Table: Orders
  PK: user_id    (partition)
  SK: order_date (sort)
  One user can have many orders, sorted by date within their partition

High cardinality matters: The partition key determines which physical partition holds the item. If many items share the same partition key (e.g. country = "GB"), one partition gets all the traffic — a "hot partition". Use high-cardinality values like user_id, device_id, or UUID.
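
The effect of cardinality can be illustrated with a small simulation. This is only a sketch: DynamoDB's real hash function and partition count are internal, so MD5 and 8 partitions are stand-in assumptions, and `partition_for` and `busiest_share` are hypothetical helper names.

```python
import hashlib
from collections import Counter


def partition_for(key: str, num_partitions: int) -> int:
    """Map a partition key to a partition index (MD5 is a stand-in hash)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def busiest_share(keys, num_partitions: int = 8) -> float:
    """Fraction of requests that land on the single busiest partition."""
    counts = Counter(partition_for(k, num_partitions) for k in keys)
    return max(counts.values()) / len(keys)


# Low cardinality: every request carries the same key value.
low = ["GB"] * 10_000
# High cardinality: 10,000 distinct user IDs.
high = [f"user#{i}" for i in range(10_000)]

print(busiest_share(low))   # 1.0, one partition takes all the traffic
print(busiest_share(high))  # roughly 1/8 when keys spread evenly
```

The same key always hashes to the same partition, so adding partitions does nothing for the low-cardinality case.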

Data Model Example¶

Table: OrderHistory

PK (user_id)  | SK (order_id)    | total | status   | items
---------------------------------------------------------------------------
user#123      | order#2025-001   | 49.99 | shipped  | [{...}]
user#123      | order#2025-002   | 12.00 | pending  | [{...}]
user#456      | order#2025-003   | 99.00 | complete | [{...}]

Query: "Get all orders for user#123"
  -> KeyConditionExpression: PK = "user#123"
  -> Returns both items, sorted by SK

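Query: "Get all orders between Jan-Mar 2025 for user#123"
  -> Only works if the SK embeds the order date, e.g. "order#2025-02-14#001";
     sequence-style IDs like "order#2025-001" do not sort by date
  -> KeyConditionExpression: PK = "user#123" AND SK BETWEEN "order#2025-01" AND "order#2025-04"
  -> BETWEEN is inclusive and compares strings, so the upper bound is the start of April;
     an upper bound of "order#2025-03" would cut off every March key that continues past that prefix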
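
A range query like this could be expressed as parameters for the low-level boto3 client. This is a minimal sketch under assumptions: the key attributes are literally named `PK` and `SK`, sort keys embed the order date (e.g. `order#2025-02-14#001`), and `orders_between_params` is a hypothetical helper. Because `BETWEEN` is inclusive and compares strings, the upper bound used here is the start of April.

```python
def orders_between_params(user_id: str, sk_from: str, sk_to: str) -> dict:
    """Build Query parameters: one partition, sort key within [sk_from, sk_to]."""
    return {
        "TableName": "OrderHistory",
        "KeyConditionExpression": "#pk = :pk AND #sk BETWEEN :lo AND :hi",
        # Placeholders avoid any clash with DynamoDB reserved words.
        "ExpressionAttributeNames": {"#pk": "PK", "#sk": "SK"},
        "ExpressionAttributeValues": {
            ":pk": {"S": user_id},
            ":lo": {"S": sk_from},
            ":hi": {"S": sk_to},
        },
    }


params = orders_between_params("user#123", "order#2025-01", "order#2025-04")

# With boto3 installed and credentials configured, this would be sent as:
#   boto3.client("dynamodb").query(**params)
```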

Reads and Writes¶

Read Consistency¶

Mode	What It Means	Cost
Eventually Consistent	May return slightly stale data (usually under a second behind)	0.5 RCU per 4 KB
Strongly Consistent	Always returns the latest data	1 RCU per 4 KB

For most reads, eventually consistent is fine and cheaper. Use strong consistency when your app logic depends on reading the data it just wrote.
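
Consistency is chosen per request. A minimal sketch of the `GetItem` parameters for the low-level boto3 client, where `get_item_params` is a hypothetical helper and the `Users` table with a `user_id` key is taken from the examples in these notes:

```python
def get_item_params(table: str, key: dict, strongly_consistent: bool = False) -> dict:
    """Build GetItem parameters; ConsistentRead defaults to eventual consistency."""
    return {
        "TableName": table,
        "Key": key,
        "ConsistentRead": strongly_consistent,
    }


key = {"user_id": {"S": "user#123"}}

cheap_read = get_item_params("Users", key)
read_after_write = get_item_params("Users", key, strongly_consistent=True)

# With boto3 available: boto3.client("dynamodb").get_item(**read_after_write)
```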

Capacity Modes¶

Mode	How It Works	Best For
On-Demand	Auto-scales, pay per request	Unpredictable traffic, new tables
Provisioned	You set RCU/WCU and pay for that capacity whether or not it is used	Predictable traffic, cost optimisation

RCU = Read Capacity Unit = 1 strongly consistent read (or 2 eventually consistent reads) of up to 4 KB per second
WCU = Write Capacity Unit = 1 write of up to 1 KB per second
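
The unit definitions translate into simple arithmetic. This is a rough sketch for sizing provisioned capacity; it covers only plain reads and writes, with item sizes rounded up to the next 4 KB (reads) or 1 KB (writes). The function names are hypothetical.

```python
import math


def read_capacity_units(item_bytes: int, strongly_consistent: bool) -> float:
    """RCUs needed to read one item of this size once per second."""
    blocks = max(1, math.ceil(item_bytes / 4096))  # reads are counted in 4 KB steps
    return blocks * (1.0 if strongly_consistent else 0.5)


def write_capacity_units(item_bytes: int) -> int:
    """WCUs needed to write one item of this size once per second."""
    return max(1, math.ceil(item_bytes / 1024))  # writes are counted in 1 KB steps


print(read_capacity_units(6 * 1024, strongly_consistent=True))   # 2.0
print(read_capacity_units(6 * 1024, strongly_consistent=False))  # 1.0
print(write_capacity_units(2560))                                # 3
```

Multiply by the expected requests per second to estimate the table's provisioned RCU/WCU.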

Secondary Indexes¶

What if you need to query on something other than the primary key?

Index Type	Partition Key	Sort Key	Scope	Count
GSI (Global Secondary Index)	Any attribute	Any attribute	Whole table	Up to 20
LSI (Local Secondary Index)	Same as table PK	Different attribute	Single partition	Up to 5

Table: Orders (PK: user_id, SK: order_id)

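GSI example: "Get all orders with status = 'pending'"
  GSI PK: status
  GSI SK: created_at
  -> Query the GSI, not the main table
  -> Has its own RCU/WCU (separate from the table)
  -> Reads from a GSI are eventually consistent only
  -> status is low-cardinality, so at high volume this index can develop a hot partition of its own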

LSI example: "Get all orders for user#123 sorted by total price"
  LSI SK: total
  -> Must be created at table creation time
  -> Shares RCU/WCU with the main table
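
Querying an index uses the same Query operation plus an `IndexName`. A hedged sketch for the GSI example, where the index name `status-created_at-index` and the helper `orders_by_status_params` are assumptions made for illustration:

```python
def orders_by_status_params(status: str, index_name: str = "status-created_at-index") -> dict:
    """Build Query parameters that target a GSI instead of the base table."""
    return {
        "TableName": "Orders",
        "IndexName": index_name,
        "KeyConditionExpression": "#status = :status",
        # A placeholder keeps the attribute name clear of reserved words.
        "ExpressionAttributeNames": {"#status": "status"},
        "ExpressionAttributeValues": {":status": {"S": status}},
        "ScanIndexForward": False,  # descending sort key order: newest created_at first
    }


params = orders_by_status_params("pending")

# With boto3 available: boto3.client("dynamodb").query(**params)
```

Without `IndexName`, the same key condition would be rejected, because `status` is not part of the base table's primary key.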

DynamoDB Streams¶

Change data capture: once a stream is enabled on a table, every item-level write generates a stream record, which stays readable for 24 hours.

Item written/updated/deleted -> Stream record created

What each stream record contains depends on the stream view type chosen for the table:
  -> KEYS_ONLY: just the PK and SK
  -> NEW_IMAGE: the full item after the change
  -> OLD_IMAGE: the full item before the change
  -> NEW_AND_OLD_IMAGES: both

Common patterns:
  -> Trigger Lambda on every order created (NEW_IMAGE)
  -> Replicate changes to another table (cross-region)
  -> Audit log of all changes
  -> Search index sync (DynamoDB -> OpenSearch)
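
The "trigger Lambda on every order created" pattern could look roughly like this. It is a sketch, assuming the stream uses a view type that includes the new image and that items carry `user_id` and `order_id` string attributes; the handler only collects IDs, where a real one would call downstream services.

```python
def handler(event, context=None):
    """Collect the order IDs of newly inserted items from one stream batch."""
    new_orders = []
    for record in event.get("Records", []):
        if record.get("eventName") != "INSERT":
            continue  # skip MODIFY and REMOVE events
        # NewImage is present only for NEW_IMAGE / NEW_AND_OLD_IMAGES streams.
        new_image = record["dynamodb"].get("NewImage", {})
        order_id = new_image.get("order_id", {}).get("S")
        if order_id is not None:
            new_orders.append(order_id)
    return new_orders


# Hand-written sample in the shape Lambda receives from DynamoDB Streams.
sample_event = {
    "Records": [
        {
            "eventName": "INSERT",
            "dynamodb": {
                "Keys": {"user_id": {"S": "user#123"}, "order_id": {"S": "order#2025-001"}},
                "NewImage": {
                    "user_id": {"S": "user#123"},
                    "order_id": {"S": "order#2025-001"},
                    "status": {"S": "pending"},
                },
            },
        },
        {
            "eventName": "REMOVE",
            "dynamodb": {
                "Keys": {"user_id": {"S": "user#456"}, "order_id": {"S": "order#2025-003"}},
            },
        },
    ]
}

print(handler(sample_event))  # ['order#2025-001']
```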

DAX — DynamoDB Accelerator¶

In-memory cache for DynamoDB — microsecond response times:

Without DAX:
  App -> DynamoDB (single-digit milliseconds)

With DAX:
  App -> DAX cache (microseconds for cache hits)
       -> DynamoDB (cache miss only)

Use when: read-heavy workloads, same item queried repeatedly (e.g. product catalogue)
Not useful for: write-heavy workloads, strongly consistent reads (DAX passes these straight through to DynamoDB and does not cache the result)
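
The read-through idea behind DAX can be sketched in plain Python. This is not the DAX client, only an illustration of the caching pattern; `ReadThroughCache`, `fetch_product`, and the 300-second TTL are assumptions.

```python
import time


class ReadThroughCache:
    """Serve repeated reads from memory; fall back to the backing store on a miss."""

    def __init__(self, fetch, ttl_seconds: float = 300.0, clock=time.monotonic):
        self._fetch = fetch      # function that reads from the backing store
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}       # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]                        # cache hit
        value = self._fetch(key)                   # cache miss: go to the table
        self._entries[key] = (now + self._ttl, value)
        return value


calls = []


def fetch_product(product_id):
    """Stand-in for a DynamoDB read; records each call so hits can be counted."""
    calls.append(product_id)
    return {"product_id": product_id, "name": "example"}


cache = ReadThroughCache(fetch_product)
cache.get("PRODUCT#abc")
cache.get("PRODUCT#abc")
print(len(calls))  # 1, the second read never reached the backing store
```

A cached value can be stale until its TTL expires, which is why this pattern only suits eventually consistent reads.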

Key Design Patterns¶

Single-Table Design¶

Advanced pattern: store multiple entity types in one table using composite keys:

PK            | SK              | type    | data...
-----------------------------------------------------------------
USER#123      | USER#123        | user    | {name, email}
USER#123      | ORDER#2025-001  | order   | {total, status}
USER#123      | ORDER#2025-002  | order   | {total, status}
PRODUCT#abc   | PRODUCT#abc     | product | {name, price}

"Get user + all their orders":
  Query: PK = "USER#123"
  Returns user item AND all order items in one query -- no JOIN needed

This is efficient but complex. Start with separate tables unless performance demands it.
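
In application code, single-table design mostly comes down to building keys consistently and sorting out the mixed result set afterwards. A small sketch with hypothetical helpers (`user_key`, `order_key`, `split_by_type`), using plain dicts in place of a real query response:

```python
from collections import defaultdict


def user_key(user_id: str) -> dict:
    """Key of the user item itself: PK and SK are the same value."""
    return {"PK": f"USER#{user_id}", "SK": f"USER#{user_id}"}


def order_key(user_id: str, order_id: str) -> dict:
    """Key of an order item stored in its owner's partition."""
    return {"PK": f"USER#{user_id}", "SK": f"ORDER#{order_id}"}


def split_by_type(items) -> dict:
    """Group the mixed items returned by one query by their 'type' attribute."""
    grouped = defaultdict(list)
    for item in items:
        grouped[item["type"]].append(item)
    return dict(grouped)


# What a query on PK = "USER#123" might hand back (ascending SK order).
items = [
    {**order_key("123", "2025-001"), "type": "order", "status": "shipped"},
    {**order_key("123", "2025-002"), "type": "order", "status": "pending"},
    {**user_key("123"), "type": "user", "name": "Example User"},
]

result = split_by_type(items)
print(len(result["order"]), len(result["user"]))  # 2 1
```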

Time-Series Pattern¶

PK: device_id
SK: timestamp (ISO 8601, e.g. "2025-08-01T12:00:00Z")

Query: "Get all readings for device#xyz in the last hour"
  KeyConditionExpression: PK = "device#xyz" AND SK > "2025-08-01T11:00:00Z"
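
The "last hour" bound has to be computed in the same ISO 8601 format the sort key uses, since the comparison is string-based. A sketch under assumptions: the table is called `DeviceReadings`, its key attributes are named `device_id` and `timestamp`, and `iso_utc` and `last_hour_params` are hypothetical helpers.

```python
from datetime import datetime, timedelta, timezone


def iso_utc(ts: datetime) -> str:
    """Format a timezone-aware datetime as fixed-width UTC ISO 8601."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def last_hour_params(device_id: str, now: datetime) -> dict:
    """Build Query parameters for readings newer than one hour before `now`."""
    since = iso_utc(now - timedelta(hours=1))
    return {
        "TableName": "DeviceReadings",
        "KeyConditionExpression": "#pk = :pk AND #sk > :since",
        # 'timestamp' is a DynamoDB reserved word, so a placeholder is required.
        "ExpressionAttributeNames": {"#pk": "device_id", "#sk": "timestamp"},
        "ExpressionAttributeValues": {
            ":pk": {"S": device_id},
            ":since": {"S": since},
        },
    }


now = datetime(2025, 8, 1, 12, 0, 0, tzinfo=timezone.utc)
params = last_hour_params("device#xyz", now)
print(params["ExpressionAttributeValues"][":since"])  # {'S': '2025-08-01T11:00:00Z'}
```

Fixed-width UTC timestamps matter here: mixed time zones or varying precision would break the lexicographic ordering.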

Limits to Know¶

Limit	Value
Max item size	400KB
Max partition key length	2048 bytes
Max sort key length	1024 bytes
GSIs per table	20 (default quota)
LSIs per table	5
Table names	Must be unique per account per region
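
The item size limit counts attribute names as well as values. A rough pre-write check could look like the sketch below. It handles string attributes only (numbers, sets, and nested types are sized differently), treats 400 KB as 400 * 1024 bytes as an assumption, and its helper names are hypothetical.

```python
MAX_ITEM_BYTES = 400 * 1024  # assumption: 400 KB counted as 409,600 bytes


def approx_item_bytes(item: dict) -> int:
    """Approximate size of a string-only item: UTF-8 bytes of names plus values."""
    return sum(
        len(name.encode("utf-8")) + len(value.encode("utf-8"))
        for name, value in item.items()
    )


def fits_in_one_item(item: dict) -> bool:
    """True if the approximate size is within the per-item limit."""
    return approx_item_bytes(item) <= MAX_ITEM_BYTES


small = {"user_id": "user#123", "status": "pending"}
large = {"user_id": "user#123", "payload": "x" * 500_000}

print(approx_item_bytes(small))  # 28
print(fits_in_one_item(large))   # False
```

Anything that fails the check is a candidate for splitting across several items or for storing the payload in S3 with a pointer in the table.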

Mental Model¶

DynamoDB = a distributed hash map at cloud scale

SQL thinking:          "What data do I have? Design tables, then write queries."
DynamoDB thinking:     "What queries will my app make? Design the keys to answer them."

The access pattern IS the data model.

If you know: "I'll always look up orders by user_id, sorted by date"
Then:         PK = user_id, SK = order_date -- done, it's fast

If you try to add a query DynamoDB wasn't designed for, you need a GSI or a full table scan
(full scans are slow and expensive -- avoid them)
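
The "distributed hash map" picture can be made concrete with a toy in-memory model: a dict of partitions, each holding its sort keys in order. This is purely illustrative; `TinyTable` is a hypothetical class and says nothing about how DynamoDB is implemented internally.

```python
import bisect
from collections import defaultdict


class TinyTable:
    """Toy model: hash lookup on the partition key, ordered sort keys inside it."""

    def __init__(self):
        self._sort_keys = defaultdict(list)  # pk -> sorted list of sort keys
        self._items = {}                     # (pk, sk) -> item

    def put(self, pk: str, sk: str, item: dict) -> None:
        if (pk, sk) not in self._items:
            bisect.insort(self._sort_keys[pk], sk)
        self._items[(pk, sk)] = item

    def query(self, pk: str, sk_from: str, sk_to: str) -> list:
        """Touch one partition only; binary-search the inclusive sort key range."""
        keys = self._sort_keys.get(pk, [])
        lo = bisect.bisect_left(keys, sk_from)
        hi = bisect.bisect_right(keys, sk_to)
        return [self._items[(pk, sk)] for sk in keys[lo:hi]]

    def scan(self, predicate) -> list:
        """Examine every item in every partition, then filter."""
        return [item for item in self._items.values() if predicate(item)]


table = TinyTable()
table.put("user#123", "2025-01-15", {"total": 49.99})
table.put("user#123", "2025-02-20", {"total": 12.00})
table.put("user#123", "2025-06-01", {"total": 5.00})
table.put("user#456", "2025-02-01", {"total": 99.00})

print(table.query("user#123", "2025-01-01", "2025-03-31"))
# [{'total': 49.99}, {'total': 12.0}]
print(len(table.scan(lambda item: item["total"] > 10)))  # 3
```

`query` never looks outside one partition, while `scan` walks the whole table, which is the cost difference the notes warn about.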

SAA Patterns¶

Scenario	Answer
Serverless NoSQL, millisecond latency	DynamoDB
Reduce DynamoDB read latency further	DAX (microseconds)
Trigger Lambda when record changes	DynamoDB Streams + Lambda
Query on non-key attribute	Global Secondary Index (GSI)
Store session state for web app	DynamoDB (or ElastiCache)
IoT device telemetry, high write throughput	DynamoDB with time-series key design
Cost-optimise unpredictable DynamoDB traffic	On-Demand capacity mode
Cost-optimise predictable DynamoDB traffic	Provisioned + Auto Scaling

Self-Quiz¶

What's the difference between a Partition Key and a Sort Key?
Why should a partition key have high cardinality?
What's the difference between a GSI and an LSI?
When would you choose strongly consistent reads over eventually consistent?
What does DynamoDB Streams capture and what can you do with it?
When would you use DAX and when wouldn't you?
You need to query all orders by a customer's email (not the PK). What do you add?
What's the maximum item size and why does it matter for design?