DynamoDB

From AWS Re/Start (August 2025) and apprenticeship coursework.


Ultra-Short Summary

DynamoDB is AWS's fully managed, serverless NoSQL database: single-digit millisecond latency at scale, with no infrastructure to manage. It requires a fundamentally different mental model from SQL: you model your data around access patterns upfront, not around normalised tables and joins. Get the partition key right and traffic spreads evenly across partitions as the table grows; get it wrong and you hit hot partitions.


Core Concepts

Table      -> top-level container (like a SQL table, but flexible schema)
Item       -> a single record (like a row) -- each can have different attributes
Attribute  -> a field on an item (like a column, but not required on every item)

Every item needs a primary key:
  Partition Key  -> required, determines which partition stores the item
  Sort Key       -> optional per table (if the table defines one, every item must have it), orders items within a partition

Partition Key and Sort Key

Partition Key only (simple primary key):
  Table: Users
  PK: user_id
  Each user_id maps to exactly one item

Partition Key + Sort Key (composite primary key):
  Table: Orders
  PK: user_id    (partition)
  SK: order_date (sort)
  One user can have many orders, sorted by date within their partition
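
As a rough sketch, the two key layouts above map onto the KeySchema and AttributeDefinitions arguments of the CreateTable API along these lines. The helper name table_definition is hypothetical, and the string key type and on-demand billing mode are assumptions made for illustration.

```python
def table_definition(table_name, partition_key, sort_key=None):
    """Build CreateTable arguments for a simple or composite primary key.

    Assumes string-typed keys ("S") and on-demand billing; both are
    illustrative choices, not requirements.
    """
    key_schema = [{"AttributeName": partition_key, "KeyType": "HASH"}]
    attribute_definitions = [{"AttributeName": partition_key, "AttributeType": "S"}]
    if sort_key is not None:
        # The sort key is declared with KeyType RANGE.
        key_schema.append({"AttributeName": sort_key, "KeyType": "RANGE"})
        attribute_definitions.append({"AttributeName": sort_key, "AttributeType": "S"})
    return {
        "TableName": table_name,
        "KeySchema": key_schema,
        "AttributeDefinitions": attribute_definitions,
        "BillingMode": "PAY_PER_REQUEST",
    }


users = table_definition("Users", "user_id")                   # simple primary key
orders = table_definition("Orders", "user_id", "order_date")   # composite primary key
```

With boto3 installed and credentials configured, the dict could be passed as boto3.client("dynamodb").create_table(**orders). Only key attributes are declared; every other attribute stays schemaless.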

High cardinality matters: The partition key determines which physical partition holds the item. If many items share the same partition key (e.g. country = "GB"), one partition gets all the traffic — a "hot partition". Use high-cardinality values like user_id, device_id, or UUID.
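
The effect of cardinality can be illustrated with a toy hash-based placement. This is only a sketch: DynamoDB's internal hash function and partition count are not public, so SHA-256 and the four partitions below are stand-in assumptions, and partition_for and spread are hypothetical helper names.

```python
import hashlib
from collections import Counter


def partition_for(key, partitions=4):
    """Map a partition key value to a partition number (illustrative hash only)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partitions


def spread(keys, partitions=4):
    """Count how many requests land on each partition."""
    return Counter(partition_for(k, partitions) for k in keys)


# 1000 requests that all use the same key value hit a single partition.
low_cardinality = spread(["GB"] * 1000)
# 1000 requests with distinct user IDs are spread across partitions.
high_cardinality = spread(f"user#{n}" for n in range(1000))

print(low_cardinality)
print(high_cardinality)
```

The first counter has a single entry holding all 1000 requests, while the second is split roughly evenly.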


Data Model Example

Table: OrderHistory

PK (user_id)  | SK (order_id)    | total | status   | items
---------------------------------------------------------------------------
user#123      | order#2025-001   | 49.99 | shipped  | [{...}]
user#123      | order#2025-002   | 12.00 | pending  | [{...}]
user#456      | order#2025-003   | 99.00 | complete | [{...}]

Query: "Get all orders for user#123"
  -> KeyConditionExpression: PK = "user#123"
  -> Returns both items, sorted by SK

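Query: "Get all orders between Jan-Mar 2025 for user#123"
  -> Only works if the sort key embeds the order date, e.g. "order#2025-01-15#001" (the sequence-style IDs above do not encode a month)
  -> KeyConditionExpression: PK = "user#123" AND SK BETWEEN "order#2025-01" AND "order#2025-04"
  -> BETWEEN is inclusive and compares strings byte by byte, so the upper bound is the start of April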
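
A minimal sketch of how such key conditions might be expressed as arguments for the low-level Query API. The attribute names user_id and order_id come from the table header above; the helper name orders_query is hypothetical, and the range shown simply spans two of the sample sort keys.

```python
def orders_query(user_id, sk_from=None, sk_to=None):
    """Build Query arguments: all orders for a user, optionally within a sort-key range."""
    condition = "user_id = :pk"
    values = {":pk": {"S": user_id}}
    if sk_from is not None and sk_to is not None:
        # BETWEEN is inclusive on both ends and compares strings byte by byte.
        condition += " AND order_id BETWEEN :lo AND :hi"
        values[":lo"] = {"S": sk_from}
        values[":hi"] = {"S": sk_to}
    return {
        "TableName": "OrderHistory",
        "KeyConditionExpression": condition,
        "ExpressionAttributeValues": values,
    }


all_orders = orders_query("user#123")
ranged = orders_query("user#123", "order#2025-001", "order#2025-002")
```

With boto3 available, either dict could be passed as boto3.client("dynamodb").query(**all_orders). A key condition must always include an equality test on the partition key; the sort-key part is optional.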

Reads and Writes

Read Consistency

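Mode                  | What It Means                                 | Cost
--------------------------------------------------------------------------------------
Eventually Consistent | May return slightly stale data (milliseconds) | 0.5 RCU per 4KB
Strongly Consistent   | Always returns the latest data                | 1 RCU per 4KB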

For most reads, eventually consistent is fine and cheaper. Use strong consistency when your app logic depends on reading the data it just wrote.

Capacity Modes

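Mode        | How It Works                                       | Best For
-----------------------------------------------------------------------------------------------------
On-Demand   | Auto-scales, pay per request                       | Unpredictable traffic, new tables
Provisioned | You set RCU/WCU, pay for the capacity you provision | Predictable traffic, cost optimisation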

RCU = Read Capacity Unit  = 1 strongly consistent read (or 2 eventually consistent reads) per second, for an item up to 4KB
WCU = Write Capacity Unit = 1 write per second, for an item up to 1KB
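
The unit definitions can be turned into a small capacity estimate. This is a sketch under the standard rounding rules (item size rounds up to the next 4KB for reads and the next 1KB for writes); the function names are hypothetical and transactional requests, which cost more, are ignored.

```python
import math


def read_capacity_units(item_bytes, reads_per_second, strongly_consistent=True):
    """RCUs needed: each read is rounded up to 4KB blocks; eventual reads cost half."""
    blocks = max(1, math.ceil(item_bytes / 4096))
    units = blocks * reads_per_second
    return units if strongly_consistent else math.ceil(units / 2)


def write_capacity_units(item_bytes, writes_per_second):
    """WCUs needed: each write is rounded up to 1KB blocks."""
    blocks = max(1, math.ceil(item_bytes / 1024))
    return blocks * writes_per_second


print(read_capacity_units(6 * 1024, 10))                             # 20
print(read_capacity_units(6 * 1024, 10, strongly_consistent=False))  # 10
print(write_capacity_units(2560, 10))                                # 30
```

A 6KB item spans two 4KB blocks, so ten strongly consistent reads per second need 20 RCU and ten eventually consistent reads need 10. A 2.5KB item rounds up to three 1KB blocks per write.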


Secondary Indexes

What if you need to query on something other than the primary key?

Index Type                   | Partition Key    | Sort Key            | Scope            | Count
--------------------------------------------------------------------------------------------------
GSI (Global Secondary Index) | Any attribute    | Any attribute       | Whole table      | Up to 20
LSI (Local Secondary Index)  | Same as table PK | Different attribute | Single partition | Up to 5

Table: Orders (PK: user_id, SK: order_id)

GSI example: "Get all orders with status = 'pending'"
  GSI PK: status
  GSI SK: created_at
  -> Query GSI, not the main table
  -> Has its own RCU/WCU (separate from table)

LSI example: "Get all orders for user#123 sorted by total price"
  LSI SK: total
  -> Must be created at table creation time
  -> Shares RCU/WCU with the main table
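
One way to picture a GSI is as a second copy of the data, re-keyed on different attributes. The sketch below simulates that in memory; it is not how DynamoDB implements indexes, the created_at values are made up for illustration, and build_index is a hypothetical helper.

```python
from collections import defaultdict

orders = [
    {"user_id": "user#123", "order_id": "order#2025-001", "status": "shipped", "created_at": "2025-01-10"},
    {"user_id": "user#123", "order_id": "order#2025-002", "status": "pending", "created_at": "2025-02-03"},
    {"user_id": "user#456", "order_id": "order#2025-003", "status": "pending", "created_at": "2025-01-22"},
]


def build_index(items, partition_attr, sort_attr):
    """Group items by the index partition key, sorted by the index sort key."""
    index = defaultdict(list)
    for item in items:
        # Items missing either index key attribute are left out (a sparse index).
        if partition_attr in item and sort_attr in item:
            index[item[partition_attr]].append(item)
    for bucket in index.values():
        bucket.sort(key=lambda it: it[sort_attr])
    return index


status_index = build_index(orders, "status", "created_at")
pending = [o["order_id"] for o in status_index["pending"]]
print(pending)  # ['order#2025-003', 'order#2025-002']
```

The pending orders come back across users and in created_at order, which the base table keyed on user_id could only answer with a scan.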

DynamoDB Streams

Change data capture: once a stream is enabled on a table, every item-level change generates a stream record (retained for 24 hours):

Item written/updated/deleted -> Stream record created

Stream record contains (depending on the view type chosen when the stream is enabled):
  -> KEYS_ONLY: just the PK and SK
  -> NEW_IMAGE: the full item after the change
  -> OLD_IMAGE: the full item before the change
  -> NEW_AND_OLD_IMAGES: both

Common patterns:
  -> Trigger Lambda on every order created (NEW_IMAGE)
  -> Replicate changes to another table (cross-region)
  -> Audit log of all changes
  -> Search index sync (DynamoDB -> OpenSearch)
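
The first pattern might look roughly like this as a Lambda handler. It is a sketch: the event layout follows the documented shape of DynamoDB Streams records delivered to Lambda, while the status attribute and the sample event are borrowed from the order example purely for illustration.

```python
def handler(event, context):
    """Collect the keys of newly inserted items from a DynamoDB Streams batch."""
    created = []
    for record in event.get("Records", []):
        if record.get("eventName") != "INSERT":
            continue  # skip MODIFY and REMOVE events
        change = record["dynamodb"]
        # NewImage is present only when the stream view type includes new images.
        new_image = change.get("NewImage", {})
        created.append({
            "keys": change["Keys"],
            "status": new_image.get("status", {}).get("S"),
        })
    return created


sample_event = {
    "Records": [
        {"eventName": "INSERT",
         "dynamodb": {"Keys": {"user_id": {"S": "user#123"}, "order_id": {"S": "order#2025-002"}},
                      "NewImage": {"status": {"S": "pending"}}}},
        {"eventName": "REMOVE",
         "dynamodb": {"Keys": {"user_id": {"S": "user#456"}, "order_id": {"S": "order#2025-003"}}}},
    ]
}
print(handler(sample_event, None))
```

Attribute values arrive in DynamoDB's typed JSON form ({"S": ...} for strings), so they need unwrapping before use.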

DAX — DynamoDB Accelerator

In-memory cache for DynamoDB — microsecond response times:

Without DAX:
  App -> DynamoDB (single-digit milliseconds)

With DAX:
  App -> DAX cache (microseconds for cache hits)
       -> DynamoDB (cache miss only)

Use when: read-heavy workloads, same item queried repeatedly (e.g. product catalogue)
Not useful for: write-heavy workloads, strongly consistent reads (DAX passes these through to DynamoDB and does not cache the result)
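
The hit and miss flow above is ordinary read-through caching, which can be sketched in a few lines. This is not the DAX client: ReadThroughCache is a hypothetical class, the TTL is arbitrary, and a plain dict stands in for the DynamoDB table.

```python
import time


class ReadThroughCache:
    """Tiny read-through cache with a TTL, standing in for the idea behind DAX."""

    def __init__(self, fetch, ttl_seconds=300.0):
        self._fetch = fetch      # function that reads from the backing store
        self._ttl = ttl_seconds
        self._entries = {}       # key -> (expires_at, value)
        self.misses = 0

    def get(self, key):
        entry = self._entries.get(key)
        if entry is not None and entry[0] > time.monotonic():
            return entry[1]      # cache hit: backing store is not touched
        self.misses += 1
        value = self._fetch(key)  # cache miss: read through to the store
        self._entries[key] = (time.monotonic() + self._ttl, value)
        return value


catalogue = {"PRODUCT#abc": {"name": "Widget", "price": 9.99}}  # stand-in for the table
cache = ReadThroughCache(catalogue.get)
cache.get("PRODUCT#abc")
cache.get("PRODUCT#abc")
print(cache.misses)  # 1
```

Only the first lookup reaches the backing store. A cached value can be stale until its TTL expires, which is why this approach suits eventually consistent reads.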

Key Design Patterns

Single-Table Design

Advanced pattern: store multiple entity types in one table using composite keys:

PK            | SK              | type    | data...
-----------------------------------------------------------------
USER#123      | USER#123        | user    | {name, email}
USER#123      | ORDER#2025-001  | order   | {total, status}
USER#123      | ORDER#2025-002  | order   | {total, status}
PRODUCT#abc   | PRODUCT#abc     | product | {name, price}

"Get user + all their orders":
  Query: PK = "USER#123"
  Returns user item AND all order items in one query -- no JOIN needed

This is efficient but complex. Start with separate tables unless performance demands it.
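
A small in-memory sketch of the lookup above, assuming the same PK and SK values as the sample table. The query helper is hypothetical; its prefix filter mimics what a begins_with(SK, ...) key condition would do.

```python
table = [
    {"PK": "USER#123", "SK": "USER#123", "type": "user"},
    {"PK": "USER#123", "SK": "ORDER#2025-001", "type": "order"},
    {"PK": "USER#123", "SK": "ORDER#2025-002", "type": "order"},
    {"PK": "PRODUCT#abc", "SK": "PRODUCT#abc", "type": "product"},
]


def query(items, pk, sk_prefix=""):
    """Return items in one partition, sorted by SK, optionally filtered by SK prefix."""
    matches = [it for it in items if it["PK"] == pk and it["SK"].startswith(sk_prefix)]
    return sorted(matches, key=lambda it: it["SK"])


everything = query(table, "USER#123")
orders_only = query(table, "USER#123", sk_prefix="ORDER#")

print([it["SK"] for it in everything])
# ['ORDER#2025-001', 'ORDER#2025-002', 'USER#123']
print([it["SK"] for it in orders_only])
# ['ORDER#2025-001', 'ORDER#2025-002']
```

Because results come back in sort-key order, the entity prefixes decide where each type appears: here the orders sort before the user item.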

Time-Series Pattern

PK: device_id
SK: timestamp (ISO 8601, e.g. "2025-08-01T12:00:00Z")

Query: "Get all readings for device#xyz in the last hour"
  KeyConditionExpression: PK = "device#xyz" AND SK > "2025-08-01T11:00:00Z"
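
A sketch of building that condition in code, assuming the attributes are literally named device_id and timestamp. Because timestamp is a DynamoDB reserved word, it is aliased through ExpressionAttributeNames. The helper names are hypothetical, and a TableName would need to be added before sending the request.

```python
from datetime import datetime, timedelta, timezone


def iso_utc(moment):
    """Format a datetime as a fixed-width UTC ISO 8601 string."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def last_hour_condition(device_id, now):
    """Build Query arguments for readings newer than one hour before `now`."""
    return {
        "KeyConditionExpression": "device_id = :pk AND #ts > :since",
        "ExpressionAttributeNames": {"#ts": "timestamp"},
        "ExpressionAttributeValues": {
            ":pk": {"S": device_id},
            ":since": {"S": iso_utc(now - timedelta(hours=1))},
        },
    }


now = datetime(2025, 8, 1, 12, 0, 0, tzinfo=timezone.utc)
params = last_hour_condition("device#xyz", now)
print(params["ExpressionAttributeValues"][":since"])  # {'S': '2025-08-01T11:00:00Z'}
```

String comparison only matches chronological order when every timestamp uses the same fixed-width format and time zone, which is why the helper normalises to UTC.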

Limits to Know

Limit                    | Value
--------------------------------------------------------------------
Max item size            | 400KB
Max partition key length | 2048 bytes
Max sort key length      | 1024 bytes
GSIs per table           | 20
LSIs per table           | 5
Table names              | Must be unique per account per region

Mental Model

DynamoDB = a distributed hash map at cloud scale

SQL thinking:          "What data do I have? Design tables, then write queries."
DynamoDB thinking:     "What queries will my app make? Design the keys to answer them."

The access pattern IS the data model.

If you know: "I'll always look up orders by user_id, sorted by date"
Then:         PK = user_id, SK = order_date -- done, it's fast

If you later add a query your keys weren't designed for, you need a GSI or a full table scan
(full scans read every item, so they are slow and expensive -- avoid them)
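
The "distributed hash map" picture can be sketched as a dict of sorted partitions. This is a toy model, not DynamoDB's implementation: TinyTable is a hypothetical class, and the order dates are made up for illustration.

```python
import bisect


class TinyTable:
    """Hash map of partitions; each partition keeps (sort_key, item) pairs in order."""

    def __init__(self):
        self._partitions = {}

    def put(self, pk, sk, item):
        rows = self._partitions.setdefault(pk, [])
        keys = [k for k, _ in rows]
        pos = bisect.bisect_left(keys, sk)
        if pos < len(rows) and rows[pos][0] == sk:
            rows[pos] = (sk, item)   # same primary key: overwrite
        else:
            rows.insert(pos, (sk, item))

    def query(self, pk, sk_from="", sk_to=None):
        """Keyed path: one hash lookup, then a slice of the sorted partition."""
        rows = self._partitions.get(pk, [])
        keys = [k for k, _ in rows]
        lo = bisect.bisect_left(keys, sk_from)
        hi = len(rows) if sk_to is None else bisect.bisect_right(keys, sk_to)
        return [item for _, item in rows[lo:hi]]

    def scan(self, predicate):
        """Unkeyed path: touches every item in every partition."""
        return [item for rows in self._partitions.values()
                for _, item in rows if predicate(item)]


table = TinyTable()
table.put("user#123", "2025-02-01", {"total": 12.00})
table.put("user#123", "2025-01-15", {"total": 49.99})
table.put("user#456", "2025-03-09", {"total": 99.00})

print(table.query("user#123"))  # [{'total': 49.99}, {'total': 12.0}]
```

The query method only ever looks inside one partition, while scan has to walk all of them. Any question the keys cannot answer falls back to the second path unless another index is built.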

SAA Patterns

Scenario                                      | Answer
-----------------------------------------------------------------------------------------
Serverless NoSQL, millisecond latency         | DynamoDB
Reduce DynamoDB read latency further          | DAX (microseconds)
Trigger Lambda when record changes            | DynamoDB Streams + Lambda
Query on non-key attribute                    | Global Secondary Index (GSI)
Store session state for web app               | DynamoDB (or ElastiCache)
IoT device telemetry, high write throughput   | DynamoDB with time-series key design
Cost-optimise unpredictable DynamoDB traffic  | On-Demand capacity mode
Cost-optimise predictable DynamoDB traffic    | Provisioned + Auto Scaling

Self-Quiz

  1. What's the difference between a Partition Key and a Sort Key?
  2. Why should a partition key have high cardinality?
  3. What's the difference between a GSI and an LSI?
  4. When would you choose strongly consistent reads over eventually consistent?
  5. What does DynamoDB Streams capture and what can you do with it?
  6. When would you use DAX and when wouldn't you?
  7. You need to query all orders by a customer's email (not the PK). What do you add?
  8. What's the maximum item size and why does it matter for design?