
Storage Fundamentals

Notes from AWS Apprenticeship — February 2026.


Ultra-Short Summary

Storage underpins everything in the cloud: your code, logs, data, backups, secrets. Understanding what can go wrong (and how AWS guards against it) gives you the mental model for why features like S3 versioning, EBS snapshots, and RDS automated backups exist. The GitLab and Toy Story 2 incidents are the canonical examples: even well-designed systems lose, or nearly lose, data when backup processes are never validated.


Why Storage Matters More Than You Think

The GitLab Incident (2017)

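  • An engineer ran rm -rf on the primary production database's data directory, believing they were on the secondary
  • GitLab had 5 backup and replication mechanisms, and none worked reliably
  • Root cause: backups were configured but never tested by restoring from them
  • Lost about 6 hours of production data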

Toy Story 2 (1998)

  • Engineer ran rm -rf * on the animation directory
  • Pixar nearly lost the entire film
  • The studio's own backups turned out to have been failing; the film was saved only because one employee had an informal copy on her home machine

Lesson: A backup that hasn't been tested is not a backup. It's a guess.
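
One way to act on this lesson is to make a restore part of every backup run. The sketch below is a minimal illustration using only the Python standard library, not a production tool: it archives a directory, restores it into a scratch directory, and compares SHA-256 checksums. The function names (checksums, backup, restore_matches_source) are hypothetical and were chosen for this example.

```python
import hashlib
import tarfile
import tempfile
from pathlib import Path


def checksums(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def backup(source: Path, archive: Path) -> None:
    """Write the contents of source into a gzip-compressed tar archive."""
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(source, arcname=".")


def restore_matches_source(source: Path, archive: Path) -> bool:
    """Restore the archive into a scratch directory and compare checksums."""
    with tempfile.TemporaryDirectory() as scratch:
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(scratch)
        return checksums(Path(scratch)) == checksums(source)
```

Keep the archive outside the source directory, otherwise the backup would try to include itself. A real check would also restore onto a separate machine and start the application against the restored data.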


Backup Types

Type          How It Works                                     Use Case
Full backup   Copies everything                                Weekly/monthly baseline
Incremental   Only changes since the last backup of any type   Daily; saves time and storage
Differential  All changes since the last full backup           Faster restore: full + latest differential only
Snapshot      Point-in-time copy of a volume/DB                AWS EBS, RDS, FSx
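
The difference between the incremental and differential rows comes down to which cutoff is used when selecting files. The sketch below is a simplified model, assuming change tracking by day number; FileInfo and files_to_copy are hypothetical names for this example, and real tools track changes by timestamps, archive bits, or block maps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileInfo:
    name: str
    modified_day: int  # day number on which the file last changed


def files_to_copy(files, kind, last_full_day, last_backup_day):
    """Return the names of the files a backup of the given kind would copy."""
    if kind == "full":
        return [f.name for f in files]
    if kind == "incremental":
        cutoff = last_backup_day  # changes since the most recent backup of any type
    elif kind == "differential":
        cutoff = last_full_day  # changes since the most recent full backup
    else:
        raise ValueError(f"unknown backup kind: {kind}")
    return [f.name for f in files if f.modified_day > cutoff]


# Full backup taken on day 0, incremental backups on days 1 and 2, today is day 3.
files = [FileInfo("a.db", 0), FileInfo("b.cfg", 1), FileInfo("c.log", 3)]
print(files_to_copy(files, "incremental", last_full_day=0, last_backup_day=2))   # ['c.log']
print(files_to_copy(files, "differential", last_full_day=0, last_backup_day=2))  # ['b.cfg', 'c.log']
```

The differential copy is larger, but a restore needs only the full backup plus that one differential. An incremental restore has to replay every increment taken since the full backup.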

AWS Backup Services

Service                      What It Backs Up
S3 Versioning                Objects: keeps every version of each object, so any earlier version can be restored
EBS Snapshots                Block storage volumes: incremental, stored in S3
RDS Automated Backups        Database: daily snapshot + transaction logs (point-in-time recovery)
RDS Snapshots                Manual point-in-time DB snapshot
AWS Backup                   Centralised backup policy across EC2, EBS, RDS, DynamoDB, EFS
S3 Cross-Region Replication  Replicates objects to another region (DR)
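
To see why S3 versioning protects against accidental deletion: in a versioned bucket a plain delete does not erase anything, it adds a delete marker on top of the version stack, and removing that marker makes the previous version current again. The class below is a toy in-memory model of that behaviour, not the S3 API; VersionedBucket and its method names are made up for this illustration.

```python
class VersionedBucket:
    """Toy in-memory model of a versioned bucket (not the S3 API)."""

    _DELETE_MARKER = object()

    def __init__(self):
        self._versions = {}  # key -> list of versions, oldest first

    def put(self, key, data):
        # Every write adds a new version instead of overwriting the old one.
        self._versions.setdefault(key, []).append(data)

    def delete(self, key):
        # A plain delete hides the object behind a marker; nothing is erased.
        self._versions.setdefault(key, []).append(self._DELETE_MARKER)

    def get(self, key):
        versions = self._versions.get(key, [])
        if not versions or versions[-1] is self._DELETE_MARKER:
            raise KeyError(key)
        return versions[-1]

    def undelete(self, key):
        # Removing the marker makes the previous version current again.
        versions = self._versions.get(key, [])
        if versions and versions[-1] is self._DELETE_MARKER:
            versions.pop()
```

In real S3, permanently removing data means deleting a specific version ID, which is the operation MFA Delete is designed to guard.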

Storage Classes (S3)

Frequency of access → cost tradeoff:

Standard            → frequent access, highest storage cost, no retrieval fee
Standard-IA         → infrequent access, lower storage cost, per-GB retrieval fee
One Zone-IA         → infrequent access, single AZ (lower availability; data is lost if that AZ is destroyed)
Glacier Instant     → archive, millisecond retrieval
Glacier Flexible    → archive, retrieval in minutes to hours
Glacier Deep        → long-term archive, cheapest storage, retrieval within 12 hours (standard) or 48 hours (bulk)
Intelligent-Tiering → auto-moves objects between access tiers based on access patterns
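
In practice, objects are usually moved down these tiers by a lifecycle rule rather than by hand. The sketch below builds such a rule as a plain Python dict in the shape the S3 lifecycle API expects. The rule ID, the logs/ prefix, and the day thresholds are assumptions chosen for illustration; the block only prints the configuration and does not call AWS.

```python
import json

# Hypothetical rule: age log objects into cheaper storage classes over time.
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "age-out-logs",           # hypothetical rule name
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},  # hypothetical key prefix
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},    # Standard-IA
                {"Days": 90, "StorageClass": "GLACIER"},        # Glacier Flexible Retrieval
                {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},  # Glacier Deep Archive
            ],
        }
    ]
}

print(json.dumps(lifecycle_configuration, indent=2))
```

With boto3 installed and credentials configured, a dict like this could be passed as the LifecycleConfiguration argument of the S3 client's put_bucket_lifecycle_configuration call. Check current minimum-storage-duration charges before picking thresholds.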

Mental Model: Storage Types in AWS

Need block storage (like a hard drive)?
  → EBS (attach to one EC2) or Instance Store (ephemeral, fast)

Need shared file system (multiple EC2s access same files)?
  → EFS (Linux) or FSx for Windows

Need object storage (files, backups, static assets)?
  → S3

Need data warehouse (analytics, SQL on PB of data)?
  → Redshift

Shell Profile Files (Side Note)

Mentioned during class — relevant when automating backup scripts:

File             When It Loads
~/.bash_profile  Login shells (SSH, console login)
~/.bashrc        Interactive non-login shells (new terminal tab)
~/.zshrc         Zsh equivalent of .bashrc

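Scripts that run on boot or via cron load none of these files automatically: cron starts a non-interactive, non-login shell with a minimal environment. Set the variables a backup script needs in the script itself or in the crontab, or source the relevant file explicitly.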
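
Because a scheduled job may start with far fewer environment variables than an interactive terminal, one defensive option is for the backup script to check its own settings and fail loudly when something is missing. A minimal sketch follows; the variable names BACKUP_SOURCE and BACKUP_BUCKET and the function backup_settings are hypothetical.

```python
import os


def backup_settings(env=os.environ):
    """Read required settings from the environment, failing loudly if any are absent."""
    required = ("BACKUP_SOURCE", "BACKUP_BUCKET")  # hypothetical variable names
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in required}
```

A script that calls backup_settings() first stops with a clear error under cron instead of silently backing up nothing.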


AWS Context

  • S3 durability = 99.999999999% (11 nines) by design: most storage classes store data redundantly across at least three AZs
  • EBS: block storage, attached to one instance, persists independently of the instance
  • Instance Store: ephemeral, physically attached to the host, lost on stop/terminate (survives a reboot), so don't store anything important
  • RDS point-in-time recovery: restore to any second within the backup retention window, up to roughly the last five minutes
  • S3 MFA Delete: require MFA to permanently delete object versions or to change the bucket's versioning state
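
To put the 11 nines figure in concrete terms, the arithmetic below treats it as an annual, per-object durability target (an assumption about how to read the number, and a design goal rather than a guarantee). It uses exact fractions to avoid floating-point noise at this scale; the function names are made up for the example.

```python
from fractions import Fraction

# 99.999999999% expressed exactly, read as a per-object, per-year figure.
DURABILITY = Fraction(99_999_999_999, 100_000_000_000)
ANNUAL_LOSS_PROBABILITY = 1 - DURABILITY  # 1 in 100 billion


def expected_annual_losses(object_count):
    """Expected number of objects lost per year for a given object count."""
    return object_count * ANNUAL_LOSS_PROBABILITY


def years_per_lost_object(object_count):
    """Average number of years between single-object losses."""
    return 1 / expected_annual_losses(object_count)


print(years_per_lost_object(10_000_000))  # 10000
```

So with ten million objects stored, the expected loss is one object roughly every 10,000 years. This covers storage-level loss only; it says nothing about accidental deletes or overwrites, which is what versioning and backups are for.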

30-Second Takeaway

  • Test your backups by restoring from them. A backup you've never restored is a guess.
  • S3 = durable, cheap object storage with optional versioning
  • EBS = persistent block storage for EC2
  • Instance Store = fast but ephemeral — never for important data
  • AWS Backup = centralised policy management across services

Self-Quiz

  1. What's the difference between incremental and differential backups?
  2. Why did GitLab still lose data despite having 5 backup mechanisms?
  3. Which S3 storage class would you use for logs accessed once a month?
  4. What's the difference between EBS and Instance Store?
  5. How does S3 versioning protect against accidental deletion?
  6. What does 11 nines of durability actually mean?
  7. Which AWS service centralises backup across EC2, RDS, and DynamoDB?