Storage Fundamentals
Notes from AWS Apprenticeship — February 2026.
Ultra-Short Summary
Storage underpins everything in the cloud: your code, logs, data, backups, and secrets. Understanding what can go wrong (and how AWS guards against it) gives you the mental model for why services like S3 versioning, EBS snapshots, and RDS automated backups exist. The GitLab and Toy Story 2 incidents are the canonical examples: even well-designed systems lose data when backup processes aren't validated.
Why Storage Matters More Than You Think
The GitLab Incident (2017)
- A production engineer ran `rm -rf` on the production database
- GitLab had 5 backup mechanisms; none of them worked correctly
- Root cause: backups configured but never tested
- Lost 6 hours of production data
Toy Story 2 (1998)
- An engineer ran `rm -rf *` on the animation directory
- Pixar nearly lost the entire film
- Saved only because one employee had an informal copy on her home machine
Lesson: A backup that hasn't been tested is not a backup. It's a guess.
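"Tested" means you actually restored the backup and compared it with the source. A minimal sketch of that check for a directory backup, using only the Python standard library; the tarball format and the function names (`backup`, `verify_restore`, `checksums`) are illustrative assumptions, not how GitLab, Pixar, or AWS run their backups.

```python
import hashlib
import tarfile
import tempfile
from pathlib import Path


def checksums(root: Path) -> dict[str, str]:
    """Map every file under root (as a relative POSIX path) to its SHA-256 digest."""
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def backup(source: Path, archive: Path) -> None:
    """Write a full backup of every file under source into a gzip-compressed tarball."""
    with tarfile.open(archive, "w:gz") as tar:
        for path in sorted(source.rglob("*")):
            if path.is_file():
                tar.add(path, arcname=path.relative_to(source).as_posix())


def verify_restore(source: Path, archive: Path) -> bool:
    """Restore the archive into a scratch directory and compare it file by file."""
    # The "data" extraction filter refuses members that would land outside the
    # scratch directory; it only exists on newer Python releases, so fall back.
    safe = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tempfile.TemporaryDirectory() as scratch:
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(scratch, **safe)
        return checksums(Path(scratch)) == checksums(source)
```

The important part is reading the backup back and comparing it with live data: a job that only writes archives would happily keep producing empty or truncated ones.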
Backup Types
| Type | How It Works | Use Case |
|---|---|---|
| Full backup | Copies everything | Weekly/monthly baseline |
| Incremental | Only changes since the last backup of any type (full or incremental) | Daily: saves time and storage |
| Differential | Changes since last full backup | Faster restore than incremental |
| Snapshot | Point-in-time copy of a volume/DB | AWS EBS, RDS, FSx |
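The difference between incremental and differential shows up in what each night copies and in what a restore must read. A small model of one week, assuming a full backup on night 0 and a single strategy afterwards; file names and the function names are hypothetical.

```python
def nightly_backups(all_files: set[str], daily_changes: list[set[str]], strategy: str) -> list[set[str]]:
    """Return the set of files each night's backup copies.

    Night 0 is always a full backup; daily_changes[i] holds the files modified
    during the day before night i + 1.
    """
    if strategy not in ("incremental", "differential"):
        raise ValueError(f"unknown strategy: {strategy!r}")
    runs = [set(all_files)]                  # full backup baseline
    changed_since_full: set[str] = set()
    for changed in daily_changes:
        changed_since_full |= changed
        if strategy == "incremental":
            runs.append(set(changed))             # only what changed since the previous backup
        else:
            runs.append(set(changed_since_full))  # everything changed since the last full
    return runs


def runs_needed_to_restore(runs: list[set[str]], strategy: str) -> list[int]:
    """Indices of the backups you must read to restore the most recent night."""
    last = len(runs) - 1
    if strategy == "incremental":
        return list(range(last + 1))         # the full plus every incremental, in order
    return [0] if last == 0 else [0, last]   # the full plus only the latest differential
```

With files `{"a", "b", "c"}` and daily changes `[{"a"}, {"b"}, {"a"}]`, the incremental nights copy `{"a"}`, `{"b"}`, `{"a"}` but a restore needs all four backups; the differential nights copy `{"a"}`, `{"a", "b"}`, `{"a", "b"}` and a restore needs only the full plus the last one.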
AWS Backup Services
| Service | What It Backs Up |
|---|---|
| S3 Versioning | Objects — keeps all versions, restores from any point |
| EBS Snapshots | Block storage volumes — incremental, stored in S3 |
| RDS Automated Backups | Database — daily backup + transaction logs (point-in-time recovery) |
| RDS Snapshots | Manual point-in-time DB snapshot |
| AWS Backup | Centralised backup policy across EC2, EBS, RDS, DynamoDB, EFS |
| S3 Cross-Region Replication | Replicates objects to another region (DR) |
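How S3 versioning "restores from any point" is easiest to see with delete markers. This is a toy in-memory model, not the boto3/S3 API: the class and method names are hypothetical, and version IDs here are small integers whereas S3 returns opaque strings. What it does mirror is the behaviour: a plain delete on a versioned bucket only adds a delete marker, and removing that marker makes the previous version current again.

```python
from typing import Optional


class VersionedBucket:
    """Toy model of a versioning-enabled S3 bucket (hypothetical, not the real API)."""

    def __init__(self) -> None:
        self._next_id = 1
        # key -> list of (version_id, body); a body of None is a delete marker
        self._history: dict[str, list[tuple[int, Optional[bytes]]]] = {}

    def put(self, key: str, body: bytes) -> int:
        return self._append(key, body)

    def delete(self, key: str) -> int:
        """A plain delete only adds a delete marker; older versions are kept."""
        return self._append(key, None)

    def get(self, key: str, version_id: Optional[int] = None) -> bytes:
        history = self._history.get(key, [])
        if version_id is None:
            # The newest entry is the current version; a delete marker means "not found".
            if not history or history[-1][1] is None:
                raise KeyError(key)
            return history[-1][1]
        for vid, body in history:
            if vid == version_id and body is not None:
                return body
        raise KeyError((key, version_id))

    def delete_version(self, key: str, version_id: int) -> None:
        """Permanently remove one version (the operation MFA Delete can gate)."""
        self._history[key] = [(v, b) for v, b in self._history[key] if v != version_id]

    def _append(self, key: str, body: Optional[bytes]) -> int:
        vid = self._next_id
        self._next_id += 1
        self._history.setdefault(key, []).append((vid, body))
        return vid
```

An accidental delete is therefore reversible; only an explicit per-version delete destroys data, which is why MFA Delete targets exactly that step.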
Storage Classes (S3)
Frequency of access → cost tradeoff:
Standard → frequent access, highest cost
Standard-IA → infrequent access, lower cost, retrieval fee
One Zone-IA → infrequent, single AZ (lower availability; data is lost if that AZ is destroyed)
Glacier Instant → archive, millisecond retrieval
Glacier Flexible → archive, minutes to hours
Glacier Deep Archive → long-term archive, cheapest, retrieval within 12 to 48 hours
Intelligent-Tiering → auto-moves objects based on access patterns
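Objects usually move down these tiers automatically through an S3 lifecycle configuration. A sketch of one rule as the Python dict boto3 expects; the `logs/` prefix, the rule ID, and every day count are hypothetical examples, not recommendations. The local check encodes two constraints: transitions must go to colder classes over time, and S3 does not allow a lifecycle transition to Standard-IA or One Zone-IA sooner than 30 days after creation.

```python
import json

# Hypothetical rule: prefix, ID and day counts are illustrative only.
lifecycle = {
    "Rules": [
        {
            "ID": "age-out-logs",
            "Filter": {"Prefix": "logs/"},
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},        # Glacier Flexible Retrieval
                {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
            ],
        }
    ]
}


def check_rule(rule: dict) -> None:
    """Sanity-check a rule locally before sending it to S3."""
    days = [t["Days"] for t in rule["Transitions"]]
    if days != sorted(set(days)):
        raise ValueError("transition days must be strictly increasing")
    for t in rule["Transitions"]:
        # Lifecycle transitions to the IA classes need at least 30 days.
        if t["StorageClass"] in ("STANDARD_IA", "ONEZONE_IA") and t["Days"] < 30:
            raise ValueError(f"{t['StorageClass']} needs Days >= 30")


for rule in lifecycle["Rules"]:
    check_rule(rule)
print(json.dumps(lifecycle, indent=2))

# With boto3 installed and credentials configured, the same dict could be applied with:
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="example-bucket", LifecycleConfiguration=lifecycle)
```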
Mental Model: Storage Types in AWS
Need block storage (like a hard drive)?
→ EBS (attach to one EC2) or Instance Store (ephemeral, fast)
Need shared file system (multiple EC2s access same files)?
→ EFS (Linux) or FSx for Windows
Need object storage (files, backups, static assets)?
→ S3
Need data warehouse (analytics, SQL on PB of data)?
→ Redshift
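The decision tree above fits in a small function. The parameter names (`kind`, `windows`, `scratch_only`) are hypothetical and chosen for illustration; the mapping itself is just the tree.

```python
def pick_storage(kind: str, *, windows: bool = False, scratch_only: bool = False) -> str:
    """Map a storage need onto the AWS service suggested by the decision tree.

    kind: "block", "file" (shared file system), "object", or "warehouse".
    windows: the shared file system serves Windows instances.
    scratch_only: block data may be lost on stop/terminate (caches, temp files).
    """
    if kind == "block":
        # Instance Store is fast but ephemeral, so only pick it for disposable data.
        return "Instance Store" if scratch_only else "EBS"
    if kind == "file":
        return "FSx for Windows File Server" if windows else "EFS"
    if kind == "object":
        return "S3"
    if kind == "warehouse":
        return "Redshift"
    raise ValueError(f"unknown storage need: {kind!r}")
```

Defaulting `scratch_only` to `False` is the deliberate choice here: block storage falls back to EBS unless you explicitly say the data can be lost.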
Shell Profile Files (Side Note)
Mentioned during class — relevant when automating backup scripts:
| File | When It Loads |
|---|---|
| `~/.bash_profile` | Login shells (SSH, console login) |
| `~/.bashrc` | Interactive non-login shells (e.g. a new terminal tab) |
| `~/.zshrc` | Every interactive Zsh shell, login or not (Zsh's closest equivalent to `.bashrc`) |
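Scripts that run at boot or via cron do not load any of these files: cron starts jobs with a minimal environment (typically just HOME, LOGNAME, SHELL, and a short PATH). Set the variables a backup script needs explicitly, in the script, in the crontab, or by sourcing a dedicated file.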
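Because cron runs a non-interactive, non-login shell, a backup job that relies on `~/.bash_profile` or `~/.bashrc` can silently run without its configuration. A minimal sketch of a defensive pattern, assuming a hypothetical dedicated env file such as `/etc/backup.env` and hypothetical variable names like `BACKUP_BUCKET`: load the file explicitly and refuse to run if anything required is missing. The parser is deliberately simple and is not a full shell-syntax parser.

```python
import os
from pathlib import Path


def load_env_file(path: Path) -> dict[str, str]:
    """Parse simple KEY=value lines, optionally prefixed with "export".

    No variable expansion and no multi-line values; a single pair of
    matching quotes around a value is stripped.
    """
    env: dict[str, str] = {}
    for raw in path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                              # skip blanks and comments
        if line.startswith("export "):
            line = line[len("export "):].lstrip()
        key, sep, value = line.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"not a KEY=value line: {raw!r}")
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        env[key.strip()] = value
    return env


def backup_environment(env_file: Path, required: list[str]) -> dict[str, str]:
    """Merge the file over the (minimal) cron environment and fail loudly on gaps."""
    env = {**os.environ, **load_env_file(env_file)}
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    return env


# Hypothetical usage from a cron job (path and variable name are examples):
# env = backup_environment(Path("/etc/backup.env"), ["BACKUP_BUCKET"])
```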
AWS Context
- S3 durability = designed for 99.999999999% (11 nines) per year; except for the One Zone classes, data is stored redundantly across at least three AZs
- EBS — block storage, attached to one instance, persists independently of the instance
- Instance Store: ephemeral and physically attached; survives a reboot but is lost on stop, hibernate, or terminate, so don't store anything important there
- RDS point-in-time recovery — restore to any second within the backup retention window
- S3 MFA Delete — require MFA to permanently delete versioned objects
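AWS explains 11 nines with a worked example: store 10,000,000 objects and you can expect, on average, to lose one object every 10,000 years. The arithmetic behind that, treating durability as a per-object annual survival probability; the function names are hypothetical, and `Decimal` is used so that 1 − 0.99999999999 stays exact.

```python
from decimal import Decimal

# S3's published design target: 11 nines of annual durability per object.
ELEVEN_NINES = Decimal("0.99999999999")


def expected_objects_lost_per_year(objects: int, durability: Decimal = ELEVEN_NINES) -> Decimal:
    """Average number of objects lost per year if each survives with probability `durability`."""
    return objects * (1 - durability)


def years_between_losses(objects: int, durability: Decimal = ELEVEN_NINES) -> Decimal:
    """Average years until one object is lost (the reciprocal of the yearly loss rate)."""
    return 1 / expected_objects_lost_per_year(objects, durability)


print(int(years_between_losses(10_000_000)))  # prints 10000
```

Durability only covers S3 losing data through its own failures. It does nothing when you delete or overwrite an object yourself, which is the gap that versioning, MFA Delete, and tested backups fill.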
30-Second Takeaway
- Test your backups. A backup you've never restored from is a guess, not a backup.
- S3 = durable, cheap object storage, with opt-in versioning
- EBS = persistent block storage for EC2
- Instance Store = fast but ephemeral — never for important data
- AWS Backup = centralised policy management across services
Self-Quiz
- What's the difference between incremental and differential backups?
- Why did GitLab still lose data despite having 5 backup mechanisms?
- Which S3 storage class would you use for logs accessed once a month?
- What's the difference between EBS and Instance Store?
- How does S3 versioning protect against accidental deletion?
- What does `11 nines` of durability actually mean?
- Which AWS service centralises backup across EC2, RDS, and DynamoDB?