S3: Simple Storage Service
From AWS Re/Start labs (July 2025) and apprenticeship notes.
Ultra-Short Summary
S3 is AWS's object storage service: buckets live in a single Region but have globally unique names, capacity is virtually unlimited, and durability is 11 nines (99.999999999%). Objects are stored in buckets, and each object is identified by its key. S3 is not a filesystem; there are no directories, only keys that look like paths. Understanding the bucket policy model, storage classes, and access control is critical for both security and cost.
Object Storage vs Block Storage vs File Storage
Block Storage (EBS)
→ Like a hard drive — raw blocks
→ Attach to one EC2 instance
→ Can format, partition, install OS on it
File Storage (EFS)
→ Shared filesystem — multiple EC2s mount it simultaneously
→ Looks and feels like a directory tree
Object Storage (S3)
→ Flat namespace — bucket + key
→ No hierarchy (folders are just key prefixes)
→ Access via HTTP API, not filesystem
→ Virtually unlimited scale
→ Best for: backups, static assets, data lakes, logs
Core Concepts
Bucket: a container for objects — globally unique name
Key: the "path" to an object — e.g. images/profile/user123.jpg
Object: the actual data — can be any file up to 5TB
Region: buckets are region-specific (even though accessible globally)
There are no real folders in S3 — images/profile/user123.jpg is just a key that contains slashes. The console makes it look like folders.
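A minimal sketch of how a flat list of keys can be presented as "folders", assuming a prefix and a delimiter are used to group them (this mirrors the idea behind the console view; the function name `list_level` and the sample keys are illustrative, not an AWS API):

```python
def list_level(keys, prefix="", delimiter="/"):
    """Return (objects, common_prefixes) found directly under `prefix`."""
    objects, common_prefixes = [], set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            # Everything up to the first delimiter is displayed as a "folder".
            common_prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(key)
    return sorted(objects), sorted(common_prefixes)


keys = [
    "images/profile/user123.jpg",
    "images/banner.png",
    "readme.txt",
]
print(list_level(keys))                    # (['readme.txt'], ['images/'])
print(list_level(keys, prefix="images/"))  # (['images/banner.png'], ['images/profile/'])
```

Nothing is stored for "images/" itself; the folder only appears because two keys happen to share that prefix.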
URL format:
https://<bucket-name>.s3.<region>.amazonaws.com/<key>
https://mybucket.s3.ap-southeast-2.amazonaws.com/images/photo.jpg
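The URL format above can be sketched as a small helper. The function name `object_url` is an assumption for illustration; it percent-encodes characters such as spaces in the key while leaving the slashes alone:

```python
from urllib.parse import quote


def object_url(bucket, region, key):
    # Keep "/" unescaped so the key still reads like a path.
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key, safe='/')}"


print(object_url("mybucket", "ap-southeast-2", "images/photo.jpg"))
print(object_url("mybucket", "ap-southeast-2", "reports/q1 summary.pdf"))
```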
Bucket Policies
Bucket policies are JSON documents attached to a bucket that control access to the bucket and its objects. Example (public read on every object in the bucket):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "PublicRead",
      "Effect": "Allow",
      "Principal": "*",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::my-bucket/*"
    }
  ]
}
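Principal: "*" = anyone in the world, including unauthenticated requests. Use carefully.
s3:ListBucket applies to the bucket ARN (arn:aws:s3:::my-bucket). s3:GetObject applies to object ARNs (arn:aws:s3:::my-bucket/*). These are different resources. A common mistake is to list both actions against a single Resource string, so that one of them never matches and the call is denied.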
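One way to keep the two resource types apart is to give each action its own statement. The sketch below builds such a policy as a Python dict; the function name, the Sid values, and the role ARN (using the documentation-style account ID 111122223333) are all hypothetical:

```python
import json


def read_only_policy(bucket, principal_arn):
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListTheBucket",
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "s3:ListBucket",
                "Resource": bucket_arn,          # bucket-level action
            },
            {
                "Sid": "ReadTheObjects",
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "s3:GetObject",
                "Resource": f"{bucket_arn}/*",   # object-level action
            },
        ],
    }


policy = read_only_policy(
    "my-bucket", "arn:aws:iam::111122223333:role/example-reader"
)
print(json.dumps(policy, indent=2))
```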
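Access Control
Multiple layers are evaluated together. An explicit deny in any layer always wins. Within one account, an allow from either the IAM policy or the bucket policy is enough; cross-account access needs an allow on both sides: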
1. Block Public Access settings (account + bucket level)
2. Bucket Policy
3. IAM Policy (for AWS principals)
4. ACLs (legacy — avoid for new buckets)
Block Public Access is the master override: while it is enabled, no bucket policy or ACL can make objects public. It has been enabled by default on new buckets since April 2023.
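As a sketch, the four Block Public Access switches can be set on a bucket with boto3's `put_public_access_block`. The code assumes `s3_client` is something like `boto3.client("s3")` supplied by the caller; the helper name `block_public_access` is illustrative:

```python
# The four settings that together make up Block Public Access.
BLOCK_ALL_PUBLIC_ACCESS = {
    "BlockPublicAcls": True,        # reject new public ACLs
    "IgnorePublicAcls": True,       # ignore public ACLs that already exist
    "BlockPublicPolicy": True,      # reject bucket policies that grant public access
    "RestrictPublicBuckets": True,  # limit access to buckets that already have public policies
}


def block_public_access(s3_client, bucket):
    """Turn on all four settings for one bucket (s3_client is assumed to be a boto3 S3 client)."""
    s3_client.put_public_access_block(
        Bucket=bucket,
        PublicAccessBlockConfiguration=BLOCK_ALL_PUBLIC_ACCESS,
    )
```

Usage would look like `block_public_access(boto3.client("s3"), "my-bucket")`, given credentials that are allowed to change the bucket's settings.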
Static Website Hosting
S3 can host a static website:
Enable static website hosting on bucket
→ Set index document: index.html
→ Set error document: error.html
Bucket endpoint:
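http://my-bucket.s3-website-ap-southeast-2.amazonaws.com
(some Regions use s3-website.<region> with a dot instead of a dash; copy the endpoint shown in the console)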
Custom domain:
Create a CNAME or Route 53 Alias record pointing to the S3 website endpoint (the bucket name must match the domain name)
→ Domain appears to be your own, content served from S3
Note: The S3 website endpoint is HTTP only. For HTTPS with a custom domain, put CloudFront in front.
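The index and error document settings can also be applied from code. This is a minimal sketch assuming a boto3 S3 client is passed in as `s3_client`; the helper names are illustrative, while `put_bucket_website` is the boto3 call being wrapped:

```python
def website_configuration(index="index.html", error="error.html"):
    """Build the website configuration for the given index and error documents."""
    return {
        "IndexDocument": {"Suffix": index},
        "ErrorDocument": {"Key": error},
    }


def enable_website(s3_client, bucket):
    """Enable static website hosting on the bucket (s3_client is assumed to be a boto3 S3 client)."""
    s3_client.put_bucket_website(
        Bucket=bucket,
        WebsiteConfiguration=website_configuration(),
    )
```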
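Hot-linking: when another website embeds a URL that points directly at an object in your bucket, they're hot-linking, and you pay for the bandwidth. A Referer condition (aws:Referer) in the bucket policy can restrict this, although the header is easy to spoof. CORS rules only govern cross-origin requests made by browser scripts, so they do not stop a plain embedded image or link.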
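A rough sketch of a Referer condition, built as a Python dict. The function name and the example.com site are assumptions. Note that this is still a grant to `Principal: "*"`, so it is best treated as a public policy that Block Public Access has to permit, and the Referer header itself can be forged:

```python
import json


def referer_restricted_read(bucket, allowed_sites):
    """Allow GetObject only when the request's Referer matches one of the allowed sites."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowReadFromOwnPages",
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                "Condition": {
                    # Matches pages under the listed sites, e.g. https://www.example.com/*
                    "StringLike": {"aws:Referer": [f"{site}/*" for site in allowed_sites]}
                },
            }
        ],
    }


referer_policy = referer_restricted_read("my-bucket", ["https://www.example.com"])
print(json.dumps(referer_policy, indent=2))
```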
Storage Classes
Choose based on access frequency — tradeoff between cost and retrieval speed:
| Class | Access Pattern | Retrieval | Min Storage Duration | Cost |
|---|---|---|---|---|
| S3 Standard | Frequent | Immediate | None | Highest |
| S3 Standard-IA | Infrequent (monthly) | Immediate | 30 days | Lower, plus retrieval fee |
| S3 One Zone-IA | Infrequent, tolerate loss | Immediate | 30 days | Lower, single AZ |
| S3 Glacier Instant Retrieval | Archive, occasional | Milliseconds | 90 days | Low |
| S3 Glacier Flexible Retrieval | Archive, rare | Minutes–hours | 90 days | Very low |
| S3 Glacier Deep Archive | Long-term archive | Within 12 hours (standard) or 48 hours (bulk) | 180 days | Cheapest |
| S3 Intelligent-Tiering | Unknown pattern | Immediate | None | Auto-optimised, small monitoring fee |
SAA traps:
- "Access once a week, needs to be cheap" → Standard-IA (not Glacier Flexible or Deep Archive, where retrieval takes minutes to hours)
- "Accessed unpredictably" → Intelligent-Tiering
- "7-year compliance archive, never accessed" → Glacier Deep Archive
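The minimum storage durations in the table above matter for cost: an object removed early is still billed as if it had stayed for the minimum. A small worked sketch, with the helper name `billable_days` chosen for illustration and the class names written the way the S3 API spells them:

```python
# Minimum storage duration in days, taken from the table above.
MIN_STORAGE_DAYS = {
    "STANDARD": 0,
    "STANDARD_IA": 30,
    "ONEZONE_IA": 30,
    "GLACIER_IR": 90,        # Glacier Instant Retrieval
    "GLACIER": 90,           # Glacier Flexible Retrieval
    "DEEP_ARCHIVE": 180,
    "INTELLIGENT_TIERING": 0,
}


def billable_days(storage_class, days_stored):
    """Days of storage charged for an object deleted after `days_stored` days."""
    return max(days_stored, MIN_STORAGE_DAYS[storage_class])


print(billable_days("STANDARD_IA", 10))    # 30: deleted early, billed for the minimum
print(billable_days("DEEP_ARCHIVE", 200))  # 200: past the minimum, billed as stored
print(billable_days("STANDARD", 3))        # 3: no minimum applies
```

This is why short-lived objects often cost less in S3 Standard than in an "infrequent access" class.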
Versioning
Keeps all versions of an object. Protects against accidental overwrite/delete:
Enable versioning on bucket
→ Every PUT creates a new version ID
→ DELETE adds a delete marker (object still exists)
→ Restore: delete the delete marker, or specify version ID
Costs: You're billed for every version stored. Use lifecycle policies to expire old versions.
MFA Delete: Require MFA to permanently delete versioned objects — add this for high-value data.
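The PUT, DELETE, and restore behaviour described above can be illustrated with a toy in-memory model. This is not how S3 is implemented; the class `VersionedBucket` and its version IDs (`v1`, `v2`, ...) are invented for the example:

```python
import itertools

DELETE_MARKER = object()  # stands in for S3's delete marker


class VersionedBucket:
    def __init__(self):
        self._versions = {}             # key -> list of (version_id, body), oldest first
        self._ids = itertools.count(1)

    def put(self, key, body):
        version_id = f"v{next(self._ids)}"
        self._versions.setdefault(key, []).append((version_id, body))
        return version_id

    def delete(self, key, version_id=None):
        if version_id is None:
            # Plain DELETE: nothing is removed, a delete marker becomes the latest version.
            return self.put(key, DELETE_MARKER)
        # DELETE with a version ID permanently removes that one version (or marker).
        self._versions[key] = [v for v in self._versions.get(key, []) if v[0] != version_id]
        return version_id

    def get(self, key, version_id=None):
        stack = self._versions.get(key, [])
        if version_id is not None:
            for vid, body in stack:
                if vid == version_id and body is not DELETE_MARKER:
                    return body
            raise KeyError(f"{key}@{version_id}")
        if not stack or stack[-1][1] is DELETE_MARKER:
            raise KeyError(key)  # behaves like "not found"
        return stack[-1][1]


bucket = VersionedBucket()
first = bucket.put("a.txt", "one")
bucket.put("a.txt", "two")
marker = bucket.delete("a.txt")                # object now looks deleted
print(bucket.get("a.txt", version_id=first))   # one: old versions are still there
bucket.delete("a.txt", version_id=marker)      # remove the delete marker
print(bucket.get("a.txt"))                     # two: latest version is visible again
```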
Lifecycle Policies
Automatically move or expire objects based on age:
Example policy:
Day 0: Object uploaded → S3 Standard
Day 30: → Move to S3 Standard-IA
Day 90: → Move to S3 Glacier Flexible
Day 365: → Permanently delete
Rules are configured on the bucket and can be scoped to a prefix or to object tags. They can also expire noncurrent (old) versions automatically.
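The example schedule above could be expressed as a lifecycle configuration roughly like this. The rule ID and the 30-day noncurrent-version expiry are assumptions added for illustration, and `s3_client` is assumed to be a boto3 S3 client passed in by the caller. The helper `storage_class_on_day` is only a rough model, since S3 applies lifecycle actions asynchronously:

```python
LIFECYCLE_CONFIGURATION = {
    "Rules": [
        {
            "ID": "age-out-objects",          # hypothetical rule name
            "Status": "Enabled",
            "Filter": {"Prefix": ""},         # empty prefix = every object in the bucket
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},   # Glacier Flexible Retrieval
            ],
            # On a versioned bucket this adds a delete marker to the current version.
            "Expiration": {"Days": 365},
            # Assumed value: remove old versions 30 days after they become noncurrent.
            "NoncurrentVersionExpiration": {"NoncurrentDays": 30},
        }
    ]
}


def apply_lifecycle(s3_client, bucket):
    s3_client.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration=LIFECYCLE_CONFIGURATION,
    )


def storage_class_on_day(rule, age_days):
    """Approximate class of an object uploaded to STANDARD; None once it has expired."""
    if age_days >= rule["Expiration"]["Days"]:
        return None
    current = "STANDARD"
    for transition in sorted(rule["Transitions"], key=lambda t: t["Days"]):
        if age_days >= transition["Days"]:
            current = transition["StorageClass"]
    return current


rule = LIFECYCLE_CONFIGURATION["Rules"][0]
for day in (0, 30, 90, 365):
    print(day, storage_class_on_day(rule, day))
```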
Replication
| Type | What It Does | Use Case |
|---|---|---|
| CRR (Cross-Region) | Copies objects to another region | Disaster recovery, compliance, latency |
| SRR (Same-Region) | Copies within same region | Log aggregation, dev/prod sync |
Both require versioning to be enabled on the source and destination buckets.
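The versioning prerequisite can be checked before a replication rule is set up. A minimal sketch, assuming `s3_client` is a boto3 S3 client that can reach both buckets (for CRR a separate client per Region may be needed); the helper names are illustrative:

```python
def versioning_enabled(s3_client, bucket):
    response = s3_client.get_bucket_versioning(Bucket=bucket)
    # "Status" is missing from the response if versioning was never configured.
    return response.get("Status") == "Enabled"


def replication_ready(s3_client, source_bucket, destination_bucket):
    """True only when both buckets have versioning enabled."""
    return all(
        versioning_enabled(s3_client, bucket)
        for bucket in (source_bucket, destination_bucket)
    )
```

A suspended bucket reports `"Status": "Suspended"`, which this check treats as not ready.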
Encryption
| Method | Key | Audit | Note |
|---|---|---|---|
| SSE-S3 | AWS managed | No | Default, automatic |
| SSE-KMS | KMS key (your choice) | Yes | For compliance: each use of the key is logged in CloudTrail |
| SSE-C | You provide per request | No | AWS doesn't store your key |
| Client-side | You manage everything | No | Data encrypted before reaching S3 |
Since January 2023, all new S3 objects are encrypted by default with SSE-S3.
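For the three server-side methods in the table, the difference at upload time comes down to a few request parameters. The sketch below returns the extra keyword arguments one would pass to boto3's `put_object`; the function name `encryption_args` and the key identifier used in the example are hypothetical:

```python
def encryption_args(method, kms_key_id=None, customer_key=None):
    """Extra put_object arguments for one server-side encryption method."""
    if method == "SSE-S3":
        return {"ServerSideEncryption": "AES256"}
    if method == "SSE-KMS":
        args = {"ServerSideEncryption": "aws:kms"}
        if kms_key_id:
            # Without a key ID, S3 falls back to the AWS managed KMS key for S3.
            args["SSEKMSKeyId"] = kms_key_id
        return args
    if method == "SSE-C":
        if customer_key is None:
            raise ValueError("SSE-C needs a 256-bit key supplied with every request")
        return {"SSECustomerAlgorithm": "AES256", "SSECustomerKey": customer_key}
    raise ValueError(f"unknown method: {method}")


print(encryption_args("SSE-S3"))
print(encryption_args("SSE-KMS", kms_key_id="example-key-id"))
```

With a real client this would be used as `s3_client.put_object(Bucket="my-bucket", Key="report.pdf", Body=data, **encryption_args("SSE-KMS", kms_key_id=...))`. Client-side encryption has no entry here because the data is encrypted before the request is made.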
Mental Model
S3 = a virtually infinite object store, reachable from anywhere but stored in one Region
= you put things in (PUT), get them out (GET), list what's there (LIST)
= keys are just names — "folders" are just slashes in the key
Bucket = root
Key = full path including filename
Object = the actual bytes
Access = no explicit deny + Block Public Access permitting + an allow from IAM or the bucket policy
SAA Patterns
| Scenario | Answer |
|---|---|
| Static website hosting | S3 static website + CloudFront (HTTPS) + Route 53 |
| Archive data for 10 years, rarely accessed | Glacier Deep Archive |
| Auto-move to cheaper storage as data ages | Lifecycle policy |
| Protect against accidental deletion | Versioning + MFA Delete |
| Shared across accounts without public access | Bucket policy with specific account ARN |
| Need audit trail for S3 object access | SSE-KMS + CloudTrail data events |
| Replicate for DR | Cross-Region Replication (CRR) |
Self-Quiz
- What's the difference between S3 object storage and EBS block storage?
- "Confidential files accessed weekly, need audit trail." Which encryption type?
- s3:ListBucket vs s3:GetObject: what's different about their Resource ARNs?
- You deleted an object from a versioned bucket. How do you restore it?
- When would you choose Standard-IA over Glacier Flexible?
- Your S3-hosted website needs HTTPS and a custom domain. What's the architecture?
- What does Block Public Access do and when does it override a bucket policy?
- Explain the Intelligent-Tiering class — when does it make sense?