
S3 — Simple Storage Service

From AWS re/Start labs (July 2025) and apprenticeship notes.


Ultra-Short Summary

S3 is AWS's object storage service: virtually unlimited scale and 11 nines (99.999999999%) of durability, with data stored redundantly across multiple Availability Zones in the region you pick (One Zone-IA excepted). Objects live in buckets and are identified by a key (path). S3 is not a filesystem: there are no directories, just keys that look like paths. Understanding the bucket policy model, storage classes, and access control is critical for both security and cost.


Object Storage vs Block Storage vs File Storage

Block Storage (EBS)
  → Like a hard drive — raw blocks
  → Attach to one EC2 instance
  → Can format, partition, install OS on it

File Storage (EFS)
  → Shared filesystem — multiple EC2s mount it simultaneously
  → Looks and feels like a directory tree

Object Storage (S3)
  → Flat namespace — bucket + key
  → No hierarchy (folders are just key prefixes)
  → Access via HTTP API, not filesystem
  → Virtually unlimited scale
  → Best for: backups, static assets, data lakes, logs

Core Concepts

Bucket:  a container for objects — globally unique name
Key:     the "path" to an object — e.g. images/profile/user123.jpg
Object:  the actual data; any file up to 5 TB (a single PUT tops out at 5 GB, so larger files use multipart upload)
Region:  buckets are region-specific (even though accessible globally)

There are no real folders in S3 — images/profile/user123.jpg is just a key that contains slashes. The console makes it look like folders.
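To see why console "folders" are only a view over flat keys, here is a minimal local sketch of how a listing with a Prefix and a Delimiter groups keys into "common prefixes". It imitates the grouping behaviour of ListObjectsV2 in plain Python; `list_with_delimiter` is a hypothetical helper and makes no AWS calls.

```python
from typing import Iterable, List, Tuple


def list_with_delimiter(
    keys: Iterable[str], prefix: str = "", delimiter: str = "/"
) -> Tuple[List[str], List[str]]:
    """Group flat keys the way a Prefix + Delimiter listing does.

    Returns (objects directly "in" the prefix, common prefixes that look like subfolders).
    Hypothetical local helper; no AWS calls.
    """
    contents: List[str] = []
    common_prefixes = set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue  # outside the "folder" being listed
        rest = key[len(prefix):]
        cut = rest.find(delimiter)
        if cut == -1:
            contents.append(key)  # no further delimiter: a plain object
        else:
            # Everything up to and including the next delimiter becomes a "folder".
            common_prefixes.add(prefix + rest[: cut + len(delimiter)])
    return contents, sorted(common_prefixes)


keys = ["images/profile/user123.jpg", "images/logo.png", "readme.txt"]
print(list_with_delimiter(keys))                    # (['readme.txt'], ['images/'])
print(list_with_delimiter(keys, prefix="images/"))  # (['images/logo.png'], ['images/profile/'])
```

Nothing called "images/" exists as an object here; the prefix only appears because two keys happen to share it.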

URL format:

https://<bucket-name>.s3.<region>.amazonaws.com/<key>
https://mybucket.s3.ap-southeast-2.amazonaws.com/images/photo.jpg
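The virtual-hosted URL format can be built mechanically. A small sketch; `object_url` is a hypothetical helper, and the URL only returns the object if access control lets the caller read it.

```python
from urllib.parse import quote


def object_url(bucket: str, region: str, key: str) -> str:
    """Build a virtual-hosted-style S3 object URL."""
    # Percent-encode unsafe characters in the key but keep "/" so prefixes read like paths.
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key, safe='/')}"


print(object_url("mybucket", "ap-southeast-2", "images/photo.jpg"))
# https://mybucket.s3.ap-southeast-2.amazonaws.com/images/photo.jpg
print(object_url("mybucket", "ap-southeast-2", "images/my photo.jpg"))
# https://mybucket.s3.ap-southeast-2.amazonaws.com/images/my%20photo.jpg
```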

Bucket Policies

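JSON policies attached to a bucket that control access to it and its objects. Example: make every object in a bucket publicly readable (this only takes effect once Block Public Access is turned off for the bucket):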

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "PublicRead",
      "Effect": "Allow",
      "Principal": "*",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::my-bucket/*"
    }
  ]
}

Principal: "*" = anyone in the world. Use carefully.

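s3:ListBucket acts on the bucket ARN (arn:aws:s3:::my-bucket). s3:GetObject acts on object ARNs (arn:aws:s3:::my-bucket/*). These are different resources: a common mistake is granting s3:ListBucket on the /* ARN only, which matches no bucket, so listing is denied. Put each action against its own ARN, or list both ARNs in Resource.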
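A sketch of that split in Python: a bucket policy letting one other AWS account list the bucket and read its objects, with each action on its matching ARN. The bucket name and the account ID 111122223333 are placeholders, and `read_only_policy` is a hypothetical helper that only builds the JSON document. With boto3 you would pass `json.dumps(policy)` as the `Policy` argument of `put_bucket_policy`.

```python
import json


def read_only_policy(bucket: str, account_id: str) -> dict:
    """Bucket policy granting another account list + read access (hypothetical helper)."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    # The other account must still allow its own users/roles via IAM (cross-account rule).
    principal = {"AWS": f"arn:aws:iam::{account_id}:root"}
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListTheBucket",
                "Effect": "Allow",
                "Principal": principal,
                "Action": "s3:ListBucket",
                "Resource": bucket_arn,  # bucket-level action -> bucket ARN
            },
            {
                "Sid": "ReadTheObjects",
                "Effect": "Allow",
                "Principal": principal,
                "Action": "s3:GetObject",
                "Resource": f"{bucket_arn}/*",  # object-level action -> object ARN
            },
        ],
    }


policy = read_only_policy("my-bucket", "111122223333")
print(json.dumps(policy, indent=2))
```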


Access Control

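Several layers are evaluated together. An explicit Deny in any of them wins; otherwise a principal in the bucket owner's account needs an Allow from either the IAM policy or the bucket policy, while cross-account access needs an Allow on both sides: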

1. Block Public Access settings (account + bucket level)
2. Bucket Policy
3. IAM Policy (for AWS principals)
4. ACLs (legacy — avoid for new buckets)

Block Public Access is the master override — if enabled, no bucket policy or ACL can make objects public. Enabled by default on new buckets since 2023.
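The evaluation rules can be sketched as a toy decision function. This is a deliberately simplified, assumption-laden model, not AWS's actual evaluation engine: it ignores ACLs, Organizations SCPs, KMS key policies, session policies and the individual Block Public Access settings. `is_allowed` and its parameters are hypothetical.

```python
from typing import Optional

ALLOW, DENY = "Allow", "Deny"


def is_allowed(
    *,
    iam: Optional[str],
    bucket_policy: Optional[str],
    same_account: bool,
    anonymous: bool = False,
    block_public_access: bool = True,
) -> bool:
    """Toy model of S3 access evaluation (simplified, hypothetical).

    iam / bucket_policy: "Allow", "Deny", or None when no statement matches.
    """
    # 1. An explicit Deny in any policy always wins.
    if DENY in (iam, bucket_policy):
        return False
    # 2. Anonymous callers have no IAM identity, so only a public bucket policy
    #    can grant access, and Block Public Access neutralises public grants.
    if anonymous:
        return bucket_policy == ALLOW and not block_public_access
    # 3. Same account: an Allow from either policy type is enough.
    if same_account:
        return ALLOW in (iam, bucket_policy)
    # 4. Cross-account: both the caller's IAM policy and the bucket policy must allow.
    return iam == ALLOW and bucket_policy == ALLOW


print(is_allowed(iam=ALLOW, bucket_policy=None, same_account=True))    # True
print(is_allowed(iam=ALLOW, bucket_policy=None, same_account=False))   # False
print(is_allowed(iam=None, bucket_policy=ALLOW, same_account=False, anonymous=True))  # False
```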


Static Website Hosting

S3 can host a static website:

Enable static website hosting on bucket
  → Set index document: index.html
  → Set error document: error.html

Bucket endpoint:
  http://my-bucket.s3-website-ap-southeast-2.amazonaws.com
  (older regions such as ap-southeast-2 use s3-website-<region>; newer ones use s3-website.<region>)

Custom domain:
  Name the bucket exactly after the hostname (e.g. www.example.com), then create a CNAME or Route 53 Alias record pointing to the S3 website endpoint
  → Domain appears to be your own, content served from S3

Note: The S3 website endpoint is HTTP only. For HTTPS with a custom domain, put CloudFront in front.
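The same settings as data, as a sketch. `WEBSITE_CONFIGURATION` follows the shape boto3's `put_bucket_website` accepts for its `WebsiteConfiguration` argument. `resolve_key` is a hypothetical, simplified model of how the website endpoint maps a request path to an object key: root and "folder" requests get the index document appended (real S3 also redirects a bare "/blog" to "/blog/" when "blog/index.html" exists, which is not modelled here).

```python
WEBSITE_CONFIGURATION = {
    "IndexDocument": {"Suffix": "index.html"},
    "ErrorDocument": {"Key": "error.html"},
}


def resolve_key(path: str, config: dict = WEBSITE_CONFIGURATION) -> str:
    """Map a request path to the object key the website endpoint looks up (simplified)."""
    key = path.lstrip("/")
    if key == "" or key.endswith("/"):
        # Root or "folder" requests get the index document suffix appended.
        key += config["IndexDocument"]["Suffix"]
    return key


for path in ("/", "/blog/", "/about.html"):
    print(path, "->", resolve_key(path))
# / -> index.html
# /blog/ -> blog/index.html
# /about.html -> about.html
```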

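Hot-linking: when another website embeds a URL pointing directly at an object in your bucket, it is hot-linking, and you pay the data transfer for every request. A bucket policy condition on aws:Referer can restrict which sites get served, but the Referer header is easy to spoof, so treat it as a deterrent rather than a security control. CORS does not stop hot-linking: it only governs scripted cross-origin requests (fetch/XHR) from browsers, not plain <img> or <a> embeds.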
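A sketch of a Referer-restricted read policy, built in Python. The bucket name and site URLs are placeholders and `referer_restricted_policy` is a hypothetical helper. Because the statement still grants to Principal "*", Block Public Access has to be off for it to take effect, and requests that send no Referer at all will not match it.

```python
import json
from typing import List


def referer_restricted_policy(bucket: str, allowed_sites: List[str]) -> dict:
    """Public read, but only when the Referer header matches one of allowed_sites."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowGetOnlyFromOwnSites",
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                # StringLike supports * wildcards; the header is client-supplied and spoofable.
                "Condition": {"StringLike": {"aws:Referer": allowed_sites}},
            }
        ],
    }


policy = referer_restricted_policy(
    "my-bucket", ["https://www.example.com/*", "https://example.com/*"]
)
print(json.dumps(policy, indent=2))
```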


Storage Classes

Choose based on access frequency — tradeoff between cost and retrieval speed:

Class                          Access Pattern             Retrieval         Min Storage  Cost
S3 Standard                    Frequent                   Immediate         None         Highest
S3 Standard-IA                 Infrequent (monthly)       Immediate         30 days      Lower, plus retrieval fee
S3 One Zone-IA                 Infrequent, tolerate loss  Immediate         30 days      Lower, single AZ
S3 Glacier Instant Retrieval   Archive, occasional        Milliseconds      90 days      Low
S3 Glacier Flexible Retrieval  Archive, rare              Minutes–hours     90 days      Very low
S3 Glacier Deep Archive        Long-term archive          12+ hours         180 days     Cheapest
S3 Intelligent-Tiering         Unknown pattern            Immediate         None         Auto-optimised, small per-object monitoring fee
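The Min Storage column is a billing rule: delete, overwrite or transition an object before the minimum and you are still charged for the full minimum. A sketch of that arithmetic, keyed by the storage class names the S3 API uses (`GLACIER` is Glacier Flexible Retrieval). `billed_storage_days` is a hypothetical helper; it ignores per-GB prices and the minimum billable object size some classes apply.

```python
MIN_STORAGE_DAYS = {
    "STANDARD": 0,
    "INTELLIGENT_TIERING": 0,
    "STANDARD_IA": 30,
    "ONEZONE_IA": 30,
    "GLACIER_IR": 90,    # Glacier Instant Retrieval
    "GLACIER": 90,       # Glacier Flexible Retrieval
    "DEEP_ARCHIVE": 180,
}


def billed_storage_days(storage_class: str, days_stored: int) -> int:
    """Days of storage you pay for when an object is removed after days_stored days."""
    return max(days_stored, MIN_STORAGE_DAYS[storage_class])


print(billed_storage_days("STANDARD_IA", 10))    # 30: removed early, billed the 30-day minimum
print(billed_storage_days("DEEP_ARCHIVE", 400))  # 400: past the minimum, billed actual days
print(billed_storage_days("STANDARD", 3))        # 3: no minimum
```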

SAA traps:

  • "Access once a week, needs to be cheap" → Standard-IA (not Glacier — retrieval is too slow)
  • "Accessed unpredictably" → Intelligent-Tiering
  • "7-year compliance archive, never accessed" → Glacier Deep Archive

Versioning

Keeps all versions of an object. Protects against accidental overwrite/delete:

Enable versioning on bucket
→ Every PUT creates a new version ID
→ DELETE without a version ID adds a delete marker (older versions still exist)
→ DELETE with a version ID permanently removes that one version
→ Restore: delete the delete marker, or specify a version ID

Costs: You're billed for every version stored. Use lifecycle policies to expire old versions.

MFA Delete: require MFA to permanently delete an object version or to change the bucket's versioning state. Only the bucket owner's root user can enable it. Add this for high-value data.
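To make the delete-marker behaviour concrete, here is a toy in-memory model of a versioned bucket. `VersionedBucket` and its methods are hypothetical and only mimic the semantics described above; they are not the S3 API, and version IDs here are simple counters rather than S3's opaque strings.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Version:
    version_id: str
    data: Optional[bytes]  # None means this version is a delete marker

    @property
    def is_delete_marker(self) -> bool:
        return self.data is None


class VersionedBucket:
    """Toy in-memory model of a versioning-enabled bucket (hypothetical, not the S3 API)."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[Version]] = {}
        self._ids = itertools.count(1)

    def put(self, key: str, data: bytes) -> str:
        # Every PUT appends a new version; nothing is overwritten.
        version = Version(f"v{next(self._ids)}", data)
        self._versions.setdefault(key, []).append(version)
        return version.version_id

    def delete(self, key: str, version_id: Optional[str] = None) -> Optional[str]:
        if version_id is None:
            # Plain DELETE: add a delete marker on top; older versions remain.
            marker = Version(f"v{next(self._ids)}", None)
            self._versions.setdefault(key, []).append(marker)
            return marker.version_id
        # DELETE with a version ID permanently removes that version (or marker).
        self._versions[key] = [
            v for v in self._versions.get(key, []) if v.version_id != version_id
        ]
        return None

    def get(self, key: str, version_id: Optional[str] = None) -> Optional[bytes]:
        versions = self._versions.get(key, [])
        if version_id is not None:
            # None for unknown IDs and for delete markers (S3 answers with an error).
            return next((v.data for v in versions if v.version_id == version_id), None)
        if not versions or versions[-1].is_delete_marker:
            return None  # latest version is a delete marker: S3 answers 404 Not Found
        return versions[-1].data


bucket = VersionedBucket()
bucket.put("report.txt", b"draft")
v2 = bucket.put("report.txt", b"final")
marker = bucket.delete("report.txt")  # looks deleted...
print(bucket.get("report.txt"))       # None
print(bucket.get("report.txt", v2))   # b'final': the old version is still there
bucket.delete("report.txt", marker)   # restore by deleting the delete marker
print(bucket.get("report.txt"))       # b'final'
```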


Lifecycle Policies

Automatically move or expire objects based on age:

Example policy:
  Day 0:   Object uploaded → S3 Standard
  Day 30:  → Move to S3 Standard-IA
  Day 90:  → Move to S3 Glacier Flexible
  Day 365: → Permanently delete

Rules apply to the whole bucket or to a filtered subset (by prefix, object tag, or object size). Also used to automatically expire old (noncurrent) versions.
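The example policy can be expressed as a lifecycle configuration in the shape boto3's `put_bucket_lifecycle_configuration` accepts for `LifecycleConfiguration`. This sketch only builds the dict and simulates it locally; the rule ID and the "logs/" prefix are placeholders, and `storage_class_at` is a hypothetical helper. Note that in a versioned bucket, Expiration adds a delete marker instead of erasing data, which is why `NoncurrentVersionExpiration` is included.

```python
from typing import Any, Dict, Optional

# Mirrors the Day 30 / 90 / 365 example. The "logs/" prefix and rule ID are placeholders.
LIFECYCLE_CONFIGURATION: Dict[str, Any] = {
    "Rules": [
        {
            "ID": "tier-then-expire",
            "Filter": {"Prefix": "logs/"},
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},  # API name for Glacier Flexible Retrieval
            ],
            "Expiration": {"Days": 365},
            # In a versioned bucket, also expire versions 30 days after they stop being current.
            "NoncurrentVersionExpiration": {"NoncurrentDays": 30},
        }
    ]
}


def storage_class_at(rule: Dict[str, Any], age_days: int) -> Optional[str]:
    """Idealised class for a current object of a given age; None once it has expired.

    Real lifecycle actions run asynchronously, so actual transitions can lag a little.
    """
    if age_days >= rule["Expiration"]["Days"]:
        return None
    current = "STANDARD"
    for transition in sorted(rule["Transitions"], key=lambda t: t["Days"]):
        if age_days >= transition["Days"]:
            current = transition["StorageClass"]
    return current


rule = LIFECYCLE_CONFIGURATION["Rules"][0]
for age in (0, 45, 200, 400):
    print(age, storage_class_at(rule, age))
# 0 STANDARD
# 45 STANDARD_IA
# 200 GLACIER
# 400 None
```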


Replication

Type                What It Does                       Use Case
CRR (Cross-Region)  Copies objects to another region   Disaster recovery, compliance, latency
SRR (Same-Region)   Copies within same region          Log aggregation, dev/prod sync

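Both require versioning enabled on source and destination, plus an IAM role that S3 assumes to copy objects. Only objects written after the rule is enabled are replicated; existing objects need S3 Batch Replication.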
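A sketch of a replication setup as data, plus the versioning precondition. The dict follows the shape boto3's `put_bucket_replication` accepts as `ReplicationConfiguration`; the role ARN, account ID and bucket names are placeholders. Whether this is CRR or SRR depends only on which region the destination bucket lives in. `replication_ready` is a hypothetical check against the `Status` field that `get_bucket_versioning` returns (the field is absent if versioning was never enabled).

```python
REPLICATION_CONFIGURATION = {
    # Placeholder role ARN: S3 assumes this role to read the source and write the destination.
    "Role": "arn:aws:iam::111122223333:role/s3-replication-role",
    "Rules": [
        {
            "ID": "dr-copy",
            "Priority": 1,
            "Status": "Enabled",
            "Filter": {"Prefix": ""},  # empty prefix = every object
            "DeleteMarkerReplication": {"Status": "Disabled"},
            "Destination": {"Bucket": "arn:aws:s3:::my-bucket-dr"},
        }
    ],
}


def replication_ready(source_versioning: dict, destination_versioning: dict) -> bool:
    """True only if both buckets report versioning Status "Enabled"."""
    return (
        source_versioning.get("Status") == "Enabled"
        and destination_versioning.get("Status") == "Enabled"
    )


print(replication_ready({"Status": "Enabled"}, {"Status": "Enabled"}))  # True
print(replication_ready({"Status": "Enabled"}, {}))  # False: destination never enabled versioning
```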


Encryption

Method       Key                      Audit  Note
SSE-S3       AWS managed              No     Default, automatic
SSE-KMS      KMS key (your choice)    Yes    Compliance: you can see who accessed
SSE-C        You provide per request  No     AWS doesn't store your key
Client-side  You manage everything    No     Data encrypted before reaching S3

Since January 2023, all new S3 objects are encrypted by default with SSE-S3.
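A sketch of how the server-side options appear per request when uploading with boto3's `put_object`. `encryption_args` is a hypothetical helper that only builds keyword arguments (no AWS call); the parameter names `ServerSideEncryption`, `SSEKMSKeyId`, `SSECustomerAlgorithm` and `SSECustomerKey` are boto3's, and the KMS key ARN is a placeholder. Client-side encryption has no entry because S3 only ever receives ciphertext.

```python
from typing import Dict, Optional


def encryption_args(
    mode: str, kms_key_id: Optional[str] = None, customer_key: Optional[bytes] = None
) -> Dict[str, object]:
    """Extra put_object keyword arguments for each server-side mode (hypothetical helper)."""
    if mode == "SSE-S3":
        return {"ServerSideEncryption": "AES256"}
    if mode == "SSE-KMS":
        args: Dict[str, object] = {"ServerSideEncryption": "aws:kms"}
        if kms_key_id:  # if omitted, S3 uses the AWS managed key aws/s3
            args["SSEKMSKeyId"] = kms_key_id
        return args
    if mode == "SSE-C":
        # You supply the 256-bit key on every request; AWS does not keep it.
        if customer_key is None or len(customer_key) != 32:
            raise ValueError("SSE-C needs a 256-bit (32-byte) key")
        return {"SSECustomerAlgorithm": "AES256", "SSECustomerKey": customer_key}
    raise ValueError(f"unsupported mode: {mode}")


KEY_ARN = "arn:aws:kms:ap-southeast-2:111122223333:key/EXAMPLE-KEY-ID"  # placeholder
print(encryption_args("SSE-KMS", kms_key_id=KEY_ARN))
```

With a real client this would be spread into the call, e.g. `s3.put_object(Bucket="my-bucket", Key="reports/q1.pdf", Body=data, **encryption_args("SSE-KMS", kms_key_id=KEY_ARN))`, where the bucket, key and `data` are placeholders.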


Mental Model

S3 = a virtually infinite, highly durable hard drive that lives in one region
   = you put things in (PUT), get them out (GET), list what's there (LIST)
   = keys are just names — "folders" are just slashes in the key

Bucket = root
Key    = full path including filename
Object = the actual bytes

Access = no explicit Deny anywhere + an Allow from IAM or the bucket policy (both, for cross-account) + Block Public Access not blocking a public grant

SAA Patterns

Scenario                                      Answer
Static website hosting                        S3 static website + CloudFront (HTTPS) + Route 53
Archive data for 10 years, rarely accessed    Glacier Deep Archive
Auto-move to cheaper storage as data ages     Lifecycle policy
Protect against accidental deletion           Versioning + MFA Delete
Shared across accounts without public access  Bucket policy with specific account ARN
Need audit trail for S3 object access         SSE-KMS + CloudTrail
Replicate for DR                              Cross-Region Replication (CRR)

Self-Quiz

  1. What's the difference between S3 object storage and EBS block storage?
  2. "Confidential files accessed weekly, need audit trail." Which encryption type?
  3. s3:ListBucket vs s3:GetObject — what's different about their Resource ARNs?
  4. You deleted an object from a versioned bucket. How do you restore it?
  5. When would you choose Standard-IA over Glacier Flexible?
  6. Your S3-hosted website needs HTTPS and a custom domain. What's the architecture?
  7. What does Block Public Access do and when does it override a bucket policy?
  8. Explain the Intelligent-Tiering class — when does it make sense?