APM & Application Monitoring

Notes from AWS Apprenticeship — February 2026.

Ultra-Short Summary

APM (Application Performance Monitoring) is the layer of observability that tells you why something is slow or broken, not just that it is. It does this through traces, spans, and correlated metrics. AWS X-Ray is the AWS-native distributed tracing service and covers the tracing side of APM; Datadog and Sentry are common third-party alternatives.

APM vs Monitoring vs Observability

Monitoring      → tells you THAT something is wrong (alerts, dashboards)
APM             → shows you WHY it's slow or failing (traces, profiling)
Observability   → logs + metrics + traces combined for full system insight

APM is a subset of observability.

Key Concepts

Traces

The full path a request takes through a system
Example: Client → API Gateway → Lambda → DynamoDB
A trace spans multiple services and shows total latency

Spans

Individual operations inside a trace
Each hop (SQL query, Redis GET, external API call) is a span
Spans have start time, duration, and status
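
The trace/span relationship can be sketched as a small data model. This is an illustration only, not the schema of X-Ray or any other tool; the class names, field names, and timings are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Span:
    """One operation inside a trace (hypothetical model)."""
    name: str
    start_ms: float      # offset from the start of the request
    duration_ms: float
    status: str = "ok"

    @property
    def end_ms(self) -> float:
        return self.start_ms + self.duration_ms


@dataclass
class Trace:
    """The full path of one request: an ID plus the spans it produced."""
    trace_id: str
    spans: List[Span] = field(default_factory=list)

    def total_latency_ms(self) -> float:
        # Wall-clock time from the earliest start to the latest end,
        # so nested or parallel spans are not counted twice.
        if not self.spans:
            return 0.0
        start = min(s.start_ms for s in self.spans)
        end = max(s.end_ms for s in self.spans)
        return end - start


# Made-up timings: each hop is nested inside the one before it.
trace = Trace("req-001", [
    Span("API Gateway", 0.0, 50.0),
    Span("Lambda", 5.0, 40.0),
    Span("DynamoDB GetItem", 10.0, 30.0),
])
print(trace.total_latency_ms())  # 50.0
```

Summing the span durations here would give 120 ms, which overstates the request time because the spans are nested; taking earliest start to latest end gives the latency the client actually saw.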

Metrics

Latency (P50 / P95 / P99: percentile-based; P99 is the latency that 99% of requests come in at or under, so only the slowest 1% exceed it)
Error rate
Requests per second (throughput)
CPU, memory, DB query performance
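
To make the percentile idea concrete, here is a minimal sketch using the nearest-rank method. Real APM tools may interpolate or use approximate sketches instead, so treat this as one reasonable definition rather than how any particular product computes it. The sample latencies are invented.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# 100 invented request latencies: most are fast, two are very slow.
latencies_ms = [12.0] * 98 + [900.0, 1500.0]

print("P50:", percentile(latencies_ms, 50))   # 12.0
print("P95:", percentile(latencies_ms, 95))   # 12.0
print("P99:", percentile(latencies_ms, 99))   # 900.0
```

P50 and P95 both look healthy here while P99 exposes the slow tail, which is why "slow sometimes" complaints usually show up in the high percentiles first.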

Profiling

Shows which functions or code paths are slowest
Goes deeper than traces — line-level visibility

What APM Does

Tracks latency, throughput, error rates
Identifies slow endpoints and bottlenecks
Shows end-to-end request flow
Breaks requests into spans
Monitors DB queries, cache hits, API calls
Correlates logs + metrics + traces for faster debugging
Helps diagnose distributed / microservice issues

Popular APM Tools

Tool	Strength
AWS X-Ray	Native AWS tracing — Lambda, API Gateway, ECS, SDK instrumentation
Datadog APM	Deep tracing, service maps, DB-level visibility, all-in-one platform
Sentry Performance	Error tracking + lightweight APM, great for frontend + backend
New Relic APM	Strong code-level and DB insights

AWS X-Ray Specifics

X-Ray is the AWS-native distributed tracing service.

Request comes in
    ↓
X-Ray SDK instruments each service
    ↓
Generates trace + segments (per service) + subsegments (per operation)
    ↓
Trace data → X-Ray daemon (buffers and forwards in batches) → X-Ray service
    ↓
X-Ray console shows: service map, traces, latency histogram

Key integrations: Lambda, API Gateway, ECS, EC2, Elastic Beanstalk, SNS, SQS

What you need to enable it:
- Add the X-Ray SDK to your app
- Attach the AWSXRayDaemonWriteAccess managed IAM policy to the role your code runs under (for example the Lambda execution role)
- Enable active tracing on Lambda / API Gateway
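
For a feel of what the SDK generates, the sketch below builds a segment with subsegments by hand, using only the standard library. It follows the X-Ray segment document layout as I understand it (trace ID of the form `1-<8 hex epoch seconds>-<24 hex random>`, 16-hex-digit segment IDs, epoch-second timestamps). The helper names, the `orders-api` service name, and the durations are assumptions for illustration; in practice the X-Ray SDK does all of this for you.

```python
import json
import secrets
import time
from typing import Dict, List, Tuple


def new_trace_id(now: float) -> str:
    # Version "1", epoch seconds as 8 hex digits, then 96 random bits.
    return f"1-{int(now):08x}-{secrets.token_hex(12)}"


def new_segment_id() -> str:
    # Segment and subsegment IDs are 64-bit values written as 16 hex digits.
    return secrets.token_hex(8)


def build_segment(service: str, operations: List[Tuple[str, float]]) -> Dict:
    """Hypothetical helper: one segment for a service, with one
    subsegment per (operation name, duration in seconds) pair."""
    start = time.time()
    cursor = start
    subsegments = []
    for name, duration in operations:
        subsegments.append({
            "id": new_segment_id(),
            "name": name,
            "start_time": cursor,
            "end_time": cursor + duration,
        })
        cursor += duration  # operations are assumed to run one after another
    return {
        "trace_id": new_trace_id(start),
        "id": new_segment_id(),
        "name": service,
        "start_time": start,
        "end_time": cursor,
        "subsegments": subsegments,
    }


segment = build_segment("orders-api", [("DynamoDB.GetItem", 0.340),
                                       ("S3.PutObject", 0.025)])

# What a client would hand to the daemon: a JSON header line, then the document.
daemon_message = '{"format": "json", "version": 1}\n' + json.dumps(segment)
print(daemon_message)
```

Nothing is sent over the network here. The point is only to show that a segment is a plain JSON document, and that the service map and latency views in the console are built from these fields.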

Sentry — Error Levels & Structured Logging

Sentry classifies events by severity level:

Level	Meaning
`debug`	Diagnostic details, not user-visible
`info`	Expected system events (state changes, success)
`warning`	Unexpected but not breaking
`error`	Operation failed — user impact likely
`fatal`	System failure, crashes the flow

Why structured errors matter

Instead of console.error("something broke"), structured logging sends:

Error type (e.g. DATABASE, AUTH)
Severity level
User context (userId, session)
Stack trace
Trace ID (correlates to APM trace)

This lets you filter, group, and alert on specific error types — not just "something is broken."
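
A structured error event with the fields listed above might look like the following sketch. It builds a plain dictionary with the standard library and is not the Sentry SDK's API; the function name, field names, and the user, session, and trace values are hypothetical.

```python
import json
import traceback
from datetime import datetime, timezone

# Severity levels from the table above.
LEVELS = ("debug", "info", "warning", "error", "fatal")


def build_error_event(exc: BaseException, error_type: str, level: str,
                      user_id: str, session_id: str, trace_id: str) -> dict:
    """Hypothetical helper: turn an exception into a structured event."""
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level}")
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "level": level,
        "error_type": error_type,            # e.g. DATABASE, AUTH
        "message": str(exc),
        "user": {"userId": user_id, "session": session_id},
        "trace_id": trace_id,                # links the error to its APM trace
        "stack_trace": "".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__)
        ),
    }


try:
    raise ConnectionError("database connection timed out")
except ConnectionError as exc:
    event = build_error_event(exc, "DATABASE", "error",
                              user_id="user-123", session_id="sess-456",
                              trace_id="trace-abc")

print(json.dumps(event, indent=2))
```

Because every field is a named key rather than part of a free-text message, a query such as "all `error` events with `error_type` DATABASE for this trace ID" becomes a simple filter.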

Mental Model

User action → multiple services → something is slow

Without APM:
You know it's slow. You don't know where.

With APM:
Trace shows: API Gateway (2ms) → Lambda (8ms) → DynamoDB (340ms)
Problem isolated to DynamoDB.
→ Check: missing index? Hot partition? Table throughput?
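
The "problem isolated to DynamoDB" step is just arithmetic over the span timings. A minimal sketch, assuming the three timings above are each hop's own time and do not overlap:

```python
# Timings from the example trace, in milliseconds.
spans_ms = {"API Gateway": 2, "Lambda": 8, "DynamoDB": 340}

total_ms = sum(spans_ms.values())
slowest = max(spans_ms, key=spans_ms.get)
share = spans_ms[slowest] / total_ms

print(f"{slowest}: {spans_ms[slowest]}ms of {total_ms}ms ({share:.0%})")
# DynamoDB: 340ms of 350ms (97%)
```

With one span accounting for about 97% of the request time, tuning API Gateway or the Lambda code would change almost nothing; the data store is where to look.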

AWS Context

Concept	AWS Service
Distributed tracing	X-Ray
Metrics + dashboards	CloudWatch Metrics
Log aggregation	CloudWatch Logs
Alerting	CloudWatch Alarms + SNS
Full observability platform	CloudWatch + X-Ray + Contributor Insights
Third-party integration	Datadog, Dynatrace, New Relic via CloudWatch Metric Streams

SAA relevance: Know that X-Ray = tracing, CloudWatch = metrics/logs/alarms. Understand the difference between monitoring (reactive) and observability (proactive insight).

30-Second Takeaway

APM = traces + spans + profiling → tells you where the slow part is
Trace = full request path across services
Span = one operation within that path
AWS X-Ray is the AWS-native tracing service; it integrates with Lambda/API Gateway/ECS
Observability = logs + metrics + traces together

Self-Quiz

What's the difference between a trace and a span?
Your Lambda function is slow. How would X-Ray help you find the bottleneck?
What IAM policy does a Lambda need to send data to X-Ray?
Sentry logs an `error`-level event. What does that mean compared with `warning`?
A customer says "the API is slow sometimes." What percentile metric do you look at first?
What's the difference between APM and CloudWatch monitoring?
If you have traces, logs, and metrics — what does that give you? What's the term?
Can X-Ray trace requests across multiple AWS accounts?