APM & Application Monitoring¶
Notes from AWS Apprenticeship — February 2026.
Ultra-Short Summary¶
APM (Application Performance Monitoring) is the layer of observability that tells you why something is slow or broken, not just that it is. It does this through traces, spans, and correlated metrics. AWS X-Ray is the native APM service; Datadog and Sentry are common third-party alternatives.
APM vs Monitoring vs Observability¶
Monitoring → tells you THAT something is wrong (alerts, dashboards)
APM → shows you WHY it's slow or failing (traces, profiling)
Observability → logs + metrics + traces combined for full system insight
APM is a subset of observability.
Key Concepts¶
Traces¶
- The full path a request takes through a system
- Example: Client → API Gateway → Lambda → DynamoDB
- A trace spans multiple services and shows total latency
Spans¶
- Individual operations inside a trace
- Each hop (SQL query, Redis GET, external API call) is a span
- Spans have start time, duration, and status
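To make the trace/span relationship concrete, here is a minimal Python sketch. The `Span` and `Trace` classes, their field names, and the timings are illustrative assumptions, not the schema of any particular APM vendor. Spans are modelled as a flat, sequential list; real tracers also record parent/child nesting.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    """One operation inside a trace (hypothetical model, not a vendor schema)."""
    name: str
    start_ms: float      # offset from the start of the trace
    duration_ms: float
    status: str = "ok"   # e.g. "ok" or "error"

    @property
    def end_ms(self) -> float:
        return self.start_ms + self.duration_ms


@dataclass
class Trace:
    """The full path of one request: an ID plus the spans it produced."""
    trace_id: str
    spans: list[Span] = field(default_factory=list)

    def total_latency_ms(self) -> float:
        # Wall-clock time from the earliest span start to the latest span end.
        if not self.spans:
            return 0.0
        return max(s.end_ms for s in self.spans) - min(s.start_ms for s in self.spans)

    def slowest_span(self) -> Span:
        return max(self.spans, key=lambda s: s.duration_ms)


trace = Trace(
    trace_id="example-trace-1",  # made-up ID for illustration
    spans=[
        Span("API Gateway", start_ms=0.0, duration_ms=2.0),
        Span("Lambda", start_ms=2.0, duration_ms=8.0),
        Span("DynamoDB", start_ms=10.0, duration_ms=340.0),
    ],
)

print(trace.total_latency_ms())    # 350.0
print(trace.slowest_span().name)   # DynamoDB
```

The trace answers "how long did the request take?", while the individual spans answer "which hop took the time?".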
Metrics¶
- Latency (P50 / P95 / P99: percentile-based; P99 is the latency that 99% of requests stay at or below, so it exposes the slowest 1%)
- Error rate
- Requests per second (throughput)
- CPU, memory, DB query performance
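The latency, error-rate, and throughput metrics above can be computed from raw request data roughly as follows. This is a sketch using the nearest-rank percentile method; APM tools may use other estimators (interpolation, histograms), and the sample window below is invented for illustration.

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical one-minute window: 100 requests, mostly fast, two slow outliers.
latencies_ms = [20.0] * 90 + [80.0] * 8 + [900.0, 1500.0]
errors = 3
window_seconds = 60

p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
p99 = percentile(latencies_ms, 99)
mean = sum(latencies_ms) / len(latencies_ms)
error_rate = errors / len(latencies_ms)
throughput_rps = len(latencies_ms) / window_seconds

print(p50, p95, p99)  # 20.0 80.0 900.0
print(f"mean={mean:.1f}ms error_rate={error_rate:.0%} throughput={throughput_rps:.2f} req/s")
```

In this made-up window the median is 20 ms and the mean is under 50 ms, yet P99 is 900 ms. That gap is why tail percentiles are usually the first thing to check when users report intermittent slowness.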
Profiling¶
- Shows which functions or code paths are slowest
- Goes deeper than traces: function-level and, with some profilers, line-level visibility
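As a small illustration of profiling, the sketch below uses Python's standard-library `cProfile` and `pstats` modules. Note that `cProfile` reports per function rather than per line; line-level detail needs a separate tool. The `handler` and `slow_lookup` functions are hypothetical stand-ins for application code.

```python
import cProfile
import io
import pstats


def slow_lookup(items: list[int], target: int) -> bool:
    # Deliberately inefficient: a linear scan of a list on every call.
    return target in items


def handler() -> int:
    """Hypothetical request handler with one expensive code path."""
    items = list(range(20_000))
    hits = 0
    for target in range(0, 20_000, 100):
        if slow_lookup(items, target):
            hits += 1
    return hits


profiler = cProfile.Profile()
profiler.enable()
result = handler()
profiler.disable()

# Sort by cumulative time and show up to five of the most expensive entries.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

The report lists call counts and time per function, so `slow_lookup` shows up as the place where `handler` spends its time. A trace alone would only show that the handler span was slow.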
What APM Does¶
- Tracks latency, throughput, error rates
- Identifies slow endpoints and bottlenecks
- Shows end-to-end request flow
- Breaks requests into spans
- Monitors DB queries, cache hits, API calls
- Correlates logs + metrics + traces for faster debugging
- Helps diagnose distributed / microservice issues
Popular APM Tools¶
| Tool | Strength |
|---|---|
| AWS X-Ray | Native AWS tracing — Lambda, API Gateway, ECS, SDK instrumentation |
| Datadog APM | Deep tracing, service maps, DB-level visibility, all-in-one platform |
| Sentry Performance | Error tracking + lightweight APM, great for frontend + backend |
| New Relic APM | Strong code-level and DB insights |
AWS X-Ray Specifics¶
X-Ray is the AWS-native distributed tracing service.
Request comes in
↓
X-Ray SDK instruments each service
↓
Generates trace + segments (per service) + subsegments (per operation)
↓
Trace data → X-Ray service
↓
X-Ray console shows: service map, traces, latency histogram
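To show roughly what the "trace + segments + subsegments" step produces, here is a simplified Python sketch of a segment document. In practice the X-Ray SDK builds and sends these for you; the helper functions, the service name, and the timings below are assumptions for illustration, and real segment documents carry more fields than shown.

```python
import json
import os
import time


def new_trace_id(now: float) -> str:
    # X-Ray trace ID layout: version "1", 8 hex digits of epoch seconds, 24 random hex digits.
    return f"1-{int(now):08x}-{os.urandom(12).hex()}"


def new_segment_id() -> str:
    # Segment and subsegment IDs are 16 hex digits.
    return os.urandom(8).hex()


def build_segment(service: str, start: float, end: float,
                  operations: list[tuple[str, float, float]]) -> dict:
    """Assemble a simplified segment document with one subsegment per operation."""
    return {
        "name": service,
        "id": new_segment_id(),
        "trace_id": new_trace_id(start),
        "start_time": start,   # epoch seconds
        "end_time": end,
        "subsegments": [
            {"name": name, "id": new_segment_id(), "start_time": s, "end_time": e}
            for name, s, e in operations
        ],
    }


now = time.time()
segment = build_segment(
    "orders-lambda",  # hypothetical service name
    start=now,
    end=now + 0.348,
    operations=[("DynamoDB.GetItem", now + 0.008, now + 0.348)],
)
print(json.dumps(segment, indent=2))
```

One segment corresponds to one service's share of the request; the subsegments break that share down into individual downstream calls.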
Key integrations: Lambda, API Gateway, ECS, EC2, Elastic Beanstalk, SNS, SQS
What you need to enable it:
- Add X-Ray SDK to your app
- Attach AWSXRayDaemonWriteAccess IAM policy to the role
- Enable active tracing on Lambda / API Gateway
Sentry — Error Levels & Structured Logging¶
Sentry classifies events by severity level:
| Level | Meaning |
|---|---|
| debug | Diagnostic details, not user-visible |
| info | Expected system events (state changes, success) |
| warning | Unexpected but not breaking |
| error | Operation failed — user impact likely |
| fatal | System failure, crashes the flow |
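Because the levels are ordered by severity, they can drive simple alerting rules. The sketch below is plain Python, not the Sentry SDK; the numeric values and the "alert at error or above" policy are assumptions chosen for illustration.

```python
from enum import IntEnum


class Level(IntEnum):
    # Ordered from least to most severe; the numeric values are arbitrary.
    DEBUG = 10
    INFO = 20
    WARNING = 30
    ERROR = 40
    FATAL = 50


def should_alert(level: Level, threshold: Level = Level.ERROR) -> bool:
    """Alert only on events at or above the threshold (a hypothetical policy)."""
    return level >= threshold


events = [
    ("cache miss", Level.DEBUG),
    ("retrying payment", Level.WARNING),
    ("payment failed", Level.ERROR),
    ("worker crashed", Level.FATAL),
]
alerts = [message for message, level in events if should_alert(level)]
print(alerts)  # ['payment failed', 'worker crashed']
```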
Why structured errors matter¶
Instead of console.error("something broke"), structured logging sends:
- Error type (e.g. DATABASE, AUTH)
- Severity level
- User context (userId, session)
- Stack trace
- Trace ID (correlates to APM trace)
This lets you filter, group, and alert on specific error types — not just "something is broken."
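A structured error event with the fields listed above might look like the following Python sketch. The `build_error_event` helper, the field names, and the sample IDs are hypothetical; a real SDK such as Sentry's defines its own event schema and attaches most of this context automatically.

```python
import json
import traceback
from datetime import datetime, timezone


def build_error_event(exc: BaseException, *, error_type: str, level: str,
                      user_id: str, session_id: str, trace_id: str) -> dict:
    """Bundle an exception with the context needed to filter, group, and correlate it."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "error_type": error_type,   # e.g. DATABASE, AUTH
        "level": level,             # debug / info / warning / error / fatal
        "message": str(exc),
        "user": {"userId": user_id, "session": session_id},
        "stack_trace": "".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__)
        ),
        "trace_id": trace_id,       # links the error to its APM trace
    }


try:
    raise TimeoutError("orders table query timed out")
except TimeoutError as exc:
    event = build_error_event(
        exc,
        error_type="DATABASE",
        level="error",
        user_id="u-123",            # made-up identifiers
        session_id="s-456",
        trace_id="1-5f84c7a1-0123456789abcdef01234567",
    )

print(json.dumps(event, indent=2))
```

Because every field is a separate key, a log or error backend can answer queries such as "all DATABASE errors at level error for this trace ID" without parsing free-form text.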
Mental Model¶
User action → multiple services → something is slow
Without APM:
You know it's slow. You don't know where.
With APM:
Trace shows: API Gateway (2ms) → Lambda (8ms) → DynamoDB (340ms)
Problem isolated to DynamoDB.
→ Check: missing index? Hot partition? Table throughput?
AWS Context¶
| Concept | AWS Service |
|---|---|
| Distributed tracing | X-Ray |
| Metrics + dashboards | CloudWatch Metrics |
| Log aggregation | CloudWatch Logs |
| Alerting | CloudWatch Alarms + SNS |
| Full observability platform | CloudWatch + X-Ray + Contributor Insights |
| Third-party integration | Datadog, Dynatrace, New Relic via CloudWatch |
SAA relevance: Know that X-Ray = tracing, CloudWatch = metrics/logs/alarms. Understand the difference between monitoring (reactive) and observability (proactive insight).
30-Second Takeaway¶
- APM = traces + spans + profiling → tells you where the slow part is
- Trace = full request path across services
- Span = one operation within that path
- AWS X-Ray is the native APM; integrates with Lambda/API Gateway/ECS
- Observability = logs + metrics + traces together
Self-Quiz¶
- What's the difference between a trace and a span?
- Your Lambda function is slow. How would X-Ray help you find the bottleneck?
- What IAM policy does a Lambda need to send data to X-Ray?
- Sentry logs an error-level event. What does that mean vs. warning?
- A customer says "the API is slow sometimes." What percentile metric do you look at first?
- What's the difference between APM and CloudWatch monitoring?
- If you have traces, logs, and metrics — what does that give you? What's the term?
- Can X-Ray trace requests across multiple AWS accounts?