APM & Application Monitoring

Notes from AWS Apprenticeship — February 2026.


Ultra-Short Summary

APM (Application Performance Monitoring) is the layer of observability that tells you why something is slow or broken, not just that it is. It does this through traces, spans, and correlated metrics. AWS X-Ray is the AWS-native distributed tracing service that fills this role; Datadog and Sentry are common third-party alternatives.


APM vs Monitoring vs Observability

Monitoring      → tells you THAT something is wrong (alerts, dashboards)
APM             → shows you WHY it's slow or failing (traces, profiling)
Observability   → logs + metrics + traces combined for full system insight

APM is a subset of observability.

Key Concepts

Traces

  • The full path a request takes through a system
  • Example: Client → API Gateway → Lambda → DynamoDB
  • A trace spans multiple services and shows total latency

Spans

  • Individual operations inside a trace
  • Each hop (SQL query, Redis GET, external API call) is a span
  • Spans have start time, duration, and status
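To make the three span fields concrete, here is a minimal sketch in plain Python. The `Span` and `Trace` classes are hypothetical names invented for these notes, not part of any APM SDK; a real tracer would also propagate trace IDs between services.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class Span:
    name: str
    start_time: float          # wall-clock start, epoch seconds
    duration_ms: float = 0.0   # filled in when the operation finishes
    status: str = "ok"         # "ok" or "error"


@dataclass
class Trace:
    spans: list = field(default_factory=list)

    @contextmanager
    def span(self, name):
        # Time the wrapped operation and record whether it raised.
        s = Span(name=name, start_time=time.time())
        started = time.perf_counter()
        try:
            yield s
        except Exception:
            s.status = "error"
            raise
        finally:
            s.duration_ms = (time.perf_counter() - started) * 1000
            self.spans.append(s)


trace = Trace()

with trace.span("Redis GET"):
    time.sleep(0.01)  # stand-in for a cache lookup

try:
    with trace.span("external API call"):
        raise TimeoutError("upstream timed out")
except TimeoutError:
    pass  # the span still records status="error"

for s in trace.spans:
    print(f"{s.name}: {s.duration_ms:.1f} ms, status={s.status}")
```

Each `with` block becomes one span, and the list of spans collected for a single request is the trace.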

Metrics

  • Latency (P50 / P95 / P99: percentile-based; P99 is the latency that 99% of requests stay at or under, so only the slowest 1% exceed it)
  • Error rate
  • Requests per second (throughput)
  • CPU, memory, DB query performance
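A small worked example of why percentiles matter more than averages. This is a sketch using the nearest-rank method; monitoring tools may interpolate differently, and the latency samples below are made up for illustration.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


# Hypothetical request latencies in milliseconds: nine fast requests, one very slow one.
latencies_ms = [12, 15, 11, 14, 13, 16, 12, 15, 14, 900]

p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
p99 = percentile(latencies_ms, 99)
mean = sum(latencies_ms) / len(latencies_ms)

print(f"P50={p50} ms  P95={p95} ms  P99={p99} ms  mean={mean:.1f} ms")
```

The median says a typical request takes 14 ms, while the high percentiles expose the 900 ms outlier that the mean (102.2 ms) only hints at.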

Profiling

  • Shows which functions or code paths are slowest
  • Goes deeper than traces: function-level and, with some profilers, line-level visibility
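A minimal profiling sketch using Python's built-in `cProfile`, which reports time per function rather than per line. The `handler` and `slow_sum` functions are hypothetical stand-ins for real application code; hosted APM profilers work on the same idea but sample continuously in production.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    # Deliberately wasteful: builds a full list before summing it.
    return sum([i * i for i in range(n)])


def handler():
    # Hypothetical request handler used only for this illustration.
    return slow_sum(200_000)


profiler = cProfile.Profile()
profiler.enable()
result = handler()
profiler.disable()

# Render the five most expensive entries, sorted by cumulative time.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer).sort_stats("cumulative")
stats.print_stats(5)
report = buffer.getvalue()
print(report)
```

In the report, `slow_sum` shows up near the top by cumulative time, which is the "which function is slowest" answer a trace alone would not give.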

What APM Does

  • Tracks latency, throughput, error rates
  • Identifies slow endpoints and bottlenecks
  • Shows end-to-end request flow
  • Breaks requests into spans
  • Monitors DB queries, cache hits, API calls
  • Correlates logs + metrics + traces for faster debugging
  • Helps diagnose distributed / microservice issues

Tool                  Strength
AWS X-Ray             Native AWS tracing: Lambda, API Gateway, ECS, SDK instrumentation
Datadog APM           Deep tracing, service maps, DB-level visibility, all-in-one platform
Sentry Performance    Error tracking + lightweight APM, great for frontend + backend
New Relic APM         Strong code-level and DB insights

AWS X-Ray Specifics

X-Ray is the AWS-native distributed tracing service.

Request comes in
X-Ray SDK instruments each service
Generates trace + segments (per service) + subsegments (per operation)
Trace data → X-Ray service
X-Ray console shows: service map, traces, latency histogram
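The trace, segment, and subsegment hierarchy above can be sketched as plain data. This builds a dictionary shaped like an X-Ray segment document using only the standard library; it does not call the X-Ray SDK or send anything to AWS. The helper names and the `orders-lambda` service name are assumptions made for illustration, and only a handful of fields are shown.

```python
import json
import secrets
import time


def new_trace_id(epoch_seconds):
    # Trace ID format: version "1", 8 hex digits of epoch seconds, 24 random hex digits.
    return f"1-{int(epoch_seconds):08x}-{secrets.token_hex(12)}"


def new_subsegment(name, start, end):
    # One operation inside a service, e.g. a single DynamoDB call.
    return {
        "id": secrets.token_hex(8),  # 16 hex digits
        "name": name,
        "start_time": start,
        "end_time": end,
    }


def new_segment(name, trace_id, start, end, subsegments):
    # One service's share of the work for this request.
    return {
        "name": name,
        "id": secrets.token_hex(8),
        "trace_id": trace_id,
        "start_time": start,
        "end_time": end,
        "subsegments": list(subsegments),
    }


now = time.time()
trace_id = new_trace_id(now)
segment = new_segment(
    "orders-lambda",  # hypothetical service name
    trace_id,
    start=now,
    end=now + 0.348,
    subsegments=[new_subsegment("DynamoDB.GetItem", now + 0.005, now + 0.345)],
)

document = json.dumps(segment, indent=2)
print(document)
```

Every segment from every service that handled the request carries the same `trace_id`, which is how the console can stitch them into one trace and a service map.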

Key integrations: Lambda, API Gateway, ECS, EC2, Elastic Beanstalk, SNS, SQS

What you need to enable it:

  • Add the X-Ray SDK to your app
  • Attach the AWSXRayDaemonWriteAccess IAM policy to the role
  • Enable active tracing on Lambda / API Gateway


Sentry — Error Levels & Structured Logging

Sentry classifies events by severity level:

Level      Meaning
debug      Diagnostic details, not user-visible
info       Expected system events (state changes, success)
warning    Unexpected but not breaking
error      Operation failed; user impact likely
fatal      System failure, crashes the flow

Why structured errors matter

Instead of console.error("something broke"), structured logging sends:

  • Error type (e.g. DATABASE, AUTH)
  • Severity level
  • User context (userId, session)
  • Stack trace
  • Trace ID (correlates to APM trace)

This lets you filter, group, and alert on specific error types — not just "something is broken."
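As a rough illustration of the fields listed above, here is a sketch that builds a structured error event as JSON. It uses only the Python standard library and is not the Sentry SDK; `build_error_event`, the field names, and the sample IDs are all hypothetical.

```python
import json
import traceback
from datetime import datetime, timezone

LEVELS = {"debug", "info", "warning", "error", "fatal"}


def build_error_event(exc, error_type, level, user_id, session_id, trace_id):
    # Collect everything needed to filter, group, and correlate this error later.
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level}")
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "level": level,
        "error_type": error_type,
        "message": str(exc),
        "user": {"userId": user_id, "session": session_id},
        "stack_trace": traceback.format_exception(type(exc), exc, exc.__traceback__),
        "trace_id": trace_id,  # ties the error to the APM trace for the same request
    }


try:
    raise ConnectionError("db connection refused")
except ConnectionError as exc:
    event = build_error_event(
        exc,
        error_type="DATABASE",
        level="error",
        user_id="user-123",        # made-up sample values
        session_id="sess-456",
        trace_id="1-00000000-000000000000000000000000",
    )

print(json.dumps(event, indent=2))
```

Because every event has the same keys, a query such as "all DATABASE errors at level error for one trace ID" becomes a simple filter rather than a text search.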


Mental Model

User action → multiple services → something is slow

Without APM:
You know it's slow. You don't know where.

With APM:
Trace shows: API Gateway (2ms) → Lambda (8ms) → DynamoDB (340ms)
Problem isolated to DynamoDB.
→ Check: missing index? Hot partition? Table throughput?
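The same reasoning as a few lines of Python. The durations come from the example trace above; the sketch assumes the three spans run one after another and do not overlap, so their durations add up to the total.

```python
# (service, duration in ms) taken from the example trace above
spans = [("API Gateway", 2), ("Lambda", 8), ("DynamoDB", 340)]

total_ms = sum(ms for _, ms in spans)
slowest_name, slowest_ms = max(spans, key=lambda s: s[1])
share = slowest_ms / total_ms

print(f"Total: {total_ms} ms")
print(f"Bottleneck: {slowest_name} ({slowest_ms} ms, {share:.0%} of the request)")
```

DynamoDB accounts for 340 of the 350 ms, so that is where to look first.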

AWS Context

Concept                        AWS Service
Distributed tracing            X-Ray
Metrics + dashboards           CloudWatch Metrics
Log aggregation                CloudWatch Logs
Alerting                       CloudWatch Alarms + SNS
Full observability platform    CloudWatch + X-Ray + Contributor Insights
Third-party integration        Datadog, Dynatrace, New Relic via CloudWatch

SAA relevance: Know that X-Ray = tracing, CloudWatch = metrics/logs/alarms. Understand the difference between monitoring (reactive) and observability (proactive insight).


30-Second Takeaway

  • APM = traces + spans + profiling → tells you where the slow part is
  • Trace = full request path across services
  • Span = one operation within that path
  • AWS X-Ray is the native APM; integrates with Lambda/API Gateway/ECS
  • Observability = logs + metrics + traces together

Self-Quiz

  1. What's the difference between a trace and a span?
  2. Your Lambda function is slow. How would X-Ray help you find the bottleneck?
  3. What IAM policy does a Lambda need to send data to X-Ray?
  4. Sentry logs an error-level event. What does that mean compared with a warning?
  5. A customer says "the API is slow sometimes." What percentile metric do you look at first?
  6. What's the difference between APM and CloudWatch monitoring?
  7. If you have traces, logs, and metrics — what does that give you? What's the term?
  8. Can X-Ray trace requests across multiple AWS accounts?