
Multi-Region API Failover Trade-offs | SAP-C02

Jeff Taakey
Author
21+ Year Enterprise Architect | Multi-Cloud Architect & Strategist.

While preparing for the AWS SAP-C02, many candidates get confused by API Gateway endpoint types and multi-region deployment patterns. In the real world, this is fundamentally a decision about RPO/RTO requirements vs. operational complexity and cost. Let’s drill into a simulated scenario.

The Scenario

SkyMetrics Inc., a meteorological data provider, operates a REST API serving real-time weather analytics to enterprise clients across North America. The API infrastructure runs on Amazon API Gateway with custom domain analytics.skymetrics.io managed through Route 53. Each API endpoint invokes a dedicated AWS Lambda function, and all telemetry data resides in an Amazon DynamoDB table in us-east-1.

The CTO has mandated a cross-region disaster recovery capability following a recent outage that affected their primary region. The solution must ensure automatic failover to a secondary AWS region while maintaining data consistency and minimizing client-side configuration changes.

Key Requirements

Design a multi-region failover architecture for the REST API that:

  • Ensures automatic DNS-based failover
  • Maintains data consistency across regions
  • Requires no client application changes
  • Minimizes RTO (Recovery Time Objective)

The Options

  • A) Deploy a new set of Lambda functions in a secondary region; Update the API Gateway API to use an edge-optimized endpoint with Lambda functions from both regions as targets; Convert the DynamoDB table to a global table.

  • B) Deploy a new API Gateway API and Lambda functions in a secondary region; Modify the Route 53 DNS record to a multivalue answer record; Add both API Gateway APIs to the answer list; Enable target health checks; Convert the DynamoDB table to a global table.

  • C) Deploy a new API Gateway API and Lambda functions in a secondary region; Modify the Route 53 DNS record to a failover record; Enable target health checks; Convert the DynamoDB table to a global table.

  • D) Deploy a new API Gateway API in a secondary region; Modify Lambda functions to be global functions; Modify the Route 53 DNS record to a multivalue answer record; Add both API Gateway APIs to the answer list; Enable target health checks; Convert the DynamoDB table to a global table.

Correct Answer

Option C.


The Architect’s Analysis

Correct Answer

Option C — Regional API Gateway + Route 53 Failover + DynamoDB Global Tables.

Step-by-Step Winning Logic

This solution represents the optimal balance between disaster recovery capability, cost efficiency, and operational simplicity:

  1. Complete Regional Independence: Deploying a full API Gateway API + Lambda stack in the secondary region ensures no cross-region dependencies during failover. The primary region failure doesn’t impact secondary region operation.

  2. DNS-Based Failover: Route 53 failover routing policy provides automatic, health-check-driven DNS failover. Primary endpoint serves all traffic during normal operations; secondary activates only upon health check failure. This is the standard DR pattern for API workloads.

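  3. Data Layer Consistency: DynamoDB Global Tables provide multi-region active-active replication with typically sub-second latency, so the secondary region has near-current data when it takes over. Replication is asynchronous by default, which means cross-region reads are eventually consistent and conflicting concurrent writes are resolved last-writer-wins.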

  4. Cost Optimization: Unlike active-active patterns, this keeps the secondary region in warm standby mode. You pay for:

    • Lambda provisioned concurrency (optional, to avoid cold starts at failover)
    • DynamoDB global table replication (replicated write capacity units)
    • Route 53 health checks

    API Gateway REST APIs are billed per request with no fixed monthly fee, so the standby API incurs almost no request charges (beyond health-check traffic) until failover occurs.

  5. Zero Client Impact: Custom domain with Route 53 means clients always call analytics.skymetrics.io—DNS handles the regional resolution transparently.
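The DNS part of this design can be sketched as a Route 53 change batch. This is a minimal illustration, not the article's actual configuration: the alias DNS names, alias hosted zone IDs, set identifiers, and health check ID below are placeholders, and only the record name comes from the scenario. The dictionary follows the shape accepted by the Route 53 `ChangeResourceRecordSets` API (for example via boto3's `change_resource_record_sets(HostedZoneId=..., ChangeBatch=...)`).

```python
import json

RECORD_NAME = "analytics.skymetrics.io."  # custom domain from the scenario


def failover_record(role, set_id, alias_dns_name, alias_zone_id, health_check_id=None):
    """Return one UPSERT change for an alias A record in a failover pair."""
    if role not in ("PRIMARY", "SECONDARY"):
        raise ValueError("role must be PRIMARY or SECONDARY")
    record = {
        "Name": RECORD_NAME,
        "Type": "A",
        "SetIdentifier": set_id,  # must be unique within the record pair
        "Failover": role,
        "AliasTarget": {
            "HostedZoneId": alias_zone_id,
            "DNSName": alias_dns_name,
            "EvaluateTargetHealth": True,
        },
    }
    if health_check_id is not None:
        record["HealthCheckId"] = health_check_id
    return {"Action": "UPSERT", "ResourceRecordSet": record}


def build_change_batch():
    """Build the PRIMARY/SECONDARY pair. All target values are placeholders."""
    return {
        "Comment": "Failover pair for the regional API Gateway custom domains",
        "Changes": [
            failover_record(
                "PRIMARY",
                "use1-primary",
                "d-primary-placeholder.execute-api.us-east-1.amazonaws.com",
                "ZPLACEHOLDERUSE1",
                health_check_id="hc-primary-placeholder",
            ),
            failover_record(
                "SECONDARY",
                "usw2-secondary",
                "d-secondary-placeholder.execute-api.us-west-2.amazonaws.com",
                "ZPLACEHOLDERUSW2",
            ),
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_change_batch(), indent=2))
```

Both records share the same name and type, so clients keep calling one hostname; only the `Failover` role and the alias target differ between regions.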

The Traps (Distractor Analysis)

Why not Option A?

  • Fatal Misconception: API Gateway edge-optimized endpoints route requests through CloudFront points of presence to reduce connection latency for geographically dispersed clients, but they cannot fail over between Lambda functions in different regions. An edge-optimized API is still deployed in a single region, so if that region is down the API is down.
  • Lambda functions are always regional resources—there’s no multi-region invocation capability within a single API Gateway deployment.
  • This option fundamentally misunderstands API Gateway endpoint types.

Why not Option B?

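  • Multivalue answer routing returns up to eight healthy records in random order. It is designed for spreading traffic across several resources, not for primary/standby failover.
  • With a health check attached to each record, Route 53 stops returning unhealthy values, but both regions serve live traffic whenever both are healthy. That is an active-active design, not the controlled failover the scenario asks for. Without per-record health checks, a failed region would still be returned for roughly 50% of queries (assuming two values).
  • Multivalue answer records cannot be alias records, so “target health checks” (an alias-record setting) do not apply, and API Gateway endpoints do not expose stable IP addresses to list as values.
  • For DR scenarios requiring automatic failover, you need failover or geoproximity routing with health checks, not multivalue.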

Why not Option D?

  • “Global Lambda functions” don’t exist. Lambda is a regional service. While you can use Lambda@Edge (which runs at CloudFront edge locations), it’s designed for lightweight request/response manipulation, not as a replacement for regional API backends.
  • Same multivalue routing issue as Option B: traffic is spread across both regions instead of following a primary/standby failover policy.
  • This option contains a conceptual error that should immediately disqualify it.
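The difference between the two routing policies can be illustrated with a toy model. This is a simplified sketch, not Route 53's actual resolution logic: the function names are hypothetical, the secondary region is assumed healthy, and the case where every record is unhealthy is deliberately left out.

```python
import random
from collections import Counter

PRIMARY, SECONDARY = "us-east-1", "us-west-2"


def resolve_failover(primary_healthy):
    """Failover policy: always the primary while it is healthy."""
    return PRIMARY if primary_healthy else SECONDARY


def resolve_multivalue(health, health_checked, rng):
    """Multivalue policy: a random pick among the returned records.

    With per-record health checks, unhealthy records are dropped first.
    Without them, every record stays in the answer set.
    Toy assumption: at least one record is eligible.
    """
    if health_checked:
        candidates = [region for region, ok in health.items() if ok]
    else:
        candidates = list(health)
    return rng.choice(candidates)


def traffic_share(resolver, queries=10_000):
    """Fraction of simulated DNS queries answered with each region."""
    counts = Counter(resolver() for _ in range(queries))
    return {region: counts[region] / queries for region in (PRIMARY, SECONDARY)}


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so runs are repeatable
    both_up = {PRIMARY: True, SECONDARY: True}
    primary_down = {PRIMARY: False, SECONDARY: True}

    print("failover, all healthy:  ", traffic_share(lambda: resolve_failover(True)))
    print("multivalue, all healthy:", traffic_share(lambda: resolve_multivalue(both_up, True, rng)))
    print("multivalue, no checks, primary down:",
          traffic_share(lambda: resolve_multivalue(primary_down, False, rng)))
```

In this model, failover routing keeps the standby idle until the primary fails, while multivalue routing splits traffic between regions in normal operation and keeps answering with a failed region when no health checks are attached.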

The Architect Blueprint

graph TB
    subgraph "Client Layer"
        Client[Enterprise API Clients]
    end

    subgraph "DNS Layer - Route 53"
        R53[analytics.skymetrics.io<br/>Failover Routing Policy]
        HC1[Health Check - Primary]
        HC2[Health Check - Secondary]
    end

    subgraph "Primary Region - us-east-1"
        APIGW1[API Gateway<br/>Regional Endpoint]
        Lambda1A[Lambda: GetWeather]
        Lambda1B[Lambda: GetForecast]
        Lambda1C[Lambda: GetAlerts]
        DDB1[(DynamoDB Table<br/>Global Table - Primary)]
    end

    subgraph "Secondary Region - us-west-2"
        APIGW2[API Gateway<br/>Regional Endpoint]
        Lambda2A[Lambda: GetWeather]
        Lambda2B[Lambda: GetForecast]
        Lambda2C[Lambda: GetAlerts]
        DDB2[(DynamoDB Table<br/>Global Table - Replica)]
    end

    Client -->|DNS Query| R53
    R53 -->|Primary Healthy| APIGW1
    R53 -.->|Primary Failed| APIGW2
    HC1 -.->|Monitor| APIGW1
    HC2 -.->|Monitor| APIGW2

    APIGW1 --> Lambda1A
    APIGW1 --> Lambda1B
    APIGW1 --> Lambda1C
    Lambda1A --> DDB1
    Lambda1B --> DDB1
    Lambda1C --> DDB1

    APIGW2 --> Lambda2A
    APIGW2 --> Lambda2B
    APIGW2 --> Lambda2C
    Lambda2A --> DDB2
    Lambda2B --> DDB2
    Lambda2C --> DDB2

    DDB1 <-.->|Bi-directional Replication| DDB2

    style APIGW1 fill:#FF9900,stroke:#232F3E,stroke-width:2px
    style APIGW2 fill:#FF9900,stroke:#232F3E,stroke-width:2px
    style DDB1 fill:#4053D6,stroke:#232F3E,stroke-width:2px
    style DDB2 fill:#4053D6,stroke:#232F3E,stroke-width:2px
    style R53 fill:#8C4FFF,stroke:#232F3E,stroke-width:2px

Diagram Note: Under normal operations, Route 53 directs all traffic to us-east-1 based on health check status. Upon primary region failure, DNS automatically resolves to us-west-2, while DynamoDB Global Tables ensure data consistency across both regions.
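Adding the us-west-2 replica shown in the diagram could look roughly like the request below. This is a sketch under assumptions: the table name `WeatherTelemetry` is hypothetical (the scenario does not name the table), and the parameters follow the DynamoDB `UpdateTable` API with `ReplicaUpdates`, which applies to the current global tables version.

```python
def add_replica_request(table_name, replica_region):
    """Build UpdateTable parameters that add one replica region to a table."""
    return {
        "TableName": table_name,
        "ReplicaUpdates": [{"Create": {"RegionName": replica_region}}],
    }


# Hypothetical table name; the replica region matches the diagram.
request = add_replica_request("WeatherTelemetry", "us-west-2")

# With boto3 installed and credentials configured, the request could be sent
# from the primary region like this (not executed here):
#   boto3.client("dynamodb", region_name="us-east-1").update_table(**request)
```

Replica creation is asynchronous, so a real rollout would wait for the replica to become active before pointing the secondary Lambda functions at it.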

The Decision Matrix

| Option | Est. Complexity | Est. Monthly Cost (10M Requests) | Pros | Cons |
| --- | --- | --- | --- | --- |
| A | Medium | N/A - Architecturally Invalid | ❌ Conceptual error: edge-optimized endpoints can’t route to multi-region Lambda | Cannot achieve multi-region failover; fundamental misunderstanding of API Gateway capabilities |
| B | Medium | $420/month (dual active API Gateway + Global Tables) | ✅ Full regional stack deployment; ✅ DynamoDB global replication | ❌ Multivalue routing is active-active load distribution, not primary/standby failover; ❌ Without per-record health checks, about 50% of traffic still hits a failed region; ❌ Higher cost due to dual-active API requests |
| C | Medium | $240/month (primary active + warm standby) | ✅ True automatic DNS failover; ✅ Cost-efficient warm standby; ✅ Health-check driven; ✅ Industry-standard DR pattern | Requires ~60s DNS TTL propagation for failover (acceptable for most DR scenarios) |
| D | High | N/A - Architecturally Invalid | ❌ “Global Lambda” is not a valid AWS service concept | Same multivalue routing issues as B; adds non-existent service dependencies |

Cost Breakdown (Option C):

  • API Gateway: $3.50/million requests × 10M = $35 (primary only under normal ops)
  • Lambda: ~$0.20 per 1M requests (128MB, 200ms avg) × 10M = $2
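  • Lambda Compute: ~$160/month (equivalent to roughly 9.6 million GB-seconds at $0.0000166667 per GB-second; note that the 128MB, 200ms profile above would account for only about 250,000 GB-seconds, or roughly $4, so this figure assumes larger or longer-running functions)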
  • DynamoDB Global Tables: ~$25/month (write replication for 100 WCU)
  • Route 53: $0.50/month (hosted zone) + $0.50 (health checks)
  • Data Transfer: ~$10/month (inter-region DynamoDB replication)
  • Secondary Region (Standby): ~$7 (API Gateway requests and Lambda invocations generated by health checks; API Gateway itself has no fixed monthly fee)

Total: ~$240/month vs. Option B’s ~$420/month (due to dual active-active API invocations).
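The arithmetic behind the Option C total can be checked with a few lines. The dollar figures are the article's estimates rather than current AWS list prices, and the helper names are illustrative.

```python
REQUESTS_MILLIONS = 10

# Line items from the cost breakdown above, in USD per month.
LINE_ITEMS = {
    "api_gateway": 3.50 * REQUESTS_MILLIONS,   # $3.50 per million requests
    "lambda_requests": 0.20 * REQUESTS_MILLIONS,  # $0.20 per million requests
    "lambda_compute": 160.0,
    "dynamodb_global_tables": 25.0,
    "route53": 0.50 + 0.50,                    # hosted zone + health checks
    "data_transfer": 10.0,
    "secondary_standby": 7.0,
}


def monthly_total(items):
    """Sum the line items, rounded to cents."""
    return round(sum(items.values()), 2)


def lambda_gb_seconds(requests, memory_mb, avg_duration_ms):
    """Compute usage in GB-seconds for a given request profile."""
    return requests * (memory_mb / 1024) * (avg_duration_ms / 1000)


if __name__ == "__main__":
    print("Option C total:", monthly_total(LINE_ITEMS))
    print("GB-seconds at 128MB/200ms:", lambda_gb_seconds(10_000_000, 128, 200))
```

Keeping the estimate in code makes it easy to swap in measured memory sizes and durations, since Lambda compute dominates the total.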

Real-World Practitioner Insight
#

Exam Rule

For the SAP-C02 exam, when you see:

  • “Multi-region API failover” + “automatic” → Think Route 53 Failover Routing
  • “Edge-optimized endpoint” → Understand it’s for lower-latency client connections through CloudFront edge locations, NOT multi-region backend routing
  • “Multivalue answer” → Recognize it’s for simple load distribution, NOT DR failover
  • DynamoDB cross-region DR → Always use Global Tables (bi-directional, automatic)

Real World

In production at SkyMetrics-scale companies, we’d layer additional considerations:

  1. Active-Active vs. Active-Passive Decision:

    • If clients are truly global (EU + US), consider geoproximity routing with active-active regions for latency optimization
    • Current solution (failover) is optimized for North America with DR, not global latency
  2. RTO Optimization:

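    • Route 53 health checks run every 30s (standard) or every 10s (fast, at extra cost)
    • An endpoint is marked unhealthy only after consecutive failures reach the failure threshold (default 3), so actual failover ≈ health check interval × failure threshold + DNS TTL. With a 60s TTL that is roughly 90s with fast checks and 150s with standard checks; a lower threshold shortens it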
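    • For sub-10s RTO, consider AWS Global Accelerator (adds ~$0.025/hour + data transfer). Its endpoints are load balancers, EC2 instances, or Elastic IP addresses, so the regional APIs would need to sit behind one of those rather than being registered directly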
  3. Lambda Cold Start Mitigation:

    • Secondary region Lambda functions will have cold starts during failover
    • Use Provisioned Concurrency (adds ~$15/month per function) for critical endpoints
    • Or accept 1-3s cold start latency as acceptable DR trade-off
  4. Cost Governance:

    • Implement CloudWatch Alarms on secondary region API Gateway invocations
    • Alert if secondary is receiving traffic during non-failover (indicates DNS misconfiguration)
    • Use AWS Cost Anomaly Detection to catch unexpected global table replication costs
  5. Testing Discipline:

    • Schedule quarterly DR drills by failing primary health check manually
    • Test not just failover, but fail-back to primary (often forgotten)
    • Validate DynamoDB global table replication lag under load
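The RTO reasoning in item 2 can be turned into a quick estimate for DR drill planning. This is a back-of-the-envelope sketch: the function name is illustrative, and it assumes detection takes one health check interval per required consecutive failure, after which resolvers may keep serving the old answer for up to one TTL.

```python
def estimated_failover_seconds(check_interval_s, failure_threshold, dns_ttl_s):
    """Worst-case DNS failover time: detection time plus cached-answer lifetime."""
    if min(check_interval_s, failure_threshold, dns_ttl_s) <= 0:
        raise ValueError("all inputs must be positive")
    detection = check_interval_s * failure_threshold
    return detection + dns_ttl_s


if __name__ == "__main__":
    for interval, threshold in [(30, 3), (10, 3), (30, 1), (10, 1)]:
        total = estimated_failover_seconds(interval, threshold, dns_ttl_s=60)
        print(f"interval={interval}s threshold={threshold} ttl=60s -> ~{total}s")
```

Comparing the estimate with the failover time actually measured in a drill is a simple way to spot clients or resolvers that ignore the record TTL.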

The exam tests your knowledge of service capabilities. The real world tests your ability to balance cost, risk, and operational burden within business constraints.
