Multi-AZ HA—Simplicity vs Complexity | SAA-C03

Jeff Taakey
Author
21+ Year Enterprise Architect | Multi-Cloud Architect & Strategist.

While preparing for the AWS SAA-C03, many candidates confuse high availability with disaster recovery. In the real world, this is fundamentally a decision about RPO/RTO requirements vs. operational complexity. Let’s drill into a simulated scenario.

The Scenario
#

GlobalHealthTech operates a patient portal application that physicians use to access real-time medical records. The application currently runs on EC2 instances behind an Application Load Balancer (ALB), with instances managed by an Auto Scaling group. The backend database is an Aurora PostgreSQL cluster deployed in a single Availability Zone.

The CTO has mandated that the system must withstand infrastructure failures with minimal downtime (target: < 2 minutes) and near-zero data loss (RPO < 1 second). The engineering team is small, and the solution must minimize ongoing operational maintenance.

Key Requirements
#

Design a high-availability architecture that meets the RPO/RTO requirements while minimizing operational overhead.

The Options
#

  • A) Deploy EC2 instances across multiple AWS Regions. Use Amazon Route 53 health checks to redirect traffic between regions. Implement Aurora PostgreSQL cross-region replication.
  • B) Configure the Auto Scaling group to span multiple Availability Zones. Enable Multi-AZ deployment for the Aurora database. Deploy Amazon RDS Proxy instances for database connection management.
  • C) Configure the Auto Scaling group to use a single Availability Zone. Create automated hourly snapshots of the database. In case of failure, restore the database from the most recent snapshot.
  • D) Configure the Auto Scaling group to span multiple AWS Regions. Write application data to Amazon S3. Use S3 Event Notifications to trigger AWS Lambda functions that write data to the database asynchronously.

Correct Answer
#

Option B - Multi-AZ Auto Scaling with Multi-AZ Aurora and RDS Proxy.

Step-by-Step Winning Logic
#

This solution achieves the core requirement through AWS-native, fully managed high availability:

  1. Auto Scaling across AZs: Automatically distributes EC2 instances across multiple Availability Zones. If one AZ fails, the ALB routes traffic to healthy instances in other AZs within seconds.

  2. Aurora Multi-AZ: Aurora replicates data synchronously at the storage layer (six copies across three AZs), and an Aurora Replica in a different AZ serves as the automatic failover target (failover typically takes 30-120 seconds). This meets the < 2 minute RTO and near-zero RPO requirements.

  3. RDS Proxy: Maintains connection pooling and reduces failover time by managing database connections intelligently. This is particularly valuable during scaling events or failover scenarios where connection storms could occur.

Operational Overhead: Near-zero. Multi-AZ is a configuration checkbox, not an architecture you build and maintain.

Cost Profile: Predictable and modest—approximately 2x the single-AZ cost for compute and database.
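
As a rough illustration of step 1, the sketch below builds the arguments for a Multi-AZ Auto Scaling group. The keys follow the parameter names of boto3's `create_auto_scaling_group`, but nothing is sent to AWS. The group name, launch template name, target group ARN, subnet IDs, and the `build_asg_params` helper are all hypothetical.

```python
from typing import Dict


def build_asg_params(name: str, launch_template: str, target_group_arn: str,
                     subnets_by_az: Dict[str, str],
                     min_size: int = 2, max_size: int = 6) -> dict:
    """Build kwargs shaped like boto3's create_auto_scaling_group.

    subnets_by_az maps an Availability Zone name to one subnet ID in that AZ.
    The dict is only constructed and validated; no AWS call is made.
    """
    if len(subnets_by_az) < 2:
        raise ValueError("need subnets in at least two Availability Zones")
    if min_size < 2:
        raise ValueError("min_size below 2 cannot keep an instance in two AZs")
    subnet_ids = [subnets_by_az[az] for az in sorted(subnets_by_az)]
    return {
        "AutoScalingGroupName": name,
        "MinSize": min_size,
        "MaxSize": max_size,
        "LaunchTemplate": {"LaunchTemplateName": launch_template,
                           "Version": "$Latest"},
        # Comma-separated subnet IDs; one subnet per AZ spreads the instances.
        "VPCZoneIdentifier": ",".join(subnet_ids),
        "TargetGroupARNs": [target_group_arn],
        # Replace instances that fail the load balancer's health checks.
        "HealthCheckType": "ELB",
        "HealthCheckGracePeriod": 120,
    }


# Hypothetical identifiers, for illustration only.
params = build_asg_params(
    name="patient-portal-asg",
    launch_template="patient-portal-lt",
    target_group_arn=("arn:aws:elasticloadbalancing:us-east-1:111122223333:"
                      "targetgroup/portal/0123456789abcdef"),
    subnets_by_az={"us-east-1a": "subnet-aaa",
                   "us-east-1b": "subnet-bbb",
                   "us-east-1c": "subnet-ccc"},
)
print(params["VPCZoneIdentifier"])
```

With real credentials and resources, a dict like this could be passed as `boto3.client("autoscaling").create_auto_scaling_group(**params)`. The up-front validation is a design choice of this sketch, not something the API enforces.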


💎 The Architect’s Deep Dive: Why Options Fail
#

The Traps (Distractor Analysis)
#

  • Why not Option A?

    • Over-engineering for the requirement: Multi-region architecture is designed for disaster recovery scenarios (regional failures) or global latency optimization, not routine high availability.
    • Operational complexity: Cross-region replication, bidirectional DNS failover, data consistency challenges, and network costs add significant overhead.
    • Cost: 3-5x more expensive due to data transfer costs, duplicated infrastructure, and cross-region replication.
    • Latency: Cross-region writes introduce 50-200ms+ latency depending on region pairs.
  • Why not Option C?

    • Fails RTO/RPO requirements: Hourly snapshots mean up to 60 minutes of data loss (RPO = 60 min). Restoration from snapshots can take 15-45 minutes, violating the < 2 minute RTO.
    • Single AZ Auto Scaling: Entire application goes down if the AZ experiences an outage.
    • Manual intervention: Requires human detection and initiation of recovery procedures.
  • Why not Option D?

    • Architectural anti-pattern: Using S3 and Lambda as an intermediary for transactional database writes introduces:
      • Inconsistent write latency
      • Potential data loss if Lambda fails
      • Complex error handling and retry logic
      • No ACID transaction guarantees
    • Multi-region Auto Scaling without database strategy: The option describes multi-region compute but doesn’t address how the Aurora database (inherently regional) would function.
    • Event-driven database writes: Asynchronous writes via Lambda violate real-time access requirements for medical records.
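
The RTO/RPO reasoning above can be checked mechanically. The sketch below compares two options against the CTO's targets. The figures for Option C are the best case quoted in this article (15-minute restore, hourly snapshots). The 60-second failover and zero data loss for Option B are assumptions chosen from within the range the article cites, not measured values.

```python
from dataclasses import dataclass

TARGET_RTO_SECONDS = 120.0  # "< 2 minutes" from the scenario
TARGET_RPO_SECONDS = 1.0    # "RPO < 1 second" from the scenario


@dataclass(frozen=True)
class RecoveryProfile:
    name: str
    rto_seconds: float  # time to restore service
    rpo_seconds: float  # worst-case window of lost data


def meets_targets(profile: RecoveryProfile) -> bool:
    """True only if both recovery time and data loss are under target."""
    return (profile.rto_seconds < TARGET_RTO_SECONDS
            and profile.rpo_seconds < TARGET_RPO_SECONDS)


profiles = [
    # Assumed values for Option B: 60 s failover, no loss of committed data.
    RecoveryProfile("B - Multi-AZ", rto_seconds=60, rpo_seconds=0),
    # Best case for Option C: 15 min restore, up to 60 min since last snapshot.
    RecoveryProfile("C - Single AZ + Snapshots",
                    rto_seconds=15 * 60, rpo_seconds=60 * 60),
]

results = {p.name: meets_targets(p) for p in profiles}
for name, ok in results.items():
    print(f"{name}: {'meets' if ok else 'fails'} targets")
```

Even with its most favourable numbers, Option C misses both targets by orders of magnitude, which is why it can be eliminated without any cost analysis.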

The Architect Blueprint
#

graph TD
    Users([Physicians]) --> R53[Route 53 DNS]
    R53 --> ALB[Application Load Balancer]

    ALB --> ASG[Auto Scaling Group - Multi-AZ]

    ASG --> EC2_AZ1[EC2 Instances<br/>AZ-1a]
    ASG --> EC2_AZ2[EC2 Instances<br/>AZ-1b]
    ASG --> EC2_AZ3[EC2 Instances<br/>AZ-1c]

    EC2_AZ1 --> Proxy[RDS Proxy]
    EC2_AZ2 --> Proxy
    EC2_AZ3 --> Proxy

    Proxy --> Aurora_Primary[Aurora PostgreSQL<br/>Primary - AZ-1a]
    Aurora_Primary -.Synchronous Replication.-> Aurora_Standby[Aurora PostgreSQL<br/>Standby - AZ-1b]

    style Aurora_Primary fill:#4CAF50,stroke:#333,stroke-width:2px,color:#fff
    style Aurora_Standby fill:#FF9800,stroke:#333,stroke-width:2px,color:#fff
    style Proxy fill:#2196F3,stroke:#333,stroke-width:2px,color:#fff
    style ALB fill:#9C27B0,stroke:#333,stroke-width:2px,color:#fff

Diagram Note: Traffic flows through the ALB to Auto Scaling instances distributed across three AZs, connecting to Aurora through RDS Proxy for optimized connection management and automatic failover handling.
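
Even with a proxy in front of the database, clients can see brief connection errors while a failover completes, so application code usually retries. The following is a minimal sketch of retry with exponential backoff. It assumes the connect function raises `ConnectionError` (a real database driver raises its own error type), and `connect_with_retry` and `flaky_connect` are hypothetical names used only for this example.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def connect_with_retry(connect: Callable[[], T], attempts: int = 6,
                       base_delay: float = 0.5, max_delay: float = 8.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call connect() until it succeeds, doubling the wait after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return connect()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            sleep(min(max_delay, base_delay * 2 ** attempt))
    raise AssertionError("unreachable")


# Simulated failover: the first two attempts fail, the third succeeds.
calls = {"n": 0}


def flaky_connect() -> str:
    calls["n"] += 1
    if calls["n"] <= 2:
        raise ConnectionError("writer endpoint not ready")
    return "connected"


delays: List[float] = []
result = connect_with_retry(flaky_connect, sleep=delays.append)
print(result, delays)
```

Passing `sleep` as a parameter keeps the example testable without real waiting. With the defaults, the total wait before giving up is 0.5 + 1 + 2 + 4 + 8 = 15.5 seconds, so the retry budget should be sized against the failover time you expect.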

The Decision Matrix
#

| Option | Est. Complexity | Est. Monthly Cost (1000 req/sec) | Pros | Cons |
|---|---|---|---|---|
| A - Multi-Region | Very High | $4,500 - $6,000 | Survives regional failures; Geographic load distribution | Massive operational overhead; Cross-region latency; Data consistency challenges; 3-5x cost premium |
| B - Multi-AZ | Low | $1,200 - $1,800 | Fully managed HA; Automatic failover; Minimal ops overhead; Meets RTO/RPO; Predictable costs | Cannot survive regional failures; Limited to single region latency |
| C - Single AZ + Snapshots | Medium | $600 - $800 | Lowest infrastructure cost; Simple initial setup | Fails RTO requirement (15-45 min); Fails RPO requirement (60 min); Manual recovery needed; Complete AZ failure = downtime |
| D - Multi-Region + S3/Lambda | Extremely High | $5,000 - $8,000 | (None for this use case) | Architectural anti-pattern; No ACID guarantees; Unpredictable latency; Complex error handling; Violates real-time requirements |

Cost Assumptions:

  • Option A: 2 regions × (3x db.r5.2xlarge Aurora + 6x m5.large EC2 + data transfer ~$800/mo)
  • Option B: 2x db.r5.2xlarge Aurora Multi-AZ ($900) + 6x m5.large EC2 Multi-AZ ($600) + RDS Proxy (~$90)
  • Option C: 1x db.r5.2xlarge Aurora ($450) + 3x m5.large EC2 ($220) + snapshot storage (~$50)
  • Option D: Multi-region compute + S3 + Lambda invocations + unpredictable data transfer
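
As a quick consistency check, the component figures listed for Options B and C can be summed and compared against the ranges in the decision matrix. The numbers below are the article's own estimates, not AWS list prices.

```python
# Monthly cost components in USD, copied from the cost assumptions above.
COMPONENTS = {
    "B": {"aurora_multi_az": 900, "ec2_multi_az": 600, "rds_proxy": 90},
    "C": {"aurora_single": 450, "ec2_single_az": 220, "snapshot_storage": 50},
}

# Estimated monthly ranges from the decision matrix.
MATRIX_RANGES = {"B": (1200, 1800), "C": (600, 800)}

totals = {opt: sum(parts.values()) for opt, parts in COMPONENTS.items()}
in_range = {
    opt: MATRIX_RANGES[opt][0] <= total <= MATRIX_RANGES[opt][1]
    for opt, total in totals.items()
}
premium = totals["B"] / totals["C"]

for opt in totals:
    print(f"Option {opt}: ${totals[opt]}/mo, within matrix range: {in_range[opt]}")
print(f"Multi-AZ premium over single AZ: {premium:.2f}x")
```

Both totals land inside their quoted ranges, and the ratio of roughly 2.2x agrees with the "approximately 2x" cost profile given for the correct answer.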

Real-World Practitioner Insight
#

Exam Rule
#

“For the AWS SAA-C03, when you see keywords like ‘minimize operational overhead’ + ‘high availability’ for a database, Multi-AZ is almost always the answer. Multi-region solutions are only correct when the scenario explicitly mentions disaster recovery across geographic regions or compliance requirements for data residency.”

Real World
#

In reality, we would:

  1. Start with Option B for 95% of high-availability requirements. Multi-AZ provides a 99.95% availability SLA, which translates to ~22 minutes of downtime per month (roughly 4.4 hours per year), acceptable for most business-critical applications.

  2. Add RDS Proxy selectively: While included in the correct answer, RDS Proxy adds ~$90/month + data processing costs. We’d only implement it if:

    • The application creates/destroys connections frequently (serverless, Lambda)
    • Connection pooling isn’t handled well at the application layer
    • Failover time requirements are extremely strict (< 30 seconds)
  3. Consider Aurora Global Database (not Multi-Region compute) if true disaster recovery is needed:

    • Provides cross-region read replicas with < 1 second replication lag
    • Allows promotion of secondary region in < 1 minute
    • Costs ~60% less than Option A’s full multi-region architecture
    • Avoids the complexity of bidirectional traffic management
  4. Monitor and optimize Aurora storage costs: Aurora charges for the storage the cluster actually uses and, in the standard configuration, for I/O requests. In production, we’d:

    • Implement Aurora Auto Scaling for read replicas
    • Use Performance Insights to optimize query patterns
    • Consider Aurora Serverless v2 for variable workloads
  5. Implement proper backup strategy: Multi-AZ protects against infrastructure failure, not logical errors (accidental deletes, corrupted data). We’d maintain:

    • Automated daily snapshots retained for 7-35 days
    • Point-in-time recovery enabled (automatically enabled with Aurora)
    • Test restoration procedures quarterly
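
To make the availability figure in item 1 concrete, the short calculation below converts an SLA percentage into allowed downtime. It assumes a 365-day year and a 30-day month, and `downtime_minutes` is a helper introduced only for this example.

```python
MINUTES_PER_YEAR = 365 * 24 * 60   # 525,600
MINUTES_PER_MONTH = 30 * 24 * 60   # 43,200 (30-day month assumed)


def downtime_minutes(availability_pct: float, period_minutes: int) -> float:
    """Maximum downtime allowed in a period at the given availability."""
    return (1 - availability_pct / 100) * period_minutes


per_year = downtime_minutes(99.95, MINUTES_PER_YEAR)
per_month = downtime_minutes(99.95, MINUTES_PER_MONTH)

print(f"99.95% -> {per_month:.1f} min/month, "
      f"{per_year:.1f} min/year ({per_year / 60:.1f} hours)")
```

At 99.95%, the budget is about 21.6 minutes per month, or about 262.8 minutes (4.4 hours) per year.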

The Bottom Line: The exam tests your ability to match solution complexity to requirements. In the real world, we layer protections (Multi-AZ → Backups → Cross-Region DR) based on measured business impact, not theoretical scenarios.
