While preparing for the AWS SAP-C02, many candidates get confused by database migration strategy selection. In the real world, this is fundamentally a decision about read/write isolation vs. migration complexity. Let’s drill into a simulated scenario.
The Scenario #
GlobalSenseTech operates an industrial IoT analytics platform on-premises. Their environment consists of two Node.js microservices:
- Data Collector Service: Continuously ingests sensor telemetry and writes to a MySQL database (approximately 10,000 writes/minute).
- Reporting Aggregator Service: Runs heavy SQL aggregation queries every 15 minutes to generate compliance dashboards.
The Problem: During aggregation job execution, the collector service experiences ~12% write failure rate due to database lock contention. Leadership has mandated migration to AWS with these constraints:
- Zero customer-facing downtime during migration
- No changes to existing collector clients (DNS-based cutover only)
- Resolve the write contention issue inherent in the current architecture
Key Requirements #
Design a migration architecture that:
- Eliminates read/write contention
- Supports phased migration with continuous replication
- Enables zero-downtime DNS cutover
- Minimizes operational complexity post-migration
The Options #
A) Set up Aurora MySQL as replication target from on-premises; create Aurora Read Replica for aggregation workload; deploy AWS Lambda behind NLB as collector endpoint using RDS Proxy to write to Aurora; after sync, disable replication and promote Read Replica to standalone; point collector DNS to NLB.
B) Set up Aurora MySQL; use AWS DMS for continuous replication from on-premises; migrate aggregation job to Aurora MySQL primary; deploy EC2 Auto Scaling Group behind ALB as collector endpoint; after sync, point collector DNS to ALB; disable DMS task post-migration.
C) Set up Aurora MySQL; use AWS DMS for continuous replication from on-premises; create Aurora Read Replica for aggregation workload; deploy AWS Lambda behind ALB as collector endpoint using RDS Proxy to write to Aurora; after sync, point collector DNS to ALB; disable DMS task post-migration.
D) Set up Aurora MySQL; create Aurora Read Replica for aggregation workload; deploy Kinesis Data Streams as collector endpoint; use Kinesis Data Firehose to replicate data to Aurora; after sync, disable replication and promote Read Replica; point collector DNS to Kinesis endpoint.
Correct Answer #
Option C.
Step-by-Step Winning Logic #
Option C represents the optimal trade-off across five critical dimensions:
- Migration Safety: AWS DMS provides continuous, low-risk replication with automatic retry and CDC (Change Data Capture), with far less setup risk than hand-configured native MySQL binlog replication.
- Workload Isolation: The Aurora Read Replica moves the read-heavy aggregation workload onto its own instance (Aurora replicas share the storage volume but not compute), eliminating the root cause of the lock contention.
- Protocol Compatibility: ALB supports HTTP/HTTPS natively and can invoke Lambda directly through a Lambda target group (Node.js collectors likely use REST APIs), whereas NLB operates at Layer 4 (TCP) and cannot use Lambda as a target.
- Serverless Economics: Lambda + RDS Proxy eliminates the operational overhead of managing EC2 fleets while providing connection pooling to absorb bursty write patterns efficiently.
- Clean Cutover Path: DNS redirection to the ALB is a standard, reversible operation, and the DMS task can be disabled post-validation without architectural rework.
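To make the winning path concrete, here is a minimal sketch of the collector endpoint, assuming collectors POST JSON telemetry over HTTP. Everything here (table name `telemetry`, field names, status codes) is illustrative rather than taken from the scenario, and the query function is injected so the handler logic is testable without a live database:

```javascript
"use strict";

// Hypothetical collector handler for the ALB -> Lambda -> RDS Proxy -> Aurora
// path. In production, `query` would wrap a mysql2 connection pool pointed at
// the RDS Proxy endpoint, created outside the handler for reuse across
// invocations.
function createCollectorHandler(query) {
  return async function handler(event) {
    // ALB Lambda target groups deliver the HTTP body as a string.
    let reading;
    try {
      reading = JSON.parse(event.body);
    } catch (_err) {
      return { statusCode: 400, body: JSON.stringify({ error: "body must be JSON" }) };
    }
    if (typeof reading.sensorId !== "string" || typeof reading.value !== "number") {
      return { statusCode: 422, body: JSON.stringify({ error: "sensorId and numeric value required" }) };
    }
    // Short single-row INSERT; RDS Proxy multiplexes these statements over a
    // pooled set of connections to the Aurora writer.
    await query(
      "INSERT INTO telemetry (sensor_id, value, recorded_at) VALUES (?, ?, NOW())",
      [reading.sensorId, reading.value]
    );
    return { statusCode: 202, body: JSON.stringify({ accepted: true }) };
  };
}

// Production wiring (sketch, not runnable here):
//   const pool = mysql.createPool({ host: process.env.PROXY_ENDPOINT, ... });
//   exports.handler = createCollectorHandler((sql, p) => pool.query(sql, p));
```

Keeping the pool outside the handler, combined with RDS Proxy, is what absorbs the connection churn of thousands of short-lived Lambda invocations per minute.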
💎 The Architect’s Deep Dive: Why Options Fail #
The Traps (Distractor Analysis) #
Why not Option A?
- Native Replication Complexity: Configuring Aurora as a direct replication target for on-premises MySQL requires manual binlog management and offers none of DMS's built-in monitoring, retry, or validation tooling (private connectivity via VPN/Direct Connect is needed in either design).
- Replica Promotion Anti-Pattern: Promoting the Read Replica to a standalone instance after disabling replication forks the data into two independent databases and forces application reconfiguration, violating the "no client changes" constraint.
- NLB Target Mismatch: NLB cannot use Lambda as a target at all (only ALB offers Lambda target groups), and if collectors speak HTTP(S), a Layer 4 NLB adds unnecessary application-layer handling on top.
Why not Option B?
- No Read/Write Isolation: Running aggregation jobs on the same Aurora primary instance that receives writes perpetuates the original contention problem—just on AWS infrastructure.
- EC2 Operational Overhead: Auto Scaling Groups require AMI management, patching, and scaling policies—significantly higher TCO than Lambda.
- Missed Serverless Opportunity: For a simple data ingestion API, EC2 is over-engineering.
Why not Option D?
- Kinesis Data Firehose Misuse: Firehose delivers batched data to destinations such as S3, Redshift, and OpenSearch; Aurora/MySQL is not a supported destination, so "Firehose to Aurora" would require custom plumbing. Its buffering (typically 60-900 seconds) also rules out near-real-time writes.
- No Migration Path: This option skips the migration phase entirely: there is no AWS DMS and no replication from on-premises. It is a greenfield rebuild disguised as a migration.
- Architectural Overthink: Kinesis adds per-shard-hour and per-PUT charges (on-demand mode bills per GB ingested) plus operational complexity, without solving the stated contention problem.
The Architect Blueprint #
```mermaid
graph TD
    subgraph On-Premises
        OnPremDB[(MySQL Database)]
    end
    subgraph AWS Cloud
        DMS[AWS DMS<br/>Continuous Replication]
        Aurora[(Aurora MySQL<br/>Primary)]
        ReadReplica[(Aurora Read Replica<br/>for Aggregation)]
        Lambda[Lambda Functions<br/>Data Collector API]
        RDSProxy[RDS Proxy<br/>Connection Pooling]
        ALB[Application Load Balancer]
    end
    Collectors[IoT Sensor Collectors] -->|DNS: collector.example.com| ALB
    ALB --> Lambda
    Lambda --> RDSProxy
    RDSProxy --> Aurora
    Aurora -->|Asynchronous Replication| ReadReplica
    AggregationJob[Reporting Aggregator] --> ReadReplica
    OnPremDB -->|CDC Stream| DMS
    DMS --> Aurora
    style Aurora fill:#FF9900,stroke:#232F3E,color:#fff
    style ReadReplica fill:#FF9900,stroke:#232F3E,color:#fff
    style Lambda fill:#FF9900,stroke:#232F3E,color:#fff
    style DMS fill:#3F8624,stroke:#232F3E,color:#fff
```
Diagram Note: Data flows from on-premises via DMS to Aurora primary, while collectors write through Lambda/RDS Proxy; aggregation jobs read from the isolated Read Replica, eliminating lock contention.
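For the DMS leg of the diagram, the task would carry a table mapping similar to the following sketch (the schema name `sensors` is an assumption; the scenario does not name one):

```json
{
  "rules": [
    {
      "rule-type": "selection",
      "rule-id": "1",
      "rule-name": "include-telemetry-tables",
      "object-locator": {
        "schema-name": "sensors",
        "table-name": "%"
      },
      "rule-action": "include"
    }
  ]
}
```

The task's migration type would be full-load-and-cdc, so the initial bulk copy is followed by continuous change capture until cutover.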
The Decision Matrix #
| Option | Migration Safety | Read/Write Isolation | Est. Monthly Cost | Operational Complexity | Protocol Fit | Cutover Risk |
|---|---|---|---|---|---|---|
| A | Medium (Native replication fragile) | ✅ Yes (Read Replica) | $620 (Aurora + Lambda + NLB) | High (Manual binlog setup, replica promotion logic) | ⚠️ NLB for HTTP is awkward | High (Replica promotion = split-brain risk) |
| B | ✅ High (AWS DMS) | ❌ No (Same primary instance) | $780 (Aurora + EC2 ASG + ALB) | Medium (EC2 patching required) | ✅ ALB native HTTP | Medium (Contention persists) |
| C | ✅ High (AWS DMS) | ✅ Yes (Read Replica) | $975 during migration, ~$835 after (Aurora + Lambda + RDS Proxy + ALB) | Low (Fully managed services) | ✅ ALB native HTTP | Low (Clean DNS cutover) |
| D | ❌ None (No migration, rebuild) | ✅ Yes (Read Replica) | $920 (Aurora + Kinesis + Firehose + Lambda) | Very High (Kinesis tuning, Firehose to MySQL custom connector) | ❌ Data stream ≠ API | Critical (No rollback path) |
Cost Breakdown (Option C - 10,000 writes/min, 100GB storage) #
- Aurora MySQL (db.r6g.large): ~$350/month
- Aurora Read Replica (db.r6g.large): ~$350/month (same instance type for aggregation workload)
- AWS Lambda: ~$45/month (assuming 25M invocations at 512 MB, 200 ms average; note that 10,000 writes/min unbatched would be ~432M invocations/month, so this figure assumes collectors batch readings)
- RDS Proxy: ~$45/month (1 proxy, ~50 connections)
- ALB: ~$25/month (minimal traffic)
- AWS DMS: ~$140/month (dms.c5.large during migration, disabled post-cutover)
- Data Transfer: ~$20/month
Total: ~$975/month during migration (drops to ~$835/month after disabling DMS)
Real-World Practitioner Insight #
Exam Rule #
“For SAP-C02, when you see ‘zero downtime migration’ + ‘resolve performance issue’ + ‘database workload’, always look for:
- AWS DMS (not native replication)
- Aurora Read Replica (for read/write separation)
- Serverless compute when workload is API-based
- ALB over NLB for HTTP/HTTPS traffic”
Real World #
In production, we would additionally:
- Add CloudWatch Alarms for DMS replication lag (alert if > 5 seconds)
- Implement Blue/Green DNS Weighted Routing (route 5% traffic to AWS first, validate, then 100%)
- Use AWS DMS Data Validation feature to detect row-level discrepancies
- Consider Aurora Serverless v2 for the Read Replica if aggregation workload is sporadic (cost optimization)
- Add AWS Secrets Manager for database credentials rotation (not shown in exam options but required for production)
- Evaluate AWS DMS Fleet Advisor to analyze on-premises database schema compatibility before migration
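The weighted-routing step above can be expressed as a Route 53 change batch; a sketch, assuming a hosted zone for example.com and that the on-premises endpoint has a routable DNS name (weights, names, and the ALB hostname are all illustrative):

```json
{
  "Comment": "Shift 5% of collector traffic to the AWS ALB",
  "Changes": [
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "collector.example.com",
        "Type": "CNAME",
        "SetIdentifier": "aws-alb",
        "Weight": 5,
        "TTL": 60,
        "ResourceRecords": [{ "Value": "my-alb-1234567890.us-east-1.elb.amazonaws.com" }]
      }
    },
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "collector.example.com",
        "Type": "CNAME",
        "SetIdentifier": "on-prem",
        "Weight": 95,
        "TTL": 60,
        "ResourceRecords": [{ "Value": "collector.onprem.example.com" }]
      }
    }
  ]
}
```

Applied via `aws route53 change-resource-record-sets --hosted-zone-id <zone-id> --change-batch file://shift.json`; raising the first weight and lowering the second completes the cutover, and reversing them is the rollback. (For an ALB, an alias A record is usually preferable to a CNAME; CNAME is shown here for symmetry with the on-premises target.)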
The exam simplifies this to test architectural decision-making logic, but real migrations require 4-6 week runbooks with rollback procedures.
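The replication-lag alarm from the list above maps to CloudWatch's AWS/DMS metrics; a sketch usable via `aws cloudwatch put-metric-alarm --cli-input-json file://dms-lag-alarm.json`, where the instance/task identifiers and SNS topic are placeholders:

```json
{
  "AlarmName": "dms-cdc-source-latency-high",
  "Namespace": "AWS/DMS",
  "MetricName": "CDCLatencySource",
  "Dimensions": [
    { "Name": "ReplicationInstanceIdentifier", "Value": "my-dms-instance" },
    { "Name": "ReplicationTaskIdentifier", "Value": "my-migration-task" }
  ],
  "Statistic": "Average",
  "Period": 60,
  "EvaluationPeriods": 5,
  "Threshold": 5,
  "ComparisonOperator": "GreaterThanThreshold",
  "AlarmActions": ["arn:aws:sns:us-east-1:123456789012:dms-alerts"]
}
```

A matching alarm on CDCLatencyTarget is worth adding as well, since source-side and target-side lag can diverge.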