While preparing for the AWS SAA-C03 exam, many candidates struggle with when to choose serverless, container, or EC2-based architectures. In the real world, this is fundamentally a trade-off between operational overhead and control, and between pay-per-execution and always-on costs. Let’s drill into a simulated scenario.
The Scenario #
DataPulse Analytics, a mid-sized financial data processing firm, operates a Python-based application that ingests JSON transaction reports from multiple banking partners. The application parses these documents, performs compliance validation, and writes structured records to a relational database for auditing purposes.
Currently, the application runs on a single on-premises server and executes approximately 3,500 processing jobs daily (roughly one every 25 seconds when averaged over a full day, though activity is concentrated in business hours with significant idle time overnight). The company’s cloud migration committee has mandated:
- Zero tolerance for data loss during migration
- Minimal human intervention for scaling and maintenance
- High availability across multiple data centers
- Cost efficiency given the bursty, intermittent workload pattern
The CTO has shortlisted four AWS migration architectures for evaluation.
Key Requirements #
Design a solution that maximizes availability and scalability while minimizing operational overhead, processing thousands of JSON documents daily.
The Options #
- A) Store JSON documents in Amazon S3. Deploy the Python application on multiple Amazon EC2 instances behind an Auto Scaling group to process documents. Store results in an Amazon Aurora database cluster.
- B) Store JSON documents in Amazon S3. Create an AWS Lambda function that triggers when documents are uploaded to S3, executing the Python processing code. Store results in an Amazon Aurora database cluster.
- C) Store JSON documents in an Amazon EBS volume. Use EBS Multi-Attach to connect the volume to multiple Amazon EC2 instances. Run the Python application on EC2 instances to process documents. Store results in an Amazon RDS database instance.
- D) Store JSON documents as messages in an Amazon SQS queue. Deploy the Python code as a Docker container running on an Amazon ECS cluster with the EC2 launch type. Have containers poll and process SQS messages. Store results in an Amazon RDS database instance.
Correct Answer #
Option B – S3 + Lambda + Aurora.
Step-by-Step Winning Logic #
This solution achieves the best trade-off across all four evaluation criteria:

- Operational Overhead = Near Zero
  - No server provisioning, patching, or scaling configuration required
  - AWS manages the Lambda runtime, scaling, and availability automatically
  - S3 and Aurora are fully managed services
- Scalability = Automatic & Elastic
  - Lambda scales per event: if 100 files upload simultaneously, up to 100 concurrent executions spin up (subject to account concurrency limits)
  - S3 event notifications provide native integration with no polling overhead
  - Aurora storage scales automatically, and Aurora Serverless v2 can scale compute to absorb write spikes
- Cost Efficiency = Pay-Per-Use
  - 3,500 daily executions ≈ 105,000 monthly invocations
  - The Lambda Free Tier covers 1M requests/month, so this workload incurs no request charges
  - Charges apply only for execution duration (likely under $5/month for sub-1 GB memory functions)
  - Zero compute cost during idle hours (nights, weekends)
- High Availability = Built-In
  - Lambda automatically runs across multiple AZs within a Region
  - An Aurora cluster provides multi-AZ replication with automatic failover
  - S3 offers 99.999999999% (11 nines) durability
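The winning path can be sketched as a minimal Lambda handler. The scenario doesn’t specify a document schema, so the field names and compliance rules below are illustrative assumptions, and the Aurora write is left as a stub:

```python
import json
from datetime import datetime, timezone

# Hypothetical compliance schema -- the scenario does not define one.
REQUIRED_FIELDS = {"transaction_id", "partner_id", "amount", "currency"}

def validate_transaction(doc: dict) -> dict:
    """Validate a parsed JSON transaction and return a structured record."""
    missing = REQUIRED_FIELDS - doc.keys()
    if missing:
        raise ValueError(f"missing required fields: {sorted(missing)}")
    if float(doc["amount"]) <= 0:
        raise ValueError("amount must be positive")
    return {
        "transaction_id": doc["transaction_id"],
        "partner_id": doc["partner_id"],
        "amount": float(doc["amount"]),
        "currency": doc["currency"],
        "validated_at": datetime.now(timezone.utc).isoformat(),
    }

def handler(event, context):
    """Lambda entry point for S3 ObjectCreated event notifications."""
    import boto3  # available in the Lambda runtime; imported lazily here
    s3 = boto3.client("s3")
    records = []
    for rec in event["Records"]:
        bucket = rec["s3"]["bucket"]["name"]
        key = rec["s3"]["object"]["key"]
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        records.append(validate_transaction(json.loads(body)))
    # write_to_aurora(records)  # via a DB driver or the RDS Data API (not shown)
    return {"processed": len(records)}
```

Splitting the pure validation logic out of the handler keeps it unit-testable without AWS credentials.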
💎 The Architect’s Deep Dive: Why Options Fail #
The Traps (Distractor Analysis) #
Why not Option A (EC2 Auto Scaling)?
- Operational overhead: Requires AMI management, patching, Auto Scaling configuration, and health checks
- Cost inefficiency: EC2 instances incur charges 24/7, even during idle overnight periods (likely $200-400/month for minimal multi-AZ setup)
- Scaling lag: Auto Scaling policies react to metrics with 1-5 minute delays; Lambda scales instantly per upload
Why not Option C (EBS Multi-Attach + EC2)?
- EBS Multi-Attach only supports Provisioned IOPS SSD volumes (io1/io2) attached to Nitro-based instances in the same Availability Zone
- No native event triggering: Requires custom polling logic to detect new files
- Single point of failure: EBS volumes are AZ-specific; multi-AZ requires complex replication
- Concurrency conflicts: Multi-Attach exposes a shared block device, not a shared file system; safe concurrent access from multiple EC2 instances requires a cluster-aware file system, which standard file systems (ext4, XFS) are not
- Cost: io2 Multi-Attach volumes cost significantly more than S3 Standard storage
Why not Option D (SQS + ECS with EC2 launch type)?
- Operational overhead: ECS cluster management, EC2 instance scaling, container image updates, and ECS task definition maintenance
- Cost: EC2 instances must run continuously to poll SQS (even if queue is empty), costing $150-300/month
- Over-engineering: Valid architecture for long-running, CPU-intensive tasks, but excessive complexity for simple JSON parsing
- Scaling complexity: Requires custom CloudWatch metrics to scale ECS tasks based on SQS queue depth
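To make that last point concrete, here is a rough sketch of the extra plumbing Option D implies: deriving a “backlog per task” number from SQS queue depth and publishing it as a custom CloudWatch metric for a target-tracking policy to scale on. The namespace and metric name are hypothetical:

```python
def backlog_per_task(visible_messages: int, running_tasks: int) -> float:
    """Messages waiting per running ECS task -- the custom metric a
    target-tracking scaling policy would act on."""
    return visible_messages / max(running_tasks, 1)

def publish_backlog_metric(queue_url: str, cluster: str, service: str) -> float:
    """Read queue depth and task count, then publish the ratio to CloudWatch.
    Requires AWS credentials; resource names here are placeholders."""
    import boto3  # imported lazily so the pure helper above is testable offline
    sqs = boto3.client("sqs")
    ecs = boto3.client("ecs")
    cw = boto3.client("cloudwatch")

    depth = int(sqs.get_queue_attributes(
        QueueUrl=queue_url,
        AttributeNames=["ApproximateNumberOfMessages"],
    )["Attributes"]["ApproximateNumberOfMessages"])
    tasks = len(ecs.list_tasks(cluster=cluster, serviceName=service)["taskArns"])

    value = backlog_per_task(depth, tasks)
    cw.put_metric_data(
        Namespace="Custom/DataPulse",  # hypothetical namespace
        MetricData=[{"MetricName": "BacklogPerTask", "Value": value}],
    )
    return value
```

All of this is code you must write, deploy, schedule, and monitor yourself; with Option B, none of it exists.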
The Architect Blueprint #
```mermaid
graph TD
    Partner[Banking Partners] -->|Upload JSON| S3[Amazon S3 Bucket]
    S3 -->|S3 Event Notification| Lambda["AWS Lambda Function<br/>Python Runtime"]
    Lambda -->|Parse and validate JSON| Lambda
    Lambda -->|Write Structured Records| Aurora[("Amazon Aurora<br/>Multi-AZ Cluster")]
    Lambda -->|Execution Logs| CW[CloudWatch Logs]
    Aurora -->|Automated Backups| S3Backup[S3 Backup Bucket]
    style Lambda fill:#FF9900,stroke:#232F3E,stroke-width:3px,color:#fff
    style S3 fill:#569A31,stroke:#232F3E,stroke-width:2px
    style Aurora fill:#527FFF,stroke:#232F3E,stroke-width:2px
```
Diagram Note: S3 event notifications invoke Lambda asynchronously upon file upload; Lambda processes the JSON and commits results to Aurora, with all logging automatically captured in CloudWatch Logs. Zero infrastructure management required.
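Wiring the S3-to-Lambda trigger is a one-time configuration. A minimal sketch using boto3, assuming a hypothetical bucket and function ARN (the Lambda also needs a resource-based policy allowing `s3.amazonaws.com` to invoke it, added via `lambda add-permission`):

```python
def build_notification_config(function_arn: str, suffix: str = ".json") -> dict:
    """Build the payload for S3's PutBucketNotificationConfiguration API,
    filtering to .json uploads so unrelated objects don't trigger the function."""
    return {
        "LambdaFunctionConfigurations": [{
            "LambdaFunctionArn": function_arn,
            "Events": ["s3:ObjectCreated:*"],
            "Filter": {"Key": {"FilterRules": [
                {"Name": "suffix", "Value": suffix},
            ]}},
        }]
    }

def attach_trigger(bucket: str, function_arn: str) -> None:
    """Apply the notification configuration to the bucket."""
    import boto3  # needs credentials with s3:PutBucketNotification
    boto3.client("s3").put_bucket_notification_configuration(
        Bucket=bucket,
        NotificationConfiguration=build_notification_config(function_arn),
    )
```

In production you would more likely express this in CloudFormation, SAM, or Terraform, but the resulting configuration is the same.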
Real-World Practitioner Insight #
Exam Rule #
For the SAA-C03 exam, when you see:
- “Minimal operational overhead” + event-driven workload → Choose Lambda
- “Run thousands of times daily” (not continuously) → Serverless over always-on compute
- “High availability” without additional cost constraints → Aurora over single-AZ RDS
Real World #
In production environments at DataPulse Analytics, we would likely:
- Add an SQS queue between S3 and Lambda for:
  - Throttling protection: if partners suddenly upload 10,000 files, SQS buffers the load
  - Retry handling: failed messages are retried after the visibility timeout, and repeated failures land in a dead-letter queue
  - Cost control: set Lambda reserved concurrency limits to cap costs during unexpected spikes
- Use Aurora Serverless v2 instead of provisioned Aurora:
  - Auto-scales database capacity based on actual load (ACUs scale in 0.5 increments)
  - Reduces costs during idle periods (nights/weekends) by scaling down to the minimum ACUs
  - The total solution becomes fully serverless end-to-end
- Implement S3 Intelligent-Tiering:
  - Automatically moves infrequently accessed JSON files to cheaper storage tiers
  - Compliance may mandate 7-year retention; tiering can save 70%+ on rarely accessed files
- Consider Provisioned Concurrency (or Lambda SnapStart, for Java):
  - If Python cold starts become an issue (unlikely for this workload)
  - Provisioned Concurrency keeps functions “warm” during business hours
- Monitor Lambda duration carefully:
  - If processing time approaches 15 minutes (Lambda’s maximum), refactor to:
    - AWS Batch for longer jobs
    - Step Functions to orchestrate multi-stage processing
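With an SQS queue between S3 and Lambda, the handler shape changes: each SQS record body wraps an S3 event notification, and partial batch failures are reported so only failed messages are retried (this requires enabling `ReportBatchItemFailures` on the event source mapping). A sketch with illustrative names, the per-object work stubbed out:

```python
import json

def sqs_handler(event, context):
    """Lambda entry point when SQS buffers S3 event notifications.
    Returns failed message IDs so SQS retries only those messages."""
    failures = []
    for msg in event["Records"]:
        try:
            s3_event = json.loads(msg["body"])
            for rec in s3_event.get("Records", []):
                bucket = rec["s3"]["bucket"]["name"]
                key = rec["s3"]["object"]["key"]
                # process_object(bucket, key)  # fetch, validate, write to Aurora (stub)
        except Exception:
            # Broad catch is deliberate: any failure marks this message for retry.
            failures.append({"itemIdentifier": msg["messageId"]})
    return {"batchItemFailures": failures}
```

After the queue's `maxReceiveCount` is exceeded, the message moves to the dead-letter queue for inspection instead of retrying forever.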
The exam tests your ability to identify the simplest, most managed solution. Real-world production adds layers of resilience and cost optimization—but the foundation (S3 + Lambda + Aurora) remains correct for this use case.