Event-Driven S3 Processing Decision | SAA-C03

Jeff Taakey
Author
21+ Year Enterprise Architect | Multi-Cloud Architect & Strategist.

While preparing for the AWS SAA-C03, many candidates get confused by event-driven architecture patterns. In the real world, this is fundamentally a decision about operational overhead vs. cost efficiency for unpredictable workloads. Let’s drill into a simulated scenario.

The Scenario

TechMetrics Analytics, a financial data aggregation startup, is building a user-facing platform where retail investors upload bank statements and transaction records (CSV, PDF, or TXT) to extract spending patterns. Files range from 50 KB to 5 MB.

Once uploaded, each file must undergo a one-time transformation—parsing text, normalizing data, and outputting structured JSON for downstream ML analysis.

Traffic is highly unpredictable: During tax season (March-April), users upload 50,000+ files daily. In off-peak months, uploads drop to fewer than 500/day or even zero on weekends.

The engineering team has no dedicated DevOps staff and wants to avoid managing clusters or auto-scaling policies.

Key Requirements

Design a solution that:

  1. Processes files as soon as they’re uploaded (near real-time).
  2. Handles extreme variability in upload volume (at least a 100x difference between peak and off-peak, based on 50,000+ vs. fewer than 500 files per day).
  3. Minimizes operational overhead (no server patching, scaling configuration, or cluster management).
  4. Stores transformed JSON for querying by the analytics engine.
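
To put the variability requirement in numbers, here is a small back-of-the-envelope sketch. The daily volumes come from the scenario; the 2-second processing time and the 20x burst factor are assumptions chosen purely for illustration, and the function names are hypothetical.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60


def peak_to_offpeak_ratio(peak_per_day: int, offpeak_per_day: int) -> float:
    """How many times busier the peak day is than the off-peak day."""
    return peak_per_day / offpeak_per_day


def average_rate_per_second(files_per_day: int) -> float:
    """Average arrival rate if uploads were spread evenly over the day."""
    return files_per_day / SECONDS_PER_DAY


def workers_needed(files_per_day: int, seconds_per_file: float,
                   burst_factor: float = 1.0) -> int:
    """Little's law: concurrency = arrival rate x time spent per item.

    burst_factor scales the average rate to approximate a busy moment
    (assumed value, not taken from the scenario).
    """
    rate = average_rate_per_second(files_per_day) * burst_factor
    return math.ceil(rate * seconds_per_file)


if __name__ == "__main__":
    print(peak_to_offpeak_ratio(50_000, 500))                # peak vs. off-peak
    print(round(average_rate_per_second(50_000), 2))         # files per second at peak
    print(workers_needed(50_000, 2.0, burst_factor=20))      # busy moment in tax season
    print(workers_needed(500, 2.0))                          # off-peak day
```

Under these assumptions, even a bursty tax-season day needs only a few dozen concurrent workers, while an off-peak day needs at most one. The hard part of the requirement is therefore not raw throughput but paying nothing when the rate falls to zero.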

The Options

  • A) Deploy an Amazon EMR cluster to read files from S3. Run Spark jobs to transform data. Store JSON output in Amazon Aurora.
  • B) Configure S3 Event Notifications to send messages to SQS. Use EC2 instances (with Auto Scaling) to poll the queue and process files. Store JSON in DynamoDB.
  • C) Configure S3 Event Notifications to send messages to SQS. Use AWS Lambda to poll the queue and process files. Store JSON in DynamoDB.
  • D) Use Amazon EventBridge to capture S3 upload events and forward them to Kinesis Data Streams. Use Lambda to consume the stream and process files. Store JSON in Amazon Aurora.

Correct Answer

Option C – S3 Event Notifications → SQS → Lambda → DynamoDB.

Step-by-Step Winning Logic

This solution embodies the “serverless-first” principle for event-driven, variable workloads:

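  1. S3 Event Notifications fire when files land, typically within seconds, so nothing has to poll the bucket.
  2. SQS acts as a buffer, decoupling S3 from processing. If a Lambda invocation fails, the message becomes visible again after the visibility timeout and is retried.
  3. Lambda scales automatically from zero up to the account's concurrency quota (typically 1,000 concurrent executions per Region by default, and it can be raised) with no scaling configuration, which suits the traffic swings in this scenario.
  4. DynamoDB (on-demand mode) scales writes automatically and charges only for actual usage, so no capacity sits idle during off-peak periods.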
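
The four steps above can be sketched as a single Lambda handler. This is a minimal illustration, not the author's implementation: `transform`, `process_batch`, and the `TABLE_NAME` environment variable are hypothetical names, the placeholder transform only counts lines, and it handles text formats only (PDF parsing would need a separate library). Returning `batchItemFailures` assumes that `ReportBatchItemFailures` is enabled on the SQS event source mapping.

```python
import json
from urllib.parse import unquote_plus


def extract_s3_objects(sqs_record: dict) -> list[tuple[str, str]]:
    """Return (bucket, key) pairs from one SQS record that wraps an S3 event."""
    body = json.loads(sqs_record["body"])
    pairs = []
    # The s3:TestEvent message sent when notifications are configured
    # has no "Records" key, so default to an empty list.
    for s3_record in body.get("Records", []):
        bucket = s3_record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded (a space becomes "+").
        key = unquote_plus(s3_record["s3"]["object"]["key"])
        pairs.append((bucket, key))
    return pairs


def transform(raw_text: str) -> dict:
    """Placeholder for the real parsing/normalization logic (hypothetical)."""
    lines = [line for line in raw_text.splitlines() if line.strip()]
    return {"line_count": len(lines)}


def process_batch(event: dict, read_object, write_item) -> dict:
    """Process every SQS record; report failed message IDs so only those retry."""
    failures = []
    for record in event.get("Records", []):
        try:
            for bucket, key in extract_s3_objects(record):
                item = transform(read_object(bucket, key))
                item["file_key"] = f"{bucket}/{key}"
                write_item(item)
        except Exception:
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}


def lambda_handler(event, context):
    # Imported here so the pure logic above can be tested without boto3.
    # In production, create the clients once at module scope instead.
    import os
    import boto3

    s3 = boto3.client("s3")
    table = boto3.resource("dynamodb").Table(os.environ["TABLE_NAME"])

    def read_object(bucket: str, key: str) -> str:
        return s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")

    return process_batch(event, read_object, lambda item: table.put_item(Item=item))
```

Keeping the AWS calls behind `read_object` and `write_item` lets the parsing and failure handling be exercised locally with plain functions. Messages that keep failing return to the queue and are retried until the queue's redrive policy, if one is set, moves them aside.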

Key Advantage: Zero operational overhead—no servers to patch, no clusters to tune, no auto-scaling policies to debug.

Cost Efficiency: As a rough estimate, during off-peak (500 files/day) Lambda might cost $5-10/month, while an EMR cluster (Option A) would burn $300-500/month even when idle.


💎 Professional-Level Analysis

This section breaks down the scenario from a professional exam perspective, focusing on constraints, trade-offs, and the decision signals used to eliminate incorrect options.

🔐 Expert Deep Dive: Why Options Fail

This walkthrough explains how the exam expects you to reason through the scenario step by step, highlighting the constraints and trade-offs that invalidate each incorrect option.


🔐 The Traps (Distractor Analysis)

This section explains why each incorrect option looks reasonable at first glance, and the specific assumptions or constraints that ultimately make it fail.

The difference between the correct answer and the distractors comes down to one decision assumption most candidates overlook.

Why not Option A (EMR + Aurora)?

  • Massive overkill: EMR is designed for big data analytics (terabytes of processing), not simple file transformations.
  • Operational burden: Requires cluster management, version upgrades, and cost optimization tuning.
  • Cost disaster: Even a small EMR cluster running 24/7 costs $200-400/month, regardless of workload.
  • Aurora is a poor fit: A relational database adds schema and capacity management for what is essentially key-based JSON storage, which DynamoDB handles natively.

Verdict: Over-engineered and prohibitively expensive for this use case.

Why not Option B (EC2 + SQS + DynamoDB)?

  • Better than Option A, but still requires:
    • Configuring Auto Scaling Groups (ASG) with scaling policies.
    • Managing EC2 AMIs, patching OS/dependencies.
    • Paying for minimum instance count even during zero-traffic periods (idle cost).
  • Hidden costs: Even with t3.micro instances, running 2 instances 24/7 = ~$15/month idle cost. Lambda charges $0 when idle.
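
The idle-cost figure above is simple arithmetic, sketched here for checking. The hourly rates are assumptions (a t3.micro on-demand rate of about $0.0104/hour, which varies by Region and over time, and the $0.015/hour Kinesis shard rate quoted later for Option D), and `monthly_idle_cost` is a hypothetical helper.

```python
HOURS_PER_MONTH = 730  # common approximation: 8,760 hours per year / 12


def monthly_idle_cost(hourly_rate_usd: float, units: int,
                      hours: int = HOURS_PER_MONTH) -> float:
    """Cost of keeping `units` always-on resources running for a month."""
    return round(hourly_rate_usd * units * hours, 2)


# Assumed rates, for illustration only.
T3_MICRO_HOURLY_USD = 0.0104
KINESIS_SHARD_HOURLY_USD = 0.015

if __name__ == "__main__":
    print(monthly_idle_cost(T3_MICRO_HOURLY_USD, units=2))       # two idle EC2 workers
    print(monthly_idle_cost(KINESIS_SHARD_HOURLY_USD, units=1))  # one provisioned shard
```

Any resource billed by the hour has a cost floor like this, no matter how few files arrive. Lambda has no equivalent floor, because it bills per request and per unit of compute time.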

Verdict: Violates the “minimal operational overhead” requirement.

Why not Option D (EventBridge + Kinesis + Lambda + Aurora)?

  • Kinesis is overkill: Kinesis Data Streams is for real-time streaming analytics with multiple consumers (e.g., dashboards + ML pipelines). This scenario has one consumer (file transformation).
  • Unnecessary cost: In provisioned mode, Kinesis costs $0.015/hour per shard (about $11/month minimum), even with zero traffic.
  • EventBridge adds complexity: S3 already supports native event notifications to SQS—why add an extra hop?
  • Aurora misalignment: Relational DB is wrong for storing JSON documents (DynamoDB excels here).
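
To show how little wiring the native S3-to-SQS path needs, here is a sketch that builds the notification configuration with one suffix filter per file type. The `Id` naming scheme and the helper names are hypothetical. It also assumes the queue's access policy already allows the S3 service to send messages, which is required but not shown.

```python
def build_queue_notification(queue_arn: str, suffixes: list[str]) -> dict:
    """Build a NotificationConfiguration that sends object-created events to SQS.

    Each queue configuration accepts at most one suffix rule, so every
    file type gets its own entry.
    """
    return {
        "QueueConfigurations": [
            {
                "Id": f"uploads-{suffix.lstrip('.')}",
                "QueueArn": queue_arn,
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {
                    "Key": {"FilterRules": [{"Name": "suffix", "Value": suffix}]}
                },
            }
            for suffix in suffixes
        ]
    }


def apply_notification(bucket: str, config: dict) -> None:
    """Apply the configuration; this call replaces any existing one on the bucket."""
    import boto3  # imported lazily so the builder above has no dependencies

    boto3.client("s3").put_bucket_notification_configuration(
        Bucket=bucket, NotificationConfiguration=config
    )


if __name__ == "__main__":
    # Placeholder ARN for illustration only.
    arn = "arn:aws:sqs:us-east-1:123456789012:uploads-queue"
    print(build_queue_notification(arn, [".csv", ".pdf", ".txt"]))
```

Routing through EventBridge would mean enabling EventBridge delivery on the bucket plus a rule and a target, which buys flexibility this single-consumer scenario does not need.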

Verdict: Architecturally sound but unnecessarily complex and costly.

🔐 The Solution Blueprint

This blueprint visualizes the expected solution, showing how services interact and which architectural pattern the exam is testing.

Seeing the full solution end to end often makes the trade-offs—and the failure points of simpler options—immediately clear.

graph TD
    User([Retail Investor]) -->|Uploads CSV/PDF| S3[Amazon S3 Bucket]
    S3 -->|S3 Event Notification| SQS[Amazon SQS Queue]
    SQS -->|Triggers| Lambda[AWS Lambda Function]
    Lambda -->|1. Reads file from S3| S3
    Lambda -->|2. Transforms to JSON| Processing[Data Normalization Logic]
    Processing -->|3. Writes JSON| DynamoDB[(Amazon DynamoDB)]
    DynamoDB -->|Queried by| Analytics[Analytics Engine]
    
    style Lambda fill:#FF9900,stroke:#232F3E,stroke-width:3px,color:#fff
    style SQS fill:#FF4F8B,stroke:#232F3E,stroke-width:2px,color:#fff
    style DynamoDB fill:#4053D6,stroke:#232F3E,stroke-width:2px,color:#fff

Diagram Note: S3 events trigger SQS messages, Lambda polls the queue (automatic scaling), processes files, and writes JSON to DynamoDB—fully serverless with zero idle resources.

🔐 Real-World Practitioner Insight

This section connects the exam scenario to real production environments, highlighting how similar decisions are made—and often misjudged—in practice.

This is the kind of decision that frequently looks correct on paper, but creates long-term friction once deployed in production.

Exam Rule

For the SAA-C03 exam, when you see:

  • “variable/unpredictable workload” + “minimal operational overhead” → Always choose Lambda.
  • “event-driven processing” + “S3 uploads” → Use S3 Event Notifications (not EventBridge unless multi-target routing is required).
  • “simple transformation” → Avoid EMR/Kinesis (reserved for big data/streaming analytics).

Real World
#

In production, I’d add:

  1. Dead Letter Queue (DLQ) on SQS to capture failed processing attempts.
  2. CloudWatch Alarms on Lambda errors and SQS message age.
  3. S3 Lifecycle Policies to archive raw files to S3 Glacier after 90 days (FinOps optimization).
  4. DynamoDB Global Tables if the analytics engine needs low-latency access to the data from more than one Region.
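
The first three hardening items can be expressed as small configuration builders. This is a sketch under assumptions: the retry count of 5, the 15-minute age threshold, and the rule and alarm names are illustrative values, not recommendations from the scenario. Each function returns the payload for the boto3 call named in its docstring.

```python
import json


def redrive_policy(dlq_arn: str, max_receive_count: int = 5) -> dict:
    """Attributes for sqs.set_queue_attributes: move a message to the DLQ
    after it has been received max_receive_count times without success."""
    policy = {
        "deadLetterTargetArn": dlq_arn,
        "maxReceiveCount": str(max_receive_count),
    }
    # Queue attributes are strings, so the policy is passed as a JSON string.
    return {"RedrivePolicy": json.dumps(policy)}


def queue_age_alarm(queue_name: str, max_age_seconds: int = 900) -> dict:
    """Keyword arguments for cloudwatch.put_metric_alarm: fire when the
    oldest message in the queue has waited longer than max_age_seconds."""
    return {
        "AlarmName": f"{queue_name}-oldest-message-age",
        "Namespace": "AWS/SQS",
        "MetricName": "ApproximateAgeOfOldestMessage",
        "Dimensions": [{"Name": "QueueName", "Value": queue_name}],
        "Statistic": "Maximum",
        "Period": 300,
        "EvaluationPeriods": 1,
        "Threshold": max_age_seconds,
        "ComparisonOperator": "GreaterThanThreshold",
    }


def archive_rule(days: int = 90) -> dict:
    """LifecycleConfiguration for s3.put_bucket_lifecycle_configuration:
    transition every object to the GLACIER storage class after `days`."""
    return {
        "Rules": [
            {
                "ID": "archive-raw-uploads",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # empty prefix matches all objects
                "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
            }
        ]
    }
```

A growing message age is often the earliest sign that processing has stalled, which is why that alarm sits alongside the DLQ rather than replacing it.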

Cost Reality Check: For a startup processing 10,000 files/month:

  • Option C (Lambda + DynamoDB): ~$50-80/month.
  • Option B (EC2): ~$100-150/month (plus DevOps time).
  • Option A (EMR): ~$500-800/month (financial suicide for a startup).
