While preparing for the SAA-C03, many candidates get confused about when to use SNS, SQS, or Kinesis for event intake. In the real world, this is fundamentally a decision about durability and fan-out versus operational overhead. Let’s drill into a simulated scenario.
The Scenario #
A mid-sized insurtech company, Harborline Coverage, runs a web app where customers request insurance quotes. Each request must be routed to the correct underwriting workflow based on quote type (e.g., Auto, Home, Travel).
The platform team has strict expectations:
- quote requests must be durably captured (no loss),
- every request must receive a response within 24 hours,
- the architecture should maximize operational efficiency and minimize maintenance as the team is small.
Key Requirements #
- Reliability: No lost quote requests (durable buffering).
- Routing: Requests must be separated by quote type for downstream processing.
- Time window: Processing can be asynchronous but must complete within 24 hours.
- Ops goal: Prefer managed services; avoid fleets, shard math, and custom consumers.
The Options #
- A) Create multiple Amazon Kinesis Data Streams by quote type. The web app publishes to the correct stream. Each backend consumer group uses KCL to pull from its stream.
- B) For each quote type, create an AWS Lambda function and an Amazon SNS topic. Subscribe the Lambda to the SNS topic. The app publishes to the appropriate topic.
- C) Create one Amazon SNS topic and subscribe multiple Amazon SQS queues. Use SNS message filtering so messages are routed to the correct SQS queue by quote type. Each backend service consumes from its own queue.
- D) Create multiple Kinesis Data Firehose delivery streams by quote type into an Amazon OpenSearch Service cluster. Backends query OpenSearch to find and process messages.
Correct Answer #
C
The Winning Logic #
SNS + SQS with message filtering is the cleanest, lowest-ops way to achieve durable ingestion + type-based routing:
- No-loss buffering: SQS provides durable storage of messages until a consumer successfully processes them.
- Built-in routing: SNS message filtering routes events to the correct SQS queue without maintaining multiple ingestion pipelines.
- Asynchronous by design: The 24-hour requirement is naturally met by queue-based processing, with retries and backpressure handling.
- Operational efficiency: No shard scaling (Kinesis), no consumer checkpoint management (KCL), and no “search your queue” anti-pattern (OpenSearch).
FinOps note: This design usually yields a predictable, usage-based cost profile—pay per request and queue usage—without standing up always-on analytics/search infrastructure.
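To make the routing step concrete, here is a minimal sketch of how SNS message filtering behaves. The queue names and the `matching_queues` helper are hypothetical, and this simulates only exact string matching on message attributes; real SNS filter policies support a richer grammar (anything-but, prefix, numeric ranges, and so on).

```python
# Simplified simulation of SNS message filtering: each subscription's
# filter policy lists the attribute values it accepts. Real SNS supports
# more operators; this sketch covers exact string matching only.

FILTER_POLICIES = {
    "auto-queue":   {"quoteType": ["Auto"]},
    "home-queue":   {"quoteType": ["Home"]},
    "travel-queue": {"quoteType": ["Travel"]},
}

def matching_queues(message_attributes: dict) -> list[str]:
    """Return the queues whose filter policy matches the message."""
    matches = []
    for queue, policy in FILTER_POLICIES.items():
        if all(message_attributes.get(attr) in allowed
               for attr, allowed in policy.items()):
            matches.append(queue)
    return matches

# A quote request published once to the topic fans out to exactly
# the queue whose policy matches its quoteType attribute.
print(matching_queues({"quoteType": "Home"}))    # ['home-queue']
print(matching_queues({"quoteType": "Marine"}))  # [] -> dropped by SNS
```

Note the key property the exam is testing: the publisher sends once, with an attribute, and routing lives in subscription config rather than in application code.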
💎 Professional-Level Analysis #
This section breaks down the scenario from a professional exam perspective, focusing on constraints, trade-offs, and the decision signals used to eliminate incorrect options.
🔐 Expert Deep Dive: Why Options Fail #
This walkthrough explains how the exam expects you to reason through the scenario step by step, highlighting the constraints and trade-offs that invalidate each incorrect option.
Prefer a quick walkthrough before diving deep?
[Video coming soon] This short walkthrough video explains the core scenario, the key trade-off being tested, and why the correct option stands out, so you can follow the deeper analysis with clarity.
🔐 The Traps (Distractor Analysis) #
This section explains why each incorrect option looks reasonable at first glance, and the specific assumptions or constraints that ultimately make it fail.
The difference between the correct answer and the distractors comes down to one decision assumption most candidates overlook.
- Why not A (Kinesis Data Streams + KCL)?
  Works technically, but it’s overkill here and increases ops: shard sizing, scaling policies, enhanced fan-out considerations, consumer checkpointing, and per-stream management per quote type. This violates “minimize maintenance.”
- Why not B (SNS → Lambda per type)?
  SNS to Lambda is simple, but the requirement says requests must not be lost. SNS-to-Lambda alone can fail deliveries under certain error modes unless you add durable buffering (e.g., SQS as an intermediary) or configure robust failure handling (DLQs). Also, creating a separate topic + Lambda per type increases sprawl.
- Why not D (Firehose → OpenSearch, then search to process)?
  Firehose is for delivery to analytics/storage destinations, not for building a reliable work queue. OpenSearch is not a task queue, and using search queries to discover “unprocessed” work creates correctness and concurrency problems (duplicate work, missed work) plus high cost and ops burden.
🔐 The Solution Blueprint #
This blueprint visualizes the expected solution, showing how services interact and which architectural pattern the exam is testing.
Seeing the full solution end to end often makes the trade-offs—and the failure points of simpler options—immediately clear.
```mermaid
graph TD
    U[Customers on Web App] --> A[Quote API]
    A -->|Publish event w/ attribute: quoteType| SNS[Amazon SNS Topic]
    SNS -->|Filter: Auto| Q1[Amazon SQS Queue - Auto]
    SNS -->|Filter: Home| Q2[Amazon SQS Queue - Home]
    SNS -->|Filter: Travel| Q3[Amazon SQS Queue - Travel]
    Q1 --> C1[Auto Processing Service]
    Q2 --> C2[Home Processing Service]
    Q3 --> C3[Travel Processing Service]
```
- Diagram Note: The API publishes once to SNS, SNS filters by quoteType, and each workflow consumes reliably from its own SQS queue.
🔐 Real-World Practitioner Insight #
This section connects the exam scenario to real production environments, highlighting how similar decisions are made—and often misjudged—in practice.
This is the kind of decision that frequently looks correct on paper, but creates long-term friction once deployed in production.
Exam Rule #
When you see “route messages by attribute” + “no loss” + “low ops”, the exam-friendly pattern is SNS + SQS with SNS message filtering.
Real World #
In production, you’d typically add:
- DLQs on each SQS queue for poison messages,
- visibility timeout tuned to processing time,
- idempotency keys in the processors to handle retries safely,
- and potentially FIFO queues if strict ordering per customer/policy is required (not stated here).
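The idempotency point deserves a concrete illustration, since SQS standard queues are at-least-once: the same quote request can be delivered more than once, and the processor must make reprocessing a no-op. This is a minimal in-memory sketch; the function name and message shape are assumptions, and in production the processed-key set would live in a durable store such as DynamoDB with a conditional put, not in process memory.

```python
# Hedged sketch: idempotent handling of (possibly re-delivered) SQS
# messages. Recording an idempotency key makes a retry a safe no-op.

processed_keys: set[str] = set()
results: list[str] = []

def handle_quote(message: dict) -> bool:
    """Process a quote request once; return False for duplicate deliveries."""
    key = message["idempotencyKey"]
    if key in processed_keys:
        return False            # retry/duplicate delivery: safely ignored
    processed_keys.add(key)
    results.append(f"quoted:{message['quoteId']}")
    return True

msg = {"idempotencyKey": "q-123", "quoteId": "123"}
handle_quote(msg)   # first delivery -> processed
handle_quote(msg)   # redelivery    -> skipped
print(results)      # ['quoted:123']
```

Paired with a DLQ and a visibility timeout tuned to actual processing time, this is what turns "at-least-once delivery" into "exactly-once effect" in practice.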