While preparing for the AWS SAA-C03, many candidates get confused by data transfer service selection. In the real world, this is fundamentally a decision about Reliability vs. Cost vs. Security. Let’s drill into a simulated scenario.
The Scenario #
MicroPrecision Manufacturing operates a semiconductor fabrication facility producing microchips. Their production floor hosts 40+ precision measurement devices that generate approximately 10TB of telemetry data daily in JSON format. Currently, this data resides on a Storage Area Network (SAN) within their on-premises data center adjacent to the factory floor.
The engineering leadership wants to centralize this data in AWS S3 to enable multiple downstream systems—including their real-time quality control platform, predictive maintenance engine, and compliance reporting tool—to access the data with minimal latency.
Critical constraint: The sensor data contains proprietary manufacturing parameters and defect signatures considered trade secrets. Exposure during transmission could lead to competitive disadvantage and regulatory penalties.
Key Requirements #
Select the most reliable solution to transfer 10TB of sensitive JSON files daily from the on-premises SAN to Amazon S3 while ensuring secure transmission.
The Options #
- A) AWS DataSync over the public internet
- B) AWS DataSync over AWS Direct Connect
- C) AWS Database Migration Service (DMS) over the public internet
- D) AWS Database Migration Service (DMS) over AWS Direct Connect
Correct Answer #
Option B: AWS DataSync over AWS Direct Connect.
Step-by-Step Winning Logic #
This solution optimally balances reliability, security, and purpose-built functionality:
- Service Selection (DataSync vs. DMS):
  - AWS DataSync is purpose-built for transferring large volumes of file/object data between on-premises storage and AWS storage services (S3, EFS, FSx)
  - It handles network optimization, parallel transfers, data validation, and incremental transfers automatically
  - AWS DMS is designed for database migration and continuous replication, not file-based JSON data stored on SAN
- Network Path (Direct Connect vs. Public Internet):
  - AWS Direct Connect provides a dedicated, private network connection from on-premises to AWS
  - Reliability: Consistent bandwidth and latency (no “best effort” variability of internet), with an availability SLA whose terms depend on how redundantly the connection is deployed
  - Security: Traffic never traverses the public internet, reducing attack surface (though encryption is still recommended)
  - Predictability: For 10TB daily transfers, dedicated bandwidth makes it far easier to keep data arriving in S3 with the minimal latency the downstream systems expect
- Why This Matters for SAA-C03:
  - The exam tests whether you understand service purpose (DataSync for files, DMS for databases)
  - It evaluates your grasp of security best practices (private connectivity for sensitive data)
  - It validates reliability thinking (dedicated vs. shared network paths)
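The Predictability point can be checked with quick arithmetic. The sketch below estimates the sustained rate that 10TB/day implies and how long a day's data would take on a 1Gbps versus a 10Gbps link. It assumes decimal units (1TB = 10^12 bytes), ignores protocol overhead, and uses an assumed 80% usable fraction of line rate; none of these figures come from the scenario itself.

```python
# Back-of-the-envelope sizing for the daily transfer.
# Assumptions (not from the scenario): decimal units, no protocol overhead,
# and an 80% usable fraction of the link's line rate.

SECONDS_PER_DAY = 24 * 60 * 60


def required_mbps(terabytes_per_day: float) -> float:
    """Sustained megabits per second needed to move the daily volume in 24 hours."""
    bits = terabytes_per_day * 10**12 * 8
    return bits / SECONDS_PER_DAY / 10**6


def hours_to_transfer(terabytes: float, link_gbps: float, utilization: float = 0.8) -> float:
    """Hours needed to move the data over a link at the assumed utilization."""
    bits = terabytes * 10**12 * 8
    usable_bits_per_second = link_gbps * 10**9 * utilization
    return bits / usable_bits_per_second / 3600


if __name__ == "__main__":
    print(f"Sustained rate needed: {required_mbps(10):.0f} Mbps")
    for gbps in (1, 10):
        print(f"{gbps:>2} Gbps link: {hours_to_transfer(10, gbps):.1f} h per 10 TB")
```

Under these assumptions, a 1Gbps link needs more than 24 hours to move one day's data, so it would fall behind, while a 10Gbps link leaves ample headroom.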
💎 The Architect’s Deep Dive: Why Options Fail #
The Traps (Distractor Analysis) #
- Why not Option A (DataSync over public internet)?
  - Correct service, wrong network path. While DataSync encrypts data in transit, the public internet introduces:
    - Variable latency and bandwidth (a threat to the minimal-latency requirement)
    - Higher security risk profile (data traverses shared infrastructure)
    - Unpredictable transfer times for 10TB (could impact downstream analytics SLAs)
- Why not Option C (DMS over public internet)?
  - Wrong service entirely. AWS DMS is optimized for:
    - Database migration and replication
    - Change Data Capture (CDC) for ongoing database synchronization
  - It is not designed to move file-based data such as JSON files on SAN storage
- Why not Option D (DMS over Direct Connect)?
  - Right network, wrong service. Same fundamental mismatch as Option C: a database migration tool is a poor fit for a file transfer job, and a dedicated link underneath does not change that
The Architect Blueprint #
graph TB
    A[On-Premises SAN<br/>10TB JSON/day] -->|DataSync Agent| B[AWS Direct Connect<br/>Dedicated 10Gbps]
    B -->|Private VIF| C[AWS DataSync Service]
    C -->|Optimized Transfer<br/>Validation & Encryption| D[Amazon S3 Bucket]
    D --> E1[Quality Control System]
    D --> E2[Predictive Maintenance]
    D --> E3[Compliance Reporting]
    style B fill:#FF9900,stroke:#232F3E,stroke-width:3px,color:#fff
    style C fill:#FF9900,stroke:#232F3E,stroke-width:2px,color:#fff
    style D fill:#569A31,stroke:#232F3E,stroke-width:2px,color:#fff
Diagram Note: DataSync agent installed on-premises pulls data from SAN, transfers via Direct Connect’s private virtual interface to AWS DataSync service, which optimizes upload to S3 where downstream analytics systems consume the data.
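To make the flow in the diagram more concrete, here is a minimal sketch of the parameters one might pass to the `create_task` call of boto3's DataSync client. It assumes the SAN data is exposed to the DataSync agent through a file protocol such as NFS or SMB, and that source and destination locations have already been created. The ARNs, task name, and hourly schedule are hypothetical placeholders, not values from the scenario.

```python
# Sketch of request parameters for boto3's DataSync client (create_task).
# The task name and schedule are illustrative assumptions; the location ARNs
# are supplied by the caller and would come from previously created locations.


def build_task_params(source_location_arn: str, dest_location_arn: str) -> dict:
    """Keyword arguments one might pass to datasync.create_task(**params)."""
    return {
        "SourceLocationArn": source_location_arn,    # e.g. a file share in front of the SAN
        "DestinationLocationArn": dest_location_arn,  # the S3 bucket location
        "Name": "daily-telemetry-to-s3",              # hypothetical task name
        "Options": {
            "TransferMode": "CHANGED",                # incremental: only new or changed files
            "VerifyMode": "ONLY_FILES_TRANSFERRED",   # verify integrity of what was copied
            "OverwriteMode": "ALWAYS",
        },
        # Assumed cadence: run at the top of every hour.
        "Schedule": {"ScheduleExpression": "cron(0 * * * ? *)"},
    }


# Usage (requires AWS credentials and existing locations):
#   import boto3
#   datasync = boto3.client("datasync")
#   datasync.create_task(**build_task_params(src_arn, dst_arn))
```

The incremental transfer mode and post-transfer verification correspond to the "incremental transfers" and "data validation" features the walkthrough attributes to DataSync.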
Real-World Practitioner Insight #
Exam Rule #
For the AWS SAA-C03 exam:
- When you see “large file transfers” + “on-premises to S3” → think DataSync
- When you see “sensitive data” + “reliable/predictable transfer” → prefer Direct Connect over public internet
- DMS only appears correct when the source/target explicitly mentions databases (RDS, Aurora, on-prem SQL/Oracle)
Real World #
In production environments, we’d layer additional considerations:
- Hybrid Approach: Start with DataSync over VPN (cheaper, faster to provision) while the Direct Connect circuit is being installed (lead time is typically weeks and can run longer depending on the provider and location)
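- Cost Optimization: Evaluate whether 10TB/day is steady-state or a temporary migration. Direct Connect has hourly port charges (for example, $0.30/hour for 1Gbps = ~$216/month), plus data transfer out charges for any data leaving AWS over the link; data transferred into AWS is not charged per GB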
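- Transfer Acceleration: For geographically distant factories that upload directly to S3 over the internet, S3 Transfer Acceleration may help with the initial sync; treat it as an alternative path for direct S3 uploads rather than an add-on to DataSync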
- Data Lifecycle: Implement S3 Intelligent-Tiering or lifecycle policies to move older telemetry to Glacier if only recent data needs “near real-time” access
- Network Redundancy: Production systems typically use redundant Direct Connect connections across diverse paths/locations for true high availability
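As an illustration of the Data Lifecycle item, the sketch below builds a lifecycle configuration in the shape accepted by boto3's `put_bucket_lifecycle_configuration`. The `telemetry/` prefix, the rule ID, and the 30-day threshold are assumptions chosen for the example; the right threshold depends on how long downstream systems need fast access.

```python
# Sketch of an S3 lifecycle configuration that archives older telemetry.
# The prefix, rule ID, and 30-day threshold are illustrative assumptions.


def telemetry_lifecycle(prefix: str = "telemetry/", days_hot: int = 30) -> dict:
    """Lifecycle configuration moving objects under `prefix` to Glacier after `days_hot` days."""
    return {
        "Rules": [
            {
                "ID": "archive-old-telemetry",
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "Transitions": [
                    {"Days": days_hot, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


# Usage (requires AWS credentials; the bucket name is a placeholder):
#   import boto3
#   s3 = boto3.client("s3")
#   s3.put_bucket_lifecycle_configuration(
#       Bucket="example-telemetry-bucket",
#       LifecycleConfiguration=telemetry_lifecycle(),
#   )
```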
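FinOps Reality Check: Data transfer into AWS is free over both Direct Connect and the public internet, so the daily upload itself incurs no per-GB charge. The per-GB comparison matters for data leaving AWS, for example if downstream systems read a similar 300TB/month back on-premises. Using illustrative rates (check current pricing for your region):
- Direct Connect (10Gbps port): ~$2,280/month port fee
- Data transfer out via DX: ~$7,500/month (300,000GB × $0.025/GB)
- Total: ~$9,780/month vs. public internet egress at standard rates (~$27,000/month at $0.09/GB)
- ROI: At that egress volume, Direct Connect pays for itself while providing better security and reliability; for upload-only traffic, the case for it rests on reliability and security rather than transfer savings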
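The comparison above reduces to simple arithmetic, sketched here with the article's own illustrative figures rather than current AWS prices. The per-GB rates are data-transfer-out rates, so the result is only meaningful for traffic leaving AWS. The break-even calculation is an addition for illustration and assumes the port fee is the only fixed cost.

```python
# Reproduces the cost comparison using the article's illustrative figures.
# These are not current AWS prices; the per-GB rates apply to data leaving AWS.

MONTHLY_GB = 300 * 1000          # 300 TB/month expressed in decimal GB
DX_PORT_FEE = 2280.0             # article's monthly figure for a 10 Gbps port (USD)
DX_RATE_PER_GB = 0.025           # article's Direct Connect transfer-out rate
INTERNET_RATE_PER_GB = 0.09      # article's internet egress rate


def monthly_cost(gb: float, rate_per_gb: float, fixed_fee: float = 0.0) -> float:
    """Fixed monthly fee plus volume-based transfer charges."""
    return fixed_fee + gb * rate_per_gb


dx_total = monthly_cost(MONTHLY_GB, DX_RATE_PER_GB, DX_PORT_FEE)
internet_total = monthly_cost(MONTHLY_GB, INTERNET_RATE_PER_GB)

# Monthly volume at which the per-GB savings cover the port fee.
break_even_gb = DX_PORT_FEE / (INTERNET_RATE_PER_GB - DX_RATE_PER_GB)

if __name__ == "__main__":
    print(f"Direct Connect total: ${dx_total:,.0f}/month")
    print(f"Internet egress total: ${internet_total:,.0f}/month")
    print(f"Break-even volume: {break_even_gb / 1000:.1f} TB/month")
```

With these figures, the break-even point sits at roughly 35TB of monthly egress; below that, the port fee outweighs the per-GB savings.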