While preparing for the AWS SAP-C02, many candidates get confused by hybrid storage synchronization patterns. In the real world, this comes down to two trade-offs: one-time migration vs. continuous hybrid integration, and protocol compatibility vs. operational overhead. Let’s drill into a simulated scenario.
The Scenario #
GlobalMedia Productions operates a Windows-based digital asset management system in their corporate data center. Their creative teams generate approximately 5 GB of new video project files daily on a central Windows file server using SMB protocol. The company has successfully migrated several Windows-based rendering workloads to AWS and established an AWS Direct Connect connection (10 Gbps) between their headquarters and the us-east-1 region.
The infrastructure team needs to make these project files accessible to cloud-based workloads while maintaining the existing on-premises file server for local teams. The solution must support native Windows file permissions (ACLs), minimize operational complexity, and maintain cost efficiency.
Key Requirements #
Design a solution that provides AWS-based Windows workloads with access to daily project files while maintaining compatibility with Windows file protocols and minimizing total cost of ownership.
The Options #
- A) Deploy AWS Storage Gateway in File Gateway mode to replace the existing Windows file server; reconfigure existing SMB shares to point to the new File Gateway endpoint.
- B) Configure AWS DataSync with a scheduled daily task to replicate data between the on-premises Windows file server and Amazon FSx for Windows File Server.
- C) Set up AWS Data Pipeline with a scheduled daily task to copy data between the on-premises Windows file server and Amazon Elastic File System (Amazon EFS).
- D) Configure AWS DataSync with a scheduled daily task to replicate data between the on-premises Windows file server and Amazon Elastic File System (Amazon EFS).
Correct Answer #
Option B: AWS DataSync + Amazon FSx for Windows File Server
Step-by-Step Winning Logic #
This solution represents the optimal trade-off for four critical reasons:
- Protocol Compatibility: Amazon FSx for Windows File Server natively supports the SMB protocol, Windows ACLs, Active Directory integration, and NTFS semantics, so it needs none of the translation layers that EFS would require (EFS is NFS-based and POSIX-oriented).
- Risk-Minimized Migration: AWS DataSync runs as a scheduled replication tool and does not require replacing any infrastructure. The on-premises file server stays in service, which reduces migration risk and allows gradual cutover validation.
- Direct Connect Optimization: The DataSync agent can send traffic over the existing Direct Connect link (for example through a VPC endpoint), and the service provides bandwidth throttling, compression, and integrity verification, all of which matter for 5 GB daily transfers over private connectivity.
- Operational Simplicity: DataSync is a managed service that needs little configuration (agent installation plus task scheduling), whereas Storage Gateway adds a gateway appliance to run, cache sizing, and upload monitoring.
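To put the Direct Connect point in perspective, here is a minimal Python sketch of the idealized transfer time for the daily payload. It ignores protocol overhead, compression, and disk speed, and the 20% utilization cap is an illustrative assumption rather than anything stated in the question.

```python
def transfer_seconds(gigabytes: float, link_gbps: float, utilization: float = 1.0) -> float:
    """Idealized transfer time: payload bits divided by the usable link rate."""
    if link_gbps <= 0 or not 0 < utilization <= 1:
        raise ValueError("link_gbps must be positive and utilization in (0, 1]")
    bits = gigabytes * 1e9 * 8                  # decimal GB -> bits
    usable_bps = link_gbps * 1e9 * utilization  # share of the link we may use
    return bits / usable_bps


# 5 GB/day over the 10 Gbps Direct Connect link from the scenario.
full_link = transfer_seconds(5, 10)
capped_link = transfer_seconds(5, 10, utilization=0.2)  # assumed 20% cap
print(f"full link: {full_link:.1f} s, capped link: {capped_link:.1f} s")
```

Even with a conservative cap the daily transfer finishes in seconds, so the link is not the constraint here. Permissions, protocol fit, and operations are.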
💎 The Architect’s Deep Dive: Why Options Fail #
The Traps (Distractor Analysis) #
- Why not Option A (Storage Gateway File Gateway)?
  - Migration Risk: Requires replacing the production Windows file server immediately, which introduces significant risk. The scenario asks for cloud access "while maintaining the existing on-premises file server," not for replacing infrastructure.
  - Architectural Mismatch: File Gateway is designed for cloud-primary architectures with on-premises caching, not on-premises-primary with cloud replication.
  - Operational Overhead: Requires running a gateway appliance (an on-premises VM or an EC2 instance), sizing cache storage for the working set, and monitoring how quickly cached writes upload to AWS.
  - Cost Model: Adds gateway host costs (for example a t3.xlarge at roughly $120/month if the gateway runs on EC2) plus storage and request charges, whereas DataSync is priced per GB transferred ($0.0125/GB, about $1.88/month for 5 GB daily).
- Why not Option C (Data Pipeline + EFS)?
  - Service Deprecation Risk: AWS Data Pipeline is in maintenance mode; AWS recommends alternatives such as AWS Step Functions or Amazon MWAA for new orchestration workloads.
  - Protocol Incompatibility: EFS uses the NFS protocol with POSIX permissions, not SMB. AWS does not support using EFS from Windows-based EC2 instances, and native Windows ACLs cannot be preserved.
  - Authentication Gap: EFS relies on IAM or NFS POSIX permissions, not Active Directory, which breaks the Windows authentication model.
- Why not Option D (DataSync + EFS)?
  - Windows Protocol Gap: Same fundamental issue as Option C: EFS does not natively support SMB, Windows ACLs, or Active Directory integration.
  - Hidden Integration Costs: Would require additional services (like AWS Directory Service) and custom permission mapping logic to translate Windows ACLs to POSIX permissions.
  - Performance Limitation: EFS is optimized for Linux workloads; FSx for Windows is purpose-built with features like shadow copies, deduplication, and Windows-native caching.
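The elimination above can be expressed as a small filter. This is a sketch only: the `Option` fields and the `viable` rule are hypothetical names that encode this article's reading of the scenario, not an AWS API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    label: str
    transfer_service: str
    storage_protocol: str           # protocol served by the cloud-side storage
    keeps_onprem_server: bool       # does the existing file server stay in place?
    transfer_service_current: bool  # False for a service in maintenance mode


# Attributes as characterized in the distractor analysis above.
OPTIONS = [
    Option("A", "Storage Gateway (File Gateway)", "SMB", False, True),
    Option("B", "DataSync -> FSx for Windows", "SMB", True, True),
    Option("C", "Data Pipeline -> EFS", "NFS", True, False),
    Option("D", "DataSync -> EFS", "NFS", True, True),
]


def viable(option: Option) -> bool:
    """Scenario requirements: SMB access, keep the server, avoid legacy tooling."""
    return (
        option.storage_protocol == "SMB"
        and option.keeps_onprem_server
        and option.transfer_service_current
    )


survivors = [o.label for o in OPTIONS if viable(o)]
print(survivors)
```

Each distractor fails at least one of the three checks, which is why only Option B remains.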
The Architect Blueprint #
graph TB
    subgraph "On-Premises Data Center"
        WinServer["Windows File Server<br/>SMB Shares<br/>5 GB/day growth"]
        DSAgent["DataSync Agent<br/>Installed on VM"]
    end
    subgraph "AWS Direct Connect"
        DX["Direct Connect<br/>10 Gbps Private VIF"]
    end
    subgraph "AWS us-east-1"
        DSService["AWS DataSync Service<br/>Scheduled Task: Daily 2 AM"]
        FSx["Amazon FSx for Windows<br/>Multi-AZ Deployment<br/>Active Directory Joined"]
        EC2Win["Windows EC2 Instances<br/>Rendering Workloads"]
    end
    WinServer --> DSAgent
    DSAgent -->|Transfer via| DX
    DX --> DSService
    DSService -->|Replicate to| FSx
    FSx -->|SMB Mount| EC2Win
    style FSx fill:#FF9900,stroke:#232F3E,stroke-width:3px,color:#fff
    style DSService fill:#3F8624,stroke:#232F3E,stroke-width:2px,color:#fff
    style DX fill:#527FFF,stroke:#232F3E,stroke-width:2px,color:#fff
Diagram Note: AWS DataSync agent on-premises transfers data daily over Direct Connect to FSx for Windows File Server, where cloud-based Windows workloads access files via native SMB protocol without requiring on-premises infrastructure replacement.
The Decision Matrix #
| Option | Est. Complexity | Est. Monthly Cost (USD) | Pros | Cons |
|---|---|---|---|---|
| B: DataSync + FSx | Medium | $150-400 (FSx: ~$130 est. for storage + throughput; DataSync: $1.88 for 150 GB/month) | ✅ Native Windows protocol support ✅ Active Directory integration ✅ Low migration risk ✅ Managed service (no EC2 mgmt) ✅ Direct Connect optimized | ⚠️ FSx higher cost than EFS ⚠️ Requires DataSync agent install |
| A: File Gateway | High | $180-500 (EC2 t3.xlarge: $120; EBS cache: $40; S3 storage: $5; Data transfer: variable) | ✅ Real-time cache for on-prem access ✅ Multi-protocol support | ❌ Requires immediate server replacement ❌ High migration risk ❌ Gateway host operational overhead ❌ Cache sizing complexity |
| C: Data Pipeline + EFS | High | $50-150 (EFS: $45 for 150 GB; Data Pipeline: $1/pipeline + compute) | ✅ Lower storage cost (EFS) | ❌ Service in maintenance mode ❌ No Windows protocol support ❌ EFS not supported on Windows instances ❌ No native AD integration |
| D: DataSync + EFS | Medium-High | $47-100 (EFS: $45 for 150 GB; DataSync: $1.88) | ✅ Lowest storage cost ✅ Managed sync service | ❌ Protocol incompatibility (NFS vs SMB) ❌ No Windows ACL support ❌ Requires authentication translation layer ❌ Performance not optimized for Windows |
Cost Calculation Notes:
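- FSx: Assumes 32 GB initial capacity (grows with data) at $0.13/GB-month SSD plus 8 MB/s throughput at $2.20/MB/s-month. At those minimums the arithmetic gives 32 × $0.13 + 8 × $2.20 ≈ $21.76/month, so the ~$130/month base used in the matrix presumably reflects a larger, production-sized deployment (more storage, higher throughput, Multi-AZ, backups).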
- DataSync: 5 GB/day × 30 days = 150 GB/month × $0.0125/GB = $1.88/month
- File Gateway: t3.xlarge (4 vCPU, 16 GB RAM minimum) + 150 GB cache EBS gp3 + S3 storage + request charges
- EFS: 150 GB × $0.30/GB-month (Standard storage class) = $45/month
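The arithmetic in these notes can be checked with a few lines of Python. The unit rates below are the ones quoted in the notes; treat them as this article's estimates rather than current AWS list prices. The function names are hypothetical.

```python
DAILY_GB = 5
DAYS_PER_MONTH = 30

# Unit rates as quoted in the cost notes (estimates, not live pricing).
DATASYNC_PER_GB = 0.0125
EFS_STANDARD_PER_GB_MONTH = 0.30
FSX_SSD_PER_GB_MONTH = 0.13
FSX_THROUGHPUT_PER_MBPS_MONTH = 2.20


def monthly_transfer_gb(daily_gb: float = DAILY_GB, days: int = DAYS_PER_MONTH) -> float:
    """Data moved per month at a steady daily rate."""
    return daily_gb * days


def datasync_cost(gb: float) -> float:
    """Per-GB transfer charge."""
    return gb * DATASYNC_PER_GB


def efs_cost(stored_gb: float) -> float:
    """Standard-class storage charge for one month."""
    return stored_gb * EFS_STANDARD_PER_GB_MONTH


def fsx_floor_cost(storage_gb: float = 32, throughput_mbps: float = 8) -> float:
    """Storage plus throughput only; excludes backups and Multi-AZ uplift."""
    return (storage_gb * FSX_SSD_PER_GB_MONTH
            + throughput_mbps * FSX_THROUGHPUT_PER_MBPS_MONTH)


gb = monthly_transfer_gb()
print(f"DataSync: ${datasync_cost(gb):.2f}")
print(f"EFS:      ${efs_cost(gb):.2f}")
print(f"FSx min:  ${fsx_floor_cost():.2f}")
```

At the quoted rates the minimum FSx configuration is only a floor. Real deployments add storage as the 5 GB/day accumulates, so rerun `fsx_floor_cost` with the capacity you actually expect to hold.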
FinOps Insight: While Option D appears more than $100/month cheaper on paper, the hidden costs of protocol translation, reduced performance, and engineering time for custom authentication integration can easily exceed $5,000 in the first year, which makes FSx the TCO winner for Windows workloads.
Real-World Practitioner Insight #
Exam Rule #
For the SAP-C02 exam, remember this decision tree:
- Windows workloads + SMB protocol → Always choose Amazon FSx for Windows File Server
- Scheduled, large-scale data transfer → Choose AWS DataSync (not Data Pipeline)
- Low-latency on-premises cache in front of cloud-primary storage → Choose Storage Gateway File Gateway
- Linux workloads + NFS → Choose Amazon EFS
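As a memory aid, the decision tree can be written as two small lookup functions. This is a sketch of the rules of thumb above, with hypothetical function names and string keys; it is not a substitute for reading the scenario.

```python
def storage_choice(os_family: str, protocol: str) -> str:
    """Pick the file storage service from workload OS and file protocol."""
    key = (os_family.lower(), protocol.lower())
    if key == ("windows", "smb"):
        return "Amazon FSx for Windows File Server"
    if key == ("linux", "nfs"):
        return "Amazon EFS"
    return "no rule of thumb; evaluate case by case"


def movement_choice(pattern: str) -> str:
    """Pick the data movement service from the access pattern."""
    rules = {
        "scheduled_bulk_transfer": "AWS DataSync",
        "onprem_cache": "AWS Storage Gateway (File Gateway)",
    }
    return rules.get(pattern, "no rule of thumb; evaluate case by case")


# The GlobalMedia scenario: Windows + SMB, daily scheduled replication.
print(storage_choice("Windows", "SMB"), "+", movement_choice("scheduled_bulk_transfer"))
```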
Real World #
In production environments, I would extend this architecture with:
- Multi-Region DR: Configure FSx Multi-AZ in us-east-1 with cross-region backup to us-west-2 using AWS Backup (adds ~$15/month for 150 GB incremental backups).
- DataSync Task Optimization:
  - Schedule transfers during off-peak hours (2-4 AM) so that Direct Connect bandwidth stays available for business-critical traffic during the day.
  - Limit DataSync bandwidth to 2 Gbps (20% of DX capacity) so transfers do not affect production workloads.
  - Configure data verification and logging to CloudWatch for compliance auditing.
- FSx Performance Tuning:
  - Start with 32 GB storage (minimum) and 8 MB/s throughput, then monitor Amazon CloudWatch metrics (DataReadBytes, DataWriteBytes) to right-size capacity. Plan to grow storage early, since 5 GB/day fills 32 GB in under a week.
  - Enable FSx automatic daily backups with 7-day retention ($0.05/GB-month, about $7.50 for 150 GB).
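- Cost Optimization:
  - After 6 months, analyze CloudWatch access patterns: if 80% of files are accessed less than once a month, consider moving cold data to HDD-backed FSx storage. Storage type is set per file system, so this means a separate HDD file system (2,000 GiB minimum) rather than per-file tiering, and the saving (estimated at around 50% on storage) depends on how much data moves.
  - Implement S3 Intelligent-Tiering for File Gateway’s backend S3 bucket if considering that path in the future.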
- Hybrid Reality Check:
  - The scenario assumes only 5 GB/day, but video production environments often spike to 50-200 GB/day during project crunch times.
  - In reality, we’d provision Direct Connect with burst capacity margin and set up DataSync with multiple parallel tasks to handle variable load.
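The DataSync tuning points above (a 2 AM schedule, a 2 Gbps cap, verification, and CloudWatch logging) map onto task settings roughly as follows. This sketch only builds the request as a plain dictionary. The field names follow my understanding of the DataSync `CreateTask` API and should be checked against the current documentation. Every ARN, the account ID, and the task name are placeholders.

```python
def gbps_to_bytes_per_second(gbps: float) -> int:
    """Convert a link rate in gigabits/s to bytes/s (decimal units)."""
    return int(gbps * 1_000_000_000 / 8)


ACCOUNT = "111122223333"  # placeholder account ID
REGION = "us-east-1"

task_request = {
    "Name": "daily-project-files",  # hypothetical task name
    # Placeholder location ARNs for the SMB source and the FSx destination.
    "SourceLocationArn": f"arn:aws:datasync:{REGION}:{ACCOUNT}:location/loc-EXAMPLESMB",
    "DestinationLocationArn": f"arn:aws:datasync:{REGION}:{ACCOUNT}:location/loc-EXAMPLEFSX",
    # Placeholder log group for transfer logs.
    "CloudWatchLogGroupArn": f"arn:aws:logs:{REGION}:{ACCOUNT}:log-group:/aws/datasync",
    # Daily at 02:00; schedule expressions are evaluated in UTC.
    "Schedule": {"ScheduleExpression": "cron(0 2 * * ? *)"},
    "Options": {
        "BytesPerSecond": gbps_to_bytes_per_second(2),  # 2 Gbps cap
        "VerifyMode": "ONLY_FILES_TRANSFERRED",         # verify what was copied
        "SecurityDescriptorCopyFlags": "OWNER_DACL",    # carry owner and ACLs
        "LogLevel": "TRANSFER",                         # per-file log entries
    },
}

print(task_request["Options"]["BytesPerSecond"])
```

With boto3 installed and real ARNs substituted, this dictionary would be passed as `boto3.client("datasync").create_task(**task_request)`. Keeping it as data first makes the settings easy to review before anything is created.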
The Unspoken Trade-off: Many teams choose EFS initially due to cost, then spend 3 months building custom middleware to handle Windows authentication—ultimately migrating to FSx anyway. The exam tests whether you recognize this pattern upfront.