Choose ALB vs NLB Health Checks | SAA-C03

The Jeff’s Note (Branding & Hook)

While preparing for the AWS SAA-C03, many candidates get confused by load balancer selection. In the real world, this is fundamentally a decision about protocol layer awareness vs. performance optimization. Let’s drill into a simulated scenario.

The Scenario

GlobalMart Digital operates an e-commerce checkout service that processes customer payments. The application runs on multiple Amazon EC2 instances managed by an Auto Scaling group, sitting behind a Network Load Balancer (NLB).

Recently, the operations team discovered a critical issue: when the application encounters HTTP 500 errors due to memory leaks, the NLB continues routing traffic to these failing instances because the underlying OS and network stack remain operational. These failures require manual identification and EC2 instance restarts, causing customer transaction failures and revenue loss.

The architecture team has been tasked with improving application availability automatically, without deploying custom monitoring scripts or developing proprietary health-check code.

Key Requirements

Implement an automated solution that detects application-layer HTTP failures and automatically remediates unhealthy instances, with zero custom code deployment.

The Options
#

A) Enable HTTP health checks on the Network Load Balancer by configuring the target group with the application’s health endpoint URL.
B) Deploy a cron job on each EC2 instance that parses local application logs every minute and restarts the web service when HTTP errors are detected.
C) Replace the Network Load Balancer with an Application Load Balancer (ALB). Configure HTTP health checks using the application’s health endpoint URL. Set the Auto Scaling group’s health check type to ELB so that unhealthy instances are replaced automatically.
D) Create an Amazon CloudWatch alarm monitoring the NLB’s UnhealthyHostCount metric. Trigger an Auto Scaling action to replace instances when the alarm enters ALARM state.

Correct Answer

Option C.

Step-by-Step Winning Logic

This solution represents the optimal trade-off for the stated constraints:

Application-Layer Awareness: ALBs operate at Layer 7 (HTTP/HTTPS), so their health checks evaluate the HTTP status code returned by the health endpoint. That lets the ALB distinguish between “instance accepts TCP connections” and “application is returning HTTP 200.” An NLB (Layer 4) using a plain TCP health check only confirms the former.
Zero Custom Code: The requirement explicitly prohibits scripting. ALB health checks + Auto Scaling group health check integration is a fully managed AWS capability requiring only configuration changes.
Automatic Remediation: By configuring the Auto Scaling group to use ELB health checks (in addition to the default EC2 status checks), instances that the load balancer reports as unhealthy are automatically terminated and replaced. No CloudWatch alarm logic is required.
Best Practice Alignment: For HTTP/HTTPS workloads, AWS recommends ALBs. This aligns with the Well-Architected Framework’s Reliability pillar (automatic failure detection and recovery).
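
As a rough sketch of the configuration-only change described above, the Python below builds the request parameters one might pass to boto3's `create_target_group` and `update_auto_scaling_group` calls. The target group name, VPC ID, Auto Scaling group name, `/health` path, port, and grace period are placeholder assumptions rather than values from the scenario, and the health check numbers are what we understand to be the usual ALB defaults. The code only constructs dictionaries; it does not call AWS.

```python
"""Sketch: parameters for an ALB target group with HTTP health checks and an
Auto Scaling group that honors ELB health checks. All names are placeholders."""


def alb_target_group_params(name, vpc_id, health_path="/health", port=80):
    """Keyword arguments for boto3 elbv2.create_target_group (assumed values)."""
    return {
        "Name": name,
        "Protocol": "HTTP",
        "Port": port,
        "VpcId": vpc_id,
        "TargetType": "instance",
        "HealthCheckProtocol": "HTTP",
        "HealthCheckPath": health_path,
        "HealthCheckIntervalSeconds": 30,
        "HealthCheckTimeoutSeconds": 5,
        "HealthyThresholdCount": 5,
        "UnhealthyThresholdCount": 2,
        # Only HTTP 200 counts as healthy, so a 500 fails the check.
        "Matcher": {"HttpCode": "200"},
    }


def asg_health_check_params(asg_name, grace_period_seconds=300):
    """Keyword arguments for boto3 autoscaling.update_auto_scaling_group."""
    return {
        "AutoScalingGroupName": asg_name,
        # "ELB" makes the group act on load balancer health check results.
        "HealthCheckType": "ELB",
        # Time allowed for a new instance to boot before checks count (assumed).
        "HealthCheckGracePeriod": grace_period_seconds,
    }


if __name__ == "__main__":
    tg = alb_target_group_params("checkout-tg", "vpc-EXAMPLE")
    asg = asg_health_check_params("checkout-asg")
    print(tg)
    print(asg)
    # With credentials configured, these could be applied roughly as:
    #   import boto3
    #   boto3.client("elbv2").create_target_group(**tg)
    #   boto3.client("autoscaling").update_auto_scaling_group(**asg)
```

Keeping the parameters as plain dictionaries makes the change easy to review before anything is applied; the same settings could equally be expressed in CloudFormation or Terraform.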

💎 The Architect’s Deep Dive: Why Options Fail

The Traps (Distractor Analysis)

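Why not Option A? It stops traffic to failing instances but never replaces them. NLB target groups do accept HTTP and HTTPS health checks, and a failing response from the health endpoint (such as an HTTP 500) marks the target unhealthy, so the NLB would stop sending it new connections. What the option lacks is remediation: the Auto Scaling group keeps relying on EC2 status checks, which still pass because the OS and network stack are fine, so the leaking instances are never terminated and the manual restarts continue. The NLB also still forwards traffic at Layer 4 and has no view of HTTP errors outside the health check itself.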
Why not Option B? Violates the “no custom scripts” requirement. Additionally, this approach introduces:
- Operational complexity (log parsing logic maintenance)
- Single point of failure (if cron fails, remediation stops)
- Audit/compliance risks (unmanaged code execution)
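Why not Option D? UnhealthyHostCount only reflects the outcome of the target group’s configured health check. In the scenario the failing instances keep passing that check because the OS and network stack are still up, so the metric stays at zero while the application returns 500s and the alarm never fires. Even if it did fire, a CloudWatch alarm drives a scaling policy that adds or removes capacity; it does not pick out and replace the specific unhealthy instance. This is a metric selection trap.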
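
To make the Layer 4 versus Layer 7 distinction concrete, here is a small local sketch using only the Python standard library. It starts a throwaway HTTP server that always answers 500 (a stand-in for the leaking application, not the real service) and probes it two ways. The `/health` path and the function names are illustrative assumptions; real load balancer health checks are configured in AWS, not written by hand.

```python
import http.client
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class FailingHandler(BaseHTTPRequestHandler):
    """Stand-in for the leaking app: accepts connections but answers 500."""

    def do_GET(self):
        self.send_response(500)
        self.end_headers()
        self.wfile.write(b"internal error")

    def log_message(self, *args):  # keep the demo output quiet
        pass


def tcp_check(host, port, timeout=2.0):
    """Transport-level probe: healthy if a TCP connection can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def http_check(host, port, path="/health", timeout=2.0):
    """Application-level probe: healthy only on an HTTP 200 response."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        return conn.getresponse().status == 200
    except (OSError, http.client.HTTPException):
        return False
    finally:
        conn.close()


def run_demo():
    """Probe a local always-500 server; returns (tcp_healthy, http_healthy)."""
    server = HTTPServer(("127.0.0.1", 0), FailingHandler)  # port 0: any free port
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return tcp_check("127.0.0.1", port), http_check("127.0.0.1", port)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    tcp_ok, http_ok = run_demo()
    print(f"TCP check healthy:  {tcp_ok}")   # True: the socket still accepts
    print(f"HTTP check healthy: {http_ok}")  # False: the app answers 500
```

The TCP probe passes while the HTTP probe fails, which is the same gap the operations team hit in the scenario.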

The Architect Blueprint

graph TD
    User([Internet Users]) -->|HTTPS Traffic| ALB[Application Load Balancer]
    ALB -->|HTTP Health Check: GET /health| ASG[Auto Scaling Group]
    ASG --> EC2A["EC2 Instance A<br/>Status: Healthy"]
    ASG --> EC2B["EC2 Instance B<br/>HTTP 500 Error"]
    
    ALB -.->|Marks Unhealthy| EC2B
    ASG -.->|Triggers Replacement| EC2C[New EC2 Instance C]
    
    style EC2B fill:#ff6b6b,stroke:#c92a2a,color:#fff
    style EC2C fill:#51cf66,stroke:#2f9e44
    style ALB fill:#339af0,stroke:#1864ab,color:#fff

Diagram Note: The ALB performs HTTP health checks; when Instance B returns errors, it’s marked unhealthy and the ASG automatically provisions a replacement instance.
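
The flow in the diagram can be approximated with a toy simulation. This is a simplified model under stated assumptions, not how AWS implements it: it tracks only consecutive failures against an unhealthy threshold (ignoring the healthy threshold, timeouts, and grace periods), and the `Target` class, function names, and instance IDs are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Target:
    """One registered instance as seen by the load balancer (toy model)."""
    instance_id: str
    consecutive_failures: int = 0
    healthy: bool = True


def record_check(target, status_code, unhealthy_threshold=2):
    """Apply one health check result; return True if the target is still healthy."""
    if status_code == 200:
        target.consecutive_failures = 0
    else:
        target.consecutive_failures += 1
        if target.consecutive_failures >= unhealthy_threshold:
            target.healthy = False
    return target.healthy


def replace_unhealthy(targets, new_id_factory):
    """Mimic the Auto Scaling group: swap each unhealthy target for a fresh one."""
    return [t if t.healthy else Target(new_id_factory()) for t in targets]


if __name__ == "__main__":
    fleet = [Target("instance-a"), Target("instance-b")]
    # Instance B returns 500 on every check; instance A stays healthy.
    responses = {"instance-a": [200, 200], "instance-b": [500, 500]}
    for round_no in range(2):
        for target in fleet:
            record_check(target, responses[target.instance_id][round_no])
    fleet = replace_unhealthy(fleet, lambda: "instance-c")
    print([t.instance_id for t in fleet])  # ['instance-a', 'instance-c']
```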

Real-World Practitioner Insight

Exam Rule

“For the exam, always pick ALB when you see keywords like ‘HTTP errors,’ ‘path-based routing,’ ‘host-based routing,’ or ‘WebSocket.’ Pick NLB for ‘static IP,’ ‘extreme performance (millions of requests per second),’ or ‘TCP/UDP protocols.’”

Real World

Add observability first: Implement distributed tracing (AWS X-Ray) and structured logging (CloudWatch Logs Insights) to diagnose why the application is throwing 500 errors—memory leaks suggest an application bug.
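Consider hybrid architecture: When clients need static IP addresses or AWS PrivateLink but the application still needs Layer 7 features, an NLB can sit in front of an ALB by registering the ALB as the NLB’s target. The NLB provides static IPs and connection handling; the ALB provides Layer 7 routing and HTTP health checks.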
Evaluate container migration: Persistent memory leaks often indicate the application would benefit from containerization (ECS/EKS) with proper resource limits and automatic restarts via orchestrator health checks.
Tune health checks: In production, we’d tune health check intervals and thresholds. The default 30-second interval may be too slow for a revenue-critical checkout flow; a 5-10 second interval with an unhealthy threshold of 2 consecutive failures detects problems faster, at the cost of more health check requests hitting each instance.

The exam simplifies to “swap NLB for ALB,” but the root cause (memory leak) demands application-level remediation, not just infrastructure changes.
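
To put rough numbers on the health check tuning mentioned above, the sketch below estimates how long a failing instance keeps receiving traffic before it is marked unhealthy. It assumes detection time is simply interval times unhealthy threshold; that ignores check timeouts and the time the Auto Scaling group needs to launch a replacement, so treat the results as a lower bound. The function name is our own.

```python
def approx_detection_seconds(interval_seconds, unhealthy_threshold):
    """Rough time to mark a target unhealthy: interval x consecutive failures."""
    if interval_seconds <= 0 or unhealthy_threshold <= 0:
        raise ValueError("interval and threshold must be positive")
    return interval_seconds * unhealthy_threshold


if __name__ == "__main__":
    # (interval in seconds, unhealthy threshold) pairs discussed in the text
    for interval, threshold in [(30, 2), (10, 2), (5, 2)]:
        seconds = approx_detection_seconds(interval, threshold)
        print(f"interval={interval:>2}s threshold={threshold} -> ~{seconds}s to detect")
```

With a threshold of 2, moving from a 30-second to a 5-second interval cuts the estimate from about 60 seconds to about 10 seconds.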
