When you deploy applications behind a Network Load Balancer (NLB) in AWS, you usually expect perfect traffic distribution — fast, fair, and stateless.
But what if your backend holds stateful sessions — like in-memory login sessions, caching, or WebSocket connections — and you need a given client to keep hitting the same target every time?
That’s where NLB sticky sessions (also called connection stickiness or source IP affinity) come in. They’re powerful but also misunderstood — and misconfiguring them can lead to uneven load, dropped connections, or mysterious client “resets.”
Let’s break down exactly how they work, how to set them up, what to watch for, and how to troubleshoot the tricky edge cases that appear in production.
1. What Are Sticky Sessions on an NLB?
At a high level, sticky sessions ensure that traffic from the same client consistently lands on the same target (EC2 instance, IP, or container) behind your NLB.
Unlike the Application Load Balancer (ALB) — which uses HTTP cookies for stickiness — the NLB operates at Layer 4 (TCP/UDP).
That means it doesn’t look inside your packets. Instead, it bases stickiness on network-level parameters like:
- Source IP address
- Destination IP and port
- Source port (sometimes included in the hash)
- Protocol (TCP, UDP, or TLS passthrough)
AWS refers to this as “source IP affinity.”
When enabled, the NLB creates a flow-hash mapping that ties the client to a backend target.
As long as the hash remains the same, the same client gets routed to the same target — even across multiple connections.
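As a purely conceptual sketch of that idea (this is not AWS's actual algorithm, and the addresses below are made up): hash the flow tuple, then use the hash to pick a target.

```bash
# Conceptual illustration only -- not AWS's real flow-hash algorithm.
# Hash a flow tuple and map it onto one of three hypothetical targets.
TUPLE="tcp:203.0.113.10:54321:10.0.1.25:443"   # protocol:src_ip:src_port:dst_ip:dst_port
HASH=$(printf '%s' "$TUPLE" | md5sum | cut -c1-8)
TARGET_INDEX=$(( 16#${HASH} % 3 ))
echo "Flow $TUPLE -> target-$TARGET_INDEX"
```

The same tuple always yields the same index; change one element (say, the source port) and the index can change — which is exactly the behavior detailed in section 3.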
2. Enabling Sticky Sessions on an AWS NLB
Stickiness is configured per target group, not at the NLB level.
Step-by-Step via AWS Console
- Go to EC2 → Load Balancers → Target Groups
  Find the target group your NLB listener uses.
- Select the Target Group → Attributes tab
- Under Attributes, set:
  `stickiness.enabled = true`
  `stickiness.type = source_ip`
- Save changes and confirm the attributes are updated.
Step-by-Step via AWS CLI
```bash
aws elbv2 modify-target-group-attributes \
  --target-group-arn arn:aws:elasticloadbalancing:region:acct:targetgroup/mytg/abc123 \
  --attributes Key=stickiness.enabled,Value=true Key=stickiness.type,Value=source_ip
```
How to Verify:
```bash
aws elbv2 describe-target-group-attributes \
  --target-group-arn arn:aws:elasticloadbalancing:region:acct:targetgroup/mytg/abc123
```
Sample Output:
```json
{
  "Attributes": [
    { "Key": "stickiness.enabled", "Value": "true" },
    { "Key": "stickiness.type", "Value": "source_ip" }
  ]
}
```
3. How NLB Stickiness Actually Works (Under the Hood)
The NLB’s flow hashing algorithm calculates a hash from several parameters — often the “five-tuple”:
<protocol, source IP, source port, destination IP, destination port>
The hash is used to choose a target. When stickiness is enabled, NLB remembers this mapping for some time (typically a few minutes to hours, depending on flow expiration).
Key Behavior Points:
- If the same client connects again using the same IP and port, the hash matches → same backend target.
- If any part of that tuple changes (e.g. client source port changes), the hash may change → client might hit a different target.
- NLBs maintain this mapping in memory; if the NLB node restarts or fails over, the mapping is lost.
- Sticky mappings can also be lost when cross-zone load balancing or target health status changes.
Not Cookie-Based
Because NLBs don’t inspect HTTP traffic, there’s no cookie involved.
This means:
- You can’t set session duration or expiry time like in ALB stickiness.
- Stickiness only works as long as the same network path and source IP persist.
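For comparison, ALB cookie stickiness is also set through target-group attributes but does have a configurable duration. A rough sketch (the target-group ARN is a placeholder):

```bash
# ALB target group: cookie-based stickiness with an explicit duration (1 hour here).
aws elbv2 modify-target-group-attributes \
  --target-group-arn arn:aws:elasticloadbalancing:region:acct:targetgroup/my-alb-tg/def456 \
  --attributes Key=stickiness.enabled,Value=true \
               Key=stickiness.type,Value=lb_cookie \
               Key=stickiness.lb_cookie.duration_seconds,Value=3600
```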
4. Known Limitations & Edge Cases
Sticky sessions on NLBs are helpful but brittle. Here’s what can go wrong:
| Issue | Cause | Effect |
|---|---|---|
| Client source IP changes | NAT, VPN, mobile switching networks | Hash changes → new target |
| Different source port | Client opens multiple sockets or reconnects | Each connection may map differently |
| TLS termination at NLB | NLB terminates TLS | Stickiness not supported on TLS listeners (TCP/UDP only) |
| Unhealthy target | Health check fails | Mapping breaks; NLB reroutes |
| Cross-zone load balancing toggled | Distribution rules change | May break existing sticky mappings |
| DNS round-robin at client | NLB has multiple IPs per AZ | Client DNS resolver may change NLB node |
| UDP behavior | Stateless packets; different flow hash | Stickiness unreliable for UDP |
| Scaling up/down | New targets added | Hash table rebalanced; some clients remapped |
🧠 Tip: If you rely on stickiness, keep your clients stable (same IP) and avoid frequent target registration changes.
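Cross-zone load balancing is an attribute on the load balancer itself, so it is worth checking before you change anything (the ARN below is a placeholder):

```bash
# Inspect the NLB's current cross-zone setting before toggling it.
aws elbv2 describe-load-balancer-attributes \
  --load-balancer-arn arn:aws:elasticloadbalancing:region:acct:loadbalancer/net/my-nlb/abc123 \
  --query "Attributes[?Key=='load_balancing.cross_zone.enabled']"
```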
5. Troubleshooting Sticky Session Problems
When things go wrong, these are the most common patterns you’ll see:
1. “Stickiness not working”
- Check target group attributes:
  `aws elbv2 describe-target-group-attributes --target-group-arn <arn>`
  Ensure `stickiness.enabled` is `true`.
- Make sure your listener protocol is TCP, not TLS.
- Confirm that client IPs aren’t being rewritten by NAT or proxy.
- Check CloudWatch metrics → if one target gets all the traffic, stickiness might be too “sticky” due to limited source IP variety.
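To see whether traffic is actually skewed, you can pull flow metrics from CloudWatch. A sketch, assuming GNU `date` and a placeholder load-balancer dimension:

```bash
# Active flows over the last hour, in 5-minute buckets (AWS/NetworkELB namespace).
aws cloudwatch get-metric-statistics \
  --namespace AWS/NetworkELB \
  --metric-name ActiveFlowCount \
  --dimensions Name=LoadBalancer,Value=net/my-nlb/abc123 \
  --start-time "$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time   "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --period 300 \
  --statistics Average
```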
2. “Some clients lose session state randomly”
- Verify client network stability — mobile clients or corporate proxies can rotate IPs.
- Confirm health checks aren’t flapping targets.
- Review your application session design — if session data lives in memory, consider an external session store (Redis, DynamoDB, etc.).
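A quick way to check whether health checks are flapping is to look at current target health (the ARN is a placeholder):

```bash
# Shows each registered target's state (healthy, unhealthy, draining, etc.) and a reason code.
aws elbv2 describe-target-health \
  --target-group-arn arn:aws:elasticloadbalancing:region:acct:targetgroup/mytg/abc123
```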
3. “Load imbalance — one instance overloaded”
- This can happen when many users share one public IP (common in offices or behind ISP NAT).
  All those clients hash to the same backend.
- Mitigate by:
- Disabling stickiness if not strictly required.
- Using ALB with cookie-based stickiness (more granular).
- Scaling target capacity.
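If you decide stickiness isn't needed, turning it off is the same attribute call with the flag flipped:

```bash
# Disable source-IP stickiness on the target group (ARN is a placeholder).
aws elbv2 modify-target-group-attributes \
  --target-group-arn arn:aws:elasticloadbalancing:region:acct:targetgroup/mytg/abc123 \
  --attributes Key=stickiness.enabled,Value=false
```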
4. “Connections drop after some time”
- NLB may remove stale flow mappings.
- Check TCP keepalive settings on clients and targets. Ensure `keepalive_time` is below the NLB idle timeout (350 seconds for TCP) to prevent connection resets. Linux commands below:
```bash
# Check keepalive time (seconds before sending the first keepalive probe)
sysctl net.ipv4.tcp_keepalive_time
# Check keepalive interval (seconds between probes)
sysctl net.ipv4.tcp_keepalive_intvl
# Check keepalive probes (number of probes before giving up)
sysctl net.ipv4.tcp_keepalive_probes
# View all at once
sysctl -a | grep tcp_keepalive
```
- Verify idle timeout on backend apps (e.g., web servers closing connections too early).
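If the keepalive timer is above the 350-second window, you can lower it. The values below are illustrative, not prescriptive:

```bash
# Send the first keepalive probe after 5 minutes of idle time (below the NLB's 350s timeout).
sudo sysctl -w net.ipv4.tcp_keepalive_time=300
# Persist the setting across reboots.
echo "net.ipv4.tcp_keepalive_time = 300" | sudo tee /etc/sysctl.d/99-nlb-keepalive.conf
sudo sysctl --system
```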
6. Observability & Testing
You can validate sticky behavior with:
- CloudWatch metrics:
  `ActiveFlowCount`, `NewFlowCount`, and per-target-group metrics such as `HealthyHostCount`.
- VPC Flow Logs: confirm that repeated requests from the same client IP go to the same backend ENI.
- Packet captures: use `tcpdump` or `ss` on your backend instances to see if the same source IP consistently connects.
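For example, a capture filter like the following on a backend instance will show which client IPs keep arriving there (port 80 assumed; adjust for your listener):

```bash
# Watch inbound TCP connection attempts (SYNs) and note the source IPs (Ctrl-C to stop).
sudo tcpdump -nn -i any 'tcp port 80 and tcp[tcpflags] & tcp-syn != 0'
```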
Quick test with curl:
```bash
for i in {1..100}; do
  echo "=== Request $i at $(date) ===" | tee -a curl_test.log
  curl http://<nlb-dns-name>/ -v 2>&1 | tee -a curl_test.log
  sleep 0.5
done
```
Run it from the same host and check which backend responds (log hostname on each instance).
Then try from another IP or VPN — you’ll likely see a different target.
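One quick, throwaway way to make each target identify itself during the test (assumes Python 3 on the instances; not something to run in production):

```bash
# On each backend instance: serve a page containing the instance's hostname on port 8080.
hostname > index.html
python3 -m http.server 8080
```

Pointing the curl loop at the NLB then shows which hostname answers each request.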
7. Best Practices
- Only enable stickiness if necessary.
  Stateless applications scale better without it.
- If using TLS: terminate TLS at the backend, or use an ALB if you need session affinity.
- Use shared session stores.
  Tools like ElastiCache (Redis) or DynamoDB make scaling simpler and safer (a quick sketch follows this list).
- Avoid toggling cross-zone load balancing during traffic, since it resets the sticky map.
- Set up proper health checks — unhealthy targets break affinity immediately.
- Monitor uneven load — large NAT’d user groups can overload a single instance.
- For UDP — consider designing idempotent stateless processing; sticky sessions may not behave reliably.
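As a sketch of the shared-session-store idea (the Redis endpoint, key, and TTL below are all made up):

```bash
# Write session data to a shared ElastiCache (Redis) endpoint with a 1-hour TTL,
# so any backend behind the NLB can read it regardless of which target the client hits.
redis-cli -h my-sessions.xxxxxx.ng.0001.use1.cache.amazonaws.com \
  SET "session:abc123" '{"user":"alice","cart":["item-42"]}' EX 3600

# Any other target can fetch the same session.
redis-cli -h my-sessions.xxxxxx.ng.0001.use1.cache.amazonaws.com GET "session:abc123"
```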
8. Example Architecture Pattern
Scenario: A multiplayer game server behind an NLB.
Each player connects via TCP to the game backend that stores their in-memory state.
✅ Recommended setup:
- Enable `stickiness.enabled = true` and `stickiness.type = source_ip`
- Disable TLS termination at the NLB
- Keep targets in the same AZ with cross-zone load balancing disabled to maintain stable mapping
- Maintain external health and scaling logic to avoid frequent re-registrations
This setup ensures that the same player IP always lands on the same backend server, as long as their network path is stable.
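Put together as CLI calls, the setup above might look roughly like this (ARNs, the port, and the VPC ID are placeholders):

```bash
# 1. TCP target group for the game servers (port 7777 is an arbitrary example).
aws elbv2 create-target-group \
  --name game-tg --protocol TCP --port 7777 --vpc-id vpc-0123456789abcdef0

# 2. Source-IP stickiness on that target group.
aws elbv2 modify-target-group-attributes \
  --target-group-arn arn:aws:elasticloadbalancing:region:acct:targetgroup/game-tg/abc123 \
  --attributes Key=stickiness.enabled,Value=true Key=stickiness.type,Value=source_ip

# 3. Keep cross-zone load balancing off so mappings stay within the AZ.
aws elbv2 modify-load-balancer-attributes \
  --load-balancer-arn arn:aws:elasticloadbalancing:region:acct:loadbalancer/net/game-nlb/def456 \
  --attributes Key=load_balancing.cross_zone.enabled,Value=false
```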
9. Summary Table
| Attribute | Supported Value | Notes |
|---|---|---|
| `stickiness.enabled` | true / false | Enables sticky sessions |
| `stickiness.type` | `source_ip` | Only option for NLB |
| Supported Protocols | TCP, UDP (limited) | Not supported for TLS listeners |
| Persistence Duration | Until flow reset | Not configurable |
| Cookie-based Stickiness | ❌ No | Use ALB for cookie-based |
| Best for | Stateful TCP apps | e.g. games, custom protocols |
10. When to Use ALB Instead
If you’re dealing with HTTP/HTTPS applications that manage user sessions via cookies or tokens, you’ll be much happier using an Application Load Balancer.
It offers:
- Configurable cookie duration
- Per-application stickiness
- Layer-7 routing and metrics
The NLB should be reserved for high-performance, low-latency, or non-HTTP workloads that need raw TCP/UDP handling.
11. Closing Thoughts
AWS NLB sticky sessions are a great feature — but they’re not magic glue.
They work well when your network topology and client IPs are predictable, and your app genuinely needs flow affinity.
However, if your environment involves NATs, mobile networks, or frequent scale-ups, expect surprises.
When in doubt:
→ Keep your app stateless,
→ Let the load balancer do its job, and
→ Use stickiness only as a last resort for legacy or session-bound systems.