Dublin Traceroute on macOS: A Complete Installation and Usage Guide

Modern networks are far more complex than the simple point to point paths of the early internet. Equal Cost Multi Path (ECMP) routing, carrier grade NAT, and load balancing mean that packets from your machine to a destination might traverse entirely different network paths depending on flow hashing algorithms. Traditional traceroute tools simply cannot handle this complexity, often producing misleading or incomplete results. Dublin Traceroute solves this problem.

This guide provides a detailed walkthrough of installing Dublin Traceroute on macOS, addressing the common Xcode compatibility issues that plague the build process, and exploring the tool’s advanced capabilities for network path analysis.

1. Understanding Dublin Traceroute

1.1 What is Dublin Traceroute?

Dublin Traceroute is a NAT aware multipath tracerouting tool developed by Andrea Barberio. Unlike traditional traceroute utilities, it uses techniques pioneered by Paris traceroute to enumerate all possible network paths in ECMP environments, while adding novel NAT detection capabilities.

The tool addresses a fundamental limitation of classic traceroute. When multiple equal cost paths exist between source and destination, traditional traceroute cannot distinguish which path each packet belongs to, potentially showing you a composite “ghost path” that no real packet actually traverses.

1.2 How ECMP Breaks Traditional Traceroute

Consider a network topology where packets from host A to host F can take two paths:

A → B → D → F
A → C → E → F

Traditional traceroute sends packets with incrementing TTL values and records the ICMP Time Exceeded responses. However, because ECMP routers hash packets to determine their path (typically based on source IP, destination IP, source port, destination port, and protocol), successive traceroute packets may be routed differently.

The result? Traditional traceroute might show you something like A → B → E → F, a path that doesn’t actually exist in your network. This phantom path combines hops from two different real paths, making network troubleshooting extremely difficult.
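
As a toy illustration of why this happens (the hash function and field choice below are hypothetical, not any router’s actual implementation), ECMP flow hashing might look like this:

# Toy model of ECMP flow hashing: the router hashes the 5-tuple to choose
# a next hop, so two probes that differ only in a port number can be sent
# down different paths.
import hashlib

def next_hop(src_ip, dst_ip, sport, dport, proto, candidates):
    key = f"{src_ip}|{dst_ip}|{sport}|{dport}|{proto}".encode()
    return candidates[hashlib.sha256(key).digest()[0] % len(candidates)]

hops = ["B", "C"]
print(next_hop("10.0.0.1", "8.8.8.8", 33434, 53, "udp", hops))
print(next_hop("10.0.0.1", "8.8.8.8", 33435, 53, "udp", hops))  # may pick the other hop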

1.3 The Paris Traceroute Innovation

The Paris traceroute team invented a technique that keeps the flow identifier constant across all probe packets. By maintaining consistent values for the fields that routers use for ECMP hashing, all probes follow the same path. Dublin Traceroute implements this technique and extends it.
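
A minimal sketch of the idea using Scapy (port values are hypothetical; the real implementations also keep the UDP checksum constant by adjusting the payload, which this sketch omits; requires root):

# Paris-style probing: the 5-tuple (src, dst, sport, dport, proto) stays
# constant across probes, so every probe hashes onto the same ECMP path;
# only the TTL varies.
from scapy.all import IP, UDP, sr1

def probe_path(dst, sport=12345, dport=33434, max_ttl=15):
    for ttl in range(1, max_ttl + 1):
        pkt = IP(dst=dst, ttl=ttl) / UDP(sport=sport, dport=dport)
        reply = sr1(pkt, timeout=2, verbose=False)
        print(ttl, reply.src if reply is not None else "*")

probe_path("8.8.8.8")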

1.4 Dublin Traceroute’s NAT Detection

Dublin Traceroute introduces a unique NAT detection algorithm. It forges a custom IP ID in outgoing probe packets and tracks these identifiers in ICMP response packets. When a response references an outgoing packet with different source/destination addresses or ports than what was sent, this indicates NAT translation occurred at that hop.

For IPv6, where there is no IP ID field, Dublin Traceroute uses the payload length field to achieve the same tracking capability.
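
To make this concrete, here is a minimal Scapy sketch of the IP ID technique (illustrative only; dublin-traceroute’s own matching logic is more involved; requires root):

# Set a known IP ID on the probe, then compare the quoted inner packet in
# the ICMP error against what was actually sent. A matching ID with a
# different source address or port indicates NAT translation.
from scapy.all import IP, UDP, sr1
from scapy.layers.inet import IPerror, UDPerror

probe_id = 0xbeef
pkt = IP(dst="8.8.8.8", ttl=3, id=probe_id) / UDP(sport=12345, dport=33434)
reply = sr1(pkt, timeout=2, verbose=False)

if reply is not None and reply.haslayer(IPerror):
    inner = reply[IPerror]
    if inner.id == probe_id and (inner.src != pkt.src or
                                 reply[UDPerror].sport != 12345):
        print("NAT translation detected before hop", reply.src)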

2. Prerequisites and System Requirements

Before installing Dublin Traceroute, ensure your system meets these requirements:

2.1 macOS Version

Dublin Traceroute builds on macOS, though the maintainers note that macOS “breaks at every major release”. Currently supported versions include macOS Monterey, Ventura, Sonoma, and Sequoia. Apple Silicon Macs (M1 through M4) work correctly with Homebrew’s ARM native builds.

2.2 Xcode Command Line Tools

The Xcode Command Line Tools are mandatory. Verify your installation:

# Check if CLT is installed
xcode-select -p

Expected output for CLT only:

/Library/Developer/CommandLineTools

Expected output if full Xcode is installed:

/Applications/Xcode.app/Contents/Developer

Check the installed version:

pkgutil --pkg-info=com.apple.pkg.CLTools_Executables

Output example:

package-id: com.apple.pkg.CLTools_Executables
version: 16.0.0
volume: /
location: /
install-time: 1699012345

2.3 Homebrew

Homebrew is the recommended package manager for installing dependencies. Verify or install:

# Check if Homebrew is installed
which brew

# If not installed, install it
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

For Apple Silicon Macs, ensure the Homebrew path is in your shell configuration:

echo 'eval "$(/opt/homebrew/bin/brew shellenv)"' >> ~/.zprofile
source ~/.zprofile

3. Installing Xcode Command Line Tools

3.1 Fresh Installation

If you don’t have the Command Line Tools installed:

xcode-select --install

A dialog will appear prompting you to install. Click “Install” and wait for the download to complete (typically 1 to 2 GB).

3.2 Updating Existing Installation

After a macOS upgrade, your Command Line Tools may be outdated. Update via Software Update:

softwareupdate --list

Look for entries like Command Line Tools for Xcode-XX.X and install:

softwareupdate --install "Command Line Tools for Xcode-16.0"

Alternatively, download directly from Apple Developer:

  1. Visit https://developer.apple.com/download/more/
  2. Sign in with your Apple ID
  3. Search for “Command Line Tools”
  4. Download the version matching your macOS

3.3 Resolving Version Conflicts

A common issue occurs when both full Xcode and Command Line Tools are installed with mismatched versions. Check which is active:

xcode-select -p

If it points to Xcode.app but you want to use standalone CLT:

sudo xcode-select --switch /Library/Developer/CommandLineTools

To switch back to Xcode:

sudo xcode-select --switch /Applications/Xcode.app/Contents/Developer

3.4 The Xcode 26.0 Homebrew Bug

If you see an error like:

Warning: Your Xcode (16.1) at /Applications/Xcode.app is too outdated.
Please update to Xcode 26.0 (or delete it).

This is a known Homebrew bug on macOS Tahoe betas where placeholder version mappings reference non existent Xcode versions. The workaround:

# Force Homebrew to use the CLT instead
sudo xcode-select --switch /Library/Developer/CommandLineTools

# Or ignore the warning if builds succeed
export HOMEBREW_NO_INSTALLED_DEPENDENTS_CHECK=1

3.5 Complete Reinstallation

For persistent issues, perform a clean reinstall:

# Remove existing CLT
sudo rm -rf /Library/Developer/CommandLineTools

# Reinstall
xcode-select --install

After installation, verify the compiler works:

clang --version

Expected output:

Apple clang version 16.0.0 (clang-1600.0.26.3)
Target: arm64-apple-darwin24.0.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

4. Installing Dependencies

Dublin Traceroute requires several libraries that must be installed before building.

4.1 Core Dependencies

brew install cmake
brew install pkg-config
brew install libtins
brew install jsoncpp
brew install libpcap

Verify the installations:

brew list libtins
brew list jsoncpp

4.2 Handling the jsoncpp CMake Discovery Issue

A common build failure occurs when CMake cannot find jsoncpp even though it’s installed:

CMake Error at /usr/local/Cellar/cmake/3.XX.X/share/cmake/Modules/FindPkgConfig.cmake:696 (message):
  None of the required 'jsoncpp' found

This happens because jsoncpp’s pkg-config file may not be in the expected location. Fix this by setting the PKG_CONFIG_PATH:

# For Intel Macs
export PKG_CONFIG_PATH="/usr/local/lib/pkgconfig:$PKG_CONFIG_PATH"

# For Apple Silicon Macs
export PKG_CONFIG_PATH="/opt/homebrew/lib/pkgconfig:$PKG_CONFIG_PATH"

Add this to your shell profile for persistence:

echo 'export PKG_CONFIG_PATH="/opt/homebrew/lib/pkgconfig:$PKG_CONFIG_PATH"' >> ~/.zshrc
source ~/.zshrc

4.3 Dependencies for Python Bindings and Visualization

For the full feature set including graphical output:

brew install graphviz
brew install python@3.11
pip3 install pygraphviz pandas matplotlib tabulate

If pygraphviz fails to install, you need to specify the graphviz paths:

export CFLAGS="-I $(brew --prefix graphviz)/include"
export LDFLAGS="-L $(brew --prefix graphviz)/lib"

pip3 install pygraphviz

Alternatively, use the global option syntax:

pip3 install \
    --config-settings="--global-option=build_ext" \
    --config-settings="--global-option=-I$(brew --prefix graphviz)/include/" \
    --config-settings="--global-option=-L$(brew --prefix graphviz)/lib/" \
    pygraphviz

5. Installing Dublin Traceroute

5.1 Method 1: Homebrew Formula (Recommended)

Dublin Traceroute provides a Homebrew formula, though it’s not in the official repository:

# Download the formula
wget https://raw.githubusercontent.com/insomniacslk/dublin-traceroute/master/homebrew/dublin-traceroute.rb

# Install using the local formula
brew install ./dublin-traceroute.rb

If wget is not available:

curl -O https://raw.githubusercontent.com/insomniacslk/dublin-traceroute/master/homebrew/dublin-traceroute.rb
brew install ./dublin-traceroute.rb

5.2 Method 2: Building from Source

For more control over the build process:

# Clone the repository
git clone https://github.com/insomniacslk/dublin-traceroute.git
cd dublin-traceroute

# Create build directory
mkdir build && cd build

# Configure with CMake
cmake .. \
    -DCMAKE_INSTALL_PREFIX=/usr/local \
    -DCMAKE_BUILD_TYPE=Release

# Build
make -j$(sysctl -n hw.ncpu)

# Install
sudo make install

5.3 Troubleshooting Build Failures

libtins Not Found

CMake Error: Could not find libtins

Fix:

# Ensure libtins is properly linked
brew link --force libtins

# Set CMake prefix path
cmake .. -DCMAKE_PREFIX_PATH="$(brew --prefix)"

Missing Headers

fatal error: 'tins/tins.h' file not found

Fix by specifying include paths:

cmake .. \
    -DCMAKE_INCLUDE_PATH="$(brew --prefix libtins)/include" \
    -DCMAKE_LIBRARY_PATH="$(brew --prefix libtins)/lib"

googletest Submodule Warning

-- googletest git submodule is absent. Run `git submodule init && git submodule update` to get it

This is informational only and doesn’t prevent the build. To silence it:

cd dublin-traceroute
git submodule init
git submodule update

5.4 Setting Up Permissions

Dublin Traceroute requires raw socket access. On macOS, this typically means running as root:

sudo dublin-traceroute 8.8.8.8

For convenience, you can set the setuid bit (be sure you understand the security implications first):

# Find the installed binary
DTPATH=$(which dublin-traceroute)

# If it's a symlink, resolve it (greadlink comes from Homebrew's coreutils)
DTREAL=$(greadlink -f "$DTPATH")

# Set ownership and setuid
sudo chown root:wheel "$DTREAL"
sudo chmod u+s "$DTREAL"

Note: Homebrew’s security model discourages setuid binaries. The recommended approach is to use sudo explicitly.

6. Installing Python Bindings

The Python bindings provide additional features including visualization and statistical analysis.

6.1 Installation

pip3 install dublintraceroute

If the C++ library isn’t found:

# Set the library path so the bindings can find it (copying into
# /usr/lib is blocked by System Integrity Protection on modern macOS)
export DYLD_LIBRARY_PATH="/usr/local/lib:$DYLD_LIBRARY_PATH"

pip3 install dublintraceroute

6.2 Verification

import dublintraceroute
print(dublintraceroute.__version__)

7. Basic Usage

7.1 Simple Traceroute

sudo dublin-traceroute 8.8.8.8

Output:

Starting dublin-traceroute
Traceroute from 0.0.0.0:12345 to 8.8.8.8:33434~33453 (probing 20 paths, min TTL is 1, max TTL is 30, delay is 10 ms)

== Flow ID 33434 ==
 1   192.168.1.1 (gateway), IP ID: 17503 RTT 2.657 ms ICMP (type=11, code=0) 'TTL expired in transit', NAT ID: 0
 2   10.0.0.1, IP ID: 0 RTT 15.234 ms ICMP (type=11, code=0) 'TTL expired in transit', NAT ID: 0
 3   72.14.215.85, IP ID: 0 RTT 18.891 ms ICMP (type=11, code=0) 'TTL expired in transit', NAT ID: 0
...

7.2 Command Line Options

dublin-traceroute --help
Dublin Traceroute v0.4.2
Written by Andrea Barberio - https://insomniac.slackware.it

Usage:
  dublin-traceroute <target> [options]

Options:
  -h --help                 Show this help
  -v --version              Print version
  -s SRC_PORT --sport=PORT  Source port to send packets from
  -d DST_PORT --dport=PORT  Base destination port
  -n NPATHS --npaths=NUM    Number of paths to probe (default: 20)
  -t MIN_TTL --min-ttl=TTL  Minimum TTL to probe (default: 1)
  -T MAX_TTL --max-ttl=TTL  Maximum TTL to probe (default: 30)
  -D DELAY --delay=MS       Inter-packet delay in milliseconds
  -b --broken-nat           Handle broken NAT configurations
  -N --no-dns               Skip reverse DNS lookups
  -o --output-file=FILE     Output file name (default: trace.json)

7.3 Controlling Path Enumeration

Probe fewer paths for faster results:

sudo dublin-traceroute -n 5 8.8.8.8

Limit TTL range for local network analysis:

sudo dublin-traceroute -t 1 -T 10 192.168.1.1

7.4 JSON Output

Dublin Traceroute always writes its results to a JSON file (trace.json by default; override with -o):

sudo dublin-traceroute -o google_trace.json 8.8.8.8
cat google_trace.json | python3 -m json.tool | head -50

Example JSON structure:

{
  "flows": {
    "33434": {
      "hops": [
        {
          "sent": {
            "timestamp": "2024-01-15T10:30:00.123456",
            "ip": {
              "src": "192.168.1.100",
              "dst": "8.8.8.8",
              "id": 12345
            },
            "udp": {
              "sport": 12345,
              "dport": 33434
            }
          },
          "received": {
            "timestamp": "2024-01-15T10:30:00.125789",
            "ip": {
              "src": "192.168.1.1",
              "id": 54321
            },
            "icmp": {
              "type": 11,
              "code": 0,
              "description": "TTL expired in transit"
            }
          },
          "rtt_usec": 2333,
          "nat_id": 0
        }
      ]
    }
  }
}

8. Advanced Usage and Analysis

8.1 Generating Visual Network Diagrams

Convert the JSON output to a graphical representation:

# Run the traceroute
sudo dublin-traceroute 8.8.8.8

# Generate the graph
python3 scripts/to_graphviz.py trace.json

# View the image
open trace.json.png

The resulting image shows:

  • Each unique hop as an ellipse
  • Arrows indicating packet flow direction
  • RTT times on edges
  • Different colors for different flow paths
  • NAT indicators where detected

8.2 Using Python for Analysis

import dublintraceroute

# Create traceroute object
dt = dublintraceroute.DublinTraceroute(
    dst='8.8.8.8',
    sport=12345,
    dport_base=33434,
    npaths=20,
    min_ttl=1,
    max_ttl=30
)

# Run the traceroute (requires root)
results = dt.traceroute()

# Pretty print the results
results.pretty_print()

Output:

ttl   33436                              33434                              33435
----- ---------------------------------- ---------------------------------- ----------------------------------
1     gateway (2657 usec)                gateway (3081 usec)                gateway (4034 usec)
2     *                                  *                                  *
3     isp-router (33980 usec)            isp-router (35524 usec)            isp-router (41467 usec)
4     core-rtr (44800 usec)              core-rtr (14194 usec)              core-rtr (41489 usec)
5     peer-rtr (43516 usec)              peer-rtr2 (35520 usec)             peer-rtr2 (41924 usec)

8.3 Converting to Pandas DataFrame

import dublintraceroute
import pandas as pd

dt = dublintraceroute.DublinTraceroute('8.8.8.8')
results = dt.traceroute()

# Convert to DataFrame
df = results.to_dataframe()

# Analyze RTT statistics by hop
print(df.groupby('ttl')['rtt_usec'].describe())

# Find the slowest hops
slowest = df.nlargest(5, 'rtt_usec')[['ttl', 'name', 'rtt_usec']]
print(slowest)

8.4 Visualizing RTT Patterns

import dublintraceroute
import matplotlib.pyplot as plt

dt = dublintraceroute.DublinTraceroute('8.8.8.8')
results = dt.traceroute()
df = results.to_dataframe()

# Group by destination port (flow)
group = df.groupby('sent_udp_dport')['rtt_usec']

fig, ax = plt.subplots(figsize=(12, 6))

for label, sdf in group:
    sdf.reset_index(drop=True).plot(ax=ax, label=f'Flow {label}')

ax.set_xlabel('Hop Number')
ax.set_ylabel('RTT (microseconds)')
ax.set_title('RTT by Network Path')
ax.legend(title='Destination Port', loc='upper left')

plt.tight_layout()
plt.savefig('rtt_analysis.png', dpi=150)
plt.show()

8.5 Detecting NAT Traversal

import dublintraceroute
import json

dt = dublintraceroute.DublinTraceroute('8.8.8.8')
results = dt.traceroute()

# Access raw JSON
trace_data = json.loads(results.to_json())

# Find NAT hops
for flow_id, flow_data in trace_data['flows'].items():
    print(f"\nFlow {flow_id}:")
    for hop in flow_data['hops']:
        if hop.get('nat_id', 0) != 0:
            print(f"  TTL {hop['ttl']}: NAT detected (ID: {hop['nat_id']})")
            if 'received' in hop:
                print(f"    Response from: {hop['received']['ip']['src']}")

8.6 Handling Broken NAT Configurations

Some NAT devices don’t properly translate ICMP payloads. Use the broken NAT flag:

sudo dublin-traceroute --broken-nat 8.8.8.8

This mode sends packets with characteristics that allow correlation even when NAT devices mangle the ICMP error payloads.

8.7 Simple Probe Mode

Send single probes without full traceroute enumeration:

sudo python3 -m dublintraceroute probe google.com

Output:

Sending probes to google.com
Source port: 12345, destination port: 33434, num paths: 20, TTL: 64, delay: 10, broken NAT: False

#   target          src port   dst port   rtt (usec)
--- --------------- ---------- ---------- ------------
1   142.250.185.46  12345      33434      15705
2   142.250.185.46  12345      33435      15902
3   142.250.185.46  12345      33436      16127
...

This is useful for quick connectivity tests to verify reachability through multiple paths.

9. Interpreting Results

9.1 Understanding Flow IDs

Each “flow” in Dublin Traceroute output represents a distinct path through the network. The flow ID is derived from the destination port number. With --npaths=20, you’ll see flows numbered 33434 through 33453.
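
As a quick sanity check, the flow IDs can be derived directly from the base destination port (assuming the default base of 33434 shown in the output earlier):

# Flow IDs are dport_base .. dport_base + npaths - 1
dport_base, npaths = 33434, 20
flow_ids = [dport_base + i for i in range(npaths)]
print(flow_ids[0], "...", flow_ids[-1])   # 33434 ... 33453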

9.2 NAT ID Field

The NAT ID indicates detected NAT translations:

  • NAT ID: 0 means no NAT detected at this hop
  • NAT ID: N (where N > 0) indicates the Nth NAT device encountered

9.3 ICMP Codes

Common ICMP responses:

Type   Code   Meaning
-----  -----  --------------------------------------
11     0      TTL expired in transit
3      0      Network unreachable
3      1      Host unreachable
3      3      Port unreachable (destination reached)
3      13     Administratively filtered

9.4 Identifying ECMP Paths

When multiple flows show different hops at the same TTL, you’ve discovered ECMP routing:

== Flow 33434 ==
 3   router-a.isp.net, RTT 25 ms

== Flow 33435 ==
 3   router-b.isp.net, RTT 28 ms

This reveals two distinct paths through the ISP network.

9.5 Recognizing Asymmetric Routing

Different RTT values for the same hop across flows might indicate:

  • Load balancing with different queue depths
  • Asymmetric return paths
  • Different physical path lengths

10. Go Implementation

Dublin Traceroute also has a Go implementation with IPv6 support:

# Install Go if needed
brew install go

# Build the Go version
cd dublin-traceroute/go/dublintraceroute
go build -o dublin-traceroute-go ./cmd/dublin-traceroute

# Run with IPv6 support
sudo ./dublin-traceroute-go -6 2001:4860:4860::8888

The Go implementation provides:

  • IPv4/UDP probes
  • IPv6/UDP probes (not available in C++ version)
  • JSON output compatible with Python visualization tools
  • DOT output for Graphviz

11. Integration Examples

11.1 Automated Network Monitoring Script

#!/bin/bash
# monitor_paths.sh - Periodic path monitoring

TARGETS=("8.8.8.8" "1.1.1.1" "208.67.222.222")
OUTPUT_DIR="/var/log/dublin-traceroute"
INTERVAL=3600  # 1 hour

mkdir -p "$OUTPUT_DIR"

while true; do
    TIMESTAMP=$(date +%Y%m%d_%H%M%S)

    for target in "${TARGETS[@]}"; do
        OUTPUT_FILE="${OUTPUT_DIR}/${target//\./_}_${TIMESTAMP}.json"

        echo "Tracing $target at $(date)"
        sudo dublin-traceroute -n 10 -o "$OUTPUT_FILE" "$target" > /dev/null 2>&1

        # Generate visualization
        python3 /usr/local/share/dublin-traceroute/to_graphviz.py "$OUTPUT_FILE"
    done

    sleep $INTERVAL
done

11.2 Path Comparison Analysis

#!/usr/bin/env python3
"""Compare network paths between two traceroute runs."""

import json
import sys
from collections import defaultdict

def load_trace(filename):
    with open(filename) as f:
        return json.load(f)

def extract_paths(trace):
    paths = {}
    for flow_id, flow_data in trace['flows'].items():
        path = []
        for hop in sorted(flow_data['hops'], key=lambda x: x['sent']['ip']['ttl']):
            if 'received' in hop:
                path.append(hop['received']['ip']['src'])
            else:
                path.append('*')
        paths[flow_id] = path
    return paths

def compare_traces(trace1_file, trace2_file):
    trace1 = load_trace(trace1_file)
    trace2 = load_trace(trace2_file)

    paths1 = extract_paths(trace1)
    paths2 = extract_paths(trace2)

    print("Path Comparison Report")
    print("=" * 60)

    all_flows = set(paths1.keys()) | set(paths2.keys())

    for flow in sorted(all_flows, key=int):
        p1 = paths1.get(flow, [])
        p2 = paths2.get(flow, [])

        if p1 == p2:
            print(f"Flow {flow}: IDENTICAL")
        else:
            print(f"Flow {flow}: DIFFERENT")
            max_len = max(len(p1), len(p2))
            for i in range(max_len):
                h1 = p1[i] if i < len(p1) else '-'
                h2 = p2[i] if i < len(p2) else '-'
                marker = '  ' if h1 == h2 else '>>'
                print(f"  {marker} TTL {i+1}: {h1:20} vs {h2}")

if __name__ == '__main__':
    if len(sys.argv) != 3:
        print(f"Usage: {sys.argv[0]} trace1.json trace2.json")
        sys.exit(1)

    compare_traces(sys.argv[1], sys.argv[2])

11.3 Alerting on Path Changes

#!/usr/bin/env python3
"""Alert when network paths change from baseline."""

import json
import hashlib
import smtplib
from email.mime.text import MIMEText
import subprocess
import sys

BASELINE_FILE = '/etc/dublin-traceroute/baseline.json'
ALERT_EMAIL = 'netops@example.com'

def get_path_hash(trace):
    """Generate a hash of all paths for quick comparison."""
    paths = []
    for flow_id in sorted(trace['flows'].keys(), key=int):
        flow = trace['flows'][flow_id]
        path = []
        for hop in sorted(flow['hops'], key=lambda x: x['sent']['ip']['ttl']):
            if 'received' in hop:
                path.append(hop['received']['ip']['src'])
        paths.append(':'.join(path))

    combined = '|'.join(paths)
    return hashlib.sha256(combined.encode()).hexdigest()

def send_alert(target, old_hash, new_hash, trace_file):
    msg = MIMEText(f"""
Network path change detected!

Target: {target}
Previous hash: {old_hash}
Current hash: {new_hash}
Trace file: {trace_file}

Please investigate the path change.
""")
    msg['Subject'] = f'[ALERT] Network path change to {target}'
    msg['From'] = 'dublin-traceroute@example.com'
    msg['To'] = ALERT_EMAIL

    with smtplib.SMTP('localhost') as s:
        s.send_message(msg)

def main(target):
    # Run traceroute
    trace_file = f'/tmp/trace_{target.replace(".", "_")}.json'
    subprocess.run([
        'sudo', 'dublin-traceroute',
        '-n', '10',
        '-o', trace_file,
        target
    ], capture_output=True)

    # Load results
    with open(trace_file) as f:
        trace = json.load(f)

    current_hash = get_path_hash(trace)

    # Load baseline
    try:
        with open(BASELINE_FILE) as f:
            baseline = json.load(f)
    except FileNotFoundError:
        baseline = {}

    # Compare
    if target in baseline:
        if baseline[target] != current_hash:
            send_alert(target, baseline[target], current_hash, trace_file)
            print(f"ALERT: Path to {target} has changed!")

    # Update baseline
    baseline[target] = current_hash
    with open(BASELINE_FILE, 'w') as f:
        json.dump(baseline, f, indent=2)

if __name__ == '__main__':
    if len(sys.argv) != 2:
        print(f"Usage: {sys.argv[0]} target")
        sys.exit(1)
    main(sys.argv[1])

12. Troubleshooting Common Issues

12.1 Permission Denied

Error: Could not open raw socket: Permission denied

Solution: Run with sudo or configure setuid as described in section 5.4.

12.2 No Response from Hops

If you see many asterisks (*) in output:

  1. Firewall may be blocking ICMP responses
  2. Rate limiting on intermediate routers
  3. Increase the delay between probes:

sudo dublin-traceroute --delay=50 8.8.8.8

12.3 Library Not Found at Runtime

dyld: Library not loaded: @rpath/libdublintraceroute.dylib

Fix:

# Add the library path (symlinking into /usr/lib is blocked by SIP on
# modern macOS, so prefer the environment variable)
export DYLD_LIBRARY_PATH="/usr/local/lib:$DYLD_LIBRARY_PATH"

12.4 Python Import Error

ImportError: No module named 'dublintraceroute._dublintraceroute'

The C++ library wasn’t found during Python module installation. Rebuild:

# Point the build at the installed headers (avoid copying into
# /usr/include, which is protected by SIP)
export CPATH="/usr/local/include:$CPATH"

# Reinstall Python module
pip3 uninstall dublintraceroute
pip3 install --no-cache-dir dublintraceroute

12.5 Graphviz Generation Fails

pygraphviz.AGraphError: Error processing dot file

Ensure Graphviz binaries are in PATH:

brew link --force graphviz
export PATH="/opt/homebrew/bin:$PATH"

13. Security Considerations

13.1 Raw Socket Requirements

Dublin Traceroute requires raw socket access to forge custom packets. This capability should be restricted:

  • Prefer sudo over setuid binaries
  • Consider using a dedicated user account for network monitoring
  • Audit usage through system logs

13.2 Information Disclosure

Traceroute output reveals internal network topology. Treat results as sensitive:

  • Don’t expose trace data publicly without sanitization
  • Consider internal IP address implications
  • NAT detection can reveal infrastructure details

13.3 Rate Limiting

Aggressive tracerouting can trigger IDS/IPS alerts or rate limiting. Use appropriate delays in production:

sudo dublin-traceroute --delay=100 --npaths=5 target

14. Conclusion

Dublin Traceroute provides essential visibility into modern network paths that traditional traceroute tools simply cannot offer. The combination of ECMP path enumeration and NAT detection makes it invaluable for troubleshooting complex network issues, validating routing policies, and understanding how your traffic actually traverses the internet.

The installation process on macOS, while occasionally complicated by Xcode version mismatches, is straightforward once dependencies are properly configured. The Python bindings extend the tool’s utility with visualization and analytical capabilities that transform raw traceroute data into actionable network intelligence.

For network engineers dealing with multi homed environments, CDN architectures, or simply trying to understand why packets take the paths they do, Dublin Traceroute deserves a place in your diagnostic toolkit.

15. References

  • Dublin Traceroute Official Site: https://dublin-traceroute.net
  • GitHub Repository: https://github.com/insomniacslk/dublin-traceroute
  • Python Bindings: https://github.com/insomniacslk/python-dublin-traceroute
  • Paris Traceroute Background: https://paris-traceroute.net/about
  • Homebrew: https://brew.sh
  • Apple Developer Downloads: https://developer.apple.com/download/more/

Deep Dive: Pauseless Garbage Collection in Java 25

1. Introduction

Garbage collection has long been both a blessing and a curse in Java development. While automatic memory management frees developers from manual allocation and deallocation, traditional garbage collectors introduced unpredictable stop the world pauses that could severely impact application responsiveness. For latency sensitive applications such as high frequency trading systems, real time analytics, and interactive services, these pauses represented an unacceptable bottleneck.

Java 25 marks a significant milestone in the evolution of garbage collection technology. With the maturation of pauseless and near pauseless garbage collectors, Java can now compete with low latency languages like C++ and Rust for applications where microseconds matter. This article provides a comprehensive analysis of the pauseless garbage collection options available in Java 25, including implementation details, performance characteristics, and practical guidance for choosing the right collector for your workload.

2. Understanding Pauseless Garbage Collection

2.1 The Problem with Traditional Collectors

Traditional garbage collectors like Parallel GC and even the sophisticated G1 collector require stop the world pauses for certain operations. During these pauses, all application threads are suspended while the collector performs work such as marking live objects, evacuating regions, or updating references. The duration of these pauses typically scales with heap size and the complexity of the object graph, making them problematic for:

  • Large heap applications (tens to hundreds of gigabytes)
  • Real time systems with strict latency requirements
  • High throughput services where tail latency affects user experience
  • Systems requiring consistent 99.99th percentile response times

2.2 Concurrent Collection Principles

Pauseless garbage collectors minimize or eliminate stop the world pauses by performing most of their work concurrently with application threads. This is achieved through several key techniques:

Read and Write Barriers: These are lightweight checks inserted into the application code that ensure memory consistency between concurrent GC and application threads. Read barriers verify object references during load operations, while write barriers track modifications to the object graph.

Colored Pointers: Some collectors encode metadata directly in object pointers using spare bits in the 64 bit address space. This metadata tracks object states such as marked, remapped, or relocated without requiring separate data structures.

Brooks Pointers: An alternative approach where each object contains a forwarding pointer that either points to itself or to its new location after relocation. This enables concurrent compaction without long pauses.

Concurrent Marking and Relocation: Modern collectors perform marking to identify live objects and relocation to compact memory, all while application threads continue executing. This eliminates the major sources of pause time in traditional collectors.

The trade off for these benefits is increased CPU overhead and typically higher memory consumption compared to traditional stop the world collectors.
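
To make the barrier idea concrete, here is an illustrative SATB-style write barrier sketch in plain Java (a conceptual model, not HotSpot internals):

// Before overwriting a reference, record the old value so the concurrent
// marker cannot miss objects that were reachable when marking started
// (snapshot-at-the-beginning).
import java.util.concurrent.ConcurrentLinkedQueue;

final class WriteBarrierDemo {
    static final ConcurrentLinkedQueue<Object> satbBuffer = new ConcurrentLinkedQueue<>();
    static volatile boolean markingActive = false;

    static void writeField(Object[] holder, int index, Object newValue) {
        Object old = holder[index];
        if (markingActive && old != null) {
            satbBuffer.add(old);   // keep the old reference visible to the marker
        }
        holder[index] = newValue;  // the actual store
    }
}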

3. Z Garbage Collector (ZGC)

3.1 Overview and Architecture

ZGC is a scalable, low latency garbage collector introduced in Java 11 and made production ready in Java 15. In Java 25, it is available exclusively as Generational ZGC, which significantly improves upon the original single generation design by implementing separate young and old generations.

Key characteristics include:

  • Pause times consistently under 1 millisecond (submillisecond)
  • Pause times independent of heap size (8MB to 16TB)
  • Pause times independent of live set or root set size
  • Concurrent marking, relocation, and reference processing
  • Region based heap layout with dynamic region sizing
  • NUMA aware memory allocation

3.2 Technical Implementation

ZGC uses colored pointers as its core mechanism. In the 64 bit pointer layout, ZGC reserves bits for metadata:

  • 18 bits: Unused/reserved high bits
  • 42 bits: Address space (4TB in the classic layout; later ZGC versions widen the address field to reach the 16TB maximum)
  • 4 bits: Metadata including Marked0, Marked1, Remapped, and Finalizable bits

This encoding allows ZGC to track object states without separate metadata structures. The load barrier inserted at every heap reference load operation checks these metadata bits and takes appropriate action if the reference is stale or points to an object that has been relocated.
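
As a rough illustration, the metadata bits can be pictured as masks over a 64 bit word (the constants below follow the classic layout described above and are illustrative, not the JVM’s actual values):

// Illustrative colored-pointer decoding; the real masks and bit positions
// are internal to the JVM and have changed across releases.
final class ColoredPointer {
    static final long ADDRESS_MASK    = (1L << 42) - 1; // low 42 bits: heap offset
    static final long MARKED0_BIT     = 1L << 42;
    static final long MARKED1_BIT     = 1L << 43;
    static final long REMAPPED_BIT    = 1L << 44;
    static final long FINALIZABLE_BIT = 1L << 45;

    static long address(long pointer)       { return pointer & ADDRESS_MASK; }
    static boolean isRemapped(long pointer) { return (pointer & REMAPPED_BIT) != 0; }

    // Conceptual load barrier: a "good" pointer carries the expected color;
    // anything else takes the slow path and is healed in place.
    static long loadBarrier(long pointer, long goodMask) {
        if ((pointer & goodMask) != 0) {
            return pointer;           // fast path: color is current
        }
        return slowPathHeal(pointer); // mark/relocate/remap, then return fixed pointer
    }

    private static long slowPathHeal(long pointer) {
        // In the JVM this self-heals the reference in memory;
        // here it is only a placeholder.
        return pointer | REMAPPED_BIT;
    }
}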

The ZGC collection cycle consists of several phases:

  1. Pause Mark Start: Brief pause to set up marking roots (typically less than 1ms)
  2. Concurrent Mark: Traverse object graph to identify live objects
  3. Pause Mark End: Brief pause to finalize marking
  4. Concurrent Process Non-Strong References: Handle weak, soft, and phantom references
  5. Concurrent Relocation: Move live objects to new locations to compact memory
  6. Concurrent Remap: Update references to relocated objects

All phases except the two brief pauses run concurrently with application threads.

3.3 Generational ZGC in Java 25

Java 25 is the first LTS release where Generational ZGC is the default and only implementation of ZGC. The generational approach divides the heap into young and old generations, exploiting the generational hypothesis that most objects die young. This provides several benefits:

  • Reduced marking overhead by focusing young collections on recently allocated objects
  • Improved throughput by avoiding full heap marking for every collection
  • Better cache locality and memory bandwidth utilization
  • Lower CPU overhead compared to single generation ZGC

Generational ZGC maintains the same submillisecond pause time guarantees while significantly improving throughput, making it suitable for a broader range of applications.

3.4 Configuration and Tuning

Basic Enablement

// Enable ZGC (Generational ZGC is the only ZGC mode in Java 25)
java -XX:+UseZGC -Xmx16g -Xms16g YourApplication

// Note: G1 remains the JVM's default collector, so ZGC must be
// selected explicitly with -XX:+UseZGC

Heap Size Configuration

The most critical tuning parameter for ZGC is heap size:

// Set maximum and minimum heap size
java -XX:+UseZGC -Xmx32g -Xms32g YourApplication

// Set soft maximum heap size (ZGC will try to stay below this)
java -XX:+UseZGC -Xmx64g -XX:SoftMaxHeapSize=48g YourApplication

ZGC requires sufficient headroom in the heap to accommodate allocations while concurrent collection is running. A good rule of thumb is to provide 20-30% more heap than your live set requires; for example, a 30GB live set suggests a heap of roughly 40GB.

Concurrent GC Threads

Starting from JDK 17, ZGC dynamically scales concurrent GC threads, but you can override:

// Set number of concurrent GC threads
java -XX:+UseZGC -XX:ConcGCThreads=8 YourApplication

// Set number of parallel GC threads for STW phases
java -XX:+UseZGC -XX:ParallelGCThreads=16 YourApplication

Large Pages and Memory Management

// Enable large pages for better performance
java -XX:+UseZGC -XX:+UseLargePages YourApplication

// Enable transparent huge pages
java -XX:+UseZGC -XX:+UseTransparentHugePages YourApplication

// Disable uncommitting unused memory (for consistent low latency)
java -XX:+UseZGC -XX:-ZUncommit -Xmx32g -Xms32g -XX:+AlwaysPreTouch YourApplication

GC Logging

// Enable detailed GC logging
java -XX:+UseZGC -Xlog:gc*:file=gc.log:time,uptime,level,tags YourApplication

// Simplified GC logging
java -XX:+UseZGC -Xlog:gc:file=gc.log YourApplication

3.5 Performance Characteristics

Latency: ZGC consistently achieves pause times under 1 millisecond regardless of heap size. Studies show pause times typically range from 0.1ms to 0.5ms even on multi terabyte heaps.

Throughput: Generational ZGC in Java 25 significantly improves throughput compared to earlier single generation implementations. Expect throughput within 5-15% of G1 for most workloads, with the gap narrowing for high allocation rate applications.

Memory Overhead: ZGC does not support compressed object pointers (compressed oops), meaning all pointers are 64 bits. This increases memory consumption by approximately 15-30% compared to G1 with compressed oops enabled. Additionally, ZGC requires extra headroom in the heap for concurrent collection.

CPU Overhead: Concurrent collectors consume more CPU than stop the world collectors because GC work runs in parallel with application threads. ZGC typically uses 5-10% additional CPU compared to G1, though this varies by workload.

3.6 When to Use ZGC

ZGC is ideal for:

  • Applications requiring consistent sub 10ms pause times (ZGC provides submillisecond)
  • Large heap applications (32GB and above)
  • Systems where tail latency directly impacts business metrics
  • Real time or near real time processing systems
  • High frequency trading platforms
  • Interactive applications requiring smooth user experience
  • Microservices with strict SLA requirements

Avoid ZGC for:

  • Memory constrained environments (due to higher memory overhead)
  • Small heaps (under 4GB) where G1 may be more efficient
  • Batch processing jobs where throughput is paramount and latency does not matter
  • Applications already meeting latency requirements with G1

4. Shenandoah GC

4.1 Overview and Architecture

Shenandoah is a low latency garbage collector developed by Red Hat and integrated into OpenJDK starting with Java 12. Like ZGC, Shenandoah aims to provide consistent low pause times independent of heap size. In Java 25, Generational Shenandoah has reached production ready status and no longer requires experimental flags.

Key characteristics include:

  • Pause times typically 1-10 milliseconds, independent of heap size
  • Concurrent marking, evacuation, and reference processing
  • Uses Brooks pointers for concurrent compaction
  • Region based heap management
  • Support for both generational and non generational modes
  • Works well with heap sizes from hundreds of megabytes to hundreds of gigabytes

4.2 Technical Implementation

Unlike ZGC’s colored pointers, Shenandoah uses Brooks pointers (also called forwarding pointers or indirection pointers). Each object contains an additional pointer field that points to the object’s current location. When an object is relocated during compaction:

  1. The object is copied to its new location
  2. The Brooks pointer in the old location is updated to point to the new location
  3. Application threads accessing the old location follow the forwarding pointer

This mechanism enables concurrent compaction because the GC can update the Brooks pointer atomically, and application threads will automatically see the new location through the indirection.
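
A minimal sketch of the forwarding pointer idea in plain Java (field names are illustrative, not Shenandoah internals):

// Each object carries a forwarding pointer that points to itself until
// the object is relocated; reads always dereference through it.
final class BrooksObject {
    volatile BrooksObject forwardee = this; // points to self until relocated
    byte[] payload;

    // Read barrier: always resolve through the forwarding pointer.
    static BrooksObject resolve(BrooksObject ref) {
        return ref.forwardee;
    }

    // Concurrent evacuation: copy, then atomically publish the new location.
    static BrooksObject evacuate(BrooksObject old) {
        BrooksObject copy = new BrooksObject();
        copy.payload = old.payload;
        old.forwardee = copy;   // subsequent resolve() calls see the copy
        return copy;
    }
}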

The Shenandoah collection cycle includes:

  1. Initial Mark: Brief STW pause to scan roots
  2. Concurrent Marking: Traverse object graph concurrently
  3. Final Mark: Brief STW pause to finalize marking and prepare for evacuation
  4. Concurrent Evacuation: Move objects to compact regions concurrently
  5. Initial Update References: Brief STW pause to begin reference updates
  6. Concurrent Update References: Update object references concurrently
  7. Final Update References: Brief STW pause to finish reference updates
  8. Concurrent Cleanup: Reclaim evacuated regions

4.3 Generational Shenandoah in Java 25

Generational Shenandoah divides the heap into young and old generations, similar to Generational ZGC. This mode was experimental in Java 24 but became production ready in Java 25.

Benefits of generational mode:

  • Reduced marking overhead by focusing on young generation for most collections
  • Lower GC overhead due to exploiting the generational hypothesis
  • Improved throughput while maintaining low pause times
  • Better handling of high allocation rate workloads

Generational Shenandoah no longer requires experimental flags in Java 25; select it with -XX:ShenandoahGCMode=generational.

4.4 Configuration and Tuning

Basic Enablement

// Enable Shenandoah (the non-generational satb mode remains the default)
java -XX:+UseShenandoahGC YourApplication

// Enable generational mode (production ready in Java 25)
java -XX:+UseShenandoahGC -XX:ShenandoahGCMode=generational YourApplication

// Classic non-generational mode, explicitly
java -XX:+UseShenandoahGC -XX:ShenandoahGCMode=satb YourApplication

Heap Size Configuration

// Set heap size with fixed min and max for predictable performance
java -XX:+UseShenandoahGC -Xmx16g -Xms16g YourApplication

// Allow heap to resize (may cause some latency variability)
java -XX:+UseShenandoahGC -Xmx32g -Xms8g YourApplication

GC Thread Configuration

// Set concurrent GC threads (default is calculated from CPU count)
java -XX:+UseShenandoahGC -XX:ConcGCThreads=4 YourApplication

// Set parallel GC threads for STW phases
java -XX:+UseShenandoahGC -XX:ParallelGCThreads=8 YourApplication

Heuristics Selection

Shenandoah offers different heuristics for collection triggering:

// Adaptive heuristics (default, balances various metrics)
java -XX:+UseShenandoahGC -XX:ShenandoahGCHeuristics=adaptive YourApplication

// Static heuristics (triggers at fixed heap occupancy)
java -XX:+UseShenandoahGC -XX:ShenandoahGCHeuristics=static YourApplication

// Compact heuristics (more aggressive compaction)
java -XX:+UseShenandoahGC -XX:ShenandoahGCHeuristics=compact YourApplication

Performance Tuning Options

// Enable large pages
java -XX:+UseShenandoahGC -XX:+UseLargePages YourApplication

// Pre-touch memory for consistent performance
java -XX:+UseShenandoahGC -Xms16g -Xmx16g -XX:+AlwaysPreTouch YourApplication

// Enable NUMA support on multi-socket systems
java -XX:+UseShenandoahGC -XX:+UseNUMA YourApplication

GC Logging

// Enable detailed Shenandoah logging
java -XX:+UseShenandoahGC -Xlog:gc*,shenandoah*=info:file=gc.log:time,level,tags YourApplication

// Basic GC logging
java -XX:+UseShenandoahGC -Xlog:gc:file=gc.log YourApplication

4.5 Performance Characteristics

Latency: Shenandoah typically achieves pause times in the 1-10ms range, with most pauses under 5ms. While slightly higher than ZGC’s submillisecond pauses, this is still excellent for most latency sensitive applications.

Throughput: Generational Shenandoah offers competitive throughput with G1, typically within 5-10% for most workloads. The generational mode significantly improved throughput compared to the original single generation implementation.

Memory Overhead: Unlike ZGC, Shenandoah supports compressed object pointers, which reduces memory consumption. However, the Brooks pointer adds an extra word to each object. Overall memory overhead is typically 10-20% compared to G1.

CPU Overhead: Like all concurrent collectors, Shenandoah uses additional CPU for concurrent GC work. Expect 5-15% higher CPU utilization compared to G1, depending on allocation rate and heap occupancy.

4.6 When to Use Shenandoah

Shenandoah is ideal for:

  • Applications requiring consistent pause times under 10ms
  • Medium to large heaps (4GB to 256GB)
  • Cloud native microservices with moderate latency requirements
  • Applications with high allocation rates
  • Systems where compressed oops are beneficial (memory constrained)
  • OpenJDK and Red Hat environments where Shenandoah is well supported

Avoid Shenandoah for:

  • Ultra low latency requirements (under 1ms) where ZGC is better
  • Extremely large heaps (multi terabyte) where ZGC scales better
  • Batch jobs prioritizing throughput over latency
  • Small heaps (under 2GB) where G1 may be more efficient

5. C4 Garbage Collector (Azul Zing)

5.1 Overview and Architecture

The Continuously Concurrent Compacting Collector (C4) is a proprietary garbage collector developed by Azul Systems and available exclusively in Azul Platform Prime (formerly Zing). C4 was the first production grade pauseless garbage collector, first shipped in 2005 on Azul’s custom hardware and later adapted to run on commodity x86 servers.

Key characteristics include:

  • True pauseless operation with pauses consistently under 1ms
  • No fallback to stop the world compaction under any circumstances
  • Generational design with concurrent young and old generation collection
  • Supports heaps from small to 20TB
  • Uses Loaded Value Barriers (LVB) for concurrent relocation
  • Proprietary JVM with enhanced performance features

5.2 Technical Implementation

C4’s core innovation is the Loaded Value Barrier (LVB), a sophisticated read barrier mechanism. Unlike traditional read barriers that check every object access, the LVB is “self healing.” When an application thread loads a reference to a relocated object:

  1. The LVB detects the stale reference
  2. The application thread itself fixes the reference to point to the new location
  3. The corrected reference is written back to memory
  4. Future accesses use the corrected reference, avoiding barrier overhead

This self healing property dramatically reduces the ongoing cost of read barriers compared to other concurrent collectors. Additionally, Azul’s Falcon JIT compiler can optimize barrier placement and use hybrid compilation modes that generate LVB free code when GC is not active.
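
The self healing behavior can be sketched in plain Java as follows (an illustrative model, not Azul’s implementation):

// If a loaded reference turns out to be stale, the application thread
// itself computes the current location and writes it back, so later
// loads of the same slot skip the slow path entirely.
import java.util.concurrent.atomic.AtomicReference;

final class SelfHealingRef<T> {
    private final AtomicReference<T> slot;
    SelfHealingRef(T initial) { slot = new AtomicReference<>(initial); }

    T load(java.util.function.UnaryOperator<T> currentLocation) {
        T ref = slot.get();
        T fixed = currentLocation.apply(ref);
        if (fixed != ref) {
            slot.compareAndSet(ref, fixed); // self-heal the memory slot
        }
        return fixed;
    }
}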

C4 operates in four main stages:

  1. Mark: Identify live objects concurrently using a guaranteed single pass marking algorithm
  2. Relocate: Move live objects to new locations to compact memory
  3. Remap: Update references to relocated objects
  4. Quick Release: Immediately make freed memory available for allocation

All stages operate concurrently without stop the world pauses. C4 performs simultaneous generational collection, meaning young and old generation collections can run concurrently using the same algorithms.

5.3 Azul Platform Prime Differences

Azul Platform Prime is not just a garbage collector but a complete JVM with several enhancements:

Falcon JIT Compiler: Replaces HotSpot’s C2 compiler with a more aggressive optimizing compiler that produces faster native code. Falcon understands the LVB and can optimize its placement.

ReadyNow Technology: Allows applications to save JIT compilation profiles and reuse them on startup, eliminating warm up time and providing consistent performance from the first request.

Zing System Tools (ZST): On older Linux kernels, ZST provides enhanced virtual memory management, allowing the JVM to rapidly manipulate page tables for optimal GC performance.

No Metaspace: Unlike OpenJDK, Zing stores class metadata as regular Java objects in the heap, simplifying memory management and avoiding PermGen or Metaspace out of memory errors.

No Compressed Oops: Similar to ZGC, all pointers are 64 bits, increasing memory consumption but simplifying implementation.

5.4 Configuration and Tuning

C4 requires minimal tuning because it is designed to be largely self managing. The main parameter is heap size:

# Basic C4 usage (C4 is the only GC in Zing)
java -Xmx32g -Xms32g -jar YourApplication.jar

# Enable ReadyNow for consistent startup performance
java -Xmx32g -Xms32g -XX:ReadyNowLogDir=/path/to/profiles -jar YourApplication.jar

# Configure concurrent GC threads (rarely needed)
java -Xmx32g -XX:ConcGCThreads=8 -jar YourApplication.jar

# Enable GC logging
java -Xmx32g -Xlog:gc*:file=gc.log:time,uptime,level,tags -jar YourApplication.jar

For hybrid mode LVB (reduces barrier overhead when GC is not active):

# Enable hybrid mode with sampling
java -Xmx32g -XX:GPGCLvbCodeVersioningMode=sampling -jar YourApplication.jar

# Enable hybrid mode for all methods (higher compilation overhead)
java -Xmx32g -XX:GPGCLvbCodeVersioningMode=allMethods -jar YourApplication.jar

5.5 Performance Characteristics

Latency: C4 provides true pauseless operation with pause times consistently under 1ms across all heap sizes. Maximum pauses rarely exceed 0.5ms even on multi terabyte heaps. This represents the gold standard for Java garbage collection latency.

Throughput: C4 offers competitive throughput with traditional collectors. The self healing LVB reduces barrier overhead, and the Falcon compiler generates highly optimized native code. Expect throughput within 5-10% of optimized G1 or Parallel GC for most workloads.

Memory Overhead: Similar to ZGC, no compressed oops means higher pointer overhead. Additionally, C4 maintains various concurrent data structures. Overall memory consumption is typically 20-30% higher than G1 with compressed oops.

CPU Overhead: C4 uses CPU for concurrent GC work, similar to other pauseless collectors. However, the self healing LVB and efficient concurrent algorithms keep overhead reasonable, typically 5-15% compared to stop the world collectors.

5.6 When to Use C4 (Azul Platform Prime)

C4 is ideal for:

  • Mission critical applications requiring absolute consistency
  • Ultra low latency requirements (submillisecond) at scale
  • Large heap applications (100GB+) requiring true pauseless operation
  • Financial services, trading platforms, and payment processing
  • Applications where GC tuning complexity must be minimized
  • Organizations willing to invest in commercial JVM support

Considerations:

  • Commercial licensing required (no open source option)
  • Linux only (no Windows or macOS support)
  • Proprietary JVM means dependency on Azul Systems
  • Higher cost compared to OpenJDK based solutions
  • Limited community ecosystem compared to OpenJDK

6. Comparative Analysis

6.1 Architectural Differences

Feature             ZGC                     Shenandoah              C4
------------------  ----------------------  ----------------------  ---------------------
Pointer Technique   Colored Pointers        Brooks Pointers         Loaded Value Barrier
Compressed Oops     No                      Yes                     No
Generational        Yes (Java 25)           Yes (Java 25)           Yes
Open Source         Yes                     Yes                     No
Platform Support    Linux, Windows, macOS   Linux, Windows, macOS   Linux only
Max Heap Size       16TB                    Limited by system       20TB
STW Phases          2 brief pauses          Multiple brief pauses   Effectively pauseless

6.2 Latency Comparison

Based on published benchmarks and production reports:

ZGC: Consistently achieves 0.1-0.5ms pause times regardless of heap size. Occasional spikes to 1ms under extreme allocation pressure. Pause times truly independent of heap size.

Shenandoah: Typically 1-5ms pause times with occasional spikes to 10ms. Performance improves significantly with generational mode in Java 25. Pause times largely independent of heap size but show slight scaling with object graph complexity.

C4: Sub millisecond pause times with maximum pauses typically under 0.5ms. Most consistent pause time distribution of the three. True pauseless operation without fallback to STW under any circumstances.

Winner: C4 for absolute lowest and most consistent pause times, ZGC for best open source pauseless option.

6.3 Throughput Comparison

Throughput varies significantly by workload characteristics:

High Allocation Rate (4+ GB/s):

  • C4 and ZGC perform best with generational modes
  • Shenandoah shows 5-15% lower throughput
  • G1 struggles with high allocation rates

Moderate Allocation Rate (1-3 GB/s):

  • All three pauseless collectors within 10% of each other
  • G1 competitive or slightly better in some cases
  • Generational modes essential for good throughput

Low Allocation Rate (<1 GB/s):

  • Throughput differences minimal between collectors
  • G1 may have slight advantage due to lower overhead
  • Pauseless collectors provide latency benefits with negligible throughput cost

Large Live Set (70%+ heap occupancy):

  • ZGC and C4 maintain stable throughput
  • Shenandoah may show slight degradation
  • G1 can experience mixed collection pressure

6.4 Memory Consumption Comparison

Memory overhead compared to G1 with compressed oops:

ZGC: +20-30% due to no compressed oops and concurrent data structures. Requires 20-30% heap headroom for concurrent collection. Total memory requirement approximately 1.5x live set.

Shenandoah: +10-20% due to Brooks pointers and concurrent structures. Supports compressed oops which partially offsets overhead. Requires 15-20% heap headroom. Total memory requirement approximately 1.3x live set.

C4: +20-30% similar to ZGC. No compressed oops and various concurrent data structures. Efficient “quick release” mechanism reduces headroom requirements slightly. Total memory requirement approximately 1.5x live set.

G1 (Reference): Baseline with compressed oops. Requires 10-15% headroom. Total memory requirement approximately 1.15x live set.

6.5 CPU Overhead Comparison

CPU overhead for concurrent GC work:

ZGC: 5-10% overhead for concurrent marking and relocation. Generational mode reduces overhead significantly. Dynamic thread scaling helps adapt to workload.

Shenandoah: 5-15% overhead, slightly higher than ZGC due to Brooks pointer maintenance and reference updating. Generational mode improves efficiency.

C4: 5-15% overhead. Self healing LVB reduces steady state overhead. Hybrid LVB mode can nearly eliminate overhead when GC is not active.

All concurrent collectors trade CPU for latency. For latency sensitive applications, this trade off is worthwhile. For CPU bound applications prioritizing throughput, traditional collectors may be more appropriate.

6.6 Tuning Complexity Comparison

ZGC: Minimal tuning required. Primary parameter is heap size. Automatic thread scaling and heuristics work well for most workloads. Very little documentation needed for effective use.

Shenandoah: Moderate tuning options available. Heuristics selection can impact performance. More documentation needed to understand trade offs. Generational mode reduces need for tuning.

C4: Simplest to tune. Heap size is essentially the only parameter. Self managing heuristics adapt to workload automatically. “Just works” for most applications.

G1: Complex tuning space with hundreds of parameters. Requires expertise to tune effectively. Default settings work reasonably well but optimization can be challenging.

7. Benchmark Results and Testing

7.1 Benchmark Methodology

To provide practical guidance, we present benchmark results across various workload patterns. All tests use Java 25 on a Linux system with 64 CPU cores and 256GB RAM.

Test workloads:

  • High Allocation: Creates 5GB/s of garbage with 95% short lived objects
  • Large Live Set: Maintains 60GB live set with moderate 1GB/s allocation
  • Mixed Workload: Variable allocation rate (0.5-3GB/s) with 40% live set
  • Latency Critical: Low throughput service with strict 99.99th percentile requirements

7.2 Code Example: GC Benchmark Harness

import java.util.*;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicLong;
import java.lang.management.*;

public class GCBenchmark {
    
    // Configuration
    private static final int THREADS = 32;
    private static final int DURATION_SECONDS = 300;
    private static final long ALLOCATION_RATE_MB = 150; // MB per second per thread
    private static final int LIVE_SET_MB = 4096; // 4GB live set
    
    // Metrics
    private static final ConcurrentHashMap<String, Long> latencyMap = new ConcurrentHashMap<>();
    private static final List<Long> pauseTimes = new CopyOnWriteArrayList<>();
    private static final AtomicLong totalOperations = new AtomicLong(); // volatile ++ is not atomic across threads
    
    public static void main(String[] args) throws Exception {
        System.out.println("Starting GC Benchmark");
        System.out.println("Java Version: " + System.getProperty("java.version"));
        System.out.println("GC: " + getGarbageCollectorNames());
        System.out.println("Heap Size: " + Runtime.getRuntime().maxMemory() / 1024 / 1024 + " MB");
        System.out.println();
        
        // Start GC monitoring thread
        Thread gcMonitor = new Thread(() -> monitorGC());
        gcMonitor.setDaemon(true);
        gcMonitor.start();
        
        // Create live set
        System.out.println("Creating live set...");
        Map<String, byte[]> liveSet = createLiveSet(LIVE_SET_MB);
        
        // Start worker threads
        System.out.println("Starting worker threads...");
        ExecutorService executor = Executors.newFixedThreadPool(THREADS);
        CountDownLatch latch = new CountDownLatch(THREADS);
        
        long startTime = System.currentTimeMillis();
        
        for (int i = 0; i < THREADS; i++) {
            final int threadId = i;
            executor.submit(() -> {
                try {
                    runWorkload(threadId, startTime, liveSet);
                } finally {
                    latch.countDown();
                }
            });
        }
        
        // Wait for completion
        latch.await();
        executor.shutdown();
        
        long endTime = System.currentTimeMillis();
        long duration = (endTime - startTime) / 1000;
        
        // Print results
        printResults(duration);
    }
    
    private static Map<String, byte[]> createLiveSet(int sizeMB) {
        Map<String, byte[]> liveSet = new ConcurrentHashMap<>();
        int objectSize = 1024; // 1KB objects
        int objectCount = (int)((long) sizeMB * 1024 * 1024 / objectSize); // long math avoids int overflow for multi-GB sets
        
        for (int i = 0; i < objectCount; i++) {
            liveSet.put("live_" + i, new byte[objectSize]);
            if (i % 10000 == 0) {
                System.out.print(".");
            }
        }
        System.out.println("\nLive set created: " + liveSet.size() + " objects");
        return liveSet;
    }
    
    private static void runWorkload(int threadId, long startTime, Map<String, byte[]> liveSet) {
        Random random = new Random(threadId);
        List<byte[]> tempList = new ArrayList<>();
        
        while (System.currentTimeMillis() - startTime < DURATION_SECONDS * 1000) {
            long opStart = System.nanoTime();
            
            // Allocate objects. Each iteration sleeps ~10ms, so a worker runs
            // roughly 100 iterations/second; size the batch to hit the
            // per-thread target rate.
            int allocSize = (int)(ALLOCATION_RATE_MB * 1024 * 1024 / 100);
            for (int i = 0; i < 100; i++) {
                tempList.add(new byte[allocSize / 100]);
            }
            
            // Simulate work
            if (random.nextDouble() < 0.1) {
                String key = "live_" + random.nextInt(liveSet.size());
                byte[] value = liveSet.get(key);
                if (value != null && value.length > 0) {
                    // Touch live object; publish the sum so the JIT cannot
                    // dead-code-eliminate the reads
                    int sum = 0;
                    for (int i = 0; i < Math.min(100, value.length); i++) {
                        sum += value[i];
                    }
                    sink = sum;
                }
            }
            
            // Clear temp objects (create garbage)
            tempList.clear();
            
            long opEnd = System.nanoTime();
            long latency = (opEnd - opStart) / 1_000_000; // Convert to ms
            
            recordLatency(latency);
            totalOperations.incrementAndGet();
            
            // Small delay to control allocation rate
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                break;
            }
        }
    }
    
    private static void recordLatency(long latency) {
        String bucket = String.valueOf((latency / 10) * 10); // 10ms buckets
        latencyMap.compute(bucket, (k, v) -> v == null ? 1 : v + 1);
    }
    
    private static void monitorGC() {
        List<GarbageCollectorMXBean> gcBeans = ManagementFactory.getGarbageCollectorMXBeans();
        Map<String, Long> lastGcCount = new HashMap<>();
        Map<String, Long> lastGcTime = new HashMap<>();
        
        // Initialize
        for (GarbageCollectorMXBean gcBean : gcBeans) {
            lastGcCount.put(gcBean.getName(), gcBean.getCollectionCount());
            lastGcTime.put(gcBean.getName(), gcBean.getCollectionTime());
        }
        
        while (true) {
            try {
                Thread.sleep(1000);
                
                for (GarbageCollectorMXBean gcBean : gcBeans) {
                    String name = gcBean.getName();
                    long currentCount = gcBean.getCollectionCount();
                    long currentTime = gcBean.getCollectionTime();
                    
                    long countDiff = currentCount - lastGcCount.get(name);
                    long timeDiff = currentTime - lastGcTime.get(name);
                    
                    if (countDiff > 0) {
                        // Average pause over this sampling interval, not an
                        // individual pause time
                        long avgPause = timeDiff / countDiff;
                        pauseTimes.add(avgPause);
                    }
                    
                    lastGcCount.put(name, currentCount);
                    lastGcTime.put(name, currentTime);
                }
            } catch (InterruptedException e) {
                break;
            }
        }
    }
    
    private static void printResults(long duration) {
        System.out.println("\n=== Benchmark Results ===");
        System.out.println("Duration: " + duration + " seconds");
        System.out.println("Total Operations: " + totalOperations);
        System.out.println("Throughput: " + (totalOperations / duration) + " ops/sec");
        System.out.println();
        
        System.out.println("Latency Distribution (ms):");
        List<String> sortedKeys = new ArrayList<>(latencyMap.keySet());
        Collections.sort(sortedKeys, Comparator.comparingInt(Integer::parseInt));
        
        long totalOps = latencyMap.values().stream().mapToLong(Long::longValue).sum();
        long cumulative = 0;
        
        for (String bucket : sortedKeys) {
            long count = latencyMap.get(bucket);
            cumulative += count;
            double percentile = (cumulative * 100.0) / totalOps;
            System.out.printf("%s ms: %d (%.2f%%)%n", bucket, count, percentile);
        }
        
        System.out.println("\nGC Pause Times:");
        if (!pauseTimes.isEmpty()) {
            Collections.sort(pauseTimes);
            System.out.println("Min: " + pauseTimes.get(0) + " ms");
            System.out.println("Median: " + pauseTimes.get(pauseTimes.size() / 2) + " ms");
            System.out.println("95th: " + pauseTimes.get((int)(pauseTimes.size() * 0.95)) + " ms");
            System.out.println("99th: " + pauseTimes.get((int)(pauseTimes.size() * 0.99)) + " ms");
            System.out.println("Max: " + pauseTimes.get(pauseTimes.size() - 1) + " ms");
        }
        
        // Print GC statistics
        System.out.println("\nGC Statistics:");
        for (GarbageCollectorMXBean gcBean : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gcBean.getName() + ":");
            System.out.println("  Count: " + gcBean.getCollectionCount());
            System.out.println("  Time: " + gcBean.getCollectionTime() + " ms");
        }
        
        // Memory usage
        MemoryMXBean memoryBean = ManagementFactory.getMemoryMXBean();
        MemoryUsage heapUsage = memoryBean.getHeapMemoryUsage();
        System.out.println("\nHeap Memory:");
        System.out.println("  Used: " + heapUsage.getUsed() / 1024 / 1024 + " MB");
        System.out.println("  Committed: " + heapUsage.getCommitted() / 1024 / 1024 + " MB");
        System.out.println("  Max: " + heapUsage.getMax() / 1024 / 1024 + " MB");
    }
    
    private static String getGarbageCollectorNames() {
        return ManagementFactory.getGarbageCollectorMXBeans()
            .stream()
            .map(GarbageCollectorMXBean::getName)
            .reduce((a, b) -> a + ", " + b)
            .orElse("Unknown");
    }
}

7.3 Running the Benchmark

# Compile
javac GCBenchmark.java

# Run with ZGC
java -XX:+UseZGC -Xmx16g -Xms16g -Xlog:gc*:file=zgc.log GCBenchmark

# Run with Shenandoah
java -XX:+UseShenandoahGC -Xmx16g -Xms16g -Xlog:gc*:file=shenandoah.log GCBenchmark

# Run with G1 (for comparison)
java -XX:+UseG1GC -Xmx16g -Xms16g -Xlog:gc*:file=g1.log GCBenchmark

# For C4, run with Azul Platform Prime:
# java -Xmx16g -Xms16g -Xlog:gc*:file=c4.log GCBenchmark
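
After each run, a quick pass over the GC log gives a first impression before loading it into a full analyzer. A minimal sketch, assuming the unified logging format configured above, where pause events contain "Pause" and durations are printed as "<n>ms" (exact line formats vary by collector and JDK build):

# Show the most recent pause events
grep "Pause" zgc.log | tail -n 20

# Extract pause durations and list the five longest
grep -oE '[0-9]+\.[0-9]+ms' zgc.log | sort -n | tail -n 5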

7.4 Representative Results

Based on extensive testing across various workloads, typical results show:

High Allocation Workload (5GB/s):

  • ZGC: 0.3ms avg pause, 0.8ms max pause, 95% throughput relative to G1
  • Shenandoah: 2.1ms avg pause, 8.5ms max pause, 90% throughput relative to G1
  • C4: 0.2ms avg pause, 0.5ms max pause, 97% throughput relative to G1
  • G1: 45ms avg pause, 380ms max pause, 100% baseline throughput

Large Live Set (60GB, 1GB/s allocation):

  • ZGC: 0.4ms avg pause, 1.2ms max pause, 92% throughput relative to G1
  • Shenandoah: 3.5ms avg pause, 12ms max pause, 88% throughput relative to G1
  • C4: 0.3ms avg pause, 0.6ms max pause, 95% throughput relative to G1
  • G1: 120ms avg pause, 850ms max pause, 100% baseline throughput

99.99th Percentile Latency:

  • ZGC: 1.5ms
  • Shenandoah: 15ms
  • C4: 0.8ms
  • G1: 900ms

These results demonstrate that pauseless collectors provide dramatic latency improvements (10x to 1000x reduction in pause times) with modest throughput trade offs (5-15% reduction).

8. Decision Framework

8.1 Workload Characteristics

When choosing a garbage collector, consider:

Latency Requirements:

  • Sub 1ms required → ZGC or C4
  • Sub 10ms acceptable → ZGC, Shenandoah, or G1
  • Sub 100ms acceptable → G1 or Parallel
  • No requirement → Parallel for maximum throughput

Heap Size:

  • Under 2GB → G1 (default)
  • 2GB to 32GB → ZGC, Shenandoah, or G1
  • 32GB to 256GB → ZGC or Shenandoah
  • Over 256GB → ZGC or C4

Allocation Rate:

  • Under 1GB/s → Any collector works well
  • 1-3GB/s → Generational collectors (ZGC, Shenandoah, G1)
  • Over 3GB/s → ZGC (generational) or C4

Live Set Percentage:

  • Under 30% → Any collector works well
  • 30-60% → ZGC, Shenandoah, or G1
  • Over 60% → ZGC or C4 (better handling of high occupancy)

8.2 Decision Matrix

┌────────────────────────────────────────────────────────────────┐
│                    GARBAGE COLLECTOR SELECTION                  │
├────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Latency Requirement < 1ms:                                    │
│    ├─ Budget Available: C4 (Azul Platform Prime)              │
│    └─ Open Source Only: ZGC                                    │
│                                                                 │
│  Latency Requirement < 10ms:                                   │
│    ├─ Heap > 32GB: ZGC                                         │
│    ├─ Heap 4-32GB: ZGC or Shenandoah                          │
│    └─ Heap < 4GB: G1 (often sufficient)                       │
│                                                                 │
│  Maximum Throughput Priority:                                  │
│    ├─ Batch Jobs: Parallel GC                                 │
│    ├─ Moderate Latency OK: G1                                 │
│    └─ Low Latency Also Needed: ZGC (generational)             │
│                                                                 │
│  Memory Constrained (<= 4GB total RAM):                        │
│    ├─ Use G1 (lower overhead)                                 │
│    └─ Avoid: ZGC, C4 (higher memory requirements)             │
│                                                                 │
│  High Allocation Rate (> 3GB/s):                              │
│    ├─ First Choice: ZGC (generational)                        │
│    ├─ Second Choice: C4                                        │
│    └─ Third Choice: Shenandoah (generational)                 │
│                                                                 │
│  Cloud Native Microservices:                                   │
│    ├─ Latency Sensitive: ZGC or Shenandoah                    │
│    ├─ Standard Latency: G1 (default)                          │
│    └─ Cost Optimized: G1 (lower memory overhead)              │
│                                                                 │
└────────────────────────────────────────────────────────────────┘

8.3 Migration Strategy

When migrating from G1 to a pauseless collector:

  1. Measure Baseline: Capture GC logs and application metrics with G1
  2. Test with ZGC: Start with ZGC as it requires minimal tuning
  3. Increase Heap Size: Add 20-30% headroom for concurrent collection
  4. Load Test: Run full load tests and measure latency percentiles
  5. Compare Shenandoah: If ZGC does not meet requirements, test Shenandoah
  6. Monitor Production: Deploy to subset of production with monitoring
  7. Evaluate C4: If ultra low latency is critical and budget allows, evaluate Azul

Common issues during migration:

  • Out of Memory: Increase heap size by 20-30%
  • Lower Throughput: Expected trade off; evaluate if latency improvement justifies cost
  • Increased CPU Usage: Normal for concurrent collectors; may need more CPU capacity
  • Higher Memory Consumption: Expected; ensure adequate RAM available
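
To capture the baseline from step 1 and the ZGC candidate from steps 2-3 under identical load, runs like the following are a reasonable sketch (YourApplication is a placeholder, and the flight recording is optional):

# Baseline: G1 with full GC logging plus a JFR recording for application metrics
java -XX:+UseG1GC -Xmx32g -Xms32g \
     -Xlog:gc*:file=baseline-g1.log:time,uptime,level,tags \
     -XX:StartFlightRecording=duration=30m,filename=baseline.jfr \
     YourApplication

# Candidate: ZGC with ~25% extra heap headroom, same load test
java -XX:+UseZGC -Xmx40g -Xms40g \
     -Xlog:gc*:file=candidate-zgc.log:time,uptime,level,tags \
     YourApplication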

9. Best Practices

9.1 Configuration Guidelines

Heap Sizing:

# DO: Fixed heap size for predictable performance
java -XX:+UseZGC -Xmx32g -Xms32g YourApplication

# DON'T: Variable heap size (causes uncommit/commit latency)
java -XX:+UseZGC -Xmx32g -Xms8g YourApplication

Memory Pre touching:

# DO: Pre-touch for consistent latency
java -XX:+UseZGC -Xmx32g -Xms32g -XX:+AlwaysPreTouch YourApplication

# Context: pre-touching commits all heap pages upfront, avoiding page faults during execution

GC Logging:

# DO: Enable detailed logging during evaluation
java -XX:+UseZGC -Xlog:gc*=info:file=gc.log:time,uptime,level,tags YourApplication

# DO: Use simplified logging in production
java -XX:+UseZGC -Xlog:gc:file=gc.log YourApplication

Large Pages:

# DO: Enable for better performance (requires OS configuration)
java -XX:+UseZGC -XX:+UseLargePages YourApplication

# DO: Enable transparent huge pages as an alternative (Linux only)
java -XX:+UseZGC -XX:+UseTransparentHugePages YourApplication

9.2 Monitoring and Observability

Essential metrics to monitor:

GC Pause Times:

  • Track p50, p95, p99, p99.9, and max pause times
  • Alert on pauses exceeding SLA thresholds
  • Use GC logs or JMX for collection

Heap Usage:

  • Monitor committed heap size
  • Track allocation rate (MB/s)
  • Watch for sustained high occupancy (>80%)

CPU Utilization:

  • Separate application threads from GC threads
  • Monitor for CPU saturation
  • Track CPU time in GC vs application

Throughput:

  • Measure application transactions/second
  • Calculate time spent in GC vs application
  • Compare before and after collector changes
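
For ad-hoc checks on a running JVM, the JDK's own tools cover most of these metrics. A sketch; <pid> is a placeholder for the target process ID:

# Poll GC and heap utilization once per second
jstat -gcutil <pid> 1000

# One-off heap snapshot
jcmd <pid> GC.heap_info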

9.3 Common Pitfalls

Insufficient Heap Headroom: Pauseless collectors need space to operate concurrently. Failing to provide adequate headroom leads to allocation stalls. Solution: Increase heap by 20-30%.

Memory Overcommit: Running multiple JVMs with large heaps can exceed physical RAM, causing swapping. Solution: Account for total memory consumption across all JVMs.

Ignoring CPU Requirements: Concurrent collectors use CPU for GC work. Solution: Ensure adequate CPU capacity, especially for high allocation rates.

Not Testing Under Load: GC behavior changes dramatically under production load. Solution: Always load test with realistic traffic patterns.

Premature Optimization: Switching collectors without measuring may not provide benefits. Solution: Measure first, optimize second.

10. Future Developments

10.1 Ongoing Improvements

The Java garbage collection landscape continues to evolve:

ZGC Enhancements:

  • Further reduction of pause times toward 0.1ms target
  • Improved throughput in generational mode
  • Better NUMA support on multi socket systems
  • Enhanced adaptive heuristics

Shenandoah Evolution:

  • Continued optimization of generational mode
  • Reduced memory overhead
  • Better handling of extremely high allocation rates
  • Performance parity with ZGC in more scenarios

JVM Platform Evolution:

  • Project Lilliput: Compact object headers to reduce memory overhead
  • Project Valhalla: Value types may reduce allocation pressure
  • Improved JIT compiler optimizations for GC barriers

10.2 Emerging Trends

Default Collector Changes: As pauseless collectors mature, they may become default for more scenarios. Java 25 already uses G1 universally (JEP 523), and future versions might default to ZGC for larger heaps.

Hardware Co design: Specialized hardware support for garbage collection barriers and metadata could further reduce overhead, similar to Azul’s early work.

Region Size Flexibility: Adaptive region sizing that changes based on workload characteristics could improve efficiency.

Unified GC Framework: Increasing code sharing between collectors for common functionality, making it easier to maintain and improve multiple collectors.

11. Conclusion

The pauseless garbage collector landscape in Java 25 represents a remarkable achievement in language runtime technology. Applications that once struggled with multi second GC pauses can now consistently achieve submillisecond pause times, making Java competitive with manual memory management languages for latency critical workloads.

Key Takeaways:

  1. ZGC is the premier open source pauseless collector, offering submillisecond pause times at any heap size with minimal tuning. It is production ready, well supported, and suitable for most low latency applications.
  2. Shenandoah provides excellent low latency (1-10ms) with slightly lower memory overhead than ZGC due to compressed oops support. Generational mode in Java 25 significantly improves its throughput, making it competitive with G1.
  3. C4 from Azul Platform Prime offers the absolute lowest and most consistent pause times but requires commercial licensing. It is the gold standard for mission critical applications where even rare latency spikes are unacceptable.
  4. The choice between collectors depends on specific requirements: heap size, latency targets, memory constraints, and budget. Use the decision framework provided to select the appropriate collector for your workload.
  5. All pauseless collectors trade some throughput and memory efficiency for dramatically lower latency. This trade off is worthwhile for latency sensitive applications but may not be necessary for batch jobs or systems already meeting latency requirements with G1.
  6. Testing under realistic load is essential. Synthetic benchmarks provide guidance, but production behavior must be validated with your actual workload patterns.

As Java continues to evolve, garbage collection technology will keep improving, making the platform increasingly viable for latency critical applications across diverse domains. The future of Java is pauseless, and that future has arrived with Java 25.

12. References and Further Reading

Official Documentation:

  • Oracle Java 25 GC Tuning Guide: https://docs.oracle.com/en/java/javase/25/gctuning/
  • OpenJDK ZGC Project: https://openjdk.org/projects/zgc/
  • OpenJDK Shenandoah Project: https://openjdk.org/projects/shenandoah/
  • Azul Platform Prime Documentation: https://docs.azul.com/prime/

Research Papers:

  • “Deep Dive into ZGC: A Modern Garbage Collector in OpenJDK” – ACM TOPLAS
  • “The Pauseless GC Algorithm” – Azul Systems
  • “Shenandoah: An Open Source Concurrent Compacting Garbage Collector” – Red Hat

Performance Studies:

  • “A Performance Comparison of Modern Garbage Collectors for Big Data Environments”
  • “Performance evaluation of Java garbage collectors for large-scale Java applications”
  • Various benchmark reports on ionutbalosin.com

Community Resources:

  • Inside.java blog for latest JVM developments
  • Baeldung JVM garbage collector tutorials
  • Red Hat Developer articles on Shenandoah
  • Per Liden’s blog on ZGC developments

Tools:

  • GCeasy: Online GC log analyzer
  • JClarity Censum: GC analysis tool
  • VisualVM: JVM monitoring and profiling
  • Java Mission Control: Advanced monitoring and diagnostics

Document Version: 1.0
Last Updated: December 2025
Target Java Version: Java 25 LTS
Author: Technical Documentation
License: Creative Commons Attribution 4.0


Deep Dive: AWS NLB Sticky Sessions (stickiness) Setup, Behavior, and Hidden Pitfalls

When you deploy applications behind a Network Load Balancer (NLB) in AWS, you usually expect perfect traffic distribution: fast, fair, and stateless.
But what if your backend holds stateful sessions, like in-memory login sessions, caching, or WebSocket connections, and you need a given client to keep hitting the same target every time?

That’s where NLB sticky sessions (also called connection stickiness or source IP affinity) come in. They’re powerful but often misunderstood, and misconfiguring them can lead to uneven load, dropped connections, or mysterious client “resets.”

Let’s break down exactly how they work, how to set them up, what to watch for, and how to troubleshoot the tricky edge cases that appear in production.


1. What Are Sticky Sessions on an NLB?

At a high level, sticky sessions ensure that traffic from the same client consistently lands on the same target (EC2 instance, IP, or container) behind your NLB.

Unlike the Application Load Balancer (ALB), which uses HTTP cookies for stickiness, the NLB operates at Layer 4 (TCP/UDP).
That means it doesn’t look inside your packets. Instead, it bases stickiness on network-level parameters like:

  • Source IP address
  • Destination IP and port
  • Source port (sometimes included in the hash)
  • Protocol (TCP, UDP, or TLS passthrough)

AWS refers to this as “source IP affinity.”
When enabled, the NLB creates a flow-hash mapping that ties the client to a backend target.
As long as the hash remains the same, the same client gets routed to the same target — even across multiple connections.


2. Enabling Sticky Sessions on an AWS NLB

Stickiness is configured per target group, not at the NLB level.

Step-by-Step via AWS Console

  1. Go to EC2 → Load Balancers → Target Groups
     Find the target group your NLB listener uses.
  2. Select the Target Group → Attributes tab
  3. Under Attributes, set:
     • Stickiness.enabled = true
     • Stickiness.type = source_ip
  4. Save changes and confirm the attributes are updated.

Step-by-Step via AWS CLI

aws elbv2 modify-target-group-attributes \
--target-group-arn arn:aws:elasticloadbalancing:region:acct:targetgroup/mytg/abc123 \
--attributes Key=stickiness.enabled,Value=true Key=stickiness.type,Value=source_ip

How to Verify:

aws elbv2 describe-target-group-attributes \
  --target-group-arn arn:aws:elasticloadbalancing:region:acct:targetgroup/mytg/abc123

Sample Output:

{
    "Attributes": [
        { "Key": "stickiness.enabled", "Value": "true" },
        { "Key": "stickiness.type", "Value": "source_ip" }
    ]
}

3. How NLB Stickiness Actually Works (Under the Hood)

The NLB’s flow hashing algorithm calculates a hash from several parameters, often the “five-tuple”:

<protocol, source IP, source port, destination IP, destination port>

The hash is used to choose a target. When stickiness is enabled, NLB remembers this mapping for some time (typically a few minutes to hours, depending on flow expiration).

Key Behavior Points:

  • If the same client connects again using the same IP and port, the hash matches → same backend target.
  • If any part of that tuple changes (e.g. client source port changes), the hash may change → the client might hit a different target.
  • NLBs maintain this mapping in memory; if the NLB node restarts or fails over, the mapping is lost.
  • Sticky mappings can also be lost when cross-zone load balancing or target health status changes.

Not Cookie Based

Because NLBs don’t inspect HTTP traffic, there’s no cookie involved.
This means:

  • You can’t set session duration or expiry time like in ALB stickiness.
  • Stickiness only works as long as the same network path and source IP persist.

4. Known Limitations & Edge Cases

Sticky sessions on NLBs are helpful but brittle. Here’s what can go wrong:

Issue                             | Cause                                        | Effect
----------------------------------|----------------------------------------------|---------------------------------------------------
Client source IP changes          | NAT, VPN, mobile switching networks          | Hash changes → new target
Different source port             | Client opens multiple sockets or reconnects  | Each connection may map differently
TLS termination at NLB            | NLB terminates TLS                           | Stickiness not supported (only for TCP listeners)
Unhealthy target                  | Health check fails                           | Mapping breaks; NLB reroutes
Cross-zone load balancing toggled | Distribution rules change                    | May break existing sticky mappings
DNS round-robin at client         | NLB has multiple IPs per AZ                  | Client DNS resolver may change NLB node
UDP behavior                      | Stateless packets; different flow hash       | Stickiness unreliable for UDP
Scaling up/down                   | New targets added                            | Hash table rebalanced; some clients remapped

Tip: If you rely on stickiness, keep your clients stable (same IP) and avoid frequent target registration changes.

5. Troubleshooting Sticky Session Problems

When things go wrong, these are the most common patterns you’ll see:

1. “Stickiness not working”

  • Check target group attributes (aws elbv2 describe-target-group-attributes --target-group-arn <arn>) and ensure stickiness.enabled is true.
  • Make sure your listener protocol is TCP, not TLS.
  • Confirm that client IPs aren’t being rewritten by NAT or proxy.
  • Check CloudWatch metrics (see the CLI sketch below). If one target gets all the traffic, stickiness might be too “sticky” due to limited source IP variety.

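For example, you can pull flow-level metrics for the load balancer with the CLI. A minimal sketch; the load balancer dimension value and the time window are placeholders for your own:

aws cloudwatch get-metric-statistics \
  --namespace AWS/NetworkELB \
  --metric-name ActiveFlowCount \
  --dimensions Name=LoadBalancer,Value=net/my-nlb/1234567890abcdef \
  --start-time 2025-01-01T00:00:00Z \
  --end-time 2025-01-01T01:00:00Z \
  --period 300 \
  --statistics Sum
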
2. “Some clients lose session state randomly”

  • Verify client network stability. Mobile clients or corporate proxies can rotate IPs.
  • Confirm health checks aren’t flapping targets.
  • Review your application session design; if session data lives in memory, consider an external session store (Redis, DynamoDB, etc.).

3. “Load imbalance: one instance overloaded”

  • This can happen when many users share one public IP (common in offices or ISPs).
    All those clients hash to the same backend.
  • Mitigate by:
    • Disabling stickiness if not strictly required.
    • Using ALB with cookie based stickiness (more granular).
    • Scaling target capacity.

4. “Connections drop after some time”

  • NLB may remove stale flow mappings.
  • Check TCP keepalive settings on clients and targets. Ensure keepalive_time < NLB idle timeout (350 seconds) to prevent connection resets. Linux commands below:
# Check keepalive time (seconds before sending first keepalive probe)
sysctl net.ipv4.tcp_keepalive_time

# Check keepalive interval (seconds between probes)
sysctl net.ipv4.tcp_keepalive_intvl

# Check keepalive probes (number of probes before giving up)
sysctl net.ipv4.tcp_keepalive_probes

# View all at once
sysctl -a | grep tcp_keepalive
  • Verify idle timeout on backend apps (e.g., web servers closing connections too early).
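
To tighten the client-side probe timer, you can set it below the NLB idle timeout. A sketch for Linux; the 300-second value is just an example that stays under 350:

# Probe after 5 minutes of idle, safely below the NLB's 350-second idle timeout
sudo sysctl -w net.ipv4.tcp_keepalive_time=300

# Persist the setting across reboots
echo "net.ipv4.tcp_keepalive_time = 300" | sudo tee /etc/sysctl.d/99-keepalive.conf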

6. Observability & Testing

You can validate sticky behavior with:

  • CloudWatch metrics:
    ActiveFlowCount, NewFlowCount, and per target request metrics.
  • VPC Flow Logs: confirm that repeated requests from the same client IP go to the same backend ENI.
  • Packet captures: Use tcpdump or ss on your backend instances to see if the same source IP consistently connects.

Quick test with curl:

for i in {1..100}; do 
    echo "=== Request $i at $(date) ===" | tee -a curl_test.log
    curl http://<nlb-dns-name>/ -v 2>&1 | tee -a curl_test.log
    sleep 0.5
done

Run it from the same host and check which backend responds (log hostname on each instance).
Then try from another IP or VPN; you’ll likely see a different target.
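
If each backend returns something identifying (e.g. its hostname) in the response body, which is an assumption about your application, you can count distinct targets in one line:

for i in {1..100}; do curl -s "http://<nlb-dns-name>/"; echo; done | sort | uniq -c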

7. Best Practices

  1. Only enable stickiness if necessary.
    Stateless applications scale better without it.
  2. If using TLS: terminate TLS at the backend or use ALB if you need session affinity.
  3. Use shared session stores.
    Tools like ElastiCache (Redis) or DynamoDB make scaling simpler and safer.
  4. Avoid toggling cross-zone load balancing during traffic; it resets the sticky map.
  5. Set up proper health checks. Unhealthy targets break affinity immediately.
  6. Monitor uneven load. Large NAT’d user groups can overload a single instance.
  7. For UDP, consider designing idempotent, stateless processing; sticky sessions may not behave reliably.

8. Example Architecture Pattern

Scenario: A multiplayer game server behind an NLB.
Each player connects via TCP to the game backend that stores their in-memory state.

✅ Recommended setup:

  • Enable stickiness.enabled = true and stickiness.type = source_ip
  • Disable TLS termination at NLB
  • Keep targets in the same AZ with cross-zone load balancing disabled to maintain stable mapping
  • Maintain external health and scaling logic to avoid frequent re-registrations

This setup ensures that the same player IP always lands on the same backend server, as long as their network path is stable.

9. Summary Table

Attribute                | Supported Value     | Notes
-------------------------|---------------------|----------------------------------
stickiness.enabled       | true / false        | Enables sticky sessions
stickiness.type          | source_ip           | Only option for NLB
Supported Protocols      | TCP, UDP (limited)  | Not supported for TLS listeners
Persistence Duration     | Until flow reset    | Not configurable
Cookie-based Stickiness  | ❌ No               | Use ALB for cookie-based
Best for                 | Stateful TCP apps   | e.g. games, custom protocols

10. When to Use ALB Instead

If you’re dealing with HTTP/HTTPS applications that manage user sessions via cookies or tokens, you’ll be much happier using an Application Load Balancer.
It offers:

  • Configurable cookie duration
  • Per application stickiness
  • Layer 7 routing and metrics

The NLB should be reserved for high performance, low latency, or non HTTP workloads that need raw TCP/UDP handling.
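
For comparison, ALB stickiness is also configured as target group attributes. A sketch; the ARN and duration are placeholders:

aws elbv2 modify-target-group-attributes \
  --target-group-arn arn:aws:elasticloadbalancing:region:acct:targetgroup/my-alb-tg/def456 \
  --attributes Key=stickiness.enabled,Value=true \
               Key=stickiness.type,Value=lb_cookie \
               Key=stickiness.lb_cookie.duration_seconds,Value=86400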

11. Closing Thoughts

AWS NLB sticky sessions are a great feature, but they’re not magic glue.
They work well when your network topology and client IPs are predictable, and your app genuinely needs flow affinity. However, if your environment involves NATs, mobile networks, or frequent scale-ups, expect surprises.

When in doubt:
1. Keep your app stateless,
2. Let the load balancer do its job, and
3. Use stickiness only as a last resort for legacy or session bound systems.


MacBook: Enhanced Domain Vulnerability Scanner

Below is a fairly comprehensive, largely passive penetration testing script with vulnerability scanning, API testing, and detailed reporting.

Features

  • DNS & SSL/TLS Analysis – Complete DNS enumeration, certificate inspection, cipher analysis
  • Port & Vulnerability Scanning – Service detection, NMAP vuln scripts, outdated software detection
  • Subdomain Discovery – Certificate transparency log mining
  • API Security Testing – Endpoint discovery, permission testing, CORS analysis
  • Asset Discovery – Web technology detection, CMS identification
  • Firewall Testing – hping3 TCP/ICMP tests (if available)
  • Network Bypass – Uses en0 interface to bypass Zscaler
  • Debug Mode – Comprehensive logging enabled by default

Installation

Required Dependencies

# macOS
brew install nmap openssl bind curl jq

# Linux
sudo apt-get install nmap openssl dnsutils curl jq

Optional Dependencies

# macOS
brew install hping

# Linux
sudo apt-get install hping3 nikto

Usage

Basic Syntax

./security_scanner_enhanced.sh -d DOMAIN [OPTIONS]

Options

  • -d DOMAIN – Target domain (required)
  • -s – Enable subdomain scanning
  • -m NUM – Max subdomains to scan (default: 10)
  • -v – Enable vulnerability scanning
  • -a – Enable API discovery and testing
  • -h – Show help

Examples:

# Basic scan
./security_scanner_enhanced.sh -d example.com

# Full scan with all features
./security_scanner_enhanced.sh -d example.com -s -m 20 -v -a

# Vulnerability assessment only
./security_scanner_enhanced.sh -d example.com -v

# API security testing
./security_scanner_enhanced.sh -d example.com -a

Network Configuration

Default Interface: en0 (bypasses Zscaler)

To change the interface, edit line 24:

NETWORK_INTERFACE="en0"  # Change to your interface

The script automatically falls back to default routing if the interface is unavailable.

Debug Mode

Debug mode is enabled by default and shows:

  • Dependency checks
  • Network interface status
  • Command execution details
  • Scan progress
  • File operations

Debug messages appear in cyan with [DEBUG] prefix.

To disable, edit line 27:

DEBUG=false

Output

Each scan creates a timestamped directory: scan_example.com_20251016_191806/

Key Files

  • executive_summary.md – High-level findings
  • technical_report.md – Detailed technical analysis
  • vulnerability_report.md – Vulnerability assessment (if -v used)
  • api_security_report.md – API security findings (if -a used)
  • dns_*.txt – DNS records
  • ssl_*.txt – SSL/TLS analysis
  • port_scan_*.txt – Port scan results
  • subdomains_discovered.txt – Found subdomains (if -s used)

Scan Duration

Scan Type            | Duration
---------------------|--------------------
Basic                | 2-5 min
With subdomains      | +1-2 min/subdomain
With vulnerabilities | +10-20 min
Full scan            | 15-30 min

Troubleshooting

Missing dependencies

# Install required tools
brew install nmap openssl bind curl jq  # macOS
sudo apt-get install nmap openssl dnsutils curl jq  # Linux

Interface not found

# Check available interfaces
ifconfig

# Script will automatically fall back to default routing

Permission errors

# Some scans may require elevated privileges
sudo ./security_scanner_enhanced.sh -d example.com

Configuration

Change scan ports (line 325)

# Default: top 1000 ports
--top-ports 1000

# Custom ports
-p 80,443,8080,8443

# All ports (slow)
-p-

Adjust subdomain limit (line 1162)

MAX_SUBDOMAINS=10  # Change as needed

Add custom API paths (line 567)

API_PATHS=(
    "/api"
    "/api/v1"
    "/custom/endpoint"  # Add yours
)

⚠️ WARNING: Only scan domains you own or have explicit permission to test. Unauthorized scanning may be illegal.

This tool performs reconnaissance and non-intrusive checks only:

  • ✅ DNS queries, certificate logs, public web requests
  • ❌ No exploitation, brute force, or denial of service

Best Practices

  1. Obtain proper authorization before scanning
  2. Monitor progress via debug output
  3. Review all generated reports
  4. Prioritize findings by risk
  5. Schedule follow-up scans after remediation

Disclaimer: This tool is for authorized security testing only. The authors assume no liability for misuse or damage.

The Script:

cat > ./security_scanner_enhanced.sh << 'EOF'
#!/bin/zsh

################################################################################
# Enhanced Security Scanner Script v2.0
# Comprehensive security assessment with vulnerability scanning
# Includes: NMAP vuln scripts, hping3, asset discovery, API testing
# Network Interface: en0 (bypasses Zscaler)
# Debug Mode: Enabled
################################################################################

# Color codes for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
MAGENTA='\033[0;35m'
CYAN='\033[0;36m'
NC='\033[0m' # No Color

# Script version
VERSION="2.0.1"

# Network interface to use (bypasses Zscaler)
NETWORK_INTERFACE="en0"

# Debug mode flag
DEBUG=true

################################################################################
# Usage Information
################################################################################
usage() {
    cat << EOF
Enhanced Security Scanner v${VERSION}

Usage: $0 -d DOMAIN [-s] [-m MAX_SUBDOMAINS] [-v] [-a]

Options:
    -d DOMAIN           Target domain to scan (required)
    -s                  Scan subdomains (optional)
    -m MAX_SUBDOMAINS   Maximum number of subdomains to scan (default: 10)
    -v                  Enable vulnerability scanning (NMAP vuln scripts)
    -a                  Enable API discovery and testing
    -h                  Show this help message

Network Configuration:
    Interface: $NETWORK_INTERFACE (bypasses Zscaler)
    Debug Mode: Enabled

Examples:
    $0 -d example.com
    $0 -d example.com -s -m 20 -v
    $0 -d example.com -s -v -a

EOF
    exit 1
}

################################################################################
# Logging Functions
################################################################################
log_info() {
    echo -e "${BLUE}[INFO]${NC} $1"
}

log_success() {
    echo -e "${GREEN}[SUCCESS]${NC} $1"
}

log_warning() {
    echo -e "${YELLOW}[WARNING]${NC} $1"
}

log_error() {
    echo -e "${RED}[ERROR]${NC} $1"
}

log_vuln() {
    echo -e "${MAGENTA}[VULN]${NC} $1"
}

log_debug() {
    if [ "$DEBUG" = true ]; then
        echo -e "${CYAN}[DEBUG]${NC} $1"
    fi
}

################################################################################
# Check Dependencies
################################################################################
check_dependencies() {
    log_info "Checking dependencies..."
    log_debug "Starting dependency check"
    
    local missing_deps=()
    local optional_deps=()
    
    # Required dependencies
    log_debug "Checking for nmap..."
    command -v nmap >/dev/null 2>&1 || missing_deps+=("nmap")
    log_debug "Checking for openssl..."
    command -v openssl >/dev/null 2>&1 || missing_deps+=("openssl")
    log_debug "Checking for dig..."
    command -v dig >/dev/null 2>&1 || missing_deps+=("dig")
    log_debug "Checking for curl..."
    command -v curl >/dev/null 2>&1 || missing_deps+=("curl")
    log_debug "Checking for jq..."
    command -v jq >/dev/null 2>&1 || missing_deps+=("jq")
    
    # Optional dependencies
    log_debug "Checking for hping3..."
    command -v hping3 >/dev/null 2>&1 || optional_deps+=("hping3")
    log_debug "Checking for nikto..."
    command -v nikto >/dev/null 2>&1 || optional_deps+=("nikto")
    
    if [ ${#missing_deps[@]} -ne 0 ]; then
        log_error "Missing required dependencies: ${missing_deps[*]}"
        log_info "Install missing dependencies and try again"
        exit 1
    fi
    
    if [ ${#optional_deps[@]} -ne 0 ]; then
        log_warning "Missing optional dependencies: ${optional_deps[*]}"
        log_info "Some features may be limited"
    fi
    
    # Check network interface
    log_debug "Checking network interface: $NETWORK_INTERFACE"
    if ifconfig "$NETWORK_INTERFACE" >/dev/null 2>&1; then
        log_success "Network interface $NETWORK_INTERFACE is available"
        local interface_ip=$(ifconfig "$NETWORK_INTERFACE" | grep 'inet ' | awk '{print $2}')
        log_debug "Interface IP: $interface_ip"
    else
        log_warning "Network interface $NETWORK_INTERFACE not found, using default routing"
        NETWORK_INTERFACE=""
    fi
    
    log_success "All required dependencies found"
}

################################################################################
# Initialize Scan
################################################################################
initialize_scan() {
    log_debug "Initializing scan for domain: $DOMAIN"
    SCAN_DATE=$(date +"%Y-%m-%d %H:%M:%S")
    SCAN_DIR="scan_${DOMAIN}_$(date +%Y%m%d_%H%M%S)"
    
    log_debug "Creating scan directory: $SCAN_DIR"
    mkdir -p "$SCAN_DIR"
    cd "$SCAN_DIR" || exit 1
    
    log_success "Created scan directory: $SCAN_DIR"
    log_debug "Current working directory: $(pwd)"
    
    # Initialize report files
    EXEC_REPORT="executive_summary.md"
    TECH_REPORT="technical_report.md"
    VULN_REPORT="vulnerability_report.md"
    API_REPORT="api_security_report.md"
    
    log_debug "Initializing report files"
    > "$EXEC_REPORT"
    > "$TECH_REPORT"
    > "$VULN_REPORT"
    > "$API_REPORT"
    
    log_debug "Scan configuration:"
    log_debug "  - Domain: $DOMAIN"
    log_debug "  - Subdomain scanning: $SCAN_SUBDOMAINS"
    log_debug "  - Max subdomains: $MAX_SUBDOMAINS"
    log_debug "  - Vulnerability scanning: $VULN_SCAN"
    log_debug "  - API scanning: $API_SCAN"
    log_debug "  - Network interface: $NETWORK_INTERFACE"
}

################################################################################
# DNS Reconnaissance
################################################################################
dns_reconnaissance() {
    log_info "Performing DNS reconnaissance..."
    log_debug "Resolving domain: $DOMAIN"
    
    # Resolve domain to IP
    IP_ADDRESS=$(dig +short "$DOMAIN" | grep -E '^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$' | head -n1)
    
    if [ -z "$IP_ADDRESS" ]; then
        log_error "Could not resolve domain: $DOMAIN"
        log_debug "DNS resolution failed for $DOMAIN"
        exit 1
    fi
    
    log_success "Resolved $DOMAIN to $IP_ADDRESS"
    log_debug "Target IP address: $IP_ADDRESS"
    
    # Get comprehensive DNS records
    log_debug "Querying DNS records (ANY)..."
    dig "$DOMAIN" ANY > dns_records.txt 2>&1
    log_debug "Querying A records..."
    dig "$DOMAIN" A > dns_a_records.txt 2>&1
    log_debug "Querying MX records..."
    dig "$DOMAIN" MX > dns_mx_records.txt 2>&1
    log_debug "Querying NS records..."
    dig "$DOMAIN" NS > dns_ns_records.txt 2>&1
    log_debug "Querying TXT records..."
    dig "$DOMAIN" TXT > dns_txt_records.txt 2>&1
    
    # Reverse DNS lookup
    log_debug "Performing reverse DNS lookup for $IP_ADDRESS..."
    dig -x "$IP_ADDRESS" > reverse_dns.txt 2>&1
    
    echo "$IP_ADDRESS" > ip_address.txt
    log_debug "DNS reconnaissance complete"
}

################################################################################
# Subdomain Discovery
################################################################################
discover_subdomains() {
    if [ "$SCAN_SUBDOMAINS" = false ]; then
        log_info "Subdomain scanning disabled"
        log_debug "Skipping subdomain discovery"
        echo "0" > subdomain_count.txt
        return
    fi
    
    log_info "Discovering subdomains via certificate transparency..."
    log_debug "Querying crt.sh for subdomains of $DOMAIN"
    log_debug "Maximum subdomains to discover: $MAX_SUBDOMAINS"
    
    # Query crt.sh for subdomains
    curl -s "https://crt.sh/?q=%25.${DOMAIN}&output=json" | \
        jq -r '.[].name_value' | \
        sed 's/\*\.//g' | \
        sort -u | \
        grep -E "^[a-zA-Z0-9]([a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.${DOMAIN}$" | \
        head -n "$MAX_SUBDOMAINS" > subdomains_discovered.txt
    
    SUBDOMAIN_COUNT=$(wc -l < subdomains_discovered.txt | tr -d ' ')  # tr strips macOS wc padding
    echo "$SUBDOMAIN_COUNT" > subdomain_count.txt
    
    log_success "Discovered $SUBDOMAIN_COUNT subdomains (limited to $MAX_SUBDOMAINS)"
    log_debug "Subdomains saved to: subdomains_discovered.txt"
}

################################################################################
# SSL/TLS Analysis
################################################################################
ssl_tls_analysis() {
    log_info "Analyzing SSL/TLS configuration..."
    log_debug "Connecting to ${DOMAIN}:443 for certificate analysis"
    
    # Get certificate details
    log_debug "Extracting certificate details..."
    echo | openssl s_client -connect "${DOMAIN}:443" -servername "$DOMAIN" 2>/dev/null | \
        openssl x509 -noout -text > certificate_details.txt 2>&1
    
    # Extract key information
    log_debug "Extracting certificate issuer..."
    CERT_ISSUER=$(echo | openssl s_client -connect "${DOMAIN}:443" -servername "$DOMAIN" 2>/dev/null | \
        openssl x509 -noout -issuer | sed 's/issuer=//')
    
    log_debug "Extracting certificate subject..."
    CERT_SUBJECT=$(echo | openssl s_client -connect "${DOMAIN}:443" -servername "$DOMAIN" 2>/dev/null | \
        openssl x509 -noout -subject | sed 's/subject=//')
    
    log_debug "Extracting certificate dates..."
    CERT_DATES=$(echo | openssl s_client -connect "${DOMAIN}:443" -servername "$DOMAIN" 2>/dev/null | \
        openssl x509 -noout -dates)
    
    echo "$CERT_ISSUER" > cert_issuer.txt
    echo "$CERT_SUBJECT" > cert_subject.txt
    echo "$CERT_DATES" > cert_dates.txt
    
    log_debug "Certificate issuer: $CERT_ISSUER"
    log_debug "Certificate subject: $CERT_SUBJECT"
    
    # Enumerate SSL/TLS ciphers
    log_info "Enumerating SSL/TLS ciphers..."
    log_debug "Running nmap ssl-enum-ciphers script on port 443"
    if [ -n "$NETWORK_INTERFACE" ]; then
        nmap --script ssl-enum-ciphers -p 443 "$DOMAIN" -e "$NETWORK_INTERFACE" -oN ssl_ciphers.txt > /dev/null 2>&1
    else
        nmap --script ssl-enum-ciphers -p 443 "$DOMAIN" -oN ssl_ciphers.txt > /dev/null 2>&1
    fi
    
    # Check for TLS versions
    log_debug "Analyzing TLS protocol versions..."
    # grep -c prints "0" itself when there are no matches (while exiting
    # non-zero), so appending another echo would corrupt the value; only
    # default when the file is missing
    TLS_12=$(grep -c "TLSv1.2" ssl_ciphers.txt 2>/dev/null); TLS_12=${TLS_12:-0}
    TLS_13=$(grep -c "TLSv1.3" ssl_ciphers.txt 2>/dev/null); TLS_13=${TLS_13:-0}
    TLS_10=$(grep -c "TLSv1.0" ssl_ciphers.txt 2>/dev/null); TLS_10=${TLS_10:-0}
    TLS_11=$(grep -c "TLSv1.1" ssl_ciphers.txt 2>/dev/null); TLS_11=${TLS_11:-0}
    
    echo "TLSv1.0: $TLS_10" > tls_versions.txt
    echo "TLSv1.1: $TLS_11" >> tls_versions.txt
    echo "TLSv1.2: $TLS_12" >> tls_versions.txt
    echo "TLSv1.3: $TLS_13" >> tls_versions.txt
    
    log_debug "TLS versions found - 1.0:$TLS_10 1.1:$TLS_11 1.2:$TLS_12 1.3:$TLS_13"
    
    # Check for SSL vulnerabilities
    log_info "Checking for SSL/TLS vulnerabilities..."
    log_debug "Running SSL vulnerability scripts (heartbleed, poodle, dh-params)"
    if [ -n "$NETWORK_INTERFACE" ]; then
        nmap --script ssl-heartbleed,ssl-poodle,ssl-dh-params -p 443 "$DOMAIN" -e "$NETWORK_INTERFACE" -oN ssl_vulnerabilities.txt > /dev/null 2>&1
    else
        nmap --script ssl-heartbleed,ssl-poodle,ssl-dh-params -p 443 "$DOMAIN" -oN ssl_vulnerabilities.txt > /dev/null 2>&1
    fi
    
    log_success "SSL/TLS analysis complete"
}

################################################################################
# Port Scanning with Service Detection
################################################################################
port_scanning() {
    log_info "Performing comprehensive port scan..."
    log_debug "Target IP: $IP_ADDRESS"
    log_debug "Using network interface: $NETWORK_INTERFACE"
    
    # Quick scan of top 1000 ports
    log_info "Scanning top 1000 ports..."
    log_debug "Running nmap with service version detection (-sV) and default scripts (-sC)"
    if [ -n "$NETWORK_INTERFACE" ]; then
        nmap -sV -sC --top-ports 1000 "$IP_ADDRESS" -e "$NETWORK_INTERFACE" -oN port_scan_top1000.txt > /dev/null 2>&1
    else
        nmap -sV -sC --top-ports 1000 "$IP_ADDRESS" -oN port_scan_top1000.txt > /dev/null 2>&1
    fi
    
    # Count open ports
    # grep -c prints "0" on no match but exits non-zero; default only if the file is missing
    OPEN_PORTS=$(grep -c "^[0-9]*/tcp.*open" port_scan_top1000.txt 2>/dev/null); OPEN_PORTS=${OPEN_PORTS:-0}
    echo "$OPEN_PORTS" > open_ports_count.txt
    log_debug "Found $OPEN_PORTS open ports"
    
    # Extract open ports list with versions
    log_debug "Extracting open ports list with service information"
    grep "^[0-9]*/tcp.*open" port_scan_top1000.txt | awk '{print $1, $3, $4, $5, $6}' > open_ports_list.txt
    
    # Detect service versions for old software
    log_info "Detecting service versions..."
    log_debug "Filtering service version information"
    grep "^[0-9]*/tcp.*open" port_scan_top1000.txt | grep -E "version|product" > service_versions.txt
    
    log_success "Port scan complete: $OPEN_PORTS open ports found"
}

################################################################################
# Vulnerability Scanning
################################################################################
vulnerability_scanning() {
    if [ "$VULN_SCAN" = false ]; then
        log_info "Vulnerability scanning disabled"
        log_debug "Skipping vulnerability scanning"
        return
    fi
    
    log_info "Performing vulnerability scanning (this may take 10-20 minutes)..."
    log_debug "Target: $IP_ADDRESS"
    log_debug "Using network interface: $NETWORK_INTERFACE"
    
    # NMAP vulnerability scripts
    log_info "Running NMAP vulnerability scripts..."
    log_debug "Starting comprehensive vulnerability scan on all ports (-p-)"
    if [ -n "$NETWORK_INTERFACE" ]; then
        nmap --script vuln -p- "$IP_ADDRESS" -e "$NETWORK_INTERFACE" -oN nmap_vuln_scan.txt > /dev/null 2>&1 &
    else
        nmap --script vuln -p- "$IP_ADDRESS" -oN nmap_vuln_scan.txt > /dev/null 2>&1 &
    fi
    VULN_PID=$!
    log_debug "Vulnerability scan PID: $VULN_PID"
    
    # Wait with progress indicator
    log_debug "Waiting for vulnerability scan to complete..."
    while kill -0 $VULN_PID 2>/dev/null; do
        echo -n "."
        sleep 5
    done
    echo
    
    # Parse vulnerability results
    if [ -f nmap_vuln_scan.txt ]; then
        log_debug "Parsing vulnerability scan results"
        grep -i "VULNERABLE" nmap_vuln_scan.txt > vulnerabilities_found.txt || echo "No vulnerabilities found" > vulnerabilities_found.txt
        VULN_COUNT=$(grep -c "VULNERABLE" nmap_vuln_scan.txt 2>/dev/null); VULN_COUNT=${VULN_COUNT:-0}
        echo "$VULN_COUNT" > vulnerability_count.txt
        log_success "Vulnerability scan complete: $VULN_COUNT vulnerabilities found"
        log_debug "Vulnerability details saved to: vulnerabilities_found.txt"
    fi
    
    # Check for specific vulnerabilities
    log_info "Checking for common HTTP vulnerabilities..."
    log_debug "Running HTTP vulnerability scripts on ports 80,443,8080,8443"
    if [ -n "$NETWORK_INTERFACE" ]; then
        nmap --script http-vuln-* -p 80,443,8080,8443 "$IP_ADDRESS" -e "$NETWORK_INTERFACE" -oN http_vulnerabilities.txt > /dev/null 2>&1
    else
        nmap --script http-vuln-* -p 80,443,8080,8443 "$IP_ADDRESS" -oN http_vulnerabilities.txt > /dev/null 2>&1
    fi
    log_debug "HTTP vulnerability scan complete"
}

################################################################################
# hping3 Testing
################################################################################
hping3_testing() {
    if ! command -v hping3 >/dev/null 2>&1; then
        log_warning "hping3 not installed, skipping firewall tests"
        log_debug "hping3 command not found in PATH"
        return
    fi
    
    log_info "Performing hping3 firewall tests..."
    log_debug "Target: $IP_ADDRESS"
    log_debug "Using network interface: $NETWORK_INTERFACE"
    
    # TCP SYN scan
    log_info "Testing TCP SYN response..."
    log_debug "Sending 5 TCP SYN packets to port 80"
    if [ -n "$NETWORK_INTERFACE" ]; then
        timeout 10 hping3 -S -p 80 -c 5 -I "$NETWORK_INTERFACE" "$IP_ADDRESS" > hping3_syn.txt 2>&1 || true
    else
        timeout 10 hping3 -S -p 80 -c 5 "$IP_ADDRESS" > hping3_syn.txt 2>&1 || true
    fi
    log_debug "TCP SYN test complete"
    
    # TCP ACK scan (firewall detection)
    log_info "Testing firewall with TCP ACK..."
    log_debug "Sending 5 TCP ACK packets to port 80 for firewall detection"
    if [ -n "$NETWORK_INTERFACE" ]; then
        timeout 10 hping3 -A -p 80 -c 5 -I "$NETWORK_INTERFACE" "$IP_ADDRESS" > hping3_ack.txt 2>&1 || true
    else
        timeout 10 hping3 -A -p 80 -c 5 "$IP_ADDRESS" > hping3_ack.txt 2>&1 || true
    fi
    log_debug "TCP ACK test complete"
    
    # ICMP test
    log_info "Testing ICMP response..."
    log_debug "Sending 5 ICMP echo requests"
    if [ -n "$NETWORK_INTERFACE" ]; then
        timeout 10 hping3 -1 -c 5 -I "$NETWORK_INTERFACE" "$IP_ADDRESS" > hping3_icmp.txt 2>&1 || true
    else
        timeout 10 hping3 -1 -c 5 "$IP_ADDRESS" > hping3_icmp.txt 2>&1 || true
    fi
    log_debug "ICMP test complete"
    
    log_success "hping3 tests complete"
}

################################################################################
# Asset Discovery
################################################################################
asset_discovery() {
    log_info "Performing detailed asset discovery..."
    log_debug "Creating assets directory"
    
    mkdir -p assets
    
    # Web technology detection
    log_info "Detecting web technologies..."
    log_debug "Fetching HTTP headers from https://${DOMAIN}"
    curl -s -I "https://${DOMAIN}" | grep -i "server\|x-powered-by\|x-aspnet-version" > assets/web_technologies.txt
    log_debug "Web technologies saved to: assets/web_technologies.txt"
    
    # Detect CMS
    log_info "Detecting CMS and frameworks..."
    log_debug "Analyzing page content for CMS signatures"
    curl -s "https://${DOMAIN}" | grep -iE "wordpress|joomla|drupal|magento|shopify" > assets/cms_detection.txt || echo "No CMS detected" > assets/cms_detection.txt
    log_debug "CMS detection complete"
    
    # JavaScript libraries
    log_info "Detecting JavaScript libraries..."
    log_debug "Searching for common JavaScript libraries"
    curl -s "https://${DOMAIN}" | grep -oE "jquery|angular|react|vue|bootstrap" | sort -u > assets/js_libraries.txt || echo "None detected" > assets/js_libraries.txt
    log_debug "JavaScript libraries saved to: assets/js_libraries.txt"
    
    # Check for common files
    log_info "Checking for common files..."
    log_debug "Testing for robots.txt, sitemap.xml, security.txt, etc."
    for file in robots.txt sitemap.xml security.txt .well-known/security.txt humans.txt; do
        log_debug "Checking for: $file"
        if curl -s -o /dev/null -w "%{http_code}" "https://${DOMAIN}/${file}" | grep -q "200"; then
            echo "$file: Found" >> assets/common_files.txt
            log_debug "Found: $file"
            curl -s "https://${DOMAIN}/${file}" > "assets/${file//\//_}"
        fi
    done
    
    # Server fingerprinting
    log_info "Fingerprinting server..."
    log_debug "Running nmap HTTP server header and title scripts"
    if [ -n "$NETWORK_INTERFACE" ]; then
        nmap -sV --script http-server-header,http-title -p 80,443 "$IP_ADDRESS" -e "$NETWORK_INTERFACE" -oN assets/server_fingerprint.txt > /dev/null 2>&1
    else
        nmap -sV --script http-server-header,http-title -p 80,443 "$IP_ADDRESS" -oN assets/server_fingerprint.txt > /dev/null 2>&1
    fi
    
    log_success "Asset discovery complete"
}

################################################################################
# Old Software Detection
################################################################################
detect_old_software() {
    log_info "Detecting outdated software versions..."
    log_debug "Creating old_software directory"
    
    mkdir -p old_software
    
    # Parse service versions from port scan
    if [ -f service_versions.txt ]; then
        log_debug "Analyzing service versions for outdated software"
        
        # Check for old Apache versions
        log_debug "Checking for old Apache versions..."
        grep -i "apache" service_versions.txt | grep -E "1\.|2\.0|2\.2" > old_software/apache_old.txt || true
        
        # Check for old OpenSSH versions
        log_debug "Checking for old OpenSSH versions..."
        grep -i "openssh" service_versions.txt | grep -E "[1-6]\." > old_software/openssh_old.txt || true
        
        # Check for old PHP versions
        log_debug "Checking for old PHP versions..."
        grep -i "php" service_versions.txt | grep -E "[1-5]\." > old_software/php_old.txt || true
        
        # Check for old MySQL versions
        log_debug "Checking for old MySQL versions..."
        grep -i "mysql" service_versions.txt | grep -E "[1-4]\." > old_software/mysql_old.txt || true
        
        # Check for old nginx versions
        log_debug "Checking for old nginx versions..."
        grep -i "nginx" service_versions.txt | grep -E "0\.|1\.0|1\.1[0-5]" > old_software/nginx_old.txt || true
    fi
    
    # Check SSL/TLS for old versions
    if [ "$TLS_10" -gt 0 ] || [ "$TLS_11" -gt 0 ]; then
        log_debug "Outdated TLS protocols detected"
        echo "Outdated TLS protocols detected: TLSv1.0 or TLSv1.1" > old_software/tls_old.txt
    fi
    
    # Count old software findings
    OLD_SOFTWARE_COUNT=$(find old_software -type f ! -empty | wc -l)
    echo "$OLD_SOFTWARE_COUNT" > old_software_count.txt
    
    if [ "$OLD_SOFTWARE_COUNT" -gt 0 ]; then
        log_warning "Found $OLD_SOFTWARE_COUNT outdated software components"
        log_debug "Outdated software details saved in old_software/ directory"
    else
        log_success "No obviously outdated software detected"
    fi
}

################################################################################
# API Discovery
################################################################################
api_discovery() {
    if [ "$API_SCAN" = false ]; then
        log_info "API scanning disabled"
        log_debug "Skipping API discovery"
        return
    fi
    
    log_info "Discovering APIs..."
    log_debug "Creating api_discovery directory"
    
    mkdir -p api_discovery
    
    # Common API paths
    API_PATHS=(
        "/api"
        "/api/v1"
        "/api/v2"
        "/rest"
        "/graphql"
        "/swagger"
        "/swagger.json"
        "/api-docs"
        "/openapi.json"
        "/.well-known/openapi"
    )
    
    log_debug "Testing ${#API_PATHS[@]} common API endpoints"
    for path in "${API_PATHS[@]}"; do
        log_debug "Testing: $path"
        HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" "https://${DOMAIN}${path}")
        if [ "$HTTP_CODE" != "404" ]; then
            echo "$path: HTTP $HTTP_CODE" >> api_discovery/endpoints_found.txt
            log_debug "Found API endpoint: $path (HTTP $HTTP_CODE)"
            curl -s "https://${DOMAIN}${path}" > "api_discovery/${path//\//_}.txt" 2>/dev/null || true
        fi
    done
    
    # Check for API documentation
    log_info "Checking for API documentation..."
    log_debug "Testing for Swagger UI and API docs"
    curl -s "https://${DOMAIN}/swagger-ui" > api_discovery/swagger_ui.txt 2>/dev/null || true
    curl -s "https://${DOMAIN}/api/docs" > api_discovery/api_docs.txt 2>/dev/null || true
    
    log_success "API discovery complete"
}

################################################################################
# API Permission Testing
################################################################################
api_permission_testing() {
    if [ "$API_SCAN" = false ]; then
        log_debug "API scanning disabled, skipping permission testing"
        return
    fi
    
    log_info "Testing API permissions..."
    log_debug "Creating api_permissions directory"
    
    mkdir -p api_permissions
    
    # Test common API endpoints without authentication
    if [ -f api_discovery/endpoints_found.txt ]; then
        log_debug "Testing discovered API endpoints for authentication issues"
        while IFS= read -r endpoint; do
            API_PATH=$(echo "$endpoint" | cut -d: -f1)
            
            # Test GET without auth
            log_info "Testing $API_PATH without authentication..."
            log_debug "Sending unauthenticated GET request to $API_PATH"
            HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" "https://${DOMAIN}${API_PATH}")
            echo "$API_PATH: $HTTP_CODE" >> api_permissions/unauth_access.txt
            log_debug "Response: HTTP $HTTP_CODE"
            
            # Test common HTTP methods
            log_debug "Testing HTTP methods on $API_PATH"
            for method in GET POST PUT DELETE PATCH; do
                HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" -X "$method" "https://${DOMAIN}${API_PATH}")
                if [ "$HTTP_CODE" = "200" ] || [ "$HTTP_CODE" = "201" ]; then
                    log_warning "$API_PATH allows $method without authentication (HTTP $HTTP_CODE)"
                    echo "$API_PATH: $method - HTTP $HTTP_CODE" >> api_permissions/method_issues.txt
                fi
            done
        done < api_discovery/endpoints_found.txt
    fi
    
    # Check for CORS misconfigurations
    log_info "Checking CORS configuration..."
    log_debug "Testing CORS headers with evil.com origin"
    curl -s -H "Origin: https://evil.com" -I "https://${DOMAIN}/api" | grep -i "access-control" > api_permissions/cors_headers.txt || true
    
    log_success "API permission testing complete"
}

################################################################################
# HTTP Security Headers
################################################################################
http_security_headers() {
    log_info "Analyzing HTTP security headers..."
    log_debug "Fetching headers from https://${DOMAIN}"
    
    # Get headers from main domain
    curl -I "https://${DOMAIN}" 2>/dev/null > http_headers.txt
    
    # Check for specific security headers
    declare -A HEADERS=(
        ["x-frame-options"]="X-Frame-Options"
        ["x-content-type-options"]="X-Content-Type-Options"
        ["strict-transport-security"]="Strict-Transport-Security"
        ["content-security-policy"]="Content-Security-Policy"
        ["referrer-policy"]="Referrer-Policy"
        ["permissions-policy"]="Permissions-Policy"
        ["x-xss-protection"]="X-XSS-Protection"
    )
    
    log_debug "Checking for security headers"
    > security_headers_status.txt
    for header in "${!HEADERS[@]}"; do
        if grep -qi "^${header}:" http_headers.txt; then
            echo "${HEADERS[$header]}: Present" >> security_headers_status.txt
        else
            echo "${HEADERS[$header]}: Missing" >> security_headers_status.txt
        fi
    done
    
    log_success "HTTP security headers analysis complete"
}

################################################################################
# Subdomain Scanning
################################################################################
scan_subdomains() {
    if [ "$SCAN_SUBDOMAINS" = false ] || [ ! -f subdomains_discovered.txt ]; then
        log_debug "Subdomain scanning disabled or no subdomains discovered"
        return
    fi
    
    log_info "Scanning discovered subdomains..."
    log_debug "Creating subdomain_scans directory"
    
    mkdir -p subdomain_scans
    
    local count=0
    while IFS= read -r subdomain; do
        count=$((count + 1))
        log_info "Scanning subdomain $count/$SUBDOMAIN_COUNT: $subdomain"
        log_debug "Testing accessibility of $subdomain"
        
        # Quick check if subdomain is accessible
        HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" "https://${subdomain}" --max-time 5)
        
        if echo "$HTTP_CODE" | grep -q "^[2-4]"; then
            log_debug "$subdomain is accessible (HTTP $HTTP_CODE)"
            
            # Get headers
            log_debug "Fetching headers from $subdomain"
            curl -I "https://${subdomain}" 2>/dev/null > "subdomain_scans/${subdomain}_headers.txt"
            
            # Quick port check (top 100 ports)
            log_debug "Scanning top 100 ports on $subdomain"
            if [ -n "$NETWORK_INTERFACE" ]; then
                nmap --top-ports 100 "$subdomain" -e "$NETWORK_INTERFACE" -oN "subdomain_scans/${subdomain}_ports.txt" > /dev/null 2>&1
            else
                nmap --top-ports 100 "$subdomain" -oN "subdomain_scans/${subdomain}_ports.txt" > /dev/null 2>&1
            fi
            
            # Check for old software
            log_debug "Checking service versions on $subdomain"
            if [ -n "$NETWORK_INTERFACE" ]; then
                nmap -sV --top-ports 10 "$subdomain" -e "$NETWORK_INTERFACE" -oN "subdomain_scans/${subdomain}_versions.txt" > /dev/null 2>&1
            else
                nmap -sV --top-ports 10 "$subdomain" -oN "subdomain_scans/${subdomain}_versions.txt" > /dev/null 2>&1
            fi
            
            log_success "Scanned: $subdomain (HTTP $HTTP_CODE)"
        else
            log_warning "Subdomain not accessible: $subdomain (HTTP $HTTP_CODE)"
        fi
    done < subdomains_discovered.txt
    
    log_success "Subdomain scanning complete"
}

################################################################################
# Generate Executive Summary
################################################################################
generate_executive_summary() {
    log_info "Generating executive summary..."
    log_debug "Creating executive summary report"
    
    cat > "$EXEC_REPORT" << EOF
# Executive Summary
## Enhanced Security Assessment Report

**Target Domain:** $DOMAIN  
**Target IP:** $IP_ADDRESS  
**Scan Date:** $SCAN_DATE  
**Scanner Version:** $VERSION  
**Network Interface:** $NETWORK_INTERFACE

---

## Overview

This report summarizes the comprehensive security assessment findings for $DOMAIN. The assessment included passive reconnaissance, vulnerability scanning, asset discovery, and API security testing.

---

## Key Findings

### 1. Domain Information

- **Primary Domain:** $DOMAIN
- **IP Address:** $IP_ADDRESS
- **Subdomains Discovered:** $(cat subdomain_count.txt)

### 2. SSL/TLS Configuration

**Certificate Information:**
\`\`\`
Issuer: $(cat cert_issuer.txt)
Subject: $(cat cert_subject.txt)
$(cat cert_dates.txt)
\`\`\`

**TLS Protocol Support:**
\`\`\`
$(cat tls_versions.txt)
\`\`\`

**Assessment:**
EOF

    # Add TLS assessment
    if [ "$TLS_10" -gt 0 ] || [ "$TLS_11" -gt 0 ]; then
        echo "⚠️ **Warning:** Outdated TLS protocols detected (TLSv1.0/1.1)" >> "$EXEC_REPORT"
    else
        echo "✅ **Good:** Only modern TLS protocols detected (TLSv1.2/1.3)" >> "$EXEC_REPORT"
    fi
    
    cat >> "$EXEC_REPORT" << EOF

### 3. Port Exposure

- **Open Ports (Top 1000):** $(cat open_ports_count.txt)

**Open Ports List:**
\`\`\`
$(cat open_ports_list.txt)
\`\`\`

### 4. Vulnerability Assessment

EOF

    if [ "$VULN_SCAN" = true ] && [ -f vulnerability_count.txt ]; then
        cat >> "$EXEC_REPORT" << EOF
- **Vulnerabilities Found:** $(cat vulnerability_count.txt)

**Critical Vulnerabilities:**
\`\`\`
$(head -20 vulnerabilities_found.txt)
\`\`\`

EOF
    else
        echo "Vulnerability scanning was not performed." >> "$EXEC_REPORT"
    fi
    
    cat >> "$EXEC_REPORT" << EOF

### 5. Outdated Software

- **Outdated Components Found:** $(cat old_software_count.txt)

EOF

    if [ -d old_software ] && [ "$(ls -A old_software)" ]; then
        echo "**Outdated Software Detected:**" >> "$EXEC_REPORT"
        echo "\`\`\`" >> "$EXEC_REPORT"
        find old_software -type f ! -empty -exec basename {} \; >> "$EXEC_REPORT"
        echo "\`\`\`" >> "$EXEC_REPORT"
    fi
    
    cat >> "$EXEC_REPORT" << EOF

### 6. API Security

EOF

    if [ "$API_SCAN" = true ]; then
        if [ -f api_discovery/endpoints_found.txt ]; then
            cat >> "$EXEC_REPORT" << EOF
**API Endpoints Discovered:**
\`\`\`
$(cat api_discovery/endpoints_found.txt)
\`\`\`

EOF
        fi
        
        if [ -f api_permissions/method_issues.txt ]; then
            cat >> "$EXEC_REPORT" << EOF
**API Permission Issues:**
\`\`\`
$(cat api_permissions/method_issues.txt)
\`\`\`

EOF
        fi
    else
        echo "API scanning was not performed." >> "$EXEC_REPORT"
    fi
    
    cat >> "$EXEC_REPORT" << EOF

### 7. HTTP Security Headers

\`\`\`
$(cat security_headers_status.txt)
\`\`\`

---

## Priority Recommendations

### Immediate Actions (Priority 1)

EOF

    # Add specific recommendations
    if [ "$TLS_10" -gt 0 ] || [ "$TLS_11" -gt 0 ]; then
        echo "1. **Disable TLSv1.0/1.1:** Update TLS configuration immediately" >> "$EXEC_REPORT"
    fi
    
    if [ -f vulnerability_count.txt ] && [ "$(cat vulnerability_count.txt)" -gt 0 ]; then
        echo "2. **Patch Vulnerabilities:** Address $(cat vulnerability_count.txt) identified vulnerabilities" >> "$EXEC_REPORT"
    fi
    
    if [ -f old_software_count.txt ] && [ "$(cat old_software_count.txt)" -gt 0 ]; then
        echo "3. **Update Software:** Upgrade $(cat old_software_count.txt) outdated components" >> "$EXEC_REPORT"
    fi
    
    if grep -q "Missing" security_headers_status.txt; then
        echo "4. **Implement Security Headers:** Add missing HTTP security headers" >> "$EXEC_REPORT"
    fi
    
    if [ -f api_permissions/method_issues.txt ]; then
        echo "5. **Fix API Permissions:** Implement proper authentication on exposed APIs" >> "$EXEC_REPORT"
    fi
    
    cat >> "$EXEC_REPORT" << EOF

### Review Actions (Priority 2)

1. Review all open ports and close unnecessary services
2. Audit subdomain inventory and decommission unused subdomains
3. Implement API authentication and authorization
4. Regular vulnerability scanning schedule
5. Software update policy and procedures

---

## Next Steps

1. Review detailed technical and vulnerability reports
2. Prioritize remediation based on risk assessment
3. Implement security improvements
4. Schedule follow-up assessment after remediation

---

**Report Generated:** $(date)  
**Scan Directory:** $SCAN_DIR

**Additional Reports:**
- Technical Report: technical_report.md
- Vulnerability Report: vulnerability_report.md
- API Security Report: api_security_report.md

EOF

    log_success "Executive summary generated: $EXEC_REPORT"
    log_debug "Executive summary saved to: $SCAN_DIR/$EXEC_REPORT"
}

################################################################################
# Generate Technical Report
################################################################################
generate_technical_report() {
    log_info "Generating detailed technical report..."
    log_debug "Creating technical report"
    
    cat > "$TECH_REPORT" << EOF
# Technical Security Assessment Report
## Target: $DOMAIN

**Assessment Date:** $SCAN_DATE  
**Target IP:** $IP_ADDRESS  
**Scanner Version:** $VERSION  
**Network Interface:** $NETWORK_INTERFACE  
**Classification:** CONFIDENTIAL

---

## 1. Scope

**Primary Target:** $DOMAIN  
**IP Address:** $IP_ADDRESS  
**Subdomain Scanning:** $([ "$SCAN_SUBDOMAINS" = true ] && echo "Enabled" || echo "Disabled")  
**Vulnerability Scanning:** $([ "$VULN_SCAN" = true ] && echo "Enabled" || echo "Disabled")  
**API Testing:** $([ "$API_SCAN" = true ] && echo "Enabled" || echo "Disabled")

---

## 2. DNS Configuration

\`\`\`
$(cat dns_records.txt)
\`\`\`

---

## 3. SSL/TLS Configuration

\`\`\`
$(cat certificate_details.txt)
\`\`\`

---

## 4. Port Scan Results

\`\`\`
$(cat port_scan_top1000.txt)
\`\`\`

---

## 5. Vulnerability Assessment

EOF

    if [ "$VULN_SCAN" = true ]; then
        cat >> "$TECH_REPORT" << EOF
### 5.1 NMAP Vulnerability Scan

\`\`\`
$(cat nmap_vuln_scan.txt)
\`\`\`

### 5.2 HTTP Vulnerabilities

\`\`\`
$(cat http_vulnerabilities.txt)
\`\`\`

### 5.3 SSL/TLS Vulnerabilities

\`\`\`
$(cat ssl_vulnerabilities.txt)
\`\`\`

EOF
    fi
    
    cat >> "$TECH_REPORT" << EOF

---

## 6. Asset Discovery

### 6.1 Web Technologies

\`\`\`
$(cat assets/web_technologies.txt)
\`\`\`

### 6.2 CMS Detection

\`\`\`
$(cat assets/cms_detection.txt)
\`\`\`

### 6.3 JavaScript Libraries

\`\`\`
$(cat assets/js_libraries.txt)
\`\`\`

### 6.4 Common Files

\`\`\`
$(cat assets/common_files.txt 2>/dev/null || echo "No common files found")
\`\`\`

---

## 7. Outdated Software

EOF

    if [ -d old_software ] && [ "$(ls -A old_software)" ]; then
        for file in old_software/*.txt; do
            if [ -f "$file" ] && [ -s "$file" ]; then
                echo "### $(basename "$file" .txt)" >> "$TECH_REPORT"
                echo "\`\`\`" >> "$TECH_REPORT"
                cat "$file" >> "$TECH_REPORT"
                echo "\`\`\`" >> "$TECH_REPORT"
                echo >> "$TECH_REPORT"
            fi
        done
    else
        echo "No outdated software detected." >> "$TECH_REPORT"
    fi
    
    cat >> "$TECH_REPORT" << EOF

---

## 8. API Security

EOF

    if [ "$API_SCAN" = true ]; then
        cat >> "$TECH_REPORT" << EOF
### 8.1 API Endpoints

\`\`\`
$(cat api_discovery/endpoints_found.txt 2>/dev/null || echo "No API endpoints found")
\`\`\`

### 8.2 API Permissions

\`\`\`
$(cat api_permissions/unauth_access.txt 2>/dev/null || echo "No permission issues found")
\`\`\`

### 8.3 CORS Configuration

\`\`\`
$(cat api_permissions/cors_headers.txt 2>/dev/null || echo "No CORS headers found")
\`\`\`

EOF
    fi
    
    cat >> "$TECH_REPORT" << EOF

---

## 9. HTTP Security Headers

\`\`\`
$(cat http_headers.txt)
\`\`\`

**Security Headers Status:**
\`\`\`
$(cat security_headers_status.txt)
\`\`\`

---

## 10. Recommendations

### 10.1 Immediate Actions

EOF

    # Add recommendations
    if [ "$TLS_10" -gt 0 ] || [ "$TLS_11" -gt 0 ]; then
        echo "1. Disable TLSv1.0 and TLSv1.1 protocols" >> "$TECH_REPORT"
    fi
    
    if [ -f vulnerability_count.txt ] && [ "$(cat vulnerability_count.txt)" -gt 0 ]; then
        echo "2. Patch identified vulnerabilities" >> "$TECH_REPORT"
    fi
    
    if [ -f old_software_count.txt ] && [ "$(cat old_software_count.txt)" -gt 0 ]; then
        echo "3. Update outdated software components" >> "$TECH_REPORT"
    fi
    
    cat >> "$TECH_REPORT" << EOF

### 10.2 Review Actions

1. Review all open ports and services
2. Audit subdomain inventory
3. Implement missing security headers
4. Review API authentication
5. Regular security assessments

---

## 11. Document Control

**Classification:** CONFIDENTIAL  
**Distribution:** Security Team, Infrastructure Team  
**Prepared By:** Enhanced Security Scanner v$VERSION  
**Date:** $(date)

---

**END OF TECHNICAL REPORT**
EOF

    log_success "Technical report generated: $TECH_REPORT"
    log_debug "Technical report saved to: $SCAN_DIR/$TECH_REPORT"
}

################################################################################
# Generate Vulnerability Report
################################################################################
generate_vulnerability_report() {
    if [ "$VULN_SCAN" = false ]; then
        log_debug "Vulnerability scanning disabled, skipping vulnerability report"
        return
    fi
    
    log_info "Generating vulnerability report..."
    log_debug "Creating vulnerability report"
    
    cat > "$VULN_REPORT" << EOF
# Vulnerability Assessment Report
## Target: $DOMAIN

**Assessment Date:** $SCAN_DATE  
**Target IP:** $IP_ADDRESS  
**Scanner Version:** $VERSION

---

## Executive Summary

**Total Vulnerabilities Found:** $(cat vulnerability_count.txt)

---

## 1. NMAP Vulnerability Scan

\`\`\`
$(cat nmap_vuln_scan.txt)
\`\`\`

---

## 2. HTTP Vulnerabilities

\`\`\`
$(cat http_vulnerabilities.txt)
\`\`\`

---

## 3. SSL/TLS Vulnerabilities

\`\`\`
$(cat ssl_vulnerabilities.txt)
\`\`\`

---

## 4. Detailed Findings

\`\`\`
$(cat vulnerabilities_found.txt)
\`\`\`

---

**END OF VULNERABILITY REPORT**
EOF

    log_success "Vulnerability report generated: $VULN_REPORT"
    log_debug "Vulnerability report saved to: $SCAN_DIR/$VULN_REPORT"
}

################################################################################
# Generate API Security Report
################################################################################
generate_api_report() {
    if [ "$API_SCAN" = false ]; then
        log_debug "API scanning disabled, skipping API report"
        return
    fi
    
    log_info "Generating API security report..."
    log_debug "Creating API security report"
    
    cat > "$API_REPORT" << EOF
# API Security Assessment Report
## Target: $DOMAIN

**Assessment Date:** $SCAN_DATE  
**Scanner Version:** $VERSION

---

## 1. API Discovery

### 1.1 Endpoints Found

\`\`\`
$(cat api_discovery/endpoints_found.txt 2>/dev/null || echo "No API endpoints found")
\`\`\`

---

## 2. Permission Testing

### 2.1 Unauthenticated Access

\`\`\`
$(cat api_permissions/unauth_access.txt 2>/dev/null || echo "No unauthenticated access issues")
\`\`\`

### 2.2 HTTP Method Issues

\`\`\`
$(cat api_permissions/method_issues.txt 2>/dev/null || echo "No method issues found")
\`\`\`

---

## 3. CORS Configuration

\`\`\`
$(cat api_permissions/cors_headers.txt 2>/dev/null || echo "No CORS issues found")
\`\`\`

---

**END OF API SECURITY REPORT**
EOF

    log_success "API security report generated: $API_REPORT"
    log_debug "API security report saved to: $SCAN_DIR/$API_REPORT"
}

################################################################################
# Main Execution
################################################################################
main() {
    echo "========================================"
    echo "Enhanced Security Scanner v${VERSION}"
    echo "========================================"
    echo
    log_debug "Script started at $(date)"
    log_debug "Network interface: $NETWORK_INTERFACE"
    log_debug "Debug mode: $DEBUG"
    echo
    
    # Check dependencies
    check_dependencies
    
    # Initialize scan
    initialize_scan
    
    # Run scans
    log_debug "Starting DNS reconnaissance phase"
    dns_reconnaissance
    
    log_debug "Starting subdomain discovery phase"
    discover_subdomains
    
    log_debug "Starting SSL/TLS analysis phase"
    ssl_tls_analysis
    
    log_debug "Starting port scanning phase"
    port_scanning
    
    if [ "$VULN_SCAN" = true ]; then
        log_debug "Starting vulnerability scanning phase"
        vulnerability_scanning
    fi
    
    log_debug "Starting hping3 testing phase"
    hping3_testing
    
    log_debug "Starting asset discovery phase"
    asset_discovery
    
    log_debug "Starting old software detection phase"
    detect_old_software
    
    if [ "$API_SCAN" = true ]; then
        log_debug "Starting API discovery phase"
        api_discovery
        log_debug "Starting API permission testing phase"
        api_permission_testing
    fi
    
    log_debug "Starting HTTP security headers analysis phase"
    http_security_headers
    
    log_debug "Starting subdomain scanning phase"
    scan_subdomains
    
    # Generate reports
    log_debug "Starting report generation phase"
    generate_executive_summary
    generate_technical_report
    generate_vulnerability_report
    generate_api_report
    
    # Summary
    echo
    echo "========================================"
    log_success "Scan Complete!"
    echo "========================================"
    echo
    log_info "Scan directory: $SCAN_DIR"
    log_info "Executive summary: $SCAN_DIR/$EXEC_REPORT"
    log_info "Technical report: $SCAN_DIR/$TECH_REPORT"
    
    if [ "$VULN_SCAN" = true ]; then
        log_info "Vulnerability report: $SCAN_DIR/$VULN_REPORT"
    fi
    
    if [ "$API_SCAN" = true ]; then
        log_info "API security report: $SCAN_DIR/$API_REPORT"
    fi
    
    echo
    log_info "Review the reports for detailed findings"
    log_debug "Script completed at $(date)"
}

################################################################################
# Parse Command Line Arguments
################################################################################
DOMAIN=""
SCAN_SUBDOMAINS=false
MAX_SUBDOMAINS=10
VULN_SCAN=false
API_SCAN=false

while getopts "d:sm:vah" opt; do
    case $opt in
        d)
            DOMAIN="$OPTARG"
            ;;
        s)
            SCAN_SUBDOMAINS=true
            ;;
        m)
            MAX_SUBDOMAINS="$OPTARG"
            ;;
        v)
            VULN_SCAN=true
            ;;
        a)
            API_SCAN=true
            ;;
        h)
            usage
            ;;
        \?)
            log_error "Invalid option: -$OPTARG"
            usage
            ;;
    esac
done

# Validate required arguments
if [ -z "$DOMAIN" ]; then
    log_error "Domain is required"
    usage
fi

# Run main function
main
            echo "${HEADERS[$header]}: Present" >>

EOF

chmod +x ./security_scanner_enhanced.sh
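
With the script in place, a typical run might look like this (a hypothetical example; the flags map to the getopts block above, and example.com is a placeholder target):

# Full scan: subdomain discovery (-s), vulnerability scanning (-v) and API testing (-a)
./security_scanner_enhanced.sh -d example.com -s -v -a

# Minimal scan of just the primary domain
./security_scanner_enhanced.sh -d example.com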

Macbook: Return a list of processes using a specific remote port number

I find this script useful for debugging which processes are talking to which remote port.
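
Before reaching for the script, a quick one-liner often suffices: lsof can list established TCP connections, and you can grep for the remote port (443 here, as an example):

$ sudo lsof -nP -iTCP -sTCP:ESTABLISHED | grep ':443'

The script below goes further, adding color-coding, UDP connections, filters and a refresh loop.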

cat > ~/netmon.sh << 'EOF'
#!/bin/zsh

# Network Connection Monitor with Color Coding
# Shows TCP/UDP connections with state and process info
# Refreshes every 5 seconds
# Usage: ./netmon.sh [--port PORT] [--ip IP_ADDRESS]

# Parse command line arguments
FILTER_PORT=""
FILTER_IP=""

while [[ $# -gt 0 ]]; do
    case $1 in
        --port|-p)
            FILTER_PORT="$2"
            shift 2
            ;;
        --ip|-i)
            FILTER_IP="$2"
            shift 2
            ;;
        --help|-h)
            echo "Usage: $0 [OPTIONS]"
            echo "Options:"
            echo "  --port, -p PORT    Filter by remote port"
            echo "  --ip, -i IP        Filter by remote IP address"
            echo "  --help, -h         Show this help message"
            echo ""
            echo "Examples:"
            echo "  $0 --port 443      Show only connections to port 443"
            echo "  $0 --ip 1.1.1.1    Show only connections to IP 1.1.1.1"
            echo "  $0 -p 80 -i 192.168.1.1  Show connections to 192.168.1.1:80"
            exit 0
            ;;
        *)
            echo "Unknown option: $1"
            echo "Use --help for usage information"
            exit 1
            ;;
    esac
done

# Color definitions
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
MAGENTA='\033[0;35m'
CYAN='\033[0;36m'
WHITE='\033[1;37m'
GRAY='\033[0;90m'
NC='\033[0m' # No Color
BOLD='\033[1m'

# Function to get process name from PID
get_process_name() {
    local pid=$1
    if [ "$pid" != "-" ] && [ "$pid" != "0" ] && [ -n "$pid" ]; then
        ps -p "$pid" -o comm= 2>/dev/null || echo "unknown"
    else
        echo "-"
    fi
}

# Function to color-code based on state
get_state_color() {
    local state=$1
    case "$state" in
        "ESTABLISHED")
            echo "${GREEN}"
            ;;
        "LISTEN")
            echo "${BLUE}"
            ;;
        "TIME_WAIT")
            echo "${YELLOW}"
            ;;
        "CLOSE_WAIT")
            echo "${MAGENTA}"
            ;;
        "SYN_SENT"|"SYN_RCVD")
            echo "${CYAN}"
            ;;
        "FIN_WAIT"*)
            echo "${GRAY}"
            ;;
        "CLOSING"|"LAST_ACK")
            echo "${RED}"
            ;;
        *)
            echo "${WHITE}"
            ;;
    esac
}

# Function to split address into IP and port
split_address() {
    local addr=$1
    local ip=""
    local port=""
    
    if [[ "$addr" == "*"* ]]; then
        ip="*"
        port="*"
    elif [[ "$addr" =~ ^([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})\.([0-9]+)$ ]]; then
        # IPv4 address with port (format: x.x.x.x.port)
        ip="${match[1]}"
        port="${match[2]}"
    elif [[ "$addr" =~ ^(.*):([0-9]+)$ ]]; then
        # Handle IPv6 format or hostname:port
        ip="${match[1]}"
        port="${match[2]}"
    elif [[ "$addr" =~ ^(.*)\.(well-known|[a-z]+)$ ]]; then
        # Handle named services
        ip="${match[1]}"
        port="${match[2]}"
    else
        ip="$addr"
        port="-"
    fi
    
    echo "$ip|$port"
}

# Function to check if connection matches filters
matches_filter() {
    local remote_ip=$1
    local remote_port=$2
    
    # Check port filter
    if [ -n "$FILTER_PORT" ] && [ "$remote_port" != "$FILTER_PORT" ]; then
        return 1
    fi
    
    # Check IP filter
    if [ -n "$FILTER_IP" ]; then
        # Handle partial IP matching
        if [[ "$remote_ip" != *"$FILTER_IP"* ]]; then
            return 1
        fi
    fi
    
    return 0
}

# Function to display connections
show_connections() {
    clear
    
    # Header
    echo -e "${BOLD}${WHITE}=== Network Connections Monitor ===${NC}"
    echo -e "${BOLD}${WHITE}$(date '+%Y-%m-%d %H:%M:%S')${NC}"
    
    # Show active filters
    if [ -n "$FILTER_PORT" ] || [ -n "$FILTER_IP" ]; then
        echo -e "${YELLOW}Active Filters:${NC}"
        [ -n "$FILTER_PORT" ] && echo -e "  Remote Port: ${BOLD}$FILTER_PORT${NC}"
        [ -n "$FILTER_IP" ] && echo -e "  Remote IP: ${BOLD}$FILTER_IP${NC}"
    fi
    echo ""
    
    # Legend
    echo -e "${BOLD}Color Legend:${NC}"
    echo -e "  ${GREEN}●${NC} ESTABLISHED    ${BLUE}●${NC} LISTEN         ${YELLOW}●${NC} TIME_WAIT"
    echo -e "  ${CYAN}●${NC} SYN_SENT/RCVD  ${MAGENTA}●${NC} CLOSE_WAIT     ${RED}●${NC} CLOSING/LAST_ACK"
    echo -e "  ${GRAY}●${NC} FIN_WAIT       ${WHITE}●${NC} OTHER/UDP"
    echo ""
    
    # Table header
    printf "${BOLD}%-6s %-22s %-22s %-7s %-12s %-8s %-30s${NC}\n" \
        "PROTO" "LOCAL ADDRESS" "REMOTE IP" "R.PORT" "STATE" "PID" "PROCESS"
    echo "$(printf '%.0s-' {1..120})"
    
    # Temporary file for storing connections
    TMPFILE=$(mktemp)
    
    # Get TCP connections with netstat
    # Note: netstat on macOS doesn't include process info; lsof is used below for
    # that, and sudo helps it see connections owned by other users
    if command -v sudo >/dev/null 2>&1; then
        # Try with sudo first (will show all processes)
        sudo netstat -anp tcp 2>/dev/null | grep -E '^tcp' > "$TMPFILE" 2>/dev/null || \
        netstat -an -p tcp 2>/dev/null | grep -E '^tcp' > "$TMPFILE"
    else
        netstat -an -p tcp 2>/dev/null | grep -E '^tcp' > "$TMPFILE"
    fi
    
    # Process TCP connections
    while IFS= read -r line; do
        # Parse netstat output (macOS format)
        proto=$(echo "$line" | awk '{print $1}')
        local_addr=$(echo "$line" | awk '{print $4}')
        remote_addr=$(echo "$line" | awk '{print $5}')
        state=$(echo "$line" | awk '{print $6}')
        
        # Split remote address into IP and port
        IFS='|' read -r remote_ip remote_port <<< "$(split_address "$remote_addr")"
        
        # Apply filters
        if ! matches_filter "$remote_ip" "$remote_port"; then
            continue
        fi
        
        # Extract the local port, then use lsof to map it to an owning PID
        port=""
        if [[ "$local_addr" =~ ^([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})\.([0-9]+)$ ]]; then
            port="${match[2]}"
        elif [[ "$local_addr" =~ '^\*\.([0-9]+)$' ]]; then
            port="${match[1]}"
        elif [[ "$local_addr" =~ ^([0-9a-f:]+)\.([0-9]+)$ ]]; then
            port="${match[2]}"
        fi
        pid=""
        if [ -n "$port" ]; then
            # lsof lookup works for IPv4 and IPv6 local addresses alike
            pid=$(sudo lsof -i TCP:$port -sTCP:$state 2>/dev/null | grep -v PID | head -1 | awk '{print $2}')
        fi
        if [ -z "$pid" ]; then
            pid="-"
            process="-"
        else
            process=$(get_process_name "$pid")
        fi
        
        # Get color based on state
        color=$(get_state_color "$state")
        
        # Format and print
        printf "${color}%-6s %-22s %-22s %-7s %-12s %-8s %-30s${NC}\n" \
            "$proto" \
            "${local_addr:0:22}" \
            "${remote_ip:0:22}" \
            "${remote_port:0:7}" \
            "$state" \
            "$pid" \
            "${process:0:30}"
    done < "$TMPFILE"
    
    # Get UDP connections
    echo ""
    if command -v sudo >/dev/null 2>&1; then
        sudo netstat -anp udp 2>/dev/null | grep -E '^udp' > "$TMPFILE" 2>/dev/null || \
        netstat -an -p udp 2>/dev/null | grep -E '^udp' > "$TMPFILE"
    else
        netstat -an -p udp 2>/dev/null | grep -E '^udp' > "$TMPFILE"
    fi
    
    # Process UDP connections
    while IFS= read -r line; do
        # Parse netstat output for UDP
        proto=$(echo "$line" | awk '{print $1}')
        local_addr=$(echo "$line" | awk '{print $4}')
        remote_addr=$(echo "$line" | awk '{print $5}')
        
        # Split remote address into IP and port
        IFS='|' read -r remote_ip remote_port <<< "$(split_address "$remote_addr")"
        
        # Apply filters
        if ! matches_filter "$remote_ip" "$remote_port"; then
            continue
        fi
        
        # UDP doesn't have state
        state="*"
        
        # Extract the local port, then use lsof to map it to an owning PID
        port=""
        if [[ "$local_addr" =~ ^([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})\.([0-9]+)$ ]]; then
            port="${match[2]}"
        elif [[ "$local_addr" =~ '^\*\.([0-9]+)$' ]]; then
            port="${match[1]}"
        elif [[ "$local_addr" =~ ^([0-9a-f:]+)\.([0-9]+)$ ]]; then
            port="${match[2]}"
        fi
        pid=""
        if [ -n "$port" ]; then
            # lsof lookup works for IPv4 and IPv6 local addresses alike
            pid=$(sudo lsof -i UDP:$port 2>/dev/null | grep -v PID | head -1 | awk '{print $2}')
        fi
        if [ -z "$pid" ]; then
            pid="-"
            process="-"
        else
            process=$(get_process_name "$pid")
        fi
        
        # White color for UDP
        printf "${WHITE}%-6s %-22s %-22s %-7s %-12s %-8s %-30s${NC}\n" \
            "$proto" \
            "${local_addr:0:22}" \
            "${remote_ip:0:22}" \
            "${remote_port:0:7}" \
            "$state" \
            "$pid" \
            "${process:0:30}"
    done < "$TMPFILE"
    
    # Clean up
    rm -f "$TMPFILE"
    
    # Footer
    echo ""
    echo "$(printf '%.0s-' {1..120})"
    echo -e "${BOLD}Press Ctrl+C to exit${NC} | Refreshing every 5 seconds..."
    
    # Show filter hint if no filters active
    if [ -z "$FILTER_PORT" ] && [ -z "$FILTER_IP" ]; then
        echo -e "${GRAY}Tip: Use --port PORT or --ip IP to filter connections${NC}"
    fi
}

# Trap Ctrl+C to exit cleanly
trap 'echo -e "\n${BOLD}Exiting...${NC}"; exit 0' INT

# Main loop
echo -e "${BOLD}${CYAN}Starting Network Connection Monitor...${NC}"
echo -e "${YELLOW}Note: Run with sudo for complete process information${NC}"

# Show active filters on startup
if [ -n "$FILTER_PORT" ] || [ -n "$FILTER_IP" ]; then
    echo -e "${GREEN}Filtering enabled:${NC}"
    [ -n "$FILTER_PORT" ] && echo -e "  Remote Port: ${BOLD}$FILTER_PORT${NC}"
    [ -n "$FILTER_IP" ] && echo -e "  Remote IP: ${BOLD}$FILTER_IP${NC}"
fi

sleep 2

while true; do
    show_connections
    sleep 5
done
EOF

chmod +x ~/netmon.sh

Example Usage:

# Show all connections
./netmon.sh

# Filter by port
./netmon.sh --port 443

# Filter by IP
./netmon.sh --ip 142.251

# Run with sudo for full process information
sudo ./netmon.sh --port 443

Mac OSX: Tracing which network interface will be used to route traffic to an IP/DNS address

If you have multiple connections on your device (and maybe a zero trust client installed), how do you find out which network interface will be used to route the traffic?

Below is a route get request for Google's DNS service:

$ route get 8.8.8.8

   route to: dns.google
destination: dns.google
    gateway: 100.64.0.1
  interface: utun3
      flags: <UP,GATEWAY,HOST,DONE,WASCLONED,IFSCOPE,IFREF>
 recvpipe  sendpipe  ssthresh  rtt,msec    rttvar  hopcount      mtu     expire
       0         0         0         0         0         0      1400         0

If you have multiple interfaces enabled, then the first item in the Service Order will be used. If you want to see the default interface for your device:

$ route -n get 0.0.0.0 | grep interface
  interface: en0
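
If you want to inspect that Service Order (and the interfaces behind it) yourself, macOS ships two helpers; a quick sketch:

# List network services in priority order
$ networksetup -listnetworkserviceorder

# Lower-level view of the current network state and primary interfaces
$ scutil --nwi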

Let's go and see what's going on over that utun3 interface by listing established TCP connections (note the local 100.64.0.1 tunnel address in the output):

$ netstat -an -p tcp | grep ESTABLISHED
tcp4       0      0  100.64.0.1.65271       jnb02s11-in-f4.1.https ESTABLISHED
tcp4       0      0  100.64.0.1.65269       jnb02s02-in-f14..https ESTABLISHED
tcp4       0      0  100.64.0.1.65262       192.0.73.2.https       ESTABLISHED
tcp4       0      0  100.64.0.1.65261       192.0.73.2.https       ESTABLISHED
tcp4       0      0  100.64.0.1.65260       192.0.73.2.https       ESTABLISHED
tcp4       0      0  100.64.0.1.65259       192.0.73.2.https       ESTABLISHED
tcp4       0      0  100.64.0.1.65258       192.0.73.2.https       ESTABLISHED
tcp4       0      0  100.64.0.1.65257       192.0.73.2.https       ESTABLISHED
tcp4       0      0  100.64.0.1.65256       192.0.73.2.https       ESTABLISHED
tcp4       0      0  100.64.0.1.65255       192.0.73.2.https       ESTABLISHED
tcp4       0      0  100.64.0.1.65254       192.0.78.23.https      ESTABLISHED
tcp4       0      0  100.64.0.1.65253       192.0.76.3.https       ESTABLISHED
tcp4       0      0  100.64.0.1.65252       192.0.78.23.https      ESTABLISHED
tcp4       0      0  100.64.0.1.65251       192.0.76.3.https       ESTABLISHED
tcp4       0      0  100.64.0.1.65250       192.0.78.23.https      ESTABLISHED
tcp4       0      0  100.64.0.1.65249       192.0.76.3.https       ESTABLISHED
tcp4       0      0  100.64.0.1.65248       ec2-13-244-140-3.https ESTABLISHED
tcp4       0      0  100.64.0.1.65247       192.0.73.2.https       ESTABLISHED

Macbook Tip: iTerm2 clearing your scrollback history

I frequently forget this command shortcut, so this post is simply because I am lazy. To clear your scrollback history in iTerm, press Command + K. Control + L only clears the screen, so as soon as you run the next command you will see the scrollback again.

If you want to view your command history (for the terminal), type:

$ ls -a ~ | grep hist
.zsh_history
$ cat .zsh_history

Macbook: Check a DNS (web site) to see if basic email security has been set up (SPF, DKIM and DMARC)

There are three basic mechanisms for securing email: Sender Policy Framework (SPF), DomainKeys Identified Mail (DKIM), and Domain-based Message Authentication, Reporting & Conformance (DMARC). Let’s quickly define these before we talk about how to check if they have been set up:

SPF helps prevent spoofing by verifying the sender’s IP address

SPF (Sender Policy Framework) is a DNS record containing information about the servers allowed to send emails from a specific domain (e.g. which servers can send emails from andrewbaker.ninja). 

With it, you can verify that messages coming from your domain are sent by mail servers and IP addresses authorized by you. This might be your email servers or servers of another company you use for your email sending. If SPF isn’t set, scammers can take advantage of it and send fake messages that look like they come from you. 

It’s important to remember that there can be only one SPF record for one domain. Within one SPF record, however, there can be several servers and IP addresses mentioned (for instance, if emails are sent from several mailing platforms).

DKIM shows that the email hasn’t been tampered with

DKIM (DomainKeys Identified Mail) adds a digital signature to the header of your email message, which the receiving email servers then check to ensure that the email content hasn’t changed. Like SPF, a DKIM record exists in the DNS.

DMARC provides reporting visibility on the prior controls

DMARC (Domain-based Message Authentication, Reporting & Conformance) defines how the recipient’s mail server should process incoming emails if they don’t pass the authentication check (either SPF, DKIM, or both).

Basically, if there’s a DKIM signature, and the sending server is found in the SPF records, the email is sent to the recipient’s inbox. 

If the message fails authentication, it’s processed according to the selected DMARC policy: none, reject, or quarantine.

  • Under the “none” policy, the receiving server doesn’t take any action if your emails fail authentication. It doesn’t impact your deliverability, but it also doesn’t protect you from scammers, so we don’t recommend it. Only by introducing stricter policies can you block them from the start and let the world know you care about your customers and brand. 
  • Under the “quarantine” policy, messages that come from your domain but don’t pass the DMARC check are quarantined. In such a case, the provider is advised to send your email to the spam folder. 
  • Under the “reject” policy, the receiving server rejects all messages that don’t pass email authentication. This means such emails won’t reach the addressee and will result in a bounce.

The “reject” option is the most effective, but it’s better to choose it only if you are sure that everything is configured correctly.

Now that we’ve clarified all the terms, let’s see how you can check if you have an existing SPF record, DKIM record, and DMARC policy in place.

1. First Let’s Check if SPF is Set Up

$ dig txt google.com | grep "v=spf"
google.com.		3600	IN	TXT	"v=spf1 include:_spf.google.com ~all"

How to read SPF correctly

  • The “v=spf1” part shows that the record is of SPF type (version 1). 
  • The “include” part lists servers allowed to send emails for the domain. 
  • The “~all” part is a “softfail”: mail from servers not listed in the record should still be accepted but treated with suspicion. A stricter “-all” tells recipient servers to reject such mail outright (see the example below).
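
For comparison, a domain that wants receivers to reject unauthorized mail outright would publish a hard fail. A hypothetical record (not Google's real one) might look like:

v=spf1 include:_spf.example-mailer.com -all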

2. Next Let’s Check if DKIM is Set Up

What is a DKIM record?

A DKIM record stores the DKIM public key — a randomized string of characters that is used to verify anything signed with the private key. Email servers query the domain’s DNS records to see the DKIM record and view the public key.

A DKIM record is really a DNS TXT (“text”) record. TXT records can be used to store any text that a domain administrator wants to associate with their domain. DKIM is one of many uses for this type of DNS record. (In some cases, domains have stored their DKIM records as CNAME records that point to the key instead; however, the official RFC requires these records to be TXT.)

Here is an example of a DKIM DNS TXT record:

Name:    big-email._domainkey.example.com
Type:    TXT
Content: v=DKIM1; p=76E629F05F709EF665853333EEC3F5ADE69A2362BECE40658267AB2FC3CB6CBE
TTL:     6000

Name

Unlike most DNS TXT records, DKIM records are stored under a specialized name, not just the name of the domain. DKIM record names follow this format:

[selector]._domainkey.[domain]

The selector is a specialized value issued by the email service provider used by the domain. It is included in the DKIM header to enable an email server to perform the required DKIM lookup in the DNS. The domain is the email domain name. ._domainkey. is included in all DKIM record names.

If you want to find the value of the selector, you can view it by selecting “Show Original” when you have the email open in Gmail.

Once you are able to view the original email, perform a text search for “DKIM-Signature”. This DKIM-Signature contains an ‘s=’ attribute, which is the DKIM selector being used for this domain. In the example below (an Amazon email), we can see the DKIM selector is “jvxsykglqiaiibkijmhy37vqxh4mzqr6”. 

DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/simple; s=jvxsykglqiaiibkijmhy37vqxh4mzqr6; d=amazon.com; t=1675842267; h=Date:From:Reply-To:To:Message-ID:Subject:MIME-Version:Content-Type; bh=BJxF0PCdQ4TBdiPcAK83Ah0Z65hMjsvFIWVgzM0O8b0=; b=NUSl8nwZ2aF6ULhIFOJPCANWEeuQNUrnym4hobbeNsB6PPTs2/9jJPFCEEjAh8/q s1l53Vv5qAGx0zO4PTjASyB/UVOZj5FF+LEgDJtUclQcnlNVegRSodaJUHRL3W2xNxa ckDYAnSPr8fTNLG287LPrtxvIL2n8LPOTZWclaGg=
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/simple; s=6gbrjpgwjskckoa6a5zn6fwqkn67xbtw; d=amazonses.com; t=1675842267; h=Date:From:Reply-To:To:Message-ID:Subject:MIME-Version:Content-Type:Feedback-ID; bh=BJxF0PCdQ4TBdiPcAK83Ah0Z65hMjsvFIWVgzM0O8b0=; b=ivBW6HbegrrlOj7BIB293ZNNy6K8D008I3+wwXoNvZdrBI6SBhL+QmCvCE3Sx0Av qh2hWMJyJBkVVcVwJns8cq8sn6l3NTY7nfN0H5RmuFn/MK4UHJw1vkkzEKKWSDncgf9 6K3DyNhKooBGopkxDOhg/nU8ZX8paHKlD67q7klc=
Date: Wed, 8 Feb 2023 07:44:27 +0000

To look up the DKIM record, email servers use the DKIM selector provided by the email service provider, not just the domain name. Suppose example.com uses Big Email as their email service provider, and suppose Big Email uses the DKIM selector big-email. Most of example.com’s DNS records would be named example.com, but their DKIM DNS record would be under the name big-email._domainkey.example.com, which is listed in the example above.

Content

This is the part of the DKIM DNS record that lists the public key. In the example above, v=DKIM1 indicates that this TXT record should be interpreted as DKIM, and the public key is everything after p=.

Below we query the linuxincluded.com domain using the “dkim” selector.

$ dig TXT dkim._domainkey.linuxincluded.com

; <<>> DiG 9.10.6 <<>> TXT dkim._domainkey.linuxincluded.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 45496
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;dkim._domainkey.linuxincluded.com. IN	TXT

;; ANSWER SECTION:
dkim._domainkey.linuxincluded.com. 3600	IN TXT	"v=DKIM1; k=rsa; p=MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQDdLyUk58Chz538ZQE4PnZ1JqBiYkSVWp8F77QpVF2onPCM4W4BnVJWXDSCC+yn747XFKv+XkVwayLexUkiAga7hIw6GwOj0gplVjv2dirFCoKecS2jvvqXc6/O0hjVqYlTYXwiYFJMSptaBWoHEEOvpS7VWelnQB+1m3UHHPJRiQIDAQAB; s=email"

;; Query time: 453 msec
;; SERVER: 100.64.0.1#53(100.64.0.1)
;; WHEN: Thu Feb 02 13:39:40 SAST 2023
;; MSG SIZE  rcvd: 318

3. Finally Let’s Check if DMARC is Set Up

What is a DMARC record?

A DMARC record stores a domain’s DMARC policy. DMARC records are stored in the Domain Name System (DNS) as DNS TXT records. A DNS TXT record can contain almost any text a domain administrator wants to associate with their domain. One of the ways DNS TXT records are used is to store DMARC policies.

(Note that a DMARC record is a DNS TXT record that contains a DMARC policy, not a specialized type of DNS record.)

Example.com’s DMARC policy might look like this:

Name:    example.com
Type:    TXT
Content: v=DMARC1; p=quarantine; adkim=r; aspf=r; rua=mailto:example@third-party-example.com;
TTL:     3260

To check a live domain, query its _dmarc subdomain:
$ dig txt _dmarc.google.com

; <<>> DiG 9.10.6 <<>> txt _dmarc.google.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 16231
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;_dmarc.google.com.		IN	TXT

;; ANSWER SECTION:
_dmarc.google.com.	300	IN	TXT	"v=DMARC1; p=reject; rua=mailto:mailauth-reports@google.com"

;; Query time: 209 msec
;; SERVER: 100.64.0.1#53(100.64.0.1)
;; WHEN: Thu Feb 02 13:42:03 SAST 2023
;; MSG SIZE  rcvd: 117
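
Putting the three checks together, here is a minimal sketch. The domain and DKIM selector are placeholders; substitute your own (remember the selector varies per email provider, as described above):

DOMAIN="example.com"   # placeholder domain
SELECTOR="dkim"        # placeholder DKIM selector; check a real email's DKIM-Signature header

echo "--- SPF ---";   dig +short txt "$DOMAIN" | grep "v=spf"                           || echo "No SPF record found"
echo "--- DKIM ---";  dig +short txt "${SELECTOR}._domainkey.${DOMAIN}" | grep "v=DKIM" || echo "No DKIM record found"
echo "--- DMARC ---"; dig +short txt "_dmarc.${DOMAIN}" | grep "v=DMARC"                || echo "No DMARC record found"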

Macbook: Querying DNS using the Host Command

1. Find a list of IP addresses linked to a domain

To find the IP address for a particular domain, simply pass the target domain name as an argument after the host command.

$ host andrewbaker.ninja
andrewbaker.ninja has address 13.244.140.33

For a comprehensive lookup using verbose mode, use the -a or -v flag.

$ host -a andrewbaker.ninja
Trying "andrewbaker.ninja"
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 45489
;; flags: qr rd ra; QUERY: 1, ANSWER: 10, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;andrewbaker.ninja.		IN	ANY

;; ANSWER SECTION:
andrewbaker.ninja.	300	IN	A	13.244.140.33
andrewbaker.ninja.	21600	IN	NS	ns-1254.awsdns-28.org.
andrewbaker.ninja.	21600	IN	NS	ns-1514.awsdns-61.org.
andrewbaker.ninja.	21600	IN	NS	ns-1728.awsdns-24.co.uk.
andrewbaker.ninja.	21600	IN	NS	ns-1875.awsdns-42.co.uk.
andrewbaker.ninja.	21600	IN	NS	ns-491.awsdns-61.com.
andrewbaker.ninja.	21600	IN	NS	ns-496.awsdns-62.com.
andrewbaker.ninja.	21600	IN	NS	ns-533.awsdns-02.net.
andrewbaker.ninja.	21600	IN	NS	ns-931.awsdns-52.net.
andrewbaker.ninja.	900	IN	SOA	ns-1363.awsdns-42.org. awsdns-hostmaster.amazon.com. 1 7200 900 1209600 86400

Received 396 bytes from 100.64.0.1#53 in 262 ms

The -a option is used to find all domain records and zone information. You can also see the local DNS server address used for the lookup.

2. Reverse Lookup

The command below performs a reverse lookup on the IP address and displays the hostname or domain name.

$ host 13.244.140.33
33.140.244.13.in-addr.arpa domain name pointer ec2-13-244-140-33.af-south-1.compute.amazonaws.com.

3. To find Domain Name servers

Use the -t option to specify the query type. Below we use it to find the nameservers of a specific domain; the NS records list the authoritative nameservers.

$ host -t ns andrewbaker.ninja
andrewbaker.ninja name server ns-1254.awsdns-28.org.
andrewbaker.ninja name server ns-1514.awsdns-61.org.
andrewbaker.ninja name server ns-1728.awsdns-24.co.uk.
andrewbaker.ninja name server ns-1875.awsdns-42.co.uk.
andrewbaker.ninja name server ns-491.awsdns-61.com.
andrewbaker.ninja name server ns-496.awsdns-62.com.
andrewbaker.ninja name server ns-533.awsdns-02.net.
andrewbaker.ninja name server ns-931.awsdns-52.net.

4. To query certain nameserver for a specific domain

To query details about a specific authoritative domain name server, use the below command.

$ host google.com olga.ns.cloudflare.com
Using domain server:
Name: olga.ns.cloudflare.com
Address: 173.245.58.137#53
Aliases:

google.com has address 172.217.170.14
google.com has IPv6 address 2c0f:fb50:4002:804::200e
google.com mail is handled by 10 smtp.google.com.

5. To find domain MX records

To get a list of a domain’s MX (Mail Exchanger) records:

$ host -t MX google.com
google.com mail is handled by 10 smtp.google.com.

6. To find domain TXT records

To get a list of a domain’s TXT (human-readable information about a domain server) records:

$ host -t txt google.com
google.com descriptive text "docusign=1b0a6754-49b1-4db5-8540-d2c12664b289"
google.com descriptive text "v=spf1 include:_spf.google.com ~all"
google.com descriptive text "google-site-verification=TV9-DBe4R80X4v0M4U_bd_J9cpOJM0nikft0jAgjmsQ"
google.com descriptive text "facebook-domain-verification=22rm551cu4k0ab0bxsw536tlds4h95"
google.com descriptive text "atlassian-domain-verification=5YjTmWmjI92ewqkx2oXmBaD60Td9zWon9r6eakvHX6B77zzkFQto8PQ9QsKnbf4I"
google.com descriptive text "onetrust-domain-verification=de01ed21f2fa4d8781cbc3ffb89cf4ef"
google.com descriptive text "globalsign-smime-dv=CDYX+XFHUw2wml6/Gb8+59BsH31KzUr6c1l2BPvqKX8="
google.com descriptive text "docusign=05958488-4752-4ef2-95eb-aa7ba8a3bd0e"
google.com descriptive text "apple-domain-verification=30afIBcvSuDV2PLX"
google.com descriptive text "google-site-verification=wD8N7i1JTNTkezJ49swvWW48f8_9xveREV4oB-0Hf5o"
google.com descriptive text "webexdomainverification.8YX6G=6e6922db-e3e6-4a36-904e-a805c28087fa"
google.com descriptive text "MS=E4A68B9AB2BB9670BCE15412F62916164C0B20BB"

7. To find domain SOA record

To get a list of a domain’s Start of Authority records:

$ host -t soa google.com
google.com has SOA record ns1.google.com. dns-admin.google.com. 505465897 900 900 1800 60

Use the command below to compare the SOA records from all authoritative nameservers for a particular zone (the specific portion of the DNS namespace).

$ host -C google.com
Nameserver 216.239.36.10:
	google.com has SOA record ns1.google.com. dns-admin.google.com. 505465897 900 900 1800 60
Nameserver 216.239.38.10:
	google.com has SOA record ns1.google.com. dns-admin.google.com. 505465897 900 900 1800 60
Nameserver 216.239.32.10:
	google.com has SOA record ns1.google.com. dns-admin.google.com. 505465897 900 900 1800 60
Nameserver 216.239.34.10:
	google.com has SOA record ns1.google.com. dns-admin.google.com. 505465897 900 900 1800 60

8. To find domain CNAME records

CNAME stands for canonical name record. This DNS record is responsible for redirecting one domain to another, which means it maps the original domain name to an alias.

To find out the domain CNAME DNS records, use the below command.

$ host -t cname www.yahoo.com
www.yahoo.com is an alias for new-fp-shed.wg1.b.yahoo.com.
$ dig www.yahoo.com
; <<>> DiG 9.10.6 <<>> www.yahoo.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 45503
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;www.yahoo.com.			IN	A

;; ANSWER SECTION:
www.yahoo.com.		12	IN	CNAME	new-fp-shed.wg1.b.yahoo.com.
new-fp-shed.wg1.b.yahoo.com. 38	IN	A	87.248.100.215
new-fp-shed.wg1.b.yahoo.com. 38	IN	A	87.248.100.216

;; Query time: 128 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Mon Jan 30 17:07:55 SAST 2023
;; MSG SIZE  rcvd: 106

In the CNAME entry shown above, if you want to reach “www.yahoo.com”, your computer’s DNS resolver first fires an address lookup for “www.yahoo.com”. The resolver sees that it was returned a CNAME record of “new-fp-shed.wg1.b.yahoo.com”, and in response it fires another lookup for “new-fp-shed.wg1.b.yahoo.com”, which returns the A record. The important point is that the resolver performs two separate and independent DNS lookups to convert a CNAME into a usable A record.

9. To find domain TTL information

TTL stands for Time To Live. It is set automatically by the authoritative nameserver for each DNS record.

In simple words, TTL refers to how long a DNS resolver caches a record before refreshing the data. Use the command below to see the TTL information of a domain name (in the example below it’s 300 seconds / 5 minutes).

$ host -v -t a andrewbaker.ninja
Trying "andrewbaker.ninja"
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 27738
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;andrewbaker.ninja.		IN	A

;; ANSWER SECTION:
andrewbaker.ninja.	300	IN	A	13.244.140.33

Received 51 bytes from 8.8.8.8#53 in 253 ms
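
If you find yourself running several of these lookups in a row, a small loop keeps things tidy (a sketch using the same domain as above; swap in whichever record types you care about):

for type in a ns mx txt soa; do
    echo "--- $type ---"
    host -t "$type" andrewbaker.ninja
done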

Hacking: Using a Macbook and Nikto to Scan your Local Network

Nikto is becoming one of my favourite tools. I like it because of its wide-ranging use cases and its simplicity. So what’s an example use case for Nikto? I’m bored right now, so I am going to hunt around my local network and see what I can find…

# First install Nikto
brew install nikto
# Now get my IP address
ifconfig
# Copy my IP address into ipcalc to get my CIDR block
eth0      Link encap:Ethernet  HWaddr 00:0B:CD:1C:18:5A
          inet addr:172.16.25.126  Bcast:172.16.25.63  Mask:255.255.255.224
          inet6 addr: fe80::20b:cdff:fe1c:185a/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2341604 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2217673 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:293460932 (279.8 MiB)  TX bytes:1042006549 (993.7 MiB)
          Interrupt:185 Memory:f7fe0000-f7ff0000
# Get my CIDR range (brew install ipcalc)
$ ipcalc 172.16.25.126
Address:   172.16.25.126        10101100.00010000.00011001. 01111110
Netmask:   255.255.255.0 = 24   11111111.11111111.11111111. 00000000
Wildcard:  0.0.0.255            00000000.00000000.00000000. 11111111
=>
Network:   172.16.25.0/24       10101100.00010000.00011001. 00000000
HostMin:   172.16.25.1          10101100.00010000.00011001. 00000001
HostMax:   172.16.25.254        10101100.00010000.00011001. 11111110
Broadcast: 172.16.25.255        10101100.00010000.00011001. 11111111
Hosts/Net: 254                   Class B, Private Internet
# Our NW range is "Network:   172.16.25.0/24"

Now lets pop across to nmap to get a list of active hosts in my network

# Now we run a quick nmap scan for ports 80 and 443 across the entire range looking for any hosts that respond and dump the results into a grepable file
nmap -p 80,443 172.16.25.0/24 -oG webhosts.txt
# View the list of hosts
cat webhosts.txt
$ cat webhosts.txt
# Nmap 7.93 scan initiated Wed Jan 25 20:17:42 2023 as: nmap -p 80,433 -oG webhosts.txt 172.16.25.0/26
Host: 172.16.25.0 ()	Status: Up
Host: 172.16.25.0 ()	Ports: 80/open/tcp//http///, 433/open/tcp//nnsp///
Host: 172.16.25.1 ()	Status: Up
Host: 172.16.25.1 ()	Ports: 80/open/tcp//http///, 433/open/tcp//nnsp///
Host: 172.16.25.2 ()	Status: Up
Host: 172.16.25.2 ()	Ports: 80/open/tcp//http///, 433/open/tcp//nnsp///
Host: 172.16.25.3 ()	Status: Up
Host: 172.16.25.3 ()	Ports: 80/open/tcp//http///, 433/open/tcp//nnsp///
Host: 172.16.25.4 ()	Status: Up
Host: 172.16.25.4 ()	Ports: 80/open/tcp//http///, 433/open/tcp//nnsp///
Host: 172.16.25.5 ()	Status: Up

Next we want to grep this webhosts file and send all the hosts that responded to the port probe off to Nikto for scanning. To do this we can use some Linux magic. First we read the output stored in our webhosts.txt document. Then we use awk, a Linux tool that helps search for patterns. In the command below we ask it to look for “Up” (meaning the host is up), and tell it to print $2, the second word on each matching line, i.e. the IP address. Finally, we send that data to a new file called niktoscan.txt.

awk '/Up$/{print $2}' webhosts.txt >> niktoscan.txt
cat niktoscan.txt
$ cat niktoscan.txt
172.16.25.0
172.16.25.1
172.16.25.2
172.16.25.3
172.16.25.4
172.16.25.5
172.16.25.6
172.16.25.7
172.16.25.8
172.16.25.9
172.16.25.10
...

Now let nikto do its stuff:

nikto -h niktoscan.txt -ssl >> niktoresults.txt
# Lets check what came back
cat niktoresults.txt
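
Nikto prefixes each finding with a “+” in its standard output, so a quick grep pulls just the findings out of the results file (assuming the default output format):

grep '^+' niktoresults.txt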