Most DDoS guidance focuses on IP blocking, firewalls, or bandwidth graphs, while real WordPress sites go down for a much simpler reason: expensive application paths get hit concurrently.
You do not need botnets, spoofed IPs, or illegal tools to test this.
You need to understand what to protect, why it matters, and how to prove your assumptions are actually true.
This article shows you how to test a WordPress site safely from a single MacBook and, critically, how to confirm that your CDN is actually protecting you rather than just sitting in front of an exposed origin.
The First Principle: What Actually Dies First in WordPress
WordPress does not fail because your network link is saturated.
It fails when one of these resources is exhausted:
PHP-FPM workers
Database connections
CPU consumed by uncached PHP execution
Memory consumed by concurrent requests
Cache bypass paths being abused
Your goal is not to block IP addresses.
Your goal is to ensure these resources are never reachable by anonymous traffic at scale.
The Assets You Must Protect (In Order)
PHP Execution
Every request that reaches PHP consumes CPU, memory, and a worker slot.
If anonymous traffic can repeatedly execute PHP, denial is inevitable.
Database Connections
Most WordPress attacks are asymmetric.
A cheap HTTP request for the attacker triggers an expensive database query.
A few hundred concurrent database connections are enough to stall the site.
Cache Bypass Paths
Attackers deliberately choose paths that skip cache:
Search
Login
Admin AJAX
Query parameters
RSS feeds
Origin Exposure
If attackers can reach your origin directly, your CDN and WAF are ornamental.
High Risk WordPress Endpoints
/wp-login.php
/wp-admin/admin-ajax.php
/xmlrpc.php
/?s=anything
/feed/
Any uncached page with query parameters
Preconditions Before You Test
You should already have:
A CDN in front of WordPress
Full page caching enabled
Origin firewalled to CDN IP ranges only
XML-RPC disabled or restricted
Low PHP worker limits
Philosophy of Safe Testing
All testing:
Runs from one MacBook
Uses valid HTTP
Does not spoof IPs
Does not amplify traffic
Tooling Setup on macOS
Install Homebrew if required.
Install curl, siege, hey, and optionally k6.
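A minimal sketch of that setup, assuming Homebrew's default package names:
# Install Homebrew if it is not already present
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
# Install the testing tools (k6 is optional)
brew install curl siege hey
brew install k6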
How to Test if Your CDN Is Bypassable
Identify the origin IP from hosting dashboards or historical records.
Test direct origin access by forcing the Host header.
If WordPress content loads, your CDN is bypassable.
Test high risk endpoints directly against the origin.
Compare headers from CDN versus origin.
If origin responds normally, isolation is broken.
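As an illustration, those checks can be run with curl; the origin IP below is a documentation placeholder:
# Request the site through the CDN and note the caching headers
curl -sI https://www.example.com/ | grep -iE 'server|cache|x-cache|cf-ray'
# Send the same request straight to the suspected origin IP, keeping Host and SNI intact
curl -sI https://www.example.com/ --resolve www.example.com:443:203.0.113.10
# Probe a high risk endpoint directly against the origin
curl -s -o /dev/null -w "%{http_code}\n" https://www.example.com/wp-login.php --resolve www.example.com:443:203.0.113.10
# If the origin answers these with normal WordPress responses, isolation is broken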
Safe Application Level Tests
Test anonymous page access for cache behaviour.
Test admin-ajax abuse.
Test login rate limiting.
Test search cache bypass.
Test concurrency with hey.
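Hedged examples of these probes, using a placeholder hostname and deliberately small volumes:
# Anonymous page: repeated requests should show cache HITs
curl -sI https://www.example.com/ | grep -i cache
# admin-ajax: even an invalid action executes PHP, so the response time is telling
curl -s -o /dev/null -w "%{time_total}s\n" -X POST -d "action=test" https://www.example.com/wp-admin/admin-ajax.php
# Login: a handful of requests is enough to see whether rate limiting kicks in
for i in $(seq 1 5); do curl -s -o /dev/null -w "%{http_code}\n" https://www.example.com/wp-login.php; done
# Search: query strings usually bypass full page caching
curl -sI "https://www.example.com/?s=test" | grep -i cache
# Concurrency: 200 requests, 20 at a time, against the front page
hey -n 200 -c 20 https://www.example.com/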
Metrics That Matter
PHP worker queues
Database connections
CPU and memory
Cache hit ratios
Time to first byte
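A few hedged ways to watch these numbers during a test (status endpoints, credentials, and socket paths vary by host):
# PHP-FPM status page, if enabled in the pool configuration
watch -n 2 'curl -s http://localhost/fpm-status'
# Database connection count
mysql -e "SHOW STATUS LIKE 'Threads_connected';"
# CPU and memory on the origin
top
# Time to first byte as seen from outside
curl -s -o /dev/null -w "TTFB: %{time_starttransfer}s\n" https://www.example.com/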
Hard Rule
Anonymous users must never reach PHP.
How to Harden WordPress
Enforce origin isolation at the firewall
Make WordPress static for anonymous users
Lock down admin-ajax
Disable or restrict XML-RPC
Rate limit by behaviour, not IP
Fail fast at the origin
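A minimal sketch of the first and last of these controls on a Linux origin using ufw; the CIDR blocks are placeholders for your CDN's published ranges:
# Allow your own management access (for example SSH from a bastion) before enabling the firewall
sudo ufw default deny incoming
sudo ufw allow from 192.0.2.0/24 to any port 443 proto tcp
sudo ufw allow from 198.51.100.0/24 to any port 443 proto tcp
sudo ufw enable
# Fail fast check from an outside host: a timeout or refusal here means isolation is working
curl -s -o /dev/null -w "%{http_code}\n" --max-time 5 https://www.example.com/ --resolve www.example.com:443:203.0.113.10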
Final Reality Check
If your CDN can be bypassed, all other protection is cosmetic.
Real protection means the origin is unreachable except through the CDN.
A Complete Guide to Archiving, Restoring, and Querying Large Table Partitions
When dealing with multi-terabyte tables in Aurora PostgreSQL, keeping historical partitions online becomes increasingly expensive and operationally burdensome. This guide presents a complete solution for archiving partitions to S3 in Iceberg/Parquet format, restoring them when needed, and querying archived data directly via a Spring Boot API without database restoration.
1. Architecture Overview
The solution comprises three components:
Archive Script: Exports a partition from Aurora PostgreSQL to Parquet files organised in Iceberg table format on S3
Restore Script: Imports archived data from S3 back into a staging table for validation and migration to the main table
Query API: A Spring Boot application that reads Parquet files directly from S3, applying predicate pushdown for efficient filtering
This approach reduces storage costs by approximately 70 to 80 percent compared to keeping data in Aurora, while maintaining full queryability through the API layer.
This script reverses the archive operation by reading Parquet files from S3 and loading them into a staging table.
4.1 Restore Script
#!/usr/bin/env python3
# restore_partition.py
"""
Restore an archived partition from S3 back to Aurora PostgreSQL.
Usage:
python restore_partition.py \
--source-path s3://bucket/prefix/schema/table/partition_col=value \
--target-table transactions_staging \
--target-schema public
"""
import argparse
import io
import json
import logging
import sys
from datetime import datetime
from typing import Dict, Any, List, Optional
from urllib.parse import urlparse
import boto3
import pandas as pd
import psycopg2
from psycopg2 import sql
from psycopg2.extras import execute_values
import pyarrow.parquet as pq
from config import DatabaseConfig
logging.basicConfig(
level=logging.INFO,
format="%(asctime)s [%(levelname)s] %(message)s"
)
logger = logging.getLogger(__name__)
class PartitionRestorer:
"""Restores archived partitions from S3 to PostgreSQL."""
def __init__(
self,
db_config: DatabaseConfig,
source_path: str,
target_schema: str,
target_table: str,
create_table: bool = True,
batch_size: int = 10000
):
self.db_config = db_config
self.source_path = source_path
self.target_schema = target_schema
self.target_table = target_table
self.create_table = create_table
self.batch_size = batch_size
parsed = urlparse(source_path)
self.bucket = parsed.netloc
self.prefix = parsed.path.lstrip("/")
self.s3_client = boto3.client("s3")
def _load_schema_snapshot(self) -> Dict[str, Any]:
"""Load the schema snapshot from the archive."""
response = self.s3_client.get_object(
Bucket=self.bucket,
Key=f"{self.prefix}/schema_snapshot.json"
)
return json.loads(response["Body"].read())
def _load_iceberg_metadata(self) -> Dict[str, Any]:
"""Load Iceberg metadata."""
response = self.s3_client.get_object(
Bucket=self.bucket,
Key=f"{self.prefix}/metadata/v1.metadata.json"
)
return json.loads(response["Body"].read())
def _list_data_files(self) -> List[str]:
"""List all Parquet data files in the archive."""
data_prefix = f"{self.prefix}/data/"
files = []
paginator = self.s3_client.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=self.bucket, Prefix=data_prefix):
for obj in page.get("Contents", []):
if obj["Key"].endswith(".parquet"):
files.append(obj["Key"])
return sorted(files)
def _postgres_type_from_column_def(self, col: Dict[str, Any]) -> str:
"""Convert column definition to PostgreSQL type."""
data_type = col["data_type"]
if data_type == "character varying":
max_len = col.get("character_maximum_length")
if max_len:
return f"varchar({max_len})"
return "text"
if data_type == "numeric":
precision = col.get("numeric_precision")
scale = col.get("numeric_scale")
if precision and scale:
return f"numeric({precision},{scale})"
return "numeric"
return data_type
def _create_staging_table(
self,
schema_snapshot: Dict[str, Any],
conn: psycopg2.extensions.connection
) -> None:
"""Create the staging table based on archived schema."""
columns = schema_snapshot["columns"]
column_defs = []
for col in columns:
pg_type = self._postgres_type_from_column_def(col)
nullable = "" if col["is_nullable"] == "YES" else " NOT NULL"
column_defs.append(f' "{col["column_name"]}" {pg_type}{nullable}')
create_sql = f"""
DROP TABLE IF EXISTS {self.target_schema}.{self.target_table};
CREATE TABLE {self.target_schema}.{self.target_table} (
{chr(10).join(column_defs)}
)
"""
with conn.cursor() as cur:
cur.execute(create_sql)
conn.commit()
logger.info(f"Created staging table {self.target_schema}.{self.target_table}")
def _insert_batch(
self,
df: pd.DataFrame,
columns: List[str],
conn: psycopg2.extensions.connection
) -> int:
"""Insert a batch of records into the staging table."""
if df.empty:
return 0
for col in df.columns:
if pd.api.types.is_datetime64_any_dtype(df[col]):
df[col] = df[col].apply(
lambda x: x.isoformat() if pd.notna(x) else None
)
values = [tuple(row) for row in df[columns].values]
column_names = ", ".join(f'"{c}"' for c in columns)
insert_sql = f"""
INSERT INTO {self.target_schema}.{self.target_table} ({column_names})
VALUES %s
"""
with conn.cursor() as cur:
execute_values(cur, insert_sql, values, page_size=self.batch_size)
return len(values)
def restore(self) -> Dict[str, Any]:
"""Execute the restore operation."""
logger.info(f"Starting restore from {self.source_path}")
schema_snapshot = self._load_schema_snapshot()
metadata = self._load_iceberg_metadata()
data_files = self._list_data_files()
logger.info(f"Found {len(data_files)} data files to restore")
columns = [col["column_name"] for col in schema_snapshot["columns"]]
with psycopg2.connect(self.db_config.connection_string) as conn:
if self.create_table:
self._create_staging_table(schema_snapshot, conn)
total_records = 0
for file_key in data_files:
s3_uri = f"s3://{self.bucket}/{file_key}"
logger.info(f"Restoring from {file_key}")
response = self.s3_client.get_object(Bucket=self.bucket, Key=file_key)
# Parquet needs a seekable source, so buffer the object body in memory first
table = pq.read_table(io.BytesIO(response["Body"].read()))
df = table.to_pandas()
file_records = 0
for start in range(0, len(df), self.batch_size):
batch_df = df.iloc[start:start + self.batch_size]
inserted = self._insert_batch(batch_df, columns, conn)
file_records += inserted
conn.commit()
total_records += file_records
logger.info(f"Restored {file_records} records from {file_key}")
result = {
"status": "success",
"source": self.source_path,
"target": f"{self.target_schema}.{self.target_table}",
"total_records": total_records,
"files_processed": len(data_files),
"restored_at": datetime.utcnow().isoformat()
}
logger.info(f"Restore complete: {total_records} records")
return result
def main():
parser = argparse.ArgumentParser(
description="Restore archived partition from S3 to PostgreSQL"
)
parser.add_argument("--source-path", required=True, help="S3 path to archived partition")
parser.add_argument("--target-table", required=True, help="Target table name")
parser.add_argument("--target-schema", default="public", help="Target schema name")
parser.add_argument("--batch-size", type=int, default=10000, help="Insert batch size")
parser.add_argument("--no-create", action="store_true", help="Don't create table, assume it exists")
args = parser.parse_args()
db_config = DatabaseConfig.from_environment()
restorer = PartitionRestorer(
db_config=db_config,
source_path=args.source_path,
target_schema=args.target_schema,
target_table=args.target_table,
create_table=not args.no_create,
batch_size=args.batch_size
)
result = restorer.restore()
print(json.dumps(result, indent=2))
return 0
if __name__ == "__main__":
sys.exit(main())
5. SQL Operations for Partition Migration
Once data is restored to a staging table, you need SQL operations to validate and migrate it to the main table.
5.1 Schema Validation
-- Validate that staging table schema matches the main table
CREATE OR REPLACE FUNCTION validate_table_schemas(
p_source_schema TEXT,
p_source_table TEXT,
p_target_schema TEXT,
p_target_table TEXT
) RETURNS TABLE (
validation_type TEXT,
column_name TEXT,
source_value TEXT,
target_value TEXT,
is_valid BOOLEAN
) AS $$
BEGIN
-- Check column count
RETURN QUERY
SELECT
'column_count'::TEXT,
NULL::TEXT,
src.cnt::TEXT,
tgt.cnt::TEXT,
src.cnt = tgt.cnt
FROM
(SELECT COUNT(*)::INT AS cnt
FROM information_schema.columns
WHERE table_schema = p_source_schema
AND table_name = p_source_table) src,
(SELECT COUNT(*)::INT AS cnt
FROM information_schema.columns
WHERE table_schema = p_target_schema
AND table_name = p_target_table) tgt;
-- Check each column exists with matching type
RETURN QUERY
SELECT
'column_definition'::TEXT,
src.column_name,
src.data_type || COALESCE('(' || src.character_maximum_length::TEXT || ')', ''),
COALESCE(tgt.data_type || COALESCE('(' || tgt.character_maximum_length::TEXT || ')', ''), 'MISSING'),
src.data_type = COALESCE(tgt.data_type, '')
AND COALESCE(src.character_maximum_length, 0) = COALESCE(tgt.character_maximum_length, 0)
FROM
information_schema.columns src
LEFT JOIN
information_schema.columns tgt
ON tgt.table_schema = p_target_schema
AND tgt.table_name = p_target_table
AND tgt.column_name = src.column_name
WHERE
src.table_schema = p_source_schema
AND src.table_name = p_source_table
ORDER BY src.ordinal_position;
-- Check nullability
RETURN QUERY
SELECT
'nullability'::TEXT,
src.column_name,
src.is_nullable,
COALESCE(tgt.is_nullable, 'MISSING'),
src.is_nullable = COALESCE(tgt.is_nullable, '')
FROM
information_schema.columns src
LEFT JOIN
information_schema.columns tgt
ON tgt.table_schema = p_target_schema
AND tgt.table_name = p_target_table
AND tgt.column_name = src.column_name
WHERE
src.table_schema = p_source_schema
AND src.table_name = p_source_table
ORDER BY src.ordinal_position;
END;
$$ LANGUAGE plpgsql;
-- Usage
SELECT * FROM validate_table_schemas('public', 'transactions_staging', 'public', 'transactions');
5.2 Comprehensive Validation Report
-- Generate a full validation report before migration
CREATE OR REPLACE FUNCTION generate_migration_report(
p_staging_schema TEXT,
p_staging_table TEXT,
p_target_schema TEXT,
p_target_table TEXT,
p_partition_column TEXT,
p_partition_value TEXT
) RETURNS TABLE (
check_name TEXT,
result TEXT,
details JSONB
) AS $$
DECLARE
v_staging_count BIGINT;
v_existing_count BIGINT;
v_schema_valid BOOLEAN;
BEGIN
-- Get staging table count
EXECUTE format(
'SELECT COUNT(*) FROM %I.%I',
p_staging_schema, p_staging_table
) INTO v_staging_count;
RETURN QUERY SELECT
'staging_record_count'::TEXT,
'INFO'::TEXT,
jsonb_build_object('count', v_staging_count);
-- Check for existing data in target partition
BEGIN
EXECUTE format(
'SELECT COUNT(*) FROM %I.%I WHERE %I = $1',
p_target_schema, p_target_table, p_partition_column
) INTO v_existing_count USING p_partition_value;
IF v_existing_count > 0 THEN
RETURN QUERY SELECT
'existing_partition_data'::TEXT,
'WARNING'::TEXT,
jsonb_build_object(
'count', v_existing_count,
'message', 'Target partition already contains data'
);
ELSE
RETURN QUERY SELECT
'existing_partition_data'::TEXT,
'OK'::TEXT,
jsonb_build_object('count', 0);
END IF;
EXCEPTION WHEN undefined_column THEN
RETURN QUERY SELECT
'partition_column_check'::TEXT,
'ERROR'::TEXT,
jsonb_build_object(
'message', format('Partition column %s not found', p_partition_column)
);
END;
-- Validate schemas match
SELECT bool_and(is_valid) INTO v_schema_valid
FROM validate_table_schemas(
p_staging_schema, p_staging_table,
p_target_schema, p_target_table
);
RETURN QUERY SELECT
'schema_validation'::TEXT,
CASE WHEN v_schema_valid THEN 'OK' ELSE 'ERROR' END::TEXT,
jsonb_build_object('schemas_match', v_schema_valid);
-- Check for null values in columns that are NOT NULL in the target table
DECLARE
v_col RECORD;
v_null_count BIGINT;
BEGIN
FOR v_col IN
SELECT c.column_name
FROM information_schema.columns c
WHERE c.table_schema = p_target_schema
AND c.table_name = p_target_table
AND c.is_nullable = 'NO'
AND EXISTS (
SELECT 1 FROM information_schema.columns s
WHERE s.table_schema = p_staging_schema
AND s.table_name = p_staging_table
AND s.column_name = c.column_name
)
LOOP
EXECUTE format(
'SELECT COUNT(*) FROM %I.%I WHERE %I IS NULL',
p_staging_schema, p_staging_table, v_col.column_name
) INTO v_null_count;
RETURN QUERY SELECT
'null_check_' || v_col.column_name,
CASE WHEN v_null_count > 0 THEN 'ERROR' ELSE 'OK' END,
jsonb_build_object('null_count', v_null_count);
END LOOP;
END;
END;
$$ LANGUAGE plpgsql;
-- Usage
SELECT * FROM generate_migration_report(
'public', 'transactions_staging',
'public', 'transactions',
'transaction_date', '2024-01'
);
5.3 Partition Migration
-- Migrate data from staging table to main table
CREATE OR REPLACE PROCEDURE migrate_partition_data(
p_staging_schema TEXT,
p_staging_table TEXT,
p_target_schema TEXT,
p_target_table TEXT,
p_partition_column TEXT,
p_partition_value TEXT,
p_delete_existing BOOLEAN DEFAULT FALSE,
p_batch_size INTEGER DEFAULT 50000
)
LANGUAGE plpgsql
AS $$
DECLARE
v_columns TEXT;
v_total_migrated BIGINT := 0;
v_batch_migrated BIGINT;
v_validation_passed BOOLEAN;
BEGIN
-- Validate schemas match
SELECT bool_and(is_valid) INTO v_validation_passed
FROM validate_table_schemas(
p_staging_schema, p_staging_table,
p_target_schema, p_target_table
);
IF NOT v_validation_passed THEN
RAISE EXCEPTION 'Schema validation failed. Run validate_table_schemas() for details.';
END IF;
-- Build column list
SELECT string_agg(quote_ident(column_name), ', ' ORDER BY ordinal_position)
INTO v_columns
FROM information_schema.columns
WHERE table_schema = p_staging_schema
AND table_name = p_staging_table;
-- Delete existing data if requested
IF p_delete_existing THEN
EXECUTE format(
'DELETE FROM %I.%I WHERE %I = $1',
p_target_schema, p_target_table, p_partition_column
) USING p_partition_value;
RAISE NOTICE 'Deleted existing data for partition % = %',
p_partition_column, p_partition_value;
END IF;
-- Guard: refuse to migrate into a partition that already contains data unless it was just cleared
IF NOT p_delete_existing THEN
EXECUTE format(
'SELECT COUNT(*) FROM %I.%I WHERE %I = $1',
p_target_schema, p_target_table, p_partition_column
) INTO v_batch_migrated USING p_partition_value;
IF v_batch_migrated > 0 THEN
RAISE EXCEPTION 'Target partition % = % already contains % rows. Call with p_delete_existing => TRUE to replace it.',
p_partition_column, p_partition_value, v_batch_migrated;
END IF;
END IF;
-- Migrate in batches, remembering already-migrated rows by ctid so each pass picks up where the last one stopped
CREATE TEMP TABLE IF NOT EXISTS migrated_ctids (row_ctid tid);
TRUNCATE migrated_ctids;
LOOP
EXECUTE format($sql$
WITH to_migrate AS (
SELECT s.ctid AS row_ctid
FROM %I.%I s
WHERE NOT EXISTS (
SELECT 1 FROM migrated_ctids m WHERE m.row_ctid = s.ctid
)
LIMIT $1
),
tracked AS (
INSERT INTO migrated_ctids (row_ctid)
SELECT row_ctid FROM to_migrate
),
inserted AS (
INSERT INTO %I.%I (%s)
SELECT %s
FROM %I.%I s
WHERE s.ctid IN (SELECT row_ctid FROM to_migrate)
RETURNING 1
)
SELECT COUNT(*) FROM inserted
$sql$,
p_staging_schema, p_staging_table,
p_target_schema, p_target_table, v_columns,
v_columns,
p_staging_schema, p_staging_table
) INTO v_batch_migrated USING p_batch_size;
v_total_migrated := v_total_migrated + v_batch_migrated;
IF v_batch_migrated = 0 THEN
EXIT;
END IF;
RAISE NOTICE 'Migrated batch: % records (total: %)', v_batch_migrated, v_total_migrated;
COMMIT;
END LOOP;
RAISE NOTICE 'Migration complete. Total records migrated: %', v_total_migrated;
END;
$$;
-- Usage
CALL migrate_partition_data(
'public', 'transactions_staging',
'public', 'transactions',
'transaction_date', '2024-01',
TRUE, -- delete existing
50000 -- batch size
);
5.4 Attach Partition (for Partitioned Tables)
-- For natively partitioned tables, attach the staging table as a partition
CREATE OR REPLACE PROCEDURE attach_restored_partition(
p_staging_schema TEXT,
p_staging_table TEXT,
p_target_schema TEXT,
p_target_table TEXT,
p_partition_column TEXT,
p_partition_start TEXT,
p_partition_end TEXT
)
LANGUAGE plpgsql
AS $$
DECLARE
v_partition_name TEXT;
v_constraint_name TEXT;
BEGIN
-- Validate schemas match
IF NOT (
SELECT bool_and(is_valid)
FROM validate_table_schemas(
p_staging_schema, p_staging_table,
p_target_schema, p_target_table
)
) THEN
RAISE EXCEPTION 'Schema validation failed';
END IF;
-- Add constraint to staging table that matches partition bounds
v_constraint_name := p_staging_table || '_partition_check';
EXECUTE format($sql$
ALTER TABLE %I.%I
ADD CONSTRAINT %I
CHECK (%I >= %L AND %I < %L) NOT VALID
$sql$,
p_staging_schema, p_staging_table,
v_constraint_name,
p_partition_column, p_partition_start,
p_partition_column, p_partition_end
);
-- Validate constraint without locking
EXECUTE format($sql$
ALTER TABLE %I.%I
VALIDATE CONSTRAINT %I
$sql$,
p_staging_schema, p_staging_table,
v_constraint_name
);
-- Detach old partition if exists
v_partition_name := p_target_table || '_' || replace(p_partition_start, '-', '_');
BEGIN
EXECUTE format($sql$
ALTER TABLE %I.%I
DETACH PARTITION %I.%I
$sql$,
p_target_schema, p_target_table,
p_target_schema, v_partition_name
);
RAISE NOTICE 'Detached existing partition %', v_partition_name;
EXCEPTION WHEN undefined_table THEN
RAISE NOTICE 'No existing partition to detach';
END;
-- Rename staging table to partition name
EXECUTE format($sql$
ALTER TABLE %I.%I RENAME TO %I
$sql$,
p_staging_schema, p_staging_table,
v_partition_name
);
-- Attach as partition
EXECUTE format($sql$
ALTER TABLE %I.%I
ATTACH PARTITION %I.%I
FOR VALUES FROM (%L) TO (%L)
$sql$,
p_target_schema, p_target_table,
p_staging_schema, v_partition_name,
p_partition_start, p_partition_end
);
RAISE NOTICE 'Successfully attached partition % to %',
v_partition_name, p_target_table;
END;
$$;
-- Usage for range partitioned table
CALL attach_restored_partition(
'public', 'transactions_staging',
'public', 'transactions',
'transaction_date',
'2024-01-01', '2024-02-01'
);
5.5 Cleanup Script
-- Clean up after successful migration
CREATE OR REPLACE PROCEDURE cleanup_after_migration(
p_staging_schema TEXT,
p_staging_table TEXT,
p_verify_target_schema TEXT DEFAULT NULL,
p_verify_target_table TEXT DEFAULT NULL,
p_verify_count BOOLEAN DEFAULT TRUE
)
LANGUAGE plpgsql
AS $$
DECLARE
v_staging_count BIGINT;
v_target_count BIGINT;
BEGIN
IF p_verify_count AND p_verify_target_schema IS NOT NULL THEN
EXECUTE format(
'SELECT COUNT(*) FROM %I.%I',
p_staging_schema, p_staging_table
) INTO v_staging_count;
EXECUTE format(
'SELECT COUNT(*) FROM %I.%I',
p_verify_target_schema, p_verify_target_table
) INTO v_target_count;
IF v_target_count < v_staging_count THEN
RAISE WARNING 'Target count (%) is less than staging count (%). Migration may be incomplete.',
v_target_count, v_staging_count;
RETURN;
END IF;
END IF;
EXECUTE format(
'DROP TABLE IF EXISTS %I.%I',
p_staging_schema, p_staging_table
);
RAISE NOTICE 'Dropped staging table %.%', p_staging_schema, p_staging_table;
END;
$$;
-- Usage
CALL cleanup_after_migration(
'public', 'transactions_staging',
'public', 'transactions',
TRUE
);
6. Spring Boot Query API
This API allows querying archived data directly from S3 without restoring to the database.
Expected performance for a 1TB partition (compressed to ~150GB Parquet):
Point lookup (indexed column): 500ms to 2s
Range scan (10% selectivity): 5s to 15s
Full scan with aggregation: 30s to 60s
8.3 Monitoring and Alerting
Implement these metrics for production use; the example uses Micrometer, which can publish to CloudWatch or another monitoring backend:
import java.util.concurrent.TimeUnit;
import io.micrometer.core.instrument.MeterRegistry;
import org.springframework.stereotype.Component;
@Component
public class QueryMetrics {
private final MeterRegistry meterRegistry;
public QueryMetrics(MeterRegistry meterRegistry) {
this.meterRegistry = meterRegistry;
}
public void recordQuery(QueryResponse response) {
meterRegistry.counter("archive.query.count").increment();
meterRegistry.timer("archive.query.duration")
.record(response.executionTimeMs(), TimeUnit.MILLISECONDS);
// Distribution summaries record a value per query; a plain gauge would only hold a weakly referenced number
meterRegistry.summary("archive.query.records_scanned").record(response.totalScanned());
meterRegistry.summary("archive.query.records_matched").record(response.totalMatched());
}
}
9. Conclusion
This solution provides a complete data lifecycle management approach for large Aurora PostgreSQL tables. The archive script efficiently exports partitions to the cost effective Iceberg/Parquet format on S3, while the restore script enables seamless data recovery when needed. The Spring Boot API bridges the gap by allowing direct queries against archived data, eliminating the need for restoration in many analytical scenarios.
Key benefits:
Cost reduction: 90 to 98 percent storage cost savings compared to keeping data in Aurora
Operational flexibility: Query archived data without restoration
Schema preservation: Full schema metadata maintained for reliable restores
Partition management: Clean attach/detach operations for partitioned tables
Predicate pushdown: Efficient filtering reduces data transfer and processing
The Iceberg format ensures compatibility with the broader data ecosystem, allowing tools like Athena, Spark, and Trino to query the same archived data when needed for more complex analytical workloads.
CVE-2024-3094 represents one of the most sophisticated supply chain attacks in recent history. Discovered in March 2024, this vulnerability embedded a backdoor into XZ Utils versions 5.6.0 and 5.6.1, allowing attackers to compromise SSH authentication on Linux systems. With a CVSS score of 10.0 (Critical), this attack demonstrates the extreme risks inherent in open source supply chains and the sophistication of modern cyber threats.
This article provides a technical deep dive into how the backdoor works, why it’s extraordinarily dangerous, and practical methods for detecting compromised systems remotely.
Table of Contents
What Makes This Vulnerability Exceptionally Dangerous
The Anatomy of the Attack
Technical Implementation of the Backdoor
Detection Methodology
Remote Scanning Tools and Techniques
Remediation Steps
Lessons for the Security Community
What Makes This Vulnerability Exceptionally Dangerous
Supply Chain Compromise at Scale
Unlike traditional vulnerabilities discovered through code audits or penetration testing, CVE-2024-3094 was intentionally inserted through a sophisticated social engineering campaign. The attacker, operating under the pseudonym “Jia Tan,” spent over two years building credibility in the XZ Utils open source community before introducing the malicious code.
This attack vector is particularly insidious for several reasons:
Trust Exploitation: Open source projects rely on volunteer maintainers who operate under enormous time pressure. By becoming a trusted contributor over years, the attacker bypassed the natural skepticism that would greet code from unknown sources.
Delayed Detection: The malicious code was introduced gradually through multiple commits, making it difficult to identify the exact point of compromise. The backdoor was cleverly hidden in test files and binary blobs that would escape cursory code review.
Widespread Distribution: XZ Utils is a fundamental compression utility used across virtually all Linux distributions. The compromised versions were integrated into Debian, Ubuntu, Fedora, and Arch Linux testing and unstable repositories, affecting potentially millions of systems.
The Perfect Backdoor
What makes this backdoor particularly dangerous is its technical sophistication:
Pre-authentication Execution: The backdoor activates before SSH authentication completes, meaning attackers can gain access without valid credentials.
Remote Code Execution: Once triggered, the backdoor allows arbitrary command execution with the privileges of the SSH daemon, typically running as root.
Stealth Operation: The backdoor modifies the SSH authentication process in memory, leaving minimal forensic evidence. Traditional log analysis would show normal SSH connections, even when the backdoor was being exploited.
Selective Targeting: The backdoor contains logic to respond only to specially crafted SSH certificates, making it difficult for researchers to trigger and analyze the malicious behavior.
Timeline and Near Miss
The timeline of this attack demonstrates how close the security community came to widespread compromise:
Late 2021: “Jia Tan” begins contributing to XZ Utils project
2022-2023: Builds trust through legitimate contributions and pressures maintainer Lasse Collin
February 2024: Backdoored versions 5.6.0 and 5.6.1 released
March 29, 2024: Andres Freund, a PostgreSQL developer, notices unusual SSH behavior during performance testing and discovers the backdoor
March 30, 2024: Public disclosure and emergency response
Had Freund not noticed the 500ms SSH delay during unrelated performance testing, this backdoor could have reached production systems across the internet. The discovery was, by the discoverer’s own admission, largely fortuitous.
The Anatomy of the Attack
Multi-Stage Social Engineering
The attack began long before any malicious code was written. The attacker needed to:
Establish Identity: Create a credible online persona with consistent activity patterns
Build Reputation: Make legitimate contributions to build trust
Apply Pressure: Create artificial urgency around maintainer succession
Gain Commit Access: Become a co-maintainer with direct repository access
This process took approximately two years, demonstrating extraordinary patience and planning. The attacker created multiple personas to apply social pressure on the sole maintainer, who had cited burnout, and to position the new contributor as much-needed help.
Code Insertion Strategy
The malicious code was inserted through several mechanisms:
Obfuscated Build Scripts: The backdoor was triggered through the build system rather than in the main source code. Modified build scripts would inject malicious code during compilation.
Binary Test Files: Large binary test files were added to the repository, containing encoded malicious payloads. These files appeared to be legitimate test data but actually contained the backdoor implementation.
Multi-Commit Obfuscation: The backdoor was introduced across multiple commits over several weeks, making it difficult to identify a single “smoking gun” commit.
Ifunc Abuse: The backdoor used GNU indirect function (ifunc) resolvers to hook into the SSH authentication process at runtime, modifying program behavior without changing the obvious code paths.
Technical Implementation of the Backdoor
How XZ Utils Connects to SSH
To understand the backdoor, we must first understand an unexpected dependency chain:
On many modern Linux distributions, the SSH daemon links against libsystemd for process notification and logging. The systemd library, in turn, links against liblzma for compression functionality. This creates an indirect but critical dependency: SSH loads XZ Utils’ compression library into its address space.
The attackers exploited this dependency chain to inject their backdoor into the SSH authentication process.
Stage 1: Build Time Injection
The attack begins during the XZ Utils build process:
# Simplified representation of the malicious build script
if test -f "$srcdir/tests/files/good-large_compressed.lzma"; then
# Extract and execute embedded script from "test file"
eval $(xz -dc "$srcdir/tests/files/good-large_compressed.lzma" | head -c 1024)
fi
The build script would:
Detect specific binary test files in the source tree
Decompress these files to extract shell commands
Execute the extracted commands during the build process
Modify the liblzma source code to include backdoor hooks
These binary test files appeared legitimate because they were valid LZMA compressed data. However, they contained carefully crafted payloads that would only activate under specific conditions.
Stage 2: Runtime Hooking
Once compiled with the malicious modifications, liblzma contains an ifunc resolver that executes early during library loading:
// Simplified representation of the hooking mechanism
void __attribute__((ifunc("resolve_function")))
hooked_function(void);
void* resolve_function(void) {
// Check if we're loaded by sshd
if (check_ssh_context()) {
// Install hooks into RSA authentication
hook_rsa_public_decrypt();
return (void*)backdoor_implementation;
}
return (void*)legitimate_implementation;
}
The ifunc resolver runs before main() executes, allowing the backdoor to:
Detect if it’s loaded by sshd (vs other programs using liblzma)
Locate RSA authentication functions in memory
Hook the RSA public key verification function
Replace it with the backdoor implementation
Stage 3: Authentication Bypass
When an SSH connection arrives, the hooked RSA verification function:
// Conceptual representation of the backdoor logic
int backdoor_rsa_verify(unsigned char *signature, RSA *key) {
// Check if signature contains magic bytes
if (signature_contains_trigger(signature)) {
// Extract and execute payload from certificate
char *command = decode_payload(signature);
// Execute with sshd privileges (typically root)
system(command);
// Return success to bypass authentication
return 1;
}
// Otherwise, perform normal verification
return original_rsa_verify(signature, key);
}
The backdoor:
Intercepts all SSH authentication attempts
Examines the RSA signature for special markers
If triggered, extracts commands from the certificate
Executes commands with root privileges
Returns success to complete the “authentication”
From the SSH server’s perspective, this appears as a normal successful authentication. The logs would show a legitimate connection from an authorized user, even though no valid credentials were presented.
Why Traditional Detection Fails
The backdoor was designed to evade common security measures:
No Network Signatures: The malicious traffic looks identical to normal SSH, using standard protocols and ports.
No File System Artifacts: The backdoor exists only in memory after library loading. No malicious files are written to disk during exploitation.
Clean Source Code: The primary liblzma source code remains clean. The modifications occur during build time and aren’t present in the repository’s main files.
Log Evasion: Successful backdoor authentication appears in logs as a normal SSH connection, complete with username and source IP.
Selective Activation: The backdoor only responds to specially crafted certificates, making it difficult to trigger during security research or scanning.
Detection Methodology
Since the backdoor operates at runtime and leaves minimal artifacts, detection focuses on behavioral analysis rather than signature matching.
Timing Based Detection
The most reliable detection method exploits an unintended side effect: the backdoor’s cryptographic operations introduce measurable timing delays.
1. TCP Connection: 10-50ms
2. SSH Banner Exchange: 50-200ms (slower due to ifunc hooks)
3. Key Exchange Init: 200-500ms (backdoor initialization overhead)
4. Authentication Ready: 500-1500ms total (cryptographic hooking delays)
The backdoor adds overhead in several places:
Library Loading: The ifunc resolver runs additional code during liblzma initialization
Memory Scanning: The backdoor searches process memory for authentication functions to hook
Hook Installation: Modifying function pointers and setting up trampolines takes time
Certificate Inspection: Every authentication attempt is examined for trigger signatures
These delays are consistent and measurable, even without triggering the actual backdoor functionality.
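As a rough illustration (the hostname is a placeholder and the thresholds are the ones quoted above), the banner delay can be sampled from any machine with bash:
# Time how long the server takes to present its SSH banner, ten times
for i in $(seq 1 10); do
/usr/bin/time -p bash -c 'exec 3<>/dev/tcp/target.example.com/22; head -c 32 <&3 >/dev/null; exec 3<&-' 2>&1 | awk '/^real/ {print $2 * 1000 " ms"}'
done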
Detection Through Multiple Samples
A single timing measurement might be affected by network latency, server load, or other factors. However, the backdoor creates a consistent pattern:
Statistical Analysis:
Normal SSH server (10 samples):
- Mean: 180ms
- Std Dev: 25ms
- Variance: 625ms²
Backdoored SSH server (10 samples):
- Mean: 850ms
- Std Dev: 180ms
- Variance: 32,400ms²
The backdoored server shows both higher average timing and greater variance, as the backdoor’s overhead varies depending on system state and what initialization code paths execute.
Banner Analysis
While not definitive, certain configurations increase vulnerability likelihood:
High Risk Indicators:
Debian or Ubuntu distribution
OpenSSH version 9.6 or 9.7
Recent system updates in February-March 2024
systemd based initialization
SSH daemon with systemd notification enabled
Configuration Detection:
# SSH banner typically reveals:
SSH-2.0-OpenSSH_9.6p1 Debian-5ubuntu1
# Breaking down the information:
# OpenSSH_9.6p1 - Version commonly affected
# Debian-5ubuntu1 - Distribution and package version
Debian and Ubuntu were the primary targets because:
They quickly incorporated the backdoored versions into testing repositories
They use systemd, creating the sshd → libsystemd → liblzma dependency chain
They enable systemd notification in sshd by default
Library Linkage Analysis
On accessible systems, verifying SSH’s library dependencies provides definitive evidence:
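A hedged example of that check (library paths differ by distribution):
# Confirm whether sshd loads liblzma at all
ldd "$(which sshd)" | grep liblzma
# If it does, resolve the library version actually on disk
readlink -f /lib/x86_64-linux-gnu/liblzma.so.5
xz --version
# Versions 5.6.0 and 5.6.1 are the backdoored releases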
For integration with existing security scanning workflows, an Nmap NSE script provides standardized vulnerability reporting. Nmap Scripting Engine (NSE) scripts are written in Lua and leverage Nmap’s network scanning capabilities.
Understanding NSE Script Structure
NSE scripts follow a specific structure that integrates with Nmap’s scanning engine. Create the React2Shell detection script with:
cat > react2shell-detect.nse << 'EOF'
local shortport = require "shortport"
local stdnse = require "stdnse"
local ssh1 = require "ssh1"
local ssh2 = require "ssh2"
local string = require "string"
local nmap = require "nmap"
description = [[
Detects potential React2Shell (CVE-2024-3094) backdoor vulnerability in SSH servers.
This script tests for the backdoored XZ Utils vulnerability by:
1. Analyzing SSH banner information
2. Measuring authentication timing anomalies
3. Testing for unusual SSH handshake behavior
4. Detecting timing delays characteristic of the backdoor
]]
author = "Security Researcher"
license = "Same as Nmap"
categories = {"vuln", "safe", "intrusive"}
portrule = shortport.port_or_service(22, "ssh", "tcp", "open")
-- Timing thresholds (in milliseconds)
local HANDSHAKE_NORMAL = 200
local HANDSHAKE_SUSPICIOUS = 500
local AUTH_NORMAL = 300
local AUTH_SUSPICIOUS = 800
action = function(host, port)
local output = stdnse.output_table()
local vuln_table = {
title = "React2Shell SSH Backdoor (CVE-2024-3094)",
state = "NOT VULNERABLE",
risk_factor = "Critical",
references = {
"https://nvd.nist.gov/vuln/detail/CVE-2024-3094",
"https://www.openwall.com/lists/oss-security/2024/03/29/4"
}
}
local script_args = {
timeout = tonumber(stdnse.get_script_args(SCRIPT_NAME .. ".timeout")) or 10,
auth_threshold = tonumber(stdnse.get_script_args(SCRIPT_NAME .. ".auth-threshold")) or AUTH_SUSPICIOUS
}
local socket = nmap.new_socket()
socket:set_timeout(script_args.timeout * 1000)
local detection_results = {}
local suspicious_count = 0
-- Test 1: SSH Banner and Initial Handshake
local start_time = nmap.clock_ms()
local status, err = socket:connect(host, port)
if not status then
return nil
end
local banner_status, banner = socket:receive_lines(1)
local handshake_time = nmap.clock_ms() - start_time
if not banner_status then
socket:close()
return nil
end
detection_results["SSH Banner"] = banner:gsub("[\r\n]", "")
detection_results["Handshake Time"] = string.format("%dms", handshake_time)
if handshake_time > HANDSHAKE_SUSPICIOUS then
detection_results["Handshake Analysis"] = string.format("SUSPICIOUS (%dms > %dms)",
handshake_time, HANDSHAKE_SUSPICIOUS)
suspicious_count = suspicious_count + 1
else
detection_results["Handshake Analysis"] = "Normal"
end
socket:close()
-- Test 2: Authentication Timing Probe
socket = nmap.new_socket()
socket:set_timeout(script_args.timeout * 1000)
status = socket:connect(host, port)
if not status then
output["Detection Results"] = detection_results
return output
end
socket:receive_lines(1)
local client_banner = "SSH-2.0-OpenSSH_9.0_Nmap_Scanner\r\n"
socket:send(client_banner)
start_time = nmap.clock_ms()
local kex_status, kex_data = socket:receive()
local auth_time = nmap.clock_ms() - start_time
socket:close()
detection_results["Auth Probe Time"] = string.format("%dms", auth_time)
if auth_time > script_args.auth_threshold then
detection_results["Auth Analysis"] = string.format("SUSPICIOUS (%dms > %dms)",
auth_time, script_args.auth_threshold)
suspicious_count = suspicious_count + 2
else
detection_results["Auth Analysis"] = "Normal"
end
-- Banner Analysis
local banner_lower = banner:lower()
if banner_lower:match("debian") or banner_lower:match("ubuntu") then
detection_results["Distribution"] = "Debian/Ubuntu (higher risk)"
if banner_lower:match("openssh_9%.6") or banner_lower:match("openssh_9%.7") then
detection_results["Version Note"] = "OpenSSH version commonly affected"
suspicious_count = suspicious_count + 1
end
end
vuln_table["Detection Results"] = detection_results
if suspicious_count >= 3 then
vuln_table.state = "LIKELY VULNERABLE"
vuln_table["Confidence"] = "HIGH"
elseif suspicious_count >= 2 then
vuln_table.state = "POSSIBLY VULNERABLE"
vuln_table["Confidence"] = "MEDIUM"
elseif suspicious_count >= 1 then
vuln_table.state = "SUSPICIOUS"
vuln_table["Confidence"] = "LOW"
end
vuln_table["Indicators Found"] = string.format("%d suspicious indicators", suspicious_count)
if vuln_table.state ~= "NOT VULNERABLE" then
vuln_table["Recommendation"] = [[
1. Verify XZ Utils version on target
2. Check if SSH daemon links to liblzma
3. Review SSH authentication logs
4. Consider isolating system pending investigation
]]
end
return vuln_table
end
EOF
For enterprise environments with Security Information and Event Management systems:
#!/bin/bash
# SIEM integration script
SYSLOG_SERVER="siem.company.com"
SYSLOG_PORT=514
scan_and_log() {
local host=$1
local port=${2:-22}
result=$(./ssh_backdoor_scanner.py "$host" -p "$port" 2>&1)
if echo "$result" | grep -q "VULNERABLE"; then
severity="CRITICAL"
priority=2
elif echo "$result" | grep -q "SUSPICIOUS"; then
severity="WARNING"
priority=4
else
severity="INFO"
priority=6
fi
# Send to syslog
logger -n "$SYSLOG_SERVER" -P "$SYSLOG_PORT" \
-p "local0.$priority" \
-t "react2shell-scan" \
"[$severity] CVE-2024-3094 scan: host=$host:$port result=$severity"
}
# Scan from asset inventory
while read -r server; do
scan_and_log "$server"
done < asset_inventory.txt
Remediation Steps
Immediate Response for Vulnerable Systems
When a system is identified as potentially compromised:
Step 1: Verify the Finding
# Connect to the system (if possible)
ssh admin@suspicious-server
# Check XZ version
xz --version
# Look for: xz (XZ Utils) 5.6.0 or 5.6.1
# Verify SSH linkage
ldd $(which sshd) | grep liblzma
# If present, check version:
# readlink -f /lib/x86_64-linux-gnu/liblzma.so.5
Step 2: Assess Potential Compromise
# Review authentication logs
grep -E 'Accepted|Failed' /var/log/auth.log | tail -100
# Check for suspicious authentication patterns
# - Successful authentications without corresponding key/password attempts
# - Authentications from unexpected source IPs
# - User accounts that shouldn't have SSH access
# Review active sessions
w
last -20
# Check for unauthorized SSH keys
find /home -name authorized_keys -exec cat {} \;
find /root -name authorized_keys -exec cat {} \;
# Look for unusual processes
ps auxf | less
Step 3: Immediate Containment
If compromise is suspected:
# Isolate the system from network
# Save current state for forensics first
netstat -tupan > /tmp/netstat_snapshot.txt
ps auxf > /tmp/process_snapshot.txt
# Then block incoming SSH
iptables -I INPUT -p tcp --dport 22 -j DROP
# Or shutdown SSH entirely
systemctl stop ssh
Step 4: Remediation
For systems with the vulnerable version but no evidence of compromise:
# Debian/Ubuntu systems
apt-get update
apt-get install --only-upgrade xz-utils
# Verify the new version
xz --version
# Should show 5.4.x or 5.5.x
# Alternative: Explicit downgrade
apt-get install xz-utils=5.4.5-0.3
# Restart SSH to unload old library
systemctl restart ssh
Step 5: Post Remediation Verification
# Verify library version
readlink -f /lib/x86_64-linux-gnu/liblzma.so.5
# Should NOT be 5.6.0 or 5.6.1
# Confirm SSH no longer shows timing anomalies
# Run scanner again from remote system
./ssh_backdoor_scanner.py remediated-server.com
# Monitor for a period
tail -f /var/log/auth.log
System Hardening Post Remediation
After removing the backdoor, implement additional protections:
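What those protections look like depends on the environment; a minimal sketch for a Debian-style host, with option values that are assumptions to adapt rather than requirements, might be:
# Reduce the SSH attack surface and prefer key based access
sudo tee /etc/ssh/sshd_config.d/90-hardening.conf <<'EOF'
PermitRootLogin no
PasswordAuthentication no
LoginGraceTime 20
MaxAuthTries 3
EOF
sudo sshd -t && sudo systemctl restart ssh
# List any held or pinned packages so xz-utils updates are not silently skipped
apt-mark showhold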
Lessons for the Security Community
This attack highlights critical vulnerabilities in the open source ecosystem:
Maintainer Burnout: Many critical projects rely on volunteer maintainers working in isolation. The XZ Utils maintainer was a single individual managing a foundational library with limited resources and support.
Trust But Verify: The security community must develop better mechanisms for verifying not just code contributions, but also the contributors themselves. Multi-year social engineering campaigns can bypass traditional code review.
Automated Analysis: Build systems and binary artifacts must receive the same scrutiny as source code. The XZ backdoor succeeded partly because attention focused on C source files while malicious build scripts and test files went unexamined.
Dependency Awareness: Understanding indirect dependency chains is critical. Few would have identified XZ Utils as SSH-related, yet this unexpected connection enabled the attack.
Detection Strategy Evolution
The fortuitous discovery of this backdoor through performance testing suggests the security community needs new approaches:
Behavioral Baselining: Systems should establish performance baselines for critical services. Deviations, even subtle ones, warrant investigation.
Timing Analysis: Side-channel attacks aren’t just theoretical concerns. Timing differences can reveal malicious code even when traditional signatures fail.
Continuous Monitoring: Point-in-time security assessments miss time-based attacks. Continuous behavioral monitoring can detect anomalies as they emerge.
Cross-Discipline Collaboration: The backdoor was discovered by a database developer doing performance testing, not a security researcher. Encouraging collaboration across disciplines improves security outcomes.
Infrastructure Recommendations
Organizations should implement:
Binary Verification: Don’t just verify source code. Ensure build processes are deterministic and reproducible. Compare binaries across different build environments.
Runtime Monitoring: Deploy tools that can detect unexpected library loading, function hooking, and behavioral anomalies in production systems.
Network Segmentation: Limit the blast radius of compromised systems through proper network segmentation and access controls.
Incident Response Preparedness: Have procedures ready for supply chain compromises, including rapid version rollback and system isolation capabilities.
The Role of Timing in Security
This attack demonstrates the importance of performance analysis in security:
Performance as Security Signal: Unexplained performance degradation should trigger security investigation, not just performance optimization.
Side Channel Awareness: Developers should understand that any observable behavior, including timing, can reveal system state and potential compromise.
Benchmark Everything: Establish performance baselines for critical systems and alert on deviations.
Conclusion
CVE-2024-3094 represents a watershed moment in supply chain security. The sophistication of the attack, spanning years of social engineering and technical preparation, demonstrates that determined adversaries can compromise even well-maintained open source projects.
The backdoor’s discovery was largely fortuitous, happening during unrelated performance testing just before the compromised versions would have reached production systems worldwide. This near-miss should serve as a wake-up call for the entire security community.
The detection tools and methodologies presented in this article provide practical means for identifying compromised systems. However, the broader lesson is that security requires constant vigilance, comprehensive monitoring, and a willingness to investigate subtle anomalies that might otherwise be dismissed as performance issues.
As systems become more complex and supply chains more intricate, the attack surface expands beyond traditional code vulnerabilities to include the entire software development and distribution process. Defending against such attacks requires not just better tools, but fundamental changes in how we approach trust, verification, and monitoring in software systems.
The React2Shell backdoor was detected and neutralized before widespread exploitation. The next supply chain attack may not be discovered so quickly, or so fortunately. The time to prepare is now.
Additional Resources
Technical References
National Vulnerability Database: https://nvd.nist.gov/vuln/detail/CVE-2024-3094
Technical Analysis by Sam James: https://gist.github.com/thesamesam/223949d5a074ebc3dce9ee78baad9e27
Detection Tools
The scanner tools discussed in this article are available for download and can be deployed in production environments for ongoing monitoring. They require no authentication to the target systems and work by analyzing observable timing behavior in the SSH handshake and authentication process.
These tools should be integrated into regular security scanning procedures alongside traditional vulnerability scanners and intrusion detection systems.
Indicators of Compromise
XZ Utils version 5.6.0 or 5.6.1 installed
SSH daemon (sshd) linking to liblzma library
Unusual SSH authentication timing (>800ms for auth probe)
High variance in SSH connection establishment times
Recent XZ Utils updates from February or March 2024
Debian or Ubuntu systems with systemd enabled SSH
OpenSSH versions 9.6 or 9.7 on Debian-based distributions
Recommended Actions
Scan all SSH-accessible systems for timing anomalies
Verify XZ Utils versions across your infrastructure
Review SSH authentication logs for suspicious patterns
Implement continuous monitoring for behavioral anomalies
Establish performance baselines for critical services
Develop incident response procedures for supply chain compromises
Consider additional SSH hardening measures
Review and audit all open source dependencies in your environment
In August 2023, a critical zero-day vulnerability in the HTTP/2 protocol was disclosed that affected virtually every HTTP/2 capable web server and proxy. Known as HTTP/2 Rapid Reset (CVE-2023-44487), this vulnerability enabled attackers to launch devastating Distributed Denial of Service (DDoS) attacks with minimal resources. Google reported mitigating the largest DDoS attack ever recorded at the time (398 million requests per second) leveraging this technique.
Understanding this vulnerability and knowing how to test your infrastructure against it is crucial for maintaining a secure and resilient web presence. This guide provides a flexible testing tool specifically designed for macOS that uses hping3 for packet crafting with CIDR based source IP address spoofing capabilities.
What is HTTP/2 Rapid Reset?
The HTTP/2 Protocol Foundation
HTTP/2 introduced multiplexing, allowing multiple streams (requests/responses) to be sent concurrently over a single TCP connection. Each stream has a unique identifier and can be independently managed. To cancel a stream, HTTP/2 uses the RST_STREAM frame, which immediately terminates the stream and signals that no further processing is needed.
The Vulnerability Mechanism
The HTTP/2 Rapid Reset attack exploits the asymmetry between client cost and server cost:
Client cost: Sending a request followed immediately by a RST_STREAM frame is computationally trivial
Server cost: Processing the incoming request (parsing headers, routing, backend queries) consumes significant resources before the cancellation is received
An attacker can:
Open an HTTP/2 connection
Send thousands of requests with incrementing stream IDs
Immediately cancel each request with RST_STREAM frames
Repeat this cycle at extremely high rates
The server receives these requests and begins processing them. Even though the cancellation arrives milliseconds later, the server has already invested CPU, memory, and I/O resources. By sending millions of request cancel pairs per second, attackers can exhaust server resources with minimal bandwidth.
Why It’s So Effective
Traditional rate limiting and DDoS mitigation techniques struggle against Rapid Reset attacks because:
Low bandwidth usage: The attack uses minimal data (mostly HTTP/2 frames with small headers)
Valid protocol behavior: RST_STREAM is a legitimate HTTP/2 mechanism
Connection reuse: Attackers multiplex thousands of streams over relatively few connections
Amplification: Each cheap client operation triggers expensive server side processing
How to Guard Against HTTP/2 Rapid Reset
1. Update Your Software Stack
Immediate Priority: Ensure all HTTP/2 capable components are patched:
Web Servers:
Nginx 1.25.2+ or 1.24.1+
Apache HTTP Server 2.4.58+
Caddy 2.7.4+
LiteSpeed 6.0.12+
Reverse Proxies and Load Balancers:
HAProxy 2.8.2+ or 2.6.15+
Envoy 1.27.0+
Traefik 2.10.5+
CDN and Cloud Services:
CloudFlare (auto patched August 2023)
AWS ALB/CloudFront (patched)
Azure Front Door (patched)
Google Cloud Load Balancer (patched)
Application Servers:
Tomcat 10.1.13+, 9.0.80+
Jetty 12.0.1+, 11.0.16+, 10.0.16+
Node.js 20.8.0+, 18.18.0+
2. Implement Stream Limits
Configure strict limits on HTTP/2 stream behavior:
Note: Disabling HTTP/2 entirely and falling back to HTTP/1.1 also eliminates the vulnerability, at the cost of HTTP/2's performance benefits.
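For example, on a patched nginx the limits might be tightened like this; the values are illustrative, not prescriptive:
# nginx.conf (limit_conn_zone belongs in the http context, the rest in http or server)
http2_max_concurrent_streams 128;
keepalive_requests 1000;
limit_conn_zone $binary_remote_addr zone=perip:10m;
limit_conn perip 50;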
Testing Script for HTTP/2 Rapid Reset Vulnerabilities on macOS
Below is a parameterized Python script that tests your web servers using hping3 for packet crafting. This script is specifically optimized for macOS and can spoof source IP addresses from a CIDR block to simulate distributed attacks. Using hping3 ensures IP spoofing works consistently across different network environments.
Gradual escalation test (start small, increase if needed):
# Start with 50 packets
sudo python3 http2rapidresettester_macos.py --host example.com --cidr 192.168.1.0/24 --packets 50
# If server handles it well, increase
sudo python3 http2rapidresettester_macos.py --host example.com --cidr 192.168.1.0/24 --packets 200
# Final aggressive test
sudo python3 http2rapidresettester_macos.py --host example.com --cidr 192.168.1.0/24 --packets 1000
Interpreting Results
The script outputs packet statistics including:
Total packets sent (SYN and RST combined)
Number of SYN packets
Number of RST packets
Failed packet count
Number of unique source IPs used
Average packet rate
Test duration
What to Monitor
Monitor your target server for:
Connection state table exhaustion: Check netstat or ss output for connection counts
CPU and memory utilization spikes: Use Activity Monitor or top command
Application performance degradation: Monitor response times and error rates
Firewall or rate limiting triggers: Check firewall logs and rate limiting counters
Protected Server Indicators
High failure rate in the test results
Server actively blocking or rate limiting connections
Firewall rules triggering during test
Connection resets from the server
Vulnerable Server Indicators
All packets successfully sent with low failure rate
No rate limiting or blocking observed
Server continues processing all requests
Resource utilization climbs steadily
Why hping3 for macOS?
Using hping3 provides several advantages for macOS users:
Universal IP Spoofing Support
Consistent behavior: hping3 provides reliable IP spoofing across different network configurations
Proven tool: Industry standard for packet crafting and network testing
Better compatibility: Works with most network interfaces and routing configurations
macOS Specific Benefits
Native support: Works well with macOS network stack
Firewall compatibility: Better integration with macOS firewall
Performance: Efficient packet generation on macOS
Reliability Advantages
Mature codebase: hping3 has been battle tested for decades
Active community: Well documented with extensive community support
Cross platform: Same tool works on Linux, BSD, and macOS
macOS Installation and Setup
Installing hping3
# Using Homebrew (recommended)
brew install hping
# Verify installation
which hping3
hping3 --version
Firewall Configuration
macOS firewall may need configuration for raw packet injection:
Open System Preferences > Security & Privacy > Firewall
Click “Firewall Options”
Add Python to allowed applications
Grant network access when prompted
Alternatively, for testing environments:
# Temporarily disable firewall (not recommended for production)
sudo /usr/libexec/ApplicationFirewall/socketfilterfw --setglobalstate off
# Re-enable after testing
sudo /usr/libexec/ApplicationFirewall/socketfilterfw --setglobalstate on
Network Interfaces
List available network interfaces:
ifconfig
Common macOS interfaces:
en0: Primary Ethernet/WiFi
en1: Secondary network interface
lo0: Loopback interface
bridge0: Bridged interface (if using virtualization)
Best Practices for Testing
Start with staging/test environments: Never run aggressive tests against production without authorization
Coordinate with your team: Inform security and operations teams before testing
Monitor server metrics: Watch CPU, memory, and connection counts during tests
Test during low traffic periods: Minimize impact on real users if testing production
Gradual escalation: Start with conservative parameters and increase gradually
Document results: Keep records of test results and any configuration changes
Have rollback plans: Be prepared to quickly disable testing if issues arise
Troubleshooting
Packets Not Being Sent
Firewall blocking: Temporarily disable firewall or add exception
Interface not active: Check ifconfig output
Permission issues: Ensure running with sudo
Wrong interface: Specify the interface explicitly with hping3's -I flag
Low Packet Rate
Performance optimization tips:
Use wired Ethernet instead of WiFi
Close other network intensive applications
Reduce packet rate target with --packetrate
Use smaller CIDR blocks
Monitoring Your Tests
Using tcpdump
Monitor packets in real time:
# Watch SYN packets
sudo tcpdump -i en0 'tcp[tcpflags] & tcp-syn != 0' -n
# Watch RST packets
sudo tcpdump -i en0 'tcp[tcpflags] & tcp-rst != 0' -n
# Watch specific host and port
sudo tcpdump -i en0 host example.com and port 443 -n
# Save to file for later analysis
sudo tcpdump -i en0 -w test_capture.pcap host example.com
Using Wireshark
For detailed packet analysis:
# Install Wireshark
brew install --cask wireshark
# Run Wireshark
sudo wireshark
# Or use tshark for command line
tshark -i en0 -f "host example.com"
Activity Monitor
Monitor system resources during testing:
Open Activity Monitor (Applications > Utilities > Activity Monitor)
Select “Network” tab
Watch “Packets in” and “Packets out”
Monitor “Data sent/received”
Check CPU usage of Python process
Server Side Monitoring
On your target server, monitor:
# Connection states
netstat -an | grep :443 | awk '{print $6}' | sort | uniq -c
# Active connections count
netstat -an | grep ESTABLISHED | wc -l
# SYN_RECV connections
netstat -an | grep SYN_RECV | wc -l
# System resources (Linux; on macOS use: top -l 1 | head -10)
top -bn1 | head -10
Understanding IP Spoofing with hping3
How It Works
hping3 creates raw packets at the network layer, allowing you to specify arbitrary source IP addresses. This bypasses normal TCP/IP stack restrictions.
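As a hedged illustration (203.0.113.10 is a placeholder source address and example.com a placeholder target; only test hosts you are authorized to hit), a spoofed SYN stream to port 443 at roughly 100 packets per second looks like this:
# Fixed spoofed source address, one packet every 10,000 microseconds
sudo hping3 -S -p 443 -a 203.0.113.10 -i u10000 example.com
# Or randomize the source address per packet
sudo hping3 -S -p 443 --rand-source -i u10000 example.com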
Network Requirements
For IP spoofing to work effectively:
Local networks: Works best on LANs you control
Direct routing: Requires direct layer 2 access
No NAT interference: NAT devices may rewrite source addresses
Router configuration: Some routers filter spoofed packets (BCP 38)
Testing Without Spoofing
If IP spoofing is not working in your environment:
# Test without CIDR block
sudo python3 http2rapidresettester_macos.py --host example.com --packets 1000
# This still validates:
# - Rate limiting configuration
# - Stream management
# - Server resilience
# - Resource consumption patterns
The HTTP/2 Rapid Reset vulnerability represents a significant threat to web infrastructure, but with proper patching, configuration, and monitoring, you can effectively protect your systems. This macOS optimized testing script using hping3 allows you to validate your defenses in a controlled manner with reliable IP spoofing capabilities across different network environments.
Remember that security is an ongoing process. Regularly:
Update your web server and proxy software
Review and adjust HTTP/2 configuration limits
Monitor for unusual traffic patterns
Test your defenses against emerging threats
By staying vigilant and proactive, you can maintain a resilient web presence capable of withstanding sophisticated DDoS attacks.
This blog post and testing script are provided for educational and defensive security purposes only. Always obtain proper authorization before testing systems you do not own.
NMAP (Network Mapper) is one of the most powerful and versatile network scanning tools available for security professionals, system administrators, and ethical hackers. When combined with Claude through the Model Context Protocol (MCP), it becomes an even more powerful tool, allowing you to leverage AI to intelligently analyze scan results, suggest scanning strategies, and interpret complex network data.
In this deep dive, we’ll explore how to set up NMAP with Claude Desktop using an MCP server, and demonstrate 20+ comprehensive vulnerability checks and reconnaissance techniques you can perform using natural language prompts.
Legal Disclaimer: Only scan systems and networks you own or have explicit written permission to test. Unauthorized scanning may be illegal in your jurisdiction.
Prerequisites
macOS, Linux, or Windows with WSL
Basic understanding of networking concepts
Permission to scan target systems
Claude Desktop installed
Part 1: Installation and Setup
Step 1: Install NMAP
On macOS:
# Using Homebrew
brew install nmap
# Verify installation
nmap --version
On Linux (Ubuntu/Debian):
sudo apt update && sudo apt install -y nmap
Step 2: Install Node.js (Required for MCP Server)
The NMAP MCP server requires Node.js to run.
macOS:
brew install node
node --version
npm --version
Step 3: Install the NMAP MCP Server
The most popular NMAP MCP server is available on GitHub. We'll clone it and build it locally:
cd ~/
rm -rf nmap-mcp-server
git clone https://github.com/PhialsBasement/nmap-mcp-server.git
cd nmap-mcp-server
npm install
npm run build
Step 4: Configure Claude Desktop
Edit the Claude Desktop configuration file to add the NMAP MCP server. On macOS it lives at ~/Library/Application Support/Claude/claude_desktop_config.json. A small Python heredoc (a sketch; the built server entry point may differ for your checkout) can back it up and patch it:
python3 << 'EOF'
import json, os, shutil
config_file = os.path.expanduser("~/Library/Application Support/Claude/claude_desktop_config.json")
shutil.copy(config_file, config_file + ".backup")
with open(config_file) as f:
    config = json.load(f)
# Assumed entry point; adjust to wherever "npm run build" placed the server
config.setdefault("mcpServers", {})["nmap"] = {
    "command": "node",
    "args": [os.path.expanduser("~/nmap-mcp-server/dist/index.js")]
}
with open(config_file, 'w') as f:
    json.dump(config, f, indent=2)
print("nmap server added to Claude Desktop config!")
print(f"Backup saved to: {config_file}.backup")
EOF
Step 5: Restart Claude Desktop
Close and reopen Claude Desktop. You should see the NMAP MCP server connected in the bottom-left corner.
Part 2: Understanding NMAP MCP Capabilities
Once configured, Claude can execute NMAP scans through the MCP server. The server typically provides:
Host discovery scans
Port scanning (TCP/UDP)
Service version detection
OS detection
Script scanning (NSE – NMAP Scripting Engine)
Output parsing and interpretation
Part 3: 20 Most Common Vulnerability Checks
For these examples, we’ll use a hypothetical target domain: example-target.com (replace with your authorized target).
1. Basic Host Discovery and Open Ports
Prompt:
Scan example-target.com to discover if the host is up and identify all open ports (1-1000). Use a TCP SYN scan for speed.
What this does: Performs a fast SYN scan on the first 1000 ports to quickly identify open services.
Expected NMAP command:
nmap -sS -p 1-1000 example-target.com
2. Comprehensive Port Scan (All 65535 Ports)
Prompt:
Perform a comprehensive scan of all 65535 TCP ports on example-target.com to identify any services running on non-standard ports.
What this does: Scans every possible TCP port – time-consuming but thorough.
Expected NMAP command:
nmap -p- example-target.com
3. Service Version Detection
Prompt:
Scan the top 1000 ports on example-target.com and detect the exact versions of services running on open ports. This will help identify outdated software.
What this does: Probes open ports to determine service/version info, crucial for finding known vulnerabilities.
Expected NMAP command:
nmap -sV example-target.com
4. Operating System Detection
Prompt:
Detect the operating system running on example-target.com using TCP/IP stack fingerprinting. Include OS detection confidence levels.
What this does: Analyzes network responses to guess the target OS.
Expected NMAP command:
nmap -O example-target.com
5. Aggressive Scan (OS + Version + Scripts + Traceroute)
Prompt:
Run an aggressive scan on example-target.com that includes OS detection, version detection, script scanning, and traceroute. This is comprehensive but noisy.
What this does: Combines multiple detection techniques for maximum information.
Expected NMAP command:
nmap -A example-target.com
6. Vulnerability Scanning with NSE Scripts
Prompt:
Scan example-target.com using NMAP's vulnerability detection scripts to check for known CVEs and security issues in running services.
What this does: Uses NSE scripts from the ‘vuln’ category to detect known vulnerabilities.
Expected NMAP command:
nmap --script vuln example-target.com
7. SSL/TLS Security Analysis
Prompt:
Analyze SSL/TLS configuration on example-target.com (port 443). Check for weak ciphers, certificate issues, and SSL vulnerabilities like Heartbleed and POODLE.
What this does: Comprehensive SSL/TLS security assessment.
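Expected NMAP command (one reasonable combination; the exact NSE scripts can vary):
nmap -p 443 --script ssl-enum-ciphers,ssl-cert,ssl-heartbleed,ssl-poodle example-target.com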
Part 4: Deep Dive Exercises
Deep Dive Exercise 1: Complete Web Application Security Assessment
Scenario: You need to perform a comprehensive security assessment of a web application running at webapp.example-target.com.
Claude Prompt:
I need a complete security assessment of webapp.example-target.com. Please:
1. First, discover all open ports and running services
2. Identify the web server software and version
3. Check for SSL/TLS vulnerabilities and certificate issues
4. Test for common web vulnerabilities (XSS, SQLi, CSRF)
5. Check security headers (CSP, HSTS, X-Frame-Options, etc.)
6. Enumerate web directories and interesting files
7. Test for backup file exposure (.bak, .old, .zip)
8. Check for sensitive information in robots.txt and sitemap.xml
9. Test HTTP methods for dangerous verbs (PUT, DELETE, TRACE)
10. Provide a prioritized summary of findings with remediation advice
Use timing template T3 (normal) to avoid overwhelming the target.
What Claude will do:
Claude will execute multiple NMAP scans in sequence, starting with discovery and progressively getting more detailed. Example commands it might run:
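A plausible progression (the exact NSE scripts and ordering will vary with what the discovery phase finds):
nmap -sS -sV -T3 webapp.example-target.com
nmap -p 443 --script ssl-enum-ciphers,ssl-cert -T3 webapp.example-target.com
nmap -p 80,443 --script http-enum,http-methods,http-headers,http-robots.txt -T3 webapp.example-target.com
nmap -p 80,443 --script http-backup-finder -T3 webapp.example-target.com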
Learning Outcomes:
How to interpret multiple scan results holistically
Prioritization of security findings by severity
Claude’s ability to correlate findings across multiple scans
Deep Dive Exercise 2: Network Perimeter Reconnaissance
Scenario: You’re assessing the security perimeter of an organization with the domain company.example-target.com and a known IP range 198.51.100.0/24.
Claude Prompt:
Perform comprehensive network perimeter reconnaissance for company.example-target.com (IP range 198.51.100.0/24). I need to:
1. Discover all live hosts in the IP range
2. For each live host, identify:
- Operating system
- All open ports (full 65535 range)
- Service versions
- Potential vulnerabilities
3. Map the network topology and identify:
- Firewalls and filtering
- DMZ hosts vs internal hosts
- Critical infrastructure (DNS, mail, web servers)
4. Test for common network misconfigurations:
- Open DNS resolvers
- Open mail relays
- Unauthenticated database access
- Unencrypted management protocols (Telnet, FTP)
5. Provide a network map and executive summary
Use slow timing (T2) to minimize detection risk and avoid false positives.
What Claude will do:
# Phase 1: Host Discovery
nmap -sn -T2 198.51.100.0/24
# Phase 2: OS Detection on Live Hosts
nmap -O -T2 198.51.100.0/24
# Phase 3: Comprehensive Port Scan (may suggest splitting into chunks)
nmap -p- -T2 198.51.100.0/24
# Phase 4: Service Version Detection
nmap -sV -T2 198.51.100.0/24
# Phase 5: Specific Service Checks
nmap -p 53 --script dns-recursion 198.51.100.0/24
nmap -p 25 --script smtp-open-relay 198.51.100.0/24
nmap -p 3306,5432,27017 --script mysql-empty-password,pgsql-brute,mongodb-databases 198.51.100.0/24
nmap -p 23,21 198.51.100.0/24
# Phase 6: Vulnerability Scanning on Critical Hosts
nmap --script vuln -T2 [critical-hosts]
Learning Outcomes:
Large-scale network scanning strategies
How to handle and analyze results from multiple hosts
Network segmentation analysis
Risk assessment across an entire network perimeter
Understanding firewall and filtering detection
Deep Dive Exercise 3: Advanced Vulnerability Research – Zero-Day Hunting
Scenario: You’ve discovered a host running potentially vulnerable services and want to do deep reconnaissance to identify potential zero-day vulnerabilities or chained exploits.
Claude Prompt:
I've found a server at secure-server.example-target.com that's running multiple services. I need advanced vulnerability research:
1. Aggressive version fingerprinting of all services
2. Check for version-specific CVEs in detected software
3. Look for unusual port combinations that might indicate custom applications
4. Test for default credentials on all identified services
5. Check for known backdoors in the detected software versions
6. Test for authentication bypass vulnerabilities
7. Look for information disclosure issues (version strings, debug info, error messages)
8. Test for timing attacks and race conditions
9. Analyze for possible exploit chains (e.g., LFI -> RCE)
10. Provide detailed analysis with CVSS scores and exploit availability
Run this aggressively (-T4) as we have permission for intensive testing.
What Claude will do:
Cross-reference detected versions with CVE databases
Explain potential exploit chains
Provide PoC (Proof of Concept) suggestions
Recommend remediation priorities
Suggest additional manual testing techniques
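A few opening commands for this kind of assessment might look like the following (illustrative; the exact scripts and ports depend on what is exposed):
nmap -sV --version-intensity 9 -T4 secure-server.example-target.com
nmap -p- -sV -T4 secure-server.example-target.com
nmap --script "vuln,auth,default" -T4 secure-server.example-target.com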
Learning Outcomes:
Advanced NSE scripting capabilities
How to correlate vulnerabilities for exploit chains
Understanding vulnerability severity and exploitability
Version-specific vulnerability research
Claude’s ability to provide context from its training data about specific CVEs
Part 5: Wide-Ranging Reconnaissance Exercises
Exercise 5.1: Subdomain Discovery and Mapping
Prompt:
Help me discover all subdomains of example-target.com and create a complete map of their infrastructure. For each subdomain found:
- Resolve its IP addresses
- Check if it's hosted on the same infrastructure
- Identify the services running
- Note any interesting or unusual findings
Also check for common subdomain patterns like api, dev, staging, admin, etc.
What this reveals: Shadow IT, forgotten dev servers, API endpoints, and the organization’s infrastructure footprint.
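One NSE building block Claude may lean on here is dns-brute, which guesses common subdomain names against the target's DNS (wordlist based, so coverage varies and results should be verified):
nmap --script dns-brute example-target.com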
Exercise 5.2: API Security Testing
Prompt:
I've found an API at api.example-target.com. Please:
1. Identify the API type (REST, GraphQL, SOAP)
2. Discover all available endpoints
3. Test authentication mechanisms
4. Check for rate limiting
5. Test for IDOR (Insecure Direct Object References)
6. Look for excessive data exposure
7. Test for injection vulnerabilities
8. Check API versioning and test old versions for vulnerabilities
9. Verify CORS configuration
10. Test for JWT vulnerabilities if applicable
Exercise 5.3: Cloud Infrastructure Detection
Prompt:
Scan example-target.com to identify if they're using cloud infrastructure (AWS, Azure, GCP). Look for:
- Cloud-specific IP ranges
- S3 buckets or blob storage
- Cloud-specific services (CloudFront, Azure CDN, etc.)
- Misconfigured cloud resources
- Storage bucket permissions
- Cloud metadata services exposure
Exercise 5.4: IoT and Embedded Device Discovery
Prompt:
Scan the network 192.168.1.0/24 for IoT and embedded devices such as:
- IP cameras
- Smart TVs
- Printers
- Network attached storage (NAS)
- Home automation systems
- Industrial control systems (ICS/SCADA if applicable)
Check each device for:
- Default credentials
- Outdated firmware
- Unencrypted communications
- Exposed management interfaces
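A reasonable opening sweep targets the ports such devices commonly expose (the port list below is illustrative, not exhaustive):
# 554 = RTSP (cameras), 631/9100 = IPP/JetDirect (printers), 8080 = embedded web UIs
nmap -sV -p 80,443,554,631,8080,9100 192.168.1.0/24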
Exercise 5.5: Checking for Known Vulnerabilities and Old Software
Prompt:
Perform a comprehensive audit of example-target.com focusing on outdated and vulnerable software:
1. Detect exact versions of all running services
2. For each service, check if it's end-of-life (EOL)
3. Identify known CVEs for each version detected
4. Prioritize findings by:
- CVSS score
- Exploit availability
- Exposure (internet-facing vs internal)
5. Check for:
- Outdated TLS/SSL versions
- Deprecated cryptographic algorithms
- Unpatched web frameworks
- Old CMS versions (WordPress, Joomla, Drupal)
- Legacy protocols (SSLv3, TLS 1.0, weak ciphers)
6. Generate a remediation roadmap with version upgrade recommendations
Expected output:
A table of detected software with current versions and latest versions
CVE listings with severity scores
Specific upgrade recommendations
Risk assessment for each finding
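If your NMAP install includes the vulners NSE script (bundled with recent releases), a single command gives a useful starting point by mapping detected service versions to published CVEs:
nmap -sV --script vulners example-target.com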
Part 6: Advanced Tips and Techniques
6.1 Optimizing Scan Performance
Timing Templates:
-T0 (Paranoid): Extremely slow, for IDS evasion
-T1 (Sneaky): Slow, minimal detection risk
-T2 (Polite): Slower, less bandwidth intensive
-T3 (Normal): Default, balanced approach
-T4 (Aggressive): Faster, assumes good network
-T5 (Insane): Extremely fast, may miss results
Prompt:
Explain when to use each NMAP timing template and demonstrate the difference by scanning example-target.com with T2 and T4 timing.
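Behind that prompt, the two comparison scans would look roughly like this (port selection kept small so the demo finishes quickly):
nmap -sS --top-ports 100 -T2 example-target.com
nmap -sS --top-ports 100 -T4 example-target.com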
6.2 Evading Firewalls and IDS
Prompt:
Scan example-target.com using techniques to evade firewalls and intrusion detection systems:
- Fragment packets
- Use decoy IP addresses
- Randomize scan order
- Use idle scan if possible
- Spoof MAC address (if on local network)
- Use source port 53 or 80 to bypass egress filtering
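For reference, a hand-built NMAP invocation combining several of these evasion options might look like this (the decoy count, source port, and local address are arbitrary placeholders):
nmap -sS -f -D RND:10 -g 53 --randomize-hosts -T2 example-target.com
nmap -sS --spoof-mac 0 -T2 192.168.1.50   # MAC spoofing only applies on the local network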
6.3 Custom NSE Script Development
Prompt:
Help me create a custom NSE script that checks for a specific vulnerability in our custom application running on port 8080. The vulnerability is that the /debug endpoint returns sensitive configuration data without authentication.
Claude can help you write Lua scripts for NMAP’s scripting engine!
6.4 Output Parsing and Reporting
Prompt:
Scan example-target.com and save results in all available formats (normal, XML, grepable, script kiddie). Then help me parse the XML output to extract just the critical and high severity findings for a report.
Expected command (-oA writes the normal, XML, and grepable formats; script-kiddie output needs a separate -oS run):
nmap -oA scan_results example-target.com
Claude can then help you parse the XML file programmatically.
Part 7: Responsible Disclosure and Next Steps
After Finding Vulnerabilities
Document everything: Keep detailed records of your findings
Prioritize by risk: Use CVSS scores and business impact
Responsible disclosure: Follow the organization’s security policy
Remediation tracking: Help create an action plan
Verify fixes: Re-test after patches are applied
Using Claude for Post-Scan Analysis
Prompt:
I've completed my NMAP scans and found 15 vulnerabilities. Here are the results: [paste scan output].
Please:
1. Categorize by severity (Critical, High, Medium, Low, Info)
2. Explain each vulnerability in business terms
3. Provide remediation steps for each
4. Suggest a remediation priority order
5. Draft an executive summary for management
6. Create technical remediation tickets for the engineering team
Claude excels at translating technical scan results into actionable business intelligence.
Part 8: Continuous Monitoring with NMAP and Claude
Set up regular scanning routines and use Claude to track changes:
Prompt:
Create a baseline scan of example-target.com and save it. Then help me set up a cron job (or scheduled task) to run weekly scans and alert me to any changes in:
- New open ports
- Changed service versions
- New hosts discovered
- Changes in vulnerabilities detected
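Under the hood, a minimal baseline-and-compare loop can be built from NMAP's XML output and the ndiff utility that ships with NMAP (file names and the weekly schedule are placeholders):
# One-time baseline
nmap -oX baseline.xml example-target.com
# Weekly re-scan and comparison (run from cron, launchd, or a scheduled task)
nmap -oX current.xml example-target.com
ndiff baseline.xml current.xml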
Conclusion
Combining NMAP’s powerful network scanning capabilities with Claude’s AI-driven analysis creates a formidable security assessment toolkit. The Model Context Protocol bridges these tools seamlessly, allowing you to:
Express complex scanning requirements in natural language
Get intelligent interpretation of scan results
Receive contextual security advice
Automate repetitive reconnaissance tasks
Learn security concepts through interactive exploration
Key Takeaways:
Always get permission before scanning any network or system
Start with gentle scans and progressively get more aggressive
Use timing controls to avoid overwhelming targets or triggering alarms
Correlate multiple scans for a complete security picture
Leverage Claude’s knowledge to interpret results and suggest next steps
Document everything for compliance and knowledge sharing
Keep NMAP updated to benefit from the latest scripts and capabilities
The examples provided in this guide demonstrate just a fraction of what’s possible when combining NMAP with AI assistance. As you become more comfortable with this workflow, you’ll discover new ways to leverage Claude’s understanding to make your security assessments more efficient and comprehensive.
About the Author: This guide was created to help security professionals and system administrators leverage AI assistance for more effective network reconnaissance and vulnerability assessment.
Modern sites often block plain curl. Using a real browser engine (Chromium via Playwright) gives you true browser behavior: real TLS/HTTP2 stack, cookies, redirects, and JavaScript execution if needed. This post mirrors the functionality of the original browser_curl.sh wrapper but implemented with Playwright. It also includes an optional Selenium mini-variant at the end.
What this tool does
Sends realistic browser headers (Chrome-like)
Uses Chromium’s real network stack (HTTP/2, compression)
Manages cookies (persist to a file)
Follows redirects by default
Supports JSON and form POSTs
Async mode that returns immediately
--count N to dispatch N async requests for quick load tests
Note: Advanced bot defenses (CAPTCHAs, JS/ML challenges, strict TLS/HTTP2 fingerprinting) may still require full page automation and real user-like behavior. Playwright can do that too by driving real pages.
Setup
Run these once to install Playwright and its Chromium build:
npm install playwright
npx playwright install chromium
Batch scraping multiple pages:
cat > pw_scrape.sh << 'EOF'
#!/usr/bin/env bash
URLS=(
"https://example.com/"
"https://example.com/"
"https://example.com/"
)
for url in "${URLS[@]}"; do
echo "Fetching: $url"
node browser_playwright.mjs -o "$(echo "$url" | sed 's#[/:]#_#g').html" "$url"
sleep 2
done
EOF
chmod +x pw_scrape.sh
./pw_scrape.sh
Health check monitoring:
cat > pw_health.sh << 'EOF'
#!/usr/bin/env bash
ENDPOINT="${1:-https://httpbin.org/status/200}"
while true; do
if node browser_playwright.mjs "$ENDPOINT" >/dev/null 2>&1; then
echo "$(date): Service healthy"
else
echo "$(date): Service unhealthy"
fi
sleep 30
done
EOF
chmod +x pw_health.sh
./pw_health.sh
Troubleshooting
Hanging or quoting issues: ensure your shell quoting is balanced. Prefer simple commands without complex inline quoting.
Verbose mode too noisy: omit -v in production.
Cookie file format: the script writes Playwright storageState JSON. It’s safe to keep or delete.
403 errors: site uses stronger protections. Drive a real page (Playwright page.goto) and interact, or solve CAPTCHAs where required.
Performance notes
Dispatch time depends on process spawn and Playwright startup. For higher throughput, consider reusing the same Node process to issue many requests (modify the script to loop internally) or use k6/Locust/Artillery for large-scale load testing.
Limitations
This CLI uses Playwright’s HTTP client bound to a Chromium context. It is much closer to real browsers than curl, but some advanced fingerprinting still detects automation.
WebSocket flows, MFA, or complex JS challenges generally require full page automation (which Playwright supports).
When to use what
Use this Playwright CLI when you need realistic browser behavior, cookies, and straightforward HTTP requests with quick async dispatch.
Use full Playwright page automation for dynamic content, complex logins, CAPTCHAs, and JS-heavy sites.
Selenium mini-variant
If you prefer Selenium, here's a minimal GET/headers/redirect/cookie-capable script. Note: issuing cross-origin POST bodies is more ergonomic with Playwright's request client; Selenium focuses on page automation.
You now have a Playwright-powered CLI that mirrors the original curl-wrapper’s ergonomics but uses a real browser engine, plus a minimal Selenium alternative. Use the CLI for realistic headers, cookies, redirects, JSON/form POSTs, and async dispatch with --count. For tougher sites, scale up to full page automation with Playwright.
Modern websites deploy bot defenses that can block plain curl or naive scripts. In many cases, adding the right browser-like headers, HTTP/2, cookie persistence, and compression gets you past basic filters without needing a full browser.
This post walks through a small shell utility, browser_curl.sh, that wraps curl with realistic browser behavior. It also supports “fire-and-forget” async requests and a --count flag to dispatch many requests at once for quick load tests.
What this script does
Sends browser-like headers (Chrome on macOS)
Uses HTTP/2 and compression
Manages cookies automatically (cookie jar)
Follows redirects by default
Supports JSON and form POSTs
Async mode that returns immediately
--count N to dispatch N async requests in one command
Note: This approach won’t solve advanced bot defenses that require JavaScript execution (e.g., Cloudflare Turnstile/CAPTCHAs or TLS/HTTP2 fingerprinting); for that, use a real browser automation tool like Playwright or Selenium.
The complete script
Save this as browser_curl.sh and make it executable in one command:
cat > browser_curl.sh << 'EOF' && chmod +x browser_curl.sh
#!/bin/bash
# browser_curl.sh - Advanced curl wrapper that mimics browser behavior
# Designed to bypass Cloudflare and other bot protection
# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m' # No Color
# Default values
METHOD="GET"
ASYNC=false
COUNT=1
FOLLOW_REDIRECTS=true
SHOW_HEADERS=false
OUTPUT_FILE=""
TIMEOUT=30
DATA=""
CONTENT_TYPE=""
COOKIE_FILE="/tmp/browser_curl_cookies_$$.txt"
VERBOSE=false
# Browser fingerprint (Chrome on macOS)
USER_AGENT="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
usage() {
cat << EOH
Usage: $(basename "$0") [OPTIONS] URL
Advanced curl wrapper that mimics browser behavior to bypass bot protection.
OPTIONS:
-X, --method METHOD HTTP method (GET, POST, PUT, DELETE, etc.) [default: GET]
-d, --data DATA POST/PUT data
-H, --header HEADER Add custom header (can be used multiple times)
-o, --output FILE Write output to file
-c, --cookie FILE Use custom cookie file [default: temp file]
-A, --user-agent UA Custom user agent [default: Chrome on macOS]
-t, --timeout SECONDS Request timeout [default: 30]
--async Run request asynchronously in background
--count N Number of async requests to fire [default: 1, requires --async]
--no-redirect Don't follow redirects
--show-headers Show response headers
--json Send data as JSON (sets Content-Type)
--form Send data as form-urlencoded
-v, --verbose Verbose output
-h, --help Show this help message
EXAMPLES:
# Simple GET request
$(basename "$0") https://example.com
# Async GET request
$(basename "$0") --async https://example.com
# POST with JSON data
$(basename "$0") -X POST --json -d '{"username":"test"}' https://api.example.com/login
# POST with form data
$(basename "$0") -X POST --form -d "username=test&password=secret" https://example.com/login
# Multiple async requests (using loop)
for i in {1..10}; do
$(basename "$0") --async https://example.com/api/endpoint
done
# Multiple async requests (using --count)
$(basename "$0") --async --count 10 https://example.com/api/endpoint
EOH
exit 0
}
# Parse arguments
EXTRA_HEADERS=()
URL=""
while [[ $# -gt 0 ]]; do
case $1 in
-X|--method)
METHOD="$2"
shift 2
;;
-d|--data)
DATA="$2"
shift 2
;;
-H|--header)
EXTRA_HEADERS+=("$2")
shift 2
;;
-o|--output)
OUTPUT_FILE="$2"
shift 2
;;
-c|--cookie)
COOKIE_FILE="$2"
shift 2
;;
-A|--user-agent)
USER_AGENT="$2"
shift 2
;;
-t|--timeout)
TIMEOUT="$2"
shift 2
;;
--async)
ASYNC=true
shift
;;
--count)
COUNT="$2"
shift 2
;;
--no-redirect)
FOLLOW_REDIRECTS=false
shift
;;
--show-headers)
SHOW_HEADERS=true
shift
;;
--json)
CONTENT_TYPE="application/json"
shift
;;
--form)
CONTENT_TYPE="application/x-www-form-urlencoded"
shift
;;
-v|--verbose)
VERBOSE=true
shift
;;
-h|--help)
usage
;;
*)
if [[ -z "$URL" ]]; then
URL="$1"
else
echo -e "${RED}Error: Unknown argument '$1'${NC}" >&2
exit 1
fi
shift
;;
esac
done
# Validate URL
if [[ -z "$URL" ]]; then
echo -e "${RED}Error: URL is required${NC}" >&2
usage
fi
# Validate count
if [[ "$COUNT" -gt 1 ]] && [[ "$ASYNC" == false ]]; then
echo -e "${RED}Error: --count requires --async${NC}" >&2
exit 1
fi
if ! [[ "$COUNT" =~ ^[0-9]+$ ]] || [[ "$COUNT" -lt 1 ]]; then
echo -e "${RED}Error: --count must be a positive integer${NC}" >&2
exit 1
fi
# Execute curl
execute_curl() {
# Build curl arguments as array instead of string
local -a curl_args=()
# Basic options
curl_args+=("--compressed")
curl_args+=("--max-time" "$TIMEOUT")
curl_args+=("--connect-timeout" "10")
curl_args+=("--http2")
# Cookies (create the file if missing, without truncating previously saved cookies)
touch "$COOKIE_FILE" 2>/dev/null || true
curl_args+=("--cookie" "$COOKIE_FILE")
curl_args+=("--cookie-jar" "$COOKIE_FILE")
# Follow redirects
if [[ "$FOLLOW_REDIRECTS" == true ]]; then
curl_args+=("--location")
fi
# Show headers
if [[ "$SHOW_HEADERS" == true ]]; then
curl_args+=("--include")
fi
# Output file
if [[ -n "$OUTPUT_FILE" ]]; then
curl_args+=("--output" "$OUTPUT_FILE")
fi
# Verbose
if [[ "$VERBOSE" == true ]]; then
curl_args+=("--verbose")
else
curl_args+=("--silent" "--show-error")
fi
# Method
curl_args+=("--request" "$METHOD")
# Browser-like headers
curl_args+=("--header" "User-Agent: $USER_AGENT")
curl_args+=("--header" "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8")
curl_args+=("--header" "Accept-Language: en-US,en;q=0.9")
curl_args+=("--header" "Accept-Encoding: gzip, deflate, br")
curl_args+=("--header" "Connection: keep-alive")
curl_args+=("--header" "Upgrade-Insecure-Requests: 1")
curl_args+=("--header" "Sec-Fetch-Dest: document")
curl_args+=("--header" "Sec-Fetch-Mode: navigate")
curl_args+=("--header" "Sec-Fetch-Site: none")
curl_args+=("--header" "Sec-Fetch-User: ?1")
curl_args+=("--header" "Cache-Control: max-age=0")
# Content-Type for POST/PUT
if [[ -n "$DATA" ]]; then
if [[ -n "$CONTENT_TYPE" ]]; then
curl_args+=("--header" "Content-Type: $CONTENT_TYPE")
fi
curl_args+=("--data" "$DATA")
fi
# Extra headers
for header in "${EXTRA_HEADERS[@]}"; do
curl_args+=("--header" "$header")
done
# URL
curl_args+=("$URL")
if [[ "$ASYNC" == true ]]; then
# Run asynchronously in background
if [[ "$VERBOSE" == true ]]; then
echo -e "${YELLOW}[ASYNC] Running $COUNT request(s) in background...${NC}" >&2
echo -e "${YELLOW}Command: curl ${curl_args[*]}${NC}" >&2
fi
# Fire multiple requests if count > 1
local pids=()
for ((i=1; i<=COUNT; i++)); do
# Run in background detached, suppress all output
nohup curl "${curl_args[@]}" >/dev/null 2>&1 &
local pid=$!
disown $pid
pids+=("$pid")
done
if [[ "$COUNT" -eq 1 ]]; then
echo -e "${GREEN}[ASYNC] Request started with PID: ${pids[0]}${NC}" >&2
else
echo -e "${GREEN}[ASYNC] $COUNT requests started with PIDs: ${pids[*]}${NC}" >&2
fi
else
# Run synchronously
if [[ "$VERBOSE" == true ]]; then
echo -e "${YELLOW}Command: curl ${curl_args[*]}${NC}" >&2
fi
curl "${curl_args[@]}"
local exit_code=$?
if [[ $exit_code -ne 0 ]]; then
echo -e "${RED}[ERROR] Request failed with exit code: $exit_code${NC}" >&2
return $exit_code
fi
fi
}
# Cleanup temp cookie file on exit (only if using default temp file)
cleanup() {
if [[ "$COOKIE_FILE" == "/tmp/browser_curl_cookies_$$"* ]] && [[ -f "$COOKIE_FILE" ]]; then
rm -f "$COOKIE_FILE"
fi
}
# Only set cleanup trap for synchronous requests
if [[ "$ASYNC" == false ]]; then
trap cleanup EXIT
fi
# Main execution
execute_curl
# For async requests, exit immediately without waiting
if [[ "$ASYNC" == true ]]; then
exit 0
fi
EOF
Usage Examples
Example 1: Login session with persistent cookies
COOKIE_FILE="session_cookies.txt"
# Login and save cookies
./browser_curl.sh -c "$COOKIE_FILE" \
-X POST --form \
-d "user=test&pass=secret" \
https://example.com/login
# Authenticated request using saved cookies
./browser_curl.sh -c "$COOKIE_FILE" \
https://example.com/dashboard
Example 2: Scraping multiple pages with rate limiting
#!/bin/bash
URLS=(
"https://example.com/page1"
"https://example.com/page2"
"https://example.com/page3"
)
for url in "${URLS[@]}"; do
echo "Fetching: $url"
./browser_curl.sh -o "$(basename "$url").html" "$url"
sleep 2 # Rate limiting
done
Example 3: Health check monitoring
#!/bin/bash
ENDPOINT="https://api.example.com/health"
while true; do
if ./browser_curl.sh "$ENDPOINT" | grep -q "healthy"; then
echo "$(date): Service healthy"
else
echo "$(date): Service unhealthy"
fi
sleep 30
done
Installing browser_curl to your PATH
If you want browser_curl.sh to be available from anywhere, install it on your PATH:
mkdir -p ~/.local/bin
echo "Installing browser_curl to ~/.local/bin/browser_curl"
install -m 0755 ~/Desktop/warp/browser_curl.sh ~/.local/bin/browser_curl
echo "Ensuring ~/.local/bin is on PATH via ~/.zshrc"
grep -q 'export PATH="$HOME/.local/bin:$PATH"' ~/.zshrc || \
echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.zshrc
echo "Reloading shell config (~/.zshrc)"
source ~/.zshrc
echo "Verifying browser_curl is on PATH"
command -v browser_curl && echo "browser_curl is installed and on PATH" || echo "browser_curl not found on PATH"
Troubleshooting
Issue: Hanging with dquote> prompt
Cause: Shell quoting issue (unbalanced quotes)
Solution: Use simple, direct commands
# Good
./browser_curl.sh --async https://example.com
# Bad (unbalanced quotes)
echo "test && ./browser_curl.sh --async https://example.com && echo "done"
Ideas for future enhancements:
Response validation – Assert status codes, content patterns
Metrics collection – Timing stats, success rates
Configuration file – Default settings per domain
Conclusion
browser_curl.sh provides a pragmatic middle ground between plain curl and full browser automation. For many APIs and websites with basic bot filters, browser-like headers, proper protocol use, and cookie handling are sufficient.
Key takeaways:
Simple wrapper around curl with realistic browser behavior
Async mode with --count for easy load testing
Works for basic bot detection, not advanced challenges
Combine with Playwright for tough targets
Lightweight and fast for everyday API work
The script is particularly useful for:
API development and testing
Quick load testing during development
Monitoring and health checks
Simple scraping tasks
Learning curl features
For production load testing at scale, consider tools like k6, Locust, or Artillery. For heavy web scraping with anti-bot measures, invest in proper browser automation infrastructure.
Annoyingly, Apple never quite made it easy to offload images from your iPhone to your MacBook. Below is a complete guide to automatically download photos and videos from your iPhone to your MacBook, with options to filter by pattern and date and to organize files into folders by creation date.
Run the script below to create the file download-iphone-media.sh in your current directory. It relies on libimobiledevice, ifuse, and (optionally) exiftool; see the Troubleshooting section below if those tools are not installed yet.
#!/bin/bash
cat > download-iphone-media.sh << 'OUTER_EOF'
#!/bin/bash
# iPhone Media Downloader
# Downloads photos and videos from iPhone to MacBook
# Supports resumable, idempotent downloads
set -e
# Default values
PATTERN="*"
OUTPUT_DIR="."
ORGANIZE_BY_DATE=false
START_DATE=""
END_DATE=""
MOUNT_POINT="/tmp/iphone_mount"
STATE_DIR=""
VERIFY_CHECKSUM=true
# Usage function
usage() {
cat << INNER_EOF  # unquoted so $0 expands to the script name in the help text
Usage: $0 [OPTIONS]
Download photos and videos from iPhone to MacBook.
OPTIONS:
-p PATTERN File pattern to match (e.g., "*.jpg", "*.mp4", "IMG_*")
Default: * (all files)
-o OUTPUT_DIR Output directory (default: current directory)
-d Organize files by creation date into YYYY/MMM folders
-s START_DATE Start date filter (YYYY-MM-DD)
-e END_DATE End date filter (YYYY-MM-DD)
-r Resume incomplete downloads (default: true)
-n Skip checksum verification (faster, less safe)
-h Show this help message
EXAMPLES:
# Download all photos and videos to current directory
$0
# Download only JPG files to ~/Pictures/iPhone
$0 -p "*.jpg" -o ~/Pictures/iPhone
# Download all media organized by date
$0 -d -o ~/Pictures/iPhone
# Download videos from specific date range
$0 -p "*.mov" -s 2025-01-01 -e 2025-01-31 -d -o ~/Videos/iPhone
# Download specific IMG files organized by date
$0 -p "IMG_*.{jpg,heic}" -d -o ~/Photos
INNER_EOF
exit 1
}
# Parse command line arguments
while getopts "p:o:ds:e:rnh" opt; do
case $opt in
p) PATTERN="$OPTARG" ;;
o) OUTPUT_DIR="$OPTARG" ;;
d) ORGANIZE_BY_DATE=true ;;
s) START_DATE="$OPTARG" ;;
e) END_DATE="$OPTARG" ;;
r) ;; # Resume is default, keeping for backward compatibility
n) VERIFY_CHECKSUM=false ;;
h) usage ;;
*) usage ;;
esac
done
# Create output directory if it doesn't exist
mkdir -p "$OUTPUT_DIR"
OUTPUT_DIR=$(cd "$OUTPUT_DIR" && pwd)
# Set up state directory for tracking downloads
STATE_DIR="$OUTPUT_DIR/.iphone_download_state"
mkdir -p "$STATE_DIR"
# Create mount point
mkdir -p "$MOUNT_POINT"
echo "=== iPhone Media Downloader ==="
echo "Pattern: $PATTERN"
echo "Output: $OUTPUT_DIR"
echo "Organize by date: $ORGANIZE_BY_DATE"
[ -n "$START_DATE" ] && echo "Start date: $START_DATE"
[ -n "$END_DATE" ] && echo "End date: $END_DATE"
echo ""
# Check if iPhone is connected
echo "Checking for iPhone connection..."
if ! ideviceinfo -s > /dev/null 2>&1; then
echo "Error: No iPhone detected. Please connect your iPhone and trust this computer."
exit 1
fi
# Mount iPhone
echo "Mounting iPhone..."
if ! ifuse "$MOUNT_POINT" 2>/dev/null; then
echo "Error: Failed to mount iPhone. Make sure you've trusted this computer on your iPhone."
exit 1
fi
# Cleanup function
cleanup() {
local exit_code=$?
echo ""
if [ $exit_code -ne 0 ]; then
echo "⚠ Download interrupted. Run the script again to resume."
fi
echo "Unmounting iPhone..."
umount "$MOUNT_POINT" 2>/dev/null || true
rmdir "$MOUNT_POINT" 2>/dev/null || true
}
trap cleanup EXIT
# Find DCIM folder
DCIM_PATH="$MOUNT_POINT/DCIM"
if [ ! -d "$DCIM_PATH" ]; then
echo "Error: DCIM folder not found on iPhone"
exit 1
fi
echo "Scanning for files matching pattern: $PATTERN"
echo ""
# Counter
TOTAL_FILES=0
COPIED_FILES=0
SKIPPED_FILES=0
RESUMED_FILES=0
FAILED_FILES=0
# Function to compute file checksum
compute_checksum() {
local file="$1"
if [ -f "$file" ]; then
shasum -a 256 "$file" 2>/dev/null | awk '{print $1}'
fi
}
# Function to get file size
get_file_size() {
local file="$1"
if [ -f "$file" ]; then
stat -f "%z" "$file" 2>/dev/null
fi
}
# Function to mark file as completed
mark_completed() {
local source_file="$1"
local dest_file="$2"
local checksum="$3"
local state_file="$STATE_DIR/$(echo "$source_file" | shasum -a 256 | awk '{print $1}')"
echo "$dest_file|$checksum|$(date +%s)" > "$state_file"
}
# Function to check if file was previously completed
is_completed() {
local source_file="$1"
local dest_file="$2"
local state_file="$STATE_DIR/$(echo "$source_file" | shasum -a 256 | awk '{print $1}')"
if [ ! -f "$state_file" ]; then
return 1
fi
# Read state file
local saved_dest saved_checksum saved_timestamp
IFS='|' read -r saved_dest saved_checksum saved_timestamp < "$state_file"
# Check if destination file exists and matches
if [ "$saved_dest" = "$dest_file" ] && [ -f "$dest_file" ]; then
if [ "$VERIFY_CHECKSUM" = true ]; then
local current_checksum=$(compute_checksum "$dest_file")
if [ "$current_checksum" = "$saved_checksum" ]; then
return 0
fi
else
# Without checksum verification, just check file exists
return 0
fi
fi
return 1
}
# Convert dates to timestamps for comparison
START_TIMESTAMP=""
END_TIMESTAMP=""
if [ -n "$START_DATE" ]; then
START_TIMESTAMP=$(date -j -f "%Y-%m-%d" "$START_DATE" "+%s" 2>/dev/null || echo "")
if [ -z "$START_TIMESTAMP" ]; then
echo "Error: Invalid start date format. Use YYYY-MM-DD"
exit 1
fi
fi
if [ -n "$END_DATE" ]; then
END_TIMESTAMP=$(date -j -f "%Y-%m-%d" "$END_DATE" "+%s" 2>/dev/null || echo "")
if [ -z "$END_TIMESTAMP" ]; then
echo "Error: Invalid end date format. Use YYYY-MM-DD"
exit 1
fi
# Add 24 hours to include the entire end date
END_TIMESTAMP=$((END_TIMESTAMP + 86400))
fi
# Process files (read from process substitution rather than a pipe so the
# counters updated inside the loop remain visible after it finishes)
while IFS= read -r file; do
filename=$(basename "$file")
# Check if filename matches pattern (basic glob matching)
if [[ ! "$filename" == $PATTERN ]]; then
continue
fi
TOTAL_FILES=$((TOTAL_FILES + 1))
# Get file creation date
if command -v exiftool > /dev/null 2>&1; then
# Try to get date from EXIF data
CREATE_DATE=$(exiftool -s3 -DateTimeOriginal -d "%Y-%m-%d %H:%M:%S" "$file" 2>/dev/null)
if [ -z "$CREATE_DATE" ]; then
# Fallback to file modification time
CREATE_DATE=$(stat -f "%Sm" -t "%Y-%m-%d %H:%M:%S" "$file" 2>/dev/null)
fi
else
# Use file modification time
CREATE_DATE=$(stat -f "%Sm" -t "%Y-%m-%d %H:%M:%S" "$file" 2>/dev/null)
fi
# Extract date components
if [ -n "$CREATE_DATE" ]; then
FILE_DATE=$(echo "$CREATE_DATE" | cut -d' ' -f1)
FILE_TIMESTAMP=$(date -j -f "%Y-%m-%d" "$FILE_DATE" "+%s" 2>/dev/null || echo "")
# Check date filters
if [ -n "$START_TIMESTAMP" ] && [ -n "$FILE_TIMESTAMP" ] && [ "$FILE_TIMESTAMP" -lt "$START_TIMESTAMP" ]; then
SKIPPED_FILES=$((SKIPPED_FILES + 1))
continue
fi
if [ -n "$END_TIMESTAMP" ] && [ -n "$FILE_TIMESTAMP" ] && [ "$FILE_TIMESTAMP" -ge "$END_TIMESTAMP" ]; then
SKIPPED_FILES=$((SKIPPED_FILES + 1))
continue
fi
# Determine output path with YYYY/MMM structure
if [ "$ORGANIZE_BY_DATE" = true ]; then
YEAR=$(echo "$FILE_DATE" | cut -d'-' -f1)
MONTH_NUM=$(echo "$FILE_DATE" | cut -d'-' -f2)
# Convert month number to 3-letter abbreviation
case "$MONTH_NUM" in
01) MONTH="Jan" ;;
02) MONTH="Feb" ;;
03) MONTH="Mar" ;;
04) MONTH="Apr" ;;
05) MONTH="May" ;;
06) MONTH="Jun" ;;
07) MONTH="Jul" ;;
08) MONTH="Aug" ;;
09) MONTH="Sep" ;;
10) MONTH="Oct" ;;
11) MONTH="Nov" ;;
12) MONTH="Dec" ;;
*) MONTH="Unknown" ;;
esac
DEST_DIR="$OUTPUT_DIR/$YEAR/$MONTH"
else
DEST_DIR="$OUTPUT_DIR"
fi
else
DEST_DIR="$OUTPUT_DIR"
fi
# Create destination directory
mkdir -p "$DEST_DIR"
# Determine destination path
DEST_PATH="$DEST_DIR/$filename"
# Check if this file was previously completed successfully
if is_completed "$file" "$DEST_PATH"; then
echo "✓ Already downloaded: $filename"
SKIPPED_FILES=$((SKIPPED_FILES + 1))
continue
fi
# Check if file already exists with same content (for backward compatibility)
if [ -f "$DEST_PATH" ]; then
if cmp -s "$file" "$DEST_PATH"; then
echo "✓ Already exists (identical): $filename"
# Mark as completed for future runs
SOURCE_CHECKSUM=$(compute_checksum "$DEST_PATH")
mark_completed "$file" "$DEST_PATH" "$SOURCE_CHECKSUM"
SKIPPED_FILES=$((SKIPPED_FILES + 1))
continue
else
# Add timestamp to avoid overwriting different file
BASE="${filename%.*}"
EXT="${filename##*.}"
DEST_PATH="$DEST_DIR/${BASE}_$(date +%s).$EXT"
fi
fi
# Use temporary file for atomic copy
TEMP_PATH="${DEST_PATH}.tmp.$$"
# Copy to temporary file
echo "⬇ Downloading: $filename → $DEST_PATH"
if ! cp "$file" "$TEMP_PATH" 2>/dev/null; then
echo "✗ Failed to copy: $filename"
rm -f "$TEMP_PATH"
FAILED_FILES=$((FAILED_FILES + 1))
continue
fi
# Verify size matches (basic corruption check)
SOURCE_SIZE=$(get_file_size "$file")
TEMP_SIZE=$(get_file_size "$TEMP_PATH")
if [ "$SOURCE_SIZE" != "$TEMP_SIZE" ]; then
echo "✗ Size mismatch for $filename (source: $SOURCE_SIZE, copied: $TEMP_SIZE)"
rm -f "$TEMP_PATH"
FAILED_FILES=$((FAILED_FILES + 1))
continue
fi
# Compute checksum for verification and tracking
if [ "$VERIFY_CHECKSUM" = true ]; then
SOURCE_CHECKSUM=$(compute_checksum "$TEMP_PATH")
else
SOURCE_CHECKSUM="skipped"
fi
# Preserve timestamps
if [ -n "$CREATE_DATE" ]; then
touch -t $(date -j -f "%Y-%m-%d %H:%M:%S" "$CREATE_DATE" "+%Y%m%d%H%M.%S" 2>/dev/null) "$TEMP_PATH" 2>/dev/null || true
fi
# Atomic move from temp to final destination
if mv "$TEMP_PATH" "$DEST_PATH" 2>/dev/null; then
echo "✓ Completed: $filename"
# Mark as successfully completed
mark_completed "$file" "$DEST_PATH" "$SOURCE_CHECKSUM"
COPIED_FILES=$((COPIED_FILES + 1))
else
echo "✗ Failed to finalize: $filename"
rm -f "$TEMP_PATH"
FAILED_FILES=$((FAILED_FILES + 1))
fi
done < <(find "$DCIM_PATH" -type f)
echo ""
echo "=== Summary ==="
echo "Total files matching pattern: $TOTAL_FILES"
echo "Files downloaded: $COPIED_FILES"
echo "Files already present: $SKIPPED_FILES"
if [ $FAILED_FILES -gt 0 ]; then
echo "Files failed: $FAILED_FILES"
echo ""
echo "⚠ Some files failed to download. Run the script again to retry."
exit 1
fi
echo ""
echo "✓ Download complete! All files transferred successfully."
OUTER_EOF
echo "Making the script executable..."
chmod +x download-iphone-media.sh
echo "✓ Script created successfully: download-iphone-media.sh"
Usage Examples
Basic Usage
Download all photos and videos to the current directory:
./download-iphone-media.sh
Download with Date Organization
Organize files into folders by creation date (YYYY/MMM structure):
./download-iphone-media.sh -d -o ~/Pictures/iPhone
Filter by Date Range
Limit downloads to a specific window of creation dates:
# Photos from January 2025
./download-iphone-media.sh -s 2025-01-01 -e 2025-01-31 -d -o ~/Pictures/January2025
# Photos from last week
./download-iphone-media.sh -s 2025-11-10 -e 2025-11-17 -o ~/Pictures/LastWeek
# Photos after a specific date
./download-iphone-media.sh -s 2025-11-01 -o ~/Pictures/Recent
Combined Filters
Combine multiple options for precise control:
# Download only videos from January 2025, organized by date
./download-iphone-media.sh -p "*.mov" -s 2025-01-01 -e 2025-01-31 -d -o ~/Videos/Vacation
# Download all HEIC photos from the last month, organized by date
./download-iphone-media.sh -p "*.heic" -s 2025-10-17 -e 2025-11-17 -d -o ~/Pictures/LastMonth
Features
Resumable & Idempotent Downloads
Crash recovery: Interrupted downloads can be resumed by running the script again
Atomic operations: Files are copied to temporary locations first, then moved atomically
State tracking: Maintains a hidden state directory (.iphone_download_state) to track completed files
Checksum verification: Uses SHA-256 checksums to verify file integrity (can be disabled with -n for speed)
No duplicates: Running the script multiple times won’t re-download existing files
Corruption detection: Validates file sizes and optionally checksums after copy
Date-Based Organization
Automatic folder structure: Creates YYYY/MMM folders based on photo creation date (e.g., 2025/Jan, 2025/Feb)
EXIF data support: Reads actual photo capture date from EXIF metadata when available
Fallback mechanism: Uses file modification time if EXIF data is unavailable
Fewer folders: Maximum 12 month folders per year instead of up to 365 day folders
Smart File Handling
Duplicate detection: Skips files that already exist with identical content
Conflict resolution: Adds timestamp suffix to filename if different file with same name exists
Timestamp preservation: Maintains original creation dates on copied files
Error tracking: Reports failed files and provides clear exit codes
Progress Feedback
Real-time progress updates showing each file being downloaded
Summary statistics at the end (total found, downloaded, skipped, failed)
Clear error messages for troubleshooting
Helpful resume instructions if interrupted
Common File Patterns
iPhone typically uses these file formats:
| Type | Extensions | Pattern Example |
|------|------------|-----------------|
| Photos | .jpg, .heic | *.jpg or *.heic |
| Videos | .mov, .mp4 | *.mov or *.mp4 |
| Screenshots | .png | *.png |
| Live Photos | .heic, .mov | IMG_*.heic + IMG_*.mov |
| All media | all of the above | * (default) |
5. Handling Interrupted Downloads
If a download is interrupted (disconnection, error, etc.), simply run the script again:
# Script was interrupted - just run it again
./download-iphone-media.sh -d -o ~/Pictures/iPhone
The script will:
Skip all successfully downloaded files
Retry any failed files
Continue from where it left off
6. Fast Mode (Skip Checksum Verification)
For faster transfers on reliable connections, disable checksum verification with the -n flag:
./download-iphone-media.sh -n -d -o ~/Pictures/iPhone
Note: This is generally safe but won't detect corruption as thoroughly (file sizes are still verified).
7. Clean State and Re-download
If you want to force a re-download of all files:
# Remove state directory to start fresh
rm -rf ~/Pictures/iPhone/.iphone_download_state
./download-iphone-media.sh -d -o ~/Pictures/iPhone
Troubleshooting
iPhone Not Detected
Error: No iPhone detected. Please connect your iPhone and trust this computer.
Solution:
Make sure your iPhone is connected via USB cable
Unlock your iPhone
Tap “Trust” when prompted on your iPhone
Run idevicepair pair if you haven’t already
Failed to Mount iPhone
Error: Failed to mount iPhone
Solution:
Try unplugging and reconnecting your iPhone
Check if another process is using the mount point and unmount it: umount /tmp/iphone_mount 2>/dev/null
Restart your iPhone and try again
On macOS Ventura or later, check System Settings → Privacy & Security → Files and Folders
Permission Denied
Solution: Make sure the script has executable permissions:
chmod +x download-iphone-media.sh
Missing Tools
Error: Commands not found
Solution: Install the required tools:
brew install libimobiledevice ifuse exiftool
FUSE-Related Errors on macOS
On newer macOS versions, you may need to install macFUSE:
brew install --cask macfuse
After installation, you may need to restart your Mac and allow the kernel extension in System Settings → Privacy & Security.
Tips and Best Practices
1. Regular Backups
Create a scheduled backup script:
#!/bin/bash
# Save as ~/bin/backup-iphone-photos.sh
DATE=$(date +%Y-%m-%d)
BACKUP_DIR=~/Pictures/iPhone-Backups/$DATE
./download-iphone-media.sh -d -o "$BACKUP_DIR"
echo "Backup completed to $BACKUP_DIR"
2. Incremental Downloads
The script is fully idempotent and tracks completed downloads, making it perfect for incremental backups:
# Run daily to get new photos - only new files will be downloaded
./download-iphone-media.sh -d -o ~/Pictures/iPhone
The script maintains state in .iphone_download_state/ within your output directory, ensuring:
Already downloaded files are skipped instantly (no re-copying)
Interrupted downloads can be resumed
File integrity is verified with checksums
3. Free Up iPhone Storage
After confirming successful download:
Verify files are on your MacBook
Check file counts match
Delete photos from iPhone via Photos app
Empty “Recently Deleted” album
4. Convert HEIC to JPG (Optional)
If you need JPG files for compatibility:
# Install ImageMagick
brew install imagemagick
# Convert all HEIC files to JPG
find ~/Pictures/iPhone -name "*.heic" -exec sh -c 'magick "$0" "${0%.heic}.jpg"' {} \;
How Idempotent Recovery Works
The script implements several mechanisms to ensure safe, resumable downloads:
1. State Tracking
A hidden directory .iphone_download_state/ is created in your output directory. For each successfully downloaded file, a state file is created containing:
Destination file path
SHA-256 checksum (if verification enabled)
Completion timestamp
2. Atomic Operations
Each file is downloaded using a two-phase commit:
Download Phase: Copy to temporary file (.tmp.$$ suffix)
Verification Phase: Check file size and optionally compute checksum
Commit Phase: Atomically move temp file to final destination
Record Phase: Write completion state
If the script is interrupted at any point, incomplete temporary files are cleaned up automatically.
3. Idempotent Behavior
When you run the script:
Before downloading each file, it checks the state directory
If a state file exists, it verifies the destination file still exists and matches the checksum
If verification passes, the file is skipped (no re-download)
If verification fails or no state exists, the file is downloaded
This means:
✓ Safe to run multiple times
✓ Interrupted downloads can be resumed
✓ Corrupted files are detected and re-downloaded
✓ No wasted bandwidth on already-downloaded files
4. Checksum Verification
By default, SHA-256 checksums are computed and verified:
During download: Checksum computed after copy completes
On resume: Existing files are verified against stored checksum
Optional: Use -n flag to skip checksums for speed (still verifies file sizes)
Example Recovery Scenario
# Start downloading 1000 photos
./download-iphone-media.sh -d -o ~/Pictures/iPhone
# Script is interrupted after 500 files
# Press Ctrl+C or cable disconnects
# Simply run again - picks up where it left off
./download-iphone-media.sh -d -o ~/Pictures/iPhone
# Output:
# ✓ Already downloaded: IMG_0001.heic
# ✓ Already downloaded: IMG_0002.heic
# ...
# ⬇ Downloading: IMG_0501.heic → ~/Pictures/iPhone/2025/Jan/IMG_0501.heic
Performance Notes
Transfer speed: Depends on USB connection (USB 2.0 vs USB 3.0)
Large libraries: May take significant time for thousands of photos
EXIF reading: Adds minimal overhead but provides accurate dates
Pattern matching: Processed client-side, so all files are scanned
Conclusion
This script provides a robust, production-ready solution for downloading photos and videos from your iPhone to your MacBook. Key capabilities:
Core Features:
Filter by file patterns (type, name)
Filter by date ranges
Organize automatically into date-based folders
Preserve original file metadata
Reliability:
Fully idempotent – safe to run multiple times
Resumable downloads with automatic crash recovery
Atomic file operations prevent corruption
Checksum verification ensures data integrity
Clear error reporting and recovery instructions
For regular use, consider creating aliases in your ~/.zshrc:
# Add to ~/.zshrc
alias iphone-backup='~/download-iphone-media.sh -d -o ~/Pictures/iPhone'
alias iphone-videos='~/download-iphone-media.sh -p "*.mov" -d -o ~/Videos/iPhone'
Then simply run iphone-backup whenever you want to download your photos!
The script below monitors LDAP operations on a Domain Controller and logs detailed information about queries that exceed specified thresholds for execution time, CPU usage, or results returned. It helps identify problematic LDAP queries that may be impacting domain controller performance.
The script accepts the following parameters:
- **ThresholdSeconds** – Minimum query duration in seconds to log (default: 5)
- **LogPath** – Path where log files will be saved (default: C:\LDAPDiagnostics)
- **MonitorDuration** – How long to monitor in minutes (default: continuous)
Example: .\Diagnose-LDAPQueries.ps1 -ThresholdSeconds 3 -LogPath "C:\Logs\LDAP"
## Usage Examples
### Basic Usage (Continuous Monitoring)
Run with default settings - monitors queries taking 5+ seconds:
```powershell
.\Diagnose-LDAPQueries.ps1
```
### Custom Threshold and Duration
Monitor for 30 minutes, logging queries that take 3+ seconds:
```powershell
.\Diagnose-LDAPQueries.ps1 -ThresholdSeconds 3 -MonitorDuration 30
```
### Custom Log Location
Save logs to a specific directory:
```powershell
.\Diagnose-LDAPQueries.ps1 -LogPath "D:\Logs\LDAP"
```
### Verbose Output
See real-time LDAP statistics while monitoring:
```powershell
.\Diagnose-LDAPQueries.ps1 -Verbose
```
## Requirements
- **Administrator privileges** on the domain controller
- **Windows Server** with Active Directory Domain Services role
- **PowerShell 5.1 or later**
## Understanding the Output
### Log File Example
```
[2025-01-15 14:23:45] [WARNING] === Expensive LDAP Query Detected ===
[2025-01-15 14:23:45] [WARNING] Time: 01/15/2025 14:23:43
[2025-01-15 14:23:45] [WARNING] Client IP: 192.168.1.50
[2025-01-15 14:23:45] [WARNING] Duration: 8.5 seconds
[2025-01-15 14:23:45] [WARNING] Starting Node: DC=contoso,DC=com
[2025-01-15 14:23:45] [WARNING] Filter: (&(objectClass=user)(memberOf=*))
[2025-01-15 14:23:45] [WARNING] Search Scope: 2
[2025-01-15 14:23:45] [WARNING] Visited Entries: 45000
[2025-01-15 14:23:45] [WARNING] Returned Entries: 12000
```
### What to Look For
- **High visited/returned ratio** - Indicates an inefficient filter
- **Subtree searches from root** - Often unnecessarily broad
- **Wildcard filters** - Like `(cn=*)` can be very expensive
- **Unindexed attributes** - Queries on non-indexed attributes visit many entries
- **Repeated queries** - Same client making the same expensive query repeatedly
## Troubleshooting Common Issues
### No Events Appearing
If you're not seeing Event ID 1644, you may need to lower the expensive search threshold in Active Directory:
```powershell
# Lower the threshold to 1000ms (1 second)
Get-ADObject "CN=Query-Policies,CN=Directory Service,CN=Windows NT,CN=Services,CN=Configuration,DC=yourdomain,DC=com" |
Set-ADObject -Replace @{lDAPAdminLimits="MaxQueryDuration=1000"}
```
### Script Requires Restart
After enabling Field Engineering logging, you may need to restart the NTDS service:
```powershell
Restart-Service NTDS -Force
```
## Best Practices

1. **Run during peak hours** to capture real-world problematic queries
2. **Start with a lower threshold** (2-3 seconds) to catch more queries
3. **Analyze the CSV** in Excel or Power BI for patterns
4. **Correlate with client IPs** to identify problematic applications
5. **Work with application owners** to optimize queries with indexes or better filters

Once you've identified expensive queries:

1. **Add indexes** for frequently searched attributes
2. **Optimize LDAP filters** to be more specific
3. **Reduce search scope** where possible
4. **Implement paging** for large result sets
5. **Cache results** on the client side when appropriate
This script has helped me identify numerous performance bottlenecks in production environments. I hope it helps you optimize your Active Directory infrastructure as well!
# Basic scan
./security_scanner_enhanced.sh -d example.com
# Full scan with all features
./security_scanner_enhanced.sh -d example.com -s -m 20 -v -a
# Vulnerability assessment only
./security_scanner_enhanced.sh -d example.com -v
# API security testing
./security_scanner_enhanced.sh -d example.com -a
Network Configuration
Default Interface: en0 (bypasses Zscaler)
To change the interface, edit line 24:
NETWORK_INTERFACE="en0" # Change to your interface
The script automatically falls back to default routing if the interface is unavailable.
Debug Mode
Debug mode is enabled by default and shows:
Dependency checks
Network interface status
Command execution details
Scan progress
File operations
Debug messages appear in cyan with [DEBUG] prefix.
To disable, edit line 27:
DEBUG=false
Output
Each scan creates a timestamped directory: scan_example.com_20251016_191806/