MacOs: Getting Started with Memgraph, Memgraph MCP and Claude Desktop by Analyzing test banking data for Mule Accounts

1. Introduction

This guide walks you through setting up Memgraph with Claude Desktop on your laptop to analyze relationships between mule accounts in banking systems. By the end of this tutorial, you’ll have a working setup where Claude can query and visualize banking transaction patterns to identify potential mule account networks.

Why Graph Databases for Fraud Detection?

Traditional relational databases store data in tables with rows and columns, which works well for structured, hierarchical data. However, fraud detection requires understanding relationships between entities—and this is where graph databases excel.

In fraud investigation, the connections matter more than the entities themselves:

  • Follow the money: Tracing funds through multiple accounts requires traversing relationships, not joining tables
  • Multi-hop queries: Finding patterns like “accounts connected within 3 transactions” is natural in graphs but complex in SQL
  • Pattern matching: Detecting suspicious structures (like a controller account distributing to multiple mules) is intuitive with graph queries
  • Real-time analysis: Graph databases can quickly identify new connections as transactions occur

Mule account schemes specifically benefit from graph analysis because they form distinct network patterns:

  • A central controller account receives large deposits
  • Funds are rapidly distributed to multiple recruited “mule” accounts
  • Mules quickly withdraw cash or transfer funds, completing the laundering cycle
  • These patterns create a recognizable “hub-and-spoke” topology in a graph

In a traditional relational database, finding these patterns requires multiple complex JOINs and recursive queries. In a graph database, you simply ask: “show me accounts connected to this one” or “find all paths between these two accounts.”

Why This Stack?

We’ve chosen a powerful combination of technologies that work seamlessly together:

Memgraph (Graph Database)

  • Native graph database built for speed and real-time analytics
  • Uses Cypher query language (intuitive, SQL-like syntax for graphs)
  • In-memory architecture provides millisecond query responses
  • Perfect for fraud detection where you need to explore relationships quickly
  • Lightweight and runs easily in Docker on your laptop
  • Open-source with excellent tooling (Memgraph Lab for visualization)

Claude Desktop (AI Interface)

  • Natural language interface eliminates the need to learn Cypher query syntax
  • Ask questions in plain English: “Which accounts received money from ACC006?”
  • Claude translates your questions into optimized graph queries automatically
  • Provides explanations and insights alongside query results
  • Dramatically lowers the barrier to entry for graph analysis

MCP (Model Context Protocol)

  • Connects Claude directly to Memgraph
  • Enables Claude to execute queries and retrieve real-time data
  • Secure, local connection—your data never leaves your machine
  • Extensible architecture allows adding other tools and databases

Why Not PostgreSQL?

While PostgreSQL is excellent for transactional data storage, graph relationships in SQL require:

  • Complex recursive CTEs (Common Table Expressions) for multi-hop queries
  • Multiple JOINs that become exponentially slower as relationships deepen
  • Manual construction of relationship paths
  • Limited visualization capabilities for network structures

Memgraph’s native graph model represents accounts and transactions as nodes and edges, making relationship queries natural and performant. For fraud detection where you need to quickly explore “who’s connected to whom,” graph databases are the right tool.

What You’ll Build

By following this guide, you’ll create:

The ability to ask natural language questions and get instant graph insights

A local Memgraph database with 57 accounts and 512 transactions

A realistic mule account network hidden among legitimate transactions

An AI-powered analysis interface through Claude Desktop

2. Prerequisites

Before starting, ensure you have:

  • macOS laptop
  • Homebrew package manager (we’ll install if needed)
  • Claude Desktop app installed
  • Basic terminal knowledge

3. Automated Setup

Below is a massive script. I did have it as single scripts, but it has merged into a large hazardous blob of bash. This script is badged under the “it works on my laptop” disclaimer!

cat > ~/setup_memgraph_complete.sh << 'EOF'
#!/bin/bash

# Complete automated setup for Memgraph + Claude Desktop

echo "========================================"
echo "Memgraph + Claude Desktop Setup"
echo "========================================"
echo ""

# Step 1: Install Rancher Desktop
echo "Step 1/7: Installing Rancher Desktop..."

# Check if Docker daemon is already running
DOCKER_RUNNING=false
if command -v docker &> /dev/null && docker info &> /dev/null 2>&1; then
    echo "Container runtime is already running!"
    DOCKER_RUNNING=true
fi

if [ "$DOCKER_RUNNING" = false ]; then
    # Check if Homebrew is installed
    if ! command -v brew &> /dev/null; then
        echo "Installing Homebrew first..."
        /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
        
        # Add Homebrew to PATH for Apple Silicon Macs
        if [[ $(uname -m) == 'arm64' ]]; then
            echo 'eval "$(/opt/homebrew/bin/brew shellenv)"' >> ~/.zprofile
            eval "$(/opt/homebrew/bin/brew shellenv)"
        fi
    fi
    
    # Check if Rancher Desktop is installed
    RANCHER_INSTALLED=false
    if brew list --cask rancher 2>/dev/null | grep -q rancher; then
        RANCHER_INSTALLED=true
        echo "Rancher Desktop is installed via Homebrew."
    fi
    
    # If not installed, install it
    if [ "$RANCHER_INSTALLED" = false ]; then
        echo "Installing Rancher Desktop..."
        brew install --cask rancher
        sleep 3
    fi
    
    echo "Starting Rancher Desktop..."
    
    # Launch Rancher Desktop
    if [ -d "/Applications/Rancher Desktop.app" ]; then
        echo "Launching Rancher Desktop from /Applications..."
        open "/Applications/Rancher Desktop.app"
        sleep 5
    else
        echo ""
        echo "Please launch Rancher Desktop manually:"
        echo "  1. Press Cmd+Space"
        echo "  2. Type 'Rancher Desktop'"
        echo "  3. Press Enter"
        echo ""
        echo "Waiting for you to launch Rancher Desktop..."
        echo "Press Enter once you've started Rancher Desktop"
        read
    fi
    
    # Add Rancher Desktop to PATH
    export PATH="$HOME/.rd/bin:$PATH"
    
    echo "Waiting for container runtime to start (this may take 30-60 seconds)..."
    # Wait for docker command to become available
    for i in {1..60}; do
        if command -v docker &> /dev/null && docker info &> /dev/null 2>&1; then
            echo ""
            echo "Container runtime is running!"
            break
        fi
        echo -n "."
        sleep 3
    done
    
    if ! command -v docker &> /dev/null || ! docker info &> /dev/null 2>&1; then
        echo ""
        echo "Rancher Desktop is taking longer than expected. Please:"
        echo "1. Wait for Rancher Desktop to fully initialize"
        echo "2. Accept any permissions requests"
        echo "3. Once you see 'Kubernetes is running' in Rancher Desktop, press Enter"
        read
        
        # Try to add Rancher Desktop to PATH
        export PATH="$HOME/.rd/bin:$PATH"
        
        # Check one more time
        if ! command -v docker &> /dev/null || ! docker info &> /dev/null 2>&1; then
            echo "Container runtime still not responding."
            echo "Please ensure Rancher Desktop is fully started and try again."
            exit 1
        fi
    fi
fi

# Ensure docker is in PATH for the rest of the script
export PATH="$HOME/.rd/bin:$PATH"

echo ""
echo "Step 2/7: Installing Memgraph container..."

# Stop and remove existing container if it exists
if docker ps -a 2>/dev/null | grep -q memgraph; then
    echo "Removing existing Memgraph container..."
    docker stop memgraph 2>/dev/null || true
    docker rm memgraph 2>/dev/null || true
fi

docker pull memgraph/memgraph-platform || { echo "Failed to pull Memgraph image"; exit 1; }
docker run -d -p 7687:7687 -p 7444:7444 -p 3000:3000 \
  --name memgraph \
  -v memgraph_data:/var/lib/memgraph \
  memgraph/memgraph-platform || { echo "Failed to start Memgraph container"; exit 1; }

echo "Waiting for Memgraph to be ready..."
sleep 10

echo ""
echo "Step 3/7: Installing Python and Memgraph MCP server..."

# Install Python if not present
if ! command -v python3 &> /dev/null; then
    echo "Installing Python..."
    brew install python3
fi

# Install uv package manager
if ! command -v uv &> /dev/null; then
    echo "Installing uv package manager..."
    curl -LsSf https://astral.sh/uv/install.sh | sh
    export PATH="$HOME/.local/bin:$PATH"
fi

echo "Memgraph MCP will be configured to run via uv..."

echo ""
echo "Step 4/7: Configuring Claude Desktop..."

CONFIG_DIR="$HOME/Library/Application Support/Claude"
CONFIG_FILE="$CONFIG_DIR/claude_desktop_config.json"

mkdir -p "$CONFIG_DIR"

if [ -f "$CONFIG_FILE" ] && [ -s "$CONFIG_FILE" ]; then
    echo "Backing up existing Claude configuration..."
    cp "$CONFIG_FILE" "$CONFIG_FILE.backup.$(date +%s)"
fi

# Get the full path to uv
UV_PATH=$(which uv 2>/dev/null || echo "$HOME/.local/bin/uv")

# Merge memgraph config with existing config
if [ -f "$CONFIG_FILE" ] && [ -s "$CONFIG_FILE" ]; then
    echo "Merging memgraph config with existing MCP servers..."
    
    # Use Python to merge JSON (more reliable than jq which may not be installed)
    python3 << PYTHON_MERGE
import json
import sys

config_file = "$CONFIG_FILE"
uv_path = "${UV_PATH}"

try:
    # Read existing config
    with open(config_file, 'r') as f:
        config = json.load(f)
    
    # Ensure mcpServers exists
    if 'mcpServers' not in config:
        config['mcpServers'] = {}
    
    # Add/update memgraph server
    config['mcpServers']['memgraph'] = {
        "command": uv_path,
        "args": [
            "run",
            "--with",
            "mcp-memgraph",
            "--python",
            "3.13",
            "mcp-memgraph"
        ],
        "env": {
            "MEMGRAPH_HOST": "localhost",
            "MEMGRAPH_PORT": "7687"
        }
    }
    
    # Write merged config
    with open(config_file, 'w') as f:
        json.dump(config, f, indent=2)
    
    print("Successfully merged memgraph config")
    sys.exit(0)
except Exception as e:
    print(f"Error merging config: {e}", file=sys.stderr)
    sys.exit(1)
PYTHON_MERGE
    
    if [ $? -ne 0 ]; then
        echo "Failed to merge config, creating new one..."
        cat > "$CONFIG_FILE" << JSON
{
  "mcpServers": {
    "memgraph": {
      "command": "${UV_PATH}",
      "args": [
        "run",
        "--with",
        "mcp-memgraph",
        "--python",
        "3.13",
        "mcp-memgraph"
      ],
      "env": {
        "MEMGRAPH_HOST": "localhost",
        "MEMGRAPH_PORT": "7687"
      }
    }
  }
}
JSON
    fi
else
    echo "Creating new Claude Desktop configuration..."
    cat > "$CONFIG_FILE" << JSON
{
  "mcpServers": {
    "memgraph": {
      "command": "${UV_PATH}",
      "args": [
        "run",
        "--with",
        "mcp-memgraph",
        "--python",
        "3.13",
        "mcp-memgraph"
      ],
      "env": {
        "MEMGRAPH_HOST": "localhost",
        "MEMGRAPH_PORT": "7687"
      }
    }
  }
}
JSON
fi

echo "Claude Desktop configured!"

echo ""
echo "Step 5/7: Setting up mgconsole..."
echo "mgconsole will be used via Docker (included in memgraph/memgraph-platform)"

echo ""
echo "Step 6/7: Setting up database schema..."

sleep 5  # Give Memgraph extra time to be ready

echo "Clearing existing data..."
echo "MATCH (n) DETACH DELETE n;" | docker exec -i memgraph mgconsole --host 127.0.0.1 --port 7687

echo "Creating indexes..."
cat <<'CYPHER' | docker exec -i memgraph mgconsole --host 127.0.0.1 --port 7687
CREATE INDEX ON :Account(account_id);
CREATE INDEX ON :Account(account_type);
CREATE INDEX ON :Person(person_id);
CYPHER

echo ""
echo "Step 7/7: Populating test data..."

echo "Loading core mule account data..."
cat <<'CYPHER' | docker exec -i memgraph mgconsole --host 127.0.0.1 --port 7687
CREATE (p1:Person {person_id: 'P001', name: 'John Smith', age: 45, risk_score: 'low'})
CREATE (a1:Account {account_id: 'ACC001', account_type: 'checking', balance: 15000, opened_date: '2020-01-15', status: 'active'})
CREATE (p1)-[:OWNS {since: '2020-01-15'}]->(a1)
CREATE (p2:Person {person_id: 'P002', name: 'Sarah Johnson', age: 38, risk_score: 'low'})
CREATE (a2:Account {account_id: 'ACC002', account_type: 'savings', balance: 25000, opened_date: '2019-06-10', status: 'active'})
CREATE (p2)-[:OWNS {since: '2019-06-10'}]->(a2)
CREATE (p3:Person {person_id: 'P003', name: 'Michael Brown', age: 22, risk_score: 'high'})
CREATE (a3:Account {account_id: 'ACC003', account_type: 'checking', balance: 500, opened_date: '2024-08-01', status: 'active'})
CREATE (p3)-[:OWNS {since: '2024-08-01'}]->(a3)
CREATE (p4:Person {person_id: 'P004', name: 'Lisa Chen', age: 19, risk_score: 'high'})
CREATE (a4:Account {account_id: 'ACC004', account_type: 'checking', balance: 300, opened_date: '2024-08-05', status: 'active'})
CREATE (p4)-[:OWNS {since: '2024-08-05'}]->(a4)
CREATE (p5:Person {person_id: 'P005', name: 'David Martinez', age: 21, risk_score: 'high'})
CREATE (a5:Account {account_id: 'ACC005', account_type: 'checking', balance: 450, opened_date: '2024-08-03', status: 'active'})
CREATE (p5)-[:OWNS {since: '2024-08-03'}]->(a5)
CREATE (p6:Person {person_id: 'P006', name: 'Robert Wilson', age: 35, risk_score: 'critical'})
CREATE (a6:Account {account_id: 'ACC006', account_type: 'business', balance: 2000, opened_date: '2024-07-15', status: 'active'})
CREATE (p6)-[:OWNS {since: '2024-07-15'}]->(a6)
CREATE (p7:Person {person_id: 'P007', name: 'Unknown Entity', risk_score: 'critical'})
CREATE (a7:Account {account_id: 'ACC007', account_type: 'business', balance: 150000, opened_date: '2024-06-01', status: 'active'})
CREATE (p7)-[:OWNS {since: '2024-06-01'}]->(a7)
CREATE (a7)-[:TRANSACTION {transaction_id: 'TXN001', amount: 50000, timestamp: '2024-09-01T10:15:00', type: 'wire_transfer', flagged: true}]->(a6)
CREATE (a6)-[:TRANSACTION {transaction_id: 'TXN002', amount: 9500, timestamp: '2024-09-01T14:30:00', type: 'transfer', flagged: true}]->(a3)
CREATE (a6)-[:TRANSACTION {transaction_id: 'TXN003', amount: 9500, timestamp: '2024-09-01T14:32:00', type: 'transfer', flagged: true}]->(a4)
CREATE (a6)-[:TRANSACTION {transaction_id: 'TXN004', amount: 9500, timestamp: '2024-09-01T14:35:00', type: 'transfer', flagged: true}]->(a5)
CREATE (a3)-[:TRANSACTION {transaction_id: 'TXN005', amount: 9000, timestamp: '2024-09-02T09:00:00', type: 'cash_withdrawal', flagged: true}]->(a6)
CREATE (a4)-[:TRANSACTION {transaction_id: 'TXN006', amount: 9000, timestamp: '2024-09-02T09:15:00', type: 'cash_withdrawal', flagged: true}]->(a6)
CREATE (a5)-[:TRANSACTION {transaction_id: 'TXN007', amount: 9000, timestamp: '2024-09-02T09:30:00', type: 'cash_withdrawal', flagged: true}]->(a6)
CREATE (a7)-[:TRANSACTION {transaction_id: 'TXN008', amount: 45000, timestamp: '2024-09-15T11:20:00', type: 'wire_transfer', flagged: true}]->(a6)
CREATE (a6)-[:TRANSACTION {transaction_id: 'TXN009', amount: 9800, timestamp: '2024-09-15T15:00:00', type: 'transfer', flagged: true}]->(a3)
CREATE (a6)-[:TRANSACTION {transaction_id: 'TXN010', amount: 9800, timestamp: '2024-09-15T15:05:00', type: 'transfer', flagged: true}]->(a4)
CREATE (a1)-[:TRANSACTION {transaction_id: 'TXN011', amount: 150, timestamp: '2024-09-10T12:00:00', type: 'debit_card', flagged: false}]->(a2)
CREATE (a2)-[:TRANSACTION {transaction_id: 'TXN012', amount: 1000, timestamp: '2024-09-12T10:00:00', type: 'transfer', flagged: false}]->(a1);
CYPHER

echo "Loading noise data (50 accounts, 500 transactions)..."
cat <<'CYPHER' | docker exec -i memgraph mgconsole --host 127.0.0.1 --port 7687
UNWIND range(1, 50) AS i
WITH i,
     ['Alice', 'Bob', 'Carol', 'David', 'Emma', 'Frank', 'Grace', 'Henry', 'Iris', 'Jack',
      'Karen', 'Leo', 'Mary', 'Nathan', 'Olivia', 'Peter', 'Quinn', 'Rachel', 'Steve', 'Tina',
      'Uma', 'Victor', 'Wendy', 'Xavier', 'Yara', 'Zack', 'Amy', 'Ben', 'Chloe', 'Daniel',
      'Eva', 'Fred', 'Gina', 'Hugo', 'Ivy', 'James', 'Kate', 'Luke', 'Mia', 'Noah',
      'Opal', 'Paul', 'Rosa', 'Sam', 'Tara', 'Umar', 'Vera', 'Will', 'Xena', 'Yuki'] AS firstNames,
     ['Anderson', 'Baker', 'Clark', 'Davis', 'Evans', 'Foster', 'Garcia', 'Harris', 'Irwin', 'Jones',
      'King', 'Lopez', 'Miller', 'Nelson', 'Owens', 'Parker', 'Quinn', 'Reed', 'Scott', 'Taylor',
      'Underwood', 'Vargas', 'White', 'Young', 'Zhao', 'Adams', 'Brooks', 'Collins', 'Duncan', 'Ellis'] AS lastNames,
     ['checking', 'savings', 'checking', 'savings', 'checking'] AS accountTypes,
     ['low', 'low', 'low', 'medium', 'low'] AS riskScores,
     ['2018-03-15', '2018-07-22', '2019-01-10', '2019-05-18', '2019-09-30', '2020-02-14', '2020-06-25', '2020-11-08', '2021-04-17', '2021-08-29', '2022-01-20', '2022-05-12', '2022-10-03', '2023-02-28', '2023-07-15'] AS dates
WITH i,
     firstNames[toInteger(rand() * size(firstNames))] + ' ' + lastNames[toInteger(rand() * size(lastNames))] AS fullName,
     accountTypes[toInteger(rand() * size(accountTypes))] AS accType,
     riskScores[toInteger(rand() * size(riskScores))] AS risk,
     toInteger(rand() * 40 + 25) AS age,
     toInteger(rand() * 80000 + 1000) AS balance,
     dates[toInteger(rand() * size(dates))] AS openDate
CREATE (p:Person {person_id: 'NOISE_P' + toString(i), name: fullName, age: age, risk_score: risk})
CREATE (a:Account {account_id: 'NOISE_ACC' + toString(i), account_type: accType, balance: balance, opened_date: openDate, status: 'active'})
CREATE (p)-[:OWNS {since: openDate}]->(a);
UNWIND range(1, 500) AS i
WITH i,
     toInteger(rand() * 50 + 1) AS fromIdx,
     toInteger(rand() * 50 + 1) AS toIdx,
     ['transfer', 'debit_card', 'check', 'atm_withdrawal', 'direct_deposit', 'wire_transfer', 'mobile_payment'] AS txnTypes,
     ['2024-01-15', '2024-02-20', '2024-03-10', '2024-04-05', '2024-05-18', '2024-06-22', '2024-07-14', '2024-08-09', '2024-09-25', '2024-10-30'] AS dates
WHERE fromIdx <> toIdx
WITH i, fromIdx, toIdx, txnTypes, dates,
     txnTypes[toInteger(rand() * size(txnTypes))] AS txnType,
     toInteger(rand() * 5000 + 10) AS amount,
     (rand() < 0.05) AS shouldFlag,
     dates[toInteger(rand() * size(dates))] AS txnDate
MATCH (from:Account {account_id: 'NOISE_ACC' + toString(fromIdx)})
MATCH (to:Account {account_id: 'NOISE_ACC' + toString(toIdx)})
CREATE (from)-[:TRANSACTION {
    transaction_id: 'NOISE_TXN' + toString(i),
    amount: amount,
    timestamp: txnDate + 'T' + toString(toInteger(rand() * 24)) + ':' + toString(toInteger(rand() * 60)) + ':00',
    type: txnType,
    flagged: shouldFlag
}]->(to);
CYPHER

echo ""
echo "========================================"
echo "Setup Complete!"
echo "========================================"
echo ""
echo "Next steps:"
echo "1. Restart Claude Desktop (Quit and reopen)"
echo "2. Open Memgraph Lab at http://localhost:3000"
echo "3. Start asking Claude questions about the mule account data!"
echo ""
echo "Example query: 'Show me all accounts owned by people with high or critical risk scores in Memgraph'"
echo ""

EOF

chmod +x ~/setup_memgraph_complete.sh
~/setup_memgraph_complete.sh

The script will:

  1. Install Rancher Desktop (if not already installed)
  2. Install Homebrew (if needed)
  3. Pull and start Memgraph container
  4. Install Node.js and Memgraph MCP server
  5. Configure Claude Desktop automatically
  6. Install mgconsole CLI tool
  7. Set up database schema with indexes
  8. Populate with mule account data and 500+ noise transactions

After the script completes, restart Claude Desktop (quit and reopen) for the MCP configuration to take effect.

4. Verifying the Setup

Verify the setup by accessing Memgraph Lab at http://localhost:3000 or using mgconsole via Docker:

docker exec -it memgraph mgconsole --host 127.0.0.1 --port 7687

In mgconsole, run:

MATCH (n) RETURN count(n);

You should see:

+----------+
| count(n) |
+----------+
| 152      |
+----------+
1 row in set (round trip in 0.002 sec)

Check the transaction relationships:

MATCH ()-[r:TRANSACTION]->() RETURN count(r);

You should see:

+----------+
| count(r) |
+----------+
| 501      |
+----------+
1 row in set (round trip in 0.002 sec)

Verify the mule accounts are still identifiable:

MATCH (p:Person)-[:OWNS]->(a:Account)
WHERE p.risk_score IN ['high', 'critical']
RETURN p.name, a.account_id, p.risk_score
ORDER BY p.risk_score DESC;

This should return the 5 suspicious accounts from our mule network:

+------------------+------------------+------------------+
| p.name           | a.account_id     | p.risk_score     |
+------------------+------------------+------------------+
| "Michael Brown"  | "ACC003"         | "high"           |
| "Lisa Chen"      | "ACC004"         | "high"           |
| "David Martinez" | "ACC005"         | "high"           |
| "Robert Wilson"  | "ACC006"         | "critical"       |
| "Unknown Entity" | "ACC007"         | "critical"       |
+------------------+------------------+------------------+
5 rows in set (round trip in 0.002 sec)

5. Using Claude with Memgraph

Now that everything is set up, you can interact with Claude Desktop to analyze the mule account network. Here are example queries you can try:

Example 1: Find All High-Risk Accounts

Ask Claude:

Show me all accounts owned by people with high or critical risk scores in Memgraph

Claude will query Memgraph and return results showing the suspicious accounts (ACC003, ACC004, ACC005, ACC006, ACC007), filtering out the 50+ noise accounts.

Example 2: Identify Transaction Patterns

Ask Claude:

Find all accounts that received money from ACC006 within a 24-hour period. Show the transaction amounts and timestamps.

Claude will identify the three mule accounts (ACC003, ACC004, ACC005) that received similar amounts in quick succession.

Example 3: Trace Money Flow

Ask Claude:

Trace the flow of money from ACC007 through the network. Show me the complete transaction path.

Claude will visualize the path: ACC007 -> ACC006 -> [ACC003, ACC004, ACC005], revealing the laundering pattern.

Example 4: Calculate Total Funds

Ask Claude:

Calculate the total amount of money that flowed through ACC006 in September 2024

Claude will aggregate all incoming and outgoing transactions for the controller account.

Example 5: Find Rapid Withdrawal Patterns

Ask Claude:

Find accounts where money was withdrawn within 48 hours of being deposited. What are the amounts and account holders?

This reveals the classic mule account behavior of quick cash extraction.

Example 6: Network Analysis

Ask Claude:

Show me all accounts that have transaction relationships with ACC006. Create a visualization of this network.

Claude will generate a graph showing the controller account at the center with connections to both the source and mule accounts.

Example 7: Risk Assessment

Ask Claude:

Which accounts have received flagged transactions totaling more than $15,000? List them by total amount.

This helps identify which mule accounts have processed the most illicit funds.

6. Understanding the Graph Visualization

When Claude displays graph results, you’ll see:

  • Nodes: Circles representing accounts and persons
  • Edges: Lines representing transactions or ownership relationships
  • Properties: Attributes like amounts, timestamps, and risk scores

The graph structure makes it easy to spot:

  • Central nodes (controllers) with many connections
  • Similar transaction patterns across multiple accounts
  • Timing correlations between related transactions
  • Isolation of legitimate vs. suspicious account clusters

7. Advanced Analysis Queries

Once you’re comfortable with basic queries, try these advanced analyses:

Community Detection

Ask Claude:

Find groups of accounts that frequently transact with each other. Are there separate communities in the network?

Temporal Analysis

Ask Claude:

Show me the timeline of transactions for accounts owned by people under 25 years old. Are there any patterns?

Shortest Path Analysis

Ask Claude:

What's the shortest path of transactions between ACC007 and ACC003? How many hops does it take?

8. Cleaning Up

When you’re done experimenting, you can stop and remove the Memgraph container:

docker stop memgraph
docker rm memgraph

To remove the data volume completely:

docker volume rm memgraph_data

To restart later with fresh data, just run the setup script again.

9. Troubleshooting

Docker Not Running

If you get errors about Docker not running:

open -a Docker

Wait for Docker Desktop to start, then verify:

docker info

Memgraph Container Won’t Start

Check if ports are already in use:

lsof -i :7687
lsof -i :3000

Kill any conflicting processes or change the port mappings in the docker run command.

Claude Can’t Connect to Memgraph

Verify the MCP server configuration:

cat ~/Library/Application\ Support/Claude/claude_desktop_config.json

Ensure Memgraph is running:

docker ps | grep memgraph

Restart Claude Desktop completely after configuration changes.

mgconsole Command Not Found

Install it manually:

brew install memgraph/tap/mgconsole

No Data Returned from Queries

Check if data was loaded successfully:

mgconsole --host 127.0.0.1 --port 7687 -e "MATCH (n) RETURN count(n);"

If the count is 0, rerun the setup script.

10. Next Steps

Now that you have a working setup, you can:

  • Add more complex transaction patterns
  • Implement real-time fraud detection rules
  • Create additional graph algorithms for anomaly detection
  • Connect to real banking data sources (with proper security)
  • Build automated alerting for suspicious patterns
  • Expand the schema to include IP addresses, devices, and locations

The combination of Memgraph’s graph database capabilities and Claude’s natural language interface makes it easy to explore and analyze complex relationship data without writing complex Cypher queries manually.

11. Conclusion

You now have a complete environment for analyzing banking mule accounts using Memgraph and Claude Desktop. The graph database structure naturally represents the relationships between accounts, making it ideal for fraud detection. Claude’s integration through MCP allows you to query and visualize this data using natural language, making sophisticated analysis accessible without deep technical knowledge.

The test dataset demonstrates typical mule account patterns: rapid movement of funds through multiple accounts, young account holders, recently opened accounts, and structured amounts designed to avoid reporting thresholds. These patterns are much easier to spot in a graph database than in traditional relational databases.

Experiment with different queries and explore how graph thinking can reveal hidden patterns in connected data.

MacOs: Deep Dive into NMAP using Claude Desktop with an NMAP MCP

Introduction

NMAP (Network Mapper) is one of the most powerful and versatile network scanning tools available for security professionals, system administrators, and ethical hackers. When combined with Claude through the Model Context Protocol (MCP), it becomes an even more powerful tool, allowing you to leverage AI to intelligently analyze scan results, suggest scanning strategies, and interpret complex network data.

In this deep dive, we’ll explore how to set up NMAP with Claude Desktop using an MCP server, and demonstrate 20+ comprehensive vulnerability checks and reconnaissance techniques you can perform using natural language prompts.

⚠️ Legal Disclaimer: Only scan systems and networks you own or have explicit written permission to test. Unauthorized scanning may be illegal in your jurisdiction.

Prerequisites

  • macOS, Linux, or Windows with WSL
  • Basic understanding of networking concepts
  • Permission to scan target systems
  • Claude Desktop installed

Part 1: Installation and Setup

Step 1: Install NMAP

On macOS:

# Using Homebrew
brew install nmap

# Verify installation

On Linux (Ubuntu/Debian):

Step 2: Install Node.js (Required for MCP Server)

The NMAP MCP server requires Node.js to run.

Mac OS:

brew install node
node --version
npm --version

Step 3: Install the NMAP MCP Server

The most popular NMAP MCP server is available on GitHub. We’ll install it globally:

cd ~/
rm -rf nmap-mcp-server
git clone https://github.com/PhialsBasement/nmap-mcp-server.git
cd nmap-mcp-server
npm install
npm run build

Step 4: Configure Claude Desktop

Edit the Claude Desktop configuration file to add the NMAP MCP server.

On macOS:

CONFIG_FILE="$HOME/Library/Application Support/Claude/claude_desktop_config.json"
USERNAME=$(whoami)

cp "$CONFIG_FILE" "$CONFIG_FILE.backup"

python3 << 'EOF'
import json
import os

config_file = os.path.expanduser("~/Library/Application Support/Claude/claude_desktop_config.json")
username = os.environ['USER']

with open(config_file, 'r') as f:
config = json.load(f)

if 'mcpServers' not in config:
config['mcpServers'] = {}

config['mcpServers']['nmap'] = {
"command": "node",
"args": [
f"/Users/{username}/nmap-mcp-server/dist/index.js"
],
"env": {}
}

with open(config_file, 'w') as f:
json.dump(config, f, indent=2)

print("nmap server added to Claude Desktop config!")
print(f"Backup saved to: {config_file}.backup")
EOF


Step 5: Restart Claude Desktop

Close and reopen Claude Desktop. You should see the NMAP MCP server connected in the bottom-left corner.

Part 2: Understanding NMAP MCP Capabilities

Once configured, Claude can execute NMAP scans through the MCP server. The server typically provides:

  • Host discovery scans
  • Port scanning (TCP/UDP)
  • Service version detection
  • OS detection
  • Script scanning (NSE – NMAP Scripting Engine)
  • Output parsing and interpretation

Part 3: 20 Most Common Vulnerability Checks

For these examples, we’ll use a hypothetical target domain: example-target.com (replace with your authorized target).

1. Basic Host Discovery and Open Ports

Prompt:

Scan example-target.com to discover if the host is up and identify all open ports (1-1000). Use a TCP SYN scan for speed.

What this does: Performs a fast SYN scan on the first 1000 ports to quickly identify open services.

Expected NMAP command:

nmap -sS -p 1-1000 example-target.com

2. Comprehensive Port Scan (All 65535 Ports)

Prompt:

Perform a comprehensive scan of all 65535 TCP ports on example-target.com to identify any services running on non-standard ports.

What this does: Scans every possible TCP port – time-consuming but thorough.

Expected NMAP command:

nmap -p- example-target.com

3. Service Version Detection

Prompt:

Scan the top 1000 ports on example-target.com and detect the exact versions of services running on open ports. This will help identify outdated software.

What this does: Probes open ports to determine service/version info, crucial for finding known vulnerabilities.

Expected NMAP command:

nmap -sV example-target.com

4. Operating System Detection

Prompt:

Detect the operating system running on example-target.com using TCP/IP stack fingerprinting. Include OS detection confidence levels.

What this does: Analyzes network responses to guess the target OS.

Expected NMAP command:

nmap -O example-target.com

5. Aggressive Scan (OS + Version + Scripts + Traceroute)

Prompt:

Run an aggressive scan on example-target.com that includes OS detection, version detection, script scanning, and traceroute. This is comprehensive but noisy.

What this does: Combines multiple detection techniques for maximum information.

Expected NMAP command:

nmap -A example-target.com

6. Vulnerability Scanning with NSE Scripts

Prompt:

Scan example-target.com using NMAP's vulnerability detection scripts to check for known CVEs and security issues in running services.

What this does: Uses NSE scripts from the ‘vuln’ category to detect known vulnerabilities.

Expected NMAP command:

nmap --script vuln example-target.com

7. SSL/TLS Security Analysis

Prompt:

Analyze SSL/TLS configuration on example-target.com (port 443). Check for weak ciphers, certificate issues, and SSL vulnerabilities like Heartbleed and POODLE.

What this does: Comprehensive SSL/TLS security assessment.

Expected NMAP command:

nmap -p 443 --script ssl-enum-ciphers,ssl-cert,ssl-heartbleed,ssl-poodle example-target.com

8. HTTP Security Headers and Vulnerabilities

Prompt:

Check example-target.com's web server (ports 80, 443, 8080) for security headers, common web vulnerabilities, and HTTP methods allowed.

What this does: Tests for missing security headers, dangerous HTTP methods, and common web flaws.

Expected NMAP command:

nmap -p 80,443,8080 --script http-security-headers,http-methods,http-csrf,http-stored-xss example-target.com

Prompt:

Scan example-target.com for SMB vulnerabilities including MS17-010 (EternalBlue), SMB signing issues, and accessible shares.

What this does: Critical for identifying Windows systems vulnerable to ransomware exploits.

Expected NMAP command:

nmap -p 445 --script smb-vuln-ms17-010,smb-vuln-*,smb-enum-shares example-target.com

10. SQL Injection Testing

Prompt:

Test web applications on example-target.com (ports 80, 443) for SQL injection vulnerabilities in common web paths and parameters.

What this does: Identifies potential SQL injection points.

Expected NMAP command:

nmap -p 80,443 --script http-sql-injection example-target.com

11. DNS Zone Transfer Vulnerability

Prompt:

Test if example-target.com's DNS servers allow unauthorized zone transfers, which could leak internal network information.

What this does: Attempts AXFR zone transfer – a serious misconfiguration if allowed.

Expected NMAP command:

nmap --script dns-zone-transfer --script-args dns-zone-transfer.domain=example-target.com -p 53 example-target.com

12. SSH Security Assessment

Prompt:

Analyze SSH configuration on example-target.com (port 22). Check for weak encryption algorithms, host keys, and authentication methods.

What this does: Identifies insecure SSH configurations.

Expected NMAP command:

nmap -p 22 --script ssh-auth-methods,ssh-hostkey,ssh2-enum-algos example-target.com

Prompt:

Check if example-target.com's FTP server (port 21) allows anonymous login and scan for FTP-related vulnerabilities.

What this does: Tests for anonymous FTP access and common FTP security issues.

Expected NMAP command:

nmap -p 21 --script ftp-anon,ftp-vuln-cve2010-4221,ftp-bounce example-target.com

Prompt:

Scan example-target.com's email servers (ports 25, 110, 143, 587, 993, 995) for open relays, STARTTLS support, and vulnerabilities.

What this does: Comprehensive email server security check.

Expected NMAP command:

nmap -p 25,110,143,587,993,995 --script smtp-open-relay,smtp-enum-users,ssl-cert example-target.com

15. Database Server Exposure

Prompt:

Check if example-target.com has publicly accessible database servers (MySQL, PostgreSQL, MongoDB, Redis) and test for default credentials.

What this does: Identifies exposed databases, a critical security issue.

Expected NMAP command:

nmap -p 3306,5432,27017,6379 --script mysql-empty-password,pgsql-brute,mongodb-databases,redis-info example-target.com

16. WordPress Security Scan

Prompt:

If example-target.com runs WordPress, enumerate plugins, themes, and users, and check for known vulnerabilities.

What this does: WordPress-specific security assessment.

Expected NMAP command:

nmap -p 80,443 --script http-wordpress-enum,http-wordpress-users example-target.com

17. XML External Entity (XXE) Vulnerability

Prompt:

Test web services on example-target.com for XML External Entity (XXE) injection vulnerabilities.

What this does: Identifies XXE flaws in XML parsers.

Expected NMAP command:

nmap -p 80,443 --script http-vuln-cve2017-5638 example-target.com

18. SNMP Information Disclosure

Prompt:

Scan example-target.com for SNMP services (UDP port 161) and attempt to extract system information using common community strings.

What this does: SNMP can leak sensitive system information.

Expected NMAP command:

nmap -sU -p 161 --script snmp-brute,snmp-info example-target.com

19. RDP Security Assessment

Prompt:

Check if Remote Desktop Protocol (RDP) on example-target.com (port 3389) is vulnerable to known exploits like BlueKeep (CVE-2019-0708).

What this does: Critical Windows remote access security check.

Expected NMAP command:

nmap -p 3389 --script rdp-vuln-ms12-020,rdp-enum-encryption example-target.com

20. API Endpoint Discovery and Testing

Prompt:

Discover API endpoints on example-target.com and test for common API vulnerabilities including authentication bypass and information disclosure.

What this does: Identifies REST APIs and tests for common API security issues.

Expected NMAP command:

nmap -p 80,443,8080,8443 --script http-methods,http-auth-finder,http-devframework example-target.com

Part 4: Deep Dive Exercises

Deep Dive Exercise 1: Complete Web Application Security Assessment

Scenario: You need to perform a comprehensive security assessment of a web application running at webapp.example-target.com.

Claude Prompt:

I need a complete security assessment of webapp.example-target.com. Please:

1. First, discover all open ports and running services
2. Identify the web server software and version
3. Check for SSL/TLS vulnerabilities and certificate issues
4. Test for common web vulnerabilities (XSS, SQLi, CSRF)
5. Check security headers (CSP, HSTS, X-Frame-Options, etc.)
6. Enumerate web directories and interesting files
7. Test for backup file exposure (.bak, .old, .zip)
8. Check for sensitive information in robots.txt and sitemap.xml
9. Test HTTP methods for dangerous verbs (PUT, DELETE, TRACE)
10. Provide a prioritized summary of findings with remediation advice

Use timing template T3 (normal) to avoid overwhelming the target.

What Claude will do:

Claude will execute multiple NMAP scans in sequence, starting with discovery and progressively getting more detailed. Example commands it might run:

# Phase 1: Discovery
nmap -sV -T3 webapp.example-target.com

# Phase 2: SSL/TLS Analysis
nmap -p 443 -T3 --script ssl-cert,ssl-enum-ciphers,ssl-known-key,ssl-heartbleed,ssl-poodle,ssl-ccs-injection webapp.example-target.com

# Phase 3: Web Vulnerability Scanning
nmap -p 80,443 -T3 --script http-security-headers,http-csrf,http-sql-injection,http-stored-xss,http-dombased-xss webapp.example-target.com

# Phase 4: Directory and File Enumeration
nmap -p 80,443 -T3 --script http-enum,http-backup-finder webapp.example-target.com

# Phase 5: HTTP Methods Testing
nmap -p 80,443 -T3 --script http-methods --script-args http-methods.test-all webapp.example-target.com

Learning Outcomes:

  • Understanding layered security assessment methodology
  • How to interpret multiple scan results holistically
  • Prioritization of security findings by severity
  • Claude’s ability to correlate findings across multiple scans

Deep Dive Exercise 2: Network Perimeter Reconnaissance

Scenario: You’re assessing the security perimeter of an organization with the domain company.example-target.com and a known IP range 198.51.100.0/24.

Claude Prompt:

Perform comprehensive network perimeter reconnaissance for company.example-target.com (IP range 198.51.100.0/24). I need to:

1. Discover all live hosts in the IP range
2. For each live host, identify:
   - Operating system
   - All open ports (full 65535 range)
   - Service versions
   - Potential vulnerabilities
3. Map the network topology and identify:
   - Firewalls and filtering
   - DMZ hosts vs internal hosts
   - Critical infrastructure (DNS, mail, web servers)
4. Test for common network misconfigurations:
   - Open DNS resolvers
   - Open mail relays
   - Unauthenticated database access
   - Unencrypted management protocols (Telnet, FTP)
5. Provide a network map and executive summary

Use slow timing (T2) to minimize detection risk and avoid false positives.

What Claude will do:

# Phase 1: Host Discovery
nmap -sn -T2 198.51.100.0/24

# Phase 2: OS Detection on Live Hosts
nmap -O -T2 198.51.100.0/24

# Phase 3: Comprehensive Port Scan (may suggest splitting into chunks)
nmap -p- -T2 198.51.100.0/24

# Phase 4: Service Version Detection
nmap -sV -T2 198.51.100.0/24

# Phase 5: Specific Service Checks
nmap -p 53 --script dns-recursion 198.51.100.0/24
nmap -p 25 --script smtp-open-relay 198.51.100.0/24
nmap -p 3306,5432,27017 --script mysql-empty-password,pgsql-brute,mongodb-databases 198.51.100.0/24
nmap -p 23,21 198.51.100.0/24

# Phase 6: Vulnerability Scanning on Critical Hosts
nmap --script vuln -T2 [critical-hosts]

Learning Outcomes:

  • Large-scale network scanning strategies
  • How to handle and analyze results from multiple hosts
  • Network segmentation analysis
  • Risk assessment across an entire network perimeter
  • Understanding firewall and filtering detection

Deep Dive Exercise 3: Advanced Vulnerability Research – Zero-Day Hunting

Scenario: You’ve discovered a host running potentially vulnerable services and want to do deep reconnaissance to identify potential zero-day vulnerabilities or chained exploits.

Claude Prompt:

I've found a server at secure-server.example-target.com that's running multiple services. I need advanced vulnerability research:

1. Aggressive version fingerprinting of all services
2. Check for version-specific CVEs in detected software
3. Look for unusual port combinations that might indicate custom applications
4. Test for default credentials on all identified services
5. Check for known backdoors in the detected software versions
6. Test for authentication bypass vulnerabilities
7. Look for information disclosure issues (version strings, debug info, error messages)
8. Test for timing attacks and race conditions
9. Analyze for possible exploit chains (e.g., LFI -> RCE)
10. Provide detailed analysis with CVSS scores and exploit availability

Run this aggressively (-T4) as we have permission for intensive testing.

What Claude will do:

# Phase 1: Aggressive Fingerprinting
nmap -sV --version-intensity 9 -T4 secure-server.example-target.com

# Phase 2: All Vulnerability Scripts
nmap --script vuln,exploit -T4 secure-server.example-target.com

# Phase 3: Default Credential Testing
nmap --script auth -T4 secure-server.example-target.com

# Phase 4: Backdoor Detection
nmap --script backdoor-check,unusual-port -T4 secure-server.example-target.com

# Phase 5: Authentication Testing
nmap --script auth-bypass,brute -T4 secure-server.example-target.com

# Phase 6: Information Disclosure
nmap --script banner,http-errors,http-git,http-svn-enum -T4 secure-server.example-target.com

# Phase 7: Service-Specific Deep Dives
# (Claude will run targeted scripts based on discovered services)

After scans, Claude will:

  • Cross-reference detected versions with CVE databases
  • Explain potential exploit chains
  • Provide PoC (Proof of Concept) suggestions
  • Recommend remediation priorities
  • Suggest additional manual testing techniques

Learning Outcomes:

  • Advanced NSE scripting capabilities
  • How to correlate vulnerabilities for exploit chains
  • Understanding vulnerability severity and exploitability
  • Version-specific vulnerability research
  • Claude’s ability to provide context from its training data about specific CVEs

Part 5: Wide-Ranging Reconnaissance Exercises

Exercise 5.1: Subdomain Discovery and Mapping

Prompt:

Help me discover all subdomains of example-target.com and create a complete map of their infrastructure. For each subdomain found:
- Resolve its IP addresses
- Check if it's hosted on the same infrastructure
- Identify the services running
- Note any interesting or unusual findings

Also check for common subdomain patterns like api, dev, staging, admin, etc.

What this reveals: Shadow IT, forgotten dev servers, API endpoints, and the organization’s infrastructure footprint.

Exercise 5.2: API Security Testing

Prompt:

I've found an API at api.example-target.com. Please:
1. Identify the API type (REST, GraphQL, SOAP)
2. Discover all available endpoints
3. Test authentication mechanisms
4. Check for rate limiting
5. Test for IDOR (Insecure Direct Object References)
6. Look for excessive data exposure
7. Test for injection vulnerabilities
8. Check API versioning and test old versions for vulnerabilities
9. Verify CORS configuration
10. Test for JWT vulnerabilities if applicable

Exercise 5.3: Cloud Infrastructure Detection

Prompt:

Scan example-target.com to identify if they're using cloud infrastructure (AWS, Azure, GCP). Look for:
- Cloud-specific IP ranges
- S3 buckets or blob storage
- Cloud-specific services (CloudFront, Azure CDN, etc.)
- Misconfigured cloud resources
- Storage bucket permissions
- Cloud metadata services exposure

Exercise 5.4: IoT and Embedded Device Discovery

Prompt:

Scan the network 192.168.1.0/24 for IoT and embedded devices such as:
- IP cameras
- Smart TVs
- Printers
- Network attached storage (NAS)
- Home automation systems
- Industrial control systems (ICS/SCADA if applicable)

Check each device for:
- Default credentials
- Outdated firmware
- Unencrypted communications
- Exposed management interfaces

Exercise 5.5: Checking for Known Vulnerabilities and Old Software

Prompt:

Perform a comprehensive audit of example-target.com focusing on outdated and vulnerable software:

1. Detect exact versions of all running services
2. For each service, check if it's end-of-life (EOL)
3. Identify known CVEs for each version detected
4. Prioritize findings by:
   - CVSS score
   - Exploit availability
   - Exposure (internet-facing vs internal)
5. Check for:
   - Outdated TLS/SSL versions
   - Deprecated cryptographic algorithms
   - Unpatched web frameworks
   - Old CMS versions (WordPress, Joomla, Drupal)
   - Legacy protocols (SSLv3, TLS 1.0, weak ciphers)
6. Generate a remediation roadmap with version upgrade recommendations

Expected approach:

# Detailed version detection
nmap -sV --version-intensity 9 example-target.com

# Check for versionable services
nmap --script version,http-server-header,http-generator example-target.com

# SSL/TLS testing
nmap -p 443 --script ssl-cert,ssl-enum-ciphers,sslv2,ssl-date example-target.com

# CMS detection
nmap -p 80,443 --script http-wordpress-enum,http-joomla-brute,http-drupal-enum example-target.com

Claude will then analyze the results and provide:

  • A table of detected software with current versions and latest versions
  • CVE listings with severity scores
  • Specific upgrade recommendations
  • Risk assessment for each finding

Part 6: Advanced Tips and Techniques

6.1 Optimizing Scan Performance

Timing Templates:

  • -T0 (Paranoid): Extremely slow, for IDS evasion
  • -T1 (Sneaky): Slow, minimal detection risk
  • -T2 (Polite): Slower, less bandwidth intensive
  • -T3 (Normal): Default, balanced approach
  • -T4 (Aggressive): Faster, assumes good network
  • -T5 (Insane): Extremely fast, may miss results

Prompt:

Explain when to use each NMAP timing template and demonstrate the difference by scanning example-target.com with T2 and T4 timing.

6.2 Evading Firewalls and IDS

Prompt:

Scan example-target.com using techniques to evade firewalls and intrusion detection systems:
- Fragment packets
- Use decoy IP addresses
- Randomize scan order
- Use idle scan if possible
- Spoof MAC address (if on local network)
- Use source port 53 or 80 to bypass egress filtering

Expected command examples:

# Fragmented packets
nmap -f example-target.com

# Decoy scan
nmap -D RND:10 example-target.com

# Randomize hosts
nmap --randomize-hosts example-target.com

# Source port spoofing
nmap --source-port 53 example-target.com

6.3 Creating Custom NSE Scripts with Claude

Prompt:

Help me create a custom NSE script that checks for a specific vulnerability in our custom application running on port 8080. The vulnerability is that the /debug endpoint returns sensitive configuration data without authentication.

Claude can help you write Lua scripts for NMAP’s scripting engine!

6.4 Output Parsing and Reporting

Prompt:

Scan example-target.com and save results in all available formats (normal, XML, grepable, script kiddie). Then help me parse the XML output to extract just the critical and high severity findings for a report.

Expected command:

nmap -oA scan_results example-target.com

Claude can then help you parse the XML file programmatically.

Part 7: Responsible Disclosure and Next Steps

After Finding Vulnerabilities

  1. Document everything: Keep detailed records of your findings
  2. Prioritize by risk: Use CVSS scores and business impact
  3. Responsible disclosure: Follow the organization’s security policy
  4. Remediation tracking: Help create an action plan
  5. Verify fixes: Re-test after patches are applied

Using Claude for Post-Scan Analysis

Prompt:

I've completed my NMAP scans and found 15 vulnerabilities. Here are the results: [paste scan output]. 

Please:
1. Categorize by severity (Critical, High, Medium, Low, Info)
2. Explain each vulnerability in business terms
3. Provide remediation steps for each
4. Suggest a remediation priority order
5. Draft an executive summary for management
6. Create technical remediation tickets for the engineering team

Claude excels at translating technical scan results into actionable business intelligence.

Part 8: Continuous Monitoring with NMAP and Claude

Set up regular scanning routines and use Claude to track changes:

Prompt:

Create a baseline scan of example-target.com and save it. Then help me set up a cron job (or scheduled task) to run weekly scans and alert me to any changes in:
- New open ports
- Changed service versions
- New hosts discovered
- Changes in vulnerabilities detected

Conclusion

Combining NMAP’s powerful network scanning capabilities with Claude’s AI-driven analysis creates a formidable security assessment toolkit. The Model Context Protocol bridges these tools seamlessly, allowing you to:

  • Express complex scanning requirements in natural language
  • Get intelligent interpretation of scan results
  • Receive contextual security advice
  • Automate repetitive reconnaissance tasks
  • Learn security concepts through interactive exploration

Key Takeaways:

  1. Always get permission before scanning any network or system
  2. Start with gentle scans and progressively get more aggressive
  3. Use timing controls to avoid overwhelming targets or triggering alarms
  4. Correlate multiple scans for a complete security picture
  5. Leverage Claude’s knowledge to interpret results and suggest next steps
  6. Document everything for compliance and knowledge sharing
  7. Keep NMAP updated to benefit from the latest scripts and capabilities

The examples provided in this guide demonstrate just a fraction of what’s possible when combining NMAP with AI assistance. As you become more comfortable with this workflow, you’ll discover new ways to leverage Claude’s understanding to make your security assessments more efficient and comprehensive.

Additional Resources

About the Author: This guide was created to help security professionals and system administrators leverage AI assistance for more effective network reconnaissance and vulnerability assessment.

Last Updated: 2025-11-21

Version: 1.0

Deep Dive into PostgreSQL Prepared Statements: When Plan Caching Goes Wrong leading to Memory Exhaustion

Prepared statements are one of PostgreSQL’s most powerful features for query optimization. By parsing and planning queries once, then reusing those plans for subsequent executions, they can dramatically improve performance. But this optimization comes with a hidden danger: sometimes caching the same plan for every execution can lead to catastrophic memory exhaustion and performance degradation.

In this deep dive, we’ll explore how prepared statement plan caching works, when it fails spectacularly, and how PostgreSQL has evolved to address these challenges.

1. Understanding Prepared Statements and Plan Caching

When you execute a prepared statement in PostgreSQL, the database goes through several phases:

  1. Parsing: Converting the SQL text into a parse tree
  2. Planning: Creating an execution plan based on statistics and parameters
  3. Execution: Running the plan against actual data

The promise of prepared statements is simple: do steps 1 and 2 once, then reuse the results for repeated executions with different parameter values.

-- Prepare the statement
PREPARE get_orders AS
SELECT * FROM orders WHERE customer_id = $1;

-- Execute multiple times with different parameters
EXECUTE get_orders(123);
EXECUTE get_orders(456);
EXECUTE get_orders(789);

PostgreSQL uses a clever heuristic to decide when to cache plans. For the first five executions, it creates a custom plan specific to the parameter values. Starting with the sixth execution, it evaluates whether a generic plan (one that works for any parameter value) would be more efficient. If the average cost of the custom plans is close enough to the generic plan’s cost, PostgreSQL switches to reusing the generic plan.

2. The Dark Side: Memory Exhaustion from Plan Caching

Here’s where things can go catastrophically wrong. Consider a partitioned table:

CREATE TABLE events (
    id BIGSERIAL,
    event_date DATE,
    user_id INTEGER,
    event_type TEXT,
    data JSONB
) PARTITION BY RANGE (event_date);

-- Create 365 partitions, one per day
CREATE TABLE events_2024_01_01 PARTITION OF events
    FOR VALUES FROM ('2024-01-01') TO ('2024-01-02');
CREATE TABLE events_2024_01_02 PARTITION OF events
    FOR VALUES FROM ('2024-01-02') TO ('2024-01-03');
-- ... 363 more partitions

Now consider this prepared statement:

PREPARE get_events AS
SELECT * FROM events WHERE event_date = $1;

The Problem: Generic Plans Can’t Prune Partitions

When PostgreSQL creates a generic plan for this query, it doesn’t know which specific date you’ll query at execution time. Without this knowledge, the planner cannot perform partition pruning the critical optimization that eliminates irrelevant partitions from consideration.

Here’s what happens:

  1. Custom plan (first 5 executions): PostgreSQL sees the actual date value, realizes only one partition is relevant, and creates a plan that touches only that partition. Fast and efficient.
  2. Generic plan (6th execution onward): PostgreSQL creates a plan that must be valid for ANY date value. Since it can’t know which partition you’ll need, it includes ALL 365 partitions in the plan.

The result: Instead of reading from 1 partition, PostgreSQL’s generic plan prepares to read from all 365 partitions. This leads to:

  • Memory exhaustion: The query plan itself becomes enormous, containing nodes for every partition
  • Planning overhead: Even though the plan is cached, initializing it for execution requires allocating memory for all partition nodes
  • Execution inefficiency: The executor must check every partition, even though 364 of them will return zero rows

In extreme cases with thousands of partitions, this can consume gigabytes of memory per connection and bring your database to its knees.

3. Partition Pruning: The Critical Optimization and How It Works

Partition pruning is the process of eliminating partitions that cannot possibly contain relevant data based on query constraints. Understanding partition pruning in depth is essential for working with partitioned tables effectively.

3.1 What Is Partition Pruning?

At its core, partition pruning is PostgreSQL’s mechanism for avoiding unnecessary work. When you query a partitioned table, the database analyzes your WHERE clause and determines which partitions could possibly contain matching rows. All other partitions are excluded from the query execution entirely.

Consider a table partitioned by date range:

CREATE TABLE sales (
    sale_id BIGINT,
    sale_date DATE,
    amount NUMERIC,
    product_id INTEGER
) PARTITION BY RANGE (sale_date);

CREATE TABLE sales_2023_q1 PARTITION OF sales
    FOR VALUES FROM ('2023-01-01') TO ('2023-04-01');
    
CREATE TABLE sales_2023_q2 PARTITION OF sales
    FOR VALUES FROM ('2023-04-01') TO ('2023-07-01');
    
CREATE TABLE sales_2023_q3 PARTITION OF sales
    FOR VALUES FROM ('2023-07-01') TO ('2023-10-01');
    
CREATE TABLE sales_2023_q4 PARTITION OF sales
    FOR VALUES FROM ('2023-10-01') TO ('2024-01-01');

When you execute:

SELECT * FROM sales WHERE sale_date = '2023-05-15';

PostgreSQL performs partition pruning by examining the partition constraints. It determines that only sales_2023_q2 can contain rows with sale_date = ‘2023-05-15’, so it completely ignores the other three partitions. They never get opened, scanned, or loaded into memory.

3.2 The Two Stages of Partition Pruning

PostgreSQL performs partition pruning at two distinct stages in query execution, and understanding the difference is crucial for troubleshooting performance issues.

Stage 1: Plan Time Pruning (Static Pruning)

Plan time pruning happens during the query planning phase, before execution begins. This is the ideal scenario because pruned partitions never appear in the execution plan at all.

When it occurs:

  • The query contains literal values in the WHERE clause
  • The partition key columns are directly compared to constants
  • The planner can evaluate the partition constraints at planning time

Example:

EXPLAIN SELECT * FROM sales WHERE sale_date = '2023-05-15';

Output might show:

Seq Scan on sales_2023_q2 sales
  Filter: (sale_date = '2023-05-15'::date)

Notice that only one partition appears in the plan. The other three partitions were pruned away during planning, and they consume zero resources.

What makes plan time pruning possible:

The planner evaluates the WHERE clause condition against each partition’s constraint. For sales_2023_q2, the constraint is:

sale_date >= '2023-04-01' AND sale_date < '2023-07-01'

The planner performs boolean logic: “Can sale_date = ‘2023-05-15’ be true if the constraint requires sale_date >= ‘2023-04-01’ AND sale_date < ‘2023-07-01’?” Yes, it can. For the other partitions, the answer is no, so they’re eliminated.

Performance characteristics:

  • No runtime overhead for pruned partitions
  • Minimal memory usage
  • Optimal query performance
  • The execution plan is lean and specific

Stage 2: Execution Time Pruning (Dynamic Pruning)

Execution time pruning, also called runtime pruning, happens during query execution rather than planning. This occurs when the planner cannot determine which partitions to prune until the query actually runs.

When it occurs:

  • Parameters or variables are used instead of literal values
  • Subqueries provide the filter values
  • Join conditions determine which partitions are needed
  • Prepared statements with parameters

Example:

PREPARE get_sales AS 
SELECT * FROM sales WHERE sale_date = $1;

EXPLAIN (ANALYZE) EXECUTE get_sales('2023-05-15');

With execution time pruning, the plan initially includes all partitions, but the output shows:

Append (actual rows=100)
  Subplans Removed: 3
  -> Seq Scan on sales_2023_q2 sales (actual rows=100)
       Filter: (sale_date = '2023-05-15'::date)

The key indicator is “Subplans Removed: 3”, which tells you that three partitions were pruned at execution time.

How execution time pruning works:

During the initialization phase of query execution, PostgreSQL evaluates the actual parameter values and applies the same constraint checking logic as plan time pruning. However, instead of eliminating partitions from the plan, it marks them as “pruned” and skips their initialization and execution.

The critical difference:

Even though execution time pruning skips scanning the pruned partitions, the plan still contains nodes for all partitions. This means:

  • Memory is allocated for all partition nodes (though less than full initialization)
  • The plan structure is larger
  • There is a small runtime cost to check each partition
  • More complex bookkeeping is required

This is why execution time pruning, while much better than no pruning, is not quite as efficient as plan time pruning.

3.3 Partition Pruning with Different Partition Strategies

PostgreSQL supports multiple partitioning strategies, and pruning works differently for each.

Range Partitioning

Range partitioning is the most common and supports the most effective pruning:

CREATE TABLE measurements (
    measurement_time TIMESTAMPTZ,
    sensor_id INTEGER,
    value NUMERIC
) PARTITION BY RANGE (measurement_time);

Pruning logic: PostgreSQL uses range comparison. Given a filter like measurement_time >= '2024-01-01' AND measurement_time < '2024-02-01', it identifies all partitions whose ranges overlap with this query range.

Pruning effectiveness: Excellent. Range comparisons are computationally cheap and highly selective.

List Partitioning

List partitioning groups rows by discrete values:

CREATE TABLE orders (
    order_id BIGINT,
    country_code TEXT,
    amount NUMERIC
) PARTITION BY LIST (country_code);

CREATE TABLE orders_us PARTITION OF orders
    FOR VALUES IN ('US');
    
CREATE TABLE orders_uk PARTITION OF orders
    FOR VALUES IN ('UK');
    
CREATE TABLE orders_eu PARTITION OF orders
    FOR VALUES IN ('DE', 'FR', 'IT', 'ES');

Pruning logic: PostgreSQL checks if the query value matches any value in each partition’s list.

SELECT * FROM orders WHERE country_code = 'FR';

Only orders_eu is accessed because ‘FR’ appears in its value list.

Pruning effectiveness: Very good for equality comparisons. Less effective for OR conditions across many values or pattern matching.

Hash Partitioning

Hash partitioning distributes rows using a hash function:

CREATE TABLE users (
    user_id BIGINT,
    username TEXT,
    email TEXT
) PARTITION BY HASH (user_id);

CREATE TABLE users_p0 PARTITION OF users
    FOR VALUES WITH (MODULUS 4, REMAINDER 0);
    
CREATE TABLE users_p1 PARTITION OF users
    FOR VALUES WITH (MODULUS 4, REMAINDER 1);
-- ... p2, p3

Pruning logic: PostgreSQL computes the hash of the query value and determines which partition it maps to.

SELECT * FROM users WHERE user_id = 12345;

PostgreSQL calculates hash(12345) % 4 and accesses only the matching partition.

Pruning effectiveness: Excellent for equality on the partition key. Completely ineffective for range queries, pattern matching, or anything except exact equality matches.

3.4 Complex Partition Pruning Scenarios

Real world queries are often more complex than simple equality comparisons. Here’s how pruning handles various scenarios:

Multi Column Partition Keys

CREATE TABLE events (
    event_date DATE,
    region TEXT,
    data JSONB
) PARTITION BY RANGE (event_date, region);

Pruning works on the leading columns of the partition key. A query filtering only on event_date can still prune effectively. A query filtering only on region cannot prune at all because region is not the leading column.

OR Conditions

SELECT * FROM sales 
WHERE sale_date = '2023-05-15' OR sale_date = '2023-08-20';

PostgreSQL must access partitions for both dates (Q2 and Q3), so it keeps both and prunes Q1 and Q4. OR conditions reduce pruning effectiveness.

Inequality Comparisons

SELECT * FROM sales WHERE sale_date >= '2023-05-01';

PostgreSQL prunes partitions entirely before the date (Q1) but must keep all partitions from Q2 onward. Range queries reduce pruning selectivity.

Joins Between Partitioned Tables

SELECT * FROM sales s
JOIN products p ON s.product_id = p.product_id
WHERE s.sale_date = '2023-05-15';

If sales is partitioned by sale_date, that partition pruning works normally. If products is also partitioned, PostgreSQL attempts partitionwise joins where possible, enabling pruning on both sides.

Subqueries Providing Values

SELECT * FROM sales 
WHERE sale_date = (SELECT MAX(order_date) FROM orders);

This requires execution time pruning because the subquery must run before PostgreSQL knows which partition to access.

3.5 Monitoring Partition Pruning

To verify partition pruning is working, use EXPLAIN:

EXPLAIN (ANALYZE, BUFFERS) 
SELECT * FROM sales WHERE sale_date = '2023-05-15';

What to look for:

Plan time pruning succeeded:

Seq Scan on sales_2023_q2

Only one partition appears in the plan at all.

Execution time pruning succeeded:

Append
  Subplans Removed: 3
  -> Seq Scan on sales_2023_q2

All partitions appear in the plan structure, but “Subplans Removed” shows pruning happened.

No pruning occurred:

Append
  -> Seq Scan on sales_2023_q1
  -> Seq Scan on sales_2023_q2
  -> Seq Scan on sales_2023_q3
  -> Seq Scan on sales_2023_q4

All partitions were scanned. This indicates a problem.

3.6 Why Partition Pruning Fails

Understanding why pruning fails helps you fix it:

  1. Query doesn’t filter on partition key: If your WHERE clause doesn’t reference the partition column(s), PostgreSQL cannot prune.
  2. Function calls on partition key: WHERE EXTRACT(YEAR FROM sale_date) = 2023 prevents pruning because PostgreSQL can’t map the function result back to partition ranges. Use WHERE sale_date >= '2023-01-01' AND sale_date < '2024-01-01' instead.
  3. Type mismatches: If your partition key is DATE but you compare to TEXT without explicit casting, pruning may fail.
  4. Generic plans in prepared statements: As discussed in the main article, generic plans prevent plan time pruning and older PostgreSQL versions struggled with execution time pruning.
  5. OR conditions with non-partition columns: WHERE sale_date = '2023-05-15' OR customer_id = 100 prevents pruning because customer_id isn’t the partition key.
  6. Volatile functions: WHERE sale_date = CURRENT_DATE may prevent plan time pruning (but should work with execution time pruning).

3.7 Partition Pruning Performance Impact

The performance difference between pruned and unpruned queries can be staggering:

Example scenario: 1000 partitions, each with 1 million rows. Query targets one partition.

With pruning:

  • Partitions opened: 1
  • Rows scanned: 1 million
  • Memory for plan nodes: ~10KB
  • Query time: 50ms

Without pruning:

  • Partitions opened: 1000
  • Rows scanned: 1 billion (returning ~1 million)
  • Memory for plan nodes: ~10MB
  • Query time: 45 seconds

The difference is not incremental; it’s exponential as partition count grows.

4. Partition Pruning in Prepared Statements: The Core Problem

Let me illustrate the severity with a real-world scenario:

-- Table with 1000 partitions
CREATE TABLE metrics (
    timestamp TIMESTAMPTZ,
    metric_name TEXT,
    value NUMERIC
) PARTITION BY RANGE (timestamp);

-- Create 1000 daily partitions...

-- Prepared statement
PREPARE get_metrics AS
SELECT * FROM metrics 
WHERE timestamp >= $1 AND timestamp < $2;

After the 6th execution, PostgreSQL switches to a generic plan. Each subsequent execution:

  1. Allocates memory for 1000 partition nodes
  2. Initializes executor state for 1000 partitions
  3. Checks 1000 partition constraints
  4. Returns data from just 1-2 partitions

If you have 100 connections each executing this prepared statement, you’re multiplying this overhead by 100. With connection poolers reusing connections (and thus reusing prepared statements), the problem compounds.

6. The Fix: Evolution Across PostgreSQL Versions

PostgreSQL has steadily improved partition pruning for prepared statements:

PostgreSQL 11: Execution-Time Pruning Introduced

PostgreSQL 11 introduced run-time partition pruning, but it had significant limitations with prepared statements. Generic plans still included all partitions in memory, even if they could be skipped during execution.

PostgreSQL 12: Better Prepared Statement Pruning

PostgreSQL 12 made substantial improvements:

  • Generic plans gained the ability to defer partition pruning to execution time more effectively
  • The planner became smarter about when to use generic vs. custom plans for partitioned tables
  • Memory consumption for generic plans improved significantly

However, issues remained in edge cases, particularly with: • Multi level partitioning • Complex join queries involving partitioned tables • Prepared statements in stored procedures

PostgreSQL 13-14: Refined Heuristics

These versions improved the cost model for deciding between custom and generic plans:

  • Better accounting for partition pruning benefits in the cost calculation
  • More accurate statistics gathering on partitioned tables
  • Improved handling of partitionwise joins

PostgreSQL 15-16: The Real Game Changers

PostgreSQL 15 and 16 brought transformative improvements:

PostgreSQL 15:

  • Dramatically reduced memory usage for generic plans on partitioned tables
  • Improved execution-time pruning performance
  • Better handling of prepared statements with partition pruning

PostgreSQL 16:

  • Introduced incremental sorting improvements that benefit partitioned queries
  • Enhanced partition-wise aggregation
  • More aggressive execution-time pruning

The key breakthrough: PostgreSQL now builds “stub” plans that allocate minimal memory for partitions that will be pruned, rather than fully initializing all partition nodes.

Workarounds for Older Versions

If you’re stuck on older PostgreSQL versions, here are strategies to avoid the prepared statement pitfall:

1. Disable Generic Plans

Force PostgreSQL to always use custom plans:

-- Set at session level
SET plan_cache_mode = force_custom_plan;

-- Or for specific prepared statement contexts
PREPARE get_events AS
SELECT * FROM events WHERE event_date = $1;

-- Before execution
SET LOCAL plan_cache_mode = force_custom_plan;
EXECUTE get_events('2024-06-15');

This sacrifices the planning time savings but ensures proper partition pruning.

2. Use Statement Level Caching Instead

Many ORMs and database drivers offer statement level caching that doesn’t persist across multiple executions:

# psycopg2 example - named cursors create server-side cursors
# but don't persist plans
cursor = connection.cursor()
cursor.execute(
    "SELECT * FROM events WHERE event_date = %s",
    (date_value,)
)

3. Adjust plan_cache_mode Per Query

PostgreSQL 12+ provides plan_cache_mode:

-- auto (default): use PostgreSQL's heuristics
-- force_generic_plan: always use generic plan
-- force_custom_plan: always use custom plan

SET plan_cache_mode = force_custom_plan;

For partitioned tables, force_custom_plan is often the right choice.

4. Increase Custom Plan Count

The threshold of 5 custom plans before switching to generic is hardcoded, but you can work around it by using different prepared statement names or by periodically deallocating and recreating prepared statements:

DEALLOCATE get_events;
PREPARE get_events AS SELECT * FROM events WHERE event_date = $1;

5. Partition Pruning Hints

In PostgreSQL 12+, you can sometimes coerce the planner into better behavior:

-- Using an explicit constraint that helps the planner
SELECT * FROM events 
WHERE event_date = $1 
  AND event_date >= CURRENT_DATE - INTERVAL '1 year';

This additional constraint provides a hint about the parameter range.

Best Practices

  1. Monitor your query plans: Use EXPLAIN (ANALYZE, BUFFERS) to check if partition pruning is happening:
EXPLAIN (ANALYZE, BUFFERS) EXECUTE get_events('2024-06-15');

Look for “Partitions removed” in the output.

  1. Check prepared statement statistics: Query pg_prepared_statements to see generic vs. custom plan usage:
SELECT name, 
       generic_plans, 
       custom_plans 
FROM pg_prepared_statements;
  1. Upgrade PostgreSQL: If you’re dealing with large partitioned tables, the improvements in PostgreSQL 15+ are worth the upgrade effort.
  2. Design partitions appropriately: Don’t over-partition. Having 10,000 tiny partitions creates problems even with perfect pruning.
  3. Use connection pooling wisely: Prepared statements persist per connection. With connection pooling, long-lived connections accumulate many prepared statements. Configure your pooler to occasionally recycle connections.
  4. Benchmark both modes: Test your specific workload with both custom and generic plans to measure the actual impact.

Conclusion

Prepared statements are a powerful optimization, but their interaction with partitioned tables exposes a fundamental tension: caching for reuse versus specificity for efficiency. PostgreSQL’s evolution from version 11 through 16 represents a masterclass in addressing this challenge.

The key takeaway: if you’re using prepared statements with partitioned tables on PostgreSQL versions older than 15, be vigilant about plan caching behavior. Monitor memory usage, check execution plans, and don’t hesitate to force custom plans when generic plans cause problems.

For modern PostgreSQL installations (15+), the improvements are substantial enough that the traditional guidance of “be careful with prepared statements on partitioned tables” is becoming outdated. The database now handles these scenarios with far more intelligence and efficiency.

But understanding the history and mechanics remains crucial, because the next time you see mysterious memory growth in your PostgreSQL connections, you’ll know exactly where to look.

Stablecoins: A Comprehensive Guide

1. What Are Stablecoins?

Stablecoins are a type of cryptocurrency designed to maintain a stable value by pegging themselves to a reserve asset, typically a fiat currency like the US dollar. Unlike volatile cryptocurrencies such as Bitcoin or Ethereum, which can experience dramatic price swings, stablecoins aim to provide the benefits of digital currency without the price volatility.

The most common types of stablecoins include:

Fiat collateralized stablecoins are backed by traditional currencies held in reserve at a 1:1 ratio. Examples include Tether (USDT) and USD Coin (USDC), which maintain reserves in US dollars or dollar equivalent assets.

Crypto collateralized stablecoins use other cryptocurrencies as collateral, often over collateralized to account for volatility. DAI is a prominent example, backed by Ethereum and other crypto assets.

Algorithmic stablecoins attempt to maintain their peg through automated supply adjustments based on market demand, without traditional collateral backing. These have proven to be the most controversial and risky category.

2. Why Do Stablecoins Exist?

Stablecoins emerged to solve several critical problems in both traditional finance and the cryptocurrency ecosystem.

In the crypto world, they provide a stable store of value and medium of exchange. Traders use stablecoins to move in and out of volatile positions without converting back to fiat currency, avoiding the delays and fees associated with traditional banking. They serve as a safe harbor during market turbulence and enable seamless transactions across different blockchain platforms.

For cross border payments and remittances, stablecoins offer significant advantages over traditional methods. International transfers that typically take days and cost substantial fees can be completed in minutes for a fraction of the cost. This makes them particularly valuable for workers sending money to families in other countries or businesses conducting international trade.

Stablecoins also address financial inclusion challenges. In countries with unstable currencies or limited banking infrastructure, they provide access to a stable digital currency that can be held and transferred using just a smartphone. This opens up financial services to the unbanked and underbanked populations worldwide.

3. How Do Stablecoins Make Money?

Stablecoin issuers have developed several revenue models that can be remarkably profitable.

The primary revenue source for fiat backed stablecoins is interest on reserves. When issuers hold billions of dollars in US Treasury bills or other interest bearing assets backing their stablecoins, they earn substantial returns. For instance, with interest rates at 5%, a stablecoin issuer with $100 billion in reserves could generate $5 billion annually while still maintaining the 1:1 peg. Users typically receive no interest on their stablecoin holdings, allowing issuers to pocket the entire yield.

Transaction fees represent another revenue stream. While often minimal, the sheer volume of stablecoin transactions generates significant income. Some issuers charge fees for minting (creating) or redeeming stablecoins, particularly for large institutional transactions.

Premium services for institutional clients provide additional revenue. Banks, payment processors, and large enterprises often pay for faster settlement, higher transaction limits, dedicated support, and integration services.

Many stablecoin platforms also generate revenue through their broader ecosystem. This includes charging fees on decentralized exchanges, lending protocols, or other financial services built around the stablecoin.

3.1 The Pendle Revenue Model: Yield Trading Innovation

Pendle represents an innovative evolution in the DeFi stablecoin ecosystem through its yield trading protocol. Rather than issuing stablecoins directly, Pendle creates markets for trading future yield on stablecoin deposits and other interest bearing assets.

The Pendle revenue model operates through several mechanisms. The protocol charges trading fees on its automated market makers (AMMs), typically around 0.1% to 0.3% per swap. When users trade yield tokens on Pendle’s platform, a portion of these fees goes to the protocol treasury while another portion rewards liquidity providers who supply capital to the trading pools.

Pendle’s unique approach involves splitting interest bearing tokens into two components: the principal token (PT) representing the underlying asset, and the yield token (YT) representing the future interest. This separation allows sophisticated users to speculate on interest rates, hedge yield exposure, or lock in fixed returns on their stablecoin holdings.

The protocol generates revenue through swap fees, redemption fees when tokens mature, and potential governance token value capture as the protocol grows. This model demonstrates how stablecoin adjacent services can create profitable businesses by adding layers of financial sophistication on top of basic stablecoin infrastructure. Pendle particularly benefits during periods of high interest rates, when demand for yield trading increases and the potential returns from separating yield rights become more valuable.

4. Security and Fraud Concerns

Stablecoins face several critical security and fraud challenges that potential users and regulators must consider.

Reserve transparency and verification remain the most significant concern. Issuers must prove they actually hold the assets backing their stablecoins. Several controversies have erupted when stablecoin companies failed to provide clear, audited proof of reserves. The risk is that an issuer might not have sufficient backing, leading to a bank run scenario where the peg collapses and users cannot redeem their coins.

Smart contract vulnerabilities pose technical risks. Stablecoins built on blockchain platforms rely on code that, if flawed, can be exploited by hackers. Major hacks have resulted in hundreds of millions of dollars in losses, and once stolen, blockchain transactions are typically irreversible.

Regulatory uncertainty creates ongoing challenges. Different jurisdictions treat stablecoins differently, and the lack of clear, consistent regulation creates risks for both issuers and users. There’s potential for sudden regulatory action that could freeze assets or shut down operations.

Counterparty risk is inherent in centralized stablecoins. Users must trust the issuing company to maintain reserves, operate honestly, and remain solvent. If the company fails or acts fraudulently, users may lose their funds with limited recourse.

The algorithmic stablecoin model has proven particularly vulnerable. The catastrophic collapse of TerraUSD in 2022, which lost over $40 billion in value, demonstrated that algorithmic mechanisms can fail spectacularly under market stress, creating devastating losses for holders.

Money laundering and sanctions evasion concerns have drawn regulatory scrutiny. The pseudonymous nature of cryptocurrency transactions makes stablecoins attractive for illicit finance, though blockchain’s transparent ledger also makes transactions traceable with proper tools and cooperation.

4.1 Monitoring Stablecoin Flows

Effective monitoring of stablecoin flows has become critical for financial institutions, regulators, and the issuers themselves to ensure compliance, detect fraud, and understand market dynamics.

On Chain Analytics Tools provide the foundation for stablecoin monitoring. Since most stablecoins operate on public blockchains, every transaction is recorded and traceable. Companies like Chainalysis, Elliptic, and TRM Labs specialize in blockchain analytics, offering platforms that track stablecoin movements across wallets and exchanges. These tools can identify patterns, flag suspicious activities, and trace funds through complex transaction chains.

Real Time Transaction Monitoring systems alert institutions to potentially problematic flows. These systems track large transfers, unusual transaction patterns, rapid movement between exchanges (potentially indicating wash trading or manipulation), and interactions with known illicit addresses. Financial institutions integrating stablecoins must implement monitoring comparable to traditional payment systems.

Wallet Clustering and Entity Attribution techniques help identify the real world entities behind blockchain addresses. By analyzing transaction patterns, timing, and common input addresses, analytics firms can cluster related wallets and often attribute them to specific exchanges, services, or even individuals. This capability is crucial for understanding who holds stablecoins and where they’re being used.

Reserve Monitoring and Attestation focuses on the issuer side. Independent auditors and blockchain analysis firms track the total supply of stablecoins and verify that corresponding reserves exist. Circle, for instance, publishes monthly attestations from accounting firms. Some advanced monitoring systems provide real time transparency by linking on chain supply data with bank account verification.

Cross Chain Tracking has become essential as stablecoins exist across multiple blockchains. USDC and USDT operate on Ethereum, Tron, Solana, and other chains, requiring monitoring solutions that aggregate data across these ecosystems to provide a complete picture of flows.

Market Intelligence and Risk Assessment platforms combine on chain data with off chain information to assess concentration risk, identify potential market manipulation, and provide early warning of potential instability. When a small number of addresses hold large stablecoin positions, it creates systemic risk that monitoring can help quantify.

Banks and financial institutions implementing stablecoins typically deploy a combination of commercial blockchain analytics platforms, custom monitoring systems, and compliance teams trained in cryptocurrency investigation. The goal is achieving the same level of financial crime prevention and risk management that exists in traditional banking while adapting to the unique characteristics of blockchain technology.

5. How Regulators View Stablecoins

Regulatory attitudes toward stablecoins vary significantly across jurisdictions, but common themes and concerns have emerged globally.

United States Regulatory Approach involves multiple agencies with overlapping jurisdictions. The Securities and Exchange Commission (SEC) has taken the position that some stablecoins may be securities, particularly those offering yield or governed by investment contracts. The Commodity Futures Trading Commission (CFTC) views certain stablecoins as commodities. The Treasury Department and the Financial Stability Oversight Council have identified stablecoins as potential systemic risks requiring bank like regulation.

Proposed legislation in the US Congress has sought to create a comprehensive framework requiring stablecoin issuers to maintain high quality liquid reserves, submit to regular audits, and potentially obtain banking charters or trust company licenses. The regulatory preference is clearly toward treating major stablecoin issuers as financial institutions subject to banking supervision.

European Union Regulation has taken a more structured approach through the Markets in Crypto Assets (MiCA) regulation, which came into effect in 2024. MiCA establishes clear requirements for stablecoin issuers including reserve asset quality standards, redemption rights for holders, capital requirements, and governance standards. The regulation distinguishes between smaller stablecoin operations and “significant” stablecoins that require more stringent oversight due to their systemic importance.

United Kingdom Regulators are developing a framework that treats stablecoins used for payments as similar to traditional payment systems. The Bank of England and Financial Conduct Authority have indicated that stablecoin issuers should meet standards comparable to commercial banks, including holding reserves in central bank accounts or high quality government securities.

Asian Regulatory Perspectives vary widely. Singapore’s Monetary Authority has created a licensing regime for stablecoin issuers focused on reserve management and redemption guarantees. Hong Kong is developing similar frameworks. China has banned private stablecoins entirely while developing its own central bank digital currency. Japan requires stablecoin issuers to be licensed banks or trust companies.

Key Regulatory Concerns consistently include systemic risk (the failure of a major stablecoin could trigger broader financial instability), consumer protection (ensuring holders can redeem stablecoins for fiat currency), anti money laundering compliance, reserve adequacy and quality, concentration risk in the Treasury market (if stablecoin reserves significantly increase holdings of government securities), and the potential for stablecoins to facilitate capital flight or undermine monetary policy.

Central Bank Digital Currencies (CBDCs) represent a regulatory response to private stablecoins. Many central banks are developing or piloting digital currencies partly to provide a public alternative to private stablecoins, allowing governments to maintain monetary sovereignty while capturing the benefits of digital currency.

The regulatory trend is clearly toward treating stablecoins as systemically important financial infrastructure requiring oversight comparable to banks or payment systems, with an emphasis on reserve quality, redemption rights, and anti money laundering compliance.

5.1 How Stablecoins Impact the Correspondent Banking Model

Stablecoins pose both opportunities and existential challenges to the traditional correspondent banking system that has dominated international payments for decades.

The Traditional Correspondent Banking Model relies on a network of banking relationships where banks hold accounts with each other to facilitate international transfers. When a business in Brazil wants to pay a supplier in Thailand, the payment typically flows through multiple intermediary banks, each taking fees and adding delays. This system involves currency conversion, compliance checks at multiple points, and settlement risk, making international payments slow and expensive.

Stablecoins as Direct Competition offer a fundamentally different model. A business can send USDC directly to a recipient anywhere in the world in minutes, bypassing the correspondent banking network entirely. The recipient can then convert to local currency through a local exchange or payment processor. This disintermediation threatens the fee generating correspondent banking relationships that have been profitable for banks, particularly in remittance corridors and business to business payments.

Cost and Speed Advantages are significant. Traditional correspondent banking involves fees at multiple layers, often totaling 3-7% for remittances and 1-3% for business payments, with settlement taking 1-5 days. Stablecoin transfers can cost less than 1% including conversion fees, with settlement in minutes. This efficiency gap puts pressure on banks to either adopt stablecoin technology or risk losing payment volume.

The Disintermediation Threat extends beyond just payments. Correspondent banking generates substantial revenue for major international banks through foreign exchange spreads, service fees, and liquidity management. If businesses and individuals can hold and transfer value in stablecoins, they become less dependent on banks for international transactions. This is particularly threatening in high volume, low margin corridors where efficiency matters most.

Banks Adapting Through Integration represents one response to this threat. Rather than being displaced, some banks are incorporating stablecoins into their service offerings. They can issue their own stablecoins, partner with stablecoin issuers to provide on ramps and off ramps, or offer custody and transaction services for corporate clients wanting to use stablecoins. JPMorgan’s JPM Coin exemplifies this approach, using blockchain technology and stablecoin principles for institutional payments within a bank controlled system.

The Hybrid Model Emerging in practice combines stablecoins with traditional banking. Banks provide the fiat on ramps and off ramps, regulatory compliance, customer relationships, and local currency conversion, while stablecoins handle the actual transfer of value. This partnership model allows banks to maintain their customer relationships and regulatory compliance role while capturing efficiency gains from blockchain technology.

Regulatory Arbitrage Concerns arise because stablecoins can sometimes operate with less regulatory burden than traditional correspondent banking. Banks face extensive anti money laundering requirements, capital requirements, and regulatory scrutiny. If stablecoins provide similar services with lighter regulation, they gain a competitive advantage that regulators are increasingly seeking to eliminate through tighter stablecoin oversight.

Settlement Risk and Liquidity Management change fundamentally with stablecoins. Traditional correspondent banking requires banks to maintain nostro accounts (accounts held in foreign banks) prefunded with liquidity. Stablecoins allow for near instant settlement without prefunding requirements, potentially freeing up billions in trapped liquidity that banks currently must maintain across the correspondent network.

The long term impact will likely involve correspondent banking evolving rather than disappearing. Banks will increasingly serve as regulated gateways between fiat currency and stablecoins, while stablecoins handle the actual transfer of value. The most vulnerable players are mid tier correspondent banks that primarily provide routing services without strong customer relationships or value added services.

5.2 How FATF Standards Apply to Stablecoins

The Financial Action Task Force (FATF) provides international standards for combating money laundering and terrorist financing, and these standards have been extended to cover stablecoins and other virtual assets.

The Travel Rule represents the most significant FATF requirement affecting stablecoins. Originally designed for traditional wire transfers, the Travel Rule requires that information about the originator and beneficiary of transfers above a certain threshold (typically $1,000) must travel with the transaction. For stablecoins, this means that Virtual Asset Service Providers (VASPs) such as exchanges, wallet providers, and payment processors must collect and transmit customer information when facilitating stablecoin transfers.

Implementing the Travel Rule on public blockchains creates technical challenges. While bank wire transfers pass through controlled systems where information can be attached, blockchain transactions are peer to peer and pseudonymous. The industry has developed solutions like the Travel Rule Information Sharing Architecture (TRISA) and other protocols that allow VASPs to exchange customer information securely off chain while the stablecoin transaction occurs on chain.

Know Your Customer (KYC) and Customer Due Diligence requirements apply to any entity that provides services for stablecoin transactions. Exchanges, wallet providers, and payment processors must verify customer identities, assess risk levels, and maintain records of transactions. This requirement creates a tension with the permissionless nature of blockchain technology, where anyone can hold a self hosted wallet and transact directly without intermediaries.

VASP Registration and Licensing is required in most jurisdictions following FATF guidance. Any business providing stablecoin custody, exchange, or transfer services must register with financial authorities, implement anti money laundering programs, and submit to regulatory oversight. This has created significant compliance burdens for smaller operators and driven consolidation toward larger, well capitalized platforms.

Stablecoin Issuers as VASPs are generally classified as Virtual Asset Service Providers under FATF standards, subjecting them to the full range of anti money laundering and counter terrorist financing obligations. This includes transaction monitoring, suspicious activity reporting, and sanctions screening. Major issuers like Circle and Paxos have built sophisticated compliance programs comparable to traditional financial institutions.

The Self Hosted Wallet Challenge represents a key friction point. FATF has expressed concern about transactions involving self hosted (non custodial) wallets where users control their own private keys without intermediary oversight. Some jurisdictions have proposed restricting or requiring enhanced due diligence for transactions between VASPs and self hosted wallets, though this remains controversial and difficult to enforce technically.

Cross Border Coordination is essential but challenging. Stablecoins operate globally and instantly, but regulatory enforcement is jurisdictional. FATF promotes information sharing between national financial intelligence units and encourages mutual legal assistance. However, gaps in enforcement across jurisdictions create opportunities for regulatory arbitrage, where bad actors operate from jurisdictions with weak oversight.

Sanctions Screening is mandatory for stablecoin service providers. They must screen transactions against lists of sanctioned individuals, entities, and countries maintained by organizations like the US Office of Foreign Assets Control (OFAC). Several stablecoin issuers have demonstrated the ability to freeze funds in wallets associated with sanctioned addresses, showing that even decentralized systems can implement centralized controls when required by law.

Risk Based Approach is fundamental to FATF methodology. Service providers must assess the money laundering and terrorist financing risks specific to their operations and implement controls proportionate to those risks. For stablecoins, this means considering factors like transaction volumes, customer types, geographic exposure, and the underlying blockchain’s anonymity features.

Challenges in Implementation are significant. The pseudonymous nature of blockchain transactions makes it difficult to identify ultimate beneficial owners. The speed and global reach of stablecoin transfers compress the time window for intervention. The prevalence of decentralized exchanges and peer to peer transactions creates enforcement gaps. Some argue that excessive regulation will drive activity to unregulated platforms or privacy focused cryptocurrencies, making financial crime harder rather than easier to detect.

The FATF framework essentially attempts to impose traditional financial system controls on a technology designed to operate without intermediaries. While large, regulated stablecoin platforms can implement these requirements, the tension between regulatory compliance and the permissionless nature of blockchain technology remains unresolved and continues to drive both technological innovation and regulatory evolution.

6. Good Use Cases for Stablecoins

Despite the risks, stablecoins excel in several legitimate applications that offer clear advantages over traditional alternatives.

Cross border payments and remittances benefit enormously from stablecoins. Workers sending money home can avoid high fees and long delays, with transactions settling in minutes rather than days. Businesses conducting international trade can reduce costs and streamline operations significantly.

Treasury management for crypto native companies provides a practical use case. Cryptocurrency exchanges, blockchain projects, and Web3 companies need stable assets for operations while staying within the crypto ecosystem. Stablecoins let them hold working capital without exposure to crypto volatility.

Decentralized finance (DeFi) applications rely heavily on stablecoins. They enable lending and borrowing, yield farming, liquidity provision, and trading without the complications of volatile assets. Users can earn interest on stablecoin deposits or use them as collateral for loans.

Hedging against local currency instability makes stablecoins valuable in countries experiencing hyperinflation or currency crises. Citizens can preserve purchasing power by holding dollar backed stablecoins instead of rapidly devalating local currencies.

Programmable payments and smart contracts benefit from stablecoins. Businesses can automate payments based on conditions (such as releasing funds when goods are received) or create subscription services, escrow arrangements, and other complex payment structures that execute automatically.

Ecommerce and online payments increasingly accept stablecoins as they combine the low fees of cryptocurrency with price stability. This is particularly valuable for digital goods, online services, and merchant payments where volatility would be problematic.

6.1 Companies Specializing in Banking Stablecoin Integration

Several companies have emerged as leaders in helping traditional banks launch and integrate stablecoin solutions into their existing infrastructure.

Paxos is a regulated blockchain infrastructure company that provides white label stablecoin solutions for financial institutions. They’ve partnered with major companies to issue stablecoins and offer compliance focused infrastructure that meets banking regulatory requirements. Paxos handles the technical complexity while allowing banks to maintain their customer relationships.

Circle offers comprehensive business account services and APIs that enable banks to integrate USD Coin (USDC) into their platforms. Their developer friendly tools and banking partnerships have made them a go to provider for institutions wanting to offer stablecoin services. Circle emphasizes regulatory compliance and transparency with regular reserve attestations.

Fireblocks provides institutional grade infrastructure for banks looking to offer digital asset services, including stablecoins. Their platform handles custody, treasury operations, and connectivity to various blockchains, allowing banks to offer stablecoin functionality without building everything from scratch.

Taurus specializes in digital asset infrastructure for banks, wealth managers, and other financial institutions in Europe. They provide technology for custody, tokenization, and trading that enables traditional financial institutions to offer stablecoin services within existing regulatory frameworks.

Sygnum operates as a Swiss digital asset bank and offers banking as a service solutions. They help other banks integrate digital assets including stablecoins while ensuring compliance with Swiss banking regulations. Their approach combines traditional banking security with blockchain innovation.

Ripple has expanded beyond its cryptocurrency focus to offer enterprise blockchain solutions for banks, including infrastructure for stablecoin issuance and cross border payment solutions. Their partnerships with financial institutions worldwide position them as a bridge between traditional banking and blockchain technology.

BBVA and JPMorgan have also developed proprietary solutions (JPM Coin for JPMorgan) that other institutions might license or use as models, though these are typically more focused on their own operations and select partners.

7.1 The Bid Offer Spread Challenge: Liquidity vs. True 1:1 Conversions

One of the hidden costs in stablecoin adoption that significantly impacts user economics is the bid offer spread applied during conversions between fiat currency and stablecoins. While stablecoins are designed to maintain a 1:1 peg with their underlying asset (typically the US dollar), the reality of converting between fiat and crypto introduces market dynamics that can erode this theoretical parity.

7.1 Understanding the Spread Problem

When users convert fiat currency to stablecoins or vice versa through most platforms, they encounter a bid offer spread the difference between the buying price and selling price. Even though USDC or USDT theoretically equals $1.00, a platform might effectively charge $1.008 to buy a stablecoin and offer only $0.992 when selling it back. This 0.8% to 1.5% spread represents a significant friction cost, particularly for businesses making frequent conversions or moving large amounts.

This spread exists because most platforms operate market making models where they must maintain liquidity on both sides of the transaction. Holding inventory of both fiat and stablecoins involves costs: capital tied up in reserves, exposure to brief depegging events, regulatory compliance overhead, and the operational expense of managing banking relationships for fiat on ramps and off ramps. Platforms traditionally recover these costs through the spread rather than explicit fees.

For cryptocurrency exchanges and most fintech platforms, the spread also serves as their primary revenue mechanism for stablecoin conversions. When a platform facilitates thousands or millions of conversions daily, even small spreads generate substantial income. The spread compensates for the risk that during periods of market stress, stablecoins might temporarily trade below their peg, leaving the platform holding depreciated assets.

7.2 The Impact on Users and Business Operations

The cumulative effect of bid offer spreads becomes particularly painful for certain use cases. Small and medium sized businesses operating across borders face multiple conversion points: exchanging local currency to USD, converting USD to stablecoins for cross border transfer, then converting stablecoins back to USD or local currency at the destination. Each conversion compounds the cost, potentially consuming 2% to 4% of the transaction value when combined with traditional banking fees.

For businesses using stablecoins as working capital converting payroll, managing treasury operations, or settling international invoices the spread can eliminate much of the cost advantage that stablecoins are supposed to provide over traditional correspondent banking. A company converting $100,000 might effectively pay $1,500 in spread costs on a round trip conversion, comparable to traditional wire transfer fees that stablecoins aimed to disrupt.

Individual users in countries with unstable currencies face similar challenges. While holding USDT or USDC protects against local currency devaluation, the cost of frequently moving between local currency and stablecoins can be prohibitive. The spread becomes a “tax” on financial stability that disproportionately affects those who can least afford it.

7.3 Revolut’s 1:1 Model: Internalizing the Cost

Revolut’s recent introduction of true 1:1 conversions between USD and stablecoins (USDC and USDT) represents a fundamentally different approach to solving the spread problem. Rather than passing market making costs to users, Revolut absorbs the spread internally, guaranteeing that $1.00 in fiat equals exactly 1.00 stablecoin units in both directions, with no hidden markups.

This model is economically viable for Revolut because of several structural advantages. First, as a neobank with 65 million users and existing banking infrastructure, Revolut already maintains substantial fiat currency liquidity and doesn’t need to rely on external banking partners for every stablecoin conversion. Second, the company generates revenue from other services within its ecosystem subscription fees, interchange fees from card spending, interest on deposits allowing it to treat stablecoin conversions as a loss leader or break even feature that enhances customer retention and platform stickiness.

Third, by setting a monthly limit of approximately $578,000 per customer, Revolut manages its risk exposure while still accommodating the vast majority of retail and small business use cases. This prevents arbitrage traders from exploiting the zero spread model to make risk free profits by moving large volumes between Revolut and other platforms where spreads exist.

Revolut essentially bets that the value of removing friction from fiat crypto conversions thereby making stablecoins genuinely useful as working capital rather than speculative assets will drive sufficient user engagement and platform growth to justify the cost of eliminating spreads. For users, this transforms the economics of stablecoin usage, particularly for frequent converters or those operating in high currency volatility environments.

7.4 Why Not Everyone Can Offer 1:1 Conversions

The challenge for smaller platforms and pure cryptocurrency exchanges is that they lack Revolut’s structural advantages. A standalone crypto exchange without banking licenses and integrated fiat services must partner with banks for fiat on ramps, pay fees to those partners, maintain separate liquidity pools, and manage the regulatory complexity of operating in multiple jurisdictions. These costs don’t disappear simply because users want better rates they must be recovered somehow.

Additionally, maintaining tight spreads or true 1:1 conversions requires deep liquidity and sophisticated risk management. When thousands of users simultaneously want to exit stablecoins during market stress, a platform must have sufficient reserves to honor redemptions instantly without moving the price. Smaller platforms operating with thin liquidity buffers cannot safely eliminate spreads without risking insolvency during volatile periods.

The market structure for stablecoins also presents challenges. While stablecoins theoretically maintain 1:1 pegs, secondary market prices on decentralized exchanges and between different platforms can vary by small amounts. A platform offering guaranteed 1:1 conversions must either hold sufficient reserves to absorb these variations or accept that arbitrage traders will exploit any price discrepancies, potentially draining liquidity.

7.5 The Competitive Implications

Revolut’s move to zero spread stablecoin conversions could trigger a competitive dynamic in the fintech space, similar to how its original zero fee foreign exchange offering disrupted traditional currency conversion. Established players like Coinbase, Kraken, and other major exchanges will face pressure to reduce their spreads or explain why their costs remain higher.

For traditional banks contemplating stablecoin integration, the spread question becomes strategic. Banks could follow the Revolut model, absorbing spread costs to drive adoption and maintain customer relationships in an increasingly crypto integrated financial system. Alternatively, they might maintain spreads but offer other value added services that justify the cost, such as enhanced compliance, insurance on holdings, or integration with business treasury management systems.

The long term outcome may be market segmentation. Large, integrated fintech platforms with diverse revenue streams can offer true 1:1 conversions as a competitive advantage. Smaller, specialized platforms will continue operating with spreads but may differentiate through speed, blockchain coverage, or serving specific niches like high volume traders who value depth of liquidity over tight spreads.

For stablecoin issuers like Circle and Tether, the spread dynamics affect their business indirectly. Wider spreads on third party platforms create friction that slows stablecoin adoption, reducing the total assets under management that generate interest income for issuers. Partnerships with platforms offering tighter spreads or true 1:1 conversions could accelerate growth, even if those partnerships involve revenue sharing or other commercial arrangements.

Ultimately, the bid offer spread challenge highlights a fundamental tension in stablecoin economics: the gap between the theoretical promise of 1:1 value stability and the practical costs of maintaining liquidity, managing risk, and operating the infrastructure that connects fiat currency to blockchain based assets. Platforms that can bridge this gap efficiently whether through scale, integration, or innovative business models will have significant competitive advantages as stablecoins move from crypto native use cases into mainstream financial infrastructure.

8. Conclusion

Stablecoins represent a significant innovation in digital finance, offering the benefits of cryptocurrency without extreme volatility. They’ve found genuine utility in payments, remittances, and decentralized finance while generating substantial revenue for issuers through interest on reserves. However, they also carry real risks around reserve transparency, regulatory uncertainty, and potential fraud that users and institutions must carefully consider.

The regulatory landscape is rapidly evolving, with authorities worldwide moving toward treating stablecoins as systemically important financial infrastructure requiring bank like oversight. FATF standards impose traditional anti money laundering requirements on stablecoin service providers, creating compliance obligations comparable to traditional finance. Meanwhile, sophisticated monitoring tools have emerged to track flows, detect illicit activity, and ensure reserve adequacy.

For traditional banks, stablecoins represent both a competitive threat to correspondent banking models and an opportunity to modernize payment infrastructure. Rather than being displaced entirely, banks are increasingly positioning themselves as regulated gateways between fiat currency and stablecoins, maintaining customer relationships and compliance functions while leveraging blockchain efficiency.

For banks considering stablecoin integration, working with established infrastructure providers can mitigate technical and compliance challenges. The key is choosing use cases where stablecoins offer clear advantages, particularly in cross border payments and treasury management, while implementing robust risk management, transaction monitoring, and ensuring regulatory compliance with both traditional financial regulations and emerging crypto specific frameworks.

As the regulatory landscape evolves and technology matures, stablecoins are likely to become increasingly integrated into mainstream financial services. Their success will depend on maintaining trust through transparency, security, and regulatory cooperation while continuing to deliver value that traditional financial rails cannot match. The future likely involves a hybrid model where stablecoins and traditional banking coexist, each playing to their respective strengths in a more efficient, global financial system.

Building an advanced Browser Curl Script with Playwright and Selenium for load testing websites

Modern sites often block plain curl. Using a real browser engine (Chromium via Playwright) gives you true browser behavior: real TLS/HTTP2 stack, cookies, redirects, and JavaScript execution if needed. This post mirrors the functionality of the original browser_curl.sh wrapper but implemented with Playwright. It also includes an optional Selenium mini-variant at the end.

What this tool does

  • Sends realistic browser headers (Chrome-like)
  • Uses Chromium’s real network stack (HTTP/2, compression)
  • Manages cookies (persist to a file)
  • Follows redirects by default
  • Supports JSON and form POSTs
  • Async mode that returns immediately
  • --count N to dispatch N async requests for quick load tests

Note: Advanced bot defenses (CAPTCHAs, JS/ML challenges, strict TLS/HTTP2 fingerprinting) may still require full page automation and real user-like behavior. Playwright can do that too by driving real pages.

Setup

Run these once to install Playwright and Chromium:

npm init -y && \
npm install playwright && \
npx playwright install chromium

The complete Playwright CLI

Run this to create browser_playwright.mjs:

cat > browser_playwright.mjs << 'EOF'
#!/usr/bin/env node
import { chromium } from 'playwright';
import fs from 'fs';
import path from 'path';
import { spawn } from 'child_process';
const RED = '\u001b[31m';
const GRN = '\u001b[32m';
const YLW = '\u001b[33m';
const NC  = '\u001b[0m';
function usage() {
  const b = path.basename(process.argv[1]);
  console.log(`Usage: ${b} [OPTIONS] URL
Advanced HTTP client using Playwright (Chromium) with browser-like behavior.
OPTIONS:
  -X, --method METHOD        HTTP method (GET, POST, PUT, DELETE) [default: GET]
  -d, --data DATA            Request body
  -H, --header HEADER        Add custom header (repeatable)
  -o, --output FILE          Write response body to file
  -c, --cookie FILE          Cookie storage file [default: /tmp/pw_cookies_<pid>.json]
  -A, --user-agent UA        Custom User-Agent
  -t, --timeout SECONDS      Request timeout [default: 30]
      --async                Run request(s) in background
      --count N              Number of async requests to fire [default: 1, requires --async]
      --no-redirect          Do not follow redirects (best-effort)
      --show-headers         Print response headers
      --json                 Send data as JSON (sets Content-Type)
      --form                 Send data as application/x-www-form-urlencoded
  -v, --verbose              Verbose output
  -h, --help                 Show this help message
EXAMPLES:
  ${b} https://example.com
  ${b} --async https://example.com
  ${b} -X POST --json -d '{"a":1}' https://httpbin.org/post
  ${b} --async --count 10 https://httpbin.org/get
`);
}
function parseArgs(argv) {
  const args = { method: 'GET', async: false, count: 1, followRedirects: true, showHeaders: false, timeout: 30, data: '', contentType: '', cookieFile: '', verbose: false, headers: [], url: '' };
  for (let i = 0; i < argv.length; i++) {
    const a = argv[i];
    switch (a) {
      case '-X': case '--method': args.method = String(argv[++i] || 'GET'); break;
      case '-d': case '--data': args.data = String(argv[++i] || ''); break;
      case '-H': case '--header': args.headers.push(String(argv[++i] || '')); break;
      case '-o': case '--output': args.output = String(argv[++i] || ''); break;
      case '-c': case '--cookie': args.cookieFile = String(argv[++i] || ''); break;
      case '-A': case '--user-agent': args.userAgent = String(argv[++i] || ''); break;
      case '-t': case '--timeout': args.timeout = Number(argv[++i] || '30'); break;
      case '--async': args.async = true; break;
      case '--count': args.count = Number(argv[++i] || '1'); break;
      case '--no-redirect': args.followRedirects = false; break;
      case '--show-headers': args.showHeaders = true; break;
      case '--json': args.contentType = 'application/json'; break;
      case '--form': args.contentType = 'application/x-www-form-urlencoded'; break;
      case '-v': case '--verbose': args.verbose = true; break;
      case '-h': case '--help': usage(); process.exit(0);
      default:
        if (!args.url && !a.startsWith('-')) args.url = a; else {
          console.error(`${RED}Error: Unknown argument: ${a}${NC}`);
          process.exit(1);
        }
    }
  }
  return args;
}
function parseHeaderList(list) {
  const out = {};
  for (const h of list) {
    const idx = h.indexOf(':');
    if (idx === -1) continue;
    const name = h.slice(0, idx).trim();
    const value = h.slice(idx + 1).trim();
    if (!name) continue;
    out[name] = value;
  }
  return out;
}
function buildDefaultHeaders(userAgent) {
  const ua = userAgent || 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36';
  return {
    'User-Agent': ua,
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.9',
    'Accept-Encoding': 'gzip, deflate, br',
    'Connection': 'keep-alive',
    'Upgrade-Insecure-Requests': '1',
    'Sec-Fetch-Dest': 'document',
    'Sec-Fetch-Mode': 'navigate',
    'Sec-Fetch-Site': 'none',
    'Sec-Fetch-User': '?1',
    'Cache-Control': 'max-age=0'
  };
}
async function performRequest(opts) {
  // Cookie file handling
  const defaultCookie = `/tmp/pw_cookies_${process.pid}.json`;
  const cookieFile = opts.cookieFile || defaultCookie;
  // Launch Chromium
  const browser = await chromium.launch({ headless: true });
  const extraHeaders = { ...buildDefaultHeaders(opts.userAgent), ...parseHeaderList(opts.headers) };
  if (opts.contentType) extraHeaders['Content-Type'] = opts.contentType;
  const context = await browser.newContext({ userAgent: extraHeaders['User-Agent'], extraHTTPHeaders: extraHeaders });
  // Load cookies if present
  if (fs.existsSync(cookieFile)) {
    try {
      const ss = JSON.parse(fs.readFileSync(cookieFile, 'utf8'));
      if (ss.cookies?.length) await context.addCookies(ss.cookies);
    } catch {}
  }
  const request = context.request;
  // Build request options
  const reqOpts = { headers: extraHeaders, timeout: opts.timeout * 1000 };
  if (opts.data) {
    // Playwright will detect JSON strings vs form strings by headers
    reqOpts.data = opts.data;
  }
  if (opts.followRedirects === false) {
    // Best-effort: limit redirects to 0
    reqOpts.maxRedirects = 0;
  }
  const method = opts.method.toUpperCase();
  let resp;
  try {
    if (method === 'GET') resp = await request.get(opts.url, reqOpts);
    else if (method === 'POST') resp = await request.post(opts.url, reqOpts);
    else if (method === 'PUT') resp = await request.put(opts.url, reqOpts);
    else if (method === 'DELETE') resp = await request.delete(opts.url, reqOpts);
    else if (method === 'PATCH') resp = await request.patch(opts.url, reqOpts);
    else {
      console.error(`${RED}Unsupported method: ${method}${NC}`);
      await browser.close();
      process.exit(2);
    }
  } catch (e) {
    console.error(`${RED}[ERROR] ${e?.message || e}${NC}`);
    await browser.close();
    process.exit(3);
  }
  // Persist cookies
  try {
    const state = await context.storageState();
    fs.writeFileSync(cookieFile, JSON.stringify(state, null, 2));
  } catch {}
  // Output
  const status = resp.status();
  const statusText = resp.statusText();
  const headers = await resp.headers();
  const body = await resp.text();
  if (opts.verbose) {
    console.error(`${YLW}Request: ${method} ${opts.url}${NC}`);
    console.error(`${YLW}Headers: ${JSON.stringify(extraHeaders)}${NC}`);
  }
  if (opts.showHeaders) {
    // Print a simple status line and headers to stdout before body
    console.log(`HTTP ${status} ${statusText}`);
    for (const [k, v] of Object.entries(headers)) {
      console.log(`${k}: ${v}`);
    }
    console.log('');
  }
  if (opts.output) {
    fs.writeFileSync(opts.output, body);
  } else {
    process.stdout.write(body);
  }
  if (!resp.ok()) {
    console.error(`${RED}[ERROR] HTTP ${status} ${statusText}${NC}`);
    await browser.close();
    process.exit(4);
  }
  await browser.close();
}
async function main() {
  const argv = process.argv.slice(2);
  const opts = parseArgs(argv);
  if (!opts.url) { console.error(`${RED}Error: URL is required${NC}`); usage(); process.exit(1); }
  if ((opts.count || 1) > 1 && !opts.async) {
    console.error(`${RED}Error: --count requires --async${NC}`);
    process.exit(1);
  }
  if (opts.count < 1 || !Number.isInteger(opts.count)) {
    console.error(`${RED}Error: --count must be a positive integer${NC}`);
    process.exit(1);
  }
  if (opts.async) {
    // Fire-and-forget background processes
    const baseArgs = process.argv.slice(2).filter(a => a !== '--async' && !a.startsWith('--count'));
    const pids = [];
    for (let i = 0; i < opts.count; i++) {
      const child = spawn(process.execPath, [process.argv[1], ...baseArgs], { detached: true, stdio: 'ignore' });
      pids.push(child.pid);
      child.unref();
    }
    if (opts.verbose) {
      console.error(`${YLW}[ASYNC] Spawned ${opts.count} request(s).${NC}`);
    }
    if (opts.count === 1) console.error(`${GRN}[ASYNC] Request started with PID: ${pids[0]}${NC}`);
    else console.error(`${GRN}[ASYNC] ${opts.count} requests started with PIDs: ${pids.join(' ')}${NC}`);
    process.exit(0);
  }
  await performRequest(opts);
}
main().catch(err => {
  console.error(`${RED}[FATAL] ${err?.stack || err}${NC}`);
  process.exit(1);
});
EOF
chmod +x browser_playwright.mjs

Optionally, move it into your PATH:

sudo mv browser_playwright.mjs /usr/local/bin/browser_playwright

Quick start

  • Simple GET:
node browser_playwright.mjs https://example.com
  • Async GET (returns immediately):
node browser_playwright.mjs --async https://example.com
  • Fire 100 async requests in one command:
node browser_playwright.mjs --async --count 100 https://httpbin.org/get

  • POST JSON:
node browser_playwright.mjs -X POST --json \
  -d '{"username":"user","password":"pass"}' \
  https://httpbin.org/post
  • POST form data:
node browser_playwright.mjs -X POST --form \
  -d "username=user&password=pass" \
  https://httpbin.org/post
  • Include response headers:
node browser_playwright.mjs --show-headers https://example.com
  • Save response to a file:
node browser_playwright.mjs -o response.json https://httpbin.org/json
  • Custom headers:
node browser_playwright.mjs \
  -H "X-API-Key: your-key" \
  -H "Authorization: Bearer token" \
  https://httpbin.org/headers
  • Persistent cookies across requests:
COOKIE_FILE="playwright_session.json"
# Login and save cookies
node browser_playwright.mjs -c "$COOKIE_FILE" \
  -X POST --form \
  -d "user=test&pass=secret" \
  https://httpbin.org/post > /dev/null
# Authenticated-like follow-up (cookie file reused)
node browser_playwright.mjs -c "$COOKIE_FILE" \
  https://httpbin.org/cookies

Load testing patterns

  • Simple load test with --count:
node browser_playwright.mjs --async --count 100 https://httpbin.org/get
  • Loop-based alternative:
for i in {1..100}; do
  node browser_playwright.mjs --async https://httpbin.org/get
done
  • Timed load test:
cat > pw_load_for_duration.sh << 'EOF'
#!/usr/bin/env bash
URL="${1:-https://httpbin.org/get}"
DURATION="${2:-60}"
COUNT=0
END_TIME=$(($(date +%s) + DURATION))
while [ "$(date +%s)" -lt "$END_TIME" ]; do
  node browser_playwright.mjs --async "$URL" >/dev/null 2>&1
  ((COUNT++))
done
echo "Sent $COUNT requests in $DURATION seconds"
echo "Rate: $((COUNT / DURATION)) requests/second"
EOF
chmod +x pw_load_for_duration.sh
./pw_load_for_duration.sh https://httpbin.org/get 30
  • Parameterized load test:
cat > pw_load_test.sh << 'EOF'
#!/usr/bin/env bash
URL="${1:-https://httpbin.org/get}"
REQUESTS="${2:-50}"
echo "Load testing: $URL"
echo "Requests: $REQUESTS"
echo ""
START=$(date +%s)
node browser_playwright.mjs --async --count "$REQUESTS" "$URL"
echo ""
echo "Dispatched in $(($(date +%s) - START)) seconds"
EOF
chmod +x pw_load_test.sh
./pw_load_test.sh https://httpbin.org/get 200

Options reference

  • -X, --method HTTP method (GET/POST/PUT/DELETE/PATCH)
  • -d, --data Request body
  • -H, --header Add extra headers (repeatable)
  • -o, --output Write response body to file
  • -c, --cookie Cookie file to use (and persist)
  • -A, --user-agent Override User-Agent
  • -t, --timeout Max request time in seconds (default 30)
  • --async Run request(s) in the background
  • --count N Fire N async requests (requires --async)
  • --no-redirect Best-effort disable following redirects
  • --show-headers Include response headers before body
  • --json Sets Content-Type: application/json
  • --form Sets Content-Type: application/x-www-form-urlencoded
  • -v, --verbose Verbose diagnostics

Validation rules:

  • --count requires --async
  • --count must be a positive integer

Under the hood: why this works better than plain curl

  • Real Chromium network stack (HTTP/2, TLS, compression)
  • Browser-like headers and a true User-Agent
  • Cookie jar via Playwright context storageState
  • Redirect handling by the browser stack

This helps pass simplistic bot checks and more closely resembles real user traffic.

Real-world examples

  • API-style auth flow (demo endpoints):
cat > pw_auth_flow.sh << 'EOF'
#!/usr/bin/env bash
COOKIE_FILE="pw_auth_session.json"
BASE="https://httpbin.org"
echo "Login (simulated form POST)..."
node browser_playwright.mjs -c "$COOKIE_FILE" \
  -X POST --form \
  -d "user=user&pass=pass" \
  "$BASE/post" > /dev/null
echo "Fetch cookies..."
node browser_playwright.mjs -c "$COOKIE_FILE" \
  "$BASE/cookies"
echo "Load test a protected-like endpoint..."
node browser_playwright.mjs -c "$COOKIE_FILE" \
  --async --count 20 \
  "$BASE/get"
echo "Done"
rm -f "$COOKIE_FILE"
EOF
chmod +x pw_auth_flow.sh
./pw_auth_flow.sh
  • Scraping with rate limiting:
cat > pw_scrape.sh << 'EOF'
#!/usr/bin/env bash
URLS=(
  "https://example.com/"
  "https://example.com/"
  "https://example.com/"
)
for url in "${URLS[@]}"; do
  echo "Fetching: $url"
  node browser_playwright.mjs -o "$(echo "$url" | sed 's#[/:]#_#g').html" "$url"
  sleep 2
done
EOF
chmod +x pw_scrape.sh
./pw_scrape.sh
  • Health check monitoring:
cat > pw_health.sh << 'EOF'
#!/usr/bin/env bash
ENDPOINT="${1:-https://httpbin.org/status/200}"
while true; do
  if node browser_playwright.mjs "$ENDPOINT" >/dev/null 2>&1; then
    echo "$(date): Service healthy"
  else
    echo "$(date): Service unhealthy"
  fi
  sleep 30
done
EOF
chmod +x pw_health.sh
./pw_health.sh

  • Hanging or quoting issues: ensure your shell quoting is balanced. Prefer simple commands without complex inline quoting.
  • Verbose mode too noisy: omit -v in production.
  • Cookie file format: the script writes Playwright storageState JSON. It’s safe to keep or delete.
  • 403 errors: site uses stronger protections. Drive a real page (Playwright page.goto) and interact, or solve CAPTCHAs where required.

Performance notes

Dispatch time depends on process spawn and Playwright startup. For higher throughput, consider reusing the same Node process to issue many requests (modify the script to loop internally) or use k6/Locust/Artillery for large-scale load testing.

Limitations

  • This CLI uses Playwright’s HTTP client bound to a Chromium context. It is much closer to real browsers than curl, but some advanced fingerprinting still detects automation.
  • WebSocket flows, MFA, or complex JS challenges generally require full page automation (which Playwright supports).

When to use what

  • Use this Playwright CLI when you need realistic browser behavior, cookies, and straightforward HTTP requests with quick async dispatch.
  • Use full Playwright page automation for dynamic content, complex logins, CAPTCHAs, and JS-heavy sites.

Advanced combos

  • With jq for JSON processing:
node browser_playwright.mjs https://httpbin.org/json | jq '.slideshow.title'
  • With parallel for concurrency:
echo -e "https://httpbin.org/get\nhttps://httpbin.org/headers" | \
parallel -j 5 "node browser_playwright.mjs -o {#}.json {}"
  • With watch for monitoring:
watch -n 5 "node browser_playwright.mjs https://httpbin.org/status/200 >/dev/null && echo ok || echo fail"
  • With xargs for batch processing:
echo -e "1\n2\n3" | xargs -I {} node browser_playwright.mjs "https://httpbin.org/anything/{}"

Future enhancements

  • Built-in rate limiting and retry logic
  • Output modes (JSON-only, headers-only)
  • Proxy support
  • Response assertions (status codes, content patterns)
  • Metrics collection (timings, success rates)

Minimal Selenium variant (Python)

If you prefer Selenium, here’s a minimal GET/headers/redirect/cookie-capable script. Note: issuing cross-origin POST bodies is more ergonomic with Playwright’s request client; Selenium focuses on page automation.

Install Selenium:

python3 -m venv .venv && source .venv/bin/activate
pip install --upgrade pip selenium

Create browser_selenium.py:

cat > browser_selenium.py << 'EOF'
#!/usr/bin/env python3
import argparse, json, os, sys, time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
RED='\033[31m'; GRN='\033[32m'; YLW='\033[33m'; NC='\033[0m'
def parse_args():
    p = argparse.ArgumentParser(description='Minimal Selenium GET client')
    p.add_argument('url')
    p.add_argument('-o','--output')
    p.add_argument('-c','--cookie', default=f"/tmp/selenium_cookies_{os.getpid()}.json")
    p.add_argument('--show-headers', action='store_true')
    p.add_argument('-t','--timeout', type=int, default=30)
    p.add_argument('-A','--user-agent')
    p.add_argument('-v','--verbose', action='store_true')
    return p.parse_args()
args = parse_args()
opts = Options()
opts.add_argument('--headless=new')
if args.user_agent:
    opts.add_argument(f'--user-agent={args.user_agent}')
with webdriver.Chrome(options=opts) as driver:
    driver.set_page_load_timeout(args.timeout)
    # Load cookies if present (domain-specific; best-effort)
    if os.path.exists(args.cookie):
        try:
            ck = json.load(open(args.cookie))
            for c in ck.get('cookies', []):
                try:
                    driver.get('https://' + c.get('domain').lstrip('.'))
                    driver.add_cookie({
                        'name': c['name'], 'value': c['value'], 'path': c.get('path','/'),
                        'domain': c.get('domain'), 'secure': c.get('secure', False)
                    })
                except Exception:
                    pass
        except Exception:
            pass
    driver.get(args.url)
    # Persist cookies (best-effort)
    try:
        cookies = driver.get_cookies()
        json.dump({'cookies': cookies}, open(args.cookie, 'w'), indent=2)
    except Exception:
        pass
    if args.output:
        open(args.output, 'w').write(driver.page_source)
    else:
        sys.stdout.write(driver.page_source)
EOF
chmod +x browser_selenium.py

Use it:

./browser_selenium.py https://example.com > out.html

Conclusion

You now have a Playwright-powered CLI that mirrors the original curl-wrapper’s ergonomics but uses a real browser engine, plus a minimal Selenium alternative. Use the CLI for realistic headers, cookies, redirects, JSON/form POSTs, and async dispatch with --count. For tougher sites, scale up to full page automation with Playwright.

Resources

Building a Browser Curl Wrapper for Reliable HTTP Requests and Load Testing

Modern websites deploy bot defenses that can block plain curl or naive scripts. In many cases, adding the right browser-like headers, HTTP/2, cookie persistence, and compression gets you past basic filters without needing a full browser.

This post walks through a small shell utility, browser_curl.sh, that wraps curl with realistic browser behavior. It also supports “fire-and-forget” async requests and a --count flag to dispatch many requests at once for quick load tests.

What this script does

  • Sends browser-like headers (Chrome on macOS)
  • Uses HTTP/2 and compression
  • Manages cookies automatically (cookie jar)
  • Follows redirects by default
  • Supports JSON and form POSTs
  • Async mode that returns immediately
  • --count N to dispatch N async requests in one command

Note: This approach won’t solve advanced bot defenses that require JavaScript execution (e.g., Cloudflare Turnstile/CAPTCHAs or TLS/HTTP2 fingerprinting); for that, use a real browser automation tool like Playwright or Selenium.

The complete script

Save this as browser_curl.sh and make it executable in one command:

cat > browser_curl.sh << 'EOF' && chmod +x browser_curl.sh
#!/bin/bash

# browser_curl.sh - Advanced curl wrapper that mimics browser behavior
# Designed to bypass Cloudflare and other bot protection

# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m' # No Color

# Default values
METHOD="GET"
ASYNC=false
COUNT=1
FOLLOW_REDIRECTS=true
SHOW_HEADERS=false
OUTPUT_FILE=""
TIMEOUT=30
DATA=""
CONTENT_TYPE=""
COOKIE_FILE="/tmp/browser_curl_cookies_$$.txt"
VERBOSE=false

# Browser fingerprint (Chrome on macOS)
USER_AGENT="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"

usage() {
    cat << EOH
Usage: $(basename "$0") [OPTIONS] URL

Advanced curl wrapper that mimics browser behavior to bypass bot protection.

OPTIONS:
    -X, --method METHOD        HTTP method (GET, POST, PUT, DELETE, etc.) [default: GET]
    -d, --data DATA           POST/PUT data
    -H, --header HEADER       Add custom header (can be used multiple times)
    -o, --output FILE         Write output to file
    -c, --cookie FILE         Use custom cookie file [default: temp file]
    -A, --user-agent UA       Custom user agent [default: Chrome on macOS]
    -t, --timeout SECONDS     Request timeout [default: 30]
    --async                   Run request asynchronously in background
    --count N                 Number of async requests to fire [default: 1, requires --async]
    --no-redirect             Don't follow redirects
    --show-headers            Show response headers
    --json                    Send data as JSON (sets Content-Type)
    --form                    Send data as form-urlencoded
    -v, --verbose             Verbose output
    -h, --help                Show this help message

EXAMPLES:
    # Simple GET request
    $(basename "$0") https://example.com

    # Async GET request
    $(basename "$0") --async https://example.com

    # POST with JSON data
    $(basename "$0") -X POST --json -d '{"username":"test"}' https://api.example.com/login

    # POST with form data
    $(basename "$0") -X POST --form -d "username=test&password=secret" https://example.com/login

    # Multiple async requests (using loop)
    for i in {1..10}; do
        $(basename "$0") --async https://example.com/api/endpoint
    done

    # Multiple async requests (using --count)
    $(basename "$0") --async --count 10 https://example.com/api/endpoint

EOH
    exit 0
}

# Parse arguments
EXTRA_HEADERS=()
URL=""

while [[ $# -gt 0 ]]; do
    case $1 in
        -X|--method)
            METHOD="$2"
            shift 2
            ;;
        -d|--data)
            DATA="$2"
            shift 2
            ;;
        -H|--header)
            EXTRA_HEADERS+=("$2")
            shift 2
            ;;
        -o|--output)
            OUTPUT_FILE="$2"
            shift 2
            ;;
        -c|--cookie)
            COOKIE_FILE="$2"
            shift 2
            ;;
        -A|--user-agent)
            USER_AGENT="$2"
            shift 2
            ;;
        -t|--timeout)
            TIMEOUT="$2"
            shift 2
            ;;
        --async)
            ASYNC=true
            shift
            ;;
        --count)
            COUNT="$2"
            shift 2
            ;;
        --no-redirect)
            FOLLOW_REDIRECTS=false
            shift
            ;;
        --show-headers)
            SHOW_HEADERS=true
            shift
            ;;
        --json)
            CONTENT_TYPE="application/json"
            shift
            ;;
        --form)
            CONTENT_TYPE="application/x-www-form-urlencoded"
            shift
            ;;
        -v|--verbose)
            VERBOSE=true
            shift
            ;;
        -h|--help)
            usage
            ;;
        *)
            if [[ -z "$URL" ]]; then
                URL="$1"
            else
                echo -e "${RED}Error: Unknown argument '$1'${NC}" >&2
                exit 1
            fi
            shift
            ;;
    esac
done

# Validate URL
if [[ -z "$URL" ]]; then
    echo -e "${RED}Error: URL is required${NC}" >&2
    usage
fi

# Validate count
if [[ "$COUNT" -gt 1 ]] && [[ "$ASYNC" == false ]]; then
    echo -e "${RED}Error: --count requires --async${NC}" >&2
    exit 1
fi

if ! [[ "$COUNT" =~ ^[0-9]+$ ]] || [[ "$COUNT" -lt 1 ]]; then
    echo -e "${RED}Error: --count must be a positive integer${NC}" >&2
    exit 1
fi

# Execute curl
execute_curl() {
    # Build curl arguments as array instead of string
    local -a curl_args=()
    
    # Basic options
    curl_args+=("--compressed")
    curl_args+=("--max-time" "$TIMEOUT")
    curl_args+=("--connect-timeout" "10")
    curl_args+=("--http2")
    
    # Cookies (ensure file exists to avoid curl warning)
    : > "$COOKIE_FILE" 2>/dev/null || true
    curl_args+=("--cookie" "$COOKIE_FILE")
    curl_args+=("--cookie-jar" "$COOKIE_FILE")
    
    # Follow redirects
    if [[ "$FOLLOW_REDIRECTS" == true ]]; then
        curl_args+=("--location")
    fi
    
    # Show headers
    if [[ "$SHOW_HEADERS" == true ]]; then
        curl_args+=("--include")
    fi
    
    # Output file
    if [[ -n "$OUTPUT_FILE" ]]; then
        curl_args+=("--output" "$OUTPUT_FILE")
    fi
    
    # Verbose
    if [[ "$VERBOSE" == true ]]; then
        curl_args+=("--verbose")
    else
        curl_args+=("--silent" "--show-error")
    fi
    
    # Method
    curl_args+=("--request" "$METHOD")
    
    # Browser-like headers
    curl_args+=("--header" "User-Agent: $USER_AGENT")
    curl_args+=("--header" "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8")
    curl_args+=("--header" "Accept-Language: en-US,en;q=0.9")
    curl_args+=("--header" "Accept-Encoding: gzip, deflate, br")
    curl_args+=("--header" "Connection: keep-alive")
    curl_args+=("--header" "Upgrade-Insecure-Requests: 1")
    curl_args+=("--header" "Sec-Fetch-Dest: document")
    curl_args+=("--header" "Sec-Fetch-Mode: navigate")
    curl_args+=("--header" "Sec-Fetch-Site: none")
    curl_args+=("--header" "Sec-Fetch-User: ?1")
    curl_args+=("--header" "Cache-Control: max-age=0")
    
    # Content-Type for POST/PUT
    if [[ -n "$DATA" ]]; then
        if [[ -n "$CONTENT_TYPE" ]]; then
            curl_args+=("--header" "Content-Type: $CONTENT_TYPE")
        fi
        curl_args+=("--data" "$DATA")
    fi
    
    # Extra headers
    for header in "${EXTRA_HEADERS[@]}"; do
        curl_args+=("--header" "$header")
    done
    
    # URL
    curl_args+=("$URL")
    
    if [[ "$ASYNC" == true ]]; then
        # Run asynchronously in background
        if [[ "$VERBOSE" == true ]]; then
            echo -e "${YELLOW}[ASYNC] Running $COUNT request(s) in background...${NC}" >&2
            echo -e "${YELLOW}Command: curl ${curl_args[*]}${NC}" >&2
        fi
        
        # Fire multiple requests if count > 1
        local pids=()
        for ((i=1; i<=COUNT; i++)); do
            # Run in background detached, suppress all output
            nohup curl "${curl_args[@]}" >/dev/null 2>&1 &
            local pid=$!
            disown $pid
            pids+=("$pid")
        done
        
        if [[ "$COUNT" -eq 1 ]]; then
            echo -e "${GREEN}[ASYNC] Request started with PID: ${pids[0]}${NC}" >&2
        else
            echo -e "${GREEN}[ASYNC] $COUNT requests started with PIDs: ${pids[*]}${NC}" >&2
        fi
    else
        # Run synchronously
        if [[ "$VERBOSE" == true ]]; then
            echo -e "${YELLOW}Command: curl ${curl_args[*]}${NC}" >&2
        fi
        
        curl "${curl_args[@]}"
        local exit_code=$?
        
        if [[ $exit_code -ne 0 ]]; then
            echo -e "${RED}[ERROR] Request failed with exit code: $exit_code${NC}" >&2
            return $exit_code
        fi
    fi
}

# Cleanup temp cookie file on exit (only if using default temp file)
cleanup() {
    if [[ "$COOKIE_FILE" == "/tmp/browser_curl_cookies_$$"* ]] && [[ -f "$COOKIE_FILE" ]]; then
        rm -f "$COOKIE_FILE"
    fi
}

# Only set cleanup trap for synchronous requests
if [[ "$ASYNC" == false ]]; then
    trap cleanup EXIT
fi

# Main execution
execute_curl

# For async requests, exit immediately without waiting
if [[ "$ASYNC" == true ]]; then
    exit 0
fi
EOF

Optionally, move it to your PATH:

sudo mv browser_curl.sh /usr/local/bin/browser_curl

Quick start

Simple GET request

./browser_curl.sh https://example.com

Async GET (returns immediately)

./browser_curl.sh --async https://example.com

Fire 100 async requests in one command

./browser_curl.sh --async --count 100 https://example.com/api

Common examples

POST JSON

./browser_curl.sh -X POST --json \
  -d '{"username":"user","password":"pass"}' \
  https://api.example.com/login

POST form data

./browser_curl.sh -X POST --form \
  -d "username=user&password=pass" \
  https://example.com/login

Include response headers

./browser_curl.sh --show-headers https://example.com

Save response to a file

./browser_curl.sh -o response.json https://api.example.com/data

Custom headers

./browser_curl.sh \
  -H "X-API-Key: your-key" \
  -H "Authorization: Bearer token" \
  https://api.example.com/data

Persistent cookies across requests

COOKIE_FILE="session_cookies.txt"

# Login and save cookies
./browser_curl.sh -c "$COOKIE_FILE" \
  -X POST --form \
  -d "user=test&pass=secret" \
  https://example.com/login

# Authenticated request using saved cookies
./browser_curl.sh -c "$COOKIE_FILE" \
  https://example.com/dashboard

Load testing patterns

Simple load test with –count

The easiest way to fire multiple requests:

./browser_curl.sh --async --count 100 https://example.com/api

Example output:

[ASYNC] 100 requests started with PIDs: 1234 1235 1236 ... 1333

Performance: 100 requests dispatched in approximately 0.09 seconds

Loop-based approach (alternative)

for i in {1..100}; do
  ./browser_curl.sh --async https://example.com/api
done

Timed load test

Run continuous requests for a specific duration:

#!/bin/bash
URL="https://example.com/api"
DURATION=60  # seconds
COUNT=0

END_TIME=$(($(date +%s) + DURATION))
while [ "$(date +%s)" -lt "$END_TIME" ]; do
  ./browser_curl.sh --async "$URL" > /dev/null 2>&1
  ((COUNT++))
done

echo "Sent $COUNT requests in $DURATION seconds"
echo "Rate: $((COUNT / DURATION)) requests/second"

Parameterized load test script

#!/bin/bash
URL="${1:-https://httpbin.org/get}"
REQUESTS="${2:-50}"

echo "Load testing: $URL"
echo "Requests: $REQUESTS"
echo ""

START=$(date +%s)
./browser_curl.sh --async --count "$REQUESTS" "$URL"
echo ""
echo "Dispatched in $(($(date +%s) - START)) seconds"

Usage:

./load_test.sh https://api.example.com/endpoint 200

Options reference

OptionDescriptionDefault
-X, --methodHTTP method (GET/POST/PUT/DELETE)GET
-d, --dataRequest body (JSON or form)
-H, --headerAdd extra headers (repeatable)
-o, --outputWrite response to a filestdout
-c, --cookieCookie file to use (and persist)temp file
-A, --user-agentOverride User-AgentChrome/macOS
-t, --timeoutMax request time in seconds30
--asyncRun request(s) in the backgroundfalse
--count NFire N async requests (requires --async)1
--no-redirectDon’t follow redirectsfollows
--show-headersInclude response headersfalse
--jsonSets Content-Type: application/json
--formSets Content-Type: application/x-www-form-urlencoded
-v, --verboseVerbose diagnosticsfalse
-h, --helpShow usage

Validation rules:

  • --count requires --async
  • --count must be a positive integer

Under the hood: why this works better than plain curl

Browser-like headers

The script automatically adds these headers to mimic Chrome:

User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36...
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif...
Accept-Language: en-US,en;q=0.9
Accept-Encoding: gzip, deflate, br
Connection: keep-alive
Sec-Fetch-Dest: document
Sec-Fetch-Mode: navigate
Sec-Fetch-Site: none
Sec-Fetch-User: ?1
Cache-Control: max-age=0
Upgrade-Insecure-Requests: 1

HTTP/2 + compression

  • Uses --http2 flag for HTTP/2 protocol support
  • Enables --compressed for automatic gzip/brotli decompression
  • Closer to modern browser behavior
  • Maintains session cookies across redirects and calls
  • Persists cookies to file for reuse
  • Automatically created and cleaned up

Redirect handling

  • Follows redirects by default with --location
  • Critical for login flows, SSO, and OAuth redirects

These features help bypass basic bot detection that blocks obvious non-browser clients.

Real-world examples

Example 1: API authentication flow

cd ~/Desktop/warp
bash -c 'cat > test_auth.sh << '\''SCRIPT'\''
#!/bin/bash
COOKIE_FILE="auth_session.txt"
API_BASE="https://api.example.com"

echo "Logging in..."
./browser_curl.sh -c "$COOKIE_FILE" -X POST --json -d "{\"username\":\"user\",\"password\":\"pass\"}" "$API_BASE/auth/login" > /dev/null

echo "Fetching profile..."
./browser_curl.sh -c "$COOKIE_FILE" "$API_BASE/user/profile" | jq .

echo "Load testing..."
./browser_curl.sh -c "$COOKIE_FILE" --async --count 50 "$API_BASE/api/data"

echo "Done!"
rm -f "$COOKIE_FILE"
SCRIPT
chmod +x test_auth.sh
./test_auth.sh'

Example 2: Scraping with rate limiting

#!/bin/bash
URLS=(
  "https://example.com/page1"
  "https://example.com/page2"
  "https://example.com/page3"
)

for url in "${URLS[@]}"; do
  echo "Fetching: $url"
  ./browser_curl.sh -o "$(basename "$url").html" "$url"
  sleep 2  # Rate limiting
done

Example 3: Health check monitoring

#!/bin/bash
ENDPOINT="https://api.example.com/health"

while true; do
  if ./browser_curl.sh "$ENDPOINT" | grep -q "healthy"; then
    echo "$(date): Service healthy"
  else
    echo "$(date): Service unhealthy"
  fi
  sleep 30
done

Installing browser_curl to your PATH

If you want browser_curl.sh to be available anywhere then install it on your path using:

mkdir -p ~/.local/bin
echo "Installing browser_curl to ~/.local/bin/browser_curl"
install -m 0755 ~/Desktop/warp/browser_curl.sh ~/.local/bin/browser_curl

echo "Ensuring ~/.local/bin is on PATH via ~/.zshrc"
grep -q 'export PATH="$HOME/.local/bin:$PATH"' ~/.zshrc || \
  echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.zshrc

echo "Reloading shell config (~/.zshrc)"
source ~/.zshrc

echo "Verifying browser_curl is on PATH"
command -v browser_curl && echo "browser_curl is installed and on PATH" || echo "browser_curl not found on PATH"

Troubleshooting

Issue: Hanging with dquote> prompt

Cause: Shell quoting issue (unbalanced quotes)

Solution: Use simple, direct commands

# Good
./browser_curl.sh --async https://example.com

# Bad (unbalanced quotes)
echo "test && ./browser_curl.sh --async https://example.com && echo "done"

For chaining commands:

echo Start; ./browser_curl.sh --async https://example.com; echo Done

Issue: Verbose mode produces too much output

Cause: -v flag prints all curl diagnostics to stderr

Solution: Remove -v for production use:

# Debug mode
./browser_curl.sh -v https://example.com

# Production mode
./browser_curl.sh https://example.com

Cause: First-time cookie file creation

Solution: The script now pre-creates the cookie file automatically. You can ignore any residual warnings.

Issue: 403 Forbidden errors

Cause: Site has stronger protections (JavaScript challenges, TLS fingerprinting)

Solution: Consider using real browser automation:

  • Playwright (Python/Node.js)
  • Selenium
  • Puppeteer

Or combine approaches:

  1. Use Playwright to initialize session and get cookies
  2. Export cookies to file
  3. Use browser_curl.sh -c cookies.txt for subsequent requests

Performance benchmarks

Tests conducted on 2023 MacBook Pro M2, macOS Sonoma:

TestTimeRequests/sec
Single sync requestapproximately 0.2s
10 async requests (–count)approximately 0.03s333/s
100 async requests (–count)approximately 0.09s1111/s
1000 async requests (–count)approximately 0.8s1250/s

Note: Dispatch time only; actual HTTP completion depends on target server.

Limitations

What this script CANNOT do

  • JavaScript execution – Can’t solve JS challenges (use Playwright)
  • CAPTCHA solving – Requires human intervention or services
  • Advanced TLS fingerprinting – Can’t mimic exact browser TLS stack
  • HTTP/2 fingerprinting – Can’t perfectly match browser HTTP/2 frames
  • WebSocket connections – HTTP only
  • Browser API access – No Canvas, WebGL, Web Crypto fingerprints

What this script CAN do

  • Basic header spoofing – Pass simple User-Agent checks
  • Cookie management – Maintain sessions
  • Load testing – Quick async request dispatch
  • API testing – POST/PUT/DELETE with JSON/form data
  • Simple scraping – Pages without JS requirements
  • Health checks – Monitoring endpoints

When to use what

Use browser_curl.sh when:

  • Target has basic bot detection (header checks)
  • API testing with authentication
  • Quick load testing (less than 10k requests)
  • Monitoring/health checks
  • No JavaScript required
  • You want a lightweight tool

Use Playwright/Selenium when:

  • Target requires JavaScript execution
  • CAPTCHA challenges present
  • Advanced fingerprinting detected
  • Need to interact with dynamic content
  • Heavy scraping with anti-bot measures
  • Login flows with MFA/2FA

Hybrid approach:

  1. Use Playwright to bootstrap session
  2. Extract cookies
  3. Use browser_curl.sh for follow-up requests (faster)

Advanced: Combining with other tools

With jq for JSON processing

./browser_curl.sh https://api.example.com/users | jq '.[] | .name'

With parallel for concurrency control

cat urls.txt | parallel -j 10 "./browser_curl.sh -o {#}.html {}"

With watch for monitoring

watch -n 5 "./browser_curl.sh https://api.example.com/health | jq .status"

With xargs for batch processing

cat ids.txt | xargs -I {} ./browser_curl.sh "https://api.example.com/item/{}"

Future enhancements

Potential features to add:

  • Rate limiting – Built-in requests/second throttling
  • Retry logic – Exponential backoff on failures
  • Output formats – JSON-only, CSV, headers-only modes
  • Proxy support – SOCKS5/HTTP proxy options
  • Custom TLS – Certificate pinning, client certs
  • Response validation – Assert status codes, content patterns
  • Metrics collection – Timing stats, success rates
  • Configuration file – Default settings per domain

Conclusion

browser_curl.sh provides a pragmatic middle ground between plain curl and full browser automation. For many APIs and websites with basic bot filters, browser-like headers, proper protocol use, and cookie handling are sufficient.

Key takeaways:

  • Simple wrapper around curl with realistic browser behavior
  • Async mode with --count for easy load testing
  • Works for basic bot detection, not advanced challenges
  • Combine with Playwright for tough targets
  • Lightweight and fast for everyday API work

The script is particularly useful for:

  • API development and testing
  • Quick load testing during development
  • Monitoring and health checks
  • Simple scraping tasks
  • Learning curl features

For production load testing at scale, consider tools like k6, Locust, or Artillery. For heavy web scraping with anti-bot measures, invest in proper browser automation infrastructure.

Resources

Amazon Aurora DSQL: A Deep Dive into Performance and Limitations

Executive Summary

Amazon Aurora DSQL represents AWS’s ambitious entry into the distributed SQL database market, announced at re:Invent 2024. It’s a serverless, distributed SQL database featuring active active high availability and PostgreSQL compatibility. While the service offers impressive architectural innovations including 99.99% single region and 99.999% multi region availability, but it comes with significant limitations that developers must carefully consider. This analysis examines Aurora DSQL’s performance characteristics, architectural tradeoffs, and critical constraints that impact real world applications.

Key Takeaways:

  • Aurora DSQL excels at multiregion active active workloads with low latency reads and writes
  • Significant PostgreSQL compatibility gaps limit migration paths for existing applications
  • Optimistic concurrency control requires application level retry logic
  • Preview phase limitations include 10,000 row transaction limits and missing critical features
  • Pricing model is complex and difficult to predict without production testing

Architecture Overview

Aurora DSQL fundamentally reimagines distributed database architecture by decoupling transaction processing from storage. The service overcomes two historical challenges: achieving multi region strong consistency with low latency and syncing servers with microsecond accuracy around the globe.

Core Components

The system consists of three independently scalable components:

  1. Compute Layer: Executes SQL queries without managing locks
  2. Commit/Journal Layer: Handles transaction ordering and conflict detection
  3. Storage Layer: Provides durable, queryable storage built from transaction logs

The Journal logs every transaction, while the Adjudicator component manages transaction isolation and conflict resolution. This separation allows Aurora DSQL to scale each component independently based on workload demands.

Multi Region Architecture

In multi region configurations, clusters provide two regional endpoints that present a single logical database supporting concurrent read and write operations with strong data consistency. A third witness region stores transaction logs for recovery purposes, enabling the system to maintain availability even during regional failures.

Performance Characteristics

Read and Write Latency

In simple workload tests, read latency achieves single digit milliseconds, while write latency is approximately two roundtrip times to the nearest region at commit. This predictable latency model differs significantly from traditional databases where performance degrades under contention.

Transaction latency remains constant relative to statement count, even across regions. This consistency provides predictable performance characteristics regardless of transaction complexity—a significant advantage for globally distributed applications.

Scalability Claims

AWS claims Aurora DSQL delivers reads and writes that are four times faster than Google Cloud Spanner. However, real world benchmarks will further reveal performance at extreme scale, as Aurora DSQL is yet to be truly tested on an enterprise scale.

The service provides virtually unlimited scalability from a single endpoint, eliminating manual provisioning and management of database instances. The architecture automatically partitions the key space to detect conflicting transactions, allowing it to scale without traditional sharding complexity.

Concurrency Control Tradeoffs

Aurora DSQL uses optimistic concurrency control (OCC), where transactions run without considering other concurrent transactions, with conflict detection happening at commit time. While this prevents slow transactions from blocking others, it requires applications to handle transaction retries.

OCC provides better scalability for query processing and a more robust cluster for realistic failures by avoiding locking mechanisms that can lead to deadlocks or performance bottlenecks. However, this comes at the cost of increased application complexity.

Critical Limitations

Transaction Constraints

The preview version imposes several hard limits that significantly impact application design:

  • Maximum 10,000 rows modified per transaction
  • Transaction size cannot exceed 10 MiB
  • Sessions are capped at 1 hour, requiring reconnection for long lived processes
  • Cannot mix DDL and DML statements within a single transaction

Common DDL/DML Limitation Example:

The most common pattern that fails in Aurora DSQL is SELECT INTO, which combines table creation and data population in a single statement:

BEGIN;
SELECT id, username, created_at 
INTO new_users_copy 
FROM users 
WHERE active = true;
COMMIT;  -- ERROR: SELECT INTO mixes DDL and DML

This pattern is extremely common in:

  • ETL processes that create temporary staging tables and load data with SELECT INTO
  • Reporting workflows that materialize query results into new tables
  • Database migrations that create tables and seed initial data
  • Test fixtures that set up schema and populate test data
  • Multi-tenant applications that dynamically create tenant-specific tables and initialize them

The workaround requires splitting into separate transactions using CREATE TABLE followed by INSERT INTO ... SELECT:

-- Transaction 1: Create table structure
BEGIN;
CREATE TABLE new_users_copy (
    id BIGINT,
    username TEXT,
    created_at TIMESTAMP
);
COMMIT;

-- Transaction 2: Populate with data
BEGIN;
INSERT INTO new_users_copy 
SELECT id, username, created_at 
FROM users 
WHERE active = true;
COMMIT;

This limitation stems from Aurora DSQL’s asynchronous DDL processing architecture, where schema changes are propagated separately from data changes across the distributed system.

These constraints make Aurora DSQL unsuitable for batch processing workloads or applications requiring large bulk operations.

Missing PostgreSQL Features

Aurora DSQL sacrifices several critical PostgreSQL features for performance and scalability:

Not Supported:

  • No foreign keys
  • No temporary tables
  • No views
  • No triggers, PL/pgSQL, sequences, or explicit locking
  • No serializable isolation level or LOCK TABLE support
  • No PostgreSQL extensions (pgcrypto, PostGIS, PGVector, hstore)

Unsupported Data Types:

  • No SERIAL or BIGSERIAL (auto incrementing integers)
  • No JSON or JSONB types
  • No range types (tsrange, int4range, etc.)
  • No geospatial types (geometry, geography)
  • No vector types
  • TEXT type limited to 1MB (vs. 1GB in PostgreSQL)

Aurora DSQL prioritizes scalability, sacrificing some features for performance. The distributed architecture and asynchronous DDL requirements prevent support for extensions that depend on PostgreSQL’s internal storage format. These omissions require significant application redesign for PostgreSQL migrations.

Isolation Level Limitations

Aurora DSQL supports strong snapshot isolation, equivalent to repeatable read isolation in PostgreSQL. This creates several important implications:

  1. Write Skew Anomalies: Applications susceptible to write skew cannot rely on serializable isolation to prevent conflicts.

What is Write Skew?

Write skew is a database anomaly that occurs when two concurrent transactions read overlapping data, make disjoint updates based on what they read, and both commit successfully—even though the result violates a business constraint. This happens because snapshot isolation only detects direct write write conflicts on the same rows, not constraint violations across different rows.

Classic example: A hospital requires at least two doctors on call at all times. Two transactions check the current count (finds 2 doctors), both see the requirement is satisfied, and both remove themselves from the on call roster. Both transactions modify different rows (their own records), so snapshot isolation sees no conflict. Both commit successfully, leaving zero doctors on call and violating the business rule.

In PostgreSQL with serializable isolation, this would be prevented because the database detects the anomaly and aborts one transaction. Aurora DSQL’s snapshot isolation cannot prevent this, requiring application level logic to handle such constraints.

  1. Referential Integrity Challenges: Without foreign keys and serializable isolation, maintaining referential integrity requires application side logic using SELECT FOR UPDATE to lock parent keys during child table inserts.
  2. Conflict Detection: Aurora DSQL uses optimistic concurrency control where SELECT FOR UPDATE intents are not synchronized to be visible to other transactions until commit.

Inter Region Consistency and Data Durability

Strong Consistency Guarantees

Aurora DSQL provides strong consistency across all regional endpoints, ensuring that reads and writes to any endpoint always reflect the same logical state. This is achieved through synchronous replication and careful transaction ordering:

How It Works:

  1. Synchronous Commits: When a transaction commits, Aurora DSQL writes to the distributed transaction log and synchronously replicates all committed log data across regions before acknowledging the commit to the client.
  2. Single Logical Database: Both regional endpoints in a multi region cluster present the same logical database. Readers consistently see the same data regardless of which endpoint they query.
  3. Zero Replication Lag: Unlike traditional asynchronous replication, there is no replication lag on commit. The commit only succeeds after data is durably stored across regions.
  4. Witness Region: A third region stores transaction logs for durability but doesn’t have a query endpoint. This witness region ensures multi region durability without requiring three full active regions.

Can You Lose Data?

Aurora DSQL is designed with strong durability guarantees that make data loss extremely unlikely:

Single Region Configuration:

  • All write transactions are synchronously replicated to storage replicas across three Availability Zones
  • Replication uses quorum based commits, ensuring data survives even if one AZ fails completely
  • No risk of data loss due to replication lag because replication is always synchronous

Multi Region Configuration:

  • Committed transactions are synchronously written to the transaction log in both active regions plus the witness region
  • A transaction only acknowledges success after durable storage across multiple regions
  • Even if an entire AWS region becomes unavailable, committed data remains accessible from the other region

Failure Scenarios:

Single AZ Failure: Aurora DSQL automatically routes to healthy AZs. No data loss occurs because data is replicated across three AZs synchronously.

Single Region Failure (Multi Region Setup): Applications can continue operating from the remaining active region with zero data loss. All committed transactions were synchronously replicated before the commit acknowledgment was sent.

Component Failure: Individual component failures (compute, storage, journal) are handled through Aurora DSQL’s self healing architecture. The system automatically repairs failed replicas asynchronously while serving requests from healthy components.

Will Everything Always Be in Both Regions?

Yes, with important caveats:

Committed Data: Once a transaction receives a commit acknowledgment, that data is guaranteed to exist in both active regions. The synchronous replication model ensures this.

Uncommitted Transactions: Transactions that haven’t yet committed exist only in their originating region’s session state. If that region fails before commit, the transaction is lost (which is expected behavior).

Durability vs. Availability Tradeoff: The strong consistency model means that if cross region network connectivity is lost, write operations may be impacted. Aurora DSQL prioritizes consistency over availability in the CAP theorem sense, it won’t accept writes that can’t be properly replicated.

Geographic Restrictions: Multi region clusters are currently limited to geographic groupings (US regions together, European regions together, Asia Pacific regions together). You cannot pair US East with EU West, which limits truly global active active deployments.

Consistency Risks and Limitations

While Aurora DSQL provides strong consistency, developers should understand these considerations:

Network Partition Handling: In the event of a network partition between regions, Aurora DSQL’s behavior depends on which components can maintain quorum. The system is designed to maintain consistency, which may mean rejecting writes rather than accepting writes that can’t be properly replicated.

Write Skew at Application Level: While individual transactions are consistent, applications must still handle write skew anomalies that can occur with snapshot isolation (as discussed in the Isolation Level Limitations section).

Time Synchronization Dependency: Aurora DSQL relies on Amazon Time Sync Service for precise time coordination. While highly reliable, this creates a subtle dependency on time synchronization for maintaining transaction ordering across regions.

This test measures basic insert latency using the PostgreSQL wire protocol:

#!/bin/bash
# Basic Aurora DSQL Performance Test
# Prerequisites: AWS CLI, psql, jq
# Configuration
REGION="us-east-1"
CLUSTER_ENDPOINT="your-cluster-endpoint.dsql.us-east-1.on.aws"
DATABASE="testdb"
# Generate temporary authentication token
export PGPASSWORD=$(aws dsql generate-db-connect-admin-auth-token \
  --hostname $CLUSTER_ENDPOINT \
  --region $REGION \
  --expires-in 3600)
export PGHOST=$CLUSTER_ENDPOINT
export PGDATABASE=$DATABASE
export PGUSER=admin
# Create test table
psql << 'EOF'
DROP TABLE IF EXISTS perf_test;
CREATE TABLE perf_test (
  id BIGSERIAL PRIMARY KEY,
  data TEXT,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
EOF
# Run timed insert test
echo "Running 100 sequential inserts..."
time psql -c "
DO \$\$
DECLARE
  i INT;
BEGIN
  FOR i IN 1..100 LOOP
    INSERT INTO perf_test (data) 
    VALUES ('test_data_' || i);
  END LOOP;
END \$\$;
"
# Test transaction commit latency
echo "Testing transaction commit latency..."
psql << 'EOF'
\timing on
BEGIN;
INSERT INTO perf_test (data) VALUES ('commit_test');
COMMIT;
EOF

Concurrency Control Testing

Test optimistic concurrency behavior and conflict detection:

#!/usr/bin/env python3
"""
Aurora DSQL Concurrency Test
Tests optimistic concurrency control and retry logic
"""
import psycopg2
import boto3
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
def get_auth_token(endpoint, region):
    """Generate IAM authentication token"""
    client = boto3.client('dsql', region_name=region)
    return client.generate_db_connect_admin_auth_token(
        hostname=endpoint,
        region=region,
        expires_in=3600
    )
def connect(endpoint, database, region):
    """Create database connection"""
    token = get_auth_token(endpoint, region)
    return psycopg2.connect(
        host=endpoint,
        database=database,
        user='admin',
        password=token,
        sslmode='require'
    )
def update_with_retry(endpoint, database, region, row_id, new_value, max_retries=5):
    """Update with exponential backoff retry logic"""
    retries = 0
    delay = 0.1
    
    while retries < max_retries:
        conn = None
        try:
            conn = connect(endpoint, database, region)
            cursor = conn.cursor()
            
            # Start transaction
            cursor.execute("BEGIN")
            
            # Read current value
            cursor.execute(
                "SELECT value FROM test_table WHERE id = %s FOR UPDATE",
                (row_id,)
            )
            current = cursor.fetchone()
            
            # Simulate some processing
            time.sleep(0.01)
            
            # Update value
            cursor.execute(
                "UPDATE test_table SET value = %s WHERE id = %s",
                (new_value, row_id)
            )
            
            # Commit
            cursor.execute("COMMIT")
            
            return True, retries
            
        except psycopg2.Error as e:
            if "change conflicts with another transaction" in str(e):
                retries += 1
                if retries < max_retries:
                    time.sleep(delay)
                    delay *= 2  # Exponential backoff
                    continue
            raise
        finally:
            if conn:
                conn.close()
    
    return False, max_retries
def run_concurrency_test(endpoint, database, region, num_threads=10):
    """Run concurrent updates on same row"""
    
    # Setup test table
    conn = connect(endpoint, database, region)
    cursor = conn.cursor()
    cursor.execute("""
        CREATE TABLE IF NOT EXISTS test_table (
            id BIGINT PRIMARY KEY,
            value INT
        )
    """)
    cursor.execute("INSERT INTO test_table (id, value) VALUES (1, 0)")
    conn.commit()
    conn.close()
    
    # Run concurrent updates
    start_time = time.time()
    results = {'success': 0, 'failed': 0, 'total_retries': 0}
    
    with ThreadPoolExecutor(max_workers=num_threads) as executor:
        futures = [
            executor.submit(update_with_retry, endpoint, database, region, 1, i)
            for i in range(num_threads)
        ]
        
        for future in as_completed(futures):
            success, retries = future.result()
            if success:
                results['success'] += 1
            else:
                results['failed'] += 1
            results['total_retries'] += retries
    
    elapsed = time.time() - start_time
    
    print(f"\nConcurrency Test Results:")
    print(f"Duration: {elapsed:.2f}s")
    print(f"Successful: {results['success']}")
    print(f"Failed: {results['failed']}")
    print(f"Total Retries: {results['total_retries']}")
    print(f"Avg Retries per Transaction: {results['total_retries']/num_threads:.2f}")
if __name__ == "__main__":
    ENDPOINT = "your-cluster-endpoint.dsql.us-east-1.on.aws"
    DATABASE = "testdb"
    REGION = "us-east-1"
    
    run_concurrency_test(ENDPOINT, DATABASE, REGION)

Multi Region Latency Test

Measure cross region write latency:

// Node.js Multi-Region Latency Test
const { Client } = require('pg');
const AWS = require('aws-sdk');
async function getAuthToken(endpoint, region) {
  const dsql = new AWS.DSQL({ region });
  const params = {
    hostname: endpoint,
    region: region,
    expiresIn: 3600
  };
  return dsql.generateDbConnectAdminAuthToken(params);
}
async function testRegionalLatency(endpoints, database) {
  const results = [];
  
  for (const [region, endpoint] of Object.entries(endpoints)) {
    const token = await getAuthToken(endpoint, region);
    
    const client = new Client({
      host: endpoint,
      database: database,
      user: 'admin',
      password: token,
      ssl: { rejectUnauthorized: true }
    });
    
    await client.connect();
    
    // Measure read latency
    const readStart = Date.now();
    await client.query('SELECT 1');
    const readLatency = Date.now() - readStart;
    
    // Measure write latency (includes commit sync)
    const writeStart = Date.now();
    await client.query('BEGIN');
    await client.query('INSERT INTO latency_test (ts) VALUES (NOW())');
    await client.query('COMMIT');
    const writeLatency = Date.now() - writeStart;
    
    results.push({
      region,
      readLatency,
      writeLatency
    });
    
    await client.end();
  }
  
  console.log('Multi-Region Latency Results:');
  console.table(results);
}
// Usage
const endpoints = {
  'us-east-1': 'cluster-1.dsql.us-east-1.on.aws',
  'us-west-2': 'cluster-1.dsql.us-west-2.on.aws'
};
testRegionalLatency(endpoints, 'testdb');

Transaction Size Limit Test

Verify the 10,000 row transaction limit impacts on operations:

#!/usr/bin/env python3
"""
Test Aurora DSQL transaction size limits
"""
import psycopg2
import boto3
def connect(endpoint, database, region):
    """Create database connection"""
    client = boto3.client('dsql', region_name=region)
    token = client.generate_db_connect_admin_auth_token(
        hostname=endpoint,
        region=region,
        expires_in=3600
    )
    return psycopg2.connect(
        host=endpoint,
        database=database,
        user='admin',
        password=token,
        sslmode='require'
    )
def test_limits(endpoint, database, region):
    """Test transaction row limits"""
    conn = connect(endpoint, database, region)
    cursor = conn.cursor()
    
    # Create test table
    cursor.execute("""
        CREATE TABLE IF NOT EXISTS limit_test (
            id BIGSERIAL PRIMARY KEY,
            data TEXT
        )
    """)
    conn.commit()
    
    # Test under limit (should succeed)
    print("Testing 9,999 row insert (under limit)...")
    try:
        cursor.execute("BEGIN")
        for i in range(9999):
            cursor.execute(
                "INSERT INTO limit_test (data) VALUES (%s)",
                (f"row_{i}",)
            )
        cursor.execute("COMMIT")
        print("✓ Success: Under-limit transaction committed")
    except Exception as e:
        print(f"✗ Failed: {e}")
        cursor.execute("ROLLBACK")
    
    conn.close()
if __name__ == "__main__":
    ENDPOINT = "your-cluster-endpoint.dsql.us-east-1.on.aws"
    DATABASE = "testdb"
    REGION = "us-east-1"
    
    test_limits(ENDPOINT, DATABASE, REGION)

Performance Analysis

Strengths

  1. Predictable Latency: Transaction latency remains constant regardless of statement count, providing consistent performance characteristics that simplify capacity planning.
  2. Multi Region Active Active: Both regional endpoints support concurrent read and write operations with strong data consistency, enabling true active active configurations without complex replication lag management.
  3. No Single Point of Contention: A single slow client or long running query doesn’t impact other transactions because contention is handled at commit time on the server side.

Weaknesses

  1. High Contention Workload Performance: Applications with frequent updates to small key ranges experience high retry rates (see detailed explanation in Performance Analysis section above).
  2. Application Complexity: Aurora DSQL’s optimistic concurrency control minimizes cross region latency but requires applications to handle retries. This shifts complexity from the database to application code.
  3. Feature Gaps: Missing PostgreSQL features like foreign keys, triggers, views, and critical data types like JSON/JSONB require application redesign. Some developers view it as barely a database, more like a key value store with basic PostgreSQL wire compatibility.
  4. Unpredictable Costs: The pricing model is monumentally confusing, with costs varying based on DPU consumption that’s difficult to predict without production testing.

Use Case Recommendations

Good Fit

Global Ecommerce Platforms

Applications requiring continuous availability across regions with strong consistency for inventory and order management. If your business depends on continuous availability—like global ecommerce or financial platforms—Aurora DSQL’s active active model is a game changer.

Multi Tenant SaaS Applications

Services with dynamic scaling requirements and geographic distribution of users. The automatic scaling eliminates capacity planning concerns.

Financial Services (with caveats)

Transaction processing systems that can implement application level retry logic and work within the snapshot isolation model.

Poor Fit

Batch Processing Systems

The 10,000 row transaction limit makes Aurora DSQL unsuitable for bulk data operations, ETL processes, or large scale data migrations.

Legacy PostgreSQL Applications

Applications depending on foreign keys, triggers, stored procedures, views, or serializable isolation will require extensive rewrites.

High Contention Workloads

Applications with frequent updates to small key ranges (like continuously updating stock tickers, inventory counters for popular items, or high frequency account balance updates) will experience high retry rates and degraded throughput due to optimistic concurrency control. See the detailed explanation in the Performance Analysis section for why this occurs.

Comparison with Alternatives

vs. Google Cloud Spanner

Aurora DSQL claims 4x faster performance, but Spanner has been used for multi region consistent deployments at proven enterprise scale. Spanner uses its own SQL dialect, while Aurora DSQL provides PostgreSQL wire protocol compatibility.

vs. CockroachDB

YugabyteDB is more compatible with PostgreSQL, supporting features like triggers, PL/pgSQL, foreign keys, sequences, all isolation levels, and explicit locking. CockroachDB offers similar advantages with battle tested multi cloud deployment options, while Aurora DSQL is AWS exclusive and still in preview.

vs. Aurora PostgreSQL

Traditional Aurora PostgreSQL provides full PostgreSQL compatibility with proven reliability but lacks the multi region active active capabilities and automatic horizontal scaling of DSQL. The choice depends on whether distributed architecture benefits outweigh compatibility trade offs.

Production Readiness Assessment

Preview Status Concerns

As of November 2024, Aurora DSQL remains in public preview with several implications:

  • No production SLA guarantees
  • Feature set still evolving
  • Limited regional availability
  • Pricing subject to change at general availability

Missing Observability

During preview, instrumentation is limited. PostgreSQL doesn’t provide EXPLAIN ANALYZE output from commits, making it difficult to understand what happens during the synchronization and wait phases.

Migration Complexity

Aurora DSQL prioritizes scalability, sacrificing some features for performance. This requires careful evaluation of application dependencies on unsupported PostgreSQL features before attempting migration.

Pricing Considerations

Billing for Aurora DSQL is based on two primary measures: Distributed Processing Units (DPU) and storage, with costs of $8 per million DPU and $0.33 per GB month in US East.

However, DPU consumption varies unpredictably based on query complexity, making cost forecasting extremely difficult. The result of cost modeling exercises amounts to “yes, this will cost you some amount of money,” which is unacceptable when costs rise beyond science experiment levels.

The AWS Free Tier provides 100,000 DPUs and 1 GB month of storage monthly, allowing for initial testing without costs.

Recommendations

For New Applications

Aurora DSQL makes sense for greenfield projects where:

  • Multi region active active is a core requirement
  • Application can be designed around optimistic concurrency from the start
  • Features like foreign keys and triggers aren’t architectural requirements
  • Team accepts preview stage maturity risks

For Existing Applications

Migration from PostgreSQL requires:

  • Comprehensive audit of PostgreSQL feature dependencies
  • Redesign of referential integrity enforcement
  • Implementation of retry logic for optimistic concurrency
  • Extensive testing to validate cost models
  • Acceptance that some features may require application level implementation

Testing Strategy

Before production commitment:

  1. Benchmark actual workloads against Aurora DSQL to measure real DPU consumption
  2. Test at production scale to validate the 10,000 row transaction limit doesn’t impact operations
  3. Implement comprehensive retry logic and verify behavior under contention
  4. Measure cross region latency for your specific geographic requirements
  5. Calculate total cost of ownership including application development effort for missing features

Conclusion

Amazon Aurora DSQL represents significant innovation in distributed SQL database architecture, solving genuine problems around multi region strong consistency and operational simplicity. The technical implementation—particularly the disaggregated architecture and optimistic concurrency control—demonstrates sophisticated engineering.

However, the service makes substantial tradeoffs that limit its applicability. The missing PostgreSQL features, transaction size constraints, and optimistic concurrency requirements create significant migration friction for existing applications. The unpredictable pricing model adds further uncertainty.

For organizations building new globally distributed applications with flexible architectural requirements, Aurora DSQL deserves serious evaluation. For teams with existing PostgreSQL applications or those requiring full PostgreSQL compatibility, traditional Aurora PostgreSQL or alternative distributed SQL databases may provide better paths forward.

As the service matures beyond preview status, AWS will likely address some limitations and provide better cost prediction tools. Until then, Aurora DSQL remains a promising but unfinished solution that requires careful evaluation against specific requirements and willingness to adapt applications to its architectural constraints.

References

Last Updated: November 2024 | Preview Status: Public Preview

A Script to download Photos, Videos and Images from your iPhone to your Macbook (by creation date and a file name filter)

Annoying Apple never quite got around to making it easy to offload images from your iPhone to your Macbook. So below is a complete guide to automatically download photos and videos from your iPhone to your MacBook, with options to filter by pattern and date, and organize into folders by creation date.

Prerequisites

Install the required tools using Homebrew:

cat > install_iphone_util.sh << 'EOF'
#!/bin/bash
set -e

echo "Installing tools..."

echo "Installing macFUSE"
brew install --cask macfuse

echo "Adding Brew Tap" 
brew tap gromgit/fuse

echo "Installing ifuse-mac" 
brew install gromgit/fuse/ifuse-mac

echo "Installing libimobiledevice" 
brew install libimobiledevice

echo "Installing exiftool"
brew install exiftool

echo "Done! Tools installed."
EOF

echo "Making executable..."
chmod +x install_iphone_util.sh

./install_iphone_util.sh

Setup/Pair your iPhone to your Macbook

  1. Connect your iPhone to your MacBook via USB
  2. Trust the computer on your iPhone when prompted
  3. Verify the connection:
idevicepair validate

If not paired, run:

idevicepair pair

Download Script

Run the script below to create the file download-iphone-media.sh in your current directory:

#!/bin/bash

cat > download-iphone-media.sh << 'OUTER_EOF'
#!/bin/bash

# iPhone Media Downloader
# Downloads photos and videos from iPhone to MacBook
# Supports resumable, idempotent downloads

set -e

# Default values
PATTERN="*"
OUTPUT_DIR="."
ORGANIZE_BY_DATE=false
START_DATE=""
END_DATE=""
MOUNT_POINT="/tmp/iphone_mount"
STATE_DIR=""
VERIFY_CHECKSUM=true

# Usage function
usage() {
    cat << 'INNER_EOF'
Usage: $0 [OPTIONS]

Download photos and videos from iPhone to MacBook.

OPTIONS:
    -p PATTERN          File pattern to match (e.g., "*.jpg", "*.mp4", "IMG_*")
                        Default: * (all files)
    -o OUTPUT_DIR       Output directory (default: current directory)
    -d                  Organize files by creation date into YYYY/MMM folders
    -s START_DATE       Start date filter (YYYY-MM-DD)
    -e END_DATE         End date filter (YYYY-MM-DD)
    -r                  Resume incomplete downloads (default: true)
    -n                  Skip checksum verification (faster, less safe)
    -h                  Show this help message

EXAMPLES:
    # Download all photos and videos to current directory
    $0

    # Download only JPG files to ~/Pictures/iPhone
    $0 -p "*.jpg" -o ~/Pictures/iPhone

    # Download all media organized by date
    $0 -d -o ~/Pictures/iPhone

    # Download videos from specific date range
    $0 -p "*.mov" -s 2025-01-01 -e 2025-01-31 -d -o ~/Videos/iPhone

    # Download specific IMG files organized by date
    $0 -p "IMG_*.{jpg,heic}" -d -o ~/Photos
INNER_EOF
    exit 1
}

# Parse command line arguments
while getopts "p:o:ds:e:rnh" opt; do
    case $opt in
        p) PATTERN="$OPTARG" ;;
        o) OUTPUT_DIR="$OPTARG" ;;
        d) ORGANIZE_BY_DATE=true ;;
        s) START_DATE="$OPTARG" ;;
        e) END_DATE="$OPTARG" ;;
        r) ;; # Resume is default, keeping for backward compatibility
        n) VERIFY_CHECKSUM=false ;;
        h) usage ;;
        *) usage ;;
    esac
done

# Create output directory if it doesn't exist
mkdir -p "$OUTPUT_DIR"
OUTPUT_DIR=$(cd "$OUTPUT_DIR" && pwd)

# Set up state directory for tracking downloads
STATE_DIR="$OUTPUT_DIR/.iphone_download_state"
mkdir -p "$STATE_DIR"

# Create mount point
mkdir -p "$MOUNT_POINT"

echo "=== iPhone Media Downloader ==="
echo "Pattern: $PATTERN"
echo "Output: $OUTPUT_DIR"
echo "Organize by date: $ORGANIZE_BY_DATE"
[ -n "$START_DATE" ] && echo "Start date: $START_DATE"
[ -n "$END_DATE" ] && echo "End date: $END_DATE"
echo ""

# Check if iPhone is connected
echo "Checking for iPhone connection..."
if ! ideviceinfo -s > /dev/null 2>&1; then
    echo "Error: No iPhone detected. Please connect your iPhone and trust this computer."
    exit 1
fi

# Mount iPhone
echo "Mounting iPhone..."
if ! ifuse "$MOUNT_POINT" 2>/dev/null; then
    echo "Error: Failed to mount iPhone. Make sure you've trusted this computer on your iPhone."
    exit 1
fi

# Cleanup function
cleanup() {
    local exit_code=$?
    echo ""
    if [ $exit_code -ne 0 ]; then
        echo "⚠ Download interrupted. Run the script again to resume."
    fi
    echo "Unmounting iPhone..."
    umount "$MOUNT_POINT" 2>/dev/null || true
    rmdir "$MOUNT_POINT" 2>/dev/null || true
}
trap cleanup EXIT

# Find DCIM folder
DCIM_PATH="$MOUNT_POINT/DCIM"
if [ ! -d "$DCIM_PATH" ]; then
    echo "Error: DCIM folder not found on iPhone"
    exit 1
fi

echo "Scanning for files matching pattern: $PATTERN"
echo ""

# Counter
TOTAL_FILES=0
COPIED_FILES=0
SKIPPED_FILES=0
RESUMED_FILES=0
FAILED_FILES=0

# Function to compute file checksum
compute_checksum() {
    local file="$1"
    if [ -f "$file" ]; then
        shasum -a 256 "$file" 2>/dev/null | awk '{print $1}'
    fi
}

# Function to get file size
get_file_size() {
    local file="$1"
    if [ -f "$file" ]; then
        stat -f "%z" "$file" 2>/dev/null
    fi
}

# Function to mark file as completed
mark_completed() {
    local source_file="$1"
    local dest_file="$2"
    local checksum="$3"
    local state_file="$STATE_DIR/$(echo "$source_file" | shasum -a 256 | awk '{print $1}')"
    
    echo "$dest_file|$checksum|$(date +%s)" > "$state_file"
}

# Function to check if file was previously completed
is_completed() {
    local source_file="$1"
    local dest_file="$2"
    local state_file="$STATE_DIR/$(echo "$source_file" | shasum -a 256 | awk '{print $1}')"
    
    if [ ! -f "$state_file" ]; then
        return 1
    fi
    
    # Read state file
    local saved_dest saved_checksum saved_timestamp
    IFS='|' read -r saved_dest saved_checksum saved_timestamp < "$state_file"
    
    # Check if destination file exists and matches
    if [ "$saved_dest" = "$dest_file" ] && [ -f "$dest_file" ]; then
        if [ "$VERIFY_CHECKSUM" = true ]; then
            local current_checksum=$(compute_checksum "$dest_file")
            if [ "$current_checksum" = "$saved_checksum" ]; then
                return 0
            fi
        else
            # Without checksum verification, just check file exists
            return 0
        fi
    fi
    
    return 1
}

# Convert dates to timestamps for comparison
START_TIMESTAMP=""
END_TIMESTAMP=""
if [ -n "$START_DATE" ]; then
    START_TIMESTAMP=$(date -j -f "%Y-%m-%d" "$START_DATE" "+%s" 2>/dev/null || echo "")
    if [ -z "$START_TIMESTAMP" ]; then
        echo "Error: Invalid start date format. Use YYYY-MM-DD"
        exit 1
    fi
fi
if [ -n "$END_DATE" ]; then
    END_TIMESTAMP=$(date -j -f "%Y-%m-%d" "$END_DATE" "+%s" 2>/dev/null || echo "")
    if [ -z "$END_TIMESTAMP" ]; then
        echo "Error: Invalid end date format. Use YYYY-MM-DD"
        exit 1
    fi
    # Add 24 hours to include the entire end date
    END_TIMESTAMP=$((END_TIMESTAMP + 86400))
fi

# Process files
find "$DCIM_PATH" -type f | while read -r file; do
    filename=$(basename "$file")
    
    # Check if filename matches pattern (basic glob matching)
    if [[ ! "$filename" == $PATTERN ]]; then
        continue
    fi
    
    TOTAL_FILES=$((TOTAL_FILES + 1))
    
    # Get file creation date
    if command -v exiftool > /dev/null 2>&1; then
        # Try to get date from EXIF data
        CREATE_DATE=$(exiftool -s3 -DateTimeOriginal -d "%Y-%m-%d %H:%M:%S" "$file" 2>/dev/null)
        if [ -z "$CREATE_DATE" ]; then
            # Fallback to file modification time
            CREATE_DATE=$(stat -f "%Sm" -t "%Y-%m-%d %H:%M:%S" "$file" 2>/dev/null)
        fi
    else
        # Use file modification time
        CREATE_DATE=$(stat -f "%Sm" -t "%Y-%m-%d %H:%M:%S" "$file" 2>/dev/null)
    fi
    
    # Extract date components
    if [ -n "$CREATE_DATE" ]; then
        FILE_DATE=$(echo "$CREATE_DATE" | cut -d' ' -f1)
        FILE_TIMESTAMP=$(date -j -f "%Y-%m-%d" "$FILE_DATE" "+%s" 2>/dev/null || echo "")
        
        # Check date filters
        if [ -n "$START_TIMESTAMP" ] && [ -n "$FILE_TIMESTAMP" ] && [ "$FILE_TIMESTAMP" -lt "$START_TIMESTAMP" ]; then
            SKIPPED_FILES=$((SKIPPED_FILES + 1))
            continue
        fi
        if [ -n "$END_TIMESTAMP" ] && [ -n "$FILE_TIMESTAMP" ] && [ "$FILE_TIMESTAMP" -ge "$END_TIMESTAMP" ]; then
            SKIPPED_FILES=$((SKIPPED_FILES + 1))
            continue
        fi
        
        # Determine output path with YYYY/MMM structure
        if [ "$ORGANIZE_BY_DATE" = true ]; then
            YEAR=$(echo "$FILE_DATE" | cut -d'-' -f1)
            MONTH_NUM=$(echo "$FILE_DATE" | cut -d'-' -f2)
            # Convert month number to 3-letter abbreviation
            case "$MONTH_NUM" in
                01) MONTH="Jan" ;;
                02) MONTH="Feb" ;;
                03) MONTH="Mar" ;;
                04) MONTH="Apr" ;;
                05) MONTH="May" ;;
                06) MONTH="Jun" ;;
                07) MONTH="Jul" ;;
                08) MONTH="Aug" ;;
                09) MONTH="Sep" ;;
                10) MONTH="Oct" ;;
                11) MONTH="Nov" ;;
                12) MONTH="Dec" ;;
                *) MONTH="Unknown" ;;
            esac
            DEST_DIR="$OUTPUT_DIR/$YEAR/$MONTH"
        else
            DEST_DIR="$OUTPUT_DIR"
        fi
    else
        DEST_DIR="$OUTPUT_DIR"
    fi
    
    # Create destination directory
    mkdir -p "$DEST_DIR"
    
    # Determine destination path
    DEST_PATH="$DEST_DIR/$filename"
    
    # Check if this file was previously completed successfully
    if is_completed "$file" "$DEST_PATH"; then
        echo "✓ Already downloaded: $filename"
        SKIPPED_FILES=$((SKIPPED_FILES + 1))
        continue
    fi
    
    # Check if file already exists with same content (for backward compatibility)
    if [ -f "$DEST_PATH" ]; then
        if cmp -s "$file" "$DEST_PATH"; then
            echo "✓ Already exists (identical): $filename"
            # Mark as completed for future runs
            SOURCE_CHECKSUM=$(compute_checksum "$DEST_PATH")
            mark_completed "$file" "$DEST_PATH" "$SOURCE_CHECKSUM"
            SKIPPED_FILES=$((SKIPPED_FILES + 1))
            continue
        else
            # Add timestamp to avoid overwriting different file
            BASE="${filename%.*}"
            EXT="${filename##*.}"
            DEST_PATH="$DEST_DIR/${BASE}_$(date +%s).$EXT"
        fi
    fi
    
    # Use temporary file for atomic copy
    TEMP_PATH="${DEST_PATH}.tmp.$$"
    
    # Copy to temporary file
    echo "⬇ Downloading: $filename → $DEST_PATH"
    if ! cp "$file" "$TEMP_PATH" 2>/dev/null; then
        echo "✗ Failed to copy: $filename"
        rm -f "$TEMP_PATH"
        FAILED_FILES=$((FAILED_FILES + 1))
        continue
    fi
    
    # Verify size matches (basic corruption check)
    SOURCE_SIZE=$(get_file_size "$file")
    TEMP_SIZE=$(get_file_size "$TEMP_PATH")
    
    if [ "$SOURCE_SIZE" != "$TEMP_SIZE" ]; then
        echo "✗ Size mismatch for $filename (source: $SOURCE_SIZE, copied: $TEMP_SIZE)"
        rm -f "$TEMP_PATH"
        FAILED_FILES=$((FAILED_FILES + 1))
        continue
    fi
    
    # Compute checksum for verification and tracking
    if [ "$VERIFY_CHECKSUM" = true ]; then
        SOURCE_CHECKSUM=$(compute_checksum "$TEMP_PATH")
    else
        SOURCE_CHECKSUM="skipped"
    fi
    
    # Preserve timestamps
    if [ -n "$CREATE_DATE" ]; then
        touch -t $(date -j -f "%Y-%m-%d %H:%M:%S" "$CREATE_DATE" "+%Y%m%d%H%M.%S" 2>/dev/null) "$TEMP_PATH" 2>/dev/null || true
    fi
    
    # Atomic move from temp to final destination
    if mv "$TEMP_PATH" "$DEST_PATH" 2>/dev/null; then
        echo "✓ Completed: $filename"
        # Mark as successfully completed
        mark_completed "$file" "$DEST_PATH" "$SOURCE_CHECKSUM"
        COPIED_FILES=$((COPIED_FILES + 1))
    else
        echo "✗ Failed to finalize: $filename"
        rm -f "$TEMP_PATH"
        FAILED_FILES=$((FAILED_FILES + 1))
    fi
done

echo ""
echo "=== Summary ==="
echo "Total files matching pattern: $TOTAL_FILES"
echo "Files downloaded: $COPIED_FILES"
echo "Files already present: $SKIPPED_FILES"
if [ $FAILED_FILES -gt 0 ]; then
    echo "Files failed: $FAILED_FILES"
    echo ""
    echo "⚠ Some files failed to download. Run the script again to retry."
    exit 1
fi
echo ""
echo "✓ Download complete! All files transferred successfully."
OUTER_EOF

echo "Making the script executable..."
chmod +x download-iphone-media.sh

echo "✓ Script created successfully: download-iphone-media.sh"

Usage Examples

Basic Usage

Download all photos and videos to the current directory:

./download-iphone-media.sh

Download with Date Organization

Organize files into folders by creation date (YYYY/MMM structure):

./download-iphone-media.sh -d -o ./Pictures

This creates a structure like:

./Pictures
├── 2024/
│   ├── Jan/
│   │   ├── IMG_1234.jpg
│   │   └── IMG_1235.heic
│   ├── Feb/
│   └── Dec/
├── 2025/
│   ├── Jan/
│   └── Nov/

Filter by File Pattern

Download only specific file types:

# Only JPG files
./download-iphone-media.sh -p "*.jpg" -o ~/Pictures/iPhone

# Only videos (MOV and MP4)
./download-iphone-media.sh -p "*.mov" -o ~/Videos/iPhone
./download-iphone-media.sh -p "*.mp4" -o ~/Videos/iPhone

# Files starting with IMG_
./download-iphone-media.sh -p "IMG_*" -o ~/Pictures

# HEIC photos (iPhone's default format)
./download-iphone-media.sh -p "*.heic" -o ~/Pictures/iPhone

Filter by Date Range

Download photos from a specific date range:

# Photos from January 2025
./download-iphone-media.sh -s 2025-01-01 -e 2025-01-31 -d -o ~/Pictures/January2025

# Photos from last week
./download-iphone-media.sh -s 2025-11-10 -e 2025-11-17 -o ~/Pictures/LastWeek

# Photos after a specific date
./download-iphone-media.sh -s 2025-11-01 -o ~/Pictures/Recent

Combined Filters

Combine multiple options for precise control:

# Download only videos from January 2025, organized by date
./download-iphone-media.sh -p "*.mov" -s 2025-01-01 -e 2025-01-31 -d -o ~/Videos/Vacation

# Download all HEIC photos from the last month, organized by date
./download-iphone-media.sh -p "*.heic" -s 2025-10-17 -e 2025-11-17 -d -o ~/Pictures/LastMonth

Features

Resumable & Idempotent Downloads

  • Crash recovery: Interrupted downloads can be resumed by running the script again
  • Atomic operations: Files are copied to temporary locations first, then moved atomically
  • State tracking: Maintains a hidden state directory (.iphone_download_state) to track completed files
  • Checksum verification: Uses SHA-256 checksums to verify file integrity (can be disabled with -n for speed)
  • No duplicates: Running the script multiple times won’t re-download existing files
  • Corruption detection: Validates file sizes and optionally checksums after copy

Date-Based Organization

  • Automatic folder structure: Creates YYYY/MMM folders based on photo creation date (e.g., 2025/Jan, 2025/Feb)
  • EXIF data support: Reads actual photo capture date from EXIF metadata when available
  • Fallback mechanism: Uses file modification time if EXIF data is unavailable
  • Fewer folders: Maximum 12 month folders per year instead of up to 365 day folders

Smart File Handling

  • Duplicate detection: Skips files that already exist with identical content
  • Conflict resolution: Adds timestamp suffix to filename if different file with same name exists
  • Timestamp preservation: Maintains original creation dates on copied files
  • Error tracking: Reports failed files and provides clear exit codes

Progress Feedback

  • Real-time progress updates showing each file being downloaded
  • Summary statistics at the end (total found, downloaded, skipped, failed)
  • Clear error messages for troubleshooting
  • Helpful resume instructions if interrupted

Common File Patterns

iPhone typically uses these file formats:

TypeExtensionsPattern Example
Photos.jpg.heic*.jpg or *.heic
Videos.mov.mp4*.mov or *.mp4
Screenshots.png*.png
Live Photos.heic.movIMG_*.heic + IMG_*.mov
All mediaall above* (default)

5. Handling Interrupted Downloads

If a download is interrupted (disconnection, error, etc.), simply run the script again:

# Script was interrupted - just run it again
./download-iphone-media.sh -d -o ~/Pictures/iPhone

The script will:

  • Skip all successfully downloaded files
  • Retry any failed files
  • Continue from where it left off

6. Fast Mode (Skip Checksum Verification)

For faster transfers on reliable connections, disable checksum verification:

# Skip checksums for speed (still verifies file sizes)
./download-iphone-media.sh -n -d -o ~/Pictures/iPhone

Note: This is generally safe but won’t detect corruption as thoroughly.

7. Clean State and Re-download

If you want to force a re-download of all files:

# Remove state directory to start fresh
rm -rf ~/Pictures/iPhone/.iphone_download_state
./download-iphone-media.sh -d -o ~/Pictures/iPhone

Troubleshooting

iPhone Not Detected

Error: No iPhone detected. Please connect your iPhone and trust this computer.

Solution:

  1. Make sure your iPhone is connected via USB cable
  2. Unlock your iPhone
  3. Tap “Trust” when prompted on your iPhone
  4. Run idevicepair pair if you haven’t already

Failed to Mount iPhone

Error: Failed to mount iPhone

Solution:

  1. Try unplugging and reconnecting your iPhone
  2. Check if another process is using the iPhone:umount /tmp/iphone_mount 2>/dev/null
  3. Restart your iPhone and try again
  4. On macOS Ventura or later, check System Settings → Privacy & Security → Files and Folders

Permission Denied

Solution:
Make sure the script has executable permissions:

chmod +x download-iphone-media.sh

Missing Tools

Error: Commands not found

Solution:
Install the required tools:

brew install libimobiledevice ifuse exiftool

On newer macOS versions, you may need to install macFUSE:

brew install --cask macfuse

After installation, you may need to restart your Mac and allow the kernel extension in System Settings → Privacy & Security.

Tips and Best Practices

1. Regular Backups

Create a scheduled backup script:

#!/bin/bash
# Save as ~/bin/backup-iphone-photos.sh

DATE=$(date +%Y-%m-%d)
BACKUP_DIR=~/Pictures/iPhone-Backups/$DATE

./download-iphone-media.sh -d -o "$BACKUP_DIR"

echo "Backup completed to $BACKUP_DIR"

2. Incremental Downloads

The script is fully idempotent and tracks completed downloads, making it perfect for incremental backups:

# Run daily to get new photos - only new files will be downloaded
./download-iphone-media.sh -d -o ~/Pictures/iPhone

The script maintains state in .iphone_download_state/ within your output directory, ensuring:

  • Already downloaded files are skipped instantly (no re-copying)
  • Interrupted downloads can be resumed
  • File integrity is verified with checksums

3. Free Up iPhone Storage

After confirming successful download:

  1. Verify files are on your MacBook
  2. Check file counts match
  3. Delete photos from iPhone via Photos app
  4. Empty “Recently Deleted” album

4. Convert HEIC to JPG (Optional)

If you need JPG files for compatibility:

# Install ImageMagick
brew install imagemagick

# Convert all HEIC files to JPG
find ~/Pictures/iPhone -name "*.heic" -exec sh -c 'magick "$0" "${0%.heic}.jpg"' {} \;

How Idempotent Recovery Works

The script implements several mechanisms to ensure safe, resumable downloads:

1. State Tracking

A hidden directory .iphone_download_state/ is created in your output directory. For each successfully downloaded file, a state file is created containing:

  • Destination file path
  • SHA-256 checksum (if verification enabled)
  • Completion timestamp

2. Atomic Operations

Each file is downloaded using a two-phase commit:

  1. Download Phase: Copy to temporary file (.tmp.$$ suffix)
  2. Verification Phase: Check file size and optionally compute checksum
  3. Commit Phase: Atomically move temp file to final destination
  4. Record Phase: Write completion state

If the script is interrupted at any point, incomplete temporary files are cleaned up automatically.

3. Idempotent Behavior

When you run the script:

  1. Before downloading each file, it checks the state directory
  2. If a state file exists, it verifies the destination file still exists and matches the checksum
  3. If verification passes, the file is skipped (no re-download)
  4. If verification fails or no state exists, the file is downloaded

This means:

  • ✓ Safe to run multiple times
  • ✓ Interrupted downloads can be resumed
  • ✓ Corrupted files are detected and re-downloaded
  • ✓ No wasted bandwidth on already-downloaded files

4. Checksum Verification

By default, SHA-256 checksums are computed and verified:

  • During download: Checksum computed after copy completes
  • On resume: Existing files are verified against stored checksum
  • Optional: Use -n flag to skip checksums for speed (still verifies file sizes)

Example Recovery Scenario

# Start downloading 1000 photos
./download-iphone-media.sh -d -o ~/Pictures/iPhone

# Script is interrupted after 500 files
# Press Ctrl+C or cable disconnects

# Simply run again - picks up where it left off
./download-iphone-media.sh -d -o ~/Pictures/iPhone
# Output:
# ✓ Already downloaded: IMG_0001.heic
# ✓ Already downloaded: IMG_0002.heic
# ...
# ⬇ Downloading: IMG_0501.heic → ~/Pictures/iPhone/2025/Jan/IMG_0501.heic

Performance Notes

  • Transfer speed: Depends on USB connection (USB 2.0 vs USB 3.0)
  • Large libraries: May take significant time for thousands of photos
  • EXIF reading: Adds minimal overhead but provides accurate dates
  • Pattern matching: Processed client-side, so all files are scanned

Conclusion

This script provides a robust, production-ready solution for downloading photos and videos from your iPhone to your MacBook. Key capabilities:

Core Features:

  • Filter by file patterns (type, name)
  • Filter by date ranges
  • Organize automatically into date-based folders
  • Preserve original file metadata

Reliability:

  • Fully idempotent – safe to run multiple times
  • Resumable downloads with automatic crash recovery
  • Atomic file operations prevent corruption
  • Checksum verification ensures data integrity
  • Clear error reporting and recovery instructions

For regular use, consider creating aliases in your ~/.zshrc:

# Add to ~/.zshrc
alias iphone-backup='~/download-iphone-media.sh -d -o ~/Pictures/iPhone'
alias iphone-videos='~/download-iphone-media.sh -p "*.mov" -d -o ~/Videos/iPhone'

Then simply run iphone-backup whenever you want to download your photos!

Resources

Deep Dive into PostgreSQL Aurora Vacuum Optimizations for Large Tables

When managing large PostgreSQL tables with frequent updates, vacuum operations become critical for maintaining database health and performance. In this comprehensive guide, we’ll explore vacuum optimization techniques, dive deep into the pg_repack extension, and provide hands-on examples you can run in your own environment.

1. Understanding the Problem

PostgreSQL uses Multi-Version Concurrency Control (MVCC) to handle concurrent transactions. When rows are updated or deleted, PostgreSQL doesn’t immediately remove the old versions—it marks them as dead tuples. Over time, these dead tuples accumulate, leading to:

  • Table bloat: Wasted disk space
  • Index bloat: Degraded query performance
  • Slower sequential scans: More pages to read
  • Transaction ID wraparound risks: In extreme cases

The VACUUM process reclaims this space, but for large, heavily-updated tables, standard vacuum strategies often fall short.

1.1 Setting Up Our Test Environment

Let’s create a realistic scenario to understand vacuum optimization. We’ll build a large user activity tracking table that receives constant updates—similar to what you might find in production systems tracking user behaviors, session data, or transaction logs.

1.2 Creating the Test Table

This schema represents a typical high-volume table with multiple indexes for different query patterns:

-- Create our test table
CREATE TABLE user_activities (
    id BIGSERIAL PRIMARY KEY,
    user_id INTEGER NOT NULL,
    activity_type VARCHAR(50) NOT NULL,
    activity_data JSONB,
    status VARCHAR(20) DEFAULT 'pending',
    processed_at TIMESTAMP,
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),
    metadata TEXT
);

-- Create indexes
CREATE INDEX idx_user_activities_user_id ON user_activities(user_id);
CREATE INDEX idx_user_activities_status ON user_activities(status);
CREATE INDEX idx_user_activities_created_at ON user_activities(created_at);
CREATE INDEX idx_user_activities_processed_at ON user_activities(processed_at) 
    WHERE processed_at IS NOT NULL;

Example Output:

CREATE TABLE
CREATE INDEX
CREATE INDEX
CREATE INDEX
CREATE INDEX

1.3 Population Script

This function generates realistic test data with varied activity types and statuses to simulate a production environment:

-- Function to generate random activity data
CREATE OR REPLACE FUNCTION generate_user_activities(num_rows INTEGER)
RETURNS void AS $$
DECLARE
    batch_size INTEGER := 10000;
    batches INTEGER;
    i INTEGER;
BEGIN
    batches := CEIL(num_rows::NUMERIC / batch_size);
    
    FOR i IN 1..batches LOOP
        INSERT INTO user_activities (
            user_id,
            activity_type,
            activity_data,
            status,
            created_at,
            metadata
        )
        SELECT
            (random() * 100000)::INTEGER + 1,
            (ARRAY['login', 'purchase', 'view', 'search', 'logout'])[FLOOR(random() * 5 + 1)],
            jsonb_build_object(
                'ip', '192.168.' || (random() * 255)::INTEGER || '.' || (random() * 255)::INTEGER,
                'user_agent', 'Mozilla/5.0',
                'session_id', md5(random()::TEXT)
            ),
            (ARRAY['pending', 'processing', 'completed'])[FLOOR(random() * 3 + 1)],
            NOW() - (random() * INTERVAL '90 days'),
            repeat('x', (random() * 500)::INTEGER + 100)
        FROM generate_series(1, LEAST(batch_size, num_rows - (i-1) * batch_size));
        
        RAISE NOTICE 'Inserted batch % of %', i, batches;
    END LOOP;
    
    RAISE NOTICE 'Completed inserting % rows', num_rows;
END;
$$ LANGUAGE plpgsql;

-- Populate with 5 million rows (adjust as needed)
SELECT generate_user_activities(5000000);

-- Analyze the table
ANALYZE user_activities;

Example Output:

CREATE FUNCTION
NOTICE:  Inserted batch 1 of 500
NOTICE:  Inserted batch 2 of 500
NOTICE:  Inserted batch 3 of 500
...
NOTICE:  Inserted batch 500 of 500
NOTICE:  Completed inserting 5000000 rows
 generate_user_activities 
---------------------------
 
(1 row)

ANALYZE

1.4 Simulating Heavy Update Load

Understanding bloat requires seeing it in action. This function simulates the update-heavy workload patterns that cause vacuum challenges in production systems:

-- Function to simulate continuous updates
CREATE OR REPLACE FUNCTION simulate_updates(duration_minutes INTEGER)
RETURNS void AS $$
DECLARE
    end_time TIMESTAMP;
    update_count INTEGER := 0;
BEGIN
    end_time := NOW() + (duration_minutes || ' minutes')::INTERVAL;
    
    WHILE NOW() < end_time LOOP
        -- Update random rows to 'processing' status
        UPDATE user_activities
        SET status = 'processing',
            updated_at = NOW()
        WHERE id IN (
            SELECT id FROM user_activities
            WHERE status = 'pending'
            ORDER BY random()
            LIMIT 1000
        );
        
        -- Update random rows to 'completed' status
        UPDATE user_activities
        SET status = 'completed',
            processed_at = NOW(),
            updated_at = NOW()
        WHERE id IN (
            SELECT id FROM user_activities
            WHERE status = 'processing'
            ORDER BY random()
            LIMIT 800
        );
        
        update_count := update_count + 1800;
        
        IF update_count % 10000 = 0 THEN
            RAISE NOTICE 'Processed % updates', update_count;
        END IF;
        
        PERFORM pg_sleep(0.1);
    END LOOP;
    
    RAISE NOTICE 'Completed % total updates', update_count;
END;
$$ LANGUAGE plpgsql;

-- Run for 5 minutes to generate bloat
-- SELECT simulate_updates(5);

Example Output (when running the simulate_updates function):

NOTICE:  Processed 10000 updates
NOTICE:  Processed 20000 updates
NOTICE:  Processed 30000 updates
...
NOTICE:  Completed 54000 total updates
 simulate_updates 
------------------
 
(1 row)

1.4 Monitoring Table Health

Before optimizing vacuum operations, you need visibility into your table’s health metrics. These queries provide essential diagnostics for understanding bloat levels and vacuum effectiveness.

1.5 Check Table and Index Bloat

This comprehensive query gives you a snapshot of your table’s overall health, including size metrics and tuple statistics:

-- Comprehensive bloat analysis
SELECT
    schemaname,
    tablename,
    pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS total_size,
    pg_size_pretty(pg_relation_size(schemaname||'.'||tablename)) AS table_size,
    pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename) - 
                   pg_relation_size(schemaname||'.'||tablename)) AS indexes_size,
    n_live_tup,
    n_dead_tup,
    ROUND(100 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0), 2) AS dead_tuple_percent,
    last_vacuum,
    last_autovacuum,
    last_analyze,
    last_autoanalyze
FROM pg_stat_user_tables
WHERE tablename = 'user_activities';

Example Output:

 schemaname |    tablename     | total_size | table_size | indexes_size | n_live_tup | n_dead_tup | dead_tuple_percent |       last_vacuum       |     last_autovacuum     |       last_analyze       |     last_autoanalyze     
------------+------------------+------------+------------+--------------+------------+------------+--------------------+-------------------------+-------------------------+--------------------------+--------------------------
 public     | user_activities  | 4892 MB    | 3214 MB    | 1678 MB      |    5000000 |     847523 |              14.51 | 2024-11-16 02:15:33.421 | 2024-11-17 08:22:14.832 | 2024-11-16 02:15:45.123 | 2024-11-17 08:22:28.945
(1 row)

1.6 Detailed Bloat Estimation

For a more precise understanding of how much space is wasted, this query calculates bloat based on tuple density:

-- More accurate bloat estimation
WITH table_stats AS (
    SELECT
        schemaname,
        tablename,
        n_live_tup,
        n_dead_tup,
        pg_relation_size(schemaname||'.'||tablename) AS table_bytes,
        (n_live_tup + n_dead_tup)::NUMERIC AS total_tuples
    FROM pg_stat_user_tables
    WHERE tablename = 'user_activities'
),
bloat_calc AS (
    SELECT
        *,
        CASE 
            WHEN total_tuples > 0 THEN
                table_bytes / NULLIF(total_tuples, 0)
            ELSE 0
        END AS bytes_per_tuple,
        CASE
            WHEN n_live_tup > 0 THEN
                table_bytes * (n_dead_tup::NUMERIC / NULLIF(n_live_tup + n_dead_tup, 0))
            ELSE 0
        END AS bloat_bytes
    FROM table_stats
)
SELECT
    tablename,
    pg_size_pretty(table_bytes) AS current_size,
    pg_size_pretty(bloat_bytes::BIGINT) AS estimated_bloat,
    ROUND(100 * bloat_bytes / NULLIF(table_bytes, 0), 2) AS bloat_percent,
    n_live_tup,
    n_dead_tup
FROM bloat_calc;

Example Output:

    tablename     | current_size | estimated_bloat | bloat_percent | n_live_tup | n_dead_tup 
------------------+--------------+-----------------+---------------+------------+------------
 user_activities  | 3214 MB      | 466 MB          |         14.51 |    5000000 |     847523
(1 row)

1.7 Check Current Vacuum Activity

When troubleshooting vacuum issues, it’s crucial to see what’s actually running:

-- Monitor active vacuum operations
SELECT
    pid,
    datname,
    usename,
    state,
    query_start,
    NOW() - query_start AS duration,
    query
FROM pg_stat_activity
WHERE query LIKE '%VACUUM%'
  AND query NOT LIKE '%pg_stat_activity%';

Example Output:

  pid  |  datname   | usename  |  state  |         query_start         |    duration     |                        query                        
-------+------------+----------+---------+-----------------------------+-----------------+-----------------------------------------------------
 12847 | production | postgres | active  | 2024-11-17 09:15:22.534829  | 00:03:17.482341 | VACUUM (VERBOSE, ANALYZE) user_activities;
(1 row)

2. Standard Vacuum Strategies

Understanding the different vacuum options is essential for choosing the right approach for your workload. Each vacuum variant serves different purposes and has different performance characteristics.

2.1 Manual VACUUM

These are the basic vacuum commands you’ll use for routine maintenance:

-- Basic vacuum (doesn't lock table)
VACUUM user_activities;

-- Vacuum with analyze
VACUUM ANALYZE user_activities;

-- Verbose output for monitoring
VACUUM (VERBOSE, ANALYZE) user_activities;

-- Aggressive vacuum (more thorough, slower)
VACUUM (FULL, VERBOSE, ANALYZE) user_activities;

Example Output (VACUUM VERBOSE):

INFO:  vacuuming "public.user_activities"
INFO:  scanned index "user_activities_pkey" to remove 847523 row versions
DETAIL:  CPU: user: 2.45 s, system: 0.89 s, elapsed: 12.34 s
INFO:  scanned index "idx_user_activities_user_id" to remove 847523 row versions
DETAIL:  CPU: user: 1.87 s, system: 0.67 s, elapsed: 9.12 s
INFO:  scanned index "idx_user_activities_status" to remove 847523 row versions
DETAIL:  CPU: user: 1.92 s, system: 0.71 s, elapsed: 9.45 s
INFO:  scanned index "idx_user_activities_created_at" to remove 847523 row versions
DETAIL:  CPU: user: 1.88 s, system: 0.68 s, elapsed: 9.23 s
INFO:  "user_activities": removed 847523 row versions in 112456 pages
DETAIL:  CPU: user: 3.21 s, system: 1.45 s, elapsed: 18.67 s
INFO:  "user_activities": found 847523 removable, 5000000 nonremovable row versions in 425678 out of 425678 pages
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 123456789
There were 0 unused item identifiers.
Skipped 0 pages due to buffer pins, 0 frozen pages.
0 pages are entirely empty.
CPU: user: 11.33 s, system: 4.40 s, elapsed: 58.81 s.
VACUUM

Note: VACUUM FULL requires an ACCESS EXCLUSIVE lock and rewrites the entire table, making it unsuitable for production during business hours.

2.2 Configuring Autovacuum

Aurora PostgreSQL has autovacuum enabled by default, but tuning these parameters is critical for large, frequently-updated tables:

-- Check current autovacuum settings
SHOW autovacuum_vacuum_threshold;
SHOW autovacuum_vacuum_scale_factor;
SHOW autovacuum_vacuum_cost_delay;
SHOW autovacuum_vacuum_cost_limit;

-- Custom autovacuum settings for our table
ALTER TABLE user_activities SET (
    autovacuum_vacuum_threshold = 5000,
    autovacuum_vacuum_scale_factor = 0.05,  -- More aggressive (default 0.2)
    autovacuum_vacuum_cost_delay = 10,      -- Faster vacuum (default 20)
    autovacuum_analyze_threshold = 2500,
    autovacuum_analyze_scale_factor = 0.05
);

-- For extremely busy tables
ALTER TABLE user_activities SET (
    autovacuum_vacuum_threshold = 1000,
    autovacuum_vacuum_scale_factor = 0.02,
    autovacuum_vacuum_cost_delay = 2,
    autovacuum_vacuum_cost_limit = 2000,    -- Higher I/O limit
    autovacuum_naptime = 10                 -- Check more frequently
);

Example Output:

 autovacuum_vacuum_threshold 
-----------------------------
 50
(1 row)

 autovacuum_vacuum_scale_factor 
--------------------------------
 0.2
(1 row)

 autovacuum_vacuum_cost_delay 
------------------------------
 20
(1 row)

 autovacuum_vacuum_cost_limit 
------------------------------
 200
(1 row)

ALTER TABLE
ALTER TABLE

These are table-level storage parameters, not server-level GUC (Grand Unified Configuration) parameters, so no restart is needed. These settings take effect immediately without requiring a database or server restart.

Server-Level vs Table-Level

These specific parameters are being set at the table level using ALTER TABLE ... SET, which means they only affect the user_activities table.

However, these same parameters do exist as server-level GUC parameters with slightly different names:

  • autovacuum_vacuum_threshold (server-level GUC exists)
  • autovacuum_vacuum_scale_factor (server-level GUC exists)
  • autovacuum_vacuum_cost_delay (server-level GUC exists)
  • autovacuum_analyze_threshold (server-level GUC exists)
  • autovacuum_analyze_scale_factor (server-level GUC exists)

When set at the server level in postgresql.conf, those would require a reload (pg_ctl reload or SELECT pg_reload_conf()), but not a full restart.

Your Command

Your ALTER TABLE command is overriding the server-level defaults specifically for the user_activities table, making autovacuum more aggressive for that table. This is a common approach for high-churn tables and applies instantly.

2.3 The pg_repack Extension

pg_repack is a game-changer for managing large tables with bloat. While VACUUM FULL requires a long-duration exclusive lock that blocks all operations, pg_repack uses an innovative approach that allows the table to remain online and accessible throughout most of the operation.

Understanding pg_repack’s Architecture

pg_repack works fundamentally differently from traditional vacuum operations. Here’s what makes it special:

The Problem with VACUUM FULL:

  • Acquires an ACCESS EXCLUSIVE lock for the entire operation
  • Blocks all reads and writes
  • For a 100GB table, this could mean hours of downtime
  • Single-threaded operation

How pg_repack Solves This:

pg_repack employs a clever multi-stage approach:

  1. Log Table Creation: Creates a temporary log table to capture changes made during the rebuild
  2. Online Rebuild: Builds a new, defragmented copy of your table while the original remains fully operational
  3. Change Capture: Records all INSERT, UPDATE, and DELETE operations in the log table
  4. Change Replay: Applies the logged changes to the new table
  5. Atomic Swap: Takes a brief exclusive lock (typically < 1 second) to swap the old and new tables
  6. Index Rebuild: Rebuilds indexes concurrently on the new table

Key Benefits:

  • Minimal Locking: Only a brief lock during the table swap
  • Online Operation: Applications continue running normally
  • Better Efficiency: Rewrites data in optimal order, improving subsequent query performance
  • Parallel Processing: Can use multiple workers for faster completion
  • Transaction Safety: All changes are captured and replayed, ensuring data consistency

2.4 Installing pg_repack on Aurora

Setting up pg_repack is straightforward on Aurora PostgreSQL:

-- Check available extensions
SELECT * FROM pg_available_extensions WHERE name = 'pg_repack';

-- Install pg_repack (requires rds_superuser role)
CREATE EXTENSION pg_repack;

-- Verify installation
\dx pg_repack

Example Output:

   name    | default_version | installed_version |                         comment                          
-----------+-----------------+-------------------+----------------------------------------------------------
 pg_repack | 1.4.8           |                   | Reorganize tables in PostgreSQL databases with minimal locks
(1 row)

CREATE EXTENSION

                                    List of installed extensions
   Name    | Version |   Schema   |                         Description                          
-----------+---------+------------+--------------------------------------------------------------
 pg_repack | 1.4.8   | public     | Reorganize tables in PostgreSQL databases with minimal locks
(1 row)

2.5 How pg_repack Works (Technical Deep Dive)

Let’s break down the pg_repack process with more detail:

Phase 1: Setup (seconds)

  • Creates schema repack for temporary objects
  • Creates a log table repack.log_XXXXX with triggers
  • Installs triggers on source table to capture changes
  • Takes a snapshot of current transaction ID

Phase 2: Initial Copy (majority of time)

  • Copies all data from original table to repack.table_XXXXX
  • Sorts data optimally (by primary key or specified order)
  • Meanwhile, all changes are captured in the log table
  • No locks on the original table during this phase

Phase 3: Delta Application (proportional to changes)

  • Reads the log table
  • Applies INSERT/UPDATE/DELETE operations to new table
  • May iterate if many changes occurred during Phase 2

Phase 4: Final Swap (< 1 second typically)

  • Acquires ACCESS EXCLUSIVE lock
  • Applies any final logged changes
  • Swaps the table definitions atomically
  • Releases lock
  • Drops old table and log table

Phase 5: Index Rebuild (concurrent)

  • Rebuilds all indexes on new table
  • Uses CREATE INDEX CONCURRENTLY to avoid blocking

2.6 Basic pg_repack Usage

From the command line (requires appropriate IAM/credentials for Aurora):

# Basic repack
pg_repack -h your-aurora-cluster.region.rds.amazonaws.com \
          -U your_username \
          -d your_database \
          -t user_activities

# With specific options
pg_repack -h your-aurora-cluster.region.rds.amazonaws.com \
          -U your_username \
          -d your_database \
          -t user_activities \
          --no-order \
          --no-kill-backend \
          -j 4  # Use 4 parallel workers

Example Output:

INFO: repacking table "public.user_activities"
INFO: disabling triggers
INFO: creating temporary table
INFO: copying rows
INFO: 5000000 rows copied
INFO: creating indexes
INFO: creating index "user_activities_pkey"
INFO: creating index "idx_user_activities_user_id"
INFO: creating index "idx_user_activities_status"
INFO: creating index "idx_user_activities_created_at"
INFO: creating index "idx_user_activities_processed_at"
INFO: swapping tables
INFO: applying log
INFO: 12847 log rows applied
INFO: enabling triggers
INFO: dropping old table
INFO: Repacked user_activities (3.2GB -> 2.7GB), 15.6% space reclaimed
NOTICE: TABLE "public.user_activities" repacked successfully

2.7 Advanced pg_repack with SQL Interface

You can also trigger pg_repack from within PostgreSQL:

-- Repack a specific table
SELECT repack.repack_table('public.user_activities');

-- Repack with options
SELECT repack.repack_table(
    'public.user_activities',
    'REINDEX'  -- Rebuild indexes too
);

-- Check pg_repack progress (run in another session)
SELECT
    schemaname,
    tablename,
    pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size,
    n_tup_ins,
    n_tup_upd,
    n_tup_del
FROM pg_stat_user_tables
WHERE tablename LIKE '%repack%';

Example Output:

 repack_table 
--------------
 t
(1 row)

 schemaname |         tablename          |  size   | n_tup_ins | n_tup_upd | n_tup_del 
------------+----------------------------+---------+-----------+-----------+-----------
 repack     | table_12847                | 2689 MB |   5000000 |         0 |         0
 repack     | log_12847                  | 145 MB  |     12847 |         0 |         0
(2 rows)

2.8 Monitoring pg_repack Progress

Real-time monitoring helps you understand how long the operation will take:

-- Create a monitoring function
CREATE OR REPLACE FUNCTION monitor_repack()
RETURNS TABLE (
    table_name TEXT,
    phase TEXT,
    elapsed_time INTERVAL,
    table_size TEXT,
    estimated_remaining TEXT
) AS $$
BEGIN
    RETURN QUERY
    SELECT
        t.tablename::TEXT,
        CASE
            WHEN t.tablename LIKE 'repack.table_%' THEN 'Building new table'
            WHEN t.tablename LIKE 'repack.log_%' THEN 'Logging changes'
            ELSE 'Processing'
        END AS phase,
        NOW() - ps.query_start AS elapsed,
        pg_size_pretty(pg_total_relation_size(t.schemaname||'.'||t.tablename)),
        '~' || ROUND(EXTRACT(EPOCH FROM (NOW() - ps.query_start)) * 1.5 / 60) || ' min' AS est_remaining
    FROM pg_stat_user_tables t
    LEFT JOIN pg_stat_activity ps ON ps.query LIKE '%repack%'
    WHERE t.schemaname = 'repack'
       OR ps.query LIKE '%repack%';
END;
$$ LANGUAGE plpgsql;

-- Monitor during repack
SELECT * FROM monitor_repack();

Example Output:

CREATE FUNCTION

       table_name       |       phase        | elapsed_time  | table_size | estimated_remaining 
------------------------+--------------------+---------------+------------+---------------------
 repack.table_12847     | Building new table | 00:08:23.457  | 2689 MB    | ~13 min
 repack.log_12847       | Logging changes    | 00:08:23.457  | 145 MB     | ~13 min
(2 rows)

3.0 Off-Hours Maintenance Script

This comprehensive script is designed to run during low-traffic periods and automatically selects the best vacuum strategy based on bloat levels:

-- ============================================
-- OFF-HOURS TABLE MAINTENANCE SCRIPT
-- Run during maintenance windows
-- ============================================

DO $$
DECLARE
    v_start_time TIMESTAMP;
    v_table_size BIGINT;
    v_dead_tuples BIGINT;
    v_bloat_percent NUMERIC;
    v_action TEXT;
    v_repack_available BOOLEAN;
BEGIN
    v_start_time := NOW();
    
    RAISE NOTICE '========================================';
    RAISE NOTICE 'Starting maintenance at %', v_start_time;
    RAISE NOTICE '========================================';
    
    -- Check if pg_repack is available
    SELECT EXISTS (
        SELECT 1 FROM pg_extension WHERE extname = 'pg_repack'
    ) INTO v_repack_available;
    
    -- Gather current statistics
    SELECT
        pg_relation_size('user_activities'),
        n_dead_tup,
        ROUND(100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0), 2)
    INTO v_table_size, v_dead_tuples, v_bloat_percent
    FROM pg_stat_user_tables
    WHERE tablename = 'user_activities';
    
    RAISE NOTICE 'Current table size: %', pg_size_pretty(v_table_size);
    RAISE NOTICE 'Dead tuples: %', v_dead_tuples;
    RAISE NOTICE 'Bloat percentage: %', v_bloat_percent;
    
    -- Decide on action based on bloat level
    IF v_bloat_percent > 50 THEN
        v_action := 'pg_repack (high bloat)';
        
        IF v_repack_available THEN
            RAISE NOTICE 'Bloat > 50%: Executing pg_repack...';
            PERFORM repack.repack_table('public.user_activities');
            RAISE NOTICE 'pg_repack completed';
        ELSE
            RAISE NOTICE 'pg_repack not available, running VACUUM FULL...';
            RAISE NOTICE 'WARNING: This will lock the table!';
            EXECUTE 'VACUUM FULL ANALYZE user_activities';
        END IF;
        
    ELSIF v_bloat_percent > 20 THEN
        v_action := 'aggressive_vacuum';
        RAISE NOTICE 'Bloat 20-50%: Running aggressive VACUUM...';
        EXECUTE 'VACUUM (VERBOSE, ANALYZE, FREEZE) user_activities';
        
    ELSIF v_bloat_percent > 10 THEN
        v_action := 'standard_vacuum';
        RAISE NOTICE 'Bloat 10-20%: Running standard VACUUM...';
        EXECUTE 'VACUUM ANALYZE user_activities';
        
    ELSE
        v_action := 'analyze_only';
        RAISE NOTICE 'Bloat < 10%: Running ANALYZE only...';
        EXECUTE 'ANALYZE user_activities';
    END IF;
    
    -- Rebuild indexes if needed
    RAISE NOTICE 'Checking index health...';
    
    -- Reindex if bloated
    IF v_bloat_percent > 30 THEN
        RAISE NOTICE 'Rebuilding indexes concurrently...';
        EXECUTE 'REINDEX INDEX CONCURRENTLY idx_user_activities_user_id';
        EXECUTE 'REINDEX INDEX CONCURRENTLY idx_user_activities_status';
        EXECUTE 'REINDEX INDEX CONCURRENTLY idx_user_activities_created_at';
        EXECUTE 'REINDEX INDEX CONCURRENTLY idx_user_activities_processed_at';
    END IF;
    
    -- Final statistics
    SELECT
        pg_relation_size('user_activities'),
        n_dead_tup,
        ROUND(100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0), 2)
    INTO v_table_size, v_dead_tuples, v_bloat_percent
    FROM pg_stat_user_tables
    WHERE tablename = 'user_activities';
    
    RAISE NOTICE '========================================';
    RAISE NOTICE 'Maintenance completed in %', NOW() - v_start_time;
    RAISE NOTICE 'Action taken: %', v_action;
    RAISE NOTICE 'Final table size: %', pg_size_pretty(v_table_size);
    RAISE NOTICE 'Final dead tuples: %', v_dead_tuples;
    RAISE NOTICE 'Final bloat percentage: %', v_bloat_percent;
    RAISE NOTICE '========================================';
    
END $$;

Example Output:

NOTICE:  ========================================
NOTICE:  Starting maintenance at 2024-11-17 02:00:00.123456
NOTICE:  ========================================
NOTICE:  Current table size: 3214 MB
NOTICE:  Dead tuples: 847523
NOTICE:  Bloat percentage: 14.51
NOTICE:  Bloat 10-20%: Running standard VACUUM...
INFO:  vacuuming "public.user_activities"
INFO:  "user_activities": removed 847523 row versions in 112456 pages
INFO:  "user_activities": found 847523 removable, 5000000 nonremovable row versions
NOTICE:  Checking index health...
NOTICE:  ========================================
NOTICE:  Maintenance completed in 00:01:23.847293
NOTICE:  Action taken: standard_vacuum
NOTICE:  Final table size: 2987 MB
NOTICE:  Final dead tuples: 0
NOTICE:  Final bloat percentage: 0.00
NOTICE:  ========================================
DO

3.1 Scheduling the Maintenance Script

For Aurora PostgreSQL, you can use AWS EventBridge with Lambda to schedule this:

# Lambda function to execute maintenance
import boto3
import psycopg2
import os

def lambda_handler(event, context):
    conn = psycopg2.connect(
        host=os.environ['DB_HOST'],
        database=os.environ['DB_NAME'],
        user=os.environ['DB_USER'],
        password=os.environ['DB_PASSWORD']
    )
    
    with conn.cursor() as cur:
        # Read and execute the maintenance script
        with open('maintenance_script.sql', 'r') as f:
            cur.execute(f.read())
        conn.commit()
    
    conn.close()
    return {'statusCode': 200, 'body': 'Maintenance completed'}

Or use a cron job on an EC2 instance:

# Add to crontab for 2 AM daily maintenance
0 2 * * * psql -h your-aurora-cluster.region.rds.amazonaws.com \
               -U your_user \
               -d your_db \
               -f /path/to/maintenance_script.sql \
               >> /var/log/postgres_maintenance.log 2>&1

4.0 Memory Configuration for Vacuum Operations

While tuning autovacuum thresholds and cost-based settings is crucial, proper memory allocation can dramatically improve vacuum performance, especially for large tables. Two key parameters control how much memory vacuum operations can use.

Understanding Vacuum Memory Parameters

maintenance_work_mem: This parameter controls the maximum amount of memory used by maintenance operations including VACUUMCREATE INDEX, and ALTER TABLE ADD FOREIGN KEY. The default is typically 64MB, which is often insufficient for large tables.

-- Check current setting
SHOW maintenance_work_mem;

-- Set globally (requires reload)
ALTER SYSTEM SET maintenance_work_mem = '2GB';
SELECT pg_reload_conf();

-- Or set per session for manual vacuum
SET maintenance_work_mem = '4GB';
VACUUM VERBOSE user_activities;

autovacuum_work_mem: Introduced in PostgreSQL 9.4, this parameter specifically controls memory for autovacuum workers. If set to -1 (default), it falls back to maintenance_work_mem. Setting this separately allows you to allocate different memory limits for automatic vs. manual vacuum operations.

-- Check current setting
SHOW autovacuum_work_mem;

-- Set globally (requires reload)
ALTER SYSTEM SET autovacuum_work_mem = '1GB';
SELECT pg_reload_conf();

4.1 How Memory Affects Vacuum Performance

During vacuum, PostgreSQL maintains an array of dead tuple identifiers (TIDs) in memory. When this array fills up, vacuum must:

  1. Stop scanning the table
  2. Scan and clean all indexes
  3. Remove the dead tuples from the heap
  4. Continue scanning for more dead tuples

This process repeats until the entire table is processed. More memory means:

  • Fewer index scan passes (expensive operation)
  • Better vacuum throughput
  • Reduced overall vacuum time

4.2 Memory Sizing Guidelines

Calculate required memory: Each dead tuple requires 6 bytes of memory. For a table with many dead tuples:

required_memory = (dead_tuples × 6 bytes) + overhead

Best practices:

  • Small instances: Set maintenance_work_mem to 256MB-512MB
  • Medium instances: 1GB-2GB for maintenance_work_mem, 512MB-1GB for autovacuum_work_mem
  • Large instances: 4GB-8GB for maintenance_work_mem, 1GB-2GB per autovacuum worker
  • Critical consideration: Remember autovacuum_work_mem is allocated per worker, so with autovacuum_max_workers = 5 and autovacuum_work_mem = 2GB, you could use up to 10GB total

4.3 Aurora-Specific Considerations

For Amazon Aurora PostgreSQL:

  • Aurora uses shared storage, so vacuum doesn’t rewrite data to new storage
  • Memory settings still impact performance of index cleaning phases
  • Monitor using CloudWatch metric FreeableMemory to ensure you’re not over-allocating
  • Consider Aurora’s instance size when setting these parameters
-- Conservative Aurora settings for db.r5.2xlarge (64GB RAM)
ALTER SYSTEM SET maintenance_work_mem = '2GB';
ALTER SYSTEM SET autovacuum_work_mem = '1GB';
ALTER SYSTEM SET autovacuum_max_workers = 3;
SELECT pg_reload_conf();

4.4 Monitoring Memory Usage

Check if vacuum operations are hitting memory limits:

-- Check for multiple index scan passes (indicates insufficient memory)
SELECT 
    schemaname,
    relname,
    last_vacuum,
    n_dead_tup,
    round(pg_table_size(schemaname||'.'||relname)::numeric / (1024^3), 2) as table_size_gb,
    round((n_dead_tup * 6) / (1024^2), 2) as min_required_mem_mb
FROM pg_stat_user_tables
WHERE n_dead_tup > 0
ORDER BY n_dead_tup DESC
LIMIT 20;

When running manual vacuum with VERBOSE, watch for messages like:

INFO:  index "user_activities_pkey" now contains 1000000 row versions in 2745 pages
INFO:  "user_activities": removed 500000 row versions in 12500 pages

If you see multiple cycles of “removed X row versions”, your maintenance_work_mem may be too small.

  1. Assess current state: Run the estimation script below to calculate memory requirements
  2. Set conservative values: Start with moderate memory allocations
  3. Monitor performance: Watch vacuum duration and CloudWatch metrics
  4. Iterate: Gradually increase memory if vacuum is still slow and memory is available
  5. Balance resources: Ensure vacuum memory doesn’t starve your application connections

4.7 SQL Script: Estimate Required Vacuum Memory

This script analyzes your largest tables and estimates the optimal maintenance_work_mem and autovacuum_work_mem settings:

-- Vacuum Memory Requirement Estimator
-- This script analyzes table sizes and estimates memory needed for efficient vacuum operations

WITH table_stats AS (
    SELECT
        schemaname,
        tablename,
        pg_total_relation_size(schemaname||'.'||tablename) as total_size_bytes,
        pg_table_size(schemaname||'.'||tablename) as table_size_bytes,
        pg_indexes_size(schemaname||'.'||tablename) as indexes_size_bytes,
        n_live_tup,
        n_dead_tup,
        last_vacuum,
        last_autovacuum,
        -- Estimate potential dead tuples based on table size and typical churn
        GREATEST(n_dead_tup, n_live_tup * 0.20) as estimated_max_dead_tup
    FROM pg_stat_user_tables
    WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
),
memory_calculations AS (
    SELECT
        schemaname,
        tablename,
        -- Size formatting
        pg_size_pretty(total_size_bytes) as total_size,
        pg_size_pretty(table_size_bytes) as table_size,
        pg_size_pretty(indexes_size_bytes) as indexes_size,
        -- Tuple counts
        n_live_tup,
        n_dead_tup,
        estimated_max_dead_tup::bigint,
        -- Memory calculations (6 bytes per dead tuple TID)
        round((estimated_max_dead_tup * 6) / (1024.0 * 1024.0), 2) as min_memory_mb,
        round((estimated_max_dead_tup * 6 * 1.2) / (1024.0 * 1024.0), 2) as recommended_memory_mb,
        -- Vacuum history
        last_vacuum,
        last_autovacuum,
        -- Number of index scan passes with current maintenance_work_mem
        CASE 
            WHEN estimated_max_dead_tup = 0 THEN 0
            ELSE CEIL(
                (estimated_max_dead_tup * 6.0) / 
                (SELECT setting::bigint * 1024 FROM pg_settings WHERE name = 'maintenance_work_mem')
            )::integer
        END as estimated_index_scans
    FROM table_stats
),
system_config AS (
    SELECT
        name,
        setting,
        unit,
        CASE 
            WHEN unit = 'kB' THEN (setting::bigint / 1024)::text || ' MB'
            WHEN unit = 'MB' THEN setting || ' MB'
            WHEN unit = 'GB' THEN setting || ' GB'
            ELSE setting || COALESCE(' ' || unit, '')
        END as formatted_value
    FROM pg_settings
    WHERE name IN ('maintenance_work_mem', 'autovacuum_work_mem', 'autovacuum_max_workers')
)
SELECT
    '=== CURRENT MEMORY CONFIGURATION ===' as info,
    NULL::text as schemaname,
    NULL::text as tablename,
    NULL::text as total_size,
    NULL::bigint as n_live_tup,
    NULL::bigint as n_dead_tup,
    NULL::bigint as estimated_max_dead_tup,
    NULL::numeric as min_memory_mb,
    NULL::numeric as recommended_memory_mb,
    NULL::integer as estimated_index_scans,
    NULL::timestamp as last_vacuum,
    NULL::timestamp as last_autovacuum
UNION ALL
SELECT
    name || ': ' || formatted_value,
    NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL
FROM system_config
UNION ALL
SELECT
    '=== TOP TABLES BY SIZE (Memory Requirements) ===' as info,
    NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL
UNION ALL
SELECT
    CASE 
        WHEN recommended_memory_mb > 1024 THEN 'Warn:  ' || tablename
        ELSE tablename
    END as info,
    schemaname,
    tablename,
    total_size,
    n_live_tup,
    n_dead_tup,
    estimated_max_dead_tup,
    min_memory_mb,
    recommended_memory_mb,
    estimated_index_scans,
    last_vacuum,
    last_autovacuum
FROM memory_calculations
WHERE total_size_bytes > 1048576  -- Only tables > 1MB
ORDER BY total_size_bytes DESC
LIMIT 25;

-- Summary recommendations
SELECT
    '=== RECOMMENDATIONS ===' as section,
    '' as recommendation;

SELECT
    'Based on largest tables:' as section,
    'Suggested maintenance_work_mem: ' || 
    CASE
        WHEN MAX(recommended_memory_mb) < 256 THEN '256 MB (small DB)'
        WHEN MAX(recommended_memory_mb) < 1024 THEN CEIL(MAX(recommended_memory_mb) / 256) * 256 || ' MB'
        WHEN MAX(recommended_memory_mb) < 4096 THEN CEIL(MAX(recommended_memory_mb) / 512) * 512 || ' MB'
        ELSE LEAST(8192, CEIL(MAX(recommended_memory_mb) / 1024) * 1024) || ' MB (capped at 8GB)'
    END as recommendation
FROM memory_calculations;

SELECT
    'For autovacuum workers:' as section,
    'Suggested autovacuum_work_mem: ' || 
    CASE
        WHEN MAX(recommended_memory_mb) < 512 THEN '256 MB'
        WHEN MAX(recommended_memory_mb) < 2048 THEN '512 MB to 1 GB'
        ELSE '1 GB to 2 GB per worker'
    END || 
    ' (remember: allocated per worker!)' as recommendation
FROM memory_calculations;

SELECT
    'Tables requiring attention:' as section,
    COUNT(*)::text || ' tables need more than 1GB for optimal vacuum' as recommendation
FROM memory_calculations
WHERE recommended_memory_mb > 1024;

SELECT
    'Memory efficiency:' as section,
    CASE
        WHEN COUNT(*) = 0 THEN 'All tables can vacuum efficiently with current settings'
        ELSE COUNT(*)::text || ' tables will require multiple index scans with current settings'
    END as recommendation
FROM memory_calculations
WHERE estimated_index_scans > 1;

4.8 Using the Estimation Script

  1. Run the script against your database to see current configuration and requirements
  2. Review the output focusing on:
    • Tables with estimated_index_scans > 1 (need more memory)
    • recommended_memory_mb for your largest tables
    • Tables marked with “Warn: ” (require > 1GB memory)
  3. Apply recommendations using the summary output
  4. Monitor vacuum performance after changes

Example output interpretation:

tablename              | recommended_memory_mb | estimated_index_scans
-----------------------|----------------------|----------------------
user_activities        | 1843.20              | 2
orders                 | 512.45               | 1
products               | 128.30               | 1

This indicates user_activities needs ~1.8GB for single-pass vacuum. If maintenance_work_mem = 1GB, vacuum will scan indexes twice, which is inefficient.

Pro tip: For tables that require excessive memory (>4GB), consider using pg_repack instead of relying solely on vacuum, or vacuum during maintenance windows with temporarily increased maintenance_work_mem.

5.0 Parallel Vacuuming

Starting with PostgreSQL 13, vacuum operations can leverage multiple CPU cores through parallel processing, dramatically reducing vacuum time for large tables with multiple indexes. This feature is particularly valuable for Aurora PostgreSQL environments where large tables can take hours to vacuum serially.

5.1 How Parallel Vacuum Works

Parallel vacuum speeds up the index cleanup phase—often the most time-consuming part of the vacuum process. When enabled:

  1. The leader process scans the table heap and collects dead tuple identifiers
  2. Multiple parallel workers simultaneously clean indexes
  3. The leader process removes dead tuples from the heap
  4. The cycle repeats until the table is fully vacuumed

Key point: Only the index cleanup phase is parallelized. Table scanning and heap cleanup remain single-threaded, but since index cleanup often dominates vacuum time (especially for tables with many indexes), the speedup can be substantial.

5.2 Enabling Parallel Vacuum

PostgreSQL automatically uses parallel vacuum when:

  • The table has at least 2 indexes
  • min_parallel_index_scan_size threshold is met (default 512KB per index)
  • Sufficient parallel workers are available

Configuration Parameters

-- Check current settings
SHOW max_parallel_maintenance_workers;  -- Default: 2
SHOW min_parallel_index_scan_size;       -- Default: 512kB
SHOW max_parallel_workers;                -- Overall parallel worker limit
-- Adjust for better parallelism (requires reload)
ALTER SYSTEM SET max_parallel_maintenance_workers = 4;
ALTER SYSTEM SET min_parallel_index_scan_size = '256kB';
SELECT pg_reload_conf();

Parameter descriptions:

  • max_parallel_maintenance_workers: Maximum workers for maintenance operations (VACUUM, CREATE INDEX). Limited by max_parallel_workers
  • min_parallel_index_scan_size: Minimum index size to consider for parallel processing
  • max_parallel_workers: System-wide limit for all parallel operations

5.3 Per-Table Parallel Configuration

For specific large tables, you can control parallel vacuum behavior:

-- Enable parallel vacuum with specific worker count
ALTER TABLE user_activities SET (parallel_workers = 4);
-- Disable parallel vacuum for a specific table
ALTER TABLE sensitive_table SET (parallel_workers = 0);
-- Check table-level settings
SELECT 
    schemaname,
    tablename,
    reloptions
FROM pg_tables
WHERE tablename = 'user_activities';

5.6 Manual Vacuum with Parallel Workers

When running manual vacuum, you can specify the degree of parallelism:

-- Vacuum with explicit parallel workers
VACUUM (PARALLEL 4, VERBOSE) user_activities;
-- Vacuum with parallel disabled
VACUUM (PARALLEL 0, VERBOSE) user_activities;
-- Let PostgreSQL decide (based on table and system settings)
VACUUM (VERBOSE) user_activities;

5.7 Monitoring Parallel Vacuum

Check if vacuum is using parallel workers:

-- View active vacuum operations and their parallel workers
SELECT 
    pid,
    datname,
    usename,
    query_start,
    state,
    wait_event_type,
    wait_event,
    query
FROM pg_stat_activity
WHERE query LIKE '%VACUUM%'
   OR backend_type LIKE '%parallel worker%'
ORDER BY query_start;

Watch for “parallel worker” processes that accompany the main vacuum process.

5.8 Performance Testing

Compare vacuum performance with and without parallelism:

-- Create test table with multiple indexes
CREATE TABLE vacuum_test AS 
SELECT * FROM user_activities LIMIT 1000000;
CREATE INDEX idx1 ON vacuum_test(user_id);
CREATE INDEX idx2 ON vacuum_test(activity_type);
CREATE INDEX idx3 ON vacuum_test(created_at);
CREATE INDEX idx4 ON vacuum_test(session_id);
-- Generate some dead tuples
UPDATE vacuum_test SET activity_type = 'modified' WHERE random() < 0.2;
-- Test serial vacuum
\timing on
VACUUM (PARALLEL 0, VERBOSE) vacuum_test;
-- Note the time
-- Test parallel vacuum
VACUUM (PARALLEL 4, VERBOSE) vacuum_test;
-- Note the time and compare

5.9 Aurora-Specific Considerations

When using parallel vacuum with Aurora PostgreSQL:

Instance sizing: Ensure your instance has sufficient vCPUs for parallel operations

  • db.r5.large (2 vCPUs): max_parallel_maintenance_workers = 2
  • db.r5.xlarge (4 vCPUs): max_parallel_maintenance_workers = 2-3
  • db.r5.2xlarge (8 vCPUs): max_parallel_maintenance_workers = 4
  • db.r5.4xlarge+ (16+ vCPUs): max_parallel_maintenance_workers = 4-6

Memory considerations: Each parallel worker requires its own memory allocation from maintenance_work_mem (or autovacuum_work_mem for autovacuum). With 4 workers and maintenance_work_mem = 2GB, you could use up to 8GB total.

-- Conservative Aurora parallel vacuum configuration
-- For db.r5.2xlarge (8 vCPU, 64GB RAM)
ALTER SYSTEM SET max_parallel_maintenance_workers = 4;
ALTER SYSTEM SET maintenance_work_mem = '1GB';  -- 4GB total with 4 workers
ALTER SYSTEM SET max_worker_processes = 16;     -- Ensure worker pool is sufficient
SELECT pg_reload_conf();

Reader endpoint impact: Parallel vacuum on the writer can increase replication lag to reader endpoints. Monitor ReplicaLag CloudWatch metric during parallel vacuum operations.

5.10 When Parallel Vacuum Helps Most

Parallel vacuum provides the biggest benefit when:

Tables have 4+ indexes – More indexes = more parallelizable work

Indexes are large (>1GB each) – Meets min_parallel_index_scan_size threshold

Sufficient CPU cores available – Won’t compete with application queries

I/O isn’t the bottleneck – Aurora’s storage architecture handles concurrent I/O well

Parallel vacuum helps less when:

Tables have only 1-2 small indexes – Limited parallelizable work

CPU is already saturated – Parallel workers compete with application

During peak traffic hours – Better to run with fewer workers to avoid contention

5.12 Autovacuum and Parallelism

Autovacuum workers can also use parallel processing (PostgreSQL 13+):

-- Enable parallel autovacuum for specific table
ALTER TABLE user_activities SET (
    parallel_workers = 3,
    autovacuum_vacuum_scale_factor = 0.05,
    autovacuum_vacuum_threshold = 5000
);

However, be cautious with parallel autovacuum on production systems:

  • Each autovacuum worker can spawn additional parallel workers
  • With autovacuum_max_workers = 3 and parallel_workers = 4, you could have 12 total workers
  • This can quickly exhaust max_worker_processes and max_connections

Recommendation: Start with parallel_workers = 2 for autovacuum, monitor resource usage, then adjust.

5.13 Practical Example: Optimizing a Large Table

-- Scenario: 100GB table with 6 indexes taking 2 hours to vacuum
-- Step 1: Check current configuration
SHOW max_parallel_maintenance_workers;  -- Returns 2
-- Step 2: Analyze the table
SELECT 
    schemaname,
    tablename,
    pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) as total_size,
    pg_size_pretty(pg_indexes_size(schemaname||'.'||tablename)) as indexes_size,
    (SELECT count(*) FROM pg_indexes WHERE tablename = 'user_activities') as num_indexes
FROM pg_stat_user_tables
WHERE tablename = 'user_activities';
-- Result: 100GB total, 45GB indexes, 6 indexes
-- Step 3: Enable parallel vacuum
ALTER TABLE user_activities SET (parallel_workers = 4);
-- Step 4: Increase maintenance workers (if needed)
ALTER SYSTEM SET max_parallel_maintenance_workers = 4;
SELECT pg_reload_conf();
-- Step 5: Run vacuum with timing
\timing on
VACUUM (VERBOSE) user_activities;
-- Expected result: Vacuum time reduced from 2 hours to 45-60 minutes

5.14 Troubleshooting Parallel Vacuum

Problem: Vacuum not using parallel workers

-- Check if indexes meet size threshold
SELECT 
    schemaname,
    tablename,
    indexname,
    pg_size_pretty(pg_relation_size(indexrelid)) as index_size
FROM pg_stat_user_indexes
WHERE tablename = 'user_activities'
ORDER BY pg_relation_size(indexrelid) DESC;
-- If indexes < 512KB, lower the threshold
ALTER SYSTEM SET min_parallel_index_scan_size = '128kB';
SELECT pg_reload_conf();

Problem: Running out of worker processes

-- Check worker process limits
SHOW max_worker_processes;  -- Total worker pool
SHOW max_parallel_workers;  -- Max parallel workers allowed
-- Increase if needed
ALTER SYSTEM SET max_worker_processes = 16;
ALTER SYSTEM SET max_parallel_workers = 8;
-- Requires restart for max_worker_processes

Problem: High memory usage during parallel vacuum

-- Reduce per-worker memory allocation
SET maintenance_work_mem = '512MB';  -- Each worker gets this amount
VACUUM (PARALLEL 4) user_activities;  -- 2GB total

5.15 Best Practices For Parallelisation

  1. Baseline first: Measure vacuum time before enabling parallel processing
  2. Match CPU availability: Set max_parallel_maintenance_workers based on vCPUs and workload
  3. Consider memory: maintenance_work_mem × parallel_workers = total memory usage
  4. Start conservative: Begin with 2-3 workers, increase based on results
  5. Monitor during peak: Watch CPU and memory metrics when parallel vacuum runs
  6. Test index threshold: Lower min_parallel_index_scan_size if indexes are smaller
  7. Schedule strategically: Use parallel vacuum during maintenance windows for predictable performance
  8. Aurora readers: Monitor replication lag impact on read replicas

Parallel vacuum is a powerful tool for managing large tables in Aurora PostgreSQL, but it requires careful configuration to balance vacuum speed against resource consumption. When properly tuned, it can reduce vacuum time by 50-70% for index-heavy tables.

6.0 Optimizing Aurora PostgreSQL Vacuums with TOAST Table Parameters

What is TOAST?

TOAST (The Oversized-Attribute Storage Technique) is PostgreSQL’s mechanism for handling data that exceeds the standard 8KB page size limit. When you store large text fields, JSON documents, bytea columns, or other substantial data types, PostgreSQL automatically moves this data out of the main table into a separate TOAST table. This keeps the main table pages compact and efficient for scanning, while the oversized data is stored separately and retrieved only when needed.

Every table with potentially large columns has an associated TOAST table (named pg_toast.pg_toast_<oid>) that operates behind the scenes. While this separation improves query performance on the main table, TOAST tables can accumulate dead tuples from updates and deletes just like regular tables, requiring their own vacuum maintenance.

6.1 Understanding TOAST Autovacuum Parameters

TOAST tables can be tuned independently from their parent tables using specific parameters. Here are the key options and their recommended values:

toast.autovacuum_vacuum_cost_delay

  • Default: Inherits from autovacuum_vacuum_cost_delay (typically 2ms in Aurora)
  • Recommended: 0 for high-throughput systems
  • Purpose: Controls the delay between vacuum operations to throttle I/O impact
  • Effect: Setting to 0 removes throttling, allowing vacuums to complete faster at the cost of higher instantaneous I/O
ALTER TABLE your_large_table SET (toast.autovacuum_vacuum_cost_delay = 0);

toast.autovacuum_vacuum_threshold

  • Default: 50 tuples
  • Recommended: 1000-5000 for large, frequently updated tables
  • Purpose: Minimum number of dead tuples before triggering an autovacuum
  • Effect: Higher values reduce vacuum frequency but may allow more bloat
ALTER TABLE your_large_table SET (toast.autovacuum_vacuum_threshold = 2000);

toast.autovacuum_vacuum_scale_factor

  • Default: 0.2 (20% of table size)
  • Recommended: 0.05-0.1 for very large tables, 0.2-0.3 for smaller tables
  • Purpose: Percentage of table size that, when combined with threshold, triggers autovacuum
  • Effect: Lower values mean more frequent vacuums, preventing excessive bloat
ALTER TABLE your_large_table SET (toast.autovacuum_vacuum_scale_factor = 0.1);

toast.autovacuum_vacuum_cost_limit

  • Default: Inherits from autovacuum_vacuum_cost_limit (typically 200 in Aurora)
  • Recommended: 2000-4000 for aggressive cleanup
  • Purpose: Maximum “cost” budget before vacuum process sleeps
  • Effect: Higher values allow more work per cycle before throttling kicks in
ALTER TABLE your_large_table SET (toast.autovacuum_vacuum_cost_limit = 3000);

6.2 Practical Example

For a large table with frequent updates to text or JSON columns in Aurora PostgreSQL:

-- Optimize TOAST table for aggressive, fast vacuuming
ALTER TABLE user_profiles SET (
    toast.autovacuum_vacuum_cost_delay = 0,
    toast.autovacuum_vacuum_threshold = 2000,
    toast.autovacuum_vacuum_scale_factor = 0.05,
    toast.autovacuum_vacuum_cost_limit = 3000
);

This configuration ensures TOAST tables are vacuumed frequently and quickly, preventing bloat from degrading performance while leveraging Aurora’s optimized storage layer. Monitor your vacuum activity using pg_stat_user_tables and adjust these parameters based on your workload’s specific characteristics.

7.0 “No Regrets” Optimizations for Mission-Critical Large Tables

When you have mission-critical large tables and sufficient infrastructure to scale memory and CPU, these optimizations will deliver immediate performance improvements without significant trade-offs:

7.1 Increase Maintenance Memory Allocation

Set generous memory limits to ensure vacuum operations complete in a single index scan pass:

-- For large instances with adequate RAM
ALTER SYSTEM SET maintenance_work_mem = '4GB';
ALTER SYSTEM SET autovacuum_work_mem = '2GB';
SELECT pg_reload_conf();

Why this works: Each dead tuple requires 6 bytes in memory. Insufficient memory forces multiple expensive index scan passes. With adequate memory, vacuum completes faster and more efficiently.

Impact: Reduces vacuum time by 40-60% for tables requiring multiple index scans.

7.2 Enable Aggressive Parallel Vacuum

Leverage multiple CPU cores for dramatically faster vacuum operations:

-- System-wide settings (adjust based on available vCPUs)
ALTER SYSTEM SET max_parallel_maintenance_workers = 4;
ALTER SYSTEM SET min_parallel_index_scan_size = '256kB';
SELECT pg_reload_conf();
-- Per-table optimization for mission-critical tables
ALTER TABLE your_critical_table SET (parallel_workers = 4);

Why this works: Parallel vacuum distributes index cleanup across multiple workers. For tables with 4+ indexes, this parallelization provides substantial speedups.

Impact: 50-70% reduction in vacuum time for index-heavy tables.

7.3 Remove Autovacuum Throttling

Eliminate I/O throttling delays to let vacuum run at full speed:

-- Apply to critical tables
ALTER TABLE your_critical_table SET (
    autovacuum_vacuum_cost_delay = 0,
    autovacuum_vacuum_cost_limit = 10000
);

Why this works: Default throttling was designed for resource-constrained systems. With sufficient infrastructure, removing these limits allows vacuum to complete faster without impacting performance.

Impact: 30-50% faster vacuum completion with no downside on properly provisioned systems.

7.4 Tune TOAST Table Parameters

Optimize vacuum for oversized attribute storage:

ALTER TABLE your_critical_table SET (
    toast.autovacuum_vacuum_cost_delay = 0,
    toast.autovacuum_vacuum_threshold = 2000,
    toast.autovacuum_vacuum_scale_factor = 0.05,
    toast.autovacuum_vacuum_cost_limit = 3000
);

Why this works: TOAST tables accumulate dead tuples independently and are often overlooked. Aggressive TOAST vacuuming prevents hidden bloat in large text/JSON columns.

Impact: Eliminates TOAST bloat, which can represent 20-40% of total table bloat.

7.5 Lower Autovacuum Thresholds

Trigger vacuum earlier to prevent bloat accumulation:

ALTER TABLE your_critical_table SET (
    autovacuum_vacuum_scale_factor = 0.05,  -- Down from default 0.2
    autovacuum_vacuum_threshold = 5000
);

Why this works: More frequent, smaller vacuums are faster and less disruptive than infrequent, massive cleanup operations. This prevents bloat before it impacts query performance.

Impact: Maintains bloat under 10% consistently, preventing query degradation.

7.6 Install and Configure pg_repack

Have pg_repack ready for zero-downtime space reclamation:

CREATE EXTENSION pg_repack;

Why this works: When bloat exceeds 30-40%, pg_repack reclaims space without the long exclusive locks required by VACUUM FULL. Critical tables remain online throughout the operation.

Impact: Space reclamation during business hours without downtime.

7.7 Complete Configuration Template

For a mission-critical large table on a properly sized Aurora instance:

-- Main table optimization
ALTER TABLE your_critical_table SET (
    autovacuum_vacuum_scale_factor = 0.05,
    autovacuum_vacuum_threshold = 5000,
    autovacuum_vacuum_cost_delay = 0,
    autovacuum_vacuum_cost_limit = 10000,
    parallel_workers = 4,
    toast.autovacuum_vacuum_cost_delay = 0,
    toast.autovacuum_vacuum_threshold = 2000,
    toast.autovacuum_vacuum_scale_factor = 0.05,
    toast.autovacuum_vacuum_cost_limit = 3000
);
-- System-wide settings (for db.r5.2xlarge or larger)
ALTER SYSTEM SET maintenance_work_mem = '4GB';
ALTER SYSTEM SET autovacuum_work_mem = '2GB';
ALTER SYSTEM SET max_parallel_maintenance_workers = 4;
ALTER SYSTEM SET min_parallel_index_scan_size = '256kB';
SELECT pg_reload_conf();

These optimizations are “no regrets” because they:

  • Require no application changes
  • Leverage existing infrastructure capacity
  • Provide immediate, measurable improvements
  • Have minimal risk when resources are available
  • Prevent problems rather than reacting to them

8.0 Conclusion

Effective vacuum management is not a one-time configuration task—it’s an ongoing optimization process that scales with your database. As your PostgreSQL Aurora tables grow, the default vacuum settings that worked initially can become a significant performance bottleneck, leading to bloat, degraded query performance, and wasted storage.

The strategies covered in this guide provide a comprehensive toolkit for managing vacuum at scale:

  • Monitoring queries help you identify bloat before it impacts performance
  • Table-level autovacuum tuning allows you to customize behavior for high-churn tables
  • Memory configuration (maintenance_work_mem and autovacuum_work_mem) ensures vacuum operations complete efficiently without multiple index scans
  • Parallel vacuuming leverages multiple CPU cores to dramatically reduce vacuum time for large, index-heavy tables
  • pg_repack offers a near-zero-downtime solution for reclaiming space from heavily bloated tables
  • Automated maintenance workflows enable proactive vacuum management during off-peak hours

The key is to be proactive rather than reactive. Regularly run the monitoring queries and memory estimation scripts provided in this article. Watch for warning signs like increasing dead tuple counts, growing bloat percentages, and tables requiring multiple index scan passes. When you spot these indicators, apply targeted tuning before they escalate into production issues.

For tables with multiple indexes that take hours to vacuum, parallel vacuuming offers a game-changing performance boost—often reducing vacuum time by 50-70% by distributing index cleanup across multiple CPU cores. However, this power comes with resource trade-offs: each parallel worker consumes its own memory allocation and CPU cycles. The key is finding the sweet spot for your Aurora instance size, testing with 2-3 workers initially and scaling up based on available vCPUs and observed performance gains. This is especially valuable during maintenance windows when you need to vacuum large tables quickly without blocking operations for extended periods.

Remember that vacuum optimization is a balance: too aggressive and you risk impacting production workload; too conservative and bloat accumulates faster than it can be cleaned. Start with the conservative recommendations provided, monitor the results, and iterate based on your specific workload patterns and Aurora instance capabilities.

With the right monitoring, configuration, and tooling in place, you can maintain healthy tables even as they scale to hundreds of gigabytes—ensuring consistent query performance and optimal storage utilization for your PostgreSQL Aurora database.

Windows Domain Controller: Monitor and Log LDAP operations/queries use of resources

The script below monitors LDAP operations on a Domain Controller and logs detailed information about queries that exceed specified thresholds for execution time, CPU usage, or results returned. It helps identify problematic LDAP queries that may be impacting domain controller performance.

Parameter: ThresholdSeconds
Minimum query duration in seconds to log (default: 5)

Parameter: LogPath
Path where log files will be saved (default: C:\LDAPDiagnostics)

Parameter: MonitorDuration
How long to monitor in minutes (default: continuous)

EXAMPLE
.\Diagnose-LDAPQueries.ps1 -ThresholdSeconds 3 -LogPath “C:\Logs\LDAP”

[CmdletBinding()]
param(
    [int]$ThresholdSeconds = 5,
    [string]$LogPath = "C:\LDAPDiagnostics",
    [int]$MonitorDuration = 0  # 0 = continuous
)

# Requires Administrator privileges
#Requires -RunAsAdministrator

# Create log directory if it doesn't exist
if (-not (Test-Path $LogPath)) {
    New-Item -ItemType Directory -Path $LogPath -Force | Out-Null
}

$logFile = Join-Path $LogPath "LDAP_Diagnostics_$(Get-Date -Format 'yyyyMMdd_HHmmss').log"
$csvFile = Join-Path $LogPath "LDAP_Queries_$(Get-Date -Format 'yyyyMMdd_HHmmss').csv"

function Write-Log {
    param([string]$Message, [string]$Level = "INFO")
    $timestamp = Get-Date -Format "yyyy-MM-dd HH:mm:ss"
    $logMessage = "[$timestamp] [$Level] $Message"
    Write-Host $logMessage
    Add-Content -Path $logFile -Value $logMessage
}

function Get-LDAPStatistics {
    try {
        # Query NTDS performance counters for LDAP statistics
        $ldapStats = @{
            ActiveThreads = (Get-Counter '\NTDS\LDAP Active Threads' -ErrorAction SilentlyContinue).CounterSamples.CookedValue
            SearchesPerSec = (Get-Counter '\NTDS\LDAP Searches/sec' -ErrorAction SilentlyContinue).CounterSamples.CookedValue
            ClientSessions = (Get-Counter '\NTDS\LDAP Client Sessions' -ErrorAction SilentlyContinue).CounterSamples.CookedValue
            BindTime = (Get-Counter '\NTDS\LDAP Bind Time' -ErrorAction SilentlyContinue).CounterSamples.CookedValue
        }
        return $ldapStats
    }
    catch {
        Write-Log "Error getting LDAP statistics: $_" "ERROR"
        return $null
    }
}

function Parse-LDAPEvent {
    param($Event)
    
    $eventData = @{
        TimeCreated = $Event.TimeCreated
        ClientIP = $null
        ClientPort = $null
        StartingNode = $null
        Filter = $null
        SearchScope = $null
        AttributeSelection = $null
        ServerControls = $null
        VisitedEntries = $null
        ReturnedEntries = $null
        TimeInServer = $null
    }

    # Parse event XML for detailed information
    try {
        $xml = [xml]$Event.ToXml()
        $dataNodes = $xml.Event.EventData.Data
        
        foreach ($node in $dataNodes) {
            switch ($node.Name) {
                "Client" { $eventData.ClientIP = ($node.'#text' -split ':')[0] }
                "StartingNode" { $eventData.StartingNode = $node.'#text' }
                "Filter" { $eventData.Filter = $node.'#text' }
                "SearchScope" { $eventData.SearchScope = $node.'#text' }
                "AttributeSelection" { $eventData.AttributeSelection = $node.'#text' }
                "ServerControls" { $eventData.ServerControls = $node.'#text' }
                "VisitedEntries" { $eventData.VisitedEntries = $node.'#text' }
                "ReturnedEntries" { $eventData.ReturnedEntries = $node.'#text' }
                "TimeInServer" { $eventData.TimeInServer = $node.'#text' }
            }
        }
    }
    catch {
        Write-Log "Error parsing event XML: $_" "WARNING"
    }

    return $eventData
}

Write-Log "=== LDAP Query Diagnostics Started ===" "INFO"
Write-Log "Threshold: $ThresholdSeconds seconds" "INFO"
Write-Log "Log Path: $LogPath" "INFO"
Write-Log "Monitor Duration: $(if($MonitorDuration -eq 0){'Continuous'}else{$MonitorDuration + ' minutes'})" "INFO"

# Enable Field Engineering logging if not already enabled
Write-Log "Checking Field Engineering diagnostic logging settings..." "INFO"
try {
    $regPath = "HKLM:\SYSTEM\CurrentControlSet\Services\NTDS\Diagnostics"
    $currentValue = Get-ItemProperty -Path $regPath -Name "15 Field Engineering" -ErrorAction SilentlyContinue
    
    if ($currentValue.'15 Field Engineering' -lt 5) {
        Write-Log "Enabling Field Engineering logging (level 5)..." "INFO"
        Set-ItemProperty -Path $regPath -Name "15 Field Engineering" -Value 5
        Write-Log "Field Engineering logging enabled. You may need to restart NTDS service for full effect." "WARNING"
    }
    else {
        Write-Log "Field Engineering logging already enabled at level $($currentValue.'15 Field Engineering')" "INFO"
    }
}
catch {
    Write-Log "Error configuring diagnostic logging: $_" "ERROR"
}

# Create CSV header
$csvHeader = "TimeCreated,ClientIP,StartingNode,Filter,SearchScope,AttributeSelection,VisitedEntries,ReturnedEntries,TimeInServer,ServerControls"
Set-Content -Path $csvFile -Value $csvHeader

Write-Log "Monitoring for expensive LDAP queries (threshold: $ThresholdSeconds seconds)..." "INFO"
Write-Log "Press Ctrl+C to stop monitoring" "INFO"

$startTime = Get-Date
$queriesLogged = 0

try {
    while ($true) {
        # Check if monitoring duration exceeded
        if ($MonitorDuration -gt 0) {
            $elapsed = (Get-Date) - $startTime
            if ($elapsed.TotalMinutes -ge $MonitorDuration) {
                Write-Log "Monitoring duration reached. Stopping." "INFO"
                break
            }
        }

        # Get current LDAP statistics
        $stats = Get-LDAPStatistics
        if ($stats) {
            Write-Verbose "Active Threads: $($stats.ActiveThreads), Searches/sec: $($stats.SearchesPerSec), Client Sessions: $($stats.ClientSessions)"
        }

        # Query Directory Service event log for expensive LDAP queries
        # Event ID 1644 = expensive search operations
        $events = Get-WinEvent -FilterHashtable @{
            LogName = 'Directory Service'
            Id = 1644
            StartTime = (Get-Date).AddSeconds(-10)
        } -ErrorAction SilentlyContinue

        foreach ($event in $events) {
            $eventData = Parse-LDAPEvent -Event $event
            
            # Convert time in server from milliseconds to seconds
            $timeInSeconds = if ($eventData.TimeInServer) { 
                [int]$eventData.TimeInServer / 1000 
            } else { 
                0 
            }

            if ($timeInSeconds -ge $ThresholdSeconds) {
                $queriesLogged++
                
                Write-Log "=== Expensive LDAP Query Detected ===" "WARNING"
                Write-Log "Time: $($eventData.TimeCreated)" "WARNING"
                Write-Log "Client IP: $($eventData.ClientIP)" "WARNING"
                Write-Log "Duration: $timeInSeconds seconds" "WARNING"
                Write-Log "Starting Node: $($eventData.StartingNode)" "WARNING"
                Write-Log "Filter: $($eventData.Filter)" "WARNING"
                Write-Log "Search Scope: $($eventData.SearchScope)" "WARNING"
                Write-Log "Visited Entries: $($eventData.VisitedEntries)" "WARNING"
                Write-Log "Returned Entries: $($eventData.ReturnedEntries)" "WARNING"
                Write-Log "Attributes: $($eventData.AttributeSelection)" "WARNING"
                Write-Log "Server Controls: $($eventData.ServerControls)" "WARNING"
                Write-Log "======================================" "WARNING"

                # Write to CSV
                $csvLine = "$($eventData.TimeCreated),$($eventData.ClientIP),$($eventData.StartingNode),`"$($eventData.Filter)`",$($eventData.SearchScope),`"$($eventData.AttributeSelection)`",$($eventData.VisitedEntries),$($eventData.ReturnedEntries),$($eventData.TimeInServer),`"$($eventData.ServerControls)`""
                Add-Content -Path $csvFile -Value $csvLine
            }
        }

        Start-Sleep -Seconds 5
    }
}
catch {
    Write-Log "Error during monitoring: $_" "ERROR"
}
finally {
    Write-Log "=== LDAP Query Diagnostics Stopped ===" "INFO"
    Write-Log "Total expensive queries logged: $queriesLogged" "INFO"
    Write-Log "Log file: $logFile" "INFO"
    Write-Log "CSV file: $csvFile" "INFO"
}
```
## Usage Examples

### Basic Usage (Continuous Monitoring)

Run with default settings - monitors queries taking 5+ seconds:

```powershell
.\Diagnose-LDAPQueries.ps1
```

### Custom Threshold and Duration

Monitor for 30 minutes, logging queries that take 3+ seconds:

```powershell
.\Diagnose-LDAPQueries.ps1 -ThresholdSeconds 3 -MonitorDuration 30
```

### Custom Log Location

Save logs to a specific directory:

```powershell
.\Diagnose-LDAPQueries.ps1 -LogPath "D:\Logs\LDAP"
```

### Verbose Output

See real-time LDAP statistics while monitoring:

```powershell
.\Diagnose-LDAPQueries.ps1 -Verbose
```

## Requirements

- **Administrator privileges** on the domain controller
- **Windows Server** with Active Directory Domain Services role
- **PowerShell 5.1 or later**

## Understanding the Output

### Log File Example

```
[2025-01-15 14:23:45] [WARNING] === Expensive LDAP Query Detected ===
[2025-01-15 14:23:45] [WARNING] Time: 01/15/2025 14:23:43
[2025-01-15 14:23:45] [WARNING] Client IP: 192.168.1.50
[2025-01-15 14:23:45] [WARNING] Duration: 8.5 seconds
[2025-01-15 14:23:45] [WARNING] Starting Node: DC=contoso,DC=com
[2025-01-15 14:23:45] [WARNING] Filter: (&(objectClass=user)(memberOf=*))
[2025-01-15 14:23:45] [WARNING] Search Scope: 2
[2025-01-15 14:23:45] [WARNING] Visited Entries: 45000
[2025-01-15 14:23:45] [WARNING] Returned Entries: 12000
```

### What to Look For

- **High visited/returned ratio** - Indicates an inefficient filter
- **Subtree searches from root** - Often unnecessarily broad
- **Wildcard filters** - Like `(cn=*)` can be very expensive
- **Unindexed attributes** - Queries on non-indexed attributes visit many entries
- **Repeated queries** - Same client making the same expensive query repeatedly

## Troubleshooting Common Issues

### No Events Appearing

If you're not seeing Event ID 1644, you may need to lower the expensive search threshold in Active Directory:

```powershell
# Lower the threshold to 1000ms (1 second)
Get-ADObject "CN=Query-Policies,CN=Directory Service,CN=Windows NT,CN=Services,CN=Configuration,DC=yourdomain,DC=com" | 
    Set-ADObject -Replace @{lDAPAdminLimits="MaxQueryDuration=1000"}
```

### Script Requires Restart

After enabling Field Engineering logging, you may need to restart the NTDS service:

```powershell
Restart-Service NTDS -Force
```

Best Practices

1. **Run during peak hours** to capture real-world problematic queries
2. **Start with a lower threshold** (2-3 seconds) to catch more queries
3. **Analyze the CSV** in Excel or Power BI for patterns
4. **Correlate with client IPs** to identify problematic applications
5. **Work with application owners** to optimize queries with indexes or better filters

Once you’ve identified expensive queries:

1. **Add indexes** for frequently searched attributes
2. **Optimize LDAP filters** to be more specific
3. **Reduce search scope** where possible
4. **Implement paging** for large result sets
5. **Cache results** on the client side when appropriate

This script has helped me identify numerous performance bottlenecks in production environments. I hope it helps you optimize your Active Directory infrastructure as well!