
The Hitchhiker's Guide to Fixing Why a Thumbnail Image Does Not Show for Your Article on WhatsApp, LinkedIn, Twitter or Instagram

Diagnostic script output showing WhatsApp link preview troubleshooting results

When you share a link on WhatsApp, LinkedIn, X, or Instagram and nothing appears except a bare URL, it feels broken in a way that is surprisingly hard to diagnose. The page loads fine in a browser, the image exists, the og:image tag is there, yet the preview is blank. This post gives you a single unified diagnostic script that checks every known failure mode, produces a categorised report, and flags the specific fix for each issue it finds. It then walks through each failure pattern in detail so you understand what the output means and what to do about it.

1. How Link Preview Crawlers Work

When you paste a URL into WhatsApp, LinkedIn, X, or Instagram, the platform does not wait for you to send it. A background process immediately sends an HTTP request to that URL, and that request looks like a bot because it is one. The crawler reads the page's <head> section, extracts the Open Graph meta tags, fetches the og:image URL, and caches the result. The preview you see is assembled entirely from that cached crawl: no browser renders the page and no JavaScript runs at any point.

Every platform runs its own crawler with its own user agent string, image dimension requirements, file size tolerance, and sensitivity to HTTP response headers. If anything in that chain fails, the preview shows no image, shows the wrong image, or does not render at all. The key insight is that your website must serve correct, accessible, standards-compliant responses not just to humans in browsers but to automated crawlers that look nothing like browsers, and security rules that protect against bots can inadvertently block the very crawlers you need.
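To see roughly what a crawler sees, fetch the raw HTML with a crawler user agent and pull out the meta tags without running any JavaScript. The function below is a minimal sketch of that step; extract_og is a hypothetical helper rather than part of any tool, and it assumes each <meta> tag sits on a single line of the HTML.

```shell
#!/usr/bin/env bash
# extract_og (hypothetical helper): print the og:* and twitter:* meta tags found in
# raw HTML on stdin as key=value lines. Like the real crawlers, it never runs JavaScript.
# Assumes each <meta> tag sits on a single line of the HTML.
extract_og() {
  local tag key val
  grep -oE '<meta[^>]+>' | while IFS= read -r tag; do
    # The key lives in property= (Open Graph) or name= (Twitter Card)
    key=$(printf '%s\n' "$tag" \
      | sed -nE "s/.*(property|name)=[\"']((og|twitter):[^\"']+)[\"'].*/\2/p")
    [[ -z "$key" ]] && continue
    # Prefer a double-quoted content value so apostrophes in titles survive
    val=$(printf '%s\n' "$tag" | sed -nE 's/.*content="([^"]*)".*/\1/p')
    if [[ -z "$val" ]]; then
      val=$(printf '%s\n' "$tag" | sed -nE "s/.*content='([^']*)'.*/\1/p")
    fi
    printf '%s=%s\n' "$key" "$val"
  done
}
```

For example, curl -s -A "facebookexternalhit/1.1" https://yoursite.com/your-post/ | extract_og lists the tags that crawler received. An empty result for a page you know carries OG tags suggests the crawler is being served different HTML from your browser.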

2. Platform Requirements at a Glance

| Platform | Crawler User Agent | Recommended og:image Size | Max File Size | Aspect Ratio |
|---|---|---|---|---|
| WhatsApp | WhatsApp/2.x, facebookexternalhit/1.1, Facebot | 1200 x 630 px | ~300 KB | 1.91:1 |
| LinkedIn | LinkedInBot/1.0 | 1200 x 627 px | 5 MB | 1.91:1 |
| X (Twitter) | Twitterbot/1.0 | 1200 x 675 px | 5 MB | 1.91:1 |
| Instagram | facebookexternalhit/1.1 | 1200 x 630 px | 8 MB | 1.91:1 |
| Facebook | facebookexternalhit/1.1 | 1200 x 630 px | 8 MB | 1.91:1 |
| iMessage | facebookexternalhit/1.1, Facebot, Twitterbot/1.0 | 1200 x 630 px | 5 MB | 1.91:1 |
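The file size column is the one that bites most often. As a quick sketch, the function below compares a byte count against those limits; size_report is a hypothetical name, and the thresholds are copied from the table (with 1 KB = 1024 bytes), so the WhatsApp row inherits the table's approximation.

```shell
#!/usr/bin/env bash
# size_report (hypothetical helper): compare an og:image size in bytes against the
# per-platform limits from the table. The WhatsApp figure is an observed ~300 KB
# threshold rather than a documented limit, so treat results near it as borderline.
size_report() {
  local bytes="${1//[[:space:]]/}" name limit   # wc -c may pad its output with spaces
  # Limits in bytes: 300 KB, 5 MB and 8 MB
  while read -r name limit; do
    if (( bytes <= limit )); then
      printf '%-10s OK\n' "$name"
    else
      printf '%-10s TOO LARGE (limit %d KB)\n' "$name" $(( limit / 1024 ))
    fi
  done <<'LIMITS'
WhatsApp 307200
LinkedIn 5242880
X 5242880
Instagram 8388608
Facebook 8388608
iMessage 5242880
LIMITS
}
```

For example, size_report "$(wc -c < hero.jpg)" on a 412 KB image reports TOO LARGE for WhatsApp only, which is exactly the "works everywhere except WhatsApp" symptom.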

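The Open Graph protocol marks og:title, og:type, og:image, and og:url as required and og:description as optional, but every platform above displays a description, so in practice every page you want to share should carry all five properties: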

<meta property="og:title" content="Your Page Title" />
<meta property="og:description" content="A brief description" />
<meta property="og:image" content="https://example.com/image.jpg" />
<meta property="og:url" content="https://example.com/page" />
<meta property="og:type" content="article" />

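X additionally reads Twitter Card tags. The twitter:card tag with the value summary_large_image is what selects the large image preview format; without it, X falls back to a small summary card with no prominent image. The twitter:image tag is optional because X falls back to og:image when it is absent, but setting it explicitly does no harm: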

<meta name="twitter:card" content="summary_large_image" />
<meta name="twitter:image" content="https://example.com/image.jpg" />
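If your pages come from a script or static site generator rather than an SEO plugin, a small helper can emit every tag above in one go and escape the values so a quote or ampersand in a title cannot break the markup. og_head and html_attr_escape are hypothetical names, og:type is hard-coded to article, and twitter:image simply reuses the og:image URL.

```shell
#!/usr/bin/env bash
# html_attr_escape (hypothetical helper): make a value safe inside a double-quoted attribute.
html_attr_escape() {
  # & must be escaped first so the entities added afterwards are not double-escaped
  printf '%s' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g' -e 's/"/\&quot;/g'
}

# og_head (hypothetical helper): print the five Open Graph tags plus the two Twitter Card tags.
# Arguments: title, description, absolute image URL, canonical page URL.
og_head() {
  local title desc image url
  title=$(html_attr_escape "$1")
  desc=$(html_attr_escape "$2")
  image=$(html_attr_escape "$3")
  url=$(html_attr_escape "$4")
  printf '<meta property="og:title" content="%s" />\n' "$title"
  printf '<meta property="og:description" content="%s" />\n' "$desc"
  printf '<meta property="og:image" content="%s" />\n' "$image"
  printf '<meta property="og:url" content="%s" />\n' "$url"
  printf '<meta property="og:type" content="article" />\n'
  printf '<meta name="twitter:card" content="summary_large_image" />\n'
  printf '<meta name="twitter:image" content="%s" />\n' "$image"
}
```

Usage: og_head "Tom & Jerry" "A brief description" https://example.com/image.jpg https://example.com/page prints seven tags ready to paste into the page head, with the ampersand written as &amp;.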

3. Why WhatsApp Is the Most Sensitive Platform

WhatsApp imposes constraints that none of the other major platforms enforce as strictly, and most of them are undocumented. The first, and the most commonly missed, is the image file size limit. Facebook supports og:image files up to 8 MB and LinkedIn up to 5 MB, but WhatsApp silently drops the thumbnail if the image exceeds roughly 300 KB. There is no error in your logs, no HTTP error code, and no indication in Cloudflare analytics; the preview simply renders without an image. WhatsApp also caches that failure, which means users who share the link shortly after you publish will see a bare URL even after you fix the underlying image.

A single WhatsApp link share can trigger requests from three distinct Meta crawler user agents (WhatsApp/2.x, facebookexternalhit/1.1, and Facebot), and if your WAF or bot protection blocks any one of them the preview fails. Cloudflare's Super Bot Fight Mode treats facebookexternalhit as an automated bot by default and will challenge or block it unless you have explicitly created an exception. Unlike LinkedIn's bot, which retries on challenge pages, WhatsApp's crawler has no retry mechanism: if it gets a 403, a challenge page, or a slow response, it caches the failure immediately.

Response time compounds this further. WhatsApp's crawler has an aggressive timeout, and if your origin server takes more than a few seconds to respond, the crawl times out before it can read any OG tags at all. This matters most on cold start servers and on cache miss paths where your origin has to run full PHP to generate the page. Redirect chains make things worse still, because each hop consumes time against WhatsApp's timeout budget, and a chain of three or four redirects on a slow origin can tip a borderline-fast site over the threshold. The diagnostic script follows every hop, prints each status code and Location header, and reports the total time for the whole chain so you can see whether redirects are eating into the budget.
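If you want a timing for each individual hop, stop curl from following redirects and walk the chain yourself. The sketch below relies on curl's %{redirect_url} write-out variable, which reports where a redirect would have gone when -L is not used; trace_hops is a hypothetical name and the default user agent is the WhatsApp string the diagnostic script uses.

```shell
#!/usr/bin/env bash
# trace_hops (hypothetical helper): request each URL in a redirect chain separately so
# every hop gets its own status code and timing. Stops after following 10 redirects.
trace_hops() {
  local url="$1" ua="${2:-WhatsApp/2.23.24.82 A}" hop=0 line code secs next
  while [[ -n "$url" ]] && (( hop <= 10 )); do
    # No -L: curl makes exactly one request and %{redirect_url} holds the next hop
    line=$(curl -s -o /dev/null -A "$ua" --max-time 10 \
      -w '%{http_code} %{time_total} %{redirect_url}' "$url" 2>/dev/null) || line="000 0"
    read -r code secs next <<< "$line"
    printf 'hop %d: HTTP %s in %ss  %s\n' "$hop" "$code" "$secs" "$url"
    url="$next"   # empty once the chain ends (non-redirect response or no Location)
    hop=$((hop + 1))
  done
}
```

Running trace_hops http://yoursite.com/your-post prints one line per hop; adding up the per-hop times gives a rough idea of how much of the crawler's budget is spent before any HTML arrives.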

4. The Unified Diagnostic Script

This is the only script you need. Run it against any URL and it produces a full categorised report covering all known failure modes. It tests everything in a single pass: OG tags, image size and dimensions, redirect chains, TTFB, Cloudflare detection, WAF bypass verification, CSP image blocking, meta refresh redirects, robots.txt crawler directives, and all five major crawler user agents.

The install wrapper below writes the script to disk and makes it executable in one paste. Run it as bash install-diagnose-social-preview.sh and it creates diagnose-social-preview.sh ready to use. Then point it at any URL with bash diagnose-social-preview.sh https://yoursite.com/your-post/.

cat > diagnose-social-preview.sh << 'EOF'
#!/usr/bin/env bash
# diagnose-social-preview.sh
# Usage: bash diagnose-social-preview.sh <url>

set -uo pipefail

TARGET_URL="${1:-}"

if [[ -z "$TARGET_URL" ]]; then
  echo "Usage: bash diagnose-social-preview.sh <url>"
  exit 1
fi

RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
CYAN='\033[0;36m'
BOLD='\033[1m'
RESET='\033[0m'

PASS=0
WARN=0
FAIL=0

pass()    { echo -e "  ${GREEN}[PASS]${RESET} $1"; PASS=$((PASS+1)); }
warn()    { echo -e "  ${YELLOW}[WARN]${RESET} $1"; WARN=$((WARN+1)); }
fail()    { echo -e "  ${RED}[FAIL]${RESET} $1"; FAIL=$((FAIL+1)); }
info()    { echo -e "  ${CYAN}[INFO]${RESET} $1"; }
section() { echo -e "\n${BOLD}${CYAN}=== $1 ===${RESET}"; }
fix()     { echo -e "       ${BOLD}FIX :${RESET} $1"; }

WA_UA="WhatsApp/2.23.24.82 A"
FB_UA="facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"
LI_UA="LinkedInBot/1.0 (compatible; Mozilla/5.0; Apache-HttpClient +http://www.linkedin.com)"
TW_UA="Twitterbot/1.0"
FACEBOT_UA="Facebot"

TMP=$(mktemp -d)
trap 'rm -rf "$TMP"' EXIT

echo -e "\n${BOLD}=============================================="
echo -e "  Social Preview Diagnostic Report"
echo -e "  $TARGET_URL"
echo -e "==============================================${RESET}"

# ==============================================================================
section "1. HTTPS"
# ==============================================================================

if [[ "$TARGET_URL" =~ ^https:// ]]; then
  pass "URL uses HTTPS"
else
  fail "URL does not use HTTPS"
  fix "Redirect all traffic to HTTPS. WhatsApp will not render previews over plain HTTP."
fi

# ==============================================================================
section "2. Redirect Chain"
# ==============================================================================

info "Tracing redirect chain as WhatsApp crawler..."
# -L is required: without it curl stops at the first response and --max-redirs has no effect
CHAIN_OUTPUT=$(curl -sIL \
  -A "$WA_UA" \
  --max-redirs 10 \
  --max-time 20 \
  -w "\nFINAL_HTTP:%{http_code}\nFINAL_URL:%{url_effective}\nTOTAL_TIME:%{time_total}" \
  "$TARGET_URL" 2>/dev/null) || CHAIN_OUTPUT=""

HOP=0
while IFS= read -r line; do
  line="${line%$'\r'}"   # header lines end in CRLF; strip the CR so the matches below work
  if [[ "$line" =~ ^HTTP/ ]]; then
    CODE=$(echo "$line" | awk '{print $2}')
    if [[ "$CODE" =~ ^(301|302|303|307|308)$ ]]; then
      HOP=$((HOP+1))
      info "Hop $HOP: HTTP $CODE"
    fi
  elif [[ "$line" =~ ^[Ll]ocation: ]]; then
    info "  -> $(echo "$line" | cut -d' ' -f2-)"
  fi
done <<< "$CHAIN_OUTPUT"

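TOTAL_TIME=$(echo "$CHAIN_OUTPUT" | grep "^TOTAL_TIME:" | cut -d: -f2 | tr -d '\r' || true)
TOTAL_TIME="${TOTAL_TIME:-20}"   # no timing line means curl failed; treat it as hitting the 20s limit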

if (( HOP == 0 )); then
  pass "No redirects. Direct 200 response to WhatsApp crawler."
elif (( HOP <= 2 )); then
  warn "$HOP redirect(s) detected. Each hop burns time against WhatsApp's timeout budget."
  fix "Reduce to zero hops. Update og:url to the canonical final URL."
else
  fail "$HOP redirect(s) detected. This many hops will cause WhatsApp crawler timeouts."
  fix "Fix the chain. $HOP hops at 200ms each burns $((HOP * 200))ms before the page loads."
fi

# ==============================================================================
section "3. Cloudflare Detection"
# ==============================================================================

HEADERS=$(curl -sI -A "$WA_UA" -L --max-time 10 "$TARGET_URL" 2>/dev/null) || HEADERS=""
CF_RAY=$(echo "$HEADERS" | grep -i "^cf-ray:" | head -1 || true)
# With -L the headers of every hop are present; the last cf-cache-status belongs to the final page
CF_CACHE=$(echo "$HEADERS" | grep -i "^cf-cache-status:" | tail -1 | cut -d: -f2- | tr -d ' \r\n' || true)

if [[ -n "$CF_RAY" ]]; then
  info "Site is behind Cloudflare (cf-ray header present)"
  if [[ -n "$CF_CACHE" ]]; then
    info "CF-Cache-Status: $CF_CACHE"
    if [[ "$CF_CACHE" == "HIT" ]]; then
      pass "Cloudflare is serving this URL from cache. Crawlers will get a fast cached response."
    elif [[ "$CF_CACHE" == "MISS" || "$CF_CACHE" == "EXPIRED" || "$CF_CACHE" == "DYNAMIC" ]]; then
      warn "Cloudflare cache is $CF_CACHE. Crawler hit origin directly. Check caching rules."
      fix "Ensure Cloudflare caches HTML pages. A cache miss on the crawler path causes slow TTFB."
    elif [[ "$CF_CACHE" == "BYPASS" ]]; then
      fail "Cloudflare cache is BYPASS. Every crawler request hits your origin server."
      fix "Remove the Cache-Control: no-store or Cache bypass rule for this URL path."
    else
      info "CF-Cache-Status: $CF_CACHE"
    fi
  fi
else
  info "No Cloudflare detected (no cf-ray header). Skipping Cloudflare-specific checks."
fi

# ==============================================================================
section "4. Meta Refresh Redirects"
# ==============================================================================

HTML=$(curl -s -A "$WA_UA" -L --max-time 15 "$TARGET_URL" 2>/dev/null) || HTML=""
HTML_LEN=${#HTML}

if (( HTML_LEN > 10000 )); then
  pass "HTML response received (${HTML_LEN} bytes). Page is rendering content."
elif (( HTML_LEN > 0 )); then
  warn "HTML response is only ${HTML_LEN} bytes. Page may be returning a stub or error page."
else
  fail "Empty HTML response. Crawler would find no OG tags at all."
fi

# Match only http-equiv="refresh" itself, not any later "refresh" text on the same line
META_REFRESH=$(echo "$HTML" | grep -oiE "<meta[^>]*http-equiv=[\"']?refresh[\"']?[^>]*>" | head -3 || true)
if [[ -n "$META_REFRESH" ]]; then
  fail "Meta refresh redirect found: $META_REFRESH"
  fix "Remove meta refresh tags. Social crawlers do not execute them and will read the pre-redirect page."
else
  pass "No meta refresh redirects in HTML"
fi

# ==============================================================================
section "5. Response Time and TTFB"
# ==============================================================================

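TTFB=$(curl -o /dev/null -s \
  -w "%{time_starttransfer}" \
  -A "$WA_UA" -L --max-time 15 \
  "$TARGET_URL" 2>/dev/null) || TTFB="15"   # a failed or timed-out request counts as the full 15s

# Convert seconds to whole milliseconds with awk rather than bc, which is not installed everywhere
TOTAL_MS=$(awk -v t="$TOTAL_TIME" 'BEGIN { printf "%d", t * 1000 }')
TTFB_MS=$(awk -v t="$TTFB" 'BEGIN { printf "%d", t * 1000 }')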

info "Total response time: ${TOTAL_TIME}s"
info "Time to first byte:  ${TTFB}s"

if (( TTFB_MS < 800 )); then
  pass "TTFB ${TTFB}s is under 800ms. Well within WhatsApp crawler tolerance."
elif (( TTFB_MS < 2000 )); then
  warn "TTFB ${TTFB}s is slow. WhatsApp crawler may timeout on cold cache misses."
  fix "Enable full-page caching. Target TTFB under 800ms."
else
  fail "TTFB ${TTFB}s is too slow. WhatsApp crawler will timeout before reading any OG tags."
  fix "Enable aggressive caching. Origin is too slow for social crawlers."
fi

if (( TOTAL_MS < 3000 )); then
  pass "Total response time ${TOTAL_TIME}s is under 3s. Fine for all crawlers."
elif (( TOTAL_MS < 5000 )); then
  warn "Total response time ${TOTAL_TIME}s is borderline. WhatsApp may timeout on slow connections."
else
  fail "Total response time ${TOTAL_TIME}s exceeds 5s. Crawler will timeout before reading OG tags."
fi

# ==============================================================================
section "6. Open Graph Tags"
# ==============================================================================

check_og() {
  local prop="$1"
  local attr="${2:-property}"   # Open Graph uses property=, Twitter Card uses name=
  local val=""
  val=$(echo "$HTML" | grep -oi "${attr}=['\"]${prop}['\"][^>]*content=['\"][^'\"]*['\"]" \
    | grep -oi "content=['\"][^'\"]*['\"]" \
    | sed "s/content=['\"]//;s/['\"]$//" | head -1 || true)
  if [[ -z "$val" ]]; then
    # Same lookup for tags that put content= before property=/name=
    val=$(echo "$HTML" | grep -oi "content=['\"][^'\"]*['\"][^>]*${attr}=['\"]${prop}['\"]" \
      | grep -oi "content=['\"][^'\"]*['\"]" \
      | sed "s/content=['\"]//;s/['\"]$//" | head -1 || true)
  fi
  echo "$val"
}

OG_TITLE=$(check_og "og:title")
OG_DESC=$(check_og "og:description")
OG_IMAGE=$(check_og "og:image")
OG_URL=$(check_og "og:url")
OG_TYPE=$(check_og "og:type")
# twitter:card uses name= rather than property=, so reuse check_og with that attribute
TW_CARD=$(check_og "twitter:card" "name")

if [[ -n "$OG_TITLE" ]]; then
  pass "og:title found: ${OG_TITLE:0:80}"
else
  fail "og:title missing. All platforms require this tag."
  fix 'Add <meta property="og:title" content="Your Title" /> to the page head.'
fi

if [[ -n "$OG_DESC" ]]; then
  pass "og:description found"
else
  fail "og:description missing. Preview cards will show no description."
  fix 'Add <meta property="og:description" content="..." /> to the page head.'
fi

if [[ -n "$OG_IMAGE" ]]; then
  pass "og:image found: ${OG_IMAGE:0:80}"
else
  fail "og:image missing. No thumbnail will appear on any platform."
  fix "Set a featured image in WordPress. Your SEO plugin generates og:image from it."
fi

if [[ -n "$OG_URL" ]]; then
  pass "og:url found: ${OG_URL:0:80}"
else
  warn "og:url missing. Platforms may cache the wrong canonical URL."
  fix 'Add <meta property="og:url" content="https://yoursite.com/canonical-url" />.'
fi

if [[ -n "$OG_TYPE" ]]; then
  pass "og:type found: $OG_TYPE"
else
  warn "og:type missing. Defaults to website but best practice is to set it explicitly."
  fix 'Add <meta property="og:type" content="article" />.'
fi

if [[ -n "$TW_CARD" ]]; then
  pass "twitter:card found: $TW_CARD. X/Twitter will render large image format."
else
  warn "twitter:card missing. X/Twitter will fall back to a small summary card with no prominent image."
  fix 'Add <meta name="twitter:card" content="summary_large_image" />.'
fi

# ==============================================================================
section "7. og:image Validation"
# ==============================================================================

if [[ -z "$OG_IMAGE" ]]; then
  warn "Skipping og:image validation because the tag is missing (see Section 6)."
else
  IMG_FILE="$TMP/og_image"

  IMG_HTTP=$(curl -sI -A "$WA_UA" -L --max-time 15 "$OG_IMAGE" 2>/dev/null \
    | grep -i "^HTTP/" | tail -1 | awk '{print $2}') || IMG_HTTP=""

  IMG_MIME=$(curl -sI -A "$WA_UA" -L --max-time 15 "$OG_IMAGE" 2>/dev/null \
    | grep -i "^content-type:" | tail -1 | cut -d: -f2- | tr -d ' \r\n') || IMG_MIME=""

  if [[ "$IMG_HTTP" == "200" ]]; then
    pass "og:image URL is accessible. Returns HTTP 200 to WhatsApp crawler UA."
  else
    fail "og:image URL returns HTTP ${IMG_HTTP:-unknown} to WhatsApp crawler. Thumbnail will not load."
    fix "Check S3 bucket policy, CDN rules, or Cloudflare firewall on the image URL itself."
  fi

  if [[ "$OG_IMAGE" =~ ^https:// ]]; then
    pass "og:image uses HTTPS"
  else
    fail "og:image uses HTTP. WhatsApp will not load mixed-content images."
    fix "Update the og:image URL to HTTPS."
  fi

  if [[ "$IMG_MIME" =~ image/(jpeg|png|gif|webp) ]]; then
    pass "og:image MIME type is correct: $IMG_MIME"
  else
    warn "og:image MIME type is unexpected: ${IMG_MIME:-unknown}. Expected image/jpeg or image/png."
    fix "Ensure the image is served with a correct Content-Type header."
  fi

  # -f makes curl fail on HTTP errors instead of saving an error page as the "image"
  if curl -fsL -A "$WA_UA" --max-time 20 -o "$IMG_FILE" "$OG_IMAGE" 2>/dev/null && [[ -s "$IMG_FILE" ]]; then
    IMG_BYTES=$(wc -c < "$IMG_FILE" | tr -d ' ')
    IMG_KB=$((IMG_BYTES / 1024))

    info "og:image file size: ${IMG_KB}KB (WhatsApp hard limit is ~300KB; Facebook/LinkedIn allow up to 5-8MB)"

    if (( IMG_KB < 200 )); then
      pass "og:image is ${IMG_KB}KB. Well under the WhatsApp 300KB limit. Will render on all platforms."
    elif (( IMG_KB < 300 )); then
      warn "og:image is ${IMG_KB}KB. Approaching the WhatsApp 300KB hard limit. Compress now to be safe."
      fix "Run: convert input.jpg -resize 1200x630 -quality 80 -strip output.jpg"
    else
      fail "og:image is ${IMG_KB}KB and exceeds the WhatsApp 300KB hard limit. Thumbnail will NOT appear on WhatsApp even though it works on LinkedIn and Facebook."
      fix "Run: convert input.jpg -resize 1200x630 -quality 75 -strip output.jpg"
    fi

    if command -v identify &>/dev/null; then
      DIMS=$(identify -format "%wx%h" "${IMG_FILE}[0]" 2>/dev/null || true)   # [0]: first frame only, so animated GIFs report one size
      W=$(echo "$DIMS" | cut -dx -f1)
      H=$(echo "$DIMS" | cut -dx -f2)
      info "og:image dimensions: ${W}x${H}px (recommended minimum is 1200x630px)"
      if (( W >= 1200 && H >= 630 )); then
        pass "og:image dimensions ${W}x${H}px meet the 1200x630 minimum for all platforms."
      else
        warn "og:image dimensions ${W}x${H}px are below the recommended 1200x630px. Some platforms may show a small or cropped thumbnail."
        fix "Resize: convert input.jpg -resize 1200x630^ -gravity center -extent 1200x630 output.jpg"
      fi
    else
      warn "ImageMagick not installed so image dimensions could not be checked. Install: brew install imagemagick"
    fi
  fi
fi

# ==============================================================================
section "8. Content-Security-Policy"
# ==============================================================================

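CSP=$(curl -sI -A "$WA_UA" -L --max-time 10 "$TARGET_URL" 2>/dev/null \
  | grep -i "^content-security-policy:" | tail -1 | cut -d: -f2- | tr -d '\r') || CSP=""

if [[ -z "$CSP" ]]; then
  pass "No Content-Security-Policy header present. No CSP restrictions to worry about."
elif [[ -z "$OG_IMAGE" ]]; then
  info "CSP header found but og:image is missing, so there is no image domain to check."
else
  IMG_DOMAIN=$(echo "$OG_IMAGE" | grep -oE 'https?://[^/]+' | head -1 || true)
  PAGE_ORIGIN=$(echo "$TARGET_URL" | grep -oE 'https?://[^/]+' | head -1 || true)
  # Browsers fall back to default-src for images when img-src is not set
  IMG_SRC=$(echo "$CSP" | tr ';' '\n' | grep -iE "^[[:space:]]*img-src" | head -1 || true)
  if [[ -z "$IMG_SRC" ]]; then
    IMG_SRC=$(echo "$CSP" | tr ';' '\n' | grep -iE "^[[:space:]]*default-src" | head -1 || true)
  fi
  info "CSP header found. Checking img-src for og:image domain: $IMG_DOMAIN"
  if [[ -z "$IMG_SRC" ]]; then
    pass "CSP does not restrict img-src. No blocking."
  else
    CSP_OK=0
    # read -a splits the source list on spaces without glob-expanding a bare '*'
    read -ra CSP_SOURCES <<< "$IMG_SRC"
    for SRC in "${CSP_SOURCES[@]}"; do
      case "$SRC" in
        "*"|"https:"|"$IMG_DOMAIN"|"$IMG_DOMAIN/"|"${IMG_DOMAIN#*://}") CSP_OK=1 ;;
        "'self'") [[ "$IMG_DOMAIN" == "$PAGE_ORIGIN" ]] && CSP_OK=1 ;;
      esac
    done
    if (( CSP_OK )); then
      pass "CSP img-src allows the og:image domain. No blocking."
    else
      warn "CSP img-src may be blocking your og:image domain ($IMG_DOMAIN). Preview image may not render."
      fix "Add $IMG_DOMAIN to your CSP img-src directive."
    fi
  fi
fi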

# ==============================================================================
section "9. robots.txt"
# ==============================================================================

ROBOTS_URL=$(echo "$TARGET_URL" | grep -oE 'https?://[^/]+')/robots.txt
info "Fetching $ROBOTS_URL"
ROBOTS=$(curl -s -L --max-time 10 "$ROBOTS_URL" 2>/dev/null | tr -d '\r') || ROBOTS=""

for BOT in "facebookexternalhit" "WhatsApp" "Facebot" "LinkedInBot" "Twitterbot"; do
  # An empty "Disallow:" means allow everything, so only count Disallow lines that name a path
  BLOCK=$(echo "$ROBOTS" | grep -A5 -iE "^User-agent:[[:space:]]*$BOT" \
    | grep -iE "^[[:space:]]*Disallow:[[:space:]]*[^[:space:]]" | head -1 || true)
  if [[ -n "$BLOCK" ]]; then
    fail "robots.txt is blocking $BOT ($BLOCK). This crawler cannot fetch your page."
    fix "Remove the Disallow rule for $BOT or add Allow: / under its User-agent block."
  else
    pass "robots.txt does not block $BOT"
  fi
done

# ==============================================================================
section "10. Crawler User Agent Tests"
# ==============================================================================

info "Testing each crawler UA for HTTP 200 and absence of Cloudflare challenge page..."

check_ua() {
  local name="$1"
  local ua="$2"
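  local response="" full_html="" cf_challenge=""
  local hdr_file="$TMP/ua_headers"

  # Single request: headers go to a file, the body is captured, and -w appends the final status code
  rm -f "$hdr_file"
  full_html=$(curl -s -D "$hdr_file" -w "\n%{http_code}" -A "$ua" -L --max-time 15 "$TARGET_URL" 2>/dev/null) || true
  response="${full_html##*$'\n'}"
  full_html="${full_html%$'\n'*}"
  [[ "$response" =~ ^[0-9]{3}$ ]] || response="000"

  # Cloudflare marks challenge responses with "cf-mitigated: challenge"; the body markers are a fallback.
  # Generic words such as "challenge" are no longer matched because ordinary posts can contain them.
  if grep -qi "^cf-mitigated:[[:space:]]*challenge" "$hdr_file" 2>/dev/null; then
    cf_challenge="cf-mitigated: challenge"
  else
    cf_challenge=$(echo "$full_html" | grep -oi "cf-browser-verification\|cf_chl_opt\|<title>just a moment" | head -1 || true)
  fi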

  if [[ "$response" == "200" && -z "$cf_challenge" ]]; then
    pass "$name: HTTP $response and no challenge page. Crawler can read your OG tags."
  elif [[ -n "$cf_challenge" ]]; then
    fail "$name: HTTP $response but Cloudflare returned a challenge page. Crawler sees no OG tags and will cache a blank preview."
    fix "Create a Cloudflare WAF Skip rule for this UA. See Section 6 of the blog post."
  else
    fail "$name: HTTP $response. Crawler is being blocked."
    fix "Check WAF, firewall, or bot protection rules blocking $name."
  fi
}

check_ua "WhatsApp"            "$WA_UA"
check_ua "facebookexternalhit" "$FB_UA"
check_ua "Facebot"             "$FACEBOT_UA"
check_ua "LinkedInBot"         "$LI_UA"
check_ua "Twitterbot"          "$TW_UA"

# ==============================================================================
echo -e "\n${BOLD}=============================================="
echo -e "  Summary"
echo -e "==============================================${RESET}"
echo -e "  ${GREEN}PASS: $PASS${RESET}   ${YELLOW}WARN: $WARN${RESET}   ${RED}FAIL: $FAIL${RESET}"

if (( FAIL == 0 && WARN == 0 )); then
  echo -e "\n  ${GREEN}${BOLD}All checks passed. Social previews should render correctly on all platforms.${RESET}"
elif (( FAIL == 0 )); then
  echo -e "\n  ${YELLOW}${BOLD}No hard failures but $WARN warning(s) found. Review WARN items above.${RESET}"
else
  echo -e "\n  ${RED}${BOLD}$FAIL failure(s) detected. Fix the FAIL items above before sharing links.${RESET}"
fi
echo ""
EOF

chmod +x diagnose-social-preview.sh
echo "Written: diagnose-social-preview.sh"

5. Understanding the Report: Known Issue Patterns

Each numbered section below corresponds to a failure pattern the script detects. When you see a FAIL or WARN, this is what it means and exactly what to do.

5.1 Image Over 300 KB (WhatsApp Silent Failure)

The script reports [FAIL] og:image is 412KB and exceeds the WhatsApp 300KB hard limit. WhatsApp silently drops the thumbnail if the og:image file exceeds roughly 300 KB, and there is no error in your logs, no HTTP error code, and no indication in Cloudflare analytics. The preview simply renders without an image. WhatsApp also caches that failure, so users who share the link before you fix the image will continue to see a bare URL until WhatsApp's cache expires, which is typically around 7 days and not under your control. This is the single most common cause of missing WhatsApp thumbnails. Facebook supports images up to 8 MB and LinkedIn up to 5 MB, so developers publishing a large hero image have no idea anything is wrong until they test specifically on WhatsApp.

The fix is to compress the image to under 250 KB to leave a safe margin. At 1200×630 pixels, JPEG quality 80 will almost always achieve this. After recompressing, force a cache refresh using the Facebook Sharing Debugger and then retest with diagnose-social-preview.sh.

convert input.jpg -resize 1200x630 -quality 80 -strip output.jpg
jpegoptim --size=250k --strip-all image.jpg
cwebp -q 80 input.png -o output.webp
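Once one oversized hero image turns up, others probably exist too. A quick way to find them before anyone shares those posts is to scan the uploads directory with find. The sketch below assumes WordPress's standard wp-content/uploads layout, and oversized_images is a hypothetical name. It lists every large file, including full-size originals your theme may never use as og:image, so treat the output as a shortlist to inspect rather than a list of failures.

```shell
#!/usr/bin/env bash
# oversized_images (hypothetical helper): list JPEG/PNG/WebP files larger than a
# threshold in KB (default 300, the WhatsApp limit discussed above).
oversized_images() {
  local dir="$1" kb="${2:-300}"
  # -size +Nk matches files whose size, rounded up to whole KiB, exceeds N
  find "$dir" -type f \( -iname '*.jpg' -o -iname '*.jpeg' -o -iname '*.png' -o -iname '*.webp' \) \
    -size +"${kb}"k -print | sort
}
```

For example, oversized_images /var/www/html/wp-content/uploads 250 uses the 250 KB safety margin recommended above.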

5.2 Cloudflare Blocking Meta Crawlers (Super Bot Fight Mode)

The script reports [FAIL] facebookexternalhit: HTTP 403 but Cloudflare returned a challenge page. This is the second most common failure on WordPress sites behind Cloudflare. Cloudflare's Super Bot Fight Mode classifies facebookexternalhit as an automated bot and serves it a JavaScript challenge instead of your page. The crawler cannot solve the challenge, so it parses the challenge page's HTML, finds no OG tags, and caches a blank preview. This is particularly insidious because the page works in every browser you test with and the blocked request never reaches your origin, so nothing looks wrong from your server's point of view. A single WhatsApp link preview can trigger requests from three distinct Meta crawler user agents, specifically WhatsApp/2.x, facebookexternalhit/1.1, and Facebot, and all three must be allowed. If any one is challenged, previews fail intermittently depending on which crawler fires first. The fix is to create a Cloudflare WAF Custom Rule as described in Section 6.
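To check a crawler user agent by hand, look at the response headers rather than the body. Cloudflare documents a cf-mitigated: challenge response header on challenge pages, which is easier to test for than scraping the page text. The function below is a sketch built on that header; challenge_status is a hypothetical name and it reads response headers on stdin.

```shell
#!/usr/bin/env bash
# challenge_status (hypothetical helper): read HTTP response headers on stdin and report
# whether Cloudflare answered with a challenge instead of the real page.
challenge_status() {
  local headers
  # curl prints CRLF line endings; header names are case-insensitive
  headers=$(tr -d '\r')
  if grep -qiE '^cf-mitigated:[[:space:]]*challenge[[:space:]]*$' <<< "$headers"; then
    echo "challenge"
  else
    echo "clear"
  fi
}
```

For example, curl -sI -A "Facebot" https://yoursite.com/your-post/ | challenge_status should print clear for each of the three Meta user agents once the skip rule is in place.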

5.3 Slow TTFB Causing Crawler Timeout

The script reports [FAIL] TTFB 4.2s is too slow. WhatsApp crawler will timeout before reading any OG tags. WhatsApp's crawler has an aggressive HTTP timeout, and if your origin takes more than a few seconds to deliver the first bytes of HTML, the crawl times out before any OG tags are read. This is most common on cold start servers, on WordPress sites with no page cache where every crawler request hits the database, and on servers under load where the crawler request queues behind real user traffic. Your CDN cache may be serving humans fine while every crawler request is a cache miss, because crawlers send unique user agent strings that your cache rules do not recognise. Ensure your page cache serves all user agent strings and not just browser user agents, and in Cloudflare verify that your cache rules are not excluding non-browser UAs. The target is a TTFB under 800ms.
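One way to confirm that crawlers are missing a cache that humans hit is to request the same URL with a browser user agent and with crawler user agents, then compare TTFB and the CF-Cache-Status header for each. The sketch below assumes the site is behind Cloudflare (other CDNs use different cache headers); ttfb_by_ua is a hypothetical name and the Chrome user agent string is only an example.

```shell
#!/usr/bin/env bash
# ttfb_by_ua (hypothetical helper): for each user agent, print TTFB in seconds and the
# CF-Cache-Status header so browser and crawler cache behaviour can be compared.
ttfb_by_ua() {
  local url="$1" ua out ttfb cache
  for ua in \
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0 Safari/537.36" \
    "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)" \
    "WhatsApp/2.23.24.82 A"; do
    # -D - sends headers to stdout, the body goes to /dev/null, -w appends the TTFB
    out=$(curl -s -L -o /dev/null -D - -A "$ua" --max-time 15 \
      -w 'TTFB:%{time_starttransfer}\n' "$url" 2>/dev/null || true)
    ttfb=$(grep '^TTFB:' <<< "$out" | cut -d: -f2 || true)
    # With -L every hop's headers are present; the last cf-cache-status is the final page's
    cache=$(tr -d '\r' <<< "$out" | grep -i '^cf-cache-status:' | tail -1 | cut -d: -f2- | tr -d ' ' || true)
    printf '%-22s TTFB=%s cache=%s\n' "${ua%% *}" "${ttfb:-n/a}" "${cache:-n/a}"
  done
}
```

If the browser line shows HIT while the crawler lines show MISS or DYNAMIC with a much larger TTFB, the cache rules are treating crawler user agents differently.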

5.4 Redirect Chain

The script reports [FAIL] 4 redirect(s) detected. This many hops will cause WhatsApp crawler timeouts. Each redirect hop consumes time against WhatsApp's timeout budget, and a chain of four hops at 200ms each costs 800ms before the origin even begins delivering HTML. Common causes include an HTTP to HTTPS redirect followed by a www to non-www redirect followed by a trailing slash normalisation redirect, old permalink structures redirecting to new ones, and canonical URL enforcement with multiple intermediate redirects. The goal is zero redirects for the canonical URL, and your og:url tag should match the exact final URL so that no redirect sits between them.

5.5 CSP Blocking the Image URL

The script reports [WARN] CSP img-src may be blocking your og:image domain (https://cdn.example.com). A Content-Security-Policy header with a restrictive img-src directive can interfere with WhatsApp's internal image rendering pipeline in certain client versions. If your CSP blocks the image URL in the browser context used for preview rendering, the preview will show the title and description but not the image. Add your image CDN domain to the img-src directive:

Content-Security-Policy: img-src 'self' https://your-cdn-domain.com https://s3.af-south-1.amazonaws.com;
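When checking a policy by hand, remember that browsers fall back to default-src when img-src is absent, so the directive that actually governs images may not be the one named img-src. The sketch below (csp_img_sources is a hypothetical name) prints the sources that apply to images for a given header value.

```shell
#!/usr/bin/env bash
# csp_img_sources (hypothetical helper): print the source list that governs images for a
# Content-Security-Policy value, falling back to default-src when img-src is absent.
# Prints nothing when neither directive is present, i.e. images are unrestricted.
csp_img_sources() {
  local policy="$1" directive name sources fallback=""
  local -a directives
  # read -a splits on ';' without glob-expanding sources such as '*'
  IFS=';' read -ra directives <<< "$policy"
  for directive in "${directives[@]}"; do
    read -r name sources <<< "$directive"   # trims whitespace, splits name from sources
    case "$(tr '[:upper:]' '[:lower:]' <<< "$name")" in
      img-src) echo "$sources"; return 0 ;;
      default-src) fallback="$sources" ;;
    esac
  done
  if [[ -n "$fallback" ]]; then
    echo "$fallback"
  fi
}
```

For example, csp_img_sources "default-src 'self'; script-src 'none'" prints 'self', which means an og:image on a separate CDN domain would not be allowed by that policy.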

5.6 Meta Refresh Redirect

The script reports [FAIL] Meta refresh redirect found, followed by the offending tag. Meta refresh tags are HTML-level redirects that social crawlers do not execute. The crawler reads the page at the original URL, ignores the meta refresh, and extracts OG tags from the pre-redirect page. If that page has no OG tags, the preview is blank. This appears in some WordPress themes, landing page plugins, and maintenance mode plugins. Replace meta refresh redirects with proper HTTP 301 or 302 redirects at the server or Cloudflare redirect rule level.
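To recreate a meta refresh as a proper HTTP redirect you first need its target. A sketch that pulls the url= part out of the tag's content attribute, assuming the common content="N; url=..." form (meta_refresh_target is a hypothetical name):

```shell
#!/usr/bin/env bash
# meta_refresh_target (hypothetical helper): read HTML on stdin and print the target URL
# of each <meta http-equiv="refresh"> tag, e.g. content="0; url=https://example.com/new/".
meta_refresh_target() {
  grep -oiE "<meta[^>]*http-equiv=[\"']?refresh[\"']?[^>]*>" \
    | sed -nE "s/.*content=[\"'][^\"']*[Uu][Rr][Ll][[:space:]]*=[[:space:]]*[\"']?([^\"' >]+).*/\1/p"
}
```

For example, curl -s https://yoursite.com/old-page/ | meta_refresh_target prints the URL to use as the destination of the replacement 301.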

6. The Cloudflare WAF Skip Rule

If the diagnostic script detects a Cloudflare challenge page for any Meta crawler user agent, this is exactly how to fix it. Navigate to your Cloudflare dashboard, select your domain, and go to Security then WAF then Custom Rules and click Create rule. Set the rule name to WhatsappThumbnail, switch to Edit expression mode, and paste the following expression:

(http.user_agent contains "WhatsApp") or
(http.user_agent contains "facebookexternalhit") or
(http.user_agent contains "Facebot")

Set the action to Skip. Under WAF components to skip, enable all rate limiting rules, all managed rules, and all Super Bot Fight Mode rules, but leave all remaining custom rules unchecked. Leaving the custom rules in force means your Fail2ban IP block list still applies to these user agents, so an attacker spoofing a Meta user agent cannot bypass your IP blocklist while legitimate Meta crawlers get through. Turn Log matching requests off, because these are high-frequency crawls and logging every one will consume your event quota quickly.

Screenshot: Cloudflare WAF blocking social media crawlers from accessing website content

On rule priority, ensure this rule sits below your Fail2ban Block List rule because Cloudflare evaluates WAF rules top to bottom and the IP blocklist must fire first. The reason all three user agents are required is that a single WhatsApp link preview can trigger requests from each of them independently and if any one is missing from the skip rule, previews will fail intermittently.
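If you manage Cloudflare through its API rather than the dashboard, the same rule can be expressed as a ruleset rule. The sketch below only builds the JSON body. The phase names (http_ratelimit, http_request_firewall_managed, http_request_sbfm) and the endpoint in the usage note reflect Cloudflare's Rulesets API as assumed here, so verify them against Cloudflare's API reference before relying on them; skip_rule_payload is a hypothetical name.

```shell
#!/usr/bin/env bash
# skip_rule_payload (hypothetical helper): print the JSON body for a custom rule that skips
# rate limiting, managed rules and Super Bot Fight Mode for the three Meta crawlers, with
# logging off. Remaining custom rules are NOT skipped, so an IP block list rule still applies.
# Phase names are assumptions to check against Cloudflare's documentation.
skip_rule_payload() {
  cat <<'JSON'
{
  "description": "WhatsappThumbnail",
  "expression": "(http.user_agent contains \"WhatsApp\") or (http.user_agent contains \"facebookexternalhit\") or (http.user_agent contains \"Facebot\")",
  "action": "skip",
  "action_parameters": {
    "phases": ["http_ratelimit", "http_request_firewall_managed", "http_request_sbfm"]
  },
  "logging": { "enabled": false }
}
JSON
}
```

You would then POST the body to the zone's custom rules ruleset, for example curl -X POST "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/rulesets/$RULESET_ID/rules" -H "Authorization: Bearer $CF_API_TOKEN" -H "Content-Type: application/json" --data "$(skip_rule_payload)", where ZONE_ID, RULESET_ID (the http_request_firewall_custom entry point ruleset) and CF_API_TOKEN are placeholders. Cloudflare appends new rules to the end of a ruleset by default, which keeps this rule below an existing block list rule.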

7. WordPress Specific: Posts with Missing Featured Images

If you are running WordPress and the diagnostic script is passing all checks but some posts still have no og:image, the likely cause is that those posts have no featured image set. Most WordPress SEO plugins generate the og:image tag from the featured image and if it is not set, there is no tag. This script SSHs into your WordPress server and audits which published posts are missing a featured image. Update the four variables at the top before running, then run it as bash audit-wp-og.sh audit or bash audit-wp-og.sh fix <post-id>.

cat > audit-wp-og.sh << 'EOF'
#!/usr/bin/env bash
# audit-wp-og.sh
# Usage: bash audit-wp-og.sh audit|fix [post-id]
# Audits WordPress posts for missing og:image via WP-CLI on remote EC2.
#
# Update the four variables below before running.

set -euo pipefail

MODE="${1:-audit}"
SPECIFIC_POST="${2:-}"

EC2_HOST="ec2-user@your-ec2-host"   # placeholder: SSH user@host of your WordPress server
SSH_KEY="$HOME/.ssh/your-key.pem"
WP_PATH="/var/www/html"
SITE_URL="https://yoursite.com"

RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
CYAN='\033[0;36m'
BOLD='\033[1m'
RESET='\033[0m'

echo -e "\n${BOLD}${CYAN}WordPress OG Tag Auditor${RESET}"
echo -e "Mode: ${BOLD}$MODE${RESET}\n"

if [[ "$MODE" == "audit" ]]; then
  echo -e "${YELLOW}Fetching published posts with no featured image...${RESET}\n"

  # Unquoted heredoc so $WP_PATH is expanded locally before the commands run on the server
  ssh -i "$SSH_KEY" -o StrictHostKeyChecking=no "$EC2_HOST" bash <<REMOTE
echo "Posts with no featured image (og:image will be missing for these):"
wp post list \
  --post_type=post \
  --post_status=publish \
  --fields=ID,post_title,post_date \
  --format=table \
  --meta_query='[{"key":"_thumbnail_id","compare":"NOT EXISTS"}]' \
  --path=$WP_PATH \
  --allow-root \
  2>/dev/null || echo "(WP-CLI not available or no posts found)"

echo ""
echo "Total published posts:"
wp post list \
  --post_type=post \
  --post_status=publish \
  --format=count \
  --path=$WP_PATH \
  --allow-root \
  2>/dev/null

echo ""
echo "Posts with featured image set:"
wp post list \
  --post_type=post \
  --post_status=publish \
  --format=count \
  --meta_key=_thumbnail_id \
  --path=$WP_PATH \
  --allow-root \
  2>/dev/null
REMOTE

  echo -e "\n${YELLOW}Spot-checking live og:image tags on recent posts...${RESET}\n"
  WA_UA="WhatsApp/2.23.24.82 A"
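  # || true: under set -euo pipefail a missing sitemap would otherwise abort before the message below
  URLS=$(curl -s "${SITE_URL}/post-sitemap.xml" \
    | grep -oE '<loc>[^<]+</loc>' \
    | sed 's|<loc>||;s|</loc>||' \
    | head -10 || true)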

  if [[ -z "$URLS" ]]; then
    echo -e "${YELLOW}Could not fetch sitemap at ${SITE_URL}/post-sitemap.xml${RESET}"
  else
    printf "%-70s %s\n" "URL" "og:image"
    printf "%-70s %s\n" "---" "--------"
    while IFS= read -r url; do
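      # || true keeps set -euo pipefail from aborting when a page fails to load or has no og:image
      html=$(curl -s -A "$WA_UA" -L --max-time 8 "$url" 2>/dev/null || true)
      og_img=$(echo "$html" \
        | grep -oiE 'property="og:image"[^>]+content="[^"]+"' \
        | grep -oiE 'content="[^"]+"' \
        | sed 's/content="//;s/"//' \
        | head -1 || true)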
      if [[ -n "$og_img" ]]; then
        printf "%-70s ${GREEN}%s${RESET}\n" "$(echo "$url" | sed "s|${SITE_URL}||")" "PRESENT"
      else
        printf "%-70s ${RED}%s${RESET}\n" "$(echo "$url" | sed "s|${SITE_URL}||")" "MISSING"
      fi
    done <<< "$URLS"
  fi

elif [[ "$MODE" == "fix" ]]; then
  if [[ -z "$SPECIFIC_POST" ]]; then
    echo -e "${RED}Provide a post ID: bash audit-wp-og.sh fix <post-id>${RESET}"
    exit 1
  fi

  ssh -i "$SSH_KEY" -o StrictHostKeyChecking=no "$EC2_HOST" bash <<REMOTE
echo "Available media attachments (most recent 10):"
wp post list \
  --post_type=attachment \
  --posts_per_page=10 \
  --fields=ID,post_title,guid \
  --format=table \
  --path=$WP_PATH \
  --allow-root \
  2>/dev/null
REMOTE

  echo -e "\n${YELLOW}To assign a featured image to post $SPECIFIC_POST:${RESET}"
  echo "  ssh -i $SSH_KEY $EC2_HOST \\"
  echo "    wp post meta set $SPECIFIC_POST _thumbnail_id <ATTACHMENT_ID> --path=$WP_PATH --allow-root"
  echo ""
  echo "Then retest: bash diagnose-social-preview.sh ${SITE_URL}/?p=${SPECIFIC_POST}"

else
  echo -e "${RED}Unknown mode: $MODE. Use 'audit' or 'fix'.${RESET}"
  exit 1
fi
EOF

chmod +x audit-wp-og.sh
echo "Written and made executable: audit-wp-og.sh"

8. The Diagnostic Checklist

Before you create a Cloudflare rule or start modifying OG tags, run diagnose-social-preview.sh against your URL. It works through every item below, usually in under 30 seconds, and flags exactly which one is failing. The script verifies that:

- the URL uses HTTPS
- there is no redirect chain, or the chain is two hops or fewer
- there are no meta refresh redirects in the HTML
- TTFB is under 800ms and total response time is under 3s
- og:title, og:description, og:image, og:url, and og:type are all present and non-empty
- twitter:card is present for the X/Twitter large image format
- the og:image URL returns HTTP 200 with the correct MIME type and uses HTTPS
- the og:image file size is under 300 KB
- og:image dimensions are at least 1200×630 px
- CSP img-src does not block the og:image domain
- robots.txt does not disallow facebookexternalhit, WhatsApp, Facebot, LinkedInBot, or Twitterbot
- all five crawler user agents return HTTP 200 with no challenge page detected

The two most common failures on WordPress sites behind Cloudflare are Super Bot Fight Mode blocking facebookexternalhit and an og:image file exceeding 300 KB. Both are invisible in your logs and immediately visible when you run the script.
