Macbook: Fixing the Wireshark Permissions bug “You don’t have permission to capture on that device”

If you see the error “The capture session could not be initiated on the device “en0” (You don’t have permission to capture on that device)” when trying to start a capture in Wireshark, you can try installing ChmodBPF, but I suspect you will need to follow the steps below:

$ whoami
superman
$ cd /dev
/dev $ sudo chown superman:admin bp*
Password:
$ ls -la | grep bp
crw-------   1 cp363412  admin     0x17000000 Jan 13 21:48 bpf0
crw-------   1 cp363412  admin     0x17000001 Jan 14 09:56 bpf1
crw-------   1 cp363412  admin     0x17000002 Jan 13 20:57 bpf2
crw-------   1 cp363412  admin     0x17000003 Jan 13 20:57 bpf3
crw-------   1 cp363412  admin     0x17000004 Jan 13 20:57 bpf4
/dev $
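
To see at a glance whether the chown took effect, a small helper like the one below can report which bpf devices your user can open. This is only a sketch: the function name check_bpf_access and its optional directory argument are my own invention for illustration, not part of Wireshark or ChmodBPF.

```shell
# Report which bpf devices the current user can read and write.
# $1 is the directory to look in (hypothetical parameter, defaults to /dev).
check_bpf_access() {
  dir="${1:-/dev}"
  found=0
  for dev in "$dir"/bpf*; do
    [ -e "$dev" ] || continue          # glob matched nothing
    found=1
    if [ -r "$dev" ] && [ -w "$dev" ]; then
      echo "ok $dev"
    else
      echo "denied $dev"
    fi
  done
  [ "$found" -eq 1 ] || echo "no bpf devices found in $dir"
}

# Usage: check_bpf_access          (checks /dev)
```

If every device shows as "ok" and Wireshark still complains, the problem is probably elsewhere. Also note that, as far as I know, /dev is rebuilt at boot, so you may need to repeat the chown after a restart.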

Macbook: Changing prompt $ information in the mac terminal window

When you open Terminal, the default prompt shows quite a lot of information, which can use up a fair bit of screen real estate.

Last login: Sat Jan 14 11:13:00 on ttys000
cp363412~$ 

Customize the zsh Prompt in Terminal

Typically, the default zsh prompt carries information like the username, machine name, and location starting in the user’s home directory. These details are stored in the zsh shell’s system file at the /etc/zshrc location.

PS1="%n@%m %1~ %# "

In this string of variables:

  • %n is the username of your account. 
  • %m is the machine’s hostname, up to the first dot.
  • %1~ is the last component of the current working directory, with the $HOME directory shown as ~.
  • %# means that the prompt will show # if the shell is running with root (administrator) privileges, or else offers % if it doesn’t.

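Below are a few other options that I have used previously in bash. Note that these backslash escapes are bash PS1 syntax; zsh uses the % sequences shown above instead (for example %m rather than \h):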

\h   The hostname, up to the first . (e.g. andrew) 
\H   The hostname. (e.g. andrew.ninja.com)
\j   The number of jobs currently managed by the shell. 
\l   The basename of the shell's terminal device name. 
\s   The name of the shell, the basename of $0 (the portion following 
      the final slash).
\w   The current working directory. 
\W   The basename of $PWD. 
\!   The history number of this command. 
\#   The command number of this command

To change this, open Terminal, type the following command, and hit Return:

nano ~/.zshrc

The simplest option just displays your login name (in nano, press Ctrl + X, then Y and Return to save and exit):

PROMPT='%n$ '

I prefer to also see the current directory (with the home directory shortened to ~) in the prompt:

PROMPT='%n:%1~$ '

You can pick a font colour from black, white, yellow, green, red, blue, cyan, and magenta. Here’s how to use them:

PROMPT='%F{cyan}%n%f:~$ '
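
Putting the pieces together, here is a sketch of a slightly richer ~/.zshrc fragment. The colour choices and layout are just my assumptions for illustration; the % sequences themselves are standard zsh prompt escapes.

```shell
# ~/.zshrc fragment (zsh prompt syntax)
#   %F{green}%n%f         username in green
#   %F{blue}%2~%f         last two components of the current directory, ~ for $HOME
#   %(?..%F{red}[%?]%f )  exit status of the last command in red, only when it is non-zero
#   %#                    # for root, % otherwise
PROMPT='%F{green}%n%f:%F{blue}%2~%f %(?..%F{red}[%?]%f )%# '
```

After saving, run "source ~/.zshrc" (or open a new Terminal window) to see the result.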

There are more modifications to this, but this is as far as I go 🙂

Mac OS X: Perform basic vulnerability checks with nmap vulners scripts

This is a very short post to help anyone quickly set up vulnerability checking for a site they own (and have permission to scan). I like the vulners scripts as they cover a lot of basic ground quickly with one script.

## First go to your NMAP script directory
$ cd /usr/local/share/nmap/scripts
## Now install vulners
$ git clone https://github.com/vulnersCom/nmap-vulners.git
## Now copy the files up a directory
$ cd nmap-vulners
$ ls
LICENSE				example.png			http-vulners-regex.json		paths_regex_example.png		vulners.nse
README.md			http-vulners-paths.txt		http-vulners-regex.nse		simple_regex_example.png
$ sudo cp *.* ..
## Now update NMAP NSE script database
$ nmap --script-updatedb
## Now run the scripts
$ nmap -sV --script vulners tesla.com
## Now do a wildcard scan
$ nmap --script "http-*" tesla.com

Mac OS X: View the details of a website's supported TLS certificates from terminal

The command below will give you basic information on a website's certificate:

$ curl --insecure -vvI https://andrewbaker.ninja 2>&1 | awk 'BEGIN { cert=0 } /^\* SSL connection/ { cert=1 } /^\*/ { if (cert) print }'
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use http/1.1
* Server certificate:
*  subject: CN=andrewbaker.ninja
*  start date: Nov  4 23:00:13 2022 GMT
*  expire date: Feb  2 23:00:12 2023 GMT
*  issuer: C=US; O=Let's Encrypt; CN=R3
*  SSL certificate verify result: unable to get local issuer certificate (20), continuing anyway.
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* old SSL session ID is stale, removing
* Mark bundle as not supporting multiuse
* Connection #0 to host andrewbaker.ninja left intact
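
If you only care about a few of those fields, the output can be trimmed further. The helper below is a minimal sketch; the name cert_summary is made up, and it assumes curl prints the certificate lines in the "*  field: value" layout shown above, which can differ between curl versions and TLS backends.

```shell
# Read `curl -v` output on stdin and print selected certificate
# fields as field=value lines.
cert_summary() {
  awk '/^\*  (subject|start date|expire date|issuer): / {
    line = substr($0, 4)              # drop the leading "*  "
    i = index(line, ": ")             # split at the first ": "
    print substr(line, 1, i - 1) "=" substr(line, i + 2)
  }'
}

# Usage:
#   curl --insecure -vvI https://andrewbaker.ninja 2>&1 | cert_summary
```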

Nmap provides a simple way to get a list of available ciphers from a host website / server. Additionally, the ssl-enum-ciphers script gives each available cipher a letter grade (A is the strongest), as you can see in the output below. The script ships with current Nmap releases; if yours does not have it, download ssl-enum-ciphers.nse first and run nmap from the same directory as the script:

$ nmap --script ssl-enum-ciphers -p 443 andrewbaker.ninja
Starting Nmap 7.93 ( https://nmap.org ) at 2023-05-11 10:40 SAST
Nmap scan report for andrewbaker.ninja (13.244.140.33)
Host is up (0.051s latency).
rDNS record for 13.244.140.33: ec2-13-244-140-33.af-south-1.compute.amazonaws.com

PORT    STATE SERVICE
443/tcp open  https
| ssl-enum-ciphers:
|   TLSv1.0:
|     ciphers:
|       TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA (secp256r1) - A
|       TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA (secp256r1) - A
|       TLS_RSA_WITH_AES_256_CBC_SHA (rsa 2048) - A
|       TLS_RSA_WITH_AES_128_CBC_SHA (rsa 2048) - A
|       TLS_DHE_RSA_WITH_AES_256_CBC_SHA (dh 2048) - A
|       TLS_DHE_RSA_WITH_AES_128_CBC_SHA (dh 2048) - A
|     compressors:
|       NULL
|     cipher preference: server
|   TLSv1.1:
|     ciphers:
|       TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA (secp256r1) - A
|       TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA (secp256r1) - A
|       TLS_RSA_WITH_AES_256_CBC_SHA (rsa 2048) - A
|       TLS_RSA_WITH_AES_128_CBC_SHA (rsa 2048) - A
|       TLS_DHE_RSA_WITH_AES_256_CBC_SHA (dh 2048) - A
|       TLS_DHE_RSA_WITH_AES_128_CBC_SHA (dh 2048) - A
|     compressors:
|       NULL
|     cipher preference: server
|   TLSv1.2:
|     ciphers:
|       TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (secp256r1) - A
|       TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (secp256r1) - A
|       TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA (secp256r1) - A
|       TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA (secp256r1) - A
|       TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256 (secp256r1) - A
|       TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384 (secp256r1) - A
|       TLS_RSA_WITH_AES_256_GCM_SHA384 (rsa 2048) - A
|       TLS_RSA_WITH_AES_128_GCM_SHA256 (rsa 2048) - A
|       TLS_RSA_WITH_AES_256_CBC_SHA (rsa 2048) - A
|       TLS_RSA_WITH_AES_128_CBC_SHA (rsa 2048) - A
|       TLS_DHE_RSA_WITH_AES_256_GCM_SHA384 (dh 2048) - A
|       TLS_DHE_RSA_WITH_AES_128_GCM_SHA256 (dh 2048) - A
|       TLS_DHE_RSA_WITH_AES_256_CBC_SHA256 (dh 2048) - A
|       TLS_DHE_RSA_WITH_AES_128_CBC_SHA256 (dh 2048) - A
|       TLS_DHE_RSA_WITH_AES_256_CBC_SHA (dh 2048) - A
|       TLS_DHE_RSA_WITH_AES_128_CBC_SHA (dh 2048) - A
|     compressors:
|       NULL
|     cipher preference: server
|   TLSv1.3:
|     ciphers:
|       TLS_AKE_WITH_AES_256_GCM_SHA384 (secp256r1) - A
|       TLS_AKE_WITH_CHACHA20_POLY1305_SHA256 (secp256r1) - A
|       TLS_AKE_WITH_AES_128_GCM_SHA256 (secp256r1) - A
|     cipher preference: server
|_  least strength: A

Nmap done: 1 IP address (1 host up) scanned in 9.61 seconds

Next up (and probably my favourite), sslscan is a really decent tool because it tests connecting with TLS and SSL including obsolete SSL versions. It then reports about the server’s cipher suites and certificate.

$ brew install sslscan
$ sslscan andrewbaker.ninja
Version: 2.0.15
OpenSSL 3.0.7 1 Nov 2022

Connected to 13.244.140.33

Testing SSL server andrewbaker.ninja on port 443 using SNI name andrewbaker.ninja

  SSL/TLS Protocols:
SSLv2     disabled
SSLv3     disabled
TLSv1.0   enabled
TLSv1.1   enabled
TLSv1.2   enabled
TLSv1.3   enabled

  TLS Fallback SCSV:
Server supports TLS Fallback SCSV

  TLS renegotiation:
Secure session renegotiation supported

  TLS Compression:
OpenSSL version does not support compression
Rebuild with zlib1g-dev package for zlib support

  Heartbleed:
TLSv1.3 not vulnerable to heartbleed
TLSv1.2 not vulnerable to heartbleed
TLSv1.1 not vulnerable to heartbleed
TLSv1.0 not vulnerable to heartbleed

  Supported Server Cipher(s):
Preferred TLSv1.3  256 bits  TLS_AES_256_GCM_SHA384        Curve P-256 DHE 256
Accepted  TLSv1.3  256 bits  TLS_CHACHA20_POLY1305_SHA256  Curve P-256 DHE 256
Accepted  TLSv1.3  128 bits  TLS_AES_128_GCM_SHA256        Curve P-256 DHE 256
Preferred TLSv1.2  256 bits  ECDHE-RSA-AES256-GCM-SHA384   Curve P-256 DHE 256
Accepted  TLSv1.2  128 bits  ECDHE-RSA-AES128-GCM-SHA256   Curve P-256 DHE 256
Accepted  TLSv1.2  128 bits  ECDHE-RSA-AES128-SHA          Curve P-256 DHE 256
Accepted  TLSv1.2  256 bits  ECDHE-RSA-AES256-SHA          Curve P-256 DHE 256
Accepted  TLSv1.2  128 bits  ECDHE-RSA-AES128-SHA256       Curve P-256 DHE 256
Accepted  TLSv1.2  256 bits  ECDHE-RSA-AES256-SHA384       Curve P-256 DHE 256
Accepted  TLSv1.2  256 bits  AES256-GCM-SHA384
Accepted  TLSv1.2  128 bits  AES128-GCM-SHA256
Accepted  TLSv1.2  256 bits  AES256-SHA
Accepted  TLSv1.2  128 bits  AES128-SHA
Accepted  TLSv1.2  256 bits  DHE-RSA-AES256-GCM-SHA384     DHE 2048 bits
Accepted  TLSv1.2  128 bits  DHE-RSA-AES128-GCM-SHA256     DHE 2048 bits
Accepted  TLSv1.2  256 bits  DHE-RSA-AES256-SHA256         DHE 2048 bits
Accepted  TLSv1.2  128 bits  DHE-RSA-AES128-SHA256         DHE 2048 bits
Accepted  TLSv1.2  256 bits  DHE-RSA-AES256-SHA            DHE 2048 bits
Accepted  TLSv1.2  128 bits  DHE-RSA-AES128-SHA            DHE 2048 bits
Preferred TLSv1.1  128 bits  ECDHE-RSA-AES128-SHA          Curve P-256 DHE 256
Accepted  TLSv1.1  256 bits  ECDHE-RSA-AES256-SHA          Curve P-256 DHE 256
Accepted  TLSv1.1  256 bits  AES256-SHA
Accepted  TLSv1.1  128 bits  AES128-SHA
Accepted  TLSv1.1  256 bits  DHE-RSA-AES256-SHA            DHE 2048 bits
Accepted  TLSv1.1  128 bits  DHE-RSA-AES128-SHA            DHE 2048 bits
Preferred TLSv1.0  128 bits  ECDHE-RSA-AES128-SHA          Curve P-256 DHE 256
Accepted  TLSv1.0  256 bits  ECDHE-RSA-AES256-SHA          Curve P-256 DHE 256
Accepted  TLSv1.0  256 bits  AES256-SHA
Accepted  TLSv1.0  128 bits  AES128-SHA
Accepted  TLSv1.0  256 bits  DHE-RSA-AES256-SHA            DHE 2048 bits
Accepted  TLSv1.0  128 bits  DHE-RSA-AES128-SHA            DHE 2048 bits

  Server Key Exchange Group(s):
TLSv1.3  128 bits  secp256r1 (NIST P-256)
TLSv1.3  192 bits  secp384r1 (NIST P-384)
TLSv1.3  260 bits  secp521r1 (NIST P-521)
TLSv1.2  128 bits  secp256r1 (NIST P-256)
TLSv1.2  192 bits  secp384r1 (NIST P-384)
TLSv1.2  260 bits  secp521r1 (NIST P-521)

  SSL Certificate:
Signature Algorithm: sha256WithRSAEncryption
RSA Key Strength:    2048

Subject:  andrewbaker.ninja
Altnames: DNS:andrewbaker.ninja, DNS:www.andrewbaker.ninja
Issuer:   Zscaler Intermediate Root CA (zscaler.net) (t)

Not valid before: May  6 06:30:35 2023 GMT
Not valid after:  May 20 06:30:35 2023 GMT
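
The protocol table at the top of that output is the part I check first. As a rough sketch, the filter below lists any legacy protocol versions that are still enabled; the function name legacy_protocols is my own, and treating SSLv2, SSLv3, TLSv1.0 and TLSv1.1 as "legacy" is an assumption on my part based on TLS 1.0 and 1.1 having been formally deprecated.

```shell
# Read sslscan output on stdin and print legacy protocol versions
# that are reported as enabled.
legacy_protocols() {
  awk '$2 == "enabled" && $1 ~ /^(SSLv2|SSLv3|TLSv1\.0|TLSv1\.1)$/ { print $1 }'
}

# Usage (--no-colour keeps colour codes out of the piped output):
#   sslscan --no-colour andrewbaker.ninja | legacy_protocols
```

For the scan above this would print TLSv1.0 and TLSv1.1.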

If you want a detailed dump of the certificate run (you will need openssl installed):

$ openssl s_client -connect andrewbaker.ninja:443 </dev/null 2>/dev/null | openssl x509 -inform pem -text
Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number:
            03:bd:20:6e:ef:67:55:93:2a:a8:90:9f:40:e4:b2:a8:c0:fe
        Signature Algorithm: sha256WithRSAEncryption
        Issuer: C = US, O = Let's Encrypt, CN = R3
        Validity
            Not Before: Nov  4 23:00:13 2022 GMT
            Not After : Feb  2 23:00:12 2023 GMT
        Subject: CN = andrewbaker.ninja
        Subject Public Key Info:
            Public Key Algorithm: id-ecPublicKey
                Public-Key: (256 bit)
                pub:
                    04:c8:30:00:b3:f0:fb:03:10:90:57:4a:df:7f:28:
                    34:b9:2e:94:1a:28:29:41:2b:88:48:3b:c0:48:2a:
                    f0:62:3d:57:0d:32:db:30:9b:c5:98:11:b3:14:a7:
                    a8:e0:30:1d:d7:ec:cc:86:6f:d2:f1:7b:a4:70:9c:
                    98:e0:63:34:ae
                ASN1 OID: prime256v1
                NIST CURVE: P-256
        X509v3 extensions:
            X509v3 Key Usage: critical
                Digital Signature
            X509v3 Extended Key Usage:
                TLS Web Server Authentication, TLS Web Client Authentication
            X509v3 Basic Constraints: critical
                CA:FALSE
            X509v3 Subject Key Identifier:
                B9:28:D2:09:38:B0:B1:03:77:DA:8F:C6:AD:2E:51:EF:0F:7F:23:4F
            X509v3 Authority Key Identifier:
                keyid:14:2E:B3:17:B7:58:56:CB:AE:50:09:40:E6:1F:AF:9D:8B:14:C2:C6

            Authority Information Access:
                OCSP - URI:http://r3.o.lencr.org
                CA Issuers - URI:http://r3.i.lencr.org/

            X509v3 Subject Alternative Name:
                DNS:andrewbaker.ninja, DNS:www.andrewbaker.ninja
            X509v3 Certificate Policies:
                Policy: 2.23.140.1.2.1
                Policy: 1.3.6.1.4.1.44947.1.1.1
                  CPS: http://cps.letsencrypt.org

            CT Precertificate SCTs:
                Signed Certificate Timestamp:
                    Version   : v1 (0x0)
                    Log ID    : B7:3E:FB:24:DF:9C:4D:BA:75:F2:39:C5:BA:58:F4:6C:
                                5D:FC:42:CF:7A:9F:35:C4:9E:1D:09:81:25:ED:B4:99
                    Timestamp : Nov  5 00:00:13.652 2022 GMT
                    Extensions: none
                    Signature : ecdsa-with-SHA256
                                30:46:02:21:00:89:98:62:15:D5:40:1D:80:9D:40:4B:
                                31:B1:E3:C5:3B:65:41:11:4D:98:D2:E1:23:16:45:0D:
                                DA:08:FE:72:AB:02:21:00:A7:F0:5D:49:63:4F:91:4C:
                                CF:60:8D:FF:26:F6:0B:1B:0C:47:9C:B6:70:57:7C:68:
                                AB:F0:9B:35:48:34:08:A4
                Signed Certificate Timestamp:
                    Version   : v1 (0x0)
                    Log ID    : 7A:32:8C:54:D8:B7:2D:B6:20:EA:38:E0:52:1E:E9:84:
                                16:70:32:13:85:4D:3B:D2:2B:C1:3A:57:A3:52:EB:52
                    Timestamp : Nov  5 00:00:14.177 2022 GMT
                    Extensions: none
                    Signature : ecdsa-with-SHA256
                                30:45:02:21:00:E1:8B:7F:3F:75:05:20:8A:27:3D:30:
                                64:BB:4B:FE:EF:24:C9:7E:85:6C:6D:DF:16:ED:BE:23:
                                9C:97:67:E1:DD:02:20:60:89:B6:D9:0F:BE:C4:E0:7B:
                                05:E1:EE:6D:0B:2D:78:C9:58:AA:0F:10:C0:34:FE:79:
                                FA:63:DD:2D:50:01:5B
    Signature Algorithm: sha256WithRSAEncryption
         4a:54:e0:ec:05:b8:58:ef:44:de:a8:5f:89:fc:1d:cb:86:39:
         05:1d:d3:b2:57:73:bd:6d:11:e5:c2:fd:cd:1a:6b:ee:62:11:
         f8:94:6b:22:b9:16:d6:e3:95:ed:04:9e:7c:ba:1b:3e:5f:dc:
         4f:a0:ae:58:ec:3c:25:a0:41:a5:c8:b9:c8:7a:3c:2f:1f:17:
         60:e8:7d:f0:a2:8e:0d:45:cb:7b:b1:06:13:75:3b:b0:cb:f6:
         6e:2f:71:70:6a:55:96:34:58:db:42:06:5a:7f:78:00:8f:7d:
         e3:83:02:30:82:49:52:38:da:07:6b:c3:ba:ad:09:1e:7e:33:
         0c:f5:0b:49:33:9d:b7:4e:1a:16:c2:ef:47:6f:ec:02:03:4a:
         84:75:bb:30:6e:8a:b4:22:da:d6:ac:43:5d:9b:3c:8b:2a:13:
         af:2b:2e:ab:02:58:dd:80:73:04:8c:dc:2e:48:71:ae:57:c4:
         0e:40:8c:6d:52:b5:91:0c:6b:0d:5e:98:01:6f:09:d1:3a:1b:
         41:7c:70:cc:66:9a:89:b3:b7:27:3d:6f:62:10:66:bb:63:67:
         59:08:ed:7e:c0:c3:31:1c:89:dd:ce:f2:6f:42:fd:42:21:94:
         c3:27:6e:d9:ea:d1:5f:5a:6f:58:26:eb:3e:ba:a6:ee:ed:45:
         00:99:e3:9e
-----BEGIN CERTIFICATE-----
MIIEdTCCA12gAwIBAgISA70gbu9nVZMqqJCfQOSyqMD+MA0GCSqGSIb3DQEBCwUA
MDIxCzAJBgNVBAYTAlVTMRYwFAYDVQQKEw1MZXQncyBFbmNyeXB0MQswCQYDVQQD
EwJSMzAeFw0yMjExMDQyMzAwMTNaFw0yMzAyMDIyMzAwMTJaMBwxGjAYBgNVBAMT
EWFuZHJld2Jha2VyLm5pbmphMFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAEyDAA
s/D7AxCQV0rffyg0uS6UGigpQSuISDvASCrwYj1XDTLbMJvFmBGzFKeo4DAd1+zM
hm/S8XukcJyY4GM0rqOCAmQwggJgMA4GA1UdDwEB/wQEAwIHgDAdBgNVHSUEFjAU
BggrBgEFBQcDAQYIKwYBBQUHAwIwDAYDVR0TAQH/BAIwADAdBgNVHQ4EFgQUuSjS
CTiwsQN32o/GrS5R7w9/I08wHwYDVR0jBBgwFoAUFC6zF7dYVsuuUAlA5h+vnYsU
wsYwVQYIKwYBBQUHAQEESTBHMCEGCCsGAQUFBzABhhVodHRwOi8vcjMuby5sZW5j
ci5vcmcwIgYIKwYBBQUHMAKGFmh0dHA6Ly9yMy5pLmxlbmNyLm9yZy8wMwYDVR0R
BCwwKoIRYW5kcmV3YmFrZXIubmluamGCFXd3dy5hbmRyZXdiYWtlci5uaW5qYTBM
BgNVHSAERTBDMAgGBmeBDAECATA3BgsrBgEEAYLfEwEBATAoMCYGCCsGAQUFBwIB
FhpodHRwOi8vY3BzLmxldHNlbmNyeXB0Lm9yZzCCAQUGCisGAQQB1nkCBAIEgfYE
gfMA8QB3ALc++yTfnE26dfI5xbpY9Gxd/ELPep81xJ4dCYEl7bSZAAABhEUWgVQA
AAQDAEgwRgIhAImYYhXVQB2AnUBLMbHjxTtlQRFNmNLhIxZFDdoI/nKrAiEAp/Bd
SWNPkUzPYI3/JvYLGwxHnLZwV3xoq/CbNUg0CKQAdgB6MoxU2LcttiDqOOBSHumE
FnAyE4VNO9IrwTpXo1LrUgAAAYRFFoNhAAAEAwBHMEUCIQDhi38/dQUgiic9MGS7
S/7vJMl+hWxt3xbtviOcl2fh3QIgYIm22Q++xOB7BeHubQsteMlYqg8QwDT+efpj
3S1QAVswDQYJKoZIhvcNAQELBQADggEBAEpU4OwFuFjvRN6oX4n8HcuGOQUd07JX
c71tEeXC/c0aa+5iEfiUayK5Ftbjle0Enny6Gz5f3E+grljsPCWgQaXIuch6PC8f
F2DoffCijg1Fy3uxBhN1O7DL9m4vcXBqVZY0WNtCBlp/eACPfeODAjCCSVI42gdr
w7qtCR5+Mwz1C0kznbdOGhbC70dv7AIDSoR1uzBuirQi2tasQ12bPIsqE68rLqsC
WN2AcwSM3C5Ica5XxA5AjG1StZEMaw1emAFvCdE6G0F8cMxmmomztyc9b2IQZrtj
Z1kI7X7AwzEcid3O8m9C/UIhlMMnbtnq0V9ab1gm6z66pu7tRQCZ454=
-----END CERTIFICATE-----

Linux: Automatically renew your certs for a wordpress site using letsencrypt

If you want to automatically renew your certs, the easiest way is to set up a cron job that calls letsencrypt periodically. Below is an example.

First, create the bash script to renew the certificate:

$ pwd
/home/bitnami
$ sudo nano renew-certificate.sh

Now enter the script in the following format into nano:

#!/bin/bash

sudo /opt/bitnami/ctlscript.sh stop apache
sudo /opt/bitnami/letsencrypt/lego --path /opt/bitnami/letsencrypt --email="myemail@myemail.com" --http --http-timeout 30 --http.webroot /opt/bitnami/apps/letsencrypt --domains=andrewbaker.ninja renew --days 90
sudo /opt/bitnami/ctlscript.sh start apache

Now make the script executable and edit the crontab to run the renew script:

$ sudo chmod +x renew-certificate.sh
$ crontab -e
0 0 * * * sudo /home/bitnami/renew-certificate.sh 2> /dev/null

Mac OS X: Using dig and whois to resolve DNS issues between your DNS server and the authoritative DNS Server

When debugging DNS issues it's important to compare the local DNS response with the response from the authoritative DNS name server. With dig we can directly query the authoritative name servers for a domain; these are the DNS servers that hold the authoritative records for the domain's DNS zone, the source of truth. If a correct response is received from the authoritative DNS server but not from your own DNS server, you should investigate why your local DNS server is not able to resolve the record.

Let's first see where our DNS traffic is going:

$ scutil --dns | grep 'nameserver\[[0-9]*\]'
  nameserver[0] : 100.64.0.1
  nameserver[0] : 192.168.0.
  nameserver[0] : 192.168.0.1

The first DNS server in the list, at 100.64.0.1, will need to accept TCP and UDP traffic over port 53 from our client/server. A port scanner such as nmap can be used to confirm that the DNS server is reachable on port 53, as shown below.

## First check UDP

$ nmap -sU -p 53 100.64.0.1
Starting Nmap 7.93 ( https://nmap.org ) at 2022-11-21 22:04 SAST
Nmap scan report for 100.64.0.1
Host is up.

PORT   STATE         SERVICE
53/udp open|filtered domain

Nmap done: 1 IP address (1 host up) scanned in 2.08 seconds

## Next check TCP

$ nmap -sT -p 53 100.64.0.1
Starting Nmap 7.93 ( https://nmap.org ) at 2022-11-21 22:07 SAST
Nmap scan report for 100.64.0.1
Host is up (0.00059s latency).

PORT   STATE SERVICE
53/tcp open  domain

Nmap done: 1 IP address (1 host up) scanned in 0.15 seconds

It’s worth noting that scanning UDP with nmap is not reliable due to the connectionless nature of UDP, which is why the state is listed as open|filtered. We can clearly see that TCP 53 is open and responding, which is a good sign. If the state were reported as filtered, the next thing to investigate would be connectivity to the DNS server; in particular, any firewall running on the DNS server would need to be configured to allow TCP and UDP port 53 traffic in.

We can also run tcpdump to watch the traffic going to our local DNS server:

$ sudo tcpdump -n host 100.64.0.1
tcpdump: data link type PKTAP
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on pktap, link-type PKTAP (Apple DLT_PKTAP), snapshot length 524288 bytes
23:17:39.411500 IP 100.64.0.1.58169 > 100.64.0.1.53: 50915+ A? logs.af-south-1.amazonaws.com. (47)
23:17:39.411594 IP 100.64.0.1.58169 > 100.64.0.1.53: 50915+ A? logs.af-south-1.amazonaws.com. (47)
23:17:39.411703 IP 100.64.0.1.53 > 100.64.0.1.58169: 50915 1/0/0 A 100.64.1.18 (63)
23:17:39.411734 IP 100.64.0.1.53 > 100.64.0.1.58169: 50915 1/0/0 A 100.64.1.18 (63)
23:17:39.412167 IP 100.64.0.1.57548 > 100.64.1.18.443: Flags [SEW], seq 630452899, win 65535, options [mss 1360,nop,wscale 6,nop,nop,TS val 542272848 ecr 0,sackOK,eol], length 0
23:17:39.412204 IP 100.64.1.18.57548 > 100.64.0.1.9010: Flags [SEW], seq 630452899, win 65535, options [mss 1360,nop,wscale 6,nop,nop,TS val 542272848 ecr 0,sackOK,eol], length 0
23:17:39.412302 IP 100.64.0.1.9010 > 100.64.1.18.57548: Flags [S.E], seq 2920832254, ack 630452900, win 65535, options [mss 1360,nop,wscale 6,nop,nop,TS val 974661492 ecr 542272848,sackOK,eol], length 0

Next up, query the local DNS server (you will note that the status is SERVFAIL and the A record is missing):

$ dig andrewbaker.ninja
; <<>> DiG 9.10.6 <<>> andrewbaker.ninja
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 35921
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;andrewbaker.ninja.		IN	A

;; Query time: 1348 msec
;; SERVER: 100.64.0.1#53(100.64.0.1)
;; WHEN: Mon Nov 21 19:44:32 SAST 2022
;; MSG SIZE  rcvd: 46

Next, to get the authoritative name servers of a domain we can use the ‘whois’ command as shown below.

$ whois andrewbaker.ninja | grep -i "name server"
Name Server: ns-983.awsdns-58.net
Name Server: ns-462.awsdns-57.co
Name Server: ns-1745.awsdns-26.co.uk
Name Server: ns-1363.awsdns-42.org
Name Server: NS-1363.AWSDNS-42.ORG
Name Server: NS-462.AWSDNS-57.COM
Name Server: NS-1745.AWSDNS-26.CO.UK
Name Server: NS-983.AWSDNS-58.NET

As shown, andrewbaker.ninja currently has 8 authoritative name servers. If we run dig directly against any of these we should receive an authoritative response, that is, an up-to-date, non-cached response straight from the source rather than from our local DNS server. In the example below we have run our query against ns-983.awsdns-58.net:

$ dig @ns-983.awsdns-58.net andrewbaker.ninja

; <<>> DiG 9.10.6 <<>> @ns-983.awsdns-58.net andrewbaker.ninja
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 64987
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 8, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;andrewbaker.ninja.		IN	A

;; ANSWER SECTION:
andrewbaker.ninja.	300	IN	A	13.244.140.33

;; AUTHORITY SECTION:
andrewbaker.ninja.	172800	IN	NS	ns-1254.awsdns-28.org.
andrewbaker.ninja.	172800	IN	NS	ns-1514.awsdns-61.org.
andrewbaker.ninja.	172800	IN	NS	ns-1728.awsdns-24.co.uk.
andrewbaker.ninja.	172800	IN	NS	ns-1875.awsdns-42.co.uk.
andrewbaker.ninja.	172800	IN	NS	ns-491.awsdns-61.com.
andrewbaker.ninja.	172800	IN	NS	ns-496.awsdns-62.com.
andrewbaker.ninja.	172800	IN	NS	ns-533.awsdns-02.net.
andrewbaker.ninja.	172800	IN	NS	ns-931.awsdns-52.net.

;; Query time: 20 msec

You can now see the A record is returned. Also note that in this dig response we now have the “aa” flag in the header, which indicates an authoritative answer rather than a cached response (note: qr = query response and rd = recursion desired). If we run this same dig command again, the TTL in the answer section will always be reported as 300 seconds, because an authoritative server returns the record's configured TTL rather than a cache countdown.

However, if we were to run this dig without specifying @ns-983.awsdns-58.net, we would be querying our local DNS server, which is not authoritative for the andrewbaker.ninja domain; after the first result the record will be cached locally. This can be confirmed by running the dig command again: the TTL value will count down until it reaches 0 and the record is removed from the cache.

By querying the authoritative name server directly we ensure that we are getting the most up to date response rather than a potential old cached response from our own local DNS server or local DNS cache.
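
The comparison described above can be scripted. Below is a minimal sketch that asks both the local resolver and an authoritative name server for the A record and reports whether they agree. The function name compare_dns is hypothetical; it relies only on dig's +short output option. Bear in mind that domains using round-robin or geo-aware DNS can legitimately return different answers.

```shell
# Compare the A record from the local resolver with the one from
# an authoritative name server.
#   $1 = name to look up, $2 = authoritative name server
compare_dns() {
  name="$1"
  auth_ns="$2"
  local_answer="$(dig +short "$name" A | sort)"
  auth_answer="$(dig +short "@$auth_ns" "$name" A | sort)"
  if [ "$local_answer" = "$auth_answer" ]; then
    echo "match: $name -> ${auth_answer:-<no answer>}"
  else
    echo "MISMATCH for $name"
    echo "  local resolver: ${local_answer:-<no answer>}"
    echo "  authoritative ($auth_ns): ${auth_answer:-<no answer>}"
    return 1
  fi
}

# Usage:
#   compare_dns andrewbaker.ninja ns-983.awsdns-58.net
```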

Linux: Find the maximum packet size (MTU) between two hosts (using do not fragment flag)


If you have ever tried to use jumbo packets, or trace a weird slowness on the network – one of the things that frequently comes up is packet fragmentation. This is basically where a source machine is sending bigger packets than can be consumed along its pathway to a destination machine. This means the packets will need to be split up. This causes a host of performance issues.

So how do you diagnose this? Well, ping is your friend. It allows you to flag packets as do not fragment and to specify the payload size. The example below sends a do not fragment packet with a 1460 byte payload from the host to example.com:

$ ping -M do -s 1460 example.com 
PING example.com (93.184.216.34) 1460(1488) bytes of data. 
1468 bytes from 93.184.216.34 (93.184.216.34): icmp_seq=1 ttl=45 time=223 ms
1468 bytes from 93.184.216.34 (93.184.216.34): icmp_seq=2 ttl=45 time=223 ms
1468 bytes from 93.184.216.34 (93.184.216.34): icmp_seq=3 ttl=45 time=223 ms

Taking the example above and running on a Macbook/OSX:

$ ping -D -s 1460 example.com
PING example.com (93.184.216.34): 1460 data bytes
Request timeout for icmp_seq 0
Request timeout for icmp_seq 1
Request timeout for icmp_seq 2
Request timeout for icmp_seq 3
Request timeout for icmp_seq 4

The standard Ethernet MTU, and so the largest packet you can usually send over the internet, is 1500 bytes. So 1490 should be fine, right?

$ ping -M do -s 1490 example.com 
PING example.com (93.184.216.34) 1490(1518) bytes of data. 
ping: local error: Message too long, mtu=1500
ping: local error: Message too long, mtu=1500

The same test on Macbook/OSX:

$ ping -D -s 1490 example.com
PING example.com (93.184.216.34): 1460 data bytes
Request timeout for icmp_seq 0
Request timeout for icmp_seq 1
Request timeout for icmp_seq 2
Request timeout for icmp_seq 3
Request timeout for icmp_seq 4

As you can see, this breaks beneath the expected 1500 byte packet size. Running “ping -M do -s 1490 example.com” says that the ICMP data size is 1490 bytes and fragmentation is not allowed. But 1490 bytes is only the ICMP data: the ICMP message (8 byte header + data) is 1498 bytes, and adding the 20 byte IP header brings the packet to 1518 bytes. The packet size can’t exceed the MTU of the interface, and you can see this in the error message (the MTU for the interface is 1500 bytes). Since fragmentation is not allowed, the message can’t be sent and ping fails saying the message is too long.
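
The header arithmetic is easy to get wrong, so here it is as a tiny sketch. The function names are made up; the sizes assume IPv4 without options (20 bytes) and a standard ICMP echo header (8 bytes).

```shell
# Header sizes in bytes (IPv4 without options, ICMP echo)
IP_HEADER=20
ICMP_HEADER=8

# Largest ping -s payload that fits in a given MTU.
max_ping_payload() {   # $1 = MTU in bytes
  echo $(( $1 - IP_HEADER - ICMP_HEADER ))
}

# Total IP packet size produced by a given ping -s payload.
packet_size() {        # $1 = payload size in bytes
  echo $(( $1 + ICMP_HEADER + IP_HEADER ))
}

# max_ping_payload 1500   -> 1472
# packet_size 1490        -> 1518
```

So with a 1500 byte MTU, the largest payload you can send with the do not fragment flag set is 1472 bytes.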

Ok, so what if I do this?

$ ping -M want -s 1490 example.com 
PING example.com (93.184.216.34) 1490(1518) bytes of data. 
1498 bytes from 93.184.216.34 (93.184.216.34): icmp_seq=1 ttl=45 time=223 ms
1498 bytes from 93.184.216.34 (93.184.216.34): icmp_seq=2 ttl=45 time=223 ms
1498 bytes from 93.184.216.34 (93.184.216.34): icmp_seq=3 ttl=45 time=223 ms

Ok, why did this work? Well, -M want still performs path MTU discovery but allows the packet to be fragmented locally when it is too large.

Mac OS X: Find the maximum unfragmented packet size (MTU) to reach a host

If you have ever tried to use jumbo packets, or trace a weird slowness on the network – one of the things that frequently comes up is packet fragmentation. This is basically where a source machine is sending bigger packets than can be consumed along its pathway to a destination machine. This means the packets will need to be split and essentially causes a host of performance issues.

So how do you diagnose this? Well Ping is your friend. It allows you to flag packets to not be fragmented and specify a minimum and maximum packet size. Using this you can simply loop through test packet sizes until a packet fails and then you have your MTU.

The command below sends packets from 1350 to 1520 and increases the packet size by 10 bytes each time.

ping -g 1350 -G 1520 -h 10 -D andrewbaker.ninja
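
If you want a single number rather than reading through the sweep output, a binary search over the payload size gets there in a handful of pings. This is only a sketch: probe and find_max_payload are hypothetical names, the ping flags are the macOS ones (on Linux you would use -M do instead of -D, and -t means something different there), and a host that silently drops large ICMP packets will give a misleadingly low answer.

```shell
# Succeeds if one do-not-fragment ping with payload size $2 reaches host $1.
# macOS flags: -D sets DF, -c 1 sends one packet, -t 2 gives up after 2 seconds.
probe() {
  ping -D -c 1 -t 2 -s "$2" "$1" >/dev/null 2>&1
}

# Binary search for the largest payload between $2 and $3 that gets through.
find_max_payload() {   # $1 = host, $2 = smallest size, $3 = largest size
  host="$1"
  lo="$2"
  hi="$3"
  probe "$host" "$lo" || { echo "even $lo bytes failed" >&2; return 1; }
  while [ "$lo" -lt "$hi" ]; do
    mid=$(( (lo + hi + 1) / 2 ))
    if probe "$host" "$mid"; then
      lo="$mid"                 # mid fits, search above it
    else
      hi=$(( mid - 1 ))         # mid is too big, search below it
    fi
  done
  echo "$lo"
}

# Usage:
#   find_max_payload andrewbaker.ninja 1350 1520
```

Add 28 bytes (20 byte IP header plus 8 byte ICMP header) to the result to get the path MTU.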

Linux: Diagnose your linux server in under a minute using standard (free) command line tools

Imagine your server is in trouble and you could figure out what's causing it in under one minute. Obviously, the preference is an observability platform, but for my little WordPress site I don’t really have the budget. So I just use a few tools to isolate common issues. The idea behind this blog is to quickly isolate the fault by looking for errors and saturation metrics, as they are both easy to interpret, and then check overall resource utilisation.

Note: Some of these commands require the sysstat package installed.

1. uptime

Might seem like an odd choice, but uptime actually provides more than just uptime. It is a quick way to view the load averages (over the last 1, 5 and 15 minutes), which indicate the number of processes running or waiting to run.

~$ uptime
 08:38:49 up 87 days, 18:31,  1 user,  load average: 70.34, 25.02, 0.00

The last three numbers are the load averages over 1, 5 and 15 minute sample times. Here they show a marked increase in load: the 1 minute average is far higher than the 15 minute one. So something is definitely going on… maybe my web site went viral!!!
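
A load average only means something relative to the number of CPUs, so I find it handy to divide one by the other. The helper below is a rough sketch; load_status is a made-up name, and "more than 1 per CPU means saturated" is a rule of thumb rather than a hard limit (on Linux the load average also counts tasks blocked on I/O).

```shell
# Print the load per CPU and a rough verdict.
#   $1 = load average, $2 = number of CPUs
load_status() {
  awk -v load="$1" -v cpus="$2" 'BEGIN {
    ratio = load / cpus
    printf "%.2f per CPU: %s\n", ratio, (ratio > 1 ? "saturated" : "ok")
  }'
}

# Usage on Linux (1 minute load average and CPU count):
#   load_status "$(cut -d ' ' -f 1 /proc/loadavg)" "$(nproc)"
```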

2. dmesg | tail

dmesg | tail views the last 10 system messages (if there are any). Look for errors that can cause performance issues, such as the oom-killer or TCP dropping a request (the example below shows only routine boot messages). If you don’t know what a message means then search for it. Note: you can modify the tail size by changing the number, and you don’t need sudo if your host is properly set up (unlike mine).

$ sudo dmesg | tail -n 10
[    3.453000] audit: type=1400 audit(1661436454.032:5): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/sbin/haveged" pid=293 comm="apparmor_parser"
[    3.466526] audit: type=1400 audit(1661436454.044:6): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/bin/man" pid=294 comm="apparmor_parser"
[    3.482004] audit: type=1400 audit(1661436454.044:7): apparmor="STATUS" operation="profile_load" profile="unconfined" name="man_filter" pid=294 comm="apparmor_parser"
[    3.496937] audit: type=1400 audit(1661436454.044:8): apparmor="STATUS" operation="profile_load" profile="unconfined" name="man_groff" pid=294 comm="apparmor_parser"
[    3.510178] audit: type=1400 audit(1661436454.084:9): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/sbin/chronyd" pid=292 comm="apparmor_parser"
[    4.310300] IPv6: ADDRCONF(NETDEV_UP): ens5: link is not ready
[    5.223697] IPv6: ADDRCONF(NETDEV_CHANGE): ens5: link becomes ready
[   24.859623] Adding 649996k swap on /mnt/.bitnami.swap.  Priority:-2 extents:15 across:1321740k SSFS
[1440586.071042] device-mapper: uevent: version 1.0.3
[1440586.075493] device-mapper: ioctl: 4.39.0-ioctl (2018-04-03) initialised: dm-devel@redhat.com
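
On a busy host the last 10 lines can easily be routine noise, so it can help to filter the whole log for the usual suspects instead. This is just a sketch: dmesg_suspects is a hypothetical name and the keyword list is my own guess at common trouble messages, not an exhaustive or official list.

```shell
# Read kernel messages on stdin and keep only lines that look like trouble.
dmesg_suspects() {
  grep -iE 'out of memory|oom-killer|killed process|segfault|i/o error'
}

# Usage:
#   sudo dmesg | dmesg_suspects
```

No output means none of the keywords were found, which is what you would get for the boot messages above.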

3. vmstat 1

vmstat is short for virtual memory statistics. vmstat was run with an argument of 1 which means it will print rolling one second summaries (until you hit Ctrl + C). The first line of output (in this version of vmstat) has some columns that show the average since boot, instead of the previous second. For now, skip the first line, unless you want to learn and remember which column is which.

Columns to check:

  • r: Number of processes running on CPU and waiting for a turn. This provides a better signal than load averages for determining CPU saturation, as it does not include I/O. To interpret: an “r” value greater than the CPU count is overloaded/saturated.
  • free: Free memory in kilobytes. If there are too many digits to count, you have enough free memory. The “free -m” command, included as command 7, better explains the state of free memory.
  • si, so: Swap-ins and swap-outs. If these are non-zero, you’re out of memory.
  • us, sy, id, wa, st: These are breakdowns of CPU time, on average across all CPUs. They are user time, system time (kernel), idle, wait I/O, and stolen time (by other guests, or with Xen, the guest’s own isolated driver domain).

The CPU time breakdowns will confirm if the CPUs are busy, by adding user + system time. A constant degree of wait I/O points to a disk bottleneck; this is where the CPUs are idle, because tasks are blocked waiting for pending disk I/O. You can treat wait I/O as another form of CPU idle, one that gives a clue as to why they are idle.

System time is necessary for I/O processing. A high system time average, over 20%, can be interesting to explore further: perhaps the kernel is processing the I/O inefficiently.

In the example below, the CPUs are almost entirely idle (id is 99 or 100 in every sample), so there is nothing to chase. On a busy system where CPU time is almost entirely user-level, that points to application-level usage instead. CPUs that are well over 90% utilized on average aren’t necessarily a problem; check for the degree of saturation using the “r” column.

$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0 322104 122480 130724 310440    0    0     1     8    2    1  0  0 100  0  0
 0  0 322104 122472 130724 310440    0    0     0     4  205  397  0  0 100  0  0
 0  0 322104 122472 130724 310440    0    0     0     0  187  379  0  1 100  0  0
 0  0 322104 122472 130724 310440    0    0     0     0  179  373  0  0 100  0  0
 0  0 322104 122472 130724 310440    0    0     0     4  187  381  0  0 99  0  0
 0  0 322104 134736 130724 310440    0    0     0     0  209  391  0  1 99  0  0
 0  0 322104 143156 130724 310440    0    0     0     0  176  374  0  0 100  0  0
 0  0 322104 143156 130728 310440    0    0     0    28  178  366  0  0 100  0  0
 0  0 322104 143156 130728 310440    0    0     0     0  171  372  0  0 100  0  0

## Now view free memory
$ free -m
              total        used        free      shared  buff/cache   available
Mem:            961         393         132          61         436         351
Swap:           634         314         320
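
The vmstat column checks described above can be sketched in Python. This is only a rough illustration, not something vmstat provides: the helper name check_vmstat_line, the cpu_count parameter, and the hard-coded column order (copied from the header shown above) are all my own assumptions.

```python
# Column order assumed from the vmstat header shown in this post.
VMSTAT_COLUMNS = ["r", "b", "swpd", "free", "buff", "cache", "si", "so",
                  "bi", "bo", "in", "cs", "us", "sy", "id", "wa", "st"]


def check_vmstat_line(line, cpu_count):
    """Return a list of warnings for one vmstat data line (hypothetical helper)."""
    values = dict(zip(VMSTAT_COLUMNS, (int(v) for v in line.split())))
    warnings = []
    if values["r"] > cpu_count:
        warnings.append("run queue exceeds CPU count: possible CPU saturation")
    if values["si"] or values["so"]:
        warnings.append("swapping in progress: memory pressure")
    if values["sy"] > 20:
        warnings.append("system time over 20%: worth exploring further")
    return warnings


# One data line taken from the output above, on a 2 CPU machine.
sample = " 0  0 322104 122472 130724 310440    0    0     0     4  187  381  0  0 99  0  0"
print(check_vmstat_line(sample, cpu_count=2))  # prints [] because the system is idle
```

Remember to skip the first line of real vmstat output, since some of its columns are averages since boot.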

4. mpstat -P ALL 1

This command prints CPU time breakdowns per CPU, which can be used to check for an imbalance. A single hot CPU can be evidence of a saturated single-threaded application. Nothing of note in the example below: both CPUs are idle.

## mpstat is part of sysstat - so might not be installed
$ sudo apt-get install sysstat
## Now run mpstat every second
$ mpstat -P ALL 1
Linux 4.19.0-21-cloud-amd64 (ip-172-31-20-121) 	11/21/2022 	_x86_64_	(2 CPU)

10:08:22 AM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
10:08:23 AM  all    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
10:08:23 AM    0    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
10:08:23 AM    1    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
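
To spot a single hot CPU without reading every row, the per-CPU lines can be filtered on %idle. The sketch below is a rough illustration: the function name busiest_cpus and the 90% threshold are my own assumptions, and it assumes the column layout shown above, where the last 11 fields are the CPU id followed by ten percentage columns.

```python
def busiest_cpus(mpstat_lines, threshold=90.0):
    """Return CPU ids whose utilization (100 - %idle) meets the threshold."""
    hot = []
    for line in mpstat_lines:
        fields = line.split()
        if len(fields) < 11:
            continue  # banner or blank line
        cpu, idle = fields[-11], fields[-1]
        if not cpu.isdigit():
            continue  # skip the header row and the "all" summary row
        if 100.0 - float(idle) >= threshold:
            hot.append(int(cpu))
    return hot


# Rows copied from the output above.
rows = [
    "10:08:22 AM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle",
    "10:08:23 AM  all    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00",
    "10:08:23 AM    0    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00",
    "10:08:23 AM    1    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00",
]
print(busiest_cpus(rows))  # prints [] because neither CPU is busy
```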

5. pidstat 1

Pidstat is a little like top’s per-process summary, but prints a rolling summary instead of clearing the screen. This can be useful for watching patterns over time, and also recording what you saw (copy-n-paste) into a record of your investigation.

The example below identifies two java processes as responsible for consuming CPU. The %CPU column is the total across all CPUs; 1591% shows that this java process is consuming almost 16 CPUs.

$ pidstat 1
Linux 3.13.0-49-generic (titanclusters-xxxxx)  07/14/2015    _x86_64_    (32 CPU)

07:41:02 PM   UID       PID    %usr %system  %guest    %CPU   CPU  Command
07:41:03 PM     0         9    0.00    0.94    0.00    0.94     1  rcuos/0
07:41:03 PM     0      4214    5.66    5.66    0.00   11.32    15  mesos-slave
07:41:03 PM     0      4354    0.94    0.94    0.00    1.89     8  java
07:41:03 PM     0      6521 1596.23    1.89    0.00 1598.11    27  java
07:41:03 PM     0      6564 1571.70    7.55    0.00 1579.25    28  java
07:41:03 PM 60004     60154    0.94    4.72    0.00    5.66     9  pidstat

07:41:03 PM   UID       PID    %usr %system  %guest    %CPU   CPU  Command
07:41:04 PM     0      4214    6.00    2.00    0.00    8.00    15  mesos-slave
07:41:04 PM     0      6521 1590.00    1.00    0.00 1591.00    27  java
07:41:04 PM     0      6564 1573.00   10.00    0.00 1583.00    28  java
07:41:04 PM   108      6718    1.00    0.00    0.00    1.00     0  snmp-pass
07:41:04 PM 60004     60154    1.00    4.00    0.00    5.00     9  pidstat
^C
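
The conversion from %CPU to a CPU count is just a division by 100. Here is a small sketch using the two java processes from the second sample above; the function name cpus_consumed is my own, and the 32 CPU total comes from the pidstat banner line.

```python
TOTAL_CPUS = 32  # from the "(32 CPU)" banner in the pidstat output


def cpus_consumed(percent_cpu):
    """Convert a pidstat %CPU value (summed across CPUs) into a CPU count."""
    return percent_cpu / 100.0


samples = [(6521, 1591.00), (6564, 1583.00)]  # (PID, %CPU) from the output above
for pid, pct in samples:
    print(f"PID {pid}: about {cpus_consumed(pct):.1f} of {TOTAL_CPUS} CPUs")

combined = sum(cpus_consumed(pct) for _, pct in samples)
print(f"Both together: {combined / TOTAL_CPUS:.1%} of the machine")
```

Between them, these two processes account for nearly all of the CPU capacity on the box.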

6. iostat -xz 1

$ iostat -xz 5
Linux 3.13.0-49-generic (titanclusters-xxxxx)  07/14/2015  _x86_64_ (32 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          73.96    0.00    3.73    0.03    0.06   22.21

Device:   rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvda        0.00     0.23    0.21    0.18     4.52     2.08    34.37     0.00    9.98   13.80    5.42   2.44   0.09
xvdb        0.01     0.00    1.02    8.94   127.97   598.53   145.79     0.00    0.43    1.78    0.28   0.25   0.25
xvdc        0.01     0.00    1.02    8.86   127.79   595.94   146.50     0.00    0.45    1.82    0.30   0.27   0.26
dm-0        0.00     0.00    0.69    2.32    10.47    31.69    28.01     0.01    3.23    0.71    3.98   0.13   0.04
dm-1        0.00     0.00    0.00    0.94     0.01     3.78     8.00     0.33  345.84    0.04  346.81   0.01   0.00
dm-2        0.00     0.00    0.09    0.07     1.35     0.36    22.50     0.00    2.55    0.23    5.62   1.78   0.03
[...]
^C

This is a great tool for understanding block devices (disks), both the workload applied and the resulting performance. Look for:

  • r/s, w/s, rkB/s, wkB/s: These are the delivered reads, writes, read Kbytes, and write Kbytes per second to the device. Use these for workload characterization. A performance problem may simply be due to an excessive load applied.
  • await: The average time for the I/O in milliseconds. This is the time that the application suffers, as it includes both time queued and time being serviced. Larger than expected average times can be an indicator of device saturation, or device problems.
  • avgqu-sz: The average number of requests issued to the device. Values greater than 1 can be evidence of saturation (although devices can typically operate on requests in parallel, especially virtual devices which front multiple back-end disks.)
  • %util: Device utilization. This is really a busy percent, showing the time each second that the device was doing work. Values greater than 60% typically lead to poor performance (which should be seen in await), although it depends on the device. Values close to 100% usually indicate saturation.

If the storage device is a logical disk device fronting many back-end disks, then 100% utilization may just mean that some I/O is being processed 100% of the time; however, the back-end disks may be far from saturated, and may be able to handle much more work.

Bear in mind that poorly performing disk I/O isn’t necessarily an application issue. Many techniques are typically used to perform I/O asynchronously, so that the application doesn’t block and suffer the latency directly (e.g., read-ahead for reads, and buffering for writes).
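
The iostat checks above can be applied to a device line programmatically. This is a minimal sketch under stated assumptions: the helper name check_device is mine, the header string is copied from the output in this post (newer sysstat releases use different column names), and the limits are the rules of thumb from the bullets above rather than hard thresholds.

```python
# Header copied from the iostat output in this post (an assumption for
# other sysstat versions, which may name the columns differently).
HEADER = ("Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz "
          "await r_await w_await svctm %util")


def check_device(header, line, util_limit=60.0, queue_limit=1.0):
    """Return (device, await_ms, notes) for one 'iostat -x' device line."""
    names = header.split()[1:]  # drop the "Device:" label
    fields = line.split()
    stats = dict(zip(names, (float(v) for v in fields[1:])))
    notes = []
    if stats["%util"] > util_limit:
        notes.append(f"%util above {util_limit:.0f}: device is busy")
    if stats["avgqu-sz"] > queue_limit:
        notes.append("average queue size above 1: possible saturation")
    return fields[0], stats["await"], notes


# The dm-1 line from the output above.
dm1 = "dm-1  0.00  0.00  0.00  0.94  0.01  3.78  8.00  0.33  345.84  0.04  346.81  0.01  0.00"
print(check_device(HEADER, dm1))  # prints ('dm-1', 345.84, [])
```

There is no fixed limit for await, so the sketch only returns it: the 345.84 ms on dm-1 stands out next to the other devices even though its utilization is effectively zero.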

7. free -m

$ free -m
             total       used       free     shared    buffers     cached
Mem:        245998      24545     221453         83         59        541
-/+ buffers/cache:      23944     222053
Swap:            0          0          0

The right two columns show:

  • buffers: For the buffer cache, used for block device I/O.
  • cached: For the page cache, used by file systems.

We just want to check that these aren’t near-zero in size, which can lead to higher disk I/O (confirm using iostat), and worse performance. The above example looks fine, with many Mbytes in each.

The “-/+ buffers/cache” provides less confusing values for used and free memory. Linux uses free memory for the caches, but can reclaim it quickly if applications need it. So in a way the cached memory should be included in the free memory column, which this line does. There’s even a website, linuxatemyram, about this confusion.

It can be additionally confusing if ZFS on Linux is used, as we do for some services, as ZFS has its own file system cache that isn’t reflected properly by the free -m columns. It can appear that the system is low on free memory, when that memory is in fact available for use from the ZFS cache as needed.
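
The “-/+ buffers/cache” line is simple arithmetic on the Mem: row, which this sketch reproduces with the numbers from the example above. The function name adjust_for_cache is my own.

```python
def adjust_for_cache(used, free, buffers, cached):
    """Return (used, free) with buffers and page cache counted as free memory."""
    reclaimable = buffers + cached
    return used - reclaimable, free + reclaimable


# Values in megabytes from the Mem: row above.
print(adjust_for_cache(used=24545, free=221453, buffers=59, cached=541))
# prints (23945, 222053)
```

The free output shows 23944 rather than 23945 for used, which I assume is down to rounding when free converts its underlying values to megabytes.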

8. sar -n DEV 1

Use this tool to check network interface throughput: rxkB/s and txkB/s, as a measure of workload, and also to check if any limit has been reached. In the example below, eth0 receive is reaching 22 Mbytes/s, which is 176 Mbits/sec (well under, say, a 1 Gbit/sec limit).

This version also has %ifutil for device utilization (max of both directions for full duplex), which is something we also use Brendan’s nicstat tool to measure. And like with nicstat, this is hard to get right, and seems to not be working in this example (0.00).

$ sar -n DEV 1
Linux 3.13.0-49-generic (titanclusters-xxxxx)  07/14/2015     _x86_64_    (32 CPU)

12:16:48 AM     IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s   %ifutil
12:16:49 AM      eth0  18763.00   5032.00  20686.42    478.30      0.00      0.00      0.00      0.00
12:16:49 AM        lo     14.00     14.00      1.36      1.36      0.00      0.00      0.00      0.00
12:16:49 AM   docker0      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00

12:16:49 AM     IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s   %ifutil
12:16:50 AM      eth0  19763.00   5101.00  21999.10    482.56      0.00      0.00      0.00      0.00
12:16:50 AM        lo     20.00     20.00      3.25      3.25      0.00      0.00      0.00      0.00
12:16:50 AM   docker0      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
^C
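
The throughput conversion quoted for eth0 can be checked with a couple of lines. This sketch assumes 1 kB = 1000 bytes, which matches the rounding in the text; the function name and the 1 Gbit/sec link speed are illustrative assumptions.

```python
def kb_per_sec_to_mbit(kb_per_sec):
    """Convert kB/s to Mbit/s, assuming 1 kB = 1000 bytes (a simplification)."""
    return kb_per_sec * 8 / 1000


rx = 21999.10  # eth0 rxkB/s from the second sample above
mbit = kb_per_sec_to_mbit(rx)
print(f"{rx / 1000:.0f} Mbytes/s is about {mbit:.0f} Mbit/s")
print(f"That is {mbit / 1000:.1%} of a 1 Gbit/sec link")
```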

9. sar -n TCP,ETCP 1

This is a summarized view of some key TCP metrics. These include:

  • active/s: Number of locally-initiated TCP connections per second (e.g., via connect()).
  • passive/s: Number of remotely-initiated TCP connections per second (e.g., via accept()).
  • retrans/s: Number of TCP retransmits per second.

The active and passive counts are often useful as a rough measure of server load: number of new accepted connections (passive), and number of downstream connections (active). It might help to think of active as outbound, and passive as inbound, but this isn’t strictly true (e.g., consider a localhost to localhost connection).

Retransmits are a sign of a network or server issue; it may be an unreliable network (e.g., the public Internet), or it may be due to a server being overloaded and dropping packets. The example below shows just one new TCP connection per second.

$ sar -n TCP,ETCP 1
Linux 3.13.0-49-generic (titanclusters-xxxxx)  07/14/2015    _x86_64_    (32 CPU)

12:17:19 AM  active/s passive/s    iseg/s    oseg/s
12:17:20 AM      1.00      0.00  10233.00  18846.00

12:17:19 AM  atmptf/s  estres/s retrans/s isegerr/s   orsts/s
12:17:20 AM      0.00      0.00      0.00      0.00      0.00

12:17:20 AM  active/s passive/s    iseg/s    oseg/s
12:17:21 AM      1.00      0.00   8359.00   6039.00

12:17:20 AM  atmptf/s  estres/s retrans/s isegerr/s   orsts/s
12:17:21 AM      0.00      0.00      0.00      0.00      0.00
^C

10. netstat

For a proper rummage into the network, you can’t really beat netstat. Below are a few useful calls, including a quick summary, followed by picking out MTU issues. First get a summary:

$ netstat -s
Ip:
    Forwarding: 2
    5143907 total packets received
    4 with invalid addresses
    0 forwarded
    0 incoming packets discarded
    5143854 incoming packets delivered
    5546420 requests sent out
Icmp:
    456 ICMP messages received
    25 input ICMP message failed
    ICMP input histogram:
        destination unreachable: 446
        timeout in transit: 10
    0 ICMP messages sent
    0 ICMP messages failed
    ICMP output histogram:
IcmpMsg:
        InType3: 446
        InType11: 10
Tcp:
    127397 active connection openings
    334839 passive connection openings
    16631 failed connection attempts
    68477 connection resets received
    1 connections established
    4973994 segments received
    6069615 segments sent out
    875032 segments retransmitted
    3229 bad segments received
    92637 resets sent
    InCsumErrors: 3224
Udp:
    169404 packets received
    0 packets to unknown port received
    0 packet receive errors
    169404 packets sent
    0 receive buffer errors
    0 send buffer errors
UdpLite:
TcpExt:
    16631 resets received for embryonic SYN_RECV sockets
    124 packets pruned from receive queue because of socket buffer overrun
    26 ICMP packets dropped because they were out-of-window
    175789 TCP sockets finished time wait in fast timer
    309 packetes rejected in established connections because of timestamp
    123801 delayed acks sent
    132 delayed acks further delayed because of locked socket
    Quick ack mode was activated 6253 times
    22 SYNs to LISTEN sockets dropped
    704920 packet headers predicted
    1298057 acknowledgments not containing data payload received
    443211 predicted acknowledgments
    6 times recovered from packet loss due to fast retransmit
    TCPSackRecovery: 2370
    TCPSACKReneging: 5
    Detected reordering 7817 times using SACK
    Detected reordering 341 times using reno fast retransmit
    Detected reordering 257 times using time stamp
    98 congestion windows fully recovered without slow start
    250 congestion windows partially recovered using Hoe heuristic
    TCPDSACKUndo: 106
    643 congestion windows recovered without slow start after partial ack
    TCPLostRetransmit: 9844
    23 timeouts after reno fast retransmit
    TCPSackFailures: 336
    357 timeouts in loss state
    4267 fast retransmits
    1711 retransmits in slow start
    TCPTimeouts: 765878
    TCPLossProbes: 11885
    TCPLossProbeRecovery: 2984
    TCPSackRecoveryFail: 408
    TCPDSACKOldSent: 6375
    TCPDSACKOfoSent: 32
    TCPDSACKRecv: 3057
    TCPDSACKOfoRecv: 31
    4001 connections reset due to unexpected data
    62649 connections reset due to early user close
    1185 connections aborted due to timeout
    TCPDSACKIgnoredOld: 16
    TCPDSACKIgnoredNoUndo: 1905
    TCPSpuriousRTOs: 30
    TCPSackShifted: 704
    TCPSackMerged: 2102
    TCPSackShiftFallback: 17281
    TCPBacklogDrop: 3
    TCPDeferAcceptDrop: 276406
    TCPRcvCoalesce: 230111
    TCPOFOQueue: 3829
    TCPOFOMerge: 32
    TCPChallengeACK: 1117
    TCPSYNChallenge: 5
    TCPFastOpenCookieReqd: 7
    TCPSpuriousRtxHostQueues: 2
    TCPAutoCorking: 91106
    TCPFromZeroWindowAdv: 29
    TCPToZeroWindowAdv: 29
    TCPWantZeroWindowAdv: 301
    TCPSynRetrans: 845110
    TCPOrigDataSent: 3320995
    TCPHystartTrainDetect: 113
    TCPHystartTrainCwnd: 3689
    TCPHystartDelayDetect: 53
    TCPHystartDelayCwnd: 2057
    TCPACKSkippedSynRecv: 387
    TCPACKSkippedPAWS: 145
    TCPACKSkippedSeq: 418
    TCPACKSkippedTimeWait: 179
    TCPACKSkippedChallenge: 116
    TCPWinProbe: 25
    TCPDelivered: 3265480
    TCPDeliveredCE: 4
    TCPAckCompressed: 13
IpExt:
    InOctets: 1448714037
    OutOctets: 3058374840
    InNoECTPkts: 5355501
    InECT1Pkts: 298
    InECT0Pkts: 63984
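
Since retransmits are a sign of network or server trouble, one quick number to pull from this summary is the share of sent TCP segments that were retransmitted. The sketch below is only an illustration: the function name retransmit_ratio is mine, and it assumes the exact wording of the Tcp: lines shown above, which can differ between netstat versions.

```python
import re


def retransmit_ratio(netstat_s_output):
    """Return retransmitted / sent TCP segments, or None if not found."""
    sent = re.search(r"(\d+) segments sent out", netstat_s_output)
    retrans = re.search(r"(\d+) segments retransmitted", netstat_s_output)
    if not sent or not retrans or int(sent.group(1)) == 0:
        return None
    return int(retrans.group(1)) / int(sent.group(1))


# Lines copied from the Tcp: section above.
sample = """Tcp:
    4973994 segments received
    6069615 segments sent out
    875032 segments retransmitted
"""
print(f"{retransmit_ratio(sample):.1%} of sent segments were retransmitted")
```

For the counters above this comes out at roughly 14%, and those counters are cumulative since boot.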

Now take a look at the MTU and the received and transmitted packet counts in the kernel interface table:

$ netstat -i
Kernel Interface table
Iface      MTU    RX-OK RX-ERR RX-DRP RX-OVR    TX-OK TX-ERR TX-DRP TX-OVR Flg
ens5      9001  5188212      0      0 0       6103306      0      0      0 BMRU
lo       65536   434754      0      0 0        434754      0      0      0 LRU

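If you want to quickly test your route to a server for MTU issues, send it a large ping. Below I am sending a 1490-byte payload to example.com (1518 bytes once the ICMP and IP headers are added), and it is successful. Note that on Linux you need to add “-M do” to prohibit fragmentation; without it, as here, an oversized packet may simply be fragmented along the way.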

$ ping -s 1490 example.com
PING example.com (93.184.216.119) 1490(1518) bytes of data.
1498 bytes from 93.184.216.119: icmp_seq=1 ttl=51 time=1119 ms
1498 bytes from 93.184.216.119: icmp_seq=2 ttl=51 time=1130 ms
1498 bytes from 93.184.216.119: icmp_seq=3 ttl=51 time=1260 ms
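
The sizes ping reports follow from the header overhead: an 8-byte ICMP header plus a 20-byte IPv4 header (assuming no IP options). This small sketch shows the arithmetic; the function names are my own.

```python
ICMP_HEADER = 8   # bytes
IPV4_HEADER = 20  # bytes, assuming no IP options


def packet_size(payload):
    """Total IPv4 packet size for a ping with the given payload (-s value)."""
    return payload + ICMP_HEADER + IPV4_HEADER


def max_payload(mtu):
    """Largest ping payload that fits in one packet for the given MTU."""
    return mtu - ICMP_HEADER - IPV4_HEADER


print(packet_size(1490))  # 1518, matching "1490(1518) bytes of data" above
print(max_payload(1500))  # 1472 for a standard Ethernet MTU
print(max_payload(9001))  # 8973 for the ens5 MTU shown by netstat -i
```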

11. top

The top command includes many of the metrics we checked earlier. It can be handy to run it to see if anything looks wildly different from the earlier commands, which would indicate that load is variable.

A downside to top is that it is harder to see patterns over time, which may be more clear in tools like vmstat and pidstat, which provide rolling output. Evidence of intermittent issues can also be lost if you don’t pause the output quickly enough (Ctrl-S to pause, Ctrl-Q to continue), and the screen clears.

$ top
top - 00:15:40 up 21:56,  1 user,  load average: 31.09, 29.87, 29.92
Tasks: 871 total,   1 running, 868 sleeping,   0 stopped,   2 zombie
%Cpu(s): 96.8 us,  0.4 sy,  0.0 ni,  2.7 id,  0.1 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:  25190241+total, 24921688 used, 22698073+free,    60448 buffers
KiB Swap:        0 total,        0 used,        0 free.   554208 cached Mem

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 20248 root      20   0  0.227t 0.012t  18748 S  3090  5.2  29812:58 java
  4213 root      20   0 2722544  64640  44232 S  23.5  0.0 233:35.37 mesos-slave
 66128 titancl+  20   0   24344   2332   1172 R   1.0  0.0   0:00.07 top
  5235 root      20   0 38.227g 547004  49996 S   0.7  0.2   2:02.74 java
  4299 root      20   0 20.015g 2.682g  16836 S   0.3  1.1  33:14.42 java
     1 root      20   0   33620   2920   1496 S   0.0  0.0   0:03.82 init
     2 root      20   0       0      0      0 S   0.0  0.0   0:00.02 kthreadd
     3 root      20   0       0      0      0 S   0.0  0.0   0:05.35 ksoftirqd/0
     5 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/0:0H
     6 root      20   0       0      0      0 S   0.0  0.0   0:06.94 kworker/u256:0
     8 root      20   0       0      0      0 S   0.0  0.0   2:38.05 rcu_sched

The columns in the task list are:

  • PID: The task’s unique process ID.
  • USER: The user name of the task’s owner.
  • PR: The task’s priority. The lower the number, the higher the priority.
  • NI: The nice value of the task. A negative nice value means higher priority, and a positive nice value means lower priority.
  • VIRT: Total virtual memory used by the task.
  • RES: How much physical RAM the task is using, measured in kilobytes.
  • SHR: The shared memory size (in kilobytes) used by the task.
  • %CPU: The task’s CPU usage.
  • %MEM: The task’s memory usage.
  • TIME+: CPU time, the same as ‘TIME’, but with granularity down to hundredths of a second.
  • COMMAND: The name of the command that started the process.

Follow-on Analysis

There are many more commands and methodologies you can apply to drill deeper. See Brendan’s Linux Performance Tools tutorial from Velocity 2015, which works through over 40 commands, covering observability, benchmarking, tuning, static performance tuning, profiling, and tracing.

Mac OS X: Using nmap or sslscan to review the ciphers supported by a website

To retrieve a list of the SSL/TLS cipher suites a particular website offers, you can use either sslscan or nmap:

brew install sslscan
sslscan andrewbaker.ninja
Version: 2.0.15
OpenSSL 3.0.7 1 Nov 2022

Connected to 13.244.140.33

Testing SSL server andrewbaker.ninja on port 443 using SNI name andrewbaker.ninja

  SSL/TLS Protocols:
SSLv2     disabled
SSLv3     disabled
TLSv1.0   enabled
TLSv1.1   enabled
TLSv1.2   enabled
TLSv1.3   enabled

  TLS Fallback SCSV:
Server supports TLS Fallback SCSV

  TLS renegotiation:
Secure session renegotiation supported

  TLS Compression:
OpenSSL version does not support compression
Rebuild with zlib1g-dev package for zlib support

  Heartbleed:
TLSv1.3 not vulnerable to heartbleed
TLSv1.2 not vulnerable to heartbleed
TLSv1.1 not vulnerable to heartbleed
TLSv1.0 not vulnerable to heartbleed

  Supported Server Cipher(s):
Preferred TLSv1.3  256 bits  TLS_AES_256_GCM_SHA384        Curve 25519 DHE 253
Accepted  TLSv1.3  256 bits  TLS_CHACHA20_POLY1305_SHA256  Curve 25519 DHE 253
Accepted  TLSv1.3  128 bits  TLS_AES_128_GCM_SHA256        Curve 25519 DHE 253
Preferred TLSv1.2  256 bits  ECDHE-ECDSA-AES256-GCM-SHA384 Curve 25519 DHE 253
Accepted  TLSv1.2  128 bits  ECDHE-ECDSA-AES128-GCM-SHA256 Curve 25519 DHE 253
Accepted  TLSv1.2  256 bits  ECDHE-ECDSA-AES256-SHA384     Curve 25519 DHE 253
Accepted  TLSv1.2  256 bits  ECDHE-ECDSA-CAMELLIA256-SHA384 Curve 25519 DHE 253
Accepted  TLSv1.2  128 bits  ECDHE-ECDSA-AES128-SHA256     Curve 25519 DHE 253
Accepted  TLSv1.2  128 bits  ECDHE-ECDSA-CAMELLIA128-SHA256 Curve 25519 DHE 253
Accepted  TLSv1.2  256 bits  ECDHE-ECDSA-CHACHA20-POLY1305 Curve 25519 DHE 253
Accepted  TLSv1.2  256 bits  ECDHE-ECDSA-AES256-CCM8       Curve 25519 DHE 253
Accepted  TLSv1.2  256 bits  ECDHE-ECDSA-AES256-CCM        Curve 25519 DHE 253
Accepted  TLSv1.2  256 bits  ECDHE-ECDSA-ARIA256-GCM-SHA384 Curve 25519 DHE 253
Accepted  TLSv1.2  128 bits  ECDHE-ECDSA-AES128-CCM8       Curve 25519 DHE 253
Accepted  TLSv1.2  128 bits  ECDHE-ECDSA-AES128-CCM        Curve 25519 DHE 253
Accepted  TLSv1.2  128 bits  ECDHE-ECDSA-ARIA128-GCM-SHA256 Curve 25519 DHE 253
Accepted  TLSv1.2  256 bits  ECDHE-ECDSA-AES256-SHA        Curve 25519 DHE 253
Accepted  TLSv1.2  128 bits  ECDHE-ECDSA-AES128-SHA        Curve 25519 DHE 253
Preferred TLSv1.1  256 bits  ECDHE-ECDSA-AES256-SHA        Curve 25519 DHE 253
Accepted  TLSv1.1  128 bits  ECDHE-ECDSA-AES128-SHA        Curve 25519 DHE 253
Preferred TLSv1.0  256 bits  ECDHE-ECDSA-AES256-SHA        Curve 25519 DHE 253
Accepted  TLSv1.0  128 bits  ECDHE-ECDSA-AES128-SHA        Curve 25519 DHE 253

  Server Key Exchange Group(s):
TLSv1.3  128 bits  secp256r1 (NIST P-256)
TLSv1.3  192 bits  secp384r1 (NIST P-384)
TLSv1.3  260 bits  secp521r1 (NIST P-521)
TLSv1.3  128 bits  x25519
TLSv1.3  224 bits  x448
TLSv1.2  128 bits  secp256r1 (NIST P-256)

  SSL Certificate:
Signature Algorithm: sha256WithRSAEncryption
ECC Curve Name:      prime256v1
ECC Key Strength:    128

Subject:  andrewbaker.ninja
Altnames: DNS:andrewbaker.ninja, DNS:www.andrewbaker.ninja
Issuer:   R3

Not valid before: Nov  4 23:00:13 2022 GMT
Not valid after:  Feb  2 23:00:12 2023 GMT
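
If you scan more than a handful of sites, it can help to pull out the legacy protocols automatically. The sketch below is a rough illustration that assumes the “name enabled/disabled” layout of the SSL/TLS Protocols section shown above; the function name and the LEGACY set are my own choices (TLS 1.0 and TLS 1.1 were formally deprecated by RFC 8996).

```python
# Protocol versions treated as legacy in this sketch (my own choice).
LEGACY = {"SSLv2", "SSLv3", "TLSv1.0", "TLSv1.1"}


def enabled_legacy_protocols(sslscan_output):
    """Return the legacy protocols that sslscan reports as enabled."""
    found = []
    for line in sslscan_output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in LEGACY and parts[1] == "enabled":
            found.append(parts[0])
    return found


# Protocol lines copied from the sslscan output above.
sample = """SSLv2     disabled
SSLv3     disabled
TLSv1.0   enabled
TLSv1.1   enabled
TLSv1.2   enabled
TLSv1.3   enabled
"""
print(enabled_legacy_protocols(sample))  # prints ['TLSv1.0', 'TLSv1.1']
```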

Alternatively, you can just use nmap (note: I use “-e en0” to bypass Zscaler):

% brew install nmap
% nmap --script ssl-enum-ciphers -p 443 andrewbaker.ninja -e en0
Starting Nmap 7.93 ( https://nmap.org ) at 2022-11-19 22:30 SAST
Nmap scan report for andrewbaker.ninja (13.244.140.33)
Host is up (0.014s latency).
rDNS record for 13.244.140.33: ec2-13-244-140-33.af-south-1.compute.amazonaws.com

PORT    STATE SERVICE
443/tcp open  https
| ssl-enum-ciphers:
|   TLSv1.0:
|     ciphers:
|       TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA (ecdh_x25519) - A
|       TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA (ecdh_x25519) - A
|     compressors:
|       NULL
|     cipher preference: server
|   TLSv1.1:
|     ciphers:
|       TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA (ecdh_x25519) - A
|       TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA (ecdh_x25519) - A
|     compressors:
|       NULL
|     cipher preference: server
|   TLSv1.2:
|     ciphers:
|       TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384 (ecdh_x25519) - A
|       TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256 (ecdh_x25519) - A
|       TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA384 (ecdh_x25519) - A
|       TLS_ECDHE_ECDSA_WITH_CAMELLIA_256_CBC_SHA384 (ecdh_x25519) - A
|       TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256 (ecdh_x25519) - A
|       TLS_ECDHE_ECDSA_WITH_CAMELLIA_128_CBC_SHA256 (ecdh_x25519) - A
|       TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256 (ecdh_x25519) - A
|       TLS_ECDHE_ECDSA_WITH_AES_256_CCM_8 (ecdh_x25519) - A
|       TLS_ECDHE_ECDSA_WITH_AES_256_CCM (ecdh_x25519) - A
|       TLS_ECDHE_ECDSA_WITH_ARIA_256_GCM_SHA384 (ecdh_x25519) - A
|       TLS_ECDHE_ECDSA_WITH_AES_128_CCM_8 (ecdh_x25519) - A
|       TLS_ECDHE_ECDSA_WITH_AES_128_CCM (ecdh_x25519) - A
|       TLS_ECDHE_ECDSA_WITH_ARIA_128_GCM_SHA256 (ecdh_x25519) - A
|       TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA (ecdh_x25519) - A
|       TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA (ecdh_x25519) - A
|     compressors:
|       NULL
|     cipher preference: server
|   TLSv1.3:
|     ciphers:
|       TLS_AKE_WITH_AES_256_GCM_SHA384 (ecdh_x25519) - A
|       TLS_AKE_WITH_CHACHA20_POLY1305_SHA256 (ecdh_x25519) - A
|       TLS_AKE_WITH_AES_128_GCM_SHA256 (ecdh_x25519) - A
|     cipher preference: server
|_  least strength: A

Nmap done: 1 IP address (1 host up) scanned in 1.52 seconds

Another variant, which includes the certificate dates (again, “-e en0” is used to bypass Zscaler):

$ nmap -e en0 --script ssl-cert -p 443 andrewbaker.ninja
Starting Nmap 7.93 ( https://nmap.org ) at 2023-06-23 18:41 SAST
Nmap scan report for andrewbaker.ninja (13.244.140.33)
Host is up (0.019s latency).
rDNS record for 13.244.140.33: ec2-13-244-140-33.af-south-1.compute.amazonaws.com

PORT    STATE SERVICE
443/tcp open  https
| ssl-cert: Subject: commonName=andrewbaker.ninja
| Subject Alternative Name: DNS:andrewbaker.ninja, DNS:www.andrewbaker.ninja
| Issuer: commonName=Zscaler Intermediate Root CA (zscaler.net) (t) /organizationName=Zscaler Inc./stateOrProvinceName=California/countryName=US
| Public Key type: rsa
| Public Key bits: 2048
| Signature Algorithm: sha256WithRSAEncryption
| Not valid before: 2023-06-17T02:07:23
| Not valid after:  2023-07-01T02:07:23
| MD5:   a20b5ae2900569601de116b49b7a29bd
|_SHA-1: 27d681607f0ccffbec6e303d14d6d41fd24c0851

Nmap done: 1 IP address (1 host up) scanned in 0.59 seconds
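
The two “Not valid” timestamps from ssl-cert can be turned into a validity period with the standard library. This is a minimal sketch assuming the ISO-style timestamps nmap prints above; the function name validity_days is my own.

```python
from datetime import datetime


def validity_days(not_before, not_after):
    """Return the whole number of days between two ISO-format timestamps."""
    start = datetime.fromisoformat(not_before)
    end = datetime.fromisoformat(not_after)
    return (end - start).days


# Timestamps copied from the ssl-cert output above.
print(validity_days("2023-06-17T02:07:23", "2023-07-01T02:07:23"))  # prints 14
```

The same subtraction against today’s date tells you how long you have before the certificate expires.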