Linux: Find the maximum packet size (MTU) between two hosts (using do not fragment flag)


If you have ever tried to use jumbo packets, or trace a weird slowness on the network – one of the things that frequently comes up is packet fragmentation. This is basically where a source machine is sending bigger packets than can be consumed along its pathway to a destination machine. This means the packets will need to be split up. This causes a host of performance issues.

So how do you diagnose this? Well, ping is your friend. It lets you set the don't-fragment (DF) flag on its packets and choose the payload size. The example below sends a 1460-byte payload with DF set from the host to example.com:

$ ping -M do -s 1460 example.com 
PING example.com (93.184.216.34) 1460(1488) bytes of data. 
1468 bytes from 93.184.216.34 (93.184.216.34): icmp_seq=1 ttl=45 time=223 ms
1468 bytes from 93.184.216.34 (93.184.216.34): icmp_seq=2 ttl=45 time=223 ms
1468 bytes from 93.184.216.34 (93.184.216.34): icmp_seq=3 ttl=45 time=223 ms

Taking the example above and running on a Macbook/OSX:

$ ping -D -s 1460 example.com
PING example.com (93.184.216.34): 1460 data bytes
Request timeout for icmp_seq 0
Request timeout for icmp_seq 1
Request timeout for icmp_seq 2
Request timeout for icmp_seq 3
Request timeout for icmp_seq 4

The standard Ethernet MTU, and so the usual maximum packet size across the internet, is 1500 bytes. So 1490 should be fine, right?

$ ping -M do -s 1490 example.com 
PING example.com (93.184.216.34) 1490(1518) bytes of data. 
ping: local error: Message too long, mtu=1500
ping: local error: Message too long, mtu=1500

The same test on Macbook/OSX:

$ ping -D -s 1490 example.com
PING example.com (93.184.216.34): 1490 data bytes
Request timeout for icmp_seq 0
Request timeout for icmp_seq 1
Request timeout for icmp_seq 2
Request timeout for icmp_seq 3
Request timeout for icmp_seq 4

As you can see, this fails even though 1490 is below the expected 1500-byte limit. Running “ping -M do -s 1490 example.com” sets the ICMP data size to 1490 bytes and forbids fragmentation. But that 1490 is only the ICMP payload: adding the 8-byte ICMP header gives 1498 bytes, and adding the 20-byte IP header brings the IP packet to 1518 bytes. A packet can’t exceed the MTU of the interface, which the error message reports as 1500 bytes, so this packet could only be sent in fragments. Since fragmentation is not allowed, ping fails with “Message too long”. The largest payload that fits in a 1500-byte MTU is therefore 1500 - 28 = 1472 bytes.
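The header arithmetic above can be sketched as a couple of shell helpers. This is only a minimal sketch: it assumes IPv4 with a plain 20-byte IP header (no options) plus the 8-byte ICMP header, and the function names are made up for illustration.

```shell
# Hypothetical helpers (the names are illustrative, not part of ping).
# Assumes IPv4 without IP options: 20-byte IP header + 8-byte ICMP header.
IP_HEADER=20
ICMP_HEADER=8

# Largest "-s" value that fits in one unfragmented packet for a given MTU.
max_ping_payload() {
    echo $(( $1 - IP_HEADER - ICMP_HEADER ))
}

# Total IP packet size produced by a given "-s" value.
ping_packet_size() {
    echo $(( $1 + IP_HEADER + ICMP_HEADER ))
}

max_ping_payload 1500    # prints 1472
ping_packet_size 1490    # prints 1518, which is over a 1500-byte MTU
```

So with a 1500-byte MTU, “-s 1472” is the biggest payload that should go out with the don’t-fragment flag set.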

Ok, so what if I do this?

$ ping -M want -s 1490 example.com 
PING example.com (93.184.216.34) 1490(1518) bytes of data. 
1498 bytes from 93.184.216.34 (93.184.216.34): icmp_seq=1 ttl=45 time=223 ms
1498 bytes from 93.184.216.34 (93.184.216.34): icmp_seq=2 ttl=45 time=223 ms
1498 bytes from 93.184.216.34 (93.184.216.34): icmp_seq=3 ttl=45 time=223 ms

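Ok, why did this work? Well, -M want still performs path MTU discovery but allows the packet to be fragmented locally when it is too large, so the 1518-byte packet no longer has to fit in a single 1500-byte MTU.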

Mac OS X: Find the maximum unfragmented packet size (MTU) to reach a host

If you have ever tried to use jumbo packets, or trace a weird slowness on the network – one of the things that frequently comes up is packet fragmentation. This is basically where a source machine is sending bigger packets than can be consumed along its pathway to a destination machine. This means the packets need to be split up, which causes a host of performance issues.

So how do you diagnose this? Well, ping is your friend. It lets you flag packets as do-not-fragment and specify a minimum and maximum payload size to sweep through. Using this, you can simply step through test sizes until a packet fails; the largest payload that still gets a reply, plus 28 bytes of IP and ICMP headers, is your path MTU.

The command below sends payloads from 1350 to 1520 bytes, increasing the size by 10 bytes each time.

ping -g 1350 -G 1520 -h 10 -D andrewbaker.ninja
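If your ping has no sweep option, the same idea can be scripted. Below is a rough sketch of a binary search over payload sizes; find_max_payload and probe_linux are names I made up, the probe assumes Linux iputils ping, and HOST is a placeholder. It also assumes the low bound you pass in is a size that really does get through.

```shell
# Binary-search the largest ICMP payload that passes unfragmented.
# $1 = low bound (assumed to pass), $2 = high bound, remaining args = probe
# command. The probe is called with the payload size appended and must
# exit 0 when the ping succeeds.
find_max_payload() {
    lo=$1; hi=$2; shift 2
    while [ "$lo" -lt "$hi" ]; do
        mid=$(( (lo + hi + 1) / 2 ))
        if "$@" "$mid" >/dev/null 2>&1; then
            lo=$mid              # mid fits: search the upper half
        else
            hi=$(( mid - 1 ))    # mid is too big: search the lower half
        fi
    done
    echo "$lo"
}

# Assumed probe for Linux: one echo request with DF set and a 1 s timeout.
# On macOS you would swap "-M do" for "-D".
probe_linux() {
    ping -M do -c 1 -W 1 -s "$1" "${HOST:-example.com}"
}

# Usage (needs network access, so it is left commented out):
# payload=$(find_max_payload 1000 1500 probe_linux)
# echo "path MTU is about $(( payload + 28 )) bytes"
```

A binary search needs far fewer pings than a linear sweep, at the cost of assuming that every size below the limit works and every size above it fails.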

Linux: Diagnose your linux server in under a minute using standard (free) command line tools

Imagine your server is in trouble and you could figure out what’s causing it in under a minute. Obviously, the preference is an observability platform, but for my little WordPress site I don’t really have the budget. So I just use a few tools to isolate common issues. The idea behind this blog is to quickly isolate the fault by looking for errors and saturation metrics, as they are both easy to interpret, and then check overall resource utilisation.

Note: Some of these commands require the sysstat package to be installed.

1. uptime

Might seem like an odd choice, but uptime provides more than just uptime. It is a quick way to view the load averages (over the last 1, 5 and 15 minutes), which indicate the number of processes running or waiting to run (on Linux this also counts processes blocked in uninterruptible I/O).

~$ uptime
 08:38:49 up 87 days, 18:31,  1 user,  load average: 70.34, 25.02, 0.00

The three load average numbers are the 1, 5 and 15 minute averages. Here the 1 minute figure is far higher than the 15 minute one, which shows a marked, recent increase in load. So something is definitely going on… maybe my web site went viral!!!
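Comparing the 1 minute and 15 minute figures is easy to script. Here is a small sketch; load_trend is a made-up name, and it assumes the Linux “load average: a, b, c” layout with a dot as the decimal separator.

```shell
# Read an uptime line on stdin and say whether load is rising or falling,
# by comparing the 1-minute average with the 15-minute average.
load_trend() {
    awk -F'load average: ' 'NF > 1 {
        split($2, la, ", ")
        one = la[1] + 0; fifteen = la[3] + 0
        if (one > fifteen)      print "rising"
        else if (one < fifteen) print "falling"
        else                    print "steady"
    }'
}

# Usage:
# uptime | load_trend
```

For the uptime line shown above (70.34 against 0.00) this reports “rising”.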

2. dmesg | tail

dmesg | tail shows the last 10 kernel messages (if there are any). Look for errors that can cause performance issues, such as the oom-killer firing or TCP dropping requests; the example below is from a healthy host and shows neither. If you don’t know what a message means, Google it (GIYF). Note: you can change the number of lines with tail’s -n option, and you don’t need sudo if your host is properly set up (unlike mine).

$ sudo dmesg | tail -n 10
[    3.453000] audit: type=1400 audit(1661436454.032:5): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/sbin/haveged" pid=293 comm="apparmor_parser"
[    3.466526] audit: type=1400 audit(1661436454.044:6): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/bin/man" pid=294 comm="apparmor_parser"
[    3.482004] audit: type=1400 audit(1661436454.044:7): apparmor="STATUS" operation="profile_load" profile="unconfined" name="man_filter" pid=294 comm="apparmor_parser"
[    3.496937] audit: type=1400 audit(1661436454.044:8): apparmor="STATUS" operation="profile_load" profile="unconfined" name="man_groff" pid=294 comm="apparmor_parser"
[    3.510178] audit: type=1400 audit(1661436454.084:9): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/sbin/chronyd" pid=292 comm="apparmor_parser"
[    4.310300] IPv6: ADDRCONF(NETDEV_UP): ens5: link is not ready
[    5.223697] IPv6: ADDRCONF(NETDEV_CHANGE): ens5: link becomes ready
[   24.859623] Adding 649996k swap on /mnt/.bitnami.swap.  Priority:-2 extents:15 across:1321740k SSFS
[1440586.071042] device-mapper: uevent: version 1.0.3
[1440586.075493] device-mapper: ioctl: 4.39.0-ioctl (2018-04-03) initialised: dm-devel@redhat.com

3. vmstat 1

vmstat is short for virtual memory statistics. vmstat was run with an argument of 1 which means it will print rolling one second summaries (until you hit Ctrl + C). The first line of output (in this version of vmstat) has some columns that show the average since boot, instead of the previous second. For now, skip the first line, unless you want to learn and remember which column is which.

Columns to check:

  • r: Number of processes running on CPU and waiting for a turn. This provides a better signal than load averages for determining CPU saturation, as it does not include I/O. To interpret: an “r” value greater than the CPU count is overloaded/saturated.
  • free: Free memory in kilobytes. If there are too many digits to count, you have enough free memory. The “free -m” command, included as command 7, better explains the state of free memory.
  • si, so: Swap-ins and swap-outs. If these are non-zero, you’re out of memory.
  • us, sy, id, wa, st: These are breakdowns of CPU time, on average across all CPUs. They are user time, system time (kernel), idle, wait I/O, and stolen time (by other guests, or with Xen, the guest’s own isolated driver domain).

The CPU time breakdowns will confirm if the CPUs are busy, by adding user + system time. A constant degree of wait I/O points to a disk bottleneck; this is where the CPUs are idle, because tasks are blocked waiting for pending disk I/O. You can treat wait I/O as another form of CPU idle, one that gives a clue as to why they are idle.

System time is necessary for I/O processing. A high system time average, over 20%, can be interesting to explore further: perhaps the kernel is processing the I/O inefficiently.

In the example below, the CPUs are almost entirely idle (id is around 100) and the “r” column is 0, so nothing is saturated. On a busy host where CPU time is almost entirely user-level, that points to application-level usage; CPUs that are well over 90% utilized on average aren’t necessarily a problem, so check for the degree of saturation using the “r” column.

$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0 322104 122480 130724 310440    0    0     1     8    2    1  0  0 100  0  0
 0  0 322104 122472 130724 310440    0    0     0     4  205  397  0  0 100  0  0
 0  0 322104 122472 130724 310440    0    0     0     0  187  379  0  1 100  0  0
 0  0 322104 122472 130724 310440    0    0     0     0  179  373  0  0 100  0  0
 0  0 322104 122472 130724 310440    0    0     0     4  187  381  0  0 99  0  0
 0  0 322104 134736 130724 310440    0    0     0     0  209  391  0  1 99  0  0
 0  0 322104 143156 130724 310440    0    0     0     0  176  374  0  0 100  0  0
 0  0 322104 143156 130728 310440    0    0     0    28  178  366  0  0 100  0  0
 0  0 322104 143156 130728 310440    0    0     0     0  171  372  0  0 100  0  0

## Now view free memory
$ free -m
              total        used        free      shared  buff/cache   available
Mem:            961         393         132          61         436         351
Swap:           634         314         320
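The two vmstat checks described above (run queue larger than the CPU count, and any swap activity) can be automated. This is a sketch only: vmstat_flags is a hypothetical helper, and it finds the r, si and so columns from the header row instead of assuming fixed positions.

```shell
# Flag vmstat samples that suggest CPU saturation or swapping.
# $1 = number of CPUs (e.g. from nproc); vmstat output is read from stdin.
vmstat_flags() {
    awk -v ncpu="$1" '
        $1 == "r" {                       # header row: map names to positions
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        !("r" in col) || $1 !~ /^[0-9]+$/ { next }   # skip banner rows
        { row++ }
        row == 1 { next }                 # first sample is the since-boot average
        $(col["r"]) + 0 > ncpu + 0 {
            print "sample " row ": run queue " $(col["r"]) " exceeds " ncpu " CPUs"
        }
        $(col["si"]) + $(col["so"]) > 0 {
            print "sample " row ": swapping (si=" $(col["si"]) ", so=" $(col["so"]) ")"
        }
    '
}

# Usage: ten one-second samples on this host
# vmstat 1 10 | vmstat_flags "$(nproc)"
```

It prints nothing for the idle output above, which is what you want to see.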

4. mpstat -P ALL 1

This command prints CPU time breakdowns per CPU, which can be used to check for an imbalance. A single hot CPU can be evidence of a saturated single-threaded application. Nothing of note in the example below.

## mpstat is part of sysstat - so might not be installed
$ sudo apt-get install sysstat
## Now run mpstat every second
$ mpstat -P ALL 1
Linux 4.19.0-21-cloud-amd64 (ip-172-31-20-121) 	11/21/2022 	_x86_64_	(2 CPU)

10:08:22 AM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
10:08:23 AM  all    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
10:08:23 AM    0    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
10:08:23 AM    1    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

5. pidstat 1

Pidstat is a little like top’s per-process summary, but prints a rolling summary instead of clearing the screen. This can be useful for watching patterns over time, and also recording what you saw (copy-n-paste) into a record of your investigation.

The example below identifies two java processes as responsible for consuming CPU. The %CPU column is the total across all CPUs; 1591% shows that the java process is consuming almost 16 CPUs.

$ pidstat 1
Linux 3.13.0-49-generic (titanclusters-xxxxx)  07/14/2015    _x86_64_    (32 CPU)

07:41:02 PM   UID       PID    %usr %system  %guest    %CPU   CPU  Command
07:41:03 PM     0         9    0.00    0.94    0.00    0.94     1  rcuos/0
07:41:03 PM     0      4214    5.66    5.66    0.00   11.32    15  mesos-slave
07:41:03 PM     0      4354    0.94    0.94    0.00    1.89     8  java
07:41:03 PM     0      6521 1596.23    1.89    0.00 1598.11    27  java
07:41:03 PM     0      6564 1571.70    7.55    0.00 1579.25    28  java
07:41:03 PM 60004     60154    0.94    4.72    0.00    5.66     9  pidstat

07:41:03 PM   UID       PID    %usr %system  %guest    %CPU   CPU  Command
07:41:04 PM     0      4214    6.00    2.00    0.00    8.00    15  mesos-slave
07:41:04 PM     0      6521 1590.00    1.00    0.00 1591.00    27  java
07:41:04 PM     0      6564 1573.00   10.00    0.00 1583.00    28  java
07:41:04 PM   108      6718    1.00    0.00    0.00    1.00     0  snmp-pass
07:41:04 PM 60004     60154    1.00    4.00    0.00    5.00     9  pidstat
^C
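To turn %CPU into “how many CPUs is this process eating”, you just divide by 100. The sketch below does that for every process using at least one full CPU; busy_processes is a made-up name, and the column positions are taken from the pidstat header so it should cope with both 12 and 24 hour timestamps.

```shell
# List processes using at least one full CPU, from pidstat output on stdin.
busy_processes() {
    awk '
        /%CPU/ { for (i = 1; i <= NF; i++) col[$i] = i; next }   # header row
        ("%CPU" in col) && NF >= col["Command"] && $(col["%CPU"]) + 0 >= 100 {
            printf "%s (pid %s): about %.1f CPUs\n",
                   $(col["Command"]), $(col["PID"]), $(col["%CPU"]) / 100
        }
    '
}

# Usage: five one-second samples
# pidstat 1 5 | busy_processes
```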

6. iostat -xz 1

$ iostat -xz 1
Linux 3.13.0-49-generic (titanclusters-xxxxx)  07/14/2015  _x86_64_ (32 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          73.96    0.00    3.73    0.03    0.06   22.21

Device:   rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvda        0.00     0.23    0.21    0.18     4.52     2.08    34.37     0.00    9.98   13.80    5.42   2.44   0.09
xvdb        0.01     0.00    1.02    8.94   127.97   598.53   145.79     0.00    0.43    1.78    0.28   0.25   0.25
xvdc        0.01     0.00    1.02    8.86   127.79   595.94   146.50     0.00    0.45    1.82    0.30   0.27   0.26
dm-0        0.00     0.00    0.69    2.32    10.47    31.69    28.01     0.01    3.23    0.71    3.98   0.13   0.04
dm-1        0.00     0.00    0.00    0.94     0.01     3.78     8.00     0.33  345.84    0.04  346.81   0.01   0.00
dm-2        0.00     0.00    0.09    0.07     1.35     0.36    22.50     0.00    2.55    0.23    5.62   1.78   0.03
[...]
^C

This is a great tool for understanding block devices (disks), both the workload applied and the resulting performance. Look for:

  • r/s, w/s, rkB/s, wkB/s: These are the delivered reads, writes, read Kbytes, and write Kbytes per second to the device. Use these for workload characterization. A performance problem may simply be due to an excessive load applied.
  • await: The average time for the I/O in milliseconds. This is the time that the application suffers, as it includes both time queued and time being serviced. Larger than expected average times can be an indicator of device saturation, or device problems.
  • avgqu-sz: The average number of requests issued to the device. Values greater than 1 can be evidence of saturation (although devices can typically operate on requests in parallel, especially virtual devices which front multiple back-end disks.)
  • %util: Device utilization. This is really a busy percent, showing the time each second that the device was doing work. Values greater than 60% typically lead to poor performance (which should be seen in await), although it depends on the device. Values close to 100% usually indicate saturation.

If the storage device is a logical disk device fronting many back-end disks, then 100% utilization may just mean that some I/O is being processed 100% of the time. The back-end disks may be far from saturated and able to handle much more work.

Bear in mind that poorly performing disk I/O isn’t necessarily an application issue. Many techniques are typically used to perform I/O asynchronously, so that the application doesn’t block and suffer the latency directly (e.g., read-ahead for reads, and buffering for writes).
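The %util rule of thumb above is easy to check automatically. A minimal sketch follows: iostat_busy is a hypothetical helper, the 60% default simply mirrors the threshold mentioned above, and the %util column is located from the “Device” header line because the column layout differs between sysstat versions.

```shell
# Print devices whose %util is above a threshold (default 60).
# iostat -x output is read from stdin; $1 = optional threshold in percent.
iostat_busy() {
    awk -v limit="${1:-60}" '
        $1 ~ /^Device/ {                  # header row: map names to positions
            for (i = 1; i <= NF; i++) col[$i] = i
            intable = 1
            next
        }
        NF == 0 { intable = 0; next }     # a blank line ends the device table
        intable && ("%util" in col) && $(col["%util"]) + 0 > limit + 0 {
            print $1 ": " $(col["%util"]) "% busy"
        }
    '
}

# Usage: three one-second reports, flag anything over 80% busy
# iostat -xz 1 3 | iostat_busy 80
```

Remember the caveat above: for a logical device fronting many disks, a high %util on its own doesn’t prove saturation, so check await as well.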

7. free -m

$ free -m
             total       used       free     shared    buffers     cached
Mem:        245998      24545     221453         83         59        541
-/+ buffers/cache:      23944     222053
Swap:            0          0          0

The right two columns show:

  • buffers: For the buffer cache, used for block device I/O.
  • cached: For the page cache, used by file systems.

We just want to check that these aren’t near-zero in size, which can lead to higher disk I/O (confirm using iostat), and worse performance. The above example looks fine, with many Mbytes in each.

The “-/+ buffers/cache” provides less confusing values for used and free memory. Linux uses free memory for the caches, but can reclaim it quickly if applications need it. So in a way the cached memory should be included in the free memory column, which this line does. There’s even a website, linuxatemyram, about this confusion.

It can be additionally confusing if ZFS on Linux is used, as we do for some services, as ZFS has its own file system cache that isn’t reflected properly by the free -m columns. It can appear that the system is low on free memory, when that memory is in fact available for use from the ZFS cache as needed.
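As a quick worked example of where the “-/+ buffers/cache” free figure comes from, assuming the older free layout shown above (separate buffers and cached columns); free_plus_cache is just an illustrative name:

```shell
# Memory that applications could claim = free + buffers + cached
# (all three in the same unit, MB when taken from "free -m").
free_plus_cache() {
    echo $(( $1 + $2 + $3 ))
}

# Values from the Mem: line above: free=221453, buffers=59, cached=541
free_plus_cache 221453 59 541    # prints 222053
```

That 222053 matches the free value on the “-/+ buffers/cache” line.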

8. sar -n DEV 1

Use this tool to check network interface throughput: rxkB/s and txkB/s, as a measure of workload, and also to check if any limit has been reached. In the example below, eth0 receive is reaching 22 Mbytes/s, which is 176 Mbits/sec (well under, say, a 1 Gbit/sec limit).

This version also has %ifutil for device utilization (max of both directions for full duplex), which is something we also use Brendan’s nicstat tool to measure. And like with nicstat, this is hard to get right, and seems to not be working in this example (0.00).

$ sar -n DEV 1
Linux 3.13.0-49-generic (titanclusters-xxxxx)  07/14/2015     _x86_64_    (32 CPU)

12:16:48 AM     IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s   %ifutil
12:16:49 AM      eth0  18763.00   5032.00  20686.42    478.30      0.00      0.00      0.00      0.00
12:16:49 AM        lo     14.00     14.00      1.36      1.36      0.00      0.00      0.00      0.00
12:16:49 AM   docker0      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00

12:16:49 AM     IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s   %ifutil
12:16:50 AM      eth0  19763.00   5101.00  21999.10    482.56      0.00      0.00      0.00      0.00
12:16:50 AM        lo     20.00     20.00      3.25      3.25      0.00      0.00      0.00      0.00
12:16:50 AM   docker0      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
^C

9. sar -n TCP,ETCP 1

This is a summarized view of some key TCP metrics. These include:

  • active/s: Number of locally-initiated TCP connections per second (e.g., via connect()).
  • passive/s: Number of remotely-initiated TCP connections per second (e.g., via accept()).
  • retrans/s: Number of TCP retransmits per second.

The active and passive counts are often useful as a rough measure of server load: number of new accepted connections (passive), and number of downstream connections (active). It might help to think of active as outbound, and passive as inbound, but this isn’t strictly true (e.g., consider a localhost to localhost connection).

Retransmits are a sign of a network or server issue; it may be an unreliable network (e.g., the public Internet), or it may be due to a server being overloaded and dropping packets. The example below shows just one new TCP connection per second.

$ sar -n TCP,ETCP 1
Linux 3.13.0-49-generic (titanclusters-xxxxx)  07/14/2015    _x86_64_    (32 CPU)

12:17:19 AM  active/s passive/s    iseg/s    oseg/s
12:17:20 AM      1.00      0.00  10233.00  18846.00

12:17:19 AM  atmptf/s  estres/s retrans/s isegerr/s   orsts/s
12:17:20 AM      0.00      0.00      0.00      0.00      0.00

12:17:20 AM  active/s passive/s    iseg/s    oseg/s
12:17:21 AM      1.00      0.00   8359.00   6039.00

12:17:20 AM  atmptf/s  estres/s retrans/s isegerr/s   orsts/s
12:17:21 AM      0.00      0.00      0.00      0.00      0.00
^C

10. netstat

For a proper rummage into the network, you can’t really beat netstat. Below are a few useful calls, including a quick summary, followed by picking out MTU issues. First get a summary:

$ netstat -s
Ip:
    Forwarding: 2
    5143907 total packets received
    4 with invalid addresses
    0 forwarded
    0 incoming packets discarded
    5143854 incoming packets delivered
    5546420 requests sent out
Icmp:
    456 ICMP messages received
    25 input ICMP message failed
    ICMP input histogram:
        destination unreachable: 446
        timeout in transit: 10
    0 ICMP messages sent
    0 ICMP messages failed
    ICMP output histogram:
IcmpMsg:
        InType3: 446
        InType11: 10
Tcp:
    127397 active connection openings
    334839 passive connection openings
    16631 failed connection attempts
    68477 connection resets received
    1 connections established
    4973994 segments received
    6069615 segments sent out
    875032 segments retransmitted
    3229 bad segments received
    92637 resets sent
    InCsumErrors: 3224
Udp:
    169404 packets received
    0 packets to unknown port received
    0 packet receive errors
    169404 packets sent
    0 receive buffer errors
    0 send buffer errors
UdpLite:
TcpExt:
    16631 resets received for embryonic SYN_RECV sockets
    124 packets pruned from receive queue because of socket buffer overrun
    26 ICMP packets dropped because they were out-of-window
    175789 TCP sockets finished time wait in fast timer
    309 packetes rejected in established connections because of timestamp
    123801 delayed acks sent
    132 delayed acks further delayed because of locked socket
    Quick ack mode was activated 6253 times
    22 SYNs to LISTEN sockets dropped
    704920 packet headers predicted
    1298057 acknowledgments not containing data payload received
    443211 predicted acknowledgments
    6 times recovered from packet loss due to fast retransmit
    TCPSackRecovery: 2370
    TCPSACKReneging: 5
    Detected reordering 7817 times using SACK
    Detected reordering 341 times using reno fast retransmit
    Detected reordering 257 times using time stamp
    98 congestion windows fully recovered without slow start
    250 congestion windows partially recovered using Hoe heuristic
    TCPDSACKUndo: 106
    643 congestion windows recovered without slow start after partial ack
    TCPLostRetransmit: 9844
    23 timeouts after reno fast retransmit
    TCPSackFailures: 336
    357 timeouts in loss state
    4267 fast retransmits
    1711 retransmits in slow start
    TCPTimeouts: 765878
    TCPLossProbes: 11885
    TCPLossProbeRecovery: 2984
    TCPSackRecoveryFail: 408
    TCPDSACKOldSent: 6375
    TCPDSACKOfoSent: 32
    TCPDSACKRecv: 3057
    TCPDSACKOfoRecv: 31
    4001 connections reset due to unexpected data
    62649 connections reset due to early user close
    1185 connections aborted due to timeout
    TCPDSACKIgnoredOld: 16
    TCPDSACKIgnoredNoUndo: 1905
    TCPSpuriousRTOs: 30
    TCPSackShifted: 704
    TCPSackMerged: 2102
    TCPSackShiftFallback: 17281
    TCPBacklogDrop: 3
    TCPDeferAcceptDrop: 276406
    TCPRcvCoalesce: 230111
    TCPOFOQueue: 3829
    TCPOFOMerge: 32
    TCPChallengeACK: 1117
    TCPSYNChallenge: 5
    TCPFastOpenCookieReqd: 7
    TCPSpuriousRtxHostQueues: 2
    TCPAutoCorking: 91106
    TCPFromZeroWindowAdv: 29
    TCPToZeroWindowAdv: 29
    TCPWantZeroWindowAdv: 301
    TCPSynRetrans: 845110
    TCPOrigDataSent: 3320995
    TCPHystartTrainDetect: 113
    TCPHystartTrainCwnd: 3689
    TCPHystartDelayDetect: 53
    TCPHystartDelayCwnd: 2057
    TCPACKSkippedSynRecv: 387
    TCPACKSkippedPAWS: 145
    TCPACKSkippedSeq: 418
    TCPACKSkippedTimeWait: 179
    TCPACKSkippedChallenge: 116
    TCPWinProbe: 25
    TCPDelivered: 3265480
    TCPDeliveredCE: 4
    TCPAckCompressed: 13
IpExt:
    InOctets: 1448714037
    OutOctets: 3058374840
    InNoECTPkts: 5355501
    InECT1Pkts: 298
    InECT0Pkts: 63984

Now take a look at the MTU and the received and transmitted packet counts in the kernel interface table:

$ netstat -i
Kernel Interface table
Iface      MTU    RX-OK RX-ERR RX-DRP RX-OVR    TX-OK TX-ERR TX-DRP TX-OVR Flg
ens5      9001  5188212      0      0 0       6103306      0      0      0 BMRU
lo       65536   434754      0      0 0        434754      0      0      0 LRU

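If you want to quickly test your route to a server with respect to MTU, send a ping with a large payload and see whether it gets through. Below I am testing a 1490-byte payload to example.com (and it’s successful). Note that this command does not set the don’t-fragment flag, so the packet may have been fragmented along the way; add -M do to make the test strict.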

$ ping -s 1490 example.com
PING example.com (93.184.216.119) 1490(1518) bytes of data.
1498 bytes from 93.184.216.119: icmp_seq=1 ttl=51 time=1119 ms
1498 bytes from 93.184.216.119: icmp_seq=2 ttl=51 time=1130 ms
1498 bytes from 93.184.216.119: icmp_seq=3 ttl=51 time=1260 ms

11. top

The top command includes many of the metrics we checked earlier. It can be handy to run it to see if anything looks wildly different from the earlier commands, which would indicate that load is variable.

A downside to top is that it is harder to see patterns over time, which may be more clear in tools like vmstat and pidstat, which provide rolling output. Evidence of intermittent issues can also be lost if you don’t pause the output quickly enough (Ctrl-S to pause, Ctrl-Q to continue), and the screen clears.

$ top
top - 00:15:40 up 21:56,  1 user,  load average: 31.09, 29.87, 29.92
Tasks: 871 total,   1 running, 868 sleeping,   0 stopped,   2 zombie
%Cpu(s): 96.8 us,  0.4 sy,  0.0 ni,  2.7 id,  0.1 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:  25190241+total, 24921688 used, 22698073+free,    60448 buffers
KiB Swap:        0 total,        0 used,        0 free.   554208 cached Mem

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 20248 root      20   0  0.227t 0.012t  18748 S  3090  5.2  29812:58 java
  4213 root      20   0 2722544  64640  44232 S  23.5  0.0 233:35.37 mesos-slave
 66128 titancl+  20   0   24344   2332   1172 R   1.0  0.0   0:00.07 top
  5235 root      20   0 38.227g 547004  49996 S   0.7  0.2   2:02.74 java
  4299 root      20   0 20.015g 2.682g  16836 S   0.3  1.1  33:14.42 java
     1 root      20   0   33620   2920   1496 S   0.0  0.0   0:03.82 init
     2 root      20   0       0      0      0 S   0.0  0.0   0:00.02 kthreadd
     3 root      20   0       0      0      0 S   0.0  0.0   0:05.35 ksoftirqd/0
     5 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/0:0H
     6 root      20   0       0      0      0 S   0.0  0.0   0:06.94 kworker/u256:0
     8 root      20   0       0      0      0 S   0.0  0.0   2:38.05 rcu_sched

  • PID: Shows task’s unique process id.
  • USER: User name of owner of task.
  • PR: The process’s priority. The lower the number, the higher the priority.
  • NI: Represents a Nice Value of task. A Negative nice value implies higher priority, and positive Nice value means lower priority.
  • VIRT: Total virtual memory used by the task.
  • RES: How much physical RAM the process is using, measured in kilobytes.
  • SHR: Represents the Shared Memory size (kb) used by a task.
  • %CPU: Represents the CPU usage.
  • %MEM: Shows the Memory usage of task.
  • TIME+: CPU Time, the same as ‘TIME’, but reflecting more granularity through hundredths of a second.
  • COMMAND: The name of the command that started the process.

Follow-on Analysis

There are many more commands and methodologies you can apply to drill deeper. See Brendan’s Linux Performance Tools tutorial from Velocity 2015, which works through over 40 commands, covering observability, benchmarking, tuning, static performance tuning, profiling, and tracing.

Mac OS X: Using nmap or sslscan to review the ciphers supported by a website

To retrieve a list of the SSL/TLS cipher suites a particular website offers, you can use either sslscan or nmap.

brew install sslscan
sslscan andrewbaker.ninja
Version: 2.0.15
OpenSSL 3.0.7 1 Nov 2022

Connected to 13.244.140.33

Testing SSL server andrewbaker.ninja on port 443 using SNI name andrewbaker.ninja

  SSL/TLS Protocols:
SSLv2     disabled
SSLv3     disabled
TLSv1.0   enabled
TLSv1.1   enabled
TLSv1.2   enabled
TLSv1.3   enabled

  TLS Fallback SCSV:
Server supports TLS Fallback SCSV

  TLS renegotiation:
Secure session renegotiation supported

  TLS Compression:
OpenSSL version does not support compression
Rebuild with zlib1g-dev package for zlib support

  Heartbleed:
TLSv1.3 not vulnerable to heartbleed
TLSv1.2 not vulnerable to heartbleed
TLSv1.1 not vulnerable to heartbleed
TLSv1.0 not vulnerable to heartbleed

  Supported Server Cipher(s):
Preferred TLSv1.3  256 bits  TLS_AES_256_GCM_SHA384        Curve 25519 DHE 253
Accepted  TLSv1.3  256 bits  TLS_CHACHA20_POLY1305_SHA256  Curve 25519 DHE 253
Accepted  TLSv1.3  128 bits  TLS_AES_128_GCM_SHA256        Curve 25519 DHE 253
Preferred TLSv1.2  256 bits  ECDHE-ECDSA-AES256-GCM-SHA384 Curve 25519 DHE 253
Accepted  TLSv1.2  128 bits  ECDHE-ECDSA-AES128-GCM-SHA256 Curve 25519 DHE 253
Accepted  TLSv1.2  256 bits  ECDHE-ECDSA-AES256-SHA384     Curve 25519 DHE 253
Accepted  TLSv1.2  256 bits  ECDHE-ECDSA-CAMELLIA256-SHA384 Curve 25519 DHE 253
Accepted  TLSv1.2  128 bits  ECDHE-ECDSA-AES128-SHA256     Curve 25519 DHE 253
Accepted  TLSv1.2  128 bits  ECDHE-ECDSA-CAMELLIA128-SHA256 Curve 25519 DHE 253
Accepted  TLSv1.2  256 bits  ECDHE-ECDSA-CHACHA20-POLY1305 Curve 25519 DHE 253
Accepted  TLSv1.2  256 bits  ECDHE-ECDSA-AES256-CCM8       Curve 25519 DHE 253
Accepted  TLSv1.2  256 bits  ECDHE-ECDSA-AES256-CCM        Curve 25519 DHE 253
Accepted  TLSv1.2  256 bits  ECDHE-ECDSA-ARIA256-GCM-SHA384 Curve 25519 DHE 253
Accepted  TLSv1.2  128 bits  ECDHE-ECDSA-AES128-CCM8       Curve 25519 DHE 253
Accepted  TLSv1.2  128 bits  ECDHE-ECDSA-AES128-CCM        Curve 25519 DHE 253
Accepted  TLSv1.2  128 bits  ECDHE-ECDSA-ARIA128-GCM-SHA256 Curve 25519 DHE 253
Accepted  TLSv1.2  256 bits  ECDHE-ECDSA-AES256-SHA        Curve 25519 DHE 253
Accepted  TLSv1.2  128 bits  ECDHE-ECDSA-AES128-SHA        Curve 25519 DHE 253
Preferred TLSv1.1  256 bits  ECDHE-ECDSA-AES256-SHA        Curve 25519 DHE 253
Accepted  TLSv1.1  128 bits  ECDHE-ECDSA-AES128-SHA        Curve 25519 DHE 253
Preferred TLSv1.0  256 bits  ECDHE-ECDSA-AES256-SHA        Curve 25519 DHE 253
Accepted  TLSv1.0  128 bits  ECDHE-ECDSA-AES128-SHA        Curve 25519 DHE 253

  Server Key Exchange Group(s):
TLSv1.3  128 bits  secp256r1 (NIST P-256)
TLSv1.3  192 bits  secp384r1 (NIST P-384)
TLSv1.3  260 bits  secp521r1 (NIST P-521)
TLSv1.3  128 bits  x25519
TLSv1.3  224 bits  x448
TLSv1.2  128 bits  secp256r1 (NIST P-256)

  SSL Certificate:
Signature Algorithm: sha256WithRSAEncryption
ECC Curve Name:      prime256v1
ECC Key Strength:    128

Subject:  andrewbaker.ninja
Altnames: DNS:andrewbaker.ninja, DNS:www.andrewbaker.ninja
Issuer:   R3

Not valid before: Nov  4 23:00:13 2022 GMT
Not valid after:  Feb  2 23:00:12 2023 GMT
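The protocol table near the top of that output is the first thing I look at, since TLS 1.0 and 1.1 are deprecated (as are SSLv2 and SSLv3). The sketch below picks out any of those that are still enabled; legacy_protocols is a hypothetical helper, and it assumes plain text input, so you would want to run sslscan with its colour output turned off (I believe the flag is --no-colour, but check your version).

```shell
# Print deprecated protocols that sslscan reports as enabled (stdin = sslscan output).
legacy_protocols() {
    awk '$2 == "enabled" &&
         ($1 == "SSLv2" || $1 == "SSLv3" || $1 == "TLSv1.0" || $1 == "TLSv1.1") {
             print $1
         }'
}

# Usage (flag name assumed, see above):
# sslscan --no-colour andrewbaker.ninja | legacy_protocols
```

For the scan above this lists TLSv1.0 and TLSv1.1.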

Alternatively, you can just use nmap (note: I use “-e en0” to bypass zscaler):

% brew install nmap
% nmap --script ssl-enum-ciphers -p 443 andrewbaker.ninja -e en0
Starting Nmap 7.93 ( https://nmap.org ) at 2022-11-19 22:30 SAST
Nmap scan report for andrewbaker.ninja (13.244.140.33)
Host is up (0.014s latency).
rDNS record for 13.244.140.33: ec2-13-244-140-33.af-south-1.compute.amazonaws.com

PORT    STATE SERVICE
443/tcp open  https
| ssl-enum-ciphers:
|   TLSv1.0:
|     ciphers:
|       TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA (ecdh_x25519) - A
|       TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA (ecdh_x25519) - A
|     compressors:
|       NULL
|     cipher preference: server
|   TLSv1.1:
|     ciphers:
|       TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA (ecdh_x25519) - A
|       TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA (ecdh_x25519) - A
|     compressors:
|       NULL
|     cipher preference: server
|   TLSv1.2:
|     ciphers:
|       TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384 (ecdh_x25519) - A
|       TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256 (ecdh_x25519) - A
|       TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA384 (ecdh_x25519) - A
|       TLS_ECDHE_ECDSA_WITH_CAMELLIA_256_CBC_SHA384 (ecdh_x25519) - A
|       TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256 (ecdh_x25519) - A
|       TLS_ECDHE_ECDSA_WITH_CAMELLIA_128_CBC_SHA256 (ecdh_x25519) - A
|       TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256 (ecdh_x25519) - A
|       TLS_ECDHE_ECDSA_WITH_AES_256_CCM_8 (ecdh_x25519) - A
|       TLS_ECDHE_ECDSA_WITH_AES_256_CCM (ecdh_x25519) - A
|       TLS_ECDHE_ECDSA_WITH_ARIA_256_GCM_SHA384 (ecdh_x25519) - A
|       TLS_ECDHE_ECDSA_WITH_AES_128_CCM_8 (ecdh_x25519) - A
|       TLS_ECDHE_ECDSA_WITH_AES_128_CCM (ecdh_x25519) - A
|       TLS_ECDHE_ECDSA_WITH_ARIA_128_GCM_SHA256 (ecdh_x25519) - A
|       TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA (ecdh_x25519) - A
|       TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA (ecdh_x25519) - A
|     compressors:
|       NULL
|     cipher preference: server
|   TLSv1.3:
|     ciphers:
|       TLS_AKE_WITH_AES_256_GCM_SHA384 (ecdh_x25519) - A
|       TLS_AKE_WITH_CHACHA20_POLY1305_SHA256 (ecdh_x25519) - A
|       TLS_AKE_WITH_AES_128_GCM_SHA256 (ecdh_x25519) - A
|     cipher preference: server
|_  least strength: A

Nmap done: 1 IP address (1 host up) scanned in 1.52 seconds
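The scan above shows TLSv1.0 and TLSv1.1 still enabled, and both have been formally deprecated (RFC 8996). As a rough sketch, the helper below reads saved or piped ssl-enum-ciphers output and prints any legacy protocol versions it finds. The function name legacy_tls_versions is my own invention, and it assumes the output layout shown above:

```shell
# Hypothetical helper: reads `nmap --script ssl-enum-ciphers` output on stdin
# and prints any legacy protocol versions that the server still offers.
legacy_tls_versions() {
    awk '$1 == "|" && ($2 == "SSLv3:" || $2 == "TLSv1.0:" || $2 == "TLSv1.1:") {
        sub(/:$/, "", $2)   # drop the trailing colon from the version label
        print $2
    }'
}
```

Usage would be something like: nmap --script ssl-enum-ciphers -p 443 andrewbaker.ninja -e en0 | legacy_tls_versions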

Another variant (including cert dates; again, “-e en0” is used to bypass Zscaler):

$ nmap -e en0 --script ssl-cert -p 443 andrewbaker.ninja
Starting Nmap 7.93 ( https://nmap.org ) at 2023-06-23 18:41 SAST
Nmap scan report for andrewbaker.ninja (13.244.140.33)
Host is up (0.019s latency).
rDNS record for 13.244.140.33: ec2-13-244-140-33.af-south-1.compute.amazonaws.com

PORT    STATE SERVICE
443/tcp open  https
| ssl-cert: Subject: commonName=andrewbaker.ninja
| Subject Alternative Name: DNS:andrewbaker.ninja, DNS:www.andrewbaker.ninja
| Issuer: commonName=Zscaler Intermediate Root CA (zscaler.net) (t) /organizationName=Zscaler Inc./stateOrProvinceName=California/countryName=US
| Public Key type: rsa
| Public Key bits: 2048
| Signature Algorithm: sha256WithRSAEncryption
| Not valid before: 2023-06-17T02:07:23
| Not valid after:  2023-07-01T02:07:23
| MD5:   a20b5ae2900569601de116b49b7a29bd
|_SHA-1: 27d681607f0ccffbec6e303d14d6d41fd24c0851

Nmap done: 1 IP address (1 host up) scanned in 0.59 seconds
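Two lines in this output are worth pulling out: the issuer and the expiry date. In the sample above the issuer is a Zscaler intermediate CA rather than a public CA, which suggests the connection was still being intercepted when that scan ran. Below is a minimal sketch that extracts just those two fields; cert_summary is a hypothetical name and it assumes the ssl-cert output format shown above:

```shell
# Hypothetical helper: reads `nmap --script ssl-cert` output on stdin
# and prints only the issuer and the expiry timestamp.
cert_summary() {
    sed -n \
        -e 's/^| Issuer: */issuer: /p' \
        -e 's/^| Not valid after: */expires: /p'
}
```

For example: nmap -e en0 --script ssl-cert -p 443 andrewbaker.ninja | cert_summary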

Mac OS X or Linux: Use terminal to get http/https response headers of a url using the curl command

Web devs often need to see the HTTP headers returned by their apps/webpages. This can easily be done with a browser plugin for Chrome or Firefox, but I prefer to use the terminal, and curl makes this really easy.

curl -I andrewbaker.ninja
HTTP/1.1 302 Found
Date: Thu, 17 Nov 2022 14:01:53 GMT
Server: Apache
X-Frame-Options: SAMEORIGIN
Location: http://ec2-13-246-2-19.af-south-1.compute.amazonaws.com/
Connection: close
Content-Type: text/html; charset=iso-8859-1

## Alternative
curl --head http://ec2-13-246-2-19.af-south-1.compute.amazonaws.com
HTTP/1.1 200 OK
Date: Thu, 17 Nov 2022 14:08:36 GMT
Server: Apache
X-Powered-By: PHP/7.3.18
Link: <http://ec2-13-246-2-19.af-south-1.compute.amazonaws.com/wp-json/>; rel="https://api.w.org/", <http://ec2-13-246-2-19.af-south-1.compute.amazonaws.com/wp-json/wp/v2/pages/78>; rel="alternate"; type="application/json", <http://ec2-13-246-2-19.af-south-1.compute.amazonaws.com/>; rel=shortlink
X-Frame-Options: SAMEORIGIN
Cache-Control: max-age=0, no-cache
Connection: close
Content-Type: text/html; charset=UTF-8
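If you only care about one header (for example the Location of a redirect), you can filter the curl output. The function below is a small sketch; header_value is a hypothetical name, it matches the header name case-insensitively, and it strips the carriage return that HTTP puts at the end of each header line:

```shell
# Hypothetical helper: prints the value of one HTTP header.
# $1 = header name (case-insensitive); raw headers are read from stdin.
header_value() {
    awk -v name="$1" '
        { sub(/\r$/, "") }                      # HTTP header lines end in CRLF
        {
            i = index($0, ":")
            if (i > 0 && tolower(substr($0, 1, i - 1)) == tolower(name)) {
                value = substr($0, i + 1)
                sub(/^[ \t]+/, "", value)       # trim leading whitespace
                print value
            }
        }'
}
```

For example: curl -sI andrewbaker.ninja | header_value Location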

Macbook: Exploring DNS using DIG (Domain Information Groper)

DIG is an awesome command line utility to explore DNS. Below is a quick guide to get you started.

Query Specific Name Server

By default, if no name server is specified, dig will use the servers listed in /etc/resolv.conf file. To view the default server use:

% cat /etc/resolv.conf
#
# macOS Notice
#
# This file is not consulted for DNS hostname resolution, address
# resolution, or the DNS query routing mechanism used by most
# processes on this system.
#
# To view the DNS configuration used by this system, use:
#   scutil --dns
#
# SEE ALSO
#   dns-sd(1), scutil(8)
#
# This file is automatically generated.
#
nameserver 100.64.0.1

You can override the name server the query is sent to by using the @ (at) symbol followed by the name server IP address or hostname.

For example, to query the Google name server (8.8.8.8) for information about andrewbaker.ninja you would use:

% dig andrewbaker.ninja @8.8.8.8

; <<>> DiG 9.10.6 <<>> andrewbaker.ninja @8.8.8.8
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 33993
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;andrewbaker.ninja.		IN	A

;; ANSWER SECTION:
andrewbaker.ninja.	300	IN	A	13.244.140.33

;; Query time: 1099 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Thu Nov 17 11:26:55 SAST 2022
;; MSG SIZE  rcvd: 62

Get a Short Answer

To get a short answer to your query, use the +short option:

% dig andrewbaker.ninja +short
13.244.140.33

Query a Record Type

Dig allows you to perform any valid DNS query by appending the record type to the end of the query. In the following section, we will show you examples of how to search for the most common records, such as A (the IP address), CNAME (canonical name), TXT (text record), MX (mail exchanger), and NS (name servers).

Querying A records

To get a list of all the address(es) for a domain name, use the a option:

% dig +nocmd andrewbaker.ninja a +noall +answer
andrewbaker.ninja.	156	IN	A	13.244.140.33

Querying CNAME records

To find the alias domain name use the cname option:

dig +nocmd mail.google.com cname +noall +answer
mail.google.com.	553482	IN	CNAME	googlemail.l.google.com.

Querying TXT records

Use the txt option to retrieve all the TXT records for a specific domain:

% dig +nocmd google.com txt +noall +answer
google.com.		3600	IN	TXT	"globalsign-smime-dv=CDYX+XFHUw2wml6/Gb8+59BsH31KzUr6c1l2BPvqKX8="
google.com.		3600	IN	TXT	"MS=E4A68B9AB2BB9670BCE15412F62916164C0B20BB"
google.com.		3600	IN	TXT	"docusign=1b0a6754-49b1-4db5-8540-d2c12664b289"
google.com.		3600	IN	TXT	"onetrust-domain-verification=de01ed21f2fa4d8781cbc3ffb89cf4ef"
google.com.		3600	IN	TXT	"apple-domain-verification=30afIBcvSuDV2PLX"
google.com.		3600	IN	TXT	"google-site-verification=TV9-DBe4R80X4v0M4U_bd_J9cpOJM0nikft0jAgjmsQ"
google.com.		3600	IN	TXT	"facebook-domain-verification=22rm551cu4k0ab0bxsw536tlds4h95"
google.com.		3600	IN	TXT	"webexdomainverification.8YX6G=6e6922db-e3e6-4a36-904e-a805c28087fa"
google.com.		3600	IN	TXT	"docusign=05958488-4752-4ef2-95eb-aa7ba8a3bd0e"
google.com.		3600	IN	TXT	"v=spf1 include:_spf.google.com ~all"
google.com.		3600	IN	TXT	"atlassian-domain-verification=5YjTmWmjI92ewqkx2oXmBaD60Td9zWon9r6eakvHX6B77zzkFQto8PQ9QsKnbf4I"
google.com.		3600	IN	TXT	"google-site-verification=wD8N7i1JTNTkezJ49swvWW48f8_9xveREV4oB-0Hf5o"

Querying MX records

To get a list of all the mail servers for a specific domain, use the mx option:

% dig +nocmd google.com mx +noall +answer
google.com.		48	IN	MX	10 smtp.google.com.

Querying All Records

Use the any option to request all DNS records for a specific domain (some name servers refuse or minimise ANY responses, so the list may be incomplete):

dig +nocmd andrewbaker.ninja any +noall +answer
andrewbaker.ninja.	300	IN	A	13.244.140.33
andrewbaker.ninja.	21600	IN	NS	ns-1254.awsdns-28.org.
andrewbaker.ninja.	21600	IN	NS	ns-1514.awsdns-61.org.
andrewbaker.ninja.	21600	IN	NS	ns-1728.awsdns-24.co.uk.
andrewbaker.ninja.	21600	IN	NS	ns-1875.awsdns-42.co.uk.
andrewbaker.ninja.	21600	IN	NS	ns-491.awsdns-61.com.
andrewbaker.ninja.	21600	IN	NS	ns-496.awsdns-62.com.
andrewbaker.ninja.	21600	IN	NS	ns-533.awsdns-02.net.
andrewbaker.ninja.	21600	IN	NS	ns-931.awsdns-52.net.
andrewbaker.ninja.	900	IN	SOA	ns-1363.awsdns-42.org. awsdns-hostmaster.amazon.com. 1 7200 900 1209600 86400
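When the answer section gets long, a quick tally per record type can help. This is only a sketch: count_record_types is a made-up name, and it assumes the "+noall +answer" layout shown above (name, TTL, class, type, data):

```shell
# Hypothetical helper: reads dig answer lines on stdin and counts them by record type.
count_record_types() {
    awk '$1 !~ /^;/ && NF >= 5 && $3 == "IN" { n[$4]++ }
         END { for (t in n) print t, n[t] }' | sort
}
```

For example: dig +nocmd andrewbaker.ninja any +noall +answer | count_record_types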

Tracing DNS Resolution

DNS query resolution follows a simple recursive process outlined below:

  1. You as the DNS client (or stub resolver) query your recursive resolver for www.example.com.
  2. Your recursive resolver queries the root name server for www.example.com.
  3. The root name server refers your recursive resolver to the .com Top-Level Domain (TLD) authoritative server.
  4. Your recursive resolver queries the .com TLD authoritative server for www.example.com.
  5. The .com TLD authoritative server refers your recursive server to the authoritative servers for example.com.
  6. Your recursive resolver queries the authoritative servers for www.example.com, and receives 1.2.3.4 as the answer.
  7. Your recursive resolver caches the answer for the duration of the time to live (TTL) specified on the record, and returns it to you.

Below is an example trace:

% dig +trace andrewbaker.ninja

; <<>> DiG 9.10.6 <<>> +trace andrewbaker.ninja
;; global options: +cmd
.			62163	IN	NS	g.root-servers.net.
.			62163	IN	NS	j.root-servers.net.
.			62163	IN	NS	e.root-servers.net.
.			62163	IN	NS	l.root-servers.net.
.			62163	IN	NS	d.root-servers.net.
.			62163	IN	NS	a.root-servers.net.
.			62163	IN	NS	b.root-servers.net.
.			62163	IN	NS	i.root-servers.net.
.			62163	IN	NS	m.root-servers.net.
.			62163	IN	NS	h.root-servers.net.
.			62163	IN	NS	c.root-servers.net.
.			62163	IN	NS	k.root-servers.net.
.			62163	IN	NS	f.root-servers.net.
.			62163	IN	RRSIG	NS 8 0 518400 20221129170000 20221116160000 18733 . MbE0OpdxRbInDK0olZm8n585L4oPq3q8iVbn/O0S7bfelS9wauhHQnnY Ifuj3D6Owp6R7H2Om6utfeB2kjrocJG9ZQPy0UQhWvgcFp9I4KnWRr1L H/yvmSM2EejR7kQHp4OBrb55RBsX4tojvr1UU+fWRuy988prwBVBdKj6 EElNwteQCosJHxVzqP0z6UpP9i5rUkRNGOD7OvdwF8ynBV93F4FpOI9r yuKzz0hdE3YAQJztOY84VuLkXM2DPs51LR6ftibxswUwoeUg04QUS7py gzn1z9en99oUgX+Lic6fLKc5Q0LpeZGhW0qBCY2CB9KEaRth+ZCD6WEU tjOBCw==
;; Received 525 bytes from 8.8.8.8#53(8.8.8.8) in 249 ms

ninja.			172800	IN	NS	v0n2.nic.ninja.
ninja.			172800	IN	NS	v2n1.nic.ninja.
ninja.			172800	IN	NS	v0n0.nic.ninja.
ninja.			172800	IN	NS	v0n1.nic.ninja.
ninja.			172800	IN	NS	v2n0.nic.ninja.
ninja.			172800	IN	NS	v0n3.nic.ninja.
ninja.			86400	IN	DS	46082 8 2 C8F816A7A575BDB2F997F682AAB2653BA2CB5EDDB69B036A30742A33 BEFAF141
ninja.			86400	IN	RRSIG	DS 8 1 86400 20221130050000 20221117040000 18733 . xoEolCAm4d+f6LxulPa/lnCwKuwWLPI8LzlgmOVvMNL7z8J/21FqTWBu 4tZT8KZTciAvcTcRo3TDAg0Qr48QvJI30ld4yYa81HGHpVKVuTSoNCtn FnxvCuZmqDY+aFM/zn9jSTdCcT8EhwLJrsHq/zj/iasymLZ/UvanJo8j X/PRSorGfWJjUeDSSjCOpOITjRLqzHeBcY9+Qpf7O5fDguqtkhzc/8pS qKmjUh2B+yJA4QgDSaoxdv9LRQIvdSL1Iwq9eAXnl9azJy3GbVIUVZCw bA8ZsFYhw9sQbk39ZDi3K4pS717uymh4RBlk4r/5EuqdKBpWFYdOW4ZC EGDBcg==
;; Received 763 bytes from 198.41.0.4#53(a.root-servers.net) in 285 ms

andrewbaker.ninja.	3600	IN	NS	ns-1363.awsdns-42.org.
andrewbaker.ninja.	3600	IN	NS	ns-1745.awsdns-26.co.uk.
andrewbaker.ninja.	3600	IN	NS	ns-462.awsdns-57.com.
andrewbaker.ninja.	3600	IN	NS	ns-983.awsdns-58.net.
4vnuq0b3phnjevus6h4meuj446b44iqj.ninja.	3600 IN	NSEC3 1 1 10 332539EE7F95C32A 4VVVNRI7K3EH48N753IKM6TUI5G921J7  NS SOA RRSIG DNSKEY NSEC3PARAM
4vnuq0b3phnjevus6h4meuj446b44iqj.ninja.	3600 IN	RRSIG NSEC3 8 2 3600 20221208121502 20221117111502 22878 ninja. RIuQHRcUrHqMNg1lab6s/oRNmflV4e+8r2553miiZdlGqCl8Q05+e1f5 /AY0enkAaG4DvoXCAlwroL7B7iYgivgrmPXklPTEahnzdeZV76UWimRs 2WjKLI9DSUsSl5yPZBDloqYBxhQlHwY7RPcKxELX2wO7ld8Dk+cSpQIu CQQ=
dg8umbqgrvdemk76n4dtbddckfghtloo.ninja.	3600 IN	NSEC3 1 1 10 332539EE7F95C32A DGG261SH46I7K27S1MPEID8CER0BFH07  NS DS RRSIG
dg8umbqgrvdemk76n4dtbddckfghtloo.ninja.	3600 IN	RRSIG NSEC3 8 2 3600 20221130155636 20221109145636 22878 ninja. b3g1om7FYmaboSk49ZuQC/wiyuZ0zQXOs/HbfrtDP1wUGyvXMAG1ofik //wSTVEvi7bufrbKUCSkBrxiBweSkRIKokaB/5j90Izpb9znaN0MWmOQ gywML7TQ3etOWb9s8L/oUmiBUUUtBtPGAy/e4hsbuYKQt+awJZVhR4G/ GBM=
;; Received 691 bytes from 65.22.21.4#53(v0n1.nic.ninja) in 892 ms

andrewbaker.ninja.	300	IN	A	13.244.140.33
andrewbaker.ninja.	172800	IN	NS	ns-1254.awsdns-28.org.
andrewbaker.ninja.	172800	IN	NS	ns-1514.awsdns-61.org.
andrewbaker.ninja.	172800	IN	NS	ns-1728.awsdns-24.co.uk.
andrewbaker.ninja.	172800	IN	NS	ns-1875.awsdns-42.co.uk.
andrewbaker.ninja.	172800	IN	NS	ns-491.awsdns-61.com.
andrewbaker.ninja.	172800	IN	NS	ns-496.awsdns-62.com.
andrewbaker.ninja.	172800	IN	NS	ns-533.awsdns-02.net.
andrewbaker.ninja.	172800	IN	NS	ns-931.awsdns-52.net.
;; Received 328 bytes from 205.251.195.215#53(ns-983.awsdns-58.net) in 53 ms

As you can see above, the first set of results is the NS (name server) records for the root domain (.), followed by the NS records for .ninja, then the NS records for andrewbaker.ninja (hosted in AWS), and finally the A record returned by one of those AWS name servers.
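The trace output is noisy because of the DNSSEC records, so here is a rough sketch that keeps only the ";; Received ... from ..." lines and prints which server answered each step and how long it took. The name trace_hops is hypothetical and the parsing assumes the line format shown in the trace above:

```shell
# Hypothetical helper: reads `dig +trace` output on stdin and prints
# "<answering server> <time> ms" for each step of the delegation chain.
trace_hops() {
    awk '/^;; Received / {
        server = $6                       # e.g. 198.41.0.4#53(a.root-servers.net)
        sub(/^[^(]*\(/, "", server)       # keep only the part inside the brackets
        sub(/\)$/, "", server)
        print server, $8, "ms"
    }'
}
```

For example: dig +trace andrewbaker.ninja | trace_hops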

Macbook: Show which applications have ports open and to what IP address

Below is a dump of examples of doing pretty much the same thing differently. I mostly use netstat and lsof, coupled with some bash scripts.

You can argue that this is overkill, but below is a simple bash function that you can paste into terminal and call it whenever you want to see which application/process IDs have open ports:

macnst (){ netstat -Watnlv | grep LISTEN | awk '{cmd="ps -o comm= -p " $9; cmd | getline procname; close(cmd); colred="\033[01;31m";colclr="\033[0m"; print colred "proto: " colclr $1 colred " | addr.port: " colclr $4 colred " | pid: " colclr $9 colred " | name: " colclr procname; }' | column -t -s "|"; }

## Example: 
proto: tcp46 addr.port: *.8770 pid: 1459 name: /usr/libexec/sharingd
proto: tcp4 addr.port: 127.0.0.1.9000 pid: 787 name: /Applications/Zscaler/Zscaler.app/Contents/PlugIns/ZscalerTunnel
proto: tcp4 addr.port: 100.64.0.1.9000 pid: 787 name: /Applications/Zscaler/Zscaler.app/Contents/PlugIns/ZscalerTunnel
proto: tcp6 addr.port: *.56365 pid: 1080 name: /usr/libexec/rapportd
proto: tcp4 addr.port: *.56365 pid: 1080 name: /usr/libexec/rapportd
proto: tcp4 addr.port: 100.64.0.1.9010 pid: 787 name: /usr/libexec/rapportd
proto: tcp6 addr.port: ::1.53 pid: 784 name: /opt/homebrew/opt/dnsmasq/sbin/dnsmasq
proto: tcp6 addr.port: fe80::1%lo0.53 pid: 784 name: /opt/homebrew/opt/dnsmasq/sbin/dnsmasq
proto: tcp6 addr.port: fe80::244b:70ff:fe0a:ffaa%anpi2.53 pid: 784 name: /opt/homebrew/opt/dnsmasq/sbin/dnsmasq
proto: tcp6 addr.port: fe80::244b:70ff:fe0a:ffa8%anpi0.53 pid: 784 name: /opt/homebrew/opt/dnsmasq/sbin/dnsmasq
proto: tcp6 addr.port: fe80::244b:70ff:fe0a:ffa9%anpi1.53 pid: 784 name: /opt/homebrew/opt/dnsmasq/sbin/dnsmasq
proto: tcp6 addr.port: fe80::109d:a6ff:fed1:244c%awdl0.53 pid: 784 name: /opt/homebrew/opt/dnsmasq/sbin/dnsmasq
proto: tcp6 addr.port: fe80::109d:a6ff:fed1:244c%llw0.53 pid: 784 name: /opt/homebrew/opt/dnsmasq/sbin/dnsmasq
proto: tcp4 addr.port: 127.0.0.1.53 pid: 784 name: /opt/homebrew/opt/dnsmasq/sbin/dnsmasq

Below is a netstat alternative that lists established TCP connections rather than listening ports:

$ netstat -ap tcp | grep ESTABLISHED 
tcp4 0 0 192.168.123.227.57278 52.114.104.174.https ESTABLISHED
tcp4 0 0 100.64.0.1.cslistener 52.114.104.174.57277 ESTABLISHED
tcp4 0 0 100.64.0.1.57277 52.114.104.174.https ESTABLISHED
tcp4 0 0 100.64.0.1.57275 13.89.179.10.https ESTABLISHED
tcp4 0 0 100.64.0.1.57262 40.79.141.153.https ESTABLISHED
tcp4 0 0 100.64.0.1.57258 52.97.201.226.https ESTABLISHED
tcp4 0 0 192.168.123.227.57250 52.113.194.132.https ESTABLISHED
tcp4 0 0 100.64.0.1.cslistener 52.113.194.132.57249 ESTABLISHED
tcp4 0 0 100.64.0.1.57249 52.113.194.132.https ESTABLISHED
tcp4 0 0 100.64.0.1.57240 193.0.160.129.https ESTABLISHED
tcp4 0 0 100.64.0.1.57239 jnb02s11-in-f6.1.https ESTABLISHED
tcp4 0 0 100.64.0.1.57238 944.bm-nginx-loa.https ESTABLISHED
tcp4 0 0 100.64.0.1.57237 159.248.227.35.b.https ESTABLISHED
tcp4 0 0 100.64.0.1.57236 ip98.ip-51-75-86.https ESTABLISHED
tcp4 0 0 100.64.0.1.57235 185.94.180.126.https ESTABLISHED
tcp4 0 0 100.64.0.1.57234 a-0001.a-msedge..https ESTABLISHED
tcp4 0 0 100.64.0.1.57233 a-0001.a-msedge..https ESTABLISHED

If you want to find the processes listening on a specific port, use the following:

sudo lsof -nP -i4TCP:9000 | grep LISTEN
ZscalerTu 787 root   49u  IPv4 0xfa4872984902c87f      0t0  TCP 100.64.0.1:9000 (LISTEN)
ZscalerTu 787 root   64u  IPv4 0xfa48729849d9138f      0t0  TCP 127.0.0.1:9000 (LISTEN)
## Then you can kill the process using: sudo kill <PID> (only reach for kill -9 if it ignores the normal signal)
sudo kill 787
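If you want just the process IDs (for example to feed into kill), you can strip them out of the lsof output. A minimal sketch, assuming the lsof column layout shown above where the PID is the second column; listen_pids is a hypothetical name:

```shell
# Hypothetical helper: reads lsof output on stdin and prints the unique
# PIDs of the sockets that are in the LISTEN state.
listen_pids() {
    awk '$NF == "(LISTEN)" { print $2 }' | sort -un
}
```

For example: sudo lsof -nP -i4TCP:9000 | listen_pids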

Following the theme of creating bash scripts for the sake of it, below is a simple listening script:

listening() {
    if [ $# -eq 0 ]; then
        sudo lsof -iTCP -sTCP:LISTEN -n -P
    elif [ $# -eq 1 ]; then
        sudo lsof -iTCP -sTCP:LISTEN -n -P | grep -i --color "$1"
    else
        echo "Usage: listening [pattern]"
    fi
}

## Example
% listening 9000
ZscalerTu 38629     root   13u  IPv4 0xfa48729848a2f4bf      0t0  TCP 100.64.0.1:9000 (LISTEN)
ZscalerTu 38629     root   14u  IPv4 0xfa48729849edffcf      0t0  TCP 127.0.0.1:9000 (LISTEN)

Next up, using lsof to view TCP sessions (-i4 : IPv4; -n : prevent conversion to host name; -P : prevent conversion of port numbers to service names):

sudo lsof -i4 -n -P | grep TCP | grep ESTABLISHED
identitys  1205       cp363412   37u  IPv6 0xfa487293786896c7      0t0    TCP [fe80:16::c79c:1b6f:a073:9eca]:1024->[fe80:16::e858:3f4a:1724:69c1]:1024 (ESTABLISHED)
identitys  1205       cp363412   38u  IPv6 0xfa4872937868cb47      0t0    TCP [fe80:16::c79c:1b6f:a073:9eca]:1025->[fe80:16::e858:3f4a:1724:69c1]:1026 (ESTABLISHED)
identitys  1205       cp363412   39u  IPv6 0xfa4872937868cb47      0t0    TCP [fe80:16::c79c:1b6f:a073:9eca]:1025->[fe80:16::e858:3f4a:1724:69c1]:1026 (ESTABLISHED)
Google     2149       cp363412   20u  IPv4 0xfa48729848bee74f      0t0    TCP 100.64.0.1:58416->172.217.170.10:443 (ESTABLISHED)
Google     2149       cp363412   26u  IPv4 0xfa48729848bfb25f      0t0    TCP 100.64.0.1:58600->216.58.223.132:443 (ESTABLISHED)
Google     2149       cp363412   30u  IPv4 0xfa48729848aa938f      0t0    TCP 100.64.0.1:58388->151.101.3.9:443 (ESTABLISHED)
Google     2149       cp363412   33u  IPv4 0xfa4872984590512f      0t0    TCP 100.64.0.1:58601->216.58.223.132:443 (ESTABLISHED)
Google     2149       cp363412   35u  IPv4 0xfa487298489734bf      0t0    TCP 100.64.0.1:58602->172.217.170.170:443 (ESTABLISHED)
Google     2149       cp363412   36u  IPv4 0xfa487298489cf25f      0t0    TCP 100.64.0.1:58470->13.244.140.33:443 (ESTABLISHED)
Google     2149       cp363412   41u  IPv4 0xfa487298458fde9f      0t0    TCP 100.64.0.1:58231->172.217.170.10:443 (ESTABLISHED)
Google     2149       cp363412   42u  IPv4 0xfa48729848b25e9f      0t0    TCP 100.64.0.1:58451->142.250.27.188:443 (ESTABLISHED)
Google     2149       cp363412   45u  IPv4 0xfa48729848a8fd6f      0t0    TCP 100.64.0.1:58452->142.250.27.188:443 (ESTABLISHED)
Google     2149       cp363412   47u  IPv4 0xfa48729848b19c3f      0t0    TCP 100.64.0.1:58473->172.217.170.99:443 (ESTABLISHED)
Google     2149       cp363412   57u  IPv4 0xfa48729849ee1c3f      0t0    TCP 100.64.0.1:57722->192.0.78.23:443 (ESTABLISHED)
Google     2149       cp363412   60u  IPv4 0xfa4872984908325f      0t0    TCP 100.64.0.1:57973->198.252.206.25:443 (ESTABLISHED)
WhatsApp   2225       cp363412   21u  IPv4 0xfa4872984590674f      0t0    TCP 192.168.123.227:58288->102.132.100.60:443 (ESTABLISHED)
UPMServic  2333           root  248u  IPv4 0xfa48729848b1325f      0t0    TCP 192.168.123.227:56364->147.161.204.128:443 (ESTABLISHED)
Microsoft 25966       cp363412   44u  IPv4 0xfa48729849d9dc3f      0t0    TCP 100.64.0.1:58615->52.112.238.155:443 (ESTABLISHED)
Microsoft 37667       cp363412   20u  IPv4 0xfa48729849ef9e9f      0t0    TCP 100.64.0.1:58566->52.113.194.132:443 (ESTABLISHED)
Microsoft 37667       cp363412   22u  IPv4 0xfa4872984901887f      0t0    TCP 100.64.0.1:58378->52.112.120.216:443 (ESTABLISHED)
Microsoft 37667       cp363412   23u  IPv4 0xfa487298489e34bf      0t0    TCP 100.64.0.1:58536->20.42.65.84:443 (ESTABLISHED)
Microsoft 37667       cp363412   24u  IPv4 0xfa4872984591487f      0t0    TCP 100.64.0.1:58613->52.112.238.155:443 (ESTABLISHED)
Microsoft 37667       cp363412   27u  IPv4 0xfa48729848bed12f      0t0    TCP 100.64.0.1:58549->52.114.228.1:443 (ESTABLISHED)
Microsoft 37678       cp363412   51u  IPv4 0xfa487298489ddc3f      0t0    TCP 192.168.123.227:56382->52.112.120.204:443 (ESTABLISHED)
Microsoft 37678       cp363412   59u  IPv4 0xfa4872984902912f      0t0    TCP 100.64.0.1:56147->52.114.224.23:443 (ESTABLISHED)
ZscalerTu 38629           root    8u  IPv4 0xfa48729848bde74f      0t0    TCP 100.64.0.1:9000->52.114.228.1:58549 (ESTABLISHED)
ZscalerTu 38629           root    9u  IPv4 0xfa48729849061c3f      0t0    TCP 192.168.123.227:58330->13.244.131.129:443 (ESTABLISHED)
ZscalerTu 38629           root   10u  IPv4 0xfa48729848a9de9f      0t0    TCP 192.168.123.227:58550->52.114.228.1:443 (ESTABLISHED)
ZscalerTu 38629           root   16u  IPv4 0xfa48729849eea74f      0t0    TCP 100.64.0.1:9000->52.113.194.132:58566 (ESTABLISHED)
ZscalerTu 38629           root   17u  IPv4 0xfa4872984904f25f      0t0    TCP 192.168.123.227:58567->52.113.194.132:443 (ESTABLISHED)
ZscalerTu 38629           root   20u  IPv4 0xfa487298489e725f      0t0    TCP 100.64.0.1:9000->52.112.238.155:58613 (ESTABLISHED)

For analysing what is using a port, lsof also shows the current TCP state of every connection on it:

sudo lsof -i tcp:9000
COMMAND     PID USER   FD   TYPE             DEVICE SIZE/OFF NODE NAME
ZscalerTu 53971 root   13u  IPv4 0xfa4872984902f4bf      0t0  TCP 100.64.0.1:cslistener (LISTEN)
ZscalerTu 53971 root   14u  IPv4 0xfa48729848bdf25f      0t0  TCP localhost:cslistener (LISTEN)
ZscalerTu 53971 root   18u  IPv4 0xfa487298489f112f      0t0  TCP 100.64.0.1:cslistener->147.161.204.128:63038 (ESTABLISHED)
ZscalerTu 53971 root   19u  IPv4 0xfa487298489f69af      0t0  TCP 100.64.0.1:cslistener->147.161.204.128:63036 (CLOSE_WAIT)
ZscalerTu 53971 root   24u  IPv4 0xfa4872984897674f      0t0  TCP 100.64.0.1:cslistener->a23-2-112-62.deploy.static.akamaitechnologies.com:63040 (ESTABLISHED)
ZscalerTu 53971 root   28u  IPv4 0xfa487298489d138f      0t0  TCP localhost:63045->localhost:cslistener (CLOSE_WAIT)
ZscalerTu 53971 root   29u  IPv4 0xfa4872984900912f      0t0  TCP localhost:cslistener->localhost:63045 (FIN_WAIT_2)

Above you can see port 9000 (the Zscaler port) just after I restarted Zscaler. The sockets are in different TCP states (LISTEN, ESTABLISHED, CLOSE_WAIT, FIN_WAIT_2) as the old connections close and new ones open.
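To get a quick feel for how many sockets are in each state, you can tally the last column of the lsof output. This is a sketch under the assumption that the state appears in brackets at the end of each line, as it does above; state_summary is a made-up name:

```shell
# Hypothetical helper: reads lsof output on stdin and counts sockets per TCP state.
state_summary() {
    awk '$NF ~ /^\(.+\)$/ {
        state = $NF
        gsub(/[()]/, "", state)    # "(LISTEN)" becomes "LISTEN"
        n[state]++
    }
    END { for (s in n) print s, n[s] }' | sort
}
```

For example: sudo lsof -i tcp:9000 | state_summary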

Macbook: MyTraceRoute, an alternative ICMP route tracing tool which works with Zscaler / Zero Trust architecture

If you’re on a zero trust network adapter like Zscaler or Netskope, you will see that traceroute doesn’t work as expected. The steps below show how to install mtr (my traceroute) using brew:

## Install xcode
xcode-select --install
## Install mtr
brew install mtr


Next we need to change the owner of the mtr-packet binary and its permissions, otherwise you will need to run mtr as root every time (adjust the 0.95 version in the paths below to match your install):

sudo chown root /opt/homebrew/Cellar/mtr/0.95/sbin/mtr-packet
sudo chmod 4755 /opt/homebrew/Cellar/mtr/0.95/sbin/mtr-packet
## Symlink to the new mtr package instead of the default Mac version
ln -s /opt/homebrew/Cellar/mtr/0.95/sbin/mtr /opt/homebrew/bin/
ln -s /opt/homebrew/Cellar/mtr/0.95/sbin/mtr-packet /opt/homebrew/bin/


To run a rolling traceroute with ICMP echoes, use the following:

mtr andrewbaker.ninja
Keys:  Help   Display mode   Restart statistics   Order of fields   quit
                                       Packets               Pings
 Host                                Loss%   Snt   Last   Avg  Best  Wrst StDev

The issue is that Zscaler will attempt to tunnel this traffic. This can be observed by viewing your current routes:

netstat -rn
Internet:
Destination        Gateway            Flags           Netif Expire
default            192.168.0.1        UGScg             en0
1                  100.64.0.1         UGSc            utun6
2/7                100.64.0.1         UGSc            utun6
4/6                100.64.0.1         UGSc            utun6
8/5                100.64.0.1         UGSc            utun6
10/12              100.64.0.1         UGSc            utun6
10.1.30.3          100.64.0.1         UGHS            utun6
10.1.30.15         100.64.0.1         UGHS            utun6
10.1.31/24         100.64.0.1         UGSc            utun6
10.1.31.3          100.64.0.1         UGHS            utun6
10.1.31.41         100.64.0.1         UGHS            utun6
10.1.31.101        100.64.0.1         UGHS            utun6
10.1.31.103        100.64.0.1         UGHS            utun6
10.10.0.11         100.64.0.1         UGHS            utun6
10.10.0.12         100.64.0.1         UGHS            utun6
10.10.160.86       100.64.0.1         UGHS            utun6

As you can see from the above, it lists the routes that are being sent to the Zscaler tunnel interface “utun6” (the interface name is unique to your machine but will look similar). To get around this you can specify the source interface mtr should run from with the “-I” flag. Below we instruct mtr to use en0 (the LAN cable):

mtr andrewbaker.ninja -I en0
                         Packets               Pings
 Host                  Loss%   Snt   Last   Avg  Best  Wrst StDev
 1. unfisecuregateway   1.8%    56    2.0   2.2   1.5   4.5   0.6
 2. 41.71.48.65         0.0%    56    4.2   8.1   3.1  28.3   6.0
 3. 41.74.176.249       0.0%    56    4.2   4.5   3.4   8.2   0.9
 4. 196.10.140.105      0.0%    55    3.0   4.0   2.6  18.8   2.4
 5. 52.93.57.88         0.0%    55    5.1   6.3   3.7  12.4   2.0
 6. 52.93.57.103        0.0%    55    4.9   4.1   2.6  12.5   1.5
 7. (waiting for reply)
 8. 150.222.94.230      0.0%    55    4.0   4.8   3.1  13.8   1.8
 9. 150.222.94.243      0.0%    55    4.3   5.3   2.9  37.6   5.2
10. 150.222.94.242      0.0%    55   15.2   4.9   2.9  15.2   2.2
11. 150.222.94.237      0.0%    55    3.4   5.7   3.1  18.9   2.9
12. 150.222.93.218      0.0%    55    4.6   5.5   3.8  11.4   1.3
13. (waiting for reply)
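Going back to the routing table for a moment: if you are not sure which destinations are being pushed into the tunnel, you can filter the netstat -rn output by interface. A minimal sketch, assuming the macOS column order shown earlier (Destination, Gateway, Flags, Netif); routes_via is a hypothetical name:

```shell
# Hypothetical helper: reads `netstat -rn` output on stdin and prints the
# destinations that are routed via the interface named in $1.
routes_via() {
    awk -v ifname="$1" '$4 == ifname { print $1 }'
}
```

For example: netstat -rn | routes_via utun6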

MTR supports TCP, UDP and SCTP based traceroutes. This is useful when testing path latency and packet loss in external or internal networks where QoS is applied to different protocols and ports. Multiple flags are available (man mtr), but for a TCP based MTR use -T (indicates TCP should be used) and -P (port to trace to):

mtr andrewbaker.ninja -T -P 443 -I en0

Ping specifying source interface

Ping supports specifying the source address you would like to initiate the ping from. The “-S” flag is followed by the source IP address (one that is assigned to an interface on your machine) that the ping should be sent from. This is useful if you want to ping an internal resource while bypassing a route manipulation tool such as Zscaler.

ping outlook.office.com -S 10.220.64.37

Technologists: Please Stop asking for requirements 😎

I think you’re a genius! You found this blog and you’re reading it – what more evidence do I need?! So why do you keep asking others to think for you?

There is a harmful bias built into most technology projects that assumes “the customer knows best”, and this is simply a lie. The customer will know what works and what doesn’t when you give them a product; but that’s not the same as being able to give specifications/requirements. Sadly, somehow technologists have been relegated to order takers who are unable to make decisions or move forward without detailed requirements. I disagree.

In general, everyone (including technologists) should fixate on understanding their customers, collaborating across all disciplines, testing ideas with customers, making decisions and executing. If you get it wrong, learn, get feedback, fix issues, then rinse and repeat. If you are going through a one way door or making a big call, then by all means validate. But don’t forget that you’re a genius and you work with other geniuses. So stop asking for requirements, switch your brain on and show off your unfiltered genius. You may even meet requirements that your customers haven’t even dreamt of!

Many corporate technology teams are unable to operate without an analyst to gather, collate and serve up pages of requirements. This learned helplessness is problematic. There are definitely times, especially on complex projects, where analysts working together with technologists can create more focus and speed up product development. But there is also a balance to be found, in that technology teams should feel confident to ideate solutions themselves.

Finally, one of the biggest causes of large delays on technology workstreams is the lack of challenge around requirements. If your customer wants an edge case feature that’s extremely difficult to do, then you should consider delaying it or even not doing it. Try to find a way around complex requirements, develop other features or evolve the feature into something that is deliverable. Never get bogged down on a requirement that will sink your project. You should always have way more features than you can ever deliver, so if you deliver everything your customer wanted there is an argument to say this is wasteful and indulgent. You will also be constantly disappointed when your customer changes their mind!

Macbook/Linux: Secure Copy from your local machine to an EC2 instance

I always forget the syntax of SCP and so this is a short article with a simple example of how to SCP a file from your laptop to your EC2 instance and how to copy it back from EC2 to your laptop:

Copying from Laptop to EC2

scp -i "mylocalpemfile.pem" mylocalfile.zip ec2-user@myEc2DnsOrIpAddress:/home/mydestinationfolder

scp -i identity_file.pem source_file.extension username@public_ipv4_dns:/remote_path

scp: Secure copy protocol
-i: Identity file
source_file.extension: The file that you want to copy
username: Username of the remote system (ubuntu for Ubuntu, ec2-user for Linux AMI or bitnami for wordpress)
public_ipv4_dns: DNS/IPv4 address of an instance
remote_path: Destination path

Copying from EC2 to your Laptop

scp -i "mylocalpemfile.pem" ec2-user@myEc2DnsOrIpAddress:/home/myEc2Folder/myfile.zip /Users/accountName/Downloads

scp -i identity_file.pem username@public_ipv4_dns:/remote_path/source_file.extension ~/destination_local_path
Ex: scp -i access.pem bitnami@0.0.0.0:/home/bitnami/temp.txt ~/Documents/destination_dir
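
Since I always forget the argument order, here is a small sketch of two helper functions that just print the scp command for each direction so you can check it before running it. The function names scp_up_cmd and scp_down_cmd are hypothetical, and nothing is copied until you run the printed command yourself:

```shell
# Hypothetical helper: print (not run) the scp command for laptop -> EC2.
# Arguments: key file, local file, remote user, remote host, remote path.
scp_up_cmd() {
    printf 'scp -i "%s" "%s" "%s@%s:%s"\n' "$1" "$2" "$3" "$4" "$5"
}

# Hypothetical helper: print (not run) the scp command for EC2 -> laptop.
# Arguments: key file, remote user, remote host, remote file, local path.
scp_down_cmd() {
    printf 'scp -i "%s" "%s@%s:%s" "%s"\n' "$1" "$2" "$3" "$4" "$5"
}
```

For example, scp_down_cmd access.pem bitnami 0.0.0.0 /home/bitnami/temp.txt ~/Documents/destination_dir prints the equivalent of the example above.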