AWS: Automatically Stop and Start your EC2 Services

Below is a quick (am busy) outline on how to automatically stop and start your EC2 instances.

Step 1: Tag your resources

In order to decide which instances stop and start you first need to add an auto-start-stop: Yes tag to all the instances you want to be affected by the start / stop functions. Note: You can use “Resource Groups and Tag Editor” to bulk apply these tags to the resources you want to be affected by the lambda functions you are going to create. See below (click the orange button called “Manage tags of Selected Resources”).

Step 2: Create a new role for our lambda functions

First we need to create the IAM role to run the Lambda functions. Go to IAM and click the “Create Role” button. Then select “AWS Service” from the “Trusted entity options”, and select Lambda from the “Use Cases” options. Then click “Next”, followed by “Create Policy”. To specify the permission, simply Click the JSON button on the right of the screen and enter the below policy (swapping the region and account id for your region and account id):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeInstances",
                "ec2:StartInstances",
                "ec2:DescribeTags",
                "logs:*",
                "ec2:DescribeInstanceTypes",
                "ec2:StopInstances",
                "ec2:DescribeInstanceStatus"
            ],
            "Resource": "arn:aws:ec2:<region>:<accountID>:instance/*",
            "Condition": {
                "StringEquals": {
                    "aws:ResourceTag/auto-start-stop": "Yes"
                }
            }
        }
    ]
}

Hit next and under “Review and create”, save the above policy as ec2-lambda-start-stop by clicking the “Create Policy” button. Next, search for this newly created policy and select it as per below and hit “Next”.

You will now see the “Name, review, and create” screen. Here you simply need to hit “Create Role” after you enter the role name as ec2-lambda-start-stop-role.

Note the policy is restricted to only have access to EC2 instances that contains auto-start-stop: Yes tags (least privileges).

If you want to review your role, this is how it should look. You can see I have filled in my region and account number in the policy:

Step 3: Create Lambda Functions To Start/Stop EC2 Instances

In this section we will create two lambda functions, one to start the instances and the other to stop the instances.

Step 3a: Add the Stop EC2 instance function

  • Goto Lambda console and click on create function
  • Create a lambda function with a function name of stop-ec2-instance-lambda, python3.11 runtime, and ec2-lambda-stop-start-role (see image below).

Next add the lamdba stop function and save it as stop-ec2-instance. Note, you will need to change the value of the region_name parameter accordingly.

import json
import boto3

ec2 = boto3.resource('ec2', region_name='af-south-1')
def lambda_handler(event, context):
   instances = ec2.instances.filter(Filters=[{'Name': 'instance-state-name', 'Values': ['running']},{'Name': 'tag:auto-start-stop','Values':['Yes']}])
   for instance in instances:
       id=instance.id
       ec2.instances.filter(InstanceIds=[id]).stop()
       print("Instance ID is stopped:- "+instance.id)
   return "success"

This is how your Lambda function should look:

Step 3b: Add the Start EC2 instance function

  • Goto Lambda console and click on create function
  • Create lambda functions with start-ec2-instance, python3.11 runtime, and ec2-lambda-stop-start-role.
  • Then add the below code and save the function as start-ec2-instance-lambda.

Note, you will need to change the value of the region_name parameter accordingly.

import json
import boto3

ec2 = boto3.resource('ec2', region_name='af-south-1')
def lambda_handler(event, context):
   instances = ec2.instances.filter(Filters=[{'Name': 'instance-state-name', 'Values': ['stopped']},{'Name': 'tag:auto-start-stop','Values':['Yes']}])
   for instance in instances:
       id=instance.id
       ec2.instances.filter(InstanceIds=[id]).stop()
       print("Instance ID is stopped:- "+instance.id)
   return "success"

4. Summary

If either of the above lambda functions are triggered, they will start or stop your EC2 instances based on the instance state and the value of auto-start-stop tag. To automate this you can simply setup up cron jobs, step functions, AWS Event Bridge, Jenkins etc.

How to Optimise your Technology Teams Structure to improve flow

I have seen many organisations restructure their technology teams over and over, but whichever model they opt for – they never seem to be able to get the desired results with respect to speed, resilience and quality. For this reason organisations will tend to oscillate from centralised teams, which are organised around skills and reuse, to federated teams that are organised around products and time to market. This article exams the failure modes for both centralised and federated teams and then questions whether there are better alternatives?

Day 1: Centralisation

Centralised technology teams tend to create frustrated product teams, long backlogs and lots of prioritisation escalations. Quality is normally fairly consistent (it can be consistently bad or consistently good), but speed is generally considered problematic.

These central teams will often institute some kind of ticketing system to coordinate their activities. They will even use tickets to coordinate activities between their own, narrowly focused central teams. These teams will create reports that demonstrate “millions” of tickets have been closed each month. This will be sold as some form of success / progress. Dependent product teams, on the other hand, will still struggle to get their work loads into production and frequently escalate the bottlenecks created using a centralised approach.

Central teams tend to focus on reusability, creating large consolidated central services and cost compression. Their architecture will tend to create massive “risk concentrators”, whereby they reuse the same infrastructure for the entire organisation. Any upgrades to these central services tend to be a life threatening event, making things like minor version changes and even patching extremely challenging. These central services will have poorly designed logical boundaries. This means that “bad” consumers of these shared services, can create saturation outages which affect the entire enterprise. These teams will be comfortable with mainframes, large physical datacenters and have a poor culture of learning. The technology stack will be at least 20 years old and you will often hear the term, “tried and tested”. They will view change as a bad thing and will create reports showing that change is causing outages. They will periodically suggest a slow down, or freezes to combat the great evil of “change”. There will be no attempt made to get better at delivering change and everything will be described as a “journey”. It will take years to get anything done in this world and the technology stack will be a legacy, expensive, immutable blob of tightly coupled vendor products.

Day 2: Lets Federate!

Eventually delivery pressure builds and the organisation capitulates into the chaotic world of federation. This is quickly followed by an explosion in headcount, as each product team attempts to utopian state of “end to end autonomy”. End to end autonomy, is the equivalent of absolute freedom – it simple does not and cannot exist. Why can’t you have an absolute state of full autonomy? It turns out that unless you’re a one product startup, you will have to share certain services and channels with other products. This means that any single products “autonomy” expressed in any shared channel/framework, ends up becoming another products “constraint”.

A great example of this is a client facing channel, like an app or a web site. Imagine if you carved up a channel into little product size pieces. Imagine how hard it would be to support your clients, where do you route support queries? Even something basic, like trying to keep the channel available would be difficult, as there is no single team stopping other teams from injecting failure planes and vendor SDKs into critical shared areas. In this world, each product does what it wants, when it wants, how it wants and the channel ends up yielding an inconsistent, frustrating and unstable customer experience. No product team will ever get around to dealing with complex issues, like fraud, cyber security or even basics like observability. Instead they will naturally chase PnL. In the end, you will have to resort to using social media to field complaints and resolve issues. Game theory depicts this as something called the “Tragedy of the Commons” – it’s these common assets that die in the federated world.

In the federated world, the lack of scope for staff results in ballooning your headcount and aggregating roles across multiple disciplines for the staff you have managed to hire. Highly skilled staff tend to get very bored with their “golden cages” and search out more challenging roles at other companies.

You will see lots of key man risk in this world. Because product teams can never fully fund their end to end autonomy – you will commonly see individuals that looks after networking, DBA and storage and cyber. When this person eventually resigns, the risks from their undocumented “tactile fixes” quickly start to materialise as the flow of outages starts to takes a grip. You will struggle to hire highly skilled resources into this model, as the scope of the roles you advertise are restrictively narrow, eg a DBA to look after a single database, a senior UX person to look after 3 screens. Obviously, if you manage to hire a senior UX person, you can then show them the 2 databases you also want them to manage 😂

If not this, then what?

Is the issue that we didn’t try hard enough? Maybe we should have given the model more time to bear fruit? Maybe we didn’t get the teams buy in? So what I am saying? I am saying BOTH federated and centralised models will eventually fail, because they are extremes. These are not the only choices on the table, there are a vast number of states in between, depending on your architecture, the size of your organization and the pools of skills you have.

Before you start tinkering with your organisations structure it’s key that you agree on what is the purpose of your organisation structure? Specifically – what are you trying to do and how are you trying to do it? Centralists will argue that economies of scale, better quality are key. But the federation proponents will point to time to market and speed. So how do you design your organisation?

There are two main parameters that you should try to optimise for:

  1. Domain Optimisation: Design your structure around people and skills (from the centralised model). Give your staff enough domain / scope to be able to solve complex problems and add value across multiple products in your enterprise. The benefit of teams with wide domains, is that you can put your best resources on your biggest problems. But watch out, because as the domain of each team increases, so will the dependencies on this team.
  2. Dependency Optimisation: Design your structure around flow/output by removing dependencies and enabling self service (from the federated model). Put simply, try to pay down dependencies by changing your architecture to enable self service such that product teams can execute quickly, whilst benefiting from high quality, reusable building blocks.

These two parameters are antagonistic, with your underlying architecture being the lever to change the gradient of the yield.

Domain Optimisation

Your company cannot be successful, if you narrow the scope of your roles down to a single product. Senior, skilled engineers need domain and companies need to make sure that complicated problems flow to those who can best solve these problems. Having myopically scoped roles, not only balloons your headcount, it also means that your best staff might be sat on your easiest problems. More than this, how do you practically hire the various disciplines you need, if you’re scope is restricted to a single product that might only occasionally use a fraction of your skills.

You need to give staff from a skill / discipline enough scope to make sure they are stretched and that you’re getting value from them. If this domain creates a bottleneck, then you should look to fracture these pools of skills by creating multiple layers. Keeping one team close to operational workloads (to reduce dependencies) and a second team looking a more strategic problems. For example, you can have a UX team look after a single channel, but also have a strategic UX team look after more long dated / complex UX challenges (like peer analysis, telemetry insights, redesigns etc).

Dependency Optimisation

As we already discussed, end to end autonomy is a bogus construct. But teams should absolutely look to shed as many dependencies as possible, so that they can yield flow without begging other teams to do their jobs. There are two ways of reducing dependency:

  1. Reduce the scope of a role and look at creating multiple pools of skills with different scopes.
  2. Change your technology architecture.

Typically only item 1) is considered, and this is the crux of this article. Performing periodic org structure rewrites simply gives you a new poison to swallow. This is great, maybe you like strawberry flavoured arsenic! But my question is, why not stop taking poison altogether?

If you look at the anecdotal graph below you can see the relationship between domain / scope and dependency. This graph shows you that as you reduce domain you reduce dependency. Put simply, the lower your dependancies them more “federated” your organisation and the more domain your staff have the more “centralised” your organisation is.

What you will also observe that poorly architectured systems exhibit a “dependency cliff”. What this means is that even as you reduce the scope of your roles – your will not see any dependency benefit. This is because your systems are so tightly coupled that any amount of org structure gymnastics will not give you lower dependencies. If you attempt to carve up any shared systems that are exhibiting a dependency cliff, you have to hire more staff, you will have more outages, less output and more escalations.

To resolve any dependency cliffs, you have a few choices:

  1. De-aggregate/re-architect workloads. If all your products sit in a single core banking platform, then do NOT buy a new core banking platform. Instead, rather rearchitect these products, to separate the shared services (eg ledger) from the product services (eg home loans). This is a complex topic and needs a detailed debate.
  2. Optimise workloads. Acknowledge that a channel or platform can be a product in its own right and ensure that most of the changes product teams want to make on a channel / platform can be automated. Create product specific pipelines, create product enclaves (eg PWA), allow the product teams the ability to store and update state on the channel without having to go through testing release cycles.
  3. Ensure any central services are opensourced. This will enable product teams to contribute changes and not feel “captive” to the cadence of central teams.
  4. Deliver all services with REST APIs to ensure all shared services can be self-service.

The Conclusion

There is no easy win when it comes to org structure, because its typically your architecture that drives all your issues. So shuffling people around from one line manager to another will achieve very little. If you want to be successful you will need to look at each service and product in detail and try and remove dependencies by making architectural changes such that product teams can self service wherever possible. When you remove architectural constraints, you will steepen the gradient of the line and give your staff broad domains, without adding dependencies and bottlenecks.

Am done with this article. I really thought I was going to be quick to write and I have run out of energy. I will probably do another push on this in a few weeks. Please DM with spelling mistakes or something that doesn’t make sense.

Macbook OSX: Using gping over a Zero Trust Network Client (like Zscaler)

Once you start using a zero trust network, the first causality is normally the Ping command. The gping (Graphical Ping) command line displays a color coded realtime graph of continuous pings to a specified host and it supports specifying alternate interfaces/gateways.

First lets find which interface to use. The “arp -a” command is used to display the ARP cache on a computer, including both dynamic and static entries. This command is similar to the arp command without any options, but it also displays the status of the entries in the cache.

$ arp -a
unfisecuregateway (192.168.0.1) at 74:83:c2:d0:c8:cd on en0 ifscope [ethernet]
amazon-ce482021d.localdomain (192.168.0.66) at 8:7c:39:e3:de:af on en0 ifscope [ethernet]
km98e898.localdomain (192.168.0.117) at 0:17:c8:87:5a:f7 on en0 ifscope [ethernet]
? (192.168.0.210) at 9c:14:63:5c:aa:de on en0 ifscope [ethernet]
? (192.168.0.211) at 9c:14:63:5c:aa:ac on en0 ifscope [ethernet]
? (192.168.0.212) at 9c:14:63:5c:aa:e1 on en0 ifscope [ethernet]
? (192.168.0.213) at 9c:14:63:5c:ab:20 on en0 ifscope [ethernet]
? (192.168.0.214) at 9c:14:63:5c:aa:9 on en0 ifscope [ethernet]
? (192.168.0.215) at 9c:14:63:5c:aa:74 on en0 ifscope [ethernet]
? (192.168.0.216) at 9c:14:63:5c:ab:64 on en0 ifscope [ethernet]
? (192.168.0.217) at 9c:14:63:2d:23:5f on en0 ifscope [ethernet]
? (192.168.0.255) at ff:ff:ff:ff:ff:ff on en0 ifscope [ethernet]
? (224.0.0.251) at 1:0:5e:0:0:fb on en0 ifscope permanent [ethernet]
? (239.255.255.250) at 1:0:5e:7f:ff:fa on en0 ifscope permanent [ethernet]

You will see I have an en0 interface. Lets try an gping via the en0 interface:

$ brew install gping
$ gping -i en0 google.com
google.com (172.217.170.110)             last 27.274ms min 26.945ms  max 134.849ms avg 41.896ms  jtr 2.916ms   p95 107.494ms t/o 0
130.606ms│
         │
         │                   ⢀
         │                   ⢸
         │                   ⢸
112.88ms │                   ⢸
         │                   ⣼
         │                   ⣿
         │        ⡆          ⣿
95.154ms │   ⢀    ⡇          ⣿
         │   ⢸    ⡇          ⣿
         │   ⢸    ⡇          ⣿
         │   ⢸   ⢠⡇          ⣿          ⢀
77.428ms │   ⣼   ⢸⡇      ⡆   ⣿          ⢸
         │   ⣿   ⢸⡇      ⡇  ⢸ ⡇         ⢸
         │   ⣿   ⢸⡇     ⢰⡇⢀ ⢸ ⡇         ⢸
         │   ⣿   ⢸⢇     ⢸⡇⢸ ⢸ ⡇         ⡇⡇
59.702ms │   ⡟⡄  ⢸⢸     ⢸⢇⣼ ⢸ ⡇         ⡇⡇
         │   ⡇⡇  ⢸⢸     ⡸⢸⡏⡆⢸ ⡇         ⡇⡇
         │   ⡇⡇  ⢸⢸     ⡇⢸⡇⡇⢸ ⡇     ⡆   ⡇⡇
         │   ⡇⡇  ⢸⢸ ⢰   ⡇⢸⠃⡇⢸ ⡇    ⢀⢇   ⡇⡇
41.976ms │  ⢸ ⡇  ⡇⢸ ⣼  ⢀⠇⢸ ⡇⡎ ⡇⡆   ⢸⢸   ⡇⡇
         │  ⢸ ⡇  ⡇⢸ ⡿⡀ ⢸   ⢇⡇ ⣷⡇   ⢸⢸  ⢸ ⢸
         │  ⢸ ⡇  ⡇⢸ ⡇⣇⢆⡸   ⢸⡇ ⣿⢱   ⡸⠸⡀⢠⢸ ⢸
         │⣀ ⡸ ⡇  ⡇⢸⢸ ⡿⠈⠇   ⢸⡇ ⡇⢸ ⢀⡀⡇ ⡇⡜⢿ ⢸
24.25ms  │ ⠉  ⠉⠉⠉⠁⠈⠉ ⠁     ⠈   ⠈⠉⠁⠈⠁ ⠉⠁⠈ ⠈⠁
         └─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
  11:39:10                                                             11:39:25                                                    11:39:40

Mac OSX : Tracing which network interface will be used to route traffic to an IP/DNS address

If you have multiple connections on your device (and maybe you have a zero trust client installed); how do you find out which network interface on your device will be used to route the traffic?

Below is a route get request for googles DNS service:

$ route get 8.8.8.8

   route to: dns.google
destination: dns.google
    gateway: 100.64.0.1
  interface: utun3
      flags: <UP,GATEWAY,HOST,DONE,WASCLONED,IFSCOPE,IFREF>
 recvpipe  sendpipe  ssthresh  rtt,msec    rttvar  hopcount      mtu     expire
       0         0         0         0         0         0      1400         0

If you have multiple interfaces enabled, then the first item in the Service Order will be used. If you want to see the default interface for your device:

$ route -n get 0.0.0.0 | grep interface
  interface: en0

Lets go an see whats going on in my default interface:

$ netstat utun3 | grep ESTABLISHED
tcp4       0      0  100.64.0.1.65271       jnb02s11-in-f4.1.https ESTABLISHED
tcp4       0      0  100.64.0.1.65269       jnb02s02-in-f14..https ESTABLISHED
tcp4       0      0  100.64.0.1.65262       192.0.73.2.https       ESTABLISHED
tcp4       0      0  100.64.0.1.65261       192.0.73.2.https       ESTABLISHED
tcp4       0      0  100.64.0.1.65260       192.0.73.2.https       ESTABLISHED
tcp4       0      0  100.64.0.1.65259       192.0.73.2.https       ESTABLISHED
tcp4       0      0  100.64.0.1.65258       192.0.73.2.https       ESTABLISHED
tcp4       0      0  100.64.0.1.65257       192.0.73.2.https       ESTABLISHED
tcp4       0      0  100.64.0.1.65256       192.0.73.2.https       ESTABLISHED
tcp4       0      0  100.64.0.1.65255       192.0.73.2.https       ESTABLISHED
tcp4       0      0  100.64.0.1.65254       192.0.78.23.https      ESTABLISHED
tcp4       0      0  100.64.0.1.65253       192.0.76.3.https       ESTABLISHED
tcp4       0      0  100.64.0.1.65252       192.0.78.23.https      ESTABLISHED
tcp4       0      0  100.64.0.1.65251       192.0.76.3.https       ESTABLISHED
tcp4       0      0  100.64.0.1.65250       192.0.78.23.https      ESTABLISHED
tcp4       0      0  100.64.0.1.65249       192.0.76.3.https       ESTABLISHED
tcp4       0      0  100.64.0.1.65248       ec2-13-244-140-3.https ESTABLISHED
tcp4       0      0  100.64.0.1.65247       192.0.73.2.https       ESTABLISHED

Finding and Setting the Maximum Transmission Unit (MTU) on a Windows Machine

If you have just changed ISPs or moved house and your internet suddenly starts misbehaving the likelihood is your Maximum Transmission Unit (MTU) is set too high for your ISP. The default internet facing MTU is 1500 bytes, BUT depending on your setup, this often needs to be set much lower.

Step 1:

First check your current MTU across all your ipv4 interfaces using netsh:

netsh interface ipv4 show subinterfaces
   MTU  MediaSenseState   Bytes In  Bytes Out  Interface
------  ---------------  ---------  ---------  -------------
4294967295                1          0          0  Loopback Pseudo-Interface 1
  1492                1        675        523  Local Area Connection

As you can see, the Local Area Connection interface is set to a 1492 bytes MTU. So how do we find out what it should be? We are going to send a fixed size Echo packet out, and tell the network not to fragment this packet. If somewhere along the line this packet is too big then this request will fail.

Next enter (if it fails then you know your MTU is too high):

ping 8.8.8.8 -f -l 1492

Procedure to find optimal MTU:

For PPPoE, your Max MTU should be no more than 1492 to allow space for the 8 byte PPPoE “wrapper”. 1492 + 8 = 1500. The ping test we will be doing does not include the IP/ICMP header of 28 bytes. 1500 – 28 = 1472. Include the 8 byte PPPoE wrapper if your ISP uses PPPoE and you get 1500 – 28 – 8 = 1464.

The best value for MTU is that value just before your packets get fragmented. Add 28 to the largest packet size that does not result in fragmenting the packets (since the ping command specifies the ping packet size, not including the IP/ICMP header of 28 bytes), and this is your Max MTU setting.

The below is an automated ping sweep, that tests various packet sizes until it fails (increasing in 10 bytes per iteration):

C:\Windows\system32>for /l %i in (1360,10,1500) do @ping -n 1 -w 8.8.8.8 -l %i -f

Pinging 8.8.8.8. with 1400 bytes of data:
Reply from 8.8.8.8: bytes=1400 time=6ms TTL=64

Ping statistics for 8.8.8.8:
Packets: Sent = 1, Received = 1, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
Minimum = 6ms, Maximum = 6ms, Average = 6ms

Pinging 8.8.8.8 with 1401 bytes of data:
Reply from 8.8.8.8: bytes=1401 time<1ms TTL=64

Ping statistics for 8.8.8.8:
Packets: Sent = 1, Received = 1, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
Minimum = 0ms, Maximum = 0ms, Average = 0ms

Pinging 8.8.8.8 with 1402 bytes of data:
Reply from 8.8.8.8: bytes=1402 time<1ms TTL=64

Ping statistics for 8.8.8.8:
Packets: Sent = 1, Received = 1, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
Minimum = 0ms, Maximum = 0ms, Average = 0ms

Pinging 8.8.8.8 with 1403 bytes of data:
Reply from 8.8.8.8: bytes=1403 time<1ms TTL=64

Ping statistics for 8.8.8.8:
Packets: Sent = 1, Received = 1, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
Minimum = 0ms, Maximum = 0ms, Average = 0ms 

Once you find the MTU, you can set it as per below:

set subinterface “Local Area Connection” mtu=1360 store=persistent

Finding and Setting the Maximum Transmission Unit (MTU) on Mac/OSX

If you have just changed ISPs or moved house and your internet suddenly starts misbehaving the likelihood is your Maximum Transmission Unit (MTU) is set too high for your ISP. The default internet facing MTU is 1500 bytes, BUT depending on your setup, this often needs to be set much lower.

Step 1:

First check your current MTU.

$ networksetup -getMTU en0
Active MTU: 1500 (Current Setting: 1500)

As you can see, the Mac is set to 1500 bytes MTU. So how do we find out what it should be? We are going to send a fixed size Echo packet out, and tell the network not to fragment this packet. If somewhere along the line this packet is too big then this request will fail.

Next enter:

$ ping -D -s 1500 www.google.com
PING www.google.com (172.217.170.100): 1500 data bytes
ping: sendto: Message too long
ping: sendto: Message too long
Request timeout for icmp_seq 0
ping: sendto: Message too long
Request timeout for icmp_seq 1
ping: sendto: Message too long

Ok, so our MTU is too high.

Procedure to find optimal MTU:

Hint: For PPPoE, your Max MTU should be no more than 1492 to allow space for the 8 byte PPPoE “wrapper”. 1492 + 8 = 1500. The ping test we will be doing does not include the IP/ICMP header of 28 bytes. 1500 – 28 = 1472. Include the 8 byte PPPoE wrapper if your ISP uses PPPoE and you get 1500 – 28 – 8 = 1464.

The best value for MTU is that value just before your packets get fragmented. Add 28 to the largest packet size that does not result in fragmenting the packets (since the ping command specifies the ping packet size, not including the IP/ICMP header of 28 bytes), and this is your Max MTU setting.

The below is an automated ping sweep, that tests various packet sizes until it fails (increasing in 10 bytes per iteration):

$ ping -g 1300 -G 1600 -h 10 -D www.google.com
PING www.google.com (172.217.170.100): (1300 ... 1600) data bytes
Request timeout for icmp_seq 0
Request timeout for icmp_seq 1
Request timeout for icmp_seq 2
Request timeout for icmp_seq 3
Request timeout for icmp_seq 4
Request timeout for icmp_seq 5
Request timeout for icmp_seq 6
ping: sendto: Message too long
Request timeout for icmp_seq 7

As you can see it failed on the 7th attempt (giving you a 1300 + 60 MTU).

Once you find the MTU, you can set it as per below:

$ ping -D -s 1360 www.google.com
PING www.google.com (172.217.170.100): 1370 data bytes
Request timeout for icmp_seq 0

So I can set my MTU as 1360 + 28 = 1386:

networksetup -setMTU en0 1386

Macbook Tip: iTerm2 clearing your scroll back history

I frequently forget this command shortcut, so this post is simply because I am lazy. To clear your history in iTerm press Command + K. Control + L only clears the screen, so as soon as you run the next command you will see the scroll back again.

If you want to view your command history (for terminal) type:

$ ls -a ~ | grep hist
.zsh_history
$ cat .zsh_history

Macbook: Check a DNS (web site) to see if basic email security has been setup (SPF, DKIM and DMARC)

There are three basic ways to secure email, these are: Sender Policy Framework (SPF), Domain Keys Identified Mail (DKIM), Domain-based Message Authentication, Reporting & Conformance (DMARC) definitions. Lets quickly discuss these before we talk about how to check if they have been setup:

SPF helps prevent spoofing by verifying the sender’s IP address

SPF (Sender Policy Framework) is a DNS record containing information about servers allowed to send emails from a specific domain (eg which servers can send emails from andrewbaker.ninja). 

With it, you can verify that messages coming from your domain are sent by mail servers and IP addresses authorized by you. This might be your email servers or servers of another company you use for your email sending. If SPF isn’t set, scammers can take advantage of it and send fake messages that look like they come from you. 

It’s important to remember that there can be only one SPF record for one domain. Within one SPF record, however, there can be several servers and IP addresses mentioned (for instance, if emails are sent from several mailing platforms).

DKIM shows that the email hasn’t been tampered with

DKIM (DomainKeys Identified Mail) adds a digital signature to the header of your email message, which the receiving email servers then check to ensure that the email content hasn’t changed. Like SPF, a DKIM record exists in the DNS.

DMARC provides reporting visibility on the prior controls

DMARC (Domain-based Message Authentication, Reporting & Conformance) defines how the recipient’s mail server should process incoming emails if they don’t pass the authentication check (either SPF, DKIM, or both).

Basically, if there’s a DKIM signature, and the sending server is found in the SPF records, the email is sent to the recipient’s inbox. 

If the message fails authentication, it’s processed according to the selected DMARC policy: none, reject, or quarantine.

  • Under the “none” policy, the receiving server doesn’t take any action if your emails fail authentication. It doesn’t impact your deliverability. But it also doesn’t protect you from scammers, so we don’t recommend setting it. Only by introducing stricter policies can you block them in the very beginning and let the world know you care about your customers and brand. 
  • Here, messages that come from your domain but don’t pass the DMARC check go to “quarantine.” In such a case, the provider is advised to send your email to the spam folder. 
  • Under the “reject” policy, the receiving server rejects all messages that don’t pass email authentication. This means such emails won’t reach an addressee and will result in a bounce.

The “reject” option is the most effective, but it’s better to choose it only if you are sure that everything is configured correctly.

Now that we’ve clarified all the terms, let’s see how you can check if you have an existing SPF record, DKIM record, and DMARC policy set in place.

1. First Lets Check if SPF is setup

$ dig txt google.com | grep "v=spf"
google.com.		3600	IN	TXT	"v=spf1 include:_spf.google.com ~all"

How to read SPF correctly

  • The “v=spf1” part shows that the record is of SPF type (version 1). 
  • The “include” part lists servers allowed to send emails for the domain. 
  • The “~all” part indicates that if any part of the sent message doesn’t match the record, the recipient server will likely decline it.

2. Next Lets Check if DKIM is setup

What is a DKIM record?

A DKIM record stores the DKIM public key — a randomized string of characters that is used to verify anything signed with the private key. Email servers query the domain’s DNS records to see the DKIM record and view the public key.

A DKIM record is really a DNS TXT (“text”) record. TXT records can be used to store any text that a domain administrator wants to associate with their domain. DKIM is one of many uses for this type of DNS record. (In some cases, domains have stored their DKIM records as CNAME records that point to the key instead; however, the official RFC requires these records to be TXT.)

Here is an example of a DKIM DNS TXT record:

NameTypeContentTTL
big-email._domainkey.example.comTXTv=DKIM1; p=76E629F05F70
9EF665853333
EEC3F5ADE69A
2362BECE4065
8267AB2FC3CB
6CBE
6000

Name

Unlike most DNS TXT records, DKIM records are stored under a specialized name, not just the name of the domain. DKIM record names follow this format:

[selector]._domainkey.[domain]

The selector is a specialized value issued by the email service provider used by the domain. It is included in the DKIM header to enable an email server to perform the required DKIM lookup in the DNS. The domain is the email domain name. ._domainkey. is included in all DKIM record names.

If you want to find the value of the selector, you can view this by selecting “Show Original” when you have the email open in gmail:

Once you are able to view the original email, perform a text search for “DKIM-Signature”. This DKIM-Signature contains an attribute ‘s=’, this is the DKIM selector being used for this domain. In the example below (an amazon email), we can see the DKIM selector is “jvxsykglqiaiibkijmhy37vqxh4mzqr6”. 

DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/simple; s=jvxsykglqiaiibkijmhy37vqxh4mzqr6; d=amazon.com; t=1675842267; h=Date:From:Reply-To:To:Message-ID:Subject:MIME-Version:Content-Type; bh=BJxF0PCdQ4TBdiPcAK83Ah0Z65hMjsvFIWVgzM0O8b0=; b=NUSl8nwZ2aF6ULhIFOJPCANWEeuQNUrnym4hobbeNsB6PPTs2/9jJPFCEEjAh8/q s1l53Vv5qAGx0zO4PTjASyB/UVOZj5FF+LEgDJtUclQcnlNVegRSodaJUHRL3W2xNxa ckDYAnSPr8fTNLG287LPrtxvIL2n8LPOTZWclaGg=
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/simple; s=6gbrjpgwjskckoa6a5zn6fwqkn67xbtw; d=amazonses.com; t=1675842267; h=Date:From:Reply-To:To:Message-ID:Subject:MIME-Version:Content-Type:Feedback-ID; bh=BJxF0PCdQ4TBdiPcAK83Ah0Z65hMjsvFIWVgzM0O8b0=; b=ivBW6HbegrrlOj7BIB293ZNNy6K8D008I3+wwXoNvZdrBI6SBhL+QmCvCE3Sx0Av qh2hWMJyJBkVVcVwJns8cq8sn6l3NTY7nfN0H5RmuFn/MK4UHJw1vkkzEKKWSDncgf9 6K3DyNhKooBGopkxDOhg/nU8ZX8paHKlD67q7klc=
Date: Wed, 8 Feb 2023 07:44:27 +0000

To look up the DKIM record, email servers use the DKIM selector provided by the email service provider, not just the domain name. Suppose example.com uses Big Email as their email service provider, and suppose Big Email uses the DKIM selector big-email. Most of example.com’s DNS records would be named example.com, but their DKIM DNS record would be under the name big-email._domainkey.example.com, which is listed in the example above.

Content

This is the part of the DKIM DNS record that lists the public key. In the example above, v=DKIM1 indicates that this TXT record should be interpreted as DKIM, and the public key is everything after p=.

Below we query the linuxincluded.com domain using the “dkim” selector.

$ dig TXT dkim._domainkey.linuxincluded.com

; <<>> DiG 9.10.6 <<>> TXT dkim._domainkey.linuxincluded.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 45496
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;dkim._domainkey.linuxincluded.com. IN	TXT

;; ANSWER SECTION:
dkim._domainkey.linuxincluded.com. 3600	IN TXT	"v=DKIM1; k=rsa; p=MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQDdLyUk58Chz538ZQE4PnZ1JqBiYkSVWp8F77QpVF2onPCM4W4BnVJWXDSCC+yn747XFKv+XkVwayLexUkiAga7hIw6GwOj0gplVjv2dirFCoKecS2jvvqXc6/O0hjVqYlTYXwiYFJMSptaBWoHEEOvpS7VWelnQB+1m3UHHPJRiQIDAQAB; s=email"

;; Query time: 453 msec
;; SERVER: 100.64.0.1#53(100.64.0.1)
;; WHEN: Thu Feb 02 13:39:40 SAST 2023
;; MSG SIZE  rcvd: 318

3. Finally Lets Check if DMARC is setup

What is a DMARC record?

A DMARC record stores a domain’s DMARC policy. DMARC records are stored in the Domain Name System (DNS) as DNS TXT records. A DNS TXT record can contain almost any text a domain administrator wants to associate with their domain. One of the ways DNS TXT records are used is to store DMARC policies.

(Note that a DMARC record is a DNS TXT record that contains a DMARC policy, not a specialized type of DNS record.)

Example.com’s DMARC policy might look like this:

NameTypeContentTTL
example.comTXTv=DMARC1; p=quarantine; adkim=r; aspf=r; rua=mailto:example@third-party-example.com;3260
$ dig txt _dmarc.google.com

; <<>> DiG 9.10.6 <<>> txt _dmarc.google.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 16231
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;_dmarc.google.com.		IN	TXT

;; ANSWER SECTION:
_dmarc.google.com.	300	IN	TXT	"v=DMARC1; p=reject; rua=mailto:mailauth-reports@google.com"

;; Query time: 209 msec
;; SERVER: 100.64.0.1#53(100.64.0.1)
;; WHEN: Thu Feb 02 13:42:03 SAST 2023
;; MSG SIZE  rcvd: 117

Macbook: Querying DNS using the Host Command

1. Find a list of IP addresses linked to a domain

To find the IP address for a particular domain, simply pass the target domain name as an argument after the host command.

$ host andrewbaker.ninja
andrewbaker.ninja has address 13.244.140.33

For a comprehensive lookup using the verbose mode, use -a or -v flag option.

$ host -a andrewbaker.ninja
Trying "andrewbaker.ninja"
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 45489
;; flags: qr rd ra; QUERY: 1, ANSWER: 10, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;andrewbaker.ninja.		IN	ANY

;; ANSWER SECTION:
andrewbaker.ninja.	300	IN	A	13.244.140.33
andrewbaker.ninja.	21600	IN	NS	ns-1254.awsdns-28.org.
andrewbaker.ninja.	21600	IN	NS	ns-1514.awsdns-61.org.
andrewbaker.ninja.	21600	IN	NS	ns-1728.awsdns-24.co.uk.
andrewbaker.ninja.	21600	IN	NS	ns-1875.awsdns-42.co.uk.
andrewbaker.ninja.	21600	IN	NS	ns-491.awsdns-61.com.
andrewbaker.ninja.	21600	IN	NS	ns-496.awsdns-62.com.
andrewbaker.ninja.	21600	IN	NS	ns-533.awsdns-02.net.
andrewbaker.ninja.	21600	IN	NS	ns-931.awsdns-52.net.
andrewbaker.ninja.	900	IN	SOA	ns-1363.awsdns-42.org. awsdns-hostmaster.amazon.com. 1 7200 900 1209600 86400

Received 396 bytes from 100.64.0.1#53 in 262 ms

The -a option is used to find all Domain records and Zone information. You can also notice the local DNS server address utilised for the lookup.

2. Reverse Lookup

The command below performs a reverse lookup on the IP address and displays the hostname or domain name.

$ host 13.244.140.33
33.140.244.13.in-addr.arpa domain name pointer ec2-13-244-140-33.af-south-1.compute.amazonaws.com.

3. To find Domain Name servers

Use the -t option to get the domain name servers. It’s used to specify the query type. Below we pass the -t argument to find nameservers of a specific domain. NS record specifies the authoritative nameservers.

$ host -t ns andrewbaker.ninja
andrewbaker.ninja name server ns-1254.awsdns-28.org.
andrewbaker.ninja name server ns-1514.awsdns-61.org.
andrewbaker.ninja name server ns-1728.awsdns-24.co.uk.
andrewbaker.ninja name server ns-1875.awsdns-42.co.uk.
andrewbaker.ninja name server ns-491.awsdns-61.com.
andrewbaker.ninja name server ns-496.awsdns-62.com.
andrewbaker.ninja name server ns-533.awsdns-02.net.
andrewbaker.ninja name server ns-931.awsdns-52.net.

4. To query certain nameserver for a specific domain

To query details about a specific authoritative domain name server, use the below command.

$ host google.com olga.ns.cloudflare.com
Using domain server:
Name: olga.ns.cloudflare.com
Address: 173.245.58.137#53
Aliases:

google.com has address 172.217.170.14
google.com has IPv6 address 2c0f:fb50:4002:804::200e
google.com mail is handled by 10 smtp.google.com.

5. To find domain MX records

To get a list of a domain’s MX ( Mail Exchanger ) records.

$ host -t MX google.com
google.com mail is handled by 10 smtp.google.com.

6. To find domain TXT records

To get a list of a domain’s TXT ( human-readable information about a domain server ) record.

$ host -t txt google.com
google.com descriptive text "docusign=1b0a6754-49b1-4db5-8540-d2c12664b289"
google.com descriptive text "v=spf1 include:_spf.google.com ~all"
google.com descriptive text "google-site-verification=TV9-DBe4R80X4v0M4U_bd_J9cpOJM0nikft0jAgjmsQ"
google.com descriptive text "facebook-domain-verification=22rm551cu4k0ab0bxsw536tlds4h95"
google.com descriptive text "atlassian-domain-verification=5YjTmWmjI92ewqkx2oXmBaD60Td9zWon9r6eakvHX6B77zzkFQto8PQ9QsKnbf4I"
google.com descriptive text "onetrust-domain-verification=de01ed21f2fa4d8781cbc3ffb89cf4ef"
google.com descriptive text "globalsign-smime-dv=CDYX+XFHUw2wml6/Gb8+59BsH31KzUr6c1l2BPvqKX8="
google.com descriptive text "docusign=05958488-4752-4ef2-95eb-aa7ba8a3bd0e"
google.com descriptive text "apple-domain-verification=30afIBcvSuDV2PLX"
google.com descriptive text "google-site-verification=wD8N7i1JTNTkezJ49swvWW48f8_9xveREV4oB-0Hf5o"
google.com descriptive text "webexdomainverification.8YX6G=6e6922db-e3e6-4a36-904e-a805c28087fa"
google.com descriptive text "MS=E4A68B9AB2BB9670BCE15412F62916164C0B20BB"

7. To find domain SOA record

To get a list of a domain’s Start of Authority record

$ host -t soa google.com
google.com has SOA record ns1.google.com. dns-admin.google.com. 505465897 900 900 1800 60

Use the command below to compare the SOA records from all authoritative nameservers for a particular zone (the specific portion of the DNS namespace).

$ host -C google.com
Nameserver 216.239.36.10:
	google.com has SOA record ns1.google.com. dns-admin.google.com. 505465897 900 900 1800 60
Nameserver 216.239.38.10:
	google.com has SOA record ns1.google.com. dns-admin.google.com. 505465897 900 900 1800 60
Nameserver 216.239.32.10:
	google.com has SOA record ns1.google.com. dns-admin.google.com. 505465897 900 900 1800 60
Nameserver 216.239.34.10:
	google.com has SOA record ns1.google.com. dns-admin.google.com. 505465897 900 900 1800 60

8. To find domain CNAME records

CNAME stands for canonical name record. This DNS record is responsible for redirecting one domain to another, which means it maps the original domain name to an alias.

To find out the domain CNAME DNS records, use the below command.

$ host -t cname www.yahoo.com
www.yahoo.com is an alias for new-fp-shed.wg1.b.yahoo.com.
$ dig www.yahoo.com
]
; <<>> DiG 9.10.6 <<>> www.yahoo.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 45503
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;www.yahoo.com.			IN	A

;; ANSWER SECTION:
www.yahoo.com.		12	IN	CNAME	new-fp-shed.wg1.b.yahoo.com.
new-fp-shed.wg1.b.yahoo.com. 38	IN	A	87.248.100.215
new-fp-shed.wg1.b.yahoo.com. 38	IN	A	87.248.100.216

;; Query time: 128 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Mon Jan 30 17:07:55 SAST 2023
;; MSG SIZE  rcvd: 106

In the above shown example CNAME entry, if you want to reach “www.yahoo.com”, your computer’s DNS resolver will first fire an address lookup for “www.yahoo.com“. Your resolver then sees that it was returned a CNAME record of “new-fp-shed.wg1.b.yahoo.com“, and in response it will now fire another lookup for “new-fp-shed.wg1.b.yahoo.com“. It will then be returned the A record. So its important to note here is that there are two separate and independent DNS lookups performed by the resolver in order to convert a CNAME into a usable A record.

9. To find domain TTL information

TTL Stands for Time to live. It is a part of the Domain Name Server. It is automatically set by an authoritative nameserver for each DNS record.

In simple words, TTL refers to how long a DNS server caches a record before refreshing the data. Use the below command to see the TTL information of a domain name (in the example below its 300 seconds/5 minutes).

$ host -v -t a andrewbaker.ninja
Trying "andrewbaker.ninja"
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 27738
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;andrewbaker.ninja.		IN	A

;; ANSWER SECTION:
andrewbaker.ninja.	300	IN	A	13.244.140.33

Received 51 bytes from 8.8.8.8#53 in 253 ms

Hacking: Using a Macbook and Nikto to Scan your Local Network

Nikto is becoming one of my favourite tools. I like it because of its wide ranging use cases and its simplicity. So whats an example use case for Nikto? When I am bored right now and so I am going to hunt around my local network and see what I can find…

# First install Nikto
brew install nikto
# Now get my ipaddress range
ifconfig
# Copy my ipaddress into to ipcalculator to get my cidr block
eth0      Link encap:Ethernet  HWaddr 00:0B:CD:1C:18:5A
          inet addr:172.16.25.126  Bcast:172.16.25.63  Mask:255.255.255.224
          inet6 addr: fe80::20b:cdff:fe1c:185a/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2341604 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2217673 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:293460932 (279.8 MiB)  TX bytes:1042006549 (993.7 MiB)
          Interrupt:185 Memory:f7fe0000-f7ff0000
# Get my Cidr range (brew install ipcalc)
ipcalc 172.16.25.126
cp363412:~ $ ipcalc 172.16.25.126
Address:   172.16.25.126        10101100.00010000.00011001. 01111110
Netmask:   255.255.255.0 = 24   11111111.11111111.11111111. 00000000
Wildcard:  0.0.0.255            00000000.00000000.00000000. 11111111
=>
Network:   172.16.25.0/24       10101100.00010000.00011001. 00000000
HostMin:   172.16.25.1          10101100.00010000.00011001. 00000001
HostMax:   172.16.25.254        10101100.00010000.00011001. 11111110
Broadcast: 172.16.25.255        10101100.00010000.00011001. 11111111
Hosts/Net: 254                   Class B, Private Internet
# Our NW range is "Network:   172.16.25.0/24"

Now lets pop across to nmap to get a list of active hosts in my network

# Now we run a quick nmap scan for ports 80 and 443 across the entire range looking for any hosts that respond and dump the results into a grepable file
nmap -p 80,433 172.16.25.0/24 -oG webhosts.txt
# View the list of hosts
cat webhosts.txt
$ cat webhosts.txt
# Nmap 7.93 scan initiated Wed Jan 25 20:17:42 2023 as: nmap -p 80,433 -oG webhosts.txt 172.16.25.0/26
Host: 172.16.25.0 ()	Status: Up
Host: 172.16.25.0 ()	Ports: 80/open/tcp//http///, 433/open/tcp//nnsp///
Host: 172.16.25.1 ()	Status: Up
Host: 172.16.25.1 ()	Ports: 80/open/tcp//http///, 433/open/tcp//nnsp///
Host: 172.16.25.2 ()	Status: Up
Host: 172.16.25.2 ()	Ports: 80/open/tcp//http///, 433/open/tcp//nnsp///
Host: 172.16.25.3 ()	Status: Up
Host: 172.16.25.3 ()	Ports: 80/open/tcp//http///, 433/open/tcp//nnsp///
Host: 172.16.25.4 ()	Status: Up
Host: 172.16.25.4 ()	Ports: 80/open/tcp//http///, 433/open/tcp//nnsp///
Host: 172.16.25.5 ()	Status: Up

Next we want to grep this webhost file and send all the hosts that responded to the port probe of to Nikto for scanning. To do this we can use some linux magic. First we cat to read the output stored in our webhosts.txt document. Next we use awk. This is a Linux tool that will help search for the patterns. In the command below we are asking it to look for “Up” (meaning the host is up). Then we tell it to print $2, which means to print out the second word in the line that we found the word “Up” on, i.e. to print the IP address. Finally, we send that data to a new file called niktoscan.txt.

cat webhosts.txt | awk '/Up$/{print $2}' | cat >> niktoscan.txt
cat niktoscan.txt
$ cat niktoscan.txt
172.16.25.0
172.16.25.1
172.16.25.2
172.16.25.3
172.16.25.4
172.16.25.5
172.16.25.6
172.16.25.7
172.16.25.8
172.16.25.9
172.16.25.10
...

Now let nikto do its stuff:

nikto -h niktoscan.txt -ssl >> niktoresults.txt
# Lets check what came back
cat niktoresults.txt