How to Trigger Scaling Events Using the stress-ng Command

If you are testing how your autoscaling policies respond to CPU load, a really simple way to do this is with the stress-ng command. Note: this is a very crude mechanism, and wherever possible you should try to generate synthetic application load instead.

#!/bin/bash

# DESCRIPTION: After updating from the repo, installs stress-ng, a tool used to create various system load for testing purposes.
# Refresh the package lists (this example uses apt; on Amazon Linux use yum instead)
sudo apt update -y
# Install stress-ng
sudo apt install -y stress-ng

# CPU spike: Run a CPU spike for 5 seconds
uptime
stress-ng --cpu 4 --timeout 5s --metrics-brief
uptime

# Disk Test: Start N (2) workers continually writing, reading and removing temporary files:
stress-ng --hdd 2 --timeout 5s --metrics-brief

# Memory stress test
# Populate memory. Use mmap N bytes per vm worker, the default is 256MB. 
# You can also specify the size as % of total available memory or in units of 
# Bytes, KBytes, MBytes and GBytes using the suffix b, k, m or g:
# Note: The --vm 2 will start N workers (2 workers) continuously calling 
# mmap/munmap and writing to the allocated memory. Note that this can cause 
# systems to trip the kernel OOM killer on Linux if not enough physical 
# memory and swap is available.
stress-ng --vm 2 --vm-bytes 1G --timeout 5s
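
For example, you can size the allocation as a percentage of available memory rather than a fixed byte count (a quick sketch; note that with multiple vm workers the percentages add up, so keep the total under 100%):

# Memory stress sized as a percentage of available memory instead of a fixed size
stress-ng --vm 1 --vm-bytes 80% --timeout 5s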

# Combination Stress
# To run for 5 seconds with 4 cpu stressors, 2 io stressors and 1 vm 
# stressor using 1GB of virtual memory, enter:
stress-ng --cpu 4 --io 2 --vm 1 --vm-bytes 1G --timeout 5s --metrics-brief
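
In practice the 5 second bursts above won't trip a scaling event, because a CloudWatch alarm needs the breach to hold for one or more evaluation periods. Below is a rough sketch of a longer run (the 10 minute duration and the auto scaling group name are placeholders; match them to your own alarm settings):

# Peg every vCPU at 100% for 10 minutes so the CPU utilisation alarm
# breaches across several consecutive evaluation periods
stress-ng --cpu 0 --cpu-load 100 --timeout 10m --metrics-brief
# Watch the scaling activity from another shell (group name is a placeholder)
aws autoscaling describe-scaling-activities --auto-scaling-group-name my-asg --max-items 5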

AWS: Making use of S3's ETags to check if a file has been altered

I was playing with S3 the other day and I noticed that a file which I had uploaded twice, in two different locations, had an identical ETag. This immediately made me think that this tag was some kind of hash. So I had a quick look at the AWS documentation, and this ETag turns out to be marginally useful. ETag stands for "Entity Tag" and it is basically an MD5 hash of the file (although for multipart uploads, which are required once a file is bigger than 5GB, the ETag is no longer a plain MD5 of the object).

So if you ever want to compare a local copy of a file with the S3 copy, you just need an MD5 tool (the steps below are for Ubuntu Linux):

# Update your Ubuntu
# Download the latest package lists
sudo apt update
# Perform the upgrade
sudo apt-get upgrade -y
# Now install common utils (note: md5sum itself normally ships with coreutils already)
sudo apt install -y ucommon-utils
# Upgrades involving the Linux kernel, changing dependencies, adding / removing new packages etc
sudo apt-get dist-upgrade

Next, to view the MD5 hash of a file, simply type:

# View the MD5 hash of the file
md5sum myfilename.myextension
2aa318899bdf388488656c46127bd814  myfilename.myextension
# The first value above will match your S3 ETag if the file has not been altered
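
You can also pull the ETag straight from S3 and compare the two values from the command line. A minimal sketch (the bucket, key and file names are placeholders, and it only holds for single-part uploads as noted above):

# Fetch the ETag for the object (note S3 returns it wrapped in quotes)
aws s3api head-object --bucket mybucket --key myfilename.myextension --query ETag --output text
# Compute the local MD5 and compare the two values by eye or in a script
md5sum myfilename.myextension | awk '{print $1}'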

Below is the screenshot of the properties that you will see in S3 with a matching MD5 hash:

Using TPC-H tools to Create Test Data for AWS Redshift and AWS EMR

If you need to test out your big data tools, below is a useful set of scripts that I have used in the past for AWS EMR and Redshift:

# Install make and git
sudo yum install make git -y
# Install the tpch-kit
git clone https://github.com/gregrahn/tpch-kit
cd tpch-kit/dbgen
sudo yum install gcc -y
# Compile the tpch-kit
make OS=LINUX
# Go home
cd ~
# Now make your emr data
mkdir emrdata
# Tell tpch to use this dir
export DSS_PATH=$HOME/emrdata
cd tpch-kit/dbgen
# Now run dbgen in verbose mode, orders/lineitem tables only, 10gb data size
./dbgen -v -T o -s 10
# Move the data to an s3 bucket
cd $HOME/emrdata
aws s3api create-bucket --bucket andrewbakerbigdata --region af-south-1 --create-bucket-configuration LocationConstraint=af-south-1
aws s3 cp $HOME/emrdata s3://andrewbakerbigdata/emrdata --recursive
cd $HOME
mkdir redshiftdata
# Tell tpch to use this dir
export DSS_PATH=$HOME/redshiftdata
# Now make your redshift data
cd tpch-kit/dbgen
# Now run dbgen in verbose mode, orders/lineitem tables only, 40gb data size
./dbgen -v -T o -s 40
# These are big files, so let's find out how big they are and split them
# Count lines
cd $HOME/redshiftdata
wc -l orders.tbl
# Now split orders into 15m lines per file
split -d -l 15000000 -a 4 orders.tbl orders.tbl.
# Now split line items
wc -l lineitem.tbl
split -d -l 60000000 -a 4 lineitem.tbl lineitem.tbl.
# Now clean up the master files
rm orders.tbl
rm lineitem.tbl
# Move the split data to the s3 bucket
aws s3 cp $HOME/redshiftdata s3://andrewbakerbigdata/redshiftdata --recursive
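
If the 40gb run is too slow as a single process, dbgen can also split generation into chunks that you run in parallel. A rough sketch, assuming the -C (total number of chunks) and -S (which chunk to build) flags behave this way in the gregrahn fork; check ./dbgen -h before relying on it:

# Generate the orders/lineitem data in 4 chunks, one background process each
export DSS_PATH=$HOME/redshiftdata
cd $HOME/tpch-kit/dbgen
for i in 1 2 3 4; do
  ./dbgen -v -T o -s 40 -C 4 -S $i &
done
wait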

AWS: Please Fix Poor Error Messages, API standards and Bad Defaulting

This is a short blog, and it's actually just a simple plea to AWS. Please can you do three things?

  1. North Virginia (us-east-1) appears to be the AWS master region. Having a master region causes a large number of support issues (for example S3, KMS, CloudFront and ACM all lean on this pet region, and all of their APIs suffer as a result). This, coupled with point 2, creates some material angst.
  2. Work a little harder on your error messages – they are often really (really) bad. I will post some examples at the bottom of this post over time, but you have to do some basics, like rejecting unknown parameters (yes, it's useful to know there is a typo rather than having the parameter silently ignored).
  3. Use standard parameters across your APIs (e.g. make specifying the region consistent, which even within single products is not consistently applied, and make your verbs consistent).

As a simple example, below I am logged into an EC2 instance in af-south-1 and I can create an S3 bucket in North Virginia, but not in af-south-1. I am sure there is a "fix" (change some config, find out an API parameter was invalid and silently ignored, etc), but this isn't the point. The risk (and it's real) is that in an attempt to debug this, developers will tend to open up security groups, open up NACLs, widen IAM roles etc. When the devs finally fix the issue, they are very unlikely to retrace all their steps and restore everything else that they changed. This means that you end up with debugging scars that create overly permissive services, due to poor error messages, inconsistent API parameters/behaviours and a regional bias. Note: I am aware of commercial products, like Radware's CWP, but that's not the point. I shouldn't ever need to debug by dialling back security. Observability was supposed to be there from day 1. The combination of tangential error messages, inconsistent APIs and a lack of decent debug information from core services like IAM and S3 is creating a problem that shouldn't exist.

AWS is a global cloud provider: services should work identically across all regions, APIs should follow standards, APIs shouldn't silently ignore mistyped parameters, and the base config required should come either from context (i.e. I am running in region x) or from config (aws configure), not from a global default region.

Please note: I deleted the bucket between running the two commands. Also, the region set via aws configure seemed to be ignored by create-bucket.

[ec2-user@ip-172-31-24-139 emrdata]$ aws s3api create-bucket --bucket ajbbigdatabucketlab2021
{
    "Location": "/ajbbigdatabucketlab2021"
}
[ec2-user@ip-172-31-24-139 emrdata]$ aws s3api create-bucket --bucket ajbbigdatabucketlab2021 --region af-south-1

An error occurred (IllegalLocationConstraintException) when calling the CreateBucket operation: 
The unspecified location constraint is incompatible for the region specific endpoint this request was sent to.

Note: I worked around the create-bucket behavior by replacing it with mb:

[ec2-user@ip-172-31-24-139 emrdata]$ aws s3 mb s3://ajbbigdatabucketlab2021 --region af-south-1
make_bucket: ajbbigdatabucketlab2021

Thanks to the AWS dudes for letting me know how to get this working. It turns out the create-bucket and mb APIs don't use standard parameters. See below (the region flag needs to be replaced by a verbose bucket configuration flag):

aws s3api create-bucket --bucket ajbbigdatabucketlab2021 --create-bucket-configuration LocationConstraint=af-south-1
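
Once the bucket exists you can confirm which region it actually landed in (a quick sanity check, using the same bucket name as above):

# Returns the LocationConstraint of the bucket (af-south-1 if it worked)
aws s3api get-bucket-location --bucket ajbbigdatabucketlab2021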

Example IAM Policy to Enforce EBS encryption

Here is a useful IAM condition policy which will force EBS volumes to be encrypted when they are created from an EC2 instance.

{
   "Version": "2012-10-17",
   "Statement": [
     {
       "Sid": "Stmt2222222222222",
       "Effect": "Allow",
       "Action": [
         "ec2:CreateVolume"
       ],
       "Condition": {
         "Bool": {
           "ec2:Encrypted": "true"
         }
       },
       "Resource": [
         "*"
       ]
     },
     {
       "Sid": "Stmt1111111111111",
       "Effect": "Allow",
       "Action": [
         "ec2:DescribeVolumes",
         "ec2:DescribeAvailabilityZones",
         "ec2:CreateTags",
         "kms:ListAliases"
       ],
       "Resource": [
         "*"
       ]
     },
     {
       "Sid": "allowKmsKey",
       "Effect": "Allow",
       "Action": [
         "kms:Encrypt"
       ],
       "Resource": [
         "arn:aws:kms:us-east-1:999999999999:alias/aws/ebs"
       ]
     }
   ]
 }
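
A quick way to sanity-check the policy from the CLI (a sketch; it assumes the policy above is the only thing granting ec2:CreateVolume to the caller, and the availability zone is a placeholder):

# This should be denied, because the volume is not encrypted
aws ec2 create-volume --availability-zone af-south-1a --size 8
# This should succeed, because the encryption condition is satisfied
aws ec2 create-volume --availability-zone af-south-1a --size 8 --encrypted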