Using TPC-H tools to Create Test Data for AWS Redshift and AWS EMR

If you need to test your big data tools, below is a set of scripts that I have used in the past to generate test data for AWS EMR and Redshift:

# Install make and git
 sudo yum install make git -y
 # Get the tpch-kit
 git clone https://github.com/gregrahn/tpch-kit
 cd tpch-kit/dbgen
 sudo yum install gcc -y
 # Compile the tpch-kit
 make OS=LINUX
 # Go home
 cd ~
 # Now make your EMR data
 mkdir emrdata
 # Tell dbgen to write its output to this directory
 export DSS_PATH=$HOME/emrdata
 cd tpch-kit/dbgen
 # Now run dbgen in verbose mode for the orders and lineitem tables (-T o) at scale factor 10 (about 10 GB for the full data set)
 ./dbgen -v -T o -s 10
 # Move the data to an S3 bucket
 cd $HOME/emrdata
 aws s3api create-bucket --bucket andrewbakerbigdata --region af-south-1 --create-bucket-configuration LocationConstraint=af-south-1
 aws s3 cp $HOME/emrdata s3://andrewbakerbigdata/emrdata --recursive
 cd $HOME
 mkdir redshiftdata
 # Tell dbgen to write its output to this directory
 export DSS_PATH=$HOME/redshiftdata
 # Now make your Redshift data
 cd tpch-kit/dbgen
 # Now run dbgen in verbose mode for the orders and lineitem tables (-T o) at scale factor 40 (about 40 GB for the full data set)
 ./dbgen -v -T o -s 40
 # These are big files, so let's find out how big they are and split them
 # Count lines
 cd $HOME/redshiftdata
 wc -l orders.tbl
 # Now split orders into 15 million lines per file
 split -d -l 15000000 -a 4 orders.tbl orders.tbl.
 # Now split lineitem into 60 million lines per file
 wc -l lineitem.tbl
 split -d -l 60000000 -a 4 lineitem.tbl lineitem.tbl.
 # Now clean up the master files
 rm orders.tbl
 rm lineitem.tbl
 # Move the split data to an S3 bucket
 aws s3 cp $HOME/redshiftdata s3://andrewbakerbigdata/redshiftdata --recursive
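
To see what the split step produces before running it against tens of millions of rows, the sketch below applies the same split flags to a small generated file. The file name sample.tbl, the 10 rows and the chunk size of 4 are made up for illustration; only the flags (-d, -l, -a) come from the commands above. It assumes GNU split (as shipped on Amazon Linux), since -d is a GNU extension.

```shell
# Work in a throwaway directory so nothing real is touched
workdir=$(mktemp -d)
cd "$workdir"

# Build a 10-row pipe-delimited file (hypothetical stand-in for orders.tbl)
i=1
while [ "$i" -le 10 ]; do
  echo "$i|customer_$i|O|" >> sample.tbl
  i=$((i + 1))
done

# Same flags as above: numeric suffixes (-d), 4 lines per file (-l), 4-digit suffix (-a)
split -d -l 4 -a 4 sample.tbl sample.tbl.

# Count the parts and check that no rows were lost across them
parts=$(ls sample.tbl.* | wc -l | tr -d ' ')
total=$(cat sample.tbl.* | wc -l | tr -d ' ')
echo "parts=$parts total_lines=$total"
```

The numbered suffixes give every part the same prefix (sample.tbl. here, orders.tbl. above), which should make it easy to point a bulk load at the whole set of files in one go.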

Setting up SSH for ec2-user on your WordPress sites

So after getting frustrated (and even recreating my EC2 instances) over a “Permission denied (publickey)” error, I finally realised that the WordPress builds are by default set up for SSH using the bitnami account (or at least my build was).

This means each time I log in using ec2-user I get:

sudo ssh -i CPT_Default_Key.pem ec2-user@ec2-13-244-140-33.af-south-1.compute.amazonaws.com
ec2-user@ec2-13-244-140-33.af-south-1.compute.amazonaws.com: Permission denied (publickey).

Being a limited human being, I will never cope with two user names. Moving over to a standard login name (ec2-user) is relatively simple. Just follow the steps below (after logging in using the bitnami account):

sudo useradd -s /bin/bash -o -u $(id -u) -g $(id -g) ec2-user

sudo mkdir ~ec2-user/
sudo cp -rp ~bitnami/.ssh ~ec2-user/
sudo cp -rp ~bitnami/.bashrc ~ec2-user/
sudo cp -rp ~bitnami/.profile ~ec2-user/

Next you need to copy your public key into the authorised keys file using:

cat mypublickey.pub >> /home/ec2-user/.ssh/authorized_keys
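
sshd is usually strict about permissions on the .ssh directory, so a slightly more defensive version of this step is sketched below. The function name install_key and its arguments are hypothetical, and the demo runs against a scratch directory with a dummy key rather than the real /home/ec2-user.

```shell
# install_key PUBKEY_FILE HOME_DIR  (hypothetical helper, not part of any tool)
# Appends the key to HOME_DIR/.ssh/authorized_keys unless it is already there,
# then tightens permissions to what sshd normally expects.
install_key() {
  pubkey_file=$1
  home_dir=$2
  mkdir -p "$home_dir/.ssh"
  touch "$home_dir/.ssh/authorized_keys"
  # -x: whole-line match, -F: fixed string, -f: read patterns from the key file
  if ! grep -qxFf "$pubkey_file" "$home_dir/.ssh/authorized_keys"; then
    cat "$pubkey_file" >> "$home_dir/.ssh/authorized_keys"
  fi
  chmod 700 "$home_dir/.ssh"
  chmod 600 "$home_dir/.ssh/authorized_keys"
}

# Demo against a scratch directory with a dummy key (not a real key)
scratch=$(mktemp -d)
echo "ssh-ed25519 AAAAEXAMPLEKEY me@example" > "$scratch/mypublickey.pub"
install_key "$scratch/mypublickey.pub" "$scratch/home"
install_key "$scratch/mypublickey.pub" "$scratch/home"   # second run adds nothing
key_count=$(wc -l < "$scratch/home/.ssh/authorized_keys" | tr -d ' ')
echo "keys installed: $key_count"
```

On the server itself you would point the second argument at the real home directory and run the commands with sudo where needed.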

Next, to allow ec2-user to execute commands as the root user, add the new user account to the bitnami-admins group by executing the following command when logged in as the bitnami user:

sudo usermod -aG bitnami-admins ec2-user

Linux: Quick guide to the CD command – for windows dudes :)

Ok, so I am a windows dude and only after docker and K8s came along did I start to get all the hype around Linux. To be fair, Linux is special and I have been blown away by the engineering effort behind this OS (and I am also glad to leave my Daniel Appleman Win32 API book on the shelf for a few years!).

What surprises me with Linux is the number of shortcuts, so before I forget them I am going to document a few of my favorites (the context here is that I use WSL2 a lot and these are my go-to navigation commands).

Exchanging files between Linux and Windows:

This is a bit of a pain, so I just create a symbolic link to a windows root directory in my linux home directory so that I can easily copy files back and forth.

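cd ~
ln -s /mnt/c/ mywindowsroot
cd mywindowsroot
ls
# go back home, then copy everything from my windows root folder into my wsl linux directory
# (cp needs -r to copy directories; in practice point this at a subfolder rather than the whole drive)
cd ~
cp -r mywindowsroot/. .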
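
The same pattern can be tried safely in a scratch directory first. In the sketch below fake_c stands in for /mnt/c and all file names are invented; the point is that cp needs -r to copy a directory, and that the trailing /. copies the contents of the linked directory (hidden files included) rather than the link itself.

```shell
scratch=$(mktemp -d)
cd "$scratch"

# fake_c plays the role of /mnt/c (hypothetical stand-in)
mkdir -p fake_c/docs
echo "hello" > fake_c/docs/note.txt
echo "hidden" > fake_c/.settings

# Link to it, in the same way as the /mnt/c link
ln -s "$scratch/fake_c" mywindowsroot

# Copy the contents of the linked directory into a local folder
mkdir linuxcopy
cp -r mywindowsroot/. linuxcopy/

# -A lists hidden entries too, so .settings should show up next to docs
ls -A linuxcopy
```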

Show Previous Directory

# bash keeps the previous directory in OLDPWD (cd -- on its own just takes you home)
echo "$OLDPWD"

Switch back to your previous directory

cd -
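
Switching back works because the shell tracks the previous directory in the OLDPWD variable. A small sketch, using two throwaway directories (the names a and b are arbitrary):

```shell
base=$(mktemp -d)
mkdir "$base/a" "$base/b"

cd "$base/a"
cd "$base/b"

# OLDPWD holds the directory we were in before the last cd
echo "previous: $OLDPWD"

# cd - switches back to it (it also prints the directory, silenced here)
cd - > /dev/null
echo "now in: $PWD"
```

Running cd - a second time would flip you back to b again, so it works as a two-way toggle.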

Move to Home Directory

cd ~
or just use
cd

Pushing and Popping Directories

pushd and popd are builtin commands in bash and certain other shells. pushd saves the current working directory on a stack in memory, and popd takes the most recently saved directory off that stack and changes back to it. This is very handy when you're jumping around but don’t want to create symbolic links.

# Push the current directory onto the stack (you can also enter an absolute directory here, like pushd /var/www)
pushd .
# Go to the home dir
cd
ls
# Now move back to this directory
popd
ls
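
To watch the stack change as you go, the bash builtin dirs prints it. A short sketch, assuming bash (pushd, popd and dirs are not available in plain sh) and using throwaway directories with made-up names:

```shell
base=$(mktemp -d)
mkdir "$base/project" "$base/logs"
cd "$base/project"

# Jump to logs, remembering project on the stack
pushd "$base/logs" > /dev/null
dirs -v    # entry 0 is the current directory, entry 1 is the saved one

# Wander off somewhere else entirely
cd /

# popd drops entry 0 and returns to the saved directory
popd > /dev/null
echo "back in: $PWD"
```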

Windows 10: How to fix your desktop flashing icons

Just popping this here as I have had this a few times. Have you ever had your desktop icons flash and, when you check, explorer.exe is using high CPU? If so, try:

Step 1. Delete the Icon Cache

Save the following as a batch file on your desktop and run it as admin:

taskkill /f /im explorer.exe
cd /d %userprofile%\AppData\Local\Microsoft\Windows\Explorer
attrib -h iconcache_*.db
del iconcache_*.db
start explorer
pause

Step 2. Tweak the Registry

Try setting these two registry values to zero:

HKEY_CURRENT_USER\Control Panel\Desktop:ForegroundFlashCount
HKEY_CURRENT_USER\Control Panel\Desktop:ForegroundLockTimeout

Step 3. Switch off Indexing

Run either of these commands (as admin) to disable Windows indexing or to set it to start only on demand:

sc config WSearch start= disabled
sc config WSearch start= demand