AWS: Please Fix Poor Error Messages, API standards and Bad Defaulting

This is a short blog post, and it's actually just a simple plea to AWS. Please can you do three things?

  1. North Virginia appears to be the AWS master node. Having this region as a master region causes a large number of support issues (for example S3, KMS, CloudFront and ACM all use this pet region, and all of their APIs suffer as a result). This, coupled with point 2), creates some material angst.
  2. Work a little harder on your error messages – they are often really (really) bad. I will post some examples at the bottom of this post over time. But you have to do some basics, like rejecting unknown parameters (yes, it's useful to know there is a typo rather than having the parameter silently ignored).
  3. Use standard parameters across your APIs (e.g. make specifying the region consistent – even within single products it's not consistently applied – and make your verbs consistent).

As a simple example, below I am logged into an EC2 instance in af-south-1 and I can create an S3 bucket in North Virginia, but not in af-south-1. I am sure there is a “fix” (change some config, find out an API parameter was invalid and was silently ignored, etc) – but this isn’t the point. The risk (and it’s real) is that in an attempt to debug this, developers will tend to open up security groups, open up NACLs, widen IAM roles etc. When the devs finally fix the issue, they will be very unlikely to retrace all their steps and restore everything else that they changed. This means that you end up with debugging scars that create overly permissive services, due to poor error messages, inconsistent API parameters/behaviours and a regional bias. Note: I am aware of commercial products, like Radware’s CWP – but that’s not the point. I shouldn’t ever need to debug by dialling back security. Observability was supposed to be there from day 1. The combination of tangential error messages, inconsistent APIs and a lack of decent debug information from core services like IAM and S3 is creating a problem that shouldn’t exist.

AWS is a global cloud provider – services should work identically across all regions, APIs should have standards, APIs shouldn’t silently ignore mistyped parameters, and the base config required should come either from context (i.e. I am running in region x) or from config (aws configure) – not from a global default region.
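To illustrate what I mean by “from context”: on an EC2 instance, the region you are running in is already discoverable from the instance metadata service, so a CLI/SDK should never need to fall back to a us-east-1 style global default. A minimal sketch (assuming an EC2 instance with IMDSv2 enabled):

# Ask the instance metadata service (IMDSv2) which region this instance is running in
TOKEN=$(curl -sX PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/placement/region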

Please note: I deleted the bucket between running the two commands, and aws configure seemed to be ignored by create-bucket.

[ec2-user@ip-172-31-24-139 emrdata]$ aws s3api create-bucket --bucket ajbbigdatabucketlab2021
{
    "Location": "/ajbbigdatabucketlab2021"
}
[ec2-user@ip-172-31-24-139 emrdata]$ aws s3api create-bucket --bucket ajbbigdatabucketlab2021 --region af-south-1

An error occurred (IllegalLocationConstraintException) when calling the CreateBucket operation: 
The unspecified location constraint is incompatible for the region specific endpoint this request was sent to.

Note: I worked around the create-bucket behaviour by replacing it with mb:

[ec2-user@ip-172-31-24-139 emrdata]$ aws s3 mb s3://ajbbigdatabucketlab2021 --region af-south-1
make_bucket: ajbbigdatabucketlab2021

Thanks to the AWS dudes for letting me know how to get this working. It turns out the create-bucket and mb commands don’t use standard parameters. See below (the --region flag needs to be replaced by the rather verbose --create-bucket-configuration parameter):

aws s3api create-bucket --bucket ajbbigdatabucketlab2021 --create-bucket-configuration LocationConstraint=af-south-1
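If you want to sanity-check where a bucket actually landed (and what region the CLI itself is defaulting to), these standard calls are useful – the bucket name is just my example from above:

aws s3api get-bucket-location --bucket ajbbigdatabucketlab2021
# returns {"LocationConstraint": "af-south-1"} for an af-south-1 bucket,
# and a null LocationConstraint for us-east-1 (North Virginia) – another quirk of the same API
aws configure get region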

A simple DDOS SYN flood Test

Getting an application knocked out by a simple SYN flood is both embarrassing and avoidable. It’s also very easy to create a SYN flood, so it’s something you should design against. Below is the hping3 command line that I use to test my services against SYN floods. I have used quite a few of its options to make the test a bit more realistic – but you can also distribute this across a few machines to stretch the target host a bit more if you want to.

Parameters:

-c --count Stop after sending (and receiving) count response packets. After the last packet is sent, hping3 waits COUNTREACHED_TIMEOUT seconds for target host replies. You can tune COUNTREACHED_TIMEOUT by editing hping3.h

-d --data data size Set packet body size. Warning: with --data 40, hping3 will not generate 0 byte packets but protocol_header+40 bytes. hping3 will display packet size information as the first line of output, like this: HPING www.yahoo.com (ppp0 204.71.200.67): NO FLAGS are set, 40 headers + 40 data bytes

-S --syn Set the SYN tcp flag

-w --win Set the TCP window size. Default is 64.

-p --destport [+][+]dest port Set the destination port, default is 0. If a ‘+’ character precedes the dest port number (i.e. +1024) the destination port will be increased for each reply received. If a double ‘+’ precedes the dest port number (i.e. ++1024), the destination port will be increased for each packet sent. By default the destination port can be modified interactively using CTRL+z.

--flood Send packets as fast as possible, without waiting for incoming replies. This is faster than the -i u0 option.

--rand-source This option enables random source mode. hping will send packets with a random source address. It is interesting to use this option to stress firewall state tables, and other per-ip dynamic tables inside TCP/IP stacks and firewall software.

apt-get update
apt install hping3
hping3 -c 15000 -d 120 -S -w 64 -p 443 --flood --rand-source <my-ip-to-test>
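While the flood is running, it’s worth watching how the target is actually coping. A rough sketch of what I’d look at on a Linux target (assuming iproute2/ss is installed and the standard sysctl keys are present):

# Count half-open (embryonic) connections piling up on the target
ss -n state syn-recv | wc -l
# Check whether SYN cookies are enabled (1 = on) – a common first line of defence
sysctl net.ipv4.tcp_syncookies
# Size of the SYN backlog queue the kernel will tolerate before dropping
sysctl net.ipv4.tcp_max_syn_backlog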

The Triplication Paradigm

Wasting money can often happen when you think you’re being clever…

Introduction

In most large corporates, technology will typically report into either finance or operations. This means that it will tend to be subject to cultural inheritance, which is not always a good thing. One example of where the cultural default should be challenged is when managing IP duplication. In finance or operations, duplication rarely yields any benefits and will often result in unnecessary costs and/or inconsistent customer experiences. Because of this, technology teams will tend to be asked to centrally analyse all incoming workstreams for convergence opportunities. If any seemingly overlapping effort is discovered, it would then typically be extracted into a central, “do it once” team. Experienced technologists will likely remark that the analysis process generally turns out to be very slow, overlaps are small, the cost of extracting them is high, additional complexity is introduced, backlogs become unmanageable, testing the consolidated “swiss army knife” product is problematic and, critically, the teams are typically reduced to crawling speed as they try to transport context and requirements to the central delivery team. I have called the above process “Triplication”, simply because it creates more waste and costs more than duplication ever could (and also because my finance colleagues seem to connect with this term).

The article below attempts to explain why we fear duplication and why slavishly trying to remove all duplication is a mistake. Having said this, a purely federated model or abundant resource model with no collaboration leads to similarly chronic issues (I will write an article about “Federated Strangulation” shortly).

The Three Big Corporate Fears

  1. The fear of doing something badly.
  2. The fear of doing something twice (duplication).
  3. The fear of doing nothing at all.

In my experience, most corporates focus on fears 1) and 2). They will typically focus on layers of governance, contractual bindings, interlocks and magic metric tracking (SLA, OLA, KPI etc etc). The governance is typically multi-layered, with each forum meeting infrequently and ingesting the data in a unique format (no sense in not duplicating the governance overhead, right?!). As a result, these large corporates typically achieve fear 3) – they will do nothing at all.

Most start-ups/tech companies worry almost exclusively about 3) – as a result they achieve a bit of 1) and 2). Control is highly federated, decision trees are short, and teams are self-empowered and self-organising. Dead ends are found quickly, bad ideas are cancelled or remediated as the work progresses. Given my rather biased narrative above – it won’t be a surprise to learn that I believe 3) is the greatest of all evils. To allow yourself to be overtaken is the greatest of all evils; to watch a race that you should be running is the most extreme form of failure.

For me, managed duplication can be a positive thing. But the key is that you have to manage it properly. You will often see divergence and consolidation in equal measure as the various work streams mature. The key to managing duplication is to enforce scarcity of resources and collaboration. Additionally, you may find that a decentralised team becomes conflicted when it is asked to manage multiple business units’ interests. This is actually success! It means this team has created something that has been virally absorbed by other parts of the business – it means you have created something that’s actually good! When this happens look at your contribution options; sometimes it may make sense to split the product team up into several business-facing teams and a core platform engineering team. If, however, there is no collaboration and an abundance of resources is thrown at all problems, you end up with material and avoidable waste. Additionally, observe exactly what you’re duplicating – never duplicate a commodity and never federate data. You also need to avoid a snowflake culture and make sure that, where it makes sense, you are trying to share.

Triplication happens when two or more products are misunderstood to be “similar” and an attempt is then made to fuse them together. The over-aggregation of your product development streams will yield most of the below:

1) Cripplingly slow and expensive to develop.

2) Risk concentration/instability. Every release will cause trauma to multiple customer bases.

3) Unsupportable. It will take you days to work out what went wrong and how on earth you can fix the issue as you will suffer from Quantum Entanglement.

4) Untestable. The complexity of the product will guarantee each release causes distress.

5) Low grade client experience.

Initially these problems will be described as “teething problems”. After a while it becomes clearer that the problem is not fixing itself. Next you will likely start the “stability” projects. A year or so later, after the next pile of cash is burnt, there will be a realisation that this is as good as it gets. At this point, senior managers start to see the writing on the wall and will quickly distance themselves from the product. Luckily for them, nobody will likely remember exactly who in the many approval forums thought this was a good idea in the first place. Next the product starts to get linked to the term “legacy”. The final chapter for this violation of common sense is the multi-year decommissioning process. BUT – it’s highly likely that the strategic replacement contains the exact same flaws as the legacy product…

The Conclusion

To conclude, I created the term “Triplication” as I needed a way to succinctly explain that things can get worse when you lump them together without a good understanding of why you’re doing this. I needed a way to help challenge statements like, “you have to be able to extract efficiencies if you just lump all your teams together”. This thinking is equivalent to saying; “hey, I have a great idea…! We ALL like music, right?? So let’s save money – let’s go buy a single CD for all of us!”

The reality for those that have played out the triplication scenario in real life is that you will see costs balloon, progress grind to a halt and revenues fall off a cliff; the final step in the debacle is usually a loss of trust – followed by the inevitable outsourcing pill. On the other hand, collaboration, scarcity, lean, quick MVPs, shared learning, cloud, open source, common rails and internal mobility are the friends of fast deliverables, customer satisfaction and yes – low costs!

Part 1: The Great Public Cloud Crusade…

“Not all cloud transformations are created equally…!”

The cloud is hot…. not just a little hot, but smokin’ hot!! Covid is messing with the economy, customers are battling financially, the macro-economic outlook is problematic, vendor costs are high and climbing and security needs more investment every year. What on earth do we do??!! I know…. let’s start a crusade – let’s go to the cloud!!!!

Cloud used to be just for the cool kids, the start-ups, the hipsters… but not anymore; now the corporates are coming, and they are coming in their droves. The cloud transformation conversation is playing out globally across almost all sectors, from health care, to pharmaceuticals, to finance. The hype and urban legends around public cloud are creating a lot of FOMO.

For finance teams under severe cost pressures, the cloud has to be an obvious place to seek out some much needed pain relief. CIOs are giving glorious on-stage testimonials, declaring victory after having gone live with their first “bot in the cloud”. So what is there to blog about, it’s all wonderful right…? Maybe not…

The Backdrop…

Imagine you’re a CIO or CTO. You haven’t cut code for a while, or maybe you have a finance background. Anyway, your architecture skills are a bit rusty/vacant, you have been outsourcing technology work for years, you are awash with vendor products, all the integration points are “custom” (aka arc welded) and hence your stack is very fragile. In fact it’s so fragile you can trigger outages when someone closes your datacentre door a little too hard! Your technology teams all have low/zero cloud knowledge and now you have been asked to transform your organisation by shipping it off to the cloud… So what do you do???

Lots of organisations believe this challenge is simply a case of finding the cheapest cloud provider, writing a legal document and some SLAs, and finding a vendor who can whiz your servers into the cloud – then you simply cut a cheque. But the truth is the cloud requires IP, and if you don’t have IP (aka engineers) then you have a problem…

Plan A: Project Borg

This is an easy problem – right? Just ask the AWS borg to assimilate you!!! The “Borg” strategy can be achieved by:

  1. Install some software agents in your data centres to come up with a total thumb suck on how much you think you will spend in the cloud. Note: your lack of any real understanding of how the cloud works should not ring any warning bells.
  2. Factor down this thumb suck using another made up/arbitrary “risk factor”.
  3. Next, sign an intergalactic cloud commit with your cloud provider of choice and try to squeeze more than a 10px discount out of them for taking this enormous risk.
  4. Finally, pick up the phone to one of the big 5 consultancies and get them to “assimilate” you into the cloud (using some tool to perform a bitwise copy of your servers into the cloud).

Before you know it you’re peppering your board and excos with those ghastly cloud packs, you are sending out group-wide emails with pictures of clouds on them, you are telling your teams to become “cloud ready”. What’s worse, you’re burning serious money as the consultancy team you called in did the usual land and expand. But you can’t seem to get a sense of any meaningful progress (and no, a BOT in the cloud doesn’t count as progress).

To fund this new cloud expense line you have to start strangling your existing production spending: maybe you run your servers for an extra year or two, strangle the network spend, keep those storage arrays for just a little while longer. But don’t worry, before you know it you will be in the cloud – right??

The Problem Statement

The problem is that public cloud was never about physically hosting your iffy datacentre software with someone else; it was supposed to be about the transformation of this software. The legacy software in your datacentre is almost certainly poisonous and its interdependencies will be as lethal as they are opaque. If you move it, pain will follow and you won’t see any real commercial benefits for years.

Put another way, your datacentre is the technical equivalent of a swamp. Luckily those lovely cloud people have built you a nice clean swimming pool. BUT don’t go and pump your swamp into this new swimming pool!

Crusades have never given us rational outcomes. You forgot to imagine where the customer was in this painful sideways move – what exactly did you want from this? In fact cloud crusades suffer from a list of oversights, weaknesses and risks:

  1. Actual digital “transformation” will take years to realise (if ever). All you did was change your hosting and how you pay for technology – nothing else actually changed.
  2. Your customer value proposition will be totally unchanged; sadly you are still as digital as a fax machine!
  3. Key infrastructure teams will start realising there is no future for them and start wandering off, creating even more instability.
  4. Stability will be problematic as your hybrid network has created a BGP birds nest.
  5. Your company signed a 5 year cloud commit. You took your current tech spend, halved it and then asked your cloud provider to give you discounts on this projected spend. You will likely see around a 10px-15px reduction in your EDP (enterprise discount program) rates, and for this you are taking ENORMOUS downside risks. You’re also accidentally discouraging efficient utilisation of resources, in favour of a culture of “ram it in the cloud and review it once our EDP period expires”.
  6. Your balance sheet will balloon, such that you will end up with a cost base not dissimilar to NASA’s, you will need a PhD to diagnose issues and your delivery cadence will be close to zero. Additionally, you will need to create an impairment factory to deal with all your stranded assets.

So what does this approach actually achieve? You will have added a ton of intangible assets by balance-sheeting a bunch of professional fees, you will likely be less stable and even less secure (more on this later), you know that this is an unsustainable project and that it is the equivalent of an organisational heart transplant. The only people that now understand your organisation are a team of well paid consultants on a 5x salary multiple, and sadly you cannot stop this process – you have to keep paying and praying. Put simply, cloud mass migration (aka assimilation) is a bad idea – so don’t do it!

The key here is that your tech teams have to transform themselves. Nothing can act on them; the transformation has to come from within. When you review organisations that have been around for a while – they may have had a few mergers, have high vendor dependencies and low technology skills – you will tend to find the combined/systemic complexity suffering from something similar to Quantum Entanglement. Yet we ask an external agency to unpack this unobservable, irreducible complexity with a few tools, then we get expensive external forces to reverse engineer these entangled systems and recreate them somewhere else. This is not reasonable or rational – it’s daft and we should stop doing this.

If not this, then what?

The “then what” bit is even longer than the “not this” bit. So I am posting this as is, and if I get 100 hits I will write up the other way – little by little 🙂

Click here to read the work in progress link on another approach to scaling cloud usage…

External k8gb presentation to Kubernetes SIG multicluster

Today I am a happy bunny!!!! Yury Tsarev (a very clever dude) did a presentation to one of the Kubernetes co-founders, Tim Hockin. The demo was of one of Absa bank’s open source projects called K8GB (a cloud native GSLB for K8s): https://www.k8gb.io/

Why do I like K8GB? Because it uses a single CRD that integrates with all the big DNS providers (like NS1, Infoblox and Route 53); it then uses zone delegation to allow developers to manage their GTM in a single CRD. It also has native health checks using the liveness and readiness Kubernetes probes (rather than just ICMP or HTTP responses). Put simply, it saves a whole bunch of unnecessary yak shaving!
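For a flavour of what that looks like in practice, here is a rough sketch of a Gslb resource. I’m writing this from memory of the k8gb docs, so treat the field names as indicative and check https://www.k8gb.io/ for the exact schema (the hostname and service name below are made up):

apiVersion: k8gb.absa.oss/v1beta1
kind: Gslb
metadata:
  name: demo-gslb
  namespace: demo
spec:
  ingress:
    rules:
      - host: demo.cloud.example.com        # delegated zone managed by k8gb
        http:
          paths:
            - path: /
              pathType: Prefix
              backend:
                service:
                  name: demo-app            # health comes from this service's pod probes
                  port:
                    name: http
  strategy:
    type: failover                          # or roundRobin
    primaryGeoTag: eu-west-1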

So, if you have a few minutes you can watch the video (the demo starts after 10mins or so):

https://www.youtube.com/watch?v=jeUeRQM-ZyM

Windows 10: How to fix your desktop icons flashing

Just popping this here as I have had this a few times. Have you ever had your desktop icons flash while explorer.exe is using high CPU? If so, try the following:

Step 1. Delete the Icon Cache

Save the following as a batch file on your desktop and run it as admin:

cd /d %userprofile%\AppData\Local\Microsoft\Windows\Explorer
taskkill /f /im explorer.exe
attrib -h iconcache_*.db
del iconcache_*.db
start explorer
pause

Step 2. Tweak the Registry

Try setting these two reg keys to zero:

HKEY_CURRENT_USER\Control Panel\Desktop:ForegroundFlashCount
HKEY_CURRENT_USER\Control Panel\Desktop:ForegroundLockTimeout
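If you prefer to script it, the same two values can be set from an elevated command prompt with reg.exe (this is just the command line equivalent of the manual tweak above):

reg add "HKCU\Control Panel\Desktop" /v ForegroundFlashCount /t REG_DWORD /d 0 /f
reg add "HKCU\Control Panel\Desktop" /v ForegroundLockTimeout /t REG_DWORD /d 0 /f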

Step 3. Switch off Indexing

Run either of these commands (as admin) to stop or suspend Windows indexing:

sc config cisvc start= disabled
sc config cisvc start= demand
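Note: on recent Windows 10 builds the old cisvc Indexing Service no longer exists and indexing runs under the Windows Search service instead – so if the commands above complain that the service cannot be found, the equivalent (my assumption, based on the current service name WSearch) would be:

sc config WSearch start= disabled
sc stop WSearch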

Running Corporate Technology: Smart vs Traditional

There are two fundamental ways to run technology inside your company (and various states in-between)

The Truth

Before we get started, what lies beneath is actually a rant. It’s not even a well disguised rant, and the reason for this is twofold. Firstly, I don’t expect many people to read this, so the return on the effort required to dress it up would be low. Secondly, I wrote this for my benefit, to relieve frustration – nothing to do with trying to educate or some greater good thingy. So if you continue to read this article, you are essentially paying my “frustration tax”… So thanks for that 🙂

The Models

Large corporates will tend to confuse a bad culture with poor performing staff. My experience is that a lot of execution issues center around cultural issues.

There are broadly two fundamental models of organising tech inside a corporate (and various states in-between). To assess which world you live in, I have described the general characteristics of the two models below:

Model A:

  1. Your technology leadership/technology exco consists of auditors, CFOs, HR, risk management and generic management. They have never, will never and can never make a technology decision, as there is no understanding of technology. To solve for this, the leadership simply syndicate every single decision out to the widest possible group. Decisions take ages, as most of these groups also don’t understand technology. When an eventual decision is made there can be no accountability, as it’s unclear who made the call. By default they will always favour the status quo, they are extremely uncomfortable with all forms of change, they are fragile leaders and will focus a lot of effort on blame offloading. They still believe you can’t get fired for buying IBM (no disrespect to IBM).
  2. You will spend a number of months collating voluminous documents containing all possible requirements for your products over the next n years. This is often mistaken to be a “strategy document”.
  3. You believe in developing technology away from your customer in “the center”. One day you feel you will “extract synergies” and “economies of scale” if you do this.
  4. You believe each business head should have a “single face off” into technology. This “engagement manager” (aka human inbox) attempts to use “influence” by routing escalation emails through the organisation’s hierarchy to try to generate outcomes.
  5. Your central functions haven’t quite delivered yet, cost a small fortune, have top heavy management structure, BUT will “definitely deliver next year”. These functions manage using “gates and governance” and often send out new multi-page “Target Operating Model” documents which nobody reads/understands.
  6. You believe you can fix technology with manually captured metrics like SLAs, OLAs, KPIs etc etc. You have hundreds of these measures and you pay a small army to collate these metrics each month. These measures always come back as “Green”, but you are still unhappy with the performance of your technology teams. You feel that if you could just come up with that “killer metric” that would show how bad technology really is; you could then finally sign an outsourcing deal to “fix it”.
  7. You ask for documents from technology showing exactly what will be delivered over the next 3-5 year time horizon. You believe in these documents and feel that they are valuable. Historically they have never been close to accurate. You will likely see an item in 2025 which states “Digital Onboarding of Customers” – this is around the same time that the rest of the world will be enjoying the benefits of teleportation!
  8. You hire an army of people to track that technology is “delivering on time”; they are not delivering on time and you still don’t know why. To scale output you hire more people to track why technology is not delivering on time.
  9. You are constantly underwhelmed with what eventually gets delivered. It doesn’t do anything you really wanted it to and your customer feedback is lackluster at best.
  10. There are clear lines of distinction between technology and the business. Phrases like “I could make my budget… if I could only get some delivery from tech” are commonplace.
  11. The products delivered have all the charisma and charm of a road traffic accident.
  12. You believe, if you can just get the governance right and capture the requirements, then you will get the product you wanted.
  13. Your engagements with technology will take the form of “I need EVERYTHING from product X delivered by date X. How much will it cost?”. When you get the reply a few months later, you will ALWAYS reply “that’s crazy!”. You will then cycle requirements down until you get beneath a “magic” number (that’s unknown to technology). This process can take years.
  14. Your teams spend more time in meetings or delivering static documents, than delivering business value.
  15. You layer governance meetings with various disconnected management teams, all with competing interests. This essentially turns an approval process into a multi-week/month attrition circus. The participants of this process have to explain and re-explain the exact same business case in multiple formats and leave the process with zero added insights, but minus a chunk of morale.
  16. If you ever looked at any documents signed related to technology spend, there are likely to be dozens of signatures on the documents and nobody quite recalls why they signed the document and who owns the actual decision for the spend.
  17. You direct as much technology spend as possible through a project governance process. This gives the illusion of a highly variable tech spend, focused on strategic investments. Your aggregate technology spend is nothing short of intergalactic. Funding abruptly expires at the end of each project and the resources promptly vanish. Most of your products are immature/half finished, you have a high number of (expensive) contractors, you leak IP, you love NDAs, nobody knows how to maintain or improve the products in production (as the contractors that delivered them have all left) and if you ever tried to reduce the SI spend, the “world will end”.
  18. You can’t quite understand why everything in technology costs 10x what it would if you gave it to a start up.
  19. Technology is a grudge purchase, and internal technology spend is an absolute last resort. Most of the people in your technology teams have never written a line of code. You urge your CIO to sign outsourcing documents with a “low cost location”. You believe this will give you a 20px reduction in costs over 5 yrs. All the previous attempts to do this have failed. Next time will be different, as this time you will capture the correct SLAs to fix technology. You use expressions like “we are a bank, not a technology company”.
  20. You believe all the value is created in the selling of the digital artefact, not in the creation of the artefact itself. Your solution to poorly designed artefacts that are hard to sell is to hire more sales staff.
  21. You fear open source and prefer to use multimillion dollar vendors for basics like middleware (please see Spring), business process management and databases. You build everything in your own data centre as you believe “it’s a regulatory requirement”. It takes you several months to deliver hardware, or to engage with your vendors.
  22. You actively promote monoculture. You believe that “do it once, everywhere, for everything, forever” is a meaningful sentence. You install single instances of monolithic services that everyone is forced to use. These services are not automated and so generate hugely painful backlogs, both for maintenance and for adding new services. You can never upgrade these monoliths without asking everyone to “stop what you’re doing and retest everything”. Teams spend a chunk of their time trying to engineer ways of “hacking” these monoliths in order to be able to deliver value to customers.
  23. You believe that most things in software are “utilities” and therefore belong in the center. You never look to prove that you get any benefits from this centralisation like lower costs, better quality or faster delivery times. The consumer of this service and the service provider will now both hire managers to work the functional boundary.
  24. In response to chronic stability issues you create various heavily populated central functions to solve things like “Capacity Management”, “Service Management”, “Change Management”, “Stability Management”. You can prepend the words “We are really bad at” to any of the above. Every time there is an outage the various management functions lead a charge into the application teams to report on what went wrong. The application teams spend more time speaking to the various functions than solving the actual issue. Getting to a root cause is lengthy, as the various teams cannot agree on one version of the truth and there is infighting as each of the various areas looks to absolve itself of any responsibility. You have effectively created “functional gridlock”.
  25. The business tends to export poor unilateral technology decisions to technology, and technology exports poor unilateral business decisions back. Technology will add arbitrary points of friction/governance layers, change freezes, make material spend decisions and freeze hiring on critical products. Business will in turn sign low grade technology deals and hand them over the fence to technology to “make good” the decision.
  26. Your technology leaders will slavishly stick to their multiyear strategies; irrespective of how painful these choices may prove to be. If any innovation or alternatives are discovered during the sluggish execution of these strategies, they will resist any attempts to alter direction and try something new. 
  27. Bonus is assigned in a single amorphous blob at the top of the function. Both performing and non performing technology teams are equally “punished” with a low grade aggregate pot. Business, customer feedback and product quality have close to zero influence on either the pot size or its distribution. There are obvious advantages to hiring lower performing candidates and just a few rockstars/key men (as you would not be able to cover fair compensation for teams of high performers). It is highly likely that your organisation institutes some kind of bell curve on PD grades which further enforces that you can only have a rationed percentage of “A players”. This serves to foster the “key man” culture, ignoring the benefit of creating highly performing teams and creates a built in mechanism for “bonus casualties”.
  28. In an attempt to create efficiencies within an inefficient structure, you will tend to triplicate (find superficially similar products and artificially force them together at great expense, reduced output and increased cost/complexity). See The Triplication Paradigm
  29. Alignment costs cripple your organisation. You often find yourself aligning, re-aligning and then aligning again. Minutes are tacitly approved, then disputed, memories of what was agreed between the various functional areas are diverse. You periodically get back in a room with the same people, to discuss the same subjects and cover the same ground. Eventually you will tend to yield – resigning yourself to doing nothing (or find a “work around”).
  30. Your organisation believes rules will protect quality and consistency (the scaled efficiency model). Your process-focused culture will drive out the high-performing employees. When the market shifts quickly due to new technology, competitors, or business models, you can’t keep up and lose customers to competitors who adapt. You are slow-moving, rule-oriented and over time will grind “painfully into irrelevance”.

Model B:

  1. You hire technologists to make technology decisions. They are both collaborative and inclusive, but are happy to weigh up the facts and make decisions. These leaders constantly learn and love to share and teach.
  2. Business and technology is a pure partnership, with product development often starting with a sentence similar to, “I have x USD – work with me to get an MVP out in 3 months.”
  3. You realise the centre is not the place for client centric delivery. Technology is seen as an intimate collaborative process. 
  4. You believe in self-organising and self-navigation. No layering, no aggregation, no grandstanding.
  5. Your central functions are thin, cheap and deliver scaled learning to the business facing teams. Other teams want to talk to them to share success, they are non political, encourage collaboration and do not sell gates and governance. The functions serve using the “zero wait” principle (i.e. they run with you to help, or let you run ahead). They NEVER block. 
  6. You have no manual metrics like SLAs, OLAs, KPIs etc. You just talk to the customer and fix what’s broken. All your products have comprehensive dashboards showing machine-generated metrics that help track issues.
  7. You have no idea what you will be doing in 3 – 5 years.
  8. You have zero people tracking/reporting delivery. If you need to look in the rear view mirror, get it yourself from Jira.
  9. You are constantly floored by the innovative solutions your teams have come up with. You could actually sell your products to other companies!
  10. There are no clear lines of distinction between technology and the business.
  11. The products delivered are designed iteratively with the customer at the centre, they have an “inspired” experience.
  12. You don’t believe in any governance, except macro finance.
  13. You neither read nor write requirements documents. Product documentation is developed after the fact and where possible is automated (eg Swagger).
  14. Your teams reject most meetings, have a 15 min daily standup, use instant messaging groups, phone calls etc etc.
  15. Approval always sits at the lowest federated level, ideally with a single person or product owner.
  16. If you ever looked at any documents signed relating to technology spend, there would be a single signatory. This person would be the actual decision maker, not an arbitrary person high up in the organisational hierarchy.
  17. You have zero projects. You work with finance at a macro level, not micro. Everything is a product and your variable tech spend is low. You have a high number of permanent staff and encourage internal mobility. Changing existing products is easy. You can’t remember when you last signed an NDA.
  18. You are able to both compete with and partner with start-ups. Your teams review start-up investments and often find them “lacking”.
  19. Big outsourcing deals are never considered. You either buy and build (e.g. frameworks, cloud platforms) or build (with focus on open source). You actively recruit world class developers. You encourage a policy of “eat what you kill” to decommission expensive vendor solutions that are choking investment in the rest of the estate.
  20. You believe value is created equally in both selling and creation of digital artefacts; and you reward symmetrically.
  21. All new products are developed in the cloud, with an open source first approach. Your teams even share back to the wider open source community, e.g. https://twitter.com/donovancmuller. Your developers are proactively trained in security, you anonymise data, you carry out automatic penetration testing as part of your CI, and hardware is just a debate about the efficient consumption of compute (e.g. serverless compute https://aws.amazon.com/lambda/details/).
  22. You actively encourage scaled learning, width of thought and interop – not points of singularity.
  23. You have no systemic issues and therefore no central functions to track them; the teams are empowered to solve the customer experience end to end.
  24. All decisions (technology or business) are co-authored and validated as a single product team.
  25. Pay is dealt holistically, can often include revenue share and you are incentivised to hire the best and create the best products. There are no bell curves, good feedback is not rationed.
  26. Your technology leaders will not publish strategy documents, but instead focus on scaled learning, constant iteration, experimentation and prototypes. They will tend to evolve their strategies as they go and will actively look to absorb new ideas and constantly course correct.
  27. You pay zero for alignment; you have self organising teams that own their customers and stacks end to end.
  28. You believe in accountability, empowerment and Learning at Scale. You have principles, but no rules or heavy processes to “protect” quality/output.

I appreciate this article might be provocative, and that’s by design. The main point from the above is that the incumbent tech status quo in most corporates is not really challenged. Instead the tendency is to continuously draw and then redraw functional boundaries – in the process traumatising staff, relearning the exact same lessons and creating an internal talent vacuum. It’s interesting that removing these boundaries altogether is rarely considered. In fact, in response to failed tech strategies most corporates will actually choose to scale these failed strategies rather than question the validity of the approach. This, in my view, is a material mistake – technology is all about intimacy, and to get it right you need to embed it into your company’s DNA.

To summarise, if you build an institution with “padded walls” you would expect a certain type of individual to be comfortable living there. Similarly, if you build a corporate that does not reflect its customer, if the roles of the leaders of this organisation are so generic it’s not clear what sits where and who does what, if layering your successful teams is culturally acceptable – then the staff working in this organisation will typically be staff that don’t care about details like the customer and being successful. This, I would argue, is not a long term game plan.

Example IAM Policy to Enforce EBS encryption

Here is a useful IAM conditional policy which will force EBS volumes to be encrypted when they are created from an EC2 instance.

{
   "Version": "2012-10-17",
   "Statement": [
     {
       "Sid": "Stmt2222222222222",
       "Effect": "Allow",
       "Action": [
         "ec2:CreateVolume"
       ],
       "Condition": {
         "Bool": {
           "ec2:Encrypted": "true"
         }
       },
       "Resource": [
         "*"
       ]
     },
     {
       "Sid": "Stmt1111111111111",
       "Effect": "Allow",
       "Action": [
         "ec2:DescribeVolumes",
         "ec2:DescribeAvailabilityZones",
         "ec2:CreateTags",
         "kms:ListAliases"
       ],
       "Resource": [
         "*"
       ]
     },
     {
       "Sid": "allowKmsKey",
       "Effect": "Allow",
       "Action": [
         "kms:Encrypt"
       ],
       "Resource": [
         "arn:aws:kms:us-east-1:999999999999:alias/aws/ebs"
       ]
     }
   ]
 }
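For completeness, this is roughly how you would wire the policy up from the CLI – the policy name, file name and role name below are made up purely for illustration:

aws iam create-policy --policy-name require-ebs-encryption \
    --policy-document file://ebs-encryption.json
aws iam attach-role-policy --role-name my-ec2-role \
    --policy-arn arn:aws:iam::999999999999:policy/require-ebs-encryption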

The DAO Ethereum Recursion Bug: El Gordo!

If you found my article, I would consider it a reasonable assumption that you already understand the importance of this

Brief Introduction

The splitDAO function was created in order for some members of the DAO to separate themselves and their tokens from the main DAO, creating a new ‘child DAO’, for example in case they found themselves in disagreement with the majority.

The child DAO goes through the same 27 day creation period as the original DAO. Pre-requisite steps in order to call a splitDAO function are the creation of a new proposal on the original DAO and designation of a curator for the child DAO.

The child DAO created by the attacker has been referred to as the ‘darkDAO’ on reddit and the name seems to have stuck. The proposal and split process necessary for the attack was initiated at least 7 days prior to the incident.

The first exploit alone would have been economically unviable (the attacker would have needed to put up 1/20th of the stolen amount upfront in the original DAO) and the second alone would have been time intensive, because normally only one splitDAO call could be made per transaction.

One way to see this is that the attacker performed a fraud, or a theft. Another way, more interesting for its implications, is that the attacker took the contract literally and followed the rule of code.

In their (allegedly) own words:

@stevecalifornia on Hacker News – https://news.ycombinator.com/item?id=11926150

“DAO, I closely read your contract and agreed to execute the clause where I can withdraw eth repeatedly and only be charged for my initial withdraw.

Thank you for the $70 million. Let me know if you draw up any other contracts I can participate in.

Regards, 0x304a554a310c7e546dfe434669c62820b7d83490″

The HACK

An “attacker” managed to combine two “exploits” in the DAO.

1) The attacker called the splitDAO function recursively (up to 20 times).

2) To make the attack more efficient, the attacker also managed to replicate the incident from the same two addresses, using the same tokens over and over again (approx 250 times).

Quote from “Luigi Renna”: To put this instance in a natural language perspective, the attacker requested of the DAO: “I want to withdraw all my tokens, and before that I want to withdraw all my tokens, and before that I want to… etc.” And be charged only once.

The Code That was Hacked

Below is the now infamous SplitDAO function in all its glory:

function splitDAO(
    uint _proposalID,
    address _newCurator
) noEther onlyTokenholders returns (bool _success) {
    ...
    // Get the ether to be moved. Notice that this is done first!
    uint fundsToBeMoved =
        (balances[msg.sender] * p.splitData[0].splitBalance) /
        p.splitData[0].totalSupply;
    // << This is the line the attacker wants to run more than once
    if (p.splitData[0].newDAO.createTokenProxy.value(fundsToBeMoved)(msg.sender) == false)
        throw;
    ...
    // Burn DAO Tokens
    Transfer(msg.sender, 0, balances[msg.sender]);
    withdrawRewardFor(msg.sender); // be nice, and get his rewards
    // XXXXX Notice the preceding line is critically before the next few
    // (see http://hackingdistributed.com/2016/06/18/analysis-of-the-dao-exploit/)
    totalSupply -= balances[msg.sender]; // THIS IS DONE LAST
    balances[msg.sender] = 0; // THIS IS DONE LAST TOO
    paidOut[msg.sender] = 0;
    return true;
}

The basic idea behind the hack was:

1) Propose a split.
2) Execute the split.
3) When the DAO goes to withdraw your reward, call the function to execute a split before that withdrawal finishes (i.e. recursion).
4) The function will start running again without updating your balance!!!
5) The line we marked above as “This is the line the attacker wants to run more than once“ will run more than once!

Thoughts on the Hack

The code is easy to fix: you can simply zero the balances immediately after calculating fundsToBeMoved, i.e. before the external call is made (a minimal sketch of this ordering fix follows the list below). But this bug is not the real issue for me – the main problem can be split into two areas:

  1. Ethereum’s contract language is Turing complete, with a stack/heap, exceptions and recursion. This means that even with the best intentions, we will create a vast number of routes through the code base that allow similar recursion and other bugs to be exposed. The only barrier really is how much ether it will cost to expose the bugs.
  2. There is no Escape Hatch for “bad” contracts. Emin Gün Sirer wrote a great article on this here: Escape hatches for smart contracts
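For completeness, here is the minimal sketch of the ordering fix mentioned above. This is not the DAO’s actual code – it is a small, modern-Solidity illustration of the “checks-effects-interactions” principle, i.e. zero the caller’s balance before any external call so a re-entrant call finds nothing left to move:

// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

contract SafeWithdraw {
    mapping(address => uint256) public balances;

    function deposit() external payable {
        balances[msg.sender] += msg.value;
    }

    function withdraw() external {
        uint256 amount = balances[msg.sender];
        require(amount > 0, "nothing to withdraw");

        // Effects first: zero the balance *before* interacting externally,
        // so a recursive/re-entrant call into withdraw() sees a zero balance.
        balances[msg.sender] = 0;

        // Interaction last: the external call can no longer replay the withdrawal.
        (bool ok, ) = msg.sender.call{value: amount}("");
        require(ok, "transfer failed");
    }
}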

Whilst a lot of focus is going into 2) – I believe that more focus needs to be put into making the language “safer”. For me, this would involve basic changes like:

  1. Blocking recursion. Blocking recursion at compile time and runtime would be a big step forward. I struggle to see a downside to this, as in general recursion doesn’t scale, is slow and can always be replaced with iteration. Replace Recursion with Iteration
  2. Sandbox parameterisation. Currently there are no sandbox parameters to sit the contracts in. This means the economic impact of these contracts is more empirical than deterministic. If you could abstract core parameters of the contract, like the amount, wallet addresses etc, and place these key values in an immutable wrapper, then unwanted outcomes would be harder to achieve.
  3. Transactionality. Currently there is no obvious mechanism to perform economic functions wrapped in an ATOMIC transaction. This ability would mean that economic values could be copied from the heap to the stack and moved as desired; but critically, recursive calls could not revisit the heap to essentially “duplicate” the value. It would also mean that if a benign fault occurred the execution of the contract would be idempotent. Obviously blocking recursive calls goes some way towards this; but adding transactionality would be a material improvement.

I have other ideas including segregation of duties using API layers between core and contract code – but am still getting my head around Ethereum’s architecture.