How to Optimise Your Technology Team Structure to Improve Flow

I have seen many organisations restructure their technology teams over and over, but whichever model they opt for, they never seem to get the desired results with respect to speed, resilience and quality. For this reason, organisations tend to oscillate between centralised teams, which are organised around skills and reuse, and federated teams, which are organised around products and time to market. This article examines the failure modes of both centralised and federated teams and then asks whether there are better alternatives.

Day 1: Centralisation

Centralised technology teams tend to create frustrated product teams, long backlogs and lots of prioritisation escalations. Quality is normally fairly consistent (it can be consistently bad or consistently good), but speed is generally considered problematic.

These central teams will often institute some kind of ticketing system to coordinate their activities. They will even use tickets to coordinate activities between their own, narrowly focused central teams. These teams will create reports that demonstrate “millions” of tickets have been closed each month, and this will be sold as some form of success or progress. Dependent product teams, on the other hand, will still struggle to get their workloads into production and will frequently escalate the bottlenecks created by the centralised approach.

Central teams tend to focus on reusability, creating large consolidated central services and cost compression. Their architecture will tend to create massive “risk concentrators”, whereby they reuse the same infrastructure for the entire organisation. Any upgrade to these central services tends to be a life-threatening event, making things like minor version changes and even patching extremely challenging. These central services will have poorly designed logical boundaries, which means that “bad” consumers of these shared services can create saturation outages that affect the entire enterprise. These teams will be comfortable with mainframes and large physical datacenters, and will have a poor culture of learning. The technology stack will be at least 20 years old and you will often hear the term “tried and tested”. They will view change as a bad thing and will create reports showing that change is causing outages. They will periodically suggest slow-downs, or freezes, to combat the great evil of “change”. There will be no attempt made to get better at delivering change and everything will be described as a “journey”. It will take years to get anything done in this world and the technology stack will be a legacy, expensive, immutable blob of tightly coupled vendor products.

Day 2: Let’s Federate!

Eventually delivery pressure builds and the organisation capitulates into the chaotic world of federation. This is quickly followed by an explosion in headcount, as each product team attempts to reach the utopian state of “end-to-end autonomy”. End-to-end autonomy is the equivalent of absolute freedom – it simply does not and cannot exist. Why can’t you have an absolute state of full autonomy? It turns out that unless you’re a one-product startup, you will have to share certain services and channels with other products. This means that any single product’s “autonomy”, expressed in any shared channel/framework, ends up becoming another product’s “constraint”.

A great example of this is a client-facing channel, like an app or a website. Imagine if you carved up a channel into little product-sized pieces. Imagine how hard it would be to support your clients – where do you route support queries? Even something basic, like trying to keep the channel available, would be difficult, as there is no single team stopping other teams from injecting failure planes and vendor SDKs into critical shared areas. In this world, each product does what it wants, when it wants, how it wants, and the channel ends up yielding an inconsistent, frustrating and unstable customer experience. No product team will ever get around to dealing with complex issues like fraud, cyber security or even basics like observability. Instead they will naturally chase PnL. In the end, you will have to resort to using social media to field complaints and resolve issues. Game theory describes this as the “Tragedy of the Commons” – it’s these common assets that die in the federated world.

In the federated world, the lack of scope for staff results in ballooning headcount and in roles being aggregated across multiple disciplines for the staff you have managed to hire. Highly skilled staff tend to get very bored with their “golden cages” and search out more challenging roles at other companies.

You will see lots of key-man risk in this world. Because product teams can never fully fund their end-to-end autonomy, you will commonly see a single individual who looks after networking, DBA, storage and cyber. When this person eventually resigns, the risks from their undocumented “tactical fixes” quickly start to materialise as outages start to take a grip. You will also struggle to hire highly skilled resources into this model, as the scope of the roles you advertise is restrictively narrow, e.g. a DBA to look after a single database, or a senior UX person to look after 3 screens. Obviously, if you manage to hire a senior UX person, you can then show them the 2 databases you also want them to manage 😂

If not this, then what?

Is the issue that we didn’t try hard enough? Maybe we should have given the model more time to bear fruit? Maybe we didn’t get the teams’ buy-in? So what am I saying? I am saying that BOTH federated and centralised models will eventually fail, because they are extremes. These are not the only choices on the table; there are a vast number of states in between, depending on your architecture, the size of your organisation and the pools of skills you have.

Before you start tinkering with your organisation’s structure, it’s key that you agree on what the purpose of that structure actually is. Specifically – what are you trying to do and how are you trying to do it? Centralists will argue that economies of scale and better quality are key, but federation proponents will point to time to market and speed. So how do you design your organisation?

There are two main parameters that you should try to optimise for:

  1. Domain Optimisation: Design your structure around people and skills (from the centralised model). Give your staff enough domain / scope to be able to solve complex problems and add value across multiple products in your enterprise. The benefit of teams with wide domains is that you can put your best resources on your biggest problems. But watch out, because as the domain of each team increases, so will the dependencies on that team.
  2. Dependency Optimisation: Design your structure around flow/output by removing dependencies and enabling self service (from the federated model). Put simply, try to pay down dependencies by changing your architecture to enable self service such that product teams can execute quickly, whilst benefiting from high quality, reusable building blocks.

These two parameters are antagonistic, with your underlying architecture being the lever to change the gradient of the yield.

Domain Optimisation

Your company cannot be successful if you narrow the scope of your roles down to a single product. Senior, skilled engineers need domain, and companies need to make sure that complicated problems flow to those who can best solve them. Having myopically scoped roles not only balloons your headcount, it also means that your best staff might be sat on your easiest problems. More than this, how do you practically hire the various disciplines you need if your scope is restricted to a single product that might only occasionally use a fraction of their skills?

You need to give staff from a skill / discipline enough scope to make sure they are stretched and that you’re getting value from them. If this domain creates a bottleneck, then you should look to fracture these pools of skills by creating multiple layers: keep one team close to operational workloads (to reduce dependencies) and a second team looking at more strategic problems. For example, you can have a UX team look after a single channel, but also have a strategic UX team look after longer-dated / more complex UX challenges (like peer analysis, telemetry insights, redesigns etc).

Dependency Optimisation

As we already discussed, end-to-end autonomy is a bogus construct. But teams should absolutely look to shed as many dependencies as possible, so that they can yield flow without begging other teams to do their jobs. There are two ways of reducing dependency:

  1. Reduce the scope of a role and look at creating multiple pools of skills with different scopes.
  2. Change your technology architecture.

Typically only item 1) is considered, and this is the crux of this article. Performing periodic org structure rewrites simply gives you a new poison to swallow. This is great, maybe you like strawberry flavoured arsenic! But my question is, why not stop taking poison altogether?

If you look at the anecdotal graph below you can see the relationship between domain / scope and dependency. This graph shows you that as you reduce domain you reduce dependency. Put simply, the lower your dependencies the more “federated” your organisation is, and the more domain your staff have the more “centralised” your organisation is.

What you will also observe is that poorly architected systems exhibit a “dependency cliff”. This means that even as you reduce the scope of your roles, you will not see any dependency benefit, because your systems are so tightly coupled that no amount of org structure gymnastics will give you lower dependencies. If you attempt to carve up shared systems that are exhibiting a dependency cliff, you will have to hire more staff, and you will see more outages, less output and more escalations.
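
For illustration only – this is not the original graph, just a sketch of the relationship it describes – the snippet below plots dependencies rising with domain, with a tightly coupled architecture flattening into a dependency floor that no amount of role-narrowing gets you under:

```python
# Illustrative sketch only (assumed curve shapes, not real data): dependencies
# grow with domain/scope, and a tightly coupled architecture imposes a floor
# (the "dependency cliff") that shrinking role scope cannot get you below.
import numpy as np
import matplotlib.pyplot as plt

domain = np.linspace(0, 1, 100)                  # 0 = narrow roles, 1 = broad domains

well_architected = domain ** 2                   # dependencies fall away as scope narrows
dependency_floor = 0.6                           # assumed floor created by tight coupling
tightly_coupled = np.maximum(domain ** 2, dependency_floor)

plt.plot(domain, well_architected, label="self-service architecture")
plt.plot(domain, tightly_coupled, label="tightly coupled (dependency cliff)")
plt.xlabel("Domain / scope per team")
plt.ylabel("Dependencies on other teams")
plt.legend()
plt.show()
```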

To resolve any dependency cliffs, you have a few choices:

  1. De-aggregate/re-architect workloads. If all your products sit in a single core banking platform, then do NOT buy a new core banking platform. Instead, rearchitect these products to separate the shared services (e.g. ledger) from the product services (e.g. home loans). This is a complex topic and needs a detailed debate.
  2. Optimise workloads. Acknowledge that a channel or platform can be a product in its own right and ensure that most of the changes product teams want to make on a channel / platform can be automated. Create product-specific pipelines, create product enclaves (e.g. PWAs), and give product teams the ability to store and update state on the channel without having to go through full test and release cycles.
  3. Ensure any central services are open-sourced. This will enable product teams to contribute changes rather than feeling “captive” to the cadence of central teams.
  4. Deliver all shared services with REST APIs so that they can be consumed self-service (a minimal sketch follows below).
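
As a minimal sketch of what item 4) can look like in practice – the framework, service name and endpoints below are illustrative assumptions, not a reference design – a shared service published as a REST API lets product teams integrate against a contract instead of raising tickets:

```python
# Minimal sketch (FastAPI assumed for illustration) of a shared "ledger" service
# exposing a self-service REST API, so product teams can post entries and read
# their own data without queueing behind a central team's backlog.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="shared-ledger")


class LedgerEntry(BaseModel):
    product: str        # consuming product team, e.g. "home-loans" (hypothetical)
    account_id: str
    amount_cents: int


ENTRIES: list[LedgerEntry] = []  # in-memory store, purely illustrative


@app.post("/entries")
def create_entry(entry: LedgerEntry) -> dict:
    """Any product team can write to the shared ledger via the API."""
    ENTRIES.append(entry)
    return {"status": "accepted", "index": len(ENTRIES) - 1}


@app.get("/entries/{product}")
def list_entries(product: str) -> list[LedgerEntry]:
    """Each product team can read back its own entries, self-service."""
    return [e for e in ENTRIES if e.product == product]
```

The point is not the framework; it is that the shared service publishes a stable contract that product teams can consume on their own cadence.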

The Conclusion

There is no easy win when it comes to org structure, because it’s typically your architecture that drives all your issues, so shuffling people around from one line manager to another will achieve very little. If you want to be successful you will need to look at each service and product in detail and try to remove dependencies by making architectural changes such that product teams can self-serve wherever possible. When you remove architectural constraints, you steepen the gradient of the line and can give your staff broad domains without adding dependencies and bottlenecks.

I’m done with this article. I really thought it was going to be quick to write and I have run out of energy. I will probably do another pass on it in a few weeks. Please DM me any spelling mistakes or anything that doesn’t make sense.


Technologists: Please Stop Asking for Requirements 😎

I think you’re a genius! You found this blog and you’re reading it – what more evidence do I need?! So why do you keep asking others to think for you?

There is a harmful bias built into most technology projects that assumes “the customer knows best”, and this is simply a lie. The customer will know what works and what doesn’t when you give them a product; but that’s not the same as being able to hand over a specification or requirements. Sadly, technologists have somehow been relegated to order takers who are unable to make decisions or move forward without detailed requirements. I disagree.

In general, everyone (including technologists) should fixate on understanding your customers, collaborating across all disciplines, testing ideas with customers, making decisions and executing. If you get it wrong, learn, get feedback, fix issues, then rinse and repeat. If you are going through a one-way door or making a big call, then by all means validate. But don’t forget that you’re a genius and you work with other geniuses. So stop asking for requirements, switch your brain on and show off your unfiltered genius. You may even meet requirements that your customers haven’t even dreamt of!

Many corporate technology teams are unable to operate without an analyst to gather, collate and serve up pages of requirements. This learnt helplessness is problematic. There are definitely times, especially on complex projects, where analysts working together with technologists can create more focus and speed up product development. But there is also a balance to be found: technology teams should feel confident to ideate solutions themselves.

Finally, one of the biggest causes of large delays on technology workstreams is the lack of challenge around requirements. If your customer wants an edge-case feature that’s extremely difficult to build, then you should consider delaying it or even not doing it at all. Try to find a way around complex requirements, develop other features, or evolve the feature into something that is deliverable. Never get bogged down on a requirement that will sink your project. You should always have far more features than you can ever deliver, so if you deliver everything your customer wanted, there is an argument to say this is wasteful and indulgent. You will also be constantly disappointed when your customer changes their mind!


Definition: Bonuscide

bonuscide

noun

Definition of bonuscide:

Bonuscide is a term used to describe incentive schemes that progressively poison an organisation by ensuring the flow of discretionary pay does not serve the organisation’s goals. These schemes can be observed in two main ways: the loss of key staff or the reduction of the client/customer base.

Bonuscide becomes more observable during a major crisis (for example Covid-19). Companies that practise it create self-harm by amplifying the fiscal impact of the crisis on a specific population of staff who are key to the company’s success. For example, legacy organisations will tend to target skills that the board or exco don’t understand, disproportionately targeting their technology teams whilst protecting their many layers of management.

The kinds of symptoms that will be visible are listed below:

  1. Rolling Downside Metrics: A metric will be used to reduce the discretionary pay pool, even though this metric was never previously used as an upside metric. If at some future stage the metric becomes favourable, see 2).
  2. Pivot Upside Metrics: If the financial measure that was chosen in 1) improves in the future, a new/alternative unfavourable financial measure will be substituted.
  3. Status Quo: Discretionary pay will always favour the preservation of the status quo. Incentives will never flow to those involved in execution or change, because these companies are governed by Pournelle’s Iron Law of Bureaucracy.
  4. Panic Pay: Companies that practise bonuscide are periodically forced to carry out poorly thought-through emergency incentives for their remaining staff. This creates a negative selection process (whereby they lock in the tail performers after losing their top talent).
  5. Trust Vacuum: Leaders involved in managing this pay process will feel compromised, as they know that the trusted relationship with their team will be indefinitely tainted.
  6. Business Case: The savings generated by the reduced discretionary compensation will be a small fraction of the additional costs and revenue impact that the cut will create. This phenomenon is well covered in my previous post on Constraint Theory.

Put simply, if a business case was created for this exercise, it wouldn’t see the light of day. The end result of bonuscide is the creation of a corporate trust / talent vacuum that leads to significant long term harm and brand damage.


Part 2: Increasing your Cloud consumption (the sane way)

Introduction

This article follows on from the “Cloud Migrations Crusade” blog post…

A single-tenancy datacenter is a fixed-scale, fixed-price service on a closed network. The costs of the resources in the datacenter are divided up and shared out to the enterprise constituents on a semi-random basis. If anyone uses fewer resources than forecast, this generates waste which is shared back to the enterprise. If there is more demand than forecast, it will generate service degradation, panic or an outage! This model is clearly fragile and doesn’t respond quickly to change; it is also wasteful, as it requires a level of overprovisioning based on forecast consumption (otherwise you will experience delays to projects, service degradation or reduced resilience).

Cloud, on the other hand, is a multi-tenanted, on-demand software service which you pay for as you use it. But surely having multiple tenants running on the same fixed capacity actually increases the risks? And just because it’s in the cloud doesn’t mean you can get away without overprovisioning – so who sits with the overprovisioned costs? The cloud providers have to build this into their rates. They have to manage a balance sheet of fixed capacity shared amongst customers running on-demand infrastructure, and they do this with very clever forecasting, very short provisioning cycles, and by asking their customers for forecasts and then offering discounts for pre-commits.

Anything that moves you back towards managing resource levels and forecasting will destroy a huge portion of the value of moving to the cloud in the first place. For example, if you have ever been to a re:Invent you will be floored by the rate of innovation and also by how easy it is to absorb these new innovative products. But wait – you just signed a 5yr cost commit and now you learn about Aurora’s new serverless database model. You realise that you can save millions of dollars; but you have to wait for your 5yr commits to expire before you adopt it – or maybe start mining bitcoin with all your excess commits! This is anti-innovation and anti-customer.

What’s even worse is that pre-commits are typically signed up front on day 1 – this is total madness! At the point where you know nothing about your brave new world, you use the old costs as a proxy to predict the new costs so that you can squeeze out a lousy 5px saving at the risk of 100px of the commit size! What you will start to learn is that your cloud success is NOT based on the commercial contract that you sign with your cloud provider; it’s actually based on the quality of the engineering talent that your organisation is able to attract. Cloud is an IP war – it’s not a legal/sourcing war. Allow yourself to learn; don’t box yourself in on day 1. When you sign the pre-commit you will notice your first-year utilisation projections are actually tiny, and therefore the savings are small. So what’s the point of signing so early on, when the risk is at a maximum and the gains are at a minimum? When you sign this deal you are essentially turning the cloud into a “financial data center” – you have destroyed the cloud before you even started!
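
To make the asymmetry concrete, here is a back-of-envelope sketch; the commit size, discount and utilisation figures are purely illustrative assumptions, not numbers from any real deal:

```python
# Back-of-envelope sketch (illustrative numbers only) of the asymmetry described
# above: a small discount on a large, uncertain multi-year commit.
commit_total = 50_000_000        # assumed 5-year pre-commit, in dollars
discount_rate = 0.05             # the "lousy 5px" discount on committed spend
actual_usage_fraction = 0.60     # assumption: only 60% of the commit is consumed

best_case_saving = commit_total * discount_rate
unused_commit_still_payable = commit_total * (1 - actual_usage_fraction)

print(f"Best-case saving:            ${best_case_saving:,.0f}")
print(f"Unused commit still payable: ${unused_commit_still_payable:,.0f}")
# With these assumptions the downside dwarfs the discount, which is the point of
# "risking 100px of the commit size for a 5px saving".
```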

A Lesson from the Field – Solving a Hadoop Compute Demand Spike

We moved 7,000 cores of burst compute to AWS to solve an on-premise capacity issue. That’s expensive, so let’s “fix the costs”! We could go and sign an RI (reserved instance), play with spot, buy savings plans or even beg/barter for some EDP relief. But instead we plugged the service usage into QuickSight and analysed the queries. We found one query was using 60 percent of the entire bank’s compute! Nobody confessed to owning the query, so we just disabled it (if you need a reason for your change management, describe the change as “disabling a financial DDoS”). We quickly found the service owner and explained that running a table scan across billions of rows to return a report with just last month’s data is not a good idea. We also explained that if they didn’t fix it we would start billing them in 6 weeks’ time (a few million dollars). The team deployed a fix and now we run the bank’s big data stack at half the cost – just by tuning one query!
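
For illustration only – the engine, paths, table and column names below are hypothetical, since the original query is not shown – this is roughly what the anti-pattern and the fix look like:

```python
# Hypothetical PySpark sketch of the anti-pattern above: a one-month report that
# forces a scan of the full multi-billion-row history, versus a partition-pruned read.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("monthly-report").getOrCreate()

# Anti-pattern: unpartitioned data, so the one-month filter still scans everything.
txns = spark.read.parquet("s3://datalake/transactions_unpartitioned/")
report_expensive = (
    txns.filter(F.col("txn_date").between("2020-06-01", "2020-06-30"))
        .groupBy("branch")
        .count()
)

# Cheaper: store the data partitioned by month and read only the slice needed,
# so the engine prunes every other partition instead of scanning the whole table.
june = spark.read.parquet("s3://datalake/transactions/txn_month=2020-06/")
report_cheap = june.groupBy("branch").count()
```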

So the point of the above is that there is no substitute for engineering excellence. You have to understand and engineer the cloud to win; you cannot contract yourself into the cloud. The more contracts you sign, the more failures you will experience. This leads me to Step 2…

Step 2: Training, Training, Training

Start the biggest training campaign you possibly can – make this your crusade. Train everyone: business, finance, security, infrastructure – you name it, you train it. Don’t limit what anyone can train on; training is cheap – feast as much as you can. Look at Udemy, A Cloud Guru, YouTube, Whizlabs, etc. If you get this wrong then you will find your organisation fills up with expensive consultants and bespoke migration products that you don’t need and can easily do yourself, via open source or with your cloud provider’s toolsets. In fact I would go one step further – if you’re not prepared to learn about the cloud, you’re not ready to go there.

Step 3: The OS Build

When you do start your cloud migration and begin to review your base OS images, go right back to the very beginning and remove every single product from these base builds. Look at what you can get out of the box from your cloud provider and really push yourself hard on what you actually need versus what is merely nice to have. The trick is that to get the real benefit from a cloud migration, you have to start by making your builds as “naked” as possible. Nothing should move into the base build without a good reason, and ownership or reporting lines are not a good enough reason for someone’s special “tool” to make it into the build. This process, if done correctly, should deliver between 20px and 40px of your cloud migration savings. Do this badly and your costs, complexity and support will all head in the wrong direction.

Security HAS to be a first-class citizen of your new world. In most organisations this will likely make for some awkward cultural collisions (control and ownership vs agility) and some difficult dialogues. The cloud, by definition, should be liberating – so how do you secure it without creating a “cloud bunker” that nobody can actually use? More on this later… 🙂

Step 4: Hybrid Networking

For any organisation with data centers – make no mistake, if you get this wrong it’s over before it starts.


The Least Privileged Lie

In technology, there is a tendency to solve a problem badly using gross simplification, come up with a catchy one-liner and then broadcast it as doctrine or a principle. Nothing ticks more boxes in this regard than the principle of least privilege. The ensuing enterprise-scale deadlocks created by a crippling implementation of least privilege are almost certainly lost on its evangelists. This blog will try to put an end to the slavish efforts of the many security teams that are trying to ration out micro-permissions and hope the digital revolution can fit into some break-glass approval process.

What is this “Least Privilege” thing? Why does it exist? What are the alternatives? Wikipedia gives you a good overview of it here. The first line contains an obvious and glaring issue: “The principle means giving a user account or process only those privileges which are essential to perform its intended function”. Here the principle is being applied equally to users and processes/code, and it states that we should only give privileges that are essential. In other words, it is trying to say that we should treat human beings and code as the same thing and only give humans “essential” permissions. Firstly, who on earth figures out where the bar for “essential” sits, and how do they ascertain what is and is not essential? Do you really need to use storage? Do you really need an API? If I give you an API, do you need Puts and Gets?

Human beings are NOT deterministic. If I have a team of humans that can operate under the principle of least privilege then I don’t need them in the first place – I can simply replace them with some AI/RPA. Imagine the brutal pain of a break-glass activity every time someone needed to do something “unexpected”. “Hi boss, I need to use the bathroom on the 1st floor – can you approve this? <Gulp> Boss, you took too long… I no longer need your approval!”. Applying least privilege to code would seem to make some sense, BUT only if you never updated the code; and if you did update the code, you would need to make sure you have 100px test coverage.

So why did some bright spark want to duct-tape the world to such a brittle, pain-yielding principle? At the heart of this are three issues: Identity, Immutability and Trust. If there are other ways to solve these issues, then we don’t need the pain and risk of trying to implement something that will never actually work, that creates friction and, critically, creates a false sense of security. Least privilege will never save anyone; you will just be told that if you could have performed this security miracle then you would have been fine. But you cannot, and so you are not.

What’s interesting to me is that the least privileged lie is so widely ignored. For example, just think about how we implement user access. If we truly believed in least privilege then every user would have a unique set of privileges assigned to them. Instead, because we acknowledge this is burdensome, we approximate the privileges that a user will need using policies which we attach to groups. The moment we add a user to one of these groups, we are approximating their required privileges and starting to become overly permissive.
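
A toy sketch of that drift, with made-up permission names: the union of a group’s policies almost always exceeds what any single member strictly needs.

```python
# Toy illustration (hypothetical permission names): group membership grants the
# union of the group's policies, which is wider than any one user's actual needs.
group_policy = {"s3:GetObject", "s3:PutObject", "dynamodb:Query", "kms:Decrypt"}
user_actually_needs = {"s3:GetObject", "dynamodb:Query"}

excess = group_policy - user_actually_needs
print(f"Granted but not needed: {sorted(excess)}")
# -> Granted but not needed: ['kms:Decrypt', 's3:PutObject']
# Strict least privilege would deny these, but group-based access grants them anyway.
```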

Let’s be clear with each other: anyone trying to implement least privilege is living a lie. The extent of the lie normally only becomes clear after the event. So this blog post is designed to re-point energy towards sustainable alternatives that work, and additionally to remove the need for the myriad micro-permissive handbrakes (that routinely get switched off to debug outages and issues).

Who are you?

This is the biggest issue and still remains the largest risk in technology today. If I don’t know who you are, then I really, really want to limit what you can do. Experiencing a root/super-user account takeover is a doomsday scenario for any organisation. So let’s limit the blast zone of these accounts, right?

This applies equally to code and humans. For code this problem was solved a long time ago, and if you look…

Is this really my code?


The Triplication Paradigm


Introduction

In most large corporates, technology will typically report into either finance or operations. This means that it will tend to be subject to cultural inheritance, which is not always a good thing. One example of where the cultural default should be challenged is when managing IP duplication. In finance or operations, duplication rarely yields any benefits and will often result in unnecessary costs and/or inconsistent customer experiences. Because of this, technology teams will tend to be asked to centrally analyse all incoming workstreams for convergence opportunities. If any seemingly overlapping effort is discovered, it will typically be extracted into a central, “do it once” team. Experienced technologists will likely remark that it generally turns out that the analysis process is very slow, the overlaps are small, the costs of extracting them are high, additional complexity is introduced, backlogs become unmanageable, testing the consolidated “Swiss army knife” product is problematic and, critically, the teams are reduced to crawling speed as they try to transport context and requirements to the central delivery team. I have called the above process “Triplication”, simply because it creates more waste and costs more than duplication ever could (and also because my finance colleagues seem to connect with this term).

The article below attempts to explain why we fear duplication and why slavishly trying to remove all duplication is a mistake. Having said this, a purely federated model or abundant resource model with no collaboration leads to similarly chronic issues (I will write an article about “Federated Strangulation” shortly).

The Three Big Corporate Fears

  1. The fear of doing something badly.
  2. The fear of doing something twice (duplication).
  3. The fear of doing nothing at all.

In my experience, most corporates focus on fears 1) and 2). They will typically focus on layers of governance, contractual bindings, interlocks and magic metric tracking (SLAs, OLAs, KPIs, etc.). The governance is typically multi-layered, with each forum meeting infrequently and ingesting the data in a unique format (no sense in not duplicating the governance overhead, right?!). As a result, these large corporates typically achieve fear 3) – they will do nothing at all.

Most start-ups/tech companies worry almost exclusively about 3) – and as a result they achieve a bit of 1) and 2). Control is highly federated, decision trees are short, and teams are self-empowered and self-organising. Dead ends are found quickly, and bad ideas are cancelled or remediated as the work progresses. Given my rather biased narrative above, it won’t be a surprise to learn that I believe 3) is the greatest of all evils. To allow yourself to be overtaken is the greatest of all evils; to watch a race that you should be running is the most extreme form of failure.

For me, managed duplication can be a positive thing. But the key is that you have to manage it properly. You will often see divergence and consolidation in equal measure as the various workstreams mature. The key to managing duplication is to enforce scarcity of resources and collaboration. Additionally, you may find that a decentralised team becomes conflicted when it is asked to manage multiple business units’ interests. This is actually success! It means the team has created something that has been virally absorbed by other parts of the business – it means you have created something that’s actually good! When this happens, look at your contribution options; sometimes it may make sense to split the product team up into several business-facing teams and a core platform engineering team. If, however, there is no collaboration and an abundance of resources is thrown at every problem, you end up with material and avoidable waste. Additionally, observe exactly what you’re duplicating – never duplicate a commodity and never federate data. You also need to avoid a snowflake culture and make sure that, where it makes sense, you are trying to share.

Triplication happens when two or more products are misunderstood to be “similar” and an attempt is made to fuse them together. The over-aggregation of your product development streams will yield most of the below:

1) Cripplingly slow and expensive to develop.

2) Risk concentration/instability. Every release will cause trauma to multiple customer bases.

3) Unsupportable. It will take you days to work out what went wrong and how on earth you can fix the issue as you will suffer from Quantum Entanglement.

4) Untestable. The complexity of the product will guarantee each release causes distress.

5) Low grade client experience.

Initially these problems will be described as “teething problems”. After a while it becomes clearer that the problem is not fixing itself. Next you will likely start the “stability” projects. A year or so later, after the next pile of cash is burnt, there will be a realisation that this is as good as it gets. At this point, senior managers start to see the writing on the wall and will quickly distance themselves from the product. Luckily for them, nobody will likely remember exactly who in the many approval forums thought this was a good idea in the first place. Next the product starts to get linked to the term “legacy”. The final chapter for this violation of common sense is the multi-year decommissioning process. BUT – it’s highly likely that the strategic replacement contains the exact same flaws as the legacy product…

The Conclusion

To conclude, I created the term “Triplication” because I needed a way to succinctly explain that things can get worse when you lump them together without a good understanding of why you’re doing it. I needed a way to help challenge statements like, “you have to be able to extract efficiencies if you just lump all your teams together”. This thinking is equivalent to saying: “Hey, I have a great idea! We ALL like music, right? So let’s save money – let’s go buy a single CD for all of us!”

The reality for those who have played out the triplication scenario in real life is that costs balloon, progress grinds to a halt, revenues fall off a cliff and the final step in the debacle is usually a loss of trust – followed by the inevitable outsourcing pill. On the other hand, collaboration, scarcity, lean, quick MVPs, shared learning, cloud, open source, common rails and internal mobility are the friends of fast deliverables, customer satisfaction and, yes, low costs!


Part 1: The Great Public Cloud Crusade…

“Not all cloud transformations are created equally…!”

The cloud is hot… not just a little hot, but smokin’ hot!! Covid is messing with the economy, customers are battling financially, the macro-economic outlook is problematic, vendor costs are high and climbing, and security needs more investment every year. What on earth do we do?! I know… let’s start a crusade – let’s go to the cloud!!!

Cloud used to be just for the cool kids, the start-ups, the hipsters… but not anymore; now the corporates are coming, and they are coming in droves. The cloud transformation conversation is playing out globally across almost all sectors, from healthcare to pharmaceuticals and finance. The hype and urban legends around public cloud are creating a lot of FOMO.

For finance teams under severe cost pressure, the cloud has to be an obvious place to seek out some much-needed pain relief. CIOs are giving glorious on-stage testimonials, declaring victory after having gone live with their first “bot in the cloud”. So what is there to blog about – it’s all wonderful, right…? Maybe not…

The Backdrop…

Imagine you’re a CIO or CTO. You haven’t cut code for a while, or maybe you have a finance background. Anyway, your architecture skills are a bit rusty/vacant, you have been outsourcing technology work for years, you are awash with vendor products, all the integration points are “custom” (aka arc-welded) and hence your stack is very fragile. In fact it’s so fragile you can trigger outages when someone closes your datacentre door a little too hard! Your technology teams all have low/zero cloud knowledge and now you have been asked to transform your organisation by shipping it off to the cloud… So what do you do???

Lots of organisations believe this challenge is simply a case of finding the cheapest cloud provider, writing a legal document and some SLAs, and finding a vendor who can whiz your servers into the cloud – then you simply cut a cheque. But the truth is the cloud requires IP, and if you don’t have IP (aka engineers) then you have a problem…

Plan A: Project Borg

This is an easy problem – right? Just ask the AWS Borg to assimilate you! The “Borg” strategy can be achieved by:

  1. Install some software agents in your data centers to come up with a total thumb suck on how much you think you will spend in the cloud. Note: your lack of any real understanding of how the cloud works should not ring any warning bells.
  2. Factor down this thumb suck using another made up / arbitrary “risk factor”.
  3. Next, sign an intergalactic cloud commit with your cloud provider of choice and try to squeeze more than a 10px discount out for taking this enormous risk.
  4. Finally pick up the phone to one of the big 5 consultants and get them to “assimilate” you in the cloud (using some tool to perform a bitwise copy of your servers into the cloud).

Before you know it you’re peppering your board and excos with those ghastly cloud packs, you are sending out group-wide emails with pictures of clouds on them, and you are telling your teams to become “cloud ready”. What’s worse, you’re burning serious money, as the consultancy team you called in did the usual land-and-expand. But you can’t seem to get a sense of any meaningful progress (and no, a bot in the cloud doesn’t count as progress).

To fund this new cloud expense line you have to start strangling your existing production spending: maybe you run your servers for an extra year or two, strangle the network spend, keep those storage arrays for just a little while longer. But don’t worry – before you know it you will be in the cloud, right??

The Problem Statement

The problem is that public cloud was never about physically hosting your iffy datacentre software with someone else; it was supposed to be about the transformation of that software. The legacy software in your datacentre is almost certainly poisonous, and its interdependencies will be as lethal as they are opaque. If you move it, pain will follow and you won’t see any real commercial benefits for years.

Put another way, your datacentre is the technical equivalent of a swamp. Luckily those lovely cloud people have built you a nice clean swimming pool. BUT don’t go and pump your swamp into this new swimming pool!

Crusades have never given us rational outcomes. You forgot to ask where the customer was in this painful sideways move – what exactly did you want from this? In fact, cloud crusades suffer from a list of oversights, weaknesses and risks:

  1. Actual digital “transformation” will take years to realise (if ever). All you did was change your hosting and how you pay for technology – nothing else actually changed.
  2. Your customer value proposition will be totally unchanged, sadly you are still as digital as a fax machine!
  3. Key infrastructure teams will start realising there is no future for them and start wandering, creating even more instability.
  4. Stability will be problematic as your hybrid network has created a BGP birds nest.
  5. Your company signed a 5-year cloud commit. You took your current tech spend, halved it and then asked your cloud provider to give you discounts on this projected spend. You will likely see around a 10px–15px reduction in your EDP (enterprise discount program) rates, and for this you are taking ENORMOUS downside risks. You’re also accidentally discouraging efficient utilisation of resources in favour of a culture of “ram it in the cloud and review it once our EDP period expires”.
  6. Your balance sheet will balloon, such that you will end up with a cost base not dissimilar to NASA’s, you will need a PhD to diagnose issues and your delivery cadence will be close to zero. Additionally, you will need to create an impairment factory to deal with all your stranded assets.

So what does this approach actually achieve? You will have added a ton of intangible assets by balance-sheeting a bunch of professional fees, you will likely be less stable and even less secure (more on this later), and you know that this is an unsustainable project – the equivalent of an organisational heart transplant. The only people who now understand your organisation are a team of well-paid consultants on a 5x salary multiple, and sadly you cannot stop this process – you have to keep paying and praying. Put simply, cloud mass migration (aka assimilation) is a bad idea – so don’t do it!

The key here is that your tech teams have to transform themselves. Nothing can act on them; the transformation has to come from within. When you review organisations that have been around for a while – they may have had a few mergers, have high vendor dependencies and low technology skills – you will tend to find that the combined/systemic complexity suffers from something similar to Quantum Entanglement. Yet we ask an external agency to unpack this unobservable, irreducible complexity with a suite of tools, and then get expensive external forces to reverse engineer these entangled systems and recreate them somewhere else. This is not reasonable or rational – it’s daft and we should stop doing it.

If not this, then what?

The “then what” bit is even longer than the “not this” bit. So I am posting this as is, and if I get 100 hits I will write up the other way – little by little 🙂

Click here to read the work-in-progress post on another approach to scaling cloud usage…


Running Corporate Technology: Smart vs Traditional

There are two fundamental ways to run technology inside your company (and various states in-between)

There are two fundamentally different ways technology is run inside large organisations.
Engineers feel the difference immediately – usually within their first week.

One model treats engineers as a risk to be controlled.
The other treats engineers as the system designers they actually are.

Everything else – morale, velocity, quality, resilience – is a downstream effect.

Model A — Technology Run By Generic Management

This is the dominant corporate model. It survives not because it works, but because it looks controllable.
1. Decisions are made by non engineers.
Architecture is dictated by people who don’t ship code, debug prod, or carry pagers.
2. Design authority is replaced by committees.
If enough people sign off, the design must be safe — even if it’s incoherent.
3. Requirements are frozen early.
Learning is treated as failure. Discovery is treated as scope creep.
4. Central teams own everything and nothing.
Platforms, tooling, security, release management — all abstracted away from delivery and accountability.
5. Engineers are throughput units.
Work arrives as tickets. Context is stripped out. Ownership is temporary.
6. Documentation is valued over correctness.
A wrong design with a diagram is preferable to a correct one discovered late.
7. Metrics reward appearances, not outcomes.
Green dashboards coexist peacefully with outages, rollbacks and customer pain.
8. Failure is investigated, not absorbed.
Post-mortems focus on who missed a process, not what the system made inevitable.
9. Scaling means adding layers.
When things slow down, coordination is added instead of removing constraints.
10. Release is an event.
Change is risky, infrequent, and stressful — therefore avoided.
11. Tooling lags years behind reality.
Engineers compensate with scripts, side systems and quiet heroics.
12. Abstraction is accidental.
Complexity leaks everywhere, but no one owns simplifying it.
13. Engineers stop caring.
Not because they’re lazy — because the system punishes initiative.
14. Talent retention becomes a mystery.
Leadership blames “the market” instead of the environment they created.
15. Technology becomes brittle.
The system works, until it doesn’t, and no one knows how it actually works anymore.

This is not a skills problem.
This is a system design failure.

Model B — Technology Run By Engineers

This model looks chaotic to outsiders.
To engineers, it feels calm, rational, and fast.
1. Engineers make engineering decisions.
Authority is tied to competence, not org charts.
2. Architecture emerges through use.
Designs evolve based on real constraints, not imaginary futures.
3. Requirements are negotiable.
Learning early is success, not failure.
4. Teams own systems end to end.
Build it, run it, improve it, or simplify it away.
5. Context is preserved.
Engineers understand why something exists, not just what to build.
6. Documentation follows reality.
It’s generated from code, pipelines and contracts — not PowerPoint.
7. Metrics are operationally honest.
Latency, error rates, deploy frequency, recovery time — not vanity KPIs.
8. Failure is expected and designed for.
Systems degrade gracefully. Humans don’t have to improvise heroics.
9. Scaling means removing friction.
Fewer handoffs. Fewer gates. Smaller interfaces.
10. Release is continuous.
Change is small, reversible and routine.
11. Tooling matches how engineers actually work.
CI, observability, infra and security are accelerators, not obstacles.
12. Abstraction is intentional.
Complexity is isolated, owned and constantly questioned.
13. Engineers optimise for long term health.
Because they’ll still be on call for it next year.
14. Talent stays because the work is meaningful.
Engineers grow systems, and themselves.
15. Technology compounds.
Each improvement makes the next one easier.

This model does not require “rockstar engineers”.
It requires trust in engineering judgement.

The Part No One Likes Hearing

Most corporate technology teams are dysfunctional through fear of engineers making mistakes. So organisations replace judgement with process. They replace ownership with governance. They replace learning with compliance.

And then they wonder why:
• Systems are fragile
• Delivery is slow
• Engineers disengage
• Innovation comes from outside

If you don’t let engineers design systems, they’ll eventually stop understanding them. If they stop understanding them, no amount of process will save you.
