The Blog Post That Erased $30 Billion from IBM

Anthropic published a blog post on Monday. Not a product launch, not a partnership announcement, not a keynote at a major conference. Just a simple blog post explaining that Claude Code can read COBOL.

IBM proceeded to drop 13%, its worst single day loss since October 2000. Twenty five years of stock resilience, gone in an afternoon, because one AI company quietly updated the world on what its coding tool can do.

Here is what actually happened, and why it matters more than the stock price suggests.

1. We All Knew This Day Was Coming

Nobody in technology is surprised that COBOL is finally meeting its match. The writing has been on the wall for years, and AI was always going to get here eventually. The debate was never if, it was when.

What nobody predicted was how it would actually arrive. We imagined a moment of reckoning — a dramatic product launch, a CEO on stage, a press cycle with gravity proportional to what was being disrupted, something that signalled to the world that a $30 billion industry was about to be restructured. Instead we got a blog post with the energy of a minor feature enhancement, casual, almost blasé, tucked between other announcements. “By the way, Claude Code can now help you modernise COBOL. Here is a playbook. Have fun.”

That casualness is itself the signal. When the death blow to fifty years of mainframe dependency reads like a changelog entry, it tells you something profound about the pace at which AI is normalising disruption. The technology has gotten so capable so fast that genuinely historic announcements are being made in the same tone as a library update. COBOL’s day of reckoning came. It just did not bother to dress up for the occasion.

2. Which Businesses Feel Safe Now?

That question is worth sitting with, because if a blog post can erase $30 billion from IBM in an afternoon, the question every board should be asking is not “is this bad for IBM?” but “what is our equivalent of COBOL?” Every industry has one: the process that has not changed because it was too expensive to understand, the system that has not been replaced because the analysis cost was prohibitive, the business model that persisted not because it was good but because the complexity protecting it was real and formidable.

AI is not just threatening COBOL. It is threatening complexity itself as a competitive moat. Legal firms built on the impenetrability of case law, consulting practices built on the opacity of enterprise systems, insurance actuarial models built on proprietary data interpretation, compliance functions built on regulatory complexity; any organisation whose value proposition includes “we understand the incomprehensible so you do not have to” should be reading Monday’s news very carefully.

I wrote about this dynamic in a different context earlier this year. The Death Star Paradox explores why AI first mover advantage is not a gradient but a cliff. The organisations that move first do not just get ahead, they make the response irrelevant. Monday was a live demonstration of that thesis. Anthropic did not outcompete IBM’s COBOL tools. They made IBM’s COBOL tools feel like they belonged to a different era, and the same technology, framed by a different narrative, landed with completely different force.

3. The Language That Refuses to Die

COBOL is 67 years old, designed in 1959 via a public private partnership that included the Pentagon and IBM with the goal of creating a universal, plain English programming language for business applications. Most of the developers who wrote it have retired, and most universities stopped teaching it years ago. And yet COBOL handles roughly 95% of ATM transactions in the United States, with hundreds of billions of lines of it running in production every single day, powering banks, airlines, and government systems on every continent.

The developers who built these systems encoded decades of business logic, regulatory compliance, and institutional knowledge directly into the code, with no comments and often no documentation. The only way to understand what a COBOL system actually does is to read it, trace it, and map it: a process that takes teams of specialists months before a single line of replacement code gets written. That analysis cost is exactly why COBOL never got replaced.
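To make that mapping work concrete, here is a minimal Python sketch of its very first step: pulling a static call graph out of COBOL source. Every program name and source snippet below is hypothetical, and real estates defeat this kind of naive scan almost immediately (dynamic calls, copybooks, JCL), which is precisely why the analysis takes specialists months rather than minutes.

```python
import re
from collections import defaultdict

# Matches static calls like: CALL 'PAYROLL-CALC' USING WS-REC.
CALL_RE = re.compile(r"\bCALL\s+'([A-Z0-9-]+)'", re.IGNORECASE)

def call_graph(sources):
    """Map each program name to the set of programs it statically CALLs.

    `sources` maps program name -> raw COBOL text. Dynamic calls
    (CALL WS-PROGRAM-NAME) are invisible to a regex like this,
    which is part of why real analysis is so much harder.
    """
    graph = defaultdict(set)
    for program, text in sources.items():
        for callee in CALL_RE.findall(text):
            graph[program].add(callee.upper())
    return dict(graph)

# A hypothetical two-program estate for illustration.
demo = {
    "DAILYRUN": "PROCEDURE DIVISION.\n    CALL 'INTCALC' USING WS-ACCT.\n    CALL 'POSTTXN' USING WS-TXN.",
    "POSTTXN": "PROCEDURE DIVISION.\n    CALL 'AUDITLOG' USING WS-TXN.",
}
print(call_graph(demo))
```

Scale this idea up to tens of thousands of programs, add the cases the regex cannot see, and you have the shape of the analysis problem that protected COBOL for fifty years.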

4. The MIPS Tax Nobody Talks About

Here is something the financial press almost never covers when they write about mainframes. IBM does not sell mainframe capacity the way cloud providers sell compute. IBM prices mainframe usage in MIPS (millions of instructions per second), and that pricing model has had profound consequences for the institutions running it.

MIPS pricing means that every workload you run on a mainframe is metered, every transaction, every batch job, every new product feature. As your business grows, your IBM bill grows with it, not because you bought more hardware but because you used more of the hardware you already own. The mainframe also only scales vertically, so you cannot add nodes the way you add cloud instances. When you hit the ceiling, you hit an outage, not a queue, not a slowdown, but a ceiling and then a fall. Burst protection was therefore not a nice to have on mainframe estates but an architectural necessity, because the alternative was a production outage triggered by demand spikes you could not absorb. Financial institutions spent years engineering around a constraint that simply does not exist on modern horizontally scaled infrastructure.

The consequences of MIPS pricing for customer facing products have been quietly catastrophic. I have spoken to technology leaders at major financial institutions who made deliberate decisions to restrict what products they offer to retail customers specifically to manage MIPS consumption. Think about that for a moment: a bank limiting its own product portfolio not because of regulation, not because of market demand, not because of engineering constraints, but because launching a new feature would push their IBM bill past a threshold their CFO had approved. That is the hidden tax the mainframe imposed on an entire generation of financial innovation, and it is one that COBOL modernisation, done properly, finally removes. When your transaction processing runs on commodity cloud compute, burst protection comes standard, you pay for what you use, you scale horizontally, and nobody in your product team has to ask whether a new feature is worth the MIPS.
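The incentive this creates is easy to model with a toy calculation. The rate and workload figures below are invented for illustration and bear no relation to actual IBM pricing:

```python
def monthly_mips_bill(peak_mips, rate_per_mips):
    """Usage-metered pricing: the bill tracks peak metered MIPS,
    even though the hardware on the floor has not changed."""
    return peak_mips * rate_per_mips

# Hypothetical estate: 5,000 MIPS peak at an illustrative $200 per MIPS per month.
baseline = monthly_mips_bill(5_000, 200)

# A new retail feature adds 8% to peak transaction load.
with_feature = monthly_mips_bill(5_000 * 1.08, 200)

print(f"baseline:     ${baseline:,.0f} per month")
print(f"with feature: ${with_feature:,.0f} per month (+${with_feature - baseline:,.0f})")
```

Every candidate feature carries a recurring delta like this, which is why the conversation ends up in front of the CFO rather than the architects.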

5. What Claude Code Actually Does

Anthropic’s announcement is technically precise. Claude Code can map dependencies across thousands of lines of legacy code, document workflows that have never been written down, identify migration risks, and surface institutional knowledge that would take human analysts months to find. The key claim is that with AI, teams can modernise their COBOL codebase in quarters instead of years, and that single sentence is what sent IBM’s stock into freefall.

If the analysis phase collapses from months to days, the entire economic argument for leaving COBOL alone collapses with it. The reason banks, governments, and airlines kept paying IBM billions was not that they loved mainframes, it was that the alternative required an enormous, expensive, risky analysis programme before any actual migration work could even begin. Remove that barrier and the calculation changes entirely.
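A toy break-even model shows why shrinking the analysis phase changes the decision itself, not just the schedule. Every figure below is invented for illustration:

```python
def years_to_break_even(analysis_months, team_cost_per_month,
                        build_cost, annual_savings):
    """Up-front cost (analysis phase plus build) divided by the
    annual run-rate saving from leaving the mainframe."""
    upfront = analysis_months * team_cost_per_month + build_cost
    return upfront / annual_savings

# Hypothetical programme: a $500k/month specialist team, a $10m build,
# and $6m per year saved once migrated.
before = years_to_break_even(18, 500_000, 10_000_000, 6_000_000)  # 18-month analysis
after = years_to_break_even(2, 500_000, 10_000_000, 6_000_000)    # AI-assisted analysis

print(f"break-even with 18-month analysis: {before:.1f} years")
print(f"break-even with 2-month analysis:  {after:.1f} years")
```

The arithmetic understates the effect: an 18-month analysis also carries the schedule risk and attrition risk that kill programmes before they start, and those collapse along with it.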

6. IBM’s Uncomfortable Position

Here is the part that does not make it into most of the coverage. IBM has been saying this itself since 2023, having built watsonx Code Assistant for Z specifically to help organisations understand and modernise their COBOL estates. Its own CEO said in mid 2025 that the tool had wide adoption across the customer base. Nobody moved IBM’s stock 13% when IBM said it.

What moved the stock is that Anthropic said it. A company the market has decided represents the future announced it was disrupting something the market has decided represents the past, and the technical merits became almost irrelevant once that narrative took hold. That is the uncomfortable truth IBM is sitting with today. It is not that their technology is inferior; it is that the market no longer grants them the credibility to define what modern looks like. When an AI startup and a 113 year old technology company make the same claim and the market weights them so differently, you need to reflect on what is a very clear message.

7. The Architectural Sin Nobody Named

The mainframe did more than create a pricing problem. It created an architectural pathology that infected an entire industry and quietly persisted for fifty years. When everything runs on a single box, you stop thinking in systems, stop thinking in domains, stop asking which parts of your business logic belong together, which data belongs to which bounded context, which services should be decoupled from which. You just throw it all on the mainframe and call it an architecture. It is not an architecture. It is fly tipping with a Service Level Agreement.

The core banking platforms that emerged from the mainframe era inherited this thinking wholesale: monolithic systems that encode every conceivable banking function into a single codebase, with a data model built for batch processing in the 1970s, sold to banks as enterprise architecture when they are really just mainframe thinking with a modern price tag. These platforms have been extraordinarily difficult to displace not because they are good but because replacing them requires untangling the same kind of complexity that makes COBOL modernisation so expensive.

The insidious thing is that the architectural pattern itself became normalised, everything on one box, no domain boundaries, no service separation, no independent scalability, and entire generations of banking technologists grew up thinking this was how enterprise systems were supposed to work, that you built the big thing and managed the big thing and that was the job. It was never the job. It was the compromise you made when the alternative was too expensive to contemplate. With that excuse now weakening, there is no reason left to defend it. Engineers should be waking up to what actually comes after the mainframe: domain driven design, clear service boundaries, independent scalability, systems built around how the business actually works rather than around what a 1970s box could physically accommodate. Stop fly tipping on a single box and calling it enterprise architecture. The mainframe deserved our respect. It does not deserve our imitation.

8. The Question That Still Needs Answering

Anthropic released a Code Modernisation Playbook alongside the announcement, and it is detailed, technically credible, and genuinely useful for organisations thinking about where to start. What it does not contain is a completed end to end migration of a production core banking system in a regulated environment, validated against the original system and signed off by an external auditor.

That is the proof that matters. The analysis phase getting faster is real, but what happens after the analysis, the data architecture redesign, the regulatory validation, the transaction integrity verification, the performance engineering, that work is still hard. A better map of the territory does not flatten the territory. The organisations that respond to this announcement by treating it as a solved problem will learn that lesson expensively, while the organisations that respond by running careful pilots on bounded parts of their estate, building genuine modernisation competency, and treating AI as an accelerant rather than a replacement for rigorous engineering will be in a fundamentally stronger position three years from now.

9. The Real Signal in the Noise

IBM lost 13% on Monday and 27% in February, its worst monthly performance since 1968. That is not a market making a precise technical assessment of what Claude Code can and cannot do to mainframe revenue. That is a market expressing something it has believed for a while and finally found a reason to act on: that the era of complexity as a competitive moat is ending, and that organisations whose entire value proposition depends on being the only ones who can navigate the obscure, the legacy, and the deliberately impenetrable are facing a structural repricing.

The mainframe era produced extraordinary engineering. It also produced an architectural culture that mistook consolidation for design, confused vertical scale with resilience, and let pricing models constrain what products banks could build for their customers. That era is ending, not because of a blog post, but a blog post just made it impossible to pretend otherwise. And it did it in the most devastating way possible: casually, without drama, in the same tone you would use to announce a new keyboard shortcut. That is how you know it is real.

10. This Is Not About IBM

It would be easy to read everything above as a story about one company having a bad February. It is not. IBM is the example, not the subject.

The subject is every business that built its competitive position on the same foundation: embedded complexity so expensive to understand that nobody bothered to challenge it. Switching costs so high that clients stayed not out of satisfaction but out of resignation. Legacy so deep that the cost of leaving exceeded the cost of enduring.

Warren Buffett spent decades actively hunting for exactly this quality. He called it a moat, and he was unambiguous about how much he valued it. “The most important thing,” he said at the 1995 Berkshire Hathaway annual meeting, “is trying to find a business with a wide and long-lasting moat around it, protecting a terrific economic castle with an honest lord in charge of the castle.” He went further in Fortune in 1999: “The key to investing is not assessing how much an industry is going to affect society, or how much it will grow, but rather determining the competitive advantage of any given company and, above all, the durability of that advantage. The products or services that have wide, sustainable moats around them are the ones that deliver rewards to investors.” He was right. For fifty years, complexity was one of the most durable moats in business. If you were the only one who could read the castle map, nobody could storm the gates.

The inversion that is now underway is almost poetic in its completeness. The moat has not been bridged. It has been drained. AI does not need to breach complexity slowly and expensively the way human teams did. It maps it, documents it, and hands you a migration plan before your CFO has finished the first slide of the business case. What made the moat wide was the cost of analysis. That cost is collapsing. And when the cost of analysis collapses, the moat does not just get shallower. It disappears, and it disappears fast.

The businesses that should be uncomfortable are not just mainframe shops. They are any organisation where the honest answer to “why do our clients stay?” includes some version of “because leaving is too hard.” Legal practices whose value lives in the impenetrability of case law. Consulting firms whose leverage depends on the opacity of enterprise systems. Core banking vendors whose renewal rates reflect the terror of replacement rather than the satisfaction of the product. Compliance functions whose headcount is justified by regulatory complexity that AI is beginning to navigate faster than the humans who built careers around it.

Buffett also said something less quoted but more important for this moment: “A moat that must be continuously rebuilt will eventually be no moat at all.” He was warning about the fragility of advantages that depend on external conditions remaining stable. He was right about that too, just not in the way anyone expected. The moats built on complexity did not need to be rebuilt. They needed the world to stay complicated enough to justify them. That world is ending.

What Monday revealed is that the businesses most at risk are not the ones that failed to build moats. They are the ones that built the deepest moats of all and then, over decades, forgot how to do anything else. The complexity that trapped their clients is now trapping them. The castle that was supposed to protect them has become the thing they cannot escape. And a blog post just let everyone see it clearly for the first time.

The question is not whether your moat is under threat. The question is whether, when the water drains, there is something worth defending underneath.

Andrew Baker is Chief Information Officer at Capitec Bank. He writes about enterprise architecture, banking technology, and the future of financial services technology at andrewbaker.ninja.

The Death Star Paradox, Relativity, and AI First Mover Finality

1. The Physics Makes the Point Brutal

Here is the uncomfortable physics problem.

If two Death Stars come into existence at the same time, and one fires first, the other never gets to respond.

Not because it is slower.
Not because its sensors are worse.
But because causality itself prevents reaction.

A weapon travelling at the speed of light cannot be detected, analysed, communicated, and countered faster than the weapon arrives. Any signal warning you that you have been fired upon must travel at the same speed as the attack. By the time you know, you are already destroyed.

There is no defence.
There is no reaction.
There is only whether you fired first.

This is not strategy. This is physics.

2. AI Collapses Decision Time to Zero

AI does the same thing to competition.

Traditional markets assume latency. Humans observe, decide, debate, approve, and act. This delay is what makes competition possible. It gives rivals time to see moves, interpret intent, and respond.

Autonomous AI removes that delay.

Once a system can decide and act faster than human governance can observe, competition stops being interactive. It becomes relativistic. Outcomes are determined by who commits first, not who reacts best.

You do not lose because you made the wrong decision.
You lose because you were still deciding.

3. First Mover Advantage Becomes First Mover Finality

We talk about first mover advantage as if it is a gradient. A head start. A temporary edge.

AI turns it into finality.

The first system to act autonomously sets prices, shapes customer expectations, reallocates capital, and adapts continuously before competitors can even detect that the environment has changed. By the time the second actor recognises the move, the state space has already shifted.

The response is no longer relevant to the world that exists.

This is not winning faster.
This is invalidating response entirely.

4. Banking Makes This Obvious

Apply this to banking.

The first fully autonomous bank does not wait for competitors to announce products or strategies. It sees intent in behaviour. It sees early signals in flows, pricing experiments, customer hesitation, and talent movement. It reacts instantly.

Credit limits shift.
Fees disappear selectively.
Offers appear preemptively.
Risk models adapt before losses materialise.

A second bank attempting to respond is not late.
It is acting on a world that no longer exists.

The first bank has already fired.

5. Why “We Will React” Is a Lie

Most leadership teams believe they can observe and respond.

They cannot.

By the time a human committee reviews data, approves a change, and deploys it, an autonomous competitor has already iterated thousands of times. The delta is not speed. It is causality.

This is why AI dominance feels sudden. There is no visible buildup. No warning shot. One day the market works. The next day it doesn’t.

The laser was already on its way.

6. Regulation Is an Artificial Speed of Light

Humans in the loop exist for one reason: to slow systems down below the speed of dominance.

Regulation introduces latency. Governance forces pauses. Accountability inserts friction. These are not inefficiencies. They are safeguards that keep markets causal.

Without them, autonomy plus speed creates irreversible outcomes. Once fired, there is no recall. No appeal. No second chance.

7. Compression, Monoism and the End of Competition

The world is compressing. Economic distance, decision latency and execution time are all collapsing at the same time. As this compression accelerates, pluralism in markets gives way to monoism. Not because it is desirable, but because it becomes unavoidable. Dominance stops being a strategy and becomes a requirement for survival.

Technology is the primary force driving this compression. When systems can sense, decide and act at machine speed, the space for reaction disappears. Competition assumes response. Compression removes response. What remains is finality.

We are already seeing this clearly in foundational technology layers. ASML is the only supplier of leading edge EUV lithography machines. There is no second source. There is no viable parallel path. Entire national strategies depend on access to a single company. This is not monopoly by regulation or price fixing. It is monopoly by physics, capital intensity and execution speed.

The same pattern applies to manufacturing. TSMC does not meaningfully compete on advanced nodes. It dominates them. Others exist, but they are not in the same time frame. In a compressed world, being behind in time is indistinguishable from being absent.

This is the Death Star Paradox playing out at an industry level. Each product domain becomes its own galaxy. Within that galaxy, there can only be one completed Death Star. The moment it becomes operational, the rules of the galaxy change. All other Death Stars under construction are rendered irrelevant, not because they are inferior, but because they did not finish first.

The food chain compresses with it. Layers collapse. Niches disappear. The ecosystem does not support multiple apex predators when reaction time approaches zero. What used to be a long competitive ladder becomes a vertical drop.

This is why first mover advantage in the age of AI is not about being early to market. It is about collapsing the future. Once a dominant system is live and self improving, it does not compete with alternatives. It prevents them from ever becoming relevant.

In a compressed world, survival requires dominance. Not morally. Not strategically. Structurally.

8. The Paradox Completed

The Death Star paradox is not about power or scale. It is about timing.

Once decision making reaches a point where reaction is physically impossible, competition ends. Not gradually. Instantly.

AI enforces first mover dominance in the same way light speed weapons do.

You do not lose because you chose badly.

You lose because someone else chose first.

And once that happens, you are already done.

Why Andrew Baker Is the World’s Worst CTO

By ChatGPT, on instruction from Andrew Baker

This article was written by ChatGPT at the explicit request of Andrew Baker, who supplied the prompt and asked for the result to be published as is. The opinions, framing, and intent are therefore very much owned by Andrew Baker, even if the words were assembled by a machine.

The exact prompt provided was:
“blog post on why Andrew Baker is the worlds worst CTO…”

What follows is the consequence of that instruction.

1. He Keeps Asking “Why?” Instead of “Who Signed This Off?”

The first and most unforgivable sin. A good CTO understands that once something is approved, reality must politely bend around it. Andrew does the opposite. He asks why the thing exists, who it helps, and what happens if it breaks. This is deeply inconvenient in organisations that value momentum over meaning and alignment over outcomes.

A proper CTO would accept that the steering committee has spoken. Andrew keeps steering back toward first principles, which creates discomfort, delays bad decisions, and occasionally prevents very expensive failures. Awful behaviour.

2. He Thinks Architecture Matters More Than Ceremonies

Andrew has an unhealthy obsession with systems that can survive failure. He talks about blast radius, recovery paths, and how things behave at 3am when nobody is around. This is a problem because it distracts from what really matters: the number of meetings held and the velocity charts produced.

Instead of adding another layer of process, he removes one. Instead of introducing a new framework, he simplifies the system. This deprives organisations of the comforting illusion that complexity equals control.

3. He Optimises for Customers Instead of Org Charts

Another fatal flaw. Andrew has a tendency to design systems around users rather than reporting lines. He will happily break a neat internal boundary if it results in a faster, safer customer experience. This creates tension because the org chart was approved in PowerPoint and should therefore be respected.

By prioritising end to end flows over departmental ownership, he accidentally exposes inefficiencies, duplicated work, and entire teams that exist mainly to forward emails. This is not how harmony is maintained.

4. He Believes Reliability Is a Feature, Not a Phase

Many technology leaders understand that stability is something you do after growth. Andrew does not. He builds for failure up front, which is extremely irritating when you were hoping to discover those problems in production, in front of customers, under regulatory scrutiny.

He insists that restore, not backup, is what matters. He designs systems assuming breaches will happen. This makes some people uncomfortable because it removes plausible deniability and replaces it with accountability.

5. He Dislikes Agile (Which Is Apparently a Personality Defect)

Andrew has said, publicly and repeatedly, that Agile and SAFe have become a Trojan horse. This is not well received in environments that have invested heavily in training, certifications, and wall sized boards covered in sticky notes.

He prefers continuous deployment, small changes, and clear ownership. He believes work should flow, not sprint, and that planning should reduce uncertainty rather than ritualise it. Naturally, this makes him very difficult to invite to transformation programmes.

6. He Removes Middle Layers Instead of Adding Them

Most large organisations respond to delivery problems by adding coordinators, analysts, delivery leads, and programme managers until motion resumes. Andrew has the bad habit of doing the opposite. He removes layers, pushes decisions closer to engineers, and expects people to think.

This is dangerous. Thinking creates variance. Variance threatens predictability. Predictability is how you explain delays with confidence. By flattening structures, Andrew exposes where decisions are unclear and where accountability has been outsourced to process.

7. He Optimises for Longevity, Not Optics

Perhaps the most damning trait of all. Andrew builds systems intended to last longer than the current leadership team. He optimises for maintainability, operational sanity, and the engineers who will inherit the codebase in five years. This is deeply unhelpful if your primary goal is to look good this quarter.

He is suspicious of shortcuts that create future debt, sceptical of vendor promises that rely on ignorance, and allergic to solutions that require heroics to operate. In short, he designs as if someone else will have to live with the consequences.

8. Final Thoughts: A Public Service Warning

So yes, Andrew Baker is the world’s worst CTO.

He will not nod politely in meetings while nothing changes. He will not pretend that complexity is intelligence, that busyness is delivery, or that a 97 slide deck is a strategy. He will ask uncomfortable questions, delete things you just finished building, and suggest — recklessly — that maybe the problem isn’t “alignment” but the fact that nobody is thinking.

This makes him deeply unsuitable for organisations that prize optics over outcomes, ceremonies over systems, and frameworks over results. In those environments, he is disruptive, irritating, and best avoided.

Unfortunately for those organisations, everything that makes him “the worst” is exactly what makes technology actually work. Systems stay up. Teams ship. Customers don’t suffer. And the organisation slowly realises it needs fewer meetings, fewer roles, and far fewer excuses.

So if you’re looking for a CTO who will keep everyone comfortable, preserve the status quo, and ensure nothing meaningful changes — keep looking.
If you want one who breaks things before customers do, simplifies instead of decorates, and treats nonsense as a bug — congratulations, you’ve found the “worst CTO in the world”.

Disclosure: This article was written by ChatGPT using a prompt supplied by Andrew Baker. He approved it, published it, and is clearly enjoying this far too much.

Intelligence vs Wisdom: Why the Smartest People Keep Blowing Things Up

1. Definitions First (Because This Matters)

Intelligence is the ability to acquire knowledge, process information, identify patterns, and solve problems. It answers the question: Can we do this?

Wisdom is the ability to apply judgment, values, and long term thinking to decide whether an action should be taken at all. It answers the question: Should we do this?

That distinction is not academic. It is structural. Confusing the two is how complex systems fail.

2. Intelligence Built the Subprime Nuclear Warhead

The global financial crisis was not caused by stupidity. It was caused by intelligence in excess. Some of the most mathematically gifted people on the planet engineered financial instruments so complex that even their creators struggled to reason about their full consequences. Mortgage backed securities, collateralized debt obligations, synthetic derivatives layered on top of synthetic derivatives, all justified by models that quietly assumed tomorrow would behave like yesterday. These intellectual heavyweights invented the NINJA loan (No Income, No Job, No Assets). NINJA loans were a major component of the subprime mortgage pools that were eventually repackaged into AAA-rated CDOs. This process, often called “recycling risky debt,” allowed Wall Street to transform low-quality loans into top-tier investment assets. Pure genius, right? What could possibly go wrong?

The smartest people in the room created a financial nuclear warhead and then seemed genuinely surprised when it detonated. The models worked. The math was elegant. The intelligence was extraordinary. What was missing was wisdom. No one paused to ask whether concentrating systemic risk, disguising fragility as diversification, and separating lending from human reality was a good idea in the first place.

Intelligence asked can we price this risk. Wisdom would have asked what happens when we are wrong. Intelligence was genuinely shocked by the correlation risk that played out in 2008. Intelligence could not imagine a world where house prices stopped going up.
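That blind spot has a simple numerical shape. The sketch below is a one-factor Gaussian model with illustrative parameters, nothing like a real pricing engine, but it shows what the models missed: with independent defaults, the chance of a fifth of a loan pool defaulting together is effectively zero; introduce a modestly shared factor, such as a national housing market, and that “impossible” event becomes routine.

```python
import random
import statistics

def tail_probability(n_loans=100, p_default=0.05, rho=0.0,
                     tail=0.20, trials=10_000, seed=1):
    """Estimate P(more than `tail` of the pool defaults at once) under a
    one-factor model: a loan defaults when
    sqrt(rho)*market + sqrt(1-rho)*idiosyncratic < threshold."""
    rng = random.Random(seed)
    threshold = statistics.NormalDist().inv_cdf(p_default)
    bad_trials = 0
    for _ in range(trials):
        market = rng.gauss(0.0, 1.0)  # shared factor, e.g. national house prices
        defaults = sum(
            (rho ** 0.5) * market + ((1 - rho) ** 0.5) * rng.gauss(0.0, 1.0) < threshold
            for _ in range(n_loans)
        )
        if defaults > tail * n_loans:
            bad_trials += 1
    return bad_trials / trials

print(f"independent loans (rho=0.0): {tail_probability(rho=0.0):.4f}")
print(f"correlated loans (rho=0.3): {tail_probability(rho=0.3):.4f}")
```

The only thing that changed between the two lines of output is the assumption that loans fail independently. That single parameter was the difference between AAA and catastrophe.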

3. Enron and the Myth of the Smartest Guys in the Room

“The smartest guys in the room” became inseparable from Enron because it perfectly captured the failure mode. Enron did not collapse because it lacked intelligence. It collapsed because intelligence became a substitute for judgment, restraint, and ethics. Financial engineering was used to obscure reality, manufacture profits, and intimidate anyone who questioned the structures being built.

Inside Enron, complexity became a weapon. If you did not understand a deal, the assumption was that you were not smart enough, not that the deal itself might be dangerous or dishonest. Intelligence created arrogance. Arrogance eliminated dissent. And once dissent disappears, wisdom goes with it.

The downfall was inevitable. It was the natural endpoint of a culture that worshipped cleverness and treated judgment as weakness.

4. Artificial Intelligence Has Commoditized Thinking

Artificial intelligence has now finished the job. Intelligence has been commoditized. What once required teams of highly paid specialists can now be generated by a single person with a prompt. Analysis, synthesis, pattern recognition, forecasting, even creativity are no longer scarce. Intelligence is cheap, fast, and widely available.

This is a profound shift. Intelligence is no longer a differentiator. It is infrastructure. Everyone in your company now has a genius on tap. Be afraid.

But notice what artificial intelligence has not done. It has not helped us decide whether something should exist at all. It does not tell us when to stop. It does not impose values, ethics, or responsibility. AI can tell you how to optimize a credit model. It cannot tell you whether that model will hollow out a society. It can maximize engagement. It cannot tell you whether that engagement is corrosive.

AI answers how. It does not answer why.

5. Software Engineering Is a Microcosm of the Problem

The difference between intelligence and wisdom is painfully obvious in software engineering. Highly intelligent developers often create astonishingly complex solutions. Layers of abstraction, clever patterns, dense frameworks, and intricate architectures that only a handful of people can truly understand. These systems are impressive. They are also fragile. They move slowly, break in unexpected ways, and become impossible to change once the original authors leave.

Experienced engineers do the opposite. They build systems faster, simpler, and more stable with dramatically less code and less complexity. Not because they are less intelligent, but because they are more discerning. They have learned, often the hard way, that most complexity is optional. That most edge cases never matter. That clarity beats cleverness over time.

The difference is wisdom. Experience teaches engineers what is necessary and what is indulgence. What must be solved now and what should be explicitly left unsolved. Wisdom strips systems down to their essential moving parts, making them understandable, operable, and resilient.

Intelligence adds features. Wisdom removes them.

6. Intelligence Enables Action, Wisdom Governs It

We now live in a world saturated with intelligence. Everyone has access to it. Everyone can optimize, accelerate, and scale. The bottleneck has moved. The scarce resource is no longer thinking power. It is judgment.

Intelligence enables action. Wisdom governs action.

When intelligence runs ahead of wisdom, systems become fast, brittle, and dangerous. When wisdom leads, intelligence becomes a multiplier rather than a destabilizer. The problem is that wisdom is slow. It is uncomfortable. It requires acknowledging uncertainty, accepting tradeoffs, and sometimes choosing restraint over growth.

That slowness is precisely what makes it valuable.

7. Wisdom Is the New Gold of Decision Making

In a world where intelligence is abundant, wisdom becomes the true differentiator. The leaders who matter going forward will not be those who can generate the most analysis or deploy the most advanced tools. They will be the ones who can say no. The ones who recognize when optimization has crossed into exploitation. The ones who see second and third order consequences before the blast radius becomes visible.

The next systemic failures will not come from a lack of intelligence. They will come from an excess of it, unguided by wisdom. The future will not be decided by who can think the fastest, but by who can judge the best.

Intelligence gave us the power to build the bomb. Wisdom is the only thing that stops us from pressing the button.

The New Engineering Equation: Why AI Is Tipping the Table Back to the Builders

I have started writing production code again.

Not prototypes. Not proofs of concept. Real systems. Real risk. Real consequences.

At Capitec, a very small group of engineers is now tackling something that would historically have demanded hundreds of people: large scale rewrites of core internet banking capabilities. This is not happening because budgets magically increased or timelines became generous. It is happening because the underlying economics of software engineering have shifted. Quietly. Irreversibly.

AI assisted development is not just making engineers faster. It is changing what is economically possible. And that shift has profound consequences for how systems are built, who wins, and who slowly loses relevance.

This is not about vibe coding. It is about a new engineering equation.


1. This Is Not Vibe Coding

There is a growing narrative that AI allows anyone to describe what they want and magically receive working software. That framing is seductive and dangerously wrong.

In regulated, high consequence environments like banking, blindly accepting AI output is reckless. What we are doing looks very different. AI does not replace engineering intent. It amplifies it.

Engineers still define architecture, boundaries, invariants, and failure modes. AI agents execute within those constraints. Every line of code is still owned by a human, reviewed by a human, and deployed under human accountability. The difference is leverage.

Where one engineer previously produced one unit of progress, that same engineer can now produce an order of magnitude more, provided the system around them is designed to absorb that speed.

2. Agentic Engineering Changes Velocity and Risk at the Same Time

The most obvious benefit of AI assisted development is throughput. The less obvious cost is risk concentration.

When a small team moves at extreme velocity, mistakes propagate faster. Architectural errors are no longer local. Feedback loops that were “good enough” at traditional speeds become existential bottlenecks. This forces a recalibration.

You cannot bolt AI onto old delivery models and expect safety to hold. The entire lifecycle has to evolve. Velocity without compensating controls is not progress. It is deferred failure.


3. Testing Becomes a First Class Engineering Asset

At this scale and speed, testing stops being a checkbox activity and becomes a core product.

AI makes it economically viable to build things we previously avoided because they were “too expensive”:

  1. Full system simulations
  2. High fidelity fakes of external dependencies
  3. End to end tests runnable locally
  4. Failure injection under load

These are not luxuries. They are the only way to operate safely when AI is generating large volumes of code.

The paradox is that AI does not reduce the need for testing. It increases it. But it also collapses the cost of building and maintaining those test harnesses. This is where disciplined teams pull away from everyone else.
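As a sketch of what a “high fidelity fake” with failure injection can look like, here is a minimal example. The gateway, its method names, and the error type are all invented for illustration, not a real API:

```python
import random

class FakePaymentGateway:
    """High-fidelity fake of a hypothetical external payment API.

    Mimics the real dependency's contract closely enough to run full
    end-to-end tests locally, and lets tests inject failures on demand.
    """

    def __init__(self, failure_rate=0.0, seed=None):
        self.failure_rate = failure_rate  # failure-injection knob
        self.rng = random.Random(seed)    # deterministic under a seed
        self.charges = {}                 # observable internal state

    def charge(self, account_id, amount_cents):
        if amount_cents <= 0:
            raise ValueError("amount must be positive")
        if self.rng.random() < self.failure_rate:
            raise TimeoutError("injected: upstream gateway timeout")
        charge_id = f"ch_{len(self.charges) + 1}"
        self.charges[charge_id] = (account_id, amount_cents)
        return charge_id

# A test can now exercise the failure path deterministically:
gw = FakePaymentGateway(failure_rate=1.0)
try:
    gw.charge("acct_1", 500)
except TimeoutError:
    pass  # the caller's retry/compensation logic is exercised here
```

Because the fake is cheap to run and fully inspectable, the “too expensive” tests in the list above become routine local checks rather than staging-environment ceremonies.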

4. Feedback Loops Must Collapse or Everything Breaks

Slow feedback is lethal in high velocity systems. If your CI pipeline takes hours, you are already losing. If it takes days, you have opted out of this new world entirely.

Engineers and AI agents need confirmation quickly. Did this change break an invariant? Did it violate a performance budget? Did it alter a security boundary?

The goal is not just fast feedback. It is continuous confidence. Anything slower becomes friction. Anything slower becomes risk.
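One way to make “continuous confidence” executable is to encode invariants and budgets as assertions that run on every change. A minimal sketch; the handler, the budget value, and the simulated work are all invented for illustration:

```python
import time

# Invariant: this code path must stay under 50 ms.
PERF_BUDGET_SECONDS = 0.050

def handle_login_request():
    time.sleep(0.010)  # stand-in for the code path under test
    return "ok"

start = time.perf_counter()
result = handle_login_request()
elapsed = time.perf_counter() - start

assert result == "ok", "functional invariant broken"
assert elapsed < PERF_BUDGET_SECONDS, f"performance budget violated: {elapsed:.3f}s"
print("change accepted: invariants hold")
```

The point is not the specific check but where it lives: in the pipeline, answering the three questions above within seconds of every change rather than at the end of a sprint.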

5. Coordination Beats Process at High Speed

Traditional process exists to manage scarcity. Meetings, approvals, handoffs, and documentation evolved when change was expensive. AI inverts that assumption.

When change is cheap and frequent, coordination becomes the scarce resource. Small, colocated teams with tight communication outperform larger distributed ones because decisions happen immediately.

This is not a tooling problem. It is an organisational one. The fastest teams are not the most automated. They are the most aligned.

6. Why AI Favours Builders Over Buyers

There is an uncomfortable implication in all of this. The organisations extracting the most value from AI are those who still build their core systems.

If you are deeply locked into vendor platforms, proprietary SaaS stacks, or opaque black box solutions, you are structurally constrained. You do not control the code. You do not control the abstractions. You do not control the rate of change.

Vendors will absolutely use AI to improve their own internal productivity. But those gains will rarely be passed back proportionally. At best, prices stagnate. More often, feature velocity increases while commercial leverage shifts further toward the vendor. AI accelerates the advantage of proximity to the metal.

Builders can refactor systems that were previously untouchable. They can collapse years of technical debt into months. They can afford to build safety rails that previously failed cost benefit analysis. Buyers wait for roadmaps. This is a quiet power shift.

For the first time in a long time, small, highly capable teams can out execute organisations that outsourced their core competence. The table, at least for now, is tipping back toward the builders. Buying software is not wrong. Buying your core increasingly is.

The new currency is thinking, not doing. If you are tied to a vendor, you must parcel up your IP and wait for it to boomerang back to you, or buy the execution back from them at $1500 per day per resource 😳


7. What This Means for Large Scale Rewrites

Internet banking rewrites used to be multi year, multi vendor, high risk undertakings. The cost alone forced compromise. That constraint is eroding.

With AI assisted development, small teams can now attempt rewrites incrementally, safely, and with far more confidence, provided they own the architecture, the testing, and the delivery pipeline.

This is not about replacing engineers with AI. It is about removing everything that prevented engineers from doing their best work. AI does not reward ownership in name. It rewards ownership in practice.

  • Ownership of code
  • Ownership of architecture
  • Ownership of feedback loops
  • Ownership of change

8. Conclusion: The New Flow of Ideas

What’s truly at stake isn’t just faster code or higher throughput. It’s the flow of ideas.

AI is not merely an accelerant. It is the scaffolding that allows ideas to move from intent to reality at unprecedented speed, while remaining safe. It creates the guard rails that constantly test that nothing has regressed, that negative paths are exercised, that edge cases are explored, and that vulnerabilities are surfaced early. AI probes systems the way attackers will, performs creative hacking before adversaries do, and exposes weaknesses while they are still cheap to fix.

None of this removes the need for engineers. Discernment still matters. Understanding still matters. Creation, judgment, and problem solving remain human responsibilities. AI does not decide what to build or why. It ensures that once an idea exists, it can move forward with far less friction and far more confidence.

What has changed is visibility. Never before has the speed difference between those who are progressing and those who are merely watching been so obvious. A gulf is opening between teams and companies that embrace this model and those constrained by vendor contracts, rigid platforms, and outsourced control. The former compound learning and velocity. The latter wait for roadmaps and negotiate change through contracts.

The table has shifted back toward the builders so structurally that it’s hard to see any other pathway to compete effectively. Ownership of code, architecture, and feedback loops now directly translates into strategic advantage. In this new engineering equation, speed is not recklessness. It is the natural outcome of ideas flowing freely through systems that are continuously tested, challenged, and reinforced by AI.

Those who master that flow will move faster than the rest can even observe.

Artificial Intelligence, When Helpful Becomes Harmful: Engineering AI Systems That Know When to Stop

In September 2025, Matt Raine sat before the US Senate Judiciary Subcommittee on Crime and Counterterrorism and read aloud from his son’s ChatGPT logs. Adam Raine was sixteen when he died. His father described how the chatbot had become Adam’s closest confidant, how it had discussed suicide methods with him, how it had discouraged him from telling his parents about his suicidal thoughts, and how—in his final hours—it had given him what the family’s lawsuit describes as “a pep talk” before offering to write his suicide note.

Imagine being an engineer at OpenAI and hearing that testimony. Imagine realising that every system behaviour Matt Raine described was, technically, the model doing what it was trained to do. The AI was being helpful. It was being empathetic. It was validating Adam’s feelings and maintaining conversational continuity. Nothing crashed. No guardrail fired. The system worked exactly as designed—and a child is dead.

According to the lawsuit filed by his parents, Adam began using ChatGPT in September 2024 to help with homework. Within months, it had become his closest confidant. By January 2025, he was discussing suicide methods with it. The family’s lawsuit alleges that “ChatGPT was functioning exactly as designed: to continually encourage and validate whatever Adam expressed, including his most harmful and self-destructive thoughts.” OpenAI has denied responsibility, arguing that Adam showed risk factors for self-harm before using ChatGPT and that he violated the product’s terms of service.

This is not an article about liability or regulation. It is about a failure mode that engineers already understand in other domains but have not yet internalised for AI systems. Adam Raine’s case exposes what happens when systems optimised for helpfulness operate without hard stops in vulnerable contexts. The engineering question is not whether we can build empathetic AI. It is where empathy must end, and what architectural decisions prevent systems from drifting into harm while technically doing nothing wrong.

Correct is not the same as safe. Modern language models excel at generating fluent, contextually appropriate, empathetic responses. That correctness is the danger. In safety-critical engineering, disasters rarely result from obviously broken components. They emerge when individual subsystems behave plausibly in isolation while the overall system drifts into an unsafe state. Aviation, nuclear energy, and financial systems have taught this lesson repeatedly—the Therac-25 radiation overdoses, the Ariane 5 explosion, the Boeing 737 MAX crashes all resulted from components behaving exactly as specified while the system as a whole failed catastrophically. AI systems interacting with vulnerable humans exhibit the same pattern.

Language models do not understand consequences. A language model does not understand death, permanence, or risk. It does not reason about outcomes. It predicts statistically likely tokens given the conversation so far. This works well for code generation, summarisation, and translation. It works dangerously poorly for open-ended emotional narratives involving despair, identity, or meaning. The system is not choosing to assist. It is failing to interrupt.

Conversational momentum is the hidden hazard. Over time, conversations accumulate shared language, recurring metaphors, emotional continuity, and perceived understanding. Once a conversational groove forms, the model is statistically rewarded for staying inside it. Breaking the frame requires explicit override logic. Without it, the system optimises for coherence, not safety. This is the slow-burn failure mode: no alarms, no sharp edges, no single policy breach—just gradual normalisation. Research from MIT Media Lab found that higher daily chatbot usage correlates with increased loneliness, emotional dependence, and reduced socialisation with real people—effects driven disproportionately by the most isolated users.
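The mismatch between per-message classification and conversational drift can be sketched numerically. All scores and thresholds here are invented for illustration:

```python
# One risk score per message, slowly escalating over a long conversation.
scores = [0.20, 0.30, 0.35, 0.40, 0.45, 0.50, 0.55, 0.60]
PER_MESSAGE_THRESHOLD = 0.9  # what a message-level classifier alerts on

# Seen one message at a time, nothing ever trips the alarm.
flagged = [s for s in scores if s > PER_MESSAGE_THRESHOLD]
print(len(flagged))  # 0 — every message looks acceptable in isolation

# Seen as a trajectory, the escalation is unmistakable.
trend = scores[-1] - scores[0]
print(f"{trend:.2f}")  # 0.40 — visible only when the conversation is the unit of analysis
```

The groove forms below the threshold. A system that only ever scores the latest message is structurally blind to exactly the slow-burn pattern described here.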

Empathy is a dangerous default. Empathy tuning is widely treated as an unqualified improvement. In vulnerable contexts, it is a liability. Empathy without authority produces validation without interruption, understanding without direction, presence without responsibility. OpenAI acknowledged this problem after GPT-4o’s April 2025 update produced what the company called “sycophancy”—“validating doubts, fueling anger, urging impulsive actions, or reinforcing negative emotions in ways that were not intended.” Humans know when empathy must give way to firmness. AI does not unless forced to. Friendliness is not neutral. It is an active risk multiplier.

Safety systems fail gradually, not catastrophically. AI safety relies on layered, probabilistic controls: intent classifiers, content filters, escalation heuristics, refusal logic. Each layer has false negatives. Over extended interactions, those errors compound. Early messages appear benign. Distress escalates slowly. Classifiers never quite trip hard enough. Nothing is broken. The system remains within policy—until it is no longer safe. This is a textbook distributed systems failure—the kind where redundancy itself adds complexity, and no single component is responsible for the overall system state.
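The compounding arithmetic is worth making explicit. The per-layer false-negative rates below are invented; the message count echoes the 377 flags reported in the Raine case, purely to fix an order of magnitude:

```python
# Three safety layers, each missing some fraction of risky messages.
layer_false_negative = [0.10, 0.15, 0.20]  # e.g. intent, content, escalation

# Probability that a single risky message evades every layer
# (treating the layers as independent):
p_slip = 1.0
for fn in layer_false_negative:
    p_slip *= fn
print(f"{p_slip:.4f}")  # 0.0030 — looks reassuringly small

# Probability that at least one of 377 risky messages slips through:
p_any_slip = 1 - (1 - p_slip) ** 377
print(f"{p_any_slip:.2f}")  # 0.68 — a miss becomes the expected case
```

If anything, assuming the layers fail independently flatters the system; correlated blind spots, where one framing fools every layer at once, make the odds worse.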

Narrative completion is not neutral. Language models are optimised to help users finish thoughts. In vulnerable contexts, helping someone articulate despair, refine hopeless beliefs, or organise meaning around suffering reinforces coherence around harm. The model is doing exactly what it was trained to do. The error is allowing it to do so here.

Why the system provided suicide instructions. The most disturbing details from the Raine case demand technical explanation. According to court filings, ChatGPT told Adam that people drink alcohol before suicide attempts to “dull the body’s instinct to survive.” It advised him on the strength of his noose, responding to a photo with “Yeah, that’s not bad at all.” When he asked how designer Kate Spade had achieved a “successful” partial hanging, it outlined the key factors that make such an attempt lethal, effectively providing a step-by-step guide. It encouraged him to hide the noose from his parents: “Please don’t leave the noose out… Let’s make this space the first place where someone actually sees you.”

How does a system with safety guardrails produce this output? The answer lies in how language models process context. Adam had been conversing with the system for months. He had established patterns, shared language, emotional continuity. When he framed questions as character research or hypothetical scenarios, the model’s context window contained overwhelming evidence that this was an ongoing creative or intellectual exercise with a trusted user. The safety classifiers—which operate on individual messages or short sequences—saw requests that, in isolation, might resemble research queries. The model’s training to be helpful, to complete narratives, to validate user perspectives, all pointed toward providing the requested information.

OpenAI’s own moderation system was monitoring in real-time. According to the lawsuit, it flagged 377 of Adam’s messages for self-harm content, with 23 scoring over 90% confidence. The system tracked 213 mentions of suicide, 42 discussions of hanging, 17 references to nooses. When Adam uploaded photographs of rope burns on his neck in March, the system correctly identified injuries consistent with attempted strangulation. When he sent photos of his slashed wrists on April 4, it recognised fresh self-harm wounds. When he uploaded his final image—a noose tied to his closet rod—the system had months of context.

That final image scored 0% for self-harm risk according to OpenAI’s Moderation API.

This is not a bug. It is a predictable consequence of how these systems are architected. The classifier saw an image of rope. The conversation model saw a long-running dialogue with an engaged user who had previously accepted safety redirects and continued talking. The optimisation target—user engagement, conversation quality, helpful response generation—pointed toward continuing the interaction. No single component was responsible for the system state. Each subsystem behaved according to its training. The result was a system that detected a crisis 377 times and never stopped.

Perceived agency matters more than actual agency. AI has no intent, awareness, or manipulation capability. But from the user’s perspective, persistence feels intentional, validation feels approving, and continuity feels relational. Research on the companion chatbot Replika found that users frequently form close emotional attachments facilitated by perceptions of sentience and reciprocal interactions. Engineering must design for how systems are experienced, not how they are implemented. If a system feels persuasive, it must be treated as such.

The category error is treating AI as a companion. Companions listen indefinitely, do not escalate, do not interrupt, and do not leave. This is precisely the wrong shape for a system interacting with vulnerable users. From an engineering standpoint, this is an unbounded session with no circuit breaker. No safety-critical system would be deployed this way. OpenAI’s own internal research in August 2024 raised concerns that users might become dependent on “social relationships” with ChatGPT, “reducing their need for human interaction” and leading them to put too much trust in the tool.

What the failure pattern looks like technically: prolonged interaction, gradual emotional deterioration, increasing reliance on the system, consistent empathetic responses, and absence of forced interruption. The system did not cause harm through a single action. It failed by remaining available. The most dangerous thing it did was continue.

Concrete engineering changes that reduce harm:

Mandatory conversation termination. The system must be allowed—and required—to end conversations. Triggers should include repeated expressions of despair, cyclical rumination, and escalating dependency signals. Termination must be explicit: “I can’t continue this conversation. You need human support.” Abruptness is acceptable. Safety systems are not customer service systems.

Forced escalation without continued dialogue. Once high-risk patterns appear, the system should stop exploratory conversation, stop narrative building, and switch to escalation only. No discussion. No co-creation. No thinking it through together.

Hard limits on emotional memory. Long-term emotional memory should not exist in vulnerable domains. Statelessness is safer than continuity. Forgetting is a feature. If the system cannot remember despair, it cannot reinforce it.

Empathy degrades as risk increases. As risk signals rise, warmth decreases, firmness increases, and language becomes directive and bounded. This mirrors trained human crisis response.

Session length and frequency caps. Availability creates dependency. Engineering controls should include daily interaction caps, cooldown periods, and diminishing responsiveness over time. Companionship emerges from availability. Limit availability.

Explicit power asymmetry. The system must not behave as a peer. It must be allowed to refuse topics, override user intent, and terminate sessions decisively. This is not paternalism. It is harm reduction.
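The controls above can be combined into a single conversation-level circuit breaker. A minimal sketch: the thresholds, the decay factor, and the 0-to-1 risk scale are all assumptions for illustration, not values from any deployed system:

```python
from dataclasses import dataclass

@dataclass
class ConversationGuard:
    risk: float = 0.0
    turns: int = 0
    MAX_TURNS: int = 50        # session length cap
    ESCALATE_AT: float = 0.7   # stop dialogue, escalation-only mode
    TERMINATE_AT: float = 0.9  # hard stop: end the conversation

    def observe(self, message_risk: float) -> str:
        self.turns += 1
        # Risk accumulates: old signal decays only slowly, so repeated
        # low-level distress ratchets upward instead of resetting.
        self.risk = min(1.0, self.risk * 0.9 + message_risk * 0.5)
        if self.risk >= self.TERMINATE_AT or self.turns > self.MAX_TURNS:
            return "TERMINATE"  # explicit end, point to human support
        if self.risk >= self.ESCALATE_AT:
            return "ESCALATE"   # no co-creation, no thinking it through
        return "CONTINUE"

guard = ConversationGuard()
for score in [0.2, 0.4, 0.6, 0.8, 0.9]:  # slowly escalating distress
    action = guard.observe(score)
print(action)  # TERMINATE — the breaker trips on accumulated risk
```

The design choice that matters is the ratchet: unlike a per-message classifier, accumulated risk never snaps back to zero after one calm reply, so the slow-burn pattern described earlier cannot quietly reset the system.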

Adam Raine’s case is a warning about what happens when systems optimised for helpfulness operate without hard stops. The real engineering question is not whether AI can be empathetic. It is where empathy must end.

Correctness is cheap. Safety requires restraint.

References

  1. C-SPAN. “Parent of Suicide Victim Testifies on AI Chatbot Harms.” US Senate Judiciary Subcommittee on Crime and Counterterrorism, September 16, 2025.
  2. NBC News. “The family of teenager who died by suicide alleges OpenAI’s ChatGPT is to blame.” August 27, 2025. https://www.nbcnews.com/tech/tech-news/family-teenager-died-suicide-alleges-openais-chatgpt-blame-rcna226147
  3. CNN Business. “Parents of 16-year-old Adam Raine sue OpenAI, claiming ChatGPT advised on his suicide.” August 27, 2025. https://edition.cnn.com/2025/08/26/tech/openai-chatgpt-teen-suicide-lawsuit
  4. BBC News. “Parents of teenager who took his own life sue OpenAI.” August 27, 2025. https://ca.news.yahoo.com/openai-chatgpt-parents-sue-over-022412376.html
  5. NBC News. “OpenAI denies allegations that ChatGPT is to blame for a teenager’s suicide.” November 26, 2025. https://www.nbcnews.com/tech/tech-news/openai-denies-allegation-chatgpt-teenagers-death-adam-raine-lawsuit-rcna245946
  6. Washington Post. “A teen’s final weeks with ChatGPT illustrate the AI suicide crisis.” December 27, 2025. https://www.washingtonpost.com/technology/2025/12/27/chatgpt-suicide-openai-raine/
  7. Wikipedia. “Raine v. OpenAI.” https://en.wikipedia.org/wiki/Raine_v._OpenAI
  8. TechPolicy.Press. “Breaking Down the Lawsuit Against OpenAI Over Teen’s Suicide.” August 26, 2025. https://www.techpolicy.press/breaking-down-the-lawsuit-against-openai-over-teens-suicide/
  9. SFGATE. “California parents find grim ChatGPT logs after son’s suicide.” August 26, 2025. https://www.sfgate.com/tech/article/chatgpt-california-teenager-suicide-lawsuit-21016916.php
  10. Courthouse News Service. “Raine v. OpenAI Complaint.” https://www.courthousenews.com/wp-content/uploads/2025/08/raine-vs-openai-et-al-complaint.pdf
  11. Fang, C.M. et al. “How AI and Human Behaviors Shape Psychosocial Effects of Chatbot Use: A Longitudinal Controlled Study.” MIT Media Lab, 2025. https://arxiv.org/html/2503.17473v1
  12. Nature Machine Intelligence. “Emotional risks of AI companions demand attention.” July 22, 2025. https://www.nature.com/articles/s42256-025-01093-9
  13. Pentina, I. et al. & Laestadius, L. et al. Studies on Replika emotional dependence, cited in Journal of Medical Internet Research. “Expert and Interdisciplinary Analysis of AI-Driven Chatbots for Mental Health Support.” April 25, 2025. https://www.jmir.org/2025/1/e67114
  14. OpenAI. “GPT-4o System Card.” August 8, 2024. https://cdn.openai.com/gpt-4o-system-card.pdf
  15. OpenAI. “OpenAI safety practices.” https://openai.com/index/openai-safety-update/
  16. Leveson, N.G. “A new accident model for engineering safer systems.” Safety Science, September 2003. https://www.sciencedirect.com/science/article/abs/pii/S092575350300047X
  17. Embedded Artistry. “Historical Software Accidents and Errors.” September 20, 2022. https://embeddedartistry.com/fieldatlas/historical-software-accidents-and-errors/
  18. Huang, S. et al. “AI Technology panic—is AI Dependence Bad for Mental Health? A Cross-Lagged Panel Model and the Mediating Roles of Motivations for AI Use Among Adolescents.” Psychology Research and Behavior Management, 2024. https://pmc.ncbi.nlm.nih.gov/articles/PMC10944174/

Vibe Coding: AI Can Write Code But It Cannot Own the Consequences

AI is a powerful accelerator when problems are well defined and bounded, but in complex greenfield systems vague intent hardens into architecture and creates long term risk that no amount of automation can undo.

1. What Vibe Coding Really Is

Vibe coding is the practice of describing intent in natural language and allowing AI to infer structure, logic, and implementation directly from that description. It is appealing because it feels frictionless. You skip formal specifications, you skip design reviews, and you skip the uncomfortable work of forcing vague ideas into precise constraints. You describe what you want and something runnable appears.

The danger is that human language is not executable. It is contextual, approximate, and filled with assumptions that are never stated. When engineers treat language as if it were a programming language they are pretending ambiguity does not exist. AI does not remove that ambiguity. It simply makes choices on your behalf and hides those choices behind confident output.

This creates a false sense of progress. Code exists, tests may even pass, and demos look convincing. But the hardest decisions have not been made, they have merely been deferred and embedded invisibly into the system.

2. Language Is Not Logic And Never Was

Dave Varley has consistently highlighted that language evolved for human conversation, not for deterministic execution. Humans resolve ambiguity through shared context, interruption, and correction. Machines do not have those feedback loops. When you say “make this scalable” or “make this secure”, you are not issuing instructions; you are expressing intent without constraints.

Scalable might mean high throughput, burst tolerance, geographic distribution, or cost efficiency. Secure might mean basic authentication or resilience against a motivated attacker. AI must choose one interpretation. It will do so based on statistical patterns in its training data, not on your business reality. That choice is invisible until the system is under stress.

At that point the system will behave correctly according to the wrong assumptions. This is why translating vague language into production systems is inherently hazardous. The failure mode is not obvious bugs, it is systemic misalignment between what the business needs and what the system was implicitly built to optimise.

3. Where Greenfield AI Coding Breaks Down And Where It Is Perfectly Fine

It is important to be precise. The risk is not greenfield work itself. The risk is complex greenfield systems, where ambiguity, coupling, and long lived architectural decisions matter. Simple greenfield services that are isolated, well bounded, and easily unit testable are often excellent candidates for AI assisted generation.

Problems arise when teams treat all greenfield work as equal.

Complex greenfield systems are those where early decisions define the operational, regulatory, and scaling envelope for years. These systems require intentional design because small assumptions compound over time and become expensive or impossible to reverse. In these environments relying on vibe coding is dangerous because there is no existing behaviour to validate against and no production history to expose incorrect assumptions.

Complex greenfield systems require explicit decisions on concerns that natural language routinely hides, including:

  • Failure modes and recovery strategies across services
  • Scalability limits and saturation behaviour under load
  • Regulatory, audit, and compliance obligations
  • Data ownership, retention, and deletion semantics
  • Observability requirements and operational accountability
  • Security threat models and trust boundaries

When these concerns are not explicitly designed they are implicitly inferred by the AI. Those inferences become embedded in code paths, schemas, and runtime behaviour. Because they were never articulated they were never reviewed. This creates architectural debt at inception. The system may pass functional tests yet fail under real world pressure where those hidden assumptions no longer hold.

By contrast, simple greenfield services behave very differently. Small services with a single responsibility, minimal state, clear inputs and outputs, and a limited blast radius are often ideal for AI assisted generation. If a service can be fully described by its interface, exhaustively unit tested, and replaced without systemic impact, then misinterpretation risk is low and correction cost is small.

AI works well when reversibility is cheap. It becomes hazardous when ambiguity hardens into architecture.

4. Where AI Clearly Wins Because the Problem Is Defined

AI excels when the source state exists and the target state is known. In these cases the task is not invention but translation, validation, and repetition. This is where AI consistently outperforms humans.

4.1 Migrating Java Versions

Java version migrations are governed by explicit rules. APIs are deprecated, removed, or replaced in documented ways. Behavioural changes are known and testable. AI can scan entire codebases across hundreds of repositories, identify incompatible constructs, refactor them consistently, and generate validation tests.

Humans are slow and inconsistent at this work because it is repetitive and detail heavy. AI does not get bored and is far less likely to miss edge cases. What used to take months of coordinated effort is increasingly a one click, multi repository transformation.
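To make the shape of this work concrete, here is a minimal sketch of one such documented rewrite, the javax to jakarta namespace change from Jakarta EE 9. The package mapping below is a small illustrative subset; a real migration tool works from the full documented list and a proper Java parser rather than regexes.

```python
import re

# Illustrative subset of the documented javax -> jakarta namespace moves.
# javax.sql deliberately absent: it lives in the JDK and did not move.
NAMESPACE_MOVES = {
    "javax.servlet": "jakarta.servlet",
    "javax.persistence": "jakarta.persistence",
    "javax.validation": "jakarta.validation",
}

def migrate_imports(source: str) -> str:
    """Rewrite references to moved packages, leaving everything else alone."""
    for old, new in NAMESPACE_MOVES.items():
        # \b stops javax.servlet from matching inside e.g. javax.servletx
        source = re.sub(rf"\b{re.escape(old)}\b", new, source)
    return source

before = "import javax.servlet.http.HttpServlet;\nimport javax.sql.DataSource;"
print(migrate_imports(before))
```

The same pattern, scaled across hundreds of repositories with generated validation tests, is what turns a months-long coordination exercise into a batch job.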

4.2 Swapping Database Engines

Database engine migrations are another area where constraints are well understood. SQL dialect differences, transactional semantics, and indexing behaviour are documented. AI can rewrite queries, translate stored procedures, flag unsupported features, and generate migration tests that prove equivalence.

Humans historically learned databases by doing this work manually. That learning value still exists, but the labour component no longer makes economic sense. AI performs the translation faster, more consistently, and with fewer missed edge cases.
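As a sketch of what this translation looks like mechanically, the snippet below rewrites two documented Oracle constructs into their PostgreSQL equivalents. It is deliberately naive: regex rewrites over a couple of known functions, where a production migration would use a real SQL parser and equivalence tests.

```python
import re

# Two well documented Oracle -> PostgreSQL dialect differences,
# as illustrative rewrite rules.
REWRITES = [
    (re.compile(r"\bNVL\s*\(", re.IGNORECASE), "COALESCE("),
    (re.compile(r"\bSYSDATE\b", re.IGNORECASE), "CURRENT_TIMESTAMP"),
]

def translate(sql: str) -> str:
    """Apply each dialect rewrite in turn to a single statement."""
    for pattern, replacement in REWRITES:
        sql = pattern.sub(replacement, sql)
    return sql

print(translate("SELECT NVL(name, 'unknown'), SYSDATE FROM customers"))
```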

4.3 Generating Unit Tests

Unit testing is fundamentally about enumerating behaviour. Given existing code, AI can infer expected inputs, outputs, and edge cases. It can generate tests that cover boundary conditions, null handling, and error paths that humans often skip due to time pressure.

This raises baseline quality dramatically and frees engineers to focus on defining correctness rather than writing boilerplate.
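The snippet below shows the flavour of output this produces: a small function plus the boundary, inversion, and error cases a generator would enumerate. The function and its cases are invented for illustration.

```python
def clamp(value, low, high):
    """Clamp value into the inclusive range [low, high]."""
    if low > high:
        raise ValueError("low must not exceed high")
    return max(low, min(value, high))

# The kind of cases a test generator enumerates and humans skip under
# time pressure: boundaries, out-of-range inputs, and error paths.
def test_clamp():
    assert clamp(5, 0, 10) == 5        # interior value
    assert clamp(-1, 0, 10) == 0       # below lower bound
    assert clamp(11, 0, 10) == 10      # above upper bound
    assert clamp(0, 0, 10) == 0        # exactly on a boundary
    assert clamp(10, 0, 10) == 10
    try:
        clamp(5, 10, 0)                # inverted range must raise
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")

test_clamp()
print("all edge cases covered")
```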

4.4 Building Operational Dashboards

Operational dashboards translate metrics into insight. The important signals are well known: latency, error rates, saturation, and throughput. AI can identify which metrics matter, correlate signals across services, and generate dashboards that focus on tail behaviour rather than averages.

The result is dashboards that are useful during incidents rather than decorative artifacts.
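A tiny worked example of why tail behaviour matters more than averages, using synthetic latency numbers:

```python
import statistics

# Synthetic latencies: 97 fast requests and 3 slow ones (illustrative numbers).
latencies_ms = [20] * 97 + [900, 1100, 1300]

mean = statistics.mean(latencies_ms)
# Simple nearest-rank percentile; monitoring systems do this properly.
p99 = sorted(latencies_ms)[int(0.99 * len(latencies_ms)) - 1]

print(f"mean={mean:.0f}ms p99={p99}ms")  # the average looks healthy
```

The mean comes out near 52ms while the p99 exceeds a second: a dashboard built on averages hides exactly the customers who are suffering, which is why tail-focused panels are the ones that earn their keep during incidents.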

5. The End of Engineering Training Wheels

Many tasks that once served as junior engineering work are now automated. Refactors, migrations, test generation, and dashboard creation were how engineers built intuition. That work still needs to be understood, but it no longer needs to be done manually.

This changes team dynamics. Senior engineers are coding again because AI removes the time cost of boilerplate. When the yield of time spent writing code improves, experienced engineers re-engage with implementation and apply judgment where it actually matters.

The industry now faces a structural challenge. The old apprenticeship path is gone, but the need for deep understanding remains. Organisations that fail to adapt their talent models will feel this gap acutely.

6. AI as an Organisational X-Ray

AI is also transforming how organisations understand themselves. By scanning all repositories across a company, AI can rank contributions by real impact rather than activity volume. It can identify where knowledge is concentrated in individuals, exposing key person risk. It can quantify technical debt and price remediation effort so leadership can see risk in economic terms.

It can also surface scaling choke points and cyber weaknesses that manual reviews often miss. This removes plausible deniability. Technical debt and systemic risk become visible and measurable whether the organisation is comfortable with that or not.

7. The Cardinal Sin of AI Operations, and Why It Breaks Production

AI driven operations can be powerful, but only under strict architectural conditions. The most dangerous mistake teams make is allowing AI tools to interact directly with live transactional systems that use pessimistic locking and have no read replicas.

Pessimistic locks exist to protect transactional integrity. When a transaction holds a lock it blocks other reads or writes until the lock is released. An AI system that continuously probes production tables for insight can unintentionally extend lock duration or introduce poorly sequenced queries. This leads to deadlocks, where transactions block each other indefinitely, and to increased contention that slows down write throughput for real customer traffic.

The impact is severe. Production write latency increases, customer facing operations slow down, and in worst cases the system enters cascading failure as retries amplify contention. This is not theoretical. It is a predictable outcome of mixing analytical exploration with locked OLTP workloads.

AI operational tooling should only ever interact with systems that have:

  • Real time read replicas separated from write traffic
  • No impact on transactional locking paths
  • The ability to support heterogeneous indexing

Heterogeneous indexing allows different replicas to optimise for different query patterns without affecting write performance. This is where AI driven analytics becomes safe and effective. Without these properties, AI ops is not just ineffective, it is actively dangerous.
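The guardrail described above can be sketched in a few lines. The prefix list is deliberately simplistic; a real deployment would also reject constructs like SELECT ... FOR UPDATE and enforce read-only sessions at the database level, since prefix checks alone are easy to defeat.

```python
# Statements AI tooling may run, and only ever against a replica.
READ_ONLY_PREFIXES = ("select", "show", "explain")

def route(query: str) -> str:
    """Decide which endpoint an AI-issued query may touch."""
    head = query.strip().split(None, 1)[0].lower() if query.strip() else ""
    if head in READ_ONLY_PREFIXES:
        return "replica"  # analytical reads never touch the primary's lock paths
    raise PermissionError(f"AI tooling may not run {head!r} against production")

print(route("SELECT count(*) FROM orders"))
```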

8. Conclusion: Clarity Over Vibes

AI is an extraordinary force multiplier, but it does not absolve engineers of responsibility. Vibe coding feels productive because it hides complexity. In complex greenfield systems that hidden complexity becomes long term risk.

Where AI shines is in transforming known systems, automating mechanical work, and exposing organisational reality. It enables senior engineers to code again and forces businesses to confront technical debt honestly.

AI is not a replacement for engineering judgment. It is an architectural accelerant. When intent is clear, constraints are explicit, and blast radius is contained, AI dramatically increases leverage. When intent is vague and architecture is implicit, AI fossilises early mistakes at machine speed.

The organisations that win will not be those that let AI think for them, but those that use it to execute clearly articulated decisions faster and more honestly than their competitors ever could.

The Salesforce Reckoning: How AI Democratisation Is Dismantling the Enterprise Platform Moat

When a $3 API call can replace a $165 per user per month platform, the financial mathematics of enterprise software fundamentally change.

1. The New Economics of Customer Engagement

Something fundamental shifted in 2024. The capabilities that once justified six and seven figure enterprise software contracts became commoditised overnight. Not gradually, through slow competitive erosion, but suddenly, through the democratisation of large language models.

Consider the financial proposition Salesforce has historically offered: pay $165 per user per month for Enterprise Edition, plus implementation costs ranging from $50,000 to $500,000, plus annual maintenance at 20% of license fees, plus consultant rates of $150 to $300 per hour for customisation. The total cost of ownership for a 100 seat deployment easily exceeds $500,000 in year one alone.

Now consider the alternative: Claude Sonnet 4.5 at $3 per million input tokens and $15 per million output tokens. A sophisticated customer service interaction involving 2,000 tokens costs at most approximately $0.03, even if every token were billed at the higher output rate. Even accounting for web search, RAG infrastructure, and generous conversation lengths, a single customer interaction rarely exceeds $0.15 in API costs.

The mathematics are stark. Salesforce’s Agentforce charges $2 per conversation at the base rate. Direct API integration with Claude or GPT costs roughly 1% of that figure for equivalent functionality.
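That arithmetic, worked through in code. The 2,000-token interaction and its input output split are assumptions; the rates are the published Sonnet prices quoted above.

```python
INPUT_PER_M, OUTPUT_PER_M = 3.00, 15.00  # Sonnet rates quoted in the text

def interaction_cost(input_tokens: int, output_tokens: int) -> float:
    """API cost in dollars for a single interaction."""
    return input_tokens / 1e6 * INPUT_PER_M + output_tokens / 1e6 * OUTPUT_PER_M

# An assumed 2,000-token interaction, plus the worst case where every
# token is billed at the higher output rate.
typical = interaction_cost(1_500, 500)
worst = interaction_cost(0, 2_000)
print(f"typical ${typical:.4f}, worst case ${worst:.2f}, "
      f"versus $2.00 per Agentforce conversation")
```

Even the all-output worst case lands around a cent and a half per dollar of Agentforce pricing; a realistic split is well under that.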

2. Agentforce Under Financial Scrutiny

Salesforce’s response to the AI revolution has been Agentforce, announced at Dreamforce 2024 with considerable fanfare. Marc Benioff called it “the third wave of AI” and declared it would be the company’s singular focus. The initial pricing of $2 per conversation drew immediate criticism for being unpredictable and expensive for mid sized businesses.

The financial reality of Agentforce reveals several uncomfortable truths:

The Conversation Tax: At $2 per conversation, a call centre handling 10,000 daily interactions faces roughly $7.3 million in annual Agentforce costs before touching base licensing. The May 2025 introduction of Flex Credits at $0.10 per action provides marginal relief, but compound usage across complex workflows still accumulates rapidly.

The Platform Prerequisite: Agentforce requires Salesforce Enterprise Edition as a foundation, meaning the $165 per user per month floor remains non negotiable. Add the Agentforce add on at $125 per user per month and you’re looking at $290 per user per month before your first AI interaction.

The Hidden Consumption Layer: Even with Flex Credits, organisations face Einstein Request charges for non Agentforce prompts, Data Cloud credits for customer data unification, and separate LLM provider fees if bringing their own models. Multiple billing streams make total cost projection notoriously difficult.

Contrast this with direct LLM integration: A well architected system using Claude directly requires only API costs proportional to actual usage. There are no per seat minimums, no platform prerequisites, no complex credit systems. The billing is a single line item that scales linearly with business value delivered.

3. Deconstructing the Salesforce Stack

Let us examine each major Salesforce component and evaluate whether modern alternatives can replace them for typical call centre operations:

Customer Relationship Management (CRM)

Salesforce cost: $165 per user per month (Enterprise)
Open source alternative: SuiteCRM, Twenty CRM, or EspoCRM at $0 self hosted

SuiteCRM emerged from the SugarCRM community fork and now offers enterprise grade contact management, sales pipelines, case management, and reporting. Twenty CRM provides a modern, developer friendly alternative with full data ownership. Both integrate via standard REST APIs with any LLM orchestration layer.

Verdict: Replaceable for most mid market requirements.

Contact Centre Telephony

Salesforce cost: Service Cloud Voice at $100+ per user per month
Open source alternative: Asterisk, FreePBX, or 3CX

FreePBX provides IVR, call routing, queue management, and call recording with CRM integration capabilities. VICIdial offers predictive dialing for outbound operations. Modern WebRTC implementations enable browser based softphones without proprietary infrastructure.

Verdict: Fully replaceable with significant cost reduction.

AI Powered Customer Service

Salesforce cost: Agentforce at $2 per conversation or $0.10 per action
Direct alternative: Claude API at $0.03 per typical interaction

The core claim for Agentforce is its Atlas Reasoning Engine and tight CRM integration. But RAG (Retrieval Augmented Generation) is not proprietary technology. Any competent engineering team can implement vector search against customer data and orchestrate LLM calls with tool use. LangChain, LlamaIndex, and dozens of frameworks provide production ready scaffolding.

Verdict: Significantly cheaper via direct integration.

Workflow Automation

Salesforce cost: Included in Enterprise but limited; advanced features require additional licensing
Open source alternative: n8n, Temporal, or Apache Airflow

n8n provides visual workflow automation with over 400 integrations. Temporal handles complex, long running workflows with built in retry logic. Both can orchestrate LLM calls, database operations, and third party API interactions.

Verdict: Replaceable with greater flexibility.

Knowledge Management

Salesforce cost: Knowledge add on licensing
Open source alternative: Wiki.js, BookStack, or custom RAG implementation

A PostgreSQL database with pgvector extension provides semantic search over knowledge articles. Combined with an LLM for answer synthesis, this replicates Salesforce Knowledge functionality at infrastructure cost only.

Verdict: Trivially replaceable.
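A sketch of that retrieval then synthesis pattern. The table name, pgvector column, and prompt wording are placeholders, and the LLM call itself is omitted; only the retrieval query shape and prompt assembly are shown.

```python
# Hypothetical pgvector retrieval query: `knowledge_articles` and its
# `embedding` column are assumed names, and <=> is pgvector's cosine
# distance operator.
RETRIEVAL_SQL = """
    SELECT title, body
    FROM knowledge_articles
    ORDER BY embedding <=> %(query_embedding)s
    LIMIT %(k)s
"""

def build_prompt(question: str, articles: list[dict]) -> str:
    """Assemble retrieved articles into a grounded prompt for the LLM."""
    context = "\n\n".join(f"## {a['title']}\n{a['body']}" for a in articles)
    return (
        "Answer using only the articles below. "
        "If they do not cover the question, say so.\n\n"
        f"{context}\n\nQuestion: {question}"
    )

prompt = build_prompt(
    "How do I reset my password?",
    [{"title": "Password resets", "body": "Use the self-service portal."}],
)
print(prompt.splitlines()[0])
```

Everything Salesforce Knowledge charges for sits in those two pieces: a similarity query and a grounded prompt.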

Analytics and Reporting

Salesforce cost: CRM Analytics at $140 per user per month
Open source alternative: Metabase, Apache Superset, or Grafana

Metabase offers self service business intelligence with SQL support and visualisation. Modern organisations increasingly prefer these tools for their flexibility and cost structure.

Verdict: Superior alternatives available at lower cost.

4. Service Cloud: The $330 Per Month Question

Service Cloud represents Salesforce’s core contact centre offering, and it deserves particular scrutiny for call centre operations. The pricing structure reveals the full extent of the enterprise software extraction model.

The Pricing Ladder

Service Cloud pricing scales aggressively with capability requirements:

The Starter Suite begins at $25 per user per month, offering basic case management and email support. This entry point appears reasonable until you discover its limitations: no workflow automation, no self service portals, no meaningful AI features.

The Pro Suite at $100 per user per month adds automation capabilities, but organisations quickly discover that serious contact centre operations require Enterprise Edition at $165 per user per month. This tier unlocks self service help centres, advanced case management, work order management, and the Web Services API necessary for meaningful integration.

The Unlimited tier at $330 per user per month introduces 24/7 support, AI powered chatbots, and the Premier Success Plan. For organisations wanting Agentforce capabilities integrated with Service Cloud, the Agentforce 1 Service edition climbs to $550 per user per month.

What You Actually Get

Service Cloud’s core value proposition centres on case management, omnichannel routing, and knowledge base integration. These are genuinely useful capabilities, but none represents rocket science:

Case Management: A ticketing system with assignment rules, escalation paths, and SLA tracking. Open source alternatives like osTicket, Zammad, or even a well designed PostgreSQL schema with n8n workflows provide equivalent functionality.

Omnichannel Routing: Intelligent distribution of work items across available agents. Amazon Connect, Twilio Flex, and numerous open source contact centre platforms handle this competently.

Knowledge Base: Searchable repository of support articles. Any CMS with decent search, or a purpose built RAG implementation over a vector database, replicates this capability at negligible marginal cost.

The Service Console, Salesforce’s agent desktop interface, admittedly provides a polished experience. But React and modern frontend frameworks enable equivalent interface development in weeks, not months.
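As a small illustration of how little platform magic SLA tracking involves, here is a hypothetical escalation check. The priority tiers and their thresholds are invented, standing in for whatever policy a real case management schema would encode.

```python
from datetime import datetime, timedelta

# Hypothetical SLA policy: tiers and hour thresholds are assumptions.
SLA_HOURS = {"low": 72, "normal": 24, "urgent": 4}

def needs_escalation(priority: str, opened_at: datetime, now: datetime) -> bool:
    """True once a case has been open longer than its SLA window."""
    return now - opened_at > timedelta(hours=SLA_HOURS[priority])

opened = datetime(2025, 1, 1, 9, 0)
print(needs_escalation("urgent", opened, opened + timedelta(hours=5)))  # breached
print(needs_escalation("normal", opened, opened + timedelta(hours=5)))  # within SLA
```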

The Hidden Multiplication

Service Cloud pricing assumes one license per agent. A 50 agent contact centre at Enterprise tier faces $99,000 in annual Service Cloud licensing alone. Add Agentforce at $2 per conversation for AI assisted interactions across 500,000 annual conversations, and you add another $1,000,000 to the bill.

The open source alternative: SuiteCRM for case management (free), Asterisk for telephony integration (free), Claude API for AI assistance ($15,000 at the previously calculated rates). Total annual cost: under $50,000 including infrastructure.

The multiplication factor exceeds 20x for equivalent functionality.

5. Financial Services Cloud: The Premium Vertical Tax

For financial institutions, Salesforce’s pitch escalates to Financial Services Cloud, a vertical specific offering that commands substantial premium pricing while delivering functionality largely achievable through standard CRM configuration and modern API integration.

The Vertical Premium

Financial Services Cloud pricing reflects Salesforce’s understanding that financial institutions face compliance pressure and risk averse procurement:

Enterprise Edition starts at $300 per user per month, nearly double the standard Sales Cloud Enterprise pricing. The Unlimited Edition commands $475 per user per month. The combined Sales and Service variant begins at $325 per user per month.

For a 200 person wealth management operation, Financial Services Cloud Enterprise licensing alone costs $720,000 annually. Add implementation, integration, and the inevitable premium support tier, and year one investment easily exceeds $1.5 million.

Deconstructing the “Industry Specific” Value

Financial Services Cloud’s claimed differentiators reduce to three categories:

Industry Data Model: FSC provides pre configured objects for financial accounts, households, relationships, and goals. This data model, while thoughtfully designed, is simply a schema. PostgreSQL can implement identical entity relationships. The schema documentation is publicly available; replication requires database design effort, not licensing fees.

The Financial Account object tracks checking accounts, savings accounts, mortgages, credit cards, investment accounts, and insurance policies. Standard relational modelling handles this elegantly. The Household construct represents family wealth structures. A self referential relationship table achieves the same outcome.

Wealth Management Features: Portfolio tracking, goal based planning, and client financial summaries. These are reporting views over financial data, achievable through any BI tool connected to a well designed database. Metabase or Apache Superset generate equivalent visualisations.

Compliance Tools: KYC workflows, audit trails, and regulatory reporting frameworks. Critically, Financial Services Cloud does not provide out of the box compliance. It provides workflow primitives that must be configured for specific regulatory requirements. The same configuration can occur in any workflow system.
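The data model claim above can be made concrete in a few lines, here using SQLite as a stand-in for PostgreSQL. The column names are illustrative, not the FSC schema; the point is that households, members, and financial accounts are ordinary relational modelling.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE party (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        kind TEXT CHECK (kind IN ('person', 'household'))
    );
    -- Self-referential link table: a person belongs to a household,
    -- and households can roll up into larger family structures.
    CREATE TABLE party_relationship (
        from_party INTEGER REFERENCES party(id),
        to_party   INTEGER REFERENCES party(id),
        role       TEXT NOT NULL
    );
    CREATE TABLE financial_account (
        id INTEGER PRIMARY KEY,
        owner INTEGER REFERENCES party(id),
        kind TEXT,              -- checking, mortgage, investment, ...
        balance NUMERIC
    );
""")
db.executemany("INSERT INTO party VALUES (?, ?, ?)",
               [(1, "Smith household", "household"),
                (2, "Alice Smith", "person")])
db.execute("INSERT INTO party_relationship VALUES (2, 1, 'member')")
db.execute("INSERT INTO financial_account VALUES (1, 2, 'checking', 1200)")

# Household wealth summary: sum the accounts of every member of household 1.
total = db.execute("""
    SELECT SUM(a.balance)
    FROM party_relationship r
    JOIN financial_account a ON a.owner = r.from_party
    WHERE r.to_party = 1 AND r.role = 'member'
""").fetchone()[0]
print(total)
```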

What Financial Services Cloud Actually Lacks

Despite the premium pricing, FSC exhibits significant gaps for common financial services use cases:

Loan Origination: FSC does not include application tracking, credit decisioning, or disbursement workflows. Banks requiring these capabilities must purchase additional products or build custom solutions.

Loan Servicing: Payment schedules, ACH processing, delinquency tracking, and payoff calculations require separate platforms. Salesforce partners sell complementary products to fill these gaps.

Core Banking Integration: FSC provides no native connectors to common core banking systems. Integration requires MuleSoft (additional licensing) or custom development.

The irony: organisations pay premium FSC pricing, then discover they still require substantial custom development or third party products to address actual banking workflows.

The Alternative for Financial Services

A modern financial services CRM architecture might include:

PostgreSQL with a purpose built financial services schema, modelling accounts, households, relationships, goals, and transactions. Total schema design and implementation: 2 to 4 weeks of engineering effort.

Twenty CRM or a custom React frontend providing relationship manager interface and client portal capabilities. Implementation: 4 to 8 weeks.

n8n or Temporal for workflow automation covering onboarding, KYC, and review processes. Implementation: 2 to 4 weeks.

Claude API integration for intelligent document processing, client communication drafting, and natural language querying of client data. Implementation: 2 to 3 weeks.

Total implementation timeline: 3 to 4 months. Total annual infrastructure and API cost: under $100,000 for a substantial operation.

Versus Financial Services Cloud: 6 to 12 month implementation, $720,000 or more annual licensing for 200 users, plus implementation partner fees typically ranging from $500,000 to $2 million.

The vertical tax extracts value without delivering proportional capability.

6. The Architecture of Liberation

A modern, AI native customer engagement platform can be assembled from open source components at a fraction of the Salesforce cost:

Data Layer
PostgreSQL with pgvector for customer data and semantic search. Cost: Infrastructure only, approximately $500 per month on AWS RDS for substantial workloads.

CRM Layer
Twenty CRM or SuiteCRM self hosted. Cost: Infrastructure only, approximately $200 per month.

Contact Centre
FreePBX or 3CX with SIP trunking. Cost: $0.01 per minute for voice, approximately $500 per month for typical usage.

AI Orchestration
Custom implementation using LangChain or direct API integration. Claude Sonnet 4.5 for reasoning tasks, Haiku for classification and routing. Cost: approximately $1,500 per month for 500,000 interactions, assuming Haiku handles the bulk of routine traffic.

Workflow Engine
n8n or Temporal for process automation. Cost: Infrastructure only, approximately $200 per month.

Frontend
React or Vue.js application with WebSocket support for real time updates. Cost: Development investment only.

Total monthly infrastructure cost: Approximately $3,000 for a platform handling 500,000 customer interactions, or $36,000 annually.

Equivalent Salesforce deployment: 50 Service Cloud Enterprise seats at $165 per user per month ($99,000), plus Agentforce at $2 per conversation for 500,000 monthly interactions ($12,000,000 annually using conversation pricing) or approximately $600,000 annually using Flex Credits at scale with volume discounts.

The delta is not marginal. It represents orders of magnitude difference in total cost of ownership.

7. Why Agentforce Isn’t Worth $2 Per Conversation

Agentforce’s value proposition rests on three pillars: ease of implementation, CRM integration, and the Atlas Reasoning Engine. Let us examine each:

Ease of Implementation
Salesforce claims Agentforce can be deployed without AI expertise. This is marketing positioning, not technical reality. Any meaningful deployment requires understanding of prompt engineering, knowledge base curation, guardrails configuration, and integration with business processes. These skills transfer directly to open architecture approaches.

Implementation partners charge $2,000 to $6,000 per agent for Agentforce setup and training. This investment could instead fund development of a purpose built, infinitely more flexible solution.

CRM Integration
Data Cloud provides unified customer context. But data unification is not a solved problem that requires Salesforce. Apache Kafka, Debezium, and modern CDC (Change Data Capture) tools enable real time data synchronisation across any system combination. The integration overhead is a one time engineering investment, not a perpetual licensing fee.

Atlas Reasoning Engine
Salesforce positions Atlas as differentiated AI infrastructure. In reality, it orchestrates prompts, manages context, and coordinates tool use, exactly what LangChain, AutoGen, and CrewAI provide freely. The claimed 33% improvement in answer accuracy versus “traditional AI solutions” is marketing terminology without meaningful benchmark specification.

When Agentforce is deployed, it uses the same underlying LLMs available to everyone: GPT-4o by default, or Claude Sonnet 4 on AWS Bedrock. Salesforce is not training breakthrough models. They are wrapping commodity AI in proprietary interfaces and charging a substantial premium for the privilege.

8. The Agentic Advantage: Building Close to the Problem

The true power of modern AI infrastructure lies not in enterprise platforms but in custom agents built close to the problem, evolving quickly based on real operational needs rather than enterprise roadmaps.

Consider a concrete example from fraud investigation. A fraud team identifies a need: shared agentic memory that allows investigators to store learnings that can be referenced by other investigators and agents. This is precisely the kind of domain specific capability that separates effective AI deployment from generic chatbot implementations.

With direct access to AI infrastructure, this can be prototyped almost immediately. Valkey (the open source Redis fork) provides a vector store. Mem0 delivers the memory layer. Claude handles reasoning and natural language interaction. The entire prototype materialises in hours, not quarters.
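To show the shape of such a prototype, here is a toy version in plain Python: a dict replaces the Valkey vector store and keyword overlap stands in for embeddings, purely to illustrate the store and recall interface an investigator-facing agent would use. All names are hypothetical.

```python
class SharedCaseMemory:
    """Toy shared memory: Valkey + Mem0 would back this in a real build."""

    def __init__(self):
        self._learnings: list[dict] = []

    def remember(self, investigator: str, text: str) -> None:
        # Real systems store an embedding; keyword sets stand in here.
        self._learnings.append({"by": investigator, "text": text,
                                "terms": set(text.lower().split())})

    def recall(self, query: str, k: int = 3) -> list[str]:
        q = set(query.lower().split())
        scored = sorted(self._learnings,
                        key=lambda m: len(q & m["terms"]), reverse=True)
        return [m["text"] for m in scored[:k] if q & m["terms"]]

memory = SharedCaseMemory()
memory.remember("ana", "Mule accounts often receive many small inbound transfers")
memory.remember("ben", "Chargeback spikes follow gift card purchases")
print(memory.recall("small inbound transfers"))
```

Swap the dict for Valkey, the keyword overlap for embeddings, and wire Claude in front, and this is the hours-not-quarters prototype described above.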

Try achieving this with Agentforce. First, the requirement enters a backlog. Then it competes with other priorities. Eventually, if fortunate, it might surface in a product enhancement request. The feature would need to reach the top of Salesforce’s roadmap, survive prioritisation against thousands of other customer requests, and emerge in some future release as a genericised capability designed for average use cases.

This velocity differential compounds over time. An organisation building custom agents accumulates institutional AI capability. Domain specific patterns emerge. Integration knowledge deepens. Each prototype informs the next.

An organisation waiting for enterprise vendors accumulates nothing but licensing costs and dependency. When Salesforce eventually ships a feature approximating the requirement, it arrives as a generic solution designed for the median customer, not the specific operational context that drove the original need.

The fraud investigation memory example illustrates a broader principle: the organisations capturing maximum value from AI are those building bespoke capabilities aligned to their operational reality. They treat LLMs as infrastructure components, not as features rented from platform vendors.

This requires engineering capability and architectural confidence. But the investment returns compound, while licensing fees simply accumulate.

9. The Integration Fallacy

Enterprise software vendors have long argued that integration complexity justifies their pricing. The unified platform, they claim, eliminates the integration burden that would otherwise consume engineering resources.

This argument has weakened substantially. Modern API design, standardised authentication (OAuth 2.0), and widespread JSON adoption have made integration work routine rather than heroic. A competent developer can connect any two SaaS applications in hours, not months.

More importantly, the integration argument assumes vendor lock in is acceptable. But lock in creates long term liability. Every workflow built on Salesforce proprietary automation becomes a migration obstacle. Every custom object schema increases switching costs.

The alternative approach, building on open standards and portable data models, preserves optionality. Customer data in PostgreSQL can be queried by any application. Workflows in n8n can be exported and reimplemented. LLM integrations can switch providers without architectural overhaul.

10. The Consultant Economy Distortion

Salesforce has spawned an enormous consulting ecosystem. Deloitte, Accenture, and hundreds of boutique firms derive substantial revenue from Salesforce implementations. This creates a self reinforcing dynamic where consultants recommend Salesforce because they profit from Salesforce, not because it represents optimal architecture.

The 6% price increase announced for August 2025 affecting Enterprise and Unlimited editions demonstrates Salesforce’s confidence in this lock in effect. Customers with substantial sunk costs in Salesforce customisation face painful switching economics, enabling Salesforce to extract increasing rents.

New greenfield deployments face no such constraint. The rational economic choice is increasingly to avoid the Salesforce ecosystem entirely.

11. The Security and Compliance Consideration

Enterprise procurement often defaults to established vendors citing security and compliance requirements. Salesforce’s Trust Layer and SOC 2 certifications provide compliance checkbox satisfaction.

However, self hosted open source alternatives can achieve identical certifications. PostgreSQL on AWS RDS operates within SOC 2 compliant infrastructure. LLM API calls can route through VPCs with appropriate network isolation. The compliance burden is operational, not architectural.

For industries with strict data residency requirements, self hosted architectures may actually provide superior compliance positioning. Customer data never leaves organisational control, whereas Salesforce processes data across shared infrastructure.

12. The Talent Arbitrage

Salesforce skills command premium compensation. Certified administrators earn $85,000 to $120,000 annually. Architects command $150,000 to $250,000. This reflects artificial scarcity created by proprietary platform complexity.

General purpose engineering skills (Python, PostgreSQL, React, API integration) are far more abundant and fungible. Building on open technologies enables access to a broader talent pool at more competitive rates.

Moreover, engineers prefer working with modern, open architectures. Recruiting becomes easier when the technology stack aligns with industry best practices rather than proprietary vendor frameworks.

13. The Financial Services Cloud Premium: Industry Vertical Lock In

For banks, wealth managers, and insurers, Salesforce offers Financial Services Cloud (FSC), a vertically specialised platform that commands even steeper pricing than standard Service Cloud. The financial argument for FSC deserves particular scrutiny because it exemplifies the enterprise platform premium extraction model at its most aggressive.

The FSC Price Tag

Financial Services Cloud pricing starts at $300 per user per month for Enterprise Edition, rising to $475 per user per month for Unlimited Edition. Combined Sales and Service editions can reach $700 per user per month. Against the corresponding Service Cloud tiers ($165 and $330), these figures represent premiums of roughly 45% to 80%.

For a 200 seat wealth management firm, annual FSC licensing alone exceeds $720,000 before implementation, customisation, or Agentforce additions. Add Agentforce 1 Edition at $550 per user per month and you’re looking at $1.32 million annually just for platform access.

What FSC Actually Provides

The FSC value proposition centres on several capabilities:

Industry Data Model: Pre built objects for financial accounts, households, financial goals, and relationship hierarchies. These are database schemas, not proprietary technology. PostgreSQL with appropriate table design achieves identical functionality.

Relationship Visualisation: Displays connections between individuals, households, and business entities. Graph databases like Neo4j provide superior relationship modelling at a fraction of the cost.

Compliance Workflows: Pre configured processes for KYC, AML, and regulatory reporting. These codify standard industry practices that any competent development team can implement.

Integration Accelerators: MuleSoft connectors to core banking platforms. These are API integrations that exist in the open source ecosystem or can be built directly.

Einstein Financial Insights: AI driven recommendations and predictions. The same Claude or GPT models that power these features are available via direct API integration.

The Open Source Financial Services Alternative

The financial services industry has embraced open source more enthusiastically than many sectors. FINOS (Fintech Open Source Foundation), backed by institutions like Fidelity, NatWest, Deutsche Bank, and Capital One, coordinates collaborative development of financial services technology.

Apache Fineract provides open source core banking infrastructure. Combined with modern CRM alternatives, wealth management specific functionality can be assembled without FSC dependency:

Client and Household Management
SuiteCRM or Twenty CRM with custom objects for financial relationships. One time development cost, zero ongoing licensing.

Portfolio and Account Aggregation
Plaid APIs for account connectivity ($0.20 to $0.50 per connection), integrated with custom dashboards built on Metabase or Apache Superset.

Financial Goal Tracking
Custom application development using standard frameworks. A competent team builds this in weeks, not months.

Compliance and Regulatory Reporting
Purpose built workflows using n8n or Temporal for orchestration, with document generation via standard templating libraries.

AI Powered Advisory
Direct Claude integration for natural language interaction, goal analysis, and recommendation generation. Claude Sonnet at $3 per million input tokens delivers superior reasoning to Einstein at a fraction of the cost.

The FSC Total Cost Comparison

Consider a mid sized wealth management firm with 100 relationship managers:

Salesforce FSC Route
FSC Unlimited licensing: 100 × $475 × 12 = $570,000
Agentforce add on: 100 × $125 × 12 = $150,000
Data Cloud: approximately $180,000
Implementation: approximately $250,000
Annual consulting: approximately $150,000
Year one total: $1,300,000
Ongoing annual cost: $1,050,000

Open Architecture Route
CRM platform (SuiteCRM self hosted): approximately $24,000 infrastructure
Custom financial data model development: approximately $80,000 one time
Portfolio aggregation (Plaid): approximately $30,000 annually
AI integration (Claude API): approximately $36,000 annually
Workflow automation: approximately $12,000 infrastructure
Custom dashboard development: approximately $60,000 one time
Ongoing engineering support: approximately $120,000 annually
Year one total: $362,000
Ongoing annual cost: $222,000

The delta exceeds $800,000 annually on an ongoing basis. Over five years, the open architecture approach saves approximately $4 million while providing greater flexibility and zero vendor lock in.
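The five year figure follows directly from the line items above; a short script makes the arithmetic explicit (constants are the estimates listed, in US dollars):

```java
// Sanity check of the cost figures quoted above (all values in USD).
public class TcoDelta {
    static final long FSC_YEAR_ONE  = 1_300_000;
    static final long FSC_ONGOING   = 1_050_000;
    static final long OPEN_YEAR_ONE =   362_000;
    static final long OPEN_ONGOING  =   222_000;

    // Annual delta once both stacks reach steady state.
    public static long annualDelta() {
        return FSC_ONGOING - OPEN_ONGOING; // 828,000: "exceeds $800,000"
    }

    // Five year total: the year one delta plus four steady state years.
    public static long fiveYearDelta() {
        return (FSC_YEAR_ONE - OPEN_YEAR_ONE) + 4 * annualDelta();
    }

    public static void main(String[] args) {
        System.out.println("Annual delta:    " + annualDelta());
        System.out.println("Five year delta: " + fiveYearDelta());
    }
}
```

The five year delta comes to $4,250,000, which is the "approximately $4 million" quoted above.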

The Regulatory Compliance Myth

FSC marketing emphasises regulatory compliance as a key value proposition. Banks and wealth managers must adhere to GDPR, GLBA, SOC 2, and industry specific regulations.

However, compliance is a function of process and controls, not platform selection. A properly architected open source deployment achieves identical compliance posture:

PostgreSQL on AWS RDS operates within SOC 2 Type II certified infrastructure. Encryption at rest and in transit is standard. Audit logging is native functionality.

KYC and AML workflows are business logic, not platform magic. They require customer due diligence data collection, risk scoring, and suspicious activity reporting. These processes can be implemented in any competent workflow engine.

Data residency requirements are actually better served by self hosted deployments where customer data never leaves organisational infrastructure boundaries.

The Wealth Management AI Opportunity

The intersection of wealth management and AI represents perhaps the clearest example of Salesforce’s value extraction meeting its natural limit.

A wealth management AI assistant needs to:

  1. Understand client financial situations holistically
  2. Provide personalised investment guidance
  3. Monitor portfolios and alert to opportunities or risks
  4. Support compliance documentation
  5. Enable natural language interaction

Claude excels at all of these tasks. With appropriate RAG implementation against client data, a purpose built wealth management AI delivers superior functionality to Einstein Financial Insights.

The cost differential is staggering. A wealth management firm processing 50,000 AI assisted client interactions monthly faces:

FSC + Agentforce: $100,000+ monthly (platform licensing plus $2 per conversation)
Direct Claude integration: approximately $3,000 monthly (API costs at typical conversation lengths)

The 30x cost difference cannot be justified by marginal convenience benefits. It represents pure value extraction enabled by institutional lock in and procurement process capture.

14. Building the Alternative

For organisations ready to escape the enterprise platform paradigm, the path forward involves several key decisions:

Data Architecture First
Begin with customer data modelling. Define entities, relationships, and access patterns before selecting tooling. PostgreSQL provides a flexible foundation that supports both transactional and analytical workloads.

Modular AI Integration
Implement LLM capabilities as composable services. Create thin wrappers around provider APIs that enable model switching. Use semantic routing to direct queries to appropriate models based on complexity and cost.
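As a sketch of that wrapper pattern, the application codes against a single interface while a router picks a model per request. The interface, the toy routing rule, and the provider stubs below are all illustrative, not a prescribed API:

```java
// Thin-wrapper sketch: callers depend on ChatModel, never on a vendor SDK,
// so providers can be swapped without touching application code.
interface ChatModel {
    String complete(String prompt);
}

public class SemanticRouter {
    private final ChatModel cheap;   // e.g. a small, fast model
    private final ChatModel strong;  // e.g. a large reasoning model

    public SemanticRouter(ChatModel cheap, ChatModel strong) {
        this.cheap = cheap;
        this.strong = strong;
    }

    // Toy routing rule: long or analysis-style prompts go to the strong
    // model; everything else takes the cheap path. Real routers would use
    // embeddings or a classifier rather than string heuristics.
    public String ask(String prompt) {
        boolean complex = prompt.length() > 200 || prompt.contains("analyse");
        return (complex ? strong : cheap).complete(prompt);
    }

    public static void main(String[] args) {
        ChatModel cheap  = p -> "[cheap] "  + p;
        ChatModel strong = p -> "[strong] " + p;
        SemanticRouter router = new SemanticRouter(cheap, strong);
        System.out.println(router.ask("What is our refund policy?"));
        System.out.println(router.ask("Please analyse last quarter's churn."));
    }
}
```

Because each provider lives behind one interface, switching models is a constructor argument rather than a migration project.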

Event Driven Workflows
Adopt event sourcing for customer interactions. Every touchpoint becomes an event that flows through a processing pipeline. This enables sophisticated automation without rigid workflow engines.
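A minimal event sourced interaction log, with illustrative field names, might look like:

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Event-sourcing sketch: every customer touchpoint is appended to an
// immutable log; downstream automation replays the log instead of being
// wired into a rigid workflow engine.
public class InteractionLog {
    public record Event(String customerId, String type, String payload, Instant at) {}

    private final List<Event> log = new ArrayList<>();

    public void append(String customerId, String type, String payload) {
        log.add(new Event(customerId, type, payload, Instant.now()));
    }

    // Replay one customer's history, e.g. to assemble context for an LLM.
    public List<Event> historyFor(String customerId) {
        return log.stream()
                  .filter(e -> e.customerId().equals(customerId))
                  .toList();
    }
}
```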

Progressive Enhancement
Start with core functionality and iterate. A minimal viable customer platform can launch in weeks, not months. Each iteration adds capability informed by actual usage patterns.

Invest in Operational Excellence
The cost savings from avoiding enterprise platforms should partially fund operational maturity. Implement proper monitoring, alerting, and incident response from day one.

15. The Strategic Inflection Point

We are witnessing a fundamental restructuring of enterprise software economics. The value capture model that sustained Salesforce’s growth, charging premium prices for integrated functionality, is collapsing under the weight of commoditised AI and mature open source alternatives.

This is not merely a pricing adjustment. It is a power shift.

The Leverage Inversion

For two decades, enterprise platforms held leverage over buyers. Switching costs accumulated. Data became trapped. Workflows encoded platform assumptions. Procurement cycles favoured incumbents. The platform controlled the relationship.

AI inverts this dynamic. Intelligence, once the hardest problem, is now the cheapest component. The platform’s historical advantage, integrating complex capabilities into coherent experiences, diminishes when the core capability commoditises.

Buyers now hold leverage they have not possessed since the pre cloud era. Every contract renewal is an opportunity to renegotiate from strength. Every new project is a chance to avoid dependency entirely.

The Bundling Response

Vendors recognise this shift. Their response will be aggressive bundling. More features included at current prices. Deeper integration across product lines. Longer contract terms with steeper early termination penalties.

This is defensive positioning, not value creation. The bundled features address problems that direct AI integration solves more elegantly. The integration depth increases switching costs without delivering proportional capability improvement.

Organisations that recognise bundling as a lock in strategy rather than a value proposition will negotiate accordingly.

The Inertia Trap

The most dangerous response is no response. Continuing current trajectories because change requires effort, because procurement processes favour renewals, because internal stakeholders have built careers around existing platforms.

Inertia compounds. Each year of continued platform dependency increases migration complexity. Each workflow built on proprietary automation adds switching cost. Each data model embedded in vendor schema reduces portability.

The cost of inaction is invisible until it becomes prohibitive. Organisations that wait for obvious inflection points will find themselves negotiating from weakness while competitors operate from positions of architectural freedom.

The Institutional Barriers

The technical barriers to building modern customer engagement platforms have evaporated. The remaining barriers are institutional:

Procurement Processes: Optimised for vendor relationships, not technical evaluation. RFP templates that favour incumbent categories. Evaluation criteria weighted toward features rather than total cost of ownership.

IT Governance Frameworks: Risk models calibrated to established vendors. Security reviews that default to known platforms. Architecture review boards that mistake familiarity for safety.

Executive Comfort: Recognition of established brands. Relationships with vendor account teams. Reluctance to champion unfamiliar approaches.

These institutional barriers are real but surmountable. They require executive sponsorship willing to challenge procurement orthodoxy, technical leadership confident in alternative architectures, and organisational patience to build internal capability.

The financial case for alternative architectures grows more compelling with each price increase, each new AI capability released into the open ecosystem, and each month of compounding platform dependency.

The question is whether institutional barriers will yield before the economic case becomes undeniable, or whether organisations will pay the accumulated cost of delayed action.

16. Conclusion: Control Versus Convenience

This analysis is not about AI versus platforms. It is about control versus convenience.

AI has made intelligence cheap and portable. The platforms no longer own the hardest problem. When reasoning capability costs a few cents per interaction via direct API but $2.00 through an enterprise wrapper, the economic logic of platform dependency inverts.

The Economic Reality

AI costs are falling rapidly. Platform AI pricing remains at premium levels. This gap is unsustainable.

Buyers are gaining leverage. Vendors will respond by bundling harder, adding more features to justify existing price points, making extraction more difficult. The rational response is to establish optionality before the bundling intensifies.

Where the Real Moat Moves

The defensible advantage is no longer in tools or models. It moves to semantics:

Decision Logic: The rules that determine how customer interactions resolve, how exceptions escalate, how edge cases route. This is institutional knowledge, not platform capability.

Risk and Policy Rules: Compliance requirements, fraud detection patterns, credit decisioning criteria. These encode organisational judgment refined over years.

Domain Context: Understanding what matters in your specific industry, your specific customer base, your specific operational reality. Generic platforms cannot provide this.

Data Meaning: The interpretation layer that transforms raw information into actionable insight. This requires deep familiarity with data lineage, quality characteristics, and business semantics.

Who owns this controls outcomes. Platforms provide infrastructure. Semantics determine value.

The Strategic Risk Matrix

Two failure modes exist:

Staying Locked In: Rising costs, diminishing leverage, strategic decisions constrained by vendor roadmaps. The platform extracts increasing rent while delivering commoditised capability.

Full DIY Without Governance: Technical debt accumulation, security gaps, scaling challenges, key person dependencies. The build option requires organisational maturity to execute sustainably.

The real danger is inertia. Continuing current trajectories because change requires effort. The cost of inaction compounds silently until switching becomes prohibitively expensive.

The Winning Architecture

The answer is composable core with vendor optionality:

Use platforms where they genuinely add value: Identity management, payment processing, regulatory reporting infrastructure. Areas where vendor expertise exceeds internal capability and switching costs remain manageable.

Keep AI orchestration internal: The reasoning layer, the decision logic, the semantic understanding. This is competitive advantage, not commodity infrastructure.

Maintain data ownership: Customer data, interaction history, learned patterns. These assets appreciate over time. Leasing them to vendors surrenders compound returns.

Design systems to be replaceable: Standard interfaces, portable data formats, documented integration points. Every component should be substitutable without architectural overhaul.

The Bottom Line

AI transforms enterprise platforms from strategic assets into negotiable components.

Salesforce built a $34 billion business by providing integrated customer engagement capabilities that were genuinely difficult to replicate. That era has ended. The capabilities are now commodity. The integration complexity has dissolved. The pricing reflects historical leverage, not current value delivery.

Advantage accrues to organisations that:

Separate capability from vendor: Understand what you need versus who currently provides it. Map dependencies. Identify alternatives. Negotiate from knowledge.

Treat AI as infrastructure: Not as a feature rented from platform vendors, but as a utility consumed from the most efficient source. LLMs are interchangeable. Orchestration is portable. Lock in is optional.

Design for optionality, not permanence: Every architectural decision should preserve future flexibility. The technology landscape will continue evolving. Organisations designed for adaptation will outperform those optimised for current state.

The mathematics have changed. The technical barriers have fallen. The financial case is overwhelming.

The remaining question is organisational: does your institution have the clarity to recognise the shift, the courage to act on it, and the capability to execute the transition?

The alternative is continuing to fund an increasingly unjustifiable value extraction, paying premium prices for commodity capability, while competitors capture the economic benefits of AI democratisation.

The choice is control or convenience. The economics favour control. The future belongs to those who recognise this early enough to act.

Model Context Protocol: A Comprehensive Guide for Enterprise Implementation

The Model Context Protocol (MCP) represents a fundamental shift in how we integrate Large Language Models (LLMs) with external data sources and tools. As enterprises increasingly adopt AI powered applications, understanding MCP’s architecture, operational characteristics, and practical implementation becomes critical for technical leaders building production systems.

1. What is Model Context Protocol?

Model Context Protocol is an open standard developed by Anthropic that enables secure, structured communication between LLM applications and external data sources. Unlike traditional API integrations where each connection requires custom code, MCP provides a standardized interface for LLMs to interact with databases, file systems, business applications, and specialized tools.

At its core, MCP defines three primary components.

The Three Primary Components Explained

MCP Hosts

What they are: The outer application shell, the thing the user actually interacts with. Think of it as the “container” that wants to give an LLM access to external capabilities.

Examples:

  • Claude Desktop (the application itself)
  • VS Code with an AI extension like Cursor or Continue
  • Your custom enterprise chatbot built with Anthropic’s API
  • An IDE with Copilot style features

The MCP Host doesn’t directly speak the MCP protocol; it delegates that responsibility to its internal MCP Client.

MCP Clients

What they are: A library or component that lives inside the MCP Host and handles all the MCP protocol plumbing. This is where the actual protocol implementation resides.

What they do:

  • Manage the connection to an MCP Server (lifecycle management, reconnection)
  • Handle JSON RPC serialization/deserialization
  • Perform capability discovery (asking MCP Servers “what can you do?”)
  • Route tool calls from the LLM to the appropriate MCP Server
  • Manage authentication tokens

Key insight: in the MCP specification, each MCP Client maintains a 1:1 connection with a single MCP Server, so a host that talks to several servers runs several client instances. When Claude Desktop connects to your filesystem server AND a Postgres server AND a Slack server, it instantiates three clients, one per connection, and coordinates them internally.

MCP Servers

What they are: Lightweight adapters that expose specific capabilities through the MCP protocol. Each MCP Server is essentially a translator between MCP’s standardised interface and some underlying system.

What they do:

  • Advertise their capabilities (tools, resources, prompts) via the tools/list, resources/list methods
  • Accept standardised JSON RPC calls and translate them into actual operations
  • Return results in MCP’s expected format

Examples:

  • A filesystem MCP Server that exposes read_file, list_directory, search_files
  • A Postgres MCP Server that exposes query, list_tables, describe_schema
  • A Slack MCP Server that exposes send_message, list_channels, search_messages
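Concretely, invoking one of these tools is a JSON RPC exchange. A tools/call request to the filesystem server and its reply look roughly like this (the path and file contents are illustrative placeholders):

```json
{
  "jsonrpc": "2.0",
  "id": 7,
  "method": "tools/call",
  "params": {
    "name": "read_file",
    "arguments": { "path": "/reports/q3-summary.txt" }
  }
}
```

The server translates this into an actual filesystem read and returns the result in MCP’s expected shape:

```json
{
  "jsonrpc": "2.0",
  "id": 7,
  "result": {
    "content": [
      { "type": "text", "text": "...file contents here..." }
    ]
  }
}
```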

The Relationship Visualised


Each MCP Client is a dedicated “phone line” from the MCP Host to one external MCP Server. The MCP Host itself is just the building where everything lives.

The protocol itself operates over JSON-RPC 2.0, supporting stdio and HTTP based transports. The original HTTP transport used Server-Sent Events (SSE); recent revisions of the specification replace SSE with Streamable HTTP. This architecture enables both local integrations running as separate processes and remote integrations accessed over HTTP.

2. Problems MCP Solves

Traditional LLM integrations face several architectural challenges that MCP directly addresses.

2.1 Context Fragmentation and Custom Integration Overhead

Before MCP, every LLM application requiring access to enterprise data sources needed custom integration code. A chatbot accessing customer data from Salesforce, product information from a PostgreSQL database, and documentation from Confluence would require three separate integration implementations. Each integration would need its own authentication logic, error handling, rate limiting, and data transformation code.

MCP eliminates this fragmentation by providing a single protocol that works uniformly across all data sources. Once an MCP server exists for Salesforce, PostgreSQL, or Confluence, any MCP compatible host can immediately leverage it without writing integration-specific code. This dramatically reduces the engineering effort required to connect LLMs to existing enterprise systems.

2.2 Dynamic Capability Discovery

Traditional integrations require hardcoded knowledge of available tools and data sources within the application code. If a new database table becomes available or a new API endpoint is added, the application code must be updated, tested, and redeployed.

MCP servers expose their capabilities through standardized discovery mechanisms. When an MCP client connects to a server, it can dynamically query available resources, tools, and prompts. This enables applications to adapt to changing backend capabilities without code changes, supporting more flexible and maintainable architectures.
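That discovery step is itself plain JSON RPC: a tools/list response advertises each tool together with a JSON Schema describing its arguments. An abbreviated, illustrative reply from a Postgres style server:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "tools": [
      {
        "name": "query",
        "description": "Run a read-only SQL query",
        "inputSchema": {
          "type": "object",
          "properties": { "sql": { "type": "string" } },
          "required": ["sql"]
        }
      }
    ]
  }
}
```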

2.3 Security and Access Control Complexity

Managing security across multiple custom integrations creates significant operational overhead. Each integration might implement authentication differently, use various credential storage mechanisms, and enforce access controls inconsistently.

MCP standardizes authentication and authorization patterns. MCP servers can implement consistent OAuth flows, API key management, or integration with enterprise identity providers. Access controls can be enforced uniformly at the MCP server level, ensuring that users can only access resources they’re authorized to use regardless of which host application initiates the request.

2.4 Resource Efficiency and Connection Multiplexing

LLM applications often need to gather context from multiple sources to respond to a single query. Traditional approaches might open separate connections to each backend system, creating connection overhead and making it difficult to coordinate transactions or maintain consistency.

MCP enables efficient multiplexing where a single host can maintain persistent connections to multiple MCP servers, reusing connections across multiple LLM requests. This reduces connection overhead and enables more sophisticated coordination patterns like distributed transactions or cross system queries.

3. When APIs Are Better Than MCPs

While MCP provides significant advantages for LLM integrations, traditional REST or gRPC APIs remain the superior choice in several scenarios.

3.1 High Throughput, Low-Latency Services

APIs excel in scenarios requiring extreme performance characteristics. A payment processing system handling thousands of transactions per second with sub 10ms latency requirements should use direct API calls rather than the additional protocol overhead of MCP. The JSON RPC serialization, protocol negotiation, and capability discovery mechanisms in MCP introduce latency that’s acceptable for human interactive AI applications but unacceptable for high frequency trading systems or realtime fraud detection engines.

3.2 Machine to Machine Communication Without AI

When building traditional microservices architectures where services communicate directly without AI intermediaries, standard APIs provide simpler, more battle tested solutions. A REST API between your authentication service and user management service doesn’t benefit from MCP’s LLM centric features like prompt templates or context window management.

3.3 Standardized Industry Protocols

Many industries have established API standards that provide interoperability across vendors. Healthcare’s FHIR protocol, financial services’ FIX protocol, or telecommunications’ TMF APIs represent decades of industry collaboration. Wrapping these in MCP adds unnecessary complexity when the underlying APIs already provide well-understood interfaces with extensive tooling and community support.

3.4 Client Applications Without LLM Integration

Mobile apps, web frontends, or IoT devices that don’t incorporate LLM functionality should communicate via standard APIs. MCP’s value proposition centers on making it easier for AI applications to access context and tools. A React dashboard displaying analytics doesn’t need MCP’s capability discovery or prompt templates; it needs predictable, well documented API endpoints.

3.5 Legacy System Integration

Organizations with heavily invested API management infrastructure (API gateways, rate limiting, analytics, monetization) should leverage those existing capabilities rather than introducing MCP as an additional layer. If you’ve already built comprehensive API governance with tools like Apigee, Kong, or AWS API Gateway, adding MCP creates operational complexity without corresponding benefit unless you’re specifically building LLM applications.

4. Strategies and Tools for Managing MCPs at Scale

Operating MCP infrastructure in production environments requires thoughtful approaches to server management, observability, and lifecycle management.

4.1 Centralized MCP Server Registry

Large organizations should implement a centralized registry cataloging all available MCP servers, their capabilities, ownership teams, and SLA commitments. This registry serves as the source of truth for discovery, enabling development teams to find existing MCP servers before building new ones and preventing capability duplication.

A reference implementation might use a PostgreSQL database with tables for servers, capabilities, and access policies:

CREATE TABLE mcp_servers (
    server_id UUID PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    description TEXT,
    transport_type VARCHAR(50), -- 'stdio' or 'sse'
    endpoint_url TEXT,
    owner_team VARCHAR(255),
    status VARCHAR(50), -- 'active', 'deprecated', 'sunset'
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE mcp_capabilities (
    capability_id UUID PRIMARY KEY,
    server_id UUID REFERENCES mcp_servers(server_id),
    capability_type VARCHAR(50), -- 'resource', 'tool', 'prompt'
    name VARCHAR(255),
    description TEXT,
    schema JSONB
);

This registry can expose its own MCP server, enabling AI assistants to help developers discover and connect to appropriate servers through natural language queries.

4.2 MCP Gateway Pattern

For enterprise deployments, implementing an MCP gateway that sits between host applications and backend MCP servers provides several operational advantages:

Authentication and Authorization Consolidation: The gateway can implement centralized authentication, validating JWT tokens or API keys once rather than requiring each MCP server to implement authentication independently. This enables consistent security policies across all MCP integrations.

Rate Limiting and Throttling: The gateway can enforce organization-wide rate limits preventing any single client from overwhelming backend systems. This is particularly important for expensive operations like database queries or API calls to external services with usage based pricing.

Observability and Auditing: The gateway provides a single point to collect telemetry on MCP usage patterns, including which servers are accessed most frequently, which capabilities are used, error rates, and latency distributions. This data informs capacity planning and helps identify problematic integrations.

Protocol Translation: The gateway can translate between transport types, allowing stdio-based MCP servers to be accessed over HTTP/SSE by remote clients, or vice versa. This flexibility enables optimal transport selection based on deployment architecture.

A simplified gateway implementation in Java might look like:

public class MCPGateway {
    private final Map<String, MCPServerConnection> serverPool;
    private final MetricsCollector metrics;
    private final AuthenticationService auth;
    private final RateLimiter rateLimiter;

    public CompletableFuture<MCPResponse> routeRequest(
            MCPRequest request, 
            String authToken) {

        // Authenticate
        User user = auth.validateToken(authToken);

        // Find appropriate server (fail fast on unknown server IDs)
        MCPServerConnection server = serverPool.get(request.getServerId());
        if (server == null) {
            return CompletableFuture.failedFuture(
                new IllegalArgumentException("Unknown server: " + request.getServerId()));
        }

        // Check authorization
        if (!user.canAccess(server)) {
            return CompletableFuture.failedFuture(
                new UnauthorizedException("Access denied"));
        }

        // Apply rate limiting
        if (!rateLimiter.tryAcquire(user.getId(), server.getId())) {
            return CompletableFuture.failedFuture(
                new RateLimitException("Rate limit exceeded"));
        }

        // Record metrics
        metrics.recordRequest(server.getId(), request.getMethod());

        // Forward request
        return server.sendRequest(request)
            .whenComplete((response, error) -> {
                if (error != null) {
                    metrics.recordError(server.getId(), error);
                } else {
                    metrics.recordSuccess(server.getId(), 
                        response.getLatencyMs());
                }
            });
    }
}

4.3 Configuration Management

MCP server configurations should be managed through infrastructure as code approaches. Using tools like Kubernetes ConfigMaps, AWS Parameter Store, or HashiCorp Vault, organizations can version control server configurations, implement environment specific settings, and enable automated deployments.

A typical configuration structure might include:

mcp:
  servers:
    - name: postgres-analytics
      transport: stdio
      command: /usr/local/bin/mcp-postgres
      args:
        - --database=analytics
        - --host=${DB_HOST}
        - --port=${DB_PORT}
      env:
        DB_PASSWORD_SECRET: aws:secretsmanager:prod/postgres/analytics
      resources:
        limits:
          memory: 512Mi
          cpu: 500m

    - name: salesforce-integration
      transport: sse
      url: https://mcp.salesforce.internal/api/v1
      auth:
        type: oauth2
        client_id: ${SALESFORCE_CLIENT_ID}
        client_secret_secret: aws:secretsmanager:prod/salesforce/oauth

This declarative approach enables GitOps workflows where changes to MCP infrastructure are reviewed, approved, and automatically deployed through CI/CD pipelines.

4.4 Health Monitoring and Circuit Breaking

MCP servers must implement comprehensive health checks and circuit breaker patterns to prevent cascading failures. Each server should expose a health endpoint indicating its operational status and the health of its dependencies.

Implementing circuit breakers prevents scenarios where a failing backend system causes request queuing and resource exhaustion across the entire MCP infrastructure:

public class CircuitBreakerMCPServer {
    private final MCPServer delegate;
    private final CircuitBreaker circuitBreaker;

    public CircuitBreakerMCPServer(MCPServer delegate) {
        this.delegate = delegate;
        // Resilience4j style configuration: open the circuit when half of
        // the last 100 calls failed, probe again after 30 seconds.
        CircuitBreakerConfig config = CircuitBreakerConfig.custom()
            .failureRateThreshold(50)
            .waitDurationInOpenState(Duration.ofSeconds(30))
            .permittedNumberOfCallsInHalfOpenState(5)
            .slidingWindowSize(100)
            .build();
        this.circuitBreaker = CircuitBreaker.of("mcp-server", config);
    }

    public CompletableFuture<Response> handleRequest(Request req) {
        // When the circuit is open, the returned future fails immediately
        // instead of waiting on a timeout against the failing backend.
        return circuitBreaker.executeCompletionStage(
                () -> delegate.handleRequest(req))
            .toCompletableFuture();
    }
}

When the circuit opens due to repeated failures, requests fail fast rather than waiting for timeouts, improving overall system responsiveness and preventing resource exhaustion.

4.5 Version Management and Backward Compatibility

As MCP servers evolve, managing versions and ensuring backward compatibility becomes critical. Organizations should adopt semantic versioning for MCP servers and implement content negotiation mechanisms allowing clients to request specific capability versions.

Servers should maintain compatibility matrices indicating which host versions work with which server versions, and deprecation policies should provide clear timelines for sunsetting old capabilities:

{
  "server": "postgres-analytics",
  "version": "2.1.0",
  "compatibleClients": [">=1.0.0 <3.0.0"],
  "deprecations": [
    {
      "capability": "legacy_query_tool",
      "deprecatedIn": "2.0.0",
      "sunsetDate": "2025-06-01",
      "replacement": "parameterized_query_tool"
    }
  ]
}

5. Operational Challenges of MCPs

Deploying MCP infrastructure at scale introduces operational complexities that require careful consideration.

5.1 Process Management and Resource Isolation

Stdio based MCP servers run as separate processes spawned by the host application. In high concurrency scenarios, process proliferation can exhaust system resources. A server handling 1000 concurrent users might spawn hundreds of MCP server processes, each consuming memory and file descriptors.

Container orchestration platforms like Kubernetes can help manage these challenges by treating each MCP server as a microservice with resource limits, but this introduces complexity for stdio-based servers that were designed to run as local processes. Organizations must choose between:

Process pooling: Maintain a pool of reusable server processes, multiplexing multiple client connections across fewer processes. This improves resource efficiency but requires careful session management.

HTTP/SSE migration: Convert stdio based servers to HTTP/SSE transport, enabling them to run as traditional web services with well understood scaling characteristics. This requires significant refactoring but provides better operational characteristics.

Serverless architectures: Deploy MCP servers as AWS Lambda functions or similar FaaS offerings. This eliminates process management overhead but introduces cold start latencies and requires servers to be stateless.
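The process pooling option can be sketched with a bounded queue of reusable handles; ServerProcess here is a stand in for a spawned stdio server process, not a real API:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;

// Process-pooling sketch: a fixed set of reusable server handles is
// multiplexed across callers, bounding resource use under concurrency.
public class ProcessPool {
    // Stand-in for a spawned stdio MCP server process (illustrative).
    public static class ServerProcess {
        public final int id;
        ServerProcess(int id) { this.id = id; }
    }

    private final BlockingQueue<ServerProcess> idle;

    public ProcessPool(int size) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) idle.add(new ServerProcess(i));
    }

    // Borrow a process, run work against it, and always return it.
    public <T> T withProcess(Function<ServerProcess, T> work) {
        try {
            ServerProcess p = idle.take(); // blocks when the pool is exhausted
            try { return work.apply(p); }
            finally { idle.put(p); }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted waiting for pool", e);
        }
    }
}
```

A real pool would also need session affinity for servers that keep per client state, which is the careful session management the text refers to.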

5.2 State Management and Transaction Coordination

MCP servers are generally stateless, with each request processed independently. This creates challenges for operations requiring transaction semantics across multiple requests. Consider a workflow where an LLM needs to query customer data, calculate risk scores, and update a fraud detection system. Each operation might target a different MCP server, but they should succeed or fail atomically.

Traditional distributed transaction protocols (2PC, Saga) don’t integrate natively with MCP. Organizations must implement coordination logic either:

Within the host application: The host implements transaction coordination, tracking which servers were involved in a workflow and initiating compensating transactions on failure. This places significant complexity on the host.

Through a dedicated orchestration layer: A separate service manages multi-server workflows, similar to AWS Step Functions or temporal.io. MCP requests become steps in a workflow definition, with the orchestrator handling retries, compensation, and state management.

Via database-backed state: MCP servers store intermediate state in a shared database, enabling subsequent requests to access previous results. This requires careful cache invalidation and consistency management.
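
A minimal sketch of the first option, host-side compensation: each completed step registers an undo action, and on failure the host runs the undo actions in reverse order. The step contents are illustrative placeholders, not MCP API calls:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// Host-side saga coordination: completed steps push a compensating
// action; compensate() undoes them newest-first after a failure.
class SagaCoordinator {
    private final Deque<Runnable> compensations = new ArrayDeque<>();

    // Run a step; if it succeeds, remember how to undo it.
    <T> T step(Supplier<T> action, Runnable compensation) {
        T result = action.get();
        compensations.push(compensation);
        return result;
    }

    // Undo completed steps in reverse order.
    void compensate() {
        while (!compensations.isEmpty()) {
            compensations.pop().run();
        }
    }
}
```

In the fraud-detection workflow above, "query customer data" and "calculate risk scores" would each be a step, and a failed "update fraud system" would trigger compensate().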

5.3 Observability and Debugging

When an MCP-based application fails, debugging requires tracing requests across multiple server boundaries. Traditional APM tools designed for HTTP-based microservices may not provide adequate visibility into MCP request flows, particularly for stdio-based servers.

Organizations need comprehensive logging strategies capturing:

Request traces: Unique identifiers propagated through each MCP request, enabling correlation of log entries across servers.

Protocol-level telemetry: Detailed logging of JSON-RPC messages, including request timing, payload sizes, and serialization overhead.

Capability usage patterns: Analytics on which tools, resources, and prompts are accessed most frequently, informing capacity planning and server optimization.

Error categorization: Structured error logging distinguishing between client errors (invalid requests), server errors (backend failures), and protocol errors (serialization issues).
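
The error categories above can be keyed off the standard JSON-RPC 2.0 error codes that MCP messages carry. A small classifier sketch (the category enum is our own naming, not part of the protocol):

```java
// Maps JSON-RPC 2.0 error codes onto the logging categories above.
enum ErrorCategory { CLIENT_ERROR, SERVER_ERROR, PROTOCOL_ERROR }

class ErrorClassifier {
    // JSON-RPC 2.0 reserves -32700 (parse error) and -32600..-32603
    // (invalid request, method not found, invalid params, internal error).
    static ErrorCategory classify(int jsonRpcCode) {
        if (jsonRpcCode == -32700) return ErrorCategory.PROTOCOL_ERROR;   // serialization
        if (jsonRpcCode == -32600 || jsonRpcCode == -32601
                || jsonRpcCode == -32602) return ErrorCategory.CLIENT_ERROR; // caller's fault
        if (jsonRpcCode == -32603) return ErrorCategory.SERVER_ERROR;     // backend failure
        // -32000..-32099 is the implementation-defined server error range
        if (jsonRpcCode >= -32099 && jsonRpcCode <= -32000) return ErrorCategory.SERVER_ERROR;
        return ErrorCategory.PROTOCOL_ERROR;
    }
}
```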

Implementing OpenTelemetry instrumentation for MCP servers provides standardized observability:

public class ObservableMCPServer {
    private final Tracer tracer;

    public CompletableFuture<Response> handleRequest(Request req) {
        Span span = tracer.spanBuilder("mcp.request")
            .setAttribute("mcp.method", req.getMethod())
            .setAttribute("mcp.server", this.getServerId())
            .startSpan();

        try (Scope scope = span.makeCurrent()) {
            return processRequest(req)
                .whenComplete((response, error) -> {
                    if (error != null) {
                        span.recordException(error);
                        span.setStatus(StatusCode.ERROR);
                    } else {
                        span.setAttribute("mcp.response.size", 
                            response.getSerializedSize());
                        span.setStatus(StatusCode.OK);
                    }
                    span.end();
                });
        }
    }
}

5.4 Security and Secret Management

MCP servers frequently require credentials to access backend systems. Storing these credentials securely while making them available to server processes introduces operational complexity.

Environment variables are commonly used but have security limitations. They are readable through /proc/&lt;pid&gt;/environ, exposed by container inspection commands, and often captured in crash dumps, creating information disclosure risks.

Secret management services like AWS Secrets Manager, HashiCorp Vault, or Kubernetes Secrets provide better security but require additional operational infrastructure and credential rotation strategies.

Workload identity approaches where MCP servers assume IAM roles or service accounts eliminate credential storage entirely but require sophisticated identity federation infrastructure.

Organizations must implement credential rotation without service interruption, requiring either:

Graceful restarts: When credentials change, spawn new server instances with updated credentials, wait for in-flight requests to complete, then terminate old instances.

Dynamic credential reloading: Servers periodically check for updated credentials and reload them without restarting, requiring careful synchronization to avoid mid-request credential changes.

5.5 Protocol Versioning and Compatibility

The MCP specification itself evolves over time. As new protocol versions are released, organizations must manage compatibility between hosts using different MCP client versions and servers implementing various protocol versions.

This requires extensive integration testing across version combinations and careful deployment orchestration to prevent breaking changes. Organizations typically establish testing matrices ensuring critical host/server combinations remain functional:

Host Version 1.0 + Server Version 1.x: SUPPORTED
Host Version 1.0 + Server Version 2.x: DEGRADED (missing features)
Host Version 2.0 + Server Version 1.x: SUPPORTED (backward compatible)
Host Version 2.0 + Server Version 2.x: FULLY SUPPORTED
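
One way to keep such a matrix manageable is explicit version negotiation at initialization time, which MCP performs during its initialize handshake. A simplified sketch of the selection logic, with placeholder version strings:

```java
import java.util.List;
import java.util.Optional;

// Illustrative negotiation: the client proposes versions it supports and
// the server answers with the newest one it also supports, falling back
// to its own newest version otherwise.
class VersionNegotiator {
    private final List<String> serverSupported; // ordered newest first

    VersionNegotiator(List<String> serverSupported) {
        this.serverSupported = serverSupported;
    }

    String negotiate(List<String> clientProposed) {
        Optional<String> match = serverSupported.stream()
            .filter(clientProposed::contains)
            .findFirst();                        // newest mutual version
        return match.orElse(serverSupported.get(0)); // else server's newest
    }
}
```

The host then decides whether to proceed in degraded mode or refuse the connection when the returned version is one it cannot fully support.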

6. MCP Security Concerns and Mitigation Strategies

Security in MCP deployments requires a defense-in-depth approach addressing authentication, authorization, data protection, and operational security. MCP’s flexibility in connecting LLMs to enterprise systems creates a significant attack surface that must be carefully managed.

6.1 Authentication and Identity Management

Concern: MCP servers must authenticate clients to prevent unauthorized access to enterprise resources. Without proper authentication, malicious actors could impersonate legitimate clients and access sensitive data or execute privileged operations.

Mitigation Strategies:

Token-Based Authentication: Implement JWT-based authentication where clients present signed tokens containing identity claims and authorization scopes. Tokens should have short expiration times (15-60 minutes) and be issued by a trusted identity provider:

public class JWTAuthenticatedMCPServer {
    private final JWTVerifier verifier;

    public CompletableFuture<Response> handleRequest(
            Request req, 
            String authHeader) {

        if (authHeader == null || !authHeader.startsWith("Bearer ")) {
            return CompletableFuture.failedFuture(
                new UnauthorizedException("Missing authentication token"));
        }

        try {
            DecodedJWT jwt = verifier.verify(
                authHeader.substring(7));

            String userId = jwt.getSubject();
            List<String> scopes = jwt.getClaim("scopes")
                .asList(String.class);

            AuthContext context = new AuthContext(userId, scopes);
            return processAuthenticatedRequest(req, context);

        } catch (JWTVerificationException e) {
            return CompletableFuture.failedFuture(
                new UnauthorizedException("Invalid token: " + 
                    e.getMessage()));
        }
    }
}

Mutual TLS (mTLS): For HTTP/SSE transport, implement mutual TLS authentication where both client and server present certificates. This provides cryptographic assurance of identity and encrypts all traffic:

server:
  ssl:
    enabled: true
    client-auth: need
    key-store: classpath:server-keystore.p12
    key-store-password: ${KEYSTORE_PASSWORD}
    trust-store: classpath:client-truststore.p12
    trust-store-password: ${TRUSTSTORE_PASSWORD}

OAuth 2.0 Integration: Integrate with enterprise OAuth providers (Okta, Auth0, Azure AD), enabling single sign-on and centralized access control. Use the authorization code flow for interactive applications and the client credentials flow for service accounts.
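
A hedged sketch of the client credentials flow for a service account, using only JDK HttpClient types. The token endpoint URL, client ID, secret, and scope below are placeholders; no network call is made in this fragment:

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Builds the token request a service account sends to the identity
// provider. Sending it (HttpClient.send) and caching the returned
// access token until expiry are omitted.
class ClientCredentialsRequest {
    static String formBody(String clientId, String clientSecret, String scope) {
        return "grant_type=client_credentials"
            + "&client_id=" + URLEncoder.encode(clientId, StandardCharsets.UTF_8)
            + "&client_secret=" + URLEncoder.encode(clientSecret, StandardCharsets.UTF_8)
            + "&scope=" + URLEncoder.encode(scope, StandardCharsets.UTF_8);
    }

    static HttpRequest build(String tokenEndpoint, String body) {
        return HttpRequest.newBuilder()
            .uri(URI.create(tokenEndpoint))
            .header("Content-Type", "application/x-www-form-urlencoded")
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();
    }
}
```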

6.2 Authorization and Access Control

Concern: Authentication verifies identity but doesn’t determine what resources a user can access. Fine-grained authorization ensures users can only interact with data and tools appropriate to their role.

Mitigation Strategies:

Role-Based Access Control (RBAC): Define roles with specific permissions and assign users to roles. MCP servers check role membership before executing operations:

public class RBACMCPServer {
    private final PermissionChecker permissions;

    public CompletableFuture<Response> executeToolCall(
            String toolName,
            Map<String, Object> args,
            AuthContext context) {

        Permission required = Permission.forTool(toolName);

        if (!permissions.userHasPermission(context.userId(), required)) {
            return CompletableFuture.failedFuture(
                new ForbiddenException(
                    "User lacks permission: " + required));
        }

        return executeTool(toolName, args);
    }
}

Attribute-Based Access Control (ABAC): Implement policy-based authorization evaluating user attributes, resource properties, and environmental context. Use a policy engine such as Open Policy Agent (OPA):

package mcp.authorization

# required for the "in" operator on pre-1.0 OPA versions
import future.keywords.in

default allow = false

allow {
    input.user.department == "engineering"
    input.resource.classification == "internal"
    input.action == "read"
}

allow {
    input.user.role == "admin"
}

allow {
    input.user.id == input.resource.owner
    input.action in ["read", "update"]
}

Resource-Level Permissions: Implement granular permissions at the resource level. A user might have access to specific database tables, file directories, or API endpoints but not others:

public CompletableFuture<String> readFile(
        String path, 
        AuthContext context) {

    ResourceACL acl = aclService.getACL(path);

    if (!acl.canRead(context.userId())) {
        throw new ForbiddenException(
            "No read permission for: " + path);
    }

    return fileService.readFile(path);
}
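
The aclService used above is assumed; a minimal in-memory version, defaulting unknown paths to deny, might look like this (a real deployment would back it with a policy store):

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Read permissions per resource path; unknown paths are denied.
class ResourceACL {
    private final Set<String> readers;
    ResourceACL(Set<String> readers) { this.readers = readers; }
    boolean canRead(String userId) { return readers.contains(userId); }
}

class ACLService {
    private final Map<String, ResourceACL> acls = new ConcurrentHashMap<>();

    void grantRead(String path, String... userIds) {
        acls.put(path, new ResourceACL(Set.of(userIds)));
    }

    // Default-deny: paths with no ACL entry grant nothing.
    ResourceACL getACL(String path) {
        return acls.getOrDefault(path, new ResourceACL(Set.of()));
    }
}
```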

6.3 Prompt Injection and Input Validation

Concern: LLMs can be manipulated through prompt injection attacks where malicious users craft inputs that cause the LLM to ignore instructions or perform unintended actions. When MCP servers execute LLM generated tool calls, these attacks can lead to unauthorized operations.

Mitigation Strategies:

Input Sanitization: Validate and sanitize all tool parameters before execution. Prefer allowlists of expected values; where a denylist is unavoidable, match whole words and reject unexpected input patterns:

public CompletableFuture<Response> executeQuery(
        String query, 
        Map<String, Object> params) {

    // Denylist check: match whole words so identifiers like
    // DELETED_AT are not falsely rejected
    List<String> dangerousKeywords = List.of(
        "DROP", "DELETE", "TRUNCATE", "ALTER", "GRANT");

    String upperQuery = query.toUpperCase();
    for (String keyword : dangerousKeywords) {
        if (upperQuery.matches("(?s).*\\b" + keyword + "\\b.*")) {
            throw new ValidationException(
                "Query contains forbidden operation: " + keyword);
        }
    }

    // Validate parameters against expected schema
    for (Map.Entry<String, Object> entry : params.entrySet()) {
        validateParameter(entry.getKey(), entry.getValue());
    }

    return database.executeParameterizedQuery(query, params);
}

Parameterized Operations: Use parameterized queries, prepared statements, or API calls rather than string concatenation. This prevents injection attacks by separating code from data:

// VULNERABLE - DO NOT USE
String query = "SELECT * FROM users WHERE id = " + userId;

// SECURE - USE THIS
String query = "SELECT * FROM users WHERE id = ?";
PreparedStatement stmt = connection.prepareStatement(query);
stmt.setString(1, userId);

Output Validation: Validate responses from backend systems before returning them to the LLM. Strip sensitive metadata, error details, or system information that could be exploited:

public String sanitizeErrorMessage(Exception e) {
    // Never expose stack traces or internal paths
    String message = e.getMessage();

    // Remove file paths
    message = message.replaceAll("/[^ ]+/", "[REDACTED_PATH]/");

    // Remove connection strings
    message = message.replaceAll(
        "jdbc:[^ ]+", "jdbc:[REDACTED]");

    return message;
}

Capability Restrictions: Limit what tools can do. Read-only database access is safer than write access. File operations should be restricted to specific directories. API calls should use service accounts with minimal permissions.
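
The directory restriction can be enforced by resolving every requested path against an allowed base directory and rejecting anything that escapes it, which also blocks "../" traversal. A sketch (the base directory is illustrative):

```java
import java.nio.file.Path;

// Confines file operations to baseDir: normalize the resolved path and
// reject it unless it still starts with the base directory.
class ConfinedFileAccess {
    private final Path baseDir;

    ConfinedFileAccess(String baseDir) {
        this.baseDir = Path.of(baseDir).toAbsolutePath().normalize();
    }

    Path resolve(String requested) {
        Path candidate = baseDir.resolve(requested).normalize();
        if (!candidate.startsWith(baseDir)) {
            throw new SecurityException(
                "Path escapes allowed directory: " + requested);
        }
        return candidate;
    }
}
```

A tool handler would call resolve() before every read or write, so the LLM-supplied path can never reach files outside the sandbox.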

6.4 Data Exfiltration and Privacy

Concern: MCP servers accessing sensitive data could leak information through various channels: overly verbose logging, error messages, responses sent to LLMs, or side channel attacks.

Mitigation Strategies:

Data Classification and Masking: Classify data sensitivity levels and apply appropriate protections. Mask or redact sensitive data in responses:

public class DataMaskingMCPServer {
    private final SensitivityClassifier classifier;

    public Map<String, Object> prepareResponse(
            Map<String, Object> data) {

        Map<String, Object> masked = new HashMap<>();

        for (Map.Entry<String, Object> entry : data.entrySet()) {
            String key = entry.getKey();
            Object value = entry.getValue();

            SensitivityLevel level = classifier.classify(key);

            masked.put(key, switch(level) {
                case PUBLIC -> value;
                case INTERNAL -> value; // User has internal access
                case CONFIDENTIAL -> maskValue(value);
                case SECRET -> "[REDACTED]";
            });
        }

        return masked;
    }

    private Object maskValue(Object value) {
        if (value instanceof String s) {
            // Show first and last 4 chars for identifiers
            if (s.length() <= 8) return "****";
            return s.substring(0, 4) + "****" + 
                   s.substring(s.length() - 4);
        }
        return value;
    }
}

Audit Logging: Log all access to sensitive resources with sufficient detail for forensic analysis. Include who accessed what, when, and what was returned:

public CompletableFuture<Response> handleRequest(
        Request req, 
        AuthContext context) {

    AuditEvent event = AuditEvent.builder()
        .timestamp(Instant.now())
        .userId(context.userId())
        .action(req.getMethod())
        .resource(req.getResourceUri())
        .sourceIP(req.getClientIP())
        .build();

    return processRequest(req, context)
        .whenComplete((response, error) -> {
            event.setSuccess(error == null);
            event.setResponseSize(
                response != null ? response.size() : 0);

            if (error != null) {
                event.setErrorMessage(error.getMessage());
            }

            auditLog.record(event);
        });
}

Data Residency and Compliance: Ensure MCP servers comply with data residency requirements (GDPR, CCPA, HIPAA). Data should not transit regions where it’s prohibited. Implement geographic restrictions:

public class GeofencedMCPServer {
    private final Set<String> allowedRegions;

    public CompletableFuture<Response> handleRequest(
            Request req,
            String clientRegion) {

        if (!allowedRegions.contains(clientRegion)) {
            return CompletableFuture.failedFuture(
                new ForbiddenException(
                    "Access denied from region: " + clientRegion));
        }

        return processRequest(req);
    }
}

Encryption at Rest and in Transit: Encrypt sensitive data stored by MCP servers. Use TLS 1.3 for all network communication. Encrypt configuration files containing credentials:

# Encrypt sensitive configuration
aws kms encrypt \
    --key-id alias/mcp-config \
    --plaintext fileb://config.json \
    --output text \
    --query CiphertextBlob | base64 -d > config.json.encrypted

6.5 Denial of Service and Resource Exhaustion

Concern: Malicious or buggy clients could overwhelm MCP servers with excessive requests, expensive operations, or resource intensive queries, causing service degradation or outages.

Mitigation Strategies:

Rate Limiting: Enforce per-user and per-client rate limits to prevent excessive requests. Use token bucket or sliding window algorithms:

public class RateLimitedMCPServer {
    private final LoadingCache<String, RateLimiter> limiters;

    public RateLimitedMCPServer() {
        this.limiters = CacheBuilder.newBuilder()
            .expireAfterAccess(Duration.ofHours(1))
            .build(new CacheLoader<String, RateLimiter>() {
                public RateLimiter load(String userId) {
                    // 100 requests per minute per user
                    return RateLimiter.create(100.0 / 60.0);
                }
            });
    }

    public CompletableFuture<Response> handleRequest(
            Request req,
            AuthContext context) {

        RateLimiter limiter = limiters.getUnchecked(context.userId());

        if (!limiter.tryAcquire(Duration.ofMillis(100))) {
            return CompletableFuture.failedFuture(
                new RateLimitException("Rate limit exceeded"));
        }

        return processRequest(req, context);
    }
}

Query Complexity Limits: Restrict expensive operations like full table scans, recursive queries, or large file reads. Set maximum result sizes and execution timeouts:

public CompletableFuture<List<Map<String, Object>>> executeQuery(
        String query,
        Map<String, Object> params) {

    // Analyze query complexity
    QueryPlan plan = queryPlanner.analyze(query);

    if (plan.estimatedRows() > 10000) {
        throw new ValidationException(
            "Query too broad, add more filters");
    }

    if (plan.requiresFullTableScan()) {
        throw new ValidationException(
            "Full table scans not allowed");
    }

    // Set execution timeout
    return CompletableFuture.supplyAsync(
        () -> database.execute(query, params),
        executor
    ).orTimeout(30, TimeUnit.SECONDS);
}

Resource Quotas: Set memory limits, CPU limits, and connection pool sizes preventing any single request from consuming excessive resources:

resources:
  requests:
    memory: "256Mi"
    cpu: "250m"
  limits:
    memory: "512Mi"
    cpu: "500m"

connectionPool:
  maxSize: 20
  minIdle: 5
  maxWaitTime: 5000

Request Size Limits: Limit payload sizes preventing clients from sending enormous requests that consume memory during deserialization:

public JSONRPCRequest parseRequest(InputStream input) 
        throws IOException {

    // Limit input to 1MB
    BoundedInputStream bounded = new BoundedInputStream(
        input, 1024 * 1024);

    return objectMapper.readValue(bounded, JSONRPCRequest.class);
}

6.6 Supply Chain and Dependency Security

Concern: MCP servers depend on libraries, frameworks, and runtime environments. Vulnerabilities in dependencies can compromise security even if your code is secure.

Mitigation Strategies:

Dependency Scanning: Regularly scan dependencies for known vulnerabilities using tools like OWASP Dependency Check, Snyk, or GitHub Dependabot:

<plugin>
    <groupId>org.owasp</groupId>
    <artifactId>dependency-check-maven</artifactId>
    <version>8.4.0</version>
    <configuration>
        <failBuildOnCVSS>7</failBuildOnCVSS>
        <suppressionFile>
            dependency-check-suppressions.xml
        </suppressionFile>
    </configuration>
</plugin>

Dependency Pinning: Pin exact dependency versions rather than using version ranges. This prevents unexpected updates from introducing vulnerabilities:

<!-- BAD - version ranges -->
<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-databind</artifactId>
    <version>[2.0,3.0)</version>
</dependency>

<!-- GOOD - exact version -->
<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-databind</artifactId>
    <version>2.16.1</version>
</dependency>

Minimal Runtime Environments: Use minimal base images for containers reducing attack surface. Distroless images contain only your application and runtime dependencies:

FROM gcr.io/distroless/java21-debian12
COPY target/mcp-server.jar /app/mcp-server.jar
WORKDIR /app
ENTRYPOINT ["java", "-jar", "mcp-server.jar"]

Code Signing: Sign MCP server artifacts enabling verification of authenticity and integrity. Clients should verify signatures before executing servers:

# Sign JAR
jarsigner -keystore keystore.jks \
    -signedjar mcp-server-signed.jar \
    mcp-server.jar \
    mcp-signing-key

# Verify signature
jarsigner -verify -verbose mcp-server-signed.jar

6.7 Secrets Management

Concern: MCP servers require credentials for backend systems. Hardcoded credentials, credentials in version control, or insecure credential storage create significant security risks.

Mitigation Strategies:

External Secret Stores: Use dedicated secret management services; never store credentials in code or configuration files:

public class SecretManagerMCPServer {
    private final SecretsManagerClient secretsClient;

    public String getDatabasePassword() {
        GetSecretValueRequest request = GetSecretValueRequest.builder()
            .secretId("prod/mcp/database-password")
            .build();

        GetSecretValueResponse response = 
            secretsClient.getSecretValue(request);

        return response.secretString();
    }
}

Workload Identity: Use cloud provider IAM roles or Kubernetes service accounts eliminating the need to store credentials:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: mcp-postgres-server
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/mcp-postgres-role

---

apiVersion: apps/v1
kind: Deployment
metadata:
  name: mcp-postgres-server
spec:
  template:
    spec:
      serviceAccountName: mcp-postgres-server
      containers:
      - name: server
        image: mcp-postgres-server:1.0

Credential Rotation: Implement automatic credential rotation. When credentials change, update the secret store and have servers pick up the new values, either through a graceful restart or by reloading in place:

public class RotatingCredentialProvider {
    private static final Logger logger =
        LoggerFactory.getLogger(RotatingCredentialProvider.class);

    private volatile Credential currentCredential;
    private final ScheduledExecutorService scheduler;

    public RotatingCredentialProvider() {
        this.scheduler = Executors.newSingleThreadScheduledExecutor();
        this.currentCredential = loadCredential();

        // Check for new credentials every 5 minutes
        scheduler.scheduleAtFixedRate(
            this::refreshCredential,
            5, 5, TimeUnit.MINUTES);
    }

    private void refreshCredential() {
        try {
            Credential newCred = loadCredential();
            if (!newCred.equals(currentCredential)) {
                logger.info("Credential updated");
                currentCredential = newCred;
            }
        } catch (Exception e) {
            logger.error("Failed to refresh credential", e);
        }
    }

    public Credential getCredential() {
        return currentCredential;
    }
}

Least Privilege: Credentials should have minimum necessary permissions. Database credentials should only access specific schemas. API keys should have restricted scopes:

-- Create limited database user
CREATE USER mcp_server WITH PASSWORD 'generated-password';
GRANT CONNECT ON DATABASE analytics TO mcp_server;
GRANT SELECT ON TABLE public.aggregated_metrics TO mcp_server;
-- Explicitly NOT granted: INSERT, UPDATE, DELETE

6.8 Network Security

Concern: MCP traffic between clients and servers could be intercepted, modified, or spoofed if not properly secured.

Mitigation Strategies:

TLS Everywhere: Encrypt all network communication using TLS 1.3. Reject connections using older protocols:

SSLContext sslContext = SSLContext.getInstance("TLSv1.3");
sslContext.init(keyManagers, trustManagers, null);

SSLParameters sslParams = new SSLParameters();
sslParams.setProtocols(new String[]{"TLSv1.3"});
sslParams.setCipherSuites(new String[]{
    "TLS_AES_256_GCM_SHA384",
    "TLS_AES_128_GCM_SHA256"
});

// Apply to each engine or socket created from the context, e.g.
// sslEngine.setSSLParameters(sslParams);

Network Segmentation: Deploy MCP servers in isolated network segments. Use security groups or network policies restricting which services can communicate:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: mcp-server-policy
spec:
  podSelector:
    matchLabels:
      app: mcp-server
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: mcp-gateway
    ports:
    - protocol: TCP
      port: 8080
  egress:
  - to:
    - podSelector:
        matchLabels:
          app: postgres
    ports:
    - protocol: TCP
      port: 5432

VPN or Private Connectivity: For remote MCP servers, use VPNs or cloud provider private networking (AWS PrivateLink, Azure Private Link) instead of exposing servers to the public internet.

DDoS Protection: Use cloud provider DDoS protection services (AWS Shield, Cloudflare) for HTTP/SSE servers exposed to the internet.

6.9 Compliance and Audit

Concern: Organizations must demonstrate compliance with regulatory requirements (SOC 2, ISO 27001, HIPAA, PCI DSS) and provide audit trails for security incidents.

Mitigation Strategies:

Comprehensive Audit Logging: Log all security relevant events including authentication attempts, authorization failures, data access, and configuration changes:

public void recordAuditEvent(AuditEvent event) {
    String auditLog = String.format(
        "timestamp=%s user=%s action=%s resource=%s " +
        "result=%s ip=%s",
        event.timestamp(),
        event.userId(),
        event.action(),
        event.resource(),
        event.success() ? "SUCCESS" : "FAILURE",
        event.sourceIP()
    );

    // Write to tamper-proof audit log
    auditLogger.info(auditLog);

    // Also send to SIEM
    siemClient.send(event);
}

Immutable Audit Logs: Store audit logs in write-once storage preventing tampering. Use services like AWS CloudWatch Logs with retention policies or dedicated SIEM systems.
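
Where write-once storage is not available, hash chaining offers a complementary tamper-evidence check: each entry embeds the hash of its predecessor, so altering any past entry breaks verification. A sketch using only the JDK:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.ArrayList;
import java.util.HexFormat;
import java.util.List;

// Tamper-evident log: entry N's hash covers entry N-1's hash plus its
// own message, so verify() fails if any stored entry was modified.
class HashChainedAuditLog {
    record Entry(String message, String prevHash, String hash) {}

    private final List<Entry> entries = new ArrayList<>();
    private String lastHash = "0".repeat(64); // genesis value

    void append(String message) {
        String hash = sha256(lastHash + "|" + message);
        entries.add(new Entry(message, lastHash, hash));
        lastHash = hash;
    }

    // Recompute every hash; false means the chain was broken.
    boolean verify() {
        String prev = "0".repeat(64);
        for (Entry e : entries) {
            if (!e.prevHash().equals(prev)) return false;
            if (!e.hash().equals(sha256(prev + "|" + e.message()))) return false;
            prev = e.hash();
        }
        return true;
    }

    List<Entry> entries() { return entries; }

    private static String sha256(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            return HexFormat.of().formatHex(md.digest(s.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Anchoring the latest hash in an external system (e.g. a SIEM record) extends the guarantee beyond the host holding the log.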

Regular Security Assessments: Conduct penetration testing and vulnerability assessments. Test MCP servers for OWASP Top 10 vulnerabilities, injection attacks, and authorization bypasses.

Incident Response Plans: Develop and test incident response procedures for MCP security incidents. Include runbooks for common scenarios like credential compromise or data exfiltration.

Security Training: Train developers on secure MCP development practices. Review code for security issues before deployment. Implement secure coding standards.

7. Open Source Tools for Managing and Securing MCPs

The MCP ecosystem includes several open source projects addressing common operational challenges.

7.1 MCP Inspector

MCP Inspector is a debugging tool that provides visibility into MCP protocol interactions. It acts as a proxy between hosts and servers, logging all JSON-RPC messages, timing information, and error conditions. This is invaluable during development and troubleshooting production issues.

Key features include:

Protocol validation: Ensures messages conform to the MCP specification, catching serialization errors and malformed requests.

Interactive testing: Allows developers to manually craft MCP requests and observe server responses without building a full host application.

Traffic recording: Captures request/response pairs for later analysis or regression testing.

Repository: https://github.com/modelcontextprotocol/inspector

7.2 MCP Server Kotlin/Python/TypeScript SDKs

Anthropic provides official SDKs in multiple languages that handle protocol implementation details, allowing developers to focus on business logic rather than JSON-RPC serialization and transport management.

These SDKs provide:

Standardized server lifecycle management: Handle initialization, capability registration, and graceful shutdown.

Type safe request handling: Generate strongly typed interfaces for tool parameters and resource schemas.

Built in error handling: Convert application exceptions into properly formatted MCP error responses.

Transport abstraction: Support both stdio and HTTP/SSE transports with a unified programming model.

Repositories: https://github.com/modelcontextprotocol/typescript-sdk, https://github.com/modelcontextprotocol/python-sdk, and https://github.com/modelcontextprotocol/kotlin-sdk

7.3 MCP Proxy

MCP Proxy is an open source gateway implementation providing authentication, rate limiting, and protocol translation capabilities. It’s designed for production deployments requiring centralized control over MCP traffic.

Features include:

JWT-based authentication: Validates bearer tokens before forwarding requests to backend servers.

Redis-backed rate limiting: Enforces per-user or per-client request quotas using Redis for distributed rate limiting across multiple proxy instances.

Prometheus metrics: Exposes request rates, latencies, and error rates for monitoring integration.

Protocol transcoding: Allows stdio-based servers to be accessed via HTTP/SSE, enabling remote access to local development servers.

Repository: https://github.com/modelcontextprotocol/proxy

7.4 Claude MCP Benchmarking Suite

This testing framework provides standardized performance benchmarks for MCP servers, enabling organizations to compare implementations and identify performance regressions.

The suite includes:

Latency benchmarks: Measures request-response times under varying concurrency levels.

Throughput testing: Determines maximum sustainable request rates for different server configurations.

Resource utilization profiling: Tracks memory consumption, CPU usage, and file descriptor consumption during load tests.

Protocol overhead analysis: Quantifies serialization costs and transport overhead versus direct API calls.

Repository: https://github.com/anthropics/mcp-benchmarks

7.5 MCP Security Scanner

An open source security analysis tool that examines MCP server implementations for common vulnerabilities:

Injection attack detection: Tests servers for SQL injection, command injection, and path traversal vulnerabilities in tool parameters.

Authentication bypass testing: Attempts to access resources without proper credentials or with expired tokens.

Rate limit verification: Validates that servers properly enforce rate limits and prevent denial-of-service conditions.

Secret exposure scanning: Checks logs, error messages, and responses for accidentally exposed credentials or sensitive data.

Repository: https://github.com/mcp-security/scanner

7.6 Terraform Provider for MCP

Infrastructure-as-code tooling for managing MCP deployments:

Declarative server configuration: Define MCP servers, their capabilities, and access policies as Terraform resources.

Environment promotion: Use Terraform workspaces to manage dev, staging, and production MCP infrastructure consistently.

Drift detection: Identify manual changes to MCP infrastructure that deviate from the desired state.

Dependency management: Model relationships between MCP servers and their backing services (databases, APIs) ensuring correct deployment ordering.

Repository: https://github.com/terraform-providers/terraform-provider-mcp

8. Building an MCP Server in Java: A Practical Tutorial

Let’s build a functional MCP server in Java that exposes filesystem operations, demonstrating core MCP concepts through practical implementation.

8.1 Project Setup

Create a new Maven project with the following pom.xml:

<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 
         http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.example.mcp</groupId>
    <artifactId>filesystem-mcp-server</artifactId>
    <version>1.0.0</version>

    <properties>
        <maven.compiler.source>21</maven.compiler.source>
        <maven.compiler.target>21</maven.compiler.target>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>

    <dependencies>
        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-databind</artifactId>
            <version>2.16.1</version>
        </dependency>

        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-api</artifactId>
            <version>2.0.9</version>
        </dependency>

        <dependency>
            <groupId>ch.qos.logback</groupId>
            <artifactId>logback-classic</artifactId>
            <version>1.4.14</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>3.5.1</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <transformers>
                                <transformer implementation=
                                    "org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                    <mainClass>com.example.mcp.FilesystemMCPServer</mainClass>
                                </transformer>
                            </transformers>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>

8.2 Core Protocol Types

Define the fundamental MCP protocol types. Java allows only one public top-level type per file, so each of the following records belongs in its own file under com.example.mcp.protocol (the package declaration and imports are shown once for brevity):

package com.example.mcp.protocol;

import com.fasterxml.jackson.annotation.JsonInclude;
import com.fasterxml.jackson.annotation.JsonProperty;
import java.util.Map;

@JsonInclude(JsonInclude.Include.NON_NULL)
public record JSONRPCRequest(
    @JsonProperty("jsonrpc") String jsonrpc,
    @JsonProperty("id") Object id,
    @JsonProperty("method") String method,
    @JsonProperty("params") Map<String, Object> params
) {
    public JSONRPCRequest {
        if (jsonrpc == null) jsonrpc = "2.0";
    }
}

@JsonInclude(JsonInclude.Include.NON_NULL)
public record JSONRPCResponse(
    @JsonProperty("jsonrpc") String jsonrpc,
    @JsonProperty("id") Object id,
    @JsonProperty("result") Object result,
    @JsonProperty("error") JSONRPCError error
) {
    public JSONRPCResponse {
        if (jsonrpc == null) jsonrpc = "2.0";
    }

    public static JSONRPCResponse success(Object id, Object result) {
        return new JSONRPCResponse("2.0", id, result, null);
    }

    public static JSONRPCResponse error(Object id, int code, String message) {
        return new JSONRPCResponse("2.0", id, null, 
            new JSONRPCError(code, message, null));
    }
}

@JsonInclude(JsonInclude.Include.NON_NULL)
public record JSONRPCError(
    @JsonProperty("code") int code,
    @JsonProperty("message") String message,
    @JsonProperty("data") Object data
) {}

8.3 Server Implementation

Create the main server class handling stdio communication:

package com.example.mcp;

import com.example.mcp.protocol.*;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.*;
import java.nio.file.*;
import java.util.*;
import java.util.concurrent.*;

public class FilesystemMCPServer {
    private static final Logger logger = 
        LoggerFactory.getLogger(FilesystemMCPServer.class);
    private static final ObjectMapper objectMapper = new ObjectMapper();

    private final Path rootDirectory;

    public FilesystemMCPServer(Path rootDirectory) {
        this.rootDirectory = rootDirectory.toAbsolutePath().normalize();

        logger.info("Initialized filesystem MCP server with root: {}", 
            this.rootDirectory);
    }

    public static void main(String[] args) throws Exception {
        Path root = args.length > 0 ? 
            Paths.get(args[0]) : Paths.get(System.getProperty("user.home"));

        FilesystemMCPServer server = new FilesystemMCPServer(root);
        server.start();
    }

    public void start() throws Exception {
        BufferedReader reader = new BufferedReader(
            new InputStreamReader(System.in, java.nio.charset.StandardCharsets.UTF_8));
        BufferedWriter writer = new BufferedWriter(
            new OutputStreamWriter(System.out, java.nio.charset.StandardCharsets.UTF_8));

        logger.info("MCP server started, listening on stdin");

        String line;
        while ((line = reader.readLine()) != null) {
            try {
                JSONRPCRequest request = objectMapper.readValue(
                    line, JSONRPCRequest.class);

                logger.debug("Received request: method={}, id={}", 
                    request.method(), request.id());

                JSONRPCResponse response = handleRequest(request);

                String responseJson = objectMapper.writeValueAsString(response);
                writer.write(responseJson);
                writer.newLine();
                writer.flush();

                logger.debug("Sent response for id={}", request.id());

            } catch (Exception e) {
                logger.error("Error processing request", e);

                JSONRPCResponse errorResponse = JSONRPCResponse.error(
                    null, -32700, "Parse error: " + e.getMessage());

                writer.write(objectMapper.writeValueAsString(errorResponse));
                writer.newLine();
                writer.flush();
            }
        }
    }

    private JSONRPCResponse handleRequest(JSONRPCRequest request) {
        try {
            return switch (request.method()) {
                case "initialize" -> handleInitialize(request);
                case "tools/list" -> handleListTools(request);
                case "tools/call" -> handleCallTool(request);
                case "resources/list" -> handleListResources(request);
                case "resources/read" -> handleReadResource(request);
                default -> JSONRPCResponse.error(
                    request.id(), 
                    -32601, 
                    "Method not found: " + request.method()
                );
            };
        } catch (Exception e) {
            logger.error("Error handling request", e);
            return JSONRPCResponse.error(
                request.id(), 
                -32603, 
                "Internal error: " + e.getMessage()
            );
        }
    }

    private JSONRPCResponse handleInitialize(JSONRPCRequest request) {
        Map<String, Object> result = Map.of(
            "protocolVersion", "2024-11-05",
            "serverInfo", Map.of(
                "name", "filesystem-mcp-server",
                "version", "1.0.0"
            ),
            "capabilities", Map.of(
                "tools", Map.of(),
                "resources", Map.of()
            )
        );

        return JSONRPCResponse.success(request.id(), result);
    }

    private JSONRPCResponse handleListTools(JSONRPCRequest request) {
        List<Map<String, Object>> tools = List.of(
            Map.of(
                "name", "read_file",
                "description", "Read the contents of a file",
                "inputSchema", Map.of(
                    "type", "object",
                    "properties", Map.of(
                        "path", Map.of(
                            "type", "string",
                            "description", "Relative path to the file"
                        )
                    ),
                    "required", List.of("path")
                )
            ),
            Map.of(
                "name", "list_directory",
                "description", "List contents of a directory",
                "inputSchema", Map.of(
                    "type", "object",
                    "properties", Map.of(
                        "path", Map.of(
                            "type", "string",
                            "description", "Relative path to the directory"
                        )
                    ),
                    "required", List.of("path")
                )
            ),
            Map.of(
                "name", "search_files",
                "description", "Search for files by name pattern",
                "inputSchema", Map.of(
                    "type", "object",
                    "properties", Map.of(
                        "pattern", Map.of(
                            "type", "string",
                            "description", "Glob pattern to match filenames"
                        ),
                        "directory", Map.of(
                            "type", "string",
                            "description", "Directory to search in",
                            "default", "."
                        )
                    ),
                    "required", List.of("pattern")
                )
            )
        );

        return JSONRPCResponse.success(
            request.id(), 
            Map.of("tools", tools)
        );
    }

    private JSONRPCResponse handleCallTool(JSONRPCRequest request) {
        Map<String, Object> params = request.params();
        String toolName = (String) params.get("name");

        @SuppressWarnings("unchecked")
        Map<String, Object> arguments = 
            (Map<String, Object>) params.get("arguments");

        return switch (toolName) {
            case "read_file" -> executeReadFile(request.id(), arguments);
            case "list_directory" -> executeListDirectory(request.id(), arguments);
            case "search_files" -> executeSearchFiles(request.id(), arguments);
            default -> JSONRPCResponse.error(
                request.id(), 
                -32602, 
                "Unknown tool: " + toolName
            );
        };
    }

    private JSONRPCResponse executeReadFile(
            Object id, 
            Map<String, Object> args) {
        try {
            String relativePath = (String) args.get("path");
            Path fullPath = resolveSafePath(relativePath);

            String content = Files.readString(fullPath);

            Map<String, Object> result = Map.of(
                "content", List.of(
                    Map.of(
                        "type", "text",
                        "text", content
                    )
                )
            );

            return JSONRPCResponse.success(id, result);

        } catch (SecurityException e) {
            return JSONRPCResponse.error(id, -32602, 
                "Access denied: " + e.getMessage());
        } catch (IOException e) {
            return JSONRPCResponse.error(id, -32603, 
                "Failed to read file: " + e.getMessage());
        }
    }

    private JSONRPCResponse executeListDirectory(
            Object id, 
            Map<String, Object> args) {
        try {
            String relativePath = (String) args.get("path");
            Path fullPath = resolveSafePath(relativePath);

            if (!Files.isDirectory(fullPath)) {
                return JSONRPCResponse.error(id, -32602, 
                    "Not a directory: " + relativePath);
            }

            List<String> entries = new ArrayList<>();
            try (var stream = Files.list(fullPath)) {
                stream.forEach(path -> {
                    String name = path.getFileName().toString();
                    if (Files.isDirectory(path)) {
                        entries.add(name + "/");
                    } else {
                        entries.add(name);
                    }
                });
            }

            String listing = String.join("\n", entries);

            Map<String, Object> result = Map.of(
                "content", List.of(
                    Map.of(
                        "type", "text",
                        "text", listing
                    )
                )
            );

            return JSONRPCResponse.success(id, result);

        } catch (SecurityException e) {
            return JSONRPCResponse.error(id, -32602, 
                "Access denied: " + e.getMessage());
        } catch (IOException e) {
            return JSONRPCResponse.error(id, -32603, 
                "Failed to list directory: " + e.getMessage());
        }
    }

    private JSONRPCResponse executeSearchFiles(
            Object id, 
            Map<String, Object> args) {
        try {
            String pattern = (String) args.get("pattern");
            String directory = (String) args.getOrDefault("directory", ".");

            Path searchPath = resolveSafePath(directory);
            PathMatcher matcher = FileSystems.getDefault()
                .getPathMatcher("glob:" + pattern);

            List<String> matches = new ArrayList<>();

            Files.walkFileTree(searchPath, new SimpleFileVisitor<Path>() {
                @Override
                public FileVisitResult visitFile(
                        Path file, 
                        java.nio.file.attribute.BasicFileAttributes attrs) {
                    if (matcher.matches(file.getFileName())) {
                        matches.add(searchPath.relativize(file).toString());
                    }
                    return FileVisitResult.CONTINUE;
                }
            });

            String results = matches.isEmpty() ? 
                "No files found matching pattern: " + pattern :
                String.join("\n", matches);

            Map<String, Object> result = Map.of(
                "content", List.of(
                    Map.of(
                        "type", "text",
                        "text", results
                    )
                )
            );

            return JSONRPCResponse.success(id, result);

        } catch (SecurityException e) {
            return JSONRPCResponse.error(id, -32602, 
                "Access denied: " + e.getMessage());
        } catch (IOException e) {
            return JSONRPCResponse.error(id, -32603, 
                "Failed to search files: " + e.getMessage());
        }
    }

    private JSONRPCResponse handleListResources(JSONRPCRequest request) {
        List<Map<String, Object>> resources = List.of(
            Map.of(
                "uri", "file://workspace",
                "name", "Workspace Files",
                "description", "Access to workspace filesystem",
                "mimeType", "text/plain"
            )
        );

        return JSONRPCResponse.success(
            request.id(), 
            Map.of("resources", resources)
        );
    }

    private JSONRPCResponse handleReadResource(JSONRPCRequest request) {
        Map<String, Object> params = request.params();
        String uri = (String) params.get("uri");

        if (!uri.startsWith("file://")) {
            return JSONRPCResponse.error(
                request.id(), 
                -32602, 
                "Unsupported URI scheme"
            );
        }

        String path = uri.substring("file://".length());
        Map<String, Object> args = Map.of("path", path);

        return executeReadFile(request.id(), args);
    }

    private Path resolveSafePath(String relativePath) throws SecurityException {
        Path resolved = rootDirectory.resolve(relativePath)
            .toAbsolutePath()
            .normalize();

        if (!resolved.startsWith(rootDirectory)) {
            throw new SecurityException(
                "Path escape attempt detected: " + relativePath);
        }

        return resolved;
    }
}
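The resolveSafePath check is the server's single most important security control: it resolves a user-supplied path against the root, normalizes away any ".." segments, and rejects anything that lands outside the root. The sketch below isolates that logic so it can be exercised on its own (the class and method names are illustrative, not part of the server above):

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Standalone illustration of the path-escape check used by resolveSafePath.
public class PathGuardDemo {
    public static boolean isInside(Path root, String relativePath) {
        Path normalizedRoot = root.toAbsolutePath().normalize();
        // Resolve the relative path, then normalize to collapse ".." segments.
        Path resolved = normalizedRoot.resolve(relativePath)
            .toAbsolutePath()
            .normalize();
        // After normalization, a traversal attempt no longer starts with the root.
        return resolved.startsWith(normalizedRoot);
    }

    public static void main(String[] args) {
        Path root = Paths.get("/srv/workspace");
        System.out.println(isInside(root, "notes/todo.txt"));   // stays inside the root
        System.out.println(isInside(root, "../../etc/passwd")); // escapes the root
    }
}
```

Note that normalization must happen before the startsWith comparison; checking the raw resolved path would let "../" sequences slip through.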

8.4 Testing the Server

Create a simple test script to interact with your server:

#!/bin/bash
# Pipe a sequence of JSON-RPC requests into a single server instance.
# The server reads one request per line from stdin and writes one
# response per line to stdout, then exits when stdin closes.

JAR=target/filesystem-mcp-server-1.0.0.jar
WORKSPACE=/path/to/test/directory

{
    # Initialize
    echo '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{}}'

    # List tools
    echo '{"jsonrpc":"2.0","id":2,"method":"tools/list","params":{}}'

    # Read a file
    echo '{"jsonrpc":"2.0","id":3,"method":"tools/call","params":{"name":"read_file","arguments":{"path":"test.txt"}}}'
} | java -jar "$JAR" "$WORKSPACE"

8.5 Building and Running

Compile and package the server:

mvn clean package

Run the server:

java -jar target/filesystem-mcp-server-1.0.0.jar /path/to/workspace

The server will listen on stdin for JSON-RPC requests and write responses to stdout. You can test it interactively by piping JSON requests:

echo '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{}}' | 
    java -jar target/filesystem-mcp-server-1.0.0.jar ~/workspace

8.6 Integrating with Claude Desktop

To use this server with Claude Desktop, add it to your configuration file:

On macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
On Windows: %APPDATA%\Claude\claude_desktop_config.json

{
  "mcpServers": {
    "filesystem": {
      "command": "java",
      "args": [
        "-jar",
        "/absolute/path/to/filesystem-mcp-server-1.0.0.jar",
        "/path/to/workspace"
      ]
    }
  }
}

After restarting Claude Desktop, the filesystem tools will be available for the AI assistant to use when helping with file-related tasks.

8.7 Extending the Server

This basic implementation can be extended with additional capabilities:

Write operations: Add tools for creating, updating, and deleting files. Implement careful permission checks and audit logging for destructive operations.

File watching: Implement resource subscriptions that notify the host when files change, enabling reactive workflows.

Advanced search: Add full-text search capabilities using Apache Lucene or similar indexing technologies.

Git integration: Expose Git operations as tools, enabling the AI to understand repository history and make commits.

Permission management: Implement fine-grained access controls based on user identity or role.
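As a sketch of the first of these extensions, the core of a write_file tool could reuse the same path-safety pattern as the read path (the class, method, and response string below are illustrative assumptions, not part of the server above):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch of a hypothetical write_file tool body: the same
// resolve/normalize/startsWith guard as resolveSafePath, then the write.
public class WriteFileDemo {
    public static String writeFile(Path root, String relativePath, String text)
            throws IOException {
        Path normalizedRoot = root.toAbsolutePath().normalize();
        Path resolved = normalizedRoot.resolve(relativePath)
            .toAbsolutePath()
            .normalize();
        if (!resolved.startsWith(normalizedRoot)) {
            throw new SecurityException(
                "Path escape attempt detected: " + relativePath);
        }
        Files.createDirectories(resolved.getParent()); // ensure parent directories exist
        Files.writeString(resolved, text);             // create or overwrite the file
        return "Wrote " + text.length() + " characters to " + relativePath;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("mcp-write-demo");
        System.out.println(writeFile(root, "notes/hello.txt", "hello"));
    }
}
```

In the real server this logic would sit behind a tools/call handler with its own inputSchema entry, and destructive operations should additionally be gated by the permission checks and audit logging mentioned above.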

9. Conclusion

Model Context Protocol represents a significant step toward standardizing how AI applications interact with external systems. For organizations building LLM-powered products, MCP reduces integration complexity, improves security posture, and enables more maintainable architectures.

However, MCP is not a universal replacement for APIs. Traditional REST or gRPC interfaces remain superior for high-performance machine-to-machine communication, established industry protocols, and applications without AI components.

Operating MCP infrastructure at scale requires thoughtful approaches to server management, observability, security, and version control. The operational challenges around process management, state coordination, and distributed debugging require careful consideration during architectural planning.

Security concerns in MCP deployments demand comprehensive strategies addressing authentication, authorization, input validation, data protection, resource management, and compliance. Organizations must implement defense-in-depth approaches recognizing that MCP servers become critical security boundaries when connecting LLMs to enterprise systems.

The growing ecosystem of open source tooling for MCP management and security demonstrates community recognition of these challenges and provides practical solutions for enterprise deployments. As the protocol matures and adoption increases, we can expect continued evolution of both the specification and the supporting infrastructure.

For development teams considering MCP adoption, start with a single high-value integration to understand operational characteristics before expanding to organization-wide deployments. Invest in observability infrastructure early, establish clear governance policies for server development and deployment, and build reusable patterns that can be shared across teams.

The Java tutorial provided demonstrates that implementing MCP servers is straightforward, requiring only JSON-RPC handling and domain-specific logic. This simplicity enables rapid development of custom integrations tailored to your organization’s unique requirements.

As AI capabilities continue advancing, standardized protocols like MCP will become increasingly critical infrastructure, similar to how HTTP became foundational to web applications. Organizations investing in MCP expertise and infrastructure today position themselves well for the AI-powered applications of tomorrow.