
The Year Kafka Grew Up: What version 4.x Actually Means for Platform Teams

There is a version of the Apache Kafka story that gets told as a series of press releases. ZooKeeper removed. KRaft promoted. Share groups landed. Iceberg everywhere. Each headline lands cleanly, and then platform teams go back to their actual clusters and wonder what any of it means for them.

This post is the other version. It is what happened in the Kafka ecosystem over the past twelve months, why it matters, and what you should be paying attention to heading deeper into 2026.

1. The End of ZooKeeper: What Actually Changed

1.1 The headline

Apache Kafka 4.0 dropped on March 18, 2025, and it cut ties with ZooKeeper, a dependency that had been part of its architecture for over a decade. With Kafka 4.0, ZooKeeper support was completely removed. This was not a soft deprecation or a future-dated removal notice. It was final.

1.2 Why this took so long

If you have been watching Kafka long enough to remember when KIP-500 was first proposed in 2019, you will appreciate that the journey from “we want to remove ZooKeeper” to “ZooKeeper is gone” took more than five years. That is not because the Kafka team was slow. It is because replacing the consensus and metadata layer of a distributed system that runs production workloads for thousands of organisations is genuinely hard. You do not get to break things.

KIP-500 was introduced as an early access feature in Kafka 2.8.0, released in 2021. Over the following releases, KRaft matured, gained production readiness, and introduced migration features that made it suitable for real-world use. Kafka 3.9 was the designated bridge release, and KRaft mode became the sole implementation for cluster management in Kafka 4.0.

1.3 What KRaft actually gives you

ZooKeeper was an external dependency that required its own quorum, its own monitoring, its own security configuration, and its own operational runbooks. Every engineer who has debugged a Kafka cluster that was technically healthy but fighting with an inconsistent ZooKeeper state will remember the experience fondly.

With KRaft, Kafka now manages its metadata internally in a dedicated topic called __cluster_metadata, replicated across a quorum of controller nodes. No more juggling ZooKeeper’s quirks, its distinct configuration syntax, or its resource demands.

The scalability improvement is less talked about but arguably more significant. KRaft can handle far more partitions, think millions rather than hundreds of thousands. Metadata changes such as topic creation are now cheap appends to the metadata log, and controller failover no longer requires reloading the full cluster state the way the ZooKeeper-era controller did.

For teams running large multi-tenant Kafka clusters, that partition ceiling increase is not a nice-to-have. It is the difference between running one big cluster and running a sprawl of smaller ones purely to stay within ZooKeeper’s limits.

1.4 The upgrade reality

Clusters in ZooKeeper mode must be migrated to KRaft mode before they can be upgraded to 4.0.x. For clusters in KRaft mode with versions older than 3.3.x, upgrading to 3.9.x first is recommended before upgrading to 4.0.x. If your organisation is still sitting on 3.x with ZooKeeper, that migration path is well-documented and production-proven, but it does require sequenced effort. This is not a version bump. It is an architectural transition, and it deserves to be planned accordingly.

2. KRaft at Maturity

2.1 From “good enough” to genuinely better

There is a natural scepticism that greets replacement architectures. The old thing had years of production hardening. The new thing is theoretically cleaner. Platform engineers rightly ask whether cleaner translates to more reliable when your cluster is handling peak load at 3am.

KRaft has passed that test: it simplifies deployment while measurably improving scalability and reliability. The controller election improvements introduced in 4.0, specifically KIP-996 adding pre-votes and KIP-966 improving data consistency during leader elections, are not theoretical. They directly reduce the blast radius of controller failovers that in ZooKeeper-based clusters could cascade into extended unavailability.

2.2 Dynamic quorums change operations

Support for dynamic KRaft quorums makes adding or removing controller nodes without downtime a much simpler process. This is a significant operational improvement. In static quorum configurations, changing controller membership was a procedure that required careful sequencing and carried real risk. Dynamic quorums bring controller maintenance closer to the routine operations teams already handle for brokers.
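To make the difference concrete, here is a sketch of the two controller configuration styles, built as plain Properties objects. The keys are from the Kafka broker configuration reference; the host names and node IDs are placeholders, and a real deployment needs listener definitions and storage formatting beyond what is shown here.

```java
import java.util.Properties;

// Sketch contrasting static and dynamic KRaft quorum configuration.
// Host names are illustrative placeholders.
public class QuorumConfig {
    public static Properties staticQuorum() {
        Properties p = new Properties();
        p.setProperty("process.roles", "controller");
        p.setProperty("node.id", "1");
        // Static quorum: the full membership is baked into every node's
        // config, so changing it means editing files and rolling nodes.
        p.setProperty("controller.quorum.voters",
                "1@ctrl-1:9093,2@ctrl-2:9093,3@ctrl-3:9093");
        p.setProperty("controller.listener.names", "CONTROLLER");
        return p;
    }

    public static Properties dynamicQuorum() {
        Properties p = new Properties();
        p.setProperty("process.roles", "controller");
        p.setProperty("node.id", "1");
        // Dynamic quorum (KIP-853): nodes only need bootstrap endpoints.
        // Membership changes happen at runtime via the
        // kafka-metadata-quorum tool rather than config edits.
        p.setProperty("controller.quorum.bootstrap.servers",
                "ctrl-1:9093,ctrl-2:9093,ctrl-3:9093");
        p.setProperty("controller.listener.names", "CONTROLLER");
        return p;
    }

    public static void main(String[] args) {
        System.out.println(staticQuorum().getProperty("controller.quorum.voters"));
        System.out.println(dynamicQuorum().getProperty("controller.quorum.bootstrap.servers"));
    }
}
```

The operational shift is visible in the config itself: the dynamic form never enumerates the full voter set, which is exactly what makes adding or removing a controller a runtime operation.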

2.3 Simplified configuration management

The default properties files for KRaft mode are no longer stored in a separate config/kraft directory since ZooKeeper has been removed. These files have been consolidated with other configuration files. Small detail, but it matters. The previous split directory structure was a constant reminder that Kafka and ZooKeeper were two separate systems loosely coupled. Consolidation signals that KRaft is not a mode. It is Kafka.

3. Share Groups and the Queue Semantics Debate

3.1 What KIP-932 actually introduces

KIP-932 introduces early access to Queues for Kafka. It adds the concept of share groups, which enable cooperative consumption on regular Kafka topics and thereby give Kafka traditional queue semantics.

The specific capabilities are worth enumerating precisely because the marketing summary elides the nuance. Queues allow multiple consumers to cooperatively read records from the same partition. Records are still consumed by a single consumer in the share group and can be acknowledged individually. That last point is critical. Individual message acknowledgment means failed messages can be redelivered without blocking the partition. This is the behaviour that teams currently implement by hand through dead letter queues, retry topics, and custom offset management.
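The delivery semantics described above can be modelled in a few lines of plain Java. This is a toy model of share-group behaviour, not the Kafka client API: records from a single partition go to whichever worker asks next, each is acknowledged individually, and a failed record is requeued for redelivery instead of blocking everything behind it.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model of share-group delivery semantics (not the real client API).
public class ShareGroupSketch {
    private final Deque<String> pending = new ArrayDeque<>();
    private final List<String> done = new ArrayList<>();

    public ShareGroupSketch(List<String> records) { pending.addAll(records); }

    // Any idle consumer polls the next available record.
    public String poll() { return pending.poll(); }

    // Accept: the record is finished for the whole group.
    public void accept(String record) { done.add(record); }

    // Release: processing failed; requeue for redelivery to any consumer.
    public void release(String record) { pending.addLast(record); }

    public List<String> completed() { return done; }
    public boolean hasPending() { return !pending.isEmpty(); }

    public static void main(String[] args) {
        ShareGroupSketch group = new ShareGroupSketch(List.of("r1", "r2", "r3"));
        String first = group.poll();
        group.release(first);       // r1 fails and is requeued
        group.accept(group.poll()); // r2 succeeds immediately
        group.accept(group.poll()); // r3 succeeds
        group.accept(group.poll()); // r1 is redelivered and succeeds
        System.out.println(group.completed()); // r1's failure never blocked r2 or r3
    }
}
```

Contrast this with classic consumer groups, where a partition is pinned to one consumer and a stuck message stalls every offset behind it.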

3.2 The maturity timeline

Queues for Kafka was initially released in early access in 4.0. Since then, steady improvements have been made, and it reached preview state in 4.1. The plan is to mark the feature production ready in 4.2. That trajectory suggests general availability in 2026, most likely in the first half.

Kafka 4.1, released September 2, 2025, brought Queues for Kafka into preview state through KIP-932, along with a new Streams Rebalance Protocol in early access through KIP-1071.

3.3 Why this matters strategically

Teams who chose Kafka over RabbitMQ or SQS typically did so for its durability guarantees, its replay capability, and its throughput characteristics. What they lost was the simpler consumer model where any idle worker picks up the next available message without complex partition assignment.

Share groups close that gap without abandoning Kafka’s fundamental model. A single platform team running Kafka can now support event streaming workloads, stream processing workloads, and queue-style task distribution workloads on the same infrastructure. The operational simplification alone justifies close attention.

3.4 KIP-848: The rebalance protocol that makes it possible

KIP-848, the next generation consumer group protocol, is the foundation that share groups build on. KIP-848 addressed full-group stop-the-world issues due to membership changes, ephemeral member ID assignments tied to heartbeat statuses, and brittle client-side assignments. It resolves these by allowing incremental assignments, removing ephemeral memberships by introducing durable member IDs, and eliminating fragile client-side assignment by moving coordination to the broker, making consumer groups faster and more reliable.

The Next Generation of the Consumer Rebalance Protocol is now Generally Available in Apache Kafka 4.0. The protocol is automatically enabled on the server when the upgrade to 4.0 is finalised. Clients opt in by setting group.protocol=consumer.
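The client-side opt-in is a single configuration line. A minimal sketch, built as a Properties object with placeholder broker and group names; everything except group.protocol is ordinary consumer boilerplate.

```java
import java.util.Properties;

// Minimal consumer configuration opting into the KIP-848 protocol.
// Broker address and group.id are placeholders.
public class Kip848OptIn {
    public static Properties consumerConfig() {
        Properties p = new Properties();
        p.setProperty("bootstrap.servers", "broker-1:9092");
        p.setProperty("group.id", "orders-service");
        // "classic" is the pre-4.0 default; "consumer" enables the new
        // broker-coordinated, incremental rebalance protocol.
        p.setProperty("group.protocol", "consumer");
        return p;
    }

    public static void main(String[] args) {
        System.out.println(consumerConfig().getProperty("group.protocol"));
    }
}
```

Because assignment moves to the broker under the new protocol, client-side assignor settings stop being the tuning knob they were under the classic protocol.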

For large-scale deployments, the elimination of stop-the-world rebalances is not a marginal improvement. It is the difference between consumer groups that are resilient to membership changes at scale and consumer groups that become a reliability liability as they grow.

4. Diskless and Cloud Native

4.1 The structural problem KIP-500 did not solve

ZooKeeper removal cleaned up the metadata plane. It did not touch the most expensive part of running Kafka in the cloud: the data plane. Traditional Kafka brokers are stateful. They maintain local disk storage for their partition logs. They replicate across availability zones, paying cross-AZ data transfer costs on every write. They over-provision compute capacity because resizing means moving data, which is slow and operationally risky.

Those costs accumulate at scale. The industry response has been both a series of competing Kafka Improvement Proposals and a generation of Kafka-compatible startups built on fundamentally different storage architectures.

4.2 Tiered storage as the first step

Tiered storage, introduced as a production feature in Kafka 3.6, allows Kafka to offload older log segments to object storage while keeping recent data on local broker disks. This reduces storage costs without changing the core write path. It is the pragmatic middle ground and it is now widely deployed.
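Enabling tiered storage touches two layers of configuration. The sketch below shows the shape of both, using keys from the Kafka tiered storage documentation; the retention values are illustrative, and a real setup also needs a RemoteStorageManager plugin for your object store, which is elided here.

```java
import java.util.Properties;

// Sketch of broker-level and topic-level tiered storage configuration.
// Retention values are illustrative assumptions.
public class TieredStorageConfig {
    public static Properties brokerSide() {
        Properties p = new Properties();
        // Turns on the tiered storage subsystem for the whole broker.
        p.setProperty("remote.log.storage.system.enable", "true");
        return p;
    }

    public static Properties topicSide() {
        Properties p = new Properties();
        // Opt this topic into remote storage.
        p.setProperty("remote.storage.enable", "true");
        // Keep roughly the last hour on local disk; older segments are
        // served from object storage up to the overall retention window.
        p.setProperty("local.retention.ms", "3600000");
        p.setProperty("retention.ms", "604800000"); // 7 days total
        return p;
    }

    public static void main(String[] args) {
        System.out.println(brokerSide());
        System.out.println(topicSide());
    }
}
```

The gap between local.retention.ms and retention.ms is where the cost saving lives: only the short local window occupies broker disk.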

The limitation is that tiered storage only addresses cold data. The active write path still carries the full cost of inter-AZ replication, and brokers remain stateful for the segments they hold locally.

4.3 The KIP-1150 diskless proposal

KIP-1150, known as Diskless Topics, is a major proposal to re-architect how Kafka handles data in the cloud. It proposes allowing topics to store their data directly in object storage instead of on broker-local disks.

The proposal introduces a leaderless architecture where any broker can write data to a shared object store, bypassing the traditional replication process and its associated costs. To optimise writes to the remote storage system, brokers can group records from different topic-partitions into an object called a shared log segment object. A new coordination layer is then used to retrieve specific records from these objects.

Eliminating inter-AZ replication traffic is the core economic argument. For clusters with meaningful throughput spread across multiple availability zones, that traffic is often the dominant cost line.
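A back-of-envelope calculation makes the economic argument tangible. The traffic model and the $0.01/GB transfer price below are illustrative assumptions, not quoted cloud pricing, and real bills depend on AZ layout and whether transfer is billed in both directions.

```java
// Back-of-envelope model of inter-AZ replication cost. Price and
// traffic assumptions are illustrative, not actual cloud pricing.
public class CrossAzCost {
    public static double monthlyReplicationCostUsd(double writeMBps,
                                                   int replicationFactor,
                                                   double pricePerGbUsd) {
        // Each write is copied to (RF - 1) followers; in a three-AZ layout
        // with RF=3 those copies typically both cross an AZ boundary.
        double crossAzMBps = writeMBps * (replicationFactor - 1);
        double gbPerMonth = crossAzMBps * 60 * 60 * 24 * 30 / 1024.0;
        return gbPerMonth * pricePerGbUsd;
    }

    public static void main(String[] args) {
        // 100 MB/s sustained writes, RF=3, an assumed $0.01/GB:
        System.out.printf("%.0f USD/month%n",
                monthlyReplicationCostUsd(100, 3, 0.01));
    }
}
```

Even under these conservative assumptions the replication traffic alone runs to thousands of dollars a month, which is the line item diskless proposals aim to delete.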

4.4 The contested path to standardisation

The Kafka community finds itself at a fork in the road with three KIPs simultaneously addressing the same challenge of high replication costs when running Kafka across multiple cloud availability zones: KIP-1150, KIP-1176, and KIP-1183.

The good news from late 2025 is that community consolidation has begun. Slack announced its intention to withdraw KIP-1176 and contribute to KIP-1150 instead, reducing fragmentation risk. Whether the community converges on a single approach in time for a Kafka 5.x release or whether this extends further is genuinely uncertain. What is certain is that the direction is set. Kafka brokers will eventually stop owning the storage layer.

4.5 The commercial implementations that already exist

While the community debates the right open source path, several production implementations already run on fully disaggregated storage architectures. WarpStream, AutoMQ, and others offer Kafka-compatible services built entirely on object storage.

AutoMQ also builds on S3 but adopts a different architecture. By decoupling storage from compute, it writes through a small EBS-backed log into S3, maintaining full Kafka compatibility without compromising on latency. Confluent has implemented storage-compute separation within its serverless Confluent Cloud, with some cases showing up to 90% cost reduction compared to traditional clusters.

For organisations making infrastructure decisions now, the practical question is not whether diskless Kafka will exist in open source. It is whether the cost savings justify moving to a commercial implementation ahead of upstream standardisation, accepting either vendor lock-in or protocol compatibility risk as the trade.

5. Apache Iceberg

5.1 Why Iceberg keeps appearing in Kafka conversations

Apache Iceberg is a table format for large analytic datasets, not a streaming system. But it has become inescapable in Kafka ecosystem discussions because it solves a problem that every organisation with a Kafka cluster eventually faces: how do you make streaming data queryable without building and maintaining a custom ETL pipeline?

In 2025, Confluent Cloud Tableflow went generally available for Iceberg, WarpStream released their own Tableflow equivalent, and Aiven released open-source Iceberg Topics. The pattern is consistent across vendors: surface Kafka topics as Iceberg tables without requiring users to build the conversion infrastructure themselves.

5.2 The Confluent Tableflow approach

Tableflow, which became generally available in 2025, converts Kafka topics to Iceberg tables automatically. Data engineers can query Kafka topics using standard Iceberg-compatible engines such as DuckDB, Apache Spark, Trino, and Snowflake, without the topic data ever having to leave the streaming layer. The streaming system and the analytical system share the same underlying data.

5.3 The cost and complexity of conversion

Generating Parquet files, the underlying format for Iceberg tables, is computationally expensive. Compared to copying a log segment from local disk to object storage, it uses at least an order of magnitude more CPU cycles and significant amounts of memory. That would be fine if the operation ran on a disposable stateless compute node. Instead, in broker-side implementations it runs on the brokers themselves, the same nodes acting as partition leaders for your cluster.

That trade-off is real and worth being honest about. Direct Iceberg conversion on the broker adds compute load to exactly the components that should be focused on reliable message delivery. Organisations evaluating Iceberg-native Kafka features should test the conversion overhead against their actual topic throughput, not assume the feature is operationally free.

5.4 What Iceberg adoption actually means for architecture

The broader implication is architectural. Historically, organisations maintained a streaming layer and a data warehouse or lake layer as separate systems, connected by batch ETL jobs or streaming connectors. Iceberg, combined with the current generation of Kafka implementations, is collapsing that boundary. The streaming layer becomes the table layer. Downstream consumers, whether analysts running SQL or ML pipelines reading Parquet, access the same data without a coordination step in between.

For banking and financial services specifically, where regulatory requirements demand audit trails and the ability to replay historical data, Iceberg topics offer a compelling combination: low latency streaming semantics for operational systems and high-throughput analytical query access for compliance and reporting, from the same dataset.

6. Kafka 4.x: Release Cadence and What Is Coming

6.1 The 2025 release picture

Kafka 4.0 shipped in March 2025, removing ZooKeeper and delivering KIP-848 as GA. Kafka 4.1 shipped in September 2025, promoting KIP-932 share groups to preview and introducing the new Streams Rebalance Protocol in early access. Kafka 4.2 was in development by late 2025.

The plan is to mark Queues for Kafka production ready in 4.2. That makes Kafka 4.2 the release platform teams running task distribution workloads should be watching most closely.

6.2 The Java requirement shift

Kafka 4.0 dropped support for Java 8. Clients and Streams now require Java 11, while brokers, tools, and Connect now require Java 17. This is not theoretical compatibility noise. Organisations running Kafka brokers on Java 11 need to update their deployment configurations before upgrading to 4.0 and above. The brokers will not start on older JVM versions. This is a forcing function for JVM standardisation that some infrastructure teams will experience as unwelcome but is ultimately the right move for long-term supportability.
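A preflight check in deployment tooling catches this before a broker fails to start. A minimal sketch of the version gate, using only the stdlib; the floor values are the requirements stated above.

```java
// Preflight sketch for the Kafka 4.0 JVM floor: brokers, tools, and
// Connect need Java 17; clients and Streams need Java 11.
public class JvmFloorCheck {
    public static boolean meetsFloor(int featureVersion, boolean isBrokerSide) {
        int floor = isBrokerSide ? 17 : 11;
        return featureVersion >= floor;
    }

    public static void main(String[] args) {
        // Runtime.version().feature() returns the major Java version.
        int running = Runtime.version().feature();
        System.out.println("JVM " + running
                + " broker-ok=" + meetsFloor(running, true)
                + " client-ok=" + meetsFloor(running, false));
    }
}
```

Running this as part of a rollout pipeline turns a broker-won't-start incident into a pre-deployment failure.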

6.3 API compatibility boundary

Kafka 4.0 only supports KRaft mode, and old protocol API versions have been removed. Users should ensure brokers are version 2.1 or higher before upgrading Java clients to 4.0. Similarly, users should ensure their Java client version is 2.1 or higher before upgrading brokers to 4.0.

The backward compatibility window has been formally shortened. Kafka 2.1 is now the baseline. Clients older than that will not connect to Kafka 4.x brokers. For organisations with heterogeneous client deployments, including legacy applications with embedded Kafka clients, an audit of client library versions is a prerequisite for any 4.x upgrade planning.

7. The Operator Perspective: What to Prioritise in 2026

The Kafka ecosystem in 2026 is richer and more complex than it has ever been. That is good for organisations that can absorb and apply the changes. It is a liability for teams that try to follow everything simultaneously.

The practical prioritisation for most platform teams is:

Immediate: ZooKeeper migration if not already complete. Kafka 3.9 remains supported but is the last release to support ZooKeeper. Running ZooKeeper-based clusters means running against an architecture the community has formally closed. The migration tooling is mature. The risk of delay is accumulating technical debt against a hard deadline.

Near term: KIP-848 client adoption. The new consumer group protocol is enabled on brokers in Kafka 4.0 but clients must opt in. Consumer teams that update their configuration to use group.protocol=consumer will gain the stability benefits of incremental rebalances. The cost of not doing so is continuing to take stop-the-world rebalances that the protocol was specifically designed to eliminate.

Medium term: Evaluate share groups in 4.2. The preview status in 4.1 is the right time to prototype workloads that currently use separate queuing infrastructure. When 4.2 brings GA status, organisations that have already tested share groups against their use cases will be positioned to consolidate faster.

Strategic: Watch KIP-1150 and the diskless architecture consolidation. This is a decision point that will not require immediate action but will have significant infrastructure cost implications over a two to three year horizon. Organisations making cloud infrastructure investments now should ensure their Kafka deployment architecture does not foreclose the options that diskless brokers will enable.

Ongoing: Evaluate Iceberg integration against actual query patterns. Iceberg topics are compelling, but the compute overhead of conversion is real. Pilot the feature against production topic throughput before committing to it as a platform-wide pattern.

Closing Observation

Kafka’s 2025 story is not primarily about any single feature. It is about a platform that has been methodically resolving the architectural compromises it made under the constraint of early-stage distributed systems thinking, while simultaneously facing a generation of cloud-native competitors built without those constraints.

The ZooKeeper removal closes a chapter that should have closed sooner but could not close safely until it did. KRaft’s maturity delivers the architectural simplicity that always made conceptual sense. Share groups extend Kafka’s relevance to workloads that previously required a second queuing system. And the diskless architecture debate, contested as it is, points toward a future where the cost of operating Kafka at scale declines materially.

For engineering leaders, the consistent signal is that investment in Kafka expertise remains well placed. The platform is maturing in the right directions. The question is execution: how quickly can your organisation absorb the changes that are already available and position itself for the ones that are still landing.

Andrew Baker is Chief Information Officer at Capitec Bank. The views expressed here are personal and do not represent Capitec Bank or its technology strategy.

The Death of the Enterprise Service Bus: Why Kafka and Microservices Are Winning

1. Introduction

The Enterprise Service Bus (ESB) once promised to be the silver bullet for enterprise integration. Organizations invested millions in platforms like MuleSoft, IBM Integration Bus, Oracle Service Bus, and TIBCO BusinessWorks, believing they would solve all their integration challenges. Today, these same organizations are discovering that their ESB has become their biggest architectural liability.

The rise of Apache Kafka, Spring Boot, and microservices architecture represents more than just a technological shift. It represents a fundamental rethinking of how we build scalable, resilient systems. This article examines why ESBs are dying, how they actively harm businesses, and why the combination of Java, Spring, and Kafka provides a superior alternative.

2. The False Promise of the ESB

Enterprise Service Buses emerged in the early 2000s as a solution to point-to-point integration chaos. The pitch was compelling: a single, centralized platform that would mediate all communication between systems, apply transformations, enforce governance, and provide a unified integration layer.

The reality turned out very differently. What organizations got instead was a monolithic bottleneck that became increasingly difficult to change, scale, or maintain. The ESB became the very problem it was meant to solve.

3. How ESBs Kill Business Velocity

3.1. The Release Coordination Nightmare

Every change to an ESB requires coordination across multiple teams. Want to update an endpoint? You need to test every flow that might be affected. Need to add a new integration? You risk breaking existing integrations. The ESB becomes a coordination bottleneck where release cycles stretch from days to weeks or even months.

In a Kafka and microservices architecture, services are independently deployable. Teams can release changes to their own services without coordinating with dozens of other teams. A payment service can be updated without touching the order service, the inventory service, or any other component. This independence translates directly to business velocity.

3.2. The Scaling Ceiling

ESBs scale vertically, not horizontally. When you hit performance limits, you buy bigger hardware or cluster nodes, which introduces complexity and cost. More critically, you hit hard limits. There is only so much you can scale a monolithic integration platform.

Kafka was designed for horizontal scaling from day one. Need more throughput? Add more brokers. Need to handle more consumers? Add more consumer instances. A single Kafka cluster can handle millions of messages per second across hundreds of nodes. This is not theoretical scaling. This is proven at companies like LinkedIn, Netflix, and Uber handling trillions of events daily.
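The reason adding consumer instances adds throughput is that partitions are the unit of parallelism, divided among the instances of a group. The sketch below is a simplified round-robin illustration, not Kafka's actual assignor implementations.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified round-robin partition assignment, illustrating why adding
// consumer instances scales out capacity. Not Kafka's real assignors.
public class AssignmentSketch {
    public static Map<String, List<Integer>> assign(List<String> consumers,
                                                    int partitions) {
        Map<String, List<Integer>> out = new HashMap<>();
        consumers.forEach(c -> out.put(c, new ArrayList<>()));
        for (int p = 0; p < partitions; p++) {
            out.get(consumers.get(p % consumers.size())).add(p);
        }
        return out;
    }

    public static void main(String[] args) {
        // 12 partitions over 3 instances: 4 each. Add a 4th instance and
        // the same topic spreads to 3 each, so capacity scales out.
        System.out.println(assign(List.of("c1", "c2", "c3"), 12));
        System.out.println(assign(List.of("c1", "c2", "c3", "c4"), 12));
    }
}
```

The corollary is that the partition count caps consumer parallelism: a 12-partition topic gains nothing from a 13th consumer instance.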

3.3. The Single Point of Failure Problem

An ESB is a single critical service that everything depends on. When it goes down, your entire business grinds to a halt. Payments stop processing. Orders cannot be placed. Customer requests fail. The blast radius of an ESB failure is catastrophic.

With Kafka and microservices, failure is isolated. If one microservice fails, it affects only that service’s functionality. Kafka itself is distributed and fault tolerant. With proper replication settings, you can lose entire brokers without losing data or availability. The architecture is resilient by design, not by hoping your single ESB cluster stays up.

4. The Technical Debt Trap

4.1. Upgrade Hell

ESB upgrades are terrifying events. You are upgrading a platform that mediates potentially hundreds of integrations. Testing requires validating every single flow. Rollback is complicated or impossible. Organizations commonly run ESB versions that are years out of date because the risk and effort of upgrading is too high.

Spring Boot applications follow standard semantic versioning and upgrade paths. Kafka upgrades are rolling upgrades with backward compatibility guarantees. You upgrade one service at a time, one broker at a time. The risk is contained. The effort is manageable.

4.2. Vendor Lock-In

ESB platforms come with proprietary development tools, proprietary languages, and proprietary deployment models. Your integration logic is written in vendor-specific formats that cannot be easily migrated. When you want to leave, you face rewriting everything from scratch.

Kafka is open source. Spring is open source. Java is a standard. Your code is portable. Your skills are transferable. You are not locked into a single vendor’s roadmap or pricing model.

4.3. The Talent Problem

Finding developers who want to work with ESB platforms is increasingly difficult. The best engineers want to work with modern technologies, not proprietary integration platforms. ESB skills are legacy skills. Kafka and Spring skills are in high demand.

This talent gap creates a vicious cycle. Your ESB becomes harder to maintain because you cannot hire good people to work on it. The people you do have become increasingly specialized in a dying technology, making it even harder to transition away.

5. The Pitfalls That Kill ESBs

5.1. Message Poisoning

A single malformed message can crash an ESB flow. Worse, that message can sit in a queue or topic, repeatedly crashing the flow every time it is processed. The ESB lacks sophisticated dead-letter queue handling, lacks proper message validation frameworks, and lacks the observability to quickly identify and fix poison message problems.

Kafka with Spring Kafka provides robust error handling. Dead-letter topics are first-class concepts. You can configure retry policies, error handlers, and message filtering at the consumer level. When poison messages occur, they are isolated and can be processed separately without bringing down your entire integration layer.
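The retry-then-dead-letter flow can be modelled in plain Java. This is a sketch of the pattern only; in real Spring Kafka deployments, DefaultErrorHandler and DeadLetterPublishingRecoverer implement it against actual consumers and topics.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// In-memory sketch of retry-then-dead-letter routing for poison messages.
public class DeadLetterSketch {
    private final List<String> deadLetterTopic = new ArrayList<>();
    private final int maxAttempts;

    public DeadLetterSketch(int maxAttempts) { this.maxAttempts = maxAttempts; }

    // Try the handler a bounded number of times; on exhaustion the record
    // is routed aside so the rest of the partition keeps flowing.
    public void process(String record, Consumer<String> handler) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                handler.accept(record);
                return; // processed successfully
            } catch (RuntimeException e) {
                // fall through and retry
            }
        }
        deadLetterTopic.add(record);
    }

    public List<String> deadLetters() { return deadLetterTopic; }

    public static void main(String[] args) {
        DeadLetterSketch sketch = new DeadLetterSketch(3);
        sketch.process("{bad json", r -> { throw new RuntimeException("parse error"); });
        sketch.process("{\"ok\":true}", r -> { /* parses fine */ });
        System.out.println(sketch.deadLetters()); // only the poison record
    }
}
```

The key property is isolation: the poison record ends up parked for later inspection while well-formed records behind it are processed normally.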

5.2. Resource Contention

All integrations share the same ESB resources. A poorly performing transformation or a high-volume integration can starve other integrations of CPU, memory, or thread pool resources. You cannot isolate workloads effectively.

Microservices run in isolated containers with dedicated resources. Kubernetes provides resource quotas, limits, and quality-of-service guarantees. One service consuming excessive resources does not impact others. You can scale services independently based on their specific needs.

5.3. Configuration Complexity

ESB configurations grow into sprawling XML files or proprietary configuration formats with thousands of lines. Understanding the full impact of a change requires expert knowledge of the entire configuration. Documentation falls out of date. Tribal knowledge becomes critical.

Spring Boot uses convention over configuration with sensible defaults. Kafka configuration is straightforward properties files. Infrastructure-as-code tools like Terraform and Helm manage deployment configurations in version-controlled, testable formats. Complexity is managed through modularity, not through ever-growing monolithic configurations.

5.4. Lack of Elasticity

ESBs cannot auto-scale based on load. You provision for peak capacity and waste resources during normal operation. When unexpected load hits, you cannot quickly add capacity. Manual intervention is required, and by the time you scale up, you have already experienced an outage.

Kubernetes Horizontal Pod Autoscaler can scale microservices based on CPU, memory, or custom metrics like message lag. Kafka consumer groups automatically rebalance when you add or remove instances. The system adapts to load automatically, scaling up during peaks and scaling down during quiet periods.
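The scaling decision an HPA makes from a lag metric reduces to simple arithmetic. A sketch under the assumption of a per-instance lag budget; the specific numbers are illustrative.

```java
// Sketch of a lag-based scaling decision: target a per-instance lag
// budget and derive a bounded replica count from total consumer lag.
public class LagScaler {
    public static int desiredReplicas(long totalLag, long lagPerInstance,
                                      int min, int max) {
        int needed = (int) Math.ceil((double) totalLag / lagPerInstance);
        return Math.max(min, Math.min(max, needed));
    }

    public static void main(String[] args) {
        // 1.2M messages behind, each instance comfortably working off 100k:
        System.out.println(desiredReplicas(1_200_000, 100_000, 2, 20)); // 12
        // Quiet period: near-zero lag collapses back to the floor of 2.
        System.out.println(desiredReplicas(5_000, 100_000, 2, 20));
    }
}
```

Wired to a lag exporter as an HPA custom metric, this is the whole control loop: capacity follows backlog automatically in both directions.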

6. The Java, Spring, and Kafka Alternative

6.1. Modern Java Performance

Java 25 represents the cutting edge of JVM performance and developer productivity. Virtual threads, now mature and production-hardened, enable massive concurrency with minimal resource overhead. The low-pause garbage collectors, ZGC and Shenandoah, keep pause times in the sub-millisecond range even on multi-terabyte heaps, making Java competitive with languages that traditionally claimed performance advantages.

The ahead-of-time compilation cache dramatically reduces startup and warm-up times by reusing class-loading and compilation work from earlier runs. This gets Java microservices to full speed far sooner after launch, fundamentally changing deployment dynamics in containerized environments.

This is not incremental improvement. Java 25 represents a generational leap in performance, efficiency, and developer experience that makes it the ideal foundation for high-throughput microservices.

6.2. Spring Boot Productivity

Spring Boot eliminates boilerplate. Auto-configuration sets up your application with sensible defaults. Spring Kafka provides high-level abstractions over Kafka consumers and producers. Spring Cloud Stream enables event-driven microservices with minimal code.

A complete Kafka consumer microservice can be written in under 100 lines of code. Testing is straightforward with embedded Kafka. Observability comes built in with Micrometer metrics and distributed tracing support.

6.3. Kafka as the Integration Backbone

Kafka is not just a message broker. It is a distributed commit log that provides durable, ordered, replayable streams of events. This fundamentally changes how you think about integration.

With Kafka 4.2, the platform has evolved even further by introducing native queue support alongside its traditional topic-based architecture. This means you can now implement classic queue semantics with competing consumers for workload distribution while still benefiting from Kafka’s durability, scalability, and operational simplicity. Organizations no longer need separate queue infrastructure for point-to-point messaging patterns.

Instead of request-response patterns mediated by an ESB, you have event streams that services can consume at their own pace. Instead of transformations happening in a central layer, transformations happen in microservices close to the data. Instead of a single integration layer, you have a distributed data platform that handles both streaming and queuing workloads.

7. Real-World Patterns

7.1. Event Sourcing

Store every state change as an event in Kafka. Your services consume these events to build their own views of the data. You get complete audit trails, temporal queries, and the ability to rebuild state by replaying events.

ESBs cannot do this. They are designed for transient message passing, not durable event storage.
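The replay mechanics can be sketched in a few lines. This in-memory model stands in for a Kafka topic: state is never stored directly, only derived by replaying an append-only event log, exactly what a new consumer does when it reads the topic from offset zero.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal event-sourcing sketch: an append-only log of events, with
// current state derived purely by replay.
public class AccountLog {
    record Event(String type, long amount) {}

    private final List<Event> log = new ArrayList<>();

    public void append(String type, long amount) {
        log.add(new Event(type, amount));
    }

    // Rebuild the balance from the full history, the same replay a new
    // consumer performs when reading the topic from the beginning.
    public long replayBalance() {
        long balance = 0;
        for (Event e : log) {
            balance += e.type().equals("DEPOSIT") ? e.amount() : -e.amount();
        }
        return balance;
    }

    public static void main(String[] args) {
        AccountLog account = new AccountLog();
        account.append("DEPOSIT", 100);
        account.append("WITHDRAW", 30);
        account.append("DEPOSIT", 5);
        System.out.println(account.replayBalance()); // 75
    }
}
```

Because the log is the source of truth, a bug fix in the projection logic is applied retroactively simply by replaying, something a transient message bus cannot offer.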

7.2. Change Data Capture

Use tools like Debezium to capture database changes and stream them to Kafka. Your microservices react to these change events without complex database triggers or polling. You get near real-time data pipelines without the fragility of ESB database adapters.

7.3. Saga Patterns

Implement distributed transactions using choreography or orchestration patterns with Kafka. Each service publishes events about its local transactions. Other services react to these events to complete their portion of the saga. You get eventual consistency without distributed locks or two-phase commit.

ESBs attempt to solve this with BPEL or proprietary orchestration engines that become unmaintainable complexity.

7.4. Work Queue Distribution

With Kafka 4.2’s native queue support, you can implement traditional work-queue patterns where tasks are distributed among competing consumers. This is perfect for batch processing, background jobs, and task distribution scenarios that previously required separate queue infrastructure like RabbitMQ or ActiveMQ. Now you get queue semantics with Kafka’s operational benefits.

8. The Migration Path

8.1. Strangler Fig Pattern

You do not need to rip out your ESB overnight. Apply the strangler fig pattern. Identify new integrations or integrations that need significant changes. Implement these as microservices with Kafka instead of ESB flows. Gradually migrate existing integrations as they require updates.

Over time, the ESB shrinks while your Kafka ecosystem grows. Eventually, the ESB becomes small enough to eliminate entirely.

8.2. Event Gateway

Deploy a Kafka-to-ESB bridge for transition periods. Services publish events to Kafka. The bridge consumes these events and forwards them to ESB endpoints where necessary. This allows new services to be built on Kafka while maintaining compatibility with legacy ESB integrations.

8.3. Invest in Platform Engineering

Build internal platforms and tooling around your Kafka and microservices architecture. Provide templates, generators, and golden-path patterns that make it easier to build microservices correctly than to add another ESB flow.

Platform engineering accelerates the migration by making the right way the easy way.

9. The Cost Reality

Organizations often justify ESBs based on licensing costs versus building custom integrations. This analysis is fundamentally flawed.

ESB licenses are expensive, but that is just the beginning. Add the cost of specialized consultants. Add the cost of extended release cycles. Add the opportunity cost of features not delivered because teams are blocked on ESB changes. Add the cost of outages when the ESB fails.

Kafka is open source with zero licensing costs. Spring is open source. Java is free. The tooling ecosystem is mature and open source. Your costs shift from licensing to engineering time, but that engineering time produces assets you own and can evolve without vendor dependency.

More critically, the business velocity enabled by microservices and Kafka translates directly to revenue. Features ship faster. Systems scale to meet demand. You capture opportunities that ESB architectures would have missed.

10. Conclusion

The ESB is a relic of an era when centralization seemed like the answer to complexity. We now know that centralization creates brittleness, bottlenecks, and business risk.

Kafka and microservices represent a fundamentally better approach. Distributed ownership, independent scalability, fault isolation, and evolutionary architecture are not just technical benefits. They are business imperatives in a world where velocity and resilience determine winners and losers.

The question is not whether to move away from ESBs. The question is how quickly you can execute that transition before your ESB becomes an existential business risk. Every day you remain on an ESB architecture is a day your competitors gain ground with more agile, scalable systems.

The death of the ESB is not a tragedy. It is an opportunity to build systems that actually work at the scale and pace modern business demands. Java, Spring, and Kafka provide the foundation for that future. The only question is whether you will embrace it before it is too late.