
Confluent Cloud: Fully Managed Kafka Streaming


This report focuses on real-time data and how to feed autonomous systems reliably at scale. To shed light on this challenge, we assess the ease of use of a fully managed Kafka platform—Confluent Cloud—and a self-managed open-source Apache Kafka solution.

The most popular tool for streaming data is Apache Kafka. Created at LinkedIn, Kafka was open sourced and graduated from the Apache Incubator in late 2012. Kafka is a distributed publish-subscribe messaging system that maintains feeds of messages in categories known as topics. Publishers write data to topics, and subscribers read from them. Kafka is a distributed system in which topics are partitioned and replicated across multiple nodes in the cluster.

Within Kafka, messages are key/value pairs that can store objects in any format. Messages with the same key are ordered and stored in the same partition so they can be consumed by the same instance of a subscriber.
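As a brief illustration of keyed messages, the following sketch uses the confluent-kafka Python client to produce several records with the same key and report the partition each lands in; the broker address and topic name are illustrative. Because the records share a key, they all hash to the same partition, preserving their order for the single consumer instance that owns it.

```python
from confluent_kafka import Producer

# Illustrative broker address; replace with your cluster's bootstrap servers.
producer = Producer({"bootstrap.servers": "localhost:9092"})

def on_delivery(err, msg):
    # Report the partition each record landed in.
    if err is None:
        print(f"key={msg.key()} -> partition {msg.partition()}")

# Records that share a key are stored in the same partition, so their
# relative order is preserved for whichever consumer owns that partition.
for i in range(5):
    producer.produce("orders", key="customer-42", value=f"event-{i}",
                     on_delivery=on_delivery)

producer.flush()
```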

In our test, we worked through all steps of a use case for a distributed event store and stream-processing platform. The components of ease of use included in our calculations spanned the setup, development, operations, and scale categories.

Using story points, we assessed the comparative ease-of-use value realization between Confluent Cloud and Kafka across setup, development, and operations. We found that the value realization of fully managed Confluent Cloud was about three times that of open-source Kafka in setup, nearly double in development, and more than double in operations.

Scalability is a significant component of why fully managed Confluent Cloud is easier to use than open-source Kafka. Getting started is easy, and a deployment can grow up to 5 GBps ingress with the click of a button. The same growth requires hours or days of manual effort with open-source Kafka.

Our team found that fully managed Confluent Cloud is much easier to use than open-source Kafka. While Confluent Cloud accelerates setup, development, and operations, the most impressive feature is the seamless scale-out as the application grows.

With companies striving to be data driven and to utilize every bit of data possible, it is essential to process an increasing amount of data in real time. Numerous applications are being developed in which data is produced, consumed, analyzed, and reacted to in real time, with the technology making pragmatic, tactical decisions on its own. However, if data is not captured within a specific timeframe, its value is lost, and the decision or action needed never occurs.

There are, fortunately, technologies designed to handle large volumes of time-sensitive streaming data. Known by names like streaming, messaging, live feeds, real-time, and event-driven, this data category needs special attention because delayed processing can negatively affect its value. A sudden price change, a critical threshold met, an anomaly detected, a sensor reading changing rapidly, an outlier in a log file—any of these can be of immense value to a decision-maker or a process, but only if the alert arrives in time to affect the outcome.

There have been explosive developments in this space in the past few years, together with a corresponding growth of commercial vendors whose closed-source offerings add a few capabilities that are borderline necessary for enterprise applications. These capabilities include security features for access control, encryption, and auditing; connectors maintained for a growing number of applications and data systems; and disaster recovery tools.

Some organizations require a commercial vendor behind every piece of software in the shop. Others decide on a case-by-case basis, often leaning toward open-source options that boast low up-front costs and the opportunity to prove out software, albeit without the safety net of a commercial vendor arrangement.

The realities of the pandemic have exposed the importance of a reliable, agile technology infrastructure like Confluent for enabling business continuity, mining intelligence, and operating at scale. Understanding the cost implications of Confluent deployments in this environment is important, especially as enterprises build and run applications and services on managed platforms.

This report outlines the results of an ease-of-use field test to uncover the advantages of a fully managed Kafka platform—Confluent Cloud—over a self-managed open-source Apache Kafka solution. Note that the open-source Kafka deployment assumes a virtualized or cloud-based environment, as opposed to a bare-metal environment that presents even greater complexity. It is also worth noting that this comparison addresses only a subset of the functionality enabled by Confluent Cloud, which is itself a complete data streaming platform.

Assessing the ease of use of a particular platform is an important factor in time-to-value implementations and day-to-day operations and maintenance. Often, an ease-of-use study uncovers hidden costs and the time and effort it takes to get the solution off the ground and produce value for the organization. It can also uncover hidden technical debt and issues of platform maintenance that come from custom configurations, undocumented workarounds, siloed development, individual contributor knowledge, and so on.

Two of the value propositions of fully managed platforms in the cloud are faster time to value and the mitigation of technical and administration debt—which we call ease-of-use value realization. This report measures the ease of use and quantifies the difference between the time-and-effort costs of a self-managed, open-source Kafka solution versus the fully managed offering from Confluent Cloud.

We utilized all steps of a use case for a distributed event store and stream-processing platform. The categories, or components of ease of use, that we included in our calculations are setup, development, operations, and scale.

See the Appendix for a work breakdown of the testing tasks.

Testing Configuration

For our test, we built two different core streaming data stacks, the only difference being the data streaming engine. Table 1 and Figure 1 illustrate the stacks, with Figure 1 depicting the streaming platforms being compared in the middle column.

To simulate data movement across these stacks, we used TPC-C-like transactional workloads to generate data in the sources and used Schema Registry and Connect to move the data to the PostgreSQL sink. NOTE: This was not a TPC-C benchmark nor was it a performance benchmark (learn more about the TPC-C workload). The TPC-C-like workload was only used to generate data for stream processing.
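To make the Connect step concrete, the sketch below registers a hypothetical JDBC sink connector against a Connect worker's REST API. The connector name, endpoints, and credentials are illustrative, not the exact configuration used in the test.

```python
import json
import requests

# Illustrative Connect worker endpoint; not the exact test configuration.
CONNECT_URL = "http://localhost:8083/connectors"

connector = {
    "name": "postgres-sink",
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
        "topics": "orders",
        "connection.url": "jdbc:postgresql://localhost:5432/warehouse",
        "connection.user": "kafka",
        "connection.password": "secret",
        "insert.mode": "insert",
        "auto.create": "true",
        # Avro plus Schema Registry keeps producers and the sink in
        # agreement on record schemas as they evolve.
        "value.converter": "io.confluent.connect.avro.AvroConverter",
        "value.converter.schema.registry.url": "http://localhost:8081",
    },
}

resp = requests.post(CONNECT_URL, json=connector)
resp.raise_for_status()
print(json.dumps(resp.json(), indent=2))
```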

Test Scoring

To score ease of use, we took inspiration from an agile Scrum development project-management approach. As experienced consultants, we have extensively used and advocated for the agile project management methodology when developing and operating information management platforms. The agile method is a much bigger subject than the scope of this paper, but we used the following components of an agile methodology in this ease-of-use field test, many of which will be familiar to readers: user stories and tasks, story points, sprints, and burndown charts.

In our study, we collected two different measures: the number of discrete steps required to complete each task, and the time and effort each task required, expressed in story points.

We documented every story/task in the completion of our sprints, the URL we used, the number of discrete steps required, and how much time and effort it required, expressed as a T-shirt size and translated to a Fibonacci number (see the sketch below). We divided the work into the following six (6) sprints: Setup Kafka Cluster, Setup Database, Setup Database Connect, Typical Development, Basic Operational Change, and Scale Out.
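As an illustration of the T-shirt-to-Fibonacci translation, the sketch below uses one plausible mapping; the report does not publish its exact table here, so the specific values are assumptions.

```python
# Hypothetical mapping from T-shirt sizes to Fibonacci story points;
# these values are illustrative, not the report's published table.
TSHIRT_TO_POINTS = {"XS": 1, "S": 2, "M": 3, "L": 5, "XL": 8, "XXL": 13}

def story_points(size: str) -> int:
    """Translate a T-shirt-sized estimate into a Fibonacci story point value."""
    return TSHIRT_TO_POINTS[size]

# e.g., a "L" task contributes 5 points to a sprint's burndown total.
print(story_points("L"))
```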

This section details the results of our ease-of-use field test. To present our findings, we begin with a series of burndown charts showing the time and effort required to complete each sprint.

Setup Kafka Cluster Sprint

The burndown chart in Figure 3 compares the burndown of work required to set up a Kafka cluster on Apache Kafka and Confluent Cloud. As you can see, considerably less time and effort was required in Confluent Cloud (only 7 story points) compared to Apache Kafka (53 story points), since many of the tasks are handled by the fully managed platform out of the box.

Setup Database Sprint

The chart in Figure 4 compares the burndown of work required to set up the ksqlDB database on both Apache Kafka and Confluent Cloud. Again, there was much less time and effort required in Confluent Cloud (with only 9 story points) compared to Apache Kafka (38 story points).

Setup Database Connect Sprint

Figure 5 compares the burndown of work required to set up the four database connectors on both Apache Kafka and Confluent Cloud. Keep in mind that this included many tasks for the setup and configuration of each of the four databases (PostgreSQL, MySQL, SQL Server, and MongoDB). There was less time and effort required in Confluent Cloud (29 story points) compared to Apache Kafka (42 story points).

Note that the connectors used in this sprint are ones that are freely available for both Confluent Cloud and open-source Apache Kafka. However, Confluent offers a slew of additional fully managed connectors that can enable even faster time to value when integrating with external systems.

Typical Development Sprint

The chart in Figure 6 compares a typical development sprint for tasks involving both Apache Kafka and Confluent Cloud. Keep in mind that during a Kafka rollout, these same development tasks will be repeated over and over, multiplying the advantage of the time and effort saved by using Confluent Cloud. There was less time and effort required in Confluent Cloud (12 story points) compared to Apache Kafka (22 story points).

Basic Operational Change Sprint

The chart in Figure 7 compares performing basic operational tasks on both Apache Kafka and Confluent Cloud, including tasks like modifying a topic, rebalancing a topic leader, removing a topic, and throttling bandwidth. Again, these tasks will be repeated over and over, multiplying the effect over time. There was less time and effort required in Confluent Cloud (3 story points) compared to Apache Kafka (7 story points).
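For a sense of what two of these operational tasks look like programmatically, here is a minimal sketch using the confluent-kafka Python AdminClient; the broker address and topic names are illustrative, and the same calls work against either platform.

```python
from confluent_kafka.admin import AdminClient, NewPartitions

# Illustrative broker address; replace with your cluster's bootstrap servers.
admin = AdminClient({"bootstrap.servers": "localhost:9092"})

# Modify a topic: grow "orders" to 6 partitions (counts can only increase).
for topic, future in admin.create_partitions([NewPartitions("orders", 6)]).items():
    future.result()  # raises on failure

# Remove a topic that is no longer needed.
for topic, future in admin.delete_topics(["retired-topic"]).items():
    future.result()
```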

Scale Out Sprint

Figure 8 compares performing a scale out (adding nodes) on an Apache Kafka cluster. The major difference here is that Confluent Cloud completely manages scaling operations and therefore requires zero time or interaction to manage the size of the infrastructure. By contrast, Apache Kafka requires considerable time and effort (47 story points).
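Part of that effort is that on self-managed Kafka, adding brokers does not rebalance existing data; an operator must also generate and execute a partition reassignment plan. Below is a minimal sketch of the plan file consumed by Kafka's kafka-reassign-partitions.sh tool; the topic name and broker IDs are illustrative.

```python
import json

# Reassignment plan in the format expected by kafka-reassign-partitions.sh.
# Topic names and broker IDs are illustrative; broker 4 is the new node.
plan = {
    "version": 1,
    "partitions": [
        {"topic": "orders", "partition": 0, "replicas": [1, 2, 4]},
        {"topic": "orders", "partition": 1, "replicas": [2, 4, 3]},
    ],
}

# Write the plan to disk for the reassignment tool to execute.
with open("reassignment.json", "w") as f:
    json.dump(plan, f, indent=2)
```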

To summarize and measure overall ease of use, we used the results of our agile work burndown to quantify a final result. We arrived at two factors: the overall effort, measured in total story points, and the overall ease, measured by the number and complexity of discrete steps.

In terms of the overall effort, we found Confluent Cloud should take about 71% less time and effort to deploy, develop, and maintain than a self-managed open-source Apache Kafka stack. To get a sense of the scope of tasks assessed in this comparison, the detailed work breakdown is published in the Appendix of this report.
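As a check on that figure, summing the per-sprint story points reported above (assuming, as is our reading, that the overall effort figure is the simple ratio of the two totals) reproduces the roughly 71% savings:

```python
# Story points per sprint as (Confluent Cloud, Apache Kafka),
# taken from Figures 3-8 above.
sprints = {
    "setup cluster":    (7, 53),
    "setup database":   (9, 38),
    "setup connectors": (29, 42),
    "development":      (12, 22),
    "operations":       (3, 7),
    "scale out":        (0, 47),
}

confluent = sum(c for c, k in sprints.values())  # 60 points
kafka = sum(k for c, k in sprints.values())      # 209 points
print(f"effort savings: {1 - confluent / kafka:.0%}")  # ~71%
```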

We found that overall, working with Confluent Cloud is roughly 31% easier (less complex and less prone to human error) than open-source Apache Kafka.

In our test, we worked through all steps of a use case for a distributed event store and stream-processing platform. The components of ease of use included in our calculations spanned the setup, development, operations, and scale categories. We broke each of these categories into their component tasks for each platform, and then assigned scores to these tasks to determine the relative level of effort and difficulty associated with each process category.

Our findings: In setup operations, a fully managed Confluent Cloud offers about three times the ease-of-use value realization compared to open-source Apache Kafka. In terms of development, the advantage of Confluent Cloud over Apache Kafka is almost 2x, while for operations the advantage is greater than 2x.

Scalability is an area of significant advantage. We found a fully managed Confluent Cloud much easier to use than open-source Apache Kafka—a click of a button is all that is needed to grow a deployment up to 5 GBps ingress. By comparison, open-source Apache Kafka can require hours or even days of manual effort to do the same, and that doesn't even account for the additional planning and coordination that must take place ahead of a scaling effort.

In terms of overall effort, we found Confluent Cloud should take about 71% less time and effort to deploy, develop, and maintain than a self-managed open-source Apache Kafka stack. We also found that overall, working with Confluent Cloud is roughly 31% easier (less complex and less prone to human error) than open-source Apache Kafka.

Any way you slice it, fully managed Confluent Cloud proved in our testing to be much easier to use than open-source Kafka. Using Confluent Cloud accelerates setup, development, and operations, but the most impressive feature is the seamless scale-out as the application grows (not to mention the ability to scale down to realign spending when workloads shrink). This drives straight to the bottom line through efficiencies, the time to do more, and the ability to realize more business value.

The following is a detailed work breakdown used for the field tests. The platform to which each task applies is indicated.

Change cluster settings: Confluent Cloud
Configure broker initially: Apache Kafka
Create CI/CD pipeline process for prod config updates: Apache Kafka
Reconfigure and deploy broker config: Apache Kafka
Repeat for production environment: Apache Kafka
Test broker config: Apache Kafka

Create a cluster: Confluent Cloud
Download latest release: Apache Kafka
Install Java: Apache Kafka
Launch EC2 instance: Apache Kafka
Repeat for each broker: Apache Kafka
Repeat for production environment: Apache Kafka
Start environment: Apache Kafka

Configure access management: Confluent Cloud
Configure ACLs: Apache Kafka
Configure SASL for brokers: Apache Kafka
Configure Zookeeper authentication: Apache Kafka
Create certificate authority and sign: Apache Kafka
Generate SSL key/cert for each broker: Apache Kafka, Confluent Cloud
Reconfigure brokers for cert: Apache Kafka

Configure ksql listeners: Apache Kafka
Configure ksql server parameters: Apache Kafka
Configure topic and connector: Confluent Cloud
Create CI/CD pipeline process for prod config updates: Apache Kafka
Repeat for production environment: Apache Kafka
Test listener config: Apache Kafka

Add public key and repo: Apache Kafka
Create ksql cluster: Confluent Cloud
Install pre-reqs: Apache Kafka
Launch EC2 instance: Apache Kafka
Repeat for additional cluster nodes: Apache Kafka
Repeat for production environment: Apache Kafka
Start environment: Apache Kafka

Configure CDC for SQL Server: Apache Kafka
Configure connector for MongoDB: Confluent Cloud
Configure connector for MySQL: Confluent Cloud
Configure connector for PostgreSQL: Confluent Cloud
Configure connector for SQL Server: Confluent Cloud
Configure data change events for MongoDB: Apache Kafka
Configure data change events for MySQL: Apache Kafka
Configure data change events for PostgreSQL: Apache Kafka
Configure Debezium connector for MongoDB: Apache Kafka
Configure Debezium connector for MySQL: Apache Kafka
Configure Debezium connector for PostgreSQL: Apache Kafka
Configure Debezium connector for SQL Server: Apache Kafka
Configure MongoDB: Apache Kafka, Confluent Cloud
Configure MySQL: Apache Kafka, Confluent Cloud
Configure PostgreSQL: Apache Kafka, Confluent Cloud
Configure SQL Server: Apache Kafka, Confluent Cloud

Create connector for MongoDB: Confluent Cloud
Create connector for MySQL: Confluent Cloud
Create connector for PostgreSQL: Confluent Cloud
Create connector for SQL Server: Confluent Cloud
Install Debezium connector for MongoDB: Apache Kafka
Install Debezium connector for MySQL: Apache Kafka
Install Debezium connector for PostgreSQL: Apache Kafka
Install Debezium connector for SQL Server: Apache Kafka

Ease of use is important, but it is only one criterion for platform selection. This test is a point-in-time check of specific time-and-effort tasks. Numerous other factors warrant consideration, including performance, administration, features and functionality, workload management, user interface, scalability, vendor, and reliability. It is also our experience that documentation and practices change over time and differ across platforms.

GigaOm runs all of its tests to strict ethical standards. The results of the report are the objective results of the application of tests to the simulations described in the report. The report clearly defines the selected criteria and process used to establish the field test. The report also clearly states the tools and workloads used. The reader is left to determine for themselves how to qualify the information for their individual needs. The report does not make any claim regarding third-party certification and presents the objective results received from the application of the process to the criteria as described in the report. The report strictly measures ease of use and does not purport to evaluate other factors that potential customers may find relevant when making a purchase decision.

This is a sponsored report. Confluent chose the competitors. GigaOm designed the test and scoring rubric. Choosing compatible configurations is subject to judgment. We have attempted to describe our decisions in this paper.

William McKnight is a former Fortune 50 technology executive and database engineer. An Ernst & Young Entrepreneur of the Year finalist and frequent best practices judge, he helps enterprise clients with action plans, architectures, strategies, and technology tools to manage information.

Currently, William is an analyst for GigaOm Research who takes corporate information and turns it into a bottom-line-enhancing asset. He has worked with Dong Energy, France Telecom, Pfizer, Samba Bank, ScotiaBank, Teva Pharmaceuticals, and Verizon, among many others. William focuses on delivering business value and solving business problems utilizing proven approaches in information management.

Jake Dolezal is a contributing analyst at GigaOm. He has two decades of experience in the information management field, with expertise in analytics, data warehousing, master data management, data governance, business intelligence, statistics, data modeling and integration, and visualization. Jake has solved technical problems across a broad range of industries, including healthcare, education, government, manufacturing, engineering, hospitality, and restaurants. He has a doctorate in information management from Syracuse University.

GigaOm provides technical, operational, and business advice for IT’s strategic digital enterprise and business initiatives. Enterprise business leaders, CIOs, and technology organizations partner with GigaOm for practical, actionable, strategic, and visionary advice for modernizing and transforming their business. GigaOm’s advice empowers enterprises to successfully compete in an increasingly complicated business atmosphere that requires a solid understanding of constantly changing customer demands.

GigaOm works directly with enterprises both inside and outside of the IT organization to apply proven research and methodologies designed to avoid pitfalls and roadblocks while balancing risk and innovation. Research methodologies include but are not limited to adoption and benchmarking surveys, use cases, interviews, ROI/TCO, market landscapes, strategic trends, and technical benchmarks. Our analysts possess 20+ years of experience advising a spectrum of clients from early adopters to mainstream enterprises.

GigaOm’s perspective is that of the unbiased enterprise practitioner. Through this perspective, GigaOm connects with engaged and loyal subscribers on a deep and meaningful level.

© Knowingly, Inc. 2022 "Confluent Cloud: Fully Managed Kafka Streaming" is a trademark of Knowingly, Inc. For permission to reproduce this report, please contact sales@gigaom.com.
