Change Data Capture with CockroachDB and Strimzi

By Aykut Bulgu / June 13, 2022

Dealing with data is hard in the Cloud-native era, as the cloud itself has many dynamics, and microservices have their unique data-relevant problems. As many patterns emerge after dealing with similar problems over and over, technologists invented many data integration patterns to solve those data-related problems.

Change Data Capture (CDC) is one of those data integration patterns, maybe one of the most popular ones nowadays. It enables capturing row-level changes into a configurable sink for downstream processing such as reporting, caching, full-text indexing, or most importantly helping with avoiding dual writes.

Many technologies implement CDC as either a part of their product’s solution or the main functionality of an open-source project like Debezium.

CockroachDB is one of those technologies that implement CDC as a part of their product features.

CockroachDB is a distributed database that supports standard SQL. It is designed to survive software and hardware failures, from server restarts to data center outages.

While CockroachDB is designed to be excellent, it needs to coexist with other systems like full-text search engines, analytics engines or data pipeline systems. Because of that, it has Changefeeds, which enable data sinks like AWS S3, webhooks and most importantly Apache Kafka or Strimzi: a Kafka on Kubernetes solution.

Strimzi is a CNCF sandbox project, which provides an easy way to run and manage an Apache Kafka cluster on Kubernetes or OpenShift. Strimzi is a Kubernetes Operator, and it provides an easy and flexible configuration of a Kafka cluster, empowered by the capabilities of Kubernetes/OpenShift.

In this tutorial, you will:

Run a Kafka cluster on OpenShift/Kubernetes.
Create a topic within Strimzi by using its the Strimzi CLI.
Create a CockroachDB cluster on OpenShift and use its SQL client.
Create a table on CockroachDB and configure it for using CDC.
Create, update, delete records in the CockroachDB table.
Consume the change events from the relevant Strimzi topic by using the Strimzi CLI.

Prerequisites

You’ll need the following for this tutorial:

oc CLI.
Strimzi CLI.
A 30-days trial license for CockroachDB, which is required to use CockroachDB’s CDC feature.
You must have Strimzi 0.26.1 or AMQ Streams 2.1.0 operator installed on your OpenShift/Kubernetes cluster.
You must have CockroachDB 2.3.0 operator installed on your OpenShift/Kubernetes cluster.

CockroachBank’s Request

Suppose that you are a consultant who works with a customer, CockroachBank LLC.

They use CockroachDB on OpenShift and their daily bank account transactions are kept in CockroachDB. Currently, they have a mechanism for indexing the bank account transaction changes in Elasticsearch but they noticed that it creates data inconsistencies between the actual data and the indexed log data that is in Elasticsearch.

They want you to create a core mechanism that avoids any data inconsistency issue between systems. They require you to create a basic implementation of a CDC by using CockroachDB’s Changefeed mechanism and because they use CockroachDB on OpenShift, they would like you to use Strimzi.

You must only implement the CDC part, so do not need to implement the Elasticsearch part. To simulate the CockroachBank system, you must install the AMQ Streams and CockroachDB operators on your OpenShift cluster and create their instances.

The following image is the architectural diagram of the system they require you to implement:

Required Architecture

Creating a Strimzi Cluster

To create a Strimzi cluster, you need a namespace/project created on your OpenShift/Kubernetes cluster. You can use the oc CLI to create a namespace/project.

oc new-project cockroachbank

Run the kfk command of Strimzi CLI to create a Kafka cluster with one broker. A small cluster with one broker is enough for a demo.

export STRIMZI_KAFKA_CLI_STRIMZI_VERSION=0.26.1 && \
kfk clusters --create \
--cluster my-cluster \
--replicas 1 \
--zk-replicas 1 \
-n cockroachbank -y

NOTE

You can download the Strimzi CLI Cheat Sheet from here for more information about the CLI commands.

Verify that you have all the Strimzi pods up and running in Ready state.

oc get pods -n cockroachbank

The output should be as follows:

NAME                                          READY   STATUS    ...
...output omitted...
my-cluster-entity-operator-64b8dcf7f6-cmjz8   3/3     Running   ...
my-cluster-kafka-0                            1/1     Running   ...
my-cluster-zookeeper-0                        1/1     Running   ...
...output omitted...

Running CockroachDB on OpenShift

To install CockroachDB on your OpenShift cluster, download the CockroachDB cluster custom resource, which the CockroachDB operator creates the cluster for you by using the resource definition you provide in the YAML.

The YAML content should be as follows:

kind: CrdbCluster
apiVersion: crdb.cockroachlabs.com/v1alpha1
metadata:
  name: crdb-tls-example
  namespace: cockroachbank
spec:
  cockroachDBVersion: v21.1.10
  dataStore:
    pvc:
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
        volumeMode: Filesystem
  nodes: 3
  tlsEnabled: false

Run the following command to apply the resource YAML on OpenShift.

oc create -f crdb-cluster.yaml -n cockroachbank

Verify that you have all the CockroachDB pods up and running in Ready state.

oc get pods -n cockroachbank

The output should be as follows:

NAME                                          READY   STATUS    ...
...output omitted...
crdb-tls-example-0                            1/1     Running   ...
crdb-tls-example-1                            1/1     Running   ...
crdb-tls-example-2                            1/1     Running   ...
...output omitted...

In another terminal run the following command to access CockroachDB SQL client interface:

oc rsh -n cockroachbank crdb-tls-example-0 cockroach sql --insecure

On the SQL client interface, run the following commands to enable the trial for enterprise usage. You must do this because CDC is a part of the Enterprise Changefeeds and you need an enterprise license. Refer to the Prerequisites part of this tutorial if you did not get a trial license yet.

SET CLUSTER SETTING cluster.organization = '_YOUR_ORGANIZATION_';

SET CLUSTER SETTING enterprise.license = '_YOUR_LICENSE_';

Creating and Configuring a CockroachDB Table

On the SQL query client terminal window, run the following command to create a database called bank on CockroachDB.

root@:26257/defaultdb> CREATE DATABASE bank;

Select the bank database to use for the rest of the actions in the query window.

root@:26257/defaultdb> USE bank;

Create a table called account with the id and balance fields.

root@:26257/bank> CREATE TABLE account (id INT PRIMARY KEY, balance INT);

Create a Changefeed for the table account. Set the Strimzi Kafka cluster broker address for the Changefeed, to specify the broker to send the captured change data:

root@:26257/bank> CREATE CHANGEFEED FOR TABLE account INTO 'kafka://my-cluster-kafka-bootstrap:9092' WITH UPDATED;

Notice that the Kafka bootstrap address is my-cluster-kafka-bootstrap:9092. This is the service URL that Strimzi provides for the Kafka cluster my-cluster you have created.

NOTE

For more information on creating a Changefeed on CockroachDB refer to this documentation page.

Leave the terminal window open for further instructions.

Creating a Strimzi Topic and Consuming Data

In another terminal window run the following command to create a topic called account:

kfk topics --create --topic account \
--partitions 1 --replication-factor 1 \
-c my-cluster -n cockroachbank

The output must be as follows:

kafkatopic.kafka.strimzi.io/account created

IMPORTANT

A topic with 1 partition and 1 replication factor is enough for this demonstration. If you have a Strimzi Kafka cluster with more than 1 broker, then you can configure your topic differently.

Notice that it is the same name as the CockroachDB table account. CockroachDB CDC produces change data into a topic with the same name as the table by default.

In the same terminal window run the following command to start consuming from the account topic:

kfk console-consumer --topic account \
-c my-cluster -n cockroachbank

Leave the terminal window open to see the consumed messages for the further steps.

Capturing the Change Event Data

In the SQL client terminal window, run the following command to insert a sample account data into the account table.

root@:26257/bank> INSERT INTO account (id, balance) VALUES (1, 1000), (2, 250), (3, 700);

This should create the following accounts in the CockroachDB account table:

id	balance
1	1000
2	250
3	700

After inserting the data, verify that the Strimzi CLI consumer prints out the consumed data:

{"after": {"balance": 1000, "id": 1}, "updated": "1655075852039229718.0000000000"}
{"after": {"balance": 250, "id": 2}, "updated": "1655075852039229718.0000000000"}
{"after": {"balance": 700, "id": 3}, "updated": "1655075852039229718.0000000000"}

To observe some more event changes on the topic data, run the following queries on the SQL client to change the balance between accounts.

root@:26257/bank> UPDATE account SET balance=700 WHERE id=1;
root@:26257/bank> UPDATE account SET balance=600 WHERE id=2;
root@:26257/bank> UPDATE account SET balance=650 WHERE id=3;

Verify that the new changes, which CDC of CockroachDB captures, are printed out by the Strimzi CLI consumer. In the consumer’s terminal window, you should see a result as follows:

...output omitted...
{"after": {"balance": 700, "id": 1}, "updated": "1655075898837400914.0000000000"}
{"after": {"balance": 600, "id": 2}, "updated": "1655075904775509980.0000000000"}
{"after": {"balance": 650, "id": 3}, "updated": "1655075910511670876.0000000000"}

Notice the balance changes that represent a transaction history in the consumer output.

As the last step, delete the bank account to see how CockroachDB CDC captures them. Run the following SQL query in the query console of CockroachDB:

root@:26257/bank> DELETE FROM account where id <> 0;

The Strimzi CLI consumer should print the following captured events:

...output omitted...
{"after": null, "updated": "1655075949177132715.0000000000"}
{"after": null, "updated": "1655075949177132715.0000000000"}
{"after": null, "updated": "1655075949177132715.0000000000"}

Notice that the after field becomes null when the change data is a delete event.

Congratulations! You’ve successfully captured the bank account transaction changes, and streamed them as change events via Strimzi.

Conclusion

In this tutorial, you have learned how to run a Kafka cluster on OpenShift by using Strimzi, and how to create a topic by using the Strimzi CLI. Also, you have learned to install CockroachDB on OpenShift, create a table by using its SQL query interface, and configure it for using CDC. You’ve created change events on a CockroachDB table and consumed those events from the Strimzi Kafka topic by using the Strimzi CLI.

You can find the resources for this tutorial in this repository.

Messaging Architectures for Cloud-Native Applications

Messaging

Microservices

Patterns

By Aykut Bulgu / December 1, 2020

1. Traditional Applications and Monoliths
- 1.1. Benefits and Drawbacks
2. Cloud-Native Applications and Microservices
Resources

1. Traditional Applications and Monoliths

Do you remember the times we used to create applications by creating data models regarding the business domains and use those data models as the reflection of the relational database objects -mostly tables- in order to do CRUD actions?

Business requirements were pouring from the waterfall and making us so soaking wet that we could not easily respond to change: new business requirements, bug fixes, enhancements, etc.

When Agile methodologies came out, after some time, this made us more flexible and respond to change very quickly whereas there came out some ideas of SOA, service bus, distributed state management, etc. but business domains stayed kind of merged and monoliths survived.

Monolith applications -which is actually not an anti-pattern- ruled the world for a considerable number of years with different kinds of architectures that have their own benefits and drawbacks.

Figure 1. Traditional Application Design. https://speakerdeck.com/mbogoevici/data-strategies-for-microservice-architectures?slide=4

1.1. Benefits and Drawbacks

The Benefits:

Easy to start with
Easy transaction management
Sync communication
Can be powered-up with the modular architecture

The Drawbacks:

Hard to change business domain and data model
Hard to scale
Tightly coupled components

2. Cloud-Native Applications and Microservices

One of the best motivations for approaches like Domain Driven Design is being monoliths’ tightly coupled between business domains and the need of separating those domains in order to loosen the coupling and provide single responsibility for each domain as bounded contexts.

Figure 2. Bounded Contexts. https://martinfowler.com/bliki/BoundedContext.html

So these kinds of approaches led to microservices’ creation with the motivation of loosely coupling between bounded contexts, being polyglot, which means using best-fitting tools for the relevant service, easily being scalable horizontally, and most importantly with these benefits, being able to easily adapt into the could-native world.

Apart from creating a lot of benefits on the microservices side, microservices and the cloud-native application architecture has some challenges that may turn a developer’s life into a nightmare.

2.1. Challenges

Microservice architectures have many challenges like manageability of the services, traceability, monitoring, service discovery, distributed state and data management, and resilience which are handled automatically by cloud-native platforms like Kubernetes. For example service discovery is one of the requirements of an application that consists of microservices and Kubernetes provides this service discovery mechanism on its own side.
What cloud-native platforms can not provide and leave it to the guru developers is state management itself.

2.1.1. State

In order to keep the state in a distributed system and make it flew through the microservices has some challenges. Keeping the state in a distributed cache system like Infinispan and create a kind of single source of truth for the state is a common pattern but in the mesh of the service, it is tough to manage since there will be an Inception of caches.

Keeping the state distributed through services is even tougher.

Figure 3. Microservices & Data. https://speakerdeck.com/mbogoevici/data-strategies-for-microservice-architectures?slide=5

As Database-per-microservice is a common pattern and as each bounded context should have and handle its own data, -with the need of sharing the state/data through services- this makes direct point-to-point communication between microservices more important.

2.1.2. Synchronous Communication

Synchronous data retrieval is a way to get the data that is needed from a microservice to another microservice. One can use comparingly new technologies like HTTP+REST, gRPC, or some old school technologies like RMI and IIOP, but all these synchronous point-to-point styles of data retrieval have some costs.

Figure 4. Synchronous Data Retrieval. https://speakerdeck.com/yanaga/distribute-your-microservices-data-with-events-cqrs-and-event-sourcing?slide=5

Latency is one of the key points of messaging between services and with synchronous communication if one of the services whose data will be retrieved has some performance problems in itself, it will be able to serve the data with a bit of latency that may cause data retrieval latency or timeout exceptions.

Or a service may have some failure inside and is not available that specific time period the sync data call won’t work.

Also, any performance issues on the network will directly either affect the latency or service availability. So it is up to the network’s being reliable.

We know that there are some patterns like distributed caching, bulkhead, and circuit breaker patterns for handling this kind of failure scenario by implementing fault tolerance strategies, but is it really the right way to do it?

2.2. Challenging the Challenges

So there are some solutions which some of them are invented years ago, but still rigid enough to be a ‘solution’ while others are brand-new architectures that will help us for challenging the challenges of cloud-native application messaging and communications.

Let’s start by taking a look at the common asynchronous messaging architectures before jumping into the solutions.

2.2.1. Asynchronous Messaging and Messaging Architectures

Like synchronous communication, asynchronous communication -or in other words messaging– has to be done over protocols. Two sides of the communication should agree on the protocol, so that message data that is either forecasted or consumed can be understood by the consumer.

While HTTP+REST protocol is the most used protocol for synchronous communication, there are several other protocols that asynchronous messaging systems widely use; like AMQP, STOMP, XMMP, MQTT, or Kafka protocols.

There are three main types of messaging models:

Point-to-point
Publish-subscribe (Pub-sub)
Hybrid

Point-to-Point

Point-to-point messaging is like sending a parcel via mail post services. You go to a post office, write the address you want the parcel to be delivered to, and post the parcel knowing it will be delivered sometime later. The receiver does not have to be at home when the parcel is sent and at some point later the parcel will be received at the address.

Point-to-point messaging systems are mostly implemented as queues that use the first-in, first-out (FIFO) order. So this means that only one subscriber of a queue can receive a specific message.

Figure 5. Point-to-Point Messaging. https://docs.oracle.com/cd/E19340-01/820-6424/aerbj/index.html

This opens the conversation of queues are being durable, which means if there are no active subscribers the messaging system will retain the messages until a subscriber comes and consumes them.

Point-to-point messaging is generally used for the use cases of calling for a message to be acted upon once only as queues can best provide an at-least-once delivery guarantee.

Publish-Subcribe

In order to understand how publish-subscribe (pub-sub) works think that you are an attendee on a webinar. When you are connected you can hear and watch what the speaker says, along with the other participants. When you disconnect you miss what the speaker says, but when you connect again you are able to hear what is being said.

So the webinar works like a pub-sub mechanism that while all the attendees are subscribers, the speaker is the broadcaster/publisher.

Pub-sub mechanisms are generally implemented through topics that act like the webinar broadcast to be subscribed. So when a message is produced on a topic, all the subscribers get it since the message is distributed along.

Figure 6. Publish-Subscribe Messaging. https://engineering.carsguide.com.au/laravel-pub-sub-messaging-with-apache-kafka-3b27ed1ee5e8

Topics are nondurable, unlike the queues. This means that a subscriber/consumer that is not consuming any messages -as it might be not running etc.-, misses the broadcasted messages in that period of being off. So this means that topics can provide a at-most-once delivery guarantee for each subscriber.

Hybrid model

Hybrid models of messaging systems both include the point-to-point and publish-subscribe as the use cases generally require a messaging system that has many consumers who want a copy of the message with full durability; in other words without message loss.

Technologies like ActiveMQ and Apache Kafka both implement this hybrid model with their own ways of persistence and distribution mechanisms.

Durability is a key factor especially on Cloud-Native distributed systems since the persistence of the state and being able to somehow replay it plays a key role in component communication. By adding it the capabilities of the publish-subscribe mechanism decrease the dependencies between components/services/microservices as it has the power of persisting and getting the message again either with the same subscriber or another one.

So the hybrid messaging systems are very vital when it comes to passing states as messages through Cloud-Native microservices as events since event-driven distributed architectures require these capabilities.

2.2.2. Events & Event Sourcing

As a process of developing microservice-based cloud-native architectures, approaches like Domain-Driven Design (DDD) makes it easy to divide the bounded contexts and see the sub-domains related to the parent domain.

One of the best techniques to separate and define the bounded contexts is the Event Storming technique, which takes the events as entry points and emerges everything including commands, data relationships, communication styles, and most importantly combobulators which are mostly mapped as bounded contexts.

Figure 7. Events Storming Components. https://medium.com/@springdo/a-facilitators-recipe-for-event-storming-941dcb38db0d

After all when most of the event storming map emerges, one can see all communication points between bounded contexts which are mostly mapped as microservices or services in the system that has their own database and data structure.

Figure 8. A real-life Event Storming example;)

This structure, as it all consists of events, gives the main idea of using async communication via a publish-subscribe system that queues the events to be consumed, in other words, doing Event Sourcing.

Event Sourcing is a state-event-message pattern that captures all changes to an application state as a sequence of events that can be consumed by other applications -in this case, microservices.

Figure 9. Event Sourcing. https://speakerdeck.com/mbogoevici/data-strategies-for-microservice-architectures?slide=14

Event Sourcing is very important in a distributed cloud-native environment because in the cloud-native world microservices can easily be scaled, new microservices can join or -from an application modernization perspective- microservices can be separated from their big monolith mother in order to make it live its own life.

So having the capability of an asynchronous publish-subscribe system that has the durability of data or ability to replay is very important. Additionally, queueing the events rather than the final data makes it flexible for other services which means implementing dependency inversion in an asynchronous environment with the capability of eventual consistency.

The question here is: How to create/trigger those events?

There are many programmatic ways rather than languages or framework libraries to create events and publish them. One can create database listeners or interceptors programmatically (like Hibernate Envers does) or can handle them in the DAO (Data Access Object) or service layer of the application. Even so, creating an Event Sourcing mechanism is not easy.

At this point, a relatively new pattern comes as a savior: Change Data Capture.

2.2.3. Change Data Capture

Change Data Capture (CDC) is a pattern that is used to track the data change -mostly in databases- in order to take any action on it.

Figure 10. Change Data Capture. https://speakerdeck.com/mbogoevici/data-strategies-for-microservice-architectures?slide=15

A CDC mechanism should listen to the data change, and create the event that includes the change as Create, Insert, Update, Delete actions, and the data change itself. After creating the action it can be published to any durable pub-sub system in order to be consumed.

In order to decouple the database change event listening capability from the application code, it is one of the best patterns that is used for event-driven architectures of cloud-native applications.

Debezium is probably the most popular open-source implementation nowadays because of its easy integration with a popular set of databases and Apache Kafka, especially on platforms like Kubernetes/OpenShift.

Figure 11. CDC with Debezium. https://developers.redhat.com/blog/2020/04/14/capture-database-changes-with-debezium-apache-kafka-connectors/

Now that, we have our event sourcing listener and creator as a CDC implementation like Debezium and let’s say we use Apache Kafka for event distribution between microservices.

Since the data we are creating is not the data itself but the change subscribed microservice should get the change and reflect it to its database. This change -when it comes again a microservice having a database of its own because it’s being a bounded context– is generally used for the read purpose rather than its being a write on the database because it is a reflection of the event that is just triggered by another write on another database.

So this makes us recall a pattern -that is already automatically implemented by design in the sample of ours- that’s been used for many years especially by enterprise-level relational databases: CQRS.

2.2.4. Command Query Responsibility Segregation (CQRS)

CQRS is a pattern that suggests one to separate the read model from the write model, mainly with the motivation of separation of concerns and performance.

Figure 12. CQRS with Separate Datastores. https://speakerdeck.com/mbogoevici/data-strategies-for-microservice-architectures?slide=20

In a cloud-native system that has a set of polyglot microservices that has its own database -either as relational or NoSQL- the CQRS pattern fits well since each microservice has to have the data reflection of the dependent/dependee application.

2.3. Solutions Assemble: State Propagation

So in our imaginary, well-architected, distributed cloud-native system, in order to make our microservices communicate and transfer their state, we called the best of breed super-heroic patterns -some of which has great implementations.

Figure 13. State Propagation & Outbox Pattern. https://debezium.io/blog/2019/02/19/reliable-microservices-data-exchange-with-the-outbox-pattern/

This state transfer through an asynchronous system that has the durability and pub-sub ability like Apache Kafka, triggered by a change data capture mechanism like Debezium, writing to a component which creates the event data to be read by another component is all called State Propagation which is backed by the Outbox pattern on the microservices side.

To sum up all, getting all the solutions altogether state propagation and creating event-driven mechanisms with CDC, Event Sourcing and CQRS help us in a very elegant way in order to solve the challenges of the cloud-native microservices era.

Resources

Books

Jakub Korab. ‘Understanding Message Brokers’. O’Reilly Media, Inc. ISBN 9781491981535.
Todd Palino, Gwen Shapira, Neha Narkhede. ‘Kafka: The Definitive Guide’. O’Reilly Media, Inc. ISBN 9781491936160.

Videos

Gwen Saphira. ‘Cloud native data pipelines with Apache Kafka’. https://learning.oreilly.com/videos/cloud-native-data/0636920333746/0636920333746-video328373
Edson Yanaga. ‘Distribute Your Microservices Data With Events, CQRS, and Event Sourcing’. https://www.youtube.com/watch?v=HdvWfr2KwA0
Marius Bogoevici, Edson Yanaga. ‘Data Strategies for Microservice Architectures’. https://www.youtube.com/watch?v=n_V8hBRoshY

Presentations

Edson Yanaga. ‘Distribute Your Microservices Data With Events, CQRS, and Event Sourcing’. https://speakerdeck.com/yanaga/distribute-your-microservices-data-with-events-cqrs-and-event-sourcing
Marius Bogoevici. ‘Data Strategies for Microservice Architectures’. https://speakerdeck.com/mbogoevici/data-strategies-for-microservice-architectures

Website Articles

Open Practice Library. ‘Event Storming’. https://openpracticelibrary.com/practice/event-storming/
Donal Spring. ‘A facilitators recipe for Event Storming’. https://medium.com/@springdo/a-facilitators-recipe-for-event-storming-941dcb38db0d
Martin Fowler. ‘Bounded Context’. https://martinfowler.com/bliki/BoundedContext.html
Martin Fowler. ‘CQRS’. https://martinfowler.com/bliki/CQRS.html
Martin Fowler. ‘Event Sourcing’. https://martinfowler.com/eaaDev/EventSourcing.html