Kubernetes is not the first platform that comes to mind for running Apache Kafka clusters. Indeed, Kafka's strong dependency on storage can be a pain point with Kubernetes' way of handling persistent storage. Kafka brokers are unique and stateful: how can we implement this in Kubernetes?
Let's go through the basics of Strimzi, a Kafka operator for Kubernetes curated by Red Hat, and see what problems it solves.
A particular focus will be put on how to plug additional Kafka tools into a Strimzi installation.
We will also compare Strimzi with other Kafka operators by listing their pros and cons.
Strimzi
Strimzi is a Kubernetes operator that aims to reduce the cost of deploying Apache Kafka clusters on cloud-native infrastructures.
As an operator, Strimzi extends the Kubernetes API by providing custom resources to natively manage Kafka resources, including:
- Kafka clusters
- Kafka topics
- Kafka users
- Kafka MirrorMaker2 instances
- Kafka Connect instances
The project is currently at the "Sandbox" stage at the Cloud Native Computing Foundation.
Note: The CNCF website defines a "sandbox" project as "Experimental projects not yet widely tested in production on the bleeding edge of technology."
With Strimzi, deploying a 3-broker TLS-encrypted cluster is as simple as applying the following YAML file:
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    version: 3.2.3
    replicas: 3
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
      - name: tls
        port: 9093
        type: internal
        tls: true
    config:
      offsets.topic.replication.factor: 3
      transaction.state.log.replication.factor: 3
      transaction.state.log.min.isr: 2
      default.replication.factor: 3
      min.insync.replicas: 2
      inter.broker.protocol.version: "3.2"
    storage:
      type: jbod
      volumes:
        - id: 0
          type: persistent-claim
          size: 100Gi
          deleteClaim: false
        - id: 1
          type: persistent-claim
          size: 100Gi
          deleteClaim: false
  zookeeper:
    replicas: 3
    storage:
      type: persistent-claim
      size: 100Gi
      deleteClaim: false
  entityOperator:
    topicOperator: {}
    userOperator: {}
A topic looks like this:
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: my-topic
  labels:
    strimzi.io/cluster: my-cluster
spec:
  partitions: 1
  replicas: 1
  config:
    retention.ms: 7200000
    segment.bytes: 1073741824
Both of these examples come from the examples directory of the Strimzi operator. This directory contains many more examples covering all of Strimzi's capabilities.
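Once the cluster operator is running, applying such a manifest is all it takes for Strimzi to reconcile the corresponding resources. Here is a minimal sketch; the file names are placeholders for whichever example manifests you saved locally, and the kafka namespace is an assumption:

# Apply the Kafka cluster and topic definitions, then check their status
kubectl apply -f kafka-my-cluster.yaml -n kafka
kubectl apply -f kafkatopic-my-topic.yaml -n kafka
kubectl get kafka,kafkatopic -n kafka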
Security
An interesting feature of Strimzi is its out-of-the-box security. By default, intra-broker communication is encrypted with TLS while communication with ZooKeeper is both authenticated and encrypted with mTLS.
The Apache ZooKeeper clusters backing the Kafka instances are not exposed outside of the Kubernetes cluster, providing additional security.
These configurations are actually impossible to override, though it is possible to access ZooKeeper by using a tweak project by scholzj.
Strimzi PodSets
Kubernetes comes with its own solution for managing distributed stateful applications: StatefulSets.
The official documentation states:
(A StatefulSet) manages the deployment and scaling of a set of Pods, and provides guarantees about the ordering and uniqueness of these Pods.
While StatefulSets have the advantage of being Kubernetes-native resources, they have some limitations.
Here are a few examples:
- Scaling up and down is linear. If you have a StatefulSet with 3 pods: pod-1, pod-2, pod-3, scaling up will create pod-4 and scaling down can only delete pod-4. This can be a problem if you need to remove a specific pod of your deployment. Applied to Kafka, you might end up in a situation where a bad topic makes a broker unstable; with StatefulSets you cannot delete this particular broker and scale out a new, fresh one.
- All the pods share the same specs (CPU, memory, number of PVCs, etc.)
- Major node failure requires manual intervention
These limitations were addressed by the Strimzi team by developing their own resource: the StrimziPodSet, a feature introduced in Strimzi 0.29.0.
The benefits of using StrimziPodSets include:
- Scaling up and down is more flexible
- Per-broker configuration
- Opens the gate for broker specialization once ZooKeeper-less Kafka is GA (KIP-500, more on this topic later in the article)
A drawback of using StrimziPodSets is that the Strimzi operator instance becomes critical.
If you want to hear more about StrimziPodSets, feel free to watch the StrimziPodSets – What it is and why should you care? video by Jakub Scholz.
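When StrimziPodSets were introduced in 0.29.0, they sat behind a feature gate that had to be enabled on the cluster operator. The snippet below is a sketch of the relevant environment variable on the operator Deployment; depending on your Strimzi version the gate may already be enabled by default, so check the release notes before relying on it.

# Excerpt of the Strimzi cluster operator Deployment (container env section)
env:
  - name: STRIMZI_FEATURE_GATES
    value: "+UseStrimziPodSets"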
Deploying Strimzi
Strimzi's Quickstart documentation is perfectly complete and functional.
We will focus the rest of the article on addressing useful topics that are not covered by Strimzi.
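For context, installing the operator itself boils down to a couple of commands. This is roughly what the quickstart describes at the time of writing; refer to the official documentation for the authoritative steps.

# Create a namespace and deploy the Strimzi cluster operator into it
kubectl create namespace kafka
kubectl create -f 'https://strimzi.io/install/latest?namespace=kafka' -n kafka
# Wait for the operator pod to become ready
kubectl get pods -n kafka --watch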
Kafka UI on top of Strimzi
Strimzi brings a lot of comfort for users when it comes to managing Kafka resources in Kubernetes. We wanted to bring something to the table by showing how to deploy a Kafka UI on top of a Strimzi cluster as a native Kubernetes resource.
There are several open source Kafka UI projects on GitHub. Let's go for Kafka UI (by Provectus), which has the cleanest UI (IMO) among the competition.
The project provides official Docker images as we can see in the documentation. We will leverage this image and deploy a Kafka UI instance as a Kubernetes Deployment.
The following YAML is an example of a Kafka UI instance configured against a SCRAM-SHA-512 authenticated Strimzi Kafka cluster. The UI itself is authenticated against an OpenLDAP server via ldaps.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-kafka-ui
  namespace: kafka
spec:
  selector:
    matchLabels:
      app: cluster-kafka-ui
  template:
    metadata:
      labels:
        app: cluster-kafka-ui
    spec:
      containers:
        - image: provectuslabs/kafka-ui:v0.4.0
          name: kafka-ui
          ports:
            - containerPort: 8080
          env:
            - name: KAFKA_CLUSTERS_0_NAME
              value: "cluster"
            - name: KAFKA_CLUSTERS_0_BOOTSTRAPSERVERS
              value: "cluster-kafka-bootstrap:9092"
            - name: KAFKA_CLUSTERS_0_PROPERTIES_SECURITY_PROTOCOL
              value: SASL_PLAINTEXT
            - name: KAFKA_CLUSTERS_0_PROPERTIES_SASL_MECHANISM
              value: SCRAM-SHA-512
            - name: KAFKA_CLUSTERS_0_PROPERTIES_SASL_JAAS_CONFIG
              value: 'org.apache.kafka.common.security.scram.ScramLoginModule required username="admin" password="XSnBiq6pkFNp";'
            - name: AUTH_TYPE
              value: LDAP
            - name: SPRING_LDAP_URLS
              value: ldaps://myldapinstance.company:636
            - name: SPRING_LDAP_DN_PATTERN
              value: uid={0},ou=People,dc=company
            - name: SPRING_LDAP_ADMINUSER
              value: uid=admin,ou=Apps,dc=company
            - name: SPRING_LDAP_ADMINPASSWORD
              value: Adm1nP@ssw0rd!
            - name: JAVA_OPTS
              value: "-Djdk.tls.client.cipherSuites=TLS_RSA_WITH_AES_128_GCM_SHA256 -Djavax.net.ssl.trustStore=/etc/kafka-ui/ssl/truststore.jks"
          volumeMounts:
            - name: truststore
              mountPath: /etc/kafka-ui/ssl
              readOnly: true
      volumes:
        - name: truststore
          secret:
            secretName: myldap-truststore
Note: By leveraging a PLAINTEXT internal listener on port 9092, we do not need to provide a KAFKA_CLUSTERS_0_PROPERTIES_SSL_TRUSTSTORE_LOCATION configuration.
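If the UI were pointed at the TLS listener (port 9093 in the cluster example above) instead, the client properties would need a truststore containing the Strimzi cluster CA. The sketch below follows the same PROPERTIES_* naming convention as the variables already used; the exact variable names, the SASL_SSL value and the truststore path are assumptions to verify against the Kafka UI documentation.

# Hypothetical additional env entries when targeting the TLS listener (9093);
# the truststore must contain the Strimzi cluster CA (exported from the
# <cluster-name>-cluster-ca-cert secret) and be mounted into the pod.
- name: KAFKA_CLUSTERS_0_PROPERTIES_SECURITY_PROTOCOL
  value: SASL_SSL
- name: KAFKA_CLUSTERS_0_PROPERTIES_SSL_TRUSTSTORE_LOCATION
  value: /etc/kafka-ui/ssl/kafka-truststore.jks
- name: KAFKA_CLUSTERS_0_PROPERTIES_SSL_TRUSTSTORE_PASSWORD
  value: changeit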
With this configuration, users have to authenticate to the Kafka UI via LDAP. Once they are logged in, the underlying user used for interactions with the Kafka cluster is the admin user defined in KAFKA_CLUSTERS_0_PROPERTIES_SASL_JAAS_CONFIG. Role-based access control was recently introduced with this issue.
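For completeness, the admin user referenced in the JAAS configuration can itself be declared through Strimzi. A minimal sketch, assuming the user and cluster names used above (Strimzi then generates the SCRAM password in a Secret named after the user):

apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaUser
metadata:
  name: admin
  labels:
    strimzi.io/cluster: cluster
spec:
  authentication:
    type: scram-sha-512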
Schema Registry with Strimzi
We had a functional need to deploy a Schema Registry instance for our Kafka clusters running in Kubernetes.
While Strimzi goes the extra mile by managing additional tools like Kafka Connect or MirrorMaker instances, it is not yet capable of deploying a Schema Registry.
To mitigate this issue, the Rubin Observatory Science Quality and Reliability Engineering team worked on the strimzi-registry-operator.
The configurations we used are the ones showcased in the example section of the README.
The only issue we encountered was that the operator is not yet capable of deploying a Schema Registry backed by a SCRAM-SHA-512 secured cluster.
What about ZooKeeper-less Kafka?
After several years of work on KIP-500, the Apache Kafka team finally announced that running Kafka in KRaft mode (ZooKeeper-less) became production ready. The announcement was made as part of the Kafka 3.3 release.
The Strimzi team started working on KRaft mode in Strimzi 0.29.0. As stated in the Strimzi documentation, the feature is still experimental, both at the Kafka and the Strimzi level.
Strimzi's main contributor, Jakub Scholz, commented the following on the matter:
I think calling it production ready for new clusters is a bit strange. It means that we would need to maintain two parallel code paths with guaranteed upgrades etc. for possibly a long time. So, TBH, I hoped we would have much more progress at this point in time and be more ready for the ZooKeeper removal. But as my personal opinion – I would probably be very reluctant to call anything at this stage production ready anyway.
Following these comments, we can guess that ZooKeeper-less Kafka is not going to be the default configuration in Strimzi in the next release (0.34.0 at the time of writing), but it will definitely happen in the future.
What about storage?
Storage is often a pain point with bare-metal Kubernetes clusters, and Kafka is no exception.
The community consensus for provisioning storage on Kubernetes is via Ceph with Rook, though other solutions exist (Longhorn or OpenEBS on the open source side, Portworx or Linstor as proprietary options).
Comparing storage engines for bare-metal Kubernetes clusters is too big a topic to be included in this article, but feel free to check out our previous article "Ceph object storage within a Kubernetes cluster with Rook" for more on Rook.
We did have the opportunity to compare performance between a 3-broker Kafka installation with Strimzi/Rook Ceph and a 3-broker Kafka cluster running on the same machines with direct disk access.
Here are the specs and results of the benchmark:
Specs
Kubernetes environment:
- Kafka version 3.2.0 on Kubernetes via Strimzi
- 3 brokers (one pod per node)
- 6 RBD devices per broker (provisioned by the Rook Ceph StorageClass)
- Xms: Java default (2g)
- Xmx: Java default (29g)
Bare-metal environment:
- Kafka version 3.2.0 as a JVM process with the Apache release
- 3 brokers (one JVM per node)
- 6 disks per broker (JBOD with ext4 formatting)
- Xms: Java default (2g)
- Xmx: Java default (29g)
Note: The benchmarks were run on the same machines (HP Gen 7 with 192 GB RAM and 6 x 2 TB disks) running RHEL 7.9. Kubernetes was not running while Kafka ran as a JVM process, and vice versa.
kafka-producer-perf-test \
  --topic my-topic-benchmark \
  --record-size 1000 \
  --throughput -1 \
  --producer.config /mnt/kafka.properties \
  --num-records 50000000
Note: The topic my-topic-benchmark has 100 partitions and 1 replica.
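On the Strimzi side, such a benchmark topic can be declared declaratively. A minimal sketch, assuming the my-cluster example from earlier (adjust the strimzi.io/cluster label to your cluster name):

apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: my-topic-benchmark
  labels:
    strimzi.io/cluster: my-cluster
spec:
  partitions: 100
  replicas: 1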
Results
We ran the previous benchmark 10 times on each configuration and averaged the results:
Metric | JBOD bare metal | Ceph RBD | Performance difference |
---|---|---|---|
Records/sec | 75223 | 65207 | – 13.3 % |
Avg latency (ms) | 1.45 | 1.28 | + 11.1 % |
The results are interesting: while write performance was better on JBOD, latency was lower using Ceph.
Strimzi alternatives
There are two main alternatives to Strimzi when it comes to running Kafka on Kubernetes: Koperator and the Confluent operator.
We did not test Koperator thoroughly, so it would be unfair to compare it to Strimzi in this article.
As for the Confluent operator, it provides many features that we do not have with Strimzi. Here are a few that we deemed interesting:
- Schema Registry integration
- ksqlDB integration
- LDAP authentication support
- Out-of-the-box UI (Confluent Control Center) for both admins and developers
- Alerting
- Tiered storage
All of these come at the cost (literally) of buying a commercial license from Confluent. Note that the operator and Control Center can be tested during a 30-day trial period.