How To Deploy Apache Kafka With Kubernetes

Get started with Kafka by deploying with Kubernetes locally before moving to the cloud.

Kafka is the de facto event store and distributed message broker solution for large microservice architecture systems. Kubernetes is the industry standard for orchestrating containerized services. For many organizations, deploying Kafka on Kubernetes is a low-effort approach that fits within their architecture strategy.

In this post, we’ll look at the appeal of hosting Kafka on Kubernetes, providing a quick primer on both applications. Finally, we’ll walk through a cloud-agnostic method to configure Kubernetes for deploying Kafka and its sibling services.

A Primer on the Tech

Let’s start with a quick overview of Kubernetes and Kafka.

Kubernetes

Kubernetes (k8s) grew out of Borg, an internal cluster manager that Google started developing around 2003. In 2014, Google released k8s as an open-source project on GitHub, and it quickly picked up partners such as Microsoft, Red Hat, and Docker. Over the years, more and more projects adopted Kubernetes, including GitHub itself and the popular game Pokémon GO. In 2022, we see k8s usage growing in the AI/ML space, along with an increasing emphasis on security.

The introduction of k8s into the cloud development lifecycle provided several key benefits:

  • Zero downtime deployments
  • Scalability
  • Immutable infrastructure
  • Self-healing systems

Many of these benefits come from the use of declarative configuration in k8s. To change an infrastructure configuration, resources are typically destroyed and rebuilt rather than modified in place, which enforces immutability. In addition, if k8s detects resources that have drifted from the declared specification, it attempts to bring the system back to the state that the specification describes.
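
The idea of reconciling actual state against a declared specification can be sketched in a few lines of Python. This is only an illustration of the concept, not how k8s is implemented; the Spec class, the reconcile function, and the image names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Spec:
    """Declared (desired) state: which image to run and how many copies."""
    image: str
    replicas: int


def reconcile(desired: Spec, running: list[str]) -> list[str]:
    """Return the list of running images after one reconcile pass.

    Instances are never modified in place: anything that does not match
    the declared image is dropped, then replacements are created (or
    extras removed) until the count matches the spec.
    """
    # Keep only the instances that already match the declared image.
    matching = [image for image in running if image == desired.image]
    # Scale down if there are too many matching instances.
    matching = matching[:desired.replicas]
    # Scale up by creating fresh instances from the declared image.
    missing = desired.replicas - len(matching)
    return matching + [desired.image] * missing


# Hypothetical image names, used only for illustration.
spec = Spec(image="kafka:v2", replicas=3)
# One instance is out of date and one is missing entirely.
observed = ["kafka:v2", "kafka:v1"]
print(reconcile(spec, observed))
```

Running the pass again on its own output changes nothing, which is the property that lets a system keep correcting drift without anyone patching resources by hand.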

For developers, using k8s means finally putting an end to the frustrating midnight deployments where you have to drop everything to scale up services or patch production environments directly by hand.

Apache Kafka

Kafka is an open-source distributed stream processing tool. Kafka allows multiple “producers” to add messages (key-value pairs) to “topics”. Within a topic, messages are kept in the order they arrive, much like a queue (strictly speaking, Kafka guarantees this order within each partition of a topic). “Consumers” subscribe to the topic and retrieve messages in the order they arrived.
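
To make the producer, topic, and consumer roles concrete, here is a minimal in-memory sketch. It is not the Kafka client API: the Topic and Consumer classes are hypothetical, and the sketch ignores partitions, persistence, and networking.

```python
class Topic:
    """An in-memory stand-in for one topic: an append-only list of messages."""

    def __init__(self, name):
        self.name = name
        self._log = []  # (key, value) pairs in arrival order

    def produce(self, key, value):
        """Append a message and return its offset (position in the log)."""
        self._log.append((key, value))
        return len(self._log) - 1

    def read_from(self, offset):
        """Return every message at or after the given offset."""
        return self._log[offset:]


class Consumer:
    """Tracks its own position so that each consumer reads independently."""

    def __init__(self, topic):
        self.topic = topic
        self.offset = 0

    def poll(self):
        """Return messages not yet seen by this consumer, in arrival order."""
        messages = self.topic.read_from(self.offset)
        self.offset += len(messages)
        return messages


topic = Topic("test")
topic.produce("user-1", "signed up")  # from one producer
topic.produce("user-2", "signed up")  # from another producer

first = Consumer(topic)
second = Consumer(topic)
print(first.poll())   # both messages, in arrival order
print(first.poll())   # nothing new, so an empty list
print(second.poll())  # an independent consumer still sees both messages
```

Reading does not remove anything from the log. Each consumer only advances its own offset, which is why several consumers can read the same topic without interfering with one another.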

Kafka is hosted on a server typically called a “broker.” There can be many different Kafka brokers in different regions. In addition to Kafka brokers, another service named Zookeeper keeps different brokers in sync and helps coordinate topics and messages.

The brilliance of Kafka is that it can handle hundreds of thousands of messages a second while being distributed and at a relatively cheap cost per MB.

Kafka often occupies a spot akin to the “central nervous system” of a microservice architecture. Messages are passed along between producers and consumers, which in reality, are services inside your cloud. The same service could both be a consumer and a producer of messages from the same or different topics inside Kafka.

An example use case is creating a new user in your application. The User service publishes a message on a “Provision User” topic. The Email service consumes this message about a new user and then sends a welcome email to them. The User and Email services did not have to directly message each other, but their respective jobs were executed asynchronously.
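
The decoupling in this example can be imitated with Python's standard library. In the sketch below, a queue.Queue stands in for the “Provision User” topic and a background thread plays the Email service. The function names, the event fields, and the email address are all made up for illustration.

```python
import queue
import threading

provision_user = queue.Queue()  # stand-in for the "Provision User" topic
sent_emails = []                # records what the Email service did


def user_service(email):
    """Create a user, then publish an event instead of calling Email directly."""
    provision_user.put({"event": "user_created", "email": email})


def email_service():
    """Consume events until a None sentinel arrives."""
    while True:
        message = provision_user.get()
        if message is None:
            break
        sent_emails.append(f"Welcome, {message['email']}!")


worker = threading.Thread(target=email_service)
worker.start()
user_service("ada@example.com")
provision_user.put(None)  # tell the consumer to stop
worker.join()
print(sent_emails)
```

Neither function refers to the other. The only thing they share is the topic, so either side could be replaced or scaled without touching the other.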

Deploying Kafka With Kubernetes

For our mini project walkthrough, we’ll set up Kubernetes and Kafka in a cloud-neutral way using Minikube, which allows us to run an entire k8s cluster on a single machine. The steps below use Minikube, kubectl, and KCat, so all three need to be installed first.

Minikube Setup

With Minikube installed, we can start it with the minikube start command. Then, we can see the status:

$ minikube status

minikube

type: Control Plane
host: Running
kubelet: Running
apiserver: Running
kubeconfig: Configured

Instructions for setting up Kubernetes to run in your cloud provider of choice can be found in the documentation for each provider (for example, AWS, GCP, or Azure), but the YAML configuration files listed below should work across all providers, with minor adjustments for IP addresses and related fields.

Defining a Kafka Namespace

First, we define a namespace for deploying all Kafka resources, using a file named 00-namespace.yaml:

apiVersion: v1
kind: Namespace
metadata:
  name: "kafka"
  labels:
    name: "kafka"

We apply this file using kubectl apply -f 00-namespace.yaml.

We can test that the namespace was created correctly by running kubectl get namespaces and verifying that kafka appears in the list of namespaces in Minikube.

Deploying Zookeeper

Next, we deploy Zookeeper to our k8s namespace. We create a file named 01-zookeeper.yaml with the following contents:

apiVersion: v1
kind: Service
metadata:
  labels:
    app: zookeeper-service
  name: zookeeper-service
  namespace: kafka
spec:
  type: NodePort
  ports:
    - name: zookeeper-port
      port: 2181
      nodePort: 30181
      targetPort: 2181
  selector:
    app: zookeeper
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: zookeeper
  name: zookeeper
  namespace: kafka
spec:
  replicas: 1
  selector:
    matchLabels:
      app: zookeeper
  template:
    metadata:
      labels:
        app: zookeeper
    spec:
      containers:
        - image: wurstmeister/zookeeper
          imagePullPolicy: IfNotPresent
          name: zookeeper
          ports:
            - containerPort: 2181

There are two resources created in this YAML file. The first is the service called zookeeper-service, which routes traffic to the deployment created in the second resource, named zookeeper. The deployment uses the wurstmeister/zookeeper Docker image for the actual Zookeeper binary. The service exposes that deployment on the internal k8s network on the standard Zookeeper port of 2181, which the Docker container also exposes. Because the service type is NodePort, Zookeeper is also reachable on each cluster node at port 30181.

We apply this file with the following command: kubectl apply -f 01-zookeeper.yaml.

We can test for the successfully created service as follows:

$ kubectl get services -n kafka
NAME               TYPE      CLUSTER-IP     PORT(S)         AGE
zookeeper-service  NodePort  10.100.69.243  2181:30181/TCP  3m4s

We see the internal IP address of Zookeeper (10.100.69.243), which we’ll need in the next step to tell the broker where to find Zookeeper.

Deploying a Kafka Broker

The last step is to deploy a Kafka broker. We create a 02-kafka.yaml file with the following contents, but we replace <ZOOKEEPER-INTERNAL-IP> with the CLUSTER-IP of the Zookeeper service from the previous step. The broker will fail to deploy if this step is not taken.

apiVersion: v1
kind: Service
metadata:
  labels:
    app: kafka-broker
  name: kafka-service
  namespace: kafka
spec:
  ports:
  - port: 9092
  selector:
    app: kafka-broker
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: kafka-broker
  name: kafka-broker
  namespace: kafka
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kafka-broker
  template:
    metadata:
      labels:
        app: kafka-broker
    spec:
      hostname: kafka-broker
      containers:
      - env:
        - name: KAFKA_BROKER_ID
          value: "1"
        - name: KAFKA_ZOOKEEPER_CONNECT
          value: <ZOOKEEPER-INTERNAL-IP>:2181
        - name: KAFKA_LISTENERS
          value: PLAINTEXT://:9092
        - name: KAFKA_ADVERTISED_LISTENERS
          value: PLAINTEXT://kafka-broker:9092
        image: wurstmeister/kafka
        imagePullPolicy: IfNotPresent
        name: kafka-broker
        ports:
        - containerPort: 9092
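
If you would rather not edit the placeholder by hand, the substitution can be scripted. The following is a rough sketch, not part of the original setup: render_manifest and lookup_zookeeper_ip are hypothetical helper names, and the lookup assumes that kubectl is on the PATH and pointed at the Minikube cluster.

```python
import subprocess

PLACEHOLDER = "<ZOOKEEPER-INTERNAL-IP>"


def render_manifest(template: str, zookeeper_ip: str) -> str:
    """Return the manifest text with the placeholder replaced by the IP."""
    if PLACEHOLDER not in template:
        raise ValueError(f"{PLACEHOLDER} not found in template")
    return template.replace(PLACEHOLDER, zookeeper_ip)


def lookup_zookeeper_ip() -> str:
    """Ask kubectl for the CLUSTER-IP of zookeeper-service.

    This only works against a running cluster, so it is not called here.
    """
    result = subprocess.run(
        ["kubectl", "get", "service", "zookeeper-service", "-n", "kafka",
         "-o", "jsonpath={.spec.clusterIP}"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


# Demonstrate the substitution on a single line of the manifest,
# using the CLUSTER-IP from the example output earlier in this post.
snippet = "value: <ZOOKEEPER-INTERNAL-IP>:2181"
print(render_manifest(snippet, "10.100.69.243"))
```

In practice, you would read the full 02-kafka.yaml template from disk, pass it through render_manifest together with the result of lookup_zookeeper_ip, and write the output to the file that you then apply.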

Again, we are creating two resources, a service and a deployment, for a single Kafka broker. We run kubectl apply -f 02-kafka.yaml, then verify the result by listing the pods in our namespace:

$ kubectl get pods -n kafka
NAME                            READY   STATUS    RESTARTS   AGE
kafka-broker-5c55f544d4-hrgnv   1/1     Running   0          48s
zookeeper-55b668879d-xc8vd      1/1     Running   0          35m

The Kafka broker pod might take a minute to move from ContainerCreating status to Running status.

Notice the line in 02-kafka.yaml where we provide a value for KAFKA_ADVERTISED_LISTENERS. The broker tells every client to connect to it at this advertised hostname (kafka-broker). For clients on our local machine to resolve that hostname, we need to add the following entry to the /etc/hosts file on our local machine:

127.0.0.1 kafka-broker

Testing Kafka Topics

In order to test that we can send and retrieve messages from a topic in Kafka, we will need to expose a port for Kafka to make it accessible from localhost. We run the following command to expose a port:

$ kubectl port-forward kafka-broker-5c55f544d4-hrgnv 9092 -n kafka
Forwarding from 127.0.0.1:9092 -> 9092
Forwarding from [::1]:9092 -> 9092

In the above command, kafka-broker-5c55f544d4-hrgnv references the k8s pod that we saw earlier when we listed the pods in our kafka namespace. The command makes port 9092 of that pod available outside of the Minikube k8s cluster at localhost:9092.
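
Before sending any messages, it can help to confirm that the forwarded port is reachable and that the kafka-broker hostname resolves. The check below is a small sketch that uses only Python's standard library; port_is_open is a hypothetical helper name. It verifies TCP connectivity only, not that Kafka itself is healthy.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and hostnames that do not resolve.
        return False


if __name__ == "__main__":
    # "kafka-broker" only resolves if the /etc/hosts entry is in place.
    for host in ("localhost", "kafka-broker"):
        state = "reachable" if port_is_open(host, 9092) else "not reachable"
        print(f"{host}:9092 is {state}")
```

If localhost is reachable but kafka-broker is not, the port-forward is working and the /etc/hosts entry is the likely culprit.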

To easily send and retrieve messages from Kafka, we’ll use a command-line tool named KCat (formerly Kafkacat). To create a message and a topic named test, we run the following command:

$ echo "hello world!" | kafkacat -P -b localhost:9092 -t test

The command should execute without errors, indicating that producers are communicating fine with Kafka in k8s. How do we see what messages are currently on the topic named test? We run the following command:

$ kafkacat -C -b localhost:9092 -t test

hello world!
% Reached end of topic test [0] at offset 1

We did it! We have successfully deployed Kafka with Kubernetes!

Next Steps

Kafka is a key player for organizations that are interested in implementing real-time, event-driven architectures and systems. Deploying Kafka with Kubernetes is a great start, but organizations will also need to figure out how to make Kafka work seamlessly and securely with their existing API ecosystems. What they’ll need are tools to handle management and security for the entire lifecycle of their APIs.

Among the providers out there, I came across Gravitee, one of the leading solutions that’s particularly focused on helping organizations manage, secure, govern, and productize their API ecosystems, no matter what protocols, services, or styles they’re building on top of. Gravitee even has a Kafka connector that ingests data by exposing endpoints that transform requests into messages that can then be published to your Kafka topic. It can also stream Kafka events to consumers with web-friendly protocols like WebSocket.

Conclusion

In this article, we’ve talked about how Kafka helps choreograph microservice architectures by being a central nervous system relaying messages to and from many different services. For deploying Kafka, we’ve looked at Kubernetes, a powerful container orchestration platform that you can run locally (with Minikube) or in production environments with cloud providers. Lastly, we demonstrated how to use Minikube to set up a local Kubernetes cluster, deploy Kafka, and then verify a successful deployment and configuration using KCat.

Happy deploying!