Create a Kafka cluster for development with Docker Compose

COPPER Nguyen

This article assumes you have basic knowledge of Kafka and Docker Compose. For a development environment there is no strict requirement on high availability or durability; we will simply try to make our cluster resemble the production and staging environments as much as possible, without worrying about hardware failures.

Our infrastructure architecture is very simple: one virtual machine runs the application, and another virtual machine runs the database, caching engine, search engine, etc. The application will connect to Kafka using a private IP address.

First, create a kafka-cluster directory where we will store the assets needed to run the Docker containers.

[root@localhost ~]# mkdir kafka-cluster
[root@localhost ~]# cd kafka-cluster

Next, download and extract Kafka into the current directory.

[root@localhost kafka-cluster]# wget https://downloads.apache.org/kafka/3.3.1/kafka_2.13-3.3.1.tgz

kafka_2.13-3.3.1.tgz    100%[============================>] 100.19M  2.72MB/s    in 45s     

2022-11-15 10:22:59 (2.24 MB/s) - ‘kafka_2.13-3.3.1.tgz’ saved [105053134/105053134]

[root@localhost kafka-cluster]# tar -xf kafka_2.13-3.3.1.tgz 
[root@localhost kafka-cluster]# ll
total 102592
drwxr-xr-x. 7 root root       105 Sep 30 02:06 kafka_2.13-3.3.1
-rw-r--r--. 1 root root 105053134 Oct  3 06:05 kafka_2.13-3.3.1.tgz
[root@localhost kafka-cluster]# 

We will create the configfiles and dockerfiles directories to store the properties files for ZooKeeper and Kafka, and the Dockerfiles for them.

[root@localhost kafka-cluster]# mkdir configfiles dockerfiles

Let's prepare the config file for ZooKeeper. Our ZooKeeper is private by default and does not expose any port to the host. The file is located at configfiles/zookeeper.properties. We will mount this file into the container when it runs.

# the directory where the snapshot is stored.
dataDir=/tmp/zookeeper
# the port at which the clients will connect
clientPort=14000
# disable the per-ip limit on the number of connections since this is a non-production config
maxClientCnxns=0
# Disable the adminserver by default to avoid port conflicts.
# Set the port to something non-conflicting if choosing to enable this
admin.enableServer=false
# admin.serverPort=8080

Next, we will create a Dockerfile for ZooKeeper to build the ZooKeeper image. The file will be located at dockerfiles/ZookeeperDockerfile.

FROM openjdk:20-slim-buster
WORKDIR /app
COPY . .
CMD ./bin/zookeeper-server-start.sh config/zookeeper.properties

Next, we will create docker-compose.yml to set up the zookeeper service and its volume, and to mount the configuration file into the container. This file is located in the current directory.

version: "3.9"

networks:
  default:
    name: kafka
    driver: bridge

volumes:
  zookeeperdata:

services:
  zookeeper:
    build:
      context: kafka_2.13-3.3.1
      dockerfile: ../dockerfiles/ZookeeperDockerfile
    volumes:
      - zookeeperdata:/tmp/zookeeper
      - ./configfiles/zookeeper.properties:/app/config/zookeeper.properties

Now the current directory tree looks like this.

[root@localhost kafka-cluster]# tree .
.
├── configfiles
│   └── zookeeper.properties
├── docker-compose.yml
├── dockerfiles
│   └── ZookeeperDockerfile
├── kafka_2.13-3.3.1
│   ├── bin
│   │   ├── connect-distributed.sh
│   │   ├── connect-mirror-maker.sh
│   │   ├── connect-standalone.sh
│   │   ├── kafka-acls.sh
│   │   ├── kafka-broker-api-versions.sh
│   │   ├── kafka-cluster.sh
│   │   ├── kafka-configs.sh
│   │   ├── kafka-console-consumer.sh
│   │   ├── kafka-console-producer.sh
│   │   ├── kafka-consumer-groups.sh
│   │   ├── kafka-consumer-perf-test.sh
│   │   ├── kafka-delegation-tokens.sh
│   │   ├── kafka-delete-records.sh
│   │   ├── kafka-dump-log.sh
│   │   ├── kafka-features.sh
│   │   ├── kafka-get-offsets.sh

Now we can build and run the zookeeper container.

[root@localhost kafka-cluster]# docker compose up -d --build zookeeper
[+] Building 3.0s (8/8) FINISHED                                                                                   
[root@localhost kafka-cluster]#

You can check the logs to make sure it's running.

[root@localhost kafka-cluster]# docker compose ps
NAME                        COMMAND                  SERVICE             
kafka-cluster-zookeeper-1   "/bin/sh -c './bin/z…"   zookeeper           
[root@localhost kafka-cluster]# docker compose logs -f zookeeper

Next we will set up three Kafka brokers. Let's assume the private IP address of our virtual machine is 10.0.2.15. We will have 3 containers named leesin, garen and temo; each one will expose a different port on the host machine: 4000, 5000 and 6000.

The Dockerfile for each broker will look like this; the file is located at dockerfiles/KafkaDockerfile.

FROM openjdk:20-slim-buster
WORKDIR /app
COPY . .
CMD ./bin/kafka-server-start.sh config/server.properties
Kafka Broker Dockerfile

For each broker, we will prepare a server.properties file defining the port it listens on for inter-broker communication and the port for receiving connections from external clients.

For the leesin broker, we put server.properties into configfiles/leesin/server.properties and mount the configuration file into the container when it starts.

Its content looks like this.

broker.id=1
zookeeper.connect=zookeeper:14000
zookeeper.connection.timeout.ms=18000
sasl.enabled.mechanisms=PLAIN
sasl.mechanism.inter.broker.protocol=PLAIN
listener.name.internal.plain.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \
 username="admin" \
 password="admin-secret" \
 user_admin="admin-secret";

listener.name.external.plain.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \
     username="admin" \
     password="admin-secret" \
 user_admin="admin-secret";

listeners=INTERNAL://:9092,EXTERNAL://:4000
advertised.listeners=INTERNAL://leesin:9092,EXTERNAL://10.0.2.15:4000
inter.broker.listener.name=INTERNAL
listener.security.protocol.map=INTERNAL:SASL_PLAINTEXT,EXTERNAL:SASL_PLAINTEXT
num.network.threads=3
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
log.dirs=/tmp/kafka-logs
num.partitions=3
num.recovery.threads.per.data.dir=1
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1
log.flush.interval.messages=10000
log.flush.interval.ms=1000
log.retention.hours=168
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
group.initial.rebalance.delay.ms=0
configfiles/leesin/server.properties

Two things to keep an eye on are the listeners and advertised.listeners values. listeners defines the addresses the broker binds to inside the container, while advertised.listeners is what the broker tells clients to connect to: internal clients (the other brokers) use the Docker network hostname leesin, and external clients use the VM's private IP.

listeners=INTERNAL://:9092,EXTERNAL://:4000
advertised.listeners=INTERNAL://leesin:9092,EXTERNAL://10.0.2.15:4000

For now, our docker-compose.yml looks like this.

version: "3.9"

networks:
  default:
    name: kafka
    driver: bridge

volumes:
  leesindata:
  zookeeperdata:

services:
  zookeeper:
    build: 
      context: kafka_2.13-3.3.1
      dockerfile: ../dockerfiles/ZookeeperDockerfile
    volumes:
      - zookeeperdata:/tmp/zookeeper
      - ./configfiles/zookeeper.properties:/app/config/zookeeper.properties
  
  leesin:
    build: 
      context: kafka_2.13-3.3.1
      dockerfile: ../dockerfiles/KafkaDockerfile
    volumes:
      - leesindata:/tmp/kafka-logs
      - ./configfiles/leesin/server.properties:/app/config/server.properties
    ports:
      - 4000:4000
docker-compose.yml

Before doing that, if you have set up a firewall, make sure you allow incoming traffic on ports 4000, 5000 and 6000.
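On a RHEL-like host with firewalld (an assumption on our part; adapt for ufw or iptables), the rules could be prepared like this. The kafka_firewall_rules helper is a hypothetical name of our own, not part of Kafka or firewalld:

```shell
# Sketch for a firewalld host (assumption); adapt for ufw or iptables.
# kafka_firewall_rules prints one rule per broker port, plus a reload,
# so you can review the commands before applying them.
kafka_firewall_rules() {
  for port in 4000 5000 6000; do
    echo "firewall-cmd --permanent --add-port=${port}/tcp"
  done
  echo "firewall-cmd --reload"
}

kafka_firewall_rules          # review the commands
# kafka_firewall_rules | sh   # apply them (run as root)
```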


Now start the leesin broker.

[root@localhost kafka-cluster]# docker compose up -d leesin

You can check the logs and see that the leesin broker works.

[root@localhost kafka-cluster]# docker compose ps
NAME                        COMMAND                  SERVICE             STATUS              PORTS
kafka-cluster-leesin-1      "/bin/sh -c './bin/k…"   leesin              running             0.0.0.0:4000->4000/tcp, :::4000->4000/tcp
kafka-cluster-zookeeper-1   "/bin/sh -c './bin/z…"   zookeeper           running             
[root@localhost kafka-cluster]#

Next, create server.properties files for the two remaining brokers, garen and temo.

The garen broker's file looks like this; it is located at configfiles/garen/server.properties (the only differences from leesin's configuration are broker.id, listeners and advertised.listeners).

broker.id=2
zookeeper.connect=zookeeper:14000
zookeeper.connection.timeout.ms=18000
sasl.enabled.mechanisms=PLAIN
sasl.mechanism.inter.broker.protocol=PLAIN
listener.name.internal.plain.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \
 username="admin" \
 password="admin-secret" \
 user_admin="admin-secret";

listener.name.external.plain.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \
     username="admin" \
     password="admin-secret" \
 user_admin="admin-secret";

listeners=INTERNAL://:9092,EXTERNAL://:5000
advertised.listeners=INTERNAL://garen:9092,EXTERNAL://10.0.2.15:5000
inter.broker.listener.name=INTERNAL
listener.security.protocol.map=INTERNAL:SASL_PLAINTEXT,EXTERNAL:SASL_PLAINTEXT
num.network.threads=3
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
log.dirs=/tmp/kafka-logs
num.partitions=3
num.recovery.threads.per.data.dir=1
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1
log.flush.interval.messages=10000
log.flush.interval.ms=1000
log.retention.hours=168
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
group.initial.rebalance.delay.ms=0
configfiles/garen/server.properties

The temo broker's file looks like this; it is located at configfiles/temo/server.properties.

broker.id=3
zookeeper.connect=zookeeper:14000
zookeeper.connection.timeout.ms=18000
sasl.enabled.mechanisms=PLAIN
sasl.mechanism.inter.broker.protocol=PLAIN
listener.name.internal.plain.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \
 username="admin" \
 password="admin-secret" \
 user_admin="admin-secret";

listener.name.external.plain.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \
     username="admin" \
     password="admin-secret" \
 user_admin="admin-secret";

listeners=INTERNAL://:9092,EXTERNAL://:6000
advertised.listeners=INTERNAL://temo:9092,EXTERNAL://10.0.2.15:6000
inter.broker.listener.name=INTERNAL
listener.security.protocol.map=INTERNAL:SASL_PLAINTEXT,EXTERNAL:SASL_PLAINTEXT
num.network.threads=3
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
log.dirs=/tmp/kafka-logs
num.partitions=3
num.recovery.threads.per.data.dir=1
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1
log.flush.interval.messages=10000
log.flush.interval.ms=1000
log.retention.hours=168
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
group.initial.rebalance.delay.ms=0
configfiles/temo/server.properties
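Since the three server.properties files differ only in broker.id, the EXTERNAL listener port and the advertised host name, they could also be derived from leesin's file with a small sed helper (a sketch; derive_config is our own hypothetical name):

```shell
# Hypothetical helper: reads leesin's server.properties on stdin and rewrites
# the three broker-specific settings for another broker.
derive_config() { # usage: derive_config <broker.id> <name> <external-port>
  sed -e "s/^broker.id=.*/broker.id=$1/" \
      -e "s|^listeners=.*|listeners=INTERNAL://:9092,EXTERNAL://:$3|" \
      -e "s|^advertised.listeners=.*|advertised.listeners=INTERNAL://$2:9092,EXTERNAL://10.0.2.15:$3|"
}

# e.g., from the kafka-cluster directory:
# derive_config 2 garen 5000 < configfiles/leesin/server.properties > configfiles/garen/server.properties
# derive_config 3 temo  6000 < configfiles/leesin/server.properties > configfiles/temo/server.properties
```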

For now, the directory tree looks like this.

[root@localhost kafka-cluster]# tree
.
├── configfiles
│   ├── garen
│   │   └── server.properties
│   ├── leesin
│   │   └── server.properties
│   ├── temo
│   │   └── server.properties
│   └── zookeeper.properties
├── docker-compose.yml
├── dockerfiles
│   ├── KafkaDockerfile
│   └── ZookeeperDockerfile
├── kafka_2.13-3.3.1
│   ├── bin
│   │   ├── connect-distributed.sh
│   │   ├── connect-mirror-maker.sh

│   │   ├── kafka-server-start.sh
│   │   ├── zookeeper-security-migration.sh
│   │   ├── zookeeper-server-start.sh
│   │   ├── zookeeper-server-stop.sh
│   │   └── zookeeper-shell.sh
│   ├── config
│   │   ├── connect-console-sink.properties
│   │   ├── connect-console-source.properties
│   │   ├── connect-distributed.properties
│   │   ├── connect-file-sink.properties

Update docker-compose.yml to add the garen and temo services. The file's content now looks like this.

version: "3.9"

networks:
  default:
    name: kafka
    driver: bridge

volumes:
  leesindata:
  garendata:
  temodata:
  zookeeperdata:

services:
  zookeeper:
    build: 
      context: kafka_2.13-3.3.1
      dockerfile: ../dockerfiles/ZookeeperDockerfile
    volumes:
      - zookeeperdata:/tmp/zookeeper
      - ./configfiles/zookeeper.properties:/app/config/zookeeper.properties
  
  leesin:
    build: 
      context: kafka_2.13-3.3.1
      dockerfile: ../dockerfiles/KafkaDockerfile
    volumes:
      - leesindata:/tmp/kafka-logs
      - ./configfiles/leesin/server.properties:/app/config/server.properties
    ports:
      - 4000:4000
    
  garen:
    build: 
      context: kafka_2.13-3.3.1
      dockerfile: ../dockerfiles/KafkaDockerfile
    volumes:
      - garendata:/tmp/kafka-logs
      - ./configfiles/garen/server.properties:/app/config/server.properties
    ports:
      - 5000:5000

  temo:
    build: 
      context: kafka_2.13-3.3.1
      dockerfile: ../dockerfiles/KafkaDockerfile
    volumes:
      - temodata:/tmp/kafka-logs
      - ./configfiles/temo/server.properties:/app/config/server.properties
    ports:
      - 6000:6000
docker-compose.yml
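One optional refinement, not in the original file (our assumption): declare that each broker service depends on zookeeper so Compose starts the ZooKeeper container first. Note that depends_on only orders container startup; it does not wait for ZooKeeper to be ready.

```yaml
  leesin:
    depends_on:
      - zookeeper
  # ...repeat for garen and temo
```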

Time to start them.

[root@localhost kafka-cluster]# docker compose up -d garen temo

Verify that these containers are running.

[root@localhost kafka-cluster]# docker compose ps
NAME                        COMMAND                  SERVICE             STATUS              PORTS
kafka-cluster-garen-1       "/bin/sh -c './bin/k…"   garen               running             0.0.0.0:5000->5000/tcp, :::5000->5000/tcp
kafka-cluster-leesin-1      "/bin/sh -c './bin/k…"   leesin              running             0.0.0.0:4000->4000/tcp, :::4000->4000/tcp
kafka-cluster-temo-1        "/bin/sh -c './bin/k…"   temo                running             0.0.0.0:6000->6000/tcp, :::6000->6000/tcp
kafka-cluster-zookeeper-1   "/bin/sh -c './bin/z…"   zookeeper           running             
[root@localhost kafka-cluster]#

Now we should set up a user interface to check the current Kafka cluster state. I will use an open-source tool from https://github.com/provectus/kafka-ui

In the docker-compose.yml file, I will add this service.

ui: 
  image: provectuslabs/kafka-ui:latest
  ports:
    - "8080:8080"
  env_file:
    - env-files/ui.env

The env file looks like this. It is located at env-files/ui.env.

KAFKA_CLUSTERS_0_NAME=kafka-cluster
KAFKA_CLUSTERS_0_BOOTSTRAPSERVERS=10.0.2.15:4000
KAFKA_CLUSTERS_0_ZOOKEEPER=zookeeper:14000
KAFKA_CLUSTERS_0_PROPERTIES_SECURITY_PROTOCOL=SASL_PLAINTEXT
KAFKA_CLUSTERS_0_PROPERTIES_SASL_MECHANISM=PLAIN
KAFKA_CLUSTERS_0_PROPERTIES_SASL_JAAS_CONFIG=org.apache.kafka.common.security.plain.PlainLoginModule required username="admin" password="admin-secret";
env-files/ui.env

Start this service.

[root@localhost kafka-cluster]# docker compose up -d ui

Now visit 10.0.2.15:8080 to see the cluster status. Using the UI, you can also create some topics and push messages to them; these steps make sure our cluster works as expected.
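Alternatively, you can smoke-test the cluster from the host with the CLI tools shipped in the Kafka download. The brokers require SASL/PLAIN, so first write a client config with the credentials from server.properties (the file name client.properties and the topic name smoke-test are our own choices):

```shell
# Client config matching the brokers' SASL/PLAIN setup.
cat > client.properties <<'EOF'
security.protocol=SASL_PLAINTEXT
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \
  username="admin" \
  password="admin-secret";
EOF

# Then, against the running cluster:
# ./kafka_2.13-3.3.1/bin/kafka-topics.sh --bootstrap-server 10.0.2.15:4000 \
#     --command-config client.properties --create --topic smoke-test \
#     --partitions 3 --replication-factor 3
# ./kafka_2.13-3.3.1/bin/kafka-console-producer.sh --bootstrap-server 10.0.2.15:4000 \
#     --producer.config client.properties --topic smoke-test
# ./kafka_2.13-3.3.1/bin/kafka-console-consumer.sh --bootstrap-server 10.0.2.15:4000 \
#     --consumer.config client.properties --topic smoke-test --from-beginning
```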

I pushed the source code to GitHub; you can download it here.

GitHub - dongnguyenltqb/kafka_2.13-3.3.1-docker-compose

Thanks for reading!