The configuration of a Kafka cluster consists of two parts: the setup of the cluster itself and the setup of each topic that is published in the cluster. Specific settings for Kafka producers and Kafka consumers (for example, acknowledgements) are not covered in this article.

Topics

Kafka organizes messages in topics. A topic is a log of messages, which are called records:

A topic is a log of records

  • New records are appended at the end of the log.
  • To read a certain record from a topic, the request must contain the offset of that record.
  • After seeking to an offset, records can be read sequentially (see the sketch after this list).
  • Records are immutable: they cannot be changed once they are stored.
  • Records are kept for a retention period before they are deleted. By default, Kafka in cenTaur does not delete any records (retention period = "-1"). 
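
As a sketch of how records are read starting at a given offset, the following Java snippet uses the Kafka consumer client to seek to an offset and read sequentially from there. The topic name "example-topic", partition 0, offset 42 and the broker address "localhost:9092" are assumptions for illustration only.

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class ReadFromOffset {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                // Assign one partition of the (assumed) topic and seek to offset 42.
                TopicPartition partition = new TopicPartition("example-topic", 0);
                consumer.assign(List.of(partition));
                consumer.seek(partition, 42L);

                // After the seek, records are read sequentially from that offset onwards.
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
                }
            }
        }
    }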

Producer, broker and consumer

Kafka is a distributed platform that consists of producers, brokers and consumers. Each of these can have multiple instances:

  • Message broker: Receives records, stores them in topics and distributes them on request. If there is more than one broker, they form a Kafka cluster.
  • Producer: Creates messages and publishes them to a certain topic. Producers can distribute messages to several topics. A producer can request an acknowledgement from a broker that a message was stored and replicated (see the sketch after this list).
  • Consumer: Subscribes to one or more topics. If several consumers are organized in one group for a topic, the processing of the records is distributed among them. Consumers can acknowledge that they have read a message.
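
As a sketch of how a producer requests an acknowledgement, the following Java snippet publishes one record with acks=all and waits for the broker to confirm that the record was stored and replicated. The topic name "example-topic", the key and value, and the broker address "localhost:9092" are assumptions for illustration only.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.clients.producer.RecordMetadata;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class PublishWithAck {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            // "all": the leading broker acknowledges only after the in-sync replicas stored the record.
            props.put(ProducerConfig.ACKS_CONFIG, "all");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                ProducerRecord<String, String> record =
                        new ProducerRecord<>("example-topic", "key-1", "hello"); // assumed topic name
                // send() is asynchronous; get() blocks until the broker's acknowledgement arrives.
                RecordMetadata metadata = producer.send(record).get();
                System.out.printf("stored at partition=%d offset=%d%n",
                        metadata.partition(), metadata.offset());
            }
        }
    }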

Partitions

Kafka uses partitions for the distributed storage of a topic on multiple broker instances:

Partitions of a topic are distributed across brokers.

  • All partitions of a topic form the record log of this topic.
  • The number of partitions is defined when a topic is created (see the sketch after this list). If no specific number is defined, Kafka uses the default value.
  • Partitions distribute the load among brokers. If there are N brokers and N partitions for a topic, each broker holds one partition.
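
As a sketch of how the number of partitions is defined when a topic is created, the following Java snippet uses the Kafka AdminClient. The topic name "example-topic" and the broker address "localhost:9092" are assumptions for illustration; the partition and replica counts match the default values listed at the end of this article. The replication factor is explained in the next section.

    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.NewTopic;

    public class CreateTopic {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address

            try (AdminClient admin = AdminClient.create(props)) {
                // 8 partitions, replication factor 3 (the defaults listed at the end of this article).
                NewTopic topic = new NewTopic("example-topic", 8, (short) 3);
                admin.createTopics(List.of(topic)).all().get(); // blocks until the topic is created
            }
        }
    }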

Replicas

Kafka uses replicas to distribute the data of a partition across other brokers for redundancy.

  • The broker that owns a partition replicates the data to other brokers with a replica of that partition.
  • The replication factor defines how many replicas of a partition must exist. A replication factor of 2 means that there are two replicas for each partition.
  • If the broker that owns a partition fails, a broker that holds a replica of this partition takes over.
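
To inspect how the partitions and replicas of a topic are distributed, the following sketch uses the Kafka AdminClient to list the leading broker, the replicas and the in-sync replicas of each partition. The topic name "example-topic" and the broker address "localhost:9092" are assumptions for illustration only.

    import java.util.List;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.TopicDescription;
    import org.apache.kafka.common.TopicPartitionInfo;

    public class ShowReplicaAssignment {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address

            try (AdminClient admin = AdminClient.create(props)) {
                Map<String, TopicDescription> topics =
                        admin.describeTopics(List.of("example-topic")).all().get(); // assumed topic name
                for (TopicPartitionInfo partition : topics.get("example-topic").partitions()) {
                    System.out.printf("partition=%d leader=%s replicas=%s isr=%s%n",
                            partition.partition(),
                            partition.leader(),    // broker that currently owns the partition
                            partition.replicas(),  // all brokers that hold a replica
                            partition.isr());      // replicas that are in sync (see below)
                }
            }
        }
    }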

Replicas can be in sync with the leading partition or not.

  • The minimal number of in-sync replicas defines how many replicas must be in sync with the leading partition before the broker acknowledges to the producer that a message was successfully published to the partition and its replicas (see the sketch after this list). Depending on the producer settings, the broker can also send the acknowledgement as soon as the message is written to the leading partition, or no acknowledgement is sent at all.
  • Only brokers with a replica that is in sync with the leading partition can become the new leader of this partition if the current leading broker fails.
  • A higher minimal number of in-sync replicas provides more data safety but also increases the time needed to publish a message.
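
As a sketch of how the minimal number of in-sync replicas can be set for an existing topic, the following Java snippet changes the min.insync.replicas setting with the Kafka AdminClient. This setting only takes effect for producers that publish with acks=all, as in the producer sketch above. The topic name "example-topic", the value 2 and the broker address "localhost:9092" are assumptions for illustration only.

    import java.util.List;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.AlterConfigOp;
    import org.apache.kafka.clients.admin.ConfigEntry;
    import org.apache.kafka.common.config.ConfigResource;

    public class SetMinInsyncReplicas {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address

            try (AdminClient admin = AdminClient.create(props)) {
                ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "example-topic"); // assumed topic
                // Require at least 2 replicas (including the leader) to be in sync
                // before a write with acks=all is acknowledged.
                AlterConfigOp setMinIsr =
                        new AlterConfigOp(new ConfigEntry("min.insync.replicas", "2"), AlterConfigOp.OpType.SET);
                admin.incrementalAlterConfigs(Map.of(topic, List.of(setMinIsr))).all().get();
            }
        }
    }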

Default values

Kafka in cenTaur uses the following default values:

Parameter                            Default value   Name in the config file
Number of partitions                 8               KAFKA_NUM_PARTITIONS
Number of replicas                   3               KAFKA_DEFAULT_REPLICATION_FACTOR
Minimal number of replicas in sync   2               KAFKA_MIN_INSYNC_REPLICAS
Retention period for records         -1              KAFKA_LOG_RETENTION_MS

The values are defined in the Helm charts for Kafka. You can find them in the stateful.yaml configuration file of the deployment.

You can query the values for a certain topic from a broker of your Kafka cluster via the command shell.
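
As an alternative to the command shell, the values can also be read programmatically. The following sketch uses the Kafka AdminClient to print the effective configuration of a topic, including min.insync.replicas and retention.ms. The topic name "example-topic" and the broker address "localhost:9092" are assumptions for illustration only.

    import java.util.List;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.Config;
    import org.apache.kafka.clients.admin.ConfigEntry;
    import org.apache.kafka.common.config.ConfigResource;

    public class ShowTopicConfig {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address

            try (AdminClient admin = AdminClient.create(props)) {
                ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "example-topic"); // assumed topic
                Map<ConfigResource, Config> configs = admin.describeConfigs(List.of(topic)).all().get();
                for (ConfigEntry entry : configs.get(topic).entries()) {
                    // Prints, for example, min.insync.replicas and retention.ms with their effective values.
                    System.out.printf("%s = %s%n", entry.name(), entry.value());
                }
            }
        }
    }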