Kafka Internals: client dns lookup

Thanh Tung Dao
Aug 14, 2021

Recently, I was lucky enough to get a chance to work with Kafka and gain a deeper understanding of how it works. This post is about client.dns.lookup, one of the many nuances you have to deal with when working with and maintaining a Kafka cluster.

What is it about?

client.dns.lookup is one of the configuration properties of Kafka clients. In older versions of Kafka (before 2.6), its default value was "default". With that value, the client resolves a symbolic hostname using:

new InetSocketAddress(String hostname, int port)

which only picks the first IP address, even if DNS returns many A records for the hostname. That is because the lookup behind it boils down to:

InetAddress.getAllByName(hostname)[0]
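To see the difference on your own machine, here is a small standalone Java sketch that contrasts keeping only the first resolved address (roughly what the old "default" value boils down to) with keeping all of them. It uses only java.net.InetAddress, not Kafka's own code, and the class name DnsLookupDemo is made up for illustration.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Arrays;

public class DnsLookupDemo {
    // Roughly the old "default" behaviour: only the first resolved address is kept.
    static InetAddress firstAddress(String hostname) throws UnknownHostException {
        return InetAddress.getAllByName(hostname)[0];
    }

    // Roughly the "use_all_dns_ips" idea: every address returned by the resolver is kept.
    static InetAddress[] allAddresses(String hostname) throws UnknownHostException {
        return InetAddress.getAllByName(hostname);
    }

    public static void main(String[] args) throws UnknownHostException {
        // Pass a hostname with several A records to see the difference.
        String host = args.length > 0 ? args[0] : "localhost";
        System.out.println("first only: " + firstAddress(host));
        System.out.println("all:        " + Arrays.toString(allAddresses(host)));
    }
}
```

If the hostname has several A records, the first line shows a single address while the second lists every one of them; a client that only ever sees the first line is stuck when that one host is down.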

Why does it matter?

This is problematic: if a hostname resolves to many IP addresses and, unluckily, the first one is down, your client (consumer or producer) fails to connect and throws errors, even though the other addresses are perfectly healthy.

You will be left scratching your head, because on the surface your Kafka cluster is working fine and all the metrics in your monitoring system look good.


How do we solve it?

Luckily, the Kafka community realised this flaw long ago and introduced new behaviour in KIP-302, which enables Kafka clients to use all DNS-resolved IP addresses. With it, if one IP address is unreachable, the client moves on to the next IP in the list and only throws an error once it has run out of addresses. This reduces the failure rate of your Kafka consumers and producers when something is going on at the Kafka cluster side (a broker rotation, for example). It matters even more when you run everything in cloud and containerized environments, where a single hostname may resolve to multiple IPs.
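The fallback idea is easy to sketch. The Java snippet below is not Kafka's implementation; it is a hypothetical helper, connectToAny, that walks every address a hostname resolves to, returns the first socket that connects, and only fails after all of them have been tried.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;

public class TryAllAddresses {
    // Hypothetical helper (not Kafka's code): connect to the first reachable
    // address among all addresses the hostname resolves to.
    static Socket connectToAny(String hostname, int port, int timeoutMs) throws IOException {
        IOException lastError = null;
        for (InetAddress address : InetAddress.getAllByName(hostname)) {
            Socket socket = new Socket();
            try {
                socket.connect(new InetSocketAddress(address, port), timeoutMs);
                return socket; // first reachable address wins
            } catch (IOException e) {
                socket.close(); // release the failed socket before trying the next IP
                lastError = e;
            }
        }
        // Only give up once every resolved address has failed.
        throw lastError != null ? lastError : new IOException("no addresses for " + hostname);
    }

    public static void main(String[] args) throws IOException {
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 9092;
        try (Socket socket = connectToAny(host, port, 2000)) {
            System.out.println("connected to " + socket.getRemoteSocketAddress());
        }
    }
}
```

Keeping the last error and rethrowing it at the end mirrors the behaviour described above: one dead IP is skipped silently, and you only see an exception when none of the IPs work.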

KIP-602 took this further by changing the default value of client.dns.lookup from "default" to "use_all_dns_ips" (I like the pun here), released in Kafka 2.6 among other great changes.

If your Kafka version is recent enough to have the "use_all_dns_ips" option but still below 2.6, you need to override this property in your application configuration.
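In plain Java client code, the override is just one more entry in the client properties. Below is a minimal sketch, assuming a placeholder bootstrap address (kafka.example.internal:9092 is hypothetical). The key is the same string as CommonClientConfigs.CLIENT_DNS_LOOKUP_CONFIG in kafka-clients, written as a literal here so the snippet compiles without the Kafka jar.

```java
import java.util.Properties;

public class DnsLookupOverride {
    // Builds client properties with client.dns.lookup overridden explicitly.
    static Properties clientProperties(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        // Same key as CommonClientConfigs.CLIENT_DNS_LOOKUP_CONFIG in kafka-clients.
        props.put("client.dns.lookup", "use_all_dns_ips");
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical bootstrap address, for illustration only.
        Properties props = clientProperties("kafka.example.internal:9092");
        System.out.println(props.getProperty("client.dns.lookup"));
        // With kafka-clients on the classpath, these props (plus serializers or
        // deserializers) would be passed to a KafkaProducer or KafkaConsumer.
    }
}
```

Setting the value explicitly is harmless on 2.6 and later too, since it matches the new default, so it is a safe override to keep while you still run mixed client versions.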

Let's take Kafka Connect running on Docker as an example. I use Kafka Connect to demonstrate because it uses both producers (for source connectors) and consumers (for sink connectors). To make the change, you just need to add it as an environment variable:

CONNECT_CLIENT_DNS_LOOKUP="use_all_dns_ips"

The full example is here:

docker run -d \
--name=kafka-connect \
--net=host \
-e CONNECT_BOOTSTRAP_SERVERS=localhost:29092 \
-e CONNECT_REST_PORT=28082 \
-e CONNECT_GROUP_ID="quickstart" \
-e CONNECT_CONFIG_STORAGE_TOPIC="quickstart-config" \
-e CONNECT_OFFSET_STORAGE_TOPIC="quickstart-offsets" \
-e CONNECT_STATUS_STORAGE_TOPIC="quickstart-status" \
-e CONNECT_KEY_CONVERTER="org.apache.kafka.connect.json.JsonConverter" \
-e CONNECT_VALUE_CONVERTER="org.apache.kafka.connect.json.JsonConverter" \
-e CONNECT_INTERNAL_KEY_CONVERTER="org.apache.kafka.connect.json.JsonConverter" \
-e CONNECT_INTERNAL_VALUE_CONVERTER="org.apache.kafka.connect.json.JsonConverter" \
-e CONNECT_REST_ADVERTISED_HOST_NAME="localhost" \
-e CONNECT_PLUGIN_PATH=/usr/share/java \
-e CONNECT_CLIENT_DNS_LOOKUP="use_all_dns_ips" \
confluentinc/cp-kafka-connect:6.2.0

You might wonder why the official docs call it client.dns.lookup while we are using CONNECT_CLIENT_DNS_LOOKUP. If you look closely at the Confluent docs for the Docker images, you will find that:

For the Kafka Connect (cp-kafka-connect) image, convert the property variables as below and use them as environment variables:

Prefix with CONNECT_.

Convert to upper-case.

Replace a period (.) with a single underscore (_).

Replace a dash (-) with double underscores (__).

Replace an underscore (_) with triple underscores (___).
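The rules above are mechanical enough to script. Here is a small Java sketch (the class and method names are hypothetical) that applies them. Note that the underscore rule has to run first; otherwise the underscores produced by the period and dash rules would get tripled as well.

```java
import java.util.Locale;

public class ConnectEnvVarName {
    // Applies the cp-kafka-connect naming rules to a worker property name.
    // Underscores are expanded first so that the underscores produced by the
    // later '-' and '.' replacements are not tripled by mistake.
    static String toEnvVar(String property) {
        String name = property
                .replace("_", "___")   // underscore -> triple underscores
                .replace("-", "__")    // dash -> double underscores
                .replace(".", "_");    // period -> single underscore
        return "CONNECT_" + name.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(toEnvVar("client.dns.lookup"));         // CONNECT_CLIENT_DNS_LOOKUP
        System.out.println(toEnvVar("rest.advertised.host.name")); // CONNECT_REST_ADVERTISED_HOST_NAME
    }
}
```

Running it against the other variables in the docker run command (bootstrap.servers, rest.port, plugin.path) gives exactly the names used there.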

It took me a while to put all the pieces together, so I hope this helps you in some way.
