Setting Up Cilium Clustermesh with Helm: tips and gotchas

Thanh Tung Dao
5 min read · Nov 19, 2023

Setting things up by following the official docs is easy, right? Maybe … maybe not. In this blog, I will share my experience setting up Cilium Clustermesh with Helm and GitOps, specifically the blockers that I encountered. Hopefully this saves someone time setting up this super useful feature.

source: https://cilium.io/blog/2019/03/12/clustermesh/

1. Firewall rules


Let me say it out loud for you again: firewall, firewall, firewall. This might be the part where you scream out uncontrollably, blaming yourself for not checking it earlier, because things never work. There is a long list of rules that we have to create, documented in the official Cilium docs. Depending on the organisation, this might be the most time-consuming part of the setup process, simply because we have to create a ticket and rely on someone else to set it up for us.
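While waiting for that ticket, a rough pre-flight probe can tell you which ports are reachable. This is only a sketch: the hostname below is a placeholder, and the authoritative port list is in the Cilium firewall requirements doc and depends on your datapath (VXLAN vs. Geneve vs. native routing).

```shell
# Placeholder peer node; replace with a real node or load balancer address.
PEER="node.cluster2.example.com"
# 4240: Cilium health checks, 2379: clustermesh-apiserver (etcd) service port.
for port in 4240 2379; do
  nc -z -w 3 "$PEER" "$port" 2>/dev/null \
    && echo "TCP $port open" \
    || echo "TCP $port blocked or unreachable"
done
# 8472/UDP: VXLAN overlay traffic (only relevant for the VXLAN datapath).
nc -zu -w 3 "$PEER" 8472 2>/dev/null \
  && echo "UDP 8472 (VXLAN) open" \
  || echo "UDP 8472 (VXLAN) blocked or unreachable"
```

If any of these report blocked, that is what goes into the firewall ticket.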

2. DNS / TLS

Wow, we just jumped from one hole into another (even bigger) hole, didn't we?

2.1 Annotations — DNS record creation

We might get an error from the other cluster's Cilium agent pods saying that the domain name is not resolvable or reachable. If you get this error, please check your DNS record again. It might be in a private DNS hosted zone, or not created at all.
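A quick way to tell those two cases apart is to resolve the name against a public resolver; a record that only exists in a private hosted zone will return nothing here. The hostname below is just a placeholder.

```shell
# Placeholder clustermesh apiserver hostname.
HOST="cluster.use1.companydomain.com"
# Try a public resolver first, fall back to the local resolver library.
dig +short "$HOST" @1.1.1.1 2>/dev/null \
  || getent hosts "$HOST" \
  || echo "$HOST: not resolvable"
```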

If we leave the DNS record management to something like external-dns, it's recommended to check clustermesh.apiserver.service.annotations. An example would look like this:

clustermesh:
  apiserver:
    service:
      annotations:
        external-dns.alpha.kubernetes.io/hostname: "example.hostname"
        external-dns.alpha.kubernetes.io/ttl: "1m"

2.2 extraDnsNames

Now the DNS record does exist. But this time we get another (new) error.

"transport: authentication handshake failed: tls: failed to verify certificate: x509: certificate is valid for clustermesh-apiserver.cilium.io, *.mesh.cilium.io, clustermesh-apiserver.kube-system.svc, not cluster.use1.companydomain.com\"

This can be traced back to this template in Cilium, which I have included below:

{{- if and (or .Values.externalWorkloads.enabled .Values.clustermesh.useAPIServer) .Values.clustermesh.apiserver.tls.auto.enabled (eq .Values.clustermesh.apiserver.tls.auto.method "certmanager") }}
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: clustermesh-apiserver-server-cert
  namespace: {{ .Release.Namespace }}
  {{- with .Values.clustermesh.annotations }}
  annotations:
    {{- toYaml . | nindent 4 }}
  {{- end }}
spec:
  issuerRef:
    {{- toYaml .Values.clustermesh.apiserver.tls.auto.certManagerIssuerRef | nindent 4 }}
  secretName: clustermesh-apiserver-server-cert
  commonName: clustermesh-apiserver.cilium.io
  dnsNames: # <- HERE
    - clustermesh-apiserver.cilium.io
    - "*.mesh.cilium.io"
    - "clustermesh-apiserver.{{ .Release.Namespace }}.svc"
    {{- range $dns := .Values.clustermesh.apiserver.tls.server.extraDnsNames }}
    - {{ $dns | quote }}
    {{- end }}
  ipAddresses:
    - "127.0.0.1"
    - "::1"
    {{- range $ip := .Values.clustermesh.apiserver.tls.server.extraIpAddresses }}
    - {{ $ip | quote }}
    {{- end }}
  duration: {{ printf "%dh0m0s" (mul .Values.clustermesh.apiserver.tls.auto.certValidityDuration 24) }}
{{- end }}

I think you probably have some theories in mind. And yes, you are right. During the TLS handshake, the client checks that the Common Name / Subject Alternative Names (SANs) of the Cilium Clustermesh apiserver certificate match the domain name you provided. Hence, the solution is to set the field clustermesh.apiserver.tls.server.extraDnsNames so that Helm adds your domain to the list of SANs in the Cilium Clustermesh apiserver cert.
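In values.yaml that looks like this, using the hostname from the error above as a placeholder:

```yaml
clustermesh:
  apiserver:
    tls:
      server:
        extraDnsNames:
          - "cluster.use1.companydomain.com"
```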

2.3 What about trusting the CA? It's Cilium 1.14


My advice is to use a Cilium Helm chart version of at least 1.14.0. Let me tell you why. When we create a Cilium deployment, there are two options: either supply our own CA or let the Helm deployment auto-generate one for us.

With Cilium 1.14.0, the second option is made possible, as Cilium can trust different CAs from different clusters instead of using one centralised CA for all. Even if we go with the centralised CA option (I'm not saying which one is better), we can always set them to be the same in clustermesh.apiserver.tls.ca.cert. The commit that allows all this to work is here: https://github.com/cilium/cilium/commit/46ecd9a6516319ff97d78a05d2d7011a41d9076e
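For the centralised option, a sketch of what pinning the same CA in every cluster's values could look like (the values are placeholders for your real base64-encoded PEM data; check your chart version for the exact field names):

```yaml
clustermesh:
  apiserver:
    tls:
      ca:
        # Same base64-encoded CA certificate supplied to every cluster.
        cert: "<base64-encoded shared CA certificate>"
        # The key is only needed where certificates are generated from this CA.
        key: "<base64-encoded shared CA key>"
```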

3. Cluster ID — No zero

From the documentation, we have to give our cluster a unique numeric cluster ID.

Each cluster must be assigned a unique human-readable name as well as a numeric cluster ID (1-255). 
It is best to assign both these attributes at installation time of Cilium:

Please note that it must be unique, so the number must be decided before setting up the clustermesh. And wait, I know you remember the 255, but please don't forget the 1. It means you can't use zero-indexed counting and start your first cluster with ID 0.

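In Helm values this is the cluster block; the names and IDs below are placeholders, but the constraint is real: every ID must be unique across the mesh and in the range 1-255.

```yaml
# Cluster A
cluster:
  name: cluster1
  id: 1   # not 0!

# Cluster B (separate values file)
cluster:
  name: cluster2
  id: 2
```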

4. It takes two hands to clap

This is by far the most confusing thing to me. On one hand, the official docs say we have to do it only once:

Finally, connect the clusters. This step only needs to be done in one direction. The connection will automatically be established in both directions:

It means that if we want to establish a clustermesh between CLUSTER1 and CLUSTER2, we just need to run this:

cilium clustermesh connect --context $CLUSTER1 --destination-context $CLUSTER2

On the other hand, I suddenly remembered: if I don't put the client certificate and the CA of CLUSTER1 into CLUSTER2's cilium-clustermesh Kubernetes secret, how can CLUSTER2 establish TLS connections? Furthermore, when I ran cilium clustermesh status on both clusters, only CLUSTER1 showed an active clustermesh connection.

Hence, I finally accepted that there might be nuances between setting up with the CLI (like in the official docs) and with the Helm chart. We have to repeat whatever steps we have done on all N clusters we want to mesh. At the time of writing, there is even an open issue for this: https://github.com/cilium/cilium/issues/19057
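With Helm, "both sides" means each cluster's values must list its peer explicitly under clustermesh.config. A sketch, assuming the 1.14 chart field names; the peer name, port and IP below are placeholders:

```yaml
# Values for CLUSTER1 -- mirror an equivalent block in CLUSTER2's values,
# pointing back at cluster1's clustermesh-apiserver endpoint.
clustermesh:
  useAPIServer: true
  config:
    enabled: true
    clusters:
      - name: cluster2
        port: 2379
        ips:
          - "203.0.113.10"   # placeholder: CLUSTER2's clustermesh-apiserver address
```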
