Perishable Controllers quorum with SASL-SSL and SCRAM-SHA-512 #106

@piotrpietka

I ran into a serious bug while building a controller quorum. The setup uses SASL_SSL with SCRAM-SHA-512 as the primary and only mechanism for intra-cluster communication (controller-broker, controller-controller, broker-broker). With only one controller in CurrentVoters, the cluster survives all server failures and restarts, even when every node goes down at the same time. After I promote more controllers from CurrentObservers to CurrentVoters, losing one controller still causes no problems, but if all controllers go down at once, even briefly, the quorum is never able to re-form.
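For anyone trying to reproduce this, the voter and observer sets mentioned above can be inspected with `kafka-metadata-quorum.sh`. The following is a minimal sketch, not part of the original report: the `CLIENT_CONFIG` path is a hypothetical client properties file (it would need `security.protocol`, `sasl.mechanism` and `sasl.jaas.config` entries matching the controller listener), and the function names are made up for illustration.

```shell
# Assumed locations; override via environment variables as needed.
KAFKA_BIN="${KAFKA_BIN:-/opt/kafka/bin}"
BOOTSTRAP="${BOOTSTRAP:-kafka1.domain.local:19095}"
# Hypothetical client properties file with the SCRAM client settings.
CLIENT_CONFIG="${CLIENT_CONFIG:-/etc/kafka/admin-client.properties}"

# Print leader, epoch, CurrentVoters and CurrentObservers.
quorum_status() {
  "$KAFKA_BIN/kafka-metadata-quorum.sh" \
    --bootstrap-controller "$BOOTSTRAP" \
    --command-config "$CLIENT_CONFIG" \
    describe --status
}

# Print per-replica log end offset and lag for voters and observers.
quorum_replication() {
  "$KAFKA_BIN/kafka-metadata-quorum.sh" \
    --bootstrap-controller "$BOOTSTRAP" \
    --command-config "$CLIENT_CONFIG" \
    describe --replication
}
```

Running `quorum_status` before and after stopping all controllers should show whether CurrentVoters still lists every promoted controller, assuming at least one controller is reachable and accepts the credentials.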

I add the SCRAM credentials when formatting storage:
1st controller$ /opt/kafka/bin/kafka-storage.sh format -t $CLUSTER_ID --feature kraft.version=1 --initial-controllers 1@kafka1.domain.local:19095,2@kafka2.domain.local:19095,3@kafka3.domain.local:19095 -c /etc/kafka/controller.properties --add-scram 'SCRAM-SHA-512=[name="admin",password="qaz123456"]'
other controllers$ /opt/kafka/bin/kafka-storage.sh format -t $CLUSTER_ID --feature kraft.version=1 --no-initial-controllers -c /etc/kafka/controller.properties --add-scram 'SCRAM-SHA-512=[name="admin",password="qaz123456"]'

controller config:
process.roles=controller
node.id=1
controller.quorum.bootstrap.servers=kafka1.domain.local:19095,kafka2.domain.local:19095,kafka3.domain.local:19095
listeners=CONTROLLER_SASL_SSL://kafka1.pietka.local:19095

controller.listener.names=CONTROLLER_SASL_SSL
listener.security.protocol.map=CONTROLLER_SASL_SSL:SASL_PLAINTEXT
authorizer.class.name=org.apache.kafka.metadata.authorizer.StandardAuthorizer
super.users=User:pietka.local;User:admin;User:ANONYMOUS
security.protocol=SASL_PLAINTEXT

#ssl
ssl.principal.mapping.rules=RULE:^.*CN=\\*\\.(.*?),OU=Corporation.*$/$1/L
ssl.client.auth=required
ssl.keystore.location=/etc/kafka/certs/client.keystore.jks
ssl.keystore.password=qaz123456
ssl.truststore.location=/etc/kafka/certs/server.truststore.jks
ssl.truststore.password=qaz123456

#sasl
sasl.mechanism=SCRAM-SHA-512
listener.name.controller_sasl_ssl.scram-sha-512.sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required username="admin" password="qaz123456" user_admin="qaz123456";
sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required username="admin" password="qaz123456" user_admin="qaz123456";
sasl.enabled.mechanisms=SCRAM-SHA-512
sasl.mechanism.controller.protocol=SCRAM-SHA-512
sasl.mechanism.inter.broker.protocol=SCRAM-SHA-512

log.dirs=/data/controller

num.network.threads=3
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
num.partitions=2
num.recovery.threads.per.data.dir=2
offsets.topic.replication.factor=2
transaction.state.log.replication.factor=2
transaction.state.log.min.isr=1
log.retention.hours=2
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000

logs:
[2025-05-18 11:48:54,895] INFO [RaftManager id=1] Failed authentication with kafka2.domain.local/10.10.21.2 (channelId=2) (Authentication failed during authentication due to invalid credentials with SASL mechanism SCRAM-SHA-512) (org.apache.kafka.common.network.Selector)
[2025-05-18 11:48:54,895] INFO [RaftManager id=1] Node 2 disconnected. (org.apache.kafka.clients.NetworkClient)
[2025-05-18 11:48:54,896] ERROR [RaftManager id=1] Connection to node 2 (kafka2.domain.local/10.10.21.2:19095) failed authentication due to: Authentication failed during authentication due to invalid credentials with SASL mechanism SCRAM-SHA-512 (org.apache.kafka.clients.NetworkClient)
[2025-05-18 11:48:54,896] ERROR [kafka-1-raft-outbound-request-thread]: Failed to send the following request due to authentication error: ClientRequest(expectResponse=true, callback=org.apache.kafka.raft.KafkaNetworkChannel$$Lambda/0x00007f4d3f3cd778@55782294, destination=2, correlationId=576, clientId=raft-client-1, createdTimeMs=1747561734586, requestBuilder=VoteRequestData(clusterId='R8gKVujrQ5qNagUU3rwdHw', voterId=2, topics=[TopicData(topicName='__cluster_metadata', partitions=[PartitionData(partitionIndex=0, replicaEpoch=5, replicaId=1, replicaDirectoryId=vE9YPPpdfucShPkUZgFA8g, voterDirectoryId=HzZ9HhGDsjlbWPgBXFd67g, lastOffsetEpoch=3, lastOffset=580, preVote=true)])])) (org.apache.kafka.raft.KafkaNetworkChannel$SendThread)
[2025-05-18 11:48:54,896] ERROR Request OutboundRequest(correlationId=574, data=VoteRequestData(clusterId='R8gKVujrQ5qNagUU3rwdHw', voterId=2, topics=[TopicData(topicName='__cluster_metadata', partitions=[PartitionData(partitionIndex=0, replicaEpoch=5, replicaId=1, replicaDirectoryId=vE9YPPpdfucShPkUZgFA8g, voterDirectoryId=HzZ9HhGDsjlbWPgBXFd67g, lastOffsetEpoch=3, lastOffset=580, preVote=true)])]), createdTimeMs=1747561734586, destination=kafka2.domain.local:19095 (id: 2 rack: null isFenced: false)) failed due to authentication error (org.apache.kafka.raft.KafkaNetworkChannel)
org.apache.kafka.common.errors.SaslAuthenticationException: Authentication failed during authentication due to invalid credentials with SASL mechanism SCRAM-SHA-512
[2025-05-18 11:48:54,898] ERROR [RaftManager id=1] Unexpected error NETWORK_EXCEPTION in VOTE response: InboundResponse(correlationId=574, data=VoteResponseData(errorCode=13, topics=[], nodeEndpoints=[]), source=kafka2.domain.local:19095 (id: 2 rack: null isFenced: false)) (org.apache.kafka.raft.KafkaRaftClient)
[2025-05-18 11:48:54,939] INFO [SocketServer listenerType=CONTROLLER, nodeId=1] Failed authentication with /10.10.21.2 (channelId=10.10.21.1:19095-10.10.21.2:51690-2-194) (Authentication failed during authentication due to invalid credentials with SASL mechanism SCRAM-SHA-512) (org.apache.kafka.common.network.Selector)
[2025-05-18 11:48:54,942] INFO [MetadataLoader id=1] initializeNewPublishers: the loader is still catching up because we still don't know the high water mark yet. (org.apache.kafka.image.loader.MetadataLoader)
[2025-05-18 11:48:55,043] INFO [MetadataLoader id=1] initializeNewPublishers: the loader is still catching up because we still don't know the high water mark yet. (org.apache.kafka.image.loader.MetadataLoader)
[2025-05-18 11:48:55,144] INFO [MetadataLoader id=1] initializeNewPublishers: the loader is still catching up because we still don't know the high water mark yet. (org.apache.kafka.image.loader.MetadataLoader)
[2025-05-18 11:48:55,247] INFO [MetadataLoader id=1] initializeNewPublishers: the loader is still catching up because we still don't know the high water mark yet. (org.apache.kafka.image.loader.MetadataLoader)
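
To see at a glance which peers are being rejected, the "Failed authentication with" entries in a log like the one above can be tallied per remote address. This is only a diagnostic sketch and not part of the original report; the function name is hypothetical and the log file path must be supplied by the caller.

```shell
# Count "Failed authentication with <peer>" occurrences per peer.
# Usage: auth_failures_by_peer /path/to/controller-log-file
auth_failures_by_peer() {
  # grep -o emits one match per occurrence, even when several
  # log entries share a single physical line.
  grep -oE 'Failed authentication with [^ ]+' "${1:?usage: auth_failures_by_peer LOGFILE}" \
    | awk '{print $4}' \
    | sort | uniq -c | sort -rn
}
```

Each output line is a count followed by the peer as it appears in the log (hostname/IP for outbound Raft connections, /IP for inbound ones), which helps show whether failures occur in both directions between the same pair of controllers.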
