Add support for listing Kafka offsets in bulk #26168

pmw-rp · 2025-07-10T17:32:27Z

Description

This PR changes how the Trino Kafka integration translates timestamps into offsets.

The current implementation makes one Kafka API call per partition to translate the timestamp. However, the API accepts a list of partitions in a single call, which allows a bulk translation.

By changing the call to a bulk operation, the number of API calls can be significantly reduced, improving query startup time.
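
To make the difference in call counts concrete, here is a minimal sketch of the two call patterns. It does not use the Kafka client: `TimestampLookup`, `CountingLookup`, and the integer partition ids are hypothetical stand-ins introduced only for illustration. In the connector, the bulk call would be `KafkaConsumer.offsetsForTimes`, which takes a `Map<TopicPartition, Long>`.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class BulkLookupSketch
{
    // Hypothetical stand-in for a client whose lookup accepts many partitions at once.
    interface TimestampLookup
    {
        Map<Integer, Long> offsetsForTimes(Map<Integer, Long> timestampsByPartition);
    }

    // Fake lookup that counts round trips; the "offset" is just the timestamp echoed back.
    static class CountingLookup
            implements TimestampLookup
    {
        int calls;

        @Override
        public Map<Integer, Long> offsetsForTimes(Map<Integer, Long> timestampsByPartition)
        {
            calls++;
            return new HashMap<>(timestampsByPartition);
        }
    }

    // Per-partition pattern: one call for each partition.
    static Map<Integer, Long> perPartition(TimestampLookup lookup, List<Integer> partitions, long timestamp)
    {
        Map<Integer, Long> result = new HashMap<>();
        for (int partition : partitions) {
            result.putAll(lookup.offsetsForTimes(Map.of(partition, timestamp)));
        }
        return result;
    }

    // Bulk pattern: build one request covering every partition, then call once.
    static Map<Integer, Long> bulk(TimestampLookup lookup, List<Integer> partitions, long timestamp)
    {
        Map<Integer, Long> request = new HashMap<>();
        for (int partition : partitions) {
            request.put(partition, timestamp);
        }
        return lookup.offsetsForTimes(request);
    }

    public static void main(String[] args)
    {
        List<Integer> partitions = List.of(0, 1, 2, 3);
        CountingLookup perPartitionLookup = new CountingLookup();
        CountingLookup bulkLookup = new CountingLookup();
        perPartition(perPartitionLookup, partitions, 1_000L);
        bulk(bulkLookup, partitions, 1_000L);
        System.out.println(perPartitionLookup.calls + " calls vs " + bulkLookup.calls + " call");
    }
}
```

With four partitions, the per-partition pattern issues four calls and the bulk pattern issues one; both return the same map. The actual latency saved depends on the cluster and is not measured here.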

Release notes

(x) This is not user-visible or is docs only, and no release notes are required.

cla-bot · 2025-07-10T17:32:31Z

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

findinpath · 2025-07-10T20:43:24Z

plugin/trino-kafka/src/main/java/io/trino/plugin/kafka/KafkaFilterManager.java

Change the commit message from
"pull all partition offsets in a single call to Kafka." to "Retrieve in bulk partition offsets"

findinpath · 2025-07-10T20:43:49Z

plugin/trino-kafka/src/main/java/io/trino/plugin/kafka/KafkaFilterManager.java

 import org.apache.kafka.clients.consumer.OffsetAndTimestamp;
 import org.apache.kafka.common.PartitionInfo;
 import org.apache.kafka.common.TopicPartition;
 import org.apache.kafka.common.config.ConfigResource;


Squash the two commits into one

findinpath · 2025-07-10T20:46:34Z

plugin/trino-kafka/src/main/java/io/trino/plugin/kafka/KafkaFilterManager.java

+import java.util.HashMap;
 import java.util.List;
 import java.util.Map;
 import java.util.Optional;


In the description

"By changing the call to a bulk operation, the number of API calls can be significantly reduced, improving query startup time."

please add some specific numbers to help the reviewers understand the impact of this change.

findinpath · 2025-07-10T20:48:55Z

https://github.com/trinodb/trino/actions/runs/16201941001/job/45742893962?pr=26168

Commit 97525136936f7faffd10b4ed3519939d170416e1 is an merge commit: https://api.github.com/repos/trinodb/trino/commits/97525136936f7faffd10b4ed3519939d170416e1
Error: PR requires a rebase. Found: 1 merge commits.

git rebase origin/master

plugin/trino-kafka/src/main/java/io/trino/plugin/kafka/KafkaFilterManager.java

findinpath · 2025-07-10T21:06:51Z

plugin/trino-kafka/src/main/java/io/trino/plugin/kafka/KafkaFilterManager.java

+                        Map<TopicPartition, Long> partitionBeginTimestamps = new HashMap<>();
+                        partitionBeginOffsets.forEach((partition, partitionIndex) -> {
+                            partitionBeginTimestamps.put(partition, offsetTimestampRanged.get().begin());
+                        });


long partitionBeginTimestamp = floorDiv(offsetTimestampRanged.get().begin(), MICROSECONDS_PER_MILLISECOND);
Map<TopicPartition, Long> partitionBeginTimestamps = partitionBeginOffsets.entrySet().stream()
        .collect(Collectors.toMap(Map.Entry::getKey, _ -> partitionBeginTimestamp));

With that, there is no need to mutate the map anymore via

timestamps.replaceAll((k, v) -> floorDiv(v, MICROSECONDS_PER_MILLISECOND));

in the findOffsetsForTimestampGreaterOrEqual method.
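
The suggestion above can be sketched in isolation as follows. This is an illustrative stand-alone version, not the connector code: integer partition ids replace `TopicPartition`, the class and method names are hypothetical, and the constant is assumed to be 1000 to mirror the name used in the comment. The timestamp is converted from microseconds to milliseconds once, and every partition is then mapped to that same value, so no later `replaceAll` pass is needed.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class TimestampMapSketch
{
    // Assumed value, mirroring the constant name used in the review comment.
    static final long MICROSECONDS_PER_MILLISECOND = 1_000;

    // Converts once, then maps every partition to the same millisecond timestamp.
    static Map<Integer, Long> toMillisByPartition(List<Integer> partitions, long timestampMicros)
    {
        long timestampMillis = Math.floorDiv(timestampMicros, MICROSECONDS_PER_MILLISECOND);
        return partitions.stream()
                .collect(Collectors.toMap(Function.identity(), partition -> timestampMillis));
    }

    public static void main(String[] args)
    {
        System.out.println(toMillisByPartition(List.of(0, 1), 1_500_999L));
        // floorDiv rounds toward negative infinity, while the / operator truncates toward zero
        System.out.println(Math.floorDiv(-1L, 1_000L) + " vs " + (-1L / 1_000L));
    }
}
```

The second print shows why `floorDiv` is not interchangeable with `/` for negative inputs: `Math.floorDiv(-1L, 1_000L)` is `-1`, whereas `-1L / 1_000L` is `0`.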

findinpath · 2025-07-10T21:11:38Z

plugin/trino-kafka/src/main/java/io/trino/plugin/kafka/KafkaFilterManager.java

    }

-    private static Optional<Long> findOffsetsForTimestampGreaterOrEqual(KafkaConsumer<byte[], byte[]> kafkaConsumer, TopicPartition topicPartition, long timestamp)
+    private static Map<TopicPartition, Long> findOffsetsForTimestampGreaterOrEqual(KafkaConsumer<byte[], byte[]> kafkaConsumer, Map<TopicPartition, Long> timestamps)


optional: Maybe we could return Map<TopicPartition, Optional<Long>> instead.

It is better to avoid having null values.
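
As a rough illustration of this suggestion: the Kafka consumer's `offsetsForTimes` can map a partition to null when no message matches the requested timestamp, so a bulk result may carry null values. The sketch below wraps each value in an `Optional`. It is a stand-alone example with hypothetical names (`OptionalOffsetsSketch`, `wrapNullable`) and integer partition ids in place of `TopicPartition`, not the code that was merged.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

class OptionalOffsetsSketch
{
    // Wraps each possibly-null offset so callers never have to handle null.
    static <K> Map<K, Optional<Long>> wrapNullable(Map<K, Long> offsets)
    {
        Map<K, Optional<Long>> result = new HashMap<>();
        offsets.forEach((key, offset) -> result.put(key, Optional.ofNullable(offset)));
        return result;
    }

    public static void main(String[] args)
    {
        Map<Integer, Long> raw = new HashMap<>();
        raw.put(0, 42L);
        raw.put(1, null); // stands for "no offset at or after the requested timestamp"
        Map<Integer, Optional<Long>> wrapped = wrapNullable(raw);
        System.out.println(wrapped.get(0).isPresent() + " " + wrapped.get(1).isPresent());
    }
}
```

Callers can then use `Optional` methods such as `isPresent` or `orElse` instead of null checks, at the cost of one extra wrapper object per partition.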

github-actions · 2025-08-01T17:05:33Z

This pull request has gone a while without any activity. Ask for help on #core-dev on Trino slack.

wendigo · 2025-08-13T10:18:54Z

@cla-bot check

cla-bot · 2025-08-13T10:18:58Z

The cla-bot has been summoned, and re-checked this pull request!

wendigo

@findinpath I've applied your review comments myself

github-actions bot added the kafka Kafka connector label Jul 10, 2025

findinpath requested a review from wendigo July 10, 2025 20:41

findinpath reviewed Jul 10, 2025

findinpath reviewed Jul 10, 2025

github-actions bot added the stale label Aug 1, 2025

cla-bot bot added the cla-signed label Aug 13, 2025

wendigo force-pushed the list-offsets branch from 9752513 to b113c30 Compare August 13, 2025 10:19

Retrieve Kafka partitions offsets in bulk

a9805c7

By changing the call to a bulk operation, the number of API calls can be significantly reduced, improving query startup time

wendigo force-pushed the list-offsets branch from b113c30 to a9805c7 Compare August 13, 2025 10:48

wendigo approved these changes Aug 13, 2025

wendigo merged commit f522a29 into trinodb:master Aug 13, 2025
16 checks passed

github-actions bot added this to the 477 milestone Aug 13, 2025

ebyhr mentioned this pull request Aug 16, 2025

Add Trino 477 release notes #26350

Merged

pmw-rp commented Jul 10, 2025 • edited by ebyhr
