Skip to content

Conversation

@efcasado
Copy link
Contributor

@efcasado efcasado commented Apr 27, 2025

Motivation

The current implementation of the Kafka Connect adaptor in Pulsar IO does not support Kafka Connect transforms and predicates. This limits the flexibility and potential of using Kafka Connect-based source connectors, such as the Debezium PostgreSQL source connector, which rely heavily on transformations to modify, filter, or enrich records before they are ingested. By adding support for transforms, we enable users to fully leverage these connectors' capabilities, sparing them the need to deploy additional Pulsar Functions to perform simple message-level transformations.

Modifications

Extended KafkaConnectSource with three additional steps: initPredicates, initTransforms and applyTransforms.

  • initPredicates and initTransforms are called during the initialization of the Kafka Connect adaptor and are responsible for setting up any configured predicates and transforms.

  • applyTransforms is executed during processRecords and applies all available transforms to each source record before it is processed.

kafka-connect-adaptor has been updated to include the connect-transforms package.

Verifying this change

  • Make sure that the change passes the CI checks.

This change added tests and can be verified as follows:

  • Extended the existing KafkaConnectSourceTest test to verify that transforms can be initialized and applied

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

@github-actions
Copy link

@efcasado Please add the following content to your PR description and select a checkbox:

- [ ] `doc` <!-- Your PR contains doc changes -->
- [ ] `doc-required` <!-- Your PR changes impact docs and you will update later -->
- [ ] `doc-not-needed` <!-- Your PR changes do not impact docs -->
- [ ] `doc-complete` <!-- Docs have been already added -->

@github-actions github-actions bot added doc-label-missing doc-not-needed Your PR changes do not impact docs and removed doc-label-missing labels Apr 27, 2025
@lhotari lhotari added this to the 4.1.0 milestone Apr 28, 2025
@lhotari
Copy link
Member

lhotari commented Apr 28, 2025

Thanks for the contribution, @efcasado!

Copy link
Member

@lhotari lhotari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@lhotari
Copy link
Member

lhotari commented Apr 28, 2025

Connectors that rely on transforms, such as Debezium, typically package their own set of Kafka Connect transforms, making it unnecessary to include them in the adaptor itself.

@efcasado Are you also planning to contribute to the Pulsar IO Debezium connectors?

Debezium version is outdated in Pulsar:

pulsar/pom.xml

Line 204 in a4a3409

<debezium.version>1.9.7.Final</debezium.version>

It would be great to get that updated too. Would you be interested in handling that too? Previous upgrading attempt was #23078, but that's stalled.
Could we get Debezium upgraded to 3.1.x ?

@lhotari
Copy link
Member

lhotari commented Apr 28, 2025

@efcasado In SMT, there's also predicates (Confluent doc for predicates). Do you have intention to add support for that too?

@efcasado
Copy link
Contributor Author

@efcasado In SMT, there's also predicates (Confluent doc for predicates). Do you have intention to add support for that too?

I guess it makes sense to include them in this PR since they are somewhat related. I can give it a try unless you tell me to only do this in a separate PR.

@efcasado
Copy link
Contributor Author

efcasado commented Apr 28, 2025

Connectors that rely on transforms, such as Debezium, typically package their own set of Kafka Connect transforms, making it unnecessary to include them in the adaptor itself.

@efcasado Are you also planning to contribute to the Pulsar IO Debezium connectors?

Debezium version is outdated in Pulsar:

pulsar/pom.xml

Line 204 in a4a3409

<debezium.version>1.9.7.Final</debezium.version>

It would be great to get that updated too. Would you be interested in handling that too? Previous upgrading attempt was #23078, but that's stalled.
Could we get Debezium upgraded to 3.1.x ?

Actually, that's how this PR started but I ended up opting for the path of least resistance 😅 I got a version of Debezium 3.1 partially working, but I ran into some issues I could not fix. I will be very happy to give it another try in the coming days.

@lhotari
Copy link
Member

lhotari commented Apr 28, 2025

@efcasado In SMT, there's also predicates (Confluent doc for predicates). Do you have intention to add support for that too?

I guess it makes sense to include them in this PR since they are somewhat related. I can give it a try unless you tell me to only do this in a separate PR.

It's fine to do it in this PR.

Copy link
Member

@lhotari lhotari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work! Added a comment about supporting negate for predicates. It would be great to have some unit test coverage too.

@efcasado efcasado marked this pull request as draft April 28, 2025 17:32
@efcasado efcasado changed the title [improve][io] support kafka connect transforms [improve][io] support kafka connect transforms and predicates Apr 29, 2025
@efcasado
Copy link
Contributor Author

@lhotari I believe I addressed all the above points and the change is ready for another round of reviews 😊

@lhotari
Copy link
Member

lhotari commented Apr 29, 2025

@lhotari I believe I addressed all the above points and the change is ready for another round of reviews 😊

@efcasado great work! one more question about skipping source records.

Copy link
Member

@lhotari lhotari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@codecov-commenter
Copy link

codecov-commenter commented Apr 29, 2025

Codecov Report

Attention: Patch coverage is 81.01266% with 15 lines in your changes missing coverage. Please review.

Project coverage is 74.18%. Comparing base (bbc6224) to head (9b7134a).
Report is 1057 commits behind head on master.

Files with missing lines Patch % Lines
...he/pulsar/io/kafka/connect/KafkaConnectSource.java 82.05% 9 Missing and 5 partials ⚠️
...r/io/kafka/connect/AbstractKafkaConnectSource.java 0.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff              @@
##             master   #24221      +/-   ##
============================================
+ Coverage     73.57%   74.18%   +0.61%     
+ Complexity    32624    32555      -69     
============================================
  Files          1877     1866      -11     
  Lines        139502   144919    +5417     
  Branches      15299    16557    +1258     
============================================
+ Hits         102638   107508    +4870     
+ Misses        28908    28897      -11     
- Partials       7956     8514     +558     
Flag Coverage Δ
inttests 26.71% <ø> (+2.12%) ⬆️
systests 23.28% <ø> (-1.04%) ⬇️
unittests 73.66% <81.01%> (+0.82%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
...r/io/kafka/connect/AbstractKafkaConnectSource.java 63.28% <0.00%> (+1.68%) ⬆️
...he/pulsar/io/kafka/connect/KafkaConnectSource.java 68.00% <82.05%> (+13.94%) ⬆️

... and 1078 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@lhotari
Copy link
Member

lhotari commented Apr 30, 2025

Great work on this PR @efcasado! Merging this soon.

Actually, that's how this PR started but I ended up opting for the path of least resistance 😅 I got a version of Debezium 3.1 partially working, but I ran into some issues I could not fix. I will be very happy to give it another try in the coming days.

Looking forward to future contributions!

@lhotari lhotari merged commit c0c5044 into apache:master Apr 30, 2025
51 of 52 checks passed
lhotari pushed a commit that referenced this pull request Apr 30, 2025
@efcasado
Copy link
Contributor Author

@lhotari Will this change be cherry-picked for 3.x too? We are running Pulsar 3.3.5 and we are wondering if we should just use our own fork or wait for you and the team to promote this change to the 3.x branch.

@lhotari
Copy link
Member

lhotari commented Apr 30, 2025

@lhotari Will this change be cherry-picked for 3.x too? We are running Pulsar 3.3.5 and we are wondering if we should just use our own fork or wait for you and the team to promote this change to the 3.x branch.

I don't see barriers in cherry-picking to 3.3.x since this isn't introducing breaking changes. I'll mark it for 3.0.x and 3.3.x too.

lhotari pushed a commit that referenced this pull request Apr 30, 2025
lhotari pushed a commit that referenced this pull request Apr 30, 2025
manas-ctds pushed a commit to datastax/pulsar that referenced this pull request May 2, 2025
srinath-ctds pushed a commit to datastax/pulsar that referenced this pull request May 6, 2025
manas-ctds pushed a commit to datastax/pulsar that referenced this pull request May 7, 2025
manas-ctds pushed a commit to datastax/pulsar that referenced this pull request May 7, 2025
srinath-ctds pushed a commit to datastax/pulsar that referenced this pull request May 12, 2025
nodece pushed a commit to ascentstream/pulsar that referenced this pull request May 28, 2025
@lhotari
Copy link
Member

lhotari commented Sep 10, 2025

@efcasado FYI, There's an open issue #24646 related to this PR. Being fixed in #24654

KannarFr pushed a commit to CleverCloud/pulsar that referenced this pull request Sep 22, 2025
walkinggo pushed a commit to walkinggo/pulsar that referenced this pull request Oct 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants