in_kafka: improve offset commit timing to avoid data loss in edge cases #10735
Conversation
This patch refactors the Kafka input plugin’s offset commit logic when enable.auto.commit is disabled.

Previously, offsets were committed immediately after each individual message was processed, even before it was flushed to the Fluent Bit pipeline. This introduced a small risk window where, in the event of a crash or forced shutdown, messages could be acknowledged (via Kafka commit) without being fully ingested, resulting in potential data loss under extreme conditions.

This patch moves the offset commit logic to occur only after a successful batch flush (flb_input_log_append). This ensures that:

- offsets are only committed if messages were actually encoded and flushed;
- we avoid committing unprocessed data, thereby improving correctness;
- we reduce the number of commits, improving efficiency in batch mode.

This change aligns commit timing with actual ingestion and eliminates the window of inconsistency between processing and commit acknowledgment.

Signed-off-by: Eduardo Silva <[email protected]>
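For orientation, here is a minimal sketch of the commit-after-flush flow described above. It reuses names visible in the review snippets below (ctx->enable_auto_commit, ctx->kafka.rk, ctx->log_encoder, flb_input_log_append()), but the commit_after_flush() helper, its signature, and the NULL/0 tag arguments are illustrative assumptions rather than the actual patch.

```c
#include <fluent-bit/flb_input_plugin.h>
#include <fluent-bit/flb_input_log.h>

#include "in_kafka.h"

/* Sketch only: the tail of a collect cycle with enable.auto.commit
 * disabled, flushing to the pipeline first and committing second. */
static int commit_after_flush(struct flb_input_instance *ins,
                              struct flb_in_kafka_config *ctx)
{
    int append_ret;

    /* Nothing was encoded in this cycle: succeed without touching offsets. */
    if (ctx->log_encoder->output_length == 0) {
        return 0;
    }

    /* Hand the encoded batch to the Fluent Bit pipeline first. */
    append_ret = flb_input_log_append(ins, NULL, 0,
                                      ctx->log_encoder->output_buffer,
                                      ctx->log_encoder->output_length);
    if (append_ret != 0) {
        flb_plg_error(ins, "failed to append records");
        return -1;
    }

    /* Only acknowledge offsets to Kafka once the batch has been ingested. */
    if (!ctx->enable_auto_commit) {
        rd_kafka_commit(ctx->kafka.rk, NULL, 0);
    }

    return 0;
}
```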
Walkthrough
The Kafka input plugin’s commit logic in in_kafka_collect was refactored to conditionally commit only after a successful append to the input log. A new append_ret variable governs the flow, with explicit error logging on append failures and early success when no data is produced.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Kafka as Kafka Broker
    participant Plugin as in_kafka_collect
    participant Encoder as Encoder
    participant InputLog as flb_input_log_append

    Kafka->>Plugin: Poll messages
    Plugin->>Encoder: Encode batch
    alt output_length == 0
        Plugin-->>Kafka: No commit/change
        Plugin-->Plugin: ret = 0
    else Encode success
        Plugin->>InputLog: Append records
        alt append_ret == 0
            opt enable_auto_commit == false
                Plugin->>Kafka: rd_kafka_commit()
            end
            Plugin-->Plugin: ret = 0
        else append_ret != 0
            Plugin->>Plugin: Log error "failed to append records"
            Plugin-->Plugin: ret = -1
        end
    end
```
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~8 minutes
e4f69c9 to 8425a60 (Compare)
Actionable comments posted: 1
🧹 Nitpick comments (2)
plugins/in_kafka/in_kafka.c (2)
160-160: Nit: Initialize append_ret or narrow its scope
Minor polish to avoid any chance of future uninitialized use if the block is refactored later. Alternatively, declare it inside the append block.
```diff
-int append_ret;
+int append_ret = -1;
```
224-226: Improve append failure logging (include error code and payload size)
Richer context makes triage simpler.
```diff
-    flb_plg_error(ins, "failed to append records");
+    flb_plg_error(ins, "failed to append records: ret=%d, bytes=%zu",
+                  append_ret, ctx->log_encoder->output_length);
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these settings in your CodeRabbit configuration.
📒 Files selected for processing (1)
plugins/in_kafka/in_kafka.c (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (24)
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 32bit, x86, x86-windows-static, 3.31.6)
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit (Arm64), amd64_arm64, -DCMAKE_SYSTEM_NAME=Windows -DCMA...
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit, x64, x64-windows-static, 3.31.6)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, clang, clang++)
- GitHub Check: PR - fuzzing test
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_COVERAGE=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, clang, clang++)
- GitHub Check: pr-compile-centos-7
🔇 Additional comments (2)
plugins/in_kafka/in_kafka.c (2)
213-216: LGTM: Commit gated by successful append aligns with at-least-once ingestion
Deferring commits until after flb_input_log_append() succeeds removes the data-loss window called out in the PR objective. This is the right place to gate commits.
228-229: LGTM: No-op when no data to flush
Early success when there’s nothing to append is correct and avoids unnecessary commits.
```c
if (append_ret == 0) {
    if (!ctx->enable_auto_commit) {
        rd_kafka_commit(ctx->kafka.rk, NULL, 0);
    }
    ret = 0;
}
```
🛠️ Refactor suggestion
Handle rd_kafka_commit() errors; consider async commit optionally
Currently, commit failures are silent. Capture and log the error so operators can observe commit issues. Keep ret = 0 to preserve at-least-once semantics (append already succeeded), but surface the failure.
Optionally, consider using async commit (third arg = 1) or making it configurable to avoid blocking the input thread on network hiccups.
```diff
-        if (!ctx->enable_auto_commit) {
-            rd_kafka_commit(ctx->kafka.rk, NULL, 0);
-        }
+        if (!ctx->enable_auto_commit) {
+            rd_kafka_resp_err_t c_err;
+            c_err = rd_kafka_commit(ctx->kafka.rk, NULL, 0);
+            if (c_err != RD_KAFKA_RESP_ERR_NO_ERROR) {
+                flb_plg_warn(ins, "rd_kafka_commit failed: %s",
+                             rd_kafka_err2str(c_err));
+                /* Keep ret == 0: data has been appended; failing here would
+                 * not help delivery guarantees and could cause unnecessary
+                 * collector backoff. */
+            }
+        }
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
-    if (append_ret == 0) {
-        if (!ctx->enable_auto_commit) {
-            rd_kafka_commit(ctx->kafka.rk, NULL, 0);
-        }
-        ret = 0;
-    }
+    if (append_ret == 0) {
+        if (!ctx->enable_auto_commit) {
+            rd_kafka_resp_err_t c_err;
+            c_err = rd_kafka_commit(ctx->kafka.rk, NULL, 0);
+            if (c_err != RD_KAFKA_RESP_ERR_NO_ERROR) {
+                flb_plg_warn(ins, "rd_kafka_commit failed: %s",
+                             rd_kafka_err2str(c_err));
+                /* Keep ret == 0: data has been appended; failing here would
+                 * not help delivery guarantees and could cause unnecessary
+                 * collector backoff. */
+            }
+        }
+        ret = 0;
+    }
```
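The optional async-commit idea mentioned in the refactor suggestion above can be sketched as follows. This is a hypothetical illustration: the commit_async option, the commit_offsets() helper, and the ctx->commit_async field do not exist in the plugin today; only librdkafka's rd_kafka_commit() async flag and error API are taken as given.

```c
#include <fluent-bit/flb_input_plugin.h>

#include "in_kafka.h"

/* Hypothetical config_map[] entry (not in the plugin today):
 *
 *   {FLB_CONFIG_MAP_BOOL, "commit_async", "false",
 *    0, FLB_TRUE, offsetof(struct flb_in_kafka_config, commit_async),
 *    "Commit offsets asynchronously to avoid blocking the input thread"},
 */

static void commit_offsets(struct flb_input_instance *ins,
                           struct flb_in_kafka_config *ctx)
{
    rd_kafka_resp_err_t err;

    /* librdkafka: third argument 0 = synchronous, 1 = asynchronous commit */
    err = rd_kafka_commit(ctx->kafka.rk, NULL, ctx->commit_async ? 1 : 0);
    if (err != RD_KAFKA_RESP_ERR_NO_ERROR) {
        flb_plg_warn(ins, "offset commit failed: %s", rd_kafka_err2str(err));
    }
}
```

Note that with the async flag set, the return value only reports whether the commit request was enqueued; the actual outcome is delivered later through librdkafka's offset_commit_cb, so a deployment that depends on this warning for observability would still want synchronous commits.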
Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.