in_tail: fix last_processed_bytes calculation #10677
Walkthrough
Adjusts tail plugin byte-tracking (per-line assignment and per-chunk reset), adds a multiline regex parser config, and introduces a test verifying offset_key behavior with multiline parsing in the tail input.
Sequence Diagram(s)
sequenceDiagram
participant TestRunner
participant FluentBit
participant Tail
participant Parser
participant Output
Note over TestRunner,FluentBit: Prepare file with two multiline entries
TestRunner->>FluentBit: start engine (tail input, parsers_multiline.conf, offset_key)
FluentBit->>Tail: initialize file state
Tail->>Parser: read chunk / lines
Parser-->>Tail: return parsed multiline record + processed_bytes
Tail->>Tail: set file.last_processed_bytes = processed_bytes
Tail->>Output: emit record (includes computed offset_key)
Note over Tail: after chunk processed => file.last_processed_bytes = 0 and stream_offset updated
TestRunner->>Tail: append another log line
Tail->>Parser: parse new line
Parser-->>Tail: emit subsequent record
Output->>TestRunner: records available for assertion
Estimated code review effort
🎯 3 (Moderate) | ⏱️ ~20 minutes
Actionable comments posted: 3
🧹 Nitpick comments (2)
tests/runtime/in_tail.c (2)
1417-1421: Use size_t-compatible format specifier for strlen to avoid UB on some platforms
strlen returns size_t; printing with %ld is not portable. Prefer %zu and drop the unnecessary address-of on expected_msg.
- ret = snprintf(&expected_msg[0], sizeof(expected_msg), "\"%s\":%ld", offset_key, strlen(msg_before_tail)+strlen(NEW_LINE)+strlen(msg_before_tail2)+strlen(NEW_LINE));
+ ret = snprintf(expected_msg, sizeof(expected_msg), "\"%s\":%zu",
+                offset_key,
+                strlen(msg_before_tail) + strlen(NEW_LINE) +
+                strlen(msg_before_tail2) + strlen(NEW_LINE));
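The portability point can be shown in isolation. The following is only a minimal sketch: `format_expected_offset` is a hypothetical helper that does not exist in tests/runtime/in_tail.c, and `NEW_LINE` is assumed to expand to "\n" here. It sums the lengths as `size_t` and prints them with `%zu`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumption: the test's NEW_LINE macro expands to "\n" on this platform. */
#define NEW_LINE "\n"

/*
 * Hypothetical helper (not part of tests/runtime/in_tail.c): writes the
 * fragment "\"<key>\":<offset>" into buf, where <offset> is the total size
 * of two newline-terminated lines. strlen() returns size_t, so the sum is
 * kept as size_t and printed with %zu.
 */
static int format_expected_offset(char *buf, size_t size, const char *key,
                                  const char *line1, const char *line2)
{
    size_t offset;

    assert(buf != NULL && size > 0);

    offset = strlen(line1) + strlen(NEW_LINE) +
             strlen(line2) + strlen(NEW_LINE);

    /* snprintf returns the length it wanted to write, or a negative value */
    return snprintf(buf, size, "\"%s\":%zu", key, offset);
}
```

Called with the key "OffsetKey" and the lines "abc" and "de", the buffer holds "OffsetKey":7 (3 + 1 + 2 + 1 bytes).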
2432-2432: Optionally gate this test under FLB_HAVE_REGEX like other regex-dependent tests
This test relies on a type=regex multiline parser. To mirror how “parser” and “tag_regex” are guarded, consider gating registration to avoid failures when regex is disabled.
If preferred, wrap this entry:
- {"multiline_offset_key", flb_test_multiline_offset_key},
+ #ifdef FLB_HAVE_REGEX
+ {"multiline_offset_key", flb_test_multiline_offset_key},
+ #endif
Confirm whether CI includes builds without regex support. If so, the guard is advisable.
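For readers unfamiliar with the pattern, here is a rough standalone sketch of a conditionally registered test entry. The `test_entry` struct, the `has_test` lookup, and the test function names are hypothetical stand-ins, not the actual test framework types used by tests/runtime/in_tail.c; only the `FLB_HAVE_REGEX` macro name comes from the review.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the test framework's list entry type. */
struct test_entry {
    const char *name;
    void (*func)(void);
};

static void test_always(void) { }

#ifdef FLB_HAVE_REGEX
/* Compiled only when regex support is enabled at build time. */
static void test_multiline_offset_key(void) { }
#endif

static const struct test_entry test_list[] = {
    {"always", test_always},
#ifdef FLB_HAVE_REGEX
    {"multiline_offset_key", test_multiline_offset_key},
#endif
    {NULL, NULL}   /* terminator */
};

/* Returns 1 if a test with this name was registered, 0 otherwise. */
static int has_test(const char *name)
{
    size_t i;

    assert(name != NULL);

    for (i = 0; test_list[i].name != NULL; i++) {
        if (strcmp(test_list[i].name, name) == 0) {
            return 1;
        }
    }
    return 0;
}
```

Built without `-DFLB_HAVE_REGEX`, the guarded entry is simply absent from the list, so the runner never attempts a test whose parser cannot be loaded.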
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
- plugins/in_tail/tail_file.c (1 hunks)
- tests/runtime/data/tail/parsers_multiline.conf (1 hunks)
- tests/runtime/in_tail.c (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- plugins/in_tail/tail_file.c
🧰 Additional context used
🧬 Code Graph Analysis (1)
tests/runtime/in_tail.c (1)
src/flb_lib.c (4)
- flb_service_set (647-673)
- flb_input_set (300-330)
- flb_output_set (515-546)
- flb_start (914-925)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit (Arm64), amd64_arm64, -DCMAKE_SYSTEM_NAME=Windows -DCMA...
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit, x64, x64-windows-static, 3.31.6)
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 32bit, x86, x86-windows-static, 3.31.6)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-22.04, clang-12)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, gcc, g++)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-24.04, clang-14)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, clang, clang++)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-24.04, clang-14)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, gcc, g++)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-22.04, clang-12)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_COVERAGE=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, clang, clang++)
- GitHub Check: pr-compile-centos-7
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, clang, clang++)
- GitHub Check: PR - fuzzing test
char *msg_before_tail = "[2025-06-16 20:42:22,291] INFO - aaaaaaaaaaa";
char *msg_before_tail2 = "[2025-06-16 20:42:22,500] Error";
char *msg_final = "[2025-06-16 20:45:29,234] Fatal";
char expected_msg[1024] = {0};
int ret;
int num;

char *expected_strs[] = {msg_final, &expected_msg[0]};
struct str_list expected = {
    .size = sizeof(expected_strs)/sizeof(char*),
    .lists = &expected_strs[0],
};
💡 Verification agent
🧩 Analysis chain
Test messages don’t match the configured multiline “start” pattern
The messages use “[YYYY-MM-DD HH:MM:SS,mmm] …” while the added parser (parsers_multiline.conf) expects an ISO8601 “T” separator and a “.” before milliseconds. Without aligning them, the first line after startup may not be recognized as a start line and will be held until flush_timeout, making this test timing-dependent.
Align either the parser (preferred; see parser file comment) or change the test messages to match the parser pattern.
If you keep the bracketed format in the test, please apply the parser diff suggested in parsers_multiline.conf so the start-state matches immediately and the record is flushed without waiting for the timeout.
I’ll verify the multiline “start” regex in parsers_multiline.conf to confirm it requires a “T” and “.” before milliseconds.
🏁 Script executed:
#!/bin/bash
set -e
# Locate the multiline parser config
config_file=$(fd parsers_multiline.conf)
echo "Config file: $config_file"
# Show lines around the start‐pattern definition
rg -nC2 'start' "$config_file"
rg -nC2 'pattern' "$config_file"
Length of output: 1216
I’ll inspect the specific parser config used by this test to confirm the exact “start” regex:
🏁 Script executed:
#!/bin/bash
set -e
# Find the multiline parser config for the tail tests
file=$(fd --exact-name parsers_multiline.conf tests/runtime/data/tail)
echo "Inspecting: $file"
# Show the “start” definition and its associated pattern
rg -nC2 '^\s*start' "$file"
rg -nC2 '^\s*pattern' "$file"
Length of output: 294
I’ll locate and inspect the exact parsers_multiline.conf used by the tail tests to verify the start regex:
🏁 Script executed:
#!/bin/bash
set -e
# Find all parsers_multiline.conf under tests/runtime/data/tail
configs=$(fd parsers_multiline.conf tests/runtime/data/tail)
if [[ -z "$configs" ]]; then
  echo "No parsers_multiline.conf found under tests/runtime/data/tail"
  exit 1
fi
# Show the 'start' and 'pattern' lines with context for each config
for cfg in $configs; do
  echo "---- Inspecting: $cfg ----"
  rg -nC2 '^\s*start' "$cfg" || true
  rg -nC2 '^\s*pattern' "$cfg" || true
done
Length of output: 499
I’ll dump the test’s multiline parser config to inspect its regex:
🏁 Script executed:
#!/bin/bash
set -e
file="tests/runtime/data/tail/parsers_multiline.conf"
echo "==== $file ===="
sed -n '1,200p' "$file"
Length of output: 489
Multiline “start” regex doesn’t match the bracketed timestamps in your tests
The tests/runtime/data/tail/parsers_multiline.conf rules expect lines like
2025-06-16T20:42:22.291…
but your test messages begin with
[2025-06-16 20:42:22,291]…
As-is, the first log line won’t match the start_state rule and will be buffered until the flush timeout, making the test timing-dependent.
Please align one of the following:
• Update the parser’s start_state regex in tests/runtime/data/tail/parsers_multiline.conf to accept the leading bracket, space separator, and comma-millisecond format. For example:
rule "start_state" "/^\[?(\d{4}-\d{2}-\d{2})[ T](\d{2}:\d{2}:\d{2})[.,](\d{3})/" "cont"
• Or change the hardcoded test messages in tests/runtime/in_tail.c to use the ISO8601 format with “T” and “.” (no leading “[”).
Either approach will ensure the first line is recognized immediately and flushed without waiting for the timeout.
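The mismatch can be reproduced outside Fluent Bit with a small sketch. It assumes POSIX `regcomp`/`regexec` rather than the regex engine Fluent Bit itself uses, so `\d` is rewritten as `[0-9]`. `STRICT_ISO_START` is an assumed reconstruction of the original rule based only on the description above (its exact text is not shown in this thread), `RELAXED_START` is a translation of the suggested rule, and `line_matches` is a hypothetical helper.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Assumed shape of the original start rule: ISO 8601 "T" and "." only. */
#define STRICT_ISO_START \
    "^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}\\.[0-9]{3}"

/* POSIX ERE translation of the relaxed start rule suggested above. */
#define RELAXED_START \
    "^\\[?([0-9]{4}-[0-9]{2}-[0-9]{2})[ T]([0-9]{2}:[0-9]{2}:[0-9]{2})[.,]([0-9]{3})"

/*
 * Hypothetical helper: returns 1 if line matches pattern, 0 if it does
 * not, and -1 if the pattern fails to compile.
 */
static int line_matches(const char *pattern, const char *line)
{
    regex_t re;
    int ret;

    assert(pattern != NULL && line != NULL);

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0) {
        return -1;
    }
    ret = regexec(&re, line, 0, NULL, 0);
    regfree(&re);

    return ret == 0 ? 1 : 0;
}
```

With these patterns, the bracketed test message is rejected by the strict rule and accepted by the relaxed one, while an ISO-style line is accepted by both.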
🤖 Prompt for AI Agents
In tests/runtime/in_tail.c around lines 1399 to 1411, the test messages use a
bracketed timestamp format "[YYYY-MM-DD HH:MM:SS,mmm]" which does not match the
parser's multiline start_state expecting "YYYY-MM-DDTHH:MM:SS.mmm"; update the
hardcoded msg_before_tail, msg_before_tail2, and msg_final strings to use
ISO8601-style timestamps without the leading "[" and with a "T" separator and
dot millisecond separator (e.g. "2025-06-16T20:42:22.291…") so the first line
matches the start_state immediately and the test is not timing-dependent.
Fix last_processed_bytes calculation in multiple line scenario. Signed-off-by: zshuang0316 <[email protected]>
Actionable comments posted: 0
🧹 Nitpick comments (1)
tests/runtime/data/tail/parsers_multiline.conf (1)
5-6: Optional: make the parser more reusable by accepting both space/T and comma/dot for sub-second separator
If you want this parser to work with similar logs that use ISO 8601 “T” and/or a dot for milliseconds, you can broaden the regex while keeping current behavior intact.
- rule "start_state" "/^\[\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}\]/" "cont"
- rule "cont" "/^(?!\[\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}\]).*/" "cont"
+ rule "start_state" "/^\[\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}[,.]\d{3}\]/" "cont"
+ rule "cont" "/^(?!\[\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}[,.]\d{3}\]).*/" "cont"
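To make the start/cont pairing concrete, here is an illustrative sketch of how such rules group lines into records: a line that begins with the bracketed timestamp opens a new record, and any other line continues the current one. It uses a hand-written shape check in place of a regex engine, which is an assumption made to keep the example dependency-free; `is_start_line` and `count_records` are hypothetical names.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Hand-written stand-in for the broadened "start_state" rule:
 * '[' dddd-dd-dd (' ' or 'T') dd:dd:dd (',' or '.') ddd ']'
 */
static int is_start_line(const char *line)
{
    /* 'd' = digit, 's' = ' ' or 'T', 'f' = ',' or '.', others literal */
    static const char shape[] = "[dddd-dd-ddsdd:dd:ddfddd]";
    size_t i;

    assert(line != NULL);

    for (i = 0; shape[i] != '\0'; i++) {
        unsigned char c = (unsigned char) line[i];

        if (c == '\0') {
            return 0;               /* line is shorter than the prefix */
        }
        switch (shape[i]) {
        case 'd':
            if (!isdigit(c)) return 0;
            break;
        case 's':
            if (c != ' ' && c != 'T') return 0;
            break;
        case 'f':
            if (c != ',' && c != '.') return 0;
            break;
        default:
            if (c != (unsigned char) shape[i]) return 0;
            break;
        }
    }
    return 1;
}

/*
 * Counts multiline records in a list of lines. Every start line opens a
 * new record; every other line is treated as a continuation and therefore
 * does not add a record.
 */
static size_t count_records(const char *const *lines, size_t n)
{
    size_t i;
    size_t records = 0;

    for (i = 0; i < n; i++) {
        if (is_start_line(lines[i])) {
            records++;
        }
    }
    return records;
}
```

A stack-trace style line without the bracketed prefix is folded into the preceding record, which is the role the negative-lookahead "cont" rule plays in the parser config.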
📜 Review details
📒 Files selected for processing (3)
- plugins/in_tail/tail_file.c (2 hunks)
- tests/runtime/data/tail/parsers_multiline.conf (1 hunks)
- tests/runtime/in_tail.c (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- tests/runtime/in_tail.c
- plugins/in_tail/tail_file.c
🔇 Additional comments (3)
tests/runtime/data/tail/parsers_multiline.conf (3)
1-3: Parser block and naming look correct
Section header, name, and type are valid for a Fluent Bit multiline regex parser.
5-6: Regex now matches the bracketed timestamp format used in tests
This addresses the earlier mismatch (T vs space, unescaped dot). Anchoring to the opening bracket and using a negative lookahead for continuation lines is correct and should make the parser robust and non-flaky for the provided log format.
4-4: flush_timeout (2000 ms) is safe under the existing 5 s test wait
The multiline parser defined in
• tests/runtime/data/tail/parsers_multiline.conf (flush_timeout 2000)
is exercised by
• tests/runtime/in_tail.c (lines 1466–1470) via wait_num_with_timeout(5000, &num)
Since the test harness waits up to 5000 ms for output, a 2000 ms flush timeout will always fire well before the timeout window. No change is required.
This value represents the number of bytes processed by process_content() in the last iteration, so we can set it to the current processed_bytes directly.
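As a rough illustration of the per-line assignment and per-chunk reset, the sketch below models the bookkeeping on a toy buffer. It is not the code in plugins/in_tail/tail_file.c: the `tail_state` struct, `process_chunk`, and the formula "offset = stream_offset + last_processed_bytes" are assumptions made for the example, and only the field names follow the PR text.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified file state. */
struct tail_state {
    size_t stream_offset;         /* bytes consumed by earlier chunks */
    size_t last_processed_bytes;  /* bytes consumed so far in this chunk */
};

/* Assumed formula for the offset reported with a record emitted now. */
static size_t current_offset(const struct tail_state *st)
{
    return st->stream_offset + st->last_processed_bytes;
}

/*
 * Processes one chunk of newline-terminated lines and records the offset
 * reported after each complete line. Returns the bytes consumed; a
 * trailing partial line is left unconsumed.
 */
static size_t process_chunk(struct tail_state *st, const char *buf, size_t len,
                            size_t *offsets, size_t max_offsets,
                            size_t *n_offsets)
{
    size_t processed_bytes = 0;
    const char *nl;

    assert(st != NULL && buf != NULL && offsets != NULL && n_offsets != NULL);

    *n_offsets = 0;
    while (processed_bytes < len &&
           (nl = memchr(buf + processed_bytes, '\n',
                        len - processed_bytes)) != NULL) {
        processed_bytes = (size_t) (nl - buf) + 1;

        /* Per-line assignment: set directly to the running total. */
        st->last_processed_bytes = processed_bytes;

        if (*n_offsets < max_offsets) {
            offsets[(*n_offsets)++] = current_offset(st);
        }
    }

    /* Per-chunk step: fold the total into stream_offset, then reset. */
    st->stream_offset += processed_bytes;
    st->last_processed_bytes = 0;

    return processed_bytes;
}
```

Because last_processed_bytes is reset once the chunk total has been folded into stream_offset, the first line of the next chunk reports an offset that continues from where the previous chunk ended instead of counting those bytes twice.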
Enter [N/A] in the box, if an item is not applicable to your change.
Testing
Before we can approve your change; please submit the following in a comment:
If this is a change to packaging of containers or native binaries then please confirm it works for all targets.
ok-package-test label to test for all targets (requires maintainer to do).
Documentation
Backporting
Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.
Summary by CodeRabbit
New Features
Bug Fixes
Tests