From 13595da9f3c682949352a7404fbf8c9a10fd79ae Mon Sep 17 00:00:00 2001
From: Hiroshi Hatake <hiroshi@chronosphere.io>
Date: Fri, 11 Oct 2024 15:17:32 +0900
Subject: [PATCH 1/5] in_tail: Add a description and note for Unicode.Encoding
 parameter

Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
---
 pipeline/inputs/tail.md | 12 ++++++++++++
 1 file changed, 12 insertions(+)
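
Reviewer note (illustration only, not part of the change): the two-byte alignment mentioned in the `Unicode.Encoding` description matters because each UTF-16 code unit occupies two bytes. The Python sketch below is not Fluent Bit code; the sample string, the `decode_chunks` helper, and the chunk sizes are hypothetical. It decodes fixed-size chunks independently, as a naive reader might, to show how an odd chunk size can break characters at chunk boundaries.

```python
# Illustration only: a naive chunked reader, not Fluent Bit's implementation.
text = "ログ: 起動完了\n"           # hypothetical log line
data = text.encode("utf-16-le")     # every character here is one 2-byte code unit


def decode_chunks(buf: bytes, chunk_size: int) -> str:
    """Decode each fixed-size chunk on its own, with no carry-over between chunks."""
    parts = []
    for start in range(0, len(buf), chunk_size):
        chunk = buf[start:start + chunk_size]
        # errors="replace" turns undecodable or truncated input into U+FFFD
        parts.append(chunk.decode("utf-16-le", errors="replace"))
    return "".join(parts)


aligned = decode_chunks(data, 4)    # chunk size is a multiple of two bytes
unaligned = decode_chunks(data, 5)  # odd chunk size cuts a code unit in half

print(aligned == text)
print(unaligned == text)
```

With the even chunk size the text survives intact, while the odd chunk size leaves replacement characters where a code unit was split.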

diff --git a/pipeline/inputs/tail.md b/pipeline/inputs/tail.md
index 9d56320b7..a9f348309 100644
--- a/pipeline/inputs/tail.md
+++ b/pipeline/inputs/tail.md
@@ -37,6 +37,7 @@ The plugin supports the following configuration parameters:
 | `static_batch_size`   | Set the maximum number of bytes to process per iteration for the monitored static files (files that already exist upon Fluent Bit start).                                                                                                                                                                                                                                                                                                                          | `50M`     |
 | `file_cache_advise`   | Set the `posix_fadvise` in `POSIX_FADV_DONTNEED` mode. This reduces the usage of the kernel file cache. This option is ignored if not running on Linux.                                                                                                                                                                                                                                                                                                            | `on`      |
 | `threaded`            | Indicates whether to run this input in its own [thread](../../administration/multithreading.md#inputs).                                                                                                                                                                                                                                                                                                                                                            | `false`   |
+| `Unicode.Encoding`    | Set the Unicode character encoding of the file data. This parameter requests two-byte aligned chunk and buffer sizes. If data is not aligned for two bytes, Fluent Bit will use two-byte alignment automatically to avoid character breakages on consuming boundaries. Supported values: `UTF-16LE`, `UTF-16BE`, and `auto`.                                                                                                                                       | `none`    |
 
 ## Buffers and memory management
 
@@ -77,6 +78,17 @@ If no database file is present, positioning behavior depends on the value of `re
 
 The database file essentially stores `inode=offset` so it should be unique per instance of the plugin, for example if you have two tail inputs then use two separate `db` files for each. That way each tail input can independently track its own state.
 
+{% hint style="info" %}
+Note that `Unicode.Encoding` depends on simdutf library which is written in C++11 or above.
+So, the older platforms are not supported for this feature.
+In addition, `Unicode.Encoding auto` is not covered for the all of the usages.
+This is because sometimes this auto-detecting for character encodings makes a mistake to guess the correct encoding.
+
+We recommend to use `UTF-16LE` or `UTF-16BE` if the target file encoding is pre-determined or known beforehand.
+In details, this parameter requests to use 2-bytes aligned chunk and buffer sizes.
+If they are not aligned for 2 bytes, Fluent Bit will use 2-bytes alignments automatically to avoid character breakages on consuming boundaries.
+{% endhint %}
+
 ## Monitor a large number of files
 
 To monitor a large number of files, you can increase the `inotify` settings in your Linux environment by modifying the following `sysctl` parameters:

From cbfe1e919f0b6def4a82940c7bb2585f6be81b90 Mon Sep 17 00:00:00 2001
From: Hiroshi Hatake <hatake@calyptia.com>
Date: Tue, 8 Jul 2025 12:06:21 +0900
Subject: [PATCH 2/5] Update pipeline/inputs/tail.md

Co-authored-by: Alexa Kreizinger <alexakreizinger@gmail.com>
Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
---
 pipeline/inputs/tail.md | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/pipeline/inputs/tail.md b/pipeline/inputs/tail.md
index a9f348309..3a39949b6 100644
--- a/pipeline/inputs/tail.md
+++ b/pipeline/inputs/tail.md
@@ -79,14 +79,9 @@ If no database file is present, positioning behavior depends on the value of `re
 The database file essentially stores `inode=offset` so it should be unique per instance of the plugin, for example if you have two tail inputs then use two separate `db` files for each. That way each tail input can independently track its own state.
 
 {% hint style="info" %}
-Note that `Unicode.Encoding` depends on simdutf library which is written in C++11 or above.
-So, the older platforms are not supported for this feature.
-In addition, `Unicode.Encoding auto` is not covered for the all of the usages.
-This is because sometimes this auto-detecting for character encodings makes a mistake to guess the correct encoding.
-
-We recommend to use `UTF-16LE` or `UTF-16BE` if the target file encoding is pre-determined or known beforehand.
-In details, this parameter requests to use 2-bytes aligned chunk and buffer sizes.
-If they are not aligned for 2 bytes, Fluent Bit will use 2-bytes alignments automatically to avoid character breakages on consuming boundaries.
+The `Unicode.Encoding` parameter is dependent on the simdutf library, which is itself dependent on C++ version 11 or later. In environments that use earlier versions of C++, the `Unicode.Encoding` parameter will fail.
+
+Additionally, the `auto` setting for `Unicode.Encoding` isn't supported in all cases, and can make mistakes when it tries to guess the correct encoding. For best results, use either the `UTF-16LE` or `UTF-16BE` setting if you know the encoding type of the target file.
 {% endhint %}
 
 ## Monitor a large number of files

From e54556fba6a0af5ab39f4068f5156d3d99c7efbe Mon Sep 17 00:00:00 2001
From: Hiroshi Hatake <hiroshi@chronosphere.io>
Date: Wed, 22 Oct 2025 16:57:18 +0900
Subject: [PATCH 3/5] in_tail: Add generic.encoding parameter descriptions

Also explain why these parameters are needed and how to use them.

Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
---
 pipeline/inputs/tail.md | 91 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 91 insertions(+)
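
Reviewer note (illustration only, not part of the change): the garbled output that the new section describes can be reproduced outside Fluent Bit. The Python sketch below assumes a hypothetical log line and uses Python's own `cp932` codec, which corresponds to the `CP932` alias listed for `ShiftJIS`. It shows what happens when ShiftJIS bytes are read as UTF-8 and what a correct conversion to UTF-8 yields. It says nothing about how Fluent Bit performs the conversion internally.

```python
# Illustration only: Python codecs stand in for the conversion, not Fluent Bit code.
original = "エラー: 接続失敗"        # hypothetical log message
raw = original.encode("cp932")       # bytes as a ShiftJIS (CP932) application might write them

# Reading the bytes as if they were UTF-8 produces replacement characters.
misread = raw.decode("utf-8", errors="replace")

# Decoding with the correct source encoding, then re-encoding, yields valid UTF-8.
converted = raw.decode("cp932").encode("utf-8")

print(misread == original)
print(converted.decode("utf-8") == original)
```

The first comparison fails because the legacy bytes aren't valid UTF-8. The second succeeds because the source encoding was declared, which is the role `Generic.Encoding` plays in the input configuration.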

diff --git a/pipeline/inputs/tail.md b/pipeline/inputs/tail.md
index 3a39949b6..5b0b5d3bb 100644
--- a/pipeline/inputs/tail.md
+++ b/pipeline/inputs/tail.md
@@ -38,6 +38,7 @@ The plugin supports the following configuration parameters:
 | `file_cache_advise`   | Set the `posix_fadvise` in `POSIX_FADV_DONTNEED` mode. This reduces the usage of the kernel file cache. This option is ignored if not running on Linux.                                                                                                                                                                                                                                                                                                            | `on`      |
 | `threaded`            | Indicates whether to run this input in its own [thread](../../administration/multithreading.md#inputs).                                                                                                                                                                                                                                                                                                                                                            | `false`   |
 | `Unicode.Encoding`    | Set the Unicode character encoding of the file data. This parameter requests two-byte aligned chunk and buffer sizes. If data is not aligned for two bytes, Fluent Bit will use two-byte alignment automatically to avoid character breakages on consuming boundaries. Supported values: `UTF-16LE`, `UTF-16BE`, and `auto`.                                                                                                                                       | `none`    |
+| `Generic.Encoding`    | Set the non-Unicode encoding of the file data. Supported values: `ShiftJIS`, `UHC`, `GBK`, `GB18030`, `Big5`, `Win866`, `Win874`, `Win1250`, `Win1251`, `Win1252`, `Win1253`, `Win1254`, `Win1255`, and `Win1256`.      | `none`    |
 
 ## Buffers and memory management
 
@@ -84,6 +85,13 @@ The `Unicode.Encoding` parameter is dependent on the simdutf library, which is i
 Additionally, the `auto` setting for `Unicode.Encoding` isn't supported in all cases, and can make mistakes when it tries to guess the correct encoding. For best results, use either the `UTF-16LE` or `UTF-16BE` setting if you know the encoding type of the target file.
 {% endhint %}
 
+{% hint style="info" %}
+The `Unicode.Encoding` parameter is dependent on the simdutf library, which is itself dependent on C++ version 11 or later. In environments that use earlier versions of C++, the `Unicode.Encoding` parameter will fail.
+
+Additionally, the `auto` setting for `Unicode.Encoding` isn't supported in all cases, and can make mistakes when it tries to guess the correct encoding. For best results, use either the `UTF-16LE` or `UTF-16BE` setting if you know the encoding type of the target file.
+{% endhint %}
+
+
 ## Monitor a large number of files
 
 To monitor a large number of files, you can increase the `inotify` settings in your Linux environment by modifying the following `sysctl` parameters:
@@ -464,3 +472,86 @@ While file rotation is handled, there are risks of potential log loss when using
 - Final note: the `Path` patterns can't match the rotated files. Otherwise, the rotated file would be read again and lead to duplicate records.
 
 {% endhint %}
+
+## Character Encoding Conversion
+
+This feature allows Fluent Bit to convert logs from various character encodings into the standard UTF-8 format.
+This is crucial for processing logs from systems, especially Windows, that use legacy or non-UTF-8 encodings.
+Proper conversion ensures that your log data is correctly parsed, indexed, and searchable.
+
+### When to Use This Feature
+
+You should use this feature if your log files or messages are not in UTF-8 and you are seeing garbled or incorrectly rendered characters.
+This is common in environments that use:
+
+* Modern Windows applications that log in UTF-16.
+
+* Legacy Windows systems with applications that use traditional code pages (e.g., ShiftJIS, GBK, Win1252).
+
+### Configuration Parameters
+
+To enable encoding conversion, you will use one of the following two parameters within an input plugin configuration.
+
+1. `Unicode.Encoding`
+
+Use this parameter for high-performance conversion of UTF-16 encoded logs to UTF-8. This method utilizes modern processor features (SIMD instructions) to accelerate the conversion process, making it highly efficient.
+
+* Use Case: Ideal for logs coming from modern Windows environments that default to UTF-16.
+* Supported Values:
+    * UTF-16LE (Little-Endian)
+    * UTF-16BE (Big-Endian)
+
+2. `Generic.Encoding`
+
+Use this parameter to convert from a wide variety of other character encodings, particularly legacy Windows code pages.
+
+* Use Case: Essential for logs from older systems or applications configured for specific regions, common in East Asia and Eastern Europe.
+* Supported Values: You can use any of the names or aliases listed below.
+
+### East Asian Encodings
+* `ShiftJIS` (Aliases: `SJIS`, `CP932`, `Windows-31J`)
+* `GB18030`
+* `GBK`: (Alias: `CP936`)
+* `UHC` (Unified Hangul Code): (Aliases: `CP949` and `Windows-949`)
+* `Big5`: (Alias: `CP950`)
+
+### Windows (ANSI) Encodings
+* `Win1250` (Central European): (Alias: `CP1250`)
+* `Win1251` (Cyrillic): (Alias: `CP1251`)
+* `Win1252` (Western European / Latin): (Alias: `CP1252`)
+* `Win1253` (Greek): (Alias: `CP1253`)
+* `Win1254` (Turkish): (Alias: `CP1254`)
+* `Win1255` (Hebrew): (Alias: `CP1255`)
+* `Win1256` (Arabic): (Alias: `CP1256`)
+
+### DOS (OEM) Encodings
+* `Win866` (Cyrillic - DOS): (Alias: `CP866`)
+* `Win874` (Thai): (Alias: `CP874`)
+
+### Configuration Example
+
+Here is an example of how to use `Generic.Encoding` with the Tail input plugin to read a log file encoded in ShiftJIS.
+
+{% tabs %}
+{% tab title="fluent-bit.yaml" %}
+
+```yaml
+pipeline:
+  inputs:
+    - name: tail
+      path: /var/log/containers/*.log
+      generic.encoding: ShiftJIS
+```
+
+{% endtab %}
+{% tab title="fluent-bit.conf" %}
+
+```text
+[INPUT]
+    Name                tail
+    Path                C:\path\to\your\sjis.log
+    Generic.Encoding    ShiftJIS
+```
+
+{% endtab %}
+{% endtabs %}
\ No newline at end of file

From 37e837d2931c586679ac4224e4b6f7fbab13d0ec Mon Sep 17 00:00:00 2001
From: Hiroshi Hatake <hiroshi@chronosphere.io>
Date: Wed, 22 Oct 2025 17:00:54 +0900
Subject: [PATCH 4/5] Suppress lint warnings

Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
---
 pipeline/inputs/tail.md | 49 ++++++++++++++++++++++-------------------
 1 file changed, 26 insertions(+), 23 deletions(-)

diff --git a/pipeline/inputs/tail.md b/pipeline/inputs/tail.md
index 5b0b5d3bb..93adf9663 100644
--- a/pipeline/inputs/tail.md
+++ b/pipeline/inputs/tail.md
@@ -484,9 +484,9 @@ Proper conversion ensures that your log data is correctly parsed, indexed, and s
 You should use this feature if your log files or messages are not in UTF-8 and you are seeing garbled or incorrectly rendered characters.
 This is common in environments that use:
 
-* Modern Windows applications that log in UTF-16.
+- Modern Windows applications that log in UTF-16.
 
-* Legacy Windows systems with applications that use traditional code pages (e.g., ShiftJIS, GBK, Win1252).
+- Legacy Windows systems with applications that use traditional code pages (e.g., ShiftJIS, GBK, Win1252).
 
 ### Configuration Parameters
 
@@ -496,37 +496,40 @@ To enable encoding conversion, you will use one of the following two parameters
 
 Use this parameter for high-performance conversion of UTF-16 encoded logs to UTF-8. This method utilizes modern processor features (SIMD instructions) to accelerate the conversion process, making it highly efficient.
 
-* Use Case: Ideal for logs coming from modern Windows environments that default to UTF-16.
-* Supported Values:
-    * UTF-16LE (Little-Endian)
-    * UTF-16BE (Big-Endian)
+- Use Case: Ideal for logs coming from modern Windows environments that default to UTF-16.
+- Supported Values:
+  - UTF-16LE (Little-Endian)
+  - UTF-16BE (Big-Endian)
 
-2. `Generic.Encoding`
+1. `Generic.Encoding`
 
 Use this parameter to convert from a wide variety of other character encodings, particularly legacy Windows code pages.
 
-* Use Case: Essential for logs from older systems or applications configured for specific regions, common in East Asia and Eastern Europe.
-* Supported Values: You can use any of the names or aliases listed below.
+- Use Case: Essential for logs from older systems or applications configured for specific regions, common in East Asia and Eastern Europe.
+- Supported Values: You can use any of the names or aliases listed below.
 
 ### East Asian Encodings
-* `ShiftJIS` (Aliases: `SJIS`, `CP932`, `Windows-31J`)
-* `GB18030`
-* `GBK`: (Alias: `CP936`)
-* `UHC` (Unified Hangul Code): (Aliases: `CP949` and `Windows-949`)
-* `Big5`: (Alias: `CP950`)
+
+- `ShiftJIS` (Aliases: `SJIS`, `CP932`, `Windows-31J`)
+- `GB18030`
+- `GBK`: (Alias: `CP936`)
+- `UHC` (Unified Hangul Code): (Aliases: `CP949` and `Windows-949`)
+- `Big5`: (Alias: `CP950`)
 
 ### Windows (ANSI) Encodings
-* `Win1250` (Central European): (Alias: `CP1250`)
-* `Win1251` (Cyrillic): (Alias: `CP1251`)
-* `Win1252` (Western European / Latin): (Alias: `CP1252`)
-* `Win1253` (Greek): (Alias: `CP1253`)
-* `Win1254` (Turkish): (Alias: `CP1254`)
-* `Win1255` (Hebrew): (Alias: `CP1255`)
-* `Win1256` (Arabic): (Alias: `CP1256`)
+
+- `Win1250` (Central European): (Alias: `CP1250`)
+- `Win1251` (Cyrillic): (Alias: `CP1251`)
+- `Win1252` (Western European / Latin): (Alias: `CP1252`)
+- `Win1253` (Greek): (Alias: `CP1253`)
+- `Win1254` (Turkish): (Alias: `CP1254`)
+- `Win1255` (Hebrew): (Alias: `CP1255`)
+- `Win1256` (Arabic): (Alias: `CP1256`)
 
 ### DOS (OEM) Encodings
-* `Win866` (Cyrillic - DOS): (Alias: `CP866`)
-* `Win874` (Thai): (Alias: `CP874`)
+
+- `Win866` (Cyrillic - DOS): (Alias: `CP866`)
+- `Win874` (Thai): (Alias: `CP874`)
 
 ### Configuration Example
 

From a17048ae4f03a820adcb0cd9f5869b8f4e735657 Mon Sep 17 00:00:00 2001
From: Lynette  Miles <6818907+esmerel@users.noreply.github.com>
Date: Wed, 22 Oct 2025 14:27:53 -0700
Subject: [PATCH 5/5] Apply suggestions from code review

This corrects the severe Vale errors, addresses most of the suggestions, and matches the current style.

Signed-off-by: Lynette  Miles <6818907+esmerel@users.noreply.github.com>
---
 pipeline/inputs/tail.md | 34 +++++++++++++++++-----------------
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/pipeline/inputs/tail.md b/pipeline/inputs/tail.md
index 93adf9663..ef5493143 100644
--- a/pipeline/inputs/tail.md
+++ b/pipeline/inputs/tail.md
@@ -86,7 +86,7 @@ Additionally, the `auto` setting for `Unicode.Encoding` isn't supported in all c
 {% endhint %}
 
 {% hint style="info" %}
-The `Unicode.Encoding` parameter is dependent on the simdutf library, which is itself dependent on C++ version 11 or later. In environments that use earlier versions of C++, the `Unicode.Encoding` parameter will fail.
+The `Unicode.Encoding` parameter is dependent on the `simdutf` library, which is itself dependent on C++ version 11 or later. In environments that use earlier versions of C++, the `Unicode.Encoding` parameter will fail.
 
 Additionally, the `auto` setting for `Unicode.Encoding` isn't supported in all cases, and can make mistakes when it tries to guess the correct encoding. For best results, use either the `UTF-16LE` or `UTF-16BE` setting if you know the encoding type of the target file.
 {% endhint %}
@@ -473,40 +473,40 @@ While file rotation is handled, there are risks of potential log loss when using
 
 {% endhint %}
 
-## Character Encoding Conversion
+## Character encoding conversion
 
 This feature allows Fluent Bit to convert logs from various character encodings into the standard UTF-8 format.
 This is crucial for processing logs from systems, especially Windows, that use legacy or non-UTF-8 encodings.
 Proper conversion ensures that your log data is correctly parsed, indexed, and searchable.
 
-### When to Use This Feature
+### When to use this feature
 
-You should use this feature if your log files or messages are not in UTF-8 and you are seeing garbled or incorrectly rendered characters.
+Use this feature if your log files or messages aren't in UTF-8 and you see garbled or incorrectly rendered characters.
 This is common in environments that use:
 
 - Modern Windows applications that log in UTF-16.
 
-- Legacy Windows systems with applications that use traditional code pages (e.g., ShiftJIS, GBK, Win1252).
+- Legacy Windows systems with applications that use traditional code pages (for example, ShiftJIS, GBK, Win1252).
 
-### Configuration Parameters
+### Configuration parameters
 
 To enable encoding conversion, you will use one of the following two parameters within an input plugin configuration.
 
 1. `Unicode.Encoding`
 
-Use this parameter for high-performance conversion of UTF-16 encoded logs to UTF-8. This method utilizes modern processor features (SIMD instructions) to accelerate the conversion process, making it highly efficient.
+   Use this parameter for high-performance conversion of UTF-16 encoded logs to UTF-8. This method uses SIMD instructions available on modern processors to accelerate the conversion.
 
-- Use Case: Ideal for logs coming from modern Windows environments that default to UTF-16.
-- Supported Values:
-  - UTF-16LE (Little-Endian)
-  - UTF-16BE (Big-Endian)
+   - Use case: Ideal for logs from modern Windows environments that default to UTF-16.
+   - Supported values:
+     - `UTF-16LE` (little-endian)
+     - `UTF-16BE` (big-endian)
 
 1. `Generic.Encoding`
 
-Use this parameter to convert from a wide variety of other character encodings, particularly legacy Windows code pages.
+   Use this parameter to convert from a wide variety of other character encodings, particularly legacy Windows code pages.
 
-- Use Case: Essential for logs from older systems or applications configured for specific regions, common in East Asia and Eastern Europe.
-- Supported Values: You can use any of the names or aliases listed below.
+   - Use case: Essential for logs from older systems or applications configured for specific regions, which is common in East Asia and Eastern Europe.
+   - Supported values: Use any of the names or aliases listed in the following sections.
 
-### East Asian Encodings
+### East Asian encodings
 
@@ -516,7 +516,7 @@ Use this parameter to convert from a wide variety of other character encodings,
 - `UHC` (Unified Hangul Code): (Aliases: `CP949` and `Windows-949`)
 - `Big5`: (Alias: `CP950`)
 
-### Windows (ANSI) Encodings
+### Windows (ANSI) encodings
 
 - `Win1250` (Central European): (Alias: `CP1250`)
 - `Win1251` (Cyrillic): (Alias: `CP1251`)
@@ -526,12 +526,12 @@ Use this parameter to convert from a wide variety of other character encodings,
 - `Win1255` (Hebrew): (Alias: `CP1255`)
 - `Win1256` (Arabic): (Alias: `CP1256`)
 
-### DOS (OEM) Encodings
+### DOS (OEM) encodings
 
 - `Win866` (Cyrillic - DOS): (Alias: `CP866`)
 - `Win874` (Thai): (Alias: `CP874`)
 
-### Configuration Example
+### Configuration example
 
 Here is an example of how to use `Generic.Encoding` with the Tail input plugin to read a log file encoded in ShiftJIS.