fix: handle UTF-16LE encoding errors in Rack request parsing #2032

amolmjoshi93 · 2026-01-05T12:48:21Z

Fixes production Encoding::CompatibilityError in rack/query_parser.rb
Closes #2031

Summary by CodeRabbit

Bug Fixes
- Prevented encoding errors from malformed or non‑UTF‑8 request data by adding request sanitization early in request processing; query strings and request bodies are now normalized to valid UTF‑8 to avoid encoding-related failures.
Tests
- Added comprehensive tests for encoding sanitization covering multiple encodings, invalid byte sequences, and request-body/stream behaviors.

coderabbitai · 2026-01-05T12:48:32Z

Walkthrough

Adds a new Rails middleware, EncodingSanitizer, that sanitizes request environment strings and wraps rack.input so that non-UTF-8 data is converted or scrubbed into valid UTF-8 before any downstream middleware runs.
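
To see why the coercion has to happen before anything parses the request, the failure named in issue #2031 can be reproduced in plain Ruby. This is a minimal sketch, not code from the PR; the string values are made up for illustration.

```ruby
# Illustrative values only; not taken from the PR or from production traffic.
utf16_query = "q=1".encode(Encoding::UTF_16LE)
utf8_suffix = "&page=2"

message =
  begin
    # UTF-16LE is not ASCII-compatible, so Ruby refuses to combine it with UTF-8.
    utf16_query + utf8_suffix
    nil
  rescue Encoding::CompatibilityError => e
    e.message
  end

# Transcoding first yields a string that concatenates without error.
safe_query = utf16_query.encode(Encoding::UTF_8) + utf8_suffix
```

Any string operation that mixes the two encodings can hit the same error, which is presumably what happens inside the query parser when it receives an unsanitized value.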

Changes

Cohort / File(s)	Summary
EncodingSanitizer Middleware `config/initializers/encoding_sanitizer.rb`	New `EncodingSanitizer` middleware with `initialize(app)` and `call(env)`. Sanitizes `QUERY_STRING`, `REQUEST_URI`, `PATH_INFO`, `HTTP_REFERER`; wraps `env['rack.input']` with `SanitizedInput` (subclass of `SimpleDelegator`) that ensures `read`, `gets`, and `each` yield UTF-8-safe data. Inserted before `ActionDispatch::Static` (runs prior to `Rack::MethodOverride`).
Middleware Test Suite `spec/middleware/encoding_sanitizer_spec.rb`	New RSpec coverage for query string and POST body sanitization (UTF-8 pass-through, UTF-16LE conversion, invalid byte handling, nil envs), `SanitizedInput` behaviors (`read`, `gets`, `each`, `rewind`, `close`), and middleware ordering assertions.

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant EncodingSanitizer as "EncodingSanitizer\n(middleware)"
  participant ActionDispatch as "ActionDispatch::Static"
  participant MethodOverride as "Rack::MethodOverride"
  participant RackInput as "rack.input\n(SanitizedInput)"
  participant App

  Client->>EncodingSanitizer: HTTP request (env, raw body)
  note right of EncodingSanitizer: sanitize env keys\nwrap env['rack.input'] -> SanitizedInput
  EncodingSanitizer->>ActionDispatch: forward sanitized env
  ActionDispatch->>MethodOverride: forward env
  MethodOverride->>RackInput: read POST body (read/each/gets)
  RackInput-->>MethodOverride: sanitized UTF-8 chunks
  MethodOverride->>App: forward parsed request
  App-->>Client: HTTP response

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I hop through bytes both odd and small,
Turn strange encodings tidy and all,
Replace the broken, mend each line,
Make UTF‑8 neat, one nibble at a time,
Now requests sleep safe in my warren of code.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately describes the main change: adding encoding error handling for UTF-16LE in Rack request parsing via a middleware.
Linked Issues check	✅ Passed	All requirements from issue `#2031` are met: EncodingSanitizer middleware sanitizes QUERY_STRING, REQUEST_URI, PATH_INFO, HTTP_REFERER and wraps rack.input to handle UTF-16LE data, converting to valid UTF-8.
Out of Scope Changes check	✅ Passed	All changes are directly scoped to addressing issue `#2031`: the middleware implementation and comprehensive test coverage with no unrelated modifications.

coderabbitai

Actionable comments posted: 1

Fix all issues with AI Agents 🤖

In @config/initializers/encoding_sanitizer.rb:
- Around line 36-42: In the force_utf8 method, remove the redundant call to
force_encoding(Encoding::UTF_8) on the success path since
encode(Encoding::UTF_8, invalid: :replace, undef: :replace, replace: "") already
returns a UTF-8 string; keep the rescue branch as-is
(value.dup.force_encoding(Encoding::UTF_8).scrub("")) to handle encoding errors.
Ensure you only delete the trailing .force_encoding(Encoding::UTF_8) after the
encode call in force_utf8 and run tests to confirm behavior remains correct.
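
A rough sketch of what `force_utf8` might look like after that cleanup. The method body is inferred from the comment above rather than copied from the PR, and rescuing `EncodingError` is an assumption about which exceptions the original rescue clause covers.

```ruby
# Sketch only: reconstructed from the review comment, not the PR's actual code.
def force_utf8(value)
  # encode already returns a UTF-8 string, so no trailing force_encoding is needed.
  value.encode(Encoding::UTF_8, invalid: :replace, undef: :replace, replace: "")
rescue EncodingError
  # Fallback kept as described: relabel the bytes as UTF-8, then drop invalid ones.
  value.dup.force_encoding(Encoding::UTF_8).scrub("")
end

converted = force_utf8("test=value".encode(Encoding::UTF_16LE))
```

Here `converted` is tagged as UTF-8 straight from `encode`, which is why the extra `force_encoding` call on the success path adds nothing.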

🧹 Nitpick comments (4)

config/initializers/encoding_sanitizer.rb (2)
11-23: Consider adding error handling around sanitization logic.

The middleware performs encoding operations without rescue blocks in the call method. If an unexpected encoding error occurs during sanitization that isn't caught by the rescue clauses in force_utf8, it could cause the middleware to raise an exception and crash the request. Consider wrapping the sanitization logic in a top-level rescue block to ensure the middleware degrades gracefully.
🔎 Proposed defensive error handling
 def call(env)
+  begin
     # Sanitize URL-related env vars
     %w[QUERY_STRING REQUEST_URI PATH_INFO HTTP_REFERER].each do |key|
       sanitize_encoding(env, key)
     end
 
     # Wrap rack.input to sanitize POST body
     if env["rack.input"]
       env["rack.input"] = SanitizedInput.new(env["rack.input"])
     end
+  rescue => e
+    # Log error but don't crash the request
+    Rails.logger.error("EncodingSanitizer error: #{e.message}")
+  end
 
   @app.call(env)
 end
45-75: Consider implementing method delegation for complete rack.input compatibility.

The SanitizedInput wrapper implements read, gets, each, rewind, and close, but rack.input may have additional methods like size, pos, pos=, eof?, string, etc. that downstream Rack middleware or parsers might expect. Missing these could cause NoMethodError exceptions.
🔎 Proposed enhancement using SimpleDelegator
+require 'delegate'
+
 # Wrapper for rack.input that sanitizes encoding on read
-class SanitizedInput
+class SanitizedInput < SimpleDelegator
   def initialize(input)
-    @input = input
+    super(input)
   end
 
   def read(*args)
-    data = @input.read(*args)
+    data = __getobj__.read(*args)
     return data unless data.is_a?(String)
 
     sanitize(data)
   end
 
   def gets(*args)
-    data = @input.gets(*args)
+    data = __getobj__.gets(*args)
     return data unless data.is_a?(String)
 
     sanitize(data)
   end
 
   def each(&block)
-    @input.each { |line| block.call(sanitize(line)) }
-  end
-
-  def rewind
-    @input.rewind
-  end
-
-  def close
-    @input.close if @input.respond_to?(:close)
+    __getobj__.each { |line| block.call(sanitize(line)) }
   end
This ensures all other methods are delegated automatically.
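
A small standalone sketch of that delegation behavior, using `StringIO` as a stand-in for the real `rack.input`. `ScrubbingInput` is a hypothetical name used here for illustration, and only `read` is overridden.

```ruby
require "delegate"
require "stringio"

# Hypothetical stand-in for SanitizedInput: only #read is overridden here.
class ScrubbingInput < SimpleDelegator
  def read(*args)
    data = __getobj__.read(*args)
    return data unless data.is_a?(String)

    # Relabel as UTF-8 and drop any invalid byte sequences.
    data.dup.force_encoding(Encoding::UTF_8).scrub("")
  end
end

input = ScrubbingInput.new(StringIO.new("a\xFFb".b))
body = input.read   # overridden: goes through the scrub step
input.rewind        # not defined on the wrapper: delegated to StringIO
```

Methods such as `pos`, `size`, and `eof?` are answered by the wrapped object without any code in the wrapper, so downstream readers that call them do not hit `NoMethodError`.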
spec/middleware/encoding_sanitizer_spec.rb (2)
25-41: Verify content preservation after encoding conversion.

The test confirms that UTF-16LE is converted to valid UTF-8, but doesn't verify that the actual content is preserved correctly. Consider adding an assertion to check the decoded value matches the original semantic content.
🔎 Proposed enhancement
 it "converts to valid UTF-8" do
   # Simulate UTF-16LE encoded string
   utf16_string = "test=value".encode(Encoding::UTF_16LE)
   env = {
     "QUERY_STRING" => utf16_string,
     "REQUEST_URI" => "/test",
     "PATH_INFO" => "/test"
   }
 
   status, response_env, _body = middleware.call(env)
 
   expect(status).to eq(200)
   expect(response_env["QUERY_STRING"].encoding).to eq(Encoding::UTF_8)
   expect(response_env["QUERY_STRING"]).to be_valid_encoding
+  expect(response_env["QUERY_STRING"]).to include("test")
+  expect(response_env["QUERY_STRING"]).to include("value")
 end
73-132: Add test coverage for gets and close methods.

The test suite covers read, rewind, and each, but the SanitizedInput class also implements gets and close methods that lack test coverage. Adding tests for these methods would ensure complete coverage of the wrapper's behavior.
🔎 Proposed additional tests
describe "#gets" do
  it "sanitizes line-by-line reads" do
    body = "line1\nline2"
    input = StringIO.new(body)
    sanitized = described_class.new(input)

    first_line = sanitized.gets
    expect(first_line).to be_valid_encoding
    expect(first_line).to eq("line1\n")
  end
end

describe "#close" do
  it "delegates close to underlying input" do
    input = StringIO.new("test")
    sanitized = described_class.new(input)

    expect(input).to receive(:close)
    sanitized.close
  end
end

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between bc587a0 and 92a6223.

📒 Files selected for processing (2)

config/initializers/encoding_sanitizer.rb
spec/middleware/encoding_sanitizer_spec.rb

🧰 Additional context used

🧬 Code graph analysis (1)

spec/middleware/encoding_sanitizer_spec.rb (1)

config/initializers/encoding_sanitizer.rb (4)

call (11-23)

read (50-55)

rewind (68-70)

each (64-66)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Running tests

🔇 Additional comments (7)

config/initializers/encoding_sanitizer.rb (4)

1-9: LGTM!

Standard middleware initialization pattern with clear documentation.

27-34: LGTM!

Efficient early returns avoid unnecessary sanitization for already-valid UTF-8 strings.
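
The guard being praised is not shown in this review, so the following is a hypothetical reconstruction of an early-return check. The branch that handles strings needing work is an assumption added to make the sketch runnable.

```ruby
# Hypothetical reconstruction; the PR's actual method body is not quoted in the review.
def sanitize_encoding(env, key)
  value = env[key]
  return unless value.is_a?(String) # nil or missing key: nothing to do
  return if value.encoding == Encoding::UTF_8 && value.valid_encoding? # already clean

  env[key] =
    if value.encoding == Encoding::UTF_8
      value.scrub("") # tagged UTF-8 but contains invalid bytes
    else
      value.encode(Encoding::UTF_8, invalid: :replace, undef: :replace, replace: "")
    end
end

clean = "a=1"
env = {
  "QUERY_STRING" => clean,
  "PATH_INFO" => nil,
  "HTTP_REFERER" => "x=1".encode(Encoding::UTF_16LE)
}
env.keys.each { |key| sanitize_encoding(env, key) }
```

In this sketch the clean query string is left as the same object, the nil entry is skipped, and only the UTF-16LE value is rewritten.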

90-93: LGTM!

Correct insertion point ensures encoding is sanitized before downstream middleware parses the request.

78-86: This inconsistency is by design and not a bug.

The two methods use different encoding strategies intentionally based on their input sources. The force_utf8 method handles URL-related environment variables that are already decoded as strings by the web server, while the sanitize method handles raw bytes from rack.input. The ASCII_8BIT intermediate step in sanitize is a standard Ruby pattern for safely handling unknown or mixed binary data from the network, and this approach is correctly applied to the riskier input type. Both methods share identical error handling with fallback to force_encoding(Encoding::UTF_8).scrub(""). The comment on line 81 explicitly documents this intentional strategy: "Force to binary first, then encode to UTF-8."

Likely an incorrect or invalid review comment.
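
For reference, the binary-first strategy described above might look roughly like the following. This is an assumed shape based on the quoted comment ("Force to binary first, then encode to UTF-8"), not the PR's code, and `EncodingError` as the rescued class is also an assumption.

```ruby
# Assumed shape of the binary-first step; not copied from the PR.
def sanitize(data)
  data.dup
      .force_encoding(Encoding::ASCII_8BIT)
      .encode(Encoding::UTF_8, invalid: :replace, undef: :replace, replace: "")
rescue EncodingError
  data.dup.force_encoding(Encoding::UTF_8).scrub("")
end

ascii_body = sanitize("name=caf")
accented_body = sanitize("name=caf\u00E9")
```

Under this assumed shape, ASCII bytes pass through unchanged, while bytes at or above 0x80 have no defined mapping out of ASCII-8BIT and are removed by `undef: :replace`, so the two bytes of the accented character in `accented_body` are dropped.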

spec/middleware/encoding_sanitizer_spec.rb (3)

1-23: LGTM!

Well-structured test setup. The mock app returning env as the response body enables easy verification of sanitization effects.

43-71: LGTM!

Good coverage of edge cases including invalid byte sequences and nil values. These tests ensure robustness in production scenarios.

134-150: LGTM! Middleware ordering is correctly verified.

These tests ensure the middleware is positioned correctly to sanitize encoding before other middleware parses the request. Note that these tests depend on the presence of ActionDispatch::Static and Rack::MethodOverride in the middleware stack.

config/initializers/encoding_sanitizer.rb

…ncodingcompatibilityerror-incompatible-character-encodings-utf-16le-and-utf-8

- Add error handling in call method to prevent request crashes
- Refactor SanitizedInput to use SimpleDelegator for better compatibility
- Add test coverage for gets and close methods
- Verify content preservation in UTF-16LE encoding test
- Ensure all rack.input methods are properly delegated

All 13 tests passing with no diagnostics.

fix: handle UTF-16LE encoding errors in Rack request parsing

92a6223

Fixes production Encoding::CompatibilityError in rack/query_parser.rb

amolmjoshi93 linked an issue Jan 5, 2026 that may be closed by this pull request

Rack throws `Encoding::CompatibilityError: incompatible character encodings: UTF-16LE and UTF-8` #2031

Open

amolmjoshi93 requested a review from nisusam January 5, 2026 12:48

amolmjoshi93 assigned nisusam Jan 5, 2026

coderabbitai bot reviewed Jan 5, 2026

View reviewed changes

config/initializers/encoding_sanitizer.rb (resolved)

amolmjoshi93 added 4 commits January 21, 2026 20:13

Merge remote-tracking branch 'origin/develop' into 2031-rack-throws-e…

43dd7eb

…ncodingcompatibilityerror-incompatible-character-encodings-utf-16le-and-utf-8

refactor: Removed redundant force_encoding call.

2ef7bb6

Merge remote-tracking branch 'origin/develop' into 2031-rack-throws-e…

bf5f49f

…ncodingcompatibilityerror-incompatible-character-encodings-utf-16le-and-utf-8
