-
Notifications
You must be signed in to change notification settings - Fork 89
fix: handle UTF-16LE encoding errors in Rack request parsing #2032
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
fix: handle UTF-16LE encoding errors in Rack request parsing #2032
Conversation
Fixes production Encoding::CompatibilityError in rack/query_parser.rb
📝 WalkthroughWalkthroughAdds a new Rails middleware, EncodingSanitizer, that sanitizes request environment strings and wraps Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant EncodingSanitizer as "EncodingSanitizer\n(middleware)"
participant ActionDispatch as "ActionDispatch::Static"
participant MethodOverride as "Rack::MethodOverride"
participant RackInput as "rack.input\n(SanitizedInput)"
participant App
Client->>EncodingSanitizer: HTTP request (env, raw body)
note right of EncodingSanitizer: sanitize env keys\nwrap env['rack.input'] -> SanitizedInput
EncodingSanitizer->>ActionDispatch: forward sanitized env
ActionDispatch->>MethodOverride: forward env
MethodOverride->>RackInput: read POST body (read/each/gets)
RackInput-->>MethodOverride: sanitized UTF-8 chunks
MethodOverride->>App: forward parsed request
App-->>Client: HTTP response
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Fix all issues with AI Agents 🤖
In @config/initializers/encoding_sanitizer.rb:
- Around line 36-42: In the force_utf8 method, remove the redundant call to
force_encoding(Encoding::UTF_8) on the success path since
encode(Encoding::UTF_8, invalid: :replace, undef: :replace, replace: "") already
returns a UTF-8 string; keep the rescue branch as-is
(value.dup.force_encoding(Encoding::UTF_8).scrub("")) to handle encoding errors.
Ensure you only delete the trailing .force_encoding(Encoding::UTF_8) after the
encode call in force_utf8 and run tests to confirm behavior remains correct.
🧹 Nitpick comments (4)
config/initializers/encoding_sanitizer.rb (2)
11-23: Consider adding error handling around sanitization logic.The middleware performs encoding operations without rescue blocks in the
callmethod. If an unexpected encoding error occurs during sanitization that isn't caught by the rescue clauses inforce_utf8, it could cause the middleware to raise an exception and crash the request. Consider wrapping the sanitization logic in a top-level rescue block to ensure the middleware degrades gracefully.🔎 Proposed defensive error handling
def call(env) + begin # Sanitize URL-related env vars %w[QUERY_STRING REQUEST_URI PATH_INFO HTTP_REFERER].each do |key| sanitize_encoding(env, key) end # Wrap rack.input to sanitize POST body if env["rack.input"] env["rack.input"] = SanitizedInput.new(env["rack.input"]) end + rescue => e + # Log error but don't crash the request + Rails.logger.error("EncodingSanitizer error: #{e.message}") + end @app.call(env) end
45-75: Consider implementing method delegation for complete rack.input compatibility.The
SanitizedInputwrapper implementsread,gets,each,rewind, andclose, butrack.inputmay have additional methods likesize,pos,pos=,eof?,string, etc. that downstream Rack middleware or parsers might expect. Missing these could causeNoMethodErrorexceptions.🔎 Proposed enhancement using SimpleDelegator
+require 'delegate' + # Wrapper for rack.input that sanitizes encoding on read -class SanitizedInput +class SanitizedInput < SimpleDelegator def initialize(input) - @input = input + super(input) end def read(*args) - data = @input.read(*args) + data = __getobj__.read(*args) return data unless data.is_a?(String) sanitize(data) end def gets(*args) - data = @input.gets(*args) + data = __getobj__.gets(*args) return data unless data.is_a?(String) sanitize(data) end def each(&block) - @input.each { |line| block.call(sanitize(line)) } - end - - def rewind - @input.rewind - end - - def close - @input.close if @input.respond_to?(:close) + __getobj__.each { |line| block.call(sanitize(line)) } endThis ensures all other methods are delegated automatically.
spec/middleware/encoding_sanitizer_spec.rb (2)
25-41: Verify content preservation after encoding conversion.The test confirms that UTF-16LE is converted to valid UTF-8, but doesn't verify that the actual content is preserved correctly. Consider adding an assertion to check the decoded value matches the original semantic content.
🔎 Proposed enhancement
it "converts to valid UTF-8" do # Simulate UTF-16LE encoded string utf16_string = "test=value".encode(Encoding::UTF_16LE) env = { "QUERY_STRING" => utf16_string, "REQUEST_URI" => "/test", "PATH_INFO" => "/test" } status, response_env, _body = middleware.call(env) expect(status).to eq(200) expect(response_env["QUERY_STRING"].encoding).to eq(Encoding::UTF_8) expect(response_env["QUERY_STRING"]).to be_valid_encoding + expect(response_env["QUERY_STRING"]).to include("test") + expect(response_env["QUERY_STRING"]).to include("value") end
73-132: Add test coverage forgetsandclosemethods.The test suite covers
read,rewind, andeach, but theSanitizedInputclass also implementsgetsandclosemethods that lack test coverage. Adding tests for these methods would ensure complete coverage of the wrapper's behavior.🔎 Proposed additional tests
describe "#gets" do it "sanitizes line-by-line reads" do body = "line1\nline2" input = StringIO.new(body) sanitized = described_class.new(input) first_line = sanitized.gets expect(first_line).to be_valid_encoding expect(first_line).to eq("line1\n") end end describe "#close" do it "delegates close to underlying input" do input = StringIO.new("test") sanitized = described_class.new(input) expect(input).to receive(:close) sanitized.close end end
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
config/initializers/encoding_sanitizer.rbspec/middleware/encoding_sanitizer_spec.rb
🧰 Additional context used
🧬 Code graph analysis (1)
spec/middleware/encoding_sanitizer_spec.rb (1)
config/initializers/encoding_sanitizer.rb (4)
call(11-23)read(50-55)rewind(68-70)each(64-66)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Running tests
🔇 Additional comments (7)
config/initializers/encoding_sanitizer.rb (4)
1-9: LGTM!Standard middleware initialization pattern with clear documentation.
27-34: LGTM!Efficient early returns avoid unnecessary sanitization for already-valid UTF-8 strings.
90-93: LGTM!Correct insertion point ensures encoding is sanitized before downstream middleware parses the request.
78-86: This inconsistency is by design and not a bug.The two methods use different encoding strategies intentionally based on their input sources. The
force_utf8method handles URL-related environment variables that are already decoded as strings by the web server, while thesanitizemethod handles raw bytes fromrack.input. The ASCII_8BIT intermediate step insanitizeis a standard Ruby pattern for safely handling unknown or mixed binary data from the network, and this approach is correctly applied to the riskier input type. Both methods share identical error handling with fallback toforce_encoding(Encoding::UTF_8).scrub(""). The comment on line 81 explicitly documents this intentional strategy: "Force to binary first, then encode to UTF-8."Likely an incorrect or invalid review comment.
spec/middleware/encoding_sanitizer_spec.rb (3)
1-23: LGTM!Well-structured test setup. The mock app returning
envas the response body enables easy verification of sanitization effects.
43-71: LGTM!Good coverage of edge cases including invalid byte sequences and nil values. These tests ensure robustness in production scenarios.
134-150: LGTM! Middleware ordering is correctly verified.These tests ensure the middleware is positioned correctly to sanitize encoding before other middleware parses the request. Note that these tests depend on the presence of
ActionDispatch::StaticandRack::MethodOverridein the middleware stack.
…ncodingcompatibilityerror-incompatible-character-encodings-utf-16le-and-utf-8
…ncodingcompatibilityerror-incompatible-character-encodings-utf-16le-and-utf-8
- Add error handling in call method to prevent request crashes - Refactor SanitizedInput to use SimpleDelegator for better compatibility - Add test coverage for gets and close methods - Verify content preservation in UTF-16LE encoding test - Ensure all rack.input methods are properly delegated All 13 tests passing with no diagnostics.
Fixes production Encoding::CompatibilityError in rack/query_parser.rb
Closes #2031
Summary by CodeRabbit
Bug Fixes
Tests