fix(grpc): percent-decode target URI path and reject non-UTF-8 characters #2677
arjan-bal wants to merge 9 commits into
This seems weird to me. We should be assuming UTF-8-encoded URIs in general. I could understand adding a way to get the un-decoded path, or the path as bytes, so that only this code would do something different. But I'd expect us to decode URIs as UTF-8 in general, since the encoding of the URI can differ from the encoding of the on-disk path. For example, if the system is using the Latin-1 codepage (… RFC 8089 encourages UTF-8, and to my knowledge we do represent them as strings in every other language. https://encoding.spec.whatwg.org/#encoding assumes Unicode, and we'd definitely be using UTF-8. So, yeah... While it is true that the files don't even need to be in any encoding at all (you can store raw binary, except for …
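To make the point about URI encoding versus on-disk encoding concrete, here is a minimal std-only Rust sketch. It assumes the URI path "caf%C3%A9" has already been percent-decoded as UTF-8 into "café", and uses a hypothetical to_latin1 helper to show that the same name occupies different bytes on a system using the Latin-1 codepage. The URI's encoding therefore says nothing directly about how the name is stored on disk.

```rust
/// Hypothetical helper: re-encode a decoded path as Latin-1 bytes, the way a
/// system using the Latin-1 codepage would store the file name on disk.
/// Returns None if any character is outside Latin-1 (code point above 0xFF).
fn to_latin1(s: &str) -> Option<Vec<u8>> {
    s.chars()
        .map(|c| {
            let cp = c as u32;
            if cp <= 0xFF { Some(cp as u8) } else { None }
        })
        .collect()
}

fn main() {
    // "caf%C3%A9" in the URI decodes (as UTF-8) to "café": 5 bytes.
    let decoded = "café";
    assert_eq!(decoded.as_bytes(), b"caf\xC3\xA9");

    // On a Latin-1 system the same name is 4 bytes, with 'é' stored as 0xE9.
    assert_eq!(to_latin1(decoded), Some(b"caf\xE9".to_vec()));

    // Names with characters outside Latin-1 have no representation there at all.
    assert_eq!(to_latin1("日本"), None);
    println!("ok");
}
```

Decoding the URI as UTF-8 first and converting to the platform encoding afterwards keeps those two concerns separate.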
@ejona86, to clarify: is the recommendation that gRPC should validate that the percent-decoded path of the target URI contains only valid UTF-8?
Yeah, we should validate it. But either PercentDecode.decode_utf8() or decode_utf8_lossy() might work; Java uses a replacement character. Go … Edit: C++ doesn't validate, but C++ doesn't care if you store raw binary in strings, and Go is tolerant of invalid UTF-8 in strings. Java doesn't allow that, and neither does Rust, to my knowledge.
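The two options differ only in what happens to invalid bytes. In the percent_encoding crate, PercentDecode::decode_utf8() returns an error for invalid UTF-8, while decode_utf8_lossy() substitutes U+FFFD REPLACEMENT CHARACTER (the Java-style behavior). The std-only sketch below mimics those two policies on bytes assumed to be already percent-decoded, e.g. from "/tmp/caf%E9.sock"; the strict and lossy names are illustrative and not part of any library.

```rust
use std::borrow::Cow;
use std::str::Utf8Error;

/// Strict policy: reject bytes that are not valid UTF-8.
fn strict(bytes: &[u8]) -> Result<&str, Utf8Error> {
    std::str::from_utf8(bytes)
}

/// Lossy policy: replace each invalid sequence with U+FFFD.
fn lossy(bytes: &[u8]) -> Cow<'_, str> {
    String::from_utf8_lossy(bytes)
}

fn main() {
    // 0xE9 is Latin-1 'é'; on its own it is not a valid UTF-8 sequence.
    let bad: &[u8] = b"/tmp/caf\xE9.sock";
    assert!(strict(bad).is_err());
    assert_eq!(lossy(bad), "/tmp/caf\u{FFFD}.sock");

    // Valid UTF-8 (the decoding of "%C3%A9") passes both policies unchanged.
    let good: &[u8] = b"/tmp/caf\xC3\xA9.sock";
    assert_eq!(strict(good), Ok("/tmp/café.sock"));
    assert_eq!(lossy(good), "/tmp/café.sock");
    println!("ok");
}
```

The lossy policy never fails, but two different on-disk names can collapse to the same string, which is one argument for rejecting instead.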
c93ae91 to 5cc591e
I've reworked the approach to use
This PR ensures that target URIs whose percent-decoded path is not valid UTF-8 are rejected.
Key changes:
- Target::from_str: now fails for targets whose percent-decoded path contains non-UTF-8 byte sequences.
- Target::path(): now returns the percent-decoded path.
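A minimal sketch of the two changes, assuming a heavily simplified, hypothetical Target type; the real gRPC Target presumably parses URIs more thoroughly. percent_decode is a hypothetical helper that, like the percent_encoding crate, passes a % not followed by two hex digits through unchanged. from_str rejects any path whose decoded bytes are not valid UTF-8, so path() can return the decoded &str infallibly.

```rust
use std::str::FromStr;

/// Hypothetical, simplified stand-in for the Target type discussed in this PR.
#[derive(Debug)]
struct Target {
    scheme: String,
    authority: String,
    path: String, // percent-decoded and guaranteed to be valid UTF-8
}

#[derive(Debug, PartialEq)]
enum TargetError {
    MissingScheme,
    NonUtf8Path,
}

/// Decode "%XX" escapes into raw bytes; malformed escapes are passed through.
fn percent_decode(s: &str) -> Vec<u8> {
    let bytes = s.as_bytes();
    let hex = |j: usize| bytes.get(j).and_then(|&c| (c as char).to_digit(16));
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'%' {
            if let (Some(hi), Some(lo)) = (hex(i + 1), hex(i + 2)) {
                out.push((hi * 16 + lo) as u8);
                i += 3;
                continue;
            }
        }
        out.push(bytes[i]);
        i += 1;
    }
    out
}

impl FromStr for Target {
    type Err = TargetError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Simplified parsing: "scheme:", optional "//authority", then the path.
        let (scheme, rest) = s.split_once(':').ok_or(TargetError::MissingScheme)?;
        let (authority, raw_path) = match rest.strip_prefix("//") {
            Some(after) => match after.find('/') {
                Some(idx) => (&after[..idx], &after[idx..]),
                None => (after, ""),
            },
            None => ("", rest),
        };
        // Reject paths whose percent-decoded bytes are not valid UTF-8.
        let path = String::from_utf8(percent_decode(raw_path))
            .map_err(|_| TargetError::NonUtf8Path)?;
        Ok(Target {
            scheme: scheme.to_string(),
            authority: authority.to_string(),
            path,
        })
    }
}

impl Target {
    /// Returns the percent-decoded path.
    fn path(&self) -> &str {
        &self.path
    }
}

fn main() {
    let t: Target = "unix:///tmp/caf%C3%A9.sock".parse().unwrap();
    assert_eq!(t.scheme, "unix");
    assert_eq!(t.authority, "");
    assert_eq!(t.path(), "/tmp/café.sock");

    let err = "unix:///tmp/caf%E9.sock".parse::<Target>().unwrap_err();
    assert_eq!(err, TargetError::NonUtf8Path);
    println!("ok");
}
```

Storing only the decoded String is what makes path() infallible. If an un-decoded or byte-level accessor is ever needed, as suggested in the discussion, the raw path could be kept alongside it.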