fix(grpc): percent-decode target URI path and reject non-UTF-8 characters by arjan-bal · Pull Request #2677 · grpc/grpc-rust

arjan-bal · 2026-06-05T11:12:16Z

This PR ensures that non-UTF-8 strings in the target URI path are rejected.

Key Changes:

Target::from_str: Updated to fail for targets with non-UTF-8 symbols in the percent-decoded path.
Target::path(): Modified to return the percent-decoded path.

ejona86 · 2026-06-08T18:35:47Z

This seems weird to me. We should be assuming UTF-8-encoded URIs in general, so I could understand adding a way to get the un-decoded path or a path-as-bytes. And then only this code would do something different.

But I'd expect we'd decode URIs as UTF-8 in general, as the encoding of the URI can be different than the encoding of the on-disk path. For example, if the system is using the Latin-1 codepage (LANG=en_US.ISO-8859-1, not the now-typical LANG=en_US.UTF-8), then we'd decode the URI as UTF-8 and then (someone would) convert that to Latin-1 when opening the socket. Historically that was libc that would do that conversion. I see Java do that conversion. It looks like Go assumes the Linux system uses utf-8, which is actually fair these days. Go probably converts the UTF-8 to UTF-16 for Windows.

RFC 8089 encourages UTF-8 as we do represent them as strings in every other language to my knowledge. https://encoding.spec.whatwg.org/#encoding assumes Unicode, and we'd definitely be using UTF-8.

So, yeah... While it is true that the files don't even need to be any encoding at all (you can store raw binary, except for \0 and /), and rust allows you to do that, I don't think we want to deal with that in general.

arjan-bal · 2026-06-08T18:50:14Z

@ejona86, to clarify, is the recommendation that gRPC should validate that the target URI contains only valid UTF-8 in the percent-decoded path?

ejona86 · 2026-06-08T20:37:38Z

Yeah, we should validate it. But either PercentDecode.decode_utf8() or decode_utf8_lossy() might work. Java uses a replacement character.. Go ~~fails URL parsing~~ doesn't validate. Dunno what C++ does.

Edit: C++ doesn't validate. But C++ doesn't care if you store raw binary in strings, and Go is tolerant to invalid UTF-8 in strings. Java doesn't allow that, and neither does Rust to my knowledge.

arjan-bal · 2026-06-09T06:25:02Z

I've reworked the approach to use PercentDecode.decode_utf8() during target URL parsing, which will now fail if it encounters non-UTF-8 characters. I prefer this to using replacement characters because it prevents unexpected behaviour and fails fast to ensures users are immediately aware of the invalid target so they can fix it.

arjan-bal added 3 commits June 5, 2026 00:21

allow non-utf8 addresses in Address struct

054f074

precent decode path in target URL

dea8d57

polish

4cb1693

arjan-bal added the A-grpc-next label Jun 5, 2026

arjan-bal added 3 commits June 5, 2026 17:10

fix non-unix build

991204a

fix darwin test

060c345

Merge remote-tracking branch 'source/master' into non-utf-strings

39c081d

arjan-bal requested a review from dfawley June 5, 2026 17:44

arjan-bal assigned dfawley Jun 5, 2026

arjan-bal changed the title ~~fix(grpc): Fix handling of non-UTF-8 path strings in target URI~~ fix(grpc): Fix handling of non-UTF-8 strings in target URI path Jun 5, 2026

arjan-bal assigned ejona86 Jun 8, 2026

arjan-bal requested a review from ejona86 June 8, 2026 05:10

arjan-bal added 2 commits June 8, 2026 17:48

add minor version to dep

0ffbba6

Merge remote-tracking branch 'source/master' into non-utf-strings

3194351

arjan-bal mentioned this pull request Jun 8, 2026

feat(grpc): Add resolver wrapper to handle HTTP CONNECT proxy configuration #2679

Open

ejona86 closed this Jun 8, 2026

ejona86 reopened this Jun 8, 2026

arjan-bal assigned arjan-bal and unassigned ejona86 and dfawley Jun 9, 2026

disallow non-UTF-8 symbols in path

5cc591e

arjan-bal force-pushed the non-utf-strings branch from c93ae91 to 5cc591e Compare June 9, 2026 06:13

arjan-bal changed the title ~~fix(grpc): Fix handling of non-UTF-8 strings in target URI path~~ fix(grpc): percent-decode target URI path and reject non-UTF-8 characters Jun 9, 2026

arjan-bal assigned ejona86 and unassigned arjan-bal Jun 9, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(grpc): percent-decode target URI path and reject non-UTF-8 characters#2677

fix(grpc): percent-decode target URI path and reject non-UTF-8 characters#2677
arjan-bal wants to merge 9 commits into
grpc:masterfrom
arjan-bal:non-utf-strings

arjan-bal commented Jun 5, 2026 •

edited

Loading

Uh oh!

ejona86 commented Jun 8, 2026

Uh oh!

arjan-bal commented Jun 8, 2026

Uh oh!

ejona86 commented Jun 8, 2026 •

edited

Loading

Uh oh!

arjan-bal commented Jun 9, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

arjan-bal commented Jun 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Key Changes:

Uh oh!

ejona86 commented Jun 8, 2026

Uh oh!

arjan-bal commented Jun 8, 2026

Uh oh!

ejona86 commented Jun 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

arjan-bal commented Jun 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

arjan-bal commented Jun 5, 2026 •

edited

Loading

ejona86 commented Jun 8, 2026 •

edited

Loading

arjan-bal commented Jun 9, 2026 •

edited

Loading