Skip to content

Make conversion of file URLs more consistent #13501

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 20 commits into
base: main
Choose a base branch
from

Conversation

barneygale
Copy link
Contributor

@barneygale barneygale commented Jul 23, 2025

urllib.request.pathname2url() and url2pathname() are fine in Python 3.14, awkward in latest 3.13 and 3.12, and buggy in previous versions. Until pip can drop support for Python 3.11, it's best not to use them.

In pip._internal.utils.urls:

  • Refactor path_to_url() -- it no longer calls pathname2url(), and gains an optional normalize_path argument that can be used to disable path normalization.
  • Refactor url_to_path() -- it no longer calls url2pathname(), and now accepts schemes ending with +file
  • Add clean_file_url() -- this function cleans up quoting and slashes in file: URLs

In pip._internal.models.link:

  • Refactor _clean_file_url_path() into _clean_file_url() -- the new function operates on the entire URL, not just its path.
  • Edit _clean_url_path() -- remove is_local_path argument
  • Edit _ensure_quoted_url() -- call _clean_file_url() for local URLs

In the tests:

  • Stop checking urlparse() and pathname2url() behaviour -- the differences no longer affect us
  • Fix a few incorrect expectations

`urllib.request.pathname2url()` and `url2pathname()` are:

- Fine in 3.14
- Awkward in latest 3.13 and 3.12
- Buggy in previous versions

In this patch we provide our own conversion functions (copy-pasted from the
Python 3.14 source tree), rather than calling `urllib.request` functions.
This allows us to remove workarounds for upstream bugs.
@notatallshaw
Copy link
Member

Hey Barney, thanks for raising these recent PRs. Just be aware all the active maintainers are supporting pip in their spare time, so it may take some time for someone to review these PRs you've raised. So please be don't disheartened by the lack of responses yet.

Pip 25.2 is just about to be released, the release manager for this cycle is @ichard26 so he will have to make a choice on what to pull in wrt to passing tests for Python 3.14, if anything. If it were me I would be tempted to merge the original PR that fixes path tests and then add yours in the 25.3 release cycle, but I won't be around to deal with 25.2 fallout so my opinion isn't too relevant.

@barneygale
Copy link
Contributor Author

No worries at all! I'm still playing around with possible solutions and whatnot (hence the draft PR) so no rush from my side :-)

I'm happy for less invasive solutions to land in the mean time.

@pfmoore
Copy link
Member

pfmoore commented Jul 23, 2025

If it were me I would be tempted to merge the original PR that fixes path tests and then add yours in the 25.3 release cycle

My instincts are the same. However, I still feel that the fact that the tests were failing in the first place indicates that the tests themselves are somehow incorrect, and I would like to understand what's going on there. But I don't have the free time right now to investigate, so my opinion shouldn't be counted for much either.

@barneygale barneygale changed the title Stop using pathname2url() and url2pathname() Make conversion of file URLs more consistent Jul 23, 2025
@barneygale barneygale marked this pull request as ready for review July 23, 2025 21:39
@barneygale
Copy link
Contributor Author

barneygale commented Jul 23, 2025

However, I still feel that the fact that the tests were failing in the first place indicates that the tests themselves are somehow incorrect, and I would like to understand what's going on there.

I think it's more than a test issue FWIW.

pip._internal.models.links._clean_file_url_path() accepts and returns a partial URL path component, but it calls url2pathname() and pathname2url() which accept and return scheme-less URLs. The difference didn't matter until 3.14, when pathname2url() began emitting URLs with three leading slashes rather than one when given a path with one leading slash.

IMHO the solution is to feed url2pathname() and pathname2url() the data they're expecting. That works fine in 3.14+, but in older Python versions it exposes us to bugs and missing features which need new workarounds. And the workarounds aren't trivial - I reckon it's actually simpler to implement the conversion ourselves in url_to_path() and path_to_url() than deal with them. That's the approach taken by this PR. That also lets us remove some existing workarounds - e.g. see the removed code in url_to_path() and tests/lib/__init__.py

@pfmoore
Copy link
Member

pfmoore commented Jul 23, 2025

Thanks for the analysis.

pip._internal.models.links._clean_file_url_path() accepts and returns a partial URL path component, but it calls url2pathname() and pathname2url() which accept and return scheme-less URLs.

Yep, that sounds like it's flat-out wrong, and we should fix that. As I say, I'm low on time right now, so I'm going to have to assume that's what your changes to this code are doing. (I wish I could do a proper review for you, but I simply don't have the time).

And the workarounds aren't trivial - I reckon it's actually simpler to implement the conversion ourselves in url_to_path() and path_to_url() than deal with them.

That sounds fair.

I really appreciate the time you've put into fixing this for us. I'm sorry I haven't had more time to do more than offer passing comments on the design.

@barneygale
Copy link
Contributor Author

No problem, thank you! I know it looks a lot like I broke url2pathname() and pathname2url() in 3.14, but I think (maybe it's more of a hope at this point!) that I fixed longstanding bugs that made the functions hard to use correctly. Unfortunately it seems I upset the delicate balance of pip bugs cancelling out cpython bugs...

@barneygale
Copy link
Contributor Author

3.14 unit test failures in this PR should be resolved by #13496

I'll mark this PR as a draft until the other is merged.

@barneygale barneygale marked this pull request as draft July 23, 2025 22:58
pytest.param(
"file:///T:/path/with spaces/",
"file://///T:/path/with%20spaces",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test expectation was also incorrect - we expect a URL with three leading slashes, not five.

(This problem also occurs a couple more times further down.)

@barneygale barneygale marked this pull request as ready for review July 27, 2025 20:08
@barneygale
Copy link
Contributor Author

I can split this PR up into smaller PRs if it would make it easier to review - please let me know if that would be helpful!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants