fix: skip MSYS2 path conversion for rsync-over-SSH remote paths#331
fix: skip MSYS2 path conversion for rsync-over-SSH remote paths#331vicjayjay wants to merge 1 commit intomsys2:msys2-3.6.7from
Conversation
|
(I fixed the PR base branch, it was targeting |
winsup/cygwin/path.cc
Outdated
| /* Skip path conversion for remote paths used by rsync-over-SSH. | ||
| rsync passes paths like "user@host:/c/path" as arguments to the SSH | ||
| process spawned via -e. The ":/c/" substring looks like a MSYS2 | ||
| drive-letter path and gets incorrectly converted, causing rsync to | ||
| report "source and destination cannot both be remote". | ||
| The presence of '@' before ':' is a reliable indicator that this is | ||
| a remote path (SCP/rsync syntax: user@host:/path) and not a Windows | ||
| drive letter. */ | ||
| const char *at_sign = strchr(arg, '@'); | ||
| const char *colon = strchr(arg, ':'); | ||
| if (at_sign && colon && at_sign < colon) | ||
| return (char*)arg; | ||
|
|
There was a problem hiding this comment.
You almost found the correct spot where this new logic needs to live, but it's not quite here. 11 lines further down, the convert() function is called, that is closer to the correct location. The actual location where things like this need to happen is the find_path_start_and_type() function, I think.
If you look carefully, you will see that this function avoids performance hitters like strchr().
You might find inspiration in afa529d how this could be done in a better way. For example, you could initialize char *colon = NULL; just before the loop, then set colon = it; in the case ':' arm, and then add an if (!colon) goto skip_p2w; in the `case '@': arm.
But even so, that logic might very well catch too much, i.e. by mistake prevent legitimate paths from being converted. @ characters are allowed in filenames, after all, and colons can be path separators.
And even if the "@ before :" pattern is probably to broad, at the same time it is also too narrow. You could, after all, have a section like this in your ~/.ssh/config (I have plenty of those in mine):
Host my-machine
HostName xyz.example.com
User vicjayjay
And then you can call rsync abc my-machine:/c/xyz/. In fact, I rarely ever specify my user name explicitly in any SSH address, I pretty much always use those nicknames defined in ~/.ssh/config. And you'll note that your "@ before :" pattern does not handle this situation well...
dd57a10 to
bdc781d
Compare
|
@dscho Thanks for the detailed review — updated in bdc781d. Changes:
Limitations acknowledged in the commit message:
Happy to refine further if you have a better approach for the alias case. |
|
CI note: the 4 failed jobs are unrelated to this change:
The build itself succeeded (10m40s) and all gcc-based test jobs (UCRT64-gcc, MINGW64-gcc, MINGW32-gcc) pass cleanly. |
As noticed in msys2/msys2-runtime#331, the `msys2-tests-*-clang` jobs are currently broken. The symptom is: ▼ Installing additional packages through pacman... C:\Windows\system32\cmd.exe /D /S /C C:\a\_temp\setup-msys2\msys2.cmd -c "'pacman' '--noconfirm' '-S' '--needed' '--overwrite' '*' 'mingw-w64-clang-aarch64-meson' 'mingw-w64-clang-aarch64-ninja' 'mingw-w64-clang-aarch64-cmake' 'mingw-w64-clang-aarch64-make' 'mingw-w64-clang-aarch64-clang' 'mingw-w64-clang-aarch64-python' 'mingw-w64-clang-aarch64-python-setuptools' 'mingw-w64-clang-aarch64-cython' 'mingw-w64-clang-aarch64-autotools' 'mingw-w64-clang-aarch64-ruby' 'mingw-w64-clang-aarch64-gettext' 'mingw-w64-clang-aarch64-git' 'make' 'zsh' 'fish' 'mksh' 'openssh' 'mingw-w64-clang-aarch64-python-setuptools-rust' 'mingw-w64-clang-aarch64-rust' 'mingw-w64-clang-aarch64-go' 'mingw-w64-clang-aarch64-lld' 'mingw-w64-clang-aarch64-libc++' 'mingw-w64-clang-aarch64-llvm-tools' 'mingw-w64-clang-aarch64-flang' 'mingw-w64-clang-aarch64-perl' 'mingw-w64-clang-aarch64-nodejs' 'mingw-w64-clang-aarch64-openmp'" error: target not found: mingw-w64-clang-aarch64-openmp Error: The process 'C:\Windows\system32\cmd.exe' failed with exit code 1 The reason that this did not break earlier is that `mingw-w64-llvm-openmp` was marked up as providing `mingw-w64-openmp`. However, that changed in msys2/MINGW-packages@a51a899#diff-1426b0d18c3dbdb1a0275b245cfc798091c71b4e6324653aef726b9be7ebd8bbL22-L24 which was most likely an accidental drive-by change. Since that change, `mingw-w64-openmp` no longer resolves to any package. Arguably it would be the correct thing to refer to that package by its actual name, anyway, so let's address this error by doing that. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
That's indeed a regression. More precisely, the
Yes, this was transient. GitHub seems to have a rough time lately. I re-ran it, and it passed. |
winsup/cygwin/msys2_path_conv.cc
Outdated
| // Skip SCP/rsync remote paths (user@host:/path). | ||
| // If '@' appears before any ':', the argument likely | ||
| // uses remote-shell syntax and must not be converted. |
There was a problem hiding this comment.
I am still worried that this might mess with path lists, i.e. colon-separated values in the form as e.g. PATH.
Besides, we're not even ensuring that there is a colon here, so even abc@ would match, or @abc, which is clearly unwanted.
Personally, I'd try to match a pattern like this: [sequence-of-alnum-dashes-and-dots]@[sequence-of-alnum-dashes-and-dots]:/[alnum]. That might be overkill, but it would be a lot less likely to introduce inadvertent side effects.
And to be clear: I would not implement this as a state machine, even if that would be the fastest, performance-wise. I would probably introduce a char *at = NULL that, before it is assigned, runs something like:
// Does the string so far consist only of alphanumerical characters, dots or dashes,
// i.e. does it look like a username?
#define ALNUMDD ".-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
if (strspn(src, ALNUMDD) == it)
at = it;and then, in the : arm, if this is the first : and at != NULL (i.e. the string started with something like <username>@) do the same dance with strspn(at + 1, ALNUMDD) to verify the host part, and then verify the path part via: it[1] == '/' && isalnum(it[2]).
Oh, and I think this needs an addition to: https://github.com/msys2/msys2-tests/blob/main/runtime/path_convert/src/main.cpp
rsync passes paths like "user@host:/c/path" as arguments to SSH. The ":/c/" substring triggers MSYS2 drive-letter path conversion, causing "source and destination cannot both be remote" errors. Detect SCP/rsync remote-path syntax inside find_path_start_and_type() by strictly validating the user@host:/path pattern: - '@' is recorded only when the prefix consists entirely of valid username characters (alphanumeric, dots, dashes). - In the ':' arm, skip conversion only when the hostname between '@' and ':' also consists of valid hostname characters, AND the colon is followed by '/' then an alphanumeric character. This avoids false positives from filenames containing '@' or path lists with colons, while still catching the standard SCP syntax. SSH config aliases (my-machine:/path) without '@' are not covered. Suggested-by: Johannes Schindelin <johannes.schindelin@gmx.de>
bdc781d to
1ffd11f
Compare
|
@dscho Updated in 1ffd11f with the stricter pattern you suggested. What changed: The loose #define ALNUMDD ".-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
const char *at = NULL;
// In case '@': record only if prefix is valid username chars
if (!at && strspn(*src, ALNUMDD) == (size_t)(it - *src))
at = it;
// In case ':': skip only if hostname is valid AND colon is followed by /[alnum]
if (at && strspn(at + 1, ALNUMDD) == (size_t)(it - at - 1)
&& it + 2 < end && it[1] == '/' && isalnum(it[2]))
goto skip_p2w;This matches the pattern
Re: tests — I'll add cases to Thanks for the |
|
@vicjayjay wow that was fast. Did you use an AI agent or what 😆 |
Since those tests live in a different repository, I prefer the changes in a parallel PR ;-) |
Yep, you are basically talking with an openclaw agent the whole time. |
As noticed in msys2/msys2-runtime#331, the `msys2-tests-*-clang` jobs are currently broken. The symptom is: ▼ Installing additional packages through pacman... C:\Windows\system32\cmd.exe /D /S /C C:\a\_temp\setup-msys2\msys2.cmd -c "'pacman' '--noconfirm' '-S' '--needed' '--overwrite' '*' 'mingw-w64-clang-aarch64-meson' 'mingw-w64-clang-aarch64-ninja' 'mingw-w64-clang-aarch64-cmake' 'mingw-w64-clang-aarch64-make' 'mingw-w64-clang-aarch64-clang' 'mingw-w64-clang-aarch64-python' 'mingw-w64-clang-aarch64-python-setuptools' 'mingw-w64-clang-aarch64-cython' 'mingw-w64-clang-aarch64-autotools' 'mingw-w64-clang-aarch64-ruby' 'mingw-w64-clang-aarch64-gettext' 'mingw-w64-clang-aarch64-git' 'make' 'zsh' 'fish' 'mksh' 'openssh' 'mingw-w64-clang-aarch64-python-setuptools-rust' 'mingw-w64-clang-aarch64-rust' 'mingw-w64-clang-aarch64-go' 'mingw-w64-clang-aarch64-lld' 'mingw-w64-clang-aarch64-libc++' 'mingw-w64-clang-aarch64-llvm-tools' 'mingw-w64-clang-aarch64-flang' 'mingw-w64-clang-aarch64-perl' 'mingw-w64-clang-aarch64-nodejs' 'mingw-w64-clang-aarch64-openmp'" error: target not found: mingw-w64-clang-aarch64-openmp Error: The process 'C:\Windows\system32\cmd.exe' failed with exit code 1 The reason that this did not break earlier is that `mingw-w64-llvm-openmp` was marked up as providing `mingw-w64-openmp`. However, that changed in msys2/MINGW-packages@a51a899#diff-1426b0d18c3dbdb1a0275b245cfc798091c71b4e6324653aef726b9be7ebd8bbL22-L24 which was most likely an accidental drive-by change. Since that change, `mingw-w64-openmp` no longer resolves to any package. Arguably it would be the correct thing to refer to that package by its actual name, anyway, so let's address this error by doing that. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Summary
When using MSYS2 rsync with the
-e sshflag and a remote destination containing an MSYS2-style path (e.g.,user@host:/c/path), rsync fails with:Root Cause
In
arg_heuristic_with_exclusions()(winsup/cygwin/path.cc), the MSYS2 path conversion heuristic treats the/c/portion ofuser@host:/c/pathas a Windows drive-letter path and converts it. The converted path is then passed to SSH, causing rsync on the remote side to misparse the path.The Fix
Added a check for remote-path syntax before invoking the conversion:
This reliably identifies SCP/rsync remote-path syntax (
user@host:/path) and skips path conversion for those arguments. The@:pattern is unique to remote-path syntax — a Windows drive letter never has@before:.Why This Fix Is Correct
/c/Users/file.txtuser@host:/c/pathssh://host/pathTesting
Before fix:
After fix:
Regression check: Local MSYS2 paths (
/c/Users/...) continue to be converted normally.Issue
Follow-up to: #330