Skip to content

Conversation

me-and
Copy link
Contributor

@me-and me-and commented Jan 9, 2023

To speed up Cygwin installations, allow users to store the downloaded packages using GitHub's caching, rather than needing the packages to be downloaded from the Cygwin mirrors on every run.

This speeds up the installation when there's a good cache hit, at the expense of adding some additional complexity, and using the limited cache space (although with 10GB assigned to every repository on GitHub, I can't imagine most folk are getting close to the limit!).

To avoid unnecessary cache churn, the new steps check if the cache has actually meaningfully changed before storing a new cache. This uses b2sum for speed, since we're just checking for unexpected changes in the cache, and we don't need cryptographic security.

I was originally working on this for my own builds, e.g. at https://github.com/me-and/Cygwin-inih/blob/fb7534276feeb2a5217df5013c679a370ee6013e/.github/workflows/ci.yml, but it occurred to me that this might be more useful here than in my own repos.

To speed up Cygwin installations, allow users to store the downloaded
packages using GitHub's caching, rather than needing the packages to be
downloaded from the Cygwin mirrors on every run.

This speeds up the installation when there's a good cache hit, at the
expense of adding some additional complexity, and using the limited
cache space (although with 10GB assigned to every repository on GitHub,
I can't imagine most folk are getting close to the limit!).

To avoid unnecessary cache churn, the new steps check if the cache has
actually meaningfully changed before storing a new cache.  This uses
b2sum for speed, since we're just checking for unexpected changes in the
cache, and we don't need cryptographic security.
@me-and me-and marked this pull request as ready for review January 9, 2023 23:47
README.md Outdated
| site | http://mirrors.kernel.org/sourceware/cygwin/ | Mirror site to install from
| check-sig | true | Whether to check the setup.ini signature
| add-to-path | true | Whether to add Cygwin's `/bin` directory to the system `PATH`
| package-cache-key | '' | The key to use for caching downloaded packages.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reasoning behind making this a string, rather than a boolean? I think that the cache-keys are scoped to the repository, so the only possible collision is with other cache action uses in the same workflow?

It would be nice to be able to make this on by default in future, without the user having to choose a key.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The store of caches is per-repository, and the actual restore behaviour is scoped even more tightly than that, with restrictions about what caches are allowed to be restored on what branch. But yes, this is something I debated; I went with letting the user decide mostly so they wouldn't be surprised by caches appearing with unexpected names.

Having been playing around with this a bit more, though, I think I'm going to suggest a different approach: hard-code the cache key name, but make the behaviour configurable with a cache-behaviour setting or similar, that takes values of disabled (get current cache-free behaviour), enabled (full new behaviour), or saveonly or restoreonly to do the obvious things. The latter are useful when – as with a lot of my cygport jobs – you need to run the action multiple times in the repository.

(As we discussed in #5, the ideal solution might be to swap from a composite action to a JS one, since that would allow post-run steps, which could be responsible for saving the cache after it has been used several times, but that's obviously a much bigger job!)

@jon-turney
Copy link
Member

Thanks! This is awesome!

I vagely recall trying to do this initially but running into some problem, so thanks for adding this!

To speed up Cygwin installations, allow users to store the downloaded packages using GitHub's caching, rather than needing the packages to be downloaded from the Cygwin mirrors on every run.

Some numbers showing that restoring the cache is actually typically faster than fetching the packages again would help give me confidence that this is a good change.

This speeds up the installation when there's a good cache hit, at the expense of adding some additional complexity, and using the limited cache space (although with 10GB assigned to every repository on GitHub, I can't imagine most folk are getting close to the limit!).

Unfortunately, setup allows it's package cache to grow without bound. Something which has needed fixing for literal decades...

I don't think that's blocking for this PR, since, yes, the likelhood of hitting the cache limit is small, but something you should be aware of.

To avoid unnecessary cache churn, the new steps check if the cache has actually meaningfully changed before storing a new cache. This uses b2sum.

Presumably there's some analysis been done that all the commands used are safe because they are in 'base' installation, but a comment to that effect might be worthwhile.

@me-and
Copy link
Contributor Author

me-and commented Jan 10, 2023

Thanks! This is awesome!

I vagely recall trying to do this initially but running into some problem, so thanks for adding this!

np! This is about my fifth attempt at getting it working without getting stuck and giving up in disgust…

To speed up Cygwin installations, allow users to store the downloaded packages using GitHub's caching, rather than needing the packages to be downloaded from the Cygwin mirrors on every run.

Some numbers showing that restoring the cache is actually typically faster than fetching the packages again would help give me confidence that this is a good change.

Yep, very fair. I have some rough anecdata that says it helps, but I've not really stress tested it to prove it makes a difference for installations that are big enough for it to be substantial. I should be able to have a go at that shortly :)

This speeds up the installation when there's a good cache hit, at the expense of adding some additional complexity, and using the limited cache space (although with 10GB assigned to every repository on GitHub, I can't imagine most folk are getting close to the limit!).

Unfortunately, setup allows it's package cache to grow without bound. Something which has needed fixing for literal decades...

I don't think that's blocking for this PR, since, yes, the likelhood of hitting the cache limit is small, but something you should be aware of.

I had been thinking about that. GitHub evicts caches that aren't used for 7+ days, but if a repo were getting rebuilds more regularly than that, the cache would just keep getting larger and larger. I think that's a good reason for it to not be default behaviour for this action (yet), and I should probably note it explicitly in the documentation, but I think it mostly means that anyone using this action might need to manually flush the cache every so often, to force a new smaller cache to be generated.

To avoid unnecessary cache churn, the new steps check if the cache has actually meaningfully changed before storing a new cache. This uses b2sum.

Presumably there's some analysis been done that all the commands used are safe because they are in 'base' installation, but a comment to that effect might be worthwhile.

They're all provided as part of the Git Bash / Git for Windows installation on the GitHub Windows runners. I've not actually checked how this works for self-hosted runners, and I could believe it's not guaranteed to work there. I'll add some notes to that effect…

Add an option to restore a cache without creating a new one, or to
create a cache without restoring an old one.  This is useful in the
scenario where the action is called multiple times in a run, as it
allows the cache to only be restored on the first call, and for a new
cache to only be written on the last call.

This option replaces the `package-cache-key` option; as discussed in cygwin#6
that option isn't very useful.

Additionally, add some more scopes to the cache name.  Without this
change, a second attempt to create a cache in the same action run (e.g.
because one step installs some packages, then another step installs some
more) will fail because of the cache name collision.

Also update the README and tests, although the tests are only getting a
very quick update for this function, as I'm going to do a much more
comprehensive test update separately.
@me-and me-and marked this pull request as draft January 10, 2023 22:46
@me-and me-and force-pushed the cache branch 3 times, most recently from c9be7a5 to 79832e3 Compare January 10, 2023 22:53
This should hopefully show how much of a difference, if any, caching
packages makes.  There's no automated comparison here, the test just
runs one install without a cache, then a second install with a cache,
then a human needs to look at what difference it makes.
me-and added a commit to me-and/cygwin-install-action that referenced this pull request Jan 11, 2023
Add an option to restore a cache without creating a new one, or to
create a cache without restoring an old one.  This is useful in the
scenario where the action is called multiple times in a run, as it
allows the cache to only be restored on the first call, and for a new
cache to only be written on the last call.

This option replaces the `package-cache-key` option; as discussed in cygwin#6
that option isn't very useful.

Additionally, add some more scopes to the cache name.  Without this
change, a second attempt to create a cache in the same action run (e.g.
because one step installs some packages, then another step installs some
more) will fail because of the cache name collision.

Also update the README and tests, although the tests are only getting a
very quick update for this function, as I'm going to do a much more
comprehensive test update separately.
me-and added a commit to me-and/cygwin-install-action that referenced this pull request Jan 11, 2023
Add an option to restore a cache without creating a new one, or to
create a cache without restoring an old one.  This is useful in the
scenario where the action is called multiple times in a run, as it
allows the cache to only be restored on the first call, and for a new
cache to only be written on the last call.

This option replaces the `package-cache-key` option; as discussed in cygwin#6
that option isn't very useful.

Additionally, add some more scopes to the cache name.  Without this
change, a second attempt to create a cache in the same action run (e.g.
because one step installs some packages, then another step installs some
more) will fail because of the cache name collision.

Also update the README and tests, although the tests are only getting a
very quick update for this function, as I'm going to do a much more
comprehensive test update separately.
me-and added a commit to me-and/cygwin-install-action that referenced this pull request Jan 11, 2023
Add an option to restore a cache without creating a new one, or to
create a cache without restoring an old one.  This is useful in the
scenario where the action is called multiple times in a run, as it
allows the cache to only be restored on the first call, and for a new
cache to only be written on the last call.

This option replaces the `package-cache-key` option; as discussed in cygwin#6
that option isn't very useful.

Additionally, add some more scopes to the cache name.  Without this
change, a second attempt to create a cache in the same action run (e.g.
because one step installs some packages, then another step installs some
more) will fail because of the cache name collision.

Also update the README and tests, although the tests are only getting a
very quick update for this function, as I'm going to do a much more
comprehensive test update separately.
me-and added a commit to me-and/cygwin-install-action that referenced this pull request Jan 11, 2023
Add an option to restore a cache without creating a new one, or to
create a cache without restoring an old one.  This is useful in the
scenario where the action is called multiple times in a run, as it
allows the cache to only be restored on the first call, and for a new
cache to only be written on the last call.

This option replaces the `package-cache-key` option; as discussed in cygwin#6
that option isn't very useful.

Additionally, add some more scopes to the cache name.  Without this
change, a second attempt to create a cache in the same action run (e.g.
because one step installs some packages, then another step installs some
more) will fail because of the cache name collision.

Also update the README and tests, although the tests are only getting a
very quick update for this function, as I'm going to do a much more
comprehensive test update separately.
@jon-turney
Copy link
Member

Presumably there's some analysis been done that all the commands used are safe because they are in 'base' installation, but a comment to that effect might be worthwhile.

They're all provided as part of the Git Bash / Git for Windows installation on the GitHub Windows runners. I've not actually checked how this works for self-hosted runners, and I could believe it's not guaranteed to work there. I'll add some notes to that effect…

Aha! I hadn't noticed that all this happens before we modify that PATH, so it'll never be using cygwin tools to do these things, it's assuming they are already provided by something in the VM image.

@me-and
Copy link
Contributor Author

me-and commented Jan 12, 2023

On the performance topic: initial numbers seem to show using a cache saves very roughly 4 minutes off a Cygwin installation that would otherwise take about 10 minutes in total. Restoring a cache of a few hundred MB seems to take only a few seconds, which clearly beats the download time from the Cygwin mirrors.

Currently the hold-up is the reason I raised #5: if Cygwin is in the PATH, the caching code will still try to use the Git for Windows tar, but will fail. Using the Git for Windows tar is hardcoded in the GitHub cache code. Removing Cygwin from PATH just for the caching steps doesn't seem to work, I don't think "you can't use caching if you add Cygwin to PATH" is a reasonable limitation, so I'm currently trying to find other ways to get the caching to work…

@jon-turney
Copy link
Member

jon-turney commented Jan 13, 2023

Currently the hold-up is the reason I raised #5: if Cygwin is in the PATH, the caching code will still try to use the Git for Windows tar, but will fail. Using the Git for Windows tar is hardcoded in the GitHub cache code. Removing Cygwin from PATH just for the caching steps doesn't seem to work, I don't think "you can't use caching if you add Cygwin to PATH" is a reasonable limitation, so I'm currently trying to find other ways to get the caching to work…

I could live with "if you want to use this action multiple time in the same workflow, you must use package-cache: disabled or add-path: false"?

@me-and
Copy link
Contributor Author

me-and commented Jan 13, 2023

I'm hoping it might be something that can get fixed fairly soon; it looks like the problem was added relatively recently, in actions/cache@9b0be58822, and based on actions/toolkit#1311 I'm not the only person to be seeing similar problems.

Another option – which I've not tested properly but is next on my idea list – is to try just using the parent of that commit in actions/cache. I created a simple test case to demonstrate the issue at https://github.com/me-and/repro-cygwin-cache-woe, and it looks like the parent commit might let the caching work as currently designed, but without this issue.

Otherwise, I agree: just documenting it as a limitation is probably going to be the best option for now!

-------

If you're likely to do regular builds, you might want to store the packages
locally rather than needing to download them from the Cygwin mirrors on every
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure "locally" is the best word to use here.

@gstrauss
Copy link
Contributor

gstrauss commented Apr 9, 2025

FYI: I came across a js version of cygwin-install-action at https://github.com/secondlife/setup-cygwin and it includes a feature for caching. I have not looked further yet.

@jon-turney @me-and I also found a 3-year old (unmaintained?) cygwin-install-action repo with you two as contributors:
https://github.com/actions-marketplace-validations/cygwin_cygwin-install-action. Should that repo be updated? deleted? replaced with a pointer to https://github.com/cygwin/cygwin-install-action ?

@jon-turney
Copy link
Member

js version of cygwin-install-action

I don't know about you, but I love duplicated effort.

also found a 3-year old (unmaintained?) cygwin-install-action repo with you two as contributors

I don't know what actions-marketplace-validations is, but their org page https://github.com/actions-marketplace-validations says it's some kind of archive and "do not use anything here", so idk what I can do about that...

@gstrauss
Copy link
Contributor

js version of cygwin-install-action

I don't know about you, but I love duplicated effort.

While I don't know if you can see the other repo, you should have access to see some metrics about how much people use cygwin-install-action:
https://docs.github.com/en/actions/administering-github-actions/viewing-github-actions-metrics

@jeremyd2019
Copy link

I came across this while thinking about adding caching to my own workflows using actions/cache. I was going to ask for 2 things:

  1. an input for package cache directory (there's an output already, but actions/cache would need to know it before this action runs in order to restore the cache to the right place)
  2. a way to clean up unnecessary files from the package cache directory, to prevent it growing without bounds. It seems there is currently no such mechanism

@gstrauss
Copy link
Contributor

gstrauss commented Jun 8, 2025

@jeremyd2019 Somewhat related data point: I found that for my use, I use actions/cache for the cygwin installation and set an expiration of 1 month on the cache. For my use, that's sufficient to keep the installation under control, and I don't have to cache the package cache directory. If I ran my CI more frequently or on larger package (llvm), I might consider adding ccache, too.

It works for me to have a reasonable time frame for cache expiration, combined with the ability to manually modify the CACHE_KEY in the .yml to trigger use of a new cache, if i really needed to.

https://github.com/lighttpd/lighttpd1.4/blob/master/.github/workflows/ci.yml#L235
See the Windows-Cygwin: job for an example of how I set the CACHE_KEY. If you want to put the package cache directory under the cygwin installation to have both cached together, there is a commented-out section to do just that.

Edit: my CI also uses allow-test-packages: true so that my installation pulls in the latest cygwin test packages for my CI.

@jeremyd2019
Copy link

wow, I would not expect using actions/cache directly on the cygwin install would work - it just zips files, so you'd lose acls/permissions and symlinks.

I could probably just add actions/cache on d:\cygwin-packages and let it ride, manually clearing old cache entries if it grows too big, but there should be a way to clean up the cygwin packages dir (if it ever gets high enough on somebody's priority list to implement 😉).

@jon-turney
Copy link
Member

2. a way to clean up unnecessary files from the package cache directory, to prevent it growing without bounds. It seems there is currently no such mechanism

As I wrote above: "Unfortunately, setup allows it's package cache to grow without bound. Something which has needed fixing for literal decades..."

:sad_face:

@jeremyd2019
Copy link

jeremyd2019 commented Jun 9, 2025

I implemented caching in my workflow today, unfortunately the github expression function hashFiles is restricted to files under the github workspace so I had to move the cygwin-packages directory around. The path specified seems to be part of the cache identity, I couldn't specify D:\cygwin-packages as the restore path and cygwin-packages as the save path or the restore would not find the cache. A better cache key would probably also include the allow-test-packages flag and a hash of the list of packages to be installed (with the hash of the setup.inis at the end, to allow for updates to create a new cache)

jeremyd2019/cygwin-rust-bootstrap@74a16fd

Before the cygwin-install-action:

      - uses: actions/cache/restore@v4
        with:
          key: cygwin-packages-
          restore-keys: |
            cygwin-packages-
          path: cygwin-packages
      - run: IF EXIST cygwin-packages move /y cygwin-packages D:\
        shell: cmd

After the cygwin-install-action:

      - name: prepare cache
        run: |
          if (Test-Path `${{ steps.cygwin.outputs.package-cache }}\file`* ) {
            rm -r -fo `${{ steps.cygwin.outputs.package-cache }}\file`*
          }
          move "${{ steps.cygwin.outputs.package-cache }}" .
        shell: pwsh
      - uses: actions/cache/save@v4
        with:
          key: cygwin-packages-${{ hashFiles('cygwin-packages/**', '!cygwin-packages/**/setup.ini') }}
          path: cygwin-packages
      - name: post cache
        run: move /Y cygwin-packages "${{ steps.cygwin.outputs.package-cache }}"
        shell: cmd

I had to resort to pwsh because I was getting errors with cmd and rmdir /s /q (I try to avoid powershell when I can because it seems to have a significant startup time compared to cmd).

After looking at what this PR does, I'm thinking now I should not include setup.ini in the hash, but insetad other files.

@jeremyd2019
Copy link

jeremyd2019 commented Jun 11, 2025

I made a wrapper composite action around this one, adding caching the way I was thinking about. I don't know if that'd be helpful to anyone, or make it easier to consider. https://github.com/jeremyd2019/caching-cygwin-install-action/blob/master/action.yml

@jeremyd2019
Copy link

I made a wrapper composite action around this one, adding caching the way I was thinking about. I don't know if that'd be helpful to anyone, or make it easier to consider. https://github.com/jeremyd2019/caching-cygwin-install-action/blob/master/action.yml

I added a simple cache cleaning mechanism to my wrapper action. It compares a list of files "Extracted from" from /var/log/setup.log and removes any files from the package cache not named setup.ini and not in that list. Based on the assumption that the action will be making a fresh Cygwin install on each run, this should remove any packages not actually needed for that install.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet