🎉Introducing gix archive 🎉
#969
Byron started this conversation in Show and tell
Which crate is that?
gix archive is a new sub-command which pretty much does what you think it will: given a treeish and a file path, it extracts the treeish exactly as it would when checking it out, and streams the result into the file path in one of multiple formats: tar, tar.gz and zip.
gix archive has some advantages over git archive:
- … git lfs
- … (zip only)
- … tar)
Performance
Let's dig into some performance comparisons on the linux kernel cloned from git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git.
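A large part of the numbers in this section comes down to a plain producer/consumer split: one thread streams worktree entries while the main thread writes the container format. The following is only a minimal sketch of that idea using the standard library. `Entry`, `write_archive` and the toy `path<TAB>len` header format are hypothetical names made up for illustration; none of them are taken from the gix crates, whose actual implementation may look quite different.

```rust
use std::io::{self, Write};
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for one worktree entry; not a type from `gix`.
struct Entry {
    path: String,
    data: Vec<u8>,
}

/// Stream entries from a producer thread and assemble a toy container on the
/// calling thread. The "format" is a made-up `path\tlen\n` header followed by
/// the raw bytes, standing in for tar or zip.
fn write_archive<W: Write>(entries: Vec<Entry>, mut out: W) -> io::Result<usize> {
    // A bounded channel lets the producer run only a couple of entries
    // ahead of the writer, so buffered data stays small.
    let (tx, rx) = mpsc::sync_channel::<Entry>(2);

    let producer = thread::spawn(move || {
        for entry in entries {
            // A real tool would read and filter objects here.
            if tx.send(entry).is_err() {
                break; // the writer went away, stop producing
            }
        }
        // `tx` is dropped here, which ends the receiving loop below.
    });

    let mut count = 0;
    for entry in rx {
        // The main thread only deals with the container format.
        writeln!(out, "{}\t{}", entry.path, entry.data.len())?;
        out.write_all(&entry.data)?;
        count += 1;
    }
    producer.join().expect("producer thread panicked");
    Ok(count)
}

fn main() -> io::Result<()> {
    let entries = vec![
        Entry { path: "README".into(), data: b"hello".to_vec() },
        Entry { path: "src/lib.rs".into(), data: b"fn f() {}".to_vec() },
    ];
    let mut archive = Vec::new();
    let count = write_archive(entries, &mut archive)?;
    println!("wrote {count} entries, {} bytes", archive.len());
    Ok(())
}
```

The bounded channel is a design choice of this sketch: it applies back-pressure so the producer cannot buffer the whole worktree in memory if the writer, for example a compressor, is the slower side.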
The reason for generally better performance is certainly the very basic concurrency that offloads the streaming of worktree data into its own thread. That way, the container format can be handled by the main thread all by itself. Further, gix has a faster odb implementation and uses zlib-ng, which overall yields about 10% better performance when reading objects from the object store.
When compressing, gix also benefits from the flate2 crate and its advanced backends.
It's interesting to see that, thanks to gix, it's now possible to get the highest compression level (tar.gz max) in the same time that git takes for the default level. In this particular case, one only saves ~3MB with that, though. Also, one can get a lightly compressed archive (tar.gz min) faster than an uncompressed tar archive from git, albeit at low compression with a 345MB size.
File Sizes
All the performance in the world wouldn't be useful if the produced files were considerably bigger. Let's take a look.
When looking at tar.gz, it seems to be almost exactly the same as what git produces, which makes it quite uninteresting. zip, however, takes the lead in being slightly smaller and more than twice as fast to produce. Unfortunately, it still has the shortcoming of not being able to reproduce symbolic links. If that were fixed, it would be the most advanced format, as it's also able to stream large files.
Memory Consumption
Finally, let's be sure that gix doesn't need unreasonable amounts of memory producing these files.
When looking at max-resident (max-res) size, gix consistently uses less, saving nearly 20% at all times. However, when looking at the peak memory (which probably doesn't include virtual memory), gix uses nearly 50% more. This clearly has to do with the container formats, which seem to keep quite a lot of extra data around when setting them up. That might be an issue in repositories with a lot of files, under the assumption that this scales with file count.
It's worth noting that, despite some shortcomings, tar seems to be the best format when memory consumption is a concern - then gix will always outperform git in both memory consumption and performance (*in this particular setup).
Shortcomings
However, it's not all roses right now. Probably due to me creating the zip archive incorrectly, symlinks for some reason don't manifest during extraction despite being contained in the archive. This works when using tar, though.
Further, I have the feeling that compression settings aren't applied for some reason for tar.gz, and it's unclear how to set the compression for zip archives.
Also, submodules aren't yet added to the archive, which is the same shortcoming as for git itself, but that is bound to change as submodule support is currently being added to gitoxide.
Conclusion
Implementing a minimal viable product of gix archive merely as an experiment took only 3 days and showed how powerful gix has become, making it possible to write tools that don't only rival the standard implementation, but can even surpass it in many aspects.
I will work hard to reach feature parity with git2 for starters, and then do my best to make it easier for users to choose gix over git2 when starting new projects.
Q & A
Q: Can I use gix archive instead of git archive?
I think it's worth giving it a shot if you need the extra performance or the extra capabilities. Be aware of the current shortcomings though, and maybe even contribute a fix.
Q: Why does gix archive exist?
As gix is a development tool for running the gix crate in the real world, it made sense to be able to test the worktree-related code in a context that doesn't involve writing files to disk. Implementing this means we need to be flexible enough to put all related parts together in different ways, and gix archive is a very nice application of said 'worktree machinery'.
As it turns out, many folks wanted it to support submodules as well, and with this work ongoing it seems similarly useful to validate the gix API against such a need - supporting submodules should be reasonably easy with anything gix comes up with.
Q: Can I use this in my own crate?
Yes, there is gix-archive, which implements archiving, and gix-worktree-stream, which provides a stream of entries that would make up the worktree on disk, bit for bit.
Q: Could the same be implemented with git2?
Definitely, and I'd be keen to see such an implementation in comparison to git and gix!
Data
Versions used initially.
Then, after updates to how gz compression works, it's this one:
archive -f tar
archive -f tar-gz
with libflate
with flate2
with flate2 level 9
with flate2 level 1
archive -f zip
File Sizes
tar.gz libflate
tar.gz flate2
tar.gz flate2 with compression level 9
tar.gz flate2 with compression level 1
Memory
with libflate
with flate2