Extract Ms-cabs while reading instead of loading all datablocks into memory#56
Conversation
…es? maybe properly implementing the buffer will magically fix it
|
Note to self- figure out why now-reverted commit f3c3de0 caused so many regressions. It's probably simple |
…n't in a commit before.
8a6d31a to
15d9c27
Compare
…hile loop because of 0 byte files. Reimplemented clean code.
|
Note to self- now-reverted commit has been re-implemented, the issue was that the case of 0 bytes remaining needs to be handled with a while loop and not an if statement in order to properly process 0-byte files. |
|
All regressions have been identified and removed, and all remaining preexisting issues I noticed are fixed, with two exceptions. There are a very small number of cabs that have ST.IO MsZIP issues, which as far as I can tell aren't due to ST.Serialization. I do not know enough about MsZIP on that side of things to figure out what's going wrong, so I'll open an issue for those. Beyond that, the only potential remaining issue is that BOS might have superfluous errors scanning empty 36 byte mscabs, but I'm not sure. It's not really much of an issue if that is the case. The only things left will be placing proper failsafes so that as much extraction occurs as possible for partial sets and damaged cabs, and going through my remaining TODOs to convert most of them into PR comments. Then the PR will be ready for review. |
SabreTools.Serialization/Wrappers/MicrosoftCabinet.Extraction.cs
Outdated
Show resolved
Hide resolved
SabreTools.Serialization/Wrappers/MicrosoftCabinet.Extraction.cs
Outdated
Show resolved
Hide resolved
SabreTools.Serialization/Wrappers/MicrosoftCabinet.Extraction.cs
Outdated
Show resolved
Hide resolved
SabreTools.Serialization/Wrappers/MicrosoftCabinet.Extraction.cs
Outdated
Show resolved
Hide resolved
SabreTools.Serialization/Wrappers/MicrosoftCabinet.Extraction.cs
Outdated
Show resolved
Hide resolved
SabreTools.Serialization/Wrappers/MicrosoftCabinet.Extraction.cs
Outdated
Show resolved
Hide resolved
SabreTools.Serialization/Wrappers/MicrosoftCabinet.Extraction.cs
Outdated
Show resolved
Hide resolved
|
PR is ready for review. |
SabreTools.Serialization/Models/MicrosoftCabinet/FolderTuple.cs
Outdated
Show resolved
Hide resolved
SabreTools.Serialization/Models/MicrosoftCabinet/FolderTuple.cs
Outdated
Show resolved
Hide resolved
SabreTools.Serialization/Wrappers/MicrosoftCabinet.Extraction.cs
Outdated
Show resolved
Hide resolved
| /// <returns>True if the file extracted, false otherwise</returns> | ||
| private static bool ExtractFile(string outputDirectory, Stream blockStream, CFFILE file, bool includeDebug) | ||
| /// <returns>True if all files extracted, false otherwise</returns> | ||
| private bool ExtractSet(string? cabFilename, string outputDirectory, bool ignorePrev, bool includeDebug) |
There was a problem hiding this comment.
Sorry that this is a top-of-method comment and not at any specific other point.
This entire method feels very difficult to parse. There are multiple nested bits of code that feel like they should be private helper methods. It also feels like this entire thing would benefit from having some sort of external state tracking, similar to IS-CAB's extraction helper class. The way that you have to juggle variables to make sure that you're tracking the cabinet, folder, block, and file feel overly-complex.
There was a problem hiding this comment.
Gotcha. I'll have to look into how IS-CAB manages state, then. It'll probably be a while then since a bunch of logic will need to be re-written, I'll avoid any other fixes until I have that done.
There was a problem hiding this comment.
A full rewrite can be done in a followup PR. I was putting this comment here as a very general statement. If it's working, then let's get a working version of the code in before it gets retooled. Incremental changes are fine here since there's no rush to get a new stable version of the library out.
SabreTools.Serialization/Wrappers/MicrosoftCabinet.Extraction.cs
Outdated
Show resolved
Hide resolved
SabreTools.Serialization/Wrappers/MicrosoftCabinet.Extraction.cs
Outdated
Show resolved
Hide resolved
SabreTools.Serialization/Wrappers/MicrosoftCabinet.Extraction.cs
Outdated
Show resolved
Hide resolved
|
As far as I'm aware, I've addressed everything mentioned here so far. Let me know if I've missed anything, or done anything wrong. |
SabreTools.Serialization/Wrappers/MicrosoftCabinet.Extraction.cs
Outdated
Show resolved
Hide resolved
SabreTools.Serialization/Wrappers/MicrosoftCabinet.Extraction.cs
Outdated
Show resolved
Hide resolved
SabreTools.Serialization/Wrappers/MicrosoftCabinet.Extraction.cs
Outdated
Show resolved
Hide resolved
SabreTools.Serialization/Wrappers/MicrosoftCabinet.Extraction.cs
Outdated
Show resolved
Hide resolved
…memory (SabreTools#56) * WIP * WIP 2 * Todo: you're missing a read somehow and getting misaligned by two bytes? maybe properly implementing the buffer will magically fix it * continued blocks my behated * Pre-major-testing * Forgot to add summaries * Attempt to properly roll back. The state i wanted to roll back to wasn't in a commit before. * Figured out the issue with the rolled back commit, this has to be a while loop because of 0 byte files. Reimplemented clean code. * Comment so I don't forget why it's like this * Skip unsupported compression types before opening filestream. * Reenable non-start cab skipping * TODO so i don't forget * TODO so I don't forget * iterate on continued block correctly. * Handle incomplete extraction better * Remove TODOs to ready for PR comments * Missed one * Some minor fixes before rewriting everything * Next round of fixes. * Next set of fixes * Fix debug output
Also fixes #44 somehow.
While this is marked as draft, it's basically done. The only things I still want to do are add failsafes so that as much extraction as possible can continue to happen if errors occur, fix some edge cases (at present, I only know of one file in one cab set that's a regression, the other issues seem to have already been present), and convert most of my TODOs into PR comments. While I did mention before that I wanted to split up the main extraction loop, trying to do so resulted in some really ugly code due to how much state needs to be shared, so I went back on that. I'm perfectly fine with splitting things up if still asked, though.