Skip to content

Conversation

@HeroponRikiBestest
Copy link
Contributor

Also fixes #44 somehow.

While this is marked as draft, it's basically done. The only things I still want to do are add failsafes so that as much extraction as possible can continue to happen if errors occur, fix some edge cases (at present, I only know of one file in one cab set that's a regression, the other issues seem to have already been present), and convert most of my TODOs into PR comments. While I did mention before that I wanted to split up the main extraction loop, trying to do so resulted in some really ugly code due to how much state needs to be shared, so I went back on that. I'm perfectly fine with splitting things up if still asked, though.

@HeroponRikiBestest
Copy link
Contributor Author

HeroponRikiBestest commented Jan 15, 2026

Note to self- figure out why now-reverted commit f3c3de0 caused so many regressions. It's probably simple

@HeroponRikiBestest
Copy link
Contributor Author

Note to self- now-reverted commit has been re-implemented, the issue was that the case of 0 bytes remaining needs to be handled with a while loop and not an if statement in order to properly process 0-byte files.

@HeroponRikiBestest
Copy link
Contributor Author

All regressions have been identified and removed, and all remaining preexisting issues I noticed are fixed, with two exceptions. There are a very small number of cabs that have ST.IO MsZIP issues, which as far as I can tell aren't due to ST.Serialization. I do not know enough about MsZIP on that side of things to figure out what's going wrong, so I'll open an issue for those. Beyond that, the only potential remaining issue is that BOS might have superfluous errors scanning empty 36 byte mscabs, but I'm not sure. It's not really much of an issue if that is the case.

The only things left will be placing proper failsafes so that as much extraction occurs as possible for partial sets and damaged cabs, and going through my remaining TODOs to convert most of them into PR comments. Then the PR will be ready for review.

if (Filename != null)
{
cabinet = OpenSet(Filename);
cabinet = OpenSet(Filename, includeDebug);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If for some reason the set returns null (I have no idea how this would happen) but the current cab is valid, should it try and continue with extraction?

if (this.Filename != cabinet.Filename)
{
string firstCabName = Path.GetFileName(cabinet.Filename) ?? string.Empty;
if (includeDebug) Console.WriteLine($"Only the first cabinet {firstCabName} will be extracted!");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there any way for this to not spam the logs on large sets? I assume probably not, but I'm not sure

private CFDATA? ReadBlock(MicrosoftCabinet cabinet)
{
// Should only ever occur if it tries to read more than the file, but good to catch in general
try
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does putting this in a try-catch block slow things down? Thousands of readblocks will get called for any cab, and I've never thought about whether try-catch has any performance impact, so i have genuinely no idea.

{
if (includeDebug) Console.Error.WriteLine(ex);
return false;
return null;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the file is certainly messed up in some way if this fails, should it be deleted in case it causes cascading issues for BOS? I assume no, but I'm not sure.

try
{
// Ensure folder contains data
if (folder.DataCount == 0)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this fail on spanned only folders or something? I'm not sure when this would happen.


var fs = GetFileStream(file.Name, outputDirectory);

//uint quantumWindowBits = (uint)(((ushort)folder.CompressionType >> 8) & 0x1f);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This commented code was here since before I did any of my PRs, and I have no clue what it means. Should I leave it?

// Has to be a while loop instead of a for loop due to cab spanning continue blocks
while (j < folder.DataCount)
{
lock (tempCabinet._dataSourceLock)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since I need lock state to be maintained the whole loop, is this release and re-lock safe? From what I've tested in BOS, the only time i was getting access errors was on trying to access something after ST has finished handling an empty cab, which also might no longer be an issue anyways. That said, I am a little unsure regardless, I'd appreciate your input. I wanted to have the lock state higher, but couldn't since I have to swap out what tempCabinet is mid-loop

// Get the data to be processed
byte[] blockData = db.CompressedData;

// If the block is continued, append
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to make sure, this is specifically if and only if it's jumping between cabs on a continued block in a spanned folder, right?

folder = tempCabinet.Folders[0];
lock (tempCabinet._dataSourceLock)
{
tempCabinet._dataSource.SeekIfPossible(folder.CabStartOffset, SeekOrigin.Begin);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Semi-related to the above comment- is it worth checking here whether the folder it's being grabbed from is continued from prev or continued from prev & next? As far as I know, that should never not be the case and it would be pointless to do so, but I don't know what would be best.

}

byte[] nextData = nextBlock.CompressedData;
if (nextData.Length == 0)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The old version of this code from before my PR checked if it was null first, and then if it was of length zero. Do you know why it was checking if it was null? As far as I can tell, that's never been able to happen since Length isn't nullable.

file = files[++fileCounter];
bytesLeft = (int)file.FileSize;
fs = GetFileStream(file.Name, outputDirectory);
while (bytesLeft < data.Length - tempBytesLeft)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think there's any way this code could be deduplicated? It's reused 3 times, but due to how much state needs to be shared, splitting it out into a helper method looks really ugly.

@HeroponRikiBestest HeroponRikiBestest marked this pull request as ready for review January 16, 2026 19:59
@HeroponRikiBestest
Copy link
Contributor Author

HeroponRikiBestest commented Jan 16, 2026

PR is ready for review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Correct mszip extraction on some cabs currently relies on an exception being thrown

1 participant