[Clippy] fix: handle corrupt ZIP local file header in image/media parts#234
Open
github-actions[bot] wants to merge 3 commits intomasterfrom
Open
[Clippy] fix: handle corrupt ZIP local file header in image/media parts#234github-actions[bot] wants to merge 3 commits intomasterfrom
github-actions[bot] wants to merge 3 commits intomasterfrom
Conversation
When a PPTX file has an image or media part whose ZIP entry contains a corrupt local file header, ZipArchiveEntry.Open() throws InvalidDataException. This propagated unhandled out of AddSlidePart, crashing the entire copy operation. Root cause: ImageData..ctor and MediaData..ctor call part.GetStream() to compute a deduplication hash. CopyRelatedImage and CopyRelatedMedia then call FeedData(stream) to copy the raw bytes. Both sites can throw InvalidDataException when the ZIP entry is corrupt. Fix: - ImageData..ctor and MediaData..ctor now catch InvalidDataException and fall back to a GUID-based unique hash, so deduplication lookup succeeds without crashing. - CopyRelatedImage wraps FeedData in a try/catch for InvalidDataException; the empty image part is left in place and the copy continues so the rest of the slide is preserved. - CopyRelatedMedia applies the same guard to its FeedData call. Adds a regression test (AddSlidePart_WithCorruptImageLocalFileHeader_DoesNotThrow) that corrupts a ppt/media/ entry's local file header signature in-memory and verifies AddSlidePart completes without throwing. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This was referenced Apr 23, 2026
Contributor
There was a problem hiding this comment.
Pull request overview
This PR hardens PowerPoint slide copying/deduplication against PPTX files that contain image/media parts whose underlying ZIP entries have corrupt local file headers (triggering InvalidDataException from ZipArchiveEntry.Open()), allowing best-effort slide extraction instead of aborting.
Changes:
- Catch
InvalidDataExceptionwhen hashingImagePart/DataPartstreams for deduplication and fall back to a generated hash. - Catch
InvalidDataExceptionduring image/mediaFeedData()so slide copy can continue with missing/broken assets. - Add a regression test that corrupts a
ppt/media/local file header and assertsAddSlidePartdoes not throw.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
Clippit/PtOpenXmlUtil.cs |
Adds InvalidDataException handling in ImageData/MediaData hashing to avoid crashing on corrupt ZIP entries. |
Clippit/PowerPoint/Fluent/FluentPresentationBuilder.Copy.cs |
Guards image/media copying (GetStream/FeedData) so corrupt parts don’t abort slide copying. |
Clippit.Tests/PowerPoint/PresentationBuilderSlidePublishingTests.Fluent.cs |
Adds regression test that corrupts a media local header and verifies slide copy completes. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
+1898
to
+1902
| // The image part's ZIP entry has a corrupt local file header. | ||
| // Use a unique hash so the deduplication cache treats this entry | ||
| // as distinct rather than throwing, allowing the rest of the slide to be copied. | ||
| Hash = Guid.NewGuid().ToByteArray(); | ||
| } |
Comment on lines
+1923
to
+1927
| // The media part's ZIP entry has a corrupt local file header. | ||
| // Use a unique hash so the deduplication cache treats this entry | ||
| // as distinct rather than throwing, allowing the rest of the slide to be copied. | ||
| Hash = Guid.NewGuid().ToByteArray(); | ||
| } |
Comment on lines
+28
to
+29
| var corrupted = CorruptFirstMediaLocalFileHeader(srcMemory.ToArray()); | ||
| using var corruptedMemory = new MemoryStream(corrupted); |
Comment on lines
+76
to
+78
| var name = System.Text.Encoding.UTF8.GetString(zipBytes, i + 30, nameLen); | ||
| if (!name.StartsWith("ppt/media/")) | ||
| continue; |
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* test(excel): add SmlDataRetriever unit tests (SDR001–SDR025) Adds 25 tests covering all public overloads of SmlDataRetriever: - SheetNames / TableNames (string, SmlDocument, SpreadsheetDocument overloads) - RetrieveSheet (all overloads, row/cell structure, invalid name throws) - RetrieveRange (single cell, column slice, invalid name throws, overload parity) - RetrieveTable (Table element, Columns/Data structure, header row excluded, invalid name throws, shared strings decoded, SmlDocument 3-arg overload) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * test(excel): address SmlDataRetriever review feedback Agent-Logs-Url: https://github.com/sergey-tihon/Clippit/sessions/0e3ab580-0345-4e3c-bc3d-0570e876b1e4 Co-authored-by: sergey-tihon <1197905+sergey-tihon@users.noreply.github.com> * test(excel): consolidate SpreadsheetDocument overload coverage Agent-Logs-Url: https://github.com/sergey-tihon/Clippit/sessions/0e3ab580-0345-4e3c-bc3d-0570e876b1e4 Co-authored-by: sergey-tihon <1197905+sergey-tihon@users.noreply.github.com> * test(excel): tighten shared-string assertion and keep SDR001-025 range Agent-Logs-Url: https://github.com/sergey-tihon/Clippit/sessions/0e3ab580-0345-4e3c-bc3d-0570e876b1e4 Co-authored-by: sergey-tihon <1197905+sergey-tihon@users.noreply.github.com> * test(excel): cover numeric RetrieveRange overloads and rename path helper Agent-Logs-Url: https://github.com/sergey-tihon/Clippit/sessions/d94efc9a-c687-4c25-82e9-46a58476370e Co-authored-by: sergey-tihon <1197905+sergey-tihon@users.noreply.github.com> --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: sergey-tihon <1197905+sergey-tihon@users.noreply.github.com>
Contributor
Author
|
Commit pushed:
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
🤖 This is an automated fix from Clippy.
Closes #233
Root Cause
When a PPTX file has an image or media part whose ZIP entry has a corrupt local file header,
System.IO.Compression.ZipArchiveEntry.Open()throwsSystem.IO.InvalidDataException: A local file header is corrupt.This propagated unhandled from two sites:
ImageData..ctor— callspart.GetStream()to compute a deduplication hashCopyRelatedImage— callsoldPart.GetStream()+FeedData()to copy the raw bytesThe same issue exists symmetrically in
MediaData..ctorandCopyRelatedMedia.Fix
ImageData..ctorInvalidDataException; falls back to a GUID-based unique hash so the deduplication cache lookup succeeds without crashingMediaData..ctorCopyRelatedImageFeedDatain a try/catch; on corrupt data, leaves an empty image part in place and continues copying the rest of the slideCopyRelatedMediaFeedDataThe behaviour is intentionally lenient: the corrupt image/media is skipped (resulting in a broken/missing asset in the output), but the slide copy completes rather than aborting. This matches the principle of best-effort extraction from damaged files.
Test Status
Added regression test
AddSlidePart_WithCorruptImageLocalFileHeader_DoesNotThrow:BRK3066.pptxinto memoryppt/media/local file header and corrupts its signature bytes (0x03 0x04→0xFF 0xFF)AddSlidePartand asserts it does not throw and produces a non-empty output