[Clippy] fix: handle corrupt ZIP local file header in image/media parts #234

Open
github-actions[bot] wants to merge 3 commits into master from
clippy/fix-issue-233-corrupt-zip-image-5b26047fdcb973c4

@github-actions
Contributor

🤖 This is an automated fix from Clippy.

Closes #233

Root Cause

When a PPTX file has an image or media part whose ZIP entry has a corrupt local file header, System.IO.Compression.ZipArchiveEntry.Open() throws System.IO.InvalidDataException: A local file header is corrupt.

This propagated unhandled from two sites:

  1. ImageData..ctor — calls part.GetStream() to compute a deduplication hash
  2. CopyRelatedImage — calls oldPart.GetStream() + FeedData() to copy the raw bytes

The same issue exists symmetrically in MediaData..ctor and CopyRelatedMedia.

Fix

| Location | Change |
| --- | --- |
| ImageData..ctor | Catches InvalidDataException; falls back to a GUID-based unique hash so the deduplication cache lookup succeeds without crashing |
| MediaData..ctor | Same fallback |
| CopyRelatedImage | Wraps FeedData in a try/catch; on corrupt data, leaves an empty image part in place and continues copying the rest of the slide |
| CopyRelatedMedia | Same guard for FeedData |

The behaviour is intentionally lenient: the corrupt image/media is skipped (resulting in a broken/missing asset in the output), but the slide copy completes rather than aborting. This matches the principle of best-effort extraction from damaged files.
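
The hash fallback described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: `HashFallbackSketch`, `ComputeHashOrFallback`, and the `openStream` delegate (standing in for `part.GetStream()`) are hypothetical names, and SHA-256 is an assumed hash algorithm.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class HashFallbackSketch
{
    // Hypothetical helper: openStream stands in for part.GetStream().
    // SHA-256 is an assumption; the real code may hash differently.
    public static byte[] ComputeHashOrFallback(Func<Stream> openStream)
    {
        try
        {
            using var stream = openStream();
            using var sha = SHA256.Create();
            return sha.ComputeHash(stream);
        }
        catch (InvalidDataException)
        {
            // Corrupt ZIP entry: a fresh GUID will not match any cached hash,
            // so the part is treated as distinct instead of aborting the copy.
            return Guid.NewGuid().ToByteArray();
        }
    }
}
```

One consequence of this design is that two corrupt parts are never deduplicated against each other, since each gets its own GUID.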

Test Status

Added regression test AddSlidePart_WithCorruptImageLocalFileHeader_DoesNotThrow:

  • Opens BRK3066.pptx into memory
  • Scans the raw ZIP bytes for the first ppt/media/ local file header and corrupts its signature bytes (0x03 0x04 → 0xFF 0xFF)
  • Calls AddSlidePart and asserts it does not throw and produces a non-empty output
Test run summary: Passed!  total: 17  succeeded: 17  (PresentationBuilderSlidePublishingTests)
Build: 0 errors, warnings only (pre-existing)
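
For readers unfamiliar with the ZIP layout, the corruption step in the test can be sketched like this. It is a minimal illustration, not the test helper from this PR: `ZipCorruptionSketch` and `CorruptFirstLocalHeader` are hypothetical names. It relies on the ZIP local file header format, where the signature is `50 4B 03 04`, the file name length is a little-endian 16-bit value at offset 26, and the file name starts at offset 30.

```csharp
using System;
using System.Text;

public static class ZipCorruptionSketch
{
    // Hypothetical helper: returns a copy of zipBytes in which the first local
    // file header whose entry name starts with `prefix` has a broken signature.
    public static byte[] CorruptFirstLocalHeader(byte[] zipBytes, string prefix)
    {
        var result = (byte[])zipBytes.Clone();
        for (var i = 0; i + 30 <= result.Length; i++)
        {
            // Local file header signature: 'P' 'K' 0x03 0x04.
            if (result[i] != 0x50 || result[i + 1] != 0x4B ||
                result[i + 2] != 0x03 || result[i + 3] != 0x04)
                continue;

            // File name length: little-endian 16-bit value at offset 26.
            var nameLen = result[i + 26] | (result[i + 27] << 8);
            if (i + 30 + nameLen > result.Length)
                continue;

            var name = Encoding.UTF8.GetString(result, i + 30, nameLen);
            if (!name.StartsWith(prefix, StringComparison.Ordinal))
                continue;

            // Overwrite the last two signature bytes: 0x03 0x04 -> 0xFF 0xFF.
            result[i + 2] = 0xFF;
            result[i + 3] = 0xFF;
            return result;
        }
        throw new InvalidOperationException("No matching local file header found.");
    }
}
```

Because the central directory is left untouched, the entry is still listed when the archive is opened; the failure only surfaces when the entry's data is read.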

Generated by 🌈 Clippy, see workflow run. Learn more.

To install this agentic workflow, run

gh aw add githubnext/agentics/workflows/repo-assist.md@51c8f6ad4357d2ecc06e47120031b3d75e80227d

When a PPTX file has an image or media part whose ZIP entry contains a
corrupt local file header, ZipArchiveEntry.Open() throws
InvalidDataException. This propagated unhandled out of AddSlidePart,
crashing the entire copy operation.

Root cause: ImageData..ctor and MediaData..ctor call part.GetStream() to
compute a deduplication hash. CopyRelatedImage and CopyRelatedMedia then
call FeedData(stream) to copy the raw bytes. Both sites can throw
InvalidDataException when the ZIP entry is corrupt.

Fix:
- ImageData..ctor and MediaData..ctor now catch InvalidDataException and
  fall back to a GUID-based unique hash, so deduplication lookup
  succeeds without crashing.
- CopyRelatedImage wraps FeedData in a try/catch for InvalidDataException;
  the empty image part is left in place and the copy continues so the
  rest of the slide is preserved.
- CopyRelatedMedia applies the same guard to its FeedData call.
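
The copy guard in the last two items might look roughly like the sketch below. The names are hypothetical and chosen to keep the example self-contained: `openSource` stands in for `oldPart.GetStream()` and `feed` for the new part's `FeedData` call.

```csharp
using System;
using System.IO;

public static class CopyGuardSketch
{
    // Hypothetical helper: returns false when the source entry is corrupt,
    // leaving the destination part empty so the caller can keep copying.
    public static bool TryCopyPartData(Func<Stream> openSource, Action<Stream> feed)
    {
        try
        {
            using var source = openSource();
            feed(source);
            return true;
        }
        catch (InvalidDataException)
        {
            // Best effort: skip the damaged asset rather than abort the slide copy.
            return false;
        }
    }
}
```

Only InvalidDataException is caught here, so unrelated failures (for example an I/O error) would still propagate to the caller.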

Adds a regression test (AddSlidePart_WithCorruptImageLocalFileHeader_DoesNotThrow)
that corrupts a ppt/media/ entry's local file header signature in-memory
and verifies AddSlidePart completes without throwing.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Contributor

Copilot AI left a comment

Pull request overview

This PR hardens PowerPoint slide copying/deduplication against PPTX files that contain image/media parts whose underlying ZIP entries have corrupt local file headers (triggering InvalidDataException from ZipArchiveEntry.Open()), allowing best-effort slide extraction instead of aborting.

Changes:

  • Catch InvalidDataException when hashing ImagePart/DataPart streams for deduplication and fall back to a generated hash.
  • Catch InvalidDataException during image/media FeedData() so slide copy can continue with missing/broken assets.
  • Add a regression test that corrupts a ppt/media/ local file header and asserts AddSlidePart does not throw.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| Clippit/PtOpenXmlUtil.cs | Adds InvalidDataException handling in ImageData/MediaData hashing to avoid crashing on corrupt ZIP entries. |
| Clippit/PowerPoint/Fluent/FluentPresentationBuilder.Copy.cs | Guards image/media copying (GetStream/FeedData) so corrupt parts don’t abort slide copying. |
| Clippit.Tests/PowerPoint/PresentationBuilderSlidePublishingTests.Fluent.cs | Adds regression test that corrupts a media local header and verifies slide copy completes. |

Comment thread Clippit/PtOpenXmlUtil.cs
Comment on lines +1898 to +1902
    // The image part's ZIP entry has a corrupt local file header.
    // Use a unique hash so the deduplication cache treats this entry
    // as distinct rather than throwing, allowing the rest of the slide to be copied.
    Hash = Guid.NewGuid().ToByteArray();
}
Comment thread Clippit/PtOpenXmlUtil.cs
Comment on lines +1923 to +1927
    // The media part's ZIP entry has a corrupt local file header.
    // Use a unique hash so the deduplication cache treats this entry
    // as distinct rather than throwing, allowing the rest of the slide to be copied.
    Hash = Guid.NewGuid().ToByteArray();
}
Comment on lines +28 to +29
var corrupted = CorruptFirstMediaLocalFileHeader(srcMemory.ToArray());
using var corruptedMemory = new MemoryStream(corrupted);
Comment on lines +76 to +78
var name = System.Text.Encoding.UTF8.GetString(zipBytes, i + 30, nameLen);
if (!name.StartsWith("ppt/media/"))
    continue;
github-actions Bot and others added 2 commits April 24, 2026 13:55
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* test(excel): add SmlDataRetriever unit tests (SDR001–SDR025)

Adds 25 tests covering all public overloads of SmlDataRetriever:
- SheetNames / TableNames (string, SmlDocument, SpreadsheetDocument overloads)
- RetrieveSheet (all overloads, row/cell structure, invalid name throws)
- RetrieveRange (single cell, column slice, invalid name throws, overload parity)
- RetrieveTable (Table element, Columns/Data structure, header row excluded,
  invalid name throws, shared strings decoded, SmlDocument 3-arg overload)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* test(excel): address SmlDataRetriever review feedback

Agent-Logs-Url: https://github.com/sergey-tihon/Clippit/sessions/0e3ab580-0345-4e3c-bc3d-0570e876b1e4

Co-authored-by: sergey-tihon <1197905+sergey-tihon@users.noreply.github.com>

* test(excel): consolidate SpreadsheetDocument overload coverage

Agent-Logs-Url: https://github.com/sergey-tihon/Clippit/sessions/0e3ab580-0345-4e3c-bc3d-0570e876b1e4

Co-authored-by: sergey-tihon <1197905+sergey-tihon@users.noreply.github.com>

* test(excel): tighten shared-string assertion and keep SDR001-025 range

Agent-Logs-Url: https://github.com/sergey-tihon/Clippit/sessions/0e3ab580-0345-4e3c-bc3d-0570e876b1e4

Co-authored-by: sergey-tihon <1197905+sergey-tihon@users.noreply.github.com>

* test(excel): cover numeric RetrieveRange overloads and rename path helper

Agent-Logs-Url: https://github.com/sergey-tihon/Clippit/sessions/d94efc9a-c687-4c25-82e9-46a58476370e

Co-authored-by: sergey-tihon <1197905+sergey-tihon@users.noreply.github.com>

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: sergey-tihon <1197905+sergey-tihon@users.noreply.github.com>
@github-actions
Contributor Author

Commit pushed: 8cb5773

Development

Successfully merging this pull request may close these issues.

System.IO.InvalidDataException: A local file header is corrupt.
