WIP: add support for MSI #1244
Basic scaffolding, running into a few issues.
pymsi
You should add this dependency in an independent commit with the following commands:
uv add python-msi
git add pyproject.toml uv.lock
git commit -m 'chore(deps): add python-msi dependency'

BytesIO and unblob File
I'd recommend submitting a PR to pymsi:
diff --git a/src/pymsi/package.py b/src/pymsi/package.py
index 43ecaee..9f84c66 100644
--- a/src/pymsi/package.py
+++ b/src/pymsi/package.py
@@ -1,8 +1,10 @@
 import copy
 import io
+import mmap
 from pathlib import Path
 from typing import Iterator, Optional, Union
+
 import olefile
 from pymsi import streamname
@@ -18,13 +20,14 @@ from .summary import Summary
 class Package:
-    def __init__(self, path_or_bytesio: Union[Path, io.BytesIO]):
-        if isinstance(path_or_bytesio, io.BytesIO):
-            self.path = None
-            self.file = path_or_bytesio
-        else:
+    def __init__(self, path_or_bytesio: Union[Path, io.BytesIO, mmap.mmap]):
+        if isinstance(path_or_bytesio, Path):
             self.path = path_or_bytesio.resolve(True)
             self.file = self.path.open("rb")
+        else:
+            self.path = None
+            self.file = path_or_bytesio
+
         self.tables = {}
         self.ole = None
         self.summary = None

Reading from BytesIO or an mmap'ed file in Python is pretty similar. Inverting the check and extending the type hint is sufficient to make unblob work with code like this:
def calculate_chunk(self, file: File, start_offset: int) -> Optional[ValidChunk]:
    file.seek(start_offset, io.SEEK_SET)
    package = pymsi.Package(file)
    msi = pymsi.Msi(package, False)
    # MSI moves the file pointer
    msi_end_offset = file.tell()
    return ValidChunk(
        start_offset=start_offset,
        end_offset=msi_end_offset,
    )

The type hint change is not even required, just inverting the condition so that pymsi is more lax when it's not working on a Path.

Integration Tests
In order to validate that the handler works, you must create a directory and put files in there so that we can check that unblob works properly and catch regressions in the future:
Skip Magic Change
It's okay to modify the skip magic list, as long as the file types you remove from the list are handled by a default handler (which is the case here). You can simply remove the line, rather than commenting it out.

Sandboxing Exception
We need to fix that, but if you run it twice it'll disappear. Getting sandboxing right is hard :)
# MSI moves the file pointer
msi_end_offset = buf.tell()
This is not right. If you look at the output directory when run on the 7z MSI, you'll see that it carves two chunks:
0-1545728.msi
1545728-1563648.unknown
Looking into the unknown chunk, we see information that belongs to the Summary field:
hexdump -C 1545728-1563648.unknown
--snip--
00004470 1e 00 00 00 16 00 00 00 49 6e 73 74 61 6c 6c 61 |........Installa|
00004480 74 69 6f 6e 20 44 61 74 61 62 61 73 65 00 00 00 |tion Database...|
00004490 1e 00 00 00 0e 00 00 00 37 2d 5a 69 70 20 50 61 |........7-Zip Pa|
000044a0 63 6b 61 67 65 00 00 00 1e 00 00 00 0c 00 00 00 |ckage...........|
000044b0 49 67 6f 72 20 50 61 76 6c 6f 76 00 1e 00 00 00 |Igor Pavlov.....|
000044c0 0a 00 00 00 49 6e 73 74 61 6c 6c 65 72 00 00 00 |....Installer...|
000044d0 1e 00 00 00 0e 00 00 00 37 2d 5a 69 70 20 50 61 |........7-Zip Pa|
000044e0 63 6b 61 67 65 00 00 00 1e 00 00 00 0b 00 00 00 |ckage...........|
000044f0 49 6e 74 65 6c 3b 31 30 33 33 00 00 1e 00 00 00 |Intel;1033......|
00004500 27 00 00 00 7b 32 33 31 37 30 46 36 39 2d 34 30 |'...{23170F69-40|
00004510 43 31 2d 32 37 30 31 2d 32 35 30 31 2d 30 30 30 |C1-2701-2501-000|
00004520 30 30 32 30 30 30 30 30 30 7d 00 00 03 00 00 00 |002000000}......|
00004530 c8 00 00 00 03 00 00 00 02 00 00 00 03 00 00 00 |................|
00004540 02 00 00 00 40 00 00 00 80 8a 97 7e 8e 04 dc 01 |....@......~....|
00004550 40 00 00 00 80 8a 97 7e 8e 04 dc 01 1e 00 00 00 |@......~........|
00004560 31 00 00 00 57 69 6e 64 6f 77 73 20 49 6e 73 74 |1...Windows Inst|
00004570 61 6c 6c 65 72 20 58 4d 4c 20 76 32 2e 30 2e 33 |aller XML v2.0.3|
00004580 37 31 39 2e 30 20 28 63 61 6e 64 6c 65 2f 6c 69 |719.0 (candle/li|
00004590 67 68 74 29 00 00 00 00 00 00 00 00 00 00 00 00 |ght)............|
000045a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
You can check that by opening the file in https://pymsi.readthedocs.io/en/latest/msi_viewer.html and looking at the Summary tab.
I know very little about the MSI format, but it looks like the end offset could be calculated based on the OLE format that the MSI is built on. Probably some magic involving sector sizes and sector counts.
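As a rough illustration of the sector-based idea, here is a minimal sketch (not unblob's or pymsi's code). It reads the OLE / Compound File Binary header, walks the FAT sectors listed in the header, and treats the end of the highest allocated sector as the end offset. The name cfb_end_offset is hypothetical. The sketch does not handle files large enough to need DIFAT sectors (more than 109 FAT sectors), and it assumes nothing is stored past the last allocated sector.

```python
import struct

CFB_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")
FREESECT = 0xFFFFFFFF  # FAT marker for an unallocated sector


def cfb_end_offset(data: bytes, start: int = 0) -> int:
    """Hypothetical helper: offset just past the last allocated CFB sector."""
    header = data[start:start + 512]
    if len(header) < 512 or header[:8] != CFB_MAGIC:
        raise ValueError("not a CFB/OLE2 header")

    (sector_shift,) = struct.unpack_from("<H", header, 30)
    (num_fat_sectors,) = struct.unpack_from("<I", header, 44)
    (num_difat_sectors,) = struct.unpack_from("<I", header, 72)
    if num_difat_sectors:
        # Assumption of this sketch: only the 109 header DIFAT slots are used.
        raise NotImplementedError("DIFAT chains are not handled in this sketch")

    sector_size = 1 << sector_shift
    difat = struct.unpack_from("<109I", header, 76)
    fat_sector_ids = [s for s in difat[:num_fat_sectors] if s != FREESECT]

    entries_per_sector = sector_size // 4
    last_used = -1
    for fat_index, sector_id in enumerate(fat_sector_ids):
        # Sector n starts at (n + 1) * sector_size: the header fills slot 0.
        offset = start + (sector_id + 1) * sector_size
        fat = struct.unpack_from(f"<{entries_per_sector}I", data, offset)
        for i, entry in enumerate(fat):
            if entry != FREESECT:
                last_used = max(last_used, fat_index * entries_per_sector + i)

    if last_used < 0:
        raise ValueError("FAT has no allocated sectors")
    return start + (last_used + 2) * sector_size
```

In a handler, the result would replace the file.tell() value as end_offset. Whether this matches the real end of every MSI would need to be checked against the integration test files.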
Based on suggestion from @qkaiser. This will make it easier to integrate with unblob (see onekey-sec/unblob#1244).
Requires this PR to (almost) work properly: nightlark/pymsi#81
Don't assume that pymsi will actually read the entire file.
I took a look at adding support for MSI (#1211) and ran into a couple of issues.
Pymsi requires a path or BytesIO -- is there a way to convert from an unblob file? I was looking at file_utils.OffsetFile which seems closer (might need pymsi to support file-like objects, @nightlark). I made the following changes:
And in pymsi changed the __init__ for Package to:
This seems to work but I'm not sure if there's an easier way.
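For illustration only, a minimal sketch of the "accept anything file-like" approach. The helpers open_source and read_magic are hypothetical and are not pymsi's or unblob's actual code. The point is that io.BytesIO and mmap.mmap both expose seek and read, so only a Path needs special handling.

```python
import io
import mmap
from pathlib import Path
from typing import BinaryIO, Optional, Tuple, Union


def open_source(
    source: Union[Path, BinaryIO, mmap.mmap],
) -> Tuple[Optional[Path], BinaryIO]:
    """Hypothetical helper: return (path, file object) for a Path or file-like."""
    if isinstance(source, Path):
        path = source.resolve(True)
        return path, path.open("rb")
    # Anything else is assumed to behave like a binary file (seek/read/tell).
    return None, source


def read_magic(source: Union[Path, BinaryIO, mmap.mmap]) -> bytes:
    """Read the first 8 bytes regardless of how the data was supplied."""
    _, fh = open_source(source)
    fh.seek(0, io.SEEK_SET)
    return fh.read(8)
```

The same call then works whether the caller passes a BytesIO, an mmap'ed file, or a Path.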
I'm getting an error related to sandboxing which I suspect is unrelated:
I've been testing with a 7zip MSI downloaded from here:
https://www.7-zip.org/a/7z2501.msi
Finally, I commented out an entry in DEFAULT_SKIP_MAGIC to get the new handler called. I don't know what other side effects this will have.
Any suggestions?