SGA Archive
The container format for nearly all Dawn Of War assets.
| Sections | Layout |
|---|---|
| Header | See Layout |
| Table Of Contents (ToC) | See Layout |
| ToC Data Block | See Layout |
| Data Block | See Layout |
The first 12 bytes are always laid out as shown here; the rest of the header depends on the Version given.
| Start | Stop | Size (bytes) | Name | Expected Value | Type | Notes |
|---|---|---|---|---|---|---|
| 0 | 7 | 8 | Magic Word | '_ARCHIVE' | String (ascii) | |
| 8 | 9 | 2 | Version (Major) | - | UInt16 | See Version |
| 10 | 11 | 2 | Version (Minor) | - | UInt16 | See Version |
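
A minimal sketch of reading this 12-byte prefix in Python. Little-endian byte order is an assumption (inferred from the utf-16-le name fields, not stated above), and `read_version` is a hypothetical helper name.

```python
import struct
from typing import Tuple

MAGIC = b"_ARCHIVE"
# Assumption: integers are little-endian; "<" also disables padding.
PREFIX = struct.Struct("<8sHH")  # magic word, major version, minor version


def read_version(buffer: bytes) -> Tuple[int, int]:
    """Check the magic word and return (major, minor)."""
    if len(buffer) < PREFIX.size:
        raise ValueError("buffer is shorter than the 12-byte prefix")
    magic, major, minor = PREFIX.unpack_from(buffer, 0)
    if magic != MAGIC:
        raise ValueError(f"unexpected magic word: {magic!r}")
    return major, minor
```

The returned pair can then be used to pick which of the version-specific header layouts to apply to the rest of the file.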

In version 2.0, the header is laid out as follows.

| Start | Stop | Size (bytes) | Name | Expected Value | Type | Notes |
|---|---|---|---|---|---|---|
| 0 | 7 | 8 | Magic Word | '_ARCHIVE' | String (ascii) | |
| 8 | 9 | 2 | Version (Major) | '2' | UInt16 | See Version |
| 10 | 11 | 2 | Version (Minor) | '0' | UInt16 | See Version |
| 12 | 27 | 16 | Checksum? | - | MD5 Hash (Bytes) | See Checksums |
| 28 | 155 | 128 | Name | - | String (utf-16-le) | Name is limited to 64 (2-byte) characters. |
| 156 | 171 | 16 | Checksum? | - | MD5 Hash (Bytes) | See Checksums |
| 172 | 175 | 4 | ToC Size | -* | UInt32 | * Equal to 'Data Buffer Offset' - 180 |
| 176 | 179 | 4 | Data Buffer Offset | - | UInt32 | Absolute offset in file |
| 180 | - | - | ToC Offset | - | - | Not actually part of the header. The ToC always starts at 180 in v2.0 |
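
As an illustration, the v2.0 table maps onto a single `struct` format string. This is only a sketch: little-endian order and null padding of the name are assumptions, and `HeaderV2` / `parse_header_v2` are hypothetical names.

```python
import struct
from typing import NamedTuple

# Assumption: little-endian. 8 + 2 + 2 + 16 + 128 + 16 + 4 + 4 = 180 bytes.
V2_HEADER = struct.Struct("<8sHH16s128s16sII")
TOC_OFFSET_V2 = 180  # the ToC always starts here in v2.0


class HeaderV2(NamedTuple):
    name: str
    checksum_a: bytes
    checksum_b: bytes
    toc_size: int
    data_buffer_offset: int


def parse_header_v2(buffer: bytes) -> HeaderV2:
    (magic, major, minor, md5_a, raw_name,
     md5_b, toc_size, data_offset) = V2_HEADER.unpack_from(buffer, 0)
    if magic != b"_ARCHIVE" or (major, minor) != (2, 0):
        raise ValueError("not a v2.0 SGA header")
    # Assumption: unused name characters are null padding.
    name = raw_name.decode("utf-16-le").split("\0", 1)[0]
    return HeaderV2(name, md5_a, md5_b, toc_size, data_offset)
```

A cheap sanity check after parsing is that `toc_size == data_buffer_offset - TOC_OFFSET_V2`, as the note on 'ToC Size' describes.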

In version 5.0, the header is laid out as follows.

| Start | Stop | Size (bytes) | Name | Expected Value | Type | Notes |
|---|---|---|---|---|---|---|
| 0 | 7 | 8 | Magic Word | '_ARCHIVE' | String (ascii) | |
| 8 | 9 | 2 | Version (Major) | '5' | UInt16 | See Version |
| 10 | 11 | 2 | Version (Minor) | '0' | UInt16 | See Version |
| 12 | 27 | 16 | Checksum? | - | MD5 Hash (Bytes) | See Checksums |
| 28 | 155 | 128 | Name | - | String (utf-16-le) | Name is limited to 64 (2-byte) characters. |
| 156 | 171 | 16 | Checksum? | - | MD5 Hash (Bytes) | See Checksums |
| 172 | 175 | 4 | ToC Size | - | UInt32 | |
| 176 | 179 | 4 | Data Buffer Offset | - | UInt32 | Absolute position in file. |
| 180 | 183 | 4 | ToC Offset | - | UInt32 | Absolute position in file. |
| 184 | 187 | 4 | Reserved '1' | '1' | UInt32 | Unknown meaning; always 1 |
| 188 | 191 | 4 | Reserved '0'? | - | UInt32 | Unknown meaning; always 0? Untested. |
| 192 | 195 | 4 | ??? | - | UInt32 | Typically '0x4d41dXXX'; possibly a running ID. It may exist to invalidate the MD5 hash when changed, although that is a stretch: the data would probably change as well, and that alone would change the MD5 hash. |

In version 9.0, the header is laid out as follows.

| Start | Stop | Size (bytes) | Name | Expected Value | Type | Notes |
|---|---|---|---|---|---|---|
| 0 | 7 | 8 | Magic Word | '_ARCHIVE' | String (ascii) | |
| 8 | 9 | 2 | Version (Major) | '9' | UInt16 | See Version |
| 10 | 11 | 2 | Version (Minor) | '0' | UInt16 | See Version |
| 12 | 131 | 128 | Name | - | String (utf-16-le) | Name is limited to 64 (2-byte) characters. |
| 132 | 139 | 8 | ToC Offset | - | UInt64 | Absolute offset in file. |
| 140 | 143 | 4 | ToC Size | - | UInt32 | |
| 144 | 171 | 8 | Data Buffer Offset | - | UInt64 | Absolute offset in file. |
| 172 | 175 | 4 | Data Size | - | UInt32 | |
| 176 | 179 | 4 | Reserved '0'? | - | UInt32 | Unknown meaning; always 0? Untested. |
| 180 | 183 | 4 | Reserved '1' | - | UInt32 | Unknown, always 1? Untested. |
| 184 | 427 | 256 | ??? | - | Bytes | These 256 bytes sit after 'Reserved 1' and before the location stored in 'Data Buffer Offset'. Their purpose is unknown; they could be hashes or a CRC. |

The Table Of Contents header, in its layout with 16-bit counts:

| Start | Stop | Size (bytes) | Name | Expected Value | Type | Notes |
|---|---|---|---|---|---|---|
| 0 | 3 | 4 | Virtual Drive Offset | - | UInt32 | Relative to 'ToC Offset' |
| 4 | 5 | 2 | Virtual Drive Count | - | UInt16 | |
| 6 | 9 | 4 | Folder Offset | - | UInt32 | Relative to 'ToC Offset' |
| 10 | 11 | 2 | Folder Count | - | UInt16 | |
| 12 | 15 | 4 | File Offset | - | UInt32 | Relative to 'ToC Offset' |
| 16 | 17 | 2 | File Count | - | UInt16 | |
| 18 | 21 | 4 | Name Buffer Offset | - | UInt32 | Relative to 'ToC Offset' |
| 22 | 23 | 2 | Name Buffer Count | - | UInt16 | |

The Table Of Contents header, in its layout with 32-bit counts:

| Start | Stop | Size (bytes) | Name | Expected Value | Type | Notes |
|---|---|---|---|---|---|---|
| 0 | 3 | 4 | Virtual Drive Offset | - | UInt32 | Relative to 'ToC Offset' |
| 4 | 7 | 4 | Virtual Drive Count | - | UInt32 | |
| 8 | 11 | 4 | Folder Offset | - | UInt32 | Relative to 'ToC Offset' |
| 12 | 15 | 4 | Folder Count | - | UInt32 | |
| 16 | 19 | 4 | File Offset | - | UInt32 | Relative to 'ToC Offset' |
| 20 | 23 | 4 | File Count | - | UInt32 | |
| 24 | 27 | 4 | Name Buffer Offset | - | UInt32 | Relative to 'ToC Offset' |
| 28 | 31 | 4 | Name Size In Bytes | - | UInt32 | |
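
Both ToC header layouts hold the same eight fields and differ only in the width of the counts, so one parser can cover both. The sketch below assumes little-endian order; the `wide` flag and all names are hypothetical, and which archive version uses which layout is left to the caller.

```python
import struct
from typing import NamedTuple


class TocHeader(NamedTuple):
    drive_offset: int
    drive_count: int
    folder_offset: int
    folder_count: int
    file_offset: int
    file_count: int
    name_offset: int
    name_count_or_size: int  # 'Name Buffer Count' or 'Name Size In Bytes'


# Assumption: little-endian, no padding between fields.
TOC_16 = struct.Struct("<IHIHIHIH")  # 24 bytes, UInt16 counts
TOC_32 = struct.Struct("<IIIIIIII")  # 32 bytes, UInt32 counts


def parse_toc_header(buffer: bytes, toc_offset: int, wide: bool) -> TocHeader:
    layout = TOC_32 if wide else TOC_16
    return TocHeader(*layout.unpack_from(buffer, toc_offset))


def absolute(toc_offset: int, relative_offset: int) -> int:
    """Offsets in the ToC header are relative to 'ToC Offset'."""
    return toc_offset + relative_offset
```

For example, the first Virtual Drive entry would be found at `absolute(toc_offset, toc.drive_offset)` in the file.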
ToC Data contains the Headers for Virtual Drives, Folders, Files, and the Name Buffer.
This could be described as a 'Root Folder', an 'Access Point', or, as I've settled on, a 'Virtual Drive' (due to its syntax in the path algorithm: 'V-Drive-path:path-to-file').
| Start | Stop | Size (bytes) | Name | Expected Value | Type | Notes |
|---|---|---|---|---|---|---|
| 0 | 63 | 64 | Path | - | String (ascii) | Null Padded ('\0') This will typically be 'data'. |
| 64 | 127 | 64 | Name | - | String (ascii) | Null Padded ('\0') This is typically the name of the archive |
| 128 | 129 | 2 | First Folder | - | UInt16 | |
| 130 | 131 | 2 | Last Folder | - | UInt16 | |
| 132 | 133 | 2 | First File | - | UInt16 | |
| 134 | 135 | 2 | Last File | - | UInt16 | |
| 136 | 137 | 2 | ??? | ??? | Bytes | The only time this wasn't '0', it matched the 'First Folder' value? |

The Virtual Drive layout with 32-bit indices:

| Start | Stop | Size (bytes) | Name | Expected Value | Type | Notes |
|---|---|---|---|---|---|---|
| 0 | 63 | 64 | Path | - | String (ascii) | Null Padded ('\0') This will typically be 'data'. |
| 64 | 127 | 64 | Name | - | String (ascii) | Null Padded ('\0') This is typically the name of the archive |
| 128 | 131 | 4 | First Folder | - | UInt32 | |
| 132 | 135 | 4 | Last Folder | - | UInt32 | |
| 136 | 139 | 4 | First File | - | UInt32 | |
| 140 | 143 | 4 | Last File | - | UInt32 | |
| 144 | 147 | 4 | ??? | ??? | Bytes | The only time this wasn't '0', it matched the 'First Folder' value? |
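
A sketch of decoding one Virtual Drive entry in either width. Little-endian order is assumed, the trailing unknown field is read as an integer purely for convenience (the table only lists it as bytes), and `VirtualDrive` / `parse_virtual_drive` are hypothetical names.

```python
import struct
from typing import NamedTuple


class VirtualDrive(NamedTuple):
    path: str
    name: str
    first_folder: int
    last_folder: int
    first_file: int
    last_file: int
    unknown: int


# Assumption: little-endian, no padding between fields.
DRIVE_16 = struct.Struct("<64s64s5H")  # 138 bytes
DRIVE_32 = struct.Struct("<64s64s5I")  # 148 bytes


def _text(raw: bytes) -> str:
    # Strings are null padded; keep everything before the first '\0'.
    return raw.split(b"\0", 1)[0].decode("ascii")


def parse_virtual_drive(buffer: bytes, offset: int, wide: bool) -> VirtualDrive:
    layout = DRIVE_32 if wide else DRIVE_16
    path, name, *numbers = layout.unpack_from(buffer, offset)
    return VirtualDrive(_text(path), _text(name), *numbers)
```

The sketch does not interpret whether 'Last Folder' and 'Last File' are inclusive or exclusive, since the tables above do not say.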
Because the archive is flattened (all folders and files are in a single array), folders use a range to define their files and sub folders.
This REQUIRES folders to be read IN ORDER and remain UNSORTED to avoid invalidating the ranges.
| Start | Stop | Size (bytes) | Name | Expected Value | Type | Notes |
|---|---|---|---|---|---|---|
| 0 | 3 | 4 | Name Offset | - | UInt32 | Relative to 'Name Buffer Offset' |
| 4 | 5 | 2 | Sub folder Start Index | - | UInt16 | Inclusive Start |
| 6 | 7 | 2 | Sub folder Stop Index | - | UInt16 | Exclusive Stop |
| 8 | 9 | 2 | File Start Index | - | UInt16 | Inclusive Start |
| 10 | 11 | 2 | File Stop Index | - | UInt16 | Exclusive Stop |

The Folder layout with 32-bit indices:

| Start | Stop | Size (bytes) | Name | Expected Value | Type | Notes |
|---|---|---|---|---|---|---|
| 0 | 3 | 4 | Name Offset | - | UInt32 | Relative to 'Name Buffer Offset' |
| 4 | 7 | 4 | Sub folder Start Index | - | UInt32 | Inclusive Start |
| 8 | 11 | 4 | Sub folder Stop Index | - | UInt32 | Exclusive Stop |
| 12 | 15 | 4 | File Start Index | - | UInt32 | Inclusive Start |
| 16 | 19 | 4 | File Stop Index | - | UInt32 | Exclusive Stop |
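
To illustrate why the flat arrays must keep their original order, here is a small sketch that walks folders through their index ranges (inclusive start, exclusive stop). `Folder` and `walk` are hypothetical names, names are shown already resolved, and the sample data is made up.

```python
from typing import Iterator, List, NamedTuple, Tuple


class Folder(NamedTuple):
    name: str
    sub_start: int   # inclusive
    sub_stop: int    # exclusive
    file_start: int  # inclusive
    file_stop: int   # exclusive


def walk(folders: List[Folder], files: List[str],
         index: int = 0) -> Iterator[Tuple[str, List[str]]]:
    """Yield (folder name, file names) depth-first from folders[index]."""
    folder = folders[index]
    # The ranges are plain slices into the flat, unsorted arrays.
    yield folder.name, files[folder.file_start:folder.file_stop]
    for child in range(folder.sub_start, folder.sub_stop):
        yield from walk(folders, files, child)


folders = [
    Folder("root", 1, 3, 0, 1),
    Folder("art", 3, 3, 1, 3),
    Folder("sound", 3, 3, 3, 3),
]
files = ["readme.txt", "a.tga", "b.tga"]
tree = list(walk(folders, files))
```

Sorting either list would leave the stored indices pointing at the wrong entries, which is why the ranges are only valid against the order found in the archive.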
Because the archive is flattened (all folders and files are in a single array), folders use a range to define their files and sub folders.
This REQUIRES files to be read IN ORDER and remain UNSORTED to avoid invalidating the ranges.
| Start | Stop | Size (bytes) | Name | Expected Value | Type | Notes |
|---|---|---|---|---|---|---|
| 0 | 3 | 4 | Name Offset | - | UInt32 | Relative to 'Name Buffer Offset' |
| 4 | 7 | 4 | Compression Flag? | '0', '16', or '32' | UInt32 | This seems to be the ZLib window size in kibibytes. Window sizes of 16 KiB and 32 KiB correspond to 'CINFO' values '6' and '7' (see the ZLib specification). These aren't the only ZLib window sizes available, but I believe they are the only two supported by the engine. |
| 8 | 11 | 4 | Data Offset | - | UInt32 | Relative to 'Data Buffer Offset' |
| 12 | 15 | 4 | Decompressed Size | - | UInt32 | |
| 16 | 19 | 4 | Compressed Size | - | UInt32 | |
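
A hedged sketch of turning a file's stored bytes back into its contents. It assumes that a flag of '0' means the data is stored uncompressed and that compressed data is a complete ZLib stream (header included); neither is confirmed above. `read_file_payload` and `window_size_kib` are hypothetical helpers.

```python
import zlib


def window_size_kib(raw: bytes) -> int:
    """Window size implied by the CINFO nibble of a ZLib stream header."""
    cinfo = raw[0] >> 4  # upper four bits of the first byte
    return (1 << (cinfo + 8)) // 1024


def read_file_payload(raw: bytes, compression_flag: int,
                      decompressed_size: int) -> bytes:
    # Assumption: 0 means "stored", anything else means ZLib-compressed.
    data = raw if compression_flag == 0 else zlib.decompress(raw)
    if len(data) != decompressed_size:
        raise ValueError("size does not match 'Decompressed Size'")
    return data


payload = b"dawn of war " * 10
packed = zlib.compress(payload)
restored = read_file_payload(packed, 32, len(payload))
```

`zlib.decompress` reads the window size from the stream header itself, so the flag is only needed here to decide whether to decompress at all.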

An alternative File layout, 22 bytes long:

| Start | Stop | Size (bytes) | Name | Expected Value | Type | Notes |
|---|---|---|---|---|---|---|
| 0 | 3 | 4 | Name Offset | - | UInt32 | Relative to 'Name Buffer Offset' |
| 4 | 7 | 4 | Data Offset | - | UInt32 | Relative to 'Data Buffer Offset' |
| 8 | 11 | 4 | Compressed Size | - | UInt32 | |
| 12 | 15 | 4 | Decompressed Size | - | UInt32 | |
| 16 | 19 | 4 | ??? | - | UInt32 | |
| 20 | 21 | 2 | ??? | - | UInt16 | |

An alternative File layout, 34 bytes long:

| Start | Stop | Size (bytes) | Name | Expected Value | Type | Notes |
|---|---|---|---|---|---|---|
| 0 | 3 | 4 | Name Offset | - | UInt32 | Relative to 'Name Buffer Offset' |
| 4 | 7 | 4 | ??? | - | UInt32 | |
| 8 | 11 | 4 | Data Offset | - | UInt32 | Relative to 'Data Buffer Offset' |
| 12 | 15 | 4 | ??? | - | UInt32 | |
| 16 | 19 | 4 | Compressed Size | - | UInt32 | |
| 20 | 23 | 4 | Decompressed Size | - | UInt32 | |
| 24 | 27 | 4 | ??? | - | UInt32 | |
| 28 | 29 | 2 | ??? | - | UInt16 | |
| 30 | 33 | 4 | ??? | - | UInt32 | |
Because names are variable-length and do not provide a length, names must either be read all at once to build a name lookup, or read using a valid 'Name Offset'.
| Start | Stop | Size (bytes) | Name | Expected Value | Type | Notes |
|---|---|---|---|---|---|---|
| - | - | - | File/Folder Name | - | String (ascii) | A Null Terminated String ('\0'). |
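
Both ways of reading names can be sketched in a few lines. `read_name` and `build_name_lookup` are hypothetical helpers that operate on the name buffer after it has been sliced out of the archive.

```python
from typing import Dict


def read_name(name_buffer: bytes, name_offset: int) -> str:
    """Read one null-terminated ascii name starting at name_offset."""
    end = name_buffer.index(b"\0", name_offset)
    return name_buffer[name_offset:end].decode("ascii")


def build_name_lookup(name_buffer: bytes) -> Dict[int, str]:
    """Read every name once, keyed by the offset where it starts."""
    lookup: Dict[int, str] = {}
    offset = 0
    while offset < len(name_buffer):
        end = name_buffer.find(b"\0", offset)
        if end == -1:  # trailing bytes without a terminator
            break
        lookup[offset] = name_buffer[offset:end].decode("ascii")
        offset = end + 1
    return lookup


name_buffer = b"data\0art\0a.tga\0"
names = build_name_lookup(name_buffer)
```

The lookup is keyed by offset so that a File or Folder 'Name Offset' can be used directly as the dictionary key.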
More accurately, this is the File Data Buffer. It can only be read with an appropriate File 'Data Offset' and File 'Compressed Size'.
| Start | Stop | Size (bytes) | Name | Expected Value | Type | Notes |
|---|---|---|---|---|---|---|
| - | - | - | Data | - | bytes | |
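
As a sketch, locating one file's stored bytes combines the header's 'Data Buffer Offset' with the file's 'Data Offset' and 'Compressed Size'. `slice_file_data` is a hypothetical helper and the sample archive bytes are made up.

```python
def slice_file_data(archive: bytes, data_buffer_offset: int,
                    data_offset: int, compressed_size: int) -> bytes:
    """Return the stored (possibly compressed) bytes of one file."""
    # 'Data Offset' is relative to 'Data Buffer Offset'.
    start = data_buffer_offset + data_offset
    stop = start + compressed_size
    if stop > len(archive):
        raise ValueError("file data runs past the end of the archive")
    return archive[start:stop]


archive = b"H" * 10 + b"firstsecond"
first = slice_file_data(archive, 10, 0, 5)
second = slice_file_data(archive, 10, 5, 6)
```

The result is the data as stored; whether it still needs decompressing depends on the file's own header fields.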
Version follows the format X.Y where X is the major version and Y is the minor version.
Unfortunately, I'm only supporting these formats, since I don't have access to the .SGA archives from other Relic Games.
| Version | Game(s) |
|---|---|
| 2.0 | Dawn Of War I |
| 5.0 | Dawn Of War II |
| 9.0 | Dawn Of War III |
According to Corsix (See Sources), these are MD5 hashes. Unfortunately for me, this isn't an easy thing to check, since I need to know where the checksum begins, if an Initialization Vector was used (and what it was), and the number of bytes used to calculate the checksum.
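
One way to experiment with those unknowns is to hash candidate byte ranges and compare against the stored value. The sketch below is only a probe, not a known-working verification: it models the possible Initialization Vector as bytes fed to MD5 before the data (an assumption), and `md5_matches` / `probe_starts` are hypothetical helpers.

```python
import hashlib
from typing import Iterable, List, Optional


def md5_matches(buffer: bytes, expected: bytes, start: int,
                stop: Optional[int] = None, prefix: bytes = b"") -> bool:
    """Hash prefix + buffer[start:stop] and compare with the stored hash."""
    digest = hashlib.md5(prefix + buffer[start:stop]).digest()
    return digest == expected


def probe_starts(buffer: bytes, expected: bytes,
                 candidate_starts: Iterable[int],
                 prefix: bytes = b"") -> List[int]:
    """Return every candidate start offset whose hash matches."""
    return [start for start in candidate_starts
            if md5_matches(buffer, expected, start, prefix=prefix)]


# Made-up data: the "stored" hash covers bytes 12 onward with a known prefix.
sample = bytes(range(64))
stored = hashlib.md5(b"key" + sample[12:]).digest()
hits = probe_starts(sample, stored, [0, 12, 28], prefix=b"key")
```

Natural candidates to try on a real archive are the field boundaries from the header tables, such as the ToC offset or the 'Data Buffer Offset'.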
In a perfect world, this documentation would be autogenerated, or distributed by Relic itself.
If this documentation contradicts the appropriate python files, trust the python over this documentation.
If it's egregious enough, open an issue or create a PR with the requested changes.