This repository was archived by the owner on Oct 10, 2023. It is now read-only.

SGA Archive

Marcus Kertesz edited this page Feb 20, 2022 · 4 revisions

The container for nearly all Dawn Of War assets.

Sections Layout
Header See Layout
Table Of Contents (ToC) See Layout
ToC Data Block See Layout
Data Block See Layout

Header Layout

Magic Word & Version

The first 12 bytes are always laid out like this; the rest of the header depends on the Version given.

Start Stop Size (bytes) Name Expected Value Type Notes
0 7 8 Magic Word '_ARCHIVE' String (ascii)
8 9 2 Version (Major) - UInt16 See Version
10 11 2 Version (Minor) - UInt16 See Version
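
A minimal sketch of reading these 12 bytes with Python's struct module. Little-endian integers are an assumption here (it matches the utf-16-le name field, but the table does not state the byte order), and the function name read_magic_and_version is hypothetical.

```python
import struct
from io import BytesIO

MAGIC = b"_ARCHIVE"
# 8-byte magic word, then two UInt16 values (little-endian is assumed)
PREFIX = struct.Struct("<8sHH")


def read_magic_and_version(stream):
    """Validate the magic word and return (major, minor)."""
    raw = stream.read(PREFIX.size)
    if len(raw) != PREFIX.size:
        raise ValueError("file too short to be an SGA archive")
    magic, major, minor = PREFIX.unpack(raw)
    if magic != MAGIC:
        raise ValueError(f"unexpected magic word: {magic!r}")
    return major, minor


# Usage: a hand-built prefix for a v2.0 archive
sample = BytesIO(MAGIC + struct.pack("<HH", 2, 0))
major, minor = read_magic_and_version(sample)
```

The returned pair can then be used to pick the version-specific layout for the rest of the header.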

Version 2.0

Start Stop Size (bytes) Name Expected Value Type Notes
0 7 8 Magic Word '_ARCHIVE' String (ascii)
8 9 2 Version (Major) '2' UInt16 See Version
10 11 2 Version (Minor) '0' UInt16 See Version
12 27 16 Checksum? - MD5 Hash (Bytes) See Checksums
28 155 128 Name - String (utf-16-le) Name is limited to 64 (2-byte) characters.
156 171 16 Checksum? - MD5 Hash (Bytes) See Checksums
172 175 4 ToC Size - UInt32 Expected to equal 'Data Buffer Offset' - 180.
176 179 4 Data Buffer Offset - UInt32 Absolute offset in file
180 - - ToC Offset - - Not actually part of the header. The ToC always starts at 180 in v2.0.
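
As a rough sketch, the v2.0 table above maps onto a single struct format string. The names HeaderV2 and parse_header_v2 are hypothetical, little-endian byte order is assumed, and the two checksum fields are returned as raw bytes because their meaning is unconfirmed.

```python
import struct
from collections import namedtuple

# Fields at offsets 12..179, directly after the magic word and version
V2_BODY = struct.Struct("<16s128s16sII")
V2_TOC_OFFSET = 180  # fixed in v2.0 according to the table

HeaderV2 = namedtuple(
    "HeaderV2", "checksum_a name checksum_b toc_size data_buffer_offset"
)


def parse_header_v2(data):
    """Parse the v2.0 header from a bytes object holding the file start."""
    c1, raw_name, c2, toc_size, data_offset = V2_BODY.unpack_from(data, 12)
    # The name is a fixed 128-byte utf-16-le field; keep the text before
    # the first null character
    name = raw_name.decode("utf-16-le").partition("\0")[0]
    return HeaderV2(c1, name, c2, toc_size, data_offset)
```

If the note on ToC Size holds, header.toc_size should equal header.data_buffer_offset - V2_TOC_OFFSET, which makes a cheap sanity check after parsing.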

Version 5.0

Start Stop Size (bytes) Name Expected Value Type Notes
0 7 8 Magic Word '_ARCHIVE' String (ascii)
8 9 2 Version (Major) '5' UInt16 See Version
10 11 2 Version (Minor) '0' UInt16 See Version
12 27 16 Checksum? - MD5 Hash (Bytes) See Checksums
28 155 128 Name - String (utf-16-le) Name is limited to 64 (2-byte) characters.
156 171 16 Checksum? - MD5 Hash (Bytes) See Checksums
172 175 4 ToC Size - UInt32
176 179 4 Data Buffer Offset - UInt32 Absolute position in file.
180 183 4 ToC Offset - UInt32 Absolute position in file.
184 187 4 Reserved '1' '1' UInt32 Unknown meaning; always 1
188 191 4 Reserved '0'? - UInt32 Unknown meaning; always 0? Untested.
192 195 4 ??? - UInt32 Typically '0x4d41dXXX'. Possibly a running ID, perhaps used to invalidate the MD5 hash when changed. That is a stretch, though: the Data would probably change as well, and that alone would change the MD5 hash.
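
Unlike v2.0, the v5.0 header stores the ToC Offset explicitly. A small sketch of pulling out just the three values needed to locate the ToC and the data buffer; the function name is hypothetical, little-endian byte order is assumed, and the reserved and unknown fields are read but ignored.

```python
import struct

# Fields at offsets 12..195: two 16-byte checksums around the 128-byte name,
# then six UInt32 values (ToC Size, Data Buffer Offset, ToC Offset,
# two reserved values, one unknown value)
V5_BODY = struct.Struct("<16s128s16s6I")


def locate_sections_v5(data):
    """Return (toc_offset, toc_size, data_buffer_offset) from a v5.0 header."""
    fields = V5_BODY.unpack_from(data, 12)
    toc_size, data_buffer_offset, toc_offset = fields[3:6]
    return toc_offset, toc_size, data_buffer_offset
```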

Version 9.0

Start Stop Size (bytes) Name Expected Value Type Notes
0 7 8 Magic Word '_ARCHIVE' String (ascii)
8 9 2 Version (Major) '9' UInt16 See Version
10 11 2 Version (Minor) '0' UInt16 See Version
12 139 128 Name - String (utf-16-le) Name is limited to 64 (2-byte) characters.
140 147 8 ToC Offset - UInt64 Absolute offset in file.
148 151 4 ToC Size - UInt32
152 159 8 Data Buffer Offset - UInt64 Absolute offset in file.
160 163 4 Data Size - UInt32
164 167 4 Reserved '0'? - UInt32 Unknown meaning; always 0? Untested.
168 171 4 Reserved '1' - UInt32 Unknown meaning; always 1? Untested.
172 427 256 ??? - Bytes The 256 bytes after 'Reserved 1' and before the location stored in 'Data Buffer Offset'. Purpose unknown; possibly hashes or a CRC.
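
A sketch of the v9.0 header built only from the Size column, reading the fields back to back after the 12-byte magic word and version. Packed that way, the header is 428 bytes long, so its last byte sits at offset 427. The names HeaderV9 and parse_header_v9 are hypothetical and little-endian byte order is assumed.

```python
import struct
from collections import namedtuple

# 128-byte name, UInt64 ToC offset, UInt32 ToC size, UInt64 data offset,
# UInt32 data size, two reserved UInt32 values, 256 unknown bytes
V9_BODY = struct.Struct("<128sQIQ3I256s")

HeaderV9 = namedtuple(
    "HeaderV9",
    "name toc_offset toc_size data_buffer_offset data_size unknown_blob",
)


def parse_header_v9(data):
    """Parse the v9.0 header from a bytes object holding the file start."""
    (raw_name, toc_offset, toc_size, data_offset,
     data_size, _reserved_a, _reserved_b, blob) = V9_BODY.unpack_from(data, 12)
    name = raw_name.decode("utf-16-le").partition("\0")[0]
    return HeaderV9(name, toc_offset, toc_size, data_offset, data_size, blob)
```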

Table Of Contents Layout

Version 2.0 & 5.0

Start Stop Size (bytes) Name Expected Value Type Notes
0 3 4 Virtual Drive Offset - UInt32 Relative to 'ToC Offset'
4 5 2 Virtual Drive Count - UInt16
6 9 4 Folder Offset - UInt32 Relative to 'ToC Offset'
10 11 2 Folder Count - UInt16
12 15 4 File Offset - UInt32 Relative to 'ToC Offset'
16 17 2 File Count - UInt16
18 21 4 Name Buffer Offset - UInt32 Relative to 'ToC Offset'
22 23 2 Name Buffer Count - UInt16

Version 9.0

Start Stop Size (bytes) Name Expected Value Type Notes
0 3 4 Virtual Drive Offset - UInt32 Relative to 'ToC Offset'
4 7 4 Virtual Drive Count - UInt32
8 11 4 Folder Offset - UInt32 Relative to 'ToC Offset'
12 15 4 Folder Count - UInt32
16 19 4 File Offset - UInt32 Relative to 'ToC Offset'
20 23 4 File Count - UInt32
24 27 4 Name Buffer Offset - UInt32 Relative to 'ToC Offset'
28 31 4 Name Size In Bytes - UInt32
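
The two ToC layouts above differ only in the width of the count fields, so one parser can cover both. This is a sketch under the same little-endian assumption; TocHeader and parse_toc_header are hypothetical names, and the last field holds 'Name Buffer Count' in v2.0/v5.0 but 'Name Size In Bytes' in v9.0.

```python
import struct
from collections import namedtuple

TocHeader = namedtuple(
    "TocHeader",
    "drive_offset drive_count folder_offset folder_count "
    "file_offset file_count name_offset name_count_or_size",
)

TOC_LAYOUTS = {
    2: struct.Struct("<IHIHIHIH"),  # 24 bytes: UInt32 offsets, UInt16 counts
    5: struct.Struct("<IHIHIHIH"),
    9: struct.Struct("<8I"),        # 32 bytes: everything is UInt32
}


def parse_toc_header(data, toc_offset, major):
    """Read the ToC header that starts at the absolute toc_offset."""
    return TocHeader(*TOC_LAYOUTS[major].unpack_from(data, toc_offset))


def absolute(toc_offset, relative_offset):
    """Offsets inside the ToC header are relative to 'ToC Offset'."""
    return toc_offset + relative_offset
```

For example, absolute(toc_offset, toc.folder_offset) gives the file position of the first folder header.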

Table Of Contents Data Layout

ToC Data contains the Headers for Virtual Drives, Folders, Files, and the Name Buffer.

Virtual Drive Layout

This could be described as a 'Root Folder', an 'Access Point', or, as I've settled on, a 'Virtual Drive' (due to their syntax in the path algorithm: 'V-Drive-path:path-to-file').

Version 2.0 & 5.0

Start Stop Size (bytes) Name Expected Value Type Notes
0 63 64 Path - String (ascii) Null Padded ('\0'). This will typically be 'data'.
64 127 64 Name - String (ascii) Null Padded ('\0'). This is typically the name of the archive.
128 129 2 First Folder - UInt16
130 131 2 Last Folder - UInt16
132 133 2 First File - UInt16
134 135 2 Last File - UInt16
136 137 2 ??? ??? Bytes The only time this wasn't '0', it matched the 'First Folder' value?

Version 9.0

Start Stop Size (bytes) Name Expected Value Type Notes
0 63 64 Path - String (ascii) Null Padded ('\0'). This will typically be 'data'.
64 127 64 Name - String (ascii) Null Padded ('\0'). This is typically the name of the archive.
128 131 4 First Folder - UInt32
132 135 4 Last Folder - UInt32
136 139 4 First File - UInt32
140 143 4 Last File - UInt32
144 147 4 ??? ??? Bytes The only time this wasn't '0', it matched the 'First Folder' value?
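
A sketch of reading one Virtual Drive entry for either layout. VirtualDrive and parse_virtual_drive are hypothetical names, little-endian byte order is assumed, and the trailing unknown field is read as an integer purely for convenience (the table only calls it 'Bytes').

```python
import struct
from collections import namedtuple

VirtualDrive = namedtuple(
    "VirtualDrive",
    "path name first_folder last_folder first_file last_file unknown",
)

DRIVE_LAYOUTS = {
    2: struct.Struct("<64s64s5H"),  # 138 bytes
    5: struct.Struct("<64s64s5H"),
    9: struct.Struct("<64s64s5I"),  # 148 bytes
}


def _padded_ascii(raw):
    # Keep everything before the first null byte of the padded field
    return raw.split(b"\0", 1)[0].decode("ascii")


def parse_virtual_drive(data, offset, major):
    """Read the Virtual Drive entry that starts at the absolute offset."""
    path, name, *numbers = DRIVE_LAYOUTS[major].unpack_from(data, offset)
    return VirtualDrive(_padded_ascii(path), _padded_ascii(name), *numbers)
```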

Folder Header Layout

Because the archive is flattened (all folders and files are in a single array), folders use a range to define their files and sub folders.
This REQUIRES folders to be read IN ORDER and remain UNSORTED to avoid invalidating the ranges.

Version 2.0 & 5.0

Start Stop Size (bytes) Name Expected Value Type Notes
0 3 4 Name Offset - UInt32 Relative to 'Name Buffer Offset'
4 5 2 Sub folder Start Index - UInt16 Inclusive Start
6 7 2 Sub folder Stop Index - UInt16 Exclusive Stop
8 9 2 File Start Index - UInt16 Inclusive Start
10 11 2 File Stop Index - UInt16 Exclusive Stop

Version 9.0

Start Stop Size (bytes) Name Expected Value Type Notes
0 3 4 Name Offset - UInt32 Relative to 'Name Buffer Offset'
4 7 4 Sub folder Start Index - UInt32 Inclusive Start
8 11 4 Sub folder Stop Index - UInt32 Exclusive Stop
12 15 4 File Start Index - UInt32 Inclusive Start
16 19 4 File Stop Index - UInt32 Exclusive Stop
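
To illustrate how the index ranges rebuild the tree, here is a sketch that reads the folder headers in their stored order and then walks them depth-first. Folder, read_folders, and walk are hypothetical names; little-endian byte order and a root folder at index 0 are assumptions (in practice the starting index would come from the Virtual Drive's 'First Folder').

```python
import struct
from collections import namedtuple

Folder = namedtuple(
    "Folder", "name_offset sub_start sub_stop file_start file_stop"
)

FOLDER_LAYOUTS = {
    2: struct.Struct("<I4H"),  # 12 bytes
    5: struct.Struct("<I4H"),
    9: struct.Struct("<5I"),   # 20 bytes
}


def read_folders(data, offset, count, major):
    """Read `count` folder headers, keeping the order stored in the file."""
    layout = FOLDER_LAYOUTS[major]
    return [
        Folder(*layout.unpack_from(data, offset + i * layout.size))
        for i in range(count)
    ]


def walk(folders, index=0, depth=0):
    """Yield (depth, folder_index, file_indices) for a folder and its subtree."""
    folder = folders[index]
    # Start is inclusive and Stop is exclusive, exactly like range()
    yield depth, index, list(range(folder.file_start, folder.file_stop))
    for child in range(folder.sub_start, folder.sub_stop):
        yield from walk(folders, child, depth + 1)
```

Because the ranges are plain indices into the folder and file arrays, sorting either array after reading would silently point folders at the wrong children.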

File Header Layout

Because the archive is flattened (all folders and files are in a single array), folders use a range to define their files and sub folders.
This REQUIRES files to be read IN ORDER and remain UNSORTED to avoid invalidating the ranges.

Version 2.0

Start Stop Size (bytes) Name Expected Value Type Notes
0 3 4 Name Offset - UInt32 Relative to 'Name Buffer Offset'
4 7 4 Compression Flag? '0', '16', or '32' UInt32 This seems to be the ZLib window size in KiB: '16' and '32' correspond to 'CINFO' values '6' and '7' (see the ZLib specification). ZLib supports other window sizes, but I believe these are the only two supported by the engine.
8 11 4 Data Offset - UInt32 Relative to 'Data Buffer Offset'
12 15 4 Decompressed Size - UInt32
16 19 4 Compressed Size - UInt32
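
A sketch of the v2.0 file header, plus the arithmetic behind the Compression Flag note: in a ZLib stream the upper four bits of the first byte hold CINFO, and the window size is 2^(CINFO + 8) bytes, so CINFO 6 gives 16 KiB and CINFO 7 gives 32 KiB. FileV2, parse_file_header_v2, and window_kib_from_cmf are hypothetical names, and little-endian byte order is assumed.

```python
import struct
import zlib
from collections import namedtuple

FILE_V2 = struct.Struct("<5I")  # 20 bytes

FileV2 = namedtuple(
    "FileV2",
    "name_offset compression_flag data_offset decompressed_size compressed_size",
)


def parse_file_header_v2(data, offset):
    """Read the v2.0 file header that starts at the absolute offset."""
    return FileV2(*FILE_V2.unpack_from(data, offset))


def window_kib_from_cmf(cmf_byte):
    """Window size in KiB encoded in the first byte of a ZLib stream."""
    cinfo = cmf_byte >> 4  # upper four bits of the CMF byte
    return (1 << (cinfo + 8)) // 1024


# Usage: inspect the first byte of a stream produced by Python's zlib
first_byte = zlib.compress(b"example payload")[0]
window_kib = window_kib_from_cmf(first_byte)
```

Comparing window_kib with a file's Compression Flag is one way to test the window-size interpretation against real archives.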

Version 5.0

Start Stop Size (bytes) Name Expected Value Type Notes
0 3 4 Name Offset - UInt32 Relative to 'Name Buffer Offset'
4 7 4 Data Offset - UInt32 Relative to 'Data Buffer Offset'
8 11 4 Compressed Size - UInt32
12 15 4 Decompressed Size - UInt32
16 19 4 ??? - UInt32
20 21 2 ??? - UInt16

Version 9.0

Start Stop Size (bytes) Name Expected Value Type Notes
0 3 4 Name Offset - UInt32 Relative to 'Name Buffer Offset'
4 7 4 ??? - UInt32
8 11 4 Data Offset - UInt32 Relative to 'Data Buffer Offset'
12 15 4 ??? - UInt32
16 19 4 Compressed Size - UInt32
20 23 4 Decompressed Size - UInt32
24 27 4 ??? - UInt32
28 29 2 ??? - UInt16
30 33 4 ??? - UInt32
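
The v5.0 and v9.0 file headers carry several unknown fields. The sketch below reads the whole record so that the stride stays correct, but returns only the four fields the tables identify. FileEntry and parse_file_header are hypothetical names and little-endian byte order is assumed.

```python
import struct
from collections import namedtuple

FileEntry = namedtuple(
    "FileEntry", "name_offset data_offset compressed_size decompressed_size"
)

V5_FILE = struct.Struct("<5IH")   # 22 bytes
V9_FILE = struct.Struct("<7IHI")  # 34 bytes


def parse_file_header(data, offset, major):
    """Read a v5.0 or v9.0 file header, skipping the unknown fields."""
    if major == 5:
        name, data_off, comp, decomp, _u1, _u2 = V5_FILE.unpack_from(data, offset)
    elif major == 9:
        (name, _u1, data_off, _u2, comp, decomp,
         _u3, _u4, _u5) = V9_FILE.unpack_from(data, offset)
    else:
        raise ValueError(f"no layout sketched for major version {major}")
    return FileEntry(name, data_off, comp, decomp)
```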

Name Layout

Because names are variable-length and do not provide a length, names must either be read all at once to build a name lookup, or read using a valid 'Name Offset'.

Version 2.0, 5.0 & 9.0

Start Stop Size (bytes) Name Expected Value Type Notes
- - - File/Folder Name - String (ascii) A Null Terminated String ('\0').
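
Both ways of reading names mentioned above can be sketched in a few lines. Here name_buffer is assumed to be the bytes starting at 'Name Buffer Offset', and read_name and read_all_names are hypothetical helper names.

```python
def read_name(name_buffer, name_offset):
    """Read one null-terminated ASCII name starting at a valid 'Name Offset'."""
    end = name_buffer.index(b"\0", name_offset)
    return name_buffer[name_offset:end].decode("ascii")


def read_all_names(name_buffer):
    """Scan the whole buffer once and map each start offset to its name."""
    names = {}
    position = 0
    while position < len(name_buffer):
        end = name_buffer.find(b"\0", position)
        if end == -1:
            break  # trailing bytes without a terminator are ignored
        names[position] = name_buffer[position:end].decode("ascii")
        position = end + 1
    return names


# Usage with a hand-built buffer holding three names
buffer = b"data\0art\0ui\0"
lookup = read_all_names(buffer)
```

The lookup is keyed by offset, so a folder or file header's 'Name Offset' can index it directly.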

Data Buffer Layout

More accurately, the File Data Buffer. It can only be read with an appropriate File 'Data Offset' and File 'Compressed Size'.

Version 2.0, 5.0 & 9.0

Start Stop Size (bytes) Name Expected Value Type Notes
- - - Data - bytes
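
A sketch of extracting one file's bytes. It assumes the payload is a ZLib stream whenever the compressed and decompressed sizes differ and is stored as-is when they match; that rule is a guess based on the Compression Flag note, not something this page confirms. The function name read_file_data is hypothetical.

```python
import zlib


def read_file_data(archive, data_buffer_offset, data_offset,
                   compressed_size, decompressed_size):
    """Slice one file out of the archive bytes and decompress it if needed."""
    # 'Data Offset' is relative to 'Data Buffer Offset'
    start = data_buffer_offset + data_offset
    raw = archive[start:start + compressed_size]
    if len(raw) != compressed_size:
        raise ValueError("file data runs past the end of the archive")
    if compressed_size == decompressed_size:
        return raw  # assumed to be stored uncompressed
    data = zlib.decompress(raw)
    if len(data) != decompressed_size:
        raise ValueError("decompressed size does not match the file header")
    return data
```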

Footer

Version

Version follows the format X.Y where X is the major version and Y is the minor version.
Unfortunately, I'm only supporting these formats, since I don't have access to the .SGA archives from other Relic Games.

Version Game(s)
2.0 Dawn Of War I
5.0 Dawn Of War II
9.0 Dawn Of War III

Checksums

According to Corsix (See Sources), these are MD5 hashes. Unfortunately for me, this isn't an easy thing to check, since I need to know where the checksum begins, if an Initialization Vector was used (and what it was), and the number of bytes used to calculate the checksum.

Alternative Sources

Xentax

CORSIX SGA v4->v5 Converter

This documentation is wrong?

In a perfect world, this documentation would be autogenerated, or distributed by Relic itself.
If this documentation contradicts the appropriate Python files, trust the Python over this documentation.
If it's egregious enough, open an issue or create a PR with the requested changes.
