SGA Archive

The container for most, if not all, Dawn Of War assets.

Sections	Layout
Header	See Layout
Table Of Contents (ToC)	See Layout
ToC Data Block	See Layout
Data Block	See Layout

Header Layout

Magic Word & Version

The first 12 bytes are always laid out like this; the rest of the header depends on the Version given.

Start	Stop	Size (bytes)	Name	Expected Value	Type	Notes
0	7	8	Magic Word	'_ARCHIVE'	String (ascii)
8	9	2	Version (Major)	-	UInt16	See Version
10	11	2	Version (Minor)	-	UInt16	See Version
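
As a minimal sketch of reading this 12-byte prefix, the snippet below unpacks the Magic Word and both Version fields with Python's `struct` module. The function name `read_magic_and_version` is illustrative, and little-endian byte order is an assumption, since the table does not state one.

```python
import struct

MAGIC = b"_ARCHIVE"
# Assumption: integers are little-endian ("<"); the table does not state a byte order.
MAGIC_VERSION = struct.Struct("<8sHH")  # 8 + 2 + 2 = 12 bytes


def read_magic_and_version(buffer: bytes) -> tuple:
    """Return (major, minor) from the first 12 bytes of an archive."""
    magic, major, minor = MAGIC_VERSION.unpack_from(buffer, 0)
    if magic != MAGIC:
        raise ValueError(f"not an SGA archive: magic word was {magic!r}")
    return major, minor


# Usage with a hand-built 12-byte prefix (not a real archive).
sample = MAGIC + struct.pack("<HH", 2, 0)
print(read_magic_and_version(sample))
```

The returned pair can then be used to pick which of the version-specific layouts to apply to the rest of the header.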

Version 2.0

Start	Stop	Size (bytes)	Name	Expected Value	Type	Notes
0	7	8	Magic Word	'_ARCHIVE'	String (ascii)
8	9	2	Version (Major)	'2'	UInt16	See Version
10	11	2	Version (Minor)	'0'	UInt16	See Version
12	27	16	Checksum?	-	MD5 Hash (Bytes)	See Checksums
28	155	128	Name	-	String (utf-16-le)	Name is limited to 64 (2-byte) characters.
156	171	16	Checksum?	-	MD5 Hash (Bytes)	See Checksums
172	175	4	ToC Size	-*	UInt32	* Equals 'Data Buffer Offset' - 180
176	179	4	Data Buffer Offset	-	UInt32	Absolute offset in file

180	-	-	ToC Offset	-	-	Not actually part of the header. The ToC always starts at 180 in v2.0
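
The v2.0 table can be sketched as a single `struct` format. This is an illustration rather than the reference parser: the dictionary keys are made up, little-endian integers and a null-padded Name field are assumptions, and the consistency check assumes the note on ToC Size refers to the Data Buffer Offset.

```python
import struct

# Assumptions: little-endian integers, and a null-padded name field.
HEADER_V2 = struct.Struct("<8sHH16s128s16sII")  # 180 bytes in total
TOC_OFFSET_V2 = 180  # not stored; the ToC always starts here in v2.0


def parse_header_v2(buffer: bytes) -> dict:
    (magic, major, minor, checksum_a, raw_name,
     checksum_b, toc_size, data_offset) = HEADER_V2.unpack_from(buffer, 0)
    if magic != b"_ARCHIVE" or (major, minor) != (2, 0):
        raise ValueError("not a v2.0 SGA header")
    # Decode the fixed 128-byte field and cut at the first null character.
    name = raw_name.decode("utf-16-le").split("\0", 1)[0]
    return {
        "name": name,
        "checksums": (checksum_a, checksum_b),
        "toc_offset": TOC_OFFSET_V2,
        "toc_size": toc_size,
        "data_offset": data_offset,
        # Per the table note, ToC Size should equal the data offset minus 180.
        "toc_size_consistent": toc_size == data_offset - TOC_OFFSET_V2,
    }


# Usage with a hand-built header (zeroed checksums, hypothetical name).
sample_header = HEADER_V2.pack(
    b"_ARCHIVE", 2, 0, bytes(16),
    "demo".encode("utf-16-le"), bytes(16), 24, 204,
)
print(parse_header_v2(sample_header)["name"])
```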

Version 5.0

Start	Stop	Size (bytes)	Name	Expected Value	Type	Notes
0	7	8	Magic Word	'_ARCHIVE'	String (ascii)
8	9	2	Version (Major)	'5'	UInt16	See Version
10	11	2	Version (Minor)	'0'	UInt16	See Version
12	27	16	Checksum?	-	MD5 Hash (Bytes)	See Checksums
28	155	128	Name	-	String (utf-16-le)	Name is limited to 64 (2-byte) characters.
156	171	16	Checksum?	-	MD5 Hash (Bytes)	See Checksums
172	175	4	ToC Size	-	UInt32
176	179	4	Data Buffer Offset	-	UInt32	Absolute position in file.
180	183	4	ToC Offset	-	UInt32	Absolute position in file.
184	187	4	Reserved '1'	'1'	UInt32	Unknown meaning; always 1
188	191	4	Reserved '0'?	-	UInt32	Unknown meaning; always 0? Untested.
192	195	4	???	-	UInt32	Typically '0x4d41dXXX'. Possibly a running ID, or a value used to invalidate the MD5 hash when changed; the latter seems unlikely, since a change to the Data would already change the MD5 hash.

Version 9.0

Start	Stop	Size (bytes)	Name	Expected Value	Type	Notes
0	7	8	Magic Word	'_ARCHIVE'	String (ascii)
8	9	2	Version (Major)	'9'	UInt16	See Version
10	11	2	Version (Minor)	'0'	UInt16	See Version
12	139	128	Name	-	String (utf-16-le)	Name is limited to 64 (2-byte) characters.
140	147	8	ToC Offset	-	UInt64	Absolute offset in file.
148	151	4	ToC Size	-	UInt32
152	159	8	Data Buffer Offset	-	UInt64	Absolute offset in file.
160	163	4	Data Size	-	UInt32
164	167	4	Reserved '0'?	-	UInt32	Unknown meaning; always 0? Untested.
168	171	4	Reserved '1'	-	UInt32	Unknown meaning; always 1? Untested.
172	427	256	???	-	Bytes	The 256 bytes after 'Reserved 1' and before the location stored in 'Data Buffer Offset'. Purpose unknown; possibly hashes or a CRC.
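
One way to cross-check the Start and Stop columns is to derive them from the Size column alone. The sketch below does that for the v9.0 header; the `layout` helper is illustrative and not taken from the project's code. The fields add up to 428 bytes, which agrees with the final Stop of 427.

```python
from itertools import accumulate

# (name, size in bytes) in table order, for the Version 9.0 header.
FIELDS_V9 = [
    ("Magic Word", 8),
    ("Version (Major)", 2),
    ("Version (Minor)", 2),
    ("Name", 128),
    ("ToC Offset", 8),
    ("ToC Size", 4),
    ("Data Buffer Offset", 8),
    ("Data Size", 4),
    ("Reserved '0'?", 4),
    ("Reserved '1'", 4),
    ("???", 256),
]


def layout(fields: list) -> list:
    """Return (start, stop, size, name) rows; stop is inclusive, as in the tables."""
    ends = list(accumulate(size for _, size in fields))
    rows = []
    for (name, size), end in zip(fields, ends):
        rows.append((end - size, end - 1, size, name))
    return rows


for start, stop, size, name in layout(FIELDS_V9):
    print(f"{start}\t{stop}\t{size}\t{name}")
```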

Table Of Contents Layout

Version 2.0 & 5.0

Start	Stop	Size (bytes)	Name	Expected Value	Type	Notes
0	3	4	Virtual Drive Offset	-	UInt32	Relative to 'ToC Offset'
4	5	2	Virtual Drive Count	-	UInt16
6	9	4	Folder Offset	-	UInt32	Relative to 'ToC Offset'
10	11	2	Folder Count	-	UInt16
12	15	4	File Offset	-	UInt32	Relative to 'ToC Offset'
16	17	2	File Count	-	UInt16
18	21	4	Name Buffer Offset	-	UInt32	Relative to 'ToC Offset'
22	23	2	Name Buffer Count	-	UInt16

Version 9.0

Start	Stop	Size (bytes)	Name	Expected Value	Type	Notes
0	3	4	Virtual Drive Offset	-	UInt32	Relative to 'ToC Offset'
4	7	4	Virtual Drive Count	-	UInt32
8	11	4	Folder Offset	-	UInt32	Relative to 'ToC Offset'
12	15	4	Folder Count	-	UInt32
16	19	4	File Offset	-	UInt32	Relative to 'ToC Offset'
20	23	4	File Count	-	UInt32
24	27	4	Name Buffer Offset	-	UInt32	Relative to 'ToC Offset'
28	31	4	Name Size In Bytes	-	UInt32
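
Since every offset in these tables is relative to 'ToC Offset', a reader has to add the ToC's own position before seeking. The sketch below parses either layout and returns absolute offsets; `parse_toc` and the section keys are illustrative names, and little-endian integers are an assumption.

```python
import struct

# Assumption: little-endian integers. Keys are major versions.
TOC_LAYOUTS = {
    2: struct.Struct("<IHIHIHIH"),  # 24 bytes, UInt16 counts
    5: struct.Struct("<IHIHIHIH"),
    9: struct.Struct("<IIIIIIII"),  # 32 bytes, UInt32 counts
}
SECTIONS = ("virtual_drive", "folder", "file", "name_buffer")


def parse_toc(buffer: bytes, toc_offset: int, major: int) -> dict:
    """Return {section: (absolute_offset, count)} for the four ToC sections."""
    values = TOC_LAYOUTS[major].unpack_from(buffer, toc_offset)
    result = {}
    for i, section in enumerate(SECTIONS):
        relative_offset, count = values[2 * i], values[2 * i + 1]
        # Stored offsets are relative to 'ToC Offset'; make them absolute.
        # Note: in v9.0 the last value is the name buffer size in bytes, not a count.
        result[section] = (toc_offset + relative_offset, count)
    return result


# Usage with a hand-built v2.0 ToC placed at offset 180 (values are made up).
sample_toc = bytes(180) + struct.pack("<IHIHIHIH", 24, 1, 162, 3, 198, 5, 298, 8)
print(parse_toc(sample_toc, 180, 2))
```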

Table Of Contents Data Layout

ToC Data contains the Headers for Virtual Drives, Folders, Files, and the Name Buffer.

Virtual Drive Layout

This could be described as a 'Root Folder', an 'Access Point', or, as I've settled on, a 'Virtual Drive' (due to their syntax in the path algorithm: 'V-Drive-path:path-to-file').

Version 2.0 & 5.0

Start	Stop	Size (bytes)	Name	Expected Value	Type	Notes
0	63	64	Path	-	String (ascii)	Null Padded ('\0') This will typically be 'data'.
64	127	64	Name	-	String (ascii)	Null Padded ('\0') This is typically the name of the archive
128	129	2	First Folder	-	UInt16
130	131	2	Last Folder	-	UInt16
132	133	2	First File	-	UInt16
134	135	2	Last File	-	UInt16
136	137	2	???	???	Bytes	The only time this wasn't '0', it matched the 'First Folder' value?

Version 9.0

Start	Stop	Size (bytes)	Name	Expected Value	Type	Notes
0	63	64	Path	-	String (ascii)	Null Padded ('\0') This will typically be 'data'.
64	127	64	Name	-	String (ascii)	Null Padded ('\0') This is typically the name of the archive
128	131	4	First Folder	-	UInt32
132	135	4	Last Folder	-	UInt32
136	139	4	First File	-	UInt32
140	143	4	Last File	-	UInt32
144	147	4	???	???	Bytes	The only time this wasn't '0', it matched the 'First Folder' value?
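
Both Virtual Drive layouts differ only in integer width, so one function can serve both. The following is a sketch under assumptions: little-endian integers, illustrative names, and the trailing unknown field kept as raw bytes. The tables do not say whether 'Last Folder' and 'Last File' are inclusive, so the values are returned untouched.

```python
import struct

# Assumption: little-endian integers; the trailing unknown field is kept as raw bytes.
DRIVE_V2_V5 = struct.Struct("<64s64sHHHH2s")  # 138 bytes
DRIVE_V9 = struct.Struct("<64s64sIIII4s")     # 148 bytes


def parse_virtual_drive(buffer: bytes, offset: int, layout: struct.Struct) -> dict:
    (path, name, first_folder, last_folder,
     first_file, last_file, unknown) = layout.unpack_from(buffer, offset)
    return {
        # Both strings are fixed 64-byte ASCII fields padded with '\0'.
        "path": path.rstrip(b"\0").decode("ascii"),
        "name": name.rstrip(b"\0").decode("ascii"),
        "folders": (first_folder, last_folder),
        "files": (first_file, last_file),
        "unknown": unknown,
    }


# Usage with a hand-built v2.0/v5.0 entry (hypothetical archive name).
sample_drive = DRIVE_V2_V5.pack(b"data", b"demo_archive", 0, 3, 0, 5, bytes(2))
drive = parse_virtual_drive(sample_drive, 0, DRIVE_V2_V5)
print(drive["path"], drive["name"], drive["folders"], drive["files"])
```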

Folder Header Layout

Because the archive is flattened (all folders and files are in a single array), folders use a range to define their files and sub folders.
This REQUIRES folders to be read IN ORDER and remain UNSORTED to avoid invalidating the ranges.

Version 2.0 & 5.0

Start	Stop	Size (bytes)	Name	Expected Value	Type	Notes
0	3	4	Name Offset	-	UInt32	Relative to 'Name Buffer Offset'
4	5	2	Sub folder Start Index	-	UInt16	Inclusive Start
6	7	2	Sub folder Stop Index	-	UInt16	Exclusive Stop
8	9	2	File Start Index	-	UInt16	Inclusive Start
10	11	2	File Stop Index	-	UInt16	Exclusive Stop

Version 9.0

Start	Stop	Size (bytes)	Name	Expected Value	Type	Notes
0	3	4	Name Offset	-	UInt32	Relative to 'Name Buffer Offset'
4	7	4	Sub folder Start Index	-	UInt32	Inclusive Start
8	11	4	Sub folder Stop Index	-	UInt32	Exclusive Stop
12	15	4	File Start Index	-	UInt32	Inclusive Start
16	19	4	File Stop Index	-	UInt32	Exclusive Stop
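
To illustrate why the flattened arrays must keep their original order, here is a small sketch that walks the tree using only index ranges. The `Folder` tuple, the `walk` function, and the sample names are hypothetical; names are assumed to have been resolved from the Name Buffer already.

```python
from typing import NamedTuple


class Folder(NamedTuple):
    name: str
    subfolder_start: int  # inclusive
    subfolder_stop: int   # exclusive
    file_start: int       # inclusive
    file_stop: int        # exclusive


def walk(folders: list, file_names: list, index: int = 0, depth: int = 0) -> list:
    """Return indented lines for the folder at `index` and everything below it."""
    folder = folders[index]
    lines = ["  " * depth + folder.name + "/"]
    for file_index in range(folder.file_start, folder.file_stop):
        lines.append("  " * (depth + 1) + file_names[file_index])
    for sub_index in range(folder.subfolder_start, folder.subfolder_stop):
        lines.extend(walk(folders, file_names, sub_index, depth + 1))
    return lines


# Hypothetical flattened arrays, in the order they were read.
folders = [
    Folder("data", 1, 3, 0, 1),
    Folder("art", 3, 3, 1, 3),
    Folder("sound", 3, 3, 3, 4),
]
file_names = ["readme.txt", "a.rsh", "b.rsh", "c.fda"]
print("\n".join(walk(folders, file_names)))
```

Sorting either list would shift the indices, and every range would then point at the wrong entries.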

File Header Layout

Because the archive is flattened (all folders and files are in a single array), folders use a range to define their files and sub folders.
This REQUIRES files to be read IN ORDER and remain UNSORTED to avoid invalidating the ranges.

Version 2.0

Start	Stop	Size (bytes)	Name	Expected Value	Type	Notes
0	3	4	Name Offset	-	UInt32	Relative to 'Name Buffer Offset'
4	7	4	Compression Flag?	'0', '16', or '32'	UInt32	This seems to be the window size in KibiBytes for the ZLib compression: '16' and '32' correspond to 'CINFO' values of '6' and '7' (see the ZLib Specification). These aren't the only ZLib window sizes available, but I believe they are the only two supported by the engine.
8	11	4	Data Offset	-	UInt32	Relative to 'Data Buffer Offset'
12	15	4	Decompressed Size	-	UInt32
16	19	4	Compressed Size	-	UInt32
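
The relationship between the Compression Flag and ZLib's 'CINFO' can be worked through in code: CINFO is the base-2 logarithm of the window size in bytes, minus 8. The sketch below parses a v2.0 File Header and converts the flag; the function names are illustrative, little-endian integers are an assumption, and a flag of '0' is returned as `None` because the table does not say what it means.

```python
import struct

FILE_HEADER_V2 = struct.Struct("<IIIII")  # 20 bytes; little-endian is an assumption


def parse_file_header_v2(buffer: bytes, offset: int = 0) -> dict:
    name_offset, flag, data_offset, decompressed_size, compressed_size = (
        FILE_HEADER_V2.unpack_from(buffer, offset)
    )
    return {
        "name_offset": name_offset,        # relative to 'Name Buffer Offset'
        "compression_flag": flag,          # 0, 16, or 32
        "data_offset": data_offset,        # relative to 'Data Buffer Offset'
        "decompressed_size": decompressed_size,
        "compressed_size": compressed_size,
    }


def cinfo_for_flag(flag: int):
    """Map the flag (window size in KiB) to zlib's CINFO; None when the flag is 0."""
    if flag == 0:
        return None
    if flag not in (16, 32):
        raise ValueError(f"unexpected compression flag: {flag}")
    window_bytes = flag * 1024
    # CINFO = log2(window size in bytes) - 8
    return window_bytes.bit_length() - 1 - 8


# Usage with a hand-built header (values are made up).
sample_file = FILE_HEADER_V2.pack(9, 32, 1024, 4096, 1300)
header = parse_file_header_v2(sample_file)
print(header["compression_flag"], cinfo_for_flag(header["compression_flag"]))
```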

Version 5.0

Start	Stop	Size (bytes)	Name	Expected Value	Type	Notes
0	3	4	Name Offset	-	UInt32	Relative to 'Name Buffer Offset'
4	7	4	Data Offset	-	UInt32	Relative to 'Data Buffer Offset'
8	11	4	Compressed Size	-	UInt32
12	15	4	Decompressed Size	-	UInt32
16	19	4	???	-	UInt32
20	21	2	???	-	UInt16

Version 9.0

Start	Stop	Size (bytes)	Name	Expected Value	Type	Notes
0	3	4	Name Offset	-	UInt32	Relative to 'Name Buffer Offset'
4	7	4	???	-	UInt32
8	11	4	Data Offset	-	UInt32	Relative to 'Data Buffer Offset'
12	15	4	???	-	UInt32
16	19	4	Compressed Size	-	UInt32
20	23	4	Decompressed Size	-	UInt32
24	27	4	???	-	UInt32
28	29	2	???	-	UInt16
30	33	4	???	-	UInt32

Name Layout

Because names are variable-length and do not provide a length, names must either be read all at once to build a name lookup, or read using a valid 'Name Offset'.

Version 2.0, 5.0 & 9.0

Start	Stop	Size (bytes)	Name	Expected Value	Type	Notes
-	-	-	File/Folder Name	-	String (ascii)	A Null Terminated String ('\0').
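
Both reading strategies mentioned above can be sketched in a few lines. `read_name` resolves a single 'Name Offset', while `build_name_lookup` scans the whole buffer once; both names are illustrative, and the sample buffer is made up.

```python
def read_name(name_buffer: bytes, name_offset: int) -> str:
    """Read one null-terminated ASCII name starting at `name_offset`."""
    end = name_buffer.index(b"\0", name_offset)  # ValueError if no terminator
    return name_buffer[name_offset:end].decode("ascii")


def build_name_lookup(name_buffer: bytes) -> dict:
    """Map each name's starting offset to the decoded name."""
    lookup = {}
    offset = 0
    while offset < len(name_buffer):
        name = read_name(name_buffer, offset)
        lookup[offset] = name
        offset += len(name) + 1  # skip the name and its '\0' terminator
    return lookup


# Hypothetical name buffer holding three names.
sample_names = b"data\0art\0a.rsh\0"
print(read_name(sample_names, 5))
print(build_name_lookup(sample_names))
```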

Data Buffer Layout

More accurately, the File Data Buffer. It can only be read with an appropriate File 'Data Offset' and File 'Compressed Size'.

Version 2.0, 5.0 & 9.0

Start	Stop	Size (bytes)	Name	Expected Value	Type	Notes
-	-	-	Data	-	bytes
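
A rough sketch of pulling one file out of the Data Buffer follows. The function `read_file_data` is illustrative, and it makes two assumptions that this page does not confirm: compressed entries are complete ZLib streams that `zlib.decompress` accepts as-is, and an entry whose two sizes match is stored uncompressed.

```python
import zlib


def read_file_data(buffer: bytes, data_buffer_offset: int, data_offset: int,
                   compressed_size: int, decompressed_size: int) -> bytes:
    # 'Data Offset' is relative to 'Data Buffer Offset'.
    start = data_buffer_offset + data_offset
    raw = buffer[start:start + compressed_size]
    if len(raw) != compressed_size:
        raise ValueError("file data runs past the end of the archive")
    # Assumption: equal sizes mean the bytes were stored without compression.
    if compressed_size == decompressed_size:
        return raw
    data = zlib.decompress(raw)
    if len(data) != decompressed_size:
        raise ValueError("decompressed size does not match the file header")
    return data


# Usage with a hand-built buffer: 32 bytes of "header", 4 bytes of padding, one file.
payload = b"hello sga " * 20
packed = zlib.compress(payload)
sample_archive = bytes(32) + b"pad!" + packed
print(read_file_data(sample_archive, 32, 4, len(packed), len(payload)) == payload)
```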

Footer

Version

Version follows the format X.Y where X is the major version and Y is the minor version.
Unfortunately, I'm only supporting these formats, since I don't have access to the .SGA archives from other Relic Games.

Version	Game(s)
2.0	Dawn Of War I
5.0	Dawn Of War II
9.0	Dawn Of War III

Checksums

According to Corsix (see Alternative Sources), these are MD5 hashes. Unfortunately for me, this isn't an easy thing to check, since I need to know where the checksum begins, if an Initialization Vector was used (and what it was), and the number of bytes used to calculate the checksum.

Alternative Sources

Xentax

CORSIX SGA v4->v5 Converter

This documentation is wrong?

In a perfect world, this documentation would be autogenerated, or distributed by Relic itself.
If this documentation contradicts the appropriate Python files, trust the Python over this documentation.
If it's egregious enough, open an issue or create a PR with the requested changes.
