-
Notifications
You must be signed in to change notification settings - Fork 119
feat(crashdump): Add crash dump functionality for fatal errors #1594
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
OmniBlade
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, tested dump generation with a 2022 build.
|
Another approach I realized after publishing could be to move the GameMemory allocations from using GlobalAlloc to instead create a separate GameMemory heap and use HeapAlloc. |
xezon
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good overall. Just a bunch of small comments.
|
Pushed a commit attempting to address most of the comments in the previous review:
In addition:
Remaining: |
|
I think this should be enabled by default for now and included in our weekly pre-releases and legi.cc/patch distributions. The patch still crashes now and then without being able to find a cause (replays are send to us, but there is no crash to be found). For example, Legi (and only Legi) crashed yesterday on stream while being on the patch. No cause was found. Players on the patch (with crashdump on) can send us the crashdumps and thus help finding where the issues/instabilities are. |
7bb8e8c to
90a0b49
Compare
Agreed, that's the main use case I was hoping this could address. Once this is merged, the remaining hurdle would be to get the weekly builds to enable the |
xezon
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the cleanup in MiniDumper class needs another look. Make it simple and robust.
xezon
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code looks good to me. A few minor comments that could be looked into.
| MiniDumpFilterWriteCombinedMemory = 0x01000000, | ||
| MiniDumpValidTypeFlags = 0x01ffffff, | ||
| MiniDumpNoIgnoreInaccessibleMemory = 0x02000000, | ||
| MiniDumpValidTypeFlagsEx = 0x03ffffff, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this value correct?
From https://learn.microsoft.com/en-us/windows/win32/api/minidumpapiset/ne-minidumpapiset-minidump_type I'd expect this to be 0x02000001.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There does seem to be some discrepancy between the doc and the SDK headers. The SDK headers defines it explicitly as 0x03ffffff, you can see this in the original file at C:\Program Files (x86)\Windows Kits\10\Include\10.0.26100.0\um\minidumpapiset.h
My money would be on the docs being wrong, and as we don't use this value directly it should be fine for our purpose. Does that seem reasonable?
And just out of curiosity, how did you notice this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Include\10.0.26100.0\
I only have two older versions and MiniDumpValidTypeFlags is the last value for both versions.
Does that seem reasonable?
Yeah, seems reasonable to me.
And just out of curiosity, how did you notice this?
I was wondering why we're defining all these structs / enums, so I happened to check a couple or maybe just this one to see where they were coming from.
9cbf93d to
a241b80
Compare
|
Rebased on main.
|
| case ThreadCallback: | ||
| retVal = TRUE; | ||
| break; | ||
| case ThreadExCallback: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can merge ThreadCallback, ThreadCallback, ThreadExCallback cases.
| switch (input.CallbackType) | ||
| { | ||
| case IncludeModuleCallback: | ||
| retVal = TRUE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
retVal already starts as TRUE so this is not necessary.
| // This is where the memory regions and things are being filtered | ||
| BOOL MiniDumper::CallbackInternal(const MINIDUMP_CALLBACK_INPUT& input, MINIDUMP_CALLBACK_OUTPUT& output) | ||
| { | ||
| BOOL retVal = TRUE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does the retVal signal to the caller? Perhaps give it a name that describes it, because it is not obvious what return TRUE and FALSE will do now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one is a bit of a mess - it's returned in the minidump callback and its meaning varies between callback types.
The documentation for the callback function says If the function succeeds, return TRUE; otherwise, return FALSE., but the individual callback types sometimes uses it to indicate if something should be included or not, if an enumeration should continue, etc. See doc for reference.
I tried coming up with a fitting name earlier, but didn't found something that seemed right. Suggestions welcome :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm that sucks. Maybe ask Chat for name idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reading through the docs, it seems like we can always return TRUE and still convey the same exclusions and conditions via the output parameter.
Ended up renaming it to success, which is always TRUE.
Wasn't sure if declaring it as const BOOL success = TRUE; was too bold ;)
| DynamicMemoryAllocator* allocator = TheDynamicMemoryAllocator; | ||
|
|
||
| //m_dumpObjectsSubState is used to track the index of the allocator we are currently traversing | ||
| for (int i = 0; i < m_dumpObjectsSubState; ++i) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this loop expensive? Does m_dumpObjectsSubState become large? If yes, could cache allocator maybe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cached it in a new class member and repurposed the m_dumpObjectsSubState variable to track the index of the raw block in the current DMA.
|
Refactored a bit, introducing an enum for the memory range phases and modified the output variable of the callback routine so that our callback can always return TRUE. |
xezon
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me it looks like we can get rid of the concept of m_dumpObjectsSubState by caching the actual pointers to memory structures and keep iterating until they reach NULL and then go to the next until all lists reached the end.
Performance wise it is probably not the biggest of deals but it should make the logic cleaner.
| { | ||
| // DumpMemoryObjects will set outputMemorySize to 0 once it's completed, signalling the end of memory callbacks | ||
| DumpMemoryObjects(output.MemoryBase, output.MemorySize); | ||
| } while ((output.MemoryBase == 0 || output.MemorySize == 0) && m_dumpObjectsState != COMPLETED); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are all these parts of the condition required and robust?
So for example it is legal to allocate a 0 size array and get a pointer for it, in which case one could technically track a pointer with zero size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's valid programming wise, but not as a callback-value. Returning a region with size 0 means "done with memory regions", so no more MemoryCallbacks will be sent after that.
That's why, when transitioning between the phases in DumpMemoryObjects we need to call it again if output.MemorySize was 0 so we don't stop getting MemoryCallbacks. The check for COMPLETED is so we know when we should actually return the 0 sized region.
Checking MemoryBase is not needed, I'll remove that.
| //m_dumpObjectsSubState contains the index in TheMemoryPoolFactory of the MemoryPool that is being processed | ||
| if (m_dumpObjectsSubState < poolCount) | ||
| { | ||
| MemoryPool* pool = TheMemoryPoolFactory->getMemoryPoolN(m_dumpObjectsSubState); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getMemoryPoolN also has a loop inside.
Can we perhaps cache the MemoryPool* pointer and just keep going to next one until it is NULL, without using getMemoryPoolCount, getMemoryPoolN?
| if (m_dumpObjectsSubState == 0) | ||
| { | ||
| m_rangeIter = TheMemoryPoolFactory->cbegin(); | ||
| ++m_dumpObjectsSubState; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps can get rid of m_dumpObjectsSubState by simply initializing m_rangeIter = TheMemoryPoolFactory->cbegin() before this callback?
| if (m_dumpObjectsSubState < rawBlocksInDma) | ||
| { | ||
| // Dump this block | ||
| m_currentAllocator->fillAllocationRangeForRawBlockN(m_dumpObjectsSubState, rawBlockRange); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fillAllocationRangeForRawBlockN also has a loop inside.
Can we perhaps cache the MemoryPoolSingleBlock* pointer and just keep going to next one until it is NULL, without using getRawBlockCount, fillAllocationRangeForRawBlockN?
|
Got rid of the |
xezon
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks much cleaner indeed.
|
|
||
| enum DumpObjectsState CPP_11(: Int) | ||
| { | ||
| BEGIN, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest use DumpObjectsState_Begin enum name style to avoid potential conflicts with MACRO_CASE.
| // DumpMemoryObjects will set outputMemorySize to 0 once it's completed, signalling the end of memory callbacks | ||
| DumpMemoryObjects(output.MemoryBase, output.MemorySize); | ||
| } while ((output.MemoryBase == 0 || output.MemorySize == 0) && m_dumpObjectsState != COMPLETED); | ||
| } while (output.MemorySize == 0 && m_dumpObjectsState != COMPLETED); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be != 0 then?
| #ifndef DISABLE_GAMEMEMORY | ||
| do | ||
| { | ||
| // DumpMemoryObjects will set outputMemorySize to 0 once it's completed, signalling the end of memory callbacks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"memory size"
| if (TheMemoryPoolFactory) | ||
| { | ||
| m_dumpObjectsSubState = poolCount; | ||
| m_rangeIter = TheMemoryPoolFactory->cbegin(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe also move that to BEGIN state?
| { | ||
| m_dumpObjectsState = DMA_ALLOCATIONS; | ||
| m_currentAllocator = TheDynamicMemoryAllocator; | ||
| m_currentSingleBlock = m_currentAllocator->getFirstRawBlock(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe also move this to BEGIN state?
| { | ||
| m_dumpObjectsState = DMA_ALLOCATIONS; | ||
| m_currentAllocator = TheDynamicMemoryAllocator; | ||
| m_currentSingleBlock = m_currentAllocator->getFirstRawBlock(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test m_currentAllocator or TheDynamicMemoryAllocator before dereference incase TheDynamicMemoryAllocator was null?
| DUMP_TYPE_FULL = 'F', | ||
| }; | ||
|
|
||
| enum MiniDumperExitCode CPP_11(: Int) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this enum be private in MiniDumper class?
| DUMPER_EXIT_FORCED_TERMINATE = 0x158B1154, | ||
| }; | ||
|
|
||
| enum DumpObjectsState CPP_11(: Int) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this enum be private in MiniDumper class?
|
|
||
| if (m_currentSingleBlock == NULL) | ||
| { | ||
| m_currentAllocator = m_currentAllocator->getNextDmaInList(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add
if (m_currentAllocator)
m_currentSingleBlock = m_currentAllocator->getFirstRawBlock();
| if (m_currentSingleBlock == NULL) | ||
| { | ||
| // Iterated to a new allocator, start iterating over its blocks | ||
| m_currentSingleBlock = m_currentAllocator->getFirstRawBlock(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this.
This change introduces the option to generate crash dumps, aka. mini dumps, on fatal errors.
The main minidump functionality is done by explicitly loading the
dbghelp.dllfrom the system directory, as the dbghelp.dll that is bundled with the game is an older version that does not include this functionality. There is an option to create small dumps or extended info dumps, currently both are created.Small dumps
These mostly contain stacks for the process threads and some stack variables, or to create dumps with extended info. The use case for these is to quickly determine where a crash occured, the type of crash, if it was already fixed etc. In addition, if the memory allocation structures are corrupted enough, an extended info dump might not succeed while the small dump should. The size of these dumps are typically on the order of 250kB.
Extended info
These contain global values, along with the memory regions allocated via the memory pool factory and the dynamic memory allocator. This makes all in-game objects available to the person debugging the crash dump, so for example
dt generalszh!TheWritableGlobalDatain WinDbg will show the state at the time the dump was created.An alternative option could be to not traverse the memory structures "manually" to get to the allocations and instead just specify the
MiniDumpWithFullMemoryflag toMiniDumpWriteDump, but that increases the file size considerably.As an example, dump of the generalszh process in the main menu with the shell map in the background yields a ~140MB dump when traversing and ~420MB with
MiniDumpWithFullMemory. Beyond that, the ~140MB file compresses to ~20MB with 7Z, so should be relatively easily transferable.Storage Location
Crash dumps are stored in a new folder called 'CrashDumps' under the userDir ("Documents\Command and Conquer Generals Zero Hour Data"), and on startup it will create this directory if it doesn't exist and delete any older dumps so only the 10 newest small and 2 newest extended info dumps are left. This is to preserve disk space, as the extended info files can be several hundred MB.
Integration points
For VS2022 builds, unhandled exceptions end up in the
UnhandledExceptionFilterin WinMain, which then get a reference to the actual exception that occurred and includes that in the dump.For VC6 builds, unhandled exceptions are caught in the
catch(...)blocks ofGameEngine::executewhich then calls RELEASE_CRASH. As there is no exception data available in this case to populate _EXCEPTION_POINTERS from, an intentional exception is triggered to get the trace of the current thread. This makes the stack traces for VC6 a bit more cryptic than VS2022 builds as the C++ exception handling gets included in the trace.Limitations
In the longer run we'll probably want to replace this code with a more mature solution, like CrashPad, but that currently depends on a newer compiler than VC6.
As the code is intended to be temporary, it's kept behind a new CMake feature so it can be easily removed. There are also some other decisions made with this in mind: