Skip to content

Conversation

@cician
Copy link
Contributor

@cician cician commented Oct 12, 2021

This PR adds a tool that is intended to help solve issues in code with manual memory management.

I've added a new module called runtimeDebug, but I'm open to suggestions if another name or placing it in an existing one is more appropriate. For example there's a debug module, but I don't know what its purpose is.

How it works

The changes to the gc module has added support to setup hooks called on each incRef, decRef, malloc and free. Otherwise the changes here are limited to expose these hooks and remove them.

The runtimeDebug exposes two functions:

  1. ensureNoLeaks, intended for leak checks in unit testing. It only prints anything if leaks are actually found.
  2. leakDebugBlock, intended to be used when debugging gets tricky and more details are needed. It can print every incRef, decRef, malloc and free with addresses, tags and even object contents as strings (when safe-ish to do so). This part is inspired by gc.setDebug functionally that is currently commented out.

When the hooks are connected, internally there's a bit of a delicate dance on hot ash, since runtimeDebug is using managed memory and even GC, while GC itself calls into incRef/decRef/malloc/free. To accomplish this, the hooks are inhibited temporarily.

A few details

I've used raw Wasmi32 pointers for function hooks. Maybe I should use something else? box? Or actual function type?

While the hook functions use WasmI32 pointers for tracked objects, I convert them internally to Number for use in GC land. May have used Int32 instead, but I chose Number hoping to reduce allocations at least for programs with small enough heap.

Limitations and drawbacks

Nesting ensureNoLeaks/leakDebugBlock is not supported, but not prevented.

Per function/code block based testing may not work for debugging some cases covered by the old gc.setDebug functionality. In particular when runtime modules themselves are starting up.

ensureNoLeaks is obviously useful for detecting memory leaks, but may not be all that helpful for other issues like use after free, overwritten object headers or infinite loops in malloc.free (all things experience by yours truly). It may help to run leakDebugBlock with printIncrefsAndDecRefs=true, printMallocAndFree=true and printObjectContents=false.

It adds a slight overhead to memory management for all programs, even if the hooks aren't hooked up. I haven't done any comparisons yet. Hopefully not much, just checking if the hook function is set in a variable. Some sort of conditional compilation would be nice in this case.

Finally, it's not able to leak check the leak checker, so I don't guarantee it doesn't leak memory.

@cician
Copy link
Contributor Author

cician commented Oct 12, 2021

Forgot to mention. I made the CI fail intentionally for now, just to see how it looks on GitHub when a test fails for memory leaks detected. 🤭

@marcusroberts
Copy link
Member

Hi @cician

Just wanted to say thanks for a great contribution and I look forward to looking through it!

- The LeakDebugSummaryMode option wasn't respected so tests were noisy and confusing.
- Added filters to exclude invocation of hooks and of the function to be tested.
- Added detection of deallocations of pre-existing objects.
- Added detailed reporting of ref. counting issues on pre-existing objects.
- Fixed decRef issues in printing object contents and added special case for printing strings.
- Commented out the failing test for Buffer (the leak in question has not been fixed yet).
Copy link
Member

@peblair peblair left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I finally had some time to give this a look. @cician this is incredible stuff! I've left some first-pass comments, and I'm curious to see what the other core team members have to say.

// buffer is not being flushed? If the debugging hooks are active then
// behave ad if ignoreZeros was true to continue executing and give the
// debugging hooks an opportunity to report the issue.
if (WasmI32.ne(_DEBUG_HOOK_DECREF, 0n)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very interesting. I wonder if we could get around this by adding a call to the flush syscall in the if (WasmI32.ne(_DEBUG_HOOK_DECREF, 0n)) expression? 🤔


let tagToTypeName = (tag: Number) => {
match (tag) {
1 => "string",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should use the constants from runtime/unsafe/tags?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't use them to avoid @disableGC. Should I convert tagToTypeName to no gc?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hadn't considered that! In that case, I think a comment is fine for now (maybe also add a comment to tags.gr so we can remember to keep this in sync?).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed this function to use @disableGC. It's not pretty, but probably still better than not using the constants.

Comment on lines 102 to 106
// totals can create confusion. The simplest solution would be to get the tag
// of mallocated objects, but they don't yet have a tag associated. So I
// have to compensate again by adding up the objects that have been
// mallocated but never freed. Which simply means the ones in "allocations"
// Map.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part loses me a little. Can you maybe elaborate a bit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree this comment is pretty confusing. I'll rewrite it.

What I mean is that to correctly present the summary of incRefs/decRefs by tag, it's not enough to just track incRefs/decRefs themselves because incRef doesn't get called for newly allocated objects. They start with ref. count of 1 set directly in GC.malloc. But it also isn't as simple as adding some code in the hook for GC.malloc because freshly allocated objects don't have a tag yet. The tag is normally set after (like in dataStructures.allocateString and similar functions). So what I actually do is track the lifetime of single objects in the allocations map. Only when the object is freed I add to the mallocPlusFreeByTag map to present along with incRefs/decRefs by tag. If an object is not freed then it's left in the allocations map. So I group these leaked objects by tag as well and present the totals in the summary as "malloc without free".

let mallocPlusFreeByTag = Map.make(): (Map.Map<Number, Number>)

@disableGC
let rec readTag = (userPtrAsNumber: Number) => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can imagine why this is the case, but it may be worth a comment explaining why these functions use Number instead of WasmI32.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a comment explaining the choice for Number.

if (Set.contains(userPtr, allocations)) {
// A new allocation, but for an object already allocated inside the debug block, but never removed.
print(
"⚠ PROBLEM DETECTED: unexpected malloc at address 0x" ++
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sort of wish we had colored output here hehe
(not really requesting this, but just a thought)

}

if (Map.contains(userPtr, incRefsOnPreExistingObjects) ||
Map.contains(userPtr, decRefsOnPreExistingObjects)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the formatter output? If so, cc @marcusroberts (I think that Map should be indented here)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is.

Copy link
Member

@ospencer ospencer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very cool! And awesome that we've already seen this in action 🙂

A couple of thoughts from me but no big blockers :)

One thing to consider— @phated when we split the runtime from the stdlib, will this be annoying to refactor? I think this makes sense to live in the runtime, but since it uses stdlibs it might be hard to break out, and I'm not sure where it really should live then. I feel like we can worry about it when we actually make that change, but figured I'd grab your thoughts.

let mut _DEBUG_HOOK_DECREF = 0n
let mut _DEBUG_HOOK_MALLOC = 0n
let mut _DEBUG_HOOK_FREE = 0n
let mut _DEBUG_HOOK_RUNNING = false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not obvious to me that the _DEBUG value above doesn't have anything to do with these values. I wonder if there's any renaming we can do to make that more clear.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe just move it to the commented code below? Currently it's never used, its for the debugging functions that are all commented out. I agree though that leak checker code can get mixed up with these.

@@ -0,0 +1,712 @@
import WasmI32 from "runtime/unsafe/wasmi32"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This filename is a little confusing vs the current runtime/unsafe/debug file. We can't combine them because runtime/debug needs to run in runtime mode—maybe we rename this file to memoryDebug.gr or something similar?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

memoryDebug.gr sounds good to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I move it outside of the runtime folder as well?

@cician
Copy link
Contributor Author

cician commented Oct 30, 2021

I checked if the leak checker leaks memory, and it does.

At least in part it's due to #1003 and #773 (these functions are recursive in order to decRef themselves). I can work around #773 by moving some utility functions to the top level, but its difficult to diagnose further until #1003 is fixed.

@cician
Copy link
Contributor Author

cician commented Dec 4, 2021

Probably not a big deal, but the leak checker currently reports leaks when converting numbers to strings due to buffers allocated and intentionally never released in the numberUtils module (_POWERS10, _DIGITS, _HEX_DIGITS etc).

It can be a bit confusing when you get a resulting like this:

⚠ 1 malloc/free leaks detected:
- 0x3e9c80 (ref. count: 1; tag 825241648 unknown): <unknown heap tag type: 0x31303030 | value: 0x3e9c80>

One solution would be to simply call a few functions preemptively to force initialize these buffers, but maybe we want to give them a proper reserved heap tag? Something like "cache" or "lookup table". Otherwise, if it used malloc directly instead of Memory.malloc it wouldn't get reported, but I don't know if we want that.

@cician
Copy link
Contributor Author

cician commented Dec 4, 2021

I checked if the leak checker leaks memory, and it does.

At least in part it's due to #1003 and #773 (these functions are recursive in order to decRef themselves). I can work around #773 by moving some utility functions to the top level, but its difficult to diagnose further until #1003 is fixed.

With #1003 fixed and a fix in the checker itself it's now mostly leak free. There are still leaks elsewhere in the stdlib that make it leak indirectly when working on Sets, Maps and Arrays to print the summary.

If you're wondering how I leak check the leak checker, I'm just running it in an infinite loop and watching if the memory continues to rise indefinitely in the task manager.

@ospencer
Copy link
Member

ospencer commented Aug 5, 2022

We'll want to land #1410 before we land this to minimize the effect on compiled modules.

@phated phated marked this pull request as draft August 5, 2022 23:47
@phated phated added the blocked label Aug 5, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants