Avoiding unnecessary memory allocations in visitors #135
Problem
There is no way for a visitor to know if a pointer value it has received is:
- Allocated.
- Safe to use as part of the return value.
Thus, visitors are forced to play it safe and always make copies, which can result in unnecessary allocations.
Proposal
To fix this, the following things need to be added to Getty:
- A way for visitors to know if the pointer value they received from a deserializer is safe to use as part of their return value.
  - There are two ways for a visitor to receive a pointer value from a deserializer: the `value` parameter in `visitString` and the return value of access methods (e.g., `nextKeySeed`, `nextElementSeed`).
- A way for deserializers to know if `visitString` is using the slice as part of the final value, and how much of that slice is being used.
Part One: The Visitor
How can visitors know if the pointer value they received from a deserializer is safe to use as part of their return value?
To solve this, we can do the following:
- Define the following type:

  ⚠️ Edit: See this comment for new `Lifetime` design. ⚠️

      pub const Lifetime = enum {
          Stack,
          Heap,
          Owned,
      };
- The type will indicate the lifetime and ownership properties of pointer values passed to visitors:
  - `Stack`: The value lives on the stack and its lifetime is shorter than the deserialization process.
    - The value must be copied by the visitor.
  - `Heap`: The value lives on the heap and its lifetime is longer than the deserialization process and is independent of any entity.
    - The value can either be copied or returned directly.
  - `Owned`: The value lives on the stack or heap and its lifetime is managed by some entity.
    - The value can either be copied by the visitor or returned directly if the visitor understands and deems the value's lifetime as safe.
    - Since Getty's default visitors won't have enough info to determine whether an `Owned` value's lifetime is safe, they must always copy such values.
- When should visitors free the pointer values they receive?
  - `Stack` or `Owned` values should never be freed by the visitor.
    - `Stack` values will be automatically cleaned up by the compiler, obviously.
    - `Owned` values will be cleaned up after deserialization is finished by the entity that owns them.
  - `Heap` values passed to `visitString` should never be freed by the visitor. This is because the value is a Getty value, so the deserializer is responsible for freeing it.
  - `Heap` values returned from an access method should be freed by the visitor upon error or if they are not part of the final value. The deserializer will never see these values again, so it's the visitor's responsibility to free them.
- Add a `lifetime` parameter to `visitString` that specifies the `Lifetime` of `input`.
- Remove the `is*Allocated` methods from access interfaces. With `Lifetime`, we don't need them anymore.
- Modify the successful return type of access methods to be:

      struct {
          data: @TypeOf(seed).Value, // This may be optional, depending on the access method.
          lifetime: Lifetime,
      }
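The copy-or-reuse rules listed above can be sketched as a small helper. This is only an illustration under assumptions: `keepOrCopy` is a hypothetical name, not part of Getty, and it follows the rule stated for Getty's default visitors (always copy `Owned` values).

```zig
const std = @import("std");

// Mirrors the `Lifetime` enum proposed above.
pub const Lifetime = enum { Stack, Heap, Owned };

// Hypothetical helper (not part of Getty): returns a slice that is safe to
// use in a visitor's return value, allocating only when the rules require it.
fn keepOrCopy(ally: std.mem.Allocator, input: []const u8, lifetime: Lifetime) ![]const u8 {
    switch (lifetime) {
        // Stack values do not outlive deserialization, and a default visitor
        // cannot judge an Owned value's lifetime, so both are copied.
        .Stack, .Owned => return try ally.dupe(u8, input),
        // Heap values outlive deserialization, so no allocation is needed.
        .Heap => return input,
    }
}
```

In the copying case the caller owns the returned slice and is responsible for freeing it with the same allocator.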
With these changes, visitors can do the following:
// in visitString...
switch (lifetime) {
    .Stack => {
        // Make a copy of `input`
    },
    .Owned, .Heap => {
        // Make a copy of `input` or return it directly
    },
}

// in visitMap...
while (try map.nextKey(ally, Key)) |key| {
    switch (key.lifetime) {
        .Stack => {
            // Make a copy of `key.data`
        },
        .Owned => {
            // Make a copy of `key.data` or return it directly
        },
        .Heap => {
            // Make a copy of `key.data` or return it directly & free it as necessary
        },
    }
}

Part Two: The Deserializer
How does a deserializer know if visitString is using the slice as part of the final value, and how much of that slice is being used?
Before diving in, there are a few things to keep in mind:
- Access methods are irrelevant for this part. Deserializers will never see the values returned from them again, so there is no need to worry about them.
- This part applies only to `Heap` values.
  - `Stack` values are obviously managed automatically by the compiler.
  - `Owned` values are managed outside the deserialization process, so functions like `deserializeString` don't need to worry about them.
- The return value of `visitString` might not be a string at all, so we shouldn't rely solely on `visitString`'s return value. Besides, even if it is a string, it would be very tedious to use it in the deserializer to figure out what to free and what to keep.
In any case, to solve this, we can do the following:
- Change the return type of `visitString` to the following:

      const Return = struct {
          value: Value,
          indices: ?struct { start: usize, end: usize } = null,
      };

      fn visitString(
          self: Self,
          ally: ?std.mem.Allocator,
          comptime Deserializer: type,
          input: anytype,
      ) Deserializer.Error!Return

  - If `indices` is `null`, then `visitString` did not use `input` as part of its return value. In that case, the deserializer should free `input` afterwards.
  - If `indices` is not `null`, then `visitString` did use `input` as part of its return value. `start` and `end` specify the starting and ending indices in `input` at which `visitString`'s return value begins and ends.
- With this new `indices` field, the deserializer now knows 1) if the visitor is using `input` directly in its return value, and 2) how much of `input` is being used.
  - If the entirety of `input` is being used, then the deserializer should not free `input` after calling `visitString`.
  - If only a partial slice of `input` is being used, then the deserializer can use `start` and `end` to determine the remaining parts of `input` that should be freed.
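
The three cases above can be sketched as a small decision function that a deserializer might run after `visitString` returns. This is a sketch under assumptions, not Getty's API: `Indices`, `Usage`, and `classify` are hypothetical names, and `end` is assumed to be exclusive, which the proposal does not specify.

```zig
const std = @import("std");

// Same shape as the `indices` field in the proposed `Return` struct.
const Indices = struct { start: usize, end: usize };

// Hypothetical names for the three outcomes described above.
const Usage = enum { Unused, Whole, Partial };

// Hypothetical helper: reports how much of `input` the visitor kept.
// Assumes `end` is exclusive, so the whole input is [0, input_len).
fn classify(input_len: usize, indices: ?Indices) Usage {
    // null: `input` is not part of the return value, so it can be freed.
    const i = indices orelse return .Unused;
    // The whole of `input` is in the return value, so it must not be freed.
    if (i.start == 0 and i.end == input_len) return .Whole;
    // Only input[start..end] is in the return value.
    return .Partial;
}
```

The sketch only classifies the result. How the unused parts are actually released in the `Partial` case is left open here, since the proposal does not spell it out.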