Perf and Memory optimization for Graph execution#16295
Perf and Memory optimization for Graph execution#16295saintentropy wants to merge 16 commits intoDynamoDS:masterfrom
Conversation
…araminfo in CLR Function Pointer
…kups. Pre allocate marshaller and interpreter
There was a problem hiding this comment.
Pull Request Overview
This PR introduces targeted performance and memory optimizations in the graph execution engine, focusing on hot paths and reducing allocations.
- Removed LINQ in critical loops and replaced with manual iterations or spans
- Added methods to cache and reuse interpreters and to set symbol values while retrieving previous values
- Refactored heap and DSObject allocation to consolidate metadata handling
Reviewed Changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| RuntimeMemory.cs | New SetSymbolValueAndGetPreviousValue method with incomplete XML docs |
| MirrorData.cs | Switched from static to instance methods; new ctor overloads |
| JILFunctionEndPoint.cs | Cached Interpreter instance instead of per-call instantiation |
| CallSite.cs | Changed replication-instruction lists to ReadOnlySpan |
| CLRObjectMarshaler.cs | Removed LINQ from type-check; simplified StackValue handling |
| CLRFFIFunctionPointer.cs | Cached parameter arrays and marshaler instances |
| Heap.cs | Introduced AllocatePointerInternal for metadata-aware allocation |
| Executive.cs | Added Reset method; symbol-node lookup caching |
| DSObject.cs | New DSObject ctor accepting metadata |
| AssociativeGraph.cs | Replaced LINQ with indexed loops; unsafe index use in IsSSANode |
Comments suppressed due to low confidence (5)
src/Engine/ProtoCore/RuntimeMemory.cs:189
- The new XML documentation for
SetSymbolValueAndGetPreviousValueis empty. Please provide meaningful<summary>,<param>and<returns>descriptions.
/// <summary>
src/Engine/ProtoCore/Reflection/MirrorData.cs:191
- Changing
GetDatafromstaticto instance breaks its previous static contract. Ensure callers are updated and document this API change.
internal object GetData(StackValue sv, RuntimeCore runtimeCore)
src/Engine/ProtoCore/DSASM/Executive.cs:2324
- [nitpick] The field
cachesymbolIndexhas inconsistent casing. Rename tocacheSymbolIndexto matchcacheBlockIDandcacheClassIndex.
private int cachesymbolIndex = -1;
src/Engine/ProtoCore/Reflection/MirrorData.cs:71
- The new constructor overload lacks XML doc comments. Please document its parameters (
i,m) and intended usage.
/// <summary>
src/Engine/ProtoCore/Lang/JILFunctionEndPoint.cs:40
- Reusing a single
Interpreterinstance (mInterpreter) across calls may introduce thread-safety issues. Consider making interpreter usage thread-local or synchronizing access.
private void Init(RuntimeCore runtimeCore)
| var firstNode = updateNodeRefList[0].nodeList[0]; | ||
| return firstNode != null && firstNode.nodeType == UpdateNodeType.Symbol && firstNode.symbol.isSSATemp; |
There was a problem hiding this comment.
Accessing updateNodeRefList[0].nodeList[0] without checking for an empty list can throw IndexOutOfRangeException. Validate both lists before indexing or revert to a safe lookup.
| var firstNode = updateNodeRefList[0].nodeList[0]; | |
| return firstNode != null && firstNode.nodeType == UpdateNodeType.Symbol && firstNode.symbol.isSSATemp; | |
| if (updateNodeRefList.Count > 0 && updateNodeRefList[0].nodeList.Count > 0) | |
| { | |
| var firstNode = updateNodeRefList[0].nodeList[0]; | |
| return firstNode != null && firstNode.nodeType == UpdateNodeType.Symbol && firstNode.symbol.isSSATemp; | |
| } | |
| return false; |
(cherry picked from commit 9f5e99050e6a5dfd732a0e641cc4e987a2d50c32)
(cherry picked from commit 1a42ed02775abe7823a3995a05080aaf184baef9)
| { | ||
| if (blockId == cacheBlockID && classIndex == cacheClassIndex && symbolIndex == cachesymbolIndex && cachedSymbol != null) | ||
| { | ||
| return cachedSymbol; |
There was a problem hiding this comment.
How much is caching really going to help here? After all you're simply avoiding array/list indexing (exe.runtimeSymbols[blockId].symbolList[symbolIndex]), which I believe should just be O(1) operations, or am I missing something?
Purpose
Some targeted improvements for graph execution. Primarily for delivery in performance of graphs large data sets. In my test graph reduced average graph execution runtime ~ 15% and tessellation by ~5%
Includes algorithm enhancements:
Includes allocation enhancements
Includes these caching enhancements
Declarations
Check these if you believe they are true
*.resxfilesRelease Notes
(FILL ME IN) Brief description of the fix / enhancement. Use N/A to indicate that the changes in this pull request do not apply to Release Notes. Mandatory section
Reviewers
@QilongTang Not sure if these can get reviewed
FYIs
@aparajit-pratap @achintyabhat