perf(decode::tree): preallocate Vec based on worst-case length #2462
datdenkikniet wants to merge 1 commit into GitoxideLabs:main
Conversation
We can make a (terrible) guess at how many elements our tree contains. This lets us allocate at least as many entries as we will need, allowing the function to never re-allocate.
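To see why such a guess is safe (a minimal sketch for illustration — max_possible_entries is a hypothetical helper, not the PR's code): every raw tree entry consists of the mode, a space, the name, a NUL, and a 20-byte SHA-1 hash, so the space, NUL, and hash alone occupy 22 bytes, and mode and name only add to that. Dividing the buffer length by 22 therefore never undercounts the entries.

```rust
/// Upper bound on how many entries a raw tree buffer can contain.
/// (Hypothetical helper for illustration, not the PR's function.)
fn max_possible_entries(raw_tree: &[u8]) -> usize {
    const HASH_LEN: usize = 20; // SHA-1; a SHA-256 repo would use 32
    // space + trailing NUL + hash = 22 bytes minimum; mode and name only add.
    raw_tree.len() / (2 + HASH_LEN)
}

fn main() {
    // "100644 a\0" (9 bytes) + 20 hash bytes = 29 bytes per real entry.
    let entry = [b"100644 a\0".as_slice(), &[0u8; 20]].concat();
    let buf = entry.repeat(4); // a buffer holding exactly 4 entries
    assert!(max_possible_entries(&buf) >= 4); // 116 / 22 = 5, never undercounts
}
```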
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 888feb71c7
    const HASH_LEN_FIXME: usize = 20;
    let lower_bound_single_entry = 2 + HASH_LEN_FIXME; // 2 = space + trailing zero
    let upper_bound_entries = i.len() / lower_bound_single_entry;
    let mut out = Vec::with_capacity(upper_bound_entries);
Avoid preallocating from unvalidated tree byte length
tree() now calls Vec::with_capacity(i.len() / 22) before validating a single entry, so malformed or attacker-controlled tree payloads can force a large allocation and OOM even though parsing immediately returns an error. Previously allocations only grew with successfully decoded entries, so this commit introduces a memory-amplification path for invalid inputs (e.g., corrupted objects received from untrusted repos).
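One common way to address this kind of concern (a sketch of a general mitigation, not something the PR or gitoxide implements — the helper and cap value are illustrative assumptions): clamp the speculative bound to a fixed ceiling, so untrusted input can only force a bounded up-front allocation while well-formed trees below the ceiling still avoid reallocation entirely.

```rust
/// Cap the speculative preallocation so a corrupt or hostile buffer can only
/// force a bounded up-front allocation.
/// NOTE: this helper and the 64 Ki cap are illustrative assumptions, not code
/// from the PR or from gitoxide.
const MAX_SPECULATIVE_ENTRIES: usize = 64 * 1024;

fn preallocate<T>(upper_bound_entries: usize) -> Vec<T> {
    Vec::with_capacity(upper_bound_entries.min(MAX_SPECULATIVE_ENTRIES))
}

fn main() {
    // A malformed 1 GiB buffer would claim ~48.8 million entries; the cap
    // keeps the actual reservation at 64 Ki entries.
    let v: Vec<u8> = preallocate((1usize << 30) / 22);
    assert!(v.capacity() >= MAX_SPECULATIVE_ENTRIES);
}
```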
A micro-optimization that is likely only really beneficial for the specific benchmark that it affects (TreeIter). Growing a Vec takes amortized constant time per push, but because the cost of decoding each entry is so low (on the order of nanoseconds), the reallocations account for a meaningful share of the total work, and avoiding them outright provides a pretty serious speedup (see the illustrative comparison below).
I have no clue what the project's stance on over-allocation is: if it should be avoided, and/or if this specific optimization is unnecessary, let's close this PR (but remember that it exists).
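For intuition on the amortized-cost argument (an illustrative micro-comparison, not the PR's benchmark):

```rust
use std::time::Instant;

fn main() {
    let n = 10_000_000;

    // Growing organically: pushes are amortized O(1), but every capacity
    // doubling reallocates and copies the whole buffer.
    let t = Instant::now();
    let mut grown = Vec::new();
    for i in 0..n {
        grown.push(i);
    }
    let grow_time = t.elapsed();

    // Preallocated: no reallocation ever happens; pushes are pure writes.
    let t = Instant::now();
    let mut reserved = Vec::with_capacity(n);
    for i in 0..n {
        reserved.push(i);
    }
    let reserve_time = t.elapsed();

    println!("grow: {grow_time:?}, preallocate: {reserve_time:?}");
    assert_eq!(grown, reserved);
}
```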
Benchmark (cargo bench --bench decode-objects -- TreeRef, diff against main):