Skip to content

[BasicContainers] Assorted hashed container fixes and improvements#601

Open
lorentey wants to merge 8 commits intoapple:mainfrom
lorentey:hashed-container-fixes
Open

[BasicContainers] Assorted hashed container fixes and improvements#601
lorentey wants to merge 8 commits intoapple:mainfrom
lorentey:hashed-container-fixes

Conversation

@lorentey
Copy link
Member

@lorentey lorentey commented Mar 11, 2026

RigidSet/UniqueSet/RigidDictionary/UniqueDictionary currently have an implementation of insert that sometimes calculates a maximum probe count that is slightly lower than the real value, and when that happens find will call off the search for some keys right before it would have found the match -- resulting in some of the contents being inaccessible.

Implement a thorough _checkInvariants method to verify hash table layout at runtime. Use it in the existing (and some new) RigidSet unit tests to uncover corrupt results as they happen. Debug and fix their root causes.

While we're here, implement a handful of important TODO items and low-hanging performance fruit:

  • Behavioral engineering: When inserting into small sets/dictionaries, avoid simply appending the new item to the end of storage: it gives the false impression that these types would be preserving insertion order. Instead, insert the new item near the middle of existing storage, swapping the old value to the end.
  • Performance: When inserting an item whose ideal bucket is already occupied, avoid hashing the item that's already there -- it cannot possibly have a lower probe length than the new item, so it will never be swapped anyway. Start hashing with the second item on the collision chain onwards (if any).
  • Performance: While filling a hole after a removal, stop rehashing items once we find one whose ideal bucket would be after the hole -- given the Robin Hood hashing invariants, no subsequent items can possibly hash before or at the hole.
  • Performance: Update the maximum probe length after a removal by applying a trivial upper bound: the maximum probe length cannot possibly be longer than the count of items remaining. (This is still a quite loose upper bound, but it's extremely cheap to apply.)

Checklist

  • I've read the Contribution Guidelines
  • My contributions are licensed under the Swift license.
  • I've followed the coding style of the rest of the project.
  • I've added tests covering all new code paths my change adds to the project (if appropriate).
  • I've added benchmarks covering new functionality (if appropriate).
  • I've verified that my change does not break any existing tests or introduce unexplained benchmark regressions.
  • I've updated the documentation if necessary.

Additionally, add some more tests that exercise various hashing outcomes.
Due to a number of cascading off-by-one errors, the maximum probe length could end up getting set lower than the actual value, _sometimes_ rendering some items unreachable.

Weirdly, the issue seems to be rare enough that it somehow never triggered in any of my previous test runs.
We don’t need to calculate the probe length of the item currently occupying the new item’s ideal bucket — it cannot possibly be lower than the new item’s. This spares one hash operation during insertions that result in a hash collision.
Don’t just append new items to the end of storage — it misleads clients into assuming that the containers preserve insertion order, when they most certainly don’t do that.

Instead, insert new items into the middle of storage, using an extra swap.
@lorentey lorentey added this to the 1.4.1 milestone Mar 11, 2026
@lorentey
Copy link
Member Author

lorentey commented Mar 11, 2026

Cc @tfpauly, with apologies for the oversight 😅

@lorentey lorentey force-pushed the hashed-container-fixes branch from 0680c79 to 16e7715 Compare March 11, 2026 06:50
@lorentey lorentey force-pushed the hashed-container-fixes branch from 60909eb to fda65e1 Compare March 11, 2026 07:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant