Description
This isn't particularly novel, so maybe it has been proposed before. Let me know if it has. I only found #340 in my search, which is a bigger idea.
Current definition
Today, Bin for IntMap and IntSet is represented as
containers/containers/src/Data/IntMap/Internal.hs
Lines 355 to 362 in 3c13e0b
data IntMap a = Bin {-# UNPACK #-} !Prefix
                    {-# UNPACK #-} !Mask
                    !(IntMap a)
                    !(IntMap a)
-- Fields:
--   prefix: The most significant bits shared by all keys in this Bin.
--   mask: The switching bit to determine if a key should follow the left
--         or right subtree of a 'Bin'.
Potential new definition
The prefix and the mask can be merged so that we save one word per Bin.
data IntMap a = Bin {-# UNPACK #-} !Int -- (current Prefix + current Mask)
                    !(IntMap a)
                    !(IntMap a)
The mask bit is always zero in the prefix, so this isn't throwing away any information. The lowest set bit of the new int is the current mask and the rest of it is the current prefix.
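To make the packing concrete, here is a minimal sketch of merging and recovering the two fields. The names merge, maskOf and prefixOf are hypothetical helpers for illustration, not part of containers, and the sketch assumes the prefix has zeros at the mask bit and everywhere below it.

```haskell
import Data.Bits ((.&.), (.|.))

-- Hypothetical helpers for illustration; none of these exist in containers.

-- | Merge an old-style prefix and mask into a single Int.
-- Assumes the mask has exactly one bit set and the prefix is zero
-- at that bit and at every lower bit.
merge :: Int -> Int -> Int
merge p m = p .|. m

-- | Recover the mask: the lowest set bit of the merged value.
maskOf :: Int -> Int
maskOf pm = pm .&. negate pm

-- | Recover the prefix: the merged value with its lowest set bit cleared.
prefixOf :: Int -> Int
prefixOf pm = pm .&. (pm - 1)
```

For instance, with prefix 0b101000 and mask 0b100 the merged value is 0b101100, and both fields can be read back from it with the two helpers.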
Current branching on Bin
Branching on Bin is currently done like this:
containers/containers/src/Data/IntMap/Internal.hs
Lines 813 to 817 in 3c13e0b
insert :: Key -> a -> IntMap a -> IntMap a
insert !k x t@(Bin p m l r)
  | nomatch k p m = link k (Tip k x) p t
  | zero k m      = Bin p m (insert k x l) r
  | otherwise     = Bin p m l (insert k x r)
containers/containers/src/Data/IntMap/Internal.hs
Lines 3527 to 3528 in 3c13e0b
nomatch i p m
  = (mask i m) /= p
containers/containers/src/Data/IntMap/Internal.hs
Lines 3519 to 3520 in 3c13e0b
zero i m
  = (natFromInt i) .&. (natFromInt m) == 0
New branching on Bin
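insert :: Key -> a -> IntMap a -> IntMap a
insert !k x t@(Bin pm l r)
  | nomatch k pm = link k (Tip k x) pm t  -- link would need adapting to take the merged value
  | left k pm    = Bin pm (insert k x l) r
  | otherwise    = Bin pm l (insert k x r)

-- pm .&. (pm-1) clears the lowest set bit, leaving the current prefix.
nomatch :: Int -> Int -> Bool
nomatch k pm = mask k pm /= pm .&. (pm-1)

-- Only meaningful once nomatch has returned False: k then agrees with pm
-- above the mask bit, so an unsigned comparison decides the branch.
left :: Int -> Int -> Bool
left k pm = int2word k < int2word pm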
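As a sanity check, the old and new tests can be compared directly in a standalone sketch. This is not the containers code: all names below are hypothetical, Int stands in for Key/Prefix/Mask, fromIntegral to Word plays the role of int2word, and since the definition of mask is not quoted above, maskOld assumes it keeps only the key bits above the mask bit.

```haskell
import Data.Bits (xor, (.&.), (.|.))

-- Old-style checks, adapted from the snippets quoted above.
-- Assumption: mask keeps only the bits of the key above the mask bit.
maskOld :: Int -> Int -> Int
maskOld i m = i .&. (negate m `xor` m)

nomatchOld :: Int -> Int -> Int -> Bool
nomatchOld i p m = maskOld i m /= p

zeroOld :: Int -> Int -> Bool
zeroOld i m = i .&. m == 0

-- Proposed checks on the merged value pm (prefix .|. mask).
nomatchNew :: Int -> Int -> Bool
nomatchNew k pm = k .&. (negate pm `xor` pm) /= pm .&. (pm - 1)

-- Unsigned comparison; only meaningful when nomatchNew k pm is False.
leftNew :: Int -> Int -> Bool
leftNew k pm = (fromIntegral k :: Word) < fromIntegral pm

-- | True when both representations make the same decision for key k
-- at a node with old-style prefix p and mask m.
agree :: Int -> Int -> Int -> Bool
agree k p m =
  let pm = p .|. m
  in nomatchNew k pm == nomatchOld k p m
       && (nomatchOld k p m || leftNew k pm == zeroOld k m)
```

Evaluating agree over a range of keys for a few (prefix, mask) pairs, including the sign-bit mask minBound, is a cheap way to gain confidence before touching every function.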
Performance impact
I don't know yet; I need to make the change and benchmark. It will involve changing every function, so it will take a while.
Memory is certainly saved. nomatch gets a little more expensive, but left is cheaper than zero, so I'm hoping there is zero or positive overall effect.
What do you think? Is it worth checking out how this will fare? And is there any bad consequence of this representation that I didn't think of?