
Potential memory optimization for IntMap and IntSet #991

@meooow25


This isn't particularly novel, so maybe it has been proposed before. Let me know if it has. I only found #340 in my search, which is a bigger idea.

Current definition

Today, Bin for IntMap and IntSet is represented as

```haskell
data IntMap a = Bin {-# UNPACK #-} !Prefix
                    {-# UNPACK #-} !Mask
                    !(IntMap a)
                    !(IntMap a)
              | Tip {-# UNPACK #-} !Key a
              | Nil
-- Fields:
-- prefix: The most significant bits shared by all keys in this Bin.
-- mask: The switching bit to determine if a key should follow the left
--       or right subtree of a 'Bin'.
```

Potential new definition

The prefix and the mask can be merged so that we save one word per Bin.

```haskell
data IntMap a = Bin {-# UNPACK #-} !Int -- (current Prefix + current Mask)
                    !(IntMap a)
                    !(IntMap a)
```
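To put "one word per Bin" in perspective, here is a rough word count for the structure of a non-empty IntMap. It is a sketch under assumptions: a 64-bit, non-profiling GHC build where every constructor has a one-word header and each unpacked Int or pointer field takes one word. The names are hypothetical. It relies on the fact that a non-empty IntMap with n keys has exactly n Tips and n - 1 Bins, since every Bin has two non-Nil children.

```haskell
-- Rough heap-size model (assumption: 64-bit GHC, no profiling header,
-- one word per unpacked Int or pointer field). Names are illustrative only.

binWordsCurrent, binWordsMerged, tipWords :: Int
binWordsCurrent = 1 + 2 + 2 -- header + Prefix + Mask + two child pointers
binWordsMerged  = 1 + 1 + 2 -- header + merged prefix/mask Int + two child pointers
tipWords        = 1 + 1 + 1 -- header + unpacked Key + pointer to the value

-- A non-empty IntMap with n keys has n Tips and n - 1 Bins, so its
-- structure (not counting the values themselves) occupies this many words.
structureWords :: Int -> Int -> Int
structureWords binWords n = binWords * (n - 1) + tipWords * n
```

For 1000 keys this gives 7995 words with the current layout and 6996 with the merged field: 999 words saved, about 12.5% of the map's own structure.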

The prefix always has the mask bit and every bit below it set to zero, so this isn't throwing away any information. The lowest set bit of the new Int is the current mask, and the rest of it is the current prefix.
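A minimal sketch of packing and unpacking the merged field, with hypothetical helper names. It assumes, as described above, that the prefix has the mask bit and all lower bits cleared and that the mask has exactly one bit set. It uses the standard tricks x .&. negate x (isolate the lowest set bit) and x .&. (x - 1) (clear the lowest set bit).

```haskell
import Data.Bits ((.&.), (.|.))

-- Hypothetical helpers for the merged representation (illustrative names).

-- Combine a current-style prefix and mask into one Int. They share no set
-- bits, so (.|.) gives the same result as (+).
packPM :: Int -> Int -> Int
packPM p m = p .|. m

-- The mask is the lowest set bit of the merged field.
maskOf :: Int -> Int
maskOf pm = pm .&. negate pm

-- The prefix is the merged field with its lowest set bit cleared.
prefixOf :: Int -> Int
prefixOf pm = pm .&. (pm - 1)
```

For a Bin holding keys 9 (0b1001) and 13 (0b1101), the prefix is 8 and the mask is 4, so the merged field is 12; maskOf 12 is 4 and prefixOf 12 is 8. A Bin whose mask is the sign bit (minBound) and whose prefix is 0 also round-trips, because Int arithmetic wraps in GHC.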

Current branching on Bin

Branching on Bin is currently done like this:

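```haskell
-- Bin case only; the Tip and Nil cases are omitted.
insert :: Key -> a -> IntMap a -> IntMap a
insert !k x t@(Bin p m l r)
  | nomatch k p m = link k (Tip k x) p t
  | zero k m      = Bin p m (insert k x l) r
  | otherwise     = Bin p m l (insert k x r)

-- Does the key fall outside this Bin's prefix?
nomatch :: Key -> Prefix -> Mask -> Bool
nomatch i p m
  = (mask i m) /= p

-- Is the key's switching bit zero (i.e. go left)?
zero :: Key -> Mask -> Bool
zero i m
  = (natFromInt i) .&. (natFromInt m) == 0
```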
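For reference, here is a self-contained version of the current helpers that can be loaded into GHCi. The bodies of nomatch and zero follow the code above; the definitions of mask (keep only the key's bits strictly above the switching bit) and of natFromInt (conversion to Word) are written out here as assumptions about how the library behaves, not copied from it.

```haskell
import Data.Bits ((.&.), complement, xor)

-- Local stand-ins for the library's types (assumption: all are Int, Nat is Word).
type Key    = Int
type Prefix = Int
type Mask   = Int
type Nat    = Word

natFromInt :: Int -> Nat
natFromInt = fromIntegral

-- Clear the switching bit m and every bit below it.
mask :: Key -> Mask -> Prefix
mask i m = i .&. (complement (m - 1) `xor` m)

-- True if the key does not share the Bin's prefix.
nomatch :: Key -> Prefix -> Mask -> Bool
nomatch i p m = mask i m /= p

-- True if the key's switching bit is 0, i.e. it belongs in the left subtree.
zero :: Key -> Mask -> Bool
zero i m = natFromInt i .&. natFromInt m == 0
```

A Bin with prefix 8 and mask 4 covers keys 8 to 15: key 9 matches and goes left (zero 9 4 is True), key 13 goes right, and key 5 fails the match (nomatch 5 8 4 is True).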

New branching on Bin

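```haskell
-- Bin case only; the Tip and Nil cases are omitted.
insert :: Key -> a -> IntMap a -> IntMap a
insert !k x t@(Bin pm l r)
  | nomatch k pm = link k (Tip k x) p t
  | left k pm    = Bin pm (insert k x l) r
  | otherwise    = Bin pm l (insert k x r)
  where
    p = pm .&. (pm - 1) -- current-style prefix: pm with its lowest set bit cleared

-- Does the key fall outside the prefix (the bits of pm above its lowest set bit)?
-- Assuming mask keeps the key's bits above the lowest set bit of its second
-- argument (true of the current bit-twiddling definition), mask k pm == mask k m.
nomatch :: Int -> Int -> Bool
nomatch k pm = mask k pm /= pm .&. (pm - 1)

-- Left-subtree keys have the mask bit clear, so, compared as unsigned words,
-- they are exactly the matching keys below pm.
left :: Int -> Int -> Bool
left k pm = natFromInt k < natFromInt pm
```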
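One way to gain confidence in the new helpers is to check them exhaustively against the current ones over small prefixes, masks and keys, plus the Bin that splits on the sign bit. This is a sketch with hypothetical names. nomatchNew reuses the current mask formula on pm, which is valid because complement (pm - 1) `xor` pm keeps exactly the bits above pm's lowest set bit, the same bits that complement (m - 1) `xor` m keeps. The left/right choice is only compared where the key matches, since that is the only case in which insert consults it.

```haskell
import Data.Bits ((.&.), (.|.), complement, xor)

-- Current helpers (mask assumed to clear the switching bit and all lower bits).
maskOld :: Int -> Int -> Int
maskOld i m = i .&. (complement (m - 1) `xor` m)

nomatchOld :: Int -> Int -> Int -> Bool
nomatchOld k p m = maskOld k m /= p

zeroOld :: Int -> Int -> Bool
zeroOld k m = (fromIntegral k :: Word) .&. fromIntegral m == 0

-- Proposed helpers on the merged field pm = prefix .|. mask.
nomatchNew :: Int -> Int -> Bool
nomatchNew k pm = maskOld k pm /= pm .&. (pm - 1)

leftNew :: Int -> Int -> Bool
leftNew k pm = (fromIntegral k :: Word) < fromIntegral pm

-- Every single-bit mask below 64 with every prefix (up to 63) that only uses
-- bits above it, plus the Bin with prefix 0 that splits on the sign bit.
pairs :: [(Int, Int)]
pairs = (0, minBound)
  : [ (p, m) | b <- [0 .. 5 :: Int], let m = 2 ^ b, p <- [0, 2 * m .. 63] ]

-- True if old and new agree on the match test everywhere, and on the
-- left/right choice wherever the key matches.
agrees :: Bool
agrees = and
  [ nomatchNew k pm == nomatchOld k p m
      && (nomatchOld k p m || leftNew k pm == zeroOld k m)
  | (p, m) <- pairs
  , let pm = p .|. m
  , k <- [-64 .. 64]
  ]
```

In GHCi, agrees evaluates to True. For example, with pm = 12 (prefix 8, mask 4), leftNew 9 12 is True and leftNew 13 12 is False, matching zeroOld 9 4 and zeroOld 13 4.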

Performance impact

I don't know yet; I need to make the change and benchmark it. It will involve changing every function, so it will take a while.
Memory is certainly saved. nomatch gets a little more expensive (it has to clear the mask bit to recover the prefix), but left is cheaper than zero, so I'm hoping the overall effect is neutral or positive.

What do you think? Is it worth trying out to see how it fares? And are there any bad consequences of this representation that I didn't think of?
