Skip to content

Conversation

Tockra
Copy link
Contributor

@Tockra Tockra commented Oct 24, 2019

So time to discuss. I added code, which spares 8 byte for each hash function.
The self.ranks doesn't need his len() information because it is initialized with self.ranks==self.bitvecs.len() and you just need to safe it on one place. Its only needed while deallocate the space (Drop).

I measured the Space (0 to 2047 keys) and found no memory leak. I think there shouldn't be one if nobody hurts following invariants:

  • the computed ranks array slice by compute_ranks is not allowed to have a len() != self.bitvecs.len() .
  • the passed usize I in get_rank isn't allowed to overflow a isize. Otherwise the program panics

@pmarks
Copy link
Contributor

pmarks commented Oct 24, 2019

@Tockra Hmm, I'd really like to avoid adding unsafe code. It's just really hard to be sure you've done it right & it will continue to be correct as you evolve the code.

I'm still not sure I understand your use case. Can you tell me roughly how many instance of Mphf you need to create, and what type and how many entries you are loading into each one?

In all our application we had only 1 or 2 Mphf instances, which contain a handful of BitVec and rank vectors, so these change save ~100s of bytes total.

@Tockra
Copy link
Contributor Author

Tockra commented Oct 25, 2019

I'm building a immutable/static variant of https://pdfs.semanticscholar.org/cd2f/fe40c25eaedadbd687ce679c3c988dbac142.pdf .
For huge Inputs (~2^32 u40 values) this structure has ~27000000 hash functions with u16 keys.
At the moment I'm comparing mphf hash functions, with a static lookup table, brown hashing and with simple binary search.
Mphf has the best memory usage but it's also the slowest variant...

B2T my code above is unsafe but it should be also be correct. The self.ranks arrays stores the length but it doesn't use it at any place... So it doesn't need to store the length...

@galo2099
Copy link
Contributor

Wouldn't you get a much bigger improvement by also removing the size from all the interior boxes?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants