Summary
I want to make a lot of changes, including some that would break the API, so I first want to check if there are any concerns.
Proposed API changes:
- Merge the `push` and `insert` builder methods by introducing a new `Label` trait.
- Replace all arguments that use `impl AsRef<[Label]>` with the `Label` trait.
- Add a new `children` method to `Trie` and `IncSearch` that iterates over the children at a prefix.
Proposed QOL changes:
- Update Rust edition to 2024
- Rename instances of `Label` to `Token` (so that `Label` refers to a string of tokens and `Token` refers to a single unit in the trie).
- Rename methods for consistency (e.g. `exact_match` to `is_exact`, to mirror `is_prefix`).
- Restructure the crate to be consistent with standard practices (e.g. rename module files to `<mod>/mod.rs`; rename `internal_data_structure` to just `internal`).
These changes will also fix concerns with #36 and #37.
Motivation
I'm writing a profanity filter library and I need a trie that will allow me to search over chars.
Whether the trie should store `char` or `u8` (converting `char` to `&[u8]` when searching) is still undecided, but neither option is ergonomic with the current API: a trie of `char` requires preallocating space for a `Box<[char]>`, and there is no easily accessible way to convert a `char` to `&[u8]`.
I also need to know which children are available at some prefix.
Explanation
QOL changes
The general QOL changes (renaming and restructuring) should make this library slightly more accessible to end-users and future maintainers.
Label trait
The Label trait would allow for generalizing over tokens and strings.
The trait signature would look something like:
```rust
trait Label<Token> {
    fn into_tokens(self) -> impl Iterator<Item = Token>;
}
```

And `Label<u8>`, `Label<char>`, and `Label<T>` would be implemented for `&str`, `&[T]`, `char`, etc. (where appropriate).
This means that when inserting into a `Trie<u8>`, you could pass a `&str`, `char`, or `&[u8]` without converting it manually.
Likewise, when searching a `Trie<u8>`, you could call `query_until` with a `&str`, `char`, or `&[u8]`.
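To make the idea concrete, here is a rough sketch of how the trait and a few of its impls might look. The exact set of impls, the `Clone` bound on the slice impl, and the `tokens_of` helper are my assumptions for illustration only; none of this is existing crate API.

```rust
/// Sketch of the proposed trait: anything that can be consumed into a
/// sequence of tokens.
trait Label<Token> {
    fn into_tokens(self) -> impl Iterator<Item = Token>;
}

/// A string viewed as its UTF-8 bytes.
impl<'a> Label<u8> for &'a str {
    fn into_tokens(self) -> impl Iterator<Item = u8> {
        self.bytes()
    }
}

/// A string viewed as its Unicode scalar values.
impl<'a> Label<char> for &'a str {
    fn into_tokens(self) -> impl Iterator<Item = char> {
        self.chars()
    }
}

/// A single char viewed as its UTF-8 bytes, without heap allocation.
impl Label<u8> for char {
    fn into_tokens(self) -> impl Iterator<Item = u8> {
        let mut buf = [0u8; 4];
        let len = self.encode_utf8(&mut buf).len();
        // Only the first `len` bytes of the buffer are part of the encoding.
        IntoIterator::into_iter(buf).take(len)
    }
}

/// A single char as a one-token label.
impl Label<char> for char {
    fn into_tokens(self) -> impl Iterator<Item = char> {
        std::iter::once(self)
    }
}

/// A slice of tokens (assumption: tokens are cloned out of the slice).
impl<'a, T: Clone> Label<T> for &'a [T] {
    fn into_tokens(self) -> impl Iterator<Item = T> {
        self.iter().cloned()
    }
}

/// Hypothetical helper standing in for a builder or query method that
/// accepts `impl Label<Token>`.
fn tokens_of<Token, L: Label<Token>>(label: L) -> Vec<Token> {
    label.into_tokens().collect()
}
```

With these impls, `tokens_of::<u8, _>("ab")`, `tokens_of::<u8, _>('a')`, and `tokens_of::<u8, _>(&b"ab"[..])` all go through the same entry point, which is the ergonomics the merged `push`/`insert` method is aiming for.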
children methods
`Trie<Token, Value>` would have a `fn children(&self, label: impl Label<Token>) -> Option<impl Iterator<Item = (&Token, Option<&Value>)>>` method that returns the children tokens for a prefix.
`IncSearch<'_, Token, Value>` would have a `fn children(&self) -> Option<impl Iterator<Item = (&Token, Option<&Value>)>>` method that returns the children tokens for the current prefix.
Both would be implemented by calling `children_node_nums` to get the positions of the child nodes and mapping those positions to their tokens.
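As an illustration of the intended behaviour only, here is a sketch on a toy map-based trie. `ToyTrie` and `Node` are hypothetical and do not reflect the crate's internal layout, and `impl IntoIterator<Item = Token>` stands in for `impl Label<Token>` so the example is self-contained. I am also assuming that `None` means the prefix is absent, while a prefix with no children yields an empty iterator.

```rust
use std::collections::BTreeMap;

/// Toy trie node used only to illustrate the `children` signature.
struct Node<Token, Value> {
    value: Option<Value>,
    children: BTreeMap<Token, Node<Token, Value>>,
}

impl<Token: Ord, Value> Node<Token, Value> {
    fn new() -> Self {
        Node { value: None, children: BTreeMap::new() }
    }
}

/// Hypothetical stand-in for `Trie<Token, Value>`.
struct ToyTrie<Token, Value> {
    root: Node<Token, Value>,
}

impl<Token: Ord, Value> ToyTrie<Token, Value> {
    fn new() -> Self {
        ToyTrie { root: Node::new() }
    }

    /// Walk (and create) one node per token, then store the value.
    fn insert(&mut self, label: impl IntoIterator<Item = Token>, value: Value) {
        let mut node = &mut self.root;
        for token in label {
            node = node.children.entry(token).or_insert_with(Node::new);
        }
        node.value = Some(value);
    }

    /// Children of the node reached by `label`, each paired with the value
    /// stored at that child (if any). Returns `None` if the prefix is absent.
    fn children(
        &self,
        label: impl IntoIterator<Item = Token>,
    ) -> Option<impl Iterator<Item = (&Token, Option<&Value>)>> {
        let mut node = &self.root;
        for token in label {
            node = node.children.get(&token)?;
        }
        Some(
            node.children
                .iter()
                .map(|(token, child)| (token, child.value.as_ref())),
        )
    }
}
```

For a profanity filter, this is what lets a caller ask "which tokens can follow the text matched so far?" and branch on the answer, instead of probing every candidate token individually.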