Label Trait + QOL changes #39

@JonahPlusPlus

Summary

I want to make a lot of changes, including some that would break the API, so I first want to check if there are any concerns.

Proposed API changes:

  • Merge push and insert builder methods together by introducing a new Label trait.
  • Replace all arguments that use impl AsRef<[Label]> with the Label trait.
  • Introduce a new children method on Trie and IncSearch that iterates over the children at a prefix.

Proposed QOL changes:

  • Update Rust edition to 2024
  • Rename instances of Label to Token (so that Label refers to a string of tokens and Token refers to a single unit in the trie)
  • Rename methods for consistency (e.g. exact_match to is_exact, to mirror is_prefix).
  • Restructure crate to be consistent with standard practices (e.g. rename module files to <mod>/mod.rs; rename internal_data_structure to just internal)

These changes would also address the concerns raised in #36 and #37.

Motivation

I'm writing a profanity filter library and I need a trie that will allow me to search over chars.
Whether the trie should store char or u8 (converting each char to &[u8] when searching) is still undecided, but neither option is ergonomic with the current API: a trie of char requires preallocating space for a Box<[char]>, and there isn't an easily accessible way to convert a char to &[u8].

I also need to know which children are available at some prefix.

Explanation

QOL changes

The general QOL changes (renaming and restructuring) should make this library slightly more accessible to end-users and future maintainers.

Label trait

The Label trait would allow for generalizing over tokens and strings.
The trait signature would look something like:

trait Label<Token> {
    fn into_tokens(self) -> impl Iterator<Item = Token>;
}

Label<u8>, Label<char>, and Label<T> would be implemented for &str, &[T], char, etc., where appropriate.

This means that when inserting into a Trie<u8>, you could use a &str, char or &[u8], without manually converting yourself.

And when searching a Trie<u8>, you could call query_until with a &str, char or &[u8].
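
To make the idea concrete, here is a rough sketch of how the trait and a few implementations might look. It is only an illustration under some assumptions: the exact set of impls, the T: Clone bound on the slice impl, and the collect_tokens helper are hypothetical and not part of the current crate.

```rust
use std::iter;

/// Sketch of the proposed trait: a label is anything that can be
/// turned into a sequence of tokens.
trait Label<Token> {
    fn into_tokens(self) -> impl Iterator<Item = Token>;
}

/// A &str as bytes, for a Trie<u8>.
impl<'a> Label<u8> for &'a str {
    fn into_tokens(self) -> impl Iterator<Item = u8> {
        self.bytes()
    }
}

/// A &str as chars, for a Trie<char>.
impl<'a> Label<char> for &'a str {
    fn into_tokens(self) -> impl Iterator<Item = char> {
        self.chars()
    }
}

/// A single char is a label of length one in a Trie<char>.
impl Label<char> for char {
    fn into_tokens(self) -> impl Iterator<Item = char> {
        iter::once(self)
    }
}

/// A single char as its UTF-8 bytes, for a Trie<u8>.
impl Label<u8> for char {
    fn into_tokens(self) -> impl Iterator<Item = u8> {
        // A char encodes to at most 4 bytes; keep only the bytes written.
        let mut buf = [0u8; 4];
        let len = self.encode_utf8(&mut buf).len();
        IntoIterator::into_iter(buf).take(len)
    }
}

/// Any slice of tokens (the Clone bound is an assumption of this sketch).
impl<'a, T: Clone> Label<T> for &'a [T] {
    fn into_tokens(self) -> impl Iterator<Item = T> {
        self.iter().cloned()
    }
}

/// Hypothetical helper standing in for a trie method that accepts a label.
fn collect_tokens<Token, L: Label<Token>>(label: L) -> Vec<Token> {
    label.into_tokens().collect()
}
```

With these impls, the token type is picked by the caller's context (in practice, by the trie's Token parameter), so the same "ab" can yield the bytes 97, 98 for a Trie<u8> or the chars 'a', 'b' for a Trie<char>.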

children methods

Trie<Token, Value> would have a fn children(&self, label: impl Label<Token>) -> Option<impl Iterator<Item = (&Token, Option<&Value>)>> method that returns the child tokens for a prefix.

IncSearch<'_, Token, Value> would have a fn children(&self) -> Option<impl Iterator<Item = (&Token, Option<&Value>)>> method that returns the child tokens for the current prefix.

Both would use children_node_nums to get the child positions and map them to their tokens.
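
As a rough illustration of the intended behavior (not the real implementation), here is a toy map-based trie with the proposed children signature. ToyTrie and Node are hypothetical stand-ins: the actual crate would go through children_node_nums rather than a BTreeMap, and treating None as "prefix not in the trie" is an assumption of this sketch.

```rust
use std::collections::BTreeMap;

/// Same trait shape as proposed above, with one impl for the demo.
trait Label<Token> {
    fn into_tokens(self) -> impl Iterator<Item = Token>;
}

impl<'a> Label<char> for &'a str {
    fn into_tokens(self) -> impl Iterator<Item = char> {
        self.chars()
    }
}

/// Toy node; stands in for the crate's internal representation.
struct Node<Token, Value> {
    value: Option<Value>,
    children: BTreeMap<Token, Node<Token, Value>>,
}

impl<Token: Ord, Value> Node<Token, Value> {
    fn new() -> Self {
        Node { value: None, children: BTreeMap::new() }
    }
}

/// Hypothetical trie used only to demonstrate the children method.
struct ToyTrie<Token, Value> {
    root: Node<Token, Value>,
}

impl<Token: Ord, Value> ToyTrie<Token, Value> {
    fn new() -> Self {
        ToyTrie { root: Node::new() }
    }

    fn insert(&mut self, label: impl Label<Token>, value: Value) {
        let mut node = &mut self.root;
        for token in label.into_tokens() {
            node = node.children.entry(token).or_insert_with(Node::new);
        }
        node.value = Some(value);
    }

    /// Child tokens at a prefix, each paired with the value stored at
    /// that child (if any). Returns None when the prefix is not present.
    fn children(
        &self,
        label: impl Label<Token>,
    ) -> Option<impl Iterator<Item = (&Token, Option<&Value>)>> {
        let mut node = &self.root;
        for token in label.into_tokens() {
            node = node.children.get(&token)?;
        }
        Some(
            node.children
                .iter()
                .map(|(token, child)| (token, child.value.as_ref())),
        )
    }
}
```

For a trie holding "ca", "car", "cat", and "cart", calling children("ca") would yield 'r' and 't' with their values, while children("dog") would return None. The IncSearch variant would do the same mapping starting from the node the search is currently at.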
