fix: add disallowed_special on tiktoken encode #2102

Open: wants to merge 1 commit into base: main
2 changes: 1 addition & 1 deletion ragas/src/ragas/testset/transforms/base.py
@@ -199,7 +199,7 @@ class LLMBasedExtractor(Extractor, PromptMixin):
     def split_text_by_token_limit(self, text, max_token_limit):

         # Tokenize the entire input string
-        tokens = self.tokenizer.encode(text)
+        tokens = self.tokenizer.encode(text, disallowed_special=())

         # Split tokens into chunks of max_token_limit or less
         chunks = []
2 changes: 1 addition & 1 deletion ragas/src/ragas/utils.py
@@ -225,7 +225,7 @@ def camel_to_snake(name):
 def num_tokens_from_string(string: str, encoding_name: str = "cl100k_base") -> int:
     """Returns the number of tokens in a text string."""
     encoding = tiktoken.get_encoding(encoding_name)
-    num_tokens = len(encoding.encode(string))
+    num_tokens = len(encoding.encode(string, disallowed_special=()))
     return num_tokens
Comment on lines 225 to 229
Contributor

style: Add a docstring explaining the disallowed_special parameter and why it is set to an empty tuple. This helps future maintainers understand the reasoning behind this configuration.

Member

@claude add this comment to explain it
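
One possible shape for the requested explanation, as a minimal sketch rather than the change actually merged. The helper name `count_tokens` is hypothetical and not part of ragas; it is split out only so the behavior can be exercised with any tiktoken-style encoding object. The docstring wording is a suggestion based on tiktoken's documented behavior: by default `encode` raises `ValueError` when the input contains special-token text, and `disallowed_special=()` turns that check off.

```python
def count_tokens(encoding, text: str) -> int:
    """Count the tokens in ``text`` using a tiktoken-style ``encoding``.

    (Hypothetical helper, introduced here for illustration only.)

    ``disallowed_special=()`` disables tiktoken's check for special-token
    text. With the default (``disallowed_special="all"``), ``encode`` raises
    ``ValueError`` if the input contains a string such as ``<|endoftext|>``.
    An empty tuple makes such strings be tokenized as ordinary text, which is
    the desired behavior when counting tokens in arbitrary documents.
    """
    return len(encoding.encode(text, disallowed_special=()))


def num_tokens_from_string(string: str, encoding_name: str = "cl100k_base") -> int:
    """Returns the number of tokens in a text string.

    Special-token text in ``string`` is counted as ordinary text instead of
    raising an error; see ``count_tokens`` for the reasoning.
    """
    # Imported lazily in this sketch so the module loads without tiktoken;
    # the real module imports tiktoken at the top of the file.
    import tiktoken

    return count_tokens(tiktoken.get_encoding(encoding_name), string)
```

With this in place, a call such as `num_tokens_from_string("<|endoftext|>")` should return a count instead of raising, assuming tiktoken is installed and the `cl100k_base` encoding is available.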


