B/fallback tokenizer for OpenAI embedder#283

Merged
voorhs merged 5 commits into dev from b/fallback-tokenizer-for-openai-embedder on May 11, 2026
Conversation

@voorhs (Collaborator) commented May 10, 2026

No description provided.

import tiktoken

- encoding = tiktoken.encoding_for_model(model_name)
+ encoding = _tiktoken_encoding_for_embedding_model(model_name)
Member

Maybe add requests to /v1/tokenize? Most OpenAI-compatible APIs support it.

Collaborator Author

OpenAI and OpenRouter don't have that; I found that vLLM and SGLang do.

It's neat as a feature; one option would be to add token_counter: Literal["tiktoken", "endpoint", "transformers"] to OpenaiEmbeddingConfig.

There's no critical need for it yet; rough token estimates like this are fine for the current tasks.
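The proposed option could be sketched as follows; the field names besides `token_counter` are illustrative, since the actual `OpenaiEmbeddingConfig` definition is not shown in this thread:

```python
from dataclasses import dataclass
from typing import Literal


@dataclass
class OpenaiEmbeddingConfig:
    """Sketch of the proposed config extension (field set is assumed)."""

    model_name: str
    # How to count tokens:
    #   "tiktoken"     - local tiktoken encoding (current behavior)
    #   "endpoint"     - POST to the server's /v1/tokenize (vLLM, SGLang)
    #   "transformers" - load the model's HF tokenizer locally
    token_counter: Literal["tiktoken", "endpoint", "transformers"] = "tiktoken"
```

Defaulting to "tiktoken" keeps the current behavior, so existing configs would not change meaning.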

@voorhs voorhs merged commit 4f77f10 into dev May 11, 2026
@voorhs voorhs deleted the b/fallback-tokenizer-for-openai-embedder branch May 11, 2026 09:16


2 participants