
Add prompt token estimation and context window usage UI #1262

Draft
karthink wants to merge 2 commits into master from feature-token-estimate

Conversation

@karthink
Owner

Following from #1258, here's a first attempt at token estimation without slowing down Emacs.

To try it out,

(require 'gptel-tokens)

and run (gptel--estimate-tokens) in a buffer. The first run might take a bit (~10-100 ms); later runs should be faster because of caching.

  1. It's not hooked up to the UI yet, but the idea is to run this on an idle timer via gptel-mode-hook to capture the result in a variable, and read from the variable to display it in gptel's header line, the context inspection menu, transient menu etc.

  2. The heuristic is just (number of words / 2.5) right now, and something in between this and a full tokenizer can be implemented eventually.

  3. Many edge cases! The calculations should rightly be performed in the prompt buffer (created by gptel--create-prompt-buffer), but it's expensive to do this every time.
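
The idle-timer wiring sketched in point 1 might look something like the following. This is only a sketch: every name here other than gptel--estimate-tokens and gptel-mode-hook is invented for illustration and not part of this PR.

```elisp
;; Sketch of the idle-timer idea from point 1.  Variable, timer, and
;; function names below are illustrative, not from gptel-tokens.el.
(defvar-local gptel--token-estimate nil
  "Cached result of the last `gptel--estimate-tokens' run in this buffer.")

(defvar gptel--token-estimate-timer nil)

(defun gptel--token-estimate-refresh ()
  "Recompute the token estimate for the current buffer when Emacs is idle."
  (when (bound-and-true-p gptel-mode)
    (setq gptel--token-estimate (gptel--estimate-tokens))
    (force-mode-line-update)))        ;so the header line picks it up

(defun gptel--token-estimate-setup ()
  "Start a repeating idle timer that refreshes the token estimate."
  (unless gptel--token-estimate-timer
    (setq gptel--token-estimate-timer
          (run-with-idle-timer 2 t #'gptel--token-estimate-refresh))))

(add-hook 'gptel-mode-hook #'gptel--token-estimate-setup)
```

The header line, context inspection menu, and transient menu would then read gptel--token-estimate rather than calling the (comparatively expensive) estimator directly.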

* gptel-tokens.el:
(gptel--token-estimate-cache):
(gptel--estimate-tokens-from-string):
(gptel--estimate-tokens-from-words):
(gptel--sha1):
(gptel--estimate-system-tokens):
(gptel--estimate-tools-tokens):
(gptel--estimate-buffer-tokens):
(gptel-context--collect):
(gptel--estimate-context-tokens):
(gptel--estimate-tokens):
(gptel-tokens): Feature
@jdtsmith

I gave it a quick try; it's a good start. Once the cache is warm, the gptel--estimate-tokens call takes about 50 ms on a fairly fast machine for 35k tokens (against Copilot/Opus 4.6, 200k limit; though see below). That would definitely slow down the interface on post-command-hook, but I think it's fine for an idle timer. Then again, the token count will continue to grow, so further optimization might be warranted.

The context estimate does appear to vary/miss in some unexpected ways. For example, if I open a new empty .org file in gptel-mode, and include another long chat file as context (among several others), I get one value. But if I request the same context estimate from the end of that long chat file included in the context, I get a longer estimate. It's as if context inclusion status between files/buffers added and "current buffer to send" is not constant.

But the bigger problem I noticed: all the estimates are too low, for the model combos I've been testing with. E.g. here's an estimate just prior to sending a very small buffer, then the response after send:

(:tokens 26064 :context-window 200000 :percentage 13) [2 times]
Copilot error: (HTTP/2 400) prompt token count of 128886 exceeds the limit of 128000

If I make my own cheap estimate for this setup by calling gptel-menu, hitting C, and M-x count-words in the full context buffer, I find:

Buffer has 14175 lines, 3585 sentences, 66953 words, and 482176 characters

So with a 2.5 multiplier this would imply 193k tokens (as an approximation/upper-limit).

BTW, this also reveals a "source error": via copilot, models are limited to 128k.

Also, small thing, but I noticed that the :percentage estimate tops out at 100%. For the interface, since this is an estimate and not a certain calculation, I think going >100% is reasonable.
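
If the current implementation clamps the percentage with something like min, this suggestion amounts to just dropping the clamp. A minimal sketch, using hypothetical variables tokens and context-window (I haven't checked the PR's actual expression):

```elisp
;; Hypothetical: the clamped form caps the reported usage at 100% ...
(min 100 (round (* 100.0 (/ tokens (float context-window)))))
;; ... while the unclamped form lets an over-budget estimate show as >100%,
;; which is arguably more useful for an estimate:
(round (* 100.0 (/ tokens (float context-window))))
```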

* gptel-tokens.el (gptel--estimate-tokens-from-string,
gptel--estimate-tokens-from-words): Multiply the word count by
2.5 to estimate tokens, don't divide.
@karthink
Owner Author

karthink commented Mar 1, 2026

> I gave a quick try; it's a good start. Once cache is warm, the gptel--estimate-tokens call takes about 50ms on a fairly fast machine for 35k tokens (against Copilot/Opus 4.6, 200k limit; though see below). That would definitely slow down the interface on post-command-hook but I think is fine for an idle-timer. Then again token count will continue to grow, so further optimization might be warranted.

Right now the estimation is running gptel--create-prompt-buffer, which creates a buffer and copies over the text to be sent to the LLM (along with several buffer-local variables). This handles the Org mode branching prompt and other subtleties, but wastes both CPU time and memory. It's excessive just for counting words. I suspect working around this should help a lot.
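
One possible shortcut (my sketch, not code from this PR; the function name and gptel--word-token-ratio are invented) is to count words directly in the live buffer and skip the prompt-buffer copy entirely, accepting some inaccuracy:

```elisp
;; Sketch: estimate tokens without creating a prompt buffer.
;; 2.5 tokens per word matches the corrected heuristic in this thread.
(defvar gptel--word-token-ratio 2.5)

(defun gptel--estimate-buffer-tokens-cheap (&optional buffer)
  "Rough token estimate for BUFFER without creating a prompt buffer.
This ignores Org branching and other prompt transformations, so it
can over- or under-count relative to what is actually sent."
  (with-current-buffer (or buffer (current-buffer))
    (round (* gptel--word-token-ratio
              (count-words (point-min) (point-max))))))
```

The full gptel--create-prompt-buffer path could then be reserved for an occasional accurate pass, with this cheap version refreshing the display in between.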

> The context estimate does appear to vary/miss in some unexpected ways. For example, if I open a new empty .org file in gptel-mode, and include another long chat file as context (among several others), I get one value. But if I request the same context estimate from the end of that long chat file included in the context, I get a longer estimate. It's as if context inclusion status between files/buffers added and "current buffer to send" is not constant.

I don't follow. In case 2, is the long chat file also included in gptel-context?

> But the bigger problem I noticed: all the estimates are too low, for the model combos I've been testing with. E.g. here's an estimate just prior to sending a very small buffer, then the response after send:
>
> (:tokens 26064 :context-window 200000 :percentage 13) [2 times]
> Copilot error: (HTTP/2 400) prompt token count of 128886 exceeds the limit of 128000
>
> If I make my own cheap estimate for this setup by calling gptel-menu, hitting C, and M-x count-words in the full context buffer, I find:
>
> Buffer has 14175 lines, 3585 sentences, 66953 words, and 482176 characters
>
> So with a 2.5 multiplier this would imply 193k tokens (as an approximation/upper-limit).

Arithmetic error: I was dividing by 2.5 instead of multiplying. Now fixed.

26064 * 2.5 * 2.5 = 162900, so the corrected estimate is actually overestimating the true count by about 25%.

@jdtsmith

jdtsmith commented Mar 1, 2026

> I don't follow. In case 2, is the long chat file also included in gptel-context?

In either case, the long chat file is included. When invoking gptel-send from a nearly empty chat buffer, the long file is included among the context. When sending from the end of the long chat file buffer itself, it's doubly included: by context, and by being the "implicit" context of everything in this buffer. I'm not sure in which of the two directions the error goes.

> Arithmetic error: I was dividing by 2.5 instead of multiplying.

:)

BTW, I believe there's an API call to copilot that lists all the context limits; not sure if you are already hitting that.

@puhsu puhsu mentioned this pull request Mar 16, 2026