
Add prompt token estimation and context window usage UI #1262

Draft
karthink wants to merge 2 commits into master from feature-token-estimate

Conversation

@karthink
Owner

Following from #1258, here's a first attempt at token estimation without slowing down Emacs.

To try it out,

(require 'gptel-tokens)

and run (gptel--estimate-tokens) in a buffer. The first run might take a bit (~10-100 ms); later runs should be faster because of caching.

  1. It's not hooked up to the UI yet, but the idea is to run this on an idle timer via gptel-mode-hook to capture the result in a variable, and read from the variable to display it in gptel's header line, the context inspection menu, transient menu etc.

  2. The heuristic is just (number of words / 2.5) right now, and something in between this and a full tokenizer can be implemented eventually.

  3. Many edge cases! The calculations should rightly be performed in the prompt buffer (created by gptel--create-prompt-buffer), but it's expensive to do this every time.
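
The idle-timer wiring sketched in point 1 might look something like the following. This is only a sketch: every name here other than gptel--estimate-tokens and gptel-mode-hook is invented for illustration and not part of this PR.

```elisp
;; Sketch of the idle-timer idea from point 1.  Variable, timer, and
;; function names below are illustrative, not from gptel-tokens.el.
(defvar-local gptel--token-estimate nil
  "Cached result of the last `gptel--estimate-tokens' run in this buffer.")

(defvar gptel--token-estimate-timer nil)

(defun gptel--token-estimate-refresh ()
  "Recompute the token estimate for the current buffer when Emacs is idle."
  (when (bound-and-true-p gptel-mode)
    (setq gptel--token-estimate (gptel--estimate-tokens))
    (force-mode-line-update)))        ;so the header line picks it up

(defun gptel--token-estimate-setup ()
  "Start a repeating idle timer that refreshes the token estimate."
  (unless gptel--token-estimate-timer
    (setq gptel--token-estimate-timer
          (run-with-idle-timer 2 t #'gptel--token-estimate-refresh))))

(add-hook 'gptel-mode-hook #'gptel--token-estimate-setup)
```

The header line, context inspection menu, and transient menu would then read gptel--token-estimate rather than calling the (comparatively expensive) estimator directly.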

* gptel-tokens.el:
(gptel--token-estimate-cache):
(gptel--estimate-tokens-from-string):
(gptel--estimate-tokens-from-words):
(gptel--sha1):
(gptel--estimate-system-tokens):
(gptel--estimate-tools-tokens):
(gptel--estimate-buffer-tokens):
(gptel-context--collect):
(gptel--estimate-context-tokens):
(gptel--estimate-tokens):
(gptel-tokens): Feature
@jdtsmith

I gave it a quick try; it's a good start. Once the cache is warm, the gptel--estimate-tokens call takes about 50 ms on a fairly fast machine for 35k tokens (against Copilot/Opus 4.6, 200k limit; though see below). That would definitely slow down the interface on post-command-hook, but I think it's fine for an idle timer. Then again, the token count will continue to grow, so further optimization might be warranted.

The context estimate does appear to vary/miss in some unexpected ways. For example, if I open a new empty .org file in gptel-mode, and include another long chat file as context (among several others), I get one value. But if I request the same context estimate from the end of that long chat file included in the context, I get a longer estimate. It's as if context inclusion status between files/buffers added and "current buffer to send" is not constant.

But the bigger problem I noticed: all the estimates are too low, for the model combos I've been testing with. E.g. here's an estimate just prior to sending a very small buffer, then the response after send:

(:tokens 26064 :context-window 200000 :percentage 13) [2 times]
Copilot error: (HTTP/2 400) prompt token count of 128886 exceeds the limit of 128000

If I make my own cheap estimate for this setup by calling gptel-menu, hitting C, and M-x count-words in the full context buffer, I find:

Buffer has 14175 lines, 3585 sentences, 66953 words, and 482176 characters

So with a 2.5 multiplier this would imply 193k tokens (as an approximation/upper-limit).

BTW, this also reveals a "source error": via copilot, models are limited to 128k.

Also, small thing, but I noticed that the :percentage estimate tops out at 100%. For the interface, since this is an estimate and not a certain calculation, I think going >100% is reasonable.
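
If the current implementation clamps the percentage with something like min, this suggestion amounts to just dropping the clamp. A minimal sketch, using hypothetical variables tokens and context-window (I haven't checked the PR's actual expression):

```elisp
;; Hypothetical: the clamped form caps the reported usage at 100% ...
(min 100 (round (* 100.0 (/ tokens (float context-window)))))
;; ... while the unclamped form lets an over-budget estimate show as >100%,
;; which is arguably more useful for an estimate:
(round (* 100.0 (/ tokens (float context-window))))
```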

* gptel-tokens.el (gptel--estimate-tokens-from-string,
gptel--estimate-tokens-from-words): Multiply the word count by
2.5 to estimate tokens, don't divide.
@karthink
Owner Author

karthink commented Mar 1, 2026

> I gave a quick try; it's a good start. Once cache is warm, the gptel--estimate-tokens call takes about 50ms on a fairly fast machine for 35k tokens (against Copilot/Opus 4.6, 200k limit; though see below). That would definitely slow down the interface on post-command-hook but I think is fine for an idle-timer. Then again token count will continue to grow, so further optimization might be warranted.

Right now the estimation is running gptel--create-prompt-buffer, which creates a buffer and copies over the text to be sent to the LLM (along with several buffer-local variables). This handles the Org mode branching prompt and other subtleties, but wastes both CPU time and memory. It's excessive just for counting words. I suspect working around this should help a lot.
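
One possible shortcut (my sketch, not code from this PR; the function name and gptel--word-token-ratio are invented) is to count words directly in the live buffer and skip the prompt-buffer copy entirely, accepting some inaccuracy:

```elisp
;; Sketch: estimate tokens without creating a prompt buffer.
;; 2.5 tokens per word matches the corrected heuristic in this thread.
(defvar gptel--word-token-ratio 2.5)

(defun gptel--estimate-buffer-tokens-cheap (&optional buffer)
  "Rough token estimate for BUFFER without creating a prompt buffer.
This ignores Org branching and other prompt transformations, so it
can over- or under-count relative to what is actually sent."
  (with-current-buffer (or buffer (current-buffer))
    (round (* gptel--word-token-ratio
              (count-words (point-min) (point-max))))))
```

The full gptel--create-prompt-buffer path could then be reserved for an occasional accurate pass, with this cheap version refreshing the display in between.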

> The context estimate does appear to vary/miss in some unexpected ways. For example, if I open a new empty .org file in gptel-mode, and include another long chat file as context (among several others), I get one value. But if I request the same context estimate from the end of that long chat file included in the context, I get a longer estimate. It's as if context inclusion status between files/buffers added and "current buffer to send" is not constant.

I don't follow. In case 2, is the long chat file also included in gptel-context?

> But the bigger problem I noticed: all the estimates are too low, for the model combos I've been testing with. E.g. here's an estimate just prior to sending a very small buffer, then the response after send:
>
> (:tokens 26064 :context-window 200000 :percentage 13) [2 times]
> Copilot error: (HTTP/2 400) prompt token count of 128886 exceeds the limit of 128000
>
> If I make my own cheap estimate for this setup by calling gptel-menu, hitting C, and M-x count-words in the full context buffer, I find:
>
> Buffer has 14175 lines, 3585 sentences, 66953 words, and 482176 characters
>
> So with a 2.5 multiplier this would imply 193k tokens (as an approximation/upper-limit).

Arithmetic error: I was dividing by 2.5 instead of multiplying. Now fixed.

26064 * 2.5 * 2.5 = 162900, so the corrected estimate is actually overestimating the true count by about 25%.

@jdtsmith

jdtsmith commented Mar 1, 2026

> I don't follow. In case 2, is the long chat file also included in gptel-context?

In either case, the long chat file is included. When invoking gptel-send from a nearly empty chat buffer, the long file is included among the context. When sending from the end of the long chat file buffer itself, it's doubly included: by context, and by being the "implicit" context of everything in this buffer. I'm not sure in which of the two directions the error goes.

> Arithmetic error: I was dividing by 2.5 instead of multiplying.

:)

BTW, I believe there's an API call to copilot that lists all the context limits; not sure if you are already hitting that.

@puhsu puhsu mentioned this pull request Mar 16, 2026