Add prompt token estimation and context window usage UI#1262
* gptel-tokens.el (gptel--token-estimate-cache, gptel--estimate-tokens-from-string, gptel--estimate-tokens-from-words, gptel--sha1, gptel--estimate-system-tokens, gptel--estimate-tools-tokens, gptel--estimate-buffer-tokens, gptel-context--collect, gptel--estimate-context-tokens, gptel--estimate-tokens, gptel-tokens): New feature.
I gave it a quick try; it's a good start. Once the cache is warm, the context estimate does appear to vary/miss in some unexpected ways, e.g. when I open a new empty buffer.

But the bigger problem I noticed: all the estimates are too low for the model combos I've been testing with. E.g. here's an estimate just prior to sending a very small buffer, then the response after send. If I make my own cheap word-count estimate for this setup, then with a 2.5 multiplier this would imply 193k tokens (as an approximation/upper limit).

BTW, this also reveals a "source error": via Copilot, models are limited to 128k. Also, small thing, but I noticed that the […]
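The cheap sanity check described above can be sketched roughly as follows. The `my/` helper name is hypothetical, and the sketch assumes the word count comes from Emacs's built-in `count-words`; the 2.5 multiplier is the approximation/upper limit discussed here, not a real tokenizer:

```elisp
;; Rough sanity check: approximate the token count of the current
;; buffer as (word count * 2.5).  This is an upper-limit heuristic,
;; not a tokenizer.
(defun my/cheap-token-estimate ()
  "Return a rough upper-limit token estimate for the current buffer."
  (interactive)
  (let* ((words (count-words (point-min) (point-max)))
         (tokens (round (* words 2.5))))
    (message "~%d words => ~%d tokens (upper limit)" words tokens)
    tokens))
```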
* gptel-tokens.el (gptel--estimate-tokens-from-string, gptel--estimate-tokens-from-words): Multiply the word count by 2.5 to estimate tokens, don't divide.
Right now the estimation is running […]
I don't follow. In case 2, is the long chat file also included in […]?
Arithmetic error: I was dividing by 2.5 instead of multiplying. Now fixed. 26064 * 2.5 * 2.5 = 162900, so it's actually overestimating by about 25%.
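The fix above amounts to flipping a division into a multiplication, which scales the old (wrong) estimate by 2.5 × 2.5 = 6.25. A minimal sketch of the corrected estimator (the `my/` name is hypothetical, standing in for `gptel--estimate-tokens-from-words`):

```elisp
;; Before the fix: (/ word-count 2.5)  => far too low.
;; After the fix:  (* word-count 2.5)  => 6.25x larger than the old
;; estimate, e.g. 26064 * 2.5 * 2.5 = 162900 as in the arithmetic above.
(defun my/estimate-tokens-from-words (word-count)
  "Estimate tokens from WORD-COUNT using the 2.5 multiplier."
  (round (* word-count 2.5)))
```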
In either case, the long chat file is included. When invoking
:) BTW, I believe there's an API call to Copilot that lists all the context limits; not sure if you are already hitting that.
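If such a listing exists, fetching it might look roughly like this. Everything here is an assumption: the endpoint path, the response shape, and the `my/` helper name are all unverified guesses, and obtaining a valid Copilot bearer token is out of scope; check what gptel's Copilot backend actually uses before relying on any of it:

```elisp
;; Hypothetical sketch: fetch the model list (which some Copilot
;; clients report includes per-model context-window limits).  The
;; endpoint URL and JSON shape are ASSUMPTIONS, not verified facts.
(require 'url)
(require 'url-http)
(require 'json)

(defun my/copilot-model-limits (token)
  "Return the parsed JSON model list from the (assumed) Copilot API.
TOKEN is a Copilot session bearer token."
  (let ((url-request-extra-headers
         `(("Authorization" . ,(concat "Bearer " token)))))
    (with-current-buffer
        (url-retrieve-synchronously "https://api.githubcopilot.com/models")
      (goto-char url-http-end-of-headers)
      (json-parse-buffer :object-type 'alist))))
```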
Following from #1258, a first attempt at token estimation without slowing down Emacs.
To try it out, […] and run `(gptel--estimate-tokens)` in a buffer. The first run might take a bit (~10-100 ms); later runs should be faster because of caching.

It's not hooked up to the UI yet, but the idea is to run this on an idle timer via `gptel-mode-hook` to capture the result in a variable, and read from the variable to display it in gptel's header line, the context inspection menu, transient menu, etc.

The heuristic is just `(number of words / 2.5)` right now, and something in between this and a full tokenizer can be implemented eventually.

Many edge cases! The calculations should rightly be performed in the prompt buffer (created by `gptel--create-prompt-buffer`), but it's expensive to do this every time.
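The proposed wiring (idle timer started from `gptel-mode-hook`, result stashed in a variable, header line reading the variable) could look roughly like this. All `my/`-prefixed names are hypothetical, and the 2-second idle delay is an arbitrary placeholder:

```elisp
;; Sketch of the proposed UI wiring: recompute the estimate on an idle
;; timer, cache it buffer-locally, and read it back cheaply in the
;; header line.
(defvar-local my/gptel-token-estimate nil
  "Cached token estimate for the current buffer, or nil.")

(defun my/gptel-start-estimate-timer ()
  "Start an idle timer that refreshes the token estimate for this buffer."
  (run-with-idle-timer
   2 t (lambda (buf)
         (when (buffer-live-p buf)
           (with-current-buffer buf
             (setq my/gptel-token-estimate (gptel--estimate-tokens)))))
   (current-buffer)))

(add-hook 'gptel-mode-hook #'my/gptel-start-estimate-timer)

;; The header line then only reads the cached variable:
;; (setq header-line-format
;;       '(:eval (and my/gptel-token-estimate
;;                    (format " ~%d tokens" my/gptel-token-estimate))))
```

Reading from the cached variable keeps the header line cheap, since `:eval` forms run on every redisplay.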