Description
In the current implementation, we use the most recent T datapoints to predict the next datapoint. However, T is usually quite small because of computational limits (GPU memory, runtime, or the LLM's context length limit).
We propose a method of incorporating both short-term and long-term context in each window. The fine context is simply the latest S datapoints. The coarse context covers the W * L datapoints immediately preceding the fine context: we split them into L consecutive segments of size W and apply an aggregation to each segment, which yields L values. The resulting window is the concatenation of the coarse context followed by the fine context. Thus, we compress S + W * L datapoints into a new sliding window of size S + L. The default aggregation method will be the mean.
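As a rough illustration of the windowing described above, here is a minimal NumPy sketch. The function name `coarse_fine_windows` and its signature are hypothetical and are not the primitive added in this PR; it also assumes a 1-D series and a step size of 1, and it does not build prediction targets.

```python
import numpy as np


def coarse_fine_windows(series, S, W, L, agg=np.mean):
    """Build sliding windows of size L + S from a 1-D series (hypothetical helper).

    Each window spans W * L + S raw datapoints: the oldest W * L are
    aggregated into L coarse values, and the latest S are kept unchanged.
    """
    series = np.asarray(series, dtype=float)
    span = W * L + S
    if len(series) < span:
        raise ValueError("series is shorter than W * L + S")

    windows = []
    for start in range(len(series) - span + 1):
        raw = series[start:start + span]
        # Split the long-term part into L segments of size W and aggregate each.
        coarse = agg(raw[:W * L].reshape(L, W), axis=1)
        # The short-term part is passed through as-is.
        fine = raw[W * L:]
        windows.append(np.concatenate([coarse, fine]))
    return np.stack(windows)


# Example: 8 raw datapoints per window compressed into 2 + 2 = 4 values.
windows = coarse_fine_windows(np.arange(10), S=2, W=3, L=2)
print(windows[0])  # [1. 4. 6. 7.]
```

Any reduction that accepts an `axis` argument (for example `np.median` or `np.max`) can be passed as `agg` in this sketch.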
This PR will include a primitive that aggregates windows in this way, along with a pipeline that runs with the new aggregation primitive. We hope that the added long-term context will allow the LLMs to make sharper predictions.