[ENHANCEMENT] Support on-the-fly 1M context switching #9250

@cjlawson02

Description

Problem (one or two sentences)

API providers have different rate limit quotas for 1M context window models versus standard 200K models, making it more cost-effective and efficient to use the standard model until context size actually requires the 1M window.

Context (who is affected and when)

This affects all users who enable the "1M Context Window (Beta)" setting for Claude Sonnet 4/4.5 models. Currently, when this setting is enabled, ALL API requests use the 1M context window regardless of actual context size, even for small conversations that fit well within the 200K limit. This results in unnecessary quota consumption and potentially higher API costs.

Desired behavior (conceptual, not technical)

Change the "Enable 1M Context Window" checkbox so that, instead of forcing the 1M window on every request, it enables dynamic context window switching. When enabled, the system should automatically:

  • Use the standard 200K context window for requests that fit within 200K tokens
  • Switch to the 1M context window only when context size approaches or exceeds 200K tokens
  • Switch back to 200K if context is condensed and fits within the smaller window again

The user shouldn't need to toggle the setting manually; the system should choose the right model per request based on actual context usage, as in the sketch below.
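
A minimal sketch of what that per-request decision could look like. The pickContextWindow name, the ContextWindow type, and the exact thresholds are illustrative assumptions, not code from the Roo Code repository:

```typescript
// Illustrative thresholds taken from the acceptance criteria below: switch
// up before hitting the 200K hard limit, and only switch back down once
// condensation leaves comfortable headroom.
const UPSWITCH_THRESHOLD = 190_000;   // nearing the 200K limit -> use 1M
const DOWNSWITCH_THRESHOLD = 180_000; // condensed well below 200K -> back to 200K

type ContextWindow = "200k" | "1m";

function pickContextWindow(
  contextTokens: number,
  current: ContextWindow,
  dynamic1MEnabled: boolean,
): ContextWindow {
  if (!dynamic1MEnabled) return "200k"; // setting off: always the standard window
  if (contextTokens >= UPSWITCH_THRESHOLD) return "1m";
  if (current === "1m" && contextTokens <= DOWNSWITCH_THRESHOLD) return "200k";
  return current; // between the two thresholds: keep the current window
}
```

The 10K gap between the two thresholds is deliberate: hysteresis keeps the selection from flapping between windows when the context size hovers near the boundary.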

Constraints / preferences (optional)

  • Must maintain backward compatibility with the existing anthropicBeta1MContext boolean setting
  • Should work transparently without requiring user intervention once enabled
  • Context size calculation already exists via calculateTokenDistribution(); reuse it (see the sketch after this list)
  • Similar pattern should work for both Anthropic direct API and AWS Bedrock providers
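
Building on the pickContextWindow sketch above, the "transparent" constraint could be satisfied by gating the beta flag at request-build time. Everything here is an assumption for illustration: the getRequestBetas helper is hypothetical, and the real calculateTokenDistribution() signature may differ from the declared stub.

```typescript
// Stand-in for the existing helper; the real signature may differ.
declare function calculateTokenDistribution(messages: unknown[]): { totalTokens: number };

// Hypothetical glue: decide per request whether to opt into the 1M beta.
function getRequestBetas(
  messages: unknown[],
  current: ContextWindow,
  dynamic1MEnabled: boolean,
): string[] {
  const { totalTokens } = calculateTokenDistribution(messages);
  const window = pickContextWindow(totalTokens, current, dynamic1MEnabled);
  // Only requests that actually need the large window carry the beta flag,
  // so small conversations never consume 1M-tier quota.
  return window === "1m" ? ["context-1m-2025-08-07"] : [];
}
```

Per the constraints above, the same decision point should work for AWS Bedrock, with whatever mechanism that provider uses to request the larger window.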

Request checklist

  • I've searched existing Issues and Discussions for duplicates
  • This describes a specific problem with clear context and impact

Roo Code Task Links (optional)

No response

Acceptance criteria (optional)

Given a user has enabled "Dynamic 1M Context Window" for Claude Sonnet 4/4.5
When they start a new conversation with less than 200K tokens of context
Then the system uses the standard 200K context window model
And API requests include NO context-1m-2025-08-07 beta flag
And pricing reflects 200K tier ($3 input / $15 output per million tokens)

Given the same user continues the conversation
When context size grows to exceed 190K tokens (a threshold just below the 200K limit)
Then subsequent API requests automatically switch to 1M context window
And API requests include the context-1m-2025-08-07 beta flag
And pricing reflects 1M tier ($6 input / $22.50 output per million tokens)

Given context was using 1M window
When context is condensed and fits within 180K tokens (leaving a buffer below the 200K limit)
Then subsequent requests switch back to 200K context window
And pricing returns to 200K tier rates

But users with the setting DISABLED should always use 200K context window regardless of size
And the model selection should happen transparently per request, without user intervention
And existing users' anthropicBeta1MContext settings should continue working as before
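
Taken together, the criteria above trace a hysteresis loop that a test could walk directly. A sketch, reusing the hypothetical pickContextWindow helper from earlier and the tier prices quoted in the criteria:

```typescript
// Per-million-token prices as quoted in the acceptance criteria.
const PRICING = {
  "200k": { input: 3.0, output: 15.0 },
  "1m": { input: 6.0, output: 22.5 },
} as const;

// Walk the scenarios: small context, growth past the threshold,
// condensation back under the buffer.
let window: ContextWindow = "200k";
window = pickContextWindow(150_000, window, true); // "200k" -> $3 / $15
window = pickContextWindow(195_000, window, true); // "1m"   -> $6 / $22.50
window = pickContextWindow(185_000, window, true); // stays "1m" (hysteresis)
window = pickContextWindow(175_000, window, true); // back to "200k"
console.log(window, PRICING[window]);
```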

Metadata

Assignees: No one assigned
Labels: Issue/PR - Triage, enhancement
Status: Triage
Milestone: No milestone
