Description
Expected Behavior
Developers should be able to designate portions of the system message as static (cacheable) vs dynamic (uncacheable), with buildSystemContent() emitting separate Anthropic ContentBlocks accordingly:
```json
{
  "system": [
    {"type": "text", "text": "static instructions...", "cache_control": {"type": "ephemeral"}},
    {"type": "text", "text": "dynamic per-request context..."}
  ]
}
```

Anthropic's API already supports arrays of content blocks in the system field, each with independent cache_control (up to 4 cache breakpoints per request). The static block would get cache hits on subsequent requests while the dynamic block is processed fresh.
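A rough sketch of what a multi-block buildSystemContent() could emit. The record types below are simplified stand-ins for illustration, not Spring AI's actual AnthropicApi types:

```java
import java.util.List;

public class MultiBlockSketch {

    // Simplified stand-ins for Anthropic content blocks; the real Spring AI
    // ContentBlock and cache_control wiring may differ.
    record CacheControl(String type) {}
    record ContentBlock(String type, String text, CacheControl cacheControl) {}

    // Emit one cacheable block for static text and one plain block for
    // dynamic text, instead of joining both into a single block.
    static List<ContentBlock> buildSystemContent(String staticText, String dynamicText) {
        return List.of(
            new ContentBlock("text", staticText, new CacheControl("ephemeral")),
            new ContentBlock("text", dynamicText, null));
    }

    public static void main(String[] args) {
        var blocks = buildSystemContent("safety guardrails...", "product recommendations...");
        System.out.println(blocks.size());                        // 2
        System.out.println(blocks.get(0).cacheControl().type());  // ephemeral
        System.out.println(blocks.get(1).cacheControl() == null); // true
    }
}
```

Because only the first block carries cache_control, changes to the dynamic block leave the cached prefix intact.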
One possible API shape — a cache policy on SystemMessage:

```java
// Static behavioral instructions — cached
new SystemMessage("safety guardrails...", CachePolicy.CACHEABLE);

// Dynamic advisor-injected context — not cached
new SystemMessage("product recommendations...", CachePolicy.NO_CACHE);
```

Another option would be at the advisor level, letting advisors declare whether their augmentSystemMessage() output belongs in a cached or uncached block.
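The advisor-level variant could look something like the sketch below. CacheScope and the contract shown are hypothetical names invented for this issue, not existing Spring AI API:

```java
public class AdvisorCacheSketch {

    // Hypothetical enum: where an advisor's system-message contribution belongs.
    enum CacheScope { CACHED_BLOCK, UNCACHED_BLOCK }

    // Hypothetical advisor contract: each advisor declares its cache scope,
    // and the chat model groups contributions into separate content blocks.
    interface SystemContributingAdvisor {
        String augmentSystemMessage();
        CacheScope cacheScope();
    }

    static SystemContributingAdvisor ragAdvisor = new SystemContributingAdvisor() {
        public String augmentSystemMessage() { return "product recommendations..."; }
        // RAG output changes every query, so it must not share a block
        // with the cached static instructions.
        public CacheScope cacheScope() { return CacheScope.UNCACHED_BLOCK; }
    };

    public static void main(String[] args) {
        System.out.println(ragAdvisor.cacheScope()); // UNCACHED_BLOCK
    }
}
```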
Current Behavior
AnthropicChatModel.buildSystemContent() concatenates all system messages into a single string via Collectors.joining(), then wraps the result in one ContentBlock with one cache_control marker.
This means that if any part of the system message is dynamic, the entire system block misses the cache on every request. With the SYSTEM_ONLY strategy enabled, each miss pays Anthropic's 1.25x cache-write cost with zero reads, making it more expensive than disabling caching entirely.
There is no way to split static from dynamic content within the system message using the current API. The only workaround is moving dynamic content out of the system message entirely.
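For contrast, the joining behavior described above reduces to a sketch like this (a simplification, not the actual AnthropicChatModel source):

```java
import java.util.List;
import java.util.stream.Collectors;

public class JoinSketch {

    // Simplified version of the current behavior: all system messages
    // collapse into one string, so there is exactly one content block
    // and one cache_control marker for the whole system prompt.
    static String buildSystemContent(List<String> systemMessages) {
        return systemMessages.stream().collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        String joined = buildSystemContent(
            List.of("static guardrails...", "dynamic RAG context..."));
        // Any change to the dynamic suffix changes the whole string,
        // so the static prefix can never produce a cache hit.
        System.out.println(joined);
    }
}
```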
Context
Our application uses Anthropic Sonnet for conversational AI agents. The system message has two layers:
- Static behavioral instructions (~2,280 tokens): safety guardrails, domain principles, brand tone, mode instructions, response formatting. Identical across all requests for a given tenant + mode.
- Dynamic context injected by Spring AI advisors: RAG-matched product recommendations (change every query), intent/slot state (change every turn), temporal context (changes daily).
Because buildSystemContent() concatenates everything into one block, we can't cache the static prefix independently. Enabling SYSTEM_ONLY with any dynamic advisor content means 100% cache misses at 1.25x write cost.
Workaround: We moved all dynamic content from the system message to the user message, keeping the system message fully static and cacheable. This works — cache reads cost 0.1x ($0.30/MTok vs $3/MTok uncached for Sonnet) — but it conflates the semantic distinction between system instructions (developer authority) and user-level context. Product recommendations and intent state are contextual data the model reads, not behavioral instructions it follows, so the practical impact is minimal. But ideally the framework wouldn't force this trade-off.
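As a rough sanity check on the pricing claims above (Sonnet input at $3/MTok, cache writes at 1.25x, cache reads at 0.1x; the 2,280-token figure is our own static prefix):

```java
public class CacheCostSketch {
    public static void main(String[] args) {
        double inputPerMTok = 3.00;    // Sonnet uncached input, $/MTok
        double writeMultiplier = 1.25; // cache-write surcharge
        double readMultiplier = 0.10;  // cache-read discount
        int staticTokens = 2_280;      // our static system prefix

        double uncached   = staticTokens / 1e6 * inputPerMTok;
        double cacheWrite = uncached * writeMultiplier; // paid on every request if it always misses
        double cacheRead  = uncached * readMultiplier;  // paid on every request after the first write

        System.out.printf("uncached:    $%.6f per request%n", uncached);   // $0.006840
        System.out.printf("always-miss: $%.6f per request%n", cacheWrite); // $0.008550
        System.out.printf("cache hit:   $%.6f per request%n", cacheRead);  // $0.000684
    }
}
```

The always-miss case is strictly worse than no caching at all, which is why a single concatenated block with any dynamic content is a net loss.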
Alternatives considered:
- Bypassing Spring AI to call the Anthropic API directly with fine-grained block placement — loses the entire advisor pipeline
- Waiting for multi-block support — filed this issue
Related: #4325 (per-message-type TTLs and min-size thresholds) is complementary but doesn't address static/dynamic splitting within system messages.
References:
- Anthropic Prompt Caching docs — multi-block system message examples
- Spring AI version: 1.1.0