Add reflector instrumentation, pluggable payload mapper, v2 templates by jnzs1836 · Pull Request #7 · strands-labs/harness-optimizer

jnzs1836 · 2026-06-15T22:56:31Z

Supersedes #2.

PR #2 was branched before the package was renamed to strands_harness_optimizer (and before CI was added), so its changes landed on now-nonexistent paths and it couldn't be updated in place due to branch-protection rules. This branch rebases that work onto current main:

- BaseAgenticOptimizer: adds `_invoke_agent` helper that times the agent invocation and captures Strands metrics (tokens, cycles, tool calls, latency) onto `last_wall_clock_s` / `last_metrics`. ContrastiveReflectionOptimizer routes through it.
- AgentCoreRolloutEngine: new optional `payload_mapper: Callable[[dict], dict]` init kwarg that transforms the canonical `{"data_sample", "params"}` payload before the HTTP call. Default None (no transform), fully backward-compatible.
- New built-in templates `contrastive_reflection_v2/{system_prompt,task_message_system_prompt}.jinja`.
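To illustrate the `payload_mapper` hook described above, here is a minimal sketch of a mapper that flattens the canonical payload. It does not use the real `AgentCoreRolloutEngine`: `build_request_body` and `flatten_payload` are hypothetical names standing in for the engine's payload-building step, and the `config` key is an assumed target shape, not something the PR defines.

```python
from typing import Callable, Optional

# Same shape as the kwarg named in the PR: Callable[[dict], dict].
PayloadMapper = Callable[[dict], dict]


def flatten_payload(payload: dict) -> dict:
    """Hypothetical mapper: lift data_sample fields to the top level
    and rename "params" to "config"."""
    flat = dict(payload["data_sample"])
    flat["config"] = payload["params"]
    return flat


def build_request_body(
    data_sample: dict,
    params: dict,
    payload_mapper: Optional[PayloadMapper] = None,
) -> dict:
    """Stand-in for the engine step: build the canonical payload,
    then apply the mapper only if one was supplied."""
    payload = {"data_sample": data_sample, "params": params}
    if payload_mapper is None:
        return payload  # default: no transform
    return payload_mapper(payload)


body = build_request_body(
    {"question": "2+2?"},
    {"system_prompt": "Be brief."},
    payload_mapper=flatten_payload,
)
print(body)
```

With the real engine, the same function would presumably be passed as `payload_mapper=flatten_payload` at construction time; the other constructor arguments are not shown in this PR description.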

Original work by @hanDing (commit authorship preserved). Verified locally: pytest -m "not integration" → 39 passed.

- BaseAgenticOptimizer: add `_invoke_agent` helper that times the agent invocation and captures Strands metrics (tokens, cycles, tool calls, latency) onto `last_wall_clock_s` / `last_metrics`. ContrastiveReflectionOptimizer routes through it so callers can read step cost without subclassing.
- AgentCoreRolloutEngine: the engine still builds the canonical `{"data_sample": ..., "params": ...}` payload. New `payload_mapper: Callable[[dict], dict]` init kwarg is a transform applied to that canonical payload before the HTTP call, so deployments expecting a different shape (flat fields, renamed keys, nested envelopes) can remap without subclassing the engine. Default is None (no transform), fully backward-compatible.
- New built-in templates `contrastive_reflection_v2/{system_prompt,task_message_system_prompt}.jinja`: forbids the "Learned Behaviors" appendix, requires holistic integration of insights, adds a safety invariant (require user confirmation before consequential actions), and a length guardrail (<= max(1.1x original, original + 500 chars)). Submission flow still uses the existing `submit_optimized_params` tool contract.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
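The length guardrail in the v2 templates, `<= max(1.1x original, original + 500 chars)`, can be checked with a few lines of arithmetic. This is only a sketch of the formula as stated in the commit message: the function names are hypothetical, and rounding the 1.1x bound down to a whole character is an assumption. In the PR the limit is expressed in the prompt template, not necessarily enforced in code.

```python
def max_allowed_length(original: str) -> int:
    """Upper bound on the rewritten prompt length, per the stated rule:
    max(1.1 x original, original + 500 chars)."""
    n = len(original)
    # Integer arithmetic avoids float rounding; floor of 1.1 * n (assumption).
    return max(n * 11 // 10, n + 500)


def within_length_guardrail(original: str, candidate: str) -> bool:
    """True if the candidate prompt respects the length bound."""
    return len(candidate) <= max_allowed_length(original)


short_prompt = "a" * 1_000    # additive bound dominates below 5,000 chars
long_prompt = "a" * 10_000    # multiplicative bound dominates above 5,000 chars
print(max_allowed_length(short_prompt), max_allowed_length(long_prompt))
```

The two bounds cross at 5,000 characters: shorter prompts get a flat 500-character allowance, longer ones get 10%.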

Copilot

Pull request overview

This PR rebases and reintroduces prior work to add instrumentation around agent invocation, make AgentCore payloads customizable without subclassing, and ship updated “contrastive_reflection_v2” built-in templates.

Changes:

- Add `BaseAgenticOptimizer._invoke_agent()` to time agent calls and capture Strands usage/latency/tool metrics into `last_wall_clock_s` and `last_metrics`; route ContrastiveReflectionOptimizer through it.
- Add optional `payload_mapper: Callable[[dict], dict]` to AgentCoreRolloutEngine to transform the canonical `{"data_sample", "params"}` payload before invoking the runtime.
- Add new built-in templates under `templates/contrastive_reflection_v2/`.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

Show a summary per file

| File | Description |
| --- | --- |
| strands_harness_optimizer/optimizers/system_prompt/base_agentic_optimizer.py | Adds `_invoke_agent()` plus `last_wall_clock_s` / `last_metrics` fields to expose per-invocation cost/latency. |
| strands_harness_optimizer/optimizers/system_prompt/contrastive_reflection.py | Routes reflection invocation through `_invoke_agent()` to enable instrumentation. |
| strands_harness_optimizer/rollout_engines/agentcore_engine.py | Adds `payload_mapper` hook to reshape outgoing runtime payloads without subclassing. |
| strands_harness_optimizer/templates/contrastive_reflection_v2/system_prompt.jinja | New v2 system prompt template focusing on integrated edits vs appendices. |
| strands_harness_optimizer/templates/contrastive_reflection_v2/task_message_system_prompt.jinja | New v2 task message template enforcing structural preservation, safety invariant, and length guardrail. |

jnzs1836

I think adding a try/except here could silently swallow agent failures and cause further problems later in the agent run. Since we are running LLM agent experiments, failures in agent execution should be visible to users so they can diagnose them themselves; otherwise the experiment results may be biased.
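One way to reconcile timing instrumentation with this concern is to record the wall-clock time in a `finally` block and leave out any `except` clause, so the timing is captured while the failure still reaches the caller. The sketch below is an illustration only, not the PR's `_invoke_agent`: `TimedInvoker` is a hypothetical class, the agent is modeled as a plain callable, and the Strands metrics capture (`last_metrics`) is omitted because its API is not shown here.

```python
import time
from typing import Any, Callable, Optional


class TimedInvoker:
    """Hypothetical stand-in for the timing part of an invoke helper."""

    def __init__(self) -> None:
        self.last_wall_clock_s: Optional[float] = None

    def invoke(self, agent: Callable[[str], Any], message: str) -> Any:
        start = time.perf_counter()
        try:
            # No except clause: any agent failure propagates to the caller.
            return agent(message)
        finally:
            # Runs on success and on failure, so timing is always recorded.
            self.last_wall_clock_s = time.perf_counter() - start


def failing_agent(message: str) -> str:
    raise RuntimeError("agent failed")


invoker = TimedInvoker()
result = invoker.invoke(lambda m: m.upper(), "ok")

failure_was_visible = False
try:
    invoker.invoke(failing_agent, "x")
except RuntimeError:
    failure_was_visible = True  # the experiment runner sees the error
```

The design choice is that instrumentation never changes control flow: a failed rollout surfaces as an exception rather than as a silently missing or partial result.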

Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

jnzs1836 requested a review from Copilot June 15, 2026 23:00

Copilot started reviewing on behalf of jnzs1836 June 15, 2026 23:00

Copilot AI reviewed Jun 15, 2026

jnzs1836 requested a review from Copilot June 15, 2026 23:15

Copilot started reviewing on behalf of jnzs1836 June 15, 2026 23:15

jnzs1836 commented Jun 15, 2026

Copilot AI reviewed Jun 15, 2026

jnzs1836 requested review from Linbo-Liu and Teq2412 June 15, 2026 23:25

jnzs1836 wants to merge 1 commit into main from reflector-instrumentation-and-payload-mapper-rebased

3 participants
