feat(weave): support merged scorers in monitors by jtschoonhoven · Pull Request #6262 · wandb/weave

jtschoonhoven · 2026-03-04T02:46:06Z

Description

Updates to support trace analysis features:

- Defines a new ClassifierScorer that inherits from LLMAsAJudgeScorer.
- Adds Monitor.merge* attributes to control how Scorers are combined. See below.
- Adds Monitor.is_traced so that tracing within monitors is configurable.
- Adds a Scorer.display_name property (not necessary, but nice to have).
- Adds LLMAsAJudgeScorer.inject_exception and LLMAsAJudgeScorer.inject_source_code_on_exception, which can be used to make that data available to the scorer.
- Updates the completions_create function to receive a parent_id so these completions can be traced correctly.

Testing

All these changes are backwards-compatible and default to the old behavior. Tested locally.

codecov · 2026-03-04T02:49:38Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

wandbot-3000 · 2026-03-04T03:23:00Z

Preview this PR with FeatureBee: https://beta.wandb.ai/?betaVersion=2a1fb22df25b6a7150827880d2079a7834ea8c73

jtschoonhoven · 2026-03-09T16:40:05Z

weave/trace_server/clickhouse_trace_server_batched.py

+        trace_id = req.trace_id or generate_id()
+        parent_id = req.parent_id


Passing through the parent_id allows us to make this completion a child call within a trace.
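
For readers following along, here is a minimal sketch of the ID resolution shown in the diff. It is not the server code: `CompletionRequest`, `resolve_trace_context`, and the UUID-based `generate_id` are hypothetical stand-ins, and only the `trace_id` / `parent_id` handling mirrors the two added lines.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class CompletionRequest:
    """Hypothetical stand-in for the request; only the two fields from the diff."""

    trace_id: Optional[str] = None
    parent_id: Optional[str] = None


def generate_id() -> str:
    """Placeholder ID generator (assumption: the real helper may differ)."""
    return str(uuid.uuid4())


def resolve_trace_context(req: CompletionRequest) -> tuple[str, Optional[str]]:
    # Reuse the caller's trace when one is supplied, otherwise start a new one.
    trace_id = req.trace_id or generate_id()
    # parent_id stays None for a root call; when set, the completion
    # is recorded as a child of that call.
    parent_id = req.parent_id
    return trace_id, parent_id


child = resolve_trace_context(CompletionRequest(trace_id="t1", parent_id="p1"))
root = resolve_trace_context(CompletionRequest())
```

With both IDs supplied the completion attaches to the existing trace; with neither it becomes the root of a fresh trace.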

jtschoonhoven · 2026-03-09T16:46:03Z

weave/scorers/llm_as_a_judge_scorer.py

+@register_object
+class ClassifierScorer(LLMAsAJudgeScorer):
+    """A classifier LLM scorer that tags calls (displayed as pills in the UI)."""


Currently ClassifierScorer shares all its behavior with LLMAsAJudgeScorer, but we still need to be able to distinguish between them.

Instead of making this its own class, I considered adding an is_classifier attribute to LLMAsAJudgeScorer. But I expect these classes to diverge in the future, and it will be nice to encapsulate classifier-specific behavior in its own class.
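
A rough illustration of the subclass approach, using simplified stand-in classes rather than the real weave ones: the subclass adds no behavior, yet an `isinstance` check is enough to tell classifiers apart. The constructor arguments and the `is_classifier` helper are assumptions made for the example.

```python
class LLMAsAJudgeScorer:
    """Simplified stand-in, not the real weave class."""

    def __init__(self, name: str, scoring_prompt: str) -> None:
        self.name = name
        self.scoring_prompt = scoring_prompt


class ClassifierScorer(LLMAsAJudgeScorer):
    """Inherits everything; the type itself marks the scorer as a classifier."""


def is_classifier(scorer: LLMAsAJudgeScorer) -> bool:
    # Type-based check, so no extra boolean attribute is needed on the base class.
    return isinstance(scorer, ClassifierScorer)


scorers = [
    LLMAsAJudgeScorer("quality", "Rate the answer."),
    ClassifierScorer("error_type", "Classify the failure."),
]
classifier_names = [s.name for s in scorers if is_classifier(s)]
```

Classifier-specific behavior can later be added to the subclass without touching code that only depends on the base class.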

jtschoonhoven · 2026-03-09T17:06:28Z

weave/scorers/llm_as_a_judge_scorer.py

+    # Optionally inject extra fields in score_args
+    inject_exception: bool = Field(
+        default=False,
+        description="Whether `call.exception` should be automatically added to `score_args`.",
+    )
+    inject_source_code_on_exception: bool = Field(
+        default=False,
+        description="Whether the source code for the op should be automatically added to `score_args` (when `call.exception` is set).",
+    )


For trace analysis we want scorers to have enough context to accurately classify failed calls.

Rather than making this configurable, I could hardcode this behavior in the scoring worker, but I'm erring on the side of making things configurable.
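
One way the two flags could shape `score_args`, as a sketch under stated assumptions: `FakeCall`, `build_score_args`, the `op_source_code` field, and the `"exception"` / `"source_code"` keys are all hypothetical. Only the flag names and their described semantics come from the diff.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class FakeCall:
    """Hypothetical stand-in for a traced call."""

    inputs: dict
    output: Any
    exception: Optional[str] = None
    op_source_code: Optional[str] = None  # assumed to be available on the call


def build_score_args(
    call: FakeCall,
    *,
    inject_exception: bool = False,
    inject_source_code_on_exception: bool = False,
) -> dict:
    score_args = {"inputs": call.inputs, "output": call.output}
    if inject_exception:
        # Added whenever the flag is on, even if the call succeeded (value is None).
        score_args["exception"] = call.exception
    if inject_source_code_on_exception and call.exception is not None:
        # Source code is only added for failed calls.
        score_args["source_code"] = call.op_source_code
    return score_args


failed = FakeCall({"x": 1}, None, "ZeroDivisionError", "def f(x): return 1 / 0")
ok = FakeCall({"x": 1}, 2)
failed_args = build_score_args(
    failed, inject_exception=True, inject_source_code_on_exception=True
)
ok_args = build_score_args(
    ok, inject_exception=True, inject_source_code_on_exception=True
)
default_args = build_score_args(failed)
```

Both flags default to off, which matches the PR's claim that existing behavior is unchanged unless a scorer opts in.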

neutralino1 · 2026-03-09T17:16:30Z

weave/flow/monitor.py

+    merge_scorers: bool = Field(
+        default=False,
+        description="If True, scorers are merged and treated as a single scorer.",
+    )
+    merged_scorers_prompt_header: str | None = Field(
+        default=None,
+        description="Text prepended before the merged classifier prompts.",
+    )
+    merged_scorers_prompt_footer: str | None = Field(
+        default=None,
+        description="Text appended after the merged classifier prompts.",
+    )
+    merged_scorers_prompt_section_header: str = Field(
+        default="{display_name}",
+        description="Text to prepend before each merged scorer prompt (use `{display_name}` to access the scorer's name).",
+    )


Does all of this make sense if scorers are not LLMAsAJudgeScorers? Right now we only support those, but soon we will support custom code scorers, for which "merging" may not make sense.

jtschoonhoven · 2026-03-09

These only apply to LLMAsAJudgeScorers. Alternatively I could:

- Clarify that in the attribute names and descriptions
- Hardcode this behavior in scoring_worker.py for classifiers
- Something else

neutralino1 · 2026-03-09

I think we should hard-code the behavior in scoring_worker.py, not expose it as settings.
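
Whichever place the logic ends up in, the merge fields from the diff suggest a prompt assembly along these lines. This is only a sketch: `build_merged_prompt`, the `(display_name, prompt)` pairs, and the blank-line separator are assumptions, not the PR's implementation.

```python
from typing import Optional, Sequence


def build_merged_prompt(
    scorers: Sequence[tuple[str, str]],  # hypothetical (display_name, prompt) pairs
    header: Optional[str] = None,
    footer: Optional[str] = None,
    section_header: str = "{display_name}",
) -> str:
    parts: list[str] = []
    if header:
        parts.append(header)
    for display_name, prompt in scorers:
        # Each scorer's prompt is preceded by its rendered section header.
        parts.append(section_header.format(display_name=display_name))
        parts.append(prompt)
    if footer:
        parts.append(footer)
    # Assumption: sections are separated by a blank line.
    return "\n\n".join(parts)


merged = build_merged_prompt(
    [("Timeout", "Did the call time out?"), ("Auth", "Was it an auth failure?")],
    header="Classify the call.",
    section_header="## {display_name}",
)
```

Note that `str.format` raises on any other unescaped braces in `section_header`, so a real implementation might prefer a narrower substitution.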

jtschoonhoven added 2 commits March 3, 2026 18:34

errorcat-poc

4d95885

update scorers and monitors for trace analysis

35779b7

jtschoonhoven added 2 commits March 9, 2026 09:50

update scorers and monitors for trace analysis

eed7750

Merge branch 'master' into jon/errorcat

abee68f

jtschoonhoven force-pushed the jon/errorcat branch from 35779b7 to eed7750 Compare March 9, 2026 16:50

minor updates

8c3c6ff

jtschoonhoven changed the title from "trace analysis WIP" to "feat(weave): support merged scorers in monitors" Mar 9, 2026

jtschoonhoven marked this pull request as ready for review March 9, 2026 17:08

jtschoonhoven requested a review from a team as a code owner March 9, 2026 17:08

neutralino1 reviewed Mar 9, 2026

jtschoonhoven added 2 commits March 9, 2026 10:23

fix tests

ac9a258

merge

7b70206

jtschoonhoven wants to merge 7 commits into master from jon/errorcat
