fveiraswww

Background

Validating LLM outputs isn't only about correctness; it's also about extracting signals. Scorers make that possible.

This implementation:

  • includes scoring results in the final step response
  • helps when running evals (e.g. Evalite scorers)
  • makes it easier to validate interaction + user intention across agent <> tools
  • opens the door to using other LLMs as evaluators (paper)

Summary

  • adds a first version of scorers support in streamText
  • runs defined scorers on tool-result parts
  • results are exposed in onStepFinish({ scorers })
  • includes an example (examples/ai-core/src/stream-text/gateway-scorers.ts)

This is a proof of concept: it is only wired into streamText and is still missing docs, tests, and broader integration.
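
The flow summarized above (scorers run on tool-result parts, and the results are handed to onStepFinish) can be sketched in isolation. This is a minimal sketch and not the PR's code: the types and helpers below are local stand-ins that only borrow the names Scorer, ScorerResult, createScorer, and executeScorer, while ToolResultPart and scoreStep are hypothetical names introduced for illustration. Scorers are assumed to be synchronous and to return a number.

```typescript
// Local stand-ins (assumptions): the real definitions in the PR may differ.
type ScorerResult = { name: string; tool: string; score: number };

type Scorer<OUTPUT> = {
  name: string;
  tool: string; // name of the tool whose results this scorer evaluates
  scorer: (output: OUTPUT) => number;
};

// Identity helper that mirrors a createScorer({ name, tool, scorer }) call shape.
function createScorer<OUTPUT>(definition: Scorer<OUTPUT>): Scorer<OUTPUT> {
  return definition;
}

// Runs one scorer against one tool output and labels the score.
function executeScorer<OUTPUT>(scorer: Scorer<OUTPUT>, output: OUTPUT): ScorerResult {
  return { name: scorer.name, tool: scorer.tool, score: scorer.scorer(output) };
}

// Hypothetical, simplified shape of a tool-result part.
type ToolResultPart = { type: 'tool-result'; toolName: string; output: unknown };

// Hypothetical step runner: scores every tool-result part with the scorers
// registered for that tool, then reports the results for this step only.
function scoreStep(
  parts: ToolResultPart[],
  scorers: Scorer<any>[], // `any` because each scorer expects its own tool's output type
  onStepFinish: (step: { scorers: ScorerResult[] }) => void,
): ScorerResult[] {
  const results: ScorerResult[] = [];
  for (const part of parts) {
    for (const scorer of scorers) {
      if (scorer.tool === part.toolName) {
        results.push(executeScorer(scorer, part.output));
      }
    }
  }
  onStepFinish({ scorers: results });
  return results;
}

const thresholdScorer = createScorer<{ value: number }>({
  name: 'threshold',
  tool: 'my-tool',
  scorer: output => (output.value > 10 ? 1 : 0),
});

scoreStep(
  [{ type: 'tool-result', toolName: 'my-tool', output: { value: 42 } }],
  [thresholdScorer],
  step => console.log(step.scorers),
);
```

Matching scorers to parts by tool name keeps each scorer typed against a single tool's output, which is the main design choice this sketch illustrates.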

Manual Verification

bun run ./stream-text/gateway-scorers.ts

Checklist

  • Tests have been added / updated (for bug fixes / features)
  • Documentation has been added / updated (for bug fixes / features)
  • A patch changeset for relevant packages has been added (for bug fixes / features - run pnpm changeset in the project root)
  • Formatting issues have been fixed (run pnpm prettier-fix in the project root)
  • I have reviewed this pull request (self-review)

Future Work

Related Issues

@vercel vercel bot left a comment

Additional Comments:

packages/ai/src/generate-text/index.ts (line 44):

The new scorer types (Scorer, ScorerResult, createScorer, executeScorer) are not exported from the package, making them unavailable for external use.

📝 Patch Details
diff --git a/packages/ai/src/generate-text/index.ts b/packages/ai/src/generate-text/index.ts
index ba890f16d..ddc90dbf6 100644
--- a/packages/ai/src/generate-text/index.ts
+++ b/packages/ai/src/generate-text/index.ts
@@ -41,3 +41,5 @@ export type {
   TypedToolResult,
 } from './tool-result';
 export type { ToolSet } from './tool-set';
+export type { Scorer, ScorerResult } from './scorer';
+export { createScorer, executeScorer } from './scorer';

Analysis

Missing exports for scorer types and utilities in generate-text module

What fails: TypeScript compilation fails when importing Scorer, ScorerResult, createScorer, and executeScorer, because packages/ai/src/generate-text/index.ts does not export them, even though they are defined in scorer.ts and used internally.

How to reproduce:

// This fails with TypeScript errors before the fix:
import type { Scorer, ScorerResult } from 'ai';
import { createScorer, executeScorer } from 'ai';

const scorer: Scorer<{value: number}> = createScorer({
  name: 'test-scorer',
  tool: 'my-tool', 
  scorer: (output) => output.value > 10 ? 1 : 0
});

TypeScript errors: Module has no exported member 'Scorer', Module has no exported member 'ScorerResult', Module has no exported member 'createScorer', Module has no exported member 'executeScorer'

Expected: Scorer types and utilities should be exportable since they are used in streamText() scorers parameter and referenced in examples like examples/ai-core/src/stream-text/gateway-scorers.ts

messages: [...recordedResponseMessages, ...stepMessages],
},
providerMetadata: part.providerMetadata,
scorers: recordedScorers,

The recordedScorers array is never reset between steps, causing scorer results to accumulate across multiple conversation steps and leading to incorrect final results.

📝 Patch Details
diff --git a/packages/ai/src/generate-text/stream-text.ts b/packages/ai/src/generate-text/stream-text.ts
index fcbc3b95e..7c9c905ca 100644
--- a/packages/ai/src/generate-text/stream-text.ts
+++ b/packages/ai/src/generate-text/stream-text.ts
@@ -857,6 +857,7 @@ class DefaultStreamTextResult<TOOLS extends ToolSet, OUTPUT, PARTIAL_OUTPUT>
           recordedContent = [];
           activeReasoningContent = {};
           activeTextContent = {};
+          recordedScorers = [];
 
           recordedResponseMessages.push(...stepMessages);
 

Analysis

recordedScorers array accumulates across conversation steps in streamText()

What fails: In packages/ai/src/generate-text/stream-text.ts, the recordedScorers array is not reset between steps, causing scorer results from previous steps to accumulate in final results

How to reproduce:

  1. Use streamText() with scorers in a multi-step conversation
  2. Step 1 generates scorer results A, B
  3. Step 2 generates scorer results C, D
  4. Final result incorrectly contains [A, B, C, D] instead of [C, D]

Result: Each step's scorer results include all previous steps' scorer results, making scoring unreliable for multi-step conversations

Expected: Each step should have only its own scorer results, similar to how recordedContent is reset between steps (lines 857-859)

Root cause: Lines 857-859 reset recordedContent, activeReasoningContent, and activeTextContent, but no recordedScorers = [] reset follows them.
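
The accumulation described above can be reproduced outside streamText. The following is a minimal sketch under stated assumptions: collectPerStep and its resetBetweenSteps flag are hypothetical names, and the loop only imitates the record-then-report pattern of the real code, with each inner array standing for the scorer results produced during one step.

```typescript
type ScorerResult = { name: string; score: number };

// Simulates per-step recording. `reported[i]` is what a step-finish callback
// would receive for step i.
function collectPerStep(
  steps: ScorerResult[][],
  resetBetweenSteps: boolean,
): ScorerResult[][] {
  let recordedScorers: ScorerResult[] = [];
  const reported: ScorerResult[][] = [];

  for (const stepResults of steps) {
    recordedScorers.push(...stepResults);
    // Copy so that later pushes do not mutate what was already reported.
    reported.push([...recordedScorers]);

    if (resetBetweenSteps) {
      recordedScorers = []; // the one-line reset proposed in the patch
    }
  }
  return reported;
}

const steps: ScorerResult[][] = [
  [{ name: 'A', score: 1 }, { name: 'B', score: 0 }],
  [{ name: 'C', score: 1 }, { name: 'D', score: 1 }],
];

const names = (reported: ScorerResult[][]) =>
  reported.map(step => step.map(result => result.name));

console.log(names(collectPerStep(steps, false))); // [['A','B'], ['A','B','C','D']]
console.log(names(collectPerStep(steps, true))); // [['A','B'], ['C','D']]
```

Without the reset, the second step reports the first step's results as well; with it, each step reports only its own.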

@fveiraswww fveiraswww marked this pull request as draft September 21, 2025 15:39
@fveiraswww

@nicoalbanese what do you think? Does it make sense for the toolkit to have this feature?
