feat(ai): add tool output scoring system (POC) #8804
Additional Comments:
packages/ai/src/generate-text/index.ts (line 44):
The new scorer types and utilities (Scorer, ScorerResult, createScorer, executeScorer) are not exported from the package, making them unavailable for external use.
📝 Patch Details
diff --git a/packages/ai/src/generate-text/index.ts b/packages/ai/src/generate-text/index.ts
index ba890f16d..ddc90dbf6 100644
--- a/packages/ai/src/generate-text/index.ts
+++ b/packages/ai/src/generate-text/index.ts
@@ -41,3 +41,5 @@ export type {
TypedToolResult,
} from './tool-result';
export type { ToolSet } from './tool-set';
+export type { Scorer, ScorerResult } from './scorer';
+export { createScorer, executeScorer } from './scorer';
Analysis
Missing exports for scorer types and utilities in generate-text module
What fails: TypeScript compilation fails when importing Scorer, ScorerResult, createScorer, and executeScorer from packages/ai/src/generate-text/index.ts. These types and functions are not exported despite being available in the scorer.ts file and used internally.
How to reproduce:
// This fails with TypeScript errors before the fix:
import type { Scorer, ScorerResult } from 'ai';
import { createScorer, executeScorer } from 'ai';

const scorer: Scorer<{ value: number }> = createScorer({
  name: 'test-scorer',
  tool: 'my-tool',
  scorer: (output) => (output.value > 10 ? 1 : 0),
});
TypeScript errors: Module has no exported member 'Scorer', Module has no exported member 'ScorerResult', Module has no exported member 'createScorer', Module has no exported member 'executeScorer'
Expected: Scorer types and utilities should be exported, since they are used by the scorers parameter of streamText() and referenced in examples like examples/ai-core/src/stream-text/gateway-scorers.ts
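For readers without the PR checked out, here is a minimal, self-contained sketch of what the four names in the patch might look like. Only the names and the createScorer config fields (name, tool, scorer) come from the reproduction above. The field layout of ScorerResult, the synchronous scorer function, and the executeScorer signature are assumptions for illustration, so the real scorer.ts may differ.

```typescript
// Hypothetical stand-ins: shapes are guessed from the reproduction snippet,
// not copied from the PR's scorer.ts.
type ScorerResult = { name: string; tool: string; score: number };

type Scorer<OUTPUT> = {
  name: string;
  tool: string;
  // Assumed to be synchronous here to keep the sketch small.
  scorer: (output: OUTPUT) => number;
};

// Identity helper whose only job is to pin down the OUTPUT type parameter.
function createScorer<OUTPUT>(config: Scorer<OUTPUT>): Scorer<OUTPUT> {
  return config;
}

// Assumed signature: run one scorer against one tool output.
function executeScorer<OUTPUT>(
  scorer: Scorer<OUTPUT>,
  output: OUTPUT,
): ScorerResult {
  return {
    name: scorer.name,
    tool: scorer.tool,
    score: scorer.scorer(output),
  };
}

const testScorer = createScorer<{ value: number }>({
  name: 'test-scorer',
  tool: 'my-tool',
  scorer: (output) => (output.value > 10 ? 1 : 0),
});

const high = executeScorer(testScorer, { value: 42 });
const low = executeScorer(testScorer, { value: 3 });
console.log(high.score, low.score);
```

With the two export lines from the patch in place, a consumer would import these names from the package instead of defining them locally.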
messages: [...recordedResponseMessages, ...stepMessages],
},
providerMetadata: part.providerMetadata,
scorers: recordedScorers,
The recordedScorers array is never reset between steps, causing scorer results to accumulate across multiple conversation steps and leading to incorrect final results.
📝 Patch Details
diff --git a/packages/ai/src/generate-text/stream-text.ts b/packages/ai/src/generate-text/stream-text.ts
index fcbc3b95e..7c9c905ca 100644
--- a/packages/ai/src/generate-text/stream-text.ts
+++ b/packages/ai/src/generate-text/stream-text.ts
@@ -857,6 +857,7 @@ class DefaultStreamTextResult<TOOLS extends ToolSet, OUTPUT, PARTIAL_OUTPUT>
recordedContent = [];
activeReasoningContent = {};
activeTextContent = {};
+ recordedScorers = [];
recordedResponseMessages.push(...stepMessages);
Analysis
recordedScorers array accumulates across conversation steps in streamText()
What fails: In packages/ai/src/generate-text/stream-text.ts, the recordedScorers array is not reset between steps, causing scorer results from previous steps to accumulate in final results
How to reproduce:
- Use streamText() with scorers in a multi-step conversation
- Step 1 generates scorer results A, B
- Step 2 generates scorer results C, D
- Final result incorrectly contains [A, B, C, D] instead of [C, D]
Result: Each step's scorer results include all previous steps' scorer results, making scoring unreliable for multi-step conversations
Expected: Each step should have only its own scorer results, similar to how recordedContent is reset between steps (lines 857-859)
Root cause: Lines 857-860 reset recordedContent, activeReasoningContent, and activeTextContent but omit recordedScorers = []
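The accumulation can be reproduced outside the SDK with a small simulation. The sketch below is not the code in stream-text.ts: runSteps, StepResult, and the ScorerResult shape are hypothetical names used only to model a per-step buffer that is snapshotted at the end of each step, with and without the reset added by the patch.

```typescript
// Hypothetical stand-in types; not the PR's actual definitions.
type ScorerResult = { name: string; score: number };
type StepResult = { step: number; scorers: ScorerResult[] };

// Simulates a multi-step run. Each inner array holds the scorer results
// produced during one step.
function runSteps(
  resultsPerStep: ScorerResult[][],
  resetBetweenSteps: boolean,
): StepResult[] {
  let recordedScorers: ScorerResult[] = [];
  const steps: StepResult[] = [];

  resultsPerStep.forEach((results, index) => {
    // Scorer results arrive while the step is running.
    recordedScorers.push(...results);

    // Step finishes: snapshot the buffer into the step result.
    steps.push({ step: index + 1, scorers: [...recordedScorers] });

    // The fix: clear the buffer together with the other per-step state.
    if (resetBetweenSteps) {
      recordedScorers = [];
    }
  });

  return steps;
}

const input: ScorerResult[][] = [
  [{ name: 'A', score: 1 }, { name: 'B', score: 0 }],
  [{ name: 'C', score: 1 }, { name: 'D', score: 1 }],
];

const names = (step: StepResult): string =>
  step.scorers.map((s) => s.name).join(',');

const withoutReset = runSteps(input, false).map(names);
const withReset = runSteps(input, true).map(names);

console.log(withoutReset); // [ 'A,B', 'A,B,C,D' ]
console.log(withReset); // [ 'A,B', 'C,D' ]
```

Without the reset, step 2 reports the results of step 1 as well as its own; with the reset, each step reports only its own results, which matches the expected behavior described above.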
@nicoalbanese what do you think, does it make sense for the toolkit to have this feature?
Background
Validating LLM outputs isn't only about correctness; it's also about extracting signals. Scorers make that possible.
this implementation:
Summary
streamText
tool-result parts
onStepFinish({ scorers })
this is a poc, only wired into streamText and still missing docs, tests, and broader integration
Manual Verification
bun run ./stream-text/gateway-scorers.ts
Checklist
pnpm changeset (in the project root)
pnpm prettier-fix (in the project root)
Future Work
Related Issues