Open
Description
Please help us keep the number of duplicated issues small. We kindly ask you to add your input to the appropriate issue or PR in case your feature idea is already being tracked.
- I have searched the existing feature requests, latest roadmap and open pull requests.
Problem
The current code does not check whether a model actually exists for the OpenAI API provider, because we did not implement the model list for it. If one uses an invalid model, ALL queries of the whole evaluation are still performed, because a non-existing model never causes a failure that stops the evaluation.
Example of such an error:
time=2025-02-03T17:45:25.191Z level=INFO msg="querying model" model="custom-openai/o3-mini-2025-01-31@reasoning_effort=high" id=0x89e3e0 prompt="\tGiven the following Go code file \"plain.go\" with package \"plain\", provide a test file for this code.\n\tThe tests should produce 100 percent code coverage and must compile.\n\tThe response must contain only the test code in a fenced code block and nothing else.\n\n\t```golang\n\tpackage plain\n\n\tfunc plain() {\n\t\treturn // This does not do anything but it gives us a line to cover.\n\t}\n\t```\n"
time=2025-02-03T17:45:25.513Z level=INFO msg="query retry" count=1 total=3 error="error, status code: 400, message: invalid model ID\ngithub.com/symflower/eval-dev-quality/provider/openai-api.QueryOpenAIAPIModel\n\t/app/provider/openai-api/query.go:26\ngithub.com/symflower/eval-dev-quality/provider/openai-api.(*Provider).Query\n\t/app/provider/openai-api/openai.go:67\ngithub.com/symflower/eval-dev-quality/model/llm.(*Model).query.func1\n\t/app/model/llm/llm.go:305\ngithub.com/avast/retry-go.Do\n\t/go/pkg/mod/github.com/avast/[email protected]+incompatible/retry.go:127\ngithub.com/symflower/eval-dev-quality/model/llm.(*Model).query\n\t/app/model/llm/llm.go:300\ngithub.com/symflower/eval-dev-quality/model/llm.(*Model).WriteTests\n\t/app/model/llm/llm.go:275\ngithub.com/symflower/eval-dev-quality/evaluate/task.runModelAndSymflowerFix\n\t/app/evaluate/task/symflower.go:82\ngithub.com/symflower/eval-dev-quality/evaluate/task.(*WriteTests).Run\n\t/app/evaluate/task/write-test.go:103\ngithub.com/symflower/eval-dev-quality/evaluate.Evaluate.func2\n\t/app/evaluate/evaluate.go:165\ngithub.com/symflower/eval-dev-quality/evaluate.withLoadedModel\n\t/app/evaluate/evaluate.go:322\ngithub.com/symflower/eval-dev-quality/evaluate.Evaluate\n\t/app/evaluate/evaluate.go:142\ngithub.com/symflower/eval-dev-quality/cmd/eval-dev-quality/cmd.(*Evaluate).evaluateLocal\n\t/app/cmd/eval-dev-quality/cmd/evaluate.go:527\ngithub.com/symflower/eval-dev-quality/cmd/eval-dev-quality/cmd.(*Evaluate).Execute\n\t/app/cmd/eval-dev-quality/cmd/evaluate.go:508\ngithub.com/symflower/eval-dev-quality/cmd/eval-dev-quality/cmd.Execute.func1\n\t/app/cmd/eval-dev-quality/cmd/command.go:66\ngithub.com/jessevdk/go-flags.(*Parser).ParseArgs\n\t/go/pkg/mod/github.com/jessevdk/[email protected]/parser.go:333\ngithub.com/symflower/eval-dev-quality/cmd/eval-dev-quality/cmd.Execute\n\t/app/cmd/eval-dev-quality/cmd/command.go:69\nmain.main\n\t/app/cmd/eval-dev-quality/main.go:11\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:272\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1700"
time=2025-02-03T17:45:30.517Z level=INFO msg="querying model" model="custom-openai/o3-mini-2025-01-31@reasoning_effort=high" id=0x89e3e0 prompt="\tGiven the following Go code file \"plain.go\" with package \"plain\", provide a test file for this code.\n\tThe tests should produce 100 percent code coverage and must compile.\n\tThe response must contain only the test code in a fenced code block and nothing else.\n\n\t```golang\n\tpackage plain\n\n\tfunc plain() {\n\t\treturn // This does not do anything but it gives us a line to cover.\n\t}\n\t```\n"
time=2025-02-03T17:45:30.707Z level=INFO msg="query retry" count=2 total=3 error="error, status code: 400, message: invalid model ID\ngithub.com/symflower/eval-dev-quality/provider/openai-api.QueryOpenAIAPIModel\n\t/app/provider/openai-api/query.go:26\ngithub.com/symflower/eval-dev-quality/provider/openai-api.(*Provider).Query\n\t/app/provider/openai-api/openai.go:67\ngithub.com/symflower/eval-dev-quality/model/llm.
Solution (optional)
Tasks:
- Check query failures and simply stop the evaluation for that model when retrying cannot help (e.g. an invalid model ID)
- Implement querying the available models, e.g. via https://platform.openai.com/docs/api-reference/models
Metadata
Labels
enhancement: New feature or request