chenjix
Contributor

@chenjix chenjix commented May 27, 2025

No description provided.

@kennymckormick
Member

The current implementation may not be reasonable:

For questions that belong to the refusal category, your evaluation scheme marks the response "correct" if no pyautogui command is detected in the response but the word "wait" is detected. However, this criterion may be unknown to the VLM, since it is not stated in the prompt.
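
To make the concern concrete, here is a minimal sketch of the rule as I understand it. The function name and both regexes are my own assumptions for illustration, not the code in this PR:

```python
import re

# Hypothetical patterns; the PR's actual detection logic may differ.
PYAUTOGUI_PATTERN = re.compile(r"pyautogui\.\w+\s*\(")
WAIT_PATTERN = re.compile(r"\bwait\b", re.IGNORECASE)


def refusal_marked_correct(response: str) -> bool:
    """Return True if a refusal-type response would be scored "correct".

    Sketch of the rule described above: no pyautogui command is present,
    and the word "wait" appears somewhere in the response.
    """
    has_command = PYAUTOGUI_PATTERN.search(response) is not None
    has_wait = WAIT_PATTERN.search(response) is not None
    return (not has_command) and has_wait


if __name__ == "__main__":
    samples = [
        "I will wait for further instructions.",
        "I cannot complete this task.",
        "pyautogui.click(100, 200)",
    ]
    for text in samples:
        print(refusal_marked_correct(text), "-", text)
```

Under this sketch, a response such as "I cannot complete this task." is scored as incorrect only because it lacks the word "wait", which the model has no way of knowing unless the prompt says so.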
