ryanhoangt

Hi, thanks for the project! I'm trying to implement and experiment with coordinate-based actions from browsergym and it would be useful if the environment exposes this info via the observation. Not sure what the team thinks about this?

One quirk: there seems to be no direct way to get the mouse position from Playwright, so I use a somewhat hacky way to get that info.

Collaborator

@gasse gasse left a comment

Nice feature! See my comments to make it robust to iFrames

window.addEventListener("load", () => {window.browsergym_page_activated();}, {capture: true});
window.addEventListener("pageshow", () => {window.browsergym_page_activated();}, {capture: true});
window.addEventListener("mousemove", () => {window.browsergym_page_activated();}, {capture: true});
window.addEventListener("mousemove", (event) => {window.browsergym_page_activated(); window.pageX = event.clientX; window.pageY = event.clientY;}, {capture: true});
Collaborator

Clean and simple, I like this

Returns:
An array of the x and y coordinates of the mouse location.
"""
position = page.evaluate("""() => {
Collaborator

This will work for simple pages, but I'm worried about iframes. Here is something that could work:

  • in the JS callback (mousemove), record the position in JS in the window object, and also record which page / frame received this event, in Python with a method similar to _activate_page_from_js().
  • to extract the mouse position in the browser viewport, take the latest mouse position (last iframe that received a mousemove event), and work your way up the frame hierarchy to reconstruct the current mouse position. See how we do that to get the coordinates of all elements in all iframes here:

https://github.com/ServiceNow/BrowserGym/blob/main/browsergym/core/src/browsergym/core/observation.py#L293-L377
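The frame-hierarchy walk described above can be sketched roughly as follows. This is only an illustration, not the code from observation.py: the function name `to_top_level_viewport` and the `iframe_offsets` argument are hypothetical, and it assumes the offset of each enclosing iframe's content box in its parent's viewport has already been collected (borders, padding and CSS transforms are ignored).

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]


def to_top_level_viewport(local_pos: Point, iframe_offsets: Iterable[Point]) -> Point:
    """Map a mouse position recorded inside a frame to the top-level viewport.

    local_pos: (clientX, clientY) seen by the frame that last received a mousemove.
    iframe_offsets: hypothetical input, the (left, top) of each enclosing iframe
        measured in its parent's viewport, ordered from innermost to outermost.
    """
    x, y = local_pos
    for left, top in iframe_offsets:
        # Each level shifts the point by where the iframe sits in its parent.
        x += left
        y += top
    return (x, y)


# Mouse at (10, 20) in an iframe placed at (100, 50), itself nested in an
# iframe placed at (5, 300) in the main page.
print(to_top_level_viewport((10, 20), [(100, 50), (5, 300)]))  # (115, 370)
```

With an empty `iframe_offsets` the position is returned unchanged, which matches the simple single-page case.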


obs, reward, term, trunc, info = env.step(action)
checkbox = get_checkbox_elem(obs)
assert obs['mouse_position'] == [x, y]
Collaborator

That's a good test, can you do the same for other pages which have iFrames, and check you get the correct coordinates when clicking on elements inside the iframe? (clicking with coordinates, and with bid)

@gasse gasse force-pushed the add-mouse-position branch from cd33d61 to a701498 Compare December 3, 2024 18:49
@gasse
Collaborator

gasse commented Dec 3, 2024

BTW, a cool way to try this feature is to run an open-ended agent on a whiteboard and ask it to draw simple shapes, like we did for the demo video here
https://github.com/ServiceNow/BrowserGym/

@gasse
Collaborator

gasse commented Dec 3, 2024

Seems like there is pageX, pageY but also clientX, clientY
https://michaelwornow.net/2024/01/02/display-x-y-coords-chrome-debugger

https://developer.mozilla.org/en-US/docs/Web/API/MouseEvent/clientX
https://developer.mozilla.org/en-US/docs/Web/API/MouseEvent/pageX

Only way to know how / which one of these to use is to write some tests :)

@ryanhoangt
Author

Seems like there is pageX, pageY but also clientX, clientY
https://michaelwornow.net/2024/01/02/display-x-y-coords-chrome-debugger

From the blog, it seems like clientX/clientY is relative to the viewport, and pageX/pageY is relative to the whole web page. I think clientX/clientY is closer to what we want 🤔
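For reference, the two coordinate systems differ only by the document's scroll offset (pageX = clientX + scrollX, and likewise for Y). A minimal sketch of the conversion, with hypothetical helper names and the scroll offsets passed in explicitly:

```python
from typing import Tuple


def client_to_page(client_x: float, client_y: float,
                   scroll_x: float, scroll_y: float) -> Tuple[float, float]:
    """Viewport-relative coordinates -> document-relative coordinates."""
    return (client_x + scroll_x, client_y + scroll_y)


def page_to_client(page_x: float, page_y: float,
                   scroll_x: float, scroll_y: float) -> Tuple[float, float]:
    """Document-relative coordinates -> viewport-relative coordinates."""
    return (page_x - scroll_x, page_y - scroll_y)


# Page scrolled down by 400 px: a pointer at (120, 80) in the viewport
# sits at (120, 480) in the document.
print(client_to_page(120, 80, 0, 400))   # (120, 480)
print(page_to_client(120, 480, 0, 400))  # (120, 80)
```

On an unscrolled page both systems give the same numbers, so a test that distinguishes them needs a page that has actually been scrolled.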

@amanjaiswal73892 amanjaiswal73892 linked an issue Jul 16, 2025 that may be closed by this pull request
@amanjaiswal73892 amanjaiswal73892 added the enhancement New feature or request label Jul 16, 2025
@recursix
Collaborator

I would like to move forward with this, but the current code will not work universally.
Could we iterate on this? @ryanhoangt, are you still interested in working on this?
This chat with Claude is inspiring. It seems like we would need to update all action functions in bgym so that each one updates a global variable containing the appropriate info.

It might not be the best solution, but we could iterate on this.
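One possible shape for the global-variable idea, as a rough sketch only: every name here (`_last_mouse_position`, `records_mouse_position`, `get_mouse_position`) is hypothetical, and `mouse_click` is a stand-in rather than the real bgym action. Bid-based actions would additionally need to resolve the element's coordinates before recording them.

```python
import functools
from typing import Callable, Optional, Tuple

# Hypothetical module-level state holding the last known pointer position.
_last_mouse_position: Optional[Tuple[float, float]] = None


def records_mouse_position(action: Callable) -> Callable:
    """Wrap a coordinate-based action so it records where the mouse ended up."""
    @functools.wraps(action)
    def wrapper(x: float, y: float, *args, **kwargs):
        global _last_mouse_position
        result = action(x, y, *args, **kwargs)
        # Only record the position once the action has run without raising.
        _last_mouse_position = (x, y)
        return result
    return wrapper


@records_mouse_position
def mouse_click(x: float, y: float, button: str = "left") -> str:
    # Stand-in body; a real action would drive the browser here.
    return f"click {button} at ({x}, {y})"


def get_mouse_position() -> Optional[Tuple[float, float]]:
    """Value an observation could expose; None until an action has run."""
    return _last_mouse_position


mouse_click(42, 17)
print(get_mouse_position())  # (42, 17)
```

A decorator keeps the bookkeeping in one place, so individual action functions would not each need to touch the global directly.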

Development

Successfully merging this pull request may close these issues.

mouse coord as observation

4 participants