How are you mapping from "click this element" (presumably obtained via a VLM) to the actual DOM locator that refers to it?
I guess Playwright can do it in "record" mode; I'm curious how you do it from a Chrome extension.
Spitballing here: do you inject an event filter on the page and, when the click happens, grab the element and run some code to synthesize a selector that refers only to that element? (Presumably you could just reuse Playwright's element-to-locator code at this point.)
So when you go into the "selector" mode, the plugin adds event listeners to all the DOM nodes. When you click, it first generates a bunch of candidate selectors statically (multiple, both CSS- and XPath-based), and then, based on your guidance, it's the job of agent4 to turn them into stable selectors.
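
To make the static step concrete, here is a rough sketch of how candidate selectors could be derived from a clicked element. This is not the extension's actual code: `ElLike`, `indexAmongSameTag`, `cssCandidate`, and `xpathCandidate` are hypothetical names, and the heuristics (stop at the nearest ancestor with a simple `id`, otherwise fall back to positional steps) are assumptions chosen for illustration.

```typescript
// Hypothetical structural subset of the DOM Element interface. A real
// Element satisfies this shape, and the sketch also runs outside a browser.
interface ElLike {
  tagName: string;
  id: string;
  parentElement: ElLike | null;
  children: ArrayLike<ElLike>;
}

// 1-based position of `el` among siblings that share its tag name,
// plus how many such siblings exist.
function indexAmongSameTag(el: ElLike): { index: number; total: number } {
  const parent = el.parentElement;
  if (!parent) return { index: 1, total: 1 };
  let index = 0;
  let total = 0;
  for (let i = 0; i < parent.children.length; i++) {
    const sib = parent.children[i];
    if (sib.tagName === el.tagName) {
      total++;
      if (sib === el) index = total;
    }
  }
  return { index, total };
}

// CSS candidate: walk up to the nearest ancestor with a simple id,
// using :nth-of-type only where the tag alone is ambiguous.
// Assumption: ids that are not plain identifiers are skipped rather than escaped.
function cssCandidate(el: ElLike): string {
  const parts: string[] = [];
  let cur: ElLike | null = el;
  while (cur) {
    if (cur.id && /^[A-Za-z_][\w-]*$/.test(cur.id)) {
      parts.unshift(`#${cur.id}`);
      break;
    }
    const tag = cur.tagName.toLowerCase();
    const { index, total } = indexAmongSameTag(cur);
    parts.unshift(total > 1 ? `${tag}:nth-of-type(${index})` : tag);
    cur = cur.parentElement;
  }
  return parts.join(" > ");
}

// XPath candidate: absolute positional path from the root element.
function xpathCandidate(el: ElLike): string {
  const parts: string[] = [];
  let cur: ElLike | null = el;
  while (cur) {
    const { index } = indexAmongSameTag(cur);
    parts.unshift(`${cur.tagName.toLowerCase()}[${index}]`);
    cur = cur.parentElement;
  }
  return "/" + parts.join("/");
}
```

In a content script, both functions could be called on `event.target` inside a click listener, and each candidate checked with `document.querySelectorAll` to confirm it matches exactly one element. Purely positional selectors like these tend to break when the page layout changes, which is presumably why a later pass is needed to make them stable.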