So there’s a very big difference in the sort of vision approach that browser-use...

nikisweeting · 2025-06-28T11:34:10 1751110450

In our experience, DOM-based interaction is more repeatable and performant than vision / x,y-based interaction, but each has its tradeoffs. As you said, click-and-drag is harder when the source and target aren't classic DOM elements (e.g. canvas). We'll likely add x,y-based interaction as a fallback method at some point.
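
To make the fallback idea concrete, here is a rough sketch of "try the DOM first, fall back to x,y". It is not how browser-use or any real library implements it: `Page`, `Target`, `click`, and every method name below are hypothetical, invented just for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Protocol, Tuple


class Page(Protocol):
    """Hypothetical minimal page interface; not any real library's API."""

    def query(self, selector: str) -> Optional[object]: ...
    def click_element(self, element: object) -> None: ...
    def click_at(self, x: float, y: float) -> None: ...


@dataclass
class Target:
    # DOM selector, set when the target is a regular DOM element.
    selector: Optional[str] = None
    # Viewport coordinates, e.g. as proposed by a vision model (assumption).
    xy: Optional[Tuple[float, float]] = None


def click(page: Page, target: Target) -> str:
    """Click via the DOM when possible, else via coordinates.

    Returns "dom" or "xy" to report which path was taken.
    """
    if target.selector is not None:
        element = page.query(target.selector)
        if element is not None:
            page.click_element(element)
            return "dom"
    # No selector, or the selector matched nothing (e.g. content drawn in a canvas).
    if target.xy is not None:
        page.click_at(*target.xy)
        return "xy"
    raise LookupError("target has no resolvable selector and no coordinates")
```

The ordering reflects the point above: the DOM path is preferred for repeatability, and coordinates only come into play when no element can be resolved.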