Otherwise, a combination of well-known class names, ‘accept’ strings, and heuristics such as z-index and position: fixed/sticky, etc. can also narrow down which elements are likely to be modals/banners.
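A rough sketch of that kind of scoring, assuming you have already pulled each element's class list, visible text, and computed `position`/`z-index` out of the page (e.g. from a headless browser). The `ElementInfo` fields, the hint lists, the weights, and the threshold are all hypothetical and would need tuning against real sites:

```python
import re
from dataclasses import dataclass, field

# Hypothetical element descriptor; field names are assumptions, not a real API.
@dataclass
class ElementInfo:
    classes: list[str] = field(default_factory=list)
    text: str = ""
    position: str = "static"  # computed CSS 'position'
    z_index: int = 0          # computed 'z-index' ('auto' treated as 0)

# Illustrative hints only; real consent vendors use many more class names.
KNOWN_CLASS_HINTS = ("cookie", "consent", "gdpr", "cmp", "banner", "modal")
ACCEPT_PATTERN = re.compile(r"\b(accept|agree|allow all|got it)\b", re.IGNORECASE)

def banner_score(el: ElementInfo) -> int:
    """Rough score: higher means more likely to be a cookie banner/modal."""
    score = 0
    lowered = [c.lower() for c in el.classes]
    if any(hint in c for c in lowered for hint in KNOWN_CLASS_HINTS):
        score += 2  # well-known class name substring
    if ACCEPT_PATTERN.search(el.text):
        score += 2  # 'accept'-style button text
    if el.position in ("fixed", "sticky"):
        score += 1  # overlays the page while scrolling
    if el.z_index >= 1000:  # arbitrary threshold, tune per site
        score += 1
    return score

def likely_banners(elements: list[ElementInfo], threshold: int = 3) -> list[ElementInfo]:
    """Keep only candidates whose score meets the (assumed) threshold."""
    return [el for el in elements if banner_score(el) >= threshold]
```

Requiring several signals at once (threshold 3 here) is what keeps a sticky site header or an article that happens to contain the word "accept" from being removed on a single match.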
You could also ask a vision model whether a screenshot has a cookie banner, and ask for co-ordinates to remove it, although this could get expensive at scale!
Thanks, that's a great idea! I was originally going to go the vision model route because I'd also like people to be able to send instructions to sign in with some credentials (like when visiting the nytimes or something).
yeah that's basically what we did here at https://VisualSitemaps.com, but it can also quickly become over-the-top, and you may end up removing important content. That's why in the end we added a second option to just manually enter CSS classes.
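For the manual option, one possible approach (an assumption, not necessarily how VisualSitemaps does it) is to turn the user-entered class names into a small stylesheet that hides them, then inject it before taking the screenshot, for example with Playwright's `page.add_style_tag(content=...)`. The helper name `hide_classes_css` is hypothetical, and it only accepts bare class names rather than arbitrary selectors:

```python
import re

# Conservative pattern for a plain CSS class name (assumption: users enter
# bare names like "cookie-banner", not full selectors like "div > p").
_CLASS_NAME = re.compile(r"^-?[_a-zA-Z][_a-zA-Z0-9-]*$")

def hide_classes_css(class_names: list[str]) -> str:
    """Build a stylesheet that hides every element carrying one of the given classes."""
    selectors = []
    for raw in class_names:
        name = raw.strip().lstrip(".")  # tolerate a leading dot, e.g. ".cookie-banner"
        if not name:
            continue  # skip blank entries
        if not _CLASS_NAME.match(name):
            raise ValueError(f"not a simple CSS class name: {raw!r}")
        selectors.append("." + name)
    if not selectors:
        return ""
    return ", ".join(selectors) + " { display: none !important; }"
```

Validating input instead of passing it through keeps a stray brace or semicolon in user input from breaking (or injecting into) the rest of the stylesheet.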
https://chromewebstore.google.com/detail/consent-o-matic/mdj...