My experience, at least from 2010-2011, was that Selenium-type tests were woefully brittle and unreliable. Are they generally better these days? If so, is it due to different protocols like remote debugging and headless browsers? Please be kind to this old man and his outdated views.
To be honest, that likely had little to do with Selenium itself (although there were fewer options around back then) and more with the expectations around the tests.
UI front-end tests are often brittle because people try to test things through them that should have been tested at earlier stages, either at the API level or the unit level.
To give a simple example: say you have a login screen. It has a username input, a password input, a login button, and finally a div to show any messages.
The only things you actually want to test here are:
1. A successful login action.
2. An action that leads to a message being shown in the message div.
3. *If* there are multiple categories of messages (error, warning, etc.), possibly one message of each category.
What you don't want to test here are all sorts of login variations that ultimately test input validation (API level) or some other mechanism surrounding passwords (possibly a unit-testing concern).
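To make that concrete, here is roughly what those two core checks could look like in Playwright (TypeScript). This is an illustrative sketch only: the URL, selectors, and credentials are made up.

    import { test, expect } from '@playwright/test';

    // Sketch of the "success" case: valid credentials land on a logged-in page.
    test('successful login', async ({ page }) => {
      await page.goto('https://example.com/login');
      await page.fill('#username', 'valid-user');
      await page.fill('#password', 'valid-pass');
      await page.click('#login');
      await expect(page).toHaveURL(/dashboard/); // assumed post-login page
    });

    // Sketch of the "message" case: a failed attempt surfaces in the message div.
    test('failed login shows a message in the message div', async ({ page }) => {
      await page.goto('https://example.com/login');
      await page.fill('#username', 'valid-user');
      await page.fill('#password', 'wrong-pass');
      await page.click('#login');
      await expect(page.locator('#messages')).toBeVisible();
    });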
The problem, certainly in the earlier decade you are talking about, is that companies often take their manual regression tests and throw them straight into automation, forgetting that those manual regression tests are equally brittle; this just gets overlooked because of the way manual tests are run and reported on.
Having said all that, Selenium is still a solid option in a Java environment. But as others have pointed out, there are other very solid options out there, like Playwright.
These can be equally brittle, though, if the tests are not set up properly.
I use Playwright to run an automated test every time I deploy to staging.
I don't think it has caught any real bugs yet, because I haven't actually broken anything, but the Playwright script keeps running reliably. It includes a login and fills out a big, long, complicated form. Works great. Very quick. Selenium was slow and unreliable.
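For anyone curious, such a test can stay quite small. Here is an illustrative sketch along those lines (made-up URLs and selectors, not my actual script):

    import { test, expect } from '@playwright/test';

    // Hypothetical staging smoke test: log in, then fill out a long form.
    test('staging smoke test', async ({ page }) => {
      await page.goto('https://staging.example.com/login');
      await page.fill('#email', process.env.SMOKE_USER ?? '');
      await page.fill('#password', process.env.SMOKE_PASS ?? '');
      await page.click('button[type=submit]');

      await page.goto('https://staging.example.com/apply');
      await page.fill('#name', 'Smoke Test');
      await page.selectOption('#country', 'NL');
      await page.check('#accept-terms');
      await page.click('#submit');
      await expect(page.locator('.confirmation')).toContainText('Thank you');
    });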
Here we just provide natural-language instructions and the LLM generates the appropriate code at any given time.
If the site changes, we can regenerate the code from the same instruction, so unless the site changes a lot, it is quite robust.
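As a rough sketch of that loop in TypeScript (both helpers below are hypothetical stand-ins, not our real API):

    import type { Page } from 'playwright';

    // Hypothetical stand-ins: generateCode() would call the LLM,
    // runScript() would execute the generated Playwright snippet.
    async function generateCode(instruction: string): Promise<string> {
      throw new Error('stub: call the LLM here');
    }
    async function runScript(code: string, page: Page): Promise<void> {
      throw new Error('stub: execute the generated code here');
    }

    // Regenerate-on-failure: if the site changed and the cached code breaks,
    // ask the model again with the same natural-language instruction.
    async function robustAction(instruction: string, page: Page): Promise<void> {
      let code = await generateCode(instruction);
      try {
        await runScript(code, page);
      } catch {
        code = await generateCode(instruction);
        await runScript(code, page);
      }
    }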
Right, so in general I can see development teams themselves using this, because we don't want to sit there and manually write tests.
I'd love to tell it to just log in to my own website, click through certain pieces of functionality, and repeat that, especially for more casual day-to-day tasks.
Heck, we could even auto-generate tests from a bug report (where the steps to reproduce are written in plain English by non-technical testers).
That means less time for a dev to actually reproduce those steps, right?
Exactly! In the future, testers could just write tests in natural language.
Every time we detect, for instance with a vision model, that the interface has changed, we ask the Large Action Model to recompute the appropriate code and have it executed.
Regarding generating tests from bug reports: totally possible! For now we focus on having a good mapping from low-level instructions ("click on X") -> code, but once we solve that, we can have another AI turn bug reports into low-level instructions and feed those to the previously trained LLM!
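In sketch form, that two-stage pipeline could look like this (again, both helpers are hypothetical stand-ins, not anything we ship today):

    // Hypothetical two-stage pipeline: bug report -> low-level steps -> code.
    async function stepsFromBugReport(report: string): Promise<string[]> {
      // A second model turns the plain-English report into low-level steps,
      // e.g. ['click on "Login"', 'fill "Email" with "test@example.com"'].
      throw new Error('stub');
    }
    async function codeFromStep(step: string): Promise<string> {
      // The trained LLM maps each low-level step ('click on X') to code.
      throw new Error('stub');
    }

    async function testFromBugReport(report: string): Promise<string> {
      const steps = await stepsFromBugReport(report);
      const snippets: string[] = [];
      for (const step of steps) {
        snippets.push(await codeFromStep(step));
      }
      return snippets.join('\n');
    }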
Really like your use case and would love to chat more about it if you are open. Could you come on our Discord and ping me? https://discord.gg/SDxn9KpqX9