Often, there is benefit, but it's quickly swamped by the maintenance effort.
In the group I'm working with now, we're transitioning to all user interaction being abstracted into an AppModel domain object, which describes the interaction in an abstract way that is not specific to the GUI or any other interface. We will be implementing meta-level, syntax-driven automated coding standards to detect probable violations. In addition, our (currently incomplete) Unit Test suite will be run under a Code Coverage tool against change requests, so any new change request will have to be covered by a Unit Test, which a nightly script will verify. (The above tools are based on the Smalltalk Refactoring Browser parser & libraries.)
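A minimal sketch of the AppModel idea, in Python rather than Smalltalk, with hypothetical class and method names (not from the actual codebase): interaction is expressed against a plain domain object, so a GUI, a Web Service, or a unit test can all drive the same logic.

```python
# Hypothetical sketch of an interface-agnostic AppModel: nothing here
# knows about widgets, HTML, or input events.
class AppModel:
    def __init__(self):
        self.items = []
        self.selection = None

    def add_item(self, item):
        self.items.append(item)

    def select(self, index):
        # Selection is expressed abstractly; a GUI maps a click to
        # this call, while a test calls it directly.
        self.selection = self.items[index]

# A unit test can exercise the interaction with no GUI at all:
model = AppModel()
model.add_item("first report")
model.select(0)
print(model.selection)  # prints "first report"
```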
One benefit -- we will be able to test everything in Unit Tests and leave out GUI specifics!
Another benefit of the AppModel architecture: we can eventually publish our abstracted applications as Web Services and get out of the business of maintaining nitty-gritty GUI code. This will make the group 10X more valuable to their corporation, and they will eventually be doing only half the work!
I've found that it's often worth unit-testing functionality that depends upon complex combinations of GUI state. For example, "Show the menu only when there is a selected bar and the baz list contains at least two items that can be foozled together." These interactions are very often wrong, and even if you code them right, it's nice to have a record of what the spec was so when someone comes along and says "I can make this code much cleaner if I eliminate the check for foozling" (or worse, they change the definition of foozling), the test breaks and you remember there was a reason you put it there in the first place.
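A rule like that is easy to pull out into a pure function and pin down with a unit test. A sketch, following the (hypothetical) foozling example above, and reading "at least two items that can be foozled together" as "some pair of items foozles":

```python
from itertools import combinations

def show_foozle_menu(selected_bar, baz_items, can_foozle):
    """Show the menu only when there is a selected bar and the baz
    list contains at least two items that can be foozled together."""
    if selected_bar is None:
        return False
    # Any pair in the baz list that foozles together satisfies the rule.
    return any(can_foozle(a, b) for a, b in combinations(baz_items, 2))

# These assertions double as a record of the spec, so a later
# "cleanup" that drops the foozling check breaks them loudly:
assert not show_foozle_menu(None, ["a", "b"], lambda a, b: True)
assert not show_foozle_menu("bar", ["a"], lambda a, b: True)
assert not show_foozle_menu("bar", ["a", "b", "c"], lambda a, b: False)
assert show_foozle_menu("bar", ["a", "b"], lambda a, b: True)
```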
I find that it's a waste to test things like appearance, position, labels - basically anything that should be in CSS or other declarative specifications. But UIs often have quite a bit of actual logic in them (ironically, because real humans are often strikingly illogical), and that should all get tested.
They break down because UI testing tends to rely on strings: either labels for controls or embedded ids. It's really, really easy to have those change on you, and when they do, the tests become difficult to debug: if the button with the id "Search" isn't found, is it because something broke and the button doesn't show up, because the id has changed to "SearchUsers," or what? And if I need to remove the "Search" button and replace it with something else, how do I figure out which tests to change? The high-level nature of the tests inherently makes them much harder to debug, because a test could have broken for any of a hundred reasons.
In other words, with unit tests the linkage between the code being tested and the test tends to be pretty tight; with UI tests it tends to be very, very loose, and that makes the tests correspondingly more fragile and much harder to debug.
We essentially do typesafe metaprogramming on our web UI: it generates compile-time-checked constants for all the labels, buttons, etc., so our tests don't compile if the UI changes. That has gone a long way toward keeping the tests stable. It's the best solution we've come up with, but it's a huge investment, and our attempts to test our Swing client the same way have met with less success so far.
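The shape of that trick can be sketched in Python (the original is presumably a statically typed language, where the break happens at compile time; in a dynamic language the codegen step still centralizes every UI string in one generated module, so a rename breaks all affected tests in one obvious place). All names here are hypothetical:

```python
# Hypothetical registry: in a real build this would be derived from
# the UI definition itself, not maintained by hand.
UI_ELEMENTS = {
    "SEARCH_BUTTON": "Search",
    "USER_LIST": "userList",
}

def generate_constants(elements):
    """Emit the source of a constants module. Tests import the
    generated module instead of embedding raw id strings, so a
    renamed control fails loudly in every test that touches it."""
    lines = ["# Auto-generated from the UI definition; do not edit."]
    for name, element_id in sorted(elements.items()):
        lines.append(f'{name} = "{element_id}"')
    return "\n".join(lines)

print(generate_constants(UI_ELEMENTS))
```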
In my experience, I have sunk a lot of time making sure that foo div has bar css class when the quux link is clicked... but it has never saved me much time. I have to click through the site regularly (content changes, "does this work in IE", etc.) anyway, and errors are usually noticeable immediately.
Other problems I've noticed are that such tests are either so specific that they fail for no good reason (it was supposed to be red, not green!) or so general that they don't catch issues that would annoy users.
If I could have these tests for free, I'd take 'em. But since they're expensive and don't get me much, I don't bother.
(BTW, if you have complicated algorithms in your JavaScript, refactor so you can test them with Rhino on the command line. Don't do this stuff in the browser!)
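The principle behind that aside applies in any language; here it is sketched in Python rather than JavaScript (function name hypothetical): keep the algorithm a pure function with no DOM or browser dependency, so it can be run and tested from the command line.

```python
def dedupe_preserving_order(values):
    """A pure algorithm with no browser dependency: drop duplicates
    while keeping first-seen order. Testable headlessly."""
    seen = set()
    result = []
    for value in values:
        if value not in seen:
            seen.add(value)
            result.append(value)
    return result

# Runs anywhere, no browser required:
assert dedupe_preserving_order([3, 1, 3, 2, 1]) == [3, 1, 2]
```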
Anyway, I would be interested in hearing your UI testing success stories.
"Anyway, I would be interested in hearing your UI testing success stories."
For Web apps, I use Selenium. I use the Firefox plug-in to record scenarios, with some hand-tweaking to replace any odd XPath expressions with more robust references to IDs. I then periodically run suites of UI tests to see that things still behave as they should.
Yes, there are times when page content changes and breaks a test, but it's trivial to see where that is happening, so I've not had a problem keeping them up-to-date.
And having automated integration tests is way faster and more reliable than manually clicking through a site. I've caught numerous bugs this way, mostly in pages that do not typically get much use in real life (but tend to be the first thing a client tries when showing off code. Go figure.)
So, with Web+Selenium there's not much overhead to creating and maintaining a set of tests. It's a big win to be able to kick off a full suite and automatically run through a site far faster and more reliably than I could by hand.
It's not so good for desktop apps. I've been looking at Swinger, which marries Cucumber with Jemmy, for integration testing of Swing apps. As far as I know there are no good tools for recording user actions, so tests need to be constructed by hand. This makes it harder to assemble tests that capture assorted complex interactions.
But, at some point, someone has to actually test the app itself, so while it's time consuming to assemble automated UI tests, it may pay for itself over time since it reduces the effort in manually walking through the app.
So far neither of these would replace unit and functional testing, and neither catch all bugs as experienced by the end user, but they do reduce the number of problems that make it into a release. For Web apps, the effort spent is well worth it. For desktop apps, I've yet to reach that balance. However, things keep improving.