
> Any reason why everyone seems to be stuck on this problem?

Because it's really, really difficult. A lot of AI-ish stuff pretty rapidly gets to the point where it _looks_ quite impressive, but struggles to make the jump to actual feasibility. Like, there were convincing demos of voice recognition in the mid-90s. You could buy software to transcribe voice on your home computer, and people did. And, now, well, it's better than in the mid-90s certainly, but you wouldn't trust it to write a transcript, not of anything important. Maybe in 2040 we'll have voice recognition that can produce a perfect transcript, and human transcription will be a quaint old-fashioned concept. But I wouldn't like to bet on it, honestly.

And voice recognition is arguably a far, far easier problem.



I'm old enough to remember all the breathless pronouncements in 2015 about how self-driving cars would be everywhere within 5 years. I was skeptical then, and I'm still skeptical that it will ever happen outside of some limited range of well-known paths that have been pre-determined to be safe — i.e., we'll have (and we do have) some self-driving cars, but they'll essentially be on a closed course and will not leave the course. There are already some small bus lines like that: the bus just goes around a closed circuit picking up and dropping off people, always taking the same route.


As a transportation planner, I’m so deep in a bubble that I didn’t realize people actually believed the “FSD is right around the corner” stuff.


We're still trying to convert to electric; self-driving is far out there.


These are mostly unrelated. Some of Waymo's vehicles are ICE.


My mistake, I was trying to say `mass adoption of` and bungled it.


I'd argue that voice recognition is in a worse place now — It creates a FALSE sense that it can do a good transcription, when it is actually corrupted in the worst way.

I took part in a legal deposition where an "AI Transcription Software" was being used. When I received the transcript it had numerous errors, but they were all subtle. More common names were inserted in place of the name that I said, e.g., "Kennedy" instead of "Kemeny". "You have a [something]" was transcribed as "I have a [something]", completely reversing the meaning. And many more errors.

The common thread between the errors was that what was inserted into the transcript would have been the MOST EXPECTED word or phrase, instead of the ACTUAL MORE SURPRISING (surprising in an information-theory way) word or phrase. It's evident that on top of the phoneme recognition layer, this transcription software checked questionable items against tables/graphs of most likely words to occur near the other words it confidently identified in that context. Makes a transcription sound great, but it is WRONG.
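The rescoring failure described above can be sketched in a few lines. This is a toy illustration with invented scores and names (`decode`, `acoustic_logp`, `lm_logp` are all hypothetical), not the actual software's implementation: it just shows how a language-model prior, weighted against acoustic evidence, makes the decoder prefer the expected word ("Kennedy") over the surprising one the speaker actually said ("Kemeny").

```python
import math

def decode(candidates, lm_weight=1.0):
    """Pick the candidate maximizing acoustic log-prob plus weighted LM log-prob."""
    return max(
        candidates,
        key=lambda c: c["acoustic_logp"] + lm_weight * c["lm_logp"],
    )

# The speaker said "Kemeny": the acoustic model slightly favors it,
# but the language model strongly prefers the far more common "Kennedy".
candidates = [
    {"word": "Kemeny",  "acoustic_logp": math.log(0.55), "lm_logp": math.log(0.0001)},
    {"word": "Kennedy", "acoustic_logp": math.log(0.45), "lm_logp": math.log(0.02)},
]

print(decode(candidates, lm_weight=1.0)["word"])  # "Kennedy" — the expected word wins
print(decode(candidates, lm_weight=0.0)["word"])  # "Kemeny" — acoustics alone get it right
```

The point of the toy: with any substantial LM weight, the decoder systematically erases exactly the low-probability (high-information) words that matter most in a deposition.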

The result was that the "AI Transcription" actively destroyed key information and hid that destruction under the guise of a smoothly edited transcription.

Although this surely was not the intent of the system's creators, I cannot think of a better way to make a more evil transcription system.


I also imagine it is a space where...you very quickly get to a product that seems magical...but you can see in the data the places where your magical product will regularly be unable to make clear decisions. Unlike a lot of traditional product development, that knowledge makes it very hard for you to release, because the liability and ethical problems are very real.

It's somewhat unfair because we used to simply collect less comprehensive data on performance and therefore know less about our corner cases - but you don't get to live in the future without dealing with the problems of the future.


> You could buy software to transcribe voice on your home computer, and people did. And, now, well, it's better than in the mid-90s certainly, but you wouldn't trust it to write a transcript, not of anything important.

Is this the fault of the software, or of people frequently using bad sound equipment in poor, noisy conditions: talking on the phone while driving, on the street, over a bad connection, in wind or rain, etc.?

Personally, because English isn't my first language, I frequently "fail" to transcribe what is being said and have to ask the other person to repeat themselves. It seems like an AI voice transcriber in 2022 is going to work better than I do.


Even in ideal recording conditions, machines are quite bad at transcribing human speech. Look at subtitles on live TV shows, typically produced these days by machine transcription; they're often barely usable.


Those subtitles are how I learned English in the States. They're actually quite good.

Not sure what you're talking about really.



