
Is this 'just' an insane amount of manual effort? I'm not particularly up on ML, but at first I suspected it was involved, partly just because of the 'insane amount of manual effort', but also because many clips seem cut too short. But the repo makes it seem very much like a manual clip collection; it acknowledges a 'listicle' of Wilson 'wow' films with 'wow' counts (a slight reduction in novel manual effort... but no timestamps).



100% manual effort, you're right. Took me about three weeks to compile all the clips and data in my spare time and upload them all to Contentful. I'm aware of three clips that others have told me to make a bit longer: I Spy, Drillbit Taylor (second wow), and Shanghai Knights (third wow). If there are others, feel free to let me know; the more input, the better.


I have to ask, what motivated you to see it through?


Basically, I had just finished coding a Twitter bot called @BreakingCasting, which posts fake movie remake casting announcements. I really liked using the TMDB API for that, so I figured I'd make another small project along the lines of movies/actors/Hollywood until I came up with something more substantive.

I'm pretty sure I came up with this Wow API in the shower and figured it'd be fun and easy; it ended up being fun and tedious. I started working on it on Valentine's Day and finished in early March. The main motivation to finish was my current project, Hollywoodle (https://hollywoodle.ml/), which I refused to start until the Wow API was done.


@BreakingCasting is really cool, congratulations. One suggestion, which you are probably already aware of: it would be nice if the bot could choose actors of the proper age, depending on the difference between the years of the original and the remake. Warren Beatty is older than Harrison Ford, for example!


From the 'read more' section of the site:

> In 2015, YouTuber Owenergy uploaded a video showing a supercut of movies (in chronological order) in which Owen Wilson says the word "wow." This video claimed that Owen Wilson has said the word "wow" a total of 102 times (spanning the years 1996-2017) over the course of his film career.

> Many of the scenes mentioned in Owenergy's famous video mistakenly count phrases such as "oh," "whoa," and "pow" as "wow" occurrences and are therefore not included as part of the Owen Wilson Wow API - the total "wow" count of which stands at 91 as of 2022. Additionally, many of the "wow" scenes in Owenergy's YouTube video are out of order and are corrected in this API.


A true fan would have all of his films downloaded including the closed captions.
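If the captions are text-based (for example SubRip .srt files), a first pass at finding candidate scenes could be automated. The sketch below is a minimal illustration under that assumption; the function name `find_wows` and the sample cues are hypothetical. It uses whole-word, case-insensitive matching so that "whoa," "oh," and "pow" (the words the site says Owenergy's count mistakenly included) are not counted, and it returns each cue's start timestamp.

```python
import re
from typing import List, Tuple

# Matches an SRT timing line such as "00:01:02,500 --> 00:01:04,000".
TIMING = re.compile(r"(\d{2}:\d{2}:\d{2},\d{3})\s*-->\s*(\d{2}:\d{2}:\d{2},\d{3})")
# Whole-word, case-insensitive "wow"; \b keeps "whoa", "pow", and "oh" from matching.
WOW = re.compile(r"\bwow\b", re.IGNORECASE)


def find_wows(srt_text: str) -> List[Tuple[str, str, int]]:
    """Return (start timestamp, caption text, wow count) for each cue saying "wow"."""
    hits = []
    # SRT cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        start = None
        text_lines = []
        for line in block.splitlines():
            match = TIMING.search(line)
            if match:
                start = match.group(1)  # keep only the cue's start time
            elif start is not None:
                text_lines.append(line.strip())  # caption text follows the timing line
            # Lines before the timing line (the cue index) are ignored.
        text = " ".join(text_lines)
        count = len(WOW.findall(text))
        if start is not None and count:
            hits.append((start, text, count))
    return hits
```

As the replies point out, caption text often doesn't match the audio, so any hits would still need to be checked against the actual scene.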


As someone who needs closed captioning at this point in their life, but can still understand most things, let me tell you how bad closed captioning is. I would not rely on CC to be more than 65% accurate.


I am sorry and surprised to hear that. I would think that, especially now, it wouldn't be too hard to auto-generate a good portion.

My experience is very limited but sometimes people ask me to turn on subtitles and they are usually perfectly fine, even the ones that random people on the internet contribute for free - can you elaborate?


> even the ones that random people on the internet contribute for free - can you elaborate

these typically are the best available, but not always quite right.

the CC provided by WGBH is usually pretty good as well, since they are mission-focused on delivering good closed captioning; it's just that not everyone wants to use them (or can't schedule them? not sure)

unfortunately, once you leave those two, it's a crapshoot as to whether they make sense or truly convey the message that the audio is conveying. some of them seem to go by what I'm guessing is the original script, which can be fairly close in meaning to what they're saying; others are just a mishmash of misspelled words that don't necessarily convey the meaning of the audio.

the worst is live sports, which really doesn't have to be that way: I've been to conferences that were extremely inclusive and had real-time (remote) closed captioning, and seeing how good those can be makes me shake my head in wonder when I see typical sports closed captioning. eek.

I'm stuck in the middle where I can still hear and interpret most words, so I experience a lot of incongruence when I'm relying on both the CC and the audio and they don't match.


You ever try Google Docs voice dictation? It is surprisingly good; I think they pushed an update recently. I know it isn't quite the same thing, but it seems like such a solvable problem considering they have all the pieces.

YouTube CC is hit or miss, but sometimes it's perfect. I figure a narrow domain like live sports is eminently solvable. Maybe it is time to promote audio captchas. Hmm. This is a bummer; thank you for sharing, I would not have guessed.


I've done some commercial CC work recently for a studio and it is REALLY, REALLY hard work. Way harder than it looks to get it right.

Some pieces I worked on would have a computer-generated first pass. This would be good in places where the words were very clear and the vocabulary was regular. But TV and films can use a lot of weird domain words (see e.g. sci-fi or fantasy) that the computer can't track. And the computer has serious problems with proper nouns and names.

On top of that you have to assign each phrase to a character who might or might not be visible when they are speaking (which sucks in a whole room full of people with similar voices), and you have to time it correctly.

I got fired over an argument about comma placement.


> I got fired over an argument about comma placement.

hopefully not for wanting to use an Oxford comma!


> I got fired over an argument about comma placement.

What happened? Over a comma?


Honestly, a lot of the free internet ones are better.


Closed captions on Blu-rays and DVDs are stored as bitmap images, so it wouldn't be possible to search those, although I'm sure the scripts are somewhere on the internet.


You can just OCR them, although you should really just watch all of them manually for fun, like I'm doing with Nick Cage.



