Hey HN,
I've been working on this side project for the past month. It generates a step-by-step guide from a YouTube video, so you can follow along without watching the whole thing. It's best suited to tutorial videos but can work for other videos as well. No BS, just straight to the point.
The guides are generated purely from the transcript, so you don't have to worry about it being AI. It's my first project as a total beginner, something I had to do in order to get out of tutorial hell.
Please let me know if you have any suggestions or if you run into any problems or bugs. I'll try to fix them as best I can and as soon as possible.
I would appreciate your feedback on this. Let me know what you think!
One question: on the backend, is it downloading each video's CC (closed-caption) transcript and feeding that into a tuned prompt? What happens for videos where this is missing? Asking because I've noticed CC is occasionally unavailable for some YouTube videos.
If you care to have a fallback, a potentially interesting experiment / solution for such cases is to download the video, extract the audio to a WAV file, then run the audio through Whisper [1] to generate the transcript. On a CPU it will be incredibly intensive and slow, generally not much faster than real time (e.g. a 5-minute clip will take on the order of 5 minutes to transcribe). However, with Whisper running on a fancy GPU it is insanely faster, between 100-200x, meaning that even for long videos the transcript will be ready within seconds.
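For what it's worth, here is a rough sketch of that fallback in Python. It assumes ffmpeg is on the PATH, the openai-whisper package is installed, and the video file has already been downloaded. The function names (ffmpeg_extract_cmd, transcribe_fallback, estimated_seconds) and the "base" model choice are mine, purely for illustration, and say nothing about how the site actually works.

```python
import subprocess
from pathlib import Path


def ffmpeg_extract_cmd(video_path, wav_path):
    """Build an ffmpeg command that writes the audio track as 16 kHz mono WAV."""
    return [
        "ffmpeg", "-y",          # overwrite the output file if it exists
        "-i", str(video_path),   # input video
        "-vn",                   # drop the video stream
        "-ac", "1",              # mono
        "-ar", "16000",          # 16 kHz sample rate
        str(wav_path),
    ]


def transcribe_fallback(video_path, model_name="base"):
    """Extract audio with ffmpeg, then transcribe it with Whisper.

    Assumption: intended to be called only when no CC transcript is available.
    """
    import whisper  # imported lazily so the helpers work without it installed

    wav_path = Path(video_path).with_suffix(".wav")
    subprocess.run(ffmpeg_extract_cmd(video_path, wav_path), check=True)

    model = whisper.load_model(model_name)
    result = model.transcribe(str(wav_path))
    return result["text"]


def estimated_seconds(duration_s, speedup):
    """Back-of-envelope transcription time for a given real-time speedup."""
    return duration_s / speedup
```

Using the rough speedups quoted above (not measurements): estimated_seconds(300, 1) gives 300 seconds for a 5-minute clip at about real time, while estimated_seconds(3600, 100) gives 36 seconds for a one-hour video at 100x.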
Great job @aka_sh!
[1] https://github.com/openai/whisper
p.s. Is there any chance you'd open source your code? Or do you plan to turn this into a business? The code itself isn't exactly a huge moat, and it'd be cool to see how you did this. Cheers.
p.p.s. The stepify.tech app is currently crashing to a Heroku error page when I try to submit a YT link.