DeArrow is made by the same person who made SponsorBlock. It's about the highest-quality user-uploaded content there is, and users mark the following: sponsors, promotions, exclusive-access segments, subscribe callouts, video highlights, intermissions, credits, recaps, and jokes, as well as create custom chapters.
Not only is the quality extremely high, users self-regulate: a submission is first shown to a few random users to see if they approve of it, then to more, until it has spread throughout. So bad submissions go out the window. Are there some submissions, like highlights, that are just "good enough"? Yes. Does it need to be perfect? Absolutely not. Does it need to run through a black box of an AI model that's going to produce absolute crap 5% of the time? Never.
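The staged spread described above can be sketched roughly like this. To be clear, this is a hypothetical illustration of the general idea (sample a small audience, require approval, double the audience, repeat), not DeArrow's actual algorithm; the `threshold` and `start` parameters are made up:

```python
import random

def staged_rollout(submission_quality, population, start=4, threshold=0.7, rng=None):
    """Hypothetical staged-rollout vote: a submission is shown to a small
    random sample first; if enough approve, the audience doubles, otherwise
    the submission is discarded before most users ever see it.

    submission_quality: probability that a random user approves (a stand-in
    for how good the submission actually is)."""
    rng = rng or random.Random(0)  # seeded for reproducibility
    sample = start
    while sample < population:
        votes = [rng.random() < submission_quality for _ in range(sample)]
        if sum(votes) / sample < threshold:
            return False  # culled early, never reaches everyone
        sample *= 2  # approved so far: widen the audience
    return True  # spread to the full population

# A submission everyone approves of survives; one nobody approves of
# dies at the very first sample of users.
always_good = staged_rollout(1.0, 100)
always_bad = staged_rollout(0.0, 100)
```

The point of the doubling is that a bad submission wastes at most a handful of users' attention before it's rejected.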
User submissions will always be better than an LLM (which isn't even an appropriate tool for the context of a video).
It just seems like an unsustainable system that requires user opt-in on a platform where people go to actually avoid opting into anything and just watch videos.
The model doesn't have to be a black box either. Plenty of open-source models can summarize a video faster than any human can.
LLMs are a black box not because they're closed or open source, but because they're built on a neural net. Yes, a closed-source LLM is more of a black box, but an open-source LLM is still a black box.
beyond the error rate, the problem with using an LLM vs user-generated titles is that LLM use costs money, and we're not quite at the point of running high-quality LLMs on generic hardware yet. also, realistically titles aren't the main problem that needs solving here
also, do not underestimate petty people: sponsorblock works just fine with user-generated data
finally, the video demo clearly shows that if there isn't a user-uploaded alternative thumbnail, DeArrow picks a frame from the video to use, as handily suggested by the GP, who hasn't read the article
That statement will age like beautiful, fine wine when your LLMs keep training on LLM-generated data and get influenced by the very same creators putting in clickbait images, leading the LLM to believe the clickbait frame is the most appropriate part, etc.
The notion of "garbage in, garbage out" no longer holds once we introduce an additional signal, such as responses from humans, validation through tests (like running generated code against a test suite), or engagement in a game where maximizing the score is the aim.
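To make the "validation through tests" signal concrete, here's a minimal sketch: imagine a model emits several candidate implementations, and only the ones that pass a verification suite are kept. The candidates and tests below are invented for illustration:

```python
def filter_by_tests(candidates, tests):
    """Keep only candidates that pass every test: the tests are the
    extra signal that stops garbage output from flowing downstream."""
    return [f for f in candidates if all(t(f) for t in tests)]

# Pretend these are three model-generated attempts at "double the input";
# only the first is actually correct.
candidates = [lambda x: x + x, lambda x: x * x, lambda x: x + 1]
tests = [lambda f: f(2) == 4, lambda f: f(0) == 0, lambda f: f(3) == 6]

survivors = filter_by_tests(candidates, tests)  # only x + x passes all three
```

Note that `x * x` survives the first two tests (it also maps 2 to 4 and 0 to 0), which is why more than one check is needed; the signal is only as discriminating as the test suite.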
Consider the extraordinary case of AlphaGo Zero. Despite beginning from random initialization and without any human game data to train on, it mastered Go solely through self-play feedback, reaching superhuman levels (and its successor AlphaZero did the same for chess and shogi). The potency of feedback is nothing short of magical.
Shifting our focus to humans, what's the nature of our major breakthroughs? More often than not, we serendipitously encounter them. They don't typically arise from deduction but from meticulous observation of how our existing theories align with reality. Essentially, we observe and integrate feedback.
Involvement in a larger system - be it the world, society, the internet, or even a dialogue session with a human - is how AIs can transcend the mere regurgitation of the training set. With every interaction, they receive a nugget of new data in the prompt and feedback following the response.