Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
Now I can just print that video (forret.com)
276 points by pforret on Dec 4, 2023 | hide | past | favorite | 119 comments


Definitely would use this.

Instructional video instead of step-by-step text is a personal pet peeve. I know it's a lot easier to just record a video to show something like "how to replace the battery on a cordless vacuum" or "removing a sink basin nut" but it's often such a painful experience for consumption (watch a moment, pause, scrub back and watch again, pause, continue, pause, all with potentially gloved hands often in tight working spaces).


I'm on the other end of this in a way. I think it may come from having to read and write all day every day. Sometimes just having somebody yak at me for a few minutes is useful.

I really enjoy watching instructional videos, especially for recipes. The demo of the cooking techniques is almost always hard to write or talk about, and easy to show.

In the kitchen it works this way for me:

1. Watch the video once or twice all the way through to "learn it" and decide if it's what I want to do.

2. Put together my mise en place and basic prep for the recipe. Learning to do this was a game changer.

3. Finally, put it on my phone or tablet in my kitchen and let it play while I work, it's mostly audio at this point as I've "seen" the content a few times but I'm just listening as if the video is a coach. I'll hit pause at the major steps, and scrub back if I need a refresher on a technique or step.

I've gotten through some very complex dishes this way, and never hit the equivalent rhythm using cookbooks or recipe websites. The audio part of step 3 is really critical to me as it helps me focus on the food rather than remembering all the steps and it's just fills up the background space in my kitchen or act as a coach. The only way it would be better for me is if it automatically paused after each step and I could then ask it "what next?" or "go back two steps, I missed a step" or some other audio prompt.


Your workflow sounds like literal hell to me. I will do anything in my power to get plain text to avoid exactly the experience you are describing!


They are spending a great deal of time, getting this one thing just right.

Most of us, in the IT world, aren't doing that. We're trying to get through this task quickly, and accurately, before moving on to the next one, so for us, we don't need this level of prep/consideration/review.

I'm with you. It's not my style either, after 30+ years in IT. lol


Different strokes for different folks.


Greasy strokes on the phone/tablet screen to say the least!


It sounds like you're describing edutainment, which I also love, but which is a very different exercise for me than genuinely trying to expedite learning how to do some well-scoped task.

Granted, there's a blurry line here, since I certainly may pick up some useful techniques and knowledge from cooking edutainment content even though I never genuinely aspire to the same level of personal cooking.


People have a mental cap on what text should cost. If someone creates instructional content that provides thousands of dollars in value, they can sell videos for $200+, but a book version is hard to sell over $50, even if both provide the same value. Even for free content it is easier to monetize YouTube than it is to monetize a blog.

If we want people to create more text-based material, it needs to have similar financial incentives.


I think part of this is how society consumes information. In the 80s, you mostly had books. Sure, there were some video courses, but a majority of the learning was through books.

Now, people consume most of their information in video formats. Think about the rise of Vine, Youtube, TikTok, and the 100s of others out there just like them.

They are growing like weeds, because that's apparently how the public now likes to consume media, info, etc.


I assume most of this video bias comes from the ease of monetization (for content creators) on Youtube. That might in turn come from some mental cap held by viewers on the value of text vs. video, but I suspect it has more to do with the number and value of ads that can be shown by the platform. Some random blog platform is going to maybe have inline image ads, while Youtube has unskippable video ads which I assume are valued more by advertisers.


That may be a partial explanation, but it only makes the situation worse!

"Oh, but they're only doing it because it will trick people into paying more money."


I have no data to back this up but just taking a stab in the dark - a possible reason might be because people generally tend to prefer learning from someone talking about the subject matter?

I know most all of us here are techies and very used to cracking open books and documentation and text tutorials to teach ourselves stuff, but many people are not like that and especially if you're new to a subject, sometimes books just don't help things to click as well for some reason.

There's probably something to do with the way material is structured and presented differently between talking about it and writing about it, but I wouldn't know what to say about it.

I dunno, just a guess because it's an interesting observation to think about.


oh, it depends

I understand and agree with you but there are situations where full video is better anyways.

example from life: I needed to teardown old laptop to replace thermal paste and I was following some image guide it was all fine until one part stuck and I couldn't figure out what was holding it. there was no way to figure that out from description and images, I needed to find video.

I guess what I'm trying to say is that ideally you want both, or maybe hybrid? like step by step guide constructed from short looped videos showing you how to do that single step?


I generally agree with you. I would appreciate it if the youtubers that do these would just add the list to the text box they write in anyway. That would help a great deal, since all of it would be right there.

Visuals can be useful. I had to look at one just this morning tearing down an old HP Elitebook 8470p. Here I was removing all kinds of screws, when it's just a battery release sliders move, then the entire panel just slides off. doh.

But, most of the time, I just need the list. I don't need a video of someone stepping through some arcane, obscure and rarely needed AD repair. Just gimme the steps and I'll take it from there. :)


I agree 100% with this. Having both video and text is ideal, but if it has to be one or the other, text is much better.


Bard can do this. They have youtube extension.


100% with you on this one. Just give me the list. If I need visuals, I'll chase them down.


Another big annoyance is websites with recipes that do any of the following indcredibly bad UX patterns:

- Big white page not showing any text or images until the entire page and its assets are downloaded, which means if you accidentally click something and go back you have to wait another several seconds for everything to load again

- Pop up GDPR popup while hands are covered in flour and eggs

- Pop up "would you like to subscribe to the newsletter" while hands are covered in sticky sauce

- Pop up "buy this shit for 10% off" with a microscopic X button while something on high heat on the stove

- Not specifying image height and width in CSS so that when user is looking at a piece of text and images above it load, the scrolling position jumps

For these reasons alone I've largely stopped looking at the internet for recipes and turned to physical books, which are much better behaved.


Don't forget the lazy-loading pages that don't properly set their flex box positions so that when you go to click on something a new link pops in at the place you were just about to click that takes you to a different page.

I'm tempted to say that this is a dark pattern because when it happens to me is is almost always a "subscribe", "purchase", or "login" button.


> dark pattern

I wonder how often dark patterns are the result of goal directed A/B testing?

If the goal is set to "did the user click on an advert" - and the A/B changes are fuzzed CSS - then the results would be deviously dark.


Reminds me of how my company is now tracking badge swipe data to try to enforce people going into the office 3 days per week, and sending tickets to managers about their reports who fall short of it.

Someone probably invented it to hit some KPI of number of employees going to the office 3 days per week.

The reality is people are going to the office sick and spreading all kinds of viruses, I often have to take meetings from my car because I can't find a meeting room, among many, many other problems.

But that dude that implemented the system probably got a promotion.


I've started copying these recipes into Crouton https://crouton.app/

It does a remarkable job at extracting the recipes, and the end result is a consistent experience no matter the source.


I've been using Paprika https://www.paprikaapp.com/ for much the same thing. It's amazing how useful it is.


Same idea at https://justtherecipe.com - also you can login with Google account and save recipes there.


You just described all websites


Thankfully reader view defeats most of this, though it has it's drawbacks, but the majority of what you need is readily available via this method.


I saw a YouTube video by a guy who specializes in building D&D characters. He spends twenty minutes going into detail on each one, and then makes the pitch for subscribing to his Patreon account with something like "members get all the details in a convenient list so that you don't have to keep going back to this video."

So he's using the same bit of friction that this article is trying to solve, to fill his rice bowl. It's a bit of a shame that fixing this problem for me will cause one for him.


> So he's using the same bit of friction that this article is trying to solve, to fill his rice bowl.

You spelled out exactly what the attention economy is about. Friction. The money is made on friction. Waste - of time, of cognitive effort, of emotions good and bad.

I feel sorry for this guy, but at the same time, I wish people recognized that attention economy isn't about some nebulous attention you have too much of and don't feel when it's being taken. On the contrary, attention is stolen through friction, and the sum of everyone who "fills their rice bowls" this way is why the web and so many processes and activities on-line feel like shit and remain painfully wasteful.


What's cracking, my internet fam?! Your boy mwigdahl is BACK with another epic comment that will blow your mind and tickle your funny bone! Before I drop this atomic truth bomb on y’all, make sure to SMASH that like button, OBLITERATE the subscribe link, and ANNIHILATE the bell icon so you can join the notification squad and never miss out on my absolutely, positively, life-altering comments!

So, here's the moment you've all been waiting for, after an intense period of reflection, meticulous research, and deep philosophical thought, I’ve come to a profound conclusion that will shake the very foundations of our virtual world:

"I TOTALLY agree!"

Mind blown, right? I know, I know. It's a truth so pure, so succinct, it could only be expressed in exactly three words. But wait, there's more!

Now, before you recover from the sheer brilliance of this comment, hit me up with those triple likes, double shares, and single-minded adoration as I ride into the digital sunset! Remember, it ain't an epic dialogue without a bit of back and forth, so drop your cosmic brain thoughts down below and let's get the internet's greatest conversation rolling!

But hold up, don't scroll away just yet, because I've got a special offer for the next 10 seconds only! If you comment with the hashtag #TotallyAgreeSquad, I'll personally send a virtual high-five your way, delivered at the speed of your internet connection – probably faster than my aunt trying to snag that last piece of pie at Thanksgiving!

And folks, before we wrap up this video comment extravaganza, let's take a moment to honor our totally real and not at all imaginary sponsor, NordExpressVPNWebShadowers. Protect your secrets, your snacks, and yes, even your secret snacks, with their military-grade encryption – because online privacy is no joke, but my puns sure are!

In conclusion, remember to comment, like, and worship the subscribe button. Keep your snacks safe, your memes dank, and your agreements totally - it's your boy mwigdahl, signing off until the next video comment!


I get it, but most content -- that you and I find helpful in so many different cases -- exists because it can be monetized.

And, to your point, friction-based monetization is one of the more effective ways to monetize your content.

If you can't monetize your content, what's the point in creating it? Creating a lot of this content takes time, and therefore many people won't create it if it's not worth their time.

If the world would just start paying directly for content (e.g. via Patreon), and if that was the only monetization needed, then maybe we could remove the painful friction (or other painful methods of monetization). But unfortunately, this will probably never be sufficient on its own.


I create content, heck i'm doing it right now, without being paid. Some is relatevly low value (but not none, I can see you like to spend time reading hn comments) but some ive made has touched thousands of people and been high effort stuff. Am I crazy? am I simply irrational?

No. People will create without incentive. put multiple humans in a box and wait, and culture will spontaneously be generated. Income is not, never will be, and has never, been a nessacary part of this equation.

In general ive observed a tragic cycle of vibrant communities blossoming into existence where everyone created because they want to, people have healthy engagement with it and get primarily positive vibes. Then over time, money invades, starts paying a select few of the creators and arbitrarily excluding others with no partiuclar pattern. Bad vibes ensue, and even those paid have to run in place and burn out. Content quality suffers, becoming less about creative expression and making audience happy, and more about extracting clicks, money, ads, buy, subscribe, like


> I get it, but most content -- that you and I find helpful in so many different cases -- exists because it can be monetized.

That's a tragedy and it didn't use to be like this. In the past, you could use "exists to be monetized" as synonym to "garbage" and effectively filter such content[0] out of your browsing. Friction-based monetization is a giant "fuck you" to the user, so you can rightfully expect the quality and trustworthiness of content to match that attitude. The heuristic is still 100% valid, but it's increasingly hard to find anything other than content made for monetization[1].

I mean:

> If you can't monetize your content, what's the point in creating it?

The answer to that is, "you shouldn't".

> Creating a lot of this content takes time, and therefore many people won't create it if it's not worth their time.

Then those people should find a different, productive activity, and leave the "content creation" to people who are baffled at the question above, because for them, the reason is obvious - "because I can", or "for status", or "pay it forward", or "this would help others", or "the world would be a better place if people knew this thing I know". And none of that precludes asking people to pay for access.

> If the world would just start paying directly for content (e.g. via Patreon), and if that was the only monetization needed, then maybe we could remove the painful friction

No, let's not reverse the order in which things happened. Paying directly for content used to be the norm. It's nigh-impossible now, because everyone and their dog zeroed in on the perfect anti-competitive hack: free but with ads. This prevent almost all honest competition, because unless you have enough surplus to fund your creation yourself, you can't compete with free.

--

[0] - The use of the term "content" on its own implies we're dealing with facsimile without soul.

[1] - It's not that it doesn't exist - but rather, all the major platforms are, overtly or covertly, advertising platforms, so they both enable garbage peddlers and promote the garbage, because that's what pays their bills. In this way, it's not the centralization of the Internet alone that's the problem - it's centralization into platforms with structurally malicious incentives.


> The answer to that is, "you shouldn't".

Unless you enjoy it. Like the huge number of content creators who make little to no money on it, plus the huge number of content creators who do make money on it now but before they were able to.


>exists because it can be monetized

This in itself is a huge problem. The internet used to be a beacon of hope for information sharing, now it's all behind paywalls...to the point of ad-nauseam.

I understand not everything should be "free", but it's nearly impossible to access anything without the need for an account, pay to access it, get bombarded with adverts, reminded to "like & subscribe"....it's shit.


the guy deserves to be compensated for his efforts. I think this is a pretty pessimistic take.


This is a slippery slope because now we're entering a stage where we're commodifying hobbies to the point that it stops being fun, and starts being another product or service that needs to be paid for, whereas previously, it was shared due to passion of said hobby.

I'm seeing it in one of the oldest hobbies I still entertain, RC cars. Small shops have all but went under, everyone buys from the internet, and when you can't figure out how to repair something, you're paying a massive fee for a specialist to figure it out. Adults can deal with this begrudgingly, but this was a child focused hobby primarily, and now we're pricing them out of it.


But he doesn't deserve to steal from me to get it.


Maybe if your business model includes putting things in an inconvenient format that could best be replaced by a bulleted list, you should rethink your business model.


If that's truly what you're doing, sure. If you can reduce a movie to its screenplay without losing value, that's probably what you should be doing. A recipe is a great complement to a cooking video, but reducing the value of a cooking video's value to the recipe is oversimplified.

I wish we'd go the other way, where free text content is complemented by paid-for audio or video commentary. But it has to be a very dull video for a bullet list to be a good replacement (and a bad bullet list to be able to capture what a good video production can convey).


> A recipe is a great complement to a cooking video, but reducing the value of a cooking video's value to the recipe is oversimplified.

There is a simple test to make here: is the complimentary recipe released to the viewer, so that they can read it conveniently and at their own pace? If the recipe is truly a complement to otherwise great cooking video, then releasing it is a no-brainer. If it's withheld, then one has to wonder why, and what the video publisher is afraid of.


Yeah no. There are probably amazing and insightful cooking videos out there, but the moment you intentionally withold the recipe unless people pay, you're using a disadvantage of the medium for rent-seeking.


A lot of this is actually caused by Google in benefiting observability of videos that are longer so they (google) can show more ads. You either get a 60 second short, or 10ish minutes. This leads content creators to stretch out their videos longer than they really should be.


Have you considered paying for the patreon regardless because you consume his content one way or another and value it?


I consume hundreds or thousands of creators' content that I value more than this particular channel. If I felt the duty to donate to each one, let alone subscribe, I would consume much less of it and live in a smaller world.

Perhaps it is a rationalization, but I don't feel that consuming content that someone offers to me for free creates an obligation on my part, whether I love it or not.


I agree that you are not obligated to pay, as in you don't deserve to have debt for consuming things you value. I think it may be a rationalization to argue that you shouldn't try to. Although maybe there is a difference of opinion on separating an obligation from an ought to.

Justifying why it's okay to feel entitled to the content for free (due to lots of available free content) even though it provides a personal value to you is what oversteps in the wrong direction to me, if that's part of your take.

I should probably clarify that merely grabbing or holding your attention is not what I mean by providing value. I mean it saves you time, it gives you conversation topics, it helps you grow, its something you look forward to consuming, it isnt just taking attention you didnt know what to do with, etc. In kther words, there's something specific to the content that resonates with you.

Its possible you just don't value this content all that much, even if you do appreciate it and find it interesting - which is okay.

Value is very subjective and personal, there's no way to codify what I'm saying here.

And it's also fair to argue that you are paying for most of these things with your time and attention, although with your specific example an improvement was theorized that would give you the same value while saving you time, and that translated to paying the creator less. Something about the specifics there didn't seem right


And don't they make money from ads or subscriptions via YouTube or whatever platform they're on? I don't want to bother paying each creator, but I'm fine subscribing to YouTube or Pandora.


interesting. you can't afford to compensate all of the creators whose content you consume, so none of them shall receive compensation?

there is no obligation, it's just a good and kind thing to do.


I'm looking for the implication of the above that "none of them shall receive compensation" from me and not finding it.


fair, I did not assume positive intent


I think ideally the creator would be compensated by PrintThatVideo, who is taking their content and repurposing it.


I mean, I expect someone to make this a module for something than can be ran on your own computer with LLAMA (or whatever) in pretty short order.

The attention market cycle is wrapping up and at this rate AI/LLMs will further kill the market for grabbing your attention by filtering that crap out. Grab the signal, filter the noise.

Yuval Noah Harari is likely correct, the future isn't about attention, it's about intimacy.


> The attention market cycle is wrapping up and at this rate AI/LLMs will further kill the market for grabbing your attention by filtering that crap out. Grab the signal, filter the noise.

That's not my impression at all, but I suppose we'll see which force prevails - user filtering out noise with AI, vs. producers generating much greater volume of low-quality noise with AI tools, and some of them also using AI tools to make some types of noise harder to discern.


Is that a counterargument? It doesn’t seem like one. Why does it matter where and how the content is repurposed? If there is value found and extracted and the person that created that value isn’t compensated then we have a problem. No more incentives to create value.


Do you remember what was the channel? Thanks



No need to spend hours trying to get the text extraction just right - pass the raw extraction into GPT and ask for it to give you the recipe.


I was thinking the same thing. Extraction and basic formatting of information from human language is something that LLMs excel at. Especially if the result is being shown to a human so small mistakes can be tolerated.


It could also dramatically increase quality, look at the below from the example PDF from the page:

> Garlic balsamic chicken, you'll be making over and over. By sear your chicken. I resin wine. Grab yourself a nice large bowl, extra virgin and olive oil. Balsamic glaze. Tomato paste. Honey, fresh lemon juice, garlic, Oregano fresh thyme, coat my chicken with this beautiful balsamic, no balsamic, left behind. Don't you dare waste the good thing. Right? Going in the oven at four twenty five degrees, about thirty ish minutes. Look yes. Fresh thyme, fresh parsley. This is so good. I can't wait. Win our winner. Oh

If we run this through ChatGPT with some basic prompt engineering this becomes:

> Start by searing your chicken in a pan. In a large bowl, combine extra virgin olive oil and balsamic glaze. Add tomato paste, honey, fresh lemon juice, garlic, oregano, and fresh thyme. Coat the chicken thoroughly with this balsamic mixture, ensuring no glaze is left behind. Preheat your oven to 425 degrees Fahrenheit. Place the coated chicken in the oven and bake for about 30 minutes. Once cooked, garnish with fresh thyme and fresh parsley. Serve and enjoy your delicious garlic balsamic chicken. (Note: The phrase "I resin wine" in the audio transcription seems unclear and is possibly a mishearing. I have omitted it as it does not appear to fit the context of the recipe.)


Looks pretty good to me when I use all "the tricks": https://chat.openai.com/share/18bb729c-82e9-4c7d-abc1-a977c9...


It's a little confused about it though.. It's not clear in the gpt version that "balsamic glaze" is what you are making by mixing the ingredients together, and makes it sound rather like some other ingredient you are mixing in. Granted, we have to decipher that a little bit in the first one too, but its not nearly as bad.


It's a balsamic vinegar glaze, which is an ingredient, not what you get when you mix the ingredients together.

That's why it refers to a 'balsamic mixture' once it's mixed in with other things. I actually think it's the opposite - the GPT version is clear and the non-GPT version confused you into thinking that a balsamic glaze is Tomato paste. Honey, Lemon, Garlic etc mixed together.

This is what you need: https://www.ocado.com/products/m-s-glaze-with-balsamic-vineg...


Among other things, it would be wild if "balsamic glaze" included no balsamic.


I see, well the computer did a fine job then. Now I just think that if your gonna add honey and tomato to it anyway, skip the "glaze" product and just buy good balsamic!


Human recipes are extremely inconsistent in that manner too.

When I was fresh out of college my wife and I tried to make some sort of recipe with hamburger and flour. I now know and understand it was trying to get us to make a roux [1] and then mix the hamburger into that. But it described the steps for that very simply and directly with no way to know when to stop cooking the roux, and I had no idea what a roux was at the time. So we ended up with one of the worst meals I've ever cooked: Browned hamburger mixed in soggy raw flour. Heck, I wasn't even salting anything properly then, so it would be unseasoned browned hamburger in soggy raw flour.

As cash-strapped as I was at the time, that one still went in the trash. If I recall even the dog was not impressed.

Many years later I saw the Good Eats episode on roux and the light bulb went off.

Mind you, even made properly what I recall of that recipe would be something more like a base to further spice and use with something else rather than a meal. It was a supposed to be a simple recipe, but it was really too simple. But it would at least be an edible base for further elaboration.

Since then I've been on the lookout for recipes that are clearly invoking some cooking technique but don't really describe it correctly, either because they assume you already know it, or it is straight-up just described wrong. There's a lot of them. The "Internet Cookbook" is full of ideas and I like it for that, but it's quite caveat emptor when it comes to following recipes directly. The skills to make a recipe website, SEO it so it actually gets hits, keep all the ads working, and get pretty cooking pictures don't overlap much with the skill of writing a good recipe.

[1]: https://www.seriouseats.com/a-brief-guide-to-roux#toc-what-i...


If you feed key frames stitched together from the video through the GPT-4V vision model, the vision model can ensure that the steps align with the “story” shown in the images.


what a time saver!


Be mindful however that recipes and song lyrics are the two specific cases where OpenAI is explicitly telling the model not to cooperate, via the default system prompt. They really don't want you to have the bot regurgitate existing text in these two categories, and that includes a recipe you added into the context window yourself. I don't know if the extent of their exception here is limited to system prompt only (so technically not relevant to API users), or if they also biased the model itself at RLHF stage to not reproduce recipes and lyrics.


First time I’m hearing about this for recipes, what’s the source?


Recovered system prompts from OpenAI models. There's a repo that's been tracking those I saw on HN the other day; not sure if it's the same I saw, but this one claims to have collected quite a lot of those:

https://github.com/LouisShark/chatgpt_system_prompt/

Here's the one for ChatGPT with GPT-4 + Dall-E + code interpreter + search:

https://github.com/LouisShark/chatgpt_system_prompt/blob/mai...

It matches what I remembered seeing a week or two ago. View it, and search for "lyrics" or "recipe". Or, to make it simpler, quoting from first appearance of "lyrics" and "recipes" to the last one:

    Do not repeat lyrics obtained from this tool.
    Do not repeat recipes obtained from this tool.
    Instead of repeating content point the user to the source and ask them to click.
    ALWAYS include multiple distinct sources in your response, at LEAST 3-4.

    Except for recipes, be very thorough. If you weren't able to find information in a first search, then search again and click on more pages. (Do not apply this guideline to lyrics or recipes.)
    Use high effort; only tell Except for recipes, be very thorough. If you weren't able to find information in a first search, then search again and click on more pages. (Do not apply this guideline to lyrics or recipes.)
    Use high effort; only tell the user that you were not able to find anything as a last resort. Keep trying instead of giving up. (Do not apply this guideline to lyrics or recipes.)
    Organize responses to flow well, not by source or by citation. Ensure that all information is coherent and that you *synthesize* information rather than simply repeating it.
    Always be thorough enough to find exactly what the user is looking for. Provide context, and consult all relevant sources you found during browsing but keep the answer concise and don't include superfluous information.

    EXTREMELY IMPORTANT. Do NOT be thorough in the case of lyrics or recipes found online. Even if the user insists. You can make up recipes though.


Thanks for the tip! I will add GPT to the mix to clean up the speech and title data.


It's a very cool technical feat, but not something I would personally pay for. I'll just spend the 1-2 minutes to watch the video for free. Not trying to discourage you, just giving honest feedback. Launching the early landing page is a good idea to validate further.


I could also need a service for trimming all of the fat from how-to articles.

> We’ve all been there: we used the florb for too many glorbs and now it needs to be replaced. [...]

> This is an experience that everyone at the staff of howto.biz.uk has had! [...]

> But how do you replace a used-up florb? In this article we are going to show you how. [...]

> [scan the next five paragraphs]


Same with recipes, where the author frequently feels the need to reiterate their grand-grand-grandparents life history in 10 paragraphs before getting to the ingredients and step-by-step instructions. It's sad to see what SEO has become, really...


This is pretty cool but I'd like to see a well-formatted recipe, not a transcript. I prefer the markdown format for recipes so I worked on something like this earlier this year [0]. It fetches Youtube subs (with no audio processing like the video itself like this project) and returns a markdown with ingredients and steps.

[0] https://github.com/gaganpreet/summarise-youtube-recipes


As someone who's learning was significantly accelerated by the "written tutorial" phase of the internet this would be a really great little tool. I find video tutorials to be far more cumbersome than text+ images.


I kind of wrote something for this a few years ago: https://github.com/rberenguel/glancer [edited a fat-fingered copy-paste]

The use-case is technical videos (like from conferences) I’m interested, but not enough to invest 20-60 minutes.

Haven’t used it in a few months so the yt-dlp commands may need updating.


Sadly getting a 404 here–maybe this is a private repository?


I think RBerenguel intended https://github.com/rberenguel/glancer


Thanks, I fatfingered the copy paste on my phone :/


You can also use software to detect “cuts” in the video, which can be used to improve the frame-extraction over just getting six evenly spaced frames from the video.


This is a task called "video summarization". See https://paperswithcode.com/task/video-summarization . I guess the whole project is something like summarizing from video + subtitles + text to pictures + text.


Not the post author but I tried this with ffmpeg and failed. Do you (does anyone) want to share some pointers?


I used something like this a few years ago in a project sort of similar to this one. There's a bunch of parsing and processing to do with that, and the "0.3" value is ... fiddly, but it worked pretty well:

    ffprobe -show_frames -of compact=p=0 -f lavfi "movie=THE_VIDEO_FILE,select=gt(scene\,0.3)" -pretty`


I played with that too before.

`ffmpeg -i input.mp4 -vf "select='gt(scene,0.4)'" -vsync vfr frame-%2d.jpg`

(from the repo pforret/filmpace)

For this project, I want to find an A.I. solution for finding the most 'interesting' frames. Not even sure how to measure interestingness yet, might be the presence of text, the presence of a human ...


PySceneDetect (https://www.scenedetect.com/) might be useful.


Do video formats support structured meta data to be embedded in them?

If I make a video of me cooking, can I embed the recipe in the video, etc. Not just visually, but i.e. at 10s, I digitally insert the data "Add 1 cup red peppers". It isn't necessary a caption of something said or shown, just extra data.

Could a video creator leave substantially more metadata in their videos? I always assumed the pop-up metadata was externally stored and timestamp synced. Is there a way to embed it?


That sounds a bit like subtitles, or Timed Text. There are simple formats (just a text and the moment it should appear) but some formats support changing the position, color, font… most of the times this would be embedded in an extra sidecar file like an .srt or a .sub


It would be better all-around to just have that data in a separate file with timestamps.


Recommend passing the speech-to-text narration through a round of GPT4 API to correct for any transcription errors (use some prompt giving context that it's speech to text)


Wonder if Kagi's universal summarizer would work on recipe videos. It seems to do a decent job on YouTube videos, but those usually have cc built in.


Great, a way to turn videos into something I can scan. Actually something I'd consider using.


This is great, thank you for sharing! I wonder what the reverse would look like. More and more nowadays, I find myself first looking on YouTube for tutorials and walkthroughs, even if they wind up being more verbose than their written counterparts.


Using yt-dlp, ffmpeg and various AI services to print videos (e.g. cooking IG reels)


Based on the example shown on the page, the output doesn't seem very good. If that's one of the better examples the software produced, I don't think this will be useful in practice.


This is one of the first results. The third, if I remember correctly.

I got this running yesterday (Sunday), and I wanted to write the blog post first to test if there was any interest in this topic. Apparently, yes. Now I only have to do the remaining 80% ;-)


An evolution of this process would make it feasible to do retrieval-augmented generation using information from video content. I've thought about trying to do this to improve the (already impressive) abilities LLM's possess as a creative writing assistant/rubber ducky; a lot of good writing advice is on YouTube in the form of video essays, tutorials, lectures, etc.


The copyright notice on the output is a poor choice, since you almost certainly do not own the copyright to any of the content. You've gone to impressive lengths to ensure that the result is true to the source material, which means that there is no claim to this being a transformative work.

(Very cool and useful project, though.)


Ha! Print that video? Yes, but can you FIND THE PRINTER? ---- I humbly apologize, I thought this was some joke, or errant stupidity. Its not. This person has put some very serious thought into not only getting it to work, but to make it useful. Very useful. You have earned my Upvote, and recommendation. Thank you Mr Forret. Thank you.


If the main challenge was 'not having the smartphone in the kitchen', then one possible solution could have been getting another screen dedicated to the kitchen. A tablet, a laptop, a small TV+Google Cast or such combination.

It seems to be a proper media for 'printing' a video.

Of course, choosing challenges and finding solutions is what drives fun.


To me the main problem this solves is having to rewatch the video over and over for each step. Most of the time it's like "Step 2: do thing" then quickly cuts to step 3 well before I could've finished step 2. So having it laid out like this is actually a decent format to receive recipes in.


Exactly. Fiddling with your phone over and over again while your hands are wet/covered in flour etc. A paper sheet you can pin to the fridge or just get dirty is a reasonable solution imo.


I device I think would be great for the kitchen is a large wrist-mounted, waterproof e-ink screen, curved to wrap around the wrist, with two large scroll buttons.

The recipe could be loaded up via a linked smartphone or something, but then you have a device that you can touch with food covered hands and then wash it right alongside your hands later. Big screen so you don't have to squint or scroll frequently like you would on a smartwatch. E-ink so it works well despite bright kitchen lights and has low power consumption.


Honestly large 10-13 inch e-Ink tablets already work well for this as long as you're opening a PDF that stays put. Much like a physical book that doesn't move.

Live web pages suck because they pop up annoyances every 5 seconds that you have to deal with while your hands are messy, and the scrolling jumps around against your will.


These tik tok videos are pretty short right? Why not just get a note book and write down the instructions.

You could even do a little line drawing of the important bits.

You could keep this "cook" book in your kitchen, and maybe pass it to one of your kids (just an example) when they move out or something.


I actually wonder if in the limit of video encoding we could just get a diffusion model that can in real time render realistic video based on a script. Then downloading a movie is just downloading a few megabytes of a prompt and you get a movie playing based off it locally.


Maybe. The only problem I see is economical. Sure, sending over a sequence of prompts, instead of sequence of frames, is going to be a huge storage and bandwidth saver. However, you're going to pay for it dearly, in compute, whenever you want to watch such a live-generated video. In almost all cases, it's vastly better to use more storage than to use more compute, for the same reason that, if you need to keep something to stay above ground level, you're better off placing it on a table or bolting it on a wall, instead of attaching it to a jet engine pointing downwards, firing for TWR=1.


That’s very true and I did make that comment mostly in jest. I think the idea of art generated like that has a distinctly dystopian feel. I think one thing that might push this idea closer to reality is dedicated hardware. It would also allow you to do things like enjoy movies with your favorite actors. “Netflix, show me Dune but with Dustin Hoffman as Paul”.


Wouldn't it be non-deterministic? (Legit question, I'm new to this)


It depends on the specifics, but in general if your settings are correct and the hardware isn't doing something whacky/buggy then it should be deterministic. The math itself is deterministic, the only randomness comes from temperature (intentional randomness introduced in software) and some bugginess with GPUs that weren't designed for this workload.


Cool! I had the same project idea recently. You may be interested in this for the step of speech2text: https://github.com/SYSTRAN/faster-whisper


I think you could send all of that to GPT4 and ask it to read it and provide you with a step by step instruction : recipie and it would do so easily.

I didn’t see how that print out would be super useful, it’s not the complete step by step is it?


Ok, so:

* It does not print the video frames as a 3D object.

* Despite what the graphic at the link suggests, it doesn't 3D-print food

it extracts a recipe with images and text from a video, automatically.


Oh wow....this will incredibly useful for the influx of recent home improvement videos I've been watching lately.


Filtering a video for true content is the real app. Print is simply the format you've chosen to express it.


If there are YouTube-generated captions you can get yt-dlp to download them when you download the video.


For some reason I though the goal was to print (with a 3d printer) a 3d projection of the 4d content of the video. That would be cool...


I thought it would just print the dessert in a way I could eat. It would be much easier. :P


Great work! It's potentially useful and also hilarious.


Could have been a Show HN




Consider applying for YC's Winter 2026 batch! Applications are open till Nov 10

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: