Last week, our team released a way to quickly turn podcasts into social audiograms. There's no sign-up necessary and it's entirely free to use! We're trying to get feedback and would love your input!
Audiograms have been a popular way to create a podcast trailer to share on social media.
If you have a podcast or a video event series, we would love your feedback!
Feel free to comment here and I'll follow along, or email me directly at lenny@milkvideo.com
Cool app. The transcript was better than I expected.
My experience was quite negative though.
I tried to transcribe the Knife Game Song, but when the song speeds up, the tool skipped some words and I couldn't insert them cleanly. Then I couldn't fix my mistakes, because the editor can't handle more than five words at a time and automatically selects more than I intend.
I fought with the UI for more than 10 minutes, and when I tried to download the video to get the (admittedly imperfect) result of my hard work, it requested my email (contrary to what you said ^^), so I couldn't download it.
The timing of the word highlighting is almost right, but the viewing experience improves noticeably when the words light up at exactly the right moment.
It still needs some UI bug hunting to be usable, but it's on the right track.
Looks pretty great! Oddly enough, I came to HN to take a break from coding something eerily similar (different use case and target audience though). Now when I launch in a month everyone will think I stole your idea. Thanks for that, I guess, and good luck!
I was just looking for something like this. We've had several podcasts recently, but sharing the entire episode to social media doesn't get the same engagement. Can't wait to try this. Thanks!
Compared to other audiogram tools I've used, this is really easy. Processing time is a bit long, but I did test it with a 2-hour podcast.
I have a couple of feature suggestions that I think would be really helpful to podcast producers. Our show has mp3 chapters, and we also include matching time codes in our show notes. If you could read one (or both) of those, you could suggest segments for users to clip out.
It would also be great if you could read an RSS feed and use that as an input, in addition to uploading single files.
To confirm I understand them, I've restated them below, and I'd love a way to get in touch with you.
1. You want to be able to create clips based on parsing show notes. Since you already do the work to identify key moments, we could use that to speed up your clip making process.
2. You want to import content based on the RSS feed. This will make it easier to get content in the tool, and avoid the "where is the file" rigmarole.
Separately, the processing time is slow. We have a way to improve this, and it's in the queue!
Correct! But also, on 1, there's a way to embed chapters into an mp3 file that's part of the ID3 standard. We use Forecast (https://overcast.fm/forecast) to do it. I've found that most podcast applications parse these embedded chapters or the timecodes in show notes (or support both), so that would be another way to identify the moments we, as producers, have called out in the file.
For 2, the RSS feed would include show notes and greatly speed up the process for bulk production.
If you'd like to get in touch with me, I'm on Twitter as @yakk0dotorg.
Thanks again!
It's always magical to see a use case be elevated to a built-in feature. I was doing exactly this with Milk.video for a few podcasts [1]. I'm excited to try this out!
Nice work! The UI is really simple - and love not having to log in to use it. Have you thought about leveraging the ListenNotes API (https://www.listennotes.com/api/) to automatically pull in the podcast episodes via search vs having to upload them?
Impressive. Didn't expect it to be automatically transcribed and captioned. There are other tools like this around that make waveform audiograms. Could the text optionally be turned off?
Anyway, very nice for a free clip maker with logo support and no watermark.
There are three main parts to note: designing the videos, the backend for handling media, and the download renderer.
The video design flow is a React app backed by a Ruby on Rails API. The React app handles the views, and if you look at the UI, you'll see how the app steps are persisted in the URL. The React app is built on Redux Toolkit, which is phenomenal. The Rails application is a normal API with Sidekiq (Redis-backed) workers, which handle asynchronous tasks.
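To make that concrete, here's a minimal sketch (not our actual code; the slice name, steps, and fields are hypothetical) of a Redux Toolkit slice with the current step mirrored into the URL:

    import { configureStore, createSlice, PayloadAction } from "@reduxjs/toolkit";

    // Hypothetical slice tracking which step of the editor the user is on.
    const editorSlice = createSlice({
      name: "editor",
      initialState: { step: "upload" as "upload" | "clip" | "design" | "export" },
      reducers: {
        stepChanged(state, action: PayloadAction<"upload" | "clip" | "design" | "export">) {
          state.step = action.payload;
        },
      },
    });

    export const { stepChanged } = editorSlice.actions;
    export const store = configureStore({ reducer: { editor: editorSlice.reducer } });

    // Mirror the current step into the URL so app state survives reloads and sharing.
    store.subscribe(() => {
      const url = new URL(window.location.href);
      url.searchParams.set("step", store.getState().editor.step);
      window.history.replaceState(null, "", url);
    });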
Our UI renders a number of interesting elements, each generated or prepared by its own service-oriented pipeline. The most important is our transcript API, which is from https://www.AssemblyAI.com - they're the best transcription tool we've found: cheapest, highest quality, and with the best developer experience. We also have a series of Lambda functions that handle uploaded audio/video file prep, so we can encode files in a unified format and parse out the audio data needed for visualizations like the waveform and the animated audio frequency display.
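For flavor, AssemblyAI's v2 REST flow boils down to "submit an audio URL, then poll until the transcript is ready." A rough sketch (error handling trimmed; not our production code):

    const API = "https://api.assemblyai.com/v2";
    const headers = {
      authorization: process.env.ASSEMBLYAI_API_KEY!,
      "content-type": "application/json",
    };

    async function transcribe(audioUrl: string) {
      // Kick off the transcription job.
      const submit = await fetch(`${API}/transcript`, {
        method: "POST",
        headers,
        body: JSON.stringify({ audio_url: audioUrl }),
      });
      const { id } = await submit.json();

      // Poll until done; word-level timestamps come back in `words`.
      while (true) {
        const res = await fetch(`${API}/transcript/${id}`, { headers });
        const transcript = await res.json();
        if (transcript.status === "completed") return transcript;
        if (transcript.status === "error") throw new Error(transcript.error);
        await new Promise((resolve) => setTimeout(resolve, 3000));
      }
    }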
A few interesting tidbits: we use Lambda Layers extensively. We have functions written in Ruby and JavaScript where we move the vendored Gems or node_modules into a shared Lambda Layer, and we also use EFS to run Python-based functions whose dependencies are too big for the Lambda package itself.
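We won't share our actual infra code here, but as a hypothetical AWS CDK sketch (the stack, asset paths, and function names are made up), sharing node_modules via a layer looks something like:

    import * as cdk from "aws-cdk-lib";
    import * as lambda from "aws-cdk-lib/aws-lambda";

    export class WorkersStack extends cdk.Stack {
      constructor(scope: cdk.App, id: string) {
        super(scope, id);

        // Node layers expect their contents under nodejs/node_modules/ in the asset.
        const deps = new lambda.LayerVersion(this, "SharedDeps", {
          code: lambda.Code.fromAsset("layers/node-deps"),
          compatibleRuntimes: [lambda.Runtime.NODEJS_18_X],
        });

        // Each function ships only its own handler; dependencies ride along in the layer.
        new lambda.Function(this, "AudioPrep", {
          runtime: lambda.Runtime.NODEJS_18_X,
          handler: "index.handler",
          code: lambda.Code.fromAsset("functions/audio-prep"),
          layers: [deps],
        });
      }
    }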
Our video rendering is also pretty neat, in that we use the browser as a rendering surface and batch-process screenshots of each frame of the final output video, using an AWS-based container orchestration process.
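In spirit, the render loop looks like the sketch below (simplified; the page URL, the window.seekTo hook, and the paths are invented for illustration): drive a headless browser to each frame's timestamp, screenshot it, then stitch the frames together with ffmpeg.

    import { spawnSync } from "node:child_process";
    import puppeteer from "puppeteer";

    const FPS = 30;
    const DURATION_SEC = 10; // hypothetical clip length

    async function render() {
      const browser = await puppeteer.launch();
      const page = await browser.newPage();
      await page.setViewport({ width: 1080, height: 1080 });
      await page.goto("https://example.com/design/abc123"); // hypothetical design page

      for (let frame = 0; frame < FPS * DURATION_SEC; frame++) {
        // Advance the design to this frame's timestamp, then capture it.
        await page.evaluate((t) => (window as any).seekTo(t), frame / FPS);
        await page.screenshot({ path: `frames/frame_${String(frame).padStart(5, "0")}.png` });
      }
      await browser.close();

      // Stitch the numbered frames into a video.
      spawnSync("ffmpeg", [
        "-framerate", String(FPS),
        "-i", "frames/frame_%05d.png",
        "-pix_fmt", "yuv420p",
        "out.mp4",
      ]);
    }

    render();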
In summary, this tool builds on the work we've been doing for our larger product. Since we can leverage that, we're spinning up a number of single-purpose utility projects based on what customers ask for.
If this is interesting at all, we're hiring advanced JavaScript engineers who are comfortable learning new things.
You can reach me at lenny@milkvideo.com or if you just want to chat to learn more/ask questions, please feel free to book time here: https://calendly.com/rememberlenny/15-min
I was primarily curious about the video generation system. There are a number of ways to generate video programmatically, ranging from pure ffmpeg with image and text layers, to a headless browser that spits out canvas-rendered frames which an ffmpeg (or equivalent) process then aggregates into a video. There are also Python libraries like moviepy which offer their own APIs for creating layers. Each of these has its own performance characteristics, and I was curious whether there's a de facto best approach for this sort of thing, where someone has evaluated all of these options and settled on one after weighing the tradeoffs.
This tool is just a small piece of our overall product.
In the actual application, you can create content (like this audiogram) once, and then export it at various dimensions without having to redo the design.
Thank you!