Show HN: Oration (iOS) turns pdfs into audiobooks

kanodiaashu · on Feb 11, 2024

I am a grad student and was working on something similar (https://github.com/kanodiaayush/make-doc-listenable), which converts papers to text that can then be used in an audio app like Speechify. I love the idea of this and will try it out. Good luck!

adi4213 · on Feb 11, 2024

Thank you so much for your comment! I'm looking at your repo and it looks really cool! Our backend is an ensemble of a few different things, but I explored using `poppler`, `PyPDF2`, and other libraries and services as well. I'm really glad to see how accessible it is to write a service that extracts text, meaningfully processes it, and generates nice-sounding audio! I hope that Oration provides a nice enough UI/UX for users to enjoy, but it's been fantastic to see so much open source work in this area. PDF is certainly not an ideal format to use so ubiquitously, but I don't see that changing anytime soon, particularly in academic settings. I'd love to chat more about this if you don't mind shooting me an email: support [at] trurecord.com
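Text pulled out of a PDF by tools like poppler's `pdftotext` or PyPDF2 usually needs some cleanup before it sounds good read aloud, mainly words hyphenated across line breaks and hard-wrapped lines. A minimal sketch of that step using only the Python standard library, assuming the extracted text separates paragraphs with blank lines; `reflow_extracted_text` is a hypothetical helper, not Oration's code or part of either library:

```python
import re


def reflow_extracted_text(raw: str) -> str:
    """Rejoin hyphenated words and hard-wrapped lines in PDF-extracted text.

    Assumes (hypothetically) that paragraphs are separated by blank lines.
    """
    # Join words split across a line break, e.g. "for-\nmat" -> "format".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    # Split into paragraphs on blank lines.
    paragraphs = re.split(r"\n\s*\n", text)
    # Within a paragraph, collapse single newlines and runs of whitespace to one space.
    cleaned = [" ".join(p.split()) for p in paragraphs]
    return "\n\n".join(p for p in cleaned if p)
```

The hyphen rule is deliberately naive: a genuine compound such as "state-of-the-art" that happens to wrap after "state-" would also be merged, so a real pipeline would likely check candidate joins against a dictionary.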

doomrobo · on Feb 12, 2024

Also a grad student. I use Voice Dream Reader on my devices, and it has helped a lot with reading dense texts.

https://www.voicedream.com/

iamcreasy · on Feb 12, 2024

I'd love to know the difference if you get a chance to compare these two.

adi4213 · on Feb 12, 2024

I don't know Winston (the VoiceDream developer) personally, but there are a bunch of things that impress me about both his product and him. On the product side, it's long been a well-established app in the space: I think it's been out for 10+ years, offers a lot of voice options, handles many input document formats, has good support for offline playback, and has been featured in a bunch of publications. I was also very impressed when I read this: https://www.voicedream.com/macos-reader-subscription/ I really admire Winston for bootstrapping VoiceDream for so long. His initial users bought the iOS app for $2, and he has stayed true to them by continuing to provide a feature set that has grown considerably since the app's origins. His blog post also details how he was on vacation when VoiceDream had a P0/downtime issue and caught the first flight back to address it, motivated by the many users who really depended on the app (such as students studying for exams). There is a ton to admire here.

orand · on Feb 12, 2024

Winston sold the app last year, and it's being maintained by new developers now. https://www.perkins.org/resource/changes-with-voice-dream-ap...

mdaniel · on Feb 11, 2024

I would value a "notify me when it comes to Android" link. Your example is pretty good, but not enough for me to buy an iOS device, and "Sign up" just unhelpfully points to the App Store. (I say "unhelpfully" because you obviously do have a web presence, given the example player and the fact that clicking on the sample title just redirects back to https://player.oration.app, implying that one could be logged in to the website.)

Actually, having written all of that, I would value just being able to submit things via your site. Based on your description, it doesn't do any on-device processing, so why do I even need an app?

adi4213 · on Feb 11, 2024

Thank you for your comment! We're actively working on releasing an Android app soon; we definitely didn't want to neglect Android and are aiming to ship a great experience on the Play Store. We actually do have a preliminary web app that I'd be happy to share with you. It's not as polished as the iOS app, so kindly bear that in mind (we're expecting to ship an improved browser app shortly).

Since you'd value submitting things via the site, give this a try! https://www.oration.app/accounts/signup/

> why do I even need an app?

This is a valid question. We prioritized building a nice experience in the iOS app first and will release a solid web counterpart in the near future.

I'd definitely love to hear your feedback and stay in touch!

justech · on Feb 11, 2024

This topic has always interested me. For people looking for a free alternative (free for Apple users), I recommend looking into Spoken Content under Accessibility on iOS and macOS [0].

It's a bit finicky at times but the pros are

1. Free

2. It works on anything on your iPhone, iPad or Mac screen

3. Apple's Siri voices are actually really good! (Better than Speechify voices)

[0] https://support.apple.com/en-ph/guide/iphone/iph96b214f0/ios

adi4213 · on Feb 11, 2024

> For people looking for a free alternative (free for Apple users)

Thanks for pointing this out. FWIW, we do hope to continue providing a useful free tier for users. I appreciate your comment because it points to features that iOS/macOS users reliably have for free by virtue of being Apple users.

> Apple's Siri voices are actually really good! (Better than Speechify voices)

This is really interesting, particularly because Speechify makes a pretty substantial amount of revenue from paid subscriptions. I imagine that Apple has the resources and capability to keep improving its voice quality.

WhitneyLand · on Feb 12, 2024

Wait, Samantha, one of the default Siri voices, doesn't sound even close to the quality of what I heard in the examples for this app.

I think it’s the same voice Safari uses for the “Listen to this page” feature?

The two don’t even seem comparable. What am I missing?

justech · on Feb 12, 2024

Also, there are two versions of the Samantha voice. One is 11 MB (downloaded by default) and the other is 152 MB (you have to download it from Settings). The improvement with the larger version is very noticeable.

WhitneyLand · on Feb 12, 2024

Thank you, that is a big difference.

With the better voice downloaded, what’s your opinion on the quality difference between the app here and iOS?

adi4213 · on Feb 13, 2024

Oration uses OpenAI's text-to-speech system, which I personally found to have superior quality to Apple's (but I also downloaded the upgraded Samantha voice and have been enjoying using it, so I appreciate the recommendation!)

tymscar · on Feb 12, 2024

Wow, I didn’t know. That’s basically perfect for me. Thank you!

justech · on Feb 12, 2024

Did you try downloading the bigger Siri voices? I'm using English (US) Voices -> Siri -> Voice 5 (90 MB).

luigi23 · on Feb 12, 2024

Unfortunately it has to be finicky; otherwise it'd cannibalize audiobook apps (Amazon got angry recently).

jasonjmcghee · on Feb 12, 2024

I don't think that's true. SOTA open-source TTS is still much worse than ElevenLabs and requires a dedicated modern GPU to generate at any reasonable speed. It certainly can't run on demand, on device, on an iPhone.

Apple is already doing very high-quality generated audiobooks, but only for your own book, as the author.

https://authors.apple.com/support/4519-digital-narration-aud...

"Mitchell" sounds exactly like a very popular narrator "Ray Porter".

FloatArtifact · on Feb 11, 2024

I appreciate your work; however, not allowing the owner of the book to own the audio generated through your product does everyone a disservice. Everything seems to be locked into the app or the web interface. If that's a misunderstanding on my part, I apologize.

If we pay for your product, we should own what it produces, for the sake of long-term use and accessibility. Please allow the end user to download the audio in a standard format like MP3 or MP4.

adi4213 · on Feb 11, 2024

I'm sorry I didn't make this clear! You very much own the audio generated! That was my intention from the beginning, and I'll make sure that the app, along with our terms of service, reflects this prominently. I'll make some updates today to make it straightforward for users to download an .MP3 of what they create on Oration.

FloatArtifact · on Feb 11, 2024

Thank you for clarifying! Best of luck with your business, and I hope to put your service to use.

thrill · on Feb 11, 2024

In the sample on the website, the abbreviation inside the parentheses is skipped, so "LLMs" is later used without ever having been defined. Not a big deal to anyone familiar with the term, but for papers on new subjects, not skipping that initial definition might keep the listener more engaged. The audio sounded very natural! How would it work for fiction and multiple voices (future capabilities?)?

adi4213 · on Feb 11, 2024

Thank you for pointing this out! This was our service being a bit 'over-eager' at skipping citations. I'll be working on an update over the next couple of days to address this - that is, to ensure that key definitions are not skipped
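The fix described above amounts to telling citations, which are safe to drop, apart from parenthetical definitions, which must be kept. One way to approximate that with regular expressions is sketched below, assuming numeric citations like [3, 7] and author-year citations like (Smith et al., 2020); the patterns and `strip_citations` are hypothetical and not how Oration actually does it:

```python
import re

# Numeric citations such as [12], [3, 7] or [4-6], plus the whitespace before them.
NUMERIC_CITATION = re.compile(r"\s*\[\d+(?:\s*[-,]\s*\d+)*\]")

# Author-year citations such as (Smith et al., 2020) or (Lee and Kim, 2019; Wu, 2021).
# A parenthetical without a four-digit year, like "(LLMs)", does not match and is kept.
AUTHOR_YEAR_CITATION = re.compile(
    r"\s*\((?:[A-Z][\w-]+(?: et al\.| (?:and|&) [A-Z][\w-]+)?,? \d{4}[a-z]?(?:;\s*)?)+\)"
)


def strip_citations(text: str) -> str:
    """Remove inline citations while leaving definitions such as "(LLMs)" in place."""
    text = NUMERIC_CITATION.sub("", text)
    return AUTHOR_YEAR_CITATION.sub("", text)
```

Citation styles vary a lot across venues, so a production pipeline would more plausibly rely on a structured parser such as GROBID, which is built to identify references, than on hand-written patterns like these.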

collinc777 · on Feb 13, 2024

Congrats on the launch! This is pretty similar to a project I'm working on that turns articles into podcasts: https://a-to-p.vercel.app Feel free to give it a try.

I'd love to chat about how you generate your audiobooks if you're open to sharing. Good luck with everything!

adi4213 · on Feb 13, 2024

Thank you! Great job with your project as well! I'd love to chat. Can you send me an email at support [at] TruRecord.com?

sussmannbaka · on Feb 12, 2024

This will be useful once it works properly. I tried it but didn’t write a review so as not to tank your score this early. The app needs some more time in the oven: menu options aren’t responsive, swiping back brings you to views you shouldn’t be able to go to, all my uploads failed due to timeouts, and 50 pages is not a lot for something that advertises turning PDFs into audiobooks. I'll keep it installed and see where you go with this, because the idea is great.

adi4213 · on Feb 12, 2024

Thank you kindly for your incredibly helpful and constructive response. I apologize that you experienced these issues, and I'll gladly put in some hard work today and this week to resolve them.

> Menu options aren’t responsive, swiping back brings you to views you shouldn’t be able to go to

We'll work on a new iOS release to improve on this shortly.

> all my uploads failed due to timeouts

Oof! I'll get this sorted ASAP.

Sym3tri · on Feb 12, 2024

I love the idea and effort, but when I first tried using it I got a network timeout error when I tried uploading 1 chapter of a PDF book (45 pages, 7MB).

adi4213 · on Feb 12, 2024

Thank you for giving it a try and for your comment! I'm sorry you ran into this. Would you mind sending us an email at support [at] trurecord.com? I'll look into what happened right away and fix it.

aloneindecember · on Feb 11, 2024

It's certainly an interesting and helpful idea. Kudos for working on the idea, and it will be exciting to see the progress in a year or two.

adi4213 · on Feb 11, 2024

Thank you kindly for your feedback! We're looking forward to putting in some solid work and making good progress over the next couple of years. If you happen to have any feedback or feature wishes, we're definitely all ears!

iamcreasy · on Feb 12, 2024

Really cool project. This will definitely help me read more papers. Can you share more on how the backend is parsing and converting text to speech?

UI issue: Login by google icon covers the password box when trying to create a new account on my phone.

Do you have any formal channel for feature requests? I'll pay for this app.

adi4213 · on Feb 12, 2024

> Really cool project. This will definitely help me read more papers. Can you share more on how the backend is parsing and converting text to speech?

Thank you kindly! I'm really glad to hear this; it was exactly what I hoped for when developing this. Happy to share more details: right now we use an ensemble of methods to parse uploaded PDFs. A chunk of this involves GROBID (https://grobid.readthedocs.io/en/latest/), a machine learning library aimed at parsing academic papers. Funnily enough, GROBID is itself a cascade of sequence labeling models trained on document parsing. The text-to-speech portion is driven by OpenAI's text-to-speech models, which in my experience deliver market-leading audio quality. The summarization is driven by GPT-3.5-turbo. As such, the platform focuses quite a bit on making good-sounding audiobooks from academic PDFs. Upcoming updates on the roadmap include improved handling of tables, graphs, and figures, along with mathematical and scientific equations. It's likely that a multimodal LLM could do a reasonable job of describing this content in spoken form.
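Because a TTS API accepts only a limited amount of text per request, a long paper has to be split into chunks before synthesis. A minimal sketch of sentence-aware chunking, assuming a 4096-character limit (the input limit OpenAI documents for its speech endpoint; verify against the current API reference) and that each chunk is sent as a separate request; `chunk_for_tts` is a hypothetical helper, not Oration's code:

```python
import re

# Assumed per-request character limit for the TTS API. OpenAI's speech endpoint
# documents a 4096-character input limit, but verify against the current API reference.
MAX_TTS_CHARS = 4096


def chunk_for_tts(text: str, max_chars: int = MAX_TTS_CHARS) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence boundaries."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # An over-long sentence is split at word boundaries (or hard-cut if it has no spaces).
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            cut = sentence.rfind(" ", 0, max_chars + 1)
            if cut <= 0:
                cut = max_chars
            chunks.append(sentence[:cut].rstrip())
            sentence = sentence[cut:].lstrip()
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # Adding this sentence would overflow, so close the current chunk.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then be synthesized separately and the audio segments concatenated in order. Abbreviations such as "et al." are treated as sentence ends, which only moves where a break may fall; no text is lost.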

> UI issue: Login by google icon covers the password box when trying to create a new account on my phone.

My apologies about this! I'll get this fixed asap

> Do you have any formal channel for feature requests? I'll pay for this app.

Very much appreciate this. Could you reach out to support [at] trurecord.com? I would love to touch base about the feature requests you have in mind. I'm really keen to deliver a great experience for users like yourself and am eager to learn what you'd find helpful.

Thank you again for your message; I look forward to getting in touch.

iamcreasy · on Feb 12, 2024

Thanks for your response. I'll get back to you with a list after using it for a few weeks.

Can it handle a large PDF of a book, or would it be possible to specify pages of a large PDF?

adi4213 · on Feb 12, 2024

> Can it handle a large PDF of a book

It very much can; however, the iOS app has a limit of 50 pages per PDF and 5 uploads per hour in our free tier. I didn't want to rush to monetize the iOS app, so that I could really learn from users like yourself and then work hard to make the app great to use. Currently, to sign up for a subscription you can go to the 'Subscription' setting in our web app: http://oration.app/accounts/login/ Subscribing removes both the upload page limit and the hourly limit.
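The free-tier rules described here (at most 50 pages per PDF and 5 uploads per hour) map naturally onto a page check plus a sliding-window rate limiter. A minimal in-memory sketch under those assumptions; `FreeTierLimiter` is hypothetical, and a real service would keep the timestamps in shared storage rather than in process memory:

```python
from __future__ import annotations

import time
from collections import deque


class FreeTierLimiter:
    """Hypothetical free-tier check: a per-PDF page cap plus a sliding-window upload limit."""

    def __init__(self, max_pages: int = 50, max_uploads: int = 5, window_s: float = 3600.0):
        self.max_pages = max_pages
        self.max_uploads = max_uploads
        self.window_s = window_s
        self._uploads: dict[str, deque[float]] = {}

    def allow(self, user_id: str, page_count: int, now: float | None = None) -> bool:
        """Return True and record the upload if it is within both limits."""
        now = time.monotonic() if now is None else now
        if page_count > self.max_pages:
            return False
        recent = self._uploads.setdefault(user_id, deque())
        # Forget uploads that have slid out of the window.
        while recent and now - recent[0] >= self.window_s:
            recent.popleft()
        if len(recent) >= self.max_uploads:
            return False
        recent.append(now)
        return True
```

Rejected requests are not recorded, so a user who hits the page cap does not also lose one of their hourly uploads; subscribers would simply skip this check.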

> would it be possible to specify pages of a large pdf?

This is definitely something I'm aiming to ship soon. I'm trying to deliver both a simple upload experience with an enjoyable audiobook output and some more fine-grained controls (like specifying which pages to use, among other things).
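A page selector like the one described could accept a print-dialog-style range string such as "1-5, 8, 10-12". A sketch of parsing one, assuming 1-based page numbers as users see them; `parse_page_ranges` is a hypothetical helper:

```python
def parse_page_ranges(spec: str, page_count: int) -> list[int]:
    """Parse a 1-based page spec such as "1-5, 8, 10-12" into sorted, unique page numbers."""
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas, e.g. "1-3,,5"
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        # Reject reversed ranges and pages outside the document.
        if not 1 <= start <= end <= page_count:
            raise ValueError(f"invalid range {part!r} for a {page_count}-page PDF")
        pages.update(range(start, end + 1))
    return sorted(pages)
```

Libraries such as pypdf index pages from zero (`reader.pages[0]` is page 1), so subtract one before looking pages up.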

> I'll get back to you with a list after using it for a few weeks.

I'm definitely looking forward to this! Please don't hesitate to email us at support [at] TruRecord.com (there is also a support email link within the app). We'd be more than happy to meet with you over Zoom to learn from your experience and work towards delivering great functionality.

iamcreasy · on Feb 12, 2024

Cool. Besides the monthly subscription, it would be much appreciated if you offered an alternative, such as prepurchased tokens.

adi4213 · on Feb 13, 2024

Something like a package of X Audiobook Conversions for $Y?

iamcreasy · on Feb 14, 2024

atlas_hugged · on Feb 11, 2024

Nice. Too bad all my uploads failed

adi4213 · on Feb 11, 2024

I appreciate you trying it out, and I'm sorry that this was your experience! I'd be more than happy to look into what happened if you wouldn't mind sending an email to support [at] trurecord.com. At the moment, the free trial has a limit of 50 pages per PDF (I'll make this clearer in the app) and requires selectable text (although I'm working on adding OCR soon).
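Whether a PDF has selectable text can be checked before processing begins, so a scanned document fails with a clear message (or is routed to OCR) instead of producing empty audio. A heuristic sketch, assuming per-page text has already been extracted; `needs_ocr` and its thresholds are hypothetical:

```python
def needs_ocr(page_texts: list[str], min_chars: int = 25, min_text_fraction: float = 0.5) -> bool:
    """Guess whether a PDF is scanned, i.e. too few pages have selectable text.

    page_texts holds the extracted text of each page; a page counts as having
    selectable text if it yields at least min_chars characters after trimming whitespace.
    """
    if not page_texts:
        return True  # nothing extractable at all
    text_pages = sum(1 for t in page_texts if len(t.strip()) >= min_chars)
    return text_pages / len(page_texts) < min_text_fraction
```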

checker659 · on Feb 11, 2024

Does it work with math formulas?

adi4213 · on Feb 11, 2024

It's definitely a work in progress, but it's something active development is focused on. The upcoming update handles this in a few steps: an OCR tool identifies math formulas, applies a bounding box, and takes an image. That image gets sent to a multimodal LLM, which attempts to "describe" the formula reasonably. While not yet perfect, I anticipate this will improve quite a bit soon. The same approach is going to be applied to tables, graphs, figures, and images.
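The flow described here (detect non-text regions, crop them by bounding box, and have a multimodal LLM describe them) can be sketched as a routing step that stitches descriptions back into the reading order. In this sketch the layout detection is assumed to have already happened, `describe` is a hypothetical stand-in for the crop-and-ask-the-LLM call, and reading order is approximated top to bottom, which only holds for single-column pages:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Region:
    kind: str  # e.g. "text", "formula", "table", "figure", as labeled by a layout/OCR model
    bbox: tuple[float, float, float, float]  # (x0, y0, x1, y1), with y growing downward
    text: str = ""  # extracted text; only meaningful for "text" regions


def narrate_page(regions: list[Region], describe: Callable[[Region], str]) -> str:
    """Read text regions verbatim and replace every other region with a spoken description."""
    spoken = []
    # Approximate reading order: top to bottom, then left to right (single-column layouts only).
    for region in sorted(regions, key=lambda r: (r.bbox[1], r.bbox[0])):
        spoken.append(region.text if region.kind == "text" else describe(region))
    return " ".join(s for s in spoken if s)
```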

lordgrenville · on Feb 12, 2024

I once listened to a (human-made) audiobook where the narrator read all the mathematical notation as names of symbols ("open parenthesis open parenthesis", etc., in a discussion of lambda calculus!). So converting the notation into natural language requires domain knowledge beyond that of regular TTS. Maybe LLMs could help, but it's a problem to use an LLM for something where 100% accuracy is important and there's no easy way to validate the output.
