Show HN: Oration (iOS) turns pdfs into audiobooks (oration.app)
84 points by adi4213 on Feb 11, 2024 | 43 comments
Hello HN community!

I'm excited to introduce a project I've recently launched: Oration, an iOS app designed to convert PDFs into audiobooks. This idea was inspired by my experiences as an engineering student with ADHD, struggling to engage with dense academic papers. Relying on Text-to-Speech tools, despite their robotic quality, was a workaround for me and others with similar learning preferences or challenges, such as Dyslexia.

Recognizing the limitations of existing tools—difficulty with complex formats, inability to skip over citations or footnotes, and inadequate handling of tables, graphs, and figures—I developed Oration. Our goal is to refine these areas continuously, offering both summarized and full versions of PDFs for a more accessible learning experience.

Oration aims to serve as a high-quality, user-friendly platform for auditory learners and those who find traditional reading methods challenging, with features akin to popular audiobook apps like Audible or Spotify.

How Oration Works:

    1. Download the app and sign up using either a username and password or through Google, with a 2-week free trial that doesn't require a payment method.
    2. Upload a PDF document.
    3. Within about 5-10 minutes, you'll receive a notification that your Audiobook is ready.
    4. Listen to your Audiobook directly in the app or through a browser-based web player, which also facilitates easy sharing with friends and family.
Also, to emphasize: all audio you generate is yours to own! We're working on updates that will make it easy to export .MP3 files of the Oration Audiobooks you create.

For an example of how the web player looks and functions, check out this link: https://player.oration.app/75e079c1-bd7e-4a16-8e02-23636837a...

I believe Oration can significantly benefit those who prefer or require alternative learning formats. We're committed to enhancing the app's functionality and user experience, so feedback and constructive criticism are always welcome.

Thank you for considering Oration, and I hope it proves to be a valuable tool for you or someone you know.




I am a grad student and I was going for something similar with converting papers to text which could then be used in an audio app like speechify with this - https://github.com/kanodiaayush/make-doc-listenable . I love the idea of this and will try it out, good luck!


Thank you so much for your comment! I'm looking at your repo and it looks really cool! Our backend is an ensemble of a few different things, but I explored using `poppler`, `PyPDF2` and other libraries and services as well! I'm really glad to see how accessible it has become to write a service that extracts text, meaningfully processes it, and generates nice-sounding audio. I hope that Oration provides a nice enough UI/UX for users to enjoy, but it's been fantastic to see a lot of open source work in this area. PDF is certainly not an ideal format to use so ubiquitously, but I don't see that changing all that soon, particularly in academic settings. I'd love to chat more about this if you don't mind shooting me an email! support [at] trurecord.com


Also grad student. I use Voice Dream reader on my devices and it's helped a lot with reading dense texts

https://www.voicedream.com/


I'd love to know the difference if you get a chance to compare these two.


I don't know Winston (VoiceDream developer) personally, but there are a bunch of things that impress me about both his product and himself. On the product side, it's long been a well-established app in the space. I think it's been out for 10+ years, offers a lot of voice options, handles a lot of input document formats, has good support for offline playback, and has been featured in a bunch of publications. I was also very impressed when I read this: https://www.voicedream.com/macos-reader-subscription/ I really admire Winston for bootstrapping VoiceDream for so long. His initial users bought the iOS app for $2, and he has stayed true to providing them with a feature set that has grown considerably since the app's origins. His blog post also details how he was on vacation when VoiceDream had a P0/downtime issue and he caught the first flight back to address it, motivated by the many users who really depended on the app (such as students studying for exams). There is a ton to admire here.


Winston sold the app last year, and it's being maintained by new developers now. https://www.perkins.org/resource/changes-with-voice-dream-ap...


I would value a "notify me when it comes to Android" link, since your example is pretty good but not enough for me to buy an iOS device and the "Sign up" just unhelpfully points to the appstore (I say "unhelpfully" because you obviously do have a web presence, given the example player and the fact that clicking on the sample title just redirects back to https://player.oration.app implying that one could be logged in to the website)

Actually, having written all of that, I would value just being able to submit things via your site, since based on your description it doesn't do any on-device processing, so why do I even need an app?


Thank you for your comment! We're actively working on releasing an Android app. We definitely didn't want to neglect Android and are aiming to ship a great experience on the Play Store soon. We actually do have a preliminary web app that I'd be happy to share with you. It's not as polished as the iOS app, so kindly bear that in mind (we're expecting to ship an improved browser app shortly).

I appreciate you valuing submitting things via the site : give this a try! https://www.oration.app/accounts/signup/

> why do I even need an app?

This is a valid question. We prioritized building out a nice experience on an iOS app first and will release a solid web counterpart in the near future.

Definitely would love to hear your feedback and stay in touch


This topic has always interested me. For people looking for a free alternative (free for Apple users), I recommend looking into iOS and MacOS spoken content under Accessibility [0]

It's a bit finicky at times but the pros are

1. Free

2. It works on anything on your iPhone, iPad or Mac screen

3. Apple's Siri voices are actually really good! (Better than Speechify voices)

[0] https://support.apple.com/en-ph/guide/iphone/iph96b214f0/ios


> For people looking for a free alternative (free for Apple users)

Thanks for pointing this out. Fwiw we do hope to continue providing a useful free tier for users. I appreciate your comment because it points to services that iOS/macOS users reliably get for free by virtue of being Apple users.

> Apple's Siri voices are actually really good! (Better than Speechify voices)

This is really interesting - particularly because Speechify makes a pretty substantial amount of revenue from paid subscriptions. I imagine that Apple has the resources and capability to continue to improve their voice quality


Wait, Samantha, one of the default Siri voices, sounds not even close to the quality of what I heard in the examples above for this app.

I think it’s the same that Safari uses for the “Listen to this page” feature?

The two don’t seem comparable even. What am I missing?


Also, there are two versions of the Samantha voice. One is 11 MB (downloaded by default) and the other is 152 MB (you have to download it from Settings). The improvement with the larger version is very noticeable.


Thank you, that is a big difference.

With the better voice downloaded, what’s your opinion on the quality difference between the app here and iOS?


Oration uses OpenAI's text-to-speech system, which I personally found to have superior quality to Apple's (but I also downloaded the upgraded Samantha voice and have been enjoying using it, so I appreciate the recommendation!)


Wow, I didn’t know. That’s basically perfect for me. Thank you!


Did you try downloading the bigger Siri voices? I'm using English (US) Voices -> Siri -> Voice 5 (90mb)


Unfortunately it has to be finicky, otherwise it'd cannibalize audiobook apps (Amazon got angry recently).


I don't think that's true. SOTA open source is still much worse than ElevenLabs and requires a dedicated modern GPU for any reasonable generation speed. It certainly can't run on demand, on device, on an iPhone.

Apple already is doing very high quality generated audiobooks, but you do it for your book as the author.

https://authors.apple.com/support/4519-digital-narration-aud...

"Mitchell" sounds exactly like a very popular narrator "Ray Porter".


I appreciate your work; however, not allowing the owner of the book to own the audio generated through your product does everyone a disservice. Everything seems to be locked into the app or the web interface. If that's a misunderstanding on my part, I apologize.

So if we pay for your product, we should own what it produces, for the sake of long-term use and accessibility. Please allow the end user to download the audio in a standard format like MP3 / MP4.


I'm sorry, I didn't make this clear! You very much own the audio generated! That was my intention from the beginning, and I will make sure that the app along with our terms of service reflect this prominently. I'll make some updates today to make it straightforward for the user to download an .MP3 of what they create on Oration.


Thank you for clarifying! Best of luck with your business, and I hope to put your service to use.


In the sample on the website, the abbreviation inside the parentheses is skipped, which leads to "LLMs" being used without ever being defined. That's not a big deal for anyone familiar with the term, but papers on new subjects might keep the listener more engaged if that initial definition weren't skipped. The audio itself sounded very natural! How would it work for fiction and multiple voices (future capabilities?)?


Thank you for pointing this out! This was our service being a bit 'over-eager' at skipping citations. I'll be working on an update over the next couple of days to address this - that is, to ensure that key definitions are not skipped
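
A minimal sketch of what a less eager citation filter could look like. The two patterns below are illustrative assumptions, not Oration's actual rules: they drop numeric markers like [3, 7] and author-year parentheticals like (Brown et al., 2020), and leave every other parenthetical, such as an abbreviation definition, in place.

```python
import re

# Hypothetical patterns for illustration only; not Oration's real filter.
# Numeric citations such as [3], [3, 7] or [3-5].
NUMERIC_CITATION = re.compile(r"\s*\[\d+(?:\s*[,\-–]\s*\d+)*\]")
# Author-year citations: a parenthetical that ends in a four-digit year,
# e.g. (Brown et al., 2020) or (Lee and Kim, 2019; Wu, 2021a).
AUTHOR_YEAR_CITATION = re.compile(r"\s*\([^()]*\b(?:19|20)\d{2}[a-z]?\)")


def strip_citations(text: str) -> str:
    """Remove citation markers but keep other parentheticals like "(LLMs)"."""
    text = NUMERIC_CITATION.sub("", text)
    text = AUTHOR_YEAR_CITATION.sub("", text)
    return text


sentence = (
    "Large language models (LLMs) have improved (Brown et al., 2020) "
    "and keep improving [3, 7]."
)
print(strip_citations(sentence))
```

One known weakness of this heuristic: a non-citation parenthetical that happens to end in a year, such as "(first released in 2020)", would also be removed, so a real filter would need more context than a regex.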


Congrats on the launch! This is pretty similar to a project I'm working on that turns articles into podcasts: https://a-to-p.vercel.app Feel free to give it a try.

I'd love to chat about how you generate your audiobooks if you're open to sharing. Good luck with everything!


Thank you! Great job with your project as well! I'd love to chat - can you send me an email? support [at] TruRecord.com?


This will be useful once it works properly. I tried it but didn’t write a review to not tank your score this early. The app needs some more time in the oven: Menu options aren’t responsive, swiping back brings you to views you shouldn’t be able to go to, all my uploads failed due to timeouts and 50 pages is not a lot for something that advertises turning PDFs into audio books. I'll keep it installed and see where you’ll go with this because the idea is great.


Thank you kindly for your incredibly helpful and constructive response. I apologize that you experienced these issues, and I'll gladly put in some hard work today and this week to resolve them.

> Menu options aren’t responsive, swiping back brings you to views you shouldn’t be able to go to

We'll work on a new iOS release to improve on this shortly.

> all my uploads failed due to timeouts

Oof! I'll get this sorted ASAP.


I love the idea and effort, but when I first tried using it I got a network timeout error when I tried uploading 1 chapter of a PDF book (45 pages, 7MB).


Thank you for giving it a try, and for your comment! I'm sorry you ran into this. Would you mind sending us an email at support [at] trurecord.com? I'll look into what happened right away and fix it.


It's certainly an interesting and helpful idea. Kudos for working on the idea, and it will be exciting to see the progress in a year or two.


Thank you kindly for your feedback! We're looking forward to putting in some solid work and making good progress over the next couple of years. If you happen to have any feedback or feature wishes, we're definitely all ears!


Really cool project. This will definitely help me read more papers. Can you share more on how the backend is parsing and converting text to speech?

UI issue: Login by google icon covers the password box when trying to create a new account on my phone.

Do you have any formal channel for feature request? I'll pay for this app.


> Really cool project. This will definitely help me read more papers. Can you share more on how the backend is parsing and converting text to speech?

Thank you kindly! I'm really glad to hear this. It's exactly what I hoped for when developing the app. Happy to share more details: right now we use an ensemble of methods to parse uploaded PDFs. A chunk of this involves using GROBID (https://grobid.readthedocs.io/en/latest/), a machine learning library aimed at parsing academic papers. Funnily enough, GROBID is itself a cascade of sequence labeling models trained on document parsing. The text-to-speech portion is driven by OpenAI's text-to-speech models, which in my experience seem to deliver market-leading audio quality. The summarization is driven by GPT-3.5-turbo. As such, the platform does focus quite a bit on making good-sounding audiobooks from academic PDFs. Some of the updates on the roadmap will include improved handling of tabular, graph, and figure content along with mathematical and scientific equations. It's likely that a multi-modal LLM could do a reasonable job of describing this content in spoken form.
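
One practical detail between the parsing step and the text-to-speech step is that hosted TTS endpoints cap the input length per request, so the extracted text has to be split before synthesis. Below is a minimal sketch of sentence-aware chunking. The 4096-character default is an assumption based on OpenAI's documented limit for its speech endpoint, and `chunk_for_tts` is a hypothetical helper, not part of Oration's backend.

```python
import re


def chunk_for_tts(text: str, max_chars: int = 4096) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars."""
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single sentence longer than the limit is hard-split.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


# Tiny limit so the behaviour is visible.
print(chunk_for_tts("One. Two! Three?", max_chars=10))
```

Each chunk would then be sent to the TTS service separately and the resulting audio segments concatenated in order. Splitting on sentence boundaries avoids cutting a word or clause in half at a chunk edge, which tends to be audible.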

> UI issue: Login by google icon covers the password box when trying to create a new account on my phone.

My apologies about this! I'll get this fixed asap

> Do you have any formal channel for feature request? I'll pay for this app.

Very much appreciate this. Could you reach out to support [at] trurecord.com? I would love to touch base about the feature requests you have in mind. I'm really keen to deliver a great experience for users like yourself and am eager to learn about what you'd find helpful.

Thank you again for your message and look forward to getting in touch


Thanks for your response. I'll get back to you with a list after using it for a few weeks.

Can it handle large pdf of a book, or would it be possible to specify pages of a large pdf?


> Can it handle large pdf of a book

It very much can; however, the iOS app has a limit of 50 pages per PDF and an hourly limit of 5 uploads in our 'free tier'. I didn't want to rush to monetize the iOS app so I could really learn from users like yourself, and subsequently work hard to make the app great to use. Currently, to sign up for a subscription you can go to the 'Subscription' setting on our web app (http://oration.app/accounts/login/). Subscribing removes both the page limit and the hourly upload limit.
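
For illustration, here is a minimal sketch of how limits like these could be enforced with an explicit reason returned to the client, so an over-limit upload is rejected with a clear message. The 50-page and 5-per-hour values come from the comment above; the `UploadLimiter` class, its in-memory storage, and the messages are hypothetical and not Oration's actual implementation.

```python
import time
from collections import deque
from typing import Optional

MAX_PAGES = 50            # free-tier page limit per PDF
MAX_UPLOADS_PER_HOUR = 5  # free-tier hourly upload limit
WINDOW_SECONDS = 3600


class UploadLimiter:
    """In-memory sliding-window limiter (hypothetical, single process only)."""

    def __init__(self) -> None:
        self._uploads: dict[str, deque] = {}

    def check(
        self,
        user_id: str,
        page_count: int,
        subscribed: bool = False,
        now: Optional[float] = None,
    ) -> tuple[bool, str]:
        # Subscribers bypass both limits.
        if subscribed:
            return True, "ok"
        if page_count > MAX_PAGES:
            return False, f"free tier is limited to {MAX_PAGES} pages per PDF"
        now = time.time() if now is None else now
        recent = self._uploads.setdefault(user_id, deque())
        # Forget uploads that have left the one-hour window.
        while recent and now - recent[0] >= WINDOW_SECONDS:
            recent.popleft()
        if len(recent) >= MAX_UPLOADS_PER_HOUR:
            return False, f"free tier is limited to {MAX_UPLOADS_PER_HOUR} uploads per hour"
        recent.append(now)
        return True, "ok"


limiter = UploadLimiter()
print(limiter.check("alice", page_count=45))
print(limiter.check("alice", page_count=120))
```

A production version would keep the counters in shared storage rather than process memory, but the check order (subscription, page count, then rate) would likely stay the same.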

> would it be possible to specify pages of a large pdf?

This is definitely something that I'm aiming to ship soon - I'm trying to both deliver something where a user can have a simple upload experience and an enjoyable Audiobook output, but also provide some more fine-grained handles (like specifying what pages to use, among other things)

> I'll get back to you with a list after using it for a few weeks.

Definitely eagerly looking forward to this! Please don't hesitate to email us at support [at] TruRecord.com (there is also a support e-mail link within the app). We'd also be more than happy to meet with you over Zoom to learn from your experience and work towards delivering great functionality.


Cool. Besides a monthly subscription, it would be highly appreciated if you offered an alternative, such as pre-purchased tokens.


Something like a package of X Audiobook Conversions for $Y?


Yes.


Nice. Too bad all my uploads failed


I appreciate you trying it out, and I'm sorry that this was your experience! I'd be more than happy to look into what happened if you wouldn't mind sending an email to support [at] trurecord.com. At the moment, the free trial has a limit of 50 pages per PDF (I'll make this clearer in the app) and requires selectable text (although I'm working on adding OCR soon).


Does it work with math formulas?


It's definitely a work in progress, but it's something that active development is focused on. The way this is being handled in an upcoming update involves a few steps: an OCR tool identifies math formulas, applies a bounding box, and takes an image. That image gets sent to a multimodal LLM, which attempts to "describe" the formula reasonably. While not yet perfect, this is something I expect to improve quite a bit soon. The same approach is going to be applied to tables, graphs, figures, and images.
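
The steps described above could be wired together roughly as follows. This is only a sketch under stated assumptions: `Box`, `pad_and_clamp`, and `describe_formulas` are hypothetical names, the 8-pixel margin is arbitrary, and the OCR detector, image cropper, and multimodal LLM call are passed in as plain callables because the comment does not say which tools are used.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, TypeVar

Image = TypeVar("Image")


@dataclass(frozen=True)
class Box:
    """Pixel bounding box of a detected formula on a page image."""
    left: int
    top: int
    right: int
    bottom: int


def pad_and_clamp(box: Box, page_width: int, page_height: int, margin: int = 8) -> Box:
    """Grow the box by a margin so symbols at the edge are not cut off,
    then clamp it to the page bounds."""
    return Box(
        left=max(0, box.left - margin),
        top=max(0, box.top - margin),
        right=min(page_width, box.right + margin),
        bottom=min(page_height, box.bottom + margin),
    )


def describe_formulas(
    boxes: Iterable[Box],
    page_width: int,
    page_height: int,
    crop: Callable[[Box], Image],      # stand-in for the image cropping step
    describe: Callable[[Image], str],  # stand-in for the multimodal LLM call
) -> List[str]:
    """Crop each detected formula and ask the model for a spoken description."""
    spoken = []
    for box in boxes:
        region = pad_and_clamp(box, page_width, page_height)
        spoken.append(describe(crop(region)))
    return spoken
```

Keeping the detector and the model behind callables makes it possible to test the geometry and ordering with fakes, independent of whichever OCR tool or LLM is actually used.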


I once listened to a (human-made) audiobook where the narrator read all the mathematical notation as names of symbols ("open parenthesis open parenthesis" etc, in a discussion of lambda calculus!) So knowing how to convert the notation into natural language requires some domain knowledge beyond that of regular TTS. Maybe LLMs could help, but it's a problem to use an LLM for something where 100% accuracy is important, and there's no easy way to validate the output.


[deleted]



