Does it work with math formulas?

adi4213 · on Feb 11, 2024

It's definitely a work in progress, but it's where active development is focused. An upcoming update handles it in a few steps: an OCR tool identifies math formulas, draws a bounding box around each one, and captures an image of it. That image is sent to a multimodal LLM, which attempts to "describe" the formula reasonably. It's not perfect yet, but I expect it to improve quite a bit soon. The same approach is going to be applied to tables, graphs, figures, and images.
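
A minimal sketch of the detect, crop, and describe flow described above. This is not the project's actual code: `Box`, `crop`, `describe_formulas`, and the two stub functions are hypothetical names, the page is a toy grid of pixel values, and the real OCR detector and multimodal LLM are passed in as plain callables.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Image = List[List[int]]  # toy grayscale page: rows of pixel values


@dataclass(frozen=True)
class Box:
    """Bounding box in pixel coordinates; right and bottom are exclusive."""
    left: int
    top: int
    right: int
    bottom: int


def crop(page: Image, box: Box) -> Image:
    """Cut the region inside the bounding box out of the page."""
    return [row[box.left:box.right] for row in page[box.top:box.bottom]]


def describe_formulas(
    page: Image,
    detect: Callable[[Image], Sequence[Box]],
    describe: Callable[[Image], str],
) -> List[str]:
    """Detect formula regions, crop each one, and describe each crop."""
    return [describe(crop(page, box)) for box in detect(page)]


# Stand-ins for the real OCR detector and multimodal LLM (both hypothetical).
def fake_detect(page: Image) -> Sequence[Box]:
    return [Box(left=1, top=1, right=3, bottom=2)]


def fake_describe(image: Image) -> str:
    return f"formula image of {len(image[0])}x{len(image)} pixels"


page = [
    [0, 0, 0, 0],
    [0, 9, 9, 0],
    [0, 0, 0, 0],
]
print(describe_formulas(page, fake_detect, fake_describe))
# prints ['formula image of 2x1 pixels']
```

Passing the detector and the describer in as callables keeps the flow testable with stubs, and the same function would presumably cover tables, graphs, and figures by swapping in a different detector.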

lordgrenville · on Feb 12, 2024

I once listened to a (human-made) audiobook where the narrator read all the mathematical notation as names of symbols ("open parenthesis open parenthesis" etc., in a discussion of lambda calculus!) So knowing how to convert the notation into natural language requires some domain knowledge beyond that of regular TTS. Maybe LLMs could help, but it's a problem to use an LLM for something where 100% accuracy is important and there's no easy way to validate the output.
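
To illustrate the symbol-by-symbol reading described above, here is a small sketch that reads a lambda-calculus term the way that narrator did. The symbol table and the function name `read_symbol_by_symbol` are made up for the example, and the term is just an arbitrary one.

```python
# Hypothetical symbol table: spoken names for a few characters.
SYMBOL_NAMES = {
    "(": "open parenthesis",
    ")": "close parenthesis",
    "λ": "lambda",
    ".": "dot",
}


def read_symbol_by_symbol(expr: str) -> str:
    """Name each non-space character in order; letters are spoken as-is."""
    return " ".join(SYMBOL_NAMES.get(ch, ch) for ch in expr if not ch.isspace())


print(read_symbol_by_symbol("((λx.x) y)"))
# prints: open parenthesis open parenthesis lambda x dot x close parenthesis y close parenthesis
```

A reader with domain knowledge would more likely say something like "the identity function applied to y", which needs the structure of the term, not just the names of its characters.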