
Does it work with math formulas?



It's definitely a work in progress, but it's something active development is focused on. The approach in an upcoming update has a few steps: an OCR tool identifies math formulas, draws a bounding box around each one, and captures an image of it. That image gets sent to a multimodal LLM, which attempts to "describe" the formula reasonably well. It's not yet perfect, but I expect it to improve quite a bit soon. The same approach is going to be applied to tables, graphs, figures, and images.
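
To make the flow above concrete, here is a minimal sketch of the detect, crop, describe loop. It is not the project's actual code: `Box`, `crop`, `describe_formulas`, and the two callbacks are hypothetical names, and the page is modelled as a plain nested list of pixels so the example runs without any OCR or LLM dependency.

```python
from dataclasses import dataclass
from typing import Callable, List

# Assumption: a page is a row-major grid of grayscale pixels.
# A real pipeline would use a proper image type instead.
Image = List[List[int]]


@dataclass
class Box:
    """Hypothetical bounding box; bottom and right are exclusive."""
    top: int
    left: int
    bottom: int
    right: int


def crop(page: Image, box: Box) -> Image:
    """Cut the region covered by the bounding box out of the page."""
    return [row[box.left:box.right] for row in page[box.top:box.bottom]]


def describe_formulas(
    page: Image,
    detect: Callable[[Image], List[Box]],  # stand-in for the OCR formula detector
    describe: Callable[[Image], str],      # stand-in for the multimodal LLM call
) -> List[str]:
    """Detect formula regions, crop each one, and ask for a spoken description."""
    return [describe(crop(page, box)) for box in detect(page)]


# Stub detector and describer, for illustration only.
page = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
fake_detect = lambda p: [Box(top=1, left=1, bottom=3, right=3)]
fake_describe = lambda img: f"formula region of {len(img)}x{len(img[0])} pixels"

print(describe_formulas(page, fake_detect, fake_describe))
```

Passing the detector and the describer in as callables keeps the loop identical if the same idea is later reused for tables, graphs, or figures; only the two callbacks would change.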


I once listened to a (human-made) audiobook where the narrator read all the mathematical notation as the names of the symbols ("open parenthesis open parenthesis" etc., in a discussion of lambda calculus!). So knowing how to convert the notation into natural language requires some domain knowledge beyond that of regular TTS. Maybe LLMs could help, but it's a problem to use an LLM for something where 100% accuracy is important and there's no easy way to validate the output.
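
A toy illustration of the gap between the two readings, as a sketch only: `spell_symbols` reads a lambda-calculus string one symbol at a time, the way that narrator did, while `verbalize` walks a parsed term and speaks its structure. The term classes, both function names, and the chosen phrasings are all made up for this example; real notation needs far more domain knowledge than this.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Var:
    name: str


@dataclass
class Lam:
    param: str
    body: "Term"


@dataclass
class App:
    func: "Term"
    arg: "Term"


Term = Union[Var, Lam, App]

# Names used when reading the raw text one character at a time.
SYMBOL_NAMES = {
    "(": "open parenthesis",
    ")": "close parenthesis",
    "λ": "lambda",
    ".": "dot",
}


def spell_symbols(text: str) -> str:
    """Naive reading: name every symbol in order, ignoring structure."""
    return " ".join(SYMBOL_NAMES.get(ch, ch) for ch in text if not ch.isspace())


def verbalize(term: Term) -> str:
    """Structure-aware reading: describe what the term means."""
    if isinstance(term, Var):
        return term.name
    if isinstance(term, Lam):
        return f"the function taking {term.param} to {verbalize(term.body)}"
    if isinstance(term, App):
        return f"{verbalize(term.func)} applied to {verbalize(term.arg)}"
    raise TypeError(f"not a lambda term: {term!r}")


source = "((λx.x) y)"
parsed = App(Lam("x", Var("x")), Var("y"))  # hand-built parse of `source`

print(spell_symbols(source))
print(verbalize(parsed))
```

Even the structure-aware phrasing can be ambiguous when spoken aloud, since nothing marks where the function body ends, which is part of why checking this kind of output is hard.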



