
I'd imagine many nested named capturing groups may trip even the best automated system! I do like the solution though.

I'd probably have approached it differently: first get the 'inverted' match (i.e. ignore anything that isn't a currency-like pattern), then refine from there. It's a bit like something I did a while back to parse garbled strings that may occur after OCR [0]. I imagine the approach does not translate fully, though, because it's pattern extraction rather than validation.

[0]: https://observablehq.com/@dleeftink/never-go-nuts
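
To make the 'extract first, refine later' idea concrete, here is a rough Python sketch. The pattern, the function names, and the refinement rule are all assumptions for illustration; they are not taken from the linked notebook or from the original solution.

```python
import re

# Hypothetical pattern: optional currency symbol, digits with optional
# thousands separators, and an optional two-digit decimal part.
CURRENCY_LIKE = re.compile(r"[$€£]?\d+(?:,\d{3})*(?:\.\d{2})?")


def extract_candidates(text: str) -> list[str]:
    """Step 1: keep only currency-like substrings and ignore everything else."""
    return [m.group(0) for m in CURRENCY_LIKE.finditer(text)]


def refine(candidates: list[str]) -> list[str]:
    """Step 2: narrow the candidates; here, keep those with a currency symbol."""
    return [c for c in candidates if c[0] in "$€£"]


text = "Total: $1,234.56 paid on 2024-01-05, tip €5.00 (ref 42)"
candidates = extract_candidates(text)  # still includes noise such as "42"
amounts = refine(candidates)
print(amounts)
```

The first pass is deliberately loose, so dates and reference numbers slip through; the second pass is where stricter, validation-style rules would go.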



Thanks for sharing! I have to admit I do not have the necessary brain cycles to spare today, but OCR processing is indeed of interest to me, and I will take a more in-depth look in the coming days.

The idea of an exclusionary approach sounds interesting as well. I'll have to think about that a bit.


Check out WordNinja in case regex doesn't cut it! [0]

[0]: https://github.com/keredson/wordninja


Will do, thanks again!



