
Heh, I just fed autoregex a regex from one of my projects, and it simply timed out. It comforts me to know that billion-dollar LLMs have to chew on those just as much as I do.


A proper litmus test! What's causing the hang in your case?

Regex, to me, is pattern finding and abstraction taken to the extreme. I like the challenge.


I think what's causing the hang is using the site. I gave it [a-f0-9]{8}(?:-[a-f0-9]{4}){3}-[a-f0-9]{12} and it sat thinking about things for a few minutes. So I tried the reverse and asked it "Match a UUID" and it sat and thought about things.

Now I am enlightened.
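
For anyone who wants to try that pattern without waiting on the site, here is a minimal Python sketch. The names `UUID_RE` and `find_uuids` are made up for illustration; only the pattern itself comes from the comment above. As written, it matches lowercase hex only.

```python
import re

# Pattern quoted in the comment above: 8 hex chars, three groups of 4, then 12.
# Lowercase only, because the character class is [a-f0-9].
UUID_RE = re.compile(r"[a-f0-9]{8}(?:-[a-f0-9]{4}){3}-[a-f0-9]{12}")


def find_uuids(text: str) -> list[str]:
    """Return every UUID-shaped substring found in text (hypothetical helper)."""
    # The only group is non-capturing, so findall returns the full matches.
    return UUID_RE.findall(text)


if __name__ == "__main__":
    print(find_uuids("id=123e4567-e89b-12d3-a456-426614174000;"))
```

Passing `re.IGNORECASE` to `re.compile` would probably be the simplest way to accept uppercase UUIDs too, if that is wanted.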


Aww shucks, and here I was feeling a wee bit proud of myself for managing to break it.


Good question. The regex I tried is for extracting amounts in EUR and USD:

  /
  (?<=^|[ \t])
  (?<currency_prefix_with_space>
    (?<currency_prefix>
      €|EUR|\$|USD
    )
    [ \t]?
  )?
  (?<number>
    (?<integral>
      -?
      \d{1,3}
      (?:[\.,]\d{3}|\d*)
    )
    [\.,]
    (?<fraction>
      \d{2,3}
    )
  )
  (?<currency_postfix_with_space>
    [ \t]?
    (?<postfix_or_ending>
      (?<currency_postfix>
        €|EUR|\$|USD
      )
    ) | (?<ending>
          [ \t]|$|\n
        )
  )/x


I'd imagine that many nested named capturing groups could trip up even the best automated system! I do like the solution though.

I would've probably approached it differently, trying to first get the 'inverted' match (i.e. ignore anything that isn't a currency-like pattern) and refine from there. A bit like this one I did a while back, to parse garbled strings that may occur after OCR [0]. I imagine the approach does not translate fully, because it's pattern extraction rather than validation.

[0]: https://observablehq.com/@dleeftink/never-go-nuts


Thanks for sharing! I have to admit I do not have the necessary brain cycles to spare today, but OCR processing is indeed of interest to me, and I will take a more in-depth look in the upcoming days.

The idea of an exclusionary approach sounds interesting as well. I'll have to think about that a bit.


Check out WordNinja in case regex doesn't cut it! [0]

[0]: https://github.com/keredson/wordninja


Will do, thanks again!



