Heh, I just fed autoregex a regex from one of my projects, and it simply times o...

dleeftink · 2025-03-07T14:36:36 1741358196

A proper litmus test, what's causing the hang in your case?

Regex, to me, is pattern finding and abstraction taken to the extreme. I like the challenge.

tclancy · 2025-03-07T15:19:45 1741360785

I think what's causing the hang is using the site. I gave it [a-f0-9]{8}(?:-[a-f0-9]{4}){3}-[a-f0-9]{12} and it sat thinking about things for a few minutes. So I tried the reverse and asked it "Match a UUID" and it sat and thought about things.

Now I am enlightened.
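
For comparison, the pattern itself is cheap to evaluate locally, which suggests the wait is on the site rather than in the regex. A minimal sketch in Python; the `is_uuid` helper name and the use of `fullmatch` to anchor the pattern are assumptions for illustration, not anything the site does:

```python
import re

# The pattern quoted above, unchanged. It only covers lowercase hex digits.
UUID_RE = re.compile(r"[a-f0-9]{8}(?:-[a-f0-9]{4}){3}-[a-f0-9]{12}")


def is_uuid(text: str) -> bool:
    """Return True if the whole string is a lowercase, hyphenated UUID."""
    # fullmatch anchors the pattern at both ends of the string.
    return UUID_RE.fullmatch(text) is not None


print(is_uuid("123e4567-e89b-12d3-a456-426614174000"))  # True
print(is_uuid("123E4567-E89B-12D3-A456-426614174000"))  # False (uppercase hex)
```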

janfoeh · 2025-03-07T15:56:28 1741362988

Aww shucks, and here I was feeling a wee bit proud of myself for managing to break it.

janfoeh · 2025-03-07T14:51:24 1741359084

Good question. The regex I tried is for extracting amounts in EUR and USD:

  /
  (?<=^|[ \t])
  (?<currency_prefix_with_space>
    (?<currency_prefix>
      €|EUR|\$|USD
    )
    [ \t]?
  )?
  (?<number>
    (?<integral>
      -?
      \d{1,3}
      (?:[\.,]\d{3}|\d*)
    )
    [\.,]
    (?<fraction>
      \d{2,3}
    )
  )
  (?<currency_postfix_with_space>
    [ \t]?
    (?<postfix_or_ending>
      (?<currency_postfix>
        €|EUR|\$|USD
      )
    ) | (?<ending>
          [ \t]|$|\n
        )
  )/x
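
For anyone who wants to poke at it outside the site, here is a rough Python port. It is a sketch under a few assumptions rather than the exact original: named groups are spelled `(?P<name>...)`, the lookbehind is rewritten because Python's `re` only accepts fixed-width lookbehinds, `re.MULTILINE` is added so that `^` and `$` work per line, and the `find_amount` helper is a made-up name for illustration.

```python
import re

# Hypothetical Python port of the pattern above. Differences from the original:
#   - (?<name>...) is written as (?P<name>...)
#   - (?<=^|[ \t]) is written as (?:^|(?<=[ \t])), because Python's re
#     rejects lookbehinds whose alternatives differ in width
#   - re.MULTILINE is an assumption, so that ^ and $ match at line boundaries
AMOUNT_RE = re.compile(
    r"""
    (?:^|(?<=[ \t]))
    (?P<currency_prefix_with_space>
      (?P<currency_prefix>
        €|EUR|\$|USD
      )
      [ \t]?
    )?
    (?P<number>
      (?P<integral>
        -?
        \d{1,3}
        (?:[.,]\d{3}|\d*)
      )
      [.,]
      (?P<fraction>
        \d{2,3}
      )
    )
    (?P<currency_postfix_with_space>
      [ \t]?
      (?P<postfix_or_ending>
        (?P<currency_postfix>
          €|EUR|\$|USD
        )
      ) | (?P<ending>
            [ \t]|$|\n
          )
    )
    """,
    re.VERBOSE | re.MULTILINE,
)


def find_amount(text):
    """Return the named groups of the first amount found in text, or None."""
    match = AMOUNT_RE.search(text)
    return match.groupdict() if match else None


hit = find_amount("Total: €1.234,56 due")
print(hit["number"], hit["integral"], hit["fraction"])  # 1.234,56 1.234 56
print(find_amount("12,50 EUR")["currency_postfix"])     # EUR
print(find_amount("x12,50"))                            # None
```

Note that the alternation sits directly inside `currency_postfix_with_space`, so a match ends either with an optional space plus a currency postfix, or with the `ending` branch (space, end of line, or newline).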

dleeftink · 2025-03-07T15:13:49 1741360429

I'd imagine that many nested named capturing groups could trip up even the best automated system! I do like the solution though.

I would've probably approached it differently, trying to first get the 'inverted' match (i.e. ignore anything that isn't a currency-like pattern) and refine from there. A bit like this one I did a while back, to parse garbled strings that may occur after OCR [0]. I imagine the approach does not translate fully, because it's pattern extraction rather than validation.

[0]: https://observablehq.com/@dleeftink/never-go-nuts
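
To make the 'inverted' idea a bit more concrete, here is one possible reading of it as a two-pass sketch in Python. This is not the code from the linked notebook; the function names, the token-based coarse pass, and the sample sentence are all assumptions for illustration. The first pass throws away every whitespace-separated token that contains anything not currency-like, and the second pass refines what is left with a stricter number shape.

```python
import re

# Pass 1 (coarse, exclusionary): a token survives only if it is built entirely
# from currency markers, digits, separators, and minus signs.
CURRENCY_LIKE_TOKEN = re.compile(r"(?:€|\$|EUR|USD|[0-9.,-])+")

# Pass 2 (refine): the survivor must contain an amount with a 2-3 digit fraction.
AMOUNT_SHAPE = re.compile(r"-?\d{1,3}(?:[.,]\d{3}|\d*)[.,]\d{2,3}")


def currency_like_tokens(text):
    """Drop every token that contains a character outside the allowed set."""
    return [tok for tok in text.split() if CURRENCY_LIKE_TOKEN.fullmatch(tok)]


def refine(tokens):
    """Keep only the tokens that actually contain an amount-shaped number."""
    return [tok for tok in tokens if AMOUNT_SHAPE.search(tok)]


text = "Invoice 2025-03 total $1,234.56 or 12,50 EUR due Friday"
coarse = currency_like_tokens(text)
print(coarse)          # ['2025-03', '$1,234.56', '12,50', 'EUR']
print(refine(coarse))  # ['$1,234.56', '12,50']
```

The coarse pass deliberately lets false positives such as `2025-03` through; the point of the sketch is that the second pass only has to deal with a much smaller set of candidates.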

janfoeh · 2025-03-07T15:54:54 1741362894

Thanks for sharing! I have to admit I do not have the necessary brain cycles to spare today, but OCR processing is indeed of interest to me, and I will take a more in-depth look in the upcoming days.

The idea of an exclusionary approach sounds interesting as well. I'll have to think about that a bit.

dleeftink · 2025-03-08T05:12:26 1741410746

Check out WordNinja in case regex doesn't cut it! [0]

[0]: https://github.com/keredson/wordninja

janfoeh · 2025-03-08T11:23:39 1741433019

Will do, thanks again!