I have done something similar before, to help form an index for a large book. This is what I would consider a relatively simple regex.

The harder ones I have dealt with look for malformed syntax, where the closing mark, which can sit several thousand characters after the opening mark, might be missing, or the opening mark itself might be missing, across a data set of several hundred million characters. You then need something quite complex that picks out the distinctive characteristics of the content that is supposed to be enclosed, while avoiding the many similar structures that give false positives. Sometimes the technically easier solution (lookahead, lookbehind, etc.) is too slow to run, so you need to pivot and use other regex features. It can take a day or two to get right.
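To give a rough idea of what "pivoting" can look like, here is a minimal Python sketch, not the patterns I actually used. It assumes made-up `{{` and `}}` markers that do not nest, and the name `find_unbalanced` is purely illustrative. Instead of lookahead or lookbehind, it matches either marker with one plain alternation in a single pass and tracks the open/closed state in ordinary code.

```python
import re

# Hypothetical markers: the real syntax is not given above, so "{{" and "}}"
# are stand-ins. Markers are assumed not to nest.
TOKEN = re.compile(r"\{\{|\}\}")

def find_unbalanced(text):
    """Return (offset, problem) pairs for markers that have no partner."""
    problems = []
    open_at = None  # offset of the currently open "{{", if any
    for m in TOKEN.finditer(text):
        if m.group() == "{{":
            if open_at is not None:
                # A second opener before any closer: the earlier one was never closed.
                problems.append((open_at, "missing close"))
            open_at = m.start()
        else:
            if open_at is None:
                # A closer with nothing open: its opener is missing.
                problems.append((m.start(), "missing open"))
            open_at = None
    if open_at is not None:
        # Text ended while a marker was still open.
        problems.append((open_at, "missing close"))
    return problems

sample = "a {{ok}} b {{bad c {{ok}} d }} e"
print(find_unbalanced(sample))
```

On real data the hard part is still the one described above: making the token pattern specific enough that look-alike structures do not get counted as markers.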