Anybody knows what the problem with negation is? Can't you just swap accepting w...

mcyc · on April 9, 2023

It's because this produces NFAs not DFAs (due to the epsilon arcs), and so negation can't be done by just flipping the final states.

Take, for example, the regex "a|(aa)+" (the set of even length strings of "a"'s or just one "a"). If you use the construction from the article, you get an NFA with basically two arms: one that recognizes "a" and one that recognizes "(aa)+". The initial state contains an epsilon arc to the start state of each of these arms. If you just flip the final/non-final states in each arm, the resulting language contains "a", since we are no longer accepting each even length "a" string, but now each odd length "a" string, of which "a" is a member. Thus the new language is not the proper complement of "a|(aa)+", which would not contain "a".

8uhaal · on April 9, 2023

Ah ok, thanks. What I meant was that you could flip the states if it's a DFA. If you have an NFA then that won't work, as shown by your counterexample.

I suppose you generally accept in an NFA if there is an accepting state epsilon-reachable, so if you flip the accepting states, you could still have an accepting state epsilon-reachable, which is why this doesn't work. In an DFA, there is only one state that's (trivially) epsilon-reachable, so the construction works.

mcyc · on April 9, 2023

While epsilon transitions often, but not always, cause this issue of not being able to complement the language by just flipping final/non-final states, it isn't always the case. For example, you can make an ambiguous NFA with no epsilon transitions by just doing epsilon removal, and then you still have the same problem but no non-trivial epsilon-reachable states!

On the other hand, you can have an unambiguous [1] NFA (for example, a DFA, but you insert some dummy epsilon transitions between a split up state) where you can just flip the final/non-final states and complement the language.

So, in the end, complementing languages described by NFAs without determinizing them first is a bit of a tricky problem.

[1]: https://en.wikipedia.org/wiki/Unambiguous_finite_automaton --- a superset of DFAs, but they can have epsilon transitions

8uhaal · on April 10, 2023

Oh, interesting! Thanks for the pointer!

But you can't always complement the language easily for an UFA either, right? The path to an accepting state may be unambiguous, but there could at the same time be a path to a non-accepting state, so flipping the states may keep some words in the language. And make the automaton even ambiguous.

zeroimpl · on April 9, 2023

That example doesn’t really fit since one easily could compute the negation of “a|(aa)+” in linear time by simply returning the opposite of the top-level node in the tree.

Perhaps something like “X|!Y” or similar might be impossible?

mcyc · on April 9, 2023

The parent comment asked specifically about negating the finality of each state, which I addressed. As for how to actually implement complement in a regex engine, there are certainly some strategies like the one you mentioned that could be used (but the point of my comment was that just swapping final/non-final states is not a valid one).

zeroimpl · on April 9, 2023

Ah I thought they meant flipping the finality of the final state, but I guess they said "states" so your interpretation makes more sense.

layer8 · on April 9, 2023

The problem is when combining the complement of an expression with another expression (such as when constructing an alternation of the two), because then the complement expression has to be constructed for one of the two, and that can grow double-exponentially in length: https://en.wikipedia.org/wiki/Regular_expression#Expressive_...