GVN and CSE only identify duplicate/common subexpressions. They do not tell you where to place the computation of the common subexpression.
The canonical algorithm to do that is to compute the dominance relation. A node X dominates Y if every path to Y must go through X. Once you have computed the dominance relation, if a common subexpression is located at nodes N1, N2, N3, you can place the computation at some shared dominator of N1, N2, and N3. Because dominance is a statement about /all/ paths, there is a unique lowest dominator [1]. This is exactly the "lowest single common ancestor."
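To make the placement step concrete, here's a minimal sketch assuming the dominator sets have already been computed (dom[n] = the set of all nodes that dominate n, with every node dominating itself; all names are hypothetical):

```python
def lowest_common_dominator(dom, nodes):
    # Nodes that dominate every one of the given nodes.
    common = set.intersection(*(dom[n] for n in nodes))
    # By the claim in [1], the dominators of any node are totally ordered,
    # so `common` is a chain; the lowest element is the one dominated by
    # all the others, i.e. the one with the largest dominator set.
    return max(common, key=lambda d: len(dom[d]))

# Diamond CFG: R -> A, R -> B, A -> C, B -> C.
dom = {'R': {'R'}, 'A': {'R', 'A'}, 'B': {'R', 'B'}, 'C': {'R', 'C'}}
print(lowest_common_dominator(dom, ['A', 'B', 'C']))  # -> R
```

With a dominator tree in hand this is just an LCA query; the set-based version above only works because dominators of a node form a chain.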
Note that dominance is also defined for cyclic graphs, though there may be faster algorithms to compute dominance for acyclic graphs. Expression graphs in non-lazy programming languages are almost always acyclic (in a lazy language like Haskell, by contrast, you can write cyclic expressions).
[1] Claim. Let A, B, and C be reachable nodes. Suppose A and B both dominate C. Then either A dominates B or B dominates A.
Proof. We prove the contrapositive. If neither A dominates B nor B dominates A, then there exist paths a, b from the root such that path a passes through A but not B and path b passes through B but not A. If there is no path from A to C, then A cannot dominate C as C is reachable. Similarly, if there is no path from B to C, then B cannot dominate C. So assume there are paths a' from A to C and b' from B to C. Then the path b.b' witnesses that A does not dominate C, and the path a.a' witnesses that B does not dominate C.
(There might be a bug in the proof; I think I proved something too strong, but I'm going to bed.)
I’m also a bit weirded out by the lack of mentions of dominators (in TFA or in the article[1] it links to). Once you have the dominator tree, though, you’re still going to need to answer LCA queries on it (in this problem and in an SSA-based compiler both), so there seems to be no way around this part.
The algorithm in the article does O(1) queries with O(V+E) preprocessing (assuming linear-preprocessing LCA, which, yeah). What’s the best algorithm for dominator trees? People usually talk about Lengauer–Tarjan[2], which is linear in practice (linear except for UNION-FIND), and not the linear one by Georgiadis[3,4]. Unfortunately, I’m not a compiler person.
In practice, if I have to implement dominators, it's probably for a toy language, so I just go with the (fast, quadratic) dataflow approach. If you already have a dataflow framework, it's pretty easy to implement: https://github.com/tylerhou/cs265/blob/main/tylerhou/lib/ssa...
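For reference, the quadratic dataflow version is just a fixpoint over dom[n] = {n} ∪ ⋂ dom[p] for predecessors p. A minimal standalone sketch (not the linked repo's code; assumes every node is reachable from the entry):

```python
def dominators(succ, entry):
    """Iterative fixpoint for dom[n] = {n} | intersection of dom over preds."""
    nodes = set(succ)
    preds = {n: set() for n in nodes}
    for n, ss in succ.items():
        for s in ss:
            preds[s].add(n)
    # Initialize every non-entry node to "dominated by everything".
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            new = {n} | set.intersection(*(dom[p] for p in preds[n]))
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom
```

It converges in O(V) passes worst case, but on real CFGs processed in reverse postorder it typically settles in a couple of iterations.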
You don't need to compute dominators. A topological sort of the graph gives you a computation order where your definitions are computed before their use.
I don't see how a topological sort helps you determine where to place the computation. If an expression A is topologically before an expression B, that does not mean A is computed on every path to B.
For an example, consider a three-node binary tree where R is the root, A is the left child, and B is the right child. A valid topological sort is R A B, but it is not the case that whenever B is computed, A has already been computed.
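The counterexample can be checked mechanically: enumerate every root-to-B path and observe that one of them skips A, so A does not dominate B even though it precedes B in a valid topological order.

```python
def all_paths(succ, src, dst, path=()):
    """Yield every path from src to dst in an acyclic graph."""
    path = path + (src,)
    if src == dst:
        yield path
    for nxt in succ[src]:
        yield from all_paths(succ, nxt, dst, path)

# R is the root; A and B are its children. Topological order R, A, B is valid.
succ = {'R': ['A', 'B'], 'A': [], 'B': []}
paths_to_b = list(all_paths(succ, 'R', 'B'))
print(any('A' not in p for p in paths_to_b))  # True: a path to B avoids A
```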
I mean, it does, but that's not what TFA's problem statement asks for, and I can imagine why one may not want to immediately take apart the expression into assembly-like SSA
let $0 = f a b in
let $1 = g $0 c in
...
and instead leave some original structure in place and some tree-level simplifications available.
CSE doesn't only identify; it eliminates (hence the E!). It refers to the complete solution.
I don't think you necessarily have to compute the dominance relation because it can pop out implicitly.
If CSE is done on intermediate code, the dominance relation will pop out from the direction in which the instructions are followed around the basic blocks.
E.g. simple case: we see t3 <= t2 + t1. So we make a note in some CSE hash table that we had a t2 + t1, and that the result is cached in t3. Then if we see t2 + t1 again in the same basic block, we can replace that with t3.
The dominance relation is that earlier instructions in the basic block dominate later ones, but we don't have to explicitly calculate it.
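A minimal sketch of that hash-table scheme, over one basic block of three-address instructions (dest, op, arg1, arg2). The representation is made up for illustration, and for brevity it ignores redefinition of operands, which a real pass must invalidate the table on:

```python
def local_cse(block):
    cache = {}  # (op, arg1, arg2) -> temp already holding that value
    out = []
    for dest, op, a, b in block:
        key = (op, a, b)
        if key in cache:
            # Seen this expression before in the block: reuse the cached temp.
            out.append((dest, 'copy', cache[key], None))
        else:
            cache[key] = dest
            out.append((dest, op, a, b))
    return out

block = [('t3', 'add', 't2', 't1'),
         ('t6', 'add', 't2', 't1')]
print(local_cse(block))  # second add becomes a copy of t3
```

Within a block, "earlier instruction dominates later instruction" is trivially true, which is exactly why no explicit dominance computation is needed here; it's crossing block boundaries that forces you to reason about paths.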