Pretty sure focal length would affect things as you say, but physical dimensions matter too. My program assumes a fixed focal length and then picks the page dimensions that work with that assumed focal length -- almost certainly not the correct ones.
The main rationale is removing redundant degrees of freedom -- since the page is allowed to rotate freely, the edges of the page can still move around plenty.
I see, so basically assume the two end points are at zero and there is some rotation accounting for the endpoint offset in real space.
It still doesn't seem fully accurate as I can imagine a non-rotated cubic curve with endpoints at an offset, but I assume your simplification works well enough.
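To make the "endpoints at zero" idea concrete, here is a minimal Python sketch of a cubic pinned to zero at both ends, which leaves only the two end slopes free. The names `alpha` and `beta`, the helper functions, and the unit interval are assumptions for illustration, not necessarily what the program does.

```python
def pinned_cubic(alpha, beta):
    """Coefficients (a, b, c, d) of f(x) = a*x^3 + b*x^2 + c*x + d
    on [0, 1] with f(0) = f(1) = 0, f'(0) = alpha, f'(1) = beta.
    alpha and beta are hypothetical parameter names."""
    a = alpha + beta
    b = -2.0 * alpha - beta
    c = alpha
    d = 0.0  # f(0) = 0 pins the constant term
    return a, b, c, d


def evaluate(coeffs, x):
    """Evaluate the cubic at x using Horner's rule."""
    a, b, c, d = coeffs
    return ((a * x + b) * x + c) * x + d


def slope(coeffs, x):
    """Evaluate the derivative 3a*x^2 + 2b*x + c at x."""
    a, b, c, _ = coeffs
    return (3.0 * a * x + 2.0 * b) * x + c
```

With this form, any height offset between the two ends has to come from rotating the page as a whole, since the curve itself cannot express it.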
If you decide to try to make this faster, check out Ceres[0], a non-linear least squares optimisation framework that does automatic differentiation using a clever C++ template hack.
I've used it a few times to solve these kinds of problems and found it to be very good!
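For anyone curious what that kind of automatic differentiation looks like, here is a rough Python sketch of the underlying idea (forward-mode differentiation with dual numbers). This is not Ceres's API: Ceres does it in C++ by evaluating a templated cost function on its own number type. The `Dual` class, `residual`, and `derivative` below are made-up names for illustration.

```python
class Dual:
    """A number val + eps*der with eps**2 == 0; der carries the derivative."""

    def __init__(self, val, der=0.0):
        self.val = val
        self.der = der

    @staticmethod
    def _lift(x):
        # Treat plain numbers as constants (derivative zero).
        return x if isinstance(x, Dual) else Dual(x)

    def __add__(self, other):
        other = Dual._lift(other)
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __sub__(self, other):
        other = Dual._lift(other)
        return Dual(self.val - other.val, self.der - other.der)

    def __rsub__(self, other):
        return Dual._lift(other) - self

    def __mul__(self, other):
        other = Dual._lift(other)
        # Product rule: (uv)' = u v' + u' v
        return Dual(self.val * other.val,
                    self.val * other.der + self.der * other.val)

    __rmul__ = __mul__


def residual(x):
    """Example cost term; written generically so it accepts float or Dual."""
    return x * x * x - 2.0 * x + 1.0


def derivative(f, x):
    """Derivative of f at x, obtained by seeding the dual part with 1."""
    return f(Dual(x, 1.0)).der
```

The appeal is that `residual` is written once as ordinary arithmetic, and the derivative falls out of running the same code on a different number type, with no hand-derived gradients.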
Yep, I'm still waiting to use ceres for something - I didn't end up using it on my image approximation project https://mzucker.github.io/2016/08/01/gabor-2.html because it doesn't work well with inequality constraints.
By the way, you mentioned using word boundaries in regexes to replace variable names. GNU Emacs regexes can actually include "symbol boundaries" (which are a little better for variable names than word boundaries), represented as "\_<symbol\_>". Personally, I like using the "highlight-symbol" package, which provides the "highlight-symbol-query-replace" command to basically execute M-% for the symbol at point.
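Outside Emacs, a similar effect can be approximated with lookarounds. Python's `re` module has no `\_<` / `\_>`, so the sketch below emulates symbol boundaries by assuming a symbol is made of word characters plus a hyphen; Emacs decides this from the major mode's syntax table, so the character set here is only an assumption, and `replace_symbol` is a hypothetical helper.

```python
import re


def replace_symbol(source, old, new, extra_chars="-"):
    """Replace whole-symbol occurrences of `old` with `new`.

    A symbol is assumed to consist of \\w characters plus `extra_chars`
    (a stand-in for Emacs's symbol-constituent syntax class).
    """
    constituent = r"[\w" + re.escape(extra_chars) + "]"
    # Match `old` only when it is not glued to another symbol character.
    pattern = rf"(?<!{constituent}){re.escape(old)}(?!{constituent})"
    # A function replacement keeps backslashes in `new` literal.
    return re.sub(pattern, lambda m: new, source)


source = "(setq foo 1) (setq foo-bar foo)"
by_word = re.sub(r"\bfoo\b", "baz", source)
by_symbol = replace_symbol(source, "foo", "baz")
```

Here `by_word` also rewrites the `foo` inside `foo-bar`, because the hyphen counts as a word boundary, while `by_symbol` leaves `foo-bar` alone.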
Author here - woke up this morning and saw a big bump in site traffic which led me back to HN -- nice to see folks are reading! Happy to answer any questions here or in the Disqus comments on my blog.