In general, I think you could show that an arithmetic-coding approach is optimal when the distribution of each bit given the previous bits is known. From there, you'd just have to show that the von Neumann approach for i.i.d. bits is no worse on average.
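For reference, the von Neumann approach mentioned above is only a few lines; a minimal sketch in Python, assuming the input is a list of i.i.d. biased bits:

```python
def von_neumann_extract(bits):
    """Map non-overlapping pairs: 01 -> 0, 10 -> 1, 00/11 -> discard.

    For i.i.d. bits, the pairs 01 and 10 occur with equal probability,
    so the output bits are unbiased -- at the cost of throwing away
    every 00 and 11 pair (and one input bit per pair kept).
    """
    out = []
    for a, b in zip(bits[0::2], bits[1::2]):
        if a != b:
            out.append(a)
    return out

# e.g. von_neumann_extract([0, 1, 1, 0, 0, 0, 1, 1]) == [0, 1]
```

The discarded 00/11 pairs are exactly the "wasted entropy" that a smarter scheme could still use.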
This is correct, and I came to the comments to say the same thing. It takes some work to implement without arbitrary-precision floating point, but arithmetic coding can make use of the full entropy in the input stream, whereas the approach in the article discards a lot of information.
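To illustrate the "some work" part: the standard trick is a fixed-precision integer coder with renormalization, so no arbitrary-precision floats are needed. A sketch in the style of the classic Witten–Neal–Cleary integer coder, assuming the bias `p1` (probability of a 1 bit) is known; this is an illustration, not the article's method:

```python
def arith_encode(bits, p1):
    """Encode a biased bit stream with a 32-bit integer arithmetic coder.

    If the model matches the source, the output bits are close to
    uniform, so this doubles as a randomness extractor that keeps
    nearly all of the input entropy.
    """
    PREC = 32
    TOP = (1 << PREC) - 1
    HALF = 1 << (PREC - 1)
    QUARTER = 1 << (PREC - 2)

    low, high = 0, TOP
    pending = 0  # bits deferred by the underflow (E3) rescaling
    out = []

    def emit(bit):
        nonlocal pending
        out.append(bit)
        while pending:           # flush deferred opposite bits
            out.append(1 - bit)
            pending -= 1

    for b in bits:
        span = high - low + 1
        # split the current interval in proportion to P(0) = 1 - p1
        split = low + int(span * (1 - p1)) - 1
        if b == 0:
            high = split
        else:
            low = split + 1
        while True:
            if high < HALF:                          # E1: top bit is 0
                emit(0)
            elif low >= HALF:                        # E2: top bit is 1
                emit(1)
                low -= HALF
                high -= HALF
            elif low >= QUARTER and high < HALF + QUARTER:  # E3: underflow
                pending += 1
                low -= QUARTER
                high -= QUARTER
            else:
                break
            low = 2 * low
            high = 2 * high + 1
    # flush enough bits to disambiguate the final interval
    pending += 1
    emit(0 if low < QUARTER else 1)
    return out
```

With a strongly biased input (say `p1 = 0.1`), a long run of zeros encodes to far fewer bits than it occupies, which is exactly the entropy the von Neumann scheme leaves on the table.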