The bound is completely invalid, and so are the NLL/PPL numbers reported with the MELBO. Look at the equation: if it were optimized directly, it would be driven trivially to 0 by the identity function whenever the latent space matches the input space. The MELBO is just the noiseless-autoencoder reconstruction error plus a constant offset of log(test set size), and that offset can itself be driven to zero by averaging the bound over test sets of size 1.
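To spell out the collapse in standard VAE notation (the symbols $q_\phi$, $p_\theta$, and the test set $\{x_1,\dots,x_N\}$ are my shorthand, not necessarily the paper's): with a "post-hoc aggregated" prior $p(z) = \frac{1}{N}\sum_{n} q_\phi(z\mid x_n)$ that already contains the test point $x_i$ being scored,

$$
p(z) \;\ge\; \frac{1}{N}\,q_\phi(z\mid x_i)
\quad\Longrightarrow\quad
\mathrm{KL}\big(q_\phi(z\mid x_i)\,\big\|\,p(z)\big) \;\le\; \log N,
$$

$$
-\,\mathrm{ELBO}(x_i) \;=\; \underbrace{\mathbb{E}_{q_\phi(z\mid x_i)}\big[-\log p_\theta(x_i\mid z)\big]}_{\text{reconstruction error}} \;+\; \mathrm{KL}\big(q_\phi(z\mid x_i)\,\big\|\,p(z)\big) \;\le\; \text{recon. error} + \log N.
$$

A noiseless autoencoder makes the reconstruction term vanish, and evaluating on test sets of size $N = 1$ makes the offset $\log N = 0$, so the reported number says nothing about the density assigned to the data.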
The mathematical/conceptual error is that each test point is added to the "post-hoc aggregated" prior before the bound is evaluated on it. This is analogous to including the test point in the training set. Another version of the same error would be adding a kernel centered on each test point to a kernel density estimate and then evaluating test-set NLL under it: obviously the best kernel then has variance 0 and assigns arbitrarily high likelihood to the test data.
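A minimal sketch of the KDE version of this failure (the Gaussian data and bandwidth grid are made up purely for illustration): as the bandwidth shrinks, the honest held-out NLL blows up, while the "contaminated" NLL — where the test points sit inside the density model, as in the reviewed bound — diverges to $-\infty$.

```python
import numpy as np
from scipy.special import logsumexp

rng = np.random.default_rng(0)
train = rng.normal(size=200)
test = rng.normal(size=50)

def kde_mean_loglik(x, centers, h):
    """Mean log-density of points x under a Gaussian KDE with bandwidth h built on `centers`."""
    d = (x[:, None] - centers[None, :]) / h                  # pairwise standardized distances
    log_k = -0.5 * d**2 - np.log(h * np.sqrt(2.0 * np.pi))   # per-kernel log density
    return float(np.mean(logsumexp(log_k, axis=1) - np.log(len(centers))))

for h in [1.0, 0.1, 0.01, 1e-4]:
    honest = kde_mean_loglik(test, train, h)                         # test points held out of the KDE
    cheat = kde_mean_loglik(test, np.concatenate([train, test]), h)  # test points inside the "prior"
    print(f"h={h:g}  held-out NLL={-honest:9.2f}  contaminated NLL={-cheat:9.2f}")
```

Each contaminated test point's own kernel contributes $\log\frac{1}{h\sqrt{2\pi}}$ at zero distance, so the contaminated NLL decreases without bound as $h \to 0$ — the same degenerate optimum the MELBO evaluation admits.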