Ignoring latency for a second, one of the tricks for boosting quality is to use consensus: sample several answers and keep the one most of them agree on. One probably does not need to call the lesser model 30x as much to achieve these sorts of gains. Moreover, you have to take the purported gains with a grain of salt. The models are probably trained on the evaluation sets they are benchmarked against.
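
To make the consensus idea concrete, here is a minimal sketch of majority voting over repeated samples. It assumes a hypothetical `sample` callable standing in for whatever call you make to the cheaper model, and it assumes answers are short enough that exact string matching (after trimming and lowercasing) is a fair way to count votes. Neither assumption comes from any particular library.

```python
from collections import Counter
from typing import Callable, Tuple


def consensus_answer(
    sample: Callable[[str], str], prompt: str, n: int = 5
) -> Tuple[str, float]:
    """Call `sample` n times; return the most common answer and its vote share.

    `sample` is a hypothetical stand-in for one call to the cheaper model.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # Normalize lightly so trivially different strings count as the same vote.
    votes = Counter(sample(prompt).strip().lower() for _ in range(n))
    answer, count = votes.most_common(1)[0]
    return answer, count / n
```

The vote share doubles as a rough confidence signal: if the top answer barely wins, more samples (or a stronger model) may be worth the cost, and if it wins almost unanimously, a small `n` was probably enough.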