Vegeta, the tool I used for benchmarking, cycles through all of those targets round-robin while attacking the server, then reports the mean response size in bytes across all responses. That figure covers only the response body; it doesn't include other things like headers.
Even with the same library and the same compression algorithm, not all 200px by 200px QR code PNGs compress to the same size. How well one compresses depends heavily on the encoded text, since that determines the visual complexity of the generated QR code.
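
To make that concrete, here is a rough sketch rather than a real QR encoder. It fills a 200px by 200px grayscale bitmap with a random grid of black and white "modules" and compresses the raw pixels with zlib, which implements DEFLATE, the same algorithm PNG uses internally. Everything here is an assumption for illustration: the `compressed_size` helper is hypothetical, the random grid only stands in for a QR code, and the output is not a valid PNG (no headers, no row filters). The grid sizes 21, 37 and 57 correspond to QR versions 1, 5 and 10, which is roughly what happens as the encoded text gets longer.

```python
import random
import zlib

SIDE_PX = 200  # image is 200px by 200px, as in the benchmark


def compressed_size(modules_per_side: int, seed: int) -> int:
    """DEFLATE-compressed size of a random black/white module grid.

    Hypothetical stand-in for "encode text as QR, render, compress".
    """
    rng = random.Random(seed)  # seeded so the result is reproducible
    grid = [
        [rng.random() < 0.5 for _ in range(modules_per_side)]
        for _ in range(modules_per_side)
    ]
    rows = []
    for y in range(SIDE_PX):
        # Map each pixel back to the module that covers it.
        grid_row = grid[y * modules_per_side // SIDE_PX]
        rows.append(bytes(
            0 if grid_row[x * modules_per_side // SIDE_PX] else 255
            for x in range(SIDE_PX)
        ))
    return len(zlib.compress(b"".join(rows), 9))


# More modules per side means more visual detail in the same 200x200 pixels.
sizes = [compressed_size(m, seed=1) for m in (21, 37, 57)]

# A single averaged figure, like the one a benchmark report shows,
# hides the spread between the individual responses.
mean_size = sum(sizes) / len(sizes)
print(sizes, mean_size)
```

The denser grid compresses to noticeably more bytes than the sparse one even though the pixel dimensions are identical, which is why the reported average depends on the mix of texts being encoded.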