The Pepsi critique can also happen with AB testing, depending on what you're measuring. The problem Pepsi had is that they measured user satisfaction after a small sample, not a full can. This would show up even if you only presented drinkers with one sample.
By “the Pepsi critique” I specifically meant the exposure of both variants to a single user.
I agree with you that having users take a single sip is also a problem for construct validity. But it’s not clear from your post how this would happen in a properly run AB test.
Are you thinking of the case in which we ship a change that confers short-term wins in user spend or churn, but comes at a long-term cost that isn’t visible within the window of the test? This is a common problem, but it’s not clear to me that it’s the same problem Pepsi had. Here the problem is the choice of response variable, whereas Pepsi’s problem was the choice of treatment.
My point about the Pepsi challenge is that it would have produced the same erroneous results even if they had run it in a way that only exposed each consumer to one drink or the other. The flaw was that they measured satisfaction after a sip, as opposed to a full can.
It's about what you actually measure versus what you think you are measuring. In this case, satisfaction with the can versus satisfaction with the sip.
Short-term spend or churn could be examples. For an information-serving website, an example might be mistaking time on site or pages viewed for users being satisfied and finding the information they want.
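To make the sip-vs-can point concrete, here's a toy simulation in Python. All the numbers are made up for illustration: variant B is assumed to score better on the first sip (the proxy we measure) while variant A scores better over a whole can (what we actually care about). A single-exposure test that measures the proxy still picks the wrong winner:

```python
import random

random.seed(0)

# Hypothetical scores: each user tastes only one variant, but we record
# both their first-sip impression and their whole-can satisfaction.
# (Means and spreads are invented for the sketch.)
def sample_user(variant):
    if variant == "A":
        sip = random.gauss(6.0, 1.0)  # less sweet: weaker first impression
        can = random.gauss(7.0, 1.0)  # but more satisfying over a full can
    else:
        sip = random.gauss(7.0, 1.0)  # sweeter: wins the first sip
        can = random.gauss(6.0, 1.0)  # cloying by the end of the can
    return sip, can

n = 10_000
mean_sip = {"A": 0.0, "B": 0.0}
mean_can = {"A": 0.0, "B": 0.0}
for variant in ("A", "B"):
    for _ in range(n):
        sip, can = sample_user(variant)
        mean_sip[variant] += sip / n
        mean_can[variant] += can / n

# The proxy metric and the true metric pick opposite winners,
# even though no user was ever shown both variants.
print("sip winner:", max(mean_sip, key=mean_sip.get))  # B
print("can winner:", max(mean_can, key=mean_can.get))  # A
```

Nothing about the single-exposure design saves you here; the response variable itself is the construct-validity failure. Swap "sip" for time on site and "can" for actual user satisfaction and it's the same picture.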