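As some basic context, there are two related approaches to A/B testing. The first comes from statistics and looks like standard hypothesis testing of a difference in means or medians: you split traffic in fixed proportions, collect the data, and then test whether the variants differ. The second comes from machine learning and treats the experiment as a multi-armed bandit problem, shifting traffic toward the better-performing variant while the experiment is still running. Both are valid, and they make different tradeoffs: hypothesis testing gives cleaner statistical inference, while a bandit reduces how often users are shown the worse variant. I just wanted you to know that both options exist.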
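To make the contrast concrete, here is a minimal Python sketch of each approach, using only the standard library. The function names, the conversion-rate setting, and the `epsilon` value are assumptions made for illustration. The pooled two-proportion z-test is one common form of the hypothesis test, and epsilon-greedy is just one simple bandit strategy, picked here for brevity.

```python
import math
import random


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Hypothesis-testing view: two-sided z-test for a difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution, via erfc.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def epsilon_greedy(true_rates, n_rounds, epsilon=0.1, seed=0):
    """Bandit view: simulate epsilon-greedy allocation over variants ("arms").

    true_rates are hypothetical conversion rates, used only to simulate users.
    """
    rng = random.Random(seed)
    pulls = [0] * len(true_rates)
    wins = [0] * len(true_rates)
    for _ in range(n_rounds):
        if rng.random() < epsilon or 0 in pulls:
            # Explore: pick a variant at random (also used until every arm has data).
            arm = rng.randrange(len(true_rates))
        else:
            # Exploit: pick the variant with the best observed conversion rate so far.
            arm = max(range(len(true_rates)), key=lambda i: wins[i] / pulls[i])
        pulls[arm] += 1
        wins[arm] += rng.random() < true_rates[arm]  # simulated conversion (True counts as 1)
    return pulls, wins
```

With the first function you decide the sample sizes up front and read off a p-value at the end. With the second, the allocation in `pulls` drifts toward the better variant as evidence accumulates, at the cost of a less straightforward final significance statement.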