
A/B Testing Product Changes

Introduction

With new and exciting AI technology emerging around recommendation engines, how can product leads evaluate which solution is better, and how can they really measure a “better recommendation”?

173tech has worked on A/B testing and recommendation engines for Bumble, Plend Loans, MUBI, Treatwell and many others. Here our founder Candice Ren shares her thoughts…

“It is good to be more specific towards the goal. You want a better algorithm. What does that mean?”

Randomised User Allocation

When it comes to conducting the test, it is important that you randomise the allocation of users into groups, e.g. a control and a test group. Randomisation is important so as not to introduce biases into your results. For example, if one group has a higher percentage of loyal users than the other, it will likely return more favourable results regardless of the algorithm.
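
To make this concrete, here is a minimal Python sketch of one common way to randomise allocation: hashing each user ID into a stable bucket so assignment is effectively random but repeatable across sessions. The function name, experiment key and split are illustrative assumptions, not a prescribed implementation.

```python
import hashlib

def assign_variant(user_id: str, experiment: str, split: float = 0.5) -> str:
    """Deterministically assign a user to 'control' or 'test'.

    Hashing the user ID together with the experiment name gives every user
    a stable, effectively random bucket, so allocation is unbiased and a
    returning user always sees the same variant.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) / float(16 ** len(digest))  # uniform in [0, 1)
    return "test" if bucket < split else "control"

# Example: a 50/50 split for a recommendation algorithm test
print(assign_variant("user_42", "reco_algo_v2"))
```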

Is It Worth Testing?

To avoid testing things with limited impact, zoom out and think about the customer journey as a whole. Map out the different touchpoints to identify areas with the largest drop-off. Focus on these areas as the right solution will give your product the biggest boost. If your test only affects 5% of users, then it is probably not worth it.
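
As a rough illustration of that mapping, the snippet below walks a hypothetical funnel and prints the drop-off between consecutive touchpoints. The step names and counts are made up; the point is simply to surface where a change would reach the most users.

```python
# Hypothetical user counts at each touchpoint of the customer journey
funnel = [
    ("visited_homepage", 100_000),
    ("viewed_product", 62_000),
    ("saw_recommendation", 30_000),
    ("added_to_cart", 9_000),
    ("purchased", 4_500),
]

# The largest drop-off between consecutive steps is usually where a
# test has the biggest potential impact.
for (step, users), (next_step, next_users) in zip(funnel, funnel[1:]):
    drop = 1 - next_users / users
    print(f"{step} -> {next_step}: {drop:.0%} drop-off")
```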

The Right Metrics

Your test metric should be the immediate lift resulting from the product change. If your algorithm returns a suggested sofa, the first measure is whether users are interested in this recommendation, i.e. how many users click on it (click-through rate, CTR) to learn more. You should also keep an eye on “counter metrics” for every test. These are KPIs with a direct impact on the business bottom line, e.g. the number of purchases and the revenue generated from the recommendations.

The funnel between click and purchase is also important to analyse, e.g. add to cart and checkout. Perhaps users like the recommendations but they are not tailored to their budget. Or certain recommended products are not available in their location.
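
To show how these might sit side by side, here is a small sketch that computes the test metric (CTR), the funnel step after the click and two counter metrics per variant. The event counts are placeholders for whatever your tracking pipeline actually records.

```python
# Placeholder per-variant event counts pulled from your tracking data
results = {
    "control": {"impressions": 50_000, "clicks": 2_100, "add_to_cart": 640,
                "purchases": 310, "revenue": 21_700.0},
    "test":    {"impressions": 50_000, "clicks": 2_600, "add_to_cart": 700,
                "purchases": 305, "revenue": 20_900.0},
}

for variant, r in results.items():
    ctr = r["clicks"] / r["impressions"]                # test metric: immediate lift
    click_to_cart = r["add_to_cart"] / r["clicks"]      # funnel step after the click
    click_to_buy = r["purchases"] / r["clicks"]         # counter metric: purchases
    revenue_per_user = r["revenue"] / r["impressions"]  # counter metric: revenue
    print(f"{variant}: CTR {ctr:.2%}, click->cart {click_to_cart:.2%}, "
          f"click->purchase {click_to_buy:.2%}, revenue/user {revenue_per_user:.2f}")
```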

Prevent Overlap

While we used a 50/50 split for a single test in the example above, in real life it is more likely that different departments need to test different elements at the same time. In this case, you will need a mechanism to prevent overlapping tests on the same users; otherwise, your results will be contaminated. For example, the product team might be testing different recommendation algorithms while the billing team is testing new pricing strategies. “Make sure when you test, you are changing one thing at a time for the targeted groups”.
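
One possible mechanism (an illustrative assumption, not the only way to do it) is to hash users into disjoint slices so that each concurrent test can only recruit from its own slice:

```python
import hashlib

def _bucket(key: str, buckets: int = 100) -> int:
    """Stable pseudo-random bucket for a key, in the range [0, buckets)."""
    return int(hashlib.sha256(key.encode()).hexdigest(), 16) % buckets

# Each concurrent test owns a disjoint slice of users, so no user is
# exposed to two experiments at once and results stay uncontaminated.
EXPERIMENT_SLICES = {
    "reco_algo_v2": (0, 50),    # product team: buckets 0-49
    "pricing_test": (50, 100),  # billing team: buckets 50-99
}

def eligible(user_id: str, experiment: str) -> bool:
    lo, hi = EXPERIMENT_SLICES[experiment]
    return lo <= _bucket(f"slice:{user_id}") < hi

# A given user can only ever fall into one of the two tests
print(eligible("user_42", "reco_algo_v2"), eligible("user_42", "pricing_test"))
```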

Statistical Significance

Once you have results from your test, you have to ask yourself whether they are statistically significant. Significance tests help you minimise the possibility that any uplift is simply the result of random chance. Here is a useful link to calculate test statistics. We recommend setting a confidence level of 90% or above.

https://www.evanmiller.org/ab-testing/
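
If you would rather compute it yourself, the snippet below is a minimal two-proportion z-test in plain Python, broadly the kind of calculation such calculators perform for conversion-rate tests. The numbers in the example are made up.

```python
from math import sqrt, erfc

def two_proportion_ztest(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Two-sided z-test for the difference between two conversion rates.

    Returns (z statistic, p-value). A p-value below 0.10 corresponds to
    significance at the 90% confidence level suggested above.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = erfc(abs(z) / sqrt(2))  # two-sided tail probability of the normal
    return z, p_value

# Example: CTR of 4.2% (control) vs 5.2% (test) on 50,000 users each
z, p = two_proportion_ztest(2_100, 50_000, 2_600, 50_000)
print(f"z = {z:.2f}, p-value = {p:.4f}")  # significant at 90% if p < 0.10
```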

Automate & Save Your Learnings

From a data perspective, all your test results should be automated into a dashboard with a list of test and counter metrics, and their statistical significance. Once a test is concluded, you should summarise the learnings in a centralised place so everyone has visibility. You should NOT test something that has already been tested before without a good reason.
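
As a sketch of what that automation might feed into the dashboard, the snippet below rolls each live test up into one summary row, reusing two_proportion_ztest() from the previous example; the counts and column names are placeholders.

```python
# Placeholder (conversions, users) counts per variant, as you might pull
# them from your warehouse for every live test.
variant_counts = {
    "reco_algo_v2": {"control": (2_100, 50_000), "test": (2_600, 50_000)},
    "pricing_test": {"control": (410, 12_000), "test": (455, 12_000)},
}

summary = []
for name, counts in variant_counts.items():
    (c_conv, c_users), (t_conv, t_users) = counts["control"], counts["test"]
    _, p_value = two_proportion_ztest(c_conv, c_users, t_conv, t_users)
    summary.append({
        "test": name,
        "control_rate": round(c_conv / c_users, 4),
        "test_rate": round(t_conv / t_users, 4),
        "p_value": round(p_value, 4),
        "significant_at_90": p_value < 0.10,
    })

for row in summary:  # in practice this would land in a dashboard table
    print(row)
```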

Conclusion

If you want more info on how to run A/B tests across the product lifecycle or need advice in tying together event tracking, website and revenue data, then be sure to get in touch with the friendly team here at 173tech!
