A/B testing - a guide for beginners

December 21, 2021
Raphael Mink
A/B testing is a proven method for testing different versions of a product live on the market. There are a few rules to follow when using this method.

The use of A/B testing, especially online, has skyrocketed in recent years, not only at tech companies but also at companies in other industries, which conduct thousands of such online experiments every year. They test whether version "A", the current form of a product, performs better or worse than version "B", a proposed change.

"We run hundreds, if not thousands, of experiments at the same time, involving millions of visitors. We don't have to guess what customers want, we can run the most extensive 'customer surveys' available over and over again to get them to tell us what they want."

Mark Okerstrom, CEO of the Expedia Group

A simple example is a company's homepage. What gets clicked and what doesn't? What happens if a certain button is placed on the left instead of the right? To run the A/B test, two groups are each shown a different version of the website. A distinction can also be made between the mobile and the desktop version. This application of an A/B test seems logical and easy to follow. Nevertheless, there is another level: it is possible, for example, that mobile users generally click fewer buttons than desktop users. How is that fact incorporated into the tests? Beyond web design, A/B tests are also used in SEO optimization, for example. The tests primarily serve to measure traffic, reduce bounce rates or evaluate new products.

A/B tests are particularly popular because they are cost-effective and deliver results quickly. Users' reactions to changes in products can be tracked in real time thanks to online interaction. This is why an A/B test is a popular method for updating digital products or creating new products. They are used in newsrooms of online media, in the financial sector, when launching new apps, but can also be used for innovation in physical products. 

Even though most A/B tests are carried out online today, the method has been around for decades. Consumer goods giant P&G used a virtual-reality tool for market research as far back as 1997. When developing a sports-focused Febreze variant, the company cut the time from development to market launch by up to 50 percent: it explored different design variants without testing physical prototypes with customers, which made clear early on what worked and what didn't.

Successful A/B tests are crucial for product-market fit

The tests provide initial feedback from users and reduce the risk of bad investments. The consequences of changes can be measured objectively, at least in an initial phase. At the same time, A/B tests indicate how sales can be increased through changes and improvements. For many companies, A/B tests have become indispensable in the development of new products: they serve as a basis for deciding when and whether new products should be introduced, and they show how existing products can be improved. If a company wants to open up new markets or address new target groups, it relies on the test results to do so.

Tech companies such as LinkedIn, Netflix and Spotify in particular could not exist without A/B testing; the method determines the future of their applications. For example, each of the roughly 220 million Netflix users worldwide sees an individual start screen tailored to their behavior when they open the app, based in part on A/B tests that run automatically in the background. Another well-known example of analyzing user behavior is Booking.com, which carries out over 25,000 growth tests per year. Thanks to these methods and mechanisms, the Amsterdam-based company has become the largest accommodation booking platform in the world. By its own account, this success is also based on A/B testing; without it, Booking.com would not have grown so quickly. Every employee is allowed to run such tests without needing permission from their line manager.

When interpreting the results of an A/B test, companies usually rely on software such as Optimizely to carry out the calculations. Some also employ statisticians to interpret the results, or they work with an external service provider who takes on this task. These specialists can then run more complex experiments in which several variants are tested simultaneously with different groups; the versions are evaluated, compared with the machine-generated data and interpreted. The software delivers results in real time, but it is important to resist making quick decisions based on early numbers and to have the patience to let tests run to completion.
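Under the hood, such tools typically boil down to a standard significance test. As a rough illustration only (not Optimizely's actual implementation), a two-proportion z-test comparing the conversion rates of groups A and B can be sketched with Python's standard library, using made-up numbers:

```python
from math import sqrt, erf

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates.

    conv_a/conv_b: number of conversions in each group
    n_a/n_b: number of users in each group
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled conversion rate under the null hypothesis (no difference)
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Hypothetical example: 5.0% vs 6.25% conversion over 4,000 users each
z, p = two_proportion_z_test(conv_a=200, n_a=4000, conv_b=250, n_b=4000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A p-value below the chosen threshold (commonly 0.05) suggests the difference is unlikely to be pure chance, which is exactly why stopping a test early on a promising-looking dashboard is risky: the p-value fluctuates while data is still coming in.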

Despite the simplicity and effectiveness of A/B tests, there are still a few points to bear in mind. 

1. Define goals: where do we want to go with the product?

2. Prioritize questions: what are the 2-3 most important questions the test must answer?

3. Review existing data: which data from past tests are relevant to this topic (e.g. as possible benchmarks)?

4. Test design: how do we set up the test optimally to collect the data we need to answer our questions?

5. Test structure: A/B/C, channels, tools, budget, duration, etc.

6. Data collection: carry out the actual test.

7. Analysis and reporting: which hypotheses can be confirmed and which refuted?

8. Decisions: what strategic conclusions can be drawn from the test, and do we need more data points or another test before making the big decision?
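Step 5 above includes setting a duration, which in practice follows from a minimum sample size: how many users per group are needed to detect the effect you care about. A common back-of-the-envelope formula for comparing two conversion rates can be sketched as follows (the z-values 1.96 and 0.84 correspond to the conventional 95% confidence and 80% power; the numbers in the example are hypothetical):

```python
from math import ceil

def sample_size_per_group(p_base, mde, z_alpha=1.96, z_beta=0.84):
    """Approximate users needed per group to detect an absolute lift
    `mde` over a baseline conversion rate `p_base`."""
    p_new = p_base + mde
    # Sum of the binomial variances of both groups
    variance = p_base * (1 - p_base) + p_new * (1 - p_new)
    return ceil((z_alpha + z_beta) ** 2 * variance / mde ** 2)

# e.g. baseline 5% conversion, want to detect a 1-point lift to 6%
print(sample_size_per_group(p_base=0.05, mde=0.01))
```

Dividing the required sample size by your expected daily traffic gives a realistic test duration, which is a far better guide than an arbitrary "run it for a week".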

Don't just pay attention to the average

An A/B test measures the behavior of a specific user group, and its results reflect an average across those users. In the real world, however, there are striking differences in the behavior of different customers: some use the product heavily, others not at all. A change that looks good on average in an A/B test can therefore diverge from what individual customers actually experience.

The dashboards most commonly used for A/B testing do not distinguish between these scenarios; they assume that all users in a group behave the same way. For example, if a change to an app is followed by higher revenue, it may be that not all users spend more money, but that a certain share of them spends considerably more than before. Treating the average as typical turns real users into an idealized customer.

It is therefore advisable to run separate versions of A/B tests for the relevant user segments in order to sharpen the picture. AI now takes on part of this reporting work, but homogeneity should be avoided there as well. The aim should be to represent all users in the testing: set up different test designs and alternate them between target groups, so that different data is obtained from the same user group. The goal is to capture the perception of each individual user as accurately as possible, which also means breaking results down by market and taking region-specific habits into account.
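A minimal sketch of such a segment-level breakdown, with hypothetical numbers chosen so that the pooled average hides an effect that differs by segment:

```python
# Hypothetical results per segment: (conversions, users) for each variant
results = {
    "mobile":  {"A": (120, 3000), "B": (90, 3000)},
    "desktop": {"A": (60, 1000),  "B": (140, 1000)},
}

def rate(conv, n):
    return conv / n

# Per-segment view: variant B hurts mobile but helps desktop
for segment, groups in results.items():
    ra, rb = rate(*groups["A"]), rate(*groups["B"])
    print(f"{segment}: A={ra:.1%}  B={rb:.1%}  lift={rb - ra:+.1%}")

# Pooled view alone would suggest B is simply "better"
total_a = sum(g["A"][0] for g in results.values()) / sum(g["A"][1] for g in results.values())
total_b = sum(g["B"][0] for g in results.values()) / sum(g["B"][1] for g in results.values())
print(f"overall: A={total_a:.1%}  B={total_b:.1%}")
```

In this made-up example the overall numbers favor B (4.5% vs 5.75%), yet mobile users convert less with B; shipping B based on the average alone would quietly degrade the experience for the larger segment.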

The users are networked with each other

In standard A/B tests, groups A and B are compared with each other and conclusions are drawn, on the assumption that the two groups do not interact. In practice, however, they do: users communicate with each other, and this can influence the results, so the two groups cannot be considered completely independent. To avoid distorted results, the interaction between groups should be measured, or the groups should be isolated from each other. Alternatively, assignment can be randomized and rotated across different scenarios, so that not all users end up in the same scenario and the product is not changed on the basis of that one result.
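In practice, random assignment is usually implemented deterministically, for example by hashing the user ID together with the experiment name, so that the same user always sees the same variant and different experiments slice the audience independently. A minimal sketch (the experiment name is hypothetical):

```python
import hashlib

def assign_variant(user_id: str, experiment: str, variants=("A", "B")):
    """Deterministic, roughly uniform assignment: the same user always
    lands in the same bucket for a given experiment, but different
    experiments produce independent splits."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % len(variants)
    return variants[bucket]

print(assign_variant("user-42", "homepage-button"))
```

Because the split is a pure function of the inputs, no assignment table has to be stored, and a user who returns a week later still sees the variant they started with, which keeps the measurement consistent over the life of the test.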

Don't start a rush job

A/B tests are only successful if they run over a certain period of time. It is not enough to test for a few days and then base the launch of a new product on that conclusion. Initial signals from users can indicate a direction, but it is advisable to test early changes to the product again afterwards. One reason is that users tend to react positively to new products or changes at first: they show a high level of engagement with new features, but this usage behavior can quickly collapse again. In addition, a longer test series can capture how users interact with the new product over time, and changes can be made gradually.

Benchmarks are particularly important here: what is a good and what is a bad value? In terms of click-through rate or conversion rate, differences can be identified depending on the product, market, channel saturation or brand. It is therefore important to track your own benchmarks precisely over time in order to be able to fall back on comparative values if necessary. A long-term approach is therefore also key in this context.
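Tracking your own benchmarks over time can be as simple as maintaining a moving average of, say, weekly conversion rates, so that each new test result can be judged against your own history rather than generic industry figures. A minimal sketch with made-up numbers:

```python
from collections import deque

class RollingBenchmark:
    """Keep a moving-average benchmark over the last `window` observations
    (e.g. weekly conversion rates) for comparison with new test results."""

    def __init__(self, window=12):
        self.values = deque(maxlen=window)  # old values drop out automatically

    def add(self, value):
        self.values.append(value)

    def benchmark(self):
        return sum(self.values) / len(self.values)

# Hypothetical weekly conversion rates
tracker = RollingBenchmark(window=12)
for rate in [0.041, 0.043, 0.040, 0.045, 0.044]:
    tracker.add(rate)
print(f"benchmark conversion rate: {tracker.benchmark():.3f}")
```

A new variant's conversion rate can then be read against this rolling value, which already reflects your product, market and channel saturation.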

Quick overview

This is why a new feature should be measured repeatedly, not just at the beginning: users' fascination with new applications can quickly fizzle out. A company can also distinguish between a larger group and a smaller subgroup in its A/B tests, applying the longer test series to the smaller number of users. Giving this group more time to try new products yields more extensive feedback, which increases the quality and relevance of the new product.

Online A/B tests are an effective way of testing new products across different customer segments. However, if they are carried out only superficially, without proper visualization, and only short-term user reactions are taken into account, the results can be misinterpreted. Used with different groups over a longer period, such tests can certainly yield longer-term findings.

Company culture is crucial

Despite their effectiveness, the question arises as to why more companies do not subject their products and concepts to A/B testing. The answer is primarily a matter of corporate culture: A/B testing is not just a technical matter but also a cultural one. Two questions are central here:

  • How willing are you to be confronted with being wrong every day?

  • How much autonomy are you prepared to give your employees?

There is often a defensive attitude towards such tests, because the data reflects the reality of the users and not the truth the company wants to see. One example is A/B testing headlines for journalistic articles: the journalist wants a particular title, but it is then adapted, and the new version proves more popular with the readership than the old one. Not everyone welcomes this truth.

"If the answer is that you don't like to be proven wrong and don't want your employees to decide the future of your products, it won't work. You will never realize the full benefits of experimentation."

David Vismans, Chief Product Officer at Booking.com

A/B tests are ubiquitous, even though there are other, more accurate ways of testing with control groups. But A/B tests can be carried out quickly and therefore provide a rapid overview and an understanding of users. Because initial results arrive in real time, A/B testing also makes it possible to react quickly or to set up new variants, which allows a versatile scope of action.

Do you have any questions, comments or would you just like to chat?

I would love to hear from you. Contact me directly via LinkedIn or email.
