Building a good UI is hard. Building a good UI that will delight your users is even harder. By incorporating A/B testing into your UI, you can stop guessing what your users love and start making data-driven UI decisions!
[Slides 4-5, “What’s the big deal?”: diagrams of an “Order now” button, first with 50% of visitors seeing the control and 50% seeing the first variant, then with 33% of visitors each seeing the control, the first variant, and the second variant]
Hi everyone! I’m Isabela. I’m a software engineer at Microsoft working on React frontend. I’m super passionate about design, so today I’m gonna talk about something I feel like doesn’t get a lot of attention - A/B testing! Specifically, how we can harness the power of A/B testing to build data-driven UI and bring some objective answers to a very subjective field. To start, I’ll go into a bit more detail about A/B testing and then we’ll jump into live coding a React app.
What’s A/B testing you might ask? Well, it’s basically a fancy name for an experiment. It’s a way to compare two or more versions of something to decide which is more effective and it’s actually been around for a really long time. Like, back in the day, farmers used to do A/B testing to decide what kind of fertilizer to use. If it’s good enough for farmers, it’s good enough for us!
Before we even get into implementing A/B testing in React, let's first talk about what A/B testing is and why it’s important.
Imagine this: you’re working on this awesome app and you’re almost ready to ship. You’re demoing the app and someone asks “why don’t we make this button blue instead of green? I think more users will click on it”.
Hmm… if only there was a way to scientifically determine which color is better for this design.
With A/B testing, we can run UI experiments without having to keep redeploying client-side code. Let’s say we want to test which color button gives us the most clicks, leading to the most sales.
At a very high level…
We can set up an experiment that shows 50% of users the control color, which is blue in this case. The other 50% of users will be shown the variant, which is the green button. Then we keep track of how many times the control is clicked vs the variant to decide which is more effective.
And this works with however many variants you want to test. But why even go through all this trouble in setting up an experiment? Why not just pick one of our options and call it a day?
Sure, there are design guidelines and “proven” ways to organize and present your data. But there are rarely methods set in stone for the little details, like “should this button be blue or green? Should we use this icon or that icon?”
Instead of relying on your gut, or whatever someone else says is right, A/B testing is a great way to make data-driven decisions, removing emotion from the picture and focusing on choosing the best design for the user.
It lets you test different versions of a piece of UI and measure how successful each version is.
By performing these experiments on the users directly, instead of an internal group or small subset of users, you’ll have a pretty good idea of what the customer response will be once you choose the final design and release it into the wild. This will optimize for user engagement and success.
Research the current behavior of users on your app or website. The goal of this is to understand where the user engagement drops off. Think about what’s not going right - are users not finding a particular feature? Are you getting a lot of traffic on a certain page but not a lot of conversions?
Identify the problem. Is it because a sign up form is too long? Because users can’t tell they need to click a certain button?
Create a hypothesis, e.g. “Making this button green will increase the number of clicks on this button.”
Experiment. Based on the hypothesis, you create a variation in the colors of the problematic button. You’ll split the app traffic equally between the control and all the variants. Then you’ll run the experiment until you have a statistically significant value to make a decision.
Once you’ve reached a statistically significant value, you can stop the experiment and see if one of the variants performed better than the control.
Ship. Decide which variant you’ll implement for all your users and then ship it!
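To make “statistically significant” a bit more concrete: one common way to compare a variant’s click-through rate against the control’s is a two-proportion z-test. This is just an illustrative sketch of the math, not something a real experiment platform would replace:

```javascript
// Two-proportion z-test: is the variant's click-through rate
// significantly different from the control's?
function zTest(clicksControl, usersControl, clicksVariant, usersVariant) {
  const pControl = clicksControl / usersControl;
  const pVariant = clicksVariant / usersVariant;
  // Pooled rate under the null hypothesis that both are the same.
  const pPool = (clicksControl + clicksVariant) / (usersControl + usersVariant);
  const se = Math.sqrt(
    pPool * (1 - pPool) * (1 / usersControl + 1 / usersVariant)
  );
  return (pVariant - pControl) / se;
}

// |z| > 1.96 corresponds to roughly 95% confidence (two-tailed).
const z = zTest(120, 1000, 165, 1000); // control: 12% CTR, variant: 16.5% CTR
console.log(z.toFixed(2), Math.abs(z) > 1.96 ? 'significant' : 'keep running');
```

The calculators linked later in this deck do essentially this computation (plus more) for you.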
Some other tips to keep in mind:
- You need your results to be statistically significant. That’s basically a mathematical way of saying a certain statistic is reliable. You should aim for 95-99% statistical significance.
- In order to get statistically significant results, you can’t run your experiment on just a tiny group of people. Usually, that’s at least 1000 users.
- You’ll also want to choose something significantly different to A/B test. For example, don’t A/B test whether a certain text in the UI should have a comma or not. Test things like calls to action, forms, images, colors. But at the same time…
- Test small components of your design. For example, you don’t want to test two completely different layouts because then we won’t know what specific part of those layouts resulted in a better or worse user experience. A better use of A/B testing is if we have a few different designs for one particular piece of UI and we want a data-driven decision for which design to pick. So if we had an icon menu and we were trying to decide which icon is best for our “Store” button, we could use A/B testing to experiment with a few different icons and let the data decide which is best.
- Don’t test just for the sake of testing. If a feature is already getting the user engagement you need, find something else to test that will bring more value.
- Test your variants at the same time. Running the variants sequentially means that the user base could be different. For example, you might be running a variant on a week with skewed user traffic, so it’ll appear that it was successful just because there happened to be more traffic the week it was being tested. Instead, run the control and variants at the same time to limit the number of uncontrolled variables.
Figuring out all this stuff on your own can be hard, so I put together a few resources to help. You guys can reference them from this deck after the talk.
- First, there’s this A/B testing calculator you can use to figure out if your results are statistically significant.
- Then there’s this website that helps you figure out what sample size you should be using. The general rule is at least 1000 users and possibly up to 5000, but this tool can help give you more accurate numbers depending on the details of your experiment.
- One last resource is this calculator that gives you an estimated number of days you’ll need to run your experiment, based on your average number of users and number of variants. You’ll still need to hit your minimum of 95% statistical significance, but this tool will give you a rough idea that’s useful for sprint planning.
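As a quick sanity check before reaching for those tools, you can estimate the duration yourself. This sketch assumes the naive formula days = (sample size per variant × number of variants) ÷ daily visitors, which ignores details the real calculators account for:

```javascript
// Back-of-the-envelope experiment duration estimate.
// Assumes traffic is split evenly across all variants.
function estimateDays(samplePerVariant, numVariants, dailyVisitors) {
  return Math.ceil((samplePerVariant * numVariants) / dailyVisitors);
}

// e.g. 1000 users per variant, 3 variants (control + 2), 400 visitors/day:
console.log(estimateDays(1000, 3, 400)); // → 8
```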
Now let’s get into what it looks like to implement this.
The first thing we need to do is set up our React app. Of course, you can use this in an existing app, but for this talk, we’ll be walking through the end-to-end dev cycle.
- Next, we’ll need to install the A/B testing npm package. In this demo, we’ll be using the marvelapp fork of the react-ab-test package since we’re using React 16. If you’re not using React 16, you can use the main react-ab-test package by pushtell.
So you can see here we have a pretty basic React component. It has a header, some description text, and a call to action button.
Here’s what the component looks like
Let’s say we wanted to A/B test what color we should make the call to action button to increase clicks.
Now it’s time to define our experiment.
First, we import the package we installed earlier
Then we’ll enable the debugger view. We’ll see what that looks like later, but basically, it will let us locally test each variant’s UI without having to keep refreshing the page and hoping to be randomly assigned to it.
Next, we’ll define the experiment, along with its control and variants. In this demo, we want to have two variants for the two new button colors and we want to give them all an equal chance of showing up, so we weight each one at 33%
Then, we wrap the smallest possible component with the Experiment tag. Notice how I’m not wrapping the entire component - just the button we want to be part of the experiment. In theory, we could run multiple experiments at the same time by wrapping different parts of a component, or different components entirely with different Experiment tags.
The Experiment tag is going to contain multiple variant tags, one for each of our experiment options. For now, we only have the control UI defined, so we’ll just wrap the current button with a Variant tag and label it as the control
Next, we’ll add the Variant tags and corresponding UI for the remaining variants to match the experiment definition.
In our demo, we’ll add two new Variant tags, each containing a button with a different class, so we can style the buttons differently.
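Putting those steps together, the component might look something like this. It’s a sketch assuming the `@marvelapp/react-ab-test` API (`emitter.defineVariants`, `experimentDebugger`, and the `Experiment`/`Variant` components); the experiment name, variant names, and CSS class names are made up for the demo:

```jsx
import React from 'react';
import {
  Experiment,
  Variant,
  emitter,
  experimentDebugger,
} from '@marvelapp/react-ab-test';

// Show the local debug panel so we can flip between variants by hand.
experimentDebugger.enable();

// Define the experiment up front: one control plus two color variants.
// Weights are percentages; using 34/33/33 so they sum to 100.
emitter.defineVariants(
  'buttonColorExperiment',
  ['control', 'green', 'orange'],
  [34, 33, 33]
);

// Only the button is wrapped in the Experiment - not the whole component.
function CallToAction() {
  return (
    <Experiment name="buttonColorExperiment">
      <Variant name="control">
        <button className="blue-button">Order now</button>
      </Variant>
      <Variant name="green">
        <button className="green-button">Order now</button>
      </Variant>
      <Variant name="orange">
        <button className="orange-button">Order now</button>
      </Variant>
    </Experiment>
  );
}

export default CallToAction;
```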
The selected variant is saved in our local storage, so it persists across refreshes.
We can see that if we change the variant through the debug panel, the local storage gets updated to reflect the new choice
To test to make sure that we’re actually getting shown different variants, we can clear the local storage and refresh the page. We should see a different variant.
Now we need to think about what our win condition is
For this experiment, we’ll say that our win condition is the user clicking the “learn more” button. So when they do this, we want to emit a win event. We’ll use the A/B testing package from earlier to do this too.
We’ll add a click handler to all our buttons. In this click handler, we’ll emit our win.
Now it’s not enough to emit a win. We also need a win listener. In this win listener, we’ll just console log the experiment name and variant for now.
We can also add a play listener, which fires when the page is loaded. This can log what experiment and variant a user is seeing, even if they don’t interact with anything in the experiment.
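The win and play plumbing might look like this, again assuming react-ab-test’s emitter API (`emitWin`, `addWinListener`, `addPlayListener` are the package’s calls; `handleClick` is our own name):

```jsx
import { emitter } from '@marvelapp/react-ab-test';

// Click handler attached (via onClick) to every button in the experiment.
function handleClick() {
  emitter.emitWin('buttonColorExperiment');
}

// Fires every time a win is emitted for this experiment.
emitter.addWinListener((experimentName, variantName) => {
  console.log(`Win! ${experimentName}: ${variantName}`);
});

// Fires when a user is shown a variant, even if they never click.
emitter.addPlayListener((experimentName, variantName) => {
  console.log(`Playing ${experimentName}: ${variantName}`);
});
```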
Now we’re tracking these wins locally (well… console logging them). But what we actually want to do is upload our wins to some telemetry tracking platform, so we can figure out which variant in our experiment wins overall and actually make a design decision.
For this demo, we’ll use Mixpanel because there’s very little overhead to set it up, but there are tons of other options like Optimizely, Google Optimize, and Wasabi. Wasabi’s actually a really great open-source platform for A/B testing that’s super customizable and you guys should check it out.
And as a side note, even if you end up going with a different tracking service for your tests, the only difference it’ll make in the code I’m about to show you is the actual API call. Everything else I’ve demo’d stays the same.
- To use Mixpanel, we’ll go to their website and make an account. Then we’ll create a project and take note of the token it gives us.
Now back to the code, we’ll import the Mixpanel package and init it with our token.
Then in our win listener, we can call the Mixpanel API to log this win. We can send it any string id - here I’m using the experiment name plus the variant name to keep things unique but consistent - and we can send any object along with the API call. Here I’m just sending the experiment name and the variant name, but you could also include a timestamp, user id, or other data.
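A sketch of that wiring with the `mixpanel-browser` package; the token is a placeholder and the event id format is just the convention described above:

```jsx
import mixpanel from 'mixpanel-browser';
import { emitter } from '@marvelapp/react-ab-test';

// Token comes from the Mixpanel project settings page.
mixpanel.init('YOUR_PROJECT_TOKEN');

emitter.addWinListener((experimentName, variantName) => {
  // Any string id works; experiment + variant keeps events
  // unique per variant but consistent across sessions.
  mixpanel.track(`${experimentName}-${variantName}`, {
    experiment: experimentName,
    variant: variantName,
  });
});
```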
Let’s say we take this code and deploy it to our users
As users interact with our experiment and trigger our win condition, we’ll start seeing data in the Mixpanel website and it’ll look something like this.
- Looking closer, we can see that the green variant was the most successful. If we had left this experiment running until we hit statistically significant numbers and green was still the winner, we could remove all the experiment code we added and ship just the green variant.