How Many Visitors Do You Need for Your A/B test? Get the Answer Here!

“How many visitors do I need for my A/B test?” This question springs up in the mind of almost anyone starting out with Conversion Rate Optimization, and the internet is littered with people asking just that. This article will provide you with a basic understanding of the mathematics that go into finding that elusive number. You’ll also get a few nifty tools for calculating the required sample size for your specific A/B split test.

Deciding on the number of visitors needed to achieve statistical significance is not an exact science; it’s more of a range. Theoretically, you can never say “I need xx visitors to achieve significance.” What you can say is “I need at least xx visitors to be xx% sure that the results are reliable.” Note that the results are still not guaranteed to be correct; they’re just reliable.

Reliability

In statistics, “significance” means “reliable”, not “important” as in everyday English. What you want to know is whether you can reliably conclude that the winner indicated by an A/B test is actually the winner. The best-practice benchmark for statistical significance is 95%.

Reliability increases as you increase the number of data points. For example, if you split 1000 visitors between two page layouts (the original, called the “control”, and the “variation”) and you get a result, that’s cool. However, if you get the same result after splitting 2000 visitors between the two page layouts, that’s far more reliable. This is common sense that’s solidly backed by statistics. What statistics will additionally show is that the range of error goes down.
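
To get a feel for how reliability grows with sample size, here’s a minimal sketch in Python of the two-proportion z-test that significance calculators are typically built on. The visitor and conversion counts are made-up illustrations, not figures from a real test.

    from statistics import NormalDist

    def significance(visitors_a, conversions_a, visitors_b, conversions_b):
        """One-sided confidence that the variation (B) truly beats the control (A)."""
        p_a = conversions_a / visitors_a
        p_b = conversions_b / visitors_b
        # Pooled conversion rate and its standard error under "no real difference"
        p_pool = (conversions_a + conversions_b) / (visitors_a + visitors_b)
        se = (p_pool * (1 - p_pool) * (1 / visitors_a + 1 / visitors_b)) ** 0.5
        return NormalDist().cdf((p_b - p_a) / se)

    # 1000 visitors split between control (10% CR) and variation (13% CR)
    print(f"{significance(500, 50, 500, 65):.0%}")      # ~93% confident
    # Same conversion rates, but 2000 visitors split between them
    print(f"{significance(1000, 100, 1000, 130):.0%}")  # ~98% confident

With the same observed conversion rates, doubling the traffic pushes the confidence from roughly 93% to roughly 98%, which is exactly the effect described above.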

Range

In the image above, the range within which the actual result may lie is given by the ± number. For the variation titled “free download”, it means that the result can be 56% plus or minus 5.9%, and the software is 98% certain that the actual value lies within this range. The range decreases as more visitors are tested, because the software becomes increasingly certain about the result.
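
If you want to see where the ± number comes from, here’s a small sketch of the usual normal-approximation formula. The 385-visitor sample size is an assumption chosen so the output roughly matches the 56% ± 5.9% figure quoted above; the screenshot itself doesn’t state the visitor count.

    from statistics import NormalDist

    def margin_of_error(conversion_rate, visitors, confidence=0.98):
        """Half-width of the confidence interval around an observed conversion rate."""
        z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # ~2.33 for 98% confidence
        standard_error = (conversion_rate * (1 - conversion_rate) / visitors) ** 0.5
        return z * standard_error

    # Hypothetical: 56% conversion rate measured on 385 visitors
    moe = margin_of_error(0.56, 385)
    print(f"56% ± {moe:.1%}")  # roughly 56% ± 5.9% at 98% confidence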

The larger your sample size and statistical confidence level are, and the lower your standard error is, the more reliable your test results will be. As a rule of thumb, the more dramatic the difference between the two variations, the fewer visitors (the smaller the sample size) you’ll need to achieve a statistically significant result, and vice versa.
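
That rule of thumb can be made concrete with the standard two-proportion sample-size approximation. The 5% baseline conversion rate, the lifts, and the 80% power setting below are illustrative assumptions, not numbers from the article.

    from statistics import NormalDist

    def visitors_per_variation(baseline, relative_lift, confidence=0.95, power=0.80):
        """Approximate visitors needed per variation to detect a given relative lift."""
        p1, p2 = baseline, baseline * (1 + relative_lift)
        z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided
        z_beta = NormalDist().inv_cdf(power)
        variance = p1 * (1 - p1) + p2 * (1 - p2)
        return int(variance * ((z_alpha + z_beta) / (p2 - p1)) ** 2) + 1

    for lift in (0.05, 0.10, 0.25, 0.50):
        n = visitors_per_variation(0.05, lift)
        print(f"{lift:.0%} lift on a 5% baseline: ~{n:,} visitors per variation")

The exact numbers will differ between calculators, but the shape of the relationship is the point: detecting a 5% lift takes orders of magnitude more traffic than detecting a 50% lift.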

Useful tools for calculating sample size and statistical significance 

The Visual Website Optimizer A/B Split Test Significance Calculator

A/B Split Test Significance Calculator (Excel spreadsheet)

Useful resources

How not to run an A/B test – Evan Miller
Statistical Significance and other A/B testing pitfalls – Cennydd Bowles
Excellent Analytics Tip #1: Statistical Significance – Occam’s Razor
A/B testing Tech Note: determining sample size – 37Signals

Author Bio

Siddharth Deswal works at Visual Website Optimizer, the world’s easiest A/B testing software. He’s been involved with web development for about 8 years and actively helps online businesses discover the value of Conversion Rate Optimization. He tweets about A/B testing, landing pages and effective marketing tips at @wingify.

Comments

  1. JV says:

    I normally conclude tests when they’ve reached 90% or 95% reliability. Under what circumstances would it be appropriate to conclude earlier? How big of a sample might one want to call a test at 85% or even 80%?

    • Michael Aagaard says:

      Hi JV – Thanks for your comment.

      Personally I never conclude tests at anything lower than 95%, and I rarely conclude tests at 95%. I’ve seen too many tests change after they’ve reached 95%, so I try to get as close to 99% as possible.

      But it’s all a question of how certain you want to be that the results you are seeing are in fact reliable. If you conclude a test at 80% statistical confidence, there’s a 20% chance that your data is off.

      But you also need to look at other factors in order to determine validity. One very important factor is the standard error. If, for example, the conversion rate is 5% for the control and 6% for the variation, but both have a standard error of 3%, that means the conversion rate is 5% give or take 3% (control) and 6% give or take 3% (treatment). If you have 95% statistical confidence at that point, it means there is a 95% chance that the conversion rate for each of the two variants lies somewhere between 2% and 9%. All in all, that means you’re pretty darn far from a valid test result. However, as the sample gets larger, you’ll see the standard error begin to fall, and your data will become more valid.

      The required size of the sample depends on how big of a difference there is between the variants, and how certain you want to be that your data is correct.

      - Michael
