How Negative Test Results Produced a 48.69% Lift in Conversions on a B2C Landing Page

What if my A/B test generates a negative result and decreases conversions? Understandably, this scenario is frightening to many marketers. But the truth is that negative results aren’t the end of the world. In fact, they can teach you just as much as positive ones. In some cases, the insights you get from a treatment that tanked are just what the LPO Doctor ordered. Here’s a case study where a negative result led to a winning version of a B2C landing page that increased conversions by 48.69%.

Background information:

Client: OK A.M.B.A, one of the major oil and energy companies in Scandinavia.

Product: Heating oil. OK has a great offer, where you can save up to DKK 1,150 (approx. $200) the first time you buy heating oil. However, the offer itself is quite complicated, as you have to choose a number of add-ons in order to get the full discount.

Landing page: PPC/SEO landing page designed to push potential clients to the first step of the checkout flow.

Optimization goals: Increase click-through to the checkout flow (primary); increase the number of sales (secondary).

Restrictions: We could not tweak the checkout flow, which is rather complicated and has a significant impact on the number of completed sales.

The original landing page (control):

The control version was very copy-heavy and lacked visual support for the offer. In fact, the only graphic element was an image of a little girl washing her hands.

As mentioned, the offer is quite complex, and it takes a fair amount of information to understand the setup. The control version had a number of links to sub-pages that explained the different aspects of the offer in more detail.

Step 1 of the checkout flow:

When you click the main call-to-action, you land on step 1 of the checkout flow, where you can add and remove different add-ons and calculate your final discount. The complexity of the flow naturally has a significant influence on the number of completed orders. As mentioned, we could not tweak this flow.

Our variant (Treatment A):

We mainly focused on making the page more visually appealing and easy to interact with. We used a more relevant image and chose to show the calculation that lies behind the discount price, so potential customers could gain a greater understanding of the offer right off the bat.

We also chose to add more details about the add-ons, rather than sending visitors off to a number of sub-pages.

The first test – Control vs. Treatment A:

We were super psyched about our treatment, and expectations were sky high when we launched the test. However, the results spoke for themselves, and there was no doubt that our treatment totally tanked.

Our beautiful variant underperformed the control by 30.27% measured on CTR to the checkout flow. It also underperformed on sales; however, we did not reach statistical significance on that conversion goal – in this case, we deemed it too risky to let the test run longer than absolutely necessary.

Follow-up experiments designed to isolate friction areas: 

We were very surprised by the results, and naturally this wasn’t the result we’d hoped for.

But being experienced testers, we had already prepared the client for the fact that optimization is a scientific process – not magic – and that it sometimes takes several tests in order to get the necessary insights to achieve a significant lift.

In order to get new insights, we did something that many might view as, well… insane. We took all the PPC traffic and sent it to our treatment – the loser.

Doing so allowed us to run a number of smaller follow-up experiments to isolate friction areas and elements that directly influenced prospects’ decision-making processes. During these experiments we learned a lot and isolated a number of friction points – marked by blue circles on the image.

The main learning was that the visualization of the calculation actually backfired in a big way. The button copy and the image also had a measurable negative impact.

Treatment B:

We took all our learnings from the previous tests and the combined result became Treatment B.


Control vs. Treatment B:

We held our breath, crossed our fingers, and launched a new test where we tested treatment B against the Control version.

When the test reached statistical significance, we were happy to conclude that Treatment B outperformed the Control by 29.5% measured on CTR to step 1 of the checkout flow, and by 48.69% measured on sales.
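As a rough sketch of how results like these are typically evaluated, a relative lift and a pooled two-proportion z-test can be computed from raw counts. The visitor and conversion counts below are hypothetical – the case study only reports the percentage lifts:

```python
import math

def lift(cr_control, cr_treatment):
    """Relative conversion lift of the treatment over the control."""
    return (cr_treatment - cr_control) / cr_control

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for a pooled two-proportion z-test (normal approximation)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Normal-CDF tail probability via the error function
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# Hypothetical counts chosen to mirror the reported 29.5% CTR lift
print(f"lift: {lift(0.200, 0.259):.1%}")                    # lift: 29.5%
print(two_proportion_p_value(400, 2000, 518, 2000) < 0.05)  # True
```

With counts this size the p-value is far below 0.05, so the lift would be declared significant; with a tiny difference (say 400 vs. 405 conversions) it would not.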

Main Takeaways – What you can learn from this case study

“The goal of a test is not to get a lift, but rather to get a learning…” Dr. Flint McGlaughlin, MECLABS

Landing page optimization is a scientific process, and the primary goal is to get a learning. As long as you understand what happened in the test and the results give you new insights, it ultimately doesn’t matter whether the initial test results are positive or negative.

Of course, hitting a home run on the first swing is easier on the ego. But when you approach optimization as a scientific process – not a one-off opportunity to swing for the fences – you’ll see that stopping at a few bases along the way is often what it takes to win the game.

One might be inclined to view the first test – Control vs. Treatment A – as a bad test. But in fact it wasn’t a bad test at all. It was an important first step towards the winning version that created a dramatic lift in conversions.

Moreover, this case study is a good example of why the only way to be sure that your optimization efforts are in fact improving your website’s performance is to put them to the test.

Had we blindly trusted our experience and not tested this landing page, we would have sold the client a page that significantly underperformed what they already had. You might want to think about that the next time someone offers to optimize your website without mentioning the word test.


  1. Interesting case study!
    Following the same idea, sometimes tests are negative, but the average basket is higher.

    • Michael Aagaard says:

      Hi Remi – Good point!

      In some instances, you might see a lower CTR, but a larger basket size.
      That’s why it’s important to have several conversion goals set up.

      If one of your goals is to increase basket size, and you succeed, then your test results on that goal will be positive ;-)

      – Michael

  2. Hi there,

    I have a couple of questions if you have the time,

    What’s a significant amount of traffic to test this properly, and how did you come to the conclusion that the imagery was not spot on? Did you try other images on a pure guess?

    Many thanks and awesome case study!

    • Michael Aagaard says:

      Hi Lenny – Thanks for the comment!

      Test validity is a pretty complicated area, and it’s difficult to give you the full story in just one comment.
      There is no rule that xx visitors constitutes a significant amount of traffic – it very much depends on the individual case and how differently the variations perform. You can read more about sample size and validity in this post
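      To illustrate why there is no fixed visitor count, here is the standard normal-approximation estimate of the sample size needed per variant, which grows as the baseline rate and the effect you hope to detect shrink. The baseline CTR and lift below are hypothetical:

```python
import math

def sample_size_per_variant(baseline_cr, relative_mde, z_alpha=1.96, z_beta=0.84):
    """Approximate visitors needed per variant to detect a relative lift
    with ~95% confidence and ~80% power (normal approximation)."""
    p1 = baseline_cr
    p2 = baseline_cr * (1 + relative_mde)  # conversion rate at the minimum detectable effect
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

# Hypothetical: 10% baseline CTR, hoping to detect a 20% relative lift
print(sample_size_per_variant(0.10, 0.20))  # 3834 visitors per variant
```

      Halving the detectable lift roughly quadruples the required traffic, which is why slow-moving tests on small differences take so long to call.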

      In relation to the image: as you can see the first and the final treatment feature different images. We tested a few different images and found that the original was underperforming compared to the variations. The idea to test the image came from experience with similar landing pages.

      – Michael

  3. Hey Michael,

    I’m reading through your e-book, and have a question about the study.

    You changed “ORDER” to “GET” and saw a boost in conversions. Did you test other short words like “BUY” or “TRY”? I’m wondering if the length of the words (3 letters vs 5 letters) would have explained some of the difference in conversion rate. Thanks!


    • Michael Aagaard says:

      Hi Tito – thanks for the comment and thanks for reading the book.

      In the case you are referring to, neither “buy” nor “try” is really applicable. However, I’ve tested other variations, and the number of characters hasn’t really had any influence. I’ve performed tons of CTA copy tests where a longer message has by far outperformed a shorter one (there are also examples in the book). All my research points to the fact that a more relevant CTA that conveys value and focuses on what you’ll receive – not what you have to do or part with – gets the most clicks.
      Moreover, I see the same tendency across different languages.

      Hope that helps!

      – Michael

  4. Thanks for the quick reply, Michael! Nice to hear that number of characters doesn’t necessarily have an influence.

    I finished the book and loved all the linked case studies! Going to do some tests on my own little contact forms and signups! Any suggestions for what to read next? :D


    • Michael Aagaard says:

      Hi man – sounds great thanks for reading!

      Check out MarketingExperiments – both the blog and their Webclinics are packed with awesome insight and tips. The Unbounce blog is really good too, they also have a number of cool guides.

      – Michael

  5. Hi Michael,

    I like how you say that “landing page optimization is a scientific process”. It should be!

    I think your issue here might be called “premature stopping” or “arbitrary stopping”. If you based your decision to stop the test on achieving statistical significance, then this is definitely your problem.

    Doing this completely ruins your statistical significance numbers to the point of making them completely useless. You can read this article on statistical significance and statistical power where I go in detail about those issues:

    • Michael Aagaard says:

      Hi Georgi – Thanks for the comment. This is an old case study that I posted 2 years ago. I’ve learned a lot about testing methodology since then. It really comes down to understanding that significance is not the same as validity. Statistical significance is only one factor in determining whether your test results are valid.

      Nowadays I run tests for full business cycles, I pay close attention to standard error, conversion rate range, and sample size/number of conversions. Moreover, I make sure to integrate the data from the split test tool into GA, so I can get detailed data on each individual variant. This allows me to segment deeply and find out how each variant is performing across platforms, devices, browsers, traffic sources, etc.
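      A minimal sketch of the kind of per-variant sanity check described here – the observed conversion rate, its standard error, and a 95% interval (the counts are hypothetical):

```python
import math

def conversion_interval(conversions, visitors, z=1.96):
    """Observed conversion rate, its standard error, and a ~95% confidence interval."""
    cr = conversions / visitors
    se = math.sqrt(cr * (1 - cr) / visitors)
    return cr, se, (cr - z * se, cr + z * se)

# Hypothetical variant: 200 conversions out of 1,000 visitors
cr, se, (low, high) = conversion_interval(200, 1000)
print(f"CR {cr:.1%}, SE {se:.4f}, 95% CI [{low:.1%}, {high:.1%}]")
# CR 20.0%, SE 0.0126, 95% CI [17.5%, 22.5%]
```

      If the intervals of two variants overlap heavily, the observed difference could easily be noise, no matter what a single significance readout says mid-test.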

      Thanks for the link, I’ll check it out when I have a chance.

      – Michael
