Episode 15: A/B Testing: The (not so) holy grail of conversion optimization


Hello! My name is Jörg Dennis Krüger, and as my sausage-cable-drum-winder at the reception just said: yes, I am the conversion hacker. And this episode of the conversion hacking podcast is about A/B testing.

Anyone who knows me, who has known me a little longer, knows that A/B testing is one of my absolute core topics. I started with A/B testing in 2008, and in 2006 I even worked on the topic for Omniture. Their products are now Adobe Test&Target and Adobe Target; these are the old products that we used and introduced at large companies like DKV, Allianz and the like at the time. So it is not without reason that my book on A/B testing, published in 2011, is called "Conversion Boosting with Website Testing".

My book on A/B testing

"Conversion Boosting with Website Testing" - the focus of the book is very much on A/B testing. I present the conversion boosting model, how to approach the topic of website optimization and testing in the first place, how to test, how to evaluate, test periods and so on. But I have to say that I have learned a little more in the meantime, because optimization is not just about testing. I mean, testing really grew up thanks to Barack Obama, because in his election campaign he collected a lot more donations through A/B testing. And out of this fundraising campaign, today's A/B testing provider Optimizely emerged - basically what was originally built for the Obama campaign. Of course, a lot has changed since then; Optimizely has received 80, 90 million in venture capital to develop the tool further, and so on.

A/B testing tools: Optimizely, Google Optimize & Co.

As sophisticated as the software is now, the entry price is also very high now - which is why I don't recommend it that often. But it's a cool tool. So everyone wanted to do A/B testing: what worked for Obama works for me too, and so on. The big problem is that most shops or websites - for me it's mostly shops - are simply not testable. Why? They don't have enough traffic, because such a test is basically a completely normal double-blind study as we know it from medicine or science in general.

Statistically significant results

And for such a study to yield enough results, and statistically significant results, I simply need enough data. This data always consists of visitors to the page on the one side and conversions on the other - these are the two main factors in play. If I have too few visitors on the site, or a conversion rate that is currently simply too low - and mostly it's both - then I don't get statistically significant results. Then I always have the problem that I somehow have data, but if I do a little math, it's actually all random data. In the testing tools, this is displayed as confidence or significance.

And if that doesn't get above 60, 70 percent - well, 50 percent is a coin toss, and 60, 70 percent is not much better - and if you think about it more carefully, you notice that you really need a lot of data to get reliable results, and over a certain period of time, because you have to test at least 7, probably even 14 days, in order to cover every weekday at least once, ideally twice.

Optimal test period for A/B testing

But you shouldn't test for too long either, in order not to pick up too many external influences, too much noise - and that's why 2 to 6 weeks is the optimal test period. And if you don't have enough conversions and you don't have enough visitors, then it becomes difficult. What does "enough" mean? The rule of thumb is: I need a minimum of one hundred conversions per test variant. But that's just a rule of thumb. If both variants have the same number of conversions, I'm still at a fifty-fifty probability of which variant is better.

That means: if I have two variants and 200 conversions in total, then they have to be clearly different - 150 to 50 conversions, for example. That would probably be a significant difference, where we can say yes, the 150-conversion variant is definitely better than the 50-conversion variant. And there are a bunch of calculators for this online.
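The idea behind those calculators can be sketched with a standard two-proportion z-test. A minimal sketch in Python - note that the visitor counts (10,000 per variant) are made-up numbers for illustration, since the episode only mentions the conversion counts:

```python
from math import sqrt, erf

def normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def ab_significance(conv_a, visitors_a, conv_b, visitors_b):
    """Two-proportion z-test: returns (z-score, two-sided confidence in %)."""
    p_a = conv_a / visitors_a
    p_b = conv_b / visitors_b
    p_pool = (conv_a + conv_b) / (visitors_a + visitors_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / visitors_a + 1 / visitors_b))
    z = (p_a - p_b) / se
    confidence = (2 * normal_cdf(abs(z)) - 1) * 100
    return z, confidence

# 150 vs. 50 conversions, assuming 10,000 visitors per variant:
z, conf = ab_significance(150, 10_000, 50, 10_000)
print(f"z = {z:.2f}, confidence = {conf:.1f}%")   # clearly significant

# 102 vs. 98 conversions on the same traffic: looks like data, is noise
z, conf = ab_significance(102, 10_000, 98, 10_000)
print(f"z = {z:.2f}, confidence = {conf:.1f}%")   # far below 95 percent
```

The second call shows the "random data" problem from above: a small difference on modest traffic never climbs out of coin-toss territory.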

Calculate test duration

If you simply search for a test duration calculator, you will find one with all the A/B testing providers - Optimizely, AB Tasty, certainly somewhere at Adobe, VWO and who knows where else. They all work a little differently and may show slightly different results, because there are a few more mathematical variables you can feed into them. But they give you a relatively good feeling for whether you can test at all.

Because nothing is worse than planning a test at great expense, installing a tool, implementing variants, starting the test and then realizing: I'm not getting any results. You turn the thing off after three, four, six or eight weeks and realize: "damn, all that work was for nothing." That is the worst case. A negative test result - realizing "oh, this change doesn't work at all" - is not the worst case at all. That's actually pretty cool, because I've learned something, and we test precisely in order to learn. We want to get to know our visitors better.

Learning through A/B testing results

You don't always get a conversion uplift immediately, i.e. an increased conversion rate; sometimes you just get a downlift and you notice: wow, that doesn't work at all. Some time ago I tested something in an online shop where I was very, very sure that it would lead to more conversions: we built a banner across the top of the whole shop that pointed to the five-star reviews at Trust.

In fact, that did not lead to more sales. On the contrary: we got a significant downlift, a significant reduction in the conversion rate. Why? I don't know. The "why" is extremely difficult to answer through testing. But we know that we'd rather leave it and do something else. And that's also the reason why, when you build test variants, you have to proceed as pragmatically as possible - which means you have to build variants very, very quickly.

Wonderful editors

Don't program a variant for two, three, four, six, eight weeks only to notice after a week of testing that it doesn't work. Launch a variant quickly and pragmatically instead. Most tools have wonderful point-and-click editors with which you can build a variant quickly and easily. Of course, you have to be careful that it is still displayed correctly in all browsers and that the point-and-clicking doesn't break something. But this way you can usually build a perfectly usable test variant in hours, often even in minutes.

Then you see whether it works well, and if it does, you can put more work into it, program it properly and implement it in the shop permanently. Very often people tend to say: "Oh, good idea, I can do that, I can program it right away. We don't even need to test it." No, we want to test it. We want to know whether it works better, and if so, how much better, so that we can actually make decisions and not just implement something without knowing what produces more conversions. I once worked for a large car rental company that is very much family-run from the top.

Example: Sixt car rental

You could also say authoritarian - even if Alexander doesn't do that, Konstantin maybe a little bit, and your papa, of course, rules from the top. But he's allowed to. In any case, they had just launched a new website. It was supposed to have a bit of Google's style and so on, and they hadn't tested anything - they just launched the new site. They didn't know what impact this would have on the conversion rate. Then I came. I worked on the websites for a year, and then we tested a slightly different logic in another country - launched in the USA - a bit like it was before.

Not a Google-style search slot, but a little more classic, as you know it from travel booking engines. We got a huge uplift, and in the meantime the German site has changed a lot again. That means they simply learned from it. Because these "HiPPO decisions" - Highest Paid Person's Opinion, i.e. the opinion of whoever earns the most - don't work. Even though I know, Konstantin, you don't earn the most, you only pay yourself a small salary - but there are definitely royalties.


This HiPPO ("Highest Paid Person's Opinion") isn't always good - in fact, it's often exactly the opposite. Because the highest paid person often has no idea about his customers and is usually relatively far away from day-to-day business. And then a decision just gets made - or even worse, it's somehow the wife's opinion or something. And just implementing that without testing is of course a drama.

That's why testing is a cool thing, and with such large companies as this rental car provider you can of course test very, very well - but funnily enough, not in every country, because there are countries that just don't have enough traffic for real testing. So you notice that even in such large companies, the traffic is not necessarily high enough to actually do testing. You have to ask yourself whether you just want to do alibi testing or whether you want real results.

So first I have to find out whether I have enough traffic and enough conversions to be able to test. Then I have to run very pragmatic tests that generate results quickly, implement monitoring, and then let my test run for 14 days, three weeks, four weeks, a maximum of six weeks - and then I'm in good shape. And if I find that I don't have enough traffic for real testing, then we come back to the topic of heuristics, or best practices. I think the scientific term "heuristics" is cooler, because it says more clearly what it's about: a heuristic is something that makes predictions about the future with limited knowledge.

Best practices and heuristics for quick results

So, it's raining and this is my umbrella - that's limited knowledge: if I use it, I will probably stay dry. The heuristic "umbrella in the rain keeps you dry" works even though I don't know all the factors. It could be that it's very windy right now and I can't even use the umbrella in the rain. But it fits pretty well. And in the online shop it can simply be: sliders suck - unfortunately, in 98 percent of cases sliders do not convert. Or: where do users get lost? You can analyze that qualitatively and quantitatively and then determine where users simply drop off.

Then you can take a look: look here, they all find the button, they all click, but none of them puts anything in the shopping cart. For that, I don't need a lot of tests at first; I can implement heuristics and best practices first, and that's 80 percent of what I do with my customers. Okay, what are the right heuristics to generate more conversions right away? How can we clean up first? So before I call the interior designer, I first call the painter.

The holy grail of conversion optimization?

Send in the painter first, and don't immediately bring the interior designer into a cluttered dump - he would just say: "Hey, what am I supposed to do here?" A/B testing is the interior designer: mostly you first get to clearing out and to the painter, and for many shops that is enough, because the interior designer is so expensive that he may not even earn back his added value. So, a nice metaphor to end on. I love A/B testing and it's a super cool thing, but it's not the holy grail of conversion optimization, not the holy grail for more sales in the shop, because it takes effort and because you need enough traffic in the first place. In this respect, it's worth cleaning up the shop properly first. And at the very end, here is a tip for doing it yourself. Of course you can talk to me, but do it yourself.

The LIFT model for A/B testing

There is the LIFT model, developed by WiderFunnel, an agency from Canada - many greetings to Rachel. In any case, they have a very funny analogy: they compare a website with an airplane. An airplane needs a few basics to be able to fly, for example wings. The wings represent the value proposition of the site: without wings, we don't need to do anything else - we can pump in as much kerosene and have as long a runway as we want, and whatever else. It will not work.

So first we need wings, and then there are things in this model that make the aircraft take off - things like trust and a clear structure - and things that keep the aircraft on the ground, like anxiety and distraction. And then there is the turbo boost on the plane: urgency, "only today", which actually often brings a lot if it's meant honestly. So Google "LIFT model" if necessary; you will also find links down here.

In the blog and podcast. Take a look and simply implement it as a basic heuristic. Because I am also happy when I talk to shops that are fundamentally well done, so that I don't have to start with the absolute basics - I prefer painting with a slightly smaller brush to the thick roller, or rather: I prefer to have already cut through the jungle with the machete. And yes, you definitely want more conversions, and I think you can do a lot right and achieve quite a lot with the tips from this podcast. Give me feedback in the podcast comments, but please also on iTunes and Spotify, because I'm looking forward to your feedback - and to five stars.

With the machete through the conversion jungle

1 comment

  • Hello,
    the text version of this post appears to have been machine-transcribed. Unfortunately, this makes it very tiring to read. The post is nevertheless very valuable. Maybe it makes sense to proofread and edit it accordingly...
    Best regards + thank you

