Context: we have a mostly IAP-monetized mobile game. On Facebook we almost exclusively run AEO Purchase and Value with wide audiences & lookalikes (mostly segmented by country).
So keeping that in mind, the way I like to test creative is:
- the main metric we look at is the install-to-impression ratio (installs divided by impressions). This is much more useful than just CTR or store-level conversion rate, as it's very possible for two videos to have dramatically different CTR and store-level conversion breakdowns while landing on similar install rates. Strictly speaking, store-level conversion rate is also not independent of CTR. CPI on its own isn't very useful either because CPMs tend to fluctuate – seeing a 20% change in CPM is quite normal on Facebook, which invalidates your CPI results.
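To make that concrete, here's a minimal sketch (all numbers invented) of why install-to-impression is the more robust single metric: it's exactly the product of CTR and store-level conversion rate, so two videos with very different funnels can still produce the same install rate.

```python
def install_to_impression(impressions: int, clicks: int, installs: int) -> float:
    """Install-to-impression ratio; equal to CTR * store-level conversion rate."""
    ctr = clicks / impressions
    store_cvr = installs / clicks
    rate = installs / impressions
    assert abs(rate - ctr * store_cvr) < 1e-12  # the funnel decomposition holds
    return rate

# Hypothetical videos: very different CTR/store-CVR mixes, same install rate.
video_a = install_to_impression(impressions=100_000, clicks=3_000, installs=300)  # high CTR, low CVR
video_b = install_to_impression(impressions=100_000, clicks=1_000, installs=300)  # low CTR, high CVR
print(video_a, video_b)  # identical install-to-impression ratios
```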
- there's a separate adset with app install optimization + autobid targeting a wide audience in the US on one platform. The reason is that app install optimization is significantly cheaper than AEO, so it's easier to get significant results. At the same time, the correlation between AEO and MAI is very strong – the relative install-to-impression ratios between different videos seem to always be the same! Out of 200 videos we produced and tested, we never had a case where one video performed better than another in MAI adsets and we wouldn't have seen the same result in AEO. This works well as long as you compare comparable things – i.e. it doesn't make sense to directly compare AEO to MAI, because the absolute numbers can be very different (AEO typically has a slightly lower install-to-impression ratio and much higher CPM/CPI). We also rarely see significant differences between regions (most of our videos feature different takes on gameplay and almost no text, so it might be different with heavy CG content etc.)
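The claim here is about rank stability, not absolute levels: a toy check with entirely invented numbers might look like this – the MAI and AEO rates differ in absolute terms, but sorting the videos by install-to-impression ratio gives the same ordering under both optimization goals.

```python
# Hypothetical install-to-impression ratios for five videos (all numbers
# invented): the same videos tested under MAI and later run under AEO.
mai = {"v1": 0.0040, "v2": 0.0032, "v3": 0.0028, "v4": 0.0020, "v5": 0.0015}
aeo = {"v1": 0.0030, "v2": 0.0025, "v3": 0.0021, "v4": 0.0016, "v5": 0.0011}

rank_mai = sorted(mai, key=mai.get, reverse=True)
rank_aeo = sorted(aeo, key=aeo.get, reverse=True)
print(rank_mai == rank_aeo)  # same ordering despite lower absolute AEO rates
```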
- the adset is not running all the time – I only launch it when I have a new video to test. When that happens, I add the video to the adset and switch it on as the only active video. The reason is control – if you have 3-5 videos (like we usually have in "live" AEO+value adsets), Facebook often likes distributing the spend to one video only. If you're testing new videos that way, they might not get enough installs even after a few days. You could argue this means they aren't actually performing, but Facebook often picks the wrong videos – and we've had plenty of cases like that.
- there's a standard $500 daily budget and I let it spend at least $500, sometimes up to $1,000 depending on how many installs I got. This is more than enough to produce statistically significant results, but that's not the only reason for the amount: performance also fluctuates based on how much you spend (i.e. if you spend $100 you typically get much higher conversion rates than with $500).
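As a sanity check on whether a given spend actually separated two videos, one standard option is a two-proportion z-test on install-to-impression rates. This is a sketch, not the author's method; the $10 CPM and install counts below are assumptions purely for illustration.

```python
import math

def two_proportion_z(installs_a: int, imps_a: int, installs_b: int, imps_b: int):
    """Two-sided z-test comparing the install-to-impression rates of two videos."""
    p_a, p_b = installs_a / imps_a, installs_b / imps_b
    p_pool = (installs_a + installs_b) / (imps_a + imps_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / imps_a + 1 / imps_b))
    z = (p_a - p_b) / se
    # Two-sided p-value under the normal approximation
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# ~$500 at an assumed $10 CPM buys ~50,000 impressions per video.
z, p = two_proportion_z(150, 50_000, 100, 50_000)
print(round(z, 2), round(p, 4))
```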
- placements should be tested separately – we typically run FB+IG (without Stories) and FB+FAN separately, but sometimes limit it to just FB+IG. The reason is that FB and IG performance is typically very similar both in absolute and relative terms, while FAN tends to have much higher install-to-impression ratios thanks to the ads being full screen – and we have had cases of relative performance differences between videos on FAN vs. FB+IG (the easiest example is landscape vs. square: landscape often works better on FAN, square often works better on FB+IG). So you want to keep the results separate, which again means you can't rely on placement optimization, which often tends to favor particular placements. That leads to less control over the experiment and ultimately to losing time and money.
- split tests are a big no-no. They split your audience into at least two halves, and this often drops performance (install-to-impression ratio) by 2x or more. That means that to get statistically significant results you would need to spend 2x+ more budget on the test, and the results would still be questionable because of the narrower audiences. There's also no real reason to use them here! The only risk split tests would mitigate vs. my scheme is the day of the week influencing the results – we actually looked into this and it doesn't significantly affect performance.
- once the test is done, I take note of the new video's install-to-impression ratio and compare it to the benchmarks. The benchmarks are simply the past results of all previous videos from tests run in the same conditions – they live in a spreadsheet where you can compare them to each other (this also helps the creative production team decide what to do next). Of course, past performance becomes invalid over time, so it's good practice to re-test some of the creatives every few months (with a strong focus on top videos, so you always have a valid benchmark to compare against).
- if the video reaches 80%+ of the current benchmark, we add it to our live adsets and monitor its performance there for about a week. Typically, if the video is at 100%+ of the benchmark, it quickly starts getting almost all of our spend, which is of course a lot more significant than $500 a day :). If we see consistently great performance, the 100%+ result is considered proven and the video's install-to-impression ratio becomes the new top benchmark.
- in our experience, Facebook almost always spends the large majority of an adset's budget on the current top video that we've gotten from this process. This is done by Facebook's creative optimization algorithm, not by us (we typically have 5 videos per adset, taken from the top of our spreadsheet or others depending on the context). This continues until we find a new top creative. For reference, we've only had about 5 such videos on Facebook in the 2 years since we launched the game, out of around 200 videos we produced.