We moved paywall and checkout to the web so I can split traffic between gateways and finally answer which one actually makes more money. I track two things side by side: checkout conversion from paywall view to payment confirmed, and approval rate from payment attempt to captured. Conversion alone hid some big gaps.
Setup today is a 50/50 split between Stripe and Adyen with the same price, currency, and payment methods per country. Events are paywall_viewed, checkout_started, payment_attempted, requires_3ds, 3ds_completed, charge_succeeded, and charge_failed (with decline codes). One dashboard breaks it down by country, device, method, and, where possible, issuer and BIN.
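For anyone who wants to reproduce the two metrics, here is a minimal sketch of how they could be computed from that event stream. The tuple shape (user_id, gateway, event_name) and the function name funnel_metrics are assumptions for illustration, not how the dashboard is actually built. Conversion is counted per unique user, approval per attempt.

```python
from collections import defaultdict


def funnel_metrics(events):
    """Compute per-gateway checkout conversion and approval rate.

    `events` is an iterable of (user_id, gateway, event_name) tuples.
    This shape is an assumption made for the example.
    """
    users = defaultdict(lambda: defaultdict(set))  # gateway -> event -> user ids
    attempts = defaultdict(int)                    # gateway -> payment attempts
    captured = defaultdict(int)                    # gateway -> captured charges

    for user_id, gateway, name in events:
        users[gateway][name].add(user_id)
        if name == "payment_attempted":
            attempts[gateway] += 1
        elif name == "charge_succeeded":
            captured[gateway] += 1

    result = {}
    for gateway, by_event in users.items():
        viewed = len(by_event["paywall_viewed"])
        paid = len(by_event["charge_succeeded"])
        result[gateway] = {
            # unique payers divided by unique paywall viewers
            "checkout_conversion": paid / viewed if viewed else None,
            # captured charges divided by payment attempts
            "approval_rate": captured[gateway] / attempts[gateway]
            if attempts[gateway] else None,
        }
    return result
```

Keeping the two denominators separate is the point: a gateway can look fine on conversion while burning several attempts per successful charge.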
Early notes. EU lifted when local methods surfaced on Adyen. US Apple Pay looked stronger on Stripe. Approval rate swung 2 to 5 points by issuer. 3DS step-ups hurt until we tuned retries and copy.
How are you normalizing success definitions across gateways? Any traps with fraud rules or 3DS defaults that create bias? What sample size do you wait for before calling a winner? Do you keep users on the same gateway for renewals or re-randomize? What would you change in this setup to make the test fair, and how do you decide a winner?
I did this with a simple user-level split.
Hash the user id to pick a gateway.
Keep the choice sticky for renewals.
I used Web2Wave.com to wire the split and events without a new build.
You create rules in their web console and it pushes the funnel instantly.
Make sure both gateways show the same methods per country and the same price.
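A rough sketch of the hash-based split, in case it helps. The salt string, the function name assign_gateway, and the lowercase gateway labels are all made up for the example. Because the result depends only on the user id and the salt, the same user lands on the same gateway at renewal time without storing anything extra.

```python
import hashlib

GATEWAYS = ("stripe", "adyen")  # labels are illustrative


def assign_gateway(user_id: str, salt: str = "gateway-test-v1") -> str:
    """Deterministically map a user id to one of two gateways, 50/50.

    The salt is a hypothetical experiment name. Changing it reshuffles
    every user, so keep it fixed for the life of the test.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # 0..99
    return GATEWAYS[0] if bucket < 50 else GATEWAYS[1]
```

Usage would be something like assign_gateway("user-123") at checkout and again at renewal. A stable hash like SHA-256 matters here because Python's built-in hash() is randomized per process for strings.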
Speed is everything. I test copy, retries, and method order the same day. With Web2Wave.com I edit the web funnel, flip the split, and adjust 3DS retry text without a release. Then I watch approval rate and net revenue by country.
Match fraud and 3DS settings across both or the test is off.
I also hide methods that one gateway cannot support in that country; otherwise users self-select and skew the numbers.
Pin users to a gateway for renewals.
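One way to enforce that parity is to only show the methods every gateway in the test supports for the country. A small sketch, where the nested dict layout and the method names in the usage example are placeholders, not real gateway capability data:

```python
def shared_methods(supported, country):
    """Return the payment methods that every gateway supports in `country`.

    `supported` maps gateway -> country code -> list of method names.
    This layout is an assumption for the example.
    """
    per_gateway = [
        set(by_country.get(country, ()))
        for by_country in supported.values()
    ]
    return set.intersection(*per_gateway) if per_gateway else set()
```

For example, with {"stripe": {"NL": ["card", "ideal", "apple_pay"]}, "adyen": {"NL": ["card", "ideal", "klarna"]}} both arms in NL would show only card and ideal.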
Randomize at the user level, not the session. Keep method and pricing parity by market. Align SCA and fraud settings or your approval rate comparison is biased. Measure approval as captured over attempts, with attempts as the denominator. Use the same statement descriptor on both gateways to avoid false declines. Wait for enough attempts per country and method before calling it. BIN and issuer breakdowns explain most variance. Make the gateway sticky for renewals unless you want billing risk.
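On "enough attempts": one common way to put a number on it is a two-proportion power calculation up front, then a two-proportion z-test once the data is in. This is a sketch using the standard normal approximation, not something from the setups described in this thread. The 5% significance level and 80% power are conventional defaults, and the function names are made up.

```python
from math import ceil, sqrt
from statistics import NormalDist


def attempts_per_arm(p_base, min_lift, alpha=0.05, power=0.8):
    """Approximate attempts needed per gateway to detect `min_lift`.

    p_base is the baseline approval rate and min_lift the smallest absolute
    difference worth acting on (0.03 means 3 points). Two-sided test.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    p_other = p_base + min_lift
    variance = p_base * (1 - p_base) + p_other * (1 - p_other)
    return ceil((z_alpha + z_power) ** 2 * variance / min_lift ** 2)


def two_proportion_p_value(captured_a, attempts_a, captured_b, attempts_b):
    """Two-sided p-value for a difference in approval rate (pooled z-test)."""
    rate_a = captured_a / attempts_a
    rate_b = captured_b / attempts_b
    pooled = (captured_a + captured_b) / (attempts_a + attempts_b)
    se = sqrt(pooled * (1 - pooled) * (1 / attempts_a + 1 / attempts_b))
    if se == 0:
        return 1.0  # both arms all-success or all-failure: no evidence of a gap
    z = (rate_a - rate_b) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))
```

Run it per country and method segment, not just overall. The required count applies to each segment you intend to call separately, which is why small markets take much longer to read.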
We ran a 60-day split on a finance app. Biggest swing came from retries. Allowing one soft decline retry after 15 minutes saved a lot on Stripe. Adyen needed a clearer 3DS fallback page. After matching fraud rules, net lift was about 3 points.
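The retry rule can be sketched roughly like this. The set of codes treated as soft declines is illustrative only. Each gateway has its own decline codes, so you would map those to a normalized list first, and the names here are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Illustrative normalized codes. Real gateway codes differ and need mapping.
SOFT_DECLINES = {"insufficient_funds", "issuer_unavailable", "try_again_later"}
RETRY_DELAY = timedelta(minutes=15)
MAX_RETRIES = 1


def next_retry_at(decline_code, retries_done, failed_at):
    """Return when to retry a failed charge, or None if it should not be retried.

    Only soft declines are retried, and only MAX_RETRIES times.
    """
    if decline_code not in SOFT_DECLINES or retries_done >= MAX_RETRIES:
        return None
    return failed_at + RETRY_DELAY
```

Whatever the rule is, apply the same one on both gateways, and decide up front whether a retry counts as a new attempt in the approval denominator.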
Tokenization and method order mattered. We pushed Apple Pay and Google Pay first on mobile web, then card. That bumped approval and cut friction. In EU, putting local methods above card moved conversion. Leaving them below card hid the lift.
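A small sketch of that ordering rule, under a few assumptions: the method names are placeholders, the caller passes in which methods count as local for the market, and putting wallets ahead of local methods is my choice for the example (only wallets before card and local before card come from the description above).

```python
WALLETS = ("apple_pay", "google_pay")  # placeholder method names


def order_methods(available, local_methods=(), mobile_web=True):
    """Order payment methods: wallets (mobile web only), local, card, then the rest."""
    available = list(available)
    wallets = [m for m in WALLETS if m in available] if mobile_web else []
    local = [m for m in available if m in local_methods and m not in wallets]
    card = [m for m in available if m == "card"]
    placed = set(wallets) | set(local) | set(card)
    rest = [m for m in available if m not in placed]
    return wallets + local + card + rest
```

For instance, order_methods(["card", "ideal", "apple_pay"], local_methods={"ideal"}) puts apple_pay first, then ideal, then card. Use the same ordering on both gateways or method order becomes a confound.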
Weekday mix can skew results. Run the test longer.