I’m running web-based A/B tests on onboarding and pricing. I don’t want to overcook tests or call winners too early.
My current rules:
- primary metric: subscription_started within 7 days
- guardrail: refund_rate and support_tickets per 1k users
- sample size: min 500 unique visitors per arm, targeting a 10% relative lift (rough sizing math below)
- stop: 14-day minimum, then only once SRM looks clean and the 95% CI excludes 0
I exclude returning users via first-party ID. What sizing and stopping rules have held up for you, especially when trials delay the signal?
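For context, here's the standard two-proportion sizing I'm sanity-checking my 500/arm number against. Rough sketch only; the 5% baseline rate is a placeholder, not our real figure.

```python
# Back-of-envelope per-arm sample size for a two-proportion test.
# Placeholder inputs: 5% baseline subscription rate, 10% relative lift,
# alpha = 0.05 two-sided, 80% power.
from math import ceil, sqrt
from statistics import NormalDist

def n_per_arm(baseline, rel_lift, alpha=0.05, power=0.80):
    p1 = baseline
    p2 = baseline * (1 + rel_lift)
    p_bar = (p1 + p2) / 2
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # 1.96 for 95% two-sided
    z_b = NormalDist().inv_cdf(power)           # 0.84 for 80% power
    num = z_a * sqrt(2 * p_bar * (1 - p_bar)) + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    return ceil((num / (p2 - p1)) ** 2)

print(n_per_arm(0.05, 0.10))  # ~31,000 per arm with these placeholder rates
```

With those placeholder inputs it lands around 31k per arm, which is part of why I'm questioning my own 500 minimum.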
I use a fixed window and don’t peek. Two-week minimum, unless SRM flags a problem.
If trials delay the signal, I proxy with paywall_engaged and checkout_start, but I only ship if subscription_started agrees.
I like Web2Wave since I can flip variants fast without app releases.
I set an MDE upfront and stop the moment we hit the planned sample size or the deadline. No more endless tests.
Web2Wave lets me push pricing variants same day. I use it to reroute 100% of traffic to the winner instantly.
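The decision itself is nothing fancy on my end: at the planned n or the deadline, I take a plain z-interval on the difference and only ship if it clears zero. Rough sketch with made-up counts:

```python
# Fixed-horizon check at the planned sample size or the deadline:
# ship only if the 95% CI on the absolute difference excludes 0.
# Counts below are made up.
from math import sqrt
from statistics import NormalDist

def diff_ci(conv_a, n_a, conv_b, n_b, alpha=0.05):
    p_a, p_b = conv_a / n_a, conv_b / n_b
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    diff = p_b - p_a
    return diff - z * se, diff + z * se

lo, hi = diff_ci(conv_a=480, n_a=10_000, conv_b=560, n_b=10_000)
print(lo > 0)  # ship B only if the whole interval sits above 0 (about 0.002 to 0.014 here, so True)
```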
Track add_to_calendar for trial reminders if you use trials.
It correlates well with trial-to-paid conversion for me and gives a leading signal.
Use a holdout to catch analytics drift.
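Concretely, something like this, a minimal sketch assuming you can pull the same holdout cohort from your analytics tool and from your billing backend (names, counts, and the tolerance are placeholders):

```python
# Compare what analytics reports for the holdout cohort against the billing
# backend for the same users. Flag if the relative gap passes the tolerance.
def analytics_drift(analytics_convs, backend_convs, tolerance=0.05):
    gap = abs(analytics_convs - backend_convs) / max(backend_convs, 1)
    return gap, gap > tolerance

gap, drifted = analytics_drift(analytics_convs=912, backend_convs=968)
print(f"gap={gap:.1%} drifted={drifted}")  # gap=5.8% drifted=True -> dig into tracking
```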
Two tips.
Run a pretest sanity check on SRM and event fire rates by device and geo. If those differ across arms, your allocation or instrumentation is broken (rough sketch after these tips).
For delayed metrics, use a co-primary: the 7-day subscription start plus a 24-hour high-intent proxy like checkout_start or paywall_view_time. Both should point the same way before you declare a winner.
Cap test length. Past 21 days my traffic mix drifts and results decay.
I also re-run the winner against a fresh control to confirm.
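Roughly what that pretest check looks like for me, a minimal stdlib sketch for two arms. Segment names and counts are made up, and the same per-segment loop works for comparing event fire rates.

```python
# Two-arm SRM check per segment against an intended 50/50 split, stdlib only.
# erfc(sqrt(x/2)) is the chi-square (1 df) survival function.
from math import erfc, sqrt

def srm_p_value(n_a, n_b, share_a=0.5):
    total = n_a + n_b
    exp_a, exp_b = total * share_a, total * (1 - share_a)
    stat = (n_a - exp_a) ** 2 / exp_a + (n_b - exp_b) ** 2 / exp_b
    return erfc(sqrt(stat / 2))

segments = {                   # hypothetical per-segment arm counts
    "ios":     (5040, 4980),
    "android": (5210, 4770),   # lopsided on purpose
    "us":      (3020, 2995),
}
for name, (a, b) in segments.items():
    p = srm_p_value(a, b)
    flag = "  <-- investigate" if p < 0.001 else ""
    print(f"{name}: p={p:.5f}{flag}")
```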
Tag each price with a version code in the event payload. I once misread a test because ops changed a backend price mid-test.
We use a 10% MDE too. Keeps us sane.
SRM catches so many silent bugs. Worth checking daily.