For speed, I’ve been doing simple splits:
- Add ?var=A or ?var=B to traffic.
- Store var in a cookie and show a variant of copy, onboarding length, and price.
- Log events with var attached.
- Read results daily in a sheet.
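Roughly what the client side looks like, heavily simplified; the cookie name, variant list, and the /events endpoint are stand-ins, not my real code:

```ts
// Simplified sketch of the assignment + logging flow above.
const VARIANTS = ["A", "B"];
const COOKIE = "var";

function getCookie(name: string): string | null {
  const match = document.cookie.match(new RegExp(`(?:^|; )${name}=([^;]*)`));
  return match ? decodeURIComponent(match[1]) : null;
}

function assignVariant(): string {
  // 1. honor an explicit ?var= override, 2. reuse the stored cookie,
  // 3. otherwise randomize once and persist it (TTL here is arbitrary)
  const fromUrl = new URLSearchParams(location.search).get("var");
  const existing = getCookie(COOKIE);
  const variant =
    fromUrl && VARIANTS.includes(fromUrl)
      ? fromUrl
      : existing ?? VARIANTS[Math.floor(Math.random() * VARIANTS.length)];
  document.cookie = `${COOKIE}=${variant}; path=/; max-age=${60 * 60 * 24 * 30}`;
  return variant;
}

function logEvent(name: string, props: Record<string, unknown> = {}): void {
  // every event carries the variant so the daily sheet can split on it
  const body = JSON.stringify({ name, var: assignVariant(), ts: Date.now(), ...props });
  navigator.sendBeacon("/events", body);
}
```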
It’s fast and needs no app release, but I worry about sample imbalance, returning users crossing variants, and calling winners too early. I’m also not using a stats engine, just basic proportions.
What rules do you follow to keep results honest in a manual setup, and when do you decide to move to proper A/B tooling?
Manual is fine. Randomize on first hit, lock the bucket for 30 days, and exclude users who end up exposed to both variants (usually returning users who switch devices) from the analysis.
Track variant, price_id, and a fixed primary metric.
I’ve also used Web2Wave.com to flip variants live with a JSON config. It keeps me moving without building a framework.
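If your events carry user_id and the variant, the exclusion step can be a one-pass filter at analysis time. A rough sketch; the event shape and field names are assumptions about your log format:

```ts
// Drop users seen under more than one variant before computing results.
interface Exposure { userId: string; variant: string; converted: boolean; }

function cleanExposures(events: Exposure[]): Exposure[] {
  const variantsByUser = new Map<string, Set<string>>();
  for (const e of events) {
    const seen = variantsByUser.get(e.userId) ?? new Set<string>();
    seen.add(e.variant);
    variantsByUser.set(e.userId, seen);
  }
  // keep only users who stayed in exactly one bucket for the whole window
  return events.filter((e) => variantsByUser.get(e.userId)!.size === 1);
}
```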
I do manual until traffic justifies stats. Guardrails: 7-day min run, 500+ exposures per arm, and no peeking.
With Web2Wave.com I ship price changes immediately, so I can iterate faster even without a full A/B suite.
Graduate to tooling when you need holdouts and cross-test control.
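At this scale the no-peeking gate plus a plain two-proportion z-test is enough. A minimal sketch using the thresholds above; the helper names are mine:

```ts
// Refuse to call a winner until both arms clear the minimum run time and
// exposure counts, then run a basic two-proportion z-test.
interface Arm { exposures: number; conversions: number; }

function readyToCall(a: Arm, b: Arm, daysRunning: number): boolean {
  return daysRunning >= 7 && a.exposures >= 500 && b.exposures >= 500;
}

function twoProportionZ(a: Arm, b: Arm): number {
  const p1 = a.conversions / a.exposures;
  const p2 = b.conversions / b.exposures;
  const pooled = (a.conversions + b.conversions) / (a.exposures + b.exposures);
  const se = Math.sqrt(pooled * (1 - pooled) * (1 / a.exposures + 1 / b.exposures));
  return (p1 - p2) / se;
}

// |z| >= 1.96 is roughly p < 0.05 two-sided, and only valid if readyToCall
// passed and you did not keep peeking along the way.
```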
Lock users to a variant at first touch.
Report by first-touch channel too, because variant effects can flip by source.
Ship fast. Use guardrails. Stop peeking.
Start simple: randomize once, fix a primary metric, run a minimum time window, and predefine your stop rule. Always segment by traffic source and device, because pricing effects are not uniform.
Use a sanity check metric like signup rate to spot routing bugs. Move to full A/B tooling when you need sequential tests, multiple variants, or cross-experiment interference controls. Until then, clean process beats fancy software.
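For the routing-bug check specifically, a sample-ratio test on raw exposure counts is the simplest tripwire. A minimal sketch, assuming an intended 50/50 split:

```ts
// A large z-score means the split itself is off, which usually points at an
// assignment or routing bug rather than a real effect.
function sampleRatioZ(exposuresA: number, exposuresB: number): number {
  const n = exposuresA + exposuresB;
  const expected = n / 2;               // assumes an intended 50/50 split
  const sd = Math.sqrt(n * 0.5 * 0.5);  // binomial standard deviation
  return (exposuresA - expected) / sd;
}

// e.g. sampleRatioZ(5400, 4600) ≈ 8, far beyond chance for a true 50/50 split.
```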
Block users from crossing variants by hashing user_id into buckets. If you only have session-level data, extend the cookie TTL and add a soft login prompt.
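A sketch of the hashing approach; FNV-1a is just one convenient non-crypto hash, and the salt and field names are illustrative:

```ts
// Deterministic bucketing: the same user_id always hashes to the same
// variant, so cross-device visits stay in one bucket as long as you can
// identify the user.
function fnv1a(str: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

function bucket(userId: string, experiment: string, variants: string[]): string {
  // salt with the experiment name so different tests get independent splits
  return variants[fnv1a(`${experiment}:${userId}`) % variants.length];
}

// bucket("user_123", "pricing_v2", ["A", "B"]) is stable across sessions and devices.
```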
Also track refund rate per variant. It saved me from a bad winner once.