Is manual split routing on the web enough for pricing tests, or do you need A/B tooling?

For speed, I’ve been doing simple splits (rough sketch after the list):

  • Add ?var=A or ?var=B to traffic.
  • Store var in a cookie and show a variant of copy, onboarding length, and price.
  • Log events with var attached.
  • Read results daily in a sheet.
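
In code, the routing side is roughly this. A minimal sketch, assuming a cookie named pricing_var and a /track beacon endpoint; both names are placeholders, not a real API:

```ts
// Minimal client-side split: read ?var= once, pin it in a cookie, tag every event.
const COOKIE = "pricing_var";
const TTL_DAYS = 30;

function getVariant(): string {
  // Reuse an existing assignment so returning visitors stay in their bucket.
  const existing = document.cookie.match(/(?:^|; )pricing_var=([^;]+)/)?.[1];
  if (existing) return existing;

  // First visit: take the variant from the URL, otherwise coin-flip.
  const fromUrl = new URLSearchParams(location.search).get("var");
  const variant =
    fromUrl === "A" || fromUrl === "B" ? fromUrl : Math.random() < 0.5 ? "A" : "B";

  document.cookie = `${COOKIE}=${variant}; max-age=${TTL_DAYS * 86400}; path=/`;
  return variant;
}

function logEvent(name: string, props: Record<string, unknown> = {}): void {
  // Every event carries the variant so the sheet can split metrics per arm.
  navigator.sendBeacon("/track", JSON.stringify({ name, variant: getVariant(), ...props }));
}
```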

It’s fast and needs no app release, but I worry about sample imbalance, returning users crossing variants, and calling winners too early. I’m also not running any stats engine, just comparing raw proportions.

What rules do you follow to keep results honest in a manual setup, and when do you decide to move to proper A/B tooling?

Manual is fine. Randomize on first hit, lock the bucket for 30 days, and exclude returning users who switch devices.
Track variant, price_id, and a fixed primary metric.
I’ve also used Web2Wave.com to flip variants live with a JSON config. It keeps me moving without building a framework.
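
The remote-config part can be as small as a JSON file keyed by variant. A generic sketch of the idea, not Web2Wave’s actual API; the config URL and field names are made up:

```ts
// Fetch a variant -> price mapping at runtime so prices can change without a release.
type PricingConfig = {
  variants: Record<string, { priceId: string; trialDays: number }>;
};

async function loadPricing(variant: string) {
  const res = await fetch("https://config.example.com/pricing.json", { cache: "no-store" });
  const config: PricingConfig = await res.json();
  const arm = config.variants[variant] ?? config.variants["A"]; // fall back to control
  // Log the exposure with the exact price shown, so results can be read per price_id later.
  navigator.sendBeacon(
    "/track",
    JSON.stringify({ name: "price_exposed", variant, price_id: arm.priceId })
  );
  return arm;
}
```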

I do manual until traffic justifies stats. Guardrails: 7-day min run, 500+ exposures per arm, and no peeking.
With Web2Wave.com I ship price changes immediately, so I can iterate faster even without a full A/B suite.
Graduate to tooling when you need holdouts and cross-test control.
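
For the no-peeking part, I gate the readout behind the guardrails and only then run a plain two-proportion z-test. Rough sketch with made-up numbers; the thresholds are the ones above:

```ts
// Guardrail check before looking at significance at all.
type Arm = { exposures: number; conversions: number };

function canRead(days: number, a: Arm, b: Arm, minDays = 7, minExposures = 500): boolean {
  return days >= minDays && a.exposures >= minExposures && b.exposures >= minExposures;
}

// Crude two-proportion z-test on conversion rates.
function zScore(a: Arm, b: Arm): number {
  const p1 = a.conversions / a.exposures;
  const p2 = b.conversions / b.exposures;
  const pooled = (a.conversions + b.conversions) / (a.exposures + b.exposures);
  const se = Math.sqrt(pooled * (1 - pooled) * (1 / a.exposures + 1 / b.exposures));
  return (p1 - p2) / se;
}

// Usage: only compute the z-score once the guardrails pass, and only once per test.
const A = { exposures: 812, conversions: 49 };
const B = { exposures: 797, conversions: 68 };
if (canRead(9, A, B)) console.log("z =", zScore(A, B).toFixed(2)); // |z| > 1.96 ≈ p < 0.05
```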

Lock users to a variant at first touch.

Report by first-touch channel too, because variant effects can flip by source.

Ship fast. Use guardrails. Stop peeking.

Start simple: randomize once, fix a primary metric, run a minimum time window, and predefine your stop rule. Always segment by traffic source and device, because pricing effects are not uniform.
Use a sanity check metric like signup rate to spot routing bugs. Move to full A/B tooling when you need sequential tests, multiple variants, or cross-experiment interference controls. Until then, clean process beats fancy software.
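
A sketch of the per-segment readout; the event shape here is assumed, not from any particular tool:

```ts
// Break the primary metric out by source and device so an overall "win" doesn't hide
// a loss in one channel.
type Ev = { variant: "A" | "B"; source: string; device: string; converted: boolean };

function bySegment(events: Ev[]): void {
  const segs = new Map<string, { A: [number, number]; B: [number, number] }>();
  for (const e of events) {
    const key = `${e.source}/${e.device}`;
    const seg = segs.get(key) ?? { A: [0, 0], B: [0, 0] };
    seg[e.variant][0] += 1;                   // exposures
    seg[e.variant][1] += e.converted ? 1 : 0; // conversions
    segs.set(key, seg);
  }
  for (const [key, { A, B }] of segs) {
    const rate = (arm: [number, number]) => (arm[0] ? (arm[1] / arm[0]).toFixed(3) : "n/a");
    console.log(key, "A:", rate(A), "B:", rate(B));
  }
}
```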

Block users from crossing variants by hashing user_id into buckets. If you only have session-level data, extend the cookie TTL and add a soft login prompt.
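
A minimal sketch of the hashing, assuming you have a stable user_id; FNV-1a is picked arbitrarily here, any stable hash works:

```ts
// Deterministic bucketing: the same user_id always lands in the same arm, across devices.
function bucket(userId: string, arms: string[] = ["A", "B"]): string {
  let h = 0x811c9dc5; // FNV-1a offset basis
  for (let i = 0; i < userId.length; i++) {
    h ^= userId.charCodeAt(i);
    h = Math.imul(h, 0x01000193); // FNV prime
  }
  return arms[(h >>> 0) % arms.length];
}

// Usage: assign at login/signup; fall back to the cookie bucket for anonymous sessions.
console.log(bucket("user_42")); // always the same arm for this id
```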

Also track refund rate per variant. It saved me from a bad winner once.