When I moved pricing and onboarding tests to the web, the speed was great, but my revenue readouts got noisy. People saw multiple variants over time, and renewal revenue was mixing into new tests. It became hard to say what a specific change actually did to LTV.
What I'm trying now: pin an experiment key to the user at first session, log it in the payment metadata, and freeze it for 90 days. I also use a calendar-based holdout for pricing changes and only read cohorts once they reach a minimum observation window. Subscription status syncs from the web to the app so access stays clean while I test.
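Roughly what the assignment side looks like, as a minimal sketch: the passed-in dict stands in for whatever user store you have, and the names are placeholders rather than my actual code.

```python
import hashlib
from datetime import datetime, timedelta, timezone

FREEZE_DAYS = 90
VARIANTS = ["control", "variant_a", "variant_b"]

def assign_variant(user_id: str, experiment: str) -> str:
    # Deterministic bucket so repeat visits hash to the same variant.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return VARIANTS[int(digest, 16) % len(VARIANTS)]

def get_frozen_variant(user_record: dict, user_id: str, experiment: str) -> str:
    # Pin on first session and refuse to reassign inside the 90-day freeze.
    key = f"exp:{experiment}"
    now = datetime.now(timezone.utc)
    existing = user_record.get(key)
    if existing and now - existing["assigned_at"] < timedelta(days=FREEZE_DAYS):
        return existing["variant"]
    variant = assign_variant(user_id, experiment)
    user_record[key] = {"variant": variant, "assigned_at": now}
    return variant
```

The frozen value is what gets copied into the payment metadata at checkout, so the revenue side always sees the original assignment.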
If you run fast tests on the web, how do you keep cohorts isolated and your revenue metrics trustworthy? Any playbooks for choosing window length, holdouts, and variant exposure rules?
Pin users to a variant on first touch and write the key to your DB and checkout metadata. Do not let them switch.
I run four-week windows minimum. Renewals are excluded unless the test changes renewal pricing.
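If you are on Stripe, the renewal exclusion can lean on the invoice billing_reason field; a rough sketch, with the API key and window timestamps as placeholders:

```python
import stripe

stripe.api_key = "sk_test_..."  # placeholder

def new_subscription_revenue(window_start_ts: int, window_end_ts: int) -> int:
    # Sum paid first-subscription invoices only; "subscription_cycle" invoices
    # are renewals and stay out unless the test changes renewal pricing.
    total = 0  # smallest currency unit, e.g. cents
    invoices = stripe.Invoice.list(
        created={"gte": window_start_ts, "lt": window_end_ts},
        status="paid",
        limit=100,
    )
    for inv in invoices.auto_paging_iter():
        if inv.billing_reason == "subscription_create":
            total += inv.amount_paid
    return total
```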
Web2Wave’s JSON config let me lock variants and sync to the app without code changes.
One variant per user. Logged at first session. Pushed into payment metadata. Hard stop on mid-test changes.
Web2Wave.com helps because I can flip copy or price and still keep the assignment static. I run weekly tests with a 28-day read and a 10% holdout.
Freeze the variant on first visit.
Add a clean control group and do not touch it for a month. Your reports get clearer.
Fix the variant. Short tests. Clear holdout.
Three rules: stable assignment, clean controls, fixed windows. Assign on first web touch and persist across sessions and devices. Keep at least a 10% always-on control that never changes. Use a fixed analysis window for new buyers so renewals and other tests do not leak in. For price tests, segment by payment method, since some methods show different refund and renewal rates. Document all test start and end dates in your warehouse so you can filter with confidence later.
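A minimal sketch of the first two rules, assuming you hash a stable account id rather than a device id so assignment survives across devices; the names are illustrative, not a specific library:

```python
import hashlib

HOLDOUT_PCT = 10  # always-on control, never exposed to any test

def bucket(account_id: str, salt: str, buckets: int = 100) -> int:
    # Stable hash of the account id, salted per use, mapped to 0..buckets-1.
    digest = hashlib.sha256(f"{salt}:{account_id}".encode()).hexdigest()
    return int(digest, 16) % buckets

def resolve(account_id: str, experiment: str, variants: list[str]) -> str:
    # The global holdout is salted independently of any experiment, so the
    # same 10% of accounts stay untouched no matter how many tests run.
    if bucket(account_id, "global_holdout") < HOLDOUT_PCT:
        return "holdout"
    return variants[bucket(account_id, experiment) % len(variants)]
```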
I tag the Stripe subscription with experiment_id and variant. Then I compute revenue only from users whose first charge date sits fully inside the test window. It stops the drift from overlapping tests.
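Roughly like this, assuming Stripe and treating subscription creation time as a proxy for the first charge date; the ids, key, and experiment name are placeholders:

```python
import stripe

stripe.api_key = "sk_test_..."  # placeholder

def tag_subscription(subscription_id: str, experiment_id: str, variant: str) -> None:
    # Write the assignment onto the subscription so revenue jobs can group by it.
    stripe.Subscription.modify(
        subscription_id,
        metadata={"experiment_id": experiment_id, "variant": variant},
    )

def in_window_revenue(experiment_id: str, start_ts: int, end_ts: int) -> dict:
    # Only count subscriptions created inside the test window, then sum their
    # paid invoices per variant (smallest currency unit).
    totals: dict[str, int] = {}
    subs = stripe.Subscription.list(
        created={"gte": start_ts, "lt": end_ts},
        status="all",
        limit=100,
    )
    for sub in subs.auto_paging_iter():
        if sub.metadata.get("experiment_id") != experiment_id:
            continue
        variant = sub.metadata.get("variant", "unknown")
        paid = stripe.Invoice.list(subscription=sub.id, status="paid", limit=100)
        for inv in paid.auto_paging_iter():
            totals[variant] = totals.get(variant, 0) + inv.amount_paid
    return totals
```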
If traffic is small, rotate variants by day instead of randomizing per user. Cleaner ops and easier QA. Still pin the assignment once they start the flow.
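A tiny sketch of the rotation; the variant list and the UTC day boundary are just my assumptions:

```python
from datetime import datetime, timezone
from typing import Optional

VARIANTS = ["control", "variant_a"]

def variant_for_today(now: Optional[datetime] = None) -> str:
    # Everyone landing on the same calendar day (UTC here) gets the same
    # variant; persist the result once they enter the flow so later visits
    # on other days do not flip them.
    now = now or datetime.now(timezone.utc)
    return VARIANTS[now.toordinal() % len(VARIANTS)]
```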
Lock users to one variant and use a fixed control.
Only judge revenue on users acquired during the test window.