Value Metrics & Winsorization
Optimize experiments for revenue per visitor, not just conversion rate, and keep the result stable when a few big spenders would otherwise skew the average.
Most experiments ask "which variant converts more visitors?" A value metric asks a different question: "which variant earns more per visitor?" This page explains how to measure average value in an experiment and how winsorization keeps that average honest when a handful of visitors spend far more than the rest.
Just want a conversion rate? Leave value mode off and your experiment measures conversions as usual. See Goals & Conversion Tracking.
What a value metric measures
A conversion metric counts visitors: each one either fired the goal event or didn't, and the result is a rate (for example, 8% of visitors purchased). A value metric instead reads the goal event's value property and averages it across every exposed visitor.
If your purchase event sends value (the amount in dollars), value mode reports revenue per visitor: total revenue divided by everyone who saw the variant, including those who never purchased. That makes it the right metric when a variant could move how much people spend, not just whether they convert. A checkout redesign that lifts average order value but not conversion rate looks flat on a conversion metric and clearly ahead on a value metric.
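The difference between the two metrics comes down to the denominator and to what gets summed. The sketch below assumes a hypothetical input shape of one entry per exposed visitor, holding the purchase event's value or None for a visitor who never purchased. The function names are illustrative, not part of the product. The two variants convert equally well, but the variant has a higher average order value:

```python
from typing import Optional, Sequence

# One entry per exposed visitor: the purchase event's `value`, or None if the
# visitor never purchased. (Hypothetical input shape, for illustration only.)
Visitor = Optional[float]


def conversion_rate(visitors: Sequence[Visitor]) -> float:
    """Share of exposed visitors who fired the goal event."""
    return sum(1 for v in visitors if v is not None) / len(visitors)


def value_per_visitor(visitors: Sequence[Visitor]) -> float:
    """Total value divided by ALL exposed visitors; non-purchasers count as 0."""
    total = sum(v if v is not None else 0.0 for v in visitors)
    return total / len(visitors)


# Same conversion rate, higher average order value in the variant.
control = [50.0, 50.0] + [None] * 8
variant = [80.0, 80.0] + [None] * 8

print(conversion_rate(control), conversion_rate(variant))      # 0.2 0.2
print(value_per_visitor(control), value_per_visitor(variant))  # 10.0 16.0
```

A conversion metric scores these two variants as a tie. The value metric shows the variant ahead at 16.0 per visitor against 10.0.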
Value mode uses the same value property you already send for revenue reporting. See Standard Healthcare Events for the purchase and revenue_recognized events and their value field. Any event that carries a numeric value works.
Turn on value mode
- Create or edit an experiment and pick your primary goal event (for example, purchase).
- Switch on Measure average value. The experiment now optimizes for the mean value per visitor instead of a conversion rate.
- Start the experiment. Results show Value / visitor per variant, and the verdict and credible intervals are computed on that average.
Value mode is off by default, so existing experiments are unchanged. A value metric needs a goal event that sends value; if the event has no value, every visitor counts as 0 and the average stays at zero.
Why winsorization matters
Averages are fragile. One visitor who spends 100x the typical amount can single-handedly pull the average up and inflate the variance, which makes the difference between variants look decisive when it isn't. The more skewed your spending is (and revenue almost always has a long tail), the more one outlier can distort the read.
Winsorization caps extreme values before averaging. Pick a percentile, say the 99th, and every value above the 99th-percentile value is replaced by the 99th-percentile value (and every value below the 1st-percentile value is raised to the 1st-percentile value). Nothing is dropped, so you keep every visitor in the sample. The extremes are only pulled in to those percentile bounds, so the average reflects the typical visitor rather than the one whale.
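Here is a minimal sketch of the clamping step, assuming a linear-interpolation percentile estimator. The product may compute percentiles differently. `percentile` and `winsorize` are hypothetical helper names. The example uses a p90 cap because the sample has only 11 visitors, and a p99 bound that small sits almost on top of the whale itself:

```python
import statistics
from typing import List, Sequence


def percentile(sorted_values: Sequence[float], p: float) -> float:
    """p-th percentile (0-100) of pre-sorted data, by linear interpolation.

    One common estimator among several; the product's may differ.
    """
    if not sorted_values:
        raise ValueError("percentile of empty data")
    rank = (len(sorted_values) - 1) * p / 100.0
    lo = int(rank)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (rank - lo)


def winsorize(values: Sequence[float], upper_pct: float) -> List[float]:
    """Clamp every value into [P(100 - upper_pct), P(upper_pct)].

    Nothing is dropped: the output has one entry per input visitor.
    """
    ordered = sorted(values)
    low = percentile(ordered, 100.0 - upper_pct)
    high = percentile(ordered, upper_pct)
    return [min(max(v, low), high) for v in values]


# 11 visitors: five non-purchasers, five typical orders, one whale.
spend = [0.0] * 5 + [10.0, 10.0, 20.0, 20.0, 25.0, 1015.0]

print(statistics.mean(spend))                 # 100.0 (pulled up by the whale)
print(statistics.mean(winsorize(spend, 90)))  # 10.0 (whale capped at 25.0)
```

All 11 visitors are still in the denominator. Only the whale's value changed, from 1015.0 to 25.0.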
This is the same robustness technique used across analytics for revenue and other long-tailed ratio metrics. It does not change conversion-rate metrics, where each visitor is already a clean 0 or 1.
Turn on winsorization
With Measure average value on, switch on Winsorize outliers and choose the cap:
- p90 caps the top and bottom 10% of values. The most aggressive smoothing, useful when spending is extremely skewed.
- p95 caps the top and bottom 5%.
- p99 caps only the most extreme 1% at each end. A light touch that still tames true outliers, and a good default.
The percentile is applied per variant before the average is computed. The results card shows a badge naming the cap (for example, Winsorized at p99) so the number is never silently altered, and the in-app explainer details exactly what was capped.
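The sketch below maps each cap option to its lower and upper percentile, then computes the bounds separately for each variant. It assumes that p90, p95 and p99 clamp at the 10th/90th, 5th/95th and 1st/99th percentiles, as the list above describes, and that percentiles use linear interpolation. `CAPS`, `winsorized_mean` and `value_per_visitor_by_variant` are hypothetical names, not a product API. `statistics.quantiles` needs at least two values per variant:

```python
import statistics
from typing import Dict, Sequence

# Hypothetical mapping from the UI option to its (lower, upper) percentiles.
CAPS = {"p90": (10, 90), "p95": (5, 95), "p99": (1, 99)}


def winsorized_mean(values: Sequence[float], cap: str) -> float:
    """Clamp one variant's values to its own percentile bounds, then average."""
    low_pct, high_pct = CAPS[cap]
    # 99 cut points; cuts[k - 1] is the k-th percentile (linear interpolation).
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    low, high = cuts[low_pct - 1], cuts[high_pct - 1]
    return statistics.mean(min(max(v, low), high) for v in values)


def value_per_visitor_by_variant(
    data: Dict[str, Sequence[float]], cap: str = "p99"
) -> Dict[str, float]:
    # Bounds come from each variant's own data, never from the pooled sample.
    return {name: winsorized_mean(vals, cap) for name, vals in data.items()}


control = [0.0] * 5 + [10.0, 10.0, 20.0, 20.0, 25.0, 1015.0]
variant_b = [0.0] * 5 + [12.0, 16.0, 20.0, 24.0, 30.0, 200.0]

print(value_per_visitor_by_variant({"control": control, "B": variant_b}, cap="p90"))
# {'control': 10.0, 'B': 12.0}
```

The two variants are capped at different values (25.0 for control, 30.0 for B) because each variant's bounds are computed from its own data.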
Note: Winsorization makes the reported average more robust, not "more correct" in an absolute sense. If you need the raw average, including outliers, leave it off. The badge always tells the reader which view they are looking at.
Reading the result
A winsorized value metric reads just like a conversion experiment, with currency in place of a percentage:
- Value / visitor is the (winsorized) average per variant.
- Probability to be best and the 95% credible intervals are computed on that average, so overlapping intervals still mean "not decided yet." See Declaring Winners.
- Lift compares the leader's average to the control's.
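The following is an illustration of how these three numbers relate, not the product's statistical model. Declaring Winners describes the model actually used. The sketch treats each variant's mean as approximately Normal(sample mean, standard error). It reads lift as relative lift, (leader − control) / control, which is an assumption. It estimates probability to be best by Monte Carlo. `lift`, `normal_posterior`, `interval_95` and `prob_to_be_best` are hypothetical names. The inputs are per-visitor values that have already been winsorized:

```python
import math
import random
import statistics
from typing import Dict, Sequence, Tuple


def lift(leader_mean: float, control_mean: float) -> float:
    """Relative lift of the leader's value per visitor over the control's."""
    if control_mean == 0:
        raise ValueError("lift is undefined when the control average is 0")
    return (leader_mean - control_mean) / control_mean


def normal_posterior(values: Sequence[float]) -> Tuple[float, float]:
    """Approximate the mean's posterior as Normal(sample mean, standard error)."""
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))


def interval_95(values: Sequence[float]) -> Tuple[float, float]:
    """95% interval for the mean under the normal approximation."""
    mu, se = normal_posterior(values)
    return mu - 1.96 * se, mu + 1.96 * se


def prob_to_be_best(
    data: Dict[str, Sequence[float]], draws: int = 20_000, seed: int = 0
) -> Dict[str, float]:
    """Share of Monte Carlo draws in which each variant has the highest mean."""
    rng = random.Random(seed)
    params = {name: normal_posterior(vals) for name, vals in data.items()}
    wins = {name: 0 for name in data}
    for _ in range(draws):
        sample = {name: rng.gauss(mu, se) for name, (mu, se) in params.items()}
        wins[max(sample, key=sample.get)] += 1
    return {name: count / draws for name, count in wins.items()}


# Per-visitor values, already winsorized (0.0 = no purchase).
control = [10.0] * 50 + [0.0] * 50
variant_b = [20.0] * 50 + [0.0] * 50

print(lift(statistics.mean(variant_b), statistics.mean(control)))  # 1.0
print(prob_to_be_best({"control": control, "B": variant_b}))
print(interval_95(control), interval_95(variant_b))
```

Because winsorization shrinks the spread of each variant's values, it also narrows these intervals. That is why a single whale can no longer make a difference look decisive on its own.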
Next Steps
- Goals & Conversion Tracking: Pick the goal event a value metric reads value from
- Declaring Winners: How probability-to-be-best and credible intervals decide a result
- Standard Healthcare Events: The purchase and revenue_recognized events and their value field