Scope
This note analyzes how missing data in A/B tests biases conversion rate estimates and distorts population segmentation, then gives correction methods and delivery optimization recommendations. It covers missingness mechanisms, diagnostic steps, estimators, and operational instrumentation.
Missingness mechanisms
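- MCAR: Missingness is independent of treatment, covariates, and outcome. Effect: complete-case estimates stay unbiased, but the smaller effective sample inflates variance.
- MAR: Missingness depends on observed covariates and/or treatment assignment, but not on the outcome once these are accounted for. Effect: complete-case estimates are generally biased unless corrected with covariate-aware methods such as weighting or imputation.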
- MNAR: Missingness depends on the (unobserved) outcome itself or latent traits. Effects: bias persists unless bounded or modeled with strong assumptions.
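The three mechanisms can be stated compactly. The notation below is an assumption introduced for illustration (it does not appear elsewhere in this note): R = 1 when the outcome is observed, Y is the conversion outcome, T is the assigned arm, and X is the vector of observed covariates.

```latex
\begin{align*}
\text{MCAR:}\quad & P(R = 1 \mid Y, T, X) = P(R = 1) \\
\text{MAR:}\quad  & P(R = 1 \mid Y, T, X) = P(R = 1 \mid T, X) \\
\text{MNAR:}\quad & P(R = 1 \mid Y, T, X) \neq P(R = 1 \mid T, X)
\end{align*}
```

Under MAR, conditioning on T and X is enough to remove the dependence on Y, which is what weighting and imputation methods rely on. Under MNAR no function of the observed data can do this without further assumptions.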
How missingness biases conversion rate
- Outcome loss (conversion events missing):
- If converters are more likely to be logged than non-converters (e.g., payments are acknowledged server-side), the observed conversion rate is inflated. The reverse deflates it.
- If missingness differs by arm (e.g., variant B induces more late conversions but the window truncates these), arm comparisons are biased.
- Exposure loss (assignment or exposure events missing):
- Sample selection shifts the composition of observed users. If specific devices or geos with distinct baseline conversion rates are underlogged more in one arm, observed conversion rate differences reflect selection, not treatment.
- Identity instability and reassignment:
- Cookie deletion/ad blockers can cause users to be reassigned to different arms on subsequent visits, contaminating treatment and biasing per-protocol metrics. Late conversions may be attributed to a different arm, causing asymmetric outcome loss.
- Timing and attribution windows:
- Conversions that occur after the observation window closes are lost, which biases the conversion rate downward. If the variant changes time to conversion, this loss becomes differential across arms.
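The outcome-loss mechanism can be made concrete with a small calculation. The sketch below is illustrative only: the function name `observed_cvr` and all logging probabilities are hypothetical values chosen for the example, not measurements. It shows two arms with the same true conversion rate producing a spurious lift because one arm logs converters more reliably than non-converters.

```python
def observed_cvr(true_cvr, p_log_converter, p_log_nonconverter):
    """Conversion rate visible in the logs when converters and non-converters
    are logged with different probabilities (hypothetical parameters)."""
    logged_converters = true_cvr * p_log_converter
    logged_nonconverters = (1.0 - true_cvr) * p_log_nonconverter
    return logged_converters / (logged_converters + logged_nonconverters)


# Assumption for the example: both arms share a true conversion rate of 10%.
true_cvr = 0.10

# Arm A: logging is equally reliable for everyone, so nothing is distorted.
arm_a = observed_cvr(true_cvr, p_log_converter=0.90, p_log_nonconverter=0.90)

# Arm B: converters are logged more reliably than non-converters.
arm_b = observed_cvr(true_cvr, p_log_converter=0.95, p_log_nonconverter=0.80)

print(f"arm A observed CVR: {arm_a:.4f}")
print(f"arm B observed CVR: {arm_b:.4f}")
print(f"spurious lift:      {arm_b - arm_a:+.4f}")
```

Arm A shows that equal loss on both groups leaves the rate unchanged, while the gap in arm B comes entirely from the logging asymmetry.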
Population segmentation distortion
- Missing covariates used for stratification (device, geo, tenure, acquisition source) lead to:
- Complete-case analysis over-representing segments with lower missingness.
- Misclassification of segments when proxy features are partially observed.
- Simpson’s paradox risks: overall effect differs from within-segment effects due to shifted segment weights.
- Segment-level arm imbalance:
- Differential missingness by arm within segments breaks effective randomization at the analysis layer. Observed segment proportions differ from randomized proportions, skewing segment lift estimates.
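A short calculation shows how segment-specific missingness shifts the blended conversion rate even when every within-segment rate is measured correctly. This is a sketch under stated assumptions: the segment shares, conversion rates, and observation probabilities are invented for illustration, and missingness within a segment is assumed unrelated to conversion.

```python
def overall_cvr(segments, use_observed_weights):
    """Blend segment conversion rates into one overall rate.

    segments maps a segment name to a tuple of
    (population_share, segment_cvr, probability_of_being_observed).
    With use_observed_weights=True the blend mimics complete-case analysis,
    where each segment counts in proportion to its observed users only.
    """
    total_weight = 0.0
    total_conversions = 0.0
    for share, cvr, p_observed in segments.values():
        weight = share * p_observed if use_observed_weights else share
        total_weight += weight
        total_conversions += weight * cvr
    return total_conversions / total_weight


# Hypothetical segments: mobile converts less and is logged less reliably.
segments = {
    "mobile": (0.60, 0.02, 0.70),
    "desktop": (0.40, 0.05, 0.95),
}

population_cvr = overall_cvr(segments, use_observed_weights=False)
complete_case_cvr = overall_cvr(segments, use_observed_weights=True)

print(f"population CVR:    {population_cvr:.5f}")
print(f"complete-case CVR: {complete_case_cvr:.5f}")
```

Because desktop is over-represented among observed users, the complete-case figure drifts toward the desktop rate. If the observation probabilities also differ by arm, the same drift shows up as an apparent treatment effect.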
Diagnostics and quantification
- Measure missingness rates:
- Exposure missingness: fraction of randomized users lacking an exposure log.
- Outcome missingness: fraction of exposed users with no outcome log within window.
- Break down by arm, segment, device, browser, region, acquisition source, and time.
- Model missingness:
- Fit a logistic model for missing indicators using arm and covariates. Test arm coefficients to detect differential missingness.
- Randomization checks:
- Compare covariate distributions by arm among the observed sample versus the randomized population. Compute standardized mean differences; large differences signal selection.
- Sensitivity/bounding:
- Compute extreme bounds by assuming all missing outcomes are non-converters, then that all are converters. Report whether conclusions are robust under these conservative assumptions.
- Tipping-point analysis: identify the unobserved conversion rate (or arm differential) required to reverse the decision.
- Cross-source reconciliation:
- Compare client-side versus server-side conversions; assess gap and heterogeneity by arm and segment.
- Lag analysis:
- Estimate conversion time distribution; quantify window-induced outcome loss and whether it differs by arm.
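Two of these diagnostics can be sketched with the standard library alone. The first function uses a two-proportion z-test as a simpler stand-in for the logistic missingness model (it tests the arm difference only, without covariates). The second computes extreme bounds on the absolute lift. Function names and the counts in the example are hypothetical.

```python
import math


def differential_missingness_test(missing_a, n_a, missing_b, n_b):
    """Two-proportion z-test comparing missingness rates between arms.
    Returns (z, two-sided p-value); z > 0 means arm B loses more data."""
    rate_a = missing_a / n_a
    rate_b = missing_b / n_b
    pooled = (missing_a + missing_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value


def extreme_lift_bounds(conv_a, missing_a, n_a, conv_b, missing_b, n_b):
    """Worst-case and best-case absolute lift (arm B minus arm A).

    Lower bound: every missing user in A converted, none in B did.
    Upper bound: every missing user in B converted, none in A did.
    """
    lower = conv_b / n_b - (conv_a + missing_a) / n_a
    upper = (conv_b + missing_b) / n_b - conv_a / n_a
    return lower, upper


# Hypothetical counts: 10,000 randomized users per arm.
z, p = differential_missingness_test(missing_a=1000, n_a=10_000,
                                     missing_b=1200, n_b=10_000)
low, high = extreme_lift_bounds(conv_a=450, missing_a=1000, n_a=10_000,
                                conv_b=484, missing_b=1200, n_b=10_000)

print(f"missingness z = {z:.2f}, p = {p:.2g}")
print(f"lift bounds: [{low:+.4f}, {high:+.4f}]")
```

With missingness rates of 10% and 12%, the extreme bounds span zero by a wide margin. That is typical: extreme bounds are most informative when missingness is small, which is why the tipping-point analysis is a useful complement.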
Correction methods
- Intent-to-treat analysis:
- Base inference on randomized assignment, not detected exposure. Reduces selection bias from exposure logging loss and reassignment.
- Inverse probability weighting (IPW):
- Estimate the probability that an outcome is observed given covariates and arm.
- Weight observed outcomes by the inverse of this probability to recover population estimates.
- Use robust standard errors or bootstrap for variance.
- Doubly robust estimation (AIPW):
- Combine an outcome model (e.g., conversion probability as a function of arm and covariates) with the missingness model. Consistent if either is correctly specified.
- Multiple imputation:
- Impute missing outcomes using a model that includes treatment, covariates, and interactions. Pool estimates across imputations with Rubin’s rules so the variance reflects both within-imputation and between-imputation uncertainty.
- Post-stratification/raking:
- For segmentation distortion, calibrate weights so the observed sample matches known population segment totals (e.g., device and geo distributions). Use the weighted arm comparison.
- Bounds for monotone attrition:
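- If assignment can shift whether an outcome is observed in only one direction (monotonicity), compute Lee bounds for the treatment effect: trim the arm with the higher observation rate from the top or bottom of its outcome distribution until observation rates match, and report the resulting interval.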
- Adjust attribution window:
- Extend or harmonize the conversion window across arms to capture delayed conversions. Apply survival-adjusted estimators if extension is impractical.
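As one illustration of these corrections, the sketch below implements a cell-based form of IPW: the probability of observing an outcome is estimated as the observed fraction within each (arm, segment) cell, and observed outcomes are weighted by its inverse. This is a simplification of the covariate model described above, valid only if missingness is MAR given arm and segment. The function name, row layout, and data are hypothetical.

```python
from collections import defaultdict


def ipw_conversion_rates(rows):
    """Normalized (Hajek) IPW conversion rate per arm.

    rows: list of (arm, segment, converted) where converted is 1, 0,
    or None when the outcome is missing. Cells with no observed outcome
    contribute nothing, so their users are silently dropped.
    """
    n_cell = defaultdict(int)
    n_observed = defaultdict(int)
    for arm, segment, converted in rows:
        n_cell[(arm, segment)] += 1
        if converted is not None:
            n_observed[(arm, segment)] += 1

    weighted_conversions = defaultdict(float)
    weight_total = defaultdict(float)
    for arm, segment, converted in rows:
        if converted is None:
            continue
        # Inverse of the estimated observation probability for this cell.
        weight = n_cell[(arm, segment)] / n_observed[(arm, segment)]
        weighted_conversions[arm] += weight * converted
        weight_total[arm] += weight
    return {arm: weighted_conversions[arm] / weight_total[arm]
            for arm in weight_total}


# Hypothetical data: arm A loses half of its mobile outcomes, arm B loses none.
rows = (
    [("A", "mobile", 1)] * 1 + [("A", "mobile", 0)] * 4
    + [("A", "mobile", None)] * 5
    + [("A", "desktop", 1)] * 5 + [("A", "desktop", 0)] * 5
    + [("B", "mobile", 1)] * 2 + [("B", "mobile", 0)] * 8
    + [("B", "desktop", 1)] * 5 + [("B", "desktop", 0)] * 5
)

print(ipw_conversion_rates(rows))
```

In this example the complete-case rate for arm A is 6/15 = 0.40 against 0.35 for arm B, a difference created purely by mobile outcome loss. The weighted estimate brings arm A back to 0.35. Standard errors are not computed here; a bootstrap over users is one option.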
Design and instrumentation improvements
- Server-side ground truth:
- Log assignment and conversions on the server where possible. Use payment/order ledger as the primary conversion source; use client-side only as secondary telemetry.
- Stable identifiers:
- Use durable user IDs and assignment caching to prevent reassignment due to cookie loss. Implement cross-device identity linkage where compliant.
- Exposure-first logging:
- Log assignment before any consent-dependent or ad-blockable client code runs, while complying with privacy requirements.
- Heartbeat/event loss monitoring:
- Instrument non-outcome signals (page pings, diagnostics) to estimate event-loss. Alert on arm-differential loss.
- Unified pipeline:
- Ensure both arms use identical logging paths, tags, and schemas to avoid arm-specific telemetry drops.
- Time window consistency:
- Align observation windows and latency handling across arms. Record conversion timestamps to correct for truncation analytically.
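One common way to keep assignment stable, shown here as a swapped-in complement to assignment caching rather than the method this note prescribes, is to derive the arm deterministically from a durable user ID. The same ID then always maps to the same arm, even if a cookie or cache entry is lost. The function name, key format, and experiment name below are hypothetical.

```python
import hashlib


def assign_arm(user_id, experiment, arms=("A", "B")):
    """Deterministically map a durable user ID to an arm.

    Hashing the experiment name together with the ID keeps assignments
    independent across experiments. Assumes user_id is stable across
    sessions and devices; an anonymous cookie ID would not be.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(arms)
    return arms[bucket]


# The same user gets the same arm on every call, with no stored state.
first_visit = assign_arm("user-12345", "checkout-redesign")
later_visit = assign_arm("user-12345", "checkout-redesign")
print(first_visit, later_visit, first_visit == later_visit)
```

Because the arm can be recomputed on the server at any time, assignment can also be logged server-side before any client code runs.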
Delivery optimization recommendations
- Decision under uncertainty:
- Use ITT and bounds. Ship only if the effect remains favorable across conservative bounds. If conclusions depend on imputation assumptions, postpone or run a follow-up with improved instrumentation.
- Traffic allocation:
- Prefer channels, devices, and regions with low and stable missingness for high-stakes tests. Throttle or separately analyze sources with high event loss.
- Weighted objectives:
- For ongoing optimization, use IPW/AIPW-corrected conversion rates rather than raw observed CVR, so the optimizer does not chase telemetry artifacts.
- Robust KPIs:
- Where feasible, optimize to server-confirmed outcomes (e.g., orders, payments) instead of client-side conversion proxies.
- Segment-level calibration:
- Apply raking/post-stratification so targeting decisions reflect true segment performance, not segment-specific logging gaps.
- Latency-aware strategies:
- If variants change conversion latency, extend attribution windows or use survival-adjusted metrics to avoid penalizing slower paths.
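A survival-adjusted metric can be sketched with a Kaplan-Meier estimator, which treats users whose observation ended before they converted as censored rather than as non-converters. This is a minimal sketch that assumes censoring is unrelated to conversion propensity. The function name and data are hypothetical, and the estimate is only meaningful for horizons within the observed follow-up range.

```python
def km_conversion_by(horizon, durations, converted):
    """Kaplan-Meier estimate of the probability of converting by `horizon`.

    durations[i]: time from exposure to conversion, or to the end of
                  observation when no conversion was seen.
    converted[i]: True if a conversion was observed, False if censored.
    """
    # Count conversions and total exits at each distinct time.
    exits = {}
    for t, c in zip(durations, converted):
        conversions, total = exits.get(t, (0, 0))
        exits[t] = (conversions + (1 if c else 0), total + 1)

    at_risk = len(durations)
    survival = 1.0  # probability of not having converted yet
    for t in sorted(exits):
        if t > horizon:
            break
        conversions, total = exits[t]
        if conversions:
            survival *= 1 - conversions / at_risk
        at_risk -= total  # converters and censored users leave the risk set
    return 1 - survival


# Hypothetical arm: two users were observed for only one day.
durations = [1, 1, 2, 3]
converted = [True, False, True, False]

naive = sum(converted) / len(converted)
adjusted = km_conversion_by(3, durations, converted)
print(f"naive CVR: {naive:.3f}, survival-adjusted CVR by day 3: {adjusted:.3f}")
```

The naive rate counts the user censored on day 1 as a non-converter, while the adjusted rate only uses that user for the period actually observed. Comparing arms at the same horizon avoids penalizing a variant whose conversions simply arrive later.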
Operational checklist
- Quantify exposure and outcome missingness by arm and segment; test for differences.
- Build a missingness model and compute IPW/AIPW-corrected arm CVR and lift with robust variance.
- Report conservative bounds and a tipping-point analysis.
- Reweight segment analyses to match population composition; validate balance via standardized mean differences.
- Cross-validate client-side conversions against server-side ground truth.
- Implement instrumentation fixes and re-run critical tests if MNAR is suspected and bounds are inconclusive.
By applying these diagnostics and corrections, you reduce bias in conversion rate estimates and restore valid segment comparisons, enabling reliable A/B decisions and more accurate campaign optimization.