Supercharging A/B Testing at Uber

AI, artificial intelligence, data analytics, Machine Learning, Predictive Analytics
1812 Views

2 years ago
Supercharging A/B Testing at Uber

By: Sergey Gitlin, Krishna Puttaswamy, Luke Duncan, Deepak Bobbarjung, and Arun Babu A S P

Originally published in Uber Engineering, July 21, 2022.

Introduction

“Immensely laborious calculations on inferior data may increase the yield from 95 to 100 percent. A gain of 5 percent, of perhaps a small total. A competent overhauling of the process of collection, or of the experimental design, may often increase the yield ten- or twelve-fold, for the same cost in time and labor. To consult the statistician after an experiment is finished is often merely to ask him to conduct a post mortem examination. He can perhaps say what the experiment died of. To utilize this kind of experience he must be induced to use his imagination, and to foresee in advance the difficulties and uncertainties with which, if they are not foreseen, his investigations will be beset.” (R. A. Fisher’s Presidential address to the 1st Indian Statistical Congress)

While the statistical underpinnings of A/B testing are a century old, building a correct and reliable A/B testing platform and culture at a large scale is still a massive challenge. Mirroring Fisher’s observation above, carefully constructing the building blocks of an A/B platform and ensuring the data collected is correct is critical to guaranteeing correctness of experiment results, but it’s easy to get wrong. Uber went through a similar journey and this blog post describes why and how we rebuilt the A/B testing platform we had at Uber.

Uber had an experimentation platform, called Morpheus, that was built 7+ years ago in the early days to do both feature flagging and A/B testing. Uber outgrew Morpheus significantly since then in terms of scale, users, use cases, etc.

In early 2020, we took a deeper look at this ecosystem. We discovered that a large percentage of the experiments had fatal problems and often needed to be rerun. Obtaining high-quality results required an expert-level understanding of experimentation and statistics, and an inordinate amount of toil (custom analysis, pipelining, etc.). This slowed down decision-making, and re-running poorly conducted experiments was common.

To continue reading this article, click here.

EXCLUSIVE HIGHLIGHTS

Related

2 years ago
Supercharging A/B Testing at Uber

Originally published in Uber Engineering, July 21, 2022.

Leave a Reply Cancel reply

Login

Industry News

Connect with Us

Subscription

ADVERTISEMENTS

Produced By:

Archives

The Machine Learning Times © 2020 • 1221 State Street • Suite 12, 91940 • Santa Barbara, CA 93190
Produced by: Rising Media & Prediction Impact

EXCLUSIVE HIGHLIGHTS

Related

2 years agoSupercharging A/B Testing at Uber

Originally published in Uber Engineering, July 21, 2022.

Recommended

This new forecasting model is better than machine learning, researchers say

Widespread machine learning methods behind ‘link prediction’ are performing very poorly, study shows

AI’s $600B Question

Google scrambles to manually remove weird AI answers in search

Leave a Reply Cancel reply

Login

Industry News

Connect with Us

Subscription

ADVERTISEMENTS

Produced By:

Archives

The Machine Learning Times © 2020 • 1221 State Street • Suite 12, 91940 • Santa Barbara, CA 93190 Produced by: Rising Media & Prediction Impact

2 years ago
Supercharging A/B Testing at Uber

The Machine Learning Times © 2020 • 1221 State Street • Suite 12, 91940 • Santa Barbara, CA 93190
Produced by: Rising Media & Prediction Impact