# Types of Testing

# Frequentist AB Testing

Steps:

  1. Form a Hypothesis
     a. Replace a user experience with another
     b. Dependent variable selection
        • incremental profit or revenue
        • number / rate / probability of ad clicks
        • listening / screen time
     c. Directionality of the dependent variables: anticipate multiple changes
     d. Experiment participants
        • country, new users / longtime users, web app
  2. Setup
     a. Define a control group in which the experience is unchanged
     b. Determine whether the treatment should replace the control

  3. Comparison with baseline (see the sample-size sketch after this list)
     a. What are the baseline numbers?
     b. Minimum detectable change
        • the smallest effect that can be measured
        • the practical significance boundary
     c. Power
        • the percent of the time the minimum detectable change is found, assuming it exists
     d. Significance
        • the percent of the time the minimum detectable change is found, assuming it doesn't exist
     e. Sample size
  4. Null Hypothesis (see the z-test sketch after this list)
     • e.g., is 8% better than 7%?
     • how likely would an 8% result appear by chance?
     • p-value threshold, usually 0.05
  5. Result Extrapolation
     a. through time (seasonality)
     b. through population

  6. Change Effect
     a. Does the experiment introduce a novelty effect?
     b. An uplift in interactions could be due to that novelty.
     c. Users could also be averse to the new change.
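
A minimal sketch of the power / sample-size step from point 3, using statsmodels (the 7% baseline and 8% target are the illustrative numbers from these notes, not real data):

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

# Baseline conversion rate and minimum detectable change (illustrative values).
baseline = 0.07
target = 0.08  # baseline + minimum detectable change

# Cohen's h effect size for comparing two proportions.
effect_size = proportion_effectsize(target, baseline)

# Required sample size per group for 80% power at a 5% significance level.
n_per_group = NormalIndPower().solve_power(
    effect_size=effect_size,
    power=0.80,   # chance of finding the change, assuming it exists
    alpha=0.05,   # chance of "finding" the change, assuming it doesn't exist
    ratio=1.0,    # equal-sized control and treatment groups
)
print(f"Need roughly {n_per_group:.0f} users per group")
```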

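A minimal sketch of the null-hypothesis step from point 4, as a two-proportion z-test (the counts are made up to roughly match the 7% vs. 8% example):

```python
from statsmodels.stats.proportion import proportions_ztest

# Conversions and sample sizes for control and treatment (illustrative counts).
conversions = [700, 800]          # control ~7%, treatment ~8%
sample_sizes = [10_000, 10_000]

# H0: the two conversion rates are equal; reject if the p-value is below 0.05.
z_stat, p_value = proportions_ztest(conversions, sample_sizes)
print(f"z = {z_stat:.2f}, p-value = {p_value:.4f}")
if p_value < 0.05:
    print("Difference is statistically significant at the 5% level")
else:
    print("Cannot reject the null hypothesis")
```
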
A/A Test

  • run a test against itself
  • the difference should be statistically insignificant
  • verifies the A/B testing tool: catches sampling bias and an incorrect analysis process
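
A minimal sketch of an A/A check, reusing the same z-test on two identically treated groups of simulated users (the 7% rate and group size are assumptions for illustration):

```python
import numpy as np
from statsmodels.stats.proportion import proportions_ztest

rng = np.random.default_rng(42)
true_rate, n = 0.07, 10_000   # both groups get the identical experience

false_positives = 0
for _ in range(1_000):
    # Both "arms" are drawn from the same distribution: an A/A test.
    a = rng.binomial(n, true_rate)
    b = rng.binomial(n, true_rate)
    _, p_value = proportions_ztest([a, b], [n, n])
    false_positives += p_value < 0.05

# With an unbiased split and a correct analysis, roughly 5% of A/A runs
# look "significant" at the 0.05 level; much more than that hints at a
# broken tool or process.
print(f"Fraction of significant A/A runs: {false_positives / 1_000:.3f}")
```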

AB Test Setup: When a user enters our site, we can use the following information to track them:

  • Cookie
  • UserId
  • Device Id
  • IP

A user can be part of multiple experiments at the same time.
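
The notes don't specify an assignment mechanism; a common approach is to hash a tracking identifier together with the experiment name, which keeps assignments sticky per user and independent across experiments (the function and identifiers below are hypothetical):

```python
import hashlib

def assign_variant(tracking_id: str, experiment: str,
                   variants=("control", "treatment")) -> str:
    """Deterministically assign a tracked user to a variant of an experiment."""
    # Hash the tracking id (cookie / user id / device id) together with the
    # experiment name so splits across experiments are independent.
    digest = hashlib.sha256(f"{experiment}:{tracking_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]

# The same user can sit in several experiments at once, each with its own split.
print(assign_variant("cookie-12345", "new-checkout-flow"))
print(assign_variant("cookie-12345", "homepage-banner"))
```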

Tools:

  • Optimizely
  • Google Optimize
  • Facebook PlanOut

# Bayes AB Testing

Notes:

  • easier to interpret results
  • often needs fewer samples to reach a launch decision
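
A minimal sketch of the Bayesian version with Beta-Binomial posteriors, which answers the directly interpretable question "what is the probability that the treatment beats the control?" (priors and counts below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Observed data (illustrative): conversions and users per arm.
control_conv, control_n = 700, 10_000
treatment_conv, treatment_n = 800, 10_000

# Beta(1, 1) uniform prior -> posterior is Beta(successes + 1, failures + 1).
control_post = rng.beta(control_conv + 1, control_n - control_conv + 1, size=100_000)
treatment_post = rng.beta(treatment_conv + 1, treatment_n - treatment_conv + 1, size=100_000)

# Directly interpretable result: probability the treatment's true rate is higher.
prob_better = (treatment_post > control_post).mean()
print(f"P(treatment > control) = {prob_better:.3f}")
```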

Tools:

  • Visual Website Optimizer (VWO)

# Multi-Armed Bandit

Explore/Exploit Trade-Off

  • we could explore the less promising treatment but miss out on a potentially better control
  • we could exploit the potentially better control but miss an eventually better treatment

Epsilon-Greedy Strategy

Reward: the outcome of allocating a user to a particular experience (A/B)

Regret: the reward obtained from the optimal arm minus the reward from the arm chosen

Algorithm: multi-armed bandit (an epsilon-greedy sketch follows)
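
A minimal epsilon-greedy sketch on simulated Bernoulli arms, tracking cumulative regret as defined above (arm rates and epsilon are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

true_rates = [0.07, 0.08, 0.10]   # unknown conversion rate of each arm (illustrative)
epsilon = 0.1                     # fraction of traffic reserved for exploration
pulls = np.zeros(len(true_rates))
rewards = np.zeros(len(true_rates))
regret = 0.0

for _ in range(10_000):
    if rng.random() < epsilon:
        arm = int(rng.integers(len(true_rates)))              # explore: pick a random arm
    else:
        arm = int(np.argmax(rewards / np.maximum(pulls, 1)))  # exploit: best arm so far
    reward = rng.binomial(1, true_rates[arm])                 # outcome of allocating this user
    pulls[arm] += 1
    rewards[arm] += reward
    # Regret: optimal arm's expected reward minus the chosen arm's expected reward.
    regret += max(true_rates) - true_rates[arm]

print("Pulls per arm:", pulls)
print("Estimated rates:", np.round(rewards / np.maximum(pulls, 1), 3))
print(f"Cumulative expected regret: {regret:.1f}")
```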

Thompson Sampling:

  • the frequency a user should be allocated to an experience should equal the probability of that experience being optimal.
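
A minimal Thompson sampling sketch for Bernoulli arms with Beta posteriors: each user goes to the arm whose sampled rate is highest, which happens with exactly the posterior probability that the arm is optimal (arm rates below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)

true_rates = [0.07, 0.08, 0.10]       # unknown conversion rates (illustrative)
successes = np.ones(len(true_rates))  # Beta(1, 1) prior for each arm
failures = np.ones(len(true_rates))

for _ in range(10_000):
    # Sample a plausible rate for each arm from its Beta posterior.
    samples = rng.beta(successes, failures)
    arm = int(np.argmax(samples))     # allocate the user to the arm that looks best
    reward = rng.binomial(1, true_rates[arm])
    successes[arm] += reward
    failures[arm] += 1 - reward

print("Allocations per arm:", successes + failures - 2)
print("Posterior mean rates:", np.round(successes / (successes + failures), 3))
```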

# Multi-Armed Bandit vs Traditional AB Testing

MAB (pros)

  • many arms to test
  • move traffic automatically to best arm
  • short term result
  • focus on optimizing

MAB (cons)

  • longer experiment

AB

  • few arms
  • good when results are needed long term
  • focus on learning
  • high regret

# Tools

  • Optimizely
  • Visual Website Optimizer (VWO)
  • Vowpal Wabbit