# Types of Testing

# Frequentist AB Testing

Steps:

  1. Form a Hypothesis
     a. Replace a user experience with another
     b. Dependent variable selection
        • incremental profit or revenue
        • number / rate / probability of ad clicks
        • listening / screen time
     c. Directionality of the dependent variables: anticipate multiple changes
     d. Experiment participants
        • country, new users / longtime users, web app
  2. Setup
     a. Define a control group in which the experience is unchanged
     b. Determine whether the treatment should replace the control

  3. Comparison with baseline (see the sample-size sketch after this list)
     a. What are the baseline numbers?
     b. Minimum detectable change
        • the smallest effect that can be measured
        • the practical significance boundary
     c. Power
        • the percent of the time the minimum detectable change is found, assuming it exists
     d. Significance
        • the percent of the time the minimum detectable change is found, assuming it doesn't exist
     e. Sample size
  4. Null Hypothesis (see the z-test sketch after this list)
     • e.g., is 8% better than 7%?
     • how likely would an 8% result appear by chance?
     • p-value threshold, usually 0.05
  5. Result Extrapolation
     a. through time (seasonality)
     b. through population

  6. Change Effect
     a. Does the experiment introduce a novelty effect?
     b. An uplift in interactions could be due to that novelty.
     c. Users could also be averse to the new change.
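
A minimal sketch of the power / sample-size step from point 3, using statsmodels (the 7% baseline and 8% target are the illustrative numbers from these notes, not real data):

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

# Baseline conversion rate and minimum detectable change (illustrative values).
baseline = 0.07
target = 0.08  # baseline + minimum detectable change

# Cohen's h effect size for comparing two proportions.
effect_size = proportion_effectsize(target, baseline)

# Required sample size per group for 80% power at a 5% significance level.
n_per_group = NormalIndPower().solve_power(
    effect_size=effect_size,
    power=0.80,   # chance of finding the change, assuming it exists
    alpha=0.05,   # chance of "finding" the change, assuming it doesn't exist
    ratio=1.0,    # equal-sized control and treatment groups
)
print(f"Need roughly {n_per_group:.0f} users per group")
```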

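A minimal sketch of the null-hypothesis step from point 4, as a two-proportion z-test (the counts are made up to roughly match the 7% vs. 8% example):

```python
from statsmodels.stats.proportion import proportions_ztest

# Conversions and sample sizes for control and treatment (illustrative counts).
conversions = [700, 800]          # control ~7%, treatment ~8%
sample_sizes = [10_000, 10_000]

# H0: the two conversion rates are equal; reject if the p-value is below 0.05.
z_stat, p_value = proportions_ztest(conversions, sample_sizes)
print(f"z = {z_stat:.2f}, p-value = {p_value:.4f}")
if p_value < 0.05:
    print("Difference is statistically significant at the 5% level")
else:
    print("Cannot reject the null hypothesis")
```
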
A/A Test

  • run a test against itself
  • the difference should be statistically insignificant
  • verifies the A/B testing tool: catches sampling bias and an incorrect analysis process
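
A minimal sketch of an A/A check, reusing the same z-test on two identically treated groups of simulated users (the 7% rate and group size are assumptions for illustration):

```python
import numpy as np
from statsmodels.stats.proportion import proportions_ztest

rng = np.random.default_rng(42)
true_rate, n = 0.07, 10_000   # both groups get the identical experience

false_positives = 0
for _ in range(1_000):
    # Both "arms" are drawn from the same distribution: an A/A test.
    a = rng.binomial(n, true_rate)
    b = rng.binomial(n, true_rate)
    _, p_value = proportions_ztest([a, b], [n, n])
    false_positives += p_value < 0.05

# With an unbiased split and a correct analysis, roughly 5% of A/A runs
# look "significant" at the 0.05 level; much more than that hints at a
# broken tool or process.
print(f"Fraction of significant A/A runs: {false_positives / 1_000:.3f}")
```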

AB Test Setup: When a user enters our site, we can use the following information to track them:

  • Cookie
  • UserId
  • Device Id
  • IP

A user can be part of multiple experiments at the same time.
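
The notes don't specify an assignment mechanism; a common approach is to hash a tracking identifier together with the experiment name, which keeps assignments sticky per user and independent across experiments (the function and identifiers below are hypothetical):

```python
import hashlib

def assign_variant(tracking_id: str, experiment: str,
                   variants=("control", "treatment")) -> str:
    """Deterministically assign a tracked user to a variant of an experiment."""
    # Hash the tracking id (cookie / user id / device id) together with the
    # experiment name so splits across experiments are independent.
    digest = hashlib.sha256(f"{experiment}:{tracking_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]

# The same user can sit in several experiments at once, each with its own split.
print(assign_variant("cookie-12345", "new-checkout-flow"))
print(assign_variant("cookie-12345", "homepage-banner"))
```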

Tools:

  • Optimizely
  • Google Optimize
  • Facebook PlanOut

# Bayes AB Testing

Notes:

  • easier to interpret results
  • often needs fewer samples to reach a launch decision
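
A minimal sketch of the Bayesian version with Beta-Binomial posteriors, which answers the directly interpretable question "what is the probability that the treatment beats the control?" (priors and counts below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Observed data (illustrative): conversions and users per arm.
control_conv, control_n = 700, 10_000
treatment_conv, treatment_n = 800, 10_000

# Beta(1, 1) uniform prior -> posterior is Beta(successes + 1, failures + 1).
control_post = rng.beta(control_conv + 1, control_n - control_conv + 1, size=100_000)
treatment_post = rng.beta(treatment_conv + 1, treatment_n - treatment_conv + 1, size=100_000)

# Directly interpretable result: probability the treatment's true rate is higher.
prob_better = (treatment_post > control_post).mean()
print(f"P(treatment > control) = {prob_better:.3f}")
```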

Tools:

  • Visual Website Optimizer (VWO)

# Multi-Armed Bandit

Explore/Exploit Trade-Off

  • we could explore the less promising treatment but miss out on a potentially better control
  • we could exploit the potentially better control but miss an eventually better treatment

Epsilon-Greedy Strategy

Reward: the outcome of allocating a user to a particular experience (A/B)

Regret: the reward obtained from the optimal arm minus the reward from the arm chosen

Algorithm: multi-armed bandit (an epsilon-greedy sketch follows)
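
A minimal epsilon-greedy sketch on simulated Bernoulli arms, tracking cumulative regret as defined above (arm rates and epsilon are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

true_rates = [0.07, 0.08, 0.10]   # unknown conversion rate of each arm (illustrative)
epsilon = 0.1                     # fraction of traffic reserved for exploration
pulls = np.zeros(len(true_rates))
rewards = np.zeros(len(true_rates))
regret = 0.0

for _ in range(10_000):
    if rng.random() < epsilon:
        arm = int(rng.integers(len(true_rates)))              # explore: pick a random arm
    else:
        arm = int(np.argmax(rewards / np.maximum(pulls, 1)))  # exploit: best arm so far
    reward = rng.binomial(1, true_rates[arm])                 # outcome of allocating this user
    pulls[arm] += 1
    rewards[arm] += reward
    # Regret: optimal arm's expected reward minus the chosen arm's expected reward.
    regret += max(true_rates) - true_rates[arm]

print("Pulls per arm:", pulls)
print("Estimated rates:", np.round(rewards / np.maximum(pulls, 1), 3))
print(f"Cumulative expected regret: {regret:.1f}")
```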

Thompson Sampling:

  • the frequency a user should be allocated to an experience should equal the probability of that experience being optimal.
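
A minimal Thompson sampling sketch for Bernoulli arms with Beta posteriors: each user goes to the arm whose sampled rate is highest, which happens with exactly the posterior probability that the arm is optimal (arm rates below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)

true_rates = [0.07, 0.08, 0.10]       # unknown conversion rates (illustrative)
successes = np.ones(len(true_rates))  # Beta(1, 1) prior for each arm
failures = np.ones(len(true_rates))

for _ in range(10_000):
    # Sample a plausible rate for each arm from its Beta posterior.
    samples = rng.beta(successes, failures)
    arm = int(np.argmax(samples))     # allocate the user to the arm that looks best
    reward = rng.binomial(1, true_rates[arm])
    successes[arm] += reward
    failures[arm] += 1 - reward

print("Allocations per arm:", successes + failures - 2)
print("Posterior mean rates:", np.round(successes / (successes + failures), 3))
```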

# Multi-Armed Bandit vs Traditional AB Testing

MAB (pros)

  • many arms to test
  • move traffic automatically to best arm
  • short term result
  • focus on optimizing

MAB (cons)

  • longer experiment

AB

  • few arms
  • good when results are needed long term
  • focus on learning
  • high regret

# Tools

  • Optimizely
  • Visual Website Optimizer (VWO)
  • Vowpal Wabbit