# Types of Testing
# Frequentist AB Testing
Steps:
Form a hypothesis
  a. Replace a user experience with another
  b. Dependent variable selection
     - incremental profit or revenue
     - number / rate / probability of ad clicks
     - listening / screen time
  c. Directionality of dependent variables: anticipate multiple changes
  d. Experiment participants
     - country, new users / longtime users, web app
Setup
  a. Control group in which the experience is unchanged
  b. Determine if the treatment should replace the control
Comparison with baseline (see the sample size sketch below)
  a. What are the baseline numbers
  b. Minimum detectable change
     - smallest effect that can be measured
     - what is the practical significance boundary
  c. Power
     - percent of the time the minimum detectable change is found, assuming it exists
  d. Significance
     - percent of the time the minimum detectable change is found, assuming it doesn't exist
  e. Sample size
     - Null hypothesis
       - is 8% really better than 7%?
       - how likely would an 8% appear by chance?
       - p-value threshold of usually 0.05
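A minimal sketch of these calculations, assuming `statsmodels` is available; the 7% baseline, the 8% minimum detectable change, and the observed counts are hypothetical.

```python
# (1) Size the experiment for a minimum detectable change from 7% to 8%.
# (2) Test hypothetical observed results against the null hypothesis.
import numpy as np
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize, proportions_ztest

baseline, target = 0.07, 0.08                 # control rate, minimum detectable change

# (1) Sample size per group for 80% power at a 5% significance level
effect_size = proportion_effectsize(target, baseline)
n_per_group = NormalIndPower().solve_power(
    effect_size=effect_size,
    power=0.80,            # chance of finding the change if it exists
    alpha=0.05,            # chance of "finding" it when it does not exist
    ratio=1.0,             # equal-sized control and treatment groups
    alternative="two-sided",
)
print(f"Required users per group: {n_per_group:,.0f}")

# (2) Null hypothesis test: is the observed 8% really better than 7%,
# or could the difference appear by chance? (p-value threshold 0.05)
conversions = np.array([1120, 980])           # treatment, control conversions
observations = np.array([14000, 14000])       # users per group (hypothetical)
stat, p_value = proportions_ztest(conversions, observations)
print(f"z = {stat:.2f}, p-value = {p_value:.4f}")
```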
Result extrapolation
  a. Through time (seasonality)
  b. Through population
Change effect
  a. Does the experiment introduce a novelty effect?
  b. An uplift in interactions could be due to that novelty
  c. Users could also be averse to the new change
A/A Test (see the simulation sketch below)
- run a test against itself
- the difference should be statistically insignificant
- verifies the A/B testing tool: sample bias, incorrect analysis process
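A minimal sketch of an A/A check, assuming Bernoulli conversions simulated with NumPy/SciPy; the 7% rate and group sizes are made up. If the tooling and analysis are unbiased, roughly 5% of A/A tests should come out significant at alpha = 0.05.

```python
# Simulate many A/A experiments where both groups share the same true rate
# and verify the test is "significant" about alpha of the time.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, n_users, n_trials = 0.05, 5000, 1000
true_rate = 0.07                      # same conversion rate in both "arms"

false_positives = 0
for _ in range(n_trials):
    a = rng.binomial(1, true_rate, n_users)
    b = rng.binomial(1, true_rate, n_users)
    _, p_value = stats.ttest_ind(a, b)
    if p_value < alpha:
        false_positives += 1

print(f"Significant A/A results: {false_positives / n_trials:.3f} (expect ~{alpha})")
```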
AB Test Setup: when a user enters the site, we have the following information to track them:
- Cookie
- UserId
- Device Id
- IP
A user can be part of multiple experiments at the same time.
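A minimal sketch of deterministic assignment, assuming we bucket on a hash of the tracking identifier (cookie, user id, device id, or IP) together with the experiment name; the experiment names and the 50/50 split are hypothetical.

```python
import hashlib

def assign_variant(tracking_id: str, experiment: str, treatment_share: float = 0.5) -> str:
    """Hash (experiment, user) so the same user always sees the same variant,
    while different experiments bucket the user independently."""
    digest = hashlib.sha256(f"{experiment}:{tracking_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF   # map the hash to [0, 1]
    return "treatment" if bucket < treatment_share else "control"

# Same user, two concurrent experiments: assignments are stable per experiment
# but independent across experiments.
print(assign_variant("cookie-123", "new_checkout"))
print(assign_variant("cookie-123", "homepage_banner"))
```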
Tools:
- Optimizely
- Google Optimize
- Facebook PlanOut
# Bayesian AB Testing
Notes:
- easier to interpret results
- often requires fewer samples to reach a launch decision
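A minimal sketch of a Bayesian comparison with Beta-Binomial posteriors, assuming binary conversions and flat Beta(1, 1) priors; the counts are made up. The output is a directly interpretable probability that B beats A, rather than a p-value.

```python
import numpy as np

rng = np.random.default_rng(0)

# Observed data: (conversions, visitors) per variant -- illustrative numbers
conv_a, n_a = 70, 1000
conv_b, n_b = 85, 1000

# Posteriors: Beta(prior_alpha + conversions, prior_beta + non-conversions)
samples_a = rng.beta(1 + conv_a, 1 + n_a - conv_a, size=100_000)
samples_b = rng.beta(1 + conv_b, 1 + n_b - conv_b, size=100_000)

# Quantities that are easy to interpret for a launch decision
prob_b_beats_a = (samples_b > samples_a).mean()
expected_lift = (samples_b - samples_a).mean()
print(f"P(B > A) = {prob_b_beats_a:.3f}, expected lift = {expected_lift:.4f}")
```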
Tools:
- Visual Website Optimizer (VWO)
# Multi-Armed Bandit
Explore/Exploit Trade-Off
- we could explore a less promising treatment but lose reward from a potentially better control
- we could exploit the potentially better control but miss a treatment that eventually proves better
Epsilon-Greedy Strategy (see the sketch below)
- Reward: the outcome of allocating a user to a particular experience (A/B)
- Regret: the reward obtained from the optimal arm minus the reward from the arm chosen
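A minimal sketch of epsilon-greedy allocation over two arms, assuming Bernoulli rewards; the true conversion rates and epsilon = 0.1 are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
true_rates = {"A": 0.07, "B": 0.09}      # unknown to the algorithm
epsilon = 0.1                            # fraction of traffic used to explore

counts = {arm: 0 for arm in true_rates}
rewards = {arm: 0.0 for arm in true_rates}

for _ in range(10_000):
    if rng.random() < epsilon:           # explore: pick a random arm
        arm = rng.choice(list(true_rates))
    else:                                # exploit: pick the best arm so far
        arm = max(counts, key=lambda a: rewards[a] / counts[a] if counts[a] else 0.0)
    reward = rng.binomial(1, true_rates[arm])
    counts[arm] += 1
    rewards[arm] += reward

print({arm: counts[arm] for arm in counts})                  # traffic split
print({arm: rewards[arm] / counts[arm] for arm in counts})   # estimated rates
```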

Thompson Sampling:
- the frequency with which a user is allocated to an experience should equal the probability of that experience being optimal (see the sketch below)
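A minimal sketch of Thompson sampling with Beta posteriors over two arms, assuming Bernoulli rewards; the true rates are illustrative. Each arm is played roughly in proportion to the probability that it is the optimal one.

```python
import numpy as np

rng = np.random.default_rng(0)
true_rates = {"A": 0.07, "B": 0.09}          # unknown to the algorithm
successes = {arm: 0 for arm in true_rates}   # Beta(1 + s, 1 + f) posteriors
failures = {arm: 0 for arm in true_rates}

for _ in range(10_000):
    # Draw a plausible conversion rate for each arm from its posterior and
    # allocate the user to the arm with the highest draw.
    draws = {arm: rng.beta(1 + successes[arm], 1 + failures[arm]) for arm in true_rates}
    arm = max(draws, key=draws.get)
    if rng.binomial(1, true_rates[arm]):
        successes[arm] += 1
    else:
        failures[arm] += 1

print({arm: successes[arm] + failures[arm] for arm in true_rates})  # traffic per arm
```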
# Multi-Armed Bandit vs Traditional AB Testing
MAB (pros)
- many arms to test
- moves traffic automatically to the best arm
- good for short-term results
- focus on optimizing
MAB (cons)
- longer experiment (takes longer to reach statistical significance)
AB
- few arms
- good when results are needed for the long term
- focus on learning
- high regret (traffic keeps going to worse arms for the full test)
# Tools
- Optimizely
- Visual Website Optimizer (VWO)
- Vowpal Wabbit