# Error Function

Error function

Minimizing the error is like trying to reach the bottom of a cliff by always stepping downhill.

What if our function hits a plateau, where a small step in either direction returns the same error? Then there is no signal telling us which way to move. That is why we want our error function to be continuous (and differentiable), so that small changes in the weights produce small changes in the error.

Discrete vs Continuous

Discrete vs Continuous Activation Function
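A minimal NumPy sketch of the difference, using a made-up 1-D dataset (all names and numbers are illustrative): a discrete error such as "number of misclassified points" stays flat under small weight changes, while a continuous error based on the sigmoid changes a little with every step, so gradient descent always gets a direction.

```python
import numpy as np

# Made-up 1-D dataset: inputs x with labels y (1 = positive class, 0 = negative class)
x = np.array([-2.0, -1.0, 1.5, 3.0])
y = np.array([0, 0, 1, 1])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def discrete_error(w, b):
    # Number of misclassified points: a step function of (w, b)
    preds = (w * x + b >= 0).astype(int)
    return int(np.sum(preds != y))

def continuous_error(w, b):
    # Log-loss: changes smoothly whenever (w, b) change
    p = sigmoid(w * x + b)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

for w in (1.0, 1.01, 1.02):
    print(w, discrete_error(w, 0.0), round(continuous_error(w, 0.0), 4))
# The discrete error is 0 for all three weights (a plateau, no direction to follow),
# while the continuous error keeps shrinking slightly as w grows.
```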

Converting predictions using softmax

sigmoid

Softmax normalizes the outputs to the range 0 to 1 so that they sum to 1 and can be read as probabilities.

softmax

We exponentiate the scores in softmax because a raw score can be negative or zero; dividing directly by the sum of scores could give negative "probabilities" or a division by zero. Taking e^score makes every value positive.

Sigmoid is the special case of softmax with exactly two classes.
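A minimal NumPy sketch (function names are mine) showing softmax turning arbitrary scores, including negative ones, into probabilities that sum to 1, and showing that the two-class softmax matches the sigmoid:

```python
import numpy as np

def softmax(scores):
    # Exponentiate so every value is positive, then normalize so they sum to 1
    exps = np.exp(scores - np.max(scores))  # subtract the max for numerical stability
    return exps / np.sum(exps)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

scores = np.array([2.0, -1.0, 0.5])
print(softmax(scores))   # roughly [0.79, 0.04, 0.17], sums to 1

# Two-class case: softmax([z, 0])[0] = e^z / (e^z + 1) = sigmoid(z)
z = 1.3
print(softmax(np.array([z, 0.0]))[0], sigmoid(z))  # both ~0.786
```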

# Maximum Likelihood

P(blue) = sigmoid(Wx + b)

maximum_likelihood

The model on the left assigns a lower probability to the observed labels, so that arrangement is less likely to occur.

Maximizing the probability is the same as minimizing the loss.

One problem with multiplying probabilities is that the product of many numbers between 0 and 1 becomes extremely small.

The solution is cross entropy: take the negative logarithm of the product, which turns it into a sum.
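A small sketch with made-up probabilities, showing why we switch from products to sums of negative logs: the product of many per-point probabilities underflows toward zero, while the negative log turns it into a manageable sum, and maximizing the product is the same as minimizing that sum.

```python
import numpy as np

# Made-up probabilities the model assigns to the correct label of each point
p = np.full(1000, 0.9)            # 1000 points, each predicted correctly with p = 0.9

product = np.prod(p)              # probability of the whole dataset under the model
print(product)                    # ~1.7e-46: far too small to work with directly

neg_log_sum = -np.sum(np.log(p))  # cross entropy: the product becomes a sum
print(neg_log_sum)                # ~105.4: a well-behaved number

# Maximizing the product is equivalent to minimizing the negative-log sum,
# because -ln is monotonically decreasing.
```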

error_scenario

In cross entropy, only the probability of the true class counts for each point. Here y_3 = 0, so for the third point we use 1 - p_3, the probability of the class that actually occurred.

cross_entropy_formula
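A direct implementation of binary cross entropy, written out from the usual textbook form (variable names are mine): because each y_i is 0 or 1, only one of the two terms survives per point.

```python
import numpy as np

def binary_cross_entropy(y, p):
    """y: true labels (0 or 1); p: predicted probability that the label is 1.

    CE = -sum( y * ln(p) + (1 - y) * ln(1 - p) )
    For y_i = 1 only ln(p_i) contributes; for y_i = 0 only ln(1 - p_i) does,
    i.e. we always use the probability of the class that actually occurred.
    """
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# Example: three points; the third true label is 0, so its term is ln(1 - 0.3)
print(binary_cross_entropy([1, 1, 0], [0.8, 0.7, 0.3]))  # ~0.937
```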

Multi-Class Cross-Entropy

multi_class_entropy

Multi-class cross-entropy is an extension of binary cross-entropy. For each sample, we only consider the term for its actual class.

m = number of samples, n = number of classes

multi_class_entropy
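A sketch of the multi-class version under the usual one-hot-label convention (names are mine): with m samples and n classes, only the entry for the actual class of each sample contributes to the sum.

```python
import numpy as np

def multi_class_cross_entropy(Y, P):
    """Y: one-hot true labels, shape (m, n); P: predicted probabilities, shape (m, n).

    CE = -sum_{i=1..m} sum_{j=1..n} Y[i, j] * ln(P[i, j])
    Each row of Y has a single 1, so only the probability of the actual
    class of each sample enters the sum.
    """
    Y = np.asarray(Y, dtype=float)
    P = np.asarray(P, dtype=float)
    return -np.sum(Y * np.log(P))

# Example: 2 samples, 3 classes
Y = np.array([[1, 0, 0],
              [0, 0, 1]])
P = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.3, 0.6]])
print(multi_class_cross_entropy(Y, P))  # -(ln 0.7 + ln 0.6) ~ 0.867
```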

error_function_expanded