# Intro to Neural Networks

# Perceptron

# Intro

The Perceptron is one of the simplest artificial neural network architectures.

Figure: Perceptron architecture. Source: Hands-On Machine Learning.

Figure: common step functions. Source: Hands-On Machine Learning.

Unlike Logistic Regression, Perceptrons do not output a class probability; they make hard class predictions based on a threshold.
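
A minimal sketch of this hard-threshold behavior, assuming scikit-learn's Perceptron class and the iris dataset (the variable names are illustrative):

from sklearn.datasets import load_iris
from sklearn.linear_model import Perceptron

iris = load_iris()
X = iris.data[:, (2, 3)]                 # petal length, petal width
y = (iris.target == 0).astype(int)       # is it Iris setosa?

per_clf = Perceptron()
per_clf.fit(X, y)

y_pred = per_clf.predict([[2, 0.5]])     # hard 0/1 prediction, no probability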

# Perceptron Issues

A single-layer Perceptron cannot solve the XOR problem, because XOR is not linearly separable. This limitation caused researchers to lose interest for years.

Multiple layers of neurons (a Multi-Layer Perceptron) can solve XOR, but training them with gradient descent requires replacing the step function with a differentiable activation such as sigmoid, ReLU, or tanh.
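
As a hedged sketch (the layer size, optimizer, and epoch count are arbitrary choices, and convergence is not guaranteed on every run), a tiny MLP can learn XOR:

import numpy as np
from tensorflow import keras

# XOR truth table
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)

# one hidden layer with a differentiable activation makes backpropagation possible
model = keras.models.Sequential([
    keras.layers.Dense(4, activation="tanh", input_shape=[2]),
    keras.layers.Dense(1, activation="sigmoid")
])
model.compile(loss="binary_crossentropy", optimizer="adam")
model.fit(X, y, epochs=500, verbose=0)
print(model.predict(X).round())          # should approximate [0, 1, 1, 0]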

# Sample Networks

# Image Classification

from tensorflow import keras

# assumed dataset: Fashion MNIST (28x28 grayscale images, 10 classes); scale pixels to [0, 1]
fashion_mnist = keras.datasets.fashion_mnist
(X_train_full, y_train_full), (X_test, y_test) = fashion_mnist.load_data()
X_valid, X_train = X_train_full[:5000] / 255.0, X_train_full[5000:] / 255.0
y_valid, y_train = y_train_full[:5000], y_train_full[5000:]

model = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28, 28]),
    keras.layers.Dense(300, activation="relu"),
    keras.layers.Dense(100, activation="relu"),
    keras.layers.Dense(10, activation="softmax")
])

model.compile(loss="sparse_categorical_crossentropy",
              optimizer="sgd",
              metrics=["accuracy"])

history = model.fit(X_train, y_train, epochs=30,
                    validation_data=(X_valid, y_valid))


X_new = X_test[:3] / 255.0                # pretend these are new instances (scaled like the training set)
y_proba = model.predict(X_new)            # class probabilities from the softmax layer
y_pred = y_proba.argmax(axis=-1)          # predicted class indices

# Regression Network

from tensorflow import keras
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

housing = fetch_california_housing()

X_train_full, X_test, y_train_full, y_test = train_test_split(
    housing.data, housing.target)
X_train, X_valid, y_train, y_valid = train_test_split(
    X_train_full, y_train_full)

scaler = StandardScaler()                 # fit on training data only, to avoid leakage
X_train = scaler.fit_transform(X_train)
X_valid = scaler.transform(X_valid)
X_test = scaler.transform(X_test)

model = keras.models.Sequential([
    keras.layers.Dense(30, activation="relu", input_shape=X_train.shape[1:]),
    keras.layers.Dense(1)
])
model.compile(loss="mean_squared_error", optimizer="sgd")
history = model.fit(X_train, y_train, epochs=20,
                    validation_data=(X_valid, y_valid))
mse_test = model.evaluate(X_test, y_test)
X_new = X_test[:3] # pretend these are new instances
y_pred = model.predict(X_new)

Differences from classification:

  • a single output neuron, since we predict one value
  • no activation function on the output layer, so the output range is unconstrained
  • loss is mean squared error

# Wide and Deep Network

Figure: Wide & Deep network architecture.

The input goes both through the hidden layers (the deep path) and directly to the output layer (the wide path). This allows the network to learn deep patterns and simple rules at the same time.

input_ = keras.layers.Input(shape=X_train.shape[1:])
hidden1 = keras.layers.Dense(30, activation="relu")(input_)    # deep path
hidden2 = keras.layers.Dense(30, activation="relu")(hidden1)
concat = keras.layers.Concatenate()([input_, hidden2])         # wide path joins here
output = keras.layers.Dense(1)(concat)
model = keras.Model(inputs=[input_], outputs=[output])

You can also use subclassing to define the model:

class WideAndDeepModel(keras.Model):
    def __init__(self, units=30, activation="relu", **kwargs):
        super().__init__(**kwargs) # handles standard args (e.g., name)
        self.hidden1 = keras.layers.Dense(units, activation=activation)
        self.hidden2 = keras.layers.Dense(units, activation=activation)
        self.main_output = keras.layers.Dense(1)
        self.aux_output = keras.layers.Dense(1)

    def call(self, inputs):
        input_A, input_B = inputs
        hidden1 = self.hidden1(input_B)
        hidden2 = self.hidden2(hidden1)
        concat = keras.layers.concatenate([input_A, hidden2])
        main_output = self.main_output(concat)
        aux_output = self.aux_output(hidden2)
        return main_output, aux_output

model = WideAndDeepModel()
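
A hedged usage sketch (the feature split and loss weights are illustrative; the model expects two inputs and returns two outputs, so the auxiliary output needs its own loss):

# split the regression features into two (possibly overlapping) input sets
X_train_A, X_train_B = X_train[:, :5], X_train[:, 2:]
X_valid_A, X_valid_B = X_valid[:, :5], X_valid[:, 2:]

model.compile(loss=["mse", "mse"], loss_weights=[0.9, 0.1], optimizer="sgd")
history = model.fit((X_train_A, X_train_B), (y_train, y_train), epochs=10,
                    validation_data=((X_valid_A, X_valid_B), (y_valid, y_valid)))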

# Persisting

model = keras.models.Sequential([...]) # or keras.Model([...])
model.compile([...])
model.fit([...])
model.save("my_keras_model.h5")

model = keras.models.load_model("my_keras_model.h5")
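
One hedged caveat: a subclassed model (like WideAndDeepModel above) generally cannot be saved to the HDF5 format this way; saving and restoring just the weights is a common workaround:

model.save_weights("my_keras_weights.ckpt")   # weights only; the architecture lives in code
model.load_weights("my_keras_weights.ckpt")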

# Callbacks

checkpoint_cb = keras.callbacks.ModelCheckpoint("my_keras_model.h5",
                                                save_best_only=True)
early_stopping_cb = keras.callbacks.EarlyStopping(patience=10,
                                                  restore_best_weights=True)

history = model.fit(...,
                    callbacks=[checkpoint_cb, early_stopping_cb])

With EarlyStopping, you can set the number of epochs high and let training stop automatically once the model hasn't improved for `patience` epochs; see the sketch below.
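
A hedged sketch, reusing the regression data and the two callbacks from above (the epoch budget is arbitrary):

history = model.fit(X_train, y_train, epochs=100,
                    validation_data=(X_valid, y_valid),
                    callbacks=[checkpoint_cb, early_stopping_cb])
# training halts once val_loss stops improving for `patience` epochs,
# and restore_best_weights=True rolls back to the best weights seen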

# Other Notes

Loss:

  • sparse_categorical_crossentropy: labels are class indices (0 to n-1), not one-hot encoded
  • categorical_crossentropy: labels are one-hot encoded (see the label sketch below)
  • binary_crossentropy: binary or multi-label classification
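
A small sketch of the two multiclass label formats (the values are illustrative):

import numpy as np
from tensorflow import keras

y_sparse = np.array([2, 0, 1])                        # use sparse_categorical_crossentropy
y_onehot = keras.utils.to_categorical(y_sparse, 3)    # use categorical_crossentropy
# [[0., 0., 1.], [1., 0., 0.], [0., 1., 0.]]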

Activation:

  • softmax: multiclass classification (output probabilities sum to 1)
  • sigmoid: binary or multi-label classification (independent probability per output; see the sketch below)
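
Typical output layers for each case, as a sketch (the class counts are illustrative):

from tensorflow import keras

out_multiclass = keras.layers.Dense(10, activation="softmax")  # exactly one of 10 classes
out_multilabel = keras.layers.Dense(5, activation="sigmoid")   # any subset of 5 labels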

Training error is computed as a running mean during each epoch, while validation error is computed at the end of each epoch, so the two are not measured at exactly the same point in training.