# PCA
# Latent Features
Many individual features can describe a house, but they generally fall under two broader categories. These underlying categories are latent features.

# Feature Selection
Feature selection involves finding a subset of the original features in your data.

Methods:
- Filter methods: use a ranking or scoring metric to select useful features before modeling. Techniques include Pearson's correlation, Linear Discriminant Analysis (LDA), and Analysis of Variance (ANOVA).
- Wrapper methods: select features by directly testing their performance on a model. Common examples are forward search, backward search, and Recursive Feature Elimination (see the sketch below).
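As a concrete illustration, here is a minimal sketch of one filter method and one wrapper method using scikit-learn's `SelectKBest` (ANOVA F-score) and `RFE`; the synthetic dataset and parameter choices are illustrative assumptions, not from the source.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif, RFE
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data: 20 features, only 5 of them informative
X, y = make_classification(n_samples=500, n_features=20,
                           n_informative=5, random_state=0)

# Filter method: rank features by ANOVA F-score, keep the top 5
X_filtered = SelectKBest(score_func=f_classif, k=5).fit_transform(X, y)

# Wrapper method: recursive feature elimination with a model in the loop
rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=5)
X_wrapped = rfe.fit_transform(X, y)

print(X_filtered.shape, X_wrapped.shape)  # (500, 5) (500, 5)
```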
# Feature Extraction
Feature extraction involves constructing new features, called latent features, from the original ones.

Techniques include Principal Component Analysis (PCA), Independent Component Analysis (ICA), and Random Projection.
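A hedged sketch of what each of these techniques looks like in scikit-learn (`PCA`, `FastICA`, and `GaussianRandomProjection`); the data and component counts are placeholder assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA, FastICA
from sklearn.random_projection import GaussianRandomProjection

X, _ = make_classification(n_samples=500, n_features=20, random_state=0)

# Each transformer constructs 5 new latent features from the 20 originals
X_pca = PCA(n_components=5).fit_transform(X)
X_ica = FastICA(n_components=5, random_state=0).fit_transform(X)
X_rp = GaussianRandomProjection(n_components=5, random_state=0).fit_transform(X)

print(X_pca.shape, X_ica.shape, X_rp.shape)  # each (500, 5)
```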
# Dimensionality Reduction Introduction
Principal components are linear combinations of the original features in a dataset; each component is a latent feature.

Properties of PCA:
- Each component captures the largest amount of variance remaining in the data.
- Components are orthogonal to each other.

An eigenvalue is the same as the amount of variability captured by a principal component, and an eigenvector is the principal component itself.
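To make this concrete, here is a small NumPy sketch (an illustration, not from the source) showing that the eigenvectors of the covariance matrix are the principal components, that each eigenvalue equals the variance its component captures, and that the components are orthogonal:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X = X - X.mean(axis=0)                 # center the data

cov = np.cov(X, rowvar=False)          # 3x3 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)

# Sort largest-eigenvalue first: each eigenvector is a principal
# component; its eigenvalue is the variance that component captures
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Components are orthogonal: their pairwise dot products are ~0
print(np.round(eigvecs.T @ eigvecs, 10))          # identity matrix

# Variance of the data projected onto the first component = its eigenvalue
proj = X @ eigvecs[:, 0]
print(np.isclose(proj.var(ddof=1), eigvals[0]))   # True
```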

# Scikit-learn

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier

# Scale the data first; PCA is sensitive to feature scales
X = StandardScaler().fit_transform(data)

# Instantiate PCA with a chosen number of components, e.g. 2
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

# Percentage of variance explained by each of the selected components
print(pca.explained_variance_ratio_)

# Train and evaluate a classifier on the reduced features
X_train, X_test, y_train, y_test = train_test_split(X_pca, y, test_size=0.33)
clf = RandomForestClassifier(n_estimators=100, max_depth=None)
clf.fit(X_train, y_train)
y_preds = clf.predict(X_test)
```
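Usage note: `n_components` also accepts a float between 0 and 1, in which case scikit-learn keeps just enough components to explain that fraction of the total variance. A quick sketch, using `load_digits` as a stand-in dataset:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = StandardScaler().fit_transform(load_digits().data)

# Keep just enough components to explain 90% of the variance
pca = PCA(n_components=0.90)
X_pca = pca.fit_transform(X)

print(X_pca.shape[1])                               # number of components kept
print(pca.explained_variance_ratio_.cumsum()[-1])   # >= 0.90
```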
# Visual
- Converting 2D to 1D: PCA finds the best-fit line and projects the points onto it.
- Linear transformations: PCA can be viewed as a linear transformation built from the eigenvectors and eigenvalues of the data's covariance matrix.
- Final visualization: 5D data is projected down to 2D.
Other notes:
- `pca.components_` tells us, for each component, how much each original feature contributes to it (the weights of the linear combination).
- A scree plot shows how much variance is explained by each component (sketched below).
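A minimal scree-plot sketch with matplotlib; the dataset (`load_digits`) and component count are illustrative assumptions:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = StandardScaler().fit_transform(load_digits().data)
pca = PCA(n_components=10).fit(X)

# Scree plot: variance explained by each component, in order
plt.bar(range(1, 11), pca.explained_variance_ratio_)
plt.xlabel('Principal component')
plt.ylabel('Explained variance ratio')
plt.title('Scree plot')
plt.show()
```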

Ref:
- https://www.youtube.com/watch?v=HH8pouRwphA
- https://www.youtube.com/watch?v=PFDu9oVAE-g
- https://sebastianraschka.com/Articles/2015_pca_in_3_steps.html
- http://www.cs.otago.ac.nz/cosc453/student_tutorials/principal_components.pdf
- https://stats.stackexchange.com/questions/2691/making-sense-of-principal-component-analysis-eigenvectors-eigenvalues