# PCA
# Latent Features
Many individual features can describe a house, but they generally fall under two broader categories. These underlying categories are latent features.

# Feature Selection
Feature selection involves finding a subset of the original features in your data.

Methods:
- Filter methods: use a ranking or scoring metric to select useful features before modeling. Techniques include Pearson's correlation, Linear Discriminant Analysis (LDA), and Analysis of Variance (ANOVA).
- Wrapper methods: select features by directly testing their performance on a model. Common examples are forward search, backward search, and Recursive Feature Elimination (see the sketch below).
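As a concrete illustration, here is a minimal sketch of one filter method and one wrapper method using scikit-learn's `SelectKBest` (ANOVA F-score) and `RFE`; the synthetic dataset and parameter choices are illustrative assumptions, not from the source.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif, RFE
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data: 20 features, only 5 of them informative
X, y = make_classification(n_samples=500, n_features=20,
                           n_informative=5, random_state=0)

# Filter method: rank features by ANOVA F-score, keep the top 5
X_filtered = SelectKBest(score_func=f_classif, k=5).fit_transform(X, y)

# Wrapper method: recursive feature elimination with a model in the loop
rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=5)
X_wrapped = rfe.fit_transform(X, y)

print(X_filtered.shape, X_wrapped.shape)  # (500, 5) (500, 5)
```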
# Feature Extraction
Feature extraction involves constructing new features, called latent features, from the original ones.

Techniques include Principal Component Analysis (PCA), Independent Component Analysis (ICA), and Random Projection.
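A hedged sketch of what each of these techniques looks like in scikit-learn (`PCA`, `FastICA`, and `GaussianRandomProjection`); the data and component counts are placeholder assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA, FastICA
from sklearn.random_projection import GaussianRandomProjection

X, _ = make_classification(n_samples=500, n_features=20, random_state=0)

# Each transformer constructs 5 new latent features from the 20 originals
X_pca = PCA(n_components=5).fit_transform(X)
X_ica = FastICA(n_components=5, random_state=0).fit_transform(X)
X_rp = GaussianRandomProjection(n_components=5, random_state=0).fit_transform(X)

print(X_pca.shape, X_ica.shape, X_rp.shape)  # each (500, 5)
```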
# Dimensionality Reduction Introduction
Principal components are linear combinations of the original features in a dataset; each component is a latent feature.

Properties of PCA:
- Each component captures the largest amount of variance remaining in the data.
- Components are orthogonal to each other.

An eigenvalue is the same as the amount of variability captured by a principal component, and an eigenvector is the principal component itself.
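To make this concrete, here is a small NumPy sketch (an illustration, not from the source) showing that the eigenvectors of the covariance matrix are the principal components, that each eigenvalue equals the variance its component captures, and that the components are orthogonal:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X = X - X.mean(axis=0)                 # center the data

cov = np.cov(X, rowvar=False)          # 3x3 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)

# Sort largest-eigenvalue first: each eigenvector is a principal
# component; its eigenvalue is the variance that component captures
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Components are orthogonal: their pairwise dot products are ~0
print(np.round(eigvecs.T @ eigvecs, 10))          # identity matrix

# Variance of the data projected onto the first component = its eigenvalue
proj = X @ eigvecs[:, 0]
print(np.isclose(proj.var(ddof=1), eigvals[0]))   # True
```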

# Scikit-learn

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier

# Scale the data first; PCA is sensitive to feature scales
X = StandardScaler().fit_transform(data)

# Instantiate PCA with a chosen number of components, e.g. 2
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

# Percentage of variance explained by each of the selected components
print(pca.explained_variance_ratio_)

# Train and evaluate a classifier on the reduced features
X_train, X_test, y_train, y_test = train_test_split(X_pca, y, test_size=0.33)
clf = RandomForestClassifier(n_estimators=100, max_depth=None)
clf.fit(X_train, y_train)
y_preds = clf.predict(X_test)
```
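Usage note: `n_components` also accepts a float between 0 and 1, in which case scikit-learn keeps just enough components to explain that fraction of the total variance. A quick sketch, using `load_digits` as a stand-in dataset:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = StandardScaler().fit_transform(load_digits().data)

# Keep just enough components to explain 90% of the variance
pca = PCA(n_components=0.90)
X_pca = pca.fit_transform(X)

print(X_pca.shape[1])                               # number of components kept
print(pca.explained_variance_ratio_.cumsum()[-1])   # >= 0.90
```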
# Visual
- Converting 2D to 1D: PCA finds the best-fit line and projects the points onto it.
- Linear transformations: PCA can be viewed as a linear transformation built from the eigenvectors and eigenvalues of the data's covariance matrix.
- Final visualization: 5D data is projected down to 2D.
Other notes:
- `pca.components_` tells us, for each component, how much each original feature contributes to it (the weights of the linear combination).
- A scree plot shows how much variance is explained by each component (sketched below).
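A minimal scree-plot sketch with matplotlib; the dataset (`load_digits`) and component count are illustrative assumptions:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = StandardScaler().fit_transform(load_digits().data)
pca = PCA(n_components=10).fit(X)

# Scree plot: variance explained by each component, in order
plt.bar(range(1, 11), pca.explained_variance_ratio_)
plt.xlabel('Principal component')
plt.ylabel('Explained variance ratio')
plt.title('Scree plot')
plt.show()
```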

Ref:
- https://www.youtube.com/watch?v=HH8pouRwphA
- https://www.youtube.com/watch?v=PFDu9oVAE-g
- https://sebastianraschka.com/Articles/2015_pca_in_3_steps.html
- http://www.cs.otago.ac.nz/cosc453/student_tutorials/principal_components.pdf
- https://stats.stackexchange.com/questions/2691/making-sense-of-principal-component-analysis-eigenvectors-eigenvalues