# Feature Importance
# Model Weights
Models like linear / logistic regression assign each feature a coefficient (weight).
The larger the absolute value of the coefficient, the more important the feature is. This only holds when the features are on a comparable scale, so standardize them first.
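A minimal sketch with scikit-learn (dataset and top-5 printout are illustrative; features are standardized so the coefficient magnitudes are comparable):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

# Scale features so coefficient magnitudes are comparable
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X, y)

# Rank features by absolute coefficient value
coefs = model.named_steps["logisticregression"].coef_[0]
for name, coef in sorted(zip(X.columns, coefs), key=lambda t: -abs(t[1]))[:5]:
    print(f"{name}: {coef:.3f}")
```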
# Impurity Measurement (Tree)
Impurity is quantified by the splitting criterion of decision trees (Gini, entropy, mean squared error).
A feature's importance is the total reduction in impurity it contributes across all splits.
This method is strongly biased: it favors high-cardinality features such as numerical features.
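A minimal sketch with a random forest (dataset chosen for illustration; the `feature_importances_` attribute holds the impurity-based importances):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, y)

# Impurity-based importances, normalized to sum to 1
for name, imp in sorted(zip(X.columns, forest.feature_importances_),
                        key=lambda t: -t[1])[:5]:
    print(f"{name}: {imp:.3f}")
```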
# Permutation Importance
This method can be used with any fitted model.
In scikit-learn it is available as `sklearn.inspection.permutation_importance` (see the sketch after the steps below).
Steps:
- calculate a baseline metric (model score on a validation set)
- for each feature, shuffle its values and recompute the metric
- repeat step 2 several times and compute the mean per feature
- the permutation importance is the difference from the baseline
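A minimal sketch using `sklearn.inspection.permutation_importance` (dataset, model, and `n_repeats=10` are illustrative choices; the importance is measured on a held-out set):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Shuffle each feature n_repeats times and measure the drop in score
result = permutation_importance(model, X_test, y_test,
                                n_repeats=10, random_state=0)

for name, mean, std in sorted(zip(X.columns,
                                  result.importances_mean,
                                  result.importances_std),
                              key=lambda t: -t[1])[:5]:
    print(f"{name}: {mean:.3f} +/- {std:.3f}")
```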
# Recursive Feature Elimination
Train a model on the entire set of features and compute `coef_` or `feature_importances_`. Prune the least important feature. Repeat until the minimum number of features is reached. `RFECV` does this with cross-validation to pick the number of features automatically.
```python
>>> from sklearn.datasets import make_friedman1
>>> from sklearn.feature_selection import RFECV
>>> from sklearn.svm import SVR
>>> X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)
>>> estimator = SVR(kernel="linear")
>>> selector = RFECV(estimator, step=1, cv=5)
>>> selector = selector.fit(X, y)
>>> selector.support_
array([ True,  True,  True,  True,  True, False, False, False, False,
       False])
>>> selector.ranking_
array([1, 1, 1, 1, 1, 6, 4, 3, 2, 5])
```
`support_` is True for the features that are selected.
In `ranking_`, a value of 1 means the feature is selected; higher values correspond to features that were eliminated earlier.