One important aspect of machine learning is determining which features are informative and which can be considered redundant. This matters especially when the data set has hundreds or even thousands of features. This process is often referred to as feature selection.
There is a plethora of methods employed for feature selection (e.g., the article titled "An Introduction to Variable and Feature Selection" provides a nice overview, and the series of posts on this website is supplemented with Python code).
An important goal in feature selection is feature ranking. The scikit-learn package provides a versatile class called RFE that ranks the features for a given model by recursively eliminating the least important (most redundant) feature(s). The following is quoted from the official scikit-learn documentation for RFE:
Feature ranking with recursive feature elimination.
Given an external estimator that assigns weights to features (e.g., the coefficients of a linear model), the goal of recursive feature elimination (RFE) is to select features by recursively considering smaller and smaller sets of features. First, the estimator is trained on the initial set of features and weights are assigned to each one of them. Then, features whose absolute weights are the smallest are pruned from the current set features. That procedure is recursively repeated on the pruned set until the desired number of features to select is eventually reached.
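To make the procedure concrete, here is a minimal sketch of recursive feature elimination written directly in NumPy, assuming ordinary least squares as the external estimator and one feature pruned per round. The function name rfe_ranking and the synthetic data are illustrative assumptions, not part of scikit-learn; the ranking convention (selected features get rank 1, features pruned earlier get larger ranks) is meant to mirror the ranking_ attribute of scikit-learn's RFE.

```python
import numpy as np


def rfe_ranking(X, y, n_features_to_select=1, step=1):
    """Recursive feature elimination with ordinary least squares as the estimator.

    Returns one rank per feature: selected features get rank 1 and features
    pruned in earlier rounds get larger ranks.
    """
    n_samples, n_features = X.shape
    support = np.ones(n_features, dtype=bool)  # features still in the current set
    ranking = np.ones(n_features, dtype=int)
    while support.sum() > n_features_to_select:
        features = np.flatnonzero(support)
        # Train the estimator on the current set: least squares with an intercept column.
        A = np.column_stack([X[:, features], np.ones(n_samples)])
        coef = np.linalg.lstsq(A, y, rcond=None)[0][:-1]  # drop the intercept weight
        # Prune the features whose absolute weights are the smallest.
        n_drop = min(step, support.sum() - n_features_to_select)
        weakest = features[np.argsort(np.abs(coef))[:n_drop]]
        support[weakest] = False
        ranking[~support] += 1  # every feature outside the set moves one rank down
    return ranking


rng = np.random.default_rng(42)
X = rng.normal(size=(200, 4))
# Hypothetical data: feature 1 is irrelevant; features 0, 2 and 3 have decreasing true weights.
y = 4.0 * X[:, 0] + 2.0 * X[:, 2] + 0.5 * X[:, 3] + 0.1 * rng.normal(size=200)
print(rfe_ranking(X, y))  # → [1 4 2 3]
```

Feature 1 has no effect on y, so its weight is near zero and it is pruned first; feature 0 has the largest absolute weight and survives every round.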
The critical information regarding RFE is found in its description of the estimator parameter:
estimator : object
A supervised learning estimator with a fit method that updates a coef_ attribute that holds the fitted parameters. Important features must correspond to high absolute values in the coef_ array.
In other words, RFE works only if:
- The estimator provides a coef_ attribute after fitting.
- Important features correspond to high absolute values in the coef_ array.
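One way to see the first requirement in practice is to hand RFE two estimators: LinearRegression, which sets coef_ during fit, and KNeighborsRegressor, which exposes neither coef_ nor feature_importances_. This is a sketch on made-up data; the helper rfe_ranking_or_none is hypothetical, and the exception is caught broadly because its type may differ between scikit-learn versions. Recent scikit-learn releases also accept a feature_importances_ attribute or a custom importance_getter, so the coef_ requirement quoted above is stricter than what current versions enforce.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
# Hypothetical target: only features 0 and 2 matter.
y = X @ np.array([3.0, 0.0, 1.0, 0.0]) + 0.1 * rng.normal(size=100)


def rfe_ranking_or_none(estimator, X, y):
    """Hypothetical helper: RFE's ranking_, or None if RFE cannot read feature weights."""
    try:
        return RFE(estimator, n_features_to_select=1).fit(X, y).ranking_
    except (ValueError, RuntimeError):  # the exception type may differ between versions
        return None


# LinearRegression stores its fitted weights in coef_, so RFE can rank the features.
linear_ranking = rfe_ranking_or_none(LinearRegression(), X, y)
# k-nearest neighbours has no coef_ (or feature_importances_), so RFE cannot proceed.
knn_ranking = rfe_ranking_or_none(KNeighborsRegressor(), X, y)
print(linear_ranking)  # features 0 and 2 get ranks 1 and 2
print(knn_ranking)     # → None
```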
In this post, we study how scaling affects RFE, since the coef_ attribute can be directly affected by the scaling method applied to the input features.
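The sketch below illustrates the effect on synthetic data (an assumption made purely for illustration, not a result from a real data set). Feature x0 is measured on a scale roughly 100 times larger than x1, so its raw coefficient is small even though it drives most of the variation in y; x2 is pure noise. Running RFE with LinearRegression on the raw features and on features standardized with StandardScaler yields different rankings. The helper name rank_features is hypothetical.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 1000
# Hypothetical data: x0 has std ~100, x1 has std ~1, x2 is noise unrelated to y.
X = np.column_stack([
    rng.normal(scale=100.0, size=n),
    rng.normal(scale=1.0, size=n),
    rng.normal(scale=1.0, size=n),
])
# x0 contributes ~5 units of spread to y (0.05 * 100); x1 contributes only ~1.
y = 0.05 * X[:, 0] + 1.0 * X[:, 1] + 0.1 * rng.normal(size=n)


def rank_features(X, y):
    """Rank features with RFE, eliminating one feature per round down to one."""
    return RFE(LinearRegression(), n_features_to_select=1).fit(X, y).ranking_


raw_ranking = rank_features(X, y)
scaled_ranking = rank_features(StandardScaler().fit_transform(X), y)
print("raw:   ", raw_ranking)     # → [2 1 3]: raw coef of x1 (~1) beats x0 (~0.05)
print("scaled:", scaled_ranking)  # → [1 2 3]: standardized coef of x0 (~5) beats x1 (~1)
```

Because the coefficients of a linear model are expressed in units of y per unit of each feature, the raw ranking favors features that happen to be measured on small scales; standardizing puts every coefficient on a per-standard-deviation footing, which is what RFE's "high absolute value means important" assumption implicitly needs.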