KDnuggets Home » News » 2018 » Jun » Tutorials, Overviews » Step Forward Feature Selection: A Practical Example in Python (18:n24)
Step Forward Feature Selection: A Practical Example in Python
Tags: Feature Selection , Machine Learning , Python
Many methods for feature selection exist: some treat the process strictly as an art form, others as a science, while in reality some combination of domain knowledge and a disciplined approach is likely your best bet.
When it comes to disciplined approaches to feature selection, wrapper methods are those which marry the feature selection process to the type of model being built, evaluating feature subsets in order to compare model performance between them, and subsequently selecting the best-performing subset. In other words, instead of existing as an independent process taking place prior to model building, wrapper methods attempt to optimize the feature selection process for a given machine learning algorithm in tandem with that algorithm.
Two prominent wrapper methods for feature selection are step forward feature selection and step backward feature selection.
Step forward feature selection starts with the evaluation of each individual feature, and selects the one which results in the best-performing model. What's "best"? That depends entirely on the defined evaluation criteria (AUC, prediction accuracy, RMSE, etc.). Next, all possible combinations of that selected feature and a subsequent feature are evaluated, a second feature is selected, and so on, until the required predefined number of features is selected.
Step backward feature selection is closely related, and as you may have guessed starts with the entire set of features and works backward from there, removing features to find the optimal subset of a predefined size.
These are both potentially very computationally expensive. Do you have a large, multidimensional dataset? These methods may take too long to be at all useful, or may be totally infeasible. That said, with a dataset of accommodating size and dimensionality, such an approach may well be your best option.
To see how they work, let's take a look at step forward feature selection, specifically. Note that, as discussed, a machine learning algorithm must be defined prior to beginning our symbiotic feature selection process.
Keep in mind that an optimized set of features selected using a given algorithm may or may not perform equally well with a different algorithm. If we select features using logistic regression, for example, there is no guarantee that these same features will perform optimally if we then try them out using K-nearest neighbors, or an SVM.
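Before turning to a library, it helps to see how short the greedy loop behind step forward selection really is. The sketch below is a hand-rolled illustration, not the article's code: the `forward_select` helper and the synthetic dataset are assumptions made so the snippet is self-contained. Each round, every remaining feature is scored by cross-validation alongside the features already chosen, and the winner is added.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def forward_select(X, y, estimator, k):
    """Greedy step forward selection: each round, add the single
    remaining feature whose inclusion gives the best CV score."""
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(k):
        # Score every candidate subset (already-selected features + one more)
        scores = {f: cross_val_score(estimator, X[:, selected + [f]], y, cv=3).mean()
                  for f in remaining}
        best = max(scores, key=scores.get)  # best-performing candidate
        selected.append(best)
        remaining.remove(best)
    return selected

# Synthetic stand-in data: 8 features, 4 of them informative
X, y = make_classification(n_samples=300, n_features=8, n_informative=4,
                           random_state=0)
clf = RandomForestClassifier(n_estimators=25, random_state=0)
chosen = forward_select(X, y, clf, k=3)
print(chosen)
```

Note that the estimator is refit from scratch for every candidate subset, which is exactly where the computational expense discussed above comes from.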
 
Implementing Feature Selection and Building a Model
 
So, how do we perform step forward feature selection in Python? Sebastian Raschka's mlxtend library includes an implementation (SequentialFeatureSelector), and so we will use it to demonstrate. It goes without saying that you should have mlxtend installed before moving forward (check the GitHub repo).
We will use a Random Forest classifier for feature selection and model building (which, again, are intimately related in the case of step forward feature selection).
We need data to use for demonstration, so let's use the wine quality dataset. Specifically, I have used the untouched winequality-white.csv file as input in the code below.
Arbitrarily, we will set the desired number of features to 5 (there are 12 in the dataset). We are able to compare the evaluation scores for each iteration of the feature selection process, so keep in mind that if we find that a smaller number of features has a better score, we can alternatively choose that best-performing subset to run with in our "live" model moving forward. Also keep in mind that setting our desired number of features too low could lead to a sub-optimal number and combination of features being decided upon (say, if some combination of 11 features in our case is better than the best combination of 5).
