
Add wrapper methods random and forward feature selection #30

Closed
wants to merge 1 commit into from

Conversation

@be-marc be-marc commented Jun 10, 2019

fixes #24

This is a basic implementation of mlr's makeFeatSelControlRandom and makeFeatSelControlSequential. The overall design is similar to mlr3tuning's; I reused many of the descriptions and some of the code from that package.

Classes

FeatureSelection*

  • Implements the generate_states method, which generates different feature combinations (states) in a 0-1 encoding.
    • For FeatureSelectionRandom, batch_size combinations are generated per batch.
    • For FeatureSelectionForward, all combinations of one step are generated.
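As a sketch of the 0-1 state encoding described above (all names and sizes here are hypothetical, not the PR's actual code): each state is a binary vector over the features, and a batch of random states is just a binary matrix with one row per state.

```r
# Hypothetical sketch of generate_states for the random strategy,
# assuming 4 features and batch_size = 3. Each row is one state;
# a 1 means the corresponding feature is included.
n_features = 4
batch_size = 3
set.seed(1)
states = matrix(sample(c(0L, 1L), n_features * batch_size, replace = TRUE),
                nrow = batch_size, ncol = n_features)
# e.g. a row c(1, 0, 1, 0) selects the 1st and 3rd feature
```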

PerformanceEvaluator

  • Implements the evaluate_states method, which takes the states as an argument. For each state, the task with all features is cloned and a feature selection is applied based on the state's encoding. All states are evaluated with mlr3::benchmark.
  • On each call of evaluate_states, the states are stored as a list entry in self$states.
  • On each call of evaluate_states, the benchmark object is stored as a list entry in self$bmr.
  • Storing results in list entries is necessary so that FeatureSelectionForward can reconstruct the path of the stepwise selection.
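The step of applying a state's encoding to a task can be sketched as follows (a hypothetical helper, in the spirit of the binary_to_features function discussed below; names are assumptions):

```r
# Hypothetical sketch: convert a 0-1 state into the feature names it
# selects, which could then be passed to task$select().
binary_to_features = function(state, feature_names) {
  feature_names[state == 1L]
}

binary_to_features(c(1, 0, 1), c("age", "glucose", "mass"))
# selects "age" and "mass"
```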

Terminator

  • Works similarly to the Terminator class in mlr3tuning.
  • TerminatorPerformanceStep is specifically designed to work with FeatureSelectionForward. It compares the last two chosen states and terminates if the performance improvement falls below a given threshold.
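The termination check of TerminatorPerformanceStep can be sketched like this (a hypothetical standalone function, assuming a measure that is maximized; the actual class logic may differ):

```r
# Hypothetical sketch of the TerminatorPerformanceStep criterion:
# stop when the improvement between the last two chosen states
# drops below the threshold.
is_terminated = function(perf_last, perf_previous, threshold = 0.01) {
  (perf_last - perf_previous) < threshold
}
```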

Discussion

  • We need to come up with an idea of how to present the results. At the moment there are just two basic functions, get_result and get_optimization_path, which print the best feature combination or the steps of the feature selection.
  • Does it make sense to keep the helper function binary_to_features, which converts the 0-1 encoding to feature names, as a private method in FeatureSelection?
  • max_features is not implemented for FeatureSelectionForward because it is something the TerminatorPerformanceStep object needs to know. I need to come up with an elegant way to do this. Maybe we have to make max_features an argument of TerminatorPerformanceStep and remove it as a setting of FeatureSelectionForward.

be-marc commented Jun 11, 2019

Example FeatureSelectionRandom + TerminatorEvaluations

# Specify the task
task = mlr_tasks$get("boston_housing")

# Define the learner
learner = mlr_learners$get("regr.rpart")

# Choose resampling strategy
resampling = mlr_resamplings$get("cv", param_vals = list(folds = 5L))

# Specify performance evaluator
pe = PerformanceEvaluator$new(task = task,
                              learner = learner,
                              resampling = resampling)

# Specify terminator
tm = TerminatorEvaluations$new(max_evaluations = 10)

# Specify wrapper method
fs = FeatureSelectionRandom$new(pe = pe,
                                tm = tm,
                                batch_size = 10,
                                max_features = 8)

# Run feature selection
fs$calculate()

# Get best selection
fs$get_result()

be-marc commented Jun 11, 2019

Example FeatureSelectionForward + TerminatorPerformanceStep

# Specify the task
task = mlr_tasks$get("pima")

# Change measure
measures = mlr_measures$mget(c("classif.acc"))
task$measures = measures

# Define the learner
learner = mlr_learners$get("classif.rpart")

# Choose resampling strategy
resampling = mlr_resamplings$get("cv", param_vals = list(folds = 5L))

# Specify performance evaluator
pe = PerformanceEvaluator$new(task = task,
                              learner = learner,
                              resampling = resampling)

# Specify terminator
tm = TerminatorPerformanceStep$new(threshold = 0.01)

# Specify wrapper method
fs = FeatureSelectionForward$new(pe = pe, tm = tm)

# Run feature selection
fs$calculate()

# Get best selection
fs$get_result()

# Get optimization path
fs$get_optimization_path()

pat-s commented Jun 11, 2019

max_features is not implemented for FeatureSelectionForward because it is something the TerminatorPerformanceStep object needs to know. I need to come up with an elegant way to do this. Maybe we have to make max_features an argument for TerminatorPerformanceStep and remove it as a setting for FeatureSelectionForward

Sounds reasonable. I think all kinds of termination criteria should go into the terminator, and setting max_features is definitely a termination criterion. I would also put it into TerminatorEvaluations and document that it is only used for feature selection, not for hyperparameter tuning. This way FeatureSelectionForward can stay "simple" with only pe and tm.
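The suggestion of moving all stopping criteria into the terminator could be sketched like this (a hypothetical standalone function; the names and signature are assumptions, not the actual API):

```r
# Hypothetical sketch: the terminator owns every stopping criterion,
# including max_features. max_features would only be meaningful for
# feature selection and ignored for hyperparameter tuning.
should_terminate = function(n_evaluations, n_features,
                            max_evaluations = Inf, max_features = Inf) {
  n_evaluations >= max_evaluations || n_features >= max_features
}
```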

We need to come up with an idea how to present the results. At the moment there are just two basic functions get_result and get_optimization_path which print out the best feature combination or the steps of the feature selection.

Do these two cover all the functionality of mlr::getFeatSelResult()?
If by "come up with an idea of how to present the results" you mean possible visualizations, that should go into a separate PR for mlr3viz.
See also ?mlr::plotFilterValues().

Misc

  • Can you please use a branch of mlr3featsel for this PR instead of your fork? That makes it easier to check out the branch and run the code.

  • Does it make sense to have the helper function binary_to_features, which converts the 0-1 encoding to feature names as a private method in FeatureSelection?

    What is the current behavior? Does it return 0/1 values?

  • Thanks for the good work. It looks really good and is a huge contribution.

  • Please add yourself to the DESCRIPTION of the package as an author.

  • Check the Travis errors.

  • Please always add a "fixes XY" line as the first line of the PR so things are cross-linked and get closed automatically.

  • Tests are needed.

  • Examples in the functions are needed.

be-marc commented Jun 20, 2019

Moved to #35

@be-marc be-marc closed this Jun 20, 2019