
Add wrapper methods random and forward feature selection #30

Closed
wants to merge 1 commit into from

Conversation

@be-marc be-marc commented Jun 10, 2019

fixes #24

This is a basic implementation of mlr's makeFeatSelControlRandom and makeFeatSelControlSequential. The overall design is similar to mlr3tuning's; I reused many of the descriptions and some of the code from that package.

Classes

FeatureSelection*

  • Implements the generate_states method, which generates different feature combinations (states) in a 0-1 encoding.
    • For FeatureSelectionRandom, batch_size combinations are generated per batch.
    • For FeatureSelectionForward, all combinations of one step are generated.
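As a sketch of the 0-1 state encoding described above (all names and sizes here are hypothetical, not the PR's actual code): each state is a binary vector over the features, and a batch of random states is just a binary matrix with one row per state.

```r
# Hypothetical sketch of generate_states for the random strategy,
# assuming 4 features and batch_size = 3. Each row is one state;
# a 1 means the corresponding feature is included.
n_features = 4
batch_size = 3
set.seed(1)
states = matrix(sample(c(0L, 1L), n_features * batch_size, replace = TRUE),
                nrow = batch_size, ncol = n_features)
# e.g. a row c(1, 0, 1, 0) selects the 1st and 3rd feature
```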

PerformanceEvaluator

  • Implements the evaluate_states method, which takes the states as an argument. For each state, the task with all features is cloned and a feature selection is applied based on the state's encoding. All states are evaluated with mlr3::benchmark.
  • On each call of evaluate_states, the states are stored as a list entry in self$states.
  • On each call of evaluate_states, the benchmark object is stored as a list entry in self$bmr.
  • Storing results in list entries is necessary so that FeatureSelectionForward can reconstruct the path of the stepwise selection.
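The step of applying a state's encoding to a task can be sketched as follows (a hypothetical helper, in the spirit of the binary_to_features function discussed below; names are assumptions):

```r
# Hypothetical sketch: convert a 0-1 state into the feature names it
# selects, which could then be passed to task$select().
binary_to_features = function(state, feature_names) {
  feature_names[state == 1L]
}

binary_to_features(c(1, 0, 1), c("age", "glucose", "mass"))
# selects "age" and "mass"
```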

Terminator

  • Works similarly to the Terminator class in mlr3tuning.
  • TerminatorPerformanceStep is specifically designed to work with FeatureSelectionForward. It compares the last two chosen states and terminates if the performance improvement falls below a given threshold.
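The termination check of TerminatorPerformanceStep can be sketched like this (a hypothetical standalone function, assuming a measure that is maximized; the actual class logic may differ):

```r
# Hypothetical sketch of the TerminatorPerformanceStep criterion:
# stop when the improvement between the last two chosen states
# drops below the threshold.
is_terminated = function(perf_last, perf_previous, threshold = 0.01) {
  (perf_last - perf_previous) < threshold
}
```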

Discussion

  • We need to come up with an idea of how to present the results. At the moment there are just two basic functions, get_result and get_optimization_path, which print the best feature combination or the steps of the feature selection.
  • Does it make sense to keep the helper function binary_to_features, which converts the 0-1 encoding to feature names, as a private method in FeatureSelection?
  • max_features is not implemented for FeatureSelectionForward because it is something the TerminatorPerformanceStep object needs to know. I need to come up with an elegant way to do this. Maybe we have to make max_features an argument of TerminatorPerformanceStep and remove it as a setting of FeatureSelectionForward.

be-marc commented Jun 11, 2019

Example FeatureSelectionRandom + TerminatorEvaluations

# Specify the task
task = mlr_tasks$get("boston_housing")

# Define the learner
learner = mlr_learners$get("regr.rpart")

# Choose resampling strategy
resampling = mlr_resamplings$get("cv", param_vals = list(folds = 5L))

# Specify performance evaluator
pe = PerformanceEvaluator$new(task = task,
                              learner = learner,
                              resampling = resampling)

# Specify terminator
tm = TerminatorEvaluations$new(max_evaluations = 10)

# Specify wrapper method
fs = FeatureSelectionRandom$new(pe = pe,
                                tm = tm,
                                batch_size = 10,
                                max_features = 8)

# Run feature selection
fs$calculate()

# Get best selection
fs$get_result()

be-marc commented Jun 11, 2019

Example FeatureSelectionForward + TerminatorPerformanceStep

# Specify the task
task = mlr_tasks$get("pima")

# Change measure
measures = mlr_measures$mget(c("classif.acc"))
task$measures = measures

# Define the learner
learner = mlr_learners$get("classif.rpart")

# Choose resampling strategy
resampling = mlr_resamplings$get("cv", param_vals = list(folds = 5L))

# Specify performance evaluator
pe = PerformanceEvaluator$new(task = task,
                              learner = learner,
                              resampling = resampling)

# Specify terminator
tm = TerminatorPerformanceStep$new(threshold = 0.01)

# Specify wrapper method
fs = FeatureSelectionForward$new(pe = pe, tm = tm)

# Run feature selection
fs$calculate()

# Get best selection
fs$get_result()

# Get optimization path
fs$get_optimization_path()

pat-s commented Jun 11, 2019

max_features is not implemented for FeatureSelectionForward because it is something the TerminatorPerformanceStep object needs to know. I need to come up with an elegant way to do this. Maybe we have to make max_features an argument for TerminatorPerformanceStep and remove it as a setting for FeatureSelectionForward

Sounds reasonable. I think all kinds of termination criteria should go into the terminator, and setting max_features is definitely a termination criterion. I would also put it into TerminatorEvaluations and document that it is only used for feature selection, not for hyperparameter tuning. This way FeatureSelectionForward can stay "simple" with only pe and tm.
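The suggestion of moving all stopping criteria into the terminator could be sketched like this (a hypothetical standalone function; the names and signature are assumptions, not the actual API):

```r
# Hypothetical sketch: the terminator owns every stopping criterion,
# including max_features. max_features would only be meaningful for
# feature selection and ignored for hyperparameter tuning.
should_terminate = function(n_evaluations, n_features,
                            max_evaluations = Inf, max_features = Inf) {
  n_evaluations >= max_evaluations || n_features >= max_features
}
```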

We need to come up with an idea how to present the results. At the moment there are just two basic functions get_result and get_optimization_path which print out the best feature combination or the steps of the feature selection.

Do these two cover all the functionality of mlr::getFeatSelResult()?
If by "come up with an idea of how to present the results" you mean possible visualizations, that should go into a separate PR for mlr3viz.
See also ?mlr::plotFilterValues().

Misc

  • Can you please use a branch of mlr3featsel for this PR instead of your fork? That makes it easier to check out the branch and run the code.

  • Does it make sense to have the helper function binary_to_features, which converts the 0-1 encoding to feature names as a private method in FeatureSelection?

    What is the current behavior? Does it return 0/1 values?

  • Thanks for the good work. It looks really good and is a huge contribution.

  • Please add yourself to the DESCRIPTION of the package as an author.

  • Check the Travis errors.

  • Please always add a "fixes XY" line as the first line of the PR so things are cross-linked and get closed automatically.

  • Tests are needed.

  • Examples in the functions are needed.

be-marc commented Jun 20, 2019

Moved to #35

@be-marc be-marc closed this Jun 20, 2019