Add wrapper methods #35

Closed
wants to merge 15 commits into from

be-marc
Sponsor Member

@be-marc be-marc commented Jun 20, 2019

Moved from #30
Fixes #24

Sounds reasonable. I think all kinds of termination should go into the terminator, and setting max_features is definitely a termination criterion. I would also put it into TerminatorEvaluations and document that it is only used for feature selection, not for hyperparameter tuning. This way FeatureSelectionForward can stay "simple" with only pe and tm.

It is more complicated than I thought. Now max_features is implemented in all Terminator* classes, but it is only used in conjunction with FeatureSelectionForward. It makes no sense to check for it if the user uses FeatureSelectionRandom: in that case, the maximum number of features only determines the design of the 0-1 encoding and is not a stopping criterion. I implemented it as follows: if FeatureSelectionForward is called, it activates the check for the maximum number of features in the Terminator* object.

I am not happy with this implementation because it adds a lot of code to the Terminator class that is only needed for FeatureSelectionForward. The other FeatureSelection* classes do not need it in order to work with the Terminator* classes. Moreover, the user has to pass the PerformanceEvaluation object to the Terminator object so that it can check the possible number of features, which again is only necessary for FeatureSelectionForward.

Do these two incorporate all functionality of mlr::getFeatSelResult()?
If you mean by "come up with an idea how to present results" possible visualizations, this should go into a separate PR for mlr3viz.
See also ?mlr::plotFilterValues().

$get_result now returns the result in the same way as mlr::getFeatSelResult(): a list with the selected features and the performance. mlr also has a function called analyzeFeatSelResult, which would be helpful for analyzing the result of FeatureSelectionForward, but mlr prints its analysis as text. Maybe it would be better to implement it in a machine-readable form this time, like a list?
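A minimal base-R sketch of what a machine-readable result could look like, assuming a hypothetical archive that stores one row per evaluated 0/1 encoding plus its performance. The feature names and performance values are toy values, not results from this PR, and result_from_archive is a hypothetical name rather than the PR's $get_result implementation.

```r
# Hypothetical archive: one row per evaluated 0/1 feature encoding.
# The performance column holds toy values, purely for illustration.
feature_names = c("age", "glucose", "insulin", "mass")
archive = data.frame(
  age         = c(1, 0, 1),
  glucose     = c(0, 1, 1),
  insulin     = c(0, 0, 0),
  mass        = c(0, 1, 1),
  performance = c(0.60, 0.70, 0.65)
)

# Hypothetical helper: pick the best row (higher performance is better) and
# return a plain list with the selected features and their performance.
result_from_archive = function(archive, feature_names) {
  best = which.max(archive$performance)
  bits = unlist(archive[best, feature_names])
  list(features = feature_names[bits == 1], performance = archive$performance[best])
}

result_from_archive(archive, feature_names)
```

Because the archive is an ordinary data.frame, it can be sorted, filtered, or plotted directly, which is one way to get a machine-readable counterpart to the text that analyzeFeatSelResult prints.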

What is the current behavior? 0/1 returns?

It translates 0/1 to feature names which is needed in all FeatureSelection* classes. Therefore, I made it a private method in the FeatureSelection class so that all FeatureSelection* classes can use it.
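As a rough illustration of that translation, here is a base-R sketch. bits_to_features is a hypothetical name, not the private method from the PR, and it assumes the encoding is a 0/1 vector in the same order as the task's feature names.

```r
# Hypothetical helper: translate a 0/1 encoding into the selected feature names.
# Assumes `bits` has one entry per feature, in the same order as `feature_names`.
bits_to_features = function(bits, feature_names) {
  stopifnot(length(bits) == length(feature_names), all(bits %in% c(0, 1)))
  feature_names[bits == 1]
}

feature_names = c("age", "glucose", "insulin", "mass", "pedigree")

# A single encoding.
bits_to_features(c(1, 0, 1, 0, 0), feature_names)

# A design of several encodings (one per row), as a random search might produce.
design = rbind(
  c(1, 1, 0, 0, 0),
  c(0, 0, 0, 1, 1)
)
lapply(seq_len(nrow(design)), function(i) bits_to_features(design[i, ], feature_names))
```

Keeping this in the shared base class means every FeatureSelection* subclass can work purely with 0/1 vectors and only convert to names at the boundary.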

@berndbischl
Sponsor Member

As you say, max features is really not a termination criterion. I guess that is bad design; it would only make sense like that for forward search.

But it might be a relevant side constraint for many algorithms, e.g. random search. So it should rather go into the control.

Does that make sense? Especially since your code is now uglier when that is in the terminator... M

@be-marc
Sponsor Member Author

be-marc commented Jun 20, 2019

Yes, it makes sense. We should keep max_features as a setting in the FeatureSelection* classes. FeatureSelectionForward can check if the maximum features are reached and then set $terminate = TRUE in the Terminator class. With this implementation the Terminator* classes remain clean. Thank you.
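A minimal sketch of that split, written in base R rather than R6 so it runs on its own. All names are hypothetical: an environment stands in for the Terminator and only tracks a generic evaluation budget, while max_features stays a setting of the forward selection, which sets terminate = TRUE itself once the limit is reached. The score function is a toy stand-in for a resampled performance measure.

```r
# Hypothetical stand-in for a Terminator object. An environment is mutable,
# so a change to tm$terminate is visible to everyone holding the reference.
new_terminator = function(max_evaluations) {
  tm = new.env()
  tm$max_evaluations = max_evaluations
  tm$evaluations = 0
  tm$terminate = FALSE
  tm
}

# Hypothetical forward selection. max_features is a setting of the method,
# not of the terminator; the method sets tm$terminate itself when it is reached.
# Simplification: it always adds the best candidate, with no "no improvement" stop.
forward_select = function(feature_names, score, tm, max_features = length(feature_names)) {
  selected = character(0)
  while (!tm$terminate) {
    candidates = setdiff(feature_names, selected)
    if (length(candidates) == 0) break
    # Score every one-feature extension of the current subset (higher is better).
    scores = vapply(candidates, function(f) score(c(selected, f)), numeric(1))
    tm$evaluations = tm$evaluations + length(candidates)
    selected = c(selected, candidates[which.max(scores)])
    # Generic budget check: this is what a Terminator would own.
    if (tm$evaluations >= tm$max_evaluations) tm$terminate = TRUE
    # Method-specific side constraint: owned by the forward selection.
    if (length(selected) >= max_features) tm$terminate = TRUE
  }
  selected
}

# Toy score (not a real performance measure): rewards two "useful" features
# and slightly penalizes subset size.
useful = c("glucose", "mass")
score = function(features) sum(features %in% useful) - 0.01 * length(features)

tm = new_terminator(max_evaluations = 100)
forward_select(c("age", "glucose", "insulin", "mass"), score, tm, max_features = 2)
tm$terminate
```

The terminator never needs to know about features, so it stays usable for hyperparameter tuning as well.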

@berndbischl
Sponsor Member

Well. I didn't do much 😀. Welcome.

}

super$initialize(id = "forward_selection", pe = pe, tm = tm,
settings = list(max_features = checkmate::assert_numeric(max_features,
Member

Instead of settings, we should use a param_set from paradox here, similar to how it is done in the Filter class.

Member

self$param_set = assert_param_set(param_set)

#' @family FeatureSelection
#' @examples
#' task = mlr3::mlr_tasks$get("pima")
#' measures = mlr3::mlr_measures$mget(c("classif.acc"))
Member

Suggested change:
- #' measures = mlr3::mlr_measures$mget(c("classif.acc"))
+ #' measures = mlr3::mlr_measures$get(c("classif.acc"))

public = list(
initialize = function(pe, tm, max_features = NA) {
if(is.na(max_features)) {
max_features = length(pe$task$feature_names)
Member

Here it should then say something like:

      super$initialize(
        id = id,
        [...]
        param_set = ParamSet$new(list([..])),
        param_vals = param_vals
      )

public = list(
initialize = function(pe, tm, max_features = NA, batch_size = 10) {
super$initialize(id = "random_selection", pe = pe, tm = tm,
settings = list(max_features = checkmate::assert_numeric(max_features,
Member

paradox::ParamSet()

state = NULL,

initialize = function(settings) {
self$settings = checkmate::assert_list(settings, names = "unique")
Member

paradox::ParamSet()

inherit = Terminator,
public = list(
initialize = function(max_evaluations) {
super$initialize(settings = list(max_evaluations = checkmate::assert_count(max_evaluations, positive = TRUE, coerce = TRUE)))
Member

paradox::ParamSet()

inherit = Terminator,
public = list(
initialize = function(threshold) {
super$initialize(settings = list(threshold = checkmate::assert_numeric(threshold)))
Member

paradox::ParamSet()

inherit = Terminator,
public = list(
initialize = function(max_time, units) {
super$initialize(settings = list(max_time = checkmate::assert_count(max_time, positive = TRUE, coerce = TRUE),
Member

paradox::ParamSet()

Member

@pat-s pat-s left a comment

Thanks Marc!

Question

Why a new PR and not simply changes to the old one? This splits the discussion somehow.

ToDo

  • As my comments indicate, we should use paradox::ParamSet() for everything related to settings or hyperparameters. A lot is still missing in mlr3featsel as well; for now, only FilterVariance has a ParamSet.
  • Please apply the style guide.

@be-marc
Sponsor Member Author

be-marc commented Jun 21, 2019

Why a new PR and not simply changes to the old one? This splits the discussion somehow.

The old PR was started from a fork. I switched the development to a branch.

I will change the other things. Thank you for the review.

@be-marc
Sponsor Member Author

be-marc commented Jun 22, 2019

I am writing a short summary of what is already done and what is not, because my time is limited for the rest of the month. I will be back on it next month. I think the general design can stay like this. The complicated cases like FeatureSelectionSequential and FeatureSelectionGenetic work with this design.

FeatureSelection

  • Sequential
  • Genetic
  • Random
  • Exhaustive

FeatureSelectionSequential

  • Misses the model without any predictor variable. Need to find out how this works in mlr
  • No sffs and sfbs
  • $get_path returns a list. Do we want something like analyzeFeatSelResult from mlr?

FeatureSelectionGenetic

  • Could return more information about each generation, but I left it out because I did not want to produce too much overhead
  • Do we want something like analyzeFeatSelResult from mlr?

Terminator

  • Evaluations
  • Runtime
  • PerformanceStep
  • Performance
  • Multiplexer
  • Messages why the feature selection stopped. Needs to work with max_features in FeatureSelectionSequential
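To illustrate how a Multiplexer could also report why the selection stopped, here is a base-R sketch in which each terminator is a predicate over a run state. The function names, the state fields (evaluations, start_time, perf), and the exact semantics of the Performance and PerformanceStep checks are assumptions made for this example, not taken from the PR.

```r
# Hypothetical terminators: each returns a predicate over a run state with
# fields evaluations (count), start_time (POSIXct) and perf (history, higher is better).
term_evaluations = function(max_evaluations) {
  function(state) state$evaluations >= max_evaluations
}
term_runtime = function(max_secs) {
  function(state) as.numeric(difftime(Sys.time(), state$start_time, units = "secs")) >= max_secs
}
term_performance = function(threshold) {
  # Assumed semantics: stop once the best performance so far reaches threshold.
  function(state) length(state$perf) > 0 && max(state$perf) >= threshold
}
term_perf_step = function(min_step) {
  # Assumed semantics: stop when the latest evaluation improved the best
  # performance by less than min_step.
  function(state) {
    n = length(state$perf)
    n >= 2 && (max(state$perf) - max(state$perf[-n])) < min_step
  }
}

# Multiplexer: stop as soon as any named terminator fires, and report which did,
# so the selection can tell the user why it stopped.
term_multiplexer = function(...) {
  terms = list(...)
  function(state) {
    fired = vapply(terms, function(t) t(state), logical(1))
    list(terminate = any(fired), reasons = names(terms)[fired])
  }
}

stop_check = term_multiplexer(
  evaluations = term_evaluations(50),
  runtime = term_runtime(60),
  performance = term_performance(0.8)
)
state = list(evaluations = 50, start_time = Sys.time(), perf = c(0.70, 0.75))
res = stop_check(state)
res
```

Returning the names of the terminators that fired gives the stop message for free, and a forward selection could add its own max_features reason alongside them.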

PerformanceEvaluator

  • For each feature combination, the task with all features is cloned and its features are adjusted according to the 0/1 encoding. Does this scale to big data sets?
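One way to avoid cloning a full task per combination is to keep a single data set and hand only the selected column names to the learner. The sketch below assumes lm() on the built-in mtcars data as the learner and in-sample RMSE as the measure, a simplification with no resampling; evaluate_bits is a hypothetical name. It also handles the all-zero encoding with a mean-only baseline, the model without any predictor variables noted as missing for FeatureSelectionSequential.

```r
# Hypothetical evaluator: score one 0/1 encoding against a shared data set.
# Only the selected column names are passed to lm() via a formula, so no
# per-combination copy of the full data set is made here.
evaluate_bits = function(bits, data, target) {
  features = setdiff(names(data), target)[bits == 1]
  if (length(features) == 0) {
    # Baseline without predictors: always predict the mean of the target.
    pred = rep(mean(data[[target]]), nrow(data))
  } else {
    fit = lm(reformulate(features, response = target), data = data)
    pred = fitted(fit)
  }
  # In-sample RMSE (lower is better); a real evaluator would resample.
  sqrt(mean((data[[target]] - pred)^2))
}

n_features = ncol(mtcars) - 1  # mpg is the target, 10 candidate features
evaluate_bits(rep(0, n_features), mtcars, "mpg")
evaluate_bits(c(0, 0, 0, 0, 1, 1, 0, 0, 0, 0), mtcars, "mpg")  # wt and qsec
```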

Misc

  • Something like AutoTuner in mlr3tuning
  • Vignette
  • Paradox

If you are missing a feature from mlr please write it here.

@pat-s
Member

pat-s commented Jun 23, 2019

I am writing a short summary of what is already done and what is not, because my time is limited for the rest of the month. I will be back on it next month.

No worries, you already did a lot here!

I think the general design can stay like this. The complicated cases like FeatureSelectionSequential and FeatureSelectionGenetic work with this design.

No need to do everything at once. It is even better if we split everything up into small PRs. So you can focus on getting one method working and then continue with the next one.

Misses the model without any predictor variable. Need to find out how this works in mlr

This is not so crucial and can be skipped for now. Just add it in an issue.

No sffs and sfbs

No problem, just do a new PR for these

$get_path returns a list. Do we want something like analyzeFeatSelResult from mlr?

Never used it so far, so I can't really comment on it. Sadly, there is also no example in the help page to quickly take a look at.
But this should also go into a separate PR.
Remember to keep PRs small (this is always hard, I constantly fail at this too 😅 )

The summary post is great. Please always put it in the first post of the PR (by editing it) so that newcomers do not have to search for it. 👍

Hope you're having fun, we're on the right track 🚀

@be-marc
Sponsor Member Author

be-marc commented Jun 23, 2019

Never used it so far, so I can't really comment on it. Sadly, there is also no example in the help page to quickly take a look at.

Last code example in https://mlr.mlr-org.com/articles/tutorial/feature_selection.html#select-a-feature-subset

Remember to keep PRs small (this is always hard, I constantly fail at this too 😅 )

Okay, I will only add the missing main parts in this PR; the smaller details will become issues/PRs after the merge.

Hope you're having fun, we're on the right track

Yes 😄

@pat-s
Member

pat-s commented Jul 27, 2019

Moved to https://github.com/mlr-org/mlr3fswrap.

@pat-s pat-s closed this Jul 27, 2019
@pat-s pat-s deleted the wrapper branch July 27, 2019 21:28
Successfully merging this pull request may close these issues.

Implementation of "wrapper" methods
3 participants