diff --git a/DESCRIPTION b/DESCRIPTION index cc5b6a42..00bc6741 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -16,7 +16,11 @@ Authors@R: family = "Bischl", role = "aut", email = "bernd_bischl@gmx.net", - comment = c(ORCID = "0000-0001-6002-6980"))) + comment = c(ORCID = "0000-0001-6002-6980")), + person(given = "Marc", + family = "Becker", + role = "aut", + email = "marcbecker@posteo.de")) Description: Implements methods for feature selection and filtering in mlr3. License: MIT + file LICENSE @@ -57,6 +61,11 @@ NeedsCompilation: no Roxygen: list(markdown = TRUE) RoxygenNote: 6.1.1 Collate: + 'FeatureSelection.R' + 'FeatureSelectionExhaustive.R' + 'FeatureSelectionGenetic.R' + 'FeatureSelectionRandom.R' + 'FeatureSelectionSequential.R' 'Filter.R' 'FilterAUC.R' 'FilterCMIM.R' @@ -73,6 +82,11 @@ Collate: 'FilterSymmetricalUncertainty.R' 'FilterVariableImportance.R' 'FilterVariance.R' + 'PerformanceEvaluator.R' + 'Terminator.R' + 'TerminatorEvaluations.R' + 'TerminatorPerformanceStep.R' + 'TerminatorRuntime.R' 'helpers.R' 'mlr_filters.R' 'reexports.R' diff --git a/NAMESPACE b/NAMESPACE index 617b7eaa..daa3adcb 100644 --- a/NAMESPACE +++ b/NAMESPACE @@ -2,6 +2,11 @@ S3method(as.data.table,DictionaryFilter) S3method(as.data.table,Filter) +export(FeatureSelection) +export(FeatureSelectionExhaustive) +export(FeatureSelectionGenetic) +export(FeatureSelectionRandom) +export(FeatureSelectionSequential) export(Filter) export(FilterAUC) export(FilterCMIM) @@ -18,6 +23,11 @@ export(FilterRankCorrelation) export(FilterSymmetricalUncertainty) export(FilterVariableImportance) export(FilterVariance) +export(PerformanceEvaluator) +export(Terminator) +export(TerminatorEvaluations) +export(TerminatorPerformanceStep) +export(TerminatorRuntime) export(as.data.table) export(mlr_filters) import(checkmate) diff --git a/R/FeatureSelection.R b/R/FeatureSelection.R new file mode 100644 index 00000000..2de30739 --- /dev/null +++ b/R/FeatureSelection.R @@ -0,0 +1,71 @@ +#' @title Abstract 
FeatureSelection Class +#' +#' @description `FeatureSelection` class that implements the main functionality each fs must have. An fs is an object that describes the optimization method for choosing among the features of the task stored in the `[PerformanceEvaluator]` object. +#' +#' @section Usage: +#' ``` +#' # Construction +#' fs = FeatureSelection$new(id, pe, tm, settings = list()) +#' +#' # public members +#' fs$id +#' fs$pe +#' fs$tm +#' fs$settings +#' +#' # public methods +#' fs$calculate() +#' ``` +#' @section Arguments: +#' * `id` (`character(1)`):\cr +#' The id of the FeatureSelection. +#' * `pe` (`[PerformanceEvaluator]`). +#' * `tm` (`[Terminator]`). +#' * `settings` (`list`):\cr +#' The settings for the FeatureSelection. +#' +#' @section Details: +#' * `$new()` creates a new object of class `[FeatureSelection]`. +#' * `$id` stores an identifier for this `[FeatureSelection]`. +#' * `$pe` stores the [PerformanceEvaluator] to optimize. +#' * `$tm` stores the `[Terminator]`. +#' * `$settings` is a list of settings for this `[FeatureSelection]`. +#' * `$state` stores currently evaluated 0/1 encoded feature combinations. +#' * `$calculate()` performs the feature selection until the budget of the `[Terminator]` stored in `$tm` is exhausted. 
+#' @name FeatureSelection +#' @family FeatureSelection +NULL + +#' @export +FeatureSelection = R6Class("FeatureSelection", + public = list( + id = NULL, + pe = NULL, + tm = NULL, + settings = NULL, + state = NULL, + + initialize = function(id, pe, tm, settings = list()) { + self$id = checkmate::assert_string(id) + self$pe = checkmate::assert_r6(pe, "PerformanceEvaluator") + self$tm = checkmate::assert_r6(tm, "Terminator") + self$settings = checkmate::assert_list(settings, names = "unique") + }, + + calculate = function() { + while (!self$tm$terminated) { + private$calculate_step() + } + } + ), + private = list( + binary_to_features = function(binary_features) { + self$pe$task$feature_names[as.logical(binary_features)] + }, + eval_states_terminator = function(states) { + self$tm$update_start(self$pe) + self$pe$eval_states(states) + self$tm$update_end(self$pe) + } + ) +) diff --git a/R/FeatureSelectionExhaustive.R b/R/FeatureSelectionExhaustive.R new file mode 100644 index 00000000..e52862c0 --- /dev/null +++ b/R/FeatureSelectionExhaustive.R @@ -0,0 +1,102 @@ +#' @title FeatureSelectionExhaustive +#' +#' @description +#' FeatureSelection child class to conduct exhaustive search +#' +#' @section Usage: +#' ``` +#' fs = FeatureSelectionExhaustive$new() +#' ``` +#' See [FeatureSelection] for a description of the interface. +#' +#' @section Arguments: +#' * `pe` (`[PerformanceEvaluator]`). +#' * `tm` (`[Terminator]`). +#' * `max_features` (`integer(1)`) +#' Maximum number of features +#' +#' @section Details: +#' `$new()` creates a new object of class [FeatureSelectionExhaustive]. +#' `$get_result()` Returns best feature combination. +#' The interface is described in [FeatureSelection]. 
+#' +#' @name FeatureSelectionExhaustive +#' @family FeatureSelection +#' @examples +#' task = mlr3::mlr_tasks$get("pima") +#' task$select(c("age", "glucose", "insulin", "mass")) +#' learner = mlr3::mlr_learners$get("classif.rpart") +#' resampling = mlr3::mlr_resamplings$get("cv", param_vals = list(folds = 5L)) +#' pe = PerformanceEvaluator$new(task = task, learner = learner, resampling = resampling) +#' tm = TerminatorRuntime$new(max_time = 20, units = "secs") +#' fs = FeatureSelectionExhaustive$new(pe = pe, tm = tm, max_features = 3) +#' fs$calculate() +#' fs$get_result() +NULL + +#' @export +#' @include FeatureSelection.R + +FeatureSelectionExhaustive = R6Class("FeatureSelectionExhaustive", + inherit = FeatureSelection, + public = list( + initialize = function(pe, tm, max_features = NA) { + if (is.na(max_features)) { + max_features = length(pe$task$feature_names) + } + + super$initialize(id = "exhaustive_selection", pe = pe, tm = tm, + settings = list( + max_features = checkmate::assert_numeric( + max_features, + lower = 1, + upper = length(pe$task$feature_names)))) + + self$state = private$generate_states(1) + }, + + get_result = function() { + if (length(self$pe$bmr) > 1) { + bmr = lapply(self$pe$bmr[1:length(self$pe$bmr)], + function(bmr) self$pe$bmr[[1]]$combine(bmr)) + } else { + bmr = self$pe$bmr + } + bmr_best = bmr[[length(bmr)]]$get_best(self$pe$task$measures[[1L]]$id) + list( + features = bmr_best$task$feature_names, + performance = bmr_best$aggregated) + } + ), + private = list( + calculate_step = function() { + # Convert 0/1 states to feature names + named_states = lapply(self$state, private$binary_to_features) + + # Evaluation + private$eval_states_terminator(named_states) + + # Generate new states + self$state = private$generate_states( + min((sum(self$state[[1]]) + 1), self$settings$max_features)) + }, + generate_states = function(feature_count) { + combinations = combn(length(self$pe$task$feature_names), feature_count) + self$state = 
lapply(seq_len(ncol(combinations)), function(j) { + state = rep(0, length(self$pe$task$feature_names)) + state[combinations[, j]] = 1 + state + }) + }, + eval_states_terminator = function(states) { + self$tm$update_start(self$pe) + self$pe$eval_states(states) + self$tm$update_end(self$pe) + + # Side-effect stop + if (!self$tm$terminated) { + self$tm$terminated = (length(states[[1]]) == self$settings$max_features) + } + } + ) +) diff --git a/R/FeatureSelectionGenetic.R b/R/FeatureSelectionGenetic.R new file mode 100644 index 00000000..88a7281e --- /dev/null +++ b/R/FeatureSelectionGenetic.R @@ -0,0 +1,159 @@ +#' @title FeatureSelectionGenetic +#' +#' @description +#' FeatureSelection child class to conduct genetic search. The comma strategy `(mu, lambda)` selects a new population of size `mu` out of the `lambda >= mu` offspring. The plus strategy `(mu + lambda)` uses the joint pool of `mu` parents and `lambda` offspring for selecting `mu` new candidates. From those `mu` feature sets, `lambda` new feature sets are generated by randomly choosing pairs of parents. These are crossed over, and `crossover_rate` is the probability of choosing a feature from the first parent instead of the second parent. The resulting offspring is mutated, i.e., its bits are flipped with probability `mutation_rate`. If `max_features` is set, offspring are repeatedly generated until the setting is satisfied. +#' +#' @section Usage: +#' ``` +#' fs = FeatureSelectionGenetic$new() +#' ``` +#' See [FeatureSelection] for a description of the interface. +#' +#' @section Arguments: +#' * `pe` (`[PerformanceEvaluator]`). +#' * `tm` (`[Terminator]`). +#' * `mu` (`integer(1)`) +#' Size of the parent population. +#' * `lambda` (`integer(1)`) +#' Size of the children population. +#' * `crossover_rate` (`numeric(1)`) +#' Probability of choosing a bit from the first parent during crossover. +#' * `mutation_rate` (`numeric(1)`) +#' Probability of flipping a feature bit, i.e. 
switching between selecting and deselecting a feature. +#' * `max_features` (`integer(1)`) +#' Maximum number of features. +#' * `strategy` (`character(1)`) +#' `plus` or `comma`. Indicates whether to use a (mu + lambda) or (mu, lambda) genetic algorithm. +#' +#' @section Details: +#' `$new()` creates a new object of class [FeatureSelectionGenetic]. +#' `$get_result()` Returns best feature combination with performance. +#' `$get_path()` Returns each generation (`mu` feature sets and performances) as a list entry. +#' The interface is described in [FeatureSelection]. +#' +#' @name FeatureSelectionGenetic +#' @family FeatureSelection +#' @examples +#' task = mlr3::mlr_tasks$get("pima") +#' measures = mlr3::mlr_measures$mget(c("classif.acc")) +#' task$measures = measures +#' learner = mlr3::mlr_learners$get("classif.rpart") +#' resampling = mlr3::mlr_resamplings$get("cv", param_vals = list(folds = 5L)) +#' pe = PerformanceEvaluator$new(task = task, learner = learner, resampling = resampling) +#' tm = TerminatorRuntime$new(max_time = 20, units = "secs") +#' fs = FeatureSelectionGenetic$new(pe = pe, tm = tm, mu = 10, lambda = 20, strategy = "plus") +#' fs$calculate() +#' fs$get_result() +#' fs$get_path() +NULL + +#' @export +#' @include FeatureSelection.R + +FeatureSelectionGenetic = R6Class("FeatureSelectionGenetic", + inherit = FeatureSelection, + public = list( + initialize = function(pe, tm, mu, lambda, crossover_rate = 0.5, + mutation_rate = 0.05, max_features = NA, strategy = "plus") { + super$initialize(id = "genetic_selection", pe = pe, tm = tm, + settings = list( + max_features = checkmate::assert_numeric(max_features, lower = 1, upper = length(pe$task$feature_names)), + mu = checkmate::assert_numeric(mu), + lambda = checkmate::assert_numeric(lambda), + crossover_rate = checkmate::assert_numeric(crossover_rate, lower = 0, upper = 1), + mutation_rate = checkmate::assert_numeric(mutation_rate, lower = 0, upper = 1), + strategy = checkmate::assert_string(strategy, 
pattern = "(^comma$|^plus$)"))) + + if (strategy == "comma" & lambda < mu) { + stop("For comma strategy lambda >= mu") + } + + self$state = private$initialize_states() + }, + + get_result = function() { + if (length(self$pe$bmr) > 1) { + bmr = lapply(self$pe$bmr[1:length(self$pe$bmr)], function(bmr) self$pe$bmr[[1]]$combine(bmr)) + } else { + bmr = self$pe$bmr + } + bmr_best = bmr[[length(bmr)]]$get_best(self$pe$task$measures[[1L]]$id) + list( + features = bmr_best$task$feature_names, + performance = bmr_best$aggregated) + }, + get_path = function() { + lapply(self$pe$bmr, function(bmr) { + aggr = bmr$aggregated() + aggr = setorderv(aggr, self$pe$task$measures[[1]]$id, order = -1)[1:self$settings$mu, ] + performance = aggr[, self$pe$task$measures[[1]]$id, with = FALSE][[1]] + features = lapply(aggr$task, function(task) task$feature_names) + list( + features = features, + performance = performance) + }) + } + ), + private = list( + calculate_step = function() { + + # Generate population depending on strategy + if (self$settings$strategy == "plus") { + states = c(self$state, private$generate_states()) + } else if (self$settings$strategy == "comma") { + states = private$generate_states() + } + named_states = lapply(states, private$binary_to_features) + + # Evaluation + private$eval_states_terminator(named_states) + bmr = self$pe$bmr[[length(self$pe$bmr)]] + + # Select mu best results + aggr = bmr$aggregated() + aggr = setorderv(aggr, self$pe$task$measures[[1]]$id, order = -1)[1:self$settings$mu, ] + + # Convert feature names to 0/1 encoding and set state + features = lapply(aggr$task, function(task) { + task$feature_names + }) + self$state = lapply(features, function(y) { + as.numeric(Reduce("|", lapply(y, function(x) x == self$pe$task$feature_names))) + }) + }, + initialize_states = function() { + lapply(seq_len(self$settings$mu), function(i) { + if (is.na(self$settings$max_features)) { + return(rbinom(length(self$pe$task$feature_names), 1, 0.5)) + } + x = Inf + 
while (sum(x) >= self$settings$max_features) { + x = rbinom(length(self$pe$task$feature_names), 1, 0.5) + } + return(x) + }) + }, + generate_states = function() { + lapply(seq_len(self$settings$lambda), function(i) { + while (TRUE) { + # Randomly select parents + parents = sample(1:length(self$state), 2, replace = TRUE) + + # Crossover + cross = rbinom(length(self$state[[parents[1]]]), 1, self$settings$crossover_rate) + children = ifelse(cross == 1, self$state[[parents[1]]], self$state[[parents[2L]]]) + + # Mutation + mutation = rbinom(length(self$state[[parents[1]]]), 1, self$settings$mutation_rate) + children = (children + mutation) %% 2 + + # Check max features + if (is.na(self$settings$max_features) || sum(children) <= self$settings$max_features) { + break + } + } + return(children) + }) + } + ) +) diff --git a/R/FeatureSelectionRandom.R b/R/FeatureSelectionRandom.R new file mode 100644 index 00000000..4f489559 --- /dev/null +++ b/R/FeatureSelectionRandom.R @@ -0,0 +1,95 @@ +#' @title FeatureSelectionRandom +#' +#' @description +#' FeatureSelection child class to conduct random search +#' +#' @section Usage: +#' ``` +#' fs = FeatureSelectionRandom$new() +#' ``` +#' See [FeatureSelection] for a description of the interface. +#' +#' @section Arguments: +#' * `pe` (`[PerformanceEvaluator]`). +#' * `tm` (`[Terminator]`). +#' * `max_features` (`integer(1)`) +#' Maximum number of features +#' * `batch_size` (`integer(1)`): +#' Maximum number of feature combinations to try in a batch. +#' Each batch is possibly executed in parallel via [mlr3::benchmark()]. +#' +#' @section Details: +#' `$new()` creates a new object of class [FeatureSelectionRandom]. +#' `$get_result()` Returns best feature combination. +#' The interface is described in [FeatureSelection]. 
+#' +#' @name FeatureSelectionRandom +#' @family FeatureSelection +#' @examples +#' task = mlr3::mlr_tasks$get("boston_housing") +#' learner = mlr3::mlr_learners$get("regr.rpart") +#' resampling = mlr3::mlr_resamplings$get("cv", param_vals = list(folds = 5L)) +#' pe = PerformanceEvaluator$new(task = task, learner = learner, resampling = resampling) +#' tm = TerminatorEvaluations$new(max_evaluations = 20) +#' fs = FeatureSelectionRandom$new(pe, tm, batch_size = 10, max_features = 8) +#' fs$calculate() +#' fs$get_result() +NULL + +#' @export +#' @include FeatureSelection.R + +FeatureSelectionRandom = R6Class("FeatureSelectionRandom", + inherit = FeatureSelection, + public = list( + initialize = function(pe, tm, max_features = NA, batch_size = 10) { + super$initialize(id = "random_selection", pe = pe, tm = tm, + settings = list( + max_features = checkmate::assert_numeric( + max_features, + lower = 1, + upper = length(pe$task$feature_names)), + batch_size = checkmate::assert_numeric(batch_size))) + + self$state = private$generate_states() + }, + + get_result = function() { + if (length(self$pe$bmr) > 1) { + bmr = lapply(self$pe$bmr[1:length(self$pe$bmr)], function(bmr) self$pe$bmr[[1]]$combine(bmr)) + } else { + bmr = self$pe$bmr + } + bmr_best = bmr[[length(bmr)]]$get_best(self$pe$task$measures[[1L]]$id) + list( + features = bmr_best$task$feature_names, + performance = bmr_best$aggregated) + } + ), + private = list( + calculate_step = function() { + + # Convert 0/1 states to feature names + named_states = lapply(self$state, private$binary_to_features) + + # Evaluation + private$eval_states_terminator(named_states) + + # Generate new states + self$state = private$generate_states() + }, + generate_states = function() { + lapply(seq_len(self$settings$batch_size), function(i) { + if (is.na(self$settings$max_features)) { + return(rbinom(length(self$pe$task$feature_names), 1, 0.5)) + } + x = Inf + while (sum(x) >= self$settings$max_features) { + x = 
rbinom(length(self$pe$task$feature_names), 1, 0.5) + } + return(x) + } + ) + } + ) +) diff --git a/R/FeatureSelectionSequential.R b/R/FeatureSelectionSequential.R new file mode 100644 index 00000000..31b7556e --- /dev/null +++ b/R/FeatureSelectionSequential.R @@ -0,0 +1,130 @@ +#' @title FeatureSelectionSequential +#' +#' @description +#' FeatureSelection child class to conduct sequential search. +#' +#' @section Usage: +#' ``` +#' fs = FeatureSelectionSequential$new() +#' ``` +#' See [FeatureSelection] for a description of the interface. +#' +#' @section Arguments: +#' * `pe` (`[PerformanceEvaluator]`). +#' * `tm` (`[Terminator]`). +#' * `max_features` (`integer(1)`) +#' Maximum number of features +#' * `strategy` (`character(1)`). +#' Forward selection `fsf` or backward selection `fsb`. +#' +#' @section Details: +#' `$new()` creates a new object of class [FeatureSelectionSequential]. +#' `$get_result()` Returns the best feature combination of the last step; `$get_path()` returns the best feature combination of each step. +#' The interface is described in [FeatureSelection]. 
+#' +#' Each step is possibly executed in parallel via [mlr3::benchmark()] +#' +#' @name FeatureSelectionSequential +#' @family FeatureSelection +#' @examples +#' task = mlr3::mlr_tasks$get("pima") +#' measures = mlr3::mlr_measures$mget(c("classif.acc")) +#' task$measures = measures +#' learner = mlr3::mlr_learners$get("classif.rpart") +#' resampling = mlr3::mlr_resamplings$get("cv", param_vals = list(folds = 5L)) +#' pe = PerformanceEvaluator$new(task, learner, resampling) +#' tm = TerminatorPerformanceStep$new(threshold = 0.01) +#' fs = FeatureSelectionSequential$new(pe, tm) +#' fs$calculate() +#' fs$get_result() +NULL + +#' @export +#' @include FeatureSelection.R + +FeatureSelectionSequential = R6Class("FeatureSelectionSequential", + inherit = FeatureSelection, + public = list( + initialize = function(pe, tm, max_features = NA, strategy = "fsf") { + if (is.na(max_features)) { + max_features = length(pe$task$feature_names) + } + + super$initialize(id = "sequential_selection", pe = pe, tm = tm, + settings = list( + max_features = checkmate::assert_numeric( + max_features, + lower = 1, + upper = length(pe$task$feature_names)), + strategy = checkmate::assert_string( + strategy, + pattern = "(^fsf$|^fsb$)"))) + + if (strategy == "fsf") { + self$state = private$generate_states(rep(0, length(pe$task$feature_names))) + } else if (strategy == "fsb") { + self$state = rep(list(rep(1, length(pe$task$feature_names))), length(pe$task$feature_names)) + } + }, + + get_result = function() { + bmr = self$pe$bmr[[length(self$pe$bmr)]]$get_best(self$pe$task$measures[[1L]]$id) + list( + features = bmr$task$feature_names, + performance = bmr$aggregated) + }, + get_path = function() { + lapply(self$pe$bmr, function(bmr) { + bmr = bmr$get_best(self$pe$task$measures[[1L]]$id) + list( + features = bmr$task$feature_names, + performance = bmr$aggregated) + }) + } + ), + private = list( + calculate_step = function() { + + # Convert 0/1 states to feature names + named_states = 
lapply(self$state, private$binary_to_features) + + # Evaluation + private$eval_states_terminator(named_states) + + # Select best state + bmr = self$pe$get_best() + features = bmr[[length(bmr)]]$features + best_state = as.numeric(Reduce("|", lapply(features, function(x) x == self$pe$task$feature_names))) + + # Generate new states + self$state = private$generate_states(best_state) + }, + generate_states = function(state) { + x = ifelse(self$settings$strategy == "fsf", 0, 1) + y = ifelse(self$settings$strategy == "fsf", 1, 0) + new_states = list() + for (i in seq_along(state)) { + if (state[i] == x) { + changed_state = state + changed_state[i] = y + new_states[[length(new_states) + 1]] = changed_state + } + } + new_states + }, + eval_states_terminator = function(states) { + self$tm$update_start(self$pe) + self$pe$eval_states(states) + self$tm$update_end(self$pe) + + # Side-effect stop + if (!self$tm$terminated) { + if (self$settings$strategy == "fsf") { + self$tm$terminated = (length(states[[1]]) == self$settings$max_features) + } else if (self$settings$strategy == "fsb") { + self$tm$terminated = (length(states[[1]]) == 1) + } + } + } + ) +) diff --git a/R/PerformanceEvaluator.R b/R/PerformanceEvaluator.R new file mode 100644 index 00000000..9e5fc6be --- /dev/null +++ b/R/PerformanceEvaluator.R @@ -0,0 +1,88 @@ +#' @title Abstract PerformanceEvaluator Class +#' +#' @description +#' `PerformanceEvaluator` class that implements the performance evaluation on a set of feature combinations. A pe is an object that stores all information necessary to conduct a feature selection (`mlr3::Task`, `mlr3::Learner`, `mlr3::Resampling`). +#' +#' @section Usage: +#' ``` +#' # Construction +#' pe = PerformanceEvaluator$new(task, learner, resampling) +#' +#' # Public members +#' pe$task +#' pe$learner +#' pe$resampling +#' pe$bmr +#' +#' # Public methods +#' pe$eval_states(states) +#' pe$get_best() +#' ``` +#' +#' @section Arguments: +#' * `task` (`mlr3::Task`): +#' The task that we want to evaluate. 
+#' * `learner` (`mlr3::Learner`): +#' The learner that we want to evaluate. +#' * `resampling` (`mlr3::Resampling`): +#' The Resampling method that is used to evaluate the learner. +#' +#' @section Details: +#' * `$new()` creates a new object of class [PerformanceEvaluator]. +#' * `$task` (`mlr3::Task`) the task for which the feature selection should be conducted. +#' * `$learner` (`mlr3::Learner`) the algorithm for which the feature selection should be conducted. +#' * `$resampling` (`mlr3::Resampling`) strategy to evaluate a feature combination. +#' * `$bmr` (`list`) list of `mlr3::BenchmarkResult` objects. Each entry corresponds to one batch or step, depending on the feature selection method used. +#' * `$eval_states(states)` evaluates the feature combinations `states`. +#' * `$get_best()` returns the selected features with the best performance for each entry in `$bmr`. +#' +#' @name PerformanceEvaluator +#' @keywords internal +#' @family PerformanceEvaluator +#' @examples +#' task = mlr3::mlr_tasks$get("iris") +#' learner = mlr3::mlr_learners$get("classif.rpart") +#' resampling = mlr3::mlr_resamplings$get("holdout") +#' pe = PerformanceEvaluator$new(task, learner, resampling) +NULL + +#' @export +PerformanceEvaluator = R6Class("PerformanceEvaluator", + public = list( + task = NULL, + learner = NULL, + resampling = NULL, + bmr = list(), + + initialize = function(task, learner, resampling) { + self$task = mlr3::assert_task(task) + self$learner = mlr3::assert_learner(learner, task = task) + self$resampling = mlr3::assert_resampling(resampling) + }, + + eval_states = function(states) { + # For each state, clone task and set feature subset + task_list <- list() + for (state in states) { + task = self$task$clone() + task$select(state) + task_list[[length(task_list) + 1]] <- task + } + + # Evaluate + new_bmr = benchmark(data.table::data.table( + task = task_list, + learner = list(self$learner), + resampling = list(self$resampling))) + self$bmr[[length(self$bmr) + 1]] <- 
new_bmr + }, + get_best = function() { + lapply(self$bmr, function(x) { + rr = x$get_best(self$task$measures[[1L]]$id) + list( + features = rr$task$feature_names, + performance = mean(rr$performance(self$task$measures[[1L]]$id))) + }) + } + ) +) diff --git a/R/Terminator.R b/R/Terminator.R new file mode 100644 index 00000000..63e4a027 --- /dev/null +++ b/R/Terminator.R @@ -0,0 +1,52 @@ +#' @title Abstract Terminator Class +#' +#' @description Abstract `Terminator` class that implements the main functionality each terminator must have. A terminator is an object that says when to stop the feature selection. +#' +#' @section Usage: +#' ``` +#' # Construction +#' tm = Terminator$new(settings) +#' +#' # Public members +#' tm$terminated +#' tm$state +#' +#' # Public methods +#' tm$update_start(pe) +#' tm$update_end(pe) +#' ``` +#' +#' @section Arguments: +#' * `settings` (`list()`): settings that define the stopping criteria. +#' +#' @section Details: +#' * `$new()` creates a new object of class [Terminator]. +#' * `$terminated` (`logical(1)`) indicates whether the termination criterion is met. Updated by each call of `update_start()`/`update_end()`. +#' * `$settings` (`list()`) settings that are set by the child classes to define stopping criteria. +#' * `$state` (`list()`) arbitrary state of the Terminator. Gets updated with each call of `update_start()` and `update_end()`. +#' * `$update_start()` is called in each feature selection iteration before the evaluation. +#' * `$update_end()` is called in each feature selection iteration after the evaluation. 
+#' @name Terminator +#' @keywords internal +#' @family Terminator +NULL + +#' @export +Terminator = R6Class("Terminator", + public = list( + terminated = NULL, + settings = NULL, + state = NULL, + + initialize = function(settings) { + self$settings = checkmate::assert_list(settings, names = "unique") + }, + + update_start = function(pe) { + stop("$update_start() not implemented for Terminator") + }, + update_end = function(pe) { + stop("$update_end() not implemented for Terminator") + } + ) +) diff --git a/R/TerminatorEvaluations.R b/R/TerminatorEvaluations.R new file mode 100644 index 00000000..d73fd604 --- /dev/null +++ b/R/TerminatorEvaluations.R @@ -0,0 +1,60 @@ +#' @title TerminatorEvaluations +#' +#' @description +#' Terminator child class to terminate the feature selection after a given number of evaluations. The count is the total number of feature combinations evaluated so far, i.e. the summed row counts of the aggregated benchmark results in `pe$bmr`. +#' +#' @section Usage: +#' ``` +#' tm = TerminatorEvaluations$new(max_evaluations) +#' ``` +#' See [Terminator] for a description of the interface. +#' +#' @section Arguments: +#' * `max_evaluations` (`integer(1)`): +#' Maximum number of evaluated feature combinations. +#' +#' @section Details: +#' `$new()` creates a new object of class [TerminatorEvaluations]. +#' +#' The interface is described in [Terminator]. 
+#' +#' @name TerminatorEvaluations +#' @family Terminator +#' @examples +#' task = mlr3::mlr_tasks$get("iris") +#' learner = mlr3::mlr_learners$get("classif.rpart") +#' resampling = mlr3::mlr_resamplings$get("holdout") +#' pe = PerformanceEvaluator$new(task, learner, resampling) +#' tm = TerminatorEvaluations$new(max_evaluations = 100) +NULL + +#' @export +#' @include Terminator.R +TerminatorEvaluations = R6Class("TerminatorEvaluations", + inherit = Terminator, + public = list( + initialize = function(max_evaluations) { + super$initialize(settings = list(max_evaluations = checkmate::assert_count(max_evaluations, positive = TRUE, coerce = TRUE))) + + self$state = list(evals = 0L) + self$terminated = FALSE + }, + + update_start = function(pe) { + if (length(pe$bmr) < 1) { + self$state$evals = 0L + } else { + row_num = lapply(pe$bmr, function(bmr) nrow(bmr$aggregated())) + self$state$evals = Reduce("sum", row_num) + } + + self$terminated = self$state$evals >= self$settings$max_evaluations + + invisible(self) + }, + + update_end = function(pe) { + self$update_start(pe) + } + ) +) diff --git a/R/TerminatorPerformanceStep.R b/R/TerminatorPerformanceStep.R new file mode 100644 index 00000000..af04e7b4 --- /dev/null +++ b/R/TerminatorPerformanceStep.R @@ -0,0 +1,64 @@ +#' @title TerminatorPerformanceStep +#' +#' @description +#' Terminator child class to terminate the sequential feature selection if the model performance does not improve by more than a specified threshold in the next step. +#' +#' @section Usage: +#' ``` +#' tm = TerminatorPerformanceStep$new(threshold) +#' ``` +#' See [Terminator] for a description of the interface. +#' +#' @section Arguments: +#' * `threshold` (`numeric(1)`): +#' The feature selection is terminated if the performance improvement between two steps does not exceed the threshold. +#' +#' @section Details: +#' `$new()` creates a new object of class [TerminatorPerformanceStep]. +#' +#' The interface is described in [Terminator]. 
+#' +#' @name TerminatorPerformanceStep +#' @family Terminator +#' @examples +#' task = mlr3::mlr_tasks$get("iris") +#' learner = mlr3::mlr_learners$get("classif.rpart") +#' resampling = mlr3::mlr_resamplings$get("holdout") +#' pe = PerformanceEvaluator$new(task, learner, resampling) +#' tm = TerminatorPerformanceStep$new(threshold = 0.01) +NULL + +#' @export +#' @include Terminator.R +TerminatorPerformanceStep = R6Class("TerminatorPerformanceStep", + inherit = Terminator, + public = list( + initialize = function(threshold) { + super$initialize( + settings = list(threshold = checkmate::assert_numeric(threshold))) + + self$terminated = FALSE + self$state = list(step_performance = NA) + }, + + update_start = function(pe) { + invisible(self) + }, + update_end = function(pe) { + bmr = pe$get_best() + if (!is.na(self$state$step_performance)) { + if (pe$task$measures[[1]]$minimize) { + if (self$state$step_performance - bmr[[length(bmr)]]$performance <= self$settings$threshold) { + self$terminated = TRUE + } + } else { + if (bmr[[length(bmr)]]$performance - self$state$step_performance <= self$settings$threshold) { + self$terminated = TRUE + } + } + } + self$state$step_performance = bmr[[length(bmr)]]$performance + invisible(self) + } + ) +) diff --git a/R/TerminatorRuntime.R b/R/TerminatorRuntime.R new file mode 100644 index 00000000..3e8a9786 --- /dev/null +++ b/R/TerminatorRuntime.R @@ -0,0 +1,71 @@ +#' @title TerminatorRuntime Class +#' +#' @description +#' Terminator child class to terminate the feature selection after a specified amount of time. Note that the runtime is checked only after each step, so the final runtime may exceed the specified limit. Time is measured for everything that happens between `update_start()` and `update_end()`. +#' @section Usage: +#' ``` +#' tm = TerminatorRuntime$new(max_time, units) +#' ``` +#' See [Terminator] for a description of the interface. 
+#' +#' @section Arguments: +#' * `max_time` (`integer(1)`): +#' Maximum amount of time, measured in `units`. +#' * `units` (`character(1)`): +#' Unit used for measuring time. Possible choices are "secs", "mins", "hours", "days", and "weeks", which +#' are passed directly to `difftime()`. +#' +#' @section Details: +#' `$new()` creates a new object of class [TerminatorRuntime]. +#' +#' The interface is described in [Terminator]. +#' +#' @name TerminatorRuntime +#' @family Terminator +#' @examples +#' task = mlr3::mlr_tasks$get("iris") +#' learner = mlr3::mlr_learners$get("classif.rpart") +#' resampling = mlr3::mlr_resamplings$get("holdout") +#' pe = PerformanceEvaluator$new(task, learner, resampling) +#' tm = TerminatorRuntime$new(max_time = 5, units = "secs") +NULL + +#' @export +#' @include Terminator.R +TerminatorRuntime = R6Class("TerminatorRuntime", + inherit = Terminator, + public = list( + initialize = function(max_time, units) { + super$initialize(settings = list( + max_time = checkmate::assert_count( + max_time, + positive = TRUE, + coerce = TRUE), + units = checkmate::assert_choice( + units, + choices = c("secs", "mins", "hours", "days", "weeks")))) + + self$state = list( + time_start = NULL, + time_end = NULL, + time_remaining = self$settings$max_time) + self$terminated = FALSE + }, + + update_start = function(pe) { + self$state$time_start = Sys.time() + invisible(self) + }, + + update_end = function(pe) { + self$state$time_end = Sys.time() + dtime = difftime( + time1 = self$state$time_end, + time2 = self$state$time_start, + units = self$settings$units) + self$state$time_remaining = self$state$time_remaining - dtime + self$terminated = self$state$time_remaining < 0 + invisible(self) + } + ) +) diff --git a/man/FeatureSelection.Rd b/man/FeatureSelection.Rd new file mode 100644 index 00000000..29231880 --- /dev/null +++ b/man/FeatureSelection.Rd @@ -0,0 +1,55 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/FeatureSelection.R 
+\name{FeatureSelection} +\alias{FeatureSelection} +\title{Abstract FeatureSelection Class} +\description{ +\code{FeatureSelection} class that implements the main functionality each fs must have. An fs is an object that describes the optimization method for choosing the features given within the \code{[PerformanceEvaluator]} object. +} +\section{Usage}{ +\preformatted{# Construction +fs = FeatureSelection$new(id, pe, tm, settings = list()) + +# public members +fs$id +fs$pe +fs$tm +fs$settings + +# public methods +fs$calculate() +} +} + +\section{Arguments}{ + +\itemize{ +\item \code{id} (\code{character(1)}):\cr +The id of the FeatureSelection. +\item \code{pe} (\code{[PerformanceEvaluator]}). +\item \code{tm} (\code{[Terminator]}). +\item \code{settings} (\code{list}):\cr +The settings for the FeatureSelection. +} +} + +\section{Details}{ + +\itemize{ +\item \code{$new()} creates a new object of class \code{[FeatureSelection]}. +\item \code{$id} stores an identifier for this \code{[FeatureSelection]}. +\item \code{$pe} stores the \link{PerformanceEvaluator} to optimize. +\item \code{$tm} stores the \code{[Terminator]}. +\item \code{$settings} is a list of settings for this \code{[FeatureSelection]}. +\item \code{$state} stores the currently evaluated 0/1 encoded feature combinations. +\item \code{$calculate()} performs the feature selection until the budget of the \code{[Terminator]} in the \code{[PerformanceEvaluator]} is exhausted. 
+} +} + +\seealso{ +Other FeatureSelection: \code{\link{FeatureSelectionExhaustive}}, + \code{\link{FeatureSelectionGenetic}}, + \code{\link{FeatureSelectionRandom}}, + \code{\link{FeatureSelectionSequential}} +} +\concept{FeatureSelection} diff --git a/man/FeatureSelectionExhaustive.Rd b/man/FeatureSelectionExhaustive.Rd new file mode 100644 index 00000000..e897b112 --- /dev/null +++ b/man/FeatureSelectionExhaustive.Rd @@ -0,0 +1,50 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/FeatureSelectionExhaustive.R +\name{FeatureSelectionExhaustive} +\alias{FeatureSelectionExhaustive} +\title{FeatureSelectionExhaustive} +\description{ +FeatureSelection child class to conduct exhaustive search +} +\section{Usage}{ +\preformatted{fs = FeatureSelectionExhaustive$new() +} + +See \link{FeatureSelection} for a description of the interface. +} + +\section{Arguments}{ + +\itemize{ +\item \code{pe} (\code{[PerformanceEvaluator]}). +\item \code{tm} (\code{[Terminator]}). +\item \code{max_features} (\code{integer(1)}) +Maximum number of features +} +} + +\section{Details}{ + +\code{$new()} creates a new object of class \link{FeatureSelectionExhaustive}. +\code{$get_result()} Returns best feature combination. +The interface is described in \link{FeatureSelection}. 
+} + +\examples{ +task = mlr3::mlr_tasks$get("pima") +task$select(c("age", "glucose", "insulin", "mass")) +learner = mlr3::mlr_learners$get("classif.rpart") +resampling = mlr3::mlr_resamplings$get("cv", param_vals = list(folds = 5L)) +pe = PerformanceEvaluator$new(task = task, learner = learner, resampling = resampling) +tm = TerminatorRuntime$new(max_time = 20, units = "secs") +fs = FeatureSelectionExhaustive$new(pe = pe, tm = tm, max_features = 3) +fs$calculate() +fs$get_result() +} +\seealso{ +Other FeatureSelection: \code{\link{FeatureSelectionGenetic}}, + \code{\link{FeatureSelectionRandom}}, + \code{\link{FeatureSelectionSequential}}, + \code{\link{FeatureSelection}} +} +\concept{FeatureSelection} diff --git a/man/FeatureSelectionGenetic.Rd b/man/FeatureSelectionGenetic.Rd new file mode 100644 index 00000000..e061e06f --- /dev/null +++ b/man/FeatureSelectionGenetic.Rd @@ -0,0 +1,63 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/FeatureSelectionGenetic.R +\name{FeatureSelectionGenetic} +\alias{FeatureSelectionGenetic} +\title{FeatureSelectionGenetic} +\description{ +FeatureSelection child class to conduct genetic search. The comma strategy \code{(mu, lambda)} selects a new population of size \code{mu} out of the \code{lambda > mu} offspring. The plus strategy \code{(mu + lambda)} uses the joint pool of \code{mu} parents and \code{lambda} offspring for selecting \code{mu} new candidates. Out of those \code{mu} features, the new \code{lambda} features are generated by randomly choosing pairs of parents. These are crossed over and \code{crossover_rate} represents the probability of choosing a feature from the first parent instead of the second parent. The resulting offspring is mutated, i.e., its bits are flipped with probability \code{mutation_rate}. If \code{max_features} is set, offspring are repeatedly generated until the setting is satisfied. 
+} +\section{Usage}{ +\preformatted{fs = FeatureSelectionGenetic$new() +} + +See \link{FeatureSelection} for a description of the interface. +} + +\section{Arguments}{ + +\itemize{ +\item \code{pe} (\code{[PerformanceEvaluator]}). +\item \code{tm} (\code{[Terminator]}). +\item \code{mu} (\code{integer(1)}) +Size of the parent population. +\item \code{lambda} (\code{integer(1)}) +Size of the children population. +\item \code{crossover_rate} (\code{numeric(1)}) +Probability of choosing a bit from the first parent during crossover. +\item \code{mutation_rate} (\code{numeric(1)}) +Probability of flipping a feature bit, i.e., switching between selecting and deselecting a feature. +\item \code{max_features} (\code{integer(1)}) +Maximum number of features. +\item \code{strategy} (\code{character(1)}) +\code{plus} or \code{comma}. Indicates whether to use a (mu + lambda) or (mu, lambda) genetic algorithm. +} +} + +\section{Details}{ + +\code{$new()} creates a new object of class \link{FeatureSelectionGenetic}. +\code{$get_result()} returns the best feature combination with its performance. +\code{$get_path()} returns each generation (\code{mu} feature sets and performances) as a list entry. +The interface is described in \link{FeatureSelection}. 
+} + +\examples{ +task = mlr3::mlr_tasks$get("pima") +measures = mlr3::mlr_measures$mget(c("classif.acc")) +task$measures = measures +learner = mlr3::mlr_learners$get("classif.rpart") +resampling = mlr3::mlr_resamplings$get("cv", param_vals = list(folds = 5L)) +pe = PerformanceEvaluator$new(task = task, learner = learner, resampling = resampling) +tm = TerminatorRuntime$new(max_time = 20, units = "secs") +fs = FeatureSelectionGenetic$new(pe = pe, tm = tm, mu = 10, lambda = 20, strategy = "plus") +fs$calculate() +fs$get_result() +fs$get_path() +} +\seealso{ +Other FeatureSelection: \code{\link{FeatureSelectionExhaustive}}, + \code{\link{FeatureSelectionRandom}}, + \code{\link{FeatureSelectionSequential}}, + \code{\link{FeatureSelection}} +} +\concept{FeatureSelection} diff --git a/man/FeatureSelectionRandom.Rd b/man/FeatureSelectionRandom.Rd new file mode 100644 index 00000000..b025bcba --- /dev/null +++ b/man/FeatureSelectionRandom.Rd @@ -0,0 +1,52 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/FeatureSelectionRandom.R +\name{FeatureSelectionRandom} +\alias{FeatureSelectionRandom} +\title{FeatureSelectionRandom} +\description{ +FeatureSelection child class to conduct random search. +} +\section{Usage}{ +\preformatted{fs = FeatureSelectionRandom$new() +} + +See \link{FeatureSelection} for a description of the interface. +} + +\section{Arguments}{ + +\itemize{ +\item \code{pe} (\code{[PerformanceEvaluator]}). +\item \code{tm} (\code{[Terminator]}). +\item \code{max_features} (\code{integer(1)}) +Maximum number of features. +\item \code{batch_size} (\code{integer(1)}): +Maximum number of feature combinations to try in a batch. +Each batch is possibly executed in parallel via \code{\link[mlr3:benchmark]{mlr3::benchmark()}}. +} +} + +\section{Details}{ + +\code{$new()} creates a new object of class \link{FeatureSelectionRandom}. +\code{$get_result()} returns the best feature combination. 
+The interface is described in \link{FeatureSelection}. +} + +\examples{ +task = mlr3::mlr_tasks$get("boston_housing") +learner = mlr3::mlr_learners$get("regr.rpart") +resampling = mlr3::mlr_resamplings$get("cv", param_vals = list(folds = 5L)) +pe = PerformanceEvaluator$new(task = task, learner = learner, resampling = resampling) +tm = TerminatorEvaluations$new(max_evaluations = 20) +fs = FeatureSelectionRandom$new(pe, tm, batch_size = 10, max_features = 8) +fs$calculate() +fs$get_result() +} +\seealso{ +Other FeatureSelection: \code{\link{FeatureSelectionExhaustive}}, + \code{\link{FeatureSelectionGenetic}}, + \code{\link{FeatureSelectionSequential}}, + \code{\link{FeatureSelection}} +} +\concept{FeatureSelection} diff --git a/man/FeatureSelectionSequential.Rd b/man/FeatureSelectionSequential.Rd new file mode 100644 index 00000000..7ed399d4 --- /dev/null +++ b/man/FeatureSelectionSequential.Rd @@ -0,0 +1,55 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/FeatureSelectionSequential.R +\name{FeatureSelectionSequential} +\alias{FeatureSelectionSequential} +\title{FeatureSelectionSequential} +\description{ +FeatureSelection child class to conduct sequential search. +} +\section{Usage}{ +\preformatted{fs = FeatureSelectionSequential$new() +} + +See \link{FeatureSelection} for a description of the interface. +} + +\section{Arguments}{ + +\itemize{ +\item \code{pe} (\code{[PerformanceEvaluator]}). +\item \code{tm} (\code{[Terminator]}). +\item \code{max_features} (\code{integer(1)}) +Maximum number of features +\item \code{strategy} (\code{character(1)}). +Forward selection \code{fsf} or backward selection \code{fsb}. +} +} + +\section{Details}{ + +\code{$new()} creates a new object of class \link{FeatureSelectionSequential}. +\code{$get_result()} Returns selected features in each step. +The interface is described in \link{FeatureSelection}. 
+ +Each step is possibly executed in parallel via \code{\link[mlr3:benchmark]{mlr3::benchmark()}}. +} + +\examples{ +task = mlr3::mlr_tasks$get("pima") +measures = mlr3::mlr_measures$mget(c("classif.acc")) +task$measures = measures +learner = mlr3::mlr_learners$get("classif.rpart") +resampling = mlr3::mlr_resamplings$get("cv", param_vals = list(folds = 5L)) +pe = PerformanceEvaluator$new(task, learner, resampling) +tm = TerminatorPerformanceStep$new(threshold = 0.01) +fs = FeatureSelectionSequential$new(pe, tm) +fs$calculate() +fs$get_result() +} +\seealso{ +Other FeatureSelection: \code{\link{FeatureSelectionExhaustive}}, + \code{\link{FeatureSelectionGenetic}}, + \code{\link{FeatureSelectionRandom}}, + \code{\link{FeatureSelection}} +} +\concept{FeatureSelection} diff --git a/man/PerformanceEvaluator.Rd b/man/PerformanceEvaluator.Rd new file mode 100644 index 00000000..5e6a580c --- /dev/null +++ b/man/PerformanceEvaluator.Rd @@ -0,0 +1,57 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/PerformanceEvaluator.R +\name{PerformanceEvaluator} +\alias{PerformanceEvaluator} +\title{Abstract PerformanceEvaluator Class} +\description{ +\code{PerformanceEvaluator} class that implements the performance evaluation on a set of feature combinations. A pe is an object that stores all information necessary to conduct a feature selection (\code{mlr3::Task}, \code{mlr3::Learner}, \code{mlr3::Resampling}). +} +\section{Usage}{ +\preformatted{# Construction +pe = PerformanceEvaluator$new(task, learner, resampling) + +# Public members +pe$task +pe$learner +pe$resampling +pe$bmr + +# Public methods +pe$eval_states(states) +pe$get_best() +} +} + +\section{Arguments}{ + +\itemize{ +\item \code{task} (\code{mlr3::Task}): +The task that we want to evaluate. +\item \code{learner} (\code{mlr3::Learner}): +The learner that we want to evaluate. +\item \code{resampling} (\code{mlr3::Resampling}): +The resampling method that is used to evaluate the learner. 
+} +} + +\section{Details}{ + +\itemize{ +\item \code{$new()} creates a new object of class \link{PerformanceEvaluator}. +\item \code{$task} (\code{mlr3::Task}) the task for which the feature selection should be conducted. +\item \code{$learner} (\code{mlr3::Learner}) the algorithm for which the feature selection should be conducted. +\item \code{$resampling} (\code{mlr3::Resampling}) strategy to evaluate a feature combination. +\item \code{$bmr} (\code{list}) of \code{mlr3::BenchmarkResult} objects. Each entry corresponds to one batch or step, depending on the feature selection method used. +\item \code{$eval_states(states)} evaluates the feature combinations \code{states}. +\item \code{$get_best()} returns the selected features with the best performance for each entry in \code{$bmr}. +} +} + +\examples{ +task = mlr3::mlr_tasks$get("iris") +learner = mlr3::mlr_learners$get("classif.rpart") +resampling = mlr3::mlr_resamplings$get("holdout") +pe = PerformanceEvaluator$new(task, learner, resampling) +} +\concept{PerformanceEvaluator} +\keyword{internal} diff --git a/man/Terminator.Rd b/man/Terminator.Rd new file mode 100644 index 00000000..642ad964 --- /dev/null +++ b/man/Terminator.Rd @@ -0,0 +1,46 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/Terminator.R +\name{Terminator} +\alias{Terminator} +\title{Abstract Terminator Class} +\description{ +Abstract \code{Terminator} class that implements the main functionality each terminator must have. A terminator is an object that determines when to stop the feature selection. +} +\section{Usage}{ +\preformatted{# Construction +tm = Terminator$new() + +# Public members +tm$terminated +tm$state + +# Public methods +tm$update_start(pe) +tm$update_end(pe) +} +} + +\section{Arguments}{ + +*\code{settings} (\code{list(0)}) +} + +\section{Details}{ + +\itemize{ +\item \code{$new()} creates a new object of class \link{Terminator}. 
+\item \code{$terminated} (\code{logical(1)}) indicates whether the termination criterion is met. Updated by each call of \code{update_start()}/\code{update_end()}. +\item \code{$settings} (\code{list()}) settings that are set by the child classes to define stopping criteria. +\item \code{$state} (\code{list()}) arbitrary state of the Terminator. Gets updated with each call of \code{update_start()} and \code{update_end()}. +\item \code{$update_start()} is called in each feature selection iteration before the evaluation. +\item \code{$update_end()} is called in each feature selection iteration after the evaluation. +} +} + +\seealso{ +Other Terminator: \code{\link{TerminatorEvaluations}}, + \code{\link{TerminatorPerformanceStep}}, + \code{\link{TerminatorRuntime}} +} +\concept{Terminator} +\keyword{internal} diff --git a/man/TerminatorEvaluations.Rd b/man/TerminatorEvaluations.Rd new file mode 100644 index 00000000..c1538d82 --- /dev/null +++ b/man/TerminatorEvaluations.Rd @@ -0,0 +1,42 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/TerminatorEvaluations.R +\name{TerminatorEvaluations} +\alias{TerminatorEvaluations} +\title{TerminatorEvaluations} +\description{ +Terminator child class to terminate the feature selection after a given number of evaluations. +} +\section{Usage}{ +\preformatted{tm = TerminatorEvaluations$new(max_evaluations) +} + +See \link{Terminator} for a description of the interface. +} + +\section{Arguments}{ + +\itemize{ +\item \code{max_evaluations} (\code{integer(1)}): +Maximum number of function evaluations. +} +} + +\section{Details}{ + +\code{$new()} creates a new object of class \link{TerminatorEvaluations}. + +The interface is described in \link{Terminator}. 
+} + +\examples{ +task = mlr3::mlr_tasks$get("iris") +learner = mlr3::mlr_learners$get("classif.rpart") +resampling = mlr3::mlr_resamplings$get("holdout") +pe = PerformanceEvaluator$new(task, learner, resampling) +tm = TerminatorEvaluations$new(max_evaluations = 100) +} +\seealso{ +Other Terminator: \code{\link{TerminatorPerformanceStep}}, + \code{\link{TerminatorRuntime}}, \code{\link{Terminator}} +} +\concept{Terminator} diff --git a/man/TerminatorPerformanceStep.Rd b/man/TerminatorPerformanceStep.Rd new file mode 100644 index 00000000..d5530aad --- /dev/null +++ b/man/TerminatorPerformanceStep.Rd @@ -0,0 +1,42 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/TerminatorPerformanceStep.R +\name{TerminatorPerformanceStep} +\alias{TerminatorPerformanceStep} +\title{TerminatorPerformanceStep} +\description{ +Terminator child class to terminate the sequential feature selection if the model performance does not improve by more than a specified threshold in the next step. +} +\section{Usage}{ +\preformatted{tm = TerminatorPerformanceStep$new(threshold) +} + +See \link{Terminator} for a description of the interface. +} + +\section{Arguments}{ + +\itemize{ +\item \code{threshold} (\code{numeric(1)}): +The feature selection is terminated if the performance improvement between two steps does not exceed the threshold. +} +} + +\section{Details}{ + +\code{$new()} creates a new object of class \link{TerminatorPerformanceStep}. + +The interface is described in \link{Terminator}. 
+} + +\examples{ +task = mlr3::mlr_tasks$get("iris") +learner = mlr3::mlr_learners$get("classif.rpart") +resampling = mlr3::mlr_resamplings$get("holdout") +pe = PerformanceEvaluator$new(task, learner, resampling) +tm = TerminatorPerformanceStep$new(threshold = 0.01) +} +\seealso{ +Other Terminator: \code{\link{TerminatorEvaluations}}, + \code{\link{TerminatorRuntime}}, \code{\link{Terminator}} +} +\concept{Terminator} diff --git a/man/TerminatorRuntime.Rd b/man/TerminatorRuntime.Rd new file mode 100644 index 00000000..c18f380b --- /dev/null +++ b/man/TerminatorRuntime.Rd @@ -0,0 +1,46 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/TerminatorRuntime.R +\name{TerminatorRuntime} +\alias{TerminatorRuntime} +\title{TerminatorRuntime Class} +\description{ +Terminator child class to terminate the feature selection after a specific time. Note that the runtime is checked only after each step, so the final runtime can exceed the specified limit. Time is measured for everything that happens between \code{update_start()} and \code{update_end()}. +} +\section{Usage}{ +\preformatted{tm = TerminatorRuntime$new(max_time, units) +} + +See \link{Terminator} for a description of the interface. +} + +\section{Arguments}{ + +\itemize{ +\item \code{max_time} (\code{integer(1)}): +Maximum amount of time, measured in \code{units}. +\item \code{units} (\code{character(1)}): +Unit used for measuring time. Possible choices are "secs", "mins", "hours", "days", and "weeks", which +are passed directly to \code{difftime()}. +} +} + +\section{Details}{ + +\code{$new()} creates a new object of class \link{TerminatorRuntime}. + +The interface is described in \link{Terminator}. 
+} + +\examples{ +task = mlr3::mlr_tasks$get("iris") +learner = mlr3::mlr_learners$get("classif.rpart") +resampling = mlr3::mlr_resamplings$get("holdout") +pe = PerformanceEvaluator$new(task, learner, resampling) +tm = TerminatorRuntime$new(max_time = 5, units = "secs") +} +\seealso{ +Other Terminator: \code{\link{TerminatorEvaluations}}, + \code{\link{TerminatorPerformanceStep}}, + \code{\link{Terminator}} +} +\concept{Terminator} diff --git a/man/mlr3featsel-package.Rd b/man/mlr3featsel-package.Rd index 23f82993..a659bca0 100644 --- a/man/mlr3featsel-package.Rd +++ b/man/mlr3featsel-package.Rd @@ -18,13 +18,13 @@ Useful links: } \author{ -\strong{Maintainer}: Janek Thomas \email{janek.thomas@stat.uni-muenchen.de} (0000-0003-4511-6245) +\strong{Maintainer}: Patrick Schratz \email{patrick.schratz@gmail.com} (0000-0003-0748-6624) Authors: \itemize{ - \item Patrick Schratz \email{patrick.schratz@gmail.com} (0000-0003-0748-6624) \item Michel Lang \email{michellang@gmail.com} (0000-0001-9754-0393) \item Bernd Bischl \email{bernd_bischl@gmx.net} (0000-0001-6002-6980) + \item Marc Becker \email{marcbecker@posteo.de} } }