add chapter on validation and internal tuning #829
Conversation
Minor wording suggestions (but these can also be left out):

Otherwise, great job!
I'm trying to add early stopping to the XGBoost learner in my benchmark based on this chapter, and I'm not sure whether I just misunderstand a few things or whether the chapter could be extended in that regard. My problem is that I'm using a `GraphLearner`. One of my naive attempts below:

```r
library(mlr3)
library(mlr3tuning)
library(mlr3pipelines)
library(mlr3proba)
library(mlr3extralearners)

task = tsk("lung")

xgb_base = lrn("surv.xgboost.cox",
  early_stopping_rounds = 10,
  nrounds = to_tune(upper = 1000, internal = TRUE),
  tree_method = "hist", booster = "gbtree")

xgb_glearn = po("fixfactors") %>>%
  po("imputesample", affect_columns = selector_type("factor")) %>>%
  po("encode", method = "treatment") %>>%
  po("removeconstants") %>>%
  xgb_base |>
  as_learner()

set_validate(xgb_glearn, "test")

xgb_autotuner = auto_tuner(
  learner = xgb_glearn,
  search_space = ps(
    surv.xgboost.cox.eta = p_dbl(0.001, 1, logscale = TRUE),
    surv.xgboost.cox.max_depth = p_int(1, 20),
    surv.xgboost.cox.subsample = p_dbl(0, 1),
    surv.xgboost.cox.colsample_bytree = p_dbl(0, 1),
    surv.xgboost.cox.grow_policy = p_fct(c("depthwise", "lossguide"))
  ),
  resampling = rsmp("cv", folds = 3),
  measure = msr("surv.cindex"),
  terminator = trm("evals", n_evals = 20, k = 0),
  tuner = tnr("random_search")
)
```

This results in the not unexpected error. I'm not sure how to indicate to my `AutoTuner` that `nrounds` should be tuned internally.
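For reference, outside a pipeline the pattern from the chapter is more direct; a minimal sketch, assuming `classif.xgboost` from mlr3learners and an mlr3tuning version that supports `to_tune(internal = TRUE)` (the learner, measure, and budget here are illustrative, not from the thread above):

```r
library(mlr3)
library(mlr3learners)
library(mlr3tuning)

# Mark nrounds for internal tuning directly on the learner;
# early stopping then determines the best value during each fit.
learner = lrn("classif.xgboost",
  early_stopping_rounds = 10,
  eta = to_tune(0.001, 1, logscale = TRUE),
  nrounds = to_tune(upper = 500, internal = TRUE))

# Use the inner resampling's test sets as validation data.
set_validate(learner, "test")

at = auto_tuner(
  learner = learner,
  resampling = rsmp("cv", folds = 3),
  measure = msr("classif.ce"),
  terminator = trm("evals", n_evals = 10),
  tuner = tnr("random_search")
)
```

Without a `GraphLearner` in the way, no prefixed search space entry is needed; the `to_tune()` token on the learner carries the internal-tuning information.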
My previous answers were bad, and thanks for making me aware that this is not documented yet.

```r
library(mlr3)
library(mlr3tuning)
#> Loading required package: paradox
library(mlr3pipelines)
library(mlr3proba)
library(mlr3extralearners)

task = tsk("lung")

xgb_base = lrn("surv.xgboost.cox",
  early_stopping_rounds = 10,
  tree_method = "hist", booster = "gbtree")

xgb_glearn = po("fixfactors") %>>%
  po("imputesample", affect_columns = selector_type("factor")) %>>%
  po("encode", method = "treatment") %>>%
  po("removeconstants") %>>%
  xgb_base |>
  as_learner()

set_validate(xgb_glearn, "test")

xgb_autotuner = auto_tuner(
  learner = xgb_glearn,
  search_space = ps(
    surv.xgboost.cox.eta = p_dbl(0.001, 1, logscale = TRUE),
    surv.xgboost.cox.nrounds = p_int(upper = 1000, tags = "internal_tuning",
      aggr = function(x) as.integer(mean(unlist(x)))),
    surv.xgboost.cox.max_depth = p_int(1, 20),
    surv.xgboost.cox.subsample = p_dbl(0, 1),
    surv.xgboost.cox.colsample_bytree = p_dbl(0, 1),
    surv.xgboost.cox.grow_policy = p_fct(c("depthwise", "lossguide"))
  ),
  resampling = rsmp("cv", folds = 3),
  measure = msr("surv.cindex"),
  terminator = trm("evals", n_evals = 20, k = 0),
  tuner = tnr("random_search")
)

xgb_autotuner$train(task)
```

Created on 2024-09-05 with reprex v2.1.1
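After training, the `nrounds` value selected by early stopping and aggregated over the inner folds can be inspected; a minimal sketch, assuming a recent mlr3tuning that records internal tuning results on the tuning result (`internal_tuned_values` is the assumed accessor name):

```r
# Sketch: inspect the aggregated nrounds chosen by internal tuning
# (assumed field name on the tuning result).
xgb_autotuner$tuning_result$internal_tuned_values
```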
Maybe we should include the internal tune tokens in the tuning spaces, @be-marc?
Great, thanks! Is there something I can do to keep the ...

```r
library(mlr3)
library(mlr3pipelines)
library(mlr3proba)
library(mlr3extralearners)

task = tsk("lung")

xgb_base = lrn("surv.xgboost.cox",
  early_stopping_rounds = 100,
  max_depth = 3, eta = .01,
  tree_method = "hist", booster = "gbtree")

xgb_glearn = po("fixfactors") %>>%
  po("imputesample", affect_columns = selector_type("factor")) %>>%
  po("encode", method = "treatment") %>>%
  po("removeconstants") %>>%
  xgb_base |>
  as_learner()

set_validate(xgb_glearn, "test")

rr = resample(
  task = task,
  learner = xgb_glearn,
  resampling = rsmp("cv", folds = 3),
  store_models = TRUE
)
#> INFO [11:45:40.575] [mlr3] Applying learner 'fixfactors.imputesample.encode.removeconstants.surv.xgboost.cox' on task 'lung' (iter 1/3)
#> INFO [11:45:40.786] [mlr3] Applying learner 'fixfactors.imputesample.encode.removeconstants.surv.xgboost.cox' on task 'lung' (iter 2/3)
#> INFO [11:45:40.921] [mlr3] Applying learner 'fixfactors.imputesample.encode.removeconstants.surv.xgboost.cox' on task 'lung' (iter 3/3)

rr$learners[[1]]$model$surv.xgboost.cox$model$model$evaluation_log |>
  ggplot2::ggplot(ggplot2::aes(x = iter, y = test_cox_nloglik)) +
  ggplot2::geom_line() +
  ggplot2::theme_minimal()
```

Created on 2024-09-06 with reprex v2.1.1

I was hoping to sanity check the internal tuning using the evaluation log when using the ... I found ...
You are accessing the final model fit, but in the final model fit there is no early stopping.
Also, are you aware that xgboost will use the optimal model during prediction and NOT the final model? So you should be less worried about a too-high patience parameter (except for increased runtime, I guess).
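This can be checked on the stored booster itself; a minimal sketch, assuming the nested model slots from the resample above and that the fitted `xgb.Booster` carries the `best_iteration` and `niter` fields:

```r
# Sketch: with early stopping active, the booster records the
# iteration with the best validation score; prediction uses it.
booster = rr$learners[[1]]$model$surv.xgboost.cox$model$model
booster$best_iteration  # iteration selected by early stopping
booster$niter           # total boosting iterations actually run
```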
Ah right, of course, makes sense 😅

I was banking on that; my main concern is to avoid overfitting in the benchmark, and saving some compute would be a bonus but not a must. Thanks for the clarifications!
TODOs:
* `in_tune_fn` (...)
* mlr3learners: BREAKING_CHANGE(xgboost): stricter checks on eval_metric mlr3learners#306
* mlr3extralearners: stricter metric checks when using internal tuning mlr3extralearners#376
* `set_internal_tuning()`: created an issue in mlr3
* `$divide()`: `predict_sets = NULL` when one tunes internal valid score