
How to Dynamically Tune the Number of Layers in a Neural Network Using mlr3torch? #285

Open
iLivius opened this issue Sep 26, 2024 · 12 comments


@iLivius

iLivius commented Sep 26, 2024

I would like to build a neural network with a tunable number of layers. While I can tune the number of neurons per layer, I’m encountering issues when it comes to dynamically changing the number of layers.

Initially, I thought I could handle this with po("nn_block"). However, as I understand it, nn_block is meant for repeating the same segment of the network multiple times. My goal is to tune the number of layers, from 1 up to a maximum value, while still being able to tune the number of neurons in each layer.

Here’s a minimal reproducible example that demonstrates my current approach:

### Load necessary libraries
if (!require("pacman")) install.packages("pacman")
pacman::p_load(
  future,
  future.apply,
  mlr3hyperband,
  mlr3torch,
  mlr3tuning,
  mlr3verse,
  tidyverse,
  install = TRUE
)

torch::install_torch()

### Set seed for reproducibility
seed = 123
set.seed(seed)

### Use built-in iris dataset for simplicity
tab <- iris
colnames(tab)[which(names(tab) == "Species")] <- "target"
tab$target <- as.factor(tab$target)

### Initialize classification task
task <- TaskClassif$new(id = "iris", backend = tab, target = "target")

### Function to generate neural network layers with tunable parameters
generate_layers <- function(max_layers) {
  layers <- list()
  for (i in 1:max_layers) {
    layers[[i]] <- po("nn_linear", out_features = to_tune(p_int(32, 512)), id = paste0("linear_", i)) %>>%
      po("nn_dropout", p = to_tune(p_dbl(0.3, 0.5)), id = paste0("dropout_", i)) %>>%
      po("nn_relu", id = paste0("relu_", i))
  }
  return(layers)
}

### Create a list of layers
max_layers <- 3
layers <- generate_layers(max_layers)

### Concatenate the layers into the neural network architecture
architecture <- po("torch_ingress_num")
for (i in 1:max_layers) {
  architecture <- architecture %>>% layers[[i]]
}

### Add head, loss, optimizer, and configurations to the architecture
architecture <- architecture %>>%
  po("nn_head") %>>%
  po("torch_loss", t_loss("cross_entropy")) %>>%
  po("torch_optimizer", t_opt("adam", lr = to_tune(p_dbl(0.001, 0.01)))) %>>%
  po("torch_model_classif", batch_size = 32, epochs = to_tune(p_int(100, 1000, tags = "budget")), device = "cpu", num_threads = 1)

### Convert the architecture to a learner object
learner <- as_learner(architecture)
learner$id <- "iris"
learner$predict_type <- "prob"
learner$param_set$values$torch_model_classif.seed <- seed

### Define the tuning terminator and instance for optimization
terminator <- trm("evals", n_evals = 5)
resampling <- rsmp("cv", folds = 5)

instance <- ti(
  task = task, learner = learner, resampling = resampling,
  measure = msr("classif.bacc"), terminator = terminator
)

### Optimize using Hyperband tuner
tuner <- tnr("hyperband", eta = 3, repetitions = 1)
num_threads = 8
future::plan(multisession, workers = num_threads)
tuner$optimize(instance)

[stackoverflow link]

@sebffischer

sebffischer commented Sep 30, 2024

Hey @iLivius and thanks for your interest in the package!
I think there is currently no obvious way to achieve what you want, so thanks for raising this use case.

I have a suggestion that might work for you:
Let's say your individual blocks are simply po("nn_linear") %>>% po("nn_relu") and you want at most n of those.
For each block you want to tune the out_features parameter individually.

You can indirectly tune over the number of layers by defining n blocks (each with its own tuning parameters) and wrapping each of them in a PipeOpBranch. Each of these branches can either add its block to the architecture or do nothing (PipeOpNop). You can then tune the out_features parameter of each block individually, together with the selection parameter of each PipeOpBranch. The selection parameter determines which branch to take, so here it can be interpreted as either using a block or not.

Below is some pseudocode that illustrates the idea.

  po("branch_1", list(po("nn_linear_1", out_features = to_tune(...)) %>>% po("nn_relu_1"), po("nop_1")), selection = to_tune()) %>>%
  po("branch_2", list(po("nn_linear_2", out_features = to_tune(...)) %>>% po("nn_relu_2"), po("nop_2"), selection = to_tune()) %>>%
  ...
  po("branch_<n>", list(po("nn_linear_<n>", out_features = to_tune(...)) %>>% po("nn_relu_<n>"), po("nop_<n>"), selection = to_tune())) %>>%
  po("nn_head") %>>%
  ...

Let me know whether that works for you!

@iLivius

iLivius commented Sep 30, 2024

Hi @sebffischer,

Thank you for your prompt and helpful response. The mlr3torch project is very interesting, and I consider it a powerful addition to the mlr3 ecosystem. I am very grateful to be able to use it in my work.

Following your suggestion, I have integrated it into the following architecture, which seems to work for me.

architecture <- po("torch_ingress_num") %>>%

    # First branch
    po("branch", options = c("block_1", "nop_1"), id = "branch_1", selection = to_tune()) %>>%
    gunion(list(
        block_1 = po("nn_linear", out_features = to_tune(p_int(32, 512)), id = "linear_1") %>>% 
                  po("nn_relu", id = "relu_1") %>>%
                  po("nn_dropout", p = to_tune(p_dbl(0.1, 0.5)), id = "dropout_1"),
        nop_1 = po("nop", id = "nop_1")
    )) %>>% po("unbranch", id = "unbranch_1") %>>%

    # Second branch
    po("branch", options = c("block_2", "nop_2"), id = "branch_2", selection = to_tune()) %>>%
    gunion(list(
        block_2 = po("nn_linear", out_features = to_tune(p_int(32, 512)), id = "linear_2") %>>% 
                  po("nn_relu", id = "relu_2") %>>%
                  po("nn_dropout", p = to_tune(p_dbl(0.1, 0.5)), id = "dropout_2"),
        nop_2 = po("nop", id = "nop_2")
    )) %>>% po("unbranch", id = "unbranch_2") %>>%

    # Third branch
    po("branch", options = c("block_3", "nop_3"), id = "branch_3", selection = to_tune()) %>>%
    gunion(list(
        block_3 = po("nn_linear", out_features = to_tune(p_int(32, 512)), id = "linear_3") %>>% 
                  po("nn_relu", id = "relu_3") %>>%
                  po("nn_dropout", p = to_tune(p_dbl(0.1, 0.5)), id = "dropout_3"),
        nop_3 = po("nop", id = "nop_3")
    )) %>>% po("unbranch", id = "unbranch_3") %>>%

    # Rest of the network
    po("nn_head") %>>%
    po("torch_loss", t_loss("cross_entropy")) %>>%
    po("torch_optimizer", t_opt("adam", lr = to_tune(p_dbl(0.001, 0.01)))) %>>%
    po("torch_model_classif", batch_size = 32, epochs = to_tune(p_int(100, 1000, tags = "budget")), device = "cpu", num_threads = 1)

Please let me know whether this implementation aligns with what you had in mind and whether it is acceptable in its current form.

@sebffischer

Yes, this is what I had in mind!

@iLivius

iLivius commented Oct 1, 2024

Hi @sebffischer,

Thank you for your positive feedback. The approach for defining the architecture is working well so far.

However, the tuning results show the nop blocks along with seemingly tuned parameters, including the number of neurons and the dropout values. Please see the bbotk output below, based on the iris dataset:

INFO  [11:35:55.338] [bbotk] Result:
INFO  [11:35:55.338] [bbotk]  branch_1.selection block_1.linear_1.out_features block_1.dropout_1.p branch_2.selection block_2.linear_2.out_features block_2.dropout_2.p branch_3.selection block_3.linear_3.out_features block_3.dropout_3.p
INFO  [11:35:55.338] [bbotk]              <char>                         <int>               <num>             <char>                         <int>               <num>             <char>                         <int>               <num>
INFO  [11:35:55.338] [bbotk]             block_1                           292           0.2882727              nop_2                           315           0.1214148              nop_3                           158           0.3017759

I’m wondering if the nop blocks are simply placeholders and the parameters shown are not used, or if they might have been activated somehow. Would you expect this behavior?

Thank you in advance for your insights!

@sebffischer

So if I understand you correctly, your question is why e.g. block_2.linear_2.out_features is present in the result even though nop_2 was selected for the second layer?

This parameter (block_2.linear_2.out_features) definitely has no influence when it is not selected by the PipeOpBranch.

I think the problem is that the graph does not understand that block_2.linear_2.out_features only has an effect when the preceding branching PipeOp selects block_2.
This means that when the tuner samples configurations, it samples the values for branch_2.selection and block_2.linear_2.out_features independently, even though the latter is not needed when branch_2.selection selects nop_2.
As a result, parameter values that have no effect will still be reported.

Maybe we want this on the issue tracker of mlr3pipelines, @mb706?

@iLivius

iLivius commented Oct 1, 2024

Thank you for the explanation, @sebffischer!

@sebffischer

What I noticed / forgot to mention is that the approach I suggested here uses a non-uniform distribution for the number of layers, which might not be what you want.

To solve this, you could write down your search space using ps() (no to_tune()), introduce a custom parameter for the number of layers, and add a parameter transformation that switches on the first m blocks depending on the value of this custom parameter. This parameter transformation (from the custom n_layers to block_1.selection, …, block_n.selection) is possible via the .extra_trafo argument of ps(); see also https://mlr3book.mlr-org.com/chapters/chapter4/hyperparameter_optimization.html#sec-tune-trafo.

Sorry that I did not mention this before.
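
To see why the distribution is non-uniform: if each on/off branch is sampled uniformly and independently (as random search does), the number of active blocks follows a Binomial distribution, so medium depths are drawn far more often than very shallow or very deep networks. A quick base R illustration (not mlr3 code):

# with three independent on/off branches, the depth is Binomial(3, 0.5):
# depths 1 and 2 each appear roughly three times as often as 0 or 3
table(rowSums(matrix(sample(c(0, 1), 3 * 10000, replace = TRUE), ncol = 3)))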

@mb706

mb706 commented Oct 2, 2024

Maybe we want this on the issuetracker of mlr3pipelines @mb706 ?

Would be covered by this issue, I guess: mlr-org/mlr3pipelines#101

@mb706

mb706 commented Oct 2, 2024

@iLivius You could also build something like this

[image of the resulting graph structure]

library("mlr3pipelines")
library("mlr3torch")
library("mlr3tuning")


block <- po("nn_linear", out_features = to_tune(p_int(32, 512))) %>>%
   po("nn_relu") %>>%
   po("nn_dropout", p = to_tune(p_dbl(0.1, 0.5)))

numblocks <- 5

graph = NULL
for (i in seq_len(numblocks - 1)) {
  unbranch_id <- paste0("unbranch_", i)
  graph <- gunion(list(
      po(unbranch_id, 2),
      graph %>>!% block$clone(deep = TRUE)$update_ids(postfix = paste0("_", i))
  ), in_place = TRUE)
  graph$add_edge(sprintf("%s_%s", block$rhs, i), unbranch_id, dst_channel = "input2")
}

graph = po("branch", numblocks) %>>!% graph %>>!% block$clone(deep = TRUE)$update_ids(postfix = paste0("_", numblocks))

The PipeOpBranch output selection is an integer-valued hyperparameter, with output1 going to unbranch_4 (the last one, only the _5-operations are performed), output2 going to unbranch_3 (both _4 and _5 operations are performed) etc. Maybe play around with this a little to see if it actually does what you want; you could inspect the state of the torch_model_classif to make sure.

graph$edges
#>           src_id src_channel       dst_id dst_channel
#>           <char>      <char>       <char>      <char>
#>  1:  nn_linear_1      output    nn_relu_1       input
#>  2:    nn_relu_1      output nn_dropout_1       input
#>  3: nn_dropout_1      output   unbranch_1      input2
#>  4:  nn_linear_2      output    nn_relu_2       input
#>  5:    nn_relu_2      output nn_dropout_2       input
#>  6:   unbranch_1      output  nn_linear_2       input
#>  7: nn_dropout_2      output   unbranch_2      input2
#>  8:  nn_linear_3      output    nn_relu_3       input
#>  9:    nn_relu_3      output nn_dropout_3       input
#> 10:   unbranch_2      output  nn_linear_3       input
#> 11: nn_dropout_3      output   unbranch_3      input2
#> 12:  nn_linear_4      output    nn_relu_4       input
#> 13:    nn_relu_4      output nn_dropout_4       input
#> 14:   unbranch_3      output  nn_linear_4       input
#> 15: nn_dropout_4      output   unbranch_4      input2
#> 16:       branch     output1   unbranch_4      input1
#> 17:       branch     output2   unbranch_3      input1
#> 18:       branch     output3   unbranch_2      input1
#> 19:       branch     output4   unbranch_1      input1
#> 20:       branch     output5  nn_linear_1       input
#> 21:  nn_linear_5      output    nn_relu_5       input
#> 22:    nn_relu_5      output nn_dropout_5       input
#> 23:   unbranch_4      output  nn_linear_5       input
#>           src_id src_channel       dst_id dst_channel

You still need to set hyperparameter dependencies manually, unfortunately. You can use depends inside the to_tune() declaration and would ideally do it inside the loop:

library("mlr3pipelines")
library("mlr3torch")
library("mlr3tuning")


block <- po("nn_linear") %>>%
   po("nn_relu") %>>%
   po("nn_dropout")

numblocks <- 5

graph = NULL
for (i in seq_len(numblocks - 1)) {
  unbranch_id <- paste0("unbranch_", i)
  curblock <- block$clone(deep = TRUE)
  curblock$param_set$set_values(
    nn_linear.out_features = to_tune(p_int(32, 512,  depends = branch.selection %in% (numblocks - i + 1):numblocks)),
    nn_dropout.p           = to_tune(p_dbl(0.1, 0.5, depends = branch.selection %in% (numblocks - i + 1):numblocks))
  )
  curblock$update_ids(postfix = paste0("_", i))
  graph <- gunion(list(
      po(unbranch_id, 2),
      graph %>>!% curblock
  ), in_place = TRUE)
  graph$add_edge(sprintf("%s_%s", block$rhs, i), unbranch_id, dst_channel = "input2")
}

graph = po("branch", numblocks, selection = to_tune()) %>>!% graph %>>!% block$clone(deep = TRUE)$update_ids(postfix = paste0("_", numblocks))

which is basically the same as doing

graph$param_set$set_values(
  branch.selection = to_tune(),
  nn_linear_1.out_features = to_tune(p_int(32, 512,  depends = branch.selection %in% 5)),
  nn_dropout_1.p           = to_tune(p_dbl(0.1, 0.5, depends = branch.selection %in% 5)),

  nn_linear_2.out_features = to_tune(p_int(32, 512,  depends = branch.selection %in% 4:5)),
  nn_dropout_2.p           = to_tune(p_dbl(0.1, 0.5, depends = branch.selection %in% 4:5)),

  nn_linear_3.out_features = to_tune(p_int(32, 512,  depends = branch.selection %in% 3:5)),
  nn_dropout_3.p           = to_tune(p_dbl(0.1, 0.5, depends = branch.selection %in% 3:5)),

  nn_linear_4.out_features = to_tune(p_int(32, 512,  depends = branch.selection %in% 2:5)),
  nn_dropout_4.p           = to_tune(p_dbl(0.1, 0.5, depends = branch.selection %in% 2:5)),

  nn_linear_5.out_features = to_tune(p_int(32, 512,  depends = branch.selection %in% 1:5)),
  nn_dropout_5.p           = to_tune(p_dbl(0.1, 0.5, depends = branch.selection %in% 1:5))
)

You can run

generate_design_random(graph$param_set$search_space(), 3)$transpose()

to see a few sample configurations that this generates, to verify that only the relevant hyperparameters are set.

@mb706

mb706 commented Oct 2, 2024

(If you want to have fewer hyperparameters, e.g. a single dropout.p that controls all the dropout values, then I think you can't do that with to_tune(); instead, you have to create the search space manually with ps() and add an .extra_trafo that populates nn_dropout_1.p, nn_dropout_2.p, …, nn_dropout_5.p based on it. You can use c() to combine ParamSets, though, so you can use to_tune() for the parts of your search space that don't need this treatment and pass search_space = c(graph$param_set$search_space(), ps(<custom additional search space>)) to the tuner.)
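
A minimal sketch of that combination, assuming the graph from the previous comment (dropout PipeOps nn_dropout_1 … nn_dropout_5) with the to_tune() token removed from the nn_dropout_*.p parameters while the other to_tune() tokens stay in place:

library("paradox")

# one shared dropout rate; the trafo copies it into every block's dropout PipeOp
extra_space <- ps(
  dropout.p = p_dbl(0.1, 0.5),
  .extra_trafo = function(x, param_set) {
    for (i in 1:5) {
      x[[paste0("nn_dropout_", i, ".p")]] <- x$dropout.p
    }
    x$dropout.p <- NULL  # the shared value itself is not a parameter of the learner
    x
  }
)

# combine with the search space derived from the remaining to_tune() tokens
search_space <- c(graph$param_set$search_space(), extra_space)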

@iLivius

iLivius commented Oct 3, 2024

Thank you both for taking the game to the next level!

I have tried to implement the solution suggested by @sebffischer, using ps() and .extra_trafo, but ended up in a cul-de-sac: I need to explicitly define the number of active layers to make the code run, so the number of layers is not actually tunable. See here:

# Define the maximum number of layers
max_layers <- 5

# Define the search space
search_space <- ps(
  n_layers = p_int(1, max_layers),
  .extra_trafo = function(x, param_set) {
    for (i in 1:max_layers) {
      if (i <= x$n_layers) {
        x[[paste0("block_", i, "_selection")]] <- "on"  # Activate the layer
      } else {
        x[[paste0("block_", i, "_selection")]] <- "off"  # Deactivate the layer
      }
    }
    return(x)
  }
)

# Function to generate neural network layers with tunable parameters
generate_branch <- function(layer_num, block_selection) {
  if (block_selection == "on") {
    branch <- po("nn_linear", out_features = to_tune(p_int(32, 512)), id = paste0("linear_", layer_num)) %>>%
      po("nn_relu", id = paste0("relu_", layer_num)) %>>%
      po("nn_dropout", p = to_tune(p_dbl(0.1, 0.5)), id = paste0("dropout_", layer_num))
  } else {
    branch <- po("nop", id = paste0("nop_", layer_num))
  }
  
  return(branch)
}

# Function to build the entire architecture based on the number of layers and block selections
generate_architecture <- function(n_layers, block_selections) {
  architecture <- po("torch_ingress_num")
  
  # Loop over the layers and add them to the architecture based on selections
  for (i in 1:max_layers) {
    architecture <- architecture %>>% generate_branch(i, block_selections[[paste0("block_", i, "_selection")]])
  }
  
  # Add the rest of the network
  architecture <- architecture %>>%
    po("nn_head") %>>%
    po("torch_loss", t_loss("cross_entropy")) %>>%
    po("torch_optimizer", t_opt("adam", lr = to_tune(p_dbl(0.001, 0.01)))) %>>%
    po("torch_model_classif", batch_size = 32, epochs = to_tune(p_int(100, 1000, tags = "budget")), device = "cpu", num_threads = 1)
  
  return(architecture)
}

# Apply the transformation to extract the tuned values
trafo_params <- search_space$trafo(list(
  n_layers = 3  # Example value for the number of layers
))

# Generate the architecture based on the tuned number of layers and block selections
architecture <- generate_architecture(trafo_params$n_layers, trafo_params)

This is how the tuned instance looks, based on the iris dataset:

INFO  [09:25:35.375] [bbotk] Finished optimizing after 8 evaluation(s)
INFO  [09:25:35.377] [bbotk] Result:
INFO  [09:25:35.383] [bbotk]  linear_1.out_features dropout_1.p linear_2.out_features dropout_2.p linear_3.out_features dropout_3.p torch_optimizer.lr torch_model_classif.epochs learner_param_vals  x_domain classif.bacc
INFO  [09:25:35.383] [bbotk]                  <int>       <num>                 <int>       <num>                 <int>       <num>              <num>                      <num>             <list>    <list>        <num>
INFO  [09:25:35.383] [bbotk]                    283   0.2134847                    43   0.3523094                   510   0.4388965        0.002550026                        125         <list[17]> <list[8]>    0.9851852
   linear_1.out_features dropout_1.p linear_2.out_features dropout_2.p linear_3.out_features dropout_3.p torch_optimizer.lr torch_model_classif.epochs learner_param_vals  x_domain classif.bacc
                   <int>       <num>                 <int>       <num>                 <int>       <num>              <num>                      <num>             <list>    <list>        <num>
1:                   283   0.2134847                    43   0.3523094                   510   0.4388965        0.002550026                        125         <list[17]> <list[8]>    0.9851852

Below I have tried to wrap the solution provided by @mb706 into a reproducible example; please let me know if I nailed it:

library("future")
library("mlr3hyperband")
library("mlr3pipelines")
library("mlr3torch")
library("mlr3tuning")

### Use built-in iris dataset for simplicity
tab <- iris
colnames(tab)[which(names(tab) == "Species")] <- "target"
tab$target <- as.factor(tab$target)

### Initialize classification task
task <- TaskClassif$new(id = "iris", backend = tab, target = "target")

# Build the graph object with tunable layers
block <- po("nn_linear", out_features = to_tune(p_int(32, 512))) %>>%
  po("nn_relu") %>>%
  po("nn_dropout", p = to_tune(p_dbl(0.1, 0.5)))

numblocks <- 5

graph = NULL
for (i in seq_len(numblocks - 1)) {
  unbranch_id <- paste0("unbranch_", i)
  graph <- gunion(list(
    po(unbranch_id, 2),
    graph %>>!% block$clone(deep = TRUE)$update_ids(postfix = paste0("_", i))
  ), in_place = TRUE)
  graph$add_edge(sprintf("%s_%s", block$rhs, i), unbranch_id, dst_channel = "input2")
}

graph = po("branch", numblocks) %>>!% graph %>>!% block$clone(deep = TRUE)$update_ids(postfix = paste0("_", numblocks))

# **Add torch_ingress_num to preprocess the input data**
graph <- po("torch_ingress_num") %>>% graph

# Add the final components for classification
graph <- graph %>>%
  po("nn_head") %>>%
  po("torch_loss", t_loss("cross_entropy")) %>>%
  po("torch_optimizer", t_opt("adam", lr = to_tune(p_dbl(0.001, 0.01)))) %>>%
  po("torch_model_classif", batch_size = 32, epochs = to_tune(p_int(100, 1000, tags = "budget")), device = "cpu", num_threads = 1)

# Convert the architecture to a learner object
learner <- as_learner(graph)
learner$id <- "iris"
learner$predict_type <- "prob"

# Set resampling
kfold = 5
inner_resampling <- rsmp("cv", folds = kfold)

# Define the tuning terminator and instance for optimization
terminator <- trm("evals", n_evals = 5)
instance <- ti(
  task = task, learner = learner, resampling = inner_resampling,
  measure = msr("classif.bacc"), terminator = terminator
)

# Optimize using Hyperband tuner
future::plan(multisession, workers = 24)
tuner <- tnr("hyperband", eta = 2, repetitions = 1)
tuner$optimize(instance)

The tuned instance looks like the following:

INFO  [09:50:09.572] [bbotk] Finished optimizing after 8 evaluation(s)
INFO  [09:50:09.574] [bbotk] Result:
INFO  [09:50:09.579] [bbotk]  nn_linear_1.out_features nn_dropout_1.p nn_linear_2.out_features nn_dropout_2.p nn_linear_3.out_features nn_dropout_3.p nn_linear_4.out_features nn_dropout_4.p nn_linear_5.out_features nn_dropout_5.p
INFO  [09:50:09.579] [bbotk]                     <int>          <num>                    <int>          <num>                    <int>          <num>                    <int>          <num>                    <int>          <num>
INFO  [09:50:09.579] [bbotk]                        93      0.4843554                      453      0.3147442                       78      0.3420649                      329       0.387233                      494      0.3151132
INFO  [09:50:09.579] [bbotk]  torch_optimizer.lr torch_model_classif.epochs learner_param_vals   x_domain classif.bacc
INFO  [09:50:09.579] [bbotk]               <num>                      <num>             <list>     <list>        <num>
INFO  [09:50:09.579] [bbotk]         0.002098667                        125         <list[22]> <list[12]>    0.9833333
   nn_linear_1.out_features nn_dropout_1.p nn_linear_2.out_features nn_dropout_2.p nn_linear_3.out_features nn_dropout_3.p nn_linear_4.out_features nn_dropout_4.p nn_linear_5.out_features nn_dropout_5.p
                      <int>          <num>                    <int>          <num>                    <int>          <num>                    <int>          <num>                    <int>          <num>
1:                       93      0.4843554                      453      0.3147442                       78      0.3420649                      329       0.387233                      494      0.3151132
   torch_optimizer.lr torch_model_classif.epochs learner_param_vals   x_domain classif.bacc
                <num>                      <num>             <list>     <list>        <num>
1:        0.002098667                        125         <list[22]> <list[12]>    0.9833333

@sebffischer

but ended up in a cul-de-sac as I need to explicitly define the number of active layers to make the code running and therefore they are not tunable. See here:

If you have such a custom parameter, the whole search space needs to be defined as a ps() and passed to e.g. the search_space argument of the tune() function (https://mlr3tuning.mlr-org.com/reference/tune.html?q=tune%20#null). In other words, you either set parameters to to_tune() tokens, or you define the search space yourself with ps() and pass it via that search_space argument.

This means that the search space would look something like:

search_space <- ps(
  branch_1.nn_linear_1.out_features = p_int(100, 200),
  branch_1.nn_dropout_1.p = p_dbl(0, 1),
  ...
  branch_n.nn_linear_n.out_features = p_int(100, 200),
  branch_n.nn_dropout_n.p = p_dbl(0, 1),
  ...
  n_layers = p_int(1, max_layers),
  .extra_trafo = function(x, param_set) {
    for (i in 1:max_layers) {
      if (i <= x$n_layers) {
        x[[paste0("block_", i, "_selection")]] <- "on"  # Activate the layer
      } else {
        x[[paste0("block_", i, "_selection")]] <- "off"  # Deactivate the layer
      }
    }
    return(x)
  }
)

It is a bit more cumbersome to write down the search space like this, but it also gives you more flexibility.
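
For example, a concrete version of the above for the three-branch architecture posted earlier in this thread (a sketch only; it assumes PipeOps branch_1 … branch_3 with options block_i / nop_i, parameter ids such as block_1.linear_1.out_features, no to_tune() tokens left on the learner, and the task/resampling/terminator objects from the earlier snippets):

max_layers <- 3

search_space <- ps(
  block_1.linear_1.out_features = p_int(32, 512),
  block_1.dropout_1.p           = p_dbl(0.1, 0.5),
  block_2.linear_2.out_features = p_int(32, 512),
  block_2.dropout_2.p           = p_dbl(0.1, 0.5),
  block_3.linear_3.out_features = p_int(32, 512),
  block_3.dropout_3.p           = p_dbl(0.1, 0.5),
  torch_optimizer.lr            = p_dbl(0.001, 0.01),
  torch_model_classif.epochs    = p_int(100, 1000, tags = "budget"),
  n_layers                      = p_int(1, max_layers),
  .extra_trafo = function(x, param_set) {
    for (i in 1:max_layers) {
      # route branch_i to its block for the first n_layers branches, otherwise to the nop
      x[[paste0("branch_", i, ".selection")]] <-
        if (i <= x$n_layers) paste0("block_", i) else paste0("nop_", i)
    }
    x$n_layers <- NULL  # n_layers itself is not a parameter of the learner
    x
  }
)

instance <- ti(
  task = task, learner = learner, resampling = resampling,
  measure = msr("classif.bacc"), terminator = terminator,
  search_space = search_space
)

As discussed above, values sampled for deactivated blocks are simply ignored by the graph.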

(Also, on an unrelated note: mlr3pipelines now supports the syntax po("nn_linear_1") as equivalent to po("nn_linear", id = "nn_linear_1").)
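
That is:

# both create an nn_linear PipeOp with the id "nn_linear_1"
po("nn_linear", id = "nn_linear_1")
po("nn_linear_1")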
