
QPTUNA CLI Tutorial


This tutorial is intended to provide a new user with the necessary background to start using Qptuna through a command line interface (CLI).

-

A separate tutorial is available describing the use of the Qptuna GUI.

-
-
-

Background

-

QPTUNA is a Python package that automates the model-building process for REINVENT. These models can use a variety of algorithms to fit your input data, and most of them have one or more so-called hyper-parameters (e.g. the maximum number of trees in a Random Forest, or the C parameter in SVRs, which controls the influence of every support vector).
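
To make the distinction between a hyper-parameter and an ordinary model parameter concrete, here is a minimal sketch that uses a toy one-dimensional ridge regression rather than any Qptuna code. The function name `fit_ridge_slope` and the data are made up for illustration: `alpha` is fixed before fitting (a hyper-parameter), while the slope is computed from the data.

```python
# Toy 1-D ridge regression through the origin (illustration only, not Qptuna code).
# `alpha` is a hyper-parameter: it is chosen before fitting and is not learned.
# The returned slope is an ordinary parameter: it is computed from the data.

def fit_ridge_slope(xs, ys, alpha):
    """Closed-form ridge solution for y ~ slope * x with L2 penalty `alpha`."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + alpha)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # exactly y = 2x

# The same data gives a different fitted slope for each hyper-parameter value.
slopes = {alpha: fit_ridge_slope(xs, ys, alpha) for alpha in (0.0, 1.0, 10.0)}
for alpha, slope in slopes.items():
    print(f"alpha={alpha}: slope={slope:.3f}")
```

Choosing which value of `alpha` works best is the kind of decision that hyper-parameter optimization automates.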

-

For both regression and classification tasks, QPTUNA allows you to specify input data for which the optimal hyper-parameters and a model can be obtained automatically. If you want to get an idea of how the package is structured, read on; otherwise, you may want to skip ahead to the examples, which show how to use it.

-
-

The three-step process

-

Qptuna is structured around three steps:

1. Hyperparameter optimization: Train many models with different parameters using Optuna. Only the training dataset is used here. Training is usually done with cross-validation.
2. Build (training): Pick the best model from optimization, re-train it without cross-validation, and optionally evaluate its performance on the test dataset.
3. Prod-build (or build merged): Re-train the best-performing model on the merged training and test datasets. This step has the drawback that no data is left to evaluate the resulting model, but the big benefit that the final model is trained on all available data.
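
The three steps can be sketched in plain Python. This is a conceptual illustration under simplifying assumptions, not how Qptuna is implemented: the model is a toy one-parameter ridge regression, the candidate values are tried exhaustively instead of being sampled by Optuna, and all names (`fit`, `neg_mse`, `cv_score`) are hypothetical.

```python
import random

def fit(rows, alpha):
    """Toy model: ridge slope through the origin for y ~ slope * x."""
    sxy = sum(x * y for x, y in rows)
    sxx = sum(x * x for x, _ in rows)
    return sxy / (sxx + alpha)

def neg_mse(slope, rows):
    """Negated mean squared error, so that higher is better."""
    return -sum((y - slope * x) ** 2 for x, y in rows) / len(rows)

def cv_score(rows, alpha, folds=3):
    """Average validation score over `folds` cross-validation splits."""
    scores = []
    for k in range(folds):
        valid = rows[k::folds]
        train = [r for i, r in enumerate(rows) if i % folds != k]
        scores.append(neg_mse(fit(train, alpha), valid))
    return sum(scores) / folds

# Synthetic data: y = 3x plus noise, split into train and test (holdout).
rng = random.Random(42)
data = [(i / 10, 3.0 * (i / 10) + rng.gauss(0, 0.5)) for i in range(1, 41)]
rng.shuffle(data)
train_set, test_set = data[:30], data[30:]

# Step 1, optimization: cross-validate candidates on the training data only.
candidates = [0.0, 0.1, 1.0, 10.0, 100.0]
best_alpha = max(candidates, key=lambda a: cv_score(train_set, a))

# Step 2, build: re-train on the whole training set, evaluate on the test set.
built = fit(train_set, best_alpha)
test_score = neg_mse(built, test_set)

# Step 3, build merged: re-train on train + test; no data is left to evaluate on.
merged = fit(train_set + test_set, best_alpha)

print(best_alpha, test_score, merged)
```

Note that the test set is touched only in step 2, which is what makes the score from that step an honest estimate.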

-
-
-
-

Preparation

-

To use QPTUNA from Jupyter Notebook, install it with:

-
python -m pip install http://pages.scp.astrazeneca.net/mai/qptuna/releases/Qptuna_latest.tar.gz
-
-
-
-

Regression example

-

This is a toy example of training a model that will predict molecular weight for a subset of DRD2 molecules. This example was chosen so that the whole run would take less than a minute.

-

The training dataset is a CSV file. It has SMILES strings in a column named “canonical” and the value that we will try to predict in a column named “molwt”.

-

This example has train and test (holdout) datasets ready. If you have a single dataset and would like QPTUNA to split it into train and test (holdout) datasets, see the next section.

-

Here are a few lines from the input file:

-
-
[1]:
-
-
-
!head  ../tests/data/DRD2/subset-50/train.csv
-
-
-
-
-
-
-
-
canonical,activity,molwt,molwt_gt_330
Cc1cc(NC(=O)c2cccc(COc3ccc(Br)cc3)c2)no1,0,387.233,True
O=C(Nc1ccc(F)cc1F)Nc1sccc1-c1nc2ccccc2s1,0,387.4360000000001,True
COC(=O)c1ccccc1NC(=O)c1cc([N+](=O)[O-])nn1Cc1ccccc1,0,380.36000000000007,True
CCOC(=O)C(C)Sc1nc(-c2ccccc2)ccc1C#N,0,312.39400000000006,False
CCC(CC)NC(=O)c1nn(Cc2ccccc2)c(=O)c2ccccc12,0,349.4340000000001,True
Brc1ccccc1OCCCOc1cccc2cccnc12,0,358.235,True
CCCCn1c(COc2cccc(OC)c2)nc2ccccc21,0,310.39700000000005,False
CCOc1cccc(NC(=O)c2sc3nc(-c4ccc(F)cc4)ccc3c2N)c1,0,407.4700000000001,True
COc1ccc(S(=O)(=O)N(CC(=O)Nc2ccc(C)cc2)c2ccc(C)cc2)cc1OC,0,454.54800000000023,True
-
-
-
-
-
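Before starting a run, it can help to confirm that the file really contains the input and response columns you intend to name in the configuration. The sketch below uses only the Python standard library and three rows copied from the training file shown above. The helper `load_columns` is hypothetical and not part of Qptuna.

```python
import csv
import io

# Three rows copied from the training file shown above. In practice, open the
# CSV file itself instead of using an in-memory string.
sample = """canonical,activity,molwt,molwt_gt_330
Cc1cc(NC(=O)c2cccc(COc3ccc(Br)cc3)c2)no1,0,387.233,True
CCOC(=O)C(C)Sc1nc(-c2ccccc2)ccc1C#N,0,312.39400000000006,False
Brc1ccccc1OCCCOc1cccc2cccnc12,0,358.235,True
"""

def load_columns(handle, input_column, response_column):
    """Return (inputs, responses), failing early if a column is missing."""
    reader = csv.DictReader(handle)
    missing = {input_column, response_column} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    inputs, responses = [], []
    for row in reader:
        inputs.append(row[input_column])
        responses.append(float(row[response_column]))
    return inputs, responses

smiles, molwt = load_columns(io.StringIO(sample), "canonical", "molwt")
print(len(smiles), min(molwt), max(molwt))
```

A misspelled column name (for example "smiles" instead of "canonical") then fails immediately with a readable error rather than partway through a run.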

Create configuration

-

QPTUNA configuration can be read from a JSON file or created in Python. Here we create it in Python.

-
-
[2]:
-
-
-
import sys
sys.path.append("..")
-
-
-
-
-
[3]:
-
-
-
# Start with the imports.
import sklearn
from optunaz.three_step_opt_build_merge import (
    optimize,
    buildconfig_best,
    build_best,
    build_merged,
)
from optunaz.config import ModelMode, OptimizationDirection
from optunaz.config.optconfig import (
    OptimizationConfig,
    SVR,
    RandomForestRegressor,
    Ridge,
    Lasso,
    PLSRegression,
    KNeighborsRegressor
)
from optunaz.datareader import Dataset
from optunaz.descriptors import ECFP, MACCS_keys, ECFP_counts, PathFP
-
-
-
-
-
[4]:
-
-
-
# Prepare hyperparameter optimization configuration.
config = OptimizationConfig(
    data=Dataset(
        input_column="canonical",  # Typical names are "SMILES" and "smiles".
        response_column="molwt",  # Often a specific name (like here), or just "activity".
        training_dataset_file="../tests/data/DRD2/subset-50/train.csv",
        test_dataset_file="../tests/data/DRD2/subset-50/test.csv"  # Hidden during optimization.
    ),
    descriptors=[
        ECFP.new(),
        ECFP_counts.new(),
        MACCS_keys.new(),
        PathFP.new()
    ],
    algorithms=[
        SVR.new(),
        RandomForestRegressor.new(n_estimators={"low": 5, "high": 10}),
        Ridge.new(),
        Lasso.new(),
        PLSRegression.new(),
        KNeighborsRegressor.new()
    ],
    settings=OptimizationConfig.Settings(
        mode=ModelMode.REGRESSION,
        cross_validation=3,
        n_trials=100,  # Total number of trials.
        n_startup_trials=50,  # Number of startup ("random") trials.
        random_seed=42,  # Seed for reproducibility.
        direction=OptimizationDirection.MAXIMIZATION,
    ),
)
-
-
-
-
-
-
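The configuration lists four descriptors and six algorithms, and each trial combines one algorithm with one descriptor plus that algorithm's own hyper-parameters. As a rough illustration of the size of the outer search space, the sketch below enumerates the algorithm/descriptor pairs using plain strings copied from the configuration. It does not use Qptuna objects and says nothing about how Optuna actually samples.

```python
from itertools import product

# Names copied from the configuration above (plain strings, not Qptuna objects).
descriptors = ["ECFP", "ECFP_counts", "MACCS_keys", "PathFP"]
algorithms = [
    "SVR",
    "RandomForestRegressor",
    "Ridge",
    "Lasso",
    "PLSRegression",
    "KNeighborsRegressor",
]

# Every trial picks one pair; the algorithm's hyper-parameters are sampled on top.
pairs = list(product(algorithms, descriptors))
print(len(pairs))  # 6 algorithms x 4 descriptors = 24 pairs
```

With only 24 pairs, and with some algorithms exposing only a small integer hyper-parameter, it is plausible that the same parameter set is proposed more than once over 100 trials. That would be consistent with the "Duplicate parameter set" messages in the optimization log.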

Run optimization

-
-
[5]:
-
-
-
# Set up basic logging.
import logging
from importlib import reload
reload(logging)
logging.basicConfig(level=logging.INFO)
logging.getLogger("train").disabled = True  # Prevent ChemProp from logging.
import numpy as np
np.seterr(divide="ignore")
import warnings
warnings.filterwarnings("ignore", category=FutureWarning)
warnings.filterwarnings("ignore", category=RuntimeWarning)

import tqdm
from functools import partialmethod
# Patch the tqdm class (not the module) so that progress bars in ChemProp do not flood the log.
tqdm.tqdm.__init__ = partialmethod(tqdm.tqdm.__init__, disable=True)

# Avoid deprecation warnings from packages etc.
warnings.simplefilter("ignore")
def warn(*args, **kwargs):
    pass
warnings.warn = warn
-
-
-
-
-
[6]:
-
-
-
# Run Optuna Study.
study = optimize(config, study_name="my_study")
# Optuna will log its progress to sys.stderr
# (usually rendered in red in Jupyter Notebooks).
-
-
-
-
-
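The log that follows has a regular shape: each completed trial reports its value and the best trial so far, and duplicate parameter sets are reported as pruned. The reported values are negative, and the "best" trial is always the one with the largest value (closest to zero), which matches `direction=OptimizationDirection.MAXIMIZATION` in the configuration. The sketch below parses a few abbreviated lines in that format with the standard library. The `{...}` placeholders stand in for the full parameter dictionaries, and the regular expressions are an assumption based on the lines shown here, not an official log format.

```python
import re

# Abbreviated lines in the format of the log below (parameter dicts shortened).
log_lines = [
    "[I 2024-08-23 10:51:03,760] Trial 0 finished with value: -3594.2228073972638 and parameters: {...}. Best is trial 0 with value: -3594.2228073972638.",
    "[I 2024-08-23 10:51:04,341] Trial 3 finished with value: -3393.577488426015 and parameters: {...}. Best is trial 3 with value: -3393.577488426015.",
    "[I 2024-08-23 10:51:04,506] Trial 4 finished with value: -427.45250420148204 and parameters: {...}. Best is trial 4 with value: -427.45250420148204.",
    "[I 2024-08-23 10:51:05,760] Trial 27 pruned. Duplicate parameter set",
]

FINISHED = re.compile(
    r"Trial (\d+) finished with value: (-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)"
)
PRUNED = re.compile(r"Trial (\d+) pruned")

values = {}  # trial number -> objective value
pruned = []  # trial numbers that were pruned
for line in log_lines:
    finished = FINISHED.search(line)
    if finished:
        values[int(finished.group(1))] = float(finished.group(2))
        continue
    was_pruned = PRUNED.search(line)
    if was_pruned:
        pruned.append(int(was_pruned.group(1)))

# With maximization, the best trial is the one with the largest value.
best_trial = max(values, key=values.get)
print(best_trial, values[best_trial], pruned)
```

If `study` is a regular Optuna `Study` object, as the comment in the cell above suggests, `study.best_trial` and `study.best_value` should give the same information without any log parsing.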
-
-
-
-[I 2024-08-23 10:51:03,367] A new study created in memory with name: my_study
-[I 2024-08-23 10:51:03,440] A new study created in memory with name: study_name_0
-[I 2024-08-23 10:51:03,760] Trial 0 finished with value: -3594.2228073972638 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 3, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 10, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "PathFP", "parameters": {"maxPath": 3, "fpSize": 2048}}'}. Best is trial 0 with value: -3594.2228073972638.
-[I 2024-08-23 10:51:03,915] Trial 1 finished with value: -5029.734616310275 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.039054412752107935, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 3.1242780840717016e-07, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 0 with value: -3594.2228073972638.
-[I 2024-08-23 10:51:04,195] Trial 2 finished with value: -4242.092751193529 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 20, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 6, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 0 with value: -3594.2228073972638.
-[I 2024-08-23 10:51:04,341] Trial 3 finished with value: -3393.577488426015 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.06877704223043679, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 3 with value: -3393.577488426015.
-[I 2024-08-23 10:51:04,506] Trial 4 finished with value: -427.45250420148204 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.7896547008552977, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 4 with value: -427.45250420148204.
-[I 2024-08-23 10:51:04,577] Trial 5 finished with value: -3387.245629616474 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 3, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 4 with value: -427.45250420148204.
-[I 2024-08-23 10:51:04,646] Trial 6 finished with value: -5029.734620250011 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 2.3661540064603184, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 0.1799882524170321, 'descriptor': '{"name": "PathFP", "parameters": {"maxPath": 3, "fpSize": 2048}}'}. Best is trial 4 with value: -427.45250420148204.
-[I 2024-08-23 10:51:04,711] Trial 7 finished with value: -9650.026568221794 and parameters: {'algorithm_name': 'KNeighborsRegressor', 'KNeighborsRegressor_algorithm_hash': '1709d2c39117ae29f6c9debe7241287b', 'metric__1709d2c39117ae29f6c9debe7241287b': <KNeighborsMetric.MINKOWSKI: 'minkowski'>, 'n_neighbors__1709d2c39117ae29f6c9debe7241287b': 7, 'weights__1709d2c39117ae29f6c9debe7241287b': <KNeighborsWeights.UNIFORM: 'uniform'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 4 with value: -427.45250420148204.
-[I 2024-08-23 10:51:04,727] Trial 8 finished with value: -5437.151635569594 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.05083825348819038, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 4 with value: -427.45250420148204.
-[I 2024-08-23 10:51:04,858] Trial 9 finished with value: -2669.8534551928174 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 4, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 6, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 4 with value: -427.45250420148204.
-[I 2024-08-23 10:51:04,876] Trial 10 finished with value: -4341.586120152291 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.7921825998469865, 'descriptor': '{"name": "PathFP", "parameters": {"maxPath": 3, "fpSize": 2048}}'}. Best is trial 4 with value: -427.45250420148204.
-[I 2024-08-23 10:51:05,098] Trial 11 finished with value: -5514.404088878843 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 5, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 7, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 4 with value: -427.45250420148204.
-[I 2024-08-23 10:51:05,210] Trial 12 finished with value: -5431.634989239215 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 10, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 5, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 4 with value: -427.45250420148204.
-[I 2024-08-23 10:51:05,228] Trial 13 finished with value: -3530.5496618991288 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 4, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 4 with value: -427.45250420148204.
-[I 2024-08-23 10:51:05,245] Trial 14 finished with value: -3497.6833185436312 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 2, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 4 with value: -427.45250420148204.
-[I 2024-08-23 10:51:05,263] Trial 15 finished with value: -4382.16208862162 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 5, 'descriptor': '{"name": "PathFP", "parameters": {"maxPath": 3, "fpSize": 2048}}'}. Best is trial 4 with value: -427.45250420148204.
-[I 2024-08-23 10:51:05,279] Trial 16 finished with value: -5029.734620031822 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.002825619931800395, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 1.309885135051862e-09, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 4 with value: -427.45250420148204.
-[I 2024-08-23 10:51:05,296] Trial 17 finished with value: -679.3109044887755 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.16827992999009767, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 4 with value: -427.45250420148204.
-[I 2024-08-23 10:51:05,370] Trial 18 finished with value: -2550.114129318373 and parameters: {'algorithm_name': 'KNeighborsRegressor', 'KNeighborsRegressor_algorithm_hash': '1709d2c39117ae29f6c9debe7241287b', 'metric__1709d2c39117ae29f6c9debe7241287b': <KNeighborsMetric.MINKOWSKI: 'minkowski'>, 'n_neighbors__1709d2c39117ae29f6c9debe7241287b': 7, 'weights__1709d2c39117ae29f6c9debe7241287b': <KNeighborsWeights.UNIFORM: 'uniform'>, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 4 with value: -427.45250420148204.
-[I 2024-08-23 10:51:05,386] Trial 19 finished with value: -4847.085792360169 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.735431606118867, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 4 with value: -427.45250420148204.
-[I 2024-08-23 10:51:05,404] Trial 20 finished with value: -5029.268760278916 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.0014840820994557746, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 0.04671166881768783, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 4 with value: -427.45250420148204.
-[I 2024-08-23 10:51:05,518] Trial 21 finished with value: -4783.047015479679 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 15, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 10, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 4 with value: -427.45250420148204.
-[I 2024-08-23 10:51:05,536] Trial 22 finished with value: -3905.0064899852296 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 4, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 4 with value: -427.45250420148204.
-[I 2024-08-23 10:51:05,604] Trial 23 finished with value: -4030.4577379164707 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 11, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 9, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 4 with value: -427.45250420148204.
-[I 2024-08-23 10:51:05,672] Trial 24 finished with value: -4681.602145939593 and parameters: {'algorithm_name': 'KNeighborsRegressor', 'KNeighborsRegressor_algorithm_hash': '1709d2c39117ae29f6c9debe7241287b', 'metric__1709d2c39117ae29f6c9debe7241287b': <KNeighborsMetric.MINKOWSKI: 'minkowski'>, 'n_neighbors__1709d2c39117ae29f6c9debe7241287b': 4, 'weights__1709d2c39117ae29f6c9debe7241287b': <KNeighborsWeights.UNIFORM: 'uniform'>, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 4 with value: -427.45250420148204.
-[I 2024-08-23 10:51:05,690] Trial 25 finished with value: -4398.544034028325 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.6452011213193165, 'descriptor': '{"name": "PathFP", "parameters": {"maxPath": 3, "fpSize": 2048}}'}. Best is trial 4 with value: -427.45250420148204.
-[I 2024-08-23 10:51:05,756] Trial 26 finished with value: -4454.143979828406 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 21, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 9, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 4 with value: -427.45250420148204.
-[I 2024-08-23 10:51:05,760] Trial 27 pruned. Duplicate parameter set
-[I 2024-08-23 10:51:05,765] Trial 28 pruned. Duplicate parameter set
-[I 2024-08-23 10:51:05,833] Trial 29 finished with value: -4397.330360587512 and parameters: {'algorithm_name': 'KNeighborsRegressor', 'KNeighborsRegressor_algorithm_hash': '1709d2c39117ae29f6c9debe7241287b', 'metric__1709d2c39117ae29f6c9debe7241287b': <KNeighborsMetric.MINKOWSKI: 'minkowski'>, 'n_neighbors__1709d2c39117ae29f6c9debe7241287b': 8, 'weights__1709d2c39117ae29f6c9debe7241287b': <KNeighborsWeights.UNIFORM: 'uniform'>, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 4 with value: -427.45250420148204.
-[I 2024-08-23 10:51:05,838] Trial 30 pruned. Duplicate parameter set
-[I 2024-08-23 10:51:05,880] Trial 31 finished with value: -2602.7561184287083 and parameters: {'algorithm_name': 'KNeighborsRegressor', 'KNeighborsRegressor_algorithm_hash': '1709d2c39117ae29f6c9debe7241287b', 'metric__1709d2c39117ae29f6c9debe7241287b': <KNeighborsMetric.MINKOWSKI: 'minkowski'>, 'n_neighbors__1709d2c39117ae29f6c9debe7241287b': 6, 'weights__1709d2c39117ae29f6c9debe7241287b': <KNeighborsWeights.UNIFORM: 'uniform'>, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 4 with value: -427.45250420148204.
-[I 2024-08-23 10:51:05,897] Trial 32 finished with value: -5267.388279961089 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.2015560027548533, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 4 with value: -427.45250420148204.
-
-
-
-
-
-
-
-Duplicated trial: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 4, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}, return [-3530.5496618991288]
-Duplicated trial: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 4, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}, return [-3530.5496618991288]
-Duplicated trial: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 3, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}, return [-3387.245629616474]
-
-
-
-
-
-
-
-[I 2024-08-23 10:51:06,014] Trial 33 finished with value: -4863.5817607510535 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 23, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 8, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 4 with value: -427.45250420148204.
-[I 2024-08-23 10:51:06,032] Trial 34 finished with value: -388.96473594016675 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.5528259214839937, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 34 with value: -388.96473594016675.
-[I 2024-08-23 10:51:06,076] Trial 35 finished with value: -5539.698232987626 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.6400992020612235, 'descriptor': '{"name": "PathFP", "parameters": {"maxPath": 3, "fpSize": 2048}}'}. Best is trial 34 with value: -388.96473594016675.
-[I 2024-08-23 10:51:06,107] Trial 36 finished with value: -5180.5533034102455 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.8968910439566395, 'descriptor': '{"name": "PathFP", "parameters": {"maxPath": 3, "fpSize": 2048}}'}. Best is trial 34 with value: -388.96473594016675.
-[I 2024-08-23 10:51:06,125] Trial 37 finished with value: -4989.929984864281 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.04458440839692226, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 4.492108041427977, 'descriptor': '{"name": "PathFP", "parameters": {"maxPath": 3, "fpSize": 2048}}'}. Best is trial 34 with value: -388.96473594016675.
-[I 2024-08-23 10:51:06,130] Trial 38 pruned. Duplicate parameter set
-[I 2024-08-23 10:51:06,173] Trial 39 finished with value: -6528.215066535042 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.16700143339733753, 'descriptor': '{"name": "PathFP", "parameters": {"maxPath": 3, "fpSize": 2048}}'}. Best is trial 34 with value: -388.96473594016675.
-[I 2024-08-23 10:51:06,237] Trial 40 finished with value: -4168.7955967552625 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 5, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 8, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 34 with value: -388.96473594016675.
-[I 2024-08-23 10:51:06,292] Trial 41 finished with value: -6177.060727800014 and parameters: {'algorithm_name': 'KNeighborsRegressor', 'KNeighborsRegressor_algorithm_hash': '1709d2c39117ae29f6c9debe7241287b', 'metric__1709d2c39117ae29f6c9debe7241287b': <KNeighborsMetric.MINKOWSKI: 'minkowski'>, 'n_neighbors__1709d2c39117ae29f6c9debe7241287b': 1, 'weights__1709d2c39117ae29f6c9debe7241287b': <KNeighborsWeights.UNIFORM: 'uniform'>, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 34 with value: -388.96473594016675.
-
-
-
-
-
-
-
-Duplicated trial: {'algorithm_name': 'KNeighborsRegressor', 'KNeighborsRegressor_algorithm_hash': '1709d2c39117ae29f6c9debe7241287b', 'metric__1709d2c39117ae29f6c9debe7241287b': <KNeighborsMetric.MINKOWSKI: 'minkowski'>, 'n_neighbors__1709d2c39117ae29f6c9debe7241287b': 8, 'weights__1709d2c39117ae29f6c9debe7241287b': <KNeighborsWeights.UNIFORM: 'uniform'>, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}, return [-4397.330360587512]
-
-
-
-
-
-
-
-[I 2024-08-23 10:51:06,358] Trial 42 finished with value: -3963.9069546583414 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 21, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 8, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "PathFP", "parameters": {"maxPath": 3, "fpSize": 2048}}'}. Best is trial 34 with value: -388.96473594016675.
-[I 2024-08-23 10:51:06,378] Trial 43 finished with value: -5029.6805334166565 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.013186009009851564, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 0.001008958590140135, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 34 with value: -388.96473594016675.
-[I 2024-08-23 10:51:06,421] Trial 44 finished with value: -9300.86840721566 and parameters: {'algorithm_name': 'KNeighborsRegressor', 'KNeighborsRegressor_algorithm_hash': '1709d2c39117ae29f6c9debe7241287b', 'metric__1709d2c39117ae29f6c9debe7241287b': <KNeighborsMetric.MINKOWSKI: 'minkowski'>, 'n_neighbors__1709d2c39117ae29f6c9debe7241287b': 9, 'weights__1709d2c39117ae29f6c9debe7241287b': <KNeighborsWeights.UNIFORM: 'uniform'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 34 with value: -388.96473594016675.
-[I 2024-08-23 10:51:06,441] Trial 45 finished with value: -5029.734620250011 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 83.87968210939489, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 6.382674443425525e-09, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 34 with value: -388.96473594016675.
-[I 2024-08-23 10:51:06,447] Trial 46 pruned. Duplicate parameter set
-[I 2024-08-23 10:51:06,454] Trial 47 pruned. Duplicate parameter set
-[I 2024-08-23 10:51:06,460] Trial 48 pruned. Duplicate parameter set
-[I 2024-08-23 10:51:06,525] Trial 49 finished with value: -3660.9359502556003 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 2, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 7, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "PathFP", "parameters": {"maxPath": 3, "fpSize": 2048}}'}. Best is trial 34 with value: -388.96473594016675.
-[I 2024-08-23 10:51:06,550] Trial 50 finished with value: -688.5244070398325 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.5267860995545326, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 34 with value: -388.96473594016675.
-[I 2024-08-23 10:51:06,572] Trial 51 finished with value: -690.6494438072099 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.8458809314722497, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 34 with value: -388.96473594016675.
-[I 2024-08-23 10:51:06,597] Trial 52 finished with value: -691.1197058420935 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.9167866889210807, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 34 with value: -388.96473594016675.
-[I 2024-08-23 10:51:06,622] Trial 53 finished with value: -691.3111710449325 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.945685900574672, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 34 with value: -388.96473594016675.
-[I 2024-08-23 10:51:06,646] Trial 54 finished with value: -690.9665592812149 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.8936837761725833, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 34 with value: -388.96473594016675.
-
-
-
-
-
-
-
-Duplicated trial: {'algorithm_name': 'KNeighborsRegressor', 'KNeighborsRegressor_algorithm_hash': '1709d2c39117ae29f6c9debe7241287b', 'metric__1709d2c39117ae29f6c9debe7241287b': <KNeighborsMetric.MINKOWSKI: 'minkowski'>, 'n_neighbors__1709d2c39117ae29f6c9debe7241287b': 9, 'weights__1709d2c39117ae29f6c9debe7241287b': <KNeighborsWeights.UNIFORM: 'uniform'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}, return [-9300.86840721566]
-Duplicated trial: {'algorithm_name': 'KNeighborsRegressor', 'KNeighborsRegressor_algorithm_hash': '1709d2c39117ae29f6c9debe7241287b', 'metric__1709d2c39117ae29f6c9debe7241287b': <KNeighborsMetric.MINKOWSKI: 'minkowski'>, 'n_neighbors__1709d2c39117ae29f6c9debe7241287b': 7, 'weights__1709d2c39117ae29f6c9debe7241287b': <KNeighborsWeights.UNIFORM: 'uniform'>, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}, return [-2550.114129318373]
-Duplicated trial: {'algorithm_name': 'KNeighborsRegressor', 'KNeighborsRegressor_algorithm_hash': '1709d2c39117ae29f6c9debe7241287b', 'metric__1709d2c39117ae29f6c9debe7241287b': <KNeighborsMetric.MINKOWSKI: 'minkowski'>, 'n_neighbors__1709d2c39117ae29f6c9debe7241287b': 6, 'weights__1709d2c39117ae29f6c9debe7241287b': <KNeighborsWeights.UNIFORM: 'uniform'>, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}, return [-2602.7561184287083]
-
-
-
-
-
-
-
-[I 2024-08-23 10:51:06,672] Trial 55 finished with value: -688.4682747008223 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.5183865279530455, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 34 with value: -388.96473594016675.
-[I 2024-08-23 10:51:06,695] Trial 56 finished with value: -687.5230947231512 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.3771771681361766, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 34 with value: -388.96473594016675.
-[I 2024-08-23 10:51:06,718] Trial 57 finished with value: -687.4503442069594 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.3663259819415374, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 34 with value: -388.96473594016675.
-[I 2024-08-23 10:51:06,743] Trial 58 finished with value: -686.9553733616618 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.2925652230875628, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 34 with value: -388.96473594016675.
-[I 2024-08-23 10:51:06,766] Trial 59 finished with value: -370.2038330506566 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.3962903248948568, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 59 with value: -370.2038330506566.
-[I 2024-08-23 10:51:06,790] Trial 60 finished with value: -377.25988028857313 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.45237513161879, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 59 with value: -370.2038330506566.
-[I 2024-08-23 10:51:06,814] Trial 61 finished with value: -379.8933285317637 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.4741161933311207, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 59 with value: -370.2038330506566.
-[I 2024-08-23 10:51:06,838] Trial 62 finished with value: -374.50897467366013 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.4290962207409417, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 59 with value: -370.2038330506566.
-[I 2024-08-23 10:51:06,864] Trial 63 finished with value: -376.5588572940058 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.4464295711264585, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 59 with value: -370.2038330506566.
-[I 2024-08-23 10:51:06,890] Trial 64 finished with value: -379.237448916406 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.4687500034684213, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 59 with value: -370.2038330506566.
-[I 2024-08-23 10:51:06,914] Trial 65 finished with value: -375.7474776359051 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.4395650011783436, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 59 with value: -370.2038330506566.
-[I 2024-08-23 10:51:06,941] Trial 66 finished with value: -362.2834906299732 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.3326755354190032, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 66 with value: -362.2834906299732.
-[I 2024-08-23 10:51:06,966] Trial 67 finished with value: -357.3474880122588 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.2887212943233457, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 67 with value: -357.3474880122588.
-[I 2024-08-23 10:51:06,993] Trial 68 finished with value: -354.279045046449 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.2577677164664005, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 68 with value: -354.279045046449.
-[I 2024-08-23 10:51:07,031] Trial 69 finished with value: -347.36894395697703 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.1672928587680225, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 69 with value: -347.36894395697703.
-[I 2024-08-23 10:51:07,069] Trial 70 finished with value: -345.17697390093394 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.1242367255308854, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 70 with value: -345.17697390093394.
-[I 2024-08-23 10:51:07,095] Trial 71 finished with value: -347.74610809299037 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.1728352983905301, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 70 with value: -345.17697390093394.
-[I 2024-08-23 10:51:07,133] Trial 72 finished with value: -345.23464281634324 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.1265380781508565, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 70 with value: -345.17697390093394.
-[I 2024-08-23 10:51:07,171] Trial 73 finished with value: -344.6848312222365 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.0829896313820404, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 73 with value: -344.6848312222365.
-[I 2024-08-23 10:51:07,208] Trial 74 finished with value: -344.9111966504334 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.1070414661080543, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 73 with value: -344.6848312222365.
-[I 2024-08-23 10:51:07,245] Trial 75 finished with value: -344.70116419828565 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.0875643695329498, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 73 with value: -344.6848312222365.
-[I 2024-08-23 10:51:07,271] Trial 76 finished with value: -344.62647974688133 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.0716281620790837, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 76 with value: -344.62647974688133.
-[I 2024-08-23 10:51:07,298] Trial 77 finished with value: -344.6759429204596 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.0456289319914898, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 76 with value: -344.62647974688133.
-[I 2024-08-23 10:51:07,324] Trial 78 finished with value: -343.58131497761616 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.0010195360522613, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 78 with value: -343.58131497761616.
-[I 2024-08-23 10:51:07,351] Trial 79 finished with value: -342.7290581014813 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.9073210715005748, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 79 with value: -342.7290581014813.
-[I 2024-08-23 10:51:07,377] Trial 80 finished with value: -342.67866114080107 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.9166305667100072, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 80 with value: -342.67866114080107.
-[I 2024-08-23 10:51:07,402] Trial 81 finished with value: -342.6440308445311 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.9248722692093634, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 81 with value: -342.6440308445311.
-[I 2024-08-23 10:51:07,430] Trial 82 finished with value: -343.02085648448934 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.8776928646870886, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 81 with value: -342.6440308445311.
-[I 2024-08-23 10:51:07,457] Trial 83 finished with value: -343.1662266300702 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.867592364677856, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 81 with value: -342.6440308445311.
-[I 2024-08-23 10:51:07,484] Trial 84 finished with value: -343.30158716569775 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.8599491178327108, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 81 with value: -342.6440308445311.
-[I 2024-08-23 10:51:07,523] Trial 85 finished with value: -344.2803074848341 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.8396948389352923, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 81 with value: -342.6440308445311.
-[I 2024-08-23 10:51:07,563] Trial 86 finished with value: -344.28301101884045 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.8396651775801683, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 81 with value: -342.6440308445311.
-[I 2024-08-23 10:51:07,589] Trial 87 finished with value: -344.6781906268143 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.8356021935129933, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 81 with value: -342.6440308445311.
-[I 2024-08-23 10:51:07,617] Trial 88 finished with value: -354.0405418264898 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.7430046191126949, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 81 with value: -342.6440308445311.
-[I 2024-08-23 10:51:07,645] Trial 89 finished with value: -342.77203208258476 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.9015965341429055, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 81 with value: -342.6440308445311.
-[I 2024-08-23 10:51:07,684] Trial 90 finished with value: -363.1622720320929 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.6746575663752555, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 81 with value: -342.6440308445311.
-[I 2024-08-23 10:51:07,712] Trial 91 finished with value: -342.7403796626193 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.9057564666836629, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 81 with value: -342.6440308445311.
-[I 2024-08-23 10:51:07,740] Trial 92 finished with value: -342.63579667712696 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.9332275205203372, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 92 with value: -342.63579667712696.
-[I 2024-08-23 10:51:07,767] Trial 93 finished with value: -342.6886425884964 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.9433063264508291, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 92 with value: -342.63579667712696.
-[I 2024-08-23 10:51:07,795] Trial 94 finished with value: -342.9341048659705 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.884739221967487, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 92 with value: -342.63579667712696.
-[I 2024-08-23 10:51:07,823] Trial 95 finished with value: -342.63507445779743 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.9381000493689634, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 95 with value: -342.63507445779743.
-[I 2024-08-23 10:51:07,851] Trial 96 finished with value: -343.06021011302374 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.963138023068903, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 95 with value: -342.63507445779743.
-[I 2024-08-23 10:51:07,879] Trial 97 finished with value: -342.9990546212019 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.9601651093867907, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 95 with value: -342.63507445779743.
-[I 2024-08-23 10:51:07,910] Trial 98 finished with value: -3821.2267845437514 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 2, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 95 with value: -342.63507445779743.
-[I 2024-08-23 10:51:07,938] Trial 99 finished with value: -356.6786067133016 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.721603508336166, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 95 with value: -342.63507445779743.

Visualize optimization progress

-
-
[7]:
-
-
-
import seaborn as sns
sns.set_theme(style="darkgrid")
default_reg_scoring = config.settings.scoring
ax = sns.scatterplot(data=study.trials_dataframe(), x="number", y="value");
ax.set(xlabel="Trial number", ylabel=f"Objective value\n({default_reg_scoring})");
-
-
-
-
-
-
-
-../_images/notebooks_QPTUNA_Tutorial_21_0.png -
-
-
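The scatter plot shows every trial, but a running best can make convergence easier to read. The following is a minimal sketch using a handful of objective values copied (rounded) from the optimization log above; the variable names are illustrative and not part of QPTUNA.

```python
from itertools import accumulate

# Objective values of trials 92-99, copied (rounded) from the optimization log above.
values = [-342.636, -342.689, -342.934, -342.635, -343.060, -342.999, -3821.227, -356.679]

# The study direction is "maximize", so the best-so-far value is a running maximum.
best_so_far = list(accumulate(values, max))
print(best_so_far)
```

With a real study, the same idea can presumably be expressed with pandas as `study.trials_dataframe()["value"].cummax()`.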

Sometimes it is useful to look at the individual CV scores instead of the aggregated score (the mean CV score by default). Here we plot all 3 cross-validation scores (neg_mean_squared_error) for each trial, with folds highlighted in different colors.

-
-
[8]:
-
-
-
cv_test = study.trials_dataframe()["user_attrs_test_scores"].map(lambda d: d[default_reg_scoring])
x = []
y = []
fold = []
# Flatten the per-trial lists of fold scores into parallel lists for plotting.
for i, vs in cv_test.items():
    for idx, v in enumerate(vs):
        x.append(i)
        y.append(v)
        fold.append(idx)
ax = sns.scatterplot(x=x, y=y, hue=fold, style=fold, palette='Set1')
ax.set(xlabel="Trial number", ylabel=f"Objective value\n({default_reg_scoring})");
-
-
-
-
-
-
-
-../_images/notebooks_QPTUNA_Tutorial_23_0.png -
-
-
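To illustrate why the individual fold scores can be informative, here is a small sketch with made-up fold scores (not taken from the study above). Two trials share the same mean score but differ strongly in their spread across folds.

```python
from statistics import mean, stdev

# Hypothetical per-fold scores (neg_mean_squared_error) for two trials, three folds each.
fold_scores = {
    0: [-350.0, -340.0, -330.0],
    1: [-300.0, -420.0, -300.0],
}

# Aggregate each trial into (mean, standard deviation) across its folds.
summary = {trial: (mean(scores), stdev(scores)) for trial, scores in fold_scores.items()}
for trial, (avg, spread) in summary.items():
    print(f"trial {trial}: mean={avg:.1f}, stdev={spread:.1f}")
```

Both trials would look identical in a plot of the aggregated score, while the second one is much less stable from fold to fold.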
-
-

Pick the best trial and build a model for it

-

We pick the best trial, inspect its configuration, build the best model, and save it as a pickled file.

-
-
[9]:
-
-
-
# Get the best Trial from the Study and make a Build (Training) configuration for it.
buildconfig = buildconfig_best(study)
-
-
-
-

Optional: write out JSON of the best configuration.

-
-
[10]:
-
-
-
import apischema
buildconfig_as_dict = apischema.serialize(buildconfig)

import json
print(json.dumps(buildconfig_as_dict, indent=2))
-
-
-
-
-
-
-
-
-{
-  "data": {
-    "training_dataset_file": "../tests/data/DRD2/subset-50/train.csv",
-    "input_column": "canonical",
-    "response_column": "molwt",
-    "response_type": "regression",
-    "deduplication_strategy": {
-      "name": "KeepMedian"
-    },
-    "split_strategy": {
-      "name": "NoSplitting"
-    },
-    "test_dataset_file": "../tests/data/DRD2/subset-50/test.csv",
-    "save_intermediate_files": false,
-    "log_transform": false,
-    "log_transform_base": null,
-    "log_transform_negative": null,
-    "log_transform_unit_conversion": null,
-    "probabilistic_threshold_representation": false,
-    "probabilistic_threshold_representation_threshold": null,
-    "probabilistic_threshold_representation_std": null
-  },
-  "metadata": {
-    "name": "",
-    "cross_validation": 3,
-    "shuffle": false,
-    "best_trial": 95,
-    "best_value": -342.63507445779743,
-    "n_trials": 100,
-    "visualization": null
-  },
-  "descriptor": {
-    "name": "ECFP_counts",
-    "parameters": {
-      "radius": 3,
-      "useFeatures": true,
-      "nBits": 2048
-    }
-  },
-  "settings": {
-    "mode": "regression",
-    "scoring": "neg_mean_squared_error",
-    "direction": "maximize",
-    "n_trials": 100,
-    "tracking_rest_endpoint": null
-  },
-  "algorithm": {
-    "name": "Lasso",
-    "parameters": {
-      "alpha": 0.9381000493689634
-    }
-  },
-  "task": "building"
-}
-
-
-
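The `best_value` in the configuration above is a negated mean squared error, because the scoring is `neg_mean_squared_error` and the direction is `maximize`. As a small sketch, it can be converted into a root mean squared error in the units of the response column. The JSON fragment below is copied from the output above; the variable names are illustrative.

```python
import json
import math

# Fragment of the build configuration printed above.
metadata = json.loads('{"best_trial": 95, "best_value": -342.63507445779743}')

# neg_mean_squared_error is the negated MSE, so flip the sign before taking the root.
cv_rmse = math.sqrt(-metadata["best_value"])
print(f"Trial {metadata['best_trial']}: root of the mean CV MSE = {cv_rmse:.2f}")
```

Note that this is the root of the mean cross-validation MSE, which is not necessarily equal to the mean of the per-fold RMSE values.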

Build (re-Train) and save the best model. This time training uses all training data, without splitting it into cross-validation folds.

-
-
[11]:
-
-
-
best_build = build_best(buildconfig, "../target/best.pkl")
-
-
-
-

We can use the best (or merged) model as follows:

-
-
[12]:
-
-
-
import pickle
with open("../target/best.pkl", "rb") as f:
    model = pickle.load(f)
model.predict_from_smiles(["CCC", "CC(=O)Nc1ccc(O)cc1"])
-
-
-
-
-
[12]:
-
-
-
-
-array([ 67.43103985, 177.99850936])
-
-
-

Now we can explore how well the best model performs on the test (holdout) set.

-
-
[13]:
-
-
-
import pandas as pd

df = pd.read_csv(config.data.test_dataset_file)  # Load test data.

expected = df[config.data.response_column]
predicted = model.predict_from_smiles(df[config.data.input_column])
-
-
-
-
-
[14]:
-
-
-
# Plot expected vs predicted values for the best model.
import matplotlib.pyplot as plt
ax = plt.scatter(expected, predicted)
lims = [expected.min(), expected.max()]
plt.plot(lims, lims)  # Diagonal line.
plt.xlabel(f"Expected {config.data.response_column}");
plt.ylabel(f"Predicted {config.data.response_column}");
-
-
-
-
-
-
-
-../_images/notebooks_QPTUNA_Tutorial_35_0.png -
-
-

We can also calculate custom metrics for the best model:

-
-
[15]:
-
-
-
from sklearn.metrics import (r2_score, mean_squared_error, mean_absolute_error)
import numpy as np

# R2
r2 = r2_score(y_true=expected, y_pred=predicted)

# RMSE: take the square root of the MSE with np.sqrt(), which works regardless of the sklearn version.
rmse = np.sqrt(mean_squared_error(y_true=expected, y_pred=predicted))

# MAE
mae = mean_absolute_error(y_true=expected, y_pred=predicted)

print(f"R2: {r2}, RMSE: {rmse}, Mean absolute error: {mae}")
-
-
-
-
-
-
-
-
-R2: 0.8566354978126369, RMSE: 26.204909888075044, Mean absolute error: 19.298453946973815
-
-
-
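For reference, the three metrics above can be written out from their standard definitions. The sketch below uses made-up expected and predicted values (not the tutorial's data) and plain Python only, so that each term is visible.

```python
import math

# Hypothetical expected and predicted values, for illustration only.
expected = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]

n = len(expected)
residuals = [e - p for e, p in zip(expected, predicted)]
mean_expected = sum(expected) / n

ss_res = sum(r ** 2 for r in residuals)                  # Residual sum of squares.
ss_tot = sum((e - mean_expected) ** 2 for e in expected)  # Total sum of squares.

r2 = 1 - ss_res / ss_tot                   # Coefficient of determination.
rmse = math.sqrt(ss_res / n)               # Root mean squared error.
mae = sum(abs(r) for r in residuals) / n   # Mean absolute error.

print(f"R2: {r2}, RMSE: {rmse}, Mean absolute error: {mae}")
```

RMSE and MAE are in the units of the response (here molecular weight), while R2 is unitless.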

If the metrics look acceptable, the model is ready for use.

-
-
-

Build merged model

-

Now we can merge the train and test data and build (train) the model again. No holdout data will remain to evaluate the model, but the model will hopefully be a little better for having seen a little more data.

-
-
[16]:
-
-
-
# Build (Train) and save the model on the merged train+test data.
build_merged(buildconfig, "../target/merged.pkl")
-
-
-
-
-
-
-

Preprocessing: splitting data into train and test sets, and removing duplicates

-
-

Splitting into train and test dataset

-

QPTUNA can split data into train and test (holdout) datasets. To do so, pass all data as training_dataset_file and choose a splitting strategy. QPTUNA currently supports three splitting strategies: random, temporal and stratified.

-

The random strategy splits data randomly, taking a specified fraction of observations as the test dataset.
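The idea of a random split can be sketched in a few lines of plain Python. This is only an illustration of the concept, not QPTUNA's implementation; the helper `random_split` and its `seed` parameter are hypothetical.

```python
import random

def random_split(rows, fraction, seed=42):
    """Hypothetical helper: hold out `fraction` of the rows, chosen at random, as the test set."""
    rng = random.Random(seed)  # Fixed seed so that the split is reproducible.
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * fraction)
    return shuffled[n_test:], shuffled[:n_test]

rows = list(range(10))  # Stand-in for ten observations.
train, test = random_split(rows, fraction=0.2)
print(len(train), len(test))
```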

-

The temporal strategy takes the first observations as the training dataset and the last specified fraction of observations as the test dataset. The input dataset must already be sorted, from oldest at the beginning to newest at the end. This sorting can be done in any external tool (e.g. Excel).
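A temporal split amounts to slicing an already sorted dataset. The sketch below illustrates the concept with made-up rows; the helper `temporal_split` is hypothetical and is not QPTUNA's implementation.

```python
def temporal_split(rows, fraction):
    """Hypothetical helper: the last `fraction` of the rows (the newest) becomes the test set."""
    n_test = round(len(rows) * fraction)
    split_at = len(rows) - n_test
    return rows[:split_at], rows[split_at:]

# Rows must already be sorted from oldest to newest; the years are made-up example data.
rows = [(2015, "CCO"), (2016, "CCN"), (2017, "CCC"), (2018, "CCCl"), (2019, "CCBr")]
train, test = temporal_split(rows, fraction=0.2)
print(train)
print(test)
```

Because the test set contains only the newest observations, the evaluation mimics predicting future compounds from past data.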

-

The stratified strategy first splits data into bins, and then takes a fraction from each bin as the test dataset. This ensures that the distributions in the train and test data are similar. This is a better strategy if the dataset is imbalanced.
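The binning idea can be sketched as follows. This is an illustration under simplifying assumptions (equal-width bins over the response value, made-up data), not QPTUNA's implementation; the helper `stratified_split` and its parameters are hypothetical.

```python
import random
from collections import defaultdict

def stratified_split(values, fraction, n_bins=2, seed=0):
    """Hypothetical helper: bin the response values, then hold out `fraction` of each bin."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # Avoid a zero bin width if all values are equal.
    bins = defaultdict(list)
    for idx, v in enumerate(values):
        # Equal-width binning; the maximum value is clamped into the last bin.
        bins[min(int((v - lo) / width), n_bins - 1)].append(idx)
    rng = random.Random(seed)
    train, test = [], []
    for b in sorted(bins):
        members = bins[b]
        rng.shuffle(members)
        n_test = round(len(members) * fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return sorted(train), sorted(test)

# Imbalanced toy data: ten low responses and five high responses.
values = [1.0 + 0.1 * i for i in range(10)] + [9.0 + 0.1 * i for i in range(5)]
train_idx, test_idx = stratified_split(values, fraction=0.2)
print(train_idx, test_idx)
```

With this data, two test observations come from the low bin and one from the high bin, so the rare high responses are represented in both train and test data.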

-
-
-

Removing duplicates

-

None of the algorithms QPTUNA supports work with duplicates. Duplicates can come from multiple measurements of the same compound, or from the fact that the molecular descriptors we use all disregard stereochemistry, so even different compounds can end up with identical descriptors. QPTUNA provides several strategies to remove duplicates:
* keep median - collapses all replicates into one median value, which factors in experimental deviation (robust to outliers - recommended)
* keep average - uses all experimental data across all replicates (less robust to outliers than the median)
* keep first / keep last - when the first or the last measurement is the trusted one
* keep max / keep min - when we want to keep the most extreme value out of many
* keep random - when we are agnostic as to which replicate is kept
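The effect of these strategies can be seen on a toy example. The sketch below uses made-up replicate measurements and plain Python; it illustrates the concepts only and is not QPTUNA's implementation.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical replicate measurements: (SMILES, measured value). "CCO" has an outlier.
measurements = [("CCO", 1.0), ("CCO", 1.2), ("CCO", 5.0), ("CCC", 2.0)]

# Group the replicates by compound, preserving measurement order.
replicates = defaultdict(list)
for smiles, value in measurements:
    replicates[smiles].append(value)

keep_median = {s: median(v) for s, v in replicates.items()}
keep_average = {s: mean(v) for s, v in replicates.items()}
keep_first = {s: v[0] for s, v in replicates.items()}
keep_max = {s: max(v) for s, v in replicates.items()}

print(keep_median)
print(keep_average)
```

For "CCO", the median stays at 1.2 while the average is pulled up to about 2.4 by the outlier, which is why the median is described as the more robust choice.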

-
-
-

Configuration example

-
-
[17]:
-
-
-
from optunaz.utils.preprocessing.splitter import Stratified
from optunaz.utils.preprocessing.deduplicator import KeepMedian
# Prepare hyperparameter optimization configuration.
config = OptimizationConfig(
    data=Dataset(
        input_column="canonical",
        response_column="molwt",
        training_dataset_file="../tests/data/DRD2/subset-100/train.csv",  # This will be split into train and test.
        split_strategy=Stratified(fraction=0.2),
        deduplication_strategy=KeepMedian(),
    ),
    descriptors=[
        ECFP.new(),
        ECFP_counts.new(),
        MACCS_keys.new(),
    ],
    algorithms=[
        SVR.new(),
        RandomForestRegressor.new(n_estimators={"low": 5, "high": 10}),
        Ridge.new(),
        Lasso.new(),
        PLSRegression.new(),
    ],
    settings=OptimizationConfig.Settings(
        mode=ModelMode.REGRESSION,
        cross_validation=3,
        n_trials=100,
        n_startup_trials=50,
        direction=OptimizationDirection.MAXIMIZATION,
        track_to_mlflow=False,
    ),
)
-
-
-
-
-
[18]:
-
-
-
study = optimize(config, study_name="my_study_stratified_split")
-
-
-
-
-
-
-
-
-[I 2024-08-23 10:51:10,165] A new study created in memory with name: my_study_stratified_split
-[I 2024-08-23 10:51:10,207] A new study created in memory with name: study_name_0
-[I 2024-08-23 10:51:10,303] Trial 0 finished with value: -261.95269731189177 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.586114272804535, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 0 with value: -261.95269731189177.
-[I 2024-08-23 10:51:10,439] Trial 1 finished with value: -3455.51800700426 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 31, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 7, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 0 with value: -261.95269731189177.
-[I 2024-08-23 10:51:10,460] Trial 2 finished with value: -1856.4459752935309 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 4, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 0 with value: -261.95269731189177.
-[I 2024-08-23 10:51:10,477] Trial 3 finished with value: -1235.3128104073717 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.5613443439636077, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 0 with value: -261.95269731189177.
-[I 2024-08-23 10:51:10,495] Trial 4 finished with value: -3949.4997740833423 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 11.259060787354118, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 0.06151214721649829, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 0 with value: -261.95269731189177.
-[I 2024-08-23 10:51:10,560] Trial 5 finished with value: -3258.3324669641333 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 9, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 9, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 0 with value: -261.95269731189177.
-[I 2024-08-23 10:51:10,580] Trial 6 finished with value: -281.6313215642597 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.821793264230599, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 0 with value: -261.95269731189177.
-[I 2024-08-23 10:51:10,597] Trial 7 finished with value: -2756.046839500092 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 2, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 0 with value: -261.95269731189177.
-[I 2024-08-23 10:51:10,615] Trial 8 finished with value: -2720.793752592223 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 3, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 0 with value: -261.95269731189177.
-[I 2024-08-23 10:51:10,631] Trial 9 finished with value: -3949.4702695112846 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.11028790699101433, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 0.001202131310186554, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 0 with value: -261.95269731189177.
-[I 2024-08-23 10:51:10,710] Trial 10 finished with value: -2695.2514836330784 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 2, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 0 with value: -261.95269731189177.
-[I 2024-08-23 10:51:10,731] Trial 11 finished with value: -1688.7128683041683 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.1044548905141272, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 0 with value: -261.95269731189177.
-[I 2024-08-23 10:51:10,746] Trial 12 finished with value: -2658.13214897931 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 4, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 0 with value: -261.95269731189177.
-[I 2024-08-23 10:51:10,813] Trial 13 finished with value: -1948.0314425327626 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 11, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 10, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 0 with value: -261.95269731189177.
-[I 2024-08-23 10:51:10,831] Trial 14 finished with value: -1332.6840893052315 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.8033739312636219, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 0 with value: -261.95269731189177.
-[I 2024-08-23 10:51:10,848] Trial 15 finished with value: -279.7730407032913 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.978415570131035, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 0 with value: -261.95269731189177.
-[I 2024-08-23 10:51:10,864] Trial 16 finished with value: -3949.4997740833423 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 46.380966239365776, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 1.5380266414879525e-08, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 0 with value: -261.95269731189177.
-[I 2024-08-23 10:51:10,882] Trial 17 finished with value: -3949.4997740833423 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 66.39037036873405, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 7.13170545295199e-10, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 0 with value: -261.95269731189177.
-[I 2024-08-23 10:51:10,898] Trial 18 finished with value: -3949.4997740833423 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 31.347685324232952, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 5.782238919549724e-08, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 0 with value: -261.95269731189177.
-[I 2024-08-23 10:51:10,902] Trial 19 pruned. Duplicate parameter set
-[I 2024-08-23 10:51:10,920] Trial 20 finished with value: -3949.4997657609406 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.004091119479264935, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 3.7670039814136804e-07, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 0 with value: -261.95269731189177.
-[I 2024-08-23 10:51:10,939] Trial 21 finished with value: -3949.4997709689146 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.020406357580717727, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 3.826432657033465e-08, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 0 with value: -261.95269731189177.
-[I 2024-08-23 10:51:10,956] Trial 22 finished with value: -3942.5257596151837 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.00017594354214526438, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 2.3221158157501884, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 0 with value: -261.95269731189177.
-[I 2024-08-23 10:51:10,972] Trial 23 finished with value: -1775.55204856041 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 5, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 0 with value: -261.95269731189177.
-[I 2024-08-23 10:51:10,989] Trial 24 finished with value: -3949.4997740833423 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 55.9426790782418, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 0.28574770987033293, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 0 with value: -261.95269731189177.
-[I 2024-08-23 10:51:11,006] Trial 25 finished with value: -1254.9841129079468 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.0742769549097546, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 0 with value: -261.95269731189177.
-[I 2024-08-23 10:51:11,073] Trial 26 finished with value: -3455.51800700426 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 15, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 7, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 0 with value: -261.95269731189177.
-[I 2024-08-23 10:51:11,090] Trial 27 finished with value: -1249.5519579928275 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.26166764283582, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 0 with value: -261.95269731189177.
-Duplicated trial: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 4, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}, return [-1856.4459752935309]
-[I 2024-08-23 10:51:11,122] Trial 28 finished with value: -3949.4997740490603 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.1551243322855379, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 2.869530660905885e-06, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 0 with value: -261.95269731189177.
-[I 2024-08-23 10:51:11,141] Trial 29 finished with value: -3949.4997740833423 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 13.613671789797623, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 0.18743805815241568, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 0 with value: -261.95269731189177.
-[I 2024-08-23 10:51:11,160] Trial 30 finished with value: -3949.4997740057183 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.0006647090582038176, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 8.720240531591189e-10, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 0 with value: -261.95269731189177.
-[I 2024-08-23 10:51:11,188] Trial 31 finished with value: -236.75701162742902 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.15225101226627, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 31 with value: -236.75701162742902.
-[I 2024-08-23 10:51:11,207] Trial 32 finished with value: -2726.0476769808097 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 4, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 31 with value: -236.75701162742902.
-[I 2024-08-23 10:51:11,225] Trial 33 finished with value: -3949.4997740833423 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 72.98897579737036, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 1.3434026346873007e-05, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 31 with value: -236.75701162742902.
-[I 2024-08-23 10:51:11,292] Trial 34 finished with value: -3596.741420193717 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 3, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 7, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 31 with value: -236.75701162742902.
-[I 2024-08-23 10:51:11,311] Trial 35 finished with value: -1242.8479265462504 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.3806781553300398, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 31 with value: -236.75701162742902.
-[I 2024-08-23 10:51:11,343] Trial 36 finished with value: -3949.4997740833423 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 22.148057819462277, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 2.1220747475846438e-10, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 31 with value: -236.75701162742902.
-[I 2024-08-23 10:51:11,410] Trial 37 finished with value: -2906.3484169581293 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 31, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 10, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 31 with value: -236.75701162742902.
-[I 2024-08-23 10:51:11,474] Trial 38 finished with value: -2182.2854817163393 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 8, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 7, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 31 with value: -236.75701162742902.
-[I 2024-08-23 10:51:11,494] Trial 39 finished with value: -1682.7555601297397 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.7692026965764096, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 31 with value: -236.75701162742902.
-[I 2024-08-23 10:51:11,527] Trial 40 finished with value: -1885.3761105075926 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.8499325582942474, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 31 with value: -236.75701162742902.
-[I 2024-08-23 10:51:11,560] Trial 41 finished with value: -3949.7934477837753 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.26437658363366806, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 2.399729524954495, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 31 with value: -236.75701162742902.
-[I 2024-08-23 10:51:11,580] Trial 42 finished with value: -3949.4996545768313 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.03999295021459913, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 9.487454158254508e-07, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 31 with value: -236.75701162742902.
-[I 2024-08-23 10:51:11,648] Trial 43 finished with value: -2279.772434063323 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 8, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 6, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 31 with value: -236.75701162742902.
-[I 2024-08-23 10:51:11,668] Trial 44 finished with value: -1686.497519225056 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.9798152426640634, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 31 with value: -236.75701162742902.
-[I 2024-08-23 10:51:11,700] Trial 45 finished with value: -1734.418175645478 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.1638039970995402, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 31 with value: -236.75701162742902.
-[I 2024-08-23 10:51:11,720] Trial 46 finished with value: -2641.7637473751115 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 3, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 31 with value: -236.75701162742902.
-[I 2024-08-23 10:51:11,726] Trial 47 pruned. Duplicate parameter set
-[I 2024-08-23 10:51:11,732] Trial 48 pruned. Duplicate parameter set
-[I 2024-08-23 10:51:11,751] Trial 49 finished with value: -3949.4997740833387 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.6381698921109232, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 4.0041111342254524e-10, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 31 with value: -236.75701162742902.
-[I 2024-08-23 10:51:11,774] Trial 50 finished with value: -279.8133725349282 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.9498873960159637, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 31 with value: -236.75701162742902.
-[I 2024-08-23 10:51:11,797] Trial 51 finished with value: -279.7735399551454 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.9780612881075026, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 31 with value: -236.75701162742902.
-[I 2024-08-23 10:51:11,822] Trial 52 finished with value: -279.8160071680375 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.9480302946593064, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 31 with value: -236.75701162742902.
-[I 2024-08-23 10:51:11,843] Trial 53 finished with value: -279.75592491031455 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.9905788379110985, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 31 with value: -236.75701162742902.
-[I 2024-08-23 10:51:11,866] Trial 54 finished with value: -279.7455012578744 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.998003047330558, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 31 with value: -236.75701162742902.
-[I 2024-08-23 10:51:11,891] Trial 55 finished with value: -265.5420569489236 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.731766992009085, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 31 with value: -236.75701162742902.
-[I 2024-08-23 10:51:11,925] Trial 56 finished with value: -268.5837896907764 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.8088769537936915, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 31 with value: -236.75701162742902.
-Duplicated trial: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 4, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}, return [-1856.4459752935309]
-Duplicated trial: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 3, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}, return [-2720.793752592223]
-[I 2024-08-23 10:51:11,961] Trial 57 finished with value: -270.2819638853734 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.8381998920032558, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 31 with value: -236.75701162742902.
-[I 2024-08-23 10:51:11,986] Trial 58 finished with value: -270.5370756433875 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.8423274630826914, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 31 with value: -236.75701162742902.
-[I 2024-08-23 10:51:12,009] Trial 59 finished with value: -268.6648549453774 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.8105870658571404, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 31 with value: -236.75701162742902.
-[I 2024-08-23 10:51:12,034] Trial 60 finished with value: -267.33331573420924 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.7812473578529109, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 31 with value: -236.75701162742902.
-[I 2024-08-23 10:51:12,058] Trial 61 finished with value: -267.54266068640237 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.7865813805833881, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 31 with value: -236.75701162742902.
-[I 2024-08-23 10:51:12,084] Trial 62 finished with value: -264.7374685857255 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.7033718535925544, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 31 with value: -236.75701162742902.
-[I 2024-08-23 10:51:12,108] Trial 63 finished with value: -259.06013832754854 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.530230339296878, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 31 with value: -236.75701162742902.
-[I 2024-08-23 10:51:12,134] Trial 64 finished with value: -254.45098385749847 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.4449038663990794, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 31 with value: -236.75701162742902.
-[I 2024-08-23 10:51:12,173] Trial 65 finished with value: -252.46933447499055 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.4118163766482357, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 31 with value: -236.75701162742902.
-[I 2024-08-23 10:51:12,211] Trial 66 finished with value: -250.400610865412 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.3751234485145145, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 31 with value: -236.75701162742902.
-[I 2024-08-23 10:51:12,248] Trial 67 finished with value: -250.55142126098917 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.3782312158054713, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 31 with value: -236.75701162742902.
-[I 2024-08-23 10:51:12,285] Trial 68 finished with value: -249.939720762774 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.3654947227249123, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 31 with value: -236.75701162742902.
-[I 2024-08-23 10:51:12,324] Trial 69 finished with value: -248.51859203038146 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.3389300472278876, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 31 with value: -236.75701162742902.
-[I 2024-08-23 10:51:12,362] Trial 70 finished with value: -247.97854622218964 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.3291551817821483, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 31 with value: -236.75701162742902.
-[I 2024-08-23 10:51:12,400] Trial 71 finished with value: -248.82875013698148 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.344470263084507, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 31 with value: -236.75701162742902.
-[I 2024-08-23 10:51:12,437] Trial 72 finished with value: -245.89404238932664 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.2945916241697062, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 31 with value: -236.75701162742902.
-[I 2024-08-23 10:51:12,475] Trial 73 finished with value: -245.88444574036467 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.294423585518049, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 31 with value: -236.75701162742902.
-[I 2024-08-23 10:51:12,512] Trial 74 finished with value: -242.7250858383981 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.2374026497690556, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 31 with value: -236.75701162742902.
-[I 2024-08-23 10:51:12,550] Trial 75 finished with value: -241.38995410978927 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.2216216823389983, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 31 with value: -236.75701162742902.
-[I 2024-08-23 10:51:12,588] Trial 76 finished with value: -237.75773569374167 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.1685257395531474, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 31 with value: -236.75701162742902.
-[I 2024-08-23 10:51:12,627] Trial 77 finished with value: -238.9591618086847 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.1871718734668695, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 31 with value: -236.75701162742902.
-[I 2024-08-23 10:51:12,667] Trial 78 finished with value: -238.16615158396067 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.1749827907502346, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 31 with value: -236.75701162742902.
-[I 2024-08-23 10:51:12,705] Trial 79 finished with value: -236.87253431766433 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.1541307320952652, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 31 with value: -236.75701162742902.
-[I 2024-08-23 10:51:12,744] Trial 80 finished with value: -234.54432497574712 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.1118826871769896, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 80 with value: -234.54432497574712.
-[I 2024-08-23 10:51:12,783] Trial 81 finished with value: -234.500370208023 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.1109739335876776, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 81 with value: -234.500370208023.
-[I 2024-08-23 10:51:12,820] Trial 82 finished with value: -227.05479722761888 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.9557242358902104, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 82 with value: -227.05479722761888.
-[I 2024-08-23 10:51:12,859] Trial 83 finished with value: -226.75744778941316 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.9492111894491083, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 83 with value: -226.75744778941316.
-[I 2024-08-23 10:51:12,900] Trial 84 finished with value: -226.23291544469544 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.9371731029417466, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 84 with value: -226.23291544469544.
-[I 2024-08-23 10:51:12,940] Trial 85 finished with value: -225.8608948363877 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.9283493142218042, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 85 with value: -225.8608948363877.
-[I 2024-08-23 10:51:12,981] Trial 86 finished with value: -223.79470023518647 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.8924342905874942, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 86 with value: -223.79470023518647.
-[I 2024-08-23 10:51:13,021] Trial 87 finished with value: -225.04998656317707 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.914666812200899, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 86 with value: -223.79470023518647.
-[I 2024-08-23 10:51:13,061] Trial 88 finished with value: -223.3255192874075 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.8835546302939554, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 88 with value: -223.3255192874075.
-[I 2024-08-23 10:51:13,099] Trial 89 finished with value: -224.5251215421697 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.9055752025758994, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 88 with value: -223.3255192874075.
-[I 2024-08-23 10:51:13,140] Trial 90 finished with value: -224.49998736816636 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.9051359167587669, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 88 with value: -223.3255192874075.
-[I 2024-08-23 10:51:13,180] Trial 91 finished with value: -224.40892455090952 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.9035104089361927, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 88 with value: -223.3255192874075.
-[I 2024-08-23 10:51:13,220] Trial 92 finished with value: -225.1896344823456 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.9170686489009984, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 88 with value: -223.3255192874075.
-[I 2024-08-23 10:51:13,249] Trial 93 finished with value: -217.03166841852928 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.6196471760805207, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 93 with value: -217.03166841852928.
-[I 2024-08-23 10:51:13,289] Trial 94 finished with value: -217.15921264908027 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.6137087588616705, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 93 with value: -217.03166841852928.
-[I 2024-08-23 10:51:13,330] Trial 95 finished with value: -216.30437532378687 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.6380025600331409, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 95 with value: -216.30437532378687.
-[I 2024-08-23 10:51:13,372] Trial 96 finished with value: -215.71630507323695 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.663588879747517, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 96 with value: -215.71630507323695.
-[I 2024-08-23 10:51:13,411] Trial 97 finished with value: -216.1690838497103 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.641298655189503, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 96 with value: -215.71630507323695.
-[I 2024-08-23 10:51:13,450] Trial 98 finished with value: -215.79332895242592 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.654362351183699, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 96 with value: -215.71630507323695.
-[I 2024-08-23 10:51:13,491] Trial 99 finished with value: -215.737398967865 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.6605844367915987, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 96 with value: -215.71630507323695.

Choosing scoring function

-

By default, QPTUNA uses neg_mean_squared_error for regression and roc_auc for classification. You can switch to any other scoring function supported by scikit-learn (https://scikit-learn.org/stable/modules/model_evaluation.html), amongst others:

-
-
[19]:
-
-
-
from optunaz import objective
list(objective.regression_scores) + list(objective.classification_scores)
-
-
-
-
-
[19]:
-
-
-
-
-['explained_variance',
- 'max_error',
- 'neg_mean_absolute_error',
- 'neg_mean_squared_error',
- 'neg_median_absolute_error',
- 'r2',
- 'accuracy',
- 'average_precision',
- 'balanced_accuracy',
- 'f1',
- 'f1_macro',
- 'f1_micro',
- 'f1_weighted',
- 'jaccard',
- 'jaccard_macro',
- 'jaccard_micro',
- 'jaccard_weighted',
- 'neg_brier_score',
- 'precision',
- 'precision_macro',
- 'precision_micro',
- 'precision_weighted',
- 'recall',
- 'recall_macro',
- 'recall_micro',
- 'recall_weighted',
- 'roc_auc',
- 'auc_pr_cal',
- 'bedroc',
- 'concordance_index']
-
-
-
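Several of the regression scores listed above carry a neg_ prefix. The sketch below, in plain Python with no QPTUNA or scikit-learn imports, illustrates the convention behind that prefix: error metrics are negated so that a higher score is always better, which is why the optimization logs show negative values for neg_mean_squared_error. The helper functions and the toy molecular-weight values are hypothetical and serve only as an illustration. They are not part of the QPTUNA API.

```python
# Illustrative only: hand-written metrics, not the scikit-learn implementations.

def mean_squared_error(y_true, y_pred):
    """Average squared difference between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical molecular weights and two sets of predictions.
y_true = [300.0, 350.0, 400.0, 450.0]
predictions = {
    "good_model": [310.0, 340.0, 405.0, 445.0],
    "poor_model": [380.0, 380.0, 380.0, 380.0],
}

scores = {
    name: {
        # Negating the error turns "lower is better" into "higher is better".
        "neg_mean_squared_error": -mean_squared_error(y_true, y_pred),
        "r2": r2_score(y_true, y_pred),
    }
    for name, y_pred in predictions.items()
}

for name, model_scores in scores.items():
    print(name, model_scores)
```

Under both scores the better model receives the larger value, so both can be maximized. An r2 below zero, as for the poor model here, means the predictions are worse than always predicting the mean of the observed values.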

This value can be set using settings.scoring:

-
-
[20]:
-
-
-
config = OptimizationConfig(
    data=Dataset(
        input_column="canonical",
        response_column="molwt",
        training_dataset_file="../tests/data/DRD2/subset-100/train.csv",
    ),
    descriptors=[
        ECFP.new(),
        ECFP_counts.new(),
        MACCS_keys.new(),
    ],
    algorithms=[
        SVR.new(),
        RandomForestRegressor.new(n_estimators={"low": 5, "high": 10}),
        Ridge.new(),
        Lasso.new(),
        PLSRegression.new(),
    ],
    settings=OptimizationConfig.Settings(
        mode=ModelMode.REGRESSION,
        cross_validation=3,
        n_trials=100,
        n_startup_trials=50,
        random_seed=42,
        scoring="r2",  # Scoring function name from scikit-learn.
        direction=OptimizationDirection.MAXIMIZATION,
        track_to_mlflow=False,
    ),
)
-
-
-
-
-
[21]:
-
-
-
study = optimize(config, study_name="my_study_r2")
-
-
-
-
-
-
-
-
-[I 2024-08-23 10:51:14,439] A new study created in memory with name: my_study_r2
-[I 2024-08-23 10:51:14,441] A new study created in memory with name: study_name_0
-[I 2024-08-23 10:51:14,590] Trial 0 finished with value: -0.01117186866515977 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 6, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 5, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 0 with value: -0.01117186866515977.
-[I 2024-08-23 10:51:14,657] Trial 1 finished with value: -0.08689402230378156 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 7, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 6, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 0 with value: -0.01117186866515977.
-[I 2024-08-23 10:51:14,797] Trial 2 finished with value: -0.12553701248394863 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 5.141096648805748, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 2.4893466963980463e-08, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 0 with value: -0.01117186866515977.
-[I 2024-08-23 10:51:14,922] Trial 3 finished with value: 0.3039309544203818 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 5, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 3 with value: 0.3039309544203818.
-[I 2024-08-23 10:51:14,936] Trial 4 finished with value: 0.20182749628697164 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 3, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 3 with value: 0.3039309544203818.
-[I 2024-08-23 10:51:14,957] Trial 5 finished with value: 0.8187194367176578 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.7896547008552977, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 5 with value: 0.8187194367176578.
-[I 2024-08-23 10:51:14,979] Trial 6 finished with value: 0.4647239019719945 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.6574750183038587, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 5 with value: 0.8187194367176578.
-[I 2024-08-23 10:51:15,009] Trial 7 finished with value: 0.8614818478547979 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.3974313630683448, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 7 with value: 0.8614818478547979.
-[I 2024-08-23 10:51:15,086] Trial 8 finished with value: -0.12769795082909816 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 28, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 8, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 7 with value: 0.8614818478547979.
-[I 2024-08-23 10:51:15,127] Trial 9 finished with value: 0.8639946428338224 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.2391884918766034, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 9 with value: 0.8639946428338224.
-[I 2024-08-23 10:51:15,157] Trial 10 finished with value: -0.12553701248377633 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.00044396482429275296, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 2.3831436879125245e-10, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 9 with value: 0.8639946428338224.
-[I 2024-08-23 10:51:15,186] Trial 11 finished with value: -0.12553700871203702 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.00028965395242758657, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 2.99928292425642e-07, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 9 with value: 0.8639946428338224.
-[I 2024-08-23 10:51:15,202] Trial 12 finished with value: 0.2935582042429075 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 4, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 9 with value: 0.8639946428338224.
-[I 2024-08-23 10:51:15,219] Trial 13 finished with value: 0.18476333152695587 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 2, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 9 with value: 0.8639946428338224.
-[I 2024-08-23 10:51:15,236] Trial 14 finished with value: 0.8190707459213998 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.4060379177903557, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 9 with value: 0.8639946428338224.
-[I 2024-08-23 10:51:15,301] Trial 15 finished with value: 0.12206148974315863 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 20, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 8, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 9 with value: 0.8639946428338224.
-[I 2024-08-23 10:51:15,318] Trial 16 finished with value: 0.3105263811279067 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.344271094811757, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 9 with value: 0.8639946428338224.
-[I 2024-08-23 10:51:15,336] Trial 17 finished with value: 0.3562469062424869 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.670604991178476, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 9 with value: 0.8639946428338224.
-[I 2024-08-23 10:51:15,401] Trial 18 finished with value: 0.045959695906983344 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 22, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 6, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 9 with value: 0.8639946428338224.
-[I 2024-08-23 10:51:15,432] Trial 19 finished with value: 0.8583939656024446 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.5158832554303112, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 9 with value: 0.8639946428338224.
-[I 2024-08-23 10:51:15,448] Trial 20 finished with value: 0.3062574078515544 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 4, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 9 with value: 0.8639946428338224.
-[I 2024-08-23 10:51:15,479] Trial 21 finished with value: -0.11657354998283716 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.0009327650919528738, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 6.062479210472502, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 9 with value: 0.8639946428338224.
-[I 2024-08-23 10:51:15,483] Trial 22 pruned. Duplicate parameter set
-[I 2024-08-23 10:51:15,502] Trial 23 finished with value: 0.8498478905829554 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.1366172066709432, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 9 with value: 0.8639946428338224.
-[I 2024-08-23 10:51:15,572] Trial 24 finished with value: -0.12769795082909816 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 26, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 8, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 9 with value: 0.8639946428338224.
-[I 2024-08-23 10:51:15,602] Trial 25 finished with value: -0.13519830637607919 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 43.92901911959232, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 27.999026012594694, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 9 with value: 0.8639946428338224.
-[I 2024-08-23 10:51:15,621] Trial 26 finished with value: 0.8198078293055633 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.5888977841391714, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 9 with value: 0.8639946428338224.
-[I 2024-08-23 10:51:15,640] Trial 27 finished with value: 0.8201573964824842 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.19435298754153707, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 9 with value: 0.8639946428338224.
-
-
-
-
-
-
-
-Duplicated trial: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 4, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}, return [0.2935582042429075]
-
-
-
-
-
-
-
-[I 2024-08-23 10:51:15,706] Trial 28 finished with value: 0.04595969590698312 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 13, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 6, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 9 with value: 0.8639946428338224.
-[I 2024-08-23 10:51:15,736] Trial 29 finished with value: -0.12553701248394863 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 1.6285506249643193, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 0.35441495011256785, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 9 with value: 0.8639946428338224.
-[I 2024-08-23 10:51:15,803] Trial 30 finished with value: 0.11934070343348317 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 10, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 8, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 9 with value: 0.8639946428338224.
-[I 2024-08-23 10:51:15,822] Trial 31 finished with value: 0.4374125584543907 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.2457809516380005, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 9 with value: 0.8639946428338224.
-[I 2024-08-23 10:51:15,852] Trial 32 finished with value: 0.3625576518621392 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.6459129458824919, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 9 with value: 0.8639946428338224.
-[I 2024-08-23 10:51:15,871] Trial 33 finished with value: 0.36175556871883746 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.8179058888285398, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 9 with value: 0.8639946428338224.
-[I 2024-08-23 10:51:15,876] Trial 34 pruned. Duplicate parameter set
-[I 2024-08-23 10:51:15,896] Trial 35 finished with value: 0.8202473217121523 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.0920052840435055, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 9 with value: 0.8639946428338224.
-[I 2024-08-23 10:51:15,913] Trial 36 finished with value: 0.3672927879319306 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.8677032984759461, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 9 with value: 0.8639946428338224.
-[I 2024-08-23 10:51:15,918] Trial 37 pruned. Duplicate parameter set
-[I 2024-08-23 10:51:15,938] Trial 38 finished with value: 0.40076792599874356 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.2865764368847064, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 9 with value: 0.8639946428338224.
-[I 2024-08-23 10:51:15,996] Trial 39 finished with value: 0.26560316846701765 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 5, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 5, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 9 with value: 0.8639946428338224.
-[I 2024-08-23 10:51:16,064] Trial 40 finished with value: 0.41215254857081174 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 5, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 9, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 9 with value: 0.8639946428338224.
-[I 2024-08-23 10:51:16,069] Trial 41 pruned. Duplicate parameter set
-
-
-
-
-
-
-
-Duplicated trial: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 4, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}, return [0.2935582042429075]
-Duplicated trial: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 4, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}, return [0.3062574078515544]
-Duplicated trial: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 5, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}, return [0.3039309544203818]
-
-
-
-
-
-
-
-[I 2024-08-23 10:51:16,221] Trial 42 finished with value: -0.004614143721600701 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 25, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 5, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 9 with value: 0.8639946428338224.
-[I 2024-08-23 10:51:16,240] Trial 43 finished with value: 0.27282533524183633 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 2, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 9 with value: 0.8639946428338224.
-[I 2024-08-23 10:51:16,322] Trial 44 finished with value: -0.10220127407364972 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 22, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 9, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 9 with value: 0.8639946428338224.
-[I 2024-08-23 10:51:16,342] Trial 45 finished with value: 0.30323404130582854 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 3, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 9 with value: 0.8639946428338224.
-[I 2024-08-23 10:51:16,362] Trial 46 finished with value: 0.3044553805553568 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.6437201185807124, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 9 with value: 0.8639946428338224.
-[I 2024-08-23 10:51:16,382] Trial 47 finished with value: -0.12553701248394863 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 82.41502276709562, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 0.10978379088847677, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 9 with value: 0.8639946428338224.
-[I 2024-08-23 10:51:16,402] Trial 48 finished with value: 0.36160209098547913 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.022707289534838138, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 9 with value: 0.8639946428338224.
-[I 2024-08-23 10:51:16,423] Trial 49 finished with value: 0.2916101445983833 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 3, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 9 with value: 0.8639946428338224.
-/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qptuna-_QsKTRFT-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 3.936e+02, tolerance: 4.782e+01
-  model = cd_fast.enet_coordinate_descent(
-/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qptuna-_QsKTRFT-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 3.434e+02, tolerance: 4.977e+01
-  model = cd_fast.enet_coordinate_descent(
-[I 2024-08-23 10:51:16,496] Trial 50 finished with value: 0.8609413020928532 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.04987590926279814, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 9 with value: 0.8639946428338224.
-/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qptuna-_QsKTRFT-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 2.794e+02, tolerance: 4.977e+01
-  model = cd_fast.enet_coordinate_descent(
-/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qptuna-_QsKTRFT-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 1.830e+02, tolerance: 4.906e+01
-  model = cd_fast.enet_coordinate_descent(
-/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qptuna-_QsKTRFT-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 2.578e+02, tolerance: 4.782e+01
-  model = cd_fast.enet_coordinate_descent(
-[I 2024-08-23 10:51:16,580] Trial 51 finished with value: 0.8610289662757457 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.019211413400468974, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 9 with value: 0.8639946428338224.
-/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qptuna-_QsKTRFT-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 2.754e+02, tolerance: 4.977e+01
-  model = cd_fast.enet_coordinate_descent(
-/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qptuna-_QsKTRFT-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 1.843e+02, tolerance: 4.906e+01
-  model = cd_fast.enet_coordinate_descent(
-/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qptuna-_QsKTRFT-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 2.507e+02, tolerance: 4.782e+01
-  model = cd_fast.enet_coordinate_descent(
-[I 2024-08-23 10:51:16,668] Trial 52 finished with value: 0.8610070549049179 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.018492644772509947, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 9 with value: 0.8639946428338224.
-/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qptuna-_QsKTRFT-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 1.840e+02, tolerance: 4.977e+01
-  model = cd_fast.enet_coordinate_descent(
-/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qptuna-_QsKTRFT-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 1.924e+02, tolerance: 4.906e+01
-  model = cd_fast.enet_coordinate_descent(
-/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qptuna-_QsKTRFT-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 1.513e+02, tolerance: 4.782e+01
-  model = cd_fast.enet_coordinate_descent(
-[I 2024-08-23 10:51:16,754] Trial 53 finished with value: 0.8569771623635769 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.008783442408928633, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 9 with value: 0.8639946428338224.
-/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qptuna-_QsKTRFT-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 4.243e+02, tolerance: 4.782e+01
-  model = cd_fast.enet_coordinate_descent(
-/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qptuna-_QsKTRFT-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 2.014e+02, tolerance: 4.977e+01
-  model = cd_fast.enet_coordinate_descent(
-[I 2024-08-23 10:51:16,825] Trial 54 finished with value: 0.8624781673814641 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.05782221001517797, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 9 with value: 0.8639946428338224.
-/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qptuna-_QsKTRFT-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 3.113e+02, tolerance: 4.977e+01
-  model = cd_fast.enet_coordinate_descent(
-/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qptuna-_QsKTRFT-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 2.935e+02, tolerance: 4.782e+01
-  model = cd_fast.enet_coordinate_descent(
-/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qptuna-_QsKTRFT-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 2.122e+02, tolerance: 4.906e+01
-  model = cd_fast.enet_coordinate_descent(
-[I 2024-08-23 10:51:16,900] Trial 55 finished with value: 0.8618589507037001 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.02487072255316275, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 9 with value: 0.8639946428338224.
-[I 2024-08-23 10:51:16,960] Trial 56 finished with value: 0.864754359721037 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.2079910754941946, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 56 with value: 0.864754359721037.
-[I 2024-08-23 10:51:16,998] Trial 57 finished with value: 0.8622236413326235 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.333215560931422, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 56 with value: 0.864754359721037.
-[I 2024-08-23 10:51:17,034] Trial 58 finished with value: 0.861832165638517 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.3628098560209365, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 56 with value: 0.864754359721037.
-[I 2024-08-23 10:51:17,072] Trial 59 finished with value: 0.8620108533993581 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.34240779695521706, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 56 with value: 0.864754359721037.
-[I 2024-08-23 10:51:17,120] Trial 60 finished with value: 0.8638540565650902 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.26493714991266293, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 56 with value: 0.864754359721037.
-[I 2024-08-23 10:51:17,171] Trial 61 finished with value: 0.8629799500771645 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.30596394512914815, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 56 with value: 0.864754359721037.
-[I 2024-08-23 10:51:17,210] Trial 62 finished with value: 0.8621408609583922 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.33648829357762355, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 56 with value: 0.864754359721037.
-[I 2024-08-23 10:51:17,259] Trial 63 finished with value: 0.8638132124078156 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.2679814646317183, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 56 with value: 0.864754359721037.
-[I 2024-08-23 10:51:17,319] Trial 64 finished with value: 0.863983758876634 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.24062119162159595, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 56 with value: 0.864754359721037.
-[I 2024-08-23 10:51:17,370] Trial 65 finished with value: 0.8627356047945115 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.3141728910335158, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 56 with value: 0.864754359721037.
-[I 2024-08-23 10:51:17,421] Trial 66 finished with value: 0.8639203054085788 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.23391390640786494, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 56 with value: 0.864754359721037.
-[I 2024-08-23 10:51:17,460] Trial 67 finished with value: 0.8570103863991635 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.6124885145996103, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 56 with value: 0.864754359721037.
-[I 2024-08-23 10:51:17,532] Trial 68 finished with value: 0.8647961976727571 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.2059976546070975, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 68 with value: 0.8647961976727571.
-[I 2024-08-23 10:51:17,591] Trial 69 finished with value: 0.8648312544921793 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.20266060662750784, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 69 with value: 0.8648312544921793.
-[I 2024-08-23 10:51:17,653] Trial 70 finished with value: 0.8648431452862716 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.20027647978240445, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 70 with value: 0.8648431452862716.
-[I 2024-08-23 10:51:17,715] Trial 71 finished with value: 0.8648491459660418 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.1968919999787333, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: 0.8648491459660418.
-[I 2024-08-23 10:51:17,778] Trial 72 finished with value: 0.8650873115156988 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.174598921162764, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 72 with value: 0.8650873115156988.
-[I 2024-08-23 10:51:17,855] Trial 73 finished with value: 0.8650350577921149 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.16468002989641095, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 72 with value: 0.8650873115156988.
-[I 2024-08-23 10:51:17,928] Trial 74 finished with value: 0.8649412283687147 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.1606717091615047, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 72 with value: 0.8650873115156988.
-/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qptuna-_QsKTRFT-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 4.986e+01, tolerance: 4.782e+01
-  model = cd_fast.enet_coordinate_descent(
-[I 2024-08-23 10:51:18,015] Trial 75 finished with value: 0.8649537211609554 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.14694925097689848, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 72 with value: 0.8650873115156988.
-[I 2024-08-23 10:51:18,090] Trial 76 finished with value: 0.8649734575435447 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.147612713300643, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 72 with value: 0.8650873115156988.
-/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qptuna-_QsKTRFT-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 6.446e+01, tolerance: 4.782e+01
-  model = cd_fast.enet_coordinate_descent(
-[I 2024-08-23 10:51:18,175] Trial 77 finished with value: 0.8648761002838515 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.14440434705706803, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 72 with value: 0.8650873115156988.
-/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qptuna-_QsKTRFT-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 1.398e+02, tolerance: 4.782e+01
-  model = cd_fast.enet_coordinate_descent(
-[I 2024-08-23 10:51:18,251] Trial 78 finished with value: 0.8639826593122782 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.1265357179513065, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 72 with value: 0.8650873115156988.
-/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qptuna-_QsKTRFT-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 8.690e+01, tolerance: 4.782e+01
-  model = cd_fast.enet_coordinate_descent(
-[I 2024-08-23 10:51:18,328] Trial 79 finished with value: 0.864435565531768 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.1374245525868926, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 72 with value: 0.8650873115156988.
-[I 2024-08-23 10:51:18,366] Trial 80 finished with value: 0.8590221951825531 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.49890830155012533, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 72 with value: 0.8650873115156988.
-[I 2024-08-23 10:51:18,441] Trial 81 finished with value: 0.8649098880804443 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.1573428812070292, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 72 with value: 0.8650873115156988.
-/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qptuna-_QsKTRFT-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 8.405e+01, tolerance: 4.782e+01
-  model = cd_fast.enet_coordinate_descent(
-[I 2024-08-23 10:51:18,520] Trial 82 finished with value: 0.864536410656637 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.13886104722511608, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 72 with value: 0.8650873115156988.
-[I 2024-08-23 10:51:18,561] Trial 83 finished with value: 0.8597401050431873 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.47746341180045787, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 72 with value: 0.8650873115156988.
-[I 2024-08-23 10:51:18,600] Trial 84 finished with value: 0.8537465461603838 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.8599491178327108, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 72 with value: 0.8650873115156988.
-/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qptuna-_QsKTRFT-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 9.050e+01, tolerance: 4.782e+01
-  model = cd_fast.enet_coordinate_descent(
-[I 2024-08-23 10:51:18,679] Trial 85 finished with value: 0.8642643827090003 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.13446778921611002, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 72 with value: 0.8650873115156988.
-/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qptuna-_QsKTRFT-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 1.175e+02, tolerance: 4.782e+01
-  model = cd_fast.enet_coordinate_descent(
-[I 2024-08-23 10:51:18,766] Trial 86 finished with value: 0.8641621818665252 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.1286796719653316, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 72 with value: 0.8650873115156988.
-/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qptuna-_QsKTRFT-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 9.446e+01, tolerance: 4.782e+01
-  model = cd_fast.enet_coordinate_descent(
-[I 2024-08-23 10:51:18,854] Trial 87 finished with value: 0.864182755916388 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.13303218726548235, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 72 with value: 0.8650873115156988.
-[I 2024-08-23 10:51:18,896] Trial 88 finished with value: -0.1255357440899417 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.021711452917433944, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 5.559714273835951e-05, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 72 with value: 0.8650873115156988.
-[I 2024-08-23 10:51:18,937] Trial 89 finished with value: 0.8604596648091501 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.43644874418279245, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 72 with value: 0.8650873115156988.
-/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qptuna-_QsKTRFT-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 2.463e+02, tolerance: 4.782e+01
-  model = cd_fast.enet_coordinate_descent(
-[I 2024-08-23 10:51:19,015] Trial 90 finished with value: 0.8635689909135862 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.10940922083495383, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 72 with value: 0.8650873115156988.
-[I 2024-08-23 10:51:19,078] Trial 91 finished with value: 0.8648544336551733 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.1912756875742137, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 72 with value: 0.8650873115156988.
-[I 2024-08-23 10:51:19,142] Trial 92 finished with value: 0.8648496595672595 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.19628449928540487, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 72 with value: 0.8650873115156988.
-[I 2024-08-23 10:51:19,170] Trial 93 finished with value: 0.8452625121122099 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.4324661283995224, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 72 with value: 0.8650873115156988.
-[I 2024-08-23 10:51:19,198] Trial 94 finished with value: 0.8378670635846416 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.839206620815206, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 72 with value: 0.8650873115156988.
-/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qptuna-_QsKTRFT-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 8.002e+01, tolerance: 4.977e+01
-  model = cd_fast.enet_coordinate_descent(
-/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qptuna-_QsKTRFT-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 4.082e+02, tolerance: 4.782e+01
-  model = cd_fast.enet_coordinate_descent(
-[I 2024-08-23 10:51:19,286] Trial 95 finished with value: 0.8649365368153895 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.07270781179126021, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 72 with value: 0.8650873115156988.
-[I 2024-08-23 10:51:19,373] Trial 96 finished with value: 0.8875676754699953 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.0006995169897945908, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 96 with value: 0.8875676754699953.
-/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qptuna-_QsKTRFT-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 5.618e+01, tolerance: 4.782e+01
-  model = cd_fast.enet_coordinate_descent(
-/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qptuna-_QsKTRFT-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 5.234e+01, tolerance: 4.906e+01
-  model = cd_fast.enet_coordinate_descent(
-/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qptuna-_QsKTRFT-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 5.586e+01, tolerance: 4.977e+01
-  model = cd_fast.enet_coordinate_descent(
-[I 2024-08-23 10:51:19,465] Trial 97 finished with value: 0.8730555131061773 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.0018186269840273495, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 96 with value: 0.8875676754699953.
-[I 2024-08-23 10:51:19,509] Trial 98 finished with value: -0.12553508835019533 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.04867556317570456, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 0.0011658455138452, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 96 with value: 0.8875676754699953.
-/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qptuna-_QsKTRFT-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 1.177e+02, tolerance: 4.977e+01
-  model = cd_fast.enet_coordinate_descent(
-/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qptuna-_QsKTRFT-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 1.284e+02, tolerance: 4.782e+01
-  model = cd_fast.enet_coordinate_descent(
-/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qptuna-_QsKTRFT-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 1.016e+02, tolerance: 4.906e+01
-  model = cd_fast.enet_coordinate_descent(
-[I 2024-08-23 10:51:19,599] Trial 99 finished with value: 0.8586292788613132 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.005078762921098462, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 96 with value: 0.8875676754699953.
-
-
-
-
[22]:
-
-
-
ax = sns.scatterplot(data=study.trials_dataframe(), x="number", y="value")
ax.set(xlabel="Trial number", ylabel="Objective value\n(r2)");
-
-
-
-
-
-
-
[Figure: scatter plot of objective value (r2) against trial number]
-
-
-
-
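The "Best is trial N" suffix of each log line is a running maximum over the objective values seen so far. The sketch below reproduces that bookkeeping on the rounded values of trials 94 to 99 from the log above. The helper name `running_best` is hypothetical and is not part of Qptuna or Optuna, and the running best is computed within this six-trial window only.

```python
from itertools import accumulate

def running_best(values, maximize=True):
    """Return the best objective value seen up to and including each trial."""
    pick = max if maximize else min
    return list(accumulate(values, pick))

# Rounded r2 values of trials 94-99, copied from the log for illustration.
values = [0.8379, 0.8649, 0.8876, 0.8731, -0.1255, 0.8586]
best_so_far = running_best(values)

for offset, (value, best) in enumerate(zip(values, best_so_far)):
    print(f"trial {94 + offset}: value={value:.4f} best so far={best:.4f}")
```

Plotting such a running best next to the raw values makes it easier to see when the search stopped improving.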

Advanced functionality: algorithms & runs

-

Various algorithms are available in Qptuna:

-
-
[23]:
-
-
-
from optunaz.config.optconfig import AnyAlgorithm
AnyAlgorithm.__args__
-
-
-
-
-
[23]:
-
-
-
-
(optunaz.config.optconfig.Lasso,
 optunaz.config.optconfig.PLSRegression,
 optunaz.config.optconfig.RandomForestRegressor,
 optunaz.config.optconfig.Ridge,
 optunaz.config.optconfig.KNeighborsRegressor,
 optunaz.config.optconfig.SVR,
 optunaz.config.optconfig.XGBRegressor,
 optunaz.config.optconfig.PRFClassifier,
 optunaz.config.optconfig.ChemPropRegressor,
 optunaz.config.optconfig.ChemPropRegressorPretrained,
 optunaz.config.optconfig.ChemPropHyperoptRegressor,
 optunaz.config.optconfig.AdaBoostClassifier,
 optunaz.config.optconfig.KNeighborsClassifier,
 optunaz.config.optconfig.LogisticRegression,
 optunaz.config.optconfig.RandomForestClassifier,
 optunaz.config.optconfig.SVC,
 optunaz.config.optconfig.ChemPropClassifier,
 optunaz.config.optconfig.ChemPropHyperoptClassifier,
 optunaz.config.optconfig.CalibratedClassifierCVWithVA,
 optunaz.config.optconfig.Mapie)
-
-
-
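The `__args__` attribute used above is how `typing.Union` aliases expose their member types, which suggests (as an assumption, not confirmed here) that `AnyAlgorithm` is defined as such a union. A minimal, self-contained sketch of the same lookup follows. The three placeholder classes and the alias `AnyAlgorithmSketch` are hypothetical stand-ins and not the real Qptuna classes.

```python
from typing import Union, get_args

# Hypothetical stand-ins for algorithm config classes (not the Qptuna ones).
class Lasso: ...
class SVR: ...
class PRFClassifier: ...

# A union alias over the placeholder classes.
AnyAlgorithmSketch = Union[Lasso, SVR, PRFClassifier]

# typing.get_args is the public counterpart of reading __args__ directly.
names = [cls.__name__ for cls in get_args(AnyAlgorithmSketch)]
print(names)
```

Listing the class names this way is a quick check of which algorithm names a given installation accepts.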

The tutorial now turns to considerations that matter for more advanced functionality, such as the PRF and ChemProp algorithms.

-
-
-

Probabilistic Random Forest (PRF)

-

PRF, first described in [1], is a modification of the long-established Random Forest (RF) algorithm that takes into account uncertainties in features and/or labels (though only uncertainty in labels is currently implemented in Qptuna). It can be seen as a probabilistic method to factor experimental uncertainty into training, and is considered a hybrid between regression and classification algorithms.

-

In more detail, PRF treats labels as probability distribution functions (PDFs, denoted ∆y) rather than deterministic quantities. In comparison, the traditional RF uses discrete activity labels (binary y-labels, denoted y) obtained by discretising the bioactivity scale into active/inactive sets.

-

PRF integration was added to Qptuna to afford this probabilistic approach towards modelling, and is particularly useful when combined with the Probabilistic Threshold Representation (PTR; see the preprocessing notebook for details). In this combination, PRF takes as input real-valued probabilities from the PTR (similar to regression). However, similar to classification algorithms, PRF outputs the probability of activity for the active class.
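To make the idea of a probabilistic label concrete, the sketch below converts a measurement into a probability of being active using a Gaussian cumulative distribution centred on the activity threshold. This is an illustrative assumption about how a PTR-style transform can work and is not taken from the Qptuna source. The function name `ptr_probability` is hypothetical, and the threshold of 8 and standard deviation of 0.6 simply mirror the example values used in this tutorial.

```python
import math

def ptr_probability(measurement: float, threshold: float, std: float) -> float:
    """Probability that the true value exceeds `threshold`, assuming the
    measurement carries Gaussian error with standard deviation `std`."""
    z = (measurement - threshold) / std
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Measurements far below, near, and far above the threshold.
for pxc50 in (6.0, 7.4, 8.0, 8.6, 10.0):
    probability = ptr_probability(pxc50, threshold=8.0, std=0.6)
    print(f"pXC50={pxc50:4.1f} -> P(active)={probability:.3f}")
```

Values close to the threshold map to probabilities near 0.5, while values far from it approach 0 or 1. A hard active/inactive cut-off discards exactly this uncertainty.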

-

Note that Qptuna runs the PRFClassifier in a regression setting, since the model only outputs the likelihood of class membership based on ∆y.

-

[1] https://iopscience.iop.org/article/10.3847/1538-3881/aaf101/meta

-

The following code imports the PRFClassifier and sets up a config to use the PRF with PTR:

-
-
[24]:
-
-
-
from optunaz.config.optconfig import PRFClassifier

# Prepare hyperparameter optimization configuration.
config = OptimizationConfig(
    data=Dataset(
        input_column="Smiles",
        response_column="Measurement",
        training_dataset_file="../tests/data/pxc50/P24863.csv",
        probabilistic_threshold_representation=True,  # This enables PTR
        probabilistic_threshold_representation_threshold=8,  # This defines the activity threshold
        probabilistic_threshold_representation_std=0.6,  # This captures the deviation/uncertainty in the dataset
    ),
    descriptors=[
        ECFP.new(),
        ECFP_counts.new(),
        MACCS_keys.new(),
    ],
    algorithms=[
        PRFClassifier.new(n_estimators={"low": 20, "high": 20}),  # n_estimators set low for the example to run fast
    ],
    settings=OptimizationConfig.Settings(
        mode=ModelMode.REGRESSION,
        cross_validation=2,
        n_trials=15,
        random_seed=42,
        direction=OptimizationDirection.MAXIMIZATION,
    ),
)
-
-
-
-

Note that Qptuna is run in regression mode (ModelMode.REGRESSION), as outputs from the algorithm are always continuous values.
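
To make the PTR settings in the configuration above more concrete, the sketch below shows one way a measured value could be turned into a probability of activity. It assumes that the PTR is the cumulative distribution function of a normal distribution centred on the threshold, with the configured standard deviation. `ptr_transform` is a hypothetical helper written for illustration, not a Qptuna function.

```python
import math


def ptr_transform(value: float, threshold: float, std: float) -> float:
    """Probability that the true value exceeds `threshold`.

    Assumption: experimental error is Gaussian with standard deviation `std`,
    so the probability is the normal CDF evaluated at (value - threshold) / std.
    """
    z = (value - threshold) / std
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Threshold (8) and std (0.6) mirror the Dataset settings used above.
for pxc50 in (5.3, 7.4, 8.0, 8.6, 9.7):
    print(f"pXC50={pxc50:.1f} -> P(active)={ptr_transform(pxc50, 8.0, 0.6):.3f}")
```

Under this assumption, a value exactly at the threshold maps to 0.5, and values one standard deviation above or below the threshold map to roughly 0.84 and 0.16.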

-

Next we can run the PRF/PTR study:

-
-
[25]:
-
-
-
# Run the PRF/PTR Optuna Study.
-study = optimize(config, study_name="my_study")
-
-
-
-
-
-
-
-
-[I 2024-08-23 10:51:20,724] A new study created in memory with name: my_study
-[I 2024-08-23 10:51:20,726] A new study created in memory with name: study_name_0
-[I 2024-08-23 10:51:23,237] Trial 0 finished with value: -0.0811707042483984 and parameters: {'algorithm_name': 'PRFClassifier', 'PRFClassifier_algorithm_hash': 'efe0ba9870529a6cde0dd3ad22447cbb', 'max_depth__efe0ba9870529a6cde0dd3ad22447cbb': 13, 'n_estimators__efe0ba9870529a6cde0dd3ad22447cbb': 20, 'max_features__efe0ba9870529a6cde0dd3ad22447cbb': <PRFClassifierMaxFeatures.AUTO: 'auto'>, 'min_py_sum_leaf__efe0ba9870529a6cde0dd3ad22447cbb': 5, 'use_py_gini__efe0ba9870529a6cde0dd3ad22447cbb': 1, 'use_py_leafs__efe0ba9870529a6cde0dd3ad22447cbb': 1, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 0 with value: -0.0811707042483984.
-[I 2024-08-23 10:51:26,520] Trial 1 finished with value: -0.07385123845467624 and parameters: {'algorithm_name': 'PRFClassifier', 'PRFClassifier_algorithm_hash': 'efe0ba9870529a6cde0dd3ad22447cbb', 'max_depth__efe0ba9870529a6cde0dd3ad22447cbb': 6, 'n_estimators__efe0ba9870529a6cde0dd3ad22447cbb': 20, 'max_features__efe0ba9870529a6cde0dd3ad22447cbb': <PRFClassifierMaxFeatures.AUTO: 'auto'>, 'min_py_sum_leaf__efe0ba9870529a6cde0dd3ad22447cbb': 1, 'use_py_gini__efe0ba9870529a6cde0dd3ad22447cbb': 1, 'use_py_leafs__efe0ba9870529a6cde0dd3ad22447cbb': 1, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 1 with value: -0.07385123845467624.
-[I 2024-08-23 10:51:28,891] Trial 2 finished with value: -0.08693605025593726 and parameters: {'algorithm_name': 'PRFClassifier', 'PRFClassifier_algorithm_hash': 'efe0ba9870529a6cde0dd3ad22447cbb', 'max_depth__efe0ba9870529a6cde0dd3ad22447cbb': 2, 'n_estimators__efe0ba9870529a6cde0dd3ad22447cbb': 20, 'max_features__efe0ba9870529a6cde0dd3ad22447cbb': <PRFClassifierMaxFeatures.AUTO: 'auto'>, 'min_py_sum_leaf__efe0ba9870529a6cde0dd3ad22447cbb': 5, 'use_py_gini__efe0ba9870529a6cde0dd3ad22447cbb': 1, 'use_py_leafs__efe0ba9870529a6cde0dd3ad22447cbb': 1, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 1 with value: -0.07385123845467624.
-[I 2024-08-23 10:51:31,782] Trial 3 finished with value: -0.07306390786920249 and parameters: {'algorithm_name': 'PRFClassifier', 'PRFClassifier_algorithm_hash': 'efe0ba9870529a6cde0dd3ad22447cbb', 'max_depth__efe0ba9870529a6cde0dd3ad22447cbb': 7, 'n_estimators__efe0ba9870529a6cde0dd3ad22447cbb': 20, 'max_features__efe0ba9870529a6cde0dd3ad22447cbb': <PRFClassifierMaxFeatures.AUTO: 'auto'>, 'min_py_sum_leaf__efe0ba9870529a6cde0dd3ad22447cbb': 2, 'use_py_gini__efe0ba9870529a6cde0dd3ad22447cbb': 1, 'use_py_leafs__efe0ba9870529a6cde0dd3ad22447cbb': 1, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 3 with value: -0.07306390786920249.
-[I 2024-08-23 10:51:36,706] Trial 4 finished with value: -0.07213945175504542 and parameters: {'algorithm_name': 'PRFClassifier', 'PRFClassifier_algorithm_hash': 'efe0ba9870529a6cde0dd3ad22447cbb', 'max_depth__efe0ba9870529a6cde0dd3ad22447cbb': 20, 'n_estimators__efe0ba9870529a6cde0dd3ad22447cbb': 20, 'max_features__efe0ba9870529a6cde0dd3ad22447cbb': <PRFClassifierMaxFeatures.AUTO: 'auto'>, 'min_py_sum_leaf__efe0ba9870529a6cde0dd3ad22447cbb': 1, 'use_py_gini__efe0ba9870529a6cde0dd3ad22447cbb': 1, 'use_py_leafs__efe0ba9870529a6cde0dd3ad22447cbb': 1, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 4 with value: -0.07213945175504542.
-[I 2024-08-23 10:51:48,023] Trial 5 finished with value: -0.055757209329220986 and parameters: {'algorithm_name': 'PRFClassifier', 'PRFClassifier_algorithm_hash': 'efe0ba9870529a6cde0dd3ad22447cbb', 'max_depth__efe0ba9870529a6cde0dd3ad22447cbb': 26, 'n_estimators__efe0ba9870529a6cde0dd3ad22447cbb': 20, 'max_features__efe0ba9870529a6cde0dd3ad22447cbb': <PRFClassifierMaxFeatures.AUTO: 'auto'>, 'min_py_sum_leaf__efe0ba9870529a6cde0dd3ad22447cbb': 1, 'use_py_gini__efe0ba9870529a6cde0dd3ad22447cbb': 1, 'use_py_leafs__efe0ba9870529a6cde0dd3ad22447cbb': 1, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 5 with value: -0.055757209329220986.
-[I 2024-08-23 10:51:48,039] Trial 6 pruned. Duplicate parameter set
-
-
-
-
-
-
-
-Duplicated trial: {'algorithm_name': 'PRFClassifier', 'PRFClassifier_algorithm_hash': 'efe0ba9870529a6cde0dd3ad22447cbb', 'max_depth__efe0ba9870529a6cde0dd3ad22447cbb': 20, 'n_estimators__efe0ba9870529a6cde0dd3ad22447cbb': 20, 'max_features__efe0ba9870529a6cde0dd3ad22447cbb': <PRFClassifierMaxFeatures.AUTO: 'auto'>, 'min_py_sum_leaf__efe0ba9870529a6cde0dd3ad22447cbb': 1, 'use_py_gini__efe0ba9870529a6cde0dd3ad22447cbb': 1, 'use_py_leafs__efe0ba9870529a6cde0dd3ad22447cbb': 1, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}, return [-0.07213945175504542]
-
-
-
-
-
-
-
-[I 2024-08-23 10:51:51,986] Trial 7 finished with value: -0.06330901806749258 and parameters: {'algorithm_name': 'PRFClassifier', 'PRFClassifier_algorithm_hash': 'efe0ba9870529a6cde0dd3ad22447cbb', 'max_depth__efe0ba9870529a6cde0dd3ad22447cbb': 27, 'n_estimators__efe0ba9870529a6cde0dd3ad22447cbb': 20, 'max_features__efe0ba9870529a6cde0dd3ad22447cbb': <PRFClassifierMaxFeatures.AUTO: 'auto'>, 'min_py_sum_leaf__efe0ba9870529a6cde0dd3ad22447cbb': 2, 'use_py_gini__efe0ba9870529a6cde0dd3ad22447cbb': 1, 'use_py_leafs__efe0ba9870529a6cde0dd3ad22447cbb': 1, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 5 with value: -0.055757209329220986.
-[I 2024-08-23 10:51:54,811] Trial 8 finished with value: -0.07619841217081819 and parameters: {'algorithm_name': 'PRFClassifier', 'PRFClassifier_algorithm_hash': 'efe0ba9870529a6cde0dd3ad22447cbb', 'max_depth__efe0ba9870529a6cde0dd3ad22447cbb': 5, 'n_estimators__efe0ba9870529a6cde0dd3ad22447cbb': 20, 'max_features__efe0ba9870529a6cde0dd3ad22447cbb': <PRFClassifierMaxFeatures.AUTO: 'auto'>, 'min_py_sum_leaf__efe0ba9870529a6cde0dd3ad22447cbb': 3, 'use_py_gini__efe0ba9870529a6cde0dd3ad22447cbb': 1, 'use_py_leafs__efe0ba9870529a6cde0dd3ad22447cbb': 1, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 5 with value: -0.055757209329220986.
-[I 2024-08-23 10:51:58,648] Trial 9 finished with value: -0.061815145745506755 and parameters: {'algorithm_name': 'PRFClassifier', 'PRFClassifier_algorithm_hash': 'efe0ba9870529a6cde0dd3ad22447cbb', 'max_depth__efe0ba9870529a6cde0dd3ad22447cbb': 22, 'n_estimators__efe0ba9870529a6cde0dd3ad22447cbb': 20, 'max_features__efe0ba9870529a6cde0dd3ad22447cbb': <PRFClassifierMaxFeatures.AUTO: 'auto'>, 'min_py_sum_leaf__efe0ba9870529a6cde0dd3ad22447cbb': 2, 'use_py_gini__efe0ba9870529a6cde0dd3ad22447cbb': 1, 'use_py_leafs__efe0ba9870529a6cde0dd3ad22447cbb': 1, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 5 with value: -0.055757209329220986.
-[I 2024-08-23 10:52:01,091] Trial 10 finished with value: -0.07429343450473058 and parameters: {'algorithm_name': 'PRFClassifier', 'PRFClassifier_algorithm_hash': 'efe0ba9870529a6cde0dd3ad22447cbb', 'max_depth__efe0ba9870529a6cde0dd3ad22447cbb': 32, 'n_estimators__efe0ba9870529a6cde0dd3ad22447cbb': 20, 'max_features__efe0ba9870529a6cde0dd3ad22447cbb': <PRFClassifierMaxFeatures.AUTO: 'auto'>, 'min_py_sum_leaf__efe0ba9870529a6cde0dd3ad22447cbb': 4, 'use_py_gini__efe0ba9870529a6cde0dd3ad22447cbb': 1, 'use_py_leafs__efe0ba9870529a6cde0dd3ad22447cbb': 1, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 5 with value: -0.055757209329220986.
-[I 2024-08-23 10:52:04,778] Trial 11 finished with value: -0.06446287784137206 and parameters: {'algorithm_name': 'PRFClassifier', 'PRFClassifier_algorithm_hash': 'efe0ba9870529a6cde0dd3ad22447cbb', 'max_depth__efe0ba9870529a6cde0dd3ad22447cbb': 30, 'n_estimators__efe0ba9870529a6cde0dd3ad22447cbb': 20, 'max_features__efe0ba9870529a6cde0dd3ad22447cbb': <PRFClassifierMaxFeatures.AUTO: 'auto'>, 'min_py_sum_leaf__efe0ba9870529a6cde0dd3ad22447cbb': 1, 'use_py_gini__efe0ba9870529a6cde0dd3ad22447cbb': 1, 'use_py_leafs__efe0ba9870529a6cde0dd3ad22447cbb': 1, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 5 with value: -0.055757209329220986.
-[I 2024-08-23 10:52:08,574] Trial 12 finished with value: -0.06120344765133655 and parameters: {'algorithm_name': 'PRFClassifier', 'PRFClassifier_algorithm_hash': 'efe0ba9870529a6cde0dd3ad22447cbb', 'max_depth__efe0ba9870529a6cde0dd3ad22447cbb': 14, 'n_estimators__efe0ba9870529a6cde0dd3ad22447cbb': 20, 'max_features__efe0ba9870529a6cde0dd3ad22447cbb': <PRFClassifierMaxFeatures.AUTO: 'auto'>, 'min_py_sum_leaf__efe0ba9870529a6cde0dd3ad22447cbb': 2, 'use_py_gini__efe0ba9870529a6cde0dd3ad22447cbb': 1, 'use_py_leafs__efe0ba9870529a6cde0dd3ad22447cbb': 1, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 5 with value: -0.055757209329220986.
-[I 2024-08-23 10:52:11,745] Trial 13 finished with value: -0.0686143607166384 and parameters: {'algorithm_name': 'PRFClassifier', 'PRFClassifier_algorithm_hash': 'efe0ba9870529a6cde0dd3ad22447cbb', 'max_depth__efe0ba9870529a6cde0dd3ad22447cbb': 18, 'n_estimators__efe0ba9870529a6cde0dd3ad22447cbb': 20, 'max_features__efe0ba9870529a6cde0dd3ad22447cbb': <PRFClassifierMaxFeatures.AUTO: 'auto'>, 'min_py_sum_leaf__efe0ba9870529a6cde0dd3ad22447cbb': 1, 'use_py_gini__efe0ba9870529a6cde0dd3ad22447cbb': 1, 'use_py_leafs__efe0ba9870529a6cde0dd3ad22447cbb': 1, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 5 with value: -0.055757209329220986.
-[I 2024-08-23 10:52:22,199] Trial 14 finished with value: -0.05295650394252901 and parameters: {'algorithm_name': 'PRFClassifier', 'PRFClassifier_algorithm_hash': 'efe0ba9870529a6cde0dd3ad22447cbb', 'max_depth__efe0ba9870529a6cde0dd3ad22447cbb': 25, 'n_estimators__efe0ba9870529a6cde0dd3ad22447cbb': 20, 'max_features__efe0ba9870529a6cde0dd3ad22447cbb': <PRFClassifierMaxFeatures.AUTO: 'auto'>, 'min_py_sum_leaf__efe0ba9870529a6cde0dd3ad22447cbb': 1, 'use_py_gini__efe0ba9870529a6cde0dd3ad22447cbb': 1, 'use_py_leafs__efe0ba9870529a6cde0dd3ad22447cbb': 1, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 14 with value: -0.05295650394252901.
-
-
-

We can now plot the performance obtained across the Optuna trials.

-
-
[26]:
-
-
-
sns.set_theme(style="darkgrid")
-default_reg_scoring = config.settings.scoring
-ax = sns.scatterplot(data=study.trials_dataframe(), x="number", y="value")
-ax.set(xlabel="Trial number", ylabel=f"Objective value\n({default_reg_scoring})");
-
-
-
-
-
-
-
-../_images/notebooks_QPTUNA_Tutorial_67_0.png -
-
-

Build the best PRF model:

-
-
[27]:
-
-
-
buildconfig = buildconfig_best(study)
-best_built = build_best(buildconfig, "../target/best.pkl")
-
-with open("../target/best.pkl", "rb") as f:
-    model = pickle.load(f)
-
-
-
-

Plot predictions from the best model for the (seen) training data, for demonstration purposes:

-
-
[28]:
-
-
-
#predict the input from the trained model (transductive evaluation of the model)
-example_smiles=config.data.get_sets()[0]
-expected = config.data.get_sets()[1]
-predicted = model.predict_from_smiles(example_smiles)
-
-# Plot expected vs predicted values for the best model.
-ax = plt.scatter(expected, predicted)
-lims = [expected.min(), expected.max()]
-plt.plot(lims, lims)  # Diagonal line.
-plt.xlabel(f"Expected {config.data.response_column} (PTR)");
-plt.ylabel(f"Predicted {config.data.response_column}");
-
-
-
-
-
-
-
-../_images/notebooks_QPTUNA_Tutorial_71_0.png -
-
-
-

Interlude: Cautionary advice for PRF ∆y (response column) validity

-

N.B.: It is not possible to train the PRF on response column values outside the 0 to 1 range of y-label membership likelihoods expected for ∆y. Doing so will result in the following error from Qptuna:

-
-
[29]:
-
-
-
# Prepare problematic hyperparameter optimization configuration without PTR.
-config = OptimizationConfig(
-    data=Dataset(
-        input_column="Smiles",
-        response_column="Measurement",
-        training_dataset_file="../tests/data/pxc50/P24863.csv"),
-    descriptors=[
-        ECFP.new(),
-    ],
-    algorithms=[
-        PRFClassifier.new(n_estimators={"low": 5, "high": 10}), #n_estimators set low for the example to run fast
-    ],
-    settings=OptimizationConfig.Settings(
-        mode=ModelMode.REGRESSION,
-        cross_validation=2,
-        n_trials=2,
-        direction=OptimizationDirection.MAXIMIZATION,
-    ),
-)
-
-try:
-    study = optimize(config, study_name="my_study")
-except ValueError as e:
-    print(f'As expected, training the PRF on the raw pXC50 values resulted in the following error:\n\n"{e}')
-
-
-
-
-
-
-
-
-[I 2024-08-23 10:52:29,551] A new study created in memory with name: my_study
-[I 2024-08-23 10:52:29,595] A new study created in memory with name: study_name_0
-[W 2024-08-23 10:52:29,597] Trial 0 failed with parameters: {} because of the following error: ValueError('PRFClassifier supplied but response column outside [0.0-1.0] acceptable range. Response max: 9.7, response min: 5.3 ').
-Traceback (most recent call last):
-  File "/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qptuna-_QsKTRFT-py3.10/lib/python3.10/site-packages/optuna/study/_optimize.py", line 196, in _run_trial
-    value_or_values = func(trial)
-  File "/Users/kljk345/PycharmProjects/optuna_az/optunaz/objective.py", line 128, in __call__
-    self._validate_algos()
-  File "/Users/kljk345/PycharmProjects/optuna_az/optunaz/objective.py", line 270, in _validate_algos
-    raise ValueError(
-ValueError: PRFClassifier supplied but response column outside [0.0-1.0] acceptable range. Response max: 9.7, response min: 5.3
-[W 2024-08-23 10:52:29,598] Trial 0 failed with value None.
-
-
-
-
-
-
-
-As expected, training the PRF on the raw pXC50 values resulted in the following error:
-
-"PRFClassifier supplied but response column outside [0.0-1.0] acceptable range. Response max: 9.7, response min: 5.3
-
-
-
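
A quick pre-flight check can catch this problem before an optimisation run is started. The sketch below is a hypothetical helper (not part of Qptuna) that mimics the range validation reported in the error above, using only the Python standard library.

```python
from typing import Iterable


def check_prf_response(values: Iterable[float]) -> None:
    """Raise ValueError if any response value lies outside [0.0, 1.0]."""
    values = list(values)
    lo, hi = min(values), max(values)
    if lo < 0.0 or hi > 1.0:
        raise ValueError(
            f"Response column outside [0.0-1.0]: max {hi}, min {lo}. "
            "Enable PTR or supply probabilities."
        )


check_prf_response([0.02, 0.5, 0.97])  # valid probabilities: passes silently

try:
    check_prf_response([5.3, 7.1, 9.7])  # raw pXC50 values are rejected
except ValueError as err:
    print(f"Rejected: {err}")
```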

To summarise:
1. PRF handles probabilities of y (∆y labels), which range between 0 and 1.
2. PRF is evaluated in a probabilistic setting via conventional regression metrics (e.g. RMSE, R2), even though PRF can be considered a modification of the classic Random Forest classifier.
3. The probabilistic output is the probability of activity at a relevant cutoff, similar to a classification algorithm.
4. Outputs reflect the likelihood of a molecular property being above a relevant threshold, given experimental uncertainty (arguably a more useful component within a REINVENT MPO score).
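
Point 2 can be illustrated with a small sketch that scores predicted probabilities against PTR-style labels using RMSE and R2. The numbers are invented for illustration, and the helper functions are plain Python rather than Qptuna's own scoring code.

```python
import math
from typing import Sequence


def rmse(expected: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error between two equally long sequences."""
    n = len(expected)
    return math.sqrt(sum((e - p) ** 2 for e, p in zip(expected, predicted)) / n)


def r2(expected: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination (1 - SS_res / SS_tot)."""
    mean = sum(expected) / len(expected)
    ss_res = sum((e - p) ** 2 for e, p in zip(expected, predicted))
    ss_tot = sum((e - mean) ** 2 for e in expected)
    return 1.0 - ss_res / ss_tot


# Illustrative probabilities of activity in [0, 1] (invented values).
expected = [0.05, 0.20, 0.50, 0.80, 0.95]
predicted = [0.10, 0.25, 0.45, 0.70, 0.90]

print(f"RMSE = {rmse(expected, predicted):.3f}")
print(f"R2   = {r2(expected, predicted):.3f}")
```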

-
-
-
-

ChemProp

-

QPTUNA has the functionality to train ChemProp deep learning models. These are message passing neural networks (MPNNs) based on a graph representation of the training molecules. They are considered by many to offer the state-of-the-art approach for property prediction.

-

ChemProp was first described in the paper Analyzing Learned Molecular Representations for Property Prediction: https://pubs.acs.org/doi/full/10.1021/acs.jcim.9b00237

-

More information is available in their slides: https://docs.google.com/presentation/d/14pbd9LTXzfPSJHyXYkfLxnK8Q80LhVnjImg8a3WqCRM/edit

-

The ChemProp package expects SMILES as molecule inputs, since it calculates a molecule graph directly from them, and so SMILES must be supplied as the descriptor. The SmilesFromFile and SmilesAndSideInfoFromFile descriptors (more about this later) are available for this purpose and are only supported by the ChemProp algorithms:

-
-
[30]:
-
-
-
from optunaz.config.optconfig import ChemPropRegressor
-from optunaz.descriptors import SmilesBasedDescriptor, SmilesFromFile
-print(f"Smiles based descriptors:\n{SmilesBasedDescriptor.__args__}")
-
-
-
-
-
-
-
-
-Smiles based descriptors:
-(<class 'optunaz.descriptors.SmilesFromFile'>, <class 'optunaz.descriptors.SmilesAndSideInfoFromFile'>)
-
-
-
-

Simple ChemProp example

-

The following is an example of the most basic ChemProp run, which will train the algorithm using the recommended (sensible) defaults for the MPNN architecture:

-
-
[31]:
-
-
-
config = OptimizationConfig(
-    data=Dataset(
-        input_column="canonical",
-        response_column="molwt",
-        training_dataset_file="../tests/data/DRD2/subset-50/train.csv",  # This will be split into train and test.
-        split_strategy=Stratified(fraction=0.50),
-        deduplication_strategy=KeepMedian(),
-    ),
-    descriptors=[
-        SmilesFromFile.new(),
-    ],
-    algorithms=[
-        ChemPropRegressor.new(epochs=5), #epochs=5 to ensure run finishes quickly
-    ],
-    settings=OptimizationConfig.Settings(
-        mode=ModelMode.REGRESSION,
-        cross_validation=2,
-        n_trials=2,
-        direction=OptimizationDirection.MAXIMIZATION,
-    ),
-)
-
-study = optimize(config, study_name="my_study")
-
-
-
-
-
-
-
-
-[I 2024-08-23 10:52:29,648] A new study created in memory with name: my_study
-[I 2024-08-23 10:52:29,650] A new study created in memory with name: study_name_0
-INFO:root:Enqueued ChemProp manual trial with sensible defaults: {'activation__668a7428ff5cdb271b01c0925e8fea45': 'ReLU', 'aggregation__668a7428ff5cdb271b01c0925e8fea45': 'mean', 'aggregation_norm__668a7428ff5cdb271b01c0925e8fea45': 100, 'batch_size__668a7428ff5cdb271b01c0925e8fea45': 50, 'depth__668a7428ff5cdb271b01c0925e8fea45': 3, 'dropout__668a7428ff5cdb271b01c0925e8fea45': 0.0, 'features_generator__668a7428ff5cdb271b01c0925e8fea45': 'none', 'ffn_hidden_size__668a7428ff5cdb271b01c0925e8fea45': 300, 'ffn_num_layers__668a7428ff5cdb271b01c0925e8fea45': 2, 'final_lr_ratio_exp__668a7428ff5cdb271b01c0925e8fea45': -4, 'hidden_size__668a7428ff5cdb271b01c0925e8fea45': 300, 'init_lr_ratio_exp__668a7428ff5cdb271b01c0925e8fea45': -4, 'max_lr_exp__668a7428ff5cdb271b01c0925e8fea45': -3, 'warmup_epochs_ratio__668a7428ff5cdb271b01c0925e8fea45': 0.1, 'algorithm_name': 'ChemPropRegressor', 'ChemPropRegressor_algorithm_hash': '668a7428ff5cdb271b01c0925e8fea45'}
-[I 2024-08-23 10:53:18,416] Trial 0 finished with value: -6833.034983241957 and parameters: {'algorithm_name': 'ChemPropRegressor', 'ChemPropRegressor_algorithm_hash': '668a7428ff5cdb271b01c0925e8fea45', 'activation__668a7428ff5cdb271b01c0925e8fea45': <ChemPropActivation.RELU: 'ReLU'>, 'aggregation__668a7428ff5cdb271b01c0925e8fea45': <ChemPropAggregation.MEAN: 'mean'>, 'aggregation_norm__668a7428ff5cdb271b01c0925e8fea45': 100.0, 'batch_size__668a7428ff5cdb271b01c0925e8fea45': 50.0, 'depth__668a7428ff5cdb271b01c0925e8fea45': 3.0, 'dropout__668a7428ff5cdb271b01c0925e8fea45': 0.0, 'ensemble_size__668a7428ff5cdb271b01c0925e8fea45': 1, 'epochs__668a7428ff5cdb271b01c0925e8fea45': 5, 'features_generator__668a7428ff5cdb271b01c0925e8fea45': <ChemPropFeatures_Generator.NONE: 'none'>, 'ffn_hidden_size__668a7428ff5cdb271b01c0925e8fea45': 300.0, 'ffn_num_layers__668a7428ff5cdb271b01c0925e8fea45': 2.0, 'final_lr_ratio_exp__668a7428ff5cdb271b01c0925e8fea45': -4, 'hidden_size__668a7428ff5cdb271b01c0925e8fea45': 300.0, 'init_lr_ratio_exp__668a7428ff5cdb271b01c0925e8fea45': -4, 'max_lr_exp__668a7428ff5cdb271b01c0925e8fea45': -3, 'warmup_epochs_ratio__668a7428ff5cdb271b01c0925e8fea45': 0.1, 'descriptor': '{"name": "SmilesFromFile", "parameters": {}}'}. Best is trial 0 with value: -6833.034983241957.
-[I 2024-08-23 10:54:10,531] Trial 1 finished with value: -6341.72494883772 and parameters: {'algorithm_name': 'ChemPropRegressor', 'ChemPropRegressor_algorithm_hash': '668a7428ff5cdb271b01c0925e8fea45', 'activation__668a7428ff5cdb271b01c0925e8fea45': <ChemPropActivation.RELU: 'ReLU'>, 'aggregation__668a7428ff5cdb271b01c0925e8fea45': <ChemPropAggregation.MEAN: 'mean'>, 'aggregation_norm__668a7428ff5cdb271b01c0925e8fea45': 9.0, 'batch_size__668a7428ff5cdb271b01c0925e8fea45': 115.0, 'depth__668a7428ff5cdb271b01c0925e8fea45': 6.0, 'dropout__668a7428ff5cdb271b01c0925e8fea45': 0.0, 'ensemble_size__668a7428ff5cdb271b01c0925e8fea45': 1, 'epochs__668a7428ff5cdb271b01c0925e8fea45': 5, 'features_generator__668a7428ff5cdb271b01c0925e8fea45': <ChemPropFeatures_Generator.NONE: 'none'>, 'ffn_hidden_size__668a7428ff5cdb271b01c0925e8fea45': 500.0, 'ffn_num_layers__668a7428ff5cdb271b01c0925e8fea45': 3.0, 'final_lr_ratio_exp__668a7428ff5cdb271b01c0925e8fea45': -2, 'hidden_size__668a7428ff5cdb271b01c0925e8fea45': 1500.0, 'init_lr_ratio_exp__668a7428ff5cdb271b01c0925e8fea45': -4, 'max_lr_exp__668a7428ff5cdb271b01c0925e8fea45': -3, 'warmup_epochs_ratio__668a7428ff5cdb271b01c0925e8fea45': 0.1, 'descriptor': '{"name": "SmilesFromFile", "parameters": {}}'}. Best is trial 1 with value: -6341.72494883772.
-
-
-

You may safely ignore ChemProp warnings such as "Model 0 provided with no test set, no metric evaluation will be performed", "rmse = nan" and "1-fold cross validation". These are information prompts printed by ChemProp because of some (deactivated) CV functionality: ChemProp can perform its own cross validation, and details of this are still printed despite its deactivation within Qptuna.

-

NB: Qptuna will first trial the sensible defaults for the MPNN architecture (where possible given the user config). This is communicated to the user, e.g. see the output which advises:

-

A new study created in memory with name: study_name_0 INFO:root:Enqueued ChemProp manual trial with sensible defaults: {'activation': 'ReLU', 'aggregation': 'mean', 'aggregation_norm': 100, 'batch_size': 50, 'depth': 3, 'dropout': 0.0, 'features_generator': 'none', 'ffn_hidden_size': 300, 'ffn_num_layers': 3, 'final_lr_ratio_exp': -1, 'hidden_size': 300, 'init_lr_ratio_exp': -1, 'max_lr_exp': -3, 'warmup_epochs_ratio': 0.1, 'algorithm_name': 'ChemPropRegressor'}.

-

Enqueuing custom parameters ensures that sampling begins from a sensible region of hyperparameter space and facilitates further optimisation from this point. Additional trials do not have any preset enqueuing and use Bayesian optimization for trial suggestion.
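
The enqueue-then-search pattern can be sketched in plain Python. In the sketch below, a queue of preset parameter sets is consumed first, after which new candidates are drawn by a sampler. Uniform random sampling stands in for Optuna's Bayesian sampler here, and the objective function, search space and names are all invented for illustration.

```python
import random
from collections import deque

# Hypothetical search space and preset defaults (illustrative values only).
SEARCH_SPACE = {"depth": (2, 6), "hidden_size": (300, 2400)}
enqueued = deque([{"depth": 3, "hidden_size": 300}])


def suggest(rng: random.Random) -> dict:
    """Return the next enqueued parameter set, else sample a new one."""
    if enqueued:
        return enqueued.popleft()
    return {name: rng.randint(lo, hi) for name, (lo, hi) in SEARCH_SPACE.items()}


def objective(params: dict) -> float:
    """Toy score to maximise; stands in for a cross-validated model score."""
    return -abs(params["depth"] - 4) - abs(params["hidden_size"] - 900) / 1000


rng = random.Random(42)
trials = []
for number in range(5):
    params = suggest(rng)
    trials.append((objective(params), number, params))

# Pick the trial with the highest score.
best_score, best_number, best_params = max(trials, key=lambda t: t[0])
print(f"First trial used the presets: {trials[0][2]}")
print(f"Best trial: {best_number} with {best_params} (score {best_score:.3f})")
```

The first trial always evaluates the preset values, so every later trial has a known, sensible baseline to beat.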

-
-
-

ChemProp optimization separate from shallow methods (default behavior)

-

By default, Qptuna optimises ChemProp separately from the shallow methods, as controlled by the split_chemprop flag. When this flag is set, the user must specify the number of ChemProp trials using the n_chemprop_trials flag if more than 1 (the default) trial is desired:

-
-
[32]:
-
-
-
from optunaz.config.optconfig import ChemPropClassifier, RandomForestClassifier
-
-config = OptimizationConfig(
-    data=Dataset(
-        input_column="canonical",
-        response_column="molwt_gt_330",
-        training_dataset_file="../tests/data/DRD2/subset-50/train.csv",
-        split_strategy=Stratified(fraction=0.75),
-        deduplication_strategy=KeepMedian(),
-    ),
-    descriptors=[
-        ECFP.new(),
-        SmilesFromFile.new(),
-    ],
-    algorithms=[
-        ChemPropClassifier.new(epochs=4),
-        RandomForestClassifier.new(n_estimators={"low": 5, "high": 5}),
-    ],
-    settings=OptimizationConfig.Settings(
-        mode=ModelMode.CLASSIFICATION,
-        cross_validation=2,
-        n_trials=1, # run only one random forest classifier trial
-        n_chemprop_trials=2, # run one enqueued chemprop trial and 1 undirected trial
-        split_chemprop=True, # this is set to true by default (shown here for illustration)
-        direction=OptimizationDirection.MAXIMIZATION,
-    ),
-)
-
-
-
-
-
-

Turn on Hyperopt within trials (advanced functionality & very large computational cost)

-

Qptuna optimises all aspects of the ChemProp architecture when using ChemPropRegressor or ChemPropClassifier. However, users can activate the original hyperparameter-optimization implementation in the ChemProp package, which performs automated Bayesian hyperparameter optimization using the Hyperopt package within each trial, at large computational cost.

-

NB: The principal way for users to expand and perform more advanced runs is to extend the available non-network hyperparameters, such as the features_generator option, or e.g. to trial different side information weightings (if side information is available).

-

NB: Please note that when num_iters=1 (default behavior), any optimisation of the MPNN architecture (done by Hyperopt) is deactivated and the sensible defaults specified by the ChemProp authors are applied; i.e. optimisation of the MPNN is only possible when num_iters>=2, like so:

-
-
[33]:
-
-
-
from optunaz.config.optconfig import ChemPropHyperoptRegressor, ChemPropHyperoptClassifier
-
-config = OptimizationConfig(
-    data=Dataset(
-        input_column="canonical",
-        response_column="molwt",
-        training_dataset_file="../tests/data/DRD2/subset-50/train.csv",
-        split_strategy=Stratified(fraction=0.5),
-        deduplication_strategy=KeepMedian(),
-    ),
-    descriptors=[
-        SmilesFromFile.new(),
-    ],
-    algorithms=[
-        ChemPropHyperoptRegressor.new(epochs=5, num_iters=2), #num_iters>=2: enable hyperopt within ChemProp trials
-    ],
-    settings=OptimizationConfig.Settings(
-        mode=ModelMode.REGRESSION,
-        cross_validation=2,
-        n_trials=1, #just optimise one ChemProp model for this example
-        direction=OptimizationDirection.MAXIMIZATION,
-    ),
-)
-
-
-
-

NB: Remember that parameter tuning of the MPNN network is performed within each trial.

-
-

A note on MPNN Hyperopt search space

-

ChemProp models trained using Hyperopt use the original implementation, with one key difference: the search_parameter_level setting created for Qptuna. Instead of using pre-defined search spaces as in the original package, Qptuna can (and by default will, since search_parameter_level=auto unless changed) alter the space depending on the characteristics of the user input data. For example, the number of training set compounds, hyperparameter trials (num_iters) and epochs (epochs) are used by the auto setting to ensure search spaces are not too large for limited data/epochs and, vice versa, that an extensive search space is trialled when applicable.

-

N.B.: Users can also manually define Hyperopt search spaces by altering search_parameter_level from auto to a level in the range [0-8], where higher levels represent larger search spaces (see the Qptuna documentation for details).

-
-
-
-

Side information and multi-task learning (MTL)

-

“Even if you are only optimizing one loss as is the typical case, chances are there is an auxiliary task that will help you improve upon your main task” [Caruana, 1998]

-

Qptuna typically optimizes for one particular metric for a given molecular property. While we can generally achieve acceptable performance this way, these single-task (ST) models ignore information that may improve the prediction of the main task of intent. See option a. in the figure below.

-

Signals from relevant related tasks (aka “auxiliary tasks” or “side information”) could come from the training signals of other molecular properties. By sharing representations between related tasks, we can enable a neural network to generalize better on our original task of intent. This approach is called Multi-Task Learning (MTL). See option b. in the figure below.

-

Difference between ST and MTL

-

(above) Differences between optimizing one vs. more than one loss function.
a.) Single-task (ST): one model is trained to predict one task; the model is optimised until performance no longer increases.
b.) Multi-task (MT/MTL): one model is trained to predict multiple tasks; the model optimises more than one loss function at once; this enables representations to be shared between the trained tasks; training signals of related tasks are shared between all tasks.

-

ChemProp performs MTL by using the knowledge learnt while training one task to reduce the loss of the other tasks included in training. To use this function in Qptuna, a user should provide side information in a separate file, which should have the same ordering and length as the input/response columns (i.e. the length of y should equal the length of the side information for y).

-

E.g: consider the DRD2 example input from earlier:

-
-
[34]:
-
-
-
!head  ../tests/data/DRD2/subset-50/train.csv
-
-
-
-
-
-
-
-
-canonical,activity,molwt,molwt_gt_330
-Cc1cc(NC(=O)c2cccc(COc3ccc(Br)cc3)c2)no1,0,387.233,True
-O=C(Nc1ccc(F)cc1F)Nc1sccc1-c1nc2ccccc2s1,0,387.4360000000001,True
-COC(=O)c1ccccc1NC(=O)c1cc([N+](=O)[O-])nn1Cc1ccccc1,0,380.36000000000007,True
-CCOC(=O)C(C)Sc1nc(-c2ccccc2)ccc1C#N,0,312.39400000000006,False
-CCC(CC)NC(=O)c1nn(Cc2ccccc2)c(=O)c2ccccc12,0,349.4340000000001,True
-Brc1ccccc1OCCCOc1cccc2cccnc12,0,358.235,True
-CCCCn1c(COc2cccc(OC)c2)nc2ccccc21,0,310.39700000000005,False
-CCOc1cccc(NC(=O)c2sc3nc(-c4ccc(F)cc4)ccc3c2N)c1,0,407.4700000000001,True
-COc1ccc(S(=O)(=O)N(CC(=O)Nc2ccc(C)cc2)c2ccc(C)cc2)cc1OC,0,454.54800000000023,True
-
-
-

There is an accompanying example of side information/auxiliary data inputs (calculated PhysChem properties), provided in train_side_info.csv within the tests data folder:

-
-
[35]:
-
-
-
!head  ../tests/data/DRD2/subset-50/train_side_info.csv
-
-
-
-
-
-
-
-
-canonical,cLogP,cLogS,H-Acceptors,H-Donors,Total Surface Area,Relative PSA
-Cc1cc(NC(=O)c2cccc(COc3ccc(Br)cc3)c2)no1,4.04,-5.293,5,1,265.09,0.22475
-O=C(Nc1ccc(F)cc1F)Nc1sccc1-c1nc2ccccc2s1,4.8088,-5.883,4,2,271.39,0.32297
-COC(=O)c1ccccc1NC(=O)c1cc([N+](=O)[O-])nn1Cc1ccccc1,1.6237,-3.835,9,1,287.39,0.33334
-CCOC(=O)C(C)Sc1nc(-c2ccccc2)ccc1C#N,3.2804,-4.314,4,0,249.51,0.26075
-CCC(CC)NC(=O)c1nn(Cc2ccccc2)c(=O)c2ccccc12,3.2533,-4.498,5,1,278.05,0.18917
-Brc1ccccc1OCCCOc1cccc2cccnc12,4.5102,-4.694,3,0,246.29,0.12575
-CCCCn1c(COc2cccc(OC)c2)nc2ccccc21,3.7244,-2.678,4,0,255.14,0.14831
-CCOc1cccc(NC(=O)c2sc3nc(-c4ccc(F)cc4)ccc3c2N)c1,4.4338,-6.895,5,2,302.18,0.26838
-COc1ccc(S(=O)(=O)N(CC(=O)Nc2ccc(C)cc2)c2ccc(C)cc2)cc1OC,3.2041,-5.057,7,1,343.67,0.22298
-
-
-

I.e. the first column (the SMILES, here named canonical) should match between the two files, and any columns after the SMILES within the train_side_info.csv side information file will be used as y-label side information in the training of the network.
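
A simple consistency check before training can save a failed run. The sketch below compares the SMILES column of a training file against a side-information file, using small in-memory CSV snippets copied from the first rows of the files shown above. `check_side_info_alignment` and `read_column` are hypothetical helpers, not Qptuna functions.

```python
import csv
import io

# First two rows of the files shown above (side information cut to two columns).
train_csv = """canonical,activity,molwt,molwt_gt_330
Cc1cc(NC(=O)c2cccc(COc3ccc(Br)cc3)c2)no1,0,387.233,True
O=C(Nc1ccc(F)cc1F)Nc1sccc1-c1nc2ccccc2s1,0,387.4360000000001,True
"""
side_csv = """canonical,cLogP,cLogS
Cc1cc(NC(=O)c2cccc(COc3ccc(Br)cc3)c2)no1,4.04,-5.293
O=C(Nc1ccc(F)cc1F)Nc1sccc1-c1nc2ccccc2s1,4.8088,-5.883
"""


def read_column(text: str, column: str) -> list:
    """Return all values of `column` from CSV text, in file order."""
    return [row[column] for row in csv.DictReader(io.StringIO(text))]


def check_side_info_alignment(train: str, side: str, column: str) -> None:
    """Raise ValueError unless both files list the same SMILES in the same order."""
    train_smiles = read_column(train, column)
    side_smiles = read_column(side, column)
    if len(train_smiles) != len(side_smiles):
        raise ValueError(f"Length mismatch: {len(train_smiles)} vs {len(side_smiles)}")
    for i, (a, b) in enumerate(zip(train_smiles, side_smiles)):
        if a != b:
            raise ValueError(f"Row {i} differs: {a!r} vs {b!r}")


check_side_info_alignment(train_csv, side_csv, "canonical")
print("Side information is aligned with the training data.")
```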

-

N.B.: Calculated PhysChem properties are only one example of side information; side information may come from any related property that improves the main task of intent.

-

A classification example can also be found here:

-
-
[36]:
-
-
-
!head  ../tests/data/DRD2/subset-50/train_side_info_cls.csv
-
-
-
-
-
-
-
-
-canonical,cLogP_Gt2.5,cLogS_Gt-3.5,H-Acceptors_Gt5,H-Donors_Gt0,Total Surface Area_Gt250,Relative PSA_Lt0.25
-Cc1cc(NC(=O)c2cccc(COc3ccc(Br)cc3)c2)no1,1,0,0,1,1,1
-O=C(Nc1ccc(F)cc1F)Nc1sccc1-c1nc2ccccc2s1,1,0,0,1,1,0
-COC(=O)c1ccccc1NC(=O)c1cc([N+](=O)[O-])nn1Cc1ccccc1,0,0,1,1,1,0
-CCOC(=O)C(C)Sc1nc(-c2ccccc2)ccc1C#N,1,0,0,0,0,0
-CCC(CC)NC(=O)c1nn(Cc2ccccc2)c(=O)c2ccccc12,1,0,0,1,1,1
-Brc1ccccc1OCCCOc1cccc2cccnc12,1,0,0,0,0,1
-CCCCn1c(COc2cccc(OC)c2)nc2ccccc21,1,1,0,0,1,1
-CCOc1cccc(NC(=O)c2sc3nc(-c4ccc(F)cc4)ccc3c2N)c1,1,0,0,1,1,0
-COc1ccc(S(=O)(=O)N(CC(=O)Nc2ccc(C)cc2)c2ccc(C)cc2)cc1OC,1,0,1,1,1,1
-
-
-

The weight given to the side information tasks in the loss function during network training is a parameter that can be optimised within Qptuna, e.g.:

-
-
[37]:
-
-
-
from optunaz.descriptors import SmilesAndSideInfoFromFile
-
-config = OptimizationConfig(
-    data=Dataset(
-        input_column="canonical",
-        response_column="molwt",
-        training_dataset_file="../tests/data/DRD2/subset-50/train.csv",
-        test_dataset_file="../tests/data/DRD2/subset-50/test.csv"),  # Hidden during optimization.
-    descriptors=[
-        SmilesAndSideInfoFromFile.new(file='../tests/data/DRD2/subset-50/train_side_info.csv',\
-                                     input_column='canonical',
-                                     aux_weight_pc={"low": 0, "high": 100, "q": 10}) #try different aux weights
-    ],
-    algorithms=[
-        ChemPropHyperoptRegressor.new(epochs=4), #epochs=4 to ensure run finishes quickly
-    ],
-    settings=OptimizationConfig.Settings(
-        mode=ModelMode.REGRESSION,
-        cross_validation=1,
-        n_trials=8,
-        n_startup_trials=0,
-        random_seed=42,
-        direction=OptimizationDirection.MAXIMIZATION,
-    ),
-)
-
-study = optimize(config, study_name="my_study")
-
-
-
-
-
-
-
-
-[I 2024-08-23 10:54:11,832] A new study created in memory with name: my_study
-[I 2024-08-23 10:54:11,835] A new study created in memory with name: study_name_0
-[I 2024-08-23 10:54:13,809] Trial 0 finished with value: -5817.944294219682 and parameters: {'algorithm_name': 'ChemPropHyperoptRegressor', 'ChemPropHyperoptRegressor_algorithm_hash': 'db9e60f9b8f0a43eff4b41917b6293d9', 'ensemble_size__db9e60f9b8f0a43eff4b41917b6293d9': 1, 'epochs__db9e60f9b8f0a43eff4b41917b6293d9': 4, 'features_generator__db9e60f9b8f0a43eff4b41917b6293d9': <ChemPropFeatures_Generator.NONE: 'none'>, 'num_iters__db9e60f9b8f0a43eff4b41917b6293d9': 1, 'search_parameter_level__db9e60f9b8f0a43eff4b41917b6293d9': <ChemPropSearch_Parameter_Level.AUTO: 'auto'>, 'descriptor': '{"name": "SmilesAndSideInfoFromFile", "parameters": {"file": "../tests/data/DRD2/subset-50/train_side_info.csv", "input_column": "canonical", "aux_weight_pc": {"low": 0, "high": 100, "q": 10}}}', 'aux_weight_pc__db9e60f9b8f0a43eff4b41917b6293d9': 50}. Best is trial 0 with value: -5817.944294219682.
-[I 2024-08-23 10:54:13,841] Trial 1 pruned. Duplicate parameter set
-
-
-
-
-
-
-
-Duplicated trial: {'algorithm_name': 'ChemPropHyperoptRegressor', 'ChemPropHyperoptRegressor_algorithm_hash': 'db9e60f9b8f0a43eff4b41917b6293d9', 'ensemble_size__db9e60f9b8f0a43eff4b41917b6293d9': 1, 'epochs__db9e60f9b8f0a43eff4b41917b6293d9': 4, 'features_generator__db9e60f9b8f0a43eff4b41917b6293d9': <ChemPropFeatures_Generator.NONE: 'none'>, 'num_iters__db9e60f9b8f0a43eff4b41917b6293d9': 1, 'search_parameter_level__db9e60f9b8f0a43eff4b41917b6293d9': <ChemPropSearch_Parameter_Level.AUTO: 'auto'>, 'descriptor': '{"name": "SmilesAndSideInfoFromFile", "parameters": {"file": "../tests/data/DRD2/subset-50/train_side_info.csv", "input_column": "canonical", "aux_weight_pc": {"low": 0, "high": 100, "q": 10}}}', 'aux_weight_pc__db9e60f9b8f0a43eff4b41917b6293d9': 50}, return [-5817.944294219682]
-
-
-
-
-
-
-
-[I 2024-08-23 10:54:15,590] Trial 2 finished with value: -5796.344216469237 and parameters: {'algorithm_name': 'ChemPropHyperoptRegressor', 'ChemPropHyperoptRegressor_algorithm_hash': 'db9e60f9b8f0a43eff4b41917b6293d9', 'ensemble_size__db9e60f9b8f0a43eff4b41917b6293d9': 1, 'epochs__db9e60f9b8f0a43eff4b41917b6293d9': 4, 'features_generator__db9e60f9b8f0a43eff4b41917b6293d9': <ChemPropFeatures_Generator.NONE: 'none'>, 'num_iters__db9e60f9b8f0a43eff4b41917b6293d9': 1, 'search_parameter_level__db9e60f9b8f0a43eff4b41917b6293d9': <ChemPropSearch_Parameter_Level.AUTO: 'auto'>, 'descriptor': '{"name": "SmilesAndSideInfoFromFile", "parameters": {"file": "../tests/data/DRD2/subset-50/train_side_info.csv", "input_column": "canonical", "aux_weight_pc": {"low": 0, "high": 100, "q": 10}}}', 'aux_weight_pc__db9e60f9b8f0a43eff4b41917b6293d9': 80}. Best is trial 2 with value: -5796.344216469237.
-[I 2024-08-23 10:54:17,443] Trial 3 finished with value: -5795.086276167766 and parameters: {'algorithm_name': 'ChemPropHyperoptRegressor', 'ChemPropHyperoptRegressor_algorithm_hash': 'db9e60f9b8f0a43eff4b41917b6293d9', 'ensemble_size__db9e60f9b8f0a43eff4b41917b6293d9': 1, 'epochs__db9e60f9b8f0a43eff4b41917b6293d9': 4, 'features_generator__db9e60f9b8f0a43eff4b41917b6293d9': <ChemPropFeatures_Generator.NONE: 'none'>, 'num_iters__db9e60f9b8f0a43eff4b41917b6293d9': 1, 'search_parameter_level__db9e60f9b8f0a43eff4b41917b6293d9': <ChemPropSearch_Parameter_Level.AUTO: 'auto'>, 'descriptor': '{"name": "SmilesAndSideInfoFromFile", "parameters": {"file": "../tests/data/DRD2/subset-50/train_side_info.csv", "input_column": "canonical", "aux_weight_pc": {"low": 0, "high": 100, "q": 10}}}', 'aux_weight_pc__db9e60f9b8f0a43eff4b41917b6293d9': 100}. Best is trial 3 with value: -5795.086276167766.
-[I 2024-08-23 10:54:17,468] Trial 4 pruned. Duplicate parameter set
-
-
-
-
-
-
-
-Duplicated trial: {'algorithm_name': 'ChemPropHyperoptRegressor', 'ChemPropHyperoptRegressor_algorithm_hash': 'db9e60f9b8f0a43eff4b41917b6293d9', 'ensemble_size__db9e60f9b8f0a43eff4b41917b6293d9': 1, 'epochs__db9e60f9b8f0a43eff4b41917b6293d9': 4, 'features_generator__db9e60f9b8f0a43eff4b41917b6293d9': <ChemPropFeatures_Generator.NONE: 'none'>, 'num_iters__db9e60f9b8f0a43eff4b41917b6293d9': 1, 'search_parameter_level__db9e60f9b8f0a43eff4b41917b6293d9': <ChemPropSearch_Parameter_Level.AUTO: 'auto'>, 'descriptor': '{"name": "SmilesAndSideInfoFromFile", "parameters": {"file": "../tests/data/DRD2/subset-50/train_side_info.csv", "input_column": "canonical", "aux_weight_pc": {"low": 0, "high": 100, "q": 10}}}', 'aux_weight_pc__db9e60f9b8f0a43eff4b41917b6293d9': 100}, return [-5795.086276167766]
-
-
-
-
-
-
-
-[I 2024-08-23 10:54:19,134] Trial 5 finished with value: -5820.228288292862 and parameters: {'algorithm_name': 'ChemPropHyperoptRegressor', 'ChemPropHyperoptRegressor_algorithm_hash': 'db9e60f9b8f0a43eff4b41917b6293d9', 'ensemble_size__db9e60f9b8f0a43eff4b41917b6293d9': 1, 'epochs__db9e60f9b8f0a43eff4b41917b6293d9': 4, 'features_generator__db9e60f9b8f0a43eff4b41917b6293d9': <ChemPropFeatures_Generator.NONE: 'none'>, 'num_iters__db9e60f9b8f0a43eff4b41917b6293d9': 1, 'search_parameter_level__db9e60f9b8f0a43eff4b41917b6293d9': <ChemPropSearch_Parameter_Level.AUTO: 'auto'>, 'descriptor': '{"name": "SmilesAndSideInfoFromFile", "parameters": {"file": "../tests/data/DRD2/subset-50/train_side_info.csv", "input_column": "canonical", "aux_weight_pc": {"low": 0, "high": 100, "q": 10}}}', 'aux_weight_pc__db9e60f9b8f0a43eff4b41917b6293d9': 0}. Best is trial 3 with value: -5795.086276167766.
-[I 2024-08-23 10:54:19,157] Trial 6 pruned. Duplicate parameter set
-
-
-
-
-
-
-
-Duplicated trial: {'algorithm_name': 'ChemPropHyperoptRegressor', 'ChemPropHyperoptRegressor_algorithm_hash': 'db9e60f9b8f0a43eff4b41917b6293d9', 'ensemble_size__db9e60f9b8f0a43eff4b41917b6293d9': 1, 'epochs__db9e60f9b8f0a43eff4b41917b6293d9': 4, 'features_generator__db9e60f9b8f0a43eff4b41917b6293d9': <ChemPropFeatures_Generator.NONE: 'none'>, 'num_iters__db9e60f9b8f0a43eff4b41917b6293d9': 1, 'search_parameter_level__db9e60f9b8f0a43eff4b41917b6293d9': <ChemPropSearch_Parameter_Level.AUTO: 'auto'>, 'descriptor': '{"name": "SmilesAndSideInfoFromFile", "parameters": {"file": "../tests/data/DRD2/subset-50/train_side_info.csv", "input_column": "canonical", "aux_weight_pc": {"low": 0, "high": 100, "q": 10}}}', 'aux_weight_pc__db9e60f9b8f0a43eff4b41917b6293d9': 100}, return [-5795.086276167766]
-
-
-
-
-
-
-
-[I 2024-08-23 10:54:21,012] Trial 7 finished with value: -5852.160071204277 and parameters: {'algorithm_name': 'ChemPropHyperoptRegressor', 'ChemPropHyperoptRegressor_algorithm_hash': 'db9e60f9b8f0a43eff4b41917b6293d9', 'ensemble_size__db9e60f9b8f0a43eff4b41917b6293d9': 1, 'epochs__db9e60f9b8f0a43eff4b41917b6293d9': 4, 'features_generator__db9e60f9b8f0a43eff4b41917b6293d9': <ChemPropFeatures_Generator.NONE: 'none'>, 'num_iters__db9e60f9b8f0a43eff4b41917b6293d9': 1, 'search_parameter_level__db9e60f9b8f0a43eff4b41917b6293d9': <ChemPropSearch_Parameter_Level.AUTO: 'auto'>, 'descriptor': '{"name": "SmilesAndSideInfoFromFile", "parameters": {"file": "../tests/data/DRD2/subset-50/train_side_info.csv", "input_column": "canonical", "aux_weight_pc": {"low": 0, "high": 100, "q": 10}}}', 'aux_weight_pc__db9e60f9b8f0a43eff4b41917b6293d9': 10}. Best is trial 3 with value: -5795.086276167766.
-
-
-

In the toy example above, the ChemPropHyperoptRegressor has been trialed with a variety of auxiliary weights ranging from 0-100%, using the SmilesAndSideInfoFromFile setting aux_weight_pc={"low": 0, "high": 100, "q": 10}.
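To make the search space concrete, the sketch below enumerates the candidate auxiliary weights implied by this setting. It assumes that `q` is the step between candidate values with both endpoints included, which is consistent with the weights seen in the trial log but is not confirmed by the text above.

```python
# Search-space dictionary as passed to SmilesAndSideInfoFromFile in the example.
aux_weight_pc = {"low": 0, "high": 100, "q": 10}

# Assumption: "q" is the step size, and both "low" and "high" can be sampled.
candidates = list(
    range(aux_weight_pc["low"], aux_weight_pc["high"] + 1, aux_weight_pc["q"])
)

# Auxiliary weights reported for the completed trials in the log above.
observed = [50, 80, 100, 0, 10]
assert all(weight in candidates for weight in observed)

print(candidates)
```

With only a handful of candidate weights and `n_trials=8`, repeated suggestions are likely, which fits the "Duplicate parameter set" pruning messages in the log.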

-

The influence of the weighting of side information on model performance can hence be explored via a scatterplot of the objective value as a function of the auxiliary weight percent:

-
-
[38]:
-
-
-
data = study.trials_dataframe().query('user_attrs_trial_ran==True') #drop any pruned/erroneous trials
-data.columns = [i.split('__')[0] for i in data.columns] # remove algorithm hash from columns
-ax = sns.scatterplot(data=data, x="params_aux_weight_pc", y="value")
-ax.set(xlabel="Aux weight percent (%)", ylabel=f"Objective value\n({default_reg_scoring})");
-
-
-
-
-
-
-
-../_images/notebooks_QPTUNA_Tutorial_101_0.png -
-
-

Hence, in this toy example, 100% weighting of the side information produces the most performant ChemProp model.
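The same conclusion can be read off numerically. The sketch below rebuilds a small table from the objective values of the five completed trials in the log above and ranks them; the column names mirror those used for the scatterplot, and the table is typed in by hand rather than taken from `study.trials_dataframe()`.

```python
import pandas as pd

# Completed trials copied from the optimisation log (pruned duplicates omitted).
trials = pd.DataFrame(
    {
        "params_aux_weight_pc": [50, 80, 100, 0, 10],
        "value": [
            -5817.944294219682,
            -5796.344216469237,
            -5795.086276167766,
            -5820.228288292862,
            -5852.160071204277,
        ],
    }
)

# The study direction is maximisation, so the largest value is the best trial.
ranked = trials.sort_values("value", ascending=False).reset_index(drop=True)
best = ranked.iloc[0]

# Difference between the best and worst objective value across the trials.
spread = ranked["value"].max() - ranked["value"].min()

print(ranked)
print(f"Best aux weight: {int(best['params_aux_weight_pc'])}%, spread: {spread:.1f}")
```

The spread between the best and worst trial is small relative to the magnitude of the objective, so with 4 epochs and a 50-compound subset the ranking should be treated as illustrative rather than as evidence for a general trend.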

-
- -
-

Pre-training and adapting ChemProp models (Transfer Learning)

-

Transfer learning (TL) can be performed in Qptuna to adapt a model pre-trained on a wider dataset to a specific dataset of interest, in a similar manner to this publication. This option is available for ChemProp models and employs the original ChemProp package implementation. For example, a user can perform optimisation to pre-train a model using the following:

-
-
[42]:
-
-
-
from optunaz.descriptors import SmilesFromFile
-from optunaz.config.optconfig import ChemPropRegressor
-config = OptimizationConfig(
-    data=Dataset(
-        input_column="canonical",
-        response_column="molwt",
-        training_dataset_file="../tests/data/DRD2/subset-50/train.csv",  # This will be split into train and test.
-    ),
-    descriptors=[SmilesFromFile.new()],
-    algorithms=[
-        ChemPropRegressor.new(epochs=4),
-    ],
-    settings=OptimizationConfig.Settings(
-        mode=ModelMode.REGRESSION,
-        cross_validation=2,
-        n_trials=1,
-        direction=OptimizationDirection.MAXIMIZATION,
-    ),
-)
-
-study = optimize(config, study_name="my_study")
-_ = build_best(buildconfig_best(study), "../target/pretrained.pkl")
-
-
-
-
-
-
-
-
-[I 2024-08-23 10:54:22,787] A new study created in memory with name: my_study
-[I 2024-08-23 10:54:22,788] A new study created in memory with name: study_name_0
-INFO:root:Enqueued ChemProp manual trial with sensible defaults: {'activation__e0d3a442222d4b38f3aa1434851320db': 'ReLU', 'aggregation__e0d3a442222d4b38f3aa1434851320db': 'mean', 'aggregation_norm__e0d3a442222d4b38f3aa1434851320db': 100, 'batch_size__e0d3a442222d4b38f3aa1434851320db': 50, 'depth__e0d3a442222d4b38f3aa1434851320db': 3, 'dropout__e0d3a442222d4b38f3aa1434851320db': 0.0, 'features_generator__e0d3a442222d4b38f3aa1434851320db': 'none', 'ffn_hidden_size__e0d3a442222d4b38f3aa1434851320db': 300, 'ffn_num_layers__e0d3a442222d4b38f3aa1434851320db': 2, 'final_lr_ratio_exp__e0d3a442222d4b38f3aa1434851320db': -4, 'hidden_size__e0d3a442222d4b38f3aa1434851320db': 300, 'init_lr_ratio_exp__e0d3a442222d4b38f3aa1434851320db': -4, 'max_lr_exp__e0d3a442222d4b38f3aa1434851320db': -3, 'warmup_epochs_ratio__e0d3a442222d4b38f3aa1434851320db': 0.1, 'algorithm_name': 'ChemPropRegressor', 'ChemPropRegressor_algorithm_hash': 'e0d3a442222d4b38f3aa1434851320db'}
-[I 2024-08-23 10:55:11,897] Trial 0 finished with value: -4937.540075659691 and parameters: {'algorithm_name': 'ChemPropRegressor', 'ChemPropRegressor_algorithm_hash': 'e0d3a442222d4b38f3aa1434851320db', 'activation__e0d3a442222d4b38f3aa1434851320db': <ChemPropActivation.RELU: 'ReLU'>, 'aggregation__e0d3a442222d4b38f3aa1434851320db': <ChemPropAggregation.MEAN: 'mean'>, 'aggregation_norm__e0d3a442222d4b38f3aa1434851320db': 100.0, 'batch_size__e0d3a442222d4b38f3aa1434851320db': 50.0, 'depth__e0d3a442222d4b38f3aa1434851320db': 3.0, 'dropout__e0d3a442222d4b38f3aa1434851320db': 0.0, 'ensemble_size__e0d3a442222d4b38f3aa1434851320db': 1, 'epochs__e0d3a442222d4b38f3aa1434851320db': 4, 'features_generator__e0d3a442222d4b38f3aa1434851320db': <ChemPropFeatures_Generator.NONE: 'none'>, 'ffn_hidden_size__e0d3a442222d4b38f3aa1434851320db': 300.0, 'ffn_num_layers__e0d3a442222d4b38f3aa1434851320db': 2.0, 'final_lr_ratio_exp__e0d3a442222d4b38f3aa1434851320db': -4, 'hidden_size__e0d3a442222d4b38f3aa1434851320db': 300.0, 'init_lr_ratio_exp__e0d3a442222d4b38f3aa1434851320db': -4, 'max_lr_exp__e0d3a442222d4b38f3aa1434851320db': -3, 'warmup_epochs_ratio__e0d3a442222d4b38f3aa1434851320db': 0.1, 'descriptor': '{"name": "SmilesFromFile", "parameters": {}}'}. Best is trial 0 with value: -4937.540075659691.
-
-
-
-

The pretrained model saved to ../target/pretrained.pkl can now be supplied as an input for the ChemPropRegressorPretrained algorithm. This model can be retrained with (or adapted to) a new dataset (../tests/data/DRD2/subset-50/test.csv) like so:

-
-
[43]:
-
-
-
from optunaz.config.optconfig import ChemPropRegressorPretrained
-
-config = OptimizationConfig(
-    data=Dataset(
-        input_column="canonical",
-        response_column="molwt",
-        training_dataset_file="../tests/data/DRD2/subset-50/test.csv",
-    ),
-    descriptors=[SmilesFromFile.new()],
-    algorithms=[
-        ChemPropRegressorPretrained.new(
-            pretrained_model='../target/pretrained.pkl',
-            epochs=ChemPropRegressorPretrained.Parameters.ChemPropParametersEpochs(low=4,high=4))
-    ],
-    settings=OptimizationConfig.Settings(
-        mode=ModelMode.REGRESSION,
-        cross_validation=2,
-        n_trials=1,
-        direction=OptimizationDirection.MAXIMIZATION,
-    ),
-)
-
-study = optimize(config, study_name="my_study")
-
-
-
-
-
-
-
-
-[I 2024-08-23 10:56:25,953] A new study created in memory with name: my_study
-[I 2024-08-23 10:56:25,998] A new study created in memory with name: study_name_0
-[I 2024-08-23 10:57:15,462] Trial 0 finished with value: -5114.7131239123555 and parameters: {'algorithm_name': 'ChemPropRegressorPretrained', 'ChemPropRegressorPretrained_algorithm_hash': 'dfc518a76317f23d95e5aa5a3eac77f0', 'frzn__dfc518a76317f23d95e5aa5a3eac77f0': <ChemPropFrzn.NONE: 'none'>, 'epochs__dfc518a76317f23d95e5aa5a3eac77f0': 4, 'descriptor': '{"name": "SmilesFromFile", "parameters": {}}'}. Best is trial 0 with value: -5114.7131239123555.
-
-
-

With the basics covered, the following example shows how Qptuna can compare the performance of local, adapted and global (no epochs for transfer learning) models within a single optimisation job:

-
-
[44]:
-
-
-
config = OptimizationConfig(
-    data=Dataset(
-        input_column="canonical",
-        response_column="molwt",
-        training_dataset_file="../tests/data/DRD2/subset-50/train.csv",
-        test_dataset_file="../tests/data/DRD2/subset-50/test.csv", #  test.csv supplied for fair comparison
-    ),
-    descriptors=[SmilesFromFile.new()],
-    algorithms=[
-        ChemPropRegressor.new(epochs=4), # local
-        ChemPropRegressorPretrained.new(
-            pretrained_model='../target/pretrained.pkl',
-            epochs=ChemPropRegressorPretrained.Parameters.ChemPropParametersEpochs(low=0,high=0)) # global
-    ,
-        ChemPropRegressorPretrained.new(
-            pretrained_model='../target/pretrained.pkl',
-            epochs=ChemPropRegressorPretrained.Parameters.ChemPropParametersEpochs(low=4,high=4)) #adapted
-    ],
-    settings=OptimizationConfig.Settings(
-        mode=ModelMode.REGRESSION,
-        cross_validation=1,
-        n_trials=5,
-        n_startup_trials=0,
-        random_seed=0, # ensure all model types trialed
-        direction=OptimizationDirection.MAXIMIZATION,
-    ),
-)
-tl_study = optimize(config, study_name="my_study").trials_dataframe()
-
-tl_study['epochs'] = tl_study.loc[:,tl_study.columns.str.contains('params_epochs'
-            )].fillna(''
-            ).astype(str
-            ).agg(''.join, axis=1).astype(float) # merge epochs into one column
-
-tl_study.loc[~tl_study['params_ChemPropRegressor_algorithm_hash'].isna(),
-             "Model type"]='Local' # Annotate the local model
-
-tl_study.loc[tl_study['params_ChemPropRegressor_algorithm_hash'].isna()
-             & (tl_study['epochs'] == 4), "Model type"] = 'Adapted' # Annotate the adapted model (TL to new data)
-
-tl_study.loc[tl_study['params_ChemPropRegressor_algorithm_hash'].isna()
-             & (tl_study['epochs'] == 0), "Model type"] = 'Global' # Annotate the global model (no TL)
-
-
-
-
-
-
-
-
-[I 2024-08-23 10:57:15,570] A new study created in memory with name: my_study
-[I 2024-08-23 10:57:15,572] A new study created in memory with name: study_name_0
-INFO:root:Enqueued ChemProp manual trial with sensible defaults: {'activation__e0d3a442222d4b38f3aa1434851320db': 'ReLU', 'aggregation__e0d3a442222d4b38f3aa1434851320db': 'mean', 'aggregation_norm__e0d3a442222d4b38f3aa1434851320db': 100, 'batch_size__e0d3a442222d4b38f3aa1434851320db': 50, 'depth__e0d3a442222d4b38f3aa1434851320db': 3, 'dropout__e0d3a442222d4b38f3aa1434851320db': 0.0, 'features_generator__e0d3a442222d4b38f3aa1434851320db': 'none', 'ffn_hidden_size__e0d3a442222d4b38f3aa1434851320db': 300, 'ffn_num_layers__e0d3a442222d4b38f3aa1434851320db': 2, 'final_lr_ratio_exp__e0d3a442222d4b38f3aa1434851320db': -4, 'hidden_size__e0d3a442222d4b38f3aa1434851320db': 300, 'init_lr_ratio_exp__e0d3a442222d4b38f3aa1434851320db': -4, 'max_lr_exp__e0d3a442222d4b38f3aa1434851320db': -3, 'warmup_epochs_ratio__e0d3a442222d4b38f3aa1434851320db': 0.1, 'algorithm_name': 'ChemPropRegressor', 'ChemPropRegressor_algorithm_hash': 'e0d3a442222d4b38f3aa1434851320db'}
-[I 2024-08-23 10:57:40,452] Trial 0 finished with value: -5891.7552821093905 and parameters: {'algorithm_name': 'ChemPropRegressor', 'ChemPropRegressor_algorithm_hash': 'e0d3a442222d4b38f3aa1434851320db', 'activation__e0d3a442222d4b38f3aa1434851320db': <ChemPropActivation.RELU: 'ReLU'>, 'aggregation__e0d3a442222d4b38f3aa1434851320db': <ChemPropAggregation.MEAN: 'mean'>, 'aggregation_norm__e0d3a442222d4b38f3aa1434851320db': 100.0, 'batch_size__e0d3a442222d4b38f3aa1434851320db': 50.0, 'depth__e0d3a442222d4b38f3aa1434851320db': 3.0, 'dropout__e0d3a442222d4b38f3aa1434851320db': 0.0, 'ensemble_size__e0d3a442222d4b38f3aa1434851320db': 1, 'epochs__e0d3a442222d4b38f3aa1434851320db': 4, 'features_generator__e0d3a442222d4b38f3aa1434851320db': <ChemPropFeatures_Generator.NONE: 'none'>, 'ffn_hidden_size__e0d3a442222d4b38f3aa1434851320db': 300.0, 'ffn_num_layers__e0d3a442222d4b38f3aa1434851320db': 2.0, 'final_lr_ratio_exp__e0d3a442222d4b38f3aa1434851320db': -4, 'hidden_size__e0d3a442222d4b38f3aa1434851320db': 300.0, 'init_lr_ratio_exp__e0d3a442222d4b38f3aa1434851320db': -4, 'max_lr_exp__e0d3a442222d4b38f3aa1434851320db': -3, 'warmup_epochs_ratio__e0d3a442222d4b38f3aa1434851320db': 0.1, 'descriptor': '{"name": "SmilesFromFile", "parameters": {}}'}. Best is trial 0 with value: -5891.7552821093905.
-[I 2024-08-23 10:58:04,994] Trial 1 finished with value: -5891.7552821093905 and parameters: {'algorithm_name': 'ChemPropRegressor', 'ChemPropRegressor_algorithm_hash': 'e0d3a442222d4b38f3aa1434851320db', 'activation__e0d3a442222d4b38f3aa1434851320db': <ChemPropActivation.RELU: 'ReLU'>, 'aggregation__e0d3a442222d4b38f3aa1434851320db': <ChemPropAggregation.MEAN: 'mean'>, 'aggregation_norm__e0d3a442222d4b38f3aa1434851320db': 105.0, 'batch_size__e0d3a442222d4b38f3aa1434851320db': 60.0, 'depth__e0d3a442222d4b38f3aa1434851320db': 3.0, 'dropout__e0d3a442222d4b38f3aa1434851320db': 0.0, 'ensemble_size__e0d3a442222d4b38f3aa1434851320db': 1, 'epochs__e0d3a442222d4b38f3aa1434851320db': 4, 'features_generator__e0d3a442222d4b38f3aa1434851320db': <ChemPropFeatures_Generator.NONE: 'none'>, 'ffn_hidden_size__e0d3a442222d4b38f3aa1434851320db': 300.0, 'ffn_num_layers__e0d3a442222d4b38f3aa1434851320db': 2.0, 'final_lr_ratio_exp__e0d3a442222d4b38f3aa1434851320db': -4, 'hidden_size__e0d3a442222d4b38f3aa1434851320db': 300.0, 'init_lr_ratio_exp__e0d3a442222d4b38f3aa1434851320db': -4, 'max_lr_exp__e0d3a442222d4b38f3aa1434851320db': -3, 'warmup_epochs_ratio__e0d3a442222d4b38f3aa1434851320db': 0.1, 'descriptor': '{"name": "SmilesFromFile", "parameters": {}}'}. Best is trial 0 with value: -5891.7552821093905.
-[I 2024-08-23 10:58:33,237] Trial 2 finished with value: -5846.8674879655655 and parameters: {'algorithm_name': 'ChemPropRegressor', 'ChemPropRegressor_algorithm_hash': 'e0d3a442222d4b38f3aa1434851320db', 'activation__e0d3a442222d4b38f3aa1434851320db': <ChemPropActivation.RELU: 'ReLU'>, 'aggregation__e0d3a442222d4b38f3aa1434851320db': <ChemPropAggregation.MEAN: 'mean'>, 'aggregation_norm__e0d3a442222d4b38f3aa1434851320db': 14.0, 'batch_size__e0d3a442222d4b38f3aa1434851320db': 10.0, 'depth__e0d3a442222d4b38f3aa1434851320db': 2.0, 'dropout__e0d3a442222d4b38f3aa1434851320db': 0.24, 'ensemble_size__e0d3a442222d4b38f3aa1434851320db': 1, 'epochs__e0d3a442222d4b38f3aa1434851320db': 4, 'features_generator__e0d3a442222d4b38f3aa1434851320db': <ChemPropFeatures_Generator.NONE: 'none'>, 'ffn_hidden_size__e0d3a442222d4b38f3aa1434851320db': 1600.0, 'ffn_num_layers__e0d3a442222d4b38f3aa1434851320db': 2.0, 'final_lr_ratio_exp__e0d3a442222d4b38f3aa1434851320db': -4, 'hidden_size__e0d3a442222d4b38f3aa1434851320db': 900.0, 'init_lr_ratio_exp__e0d3a442222d4b38f3aa1434851320db': -1, 'max_lr_exp__e0d3a442222d4b38f3aa1434851320db': -2, 'warmup_epochs_ratio__e0d3a442222d4b38f3aa1434851320db': 0.1, 'descriptor': '{"name": "SmilesFromFile", "parameters": {}}'}. Best is trial 2 with value: -5846.8674879655655.
-[I 2024-08-23 10:58:57,861] Trial 3 finished with value: -5890.94653501547 and parameters: {'algorithm_name': 'ChemPropRegressorPretrained', 'ChemPropRegressorPretrained_algorithm_hash': '77dfc8230317e08504ed5e643243fbc2', 'frzn__77dfc8230317e08504ed5e643243fbc2': <ChemPropFrzn.NONE: 'none'>, 'epochs__77dfc8230317e08504ed5e643243fbc2': 0, 'descriptor': '{"name": "SmilesFromFile", "parameters": {}}'}. Best is trial 2 with value: -5846.8674879655655.
-[I 2024-08-23 10:59:22,283] Trial 4 finished with value: -5890.881210303758 and parameters: {'algorithm_name': 'ChemPropRegressorPretrained', 'ChemPropRegressorPretrained_algorithm_hash': 'dfc518a76317f23d95e5aa5a3eac77f0', 'frzn__dfc518a76317f23d95e5aa5a3eac77f0': <ChemPropFrzn.NONE: 'none'>, 'epochs__dfc518a76317f23d95e5aa5a3eac77f0': 4, 'descriptor': '{"name": "SmilesFromFile", "parameters": {}}'}. Best is trial 2 with value: -5846.8674879655655.
-
-
-
-
[45]:
-
-
-
sns.set_theme(style="darkgrid")
-default_reg_scoring= config.settings.scoring
-ax = sns.scatterplot(data=tl_study, x="number",
-                     y="value",hue='Model type')
-ax.set(xlabel="Trial number",ylabel=f"Objective value\n({default_reg_scoring})")
-sns.move_legend(ax, "upper right", bbox_to_anchor=(1.6, 1), ncol=1, title="")
-
-
-
-
-
-
-
-../_images/notebooks_QPTUNA_Tutorial_119_0.png -
-
-

For this toy example we do not observe a large difference between the three model types, but in a real-world setting a user can build the best model from the three model types evaluated.
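As a complement to the plot, the comparison can be summarised per model type. This is a minimal sketch in which the objective values are copied by hand from the five trials in the log above and the "Model type" labels follow the annotation logic of the preceding cell; it does not use the `tl_study` dataframe itself.

```python
import pandas as pd

# Trials 0-2 used ChemPropRegressor (local), trial 3 the pretrained model with
# 0 epochs (global) and trial 4 the pretrained model with 4 epochs (adapted).
tl_results = pd.DataFrame(
    {
        "number": [0, 1, 2, 3, 4],
        "Model type": ["Local", "Local", "Local", "Global", "Adapted"],
        "value": [
            -5891.7552821093905,
            -5891.7552821093905,
            -5846.8674879655655,
            -5890.94653501547,
            -5890.881210303758,
        ],
    }
)

# Best (largest) objective value achieved by each model type.
best_per_type = (
    tl_results.groupby("Model type")["value"].max().sort_values(ascending=False)
)

print(best_per_type)
```

Note that the local model received three trials while the other two types received one each, so the per-type maxima are not based on equal search effort.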

-
-
-

ChemProp fingerprints (encode latent representation as descriptors)

-

ChemProp can also generate outputs intended for use as a fingerprint, using the original package implementation. Fingerprints are derived from the latent representation of the MPNN or from the penultimate FFN output layer, and can be used as a form of learned descriptor or fingerprint.

-
-
[46]:
-
-
-
with open("../target/pretrained.pkl", "rb") as f:
-    chemprop_model = pickle.load(f)
-
-ax = sns.heatmap(
-    chemprop_model.predictor.chemprop_fingerprint(
-    df[config.data.input_column].head(5),
-    fingerprint_type="MPN",
-    ), # MPN specified for illustration purposes - this is the default method in Qptuna
-    cbar_kws={'label': 'Fingerprint value'}
-)
-ax.set(ylabel="Compound query", xlabel=f"Latent representation\n(ChemProp descriptor/fingerprint)");
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-../_images/notebooks_QPTUNA_Tutorial_123_1.png -
-
-

The output has the compounds of the input query in the rows and the latent representation features from the MPN in the columns. This output can then be used for any semi-supervised learning approach outside of Qptuna, as required. Alternatively, the last layer of the FFN can be used like so:

-
-
[47]:
-
-
-
ax = sns.heatmap(
-    chemprop_model.predictor.chemprop_fingerprint(
-    df[config.data.input_column].head(5),
-    fingerprint_type="last_FFN"), # Last FFN
-    cbar_kws={'label': 'Fingerprint value'}
-)
-ax.set(ylabel="Compound query", xlabel=f"Latent representation\n(ChemProp descriptor)");
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-../_images/notebooks_QPTUNA_Tutorial_125_1.png -
-
-

The 5 compounds in the user query are again represented by the rows; however, the 300 features are now derived from the last output layer of the FFN.
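As one illustration of downstream use outside Qptuna, a fingerprint matrix of this shape can be compared row by row. The sketch below uses a random 5 x 300 array purely as a placeholder for the fingerprint output, so that it runs without a trained model, and computes pairwise cosine similarity with NumPy; the choice of cosine similarity is an assumption for illustration, not something prescribed by Qptuna or ChemProp.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder for a fingerprint matrix: 5 compounds x 300 learned features.
# Random values are used only so that the sketch is self-contained.
fingerprints = rng.normal(size=(5, 300))

# Scale each row to unit length, then take dot products between all rows.
norms = np.linalg.norm(fingerprints, axis=1, keepdims=True)
unit_rows = fingerprints / norms
similarity = unit_rows @ unit_rows.T

print(similarity.round(2))
```

With real fingerprints in place of the random array, the resulting 5 x 5 matrix could feed clustering or nearest-neighbour lookups.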

-
-
-
-

Probability calibration (classification)

-

When performing classification you often want not only to predict the class label, but also to obtain a probability for the respective label. This probability gives you a measure of confidence in the prediction. Some models give poor estimates of the class probabilities. The CalibratedClassifierCVWithVA Qptuna model allows better calibration of the probabilities of a given model.

-

First, we should understand that well calibrated classifiers are probabilistic classifiers for which the output of the predict_proba method can be directly interpreted as a confidence level. For instance, a well calibrated (binary) classifier should classify the samples such that among the samples to which it gave a predict_proba value close to 0.8, approximately 80% actually belong to the positive class.
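The 80% statement can be checked on simulated data. The sketch below does not involve any Qptuna model: it draws scores for a hypothetical, perfectly calibrated classifier and then draws each label as positive with exactly that probability, so the observed fraction of positives near a score of 0.8 should land close to 0.8.

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples = 200_000

# Simulated predict_proba outputs of a hypothetical, perfectly calibrated model.
predicted_proba = rng.uniform(0.0, 1.0, size=n_samples)

# Each true label is positive with exactly the predicted probability.
labels = rng.uniform(0.0, 1.0, size=n_samples) < predicted_proba

# Samples whose score is close to 0.8.
near_08 = (predicted_proba > 0.75) & (predicted_proba < 0.85)
fraction_positive = labels[near_08].mean()

print(f"Fraction of positives among scores near 0.8: {fraction_positive:.3f}")
```

For a poorly calibrated model the same check would return a fraction noticeably above or below the score itself.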

-

See the Scikit-learn documentation on the topic for more details.

-

The available methods are Sigmoid, Isotonic regression and VennABERS, and a review of those calibration methods for QSAR has been performed here.
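For readers who want to see sigmoid calibration in isolation, the following sketch uses plain scikit-learn on synthetic data rather than Qptuna. Note that `RandomForestClassifier` here is the scikit-learn estimator, not the Qptuna configuration class of the same name, and that the dataset from `make_classification` is an arbitrary stand-in.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic binary classification data as a stand-in for a real dataset.
X, y = make_classification(n_samples=500, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Uncalibrated random forest.
raw_rf = RandomForestClassifier(n_estimators=100, random_state=42)
raw_rf.fit(X_train, y_train)

# The same estimator wrapped with sigmoid calibration and 5-fold CV.
calibrated_rf = CalibratedClassifierCV(
    RandomForestClassifier(n_estimators=100, random_state=42),
    method="sigmoid",
    cv=5,
)
calibrated_rf.fit(X_train, y_train)

# Probability of the positive class from both models.
raw_proba = raw_rf.predict_proba(X_test)[:, 1]
calibrated_proba = calibrated_rf.predict_proba(X_test)[:, 1]

print(raw_proba[:5].round(2), calibrated_proba[:5].round(2))
```

Passing `method="isotonic"` instead selects isotonic regression in scikit-learn; VennABERS is not part of `CalibratedClassifierCV` in scikit-learn.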

-

We can review the effect of e.g. sigmoid calibration on the Random Forest algorithm by doing a calibrated run:

-
-
[48]:
-
-
-
from optunaz.config.optconfig import CalibratedClassifierCVWithVA, RandomForestClassifier
-from sklearn.calibration import calibration_curve
-import seaborn as sns
-
-from collections import defaultdict
-
-import pandas as pd
-
-from sklearn.metrics import (
-    precision_score,
-    recall_score,
-    f1_score,
-    brier_score_loss,
-    log_loss,
-    roc_auc_score,
-)
-
-config = OptimizationConfig(
-    data=Dataset(
-        input_column="canonical",
-        response_column="molwt_gt_330",
-        training_dataset_file="../tests/data/DRD2/subset-100/train.csv"),
-    descriptors=[ECFP.new()],
-    algorithms=[ # the CalibratedClassifierCVWithVA is used here
-        CalibratedClassifierCVWithVA.new(
-            estimator=RandomForestClassifier.new(
-                n_estimators=RandomForestClassifier.Parameters.RandomForestClassifierParametersNEstimators(
-                    low=100, high=100
-                )
-            ),
-            n_folds=5,
-            ensemble="True",
-            method="sigmoid",
-        )
-    ],
-    settings=OptimizationConfig.Settings(
-        mode=ModelMode.CLASSIFICATION,
-        cross_validation=2,
-        n_trials=1,
-        n_startup_trials=0,
-        n_jobs=-1,
-        direction=OptimizationDirection.MAXIMIZATION,
-        random_seed=42,
-    ),
-)
-
-study = optimize(config, study_name="calibrated_rf")
-build_best(buildconfig_best(study), "../target/best.pkl")
-with open("../target/best.pkl", "rb") as f:
-    calibrated_model = pickle.load(f)
-
-
-
-
-
-
-
-
-[I 2024-08-23 10:59:46,701] A new study created in memory with name: calibrated_rf
-[I 2024-08-23 10:59:46,703] A new study created in memory with name: study_name_0
-[I 2024-08-23 10:59:47,775] Trial 0 finished with value: 0.8353535353535354 and parameters: {'algorithm_name': 'CalibratedClassifierCVWithVA', 'CalibratedClassifierCVWithVA_algorithm_hash': 'e788dfbfc5075967acb5ddf9d971ea20', 'n_folds__e788dfbfc5075967acb5ddf9d971ea20': 5, 'max_depth__e788dfbfc5075967acb5ddf9d971ea20': 16, 'n_estimators__e788dfbfc5075967acb5ddf9d971ea20': 100, 'max_features__e788dfbfc5075967acb5ddf9d971ea20': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 0 with value: 0.8353535353535354.
-
-
-

followed by an uncalibrated run:

-
-
[49]:
-
-
-
config = OptimizationConfig(
-    data=Dataset(
-        input_column="canonical",
-        response_column="molwt_gt_330",
-        training_dataset_file="../tests/data/DRD2/subset-100/train.csv"),
-    descriptors=[ECFP.new()],
-    algorithms=[ # an uncalibrated RandomForestClassifier is used here
-        RandomForestClassifier.new(
-                n_estimators=RandomForestClassifier.Parameters.RandomForestClassifierParametersNEstimators(
-                    low=100, high=100
-                )
-        )
-    ],
-    settings=OptimizationConfig.Settings(
-        mode=ModelMode.CLASSIFICATION,
-        cross_validation=2,
-        n_trials=1,
-        n_startup_trials=0,
-        n_jobs=-1,
-        direction=OptimizationDirection.MAXIMIZATION,
-        random_seed=42,
-    ),
-)
-
-study = optimize(config, study_name="uncalibrated_rf")
-build_best(buildconfig_best(study), "../target/best.pkl")
-with open("../target/best.pkl", "rb") as f:
-    uncalibrated_model = pickle.load(f)
-
-
-
-
-
-
-
-
-[I 2024-08-23 10:59:50,306] A new study created in memory with name: uncalibrated_rf
-[I 2024-08-23 10:59:50,352] A new study created in memory with name: study_name_0
-[I 2024-08-23 10:59:50,719] Trial 0 finished with value: 0.8185858585858585 and parameters: {'algorithm_name': 'RandomForestClassifier', 'RandomForestClassifier_algorithm_hash': '167e1e88dd2a80133e317c78f009bdc9', 'max_depth__167e1e88dd2a80133e317c78f009bdc9': 16, 'n_estimators__167e1e88dd2a80133e317c78f009bdc9': 100, 'max_features__167e1e88dd2a80133e317c78f009bdc9': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 0 with value: 0.8185858585858585.
-
-
-

Sigmoid calibration assigns more conservative probability estimates compared to the default RF, as shown by the lower median:

-
-
[50]:
-
-
-
df = pd.read_csv(
-    '../tests/data/DRD2/subset-1000/train.csv'
-    ).sample(500, random_state=123)  # Load and sample test data.
-expected = df[config.data.response_column]
-input_column = df[config.data.input_column]
-calibrated_predicted = calibrated_model.predict_from_smiles(input_column)
-uncalibrated_predicted = uncalibrated_model.predict_from_smiles(input_column)
-
-cal_df=pd.DataFrame(data={"default":uncalibrated_predicted,"sigmoid":calibrated_predicted})
-sns.boxplot(data=cal_df.melt(),x='value',y='variable').set_ylabel('');
-
-
-
-
-
-
-
-../_images/notebooks_QPTUNA_Tutorial_133_0.png -
-
-

Plotting the (sigmoid) calibrated predictions as a function of uncalibrated (default) values further highlights the behaviour of the probability calibration scaling:

-
-
[51]:
-
-
-
# Plot calibrated vs uncalibrated predicted probabilities.
-import matplotlib.pyplot as plt
-ax = plt.scatter(calibrated_predicted, uncalibrated_predicted)
-lims = [expected.min(), expected.max()]
-plt.plot(lims, lims)  # Diagonal line.
-plt.xlabel(f"Calibrated {config.data.response_column}");
-plt.ylabel(f"Uncalibrated {config.data.response_column}");
-
-
-
-
-
-
-
-../_images/notebooks_QPTUNA_Tutorial_135_0.png -
-
-

We can now visualize how well calibrated the predicted probabilities are using calibration curves. A calibration curve, also known as a reliability diagram, takes the outputs of a binary classifier and plots the average predicted probability for each bin (x-axis) against the fraction of positive classes in that bin (y-axis). See here for more info.

-
-
[52]:
-
-
-
from sklearn.calibration import calibration_curve
-from sklearn.metrics import brier_score_loss  # needed for the Brier score shown in the legend
-
-plt.figure(figsize=(10, 10))
-ax1 = plt.subplot2grid((3, 1), (0, 0), rowspan=2)
-ax2 = plt.subplot2grid((3, 1), (2, 0))
-
-ax1.plot([0, 1], [0, 1], "k:", label="Perfectly calibrated")
-for pred, name in [(uncalibrated_predicted, 'default'),
-                  (calibrated_predicted, 'sigmoid')]:
-
-    fraction_of_positives, mean_predicted_value = \
-        calibration_curve(expected, pred, n_bins=10)
-
-    brier=brier_score_loss(expected,pred)
-
-    ax1.plot(mean_predicted_value, fraction_of_positives, "s-",
-             label="%s, brier=%.2f" % (name, brier))
-
-    ax2.hist(pred, range=(0, 1), bins=10, label=name,
-             histtype="step", lw=2)
-
-ax1.set_ylabel("Fraction of positives")
-ax1.set_ylim([-0.05, 1.05])
-ax1.legend(loc="lower right")
-ax1.set_title('Calibration plots  (reliability curve)')
-
-ax2.set_xlabel("Mean predicted value")
-ax2.set_ylabel("Count")
-ax2.legend(loc="upper center", ncol=2)
-
-plt.tight_layout()
-plt.show()
-
-
-
-
-
-
-
-../_images/notebooks_QPTUNA_Tutorial_137_0.png -
-
-

The diagonal line on the calibration plot indicates a perfectly calibrated classifier, for which the proportion of active instances at a given score equals the probability generated by the model. Deviation above this line indicates an under-confident classifier, since the proportion of actives obtaining that score is higher than the score itself; conversely, points below the line indicate an over-confident estimator, for which the proportion of actives obtaining a given score is lower than the score.
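To make the reading of the diagonal concrete, the following is a minimal, standard-library sketch of how the points of a reliability curve can be computed. It is a simplified re-implementation for illustration (uniform-width bins, toy data invented here), not the scikit-learn code used in the cell above; the function name `reliability_curve` is hypothetical.

```python
def reliability_curve(y_true, y_prob, n_bins=10):
    """Return (mean_predicted, fraction_positive) for each non-empty bin."""
    sums = [0.0] * n_bins      # sum of predicted probabilities per bin
    positives = [0] * n_bins   # number of positive labels per bin
    counts = [0] * n_bins      # number of samples per bin
    for label, prob in zip(y_true, y_prob):
        # Uniform-width bins on [0, 1]; a probability of 1.0 goes in the last bin.
        idx = min(int(prob * n_bins), n_bins - 1)
        sums[idx] += prob
        positives[idx] += label
        counts[idx] += 1
    mean_predicted = [s / c for s, c in zip(sums, counts) if c]
    fraction_positive = [p / c for p, c in zip(positives, counts) if c]
    return mean_predicted, fraction_positive


# Toy data (made up): a low-scoring and a high-scoring group, each half active.
toy_labels = [1, 0, 1, 0, 1, 1, 0, 0]
toy_probs = [0.30, 0.32, 0.34, 0.36, 0.90, 0.92, 0.94, 0.96]
mean_pred, frac_pos = reliability_curve(toy_labels, toy_probs, n_bins=5)
for m, f in zip(mean_pred, frac_pos):
    # Above the diagonal (f > m) is under-confident, below it is over-confident.
    side = "under-confident" if f > m else "over-confident" if f < m else "calibrated"
    print(f"mean predicted={m:.2f}, fraction positive={f:.2f} -> {side}")
```

In this toy case the low-scoring bin lies above the diagonal and the high-scoring bin lies below it, which are the two situations described in the paragraph above.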

-

Brier score loss (a metric composed of a calibration term and a refinement term) is one way to capture calibration improvement (this is recorded in the legend above). Notice that calibration does not significantly alter the prediction accuracy measures (precision, recall and F1 score), as shown in the cell below. This is because calibration should not significantly change prediction probabilities at the location of the decision threshold (at x = 0.5 on the graph). Calibration should, however, make the predicted probabilities more accurate and thus more useful for making allocation decisions under uncertainty.

-
-
[53]:
-
-
-
from collections import defaultdict
-
-import pandas as pd
-
-from sklearn.metrics import (
-    precision_score,
-    recall_score,
-    f1_score,
-    brier_score_loss,
-    log_loss,
-    roc_auc_score,
-)
-
-scores = defaultdict(list)
-for i, (name, y_prob) in enumerate([('yes',calibrated_predicted), ('no',uncalibrated_predicted)]):
-
-    y_pred = y_prob > 0.5
-    scores["calibration"].append(name)
-
-    for metric in [brier_score_loss, log_loss]:
-        score_name = metric.__name__.replace("_", " ").replace("score", "").capitalize()
-        scores[score_name].append(metric(expected, y_prob))
-
-    for metric in [precision_score, recall_score, f1_score, roc_auc_score]:
-        score_name = metric.__name__.replace("_", " ").replace("score", "").capitalize()
-        scores[score_name].append(metric(expected, y_pred))
-
-    score_df = pd.DataFrame(scores).set_index("calibration")
-    score_df.round(decimals=3)
-
-score_df
-
-
-
-
-
[53]:
-
-
-
-
calibration  Brier loss  Log loss  Precision    Recall        F1   Roc auc
yes            0.184705  0.547129   0.830565  0.744048  0.784929  0.716536
no             0.175297  0.529474   0.811209  0.818452  0.814815  0.714104
-
-
-
-
-
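The difference between probability-based losses and threshold-based metrics in the table above can be illustrated with a small standard-library sketch. The scores below are invented, the rescaling is a hypothetical stand-in for a calibration step, and the helper names (`toy_brier`, `toy_log_loss`) are not part of Qptuna or scikit-learn.

```python
import math


def toy_brier(y_true, y_prob):
    """Mean squared difference between predicted probability and 0/1 label."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def toy_log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood of the labels under the predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)


toy_y = [1, 0, 1, 0]
raw_scores = [0.9, 0.4, 0.6, 0.1]
# Hypothetical rescaling that pulls every score halfway toward 0.5,
# so no score crosses the 0.5 decision threshold.
rescaled_scores = [0.5 + 0.5 * (p - 0.5) for p in raw_scores]

raw_classes = [p > 0.5 for p in raw_scores]
rescaled_classes = [p > 0.5 for p in rescaled_scores]

print("same class predictions:", raw_classes == rescaled_classes)
print("Brier:", toy_brier(toy_y, raw_scores), "vs", toy_brier(toy_y, rescaled_scores))
print("Log loss:", toy_log_loss(toy_y, raw_scores), "vs", toy_log_loss(toy_y, rescaled_scores))
```

Because the thresholded classes are identical, precision, recall and F1 would be unchanged, while the Brier and log losses respond to the change in the probabilities themselves.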

Uncertainty estimation

-

Qptuna offers three different ways to calculate uncertainty estimates and they are returned along with the normal predictions in the format [[predictions], [uncertainties]]. The currently implemented methods are:

-
  1. VennABERS calibration (a probability calibration covered in the section above).
  2. Ensemble uncertainty (ChemProp models trained with random initialisations).
  3. MAPIE (uncertainty for regression).
-
-

VennABERS uncertainty

-

VennABERS (VA) uncertainty is implemented as in the section “Uses for the Multipoint Probabilities from the VA Predictors” from https://pubs.acs.org/doi/10.1021/acs.jcim.0c00476. This is based on the margin between the upper (p1) and lower (p0) probability boundary, output by the VennABERS algorithm. More details on this can be found in this tutorial.
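As a rough illustration of the margin idea, the sketch below takes a pair of bounds (p0, p1), reports their difference as the uncertainty and merges them into a single probability with the commonly used formula p = p1 / (1 - p0 + p1). Both the merge formula and the function name `va_point_and_margin` are assumptions made for this example; the exact computation inside Qptuna may differ.

```python
def va_point_and_margin(p0, p1):
    """Merge Venn-ABERS bounds into one probability and report their margin.

    Assumes 0 <= p0 <= p1 <= 1 (lower and upper probability bounds).
    """
    probability = p1 / (1.0 - p0 + p1)  # commonly used merge of the two bounds
    margin = p1 - p0                    # wider margin -> less certain prediction
    return probability, margin


# Two made-up compounds with the same merged probability but different margins.
confident = va_point_and_margin(0.48, 0.52)
uncertain = va_point_and_margin(0.20, 0.80)
print(confident)
print(uncertain)
```

The two toy compounds receive the same merged probability, but the second has a much wider margin and would be flagged as the more uncertain prediction.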

-
-
[54]:
-
-
-
from optunaz.config.optconfig import CalibratedClassifierCVWithVA, RandomForestClassifier
-from sklearn.calibration import calibration_curve
-import seaborn as sns
-
-from collections import defaultdict
-
-import pandas as pd
-
-from sklearn.metrics import (
-    precision_score,
-    recall_score,
-    f1_score,
-    brier_score_loss,
-    log_loss,
-    roc_auc_score,
-)
-
-config = OptimizationConfig(
-    data=Dataset(
-        input_column="canonical",
-        response_column="molwt_gt_330",
-        training_dataset_file="../tests/data/DRD2/subset-100/train.csv"),
-    descriptors=[ECFP.new()],
-    algorithms=[ # the CalibratedClassifierCVWithVA is used here
-        CalibratedClassifierCVWithVA.new(
-            estimator=RandomForestClassifier.new(
-                n_estimators=RandomForestClassifier.Parameters.RandomForestClassifierParametersNEstimators(
-                    low=100, high=100
-                )
-            ),
-            n_folds=5,
-            ensemble="True",
-            method="vennabers",
-        )
-    ],
-    settings=OptimizationConfig.Settings(
-        mode=ModelMode.CLASSIFICATION,
-        cross_validation=2,
-        n_trials=1,
-        n_startup_trials=0,
-        n_jobs=-1,
-        direction=OptimizationDirection.MAXIMIZATION,
-        random_seed=42,
-    ),
-)
-
-study = optimize(config, study_name="calibrated_rf")
-build_best(buildconfig_best(study), "../target/best.pkl")
-with open("../target/best.pkl", "rb") as f:
-    calibrated_model = pickle.load(f)
-
-
-
-
-
-
-
-
-[I 2024-08-23 10:59:52,384] A new study created in memory with name: calibrated_rf
-[I 2024-08-23 10:59:52,430] A new study created in memory with name: study_name_0
-[I 2024-08-23 10:59:53,469] Trial 0 finished with value: 0.8213131313131313 and parameters: {'algorithm_name': 'CalibratedClassifierCVWithVA', 'CalibratedClassifierCVWithVA_algorithm_hash': '79765fbec1586f3c917ff30de274fdb4', 'n_folds__79765fbec1586f3c917ff30de274fdb4': 5, 'max_depth__79765fbec1586f3c917ff30de274fdb4': 16, 'n_estimators__79765fbec1586f3c917ff30de274fdb4': 100, 'max_features__79765fbec1586f3c917ff30de274fdb4': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 0 with value: 0.8213131313131313.
-
-
-

VennABERS uncertainty can now be obtained by running inference and supplying uncert=True.

-
-
[55]:
-
-
-
from rdkit.Chem import AllChem
-from rdkit.Chem import PandasTools
-from rdkit import RDConfig
-from rdkit import DataStructs
-
-# get training data, mols & fingerprints
-train_df = pd.read_csv('../tests/data/DRD2/subset-100/train.csv')  # Load training data.
-PandasTools.AddMoleculeColumnToFrame(train_df,'canonical','molecule',includeFingerprints=True)
-train_df["fp"]=train_df["molecule"].apply(lambda x: AllChem.GetMorganFingerprint(x,2 ))
-
-# get test data, mols & fingerprints and calculate the nn to training set
-df = pd.read_csv('../tests/data/DRD2/subset-1000/train.csv')  # Load test data.
-PandasTools.AddMoleculeColumnToFrame(df,'canonical','molecule',includeFingerprints=True)
-df["fp"]=df["molecule"].apply(lambda x: AllChem.GetMorganFingerprint(x,2 ))
-df['nn']=df["fp"].apply(lambda x: max(DataStructs.BulkTanimotoSimilarity(x,[i for i in train_df["fp"]])))
-
-# add uncertainty & prediction to the df
-df['va_pred'], df['va_uncert'] = calibrated_model.predict_from_smiles(df[config.data.input_column], uncert=True)
-
-
-
-

It is possible to relate the uncertainty to the nearest neighbor (nn) similarity, to look for a distance-to-model (DTM) effect, and to the probabilistic output from the RF model scaled by VA:

-
-
[56]:
-
-
-
# Plot uncertainty as a function of nn similarity or predicted probability in a trellis for overview.
-fig, ax =plt.subplots(1,2, figsize=(10, 5), sharey=True)
-sns.regplot(data=df,y='va_uncert',x='nn', ax=ax[0])
-sns.regplot(data=df,y='va_uncert',x='va_pred', ax=ax[1]).set_ylabel("")
-fig.tight_layout()
-
-
-
-
-
-
-
-../_images/notebooks_QPTUNA_Tutorial_148_0.png -
-
-

Similar to the findings in the referenced scaling evaluation paper above, the lower and upper probability boundary intervals are shown to produce large discordance for test set molecules that are neither very similar nor very dissimilar to the active training set, which were hence difficult to predict.
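The nn value used in this analysis is the highest Tanimoto similarity between a test molecule and any training molecule. A minimal sketch of that calculation is given below, assuming binary fingerprints represented as Python sets of on-bits (the notebook itself uses RDKit Morgan fingerprints and `BulkTanimotoSimilarity`); the function names and bit sets are invented for illustration.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def nearest_neighbour_similarity(query_fp, training_fps):
    """Similarity of the query to its most similar training fingerprint."""
    return max(tanimoto(query_fp, fp) for fp in training_fps)


# Made-up fingerprints: the query shares three of five distinct bits with the first entry.
toy_training_fps = [{1, 2, 3, 4}, {10, 11, 12}]
toy_query_fp = {1, 2, 3, 5}
toy_nn = nearest_neighbour_similarity(toy_query_fp, toy_training_fps)
print(toy_nn)
```

A value close to 1 means the molecule is well represented in the training set, while a value close to 0 means it lies far from the model's training domain.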

-
-
-

Ensemble uncertainty (ChemProp Only)

-

Training a ChemProp model with ensemble_size >1 will enable uncertainty estimation based on the implementation in the original ChemProp package, using the deviation of predictions from the ensemble of models trained with different random initialisation of the weights. This can be done like so:

-
-
[57]:
-
-
-
# Start with the imports.
-import sklearn
-from optunaz.three_step_opt_build_merge import (
-    optimize,
-    buildconfig_best,
-    build_best,
-    build_merged,
-)
-from optunaz.config import ModelMode, OptimizationDirection
-from optunaz.config.optconfig import (
-    OptimizationConfig,
-    ChemPropHyperoptRegressor,
-    ChemPropHyperoptClassifier,
-    ChemPropClassifier,  # used in the config below
-)
-from optunaz.datareader import Dataset
-from optunaz.descriptors import SmilesFromFile
-config = OptimizationConfig(
-    data=Dataset(
-        input_column="canonical",
-        response_column="molwt_gt_330",
-        training_dataset_file="../tests/data/DRD2/subset-50/train.csv",  # This will be split into train and test.
-    ),
-    descriptors=[
-        SmilesFromFile.new(),
-    ],
-    algorithms=[
-        ChemPropClassifier.new(epochs=4, ensemble_size=5), # epochs=4 to ensure run finishes quickly
-    ],
-    settings=OptimizationConfig.Settings(
-        mode=ModelMode.CLASSIFICATION,
-        cross_validation=2,
-        n_trials=1,
-        direction=OptimizationDirection.MAXIMIZATION,
-    ),
-)
-
-study = optimize(config, study_name="my_study")
-
-build_best(buildconfig_best(study), "../target/best.pkl")
-with open("../target/best.pkl", "rb") as f:
-    chemprop_model = pickle.load(f)
-
-# add chemprop uncertainty & prediction to the df
-df["cp_pred_ensemble"], df["cp_uncert_ensemble"] = chemprop_model.predict_from_smiles(df[config.data.input_column], uncert=True)
-
-
-
-
-
-
-
-
-[I 2024-08-23 10:59:57,886] A new study created in memory with name: my_study
-[I 2024-08-23 10:59:57,933] A new study created in memory with name: study_name_0
-INFO:root:Enqueued ChemProp manual trial with sensible defaults: {'activation__fd833c2dde0b7147e6516ea5eebb2657': 'ReLU', 'aggregation__fd833c2dde0b7147e6516ea5eebb2657': 'mean', 'aggregation_norm__fd833c2dde0b7147e6516ea5eebb2657': 100, 'batch_size__fd833c2dde0b7147e6516ea5eebb2657': 50, 'depth__fd833c2dde0b7147e6516ea5eebb2657': 3, 'dropout__fd833c2dde0b7147e6516ea5eebb2657': 0.0, 'features_generator__fd833c2dde0b7147e6516ea5eebb2657': 'none', 'ffn_hidden_size__fd833c2dde0b7147e6516ea5eebb2657': 300, 'ffn_num_layers__fd833c2dde0b7147e6516ea5eebb2657': 2, 'final_lr_ratio_exp__fd833c2dde0b7147e6516ea5eebb2657': -4, 'hidden_size__fd833c2dde0b7147e6516ea5eebb2657': 300, 'init_lr_ratio_exp__fd833c2dde0b7147e6516ea5eebb2657': -4, 'max_lr_exp__fd833c2dde0b7147e6516ea5eebb2657': -3, 'warmup_epochs_ratio__fd833c2dde0b7147e6516ea5eebb2657': 0.1, 'algorithm_name': 'ChemPropClassifier', 'ChemPropClassifier_algorithm_hash': 'fd833c2dde0b7147e6516ea5eebb2657'}
-[I 2024-08-23 11:07:48,137] Trial 0 finished with value: 0.65625 and parameters: {'algorithm_name': 'ChemPropClassifier', 'ChemPropClassifier_algorithm_hash': 'fd833c2dde0b7147e6516ea5eebb2657', 'activation__fd833c2dde0b7147e6516ea5eebb2657': <ChemPropActivation.RELU: 'ReLU'>, 'aggregation__fd833c2dde0b7147e6516ea5eebb2657': <ChemPropAggregation.MEAN: 'mean'>, 'aggregation_norm__fd833c2dde0b7147e6516ea5eebb2657': 100.0, 'batch_size__fd833c2dde0b7147e6516ea5eebb2657': 50.0, 'depth__fd833c2dde0b7147e6516ea5eebb2657': 3.0, 'dropout__fd833c2dde0b7147e6516ea5eebb2657': 0.0, 'ensemble_size__fd833c2dde0b7147e6516ea5eebb2657': 5, 'epochs__fd833c2dde0b7147e6516ea5eebb2657': 4, 'features_generator__fd833c2dde0b7147e6516ea5eebb2657': <ChemPropFeatures_Generator.NONE: 'none'>, 'ffn_hidden_size__fd833c2dde0b7147e6516ea5eebb2657': 300.0, 'ffn_num_layers__fd833c2dde0b7147e6516ea5eebb2657': 2.0, 'final_lr_ratio_exp__fd833c2dde0b7147e6516ea5eebb2657': -4, 'hidden_size__fd833c2dde0b7147e6516ea5eebb2657': 300.0, 'init_lr_ratio_exp__fd833c2dde0b7147e6516ea5eebb2657': -4, 'max_lr_exp__fd833c2dde0b7147e6516ea5eebb2657': -3, 'warmup_epochs_ratio__fd833c2dde0b7147e6516ea5eebb2657': 0.1, 'descriptor': '{"name": "SmilesFromFile", "parameters": {}}'}. Best is trial 0 with value: 0.65625.
-
-
-
-
-
[58]:
-
-
-
# Plot uncertainty as a function of nn similarity or predicted probability in a trellis for overview.
-fig, ax =plt.subplots(1,2, figsize=(10, 5), sharey=True)
-sns.regplot(data=df,y='cp_uncert_ensemble',x='nn', ax=ax[0])
-sns.regplot(data=df,y='cp_uncert_ensemble',x='cp_pred_ensemble', ax=ax[1]).set(ylabel='')
-fig.tight_layout()
-
-
-
-
-
-
-
-../_images/notebooks_QPTUNA_Tutorial_153_0.png -
-
-

Similar to the VA uncertainty, the largest ensemble uncertainty is observed for test set molecules that are neither very similar nor very dissimilar to the active training set, which are hence difficult to predict. Larger uncertainty is also seen toward the midpoint of the ChemProp predictions, for cases when the probabilistic output from models is also neither very high nor very low.
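The principle behind ensemble uncertainty can be sketched without ChemProp: each ensemble member produces its own prediction for a molecule, the mean is reported as the prediction, and the spread across members is reported as the uncertainty. The example below uses the population variance as the spread, which is an assumption made here (the text above only states that the deviation of the predictions is used), and all numbers are invented.

```python
from statistics import mean, pvariance


def ensemble_summary(member_probs):
    """Return (mean, variance) of one molecule's predictions across ensemble members."""
    return mean(member_probs), pvariance(member_probs)


# Made-up outputs of a five-member ensemble for two molecules.
members_agree = [0.90, 0.92, 0.88, 0.91, 0.89]
members_disagree = [0.20, 0.80, 0.50, 0.35, 0.65]

pred_a, unc_a = ensemble_summary(members_agree)
pred_b, unc_b = ensemble_summary(members_disagree)
print(f"agreeing members:    prediction={pred_a:.2f}, uncertainty={unc_a:.4f}")
print(f"disagreeing members: prediction={pred_b:.2f}, uncertainty={unc_b:.4f}")
```

The second molecule has a mean prediction near the midpoint and a much larger spread, which matches the pattern described above for hard-to-predict compounds.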

-
-
-

ChemProp dropout uncertainty

-

ChemProp uncertainty based on dropout is available for a single model and not an ensemble (i.e. when ChemProp is provided with ensemble_size=1). It is based on the implementation in the original ChemProp package.

-

The method uses Monte Carlo dropout to generate a virtual ensemble of models and reports the ensemble variance of the predictions.

-

Note that this dropout is distinct from dropout regularization used during training, which is not active during predictions.
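The idea of a virtual ensemble can be sketched with a toy model. The example below applies random dropout masks to a single linear unit at prediction time, repeats the forward pass many times and reports the mean and variance of the outputs. It is an illustration only: the linear model, the weights, the number of passes and the function name `mc_dropout_predict` are all assumptions and do not reflect ChemProp's network or defaults.

```python
import random
from statistics import mean, pvariance


def mc_dropout_predict(weights, features, dropout_p, n_passes=200, seed=0):
    """Mean and variance of a toy linear model under random dropout masks.

    dropout_p must be smaller than 1.
    """
    rng = random.Random(seed)       # fixed seed so the sketch is reproducible
    keep = 1.0 - dropout_p
    outputs = []
    for _ in range(n_passes):
        total = 0.0
        for w, x in zip(weights, features):
            if rng.random() < keep:       # this input survives the current pass
                total += w * x / keep     # inverted-dropout rescaling
        outputs.append(total)
    return mean(outputs), pvariance(outputs)


toy_weights = [0.5, -0.2, 0.8]
toy_features = [1.0, 2.0, 0.5]

# With dropout switched off every pass is identical, so the variance is zero.
_, var_without_dropout = mc_dropout_predict(toy_weights, toy_features, dropout_p=0.0)
# With dropout active at prediction time the passes differ and the variance is positive.
_, var_with_dropout = mc_dropout_predict(toy_weights, toy_features, dropout_p=0.2)
print(var_without_dropout, var_with_dropout)
```

The contrast between the two calls mirrors the note above: dropout used only during training leaves predictions deterministic, whereas dropout kept active during prediction yields a spread that can be read as uncertainty.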

-
-
[59]:
-
-
-
config = OptimizationConfig(
-    data=Dataset(
-        input_column="canonical",
-        response_column="molwt_gt_330",
-        training_dataset_file="../tests/data/DRD2/subset-50/train.csv",  # This will be split into train and test.
-    ),
-    descriptors=[
-        SmilesFromFile.new(),
-    ],
-    algorithms=[
-        ChemPropClassifier.new(epochs=5), #ensemble_size not supplied (defaults back to 1)
-                                                  #to ensure uncertainty will be based on dropout
-    ],
-    settings=OptimizationConfig.Settings(
-        mode=ModelMode.CLASSIFICATION,
-        cross_validation=2,
-        n_trials=1,
-        direction=OptimizationDirection.MAXIMIZATION,
-    ),
-)
-
-study = optimize(config, study_name="my_study")
-build_best(buildconfig_best(study), "../target/best.pkl")
-with open("../target/best.pkl", "rb") as f:
-    chemprop_model = pickle.load(f)
-
-
-
-
-
-
-
-
-[I 2024-08-23 11:38:44,883] A new study created in memory with name: my_study
-[I 2024-08-23 11:38:44,931] A new study created in memory with name: study_name_0
-INFO:root:Enqueued ChemProp manual trial with sensible defaults: {'activation__c73885c5d5a4182168b8b002d321965a': 'ReLU', 'aggregation__c73885c5d5a4182168b8b002d321965a': 'mean', 'aggregation_norm__c73885c5d5a4182168b8b002d321965a': 100, 'batch_size__c73885c5d5a4182168b8b002d321965a': 50, 'depth__c73885c5d5a4182168b8b002d321965a': 3, 'dropout__c73885c5d5a4182168b8b002d321965a': 0.0, 'features_generator__c73885c5d5a4182168b8b002d321965a': 'none', 'ffn_hidden_size__c73885c5d5a4182168b8b002d321965a': 300, 'ffn_num_layers__c73885c5d5a4182168b8b002d321965a': 2, 'final_lr_ratio_exp__c73885c5d5a4182168b8b002d321965a': -4, 'hidden_size__c73885c5d5a4182168b8b002d321965a': 300, 'init_lr_ratio_exp__c73885c5d5a4182168b8b002d321965a': -4, 'max_lr_exp__c73885c5d5a4182168b8b002d321965a': -3, 'warmup_epochs_ratio__c73885c5d5a4182168b8b002d321965a': 0.1, 'algorithm_name': 'ChemPropClassifier', 'ChemPropClassifier_algorithm_hash': 'c73885c5d5a4182168b8b002d321965a'}
-[I 2024-08-23 11:40:17,371] Trial 0 finished with value: 0.46875 and parameters: {'algorithm_name': 'ChemPropClassifier', 'ChemPropClassifier_algorithm_hash': 'c73885c5d5a4182168b8b002d321965a', 'activation__c73885c5d5a4182168b8b002d321965a': <ChemPropActivation.RELU: 'ReLU'>, 'aggregation__c73885c5d5a4182168b8b002d321965a': <ChemPropAggregation.MEAN: 'mean'>, 'aggregation_norm__c73885c5d5a4182168b8b002d321965a': 100.0, 'batch_size__c73885c5d5a4182168b8b002d321965a': 50.0, 'depth__c73885c5d5a4182168b8b002d321965a': 3.0, 'dropout__c73885c5d5a4182168b8b002d321965a': 0.0, 'ensemble_size__c73885c5d5a4182168b8b002d321965a': 1, 'epochs__c73885c5d5a4182168b8b002d321965a': 5, 'features_generator__c73885c5d5a4182168b8b002d321965a': <ChemPropFeatures_Generator.NONE: 'none'>, 'ffn_hidden_size__c73885c5d5a4182168b8b002d321965a': 300.0, 'ffn_num_layers__c73885c5d5a4182168b8b002d321965a': 2.0, 'final_lr_ratio_exp__c73885c5d5a4182168b8b002d321965a': -4, 'hidden_size__c73885c5d5a4182168b8b002d321965a': 300.0, 'init_lr_ratio_exp__c73885c5d5a4182168b8b002d321965a': -4, 'max_lr_exp__c73885c5d5a4182168b8b002d321965a': -3, 'warmup_epochs_ratio__c73885c5d5a4182168b8b002d321965a': 0.1, 'descriptor': '{"name": "SmilesFromFile", "parameters": {}}'}. Best is trial 0 with value: 0.46875.
-
-
-
-
-
[60]:
-
-
-
# add chemprop uncertainty & prediction to the df
-df["cp_pred_dropout"], df["cp_uncert_dropout"] = chemprop_model.predict_from_smiles(df[config.data.input_column], uncert=True)

Similar to previous findings using ensembling, the dropout approach toward uncertainty shows the largest uncertainty for marginal cases that are neither similar nor dissimilar to the training set, and with probabilities toward the midpoint (0.5):

-
-
[61]:
-
-
-
# Plot uncertainty as a function of nn similarity or predicted probability in a trellis for overview.
-fig, ax =plt.subplots(1,2, figsize=(10, 5), sharey=True)
-sns.regplot(data=df,y='cp_uncert_dropout',x='nn', ax=ax[0])
-sns.regplot(data=df,y='cp_uncert_dropout',x='cp_pred_dropout', ax=ax[1]).set(ylabel='')
-fig.tight_layout()
-
-
-
-
-
-
-
-../_images/notebooks_QPTUNA_Tutorial_160_0.png -
-
-

Comparison of dropout vs. ensemble uncertainties can be performed as follows:

-
-
[62]:
-
-
-
# Compare dropout and ensemble uncertainties: R2 score, scatter plot and per-class difference.
-r2 = r2_score(y_true=df['cp_uncert_dropout'], y_pred=df['cp_uncert_ensemble'])
-print(f"R2 correlation between dropout and ensemble uncertainties:{r2:.2f}")
-
-fig, ax =plt.subplots(1,2, figsize=(10, 5))
-df['cp_uncert_delta']=df['cp_uncert_dropout']-df['cp_uncert_ensemble']
-sns.regplot(data=df,y='cp_uncert_dropout',x='cp_uncert_ensemble', ax=ax[0])
-sns.boxplot(data=df,y='cp_uncert_delta',x='activity', ax=ax[1])
-fig.tight_layout()
-
-
-
-
-
-
-
-
-INFO:matplotlib.category:Using categorical units to plot a list of strings that are all parsable as floats or dates. If these strings should be plotted as numbers, cast to the appropriate data type before plotting.
-INFO:matplotlib.category:Using categorical units to plot a list of strings that are all parsable as floats or dates. If these strings should be plotted as numbers, cast to the appropriate data type before plotting.
-
-
-
-
-
-
-
-R2 correlation between dropout and ensemble uncertainties:-100.98
-
-
-
-
-
-
-../_images/notebooks_QPTUNA_Tutorial_162_2.png -
-
-

Findings show a limited correlation between dropout and ensemble uncertainty for the toy example (real-world examples with more epochs or more predictive models will differ).
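When reading the R2 value reported above, it may help to remember that the coefficient of determination is not a correlation coefficient: it becomes strongly negative whenever two quantities sit on different scales, even if they rank the molecules identically. The standard-library sketch below shows this with invented numbers; the helper names are hypothetical, and this is only one possible reading of the result, not an analysis of the notebook's data.

```python
from statistics import mean


def toy_r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    centre = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - centre) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def toy_pearson(x, y):
    """Pearson correlation coefficient."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


# Made-up uncertainties: identical ranking, but one set is ten times larger.
toy_dropout_unc = [0.010, 0.012, 0.014, 0.016]
toy_ensemble_unc = [0.10, 0.12, 0.14, 0.16]

r2_value = toy_r2(toy_dropout_unc, toy_ensemble_unc)
pearson_value = toy_pearson(toy_dropout_unc, toy_ensemble_unc)
print(f"R2={r2_value:.1f}, Pearson r={pearson_value:.3f}")
```

A rank-based or Pearson correlation is therefore a useful complement when the two uncertainty estimates are not expected to share a scale.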

-
-
-

MAPIE (regression uncertainty)

-

For regression algorithms, uncertainty estimation is available within Qptuna through the MAPIE package, which is selected like so:

-
-
[63]:
-
-
-
from optunaz.config.optconfig import Mapie
-
-config = OptimizationConfig(
-    data=Dataset(
-        input_column="canonical",
-        response_column="molwt",
-        training_dataset_file="../tests/data/DRD2/subset-300/train.csv",  # This will be split into train and test.
-    ),
-    descriptors=[
-        ECFP.new(),
-    ],
-    algorithms=[Mapie.new( # mapie 'wraps' around a regressor of choice
-                estimator=RandomForestRegressor.new(n_estimators={"low": 50, "high": 50})
-    )
-    ],
-    settings=OptimizationConfig.Settings(
-        mode=ModelMode.REGRESSION,
-        cross_validation=2,
-        n_trials=1,
-        direction=OptimizationDirection.MAXIMIZATION,
-    ),
-)
-
-study = optimize(config, study_name="my_study")
-build_best(buildconfig_best(study), "../target/best.pkl")
-with open("../target/best.pkl", "rb") as f:
-    mapie = pickle.load(f)
-
-
-
-
-
-
-
-
-[I 2024-08-23 12:01:18,346] A new study created in memory with name: my_study
-[I 2024-08-23 12:01:18,391] A new study created in memory with name: study_name_0
-[I 2024-08-23 12:01:20,582] Trial 0 finished with value: -4259.713886871285 and parameters: {'algorithm_name': 'Mapie', 'Mapie_algorithm_hash': '976d211e4ac64e5568d369bcddd3aeb1', 'mapie_alpha__976d211e4ac64e5568d369bcddd3aeb1': 0.05, 'max_depth__976d211e4ac64e5568d369bcddd3aeb1': 22, 'n_estimators__976d211e4ac64e5568d369bcddd3aeb1': 50, 'max_features__976d211e4ac64e5568d369bcddd3aeb1': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 0 with value: -4259.713886871285.
-
-
-

Analysis of the nearest neighbors (nn) and the behaviour of uncertainty vs. predicted values can be performed like so:

-
-
[64]:
-
-
-
# get training data, mols & fingerprints
-train_df = pd.read_csv('../tests/data/DRD2/subset-300/train.csv')  # Load training data.
-PandasTools.AddMoleculeColumnToFrame(train_df,'canonical','molecule',includeFingerprints=True)
-train_df["fp"]=train_df["molecule"].apply(lambda x: AllChem.GetMorganFingerprint(x,2 ))
-
-# get test data, mols & fingerprints and calculate the nn to training set
-df = pd.read_csv('../tests/data/DRD2/subset-50/train.csv')  # Load test data.
-PandasTools.AddMoleculeColumnToFrame(df,'canonical','molecule',includeFingerprints=True)
-df["fp"]=df["molecule"].apply(lambda x: AllChem.GetMorganFingerprint(x,2 ))
-df['nn']=df["fp"].apply(lambda x: max(DataStructs.BulkTanimotoSimilarity(x,[i for i in train_df["fp"]])))
-
-mapie.predictor.mapie_alpha=0.99 # it is possible to alter the alpha of mapie post-train using this approach
-
-# add uncertainty & prediction to the df
-df['mapie_pred'], df['mapie_unc'] = mapie.predict_from_smiles(df[config.data.input_column], uncert=True)
-
-
-
-

Plotting MAPIE uncertainty as a function of the nearest neighbors/MAPIE predictions is performed here:

-
-
[65]:
-
-
-
# Plot uncertainty as a function of nn similarity or predicted value, and true vs. predicted value, in a trellis for overview.
-fig, ax =plt.subplots(1,3, figsize=(10, 5))
-sns.regplot(data=df,y='mapie_unc',x='nn', ax=ax[0])
-sns.regplot(data=df,y='mapie_unc',x='mapie_pred', ax=ax[1])
-sns.regplot(data=df,y=df[config.data.response_column],x='mapie_pred', ax=ax[2])
-fig.tight_layout()
-
-
-
-
-
-
-
-../_images/notebooks_QPTUNA_Tutorial_170_0.png -
-
-

Further analysis of the uncertainty using error bars is shown here:

-
-
[66]:
-
-
-
# Plot true value as a function of predicted value, with MAPIE uncertainty error bars for visualisation.
-plt.figure(figsize=(12,5))
-plt.errorbar(df['mapie_pred'], df[config.data.response_column], yerr=df['mapie_unc'].abs(), fmt='o',color='black', alpha=.8, ecolor='gray', elinewidth=1, capsize=10);
-plt.xlabel('Predicted Mw');
-plt.ylabel('Expected Mw');
-
-
-
-
-
-
-
-../_images/notebooks_QPTUNA_Tutorial_172_0.png -
-
-

where more certain predictions have smaller error bars.

-

The same analysis can be performed by plotting similarity to nn’s (increasing similarity to the training set moving from left to right on the x-axis):

-
-
[67]:
-
-
-
# Plot predicted value as a function of nearest neighbor similarity, with MAPIE uncertainty error bars for visualisation.
-plt.figure(figsize=(12,5))
-plt.errorbar(df['nn'], df['mapie_pred'], yerr=df['mapie_unc'].abs(), fmt='o',color='black', alpha=.8, ecolor='gray', elinewidth=1, capsize=10);
-plt.xlabel('Nearest neighbor (NN) similarity');
-plt.ylabel('Predicted Mw');
-
-
-
-
-
-
-
-../_images/notebooks_QPTUNA_Tutorial_174_0.png -
-
-

The MAPIE package uses the alpha parameter to set the uncertainty of the confidence interval, see here for details. It is possible to alter the uncertainty of the confidence interval by setting the mapie_alpha parameter of the Qptuna model predictor. Here, lower alpha values produce larger (more conservative) prediction intervals. N.B.: alpha is set to 0.05 by default and will hence provide more conservative predictions if not changed.

-

The uncertainty (over all point predictions) as a function of the alpha setting can be analysed for our toy example using the following (error bars denote the deviation across point predictions, extended to two standard error widths):

-
-
[68]:
-
-
-
alpha_impact=[]
-for ma in range(1,100,5):
-    mapie.predictor.mapie_alpha=ma/100
-    preds = mapie.predict_from_smiles(df[config.data.input_column], uncert=True)
-    unc_df = pd.DataFrame(
-    data={
-        "pred": preds[0],
-        "unc": preds[1],
-        "alpha": ma,
-        }
-    )
-    alpha_impact.append(unc_df.reset_index())
-alpha_impact=pd.concat(alpha_impact).reset_index(drop=True)
-
-sns.lineplot(data=alpha_impact[alpha_impact['index']<=20],x='alpha',y='unc',err_style="bars", errorbar=("se", 2))
-plt.xlabel('MAPIE Alpha');
-plt.ylabel('MAPIE uncertainty (±MW)');
-
-
-
-
-
-
-
-../_images/notebooks_QPTUNA_Tutorial_176_0.png -
-
-

As expected, larger alpha values produce smaller (less conservative) prediction intervals.
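The inverse relationship between alpha and interval width can be reproduced with a simplified split-conformal sketch: the interval half-width is the (1 - alpha) quantile of the absolute residuals on held-out calibration data. This is an illustration under stated assumptions, not MAPIE's implementation (MAPIE offers several resampling strategies, and the one used by the Qptuna wrapper is not described here); the residuals and the function name `conformal_half_width` are invented.

```python
import math


def conformal_half_width(calibration_residuals, alpha):
    """Half-width of a split-conformal interval with target coverage 1 - alpha."""
    scores = sorted(abs(r) for r in calibration_residuals)
    n = len(scores)
    # Finite-sample rank of the (1 - alpha) quantile, capped at the largest score.
    rank = min(math.ceil((n + 1) * (1 - alpha)), n)
    return scores[max(rank, 1) - 1]


# Made-up residuals (true minus predicted molecular weight) on calibration data.
toy_residuals = [-12.0, 3.0, 7.5, -1.0, 20.0, -4.5, 9.0, 0.5, -15.0, 6.0]
toy_widths = {a: conformal_half_width(toy_residuals, a) for a in (0.05, 0.5, 0.95)}
for a, width in toy_widths.items():
    print(f"alpha={a}: prediction +/- {width}")
```

A small alpha asks for high coverage and therefore needs a wide interval, whereas a large alpha tolerates many misses and yields a narrow one.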

-
-
-
-

Explainability

-

Model explainability is incorporated into Qptuna using two different approaches, depending on the algorithm chosen:

  1. SHAP: Any shallow algorithm is compatible with the SHAP package (even traditionally unsupported packages use the KernelExplainer).
  2. ChemProp interpret: This explainability approach is based on the interpret function in the original ChemProp package.

-
-

SHAP

-

SHAP (SHapley Additive exPlanations) is available in Qptuna based on the implementation available at https://github.com/slundberg/shap. The method uses a game theoretic approach to explain the output of any machine learning model. It connects optimal credit allocation with local explanations using the classic Shapley values from game theory and their related extensions (see here for more details on the published tool and here for papers using the approach).
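For intuition about what a Shapley value is, the brute-force sketch below averages each feature's marginal contribution over every ordering in which the features can be revealed, starting from a baseline input. This is a textbook calculation for a tiny toy model, not the algorithm used by the SHAP package (which relies on far more efficient estimators); the model, weights, inputs and function names are all invented for illustration.

```python
from itertools import permutations


def shapley_values(predict, instance, baseline):
    """Exact Shapley values of predict(instance) relative to predict(baseline)."""
    n = len(instance)
    contributions = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = predict(current)
        for i in order:
            current[i] = instance[i]          # reveal feature i
            value = predict(current)
            contributions[i] += value - previous
            previous = value
    return [c / len(orderings) for c in contributions]


toy_coefficients = [2.0, -1.0, 0.5]


def toy_linear_model(features):
    """Toy stand-in for a fitted linear regressor."""
    return sum(w * f for w, f in zip(toy_coefficients, features))


toy_instance = [3.0, 1.0, 4.0]
toy_baseline = [1.0, 1.0, 2.0]
phi = shapley_values(toy_linear_model, toy_instance, toy_baseline)
print(phi)
```

For a linear model each value reduces to the coefficient multiplied by the feature's deviation from the baseline, and the values sum to the difference between the prediction for the instance and the prediction for the baseline.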

-

In the following example, a Ridge regressor is trained using a composite descriptor based on the ECFP, MACCS keys, Jazzy and PhysChem descriptors:

-
-
[69]:
-
-
-
from optunaz.descriptors import CompositeDescriptor, UnscaledPhyschemDescriptors, UnscaledJazzyDescriptors
-
-config = OptimizationConfig(
-    data=Dataset(
-        input_column="canonical",
-        response_column="molwt",
-        training_dataset_file="../tests/data/DRD2/subset-50/train.csv",  # This will be split into train and test.
-    ),
-    descriptors=[
-        CompositeDescriptor.new(
-            descriptors=[
-                ECFP.new(),
-                MACCS_keys.new(),
-                UnscaledJazzyDescriptors.new(),
-                UnscaledPhyschemDescriptors.new(),
-            ]
-        )
-    ],
-    algorithms=[
-        Ridge.new(),
-    ],
-    settings=OptimizationConfig.Settings(
-        mode=ModelMode.REGRESSION,
-        cross_validation=2,
-        n_trials=1,
-        direction=OptimizationDirection.MAXIMIZATION,
-    ),
-)
-
-study = optimize(config, study_name="my_study")
-build_best(buildconfig_best(study), "../target/best.pkl")
-with open("../target/best.pkl", "rb") as f:
-    ridge = pickle.load(f)
-
-
-
-
-
-
-
-
-[I 2024-08-23 12:01:26,101] A new study created in memory with name: my_study
-[I 2024-08-23 12:01:26,146] A new study created in memory with name: study_name_0
-/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qptuna-_QsKTRFT-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_ridge.py:243: UserWarning: Singular matrix in solving dual problem. Using least-squares solution instead.
-  warnings.warn(
-/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qptuna-_QsKTRFT-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_ridge.py:243: UserWarning: Singular matrix in solving dual problem. Using least-squares solution instead.
-  warnings.warn(
-[I 2024-08-23 12:01:27,578] Trial 0 finished with value: -0.36553318492385256 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.9346053663473015, 'descriptor': '{"parameters": {"descriptors": [{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}, {"name": "MACCS_keys", "parameters": {}}, {"name": "UnscaledJazzyDescriptors", "parameters": {"jazzy_names": ["dga", "dgp", "dgtot", "sa", "sdc", "sdx"], "jazzy_filters": {"NumHAcceptors": 25, "NumHDonors": 25, "MolWt": 1000}}}, {"name": "UnscaledPhyschemDescriptors", "parameters": {"rdkit_names": ["MaxAbsEStateIndex", "MaxEStateIndex", "MinAbsEStateIndex", "MinEStateIndex", "qed", "SPS", "MolWt", "HeavyAtomMolWt", "ExactMolWt", "NumValenceElectrons", "NumRadicalElectrons", "MaxPartialCharge", "MinPartialCharge", "MaxAbsPartialCharge", "MinAbsPartialCharge", "FpDensityMorgan1", "FpDensityMorgan2", "FpDensityMorgan3", "BCUT2D_MWHI", "BCUT2D_MWLOW", "BCUT2D_CHGHI", "BCUT2D_CHGLO", "BCUT2D_LOGPHI", "BCUT2D_LOGPLOW", "BCUT2D_MRHI", "BCUT2D_MRLOW", "AvgIpc", "BalabanJ", "BertzCT", "Chi0", "Chi0n", "Chi0v", "Chi1", "Chi1n", "Chi1v", "Chi2n", "Chi2v", "Chi3n", "Chi3v", "Chi4n", "Chi4v", "HallKierAlpha", "Ipc", "Kappa1", "Kappa2", "Kappa3", "LabuteASA", "PEOE_VSA1", "PEOE_VSA10", "PEOE_VSA11", "PEOE_VSA12", "PEOE_VSA13", "PEOE_VSA14", "PEOE_VSA2", "PEOE_VSA3", "PEOE_VSA4", "PEOE_VSA5", "PEOE_VSA6", "PEOE_VSA7", "PEOE_VSA8", "PEOE_VSA9", "SMR_VSA1", "SMR_VSA10", "SMR_VSA2", "SMR_VSA3", "SMR_VSA4", "SMR_VSA5", "SMR_VSA6", "SMR_VSA7", "SMR_VSA8", "SMR_VSA9", "SlogP_VSA1", "SlogP_VSA10", "SlogP_VSA11", "SlogP_VSA12", "SlogP_VSA2", "SlogP_VSA3", "SlogP_VSA4", "SlogP_VSA5", "SlogP_VSA6", "SlogP_VSA7", "SlogP_VSA8", "SlogP_VSA9", "TPSA", "EState_VSA1", "EState_VSA10", "EState_VSA11", "EState_VSA2", "EState_VSA3", "EState_VSA4", "EState_VSA5", "EState_VSA6", "EState_VSA7", "EState_VSA8", "EState_VSA9", 
"VSA_EState1", "VSA_EState10", "VSA_EState2", "VSA_EState3", "VSA_EState4", "VSA_EState5", "VSA_EState6", "VSA_EState7", "VSA_EState8", "VSA_EState9", "FractionCSP3", "HeavyAtomCount", "NHOHCount", "NOCount", "NumAliphaticCarbocycles", "NumAliphaticHeterocycles", "NumAliphaticRings", "NumAromaticCarbocycles", "NumAromaticHeterocycles", "NumAromaticRings", "NumHAcceptors", "NumHDonors", "NumHeteroatoms", "NumRotatableBonds", "NumSaturatedCarbocycles", "NumSaturatedHeterocycles", "NumSaturatedRings", "RingCount", "MolLogP", "MolMR", "fr_Al_COO", "fr_Al_OH", "fr_Al_OH_noTert", "fr_ArN", "fr_Ar_COO", "fr_Ar_N", "fr_Ar_NH", "fr_Ar_OH", "fr_COO", "fr_COO2", "fr_C_O", "fr_C_O_noCOO", "fr_C_S", "fr_HOCCN", "fr_Imine", "fr_NH0", "fr_NH1", "fr_NH2", "fr_N_O", "fr_Ndealkylation1", "fr_Ndealkylation2", "fr_Nhpyrrole", "fr_SH", "fr_aldehyde", "fr_alkyl_carbamate", "fr_alkyl_halide", "fr_allylic_oxid", "fr_amide", "fr_amidine", "fr_aniline", "fr_aryl_methyl", "fr_azide", "fr_azo", "fr_barbitur", "fr_benzene", "fr_benzodiazepine", "fr_bicyclic", "fr_diazo", "fr_dihydropyridine", "fr_epoxide", "fr_ester", "fr_ether", "fr_furan", "fr_guanido", "fr_halogen", "fr_hdrzine", "fr_hdrzone", "fr_imidazole", "fr_imide", "fr_isocyan", "fr_isothiocyan", "fr_ketone", "fr_ketone_Topliss", "fr_lactam", "fr_lactone", "fr_methoxy", "fr_morpholine", "fr_nitrile", "fr_nitro", "fr_nitro_arom", "fr_nitro_arom_nonortho", "fr_nitroso", "fr_oxazole", "fr_oxime", "fr_para_hydroxylation", "fr_phenol", "fr_phenol_noOrthoHbond", "fr_phos_acid", "fr_phos_ester", "fr_piperdine", "fr_piperzine", "fr_priamide", "fr_prisulfonamd", "fr_pyridine", "fr_quatN", "fr_sulfide", "fr_sulfonamd", "fr_sulfone", "fr_term_acetylene", "fr_tetrazole", "fr_thiazole", "fr_thiocyan", "fr_thiophene", "fr_unbrch_alkane", "fr_urea"]}}]}, "name": "CompositeDescriptor"}'}. Best is trial 0 with value: -0.36553318492385256.
-
-
-

Predictions from the algorithms can be explained like so:

-
-
[70]:
-
-
-
ridge.predict_from_smiles(df[config.data.input_column], explain=True).query('shap_value > 0')
-
-
-
-
-
[70]:
-
-
-
-
      shap_value    descriptor                   bit     info
2227  2.042517e+01  UnscaledPhyschemDescriptors  7.0     MolWt
2229  2.025057e+01  UnscaledPhyschemDescriptors  9.0     ExactMolWt
2228  1.804876e+01  UnscaledPhyschemDescriptors  8.0     HeavyAtomMolWt
2267  2.372192e+00  UnscaledPhyschemDescriptors  47.0    LabuteASA
2230  2.106846e+00  UnscaledPhyschemDescriptors  10.0    NumValenceElectrons
...   ...           ...                          ...     ...
440   5.352611e-07  ECFP                         441.0   N(C(C)=O)(C)Cc
1375  5.352611e-07  ECFP                         1376.0  S1(=O)(=O)N=C(c)C=C(C)N1C
1784  5.352611e-07  ECFP                         1785.0  c1(OC)c(OC)ccc(C)c1
995   5.352611e-07  ECFP                         996.0   C(C(N)=C)(=O)N(C)C
1617  5.352611e-07  ECFP                         1618.0  N(C(C(N)=C)=O)(C)Cc(c)c
-

1570 rows × 4 columns

-
-
-

Outputs are ordered by shap_value (higher is more important). We see that the UnscaledPhyschemDescriptors bits corresponding to e.g. MolWt, ExactMolWt, HeavyAtomMolWt and NumValenceElectrons rank highest. We can hence interpret these as the most important features contributing to the MolWt prediction for the DRD2 dataset. UnscaledJazzyDescriptors are also ranked relatively high in the list.

-

Other descriptor types in the composite descriptor such as the ECFP fingerprints are also shown in the output. ECFP bits are translated to the atom environments for which the bit was turned on within the training set.

-

Other descriptors are less interpretable as no additional information is available in the info column.
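To compare descriptor types as a whole rather than bit by bit, the explanation rows can be aggregated per descriptor. The sketch below is a plain-Python illustration and not part of Qptuna: it hard-codes a few rows copied from the table above, and the names `rows` and `total_shap_by_descriptor` are hypothetical.

```python
from collections import defaultdict

# A few (shap_value, descriptor, info) rows copied from the output above.
rows = [
    (2.042517e+01, "UnscaledPhyschemDescriptors", "MolWt"),
    (2.025057e+01, "UnscaledPhyschemDescriptors", "ExactMolWt"),
    (1.804876e+01, "UnscaledPhyschemDescriptors", "HeavyAtomMolWt"),
    (5.352611e-07, "ECFP", "N(C(C)=O)(C)Cc"),
    (5.352611e-07, "ECFP", "c1(OC)c(OC)ccc(C)c1"),
]


def total_shap_by_descriptor(rows):
    """Sum shap_value per descriptor type, largest total first."""
    totals = defaultdict(float)
    for shap_value, descriptor, _info in rows:
        totals[descriptor] += shap_value
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


for descriptor, total in total_shap_by_descriptor(rows):
    print(f"{descriptor}: {total:.4g}")
```

On the full DataFrame returned with explain=True, the same aggregation could presumably be written with pandas as a `groupby("descriptor")["shap_value"].sum()`, assuming the column names shown in the table.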

-
-
-

ChemProp interpret

-

ChemProp explainability is based on the interpret functionality of the original ChemProp package.

-

The following example shows the usage:

-
-
[71]:
-
-
-
config = OptimizationConfig(
    data=Dataset(
        input_column="canonical",
        response_column="molwt",
        training_dataset_file="../tests/data/DRD2/subset-50/train.csv",  # This will be split into train and test.
    ),
    descriptors=[SmilesFromFile.new()],
    algorithms=[
        ChemPropRegressor.new(epochs=4),
    ],
    settings=OptimizationConfig.Settings(
        mode=ModelMode.REGRESSION,
        cross_validation=2,
        n_trials=1,
        direction=OptimizationDirection.MAXIMIZATION,
    ),
)

study = optimize(config, study_name="my_study")
build_best(buildconfig_best(study), "../target/best.pkl")
with open("../target/best.pkl", "rb") as f:
    chemprop = pickle.load(f)
-
-
-
-
-
-
-
-
-[I 2024-08-23 12:01:32,053] A new study created in memory with name: my_study
-[I 2024-08-23 12:01:32,251] A new study created in memory with name: study_name_0
-INFO:root:Enqueued ChemProp manual trial with sensible defaults: {'activation__e0d3a442222d4b38f3aa1434851320db': 'ReLU', 'aggregation__e0d3a442222d4b38f3aa1434851320db': 'mean', 'aggregation_norm__e0d3a442222d4b38f3aa1434851320db': 100, 'batch_size__e0d3a442222d4b38f3aa1434851320db': 50, 'depth__e0d3a442222d4b38f3aa1434851320db': 3, 'dropout__e0d3a442222d4b38f3aa1434851320db': 0.0, 'features_generator__e0d3a442222d4b38f3aa1434851320db': 'none', 'ffn_hidden_size__e0d3a442222d4b38f3aa1434851320db': 300, 'ffn_num_layers__e0d3a442222d4b38f3aa1434851320db': 2, 'final_lr_ratio_exp__e0d3a442222d4b38f3aa1434851320db': -4, 'hidden_size__e0d3a442222d4b38f3aa1434851320db': 300, 'init_lr_ratio_exp__e0d3a442222d4b38f3aa1434851320db': -4, 'max_lr_exp__e0d3a442222d4b38f3aa1434851320db': -3, 'warmup_epochs_ratio__e0d3a442222d4b38f3aa1434851320db': 0.1, 'algorithm_name': 'ChemPropRegressor', 'ChemPropRegressor_algorithm_hash': 'e0d3a442222d4b38f3aa1434851320db'}
-[I 2024-08-23 12:02:21,742] Trial 0 finished with value: -4937.540075659691 and parameters: {'algorithm_name': 'ChemPropRegressor', 'ChemPropRegressor_algorithm_hash': 'e0d3a442222d4b38f3aa1434851320db', 'activation__e0d3a442222d4b38f3aa1434851320db': <ChemPropActivation.RELU: 'ReLU'>, 'aggregation__e0d3a442222d4b38f3aa1434851320db': <ChemPropAggregation.MEAN: 'mean'>, 'aggregation_norm__e0d3a442222d4b38f3aa1434851320db': 100.0, 'batch_size__e0d3a442222d4b38f3aa1434851320db': 50.0, 'depth__e0d3a442222d4b38f3aa1434851320db': 3.0, 'dropout__e0d3a442222d4b38f3aa1434851320db': 0.0, 'ensemble_size__e0d3a442222d4b38f3aa1434851320db': 1, 'epochs__e0d3a442222d4b38f3aa1434851320db': 4, 'features_generator__e0d3a442222d4b38f3aa1434851320db': <ChemPropFeatures_Generator.NONE: 'none'>, 'ffn_hidden_size__e0d3a442222d4b38f3aa1434851320db': 300.0, 'ffn_num_layers__e0d3a442222d4b38f3aa1434851320db': 2.0, 'final_lr_ratio_exp__e0d3a442222d4b38f3aa1434851320db': -4, 'hidden_size__e0d3a442222d4b38f3aa1434851320db': 300.0, 'init_lr_ratio_exp__e0d3a442222d4b38f3aa1434851320db': -4, 'max_lr_exp__e0d3a442222d4b38f3aa1434851320db': -3, 'warmup_epochs_ratio__e0d3a442222d4b38f3aa1434851320db': 0.1, 'descriptor': '{"name": "SmilesFromFile", "parameters": {}}'}. Best is trial 0 with value: -4937.540075659691.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-

Similar to SHAP, ChemProp explainability inference is called using the explain flag of the predict_from_smiles method:

-
-
[73]:
-
-
-
chemprop.predict_from_smiles(df[config.data.input_column].head(5), explain=True)
-
-
-
-
-
-
-
-
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 8 9 18 19 20 21 22 23 24
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 6 7 16 17 18 19 20 21 22
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 6 7 16 17 18 19 20 21 22
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 5 6 15 16 17 18 19 20 21
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 4 5 14 15 16 17 18 19 20
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 2 3 12 13 14 15 16 17 18
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 2 3 12 13 14 15 16 17 18
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 6 7 9 10 11 12 13 14 15
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 6 7 8 9 10 11 12 13 14
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 0 1 2 3 4 5 6 7 11
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 0 1 2 3 4 5 6 7 10
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 2 3 4 5 6
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 7 8 17 18 19 20 21 22 23
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 5 6 15 16 17 18 19 20 21
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 5 6 15 16 17 18 19 20 21
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 6 7 12 13 14 15 16 17 18
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 6 7 11 12 13 14 15 16 17
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 6 7 10 11 12 13 14 15 16
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 6 7 10 11 12
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 6 7 9 10 11
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 6 7 8 9 10
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 2 3 4 5 6
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 6 7 16 17 18 19 20 21 22
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 6 7 16 17 18 19 20 21 22
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 6 7 13 14 15 16 17 18 19
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 6 7 13 14 15
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 6 7 12 13 14
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 6 7 11 12 13
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 2 3 7 8 9
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 2 3 6 7 8
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 2 3 5 6 7
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 2 3 4 6 7
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 0 1 3 4 5
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 8 9 18 19 20
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 6 7 16 17 18
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 6 7 16 17 18
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 5 6 15 16 17
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 4 5 14 15 16
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 2 3 12 13 14
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 2 3 12 13 14
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 2 3 7 8 9
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 2 3 4 7 8
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 2 3 6 7 8
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 8 9 13 14 15 16 17 18 19
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 6 7 11 12 13 14 15 16 17
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 6 7 11 12 13 14 15 16 17
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 5 6 10 11 12 13 14 15 16
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 4 5 9 10 11 12 13 14 15
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 2 3 7 8 9 10 11 12 13
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 2 3 7 8 9 10 11 12 13
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 0 1 2 3 4 5 6 8 12
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 0 1 2 3 4 5 6 8 11
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 2 3 4 6 7
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 5 6 15 16 17 18 19 20 21
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 5 6 15 16 17
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 5 6 15 16 17
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 5 6 10 11 12
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 2 3 8 9 10
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 2 3 4 9 10
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 0 1 6 7 8
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 2 3 4 8 9
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 0 1 5 6 7
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 2 3 4 7 8
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 0 1 4 5 6
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 5 6 15 16 17 18 19 20 21
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 5 6 10 11 12 13 14 15 16
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 5 6 10 11 12 13 14 15 16
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 5 6 9 10 11 12 13 14 15
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 0 1 2 3 4 5 6 11 14
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 0 1 2 3 4 5 6 10 13
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 0 1 2 3 4 5 6 9 12
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 2 3 4 7 8
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 7 8 9 11 12
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 6 7 16 17 18
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 0 1 3 4 5
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 0 1 3 4 5
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 6 7 16 17 18
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 6 7 11 12 13
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 5 6 10 11 12
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 4 5 9 10 11
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 2 3 7 8 9
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 2 3 4 7 8
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 2 3 6 7 8
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 7 8 17 18 19
-[12:04:46] Can't kekulize mol.  Unkekulized atoms: 7 8 12 13 14
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 5 6 10 11 12
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 2 3 9 10 11
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 2 3 4 10 11
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 0 1 7 8 9
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 8 9 12 13 14 15 16 17 18
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 6 7 10 11 12 13 14 15 16
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 6 7 10 11 12 13 14 15 16
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 6 7 8 11 12
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 5 6 7 10 11
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 4 5 6 9 10
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 2 3 4 7 8
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 7 8 17 18 19
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 5 6 10 11 12
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 5 6 7 10 11
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 5 6 9 10 11
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 8 9 10 12 13
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 5 6 7 10 11
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 5 6 9 10 11
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 6 7 16 17 18 19 20 21 22
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 6 7 11 12 13 14 15 16 17
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 0 1 2 3 4 5 6 12 16
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 0 1 2 3 4 5 6 11 15
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 0 1 2 3 4 5 6 10 14
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 0 1 2 3 4 5 6 9 13
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 2 3 4 13 14
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 6 7 11 12 13 14 15 16 17
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 2 3 4 8 9
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 2 3 4 7 8
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 2 3 4 6 7
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 6 7 11 12 13
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 6 7 8 11 12
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 6 7 10 11 12
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 6 7 8 11 12
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 6 7 10 11 12
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 5 6 7 10 11
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 5 6 9 10 11
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 4 5 6 9 10
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 4 5 8 9 10
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 6 7 16 17 18
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 5 6 15 16 17
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 10 11 12 14 15
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 2 3 8 9 10
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 2 3 4 8 9
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 0 1 5 6 7
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 7 8 12 13 14 15 16 17 18
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 7 8 11 12 13 14 15 16 17
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 5 6 9 10 11 12 13 14 15
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 5 6 7 10 11
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 5 6 7 10 11
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 2 3 4 9 10
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 2 3 4 8 9
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 7 8 12 13 14 15 16 17 18
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 5 6 9 10 11 12 13 14 15
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 0 1 2 3 4 5 6 12 15
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 2 3 4 10 11
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 6 7 16 17 18
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 6 7 12 13 14
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 2 3 6 7 8
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 6 7 13 14 15 16 17 18 19
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 6 7 12 13 14
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 2 3 4 7 8
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 0 1 4 5 6
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 8 9 13 14 15
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 8 9 10 13 14
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 8 9 12 13 14
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 9 10 11 13 14
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 7 8 9 12 13
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 7 8 11 12 13
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 6 7 11 12 13 14 15 16 17
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 5 6 9 10 11 12 13 14 15
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 5 6 7 10 11
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 10 11 12 15 16
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 9 10 11 14 15
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 2 3 4 13 14
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 2 3 4 8 9
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 2 3 4 7 8
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 2 3 4 6 7
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 10 11 12 14 15
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 2 3 4 15 16
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 2 3 4 14 15
-[12:04:47] Can't kekulize mol.  Unkekulized atoms: 11 12 13 15 16
-
-
-
-
[73]:
-
-
-
-
   smiles                                             score    rationale                                     rationale_score
0  Cc1cc(NC(=O)c2cccc(COc3ccc(Br)cc3)c2)no1           386.097  c1cc(CO[CH3:1])c[cH:1]c1                      389.151
0  O=C(Nc1ccc(F)cc1F)Nc1sccc1-c1nc2ccccc2s1           389.485  c1c[cH:1]c[cH:1]c1N[CH2:1][NH2:1]             388.565
0  COC(=O)c1ccccc1NC(=O)c1cc([N+](=O)[O-])nn1Cc1c...  384.720  CO[CH2:1]c1cccc[cH:1]1                        389.151
0  CCOC(=O)C(C)Sc1nc(-c2ccccc2)ccc1C#N                387.110  c1c[cH:1]c(S[CH2:1][CH3:1])n[cH:1]1           388.871
0  CCC(CC)NC(=O)c1nn(Cc2ccccc2)c(=O)c2ccccc12         388.997  n1c([CH2:1]N[CH3:1])[cH:1][cH:1][cH:1][n:1]1  387.854
-
-
-

The output contains the following:

  • The first column is the molecule (SMILES) and the second column is its predicted property (in this dummy case MolWt).

  • The third column is the smallest substructure that made this molecule obtain that MolWt prediction (called the rationale).

  • The fourth column is the predicted MolWt of that substructure.
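The columns described above can be compared row by row. The sketch below is a plain-Python illustration, not a Qptuna or ChemProp function: it hard-codes three rows copied from the output above and, for each, reports how far the rationale prediction is from the whole-molecule prediction. It also counts atoms carrying a `:1` label in the rationale; as far as we understand the ChemProp interpret output, these labels mark where the rationale was cut from the parent molecule, but treat that reading as an assumption. The name `summarize` is hypothetical.

```python
import re

# (score, rationale, rationale_score) values copied from the output above.
rows = [
    (386.097, "c1cc(CO[CH3:1])c[cH:1]c1", 389.151),
    (389.485, "c1c[cH:1]c[cH:1]c1N[CH2:1][NH2:1]", 388.565),
    (387.110, "c1c[cH:1]c(S[CH2:1][CH3:1])n[cH:1]1", 388.871),
]


def summarize(score, rationale, rationale_score):
    """Return (rationale_score - score, number of atoms labelled ':1')."""
    labelled_atoms = len(re.findall(r":1\]", rationale))
    return round(rationale_score - score, 3), labelled_atoms


for row in rows:
    print(summarize(*row))
```

In this toy run the differences are only a few units of MolWt, which is small compared with the predictions themselves.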
-
-
-
-

Log transformation

-

Qptuna can transform input labels so that log-scaled or irregularly distributed data are brought closer to a normal distribution, which many machine learning algorithms handle better. The following example shows how XC50 values can be converted to pXC50 values using a negated log10 transform with a unit conversion of 6.
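The arithmetic behind such a conversion can be sketched in plain Python. This is only an illustration, assuming that XC50 is measured in µM and that pXC50 follows the conventional definition of -log10 of the molar concentration. The helper names `to_pxc50` and `from_pxc50` are hypothetical, they are not Qptuna functions, and Qptuna's internal implementation may differ.

```python
import math


def to_pxc50(xc50, unit_conversion=6):
    """Negated log10 transform: -log10(xc50 * 10**-unit_conversion)."""
    if xc50 <= 0:
        raise ValueError("XC50 must be positive for a log transform")
    return -math.log10(xc50 * 10 ** -unit_conversion)


def from_pxc50(pxc50, unit_conversion=6):
    """Inverse transform, back to the original XC50 units."""
    return 10 ** (-pxc50) * 10 ** unit_conversion


# Assumed example inputs in µM (illustrative values, not taken from the dataset).
for xc50 in (10.0, 1.0, 0.001):
    print(xc50, to_pxc50(xc50))
```

Under these assumptions 1 µM corresponds to a pXC50 of 6 and 1 nM (0.001 µM) to a pXC50 of 9. In Qptuna itself the transform is requested through the Dataset options, as in the configuration that follows.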

-
-
[74]:
-
-
-
from optunaz.utils.preprocessing.transform import (
    LogBase,
    LogNegative,
    ModelDataTransform,
)

config = OptimizationConfig(
    data=Dataset(
        input_column="Smiles",
        response_column="Measurement",
        response_type="regression",
        training_dataset_file="../tests/data/sdf/example.sdf",
        split_strategy=Stratified(fraction=0.4),
        deduplication_strategy=KeepMedian(),
        log_transform=True,  # Set to True to apply the log transform
        log_transform_base=LogBase.LOG10,  # Log10 base will be used
        log_transform_negative=LogNegative.TRUE,  # Negated transform for the pXC50 calculation
        log_transform_unit_conversion=6,  # 6 units used for pXC50 conversion
    ),
    descriptors=[
        ECFP.new(),
        ECFP_counts.new(),
        MACCS_keys.new(),
    ],
    algorithms=[
        SVR.new(),
        RandomForestRegressor.new(n_estimators={"low": 5, "high": 10}),
        Ridge.new(),
        Lasso.new(),
        PLSRegression.new(),
    ],
    settings=OptimizationConfig.Settings(
        mode=ModelMode.REGRESSION,
        cross_validation=3,
        n_trials=100,
        n_startup_trials=50,
        direction=OptimizationDirection.MAXIMIZATION,
        track_to_mlflow=False,
        random_seed=42,
    ),
)

transformed_study = optimize(config, study_name="transform_example")
-
-
-
-
-
-
-
-
-[I 2024-08-23 12:04:49,701] A new study created in memory with name: transform_example
-[I 2024-08-23 12:04:49,746] A new study created in memory with name: study_name_0
-[I 2024-08-23 12:04:51,220] Trial 0 finished with value: -0.5959493772536109 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 6, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 5, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 0 with value: -0.5959493772536109.
-[I 2024-08-23 12:04:51,287] Trial 1 finished with value: -0.6571993250300608 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 7, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 6, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 0 with value: -0.5959493772536109.
-[I 2024-08-23 12:04:51,426] Trial 2 finished with value: -4.1511102853256885 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 5.141096648805748, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 2.4893466963980463e-08, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 0 with value: -0.5959493772536109.
-[I 2024-08-23 12:04:51,516] Trial 3 finished with value: -1.2487063317112765 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 5, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 0 with value: -0.5959493772536109.
-[I 2024-08-23 12:04:51,532] Trial 4 finished with value: -0.6714912461080983 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 3, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 0 with value: -0.5959493772536109.
-[I 2024-08-23 12:04:51,550] Trial 5 finished with value: -0.2725944467796781 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.7896547008552977, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 5 with value: -0.2725944467796781.
-[I 2024-08-23 12:04:51,606] Trial 6 finished with value: -2.194926264155893 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.6574750183038587, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 5 with value: -0.2725944467796781.
-[I 2024-08-23 12:04:51,621] Trial 7 finished with value: -0.7520919188596032 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.3974313630683448, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 5 with value: -0.2725944467796781.
-[I 2024-08-23 12:04:51,748] Trial 8 finished with value: -0.7803723847416691 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 28, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 8, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 5 with value: -0.2725944467796781.
-[I 2024-08-23 12:04:51,764] Trial 9 finished with value: -0.6397753979196248 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.2391884918766034, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 5 with value: -0.2725944467796781.
-[I 2024-08-23 12:04:51,781] Trial 10 finished with value: -4.151110299986041 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.00044396482429275296, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 2.3831436879125245e-10, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 5 with value: -0.2725944467796781.
-[I 2024-08-23 12:04:51,798] Trial 11 finished with value: -4.151110111437006 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.00028965395242758657, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 2.99928292425642e-07, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 5 with value: -0.2725944467796781.
-[I 2024-08-23 12:04:51,813] Trial 12 finished with value: -0.5410418750776741 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 4, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 5 with value: -0.2725944467796781.
-[I 2024-08-23 12:04:51,829] Trial 13 finished with value: -0.7183231137124538 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 2, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 5 with value: -0.2725944467796781.
-[I 2024-08-23 12:04:51,845] Trial 14 finished with value: -0.2721824844856162 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.4060379177903557, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 14 with value: -0.2721824844856162.
-[I 2024-08-23 12:04:51,912] Trial 15 finished with value: -1.19009294702225 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 20, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 8, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 14 with value: -0.2721824844856162.
-[I 2024-08-23 12:04:51,929] Trial 16 finished with value: -2.194926264155893 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.344271094811757, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 14 with value: -0.2721824844856162.
-[I 2024-08-23 12:04:51,945] Trial 17 finished with value: -0.5585323973564646 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.670604991178476, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 14 with value: -0.2721824844856162.
-[I 2024-08-23 12:04:52,012] Trial 18 finished with value: -1.3169218304262786 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 22, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 6, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 14 with value: -0.2721824844856162.
-[I 2024-08-23 12:04:52,028] Trial 19 finished with value: -0.7974925066137679 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.5158832554303112, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 14 with value: -0.2721824844856162.
-[I 2024-08-23 12:04:52,045] Trial 20 finished with value: -1.218395226466336 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 4, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 14 with value: -0.2721824844856162.
-[I 2024-08-23 12:04:52,062] Trial 21 finished with value: -1.1474226942497083 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.0009327650919528738, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 6.062479210472502, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 14 with value: -0.2721824844856162.
-[I 2024-08-23 12:04:52,066] Trial 22 pruned. Duplicate parameter set
-[I 2024-08-23 12:04:52,084] Trial 23 finished with value: -1.0239005731675412 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.1366172066709432, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 14 with value: -0.2721824844856162.
-[I 2024-08-23 12:04:52,150] Trial 24 finished with value: -0.7803723847416691 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 26, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 8, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 14 with value: -0.2721824844856162.
-[I 2024-08-23 12:04:52,169] Trial 25 finished with value: -2.178901060853144 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 43.92901911959232, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 27.999026012594694, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 14 with value: -0.2721824844856162.
-[I 2024-08-23 12:04:52,187] Trial 26 finished with value: -0.27137790098830755 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.5888977841391714, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 26 with value: -0.27137790098830755.
-[I 2024-08-23 12:04:52,206] Trial 27 finished with value: -0.2710284516876423 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.19435298754153707, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 27 with value: -0.2710284516876423.
-[I 2024-08-23 12:04:52,259] Trial 28 finished with value: -1.3169218304262786 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 13, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 6, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 27 with value: -0.2710284516876423.
-
-Duplicated trial: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 4, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}, return [-0.5410418750776741]
-
-[I 2024-08-23 12:04:52,278] Trial 29 finished with value: -3.6273152492418945 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 1.6285506249643193, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 0.35441495011256785, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 27 with value: -0.2710284516876423.
-[I 2024-08-23 12:04:52,344] Trial 30 finished with value: -1.1900929470222508 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 10, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 8, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 27 with value: -0.2710284516876423.
-[I 2024-08-23 12:04:52,361] Trial 31 finished with value: -2.194926264155893 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.2457809516380005, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 27 with value: -0.2710284516876423.
-[I 2024-08-23 12:04:52,379] Trial 32 finished with value: -2.1907041717628215 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.6459129458824919, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 27 with value: -0.2710284516876423.
-[I 2024-08-23 12:04:52,398] Trial 33 finished with value: -1.3209075619139279 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.8179058888285398, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 27 with value: -0.2710284516876423.
-[I 2024-08-23 12:04:52,403] Trial 34 pruned. Duplicate parameter set
-[I 2024-08-23 12:04:52,421] Trial 35 finished with value: -0.2709423025014604 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.0920052840435055, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 35 with value: -0.2709423025014604.
-[I 2024-08-23 12:04:52,438] Trial 36 finished with value: -1.3133943310851415 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.8677032984759461, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 35 with value: -0.2709423025014604.
-[I 2024-08-23 12:04:52,443] Trial 37 pruned. Duplicate parameter set
-[I 2024-08-23 12:04:52,461] Trial 38 finished with value: -1.257769959239938 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.2865764368847064, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 35 with value: -0.2709423025014604.
-[I 2024-08-23 12:04:52,588] Trial 39 finished with value: -0.40359637945134746 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 5, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 5, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 35 with value: -0.2709423025014604.
-
-Duplicated trial: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 4, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}, return [-0.5410418750776741]
-Duplicated trial: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 4, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}, return [-1.218395226466336]
-
-[I 2024-08-23 12:04:52,658] Trial 40 finished with value: -0.4127882135896648 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 5, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 9, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 35 with value: -0.2709423025014604.
-[I 2024-08-23 12:04:52,663] Trial 41 pruned. Duplicate parameter set
-[I 2024-08-23 12:04:52,734] Trial 42 finished with value: -0.5959493772536109 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 25, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 5, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 35 with value: -0.2709423025014604.
-[I 2024-08-23 12:04:52,754] Trial 43 finished with value: -0.9246005133276612 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 2, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 35 with value: -0.2709423025014604.
-
-Duplicated trial: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 5, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}, return [-1.2487063317112765]
-
-[I 2024-08-23 12:04:52,885] Trial 44 finished with value: -0.8908739215746118 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 22, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 9, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 35 with value: -0.2709423025014604.
-[I 2024-08-23 12:04:52,905] Trial 45 finished with value: -1.107536316777608 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 3, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 35 with value: -0.2709423025014604.
-[I 2024-08-23 12:04:52,925] Trial 46 finished with value: -2.194926264155893 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.6437201185807124, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 35 with value: -0.2709423025014604.
-[I 2024-08-23 12:04:52,945] Trial 47 finished with value: -4.054360360588395 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 82.41502276709562, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 0.10978379088847677, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 35 with value: -0.2709423025014604.
-[I 2024-08-23 12:04:52,964] Trial 48 finished with value: -0.5428179904345867 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.022707289534838138, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 35 with value: -0.2709423025014604.
-[I 2024-08-23 12:04:52,983] Trial 49 finished with value: -0.5696273642213351 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 3, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 35 with value: -0.2709423025014604.
-[I 2024-08-23 12:04:53,007] Trial 50 finished with value: -0.27099769667470536 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.1580741708125475, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 35 with value: -0.2709423025014604.
-[I 2024-08-23 12:04:53,031] Trial 51 finished with value: -0.2709564785634315 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.10900413894771653, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 35 with value: -0.2709423025014604.
-[I 2024-08-23 12:04:53,056] Trial 52 finished with value: -0.2709799905898163 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.13705914456987853, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 35 with value: -0.2709423025014604.
-[I 2024-08-23 12:04:53,080] Trial 53 finished with value: -0.27097230608092054 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.12790870116376127, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 35 with value: -0.2709423025014604.
-[I 2024-08-23 12:04:53,102] Trial 54 finished with value: -0.2709499903064464 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.10123180962907431, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 35 with value: -0.2709423025014604.
-[I 2024-08-23 12:04:53,127] Trial 55 finished with value: -0.2710895886052581 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.26565663774320425, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 35 with value: -0.2709423025014604.
-[I 2024-08-23 12:04:53,149] Trial 56 finished with value: -0.2708711012023424 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.005637048678674678, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 56 with value: -0.2708711012023424.
-[I 2024-08-23 12:04:53,174] Trial 57 finished with value: -0.27092322402109364 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.06902647427781451, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 56 with value: -0.2708711012023424.
-[I 2024-08-23 12:04:53,200] Trial 58 finished with value: -0.2712140349882 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.4076704953178294, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 56 with value: -0.2708711012023424.
-[I 2024-08-23 12:04:53,224] Trial 59 finished with value: -0.27090080367174 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.04187106800188596, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 56 with value: -0.2708711012023424.
-[I 2024-08-23 12:04:53,246] Trial 60 finished with value: -0.27086925247190047 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.003371853599610078, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 60 with value: -0.27086925247190047.
-[I 2024-08-23 12:04:53,271] Trial 61 finished with value: -0.2708933298483799 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.032781796328385376, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 60 with value: -0.27086925247190047.
-[I 2024-08-23 12:04:53,296] Trial 62 finished with value: -0.27087205624489635 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.006806773659187283, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 60 with value: -0.27086925247190047.
-[I 2024-08-23 12:04:53,318] Trial 63 finished with value: -0.2708869511176179 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.025009489814943348, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 60 with value: -0.27086925247190047.
-[I 2024-08-23 12:04:53,344] Trial 64 finished with value: -0.2711465077924297 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.3311125627707556, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 60 with value: -0.27086925247190047.
-[I 2024-08-23 12:04:53,369] Trial 65 finished with value: -0.2708756855936628 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.011249102380159387, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 60 with value: -0.27086925247190047.
-[I 2024-08-23 12:04:53,395] Trial 66 finished with value: -0.27087301924224993 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.007985924302396141, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 60 with value: -0.27086925247190047.
-[I 2024-08-23 12:04:53,419] Trial 67 finished with value: -0.2708685399954944 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.00249856291483601, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 67 with value: -0.2708685399954944.
-[I 2024-08-23 12:04:53,444] Trial 68 finished with value: -0.27121879554836553 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.4130244908975993, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 67 with value: -0.2708685399954944.
-[I 2024-08-23 12:04:53,470] Trial 69 finished with value: -0.2708693196600531 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.0034541978803366022, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 67 with value: -0.2708685399954944.
-[I 2024-08-23 12:04:53,496] Trial 70 finished with value: -0.27110195265802334 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.27994943662091765, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 67 with value: -0.2708685399954944.
-[I 2024-08-23 12:04:53,520] Trial 71 finished with value: -0.2708682582859318 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.0021532199144365088, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.2708682582859318.
-[I 2024-08-23 12:04:53,545] Trial 72 finished with value: -0.27087024523986086 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.0045884092728113585, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.2708682582859318.
-[I 2024-08-23 12:04:53,569] Trial 73 finished with value: -0.27087351807632193 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.008596600952859433, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.2708682582859318.
-[I 2024-08-23 12:04:53,595] Trial 74 finished with value: -0.2710818633795896 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.2567049271070902, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.2708682582859318.
-[I 2024-08-23 12:04:53,620] Trial 75 finished with value: -0.27103241786565463 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.1990111983307052, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.2708682582859318.
-[I 2024-08-23 12:04:53,646] Trial 76 finished with value: -0.2710350879598171 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.20214459724424078, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.2708682582859318.
-[I 2024-08-23 12:04:53,672] Trial 77 finished with value: -0.2708688328221868 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.00285750520671645, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.2708682582859318.
-[I 2024-08-23 12:04:53,697] Trial 78 finished with value: -0.27100832234449684 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.17064008990759916, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.2708682582859318.
-[I 2024-08-23 12:04:53,721] Trial 79 finished with value: -0.27268613236193845 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.8725420109733135, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.2708682582859318.
-[I 2024-08-23 12:04:53,747] Trial 80 finished with value: -0.27119617446689237 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.387533542012365, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.2708682582859318.
-[I 2024-08-23 12:04:53,773] Trial 81 finished with value: -0.2708691110831552 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.0031985656730512953, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.2708682582859318.
-[I 2024-08-23 12:04:53,800] Trial 82 finished with value: -0.27086852174155146 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.002476186542950981, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.2708682582859318.
-[I 2024-08-23 12:04:53,823] Trial 83 finished with value: -0.27135383618835024 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.5626643670396761, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.2708682582859318.
-[I 2024-08-23 12:04:53,848] Trial 84 finished with value: -0.2709819654433871 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.1394077979875128, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.2708682582859318.
-[I 2024-08-23 12:04:53,876] Trial 85 finished with value: -0.2718548944510965 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.0858347526799794, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.2708682582859318.
-[I 2024-08-23 12:04:53,905] Trial 86 finished with value: -4.1508084699212935 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.03329943145150872, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 0.00025672309762227527, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.2708682582859318.
-[I 2024-08-23 12:04:53,931] Trial 87 finished with value: -0.27249853374634975 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.702026434077893, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.2708682582859318.
-[I 2024-08-23 12:04:53,960] Trial 88 finished with value: -0.27095660957755363 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.10916094511173127, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.2708682582859318.
-[I 2024-08-23 12:04:53,987] Trial 89 finished with value: -0.27102160995407715 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.18630665884100353, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.2708682582859318.
-[I 2024-08-23 12:04:54,014] Trial 90 finished with value: -0.27095708822582026 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.10973377642487026, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.2708682582859318.
-[I 2024-08-23 12:04:54,041] Trial 91 finished with value: -0.27088222008661084 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.019235980282946118, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.2708682582859318.
-[I 2024-08-23 12:04:54,069] Trial 92 finished with value: -0.2708703086029017 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.004666043957133775, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.2708682582859318.
-[I 2024-08-23 12:04:54,097] Trial 93 finished with value: -0.27095279044622245 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.1045877457096882, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.2708682582859318.
-[I 2024-08-23 12:04:54,125] Trial 94 finished with value: -0.2709408288690431 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.09023455456986404, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.2708682582859318.
-[I 2024-08-23 12:04:54,152] Trial 95 finished with value: -0.9289218260898663 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.8200088368788958, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.2708682582859318.
-[I 2024-08-23 12:04:54,181] Trial 96 finished with value: -0.27086675101898655 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.00030502148265565063, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 96 with value: -0.27086675101898655.
-[I 2024-08-23 12:04:54,209] Trial 97 finished with value: -0.2710491243757999 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.21858260742423916, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 96 with value: -0.27086675101898655.
-[I 2024-08-23 12:04:54,239] Trial 98 finished with value: -4.1491615840508995 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.024725853754515203, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 0.0011658455138452, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 96 with value: -0.27086675101898655.
-[I 2024-08-23 12:04:54,265] Trial 99 finished with value: -0.2709462479577586 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.0967427718847167, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 96 with value: -0.27086675101898655.
-

By comparison, Qptuna does not transform the response data by default:

-
-
[75]:
-
-
-
config = OptimizationConfig(
-    data=Dataset(
-        input_column="Smiles",
-        response_column="Measurement",
-        response_type="regression",
-        training_dataset_file="../tests/data/sdf/example.sdf",
-        split_strategy=Stratified(fraction=0.4),
-        deduplication_strategy=KeepMedian(),
-        log_transform=False,  # Shown for illustration: log_transform defaults to False
-        log_transform_base=None,  # Shown for illustration: the log base is None (ignored) when log_transform is False
-        log_transform_negative=None,  # Shown for illustration: negation is None (ignored) when log_transform is False
-        log_transform_unit_conversion=None,  # Shown for illustration: unit conversion is None (ignored) when log_transform is False
-    ),
-    descriptors=[
-        ECFP.new(),
-        ECFP_counts.new(),
-        MACCS_keys.new(),
-    ],
-    algorithms=[
-        SVR.new(),
-        RandomForestRegressor.new(n_estimators={"low": 5, "high": 10}),
-        Ridge.new(),
-        Lasso.new(),
-        PLSRegression.new(),
-    ],
-    settings=OptimizationConfig.Settings(
-        mode=ModelMode.REGRESSION,
-        cross_validation=3,
-        n_trials=100,
-        n_startup_trials=50,
-        direction=OptimizationDirection.MAXIMIZATION,
-        track_to_mlflow=False,
-        random_seed=42,
-    ),
-)
-
-default_study = optimize(config, study_name="non-transform_example")
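With the log-transform options left at their defaults, the objective is scored on the raw response scale, so its values cannot be compared directly with scores from a run on a log-transformed response. The following is a minimal sketch of that effect and is independent of Qptuna. It assumes the objective is a negated mean squared error, which the negative trial values suggest but this tutorial does not state. The helper `neg_mse` and the sample values are hypothetical and chosen only for illustration.

```python
import math


def neg_mse(y_true, y_pred):
    """Negated mean squared error: higher (closer to zero) is better."""
    return -sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical measurements spanning several orders of magnitude.
y_true = [1.0, 10.0, 100.0, 1000.0]
# Hypothetical predictions, each off by a factor of two.
y_pred = [2.0, 20.0, 50.0, 500.0]

# Score on the raw scale: dominated by the largest measurements.
raw_score = neg_mse(y_true, y_pred)

# Score after a log10 transform: every factor-of-two error counts equally.
log_score = neg_mse(
    [math.log10(v) for v in y_true],
    [math.log10(v) for v in y_pred],
)

print(f"raw scale:   {raw_score:.2f}")
print(f"log10 scale: {log_score:.4f}")
```

In this sketch the same relative errors produce a raw-scale score several orders of magnitude larger in absolute value than the log-scale score. This is why objective values from transformed and untransformed studies should be compared only after converting predictions back to a common scale.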
-
-[I 2024-08-23 12:04:56,746] A new study created in memory with name: non-transform_example
-[I 2024-08-23 12:04:56,748] A new study created in memory with name: study_name_0
-[I 2024-08-23 12:04:56,837] Trial 0 finished with value: -3501.942111261296 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 6, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 5, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 0 with value: -3501.942111261296.
-[I 2024-08-23 12:04:56,905] Trial 1 finished with value: -5451.207265576796 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 7, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 6, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 0 with value: -3501.942111261296.
-[I 2024-08-23 12:04:56,945] Trial 2 finished with value: -208.1049201007814 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 5.141096648805748, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 2.4893466963980463e-08, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 2 with value: -208.1049201007814.
-[I 2024-08-23 12:04:56,986] Trial 3 finished with value: -9964.541364058234 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 5, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 2 with value: -208.1049201007814.
-[I 2024-08-23 12:04:57,003] Trial 4 finished with value: -3543.953608539901 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 3, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 2 with value: -208.1049201007814.
-[I 2024-08-23 12:04:57,023] Trial 5 finished with value: -6837.057544630979 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.7896547008552977, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 2 with value: -208.1049201007814.
-[I 2024-08-23 12:04:57,043] Trial 6 finished with value: -2507.1794330606067 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.6574750183038587, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 2 with value: -208.1049201007814.
-[I 2024-08-23 12:04:57,072] Trial 7 finished with value: -21534.719219668405 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.3974313630683448, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 2 with value: -208.1049201007814.
-[I 2024-08-23 12:04:57,137] Trial 8 finished with value: -2899.736555614694 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 28, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 8, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 2 with value: -208.1049201007814.
-/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qptuna-_QsKTRFT-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 1.294e+02, tolerance: 2.760e+01
-  model = cd_fast.enet_coordinate_descent(
-[I 2024-08-23 12:04:57,167] Trial 9 finished with value: -21674.445000284228 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.2391884918766034, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 2 with value: -208.1049201007814.
-[I 2024-08-23 12:04:57,181] Trial 10 finished with value: -208.1049203123567 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.00044396482429275296, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 2.3831436879125245e-10, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 2 with value: -208.1049201007814.
-[I 2024-08-23 12:04:57,198] Trial 11 finished with value: -208.1049192609138 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.00028965395242758657, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 2.99928292425642e-07, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 11 with value: -208.1049192609138.
-[I 2024-08-23 12:04:57,215] Trial 12 finished with value: -3630.72768093756 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 4, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 11 with value: -208.1049192609138.
-[I 2024-08-23 12:04:57,233] Trial 13 finished with value: -3431.942816967268 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 2, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 11 with value: -208.1049192609138.
-[I 2024-08-23 12:04:57,249] Trial 14 finished with value: -6908.462045154488 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.4060379177903557, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 11 with value: -208.1049192609138.
-[I 2024-08-23 12:04:57,314] Trial 15 finished with value: -5964.65935954044 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 20, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 8, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 11 with value: -208.1049192609138.
-[I 2024-08-23 12:04:57,332] Trial 16 finished with value: -21070.107195348774 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.344271094811757, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 11 with value: -208.1049192609138.
-[I 2024-08-23 12:04:57,348] Trial 17 finished with value: -4977.068508997133 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.670604991178476, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 11 with value: -208.1049192609138.
-[I 2024-08-23 12:04:57,404] Trial 18 finished with value: -8873.669262669626 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 22, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 6, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 11 with value: -208.1049192609138.
-[I 2024-08-23 12:04:57,432] Trial 19 finished with value: -21387.63697424318 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.5158832554303112, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 11 with value: -208.1049192609138.
-[I 2024-08-23 12:04:57,448] Trial 20 finished with value: -9958.573006910125 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 4, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 11 with value: -208.1049192609138.
-[I 2024-08-23 12:04:57,463] Trial 21 finished with value: -180.5182695600183 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.0009327650919528738, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 6.062479210472502, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 21 with value: -180.5182695600183.
-[I 2024-08-23 12:04:57,467] Trial 22 pruned. Duplicate parameter set
-[I 2024-08-23 12:04:57,494] Trial 23 finished with value: -20684.56412138056 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.1366172066709432, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 21 with value: -180.5182695600183.
-[I 2024-08-23 12:04:57,561] Trial 24 finished with value: -2899.736555614694 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 26, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 8, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 21 with value: -180.5182695600183.
-[I 2024-08-23 12:04:57,577] Trial 25 finished with value: -150.3435882510586 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 43.92901911959232, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 27.999026012594694, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 25 with value: -150.3435882510586.
-[I 2024-08-23 12:04:57,595] Trial 26 finished with value: -7068.705383113378 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.5888977841391714, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 25 with value: -150.3435882510586.
-[I 2024-08-23 12:04:57,613] Trial 27 finished with value: -7150.482090052133 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.19435298754153707, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 25 with value: -150.3435882510586.
-
-Duplicated trial: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 4, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}, return [-3630.72768093756]
-
-[I 2024-08-23 12:04:57,680] Trial 28 finished with value: -8873.669262669626 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 13, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 6, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 25 with value: -150.3435882510586.
-[I 2024-08-23 12:04:57,698] Trial 29 finished with value: -203.93637462922368 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 1.6285506249643193, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 0.35441495011256785, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 25 with value: -150.3435882510586.
-[I 2024-08-23 12:04:57,762] Trial 30 finished with value: -5964.65935954044 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 10, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 8, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 25 with value: -150.3435882510586.
-[I 2024-08-23 12:04:57,779] Trial 31 finished with value: -2570.5111262532305 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.2457809516380005, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 25 with value: -150.3435882510586.
-[I 2024-08-23 12:04:57,797] Trial 32 finished with value: -21987.659957192194 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.6459129458824919, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 25 with value: -150.3435882510586.
-[I 2024-08-23 12:04:57,814] Trial 33 finished with value: -9889.493204596083 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.8179058888285398, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 25 with value: -150.3435882510586.
-[I 2024-08-23 12:04:57,819] Trial 34 pruned. Duplicate parameter set
-[I 2024-08-23 12:04:57,838] Trial 35 finished with value: -7172.208490771303 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.0920052840435055, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 25 with value: -150.3435882510586.
-[I 2024-08-23 12:04:57,856] Trial 36 finished with value: -9804.512701665093 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.8677032984759461, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 25 with value: -150.3435882510586.
-[I 2024-08-23 12:04:57,861] Trial 37 pruned. Duplicate parameter set
-[I 2024-08-23 12:04:57,881] Trial 38 finished with value: -9165.74081120673 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.2865764368847064, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 25 with value: -150.3435882510586.
-[I 2024-08-23 12:04:57,949] Trial 39 finished with value: -543.0280270800017 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 5, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 5, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 25 with value: -150.3435882510586.
-[I 2024-08-23 12:04:58,015] Trial 40 finished with value: -161.1602933782954 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 5, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 9, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 25 with value: -150.3435882510586.
-
-Duplicated trial: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 4, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}, return [-3630.72768093756]
-Duplicated trial: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 4, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}, return [-9958.573006910125]
-Duplicated trial: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 5, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}, return [-9964.541364058234]
-
-[I 2024-08-23 12:04:58,021] Trial 41 pruned. Duplicate parameter set
-[I 2024-08-23 12:04:58,090] Trial 42 finished with value: -3501.888460860864 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 25, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 5, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 25 with value: -150.3435882510586.
-[I 2024-08-23 12:04:58,109] Trial 43 finished with value: -8414.932694243476 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 2, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 25 with value: -150.3435882510586.
-[I 2024-08-23 12:04:58,166] Trial 44 finished with value: -2270.540799189148 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 22, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 9, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 25 with value: -150.3435882510586.
-[I 2024-08-23 12:04:58,186] Trial 45 finished with value: -10383.79559309305 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 3, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 25 with value: -150.3435882510586.
-[I 2024-08-23 12:04:58,206] Trial 46 finished with value: -20815.025469865475 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.6437201185807124, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 25 with value: -150.3435882510586.
-[I 2024-08-23 12:04:58,224] Trial 47 finished with value: -206.7560385808573 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 82.41502276709562, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 0.10978379088847677, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 25 with value: -150.3435882510586.
-[I 2024-08-23 12:04:58,244] Trial 48 finished with value: -5264.4700789389035 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.022707289534838138, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 25 with value: -150.3435882510586.
-[I 2024-08-23 12:04:58,264] Trial 49 finished with value: -3668.255064135424 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 3, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 25 with value: -150.3435882510586.
-[I 2024-08-23 12:04:58,290] Trial 50 finished with value: -156.12174877890536 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 56.793408178086295, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 9.99902820845678, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 25 with value: -150.3435882510586.
-[I 2024-08-23 12:04:58,316] Trial 51 finished with value: -157.371632749506 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 57.88307313087517, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 8.140915461519354, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 25 with value: -150.3435882510586.
-[I 2024-08-23 12:04:58,340] Trial 52 finished with value: -153.66773675231477 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 46.177324126813716, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 40.77906017834145, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 25 with value: -150.3435882510586.
-[I 2024-08-23 12:04:58,366] Trial 53 finished with value: -186.52056745848623 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 89.4565714180547, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 93.6710444346508, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 25 with value: -150.3435882510586.
-[I 2024-08-23 12:04:58,392] Trial 54 finished with value: -153.30976119334312 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 35.62916671166313, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 40.023639423189294, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 25 with value: -150.3435882510586.
-[I 2024-08-23 12:04:58,418] Trial 55 finished with value: -181.053696900694 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 23.914617418880486, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 86.31140591484044, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 25 with value: -150.3435882510586.
-[I 2024-08-23 12:04:58,444] Trial 56 finished with value: -201.33573874994386 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 12.569769302718845, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 0.5781354926491789, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 25 with value: -150.3435882510586.
-[I 2024-08-23 12:04:58,469] Trial 57 finished with value: -190.1384885119049 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 95.87666716965626, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 98.2537791489618, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 25 with value: -150.3435882510586.
-[I 2024-08-23 12:04:58,496] Trial 58 finished with value: -208.076949848299 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.9559574710535281, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 0.0032830967319653665, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 25 with value: -150.3435882510586.
-[I 2024-08-23 12:04:58,522] Trial 59 finished with value: -170.764974036324 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 15.03910427457823, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 3.406811480459925, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 25 with value: -150.3435882510586.
-[I 2024-08-23 12:04:58,546] Trial 60 finished with value: -164.4477304958181 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 17.701690847791482, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 4.819274780536123, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 25 with value: -150.3435882510586.
-[I 2024-08-23 12:04:58,573] Trial 61 finished with value: -157.87939164358104 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 28.32187661108304, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 7.660320437878754, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 25 with value: -150.3435882510586.
-[I 2024-08-23 12:04:58,601] Trial 62 finished with value: -157.01705178481896 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 38.61397716361812, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 8.603665957830847, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 25 with value: -150.3435882510586.
-[I 2024-08-23 12:04:58,629] Trial 63 finished with value: -155.73257312230092 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 40.759645965959294, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 11.503212714246787, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 25 with value: -150.3435882510586.
-[I 2024-08-23 12:04:58,656] Trial 64 finished with value: -154.46848394144124 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 93.8546740801317, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 15.35327336610912, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 25 with value: -150.3435882510586.
-[I 2024-08-23 12:04:58,683] Trial 65 finished with value: -161.20421802817864 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 93.57596974747163, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 51.84756262407801, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 25 with value: -150.3435882510586.
-[I 2024-08-23 12:04:58,709] Trial 66 finished with value: -190.51233215278089 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 6.3564642040401464, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 1.5034542273159819, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 25 with value: -150.3435882510586.
-[I 2024-08-23 12:04:58,734] Trial 67 finished with value: -207.68667089892196 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 24.034895878929095, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 0.03653571911285094, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 25 with value: -150.3435882510586.
-[I 2024-08-23 12:04:58,762] Trial 68 finished with value: -102.52277054278186 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.01961499216484045, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 17.670937191883546, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 68 with value: -102.52277054278186.
-[I 2024-08-23 12:04:58,790] Trial 69 finished with value: -97.28722475694815 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.012434370509176538, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 19.34222704431493, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 69 with value: -97.28722475694815.
-[I 2024-08-23 12:04:58,816] Trial 70 finished with value: -93.87402050281146 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.008452015347522093, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 24.914863578437455, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 70 with value: -93.87402050281146.
-[I 2024-08-23 12:04:58,844] Trial 71 finished with value: -89.38847505937936 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.01573542234868893, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 27.99307522974174, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 71 with value: -89.38847505937936.
-[I 2024-08-23 12:04:58,870] Trial 72 finished with value: -81.96336195786391 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.009845516063879428, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 80.59422914099683, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 72 with value: -81.96336195786391.
-[I 2024-08-23 12:04:58,900] Trial 73 finished with value: -89.19345618324213 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.009382525091504246, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 98.35573659237662, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 72 with value: -81.96336195786391.
-[I 2024-08-23 12:04:58,928] Trial 74 finished with value: -86.30772721342525 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.010579672066291478, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 84.35550323165882, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 72 with value: -81.96336195786391.
-[I 2024-08-23 12:04:58,954] Trial 75 finished with value: -90.23970902543148 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.013369359066405863, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 87.4744102498801, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 72 with value: -81.96336195786391.
-[I 2024-08-23 12:04:58,984] Trial 76 finished with value: -81.34331248758777 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.011398351701814368, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 72.54146340620301, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 76 with value: -81.34331248758777.
-[I 2024-08-23 12:04:59,012] Trial 77 finished with value: -208.104535853341 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.011708779850509646, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 1.682286191624579e-05, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 76 with value: -81.34331248758777.
-[I 2024-08-23 12:04:59,040] Trial 78 finished with value: -80.0653774146952 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.009806826677473646, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 76.90274406278985, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 78 with value: -80.0653774146952.
-[I 2024-08-23 12:04:59,069] Trial 79 finished with value: -81.64646042813787 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.0038598153381434685, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 73.20918134828555, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 78 with value: -80.0653774146952.
-[I 2024-08-23 12:04:59,098] Trial 80 finished with value: -78.68420472011734 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.0032474576673554513, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 98.35551178979624, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 80 with value: -78.68420472011734.
-[I 2024-08-23 12:04:59,125] Trial 81 finished with value: -80.85985201823172 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.003187930738019005, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 89.29431603544847, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 80 with value: -78.68420472011734.
-[I 2024-08-23 12:04:59,156] Trial 82 finished with value: -80.21583898009355 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.003122319313153475, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 93.83526418992966, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 80 with value: -78.68420472011734.
-[I 2024-08-23 12:04:59,185] Trial 83 finished with value: -83.34787242859676 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.002781955938462633, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 89.76228981520067, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 80 with value: -78.68420472011734.
-[I 2024-08-23 12:04:59,213] Trial 84 finished with value: -194.70914272129673 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.0023173546614751305, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 1.3000082904498813, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 80 with value: -78.68420472011734.
-[I 2024-08-23 12:04:59,242] Trial 85 finished with value: -208.10492031097328 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.002606064524407, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 1.7861330234653922e-10, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 80 with value: -78.68420472011734.
-[I 2024-08-23 12:04:59,271] Trial 86 finished with value: -208.1049154281806 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.0029210589377408366, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 4.200933937391094e-07, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 80 with value: -78.68420472011734.
-[I 2024-08-23 12:04:59,300] Trial 87 finished with value: -208.10492028002287 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.06431564840324226, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 3.2981641934644904e-09, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 80 with value: -78.68420472011734.
-[I 2024-08-23 12:04:59,330] Trial 88 finished with value: -196.56066541774658 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.0010848843623839548, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 2.151493073951163, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 80 with value: -78.68420472011734.
-[I 2024-08-23 12:04:59,358] Trial 89 finished with value: -76.76337597039308 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.004134805589645341, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 90.88115336652716, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 89 with value: -76.76337597039308.
-[I 2024-08-23 12:04:59,388] Trial 90 finished with value: -108.58009587759925 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.004763418454688096, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 22.02920758025023, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 89 with value: -76.76337597039308.
-[I 2024-08-23 12:04:59,415] Trial 91 finished with value: -113.35230417583477 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.0009098023238189749, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 79.57100980886017, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 89 with value: -76.76337597039308.
-[I 2024-08-23 12:04:59,445] Trial 92 finished with value: -113.30807467406214 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.03739791555156691, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 27.12818940557025, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 89 with value: -76.76337597039308.
-[I 2024-08-23 12:04:59,476] Trial 93 finished with value: -76.44100655116532 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.006380481141720477, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 88.4882351186755, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 93 with value: -76.44100655116532.
-[I 2024-08-23 12:04:59,505] Trial 94 finished with value: -150.35181001564942 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.0036244007454981787, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 5.608797806921866, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 93 with value: -76.44100655116532.
-[I 2024-08-23 12:04:59,533] Trial 95 finished with value: -124.3719027482892 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.0014198536004321608, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 35.05588994284273, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 93 with value: -76.44100655116532.
-[I 2024-08-23 12:04:59,562] Trial 96 finished with value: -95.28568052794907 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.005434972462746285, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 30.215759789700954, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 93 with value: -76.44100655116532.
-[I 2024-08-23 12:04:59,591] Trial 97 finished with value: -20325.66479442037 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.9696417046589247, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 93 with value: -76.44100655116532.
-[I 2024-08-23 12:04:59,622] Trial 98 finished with value: -132.21507621375022 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.0004528978867024753, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 84.80386923876023, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 93 with value: -76.44100655116532.
-[I 2024-08-23 12:04:59,655] Trial 99 finished with value: -166.85570350846885 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.0016948043699497222, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 5.455627755557016, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 93 with value: -76.44100655116532.
-
-
-

The importance of scaling can be analysed by directly contrasting the two different studies with and without log transformation:

-
-
[76]:
-
-
-
import seaborn as sns
-
-comparison = pd.concat((default_study.trials_dataframe().assign(run=f'no transform (best ={default_study.best_value:.2f})'),
-            transformed_study.trials_dataframe().assign(run=f'transform (best ={transformed_study.best_value:.2f})')))
-
-default_reg_scoring= config.settings.scoring
-ax = sns.relplot(data=comparison, x="number", y="value",
-                 col='run',hue='params_algorithm_name',
-                 facet_kws={"sharey":False})
-ax.set(xlabel="Trial number",ylabel=f"Objective value\n({default_reg_scoring})")
-ax.tight_layout()
-
-
-
-
-
[76]:
-
-
-
-
-<seaborn.axisgrid.FacetGrid at 0x7ff3a1d3ba30>
-
-
-
-
-
-
-../_images/notebooks_QPTUNA_Tutorial_199_1.png -
-
-

This example shows the influence of log-transforming the response values onto the pXC50 scale. The non-normalised distribution of the untransformed data yields very large (negative) model evaluation scores, because evaluation metrics such as MSE are scale-dependent: the magnitude of the error is reported in the (squared) units of the response.
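The scale dependence of MSE can be illustrated with a small standalone sketch. The values below are made up for illustration and are not taken from the tutorial dataset, and the helper `to_pxc50` is a hypothetical stand-in that assumes the negated log10 transform with a unit conversion of 6, as described in the config comments of this tutorial:

```python
import math


def mse(y_true, y_pred):
    """Mean squared error of two equally long sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def to_pxc50(xc50_um, unit_conversion=6):
    """Assumed transform: pXC50 = unit_conversion - log10(XC50)."""
    return unit_conversion - math.log10(xc50_um)


# Hypothetical measurements spanning several orders of magnitude.
xc50_true = [1.0, 10.0, 100.0, 1000.0]
# Every prediction is off by the same factor of two.
xc50_pred = [2.0 * x for x in xc50_true]

raw_mse = mse(xc50_true, xc50_pred)
log_mse = mse([to_pxc50(x) for x in xc50_true],
              [to_pxc50(x) for x in xc50_pred])

print(f"MSE on the raw scale: {raw_mse:.2f}")
print(f"MSE on the log scale: {log_mse:.4f}")
```

Although every prediction has the same relative error, the raw-scale MSE is dominated by the largest values, whereas the log-scale MSE is identical for every compound. This is why the objective values of the two studies cannot be compared directly.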

-

Predictions from a model trained on log-transformed data are generated in the same way as for any other model:

-
-
[77]:
-
-
-
# Get the best Trial from the log transformed study and build the model.
-buildconfig = buildconfig_best(transformed_study)
-best_build = build_best(buildconfig, "../target/best.pkl")
-
-# generate predictions
-import pickle
-with open("../target/best.pkl", "rb") as f:
-    model = pickle.load(f)
-model.predict_from_smiles(["CCC", "CC(=O)Nc1ccc(O)cc1"])
-
-
-
-
-
[77]:
-
-
-
-
-array([1126.56968721,  120.20237903])
-
-
-

NB: The outputs have been automatically reverse-transformed at inference, back onto the original XC50 scale, as shown by the large values, which lie well outside the usual pXC50 range.

-

This is the default behaviour of Qptuna: when log transformation was applied, the reverse transform is performed at inference, so that users can act on predictions on the original input data scale. A user can override this behaviour by passing the transform parameter as None:

-
-
[78]:
-
-
-
model.predict_from_smiles(["CCC", "CC(=O)Nc1ccc(O)cc1"], transform=None)
-
-
-
-
-
[78]:
-
-
-
-
-array([2.94824194, 3.92008694])
-
-
-

This instructs Qptuna to skip the reverse transform on the predictions. The transform parameter is ignored if no transformation was applied in the user config.
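The relationship between the two sets of predictions above can be checked by hand. The sketch below assumes that the reverse transform for a negated log10 transform with a unit conversion of 6 is `10 ** (6 - pXC50)`; the function `reverse_pxc50` is a hypothetical helper written for this illustration and is not part of the Qptuna API:

```python
import math


def reverse_pxc50(p_value, unit_conversion=6, base=10.0):
    """Assumed inverse of pXC50 = unit_conversion - log_base(XC50)."""
    return base ** (unit_conversion - p_value)


# Log-scale predictions returned with transform=None (cell [78]).
log_scale_predictions = [2.94824194, 3.92008694]

# Map them back onto the original XC50 scale.
original_scale = [reverse_pxc50(p) for p in log_scale_predictions]
print(original_scale)
```

Under this assumption, the reversed values agree with the default predictions shown in cell [77] (approximately 1126.57 and 120.20).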

-

Log transformation can also be combined with the PTR (probabilistic threshold representation) transform. In this situation, all user inputs are expected to be on the untransformed scale. For example, if a user wishes to create a PTR model trained on pXC50 data with a pXC50 cut-off of 5 (10 µM), the following config can be used:

-
-
[79]:
-
-
-
ptr_config_log_transform = OptimizationConfig(
-    data=Dataset(
-        input_column="Smiles",
-        response_column="Measurement",
-        response_type="regression",
-        training_dataset_file="../tests/data/sdf/example.sdf",
-        split_strategy=Stratified(fraction=0.4),
-        deduplication_strategy=KeepMedian(),
-        log_transform=True, # Set to True to perform log transformation
-        log_transform_base=LogBase.LOG10, # Log10 base will be used
-        log_transform_negative=LogNegative.TRUE, # Negated transform for the pXC50 calculation
-        log_transform_unit_conversion=6, # 6 units used for pXC50 conversion
-        probabilistic_threshold_representation=True, # This enables PTR
-        probabilistic_threshold_representation_threshold=5, # This defines the activity threshold for 10um
-        probabilistic_threshold_representation_std=0.6, # This captures the deviation/uncertainty in the dataset
-
-    ),
-    descriptors=[
-        ECFP.new(),
-        ECFP_counts.new(),
-        MACCS_keys.new(),
-    ],
-    algorithms=[
-        SVR.new(),
-        RandomForestRegressor.new(n_estimators={"low": 5, "high": 10}),
-        Ridge.new(),
-        Lasso.new(),
-        PLSRegression.new(),
-    ],
-    settings=OptimizationConfig.Settings(
-        mode=ModelMode.REGRESSION,
-        cross_validation=3,
-        n_trials=100,
-        n_startup_trials=50,
-        direction=OptimizationDirection.MAXIMIZATION,
-        track_to_mlflow=False,
-        random_seed=42,
-    ),
-)
-
-ptr_transformed_study = optimize(ptr_config_log_transform, study_name="ptr_and_transform_example")
-
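To give an intuition for the threshold and std settings in the config above, here is a minimal sketch of a probabilistic threshold representation. It assumes that PTR maps each response value to the probability of lying above the threshold under a Gaussian with the given standard deviation; this is an illustrative assumption, not a copy of Qptuna's implementation, and `ptr_value` is a hypothetical helper:

```python
import math


def ptr_value(response, threshold, std):
    """Gaussian CDF of `response` centred on `threshold` (assumed PTR form)."""
    z = (response - threshold) / (std * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


threshold = 5.0  # pXC50 cut-off, as in the config above
std = 0.6        # assumed experimental uncertainty, as in the config above

for pxc50 in (4.0, 5.0, 6.0):
    print(pxc50, round(ptr_value(pxc50, threshold, std), 3))
```

With this form, a value exactly at the threshold maps to 0.5, and values further from the threshold approach 0 or 1. Responses bounded between 0 and 1 would also be consistent with the very small objective values reported in the trial log of this study.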
-
-
-
-
-
-
-
-[I 2024-08-23 12:05:04,279] A new study created in memory with name: ptr_and_transform_example
-[I 2024-08-23 12:05:04,320] A new study created in memory with name: study_name_0
-[I 2024-08-23 12:05:04,419] Trial 0 finished with value: -0.002341918451736244 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 6, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 5, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 0 with value: -0.002341918451736244.
-[I 2024-08-23 12:05:04,483] Trial 1 finished with value: -0.0024908979029632677 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 7, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 6, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 0 with value: -0.002341918451736244.
-[I 2024-08-23 12:05:04,526] Trial 2 finished with value: -0.007901407671048116 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 5.141096648805748, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 2.4893466963980463e-08, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 0 with value: -0.002341918451736244.
-[I 2024-08-23 12:05:04,569] Trial 3 finished with value: -0.00496231674623194 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 5, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 0 with value: -0.002341918451736244.
-[I 2024-08-23 12:05:04,585] Trial 4 finished with value: -0.0026848278110363512 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 3, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 0 with value: -0.002341918451736244.
-[I 2024-08-23 12:05:04,606] Trial 5 finished with value: -0.0010872728889471893 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.7896547008552977, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 5 with value: -0.0010872728889471893.
-[I 2024-08-23 12:05:04,624] Trial 6 finished with value: -0.008706109201510277 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.6574750183038587, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 5 with value: -0.0010872728889471893.
-[I 2024-08-23 12:05:04,640] Trial 7 finished with value: -0.008706109201510277 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.3974313630683448, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 5 with value: -0.0010872728889471893.
-[I 2024-08-23 12:05:04,704] Trial 8 finished with value: -0.002999462459688866 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 28, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 8, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 5 with value: -0.0010872728889471893.
-[I 2024-08-23 12:05:04,721] Trial 9 finished with value: -0.00825680029907454 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.2391884918766034, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 5 with value: -0.0010872728889471893.
-[I 2024-08-23 12:05:04,736] Trial 10 finished with value: -0.007901407993550248 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.00044396482429275296, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 2.3831436879125245e-10, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 5 with value: -0.0010872728889471893.
-[I 2024-08-23 12:05:04,753] Trial 11 finished with value: -0.007901405163828307 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.00028965395242758657, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 2.99928292425642e-07, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 5 with value: -0.0010872728889471893.
-[I 2024-08-23 12:05:04,769] Trial 12 finished with value: -0.0021653695362066753 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 4, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 5 with value: -0.0010872728889471893.
-[I 2024-08-23 12:05:04,787] Trial 13 finished with value: -0.002869169486971014 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 2, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 5 with value: -0.0010872728889471893.
-[I 2024-08-23 12:05:04,804] Trial 14 finished with value: -0.0010855652626111146 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.4060379177903557, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 14 with value: -0.0010855652626111146.
-[I 2024-08-23 12:05:04,867] Trial 15 finished with value: -0.005505338042993082 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 20, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 8, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 14 with value: -0.0010855652626111146.
-[I 2024-08-23 12:05:04,884] Trial 16 finished with value: -0.008706109201510277 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.344271094811757, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 14 with value: -0.0010855652626111146.
-[I 2024-08-23 12:05:04,901] Trial 17 finished with value: -0.002236800860454562 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.670604991178476, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 14 with value: -0.0010855652626111146.
-[I 2024-08-23 12:05:04,955] Trial 18 finished with value: -0.006105985607235417 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 22, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 6, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 14 with value: -0.0010855652626111146.
-[I 2024-08-23 12:05:04,971] Trial 19 finished with value: -0.008706109201510277 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.5158832554303112, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 14 with value: -0.0010855652626111146.
-[I 2024-08-23 12:05:05,101] Trial 20 finished with value: -0.004846526544994462 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 4, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 14 with value: -0.0010855652626111146.
-[I 2024-08-23 12:05:05,119] Trial 21 finished with value: -0.006964668794465202 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.0009327650919528738, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 6.062479210472502, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 14 with value: -0.0010855652626111146.
-[I 2024-08-23 12:05:05,123] Trial 22 pruned. Duplicate parameter set
-[I 2024-08-23 12:05:05,140] Trial 23 finished with value: -0.008706109201510277 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.1366172066709432, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 14 with value: -0.0010855652626111146.
-[I 2024-08-23 12:05:05,204] Trial 24 finished with value: -0.0029994624596888677 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 26, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 8, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 14 with value: -0.0010855652626111146.
-[I 2024-08-23 12:05:05,222] Trial 25 finished with value: -0.008384326901042542 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 43.92901911959232, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 27.999026012594694, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 14 with value: -0.0010855652626111146.
-[I 2024-08-23 12:05:05,238] Trial 26 finished with value: -0.001082194093844804 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.5888977841391714, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 26 with value: -0.001082194093844804.
-[I 2024-08-23 12:05:05,256] Trial 27 finished with value: -0.0010807084256204563 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.19435298754153707, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 27 with value: -0.0010807084256204563.
-[I 2024-08-23 12:05:05,321] Trial 28 finished with value: -0.006105985607235417 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 13, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 6, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 27 with value: -0.0010807084256204563.
-
-
-
-
-
-
-
-Duplicated trial: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 4, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}, return [-0.0021653695362066753]
-
-
-
-
-
-
-
-[I 2024-08-23 12:05:05,338] Trial 29 finished with value: -0.008384326901042542 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 1.6285506249643193, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 0.35441495011256785, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 27 with value: -0.0010807084256204563.
-[I 2024-08-23 12:05:05,404] Trial 30 finished with value: -0.005505338042993082 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 10, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 8, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 27 with value: -0.0010807084256204563.
-[I 2024-08-23 12:05:05,422] Trial 31 finished with value: -0.008706109201510277 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.2457809516380005, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 27 with value: -0.0010807084256204563.
-[I 2024-08-23 12:05:05,440] Trial 32 finished with value: -0.008706109201510277 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.6459129458824919, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 27 with value: -0.0010807084256204563.
-[I 2024-08-23 12:05:05,458] Trial 33 finished with value: -0.005247934991526694 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.8179058888285398, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 27 with value: -0.0010807084256204563.
-[I 2024-08-23 12:05:05,462] Trial 34 pruned. Duplicate parameter set
-[I 2024-08-23 12:05:05,480] Trial 35 finished with value: -0.0010803393728928605 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.0920052840435055, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 35 with value: -0.0010803393728928605.
-[I 2024-08-23 12:05:05,498] Trial 36 finished with value: -0.005218354425190125 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.8677032984759461, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 35 with value: -0.0010803393728928605.
-[I 2024-08-23 12:05:05,503] Trial 37 pruned. Duplicate parameter set
-[I 2024-08-23 12:05:05,521] Trial 38 finished with value: -0.004999207507691546 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.2865764368847064, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 35 with value: -0.0010803393728928605.
-[I 2024-08-23 12:05:05,589] Trial 39 finished with value: -0.0015694919308122948 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 5, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 5, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 35 with value: -0.0010803393728928605.
-[I 2024-08-23 12:05:05,658] Trial 40 finished with value: -0.0019757694194001397 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 5, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 9, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 35 with value: -0.0010803393728928605.
-
-
-
-
-
-
-
-Duplicated trial: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 4, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}, return [-0.0021653695362066753]
-Duplicated trial: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 4, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}, return [-0.004846526544994462]
-Duplicated trial: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 5, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}, return [-0.00496231674623194]
-
-
-
-
-
-
-
-[I 2024-08-23 12:05:05,664] Trial 41 pruned. Duplicate parameter set
-[I 2024-08-23 12:05:05,730] Trial 42 finished with value: -0.002341918451736245 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 25, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 5, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 35 with value: -0.0010803393728928605.
-[I 2024-08-23 12:05:05,748] Trial 43 finished with value: -0.00368328296527152 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 2, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 35 with value: -0.0010803393728928605.
-[I 2024-08-23 12:05:05,816] Trial 44 finished with value: -0.003412828259848677 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 22, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 9, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 35 with value: -0.0010803393728928605.
-[I 2024-08-23 12:05:05,836] Trial 45 finished with value: -0.004412110711416997 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 3, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 35 with value: -0.0010803393728928605.
-[I 2024-08-23 12:05:05,854] Trial 46 finished with value: -0.008706109201510277 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.6437201185807124, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 35 with value: -0.0010803393728928605.
-[I 2024-08-23 12:05:05,874] Trial 47 finished with value: -0.008384326901042542 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 82.41502276709562, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 0.10978379088847677, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 35 with value: -0.0010803393728928605.
-[I 2024-08-23 12:05:05,893] Trial 48 finished with value: -0.0021743798524909573 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.022707289534838138, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 35 with value: -0.0010803393728928605.
-[I 2024-08-23 12:05:05,911] Trial 49 finished with value: -0.0022761245849848527 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 3, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 35 with value: -0.0010803393728928605.
-[I 2024-08-23 12:05:05,934] Trial 50 finished with value: -0.0010805768178458735 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.1580741708125475, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 35 with value: -0.0010803393728928605.
-[I 2024-08-23 12:05:05,959] Trial 51 finished with value: -0.001080400188305814 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.10900413894771653, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 35 with value: -0.0010803393728928605.
-[I 2024-08-23 12:05:05,982] Trial 52 finished with value: -0.0010805009783570441 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.13705914456987853, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 35 with value: -0.0010803393728928605.
-[I 2024-08-23 12:05:06,006] Trial 53 finished with value: -0.0010804680472500541 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.12790870116376127, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 35 with value: -0.0010803393728928605.
-[I 2024-08-23 12:05:06,030] Trial 54 finished with value: -0.0010803723579987025 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.10123180962907431, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 35 with value: -0.0010803393728928605.
-[I 2024-08-23 12:05:06,053] Trial 55 finished with value: -0.001080969596032512 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.26565663774320425, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 35 with value: -0.0010803393728928605.
-[I 2024-08-23 12:05:06,076] Trial 56 finished with value: -0.0010800333715082816 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.005637048678674678, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 56 with value: -0.0010800333715082816.
-[I 2024-08-23 12:05:06,098] Trial 57 finished with value: -0.0010802574700236845 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.06902647427781451, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 56 with value: -0.0010800333715082816.
-[I 2024-08-23 12:05:06,122] Trial 58 finished with value: -0.0010814994986419817 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.4076704953178294, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 56 with value: -0.0010800333715082816.
-[I 2024-08-23 12:05:06,147] Trial 59 finished with value: -0.001080161136846237 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.04187106800188596, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 56 with value: -0.0010800333715082816.
-[I 2024-08-23 12:05:06,171] Trial 60 finished with value: -0.0010800254136811547 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.003371853599610078, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 60 with value: -0.0010800254136811547.
-[I 2024-08-23 12:05:06,197] Trial 61 finished with value: -0.0010801290036870739 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.032781796328385376, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 60 with value: -0.0010800254136811547.
-[I 2024-08-23 12:05:06,220] Trial 62 finished with value: -0.001080037482216557 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.006806773659187283, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 60 with value: -0.0010800254136811547.
-[I 2024-08-23 12:05:06,245] Trial 63 finished with value: -0.0010801015705851358 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.025009489814943348, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 60 with value: -0.0010800254136811547.
-[I 2024-08-23 12:05:06,270] Trial 64 finished with value: -0.0010812122378841013 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.3311125627707556, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 60 with value: -0.0010800254136811547.
-[I 2024-08-23 12:05:06,295] Trial 65 finished with value: -0.0010800531021304936 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.011249102380159387, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 60 with value: -0.0010800254136811547.
-[I 2024-08-23 12:05:06,320] Trial 66 finished with value: -0.00108004162698813 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.007985924302396141, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 60 with value: -0.0010800254136811547.
-[I 2024-08-23 12:05:06,345] Trial 67 finished with value: -0.0010800223466649803 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.00249856291483601, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 67 with value: -0.0010800223466649803.
-[I 2024-08-23 12:05:06,370] Trial 68 finished with value: -0.0010815197263834202 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.4130244908975993, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 67 with value: -0.0010800223466649803.
-[I 2024-08-23 12:05:06,394] Trial 69 finished with value: -0.0010800257029027847 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.0034541978803366022, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 67 with value: -0.0010800223466649803.
-[I 2024-08-23 12:05:06,418] Trial 70 finished with value: -0.0010810223438672223 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.27994943662091765, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 67 with value: -0.0010800223466649803.
-[I 2024-08-23 12:05:06,442] Trial 71 finished with value: -0.0010800211339555509 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.0021532199144365088, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.0010800211339555509.
-[I 2024-08-23 12:05:06,467] Trial 72 finished with value: -0.0010800296871141684 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.0045884092728113585, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.0010800211339555509.
-[I 2024-08-23 12:05:06,492] Trial 73 finished with value: -0.0010800437739166451 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.008596600952859433, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.0010800211339555509.
-[I 2024-08-23 12:05:06,517] Trial 74 finished with value: -0.0010809366267195716 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.2567049271070902, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.0010800211339555509.
-[I 2024-08-23 12:05:06,540] Trial 75 finished with value: -0.001080725386603206 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.1990111983307052, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.0010800211339555509.
-[I 2024-08-23 12:05:06,565] Trial 76 finished with value: -0.0010807368035830652 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.20214459724424078, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.0010800211339555509.
-[I 2024-08-23 12:05:06,590] Trial 77 finished with value: -0.0010800236072155854 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.00285750520671645, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.0010800211339555509.
-[I 2024-08-23 12:05:06,616] Trial 78 finished with value: -0.0010806223050773966 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.17064008990759916, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.0010800211339555509.
-[I 2024-08-23 12:05:06,642] Trial 79 finished with value: -0.0010876516369772728 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.8725420109733135, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.0010800211339555509.
-[I 2024-08-23 12:05:06,669] Trial 80 finished with value: -0.00108142358144501 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.387533542012365, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.0010800211339555509.
-[I 2024-08-23 12:05:06,694] Trial 81 finished with value: -0.0010800248050489667 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.0031985656730512953, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.0010800211339555509.
-[I 2024-08-23 12:05:06,719] Trial 82 finished with value: -0.001080022268085466 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.002476186542950981, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.0010800211339555509.
-[I 2024-08-23 12:05:06,745] Trial 83 finished with value: -0.0010820922958715991 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.5626643670396761, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.0010800211339555509.
-[I 2024-08-23 12:05:06,771] Trial 84 finished with value: -0.0010805094397523254 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.1394077979875128, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.0010800211339555509.
-[I 2024-08-23 12:05:06,798] Trial 85 finished with value: -0.0010841993753324146 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.0858347526799794, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.0010800211339555509.
-[I 2024-08-23 12:05:06,826] Trial 86 finished with value: -0.007899735988203994 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.03329943145150872, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 0.00025672309762227527, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.0010800211339555509.
-[I 2024-08-23 12:05:06,853] Trial 87 finished with value: -0.0010868762004637347 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.702026434077893, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.0010800211339555509.
-[I 2024-08-23 12:05:06,879] Trial 88 finished with value: -0.001080400750193767 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.10916094511173127, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.0010800211339555509.
-[I 2024-08-23 12:05:06,907] Trial 89 finished with value: -0.0010806791616300314 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.18630665884100353, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.0010800211339555509.
-[I 2024-08-23 12:05:06,934] Trial 90 finished with value: -0.0010804028029753213 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.10973377642487026, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.0010800211339555509.
-[I 2024-08-23 12:05:06,962] Trial 91 finished with value: -0.0010800812188506515 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.019235980282946118, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.0010800211339555509.
-[I 2024-08-23 12:05:06,989] Trial 92 finished with value: -0.0010800299598580359 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.004666043957133775, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.0010800211339555509.
-[I 2024-08-23 12:05:07,017] Trial 93 finished with value: -0.0010803843696362083 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.1045877457096882, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.0010800211339555509.
-[I 2024-08-23 12:05:07,047] Trial 94 finished with value: -0.001080333048974234 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.09023455456986404, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.0010800211339555509.
-[I 2024-08-23 12:05:07,073] Trial 95 finished with value: -0.008706109201510277 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.8200088368788958, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 71 with value: -0.0010800211339555509.
-[I 2024-08-23 12:05:07,102] Trial 96 finished with value: -0.001080014645182176 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.00030502148265565063, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 96 with value: -0.001080014645182176.
-[I 2024-08-23 12:05:07,129] Trial 97 finished with value: -0.0010807968027851892 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.21858260742423916, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 96 with value: -0.001080014645182176.
-[I 2024-08-23 12:05:07,161] Trial 98 finished with value: -0.007907028395366658 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.024725853754515203, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 0.0011658455138452, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 96 with value: -0.001080014645182176.
-[I 2024-08-23 12:05:07,188] Trial 99 finished with value: -0.0010803563024666294 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.0967427718847167, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 96 with value: -0.001080014645182176.
-
-
-

Analysis of the study is performed in the same manner as above:

-
-
[80]:
-
-
-
sns.set_theme(style="darkgrid")
-default_reg_scoring= config.settings.scoring
-ax = sns.scatterplot(data=ptr_transformed_study.trials_dataframe(), x="number",
-                     y="value",style='params_algorithm_name',hue='params_algorithm_name')
-ax.set(xlabel="Trial number",ylabel=f"Objective value\n({default_reg_scoring})")
-sns.move_legend(ax, "upper right", bbox_to_anchor=(1.6, 1), ncol=1, title="")
-
-
-
-
-
-
-
-../_images/notebooks_QPTUNA_Tutorial_207_0.png -
-
-

In contrast to log-scaled models trained without the PTR transform, log-transformed models trained with PTR functions will always output the probabilistic class membership likelihoods from the PTR function:

-
-
[81]:
-
-
-
# Get the best Trial from the log transformed study and build the model.
-buildconfig = buildconfig_best(ptr_transformed_study)
-best_build = build_best(buildconfig, "../target/best.pkl")
-
-# generate predictions
-import pickle
-with open("../target/best.pkl", "rb") as f:
-    model = pickle.load(f)
-model.predict_from_smiles(["CCC"], transform=None)
-
-
-
-
-
[81]:
-
-
-
-
-array([0.3506154])
-
-
-

As with log-scaled models trained without the PTR transform, log-transformed models trained with PTR functions will reverse both the probabilistic class membership likelihoods from the PTR function and the subsequent log transform from any log scaling, bringing predictions back in line with the original data:

-
-
-
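The round trip described above can be sketched in plain Python. This is an illustration only, not Qptuna's implementation: it assumes the PTR function is a cumulative normal centred on a threshold, that the log scaling is a base-10 logarithm, and the `THRESHOLD` and `STD` values are made up for the example.

```python
from statistics import NormalDist
import math

# Illustrative values only; neither is taken from the tutorial.
THRESHOLD = 2.0   # assumed decision threshold on the log10 scale
STD = 0.6         # assumed experimental uncertainty on the log10 scale

# Assumption: the PTR function is modelled here as a cumulative normal.
ptr = NormalDist(mu=THRESHOLD, sigma=STD)

def forward(y):
    """Log-scale the raw value, then map it to a class-membership likelihood."""
    return ptr.cdf(math.log10(y))

def reverse(p):
    """Undo the likelihood mapping, then undo the log scaling."""
    return 10 ** ptr.inv_cdf(p)

raw = 250.0
likelihood = forward(raw)
print(round(likelihood, 3))           # likelihood of exceeding the threshold
print(round(reverse(likelihood), 3))  # recovers the raw value
```

A value that sits exactly on the threshold maps to a likelihood of 0.5, and reversing applies the two inverse steps in the opposite order to the forward transform.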

Covariate modelling

-
-

Modelling one simple covariate, e.g. dose or time point

-

A covariate, such as dose or time point, can be used as an auxiliary descriptor to account for its effect on predictions. In this situation, a compound can appear more than once, once for each of n distinct covariate measurements, and the response value at each covariate level is used to train the algorithm. Replicates of a compound-covariate pair may be deduplicated using the standard deduplication approaches.
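A minimal sketch of the idea, independent of Qptuna: the covariate is appended to each compound's structural descriptor as one extra feature, and replicates of the same compound-covariate pair are merged first. The data, the `toy_descriptor` helper and the choice of the median for deduplication are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import median

# Hypothetical measurements: (compound, covariate such as dose, response).
rows = [
    ("CCO", 1.0, 0.20),
    ("CCO", 1.0, 0.30),   # replicate of the same compound-covariate pair
    ("CCO", 10.0, 0.80),
    ("CCN", 1.0, 0.10),
]

def toy_descriptor(smiles):
    """Stand-in for a real descriptor such as ECFP: counts of C, N and O."""
    return [float(smiles.count(atom)) for atom in "CNO"]

# Deduplicate replicates per (compound, covariate) pair by taking the median.
grouped = defaultdict(list)
for smiles, covariate, response in rows:
    grouped[(smiles, covariate)].append(response)

# Append the covariate to the structural descriptor as one auxiliary feature.
X = [toy_descriptor(smiles) + [covariate] for smiles, covariate in grouped]
y = [median(values) for values in grouped.values()]

for features, target in zip(X, y):
    print(features, target)
```

The same compound now contributes one training row per covariate level, so the algorithm can learn how the response changes with the covariate as well as with structure.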

-

To enable this in Qptuna, set aux_column to the name of the column that holds the covariate to be modelled, like so:

-
-
[82]:
-
-
-
aux_col_config = OptimizationConfig(
-    data=Dataset(
-        input_column="canonical",
-        response_column="molwt",
-        response_type="regression",
-        training_dataset_file="../tests/data/aux_descriptors_datasets/train_with_conc.csv",
-        aux_column="aux1",  # use column aux1 as a covariate in modelling
-    ),
-    descriptors=[
-        ECFP.new(),
-        ECFP_counts.new(),
-        MACCS_keys.new(),
-    ],
-    algorithms=[
-        SVR.new(),
-        RandomForestRegressor.new(n_estimators={"low": 5, "high": 10}),
-        Ridge.new(),
-        Lasso.new(),
-        PLSRegression.new(),
-    ],
-    settings=OptimizationConfig.Settings(
-        mode=ModelMode.REGRESSION,
-        cross_validation=2,
-        n_trials=10,
-        random_seed=42,
-        direction=OptimizationDirection.MAXIMIZATION,
-    ),
-)
-
-aux_col_study = optimize(aux_col_config, study_name="covariate_example")
-build_best(buildconfig_best(aux_col_study), "../target/aux1_model.pkl")
-with open("../target/aux1_model.pkl", "rb") as f:
-    aux1_model = pickle.load(f)
-
-
-
-
-
-
-
-
-[I 2024-08-23 12:05:09,904] A new study created in memory with name: covariate_example
-[I 2024-08-23 12:05:09,946] A new study created in memory with name: study_name_0
-[I 2024-08-23 12:05:10,070] Trial 0 finished with value: -5186.76766395672 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 6, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 5, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 0 with value: -5186.76766395672.
-[I 2024-08-23 12:05:10,138] Trial 1 finished with value: -4679.740824270968 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 7, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 6, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 1 with value: -4679.740824270968.
-[I 2024-08-23 12:05:10,207] Trial 2 finished with value: -4890.6705099499995 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 5.141096648805748, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 2.4893466963980463e-08, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 1 with value: -4679.740824270968.
-[I 2024-08-23 12:05:10,276] Trial 3 finished with value: -3803.9324375833753 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 5, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 3 with value: -3803.9324375833753.
-[I 2024-08-23 12:05:10,291] Trial 4 finished with value: -3135.6497388676926 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 3, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 4 with value: -3135.6497388676926.
-[I 2024-08-23 12:05:10,310] Trial 5 finished with value: -551.2518812859375 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.7896547008552977, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 5 with value: -551.2518812859375.
-[I 2024-08-23 12:05:10,330] Trial 6 finished with value: -4309.124112370974 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.6574750183038587, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 5 with value: -551.2518812859375.
-[I 2024-08-23 12:05:10,357] Trial 7 finished with value: -362.30159424580074 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.3974313630683448, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 7 with value: -362.30159424580074.
-[I 2024-08-23 12:05:10,419] Trial 8 finished with value: -4357.02827013125 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 28, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 8, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 7 with value: -362.30159424580074.
-[I 2024-08-23 12:05:10,458] Trial 9 finished with value: -386.1437929337522 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.2391884918766034, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 7 with value: -362.30159424580074.
-
-
-

Predictions from a covariate-trained model can now be generated like so:

-
-
[83]:
-
-
-
aux1_model.predict_from_smiles(["CCC", "CCC"], aux=[10,5])
-
-
-
-
-
[83]:
-
-
-
-
-array([52.45281013, 52.45281013])
-
-
-

Here the aux parameter of predict_from_smiles is required: it supplies the auxiliary covariate query for the inputs. The aux query must have the same shape as the SMILES input query, otherwise a ValueError will be raised.

-

So, for this toy example query, the predictions are for the SMILES CCC with two separate auxiliary covariate queries of 10 and 5.

-

N.B.: For this particular toy training example, the molecular weight response column (molwt) is the same regardless of the modelled covariate value, and so the predictions are the same regardless of the aux query, as expected.
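The shape requirement on aux can be illustrated with a small stand-alone check. This is only a sketch: check_aux_matches_smiles is a hypothetical helper written for this tutorial, not part of Qptuna, and the real check inside predict_from_smiles may differ in detail.

```python
from typing import Sequence


def check_aux_matches_smiles(smiles: Sequence[str], aux: Sequence) -> None:
    """Hypothetical helper: raise ValueError if aux and smiles differ in length."""
    if len(aux) != len(smiles):
        raise ValueError(
            f"aux has {len(aux)} entries but {len(smiles)} SMILES were given"
        )


# One covariate value per SMILES: passes silently.
check_aux_matches_smiles(["CCC", "CCC"], [10, 5])

# Mismatched lengths: rejected before any prediction would be attempted.
try:
    check_aux_matches_smiles(["CCC", "CCC"], [10])
    mismatch_error = None
except ValueError as err:
    mismatch_error = str(err)

print(mismatch_error)
```

Running a check like this before calling the model gives a clearer error message when a covariate column has been filtered or reordered independently of the SMILES column.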

-
-
-

Transformation of co-variates: Proteochemometric (PCM) modelling + more

-
-

VectorFromColumn

-

In order to utilise more than one type of covariate value at a time, an auxiliary transformation must be applied to process the co-variates into the form expected by the algorithms.

-

Pre-computed covariates can be processed using VectorFromColumn. As with pre-computed descriptors, VectorFromColumn splits each covariate entry on commas, like so:

-
-
[84]:
-
-
-
from optunaz.utils.preprocessing.transform import VectorFromColumn
-
-vector_covariate_config = OptimizationConfig(
-    data=Dataset(
-        input_column="canonical",
-        response_column="molwt",
-        response_type="regression",
-        training_dataset_file="../tests/data/precomputed_descriptor/train_with_fp.csv",
-        aux_column="fp", # use a comma separated co-variate vector in column `fp`
-        aux_transform=VectorFromColumn.new(), # split the comma separated values into a vector
-        split_strategy=Stratified(fraction=0.2),
-    ),
-    descriptors=[
-        ECFP.new(),
-        ECFP_counts.new(),
-        MACCS_keys.new(),
-    ],
-    algorithms=[
-        SVR.new(),
-        RandomForestRegressor.new(n_estimators={"low": 5, "high": 10}),
-        Ridge.new(),
-        Lasso.new(),
-        PLSRegression.new(),
-    ],
-    settings=OptimizationConfig.Settings(
-        mode=ModelMode.REGRESSION,
-        cross_validation=3,
-        n_trials=10,
-        n_startup_trials=0,
-        direction=OptimizationDirection.MAXIMIZATION,
-        track_to_mlflow=False,
-        random_seed=42,
-    ),
-)
-
-vector_covariate_study = optimize(vector_covariate_config, study_name="vector_aux_example")
-build_best(buildconfig_best(vector_covariate_study), "../target/vector_covariate_model.pkl")
-with open("../target/vector_covariate_model.pkl", "rb") as f:
-    vector_covariate_model = pickle.load(f)
-
-
-
-
-
-
-
-
-[I 2024-08-23 12:05:10,735] A new study created in memory with name: vector_aux_example
-[I 2024-08-23 12:05:10,776] A new study created in memory with name: study_name_0
-[I 2024-08-23 12:05:10,857] Trial 0 finished with value: -2200.6817959410578 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.011994365911634164, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 0 with value: -2200.6817959410578.
-[I 2024-08-23 12:05:10,880] Trial 1 finished with value: -2200.95660880078 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.029071783512897825, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}. Best is trial 0 with value: -2200.6817959410578.
-[I 2024-08-23 12:05:10,940] Trial 2 finished with value: -5798.564494725643 and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.022631709120790048, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 6.2198637677605415, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 0 with value: -2200.6817959410578.
-[I 2024-08-23 12:05:10,987] Trial 3 finished with value: -972.2899178898048 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.8916194399474267, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 3 with value: -972.2899178898048.
-[I 2024-08-23 12:05:11,022] Trial 4 finished with value: -647.3336440433073 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.5914093983615214, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 4 with value: -647.3336440433073.
-[I 2024-08-23 12:05:11,050] Trial 5 finished with value: -653.3036472748931 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.6201811079699818, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 4 with value: -647.3336440433073.
-[I 2024-08-23 12:05:11,068] Trial 6 finished with value: -3807.8035919667395 and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 4, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 4 with value: -647.3336440433073.
-/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qptuna-_QsKTRFT-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 3.986e+01, tolerance: 1.914e+01
-  model = cd_fast.enet_coordinate_descent(
-/Users/kljk345/Library/Caches/pypoetry/virtualenvs/qptuna-_QsKTRFT-py3.10/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:678: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 4.901e+01, tolerance: 1.892e+01
-  model = cd_fast.enet_coordinate_descent(
-[I 2024-08-23 12:05:11,150] Trial 7 finished with value: -5019.459500770764 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.1376436589359351, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}. Best is trial 4 with value: -647.3336440433073.
-[I 2024-08-23 12:05:11,223] Trial 8 finished with value: -2756.4017711284796 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 25, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 6, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 4 with value: -647.3336440433073.
-[I 2024-08-23 12:05:11,243] Trial 9 finished with value: -771.797115414836 and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.74340620175102, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}. Best is trial 4 with value: -647.3336440433073.
-
-
-
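The comma splitting described above can be sketched in a few lines of plain Python. The helper split_covariate_cell and the toy fp_column are assumptions made for illustration; they show the idea behind the transform rather than the VectorFromColumn implementation itself.

```python
from typing import List


def split_covariate_cell(cell: str) -> List[float]:
    """Hypothetical helper: turn one comma-separated cell into a list of floats."""
    return [float(value) for value in cell.split(",")]


# Three rows of an imaginary `fp` column, each holding a 4-element vector.
fp_column = ["0,0,1,0", "1,0,0,1", "1,1,0,0"]
aux_matrix = [split_covariate_cell(cell) for cell in fp_column]

# Every row should expand to the same number of covariate values.
n_rows, n_cols = len(aux_matrix), len(aux_matrix[0])
print(n_rows, n_cols)
```

Each row of the column becomes one row of the auxiliary matrix, so a column of N comma-separated vectors of length M yields an N by M array.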

We can inspect the input query for the auxiliary co-variates used in the modelling like so:

-
-
[85]:
-
-
-
train_smiles, train_y, train_aux, test_smiles, test_y, test_aux = vector_covariate_config.data.get_sets()
-
-train_aux, train_aux.shape
-
-
-
-
-
[85]:
-
-
-
-
-(array([[0., 0., 0., ..., 0., 0., 0.],
-        [1., 0., 0., ..., 1., 0., 0.],
-        [1., 0., 0., ..., 1., 0., 1.],
-        ...,
-        [1., 1., 0., ..., 0., 0., 1.],
-        [1., 0., 0., ..., 0., 0., 0.],
-        [1., 0., 1., ..., 0., 0., 0.]]),
- (40, 512))
-
-
-

For this toy example, co-variate vectors of length 512 are used for each of the 40 training instances. Inference for the model can be performed on the test set like so:

-
-
[86]:
-
-
-
vector_covariate_model.predict_from_smiles(test_smiles, aux=test_aux)
-
-
-
-
-
[86]:
-
-
-
-
-array([454.39754917, 465.06352766, 340.52031134, 341.89875316,
-       371.5516046 , 389.85042171, 436.33406203, 504.91439129,
-       237.80585907, 346.48565041])
-
-
-
-
-

Z-Scales (for PCM)

-

Proteochemometric modelling (PCM) is the term used for the approach of training on protein descriptors as a distinct input space alongside the chemical ones. This can be performed in Qptuna by providing Z-Scales as an auxiliary transformation to a user input column containing sequence information. Protein sequences are transformed to Z-Scales based on this publication using the Peptides Python package.

-

N.B. Z-Scales as covariates are distinct from the ZScales descriptor: the former treats Z-Scales as a separate input parameter (for PCM modelling), whereas the latter treats them as a descriptor trial that may or may not be selected during optimisation (e.g. for protein-peptide interaction modelling). In other words, when applied as a covariate, Z-Scales will always be part of the input descriptor, and duplicates are treated on a compound-ZScale pair basis.

-

Now let us consider the following toy data set file:

-
-
[87]:
-
-
-
!head -n 5 ../tests/data/peptide/toxinpred3/train.csv
-
-
-
-
-
-
-
-
-Peptide,Class,Smiles
-MDLITITWASVMVAFTFSLSLVVWGRSGL,0,N[C@@H](CCSC)C(=O)N[C@@H](CC(=O)O)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H]([C@H](CC)C)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H]([C@H](CC)C)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](Cc1c[nH]c2ccccc12)C(=O)N[C@@H](C)C(=O)N[C@@H](CO)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](CCSC)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](C)C(=O)N[C@@H](Cc1ccccc1)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](Cc1ccccc1)C(=O)N[C@@H](CO)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CO)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](Cc1c[nH]c2ccccc12)C(=O)NCC(=O)N[C@@H](CCCNC(=N)N)C(=O)N[C@@H](CO)C(=O)NCC(=O)N[C@@H](CC(C)C)C(=O)O
-ARRGGVLNFGQFGLQALECGFVTNR,0,N[C@@H](C)C(=O)N[C@@H](CCCNC(=N)N)C(=O)N[C@@H](CCCNC(=N)N)C(=O)NCC(=O)NCC(=O)N[C@@H](C(C)C)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CC(=O)N)C(=O)N[C@@H](Cc1ccccc1)C(=O)NCC(=O)N[C@@H](CCC(=O)N)C(=O)N[C@@H](Cc1ccccc1)C(=O)NCC(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCC(=O)N)C(=O)N[C@@H](C)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCC(=O)O)C(=O)N[C@@H](CS)C(=O)NCC(=O)N[C@@H](Cc1ccccc1)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CC(=O)N)C(=O)N[C@@H](CCCNC(=N)N)C(=O)O
-GWCGDPGATCGKLRLYCCSGACDCYTKTCKDKSSA,1,NCC(=O)N[C@@H](Cc1c[nH]c2ccccc12)C(=O)N[C@@H](CS)C(=O)NCC(=O)N[C@@H](CC(=O)O)C(=O)N1[C@@H](CCC1)C(=O)NCC(=O)N[C@@H](C)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CS)C(=O)NCC(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCCNC(=N)N)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](Cc1ccc(O)cc1)C(=O)N[C@@H](CS)C(=O)N[C@@H](CS)C(=O)N[C@@H](CO)C(=O)NCC(=O)N[C@@H](C)C(=O)N[C@@H](CS)C(=O)N[C@@H](CC(=O)O)C(=O)N[C@@H](CS)C(=O)N[C@@H](Cc1ccc(O)cc1)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CS)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CC(=O)O)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CO)C(=O)N[C@@H](CO)C(=O)N[C@@H](C)C(=O)O
-NGNLLGGLLRPVLGVVKGLTGGLGKK,1,N[C@@H](CC(=O)N)C(=O)NCC(=O)N[C@@H](CC(=O)N)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CC(C)C)C(=O)NCC(=O)NCC(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCCNC(=N)N)C(=O)N1[C@@H](CCC1)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](CC(C)C)C(=O)NCC(=O)N[C@@H](C(C)C)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](CCCCN)C(=O)NCC(=O)N[C@@H](CC(C)C)C(=O)N[C@@H]([C@@H](C)O)C(=O)NCC(=O)NCC(=O)N[C@@H](CC(C)C)C(=O)NCC(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CCCCN)C(=O)O
-
-
-

The following example demonstrates how Z-Scales may be utilised for PCM by specifying the ZScales data transform on the “Peptide” column containing our peptide sequence, like so:

-
-
[88]:
-
-
-
from optunaz.utils.preprocessing.transform import ZScales
-from optunaz.config.optconfig import KNeighborsClassifier
-
-zscale_covariate_config = OptimizationConfig(
-    data=Dataset(
-        input_column="Smiles",
-        response_column="Class",
-        response_type="classification",
-        training_dataset_file="../tests/data/peptide/toxinpred3/train.csv",
-        aux_column="Peptide", # Name of the column containing peptide/protein amino acid sequence
-        aux_transform=ZScales.new(), # Zscales transform is used to transform sequence into a Z-scales vector
-        split_strategy=Stratified(fraction=0.2),
-    ),
-    descriptors=[
-        ECFP.new(nBits=128),
-    ],
-    algorithms=[
-        KNeighborsClassifier.new(),
-    ],
-    settings=OptimizationConfig.Settings(
-        mode=ModelMode.CLASSIFICATION,
-        cross_validation=2,
-        n_trials=1,
-        n_startup_trials=0,
-        direction=OptimizationDirection.MAXIMIZATION,
-        track_to_mlflow=False,
-        random_seed=42,
-    ),
-)
-
-zscale_covariate_study = optimize(zscale_covariate_config, study_name="zscale_aux_example")
-build_best(buildconfig_best(zscale_covariate_study), "../target/zscale_covariate_model.pkl")
-with open("../target/zscale_covariate_model.pkl", "rb") as f:
-    zscale_covariate_model = pickle.load(f)
-
-
-
-
-
-
-
-
-[I 2024-08-23 12:05:15,425] A new study created in memory with name: zscale_aux_example
-[I 2024-08-23 12:05:15,477] A new study created in memory with name: study_name_0
-[I 2024-08-23 12:05:43,385] Trial 0 finished with value: 0.8735224395254063 and parameters: {'algorithm_name': 'KNeighborsClassifier', 'KNeighborsClassifier_algorithm_hash': 'e51ca55089f389fc37a736adb2aa0e42', 'metric__e51ca55089f389fc37a736adb2aa0e42': <KNeighborsMetric.MINKOWSKI: 'minkowski'>, 'n_neighbors__e51ca55089f389fc37a736adb2aa0e42': 5, 'weights__e51ca55089f389fc37a736adb2aa0e42': <KNeighborsWeights.UNIFORM: 'uniform'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 128, "returnRdkit": false}}'}. Best is trial 0 with value: 0.8735224395254063.
-
-
-

N.B. Unlike the ZScales descriptor (which works at the SMILES level of a peptide/protein), the ZScales data transform expects amino acid sequences as input.

-

We can inspect the input query for the auxiliary co-variates used in the modelling like so:

-
-
[89]:
-
-
-
train_smiles, train_y, train_aux, test_smiles, test_y, test_aux = zscale_covariate_config.data.get_sets()
-
-train_aux, train_aux.shape
-
-
-
-
-
[89]:
-
-
-
-
-(array([[ 1.31176471,  0.08058824, -0.27176471,  0.56470588, -0.62529412],
-        [-0.99521739, -0.59826087, -0.34695652, -0.03086957,  0.13391304],
-        [ 0.08083333, -0.6125    ,  0.82916667, -0.05083333, -0.56083333],
-        ...,
-        [ 0.93357143, -0.02785714, -0.04214286, -0.36      , -0.02785714],
-        [ 0.30461538, -0.55307692,  0.31307692, -0.11076923,  0.00846154],
-        [-0.1232    , -0.3364    ,  0.2328    , -0.1368    ,  0.2304    ]]),
- (7060, 5))
-
-
-
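The fixed width of 5 seen above, regardless of peptide length, is consistent with each scale being averaged over the residues of a sequence. The sketch below illustrates that idea only: the TOY_SCALES table holds placeholder numbers (not the published Z-scale values), mean_scales is a hypothetical helper, and per-residue averaging is an assumption about how the transform behaves.

```python
from typing import Dict, List, Tuple

# PLACEHOLDER values for illustration only; these are NOT the published Z-scales.
TOY_SCALES: Dict[str, Tuple[float, ...]] = {
    "A": (1.0, 0.0, -1.0, 0.5, 0.0),
    "G": (3.0, 2.0, 1.0, -0.5, 0.0),
}


def mean_scales(sequence: str) -> List[float]:
    """Average each of the five per-residue scales over the whole sequence."""
    per_residue = [TOY_SCALES[residue] for residue in sequence]
    return [sum(column) / len(sequence) for column in zip(*per_residue)]


# Sequences of different lengths map to vectors of the same length.
short_vector = mean_scales("AG")
long_vector = mean_scales("AAGGAG")
print(short_vector, len(long_vector))
```

Because the result is an average, sequences of any length produce a vector of the same size, which is what allows it to be appended to a fixed-length fingerprint.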

For this toy example, 7060 training instances are used, each with the expected 5 Z-Scale co-variate descriptors. Inference for the model can be performed on the test set by providing the auxiliary co-variate Z-Scales like so:

-
-
[90]:
-
-
-
zscale_covariate_model.predict_from_smiles(test_smiles, aux=test_aux)
-
-
-
-
-
[90]:
-
-
-
-
-array([0.2, 0. , 0.8, ..., 0.2, 0.2, 0. ])
-
-
-

We may also inspect the X-matrix (descriptor) used to train the toy model like so:

-
-
[91]:
-
-
-
ax = sns.heatmap(zscale_covariate_model.predictor.X_,
-            vmin=-1, vmax=1, cmap='Spectral',
-            cbar_kws={'label': 'Fingerprint value'})
-ax.set(ylabel="Compound input", xlabel="Input descriptor (128bit ECFP & Z-Scale)");
-
-
-
-
-
-
-
-../_images/notebooks_QPTUNA_Tutorial_236_0.png -
-
-

Note that the (continuous) Z-scales covariates can be seen in the final five columns (129-133) after the 128bit ECFP fingerprints used in this example.
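The layout of one row of this X-matrix can be sketched as a simple concatenation. The fingerprint and Z-scale values below are placeholders; only the two widths (128 ECFP bits and 5 Z-scales) come from the example above, and the assumption is that the covariates are appended after the descriptor.

```python
N_ECFP_BITS = 128  # matches ECFP.new(nBits=128) in the example above
N_ZSCALES = 5      # matches the (7060, 5) shape of train_aux

# Hypothetical single compound: an all-zero fingerprint and placeholder Z-scale values.
ecfp_bits = [0.0] * N_ECFP_BITS
zscale_values = [0.1, -0.2, 0.3, -0.4, 0.5]

# Descriptor first, covariates appended at the end of the row.
x_row = ecfp_bits + zscale_values

# 1-based positions of the covariate columns within the row.
covariate_columns = list(range(N_ECFP_BITS + 1, len(x_row) + 1))
print(len(x_row), covariate_columns)
```

Under these assumptions, the same arithmetic gives the covariate column positions for any other fingerprint size or covariate width.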

-
-
-
-
-

Advanced options for Qptuna runs

-
-

Multi-objective prioritization of performance and standard deviation

-

Qptuna can additionally optimize for the minimization of the standard deviation of performance across the cross-validation folds. This should in theory prioritize hyperparameters that perform consistently across different splits of the data, and so should be more generalizable in production. This can be enabled with the minimise_std_dev setting, as in the example below:

-
-
[92]:
-
-
-
config = OptimizationConfig(
-    data=Dataset(
-        input_column="Smiles",
-        response_column="pXC50",
-        response_type="regression",
-        training_dataset_file="../tests/data/sdf/example.sdf",
-    ),
-    descriptors=[
-        ECFP.new(),
-        ECFP_counts.new(),
-        MACCS_keys.new(),
-        SmilesFromFile.new(),
-    ],
-    algorithms=[
-        SVR.new(),
-        RandomForestRegressor.new(n_estimators={"low": 5, "high": 10}),
-        Ridge.new(),
-        Lasso.new(),
-        PLSRegression.new(),
-        ChemPropRegressor.new(epochs=5),
-    ],
-    settings=OptimizationConfig.Settings(
-        mode=ModelMode.REGRESSION,
-        cross_validation=3,
-        n_trials=25,
-        n_startup_trials=25,
-        direction=OptimizationDirection.MAXIMIZATION,
-        track_to_mlflow=False,
-        random_seed=42,
-        n_chemprop_trials=3,
-        minimise_std_dev=True # Multi-objective optimization for performance and std. dev.
-    ),
-)
-
-study = optimize(config, study_name="example_multi-parameter_analysis")
-default_reg_scoring = config.settings.scoring
-study.set_metric_names([default_reg_scoring.value,'Standard deviation']) # Set the names of the multi-parameters
-
-
-
-
-
-
-
-
-[I 2024-08-23 12:06:30,245] A new study created in memory with name: example_multi-parameter_analysis
-[I 2024-08-23 12:06:30,286] A new study created in memory with name: study_name_0
-[I 2024-08-23 12:06:30,600] Trial 0 finished with values: [-1.4008740644240856, 0.9876203329634794] and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 6, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 5, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}.
-[I 2024-08-23 12:06:30,673] Trial 1 finished with values: [-1.3561484909673425, 0.9875061220991905] and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 7, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 6, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}.
-[I 2024-08-23 12:06:30,725] Trial 2 finished with values: [-2.7856521165563053, 0.21863029956806662] and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 5.141096648805748, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 2.4893466963980463e-08, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}.
-[I 2024-08-23 12:06:30,831] Trial 3 finished with values: [-0.9125905675311808, 0.7861693342190089] and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 5, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}.
-[I 2024-08-23 12:06:30,850] Trial 4 finished with values: [-0.5238765412750027, 0.2789424384877304] and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 3, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}.
-[I 2024-08-23 12:06:30,873] Trial 5 finished with values: [-0.5348363849100434, 0.5741725628917808] and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.7896547008552977, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}.
-[I 2024-08-23 12:06:30,894] Trial 6 finished with values: [-2.0072511048320134, 0.2786318125997387] and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.6574750183038587, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}.
-[I 2024-08-23 12:06:30,911] Trial 7 finished with values: [-0.9625764609276656, 0.27575381401822424] and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.3974313630683448, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}.
-[I 2024-08-23 12:06:30,973] Trial 8 finished with values: [-1.1114006274062536, 0.7647766019001522] and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 28, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 8, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}.
-[I 2024-08-23 12:06:30,990] Trial 9 finished with values: [-0.7801680863916906, 0.2725738454485389] and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.2391884918766034, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}.
-[I 2024-08-23 12:06:31,006] Trial 10 finished with values: [-2.785652116470164, 0.21863029955530786] and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.00044396482429275296, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 2.3831436879125245e-10, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}.
-[I 2024-08-23 12:06:31,060] Trial 11 finished with values: [-2.785651973436432, 0.21863032832257323] and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.00028965395242758657, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 2.99928292425642e-07, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}.
-[I 2024-08-23 12:06:31,075] Trial 12 finished with values: [-0.6101359993004856, 0.3011280543457062] and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 4, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}.
-[I 2024-08-23 12:06:31,092] Trial 13 finished with values: [-0.5361950698070447, 0.23560786523195643] and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 2, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}.
-[I 2024-08-23 12:06:31,108] Trial 14 finished with values: [-0.5356113574175657, 0.5769721187181905] and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.4060379177903557, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}.
-[I 2024-08-23 12:06:31,174] Trial 15 finished with values: [-0.5434303669217287, 0.5147474123466615] and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 20, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 8, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}.
-[I 2024-08-23 12:06:31,191] Trial 16 finished with values: [-2.0072511048320134, 0.2786318125997387] and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.344271094811757, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}.
-[I 2024-08-23 12:06:31,207] Trial 17 finished with values: [-0.5194661889628072, 0.40146744515282495] and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.670604991178476, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}.
-[I 2024-08-23 12:06:31,271] Trial 18 finished with values: [-0.659749443628722, 0.6659085938841998] and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 22, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 6, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}.
-[I 2024-08-23 12:06:31,287] Trial 19 finished with values: [-1.1068495306229729, 0.24457822094737378] and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 0.5158832554303112, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}.
-[I 2024-08-23 12:06:31,305] Trial 20 finished with values: [-0.8604898820838102, 0.7086875504668667] and parameters: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 4, 'descriptor': '{"name": "MACCS_keys", "parameters": {}}'}.
-[I 2024-08-23 12:06:31,322] Trial 21 finished with values: [-0.5919869916997383, 0.2367498627927979] and parameters: {'algorithm_name': 'SVR', 'SVR_algorithm_hash': 'ea7ccc7ef4a9329af0d4e39eb6184933', 'gamma__ea7ccc7ef4a9329af0d4e39eb6184933': 0.0009327650919528738, 'C__ea7ccc7ef4a9329af0d4e39eb6184933': 6.062479210472502, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}.
-[I 2024-08-23 12:06:31,327] Trial 22 pruned. Duplicate parameter set
-[I 2024-08-23 12:06:31,344] Trial 23 finished with values: [-1.2497762395862362, 0.10124660026536195] and parameters: {'algorithm_name': 'Lasso', 'Lasso_algorithm_hash': '5457f609662e44f04dcc9423066d2f58', 'alpha__5457f609662e44f04dcc9423066d2f58': 1.1366172066709432, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}.
-[I 2024-08-23 12:06:31,399] Trial 24 finished with values: [-1.1114006274062536, 0.7647766019001522] and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 26, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 8, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}'}.
-[I 2024-08-23 12:06:31,452] A new study created in memory with name: study_name_1
-INFO:root:Enqueued ChemProp manual trial with sensible defaults: {'activation__668a7428ff5cdb271b01c0925e8fea45': 'ReLU', 'aggregation__668a7428ff5cdb271b01c0925e8fea45': 'mean', 'aggregation_norm__668a7428ff5cdb271b01c0925e8fea45': 100, 'batch_size__668a7428ff5cdb271b01c0925e8fea45': 50, 'depth__668a7428ff5cdb271b01c0925e8fea45': 3, 'dropout__668a7428ff5cdb271b01c0925e8fea45': 0.0, 'features_generator__668a7428ff5cdb271b01c0925e8fea45': 'none', 'ffn_hidden_size__668a7428ff5cdb271b01c0925e8fea45': 300, 'ffn_num_layers__668a7428ff5cdb271b01c0925e8fea45': 2, 'final_lr_ratio_exp__668a7428ff5cdb271b01c0925e8fea45': -4, 'hidden_size__668a7428ff5cdb271b01c0925e8fea45': 300, 'init_lr_ratio_exp__668a7428ff5cdb271b01c0925e8fea45': -4, 'max_lr_exp__668a7428ff5cdb271b01c0925e8fea45': -3, 'warmup_epochs_ratio__668a7428ff5cdb271b01c0925e8fea45': 0.1, 'algorithm_name': 'ChemPropRegressor', 'ChemPropRegressor_algorithm_hash': '668a7428ff5cdb271b01c0925e8fea45'}
-
-
-
-
-
-
-
-Duplicated trial: {'algorithm_name': 'PLSRegression', 'PLSRegression_algorithm_hash': '9f2f76e479633c0bf18cf2912fed9eda', 'n_components__9f2f76e479633c0bf18cf2912fed9eda': 4, 'descriptor': '{"name": "ECFP_counts", "parameters": {"radius": 3, "useFeatures": true, "nBits": 2048}}'}, return [-0.6101359993004856, 0.3011280543457062]
-
-
-
-
-
-
-
-[I 2024-08-23 12:07:43,190] Trial 0 finished with values: [-2.0621601907738047, 0.2749020946925899] and parameters: {'algorithm_name': 'ChemPropRegressor', 'ChemPropRegressor_algorithm_hash': '668a7428ff5cdb271b01c0925e8fea45', 'activation__668a7428ff5cdb271b01c0925e8fea45': <ChemPropActivation.RELU: 'ReLU'>, 'aggregation__668a7428ff5cdb271b01c0925e8fea45': <ChemPropAggregation.MEAN: 'mean'>, 'aggregation_norm__668a7428ff5cdb271b01c0925e8fea45': 100.0, 'batch_size__668a7428ff5cdb271b01c0925e8fea45': 50.0, 'depth__668a7428ff5cdb271b01c0925e8fea45': 3.0, 'dropout__668a7428ff5cdb271b01c0925e8fea45': 0.0, 'ensemble_size__668a7428ff5cdb271b01c0925e8fea45': 1, 'epochs__668a7428ff5cdb271b01c0925e8fea45': 5, 'features_generator__668a7428ff5cdb271b01c0925e8fea45': <ChemPropFeatures_Generator.NONE: 'none'>, 'ffn_hidden_size__668a7428ff5cdb271b01c0925e8fea45': 300.0, 'ffn_num_layers__668a7428ff5cdb271b01c0925e8fea45': 2.0, 'final_lr_ratio_exp__668a7428ff5cdb271b01c0925e8fea45': -4, 'hidden_size__668a7428ff5cdb271b01c0925e8fea45': 300.0, 'init_lr_ratio_exp__668a7428ff5cdb271b01c0925e8fea45': -4, 'max_lr_exp__668a7428ff5cdb271b01c0925e8fea45': -3, 'warmup_epochs_ratio__668a7428ff5cdb271b01c0925e8fea45': 0.1, 'descriptor': '{"name": "SmilesFromFile", "parameters": {}}'}.
-[I 2024-08-23 12:08:53,217] Trial 1 finished with values: [-2.0621601907738047, 0.2749020946925899] and parameters: {'algorithm_name': 'ChemPropRegressor', 'ChemPropRegressor_algorithm_hash': '668a7428ff5cdb271b01c0925e8fea45', 'activation__668a7428ff5cdb271b01c0925e8fea45': <ChemPropActivation.RELU: 'ReLU'>, 'aggregation__668a7428ff5cdb271b01c0925e8fea45': <ChemPropAggregation.MEAN: 'mean'>, 'aggregation_norm__668a7428ff5cdb271b01c0925e8fea45': 100.0, 'batch_size__668a7428ff5cdb271b01c0925e8fea45': 45.0, 'depth__668a7428ff5cdb271b01c0925e8fea45': 3.0, 'dropout__668a7428ff5cdb271b01c0925e8fea45': 0.0, 'ensemble_size__668a7428ff5cdb271b01c0925e8fea45': 1, 'epochs__668a7428ff5cdb271b01c0925e8fea45': 5, 'features_generator__668a7428ff5cdb271b01c0925e8fea45': <ChemPropFeatures_Generator.NONE: 'none'>, 'ffn_hidden_size__668a7428ff5cdb271b01c0925e8fea45': 300.0, 'ffn_num_layers__668a7428ff5cdb271b01c0925e8fea45': 2.0, 'final_lr_ratio_exp__668a7428ff5cdb271b01c0925e8fea45': -4, 'hidden_size__668a7428ff5cdb271b01c0925e8fea45': 300.0, 'init_lr_ratio_exp__668a7428ff5cdb271b01c0925e8fea45': -4, 'max_lr_exp__668a7428ff5cdb271b01c0925e8fea45': -3, 'warmup_epochs_ratio__668a7428ff5cdb271b01c0925e8fea45': 0.1, 'descriptor': '{"name": "SmilesFromFile", "parameters": {}}'}.
-
-
-

Note the multi-parameter performance reported for each trial, e.g. Trial 1 finished with values: [XXX, XXX], which correspond to the negated MSE and the standard deviation of the negated MSE across the 3 folds, respectively. The two objectives may be plotted as a function of trial number, as follows:

-
-
[93]:
-
-
-
df = study.trials_dataframe()
-df.number = df.number+1
-fig=plt.figure(figsize=(12,4))
-ax = sns.scatterplot(data=df, x="number", y="values_neg_mean_squared_error",
-                     legend=False, color="b")
-ax2 = sns.scatterplot(data=df, x="number", y="values_Standard deviation",
-                      ax=ax.axes.twinx(), legend=False, color="r")
-
-a = df['values_neg_mean_squared_error'].apply(np.floor).min()
-b = df['values_neg_mean_squared_error'].apply(np.ceil).max()
-c = df['values_Standard deviation'].apply(np.floor).min()
-d = df['values_Standard deviation'].apply(np.ceil).max()
-
-# Align both axes
-ax.set_ylim(a,b);
-ax.set_yticks(np.linspace(a,b, 7));
-ax2.set_ylim(c,d);
-ax2.set_yticks(np.linspace(c,d, 7));
-ax.set_xticks(df.number);
-
-# Set the colors of labels
-ax.set_xlabel('Trial Number')
-ax.set_ylabel('(Performance) Negated MSE', color='b')
-ax2.set_ylabel('Standard Deviation across folds', color='r')
-
-
-
-
-
[93]:
-
-
-
-
-Text(0, 0.5, 'Standard Deviation across folds')
-
-
-
-
-
-
[Figure: negated MSE (blue, left axis) and standard deviation across folds (red, right axis) plotted against trial number]
-
-
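As a rough illustration of how the two values reported per trial relate to the cross-validation folds, the sketch below computes a negated MSE for each of three folds and then takes the mean and the standard deviation of those fold scores. The fold data and the helper name `neg_mse` are invented for illustration, and the use of the population standard deviation is an assumption; the calculation inside Qptuna may differ.

```python
from statistics import mean, pstdev


def neg_mse(y_true, y_pred):
    """Negated mean squared error: values closer to 0 are better."""
    return -mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


# Hypothetical (observed, predicted) values for 3 cross-validation folds.
folds = [
    ([1.0, 2.0, 3.0], [1.5, 2.0, 2.5]),
    ([2.0, 4.0], [2.0, 3.0]),
    ([0.0, 1.0], [0.0, 1.0]),
]

fold_scores = [neg_mse(y, p) for y, p in folds]

# First objective: mean negated MSE; second objective: spread across folds.
objective_values = [mean(fold_scores), pstdev(fold_scores)]
print(objective_values)
```

A trial with a good first value but a large second value performs well on average but inconsistently across folds.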

We may plot the Pareto front of this multi-objective study using the Optuna plotting functionality directly:

-
-
[94]:
-
-
-
from optuna.visualization import plot_pareto_front
-
-plot_pareto_front(study)
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
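To make the idea of a Pareto front concrete, the following minimal sketch selects the non-dominated trials from a list of (negated MSE, standard deviation) pairs. It assumes the first objective is maximised and the second is minimised; the function name `pareto_front` and the example values are hypothetical and are not part of Qptuna or Optuna.

```python
def pareto_front(trials):
    """Return the trials that are not dominated by any other trial.

    Each trial is a (neg_mse, std) pair. Assumption: neg_mse is maximised
    and std is minimised.
    """

    def dominates(a, b):
        # a dominates b if it is no worse on both objectives
        # and strictly better on at least one of them.
        no_worse = a[0] >= b[0] and a[1] <= b[1]
        better = a[0] > b[0] or a[1] < b[1]
        return no_worse and better

    return [t for t in trials if not any(dominates(o, t) for o in trials)]


# Invented objective values for four trials.
trials = [(-2.06, 0.27), (-0.61, 0.30), (-1.50, 0.50), (-0.61, 0.35)]
front = pareto_front(trials)
print(front)
```

Here the third and fourth trials are dropped because the second trial is at least as good on both objectives.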

Further visualization of Qptuna runs

-

It is possible to evaluate the parameter importances on regression metric performance across descriptor vs. algorithm choice, based on the completed trials in our study:

-
-
[95]:
-
-
-
from optuna.visualization import plot_param_importances
-
-plot_param_importances(study, target=lambda t: t.values[0])
-
-
-
-
-
-
-
-
-
-

Parameter importances are represented by non-negative floating point numbers, where higher values mean that the parameters are more important. The importances are held in a collections.OrderedDict ordered by value in descending order, and the importance values are normalized to sum to 1.0. Hence we can conclude that the choice of algorithm is more important than the choice of descriptor for our current study.
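The normalisation and ordering described above can be sketched in a few lines. This is only an illustration of the bookkeeping, not of how Optuna estimates importances; the raw scores and the helper name `normalise_importances` are made up.

```python
from collections import OrderedDict


def normalise_importances(raw):
    """Scale raw scores to sum to 1.0 and order them from high to low."""
    total = sum(raw.values())
    ranked = sorted(raw.items(), key=lambda kv: kv[1], reverse=True)
    return OrderedDict((name, value / total) for name, value in ranked)


# Hypothetical raw importance scores for the two parameters in the study.
raw_scores = {"descriptor": 1.0, "algorithm_name": 3.0}
importances = normalise_importances(raw_scores)
print(importances)
```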

-

It is also possible to analyse how important these hyperparameter choices are for trial duration:

-
-
[96]:
-
-
-
plot_param_importances(
-    study, target=lambda t: t.duration.total_seconds(), target_name="duration"
-)
-
-
-
-
-
-
-
-
-
-

Optuna also allows us to plot the parameter relationships for our study, like so:

-
-
[97]:
-
-
-
from optuna.visualization import plot_parallel_coordinate
-
-plot_parallel_coordinate(study,
-                         params=["algorithm_name", "descriptor"],
-                         target=lambda t: t.values[0]) # First performance value taken
-
-
-
-
-
-
-
-
-
-

The same can be done for the relationships for the standard deviation of performance:

-
-
[98]:
-
-
-
from optuna.visualization import plot_parallel_coordinate
-
-plot_parallel_coordinate(study,
-                         params=["algorithm_name", "descriptor"],
-                         target=lambda t: t.values[1]) # Second standard deviation value taken
-
-
-
-
-
-
-
-
-
-
-
-

Precomputed descriptors from a file example

-

Precomputed descriptors can be supplied to models using the “PrecomputedDescriptorFromFile” descriptor, and supplying the input_column and response_column like so:

-
-
[99]:
-
-
-
from optunaz.descriptors import PrecomputedDescriptorFromFile
-
-descriptor=PrecomputedDescriptorFromFile.new(
-            file="../tests/data/precomputed_descriptor/train_with_fp.csv",
-            input_column="canonical", # Name of the identifier for the compound
-            response_column="fp") # Name of the column with the pretrained (comma separated) descriptors
-
-descriptor.calculate_from_smi("Cc1cc(NC(=O)c2cccc(COc3ccc(Br)cc3)c2)no1").shape
-
-
-
-
-
[99]:
-
-
-
-
-(512,)
-
-
-
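Conceptually, a precomputed descriptor is a lookup: the identifier in one column selects a row, and the comma-separated string in another column is parsed into a numeric vector. The sketch below shows that idea with the standard library only. The miniature CSV content and the function `lookup_descriptor` are hypothetical and do not reproduce the Qptuna implementation.

```python
import csv
import io

# Hypothetical miniature of a file holding precomputed descriptors.
CSV_TEXT = '''canonical,fp
CCO,"0, 1, 1, 0"
c1ccccc1,"1, 0, 0, 1"
'''


def lookup_descriptor(csv_text, smiles, input_column, response_column):
    """Return the descriptor for `smiles` as a list of floats, or None."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row[input_column] == smiles:
            return [float(x) for x in row[response_column].split(",")]
    return None  # the identifier is not present in the file


found = lookup_descriptor(CSV_TEXT, "CCO", "canonical", "fp")
missing = lookup_descriptor(CSV_TEXT, "CCC", "canonical", "fp")
print(found, missing)
```

An identifier that is absent from the file has no descriptor to return, which is why the file must cover every molecule that is passed to the descriptor.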

In this toy example each precomputed descriptor is a 512-element bit vector. A model can be trained with precomputed descriptors from a file (in a composite descriptor with ECFP), like so:

-
-
[100]:
-
-
-
precomputed_config = OptimizationConfig(
-        data=Dataset(
-        input_column="canonical",
-        response_column="molwt",
-        response_type="regression",
-        training_dataset_file="../tests/data/precomputed_descriptor/train_with_fp.csv",
-        split_strategy=Stratified(fraction=0.2),
-    ),
-    descriptors=[
-        CompositeDescriptor.new(
-            descriptors=[
-                PrecomputedDescriptorFromFile.new(file="../tests/data/precomputed_descriptor/train_with_fp.csv",
-                                                 input_column="canonical", response_column="fp"),
-                ECFP.new()])
-    ],
-    algorithms=[
-        RandomForestRegressor.new(n_estimators={"low": 5, "high": 10}),
-        Ridge.new(),
-        Lasso.new(),
-        PLSRegression.new(),
-    ],
-    settings=OptimizationConfig.Settings(
-        mode=ModelMode.REGRESSION,
-        cross_validation=2,
-        n_trials=4,
-        n_startup_trials=0,
-        direction=OptimizationDirection.MAXIMIZATION,
-        track_to_mlflow=False,
-        random_seed=42,
-    ),
-)
-
-precomputed_study = optimize(precomputed_config, study_name="precomputed_example")
-build_best(buildconfig_best(precomputed_study), "../target/precomputed_model.pkl")
-with open("../target/precomputed_model.pkl", "rb") as f:
-    precomputed_model = pickle.load(f)
-
-
-
-
-
-
-
-
-[I 2024-08-23 12:08:55,311] A new study created in memory with name: precomputed_example
-[I 2024-08-23 12:08:55,312] A new study created in memory with name: study_name_0
-[I 2024-08-23 12:08:55,426] Trial 0 finished with value: -3014.274803630188 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.011994365911634164, 'descriptor': '{"parameters": {"descriptors": [{"name": "PrecomputedDescriptorFromFile", "parameters": {"file": "../tests/data/precomputed_descriptor/train_with_fp.csv", "input_column": "canonical", "response_column": "fp"}}, {"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}]}, "name": "CompositeDescriptor"}'}. Best is trial 0 with value: -3014.274803630188.
-[I 2024-08-23 12:08:55,481] Trial 1 finished with value: -3014.471088599086 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 0.03592375122963953, 'descriptor': '{"parameters": {"descriptors": [{"name": "PrecomputedDescriptorFromFile", "parameters": {"file": "../tests/data/precomputed_descriptor/train_with_fp.csv", "input_column": "canonical", "response_column": "fp"}}, {"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}]}, "name": "CompositeDescriptor"}'}. Best is trial 0 with value: -3014.274803630188.
-[I 2024-08-23 12:08:55,519] Trial 2 finished with value: -3029.113810544919 and parameters: {'algorithm_name': 'Ridge', 'Ridge_algorithm_hash': 'cfa1990d5153c8812982f034d788d7ee', 'alpha__cfa1990d5153c8812982f034d788d7ee': 1.8153295905650357, 'descriptor': '{"parameters": {"descriptors": [{"name": "PrecomputedDescriptorFromFile", "parameters": {"file": "../tests/data/precomputed_descriptor/train_with_fp.csv", "input_column": "canonical", "response_column": "fp"}}, {"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}]}, "name": "CompositeDescriptor"}'}. Best is trial 0 with value: -3014.274803630188.
-[I 2024-08-23 12:08:55,618] Trial 3 finished with value: -4358.575772003129 and parameters: {'algorithm_name': 'RandomForestRegressor', 'RandomForestRegressor_algorithm_hash': 'f1ac01e1bba332215ccbd0c29c9ac3c3', 'max_depth__f1ac01e1bba332215ccbd0c29c9ac3c3': 14, 'n_estimators__f1ac01e1bba332215ccbd0c29c9ac3c3': 10, 'max_features__f1ac01e1bba332215ccbd0c29c9ac3c3': <RandomForestMaxFeatures.AUTO: 'auto'>, 'descriptor': '{"parameters": {"descriptors": [{"name": "PrecomputedDescriptorFromFile", "parameters": {"file": "../tests/data/precomputed_descriptor/train_with_fp.csv", "input_column": "canonical", "response_column": "fp"}}, {"name": "ECFP", "parameters": {"radius": 3, "nBits": 2048, "returnRdkit": false}}]}, "name": "CompositeDescriptor"}'}. Best is trial 0 with value: -3014.274803630188.
-
-
-

N.B: The qptuna-predict CLI command contains the options --input-precomputed-file, --input-precomputed-input-column and --input-precomputed-response-column for generating predictions at inference time. However, this is not available within Python notebooks, and calling predict on a new set of unseen molecules will cause “Could not find descriptor” errors, like so:

-
-
[101]:
-
-
-
new_molecules = ["CCC", "CC(=O)Nc1ccc(O)cc1"]
-
-precomputed_model.predict_from_smiles(new_molecules)
-
-
-
-
-
-
-
-
-Could not find descriptor for CCC in file ../tests/data/precomputed_descriptor/train_with_fp.csv.
-Could not find descriptor for CC(=O)Nc1ccc(O)cc1 in file ../tests/data/precomputed_descriptor/train_with_fp.csv.
-
-
-
-
[101]:
-
-
-
-
-array([nan, nan])
-
-
-

A file of precomputed descriptors should be provided and the inference_parameters function called, like so:

-
-
[102]:
-
-
-
import tempfile # For this example we use a temp file to store a temporary inference dataset
-
-# extract precomputed descriptor (i.e the 1st descriptor in the composite descriptor for this example)
-precomputed_descriptor = precomputed_model.descriptor.parameters.descriptors[0]
-
-# example fp with 0's for illustration purposes
-example_fp = str([0] * 512)[1:-1]
-
-with tempfile.NamedTemporaryFile() as temp_file:
-    # write the query data to a new file
-    X = pd.DataFrame(
-        data={"canonical": new_molecules,
-              "fp": [example_fp for i in range(len(new_molecules))]})
-    X.to_csv(temp_file.name, index=False)
-
-    # set precomputed descriptor to the new file
-    precomputed_descriptor.inference_parameters(temp_file.name, "canonical", "fp")
-    preds = precomputed_model.predict_from_smiles(["CCC", "CC(=O)Nc1ccc(O)cc1"])
-
-preds
-
-
-
-
-
[102]:
-
-
-
-
-array([292.65709987, 302.64327077])
-
-
-
-
-
-
-

AutoML (Automated model retraining)

-
-

Overview

-

The AutoML functionality in Qptuna automates the process of preparing data for model training, including data cleaning, feature extraction, and data formatting, streamlining the data preprocessing stage. The main aspects of this workflow are the following:

-
    -
  • Automated Data Preparation: Automated process of preparing data for model training, including cleaning, feature extraction, formatting and quorum checks, streamlining data preprocessing

  • -
  • Model Training with SLURM: Integration with SLURM to dispatch tasks, leveraging distributed computing resources for efficient and scalable model training

  • -
  • Scalable and Efficient with Dynamic Resource Allocation: Workflow designed to handle large datasets (with multiple prediction tasks) and dynamically utilize CPU/GPU/memory HPC resources

  • -
  • Customizable SLURM and Qptuna Templates: SLURM templates can be tailored for different use cases. Both initial training and retraining Qptuna JSON configurations are used, allowing users to customise which algorithms and descriptors should be trialed. The default configuration will, for example, train an initial ChemProp model, and subsequent models will automatically trial Transfer Learning (TL) from previous models for new data, when appropriate

  • -
  • Metadata, Prediction and Model Tracking: The code includes functionality for tracking temporal performance, raw test predictions, active learning predictions and exported Qptuna models, aiding monitoring and evaluating pseudo-prospective model performance over time

  • -
  • Automatic Job Resubmission: In case of SLURM job failures, the code provides functionality to automatically resubmit failed jobs with modified resource allocations, enhancing the robustness of the model training process

  • -
  • Parallel Task Processing: Supports parallel processing of training tasks, allowing for efficient handling of multiple retraining tasks simultaneously, reducing overall processing time

  • -
  • Dry Run Mode: Dry run mode option enables users to simulate the process without actually submitting jobs, useful for verifying configurations and testing the workflow

  • -
-
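The quorum check mentioned in the list above can be pictured as follows: rows are grouped by task, and only tasks with enough data points go forward to model training. This is a sketch under stated assumptions; the function `tasks_meeting_quorum` and the toy rows are hypothetical, and the default of 25 is taken from the `quorum=25` value that appears in the example log output in this tutorial.

```python
from collections import defaultdict


def tasks_meeting_quorum(rows, quorum=25):
    """Group (smiles, activity, task) rows by task and keep large enough tasks."""
    grouped = defaultdict(list)
    for smiles, activity, task in rows:
        grouped[task].append((smiles, activity))
    return {task: data for task, data in grouped.items() if len(data) >= quorum}


# Toy data: TID1 has 30 data points, TID2 has only 3.
rows = [("C" * (i + 1), float(i), "TID1") for i in range(30)]
rows += [("N" * (i + 1), float(i), "TID2") for i in range(3)]

ready = tasks_meeting_quorum(rows)
print(sorted(ready))
```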

The following is an example from the Qptuna unit tests:

-
-
[105]:
-
-
-
from optunaz import automl
-from unittest.mock import patch
-import sys
-
-aml_args = [
-    "prog",
-    "-h",
-]
-with patch.object(sys, "argv", aml_args):
-    try:
-        automl.main()
-    except SystemExit:
-        pass
-
-
-
-
-
-
-
-
-usage: prog [-h] --output-path OUTPUT_PATH --email EMAIL --user_name USER_NAME
-            --input-data INPUT_DATA --input-smiles-csv-column
-            INPUT_SMILES_CSV_COLUMN --input-activity-csv-column
-            INPUT_ACTIVITY_CSV_COLUMN --input-task-csv-column
-            INPUT_TASK_CSV_COLUMN --input-initial-template
-            INPUT_INITIAL_TEMPLATE --input-retrain-template
-            INPUT_RETRAIN_TEMPLATE --input-slurm-template INPUT_SLURM_TEMPLATE
-            [--quorum QUORUM] [--n-cores N_CORES] [--dry-run] [-v]
-            [--slurm-req-cores SLURM_REQ_CORES]
-            [--slurm-req-mem SLURM_REQ_MEM]
-            [--slurm-req-partition SLURM_REQ_PARTITION] --slurm-al-pool
-            SLURM_AL_POOL --slurm-al-smiles-csv-column
-            SLURM_AL_SMILES_CSV_COLUMN --slurm-job-prefix SLURM_JOB_PREFIX
-            [--slurm-failure-cores-increment SLURM_FAILURE_CORES_INCREMENT]
-            [--slurm-failure-mem-increment SLURM_FAILURE_MEM_INCREMENT]
-            [--slurm-failure-mins-increment SLURM_FAILURE_MINS_INCREMENT]
-            [--slurm-failure-max-retries SLURM_FAILURE_MAX_RETRIES]
-            [--slurm-failure-max-mem SLURM_FAILURE_MAX_MEM]
-            [--slurm-failure-max-cpu SLURM_FAILURE_MAX_CPU]
-            [--save-previous-models]
-
-AutoML scheduling for temporal automatic retraining of models
-
-options:
-  -h, --help            show this help message and exit
-  --quorum QUORUM
-  --n-cores N_CORES
-  --dry-run
-  -v, --verbose
-  --slurm-req-cores SLURM_REQ_CORES
-  --slurm-req-mem SLURM_REQ_MEM
-  --slurm-req-partition SLURM_REQ_PARTITION
-  --slurm-failure-cores-increment SLURM_FAILURE_CORES_INCREMENT
-  --slurm-failure-mem-increment SLURM_FAILURE_MEM_INCREMENT
-  --slurm-failure-mins-increment SLURM_FAILURE_MINS_INCREMENT
-  --slurm-failure-max-retries SLURM_FAILURE_MAX_RETRIES
-  --slurm-failure-max-mem SLURM_FAILURE_MAX_MEM
-  --slurm-failure-max-cpu SLURM_FAILURE_MAX_CPU
-  --save-previous-models
-
-required named arguments:
-  --output-path OUTPUT_PATH
-                        Path to the output AutoML directory
-  --email EMAIL         Email for SLURM job notifications
-  --user_name USER_NAME
-                        PRID for the AutoML user
-  --input-data INPUT_DATA
-                        Name of the input file[s]. For multiple files use '*'
-                        in wildcard expression
-  --input-smiles-csv-column INPUT_SMILES_CSV_COLUMN
-                        Column name of SMILES column in csv file
-  --input-activity-csv-column INPUT_ACTIVITY_CSV_COLUMN
-                        Column name of activity column in data file
-  --input-task-csv-column INPUT_TASK_CSV_COLUMN
-                        Column name of task column in data file
-  --input-initial-template INPUT_INITIAL_TEMPLATE
-  --input-retrain-template INPUT_RETRAIN_TEMPLATE
-  --input-slurm-template INPUT_SLURM_TEMPLATE
-  --slurm-al-pool SLURM_AL_POOL
-  --slurm-al-smiles-csv-column SLURM_AL_SMILES_CSV_COLUMN
-  --slurm-job-prefix SLURM_JOB_PREFIX
-
-
-
-
-

Note on High-Performance Computing (HPC) Setup

-

This workflow is designed for use with SLURM. If you intend to use this functionality with a different job scheduling system, significant modifications to the SLURM-specific components are necessary. Please make sure you understand these components thoroughly and adapt them as needed.

-
-
-

Data extraction options

-

Qptuna AutoML expects temporal data (--input-data) to have been exported from warehouses/databases in a flat file structure in CSV format (which can also be gz compressed), containing SMILES, activity and task (which denotes each distinct property to be modelled) CSV columns.

-

Exports are expected to be temporal in nature, with the naming convention %Y-%m-%d (see here for details). Data can be exported in two ways:

-
    -
  • 1.) Multiple files: Each extraction date gets its own file, with the date in %Y-%m-%d format within the filename denoting that point in temporal training time, like so:

  • -
-
-
[106]:
-
-
-
ls -lrth ../tests/data/automl/
-
-
-
-
-
-
-
-
-total 128
--rw-r--r--@ 1 kljk345  staff   9.2K Aug 23 09:19 2024-01-01.csv
--rw-r--r--@ 1 kljk345  staff    12K Aug 23 09:19 2024-02-01.csv
--rw-r--r--@ 1 kljk345  staff    12K Aug 23 09:19 2024-03-01.csv
--rw-r--r--@ 1 kljk345  staff    12K Aug 23 09:19 2024-04-01.csv
--rw-r--r--@ 1 kljk345  staff   438B Aug 23 09:19 2024-05-01.csv
-
-
-
    -
  • 2.) Single file: Data extraction is written to a single file (with any naming convention). The last modified date of the file is compared to the most recent round of AutoML training. If the file has undergone an update, then a new model building procedure is triggered.

  • -
-
-
-
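For the multiple-file option, the date embedded in each filename determines the order in which timepoints are processed. The sketch below extracts a %Y-%m-%d date from each filename and sorts the files chronologically. The helper `timepoint_from_filename` is hypothetical, and the `%y_%m_%d` label format is inferred from timepoint names such as 24_01_01 in the example log output, so treat it as an assumption.

```python
import re
from datetime import datetime

DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def timepoint_from_filename(filename):
    """Return the date embedded in a filename, or None if there is none."""
    match = DATE_PATTERN.search(filename)
    if match is None:
        return None
    return datetime.strptime(match.group(0), "%Y-%m-%d")


files = ["2024-03-01.csv", "2024-01-01.csv", "2024-02-01.csv"]

# Sort chronologically, then derive a short label for each timepoint.
ordered = sorted(files, key=timepoint_from_filename)
labels = [timepoint_from_filename(f).strftime("%y_%m_%d") for f in ordered]
print(labels)
```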

Walkthrough of running an AutoML setup

-

The main files that should be configured by users are the following:

-
    -
  • --input-initial-template dictates the first optimisation configuration for a property

  • -
  • --input-retrain-template defines any further optimisation configurations, which e.g. could be configured to perform transfer learning on a compatible model from the --input-initial-template

  • -
  • --input-slurm-template is the bash script to orchestrate model [re]-training. The default provided in ../examples/slurm-scripts/automl.template is set up to use singularity commands, and for easy configuration within our current HPC setup (e.g. different partitions/memory allocations can be easily modified)

  • -
-

Let us consider the following AutoML setup:

-
    -
  • Training data is located in ../tests/data/automl/

  • -
  • The config for an initial round of model training is in ../examples/automl/config.initial.template

  • -
  • A config for any subsequent models is in ../examples/automl/config.retrain.template

  • -
  • Our slurm template is in ../examples/slurm-scripts/automl.template

  • -
  • HPC environment dictates our maximum requested resources should not exceed 50G of memory and 26 cpu

  • -
  • We would like to retain all previous models

  • -
  • A pool of compounds that could be tested and added to a future model is in ../tests/data/DRD2/subset-1000/train.csv

  • -
-

Then our configuration would be:

-
# one_taskid in the toy data set has only one example task
qptuna-automl  --input-data "../tests/data/automl/*"  \
--email <example>@astrazeneca.com  --user_name <example>  \
--input-smiles-csv-column canonical  --input-activity-csv-column molwt \
--input-task-csv-column one_taskid  \
--input-initial-template ../examples/automl/config.initial.template \
--input-retrain-template ../examples/automl/config.retrain.template \
--input-slurm-template ../examples/slurm-scripts/automl.template \
--n-cores 1 -vvv --slurm-al-pool ../tests/data/DRD2/subset-1000/train.csv \
--slurm-al-smiles-csv-column canonical  --output-path ../example_automl \
--slurm-job-prefix <example> \
--slurm-failure-max-cpu 26 --slurm-failure-max-mem 50 --save-previous-models
-
-
-

and we can perform a test dry-run of this command from within our notebook like so:

-
-
[107]:
-
-
-
aml_args = [
-    "prog",
-    "--output-path",
-    "../target/automl_example",
-    "--email",
-    "test@test.com",
-    "--user_name",
-    "test",
-    "--input-data",
-    "../tests/data/automl/*",
-    "--input-smiles-csv-column",
-    "canonical",
-    "--input-activity-csv-column",
-    "molwt",
-    "--input-task-csv-column",
-    "one_taskid",
-    "--input-initial-template",
-    "../examples/automl/config.initial.template",
-    "--input-retrain-template",
-    "../examples/automl/config.retrain.template",
-    "--input-slurm-template",
-    "../examples/slurm-scripts/automl.template",
-    "--n-cores",
-    "1",
-    "--dry-run", # The dry-run option is enabled, so the AutoML pipeline does not submit to SLURM
-    "-vv", # Use this CLI option to enable detailed debugging logging to observe Qptuna AutoML behaviour
-    "--slurm-al-pool",
-    "../tests/data/DRD2/subset-1000/train.csv",
-    "--slurm-al-smiles-csv-column",
-    "canonical",
-    "--slurm-job-prefix",
-    "testaml"
-]
-with patch.object(sys, "argv", aml_args):
-    automl.main()
-
-
-
-
-
-
-
-
-2024-08-23 12:51:18,031.031 INFO automl - main: Namespace(output_path='../target/automl_example', email='test@test.com', user_name='test', input_data='../tests/data/automl/*', input_smiles_csv_column='canonical', input_activity_csv_column='molwt', input_task_csv_column='one_taskid', input_initial_template='../examples/automl/config.initial.template', input_retrain_template='../examples/automl/config.retrain.template', input_slurm_template='../examples/slurm-scripts/automl.template', quorum=25, n_cores=1, dry_run=True, verbose=2, slurm_req_cores=12, slurm_req_mem=None, slurm_req_partition='dgx', slurm_al_pool='../tests/data/DRD2/subset-1000/train.csv', slurm_al_smiles_csv_column='canonical', slurm_job_prefix='testaml', slurm_failure_cores_increment=4, slurm_failure_mem_increment=20, slurm_failure_mins_increment=720, slurm_failure_max_retries=5, slurm_failure_max_mem=200, slurm_failure_max_cpu=20, save_previous_models=False)
-2024-08-23 12:51:18,041.041 DEBUG automl - main: Processing timepoint 24_01_01
-2024-08-23 12:51:18,042.042 DEBUG automl - first_run: ../target/automl_example/processed_timepoints.json exists
-2024-08-23 12:51:18,044.044 DEBUG automl - checkSkipped: ../target/automl_example/data/TID1/.skip not present
-2024-08-23 12:51:18,044.044 DEBUG automl - checkisLocked: 24_01_01: Lockfile [../target/automl_example/data/TID1/.24_01_01] locks the taskcode [TID1]
-2024-08-23 12:51:18,046.046 DEBUG automl - checkRunningSlurmJobs: Dry run of /usr/bin/squeue
-2024-08-23 12:51:18,050.050 WARNING automl - resubmitAnyFailedJobs: ../target/automl_example/data/TID1/TID1.sh never ran, so will be resubmit
-2024-08-23 12:51:18,053.053 DEBUG automl - submitJob: Dry run of /usr/bin/sbatch ../target/automl_example/data/TID1/TID1.sh
-2024-08-23 12:51:18,053.053 INFO automl - resubmitAnyFailedJobs: ../target/automl_example/data/TID1/TID1.sh resubmit (2 retrys)
-2024-08-23 12:51:18,054.054 INFO automl - resubmitAnyFailedJobs: Some jobs were resubmitted: ['TID1']
-2024-08-23 12:51:18,054.054 INFO automl - main: Exiting: 24_01_01 lock(s) indicate(s) work ongoing
-2024-08-23 12:51:18,054.054 INFO automl - main: AutoML script took [0.024824142] seconds.
-2024-08-23 12:51:18,055.055 DEBUG base - close: <pid.posix.PidFile object at 0x7ff3f48a5cc0> closing pidfile: /Users/kljk345/PycharmProjects/optuna_az/notebooks/prog.pid
-
-
-

Using the verbose flag -vv lets us see what the automl script did behind the scenes.
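A repeated -v flag is commonly implemented with an argparse counter, which is consistent with the `verbose=2` entry in the Namespace logged above. The sketch below shows one way such a counter could be mapped to a logging level. The mapping itself is an assumption made for illustration and is not taken from the Qptuna source.

```python
import argparse
import logging

parser = argparse.ArgumentParser(prog="prog")
# Each occurrence of -v increments the counter: -v -> 1, -vv -> 2.
parser.add_argument("-v", "--verbose", action="count", default=0)
parser.add_argument("--dry-run", action="store_true")

# Assumed mapping: no flag -> WARNING, -v -> INFO, -vv or more -> DEBUG.
LEVELS = {0: logging.WARNING, 1: logging.INFO}


def log_level(verbosity):
    return LEVELS.get(verbosity, logging.DEBUG)


args = parser.parse_args(["--dry-run", "-vv"])
print(args.verbose, args.dry_run, logging.getLevelName(log_level(args.verbose)))
```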

-

Let’s check what the automl script has generated in the ../target/automl_example output directory:

-
-
[108]:
-
-
-
!find ../target/automl_example/ | sed -e "s/[^-][^\/]*\// |/g" -e "s/|\([^ ]\)/|-\1/"
-
-
-
-
-
-
-
-
- | | |
- | | |-/processed_timepoints.json
- | | |-/data
- | | | |-TID1
- | | | | |-TID1.csv
- | | | | |-TID1.sh
- | | | | |-.24_01_01
- | | | | |-TID1.json
- | | | | |-.retry
-
-
-

Taken together, the above log and directory structure of automl_example shows that the:

-
    -
  • first temporal point of training data has been correctly ingested (24_01_01)

  • -
  • one available task in this example (for which data is available) at the 24_01_01 timepoint, denoted as TID1 (for molwt prediction), meets quorum and is hence indexed at ../target/automl_example/data/TID1/

  • -
  • resulting folder data/TID1 comprises the following processed data:

    -
      -
    • TID1.csv : molecular property data set ready for modelling

    • -
    • TID1.json: config for an initial round of model training

    • -
    • TID1.sh: used to run Qptuna AutoML via an sbatch command, though --dry-run prevented this from happening

    • -
    • .24_01_01 lock file initiated to track the status of the training at this timepoint

    • -
    -
  • -
  • processed_timepoints.json is created to track which timepoints are processed

  • -
-
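The lock file in the listing above is a hidden file named after the timepoint inside the task directory. A minimal sketch of that convention, using a temporary directory, is shown below. The helper names `lock_path` and `is_locked` are hypothetical, and the real script may store additional state.

```python
import tempfile
from pathlib import Path


def lock_path(task_dir, timepoint):
    """Lock files are hidden files named after the timepoint, e.g. .24_01_01"""
    return Path(task_dir) / f".{timepoint}"


def is_locked(task_dir, timepoint):
    return lock_path(task_dir, timepoint).exists()


with tempfile.TemporaryDirectory() as tmp:
    task_dir = Path(tmp) / "data" / "TID1"
    task_dir.mkdir(parents=True)

    before = is_locked(task_dir, "24_01_01")   # no lock yet
    lock_path(task_dir, "24_01_01").touch()    # work starts: set the lock
    during = is_locked(task_dir, "24_01_01")
    lock_path(task_dir, "24_01_01").unlink()   # work finished: remove the lock
    after = is_locked(task_dir, "24_01_01")

print(before, during, after)
```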

The script stopped at this point, to allow for HPC resources to submit the initial optimisation job. Subsequent runs of the Qptuna AutoML are required to progress past the initial optimisation run, and so could be scheduled (e.g. using cron or similar).

-

Running the AutoML workflow again performs a dry-run check of the status of the run:

-
-
[109]:
-
-
-
with patch.object(sys, "argv", aml_args):
-    automl.main()
-
-
-
-
-
-
-
-
-2024-08-23 12:51:18,513.513 DEBUG base - setup: <pid.posix.PidFile object at 0x7ff3f48a5c00> entering setup
-2024-08-23 12:51:18,515.515 DEBUG base - create: <pid.posix.PidFile object at 0x7ff3f48a5c00> create pidfile: /Users/kljk345/PycharmProjects/optuna_az/notebooks/prog.pid
-2024-08-23 12:51:18,515.515 DEBUG base - check: <pid.posix.PidFile object at 0x7ff3f48a5c00> check pidfile: /Users/kljk345/PycharmProjects/optuna_az/notebooks/prog.pid
-2024-08-23 12:51:18,519.519 INFO automl - main: Namespace(output_path='../target/automl_example', email='test@test.com', user_name='test', input_data='../tests/data/automl/*', input_smiles_csv_column='canonical', input_activity_csv_column='molwt', input_task_csv_column='one_taskid', input_initial_template='../examples/automl/config.initial.template', input_retrain_template='../examples/automl/config.retrain.template', input_slurm_template='../examples/slurm-scripts/automl.template', quorum=25, n_cores=1, dry_run=True, verbose=2, slurm_req_cores=12, slurm_req_mem=None, slurm_req_partition='dgx', slurm_al_pool='../tests/data/DRD2/subset-1000/train.csv', slurm_al_smiles_csv_column='canonical', slurm_job_prefix='testaml', slurm_failure_cores_increment=4, slurm_failure_mem_increment=20, slurm_failure_mins_increment=720, slurm_failure_max_retries=5, slurm_failure_max_mem=200, slurm_failure_max_cpu=20, save_previous_models=False)
-2024-08-23 12:51:18,524.524 DEBUG automl - main: Processing timepoint 24_01_01
-2024-08-23 12:51:18,526.526 DEBUG automl - first_run: ../target/automl_example/processed_timepoints.json exists
-2024-08-23 12:51:18,527.527 DEBUG automl - checkSkipped: ../target/automl_example/data/TID1/.skip not present
-2024-08-23 12:51:18,527.527 DEBUG automl - checkisLocked: 24_01_01: Lockfile [../target/automl_example/data/TID1/.24_01_01] locks the taskcode [TID1]
-2024-08-23 12:51:18,529.529 DEBUG automl - checkRunningSlurmJobs: Dry run of /usr/bin/squeue
-2024-08-23 12:51:18,531.531 WARNING automl - resubmitAnyFailedJobs: ../target/automl_example/data/TID1/TID1.sh never ran, so will be resubmit
-2024-08-23 12:51:18,534.534 DEBUG automl - submitJob: Dry run of /usr/bin/sbatch ../target/automl_example/data/TID1/TID1.sh
-2024-08-23 12:51:18,535.535 INFO automl - resubmitAnyFailedJobs: ../target/automl_example/data/TID1/TID1.sh resubmit (3 retrys)
-2024-08-23 12:51:18,535.535 INFO automl - resubmitAnyFailedJobs: Some jobs were resubmitted: ['TID1']
-2024-08-23 12:51:18,535.535 INFO automl - main: Exiting: 24_01_01 lock(s) indicate(s) work ongoing
-2024-08-23 12:51:18,535.535 INFO automl - main: AutoML script took [0.019070148] seconds.
-2024-08-23 12:51:18,536.536 DEBUG base - close: <pid.posix.PidFile object at 0x7ff3f48a5c00> closing pidfile: /Users/kljk345/PycharmProjects/optuna_az/notebooks/prog.pid
-
-
-

The subsequent run of the code above correctly identifies that the job was never submitted to the SLURM queue (due to the dry run), and correctly increases the number of retries: “resubmitAnyFailedJobs: ... resubmit (3 retrys)”, up from 2 in the previous run.

-

NB: if an actual job fails with a reported reason, dynamic resource allocation will attempt to increase the job time/mem/cpu and resubmit to SLURM, so that a job which failed due to insufficient resources has a better chance of succeeding.
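The resubmission logic can be pictured as a capped increment applied on every retry. The sketch below is hypothetical: the `Resources` class and `bump` function are invented for illustration, the increments and limits mirror the `slurm_failure_*` defaults shown in the logged Namespace, and the starting time limit of 720 minutes is an arbitrary choice.

```python
from dataclasses import dataclass


@dataclass
class Resources:
    cores: int
    mem_gb: int
    minutes: int
    retries: int = 0


def bump(res, cores_inc=4, mem_inc=20, mins_inc=720,
         max_cores=20, max_mem=200, max_retries=5):
    """Return increased resources for a resubmission, or None to give up."""
    if res.retries >= max_retries:
        return None
    return Resources(
        cores=min(res.cores + cores_inc, max_cores),  # capped at max_cores
        mem_gb=min(res.mem_gb + mem_inc, max_mem),    # capped at max_mem
        minutes=res.minutes + mins_inc,
        retries=res.retries + 1,
    )


# Simulate a job that keeps failing until the retry limit is reached.
job = Resources(cores=12, mem_gb=60, minutes=720)
history = []
while job is not None:
    history.append(job)
    job = bump(job)

print(history[-1])
```

With these numbers the core count reaches its cap after two retries, while memory and time keep growing until the retry limit stops further resubmission.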

-

We can now emulate a successful run by copying an example trained model to the directory:

-
-
[110]:
-
-
-
import shutil
-import os
-
-os.remove('../target/automl_example/data/TID1/.24_01_01') # Remove the lock
-shutil.copy('../tests/data/DRD2/drd2_reg.pkl', '../target/automl_example/data/TID1/') # Add the example model
-
-with patch.object(sys, "argv", aml_args):
-    automl.main()
-
-
-
-
-
-
-
-
-2024-08-23 12:51:18,549.549 DEBUG base - setup: <pid.posix.PidFile object at 0x7ff3f48a5e40> entering setup
-2024-08-23 12:51:18,549.549 DEBUG base - create: <pid.posix.PidFile object at 0x7ff3f48a5e40> create pidfile: /Users/kljk345/PycharmProjects/optuna_az/notebooks/prog.pid
-2024-08-23 12:51:18,550.550 DEBUG base - check: <pid.posix.PidFile object at 0x7ff3f48a5e40> check pidfile: /Users/kljk345/PycharmProjects/optuna_az/notebooks/prog.pid
-2024-08-23 12:51:18,552.552 INFO automl - main: Namespace(output_path='../target/automl_example', email='test@test.com', user_name='test', input_data='../tests/data/automl/*', input_smiles_csv_column='canonical', input_activity_csv_column='molwt', input_task_csv_column='one_taskid', input_initial_template='../examples/automl/config.initial.template', input_retrain_template='../examples/automl/config.retrain.template', input_slurm_template='../examples/slurm-scripts/automl.template', quorum=25, n_cores=1, dry_run=True, verbose=2, slurm_req_cores=12, slurm_req_mem=None, slurm_req_partition='dgx', slurm_al_pool='../tests/data/DRD2/subset-1000/train.csv', slurm_al_smiles_csv_column='canonical', slurm_job_prefix='testaml', slurm_failure_cores_increment=4, slurm_failure_mem_increment=20, slurm_failure_mins_increment=720, slurm_failure_max_retries=5, slurm_failure_max_mem=200, slurm_failure_max_cpu=20, save_previous_models=False)
-2024-08-23 12:51:18,555.555 DEBUG automl - main: Processing timepoint 24_01_01
-2024-08-23 12:51:18,556.556 DEBUG automl - first_run: ../target/automl_example/processed_timepoints.json exists
-2024-08-23 12:51:18,557.557 DEBUG automl - checkSkipped: ../target/automl_example/data/TID1/.skip not present
-2024-08-23 12:51:18,558.558 DEBUG automl - checkisLocked: 24_01_01: Lockfile [../target/automl_example/data/TID1/.24_01_01] not set; no lock for taskcode [TID1]
-2024-08-23 12:51:18,576.576 DEBUG automl - processRetraining: TID1: Fist timepoint
-2024-08-23 12:51:18,577.577 INFO automl - main: Work appears complete for timepoint 24_01_01
-2024-08-23 12:51:18,577.577 DEBUG automl - setProcessedTimepoints: Appended processed timepoint 24_01_01 to ../target/automl_example/processed_timepoints.json
-2024-08-23 12:51:18,581.581 DEBUG automl - getRetrainingData: 24_01_01 is in processed_timepoints.json
-2024-08-23 12:51:18,585.585 DEBUG automl - main: Processing timepoint 24_02_01
-2024-08-23 12:51:18,586.586 DEBUG automl - first_run: ../target/automl_example/processed_timepoints.json exists
-2024-08-23 12:51:18,587.587 DEBUG automl - checkSkipped: ../target/automl_example/data/TID1/.skip not present
-2024-08-23 12:51:18,588.588 DEBUG automl - checkisLocked: 24_02_01: Lockfile [../target/automl_example/data/TID1/.24_02_01] not set; no lock for taskcode [TID1]
-2024-08-23 12:51:18,592.592 DEBUG automl - processTrain: TID1: 152 new data points found
-2024-08-23 12:51:18,598.598 DEBUG automl - processRetraining: TID1: Dynamic resource allocation mem: 60G
-2024-08-23 12:51:18,599.599 DEBUG automl - processRetraining: TID1: 24_02_01: No temporal predictions since [No previous model found for [../target/automl_example/data/TID1/TID1.pkl]]
-2024-08-23 12:51:18,601.601 DEBUG automl - writeDataset: wrote dataset to ../target/automl_example/data/TID1/TID1.csv
-2024-08-23 12:51:18,608.608 DEBUG automl - writeSlurm: wrote slurm to ../target/automl_example/data/TID1/TID1.sh
-2024-08-23 12:51:18,611.611 DEBUG automl - writeJson: wrote json to ../target/automl_example/data/TID1/TID1.json
-2024-08-23 12:51:18,612.612 DEBUG automl - setJobLocked: lock_file for ../target/automl_example/data/TID1/.24_02_01 was set
-2024-08-23 12:51:18,612.612 DEBUG automl - submitJob: Dry run of /usr/bin/sbatch ../target/automl_example/data/TID1/TID1.sh
-2024-08-23 12:51:18,613.613 INFO automl - main: Exiting at this timepoint since there is work to do
-2024-08-23 12:51:18,613.613 DEBUG automl - main: Work: ['TID1']
-2024-08-23 12:51:18,614.614 INFO automl - main: AutoML script took [0.062794924] seconds.
-2024-08-23 12:51:18,614.614 DEBUG base - close: <pid.posix.PidFile object at 0x7ff3f48a5e40> closing pidfile: /Users/kljk345/PycharmProjects/optuna_az/notebooks/prog.pid
-
-
-

We observe that after removing the lock file and emulating a trained model, the AutoML pipeline correctly identifies that “Work appears complete for timepoint 24_01_01”.
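The checks visible in the log (`.skip` not present, lock file `.24_01_01` not set) can be sketched as below. This is an illustration only, not the Qptuna code: `timepoint_state` is a hypothetical helper, and treating any `.pkl` file in the task directory as evidence of a finished model is an assumption made for the sketch.

```python
from pathlib import Path


def timepoint_state(task_dir: str, timepoint: str) -> str:
    """Classify a task directory for one timepoint (hypothetical helper)."""
    task = Path(task_dir)
    if (task / ".skip").exists():
        return "skipped"      # task is explicitly excluded
    if (task / f".{timepoint}").exists():
        return "locked"       # a job for this timepoint is still marked as running
    if any(task.glob("*.pkl")):
        return "complete"     # assumption: a pickled model means the work is done
    return "pending"          # nothing submitted yet
```

With this sketch, `timepoint_state('../target/automl_example/data/TID1', '24_01_01')` would move from `"locked"` to `"complete"` once the lock file is removed and a model file is copied in, which is the situation emulated in the cell above.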

-

The output shows how the pipeline proceeds to the next timepoint, 24_02_01, which has 152 new datapoints and is dynamically allocated 60G of requested SLURM memory. A dry run to generate predictions for the pseudo-prospective performance of a “24_01_01” model vs. new data from the “24_02_01” timepoint is now initiated.

-

Once the pseudo-prospective performance has been generated, the next round of optimisation, build and active learning predictions is initiated for the next timepoint, and the process continues in the same way for each subsequent timepoint.
