diff --git a/docs/source/benchmarks/img_benchmarks/binary_result_barplot.png b/docs/source/benchmarks/img_benchmarks/binary_result_barplot.png new file mode 100644 index 0000000000..0d70e51346 Binary files /dev/null and b/docs/source/benchmarks/img_benchmarks/binary_result_barplot.png differ diff --git a/docs/source/benchmarks/img_benchmarks/cd-all-1h8c-constantpredictor.png b/docs/source/benchmarks/img_benchmarks/cd-all-1h8c-constantpredictor.png new file mode 100644 index 0000000000..321586ee3f Binary files /dev/null and b/docs/source/benchmarks/img_benchmarks/cd-all-1h8c-constantpredictor.png differ diff --git a/docs/source/benchmarks/img_benchmarks/cd-binary-classification-1h8c-constantpredictor.png b/docs/source/benchmarks/img_benchmarks/cd-binary-classification-1h8c-constantpredictor.png new file mode 100644 index 0000000000..94338c633d Binary files /dev/null and b/docs/source/benchmarks/img_benchmarks/cd-binary-classification-1h8c-constantpredictor.png differ diff --git a/docs/source/benchmarks/img_benchmarks/cd-multiclass-classification-1h8c-constantpredictor.png b/docs/source/benchmarks/img_benchmarks/cd-multiclass-classification-1h8c-constantpredictor.png new file mode 100644 index 0000000000..e9a6bce020 Binary files /dev/null and b/docs/source/benchmarks/img_benchmarks/cd-multiclass-classification-1h8c-constantpredictor.png differ diff --git a/docs/source/benchmarks/img_benchmarks/fedot_class_gluon.png b/docs/source/benchmarks/img_benchmarks/fedot_class_gluon.png deleted file mode 100644 index c5e754cb35..0000000000 Binary files a/docs/source/benchmarks/img_benchmarks/fedot_class_gluon.png and /dev/null differ diff --git a/docs/source/benchmarks/img_benchmarks/fedot_classregr.png b/docs/source/benchmarks/img_benchmarks/fedot_classregr.png deleted file mode 100644 index 3f1ff8b3b2..0000000000 Binary files a/docs/source/benchmarks/img_benchmarks/fedot_classregr.png and /dev/null differ diff --git a/docs/source/benchmarks/img_benchmarks/fedot_meta.png 
b/docs/source/benchmarks/img_benchmarks/fedot_meta.png deleted file mode 100644 index 24c22403f5..0000000000 Binary files a/docs/source/benchmarks/img_benchmarks/fedot_meta.png and /dev/null differ diff --git a/docs/source/benchmarks/img_benchmarks/fedot_time_series.png b/docs/source/benchmarks/img_benchmarks/fedot_time_series.png deleted file mode 100644 index 18ec5b32d4..0000000000 Binary files a/docs/source/benchmarks/img_benchmarks/fedot_time_series.png and /dev/null differ diff --git a/docs/source/benchmarks/img_benchmarks/multiclass_result_barplot.png b/docs/source/benchmarks/img_benchmarks/multiclass_result_barplot.png new file mode 100644 index 0000000000..d082f4ec75 Binary files /dev/null and b/docs/source/benchmarks/img_benchmarks/multiclass_result_barplot.png differ diff --git a/docs/source/benchmarks/img_benchmarks/stats.png b/docs/source/benchmarks/img_benchmarks/stats.png deleted file mode 100644 index 63f68efd5e..0000000000 Binary files a/docs/source/benchmarks/img_benchmarks/stats.png and /dev/null differ diff --git a/docs/source/benchmarks/img_benchmarks/ts_metrics.png b/docs/source/benchmarks/img_benchmarks/ts_metrics.png deleted file mode 100644 index f080f7f005..0000000000 Binary files a/docs/source/benchmarks/img_benchmarks/ts_metrics.png and /dev/null differ diff --git a/docs/source/benchmarks/tabular.rst b/docs/source/benchmarks/tabular.rst index 0ff6dd00a7..1c23815254 100644 --- a/docs/source/benchmarks/tabular.rst +++ b/docs/source/benchmarks/tabular.rst @@ -2,53 +2,62 @@ Tabular data ------------ Here are overall classification problem results across state-of-the-art AutoML frameworks -using `AMLB `__ test suite: +using self-run tasks from the OpenML test suite (10-fold runs): .. 
csv-table:: - :header: Dataset, Metric, AutoGluon, FEDOT, H2O, TPOT - - adult, auc, 0.91001, 0.91529, **0.93077**, 0.92729 - airlines, auc, 0.72491, 0.65378, **0.73039**, 0.69368 - albert, auc, **0.73903**, 0.72765, nan, nan - amazon_employee_access, auc, 0.85715, 0.85911, **0.87281**, 0.86625 - apsfailure, auc, 0.99062, 0.98999, **0.99252**, 0.99044 - australian, auc, **0.93953**, 0.93785, 0.93857, 0.93604 - bank-marketing, auc, 0.93126, 0.93245, **0.93860**, 0.93461 - blood-transfusion, auc, 0.68959, 0.72444, **0.75949**, 0.74019 - christine, auc, 0.80429, 0.80446, **0.81936**, 0.80669 - credit-g, auc, **0.79529**, 0.78458, 0.79357, 0.79381 - guillermo, auc, **0.89967**, 0.89125, nan, 0.78331 - jasmine, auc, 0.88312, 0.88548, 0.88734, **0.89038** - kc1, auc, 0.82226, 0.83857, nan, **0.84481** - kddcup09_appetency, auc, 0.80447, 0.78778, **0.82912**, 0.82556 - kr-vs-kp, auc, 0.99886, 0.99925, 0.99972, **0.99976** - miniboone, auc, 0.98217, 0.98102, nan, **0.98346** - nomao, auc, 0.99483, 0.99420, **0.99600**, 0.99538 - numerai28_6, auc, 0.51655, 0.52161, **0.53052**, nan - phoneme, auc, 0.96542, 0.96448, 0.96751, **0.97070** - riccardo, auc, **0.99970**, 0.99794, nan, nan - sylvine, auc, 0.98470, 0.98496, 0.98936, **0.99339** - car, neg_logloss, -0.11659, -0.08885, **-0.00347**, -0.64257 - cnae-9, neg_logloss, -0.33208, -0.27010, -0.21849, **-0.15369** - connect-4, neg_logloss, -0.50157, -0.47033, **-0.33770**, -0.37349 - covertype, neg_logloss, **-0.07140**, -0.14096, -0.26422, nan - dilbert, neg_logloss, -0.14967, -0.24455, **-0.07643**, -0.16839 - dionis, neg_logloss, **-2.15760**, nan, nan, nan - fabert, neg_logloss, -0.78781, -0.90152, **-0.77194**, -0.89159 - fashion-mnist, neg_logloss, **-0.33257**, -0.38379, -0.38328, -0.53549 - helena, neg_logloss, **-2.78497**, -6.34863, -2.98020, -2.98157 - jannis, neg_logloss, -0.72838, -0.76192, **-0.69123**, -0.70310 - jungle_chess, neg_logloss, -0.43064, -0.27074, -0.23952, **-0.21872** - mfeat-factors, neg_logloss, 
-0.16118, -0.17412, **-0.09296**, -0.10726 - robert, neg_logloss, **-1.68431**, -1.74509, nan, nan - segment, neg_logloss, -0.09419, -0.09643, **-0.05962**, -0.07711 - shuttle, neg_logloss, -0.00081, -0.00101, **-0.00036**, nan - vehicle, neg_logloss, -0.51546, -0.42776, **-0.33137**, -0.39150 - volkert, neg_logloss, **-0.92007**, -1.04485, -0.97797, nan - -The statistical analysis was conducted using the Friedman t-test. -The results of experiments and analysis confirm that FEDOT results are statistically indistinguishable -from SOTA competitors H2O, AutoGluon and TPOT (see below). - -.. image:: img_benchmarks/stats.png \ No newline at end of file + :header: Dataset,FEDOT,AutoGluon,H2O + + adult,0.874,0.874,0.875,0.874 + airlines,0.669,0.669,0.675,0.617 + airlinescodrnaadult,0.812,-,0.818,0.809 + albert,0.670,0.669,0.697,0.667 + amazon_employee_access,0.949,0.947,0.951,0.953 + apsfailure,0.994,0.994,0.995,0.995 + australian,0.871,0.870,0.865,0.860 + bank-marketing,0.910,0.910,0.910,0.899 + blood-transfusion,0.747,0.697,0.797,0.746 + car,1.000,1.000,0.998,0.998 + christine,0.746,0.746,0.748,0.737 + click_prediction_small,0.835,0.835,0.777,0.777 + cnae-9,0.957,0.954,0.957,0.954 + connect-4,0.792,0.788,0.865,0.867 + covertype,0.964,0.966,0.976,0.952 + credit-g,0.753,0.759,0.766,0.727 + dilbert,0.985,0.982,0.996,0.984 + fabert,0.688,0.685,0.726,0.534 + fashion-mnist,0.885,-,0.734,0.718 + guillermo,0.821,-,0.915,0.897 + helena,0.332,0.333,-,0.318 + higgs,0.731,0.732,0.369,0.336 + jannis,0.718,0.718,0.743,0.719 + jasmine,0.817,0.821,0.734,0.727 + jungle_chess_2pcs_raw_endgame_complete,0.953,0.939,0.817,0.817 + kc1,0.866,0.867,0.996,0.947 + kddcup09_appetency,0.982,0.982,0.866,0.818 + kr-vs-kp,0.995,0.996,0.982,0.962 + mfeat-factors,0.980,0.979,0.980,0.980 + miniboone,0.948,0.948,0.952,0.949 + nomao,0.969,0.970,0.975,0.974 + numerai28_6,0.523,0.522,0.522,0.505 + phoneme,0.915,0.916,0.916,0.910 + riccardo,0.997,-,0.998,0.997 + robert,0.405,-,0.559,0.487 + 
segment,0.982,0.982,0.982,0.980 + shuttle,1.000,1.000,1.000,1.000 + sylvine,0.952,0.951,0.952,0.948 + vehicle,0.851,0.849,0.846,0.835 + volkert,0.694,0.694,0.758,0.697 + Mean F1,0.838,0.837,0.833,0.812 + + +We also evaluated FEDOT on the `AMLB ` benchmark. +The critical difference plots below compare FEDOT (v.0.7.3) with H2O (v.3.46.0.4), AutoGluon (v.1.1.0), TPOT (v.0.12.1) and LightAutoML (v.0.3.7.3); +they were produced with the built-in visualizations of AutoMLBenchmark: + +.. image:: img_benchmarks/cd-all-1h8c-constantpredictor.png +.. image:: img_benchmarks/cd-binary-classification-1h8c-constantpredictor.png +.. image:: img_benchmarks/cd-multiclass-classification-1h8c-constantpredictor.png + +We can claim that FEDOT's results are statistically better than TPOT's and statistically indistinguishable from those of H2O and AutoGluon. + diff --git a/docs/source/faq/abstract.rst b/docs/source/faq/abstract.rst index 6ee8d1da81..7a4a76abfd 100644 --- a/docs/source/faq/abstract.rst +++ b/docs/source/faq/abstract.rst @@ -7,16 +7,16 @@ Abstract data-driven composite models. It can solve classification, regression, clustering, and forecasting problems.* -.. topic:: What FEDOT is framework. +.. topic:: Why is FEDOT a framework? *While the exact difference between 'library' and 'framework' is a bit ambiguous and context-dependent in many cases, we still consider FEDOT as a framework.* *The reason is that is can be used not only to solve pre-defined AutoML task, but also can be used to build new derivative solutions. - *As an examples:* `FEDOT.NAS`_, `FEDOT.Industrial`_. + Examples:* `FEDOT.NAS`_, `FEDOT.Industrial`_. -.. topic:: Why should I use FEDOT instead of existing state-of-the-art solutions (H2O/TPOT/etc)? +.. topic:: Why should I use FEDOT instead of existing state-of-the-art solutions (LightAutoML/AutoGluon/H2O/etc.)? *In practice, the existing AutoML solutions are really effective for the limited set of problems only. 
During the model learning, modern AutoML @@ -25,7 +25,7 @@ Abstract set of models (this approach is also referred to as the Combined Algorithm Selection and Hyperparameters optimization - CASH) since the overall learning and meta-learning process is extremely expensive. In - the Fedot we have used the composite models concept. We claim, + FEDOT we have used the composite models concept. We claim, that it allows us to solve many actual real-world problems in a more efficient way. Also, we are aimed to outperform the existing solutions even for well-known benchmarks (e.g. PMLB datasets).*