From f2ee03b1b065b684af76d0f9b96a479ab7584208 Mon Sep 17 00:00:00 2001
From: Jake Crawford
Date: Mon, 9 Sep 2024 18:56:55 -0400
Subject: [PATCH] Update content/91.supp-info.md

Co-authored-by: Casey Greene
---
 content/91.supp-info.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/content/91.supp-info.md b/content/91.supp-info.md
index 9252c5d..6c0de4e 100644
--- a/content/91.supp-info.md
+++ b/content/91.supp-info.md
@@ -45,6 +45,7 @@ For both the "best" and "smallest good" model selection approaches, this effect
 
 Based on these results, given the observation that the mean difference in model performance is fairly small in both "frequent CNV" and "rare CNV" cases, and for both model selection approaches, we conclude that combining point mutation and CNV data and including the target gene in the feature set are reasonable general rules for our pan-cancer and pan-gene study.
 In general, our focus is less on individual prediction performance and more on model complexity, which is another degree removed from the individual prediction performance.
+In addition, including the target gene would seem most likely to increase the benefit of smaller models, as the single-gene could be considered particularly information rich.
 However, the exceptions that we pointed out above emphasize the importance of considering the biological context in applications to specific driver genes or prediction problems.
 
 ![Bar plot showing difference in performance (AUPR) between models including and excluding the target gene, for genes where CNV changes are (top) and are not (bottom) frequently included in the label set, colored by model selection approach. Positive values represent better performance for the “control” model, and negative values better performance for the “drop target” model.](images/supp_figure_1.png){#fig:supp_note tag="S1" width="100%"}