Tasks to address when adding the option to use the “full” TCGA reference cohort #164

Open
JMarzec opened this issue Aug 30, 2024 · 1 comment

Comments

@JMarzec (Member) commented Aug 30, 2024

Consider the following matters when adding the option to use the “full” TCGA patient reference cohort:

  1. Use static plots instead of interactive ones, in particular for plots with per-sample data points in the "Input data summary" and "Expression profiles" sections, to reduce both the run time and the size of the final report.
  2. Switch off saving the expression data (expression matrices) and summary tables: this step is computationally intensive and produces large files, which are used only by the RNA data portal.
  3. Check the run times reported in the Addendum to identify which time-consuming code chunks can be skipped to reduce the run time.
  4. Create a separate "RNAsum.data" repo with the expression matrix files, including the “full” TCGA patient reference cohort.
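A minimal sketch of how tasks 1 and 2 could hang off a single reference-cohort switch. The option values "full"/"partial" come from this thread, but the function name `settings_for_reference` and the flag names (`static_plots`, `save_expression_data`, `save_summary_tables`) are hypothetical, not RNAsum's actual API. It also assumes the current behaviour (interactive plots, saved matrices and tables) is kept for the "partial" cohort. RNAsum itself is an R package, so this Python version only illustrates the logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportSettings:
    """Hypothetical report settings derived from the reference-cohort choice."""
    static_plots: bool           # task 1: static instead of interactive plots
    save_expression_data: bool   # task 2: write expression matrices to disk
    save_summary_tables: bool    # task 2: write summary tables to disk


def settings_for_reference(reference: str) -> ReportSettings:
    """Return (hypothetical) settings for the "full" or "partial" TCGA cohort.

    Assumption: the "full" cohort trades interactivity and data-portal
    outputs for a shorter run time and a smaller report, while "partial"
    keeps the current behaviour.
    """
    if reference == "full":
        return ReportSettings(static_plots=True,
                              save_expression_data=False,
                              save_summary_tables=False)
    if reference == "partial":
        return ReportSettings(static_plots=False,
                              save_expression_data=True,
                              save_summary_tables=True)
    raise ValueError(f"unknown reference cohort option: {reference!r}")
```

Keeping the decision in one place means the report code would only need to read the resulting flags rather than re-checking the reference option in every section.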
@JMarzec (Member, Author) commented Sep 16, 2024

I ran RNAsum with both the “full” and “partial” TCGA patient reference cohort options for the following samples:

SBJ04426 BRCA
SBJ04187 BRCA
SBJ04296 BRCA
SBJ01649 PANCAN
SBJ04469 PANCAN
SBJ02061 PANCAN
SBJ02091 PANCAN
SBJ04376 PANCAN
SBJ04408 PANCAN

Attached are summary plots illustrating the following:

  • RNAsum processing time by sample
  • RNAsum processing time by chunk
  • RNAsum report size by sample

Based on the "RNAsum processing time by chunk" chart, the following R code chunks are the most computationally demanding (the note in parentheses indicates whether each chunk can be skipped when using the "full" TCGA reference option):

(can be skipped) data_transformation_plot
(keep) glance_expr_plot_immune_genes
(keep) pca
(keep) glance_expr_plot_cancer_genes
(can be skipped) data_transformation_display
(keep) glance_expr_plot_hrd_genes
(keep) top_hits_fusions
(keep) unnamed-chunk-1
(keep) rle

I'd also skip the "data_normalisation_plot", "scree_combined_data_display" and "rle_display" chunks, since their output is not readable given the number of included samples.
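The chunk selection above can be sketched as a filter over chunk labels. The chunk names are taken from this comment; the helper `chunks_to_run` and its `reference` argument are hypothetical. In the actual R Markdown report the equivalent would more likely be done through knitr's `eval` chunk option than in Python.

```python
# Chunks marked "(can be skipped)" above, plus those judged unreadable
# with the "full" TCGA reference cohort.
SKIPPABLE_WITH_FULL_REFERENCE = frozenset({
    "data_transformation_plot",
    "data_transformation_display",
    "data_normalisation_plot",
    "scree_combined_data_display",
    "rle_display",
})


def chunks_to_run(chunk_labels: list[str], reference: str) -> list[str]:
    """Return the chunk labels to evaluate, preserving report order.

    Hypothetical helper: with the "full" TCGA reference cohort the
    skippable chunks are dropped; otherwise every chunk is kept.
    """
    if reference != "full":
        return list(chunk_labels)
    return [label for label in chunk_labels
            if label not in SKIPPABLE_WITH_FULL_REFERENCE]


if __name__ == "__main__":
    labels = ["data_transformation_plot", "pca", "rle", "rle_display"]
    print(chunks_to_run(labels, "full"))  # ['pca', 'rle']
```

A fixed skip set keeps the "keep" chunks (e.g. `pca`, `rle`, `top_hits_fusions`) untouched, so the full-cohort report differs from the partial one only in the chunks listed here.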

[Attached figures: RNAsum processing time by sample; RNAsum report size by sample; RNAsum processing time by chunk]
