Bug in the 'quant_bins' module #533

Open
quliping opened this issue Jan 7, 2024 · 4 comments

@quliping

quliping commented Jan 7, 2024

> The abundance value is similar to TPM in RNA-seq. It represents the average read coverage of a bin per million reads, so it is really a standardized read coverage estimate.

Salmon first produces the coverage values for each contig. Then I calculate the likely coverage of the bin overall by taking the average, but accounting for the lengths of the contigs. For example, if a bin had these contigs: 1000bp at cov=10, 5000bp at cov=5, 10000bp at cov=11, then Ave_cov = (10*1000 + 5*5000 + 11*10000)/(1000+5000+10000) = 145000/16000 = 9.0625
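
The length-weighted average described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic in the quoted example, not the code in metawrap; the function name `length_weighted_coverage` and the `(length, coverage)` tuple layout are assumptions made for this sketch.

```python
def length_weighted_coverage(contigs):
    """Average coverage of a bin, weighting each contig by its length.

    `contigs` is a sequence of (length_bp, coverage) pairs. The layout is
    hypothetical and chosen only for this illustration.
    """
    contigs = list(contigs)  # allow generators, since we iterate twice
    total_length = sum(length for length, _ in contigs)
    if total_length == 0:
        raise ValueError("bin contains no sequence")
    weighted_sum = sum(length * cov for length, cov in contigs)
    return weighted_sum / total_length


# The three contigs from the quoted example.
example_bin = [(1000, 10), (5000, 5), (10000, 11)]
print(length_weighted_coverage(example_bin))  # 9.0625
```

Long contigs dominate the result, which is why the 10000bp contig at cov=11 pulls the average well above the coverage of the 5000bp contig.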

In the quant_bins module of metawrap, I found some problems... The script 'split_salmon_out_into_bins.py' is used to summarize the TPM results for the MAGs in metawrap, right? However, I found that the calculation method in your script is totally different from what you described... It seems that the script just takes the median of a list of TPM values? I wrote my detailed comments after "##" in the file 'compare.txt': compare.txt
[Image: 00_bug]
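
To make the discrepancy concrete, here is a minimal sketch comparing a median of per-contig values with a length-weighted mean, using the numbers from the quoted example. It assumes, as this comment suggests, that the script takes a median; it does not reproduce the actual contents of 'split_salmon_out_into_bins.py', and both function names are hypothetical.

```python
from statistics import median


def median_abundance(contigs):
    """Median of per-contig values, ignoring contig length."""
    return median(value for _, value in contigs)


def weighted_abundance(contigs):
    """Mean of per-contig values, weighted by contig length."""
    total_length = sum(length for length, _ in contigs)
    return sum(length * value for length, value in contigs) / total_length


# (length_bp, value) pairs taken from the quoted example.
contigs = [(1000, 10.0), (5000, 5.0), (10000, 11.0)]
print(median_abundance(contigs))    # 10.0
print(weighted_abundance(contigs))  # 9.0625
```

The two summaries agree only when contig values are fairly uniform; with uneven values or lengths they can differ noticeably, which is the concern raised here.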

Besides, I think both TPM and Ave_cov in metawrap are just the relative abundance of a MAG among all MAGs, or of a contig among all contigs in an assembly, right? If I want to compare the abundance of one or multiple MAGs across different samples, but these MAGs are only a subset of all MAGs retrieved from these samples, or were even obtained from other unrelated samples, what should I do? For example, I have 12 genomes (12 different species) of a genus; some were retrieved from my 80 samples and some are reference genomes. I want to know the abundance of the genus in the 80 samples. TPM seems inappropriate because I will get 80 totals of '1,000,000'... I can only compare the relative abundance differences among the 12 species within the 80 samples, rather than the abundance difference of the entire genus across the 80 samples. The CPM of metawrap also seems inappropriate because it is very similar to TPM.
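
The point about TPM can be illustrated with a small sketch. It uses the standard TPM definition (reads per base, rescaled so each sample sums to one million); the genome names, lengths, and read counts are made up for illustration and do not come from any real sample.

```python
import math


def tpm(read_counts, lengths_bp):
    """Standard TPM: per-base read rate, rescaled to sum to 1,000,000."""
    rates = {name: read_counts[name] / lengths_bp[name] for name in read_counts}
    total_rate = sum(rates.values())
    return {name: rate / total_rate * 1_000_000 for name, rate in rates.items()}


# Hypothetical genome lengths and read counts for two samples.
lengths = {"species_A": 3_000_000, "species_B": 2_000_000}
sample_low = {"species_A": 3_000, "species_B": 1_000}
# Same proportions, but ten times as many reads map to the genus.
sample_high = {"species_A": 30_000, "species_B": 10_000}

tpm_low = tpm(sample_low, lengths)
tpm_high = tpm(sample_high, lengths)

# Both samples sum to one million, so the tenfold difference is invisible.
print(round(sum(tpm_low.values())), round(sum(tpm_high.values())))
print({k: round(v) for k, v in tpm_low.items()})
print({k: round(v) for k, v in tpm_high.items()})
```

Because the rescaling is done within each sample over only the quantified genomes, the per-sample total carries no information about how abundant the genus is overall.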

Originally posted by @quliping in #84 (comment)

@bioinformaticsporter

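May I ask you a question? While installing the dependencies, some packages fail to install. Below is the command I ran. I set up a Python 2.7 environment, cloned the software from GitHub, and then tried to install the dependencies, but I am stuck at this step and don't know how to resolve it.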
conda install biopython blas=2.5 blast=2.6.0 bmtagger bowtie2 bwa checkm-genome fastqc kraken=1.1 kraken=2.0^Crona=2.7 matplotlib maxbin2 megahit metabat2 pandas prokka quast r-ggplot2 r-recommended salmon samtools=1.9 seaborn spades trim-galore
(metawrap) [svip019 @ cloud 21:17:42 ~]
$ conda install biopython blas=2.5 blast=2.6.0 bmtagger bowtie2 bwa checkm-genome fastqc kraken=1.1.1 kraken=2.1.3 krona=2.7 matplotlib maxbin2 megahit metabat2 pandas prokka quast r-ggplot2 r-recommended salmon samtools=1.9 seaborn spades trim-galore
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.

PackagesNotFoundError: The following packages are not available from current channels:

  • kraken[version='1.1.1.,2.1.3.']

Current channels:

To search for alternate channels that may provide the conda package you're
looking for, navigate to

https://anaconda.org

and use the search bar at the top of the page.

@tfaitova1

tfaitova1 commented Mar 4, 2024

@quliping Did you use other software to calculate an abundance table for your bins? I am currently struggling to find a suitable method; metawrap quant_bins does not seem usable because of bugs and the concerns you raised.
I only know about bedtools genomecov; have you tried it?

Many thanks,
Tereza

@tfaitova1

@ursky Could you please comment on the concerns in the split_salmon_out_into_bins.py script that @quliping raised?

@yqy6611

yqy6611 commented Aug 17, 2024

Seems like the first line commented in the script is what is illustrated in the paper.
