Parameter combination recommendations #215

Open
kiran-lee opened this issue Jul 3, 2024 · 1 comment

kiran-lee commented Jul 3, 2024

On which platform/server? (Windows? Windows Sublinux? MacOS? Ubuntu? etc.)

Linux

MitoZ version?

3.6

How did you install MitoZ? (e.g. Docker, Udocker, Singularity, Conda-Pack, Conda, or source code)

Conda

Did you run a test after your installation, and was the test run okay?

Yes. OK.

How much data (roughly) did you use for mitogenome assembly? e.g. 5Gbp?

25 Gbp.

The command you used?

mitoz all \
  --outprefix sw \
  --thread_number 20 \
  --clade Chordata \
  --requiring_taxa Chordata \
  --genetic_code 2 \
  --species_name "Seychelles warbler" \
  --fq1 102_ACTTAGATCG-CGGAATTCTT_L002__trimmed_paired_R1.fastq.gz \
  --fq2 102_ACTTAGATCG-CGGAATTCTT_L002__trimmed_paired_R2.fastq.gz \
  --fastq_read_length 151 \
  --data_size_for_mt_assembly 25,0 \
  --assembler megahit \
  --kmers_megahit 39 59 79 99 119 141 \
  --memory 100 \
  --requiring_taxa Chordata \
  --min_abundance 0

Problem description

From your experience, do you have suggestions for parameter combinations to use on raw paired-end reads from a sample with a mean read depth of 15x?

I have tried 13 combinations that vary in 1) the sample used (either a ~17x or a 10x coverage sample), 2) the assembler used (megahit or spades), 3) the data size used for assembly (5, 25, 50 and 80 Gbp), 4) the k-mers ("Large": 39 59 79 99 119 141, or "Small": 21 31 41 51 61 71 81 91) and 5) whether the reads were trimmed or not. The attached table summarises the combinations I have tried (MitoZ_combinations.xlsx).
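For reference, the five factors listed above span a full grid of 2 x 2 x 4 x 2 x 2 = 64 combinations, of which 13 were tried. The sketch below only enumerates that grid as labelled rows so that untested combinations are easy to spot. It assumes the two samples are the ones identified as 102 and 53 in this issue, and the `grid` labels and field names are hypothetical. It does not build or run any MitoZ command.

```shell
#!/usr/bin/env bash
# Enumerate the full parameter grid described in the issue (labels only).
set -euo pipefail

list_combinations() {
    local n=0 sample assembler size kmers trimmed
    for sample in 102 53; do                  # assumed sample identifiers
        for assembler in megahit spades; do
            for size in 5 25 50 80; do        # data size for assembly, in Gbp
                for kmers in large small; do  # the two k-mer sets from the issue
                    for trimmed in yes no; do
                        n=$((n + 1))
                        printf 'grid%d\tsample=%s\tassembler=%s\tdata_size=%s\tkmers=%s\ttrimmed=%s\n' \
                            "$n" "$sample" "$assembler" "$size" "$kmers" "$trimmed"
                    done
                done
            done
        done
    done
}

list_combinations
```

The tab-separated output can be pasted next to the spreadsheet of tried combinations to see which parts of the grid remain unexplored.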

The command that works best (shown above) finds all genes, but the assembly is non-circular and consists of two sequences (two seq_id entries; see combo5summary.txt). The read depth across the genome looks OK apart from the beginning (combo5circos.depth.txt). This is the command:

mitoz all \
  --outprefix sw \
  --thread_number 20 \
  --clade Chordata \
  --requiring_taxa Chordata \
  --genetic_code 2 \
  --species_name "Seychelles warbler" \
  --fq1 102_ACTTAGATCG-CGGAATTCTT_L002__trimmed_paired_R1.fastq.gz \
  --fq2 102_ACTTAGATCG-CGGAATTCTT_L002__trimmed_paired_R2.fastq.gz \
  --fastq_read_length 151 \
  --data_size_for_mt_assembly 25,0 \
  --assembler megahit \
  --kmers_megahit 39 59 79 99 119 141 \
  --memory 100 \
  --requiring_taxa Chordata \
  --min_abundance 0

The raw paired-end reads can be found here:
102: https://cgr.liv.ac.uk/illum/LIMS26629_51a15827930a0b65/Raw/Sample_102/
53: https://cgr.liv.ac.uk/illum/LIMS25133_4f8b5ec41474a239/Raw/Sample_53-11998DH0147L01_4879/

Log messages from MitoZ (stdout and stderr, e.g., both m.log and m.err files)

Attached as combo5.log and combo5errorsummaryval.txt.

@linzhi2013
Owner

Hi @kiran-lee ,

Thanks for your detailed explanation!

Based on my experience with mammals (your samples are birds), 2-5 Gbp, or 8 Gbp, is enough to assemble a circular mitogenome.

I have no better recommendations for now. But maybe you could map all the raw data to the mitogenomes of some closely related species, using a loose cutoff so that many alignable reads are kept, and then assemble the mitogenome from the mapped reads with MitoZ?

Best
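
The read-baiting idea suggested above could be sketched roughly as follows. This is only one possible way to do it: the reply names no mapper, so `bwa mem` and `samtools` are swapped in here as an assumption, all file names are hypothetical placeholders, and the relaxed `-k 15 -T 20` thresholds are illustrative values, not tested recommendations. By default the script runs in dry-run mode and only prints the commands it would execute.

```shell
#!/usr/bin/env bash
# Sketch: bait mitochondrial reads by mapping against related mitogenomes.
set -euo pipefail

# All file names below are hypothetical placeholders.
REF="${REF:-related_mitogenomes.fasta}"   # mitogenomes of closely related species
FQ1="${FQ1:-sample_R1.fastq.gz}"
FQ2="${FQ2:-sample_R2.fastq.gz}"
OUT="${OUT:-sw_baited}"
THREADS="${THREADS:-20}"
DRY_RUN="${DRY_RUN:-1}"                   # 1 = only print the commands

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

bait_reads() {
    run bwa index "$REF"
    # A shorter seed length (-k) and a lower output score (-T) than the bwa
    # defaults (19 and 30) make the mapping more permissive; values are illustrative.
    run bwa mem -t "$THREADS" -k 15 -T 20 -o "$OUT.sam" "$REF" "$FQ1" "$FQ2"
    # -G 12 drops only pairs in which both the read and its mate are unmapped,
    # so a pair is kept as soon as one mate aligns.
    run samtools view -b -G 12 -o "$OUT.bam" "$OUT.sam"
    # Write the retained pairs back to paired FASTQ files.
    run samtools fastq -n -1 "${OUT}_R1.fastq.gz" -2 "${OUT}_R2.fastq.gz" \
        -0 /dev/null -s /dev/null "$OUT.bam"
}

bait_reads
```

Setting `DRY_RUN=0` executes the commands instead of printing them. The resulting `_R1`/`_R2` FASTQ pair would then be passed to `--fq1` and `--fq2` of the `mitoz all` command in place of the full read set.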
