no clip option for fasterq-dump #952

Open
paulzierep opened this issue Jul 23, 2024 · 2 comments

Comments

@paulzierep

We discovered that FASTQ files downloaded from NCBI SRA via fasterq-dump differ from the FASTQ files stored at ENA. After some digging, we think this is probably due to the --clip option.

Example read downloaded from ENA: https://www.ebi.ac.uk/ena/browser/view/DRR010705

@DRR010705.1 HUMWT9A01AC2YA/4
ATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCATCTTGCGCTCCTTGGTATTCCTTGGAGCATGCCTGTTTGAGTATCATGAGCAAATCTCAAAGTCAATTCCTTAATTGGTTTTGCTTTGGACTTGGAGGTCTTGCAGATTTCACAGTCTGCTCCTCTTAAATGCATTAGCTGGATCTCAGTAATTATGCTTGGTTCCACTCGGCGTGATAAGTATCACTCGCTGAGGACACTGTTAAAAAGGTGGCCAGGAAATTACTGATTGAACCGCTTCTAACGGTCTATTAAGTTGGACAATTGACCCCTTAAGTTTGATCTCAAATCAGGTAGGACTACCCGCTGAACTTAAGCATATCAATAAGCGGAGGAAAAGAAACCAACAGGGATTGCCTTAGTAACGGCGGGTGAAGCGGCAACAGCTCAAATTTGAAATCTGGCTCTTTCAGGGTCCGAGTTGTAATTTGTAGAAGT
+
EIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIHHFIIIIIIIIIIIIIIHBBBHDDDIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIHHCCEECCBBIIIIIIIIADDIIICCEIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIDDCHIIIIIIIIIIIIIIIIIIIIIIHDAADDIIAAA;;AAIAADDACIIAAAA@IIICCAECCICAAACAAAAIBBBBA>>>??@????AA899999;@;;????A?87777<A=:666=<<<;444;<AB996=;AA<<99999<?==;;;8331021..,,,..0..,,,//000.,,,,//1////1186/...1353;8<:7733357:8:777555544841111233310011464333331101440,,,,,.-444221

The same read from a default download via fasterq-dump:

@HUMWT9A01AC2YA/4
ATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCATCTTGCGCTCCTTGGTATTCCTTGGAGCATGCCTGTTTGAGTATCATGAGCAAATCTCAAAGTCAATTCCTTAATTGGTTTTGCTTTGGACTTGGAGGTCTTGCAGATTTCACAGTCTGCTCCTCTTAAATGCATTAGCTGGATCTCAGTAATTATGCTTGGTTCCACTCGGCGTGATAAGTATCACTCGCTGAGGACACTGTTAAAAAGGTGGCCAGGAAATTACTGATTGAACCGCTTCTAACGGTCTATTAAGTTGGACAATTGACCCCTTAAGTTTGATCTCAAATCAGGTAGGACTACCCGCTGAACTTAAGCATATCAATAAGCGGAGGAAAAGAAACCAACAGGGATTGCCTTAGTAACGGCGGGTGAAGCGGCAACAGCTCAAATTTGAAATCTGGCTCTTTCAGGGTCCGAGTTGTAATTTGTAGAAGTAG
+
EIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIHHFIIIIIIIIIIIIIIHBBBHDDDIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIHHCCEECCBBIIIIIIIIADDIIICCEIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIDDCHIIIIIIIIIIIIIIIIIIIIIIHDAADDIIAAA;;AAIAADDACIIAAAA@IIICCAECCICAAACAAAAIBBBBA>>>??@????AA899999;@;;????A?87777<A=:666=<<<;444;<AB996=;AA<<99999<?==;;;8331021..,,,..0..,,,//000.,,,,//1////1186/...1353;8<:7733357:8:777555544841111233310011464333331101440,,,,,.-44422100

Unfortunately, there seems to be no clip parameter for fasterq-dump.
Any idea how to generate reads identical to the ones stored in ENA?

See also: galaxyproject/tools-iuc#6171 as we're trying to use that for Galaxy.
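
Judging by their final characters, the two records quoted above appear to differ only at the 3' end: the fasterq-dump read ends in two extra bases, with two extra quality characters. A minimal Python sketch for checking that kind of difference is shown here. The helper name `trailing_extra` is hypothetical, and only the tails of the quoted sequence and quality lines are used as input.

```python
def trailing_extra(clipped, unclipped):
    """Return the suffix found only in `unclipped`.

    Returns None when `clipped` is not a prefix of `unclipped`,
    i.e. when the records differ by more than trailing characters.
    """
    if unclipped.startswith(clipped):
        return unclipped[len(clipped):]
    return None


# Tails of the sequence and quality lines quoted in this issue.
ena_seq_tail, sra_seq_tail = "GTAGAAGT", "GTAGAAGTAG"
ena_qual_tail, sra_qual_tail = "-444221", "-44422100"

print(trailing_extra(ena_seq_tail, sra_seq_tail))    # extra bases
print(trailing_extra(ena_qual_tail, sra_qual_tail))  # extra quality characters
```

Running the same check on full sequence lines from both files would show whether every mismatch is a trailing-base difference of this kind, which is what one would expect from clipping.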

@wraetz
Contributor

wraetz commented Jul 23, 2024

Unfortunately, fasterq-dump does not support a clip option, but the older fastq-dump does.
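
As a rough illustration of that workaround, the sketch below builds a fastq-dump call with `--clip` for the run accession from this issue. The function name `fastq_dump_clip_command` is hypothetical, and whether the clipped output matches the ENA file exactly is an assumption that would still need to be verified.

```python
import shlex


def fastq_dump_clip_command(accession):
    """Build the argument list for a clipped fastq-dump run."""
    # --clip asks fastq-dump to apply the clipping stored with the run.
    return ["fastq-dump", "--clip", accession]


cmd = fastq_dump_clip_command("DRR010705")
print(shlex.join(cmd))

# To actually run it (requires the SRA Toolkit on PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```

The command is kept as a list rather than a single string so it can be passed to `subprocess.run` without shell quoting issues.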

@durbrow
Collaborator

durbrow commented Jul 23, 2024

In general, without reproducing the options used, you can't compare the result of two runs of fastq-dump or fasterq-dump. Does EBI document what options they used when generating the fastq file you downloaded from them?

Is there a reason you need clipping in fasterq-dump besides reproducing the file from EBI?
