add_existing failed #10

Open
jjoropezav opened this issue Apr 11, 2024 · 2 comments

Comments

@jjoropezav

jjoropezav commented Apr 11, 2024

Hello again, sorry to bother you.

I ran into this error while running build_kraken.nf; I tried three times with the same result:


Apr-10 21:34:56.803 [Task monitor] ERROR nextflow.processor.TaskProcessor - Error executing process > 'add_existing (1)'

Caused by:
Process add_existing (1) terminated with an error exit status (25)

Command executed:

kraken2-build --download-library bacteria --db medi_db --threads 4

Command exit status:
25

Command output:
(empty)

Command error:
Step 1/2: Performing rsync file transfer of requested files
rsync: link_stat "/all/GCF/037/832/925/GCF_037832925.1_ASM3783292v1/GCF_037832925.1_ASM3783292v1_genomic.fna.gz" (in genomes) failed: No such file or directory (2)
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1684) [generator=3.1.3]
rsync_from_ncbi.pl: rsync error, exiting: 5888

Work dir:
/scratch/home/joropeza/medi/work/b4/7605aa405f3eaed44400ee14867236


It seems that the sequence has been suppressed in NCBI: https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_037832925.1/

Attached is the log file from Nextflow. Is there any workaround?
.nextflow.log
GCF_037832925.1_ASM3783292v1_genomic.fna.gz

Could we use a premade Kraken database to fix this issue? https://benlangmead.github.io/aws-indexes/k2

Thanks again!
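One possible workaround to sketch here (an assumption, not a confirmed fix): kraken2-build's rsync step reads its file list from a manifest.txt that the dry run writes into the library directory (here presumably medi_db/library/bacteria/, judging from the command above). Removing the suppressed accession from that list would stop rsync from requesting the missing file. The snippet below only demonstrates the filtering step on a toy manifest whose paths mimic the layout in the error log:

```shell
# Toy manifest.txt mimicking the file list kraken2-build writes before its
# rsync step (paths follow the layout in the error above; contents illustrative).
printf '%s\n' \
  'all/GCF/000/005/845/GCF_000005845.2_ASM584v2/GCF_000005845.2_ASM584v2_genomic.fna.gz' \
  'all/GCF/037/832/925/GCF_037832925.1_ASM3783292v1/GCF_037832925.1_ASM3783292v1_genomic.fna.gz' \
  > manifest.txt

# Drop the suppressed accession so rsync no longer requests the missing file
grep -v 'GCF_037832925.1' manifest.txt > manifest.tmp && mv manifest.tmp manifest.txt
cat manifest.txt
```

Note that a fresh `kraken2-build --download-library` run may regenerate the manifest from a new dry run, so simply retrying the download at a later date may be just as effective; resuming the pipeline with Nextflow's `-resume` flag at least avoids redoing the tasks that already completed.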

@cdiener
Collaborator

cdiener commented Apr 12, 2024

Hi, this is unfortunately an error in Kraken2 itself. It looks like a timing issue: Kraken2 normally does a dry run first and flags files that cannot be downloaded, so cases like this are usually caught. However, if the genome is suppressed between the dry run and the actual download (which can happen, especially when the download is slow and takes a while), you get exactly this failure. The easiest fix is to rerun the download at a later date.
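For context, NCBI records suppression in its assembly_summary.txt files via the version_status field (column 11 in the current tab-separated layout, with values like "latest", "replaced", and "suppressed"), so a suppressed accession can in principle be detected before the transfer starts. A minimal sketch against a toy two-line summary file (all field values below are illustrative, not real metadata):

```shell
# Toy slice of NCBI's assembly_summary.txt; the real file is tab-separated
# with version_status in column 11 ("latest", "replaced", or "suppressed").
printf 'GCF_000005845.2\tPRJ\tSAMN\t\tna\t511145\t562\tEscherichia coli\t\t\tlatest\n' \
  > assembly_summary_sample.txt
printf 'GCF_037832925.1\tPRJ\tSAMN\t\tna\t2\t2\tBacteria sp.\t\t\tsuppressed\n' \
  >> assembly_summary_sample.txt

# List accessions that would fail with "No such file or directory" on rsync
awk -F'\t' '$11 == "suppressed" {print $1}' assembly_summary_sample.txt
```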

We would love to provide prebuilt hashes; the issue is the size (~600 GB), because there is currently no public repository that lets you deposit data of that size for free. I will try to apply for the AWS program that Kraken2 uses, but there is no guarantee it will be granted. We are also floating the idea of a subsampled hash (down to 128 GB), which could be uploaded to existing repositories. Since we are rebuilding the database for the revisions, it will take a while though (roughly when the paper is published).

Sorry for the inconvenience!

@jjoropezav
Author

Oh, I see the problem now.

I have space available on my Google Drive account and can keep a link open there for at least a year in the meantime; that could be a viable interim option for hosting the prebuilt hashes.

I don't know whether that would work, though.

Thanks again for the help!
