add_existing failed #10

Open
jjoropezav opened this issue Apr 11, 2024 · 2 comments

Comments

@jjoropezav

jjoropezav commented Apr 11, 2024

Hello again, sorry to bother you.

I ran into this error while running build_kraken.nf; I tried three times with the same result:


Apr-10 21:34:56.803 [Task monitor] ERROR nextflow.processor.TaskProcessor - Error executing process > 'add_existing (1)'

Caused by:
Process add_existing (1) terminated with an error exit status (25)

Command executed:

kraken2-build --download-library bacteria --db medi_db --threads 4

Command exit status:
25

Command output:
(empty)

Command error:
Step 1/2: Performing rsync file transfer of requested files
rsync: link_stat "/all/GCF/037/832/925/GCF_037832925.1_ASM3783292v1/GCF_037832925.1_ASM3783292v1_genomic.fna.gz" (in genomes) failed: No such file or directory (2)
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1684) [generator=3.1.3]
rsync_from_ncbi.pl: rsync error, exiting: 5888

Work dir:
/scratch/home/joropeza/medi/work/b4/7605aa405f3eaed44400ee14867236


It seems that the sequence has been suppressed in NCBI: https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_037832925.1/

Attached is the log file from Nextflow. Is there any workaround?
.nextflow.log
GCF_037832925.1_ASM3783292v1_genomic.fna.gz

Could we use a premade Kraken database to fix this issue? https://benlangmead.github.io/aws-indexes/k2

Thanks again!
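One possible workaround to sketch here (an assumption, not a confirmed fix): kraken2-build's rsync step reads its file list from a manifest.txt that the dry run writes into the library directory (here presumably medi_db/library/bacteria/, judging from the command above). Removing the suppressed accession from that list would stop rsync from requesting the missing file. The snippet below only demonstrates the filtering step on a toy manifest whose paths mimic the layout in the error log:

```shell
# Toy manifest.txt mimicking the file list kraken2-build writes before its
# rsync step (paths follow the layout in the error above; contents illustrative).
printf '%s\n' \
  'all/GCF/000/005/845/GCF_000005845.2_ASM584v2/GCF_000005845.2_ASM584v2_genomic.fna.gz' \
  'all/GCF/037/832/925/GCF_037832925.1_ASM3783292v1/GCF_037832925.1_ASM3783292v1_genomic.fna.gz' \
  > manifest.txt

# Drop the suppressed accession so rsync no longer requests the missing file
grep -v 'GCF_037832925.1' manifest.txt > manifest.tmp && mv manifest.tmp manifest.txt
cat manifest.txt
```

Note that a fresh `kraken2-build --download-library` run may regenerate the manifest from a new dry run, so simply retrying the download at a later date may be just as effective; resuming the pipeline with Nextflow's `-resume` flag at least avoids redoing the tasks that already completed.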

@cdiener
Collaborator

cdiener commented Apr 12, 2024

Hi, this is unfortunately an error in Kraken2 itself. It looks like a timing issue: Kraken2 normally does a dry run first and flags files that cannot be downloaded, so cases like this are usually caught. However, if the genome is suppressed between the dry run and the actual download (which can happen, especially when the download is slow and takes a while), you get exactly this failure. The easiest fix is to rerun the download at a later date.
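For context, NCBI records suppression in its assembly_summary.txt files via the version_status field (column 11 in the current tab-separated layout, with values like "latest", "replaced", and "suppressed"), so a suppressed accession can in principle be detected before the transfer starts. A minimal sketch against a toy two-line summary file (all field values below are illustrative, not real metadata):

```shell
# Toy slice of NCBI's assembly_summary.txt; the real file is tab-separated
# with version_status in column 11 ("latest", "replaced", or "suppressed").
printf 'GCF_000005845.2\tPRJ\tSAMN\t\tna\t511145\t562\tEscherichia coli\t\t\tlatest\n' \
  > assembly_summary_sample.txt
printf 'GCF_037832925.1\tPRJ\tSAMN\t\tna\t2\t2\tBacteria sp.\t\t\tsuppressed\n' \
  >> assembly_summary_sample.txt

# List accessions that would fail with "No such file or directory" on rsync
awk -F'\t' '$11 == "suppressed" {print $1}' assembly_summary_sample.txt
```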

We would love to provide prebuilt hashes; the issue is the size (~600 GB), because there is currently no public repository that lets you deposit data of that size for free. I will try to apply for the AWS program that Kraken2 uses, but there is no guarantee it will be granted. We are also floating the idea of a subsampled hash (down to 128 GB), which could be uploaded to existing repositories. Since we are rebuilding the database for the revisions, it will take a while though (roughly when the paper is published).

Sorry for the inconvenience!

@jjoropezav
Author

Oh, I see the problem now.

I have space available on my Google Drive account and can keep a link open there for at least a year in the meantime; that could be a viable interim option for hosting the prebuilt hashes.

I don't know whether that would work, though.

Thanks again for the help!
