Commit

feat(ena-deposition): Make deposition a package instead of multiple snakemake rules (#2976)

* Create a new package called ena_deposition with a single config: it uses defaults from config/defaults.yaml and then overrides them with any additional config arguments from the passed config/config.yaml

* The create_project, create_sample, create_assembly, trigger_submission and upload_external_metadata functions run in parallel threads and use stop_event to stop all threads if one fails.

* xmltodict update: Create a new class called XmlNone for xml creation functions

* Specify the log level in main 

* Create function: secure_ena_connection to check if connections to ENA are correct instead of doing this at the start of the snakefile.

* Remove snakemake dependency and run cronjob without snakemake

---------

Co-authored-by: Cornelius Roemer <[email protected]>
anna-parker and corneliusroemer authored Oct 18, 2024
1 parent 7a83cfa commit b022f4d
Showing 24 changed files with 391 additions and 849 deletions.
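
The commit message says the deposition functions run in parallel threads and share a `stop_event` that halts all of them once one fails. A minimal sketch of that pattern follows, assuming each function accepts the event as its only argument. `run_until_failure`, `polling_task` and `failing_task` are hypothetical stand-ins for illustration, not the package's actual code.

```python
import threading


def polling_task(stop_event):
    """Hypothetical long-running worker: polls until asked to stop."""
    while not stop_event.is_set():
        # wait() acts as an interruptible sleep between polling rounds
        stop_event.wait(timeout=0.01)


def failing_task(stop_event):
    """Hypothetical worker that fails after a few polling rounds."""
    for _ in range(3):
        if stop_event.is_set():
            return
        stop_event.wait(timeout=0.01)
    raise RuntimeError("simulated failure")


def run_until_failure(tasks):
    """Run every task in its own thread; stop all threads if any one raises.

    `tasks` maps a thread name to a callable taking the shared stop_event.
    Returns a list of (name, exception) pairs for the tasks that failed.
    """
    stop_event = threading.Event()
    errors = []

    def guarded(name, task):
        try:
            task(stop_event)
        except Exception as exc:
            errors.append((name, exc))
            # Tell every other thread to leave its loop
            stop_event.set()

    threads = [
        threading.Thread(target=guarded, args=(name, task), name=name)
        for name, task in tasks.items()
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return errors


if __name__ == "__main__":
    failed = run_until_failure(
        {"create_project": polling_task, "create_sample": failing_task}
    )
    for name, exc in failed:
        print(f"{name} stopped the run: {exc}")
```

Collecting the exception in the wrapper instead of re-raising it inside the thread lets the caller decide how to report the failure after all threads have joined.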
2 changes: 1 addition & 1 deletion .github/workflows/ena-submission-tests.yaml
@@ -31,7 +31,7 @@ jobs:
post-cleanup: 'all'
- name: Run tests
run: |
micromamba activate loculus-ena-submission
pip install -e .
python3 scripts/test_ena_submission.py
shell: micromamba-shell {0}
working-directory: ena-submission
5 changes: 4 additions & 1 deletion ena-submission/Dockerfile
@@ -19,11 +19,14 @@ RUN micromamba config set extract_threads 1 \
# Set the environment variable to activate the conda environment
ARG MAMBA_DOCKERFILE_ACTIVATE=1


ENV WEBIN_CLI_VERSION 7.3.1
USER root
RUN wget -q "https://github.com/enasequence/webin-cli/releases/download/${WEBIN_CLI_VERSION}/webin-cli-${WEBIN_CLI_VERSION}.jar" -O /package/webin-cli.jar
USER $MAMBA_USER

COPY --chown=$MAMBA_USER:$MAMBA_USER . /package

RUN ls -alht /package
RUN pip install /package

WORKDIR /package
40 changes: 19 additions & 21 deletions ena-submission/README.md
@@ -1,10 +1,10 @@
# ENA Submission

## Snakemake Rules
## Cronjob

### get_ena_submission_list

This rule runs daily in a cron job, it calls the loculus backend (`get-released-data`), obtains a new list of sequences that are ready for submission to ENA and sends this list as a compressed json file to our slack channel. Sequences are ready for submission IF:
This script runs once daily as a kubernetes cronjob. It calls the Loculus backend (`/get-released-data`), computes a new list of sequences that are ready for submission to ENA and sends this list as a compressed json file to our slack channel. Sequences are ready for submission IFF all of the following are true:

- data in state APPROVED_FOR_RELEASE:
- data must be state "OPEN" for use
@@ -13,9 +13,9 @@ This rule runs daily in a cron job, it calls the loculus backend (`get-released-
- data is not in the `ena-submission.submission_table`
- as an extra check we discard all sequences with `ena-specific-metadata` fields

### all
## Threads

This rule runs in the ena-submission pod, it runs the following rules in parallel:
The ena_deposition package runs the following functions in parallel (via threads):

#### trigger_submission_to_ena

@@ -149,10 +149,6 @@ In order to submit assemblies you will also need to install ENA's `webin-cli.jar
wget -q "https://github.com/enasequence/webin-cli/releases/download/${WEBIN_CLI_VERSION}/webin-cli-${WEBIN_CLI_VERSION}.jar" -O /package/webin-cli.jar
```

### Running snakemake

Then run snakemake using `snakemake` or `snakemake {rule}`.

## Testing

> [!WARNING]
@@ -169,7 +165,7 @@ python3 scripts/test_ena_submission.py
You can also use the `deposition_dry_run.py` script to produce the same output files/XMLs that the pipeline would produce in order to submit to ENA. This is a good test if you would like to first verify what your submission to ENA will look like. Make sure that you have the same config.yaml that will be used in production (use deploy.py to generate this). Also note that the generator can only produce output for one submission at a time.

```
python scripts/deposition_dry_run.py --log-level=DEBUG --data-to-submit=results/approved_ena_submission_list.json --mode=assembly --center-name="Yale"
python scripts/deposition_dry_run.py --log-level=DEBUG --data-to-submit=results/approved_ena_submission_list.json --mode=assembly --center-name="Yale" --config-file=config/config.yaml
```

### Testing submission locally
@@ -184,7 +180,8 @@ cd ../backend
./start_dev.sh &
cd ../ena-submission
micromamba activate loculus-ena-submission
flyway -user=postgres -password=unsecure -url=jdbc:postgresql://127.0.0.1:5432/loculus -schemas=ena-submission -locations=filesystem:./flyway/sql migrate
pip install -e .
flyway -user=postgres -password=unsecure -url=jdbc:postgresql://127.0.0.1:5432/loculus -schemas=ena_deposition_schema -locations=filesystem:./flyway/sql migrate
```

2. Submit data to the backend as test user (create group, submit and approve), e.g. using [example data](https://github.com/pathoplexus/example_data). (To test the full submission cycle with insdc accessions submit cchf example data with only 2 segments.)
@@ -211,37 +208,40 @@ curl -X 'POST' 'http://localhost:8079/groups' \
"country": "Germany"
},
"contactEmail": "[email protected]"}'
LOCULUS_ACCESSION = $(curl -X 'POST' \
LOCULUS_ACCESSION=$(curl -X 'POST' \
'http://localhost:8079/cchf/submit?groupId=1&dataUseTermsType=OPEN' \
-H 'accept: application/json' \
-H "Authorization: Bearer ${JWT}" \
-H 'Content-Type: multipart/form-data' \
-F 'metadataFile=@../../example_data/example_files/cchfv_test_metadata.tsv;type=text/tab-separated-values' \
-F 'sequenceFile=@../../example_data/example_files/cchfv_test_sequences.fasta' | jq -r '.[0].accession')
curl -X 'POST' \
'http://localhost:8079/cchf/approve-processed-data' \
curl -X 'POST' 'http://localhost:8079/cchf/approve-processed-data' \
-H 'accept: application/json' \
-H "Authorization: Bearer ${JWT}"
-H "Authorization: Bearer ${JWT}" \
-H 'Content-Type: application/json' \
-d '{"scope": "ALL"}'
```

3. Get list of sequences ready to submit to ENA, locally this will write `results/ena_submission_list.json`.

```sh
snakemake get_ena_submission_list
python scripts/get_ena_submission_list.py --config-file=config/config.yaml --output-file=results/ena_submission_list.json
```

4. Check contents and then rename to `results/approved_ena_submission_list.json`, trigger ena submission by adding entries to the submission table
4. Check contents and then rename to `results/approved_ena_submission_list.json`, trigger ena submission by adding entries to the submission table and using the `--input-file` flag

```sh
cp results/ena_submission_list.json results/approved_ena_submission_list.json
snakemake trigger_submission_to_ena_from_file
ena_deposition --config-file=config/config.yaml --input-file=results/approved_ena_submission_list.json
```

Alternatively you can upload data to the [test folder](https://github.com/pathoplexus/ena-submission/blob/main/test/approved_ena_submission_list.json) and run `snakemake trigger_submission_to_ena`.
Alternatively you can upload data to the [test folder](https://github.com/pathoplexus/ena-submission/blob/main/test/approved_ena_submission_list.json) and run:

5. Create project, sample and assembly: `snakemake results/project_created results/sample_created results/assembly_created` - you will need the credentials of the ENA test submission account for this. (You can terminate the rules after you see assembly creation has been successful, or earlier if you see errors.)
```sh
ena_deposition --config-file=config/config.yaml
```

Note that if you use data that you have not uploaded to Loculus the final step (uploading the results of ENA submission to Loculus) will fail as the accession will be unknown.

6. Note that ENA's dev server does not always finish processing and you might not receive a `gcaAccession` for your dev submissions. If you would like to test the full submission cycle on the ENA dev instance it makes sense to manually alter the gcaAccession in the database to `ERZ24784470` (a known test submission with 2 chromosomes/segments - sadly ERZ accessions are private so I do not have other test examples). You can do this after connecting via pgAdmin or connecting via the CLI:

@@ -260,8 +260,6 @@ WHERE accession = '$LOCULUS_ACCESSION';

Exit `psql` using `\q`.

7. Upload to loculus (you can run the webpage locally if you would like to see this visually), `snakemake results/assembly_created results/uploaded_external_metadata`.

If you experience issues you can look at the database locally using pgAdmin. On local instances the password is `unsecure`.

### Testing submission on a preview instance
188 changes: 0 additions & 188 deletions ena-submission/Snakefile

This file was deleted.

3 changes: 3 additions & 0 deletions ena-submission/config/defaults.yaml
@@ -72,3 +72,6 @@ metadata_mapping_mandatory_field_defaults:
"host health state": "not provided"
"host subject id": "not provided"
"host common name": "not provided"
db_username: postgres
db_password: unsecure
db_url: "jdbc:postgresql://127.0.0.1:5432/loculus"
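
The commit message describes a single config built from `config/defaults.yaml` and then overridden by the passed `config/config.yaml`. A minimal sketch of that override order follows, assuming a shallow merge of top-level keys (the real package may merge differently). `merge_config` and the override value are hypothetical; in practice both dictionaries would presumably come from `yaml.safe_load`, since PyYAML is listed in `environment.yml`.

```python
def merge_config(defaults: dict, overrides: dict) -> dict:
    """Return a new dict: defaults first, then overrides on top.

    Assumption: only top-level keys are merged (no deep merge).
    """
    merged = dict(defaults)   # copy, so the defaults stay untouched
    merged.update(overrides)  # later values win
    return merged


# Values taken from the defaults.yaml hunk above
defaults = {
    "db_username": "postgres",
    "db_password": "unsecure",
    "db_url": "jdbc:postgresql://127.0.0.1:5432/loculus",
}

# Hypothetical content of a deployment-specific config.yaml
overrides = {"db_password": "example-secret"}

config = merge_config(defaults, overrides)

if __name__ == "__main__":
    for key, value in config.items():
        print(f"{key}: {value}")
```

Keys missing from the override file fall back to the defaults, which is why the new `db_*` entries can live in `defaults.yaml` for local development.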
2 changes: 1 addition & 1 deletion ena-submission/environment.yml
@@ -11,9 +11,9 @@ dependencies:
- jsonlines
- PyYAML
- requests
- snakemake
- unzip
- psycopg2
- slack_sdk
- xmltodict
- biopython
- pytz
15 changes: 15 additions & 0 deletions ena-submission/pyproject.toml
@@ -0,0 +1,15 @@
# Basic package config to make it installable
[project]
name = "ena_deposition"
version = "0.1.0"
requires-python = ">=3.12"

[project.scripts]
ena_deposition = "ena_deposition.__main__:run"

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[tool.hatch.build.targets.wheel]
packages = ["src/ena_deposition"]
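
The `[project.scripts]` entry maps the `ena_deposition` command to a `run` callable in `ena_deposition/__main__.py`. The sketch below shows how such an entry point could parse the `--config-file` and `--input-file` flags used in the README. It relies on the standard-library `argparse` as a stand-in (the package's actual CLI library is not shown in this diff), and the `--log-level` flag on this command is an assumption borrowed from `deposition_dry_run.py`.

```python
import argparse
import logging


def parse_args(argv=None):
    """Parse the flags shown in the README; argv=None falls back to sys.argv."""
    parser = argparse.ArgumentParser(prog="ena_deposition")
    parser.add_argument("--config-file", required=True,
                        help="Path to config/config.yaml")
    parser.add_argument("--input-file", default=None,
                        help="Optional approved submission list (JSON)")
    # Assumption: the real command may not expose this flag
    parser.add_argument("--log-level", default="INFO",
                        choices=["DEBUG", "INFO", "WARNING", "ERROR"])
    return parser.parse_args(argv)


def run(argv=None):
    """Hypothetical entry point referenced by [project.scripts]."""
    args = parse_args(argv)
    logging.basicConfig(level=getattr(logging, args.log_level))
    logging.getLogger("ena_deposition").info(
        "config=%s input=%s", args.config_file, args.input_file
    )
    return args


if __name__ == "__main__":
    run(["--config-file=config/config.yaml"])
```

After `pip install -e .`, the console script generated from this entry calls `run()` with no arguments, so the flags are read from the command line.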