feat: gwas catalog processing with google batch operator #12
Closed
project-defiant changed the title from "Szsz code cleanup" to "feat: google batch job for gwas_catalog processing" on Jul 18, 2024
project-defiant changed the title from "feat: google batch job for gwas_catalog processing" to "feat: google batch job for gwas_catalog processing - harmonisation" on Jul 18, 2024
project-defiant changed the title from "feat: google batch job for gwas_catalog processing - harmonisation" to "feat: gwas catalog processing with google batch operator" on Jul 23, 2024
First approach to the genetics pipeline for GWAS Catalog processing, together with airflow utils development.

Things implemented

- `make test` command.
- `docker/`, and added an automatic build of the genetics_etl image (based on the gentropy docker image) for the artifact registry, using GitHub Actions + OIDC.
- `gwas_catalog_dag` that includes the following flags:
  - `RESUME`: when one wants to run the pipeline for manifests that have failed previously,
  - `CONTINUE`: when one wants to run the pipeline on manifests that were not processed yet,
  - `FORCE`: when one wants to rerun all manifests from scratch.

  To set the correct flag, update `config/config.yaml`.
- `gs://ot_orchestration` bucket.
- `ot fetch-raw-sumstat-paths`
- `ot gwas-catalog-pipeline`: CLI command that takes a single input manifest file and, based on its content, runs the gentropy steps. Two gentropy steps are currently implemented (harmonisation and qc); other steps are in progress.
- `IOManager` and `ProtoPath` (with concrete implementations for `GCS` and `Posix`), to be able to perform file-system-agnostic concurrent reads and writes.
- `config resolver object`: based on the `dag name`, it should look for the `dag config parser` and use it to read the correct configuration with some level of config validation.

To resolve the first issue, I want to try to split the `batch_processing_job` and all manifest_processing tasks, along with their dependencies, out into a separate package and inject the steps that run them into a `PythonVirtualenvOperator` or `KubernetesPodOperator`. This will also allow us to move to Cloud Composer.

To resolve the other issue, I need to understand the process of curation.
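
For reviewers, the `RESUME` / `CONTINUE` / `FORCE` flags described above could be read as a filter over the manifests. This is only a minimal sketch, not the code in this PR: the `RunMode` enum, the `select_manifests` helper, and the `status` key with values `pending`, `failed`, and `done` are all hypothetical names chosen for illustration.

```python
from enum import Enum


class RunMode(str, Enum):
    RESUME = "RESUME"      # only manifests that failed previously
    CONTINUE = "CONTINUE"  # only manifests that were not processed yet
    FORCE = "FORCE"        # every manifest, from scratch


def select_manifests(manifests: list[dict], mode: RunMode) -> list[dict]:
    """Return the manifests the pipeline should run for the given mode.

    Assumes each manifest carries a hypothetical "status" key that is one
    of "pending", "failed" or "done".
    """
    if mode is RunMode.FORCE:
        return list(manifests)
    wanted = "failed" if mode is RunMode.RESUME else "pending"
    return [m for m in manifests if m["status"] == wanted]


# Hypothetical manifests, for illustration only.
manifests = [
    {"study": "study_a", "status": "done"},
    {"study": "study_b", "status": "failed"},
    {"study": "study_c", "status": "pending"},
]
resumed = [m["study"] for m in select_manifests(manifests, RunMode.RESUME)]
```

Parsing the flag through an enum means a typo in the configuration fails early with a `ValueError` instead of silently selecting nothing.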
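
The idea behind `IOManager` and `ProtoPath` mentioned above can be sketched roughly as follows. This is an assumption-laden illustration rather than the implementation in this PR: the method names (`read_text`, `write_text`, `resolve`, `read_many`, `write_many`) and the `PosixProtoPath` class are hypothetical, the GCS backend is deliberately left out, and a thread pool stands in for whatever concurrency mechanism the real code uses.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Protocol
import tempfile


class ProtoPath(Protocol):
    """Minimal interface every storage backend has to offer (hypothetical)."""

    def read_text(self) -> str: ...
    def write_text(self, data: str) -> None: ...


class PosixProtoPath:
    """Local file system backend built on pathlib."""

    def __init__(self, path: str) -> None:
        self._path = Path(path)

    def read_text(self) -> str:
        return self._path.read_text(encoding="utf-8")

    def write_text(self, data: str) -> None:
        self._path.parent.mkdir(parents=True, exist_ok=True)
        self._path.write_text(data, encoding="utf-8")


class IOManager:
    """Resolves a path string to a backend and fans I/O out over threads."""

    def resolve(self, path: str) -> ProtoPath:
        if path.startswith("gs://"):
            # A GCS-backed class would be returned here; omitted in this sketch.
            raise NotImplementedError("GCS backend is not part of this sketch")
        return PosixProtoPath(path)

    def write_many(self, contents: dict[str, str], max_workers: int = 4) -> None:
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            futures = [
                pool.submit(self.resolve(path).write_text, data)
                for path, data in contents.items()
            ]
            for future in futures:
                future.result()  # re-raise any exception from a worker

    def read_many(self, paths: list[str], max_workers: int = 4) -> list[str]:
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            # pool.map keeps the results in the same order as the input paths.
            return list(pool.map(lambda p: self.resolve(p).read_text(), paths))


with tempfile.TemporaryDirectory() as tmp:
    paths = [f"{tmp}/manifest_{i}.json" for i in range(3)]
    manager = IOManager()
    manager.write_many({p: f'{{"id": {i}}}' for i, p in enumerate(paths)})
    round_trip = manager.read_many(paths)
```

Callers only ever see the `ProtoPath` interface, so swapping a local path for a bucket path should not require changes outside `resolve`.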
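
One possible shape for the config resolver described above is a registry that maps a DAG name to its parser. This is a sketch under stated assumptions: `register_parser`, `resolve_config`, `parse_gwas_catalog_config`, and the `mode` key are hypothetical names, and the validation shown is only an example of "some level of config validation".

```python
from typing import Any, Callable

ConfigParser = Callable[[dict[str, Any]], dict[str, Any]]

# Registry of parsers keyed by DAG name (hypothetical design).
_PARSERS: dict[str, ConfigParser] = {}


def register_parser(dag_name: str) -> Callable[[ConfigParser], ConfigParser]:
    """Decorator that registers a parser for the given DAG name."""

    def decorator(parser: ConfigParser) -> ConfigParser:
        _PARSERS[dag_name] = parser
        return parser

    return decorator


@register_parser("gwas_catalog_dag")
def parse_gwas_catalog_config(raw: dict[str, Any]) -> dict[str, Any]:
    # The "mode" key is an assumption; the real config layout may differ.
    mode = raw.get("mode")
    if mode not in {"RESUME", "CONTINUE", "FORCE"}:
        raise ValueError(f"unsupported mode: {mode!r}")
    return {"mode": mode}


def resolve_config(dag_name: str, raw: dict[str, Any]) -> dict[str, Any]:
    """Look up the parser registered for dag_name and validate raw with it."""
    try:
        parser = _PARSERS[dag_name]
    except KeyError:
        raise KeyError(f"no config parser registered for DAG {dag_name!r}") from None
    return parser(raw)


config = resolve_config("gwas_catalog_dag", {"mode": "FORCE"})
```

With this layout, adding a new DAG only requires registering one more parser function; the resolver itself stays unchanged.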