Plug into VICC pipelines #1

Open
cmungall opened this issue Jun 24, 2019 · 1 comment
@cmungall

Instructions from @ahwagner are below. We want to automate this better, with a workflow or service that we can run regularly over the unnormalized VICC terms.

This was done by using the same harvester routines we use in production, at https://github.com/ohsu-comp-bio/g2p-aggregator/tree/v0.12/harvester.

Specifically, the `harvest` and `convert` phases were run using the utility scripts `harvest-file-all.sh` and `convert-file-all.sh` here: https://github.com/ohsu-comp-bio/g2p-aggregator/tree/v0.12/util.

Finally, I extracted the relevant terms from the pre-normalized `.convert.json` files using jq:

`cat *.convert.json | jq '.association.phenotypes | .[]?.description' | sort -u > unnormalized_disease_terms.txt`
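For a scheduled workflow, the same extraction could be done without jq. The following is a minimal Python sketch, not part of the original pipeline. It assumes each `.convert.json` file holds one JSON object per line, which is a guess based on the files being concatenated with `cat`; the function names `extract_disease_terms` and `write_terms` are hypothetical. Unlike the jq command, it writes raw strings rather than JSON-quoted ones and skips missing descriptions instead of emitting `null`.

```python
import json
from pathlib import Path


def extract_disease_terms(directory="."):
    """Collect the unique phenotype descriptions from *.convert.json files.

    Assumption: each file contains one JSON object per line.
    """
    terms = set()
    for path in sorted(Path(directory).glob("*.convert.json")):
        with path.open(encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if not line:
                    continue
                record = json.loads(line)
                # Roughly mirrors `.association.phenotypes | .[]?`:
                # records without a phenotype list contribute nothing.
                association = record.get("association") or {}
                phenotypes = association.get("phenotypes")
                if not isinstance(phenotypes, list):
                    continue
                for phenotype in phenotypes:
                    if not isinstance(phenotype, dict):
                        continue
                    description = phenotype.get("description")
                    if description:
                        terms.add(description)
    # Sorted and de-duplicated, like `sort -u` (ordering is by code point here).
    return sorted(terms)


def write_terms(terms, out_path="unnormalized_disease_terms.txt"):
    """Write one term per line to the given file."""
    text = "".join(f"{term}\n" for term in terms)
    Path(out_path).write_text(text, encoding="utf-8")
```

Calling `write_terms(extract_disease_terms("."))` in the directory holding the converted files would produce a term file similar to the one above, minus the surrounding quotes.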
cmungall added a commit that referenced this issue Jun 26, 2019
@cmungall

Initial results here: https://github.com/cmungall/neoplasmer/blob/master/scratch/vicc-results.tsv

I also updated the Docker container.

To re-run the analysis:

`docker run -p 9055:9055 -e PORT=9055 -v $PWD:/work -w /work --rm -ti cmungall/neoplasmer swipl -G0 -p library=/tools/prolog -l /tools/utf8.pl /tools/bin/neoplasmer -X .cache -i /data/mondo.owl -i /data/doid.owl -i /data/neoplasm-core.owl TERMFILE`
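As a step toward running this regularly, the command could be assembled programmatically. This is only a sketch: the helper name `build_neoplasmer_command` and its parameters are hypothetical, and every flag is copied from the command above rather than taken from neoplasmer documentation. The `-ti` flags are kept to match the original; an unattended job without a terminal may need to drop them.

```python
import os


def build_neoplasmer_command(termfile, workdir=None, port=9055):
    """Return the docker argv for the re-run command, as a list of strings.

    `termfile` replaces the TERMFILE placeholder and `workdir` stands in
    for $PWD; it is mounted at /work inside the container.
    """
    workdir = workdir or os.getcwd()
    return [
        "docker", "run",
        "-p", f"{port}:{port}",
        "-e", f"PORT={port}",
        "-v", f"{workdir}:/work",
        "-w", "/work",
        "--rm", "-ti",
        "cmungall/neoplasmer",
        "swipl", "-G0",
        "-p", "library=/tools/prolog",
        "-l", "/tools/utf8.pl",
        "/tools/bin/neoplasmer",
        "-X", ".cache",
        # Ontology inputs, exactly as listed in the original command.
        "-i", "/data/mondo.owl",
        "-i", "/data/doid.owl",
        "-i", "/data/neoplasm-core.owl",
        termfile,
    ]
```

The returned list can be passed to `subprocess.run(..., check=True)`, which avoids shell quoting problems if the working directory path contains spaces.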
