Add some minimal QC to the NEO pipeline #89

Open
kltm opened this issue Apr 7, 2022 · 14 comments

@kltm
Member

kltm commented Apr 7, 2022

From the software call, we've agreed to add some minimal QC to the NEO pipeline for this project.

@pgaudet @vanaukenk Would you mind providing a handful of example genes to check for when building NEO?

@kltm
Member Author

kltm commented Apr 11, 2022

Work at #90

@kltm
Member Author

kltm commented Apr 11, 2022

After some testing, we are unable to create a full data environment with the restrictions from GHA:
https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners#supported-runners-and-hardware-resources
Pivoting, will treat this more like a local pipeline and run through Jenkins where I can.

kltm added a commit to geneontology/pipeline that referenced this issue Apr 12, 2022
@kltm
Member Author

kltm commented Apr 12, 2022

Looking at doing tests in the form of
runoak --input neo.owl info GO:0022008
but:

Alternatively, we could write a direct test script with oaklib and a test list. Probably still easier to make sure that lib is in ODK and run in there.

Stub added to pipeline.

@kltm
Member Author

kltm commented Apr 21, 2022

It looks like there is action to get oaklib into odk, which would be convenient.
Also, I think I can split the neo build into three steps (build, test, publish), which should help with the software compatibility.

@kltm
Member Author

kltm commented Apr 25, 2022

Candidate in testing here: INCATools/ontology-development-kit#586

@pgaudet

pgaudet commented May 10, 2022

The goal of the QC will be to test whether a number of test IDs are present at each load.
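A minimal sketch of what such a presence check could look like, assuming the "direct test script with oaklib and a test list" idea from earlier in the thread. The `TEST_IDS` list and both function names are hypothetical; the only ID used is one that appears elsewhere in this issue, and the real list would come from the curators. The oaklib import is done lazily inside `check_resource`, and `get_adapter`/`label` are used on the assumption that a reasonably recent oaklib is installed.

```python
import sys
from typing import Callable, Iterable, List, Optional

# Hypothetical test list; the real one would be supplied by curators.
TEST_IDS = ["PR:000000001"]


def missing_ids(ids: Iterable[str],
                lookup: Callable[[str], Optional[str]]) -> List[str]:
    """Return the IDs for which lookup() yields no label."""
    return [curie for curie in ids if not lookup(curie)]


def check_resource(resource: str) -> int:
    """Return 0 if every test ID has a label in the resource, else 1."""
    # Imported here so missing_ids() stays usable without oaklib installed.
    from oaklib import get_adapter

    adapter = get_adapter(resource)  # e.g. "sqlite:/tmp/neo.db"
    missing = missing_ids(TEST_IDS, adapter.label)
    for curie in missing:
        print(f"MISSING: {curie}", file=sys.stderr)
    return 1 if missing else 0
```

A pipeline "test" step could then call `sys.exit(check_resource("sqlite:/tmp/neo.db"))` so that a missing ID fails the build before anything is published.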

@kltm
Member Author

kltm commented May 11, 2022

Suggestion from @cmungall to use sqlite backend.

@kltm
Member Author

kltm commented May 13, 2022

For the tooling we want, we expect it to be added to a versioned public ODK release around June 1st (https://github.com/INCATools/ontology-development-kit/milestone/5).

@kltm
Member Author

kltm commented Jun 28, 2022

Noting that runoak has been added to the ODK at v1.3.1, but does not seem to be functional. E.g.:

docker run --network host -it obolibrary/odkfull:v1.3.1 /bin/bash
root@moiraine:/tmp# runoak --input go-base.owl info GO:0022008
OpenBLAS blas_thread_init: pthread_create failed for thread 1 of 8: Operation not permitted
OpenBLAS blas_thread_init: RLIMIT_NPROC -1 current, -1 max
OpenBLAS blas_thread_init: pthread_create failed for thread 2 of 8: Operation not permitted
OpenBLAS blas_thread_init: RLIMIT_NPROC -1 current, -1 max
OpenBLAS blas_thread_init: pthread_create failed for thread 3 of 8: Operation not permitted
OpenBLAS blas_thread_init: RLIMIT_NPROC -1 current, -1 max
OpenBLAS blas_thread_init: pthread_create failed for thread 4 of 8: Operation not permitted
OpenBLAS blas_thread_init: RLIMIT_NPROC -1 current, -1 max
OpenBLAS blas_thread_init: pthread_create failed for thread 5 of 8: Operation not permitted
OpenBLAS blas_thread_init: RLIMIT_NPROC -1 current, -1 max
OpenBLAS blas_thread_init: pthread_create failed for thread 6 of 8: Operation not permitted
OpenBLAS blas_thread_init: RLIMIT_NPROC -1 current, -1 max
OpenBLAS blas_thread_init: pthread_create failed for thread 7 of 8: Operation not permitted
OpenBLAS blas_thread_init: RLIMIT_NPROC -1 current, -1 max
Traceback (most recent call last):
  File "/usr/local/bin/runoak", line 5, in <module>
    from oaklib.cli import main
  File "/usr/local/lib/python3.10/dist-packages/oaklib/__init__.py", line 7, in <module>
    from oaklib.interfaces import BasicOntologyInterface
  File "/usr/local/lib/python3.10/dist-packages/oaklib/interfaces/__init__.py", line 6, in <module>
    from oaklib.interfaces.mapping_provider_interface import MappingProviderInterface
  File "/usr/local/lib/python3.10/dist-packages/oaklib/interfaces/mapping_provider_interface.py", line 6, in <module>
    import sssom
  File "/usr/local/lib/python3.10/dist-packages/sssom/__init__.py", line 5, in <module>
    from .util import (  # noqa:401
  File "/usr/local/lib/python3.10/dist-packages/sssom/util.py", line 28, in <module>
    import numpy as np
  File "/usr/local/lib/python3.10/dist-packages/numpy/__init__.py", line 144, in <module>
    from . import core
  File "/usr/local/lib/python3.10/dist-packages/numpy/core/__init__.py", line 23, in <module>
    from . import multiarray
  File "/usr/local/lib/python3.10/dist-packages/numpy/core/multiarray.py", line 10, in <module>
    from . import overrides
  File "/usr/local/lib/python3.10/dist-packages/numpy/core/overrides.py", line 6, in <module>
    from numpy.core._multiarray_umath import (
KeyboardInterrupt

@kltm
Member Author

kltm commented Jun 28, 2022

Hm, making progress after a little update:

Unpacking containerd.io (1.6.6-1) over (1.4.9-1) ...
Preparing to unpack .../docker-ce-cli_5%3a20.10.17~3-0~ubuntu-bionic_amd64.deb ...
Unpacking docker-ce-cli (5:20.10.17~3-0~ubuntu-bionic) over (5:20.10.8~3-0~ubuntu-bionic) ...
Preparing to unpack .../docker-ce_5%3a20.10.17~3-0~ubuntu-bionic_amd64.deb ...
Unpacking docker-ce (5:20.10.17~3-0~ubuntu-bionic) over (5:20.10.8~3-0~ubuntu-bionic) ...

@kltm
Member Author

kltm commented Jun 28, 2022

@pgaudet I think we're unlikely to get to this soon:
https://github.com/berkeleybop/bbops/issues/26
We're going to have to do updates as maintenance anyway; I think there is little point in holding up the closing of the project over this. I'd vote to pull it from this project and close the project out.

@kltm
Member Author

kltm commented Jun 28, 2022

Trying: root@moiraine:/tmp# semsql make /tmp/neo.db
Flame out with

java.lang.OutOfMemoryError: Java heap space
**** WARNING ***
Catastrophic JVM error encountered. Application not safely interrupted. Resources may be leaked. Check the logs for more details and consider overriding `Platform.reportFatal` to capture context.
make: *** [/usr/local/lib/python3.10/dist-packages/semsql/builder/build.Makefile:65: /tmp/neo-relation-graph.tsv] Error 255

Will follow up later on; looking at the Makefile, I'm not sure how to pass parameters or what magnitude of heap might be needed/expected.

@kltm
Member Author

kltm commented Jun 28, 2022

JAVA_OPTS=-Xmx12G JAVA_ARGS=-Xmx12G semsql make /tmp/neo.db works to get the args through (not sure which of the two variables is the one picked up). The build completed after a while, so at least 12G is needed for this.
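For a scripted build step, the same invocation could be wrapped roughly as below. This is only a sketch: `semsql_env` and `build_db` are hypothetical helper names, and both variables are set because the comment above does not establish which one the build actually honours.

```python
import os
import subprocess
from typing import Dict, Optional


def semsql_env(heap: str = "12G",
               base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with JVM heap options set."""
    env = dict(os.environ if base is None else base)
    # Unclear which variable is read, so set both as in the thread.
    env["JAVA_OPTS"] = f"-Xmx{heap}"
    env["JAVA_ARGS"] = f"-Xmx{heap}"
    return env


def build_db(db_path: str, heap: str = "12G") -> None:
    """Run `semsql make <db_path>`; raises CalledProcessError on failure."""
    subprocess.run(["semsql", "make", db_path],
                   env=semsql_env(heap), check=True)
```

Calling `build_db("/tmp/neo.db")` mirrors the command line above; because of `check=True`, an out-of-memory failure surfaces as an exception rather than passing silently.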

With that though, runoak --input /tmp/neo.db info GO:0022008 runs super zippy with no fuss (cheers @cmungall ).
Still blocked with https://github.com/berkeleybop/bbops/issues/26 for production; will likely try with lib and script rather than trying to fix up CLI.

runoak --input sqlite:/tmp/neo.db info PR:000000001
PR:000000001 ! protein

Not always getting expected results: often not getting info or search results for things I know are in there. For example, runoak --input sqlite:/tmp/neo.db descendants GO:0098015 does not return results.
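One way to narrow down results like these is to ask the SQLite file directly whether an ID is present at all, independent of the CLI. The sketch below assumes the database built by semsql exposes a `statements` table with a `subject` column; that table layout is an assumption here and should be checked against the actual file. The function name is hypothetical.

```python
import sqlite3
from typing import Iterable, List


def ids_without_statements(conn: sqlite3.Connection,
                           ids: Iterable[str]) -> List[str]:
    """Return the IDs that never appear as a subject in `statements`."""
    missing = []
    for curie in ids:
        row = conn.execute(
            # Assumed schema: statements(subject, predicate, object, value, ...)
            "SELECT 1 FROM statements WHERE subject = ? LIMIT 1",
            (curie,),
        ).fetchone()
        if row is None:
            missing.append(curie)
    return missing
```

For example, `ids_without_statements(sqlite3.connect("/tmp/neo.db"), ["PR:000000001", "GO:0098015"])` would list whichever of the two IDs has no statements in the file, which helps separate "the term is not in this build" from "the query tool is misbehaving".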

@kltm
Member Author

kltm commented Jul 6, 2022

Noting that I load both go-lego.owl and neo.owl into solr, so naturally I can't query GO terms when only NEO is loaded.


2 participants