-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
EHR Compatibility #7
base: master
Are you sure you want to change the base?
Conversation
Addition of EHR loading capabilities
EHR Loading development
@miltondp I now see that the Docker Hub image has failed to build! I don't have experience with Docker Hub -- is there a tutorial I can read somewhere to try to fix this? |
Travis CI build is passing. I'm not sure how to see the build logs for Docker Hub CI (If that is a separate thing from Travis CI) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @meliao! This is a partial review on your changes, since I couldn't finish reviewing all the files and test the code. I guess I won't be able to do that until early October, since I have an important deadline. However, I hope my comments here will be helpeful. My main concern is that several unit tests are disabled now.
.gitignore
Outdated
@@ -0,0 +1,3 @@ | |||
*/__pycache__/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one should not be necessary, at least according to the standard .gitignore
- eventlet=0.21.0 | ||
- ruamel.yaml=0.15.34 | ||
- sqlalchemy | ||
- flask |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's better to fix most of the versions here, since otherwise the Docker image will be built always with different versions of critical packages like flask
.
My current approach is to have two environment.yml
files:
- One with most of the packages with fixed major versions, like, for instance,
python==3.8
ornumpy=1.13
(note that I'm not fixing the revision part of the version). This one is for production, building the Docker image, etc. - Another one, intended for developers, with the list of needed packages and almost no versions; this one is to easily update the environment when you want to do so. For example, for a major release of ukbREST, you use this file to create a new conda environment, and then export that environment to update the first file (the one for production).
@@ -0,0 +1,159 @@ | |||
#!/usr/bin/env python |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this file here? Looks similar than docker/start.py
, but without new methods you introduced like _setup_ehr_paths
.
@@ -0,0 +1,7 @@ | |||
eid data_provider event_dt read_2 read_3 value1 value2 value3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sure this is not real data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And I would add a proper extension to the file, probably .tsv
here?
self.basic_2sql_config = {'db_uri': POSTGRESQL_ENGINE, | ||
'sql_chunksize': None} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest a minor stylish improvement in the dict creation.
self.basic_2sql_config = {'db_uri': POSTGRESQL_ENGINE, | |
'sql_chunksize': None} | |
self.basic_2sql_config = { | |
'db_uri': POSTGRESQL_ENGINE, | |
'sql_chunksize': None | |
} |
def test_load_ehr_single_path_gp(self): | ||
ehr2sql = self._get_ehr2sql(self.ehr_path, None, | ||
**self.basic_2sql_config) | ||
ehr2sql.load_data() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't you have some assert
statements here to make sure the data was correctly loaded?
@@ -718,6 +718,7 @@ def test_sqlite_query_single_table(self): | |||
assert query_result.loc[3, 'c48_0_0'] == '2010-01-01' | |||
assert query_result.loc[4, 'c48_0_0'] == '2011-02-15' | |||
|
|||
@unittest.skip('Pheno2sql was refactored') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, ok, but the test should be telling you if the refactoring is fine :-) Here I see that basically most of the probably failing tests are disabled. I'd like to see if these tests are passing with your refactored code.
from ukbrest.common.utils.auth import PasswordHasher | ||
|
||
class AppTests(DBTest): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add some documentation to the class. What does this do? I guess I'm the culprit and I should have done that at the beginning :-D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@miltondp could you help with this? Even imperfect, I would like to have a working version of the tool.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hakyim, I fixed the Dockerfile to make it work with the new bgen repo. So now it's possible to get a Docker image for this developmental branch from here: https://hub.docker.com/repository/docker/hakyimlab/ukbrest/builds
You should be able to get a Docker image of this branch with: docker pull hakyimlab/ukbrest:ehr-dev
And then use hakyimlab/ukbrest:ehr-dev
everywhere instead of just hakyimlab/ukbrest
However, I do not recommend merging this branch into master since several unit tests were disabled, and I can't test the changes in real data. If someone else can put the unit tests back and test it, also with real UK Biobank data, then it would be fine to merge it, but otherwise, it might break the latest
Docker image tag and prevent others from using it.
As soon as I'm done with a grant deadline (mid-July), I will look into this.
The problem seems to be related to the installation of "bgen", the indexer
to get genotype data from bgen files, since that repo was moved from
Bitbucket to another place
<https://enkre.net/cgi-bin/code/bgen/dir?ci=trunk>. A quick fix could be to
comment those lines out (as long as you are not using this feature). The
part in the Dockerfile that installs this package and should be commented
is from line "&& hg clone ***@***.***/gavinband/bgen"
to "&& ./waf install".
…On Tue, Jun 22, 2021 at 8:18 PM Hae Kyung Im ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In tests/test_rest_api_phenotype.py
<#7 (comment)>:
> from ukbrest.common.utils.auth import PasswordHasher
+class AppTests(DBTest):
@miltondp <https://github.com/miltondp> could you help with this? Even
imperfect, I would like to have a working version of the tool.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#7 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AABKFD5VPHATEHPMNWQF6H3TUER7FANCNFSM4PXWHN4A>
.
--
Milton Pividori
|
# Conflicts: # .gitignore # .travis.yml
This update adds support for the following EHR tables available from the UK Biobank:
gp_clinical
the main outpatient care datasetgp_scripts
outpatient prescription datasetgp_registrations
outpatient health insurance registrationshesin
the main hospital inpatient datasethesin_diag
hospital inpatient diagnosis datasetMajor changes to the code:
Pheno2SQL
class into aLoadSQL
base class and addingEHR2SQL
classYamlQuery
,PhenoQuery
, andEHRQuery
.load_data.py
to include a--load_ehr
command line option and function.Other notes: