Questions on Project #1

Open
upadhyd opened this issue Feb 22, 2019 · 2 comments

Comments

@upadhyd

upadhyd commented Feb 22, 2019

@invitae-viv, I had a few questions regarding the project:

  1. Would the dataset contain duplicates? If not, which fields uniquely identify each record?
  2. Gene length: are there any constraints on the length of the gene?
  3. NUCLEOTIDE_CHANGE: can this field contain an array of nucleotide changes? If so, should each one be displayed on a new line in the table?
  4. In the TSV file, the gene is null for some records (e.g., line 9 of the file).
    • Assumption: in this case, the gene is the same as in the previous record (here, RTP5). Is this assumption correct?

RTP5 CM000664.1:g.242812080_243048760del,NC_000002.11:g.242812080_243048760del236681 CM000664.1,NC_000002.11 not provided Not Provided ClinVar 2017-09-14 https://www.ncbi.nlm.nih.gov/clinvar/RCV000161254 GRCh37 2 242812080 243048760 - - NC_000002.11 NULL NULL
CM000665.1:g.65191847_65215804del,NC_000003.12:g.65206172_65230129del23958,NC_000003.11:g.65191847_65215804del23958 CM000665.1,NC_000003.12,NC_000003.11 not provided Not Provided ClinVar 2017-09-14 https://www.ncbi.nlm.nih.gov/clinvar/RCV000161287 GRCh37 3 65191847 65215804 - - NC_000003.11 NULL NULL

  5. Language for the backend: I am most familiar with Java/Spring and plan to design and develop the API with it. If time permits, I will also replicate the same APIs with Python+Flask or Python+Django, though there is a bit of a learning curve for these on my end. Would that be acceptable?
@tobyberster

@upadhyd Here are some answers to your questions:

  1. There shouldn't be any duplicates in the dataset.
  2. There are no constraints on the maximum length of the gene, except that it must exist (cannot be null); see 4.
  3. The screenshot provides some guidance on what we are looking for in terms of NUCLEOTIDE_CHANGE.
  4. If a gene is null, you may skip the row entirely.
  5. Feel free to choose whichever language you are most comfortable with.
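A minimal Python sketch of how answers 3 and 4 might translate into a loader, under stated assumptions: the header names `Gene` and `Nucleotide Change` are hypothetical (only "Gene" and NUCLEOTIDE_CHANGE are mentioned in this thread), a "null" gene is taken to mean an empty cell or the literal `NULL`, and splitting NUCLEOTIDE_CHANGE on commas for one-per-line display is a guess, since the referenced screenshot is not available here. The helpers `load_variants` and `is_missing` are illustrative, not part of the project.

```python
import csv
import io
from typing import Iterable, List, Optional

# Hypothetical header names: adjust to match the real header row of the TSV.
GENE_COL = "Gene"
CHANGE_COL = "Nucleotide Change"


def is_missing(value: Optional[str]) -> bool:
    """Treat an absent cell, an empty cell, or the literal string NULL as missing."""
    return value is None or value.strip() in ("", "NULL")


def load_variants(lines: Iterable[str],
                  gene_col: str = GENE_COL,
                  change_col: str = CHANGE_COL) -> List[dict]:
    """Parse tab-separated rows, skipping any row whose gene is null."""
    records = []
    # QUOTE_NONE: treat quote characters as ordinary data in a TSV.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        gene = row.get(gene_col)
        if is_missing(gene):
            continue  # answer 4: rows without a gene are skipped entirely
        raw = row.get(change_col)
        # Assumption: several HGVS expressions are comma-separated in one cell.
        changes = [] if is_missing(raw) else [c.strip() for c in raw.split(",") if c.strip()]
        records.append({"gene": gene.strip(), "nucleotide_changes": changes})
    return records


if __name__ == "__main__":
    sample = (
        "Gene\tNucleotide Change\n"
        "RTP5\tCM000664.1:g.242812080_243048760del,"
        "NC_000002.11:g.242812080_243048760del236681\n"
        "\tCM000665.1:g.65191847_65215804del\n"  # null gene: skipped
    )
    for rec in load_variants(io.StringIO(sample)):
        print(rec["gene"])
        # One nucleotide change per line, e.g. for a multi-line table cell.
        print("\n".join(rec["nucleotide_changes"]))
```

Keeping the changes as a list rather than a pre-joined string leaves the display decision (new lines, bullets, or something else) to the front end once the expected format is confirmed.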

@upadhyd

upadhyd commented Feb 23, 2019

@tobyberster, thanks for the responses. I will keep the issue open in case I need further clarification while implementing the project.
