Data issue for study: coadread_cass_2020 #2054

Open
saxenanurag opened this issue Aug 26, 2024 · 3 comments

Comments

@saxenanurag

I am trying to import coadread_cass_2020 into a private installation of cbioportal and getting this error:

ERROR: data_clinical_patient.txt: lines [74, 108]: columns [18, 19]: Value of numeric attribute is not a real number; values encountered: ['<0.5', '<2.0']

I downloaded the files directly from cbioportal.org as well and got the same error.
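To see every offending cell rather than only the two lines the validator reports, a small standalone check can help. This is a minimal sketch, not the cBioPortal validator: it assumes the file is tab-separated, that the leading `#`-prefixed rows are metadata with the datatypes on the third row, that the first non-`#` row holds the column IDs, and that blank or `NA` cells mean "missing". The function name `find_non_numeric` is hypothetical.

```python
def find_non_numeric(text):
    """Return (line_number, column_id, value) for NUMBER cells that float() rejects.

    Assumptions (not taken from the validator source): tab-separated input,
    datatypes on the third '#' row, blank or 'NA' treated as missing.
    """
    lines = text.splitlines()

    # Count the leading metadata rows (those starting with '#').
    n_meta = 0
    while n_meta < len(lines) and lines[n_meta].startswith("#"):
        n_meta += 1
    if n_meta < 3:
        raise ValueError("expected at least three '#' metadata rows")

    datatypes = lines[2][1:].split("\t")      # third metadata row, '#' dropped
    column_ids = lines[n_meta].split("\t")    # first non-'#' row

    problems = []
    # Data starts right after the column-ID row; line numbers are 1-based.
    for line_no, line in enumerate(lines[n_meta + 1:], start=n_meta + 2):
        for col, cell in enumerate(line.split("\t")):
            if col >= len(datatypes) or datatypes[col] != "NUMBER":
                continue
            if cell.strip() in ("", "NA"):
                continue
            try:
                float(cell)
            except ValueError:
                problems.append((line_no, column_ids[col], cell))
    return problems
```

Calling it on the contents of data_clinical_patient.txt should list cells such as `<0.5` and `<2.0` together with their line numbers and column IDs.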

@alexsigaras
Member

Thanks @saxenanurag. I can confirm we are having the same issue on our end.
This refers to values of the columns CEA Biomarker and CA19-9 Antigen.

The issue is that <0.5 is a STRING, not a NUMBER. The datatype could be changed on line 3 of the file for the respective columns.

Looking at https://www.cbioportal.org/study/clinicalData?id=coadread_cass_2020, it seems that the data are imported with the < and > symbols, so perhaps a fix would be to change the problematic column definitions in data_clinical_patient.txt from NUMBER to STRING.

Kindly let us know if you would like us to open a PR instead.
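The NUMBER to STRING change described above could be scripted roughly as follows. This is only a sketch under assumptions: the file is tab-separated, the datatype row is the third `#`-prefixed line (as mentioned above), and the first non-`#` row holds the column IDs. The function name `retype_columns` and the column IDs used with it are hypothetical; check the real IDs in your copy of the file.

```python
def retype_columns(text, column_ids, new_type="STRING"):
    """Return the file text with the datatype of the given columns replaced.

    Assumptions (illustrative only): tab-separated input, datatypes on the
    third '#' row, column IDs on the first row that does not start with '#'.
    """
    lines = text.split("\n")

    # Count the leading metadata rows to locate the column-ID row.
    n_meta = 0
    while n_meta < len(lines) and lines[n_meta].startswith("#"):
        n_meta += 1
    if n_meta < 3:
        raise ValueError("expected at least three '#' metadata rows")

    header = lines[n_meta].split("\t")
    datatypes = lines[2][1:].split("\t")  # drop the leading '#'

    for column_id in column_ids:
        # header.index raises ValueError if the column ID is not present.
        datatypes[header.index(column_id)] = new_type

    lines[2] = "#" + "\t".join(datatypes)
    return "\n".join(lines)
```

Only the datatype row is rewritten; the data rows, including values such as `<0.5`, are left untouched.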

@rmadupuri
Collaborator

rmadupuri commented Sep 6, 2024

Hi @saxenanurag @alexsigaras, thank you for bringing this issue to our attention. I have updated the validator to accept values prefixed with > or <, as well as float values, as numbers (see PR #58). However, this update will only be available in the next release. In the meantime, please feel free to change the column type to STRING on your side to bypass the validator check.
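For readers who want a feel for what such a relaxed check might look like, here is a minimal sketch. It is not the code from the PR mentioned above: the function name `is_relaxed_number` and the exact rule (at most one leading `<` or `>` followed by something `float()` accepts) are assumptions made for illustration.

```python
def is_relaxed_number(value):
    """Return True for plain floats and floats with one leading '<' or '>'.

    Illustrative assumption only; the real validator may apply
    different or stricter rules.
    """
    value = value.strip()
    # Drop a single comparison prefix, if present.
    if value[:1] in ("<", ">"):
        value = value[1:]
    try:
        float(value)
    except ValueError:
        return False
    return True
```

With this rule, `<0.5` and `<2.0` from the error message would pass, while a bare `<` or free text would still be rejected. Note that `float()` is lenient and also accepts forms such as `nan` or `1e3`, so a stricter check would need an explicit pattern.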

@alexsigaras
Member

Thank you for your response, @rmadupuri. Indeed, as suggested, changing NUMBER to STRING solves the issue, but your solution above is a much better approach. I suggest keeping this issue open until the PR is part of a release.

3 participants