Clinical Data Counts Issue (Patients are incorrectly counted because of case insensitivity) #11086

Open
haynescd opened this issue Oct 16, 2024 · 1 comment

Comments

@haynescd
Collaborator

Study where issue was found
https://www.cbioportal.org/study/summary?id=prad_organoids_msk_2022

Specifically, the issue was found while looking at the Ethnicity pie chart.

Patients Related to issue

  • VCAP
  • VCaP
  • LNCAP
  • LNCaP
  • 22RV1
  • 22Rv1
  • PC3
  • PC-3

The patients above appear to be merged when computing clinical-data-counts, even though each has a unique internal patient ID.

The Ethnicity chart shows 5 Caucasian patients, but there are 6 in total.
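
To make the suspected merging easier to see, here is a rough Java sketch that groups the stable IDs listed above by their upper-cased form and reports which spellings collapse into one key. It is not taken from the cBioPortal codebase: the class name `StableIdCollisions` and the method `findCaseCollisions` are hypothetical, and upper-casing is only an assumed stand-in for whatever case folding happens during counting.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, for illustration only.
class StableIdCollisions {

    // Groups stable IDs by their upper-cased form and keeps only the groups
    // that contain more than one spelling, i.e. IDs that differ only by case.
    static Map<String, List<String>> findCaseCollisions(List<String> stableIds) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String id : stableIds) {
            groups.computeIfAbsent(id.toUpperCase(Locale.ROOT), k -> new ArrayList<>()).add(id);
        }
        // Drop keys that map to a single spelling; those do not collide.
        groups.values().removeIf(spellings -> spellings.size() < 2);
        return groups;
    }

    public static void main(String[] args) {
        // Patient stable IDs as listed in this issue.
        List<String> ids = List.of(
                "VCAP", "VCaP", "LNCAP", "LNCaP", "22RV1", "22Rv1", "PC3", "PC-3");
        findCaseCollisions(ids)
                .forEach((key, spellings) -> System.out.println(key + " <- " + spellings));
    }
}
```

Under this assumption, VCAP/VCaP, LNCAP/LNCaP and 22RV1/22Rv1 each fold into a single key, while PC3 and PC-3 stay distinct because they differ by a hyphen rather than by case.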

Found in the RFC80 Effort

SQL used

SELECT ''                                                   AS sample_unique_id,
       concat(cs.cancer_study_identifier, '_', p.stable_id) AS patient_unique_id,
       p.internal_id,
       cam.attr_id                                          AS attribute_name,
       ifNull(clinpat.attr_value, '')                       AS attribute_value,
       cs.cancer_study_identifier                           AS cancer_study_identifier,
       'patient'                                            AS type
FROM patient AS p
    INNER JOIN cancer_study AS cs ON p.cancer_study_id = cs.cancer_study_id
    FULL OUTER JOIN clinical_attribute_meta AS cam ON cs.cancer_study_id = cam.cancer_study_id
    FULL OUTER JOIN clinical_patient AS clinpat ON (p.internal_id = clinpat.internal_id) AND (clinpat.attr_id = cam.attr_id)
WHERE cam.patient_attribute = 1 AND cs.cancer_study_identifier = 'prad_organoids_msk_2022' AND attribute_name = 'ETHNICITY';
@onursumer
Member

The unique internal IDs do not matter to the legacy implementation because it uses stable patient IDs in the SQL.

return clinicalDataMapper.fetchPatientClinicalDataCounts(patientStudyIds,
patients.stream().map(Patient::getStableId).collect(Collectors.toList()), attributeIds, projection);

prad_organoids_msk_2022_patient_ids

The SQL below works (except for PC-3) because MySQL string comparisons are case-insensitive by default.

patient.STABLE_ID IN
<foreach item="item" collection="patientIds" open="(" separator="," close=")">
#{item}
</foreach>
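
As a rough illustration of why the `IN` filter behaves this way, the Java sketch below compares a case-insensitive match with an exact match over a few rows. Everything in it is hypothetical: `PatientRow`, the internal IDs 1 to 4, and both method names are made up for the example, and `equalsIgnoreCase` is only an approximation of a case-insensitive MySQL collation (real collations can also ignore accents or trailing spaces).

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch, not cBioPortal code.
class StableIdFilterSketch {

    // Minimal stand-in for a row of the patient table (internal IDs are invented).
    static final class PatientRow {
        final int internalId;
        final String stableId;

        PatientRow(int internalId, String stableId) {
            this.internalId = internalId;
            this.stableId = stableId;
        }
    }

    // Approximates `STABLE_ID IN (...)` under a case-insensitive collation.
    static List<Integer> matchIgnoringCase(List<PatientRow> rows, Set<String> requested) {
        return rows.stream()
                .filter(r -> requested.stream().anyMatch(id -> id.equalsIgnoreCase(r.stableId)))
                .map(r -> r.internalId)
                .collect(Collectors.toList());
    }

    // Approximates the same filter under an exact, case-sensitive comparison.
    static List<Integer> matchExact(List<PatientRow> rows, Set<String> requested) {
        return rows.stream()
                .filter(r -> requested.contains(r.stableId))
                .map(r -> r.internalId)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<PatientRow> rows = List.of(
                new PatientRow(1, "VCAP"), new PatientRow(2, "VCaP"),
                new PatientRow(3, "PC3"), new PatientRow(4, "PC-3"));

        System.out.println(matchIgnoringCase(rows, Set.of("VCaP"))); // [1, 2]
        System.out.println(matchExact(rows, Set.of("VCaP")));        // [2]
        System.out.println(matchIgnoringCase(rows, Set.of("PC3")));  // [3]
    }
}
```

With the case-insensitive match, asking for a single spelling returns the rows for both case variants, which is one way a stable-ID filter could blur patients that have distinct internal IDs. PC3 and PC-3 are unaffected in this sketch because they differ by more than case.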
