Sample profile count disparity #11104

Open
alisman opened this issue Oct 22, 2024 · 0 comments
alisman commented Oct 22, 2024

The legacy code uses gene panel data. When there is NO gene panel (WES?), we still get a row per sample because of the left joins, even though both sample_id and panel_id will be NULL. Perhaps we always get a row per sample, in which case the join doesn't limit the returned set. This sometimes differs from a direct query of the sample_profile table, which returns a subset of the samples. The legacy query:

    SELECT sample_id, sample_profile.panel_id
    FROM sample
        INNER JOIN patient ON sample.patient_id = patient.internal_id
        INNER JOIN cancer_study ON patient.cancer_study_id = cancer_study.cancer_study_id
        LEFT JOIN genetic_profile ON cancer_study.cancer_study_id = genetic_profile.cancer_study_id
        LEFT JOIN sample_profile ON sample_profile.genetic_profile_id = genetic_profile.genetic_profile_id
            AND sample.internal_id = sample_profile.sample_id
        LEFT JOIN gene_panel ON sample_profile.panel_id = gene_panel.internal_id
    WHERE genetic_profile.stable_id = 'brain_cptac_2020_mutations'

For example, the direct query of sample_profile for the same profile:

    SELECT *
    FROM sample_profile
        JOIN genetic_profile gp ON sample_profile.genetic_profile_id = gp.genetic_profile_id
    WHERE gp.stable_id = 'brain_cptac_2020_mutations'

The question is: which is correct as a measure of whether a given sample is profiled? The legacy query discards any information in sample_profile. What I don't understand is how the legacy (left join) query could EVER return a subset. Since it's a left join, it would seem there will always be a row per sample whether or not there is a matching gene panel. And yet some profiles can return a subset.
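To make the disparity concrete, here is a minimal sketch that runs both query shapes against an in-memory SQLite database. Everything about the data is an assumption for illustration: the schema is cut down to only the columns the two queries touch, the stable id `demo_mutations` and the three samples are made up, and SQLite may not behave identically to the production database. It only shows the general left join behaviour described above, not the cause of the subset seen on real profiles.

```python
import sqlite3

# Hypothetical, heavily simplified schema: only the columns the two queries use.
SCHEMA = """
CREATE TABLE cancer_study (cancer_study_id INTEGER PRIMARY KEY);
CREATE TABLE patient (internal_id INTEGER PRIMARY KEY, cancer_study_id INTEGER);
CREATE TABLE sample (internal_id INTEGER PRIMARY KEY, patient_id INTEGER);
CREATE TABLE genetic_profile (
    genetic_profile_id INTEGER PRIMARY KEY, stable_id TEXT, cancer_study_id INTEGER);
CREATE TABLE gene_panel (internal_id INTEGER PRIMARY KEY);
CREATE TABLE sample_profile (sample_id INTEGER, genetic_profile_id INTEGER, panel_id INTEGER);
"""

# Same join shape as the legacy query; sample.internal_id is added to the
# SELECT list so that rows with a NULL sample_profile match stay identifiable.
LEGACY_QUERY = """
SELECT sample.internal_id, sample_profile.sample_id, sample_profile.panel_id
FROM sample
    INNER JOIN patient ON sample.patient_id = patient.internal_id
    INNER JOIN cancer_study ON patient.cancer_study_id = cancer_study.cancer_study_id
    LEFT JOIN genetic_profile ON cancer_study.cancer_study_id = genetic_profile.cancer_study_id
    LEFT JOIN sample_profile ON sample_profile.genetic_profile_id = genetic_profile.genetic_profile_id
        AND sample.internal_id = sample_profile.sample_id
    LEFT JOIN gene_panel ON sample_profile.panel_id = gene_panel.internal_id
WHERE genetic_profile.stable_id = 'demo_mutations'
ORDER BY sample.internal_id
"""

# Same shape as the direct sample_profile query.
DIRECT_QUERY = """
SELECT sample_profile.sample_id
FROM sample_profile
    JOIN genetic_profile gp ON sample_profile.genetic_profile_id = gp.genetic_profile_id
WHERE gp.stable_id = 'demo_mutations'
ORDER BY sample_profile.sample_id
"""


def build_demo_db():
    """Create a toy study with three samples, two of them recorded as profiled."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO cancer_study VALUES (1)")
    conn.executemany("INSERT INTO patient VALUES (?, 1)", [(1,), (2,), (3,)])
    conn.executemany("INSERT INTO sample VALUES (?, ?)", [(1, 1), (2, 2), (3, 3)])
    conn.execute("INSERT INTO genetic_profile VALUES (10, 'demo_mutations', 1)")
    # Samples 1 and 2 have sample_profile rows but no gene panel (panel_id NULL).
    # Sample 3 has no sample_profile row at all.
    conn.executemany("INSERT INTO sample_profile VALUES (?, 10, NULL)", [(1,), (2,)])
    return conn


conn = build_demo_db()
legacy_rows = conn.execute(LEGACY_QUERY).fetchall()
direct_rows = conn.execute(DIRECT_QUERY).fetchall()

# Samples that the legacy join returns but that have no sample_profile row.
unprofiled = [internal_id for internal_id, sp_sample_id, _ in legacy_rows
              if sp_sample_id is None]

print(len(legacy_rows), len(direct_rows), unprofiled)
```

In this toy data the legacy shape returns one row per sample in the study, while the direct query returns only the samples that have a sample_profile row. Note that panel_id is NULL both for a profiled sample without a gene panel and for a sample with no sample_profile row, so panel_id alone cannot tell the two cases apart; the nullability of sample_profile.sample_id can.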

@alisman alisman added the RFC80 label Oct 22, 2024