Missing one sample in filtered samples (CH is 1 less) #11079

Open
alisman opened this issue Oct 15, 2024 · 5 comments
@alisman

alisman commented Oct 15, 2024

curl 'http://localhost:8082/api/column-store/filtered-samples/fetch' \
  -H 'accept: application/json, text/plain, */*' \
  -H 'accept-language: en-US,en;q=0.9' \
  -H 'cache-control: no-cache' \
  -H 'content-type: application/json' \
  -H 'cookie: _ga_ET18FDC3P1=GS1.1.1727902893.87.0.1727902893.0.0.0; _gid=GA1.2.1570078648.1728481898; _ga_CKJ2CEEFD8=GS1.1.1728589307.172.1.1728589613.0.0.0; _ga_5260NDGD6Z=GS1.1.1728612388.318.1.1728612389.0.0.0; _gat_gtag_UA_17134933_2=1; _ga=GA1.1.1260093286.1710808634; _ga_334HHWHCPJ=GS1.1.1728647421.32.1.1728647514.0.0.0' \
  -H 'pragma: no-cache' \
  -H 'priority: u=1, i' \
  -H 'sec-ch-ua: "Google Chrome";v="129", "Not=A?Brand";v="8", "Chromium";v="129"' \
  -H 'sec-ch-ua-mobile: ?0' \
  -H 'sec-ch-ua-platform: "macOS"' \
  -H 'sec-fetch-dest: empty' \
  -H 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36' \
  --data-raw '{"studyIds":["brca_tcga_gdc"],"alterationFilter":{"copyNumberAlterationEventTypes":{"AMP":true,"HOMDEL":true},"mutationEventTypes":{"any":true},"includeDriver":true,"includeSomatic":true,"includeUnknownTier":true,"includeGermline":true,"includeUnknownStatus":true,"includeUnknownOncogenicity":true,"includeVUS":true},"genomicProfiles":[["rna_seq_mrna"],["mutations"]]}'

@onursumer

The missing sample is TCGA-AO-A1KO-01:

{
    "uniqueSampleKey": "VENHQS1BTy1BMUtPLTAxOmJyY2FfdGNnYV9nZGM",
    "uniquePatientKey": "VENHQS1BTy1BMUtPOmJyY2FfdGNnYV9nZGM",
    "sampleId": "TCGA-AO-A1KO-01",
    "patientId": "TCGA-AO-A1KO",
    "studyId": "brca_tcga_gdc"
}
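
As a side note on the payload above, the two unique keys appear to be unpadded Base64 encodings of "<id>:<studyId>". The sketch below decodes them to confirm which sample and study the record refers to. The class name UniqueKeyDecoder is made up for illustration, and the choice of the URL-safe alphabet is an assumption (these particular keys decode identically with either alphabet).

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class UniqueKeyDecoder {
    // Decodes a key such as uniqueSampleKey into its "<id>:<studyId>" text.
    // Assumption: URL-safe Base64 without padding (java.util.Base64 accepts missing padding).
    static String decode(String key) {
        byte[] raw = Base64.getUrlDecoder().decode(key);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // prints TCGA-AO-A1KO-01:brca_tcga_gdc
        System.out.println(decode("VENHQS1BTy1BMUtPLTAxOmJyY2FfdGNnYV9nZGM"));
        // prints TCGA-AO-A1KO:brca_tcga_gdc
        System.out.println(decode("VENHQS1BTy1BMUtPOmJyY2FfdGNnYV9nZGM"));
    }
}
```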

@onursumer

The sample TCGA-AO-A1KO-01 doesn't have a mutation profile for this study, so it is excluded by the study view filter.

To verify, run the SQL query below:

SELECT
    sp.sample_id as sampleInternalId,
    sd.sample_stable_id as sampleStableId,
    sd.sample_unique_id as sampleUniqueId,
    gp.stable_id as geneticProfile
FROM cgds_public_v5.sample_profile sp
    JOIN cgds_public_v5.sample_derived sd on sp.sample_id=sd.internal_id
    JOIN cgds_public_v5.genetic_profile gp on sp.genetic_profile_id=gp.genetic_profile_id
WHERE sd.sample_stable_id='TCGA-AO-A1KO-01' AND sd.cancer_study_identifier='brca_tcga_gdc'


The ClickHouse SQL implementation applies AND logic across the given genomic profile groups (INTERSECT between groups, OR within each group):

<if test="studyViewFilterHelper.studyViewFilter.genomicProfiles != null and !studyViewFilterHelper.studyViewFilter.genomicProfiles.isEmpty()">
    INTERSECT
    SELECT * FROM (
        <foreach item="ANDGroup" collection="studyViewFilterHelper.studyViewFilter.genomicProfiles" separator="INTERSECT">
            SELECT sample_derived.sample_unique_id
            FROM sample_profile
                JOIN genetic_profile gp ON sample_profile.genetic_profile_id = gp.genetic_profile_id
                JOIN cancer_study cs ON gp.cancer_study_id = cs.cancer_study_id
                JOIN sample_derived on sample_profile.sample_id = sample_derived.internal_id
            <where>
                sample_derived.cancer_study_identifier IN
                <foreach item="studyId" collection="studyViewFilterHelper.studyViewFilter.studyIds" open="(" separator="," close=")">
                    #{studyId}
                </foreach>
                AND
                <foreach item="genomicProfileId" collection="ANDGroup" open="(" separator="OR" close=")">
                    gp.stable_id LIKE '%_${genomicProfileId}'
                </foreach>
            </where>
        </foreach>
    )
</if>
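
To make the semantics of that fragment concrete, here is a minimal in-memory sketch of the same logic: each inner group is ORed, and the groups are intersected. It is not the project's code. The class name, the sample IDs S1/S2, and the profile stable IDs are made up for illustration, and endsWith("_" + suffix) is a simplification of the LIKE pattern (in SQL LIKE, `_` matches any single character, not only an underscore).

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class GenomicProfileFilterSketch {
    // profilesBySample: sample id -> stable ids of profiles that have a sample_profile row.
    // groups: the outer list is ANDed (INTERSECT), each inner list is ORed.
    static Set<String> filter(Map<String, Set<String>> profilesBySample,
                              List<List<String>> groups) {
        Set<String> result = new TreeSet<>(profilesBySample.keySet());
        for (List<String> orGroup : groups) {
            Set<String> matching = new TreeSet<>();
            for (Map.Entry<String, Set<String>> entry : profilesBySample.entrySet()) {
                // Simplified stand-in for: gp.stable_id LIKE '%_<suffix>' OR ...
                boolean anyMatch = entry.getValue().stream().anyMatch(stableId ->
                        orGroup.stream().anyMatch(suffix -> stableId.endsWith("_" + suffix)));
                if (anyMatch) {
                    matching.add(entry.getKey());
                }
            }
            result.retainAll(matching); // INTERSECT with the previous groups
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical data: S1 has no mutations row in sample_profile, S2 has both.
        Map<String, Set<String>> profilesBySample = Map.of(
                "S1", Set.of("brca_tcga_gdc_rna_seq_mrna"),
                "S2", Set.of("brca_tcga_gdc_rna_seq_mrna", "brca_tcga_gdc_mutations"));
        List<List<String>> groups = List.of(List.of("rna_seq_mrna"), List.of("mutations"));
        System.out.println(filter(profilesBySample, groups)); // prints [S2]
    }
}
```

With the request's genomicProfiles value of [["rna_seq_mrna"],["mutations"]], a sample lacking a sample_profile row for either profile drops out, which matches the behaviour described for TCGA-AO-A1KO-01.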

The legacy SQL might be applying OR logic. This needs further investigation to confirm.

@onursumer

Actually, the legacy implementation also applies AND logic, but it gets the profile information from the gene panel data:

genePanelData.forEach(datum -> {
    if (datum.getProfiled() && profileMap.containsKey(datum.getMolecularProfileId())) {
        SampleIdentifier sampleIdentifier =
            studyViewFilterUtil.buildSampleIdentifier(datum.getStudyId(), datum.getSampleId());
        filteredSampleIdentifiers.add(sampleIdentifier);
    }
});

And according to the gene panel data, the sample TCGA-AO-A1KO-01 does have the mutations genomic profile.

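
The discrepancy can be illustrated with a small sketch: one filter driven by gene panel data (mirroring the legacy loop above) and a comparison against a result derived from sample_profile rows. Everything here is hypothetical: the Datum class only mimics the getters used in the snippet, and the sample and profile IDs are made up.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class GenePanelFilterSketch {
    // Hypothetical stand-in for a gene panel datum; fields mirror the getters in the legacy snippet.
    static final class Datum {
        final String sampleId;
        final String molecularProfileId;
        final boolean profiled;

        Datum(String sampleId, String molecularProfileId, boolean profiled) {
            this.sampleId = sampleId;
            this.molecularProfileId = molecularProfileId;
            this.profiled = profiled;
        }
    }

    // Mirrors the legacy loop: keep samples that have a profiled datum for a requested profile.
    static Set<String> samplesProfiledIn(List<Datum> genePanelData, Set<String> profileIds) {
        Set<String> kept = new TreeSet<>();
        for (Datum datum : genePanelData) {
            if (datum.profiled && profileIds.contains(datum.molecularProfileId)) {
                kept.add(datum.sampleId);
            }
        }
        return kept;
    }

    // Samples accepted via gene panel data but missing from the sample_profile-based result.
    static Set<String> onlyInLegacy(Set<String> legacy, Set<String> columnStore) {
        Set<String> diff = new TreeSet<>(legacy);
        diff.removeAll(columnStore);
        return diff;
    }

    public static void main(String[] args) {
        List<Datum> genePanelData = List.of(
                new Datum("S1", "study_mutations", true),
                new Datum("S2", "study_mutations", true));
        Set<String> legacy = samplesProfiledIn(genePanelData, Set.of("study_mutations"));
        // Assume the sample_profile-based query returned only S2.
        System.out.println(onlyInLegacy(legacy, Set.of("S2"))); // prints [S1]
    }
}
```

If the two sources disagree for a sample, as they seem to for TCGA-AO-A1KO-01, a difference of this kind is what shows up as the off-by-one count.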

@alisman

alisman commented Oct 21, 2024

@onursumer I don't really understand how the gene panel can be used because, unless I'm totally mistaken, there is no relation from gene panel to sample. A gene panel only says which genes are profiled by a given genetic_profile, doesn't it? So I guess the question is: how is the above genePanelData derived?

@onursumer

genePanelData.forEach(datum -> {
    if (datum.getProfiled() && profileMap.containsKey(datum.getMolecularProfileId())) {
        SampleIdentifier sampleIdentifier =
            studyViewFilterUtil.buildSampleIdentifier(datum.getStudyId(), datum.getSampleId());

Here we get the sample ID from the gene panel datum via datum.getSampleId(). I'm not sure how the sample ID gets integrated into the gene panel data, but here is the related class member.
