Empty yaml file types (and incorrect formatting) #1424

Open
earlEBI opened this issue Sep 11, 2024 · 4 comments

earlEBI commented Sep 11, 2024

Empty yaml file types
Found 785 yamls with empty file types - GCSTs listed in attached .txt file:
yamls-with-empty-filetypes.txt
(Unfortunately these are not all GWAS-SSF so file headers would need to be checked to determine correct file type.)
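A check like this could be scripted roughly as follows. This is only a minimal stdlib sketch, not the script that produced the attached list. The function name `has_empty_file_type` is hypothetical, and it assumes `file_type` is a top-level key on its own line and that a missing key should count as empty.

```python
import re

# Matches a top-level "file_type:" line and captures whatever follows the colon.
FILE_TYPE_LINE = re.compile(r"^file_type:[ \t]*(.*)$", re.MULTILINE)


def has_empty_file_type(yaml_text: str) -> bool:
    """Return True if file_type is missing, blank, or an empty quoted string."""
    match = FILE_TYPE_LINE.search(yaml_text)
    if match is None:
        return True  # assumption: a missing key counts as empty
    value = match.group(1).strip()
    # Blank, empty-quoted, and YAML null spellings are all treated as empty.
    return value in ("", "''", '""', "null", "~")
```

Running it over the text of each metadata yaml would flag candidates, but the correct file type would still need to be determined from the file headers.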

Incorrect file_type formatting
Also found several yamls with file_type ' GWAS-SSFv1.0' or ' GWAS-SSF v1.0' (with single quotation marks and leading whitespace), e.g. GCST90319314. The quotation marks and whitespace should be removed so that it reads, e.g., file_type: GWAS-SSFv1.0.
(Usage also varies between 'GWAS-SSFv1.0' and 'GWAS-SSF v1.0', with an added space. Could this also be cleaned up?)
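The clean-up requested here could look something like the sketch below. The function name `normalise_file_type` is hypothetical, and treating the no-space spelling `GWAS-SSFv1.0` as canonical is an assumption based on the example above, not a decision recorded in this thread.

```python
import re


def normalise_file_type(raw: str) -> str:
    """Strip stray quotes and whitespace, and collapse 'GWAS-SSF v1.0' to 'GWAS-SSFv1.0'."""
    # Remove outer whitespace, then literal quote characters, then inner whitespace.
    value = raw.strip().strip("'\"").strip()
    # Assumption: the no-space spelling is the canonical one.
    return re.sub(r"^(GWAS-SSF)\s+(v\d)", r"\1\2", value)
```

Values such as pre-GWAS-SSF pass through unchanged because the pattern is anchored at the start of the string.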

karatugo self-assigned this Sep 17, 2024

ljwh2 commented Sep 19, 2024

@karatugo could this please be worked on along with the other yaml issues, thanks

jiyue1214 self-assigned this Sep 25, 2024

ljwh2 commented Sep 25, 2024

@jiyue1214 will check how many of these have been resolved already and also deal with the quotation marks.

jiyue1214 commented

Based on the status of the studies on 27th September 2024:

| Number of studies | file_type value in YAML file |
| --- | --- |
| 803 | '' |
| 912 | 'GWAS-SSFv1.0' |
| 35,823 | GWAS-SSFv1.0 |
| 16,042 | non-GWAS-SSF |
| 1 | 'Non-GWAS-SSF' |
| 890 | Non-GWAS-SSF |
| 56,677 | pre-GWAS-SSF |

Note that we currently place no restrictions on the value of the file_type field.
The harmonisation queue script determines the file type by checking whether the file_type value starts with GWAS-SSF or pre-GWAS-SSF. If it starts with neither, the file type is set to "not_harm" automatically.
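The prefix check described above might be sketched as follows. This is an illustration of the described behaviour, not the actual harmonisation queue code. The function name `harmonisation_type` and the two non-"not_harm" return labels are hypothetical.

```python
def harmonisation_type(file_type: str) -> str:
    """Classify a file_type value purely by its prefix."""
    if file_type.startswith("pre-GWAS-SSF"):
        return "pre-GWAS-SSF"  # hypothetical label
    if file_type.startswith("GWAS-SSF"):
        return "GWAS-SSF"  # hypothetical label
    # Anything else, including empty or malformed values, falls through.
    return "not_harm"
```

Under this sketch, an empty value, a value with leading whitespace such as " GWAS-SSFv1.0", and both non-GWAS-SSF spellings all end up as "not_harm", which is why unrestricted file_type values matter.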

jiyue1214 commented

Hi @earlEBI, I extracted the first two rows from each sumstat, reformatted them as "header: value", and identified each file type based on that. Here are the results. Could you please help me check whether they are correct?
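For reference, the "header: value" reformatting could be done along these lines. This is a rough sketch rather than the script actually used. The function name `header_value_pairs` is hypothetical, and it assumes the sumstat is tab-delimited text with the header in the first row.

```python
import csv
import io


def header_value_pairs(sumstat_text: str, delimiter: str = "\t") -> list[str]:
    """Pair each column header with the value from the first data row."""
    reader = csv.reader(io.StringIO(sumstat_text), delimiter=delimiter)
    header = next(reader)      # first row: column names
    first_row = next(reader)   # second row: first data record
    return [f"{name}: {value}" for name, value in zip(header, first_row)]


# Illustrative input only; the column names are not taken from any listed study.
sample = "chromosome\tbase_pair_location\tp_value\n1\t12345\t0.05\n"
pairs = header_value_pairs(sample)
```

Only the first two rows are read, so this stays cheap even for large files when the input is streamed rather than loaded whole.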

Hi, @karatugo. Since there are more than 800 studies, could you please suggest any best practices for updating both the DB and meta-yaml files?

Thank you for your support and help,
