Some studies without harmonised files in 36635386 #1420

Open
2 tasks
eks-ebi opened this issue Sep 2, 2024 · 3 comments

@eks-ebi

eks-ebi commented Sep 2, 2024

This issue was raised by a user query.

Some studies in this publication have harmonised files, e.g.
http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST90200001-GCST90201000/GCST90200266/

But others do not, e.g.
http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST90200001-GCST90201000/GCST90200267/

I can't see any obvious reason why one would be harmonised and the other not, so maybe there was an error in the harmonisation process?
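
Both example URLs sit under the same range directory, which suggests accessions are grouped into blocks of 1000. For anyone checking other studies from this publication, here is a minimal sketch that builds the directory URL from an accession. The block size and the helper name `study_url` are assumptions inferred from the two URLs above, not a documented API.

```python
BASE = "http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics"


def study_url(accession: str, block: int = 1000) -> str:
    """Build the FTP directory URL for a GCST accession.

    Assumes studies are grouped into directories of `block` consecutive
    accessions, as the two URLs in this issue suggest.
    """
    prefix, digits = accession[:4], accession[4:]
    if prefix != "GCST" or not digits.isdigit() or int(digits) < 1:
        raise ValueError(f"not a GCST accession: {accession!r}")
    number = int(digits)
    start = (number - 1) // block * block + 1  # first accession in the block
    end = start + block - 1                    # last accession in the block
    width = len(digits)                        # keep the input's zero-padding width
    return f"{BASE}/GCST{start:0{width}d}-GCST{end:0{width}d}/{accession}/"


print(study_url("GCST90200266"))
print(study_url("GCST90200267"))
```

Whether a given directory actually contains harmonised output still has to be checked by opening it; the sketch only reconstructs the path.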

Action items:

  • investigate why some studies in this publication do not have harmonised files
  • re-generate harmonised files for any that are missing
@jiyue1214

Hello, it is true that some studies failed in the harmonization pipeline, so their results did not show up on the FTP. However, this study is not one of those cases.

There is another scenario. Our pipeline runs 1600 studies per day using 4 jobs (400 studies per job). This publication is a large paper containing over 4000 studies, which were divided into at least 10 jobs. The reason you see some studies harmonized while others are not is that one of those jobs may have been interrupted and did not succeed.
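
To illustrate the batching, here is a rough sketch. The 400-per-job and 4-jobs-per-day figures come from this comment; `split_into_jobs` and the placeholder study IDs are hypothetical and are not the pipeline's actual code.

```python
import math

STUDIES_PER_JOB = 400  # figure quoted in this comment
JOBS_PER_DAY = 4       # figure quoted in this comment


def split_into_jobs(studies, per_job=STUDIES_PER_JOB):
    """Split a list of study IDs into consecutive batches of at most per_job."""
    return [studies[i:i + per_job] for i in range(0, len(studies), per_job)]


# Placeholder IDs standing in for a publication with 4000 studies.
studies = [f"study_{i}" for i in range(4000)]
jobs = split_into_jobs(studies)
days = math.ceil(len(jobs) / JOBS_PER_DAY)
print(len(jobs), days)
```

Under this scheme, a single interrupted job would leave up to 400 studies from the same publication without harmonised files while the other batches finish normally.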

I have a list of studies whose files need to be fixed before they can finish the pipeline. The study you listed is not on that list.

What I am going to do: requeue the affected studies in the harmonisation pipeline.

@jiyue1214

For the studies that failed harmonisation for no identifiable reason, I am re-queueing them to be re-harmonised.

@jiyue1214

For PMID_36635386 there are 4443 studies in total. As of 24th Sep, 4140 have been harmonised and 303 have not.

One study, GCST90200809, cannot be harmonised because of this error: column length (11) does not match header length (10). GCST90202482 appears to have some invalid cells; I need to look at it in detail.
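
A mismatch like the one reported for GCST90200809 can be located by comparing each row's field count with the header's. This is a minimal sketch, assuming the summary statistics file is tab-separated with a single header line; `find_ragged_rows` is a hypothetical helper, not part of the harmonisation pipeline, and the sample data is made up.

```python
import csv
import io


def find_ragged_rows(handle, delimiter="\t"):
    """Yield (line_number, field_count, expected) for each row whose
    field count differs from the header's."""
    reader = csv.reader(handle, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    header = next(reader, None)
    if header is None:  # empty input: nothing to check
        return
    expected = len(header)
    for row in reader:
        if len(row) != expected:
            yield reader.line_num, len(row), expected


# Tiny in-memory example: line 3 has one field more than the header.
sample = "a\tb\tc\n1\t2\t3\n4\t5\t6\t7\n"
print(list(find_ragged_rows(io.StringIO(sample))))
```

For a real file, pass an open file handle (for example from `open(path, newline="")`) instead of the `io.StringIO` sample.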

The other 301 failed because of a time limit issue in the qualitycontrolqc step. I will increase the wall time for this step and rerun them.
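
As a sanity check, the failure categories should add up to the number of unharmonised studies. A small sketch using only the figures quoted in this comment; the dictionary labels are chosen for illustration.

```python
total = 4443
harmonised = 4140

# Failure counts as reported in this comment.
failures = {
    "column length mismatch (GCST90200809)": 1,
    "cell problems (GCST90202482)": 1,
    "time limit in qualitycontrolqc": 301,
}

not_harmonised = total - harmonised
# Both sides should equal the 303 unharmonised studies.
assert not_harmonised == sum(failures.values()) == 303
print(f"{harmonised / total:.1%} harmonised, {not_harmonised} remaining")
```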
