
Predicting CPU utilization? #9

Open
tfenne opened this issue Dec 21, 2018 · 0 comments

tfenne commented Dec 21, 2018

I'm trying to run aFC on a very large dataset: literally millions of QTLs. I've broken the work up into small chunks and am running hundreds of parallel jobs on a compute cluster. The problem I'm running into is that I'm having a very hard time predicting the CPU usage of the jobs.

What I'm seeing is that many of the jobs run at ~100% of one CPU for most of their runtime. Then, every so often, a batch of jobs will spike significantly, consuming anywhere from 300% to 1200% CPU (i.e. 3-12 cores). This is causing quite a problem for me, because I'm left with the choice of either scheduling the jobs with 1 CPU each and dealing with the mayhem that ensues when a non-trivial number of jobs spike, or scheduling multiple CPUs per job and watching my compute farm sit half or more idle most of the time.

I've taken a brief read through the source code and can't see any references to multi-processing, threads or parallelism, but I'm also not experienced with numpy/pandas, so it's very possible I'm missing something.
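One possible source of parallelism that never appears in the Python source (an assumption here, not confirmed for aFC) is the native BLAS/LAPACK or OpenMP library that numpy links against, which may start one thread per core during linear-algebra calls. The sketch below does two things: it caps those thread pools through commonly recognized environment variables, which generally only take effect if set before numpy is first imported, and it measures how many cores a piece of work actually used by dividing process CPU time (summed over all threads) by wall-clock time. The helper names `pin_native_threads`, `cpu_utilization` and `busy_work` are hypothetical, for illustration only.

```python
import os
import time

# Environment variables read by common BLAS/OpenMP backends when they load.
# Hypothetical helper below: they only take effect if set BEFORE numpy
# (or scipy/pandas) is first imported in the process.
THREAD_ENV_VARS = (
    "OMP_NUM_THREADS",
    "OPENBLAS_NUM_THREADS",
    "MKL_NUM_THREADS",
    "VECLIB_MAXIMUM_THREADS",
    "NUMEXPR_NUM_THREADS",
)


def pin_native_threads(n=1):
    """Cap native thread pools to n threads (call before importing numpy)."""
    for var in THREAD_ENV_VARS:
        os.environ[var] = str(n)


def cpu_utilization(fn, *args, **kwargs):
    """Run fn and return (result, cores_used).

    cores_used is process CPU time (user + system, summed over all threads)
    divided by wall-clock time: ~1.0 means one busy core, while values well
    above 1.0 indicate multithreaded work happening inside fn.
    """
    cpu_start = time.process_time()
    wall_start = time.perf_counter()
    result = fn(*args, **kwargs)
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return result, (cpu / wall if wall > 0 else 0.0)


def busy_work(n):
    """Hypothetical single-threaded stand-in for one chunk of work."""
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    pin_native_threads(1)
    # import numpy as np  # import numpy only after pinning so the caps apply
    total, cores = cpu_utilization(busy_work, 2_000_000)
    print(f"cores used: {cores:.2f}")
```

Wrapping a suspect call (e.g. one chunk's analysis) in `cpu_utilization` with and without the caps would show whether the spikes come from native threads. If they do, the `threadpoolctl` package's `threadpool_limits` context manager offers a runtime alternative to environment variables.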

Any pointers or insight into what might be causing the CPU spikes, and how to deal with them, would be greatly appreciated.
