Predicting CPU utilization? #9

tfenne · 2018-12-21T14:22:57Z

I'm trying to run aFC on a very large dataset - literally millions of QTLs. I've broken the work up into small chunks and am running 100s of parallel jobs on a compute cluster. The problem I'm running into is that I'm having a very hard time predicting the CPU usage of the jobs.

What I'm seeing is that many of the jobs run at ~100% of one CPU for most of their runtime. Then once in a while a bunch of jobs will spike up significantly, consuming anywhere from 300%-1200% CPU (i.e. 3-12 cores). This is causing quite a problem for me because I'm left with the choice of either scheduling the jobs with 1 CPU each and dealing with the mayhem that ensues when a non-trivial number of jobs spike, or scheduling multiple CPUs per job and watching my compute farm sit half or more idle most of the time.

I've taken a brief read through the source code and can't see any references to multi-processing, threads or parallelism, but I'm also not experienced with numpy/pandas, so it's very possible I'm missing something.

Any pointers or insight into what might be causing the CPU spikes and how to deal with them would be greatly appreciated.
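
One thing that may be worth ruling out, offered as an assumption and not something confirmed from the aFC source: NumPy is often linked against a multithreaded math library (OpenBLAS, MKL, or an OpenMP runtime), and those libraries can start their own thread pools for some linear-algebra calls even when the Python code contains no explicit threading. A minimal sketch of capping those pools to one thread per job follows. The helper name `limit_math_threads` is hypothetical, and whether this affects the spikes described above is untested.

```python
import os

# Environment variables commonly honored by the threaded math libraries
# that NumPy may be linked against (OpenMP runtimes, OpenBLAS, MKL).
THREAD_ENV_VARS = ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS")


def limit_math_threads(n_threads: int = 1) -> dict:
    """Cap library thread pools (hypothetical helper, not part of aFC).

    This only sets environment variables, so it has to run before
    numpy/pandas are imported in the same process.
    """
    if n_threads < 1:
        raise ValueError("n_threads must be >= 1")
    for name in THREAD_ENV_VARS:
        os.environ[name] = str(n_threads)
    # Return what was set so the caller can log it with the job output.
    return {name: os.environ[name] for name in THREAD_ENV_VARS}


limits = limit_math_threads(1)
# Import numpy/pandas only after this point so the limits can take effect.
```

The same variables can be exported in the job submission script instead, which avoids touching the Python code at all.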
