Parallelizing ashlar #191

Open

ahnsws opened this issue Apr 27, 2023 · 6 comments

Comments


ahnsws commented Apr 27, 2023

Hello, I wanted to bring up the performance of ashlar. We use ashlar heavily in our image pre-processing pipeline, but it has been pretty slow because it runs single-threaded, understandably so given memory concerns. I took a stab at parallelizing ashlar in this gist, without the pyramid step:

https://gist.github.com/ahnsws/b82ed163c773c5d841585e182825f472

I used viztracer to identify slow loops and wrapped them in a ThreadPoolExecutor context. I did have to place a lock around the reader, since unsynchronized access gave bad results, as expected with multithreading. With the lock, images stitched with single-threaded and parallelized ashlar agree exactly.
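
For anyone curious, here is a minimal, self-contained sketch of the pattern (not the gist code itself; DummyReader and the max-blend paste are stand-ins for ashlar's actual reader and merge logic):

```python
import threading
from concurrent.futures import ThreadPoolExecutor

import numpy as np

class DummyReader:
    """Stand-in for an ashlar reader; assumed not to be thread-safe."""
    def read(self, tile_index):
        return np.full((128, 128), tile_index, dtype=np.uint16)

def stitch_channel(reader, positions, tile_shape, n_workers=20):
    h, w = tile_shape
    mosaic = np.zeros((max(y for y, _ in positions) + h,
                       max(x for _, x in positions) + w), dtype=np.uint16)
    lock = threading.Lock()

    def paste(i):
        # Reads go through the shared reader, so serialize them with the lock;
        # pasting into (mostly disjoint) regions of the mosaic can overlap.
        with lock:
            tile = reader.read(i)
        y, x = positions[i]
        mosaic[y:y + h, x:x + w] = np.maximum(mosaic[y:y + h, x:x + w], tile)

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        list(pool.map(paste, range(len(positions))))
    return mosaic

# Example: three fake tiles at known positions.
mosaic = stitch_channel(DummyReader(), [(0, 0), (0, 100), (100, 0)], (128, 128))
```

The key point is that only the reader access needs the lock; the per-tile pasting and blending is where the parallel speedup comes from.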

For three rounds of 287 tiles, each with six channels of CyCIF data, single-threaded ashlar takes around 36 minutes, but the parallelized version in the gist runs in three minutes (using 20 cores on our workstation). Since the merging step dominates, I expect the runtime to scale roughly with ceil(n_channels / n_cores). I've also used tqdm to show progress bars in verbose mode:

[Screenshot from 2023-04-27: tqdm progress bars in verbose mode]

There is a tradeoff here between compute time and memory usage, but because RAM is more plentiful nowadays we can speed things up by quite a bit, at least for our use case.

Thanks,
Seb


ahnsws commented Apr 28, 2023

I used dask distributed to profile the program, and on my machine (20 cores, 128 GB) memory consumption peaks at around 14 GiB, which makes sense since each channel ends up being about 900 MB. I can also try including more channels than the number of available cores.
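
For reference, a rough sketch of how a profile like this can be captured, assuming the per-channel work is submitted to a local dask.distributed Client (merge_channel is a hypothetical placeholder for the real per-channel stitching work):

```python
from dask.distributed import Client, performance_report

def merge_channel(channel_index):
    # Hypothetical placeholder for the real per-channel stitch/merge work.
    return channel_index

if __name__ == "__main__":
    # One worker with many threads keeps the shared-memory threading model.
    client = Client(n_workers=1, threads_per_worker=20, memory_limit="120GB")
    with performance_report(filename="ashlar-profile.html"):
        futures = client.map(merge_channel, range(18))
        client.gather(futures)
    client.close()
```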

[Screenshot from 2023-04-28: dask memory/task profile, peaking around 14 GiB]


ahnsws commented Apr 28, 2023

With six rounds (35 channels in total), each with 287 tiles, we see the following:

[Screenshot from 2023-04-28: memory profile for six rounds, 35 channels]

I'm not sure why the memory usage curve is logistic-like compared to before, but previously stitched channels appear to be garbage-collected properly, and the total memory usage seems reasonable for modern computing environments.

@josenimo

Hey @ahnsws,

Do you think this parallelization would work for HPC runs?
I am a newbie, but I am trying to improve run times for large whole-slide images (WSI).
Thanks in advance


ahnsws commented Jun 20, 2023

Hi @josenimo, it's hard to say without knowing more about your pipeline. Have you tried profiling the steps in your pipeline and identifying bottlenecks? If you can trade off computation time against memory on your cluster, the parallelization should work.

@josenimo

@ahnsws, it is just mcmicro with a few small changes, see josenimo/mcmicro.
ASHLAR is a bottleneck time-wise; I know this from running various datasets.
(1) Illumination takes about 2 hours per cycle, but executor = "sge" in nextflow.config parallelizes this across different nodes wonderfully.
(2) ASHLAR takes 7 to 12 hours, with very little RAM and just one core.
(3) Coreograph takes ~2 hours.
(4) Segmentation and quantification take about 25 minutes per tissue core in total, and executor = "sge" takes care of parallelizing each tissue core into a different segmentation job.

Theoretically, for this parallelization I would just request more cores specifically for the ASHLAR step. I think asking for less time and more cores is a great tradeoff with our current HPC solution.
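
A minimal sketch of what that per-process request might look like in nextflow.config; the process selector name ('ashlar') and the resource values are assumptions and would need to match the actual mcmicro process names:

```groovy
// Hypothetical nextflow.config snippet: keep SGE as the executor, but request
// more CPUs (and less wall time) only for the registration step. The process
// name 'ashlar' is an assumption; check the real mcmicro process names first.
process {
    executor = 'sge'

    withName: 'ashlar' {
        cpus   = 20
        memory = '64 GB'
        time   = '2h'
    }
}
```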

I will perform some tests later :)


mdposkus commented Aug 8, 2024

Hi @ahnsws,

Thank you for contributing this file! ASHLAR is also the longest step of MCMICRO for me, so this implementation would be incredibly helpful. I'm wondering what the workflow is for using the gist file to run ASHLAR. Should it be downloaded separately and called from the command line? Can it be integrated into MCMICRO?

Thank you.
