Cache: reduce sleep time #504

matthiasdiener · 2021-07-15T15:43:49Z

pyopencl/cache.py

inducer · 2021-07-22T11:27:10Z

pyopencl/cache.py


                attempts += 1

-                if attempts > 10:
+                if attempts % (10/wait_time_seconds) == 0:


That won't work reliably. The `/` operator unconditionally produces a floating-point result, and comparing floating-point numbers to zero is unreliable.

4d13b82 should fix this
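
To illustrate the review comment above, here is a minimal sketch of one way to avoid the floating-point modulo. It is not taken from commit 4d13b82; the helper names `attempts_between_warnings` and `should_warn` are hypothetical, and the 10-second warning interval is assumed from the diff.

```python
def attempts_between_warnings(wait_time_seconds, warn_interval_seconds=10):
    """Return how many polling attempts fit in one warning interval, as an int."""
    # round() converts the float quotient to an int; max() guards against 0.
    return max(1, round(warn_interval_seconds / wait_time_seconds))


def should_warn(attempts, wait_time_seconds):
    # Integer modulo is exact, unlike `attempts % (10 / wait_time_seconds) == 0`.
    return attempts % attempts_between_warnings(wait_time_seconds) == 0


# With a wait time that does not divide 10 evenly, the float test never fires
# for any of these attempt counts, while the integer test fires regularly.
float_hits = [n for n in range(1, 1000) if n % (10 / 0.3) == 0]
int_hits = [n for n in range(1, 1000) if should_warn(n, 0.3)]
```

Here `float_hits` stays empty because `10 / 0.3` is not an integer, so no attempt count in that range is an exact multiple of it, whereas `int_hits` contains every 33rd attempt.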

yxliang01 · 2021-07-23T10:37:33Z

Actually, what about not polling at all, i.e. no sleep? There are much more efficient system calls available, such as inotify, and the Python package watchdog seems to be of good quality (I have never used it myself, though) and would do this cross-platform.

inducer · 2021-07-23T12:49:52Z

Thanks for the suggestion. IMO, anytime there is contention for this lock, you're running your program poorly. (I.e. this should always be avoidable, e.g. by setting a private cache folder.) As such, I'm not very inclined to optimize that case.

matthiasdiener · 2021-12-10T19:50:40Z

A bit of background on this PR: We keep seeing this timeout a lot when running multiple MPI ranks on a single node, a situation in which it is difficult to set private cache folders (especially when running interactively).
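
For readers wondering what a private cache folder per rank might look like, the following is a rough sketch under stated assumptions: the helper `per_rank_cache_dir` and the directory name `pyopencl-cache-example` are hypothetical, and the rank is read from the launcher environment variables `OMPI_COMM_WORLD_RANK` (Open MPI) or `PMI_RANK` (MPICH-style launchers), which may not be set in every setup.

```python
import os
import tempfile


def per_rank_cache_dir(base_dir=None):
    """Return (and create) a cache directory unique to this MPI rank."""
    # The launcher usually exports the rank; fall back to the process ID
    # when neither variable is present (e.g. when not run under MPI).
    rank = (os.environ.get("OMPI_COMM_WORLD_RANK")
            or os.environ.get("PMI_RANK")
            or f"pid{os.getpid()}")
    if base_dir is None:
        # Hypothetical location chosen for this example only.
        base_dir = os.path.join(tempfile.gettempdir(), "pyopencl-cache-example")
    cache_dir = os.path.join(base_dir, f"rank-{rank}")
    os.makedirs(cache_dir, exist_ok=True)
    return cache_dir
```

Assuming the program build call accepts a `cache_dir` argument, the returned path could be passed there so that ranks on the same node no longer contend for one lock. As the comment above notes, arranging this is awkward in interactive runs, which is the motivation for shortening the sleep time instead.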

inducer · 2021-12-12T19:02:16Z

pyopencl/cache.py

@@ -88,18 +88,26 @@ def __init__(self, cleanup_m, cache_dir):
                except OSError:
                    pass

+                wait_time_seconds = 0.05


What's the motivation for this change? It seems really short. If you were running this (by mistake, say) on 1000 nodes sharing a cache directory, I think the resulting thrashing would not be a good thing.

The specific value was taken from https://github.com/tox-dev/py-filelock/blob/a6c8fabc4192fa7a4ae19b1875ee842ec5eb4f61/src/filelock/_api.py#L113. I can't really predict what the result of this with thousands of nodes would be, but we already keep on hitting the 1 second timeout when running with 2 ranks on a single node, which I guess indicates that the existing polling time would cause minutes-long waiting times when running with thousands of ranks.

I can see that argument. Could you add a comment justifying this choice of time-out value? (making reference to the single-node interactive "test" use case)

Added in 387011e
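
The trade-off discussed in this thread can be sketched as a polling loop with a short sleep, a periodic warning, and an overall timeout. This is an illustrative sketch, not the code in pyopencl/cache.py: the function names `acquire_lock` and `release_lock`, the 30-second timeout, and the use of `time.monotonic()` for the warning interval are assumptions; only the 0.05-second wait comes from the diff.

```python
import os
import time
import warnings


def acquire_lock(lock_file, wait_time_seconds=0.05, timeout_seconds=30.0,
                 warn_interval_seconds=10.0):
    """Poll until lock_file can be created exclusively; return its descriptor."""
    start = time.monotonic()
    next_warning = start + warn_interval_seconds
    while True:
        try:
            # O_EXCL makes creation fail if the file already exists.
            return os.open(lock_file, os.O_CREAT | os.O_WRONLY | os.O_EXCL)
        except FileExistsError:
            pass

        now = time.monotonic()
        if now - start > timeout_seconds:
            raise RuntimeError(f"timed out waiting for lock {lock_file!r}")
        if now >= next_warning:
            # Time-based check, so no float modulo on the attempt counter.
            warnings.warn(f"still waiting for lock {lock_file!r}")
            next_warning += warn_interval_seconds
        time.sleep(wait_time_seconds)


def release_lock(fd, lock_file):
    """Close the descriptor and remove the lock file so others can proceed."""
    os.close(fd)
    os.unlink(lock_file)
```

A shorter `wait_time_seconds` lets a waiting rank notice a released lock sooner, at the cost of more file-system operations per second when many processes share one cache directory.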

matthiasdiener added 2 commits July 15, 2021 10:43

Cache: reduce sleep time

48d41a3

Update cache.py

aa708a2

inducer reviewed Jul 15, 2021

print warnings fix

8ab383a

matthiasdiener requested a review from inducer July 21, 2021 16:29

inducer reviewed Jul 22, 2021

matthiasdiener added 2 commits December 10, 2021 12:03

Merge branch 'main' into patch-1

e08031d

restructure

4d13b82

matthiasdiener requested a review from inducer December 10, 2021 18:18

inducer reviewed Dec 12, 2021

matthiasdiener added 2 commits December 13, 2021 05:53

Merge branch 'main' into patch-1

2a9212a

add comment about our choice for the timeout

387011e

matthiasdiener requested a review from inducer December 13, 2021 12:18

inducer merged commit 7204c0a into inducer:main Dec 13, 2021

matthiasdiener deleted the patch-1 branch December 13, 2021 20:35

matthiasdiener mentioned this pull request Feb 25, 2022

restructure lock file timeout inducer/pytools#121

Merged
