Add option to drop disk caches before running a benchmark #134
Conversation
R/run.R
@@ -441,10 +451,12 @@ global_setup <- function(lib_path = NULL, cpu_count = NULL, mem_alloc = NULL,
 #' @param name Benchmark name, i.e. `bm$name`
 #' @param params Named list of parameters for the individual run, i.e. the case
 #' @param cpu_count Number of CPUs allocated
+#' @param drop_caches Attempt to drop the disk cache before each iteration.
I have the [very weak] opinion that we should provide the multiple choice of not dropping, dropping between cases, or dropping between iterations. I really don't know enough about most benchmarks to know which one would produce the most insightful results, and at least then we'll be able to try something for a while and react later.
Unless this would take a lot of time to implement (in which case we could add support when it's necessary).
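For concreteness, a minimal sketch of what that multiple-choice interface might look like in R. The function name and the TRUE-means-iteration fallback here are illustrative assumptions, not this PR's actual API:

```r
# Sketch: normalize a three-way drop_caches setting.
# NULL/FALSE mean "don't drop"; otherwise it must be "case" or "iteration".
validate_drop_caches <- function(drop_caches = NULL) {
  if (is.null(drop_caches) || isFALSE(drop_caches)) {
    return(NULL)
  }
  if (isTRUE(drop_caches)) {
    # Illustrative back-compat choice: treat bare TRUE as per-iteration dropping
    return("iteration")
  }
  match.arg(drop_caches, choices = c("case", "iteration"))
}
```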
@@ -222,7 +223,7 @@ run_one <- function(bm,
   all_params <- list(...)

   # separate the global parameters, and make sure only those that are specified remain
-  global_param_names <- c("lib_path", "cpu_count", "mem_alloc")
+  global_param_names <- c("lib_path", "cpu_count", "mem_alloc", "drop_caches")
> it doesn't seem like it's worth breaking [histories] now

Yeah, once we do conbench/conbench#565 then we can be a little more liberal with breaking histories.
Yeah, this kind of use case is one of the biggest for that ticket. It sounds ridiculously liberal, but in cases like this it will help us get out of the trap of "Well we can't update this code cause it'll break history and we need a transition there and and and"
I think we should implement this at both the case and the iteration level (though it can be exclusive to one layer or the other, or off entirely; I can't foresee a circumstance where we'll need to clear the cache both at the case level and at the iteration level!). For today we want to mimic what we are already running (not just for history, but also because there is benefit in matching what Python is doing, and it defers having to figure out the answer to warmup-run differences).

In the (hopefully near-term!) future, I would love to toggle this (in both R and Python) to the case level, experiment with a few runs, and investigate what would be needed to support running these like this generally. I suspect this will include some expansion of what we're measuring (mean, min, max, median, cf. #640). That's great, and we should do it at some point; but having the option to run a real experiment with cache clearing only at the case level, analyze the data, and make test cases for changes or expansion to our metrics would be really fantastic (as opposed to needing to implement all of those metrics right now as quickly as possible to unblock this).
I was under the impression that we did drop caches for both Python and R. Looking at labs/benchmarks and the JSON there (https://github.com/voltrondata-labs/benchmarks/blob/5ea34d76951be9a323683344c5233310eb867908/benchmarks.json#L9), that should trigger for Python here: https://github.com/conbench/conbench/blob/175bc404b2f39f1518efef8e33a20848b4c4bac5/conbench/runner.py#L350 and for R here: https://github.com/voltrondata-labs/benchmarks/blob/5ea34d76951be9a323683344c5233310eb867908/benchmarks/_benchmark.py#L253. That this is confusing (and spans as much of our stack as it does!) is exactly the kind of cleanup I'm looking forward to with our use-the-arrowbench-runner-to-run-arrowbench project, which this is part of 😄
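All three of those paths bottom out in the same Linux mechanism: `sync`, then writing `3` to `/proc/sys/vm/drop_caches`. A hedged R translation of the attempt-once-and-remember-failure behavior this PR describes might look like the following; the option name is made up here, and this assumes passwordless sudo on Linux:

```r
# Sketch: try to drop the Linux page/dentry/inode caches; on failure,
# record that in an option so subsequent calls are no-ops rather than retries.
try_drop_caches <- function() {
  if (isTRUE(getOption("arrowbench.drop_caches_failed"))) {
    return(invisible(FALSE))
  }
  status <- system("sync && echo 3 | sudo -n tee /proc/sys/vm/drop_caches > /dev/null")
  if (!identical(status, 0L)) {
    # Don't error, and don't try again this session
    options(arrowbench.drop_caches_failed = TRUE)
    return(invisible(FALSE))
  }
  invisible(TRUE)
}
```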
@@ -17,7 +17,7 @@
 #' a range of parameter combinations is handled by the runner, not the functions
 #' in the benchmark object.
 #'
-#' @section Parametrizing benchmarks:
+#' @section Parameterizing benchmarks:
❤️
Apparently "parametrizing" is allowable but more British? And I'd already written "parameterizing" elsewhere, so I changed this for consistency.
LOL I just assumed I misspelled it 🙈
> Yeah, this kind of use case is one of the biggest for that ticket. It sounds ridiculously liberal, but in cases like this it will help us get out of the trap of "Well we can't update this code cause it'll break history and we need a transition there and and and"
Oh interesting. And that makes me realize that even though we're using …
I think for now I can switch it from logical to categorical to cover all three cases (no clearing, at the case level, at the iteration level) reasonably easily. Longer-term, I'm increasingly convinced cache clearing should be part of the benchmark, i.e. if you want it, call (parameterized, if you like) …
Yeah, that sounds right. In an ideal world we would supply a micro-binary that just does this cleaning, has no install drama, and just works (so I think it simply cannot be Python, unfortunately). But for now copy/pasting these shell commands isn't so bad. Parameterizing this will allow us to do the testing we need to gain confidence in what the right setting is (for our benchmarks). My heart + gut says it should be dropped only per case, and that we should do good + smart things about the inter-iteration differences that might cause.
To be clear, I'm advocating still keeping a …
*nods* yeah, if that's the case let's make something that doesn't have the drama needed to get it working (anything we've tried so far with Python ends up having installation scripts/functions that are longer than the body of the copy-pastaed version!). I'm absolutely no fan of copy-pasta, but we can't keep bashing our heads on install drama that blocks us from getting done what we need to. But also, once we have this here, we'll be fine for a little while with Python and R sharing the same thing. We can defer building the cache dropper in Go or Rust or something that gives us better isolation + easy distributability for a bit (and possibly forever, depending on whether what we're benchmarking needs that, or whether it would even be helpful at all).
Ok, refactored so …
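A sketch of how the refactored three-way setting could dispatch inside a simplified run loop (function names are placeholders, reusing the `try_drop_caches()` sketch above; the real runner's loop is more involved):

```r
# Sketch: "case" drops once before all iterations (including warmups);
# "iteration" drops before every repetition; NULL never drops.
run_case <- function(case_fun, n_iter = 3L, drop_caches = NULL) {
  if (identical(drop_caches, "case")) {
    try_drop_caches()
  }
  lapply(seq_len(n_iter), function(i) {
    if (identical(drop_caches, "iteration")) {
      try_drop_caches()
    }
    system.time(case_fun())
  })
}
```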
LGTM
@@ -441,10 +460,13 @@ global_setup <- function(lib_path = NULL, cpu_count = NULL, mem_alloc = NULL,
 #' @param name Benchmark name, i.e. `bm$name`
 #' @param params Named list of parameters for the individual run, i.e. the case
 #' @param cpu_count Number of CPUs allocated
+#' @param drop_caches Attempt to drop the disk cache before each case or iteration.
Is it worth inheriting this from `measure`? Or the other way around?
I ended up making the `measure` one a bit more precise, since the values have different implications there (i.e. `"iteration"` drops once, and both `"case"` and `NULL` result in no dropping).
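In other words, at the single-measurement level the semantics collapse to this (a sketch of the behavior described, not the actual `measure()` internals):

```r
# Sketch: at the single-measurement level, only "iteration" triggers a drop;
# "case" is handled one level up in the runner, so here it behaves like NULL.
measure_one <- function(fun, drop_caches = NULL) {
  if (identical(drop_caches, "iteration")) {
    try_drop_caches()
  }
  system.time(fun())
}
```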
Alright, ensured global options like … Also discovered and patched a related bug breaking …
Closes #126. Implementation is based on that from the conbench runner, meaning it will attempt to drop disk caches, but if it fails it will (a) not error and (b) not try again (in Python, by setting an attribute, which I translated to R as an `option()`).

Resetting is in `run_one`, so if `drop_caches` is set to `TRUE`, they will be cleared once per case (not per benchmark-code or iteration). This is different from the conbench runner, which drops caches in each iteration. I can change this here if we like (though it will look more like how we handle profiling and less like other global options), but from previous discussion this seems like a thing we haven't quite figured out the best way to handle yet, and per-case is very much on the table so we could ignore warmup iterations. If we want to enable both and move beyond boolean, we could do that too. Opinions welcome!

The `drop_caches` global option is inserted in `result.optional_benchmark_info.options`, but not in `result.tags`, so this will not break any histories. Arguably it should, but presently `cpu_count` is the only global option that is (although the way we're running right now it's always null anyway); `lib_path` and `mem_alloc` are not. Because we're not actually dropping caches anywhere (Python, R), it doesn't seem like it's worth breaking this now. If we want to set any of these global options differently (or run more than one value), we should make a separate story to refactor. Or maybe just make them not-global parameters of the particular benchmarks where they matter; `options` and `global_options` are only runner things, not Conbench things.
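A usage sketch of the behavior described above, assuming the `run_one` signature shown in the hunks earlier (the `placebo` benchmark name is illustrative; consult the package for the actual benchmark objects):

```r
# Sketch: run a single benchmark with cache dropping enabled. With the
# boolean interface described in this PR, caches are cleared once per case.
library(arrowbench)
run_one(placebo, drop_caches = TRUE)
```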