Allow benchmarks with runners beyond conbench CLI #99
Conversation
very cool!
buildkite/benchmark/run.py
Outdated
r_command = f"""
bm_df <- arrowbench::get_package_benchmarks();
arrowbench::run(
  bm_df[bm_df$name %in% c({str(bm_names)[1:-1]}), ],
haha, wow
Yeah, this is a janky way to turn a list of strings into a comma-space-separated string with each initial string quoted (e.g. `['a', 'b']` -> `"'a', 'b'"`). It could be `", ".join([f"'{x}'" for x in ['a', 'b']])`, but I'm not sure that's more readable. I miss R's `toString()`.
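For reference, a runnable sketch of both versions side by side (the benchmark names here are made up for illustration):

bm_names = ["placebo", "csv_read"]  # made-up names, just for the example

# The repr-slicing approach: take the list's repr and chop off the brackets.
quoted = str(bm_names)[1:-1]  # "'placebo', 'csv_read'"

# The join-based alternative mentioned above.
quoted_alt = ", ".join(f"'{x}'" for x in bm_names)

assert quoted == quoted_alt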
    exit_on_failure=False,
)

# leave tempfile for validation if mock run
wait, doesn't this delete the tempfile if it's a mock run?
Wait, yeah, I'm missing a `not`.
...how did my tests pass?
Ok, this went a bit weird and I can't fully explain why, but I'm switching to `not all(bg.mock ...)` because it works.
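For clarity, a minimal sketch of the cleanup logic as described above (the attribute and variable names are hypothetical, not the actual ones in run.py):

import os

def maybe_cleanup_tempfile(tempfile_path, benchmark_groups):
    # Leave the tempfile in place for validation if this is a mock run;
    # delete it only when not every group is a mock.
    if not all(bg.mock for bg in benchmark_groups):
        os.remove(tempfile_path)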
Do you mean "will require a lot of alterations to run here" for the current set of python macrobenchmarks? Or that we'll still have a large chunk of work ahead of us to be able to run our arrowbench-based benchmarks this way too?
Looking good — a few minor comments
buildkite/benchmark/run.py
Outdated
],
},
"env_vars": {
    "PYTHONFAULTHANDLER": "1",  # makes it easy to debug segmentation faults
Do we need this here? Or does it get passed all the way down to the adapter and we want those seg faults bubbled up?
This is left over from when I copied it over from the labs/benchmarks one. I think it might get passed through all the way to adapters, but if it matters at that level I'd rather set it at that level (as doing so is easy). I'll remove this.
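For the record, if it does end up mattering at the adapter level, the standard-library equivalent of `PYTHONFAULTHANDLER=1` can be enabled in-process there:

import faulthandler

# Same effect as PYTHONFAULTHANDLER=1, but set inside the adapter process itself:
# dump Python tracebacks on fatal signals like SIGSEGV instead of dying silently.
faulthandler.enable()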
buildkite/benchmark/run.py
Outdated
"BENCHMARKS_DATA_DIR": f"{os.getenv('HOME')}/data", # allows to avoid loading Python and R benchmarks input data from s3 for every build | ||
"ARROWBENCH_DATA_DIR": f"{os.getenv('HOME')}/data", # allows to avoid loading R benchmarks input data from s3 for every build |
If this is for arrowbench we only need one of these (`ARROWBENCH_DATA_DIR`), right?
Yep, removing
buildkite/benchmark/run.py
Outdated
"BENCHMARKS_DATA_DIR": f"{os.getenv('HOME')}/data", # allows to avoid loading Python and R benchmarks input data from s3 for every build | ||
"ARROWBENCH_DATA_DIR": f"{os.getenv('HOME')}/data", # allows to avoid loading R benchmarks input data from s3 for every build |
This chunk runs all of the benchmarks together, so we need them together, yeah? Does this run the arrowbench ones a second time?
Or I guess taking a step back: what's this code chunk helpful for / doing here?
The env vars here are vestigial and unneeded (as of now).
This dict is here because adapters will live in this repo (see how `root` on line 80 is `arrow-benchmarks-ci/adapters`). We could stick them in a different repo if we want; it doesn't really matter. As of now the only one there is a mock for testing, but eventually at least the C++ and Go ones should live there.
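Purely for illustration, an entry in that dict might look roughly like this (the key names and URL are guesses, not the actual schema):

# Hypothetical shape of an adapters entry; the real keys and values may differ.
adapters_repo = {
    "repo": "https://github.com/<org>/arrow-benchmarks-ci",  # placeholder URL
    "root": "arrow-benchmarks-ci/adapters",  # where the adapter scripts live, per the comment above
    "branch": "main",
}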
# NOTE: `bm.command` is the raw arrowbench name; `bm.name` is f"arrowbench/{bm.command}"
# to disambiguate from labs/benchmarks versions when filtering.
Is this distinction something we plan on coming back and getting rid of once we've transitioned this fully? If so a TODO might be nice.
At first I thought so, but after implementation, I'd rather keep it namespaced. Even if we want to refactor the filters to include repo, we really do want benchmarks to be namespaced by repo in some way, as a lot of languages are going to have a "read file" benchmark (etc.), and relying on all the repos in question to avoid name clashes with each other is not a good way to manage that.
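A tiny sketch of that namespacing, just restating the NOTE above in code (the helper name and the example benchmark are made up):

def namespaced_name(command, repo="arrowbench"):
    # `command` is the raw benchmark name within its repo; prefixing with the repo
    # keeps e.g. a "file-read" benchmark in one repo distinct from a same-named
    # benchmark in another repo when filtering.
    return f"{repo}/{command}"

assert namespaced_name("file-read") == "arrowbench/file-read"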
It was written intending to apply to adapter benchmarks like C++ (mostly; it's still getting metadata through the legacy runner) and Go (a bit), but not arrowbench or Python benchmarks. arrowbench should be (🤞) good to go shortly (though we will have to clean up ...)
Force-pushed from feaf9cc to 681a8d5
Force-pushed from 086025f to 4fcd199
Ok! I got a working test build running: https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-arm64-m6g-linux-compute/builds/2224#01867bb0-78ba-4337-bbd3-b4815be4a7c1

One note on that build is that running through arrowbench directly logs a lot less; in particular it doesn't log the results. If we want those there, I think the right way to do so is to change the log level for benchclients in benchconnect to DEBUG, which will show all the JSON sent in requests.

TODO: Revert ...
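Assuming benchclients logs through the stdlib logging module under a "benchclients" logger name (an assumption on my part, not something confirmed here), that bump would amount to roughly:

import logging

# Assumption: benchclients uses a stdlib logger named "benchclients"; at DEBUG
# level it would include the full JSON bodies sent in requests.
logging.getLogger("benchclients").setLevel(logging.DEBUG)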
buildkite/benchmark/run.py
Outdated
arrowbench::install_pipx();
arrowbench::install_benchconnect();
Added. I'm not sure if this is the best way to install benchconnect in this context—there are already a ton of environments in this repo, and we could just add it to the relevant one, if we could figure out which that is—but this works.
If we care about speed, it's probably worth switching from conda to mamba (or at least the mamba solver); we're currently repeatedly solving environments, and it's not fast.
> If we care about speed, it's probably worth switching from conda to mamba (or at least the mamba solver); we're currently repeatedly solving environments, and it's not fast.
Would you mind making an issue in this repo for that? Agreed that's probably the right way (but we also probably shouldn't do this just yet unless it's super quick + easy)
if not Path(self.root).exists():
    self.executor.execute_command(f"git clone {self.repo}")

self.executor.execute_command(
    f"git fetch && git checkout {self.branch}", self.root
)
Made cloning conditional, because it was failing when labs/benchmarks and arrowbench both cloned arrowbench, specifically by complaining the directory already exists. We could `rm -rf` it and re-clone, but given it then fetches anyway, I'm pretty sure it'd be duplicative.
IMO we should log those requests — I'm fine bumping that logging level like you suggest. Normally I would say something like that is overkill, but for a variety of other systems reasons I actually end up going to those logged output values pretty frequently (e.g. when we were hitting a bunch of 50xs recently, I went to the buildkite logs to get some ids from there instead of needing to get them through the list api that was 50xing). At some point we will probably get to a point where it's not needed anymore, but for now, as we're iterating on this system (and in other places), it's quite nice to have them.
This looks good — I'm fine to merge it. @ElenaHenderson if you had a sec to look through I would be super grateful as always (or if you're busy or would rather: we can merge and then post-hoc review).
conbench/conbench#775 (will be quick)
@@ -429,7 +429,6 @@ def run_benchmark_groups(self, benchmark_groups: List[BenchmarkGroup]) -> None:
    n_iter = 3L,
    drop_caches = TRUE,
    publish = TRUE,
    run_id = '{os.getenv('RUN_ID')}',
So unfortunately this isn't going to work. The passing is fine, but the run ID it generates isn't, because the way abci generates run IDs is tied to the buildkite build, not the `Repo` the benchmarks are in, so arrowbench fails because benchconnect tries to start and finish a run that other `Repo`s are using. We could generate an ID here and pass it through, though that's no different than letting arrowbench (really benchconnect) handle it. I've made #104 to come back and change this; this is going to keep biting us if we want to, say, close a run for an adapter with why it failed.
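For reference, the "generate an ID here and pass it through" option would amount to something like the following (hypothetical wiring; #104 tracks handling this properly):

import uuid

# Hypothetical: give each benchmark group its own run ID rather than reusing the
# Buildkite-build-level one, so benchconnect can open and close its run without
# colliding with runs other repos are still posting results to.
run_id = uuid.uuid4().hex
r_run_id_line = f"run_id = '{run_id}',"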
Ok! So! Things are working for real now; before it looked like there was a logging problem, but it was really installation and run ID uniqueness problems (which are now fixed). Build with full logging running labs/benchmarks, arrowbench, and adapters benchmarks: https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-arm64-m6g-linux-compute/builds/2265#01869a5a-db70-4c2e-94a2-f6048538d346
Closes #97.
Relaxes the assumption that all benchmarks will be run with the `conbench` CLI. Also relaxes the assumption that a benchmarkable repo will always have only one benchmarks repo. Adds changes to allow running via arrowbench or adapters. Notably does not yet turn on arrowbench (or adapter) benchmarks by changing `config.py`, but that change is all that should be required to do so. There are some differences between running via arrowbench instead of labs/benchmarks, though, specifically cache clearing (which voltrondata-labs/arrowbench#126 will fix) and `tags["cpu_count"]`, which labs/benchmarks does not populate but arrowbench does, so we'll need to clean up the db to unbreak histories. Full details: voltrondata-labs/benchmarks#130

Generally tries to change as little as possible so existing things keep working.
Specific changes:
- `run.py`:
  - Factors a `BenchmarkGroupsRunner` class out of the `Run` class. Moves the existing functionality to a `ConbenchBenchmarkGroupsRunner`, but also implements `ArrowbenchBenchmarkGroupsRunner` and `AdapterBenchmarkGroupsRunner` (a rough sketch of this layout follows after the list).
  - A `BenchmarkGroup` will not represent each neatly in the db anymore.
  - The arrowbench and adapter runners are not passed a `run_id`, because they will open and close the run, and other results attempting to submit to the same run will fail to post.
  - Keeps `command` and adds an optional `name` so it does not need to be parsed out of `command`. Documents the schema. All new fields are optional and everything defaults to the conbench runner, so existing metadata will continue to function without changes.
  - Factors a `CommandExecutor` class out of `Run` so it can be injected into the groups runner class.
- `repos_with_benchmark_groups`:
  - Adds an `adapters` directory with `requirements.txt`, `benchmarks.json` metadata, and `mock-adapter.py`. For now, this is all just the stuff we'll need to run real adapters here, but no real benchmarks, as all of those will require a lot of alterations to run here. The new stuff here will likely need some light tweaks (e.g. ensuring dependencies get installed correctly), but this structure shows we can run adapters from here.
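A rough sketch of the class layout described above; aside from `run_benchmark_groups` (whose signature appears in the diff), the method names and responsibilities here are illustrative, not the actual implementation:

from typing import List


class CommandExecutor:
    """Factored out of Run so it can be injected into the groups runner classes."""

    def execute_command(self, command, path="."):
        ...  # subprocess handling lives here in the real class


class BenchmarkGroup:
    """Placeholder for the existing metadata object (command, optional name, etc.)."""


class BenchmarkGroupsRunner:
    """Base class: runs a list of BenchmarkGroups for one benchmarks repo."""

    def __init__(self, executor: CommandExecutor):
        self.executor = executor

    def run_benchmark_groups(self, benchmark_groups: List[BenchmarkGroup]) -> None:
        raise NotImplementedError


class ConbenchBenchmarkGroupsRunner(BenchmarkGroupsRunner):
    """The pre-existing behavior: run each group via the conbench CLI."""


class ArrowbenchBenchmarkGroupsRunner(BenchmarkGroupsRunner):
    """Runs groups by calling arrowbench::run() in R, publishing via benchconnect."""


class AdapterBenchmarkGroupsRunner(BenchmarkGroupsRunner):
    """Runs groups by invoking adapter scripts such as mock-adapter.py."""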