Allow benchmarks with runners beyond conbench CLI #99

Merged: 27 commits into main from edward/direct-running on Mar 1, 2023

Conversation

@alistaire47 (Contributor) commented Feb 14, 2023

Closes #97.

Relaxes the assumption that all benchmarks will be run with the conbench CLI. Also relaxes the assumption that a benchmarkable repo will always have only one benchmarks repo. Adds changes to allow running via arrowbench or adapters.

Notably, this does not yet turn on arrowbench (or adapter) benchmarks by changing config.py, but that change is all that should be required to do so. There are some differences between running via arrowbench instead of labs/benchmarks, though: cache clearing (which voltrondata-labs/arrowbench#126 will fix) and tags["cpu_count"], which labs/benchmarks does not populate but arrowbench does, so we'll need to clean up the db to unbreak histories. Full details: voltrondata-labs/benchmarks#130

Generally tries to change as little as possible so existing things keep working.

Specific changes:

  1. Changes run.py:
    1. Separates an abstract BenchmarkGroupsRunner class out of the Run class (see the sketch after this list). Moves the existing functionality to a ConbenchBenchmarkGroupsRunner, but also implements ArrowbenchBenchmarkGroupsRunner and AdapterBenchmarkGroupsRunner.
      1. arrowbench running notably calls arrowbench once for all benchmarks instead of once each, which allows arrowbench to manage the run. This has two implications:
        1. The memory monitoring in BenchmarkGroup will not represent each benchmark neatly in the db anymore.
        2. arrowbench benchmarks cannot be run in the same run as other benchmarks: they need their own new run_id because they open and close the run, so other results attempting to submit to the same run would fail to post.
    2. Allows runner to be switched by benchmark JSON (the one that lives in the benchmark repo). Also loosens assumptions about command and adds optional name so it does not need to be parsed out of command. Documents schema. All new fields are optional and everything defaults to the conbench runner, so existing metadata will continue to function without changes.
    3. Separates CommandExecutor class out of Run so it can be injected into the groups runner class.
    4. Adds arrowbench and the new adapters dir (see 2) to repos_with_benchmark_groups.
  2. Adds a top-level adapters directory with requirements.txt, benchmarks.json metadata, and mock-adapter.py. For now, this is all just the stuff we'll need to run real adapters here, but no real benchmarks, as all of those will require a lot of alterations to run here. The new stuff here will likely need some light tweaks (e.g. ensuring dependencies get installed correctly), but this structure shows we can run adapters from here.
  3. Adds tests for arrowbench and adapter running. The adapter one will need small changes to run from main instead of this branch, but is currently set up so CI here should work.
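
A minimal sketch of the class separation in 1.i. The class names, run_benchmark_groups, and execute_command appear in this PR; the method bodies and the BenchmarkGroup/CommandExecutor interfaces below are illustrative assumptions, not the actual run.py code.

```python
from abc import ABC, abstractmethod
from typing import List


class BenchmarkGroupsRunner(ABC):
    """Abstract runner: executes a list of BenchmarkGroups and reports results."""

    def __init__(self, executor: "CommandExecutor") -> None:
        self.executor = executor  # injected so tests can substitute a mock executor

    @abstractmethod
    def run_benchmark_groups(self, benchmark_groups: List["BenchmarkGroup"]) -> None:
        ...


class ConbenchBenchmarkGroupsRunner(BenchmarkGroupsRunner):
    def run_benchmark_groups(self, benchmark_groups):
        # existing behavior: invoke the conbench CLI once per benchmark group
        for bg in benchmark_groups:
            self.executor.execute_command(bg.command)


class ArrowbenchBenchmarkGroupsRunner(BenchmarkGroupsRunner):
    def run_benchmark_groups(self, benchmark_groups):
        # call arrowbench once for the whole list so arrowbench manages the run itself
        bm_names = [bg.command for bg in benchmark_groups]
        name_vector = str(bm_names)[1:-1]  # "'file-read', 'file-write'" style quoting
        r_command = (
            "bm_df <- arrowbench::get_package_benchmarks(); "
            f"arrowbench::run(bm_df[bm_df$name %in% c({name_vector}), ])"
        )
        self.executor.execute_command(r_command)


class AdapterBenchmarkGroupsRunner(BenchmarkGroupsRunner):
    def run_benchmark_groups(self, benchmark_groups):
        # each adapter is an arbitrary executable that posts its own results
        for bg in benchmark_groups:
            self.executor.execute_command(bg.command)
```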

@austin3dickey (Contributor) left a comment


very cool!

buildkite/benchmark/run.py (resolved review thread)
r_command = f"""
bm_df <- arrowbench::get_package_benchmarks();
arrowbench::run(
bm_df[bm_df$name %in% c({str(bm_names)[1:-1]}), ],
Contributor

haha, wow

Contributor Author

Yeah, this is a janky way to turn a list of strings into a comma-space-separated string with each initial string quoted (e.g. ['a', 'b'] -> "'a', 'b'"). It could be ", ".join([f"'{x}'" for x in ['a', 'b']]), but I'm not sure that's more readable. I miss R's toString().
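
For illustration, both spellings produce the same quoted, comma-separated string (a standalone example, not code from the PR):

```python
bm_names = ["file-read", "file-write"]

# slice the brackets off the list's repr:
# "['file-read', 'file-write']" -> "'file-read', 'file-write'"
via_repr = str(bm_names)[1:-1]

# explicit join; equivalent, though arguably not more readable
via_join = ", ".join([f"'{x}'" for x in bm_names])

assert via_repr == via_join == "'file-read', 'file-write'"
```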

exit_on_failure=False,
)

# leave tempfile for validation if mock run
Contributor

wait, doesn't this delete the tempfile if it's a mock run?

Contributor Author

Wait, yeah, I'm missing a not...how did my tests pass?

Contributor Author

Ok this went a bit weird and I can't fully explain why, but I'm switching to not all(bg.mock run ...) because it works.
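
A minimal sketch of the cleanup logic being discussed, assuming a hypothetical mock flag on BenchmarkGroup and a tempfile path argument; this illustrates the fix, not the actual run.py code.

```python
import os

def clean_up_results_tempfile(path: str, benchmark_groups) -> None:
    # leave the tempfile in place for validation when everything in the run is a mock;
    # otherwise remove it after the results have been handled
    if not all(bg.mock for bg in benchmark_groups):  # `mock` attribute is an assumption
        os.remove(path)
```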

tests/buildkite/benchmark/test_run_benchmarks.py (outdated, resolved review thread)
@jonkeane (Contributor)

> For now, this is all just the stuff we'll need to run real adapters here, but no real benchmarks, as all of those will require a lot of alterations to run here

Do you mean "will require a lot of alterations to run here" for the current set of python macrobenchmarks? Or that we'll still have a large chunk of work ahead of us to be able to run our arrowbench-based benchmarks this way too?

@jonkeane (Contributor) left a comment

Looking good — a few minor comments

],
},
"env_vars": {
"PYTHONFAULTHANDLER": "1", # makes it easy to debug segmentation faults
Contributor

Do we need this here? Or does it get passed all the way down to the adapter and we want those seg faults boiled up?

Contributor Author

This is left over from when I copied it over from the labs/benchmarks one. I think it might get passed through all the way to adapters, but if it matters at that level I'd rather set it at that level (as doing so is easy). I'll remove this.

Comment on lines 73 to 72
"BENCHMARKS_DATA_DIR": f"{os.getenv('HOME')}/data", # allows to avoid loading Python and R benchmarks input data from s3 for every build
"ARROWBENCH_DATA_DIR": f"{os.getenv('HOME')}/data", # allows to avoid loading R benchmarks input data from s3 for every build
Contributor

If this is for arrowbench we only need one of these (ARROWBENCH_DATA_DIR), right?

Contributor Author

Yep, removing

Comment on lines 98 to 99
"BENCHMARKS_DATA_DIR": f"{os.getenv('HOME')}/data", # allows to avoid loading Python and R benchmarks input data from s3 for every build
"ARROWBENCH_DATA_DIR": f"{os.getenv('HOME')}/data", # allows to avoid loading R benchmarks input data from s3 for every build
Contributor

This chunk runs all of the benchmarks together, so we need them together, yeah? Does this run the arrowbench ones a second time?

Or I guess taking a step back: what's this code chunk helpful for / doing here?

Contributor Author

The env vars here are vestigial and unneeded (as of now).

This dict is here because adapters will live in this repo (see root on line 80, which is arrow-benchmarks-ci/adapters). We could stick them in a different repo if we want; it doesn't really matter. As of now the only one there is a mock for testing, but eventually at least the C++ and Go ones should live there.

Comment on lines +409 to +405
# NOTE: `bm.command` is the raw arrowbench name; `bm.name` is f"arrowbench/{bm.command}"
# to disambiguate from labs/benchmarks versions when filtering.
Contributor

Is this distinction something we plan on coming back and getting rid of once we've transitioned this fully? If so a TODO might be nice.

Contributor Author

At first I thought so, but after implementation, I'd rather keep it namespaced. Even if we want to refactor the filters to include the repo, we really do want benchmarks to be namespaced by repo in some way, as a lot of languages are going to have a "read file" benchmark (etc.), and relying on all the repos in question to avoid name clashes with each other is not a good way to manage that.

@alistaire47 (Contributor Author)

> For now, this is all just the stuff we'll need to run real adapters here, but no real benchmarks, as all of those will require a lot of alterations to run here

> Do you mean "will require a lot of alterations to run here" for the current set of python macrobenchmarks? Or that we'll still have a large chunk of work ahead of us to be able to run our arrowbench-based benchmarks this way too?

It was intended to refer to adapter benchmarks like C++ (mostly; it's still getting metadata through the legacy runner) and Go (a bit), but not arrowbench or Python benchmarks. arrowbench should be (🤞) good to go shortly (though we will have to clean up cpu_count). The Python ones will be a good bit more work, yes; we'll have to either extract the runner from Conbench or port everything to benchrun (and finish the datalogistik integration in Python).

@alistaire47 (Contributor Author) commented Feb 23, 2023

Ok! I got a working test build running: https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-arm64-m6g-linux-compute/builds/2224#01867bb0-78ba-4337-bbd3-b4815be4a7c1 that runs file-read through labs/benchmarks, arrowbench/file-read, and adapters/mock-adapter, all successfully. It took some tweaks to get everything running. Most were minor, but I'll comment on the places where I made judgement calls that weren't already reviewed.

One note on that build is that running through arrowbench directly does log a lot less; in particular it doesn't log the results. If we want these there, I think the right way to do so is to change the log level for benchclients in benchconnect to DEBUG, which will show all the JSON sent in requests.
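
If we go that route, bumping the logger should be a one-liner somewhere in benchconnect's setup; the logger name "benchclients" below is an assumption, not verified against the benchconnect code.

```python
import logging

# assumed logger name: raising it to DEBUG would make the JSON payload of each
# posted result show up in the buildkite logs
logging.getLogger("benchclients").setLevel(logging.DEBUG)
```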

TODO: Revert config.py to the original set of benchmarks instead of my testing set. If we want to add more of the arrowbench ones to any particular machine, it's a good time to do that too, though I want to be cautious and not add lots of running time to busy machines. Opinions welcome.

Comment on lines 408 to 409
arrowbench::install_pipx();
arrowbench::install_benchconnect();
Contributor Author

Added. I'm not sure if this is the best way to install benchconnect in this context—there are already a ton of environments in this repo, and we could just add it to the relevant one, if we could figure out which that is—but this works.

If we care about speed, it's probably worth switching from conda to mamba (or at least the mamba solver); we're currently repeatedly solving environments, and it's not fast.

Contributor

> If we care about speed, it's probably worth switching from conda to mamba (or at least the mamba solver); we're currently repeatedly solving environments, and it's not fast.

Would you mind making an issue in this repo for that? Agreed that's probably the right way (but we also probably shouldn't do this just yet unless it's super quick + easy)


Comment on lines +477 to +482
if not Path(self.root).exists():
    self.executor.execute_command(f"git clone {self.repo}")

self.executor.execute_command(
    f"git fetch && git checkout {self.branch}", self.root
)
Contributor Author

Made cloning conditional, because it was failing when labs/benchmarks and arrowbench both cloned arrowbench, specifically by complaining that the directory already exists. We could rm -rf it and re-clone, but given that it then fetches anyway, I'm pretty sure that would be duplicative.

@jonkeane (Contributor)

> One note on that build is that running through arrowbench directly does log a lot less; in particular it doesn't log the results. If we want these there, I think the right way to do so is to change the log level for benchclients in benchconnect to DEBUG, which will show all the JSON sent in requests.

IMO we should log those requests — I'm fine bumping that logging level like you suggest. Normally I would say something like that is overkill, but for a variety of other systems reasons I actually end up going to those logged output values pretty frequently (e.g. when we were hitting a bunch of 50xs recently, I went to the buildkite logs to get some ids from there instead of needing to get them through the list api that was 50xing). At some point we will probably get to a point where it's not needed anymore, but for now as we're iterating on this system (and in other places) it's quite nice to have them

@jonkeane (Contributor)

This looks good — I'm fine to merge it.

@ElenaHenderson if you had a sec to look through I would be super grateful as always (or if you're busy or would rather: we can merge and then post-hoc review).

@alistaire47 (Contributor Author)

> One note on that build is that running through arrowbench directly does log a lot less; in particular it doesn't log the results. If we want these there, I think the right way to do so is to change the log level for benchclients in benchconnect to DEBUG, which will show all the JSON sent in requests.

> IMO we should log those requests — I'm fine bumping that logging level like you suggest. Normally I would say something like that is overkill, but for a variety of other systems reasons I actually end up going to those logged output values pretty frequently (e.g. when we were hitting a bunch of 50xs recently, I went to the buildkite logs to get some ids from there instead of needing to get them through the list api that was 50xing). At some point we will probably get to a point where it's not needed anymore, but for now as we're iterating on this system (and in other places) it's quite nice to have them

conbench/conbench#775 (will be quick)

@@ -429,7 +429,6 @@ def run_benchmark_groups(self, benchmark_groups: List[BenchmarkGroup]) -> None:
n_iter = 3L,
drop_caches = TRUE,
publish = TRUE,
run_id = '{os.getenv('RUN_ID')}',
Contributor Author

So unfortunately this isn't going to work. The passing is fine, but the run ID it generates isn't, because abci generates run IDs per buildkite build, not per Repo the benchmarks are in, so arrowbench fails because benchconnect tries to start and finish a run that other Repos are using. We could generate an ID here and pass it through, though that's no different from letting arrowbench (really benchconnect) handle it. I've made #104 to come back and change this; this is going to keep biting us if we want to, say, close a run for an adapter with a note on why it failed.

@alistaire47 (Contributor Author)

Ok! So! Things are working for real now; before it looked like there was a logging problem, but it was really installation and run ID uniqueness problems (which are now fixed). Build with full logging running labs/benchmarks, arrowbench, and adapters benchmarks: https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-arm64-m6g-linux-compute/builds/2265#01869a5a-db70-4c2e-94a2-f6048538d346

@alistaire47 merged commit 1cf2922 into main on Mar 1, 2023
@alistaire47 deleted the edward/direct-running branch on March 1, 2023 at 17:38
Labels: none yet
Projects: none yet
Development: Successfully merging this pull request may close these issues: Support alternate runners
3 participants