Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

scylla-node: prepare for the post-jmx world #565

Merged
merged 3 commits into from
Apr 2, 2024

Conversation

denesb
Copy link
Contributor

@denesb denesb commented Mar 22, 2024

Soon the jmx package will be dropped from the unified package, and the scylla.git repo. Prepare for this:

  • Add use_jmx property, based on probing the reloc package for the presence of the jmx directory, or the tools/jmx directory in the scylla repo.
  • Use this new property to fence off jmx-related code. nodetool() now routes to the native nodetool directly, when use_jmx == False.

Fixes: https://github.com/scylladb/scylla-dtest/issues/4093

@denesb
Copy link
Contributor Author

denesb commented Mar 22, 2024

We need a full dtest run with this PR, I don't know how to do that.

@denesb
Copy link
Contributor Author

denesb commented Mar 22, 2024

This PR is a 6.0 release blocker, so please give it attention.

@tchaikov
Copy link
Contributor

We need a full dtest run with this PR, I don't know how to do that.

running at https://jenkins.scylladb.com/job/scylla-master/job/byo/job/dtest-byo/211/

@denesb
Copy link
Contributor Author

denesb commented Mar 22, 2024

Looks like I have to fix ccm's own tests too.

@tchaikov
Copy link
Contributor

Looks like I have to fix ccm's own tests too.

@denesb see #566

Copy link
Contributor

@tchaikov tchaikov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@denesb is the next step to update the building system of scylla and then drop tools/jmx submodule?

@denesb
Copy link
Contributor Author

denesb commented Mar 22, 2024

@denesb is the next step to update the building system of scylla and then drop tools/jmx submodule?

Yes, see scylladb/scylladb#17969.

@denesb
Copy link
Contributor Author

denesb commented Mar 22, 2024

New in v2:

  • Update package version in tests/test_config.py, to a version, which has the fully complete native nodetool.

@denesb denesb self-assigned this Mar 22, 2024
@fruch
Copy link
Contributor

fruch commented Mar 22, 2024

We need a full dtest run with this PR, I don't know how to do that.

running at https://jenkins.scylladb.com/job/scylla-master/job/byo/job/dtest-byo/211/

This is just the gating (and seems like there is some work still needed, not sure if in this PR)

We should also do a run with enterprise

And a full run, to flush issues with non gating tests.

@tchaikov
Copy link
Contributor

New in v2:

* Update package version in `tests/test_config.py`, to a version, which has the fully complete native nodetool.

but i think it would be great to continue using the cached binary as long as cache works.

We need a full dtest run with this PR, I don't know how to do that.

running at https://jenkins.scylladb.com/job/scylla-master/job/byo/job/dtest-byo/211/

This is just the gating (and seems like there is some work still needed, not sure if in this PR)

We should also do a run with enterprise

And a full run, to flush issues with non gating tests.

ah, right. rescheduling at https://jenkins.scylladb.com/job/scylla-master/job/byo/job/dtest-byo/212/

@denesb
Copy link
Contributor Author

denesb commented Mar 22, 2024

but i think it would be great to continue using the cached binary as long as cache works.

Not sure what do you mean, what cache?

@denesb
Copy link
Contributor Author

denesb commented Mar 22, 2024

I just discovered two new commands I didn't know before:

  • getsstables
  • sstableinfo

@tchaikov
Copy link
Contributor

tchaikov commented Mar 22, 2024

but i think it would be great to continue using the cached binary as long as cache works.

Not sure what do you mean, what cache?

IIUC, the github workflows try to reuse the cached binary. see

uses: actions/cache@v2
with:
path: |
~/.ccm/repository
~/.ccm/scylla-repository
key: ${{ runner.os }}-binaries
and
id: cache-versions
uses: actions/cache@v2
with:
path: |
~/.ccm/repository
~/.ccm/scylla-repository
key: ${{ runner.os }}-binaries
- name: Download versions
if: steps.cache-versions.outputs.cache-hit != 'true'
run: |
if [ ! -f ~/.ccm/scylla-repository/unstable/master/2023-04-13T13_31_00Z ]; then
RELOC_VERSION="2023-04-13T13:31:00Z"
./ccm create temp -n 1 --scylla --version unstable/master:${RELOC_VERSION}
./ccm remove
fi
./ccm create temp-cas -n 1 --version 3.11.4 > /dev/null
./ccm remove
. in other words, i am trying to raise more attentions to #566 . i will apply the same change to integration-tests.yml once i get positive feedbacks.

@denesb
Copy link
Contributor Author

denesb commented Mar 22, 2024

Analysis of failed gating dtests. I will soon prepare scylla-dtest.git and scylla.git branches to rerun this with.

FullDtest / full-split003 / reversed_queries_test.TestReversedQueriesSelectorsDuringUpgrade.test_queries_during_upgrade

Version we upgrade to doesn't have feature complete nodetool. Need to change the test to use a more recent version.

[FullDtest / full-split001 / compaction_test.TestCompaction.test_compaction_delete_tombstone_gc[immediate-SizeTieredCompactionStrategy]](https://jenkins.scylladb.com/job/scylla-master/job/byo/job/dtest-byo/211/testReport/junit/compaction_test/TestCompaction/FullDtest___full_split001___test_compaction_delete_tombstone_gc_immediate_SizeTieredCompactionStrategy_/)
[FullDtest / full-split001 / compaction_test.TestCompaction.test_compaction_delete_tombstone_gc[immediate-TimeWindowCompactionStrategy]](https://jenkins.scylladb.com/job/scylla-master/job/byo/job/dtest-byo/211/testReport/junit/compaction_test/TestCompaction/

Flaky test (very possibly related to native nodetool somehow): scylladb/scylladb#17947

FullDtest___full_split001___test_compaction_delete_tombstone_gc_immediate_TimeWindowCompactionStrategy_/)

[FullDtest / full-split001 / nodetool_additional_test.TestNodetool.test_scrub_with_multi_nodes_expect_data_rebuild](https://jenkins.scylladb.com/job/scylla-master/job/byo/job/dtest-byo/211/testReport/junit/nodetool_additional_test/TestNodetool/FullDtest___full_split001___test_scrub_with_multi_nodes_expect_data_rebuild/)

Missing getsstables command.

[FullDtest / full-split001 / update_cluster_layout_tests.TestUpdateClusterLayout.test_change_node_ip](https://jenkins.scylladb.com/job/scylla-master/job/byo/job/dtest-byo/211/testReport/junit/update_cluster_layout_tests/TestUpdateClusterLayout/FullDtest___full_split001___test_change_node_ip/)

Version we upgrade to doesn't have feature complete nodetool. Need to change the test to use a more recent version.

[FullDtest / full-split002 / compaction_test.TestCompaction.test_compaction_delete_tombstone_gc[immediate-TimeWindowCompactionStrategy]](https://jenkins.scylladb.com/job/scylla-master/job/byo/job/dtest-byo/211/testReport/junit/compaction_test/TestCompaction/

Flaky test (very possibly related to native nodetool somehow): scylladb/scylladb#17947

FullDtest___full_split002___test_compaction_delete_tombstone_gc_immediate_TimeWindowCompactionStrategy_/)

[FullDtest / full-split000 / nodetool_additional_test.TestNodetool.test_get_sstable](https://jenkins.scylladb.com/job/scylla-master/job/byo/job/dtest-byo/211/testReport/junit/nodetool_additional_test/TestNodetool/FullDtest___full_split000___test_get_sstable/)

Need to implement getsstable command.

[FullDtest / full-split000 / nodetool_additional_test.TestNodetool.test_sstable_info](https://jenkins.scylladb.com/job/scylla-master/job/byo/job/dtest-byo/211/testReport/junit/nodetool_additional_test/TestNodetool/FullDtest___full_split000___test_sstable_info/)

Need to implement sstableinfo command.

[FullDtest / full-split000 / toppartitions_test.TestTopPartitions.test_one_keyspace_multiple_column_families_writes_sample_and_empty_reads_sample](https://jenkins.scylladb.com/job/scylla-master/job/byo/job/dtest-byo/211/testReport/junit/toppartitions_test/TestTopPartitions/FullDtest___full_split000___test_one_keyspace_multiple_column_families_writes_sample_and_empty_reads_sample/)
[FullDtest / full-split000 / toppartitions_test.TestTopPartitions.test_all_column_families_writes_only](https://jenkins.scylladb.com/job/scylla-master/job/byo/job/dtest-byo/211/testReport/junit/toppartitions_test/TestTopPartitions/FullDtest___full_split000___test_all_column_families_writes_only/)
[FullDtest / full-split000 / toppartitions_test.TestTopPartitions.test_writes_sample_for_10_partitions_with_100_op_and_empty_reads_sample](https://jenkins.scylladb.com/job/scylla-master/job/byo/job/dtest-byo/211/testReport/junit/toppartitions_test/TestTopPartitions/FullDtest___full_split000___test_writes_sample_for_10_partitions_with_100_op_and_empty_reads_sample/)
[FullDtest / full-split000 / toppartitions_test.TestTopPartitions.test_all_column_families_reads_only](https://jenkins.scylladb.com/job/scylla-master/job/byo/job/dtest-byo/211/testReport/junit/toppartitions_test/TestTopPartitions/FullDtest___full_split000___test_all_column_families_reads_only/)
[FullDtest / full-split000 / toppartitions_test.TestTopPartitions.test_reads_sample_for_10_partitions_with_100_op_and_empty_writes_sample](https://jenkins.scylladb.com/job/scylla-master/job/byo/job/dtest-byo/211/testReport/junit/toppartitions_test/TestTopPartitions/FullDtest___full_split000___test_reads_sample_for_10_partitions_with_100_op_and_empty_writes_sample/)
[FullDtest / full-split000 / toppartitions_test.TestTopPartitions.test_top_5_paritions_write_samplers_and_empty_read_sample](https://jenkins.scylladb.com/job/scylla-master/job/byo/job/dtest-byo/211/testReport/junit/toppartitions_test/TestTopPartitions/FullDtest___full_split000___test_top_5_paritions_write_samplers_and_empty_read_sample/)
[FullDtest / full-split000 / toppartitions_test.TestTopPartitions.test_top_5_paritions_for_read_samplers_and_empty_write_sample](https://jenkins.scylladb.com/job/scylla-master/job/byo/job/dtest-byo/211/testReport/junit/toppartitions_test/TestTopPartitions/FullDtest___full_split000___test_top_5_paritions_for_read_samplers_and_empty_write_sample/)
[FullDtest / full-split000 / toppartitions_test.TestTopPartitions.test_top_3_paritions_for_write_samplers_only](https://jenkins.scylladb.com/job/scylla-master/job/byo/job/dtest-byo/211/testReport/junit/toppartitions_test/TestTopPartitions/FullDtest___full_split000___test_top_3_paritions_for_write_samplers_only/)
[FullDtest / full-split000 / toppartitions_test.TestTopPartitions.test_top_3_paritions_for_read_samplers_only](https://jenkins.scylladb.com/job/scylla-master/job/byo/job/dtest-byo/211/testReport/junit/toppartitions_test/TestTopPartitions/FullDtest___full_split000___test_top_3_paritions_for_read_samplers_only/)
[FullDtest / full-split000 / toppartitions_test.TestTopPartitions.test_param_sampler_writes_and_capacity_size](https://jenkins.scylladb.com/job/scylla-master/job/byo/job/dtest-byo/211/testReport/junit/toppartitions_test/TestTopPartitions/FullDtest___full_split000___test_param_sampler_writes_and_capacity_size/)
[FullDtest / full-split000 / toppartitions_test.TestTopPartitions.test_param_sampler_read_and_capacity_size](https://jenkins.scylladb.com/job/scylla-master/job/byo/job/dtest-byo/211/testReport/junit/toppartitions_test/TestTopPartitions/FullDtest___full_split000___test_param_sampler_read_and_capacity_size/)
[FullDtest / full-split000 / toppartitions_test.TestTopPartitions.test_write_into_one_paritions_to_different_rows](https://jenkins.scylladb.com/job/scylla-master/job/byo/job/dtest-byo/211/testReport/junit/toppartitions_test/TestTopPartitions/FullDtest___full_split000___test_write_into_one_paritions_to_different_rows/)
[FullDtest / full-split000 / toppartitions_test.TestTopPartitions.test_one_keyspace_multiple_column_families_reads_sample_and_empty_writes_sample](https://jenkins.scylladb.com/job/scylla-master/job/byo/job/dtest-byo/211/testReport/junit/toppartitions_test/TestTopPartitions/FullDtest___full_split000___test_one_keyspace_multiple_column_families_reads_sample_and_empty_writes_sample/)

tools/toppartitions.py uses a regexp to search for running nodetool executable which doesn't match the native nodetool, regex needs minor adjustment

@denesb denesb force-pushed the scylla-native-nodetool branch 3 times, most recently from 23f9f0c to 2bb097c Compare March 22, 2024 12:58
@denesb
Copy link
Contributor Author

denesb commented Mar 22, 2024

New in v3 - fix the tests:

  • Update matched version in get_get_scylla*version.
  • Remove jmx-related code from test_nodetool_timeout

@denesb
Copy link
Contributor Author

denesb commented Mar 22, 2024

I will have to roll back the removal of JMX, because it breaks upgrade tests. :(

@denesb
Copy link
Contributor Author

denesb commented Mar 22, 2024

New in v4:

  • Restore JMX, add a branch based on whether the node has native nodetool or not. JMX is not started if node has native nodetool (determined based on the version).

@denesb
Copy link
Contributor Author

denesb commented Mar 22, 2024

@denesb
Copy link
Contributor Author

denesb commented Mar 27, 2024

There is one more nodetool command missing: checkAndRepairCdcStreams.

PR: scylladb/scylladb#18076

@denesb
Copy link
Contributor Author

denesb commented Mar 27, 2024

.github/workflows/nix.yml Outdated Show resolved Hide resolved
@avikivity
Copy link
Member

FullDtest / full-split001 / update_cluster_layout_tests.TestUpdateClusterLayout.test_change_node_ip

This is another test which fails because the native nodetool is too fast.

Where's the compatibility?!

@fruch
Copy link
Contributor

fruch commented Mar 27, 2024

FullDtest / full-split001 / update_cluster_layout_tests.TestUpdateClusterLayout.test_change_node_ip

This is another test which fails because the native nodetool is too fast.

Where's the compatibility?!

yeah, they should have shelp a few sleeps in the new code 🤣

@denesb
Copy link
Contributor Author

denesb commented Mar 28, 2024

It would be best to have ccm always use the relocatable package and install it, and just call nodetool without worrying if it's native or not.

I may have to go back to this, I'm hitting issues in upgrade tests and I ended up restoring the pre-PR code completely, just invoked in an else: branch.
I will prepare a Scylla branch which installs the native nodetool as nodetool and lets see if that works.

I may have to discard this PR completely. At least it served as a good integration test for finding the last wrinkles to iron out in the native nodetool.

I will report back when I have results.

Separate parameterizing nodetool and actually invoking it. The
parameterizing is about to diverege for node.py and scylla_node.py, so
this separation allows us to reuse just the invoke logic, while letting
the two go on its separate ways when it comes to parameterizing.
…om stderr

These messages are printed by seastar and ASAN respectively and they
trigger tests which look for error messages in stderr. Filter them out
to address this problem at the root, instead of dealing with it in each
individual test.
@denesb denesb changed the title scylla-node: switch to using the native nodetool directly scylla-node: prepare for the post-java world Mar 28, 2024
@denesb
Copy link
Contributor Author

denesb commented Mar 28, 2024

New in v5:

  • Complete revamp of how we decide when to use native nodetool -- instead of an unreliable version check, check whether the package/repo has java tools dirs in them.
  • Take one step further and instead of just fencing off JMX, fence off all java related code behind a flag (see above).
  • Handle upgrade correctly -- redo the check after the install dir changes.

With this, CCM should be prepared for the disappearance of the java tools. Lets see how dtests will take this (I tested with a small sample, including an upgrade test, which is tricky).

@denesb
Copy link
Contributor Author

denesb commented Mar 28, 2024

@fruch
Copy link
Contributor

fruch commented Mar 28, 2024

BYO (with java packages removed): https://jenkins.scylladb.com/job/scylla-master/job/byo/job/byo_build_tests_dtest/2168/

That's quite ambitious

There quite a lot of tests using Cassandra-stress, you'll need to supply an alternative or in CCM or in dtest

@denesb
Copy link
Contributor Author

denesb commented Mar 28, 2024

BYO (with java packages removed): https://jenkins.scylladb.com/job/scylla-master/job/byo/job/byo_build_tests_dtest/2168/

That's quite ambitious

There quite a lot of tests using Cassandra-stress, you'll need to supply an alternative or in CCM or in dtest

Yes, it was more for curiosity and also because dropping the java packages is one of our goals for 6.0 (but we of course want to keep c-s).
There are 280 failed tests already.

I already kicked off another BYO run, which runs with the java tools still in (it is in the queue).

@denesb
Copy link
Contributor Author

denesb commented Mar 28, 2024

I will hack scylladb/scylladb#17969 to install only cassandra-stress, but nothing else, then adjust this PR to work with that. This is the agree-upon end-state: only cassandra-stress is part of the unified package, jmx and the rest of the java-tools content is dropped.

@fruch
Copy link
Contributor

fruch commented Mar 28, 2024

I will hack scylladb/scylladb#17969 to install only cassandra-stress, but nothing else, then adjust this PR to work with that. This is the agree-upon end-state: only cassandra-stress is part of the unified package, jmx and the rest of the java-tools content is dropped.

I thought @syuu1228 was doing some work on repacking c-s on itself own, maybe you want to sync with him

@denesb
Copy link
Contributor Author

denesb commented Mar 28, 2024

I will hack scylladb/scylladb#17969 to install only cassandra-stress, but nothing else, then adjust this PR to work with that. This is the agree-upon end-state: only cassandra-stress is part of the unified package, jmx and the rest of the java-tools content is dropped.

I thought @syuu1228 was doing some work on repacking c-s on itself own, maybe you want to sync with him

Yes, I will. For now I just want to see what changes I need here in ccm, to keep cassandra-stress working, but not care about the other java tools.

Soon the jmx package will be dropped from the unified package, and the
scylla.git repo. Prepare for this:
* Add `use_jmx` property, based on probing the reloc package for the
  presence of the `jmx` directory, or the `tools/jmx` directory in the
  scylla repo.
* Use this new property to fence off jmx-related code. nodetool() now
  routes to the native nodetool directly, when use_jmx == False.
@denesb denesb changed the title scylla-node: prepare for the post-java world scylla-node: prepare for the post-jmx world Mar 28, 2024
@denesb
Copy link
Contributor Author

denesb commented Mar 28, 2024

v6: reduce the scope of the PR from preparing for the post-java world, to preparing to the post-jmx world.

@denesb
Copy link
Contributor Author

denesb commented Mar 29, 2024

BYO with scylla changes to drop JMX: https://jenkins.scylladb.com/job/scylla-master/job/byo/job/byo_build_tests_dtest/2171
Two tests failed, they look unrelated.

@denesb
Copy link
Contributor Author

denesb commented Mar 29, 2024

Now a BYO with default branches and this branch from ccm: https://jenkins.scylladb.com/job/scylla-master/job/byo/job/byo_build_tests_dtest/2172/

If this passes, I think we can merge this PR.

@denesb
Copy link
Contributor Author

denesb commented Mar 29, 2024

limits_test.TestLimits.test_max_columns_and_query_parameters
rolling_upgrade_test.TestRollingUpgrade.test_rolling_upgrade

These two failed in the master run. Both have a history of failing in master dtest runs. So I think this PR is good to go in.

@yaronkaikov
Copy link
Contributor

@fruch Can we merge this?

@fruch
Copy link
Contributor

fruch commented Apr 2, 2024

@fruch Can we merge this?

I'll give it one final run, on both OSS and Enterprise

if it would pass, I'll merge it

Copy link
Contributor

@fruch fruch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

I'll merge, once gating is working

@fruch fruch merged commit b18f85d into scylladb:next Apr 2, 2024
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants