Strange issue, node joins the cluster and then Abort #1837

Open
IRCGeek opened this issue Jan 22, 2023 · 1 comment

IRCGeek commented Jan 22, 2023

Please advise me based on the logs below.
I am unable to find the cause.

Distributor ID: Ubuntu
Description: Ubuntu 20.04.5 LTS
Release: 20.04
Codename: focal

Linux instance-20220523-1749 5.15.0-1016-oracle #20~20.04.1-Ubuntu SMP Mon Aug 8 07:30:37 UTC 2022 aarch64 aarch64 aarch64 GNU/Linux

mysql Ver 15.1 Distrib 10.3.37-MariaDB, for debian-linux-gnu (aarch64) using readline 5.2

2023-01-22 6:26:52 0 [Note] WSREP: Read nil XID from storage engines, skipping position init
2023-01-22 6:26:52 0 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/galera/libgalera_smm.so'
2023-01-22 6:26:52 0 [Note] WSREP: wsrep_load(): Galera 3.29(ra60e019) by Codership Oy <info@codership.com> loaded successfully.
2023-01-22 6:26:52 0 [Note] WSREP: CRC-32C: using "slicing-by-8" algorithm.
2023-01-22 6:26:52 0 [Note] WSREP: Found saved state: f1e1e4a1-9a0f-11ed-9c0f-1be97a0e7b5b:-1, safe_to_bootstrap: 1
2023-01-22 6:26:52 0 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 10.0.1.169; base_port = 4567; cert.log_conflicts = no; cert.optimistic_pa = yes; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.recover = no; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; p
2023-01-22 6:26:52 0 [Note] WSREP: Assign initial position for certification: 0, protocol version: -1
2023-01-22 6:26:52 0 [Note] WSREP: wsrep_sst_grab()
2023-01-22 6:26:52 0 [Note] WSREP: Start replication
2023-01-22 6:26:52 0 [Note] WSREP: Setting initial position to f1e1e4a1-9a0f-11ed-9c0f-1be97a0e7b5b:0
2023-01-22 6:26:52 0 [Note] WSREP: protonet asio version 0
2023-01-22 6:26:52 0 [Note] WSREP: Using CRC-32C for message checksums.
2023-01-22 6:26:52 0 [Note] WSREP: backend: asio
2023-01-22 6:26:52 0 [Note] WSREP: gcomm thread scheduling priority set to other:0
2023-01-22 6:26:52 0 [Warning] WSREP: access file(/var/lib/mysql//gvwstate.dat) failed(No such file or directory)
2023-01-22 6:26:52 0 [Note] WSREP: restore pc from disk failed
2023-01-22 6:26:52 0 [Note] WSREP: GMCast version 0
2023-01-22 6:26:52 0 [Note] WSREP: (c273183b, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2023-01-22 6:26:52 0 [Note] WSREP: (c273183b, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2023-01-22 6:26:52 0 [Note] WSREP: EVS version 0
2023-01-22 6:26:52 0 [Note] WSREP: gcomm: connecting to group 'MariaDB Galera Cluster', peer '35.212.132.67:,85.122.127.235:,192.3.91.168:,10.0.1.169:'
2023-01-22 6:26:52 0 [Note] WSREP: (c273183b, 'tcp://0.0.0.0:4567') Found matching local endpoint for a connection, blacklisting address tcp://10.0.1.169:4567
2023-01-22 6:26:52 0 [Note] WSREP: (c273183b, 'tcp://0.0.0.0:4567') connection established to 6ab548fb tcp://85.122.127.235:4567
2023-01-22 6:26:52 0 [Note] WSREP: (c273183b, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers:
2023-01-22 6:26:52 0 [Note] WSREP: (c273183b, 'tcp://0.0.0.0:4567') connection established to 7191d038 tcp://192.3.91.168:4567
2023-01-22 6:26:53 0 [Note] WSREP: (c273183b, 'tcp://0.0.0.0:4567') connection established to 5149ecf7 tcp://35.212.132.67:4567
2023-01-22 6:26:53 0 [Note] WSREP: (c273183b, 'tcp://0.0.0.0:4567') connection established to 7191d038 tcp://192.3.91.168:4567
2023-01-22 6:26:53 0 [Note] WSREP: (c273183b, 'tcp://0.0.0.0:4567') connection established to 5149ecf7 tcp://35.212.132.67:4567
2023-01-22 6:26:54 0 [Note] WSREP: declaring 5149ecf7 at tcp://35.212.132.67:4567 stable
2023-01-22 6:26:54 0 [Note] WSREP: declaring 6ab548fb at tcp://85.122.127.235:4567 stable
2023-01-22 6:26:54 0 [Note] WSREP: declaring 7191d038 at tcp://192.3.91.168:4567 stable
2023-01-22 6:26:55 0 [Note] WSREP: view(view_id(NON_PRIM,5149ecf7,2792) memb {
5149ecf7,0
6ab548fb,0
7191d038,0
c273183b,0
} joined {
} left {
} partitioned {
07d1e878,0
77a052d6,0
9e708b0d,0
a1d8493f,0
a79d17a2,0
b2984c45,0
b60181c9,0
bad3a406,0
c401108d,0
d392217d,0
dc8cc78b,0
})
2023-01-22 6:26:56 0 [Note] WSREP: (c273183b, 'tcp://0.0.0.0:4567') turning message relay requesting off
2023-01-22 6:27:23 0 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
at gcomm/src/pc.cpp:connect():160
2023-01-22 6:27:23 0 [ERROR] WSREP: gcs/src/gcs_core.cpp:gcs_core_open():208: Failed to open backend connection: -110 (Connection timed out)
2023-01-22 6:27:23 0 [ERROR] WSREP: gcs/src/gcs.cpp:gcs_open():1457: Failed to open channel 'MariaDB Galera Cluster' at 'gcomm://35.212.132.67,85.122.127.235,192.3.91.168,10.0.1.169': -110 (Connection timed out)
2023-01-22 6:27:23 0 [ERROR] WSREP: gcs connect failed: Connection timed out
2023-01-22 6:27:23 0 [ERROR] WSREP: wsrep::connect(gcomm://35.x.x.x,85.122.x.x,192.3.x.x,10.0.x.x) failed: 7
2023-01-22 6:27:23 0 [ERROR] Aborting

@dciabrin
Contributor

From the few logs here, it looks like the joiner could reach at least one of the four nodes listed in the gcomm address, through which it could join the running cluster. But once it initiated the connection, it determined that it had joined a partition of the cluster that had lost quorum (only 4 nodes out of 15?). It probably refused to carry on from that point onward.
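
To make the "4 nodes out of 15" reading concrete, here is a rough sketch that counts the members in the NON_PRIM view from the log above and applies a simple majority test. It is an approximation, not Galera's actual algorithm: it assumes every node has the default weight of 1, that all "partitioned" nodes belonged to the last primary component, and it ignores graceful leaves. The helper names `parse_view` and `has_quorum` are made up for this illustration.

```python
import re

# View block copied from the log in this issue.
VIEW_LOG = """\
view(view_id(NON_PRIM,5149ecf7,2792) memb {
5149ecf7,0
6ab548fb,0
7191d038,0
c273183b,0
} joined {
} left {
} partitioned {
07d1e878,0
77a052d6,0
9e708b0d,0
a1d8493f,0
a79d17a2,0
b2984c45,0
b60181c9,0
bad3a406,0
c401108d,0
d392217d,0
dc8cc78b,0
})
"""


def parse_view(text):
    """Return {section: [node ids]} for the memb/joined/left/partitioned blocks."""
    sections = {}
    for name, body in re.findall(r"(memb|joined|left|partitioned) \{([^}]*)\}", text):
        # Each entry looks like "<8 hex chars>,<segment>".
        sections[name] = re.findall(r"^\s*([0-9a-f]{8}),\d+", body, flags=re.M)
    return sections


def has_quorum(members, partitioned):
    """Simplified majority test: strictly more than half of all known nodes.

    Assumption: equal node weights; real Galera also accounts for pc.weight
    and for nodes that left the cluster gracefully.
    """
    total = len(members) + len(partitioned)
    return len(members) * 2 > total


if __name__ == "__main__":
    view = parse_view(VIEW_LOG)
    visible = len(view["memb"])
    total = visible + len(view["partitioned"])
    print(f"{visible} of {total} nodes visible, "
          f"quorum: {has_quorum(view['memb'], view['partitioned'])}")
```

Under those assumptions the four visible nodes fall short of a majority, which would be consistent with the view staying NON_PRIM and the joiner timing out with "failed to reach primary view".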
