Re-Creating node from scratch does not copy tables for the Postgres and Kafka engines #1455

Open
Hubbitus opened this issue Jul 12, 2024 · 12 comments

Comments

@Hubbitus

Hubbitus commented Jul 12, 2024

We use your operator to manage our ClickHouse cluster. Thank you.

After a hardware failure we reset the PVC (and the ZooKeeper namespace) to re-create one ClickHouse node.

Most of the metadata, such as views, materialized views, and tables with most engines (MergeTree, ReplicatedMergeTree, etc.), was successfully re-created on the node, and replication started.

However, none of the tables based on the Postgres and Kafka engines were re-created.
Is this a bug, or do we need to use some commands or hacks to sync all metadata across the cluster?

@alex-zaitsev
Member

@Hubbitus, did you use the latest 0.23.6 or an earlier release?

@Hubbitus
Author

Hubbitus commented Jul 24, 2024

@alex-zaitsev, thank you for the response.

That was on an older version. We have now updated the operator. What is the correct way to re-init a node? Is it enough to just delete the PVC of the failed node and delete the pod?

@alex-zaitsev
Member

@Hubbitus, if you want to re-init the existing node, delete the STS, PVC, and PV, and start re-concile. Do you have multiple replicas?

@Hubbitus
Author

Hubbitus commented Jul 31, 2024

@alex-zaitsev, thank you for the reply.

I understand how to delete the objects. But what do you mean by "start re-concile"?

I have two replicas, chi-gid-gid-0-0-0 and chi-gid-gid-0-1-0, and chi-gid-gid-0-0-0 is now malfunctioning. I want to re-init it from the data in chi-gid-gid-0-1-0. That should include syncing all of the following:

  • metadata (all types of objects: MergeTree tables, Postgres and Kafka engine tables, materialized views, etc.)
  • data, populated from replica 1
  • users and all permissions on the objects

@alex-zaitsev
Member

@Hubbitus, we have released 0.23.7, which is more aggressive about re-creating the schema. So you may try deleting the PVC/PV completely and letting it re-create the objects.

@Hubbitus
Author

Hubbitus commented Sep 4, 2024

@alex-zaitsev, thank you very much!
I eventually got the operator updated on our cluster:

kub_dev get pods --all-namespaces -o jsonpath="{.items[*].spec['initContainers', 'containers'][*].image}" -l app=clickhouse-operator                                                                                                     
altinity/clickhouse-operator:0.23.7 altinity/metrics-exporter:0.23.7

Then, in ArgoCD, I:

  • Deleted PVC default-volume-claim-chi-gid-gid-0-0-0
  • Deleted pod chi-gid-gid-0-0-0

The PVC was then re-created.

I see the pod is up and running.

  1. But there are a lot of errors like 2024.09.04 23:50:34.382651 [ 712 ] {} <Error> Access(user directories): from: 10.42.9.104, user: data_quality: Authentication failed: Code: 192. DB::Exception: There is no user data_quality in local_directory. (UNKNOWN_USER).... So, users are not copied.
  2. Tables also appear not to be synced:
SELECT hostname() AS node, COUNT(*)
FROM clusterAllReplicas('{cluster}', system.tables)
WHERE database NOT IN ('INFORMATION_SCHEMA', 'information_schema', 'system')
GROUP BY node

node count()
chi-gid-gid-0-1-0 620

There is also an error in the log like: 2024.09.04 23:52:49.039132 [ 714 ] {bb628508-db8e-4cf9-8307-a13133a185c9} <Error> PredefinedQueryHandler: Code: 60. DB::Exception: Table system.operator_compatible_metrics does not exist. (UNKNOWN_TABLE) - so even in the system database some tables are missing...

So, for the first node I see only the tables in information_schema.
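
The query result above lists only replicas that returned at least one row, so a replica with no user tables is silently absent. A minimal sketch of how that result could be compared against the expected replica list follows; the function names are hypothetical, and the rows are assumed to be copied by hand from the query output rather than fetched from ClickHouse.

```python
from typing import Dict, Iterable, List, Tuple


def tables_per_replica(
    rows: Iterable[Tuple[str, int]], expected_replicas: Iterable[str]
) -> Dict[str, int]:
    """Map each expected replica to its table count (0 if the query returned no row)."""
    counts = {node: int(count) for node, count in rows}
    return {replica: counts.get(replica, 0) for replica in expected_replicas}


def replicas_missing_tables(
    rows: Iterable[Tuple[str, int]], expected_replicas: Iterable[str]
) -> List[str]:
    """Return replicas that hold fewer tables than the best-populated replica."""
    per_replica = tables_per_replica(rows, expected_replicas)
    reference = max(per_replica.values(), default=0)
    return sorted(name for name, count in per_replica.items() if count < reference)


# Rows as reported in the comment above; replica names are the two pods of this CHI.
query_rows = [("chi-gid-gid-0-1-0", 620)]
replicas = ["chi-gid-gid-0-0-0", "chi-gid-gid-0-1-0"]

print(tables_per_replica(query_rows, replicas))
print(replicas_missing_tables(query_rows, replicas))
```

Comparing counts is only a coarse check: two replicas can hold the same number of tables and still differ in which tables they hold.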

@alex-zaitsev
Member

alex-zaitsev commented Sep 20, 2024

Notes:

  1. Users are not replicated by the operator, since it cannot access sensitive data (such as passwords). Use CHI/XML user management or a replicated user directory:
<clickhouse>
  <user_directories replace="replace">
    <users_xml>
      <path>/etc/clickhouse-server/users.xml</path>
    </users_xml>
    <replicated>
      <zookeeper_path>/clickhouse/access/</zookeeper_path>
    </replicated>
    <local_directory>
       <path>/var/lib/clickhouse/access/</path>
    </local_directory>
  </user_directories>
</clickhouse>

Note that the order is important, but local_directory may be omitted if you are not using it. Keep it if there are already users defined with CREATE USER; otherwise they will disappear entirely.

  2. Tables in the system database are not replicated either, since there are not supposed to be any user tables in it.

Everything else should work, so the operator log is needed to check what went wrong.
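
Since note 1 says the order of the user directories matters, a small local check of a config override before applying it may be useful. This is only a sketch: it parses the XML from the note with Python's standard library and reports the order of the entries under user_directories. The function name is hypothetical, and the check says nothing about how ClickHouse itself resolves the directories.

```python
import xml.etree.ElementTree as ET
from typing import List

# The override from note 1, reproduced verbatim.
CONFIG = """<clickhouse>
  <user_directories replace="replace">
    <users_xml>
      <path>/etc/clickhouse-server/users.xml</path>
    </users_xml>
    <replicated>
      <zookeeper_path>/clickhouse/access/</zookeeper_path>
    </replicated>
    <local_directory>
       <path>/var/lib/clickhouse/access/</path>
    </local_directory>
  </user_directories>
</clickhouse>"""


def user_directory_order(config_xml: str) -> List[str]:
    """Return the tag names under <user_directories>, in document order."""
    root = ET.fromstring(config_xml)
    section = root.find("user_directories")
    if section is None:
        return []
    return [child.tag for child in section]


order = user_directory_order(CONFIG)
print(order)
# Flag a config that has no replicated storage at all.
print("replicated" in order)
```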

The correct PVC recovery sequence is:

  1. Delete the PVC (or the PVC and the STS)
  2. Run a reconcile by adding a taskID to the CHI, for instance

It looks like, since you deleted the PVC and the pod, the recovery was handled by Kubernetes (the STS), and the operator did not even know that the PVC had been re-created. So make sure you delete the STS as well. Also consider using operator-managed persistence:

spec:
  defaults:
    storageManagement:
      provisioner: Operator
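
The recovery sequence described in this comment can be sketched as a list of kubectl commands. The snippet below only builds and prints them; it runs nothing. The object names come from this thread. Deleting the STS before the PVC, the task ID value click-reconcile-2, and triggering the reconcile through a merge patch on spec.taskID are assumptions for illustration. Whether the PV also has to be deleted by hand depends on its reclaim policy.

```python
import json
import shlex
from typing import List


def recovery_commands(
    namespace: str, chi: str, statefulset: str, pvc: str, task_id: str
) -> List[List[str]]:
    """Build (but do not run) kubectl commands for re-initialising one replica."""
    # A merge patch that sets spec.taskID on the CHI (assumed to trigger a reconcile).
    patch = json.dumps({"spec": {"taskID": task_id}})
    return [
        # 1. Remove the StatefulSet so Kubernetes does not re-create the PVC on its own.
        ["kubectl", "-n", namespace, "delete", "statefulset", statefulset],
        # 2. Remove the PVC of the failed replica.
        ["kubectl", "-n", namespace, "delete", "pvc", pvc],
        # 3. Set a task ID on the CHI so the operator runs a reconcile.
        ["kubectl", "-n", namespace, "patch", "chi", chi, "--type", "merge", "-p", patch],
    ]


commands = recovery_commands(
    namespace="gidplatform-dev",
    chi="gid",
    statefulset="chi-gid-gid-0-0",
    pvc="default-volume-claim-chi-gid-gid-0-0-0",
    task_id="click-reconcile-2",  # hypothetical value
)
for argv in commands:
    print(shlex.join(argv))
```

Printing the commands instead of executing them leaves room to review each one before running it against a real cluster.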

@Hubbitus
Author

Hubbitus commented Sep 21, 2024

@alex-zaitsev, thank you very much for the answer. First I would like to recover my tables; then I will try to deal with the users.

Today I finally received the rights to view the operator pod in the kube-system namespace.
Just after deleting the PVC and the pod, I see errors in the clickhouse-operator pod:

I0921 22:13:23.555553       1 worker.go:275] processReconcilePod():gidplatform-dev/chi-gid-gid-0-0-0:Delete Pod. gidplatform-dev/chi-gid-gid-0-0-0
I0921 22:13:23.686901       1 worker.go:266] processReconcilePod():gidplatform-dev/chi-gid-gid-0-0-0:Add Pod. gidplatform-dev/chi-gid-gid-0-0-0
I0921 22:13:32.391425       1 chi.go:38] prepareListOfTemplates():gidplatform-dev/gid:Found applicable templates num: 0
I0921 22:13:32.391446       1 chi.go:82] ApplyCHITemplates():gidplatform-dev/gid:Applied templates num: 0
E0921 22:13:32.394908       1 connection.go:194] Exec():FAILED Exec(http://test_operator:***@chi-gid-gid-0-0.gidplatform-dev.svc.cluster.local:8123/) doRequest: transport failed to send a request to ClickHouse: dial tcp 10.42.9.84:8123: connect: connection refused for
SQL: SYSTEM DROP DNS CACHE
W0921 22:13:32.394938       1 retry.go:52] exec():chi-gid-gid-0-0.gidplatform-dev.svc.cluster.local:FAILED single try. No retries will be made for Applying sqls
I0921 22:13:32.414341       1 chi.go:38] prepareListOfTemplates():gidplatform-dev/gid:Found applicable templates num: 0
I0921 22:13:32.414363       1 chi.go:82] ApplyCHITemplates():gidplatform-dev/gid:Applied templates num: 0
I0921 22:13:32.415447       1 worker.go:387] gidplatform-dev/gid/b22b39fe-b7d8-40e3-a510-e169d1ffab18:updating endpoints for CHI-1 gid
I0921 22:13:32.450485       1 worker.go:389] gidplatform-dev/gid/b22b39fe-b7d8-40e3-a510-e169d1ffab18:IPs of the CHI-1 update endpoints gidplatform-dev/gid: len: 2 [10.42.9.84 10.42.5.92]
I0921 22:13:32.464127       1 chi.go:38] prepareListOfTemplates():gidplatform-dev/gid:Found applicable templates num: 0
I0921 22:13:32.464172       1 chi.go:82] ApplyCHITemplates():gidplatform-dev/gid:Applied templates num: 0
I0921 22:13:32.466517       1 worker.go:393] gidplatform-dev/gid/f2584b3a-a25a-4f22-8dfd-72f2a5166984:Update users IPS-1
I0921 22:13:32.481724       1 worker.go:1315] updateConfigMap():gidplatform-dev/gid/f2584b3a-a25a-4f22-8dfd-72f2a5166984:Update ConfigMap gidplatform-dev/chi-gid-common-usersd
I0921 22:13:42.168333       1 chi.go:38] prepareListOfTemplates():gidplatform-dev/gid:Found applicable templates num: 0
I0921 22:13:42.168355       1 chi.go:82] ApplyCHITemplates():gidplatform-dev/gid:Applied templates num: 0
I0921 22:13:42.190633       1 chi.go:38] prepareListOfTemplates():gidplatform-dev/gid:Found applicable templates num: 0
I0921 22:13:42.190651       1 chi.go:82] ApplyCHITemplates():gidplatform-dev/gid:Applied templates num: 0
I0921 22:13:42.191751       1 worker.go:387] gidplatform-dev/gid/ef8a0da7-09d3-4890-9a59-c760233aedb5:updating endpoints for CHI-1 gid
I0921 22:13:42.215106       1 worker.go:389] gidplatform-dev/gid/ef8a0da7-09d3-4890-9a59-c760233aedb5:IPs of the CHI-1 update endpoints gidplatform-dev/gid: len: 2 [10.42.9.84 10.42.5.92]
I0921 22:13:42.224452       1 chi.go:38] prepareListOfTemplates():gidplatform-dev/gid:Found applicable templates num: 0
I0921 22:13:42.224470       1 chi.go:82] ApplyCHITemplates():gidplatform-dev/gid:Applied templates num: 0
I0921 22:13:42.225507       1 worker.go:393] gidplatform-dev/gid/d9105257-3cfe-4596-b3bf-0f6cd6935843:Update users IPS-1
I0921 22:13:42.235027       1 worker.go:1315] updateConfigMap():gidplatform-dev/gid/d9105257-3cfe-4596-b3bf-0f6cd6935843:Update ConfigMap gidplatform-dev/chi-gid-common-usersd
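
Most lines in the log above are informational. A minimal sketch for pulling out only the warnings and errors follows. It assumes each line starts with a severity letter (I, W, E, or F) followed by the date and time, as the lines above do; the function name is hypothetical. Continuation lines such as "SQL: SYSTEM DROP DNS CACHE" do not match the pattern and are skipped.

```python
import re
from typing import List, Tuple

# Severity letter, MMDD, time with microseconds, a numeric id, "file:line]", message.
KLOG_RE = re.compile(
    r"^(?P<severity>[IWEF])\d{4} \d{2}:\d{2}:\d{2}\.\d{6}\s+\d+ "
    r"(?P<source>\S+:\d+)\] (?P<message>.*)$"
)


def warnings_and_errors(log_text: str) -> List[Tuple[str, str, str]]:
    """Return (severity, source, message) for every W, E, or F line."""
    found = []
    for line in log_text.splitlines():
        match = KLOG_RE.match(line)
        if match and match.group("severity") in "WEF":
            found.append(
                (match.group("severity"), match.group("source"), match.group("message"))
            )
    return found


# Four lines copied from the operator log above.
SAMPLE = """\
I0921 22:13:23.686901       1 worker.go:266] processReconcilePod():gidplatform-dev/chi-gid-gid-0-0-0:Add Pod. gidplatform-dev/chi-gid-gid-0-0-0
E0921 22:13:32.394908       1 connection.go:194] Exec():FAILED Exec(http://test_operator:***@chi-gid-gid-0-0.gidplatform-dev.svc.cluster.local:8123/) doRequest: transport failed to send a request to ClickHouse: dial tcp 10.42.9.84:8123: connect: connection refused for
SQL: SYSTEM DROP DNS CACHE
W0921 22:13:32.394938       1 retry.go:52] exec():chi-gid-gid-0-0.gidplatform-dev.svc.cluster.local:FAILED single try. No retries will be made for Applying sqls
"""

for severity, source, message in warnings_and_errors(SAMPLE):
    print(severity, source, message[:60])
```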

@Hubbitus
Author

Hubbitus commented Sep 29, 2024

As discussed, I have tried to reconcile the cluster by providing:

spec:
  taskID: "click-reconcile-1"

Indeed, that appears to trigger a reconcile. Logs of the operator pod:

kubectl -n kube-system logs --selector=app=clickhouse-operator --container=clickhouse-operator --tail=1000
I0929 11:54:59.076600       1 worker.go:574] ActionPlan start---------------------------------------------:
Diff start -------------------------
modified spec items num: 1
diff item [0]:'.TaskID' = '"click-reconcile-1"'
Diff end -------------------------

ActionPlan end---------------------------------------------
I0929 11:54:59.076655       1 worker-chi-reconciler.go:89] reconcileCHI():gidplatform-dev/gid/click-reconcile-1:ActionPlan has actions - continue reconcile
I0929 11:54:59.125555       1 worker.go:663] markReconcileStart():gidplatform-dev/gid/click-reconcile-1:reconcile started, task id: click-reconcile-1
I0929 11:54:59.681288       1 worker.go:820] FOUND host: ns:gidplatform-dev|chi:gid|clu:gid|sha:0|rep:0|host:0-0
I0929 11:54:59.681436       1 worker.go:820] FOUND host: ns:gidplatform-dev|chi:gid|clu:gid|sha:0|rep:1|host:0-1
I0929 11:54:59.681607       1 worker.go:844] RemoteServersGeneratorOptions: exclude hosts: [], attributes: status: , add: true, remove: false, modify: false, found: false, exclude: true
I0929 11:54:59.859367       1 worker.go:1315] updateConfigMap():gidplatform-dev/gid/click-reconcile-1:Update ConfigMap gidplatform-dev/chi-gid-common-configd
I0929 11:55:00.648852       1 worker.go:1315] updateConfigMap():gidplatform-dev/gid/click-reconcile-1:Update ConfigMap gidplatform-dev/chi-gid-common-usersd
I0929 11:55:01.284151       1 service.go:86] CreateServiceCluster():gidplatform-dev/gid/click-reconcile-1:gidplatform-dev/cluster-gid-gid
I0929 11:55:01.294688       1 worker-chi-reconciler.go:819] PDB updated: gidplatform-dev/gid-gid
I0929 11:55:01.294746       1 worker-chi-reconciler.go:554] not found ReconcileShardsAndHostsOptionsCtxKey, use empty opts
I0929 11:55:01.294769       1 worker-chi-reconciler.go:568] starting first shard separately
I0929 11:55:01.294967       1 cluster.go:84] Run query on: chi-gid-gid-0-0.gidplatform-dev.svc.cluster.local of [chi-gid-gid-0-0.gidplatform-dev.svc.cluster.local]
I0929 11:55:01.305993       1 worker-chi-reconciler.go:349] getHostClickHouseVersion():Get ClickHouse version on host: 0-0 version: 24.2.1.2248
I0929 11:55:01.306072       1 worker-chi-reconciler.go:684] reconcileHost():Reconcile Host start. Host: 0-0 ClickHouse version running: 24.2.1.2248
I0929 11:55:01.897135       1 worker.go:1565] getObjectStatusFromMetas():gidplatform-dev/chi-gid-gid-0-0:cur and new objects are equal based on object version label. Update of the object is not required. Object: gidplatform-dev/chi-gid-gid-0-0
I0929 11:55:01.897345       1 worker.go:1001] worker.go:1001:excludeHost():start:exclude host start
I0929 11:55:02.047624       1 worker.go:159] shouldForceRestartHost():Host restart is not required. Host: 0-0
I0929 11:55:02.047656       1 worker.go:1170] shouldExcludeHost():Host is the same, would not be updated, no need to exclude. Host/shard/cluster: 0/0/gid
I0929 11:55:02.047669       1 worker.go:1005] worker.go:1002:excludeHost():end:exclude host end
I0929 11:55:02.047693       1 worker.go:1020] worker.go:1020:completeQueries():start:complete queries start
I0929 11:55:02.047730       1 worker.go:1220] shouldWaitQueries():Will wait for queries to complete according to CHOp config 'reconcile.host.wait.queries' setting. Host is not yet in the cluster. Host/shard/cluster: 0/0/gid
I0929 11:55:02.047779       1 cluster.go:84] Run query on: chi-gid-gid-0-0.gidplatform-dev.svc.cluster.local of [chi-gid-gid-0-0.gidplatform-dev.svc.cluster.local]
I0929 11:55:02.087023       1 poller.go:138] Poll():gidplatform-dev/0-0:OK gidplatform-dev/0-0
I0929 11:55:02.087048       1 worker.go:1024] worker.go:1021:completeQueries():end:complete queries end
I0929 11:55:02.248789       1 worker.go:1315] updateConfigMap():gidplatform-dev/gid/click-reconcile-1:Update ConfigMap gidplatform-dev/chi-gid-deploy-confd-gid-0-0
I0929 11:55:02.884163       1 worker-chi-reconciler.go:716] reconcileHost():Reconcile PVCs and check possible data loss for host: 0-0
I0929 11:55:03.458635       1 worker-chi-reconciler.go:406] worker-chi-reconciler.go:406:reconcileHostStatefulSet():start:reconcile StatefulSet start
I0929 11:55:03.458764       1 cluster.go:84] Run query on: chi-gid-gid-0-0.gidplatform-dev.svc.cluster.local of [chi-gid-gid-0-0.gidplatform-dev.svc.cluster.local]
I0929 11:55:03.465752       1 worker-chi-reconciler.go:349] getHostClickHouseVersion():Get ClickHouse version on host: 0-0 version: 24.2.1.2248
I0929 11:55:03.472628       1 worker-chi-reconciler.go:412] reconcileHostStatefulSet():Reconcile host: 0-0. ClickHouse version: 24.2.1.2248
I0929 11:55:03.651853       1 worker.go:159] shouldForceRestartHost():Host restart is not required. Host: 0-0
I0929 11:55:03.651943       1 worker-chi-reconciler.go:425] reconcileHostStatefulSet():Reconcile host: 0-0. Reconcile StatefulSet
I0929 11:55:03.655273       1 worker.go:1565] getObjectStatusFromMetas():gidplatform-dev/chi-gid-gid-0-0:cur and new objects are equal based on object version label. Update of the object is not required. Object: gidplatform-dev/chi-gid-gid-0-0
I0929 11:55:04.097497       1 worker-chi-reconciler.go:445] worker-chi-reconciler.go:407:reconcileHostStatefulSet():end:reconcile StatefulSet end
I0929 11:55:04.654273       1 worker-chi-reconciler.go:900] reconcileService():gidplatform-dev/gid/click-reconcile-1:Service found: gidplatform-dev/chi-gid-gid-0-0. Will try to update
I0929 11:55:04.853666       1 worker.go:1459] updateService():gidplatform-dev/gid/click-reconcile-1:Update Service success: gidplatform-dev/chi-gid-gid-0-0
I0929 11:55:05.487521       1 worker-chi-reconciler.go:922] reconcileService():gidplatform-dev/gid/click-reconcile-1:Service reconcile successful: gidplatform-dev/chi-gid-gid-0-0
I0929 11:55:05.487592       1 worker-chi-reconciler.go:461] reconcileHostService():DONE Reconcile service of the host: 0-0
I0929 11:55:05.487682       1 cluster.go:84] Run query on: chi-gid-gid-0-0.gidplatform-dev.svc.cluster.local of [chi-gid-gid-0-0.gidplatform-dev.svc.cluster.local]
I0929 11:55:05.495665       1 worker-chi-reconciler.go:349] getHostClickHouseVersion():Get ClickHouse version on host: 0-0 version: 24.2.1.2248
I0929 11:55:05.495739       1 poller.go:138] Poll():gidplatform-dev/0-0:OK gidplatform-dev/0-0
I0929 11:55:05.495824       1 worker-chi-reconciler.go:753] reconcileHost():Check host for ClickHouse availability before migrating tables. Host: 0-0 ClickHouse version running: 24.2.1.2248
I0929 11:55:05.495957       1 worker.go:908] migrateTables():No need to add tables on host 0 to shard 0 in cluster gid
I0929 11:55:05.496005       1 worker.go:1057] includeHost():Include into cluster host 0 shard 0 cluster gid
I0929 11:55:05.496048       1 worker.go:1124] includeHostIntoClickHouseCluster():going to include host 0 shard 0 cluster gid
I0929 11:55:05.496070       1 worker.go:844] RemoteServersGeneratorOptions: exclude hosts: [], attributes: status: , add: true, remove: false, modify: false, found: false, exclude: true
I0929 11:55:05.648655       1 worker.go:1315] updateConfigMap():gidplatform-dev/gid/click-reconcile-1:Update ConfigMap gidplatform-dev/chi-gid-common-configd
I0929 11:55:06.449496       1 cluster.go:84] Run query on: chi-gid-gid-0-0.gidplatform-dev.svc.cluster.local of [chi-gid-gid-0-0.gidplatform-dev.svc.cluster.local]
I0929 11:55:06.463606       1 worker-chi-reconciler.go:349] getHostClickHouseVersion():Get ClickHouse version on host: 0-0 version: 24.2.1.2248
I0929 11:55:06.463648       1 poller.go:138] Poll():gidplatform-dev/0-0:OK gidplatform-dev/0-0
I0929 11:55:06.463703       1 worker-chi-reconciler.go:776] reconcileHost():Reconcile Host completed. Host: 0-0 ClickHouse version running: 24.2.1.2248
I0929 11:55:07.086061       1 worker-chi-reconciler.go:797] reconcileHost():[now: 2024-09-29 11:55:07.085979541 +0000 UTC m=+530555.182385088] ProgressHostsCompleted: 1 of 2
I0929 11:55:08.084486       1 worker-chi-reconciler.go:900] reconcileService():gidplatform-dev/gid/click-reconcile-1:Service found: gidplatform-dev/clickhouse-gid. Will try to update
I0929 11:55:08.253098       1 worker.go:1459] updateService():gidplatform-dev/gid/click-reconcile-1:Update Service success: gidplatform-dev/clickhouse-gid
I0929 11:55:08.883102       1 worker-chi-reconciler.go:922] reconcileService():gidplatform-dev/gid/click-reconcile-1:Service reconcile successful: gidplatform-dev/clickhouse-gid
I0929 11:55:08.883295       1 cluster.go:84] Run query on: chi-gid-gid-0-1.gidplatform-dev.svc.cluster.local of [chi-gid-gid-0-1.gidplatform-dev.svc.cluster.local]
I0929 11:55:08.889935       1 worker-chi-reconciler.go:349] getHostClickHouseVersion():Get ClickHouse version on host: 0-1 version: 24.2.1.2248
I0929 11:55:08.890015       1 worker-chi-reconciler.go:684] reconcileHost():Reconcile Host start. Host: 0-1 ClickHouse version running: 24.2.1.2248
I0929 11:55:09.524136       1 worker.go:1572] getObjectStatusFromMetas():gidplatform-dev/chi-gid-gid-0-1:cur and new objects ARE DIFFERENT based on object version label: Update of the object is required. Object: gidplatform-dev/chi-gid-gid-0-1
I0929 11:55:09.524219       1 worker.go:1001] worker.go:1001:excludeHost():start:exclude host start
I0929 11:55:09.647870       1 worker.go:159] shouldForceRestartHost():Host restart is not required. Host: 0-1
I0929 11:55:09.647935       1 worker.go:1177] shouldExcludeHost():Host should be excluded. Host/shard/cluster: 1/0/gid
I0929 11:55:09.647982       1 worker.go:1010] excludeHost():Exclude from cluster host 1 shard 0 cluster gid
I0929 11:55:10.090456       1 chi.go:38] prepareListOfTemplates():gidplatform-dev/gid/click-reconcile-1:Found applicable templates num: 0
I0929 11:55:10.090524       1 chi.go:82] ApplyCHITemplates():gidplatform-dev/gid/click-reconcile-1:Applied templates num: 0
I0929 11:55:10.132801       1 chi.go:38] prepareListOfTemplates():gidplatform-dev/gid/click-reconcile-1:Found applicable templates num: 0
I0929 11:55:10.132824       1 chi.go:82] ApplyCHITemplates():gidplatform-dev/gid/click-reconcile-1:Applied templates num: 0
I0929 11:55:10.134283       1 worker.go:387] gidplatform-dev/gid/click-reconcile-1:updating endpoints for CHI-1 gid
I0929 11:55:10.256392       1 worker.go:1099] excludeHostFromClickHouseCluster():going to exclude host 1 shard 0 cluster gid
I0929 11:55:10.256420       1 worker.go:844] RemoteServersGeneratorOptions: exclude hosts: [], attributes: status: , add: true, remove: false, modify: false, found: false, exclude: true
I0929 11:55:10.651725       1 worker.go:1315] updateConfigMap():gidplatform-dev/gid/click-reconcile-1:Update ConfigMap gidplatform-dev/chi-gid-common-configd
I0929 11:55:10.847886       1 worker.go:389] gidplatform-dev/gid/click-reconcile-1:IPs of the CHI-1 update endpoints gidplatform-dev/gid: len: 2 [10.42.9.86 10.42.5.48]
I0929 11:55:10.859857       1 chi.go:38] prepareListOfTemplates():gidplatform-dev/gid/click-reconcile-1:Found applicable templates num: 0
I0929 11:55:10.859903       1 chi.go:82] ApplyCHITemplates():gidplatform-dev/gid/click-reconcile-1:Applied templates num: 0
I0929 11:55:10.862438       1 worker.go:393] gidplatform-dev/gid/click-reconcile-1:Update users IPS-1
I0929 11:55:11.249384       1 worker.go:1315] updateConfigMap():gidplatform-dev/gid/click-reconcile-1:Update ConfigMap gidplatform-dev/chi-gid-common-usersd
I0929 11:55:11.887237       1 worker.go:1203] shouldWaitExcludeHost():wait to exclude host fallback to operator's settings. host 1 shard 0 cluster gid
I0929 11:55:11.896425       1 schemer.go:134] IsHostInCluster():The host 0-1 is inside the cluster
I0929 11:55:16.902829       1 schemer.go:134] IsHostInCluster():The host 0-1 is inside the cluster
I0929 11:55:21.913913       1 schemer.go:134] IsHostInCluster():The host 0-1 is inside the cluster
I0929 11:55:26.921150       1 schemer.go:134] IsHostInCluster():The host 0-1 is inside the cluster
I0929 11:55:31.928701       1 schemer.go:134] IsHostInCluster():The host 0-1 is inside the cluster
I0929 11:55:36.936718       1 schemer.go:134] IsHostInCluster():The host 0-1 is inside the cluster
I0929 11:55:41.945459       1 schemer.go:134] IsHostInCluster():The host 0-1 is inside the cluster
I0929 11:55:46.954333       1 schemer.go:134] IsHostInCluster():The host 0-1 is inside the cluster
I0929 11:55:51.962841       1 schemer.go:134] IsHostInCluster():The host 0-1 is inside the cluster
I0929 11:55:56.971440       1 schemer.go:134] IsHostInCluster():The host 0-1 is inside the cluster
I0929 11:56:01.978083       1 schemer.go:134] IsHostInCluster():The host 0-1 is inside the cluster
I0929 11:56:06.984911       1 schemer.go:134] IsHostInCluster():The host 0-1 is inside the cluster
I0929 11:56:11.996098       1 schemer.go:134] IsHostInCluster():The host 0-1 is inside the cluster
I0929 11:56:11.996147       1 poller.go:170] Poll():gidplatform-dev/0-1:WAIT:gidplatform-dev/0-1
I0929 11:56:17.002241       1 schemer.go:134] IsHostInCluster():The host 0-1 is inside the cluster
I0929 11:56:17.002279       1 poller.go:170] Poll():gidplatform-dev/0-1:WAIT:gidplatform-dev/0-1
I0929 11:56:22.008717       1 schemer.go:134] IsHostInCluster():The host 0-1 is inside the cluster
I0929 11:56:22.008762       1 poller.go:170] Poll():gidplatform-dev/0-1:WAIT:gidplatform-dev/0-1
I0929 11:56:27.015747       1 schemer.go:134] IsHostInCluster():The host 0-1 is inside the cluster
I0929 11:56:27.015810       1 poller.go:170] Poll():gidplatform-dev/0-1:WAIT:gidplatform-dev/0-1
I0929 11:56:32.024632       1 schemer.go:134] IsHostInCluster():The host 0-1 is inside the cluster
I0929 11:56:32.024713       1 poller.go:170] Poll():gidplatform-dev/0-1:WAIT:gidplatform-dev/0-1
I0929 11:56:37.037036       1 schemer.go:137] IsHostInCluster():The host 0-1 is outside of the cluster
I0929 11:56:37.037107       1 poller.go:138] Poll():gidplatform-dev/0-1:OK gidplatform-dev/0-1
I0929 11:56:37.037132       1 worker.go:1015] worker.go:1002:excludeHost():end:exclude host end
I0929 11:56:37.037189       1 worker.go:1020] worker.go:1020:completeQueries():start:complete queries start
I0929 11:56:37.037281       1 worker.go:1220] shouldWaitQueries():Will wait for queries to complete according to CHOp config 'reconcile.host.wait.queries' setting. Host is not yet in the cluster. Host/shard/cluster: 1/0/gid
I0929 11:56:37.037353       1 cluster.go:84] Run query on: chi-gid-gid-0-1.gidplatform-dev.svc.cluster.local of [chi-gid-gid-0-1.gidplatform-dev.svc.cluster.local]
I0929 11:56:37.041809       1 poller.go:138] Poll():gidplatform-dev/0-1:OK gidplatform-dev/0-1
I0929 11:56:37.041827       1 worker.go:1024] worker.go:1021:completeQueries():end:complete queries end
I0929 11:56:37.048773       1 worker.go:1315] updateConfigMap():gidplatform-dev/gid/click-reconcile-1:Update ConfigMap gidplatform-dev/chi-gid-deploy-confd-gid-0-1
I0929 11:56:37.098510       1 worker-chi-reconciler.go:716] reconcileHost():Reconcile PVCs and check possible data loss for host: 0-1
I0929 11:56:37.119348       1 worker-chi-reconciler.go:406] worker-chi-reconciler.go:406:reconcileHostStatefulSet():start:reconcile StatefulSet start
I0929 11:56:37.119427       1 cluster.go:84] Run query on: chi-gid-gid-0-1.gidplatform-dev.svc.cluster.local of [chi-gid-gid-0-1.gidplatform-dev.svc.cluster.local]
I0929 11:56:37.123489       1 worker-chi-reconciler.go:349] getHostClickHouseVersion():Get ClickHouse version on host: 0-1 version: 24.2.1.2248
I0929 11:56:37.127378       1 worker-chi-reconciler.go:412] reconcileHostStatefulSet():Reconcile host: 0-1. ClickHouse version: 24.2.1.2248
I0929 11:56:37.131620       1 worker.go:159] shouldForceRestartHost():Host restart is not required. Host: 0-1
I0929 11:56:37.131650       1 worker-chi-reconciler.go:425] reconcileHostStatefulSet():Reconcile host: 0-1. Reconcile StatefulSet
I0929 11:56:37.133351       1 worker.go:1565] getObjectStatusFromMetas():gidplatform-dev/chi-gid-gid-0-1:cur and new objects are equal based on object version label. Update of the object is not required. Object: gidplatform-dev/chi-gid-gid-0-1
I0929 11:56:37.168247       1 worker-chi-reconciler.go:445] worker-chi-reconciler.go:407:reconcileHostStatefulSet():end:reconcile StatefulSet end
I0929 11:56:37.653395       1 worker-chi-reconciler.go:900] reconcileService():gidplatform-dev/gid/click-reconcile-1:Service found: gidplatform-dev/chi-gid-gid-0-1. Will try to update
I0929 11:56:37.849923       1 worker.go:1459] updateService():gidplatform-dev/gid/click-reconcile-1:Update Service success: gidplatform-dev/chi-gid-gid-0-1
I0929 11:56:38.491295       1 worker-chi-reconciler.go:922] reconcileService():gidplatform-dev/gid/click-reconcile-1:Service reconcile successful: gidplatform-dev/chi-gid-gid-0-1
I0929 11:56:38.491349       1 worker-chi-reconciler.go:461] reconcileHostService():DONE Reconcile service of the host: 0-1
I0929 11:56:38.491418       1 cluster.go:84] Run query on: chi-gid-gid-0-1.gidplatform-dev.svc.cluster.local of [chi-gid-gid-0-1.gidplatform-dev.svc.cluster.local]
I0929 11:56:38.495556       1 worker-chi-reconciler.go:349] getHostClickHouseVersion():Get ClickHouse version on host: 0-1 version: 24.2.1.2248
I0929 11:56:38.495593       1 poller.go:138] Poll():gidplatform-dev/0-1:OK gidplatform-dev/0-1
I0929 11:56:38.495629       1 worker-chi-reconciler.go:753] reconcileHost():Check host for ClickHouse availability before migrating tables. Host: 0-1 ClickHouse version running: 24.2.1.2248
I0929 11:56:38.495686       1 worker.go:908] migrateTables():No need to add tables on host 1 to shard 0 in cluster gid
I0929 11:56:38.495706       1 worker.go:1057] includeHost():Include into cluster host 1 shard 0 cluster gid
I0929 11:56:38.495726       1 worker.go:1124] includeHostIntoClickHouseCluster():going to include host 1 shard 0 cluster gid
I0929 11:56:38.495737       1 worker.go:844] RemoteServersGeneratorOptions: exclude hosts: [], attributes: status: , add: true, remove: false, modify: false, found: false, exclude: true
I0929 11:56:38.654056       1 worker.go:1315] updateConfigMap():gidplatform-dev/gid/click-reconcile-1:Update ConfigMap gidplatform-dev/chi-gid-common-configd
I0929 11:56:39.689499       1 chi.go:38] prepareListOfTemplates():gidplatform-dev/gid/click-reconcile-1:Found applicable templates num: 0
I0929 11:56:39.689543       1 chi.go:82] ApplyCHITemplates():gidplatform-dev/gid/click-reconcile-1:Applied templates num: 0
I0929 11:56:39.711932       1 chi.go:38] prepareListOfTemplates():gidplatform-dev/gid/click-reconcile-1:Found applicable templates num: 0
I0929 11:56:39.711952       1 chi.go:82] ApplyCHITemplates():gidplatform-dev/gid/click-reconcile-1:Applied templates num: 0
I0929 11:56:39.713061       1 worker.go:387] gidplatform-dev/gid/click-reconcile-1:updating endpoints for CHI-1 gid
I0929 11:56:39.851639       1 cluster.go:84] Run query on: chi-gid-gid-0-1.gidplatform-dev.svc.cluster.local of [chi-gid-gid-0-1.gidplatform-dev.svc.cluster.local]
I0929 11:56:39.853763       1 worker-chi-reconciler.go:349] getHostClickHouseVersion():Get ClickHouse version on host: 0-1 version: 24.2.1.2248
I0929 11:56:39.853841       1 poller.go:138] Poll():gidplatform-dev/0-1:OK gidplatform-dev/0-1
I0929 11:56:39.853942       1 worker-chi-reconciler.go:776] reconcileHost():Reconcile Host completed. Host: 0-1 ClickHouse version running: 24.2.1.2248
I0929 11:56:40.449305       1 worker.go:389] gidplatform-dev/gid/click-reconcile-1:IPs of the CHI-1 update endpoints gidplatform-dev/gid: len: 2 [10.42.9.86 10.42.5.48]
I0929 11:56:40.460088       1 chi.go:38] prepareListOfTemplates():gidplatform-dev/gid/click-reconcile-1:Found applicable templates num: 0
I0929 11:56:40.460129       1 chi.go:82] ApplyCHITemplates():gidplatform-dev/gid/click-reconcile-1:Applied templates num: 0
I0929 11:56:40.462470       1 worker.go:393] gidplatform-dev/gid/click-reconcile-1:Update users IPS-1
I0929 11:56:40.849312       1 worker.go:1315] updateConfigMap():gidplatform-dev/gid/click-reconcile-1:Update ConfigMap gidplatform-dev/chi-gid-common-usersd
I0929 11:56:41.078096       1 worker-chi-reconciler.go:797] reconcileHost():[now: 2024-09-29 11:56:41.078003076 +0000 UTC m=+530649.174408624] ProgressHostsCompleted: 2 of 2
I0929 11:56:43.083018       1 worker-chi-reconciler.go:581] Starting rest of shards on workers: 1
I0929 11:56:43.249032       1 worker.go:1315] updateConfigMap():gidplatform-dev/gid/click-reconcile-1:Update ConfigMap gidplatform-dev/chi-gid-common-configd
I0929 11:56:43.885956       1 worker-deleter.go:43] clean():gidplatform-dev/gid/click-reconcile-1:remove items scheduled for deletion
I0929 11:56:44.481307       1 worker-deleter.go:46] clean():gidplatform-dev/gid/click-reconcile-1:List of objects which have failed to reconcile:
I0929 11:56:44.481378       1 worker-deleter.go:47] clean():gidplatform-dev/gid/click-reconcile-1:List of successfully reconciled objects:
PVC: gidplatform-dev/default-volume-claim-chi-gid-gid-0-0-0
PVC: gidplatform-dev/default-volume-claim-chi-gid-gid-0-1-0
StatefulSet: gidplatform-dev/chi-gid-gid-0-1
StatefulSet: gidplatform-dev/chi-gid-gid-0-0
Service: gidplatform-dev/chi-gid-gid-0-0
Service: gidplatform-dev/clickhouse-gid
Service: gidplatform-dev/chi-gid-gid-0-1
ConfigMap: gidplatform-dev/chi-gid-common-configd
ConfigMap: gidplatform-dev/chi-gid-common-usersd
ConfigMap: gidplatform-dev/chi-gid-deploy-confd-gid-0-0
ConfigMap: gidplatform-dev/chi-gid-deploy-confd-gid-0-1
PDB: gidplatform-dev/gid-gid
I0929 11:56:45.252969       1 worker-deleter.go:50] clean():gidplatform-dev/gid/click-reconcile-1:Existing objects:
PVC: gidplatform-dev/default-volume-claim-chi-gid-gid-0-0-0
PVC: gidplatform-dev/default-volume-claim-chi-gid-gid-0-1-0
PDB: gidplatform-dev/gid-gid
StatefulSet: gidplatform-dev/chi-gid-gid-0-0
StatefulSet: gidplatform-dev/chi-gid-gid-0-1
ConfigMap: gidplatform-dev/chi-gid-common-configd
ConfigMap: gidplatform-dev/chi-gid-common-usersd
ConfigMap: gidplatform-dev/chi-gid-deploy-confd-gid-0-0
ConfigMap: gidplatform-dev/chi-gid-deploy-confd-gid-0-1
Service: gidplatform-dev/chi-gid-gid-0-0
Service: gidplatform-dev/chi-gid-gid-0-1
Service: gidplatform-dev/clickhouse-gid
I0929 11:56:45.253123       1 worker-deleter.go:52] clean():gidplatform-dev/gid/click-reconcile-1:Non-reconciled objects:
I0929 11:56:45.253195       1 worker-deleter.go:68] worker-deleter.go:68:dropReplicas():start:gidplatform-dev/gid/click-reconcile-1:drop replicas based on AP
I0929 11:56:45.253260       1 worker-deleter.go:80] worker-deleter.go:80:dropReplicas():end:gidplatform-dev/gid/click-reconcile-1:processed replicas: 0
I0929 11:56:45.253308       1 worker.go:640] addCHIToMonitoring():gidplatform-dev/gid/click-reconcile-1:add CHI to monitoring
I0929 11:56:45.885652       1 worker.go:595] worker.go:595:waitForIPAddresses():start:gidplatform-dev/gid/click-reconcile-1:wait for IP addresses to be assigned to all pods
I0929 11:56:45.893820       1 worker.go:600] gidplatform-dev/gid/click-reconcile-1:all IP addresses are in place
I0929 11:56:45.893858       1 worker.go:673] worker.go:673:finalizeReconcileAndMarkCompleted():start:gidplatform-dev/gid/click-reconcile-1:finalize reconcile
I0929 11:56:45.904253       1 chi.go:38] prepareListOfTemplates():gidplatform-dev/gid/click-reconcile-1:Found applicable templates num: 0
I0929 11:56:45.904335       1 chi.go:82] ApplyCHITemplates():gidplatform-dev/gid/click-reconcile-1:Applied templates num: 0
I0929 11:56:45.904391       1 controller.go:617] OK update watch (gidplatform-dev/gid): {"namespace":"gidplatform-dev","name":"gid","labels":{"argocd.argoproj.io/instance":"bi-clickhouse-dev","k8slens-edit-resource-version":"v1"},"annotations":{},"clusters":[{"name":"gid","hosts":[{"name":"0-0","hostname":"chi-gid-gid-0-0.gidplatform-dev.svc.cluster.local","tcpPort":9000,"httpPort":8123},{"name":"0-1","hostname":"chi-gid-gid-0-1.gidplatform-dev.svc.cluster.local","tcpPort":9000,"httpPort":8123}]}]}
I0929 11:56:45.906676       1 worker.go:677] gidplatform-dev/gid/click-reconcile-1:updating endpoints for CHI-2 gid
I0929 11:56:46.249853       1 worker.go:679] gidplatform-dev/gid/click-reconcile-1:IPs of the CHI-2 finalize reconcile gidplatform-dev/gid: len: 2 [10.42.9.86 10.42.5.48]
I0929 11:56:46.261380       1 chi.go:38] prepareListOfTemplates():gidplatform-dev/gid/click-reconcile-1:Found applicable templates num: 0
I0929 11:56:46.261442       1 chi.go:82] ApplyCHITemplates():gidplatform-dev/gid/click-reconcile-1:Applied templates num: 0
I0929 11:56:46.263792       1 worker.go:683] gidplatform-dev/gid/click-reconcile-1:Update users IPS-2
I0929 11:56:46.449574       1 worker.go:1315] updateConfigMap():gidplatform-dev/gid/click-reconcile-1:Update ConfigMap gidplatform-dev/chi-gid-common-usersd
I0929 11:56:47.495545       1 worker.go:707] finalizeReconcileAndMarkCompleted():gidplatform-dev/gid/click-reconcile-1:reconcile completed successfully, task id: click-reconcile-1
I0929 11:56:48.077981       1 worker-chi-reconciler.go:134] worker-chi-reconciler.go:60:reconcileCHI():end:gidplatform-dev/gid/click-reconcile-1
I0929 11:56:48.078036       1 worker.go:469] worker.go:432:updateCHI():end:gidplatform-dev/gid/click-reconcile-1

I am not sure what is going wrong, but on host chi-gid-gid-0-0-0 not even the databases were copied. Only the single default database is still present.
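
In a log of this length, the lines that matter for schema copying are the migrateTables() decisions. The sketch below extracts the hosts for which the operator logged "No need to add tables". It matches only that one message as it appears in the log above; other migrateTables() messages are not handled, because their wording is not shown in this thread. The function name is hypothetical.

```python
import re
from typing import List, Tuple

# Matches only the message wording that appears in the operator log above.
SKIP_RE = re.compile(
    r"migrateTables\(\):No need to add tables on host (\d+) to shard (\d+) in cluster (\S+)"
)


def hosts_skipped_by_migrate_tables(log_text: str) -> List[Tuple[int, int, str]]:
    """Return (host, shard, cluster) for each host where table migration was skipped."""
    return [
        (int(host), int(shard), cluster)
        for host, shard, cluster in SKIP_RE.findall(log_text)
    ]


# Two lines copied from the operator log above.
LOG = """\
I0929 11:55:05.495957       1 worker.go:908] migrateTables():No need to add tables on host 0 to shard 0 in cluster gid
I0929 11:56:38.495686       1 worker.go:908] migrateTables():No need to add tables on host 1 to shard 0 in cluster gid
"""

print(hosts_skipped_by_migrate_tables(LOG))
```

In this run both hosts are reported as skipped, including host 0, the replica that was re-created.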

@Hubbitus
Author

@alex-zaitsev, could you please take a look at it?

@Slach
Collaborator

Slach commented Oct 14, 2024

I0929 11:55:05.495957 [worker.go:908] migrateTables():No need to add tables on host 0 to shard 0 in cluster gid

I0929 11:56:38.495686 [worker.go:908] migrateTables():No need to add tables on host 1 to shard 0 in cluster gid

@Hubbitus, does your cluster have 2 shards with only 1 replica in each shard?

Could you share the output of:
kubectl get chi -n gidplatform-dev gid -o yaml
without sensitive credentials?

@Hubbitus
Author

@Slach, thanks for the response.
We do not use sharding yet.

Output of kubectl get chi -n gidplatform-dev gid -o yaml:
chi.yaml.gz
