bug: Log Index Error in Meta Pod on Restart #16417
Hello @inviscid! I'm here to help you with any bugs, questions, or contributions you have for the project. Let's tackle this issue together! To resolve the "Log Index Error in Meta Pod on Restart" issue in a 3-pod Kubernetes cluster running Databend version nightly-630, start by checking the following:
Here is an example of how the log entries should look:

```json
["header",{"DataHeader":{"key":"header","value":{"version":"V003","upgrading":null}}}]
["raft_state",{"RaftStateKV":{"key":"Id","value":{"NodeId":1}}}]
["raft_state",{"RaftStateKV":{"key":"HardState","value":{"HardState":{"leader_id":{"term":1,"node_id":1},"committed":false}}}}]
["raft_log",{"Logs":{"key":0,"value":{"log_id":{"leader_id":{"term":0,"node_id":0},"index":0},"payload":{"Membership":{"configs":[[1]],"nodes":{"1":{}}}}}}}]
["raft_log",{"Logs":{"key":1,"value":{"log_id":{"leader_id":{"term":1,"node_id":0},"index":1},"payload":"Blank"}}}]
["raft_log",{"Logs":{"key":2,"value":{"log_id":{"leader_id":{"term":1,"node_id":0},"index":2},"payload":{"Normal":{"txid":null,"time_ms":1667290820099,"cmd":{"AddNode":{"node_id":1,"node":{"name":"1","endpoint":{"addr":"localhost","port":28103},"grpc_api_advertise_address":"0.0.0.0:9191"},"overriding":false}}}}}}}]
["raft_log",{"Logs":{"key":3,"value":{"log_id":{"leader_id":{"term":1,"node_id":0},"index":3},"payload":{"Normal":{"txid":null,"time_ms":1667290820429,"cmd":{"AddNode":{"node_id":2,"node":{"name":"2","endpoint":{"addr":"localhost","port":28203},"grpc_api_advertise_address":"0.0.0.0:28202"},"overriding":false}}}}}}}]
["raft_log",{"Logs":{"key":4,"value":{"log_id":{"leader_id":{"term":1,"node_id":0},"index":4},"payload":{"Membership":{"configs":[[1],[1,2]],"nodes":{"1":{},"2":{}}}}}}}]
["raft_log",{"Logs":{"key":5,"value":{"log_id":{"leader_id":{"term":1,"node_id":0},"index":5},"payload":{"Membership":{"configs":[[1,2]],"nodes":{"1":{},"2":{}}}}}}}]
["raft_log",{"Logs":{"key":6,"value":{"log_id":{"leader_id":{"term":1,"node_id":0},"index":6},"payload":{"Normal":{"txid":null,"time_ms":1667290821018,"cmd":{"AddNode":{"node_id":3,"node":{"name":"3","endpoint":{"addr":"localhost","port":28303},"grpc_api_advertise_address":"0.0.0.0:28302"},"overriding":false}}}}}}}]
["raft_log",{"Logs":{"key":7,"value":{"log_id":{"leader_id":{"term":1,"node_id":0},"index":7},"payload":{"Membership":{"configs":[[1,2],[1,2,3]],"nodes":{"1":{},"2":{},"3":{}}}}}}}]
["raft_log",{"Logs":{"key":8,"value":{"log_id":{"leader_id":{"term":1,"node_id":0},"index":8},"payload":{"Membership":{"configs":[[1,2,3]],"nodes":{"1":{},"2":{},"3":{}}}}}}}]
["raft_log",{"Logs":{"key":9,"value":{"log_id":{"leader_id":{"term":1,"node_id":0},"index":9},"payload":{"Normal":{"txid":null,"time_ms":1667290824580,"cmd":{"UpsertKV":{"key":"__fd_id_gen/database_id","seq":"Any","value":{"Update":[]},"value_meta":null}}}}}}}]
["raft_log",{"Logs":{"key":10,"value":{"log_id":{"leader_id":{"term":1,"node_id":0},"index":10},"payload":{"Normal":{"txid":null,"time_ms":1667290824586,"cmd":{"Transaction":{"condition":[{"key":"__fd_database/test_tenant/default","expected":0,"target":{"Seq":0}},{"key":"__fd_database_id_to_name/1","expected":0,"target":{"Seq":0}},{"key":"__fd_db_id_list/test_tenant/default","expected":0,"target":{"Seq":0}}],"if_then":[{"request":{"Put":{"key":"__fd_database/test_tenant/default","value":[49],"prev_value":true,"expire_at":null}}},{"request":{"Put":{"key":"__fd_database_by_id/1","value":[162,1,30,50,48,50,50,45,49,49,45,48,49,32,48,56,58,50,48,58,50,52,46,53,55,48,53,56,54,32,85,84,67,170,1,30,50,48,50,50,45,49,49,45,48,49,32,48,56,58,50,48,58,50,52,46,53,55,48,53,56,54,32,85,84,67,160,6,19,168,6,1],"prev_value":true,"expire_at":null}}},{"request":{"Put":{"key":"__fd_db_id_list/test_tenant/default","value":[10,1,1,160,6,19,168,6,1],"prev_value":true,"expire_at":null}}},{"request":{"Put":{"key":"__fd_database_id_to_name/1","value":[10,11,116,101,115,116,95,116,101,110,97,110,116,18,7,100,101,102,97,117,108,116,160,6,19,168,6,1],"prev_value":true,"expire_at":null}}}],"else_then":[]}}}}}}}]
["raft_log",{"Logs":{"key":11,"value":{"log_id":{"leader_id":{"term":1,"node_id":0},"index":11},"payload":{"Normal":{"txid":null,"time_ms":1667290824603,"cmd":{"UpsertKV":{"key":"__fd_clusters/test_tenant/test_cluster/databend_query/RpuWndTf5JlgyJCpAAtQX6","seq":{"Exact":0},"value":{"Update":[123,34,105,100,34,58,34,82,112,117,87,110,100,84,102,53,74,108,103,121,74,67,112,65,65,116,81,88,54,34,44,34,99,112,117,95,110,117,109,115,34,58,48,44,34,118,101,114,115,105,111,110,34,58,48,44,34,102,108,105,103,104,116,95,97,100,100,114,101,115,115,34,58,34,49,50,55,46,48,46,48,46,49,58,57,48,57,48,34,44,34,98,105,110,97,114,121,95,118,101,114,115,105,111,110,34,58,34,118,48,46,56,46,57,53,45,110,105,103,104,116,108,121,45,97,52,49,56,55,100,51,40,114,117,115,116,45,49,46,54,54,46,48,45,110,105,103,104,116,108,121,45,50,48,50,50,45,49,49,45,48,49,84,48,56,58,49,55,58,51,54,46,55,48,49,55,54,53,90,41,34,125]},"value_meta":{"expire_at":1667290884}}}}}}}}]
["raft_log",{"Logs":{"key":12,"value":{"log_id":{"leader_id":{"term":1,"node_id":0},"index":12},"payload":{"Normal":{"txid":null,"time_ms":1667290826333,"cmd":{"UpsertKV":{"key":"__fd_clusters/test_tenant/test_cluster/databend_query/KMZ4VvqDFVExlZFThKDzZ1","seq":{"Exact":0},"value":{"Update":[123,34,105,100,34,58,34,75,77,90,52,86,118,113,68,70,86,69,120,108,90,70,84,104,75,68,122,90,49,34,44,34,99,112,117,95,110,117,109,115,34,58,48,44,34,118,101,114,115,105,111,110,34,58,48,44,34,102,108,105,103,104,116,95,97,100,100,114,101,115,115,34,58,34,49,50,55,46,48,46,48,46,49,58,57,48,57,50,34,44,34,98,105,110,97,114,121,95,118,101,114,115,105,111,110,34,58,34,118,48,46,56,46,57,53,45,110,105,103,104,116,108,121,45,97,52,49,56,55,100,51,40,114,117,115,116,45,49,46,54,54,46,48,45,110,105,103,104,116,108,121,45,50,48,50,50,45,49,49,45,48,49,84,48,56,58,49,55,58,51,54,46,55,48,49,55,54,53,90,41,34,125]},"value_meta":{"expire_at":1667290886}}}}}}}}]
["raft_log",{"Logs":{"key":13,"value":{"log_id":{"leader_id":{"term":1,"node_id":0},"index":13},"payload":{"Normal":{"txid":null,"time_ms":1667290826867,"cmd":{"UpsertKV":{"key":"__fd_clusters/test_tenant/test_cluster/databend_query/7GVP1GsQ2jpDMu1ti8UnF1","seq":{"Exact":0},"value":{"Update":[123,34,105,100,34,58,34,55,71,86,80,49,71,115,81,50,106,112,68,77,117,49,116,105,56,85,110,70,49,34,44,34,99,112,117,95,110,117,109,115,34,58,48,44,34,118,101,114,115,105,111,110,34,58,48,44,34,102,108,105,103,104,116,95,97,100,100,114,101,115,115,34,58,34,49,50,55,46,48,46,48,46,49,58,57,48,57,51,34,44,34,98,105,110,97,114,121,95,118,101,114,115,105,111,110,34,58,34,118,48,46,56,46,57,53,45,110,105,103,104,116,108,121,45,97,52,49,56,55,100,51,40,114,117,115,116,45,49,46,54,54,46,48,45,110,105,103,104,116,108,121,45,50,48,50,50,45,49,49,45,48,49,84,48,56,58,49,55,58,51,54,46,55,48,49,55,54,53,90,41,34,125]},"value_meta":{"expire_at":1667290886}}}}}}}}]
["raft_log",{"Logs":{"key":14,"value":{"log_id":{"leader_id":{"term":1,"node_id":0},"index":14},"payload":{"Normal":{"txid":null,"time_ms":1667290836132,"cmd":{"UpsertKV":{"key":"__fd_id_gen/database_id","seq":"Any","value":{"Update":[]},"value_meta":null}}}}}}}]
["raft_log",{"Logs":{"key":15,"value":{"log_id":{"leader_id":{"term":1,"node_id":0},"index":15},"payload":{"Normal":{"txid":null,"time_ms":1667290836137,"cmd":{"Transaction":{"condition":[{"key":"__fd_database/test_tenant/db1","expected":0,"target":{"Seq":0}},{"key":"__fd_database_id_to_name/9","expected":0,"target":{"Seq":0}},{"key":"__fd_db_id_list/test_tenant/db1","expected":0,"target":{"Seq":0}}],"if_then":[{"request":{"Put":{"key":"__fd_database/test_tenant/db1","value":[57],"prev_value":true,"expire_at":null}}},{"request":{"Put":{"key":"__fd_database_by_id/9","value":[42,7,100,101,102,97,117,108,116,162,1,30,50,48,50,50,45,49,49,45,48,49,32,48,56,58,50,48,58,51,53,46,54,51,52,49,50,51,32,85,84,67,170,1,30,50,48,50,50,45,49,49,45,48,49,32,48,56,58,50,48,58,51,53,46,54,51,52,49,50,54,32,85,84,67,160,6,19,168,6,1],"prev_value":true,"expire_at":null}}},{"request":{"Put":{"key":"__fd_db_id_list/test_tenant/db1","value":[10,1,9,160,6,19,168,6,1],"prev_value":true,"expire_at":null}}},{"request":{"Put":{"key":"__fd_database_id_to_name/9","value":[10,11,116,101,115,116,95,116,101,110,97,110,116,18,3,100,98,49,160,6,19,168,6,1],"prev_value":true,"expire_at":null}}}],"else_then":[]}}}}}}}]
["raft_log",{"Logs":{"key":16,"value":{"log_id":{"leader_id":{"term":1,"node_id":0},"index":16},"payload":{"Normal":{"txid":null,"time_ms":1667290838209,"cmd":{"UpsertKV":{"key":"__fd_id_gen/table_id","seq":"Any","value":{"Update":[]},"value_meta":null}}}}}}}]
```

Also check the relevant settings in your helm values, for example:

```yaml
bootstrap: true
replicaCount: 3
persistence:
  size: 8Gi
config:
  logDir: /.databend/logs
  logPrefixFilter: ""
  logLevel: "DEBUG"
```
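If you are deploying with the community helm chart, a sketch of applying these values follows; the repo URL, chart name, release name, and namespace here are assumptions for illustration, so substitute your own:

```bash
# Assumed repo/chart names for illustration only.
helm repo add databend https://charts.databend.com
helm upgrade --install databend-meta databend/databend-meta \
  --namespace databend --create-namespace \
  -f values.yaml   # the values shown above
```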
If the issue persists, you may need to investigate the specific log entries causing the error and ensure that the Raft state is correctly maintained across all nodes in the cluster [1][2][3].
Do you have a DEBUG-level log so that I can see what's going on? And if possible, please provide the backup file you were using to restore, the command used to restore, and the config file used when bringing up the meta node.
I'm assembling an active case of this with debug logs. The startup looks like this:
This is not currently happening right after a restore; a restore was performed some time in the past. The cluster runs for a few days, and then eventually we end up in this state.
Here is a small chunk of log data that may be relevant:
The state machine seems incomplete from the log: there is at least one membership log in it, but judging from these two log lines it is empty. Can you provide the full debug log from when the server starts? There should be some output about the initial state machine state right when meta-service starts.
These are the complete startup logs, up until we hit the error.
I have some suspicion that the issue here might be related to the way the bootstrap process works. This is the initialization script that is included with all databend-meta pods from the helm chart:

```bash
echo "check if my service is running and run commands ";
if [[ $POD_NAME =~ (.*)-([0-9]+)$ ]]
then
  ID=${BASH_REMATCH[2]}
  NAME=${BASH_REMATCH[1]}
  echo "ID: $ID" "NAME: $NAME"
{{- if $bootstrap }}
  case ${BASH_REMATCH[2]} in
    0)
      echo "initialize leader node";
      exec /databend-meta --id ${ID} --config-id ${ID} \
        --single \
        --raft-advertise-host "${POD_NAME}.{{ $fullName }}.${POD_NAMESPACE}.svc.cluster.local" \
        --grpc-api-advertise-host "${POD_NAME}.{{ $fullName }}.${POD_NAMESPACE}.svc.cluster.local" \
        --config-file /etc/databend/meta.yaml
      ;;
    *)
      echo "initialize follower node";
      exec /databend-meta --id ${ID} --config-id ${ID} \
        --join "${NAME}-0.{{ $fullName }}.${POD_NAMESPACE}.svc.cluster.local:28004" \
        --raft-advertise-host "${POD_NAME}.{{ $fullName }}.${POD_NAMESPACE}.svc.cluster.local" \
        --grpc-api-advertise-host "${POD_NAME}.{{ $fullName }}.${POD_NAMESPACE}.svc.cluster.local" \
        --config-file /etc/databend/meta.yaml
      ;;
  esac
{{- else }}
  exec /databend-meta --id ${ID} --config-id ${ID} \
{{- range $i := until $replicaCount }}
    --join "${NAME}-{{ $i }}.{{ $fullName }}.${POD_NAMESPACE}.svc.cluster.local:28004" \
{{- end }}
    --raft-advertise-host "${POD_NAME}.{{ $fullName }}.${POD_NAMESPACE}.svc.cluster.local" \
    --grpc-api-advertise-host "${POD_NAME}.{{ $fullName }}.${POD_NAMESPACE}.svc.cluster.local" \
    --config-file /etc/databend/meta.yaml
{{- end }}
else
  echo "pod name $POD_NAME is not valid" && exit 1
fi
```
If the nodes have gone through a failover, node 0 would no longer be the leader, but we are bootstrapping it as if it were.
This script might need to be changed to resolve which node is the leader and initialize it using variables instead of relying on the node indices; a sketch of one possible guard follows.
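One possible shape for that guard, as a sketch only and not the chart's actual fix: bootstrap with `--single` only when node 0 has no existing raft state, and otherwise start with `--join`, which, as I understand it, databend-meta skips for a node that is already a cluster member. `RAFT_DIR` and `SVC_DOMAIN` below are assumed placeholders standing in for the configured raft directory and the templated hostnames above:

```bash
# Sketch: never re-run --single bootstrap on a node that already has raft state.
RAFT_DIR="/var/lib/databend/raft"                           # assumed raft_dir from meta.yaml
SVC_DOMAIN="{{ $fullName }}.${POD_NAMESPACE}.svc.cluster.local"

if [ "$ID" = "0" ] && [ -z "$(ls -A "$RAFT_DIR" 2>/dev/null)" ]; then
  # First-ever start of node 0 with an empty store: bootstrap the cluster.
  exec /databend-meta --id ${ID} --config-id ${ID} --single \
    --raft-advertise-host "${POD_NAME}.${SVC_DOMAIN}" \
    --grpc-api-advertise-host "${POD_NAME}.${SVC_DOMAIN}" \
    --config-file /etc/databend/meta.yaml
else
  # Restarts and all other nodes: join (or re-join) the existing cluster.
  # --join should be a no-op for a node that is already a member.
  exec /databend-meta --id ${ID} --config-id ${ID} \
    --join "${NAME}-0.${SVC_DOMAIN}:28004" \
    --raft-advertise-host "${POD_NAME}.${SVC_DOMAIN}" \
    --grpc-api-advertise-host "${POD_NAME}.${SVC_DOMAIN}" \
    --config-file /etc/databend/meta.yaml
fi
```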
I'm testing whether this is the case at the moment and will report findings.
For some context, hopefully to explain why it might run fine for a few days and then enter an unhealthy state: we are currently testing with SPOT instances to simulate pod failures over time. It may be that a pod is being restarted after a delay and is no longer in sync with the rest of the meta pods in the cluster. I don't know whether this is helpful or irrelevant, but I thought I'd let you know. Our expectation was that meta pod loss and recovery would be transparent to the overall Databend process: a meta pod rejoining the meta cluster would sync up as quickly as it could and then remain a follower until elected leader in the future. We may be misinterpreting the logic, but it seems like node 0 will try to become the leader even if another leader is already present in the meta cluster.
It looks like the snapshot was at some point removed from this node, which leads to the following error:
If there is a leader in the cluster that contains complete data, this node can be rebuilt with the following steps:
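A minimal sketch of one way to carry out such a rebuild, assuming the join address and variables from the chart script above; the raft directory path is an assumed location, so check your own meta.yaml:

```bash
# Hypothetical rebuild of a meta node with a damaged/removed snapshot,
# assuming a healthy leader in the cluster still holds complete data.
RAFT_DIR="/var/lib/databend/raft"   # assumed raft_dir; check your meta.yaml

# 1. Stop the broken databend-meta process (e.g. delete/scale down the pod).
# 2. Wipe its local raft store so it comes back with no stale state.
rm -rf "${RAFT_DIR:?}"/*

# 3. Restart with --join pointing at a healthy member; the node should pull
#    a snapshot from the current leader and replay the log to catch up.
exec /databend-meta --id ${ID} --config-id ${ID} \
  --join "${NAME}-0.${SVC_DOMAIN}:28004" \
  --raft-advertise-host "${POD_NAME}.${SVC_DOMAIN}" \
  --config-file /etc/databend/meta.yaml
```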
@drmingdrmer this fix does work to resolve the error. There is some more we should discuss, though:
I'm not sure about this issue. It hasn't occurred in any of our other deployments.
What do you mean about
> Backing up the dir does work.

You are very likely backing up an inconsistent database, since a running databend-meta may be updating the dir in parallel.
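For a consistent backup, one alternative is to export the meta data with databend-metactl rather than copying the directory of a live process; it emits the one-record-per-line format shown earlier in this thread. The flags below are to the best of my knowledge and worth verifying against your binary's `--help`:

```bash
# Export via the gRPC API of a running meta node (no need to stop it):
databend-metactl --export --grpc-api-address "127.0.0.1:9191" > meta_backup.txt

# Or export offline from a stopped node's raft directory:
databend-metactl --export --raft-dir /var/lib/databend/raft > meta_backup.txt

# Import into a fresh raft directory when restoring (node must be stopped;
# restoring a whole cluster may need extra flags such as id/initial cluster):
databend-metactl --import --raft-dir /var/lib/databend/raft < meta_backup.txt
```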
Search before asking
Version
nightly-630
What's Wrong?
Upon restart of a meta pod, we receive the following error in the meta logs:
The meta cluster is a 3-pod cluster running in Kubernetes. We hadn't observed this issue until recently.
How to Reproduce?
Restart meta pod after recovering from backup.
Are you willing to submit PR?