Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: add detached script to join cluster #363

Merged
merged 5 commits into from
Sep 27, 2023

Conversation

mgagliardo91
Copy link
Contributor

Resolves an issue I was running into related to #304 in which the joining of clusters does not resolve in Kubernetes. After some discovery, it appears that misfire of joining the cluster is likely related to a race condition of the other replicas not being ready to accept the join request. Given that the join request only occurs once, if it misses, it ends up creating a standalone cluster. With a replica set of 3, I was seeing all three sit in their own cluster.

To resolve this, I threw together a script that attempts to join the cluster on an interval after the pod starts up, using the Kube API to recognize that their is other pods to cluster with. The script will attempt to join and check to see if it has successfully joined via the vmq-admin CLI. Once properly joined, it will exit properly. The log of attempts is logged to the /var/log/vernemq/log directory.

@ioolkos
Copy link
Contributor

ioolkos commented Sep 12, 2023

@mgagliardo91 looks great, thanks!
one thing we need to ensure is that this does not affect non-Kubernetes deployments (Swarm, for instance). will review.

bin/vernemq.sh Outdated
@@ -339,4 +339,5 @@ trap 'sigterm_handler' SIGTERM
# Start VerneMQ
/vernemq/bin/vernemq console -noshell -noinput $@ &
pid=$!
/vernemq/bin/join_cluster > /var/log/vernemq/log/join_cluster.log &
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just asking to be sure. Let's say we only want to start a single node in Docker (no Kubernetes). We'll run the join_cluster script so we have to do the right thing for any possible situation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea, currently the join_cluster script will start, see that they are not in an environment that it would require joining for, and shutdown. But I can also make the join_cluster script execute conditionally? Thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ioolkos I went ahead and updated it to conditionally start based on it needing to join a cluster.

Dockerfile Outdated
@@ -14,6 +14,7 @@ ENV DOCKER_VERNEMQ_KUBERNETES_LABEL_SELECTOR="app=vernemq" \
PATH="/vernemq/bin:$PATH" \
VERNEMQ_VERSION="1.13.0"
COPY --chown=10000:10000 bin/vernemq.sh /usr/sbin/start_vernemq
COPY --chown=10000:10000 bin/join_cluster.sh bin/join_cluster
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's copy the script into /usr/sbin alongside the starte_vernemq script

# Let's set our nodename correctly
# https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.19/#list-pod-v1-core
podList=$(k8sCurlGet "api/v1/namespaces/${DOCKER_VERNEMQ_KUBERNETES_NAMESPACE}/pods?labelSelector=${DOCKER_VERNEMQ_KUBERNETES_LABEL_SELECTOR}")
kube_pod_names=$(echo ${podList} | jq '.items[].spec.hostname' | sed 's/"//g' | tr '\n' ' ')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mgagliardo91 I think we need to add a similar fix to the kube_pod_names= lines as added here: https://github.com/vernemq/docker-vernemq/pull/361/files

Copy link
Contributor

@ioolkos ioolkos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.
Since this script is super important for all things Docker/K8s, we'll need to give this a bit of post-surgery attention after the merge :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants