Add new chart for ScalarDB Analytics with PostgreSQL #242
Conversation
version: 1.0.0-SNAPSHOT
appVersion: 3.10.2
At this time, ScalarDB Analytics with PostgreSQL does not release a SNAPSHOT version, so I set the latest stable version here. However, we are working on releasing a SNAPSHOT version on the ScalarDB Analytics with PostgreSQL side. In the future, we will set the SNAPSHOT version here.
@@ -0,0 +1,54 @@
# scalardb-analytics-postgresql
This file is automatically generated based on the values.yaml file.
Just curious. How is this file generated?
You can generate this file by using helm-docs:
https://github.com/norwoodj/helm-docs
Also, in our repository, you can run helm-docs by using the following script:
https://github.com/scalar-labs/helm-charts/blob/main/scripts/update-chart-docs.sh
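For reference, a minimal way to run it locally might look like this (the installation method and flag reflect common helm-docs usage, not necessarily what the repository's script does):

```shell
# Install helm-docs (one of several ways; see its README for alternatives).
go install github.com/norwoodj/helm-docs/cmd/helm-docs@latest

# Regenerate each chart's README.md from its values.yaml comments.
helm-docs --chart-search-root ./charts
```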
scalardbAnalyticsPostgreSQL:
  databaseProperties: |
    scalar.db.storage=jdbc
    scalar.db.contact_points=jdbc:postgresql://postgresql.default.svc.cluster.local:5432/postgres
    scalar.db.username=postgres
    scalar.db.password=postgres
  schemaImporter:
    namespaces:
      - ct
This file is a custom values file used for testing in CI.
entrypoint.sh: |
  #!/bin/bash
At this time, I create the entrypoint.sh on the helm chart side. The Schema Importer container mounts this file and runs it as an entrypoint.
The main purpose of this shell script is to implement the retry process for Schema Importer.
We need this file on the helm chart side to run the existing stable version (v3.10) images. However, I will create this entrypoint.sh on the Schema Importer container image side in the future. After that (maybe after v3.11), we can use the entrypoint.sh that is included in the container image.
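The retry process can be sketched roughly like this (the Schema Importer command line in the comment is an assumption for illustration, not the chart's exact script):

```shell
#!/bin/bash
# Hypothetical sketch of the retry loop in entrypoint.sh.

# retry <max_attempts> <interval_seconds> <command...>
# Runs <command> until it succeeds or <max_attempts> is reached.
retry() {
  local max="$1" interval="$2" i
  shift 2
  for ((i = 1; i <= max; i++)); do
    "$@" && return 0
    echo "attempt ${i}/${max} failed; retrying in ${interval}s..." >&2
    sleep "${interval}"
  done
  return 1
}

# In the real entrypoint this would be something like:
#   retry 10 5 java -jar schema-importer.jar import --config database.properties \
#     && exec sleep inf
```

The `sleep inf` at the end keeps the sidecar container alive after a successful import so the pod stays in the Running state.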
So, this runs Schema Importer every time the pod starts?
Yes, that's right. For example, we run Schema Importer in the following cases:
- Pods are deployed.
- Pods restart for some reason.
- Pods are scaled out.
Thank you for the response. I have one more question: why is it necessary to include the Schema Importer in the pod? I'm considering the possibility of running the Schema Importer separately, outside of the pod.
@brfrn169
Thank you for your question!
In conclusion, at the moment I want to deploy ScalarDB Analytics with PostgreSQL as a stateless workload to reduce maintenance costs and complex configurations. This is why I run Schema Importer as a sidecar in the pod.
For example, I want to avoid manually running Schema Importer in the following cases:
- Existing pods crash or restart.
- Pods are scaled out (new pods start).
Challenges
ScalarDB Schema Loader and ScalarDL Schema Loader create database schemas (i.e., create some tables or objects) on the backend database side. In other words, those objects are persisted by the backend database. So, basically, ScalarDB/ScalarDL has no state in itself.
However, Schema Importer creates some objects (e.g., foreign servers, extensions, and views) on the ScalarDB Analytics with PostgreSQL side. In other words, strictly speaking, ScalarDB Analytics with PostgreSQL has state. It is a stateful workload.
So, if the pod is restarted for some reason, the loaded objects are lost. In this case, we have to re-run Schema Importer to re-load all objects into PostgreSQL.
In addition to pod restarts, we have to run Schema Importer when we scale out (add a new pod) to create the objects in the new pod (the new PostgreSQL instance). This is because each pod (each PostgreSQL instance) has its own objects (e.g., views).
Solution 1 (make it a stateful workload)
To address the above challenges, we can deploy ScalarDB Analytics with PostgreSQL as a stateful workload by using StatefulSet, which is one of the Kubernetes resources.
In this case, the objects (foreign servers, extensions, and views) are stored and persisted in the PV (persistent volume) that is attached to the pod.
So, we don't need to re-run Schema Importer if the pods crash or restart. However, we still have to run Schema Importer manually when we scale out pods.
Also, in this solution, we have to use somewhat more complex configurations, for instance StatefulSet and PersistentVolume, rather than deploying it as a stateless workload by using Deployment. That increases maintenance costs.
In addition, in this case, we have to consider backup/restore of the PersistentVolume. That increases operation costs.
So, I want to avoid this solution if I can.
Solution 2 (run Schema Importer manually every time)
This is a simple (but not easy) solution. We can deploy ScalarDB Analytics with PostgreSQL as a stateless workload with Deployment, and run Schema Importer manually every time we need to.
However, this solution increases the operation costs on the user side. Also, it cannot take full advantage of Kubernetes' self-healing feature (automatic pod restart when some failure occurs).
Solution 3 (run Schema Importer as a sidecar / our choice)
To resolve the challenges with the smallest additional costs, we decided to run Schema Importer as a sidecar in the pod startup step.
In this case, the pod runs Schema Importer automatically when pods crash/restart or a new pod is added. So, we can avoid manually running Schema Importer.
Also, in this case, we don't need to store/persist the objects in a PersistentVolume because the pod runs Schema Importer on every startup. We can avoid the additional costs of maintaining a stateful workload.
This is why I decided to run Schema Importer as a sidecar.
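For illustration, the resulting workload is roughly shaped like the following sketch (container names and image references are assumptions for the sketch, not the chart's exact rendered output):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: scalardb-analytics-postgresql
spec:
  replicas: 3
  selector:
    matchLabels:
      app: scalardb-analytics-postgresql
  template:
    metadata:
      labels:
        app: scalardb-analytics-postgresql
    spec:
      containers:
        # Main container: PostgreSQL with the ScalarDB integration.
        - name: scalardb-analytics-postgresql
          image: example.com/scalardb-analytics-postgresql:3.10.3  # illustrative image
        # Sidecar: runs Schema Importer (with retries) on every pod startup,
        # then sleeps so the pod stays Running.
        - name: schema-importer
          image: example.com/schema-importer:3.10.3  # illustrative image
          command: ["/bin/bash", "/entrypoint/entrypoint.sh"]
```

Because the sidecar re-runs Schema Importer on every startup, a Deployment (rather than a StatefulSet) is sufficient.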
Sounds good! Thank you for the explanation!
- secretRef:
    name: "{{ .Values.scalardbAnalyticsPostgreSQL.postgresql.secretName }}"
We set the superuser's password of PostgreSQL via a Secret resource.
I prefer to use env because, with envFrom, I don't know the key name that should be included in the Secret (although it is case by case).
Ah, I see. In this case, the environment variable name is fixed to POSTGRES_PASSWORD. It depends on the PostgreSQL official container image.
In other words, users cannot use arbitrary environment variable names, and there is no special reason for me to use envFrom here.
So, I will update it to use env. Thank you for your suggestion!
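With env, the mapping becomes explicit. A sketch of what the change might look like (the key name inside the Secret is an assumption):

```yaml
env:
  # POSTGRES_PASSWORD is the variable name the official PostgreSQL image reads.
  - name: POSTGRES_PASSWORD
    valueFrom:
      secretKeyRef:
        name: "{{ .Values.scalardbAnalyticsPostgreSQL.postgresql.secretName }}"
        key: postgres-password  # assumed key name inside the Secret
```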
Fixed in 8af1a27.
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 1
- name: schema-importer
As I mentioned in the PR description, this chart runs Schema Importer as a sidecar.
- configMap:
    defaultMode: 0440
    name: {{ include "scalardb-analytics-postgresql.fullname" . }}-database-properties
  name: database-properties-volume
The ScalarDB Analytics with PostgreSQL container and the Schema Importer container share the same database.properties file.
@@ -0,0 +1,239 @@
{
This file is automatically generated based on the values.yaml file.
# -- To work ScalarDB Analytics with PostgreSQL properly, you must set "201" to "podSecurityContext.fsGroup".
fsGroup: 201
To run the entrypoint.sh as the scalar (UID=201) user in the Schema Importer container, we have to mount the entrypoint.sh file with the 201:201 owner configuration. So, we have to set fsGroup=201 here.
# -- Containers should be run as a non-root user with the minimum required permissions (principle of least privilege).
runAsNonRoot: true
# -- The PostgreSQL official image use the "postgres (UID=999)" user by default.
runAsUser: 999
The PostgreSQL container image sets the non-root user postgres with UID=999. To run the container as a non-root user properly, we have to set UID=999 here.
https://github.com/scalar-labs/docker/blob/main/jdk-postgres/8-15/Dockerfile#L14
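Taken together, the security-related settings discussed above amount to roughly this (a sketch of the values involved, not the full values.yaml):

```yaml
securityContext:
  # The PostgreSQL official image uses the postgres (UID=999) user by default.
  runAsNonRoot: true
  runAsUser: 999
podSecurityContext:
  # Mounted files (e.g., entrypoint.sh) get group 201 so the scalar (UID=201)
  # user in the Schema Importer container can read and run them.
  fsGroup: 201
```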
version: 1.0.0-SNAPSHOT
appVersion: 3.10.3
At this time, ScalarDB Analytics with PostgreSQL does not release a SNAPSHOT version, so I set the latest stable version here. However, we are working on releasing a SNAPSHOT version on the ScalarDB Analytics with PostgreSQL side. In the future, we will set the SNAPSHOT version here.
Co-authored-by: Mitsunori Komatsu <[email protected]>
Overall, LGTM. Left several minor comments. Please take a look when you have time!
scalar.db.storage=cassandra
```

### Namespaces configurations
This might cause confusion between ScalarDB's namespace and Kubernetes's namespace.
Ah, I see. I agree with that concern.
I will update the documents.
Thank you for pointing it out!
I updated the document in 831fa61 to explicitly describe that namespace means the database namespace in this context.
@brfrn169
Co-authored-by: Josh Wong <[email protected]>
LGTM! Thank you!
LGTM! Thank you!
Looking good!👍 I've added some comments and suggestions, so PTAL!
Co-authored-by: Josh Wong <[email protected]>
@komamitsu @josh-wong
I left one minor suggestion for something that I didn't catch before. Other than that, LGTM! Thank you🙇♂️
LGTM! 👍
Co-authored-by: Josh Wong <[email protected]>
Overall, it looks good to me! Thanks! I left several questions. I would appreciate it if you could take a look.
```yaml
scalardbAnalyticsPostgreSQL:
  replicaCount: 3
```
Does this setting create the Postgres instances with replication? Or does it just make multiple instances?
This setting just makes multiple instances without streaming replication or logical replication. At the moment, we cannot control the replication or HA features of PostgreSQL by using this chart. We just deploy it as a single instance or multiple instances.
As I mentioned, we cannot control HA features; however, ScalarDB Analytics with PostgreSQL is basically a read-only product for analytical workloads. So, I don't think we need the replication feature at this time.
But, from the perspective of availability, we can make it more available by deploying 3 pods (ideally across 3 zones in a cloud environment). This is why I set 3 by default in this chart.
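To actually spread the 3 replicas across zones, the pod template could carry a constraint like the following (a hypothetical sketch; whether the chart exposes this setting depends on its values.yaml):

```yaml
replicaCount: 3

# Spread pods evenly across availability zones (illustrative only).
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app: scalardb-analytics-postgresql
```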
@choplin
Thank you for your review and questions!
I left answers. Please take a look when you have time!
Description
This PR adds a new helm chart for ScalarDB Analytics with PostgreSQL!
Deploying ScalarDB Analytics with PostgreSQL on a Kubernetes environment manually takes a bit of time and effort. Instead, this chart helps users deploy ScalarDB Analytics with PostgreSQL on Kubernetes.
Related issues and/or PRs
N/A
Changes made
Checklist
Additional notes (optional)
Regarding getting started guide
I will create the Getting started guide in another PR after the related issue scalar-labs/scalardb-analytics-postgresql#44 is fixed on the ScalarDB Analytics with PostgreSQL side.
Overview of what you need to run ScalarDB Analytics with PostgreSQL
On arbitrary platforms, including environments other than Kubernetes, when you run ScalarDB Analytics with PostgreSQL, you need to run the following two steps (assuming the backend databases are already running; there are several backend databases):
- Run the ScalarDB Analytics with PostgreSQL container as a first step.
- Run Schema Importer against the ScalarDB Analytics with PostgreSQL container to load some objects into PostgreSQL as a second step.
This is the general way to run ScalarDB Analytics with PostgreSQL.
How this chart deploys ScalarDB Analytics with PostgreSQL
This chart combines ScalarDB Analytics with PostgreSQL and Schema Importer into one pod, and runs Schema Importer automatically in the pod (again assuming the backend databases are already running before the pod starts):
- Run the ScalarDB Analytics with PostgreSQL pod.
- Schema Importer runs automatically against the ScalarDB Analytics with PostgreSQL container in the pod.
- If Schema Importer fails because PostgreSQL has not started yet, entrypoint.sh retries running Schema Importer several times (10 times by default).
- After Schema Importer succeeds, the Schema Importer container sleeps endlessly (runs the sleep inf command).
Release notes
Add a new chart for ScalarDB Analytics with PostgreSQL. By using this new chart, you can deploy ScalarDB Analytics with PostgreSQL on the Kubernetes environment.
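As a usage sketch, installing the chart might look like the following (the repository URL follows this repository's published charts location, and the --set-file usage is an assumption based on the databaseProperties value shown above):

```shell
# Add the Scalar Labs chart repository and deploy the new chart.
helm repo add scalar-labs https://scalar-labs.github.io/helm-charts
helm install my-analytics scalar-labs/scalardb-analytics-postgresql \
  --set-file scalardbAnalyticsPostgreSQL.databaseProperties=./database.properties
```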