Add new chart for ScalarDB Analytics with PostgreSQL #242

Merged
merged 32 commits into from
Dec 18, 2023

@kota2and3kan kota2and3kan commented Nov 7, 2023

Description

This PR adds a new Helm chart for ScalarDB Analytics with PostgreSQL.

Deploying ScalarDB Analytics with PostgreSQL on Kubernetes manually takes time and effort. This chart automates that deployment for users.

Related issues and/or PRs

N/A

Changes made

  • Add a new chart (new resource manifests).
  • Add a new test for this chart in the CI.
  • Add new documents that describe how to use this chart.
  • Update existing documents to add descriptions of this chart.

Checklist

  • I have commented my code, particularly in hard-to-understand areas.
  • I have updated the documentation to reflect the changes.
  • Any remaining open issues linked to this PR are documented and up-to-date (Jira, GitHub, etc.).
  • Tests (unit, integration, etc.) have been added for the changes.
  • My changes generate no new warnings.
  • Any dependent changes in other PRs have been merged and published.

Additional notes (optional)

Regarding getting started guide

I will create a getting started guide in another PR after the related issue scalar-labs/scalardb-analytics-postgresql#44 is fixed on the ScalarDB Analytics with PostgreSQL side.

Overview of what you need to run ScalarDB Analytics with PostgreSQL

On any platform, including environments other than Kubernetes, running ScalarDB Analytics with PostgreSQL takes two steps once the backend databases are in place.

  1. Before running ScalarDB Analytics with PostgreSQL, the backend databases already exist (there may be several).

    +-------------------+
    | Backend databases |
    +-------------------+
  2. As the first step, run the ScalarDB Analytics with PostgreSQL container.

    +-------------------+      +------------------------------------+
    | Backend databases | <--- | ScalarDB Analytics with PostgreSQL |
    +-------------------+      +------------------------------------+
  3. As the second step, run Schema Importer against the ScalarDB Analytics with PostgreSQL container to load some objects into PostgreSQL.

    +-------------------+      +------------------------------------+                       +-----------------+
    | Backend databases | <--- | ScalarDB Analytics with PostgreSQL | <---(load objects)--- | Schema Importer |
    +-------------------+      +------------------------------------+                       +-----------------+

This is the general way to run ScalarDB Analytics with PostgreSQL.

How this chart deploys ScalarDB Analytics with PostgreSQL

This chart combines ScalarDB Analytics with PostgreSQL and Schema Importer into one pod and runs Schema Importer automatically in that pod.

  1. Before running the ScalarDB Analytics with PostgreSQL pod, the backend databases already exist (there may be several).

    +-------------------+
    | Backend databases |
    +-------------------+
  2. Run ScalarDB Analytics with PostgreSQL pod.

                               +--[Pod]-----------------------------------------+
                               |                                                |
                               |  +------------------------------------+        |
                               |  | ScalarDB Analytics with PostgreSQL |        |
    +-------------------+      |  +------------------------------------+        |
    | Backend databases | <--- |                                                |
    +-------------------+      |  +-----------------+                           |
                               |  | Schema Importer |                           |
                               |  +-----------------+                           |
                               |                                                |
                               +------------------------------------------------+
  3. The pod automatically runs Schema Importer against the ScalarDB Analytics with PostgreSQL container in the pod.

                               +--[Pod]-----------------------------------------+
                               |                                                |
                               |  +------------------------------------+        |
                               |  | ScalarDB Analytics with PostgreSQL | <---+  |
    +-------------------+      |  +------------------------------------+     |  |
    | Backend databases | <--- |                                             |  |
    +-------------------+      |  +-----------------+                        |  |
                               |  | Schema Importer | ---(load objects)------+  |
                               |  +-----------------+                           |
                               |                                                |
                               +------------------------------------------------+
  4. If Schema Importer fails because PostgreSQL has not started yet, entrypoint.sh retries Schema Importer several times (10 times by default).

                               +--[Pod]-----------------------------------------+
                               |                                                |
                               |  +------------------------------------+        |
                               |  | ScalarDB Analytics with PostgreSQL | <---+  |
    +-------------------+      |  +------------------------------------+     |  |
    | Backend databases | <--- |                                             |  |
    +-------------------+      |  +-----------------+                        |  |
                               |  | Schema Importer | ---(retry if failed)---+  |
                               |  +-----------------+                           |
                               |                                                |
                               +------------------------------------------------+
  5. After Schema Importer succeeds, the Schema Importer container sleeps indefinitely (it runs the sleep inf command).

                               +--[Pod]-----------------------------------------+
                               |                                                |
                               |  +------------------------------------+        |
                               |  | ScalarDB Analytics with PostgreSQL |        |
    +-------------------+      |  +------------------------------------+        |
    | Backend databases | <--- |                                                |
    +-------------------+      |  +-----------------------------+               |
                               |  | Schema Importer (sleep inf) |               |
                               |  +-----------------------------+               |
                               |                                                |
                               +------------------------------------------------+
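
The retry behavior in steps 4 and 5 above can be sketched as a ConfigMap that carries the entrypoint script. This is a minimal sketch under stated assumptions, not the chart's actual entrypoint.sh: the ConfigMap name, the MAX_RETRY_COUNT and RETRY_INTERVAL variables, the 3-second interval, and passing the import command as script arguments are all hypothetical. Only the limit of 10 and the final sleep inf come from the description above, and the sketch reads that limit as 10 attempts in total.

```yaml
# Sketch only: the ConfigMap name and variable names are hypothetical.
apiVersion: v1
kind: ConfigMap
metadata:
  name: schema-importer-entrypoint
data:
  entrypoint.sh: |
    #!/bin/bash
    # Retry the import command until it succeeds or the attempt limit is reached.
    MAX_RETRY_COUNT="${MAX_RETRY_COUNT:-10}"   # attempt limit (10, read from the default above)
    RETRY_INTERVAL="${RETRY_INTERVAL:-3}"      # seconds between attempts (assumed value)

    # "$@" is the real Schema Importer invocation, passed in as container args (assumption).
    if [ "$#" -eq 0 ]; then
      echo "usage: entrypoint.sh <import command> [args...]" >&2
      exit 2
    fi

    count=0
    until "$@"; do
      count=$((count + 1))
      if [ "${count}" -ge "${MAX_RETRY_COUNT}" ]; then
        echo "Schema Importer failed ${count} times; giving up." >&2
        exit 1
      fi
      echo "Schema Importer failed (attempt ${count}); retrying in ${RETRY_INTERVAL}s." >&2
      sleep "${RETRY_INTERVAL}"
    done

    # Keep the sidecar container running after a successful import.
    sleep inf
```

Exiting with a non-zero status after the last attempt lets Kubernetes restart the container, so a slow PostgreSQL startup still gets another round of attempts.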

Release notes

Add a new chart for ScalarDB Analytics with PostgreSQL. By using this new chart, you can deploy ScalarDB Analytics with PostgreSQL on the Kubernetes environment.

@kota2and3kan kota2and3kan added documentation Improvements or additions to documentation enhancement New feature or request labels Nov 7, 2023
@kota2and3kan kota2and3kan self-assigned this Nov 7, 2023
@kota2and3kan kota2and3kan added the scalardb analytics postgresql PR for ScalarDB Analytics with PostgreSQL label Nov 15, 2023
Comment on lines 5 to 6
version: 1.0.0-SNAPSHOT
appVersion: 3.10.2
Collaborator Author

At this time, ScalarDB Analytics with PostgreSQL does not publish a SNAPSHOT version, so I set the latest stable version.

However, we are working on releasing a SNAPSHOT version on the ScalarDB Analytics with PostgreSQL side. Once it is available, we will set the SNAPSHOT version here.

@@ -0,0 +1,54 @@
# scalardb-analytics-postgresql
Collaborator Author

This file is automatically generated based on the values.yaml file.

Just curious. How is this file generated?

Collaborator Author

You can generate this file by using helm-docs.
https://github.com/norwoodj/helm-docs

Also, in our repository, you can run the helm-docs by using the following script.
https://github.com/scalar-labs/helm-charts/blob/main/scripts/update-chart-docs.sh

Comment on lines +1 to +10
scalardbAnalyticsPostgreSQL:
databaseProperties: |
scalar.db.storage=jdbc
scalar.db.contact_points=jdbc:postgresql://postgresql.default.svc.cluster.local:5432/postgres
scalar.db.username=postgres
scalar.db.password=postgres

schemaImporter:
namespaces:
- ct
Collaborator Author

This file is a custom values file for the testing in the CI.

Comment on lines +17 to +18
entrypoint.sh: |
#!/bin/bash
Collaborator Author

At this time, I create entrypoint.sh on the Helm chart side. The Schema Importer container mounts this file and runs it as its entrypoint.

The main purpose of this shell script is to implement the retry process for Schema Importer.

We need this file on the Helm chart side to run the existing stable version (v3.10) images.

However, I will add this entrypoint.sh to the Schema Importer container image in the future. After that (maybe after v3.11), we can use the entrypoint.sh that is included in the container image.

So, this runs Schema Importer every time the pod starts?

Collaborator Author

Yes. That's right. For example, we run Schema Importer in the following cases:

  • Deploy pods.
  • Pods restart for some reason.
  • Scale out the pods.

Thank you for the response. I have one more question: why is it necessary to include the Schema Importer in the pod? I'm considering the possibility of running the Schema Importer separately, outside of the pod.

Collaborator Author

@brfrn169
Thank you for your question!

In short, I want to deploy ScalarDB Analytics with PostgreSQL as a Stateless workload for now, to reduce maintenance costs and avoid complex configurations. This is why I run the Schema Importer as a sidecar in the pod.

For example, I want to avoid manually running the Schema Importer in the following cases.

  • Existing pods crash or restart
  • Scale pods (new pods start)

Challenges

ScalarDB Schema Loader and ScalarDL Schema Loader create database schemas (i.e., tables and other objects) on the backend database side. In other words, those objects are persisted by the backend database. So, basically, ScalarDB and ScalarDL hold no state themselves.

However, Schema Importer creates some objects (e.g., foreign servers, extensions, and views) on the ScalarDB Analytics with PostgreSQL side. In other words, strictly speaking, ScalarDB Analytics with PostgreSQL has state. It's a Stateful workload.

So, if the pod is restarted for some reason, the loaded objects are lost. In this case, we have to re-run Schema Importer to reload all objects into PostgreSQL.

As with pod restarts, we have to run Schema Importer when we scale out (add a new pod) to create the objects in the new pod (in the new PostgreSQL). This is because each pod (each PostgreSQL) has its own copy of the objects (e.g., views).

Solution 1 (make it Stateful workload)

To address the above challenges, we can deploy ScalarDB Analytics with PostgreSQL as a Stateful workload by using a StatefulSet, which is one of the Kubernetes resources.

In this case, the objects (foreign servers, extensions, and views) are persisted in the PV (PersistentVolume) that is attached to the pod.

So, we don't need to re-run Schema Importer if the pods crash or restart. However, we still have to run Schema Importer manually when we scale out pods.

Also, this solution needs somewhat more complex configurations (for instance, StatefulSet and PersistentVolume) than deploying it as a Stateless workload with a Deployment. That increases maintenance costs.

In addition, in this case, we have to consider backup and restore of the PersistentVolume. That increases operation costs.

So, I want to avoid this solution if I can.
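
For comparison, the extra configuration that Solution 1 implies might look roughly like the following. This is only an illustrative sketch of a generic StatefulSet with a per-pod volume claim; the resource names, image, storage size, and labels are hypothetical and are not part of this chart.

```yaml
# Sketch only: names, image, and storage size are hypothetical.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: scalardb-analytics-postgresql
spec:
  serviceName: scalardb-analytics-postgresql   # a headless Service must exist for a StatefulSet
  replicas: 1
  selector:
    matchLabels:
      app: scalardb-analytics-postgresql
  template:
    metadata:
      labels:
        app: scalardb-analytics-postgresql
    spec:
      containers:
        - name: postgresql
          image: postgres:15                   # placeholder image
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data   # data directory of the official postgres image
  # Each replica gets its own PersistentVolumeClaim, which then needs backup and restore.
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
```

Because every replica gets a separate, initially empty volume, a newly added pod would still need Schema Importer to be run once, which matches the scale-out caveat above.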

Solution 2 (run Schema Importer manually every time)

This is a simple (but not easy) solution. We can deploy ScalarDB Analytics with PostgreSQL as a Stateless workload with a Deployment, and run Schema Importer manually every time it is needed.

However, this solution increases the operation costs on the user side. It also cannot take full advantage of the self-healing feature of Kubernetes (automatic pod restarts when a failure occurs).

Solution 3 (run Schema Importer as a sidecar / our choice)

To resolve the challenges with the smallest additional cost, we decided to run Schema Importer as a sidecar at pod startup.

In this case, the pod runs Schema Importer automatically when a pod crashes or restarts, or when a new pod is added. So, we can avoid running Schema Importer manually.

Also, in this case, we don't need to persist any objects in a PersistentVolume because the pod runs Schema Importer every time it starts. We can avoid the additional costs of maintaining a Stateful workload.

This is why I decided to run Schema Importer as a sidecar.
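
As a rough illustration of the sidecar layout, a Deployment with both containers in one pod could look like the sketch below. It is not the chart's actual template: the resource names, images, mount path, ConfigMap name, and the placeholder import command are all hypothetical.

```yaml
# Sketch only: names, images, paths, and the import command are hypothetical.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: scalardb-analytics-postgresql
spec:
  replicas: 3
  selector:
    matchLabels:
      app: scalardb-analytics-postgresql
  template:
    metadata:
      labels:
        app: scalardb-analytics-postgresql
    spec:
      containers:
        # Main container: the PostgreSQL instance that serves analytical queries.
        - name: scalardb-analytics-postgresql
          image: example.com/scalardb-analytics-postgresql:3.10.3   # placeholder image
          ports:
            - containerPort: 5432
        # Sidecar: starts with every pod and loads objects into the PostgreSQL above.
        - name: schema-importer
          image: example.com/schema-importer:3.10.3                 # placeholder image
          command: ["/bin/bash", "/scripts/entrypoint.sh"]
          args: ["run-schema-importer"]   # placeholder for the real import command
          volumeMounts:
            - name: entrypoint-volume
              mountPath: /scripts
      volumes:
        - name: entrypoint-volume
          configMap:
            name: schema-importer-entrypoint
```

Containers in the same pod share a network namespace, so the sidecar can reach PostgreSQL on localhost, and every new or restarted pod repeats the import without manual steps.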

Sounds good! Thank you for the explanation!

Comment on lines 52 to 53
- secretRef:
name: "{{ .Values.scalardbAnalyticsPostgreSQL.postgresql.secretName }}"
Collaborator Author

We set the superuser's password of PostgreSQL via a Secret resource.

Collaborator

I prefer to use env, because with envFrom I can't tell which key names the Secret should contain (although it depends on the case).

Collaborator Author

Ah, I see. In this case, the environment variable name is fixed to POSTGRES_PASSWORD, because it is determined by the official PostgreSQL container image.

In other words, users cannot use arbitrary environment variable names, and there is no special reason to use envFrom here.

So, I will update to use env. Thank you for your suggestion!
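
To make the difference concrete, here is a minimal sketch of the two styles in a container spec. The container name, image, Secret name, and Secret key are hypothetical, and the sketch does not claim to match what commit 8af1a27 actually contains.

```yaml
# Sketch only: the Secret name and key are hypothetical.
containers:
  - name: postgresql
    image: postgres:15                     # placeholder image
    # envFrom imports every key of the Secret as an environment variable, so the
    # manifest alone does not show which keys the Secret has to contain:
    # envFrom:
    #   - secretRef:
    #       name: postgresql-secret
    # env names both the variable and the Secret key explicitly:
    env:
      - name: POSTGRES_PASSWORD
        valueFrom:
          secretKeyRef:
            name: postgresql-secret        # hypothetical Secret name
            key: superuser-password        # hypothetical key inside the Secret
```

With env, the variable name the image expects stays in the manifest, and the Secret key can be named freely.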

Collaborator Author

Fixed in 8af1a27.

periodSeconds: 10
successThreshold: 1
timeoutSeconds: 1
- name: schema-importer
Collaborator Author

As I mentioned in the PR description, this chart runs Schema Importer as a sidecar.

Comment on lines +132 to +135
- configMap:
defaultMode: 0440
name: {{ include "scalardb-analytics-postgresql.fullname" . }}-database-properties
name: database-properties-volume
Collaborator Author

The ScalarDB Analytics with PostgreSQL container and Schema Importer container share the same database.properties file.

@@ -0,0 +1,239 @@
{
Collaborator Author

This file is automatically generated based on the values.yaml file.

Comment on lines 75 to 76
# -- To work ScalarDB Analytics with PostgreSQL properly, you must set "201" to "podSecurityContext.fsGroup".
fsGroup: 201
Collaborator Author

To run entrypoint.sh as the scalar (UID=201) user in the Schema Importer container, we have to mount the entrypoint.sh file with 201:201 as its file ownership. So, we have to set fsGroup=201 here.
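
A minimal sketch of how fsGroup interacts with a ConfigMap-mounted script follows. The volume name, ConfigMap name, mount path, and file mode are hypothetical, and the sketch assumes the scalar user's primary group is also 201, as the 201:201 above suggests.

```yaml
# Sketch only: volume name, ConfigMap name, path, and mode are hypothetical.
spec:
  securityContext:
    fsGroup: 201               # Kubernetes applies this GID to the files of supported volumes
  containers:
    - name: schema-importer
      securityContext:
        runAsUser: 201         # scalar user (UID=201)
      command: ["/bin/bash", "/scripts/entrypoint.sh"]
      volumeMounts:
        - name: entrypoint-volume
          mountPath: /scripts
  volumes:
    - name: entrypoint-volume
      configMap:
        name: schema-importer-entrypoint
        defaultMode: 0440      # readable by owner and group; group 201 comes from fsGroup
```

Invoking the script through /bin/bash means the file only needs to be readable by group 201, not executable.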

# -- Containers should be run as a non-root user with the minimum required permissions (principle of least privilege).
runAsNonRoot: true
# -- The PostgreSQL official image use the "postgres (UID=999)" user by default.
runAsUser: 999
Collaborator Author

The PostgreSQL container image sets non-root user postgres with UID=999. To run the container with the non-root user properly, we have to set UID=999 here.
https://github.com/scalar-labs/docker/blob/main/jdk-postgres/8-15/Dockerfile#L14

Comment on lines +5 to +6
version: 1.0.0-SNAPSHOT
appVersion: 3.10.3
Collaborator Author

At this time, ScalarDB Analytics with PostgreSQL does not publish a SNAPSHOT version, so I set the latest stable version.

However, we are working on releasing a SNAPSHOT version on the ScalarDB Analytics with PostgreSQL side. Once it is available, we will set the SNAPSHOT version here.

@brfrn169 brfrn169 left a comment

Overall, LGTM. Left several minor comments. Please take a look when you have time!

scalar.db.storage=cassandra
```

### Namespaces configurations

This might cause confusion between ScalarDB's namespace and Kubernetes's namespace.

Collaborator Author

Ah, I see. I agree with that concern.
I will update the documents.
Thank you for pointing it out!

Collaborator Author

In 831fa61, I updated the document to state explicitly that "namespace" means the database namespace in this context.

@kota2and3kan
Collaborator Author

@brfrn169
Thank you for your review!
I updated the documents and left an answer to your question.
Please take a look again when you have time!

Contributor

@feeblefakie feeblefakie left a comment

LGTM! Thank you!

@brfrn169 brfrn169 left a comment

LGTM! Thank you!

Member

@josh-wong josh-wong left a comment

Looking good!👍 I've added some comments and suggestions, so PTAL!

@kota2and3kan
Collaborator Author

@komamitsu @josh-wong
Thank you for your review!
I applied your suggestions.
Please take a look when you have time!

Member

@josh-wong josh-wong left a comment

I left one minor suggestion for something that I didn't catch before. Other than that, LGTM! Thank you🙇‍♂️

Contributor

@komamitsu komamitsu left a comment

LGTM! 👍

@choplin choplin left a comment

Overall, it looks good to me! Thanks! I left several questions. I would appreciate it if you could take a look.


```yaml
scalardbAnalyticsPostgreSQL:
replicaCount: 3

Does this setting create the Postgres instances with replication? Or does it just make multiple instances?

Collaborator Author

This setting just creates multiple instances, without streaming replication or logical replication. At the moment, we cannot control the replication or HA features of PostgreSQL by using this chart. We just deploy it as a single instance or multiple instances.

As I mentioned, we cannot control HA features. However, I think ScalarDB Analytics with PostgreSQL is basically a read-only product for analytical workloads. So, I don't think we need to use the replication feature at this time.

But, from the perspective of availability, I think we can make it more available by deploying 3 pods (ideally across 3 zones in a cloud environment).

This is why I set 3 by default in this chart.
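
Spreading the 3 pods across zones is not something this thread configures, but as a hedged illustration, a pod template could express it with a topology spread constraint. The labels below are hypothetical, and this fragment is an assumption about one possible setup rather than a setting of this chart.

```yaml
# Sketch only: labels are hypothetical; this is not a documented chart value.
spec:
  replicas: 3
  template:
    metadata:
      labels:
        app: scalardb-analytics-postgresql
    spec:
      topologySpreadConstraints:
        - maxSkew: 1                                 # at most one pod of difference between zones
          topologyKey: topology.kubernetes.io/zone   # well-known zone label on nodes
          whenUnsatisfiable: ScheduleAnyway          # prefer spreading, but do not block scheduling
          labelSelector:
            matchLabels:
              app: scalardb-analytics-postgresql
```

Since each pod runs its own independent PostgreSQL and Schema Importer, losing one zone leaves the other instances able to serve read-only queries.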

Collaborator Author

@kota2and3kan kota2and3kan left a comment

@choplin
Thank you for your review and questions!
I left answers. Please take a look when you have time!

@kota2and3kan kota2and3kan merged commit b1e2190 into main Dec 18, 2023
14 checks passed
@kota2and3kan kota2and3kan deleted the add-scalardb-analytics-postgresql-chart branch December 18, 2023 04:42
7 participants