Rate of S3 transactions for pgBackRest backups increase over time #3960

Open
3 of 5 tasks
JJGadgets opened this issue Jul 23, 2024 · 2 comments

Comments

@JJGadgets

Overview

After a PostgresCluster has been running for a while on my Kubernetes homelab cluster, its pgBackRest bucket on Cloudflare R2 consumes more and more transactions each month. More context is in the Additional Information section below.

Environment

  • Platform: Kubernetes on Talos Linux
  • Platform Version: 1.29.2
  • PGO Image Tag: ubi8-16.2-0
  • Postgres Version: 16
  • Storage: local-path for the Postgres PVC, R2 for the WAL destination

Steps to Reproduce

REPRO

  1. Apply PostgresCluster with pgBackRest repos pointed to an R2 bucket.
  2. Set up R2 transaction count alerts.
  3. Leave the PostgresCluster running for a few months under roughly the same load throughout.
  4. Check R2 transaction count alerts and check the date within the month of each alert.

EXPECTED

  1. Transaction count and rate remain constant from month to month for roughly the same amount of database operations.
  2. Stay within R2 free tier for transaction count.

ACTUAL

  1. R2 transaction count goes past the free tier.
  2. The rate of transactions increases as the PostgresCluster ages.

Logs

Unsure of what logs would be relevant to this issue. Advice on what logs to drill down on would be helpful.

The R2 dashboard only shows transaction counts for the past week unless I upgrade my account plan, and this issue plays out over months, not days or weeks.

Additional Information

R2:
Class A operations are mainly uploads and other writes (CRUD); Class B operations are mainly downloads. Class A has a free tier of 1 million transactions per month; Class B's free tier is 10 million per month.

My R2 transaction count alerts trigger earlier in each month as the PostgresCluster ages, which suggests that the rate of transactions increases as the months go by. The alerts go to a Discord channel and I can screenshot them if that would help, but I doubt it would.
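To make the reasoning about alert dates concrete, here is a rough sketch of the arithmetic. It assumes the alert threshold equals the Class A free tier of 1 million transactions, and the alert days are hypothetical placeholders, not values copied from my actual alerts.

```python
def implied_daily_rate(threshold: int, alert_day: int) -> float:
    """Average transactions per day if `threshold` was reached on day `alert_day` of the month."""
    if alert_day < 1:
        raise ValueError("alert_day must be >= 1")
    return threshold / alert_day


# Assumption: the alert fires at the Class A free tier (1 million transactions).
THRESHOLD = 1_000_000

# Hypothetical days of the month on which the alert fired in three consecutive months.
alert_days = [28, 21, 14]

rates = [implied_daily_rate(THRESHOLD, day) for day in alert_days]
for day, rate in zip(alert_days, rates):
    print(f"alert on day {day}: about {rate:,.0f} transactions/day on average")
```

If the alert fires on an earlier day each month while the threshold stays the same, the implied average daily rate must be higher, which is the pattern I am seeing.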

Because of this issue, I have wiped and restored the PostgresCluster from the pgBackRest dataSource multiple times. Each restore brings the transaction rate back down, but after 2 months or so I go past the R2 free tier again and have to wipe and restore once more. This cycle has repeated at least 3 times.

PostgresCluster resource manifest (managed by FluxCD GitOps):
https://github.com/JJGadgets/Biohazard/blob/4035c729132335ed4bab1ca4010c029a6db1c338/kube/deploy/core/db/pg/clusters/template/crunchy.yaml#L48-L143

@JJGadgets
Author

@joryirving and @drewburr are also experiencing similar issues. We've discussed this and couldn't come up with a reason or a solution.

@tjmoore4
Contributor

tjmoore4 commented Aug 7, 2024

@JJGadgets Thanks for the detailed explanation. A couple of suggestions.

The first would be to look closely at the pgBackRest logs. Based on your linked cluster manifest, you don't have a repo host Pod enabled, so the relevant logs should be located in /pgdata/pgbackrest/log on your primary Pod. Those logs may give you a clue as to what might be happening. You could also increase your logging detail to give more information.

I also noticed your R2 repo (repo2) configuration is set to take full and incremental backups on a schedule, but it seems your retention settings are only for full and differential. Perhaps adding an incr retention policy might help in this case. Hope these suggestions help!
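One way to check what the repository is actually retaining is to count the backups of each type that pgBackRest reports. The sketch below assumes you have captured the output of `pgbackrest info --output=json` as a string and that each stanza entry has a `name` field and a `backup` list whose items carry a `type` field; the sample data is hypothetical and trimmed to just those fields.

```python
import json
from collections import Counter


def count_backups_by_type(info_json: str) -> dict[str, Counter]:
    """Map each stanza name to a count of its backups per type (full, diff, incr)."""
    stanzas = json.loads(info_json)
    return {
        stanza["name"]: Counter(backup["type"] for backup in stanza.get("backup", []))
        for stanza in stanzas
    }


# Hypothetical, heavily trimmed stand-in for real `pgbackrest info --output=json` output.
sample = json.dumps([
    {
        "name": "db",
        "backup": [
            {"label": "20240701-000000F", "type": "full"},
            {"label": "20240701-000000F_20240702-000000I", "type": "incr"},
            {"label": "20240701-000000F_20240703-000000I", "type": "incr"},
        ],
    }
])

counts = count_backups_by_type(sample)
for stanza, by_type in counts.items():
    print(stanza, dict(by_type))
```

If the number of backups of some type keeps growing from one check to the next, that would be worth comparing against the retention settings in the manifest.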
