Add a new service pgss_dealloc #331

Open
wants to merge 1 commit into
base: master

anayrat
Collaborator

@anayrat anayrat commented Sep 14, 2022

Hello!
This conversation on hackers reminded me that pg_stat_statements deallocs should be monitored.

I suggest adding such service.

Cheers

@anayrat
Collaborator Author

anayrat commented Sep 14, 2022

If the number of deallocs per second is too low, we can change it to the number of deallocs per millisecond.

Member

@rjuju rjuju left a comment

I'm a bit dubious about this service. I agree that this is something you should keep your eyes on, but I don't think it would play very well as a check here.

The main problem is that if you schedule the check too frequently it will be a bit useless. For instance, if you schedule it every 5 minutes, how do you differentiate "there was once 1 deallocate and then none" from "there is 1 deallocate every 5 minutes" from this service's point of view? The only way to know if there's really a problem is either:

  1. the service is constantly raising a problem
  2. the frequency of the service moving from ok to problem is high

But if you're in the first case it's likely that the global performance will immediately drop down by a huge factor, so it's unlikely that you won't notice there's a problem. And the second isn't a good way to spot a problem.

The fact that you only return (and handle thresholds as) a rate and not also the raw number probably exacerbates this problem.
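
To illustrate the ambiguity described above, here is a minimal sketch in Python (not the service's Perl code). The 5-minute interval, the counter readings, and the helper names are all invented for the example.

```python
def dealloc_rate(previous: int, current: int, elapsed_seconds: float) -> float:
    """Deallocations per second between two readings of the cumulative counter."""
    return (current - previous) / elapsed_seconds


INTERVAL = 300  # hypothetical check interval: 5 minutes

# Hypothetical counter readings, one per check run.
one_off = [0, 1, 1, 1]    # a single deallocation, then nothing
recurring = [0, 1, 2, 3]  # one deallocation every interval


def rates(readings):
    """Rate reported by each check run, computed from consecutive readings."""
    return [dealloc_rate(a, b, INTERVAL) for a, b in zip(readings, readings[1:])]


one_off_rates = rates(one_off)
recurring_rates = rates(recurring)

# A single check run cannot tell the two scenarios apart:
# only the sequence of later results differs.
print(one_off_rates[0] == recurring_rates[0])
print(one_off_rates)
print(recurring_rates)
```

The two scenarios only diverge over several runs, which is why the status history, rather than any single result, carries the information.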

-exitval => 127
) if @hosts != 1;

is_compat $hosts[0], 'check_pg_stat_statements_dealloc', $PG_VERSION_140 or exit 1;
Member

Note that having postgres 14 doesn't mean that you updated the pg_stat_statements extension to get the needed field.

Collaborator Author

Indeed, I will add a test to check pgss' version.

Collaborator Author

I added several tests:

  • pg_stat_statements version must be 1.9 or above
  • pg_stat_statements has been created on target database
  • pg_stat_statements has been loaded in shared_preload_libraries
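
The three tests listed above could look roughly like the following sketch, written in Python for illustration rather than in the service's Perl. The function names are hypothetical. It assumes the extension version comes from `SELECT extversion FROM pg_extension WHERE extname = 'pg_stat_statements'` and the library list from `SHOW shared_preload_libraries`, and that the list is a plain comma-separated string.

```python
def parse_version(version: str) -> tuple:
    """Turn an extension version such as '1.10' into (1, 10).

    Comparing the raw strings would wrongly rank '1.10' below '1.9'.
    """
    return tuple(int(part) for part in version.split("."))


def check_prerequisites(extversion, preload_libraries: str,
                        minimum: str = "1.9") -> list:
    """Return a list of problems; an empty list means the service can run.

    extversion        -- extversion from pg_extension, or None if no row
    preload_libraries -- value of shared_preload_libraries
    """
    problems = []
    if extversion is None:
        problems.append("extension not created in this database")
    elif parse_version(extversion) < parse_version(minimum):
        problems.append(
            f"extension version {extversion} is older than {minimum}")
    # Assumption: entries are unquoted names separated by commas.
    loaded = [name.strip() for name in preload_libraries.split(",")
              if name.strip()]
    if "pg_stat_statements" not in loaded:
        problems.append("not listed in shared_preload_libraries")
    return problems


print(check_prerequisites("1.10", "pg_stat_statements"))
print(check_prerequisites("1.8", "auto_explain, pg_stat_statements"))
print(check_prerequisites(None, ""))
```

Checking the extension version separately from the server version covers the case raised in the review: a PostgreSQL 14 server where the extension was never updated.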

@anayrat
Collaborator Author

anayrat commented Sep 16, 2022

Yeah, I shouldn't report the rate as perfdata. I will change it to a counter. But for the threshold, I don't see any other way.
We can't apply a threshold to the raw value. For example, if you have 1 dealloc every minute, once you reach 100 (if that is your threshold), the check will be critical even if there are no more deallocations.

My idea is to first graph the dealloc rate. For example, it will give you a mean rate of 100 deallocs per 5 minutes. Then you add a threshold at 500. If you reach this threshold, your workload has changed, and you should understand why the dealloc rate increased before hitting a production issue.
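
The problem with a threshold on the raw value can be shown with a small worked example. This is a sketch in Python with invented readings and a hypothetical threshold of 100, not code from the PR.

```python
THRESHOLD = 100  # hypothetical critical threshold on the raw counter


def raw_status(counter: int) -> str:
    """Status when the threshold applies to the cumulative counter itself."""
    return "CRITICAL" if counter >= THRESHOLD else "OK"


# Hypothetical readings: deallocations accumulate, then stop entirely.
readings = [0, 40, 80, 100, 100, 100]

statuses = [raw_status(value) for value in readings]
increases = [b - a for a, b in zip(readings, readings[1:])]

print(statuses)   # → ['OK', 'OK', 'OK', 'CRITICAL', 'CRITICAL', 'CRITICAL']
print(increases)  # → [40, 40, 20, 0, 0]
```

The status stays critical during the last two intervals even though no deallocation happened in them, because a cumulative counter never goes back down on its own.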

@ioguix
Member

ioguix commented Nov 29, 2023

Hi guys,

Instead of working on rates, the threshold can apply to the delta since the last call. I think this is what @anayrat explains in his last message, isn't it?

Do you plan to keep working on it, @anayrat?

@anayrat
Collaborator Author

anayrat commented Jan 11, 2024

Hello,
Yes, I can use the delta since last call as a threshold.
I will work on this.
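
For reference, a threshold on the delta since the last call could be sketched as follows. This is Python for illustration only; the function name, the threshold values, and the handling of the first run and of a counter reset are assumptions, not the behavior of the actual service.

```python
def delta_status(previous, current: int, warning: int, critical: int):
    """Compare the number of deallocations since the last call to thresholds.

    previous is the counter saved by the last run, or None on the first run.
    Returns (status, delta).
    """
    if previous is None:
        return "OK", 0  # first run: nothing to compare against yet
    if current < previous:
        # Assumption: a lower counter means the statistics were reset.
        return "OK", 0
    delta = current - previous
    if delta >= critical:
        return "CRITICAL", delta
    if delta >= warning:
        return "WARNING", delta
    return "OK", delta


# Hypothetical counter readings from five consecutive runs.
state = None
results = []
for counter in [1000, 1100, 1700, 1700, 5]:
    results.append(delta_status(state, counter, warning=200, critical=500))
    state = counter  # a real service would persist this between runs

print(results)
```

Unlike a threshold on the raw counter, the status returns to OK as soon as deallocations stop, and the check interval no longer needs to be part of the threshold.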

@anayrat
Collaborator Author

anayrat commented Jan 11, 2024

Hello,
I added more checks, replaced the rate with the dealloc delta, and rephrased the service description.
If you want to test, run pgbench with this script:

\set id random(1,1000000)
set client.id = :id

Also enable pg_stat_statements.track_utility and set a low pg_stat_statements.max.
