
The ability to define GUNICORN_WORKERS and CONN_MAX_AGE env vars via helm chart's values #74

Open
Bregor opened this issue Aug 20, 2018 · 12 comments · May be fixed by teamhephy/workflow#66

Comments

@Bregor (Contributor) commented Aug 20, 2018

Following this code, without the ability to set GUNICORN_WORKERS the number of workers equals number_of_cpus * 4 + 1.
Django's default behaviour is to open one connection per worker (GUNICORN_WORKERS connections in total) and keep each one open for CONN_MAX_AGE seconds.
CONN_MAX_AGE is hard-coded right there to 600 seconds, and the max_connections parameter of PostgreSQL in deis-database is likewise hard-coded to 100.
So, let's imagine we have a big metal chassis with 40 CPUs.
Our GUNICORN_WORKERS value will be 40 * 4 + 1 = 161 (already greater than 100), and CONN_MAX_AGE is 600 seconds.
This way, after 500 seconds (the health check runs once per 5 seconds, so 100 times) the SELECT 0 queries issued by the health check will own all 100 connections to PostgreSQL.
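The arithmetic above can be sketched in a few lines of Python. The worker formula, health-check period, connection TTL, and connection limit are the values quoted in this thread; the script itself is only illustrative:

```python
# Illustrative sketch of the numbers described above.
# Assumed values (all taken from this thread): workers = cpus * 4 + 1,
# health check every 5 s, CONN_MAX_AGE = 600 s, max_connections = 100.

def gunicorn_workers(cpus: int) -> int:
    """Default worker count used by the controller: number_of_cpus * 4 + 1."""
    return cpus * 4 + 1

CPUS = 40
MAX_CONNECTIONS = 100
CONN_MAX_AGE = 600       # seconds a pooled connection stays open
HEALTHCHECK_PERIOD = 5   # seconds between SELECT 0 health checks

workers = gunicorn_workers(CPUS)
print(workers)           # 161 -- already more workers than allowed connections

# Each health check can cause one more worker to open a connection, and none
# is released before CONN_MAX_AGE elapses.  The pool is exhausted once
# max_connections health checks have run:
seconds_to_exhaustion = MAX_CONNECTIONS * HEALTHCHECK_PERIOD
print(seconds_to_exhaustion)   # 500 -- well inside the 600 s TTL
```

Because 500 seconds is shorter than the 600-second TTL, all 100 connections are still alive when the pool runs dry.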

@kingdonb (Member) commented Aug 20, 2018

So do you need to change CONN_MAX_AGE? I'm not sure why.

It's clear at least that you probably don't really want 161 workers on a 24-CPU controller node, unless it's intended to serve a lot of devs concurrently... but I'm not sure how changing CONN_MAX_AGE will help anything.

You need to either reduce the number of workers or increase the number of allowed connections.

It makes sense to me that any workers that can't get a database handle can't help at all. Timing them out won't make things better (well, maybe that's wrong... even with 161 healthy workers, you probably wouldn't want 161 idle database connections held open at all times unless you actually had 161 workers actively serving client requests).

@kingdonb (Member) commented

I don't see any reason we can't add both of these to the controller params; I'm just not sure whether we should also add a configuration option for the number of connections allowed to the database while we're in there. (You might actually want to serve thousands of concurrent app developers, in which case the default value of GUNICORN_WORKERS may be exactly what you wanted.)

That's probably a separate issue, though.

@Cryptophobia (Member) commented

@Bregor So you are currently fixing this by setting ENV vars on the deis-controller deployment object? We should alternatively allow users to define GUNICORN_WORKERS and CONN_MAX_AGE in the helm values passed to the workflow install.

However, are the default values correct in most cases, when there are not that many users?

@Bregor (Contributor, Author) commented Aug 21, 2018

> you are currently setting using ENV vars on the deis-controller deployment object to fix this issue

Yes. The Deis database is a server with only one client (deis-controller), so this client should respect its settings. It should not be possible to run more gunicorn workers than max_connections allows in deis-database.

> However, are the default values set correct in most cases when there are not that many users?

TL;DR: for now, nodes cannot have more than 24 CPUs.

It's pretty easy to calculate:

  • healthcheck period: 5 seconds
  • TTL of a pooled connection: 600 seconds
  • actual limit of connections: 100
  • current number of workers: cpu_count * 4 + 1

This way, every 5 seconds a new worker wakes up and tries to run SELECT 0 against deis-database.
As long as no pooled connection is older than 600 seconds AND the pool (whose size equals GUNICORN_WORKERS) still has open slots, a new connection is opened and kept for 600 seconds.
Since one worker connects every 5 seconds, 120 workers (600/5) will have connected before the first connection in the pool is freed.
Given the worker-count formula and the fact that we cannot change the connection TTL, there is a strict limit of 24 CPUs per node for deis-controller to work: 24 * 4 + 1 = 97 workers fit within 100 connections, while 25 * 4 + 1 = 101 does not.
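The claimed 24-CPU ceiling can be checked directly with the same formula. All the constants come from the thread; the script is only a sanity check, not project code:

```python
# Sanity-check the 24-CPU limit: the worker count must stay at or below
# PostgreSQL's max_connections (hard-coded to 100 in deis-database).

def gunicorn_workers(cpus: int) -> int:
    """Worker formula from the thread: cpu_count * 4 + 1."""
    return cpus * 4 + 1

MAX_CONNECTIONS = 100
CONN_MAX_AGE = 600       # seconds
HEALTHCHECK_PERIOD = 5   # seconds

# Workers that connect before the first pooled connection is freed:
print(CONN_MAX_AGE // HEALTHCHECK_PERIOD)   # 120

# Largest CPU count whose worker count still fits within 100 connections:
limit = max(c for c in range(1, 128) if gunicorn_workers(c) <= MAX_CONNECTIONS)
print(limit, gunicorn_workers(limit))       # 24 CPUs -> 97 workers
print(gunicorn_workers(25))                 # 101 -- over the limit
```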

@kingdonb (Member) commented

Do you think this is an important knob? I kind of think we should just set a cap for the maximum allowed value of GUNICORN_WORKERS, unless we're also going to allow users to tune max_connections in deis-database.

Been reading this: https://github.com/concourse/concourse/wiki/Anti-Patterns#knobs

@kingdonb (Member) commented

I'd guess 64 as a reasonable high-end limit. If we're making it tuneable, it should be tuneable both ways.

@kingdonb (Member) commented

(Personally I use deis-database for something other than deis-controller so I don't want it to consume all 100 connections, if the limit is 100. I know that's against the advice, but I'm not sure it is documented anywhere as such...)

@Cryptophobia (Member) commented

> unless we're also going to allow users to tune max_connections in deis-database.

Doesn't max_connections in deis-database also depend on where you are running the PostgreSQL database? For example, you can tune the max_connections setting in AWS RDS and restart the db. This seems like a very custom request to support larger servers. I support making it tunable, but with sane defaults for users who are not going to dig that deep at first.

@Bregor (Contributor, Author) commented Aug 25, 2018

A tuneable max_connections is the cherry on the cake. Most of the time it would be absolutely useless. To paraphrase B. Gates: "100 connections ought to be enough for everyone" ;)
With short-lived pooled connections, 100 actual connections are plenty to serve hundreds of developers on a site.

@kingdonb (Member) commented Aug 25, 2018

I'm just trying to avoid adding an unnecessary knob, because it's possible to tune a knob wrong.

I'm not actually in favor of adding either knob if we can simply write a rule that guarantees it never fails on machines with >24 CPUs. That's a good point, though: we can't always guarantee max_connections will default to 100. From a quick Google, it actually looks like the default setting of max_connections on RDS Postgres is...

uhh, parameterized[1] based on the amount of memory in your database? Well, that's super... so my idea is out the window. There is no rule we could write that is guaranteed to be correct, at least not without additional knowledge about your off-cluster database, if that's the case.

I'll consider that due diligence done. I'm satisfied that we should add a way to tune GUNICORN_WORKERS and CONN_MAX_AGE, and skip any further consideration of max_connections for now, since nobody here is really asking for that.

@kingdonb (Member) commented

started in #75

@kingdonb (Member) commented

I haven't tested teamhephy/workflow#66 yet, but it should work
