
Timers "leaking" ? #69

Open · wbednarczyk opened this issue Apr 30, 2020 · 3 comments


wbednarczyk commented Apr 30, 2020

Hi,
On several of our servers we use the "healthcheck" module. Unfortunately something is wrong, because after a while these messages start to appear:
"failed to create timer: too many pending timers".
I know it's a standard error message, but it seemed strange to me, so I decided to check what was going on.
It looks like only a few timers run at any given moment (although the reported running count is strangely negative), but the number of pending timers keeps increasing up to the configured maximum. Increasing the maximum doesn't help; saturation just takes a little longer.
After about an hour the logs show values like this:

Current running timers -2129 (??!!!)
Current pending timers 4096
failed to spawn health checker: failed to create timer: too many pending timers

I don't think it's possible for a health check to take so long that it blocks timers this way; unfortunately my Lua skills are a bit too limited to debug it properly.
Is this a known issue? Can it be avoided somehow?
Any help or insights would be much appreciated.

Our config looks like this:

    local hc = require "resty.upstream.healthcheck"

    local ok, err = hc.spawn_checker{
        shm = "healthcheck1",
        upstream = "application_karaf",
        type = "http",
        http_req = "GET /tenant/health HTTP/1.0\r\nHost: localhost\r\n\r\n",
        interval = 3000, timeout = 1500, fall = 3, rise = 2,
        valid_statuses = {200, 302},
        concurrency = 10,
    }
    if not ok then
        ngx.log(ngx.ERR, "failed to spawn health checker: ", err)
        ngx.log(ngx.ERR, "Current pending timers ", ngx.timer.pending_count())
        ngx.log(ngx.ERR, "Current running timers ", ngx.timer.running_count())
        return
    end
wbednarczyk changed the title from Timers to Timers "leaking" ? on Apr 30, 2020
spacewander (Member) commented

A weird problem.
spawn_checker is expected to be called only once; it then re-arms its own timer at a constant interval. Do you call this method multiple times?
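
One quick way to check, sketched here as a hypothetical diagnostic (this log line is not part of the library): put it at the very top of healthcheck.lua and watch the error log under traffic.

    -- hypothetical diagnostic: if the file is executed per request, this
    -- line floods the error log; if it runs once per worker, it appears
    -- only worker_processes times after startup
    ngx.log(ngx.WARN, "healthcheck.lua invoked in worker ", ngx.worker.pid())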

rainingmaster (Member) commented

@wbednarczyk
Could you share more of your Nginx config? It seems you run spawn_checker, which creates a never-ending recurring timer, in a per-request phase (content_by_lua*, rewrite_by_lua*) instead of in init_worker_by_lua_block?
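
For reference, a minimal sketch of the placement the README describes, with the checker spawned once per worker from init_worker_by_lua_block (the shm name, upstream, and request line are copied from the snippet above):

    http {
        lua_shared_dict healthcheck1 1m;

        init_worker_by_lua_block {
            local hc = require "resty.upstream.healthcheck"

            -- runs once per worker process at startup, so only one
            -- recurring timer chain is created for this checker
            local ok, err = hc.spawn_checker{
                shm = "healthcheck1",
                upstream = "application_karaf",
                type = "http",
                http_req = "GET /tenant/health HTTP/1.0\r\nHost: localhost\r\n\r\n",
                interval = 3000, timeout = 1500, fall = 3, rise = 2,
                valid_statuses = {200, 302},
                concurrency = 10,
            }
            if not ok then
                ngx.log(ngx.ERR, "failed to spawn health checker: ", err)
            end
        }
    }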

Besides, you can share your ideas or ask questions in the official forum: https://forum.openresty.us/t/en-discussion


wbednarczyk commented May 13, 2020

Hi,
sorry for the late response.

@spacewander It seems so. What would be the proper way to use spawn_checker with multiple health checks?
@rainingmaster More of our config is below.

I would really appreciate any help on this topic. Thanks in advance!

EDIT:
I re-read your comments, and I think you are suggesting that multiple spawn_checker calls have to go inside init_worker_by_lua_block, as described in https://github.com/openresty/lua-resty-upstream-healthcheck/blob/master/README.markdown#multiple-upstreams. Am I right here?

  • our healthcheck.lua actually defines 5 health checks in the way shown below (I'm not copying the whole file, because further down it contains some Chef lines for templating)
    local hc = require "resty.upstream.healthcheck"

    local ok, err = hc.spawn_checker{
        shm = "healthcheck1",
        upstream = "karaf",
        type = "http",
        http_req = "GET /tenant/health?hch1 HTTP/1.0\r\nHost: localhost\r\n\r\n",
        interval = 4000, timeout = 1500, fall = 3, rise = 5,
        valid_statuses = {200, 302},
        concurrency = 10,
    }
    if not ok then
        ngx.log(ngx.ERR, "failed to spawn health checker: ", err)
        return
    end

    local ok, err = hc.spawn_checker{
        shm = "healthcheck2",
        upstream = "karaf_with_ip_hash",
        type = "http",
        http_req = "GET /tenant/health?hch2 HTTP/1.0\r\nHost: localhost\r\n\r\n",
        interval = 4000, timeout = 1500, fall = 3, rise = 5,
        valid_statuses = {200, 302},
        concurrency = 10,
    }
    if not ok then
        ngx.log(ngx.ERR, "failed to spawn health checker: ", err)
        return
    end
...

and the fragment of our nginx.conf which loads the Lua script:

user nginx;
worker_processes 2;

...

http {
    lua_max_pending_timers 8192;
    lua_shared_dict healthcheck1 1m;
    lua_shared_dict healthcheck2 1m;
    lua_shared_dict healthcheck3 1m;
    lua_shared_dict healthcheck4 1m;
    lua_shared_dict healthcheck5 1m;

    lua_socket_log_errors off;
    access_by_lua_file /etc/nginx/healthcheck.lua;
}
...
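
The access_by_lua_file line above is the likely culprit: the access phase runs on every request, so each request re-executes healthcheck.lua and spawns five more recurring timers, which matches the steadily growing pending count. A sketch of the fix, assuming the script contains no per-request logic: load the same file with init_worker_by_lua_file instead, so it runs exactly once per worker at startup.

    http {
        ...
        lua_socket_log_errors off;
        # run healthcheck.lua once per worker process at startup,
        # instead of on every request
        init_worker_by_lua_file /etc/nginx/healthcheck.lua;
    }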
