Not a valid WebSocket upgrade request crashes #149

pvthuyen · 2023-05-19T09:50:20Z

Apologize if this is probably not the best place to raise my question.

We’ve recently experimented on using Bandit instead of Cowboy for our Websocket service built on Elixir. With just a small percentage of our traffic, we saw a lot of crashes saying “Not a valid WebSocket upgrade request”. We followed the instructions for migration in https://hexdocs.pm/bandit/1.0.0-pre.1/Bandit.html.

Are there any other configuration which we should do? Is this possible that this issue has always been happening and Cowboy does not raise it but Bandit does?

mtrudel · 2023-05-19T13:58:48Z

Interesting! If you're seeing that message, then you're ending up in a situation where your Plug has indicated an upgrade should be performed, but the underlying HTTP request isn't a valid WebSocket upgrade (due to how all the pieces of Plug fit together we can't identify this situation until it's too late for your Plug call to do anything about it, hence the crash).

Offhand, I don't know if Cowboy may have always silently been failing in a similar manner. As a general observation, Cowboy tends to err in the direction of being silent, where Bandit has an explicit policy of being noisy in the face of unexpected situations. If this is the case, you certainly wouldn't be the first person to discover that you'd always had a broken setup and Cowboy was just being quiet about it.

Some questions to try and diagnose this:

What is your Plug? If this is Phoenix, do you have any non-standard socket configurations in your endpoint?
Is there anything in the logs adjacent to these messages that may provide some clues?
What are your clients? Are they connecting to Bandit directly, or through a reverse proxy or similar?
If it's possible to share logs (or even better, if this is an open source code base), any other clues you have would possibly help immensely.

My guesses here are (in no particular order):

You have a busted client that's trying to upgrade but is somehow failing (again, this may have been happening forever under Cowboy)
You have a client that's trying to hit a websocket upgrade endpoint as if it were a regular HTTP endpoint
There's some oddball combination of protocols here (connecting HTTP to an HTTPS endpoint, etc)
There's a bug in Bandit's upgrade handling (frankly I would be surprised by this; Autobahn is a very comprehensive test suite and we run it on every CI run, but it's certainly possible)

pvthuyen · 2023-05-22T07:54:55Z

Hey @mtrudel, thanks a lot for your prompt response:

To answer your questions:

We're using Phoenix and we do not have any non-standard socket configuration
This is the log I found which I believe they're related to the crahses:

07:44:32.443 [error] GenServer #PID<0.3352.0> terminating
** (stop) "Not a valid WebSocket upgrade request"
Last message: {:continue, :handle_connection}

We're using the phoenix-js npm package for our frontend apps. All connections are from users' browsers, the connection goes through a Cloudflare Load Balancer, then to our Google Cloud Load Balancer, and finally to our VMs. Only the traffic between our GC Load Balancer and our VMs is served on HTTP
We also saw these errors on Javascript frontend around the same time if it helps:

SecurityError
Failed to construct 'WebSocket': Access to address of 'websocket.example.com' is not allowed from 'public' address space.

SecurityError
Failed to construct 'WebSocket': Access to this address is not allowed.

It seems like we only saw this happening on a few users as I cannot reproduce the error on my side.

Thanks a lot for helping us out and please let me know if you need any other information.

mtrudel · 2023-05-23T21:29:23Z

I've added improved logging for the upgrade failure reason in 1.0.0-pre.4 (just released). Give it a try @pvthuyen and see what comes out in the logs (the improved logging will appear in place where ** (stop) "Not a valid WebSocket upgrade request" appeared previously).

pvthuyen · 2023-05-24T06:42:07Z

Thanks @mtrudel. All of the crashes now gives ** (exit) "WebSocket upgrade failed: error in upgrade_header check".

pvthuyen · 2023-05-24T07:10:53Z

It looks like Cowboy returns 426 responses for requests with wrong Upgrade header and does not crash.
https://github.com/ninenines/cowboy/blob/22adc4de517bf54661795f6e18b6b2e62aec4140/src/cowboy_websocket.erl#L147

mtrudel · 2023-05-24T13:18:51Z

Interesting! 426 is certainly the wrong choice there (the semantics of 426 refer to the current protocol, not the desired protocol outlined in an Upgrade header); 400 continues to be the most reasonable choice, so in terms of response codes we're doing it right (at least IMHO).

In terms of crashing, there's a tricky balance here. If the client is indeed attempting to upgrade to a WebSocket connection then it's not a great idea to attempt to reuse the underlying TCP connection via keepalives, so closing the underlying connection is likely a good idea. We also want to notify via logging that an error occurred so we can debug issues (like this one!). That being said we can do a better job of doing this without crashing; I'll work up a better approach for the next release.

Regardless, it would seem that your upstream client is doing something wrong here (that, or we have a bug!). To diagnose this, you'll want to find out the header values that the broken client(s) are sending when attempting to upgrade, which is probably going to be difficult for you given that you've mentioned it's not reproducible. I'll make sure that the improvements I land in the next version capture this logging as well as possible. Stay tuned.

pvthuyen · 2023-05-25T06:56:18Z

Thanks a lot. I think this issues have always been happening for us since I recalled seeing a lot of 426 responses in our services. We're using the phoenix-js npm package on our frontends with no other special configuration so I'm gonna guess it could be because of something on the user's end, e.g. extensions, configuration.

I'm going to ignore this crashes for now to reduce noise on our side first.

woylie · 2023-06-13T07:10:16Z

We're seeing a whole bunch of ** (exit) "WebSocket upgrade failed: error in upgrade_header check" errors as well (Bandit 1.0.0-pre.6). I don't know what to do about them, but they do clog up our error logs. If anybody knows how to deal with this, please let me know.

mtrudel · 2023-06-14T18:29:46Z

I've pushed improved logging for this to main and have released it as part of 1.0.0-pre.7. Give it a shot and let me know what the improved error messages are; this should help narrow down the culprit!

woylie · 2023-06-15T06:03:39Z

Now we're getting this:

** (exit) "WebSocket upgrade failed: error in upgrade_header check: \"Did not find upgrade in websocket\""

mtrudel · 2023-06-15T12:44:08Z

Ugh. I switched the needle and the haystack values in the logging output; those logs aren't helpful. Let me cut a new version. Sorry for the trouble.

mtrudel · 2023-06-15T12:55:29Z

1.0.0-pre.8 just released with updated logging (again!). Give that a shot.

woylie · 2023-06-16T00:22:59Z

Now we get this:

** (exit) "WebSocket upgrade failed: error in upgrade_header check: \"Did not find 'websocket' in ''\"")

🤔

mtrudel · 2023-06-16T01:24:58Z

There you go. Your clients aren't sending a proper websocket upgrade (per RFC6455§4.2.1 clause 3, the Upgrade header MUST contain websocket, and your clients are not doing so).

My hunch here is that you've either got you last hop load balancers misconfigured, or (more likely), your clients are hitting a websocket upgrade path as a regular HTTP request. Anything in your logs to suggest either?

woylie · 2023-06-16T02:05:39Z

This is a Phoenix/LiveView application. Websockets work fine, so I don't think it's a misconfiguration.

Some clients seem to do GET requests on /live/websocket indeed. Among the user agents are some bots like bingbot or semrush bot, but also mobile safari or LINE. I don't know why the latter would make those requests. The app works fine in mobile safari as well, we only see these errors occasionally.

It seems like this is something outside of our control, so I don't think we need to handle this with exceptions that end up in Sentry. Do you think it would be better if bandit handles this silently?

mtrudel · 2023-06-16T15:48:41Z

Some clients seem to do GET requests on /live/websocket indeed. Among the user agents are some bots like bingbot or semrush bot, but also mobile safari or LINE. I don't know why the latter would make those requests. The app works fine in mobile safari as well, we only see these errors occasionally.

This is very much expected for normal WebSocket upgrade calls (after all, all WebSockets start as GETs). The problem is what happens when a client connects expecting a regular HTTP response, but the server mistakenly tries to upgrade the connection to a WebSocket. This is what you're seeing here.

This is happening because Phoenix/WebsockAdapter is pretty loosey-goosey with what it tries to upgrade (basically, it tries to upgrade any GET request to a designated socket endpoint, as codified here). What it should be doing is actually validating the clauses in the RFC6455§4.2.1 handshake before signalling an upgrade to the underlying server, because once such an upgrade is signalled there's no way for the user to express any control over what happens in the case of a failed upgrade.

In light of this Bandit chooses to fail upgrade requests loudly (as you're seeing), in keeping with our stated goal of not codifying policy. Beyond this particular issue, this goal is proving to be a bit of a pain as it's exposing places where the Phoenix / Plug / Cowboy interface was leaking and nobody noticed since Cowboy's usual behaviour is to silently fail (examples #144, #106, #101). Thanks for bearing with the pain of helping to plug (ha!) all these leaky abstractions.

I'm going to close this issue and start on a PR to add the relevant improvements described above to WebSockAdapter. I'll keep this issue linked to those changes.

Thanks again for your patience and diagnostic work on this!

mtrudel mentioned this issue May 23, 2023

Improve WebSocket upgrade error checking #152

Merged

mtrudel mentioned this issue May 24, 2023

content-encoding negotiation should be disabled when the response headers already contain content-encoding #151

Closed

mtrudel closed this as completed Jun 16, 2023

mtrudel mentioned this issue Jun 16, 2023

Provide a mechanism to validate upgrade requests before issuing upgrades phoenixframework/websock_adapter#11

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Not a valid WebSocket upgrade request crashes #149

Not a valid WebSocket upgrade request crashes #149

pvthuyen commented May 19, 2023

mtrudel commented May 19, 2023 •

edited

Loading

pvthuyen commented May 22, 2023 •

edited

Loading

mtrudel commented May 23, 2023

pvthuyen commented May 24, 2023

pvthuyen commented May 24, 2023

mtrudel commented May 24, 2023

pvthuyen commented May 25, 2023

woylie commented Jun 13, 2023

mtrudel commented Jun 14, 2023

woylie commented Jun 15, 2023

mtrudel commented Jun 15, 2023

mtrudel commented Jun 15, 2023

woylie commented Jun 16, 2023

mtrudel commented Jun 16, 2023

woylie commented Jun 16, 2023 •

edited

Loading

mtrudel commented Jun 16, 2023 •

edited

Loading

Not a valid WebSocket upgrade request crashes #149

Not a valid WebSocket upgrade request crashes #149

Comments

pvthuyen commented May 19, 2023

mtrudel commented May 19, 2023 • edited Loading

pvthuyen commented May 22, 2023 • edited Loading

mtrudel commented May 23, 2023

pvthuyen commented May 24, 2023

pvthuyen commented May 24, 2023

mtrudel commented May 24, 2023

pvthuyen commented May 25, 2023

woylie commented Jun 13, 2023

mtrudel commented Jun 14, 2023

woylie commented Jun 15, 2023

mtrudel commented Jun 15, 2023

mtrudel commented Jun 15, 2023

woylie commented Jun 16, 2023

mtrudel commented Jun 16, 2023

woylie commented Jun 16, 2023 • edited Loading

mtrudel commented Jun 16, 2023 • edited Loading

mtrudel commented May 19, 2023 •

edited

Loading

pvthuyen commented May 22, 2023 •

edited

Loading

woylie commented Jun 16, 2023 •

edited

Loading

mtrudel commented Jun 16, 2023 •

edited

Loading