Not a valid WebSocket upgrade request crashes #149
Interesting! If you're seeing that message, then you're ending up in a situation where your Plug has indicated an upgrade should be performed, but the underlying HTTP request isn't a valid WebSocket upgrade (due to how all the pieces of Plug fit together we can't identify this situation until it's too late for your Plug call to do anything about it, hence the crash). Offhand, I don't know if Cowboy may have always silently been failing in a similar manner. As a general observation, Cowboy tends to err in the direction of being silent, where Bandit has an explicit policy of being noisy in the face of unexpected situations. If this is the case, you certainly wouldn't be the first person to discover that you'd always had a broken setup and Cowboy was just being quiet about it. Some questions to try and diagnose this:
My guesses here are (in no particular order):
|
Hey @mtrudel, thanks a lot for your prompt response: To answer your questions:
It seems we only saw this happening for a few users, as I cannot reproduce the error on my side. Thanks a lot for helping us out, and please let me know if you need any other information. |
I've added improved logging for the upgrade failure reason in 1.0.0-pre.4 (just released). Give it a try @pvthuyen and see what comes out in the logs (the improved logging will appear in place where |
Thanks @mtrudel. All of the crashes now give |
It looks like Cowboy returns 426 responses for requests with wrong |
Interesting! 426 is certainly the wrong choice there (the semantics of 426 refer to the current protocol, not the desired protocol outlined in an Upgrade header); 400 continues to be the most reasonable choice, so in terms of response codes we're doing it right (at least IMHO). In terms of crashing, there's a tricky balance here. If the client is indeed attempting to upgrade to a WebSocket connection then it's not a great idea to attempt to reuse the underlying TCP connection via keepalives, so closing the underlying connection is likely a good idea. We also want to notify via logging that an error occurred so we can debug issues (like this one!). That being said we can do a better job of doing this without crashing; I'll work up a better approach for the next release. Regardless, it would seem that your upstream client is doing something wrong here (that, or we have a bug!). To diagnose this, you'll want to find out the header values that the broken client(s) are sending when attempting to upgrade, which is probably going to be difficult for you given that you've mentioned it's not reproducible. I'll make sure that the improvements I land in the next version capture this logging as well as possible. Stay tuned. |
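The non-crashing behaviour described above (log the reason, answer 400, and do not reuse the connection) could look roughly like the following. This is only an illustrative sketch in Python, not Bandit's actual implementation; the function name `failed_upgrade_response` and the `log` callback are hypothetical.

```python
def failed_upgrade_response(reason, log):
    """Build a response for an invalid upgrade request instead of raising.

    `reason` is a human-readable explanation of why the upgrade failed.
    `log` is any callable that accepts a message string (hypothetical hook).
    """
    # Be noisy: record why the upgrade failed so the issue can be debugged.
    log(f"WebSocket upgrade failed: {reason}")
    # 400 rather than 426: the request itself is malformed.
    status = 400
    # Ask for the TCP connection to be closed rather than kept alive.
    headers = {"connection": "close"}
    return status, headers
```

A caller would pass its own logger, for example `failed_upgrade_response("missing upgrade header", print)`.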
Thanks a lot. I think this issue has always been happening for us, since I recall seeing a lot of 426 responses in our services. We're using the I'm going to ignore these crashes for now to reduce noise on our side first. |
We're seeing a whole bunch of |
I've pushed improved logging for this to |
Now we're getting this:
|
Ugh. I switched the needle and the haystack values in the logging output; those logs aren't helpful. Let me cut a new version. Sorry for the trouble. |
Now we get this:
🤔 |
There you go. Your clients aren't sending a proper WebSocket upgrade (per RFC6455§4.2.1 clause 3, the My hunch here is that you've either got your last-hop load balancers misconfigured, or (more likely) your clients are hitting a WebSocket upgrade path as a regular HTTP request. Anything in your logs to suggest either? |
This is a Phoenix/LiveView application. WebSockets work fine, so I don't think it's a misconfiguration. Some clients seem to do GET requests on It seems like this is something outside of our control, so I don't think we need to handle this with exceptions that end up in Sentry. Do you think it would be better if Bandit handled this silently? |
This is very much expected for normal WebSocket upgrade calls (after all, all WebSockets start as GETs). The problem is what happens when a client connects expecting a regular HTTP response, but the server mistakenly tries to upgrade the connection to a WebSocket. This is what you're seeing here. This is happening because Phoenix/WebSockAdapter is pretty loosey-goosey with what it tries to upgrade (basically, it tries to upgrade any GET request to a designated socket endpoint, as codified here). What it should be doing is actually validating the clauses in the RFC6455§4.2.1 handshake before signalling an upgrade to the underlying server, because once such an upgrade is signalled there's no way for the user to express any control over what happens in the case of a failed upgrade. In light of this, Bandit chooses to fail upgrade requests loudly (as you're seeing), in keeping with our stated goal of not codifying policy. Beyond this particular issue, this goal is proving to be a bit of a pain, as it's exposing places where the Phoenix / Plug / Cowboy interface was leaking and nobody noticed, since Cowboy's usual behaviour is to silently fail (examples #144, #106, #101). Thanks for bearing with the pain of helping to plug (ha!) all these leaky abstractions. I'm going to close this issue and start on a PR to add the relevant improvements described above to WebSockAdapter. I'll keep this issue linked to those changes. Thanks again for your patience and diagnostic work on this! |
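For readers unfamiliar with the handshake, the mandatory client-side requirements of RFC 6455 §4.2.1 (GET over HTTP/1.1 or higher, a Host header, an Upgrade header containing "websocket", a Connection header including the "upgrade" token, a 16-byte base64 Sec-WebSocket-Key, and Sec-WebSocket-Version 13) can be checked before any upgrade is signalled. The sketch below is in Python purely for illustration; it is not the WebSockAdapter code, and the names `header_tokens` and `validate_upgrade`, as well as the lowercase-keyed `headers` dict, are assumptions of this example.

```python
import base64
import binascii


def header_tokens(value):
    """Split a comma-separated header value into lowercase tokens."""
    return [token.strip().lower() for token in value.split(",") if token.strip()]


def validate_upgrade(method, http_version, headers):
    """Return None if the request passes the checks, else a reason string.

    `http_version` is a (major, minor) tuple; `headers` is a dict keyed by
    lowercase header names. Both shapes are assumptions of this sketch.
    """
    if method != "GET":
        return "method is not GET"
    if http_version < (1, 1):
        return "HTTP version is below 1.1"
    if "host" not in headers:
        return "missing host header"
    # Header values are compared case-insensitively, token by token.
    if "websocket" not in header_tokens(headers.get("upgrade", "")):
        return "upgrade header does not contain 'websocket'"
    if "upgrade" not in header_tokens(headers.get("connection", "")):
        return "connection header does not contain 'upgrade'"
    try:
        key = base64.b64decode(headers.get("sec-websocket-key", ""), validate=True)
    except (binascii.Error, ValueError):
        return "sec-websocket-key is not valid base64"
    if len(key) != 16:
        return "sec-websocket-key does not decode to 16 bytes"
    if headers.get("sec-websocket-version", "").strip() != "13":
        return "sec-websocket-version is not 13"
    return None
```

With a check like this, a plain GET that lands on a socket path yields a reason string, so the application can still answer with an ordinary HTTP response instead of committing to an upgrade that the server must then reject.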
Apologies if this is not the best place to raise my question.
We’ve recently experimented with using Bandit instead of Cowboy for our WebSocket service built on Elixir. With just a small percentage of our traffic, we saw a lot of crashes saying “Not a valid WebSocket upgrade request”. We followed the migration instructions in https://hexdocs.pm/bandit/1.0.0-pre.1/Bandit.html.
Is there any other configuration we should do? Is it possible that this issue has always been happening and Cowboy does not raise it but Bandit does?