Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Plug.Conn.send_resp in separate process fails for HTTP/2 streams #101

Closed
danschultzer opened this issue Feb 23, 2023 · 10 comments
Closed

Plug.Conn.send_resp in separate process fails for HTTP/2 streams #101

danschultzer opened this issue Feb 23, 2023 · 10 comments

Comments

@danschultzer
Copy link
Contributor

Took me a while to track this one down. When testing with HTTP/2 I was seeing failure with Stream 3 completed in unexpected state remote_closed and empty responses even though I definitely did send the response!

The culprit is this line (the same for send_data):

:ok <- Stream.owner?(stream, pid),

In TestServer the route matching lives in a GenServer. When a requests hits the generic plug on the HTTP server, the plug will pass off the conn to the genserver to be processed in a handle_call callback. This means that caller pid will be different when calling send_resp.

Resolving this in TestServer may be awkward since the routing state lives in that GenServer process. Maybe it does make sense that the send_resp call should only happen in the generic plug process. But what's the reasoning behind the owner?/2 check? Why is it necessary to lock the conn processing to the initial caller?

FWIW cowboy works with the current logic in TestServer. It works as expected with Bandit when I remove this check, but not sure if there are any sideeffects to doing that. If it's necessary to have this check then it would be good to have a clearer warning.

@danschultzer danschultzer changed the title Plug conn in separate process fails for HTTP/2 streams Plug.Conn.send_resp in separate process fails for HTTP/2 streams Feb 23, 2023
@mtrudel
Copy link
Owner

mtrudel commented Feb 23, 2023

That's an oddball one! I suppose the plug API doesn't specify anything explicitly about cross-process plug usage, but it conceptually seems like doing so could introduce a ton of gnarly concurrency issues.

@mtrudel
Copy link
Owner

mtrudel commented Feb 23, 2023

Brought it up on the forum: https://elixirforum.com/t/are-there-restrictions-on-making-plug-conn-calls-across-multiple-processes/54088/1. Let's see what people say.

@danschultzer
Copy link
Contributor Author

Great! Yeah, maybe this is just an extreme edge case, and it was never really meant to handle the conn outside of the owner process. I can't think of any other library that would do this other than bypass.

I looked at the plug sourcecode. The :owner attribute seems to only be used to pass messages of what has happened with the conn:

https://github.com/elixir-plug/plug/blob/d7940552e365e0d3245bb78ca99429ea5c324f7d/lib/plug/conn.ex#L447-L455
https://github.com/elixir-plug/plug/blob/d7940552e365e0d3245bb78ca99429ea5c324f7d/lib/plug/conn.ex#L486-L506
https://github.com/elixir-plug/plug/blob/d7940552e365e0d3245bb78ca99429ea5c324f7d/lib/plug/conn.ex#L528-L534

The test conn is interesting, sending messages to the owner pid (I want to dig into where the test conn is used): https://github.com/elixir-plug/plug/blob/12dbfb817d1951c955afe90d0f4f4bf4bfd4e8b9/lib/plug/adapters/test/conn.ex

AFAIK it's used for Phoenix ConnTest, but I'm guessing everything happens in the same test process there.

@danschultzer
Copy link
Contributor Author

FWIW I pushed this PR danschultzer/test_server#9 where I experience the issue (I realized I wasn't testing HTTP/2 streams). When I got more time I'll dig deeper into how the owner pid attribute is used in plug.

@mtrudel
Copy link
Owner

mtrudel commented Mar 10, 2023

Forum post was inconclusive; not really much to do here.

@mtrudel mtrudel closed this as completed Mar 10, 2023
@ryanwinchester
Copy link
Contributor

I've had issues with gRPC/Gun across processes, too. Instead of trying to understand it, I just rewrote that part. Smol-brained solutions.

@danschultzer
Copy link
Contributor Author

I've thought a lot more about this trying to figure out how this should be solved in TestServer.

Since the conn.owner attribute is not enforced to be the only caller for any functions affecting the conn in the Plug.Conn module, I assume that those functions (including send_resp) can be called from any process. WebSocket gets around any issues by just exposing a behaviour and it'll call the functions from within the controlling pid.

Another thing is that Bandit.HTTP1.Adapter doesn't care what process you call send_resp from, so there's an inconsistency here in how Bandit deals with the process owner (in both cases ThousandIsland.Socket.send is called though HTTP1 it only happens once).

Plug.Cowboy doesn't use the owner attribute for anything, because it's not necessary. It calls :cowboy.req module that just sends it back to the controlling process:

https://github.com/elixir-plug/plug_cowboy/blob/0227c58613d02d4d6885dd4df01d297ccdb95c68/lib/plug/cowboy/conn.ex#L32-L37

https://github.com/ninenines/cowboy/blob/30ee75cea14c1fd63a46093cfb2bb9ad3f844f75/src/cowboy_req.erl#L829-L834

https://github.com/ninenines/cowboy/blob/30ee75cea14c1fd63a46093cfb2bb9ad3f844f75/src/cowboy_req.erl#L920-L923

And as I said earlier, the conn test adapter also sends the message back to the controlling process:

https://github.com/elixir-plug/plug/blob/2ef07cdd2732cde5cac73fc39b49fe83d5fcc369/lib/plug/adapters/test/conn.ex#L98-L101

I think how Bandit should work is to send the message back to the controlling process rather than just ignoring the message. A question I got; isn't that already kinda what happens with the ThousandIsland.Socket.send call? 'm unsure what concurrency issue can occur here as the :send_headers and :send_data messages always goes to the Bandit.HTTP2.Handler GenServer for the connection pid, and those messages are processed sequentially?

@josevalim I would appreciate any input here if you got some insights on how to treat the conn.owner attribute when dealing with the conn callbacks 😄

@danschultzer
Copy link
Contributor Author

I solved this in TestServer by just using Plug.Conn.resp/3 instead of Plug.Conn.send_resp. It'll make it so the controlling process sends the response on plug termination. Bandit, Plug.Cowboy, and my custom :httpd adapter all call send_resp/3 in the plug handler so this solves the issue!

I also opened #115 to surface the controlling process error with a warning. It'll still be good to have a more definite answer on the correct approach with controlling process for the conn, but my issue is solved (as long as nobody calls Plug.Conn.send_resp/3 in their tests using TestServer with Bandit).

@mtrudel
Copy link
Owner

mtrudel commented Mar 26, 2023

I added the sending PID constraint code mostly as a footgun suppressant if people were to try calling from multiple processes. Rather than leave open the possibility of oddball concurrency issues, I figured I would enforce a strict owner process constraint, which is what you see here.

That said, I'm 90% sure that it isn't strictly required; I could drop the owner constraint & things would function just fine; we'd basically have the same level of concurrency issue resilience that Cowboy / Plug has in this regard. I'm quite open to that.

I've pushed up an implementation of this as #117. Have a look; I think it would simplify a bunch of the issues folks have been having with this, and concedes only on points I don't feel too strongly about.

@mtrudel
Copy link
Owner

mtrudel commented Jan 12, 2024

I added a note on #215 about upcoming changes to how we do PID checking that will apply here as well

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants