
Server-side FFI bindings #3084

Open · bossmc wants to merge 65 commits into base branch master
Conversation

@bossmc (Contributor) commented Dec 14, 2022

Continues the work started in #2278 by adding bindings to allow using hyper's server stack from a C runtime. The patterns are based on the model used on the client side, and the two can be used simultaneously (e.g. using the client bindings to make downstream HTTP calls as part of serving an upstream request).

I understand that the project is currently focused on the 1.0 stabilization (woo!) and I realise that this MR isn't part of that process, but there are various teams across my area of Microsoft who would really like to start using hyper in our C/C++ projects to take advantage of the benefits that hyper is already providing to our Rust codebases elsewhere. Thus it would be really great to get a first-pass review of the approach soon, to try and head off any churn down the line, if possible?

I've added an example C integration (capi/examples/server.c) that simply logs information about the request then responds with a 404 to everything.

Implementation details of note:

  • The service callback (where the C code implements business logic for handling requests) is given the request and a channel handle, and completes the request by passing a response back on the channel. This allows the service callback to be asynchronous/multi-threaded/deferred etc.
  • I added a timer heap implementation so the server configuration can set timeouts (e.g. to protect against Slowloris attacks). This currently exposes millisecond accuracy to the C code, since that's the granularity of epoll_wait's timeout argument, though there is also epoll_pwait2, which goes down to nanoseconds, so maybe the FFI should expose that too?
  • I made the ffi feature depend on all the other related features for now, mostly as a QoL measure for people using the FFI bindings (since the FFI code isn't feature-aware yet).
  • I added an AutoConnection layer that restores the behaviour of trying HTTP/1.x and, if it sees the HTTP/2 prelude, rewinding and switching to HTTP/2 mode. I've tried to make this generic, so it is possibly in a fit state to be lifted to hyper-utils rather than being in the FFI code?

Other smaller notes:

  • Simplified the header-generation code a lot by using cargo expand's ability to show a single module from a crate (so we no longer need to build a fake crate to be able to run bindgen)
  • Formatted the C example code with clang-format (with some Rust-familiar settings)
  • Fixed some compiler warnings in the example C code
  • Made Sleep auto-impl for things that look like Sleep again as a QoL thing

I've tested the server example code under valgrind and -fsanitize=memory to ensure there's no use-after-free/double-free/etc.

bossmc and others added 30 commits October 31, 2022 18:34
* Fix some compiler issues (signedness and pointer sizing)
* Include What You Use
* Sort the includes
@bossmc (Contributor, Author) commented Dec 19, 2022

> Exposing each timer separately? I'm not sure I follow. Maybe you were referring to what I'm about to describe, but: I meant exposing hyper_timer * similar to hyper_io *. The user in C could call hyper_timer_set_sleep(timer, some_c_func), just like they do setting up the IO for hyper.

To be clear, the timer heap I've added is for hyper's internal use (backing things like the "header timeout" function on HTTP/1 streams); it's not exposed to the C code for general-purpose timers (I don't know if that matches your mental picture, but I thought I'd write it down explicitly just in case).

When I said "exposing each timer separately" I was picturing something like the following: hyper might be processing multiple requests on multiple sockets, each with multiple timeouts configured. In the model where there's a hyper_timer (like a hyper_io) provided by the C code (that presumably implements the Timer trait on the Rust side), that object inevitably gets notified about each set/cancelled timer and has to report back to hyper when any non-cancelled timer pops, which requires a reasonable amount of complexity in the C code to handle correctly and efficiently.

In contrast, with the heap inside the rust code, the C code is simply told "hyper would like to be polled in n microseconds or when an IO object becomes readable" which is hopefully very easy to integrate into whatever C event loop is driving the IO (or even to be added to some C-based timer heap along with other C-side timers if applicable).

> But it sounds like your users want to use a Rust timer anyways. I certainly get why (more in Rust is safer!). I'm currently undecided on the pros and cons of having it directly in hyper, versus having the heap timer be outside. It doesn't have to be decided immediately, we can choose to add it later, or choose to document that it might get moved out.

Fair enough, we can discuss this further down the line if needs be.

> > since people are already integrating with this branch internally
>
> 🙃 🥵

They're only doing de-risking prototyping work, but still, I'd prefer not to break their codebases if I can help it.

> > features don't really integrate very well with cdylib targets
>
> Why not? Isn't it typical to ./configure --blah --stuff C libraries, to include the features you want?

True, that is a common pattern. The difference, as I see it, is that the ./configure && make model builds the library and its header file (or at least a configure.h that's part of the header-file set), so the downstream project is guaranteed to see a consistent pair of library and headers. We could go a similar direction with the hyper-ffi library and have build tooling wrapping cargo build and gen_header that works a bit like ./configure and produces consistent output. Compare this to libraries shipped in OS packages, which are built with one set of features (normally "most of them") enabled, plus the matching header files; the consumer gets what they get (or builds the library from source if they need finer tuning).

What I'd want to avoid is there being a hyper/client.h and a hyper/server.h file checked into the codebase, with C users having to know which one(s) they can use based on how their libhyper.so was built. The simple option (maybe for now?) is to enable everything and expose one header file; then the C code sees a consistent view, but gets slightly bigger libraries than it maybe needs?

@seanmonstar (Member) commented
> The simple option (maybe for now?) is to enable everything and expose one header file - then the C code sees a consistent view, but gets slightly bigger libraries than maybe they need?

Ah yes, this makes sense. Let the exported symbols be consistent. We already have the start of this in the client code: there's a function to enable HTTP/2, but if the feature was not compiled, it just returns an error, something like HYPERE_FEATURE_NOT_ENABLED. We could expand that concept more, such that the client code returns error-ish values (I guess NULL if returning a pointer) when not enabled, and same thing with the server.

> Fair enough, we can discuss this further down the line if needs be.

Yea, I think I'm generally fine with all you've proposed. I'd just like to track its stabilization differently from the client FFI. Basically, I don't want us to have to decide the server stuff "is all good, freeze it forever" when stabilizing the client side for curl.

@bossmc (Contributor, Author) commented Dec 19, 2022

> Yea, I think I'm generally fine with all you've proposed. I'd just like to track its stabilization differently from the client FFI. Basically, I don't want us to have to decide the server stuff "is all good, freeze it forever" when stabilizing the client side for curl.

Absolutely, I'm happy for all the changes I'm proposing here to fall into the "this API is unstable and is immune from semver constraints" bucket. The main goal of proposing this upstream is to get it out there so it can, in time, stabilize, and in the meantime won't be broken by non-FFI changes.

Martin Howarth and others added 2 commits December 21, 2022 01:23
Fix header handling for requests

See merge request amc/hyper-ffi!2
@bossmc (Contributor, Author) commented Feb 28, 2023

Hi @seanmonstar! Hope all's going well. Investigations and implementation are continuing, and they've thrown up an interesting problem that I wanted to flag with you.

The basic problem is that when setting userdata on hyper_* objects, it can be hard to track the correct lifetime of that userdata object from the C code (e.g. a userdata for an IO object that backs a server connection needs to live for as long as the server connection task and probably wants to be cleaned up promptly when the server task completes in order to close sockets etc.).

The original implementation of the C server code just magically "knew" that when a server connection task completed the IO userdata and service userdata that backed it were no longer needed and could be freed, and to achieve this those userdata pointers were duplicated in the task userdata. This design means that there are multiple references to the IO/service userdata alive at the same time and the code frees one set of them when it's "pretty sure" the other ones have been forgotten and won't be read again. This made our C developers rightly nervous 😨 so I've implemented a different strategy that's more similar to how Rust would manage the memory.

hyper_io_set_userdata and hyper_service_set_userdata have both gained a drop_callback argument that (if provided) will be called on the userdata pointer when the owning object is dropped. This means that the owning object really owns the userdata in a rust ownership sense and the lifetimes are correctly aligned. I've allowed the C code to pass NULL as the callback to mean "don't drop this userdata, I'll do it myself" which I can see being useful if the userdata is stack-allocated (as it is in the client example code) or is shared between tasks.

I'm raising this change specifically as it's the first breaking change to the existing client-facing bindings that this MR has introduced 😄 and hence I wanted to get your opinion on the approach. I will say that the related changes to the example server integration (server.c) made the code much nicer/clearer IMO.

Additional Thoughts

  1. I've not done the same thing for the hyper_task userdata, since the expectation is that it will be read on task completion by the C code in order to work out what (if anything) needs to happen next for this piece of work, and thus the C code would have it in hand to free it if appropriate. The only benefit of adding a drop callback would be if someone called hyper_task_free without first reading the userdata from the task (e.g. in an error condition?).

  2. If someone calls hyper_X_set_userdata twice, then the first userdata is lost, hopefully the calling code still has a reference to it to free it/clean it up. I considered making hyper_X_set_userdata return any existing userdata (or call the clean-up callback?) to try and help but I don't have a hard requirement to solve this so I've not made any (more breaking) changes here before seeing what you think.

@bossmc (Contributor, Author) commented May 17, 2023

Coming back to those additional thoughts, I've come to the conclusion that it's better to be consistent everywhere, so hyper_task now takes an optional drop_callback, and setting the userdata twice will call any drop_callback passed with the first userdata blob (i.e. providing a drop callback is equivalent to "pass by move" in a Rust sense, and the C code should not reference the userdata pointer after setting it on the task/io/service/etc.).

I've re-read your last comment (re: stability guarantees) and I may have misunderstood your question, are you proposing a #[cfg(hyper_unstable_client_ffi)] and #[cfg(hyper_unstable_server_ffi)] so we can stabilize each half separately? I was assuming we could just remove some of the #[cfg] blocks if we wanted to stabilize some but not all of the FFI API?

I think the changes are now ready for review (the team that has been integrating with them has got their pre-existing test suite passing with these changes 😄 ). It would be great to try and get this in if possible (otherwise the branch will keep getting stale and needing catchup merges).

@seanmonstar (Member) commented
> I was assuming we could just remove some of the #[cfg] blocks if we wanted to stabilize some but not all of the FFI API?

Yes, that's probably the best way to do it. Whenever something does go stable, we just explicitly say which types/functions are stable in the changelog.

@bossmc (Contributor, Author) commented Oct 12, 2023

Coming back to this, I've merged the latest master into this branch and fixed up the discrepancies. Do you think you'll be able to review this soon, to prevent it bit-rotting again?
