Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add finalizer support in async API #1173

Open
wants to merge 12 commits into
base: master
Choose a base branch
from

Conversation

zuiderkwast
Copy link
Contributor

@zuiderkwast zuiderkwast commented Mar 12, 2023

This is another take on adding finalizer to the async API. Some things are taken from #926 but the implementation is completely rewritten and includes a rewrite of pubsub handling and adds reference counting for callbacks in the async API.

API changes

  • The redisAsyncCommand() family of functions have got new variants with "WithFinalizer" suffixes. The new functions are redisvAsyncCommandWithFinalizer(), redisAsyncCommandWithFinalizer(), redisAsyncCommandArgvWithFinalizer() and redisAsyncFormattedCommandWithFinalizer().
  • For each call to redisAsyncCommandWithFinalizer or one of its variants, the finalizer will be called exactly once.
  • The finalizer is called when a callback (even if the callback function is NULL) is no longer associated with any command or pubsub channel or pattern.
  • The finalizer is called after the last call to the callback. The callback is never called after the finalizer has been called.
  • Support the RESET command: It cancels subscriptions and MONITOR mode.
  • Accept commands while in MONITOR mode.
  • Add support for sharded pubsub (SSUBSCRIBE, SUNSUBSCRIBE).

Internal changes

The callback queue is changed so that each callback is allocated once and freed once. The operations "push" and "shift" don't copy the callback struct between heap and stack anymore, but instead they and just set the 'next' pointer to link callbacks into a list, without doing any allocations. For pubsub channels, one callback can be mapped to multiple channels, and they can be unsubscribed independently, so we use a reference counter to keep track of references to a callback. This makes adding finalizer support, and making sure it is called only once when the callback is freed, strait-forward.

The way hiredis kept tack of 'pubsub mode' using pending_subs and unsubscribe_sent was flawed. It couldn't handle multiple pending subscribe commands with an overlapping but different set of channels and it couldn't handle an error reply to a SUBSCRIBE or UNSUBSCRIBE command. Now we change this so that all commands, even SUBSCRIBE and UNSUBSCRIBE, are added to the reply queue and the 'subscribe' and 'unsubscribe' replies are matched against them in the queue. The channel-to-callback dicts are updated when a 'subscribe' or 'unsubscribe' reply is received, confirming that the client has subscribed or unsubscribed to a channel, rather than when the command is sent. This change makes it possible to handle an error reply for a pubsub command.

With this change, there is no need to keep a different replies queue for subscribe commands and regular commands. All are added to the same replies queue.

The callback queue in the changed so that each callback is allocated once and
freed once. The operations "push" and "shift" don't copy the callback struct
between heap and stack, but instead they and just set the 'next' pointer to link
callbacks into a list, without doing any allocations. For pubsub channels, one
callback can be mapped to multiple channels, and they can be unsubscribed
independently, so we use a reference counter to keep track of references to a
callback.

This prepares for adding finalizer support and to be sure it is called only
once, when the callback is freed.

The way hiredis kept tack of 'pubsub mode' using `pending_subs` and
`unsubscribe_sent` was flawed. It couldn't handle multiple pending subscribe
commands with an overlapping but different set of channels and it couldn't
handle an error reply to a SUBSCRIBE or UNSUBSCRIBE command. Now we change this
so that all commands, even SUBSCRIBE and UNSUBSCRIBE, are added to the reply
queue and the 'subscribe' and 'unsubscribe' replies are matched against them in
the queue. The channel-to-callback dicts are updated when a 'subscribe' or
'unsubscribe' reply is received, confirming that the client has subscribed or
unsubscribed to a channel, rather than when the command is sent. This change
makes it possible to handle an error reply for a pubsub command.

With this change, there is no need to keep a different replies queue for
subscribe commands and regular commands. All are added to the same replies
queue.
@zuiderkwast zuiderkwast force-pushed the finalizer branch 2 times, most recently from 1b50cd8 to 342cb92 Compare July 19, 2023 09:39
zuiderkwast and others added 2 commits July 19, 2023 12:25
* For each time redisAsyncCommandWithFinalizer or one of its variants has been
  called with a finalizer, the finalizer will be called exactly once.
* The finalizer is called when a callback (even if the callback function is
  NULL) is no longer associated with any command or pubsub channel or pattern.
* The finalizer is called after the last call to the callback. The callback is
  never called after the finalizer has been called.

New async API functions added:

* redisvAsyncCommandWithFinalizer
* redisAsyncCommandWithFinalizer
* redisAsyncCommandArgvWithFinalizer
* redisAsyncFormattedCommandWithFinalizer

Co-Authored-By: Tudor Bosman <[email protected]>
Co-Authored-By: Tudor Bosman <[email protected]>
@zuiderkwast zuiderkwast changed the title Async finalizer rebased (WIP) Add finalizer support in async API Jul 19, 2023
@zuiderkwast zuiderkwast marked this pull request as ready for review July 19, 2023 23:17
Additionally, accept commands in monitor mode. (For example the RESET
command, but also other commands.)

Apart from being useful but itself, this change makes the async API's
reply queue stay in sync (mapping each reply to the callback given
when the command was sent) when bombed with random commands (fuzzing).
@michael-grunder
Copy link
Collaborator

This is really good work. If it's ready I'll go through it and get it merged.

@zuiderkwast
Copy link
Contributor Author

This is really good work. If it's ready I'll go through it and get it merged.

Thanks. I thought it was ready, but there is some minor issue (timeout in a test) after I added the last commit (RESET command). I hope to get it fixed today or in the next days.

@zuiderkwast
Copy link
Contributor Author

@michael-grunder Did you restart the mac build? :-)

I believe this is ready now.

@zuiderkwast
Copy link
Contributor Author

If you want to squash-merge, we can use the top comment as the commit message.

If not, I can squash the fixup commits into the main commits later.

@kristjanvalur
Copy link
Contributor

Its been a long time since I looked at this code, but are the callbacks hashed (and refcounted) according to both the callback function pointer and the privdata?

@zuiderkwast
Copy link
Contributor Author

zuiderkwast commented Jul 29, 2023

are the callbacks hashed (and refcounted) according to both the callback function pointer and the privdata?

No, they are refcounted only per call to redisAsyncCommandWithFinalizer or its variants. The code doesn't check if the same privdata or function is used for other async commands.

[Edit] As mentioned in the PR description: "For each call to redisAsyncCommandWithFinalizer or one of its variants, the finalizer will be called exactly once".

This means hiredis doesn't care at all what privdata is. If the user wants to use the same privdata for multiple calls, they can put a refcounter on the privdata and decrement it in the finalizer. @kristjanvalur Do you think it is the right design or do you have another idea?

@zuiderkwast
Copy link
Contributor Author

@michael-grunder I just came back from vacation. Did you have any chance to look at this yet?

The use case we have is to implement pubsub in hiredis-cluster, which is a cluster wrapper around hiredis. For this, we would use the same finalizer for all commands, so a simpler API would suffice, e.g. redisAsyncSetFinalizer(ac, finalizer) rather than adding the four variants of redisAsyncCommandWithFinalizer. The finalizer can then be stored in the async context rather than in the callback struct. WDYT?

@kristjanvalur
Copy link
Contributor

kristjanvalur commented Aug 21, 2023

This means hiredis doesn't care at all what privdata is. If the user wants to use the same privdata for multiple calls, they can put a refcounter on the privdata and decrement it in the finalizer. @kristjanvalur Do you think it is the right design or do you have another idea?

I haven't thought too deeply about it but I suspect the design is fine, if every call gets one call back. I remember having concerns about the previous attempt where callbacks were registered sepearately, if I remember correctly. #926

Edit: Thinking more about it, my concerns were with an API where one explicitly registers a callback, and then, whether the same CB pointer, with a different "privdata" would be considered a separate listener or not. I mean, you could use the same actual callback with different privdata to listen to different things.

This is never ambiguous with the proposed WithFinalizer API, but if you go with redisAsyncSetFinalizer then you need to specify probably that only a single finalizer can be active, or something similar.

@zuiderkwast
Copy link
Contributor Author

if you go with redisAsyncSetFinalizer then you need to specify probably that only a single finalizer can be active, or something similar.

Yes, it's like a config, so only a single finalizer can be set. Setting a new one replaces the previous one. Is that what you mean?

The finalizer would be called once for each command, whenever the command's callback struct is freed internally. (For UNSUBSCRIBE, this means the finalizer is called immediately (because the replies to UNSUBSCRIBE are passed to SUBSCRIBE's callback) and even if privdata and callback are NULL, but this is fine to me too.)

I don't have a strong preference regarding *WithFinalizer vs redisAsyncSetFinalizer.

@kristjanvalur
Copy link
Contributor

kristjanvalur commented Aug 22, 2023

So, here is the thing:
The finalizer is intended to finalize a callback, which has been registered. A callback is a unique tuple (redisCallbackFn* cb, void *privdata). The finalizer is supposed to be called when the (*cb)(privdata) call will never be made again.

So, if you go with the withFinalizer thing, you will have to change it so that it receives both the function pointer and the privdata, i.e. te redisFinalizerCallback must be somethign like typedef void (redisfinalizerCallback*)(redisCallbackFn *cb, void *privdata)

@zuiderkwast
Copy link
Contributor Author

OK, I see what you mean. But in the implementation, there is nothing that requires the tuple (callback_fn, privdata) to be unique. Even the exact same tuple can be used multiple times. I don't think the implementation should check that. It would be unnecessary complex and maybe even undesirable. Probably we should change the description to something like

"The finalizer is called when a callback (even if the callback function is NULL) is no longer associated with the command where it was provided"

instead of "... with any command or pubsub channel or pattern".

Btw, I realized now I haven't documented this feature in the README. I'll do that when there is a decision about the API.

@kristjanvalur
Copy link
Contributor

"The finalizer is called when a callback (even if the callback function is NULL) is no longer associated with the command where it was provided"

Yes, that is clearer. I just want to make sure that the finalizer then, needs to get both the callback pointer and the privdata provided to the API method which is being finalized.

You may even want to provide the finalizer with its own privdata. That way, both the callback and finalizer can effectively be "bound methods."

typedef void (redisFinalizerCallback)(struct redisAsyncContext *ac, void *finalizer_privdata, redisCallbackFn * cb, void *cb_privdata)

You could add some blurb in the line of "for most commands, that will be right after the callback is called for the only time, but for subscriptions that happens only when the subscription completes."

@zuiderkwast
Copy link
Contributor Author

I just want to make sure that the finalizer then, needs to get both the callback pointer and the privdata provided to the API method which is being finalized.

Why would you need that? Do you have a use case?

The use case so far is to free the privdata when it is no longer used. This assumes you do not reuse privdata for multiple commands (or if you do, you can use a reference counter) but I can't see why you would need the callback function pointer in the finalizer.

You may even want to provide the finalizer with its own privdata. That way, both the callback and finalizer can effectively be "bound methods."

I'd rather avoid that. I can't see why this would be useful. If you want special data for the finalizer, it can be embedded in the same privdata argument passed to the command.

You could add some blurb in the line of "for most commands, that will be right after the callback is called for the only time, but for subscriptions that happens only when the subscription completes."

Good idea. I'll add something like that.

@kristjanvalur
Copy link
Contributor

kristjanvalur commented Aug 23, 2023

I just want to make sure that the finalizer then, needs to get both the callback pointer and the privdata provided to the API method which is being finalized.

Why would you need that? Do you have a use case?

Sure, you call two API functions. Both take different callback pointers and different privdata. Different callbacs can have different privdata formats.

Once you get a "finalizer" call, you have no idea which one of those calls are being finalized, unless both the callback and privdata are returned. The privdata is meaningless without a corresponding function pointer to associate it with.

I think that the point I'm trying to get across here, is that we should be able to consider a (func_ptr, data_ptr) as a bound method, a pointer to a class method, and they should always be kept around and used as a pair. The data_ptr is meaningless without the corresponding func_ptr and vice versa.

Whenever, in C, a callback is provided, the callback should always be a function pointer and some anynomus arg. That way, you can easily take an C++ class instance method and wrap it into a static method + this pointer.
Each callback function registered in C should (in general for C api, not just here) always have its own private data like that. That way, the callbacks can belong to different objects and you can register object methods as callbacks everywhere. Otherwise you are forced to view all your callbacks as different static methods bound to a single "object" which puts limits on how an api is used.

That's how I would design an API to be the most general. However, you could argue that people should just declare a single "privdata" data structure and use that everywhere where they use the api. Make it a limitation of the api. But then that should probably be explicitly mentioned. I.e. if you see the finalizer solely as a "free" function for privdata, and the finalizer is not per-call, then all the "privdata" need to be of the same type. Which was previously not a requirement in the API.

For this reason, I think I prefer the "with finalizer" approach, because in the current form, it allows you to register two bound methods to the same_object for each api, the callback of the object and the finalizer of the object, without restricting you to use the same "object type" everywhere.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants