-
Notifications
You must be signed in to change notification settings - Fork 143
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Slot supplier interface & fixed-size implementation #2014
Conversation
temporal-sdk/src/main/java/io/temporal/internal/worker/EagerActivitySlotsReservation.java
Outdated
Show resolved
Hide resolved
temporal-sdk/src/main/java/io/temporal/internal/worker/PollTaskExecutor.java
Outdated
Show resolved
Hide resolved
temporal-sdk/src/main/java/io/temporal/worker/slotsupplier/SlotSupplier.java
Outdated
Show resolved
Hide resolved
temporal-sdk/src/main/java/io/temporal/worker/slotsupplier/SlotSupplier.java
Outdated
Show resolved
Hide resolved
temporal-sdk/src/main/java/io/temporal/worker/slotsupplier/TrackingSlotSupplier.java
Outdated
Show resolved
Hide resolved
temporal-sdk/src/main/java/io/temporal/worker/slotsupplier/TrackingSlotSupplier.java
Outdated
Show resolved
Hide resolved
|
||
void markSlotUsed(SlotInfo info, SlotPermit permit); | ||
|
||
void releaseSlot(SlotReleaseReason reason, SlotPermit permit); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can releaseSlots
be blocking? If we had some sort of remote slot provider it seems like it could be?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we can allow it to be blocking. The code has no expectation now that it might, and allowing it to could I think gum up stuff in unexpected ways. I mentioned in the proposal that users can delegate any blocking work to a background thread or to calls to reserve slot, and I think that's what they ought to do.
d814a14
to
76a5921
Compare
767552a
to
7221d33
Compare
|
||
releaseSlots(this.outstandingReservationSlotsCount - activityTasksCount); | ||
// Release any slots that we won't be using | ||
releaseSlots(this.reservedSlots.size() - activityTasksCount); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the slots are reserved based on the task queue don't we also need to keep track of that when using the slot and releasing the slot?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting point. I wouldn't say they're based on so much as that information is provided when making the reservation (implementations may or may not care). It does make sense, of course, that we might also want to provide that information when releasing.
Arguably, users could stuff that in the permit data, but maybe we want to make it first-class. In that case, I'd probably just stick it as a field in the permit. However, you could also argue that implementations that want to balance across multiple queues maybe don't want to associate a permit with a specific queue, in which case it makes more sense on some ReleaseContext
object to provide as another arg when releasing.
The latter maybe makes a bit more sense to me?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added to release context
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry to clarify what I meant is your slot reservations are not fungible since they are tied to the task queue they are requested for. So you can't just blindly release first x slots you need to release the slots based on task queue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yeah that makes sense. I don't know that it necessarily must work that way, we could only make the guarantee that we'll return permits since they all came from the same supplier regardless of queue, but we can do it fairly easily and it's probably helpful so I'll make that change.
temporal-sdk/src/main/java/io/temporal/internal/worker/EagerActivitySlotsReservation.java
Outdated
Show resolved
Hide resolved
temporal-sdk/src/main/java/io/temporal/worker/WorkerOptions.java
Outdated
Show resolved
Hide resolved
* | ||
* @return the maximum number of slots that can ever be in use at one type for this slot type. | ||
*/ | ||
int maximumSlots(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm I wonder now if we really need this? It is totally valid to have an unbounded threadpool in Java by setting maximumPoolSize
to Integer.MAX_VALUE
.
https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ThreadPoolExecutor.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I'm just a little worried that could be a big footgun? They can always return a huge number here and accomplish the same thing if they want to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I'm just a little worried that could be a big footgun?
What is the footgun?
They can always return a huge number here and accomplish the same thing if they want to?
True, but why isn't that just the default then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The footgun is possibly way over-subscribing the CPU in the case of activity executions, for example. You can easily end up running way too many concurrently. Buuuuut, I suppose that's definitely not happening if they use our fixed size supplier, since it'll do the right thing, and if they implement their own, well, then it should be fairly clear to them not to do that, or at least is their responsibility.
So, yeah, I can see allowing it to default to something big.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The footgun is possibly way over-subscribing the CPU in the case of activity executions, for example.
That is a problem of handing out to many slots. Basically I don't think there is any technical reason we need this in Java so if we add it here we should add it to all SDKs.
this.maxConcurrentActivityExecutionSize = o.maxConcurrentActivityExecutionSize; | ||
this.maxConcurrentWorkflowTaskExecutionSize = o.maxConcurrentWorkflowTaskExecutionSize; | ||
this.maxConcurrentLocalActivityExecutionSize = o.maxConcurrentLocalActivityExecutionSize; | ||
this.workflowSlotSupplier = o.workflowSlotSupplier; | ||
this.activitySlotSupplier = o.activitySlotSupplier; | ||
this.localActivitySlotSupplier = o.localActivitySlotSupplier; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two recommendations here:
- Leave the existing fields and make new fields mutually exclusive with them, don't need to try to have multiple getters/setters reference the same fields, the builders as simple getters/setters is clearer IMO.
- Add a new interface for
WorkerTuner
that wraps the three suppliers (e.g. can just have three getters that are given basic worker info as params and default return null and a default implementation that accepts these three nullable values as constructor or something). We've had good success with the singular option field for interceptors because from a user perspective, they like to just set one thing and forget it (e.g. otel impl). In most cases these three will be intertwined I think, e.g.builder().setTuner(new ResourceLimitedTuner(someOptions))
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Quinn-With-Two-Ns Do you have any thoughts here? I feel like there are a lot of options here and it's mostly preferential. I think Chad's suggestions make sense, but there are lots of viable approaches.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with 1. , I don't have a strong opinion on 2 , but I slightly prefer 3 separate options for each SlotSupplier
since it fits with a builder pattern better and handle a mix of SlotSupplier
and legacy config.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Think about it from a user POV. What is the simplest configuration for a user to choose "resource limited tuning"? I am happy we didn't make a user set 3 different interceptor fields for use of OTel interceptors. It's not that the 3 things can't still be provided, it's just from a user and default impl, people shouldn't have to set three things if they don't need to. Even for fixed, they should be able to construct one thing with options.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's dramatically easier for the user to provide separate suppliers for separate kinds, which is I think more likely the more common case than a shared implementation across different kinds (at least for custom implementations) - and it makes that case no harder. Setting two more options fields is effectively zero effort.
If it was one option, we either have to have something like WorkerTuner.activity(a).workflow(b).localActivity(c)
which is now the same thing, but with one extra step of putting that whole thing in the worker config. Or we have to change the whole interface to pass the type on every reservation call which I think is much messier and conceptually more difficult for most implementations.
I'll change the options to keep the old values, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Plus, since we now already have mutually exclusive options, there's zero reason we can't do builder().setResourceLimitedTuner(options)
and under the hood it sets all three.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it was one option, we either have to have something like
WorkerTuner.activity(a).workflow(b).localActivity(c)
It'd be more like builder().setWorkerTuner(new TheTuner(a, b, c))
which is nice because people can extend that one or get/share one from a third party lib or whatever
Plus, since we now already have mutually exclusive options, there's zero reason we can't do
builder().setResourceLimitedTuner(options)
and under the hood it sets all three.
That means only we get the single-option approach because we are Temporal. Everyone should be allowed to have users use their tuner via a single option too. In the meantime, they may have to provide a static applyToWorkerOptionsBuilder
to make sure all three suppliers are set right.
Same thing for interceptors. Datadog was able to provide https://pkg.go.dev/go.temporal.io/sdk/contrib/datadog/tracing#NewTracingInterceptor that is set as a single place but it intercepts three things: client, activity, and workflow. We definitely could have required that users give us all three interceptors separately but I'm glad we didn't.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that's an apples-to-apples comparison at all though. Your tracing interceptor isn't making decisions about what happens where the first and most important criteria is the type of slot - rather it is dealing with application-wide traces, and may or may not care about the specifics of the thing being traced. Here, you always care about the type of slot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation always cares about the type of thing it's intercepting or the slot, the one setting the worker option does not. Well, they may in some cases (e.g. fixed size things they may provide three options to this tuner), but they may not in other cases (e.g. they only want to provide resource limits to tuner or just leave defaults). But it should be up to the tuner implementation to decide whether the configurer cares about separate things or not (the implementation itself of course will always care about separate things).
Even in cases where you may want to mix (e.g. fixed and resource-limited), I think still requiring an impl that returns different slot suppliers for those specific cases is better UX than making everyone care about those separate slot suppliers in every case.
I think it may be worth seeing how a user would provide their own implementation of a resource tuner that needs to work across the three (we shouldn't be special in that we're the only ones that get single-option config).
* the reservation, or during shutdown. You may perform cleanup, and then should rethrow the | ||
* exception. | ||
*/ | ||
SlotPermit reserveSlot(SlotReservationContext<SI> ctx) throws InterruptedException; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SlotPermit reserveSlot(SlotReservationContext<SI> ctx) throws InterruptedException; | |
Object reserveSlot(SlotReservationContext<SI> ctx) throws InterruptedException |
IMO we should return an Object
(can make this generic, your choice). There's no benefit to the SlotPermit class IMO, anyone can make their own or if they don't care, they can return new Object(). They don't need IDs (objects already have IDs) and therefore the whole object is "user data".
Note, I originally was going to suggest returning a future since they have better cancellation mechanism than forced thread interruption (so we can have better performing things like virtual threads in the future), but I figured it'd be too much work for users currently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like having the internal ID because it ensures equality & hashing always work properly, regardless of what the user ends up doing with their data (yeah, objects have an ID, but they could end up implementing eq/hash at the top level in a way that potentially breaks things - but to your point we could get rid of the internal ID and still have that work seemingly), and it also makes for a nice, clear type at the top-level. The type name alone feels worth it to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, objects have an ID, but they could end up implementing eq/hash at the top level in a way that potentially breaks things - but to your point we could get rid of the internal ID and still have that work seemingly
I'd have to understand this a bit better. Equals/hashCode will work for new Object()
directly and if someone overrides those for a specific object, I would expect them to work too. I don't think we should ever invoke equals/hashCode on this object for anything though.
Using user data inside the permit is a bit rough, but if we are expecting that to be the user method for extensibility, can you make the class final so it can't be extended? I would have originally thought we would encourage people to return their own instance to keep state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The hash/eq is used because we use it in the hash map in the tracking supplier. So, sure, we expect it to work just fine - but our implementation will always be correct (and likely cheaper) than a user-supplied one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you should then make SlotPermit
final
(so nobody extends it and messes up this expectation) and you should rely on the default equals/hashCode which is identity based (i.e. don't override it). I think it's clearer that way. We should also document to users that the exact same instance of slot permit is the one given back later so they can have similar expectations.
temporal-sdk/src/main/java/io/temporal/worker/tuning/SlotSupplier.java
Outdated
Show resolved
Hide resolved
temporal-sdk/src/main/java/io/temporal/worker/tuning/SlotReservationContext.java
Outdated
Show resolved
Hide resolved
temporal-sdk/src/main/java/io/temporal/worker/tuning/SlotReservationContext.java
Outdated
Show resolved
Hide resolved
temporal-sdk/src/main/java/io/temporal/worker/tuning/ActivitySlotInfo.java
Outdated
Show resolved
Hide resolved
temporal-sdk/src/main/java/io/temporal/worker/tuning/SlotReleaseReason.java
Outdated
Show resolved
Hide resolved
temporal-sdk/src/main/java/io/temporal/worker/tuning/SlotReleaseReason.java
Show resolved
Hide resolved
private final String workerBuildId; | ||
private final boolean fromStickyQueue; | ||
|
||
public WorkflowSlotInfo( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would suggest hiding this object-building detail (can consider making slot infos abstract classes or interfaces or something and move construction to internal if wanted, or just change the constructor for every field in here and potentially add javadoc on this one and activity one saying not to rely on stability of constructor signature).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the lack of granular enough visibility specifiers is annoying. I'll use a doc comment for now and we can potentially change it later if/when we change more here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Works for me, but we should consider a POJO constructor for these fields too. People will be instantiating this to test their suppliers I think. (I wonder if we need to account for that with the context classes too and make those easy to make, but meh as interfaces they should be easy to mock)
temporal-sdk/src/main/java/io/temporal/internal/worker/TrackingSlotSupplier.java
Outdated
Show resolved
Hide resolved
this.userData = null; | ||
} | ||
|
||
public SlotPermit(Object userData) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect slot providers will often be implemented by chaining or wrapping other slot providers, where each slot provider may want to attach data to the permit. Maybe an API like a Go context
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you thinking something like permit.withData(obj)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is where allowing the Java author to return the type of their choice (even if they were forced to extend SlotPermit
though not as much of a fan there) gives more freedom than attaching an object to an existing object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would not want users to be able to return an object of their choice. I would like to maintain control over the object and provide an interface to attach and get metadata.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Ok, no prob. Still not a big fan of the ID in here, but no prob there too. Mentioned in other comment, but we should probably make this class final, consider removing the equals/hashCode override (i.e. default to identity-based), and document on the context interfaces that the permits are the exact same instances as the ones returned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed to do that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To clarify I was taking about having a map of userdata like a Go context
does so different providers wrapping each other can add their own metadata without impacting each other.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something like a typemap of <T -> Data> would be useful. But, perhaps we add that later if requested
Fix todos Give eq/hash to SlotPermit Rename package Add more to slot infos Move metric tracking inside of tracking supplier Publish junit test report / set some task queue kinds Don't bother with futures since the value they're providing is unclear Make use of in-use slot tracking map Internalize tracking slot supplier [Trial] Seeing if a dedicated thread fixes fixed-size w/ futures Use completable future, fix sched-to-start timeout test Fix accidentally removed call to newAttempt for LAs Mark slots as used & Apparently I can't update year in license header Stupid licenses Tests are mostly working? Local timeouts that I think are unrelated. Plumbed through & compiling, but a number of TODOS Add interface, supporting classes, and fixed size impl
628ac7c
to
f27cfb6
Compare
* | ||
* @return the maximum number of slots that can ever be in use at one type for this slot type. | ||
*/ | ||
default int maximumSlots() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
default int maximumSlots() { | |
default int getMaximumSlots() { |
Pedantic and don't have strong opinion here, just saw #2014 (comment) marked resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed, must have marked it by accident.
this.userData = null; | ||
} | ||
|
||
public SlotPermit(Object userData) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Ok, no prob. Still not a big fan of the ID in here, but no prob there too. Mentioned in other comment, but we should probably make this class final, consider removing the equals/hashCode override (i.e. default to identity-based), and document on the context interfaces that the permits are the exact same instances as the ones returned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly there, things left IMO:
- At least make
SlotPermit
final and document we use same instances (should also consider removing unnecessarily ID and removing unnecessary equals/hashCode) - Figure out the worker options (would like to leave existing ones alone and just make mutually exclusive at build time, and would like a single option to set a tuner instead of 3)
- Add
@Experimental
on all types in tuning package and on the new worker option(s)
private final String workerBuildId; | ||
private final boolean fromStickyQueue; | ||
|
||
public WorkflowSlotInfo( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Works for me, but we should consider a POJO constructor for these fields too. People will be instantiating this to test their suppliers I think. (I wonder if we need to account for that with the context classes too and make those easy to make, but meh as interfaces they should be easy to mock)
import java.util.Objects; | ||
|
||
/** Contains information about a slot that is being used to execute an activity task. */ | ||
public class ActivitySlotInfo extends SlotInfo { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this is gonna be annoying I know, but I think you should mark every class in this package and the worker options as @Experimental
(this is an annotation in the io.temporal.common
package).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The worker options are marked experimental as is the interface class, which seems sufficient to me, but sure I'll mark them all that way.
*/ | ||
@Experimental | ||
public final class SlotPermit { | ||
public final Object userData; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even though it's final, it's a bit of a faux pas to have public fields. Would prefer getter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
253dace
to
a298f26
Compare
@@ -71,12 +71,12 @@ public void markSlotUsed(SI slotInfo, SlotPermit permit) { | |||
publishSlotsMetric(); | |||
} | |||
|
|||
public void releaseSlot(SlotReleaseReason reason, SlotPermit permit) { | |||
public void releaseSlot(SlotReleaseReason reason, SlotPermit permit, String taskQueue) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
taskQueue should not be needed I think, it should always be the same as what was provided in SlotReservationData
Can we make issue (if we don't already) for:
|
} | ||
|
||
public int getMaxConcurrentLocalActivityExecutionSize() { | ||
return maxConcurrentLocalActivityExecutionSize; | ||
return localActivitySlotSupplier.getMaximumSlots(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These 3 getMaxConcurrent*ExecutionSize()
should be zero if the corresponding maxConcurrent*ExecutionSize
is not set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a buidlers setter should only effect one getter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yeah forgot to change those ones.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah... I was hoping not to retain the property in the built workflow options since having mutually exclusive stuff represented in the built type is less than good from a type design perspective. It also means the logic for the threadpool size has to get pushed out of the options, but, that's not so bad.
Look good! my only issue is the handling of eager slots looks incorrect now since we made slots non fungible, my other points are more nits or things we can tackle later. |
a298f26
to
786ad6b
Compare
What was changed
Implements the slot management proposal
Why?
Users have expressed the desire for more control over when workers choose to poll for new tasks
Checklist
Closes
How was this tested:
Existing tests (adding some additional UTs in progress)
Any docs updates needed?
There will be, after this has had some time in preview