
Users shall be able to update existing jobs #3871

Closed
wdbaruni opened this issue Jan 22, 2024 · 1 comment
Labels
th/production-readiness (Get ready for production workloads), type/epic (Type: A higher level set of issues)

Comments

@wdbaruni (Member)

The Problem

With the introduction of long-running jobs, users shall be able to update the specs of active jobs: the orchestrator shall deploy updated jobs in place where possible, or select new compute nodes otherwise. Today users must cancel a job and submit a new one whenever they want to change it, which is inefficient, introduces gaps in execution, and discards the versioning and history of previous job instances.

Updates include, but are not limited to:

  1. Update the execution count, where the orchestrator must cancel existing executions or deploy new ones depending on the new count.
  2. Update resources, where the orchestrator should attempt to deploy the update in place and ask the compute node to apply the new resource requirements if the node has capacity, or otherwise find a new node and cancel the previous execution.
  3. Update metadata and labels, where the orchestrator and compute node shall update the execution’s metadata without stopping anything.
  4. Update the engine config or engine type, where again the orchestrator can decide to update in place if the compute node supports the new requirements, or find a new node.
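The decision points above can be sketched as a small update planner. This is an illustrative sketch only: the `JobSpec` fields, `Action` values, and `planUpdate` function are assumptions made for this issue, not Bacalhau's actual types or API.

```go
package main

import "fmt"

// JobSpec is an illustrative subset of a job spec; the type and field
// names are assumptions for this sketch, not Bacalhau's actual API.
type JobSpec struct {
	Count  int
	CPU    float64
	Engine string
	Labels string // flattened metadata, kept comparable for simplicity
}

type Action string

const (
	NoAction   Action = "no-op"
	InPlace    Action = "update-in-place"
	ScaleOnly  Action = "scale"
	Reschedule Action = "reschedule"
)

// planUpdate mirrors the decision points above: an unchanged spec is a
// no-op; resource or engine changes apply in place only when the current
// node can satisfy them; count changes scale executions up or down; and
// metadata-only changes apply in place without stopping anything.
func planUpdate(old, updated JobSpec, nodeCanSatisfy bool) Action {
	if old == updated {
		return NoAction
	}
	if old.CPU != updated.CPU || old.Engine != updated.Engine {
		if nodeCanSatisfy {
			return InPlace
		}
		return Reschedule
	}
	if old.Count != updated.Count {
		return ScaleOnly
	}
	return InPlace // labels/metadata only
}

func main() {
	base := JobSpec{Count: 2, CPU: 1.0, Engine: "docker", Labels: "env=dev"}
	fmt.Println(planUpdate(base, base, true))
	fmt.Println(planUpdate(base, JobSpec{Count: 2, CPU: 2.0, Engine: "docker", Labels: "env=dev"}, false))
	fmt.Println(planUpdate(base, JobSpec{Count: 4, CPU: 1.0, Engine: "docker", Labels: "env=dev"}, true))
}
```

A real planner would of course compare full engine specs and resource vectors rather than two scalar fields, but the branching structure is the same.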

Requirements

More info can be found here

  1. As a user, I want the capability to update the specifications of service and daemon jobs, with Bacalhau rolling out these changes across the network.
  2. As a user, I want the ability to define job update strategies, including rolling updates and blue/green deployments.
  3. As a user, I expect Bacalhau to automatically roll back to a previous job version if a new version is deemed unhealthy within a configurable timeframe.
  4. As a user, I need Bacalhau to conduct health checks to determine if a job is unhealthy, even if the execution environment (e.g., docker container) appears operational.
  5. As a user, I want access to previous versions of my job specifications, with the option to roll back to earlier versions.
  6. As a user, I prefer to submit and query jobs using a unique job name I provide, rather than relying solely on a Bacalhau-generated job ID.
  7. As a user, I want to stop a job using the unique name I have provided, in addition to the Bacalhau-generated ID.
  8. As a user, I expect re-submitting a job with the same name to be treated as an update by Bacalhau, triggering appropriate actions.
  9. As a user, I want re-submitting a job with the same specifications and name to result in no action from Bacalhau and no increment to the job version.
  10. As a user, I desire an option to force Bacalhau to update and re-deploy a job even if the specifications have not changed.
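Requirements 8–10 together imply change detection on resubmission: same name plus identical spec is a no-op, a changed spec is an update, and a force option always redeploys. One way to sketch that detection is a content digest over the spec; the function names here are hypothetical, not Bacalhau's actual implementation.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// specDigest hashes a JSON encoding of a spec so that resubmission of an
// identical spec can be detected. Go's json.Marshal sorts map keys, so
// the encoding is deterministic for map-based specs.
func specDigest(spec any) (string, error) {
	b, err := json.Marshal(spec)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:]), nil
}

// shouldBumpVersion captures requirements 8-10: a changed spec under the
// same name is an update (8), an identical spec is a no-op with no
// version increment (9), and a force flag always redeploys (10).
func shouldBumpVersion(oldDigest, newDigest string, force bool) bool {
	return force || oldDigest != newDigest
}

func main() {
	v1, _ := specDigest(map[string]any{"count": 2, "engine": "docker"})
	v2, _ := specDigest(map[string]any{"count": 2, "engine": "docker"})
	fmt.Println("identical resubmit bumps version:", shouldBumpVersion(v1, v2, false))
	fmt.Println("forced resubmit bumps version:", shouldBumpVersion(v1, v2, true))
}
```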

Open Questions:

  • What is the desired behaviour when re-submitting a batch or an ops job with the same name?
    1. Option 1: If there is no change in the spec and the user did not force an update, take no action. Otherwise, treat it as an update by stopping any existing executions and deploying new ones.
    2. Option 2: Always deploy new executions without stopping existing ones. bacalhau get and bacalhau describe will always return the results and status of the latest job version; users will have to pass a new --version <int> flag to describe or download the results of previous versions. bacalhau stop, on the other hand, will stop all active executions, including those from previous versions; users will have to pass --version <int> to stop a specific version.
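Option 2's bookkeeping can be sketched as a per-name version log. The type and method names below are illustrative assumptions, not Bacalhau's actual API; the `0` sentinel stands in for "no --version flag passed".

```go
package main

import "fmt"

// versionedJob sketches Option 2's bookkeeping: every resubmission under
// the same name appends a new version, describe defaults to the latest
// version, and stop cancels all active versions unless one is pinned.
// The type and method names are illustrative, not Bacalhau's actual API.
type versionedJob struct {
	name     string
	versions []string     // spec snapshots; index 0 holds version 1
	active   map[int]bool // version -> still running
}

func newVersionedJob(name string) *versionedJob {
	return &versionedJob{name: name, active: map[int]bool{}}
}

// submit always deploys a new version without stopping earlier ones.
func (j *versionedJob) submit(spec string) int {
	j.versions = append(j.versions, spec)
	v := len(j.versions)
	j.active[v] = true
	return v
}

// describe returns the spec of the given version; 0 means "latest",
// mirroring the proposed default when no --version flag is passed.
func (j *versionedJob) describe(version int) string {
	if version == 0 {
		version = len(j.versions)
	}
	return j.versions[version-1]
}

// stop cancels a specific version, or every active version when 0.
func (j *versionedJob) stop(version int) {
	if version == 0 {
		for v := range j.active {
			j.active[v] = false
		}
		return
	}
	j.active[version] = false
}

func main() {
	j := newVersionedJob("my-job")
	j.submit("spec-a")
	j.submit("spec-b")
	fmt.Println(j.describe(0), j.describe(1)) // latest vs pinned version
	j.stop(0)
	fmt.Println(j.active)
}
```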
@wdbaruni wdbaruni added the type/epic and th/production-readiness labels Jan 22, 2024
@wdbaruni wdbaruni added this to the v1.4.0 milestone Jan 25, 2024
@wdbaruni wdbaruni removed this from the v1.4.0 milestone Apr 16, 2024
@wdbaruni wdbaruni transferred this issue from another repository Apr 21, 2024
@wdbaruni (Member, Author)

Replaced with a Linear project.

@wdbaruni wdbaruni closed this as not planned Oct 13, 2024