Execution Runtime #3870

wdbaruni · 2024-01-24T06:09:25Z

The Problem

Today we run jobs directly under Docker or WASM, without abstracting the execution layer or wrapping it in a common runtime. This means that for each execution engine we have to solve log access and resource allocation separately. We are also missing other features that we would need to implement per engine, such as filesystem access, network access, monitoring for leaked executions, secrets, security, sandboxing, health checks, and more.

In addition, fetching inputs from storage sources (e.g. IPFS and S3) and publishing results take place in the bacalhau process rather than within the execution containers. This opens the door to malicious neighbours and to sensitive data leaking across executions on the same node.

We are also planning to introduce additional execution engines that can run Python, Golang and other applications without having to package them as a Docker image. We still run those inside Docker behind the scenes, which is better than running them directly on the host, but Docker may not be the best execution engine to adopt.

The Proposal

The proposal is to implement an execution runtime, based on Firecracker or something similar, that takes care of the following:

Abstracts the underlying engines and exposes a unified interface between the compute node and the execution engines.
Handles the pre- and post-execution logic, such as fetching inputs and secrets and publishing results, all within the isolated box or microVM of the runtime.
Allocates and restricts resource access, including memory, filesystem and network access.
Monitors for leaked executions (executions that are running but shouldn’t be).
Improves tracing, visibility and monitoring of active executions.
Improves error reporting for the different phases of job execution.
And much more.
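
To make the list above a bit more concrete, here is a minimal Go sketch of what such a unified interface could look like. Every name in it (Runtime, ExecutionSpec, Resources, PhaseError, Execute, FindLeaked, fakeRuntime, runDemo) is hypothetical and made up for illustration; none of it is existing bacalhau or Firecracker API. It only shows the shape of the idea: one interface per runtime, errors tagged with the phase that failed, and a simple comparison to spot leaked executions.

```go
package main

import (
	"context"
	"errors"
)

// Resources is a hypothetical set of limits the runtime would enforce.
type Resources struct {
	MemoryBytes    uint64
	NetworkEnabled bool
}

// ExecutionSpec is a hypothetical, engine-agnostic description of one execution.
type ExecutionSpec struct {
	ID     string
	Engine string // e.g. "docker" or "wasm"
	Limits Resources
}

// Runtime is the assumed unified interface between the compute node and the engines.
type Runtime interface {
	Prepare(ctx context.Context, spec ExecutionSpec) error // fetch inputs and secrets inside the sandbox
	Run(ctx context.Context, id string) error
	Publish(ctx context.Context, id string) error // publish results from inside the sandbox
	Active(ctx context.Context) ([]string, error) // IDs the runtime believes are running
}

// PhaseError records which phase of the execution failed.
type PhaseError struct {
	Phase string
	Err   error
}

func (e *PhaseError) Error() string { return e.Phase + ": " + e.Err.Error() }
func (e *PhaseError) Unwrap() error { return e.Err }

// Execute runs the three phases in order and tags any failure with its phase.
func Execute(ctx context.Context, rt Runtime, spec ExecutionSpec) error {
	if err := rt.Prepare(ctx, spec); err != nil {
		return &PhaseError{Phase: "prepare", Err: err}
	}
	if err := rt.Run(ctx, spec.ID); err != nil {
		return &PhaseError{Phase: "run", Err: err}
	}
	if err := rt.Publish(ctx, spec.ID); err != nil {
		return &PhaseError{Phase: "publish", Err: err}
	}
	return nil
}

// FindLeaked returns the active executions that the compute node does not expect.
func FindLeaked(active []string, expected map[string]bool) []string {
	var leaked []string
	for _, id := range active {
		if !expected[id] {
			leaked = append(leaked, id)
		}
	}
	return leaked
}

// fakeRuntime is an in-memory stand-in used only to exercise the sketch.
type fakeRuntime struct {
	failPhase string
	active    []string
}

func (f *fakeRuntime) step(phase string) error {
	if f.failPhase == phase {
		return errors.New("simulated failure")
	}
	return nil
}

func (f *fakeRuntime) Prepare(_ context.Context, spec ExecutionSpec) error {
	if err := f.step("prepare"); err != nil {
		return err
	}
	f.active = append(f.active, spec.ID)
	return nil
}

func (f *fakeRuntime) Run(_ context.Context, _ string) error     { return f.step("run") }
func (f *fakeRuntime) Publish(_ context.Context, _ string) error { return f.step("publish") }
func (f *fakeRuntime) Active(_ context.Context) ([]string, error) {
	return f.active, nil
}

// runDemo drives the fake through Execute and reports the failing phase
// (empty on success) plus any executions the node does not expect.
func runDemo(failPhase string) (string, []string) {
	ctx := context.Background()
	rt := &fakeRuntime{failPhase: failPhase, active: []string{"stale-1"}}
	phase := ""
	var pe *PhaseError
	if err := Execute(ctx, rt, ExecutionSpec{ID: "job-1", Engine: "wasm"}); errors.As(err, &pe) {
		phase = pe.Phase
	}
	active, _ := rt.Active(ctx)
	return phase, FindLeaked(active, map[string]bool{"job-1": true})
}
```

In this sketch, a docker, wasm or microVM-backed engine would each implement Runtime, so the compute node would only ever talk to Execute and Active. The fake starts with a pre-existing "stale-1" execution to show how an unexpected running execution would surface as leaked.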

wdbaruni · 2024-10-13T07:49:11Z

Replaced by a Linear project.

wdbaruni added the th/production-readiness and type/epic labels Jan 24, 2024

wdbaruni mentioned this issue Jan 24, 2024

Execution Engine Monitoring #3869

Closed

wdbaruni added this to the v1.4.0 milestone Jan 25, 2024

wdbaruni removed this from the v1.4.0 milestone Apr 16, 2024

wdbaruni transferred this issue from another repository Apr 21, 2024

wdbaruni self-assigned this Jun 18, 2024

wdbaruni removed their assignment Jun 26, 2024

wdbaruni added this to the v1.7.0 milestone Aug 12, 2024

wdbaruni assigned udsamani Sep 16, 2024

wdbaruni mentioned this issue Sep 17, 2024

Stopping a job loses all its logs #4420

Open

wdbaruni closed this as not planned Oct 13, 2024

wdbaruni unassigned udsamani Oct 13, 2024
