Execution Runtime #3870

Closed
wdbaruni opened this issue Jan 24, 2024 · 1 comment
Labels
th/production-readiness Get ready for production workloads type/epic Type: A higher level set of issues
Milestone

Comments

Member

wdbaruni commented Jan 24, 2024

The Problem

Today we run jobs directly under Docker or wasm, without abstracting the execution layer or wrapping jobs in a common runtime. This means that for each execution engine we need to separately solve problems such as log access and resource allocation. We are also missing other features that we would otherwise have to implement per engine, such as filesystem access, network access, monitoring for leaked executions, secrets, security, sandboxing, health checks, and more.

In addition, fetching inputs from storage sources (e.g. IPFS and S3) and publishing results take place in the bacalhau process, not within the execution containers. This opens the door to malicious neighbours and to sensitive data leaking across executions on the same node.

We are also planning to introduce additional execution engines that can run Python, Go and other applications without having to package them as a Docker image. We are still running those inside Docker behind the scenes, which is better than running them directly on the host, but Docker may not be the best execution engine to adopt.

The Proposal

The proposal is to implement an execution runtime, based on Firecracker or something similar, that takes care of the following:

  1. Abstracts the underlying engines and exposes a unified interface between the compute node and the execution engines
  2. Handles the pre- and post-execution logic, such as fetching inputs and secrets and publishing results, all within the isolated box or microVM of the runtime
  3. Allocates and restricts resource access, including memory, filesystem and network access
  4. Monitors for leaked executions (executions that are running but shouldn’t be)
  5. Improves tracing, visibility and monitoring of active executions
  6. Improves error reporting for the different phases of job execution
  7. And much more.
@wdbaruni wdbaruni added th/production-readiness Get ready for production workloads type/epic Type: A higher level set of issues labels Jan 24, 2024
@wdbaruni wdbaruni added this to the v1.4.0 milestone Jan 25, 2024
@wdbaruni wdbaruni removed this from the v1.4.0 milestone Apr 16, 2024
@wdbaruni wdbaruni transferred this issue from another repository Apr 21, 2024
@wdbaruni wdbaruni transferred this issue from another repository Apr 21, 2024
@wdbaruni wdbaruni self-assigned this Jun 18, 2024
@wdbaruni wdbaruni removed their assignment Jun 26, 2024
@wdbaruni wdbaruni added this to the v1.7.0 milestone Aug 12, 2024
Member Author

Replaced by a Linear project.

@wdbaruni wdbaruni closed this as not planned Oct 13, 2024
Projects
Status: Done
Status: Backlog
Development

No branches or pull requests

2 participants