Roadmap of MPI Operator

This document provides a high-level overview of how MPI Operator is expected to grow in future releases. See the discussion in the original RFC here.

New Features / Enhancements

  • Decouple the tight dependency on Open MPI and support other collective communication frameworks. Related issue: #12.
  • Support new versions of MPI Operator in kubeflow/manifests.
  • Redesign components of MPI Operator to support fault-tolerant collective communication frameworks such as caicloud/ftlib.
  • Allow more flexible RBAC when creating MPIJobs so that existing RBAC resources can be reused. Related issue: #20.
  • Support installation of MPI Operator via Helm. Related issue: #11.
  • Support Go modules.
  • Consider supporting the launch of framework-specific services such as TensorBoard and Horovod Timeline. Since tf-operator already supports TensorBoard, we may want to move this functionality to kubeflow/common so it can be reused. Related issue: #138.

CI/CD

  • Automate publishing images to Docker Hub whenever there is a new release or commit. Related issue: #93.
  • Ensure new versions of deploy/mpi-operator.yaml are always compatible with kubeflow/manifests.
  • Add end-to-end tests via Kubeflow's testing infrastructure. Related issue: #9.

Bug Fixes

  • Report better statuses for launcher and worker pods. Related issue: #90.