[proposal] Provide an evolvable End to End Solution for Koordinator Device Management #2181
Labels
area/api-machinery
area/koord-manager
area/koord-scheduler
area/koordlet
kind/proposal
Create a report to help us improve
What is your proposal:
Provide an evolvable End to End Solution for Koordinator Device Management
Why is this needed:
Koordinator already supports two functions in the scheduler: GPU shared scheduling and GPU & RDMA joint allocation. It supports users to apply for GPU or RDMA resources using kubrenetes extended resources and Hints defined on Pod Annotation. The extended resource method was originally introduced into Kubernetes mainly to describe discrete and countable node resources. The Kubelet Device Plugin interface is the main way for the Kubernetes community to support such resource reporting and allocation.
However, the allocation logic of Kubelet Device Manager does not support the refined joint allocation of multiple resources according to the device topology, such as the scenario where GPU and RDMA need to be allocated under a PCIESwitch. The only topology allocation supported by Kubelet is allocation according to NUMA. However, even in the scenario where only NUMA allocation is required, Kubelet intervenes in NUMA access a little late. Users will have to face the failure of a successfully scheduled Pod to start due to topology mismatch.
To solve this problem, Koordinator moved the device allocation logic from Kubelet to the scheduler, and used cri-runtime-proxy on the stand-alone side to set up device isolation and visibility. However, the cri-runtime-proxy approach is indeed heavy and inconvenient to install. In addition, although the Koordinator scheduler provides the GPU and RDMA joint allocation function, there is no end-to-end solution available overall, especially on the stand-alone side, it has not yet been connected to the community standard RDMA logic. This proposal attempts to solve the above problems for Koordinator and provide an end-to-end feasible solution.
Finally, in the field of device management, the community proposed Dynamic Resource Allocation after the Device Plugin interface to overcome the various limitations of the current Device Plugin solution. This proposal will also show how Koordintor's GPU sharing and GPU & RDMA joint allocation are implemented under the DRA mode, and how the current solution evolves to DRA.
Key Results:
The text was updated successfully, but these errors were encountered: