[Question]: Does amdgpu support PCIe p2p dma copy with FPGA? #159

littlewu2508 · 2024-02-01T15:17:44Z

Problem Description

I'm interested in P2P data transfer from an FPGA (Xilinx Alveo U50) to an AMD GPU. There is already an implementation for FPGA to Nvidia GPU transfers at https://github.com/RC4ML/FpgaNIC, using https://github.com/NVIDIA/gdrcopy, and earlier work [1,2] achieved this with DirectGMA. But DirectGMA is now deprecated along with the proprietary fglrx driver. Is there a similar method with the open-source amdgpu driver?

[1] http://dx.doi.org/10.1088/1748-0221/11/02/P02007
[2] http://dx.doi.org/10.1088/1748-0221/12/03/C03015

I read some of the source code for DMA P2P copy in https://github.com/ROCm/ROCR-Runtime/blob/master/src/core/runtime/ and https://github.com/ROCm/ROCT-Thunk-Interface/tree/master/tests/kfdtest/; it seems that all userspace DMA copies go through the HSA driver. As far as I know, Xilinx Alveo cards currently have no HSA support (there may be ongoing work), so I wonder whether DMA P2P between an FPGA and an AMD GPU is possible.

I also raised this question in openucx/ucx#9598 and found that Xilinx Alveo cards support PCIe DMA P2P via OpenCL on XRT. Does that mean I can use OpenCL to achieve P2P between the FPGA and the AMD GPU? As I understand it, however, rocm-opencl-runtime is also based on HSA.

Operating System

Debian 12

CPU

AMD EPYC 7702 64-Core Processor

GPU

AMD Instinct MI100

ROCm Version

ROCm 5.7.1

ROCm Component

No response

Steps to Reproduce

No response

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response

ppanchad-amd · 2024-08-20T15:12:13Z

@littlewu2508 Internal ticket has been created to assist with your question. Thanks!

tcgu-amd · 2024-09-23T20:28:54Z

Hi @littlewu2508, I think P2P between a Xilinx FPGA and an AMD GPU is not directly supported at the moment. However, one potential workaround is to create a P2P buffer on the FPGA and then map it into the host memory space following this documentation. Next, register the base pointer of the mapped P2P buffer with the GPU using something like hipHostRegister(). Afterwards, hipHostGetDevicePointer() can be used to obtain a device pointer through which one may be able to interact with the FPGA directly.

Note that I haven't tested this myself, since we currently don't have a test setup for this specific configuration. However, it should work in theory. Please let me know if this sounds reasonable and whether it works on your end.
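As a rough illustration of the register-then-translate sequence described above, here is a minimal C++ sketch. It is untested on real FPGA and GPU hardware. The helper names `expose_to_gpu` and `release_from_gpu` are hypothetical, and `mapped` is assumed to be a host-visible pointer to the FPGA P2P buffer that the FPGA runtime has already mapped (that step is not shown). The only HIP calls used are `hipHostRegister`, `hipHostGetDevicePointer`, and `hipHostUnregister`. When the HIP header is not available, the sketch falls back to clearly labeled stand-ins so the control flow can still be compiled and stepped through; those stand-ins are not part of HIP.

```cpp
// Untested sketch of the suggested workaround; build with hipcc on a ROCm system.
#include <cassert>
#include <cstddef>

#if defined(__has_include)
#  if __has_include(<hip/hip_runtime.h>)
#    include <hip/hip_runtime.h>
#    define SKETCH_HAVE_HIP 1
#  endif
#endif

#ifndef SKETCH_HAVE_HIP
// HYPOTHETICAL stand-ins (NOT the HIP runtime): they only mimic the call
// signatures so this file compiles without ROCm. They perform no pinning.
using hipError_t = int;
constexpr hipError_t hipSuccess = 0;
constexpr unsigned int hipHostRegisterMapped = 0x2;
inline hipError_t hipHostRegister(void*, std::size_t, unsigned int) { return hipSuccess; }
inline hipError_t hipHostGetDevicePointer(void** dev, void* host, unsigned int) {
    *dev = host;
    return hipSuccess;
}
inline hipError_t hipHostUnregister(void*) { return hipSuccess; }
#endif

// Registers an already-mapped buffer with the GPU and returns a device
// pointer for it, or nullptr if either HIP call fails.
// Assumption: `mapped` points at the host mapping of the FPGA P2P buffer.
void* expose_to_gpu(void* mapped, std::size_t bytes) {
    assert(mapped != nullptr && bytes > 0);

    // Step 1: ask the runtime to pin the range and map it for device access.
    if (hipHostRegister(mapped, bytes, hipHostRegisterMapped) != hipSuccess) {
        return nullptr;
    }

    // Step 2: obtain the device-side address of the registered range.
    void* dev = nullptr;
    if (hipHostGetDevicePointer(&dev, mapped, 0) != hipSuccess) {
        (void)hipHostUnregister(mapped);  // undo step 1 on failure
        return nullptr;
    }
    return dev;
}

// Undoes the registration once the GPU no longer uses the buffer.
void release_from_gpu(void* mapped) {
    (void)hipHostUnregister(mapped);
}
```

If `expose_to_gpu` returns a non-null pointer, that pointer could be passed to a kernel or a copy call. Whether registration succeeds for a mapped PCIe region at all, and whether the resulting traffic is truly peer-to-peer, depends on the GPU and FPGA drivers and on the PCIe topology.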

Thanks!

littlewu2508 · 2024-09-24T09:22:14Z

Thank you very much for the idea! I will try it out when I have time.

Also, will the data go through host memory in this workaround? The motivation for P2P DMA copy is to increase the bandwidth and reduce the latency of data transfer.

tcgu-amd · 2024-09-24T13:55:42Z

@littlewu2508 If everything works out, the GPU should access the FPGA memory directly, without going through a host buffer. The idea is that hipHostRegister() coordinates with the OS to translate and pin the target host memory, which in our case is the virtually mapped FPGA memory. This involves a virtual-to-physical address translation in the OS, after which the GPU is given a device pointer that should correspond to the physical address (which is on the FPGA). The GPU can then access the FPGA memory directly with the help of the PCIe controller. The most troublesome part is the address translation in the OS, because it involves both the GPU and FPGA drivers as well as PCIe.

ppanchad-amd added the Under Investigation label Aug 20, 2024
