[Question]: Does amdgpu support PCIe p2p dma copy with FPGA? #159

Open
littlewu2508 opened this issue Feb 1, 2024 · 4 comments

Comments

@littlewu2508

Problem Description

I'm interested in p2p data transfer from an FPGA (Xilinx Alveo U50) to an AMD GPU. There is already an implementation for FPGA to Nvidia GPU transfer at https://github.com/RC4ML/FpgaNIC, using https://github.com/NVIDIA/gdrcopy, and earlier research [1,2] achieved this with DirectGMA. But DirectGMA is now deprecated along with the proprietary fglrx driver. Is there a similar method with the open source amdgpu driver?

[1] http://dx.doi.org/10.1088/1748-0221/11/02/P02007
[2] http://dx.doi.org/10.1088/1748-0221/12/03/C03015

I read some of the source code for DMA p2p copy in https://github.com/ROCm/ROCR-Runtime/blob/master/src/core/runtime/ and https://github.com/ROCm/ROCT-Thunk-Interface/tree/master/tests/kfdtest/. It seems that all userspace DMA copies go through the HSA driver. As far as I know, Xilinx Alveo cards currently have no HSA support (there may be ongoing work), so I wonder whether DMA p2p between an FPGA and an AMD GPU is possible.

I also raised this question in openucx/ucx#9598 and found that Xilinx Alveo cards support PCIe DMA p2p via OpenCL on XRT. Does that mean I can use OpenCL to achieve p2p between the FPGA and the AMD GPU? As I understand it, though, rocm-opencl-runtime is also based on HSA.

Operating System

Debian 12

CPU

AMD EPYC 7702 64-Core Processor

GPU

AMD Instinct MI100

ROCm Version

ROCm 5.7.1

ROCm Component

No response

Steps to Reproduce

No response

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response

@ppanchad-amd

@littlewu2508 Internal ticket has been created to assist with your question. Thanks!

@tcgu-amd

tcgu-amd commented Sep 23, 2024

Hi @littlewu2508, I think p2p between Xilinx FPGAs and AMD GPUs is not directly supported at the moment. However, one potential workaround is to create a p2p buffer on the FPGA, then map it into the host address space following this documentation. Next, register the base pointer of the mapped p2p buffer with the GPU using something like hipHostRegister(). Afterwards, hipHostGetDevicePointer() can be used to obtain a device pointer through which one may be able to interact with the FPGA directly.

Now, I haven't tested this myself, since we currently don't have a test setup for this specific configuration. However, this should work in theory. Please let me know if this sounds reasonable and if it works on your end.

Thanks!
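
For readers who want to try the workaround, the two HIP calls named above could be wrapped roughly as follows. This is a minimal sketch, not a tested solution: `register_fpga_mapping` and the `USE_HIP` build macro are hypothetical names introduced here, and `mapped` is assumed to be the host-virtual pointer to the FPGA p2p buffer obtained through the FPGA runtime (that mapping step is not shown). Whether `hipHostRegister()` accepts such a mapping on a given system is exactly the untested part.

```cpp
// Sketch only: not tested against real FPGA hardware.
#include <cstddef>
#include <cstdio>

#ifdef USE_HIP  // hypothetical opt-in macro, e.g. build with: hipcc -DUSE_HIP
#include <hip/hip_runtime.h>
#endif

// Registers an already-mapped host range with the HIP runtime and returns a
// device pointer for it, or nullptr if any step fails.
// `mapped` is assumed to be the host mapping of the FPGA p2p buffer.
void* register_fpga_mapping(void* mapped, std::size_t bytes) {
    if (mapped == nullptr || bytes == 0) {
        return nullptr;
    }
#ifdef USE_HIP
    // Ask the runtime to pin the range and make it visible to the GPU.
    hipError_t err = hipHostRegister(mapped, bytes, hipHostRegisterMapped);
    if (err != hipSuccess) {
        std::fprintf(stderr, "hipHostRegister: %s\n", hipGetErrorString(err));
        return nullptr;
    }
    // Obtain the GPU-side pointer that aliases the registered range.
    void* device_ptr = nullptr;
    err = hipHostGetDevicePointer(&device_ptr, mapped, 0);
    if (err != hipSuccess) {
        std::fprintf(stderr, "hipHostGetDevicePointer: %s\n",
                     hipGetErrorString(err));
        (void)hipHostUnregister(mapped);  // undo the registration on failure
        return nullptr;
    }
    return device_ptr;
#else
    // Built without HIP: nothing can be registered.
    std::fprintf(stderr, "built without USE_HIP; nothing registered\n");
    return nullptr;
#endif
}
```

If registration fails, the error string printed here is probably the most useful thing to report back in this thread. On success, the range should be released again with `hipHostUnregister(mapped)` before the FPGA buffer is unmapped.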

@littlewu2508
Author

Thank you very much for the idea! I will try it out when I have time.

Also, will data go through host memory in this workaround? The motivation of p2p DMA copy is to increase the bandwidth and lower latency of data transfer.

@tcgu-amd

tcgu-amd commented Sep 24, 2024

@littlewu2508. If everything works out, the GPU should access the FPGA memory directly, without going through a host buffer. The idea is that hipHostRegister() coordinates with the OS to translate and pin the target host memory, which in our case would be the virtually mapped FPGA memory. This involves a virtual-to-physical address translation in the OS, after which the GPU is given a device pointer that should correspond to the physical memory address (which is on the FPGA). The GPU can then access the FPGA memory directly with the help of the PCIe controller. The main troublesome part is the address translation in the OS, because it involves the GPU and FPGA drivers as well as PCIe.
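
Since the physical addresses in question live in one of the FPGA's PCIe BARs, one cheap sanity check before attempting registration is to confirm that the BAR is large enough for the intended p2p buffer. On Linux, each line of `/sys/bus/pci/devices/<BDF>/resource` lists one region as three hex values: start, end (inclusive), and flags. The sketch below parses such a line; the helper names are hypothetical, and which BAR the FPGA uses for p2p is an assumption to check against the FPGA vendor's documentation.

```cpp
#include <cstdint>
#include <cstdio>

// One PCI region as listed in a sysfs `resource` file.
struct BarRange {
    std::uint64_t start;
    std::uint64_t end;    // inclusive
    std::uint64_t flags;
};

// Parses a line of the form "<start> <end> <flags>" (hex, optional 0x prefix).
// Returns false for malformed lines and for unused (all-zero) slots.
bool parse_resource_line(const char* line, BarRange& out) {
    unsigned long long start = 0, end = 0, flags = 0;
    if (std::sscanf(line, "%llx %llx %llx", &start, &end, &flags) != 3) {
        return false;
    }
    if (end < start || (start == 0 && end == 0)) {
        return false;
    }
    out.start = start;
    out.end = end;
    out.flags = flags;
    return true;
}

// Size of the region in bytes (end is inclusive).
std::uint64_t bar_size(const BarRange& bar) {
    return bar.end - bar.start + 1;
}

// True if a buffer of `bytes` bytes could fit inside the region.
bool buffer_fits(const BarRange& bar, std::uint64_t bytes) {
    return bytes > 0 && bytes <= bar_size(bar);
}
```

This only checks sizes on the FPGA side. It says nothing about whether the GPU driver will accept the mapping, which remains the open question in this thread.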
