You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
NOTE: I've committed a few sins here to get this working. Specifically I have edited the following file
/usr/src/amdgpu-6.8.5-2009582.24.04/amd/display/amdgpu_dm/amdgpu_dm_helpers.c
line 563 to be ret = drm_dp_add_payload_part2(mst_mgr, new_payload);
so that it works with the back ported changes from torvalds/linux@5a507b7
I am running the tests for flash-attention after modifying the setup.py to allow for targeting gfx908 as an expriment.
Install the dkms driver and rocm packages on ubuntu 24.04. DKMS will fail, edit the source as described in the problem description and rerun apt install amdgpu-dkms it will now install.
Modify setup.py line 124 allowed_archs to include "gfx908" such that it is now
allowed_archs = ["native", "gfx908", "gfx90a", "gfx940", "gfx941", "gfx942"]
pull the rocm/pytorch:latest
from the flash-attention folder
podman run --rm -it --device /dev/kfd --device /dev/dri --group-add keep-groups --security-opt seccomp=unconfined -v ./:/source rocm/pytorch:latest bash
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
ROCk module version 6.8.5 is loaded
HSA System Attributes
Runtime Version: 1.14
Runtime Ext Version: 1.6
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE
Mwaitx: DISABLED
DMAbuf Support: YES
==========
HSA Agents
Agent 1
Name: AMD EPYC 7402P 24-Core Processor
Uuid: CPU-XX
Marketing Name: AMD EPYC 7402P 24-Core Processor
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 0
Device Type: CPU
Cache Info:
L1: 32768(0x8000) KB
Chip ID: 0(0x0)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 2800
BDFID: 0
Internal Node ID: 0
Compute Unit: 24
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Memory Properties:
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: FINE GRAINED
Size: 131707928(0x7d9b418) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 131707928(0x7d9b418) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 3
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 131707928(0x7d9b418) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:
Agent 2
Name: gfx908
Uuid: GPU-fdc9fe5eaa557cf3
Marketing Name: AMD Instinct MI100
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 1
Device Type: GPU
Cache Info:
L1: 16(0x10) KB
L2: 8192(0x2000) KB
Chip ID: 29580(0x738c)
ASIC Revision: 2(0x2)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 1502
BDFID: 50688
Internal Node ID: 1
Compute Unit: 120
SIMDs per CU: 4
Shader Engines: 8
Shader Arrs. per Eng.: 1
WatchPts on Addr. Ranges:4
Coherent Host Access: FALSE
Memory Properties:
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 64(0x40)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 40(0x28)
Max Work-item Per CU: 2560(0xa00)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Packet Processor uCode:: 67
SDMA engine uCode:: 18
IOMMU Support:: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 33538048(0x1ffc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:2048KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED
Size: 33538048(0x1ffc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:2048KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 3
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Recommended Granule:0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx908:sramecc+:xnack-
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
*** Done ***
Additional Information
No response
The text was updated successfully, but these errors were encountered:
So the official stance is that only MI200 and MI300 are supported for flash-attention:
_AMD ROCm Support
ROCm version uses composable_kernel as the backend. It provides the implementation of FlashAttention-2.
Requirements:
ROCm 6.0 and above.
We recommend the Pytorch container from ROCm, which has all the required tools to install FlashAttention.
FlashAttention-2 with ROCm currently supports:
MI200 or MI300 GPUs.
Datatype fp16 and bf16
Forward's head dimensions up to 256. Backward head dimensions up to 128._
-https://github.com/ROCm/flash-attention
I'll close this one off, but you should open the same issue over on the flash-attention repo. The VM faults you're seeing seem like application-level faults. The addresses are valid, but they're unmapped (hence the error), so the fix should be coming from the flash-attention crew. I wish github had an easier way to move issues from one repo to another, but closing this and opening a new one at https://github.com/ROCm/flash-attention/issues/new/choose is the best way to get attention on the issue. Good luck!
Problem Description
OS:
NAME="Ubuntu"
VERSION="24.04.1 LTS (Noble Numbat)"
CPU:
model name : AMD EPYC 7402P 24-Core Processor
GPU:
Name: AMD EPYC 7402P 24-Core Processor
Marketing Name: AMD EPYC 7402P 24-Core Processor
Name: gfx908
Marketing Name: AMD Instinct MI100
Name: amdgcn-amd-amdhsa--gfx908:sramecc+:xnack-
I'm getting a similar error to this #147
NOTE: I've committed a few sins here to get this working. Specifically I have edited the following file
/usr/src/amdgpu-6.8.5-2009582.24.04/amd/display/amdgpu_dm/amdgpu_dm_helpers.c
line 563 to be ret = drm_dp_add_payload_part2(mst_mgr, new_payload);
so that it works with the back ported changes from
torvalds/linux@5a507b7
I am running the tests for flash-attention after modifying the setup.py to allow for targeting gfx908 as an expriment.
[ 4778.915423] amdgpu 0000:c6:00.0: amdgpu: [gfxhub0] no-retry page fault (src_id:0 ring:24 vmid:3 pasid:32769)
[ 4778.916081] amdgpu 0000:c6:00.0: amdgpu: for process pt_main_thread pid 12924 thread pt_main_thread pid 12924)
[ 4778.916571] amdgpu 0000:c6:00.0: amdgpu: in page starting at address 0x0000ae8570611000 from IH client 0x1b (UTCL2)
[ 4778.917054] amdgpu 0000:c6:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00301031
[ 4778.917532] amdgpu 0000:c6:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
[ 4778.918012] amdgpu 0000:c6:00.0: amdgpu: MORE_FAULTS: 0x1
[ 4778.918491] amdgpu 0000:c6:00.0: amdgpu: WALKER_ERROR: 0x0
[ 4778.918961] amdgpu 0000:c6:00.0: amdgpu: PERMISSION_FAULTS: 0x3
[ 4778.919424] amdgpu 0000:c6:00.0: amdgpu: MAPPING_ERROR: 0x0
[ 4778.919879] amdgpu 0000:c6:00.0: amdgpu: RW: 0x0
[ 4778.920325] amdgpu 0000:c6:00.0: amdgpu: [gfxhub0] no-retry page fault (src_id:0 ring:24 vmid:3 pasid:32769)
[ 4778.920781] amdgpu 0000:c6:00.0: amdgpu: for process pt_main_thread pid 12924 thread pt_main_thread pid 12924)
[ 4778.921242] amdgpu 0000:c6:00.0: amdgpu: in page starting at address 0x0000ae8570612000 from IH client 0x1b (UTCL2)
[ 4778.921713] amdgpu 0000:c6:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[ 4778.922182] amdgpu 0000:c6:00.0: amdgpu: Faulty UTCL2 client ID: CB (0x0)
[ 4778.922656] amdgpu 0000:c6:00.0: amdgpu: MORE_FAULTS: 0x0
[ 4778.923125] amdgpu 0000:c6:00.0: amdgpu: WALKER_ERROR: 0x0
[ 4778.923594] amdgpu 0000:c6:00.0: amdgpu: PERMISSION_FAULTS: 0x0
[ 4778.924057] amdgpu 0000:c6:00.0: amdgpu: MAPPING_ERROR: 0x0
[ 4778.924513] amdgpu 0000:c6:00.0: amdgpu: RW: 0x0
[ 4778.924959] amdgpu 0000:c6:00.0: amdgpu: [gfxhub0] no-retry page fault (src_id:0 ring:24 vmid:3 pasid:32769)
[ 4778.925416] amdgpu 0000:c6:00.0: amdgpu: for process pt_main_thread pid 12924 thread pt_main_thread pid 12924)
[ 4778.925877] amdgpu 0000:c6:00.0: amdgpu: in page starting at address 0x0000ae8570613000 from IH client 0x1b (UTCL2)
[ 4778.926426] amdgpu 0000:c6:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[ 4778.926895] amdgpu 0000:c6:00.0: amdgpu: Faulty UTCL2 client ID: CB (0x0)
[ 4778.927366] amdgpu 0000:c6:00.0: amdgpu: MORE_FAULTS: 0x0
[ 4778.927836] amdgpu 0000:c6:00.0: amdgpu: WALKER_ERROR: 0x0
[ 4778.928305] amdgpu 0000:c6:00.0: amdgpu: PERMISSION_FAULTS: 0x0
[ 4778.928768] amdgpu 0000:c6:00.0: amdgpu: MAPPING_ERROR: 0x0
[ 4778.929224] amdgpu 0000:c6:00.0: amdgpu: RW: 0x0
[ 4778.929670] amdgpu 0000:c6:00.0: amdgpu: [gfxhub0] no-retry page fault (src_id:0 ring:24 vmid:3 pasid:32769)
[ 4778.930126] amdgpu 0000:c6:00.0: amdgpu: for process pt_main_thread pid 12924 thread pt_main_thread pid 12924)
[ 4778.930588] amdgpu 0000:c6:00.0: amdgpu: in page starting at address 0x0000ae85705a8000 from IH client 0x1b (UTCL2)
[ 4778.931135] amdgpu 0000:c6:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[ 4778.931604] amdgpu 0000:c6:00.0: amdgpu: Faulty UTCL2 client ID: CB (0x0)
[ 4778.932079] amdgpu 0000:c6:00.0: amdgpu: MORE_FAULTS: 0x0
[ 4778.932549] amdgpu 0000:c6:00.0: amdgpu: WALKER_ERROR: 0x0
[ 4778.933020] amdgpu 0000:c6:00.0: amdgpu: PERMISSION_FAULTS: 0x0
[ 4778.933484] amdgpu 0000:c6:00.0: amdgpu: MAPPING_ERROR: 0x0
[ 4778.933940] amdgpu 0000:c6:00.0: amdgpu: RW: 0x0
[ 4778.934389] amdgpu 0000:c6:00.0: amdgpu: [gfxhub0] no-retry page fault (src_id:0 ring:24 vmid:3 pasid:32769)
[ 4778.934845] amdgpu 0000:c6:00.0: amdgpu: for process pt_main_thread pid 12924 thread pt_main_thread pid 12924)
[ 4778.935306] amdgpu 0000:c6:00.0: amdgpu: in page starting at address 0x0000ae85705a9000 from IH client 0x1b (UTCL2)
[ 4778.935855] amdgpu 0000:c6:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[ 4778.936325] amdgpu 0000:c6:00.0: amdgpu: Faulty UTCL2 client ID: CB (0x0)
[ 4778.936797] amdgpu 0000:c6:00.0: amdgpu: MORE_FAULTS: 0x0
[ 4778.937266] amdgpu 0000:c6:00.0: amdgpu: WALKER_ERROR: 0x0
[ 4778.937737] amdgpu 0000:c6:00.0: amdgpu: PERMISSION_FAULTS: 0x0
[ 4778.938199] amdgpu 0000:c6:00.0: amdgpu: MAPPING_ERROR: 0x0
[ 4778.938653] amdgpu 0000:c6:00.0: amdgpu: RW: 0x0
[ 4778.939098] amdgpu 0000:c6:00.0: amdgpu: [gfxhub0] no-retry page fault (src_id:0 ring:24 vmid:3 pasid:32769)
[ 4778.939552] amdgpu 0000:c6:00.0: amdgpu: for process pt_main_thread pid 12924 thread pt_main_thread pid 12924)
[ 4778.940011] amdgpu 0000:c6:00.0: amdgpu: in page starting at address 0x0000ae85705a8000 from IH client 0x1b (UTCL2)
[ 4778.940555] amdgpu 0000:c6:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[ 4778.941023] amdgpu 0000:c6:00.0: amdgpu: Faulty UTCL2 client ID: CB (0x0)
[ 4778.941492] amdgpu 0000:c6:00.0: amdgpu: MORE_FAULTS: 0x0
[ 4778.941961] amdgpu 0000:c6:00.0: amdgpu: WALKER_ERROR: 0x0
[ 4778.942430] amdgpu 0000:c6:00.0: amdgpu: PERMISSION_FAULTS: 0x0
[ 4778.942891] amdgpu 0000:c6:00.0: amdgpu: MAPPING_ERROR: 0x0
[ 4778.943346] amdgpu 0000:c6:00.0: amdgpu: RW: 0x0
[ 4778.943795] amdgpu 0000:c6:00.0: amdgpu: [gfxhub0] no-retry page fault (src_id:0 ring:24 vmid:3 pasid:32769)
[ 4778.944254] amdgpu 0000:c6:00.0: amdgpu: for process pt_main_thread pid 12924 thread pt_main_thread pid 12924)
[ 4778.944717] amdgpu 0000:c6:00.0: amdgpu: in page starting at address 0x0000ae85705aa000 from IH client 0x1b (UTCL2)
[ 4778.945265] amdgpu 0000:c6:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[ 4778.945738] amdgpu 0000:c6:00.0: amdgpu: Faulty UTCL2 client ID: CB (0x0)
[ 4778.946211] amdgpu 0000:c6:00.0: amdgpu: MORE_FAULTS: 0x0
[ 4778.946683] amdgpu 0000:c6:00.0: amdgpu: WALKER_ERROR: 0x0
[ 4778.947156] amdgpu 0000:c6:00.0: amdgpu: PERMISSION_FAULTS: 0x0
[ 4778.947617] amdgpu 0000:c6:00.0: amdgpu: MAPPING_ERROR: 0x0
[ 4778.948071] amdgpu 0000:c6:00.0: amdgpu: RW: 0x0
[ 4778.948515] amdgpu 0000:c6:00.0: amdgpu: [gfxhub0] no-retry page fault (src_id:0 ring:24 vmid:3 pasid:32769)
[ 4778.948969] amdgpu 0000:c6:00.0: amdgpu: for process pt_main_thread pid 12924 thread pt_main_thread pid 12924)
[ 4778.949429] amdgpu 0000:c6:00.0: amdgpu: in page starting at address 0x0000ae85705a9000 from IH client 0x1b (UTCL2)
[ 4778.949976] amdgpu 0000:c6:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[ 4778.950443] amdgpu 0000:c6:00.0: amdgpu: Faulty UTCL2 client ID: CB (0x0)
[ 4778.950913] amdgpu 0000:c6:00.0: amdgpu: MORE_FAULTS: 0x0
[ 4778.951382] amdgpu 0000:c6:00.0: amdgpu: WALKER_ERROR: 0x0
[ 4778.951850] amdgpu 0000:c6:00.0: amdgpu: PERMISSION_FAULTS: 0x0
[ 4778.952311] amdgpu 0000:c6:00.0: amdgpu: MAPPING_ERROR: 0x0
[ 4778.952764] amdgpu 0000:c6:00.0: amdgpu: RW: 0x0
[ 4778.953209] amdgpu 0000:c6:00.0: amdgpu: [gfxhub0] no-retry page fault (src_id:0 ring:24 vmid:3 pasid:32769)
[ 4778.953663] amdgpu 0000:c6:00.0: amdgpu: for process pt_main_thread pid 12924 thread pt_main_thread pid 12924)
[ 4778.954122] amdgpu 0000:c6:00.0: amdgpu: in page starting at address 0x0000ae85705aa000 from IH client 0x1b (UTCL2)
[ 4778.954665] amdgpu 0000:c6:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[ 4778.955132] amdgpu 0000:c6:00.0: amdgpu: Faulty UTCL2 client ID: CB (0x0)
[ 4778.955601] amdgpu 0000:c6:00.0: amdgpu: MORE_FAULTS: 0x0
[ 4778.956069] amdgpu 0000:c6:00.0: amdgpu: WALKER_ERROR: 0x0
[ 4778.956537] amdgpu 0000:c6:00.0: amdgpu: PERMISSION_FAULTS: 0x0
[ 4778.956998] amdgpu 0000:c6:00.0: amdgpu: MAPPING_ERROR: 0x0
[ 4778.957452] amdgpu 0000:c6:00.0: amdgpu: RW: 0x0
[ 4778.957896] amdgpu 0000:c6:00.0: amdgpu: [gfxhub0] no-retry page fault (src_id:0 ring:24 vmid:3 pasid:32769)
[ 4778.958350] amdgpu 0000:c6:00.0: amdgpu: for process pt_main_thread pid 12924 thread pt_main_thread pid 12924)
[ 4778.958810] amdgpu 0000:c6:00.0: amdgpu: in page starting at address 0x0000ae8570678000 from IH client 0x1b (UTCL2)
[ 4778.959362] amdgpu 0000:c6:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[ 4778.959829] amdgpu 0000:c6:00.0: amdgpu: Faulty UTCL2 client ID: CB (0x0)
[ 4778.960298] amdgpu 0000:c6:00.0: amdgpu: MORE_FAULTS: 0x0
[ 4778.960767] amdgpu 0000:c6:00.0: amdgpu: WALKER_ERROR: 0x0
[ 4778.961236] amdgpu 0000:c6:00.0: amdgpu: PERMISSION_FAULTS: 0x0
[ 4778.961697] amdgpu 0000:c6:00.0: amdgpu: MAPPING_ERROR: 0x0
[ 4778.962150] amdgpu 0000:c6:00.0: amdgpu: RW: 0x0
Operating System
Ubuntu 24.04.1
CPU
AMD EPYC 7402P
GPU
AMD Instinct MI100
ROCm Version
ROCm 6.2.0
ROCm Component
ROCK-Kernel-Driver
Steps to Reproduce
Install the dkms driver and rocm packages on ubuntu 24.04. DKMS will fail, edit the source as described in the problem description and rerun apt install amdgpu-dkms it will now install.
Acquire latest flash-attention
https://github.com/iratebadger/flash-attention.git
Modify setup.py line 124 allowed_archs to include "gfx908" such that it is now
allowed_archs = ["native", "gfx908", "gfx90a", "gfx940", "gfx941", "gfx942"]
pull the rocm/pytorch:latest
from the flash-attention folder
podman run --rm -it --device /dev/kfd --device /dev/dri --group-add keep-groups --security-opt seccomp=unconfined -v ./:/source rocm/pytorch:latest bash
GPU_ARCHS=gfx908 python setup.py bdist_wheel
pip install dist/flash_attn-2.6.3-cp39-cp39-linux_x86_64.whl
pytest tests/test_flash_attn_ck.py
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
ROCk module version 6.8.5 is loaded
HSA System Attributes
Runtime Version: 1.14
Runtime Ext Version: 1.6
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE
Mwaitx: DISABLED
DMAbuf Support: YES
==========
HSA Agents
Agent 1
Name: AMD EPYC 7402P 24-Core Processor
Uuid: CPU-XX
Marketing Name: AMD EPYC 7402P 24-Core Processor
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 0
Device Type: CPU
Cache Info:
L1: 32768(0x8000) KB
Chip ID: 0(0x0)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 2800
BDFID: 0
Internal Node ID: 0
Compute Unit: 24
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Memory Properties:
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: FINE GRAINED
Size: 131707928(0x7d9b418) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 131707928(0x7d9b418) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 3
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 131707928(0x7d9b418) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:
Agent 2
Name: gfx908
Uuid: GPU-fdc9fe5eaa557cf3
Marketing Name: AMD Instinct MI100
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 1
Device Type: GPU
Cache Info:
L1: 16(0x10) KB
L2: 8192(0x2000) KB
Chip ID: 29580(0x738c)
ASIC Revision: 2(0x2)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 1502
BDFID: 50688
Internal Node ID: 1
Compute Unit: 120
SIMDs per CU: 4
Shader Engines: 8
Shader Arrs. per Eng.: 1
WatchPts on Addr. Ranges:4
Coherent Host Access: FALSE
Memory Properties:
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 64(0x40)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 40(0x28)
Max Work-item Per CU: 2560(0xa00)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Packet Processor uCode:: 67
SDMA engine uCode:: 18
IOMMU Support:: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 33538048(0x1ffc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:2048KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED
Size: 33538048(0x1ffc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:2048KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 3
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Recommended Granule:0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx908:sramecc+:xnack-
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
*** Done ***
Additional Information
No response
The text was updated successfully, but these errors were encountered: