cuda plot Error STDERR: CUDA error cudaErrorIllegalAddress : an illegal memory access was encountered. #441

Open
Perk-Mew opened this issue Nov 19, 2023 · 14 comments

Comments

@Perk-Mew

I need some help. I am trying to plot with the Chia GUI 2.1.1.
My system is an AMD Threadripper 3970X with 256 GB of RAM and a ROG RTX 3070 8 GB.

Bladebit Chia Plotter
Version : 3.1.0
Git Commit : e9836f8
Compiled With: msvc 19.29.30152

[Global Plotting Config]
Will create 1 plots.
Thread count : 32
Warm start enabled : false
NUMA disabled : false
CPU affinity disabled : false
Farmer public key : b5a8672980142bb8f3b51293b5252f739de0c124db9c3d4b93384d14775de4f0c80b568105f72d022ee3170ec8a5b41e
Pool contract address : xch1nkntyptsljk8t7n3j2j5fa6hw5p28ht85ckspdf59qhvxqr26mfsndj6yv
Compression Level : 7
Benchmark mode : disabled

[Bladebit CUDA Plotter]
Host RAM : 255 GiB
Plot checks : disabled

Selected cuda device 0 : NVIDIA GeForce RTX 3070
CUDA Compute Capability : 8.6
SM count : 46
Max blocks per SM : 16
Max threads per SM : 1536
Async Engine Count : 1
L2 cache size : 4.00 MB
L2 persist cache max size : 3.00 MB
Stack Size : 1.00 KB
Memory:
Total : 8.00 GB
Free : 6.95 GB

Allocating buffers (this may take a few seconds)...
Kernel RAM required : 91955994624 bytes ( 87696.07 MiB or 85.64 GiB )
Intermediate RAM required : 4378927104 bytes ( 4176.07 MiB or 4.08 GiB )
Host RAM required : 142270791680 bytes ( 135680.00 MiB or 132.50 GiB )
Total Host RAM required : 234226786304 bytes ( 223376.07 MiB or 218.14 GiB )
GPU RAM required : 6163857408 bytes ( 5878.31 MiB or 5.74 GiB )
Allocating buffers...
Done.

Generating plot 1 / 1: 5f48a25977249f0319bf1ef50234b4178a6530b80deb8df735a842ccbc31bc6b
Plot temporary file: A:\plot-k32-c07-2023-11-19-22-57-5f48a25977249f0319bf1ef50234b4178a6530b80deb8df735a842ccbc31bc6b.plot.tmp

Generating F1
Progress update: 0.01
Finished F1 in 4.37 seconds.
Progress update: 0.1
Table 2 completed in 14.44 seconds with 4294920960 entries.
Progress update: 0.2
Table 3 completed in 25.13 seconds with 4294908477 entries.
Progress update: 0.3
Table 4 completed in 29.95 seconds with 4294902283 entries.
Progress update: 0.4
Table 5 completed in 28.91 seconds with 4294888542 entries.
Progress update: 0.5
Table 6 completed in 24.22 seconds with 4294849124 entries.
Progress update: 0.6
Table 7 completed in 21.40 seconds with 4294750028 entries.
Progress update: 0.7
Finalizing Table 7
STDERR: CUDA error: 700 (0x2bc) cudaErrorIllegalAddress : an illegal memory access was encountered

STDERR:

STDERR: *** Panic!!! *** Fatal Error:

STDERR: CUDA error cudaErrorIllegalAddress : an illegal memory access was encountered.

0x00007FF6E3B793E2 @ ::
0x00007FF6E3C7ED79 @ ::
0x00007FF6E3C9CE0D @ ::
0x00007FF6E3CC23FA @ ::
0x00007FF6E3CC1FD4 @ ::
0x00007FF6E3C9DCE8 @ ::
0x00007FF6E3C9EE51 @ ::
0x00007FF6E3C9FFDF @ ::
0x00007FF6E3CA06BF @ ::
0x00007FF6E3B5F0A8 @ ::
0x00007FF6E3D0AFEC @ ::
0x00007FF9DA98257D @ ::BaseThreadInitThunk()
0x00007FF9DCA8AA58 @ ::RtlUserThreadStart()
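
For anyone reading the log above, the memory figures can be sanity-checked with a little arithmetic. The sketch below is not part of bladebit; it only re-derives the MiB/GiB values from the byte counts printed in the log and compares them with the available memory the log reports. Treating the log's "255 GiB" and "6.95 GB" as binary GiB is an assumption.

```python
# Byte counts copied from the "Allocating buffers" section of the log above.
REQUIRED_BYTES = {
    "Kernel RAM": 91955994624,
    "Intermediate RAM": 4378927104,
    "Host RAM": 142270791680,
    "Total Host RAM": 234226786304,
    "GPU RAM": 6163857408,
}

MIB = 1024 ** 2
GIB = 1024 ** 3


def describe(n_bytes: int) -> str:
    """Format a byte count in the same layout the log uses."""
    return f"{n_bytes} bytes ( {n_bytes / MIB:.2f} MiB or {n_bytes / GIB:.2f} GiB )"


for name, n_bytes in REQUIRED_BYTES.items():
    print(f"{name} required : {describe(n_bytes)}")

# Observation from the numbers alone: the reported total equals kernel + host.
kernel_plus_host = REQUIRED_BYTES["Kernel RAM"] + REQUIRED_BYTES["Host RAM"]
print("total == kernel + host:", kernel_plus_host == REQUIRED_BYTES["Total Host RAM"])

# Headroom against what the log reports as available.
# Assumption: "255 GiB" host RAM and "6.95 GB" free GPU memory are binary GiB.
host_headroom_gib = 255 - REQUIRED_BYTES["Total Host RAM"] / GIB
gpu_headroom_gib = 6.95 - REQUIRED_BYTES["GPU RAM"] / GIB
print(f"host headroom: {host_headroom_gib:.2f} GiB, GPU headroom: {gpu_headroom_gib:.2f} GiB")
```

On these numbers, both the host and the GPU requirements fit within what the log reports as available, which is consistent with the "Allocating buffers... Done." step succeeding. The reported total also matches kernel RAM plus host RAM exactly; whether that is how bladebit computes it is not confirmed here.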

@sobertram

Curious, what is your nvidia-smi.exe output?

@Perk-Mew
Author

What is the nvidia-smi.exe output? The driver?

@sobertram

sobertram commented Nov 20, 2023

> What is the nvidia-smi.exe output? The driver?

It is a program that comes with your driver installation.

e.g.

Windows PowerShell
Copyright (C) Microsoft Corporation. All rights reserved.

Try the new cross-platform PowerShell https://aka.ms/pscore6

PS C:\Users\sober> nvidia-smi.exe
Mon Nov 20 07:42:35 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 537.13                 Driver Version: 537.13       CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                     TCC/WDDM  | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla P4                     TCC   | 00000000:02:00.0 Off |                    0 |
| N/A   63C    P0              25W /  75W |    546MiB /  7680MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A     16724      C   ...unpacked\daemon\start_harvester.exe      538MiB |
+---------------------------------------------------------------------------------------+

I want to see how much memory is being used and by which programs.

I am assuming you are on Windows; on Linux it's just nvidia-smi.

@Perk-Mew
Author

Yes, Windows 11. How can I fix this problem?

@sobertram

> Yes, Windows 11. How can I fix this problem?

Can you share the output of nvidia-smi.exe on your system, like I did? It will show which programs are using GPU memory. The error you are getting usually means the GPU has maxed out its memory.

Also are you getting this error on every plot?
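
If the full table is hard to read or share, the same memory figures can be pulled in a compact form. This is a minimal sketch, assuming `nvidia-smi` is on the PATH; the `--query-gpu` and `--format=csv,noheader,nounits` options are standard nvidia-smi flags, while the helper names `parse_gpu_memory` and `query_gpu_memory` are made up for illustration. The sample line is built from the Tesla P4 figures in the output posted above.

```python
import subprocess

# Standard nvidia-smi query flags; values are reported in MiB with "nounits".
QUERY = [
    "nvidia-smi",
    "--query-gpu=name,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text: str) -> list:
    """Turn 'name, used, total' CSV lines into dicts (hypothetical helper)."""
    rows = []
    for line in csv_text.strip().splitlines():
        # Split from the right so a comma inside the GPU name would not break parsing.
        name, used, total = [field.strip() for field in line.rsplit(",", 2)]
        rows.append({
            "name": name,
            "used_mib": int(used),
            "total_mib": int(total),
            # total - used is only an estimate of what is still available.
            "unused_mib": int(total) - int(used),
        })
    return rows


def query_gpu_memory() -> list:
    """Run nvidia-smi and parse its output (requires an NVIDIA driver install)."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_gpu_memory(out)


# Sample built from the Tesla P4 output posted above (546MiB / 7680MiB).
SAMPLE = "Tesla P4, 546, 7680\n"
for gpu in parse_gpu_memory(SAMPLE):
    print(f"{gpu['name']}: {gpu['used_mib']} MiB used of {gpu['total_mib']} MiB")
```

Calling `query_gpu_memory()` right before starting a plot would show how much memory other programs already hold on the card.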

@Perk-Mew
Author

Yes, even when I try to plot using hybrid mode it still errors.
How do I see the output like you did?

@Perk-Mew
Author

Uploading Screenshot 2023-11-20 23085411.png…

@Perk-Mew
Author

Screenshot 2023-11-20 231129

@Perk-Mew
Author

It shows N/A.

@sobertram

> It shows N/A.

Right, but we can also see 1096MiB / 8192MiB, so it looks like bladebit is accurately reflecting your free memory.
I had some issues with CUDA 12.3 on Linux, so you may want to try 12.2 and see if that is more compatible.

So reinstall the NVIDIA driver, but install CUDA 12.2.

I found the stable version for your card. Unlike on Unix, I can't select the CUDA version at download, so I'm not sure which version will be installed. Hope this works for you.
https://us.download.nvidia.com/Windows/546.01/546.01-desktop-win10-win11-64bit-international-nsd-dch-whql.exe

@Perk-Mew
Author

Bladebit Chia Plotter
Version : 3.1.0
Git Commit : e9836f8
Compiled With: msvc 19.29.30152

[Global Plotting Config]
Will create 1 plots.
Thread count : 32
Warm start enabled : false
NUMA disabled : false
CPU affinity disabled : false
Farmer public key : b5a8672980142bb8f3b51293b5252f739de0c124db9c3d4b93384d14775de4f0c80b568105f72d022ee3170ec8a5b41e
Pool contract address : xch1nkntyptsljk8t7n3j2j5fa6hw5p28ht85ckspdf59qhvxqr26mfsndj6yv
Compression Level : 7
Benchmark mode : disabled

[Bladebit CUDA Plotter]
Host RAM : 255 GiB
Plot checks : disabled

Selected cuda device 0 : NVIDIA GeForce RTX 3070
CUDA Compute Capability : 8.6
SM count : 46
Max blocks per SM : 16
Max threads per SM : 1536
Async Engine Count : 1
L2 cache size : 4.00 MB
L2 persist cache max size : 3.00 MB
Stack Size : 1.00 KB
Memory:
Total : 8.00 GB
Free : 6.95 GB

Allocating buffers (this may take a few seconds)...
Kernel RAM required : 91955994624 bytes ( 87696.07 MiB or 85.64 GiB )
Intermediate RAM required : 4378927104 bytes ( 4176.07 MiB or 4.08 GiB )
Host RAM required : 142270791680 bytes ( 135680.00 MiB or 132.50 GiB )
Total Host RAM required : 234226786304 bytes ( 223376.07 MiB or 218.14 GiB )
GPU RAM required : 6163857408 bytes ( 5878.31 MiB or 5.74 GiB )
Allocating buffers...
Done.

Generating plot 1 / 1: a2fd8774ceb12525f7abcf4b701c0857162f2db2d89a554cf87e5e223ab4a014
Plot temporary file: A:\plot-k32-c07-2023-11-21-00-19-a2fd8774ceb12525f7abcf4b701c0857162f2db2d89a554cf87e5e223ab4a014.plot.tmp

Generating F1
Progress update: 0.01
Finished F1 in 4.79 seconds.
Progress update: 0.1
Table 2 completed in 14.98 seconds with 4294960998 entries.
Progress update: 0.2
Table 3 completed in 27.71 seconds with 4294887329 entries.
Progress update: 0.3
Table 4 completed in 33.68 seconds with 4294790221 entries.
Progress update: 0.4
Table 5 completed in 29.64 seconds with 4294597227 entries.
Progress update: 0.5
Table 6 completed in 25.63 seconds with 4294168017 entries.
Progress update: 0.6
Table 7 completed in 19.79 seconds with 4293380871 entries.
Progress update: 0.7
Finalizing Table 7
Finalized Table 7 in 9.58 seconds.
Completed Phase 1 in 166.15 seconds
Progress update: 0.8
Marked Table 6 in 3.56 seconds.
Marked Table 5 in 3.25 seconds.
Marked Table 4 in 3.58 seconds.
Marked Table 3 in 3.48 seconds.
Completed Phase 2 in 13.88 seconds
Progress update: 0.9
Compressing Table 2 and 3...
STDERR: CUDA error: 700 (0x2bc) cudaErrorIllegalAddress : an illegal memory access was encountered

STDERR:

STDERR: *** Panic!!! *** Fatal Error:

STDERR: CUDA error cudaErrorIllegalAddress : an illegal memory access was encountered.

0x00007FF7273A93E2 @ ::
0x00007FF7274AED79 @ ::
0x00007FF7274CCE0D @ ::
0x00007FF7274F22CA @ ::
0x00007FF7274F1F38 @ ::
0x00007FF7274D892B @ ::
0x00007FF7274D9CEB @ ::
0x00007FF7274D0152 @ ::
0x00007FF7274D06BF @ ::
0x00007FF72738F0A8 @ ::
0x00007FF72753AFEC @ ::
0x00007FFF562D257D @ ::BaseThreadInitThunk()
0x00007FFF585CAA58 @ ::RtlUserThreadStart()

@Perk-Mew
Author

I have installed CUDA 12.2 and reinstalled the driver, but now it stops at 90 percent.
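
To compare where each run died, the last stage printed before the first STDERR line can be pulled out of a log. This is a rough sketch, not a bladebit tool: `last_stage_before_error` is a hypothetical helper, and the two samples are excerpts copied from the logs posted in this thread.

```python
def last_stage_before_error(log_text: str):
    """Return (last progress value, last stage line) seen before the first
    STDERR line, or None if the log contains no STDERR line."""
    progress = None
    stage = None
    for raw in log_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("STDERR:"):
            return progress, stage
        if line.startswith("Progress update:"):
            progress = float(line.split(":", 1)[1])
        else:
            stage = line
    return None


# Excerpt from the first log in this thread.
FIRST_RUN = """\
Table 7 completed in 21.40 seconds with 4294750028 entries.
Progress update: 0.7
Finalizing Table 7
STDERR: CUDA error: 700 (0x2bc) cudaErrorIllegalAddress : an illegal memory access was encountered
"""

# Excerpt from the second log, after the driver reinstall.
SECOND_RUN = """\
Completed Phase 2 in 13.88 seconds
Progress update: 0.9
Compressing Table 2 and 3...
STDERR: CUDA error: 700 (0x2bc) cudaErrorIllegalAddress : an illegal memory access was encountered
"""

for label, log in (("first run", FIRST_RUN), ("second run", SECOND_RUN)):
    print(label, last_stage_before_error(log))
```

On the two logs here, the first run stops while finalizing Table 7 at progress 0.7, and the second gets through Phase 1 and Phase 2 before stopping while compressing Tables 2 and 3 at progress 0.9. The error code is the same in both runs; only the stage differs.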

@Perk-Mew
Author

@harold-b I got the same problem. Did you solve it?


2 participants