bladebit-cuda-v3.1.0-windows-x86-64 very slowly #448

Open
valentosnik opened this issue Dec 28, 2023 · 12 comments

Comments

@valentosnik

valentosnik commented Dec 28, 2023

With this command line:
.\bladebit_cuda -f xch -c xch -z 7 -n 90 -w cudaplot --disk-128 -t1 Z:\Tmp\ -t2 Z:\Tmp\ Q:\NFT\
one plot takes up to 50 minutes.
System: Windows 10 Pro, 128 GB RAM, Ryzen 7 5800X, RTX 3070, Z:\ is a Gen 4 NVMe drive.

What am I doing wrong?

Generating plot 14 / 90: a87ce0887756f8dcda26bb50dd57fc1928ce6dba1e6d6c7522b873a3ffe5912a
Plot temporary file: Q:\NFT\plot-k32-c07-2023-12-28-09-40-a87ce0887756f8dcda26bb50dd57fc1928ce6dba1e6d6c7522b873a3ffe5912a.plot.tmp

Generating F1
Finished F1 in 31.75 seconds.
Table 2 completed in 74.02 seconds with 4294890872 entries.
Table 3 completed in 426.05 seconds with 4294837918 entries.
Table 4 completed in 470.47 seconds with 4294781997 entries.
Table 5 completed in 353.66 seconds with 4294636582 entries.
Table 6 completed in 554.12 seconds with 4294187494 entries.
Table 7 completed in 205.78 seconds with 4293486129 entries.
Finalizing Table 7
Finalized Table 7 in 10.29 seconds.
Completed Phase 1 in 2132.26 seconds
Marked Table 6 in 11.01 seconds.
Marked Table 5 in 25.55 seconds.
Marked Table 4 in 18.59 seconds.
Marked Table 3 in 22.13 seconds.
Completed Phase 2 in 77.28 seconds
Compressing Table 2 and 3...
Step 1 completed step in 144.55 seconds.
Step 2 completed step in 14.79 seconds.
Completed table 2 in 159.34 seconds with 3439742752 / 4294837918 entries ( 80.09% ).
Compressing tables 3 and 4...
Step 1 completed step in 233.54 seconds.
Step 2 completed step in 30.60 seconds.
Step 3 completed step in 67.32 seconds.
Completed table 3 in 331.46 seconds with 3465825632 / 4294781997 entries ( 80.70% ).
Compressing tables 4 and 5...
Step 1 completed step in 37.38 seconds.
Step 2 completed step in 16.94 seconds.
Step 3 completed step in 101.94 seconds.
Completed table 4 in 156.28 seconds with 3532540674 / 4294636582 entries ( 82.25% ).
Compressing tables 5 and 6...
Step 1 completed step in 20.95 seconds.
Step 2 completed step in 16.97 seconds.
Step 3 completed step in 116.36 seconds.
Completed table 5 in 154.29 seconds with 3712840674 / 4294187494 entries ( 86.46% ).
Compressing tables 6 and 7...
Step 1 completed step in 61.23 seconds.
Step 2 completed step in 47.44 seconds.
Step 3 completed step in 203.17 seconds.
Completed table 6 in 311.86 seconds with 4293486129 / 4293486129 entries ( 100.00% ).
Serializing P7 entries
Completed serializing P7 entries in 42.87 seconds.
Completed Phase 3 in 1156.13 seconds
Completed Plot 1 in 3365.67 seconds ( 56.09 minutes )
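
To see where the time goes in a log like the one above, the phase totals can be pulled out with a few lines of Python. This is only a sketch: the function name `phase_breakdown` is made up for illustration, and the regular expression assumes the "Completed Phase N in X seconds" wording shown in this output.

```python
import re

# Matches lines such as "Completed Phase 1 in 2132.26 seconds".
PHASE_RE = re.compile(r"Completed Phase (\d+) in ([\d.]+) seconds")

def phase_breakdown(log_text):
    """Return {phase_number: (seconds, share_of_total)} for a bladebit log."""
    seconds = {int(n): float(s) for n, s in PHASE_RE.findall(log_text)}
    total = sum(seconds.values())
    return {n: (s, s / total) for n, s in seconds.items()}

# Phase lines copied from the log above.
sample = """
Completed Phase 1 in 2132.26 seconds
Completed Phase 2 in 77.28 seconds
Completed Phase 3 in 1156.13 seconds
"""

for phase, (secs, share) in sorted(phase_breakdown(sample).items()):
    print(f"Phase {phase}: {secs:8.2f} s  ({share:.0%} of plot time)")
```

For this run, Phase 1 accounts for roughly 63% of the plot time and Phase 3 for about 34%, so those two phases are where a bottleneck would have to be looked for.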

@rhcompany1337

Assuming your Q: drive is a slow HDD, that is your problem right there.
Bladebit writes parts of the plot directly to the final drive while the plot is being generated, so the job often has to wait for your Q: drive to finish writing.

To prevent this, use another fast SSD, or the same SSD as your temp drive, as the final directory.
Then create another job or script to move the finished plots to the final drive.
You can use plow (not working on Windows at the moment), robocopy, etc. to move the plots. You will then face the problem that you generate plots faster than one HDD can write.
Kind regards
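
The "plot to a fast SSD, then move" workflow described above could be sketched in Python roughly as follows. This is an illustration, not a replacement for plow or robocopy: the function name `move_finished_plots`, the ".moving" suffix and the example paths are assumptions, and it relies only on the behaviour visible in bladebit's log output in this thread, namely that a plot is written as a ".plot.tmp" file and renamed to ".plot" once it is complete.

```python
import shutil
from pathlib import Path

def move_finished_plots(staging_dir, final_dir):
    """Move completed *.plot files from staging_dir to final_dir.

    Files still being written end in ".plot.tmp" and are left alone.
    Returns the list of destination paths.
    """
    staging, final = Path(staging_dir), Path(final_dir)
    final.mkdir(parents=True, exist_ok=True)
    moved = []
    for plot in sorted(staging.glob("*.plot")):
        # Move under a temporary name first, so nothing scanning the
        # final directory sees a half-copied file ending in ".plot".
        partial = final / (plot.name + ".moving")
        shutil.move(str(plot), str(partial))
        target = final / plot.name
        partial.rename(target)
        moved.append(target)
    return moved

# Hypothetical usage: fast SSD as bladebit's output, slow HDD as final home.
# move_finished_plots(r"Z:\NFT", r"Q:\NFT")
```

Run in a loop or from a scheduled task, this keeps the plotter from waiting on the HDD, although a single HDD may still not keep up with the plotter over time.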

@valentosnik
Author

I have already tried that and do not see a big difference, maybe 5 minutes less. I can also observe that the fast SSD is used at very low speed. Could it be that Windows or the code is limiting the speed? With the Gigahorse plotter it takes only 7 minutes on the same hardware...

@rhcompany1337

I remember I had some issues with the short versions of the parameters. I think some didn't work, so I mostly used the long versions.
For example, on Windows (PowerShell):
./bladebit_cuda.exe -f xch -c xch --threads 14 -n 2 --compress 3 cudaplot --disk-128 -t1 G:\temp D:\plots

After -t1, the first drive is the temp drive and the second drive is the final drive (use the same or a different fast SSD here).
But it's all in one parameter.

Give it a try.

@valentosnik
Author

Still the same. What I can observe: Gigahorse uses common RAM (up to 60 GB) and bladebit does not. Only RAM use goes high...

@rhcompany1337

rhcompany1337 commented Jan 3, 2024

Can you post the terminal output that comes before the output you posted? It might give some extra information.
I mean the output at the start, before the first plot is created.

@jeffmiao2016

Me too. I have tried 256 GB of RAM and 128 GB + NVMe; the speed is the same.
System: Windows 11

Bladebit Chia Plotter
Version : 3.1.0
Git Commit : e9836f8
Compiled With: msvc 19.29.30152

[Global Plotting Config]
Will create 1 plots.
Thread count : 80
Warm start enabled : false
NUMA disabled : false
CPU affinity disabled : false
Farmer public key :
Pool contract address :
Compression Level : 7
Benchmark mode : disabled

[Bladebit CUDA Plotter]
Host RAM : 382 GiB
Plot checks : disabled

Selected cuda device 0 : Tesla P4
CUDA Compute Capability : 6.1
SM count : 20
Max blocks per SM : 32
Max threads per SM : 2048
Async Engine Count : 1
L2 cache size : 2.00 MB
L2 persist cache max size : 0.00 MB
Stack Size : 1.00 KB
Memory:
Total : 8.00 GB
Free : 7.01 GB

Allocating buffers (this may take a few seconds)...
Kernel RAM required : 91955994624 bytes ( 87696.07 MiB or 85.64 GiB )
Intermediate RAM required : 4378927104 bytes ( 4176.07 MiB or 4.08 GiB )
Host RAM required : 142270791680 bytes ( 135680.00 MiB or 132.50 GiB )
Total Host RAM required : 234226786304 bytes ( 223376.07 MiB or 218.14 GiB )
GPU RAM required : 6163050496 bytes ( 5877.54 MiB or 5.74 GiB )
Allocating buffers...
Done.

Generating plot 1 / 1: 841186af4a31f234ea83d3546801c33ff0ed28ef62262754cb6ddb1acdce7d39
Plot temporary file: H:\plot-k32-c07-2024-01-04-22-59-841186af4a31f234ea83d3546801c33ff0ed28ef62262754cb6ddb1acdce7d39.plot.tmp

Generating F1
Progress update: 0.01
Finished F1 in 14.12 seconds.
Progress update: 0.1
Table 2 completed in 53.67 seconds with 4294944749 entries.
Progress update: 0.2
Table 3 completed in 88.48 seconds with 4294923861 entries.
Progress update: 0.3
Table 4 completed in 107.92 seconds with 4294720598 entries.
Progress update: 0.4
Table 5 completed in 107.76 seconds with 4294299996 entries.
Progress update: 0.5
Table 6 completed in 93.75 seconds with 4293420175 entries.
Progress update: 0.6
Table 7 completed in 70.86 seconds with 4291904047 entries.
Progress update: 0.7
Finalizing Table 7
Finalized Table 7 in 30.77 seconds.
Completed Phase 1 in 568.55 seconds
Progress update: 0.8
Marked Table 6 in 20.98 seconds.
Marked Table 5 in 18.33 seconds.
Marked Table 4 in 17.62 seconds.
Marked Table 3 in 18.09 seconds.
Completed Phase 2 in 75.02 seconds
Progress update: 0.9
Compressing Table 2 and 3...
Step 1 completed step in 21.54 seconds.
Step 2 completed step in 30.81 seconds.
Completed table 2 in 52.35 seconds with 3439716041 / 4294923861 entries ( 80.09% ).
Compressing tables 3 and 4...
Step 1 completed step in 19.71 seconds.
Step 2 completed step in 39.19 seconds.
Step 3 completed step in 38.05 seconds.
Completed table 3 in 96.95 seconds with 3465652916 / 4294720598 entries ( 80.70% ).
Compressing tables 4 and 5...
Step 1 completed step in 20.05 seconds.
Step 2 completed step in 39.52 seconds.
Step 3 completed step in 38.39 seconds.
Completed table 4 in 97.95 seconds with 3532022496 / 4294299996 entries ( 82.25% ).
Compressing tables 5 and 6...
Step 1 completed step in 20.42 seconds.
Step 2 completed step in 40.77 seconds.
Step 3 completed step in 40.19 seconds.
Completed table 5 in 101.38 seconds with 3711947644 / 4293420175 entries ( 86.46% ).
Compressing tables 6 and 7...
Step 1 completed step in 20.46 seconds.
Step 2 completed step in 44.67 seconds.
Step 3 completed step in 48.47 seconds.
Completed table 6 in 113.61 seconds with 4291904047 / 4291904047 entries ( 100.00% ).
Serializing P7 entries
Completed serializing P7 entries in 27.39 seconds.
Completed Phase 3 in 489.64 seconds
Progress update: 0.95
Completed Plot 1 in 1133.21 seconds ( 18.89 minutes )

H:\plot-k32-c07-2024-01-04-22-59-841186af4a31f234ea83d3546801c33ff0ed28ef62262754cb6ddb1acdce7d39.plot.tmp -> H:\plot-k32-c07-2024-01-04-22-59-841186af4a31f234ea83d3546801c33ff0ed28ef62262754cb6ddb1acdce7d39.plot
Completed writing plot in 0.07 seconds
Final plot table pointers:
Table 1: 0 ( 0x0000000000000000 )
Table 2: 1289294040 ( 0x000000004cd910d8 )
Table 3: 5068279290 ( 0x000000012e17cdfa )
Table 4: 19155960840 ( 0x0000000475c8c408 )
Table 5: 33513430665 ( 0x00000007cd8e5e89 )
Table 6: 48602285040 ( 0x0000000b50ec03f0 )
Table 7: 66048629565 ( 0x0000000f60ce1b3d )
C 1 : 4096 ( 0x0000000000001000 )
C 2 : 1720864 ( 0x00000000001a4220 )
C 3 : 1721040 ( 0x00000000001a42d0 )

Final plot table sizes:
Table 1: 0.00 MiB
Table 2: 3603.92 MiB
Table 3: 13435.06 MiB
Table 4: 13692.35 MiB
Table 5: 14389.85 MiB
Table 6: 16638.13 MiB
Table 7: 16883.96 MiB
C 1 : 1.64 MiB
C 2 : 0.00 MiB
C 3 : 1227.93 MiB

@valentosnik
Author

With this command: .\bladebit_cuda.exe -f xch -c xch --threads 14 -n 1 --compress 3 cudaplot --disk-128 -t1 Z:\TMP\ Z:\NFT\

Bladebit Chia Plotter
Version : 3.1.0
Git Commit : e9836f8
Compiled With: msvc 19.29.30152

[Global Plotting Config]
Will create 1 plots.
Thread count : 14
Warm start enabled : false
NUMA disabled : false
CPU affinity disabled : false
Farmer public key : xch
Pool contract address : xch
Compression Level : 3
Benchmark mode : disabled

[Bladebit CUDA Plotter]
Host RAM : 127 GiB
Plot checks : disabled

Selected cuda device 0 : NVIDIA GeForce RTX 3070
CUDA Compute Capability : 8.6
SM count : 46
Max blocks per SM : 16
Max threads per SM : 1536
Async Engine Count : 5
L2 cache size : 4.00 MB
L2 persist cache max size : 3.00 MB
Stack Size : 1.00 KB
Memory:
Total : 8.00 GB
Free : 6.93 GB

Allocating buffers (this may take a few seconds)...
Kernel RAM required : 92405843664 bytes ( 88125.08 MiB or 86.06 GiB )
Intermediate RAM required : 4378927104 bytes ( 4176.07 MiB or 4.08 GiB )
Host RAM required : 28420603904 bytes ( 27104.00 MiB or 26.47 GiB )
Total Host RAM required : 120826447568 bytes ( 115229.08 MiB or 112.53 GiB )
GPU RAM required : 6163857408 bytes ( 5878.31 MiB or 5.74 GiB )
Allocating buffers...
Done.

Generating plot 1 / 1: 86f5af3f8c8fd54db8626565b11fb072f47f9d5ec412b37208094a4612d7528e
Plot temporary file: Z:\NFT\plot-k32-c03-2024-01-07-17-52-86f5af3f8c8fd54db8626565b11fb072f47f9d5ec412b37208094a4612d7528e.plot.tmp

Generating F1
Finished F1 in 5.99 seconds.
Table 2 completed in 119.34 seconds with 4294959390 entries.
Table 3 completed in 338.96 seconds with 4294941788 entries.
Table 4 completed in 358.90 seconds with 4294967296 entries.
Table 5 completed in 334.62 seconds with 4294912943 entries.
Table 6 completed in 373.58 seconds with 4294905795 entries.
Table 7 completed in 235.77 seconds with 4294730296 entries.
Finalizing Table 7
Finalized Table 7 in 93.18 seconds.
Completed Phase 1 in 1862.40 seconds
Marked Table 6 in 26.66 seconds.
Marked Table 5 in 20.98 seconds.
Marked Table 4 in 10.01 seconds.
Marked Table 3 in 10.71 seconds.
Completed Phase 2 in 68.36 seconds
Compressing Table 2 and 3...
Step 1 completed step in 85.92 seconds.
Step 2 completed step in 38.15 seconds.
Completed table 2 in 124.06 seconds with 3439892460 / 4294941788 entries ( 80.09% ).
Compressing tables 3 and 4...
Step 1 completed step in 209.68 seconds.
Step 2 completed step in 72.98 seconds.
Step 3 completed step in 48.50 seconds.
Completed table 3 in 331.17 seconds with 3466118706 / 4294967296 entries ( 80.70% ).
Compressing tables 4 and 5...
Step 1 completed step in 32.51 seconds.
Step 2 completed step in 24.81 seconds.
Step 3 completed step in 25.57 seconds.
Completed table 4 in 82.89 seconds with 3532951205 / 4294912943 entries ( 82.26% ).
Compressing tables 5 and 6...
Step 1 completed step in 29.30 seconds.
Step 2 completed step in 24.53 seconds.
Step 3 completed step in 41.85 seconds.
Completed table 5 in 95.68 seconds with 3713619268 / 4294905795 entries ( 86.47% ).
Compressing tables 6 and 7...
Step 1 completed step in 35.02 seconds.
Step 2 completed step in 27.61 seconds.
Step 3 completed step in 63.85 seconds.
Completed table 6 in 126.48 seconds with 4294730296 / 4294730296 entries ( 100.00% ).
Serializing P7 entries
Completed serializing P7 entries in 9.17 seconds.
Completed Phase 3 in 769.49 seconds
Completed Plot 1 in 2700.26 seconds ( 45.00 minutes )

Z:\NFT\plot-k32-c03-2024-01-07-17-52-86f5af3f8c8fd54db8626565b11fb072f47f9d5ec412b37208094a4612d7528e.plot.tmp -> Z:\NFT\plot-k32-c03-2024-01-07-17-52-86f5af3f8c8fd54db8626565b11fb072f47f9d5ec412b37208094a4612d7528e.plot
Completed writing plot in 39.16 seconds
Final plot table pointers:
Table 1: 0 ( 0x0000000000000000 )
Table 2: 1290144172 ( 0x000000004ce609ac )
Table 3: 11959185692 ( 0x00000002c8d2b11c )
Table 4: 26048757017 ( 0x0000000610a07d19 )
Table 5: 40409998067 ( 0x00000009689fa2f3 )
Table 6: 55505645642 ( 0x0000000cec64f04a )
Table 7: 72963478667 ( 0x00000010fcf6548b )
C 1 : 4096 ( 0x0000000000001000 )
C 2 : 1721996 ( 0x00000000001a468c )
C 3 : 1722172 ( 0x00000000001a473c )

Final plot table sizes:
Table 1: 0.00 MiB
Table 2: 10174.79 MiB
Table 3: 13436.86 MiB
Table 4: 13695.95 MiB
Table 5: 14396.33 MiB
Table 6: 16649.09 MiB
Table 7: 16895.07 MiB
C 1 : 1.64 MiB
C 2 : 0.00 MiB
C 3 : 1228.73 MiB
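
One figure that can be read off the output above is how much host RAM the plotter asks for relative to what is installed. The sketch below only does that arithmetic with the byte counts from this log; the helper name `ram_headroom_gib` is made up for illustration, and whether the small headroom has anything to do with the slow times is not established in this thread.

```python
GIB = 1024 ** 3

def ram_headroom_gib(required_bytes, installed_gib):
    """Return installed RAM minus required RAM, in GiB."""
    return installed_gib - required_bytes / GIB

# Values copied from the output above.
kernel_ram = 92405843664       # "Kernel RAM required"
host_ram = 28420603904         # "Host RAM required"
total_required = 120826447568  # "Total Host RAM required"
installed_gib = 127            # "Host RAM : 127 GiB"

# In this log the total equals kernel RAM plus host RAM.
assert kernel_ram + host_ram == total_required

print(f"required: {total_required / GIB:.2f} GiB")
print(f"headroom: {ram_headroom_gib(total_required, installed_gib):.2f} GiB")
```

With these numbers the plotter requests about 112.5 GiB of the 127 GiB reported, which leaves roughly 14.5 GiB for Windows and everything else.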

@rhcompany1337

I don't see anything obvious other than your times being much too high.
So here are some obvious things to check:

Is the Nvidia driver up to date? I know there were issues with older drivers and bladebit. Worth a check!

Check your NVMe SSD speed, e.g. with CrystalDiskMark. It's very odd to me that the step "Completed writing plot" took 39.16 seconds for you.
That write/copy is on the same SSD, so it is only a pointer change in your file system. It should take about a second.

If you run bladebit in PowerShell (my suggestion), you should run PowerShell with admin rights.

Also odd, but probably nothing: I don't use the trailing backslash in the output path.
Yours:
Z:\NFT\
Mine:
Z:\NFT

It might also help to open Windows Resource Monitor while plotting to locate the bottleneck.
Have a look at CPU, GPU and SSD usage.

The second timed entry, "Table 2 completed in", should be something like 20 seconds or less.
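
For the SSD speed check suggested above, a rough sequential write test can also be scripted. The following is a minimal sketch, not a substitute for a proper benchmark tool: the function name `sequential_write_mb_per_s` and its default sizes are assumptions, it uses ordinary buffered writes followed by a sync, and the result can still be influenced by operating system and drive caching.

```python
import os
import time
import tempfile

def sequential_write_mb_per_s(directory, total_mb=1024, chunk_mb=16):
    """Write about total_mb of data to a temp file in directory; return MB/s."""
    chunk = os.urandom(chunk_mb * 1024 * 1024)
    n_chunks = total_mb // chunk_mb
    fd, path = tempfile.mkstemp(dir=directory, suffix=".bench")
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(n_chunks):
                f.write(chunk)
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the data to the drive
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)  # always clean up the benchmark file
    return n_chunks * chunk_mb / elapsed

# Hypothetical usage on the temp drive from this thread:
# print(sequential_write_mb_per_s(r"Z:\TMP"))
```

A result far below what the drive normally delivers would point at the drive or the system rather than at the plotter.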

@rhcompany1337

Oh, and a warning: check your plots when they are done. Deep-check them. Don't settle for the default 30 checks; go for 100 or 200 checks. I had so many bad plots in an earlier version. They would all pass the 30 checks, but at 200 they turned out to be faulty.

@valentosnik
Author

Thank you for the ideas.
Hmmm...
I use PowerShell with admin rights. I tried without the backslash: same result. The graphics driver is one of the latest. NVMe writes are very slow, around 300 MB/s. The same hardware with the Gigahorse plotter reaches up to 4 GB/s. No idea. Bladebit is very slow on my system. I will try it on another PC with Linux...

@LeroyINC

LeroyINC commented Jan 8, 2024

I think there is sometimes an issue with slow writes to NVMe disks. Have you tried turning off direct I/O?
That is a command-line switch for bladebit itself, to add to your bladebit command.

@valentosnik
Author

.\bladebit_cuda.exe -f xch -c xch --no-direct-io --threads 14 -n 1000 --compress 3 cudaplot --disk-128 -t1 Z:\TMP Z:\NFT
same shit with no direct io...
