128GB mode Failed to Write Slice Error win11 only occurs on SSDs #443

Open
BrandtH22 opened this issue Nov 29, 2023 · 21 comments
Assignees: harold-b
Labels: bug (Something isn't working), compression, Plot

Comments

@BrandtH22

This issue was reported by Delerium in discord: https://discord.com/channels/1034523881404370984/1102690350218354920/1179030557271793715

Hi guys - I'm using Bladebit 3.1 on Win 11 with 128GB RAM and an 8GB Nvidia RTX 2080, and regardless of settings I keep getting "Failed to write slice on 'F://p1unsortedx-p1lpairs-p3lp-p3-lmap.tmp' with error 0". The temp drive is a local 1TB SSD. What am I doing wrong?

Command used:
bladebit_cuda -f xxxx -c xxxx -n 1 --compress 2 cudaplot --disk-128 -t1 F:/ F:/

Notes:

  • The issue occurs only when an SSD is set as the temp drive (HDDs work without issue)
  • Tried reformatting the SSD (including with multiple format types)
  • Tried both the standalone bladebit and the integrated bladebit (the integrated version reports "STDERR: Failed to write slice")
  • Tried multiple different SSDs (three Samsung Pro EVO 1 TB drives)

Full CLI of failed run:

Bladebit Chia Plotter
Version      : 3.1.0
Git Commit   : e9836f8bd963321457bc86eb5d61344bfb76dcf0
Compiled With: msvc 19.29.30152

[Global Plotting Config]
 Will create 1 plots.
 Thread count          : 16
 Warm start enabled    : false
 NUMA disabled         : false
 CPU affinity disabled : false
 Farmer public key     : f
 Pool contract address : f
 Compression Level     : 2
 Benchmark mode        : disabled

[Bladebit CUDA Plotter]
 Host RAM            : 127 GiB
 Plot checks         : disabled

Selected cuda device 0 : NVIDIA GeForce RTX 2080
 CUDA Compute Capability   : 7.5
 SM count                  : 46
 Max blocks per SM         : 16
 Max threads per SM        : 1024
 Async Engine Count        : 2
 L2 cache size             : 4.00 MB
 L2 persist cache max size : 0.00 MB
 Stack Size                : 1.00 KB
 Memory:
  Total                    : 8.00 GB
  Free                     : 6.96 GB

Allocating buffers (this may take a few seconds)...
Kernel RAM required       : 92412135120  bytes ( 88131.08  MiB or 86.07  GiB )
Intermediate RAM required : 4385218560   bytes ( 4182.07   MiB or 4.08   GiB )
Host RAM required         : 28420603904  bytes ( 27104.00  MiB or 26.47  GiB )
Total Host RAM required   : 120832739024 bytes ( 115235.08 MiB or 112.53 GiB )
GPU RAM required          : 6167756800   bytes ( 5882.03   MiB or 5.74   GiB )
Allocating buffers...
Done.

Generating plot 1 / 1: fbbb9cf468011ec5123479b0742f2dea31874c57a4f72d074a17b6b4ddc1be5d
Plot temporary file: F:/plotdone/plot-k32-c02-2023-11-28-20-36-fbbb9cf468011ec5123479b0742f2dea31874c57a4f72d074a17b6b4ddc1be5d.plot.tmp

Generating F1
Finished F1 in 12.17 seconds.
Table 2 completed in 37.99 seconds with 4294967296 entries.

Fatal Error:
Failed to write slice on 'F://p1unsortedx-p1lpairs-p3lp-p3-lmap.tmp' with error 0.

Full CLI of completed run using an HDD (attached):
chialog.txt
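
As a quick sanity check on the memory figures in the failed run's log, the short Python sketch below adds them up. It is not part of bladebit, and the variable names are illustrative only. Under the assumption that the log's "Total Host RAM required" is meant to be compared against the reported "Host RAM", the numbers suggest the total is the sum of the kernel and host figures and that it fits within the 127 GiB available, so a plain out-of-memory condition looks unlikely as the cause.

```python
# Figures copied from the failed run's log; names are illustrative only.
GIB = 1024 ** 3

kernel_ram = 92_412_135_120      # "Kernel RAM required" (bytes)
host_ram = 28_420_603_904        # "Host RAM required" (bytes)
total_reported = 120_832_739_024 # "Total Host RAM required" (bytes)
host_ram_available_gib = 127     # "Host RAM : 127 GiB"

# The reported total matches kernel + host; the intermediate figure
# from the log is not part of this sum.
total = kernel_ram + host_ram
total_gib = total / GIB

print(total == total_reported)             # True
print(f"{total_gib:.2f} GiB")              # 112.53 GiB
print(total_gib < host_ram_available_gib)  # True
```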

@BrandtH22 added the bug, Plot, and compression labels on Nov 29, 2023
@harold-b self-assigned this on Nov 30, 2023
@GetStreamlined

GetStreamlined commented Nov 30, 2023

I'm active on this, @harold-b, so if you need me to test alternative settings to root-cause this (or to try experimental releases), please just reach out. (I'm the original raiser of the issue: Delerium on Discord.)

@harold-b
Contributor

harold-b commented Nov 30, 2023

Thank you, @GetStreamlined
Do you get the same issue with the SSD if you use --no-direct-io?

It's a global option, so it should come somewhere before cudaplot.

@GetStreamlined

GetStreamlined commented Nov 30, 2023

@harold-b Sadly, same issue. Command used:

bladebit_cuda -f redacted -c redacted -n 1 --compress 5 --no-direct-io cudaplot --disk-128 -t1 F:/ F:/

Output:

Bladebit Chia Plotter
Version      : 3.1.0
Git Commit   : e9836f8bd963321457bc86eb5d61344bfb76dcf0
Compiled With: msvc 19.29.30152

[Global Plotting Config]
 Will create 1 plots.
 Thread count          : 16
 Warm start enabled    : false
 NUMA disabled         : false
 CPU affinity disabled : false
 Farmer public key     : redacted
 Pool contract address : redacted
 Compression Level     : 5
 Benchmark mode        : disabled

[Bladebit CUDA Plotter]
 Host RAM            : 127 GiB
 Plot checks         : disabled

Selected cuda device 0 : NVIDIA GeForce RTX 2080
 CUDA Compute Capability   : 7.5
 SM count                  : 46
 Max blocks per SM         : 16
 Max threads per SM        : 1024
 Async Engine Count        : 2
 L2 cache size             : 4.00 MB
 L2 persist cache max size : 0.00 MB
 Stack Size                : 1.00 KB
 Memory:
  Total                    : 8.00 GB
  Free                     : 6.96 GB

Allocating buffers (this may take a few seconds)...
Kernel RAM required       : 92412135120  bytes ( 88131.08  MiB or 86.07  GiB )
Intermediate RAM required : 4385218560   bytes ( 4182.07   MiB or 4.08   GiB )
Host RAM required         : 28420603904  bytes ( 27104.00  MiB or 26.47  GiB )
Total Host RAM required   : 120832739024 bytes ( 115235.08 MiB or 112.53 GiB )
GPU RAM required          : 6167756800   bytes ( 5882.03   MiB or 5.74   GiB )
Allocating buffers...
Done.

Generating plot 1 / 1: bf98e067348b10a1c3e431deea13573e25606eb1a3a5404ac45cfcf004c1b101
Plot temporary file: F:/plot-k32-c05-2023-11-30-23-10-bf98e067348b10a1c3e431deea13573e25606eb1a3a5404ac45cfcf004c1b101.plot.tmp

Generating F1
Finished F1 in 12.39 seconds.
Table 2 completed in 36.91 seconds with 4294967296 entries.

Fatal Error:
Failed to write slice on 'F://p1unsortedx-p1lpairs-p3lp-p3-lmap.tmp' with error 0.

@GetStreamlined

Also conducted an iotest:

C:\Chia\Chia_Plotting\Plotting>bladebit_cuda iotest F:/
Size   : 4096.00 MiB
Cache  : 0.00 MiB
Threads: 1
Passes : 1
Performing test with file F:/
Allocating buffer...

Writing...
Wrote 4096.00 MiB in 2.03 seconds @ 2016.74 MiB/s (1.97 GiB/s) or 2115 MB/s (2.11 GB/s).

Reading...
Read 4096.00 MiB in 1.49 seconds @ 2758.25 MiB/s (2.69 GiB/s) or 2892 MB/s (2.89 GB/s)

@GetStreamlined

I also switched the video card to an NVIDIA GeForce GTX 1660 Ti to do more troubleshooting. Sadly, same output.

Bladebit Chia Plotter
Version      : 3.1.0
Git Commit   : e9836f8bd963321457bc86eb5d61344bfb76dcf0
Compiled With: msvc 19.29.30152

[Global Plotting Config]
 Will create 1 plots.
 Thread count          : 16
 Warm start enabled    : false
 NUMA disabled         : false
 CPU affinity disabled : false
 Farmer public key     : xxxx
 Pool contract address : xxxx
 Compression Level     : 5
 Benchmark mode        : disabled

[Bladebit CUDA Plotter]
 Host RAM            : 127 GiB
 Plot checks         : disabled

Selected cuda device 0 : NVIDIA GeForce GTX 1660 Ti
 CUDA Compute Capability   : 7.5
 SM count                  : 24
 Max blocks per SM         : 16
 Max threads per SM        : 1024
 Async Engine Count        : 2
 L2 cache size             : 1.50 MB
 L2 persist cache max size : 0.00 MB
 Stack Size                : 1.00 KB
 Memory:
  Total                    : 6.00 GB
  Free                     : 5.02 GB

Allocating buffers (this may take a few seconds)...
Kernel RAM required       : 92412135120  bytes ( 88131.08  MiB or 86.07  GiB )
Intermediate RAM required : 4385218560   bytes ( 4182.07   MiB or 4.08   GiB )
Host RAM required         : 28420603904  bytes ( 27104.00  MiB or 26.47  GiB )
Total Host RAM required   : 120832739024 bytes ( 115235.08 MiB or 112.53 GiB )
GPU RAM required          : 6167756800   bytes ( 5882.03   MiB or 5.74   GiB )
Allocating buffers...
Done.

Generating plot 1 / 1: e49016c42914b4a4f527bdd2abaf6817e7f344acce768e4cd0a09e257c4c3ae0
Plot temporary file: F:/plot-k32-c05-2023-12-01-16-59-e49016c42914b4a4f527bdd2abaf6817e7f344acce768e4cd0a09e257c4c3ae0.plot.tmp

Generating F1
Finished F1 in 13.62 seconds.
Table 2 completed in 79.00 seconds with 4294938662 entries.

Fatal Error:
Failed to write slice on 'F://p1unsortedx-p1lpairs-p3lp-p3-lmap.tmp' with error 0.

@teamwest93

You run Terminal as Admin?

@GetStreamlined

GetStreamlined commented Dec 1, 2023

> You run Terminal as Admin?

I did, yes, and also tried without.

@GetStreamlined

Additionally tried in PowerShell (with and without administrator rights). Same issue.

@teamwest93

What about the beta1 or rc1 versions?

@GetStreamlined

> What about the beta1 or rc1 versions?

Sadly, they give a slightly different error (Failed to open plot file with error: 3).

@GetStreamlined

@harold-b, is there any update on this issue? I'm keen to get plotting, as I don't want to go to Gigahorse.

@harold-b
Contributor

I wonder if this is related to block size. Would you mind running diskplot on those target SSDs to see what block size bladebit reports? You don't have to make a plot; it should just report the block size for the temp directories.

@GetStreamlined

GetStreamlined commented Dec 17, 2023

@harold-b

Here is the result of the diskplot using the SSD:

[Bladebit Disk Plotter]
 Heap size      : 3.37 GiB ( 3452.88 MiB )
 Cache size     : 0.00 GiB ( 0.00 MiB )
 Bucket count   : 256
 Alternating I/O: false
 F1  threads    : 16
 FP  threads    : 16
 C   threads    : 16
 P2  threads    : 16
 P3  threads    : 16
 I/O threads    : 1
 Temp1 block sz : 16384
 Temp2 block sz : 16384
 Temp1 path     : F:/
 Temp2 path     : F:/
 I/O metrices enabled.
 Allocating memory

If I use the HDD instead, it's different:

Temp1 block sz : 4096

@harold-b
Copy link
Contributor

Thanks for the info!
So it does look like it is block-size related. As a workaround for the time being, you can try resetting the SSDs with a 4K block size while this is resolved.
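
To illustrate one way a block size can matter, here is a minimal Python sketch. It is not bladebit's code, and the thread does not show the actual cause. It assumes, purely for illustration, that a plotter sizes its writes in whole blocks: a size that is a multiple of the HDD's 4096-byte block is not necessarily a multiple of the SSD's 16384-byte block. The helper names and the slice size are hypothetical.

```python
def is_block_aligned(size: int, block_size: int) -> bool:
    """Return True if size is a whole number of blocks."""
    return size % block_size == 0


def round_up_to_block(size: int, block_size: int) -> int:
    """Round size up to the next multiple of block_size."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return (size + block_size - 1) // block_size * block_size


HDD_BLOCK = 4096    # "Temp1 block sz" reported for the HDD
SSD_BLOCK = 16384   # "Temp1 block sz" reported for the SSD

# Hypothetical slice size: five 4 KiB blocks.
slice_size = 5 * 4096

print(is_block_aligned(slice_size, HDD_BLOCK))   # True
print(is_block_aligned(slice_size, SSD_BLOCK))   # False
print(round_up_to_block(slice_size, SSD_BLOCK))  # 32768
```

Code that implicitly assumes 4096-byte blocks would pass the first check on the HDD and fail the second on the SSD, which fits the HDD-works, SSD-fails pattern reported here, though that remains an assumption about the underlying bug.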

@GetStreamlined

> Thanks for the info! So it does look like it is block-size related. As a workaround for the time being you can try resetting the SSDs w/ 4k block size while this is resolved

From my research, you cannot change the block size on Samsung Pro EVO SSDs, so it looks like I'm stuck waiting for a resolution :(

@GetStreamlined

Hi @harold-b - hope you had a lovely Xmas and New Year. Do you have a rough timescale of when this will be resolved please?

James

@harold-b
Contributor

I've started work on bladebit again this week. I don't have a timeframe, but hopefully this one won't take long since we know exactly where the issue lies. I certainly haven't forgotten about you.

@GetStreamlined

@harold-b fantastic! If you need me to test a beta release let me know :)

@sonosergio

I have the same problem ...

@GetStreamlined

> I have the same problem ...

I gave up waiting so I tried Gigahorse. No problem there.

@piotr-nowicki

Same here. Probably it's better to switch to something else.
