SupraSeal #69

Merged: 97 commits, Aug 9, 2024

@magik6k (Collaborator) commented Jun 25, 2024

  • FFI interface
  • Reasonable build process
  • Rebase on main
  • Implement batch seal task
  • C1
  • Slot release in finalize
  • Test run
    • Dual-pipeline test run
  • `curio seal batch-checklist` command checking readiness for batch sealing
  • Add disk requirements to the `curio calc batch-cpu` command
  • Make sure the batch sealer doesn't pick up non-CC sectors
  • Auto-detect space for 1 or 2 pipelines
  • Storage capacity calc

Build:

# Note: requires that the system-wide GCC is version 11.x, or that `gcc-11`/`g++-11` is present locally

make batch

# or, for calibnet
make batch-calibnet
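A pre-build check for the GCC requirement could look like the sketch below. It is a hypothetical helper, not part of curio's Makefile, and it assumes only that `gcc -dumpversion` prints the compiler version (as `11` or `11.x.y`, depending on how the distribution built GCC).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of curio's build): checks the GCC 11
# requirement before running `make batch`.

# Print the major version reported by the given compiler command, e.g. "11".
# `-dumpversion` may print "11" or "11.4.0" depending on the distro build.
gcc_major() {
  "$1" -dumpversion 2>/dev/null | cut -d. -f1
}

# Succeed if the system gcc is 11.x, or if gcc-11 and g++-11 are both on PATH.
check_gcc11() {
  if [ "$(gcc_major gcc)" = "11" ]; then
    echo "ok: system gcc is 11.x"
    return 0
  fi
  if command -v gcc-11 >/dev/null 2>&1 && command -v g++-11 >/dev/null 2>&1; then
    echo "ok: found gcc-11 and g++-11"
    return 0
  fi
  echo "error: need GCC 11.x as system gcc, or gcc-11/g++-11 installed" >&2
  return 1
}
```

After sourcing the functions, run `check_gcc11 && make batch`.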

Usage

  1. Update all nodes in the cluster to use this branch. Use `make curio` for binaries without batch sealer support (it should be fine to use batch-sealer binaries everywhere, though), and `make batch` to build batch-capable curio on batch-sealer nodes.
  2. On the batch sealer machine, run `curio calc batch-cpu`. This tells you which batch sizes the installed CPU can support. On two-CPU systems performance might suffer significantly due to the CPU interconnect bottleneck; this hasn't been tested yet, so performance is unknown.
  3. Set up a new config layer for the batch sealer, e.g. `batch-machine1`:
    3.1. Set `Subsystems.EnableBatchSeal` to `true`
    3.2. Set `Seal.LayerNVMEDevices` to a list of NVMe devices for layer storage
    3.3. If your CPU is older, set `Seal.SingleHasherPerThread` to `true`; this is needed for Zen2 or older (Epyc 7xx2)
  4. (optional) Either stop your existing SDR sealers, or set `Subsystems.SealSDRMinTasks` on one of the layers used by single-sector SDR sealers to a value larger than the batch size, so that non-batch sealers don't steal work before enough accumulates to start a batch.
  5. (needed temporarily) Set up the NVMe disks dedicated to supraseal layer storage. This is needed on every boot; future versions of curio will automate it.
# make sure you have 36 1G huge pages
sudo sysctl -w vm.nr_hugepages=36

# (in curio repo)
cd extern/supra_seal/deps/spdk-v22.09/   
env NRHUGE=36 ./scripts/setup.sh
  6. Start the batch-sealer curio node!
  7. Attach scratch space for P2 output (sealed sectors + TreeC/TreeR; 32G+36G per sector). This is normal filesystem storage and should be NVMe-backed: ~500MB/s per GPU, plus whatever is needed for moving sectors off to long-term storage.
  8. Add a batch of CC sectors!
curio seal start --now --cc --count 32 --actor f01234 --layers cluster --duration-days 365
  9. You should see a "Batch.." task running, claimed by the machine; CPU use should look something like the screenshot below.
    [screenshot: CPU usage while the batch task runs]
  10. After 3.5 to 5 hours you should see GPU use. This is phase2 computing TreeR/TreeC, which should take ~2 minutes per sector per GPU (3090-grade).
  11. After phase2 is done, the rest of the pipeline stages should execute normally; after a sector is finalized, its block storage is released for another batch. A well-balanced setup should keep phase1 running at all times and finish phase2/waitseed/commit before the next batch finishes phase1.
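Using the phase1 and phase2 timings above, a rough balance check is simple arithmetic. The sketch below is a hypothetical helper (not a curio command); it assumes the ~2 min per sector per GPU phase2 figure and the 3.5 h fast end of phase1, and it ignores waitseed/commit time, so it only estimates whether one batch's phase2 GPU work fits inside the next batch's phase1.

```shell
#!/usr/bin/env bash
# Hypothetical phase2 (TreeC/TreeR) GPU time estimate for one batch.
# Assumed figures from the usage notes; real timings depend on hardware.

P2_MIN_PER_SECTOR=2      # minutes per sector per 3090-grade GPU (approximate)
P1_MIN_LOWER_BOUND=210   # 3.5 h, the fast end of the batch phase1 range

# p2_minutes <batch_size> <gpus>: minutes of phase2 work, rounded up
p2_minutes() {
  local batch=$1 gpus=$2
  echo $(( (batch * P2_MIN_PER_SECTOR + gpus - 1) / gpus ))
}

# p2_fits <batch_size> <gpus>: succeed if phase2 GPU time for one batch is
# shorter than the fastest expected phase1 (waitseed/commit are not modeled)
p2_fits() {
  local need
  need=$(p2_minutes "$1" "$2")
  [ "$need" -lt "$P1_MIN_LOWER_BOUND" ]
}
```

For example, `p2_minutes 32 1` prints 64 (a 32-sector batch on one GPU) and `p2_minutes 32 3` prints 22.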

Storage recommendations:

You need two sets of NVMe drives:

  • Drives for layers: these need to total 10-20M IOPS, with capacity for 11 x 32G x batchSize x pipelines. They must be raw, unformatted block devices: supraseal uses SPDK, a userspace NVMe driver, which takes them over, so once it does you won't even see those NVMe block devices in /dev.
  • Drives for P2 output: these need a filesystem; anything reasonably fast with enough capacity (~70G x batchSize x pipelines) works. This can be remote storage if you can make it fast enough (~500MiB/s per GPU).
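The capacity formulas above can be turned into a quick calculator. This is a hypothetical sketch, not the `curio calc` command; it uses only the 11 x 32G layer and ~70G P2 per-sector figures for 32G sectors, with `G` meaning the same unit as in the text.

```shell
#!/usr/bin/env bash
# Hypothetical capacity estimate from the storage recommendations.

# layer_g <batch_size> <pipelines>: raw NVMe for SPDK layer storage,
# 11 layers x 32G per sector
layer_g() {
  echo $(( 11 * 32 * $1 * $2 ))
}

# p2_g <batch_size> <pipelines>: filesystem space for P2 output,
# ~70G per sector (sealed sector + TreeC/TreeR)
p2_g() {
  echo $(( 70 * $1 * $2 ))
}
```

With two 32-sector pipelines, `layer_g 32 2` prints 22528 (about 22 TiB if G means GiB) and `p2_g 32 2` prints 4480.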

Random unsorted notes:

GPU fan speed on headless systems
xorg.conf:

Section "Device"
    Identifier  "NVIDIA GPU"
    Driver      "nvidia"
    Option      "Coolbits" "4"
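EndSection

Fan control script:

#!/bin/bash

# Start Xorg on display :0 using the xorg.conf above (nvidia-settings needs a running X server with the NVIDIA driver)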
Xorg -noreset +extension GLX +extension RANDR +extension RENDER -logfile /var/log/Xorg.log -config /etc/X11/xorg.conf :0 &

# Allow some time for Xorg to start
sleep 2

# Set DISPLAY environment variable to use Xorg
export DISPLAY=:0

# Set fan control to manual and set the fan speed
nvidia-settings -a "[gpu:0]/GPUFanControlState=1"
nvidia-settings -a "[fan:0]/GPUTargetFanSpeed=70"

# Keep Xorg running
wait

Hugepages must be configured (1G pages, minimum 36 pages for 32G sector sealing):

sudo sysctl -w vm.nr_hugepages=36

Check with `cat /proc/meminfo | grep Huge`; the output should look like:
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
FileHugePages:         0 kB
HugePages_Total:      36
HugePages_Free:       18
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:    1048576 kB
Hugetlb:        37748736 kB
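The check above can be scripted. This is a hypothetical shell sketch (not curio's own hugepage check); it assumes the default hugepage size is 1G, as in the sample output, and the 36-page minimum for 32G sectors, and it accepts a file path so it can be pointed at a saved copy of /proc/meminfo.

```shell
#!/usr/bin/env bash
# Hypothetical check that enough 1G hugepages are configured.
# Usage: check_hugepages [meminfo_path] [min_pages]

check_hugepages() {
  local meminfo=${1:-/proc/meminfo} min_pages=${2:-36}
  local total size_kb
  total=$(awk '/^HugePages_Total:/ {print $2}' "$meminfo")
  size_kb=$(awk '/^Hugepagesize:/ {print $2}' "$meminfo")

  # HugePages_Total counts pages of the default size, so it must be 1G
  if [ "${size_kb:-0}" -ne 1048576 ]; then
    echo "error: default hugepage size is ${size_kb:-unknown} kB, expected 1048576 kB (1G)" >&2
    return 1
  fi
  if [ "${total:-0}" -lt "$min_pages" ]; then
    echo "error: HugePages_Total=${total:-0}, need at least $min_pages" >&2
    return 1
  fi
  echo "ok: $total x 1G hugepages configured"
}
```

Run `check_hugepages` with no arguments to read /proc/meminfo and require 36 pages.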

@magik6k marked this pull request as draft June 25, 2024 10:56
@magik6k force-pushed the feat/supra_seal branch 13 times, most recently from a649566 to 30f7ed0 July 7, 2024 08:00
@magik6k force-pushed the feat/snap branch 6 times, most recently from 10ecd79 to aff115d July 16, 2024 14:04
Base automatically changed from feat/snap to main July 18, 2024 12:42
@magik6k force-pushed the feat/supra_seal branch 5 times, most recently from 25c2e6e to 2ee19eb July 31, 2024 11:53
Resolved review thread: commit-phase1-output (outdated)
@magik6k marked this pull request as ready for review August 2, 2024 20:54
Resolved review threads: cmd/curio/calc.go, deps/config/types.go, lib/hugepageutil/checkhuge.go, lib/paths/local.go, lib/proof/porep_vproof_vanilla.go, tasks/sealsupra/supra_config.go (3 threads, 1 outdated), tasks/sealsupra/task_supraseal.go (2 threads)
@snadrus (Contributor) commented Aug 3, 2024

Let's document this as Batch Sealing Beta in documentation/en/batchsealing.md, since we hope to 1. modernize deps, 2. get rid of SPDK, and 3. avoid needing Rust bin formats.

@snadrus (Contributor) left a comment:

Can you put your PR doc info into documentation/en/batchsealing.md so that the doc is already updated in this PR?

Resolved review threads: lib/proof/sn-comp-sector-seal/main.go, tasks/sealsupra/supra_config.go (outdated), documentation/en/supraseal.md (7 threads, 4 outdated)
@magik6k merged commit d9a219d into main Aug 9, 2024
9 checks passed
@magik6k deleted the feat/supra_seal branch August 9, 2024 19:03