
2nd Community Diligence Review of DSPA-Asia Allocator #170

Open
pandacrypto opened this issue Sep 20, 2024 · 13 comments
@pandacrypto commented Sep 20, 2024

Latest Compliance Report: https://compliance.allocator.tech/report/f03010575/1726790532/report.md
Previous Compliance Review Comments: #21

Based on the previous compliance review comments, DSPA-Asia has improved the operation of the allocator process.

  1. We have expanded the types of data clients, adding enterprise datasets in addition to public datasets, thereby enriching the types of data handled by the allocator. For details, please see: [DataCap Application] <Hantang Cloud> - <Cloud Era Cloud Monitoring> pandacrypto/DSPA-Allocator#22.

  2. The success rate of Spark retrieval has significantly improved, with many SPs achieving a success rate of over 80%.

  3. Data clients will disclose information in advance for newly added SPs.

We kindly ask the official team to review the overall operation of the DSPA-Asia Allocator. We are pleased to note significant progress in the allocator's operational process, and we will continue to improve in the following areas:

  1. Focus on the actual locations of SPs; if the information provided is incomplete, we will restrict DataCap approvals or make limited DataCap allocations while monitoring the results.
  2. Continue to improve Spark retrieval success rates.
  3. Continue to encourage more enterprise data onto the Filecoin network.

We appreciate the official team's attention to our ongoing operations and look forward to working together!

@filecoin-watchdog

Allocator Application
Compliance Report
First Review

pandacrypto/DSPA-Allocator#11
For this client, the retrieval rate could be better. The Allocator should follow up.

pandacrypto/DSPA-Allocator#8
Only 7 PiB would suffice (a single copy is 700 TiB, 10 replicas promised); why was 12 PiB requested?
5 out of 15 SPs don’t have a retrieval rate; the rest have high retrieval rates.

pandacrypto/DSPA-Allocator#6
Only 5.5-6 PiB would suffice (5 replicas of a 1.1 PiB dataset); why was 10 PiB requested?
5 out of 13 SPs don’t have a retrieval rate, 1 has 30%, and the rest have high retrieval rates.

pandacrypto/DSPA-Allocator#14
Only 2.5-3 PiB would suffice (5 replicas of a 580 TiB dataset); why was 5 PiB requested?
Why did the allocator grant 1 PiB of DC if the checker disclosed the retrieval was below expected?
5 out of 10 SPs have no retrieval rate, 1 has 30%, and the rest have high retrieval rates.

Many SPs overlap between clients. This allocator has 4 clients overall:
F02227496 - 3 clients
F02956073 - 4 clients
F03080854 - 3 clients
F03080852 - 3 clients
F03028412 - 3 clients

@pandacrypto (Author) commented Sep 27, 2024

Thank you, watchdog, for your patience in reviewing this.

  1. About the dataset size and the DataCap quota.
    There is a conversion ratio between the CAR payload and the DataCap quota; this ratio is usually 18G/32G = 0.5625. That means 1.1 PiB of real data corresponds to 1.1 PiB / 0.5625 ≈ 2 PiB of DataCap, and with 5 replicas the total is 10 PiB of DataCap.
    For 700 TiB, the corresponding figure is 700 TiB / 0.5625 ≈ 1.2 PiB, and with 10 replicas the total is 12 PiB of DataCap.
    For 580 TiB, 580 TiB / 0.5625 ≈ 1 PiB, and with 5 replicas the total is 5 PiB of DataCap.
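The arithmetic above can be sketched as a small helper. This is purely illustrative: `car_fill_ratio` is the 18/32 figure cited in this thread, not a protocol constant, and the helper name is hypothetical.

```python
def datacap_request_pib(raw_data_pib: float, replicas: int,
                        car_fill_ratio: float = 18 / 32) -> float:
    """DataCap (in PiB) implied by the ratio described above.

    car_fill_ratio is the fraction of each 32 GiB sector actually
    occupied by CAR payload (18/32 = 0.5625 in this thread's examples).
    """
    per_replica = raw_data_pib / car_fill_ratio  # sector space for one copy
    return per_replica * replicas

# The three cases discussed above:
print(round(datacap_request_pib(1.1, 5)))          # ~10 PiB
print(round(datacap_request_pib(700 / 1024, 10)))  # ~12 PiB
print(round(datacap_request_pib(580 / 1024, 5)))   # ~5 PiB
```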

  2. About some SPs’ Spark retrieval success rate showing “ - ”.
    We communicated with these SPs; they told us that Spark shows “ - ” because they use DDO (Direct Data Onboarding) mode. The Spark team is expected to ship Spark/DDO compatibility in November; for details, please check the roadmap:
    https://blog.filstation.app/posts/spark-roadmap-h2-2024

Spark shows “ - ” for SP nodes running in DDO mode.

@pandacrypto (Author)

  1. About [DataCap Application] <Human PanGenomics Project> - <HPGP-1> pandacrypto/DSPA-Allocator#14 still being granted 1 PiB of DC.
    According to the distribution rules, 2 PiB of DataCap should actually have been issued in this round. However, given the warning “⚠️ 40.00% of the storage providers have a retrieval success rate below 75%.”, we actively intervened; since the client had already passed KYC earlier, we limited the allocation to 1 PiB of DataCap to support the development of the Filecoin community network. We will continue to monitor the application and will intervene again if there is no improvement.
  2. Regarding the SP overlap situation.
    As you know, DSPA-Asia has been committed to helping CC SPs in the Asia region transition to Fil+ SPs. DSPA-Asia organized two training camps, in Hong Kong and Singapore, in March and September 2023 respectively; please see the official website for details: https://dspa-asia.io/dspa-activity

DSPA-Asia's positive influence in the Filecoin community has attracted considerable attention from SPs and data clients. When communicating with clients applying for DataCap from us, we also recommend SPs with good DSPA-Asia ratings. Since these are all different datasets, we believe this also helps the Filecoin community grow stronger.

@pandacrypto (Author) commented Sep 27, 2024

Encouragingly, the Spark retrieval success rate is very good, up from over 80% previously to over 90%! Keep up the progress! As an allocator, we will keep an eye on it as well.

@filecoin-watchdog

Referring to the first topic regarding the DC application, I understand that the divisibility of data by 18 or 32 does not always give an equal number of sectors. However, from what you say, you are filling a 32 GB sector with only 18 GB of data, which is suboptimal. We should be aiming for 100% storage utilization, not 50%.
The gov team should follow this up.

@pandacrypto (Author)

> Referring to the first topic regarding the DC application, I understand that the divisibility of data by 18 or 32 does not always give an equal number of sectors. However, from what you say, you are filling a 32 GB sector with only 18 GB of data, which is suboptimal. We should be aiming for 100% storage utilization, not 50%.
>
> The gov team should follow this up.

We currently recommend that clients use the Singularity tool developed by official team member Xinan Xu for data slicing. If the official team has a more suitable tool or guideline to recommend, we would greatly appreciate your guidance, and we are open to adopting new data-preparation tools in the future.

Thank you!

@willscott (Collaborator)

> this ratio is usually 18G/32G=0.5625

This is absolutely untrue and is not a norm we should accept. Deals should be full of data, not half full.

@willscott (Collaborator)

Singularity has the PieceSize parameter for indicating that you wish to have the data packed to fill the full sector rather than half of it.
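For context on what "full" can mean here: Filecoin applies Fr32 padding when computing a piece, so even a fully packed 32 GiB sector carries at most 127/128 of its size in raw payload (about 31.75 GiB). A minimal sketch of the comparison, assuming only that standard ratio; the helper names are illustrative, not part of any tool:

```python
FR32_RATIO = 127 / 128  # Fr32 padding: 254 data bits per 256-bit chunk

def max_payload_gib(sector_gib: float = 32.0) -> float:
    """Largest raw payload a sector's piece can carry after Fr32 padding."""
    return sector_gib * FR32_RATIO

def fill_ratio(payload_gib: float, sector_gib: float = 32.0) -> float:
    """Fraction of the sector occupied by raw payload."""
    return payload_gib / sector_gib

print(max_payload_gib())  # 31.75 -- the achievable ceiling
print(fill_ratio(18))     # 0.5625 -- the ratio used in this thread
```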

@pandacrypto (Author)

> this ratio is usually 18G/32G=0.5625
>
> This is absolutely untrue and is not a norm we should accept. Deals should be full of data, not half full.

Thank you, @willscott, for your response. As @filecoin-watchdog mentioned, even though this may not be the perfect solution, we consider it a second-best option.

DSPA-Asia, with the support of Protocol Labs, organized two bootcamps in Hong Kong and Singapore in March and September 2023, respectively. During these events, we gathered substantial data from clients and SPs and received real market feedback. We collectively agreed that achieving fully distributed storage on Filecoin will take time and further exploration of best practices. For example, data clients often struggle to find highly reputable SPs.

In terms of data storage, we believe the process will go through the following stages: CC data, SPs paying fees to find real data, free storage, paid storage, and efficient data retrieval. Each stage will have different data requirements. In our opinion, imposing high standards from the very beginning may not be the best approach.

At the same time, until a more effective tool than Xinan Xu's Singularity has been developed, we will continue using the current tool, which is highly beneficial for data slicing. Alternatively, we suggest the official team consider upgrading Singularity to achieve 1:1 data packing.

In conclusion, while the current approach may not be the optimal solution, it is at least a second-best choice. It is a pragmatic strategy to encourage more data clients to join the Filecoin network!

@willscott (Collaborator)

My response above is to the previous draft of this message.
This is a configurable parameter of Singularity, and other onboarders/pathways have been willing to change it to fully fill their sectors.

@pandacrypto (Author)

> My response above is to the previous draft of this message.
>
> This is a configurable parameter of Singularity, and other onboarders/pathways have been willing to change it to fully fill their sectors.

Thanks, @willscott. We have received your feedback.

DSPA's technical partners will research Singularity in depth. If Singularity's parameters can indeed be reconfigured, we will provide new work guidance to data clients. Please feel free to reach out with any questions. Thank you.

@galen-mcandrew (Collaborator)

Appreciate the good investigation and solutions-driven approach here. We are all hoping to make the network more efficient, useful, and effective for decentralized data storage.

We would love to see some additional developments around the data slicing and sector-sizing issues raised above. Please communicate with your clients and SPs to work towards increased optimization here, and we will continue to monitor. If you are able to develop tools that can review and report on overall 'padding', that would be a very useful metric for the community. Additionally, we would love to see increased enterprise data storage, with on-chain deal pricing as a provable metric for compliance.
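The padding metric suggested above could start from something as simple as aggregating (payload, sector) pairs across an allocator's deals. A hypothetical sketch, not an existing tool:

```python
def overall_fill(deals: list[tuple[float, float]]) -> float:
    """Aggregate fill ratio over (payload_gib, sector_gib) deal pairs."""
    payload = sum(p for p, _ in deals)
    capacity = sum(s for _, s in deals)
    return payload / capacity

# Two deals: one half-filled (18 GiB in a 32 GiB sector), one fully packed.
print(overall_fill([(18.0, 32.0), (32.0, 32.0)]))  # 0.78125
```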

Hopefully your team is able to keep these in mind, and perhaps make some additional developments, suggestions, or tools for the community.

Given the above investigation, we will request an additional 20 PiB of DataCap from RKH, to allow this allocator to show increased diligence and alignment.

@pandacrypto (Author)

Thank you @galen-mcandrew for your feedback. We will continue to improve, so please stay tuned!
