feat: w3compute protocol #110

Open · wants to merge 3 commits into base: main
4 changes: 4 additions & 0 deletions .github/workflows/words-to-ignore.txt
@@ -156,3 +156,7 @@ AccountA
AgentA
IssuerA
AudiencePrincipal
PieceCID
PieceCIDv2
FR32
sha256-trunc254-padded
125 changes: 125 additions & 0 deletions w3-compute.md
@@ -0,0 +1,125 @@
# W3 Compute Protocol

![status:draft](https://img.shields.io/badge/status-draft-yellow.svg?style=flat-square)

## Editors

- [Vasco Santos], [Protocol Labs]

## Authors

- [Vasco Santos], [Protocol Labs]

# Abstract

This spec describes a [UCAN] protocol allowing an implementer to perform simple computations over data on behalf of an issuer.

## Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119](https://datatracker.ietf.org/doc/html/rfc2119).

# Table of Contents

- [Introduction](#introduction)
- [Capabilities](#capabilities)
- [`compute/` namespace](#compute-namespace)
- [`compute/*`](#compute)
- [`compute/piececid`](#computepiececid)
- [Schema](#schema)
- [`compute/piececid` schema](#computepiececid-schema)

# Introduction

Within the w3up protocol flows, some computations MUST be performed over data. These cases range from computing proofs and data indexes to verifying computations offered by the client.

The `w3-compute` protocol aims to enable clients to delegate work to compute services, as well as to let the `w3-up` platform hire third-party compute services to verify client-offered computations when desirable.

Note that the discovery process by actors looking for services providing given computations is for now out of scope of this spec.
**Collaborator:**
I have been trying to conceptualize the w3 space as a namespace where you can install various capabilities through subscriptions. Think of it as installing software on your machine: it gets to read/write data on disk and use computational resources, and you get to use the software and pay either a subscription or the one-time fee it was sold for.

In this conceptual model, `provider/add` is the way to install a service providing a set of capabilities, and once it is installed, invoked capabilities get handled by that service. More details are here: https://github.com/web3-storage/specs/blob/main/w3-provider.md#provider-add

One subtle nuance here is that the invocation audience is meant to be the service provider DID that will handle the invoked capability; that way you could have multiple services providing the same capabilities installed.

I think it would be nice if we could reconcile this proposal with that conceptual model, in which case piece CID compute would be a capability provided by a service that could be installed in your space. It might also be interesting to consider a version where you don't have to install a provider, but instead delegate them access to a resource in your space so they can run compute over it; that would probably require a bit more thought.

**Contributor Author:**

Good context. While what you say may make sense, I am not sure whether this would be the use case. I think that would make sense in the context of paying to compute block-level indexes, or whatever else you would like as a user. In this case I think it is different, because it can even be a service provider wanting to run it for validation, or the w3filecoin pipeline deciding whether it trusts a user-computed piece. Therefore it is not in direct contact with the user, or with the space where something runs.

With the above, I don't know how we should proceed to accommodate both angles. It is probably too early to have this discussion, and we should just compute pieces out of band until we have this.


# Capabilities

## `compute/` namespace

The `compute/` namespace contains capabilities relating to computations.

## `compute/*`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure a compute namespace in itself makes sense. It is clear that, unlike many other capabilities, this is pure computation with no side effects, yet I am not sure a namespace like this makes sense.

I would personally have added the capability somewhere in the filecoin namespace and said that various providers could implement it.

**Contributor Author:**

Fair point, I actually started there. But then I was thinking more along the lines of having services that do compute specced together, and I am starting to see a lot of specificities with each type of computation that may make that difficult.


> Delegate all capabilities in the `compute/` namespace

The `compute/*` capability is the "top" capability of the `compute/` namespace. `compute/*` can be delegated to a user agent, but cannot be invoked directly. Instead, it allows the agent to derive any capability in the `compute/` namespace, provided the resource URI matches the one in the `compute/*` capability delegation.

In other words, if an agent has a delegation for `compute/*` for a given space URI, they can invoke any capability in the `compute/` namespace using that space as the resource.
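The derivation rule above can be sketched as a simple predicate. This is an illustrative Python sketch, not part of the spec; the function name and the dict shapes are assumptions:

```python
def can_derive(delegated: dict, invoked: dict) -> bool:
    """Check whether `invoked` can be derived from a `compute/*` delegation.

    A `compute/*` delegation covers any capability in the `compute/`
    namespace, provided the resource URI (`with`) matches exactly.
    """
    return (
        delegated["can"] == "compute/*"
        and invoked["can"].startswith("compute/")
        and invoked["with"] == delegated["with"]
    )

delegation = {"can": "compute/*", "with": "did:key:zAliceAgent"}
# Same resource URI: derivable.
assert can_derive(delegation, {"can": "compute/piececid", "with": "did:key:zAliceAgent"})
# Different resource URI: not derivable.
assert not can_derive(delegation, {"can": "compute/piececid", "with": "did:key:zBob"})
```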

## `compute/piececid`

Request computation of a PieceCIDv2 per [FRC-0069](https://github.com/filecoin-project/FIPs/blob/master/FRCs/frc-0069.md): a CID representation of the FR32-padded, sha256-trunc254-padded binary merkle trees used in Filecoin Piece Commitments.
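As background, FR32 padding expands the payload by a factor of 128/127 (two zero bits per 254 data bits), and Filecoin pieces are sized to powers of two. A rough sketch of the resulting piece-size calculation follows; it is illustrative only, and FRC-0069 remains authoritative:

```python
def padded_piece_size(payload_size: int) -> int:
    """Approximate the padded piece size for a payload of `payload_size` bytes.

    FR32 padding inserts 2 zero bits per 254 bits of data (a x128/127
    expansion), and the piece is then rounded up to the next power of two.
    """
    # Expand by 128/127, rounding up.
    fr32_size = -(-payload_size * 128 // 127)
    # Round up to the next power of two (128 bytes is the minimum piece size).
    size = 128
    while size < fr32_size:
        size *= 2
    return size

assert padded_piece_size(127) == 128   # 127 payload bytes fill a 128-byte piece
assert padded_piece_size(254) == 256
```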

> `did:key:zAliceAgent` invokes `compute/piececid` capability provided by `did:web:web3.storage`

```json
{
  "iss": "did:key:zAliceAgent",
  "aud": "did:web:web3.storage",
  "att": [
    {
      "with": "did:key:zAliceAgent",
      "can": "compute/piececid",
      "nb": {
        /* CID of the uploaded content */
        "content": { "/": "bag...car" }
      }
    }
  ],
  "prf": [],
  "sig": "..."
}
```

**Collaborator:**

I think this should be a link to a UCAN that gives you read access to the content. While we do allow public reads right now, this seems like overkill. However, if reads are charged and potentially accelerated, it would make a lot of sense to pass the "readable resource" itself, which in UCAN terms would be a capability on a resource giving you read access.

**Collaborator:**

I often like to think of capability groups like TS interfaces with some methods, e.g. `upload/*` is something like

```ts
interface Upload {
  add(upload): Promise<Result<...>>
  list(): Promise<...>
  // ...
}
```

When I delegate access, I give you a reference to either the Upload instance or just selected methods from it.

**Contributor Author:**

I think here is where we will need to map lots of assumptions around what the requirements and needs are. I have seen this as a service that can choose where to read from according to its preferences: if I run in CF I prefer to read from R2, but the service could be deployed anywhere. So it would not really be for the caller to infer that.

We probably need to better consider what we want and what the requirements are from a performance and cost perspective before deciding how these flows would work.

### Compute PieceCID Failure

The service MAY fail the invocation if the linked `content` is not found. Implementers can rely on IPFS gateways, location claims, or any other service to try to find the CAR bytes.
**Collaborator:**

If you delegate read access to the content, this problem goes away; it may only fail if the authorization was invalid or expired.

**Contributor Author:**

It may also fail if the delegation is for non-existent content, which is more what I meant here. I would like to go in your direction, but when we put costs and efficiency into the equation, the choice of where to read from should probably be the service's, not the user's — unless, of course, they give several UCANs for where to read from.


```json
{
  "ran": "bafy...computePiececid",
  "out": {
    "error": {
      "name": "ContentNotFoundError",
      "content": { "/": "bag...car" }
    }
  }
}
```

### Compute PieceCID Success

```json
{
  "ran": "bafy...computePiececid",
  "out": {
    "ok": {
      /* commitment proof for piece */
      "piece": { "/": "commitment...car" }
    }
  }
}
```

**Collaborator:**

This seems reasonable, but is currently tricky with ucanto, as it expects you to keep the connection open to respond with a result. That needs to change, but right now long-running tasks will be tough.

It is also worth considering whether this is an atomic operation or a composite one. If the latter, it is probably better to use effects to delimit execution, and in this case I'd argue it is delimited: the first step needs to read the content out, which can succeed or fail, and the second computes the piece from the read content. If we want to report progress between steps, employing effects is probably a good call.

**Contributor Author:**

Yeah, I think here we would likely need to design with effects.

# Schema

## `compute/piececid` schema

```ipldsch
type ComputePieceCid struct {
  with AgentDID
  nb ComputePieceCidDetail
}

type ComputePieceCidDetail struct {
  # CID of file previously added to IPFS Network
  content &Content
}
```
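A minimal structural check of the `nb` field against this schema might look like the sketch below. It is illustrative Python only; real implementations would use an IPLD schema validator, and the function name is an assumption:

```python
def validate_nb(nb: dict) -> bool:
    """Check that `nb` matches ComputePieceCidDetail: a single `content` link."""
    content = nb.get("content")
    # IPLD links are encoded in DAG-JSON as {"/": "<cid>"}.
    return (
        isinstance(content, dict)
        and set(content) == {"/"}
        and isinstance(content["/"], str)
    )

assert validate_nb({"content": {"/": "bag...car"}})
assert not validate_nb({"content": "bag...car"})  # a bare string is not a link
```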

[Protocol Labs]: https://protocol.ai/
[Vasco Santos]: https://github.com/vasco-santos
3 changes: 2 additions & 1 deletion w3-filecoin.md
@@ -310,7 +310,8 @@ This task is effectively a shortcut allowing an observer to find out the result

#### `filecoin/submit`

The task MUST be invoked by the [Storefront] which MAY be used to verify the offered content piece before propagating it through the pipeline.
The task MUST be invoked by the [Storefront], which MAY be used to verify the offered content piece before propagating it through the pipeline. For this, the [Storefront] MAY ask a third-party service to [compute](./w3-compute.md) the PieceCID for validation.

> `did:web:web3.storage` invokes capability from `did:web:web3.storage`

```json