Proposed curio api #34

rfc/curio-api-rfc.md

# Curio API RFC

This is a proposed set of APIs to be implemented in Curio to interface with a storacha node.

The proposed flow has three core groups of API endpoints:
1. Endpoints for manipulating proof sets
2. An endpoint and flow for storing pieces
3. An endpoint for retrieving pieces

There are additional considerations:
1. Authorization -- In storacha's network, it's important that the original end user maintains control of authorization for any action performed (including retrieval). We accomplish this through UCANs. We should discuss how we can maintain this without forcing curio to implement a full UCAN authorization process.
2. Aggregation -- storacha's data is at times extremely small (<1MB in certain cases). Our understanding is that, economically, it makes more sense to do some light aggregation of data before adding it to the proof set. The proposal below outlines a facility for doing this. While storacha would store pieces as it receives them, we would add them to the proof set in a separate step, with a root that could optionally be an aggregate of several pieces.
3. IPNI announcements -- We plan to use IPNI announcements in a specific way with our pieces. Our understanding is that the curio IPNI flow is in flux. We can try to integrate with your IPNI API or just do it ourselves.

**Reviewer comment:**

IPNI is pretty much implemented now, with the following schema that coordinates it on the curio side:

```sql
-- Table for storing IPNI ads
CREATE TABLE ipni (
    order_number BIGSERIAL PRIMARY KEY, -- Unique increasing order number
    ad_cid TEXT NOT NULL,
    context_id BYTEA NOT NULL, -- abi.PieceInfo in Curio
    -- metadata column is not required as Curio only supports one type of metadata (HTTP)
    is_rm BOOLEAN NOT NULL,

    previous TEXT, -- previous ad will only be null for first ad in chain

    provider TEXT NOT NULL, -- peerID from libp2p, this is main identifier on IPNI side
    addresses TEXT NOT NULL, -- HTTP retrieval server addresses

    signature BYTEA NOT NULL,
    entries TEXT NOT NULL, -- CID of first link in entry chain

    unique (ad_cid)
);

CREATE TABLE ipni_head (
    provider TEXT NOT NULL PRIMARY KEY, -- PeerID from libp2p, this is the main identifier
    head TEXT NOT NULL, -- ad_cid from the ipni table, representing the head of the ad chain

    FOREIGN KEY (head) REFERENCES ipni(ad_cid) ON DELETE RESTRICT -- Prevents deletion if it's referenced
);

-- This table stores metadata for ipni ad entry chunks. This metadata is used to reconstruct the original ad entry from
-- on-disk .car block headers or from data in the piece index database.
CREATE TABLE ipni_chunks (
    cid TEXT PRIMARY KEY, -- CID of the chunk
    piece_cid TEXT NOT NULL, -- Related Piece CID
    chunk_num INTEGER NOT NULL, -- Chunk number within the piece. Chunk 0 has no "next" link.
    first_cid TEXT, -- In case of db-based chunks, the CID of the first block in the chunk
    start_offset BIGINT, -- In case of .car-based chunks, the offset in the .car file where the chunk starts
    num_blocks BIGINT NOT NULL, -- Number of blocks in the chunk
    from_car BOOLEAN NOT NULL, -- Whether the chunk is from a .car file or from the database
    CHECK (
        (from_car = FALSE AND first_cid IS NOT NULL AND start_offset IS NULL) OR
        (from_car = TRUE AND first_cid IS NULL AND start_offset IS NOT NULL)
    ),

    UNIQUE (piece_cid, chunk_num)
);
```

Now, IPNI likes larger ads, so ideally storacha would create aggregate ads for multiple pieces; we can extend ipni_chunks to support reading from storacha-stored pieces (though technically just the piece CID works fine there).


## Basic flow

In the proposal below, the basic flow is as follows (a rough client-side sketch in Go appears after this list):
1. Create a proof set for Storacha on the SP (happens just once)
2. Upload pieces from storacha with the piece storage API
- at this point, the piece is immediately retrievable but not being proven
3. When enough pieces are received (128MB or more), create an aggregate root and add it to the proof set
- at this point, all pieces submitted to the proof set are retrievable AND proven
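
A rough Go sketch of steps 2 and 3 from the client side (step 1, proof set creation, is assumed to have already happened). The endpoint paths and status codes follow this RFC; the package name, function names, and plain-map payloads are illustrative only, not part of any existing storacha or Curio codebase:

```go
package curioclient

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// storePiece performs the two-step upload from step 2: announce the piece via
// POST /piece, then PUT the raw bytes to the returned Location URL.
func storePiece(curioURL, pieceCID string, data []byte) error {
	body, err := json.Marshal(map[string]string{"pieceCid": pieceCID})
	if err != nil {
		return err
	}
	resp, err := http.Post(curioURL+"/piece", "application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode == http.StatusNoContent {
		return nil // piece already stored
	}
	loc, err := resp.Location() // upload URL from the Location header
	if err != nil {
		return err
	}
	req, err := http.NewRequest(http.MethodPut, loc.String(), bytes.NewReader(data))
	if err != nil {
		return err
	}
	put, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer put.Body.Close()
	if put.StatusCode >= 300 {
		return fmt.Errorf("piece upload failed: %s", put.Status)
	}
	return nil
}

// addRoot performs step 3: once enough pieces (~128MB) have been buffered,
// append an (optionally aggregated) root to the proof set.
func addRoot(curioURL, setID, rootCID string, pieces []map[string]any, size int64) error {
	body, err := json.Marshal(map[string]any{"rootCid": rootCID, "pieces": pieces, "size": size})
	if err != nil {
		return err
	}
	resp, err := http.Post(curioURL+"/proof-sets/"+setID+"/roots", "application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusCreated {
		return fmt.Errorf("add root failed: %s", resp.Status)
	}
	return nil
}
```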

## Proof sets API

The following API describes how to create, read, update and delete proof sets managed by Curio. Essentially, Curio will provide APIs that mirror the Create/Add/Remove/Delete functions that will exist on chain for PDP proof sets, and then manage the submission of proofs on a schedule (I hear it's good at scheduling).

### POST /proof-sets

Create a new proof set.
Request Body:
```json
{
  // need to drill down on these properties
}
```

Response:
Code: 201
Location header: "/proof-sets/{set-id}"

*TODO: do we need an interim response given this is a chain transaction with a place to fetch the set-id later?*

### GET /proof-sets/{set-id}

**Reviewer comment:**

Maybe link to some spec which lays out what those proof sets look like / what they are

**Author reply:**

Truth be told I'm just betting based off the PDP service contract doc


Response:
Code: 200
Body:
```json
{
  "roots": [
    // Root IDs in the proof set, in order
  ],
  "ownerAddress": "f3...",
  "challengePeriod": 15
}
```
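
For reference, the same response expressed as Go types; the field names mirror the JSON above, while the root ID type and the unit of `challengePeriod` are assumptions, not part of this RFC:

```go
package curioclient

// ProofSet is a sketch of the GET /proof-sets/{set-id} response body.
type ProofSet struct {
	Roots           []uint64 `json:"roots"`           // root IDs in the proof set, in order (type assumed)
	OwnerAddress    string   `json:"ownerAddress"`    // e.g. "f3..."
	ChallengePeriod uint64   `json:"challengePeriod"` // unit assumed to be epochs
}
```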

### POST /proof-sets/{set-id}/roots

Append a root to the proof set, which may be an aggregation of one or more piece CIDs.

Request Body:
```json
{
  "rootCid": "bafy....root",
  "pieces": [
    {
      "cid": "bafy...piece1",
      "proof": [
        "bafy...intermediate1",
        "bafy...intermediate2"
      ]
    },
    {
      "cid": "bafy...piece2",
      "proof": [
        "bafy...intermediate3",
        "bafy...intermediate4"
      ]
    },
    //...
  ],
  "size": 1048576
}
```

This API should fail if any of the pieces were not previously stored via the Piece Storage API.

Response:
Code: 201
Location Header: "/proof-sets/{set-id}/roots/{root-id}"
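
A minimal sketch of the "pieces must already be stored" rule described above, with hypothetical request types and a hypothetical `hasPiece` lookup (these names are illustrative, not existing Curio code):

```go
package curioclient

import "fmt"

// AddRootRequest and RootPiece mirror the POST /proof-sets/{set-id}/roots body above.
type AddRootRequest struct {
	RootCID string      `json:"rootCid"`
	Pieces  []RootPiece `json:"pieces"`
	Size    int64       `json:"size"`
}

type RootPiece struct {
	CID   string   `json:"cid"`   // piece CID
	Proof []string `json:"proof"` // intermediate nodes proving inclusion under rootCid
}

// validateAddRoot rejects the request if any referenced piece was not
// previously stored via the Piece Storage API. hasPiece is a hypothetical
// lookup into the piece index.
func validateAddRoot(req AddRootRequest, hasPiece func(pieceCID string) bool) error {
	for _, p := range req.Pieces {
		if !hasPiece(p.CID) {
			return fmt.Errorf("piece %s was not previously stored", p.CID)
		}
	}
	return nil
}
```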

### GET /proof-sets/{set-id}/roots/{root-id}

Response Body:
```json
{
  "rootCid": "bafy....root",
  "pieces": [
    {
      "cid": "bafy...piece1",
      "proof": [
        "bafy...intermediate1",
        "bafy...intermediate2"
      ]
    },
    {
      "cid": "bafy...piece2",
      "proof": [
        "bafy...intermediate3",
        "bafy...intermediate4"
      ]
    },
    //...
  ],
  "size": 1048576
}
```

### DELETE /proof-sets/{set-id}/roots/{root-id}

Remove the given root ID from the given proof set.

### DELETE /proof-sets/{set-id}

Remove the specified proof set entirely.


## Piece Storage


### POST /piece

**Reviewer comment:**

We definitely need to define how authorization works on this endpoint. This can't just be entirely open.

Also should define the lifecycle of the uploaded data somehow:

- How long is it expected to stick around in storage after upload before being included in a proof set? When should the data be removed if not added to a proof set?
- Signalling for expected indexing with IPNI / ipfs-type (trustless gateway/bitswap) retrievals, and who can retrieve the piece?
- What is the contract for retrieval - is it retrievable atomically when the notify hook is called? After inclusion in a proof set?

**Reviewer comment:**

Also what are the size bounds for pieces that you expect curio to support? We can support even very large pieces (100G+), but I don't think a client-push model is a good idea above 1GB, where managing short-term buffers becomes a real concern, and download retry becomes non-optional


This is a new API for storing pieces of arbitrary size in Curio's storage.

Request Body:
```json
{
  "pieceCid": "{piece cid v2}",
  "notify": "optional http webhook to call once the data is uploaded"
}
```

Returns:
- 204 when the piece is already there
- 201 when a new piece upload is created

The response should contain a Location header with a URL to be used for the actual upload. This URL should accept a PUT request with the actual bytes of the piece. The request should fail if the bytes do not hash to the correct piece CID.
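
A minimal sketch of the upload-side check described above. `computePieceCID` stands in for a real piece-commitment (CommP) computation and is a hypothetical helper, not a Curio function; buffering the whole body in memory also illustrates the size-bound concern raised in the comments:

```go
package curioclient

import (
	"bytes"
	"io"
	"net/http"
)

// handlePieceUpload sketches the PUT handler behind the Location URL returned
// by POST /piece. It reads the uploaded bytes, recomputes the piece CID, and
// fails the request if it does not match the CID declared earlier.
func handlePieceUpload(w http.ResponseWriter, r *http.Request, expectedPieceCID string,
	computePieceCID func(data []byte) (string, error)) {

	// Note: in-memory buffering is only reasonable for small pieces.
	var buf bytes.Buffer
	if _, err := io.Copy(&buf, r.Body); err != nil {
		http.Error(w, "failed to read piece bytes", http.StatusBadRequest)
		return
	}
	got, err := computePieceCID(buf.Bytes())
	if err != nil {
		http.Error(w, "failed to compute piece CID", http.StatusInternalServerError)
		return
	}
	// Per the RFC: fail if the bytes do not hash to the declared piece CID.
	if got != expectedPieceCID {
		http.Error(w, "piece CID mismatch", http.StatusUnprocessableEntity)
		return
	}
	w.WriteHeader(http.StatusNoContent)
}
```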

## Piece Retrieval

These endpoints are simply used for retrieving blobs of any valid size.

### GET /piece/{piece cid v2}
### HEAD /piece/{piece cid v2}

These can just follow FRC-0066: https://github.com/filecoin-project/FIPs/blob/master/FRCs/frc-0066.md (please include the Range parameter!)

With the additional stipulation that they should probably ONLY accept v2 Piece CIDs: https://github.com/filecoin-project/FIPs/blob/master/FRCs/frc-0069.md
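
A small example of a ranged retrieval against this endpoint; the function and parameter names are illustrative, not part of any existing client:

```go
package curioclient

import (
	"fmt"
	"io"
	"net/http"
)

// fetchPieceRange retrieves a byte range of a stored piece in the FRC-0066
// style. pieceCIDv2 is expected to be a v2 piece CID per the stipulation above.
func fetchPieceRange(curioURL, pieceCIDv2 string, start, end int64) ([]byte, error) {
	req, err := http.NewRequest(http.MethodGet, curioURL+"/piece/"+pieceCIDv2, nil)
	if err != nil {
		return nil, err
	}
	// Ask for just the bytes we need; a compliant server answers 206.
	req.Header.Set("Range", fmt.Sprintf("bytes=%d-%d", start, end))
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusPartialContent && resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status: %s", resp.Status)
	}
	return io.ReadAll(resp.Body)
}
```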