
feat: add IPNI spec #85

Open · wants to merge 11 commits into `main`

Conversation

olizilla
Contributor

@olizilla olizilla commented Dec 13, 2023

Describes the ipni/offer capability and how to merge inclusion claims with IPNI Advertisements

For IPNI we assert that we can provide batches of multihashes by signing "Advertisements".

With an inclusion claim, a user asserts that a CAR contains a given set of multihashes via a car index.

This spec describes how to merge these two concepts by adding an ipni/offer capability to submit an inclusion claim as an IPNI Advertisement.

TODO

  • define how to encode the bytes of the inclusion claim in the Advertisement Metadata so that it's clear what they are. @alanshaw suggests wrapping it in a CAR; that way the bytes begin with the CAR prefix bytes, and how to decode them after that is well defined.
  • define the `ipni/accept` ability

License: MIT

Describes the `ipni/offer` capability and how to merge inclusion claims with IPNI Advertisements

License: MIT
Signed-off-by: Oli Evans <[email protected]>
w3-ipni.md Outdated

## Proposal

Provide an `ipni/offer` UCAN ability to sign and publish an IPNI Advertisement for the set of multihashes in a CAR a user has stored with w3s, to make them discoverable via IPFS implementations and other IPNI consumers.
Contributor

I wonder if we could make it a goal of this to move closer to decentralizing IPNI: have a separate service that provides this capability that could be implemented by multiple parties. This would also allow us to reach out to the IPNI team and see if they would like to run this service instead of us, making it available for other users of IPNI if they would like to get into the UCAN world.

we could therefore decouple w3s from this system

Contributor Author

I'm into this from the angle: "let's expose our block-level index info in a way that is easy to replicate rather than trapping it in a private db"... decentralise our indexes.

I don't think the IPNI team would jump at the chance to host this, but I'm in favour of seeing it as a separate service. I think we'll bundle it into the client as part of the default upload flow to start with, but we can break it out and make it opt-in / hosted elsewhere once we have this in place.

w3-ipni.md Outdated

The service must fetch the CARv2 index and parse it to find the set of multihashes included in the CAR. See: [Verifying the CARv2 Index](#verifying-the-carv2-index)

The set of multihashes must be encoded as 1 or more [IPNI Advertisements].
Contributor

Is it really required to be 1? Could it not be 0 as well? I think the protocol should not imply that validation is required for at least one block, but that validation MAY happen for each block.

Contributor Author

I think this comment is intended for the validation section. This is the "encode it as 1 or more adverts" part, which is just dealing with max block size.

w3-ipni.md Outdated

Random validation of a number of blocks allows us to detect invalid indexes and lets us tune how much work we are willing to do per car index.

Full validation of every block is not recommended as it opens us up to performing unbounded work. *We have seen CAR files with millions of tiny blocks.*
Contributor

Based on our previous chat, I was thinking the random validation would consider a % of the blocks in a CAR. But reading this now, it looks like a specific number. Is that the case? I would be more in favour of a random %, but it's probably a good idea to add a custom MAX. Otherwise, there is the attack vector of uploading a gigantic CAR of tiny blocks to make it less likely that bad ones get validated.

Member

Yeah, percent sounds good, but maybe with a max number we're willing to consider... in the spec I'd probably just specify a "random sample that may include none or all of the blocks" though.


I suggest a configurable percentage (validation factor) from 0 to 100%. Any non-zero fraction of a number of blocks is rounded up to the nearest integer, so that 10 blocks at 3% validation still validates one block.
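The rounding rule suggested here can be sketched as follows; `blocksToValidate` is a hypothetical helper name for illustration, not part of any spec:

```javascript
// Sketch of the suggested validation factor: a non-zero percentage is
// rounded up to the nearest whole block, so some validation always
// happens when the factor is greater than 0.
function blocksToValidate (totalBlocks, validationFactorPercent) {
  if (validationFactorPercent <= 0) return 0
  return Math.ceil(totalBlocks * (validationFactorPercent / 100))
}

// 10 blocks at 3% validation still validates 1 block
blocksToValidate(10, 3)
```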

Member

@alanshaw alanshaw left a comment

ERHMAHGERD linter is not happy about this...

w3-ipni.md Outdated

With an inclusion claim, a user asserts that a CAR contains a given set of multihashes via a car index.

This spec describes how to merge these two concepts by adding an `ipni/offer` capability to submit an inclusion claim as an IPNI Advertisement.
Member

@alanshaw alanshaw Dec 13, 2023

ipni/offer implies an ipni/accept fx per our own conventions...

It might be good to have an ipni/accept task that is executed when the advert has been written. The receipt might include the advert (C)ID and an identifier for the chain that it is included in.

@gammazero gammazero Mar 6, 2024

I assume that the client is the consumer of that receipt, correct? If so, what does it do with this information? Knowing that an advertisement is published does not guarantee that it has yet been ingested by IPNI.

Should this receipt have all the same data as the Announce message: advertisement CID, peerID, and addresses of where the chain is hosted? The peerID (publisher ID) would identify which chain the ad is on.

w3-ipni.md Outdated

**What this unlocks** (tl;dr)

- Create 1 or more IPNI Adverts per user uploaded CAR and set the ContextID to be the CAR CID (instead of arbitrary batches with no ContextId)
Member

Shard CID + Space DID?

Contributor Author

I think this should be Shard CID only. If we publish the same set of multihashes again to IPNI because someone adds the same CAR to a new space, I don't think we want to double the set of results that come back from IPNI for that multihash (1 per space it's in): all the records would have the same set of provider info (w3s), and there's no mechanism to determine which of the space DIDs the user should pass to us when reading (if at all).

Related... I'm not sure what happens if we publish the same multihash multiple times to IPNI with different ContextIDs. I think you get multiple results back with the same provider info.

olizilla added a commit to storacha/ipni that referenced this pull request Dec 15, 2023
Provides an EntryChunk class to help encode batches of multihashes as
IPNI EntryChunk IPLD blocks.

```js
import { EntryChunk } from '@web3-storage/ipni'
import { sha256 } from 'multiformats/hashes/sha2'

const hash = await sha256.encode(new Uint8Array())
const chunk = EntryChunk.fromMultihashes([hash])
const block = await chunk.export()

// the EntryChunk CID should be passed to an Advertisement as the `entries` Link.
console.log(`entries cid ${block.cid}`)
```

Encourages using dag-cbor as it's almost half the size of dag-json, and
every EntryChunk and advert has to be stored and replicated to 1 or more
IPNI servers.

Borrows the idea of an `export()` that provides an encoded IPLD block
from @Gozala in ucanto
https://github.com/web3-storage/ucanto/blob/2200d43595b85a5e7b60c234987ff3ce91404401/packages/core/src/delegation.js#L248

Provides a cheap and accurate `calculateEncodedSize()` function to allow
callers to determine when to split entries across multiple blocks.

Thanks to @rvagg for the very useful https://github.com/rvagg/cborg
module that makes that possible, and for the inspiration in the
`calculateHeaderLength` fn from js-car

see:
https://github.com/ipld/js-car/blob/562c39266edda8422e471b7f83eadc8b7362ea0c/src/buffer-writer.js#L215

Fixes #2 

WIP on storacha/specs#85

License: MIT

---------

Signed-off-by: Oli Evans <[email protected]>

The advert `ContextID` allows providers to specify a custom grouping key for multiple adverts. You can update or remove multiple adverts by specifying the same `ContextID`. The value is an opaque byte array as far as IPNI is concerned, and is provided in the query response.

A `Metadata` field is also available for provider specific retrieval hints, that a user should send to the provider when making a request for the block, but the mechanism here is unclear _(HTTP headers? bitswap?)_.
Contributor

@gobengo gobengo Dec 18, 2023

I think it would be good to put a URL like https://web3.storage/ipfs/{cid} here. That alone is enough to allow http clients to e.g. do a HEAD request or even webfinger request to learn more about the supported representations of the resource and authorization required to retrieve it via web linking.

You can also put a multiaddr in there, but IMO it would be nice for it to be represented as a URI https://www.iana.org/assignments/uri-schemes/uri-schemes.xhtml e.g. dweb:{multiaddr} but 🤷‍♀️

Contributor Author

The Metadata field here is at the Advertisement level, so it's about an arbitrary batch of CIDs rather than one. We could potentially put a gateway URL for the CAR CID at that level. The proposal here is to use this space for the bytes of the inclusion claim.

In IPNI today, an Advertisement maps an EntryChunk CID (aka a CID for a batch of multihashes) to a Provider. A Provider is an array of multiaddrs that define the location and transport^ to use to fetch them. A URL would be nice, but PL-ware is multiaddrs everywhere.

^But, gotcha! Right now specifying a multiaddr with "http" as a transport isn't enough to say "please use trustless IPFS gateway semantics when retrieving over HTTP", so the Metadata field is provided to give more hints that can't yet be inferred from the multiaddr alone... Bitswap has a similar issue: the multiaddr would specify "p2p" as the transport, but then libp2p protocol negotiation is left to the peers to decide on. The Metadata field again has a varint to declare that Bitswap should be expected.


Transforming multiaddr to URL is simple enough. Will the client be querying IPNI directly, or will some lookup service be doing that and then generating a possibly signed URL that the client can use to get the data?
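As a sketch of that "simple enough" transformation, handling only the common `/dns4/{host}/tcp/{port}/https` gateway shape (a real implementation would use a multiaddr library; `multiaddrToUrl` is an illustrative name):

```javascript
// Hypothetical helper: convert a dns+tcp+http(s) multiaddr string to a URL,
// e.g. /dns4/w3s.link/tcp/443/https -> https://w3s.link
function multiaddrToUrl (ma) {
  const [proto, host, tcp, port, scheme] = ma.split('/').filter(Boolean)
  if (!proto.startsWith('dns') || tcp !== 'tcp') {
    throw new Error(`unsupported multiaddr: ${ma}`)
  }
  const isHttps = scheme === 'https'
  const defaultPort = isHttps ? '443' : '80'
  const origin = `${isHttps ? 'https' : 'http'}://${host}`
  // omit the port when it is the scheme's default
  return port === defaultPort ? origin : `${origin}:${port}`
}
```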

Collaborator

@Gozala Gozala left a comment

I think this is really cool! I really wish we captured the byte ranges, however, as that would allow reads without having to do extra lookups.

I also wish we would generalize this beyond CARs. We have already started talking about writing files as-is and just storing proof trees for them instead of transcoding things into CARs etc... In that future, the context is likely going to be the root CID of the DAG and the index will be byte ranges either in the source file or a proof. I think it would be a good idea to design the protocol with that in mind instead, and if the source payload happens to be a CAR right now that is fine also.

w3-ipni.md Outdated

A `Metadata` field is also available for provider specific retrieval hints, that a user should send to the provider when making a request for the block, but the mechanism here is unclear _(HTTP headers? bitswap?)_.

Regardless, it is space for provider-specified bytes which we can use to include the portable cryptographic proof that an end-user made the original claim that a set of blocks are included in a CAR, and that as a large provider we have alerted IPNI on their behalf.
Collaborator

This is all really cool! My only concern is with regard to publishing user claims signed with our keys. I think we should either only publish verified claims, or we should not use our own signing key; instead we could e.g. derive a key pair from our key + user DID. However, doing the latter will probably force IPNI to deal with potential abuse problems which they may not be as well equipped to handle as we are at our level.

Contributor Author

IPNI Advertisements must be signed with the PeerID key that should be used to secure the libp2p connection. In the case of Bitswap we don't have a choice: we must sign with the same key we use for the libp2p node that we, the provider, are claiming can provide the blocks.

This, for me, is the interesting integration point with IPNI. Our user-provided content claims are assertions from content creators about the content. IPNI is assertions from providers about what can be fetched from them.

Including the user-provided content claim in the IPNI advert at least allows the possibility for consumers to maintain a reputation system that considers the original claimant's key, rather than attributing all indexing errors to the large upload service.

Contributor Author

However! For trustless-ipfs-gateway flavour HTTP, the PeerID key isn't used to secure the connection; it's HTTP, so the TLS cert is used instead. So there is no requirement for the signing key for trustless HTTP flavour providers to be anything in particular.


Including the user provided content claim in the IPNI advert

It may not be practical to put claims in the IPNI advertisement metadata if they require much data to store. It may be necessary to put a reference (CID or IPNS name) to the associated claims into the metadata, and fetch claims separately. If an IPNS name is used then the claims can be changed without changing the reference stored in the metadata.


Each multihash in a CAR is sent to an SQS queue. The `publisher-lambda` takes batches from the queue, encodes and signs `Advertisement`s and writes them to a bucket as JSON.

The lambda makes an HTTP request to the IPNI server at `cid.contact` to inform it when the head CID of the Advertisement linked list changes.
Collaborator

Aside: Does IPNI use IPFS to get advertisements, by following the log from the published head?

Contributor Author

No, it uses HTTP. https://github.com/ipni/specs/blob/main/IPNI.md#advertisement-transfer

We tell it the CID of the new head, and we write files with the CID as the filename. There is also a pre-agreed /head file which ingest servers can poll to see if the head has changed, and whether we are still "up". If we stop responding to /head requests, then at some point the IPNI server may flag our content / drop it from search results / delete it from their db.
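The notification step can be sketched roughly as below. The `/announce` path comes from the IPNI spec's HTTP transfer section, but the exact message field names (`Cid`, `Addrs`) are an assumption here, and the injectable `fetchFn` exists purely for illustration and testing:

```javascript
// Hedged sketch: tell an IPNI server (e.g. cid.contact) that the head of
// our advertisement chain has changed, so it can pull and index the new
// adverts. Body field names are assumptions, not verified against the spec.
async function announceHead (indexerUrl, headCid, publisherAddrs, fetchFn = fetch) {
  const res = await fetchFn(new URL('/announce', indexerUrl), {
    method: 'PUT',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ Cid: headCid, Addrs: publisherAddrs })
  })
  if (!res.ok) throw new Error(`announce failed with status ${res.status}`)
  return res.status
}
```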


web3.storage publishes IPNI advertisements as a side-effect of the E-IPFS car [indexer-lambda].

Each multihash in a CAR is sent to an SQS queue. The `publisher-lambda` takes batches from the queue, encodes and signs `Advertisement`s and writes them to a bucket as JSON.
Collaborator

Aside: How does the publisher lambda know what the previous head was, in order to link the new advertisement to it?

Contributor Author

It writes a head file to S3 for durability, and we have to configure the SQS consumer to only allow a single instance of the lambda to run at once. It's not great.


The lambda makes an HTTP request to the IPNI server at `cid.contact` to inform it when the head CID of the Advertisement linked list changes.

The IPNI server fetches the new head Advertisement from our bucket, and any others in the chain it hasn't read yet, and updates its indexes.
Collaborator

Wait, they read from the bucket directly, not through some more generic interface?


"can": "ipni/offer",
"with": "did:key:space", // users space DID
"nb": {
"inclusion": CID // inclusion claim CID
Collaborator

I'm a bit confused: is the CID here a link to the { content, includes } block described below? Or is it something else?

Comment on lines +196 to +197
- `Entries` must be the CID of an `EntryChunk` for a subset (or all) of multihashes in the CAR.
- `ContextID` must be the byte encoded form of the CAR CID.
Collaborator

Suggested change
- `Entries` must be the CID of an `EntryChunk` for a subset (or all) of multihashes in the CAR.
- `ContextID` must be the byte encoded form of the CAR CID.
- `ContextID` must be the byte encoded form of the CAR CID.
- `Entries` must be the CID of an `EntryChunk` for a subset (or all) of multihashes in the CAR.

Collaborator

I think it's easier to follow if you mention what the ContextID is first, as it's referenced from the other field.


- `Entries` must be the CID of an `EntryChunk` for a subset (or all) of multihashes in the CAR.
- `ContextID` must be the byte encoded form of the CAR CID.
- `Metadata` must be the bytes of the inclusion claim.
Collaborator

It's a shame to lose the provenance info, as in where the claim originated from; it would be nice to capture the source ipni/offer somewhere.

Collaborator

Also I think it would be nice if ipni/offer had an ipni/add effect that would link to our IPNI advertisements, similar to what we do with Filecoin. This would even allow the user to notify indexer nodes without waiting on us to do it.

I am also very tempted to store advertisements in user space as opposed to our own custom bucket; if they delete it we can then publish a delete advertisement.


I like the idea of storing advertisements in user space. That way the user pays to index the advertisements as part of the storage cost. We will need to generate events when a file is deleted so that a removal advertisement can be created.

Can a user opt out of indexing?

Collaborator

The general direction I'm advocating for is that if you upload content to our service it just sits there without being indexed or advertised anywhere. If the user wants to make it readable they must issue an invocation requesting a location claim to be made for the uploaded content, which will in turn index and advertise.

We will need to generate events when a file is deleted so that a removal advertisement can be created.

Deletes happen on user invocation, which can be a trigger to remove an advertisement.


Random validation of a number of blocks allows us to detect invalid indexes and lets us tune how much work we are willing to do per car index.

Full validation of every block is not recommended as it opens us up to performing unbounded work. _We have seen CAR files with millions of tiny blocks._
Collaborator

I worry about the probability of considering an index valid while in practice it is not. It also might be better to validate that random byte offsets in the CAR are correctly indexed, as opposed to validating random blocks.

💡🤯

Could we actually make clients create advertisements signed with their own keypair and just our addresses, so that we don't have to verify, and simply enable users to publish to IPNI?

Contributor Author

No. This is what we all want, but is not an option for libp2p based protocols where the IPNI Advert signer and the provider PeerID must be the same.

We could explore it for trustless ipfs gateway providers like w3s.link, but, I'd rather explore a solution that works for both first.

@gammazero gammazero Mar 6, 2024

Could we actually make clients create advertisements signed with their own keypair and just our addresses

An advertisement has the concept of a provider and a publisher who publishes ads on behalf of the provider. The publisher ID identifies the host from which advertisements are retrieved. The provider ID identifies the host from which content is retrieved. For now, these are both w3s.

IPNI does allow for an advertisement publisher to publish (and sign) ads on behalf of a provider. However, the client is neither the ad publisher nor the content host. This is one of the reasons that the CARv2 index CID is encoded into the ad metadata as an inclusion claim, as that allows the client to effectively sign the advertisement.

and simply enable users to publish to IPNI

Having users publish the advertisements is not practical as that would require the user to maintain an advertisement chain and serve a network endpoint from which indexers can fetch the advertisements.

Collaborator

When I keep saying "derive keys from a user identifier", I mean that we could derive a private key on demand and consequently could have those synthetic peers, but I think I'm getting into the weeds here.

My main sentiment is that currently we are incentivized to verify claims made by users, otherwise our reputation might take a hit. This is not great; ideally we would allow users to advertise claims and let their reputation suffer if those claims turn out to be invalid.

There are possibly other ways to accomplish this, e.g. produce synthetic advertisement keys for (user_did, cid) tuples, but unless there is a way to query by (*, cid) it will likely not be very practical.

A `MultihashIndexSorted` Index encodes a set of multihashes. Mapping from an index to an `EntryChunk` requires parsing the index and encoding the multihashes it contains with the EntryChunk IPLD schema.

```ipldsch
type EntryChunk struct {
Collaborator

I find EntryChunk to be a very misleading name, because it contains info about multiple entries, not one. Can this be renamed to something less confusing, like EntryBatch or AdvertisedEntrySet?

Contributor Author

Not in this spec. This is the IPNI vocabulary. https://github.com/ipni/specs/blob/main/IPNI.md#entrychunk-chain

@gammazero gammazero Mar 6, 2024

It is baked into the IPNI spec and encoded into existing advertisements. An EntryChunk can contain information about only one multihash, but usually contains more.

AdvertisedEntrySet could also be misleading, since all the multihashes advertised by an advertisement are part of the advertised set. EntryBatch seems better, but does not indicate that individual batches are related or part of a larger unit, which they are if they are associated with the same data set and context ID as proposed in this spec.

The term EntryChunk was chosen because it serves the same purpose as HTTP chunking, and the term "chunk" implies that it is part of a larger collection. There may be better names, but IMO this is fairly accurate.

}
```

Where the IPLD encoded size of an `EntryChunk` with the set of multihashes would exceed 4MiB (the upper limit for a block that can be transferred by libp2p) the set of multihashes must be split into multiple `EntryChunk` blocks.
Collaborator

I would recommend sorting all entries first before putting them into batches.


Why?

Depending on the storage backend within IPNI, these do get sorted for sharding or index-based merging, etc.

Collaborator

For deterministic output. Unless order has some meaning, which I don't believe is the case here keeping things deterministic is usually better.
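A minimal sketch of that determinism point: sorting multihashes bytewise before batching means the same set always yields the same EntryChunk blocks. `compareBytes` and `sortEntries` are illustrative helpers, not an existing API:

```javascript
// Compare two Uint8Array multihashes bytewise, shorter prefix first,
// so a set of entries has one canonical order before batching.
function compareBytes (a, b) {
  const len = Math.min(a.length, b.length)
  for (let i = 0; i < len; i++) {
    if (a[i] !== b[i]) return a[i] - b[i]
  }
  return a.length - b.length
}

// Returns a new, deterministically ordered copy of the entries
function sortEntries (multihashes) {
  return [...multihashes].sort(compareBytes)
}
```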

Contributor

@gobengo gobengo left a comment

It seems like the invocation to get an IPNI advertisement into web3.storage could just be an upload/add to a space?

And then if we had an upload/read nb.cid capability, you could delegate it to "aud": "did:web:cid.contact" and it could invoke it as desired to fetch the blocks? But then we'd also have receipts/billing on the read side of this, where it seems like it could be useful to have a different cost for reading the adverts 1 time every 2 weeks vs 1 billion times per day, and furthermore the end-user may want to authorize advertisement reads from IPNI deployments other than cid.contact.


See: [Encoding the IPNI Advertisement](#encoding-the-ipni-advertisement)

The Advertisement should then be available for consumption by indexer nodes per the [Advertisement Transfer](https://github.com/ipni/specs/blob/main/IPNI.md#advertisement-transfer) section of the IPNI spec.
Contributor

@gobengo gobengo Dec 18, 2023

It seems the advertisement transfer section also has an affordance for submitting advertisements https://github.com/ipni/specs/blob/main/IPNI.md#http-1

Announcements are sent as HTTP PUT requests to /announce on the indexer node's 'ingest' server.

So it seems like the client can already send cid.contact the Advertisements directly. How would a dev decide whether to do that or go through this ipni/offer proposal?

Contributor Author

IPNI Advertisements have to be signed with the PeerID key that should be used to secure the connection when trying to fetch the bytes from the provider. As such an end user can't tell IPNI that web3.storage will provide these blocks, as they can't sign it on our behalf.

This is an issue for libp2p based connections where the PeerID is used to authenticate and secure the connection, but not for trustless ipfs gateway flavour http, where tls is used and there is no notion of a PeerID.


@gobengo The section you cite above:

Announcements are sent as HTTP PUT

Talks about Announcements, not Advertisements. These are different things. An announcement tells IPNI that a new advertisement is available at a particular address, and gives the peer ID of the host for that advertisement. But, yes, a client could send the announce message to IPNI if it knows that data.

@gammazero gammazero self-assigned this Mar 5, 2024

The latest `head` CID of the advert list can be broadcast over [gossipsub], to be replicated and indexed by all listeners, or sent via HTTP to specific IPNI servers as a notification to pull and index the latest ads from you at their earliest convenience.

The advert `ContextID` allows providers to specify a custom grouping key for multiple adverts. You can update or remove multiple adverts by specifying the same `ContextID`. The value is an opaque byte array as far as IPNI is concerned, and is provided in the query response.


The ContextID also serves as a key that refers to specific metadata, and is used to update or delete that metadata. Updating metadata changes the metadata returned by IPNI lookups for all CIDs that were advertised with that context ID.


A `Metadata` field is also available for provider-specific retrieval hints that a user should send to the provider when requesting the block, but the mechanism here is unclear _(HTTP headers? bitswap?)_.

Regardless, it is a field we can use to include the portable cryptographic proof of the content claim an end user made asserting that a set of blocks is included in a CAR. The provider has to sign the IPNI advert with the PeerID key that should be used to secure the libp2p connection when retrieving the block. For upload services like web3.storage,


The functionality that is needed here is for the data owner to assert that the advertised CIDs are in the CAR file that is referred to by the context ID. In other words, this advertisement is correct for the CAR file. The advertisement is signed by the provider of the advertisement (w3s?), so the functionality here is adding the data owner's signature to the advertisement.

The inclusion claim carried by the metadata is limited to specifying a CAR index (by CID) that is associated with the CAR CID.


Where the IPLD-encoded size of an `EntryChunk` with the set of multihashes would exceed 4MiB (the upper limit for a block that can be transferred by libp2p), the set of multihashes must be split into multiple `EntryChunk` blocks.

It is possible to create long chains of `EntryChunk` blocks by setting the `Next` field to the CID of another `EntryChunk`, but this requires an entire `EntryChunk` to be fetched and decoded before the IPNI server can determine the next chunk to fetch.
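A minimal sketch of that splitting rule, assuming a hypothetical `partitionEntries` helper and approximating chunk size by raw multihash bytes (real IPLD encoding adds per-entry and per-block overhead, so a production implementation would use a lower threshold):

```typescript
// Upper limit for a block transferable over libp2p, per the text above.
const MAX_CHUNK_BYTES = 4 * 1024 * 1024;

// Partition a set of multihashes into groups that each stay under the
// size limit; each group becomes one EntryChunk block.
function partitionEntries(
  multihashes: Uint8Array[],
  maxBytes: number = MAX_CHUNK_BYTES
): Uint8Array[][] {
  const chunks: Uint8Array[][] = [];
  let current: Uint8Array[] = [];
  let size = 0;
  for (const mh of multihashes) {
    if (size + mh.byteLength > maxBytes && current.length > 0) {
      // Adding this multihash would overflow the chunk; start a new one.
      chunks.push(current);
      current = [];
      size = 0;
    }
    current.push(mh);
    size += mh.byteLength;
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}
```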


This is not a problem since indexing is not guaranteed to be immediate, and it is much faster than having the same multihashes split over multiple advertisements.


The containing CAR CID provides a useful `ContextID` for grouping multiple (lightweight) Advertisement blocks, so it is recommended to split the set across multiple `Advertisement` blocks, each pointing to an `EntryChunk` with a partition of the set of multihashes, and with the `ContextID` set to the CAR CID.


it is recommended to split the set across multiple Advertisement blocks

I do not understand or agree with this recommendation since reading the same data from several advertisements will be much slower and require much more data transfer than reading from multiple entries blocks in the same advertisement.

Advertisements are also chained together with a field that points to the previous advertisement in the chain, so they also need to be decoded and read sequentially. Advertisements carry much more information (context ID, metadata, provider info, signature, etc.), whereas entries blocks contain only the multihashes and a link to the next block.
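That sequential read can be sketched as a simple chain walk. Here `walkChain` is a hypothetical helper and a `Map` stands in for fetching ad blocks by CID; a real consumer would fetch and decode each advertisement over the network:

```typescript
// Minimal ad shape for this sketch: just the chain pointer and the
// entries link (real advertisements carry many more fields).
type Ad = { PreviousID?: string; Entries: string };

// Walk from the head ad back to the genesis ad, returning ad CIDs in
// newest-to-oldest order. Each step requires decoding the current ad
// before the previous one can be located.
function walkChain(head: string, ads: Map<string, Ad>): string[] {
  const order: string[] = [];
  let cursor: string | undefined = head;
  while (cursor !== undefined) {
    const ad = ads.get(cursor);
    if (ad === undefined) throw new Error(`missing advertisement ${cursor}`);
    order.push(cursor);
    cursor = ad.PreviousID;
  }
  return order;
}
```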

- `Entries` must be the CID of an `EntryChunk` for a subset (or all) of multihashes in the CAR.
- `ContextID` must be the byte encoded form of the CAR CID.
- `Metadata` must be the bytes of the inclusion claim.
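Putting those three rules together, a sketch of an unsigned advertisement built this way. The field names follow the IPNI ad schema; `buildCarAdvertisement` is a hypothetical helper, the CAR CID is byte-encoded here as UTF-8 for illustration, and signing is elided:

```typescript
interface Advertisement {
  PreviousID?: string;   // CID of the prior ad in the chain, absent for the first ad
  Provider: string;      // peer ID whose key signs the ad
  Addresses: string[];   // multiaddrs to retrieve the content from
  Entries: string;       // CID of an EntryChunk with a subset (or all) of the multihashes
  ContextID: Uint8Array; // byte-encoded CAR CID, grouping all ads for one CAR
  Metadata: Uint8Array;  // bytes of the inclusion claim
  IsRm: boolean;         // true when retracting ads for this ContextID
  Signature: Uint8Array; // envelope signature by the provider's PeerID key
}

function buildCarAdvertisement(opts: {
  provider: string;
  addresses: string[];
  entriesCid: string;
  carCid: string;
  inclusionClaim: Uint8Array;
  previousId?: string;
}): Advertisement {
  return {
    PreviousID: opts.previousId,
    Provider: opts.provider,
    Addresses: opts.addresses,
    Entries: opts.entriesCid,
    ContextID: new TextEncoder().encode(opts.carCid),
    Metadata: opts.inclusionClaim,
    IsRm: false,
    Signature: new Uint8Array(0), // signing is out of scope for this sketch
  };
}
```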


Do we want to include where the content claims for this CAR can be retrieved from?
- Addresses endpoint addresses where content claims can be retrieved from.

The content claims will include location claims, and there will be a location claim for the CAR file and the CARv2 index.

Or should these location claims be encoded into the metadata, with an additional entry for the location of all the content claims?

gammazero added a commit to storacha/RFC that referenced this pull request Mar 15, 2024
Describes the ipni/offer capability and how data is indexed and retrieved without relying on centralized services.

This is necessary for the write-anywhere initiative.

Depends on storacha/specs#85