From 34bb197a3246c359c1d019561c0c63d447feb870 Mon Sep 17 00:00:00 2001 From: HackMD Date: Wed, 9 Feb 2022 19:10:11 +0000 Subject: [PATCH 01/12] Draft --- Pin.md | 209 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 209 insertions(+) create mode 100644 Pin.md diff --git a/Pin.md b/Pin.md new file mode 100644 index 0000000..a6f12c1 --- /dev/null +++ b/Pin.md @@ -0,0 +1,209 @@ +# ⚠️ Disclaimer ⚠️ + +> Since writing this I have realized that while incermental writing approach described here would work well in centralized system (single writer) it will be problematic in decentralized system where multiple actors may be doing concurrent updates e.g. if web3 application uses web3.storage service from multiple clients they would fail to transact or will have to coordinate updates (version & index). +> +> I think we could do better by using grow-only sets when patching the pin object and only coordinate pin update between `Pinned` states. That way `Transient` pins could be updated concurrently without coordination & only coordinate updates on `Pinned` pins. + +# Pins + +Following is an IPLD Schema representation of the "pin" objects (or whatever we want to call it) which: + +1. Can be in "tranisent" or "pinned" state + - To allow incermental updates through series of transactions. + - To have `tranisent` representation series of between udates. +3. Are identified by an [ed25519][] public key. Therefor they can represent + - IPNS names + - [did:key][] identifiers + - Actors in [UCAN][] authorization + + +This design would address several problems in .storage services: + +### Large uploads + +Large uploads that span multiple CAR files would gain a first class representation. Client application will be able to self issue new "pin" identifier and through incremental transactions amend it's state by uploading DAG shards via CAR files. Each transaction would amend `Pin` object with additional DAG shard head(s) followed by a final trasaction changing `Pin` from `Transient` state to `Pinned` state pointing to +a desired DAG root cid (that was provided in one of the transaction). + +This way would allow .storage service to list not only succesful, but also "in progress" uploads (pins). Additional metadata could also be used to provide domain specific information about the status. E.g. applications built on top of web3.storage could utilize this to provide human readable description along with domain specific status `code`. + + +### IPNS + +.storage services could directly map pins to corresponding IPNS names, making it possible to access arbitrary uploads / pins through an IPNS resoultion. + +Pin status could be used to decide when to propagate pin updates through the network e.g. sevice could choose to only announce only pinned states. + +### did:key + +.storage service could also provide interface for accessing content under `did:key` that correspond to a given keys. Basically we can build IPNS like system except with delegated publishing through UCANs before integrating that into IPNS. + +### UCAN + +By representing pins as first class objects identified by `did:key` they become actors in UCANs delegated capabilties system. + +.storage user could issue delegated token for specific `Pin` object and excercise that capability to update given `Pin` object or delegate that capability to another actor in the system. + + +## Schema + +Following is an IPLD schema definition for the `Pin` object. 
+ +> 💭 One one hand we would like to specify enough structure to be able to make sense of it in applications (be it .storage or it's clients), but on the other hand boxing actual DAG just to give it a "status" info seems awkward. + + +```ipldsch +-- Pin represents (IPNS) named pointer to a DAG that +-- is either in "transient" state that is partially +-- or fully "pinned" state. In both cases it is +-- anotated pointer to DAG head(s). +type Pin union { + Transient "transient" + Pinned "pinned" +} representation inline { + discriminantKey "status" +} + +-- Represents partially pinned DAG. Think of it as +-- dirty tree in the git, head points to previous +-- revision. +-- Please note: Even though pin links to a previous +-- revision of the pin there is does not imply it is +-- pinned (you would need to include that link in the +-- index for that) +type Transient struct { + -- Link to a previous pin revision in "pinned" state + head &Pinned + -- Set of DAG roots that next pinned state will be + -- comprised of. + -- Please note that providing blocks under DAG + -- happens out of band meaning that DAG under the + -- link could be partial. + links [Link] +} + +-- Represents fully pinned DAG with some metadata. +-- Please note that fully pinned DAG does not imply +-- that full DAG is pinned, but rather provided +-- subdag +type Pinned struct { + -- Root representing current state of the pin. + root Link + -- Previous version of this pin (not sure what + -- would genesis block pint to maybe we need + -- a special genesis variant of "Pin" union) + head &Pinned + -- We have links to all the relevant sub-DAGs + -- because `root` may not be traversable e.g + -- if it is encrypted. By providing links service + -- can traverse it and pin all the relevant blocks + -- even when it can't make sense of them. + links [Link] +} +``` + +### Pin Update Protocol + +General idea is that clients on the network could submit `Transactions` to perform `Pin` updates. Following is the IPLD schema for the transaction. + +> 🤔 Transaction and Pin are structurally almost identical, I wonder if it would make sense to make them actually identically. That way we could have `PUT` / `PATCH` operations where first replaces former value with new one and later patches it. + +```ipldsch +type Transaction union { + Patch "patch" + Commit "commit" +} representation inline { + discriminantKey "type" +} + +-- When "Patch" transaction is received, service +-- performs following steps: +-- 1. Verify that current pin head corresponds to +-- provided head (if pin is in transient state it +-- checks it checks against it's head). If provided +-- head points to older revision (heads form the +-- merkle clock) it should deny transaction. If +-- provided head is newer revision (than known to +-- service) state of the pin on service is out of +-- date and it still refuses transaction as it is +-- unable to process it yet. +-- 2. If pin is in "pinned" state transitions pin to +-- "transient" state in which `head` & `links` match +-- what was provided. +-- If pin is in "tranisent" state update it's `links` +-- to union of the provided links and pin state +-- links. +-- +-- Note that service may or may not publish IPNS +-- record after processing "Patch" transaction. +type Patch { + -- Pin identifier that is it's public key + pin ID + -- pointer to the head this pin. + head &Pinned + -- Set of links to be included in the next + -- revision of the pin. 
+ links [Link] + -- This would link to IPLD representation of + -- the UCAN (wich "patch" capability) in which + -- outermost audience is service ID this patch was + -- send to and innermost issuer is the pin ID. + -- This would allow pinning service to publish a + -- new IPNS record (assuming we add support for + -- UCANs in IPNS). + -- Note: service needs to generate IPNS record + -- update based on it's pin state which may be + -- different from the one submitted by a client. + proof &UCAN +} + +-- When "Commit" transaction is recieved service +-- perform same steps as with "Patch". Main +-- difference is that after processing this +-- transaction Pin will transition to Pinned +-- state. If pin was in Pinned state new state +-- will contain only provided links, otherwise +-- it will contain union of provided links and links +-- in the current state. +-- +-- General expectation is that service will update IPNS +-- record after processing "commit" transaction. +type Commit { + -- Pin identifier that is it's public key + pin ID + -- pointer to the head of the pin. + head &Pinned + -- The root of the DAG for a new revisions + -- it is implicitly implied to be in the links. + root Link + -- Set of links to be included in the next + -- revision of the pin. + links [Link] + -- This would link to the IPLD representation of + -- the UCAN (with "commit" capability) in which + -- outermost audience is service ID and innermost + -- issuer is pin ID. That way service can verify that + -- commit is warranted and generate own IPNS update + -- record given it's current state. + proof &UCAN +} + + + +-- Binary representation of the ed25519 public key +type ID = Bytes +-- TODO: Define actual structure +type UCAN = Link +``` + +In the .storage setting it is expected that client will: + +1. Provide CAR encoded DAG shard(s) + (non empty) set `Transaction` blocks. + +In the .storage setting it is expect that service will: + +1. Verify that claimed transaction(s) are warrented by provided provided UCAN. +3. Perform atomic transaction either succesfully updating ALL `Pins` or failing and updating no pins. + +[ed25519]:https://ed25519.cr.yp.to/ +[UCAN]:https://whitepaper.fission.codes/access-control/ucan +[did:key]:https://w3c-ccg.github.io/did-method-key/ \ No newline at end of file From 04acff6af6d16066cf65977ec0c4a745dcb5fbc3 Mon Sep 17 00:00:00 2001 From: HackMD Date: Wed, 9 Feb 2022 20:16:53 +0000 Subject: [PATCH 02/12] Rev 2 --- Pin.md | 56 ++++++++++++++++++++++++++++---------------------------- 1 file changed, 28 insertions(+), 28 deletions(-) diff --git a/Pin.md b/Pin.md index a6f12c1..107977e 100644 --- a/Pin.md +++ b/Pin.md @@ -1,37 +1,37 @@ -# ⚠️ Disclaimer ⚠️ - -> Since writing this I have realized that while incermental writing approach described here would work well in centralized system (single writer) it will be problematic in decentralized system where multiple actors may be doing concurrent updates e.g. if web3 application uses web3.storage service from multiple clients they would fail to transact or will have to coordinate updates (version & index). -> -> I think we could do better by using grow-only sets when patching the pin object and only coordinate pin update between `Pinned` states. That way `Transient` pins could be updated concurrently without coordination & only coordinate updates on `Pinned` pins. 
# Pins -Following is an IPLD Schema representation of the "pin" objects (or whatever we want to call it) which: +Following is an [IPLD Schema] representation of the `Pin` objects (or whatever we want to call it), which: -1. Can be in "tranisent" or "pinned" state - - To allow incermental updates through series of transactions. - - To have `tranisent` representation series of between udates. -3. Are identified by an [ed25519][] public key. Therefor they can represent - - IPNS names - - [did:key][] identifiers - - Actors in [UCAN][] authorization +1. Can be in `Tranisent` or `Pinned` state. + - Allow incermental `Pinned` state updates. + - Allow concurrent `Tranisent` state updates. +3. Are identified by an [ed25519][] public key, that can represent + - [IPNS][] names. + - [did:key][] identifiers. + - Actors in [UCAN][] authorization. This design would address several problems in .storage services: ### Large uploads -Large uploads that span multiple CAR files would gain a first class representation. Client application will be able to self issue new "pin" identifier and through incremental transactions amend it's state by uploading DAG shards via CAR files. Each transaction would amend `Pin` object with additional DAG shard head(s) followed by a final trasaction changing `Pin` from `Transient` state to `Pinned` state pointing to -a desired DAG root cid (that was provided in one of the transaction). +Large uploads that span multiple CAR files would gain a first class representation via `Pin` objects. Client application wishing to upload large file (or any other DAG) will be able to accomplish that by: + + +1. Self issuing a new `Pin` identifier (and corresponding UCAN) by generating new [ed25519][] keypair. +1. Submitting concurrent `Patch` transactions (in [CAR][] format). Each transaction will contain DAG shards, subset of blocks that would feet upload quota. +1. Finalizing upload by submitting `Commit` transaction (in [CAR][] format), setting `Pin` root to a `CID` of the large file. -This way would allow .storage service to list not only succesful, but also "in progress" uploads (pins). Additional metadata could also be used to provide domain specific information about the status. E.g. applications built on top of web3.storage could utilize this to provide human readable description along with domain specific status `code`. +This way would allow .storage service to list "in progress" uploads (keyed by `Pin` id) and complete uploads (keyed by `CID` or/and `Pin` id). ### IPNS -.storage services could directly map pins to corresponding IPNS names, making it possible to access arbitrary uploads / pins through an IPNS resoultion. +.storage services could mirror `Pin`s to corresponding [IPNS][] names, making it possible to access arbitrary uploads / pins through an IPNS resoultion. + +> Pin state (`Transient` or `Pinned`) could be used to decide when to propagate pin changes through the network e.g. sevice could choose to only announce only `Pinned` states. -Pin status could be used to decide when to propagate pin updates through the network e.g. sevice could choose to only announce only pinned states. ### did:key @@ -46,9 +46,7 @@ By representing pins as first class objects identified by `did:key` they become ## Schema -Following is an IPLD schema definition for the `Pin` object. 
- -> 💭 One one hand we would like to specify enough structure to be able to make sense of it in applications (be it .storage or it's clients), but on the other hand boxing actual DAG just to give it a "status" info seems awkward. +Following is an [IPLD schema][] definition for the `Pin` object. ```ipldsch @@ -103,9 +101,7 @@ type Pinned struct { ### Pin Update Protocol -General idea is that clients on the network could submit `Transactions` to perform `Pin` updates. Following is the IPLD schema for the transaction. - -> 🤔 Transaction and Pin are structurally almost identical, I wonder if it would make sense to make them actually identically. That way we could have `PUT` / `PATCH` operations where first replaces former value with new one and later patches it. +General idea is that clients on the network could submit `Transactions`s to perform `Pin` updates. Following is the [IPLD schema][] for the transaction. ```ipldsch type Transaction union { @@ -197,13 +193,17 @@ type UCAN = Link In the .storage setting it is expected that client will: -1. Provide CAR encoded DAG shard(s) + (non empty) set `Transaction` blocks. +1. Provide DAG shard(s) in [CAR][] format. +4. Include `Transaction` blocks in the provided [CAR][] and list them in [roots](https://ipld.io/specs/transport/car/carv1/#number-of-roots). In the .storage setting it is expect that service will: -1. Verify that claimed transaction(s) are warrented by provided provided UCAN. -3. Perform atomic transaction either succesfully updating ALL `Pins` or failing and updating no pins. +1. Verify that claimed transaction(s) are warrented by provided provided UCAN (Ensuring that client is allowed to update `Pin`). +3. Perform atomic transaction either succesfully updating ALL `Pins` (as per transaction) or failing and NOT updating any of the pins. (Rejecting request and provide [CAR][] all together). [ed25519]:https://ed25519.cr.yp.to/ [UCAN]:https://whitepaper.fission.codes/access-control/ucan -[did:key]:https://w3c-ccg.github.io/did-method-key/ \ No newline at end of file +[did:key]:https://w3c-ccg.github.io/did-method-key/ +[IPLD Schema]:https://ipld.io/docs/schemas/ +[IPNS]:https://github.com/ipfs/specs/blob/master/IPNS.md +[CAR]:https://ipld.io/specs/transport/car/carv1/ \ No newline at end of file From d83692747c6cc3b83896068a414c4f8240f7ab63 Mon Sep 17 00:00:00 2001 From: HackMD Date: Wed, 23 Feb 2022 18:53:52 +0000 Subject: [PATCH 03/12] rename to revision --- Pin.md | 167 +++++++++++++++++++++++++++++++++------------------------ 1 file changed, 98 insertions(+), 69 deletions(-) diff --git a/Pin.md b/Pin.md index 107977e..d81e2fb 100644 --- a/Pin.md +++ b/Pin.md @@ -1,11 +1,11 @@ -# Pins +# Revision -Following is an [IPLD Schema] representation of the `Pin` objects (or whatever we want to call it), which: +Following is an [IPLD Schema] representation of the `Revision` objects (or whatever we want to call it), which: -1. Can be in `Tranisent` or `Pinned` state. - - Allow incermental `Pinned` state updates. - - Allow concurrent `Tranisent` state updates. +1. Can be in `Draft` or `Release` state. + - Allow _coordinated_ `Release` updates. + - Allow _concurrent_ `Draft` updates. 3. Are identified by an [ed25519][] public key, that can represent - [IPNS][] names. - [did:key][] identifiers. @@ -16,21 +16,21 @@ This design would address several problems in .storage services: ### Large uploads -Large uploads that span multiple CAR files would gain a first class representation via `Pin` objects. 
Client application wishing to upload large file (or any other DAG) will be able to accomplish that by: +Large uploads that span multiple CAR files would gain a first class representation via `Revision` objects. Client application wishing to upload large file (or any other DAG) will be able to accomplish that by: -1. Self issuing a new `Pin` identifier (and corresponding UCAN) by generating new [ed25519][] keypair. +1. Self issuing a new `Revision` identifier (and corresponding UCAN) by generating new [ed25519][] keypair. 1. Submitting concurrent `Patch` transactions (in [CAR][] format). Each transaction will contain DAG shards, subset of blocks that would feet upload quota. -1. Finalizing upload by submitting `Commit` transaction (in [CAR][] format), setting `Pin` root to a `CID` of the large file. +1. Finalizing upload by submitting `Commit` transaction (in [CAR][] format), setting `root` of the `Revision` to a `CID` of the large file. -This way would allow .storage service to list "in progress" uploads (keyed by `Pin` id) and complete uploads (keyed by `CID` or/and `Pin` id). +This would allow .storage service to list "in progress" uploads (keyed by `Revision` id) and "finished" uploads (keyed by `CID` or/and `Revision` id). ### IPNS -.storage services could mirror `Pin`s to corresponding [IPNS][] names, making it possible to access arbitrary uploads / pins through an IPNS resoultion. +.storage services could mirror `Revisions`s to corresponding [IPNS][] names, making it possible to access arbitrary uploads / pins through an IPNS resoultion. -> Pin state (`Transient` or `Pinned`) could be used to decide when to propagate pin changes through the network e.g. sevice could choose to only announce only `Pinned` states. +> `Revision` state (`Draft` or `Release`) could be used to decide when to propagate changes through the network e.g. sevice could choose to only announce only `Release` states. ### did:key @@ -39,38 +39,37 @@ This way would allow .storage service to list "in progress" uploads (keyed by `P ### UCAN -By representing pins as first class objects identified by `did:key` they become actors in UCANs delegated capabilties system. +By representing `Revision`s as first class objects identified by `did:key` they become actors in UCANs delegated capabilties system. -.storage user could issue delegated token for specific `Pin` object and excercise that capability to update given `Pin` object or delegate that capability to another actor in the system. +.storage user could issue delegated token for specific `Revision` object and excercise that capability to update given `Revision` object or delegate that capability to another actor in the system. ## Schema -Following is an [IPLD schema][] definition for the `Pin` object. +Following is an [IPLD schema][] definition for the `Revision` object. ```ipldsch --- Pin represents (IPNS) named pointer to a DAG that --- is either in "transient" state that is partially --- or fully "pinned" state. In both cases it is --- anotated pointer to DAG head(s). +-- Revision represents (IPNS) named pointer to a DAG that +-- is either in "draft" or fully "release" state. +-- In both cases it is anotated pointer to a DAG. type Pin union { - Transient "transient" - Pinned "pinned" + Draft "draft" + Release "release" } representation inline { discriminantKey "status" } -- Represents partially pinned DAG. Think of it as -- dirty tree in the git, head points to previous --- revision. 
--- Please note: Even though pin links to a previous --- revision of the pin there is does not imply it is --- pinned (you would need to include that link in the --- index for that) -type Transient struct { - -- Link to a previous pin revision in "pinned" state - head &Pinned +-- `Revision`. +-- Please note: Even though it links to a previous +-- revision that does not imply it is pinned (you +-- would need to include that link in the links +-- explicitly) +type Draft struct { + -- Link to a previous pin revision in "release" state + head &Release -- Set of DAG roots that next pinned state will be -- comprised of. -- Please note that providing blocks under DAG @@ -83,13 +82,13 @@ type Transient struct { -- Please note that fully pinned DAG does not imply -- that full DAG is pinned, but rather provided -- subdag -type Pinned struct { - -- Root representing current state of the pin. +type Release struct { + -- Root representing current state of the revision. root Link -- Previous version of this pin (not sure what -- would genesis block pint to maybe we need -- a special genesis variant of "Pin" union) - head &Pinned + head &Release -- We have links to all the relevant sub-DAGs -- because `root` may not be traversable e.g -- if it is encrypted. By providing links service @@ -101,7 +100,7 @@ type Pinned struct { ### Pin Update Protocol -General idea is that clients on the network could submit `Transactions`s to perform `Pin` updates. Following is the [IPLD schema][] for the transaction. +General idea is that clients on the network could submit `Transactions`s to perform `Revesion` updates. Following is the [IPLD schema][] for the transaction. ```ipldsch type Transaction union { @@ -113,66 +112,69 @@ type Transaction union { -- When "Patch" transaction is received, service -- performs following steps: --- 1. Verify that current pin head corresponds to --- provided head (if pin is in transient state it --- checks it checks against it's head). If provided +-- 1. Verify that current release head corresponds +-- to provided head (if pin is in draft state +-- it checks against it's head). If provided -- head points to older revision (heads form the -- merkle clock) it should deny transaction. If -- provided head is newer revision (than known to --- service) state of the pin on service is out of --- date and it still refuses transaction as it is --- unable to process it yet. --- 2. If pin is in "pinned" state transitions pin to --- "transient" state in which `head` & `links` match --- what was provided. --- If pin is in "tranisent" state update it's `links` --- to union of the provided links and pin state +-- service) state of the revision on service is +-- out of date and it still refuses transaction +-- as it is unable to process it yet. +-- 2. If revision is in "release" state transitions +-- it to "draft" state in which `head` & `links` +-- match what was provided. +-- If pin is in "draft" state update it's `links` +-- to union of the provided links and local state -- links. -- -- Note that service may or may not publish IPNS -- record after processing "Patch" transaction. type Patch { - -- Pin identifier that is it's public key - pin ID - -- pointer to the head this pin. - head &Pinned + -- Revision identifier that is it's public key + id ID + -- Pointer to the head patch assumes revision is on. + head &Release -- Set of links to be included in the next - -- revision of the pin. + -- release of the revision. 
links [Link] -- This would link to IPLD representation of - -- the UCAN (wich "patch" capability) in which - -- outermost audience is service ID this patch was - -- send to and innermost issuer is the pin ID. - -- This would allow pinning service to publish a - -- new IPNS record (assuming we add support for - -- UCANs in IPNS). + -- the UCAN (with "patch" capability) in which + -- invocation audience is service ID this patch + -- was send to and root issuer is the revision DID. + -- This allows service to publish a new IPNS + -- record (assuming we add support for UCANs in + -- IPNS). -- Note: service needs to generate IPNS record - -- update based on it's pin state which may be - -- different from the one submitted by a client. + -- update based on it's local `revision` state + -- which may be different from the one submitted + -- by a client. Client is responsible to do + -- necessary coordination. proof &UCAN } -- When "Commit" transaction is recieved service --- perform same steps as with "Patch". Main +-- performs same steps as with "Patch". Main -- difference is that after processing this --- transaction Pin will transition to Pinned --- state. If pin was in Pinned state new state +-- transaction Revision will transition to Release +-- state. If revision was in Release state new state -- will contain only provided links, otherwise --- it will contain union of provided links and links --- in the current state. +-- it will contain union of all the links that were +-- received via patches and links provided via +-- Commit. -- -- General expectation is that service will update IPNS -- record after processing "commit" transaction. type Commit { - -- Pin identifier that is it's public key - pin ID + -- Revision identifier that is it's public key + id ID -- pointer to the head of the pin. - head &Pinned - -- The root of the DAG for a new revisions - -- it is implicitly implied to be in the links. + head &Release + -- The root of the DAG for a new revisions it is + -- implicitly implied to be in the links. root Link -- Set of links to be included in the next - -- revision of the pin. + -- release of the revision. links [Link] -- This would link to the IPLD representation of -- the UCAN (with "commit" capability) in which @@ -198,8 +200,35 @@ In the .storage setting it is expected that client will: In the .storage setting it is expect that service will: -1. Verify that claimed transaction(s) are warrented by provided provided UCAN (Ensuring that client is allowed to update `Pin`). -3. Perform atomic transaction either succesfully updating ALL `Pins` (as per transaction) or failing and NOT updating any of the pins. (Rejecting request and provide [CAR][] all together). +1. Verify that claimed transaction(s) are warrented by provided provided UCAN (Ensuring that client is allowed to update `Revision`). +3. Perform atomic transaction either succesfully updating ALL `Revisions` (as per transaction) or failing and NOT updating non of the revisions. (Rejecting request and provided [CAR][] all together). + + +### 🚧 Transaction Serialization 🚧 + +> Note this requires more consideration, what follows is a just a current thinking on the matter. + +`Transaction`s MAY be serialized as CAR files. In this serilaziation format transaction to be executed +will be referenced as CAR roots and point to the DAG-CBOR encoded `Transaction` object with couple of +nuances: + +1. 
Transaction `links` may link to a CAR CID, which is to be interpreted as a set of `links` to all the blocks contained in the corresponding CAR.

   > **Note:** We do not currently have a CAR IPLD codec. The idea is to iterate on this and define a spec based on lessons learned.

2. If the `links` field is omitted from the transaction object, that implies links to all the blocks of this CAR (except transaction blocks).

> 💔 I do not like "implicit links" as that seems impractical in the case of multiple transactions. Even in the single transaction case a `Transaction` may want to link a DAG known to be available at the destination, e.g. the CID of the previous revision.
>
> At the same time it would be impractical to list
> all the CIDs in the CAR in the transaction itself.
>
> 💭 I'm starting to think we may want nested CARs, that way actual blocks can be included by encoding them via a CAR codec, which can then be referenced from the transaction in the outer CAR.
> ```
> |--------- Header --------||------- Data -------|
> [ varint | DAG-CBOR block ][Transaction][DAG CAR]
> ```
> That would allow breaking blocks into arbitrary sets and referring to them from multiple transactions.

[ed25519]:https://ed25519.cr.yp.to/
[UCAN]:https://whitepaper.fission.codes/access-control/ucan
[did:key]:https://w3c-ccg.github.io/did-method-key/
[IPLD Schema]:https://ipld.io/docs/schemas/
[IPNS]:https://github.com/ipfs/specs/blob/master/IPNS.md
[CAR]:https://ipld.io/specs/transport/car/carv1/
\ No newline at end of file

From ffc31debb3fcfebe014f5d1b3ff3bfebb0445e89 Mon Sep 17 00:00:00 2001
From: Irakli Gozalishvili
Date: Wed, 23 Feb 2022 10:54:29 -0800
Subject: [PATCH 04/12] Rename Pin.md to Revision.md

---
 Pin.md => Revision.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
 rename Pin.md => Revision.md (99%)

diff --git a/Pin.md b/Revision.md
similarity index 99%
rename from Pin.md
rename to Revision.md
index d81e2fb..5874006 100644
--- a/Pin.md
+++ b/Revision.md
@@ -235,4 +235,4 @@ nuances:
 [did:key]:https://w3c-ccg.github.io/did-method-key/
 [IPLD Schema]:https://ipld.io/docs/schemas/
 [IPNS]:https://github.com/ipfs/specs/blob/master/IPNS.md
-[CAR]:https://ipld.io/specs/transport/car/carv1/
\ No newline at end of file
+[CAR]:https://ipld.io/specs/transport/car/carv1/

From a257f905b9894041c054aa4a39fdab975468df77 Mon Sep 17 00:00:00 2001
From: HackMD
Date: Wed, 16 Mar 2022 01:53:08 +0000
Subject: [PATCH 05/12] Revision 3

---
 Pin.md | 323 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 323 insertions(+)
 create mode 100644 Pin.md

diff --git a/Pin.md b/Pin.md
new file mode 100644
index 0000000..8546e55
--- /dev/null
+++ b/Pin.md
@@ -0,0 +1,323 @@
# Inter Planetary Transactional Memory (IPTM)

The following document describes:

1. A schema for representing DAGs that change over time, which from here on we will refer to as `Document`s.
1. A content derived addressing scheme for `Document` states.
1. A transactional update protocol for these `Document`s.
1. A document publishing protocol and associated consensus algorithm.

### Document Model

A `Document` represents a view of a DAG in time, uniquely identified by an [ed25519][] public key. It is a state derived from the set of operations that were authorized via the corresponding private key (through [UCAN][]).

> Because documents are identified by an [ed25519][] public key, they CAN represent
> - [IPNS][] names.
> - [did:key][] identifiers.
> - Actors in [UCAN][] authorization.

Documents can also be addressed in a "specific state" by [CID][] _(which can be simply derived from the CIDs of the Shards it consists of, which we will cover in more detail later)_.
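As an illustration of that derivation, here is a minimal sketch in TypeScript using the JS `multiformats` and `@ipld/dag-cbor` packages. The `{ shards }` node shape and the string-based sort key are assumptions of the sketch rather than normative parts of the protocol:

```ts
import * as dagCBOR from "@ipld/dag-cbor"
import { sha256 } from "multiformats/hashes/sha2"
import { CID } from "multiformats/cid"

// Derives a CID for a specific document state from the CIDs of the
// shards it consists of. Shards are sorted by their string form so
// that the same set of shards always yields the same state CID.
export const stateCID = async (shards: CID[]): Promise<CID> => {
  const sorted = [...shards].sort((a, b) =>
    a.toString() < b.toString() ? -1 : 1
  )
  // Encode the state as a DAG-CBOR node linking to every shard.
  const bytes = dagCBOR.encode({ shards: sorted })
  const digest = await sha256.digest(bytes)
  return CID.create(1, dagCBOR.code, digest)
}
```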
### Document States

Document can be in two logical states. The transitional state, from here on referred to as `Draft`, in which the document is still in the process of transmission _(e.g. an in-progress upload)_ and has no `root`. A published document with a specific `root` is referred to as an `Edition`.

Following is an [IPLD schema][] definition for the `Document` object

```ipldsch
type Document union {
  Draft "draft"
  Edition "edition"
} representation inline {
  discriminantKey "status"
}

type Draft {
  -- Shards of the DAG this draft is comprised of
  shards: [&Shard]
}

type Edition {
  shards: [&Shard]
  root: Link
}
```

#### Draft

`Draft` represents an in-flight document, usually while its content is transmitted, which occurs prior to the initial publish or between subsequent editions.

Its `shards` field links to all the `Shard`s it is comprised of, which are [CAR][] encoded sets of blocks.

Every possible `Draft` state can be addressed by a `CID`, which can be computed by encoding it as a DAG-CBOR node with `shards` sorted alphabetically (there is relevant prior art in [ZDAG header compression][ZDAG hearder compression] we could borrow from).

> 💡 Please note that, while possible, `Draft`s are not meant to be stored. They are primarily a way to reference specific states of a document without having to retransmit a lot of data.

#### Edition

`Edition` simply represents a state in which a specific `Draft` is assigned a `root` node that MUST be present in one of its shards.

> `Edition`s can also be uniquely addressed via CID in much the same way as `Draft`s, although we currently have no practical need for this.

### Document update protocol

Documents in our model are represented via "append only" DAGs and can be updated using two types of transactions:

- `Append` - Appends provided DAG shards to the document state.
- `Publish` - Publishes a `root` of the DAG for a specified `Draft`.

Following is an [IPLD schema][] definition for the `Transaction` object

```ipldsch
type Transaction union {
  Append "append"
  Publish "publish"
} representation inline {
  discriminantKey "type"
}

type Append {
  -- Document ID to append provided shards to
  id ID
  -- Shards to be appended
  shards: [&Shard]
  -- UCAN authorization of this append
  proof &UCAN
}

type Publish {
  -- Document ID to publish provided root for
  id ID
  -- State of the document to publish
  draft: &Draft
  -- Root to be published
  root: Link
  -- Shard in which root is located
  shard: &Shard
  -- UCAN authorization to publish this document
  proof &UCAN
}

-- Binary representation of the ed25519 public key
type ID = Bytes
-- TODO: Define actual structure
type UCAN = Link
```

### Append

The Append operation is both [commutative][] and [idempotent][idempotence]; in other words, appends can be applied in any order and multiple times yet result in the same document state. That is because the result of application is just the addition of the provided `shards` into the document's `shards` set.

> It is worth noting that `Append` transitions a document from a `Draft` or an `Edition` state into a `Draft` state, unless it has already been applied, in which case it is a noop.

### Publish

The Publish operation simply assigns a root to a specific document `Draft`. Since conflicting publish operations could occur, e.g. when two operations link `root` to a different `CID`, we apply both operations in the order of their operation CIDs; the operation whose CID sorts lowest alphabetically wins.
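To make this tiebreak concrete, here is a minimal sketch of the comparison; the `Operation` shape is a stand-in assumption for any value that carries its own CID, not part of the schema above:

```ts
import { CID } from "multiformats/cid"

// Any operation addressed by the CID of its encoded form.
interface Operation {
  cid: CID
}

// Orders two conflicting publish operations deterministically:
// the operation whose CID sorts lowest alphabetically wins.
export const winner = (a: Operation, b: Operation): Operation =>
  a.cid.toString() <= b.cid.toString() ? a : b
```

Because every replica ranks the candidates the same way, peers need no coordination to agree on the same winner.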
> In practice we expect this to be really rare and of limited value to a malicious actor, since the root could only point to a CID within the document shards.

##### Logical clock

Publishing a document may have side effects (e.g. publishing it on IPNS). That is to say, given a document state `D(a, b, c, d)` _(lower case letters signify shards)_, applying concurrent publish operations `P(a, b)` and `P(c, d)` may have visible side-effects despite being out of date.

> Please note that `publish` operations themselves MUST be part of the document shards, which naturally creates causal relationships: a new operation implicitly refers to older ones. More on this can be found in the [Merkle CRDT][] paper.

In order to reconcile concurrent publish operations we define a total order (only) among published drafts as follows:

1. Given drafts `D1` and `D2`, if all shards of `D1` are included in `D2` we say `D1 <- D2` (`D1` predates `D2`).
2. Given drafts `D1` and `D2` where neither `D1` nor `D2` includes the shards of the other, `D1 <- D2` if:
   1. The number of shards in `D2` is greater than in `D1`, or
   2. The number of shards in `D2` is equal to the number of shards in `D1` & the `CID` of `D1 < D2`.

## Applications

### Large Uploads in dotStorage

In this section we describe a practical application of this specification in dotStorage service(s), by walking through a large upload flow, which would enable the service to list "in progress" and "complete" uploads.

1. Client generates an [ed25519][] keypair.
2. Client derives the `Document` ID and corresponding `Append` / `Publish` UCANs for it from the keypair.
3. Client passes the large file to the `@ipld/unixfs` library to get a stream of blocks.
4. Blocks are read from the stream and packed into CARs of 200MiB in size.
5. Each CAR packet is wrapped in an outer CAR with an `Append` operation, which links to the nested packet CAR by its CID in `shards`, and is sent off to the designated dotStorage endpoint.
6. Once all packets are `Append`-ed, the client produces a `Publish` operation by deriving the `Draft` CID from all the CAR packet `CID`s it produced, with `root` corresponding to the file `root` CID.
7. Client sends the `Publish` operation and awaits its completion.

> Note that in this use case the `Document` is used to represent an _upload session_, which is discarded on success.
> Also note that wrapper CARs could `Append` / `Publish` shards into more than one document.

### Incremental update flow

In this flow the client submits incremental updates through ordered transactions `p1 <- p2 <- c2`, as sketched below.
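A rough sketch of how a client might construct such a chain locally; the `Tx` shape and its `prev` field are illustrative assumptions standing in for the transaction structures defined earlier:

```ts
import * as dagCBOR from "@ipld/dag-cbor"
import { sha256 } from "multiformats/hashes/sha2"
import { CID } from "multiformats/cid"

// Illustrative transaction shape: `prev` names the CID of the
// transaction this one causally depends on (null for the first).
type Tx = { prev: CID | null; payload: unknown }

const cidOf = async (tx: Tx): Promise<CID> =>
  CID.create(1, dagCBOR.code, await sha256.digest(dagCBOR.encode(tx)))

export const chain = async (): Promise<Tx[]> => {
  const p1: Tx = { prev: null, payload: "patch 1" }
  const p2: Tx = { prev: await cidOf(p1), payload: "patch 2" }
  const c2: Tx = { prev: await cidOf(p2), payload: "commit" }
  // All three can be submitted concurrently; the service applies
  // each one only after the transaction named in `prev` succeeds.
  return [p1, p2, c2]
}
```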
Note that client does not need to await for `p1` to finish before submitting `p2` since it is aware of `p1` CID at creation it can create `p2` which will only apply after `p1` and only if it succeeds (same with `c2`) + + +``` + c1()--+ - Commit c1 with no parent + | + R(c1) - Init Release with c0 parent + | + | + p1(c0) --+ - Patch with c0 parent + | + D(p1) - Transition to Draft with p1 parent + | + | + p2(p1)--+ - Apply Patch p2 with parent p1 + | + D(p2) - Transition to Draft with p2 parent + | + | + c2(p2) --+ - Apply Commit c2 with parent p2 + | + R(c2) - Transition to Release with c2 Parent +``` + +``` + c1()--+ - Commit c1 with no parent + | + R(c1) - Init Release with c0 parent + | + | + p1(c0) --+ - Patch with c0 parent + | + D(p1) - Transition to Draft with p1 parent + | + | + p2(p1)--+ - Apply Patch p2 with parent p1 + | + D(p2) - Transition to Draft with p2 parent + | + | + c2(p2) --+ - Apply Commit c2 with parent p2 + | + R(c2) - Transition to Release with c2 Parent +``` + + + +[ed25519]:https://ed25519.cr.yp.to/ +[UCAN]:https://whitepaper.fission.codes/access-control/ucan +[did:key]:https://w3c-ccg.github.io/did-method-key/ +[IPLD Schema]:https://ipld.io/docs/schemas/ +[IPNS]:https://github.com/ipfs/specs/blob/master/IPNS.md +[CAR]:https://ipld.io/specs/transport/car/carv1/ +[Merkle CRDT]:https://research.protocol.ai/blog/2019/a-new-lab-for-resilient-networks-research/PL-TechRep-merkleCRDT-v0.1-Dec30.pdf +[CID]:https://docs.ipfs.io/concepts/content-addressing/ +[ZDAG hearder compression]:https://github.com/mikeal/ZDAG/blob/master/SPEC.md#links_header_compression + +[commutative]:https://en.wikipedia.org/wiki/Commutative_property +[idempotence]:https://en.wikipedia.org/wiki/Idempotence + + From d11e58ae3f8565c3759fb91212422c539d614479 Mon Sep 17 00:00:00 2001 From: Irakli Gozalishvili Date: Tue, 15 Mar 2022 18:54:36 -0700 Subject: [PATCH 06/12] Delete Revision.md --- Revision.md | 238 ---------------------------------------------------- 1 file changed, 238 deletions(-) delete mode 100644 Revision.md diff --git a/Revision.md b/Revision.md deleted file mode 100644 index 5874006..0000000 --- a/Revision.md +++ /dev/null @@ -1,238 +0,0 @@ - -# Revision - -Following is an [IPLD Schema] representation of the `Revision` objects (or whatever we want to call it), which: - -1. Can be in `Draft` or `Release` state. - - Allow _coordinated_ `Release` updates. - - Allow _concurrent_ `Draft` updates. -3. Are identified by an [ed25519][] public key, that can represent - - [IPNS][] names. - - [did:key][] identifiers. - - Actors in [UCAN][] authorization. - - -This design would address several problems in .storage services: - -### Large uploads - -Large uploads that span multiple CAR files would gain a first class representation via `Revision` objects. Client application wishing to upload large file (or any other DAG) will be able to accomplish that by: - - -1. Self issuing a new `Revision` identifier (and corresponding UCAN) by generating new [ed25519][] keypair. -1. Submitting concurrent `Patch` transactions (in [CAR][] format). Each transaction will contain DAG shards, subset of blocks that would feet upload quota. -1. Finalizing upload by submitting `Commit` transaction (in [CAR][] format), setting `root` of the `Revision` to a `CID` of the large file. - - -This would allow .storage service to list "in progress" uploads (keyed by `Revision` id) and "finished" uploads (keyed by `CID` or/and `Revision` id). 
- -### IPNS - -.storage services could mirror `Revisions`s to corresponding [IPNS][] names, making it possible to access arbitrary uploads / pins through an IPNS resoultion. - -> `Revision` state (`Draft` or `Release`) could be used to decide when to propagate changes through the network e.g. sevice could choose to only announce only `Release` states. - - -### did:key - -.storage service could also provide interface for accessing content under `did:key` that correspond to a given keys. Basically we can build IPNS like system except with delegated publishing through UCANs before integrating that into IPNS. - -### UCAN - -By representing `Revision`s as first class objects identified by `did:key` they become actors in UCANs delegated capabilties system. - -.storage user could issue delegated token for specific `Revision` object and excercise that capability to update given `Revision` object or delegate that capability to another actor in the system. - - -## Schema - -Following is an [IPLD schema][] definition for the `Revision` object. - - -```ipldsch --- Revision represents (IPNS) named pointer to a DAG that --- is either in "draft" or fully "release" state. --- In both cases it is anotated pointer to a DAG. -type Pin union { - Draft "draft" - Release "release" -} representation inline { - discriminantKey "status" -} - --- Represents partially pinned DAG. Think of it as --- dirty tree in the git, head points to previous --- `Revision`. --- Please note: Even though it links to a previous --- revision that does not imply it is pinned (you --- would need to include that link in the links --- explicitly) -type Draft struct { - -- Link to a previous pin revision in "release" state - head &Release - -- Set of DAG roots that next pinned state will be - -- comprised of. - -- Please note that providing blocks under DAG - -- happens out of band meaning that DAG under the - -- link could be partial. - links [Link] -} - --- Represents fully pinned DAG with some metadata. --- Please note that fully pinned DAG does not imply --- that full DAG is pinned, but rather provided --- subdag -type Release struct { - -- Root representing current state of the revision. - root Link - -- Previous version of this pin (not sure what - -- would genesis block pint to maybe we need - -- a special genesis variant of "Pin" union) - head &Release - -- We have links to all the relevant sub-DAGs - -- because `root` may not be traversable e.g - -- if it is encrypted. By providing links service - -- can traverse it and pin all the relevant blocks - -- even when it can't make sense of them. - links [Link] -} -``` - -### Pin Update Protocol - -General idea is that clients on the network could submit `Transactions`s to perform `Revesion` updates. Following is the [IPLD schema][] for the transaction. - -```ipldsch -type Transaction union { - Patch "patch" - Commit "commit" -} representation inline { - discriminantKey "type" -} - --- When "Patch" transaction is received, service --- performs following steps: --- 1. Verify that current release head corresponds --- to provided head (if pin is in draft state --- it checks against it's head). If provided --- head points to older revision (heads form the --- merkle clock) it should deny transaction. If --- provided head is newer revision (than known to --- service) state of the revision on service is --- out of date and it still refuses transaction --- as it is unable to process it yet. --- 2. 
If revision is in "release" state transitions --- it to "draft" state in which `head` & `links` --- match what was provided. --- If pin is in "draft" state update it's `links` --- to union of the provided links and local state --- links. --- --- Note that service may or may not publish IPNS --- record after processing "Patch" transaction. -type Patch { - -- Revision identifier that is it's public key - id ID - -- Pointer to the head patch assumes revision is on. - head &Release - -- Set of links to be included in the next - -- release of the revision. - links [Link] - -- This would link to IPLD representation of - -- the UCAN (with "patch" capability) in which - -- invocation audience is service ID this patch - -- was send to and root issuer is the revision DID. - -- This allows service to publish a new IPNS - -- record (assuming we add support for UCANs in - -- IPNS). - -- Note: service needs to generate IPNS record - -- update based on it's local `revision` state - -- which may be different from the one submitted - -- by a client. Client is responsible to do - -- necessary coordination. - proof &UCAN -} - --- When "Commit" transaction is recieved service --- performs same steps as with "Patch". Main --- difference is that after processing this --- transaction Revision will transition to Release --- state. If revision was in Release state new state --- will contain only provided links, otherwise --- it will contain union of all the links that were --- received via patches and links provided via --- Commit. --- --- General expectation is that service will update IPNS --- record after processing "commit" transaction. -type Commit { - -- Revision identifier that is it's public key - id ID - -- pointer to the head of the pin. - head &Release - -- The root of the DAG for a new revisions it is - -- implicitly implied to be in the links. - root Link - -- Set of links to be included in the next - -- release of the revision. - links [Link] - -- This would link to the IPLD representation of - -- the UCAN (with "commit" capability) in which - -- outermost audience is service ID and innermost - -- issuer is pin ID. That way service can verify that - -- commit is warranted and generate own IPNS update - -- record given it's current state. - proof &UCAN -} - - - --- Binary representation of the ed25519 public key -type ID = Bytes --- TODO: Define actual structure -type UCAN = Link -``` - -In the .storage setting it is expected that client will: - -1. Provide DAG shard(s) in [CAR][] format. -4. Include `Transaction` blocks in the provided [CAR][] and list them in [roots](https://ipld.io/specs/transport/car/carv1/#number-of-roots). - -In the .storage setting it is expect that service will: - -1. Verify that claimed transaction(s) are warrented by provided provided UCAN (Ensuring that client is allowed to update `Revision`). -3. Perform atomic transaction either succesfully updating ALL `Revisions` (as per transaction) or failing and NOT updating non of the revisions. (Rejecting request and provided [CAR][] all together). - - -### 🚧 Transaction Serialization 🚧 - -> Note this requires more consideration, what follows is a just a current thinking on the matter. - -`Transaction`s MAY be serialized as CAR files. In this serilaziation format transaction to be executed -will be referenced as CAR roots and point to the DAG-CBOR encoded `Transaction` object with couple of -nuances: - -1. 
Transaction `links` may link to CAR CID which is to be intepreted as a set of `links` for all the blocks contained in the corresponding CAR. - - > **Note:** We do not currently have CAR IPLD codec. Idea is to iterate on this and define spec based on lessons learned. - -2. If `links` field are omitted from transaction object that implies links to all the blocks of this CAR (except transaction blocks). - -> 💔 I do not like "implicit links" as that seems impractical in case of multiple transactions. Even for a single transaction case `Transaction` may want to link a DAG known to be available at the destination e.g. CID of the previous revision. -> -> At the same time it would be impractical to list -> all the CIDs in the car in transaction itself. -> -> 💭 I'm starting to think we may want nested CARs, that way actual blocks can be included by encoding them via CAR codec. Which then can be referenced from the transaction in the outer CAR. -> ``` -> |--------- Header --------||------- Data -------| -> [ varint | DAG-CBOR block ][Transaction][DAG CAR] -> ``` -> That would allow breaking blocks into arbitrary sets and refer to them from the multilpe transactions. - -[ed25519]:https://ed25519.cr.yp.to/ -[UCAN]:https://whitepaper.fission.codes/access-control/ucan -[did:key]:https://w3c-ccg.github.io/did-method-key/ -[IPLD Schema]:https://ipld.io/docs/schemas/ -[IPNS]:https://github.com/ipfs/specs/blob/master/IPNS.md -[CAR]:https://ipld.io/specs/transport/car/carv1/ From 42ba255c562e45a271ffa5698951532a83bc5006 Mon Sep 17 00:00:00 2001 From: HackMD Date: Wed, 16 Mar 2022 19:55:03 +0000 Subject: [PATCH 07/12] add abstract on source of truth --- Pin.md | 32 +++++++++++++++++++++++--------- 1 file changed, 23 insertions(+), 9 deletions(-) diff --git a/Pin.md b/Pin.md index 8546e55..a8b4aea 100644 --- a/Pin.md +++ b/Pin.md @@ -1,5 +1,17 @@ # Inter Planetary Transactional Memory (IPTM) +## Simple Summary + +Protocol for representing and updating arbitrary IPLD DAGs over time. + +## Abstract + +In web3 **source of truth** is in data itself as opposed to raw in central database in traditional web2 applications. This often leads to a different architectures where databases are mere index. Good litmus test is are you able to drop existing database and recreate exact replica from the data itself. + +With above design goal following specification proposes permissionless protocol for representing and updating arbitrary IPLD DAGs over time with no assumbtions about databse / indexes one might use in implementation. + +## Specification + Following document describes: 1. Schema for representing DAGs that change over time, which from here on we will refer to as `Document`s. @@ -21,7 +33,7 @@ Document represents a view of the DAG in time, uniquely identified by [ed25519][ Documents can also be addressed in "specific state" by [CID][] _(which can be simply derived from CIDs of Shards it consists of, which will cover in more detail later)_ -### Document States +#### Document States Document can be in two logical states. Transitional state, here on referred as `Draft`, where it's in the process of transmittion _(e.g. in progress upload)_ where it has no `root`. Published document with a specific `root` is referred as `Edition`. 
@@ -36,13 +48,15 @@ type Document union { } type Draft { + status "draft" -- Shards of the DAG this draft is comprised of - shards: [&Shard] + shards [&Shard] } type Edition { - shards: [&Shard] - root: Link + status "edition" + shards [&Shard] + root Link } ``` @@ -114,14 +128,14 @@ type ID = Bytes type UCAN = Link ``` -### Append +#### Append Append operation is both [commutative][] and [idempotent][idempotence], in other words they can be applied in any order and multiple times yet result in the same document state, that is because result of application is just addition of provided `shards` into document's `shards` set. > It is worth noting that `Append` tranisions document from `Draft` or a `Edition` state into a `Draft` state, unless it has been already applied in which case it is noop. -### Publish +#### Publish Publish operation simply assigns root to a specific document `Draft`. Since conflicting publish operations could occur, e.g. when two operations link `root` to a different `CID` we apply both operations in an order of operation CIDs, those operation sorted lowest alphabetically wins. @@ -141,9 +155,9 @@ In order to reconcile concurrent publish operations we define total order (only) 1. Number of shards in `D2` is greater than in `D1` 2. Number of shards in `D2` is equal to number of shards in `D1` & `CID` of `D1 < D2`. -## Appliactions +### Appliactions -### Large Uploads in dotStorage +#### Large Uploads in dotStorage This section we describe practical application of this specification in dotStorage service(s), by walking through a large uploads flow, which would enbale service to list "in progress" and "complete" uploads. @@ -180,7 +194,7 @@ By representing `Documents`s as first class objects identified by `did:key` they dotStorage user could issue delegated token for specific `Document` object and excercise that capability to update given `Document` object or delegate that capability to another actor in the system. --> - + - [ed25519]:https://ed25519.cr.yp.to/ [UCAN]:https://whitepaper.fission.codes/access-control/ucan @@ -333,5 +245,8 @@ In this flow multiple clients concurrently submit patches and race commits. One [commutative]:https://en.wikipedia.org/wiki/Commutative_property [idempotence]:https://en.wikipedia.org/wiki/Idempotence - - +[DAG-CBOR]:https://ipld.io/specs/codecs/dag-cbor/spec/ +[IPLD]:https://ipld.io/specs/ +[IPFS]:https://ipfs.io/ +[IPLD Block]:https://ipld.io/glossary/#block +[IPLD codec]:https://ipld.io/specs/codecs/ \ No newline at end of file From c922a20e30c6476961cea33bef1e624fa0288d2c Mon Sep 17 00:00:00 2001 From: HackMD Date: Wed, 23 Mar 2022 07:24:51 +0000 Subject: [PATCH 09/12] add back section about source of truth --- Pin.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/Pin.md b/Pin.md index 60f6067..85fb5ab 100644 --- a/Pin.md +++ b/Pin.md @@ -6,15 +6,15 @@ Protocol for representing, transporting and updating arbitrary [IPLD][] DAGs ove ### Abstract -Document describes [IPLD][] DAG replication protocol designed for constrained environments, where peer-to-peer replication is impractical. It aims to provide following functionality: +In decentralized applications **source of truth** is captured in the data itself, as opposed to a row in some database. This often leads to a less common architectures, where databases is mere index. Good litmus test is are you able to drop existing database and recreate exact replica from the data itself. 
+With the above design goal, the following specification describes an [IPLD][] DAG replication protocol designed for constrained environments, where peer-to-peer replication is impractical. It aims to provide the following functionality:

1. Allow transfer of large DAGs in shards _(of desired size)_ across multiple network requests and/or sessions.
1. Allow transient DAG representations, that is partially replicated DAGs or revisions of one with a traversable root.
1. Allow for uncoordinated multiplayer DAG creation/transfer with specific convergence properties.

### Motivation

All content in [IPFS][] is represented by interlinked [blocks][IPLD Block] which form hash-linked DAGs. _(Every file in [IPFS][] is an [IPLD][] DAG under the hood.)_

From 01994a900a576339d681634815fad77c106518a4 Mon Sep 17 00:00:00 2001
From: HackMD
Date: Tue, 29 Mar 2022 21:25:11 +0000
Subject: [PATCH 10/12] Add some graphics

---
 Pin.md | 106 ++++++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 102 insertions(+), 4 deletions(-)

diff --git a/Pin.md b/Pin.md
index 85fb5ab..c14b3f9 100644
--- a/Pin.md
+++ b/Pin.md
@@ -26,7 +26,7 @@ Here we propose a DAG replication protocol that overcomes above limitations by t
 1. Encoding sub-DAGs in desired sized packets - shards.
 3. Wrapping shards in causally ordered operations (which can be transported out of order).
 4. Define causally ordered _publish_ operations that can be used to bind DAG states to a globally unique identifier.

### Replication Protocol

Our replication protocol is defined in terms of atomic, immutable, content addressed "operations" which are wrapped in a container structure that adds causal ordering through hash-links. _(We define this container structure in the _Replica_ section below.)_

@@ -224,13 +224,111 @@
 Concurrent publish operations would lead to multilpe forks _(as with `Append`)_ which MUST be reconsiled by establishing total order among `Publish` operations as follows:

-1. Given replicas `P1` and `P2`, if all operations of `P1` are included in `P2` we say `P1 <= P2` (`P1` predates `P2`).
-1. Given replicas `P1` and `P2` where neither `P1` nor `P2` includes all changes of the other we say `P1 <= P2` if their CIDs in base32 encoding sort accordingly (`CIDofP1`, `CIDofP2`).
+1. Given replicas `Pn` and `Pm`, if all operations of `Pn` are included in `Pm` we say `Pn <= Pm` (`Pn` predates `Pm`).
+1. Given replicas `Pn` and `Pm` where neither `Pn` nor `Pm` includes all operations of the other, we establish total order by:
+    1. Finding the divergence point, the common replica `Po`.
+    2. Comparing the CID _(in base32 string encoding)_ of each `Px` from `Po...Pn` with each `Py` from `Po...Pm`: if `Px < Py` then `Px` orders first and we compare `Px+1` with `Py`; otherwise `Py` orders first and we compare `Py+1` with `Px`, and so on.

##### Illustrations

Below we have peer `A` publishing records `g1`, `g2` and then `g3`. Peer `B` publishes a conflicting record `k1` concurrently with `g3`.

```
 A        B
 .        .
g1.........1
 |        .
g2---+.....2
 |   |
g3   k1....3
```

According to our convergence algorithm the order of operations can be interleaved as follows _(because `CIDof(g3) < CIDof(k1)`)_:

```
A        B
 .        .
g1.........1
 |        .
g2---+.....2
 |   |
g3...|.....3
     |
     k1....4
```

That also implies that if `A` has become aware of `k1`, its next record `g4` will link to `k1` and not `g3`.

```
A        B
 .        .
g1........1
 |        .
g2---+....2
 |   |
g3...|....3
     |
+----k1...4
|
g4........5
```

If `B` had published the next record instead, even after becoming aware of `g3` it would still link to `k1` (as it sorts lower).

```
A        B
 .        .
g1........1
 |        .
g2---+....2
 |   |
g3...|....3
     |
     k1....4
     |
     k2....5
```

In scenarios where the operation chains diverge further, things are more complicated:

```
 A      B
 .      .
 g1-----+
 |      |
 g2     e1
 |      |
 g3     k2
 |      |
 g4     e3
```

The inferred order projects as follows:

```
 A      B
 .      .
 g1-----+......1
 |      |
 |      e1.....2 (g2 > e1)
 |      |
 g2............3 (g2 < k2)
 |      |
 g3.....|......4 (g3 < k2)
 |      |
 g4.....|......5 (g4 < k2)
        |
        k2.....6
        |
        e3.....7
```

It is worth noting that while `g4` and `e3` were concurrent and `g4 > e3`, we still end up with `e3` after `g4`. That is to stress that comparing just the last updates alone is not enough for establishing an order, because at `k2` the order would have been `g3 < k2` while at `e3` it would have been `g3 > e3`. By comparing all the concurrent operations we can establish a deterministic order.

[ed25519]:https://ed25519.cr.yp.to/

From 1e8ab07b15698116 Mon Sep 17 00:00:00 2001
From: Irakli Gozalishvili
Date: Wed, 4 May 2022 12:53:41 -0700
Subject: [PATCH 11/12] Update Pin.md

Co-authored-by: Brooklyn Zelenka
---
 Pin.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Pin.md b/Pin.md
index c14b3f9..428d874 100644
--- a/Pin.md
+++ b/Pin.md
@@ -21,7 +21,7 @@ All content in [IPFS][] is represented by interlinked [blocks][IPLD Block] which
 Many applications in the ecosystem have adopted [Content Addressable Archives (CAR)][CAR] as a transport format for (Sub)DAGs in settings where peer-to-peer replication is impractical due to network, device or other constraints. This approach proved effective in settings where the CAR size limit is not a concern, however there are still many constrained environments (e.g. serverless stacks) where transferring large DAGs in a single CAR is impractical or plain impossible.

-Here we propose a DAG replication protocol that overcomes above limitations by transporting large DAGs in multiple casually ordered network requests and/or sessions by:
+Here we propose a DAG replication protocol that overcomes above limitations by transporting large DAGs in multiple causally ordered network requests and/or sessions by:

 1. Encoding sub-DAGs in desired sized packets - shards.
 3. Wrapping shards in causally ordered operations (which can be transported out of order).
 4. Define causally ordered _publish_ operations that can be used to bind DAG states to a globally unique identifier.

From b431ea98306e7697eaa402c17fd26420e9af0f66 Mon Sep 17 00:00:00 2001
From: Irakli Gozalishvili
Date: Wed, 4 May 2022 12:56:26 -0700
Subject: [PATCH 12/12] Apply suggestions from code review

Co-authored-by: Alan Shaw
---
 Pin.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/Pin.md b/Pin.md
index 428d874..b335cef 100644
--- a/Pin.md
+++ b/Pin.md
@@ -196,7 +196,7 @@ Shards according to this definition CAN be content addressed by [CID][], which i
 Publishing protocol allows representing DAGs over time by allowing authorized peers to change state associated with a unique identifier.
Just like DAG state, we represent its state in terms of causally ordered operations - a Replica of `Publish` operations.

A `Publish` operation associates a DAG _(as defined by our protocol)_ with a specific "root" under a unique identifier, represented by an [ed25519][] public key. It is defined by the following [IPLD Schema][]:

```ipldsch
type Publish {
  -- Publish identifier
  id ID
  -- DAG root
  link &Any
  -- DAG representation
  origin &Replica
  -- Shard containing root block (Must be contained by origin)
  shard optional &Shard
  -- UCAN with publish capability to this id
  -- (Root issuer must be same as id)
  proof &UCAN
}

-- Binary representation of the ed25519 public key
type ID = Bytes
```

##### Convergence

Concurrent publish operations would lead to multiple forks _(as with `Append`)_ which MUST be reconciled by establishing a total order among `Publish` operations as follows:

1. Given replicas `Pn` and `Pm`, if all operations of `Pn` are included in `Pm` we say `Pn <= Pm` (`Pn` predates `Pm`).
1. Given replicas `Pn` and `Pm` where neither `Pn` nor `Pm` includes all operations of the other, we establish total order by: