Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sigchain Class API should provide paginated ordered claims by returning Array-POJO and indexed access #327

Closed
CMCDragonkai opened this issue Feb 8, 2022 · 10 comments · Fixed by #446
Assignees
Labels
development Standard development r&d:polykey:core activity 3 Peer to Peer Federated Hierarchy

Comments

@CMCDragonkai
Copy link
Member

Specification

The Sigchain API requires some rework as discovered here: #310 (comment)

  1. The ChainData and ChainDataEncoded types are record POJOs, which are not the right data type we should be returning to callers. Record POJOs do not preserve order, and when returning chain data, we would want to have the ability to iterate over the data, and it should be in the same order as the chain data (as ordered by the claim id).
  2. When fetching a collection of resources in JS, always return an Array. Record POJOs are not ordered. Alternatively to preserve both order and index access, a Map can be used. HOWEVER, Array and POJOs are pure data. The Map and Set are not. This is why by convention, we only use Map as a persistent rich construct, and when exchanging data between systems, we use arrays and POJOs.
  3. This means we need Sigchain.getChain and Sigchain.getChainEncoded. Both methods should be returning the type Array<Claim> and Array<ClaimEncoded> respectively. The encoded version is the base64ed JWT, but usually in our own application we would be working off the Claim structure. At the same time there should be obvious conversion functions from Claim to ClaimEncoded and back.
  4. These 2 functions should expose seek and limit parameters to enable cursor pagination. Note that seek needs to be type of the seek key. The seek key of Sigchain is the ClaimId. Which we know to be based on IdSortable which itself uses timestamps internally. Both parameters are optional. One should be able to synthetically construct a ClaimId to represent a point in time cursor as well. This is a secondary priority, do this after the main design works.
  5. For indexed access, we would not expect the user to acquire a collection, and then query their collection. Instead we would expect the user to directly call a getClaim which you can pass the ClaimId to acquire the Claim itself. If they want an encoded version, we can expose getClaimEncoded.
  6. The ChainData and ChainDataEncoded may no longer be necessary as types. Just use Array<Claim> and Array<ClaimEncoded>. However it is essential that one can iterate over these 2 arrays, and be able to use extractors/transformers/converters to be able to acquire the ClaimId. This means the ClaimId must always be acquirable given a Claim or ClaimEncoded, ideally efficiently.
  7. Remote users that call over GRPC, they should be able to pass pagination parameters in and acquire a collection of claims, they can then process it the way they want and do indexed access if so.
  8. We should endeavour to always be using ClaimId internally in our application and ClaimIdString only when using them as POJO keys. The ClaimIdEncoded is purely used for external reference. And the Claim should be using ClaimId, and ClaimEncoded can use ClaimIdEncoded since you have to have JSON anyway. Make sure to use the reviver and replacer pattern as we do inside the Status class.

Additional context

Tasks

  1. ...
  2. ...
  3. ...
@joshuakarp
Copy link
Contributor

This will also have implications for our Discovery process, once we provide support for revocations in the sigchain, such that we can iterate over claims in chronological order. See #320 (comment)

@CMCDragonkai
Copy link
Member Author

In order to provide a "generic" paginated system.

We have to change the Sigchain.getClaims method call.

It needs to take a options object like:

    {
      order = 'asc',
      seek
    }: {
      order?: 'asc' | 'desc';
      seek?: ClaimId;
    } = {},

The above means that by default we are going to get C1, C2, C3. But you can reverse it by changing the order to desc. Furthermore, the seek allows one to seek to a particular ClaimId which is lexicographically ordered. This is used as gte under asc or lte under desc. This means the seek is actually inclusive.

Now this does mean you need to have an ID to actually be able to seek. However one could also "synthetically" construct a ClaimId because it's an IdSortable, although that can be a bit weird to do so, there's no utility function to do this atm though.

There's no limit, because it is an async generator enabling you to just stop consuming when you want to stop. Finally the whole thing is transactional too.

The original method had the ability to filter to specific kinds of claims. This should be generalised to a general indexing filter. To do this, you would need the ability to filter the records according to index. However there will only be some items that are indexed, not all of them would be, and we would statically limit these.

Examples of things to index to:

  • sequence number - so you can look up be sequence number instead
  • claim type: node or identity
  • claim about a node or about an identity

The third thing would be useful if you want to look up the most recent claim specified for a given NodeId or a given third party provider identity.

When done like this, the function can get alot more complicated, because rather than iterating over all claims, they would instead iterate over a particular index.

This is where a "general" query engine would be good, and that's what SQL was good for. However with a lack of a query engine here, we have to instead build a fixed set of indices that can be used.

@CMCDragonkai
Copy link
Member Author

CMCDragonkai commented Oct 17, 2022

The return type is also AsyncGenerator<[ClaimId, TokenClaim]>.

We can follow this pattern across other domains too. Although I realise we haven't really standardised on "collection" iteration and seeking.

Perhaps if custom indexing is required, one might move these into other functions since ultimately indexing in our key value system is going to be static.

getClaims()
getClaimsByX() // where X is an index
getClaimsByNodeId
getClaimsByIdentityId

Then this should translate to changes for getCerts and getTasks and whatever else.

@CMCDragonkai
Copy link
Member Author

Everything here except 7. is done as part of #481, and put into PR #446.

@CMCDragonkai
Copy link
Member Author

CMCDragonkai commented Dec 5, 2022

In order to solve 7. we need to do things:

  1. Refactor src/agent/service/nodesChainDataGet.ts to instead be src/agent/service/sigchainClaimsGet.ts and this should also receive pagination parameters. This includes seek, limit and order. The order should be a protobuf enum: https://developers.google.com/protocol-buffers/docs/proto3#enum
  2. The Sigchain.getClaims with the apporpriate pagination parameters should be tested in tests/sigchain/Sigchain.test.ts. Right now this isn't being tested. We need to test both ascending and descending order and ensure that whatever ClaimId is being seeked is also included in the output.
  3. Remove the tests/sigchain/Sigchain.old.test.ts.

@CMCDragonkai
Copy link
Member Author

Apparently we don't really use the seek/limit parameters that much. I think this is because this was a recent addition to the js-db. But we do use lte a few more places like TaskManager.

Points 2, and 3 should be done in #466.

It seems that other places would be suitable as well to incorporate pagination parameters.

@CMCDragonkai
Copy link
Member Author

For 2. make sure that the tests cover both getClaims and getSignedClaims.

@CMCDragonkai
Copy link
Member Author

ATM only the Sigchain have top level methods taking the pagination parameters:

    {
      order = 'asc',
      seek,
      limit
    }: {
      order?: 'asc' | 'desc';
      seek?: ClaimId;
      limit?: number;
    } = {},

@CMCDragonkai
Copy link
Member Author

CMCDragonkai commented Dec 5, 2022

We will need a new epic to deploy pagination-based streaming to all the relevant domain collection structures, and bubbling that up to the service handlers and CLI commands. That epic can incorporate point 1.

@CMCDragonkai
Copy link
Member Author

We can hijack #237 for this general pagination deployment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
development Standard development r&d:polykey:core activity 3 Peer to Peer Federated Hierarchy
Development

Successfully merging a pull request may close this issue.

3 participants