UCAN and resources path discussion #2

Closed
146 changes: 146 additions & 0 deletions ucan-discussion.md
@@ -0,0 +1,146 @@
User "bucket" identifier selection for .storage products
========================================================

Background
----------

In a UCAN, both the iss and aud fields contain a “DID”, which is essentially the public key of a public/private key-pair. This means that each user who uses UCANs with nft.storage or web3.storage must have a public/private key pair that they use.

Separately from that, the plan for `.storage` products is to give each user their own “bucket” (similar in concept to a bucket in S3). Each user will then be able to delegate permissions to that bucket and its subfolders to other users/clients via UCANs.

This raises the question of what identifier should be used for each user’s “bucket”. That is the topic of this document.
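
As a rough illustration of where the bucket identifier ends up, here is a minimal sketch of a simplified UCAN payload. The field names `iss`, `aud`, `exp` and `att` follow the UCAN spec, and the `storage://` resource form is the one used later in this document; the `grantBucket` helper and all values are hypothetical, and a real UCAN carries more fields (proofs, signature) than shown here.

```typescript
// Simplified UCAN payload. Only the fields relevant to this discussion are
// modelled; this is an illustrative assumption, not the full token format.
interface Capability {
  with: string; // resource, e.g. a bucket path
  can: string; // ability, "*" meaning every action
}

interface UcanPayload {
  iss: string; // DID of the issuer
  aud: string; // DID of the audience
  exp: number; // expiry, Unix timestamp in seconds
  att: Capability[];
}

// Hypothetical helper: the service grants a user full access to one bucket.
function grantBucket(
  serviceDid: string,
  userDid: string,
  bucketId: string,
  ttlSeconds: number,
  nowSeconds: number
): UcanPayload {
  return {
    iss: serviceDid,
    aud: userDid,
    exp: nowSeconds + ttlSeconds,
    att: [{ with: `storage://${bucketId}`, can: "*" }],
  };
}
```

Whatever identifier is chosen below becomes the `bucketId` part of the `with` field.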


Possible Options
----------------

### Option 1 - User ID from database
Collaborator

I think this is a bad idea. I think of our database as an index, not the source of truth. Furthermore, we would introduce a dependence on the state of the database.


One option is to use the primary key of the user from the database.

This works well because it’s:
1. Short
2. Unique (at least for each service)
3. Permanent

The one disadvantage is:
* It results in having service specific data in the UCAN, which means it’s not unique across both web3.storage and nft.storage and it means it’s not reusable across different services


### Option 2 - User’s public key
Collaborator

I think this is the right and deterministic approach that does not depend on some state (like our db).


Another option is to use the user’s public key (or the first public key which they use with the service).

This works well because it’s:
1. Unique globally (across both services)

Although it’s only unique across both services if the user uses the same key-pair for both services.

A minor downside of tying the “bucket” ID to the user’s key-pair is that we’re creating a possibly confusing situation where a cryptographic identifier is being used for something which is really just a file path. Conceptually this might just be slightly odd to the user.

There are two possible ways we could implement this:

#### Option 2.1 - not stored in DB

Use the user’s public key as the bucket name, but don’t store it in the DB.
Collaborator

We should store n:m relationships between actor DIDs and buckets in a database table. That is how we can allow a specific actor to claim access after losing a UCAN: we can regenerate it as long as the claim is valid according to this table.


This is pretty much unviable because:
* If we don’t store it in the DB then we can’t make it unique to the user. So each time a user created a UCAN they could do so with a different public key, giving them (yet another) new bucket.
Collaborator

I think that is the intended behavior. There should be no 1:1 correspondence between a user and the places they can write to. It's an n:m correspondence: a user may have access to multiple places, and multiple users might have access to shared places.

* We have no way of knowing which bucket belongs to which user; we only know which bucket belongs to which key. This creates a situation with several distinct downsides:
- a. If the user loses their private key, they lose access to their bucket; we can’t implement any form of recovery.
Collaborator

I do not think this is accurate. Remember, the service gives the user access to a specific namespace. The service can revoke that access and give out another token for the same namespace. Of course, giving random actors access to specific namespaces would be undesirable. However, if a user can credibly prove that they should have access to specific namespace(s), there is no good reason to deny issuing them another token. An analogy: if I lost a movie theater ticket, I might show up with the purchase receipt or an ID and ask the theater to reprint the same ticket so I could be admitted.

In our case we can implement recovery in our web UI by allowing the user to add another DID to the specific namespace (bucket). Once that occurs, the client could request a UCAN from us for the specific DID and namespace(s), which our service could grant. There could be alternative mechanisms to do this, but the main point is that in order to recover, the service needs to verify that the user is who they claim to be; with that done, issuing another UCAN is fairly trivial.

You can also think about it as the equivalent of a user binding another keypair to an account, except retroactively.

- b. If the user’s private key is leaked, then they have no way of revoking it, so access to their bucket is permanently breached.
Collaborator

Let me preface this by saying that yes, revocation is not easy with UCANs, but it is not impossible. The best line of defense here is to issue UCANs with an expiry date that strikes a balance between the need to rotate and the amount of damage an attacker can do in that time frame. Ultimately this balance is not ours to strike, as users themselves will be in a better position to do so.

I think the assumption here is that if an attacker obtains a private key, it permanently gains access to the namespace (bucket) because it can re-issue a new UCAN for it. This is inaccurate.

If an attacker gets hold of a private key, it can request a new UCAN authorization from the service for the corresponding namespace (bucket). However, the service is still in a position to deny such a request, which it will once it becomes aware that the key was compromised. Furthermore, the service can revoke all the UCANs it issued for the given DID when it becomes aware of the key being compromised, and subsequently deny service to requests made with those UCANs and with all the UCANs that were delegated from them.

Again, this is not to say this is a piece of cake, but it is not too different from a compromised password. The service somehow needs to be made aware that this has occurred, somehow verify the user (usually via an emailed link), and let them create a new password. Our workflow could be the same, except that instead of typing a new password, the user would paste a new public key / DID.

- c. The user could potentially create “backup” keys by creating UCANs with unlimited expiry times which delegate full privileges to another key. This helps mitigate problem (a), but makes problem (b) more likely.
Collaborator

This is a bad idea. As alluded to earlier, we should not even issue UCANs without an expiry date, as that is the primary line of defense.

That said, it is accurate that a user can (and probably should) delegate full privileges to another key. In fact, I would argue that a key should never leave its device, so for every new device the user should delegate a token from a device that already has one. Alternatively, the new device could ask for a token from the service directly, provided it can prove it is under the control of the same user. This can happen through a login flow from that device, or possibly through an emailed link or similar.

I'm not sure I agree with the statement that it makes (b) more likely. My understanding of the consensus among security experts is that key reuse creates a wider attack surface and is less secure than the other way round.

Either way, even if one of the keys gets compromised, revocation should be possible, and we should probably design it sooner rather than later.

- d. With no possibility of revocation or recovery, it conceptually breaks the idea that, like passwords, key-pairs can be rotated.
Collaborator

This builds upon the misconception that a user is tied to a specific DID which maps to a specific bucket. I think it is a bad idea to conceptualize the user <-> DID relationship as 1:1. It is 1:n. It's much better to think of the relation as user device <-> DID.

And while the first DID does indeed end up being the bucket name, that has no greater implication than being a convenient way to uniquely identify the bucket. We could just as well generate another, unrelated keypair in the backend, throw away the private key, and give the new user access to a bucket corresponding to the key we threw away, as opposed to the DID the user provided.

That is to say, there is no real significance in the fact that the bucket ID maps to the user's first DID; it's just convenient, that is all. If this is too confusing, we could just use the CID of the user DID instead and not tell anyone about it.

Furthermore, with #1 we're basically creating a DID for each document, which is its own bucket, so users will have many buckets; they will just interlink them as it makes sense.


#### Option 2.2 - stored in DB

In this implementation (which is what’s currently implemented), when a user uploads the root DID the first time, we store the relationship between the user adding the root UCAN and the DID.
Collaborator

Suggested change
In this implementation (which is what’s currently implemented), when a user uploads the root DID the first time, we store the relationship between the user adding the root UCAN and the the DID.
In this implementation (which is what’s currently implemented), when a user uploads the first DID, we store the relationship between the user adding the root UCAN and the the DID.

I think implying a parent-child relationship is the wrong way to think about this.

Collaborator

I am also not sure why we want to store did -> UCAN. I think we may want to store user_id -> bucket_id and did -> bucket_id relationships. That way a user can add any DID to any bucket they have access to, and we can generate an authorization UCAN for a user covering all the buckets they have access to.


This solves all the problems of option 2.1, or at least leaves open the possibility of solving them by adding revocation/recovery functionality. (But still shares the overall downside of option 2 mentioned at the top.)
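
The `user_id -> bucket_id` and `did -> bucket_id` relationships raised in the review of this option can be pictured with a small in-memory sketch. The `AccessIndex` class and its method names are hypothetical stand-ins for database tables, not the current schema.

```typescript
type UserId = string;
type Did = string;
type BucketId = string;

// In-memory stand-in for two n:m tables: user <-> bucket and did <-> bucket.
class AccessIndex {
  private userBuckets = new Map<UserId, Set<BucketId>>();
  private didBuckets = new Map<Did, Set<BucketId>>();

  // Record that a user has access to a bucket.
  grantUser(user: UserId, bucket: BucketId): void {
    if (!this.userBuckets.has(user)) this.userBuckets.set(user, new Set());
    this.userBuckets.get(user)!.add(bucket);
  }

  // A user may bind a DID only to a bucket they already have access to.
  addDid(user: UserId, did: Did, bucket: BucketId): boolean {
    const allowed = this.userBuckets.get(user)?.has(bucket) ?? false;
    if (!allowed) return false;
    if (!this.didBuckets.has(did)) this.didBuckets.set(did, new Set());
    this.didBuckets.get(did)!.add(bucket);
    return true;
  }

  // Buckets for which a UCAN could be (re)issued to this DID.
  bucketsForDid(did: Did): BucketId[] {
    return [...(this.didBuckets.get(did) ?? [])].sort();
  }
}
```

With this shape, recovery after a lost key amounts to the verified user calling `addDid` with a new DID, after which the service can issue a fresh UCAN for every bucket returned by `bucketsForDid`.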


### Option 3 - A UUID

Another option is to generate a UUID for each user to use as their bucket name.

The advantages of this are that it’s:
1. Short
2. Globally unique across services
3. Removes the link between key-pairs and the bucket name, allowing key-pairs to be rotated.

This UUID could be synced between web3.storage and nft.storage to allow each user to have a single bucket identifier which is global. But unlike with option 2, the user doesn’t sync this value for us.
Collaborator

I'm not convinced a UUID is better than just using the DID, because it can't be derived. That said, it is aligned with my overall sentiment that the bucket ID and the user's first DID just happen to be the same; there is nothing more to it.

I would still prefer something derived from the user's first DID, e.g. its CID or a multihash, which can look different enough yet is derived as opposed to made up.

There is also some benefit, which I can't fully put my finger on, in mapping buckets to DIDs, as they map to IPNS etc. But then again, all of that is possible if the ID is derived somehow, as opposed to being just made up.

Contributor

@gobengo gobengo Mar 16, 2022

I was going to encourage preferring something like a did:key-encoded (ed25519?) public key as the identifier. I think the only downside compared to UUIDs might be compute cost?
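
To make the "derived rather than made up" idea from this thread concrete, here is a minimal sketch that derives a bucket ID from a DID string. It assumes a plain SHA-256 hex digest as a stand-in; it does not produce a real CID or multihash, and `bucketIdFromDid` is a hypothetical name.

```typescript
import { createHash } from "node:crypto";

// Derive a stable bucket ID from the user's first DID.
// Assumption: SHA-256 hex is used only for illustration; a real
// implementation might encode the digest as a multihash or CID instead.
function bucketIdFromDid(did: string): string {
  return createHash("sha256").update(did, "utf8").digest("hex");
}
```

The same DID always yields the same ID on both services without any syncing, while the result no longer looks like a key.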



Discussion
----------

It seems that the one advantage of using the DID as the bucket name is essentially this: it gets the user to do the work of setting a common value between nft.storage and web3.storage for us. As far as I (Adam) can see (but I might be wrong!) there is no other advantage to this over the UUID solution, it just gives us that syncing for free. The trade-off is the blurring of parameters and potential confusion that it may cause.
Collaborator

Does the user even get to see the bucket ID? If not, the surface for confusion is fairly limited and basically scoped to the engineering team, in which case it can probably be addressed by documenting the facts.

If the user does get to see the bucket ID, it might be worth pondering what such confusion could lead to, and whether it would make sense to disguise the DID (e.g. as a CID) to remove the potential confusion. That way the benefits of the DID are kept and the trade-offs are averted.


Given that we already have the https://api.nft.storage/user/did endpoint to allow the user to specify their DID, would it make sense to allow them to specify their bucket name in a similar way? This would allow a conceptual separation between buckets and authentication keys while still allowing them to have the same bucket name across all storage services.
Collaborator

I think it is OK to allow specifying a bucket name as long as the user can prove control of that namespace in a non-centralized way. Which is another way of saying: sure, they can tell us another DID and sign a challenge with the private key to prove they own it.

It might be interesting to explore claiming DNS names, but that should probably be out of scope for now, and there is also the question of what happens when a name gets a new owner.



### Revocation / Recovery

If a user loses their private key, then we want to provide a way for them to still control their bucket. This could be done in a couple of ways:
1. Allow them to “rename” (i.e. move) their bucket.
Collaborator

This sounds like a bad idea to me. Bucket name should not matter.

- If so, we need to handle the mapping between old and new from a path perspective.
2. Allow them to change their authorised keys.
Collaborator

We should absolutely allow key rotation, and we should track which DIDs a user has and which DIDs have access to which buckets.

- Control a list of keys in their account which are allowed to sign UCANs?
- Revoke UCANs which have already been issued? Would this be done automatically if the keys which signed the UCANs have been revoked?

Revocation/recovery probably needs some more thought.
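
As a starting point for that thinking, the expiry-plus-revocation approach from the review comments can be sketched as below. The `IssuedUcan` record, the `proofId` link, and the `isUsable` function are hypothetical; the sketch only assumes that the service keeps a record of what it issued and that a delegated UCAN points at the UCAN it was derived from.

```typescript
// Hypothetical service-side record of a UCAN it knows about.
interface IssuedUcan {
  id: string; // e.g. a hash of the token
  aud: string; // DID the token was issued to
  exp: number; // expiry, Unix timestamp in seconds
  proofId?: string; // UCAN this one was delegated from, if any
}

// A UCAN is usable only if it and every UCAN in its delegation chain are
// known, unexpired, and not revoked.
function isUsable(
  id: string,
  store: Map<string, IssuedUcan>,
  revoked: Set<string>,
  nowSeconds: number
): boolean {
  const seen = new Set<string>();
  let currentId: string | undefined = id;
  while (currentId !== undefined) {
    if (seen.has(currentId)) return false; // guard against cyclic records
    seen.add(currentId);
    const ucan = store.get(currentId);
    if (ucan === undefined) return false; // unknown link in the chain
    if (revoked.has(currentId) || ucan.exp <= nowSeconds) return false;
    currentId = ucan.proofId;
  }
  return true;
}

// Revoking everything issued to a compromised DID.
function revokeDid(did: string, store: Map<string, IssuedUcan>, revoked: Set<string>): void {
  for (const ucan of store.values()) {
    if (ucan.aud === did) revoked.add(ucan.id);
  }
}
```

Because validity is checked along the whole chain, revoking the UCANs issued to a compromised DID also invalidates everything delegated from them.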


Scenarios to consider
---------------------

### Recovery scenario

This scenario applies if we are using the user’s public key/DID as their bucket name and we’re implementing an ability for a user to recover/reallocate their bucket if they lose their private key.

1. User A uploads their public key; we don’t validate that they own it when registering (POST user/did).
2. User B uploads User A’s public key as their own; again, we don’t validate that we own it when registering.
Collaborator

Who is "we" here in "that we own it"?

3. User B then tells us that they’ve lost their private key, so they ask for their bucket to be mapped to a new public key and they provide us with a new public key.
4. We transfer ownership of the bucket belonging to User B’s public key to the new key that they provided. We’ve now accidentally transferred control of User A’s bucket to User B.
Collaborator

It seems that uploading a key here implies that the user is given access to the corresponding bucket, which it should not. A user should be able to grant access to a bucket to a specific key holder; that is, a user could add a key to a specific namespace they control. If they add a key that they do not hold the private key of, they have shared access to their own stuff with some other key holder, as opposed to obtaining access to some bucket they had no prior access to.


Possible solutions:
1. Make the `did` column on the `user` table unique (which it already is).
- This prevents step 2 in the scenario. Does this also solve the problem across the different storage services as well? **What if I take the public key that someone else has used on nft.storage and register it as mine on web3.storage? The key is public, so this is perfectly possible.**
2. When we first store a user’s public key (DID) against their account, we validate that they actually own it.
- This requires a bit more work but might be a more robust solution.
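
Solution 2 could take the form of a challenge-response check at registration time. The sketch below uses Ed25519 keys from Node's built-in `crypto` module; the function names and the 32-byte random challenge are assumptions for illustration, and decoding a `did:key` string into a public key is left out.

```typescript
import { generateKeyPairSync, randomBytes, sign, verify, KeyObject } from "node:crypto";

// Service side: create a one-time random challenge for the registering user.
function createChallenge(): Buffer {
  return randomBytes(32);
}

// Client side: sign the challenge with the private key behind the DID.
function proveOwnership(challenge: Buffer, privateKey: KeyObject): Buffer {
  // For Ed25519 keys the algorithm argument must be null.
  return sign(null, challenge, privateKey);
}

// Service side: accept the DID only if the signature verifies against the
// public key that the DID encodes.
function verifyOwnership(challenge: Buffer, signature: Buffer, publicKey: KeyObject): boolean {
  return verify(null, challenge, publicKey, signature);
}

// Hypothetical helper standing in for "the user's keypair".
function newUserKeys(): { publicKey: KeyObject; privateKey: KeyObject } {
  return generateKeyPairSync("ed25519");
}
```

In the recovery scenario above, User B could still submit User A's public key, but could not produce a valid signature over the challenge, so step 2 would be rejected on either service.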


### Scenario - Multiple buckets
#### Problem (TL;DR)
With the implementation described in this document we're tying `UserA` to a `didX`, and currently we allow for 1 root bucket.
We're strongly coupling `UserA` to `didX`(bucket).
Collaborator

I think this is a primary issue



Conceptually, `userA` (through a DID, i.e. a keypair) is at the moment accountable for "ticket printing" via UCANs, which I suspect might become a problem when we scale to a more complex architecture: multiple buckets, payments, etc.


#### Scenario description

In this scenario we want to have multiple projects, i.e. `myApp1` and `myApp2`. With the current architecture we could achieve this by having `UserA` create a `didX/myApp1` and build a UCAN like `{with: 'storage://didX/myApp1', can: "*" }`. While this works, it still keeps a strong relationship between those buckets and the original user.
Collaborator

The app should just get its own bucket.


If we think about it, assume `userA` has created a UCAN that grants `can: "*"` to `userB` for their entire bucket. Practically, `userB` has the same level of auth as the original user, but from our DB perspective we're treating them "differently". There's no relationship between that `userB` and that bucket. At the moment the DID is a column on the `User` table, so we are not linking to it.
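
To illustrate why a `can: "*"` grant makes `userB` equivalent to the bucket owner for auth purposes, here is a minimal sketch of how a capability such as `{with: 'storage://didX/myApp1', can: "*"}` might be matched against a request. The `covers` function, the prefix-matching rule, and the `"upload"` action name are assumptions, not the service's actual semantics.

```typescript
interface Capability {
  with: string; // resource path, e.g. "storage://didX/myApp1"
  can: string; // ability, "*" meaning every action
}

// Assumed rule: a capability covers its own path and everything nested
// under it, and "*" covers every action.
function covers(granted: Capability, resource: string, action: string): boolean {
  const base = granted.with.endsWith("/") ? granted.with.slice(0, -1) : granted.with;
  const resourceOk = resource === base || resource.startsWith(base + "/");
  const actionOk = granted.can === "*" || granted.can === action;
  return resourceOk && actionOk;
}
```

Under this rule, nothing in the check distinguishes the original user from a delegate holding the same capability; only the database does.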


I wonder if we should be thinking about Buckets as their own entities, that are accountable for signing UCANs (printing the tickets), and that ability can be delegated to multiple `did`s... But there's no hard 1:1 link between a single User and a single bucket. This would then leave open the possibility for things such as:
Collaborator

Exactly this!

* One user having more than one bucket.
* A user being able to delete a bucket and start again through the web UI.
* Multiple users sharing _equal_ access to a bucket.
* Having "organisations" where multiple users share _equal_ top-level access to a bucket.

While these things (or similar setups) are theoretically possible by getting users to manage things through UCANs, we should probably ask whether that is the most convenient thing for the users.


I still struggle to envision how much duplication between UCAN and service data structures is required to have this model working in practice. I suspect that the only way to understand that is to start building an MVP.


#### Questions
1. Who creates the keypair for the bucket in the first place? The user generating the bucket in the first place? The service?
Collaborator

The user does.

2. I feel a relationship between the bucket and its creator should exist, but I reckon it should be a foreign key to the user table.
Collaborator

I think this is incorrect. The user generates a bucket with a new keypair; if they want to give the user full access to it, they just grant it, because they hold the keys to do so.

3. While UCAN would work for auth, I suspect the service (i.e. to handle the UI) should still store information about delegation etc.? Or am I missing something?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the right approach is the following:

  1. Bucket is created from keypair.
  2. Bucket creator right away grants access to the user that created it (in general case it's the same keypair).

If the bucket keypair is lost or compromised, the user can recover by using the access that was granted in (2). If the user keypair is compromised, the user can recover by proving to the service that they are who they claim to be and regaining full access, just with different keys.

I also think we should track path -> [did, expiryTime, startTime] relations as opposed to bucket <-> user relations, because access can be shared across buckets with specific DIDs, and it does not matter who created the bucket in the first place. (I think viewing this as one giant bucket under the service DID is more accurate; given that the service controls the keypair, it can share access to paths under it with arbitrary DIDs as it pleases.)

From this viewpoint, a user simply changes the underlying keypair, just as a user can change a password. And the service can reissue all the UCANs that were granted to the former DID to a new one, simply by going over them and creating new ones with a new aud.

I hope this makes more sense.

4. Ultimately, do we want to build the thing which is the simplest to build (current architecture), or is there an architecture that would create a better experience for the users, and should we build that?
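
As input to question 3, the `path -> [did, expiryTime, startTime]` relation proposed in the review, together with reissuing grants to a new DID after key rotation, could be sketched as follows. The `Grant` shape and both function names are hypothetical, and timestamps are assumed to be Unix seconds.

```typescript
// One row of the proposed path -> [did, expiryTime, startTime] relation.
interface Grant {
  path: string; // e.g. "storage://didX/myApp1"
  did: string; // DID that was granted access
  startTime: number; // Unix seconds
  expiryTime: number; // Unix seconds
}

// Grants that are currently in force for a DID.
function activeGrantsFor(grants: Grant[], did: string, nowSeconds: number): Grant[] {
  return grants.filter(
    (g) => g.did === did && g.startTime <= nowSeconds && nowSeconds < g.expiryTime
  );
}

// Key rotation: copy every grant held by the old DID over to the new DID.
// The service would then sign a fresh UCAN per returned row with the new aud.
function reissueGrants(grants: Grant[], oldDid: string, newDid: string): Grant[] {
  return grants.map((g) => (g.did === oldDid ? { ...g, did: newDid } : g));
}
```

This keeps the service's records keyed by path and DID rather than by bucket creator, which is the amount of duplication between UCAN and service state that this model would need at a minimum.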