feat: adds support for detached commits #3028
base: main
My original approach was much more complicated and used UUIDs as the version. However, if we keep the version as a u64 but borrow the most significant bit to flag detached vs. normal versions, we end up with far fewer changes and less overall complexity.
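A minimal sketch of the flag-bit encoding, assuming the flag is simply the most significant bit of a plain `u64` version and that normal versions never reach 2^63. The names `DETACHED_VERSION_MASK`, `make_detached`, `is_detached`, and `base_version` are hypothetical and not taken from the PR.

```rust
/// Hypothetical mask: the most significant bit of a u64 marks a detached version.
const DETACHED_VERSION_MASK: u64 = 1u64 << 63;

/// Turn a 63-bit sequence number into a detached version by setting the flag bit.
fn make_detached(seq: u64) -> u64 {
    assert!(seq & DETACHED_VERSION_MASK == 0, "sequence number already uses the flag bit");
    seq | DETACHED_VERSION_MASK
}

/// True if the flag bit is set, i.e. the version is detached.
fn is_detached(version: u64) -> bool {
    version & DETACHED_VERSION_MASK != 0
}

/// Strip the flag bit, leaving the 63-bit payload.
fn base_version(version: u64) -> u64 {
    version & !DETACHED_VERSION_MASK
}

fn main() {
    let normal: u64 = 42;
    let detached = make_detached(42);
    assert!(!is_detached(normal));
    assert!(is_detached(detached));
    assert_eq!(base_version(detached), 42);
    // Normal versions stay below 2^63, so their meaning is unchanged.
    println!("normal={normal} detached={detached:#x}");
}
```

The appeal of this design is that every existing code path that stores or passes a `u64` version keeps working; only the places that must distinguish the two kinds need to check the bit.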
Codecov Report
Attention: Patch coverage is

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main    #3028      +/-   ##
==========================================
+ Coverage   78.19%   78.23%   +0.03%
==========================================
  Files         239      240       +1
  Lines       76782    77227     +445
  Branches    76782    77227     +445
==========================================
+ Hits        60043    60419     +376
- Misses      13669    13705      +36
- Partials     3070     3103      +33
```

Flags with carried forward coverage won't be shown.
This seems reasonable. Are there any special considerations we need to make for cleanup?
```rust
// Detached versions should never show up first in a list operation which
// means it needs to come lexicographically after all attached manifest
// files and so we add the prefix `d`. There is no need to invert the
// version number since detached versions are not part of the version
```
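The ordering argument in that comment can be checked with a small sketch. It assumes attached manifests are named with the inverted, zero-padded version (`u64::MAX - version`, 20 digits) so that a plain lexicographic listing returns the newest attached version first, and that detached manifests get a `d` prefix with the un-inverted payload. The function name `manifest_file_name` and the exact format strings are assumptions for illustration, not the PR's code.

```rust
/// Hypothetical mask matching the "borrow the most significant bit" idea.
const DETACHED_VERSION_MASK: u64 = 1u64 << 63;

/// Hypothetical manifest file name for a version.
fn manifest_file_name(version: u64) -> String {
    if version & DETACHED_VERSION_MASK != 0 {
        // Detached: 'd' sorts after every ASCII digit, so these names come last.
        // No inversion: detached versions are not part of the linear history.
        format!("d{}.manifest", version & !DETACHED_VERSION_MASK)
    } else {
        // Attached: invert and zero-pad to 20 digits so newer versions sort first.
        format!("{:020}.manifest", u64::MAX - version)
    }
}

fn main() {
    let mut names: Vec<String> = [1u64, 2, 3, DETACHED_VERSION_MASK | 5]
        .iter()
        .map(|&v| manifest_file_name(v))
        .collect();
    // Stands in for the object store's lexicographic listing order.
    names.sort();
    // The first entry is the newest attached version (3), never a detached one.
    assert_eq!(names[0], manifest_file_name(3));
    assert!(names.last().unwrap().starts_with('d'));
    for n in &names {
        println!("{n}");
    }
}
```

Because `u64::MAX` has exactly 20 decimal digits, every attached name has the same width and string order matches numeric order of the inverted value.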
👍
I'm planning on tackling cleanup later. My thinking is that cleanup will be something along the lines of:
- For the "detached versions are just temporary versions" case, then
- For the "this is a secondary database and everything is detached" case, cleanup will be triggered by a cleanup of the primary database. After the primary database has been cleaned up, we will scan all remaining versions (in the primary database) and collect the secondary versions that are still referenced. These will be passed in as
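A minimal sketch of the secondary-database cleanup step described above. It assumes each surviving primary version can report which secondary (detached) versions it references; the `PrimaryVersion` struct, both helper functions, and the version numbers are hypothetical, and the flag bit is omitted for readability. How the kept set is actually passed to cleanup is not specified here.

```rust
use std::collections::HashSet;

/// Hypothetical view of a surviving primary version and the
/// secondary (detached) versions it references.
struct PrimaryVersion {
    version: u64,
    secondary_refs: Vec<u64>,
}

/// Collect every secondary version still referenced by the primary's remaining versions.
fn referenced_secondary_versions(remaining: &[PrimaryVersion]) -> HashSet<u64> {
    remaining
        .iter()
        .flat_map(|pv| pv.secondary_refs.iter().copied())
        .collect()
}

/// Secondary versions that no surviving primary version points at (cleanup candidates).
fn unreferenced(secondary: &[u64], keep: &HashSet<u64>) -> Vec<u64> {
    secondary.iter().copied().filter(|v| !keep.contains(v)).collect()
}

fn main() {
    // Pretend primary cleanup already ran; these primary versions remain.
    let remaining = vec![
        PrimaryVersion { version: 7, secondary_refs: vec![101, 102] },
        PrimaryVersion { version: 8, secondary_refs: vec![102, 104] },
    ];
    let keep = referenced_secondary_versions(&remaining);
    let secondary: [u64; 5] = [100, 101, 102, 103, 104];
    let to_delete = unreferenced(&secondary, &keep);
    assert_eq!(to_delete, vec![100, 103]);
    for pv in &remaining {
        println!("primary v{} keeps {:?}", pv.version, pv.secondary_refs);
    }
    println!("delete secondary versions {to_delete:?}");
}
```

Driving secondary cleanup from the primary keeps the secondary database from deleting a detached version that some retained primary version still depends on.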
A detached commit is a commit that is not part of the regular dataset lineage. It will never show up as the latest commit and is completely separate from the linear history of the dataset.
This can be useful for:
Closes #2889