File History Time Machine Functionality #57

Open
glowingwire opened this issue Apr 28, 2024 · 6 comments

Comments


glowingwire commented Apr 28, 2024

My end use is:
right-click on a file in Dolphin, click Versions, and see a list of saved versions.

People I have talked to have described this as a major, involved feature to implement.

I imagine it as an optional feature for a subvolume, probably used on /home by default.

I imagine the system would keep all the existing data on every file write and create a new entry in a secondary table, or in the main table plus a mask that hides the old versions from normal viewing of the files. So there would be many inodes, some of them hidden, and another table would list those inodes and associate them with the current file. It would be nice if fragments of files were compared to see whether they actually need to be rewritten, so that only the diff is written; this may be existing behavior, but then saving repeatedly wouldn't use a lot of disk, and solid-state drives wouldn't be worn out.

Then there would be some new interface that would allow user-facing programs to see the older versions and pull up a file. I imagine all the metadata would be available to the program.

This is an important feature for mainstream use of Linux.


kakra commented Apr 29, 2024

If you create scheduled snapshots of said subvolume, you already have that functionality. But Dolphin lacks an interface to easily browse those snapshots based on the folder or file you selected. I think SuSE already implements this with snapper and a distribution-specific Dolphin plugin, which does not seem to be available outside of SuSE.

Snapper can also create diffs of files and folders to see what changed, and revert changes.
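
For reference, the snapper workflow described above might look roughly like this; this is only a sketch, and the config name, snapshot numbers, and file path are assumptions:

```bash
# One-time setup: create a snapper config for /home (requires /home to be a
# btrfs subvolume) and enable the timeline/cleanup timers for scheduled snapshots.
snapper -c home create-config /home
systemctl enable --now snapper-timeline.timer snapper-cleanup.timer

# Later: list snapshots, compare two of them, and revert a single file.
snapper -c home list
snapper -c home diff 42..43 /home/user/notes.txt        # hypothetical numbers/path
snapper -c home undochange 42..43 /home/user/notes.txt  # restore the older version
```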

That said, since btrfs already implements the base functions, your request should go to KDE. It's not btrfs' job to implement a Dolphin plugin for browsing snapshots.


glowingwire commented Apr 30, 2024 via email


glowingwire commented Apr 30, 2024 via email


Zygo commented Apr 30, 2024

It's relatively straightforward to set up a script that waits for file modifications with inotify, triggered by CLOSE_WRITE, MOVED_TO, and DELETE events, and commits the files to git. Then you can browse file versions with any git repo browser or plugin. This scales well for individual project folders, and it stores modifications with delta compression, which can be significantly more efficient than snapshots or reflinks, especially for file formats where the entire file is rewritten every time. It doesn't store file permission modifications, but you can get extensions for git that handle those. I have had one of these running on almost every project I've worked on over the last two decades or so, even before I started using btrfs. A script like this seems to check all of the boxes plus a few more: separate permissions, automatic updates, full file revision history, no admin required, tools to generate diffs and various reports, delta compression.
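
As a rough illustration of that approach (not the exact script described above; the watched directory and commit message format are placeholders), a minimal inotifywait-to-git loop could look like this:

```bash
#!/bin/bash
# Minimal sketch: watch a project directory and commit every completed write,
# move-in, or deletion to a local git repository.
WATCH_DIR="$HOME/Documents/project"   # hypothetical project folder

cd "$WATCH_DIR" || exit 1
[ -d .git ] || git init -q            # one-time repository setup

inotifywait -m -r -q \
    -e close_write -e moved_to -e delete \
    --format '%w%f' "$WATCH_DIR" |
while read -r changed; do
    git add -A
    # Only commit when the index actually changed.
    git diff --cached --quiet || git commit -q -m "auto: $changed $(date -Is)"
done
```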

This can be extended a little with some help from btrfs: instead of committing directly from the working directory, cp --reflink the files from the user's working tree to the git tree and commit them there. That gives atomicity for file updates. For really big projects, or projects involving multiple database files that all have to be modified together, the script could snapshot the entire project (or even the entire /home), commit the modified files, and delete the snapshot; however, this can be very heavy for small projects. There's a size below which the cp --reflink is better and a size above which btrfs sub snap is better.
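
A sketch of that reflink variant, assuming both trees live on the same btrfs filesystem and with placeholder paths: copy each closed file into a separate git working tree with cp --reflink and commit it there, leaving the user's working tree untouched.

```bash
#!/bin/bash
# Sketch: mirror completed writes into a separate git tree via cheap reflink
# copies, then commit them there.
SRC="$HOME/Documents/project"      # hypothetical working tree
REPO="$HOME/.history/project"      # hypothetical git tree on the same btrfs fs

mkdir -p "$REPO"
[ -d "$REPO/.git" ] || git -C "$REPO" init -q

inotifywait -m -r -q -e close_write --format '%w%f' "$SRC" |
while read -r f; do
    rel="${f#$SRC/}"
    mkdir -p "$REPO/$(dirname "$rel")"
    cp --reflink=always -- "$f" "$REPO/$rel"   # shares extents; fails if reflink is unsupported
    git -C "$REPO" add -A
    git -C "$REPO" diff --cached --quiet || git -C "$REPO" commit -q -m "auto: $rel"
done
```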

The problem with doing this at the filesystem level is that it's difficult for a filesystem to tell when an update begins and ends, especially for random-access file formats. A user's file revision history might contain 20 partially updated files for every complete file written to the filesystem, simply because the application writes the file in 20 segments. With the inotifywait approach, the triggering event is CLOSE_WRITE which indicates that the file has been closed, and in many cases can be considered complete.

Another problem is that some applications update their files constantly as users work with them. I've found with my git inotify script that I have to provide guard times: intervals where the script waits for the file to stop being modified before saving a copy of it. As a user, it's simply not productive to have every version of every file; 5 or 10 versions per minute are plenty for editing at human speeds. If your use case is tracking a program as it makes modifications to a file one at a time, it might be better to make a FUSE filesystem that does that logging explicitly.
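
One way to implement such a guard time, as a sketch (the 10-second window is an arbitrary assumption): keep re-arming a short inotifywait timeout until the file has been quiet for the whole window before committing it.

```bash
# Wait until a file has seen no writes for GUARD seconds; each new event
# restarts the wait (inotifywait exits non-zero when the timeout expires).
GUARD=10   # seconds; tune for your editors and workflow

wait_until_quiet() {
    local f="$1"
    while inotifywait -q -t "$GUARD" -e close_write,modify "$f" >/dev/null; do
        :   # another write happened within the window; keep waiting
    done
}
```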

With current btrfs, deduplication, reflinks, and snapshots all force data to be flushed out of the page cache to disk, so they are quite heavy write workloads. btrfs isn't particularly efficient at storing large numbers of data file updates compared to DVCS tools like git, which are specialized for this use case. Changing that while maintaining btrfs compatibility would be a major undertaking; it would likely be easier to start over with a new filesystem.


glowingwire commented Apr 30, 2024

Amazing information. Thank you! I am attempting to take it all in. I don't want the user to have to think about this feature; I just want them to be relieved when they can go back and recover something they accidentally damaged.

Academically, I'd like to be able to work with files that are being edited all the time, but I assumed programs did this in their temp directories, which I would not back up this way. I expect programs not to clobber files until they are told it is okay to do that. Perhaps limiting my imagined feature to the Documents and Desktop folders takes care of this, because the /home/user/ folder can have all kinds of other uses.

And yet there is nothing stopping a person from having their program append to logs in /home/user/Documents/tmp, write over parts of the middle of a file, or do other things like that. I have an expectation that I can open a document and get it back to the way it was before I hit save, just by closing the program I am using. I believe this could go beyond an expectation and become a demand that programs behave this way.

With all these silly trackpads with tap-to-click enabled, I have accidentally moved blocks of text and clobbered work so badly that undo didn't help. I could just close the program, or Save As a different filename and reopen the original.

I may also be tainted by a misunderstanding of blob storage. I have heard that Perkeep can deduplicate files that share common information and just differ in a small area. I think they do this by indexing every block by its hash and allowing more than one entry to point to that block.

I was expecting a file to rewrite only a few blocks in the middle if it was only changing part of the file and keeping most of it the same, but I could see how a program might simply re-output everything every time.

What I am expecting this feature to provide is maybe 5 revisions, going back maybe a week.

I was not considering it to be a primary backup.

I was thinking that btrfs basically writes a diff to a new block and keeps the old block, but I think it is more like this: the replacement block gets written, the old block gets marked as free, and the table then references the new block. The file might end up a bit fragmented, but that's okay; we're running on SSDs.

I wonder how Apple and Microsoft implement this feature.

Perhaps it makes sense to take a snapshot of the Documents folder every time a program closes a file it was writing.
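
If ~/Documents were itself a btrfs subvolume, that idea could be approximated with something like the sketch below; the destination directory, naming scheme, and "keep 5" pruning policy are assumptions, and creating or deleting snapshots may require root or suitable permissions.

```bash
#!/bin/bash
# Sketch: take a read-only snapshot of the Documents subvolume and keep
# only the newest five, roughly matching the "5 revisions" goal above.
SUBVOL="$HOME/Documents"                       # must be a btrfs subvolume
DEST_DIR="$HOME/.snapshots"
mkdir -p "$DEST_DIR"

btrfs subvolume snapshot -r "$SUBVOL" "$DEST_DIR/Documents-$(date +%Y%m%d-%H%M%S)"

# Prune: the timestamped names sort chronologically, so drop all but the last 5.
ls -1d "$DEST_DIR"/Documents-* | head -n -5 |
while read -r old; do
    btrfs subvolume delete "$old"
done
```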


kakra commented Apr 30, 2024

I wonder how Apple and Microsoft implement this feature.

At least Windows does scheduled filesystem snapshots, if you are referring to the "previous versions" feature. It may be possible to filter this by subdirectory tree (so it won't keep copies of unrelated files). It's not per file, although the presentation of the feature through the GUI may suggest that. It uses view filtering to only list snapshots when the files actually changed. The snapshots are stored in "System Volume Information", and I think you can even mount them as a virtual drive or export them to a VM image; at least for full VSS snapshots this is true. The shadow copy feature probably works at a somewhat finer granularity but uses mostly the same infrastructure, and it does auto-cleanup if free space becomes low. But it doesn't do that on each change to the file.

Apple's Time Machine works in a similar way, but it writes backups of changes to a repository, so it probably works more like the Git idea from @Zygo. I don't know if classic macOS file systems support snapshots. The later ones should, because they have copy-on-write features.

You could probably get something similar if you use or develop some FUSE daemon that transparently mounts over the interesting folders. Maybe these could be starting points:
