Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cleanup BTRFS_INODE_NOCOMPRESS usage #26

Open
3 tasks
josefbacik opened this issue Aug 10, 2021 · 6 comments
Open
3 tasks

Cleanup BTRFS_INODE_NOCOMPRESS usage #26

josefbacik opened this issue Aug 10, 2021 · 6 comments
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@josefbacik
Copy link
Contributor

We have a way to disable compression per files in btrfs, but there's a weird interaction with -o compress-force. You can set no compression with chattr per file and per directory, but we specifically ignore this setting if you are mounted with -o compress-force. This makes sense in general, because we not only use this flag for chattr, but also by the automatic heuristic code to mark an inode as not being a good candidate for compression.

This muddies the water a bit, because for the case of chattr we likely really want to disable compression in all cases, even in -o compress-force, but we don't want to do it for things marked with BTRFS_INODE_NOCOMPRESS because it could have been done with heuristics and thus not be completely correct.

We need to pick either

  1. Make BTRFS_INODE_NOCOMPRESS the ultimate authority on wether or not to compress a file. This would mean we need to add another flag for the inode if the heuristic determines that the file is not a good candidate for compression. One way to do this would be to simply use a runtime flag instead of setting BTRFS_INODE_NOCOMPRESS on the file, that way we only skip the compression while we have the inode in memory. This would be my preferred method of solving the problem.
  2. Make another flag to indicate the inode shouldn't be compressed, and use BTRFS_INODE_NOCOMPRESS in its current state. The other flag would probably need to be triggered with a btrfs property, and would require a lot more work.

My preference is for 1, which would mean you need to

  • Make sure that chattr -c turns off compression if it's currently set, ie chattr +c file; chattr -c file; simply goes to the default, whereas chattr -c file; gets BTRFS_INODE_NOCOMPRESS set. This may not actually be what we want to happen, and if that's the case we'll still need a btrfs property to flag BTRFS_INODE_NOCOMPRESS. This is the biggest part of this whole thing, you're going to need to solicit feedback from everybody on wether or not this is the behavior we want. The easiest way to do this would be to write a patch to change the behavior and see who complains.
  • Stop setting BTRFS_INODE_NOCOMPRESS if the data we tried to write was uncompressable. We could replace this with a new ondisk flag, but that's a disk format change so I think an in memory flag would be a better choice here.
  • Honor BTRFS_INODE_NOCOMPRESS above all else. Essentially make inode_can_compress() return false if BTRFS_INODE_NOCOMPRESS is set.
@josefbacik josefbacik added enhancement New feature or request good first issue Good for newcomers labels Aug 10, 2021
@kdave
Copy link
Member

kdave commented Aug 10, 2021

I have some prototypes to improve the heuristics, with tracking of the compression quality. The goal is to never need to use compress-force for mount and just compress + heuristics that will set the nocompress flag after more evaluation that happens now (and is know to be weak).

I'm kind of amused about the label 'good first issue', it takes a lot of thinking and evaluation to do it right, and there are always more ways to do it.

@josefbacik
Copy link
Contributor Author

Yeah 'good first issue' may have been overstating it, but mechanically it's very simple, just need to get consensus on what the behavior is we expect, which is the harder part. But if we break it down to behaviors it should be straightforward, we want

  • A way to disable compression completely, no matter what.
  • A way to disable compression for files it doesn't make sense for.
  • A defined way to disable compression.

This opens up a few questions that we need answered

  • Do we need to store the fact that a file is uncompressable on disk? I think no, consider the classic case of writing an image to disk. We're not going to open and write more to that file later on, we're going to write it once and be done. This means we can accomplish the same behavior with an in memory flag.
  • Do we let chattr disable compression completely, or simply toggle compression on and off? If we allow chattr to disable compression completely then we need to fix the code to only set NOCOMPRESS if COMPRESS wasn't already set, or if we don't want chattr to do that then we need to fix it to not set NOCOMPRESS at all and add a btrfs property to do the right thing.

Once we get those questions answered then the code is relatively simple. My opinion is

  1. An in memory only flag for "this file sucks at compressing" and stop using NOCOMPRESS for this.
  2. NOCOMPRESS means don't even attempt compression, regardless of the mount options.
  3. chattr only sets COMPRESS or clears COMPRESS, it never sets NOCOMPRESS.
  4. We add a btrfs property that explictly sets NOCOMPRESS on the file.

@Zygo
Copy link

Zygo commented Aug 12, 2021

  1. chattr only sets COMPRESS or clears COMPRESS, it never sets NOCOMPRESS.

chattr (e2fsprogs) only got support for FS_NOCOMP_FL six months ago (better late than never, it was added to ext2 in 1998). You want to remove that?

I'd prefer leaving the chattr bits as they are, not because they're an amazing model of user experience, but because they've had whatever meaning they have for decades now, changing them will break someone, and they aren't onerous to implement as they are. chattr supported the 'm' bit only recently, but people could have rolled their own chattr and used the bit on btrfs since 2011.

  1. An in memory only flag for "this file sucks at compressing" and stop using NOCOMPRESS for this.

I am in favor of that, though. The user-writable chattr bits are the bailiwick of the admin, and code shouldn't be flipping them unless explicitly asked to.

We have an xattr btrfs.compression which takes a string argument, and is internally converted to bits. Can it be set to none to mean "ignore mount option, don't compress this"? Deleting the attribute would set it to default, "do what the mount option says."

There's a lot of flexibility in the xattr, we can also specify compression level, whether to bypass the heuristics or force compression, or (insert future feature here). There are only two usable chattr bits, so they're not very expressive, and we've painted ourselves into a corner on what both of them mean.

@zhanglikernel
Copy link

Hello, the label of this issue is good first issue, so I want to apply a patch to it. The following are my thoughts.

chattr only sets COMPRESS or clears COMPRESS, it never sets NOCOMPRESS.

  1. The fact is that chattr -c only clears the BTRFS_INODE_COMPRESS flag, it will not set BTRFS_INODE_NOCOMPRESS, chattr +m will set BTRFS_INODE_NOCOMPRESS as the inode flag, and the +m and +c flags cannot be set in the same inode at the same time.

  2. The current situation is that if the file system is mounted with the compression-force option, it will compress the file anyway (unless the compressed size is larger than the original, but every time we write to the file, it will check whether it is It can be compressed. If there is any possibility that the file can be compressed, it will compress the data), but if you use the option compress instead of compress-force to mount the file system, chattr +m is valid and it will disable the compression of the file. I think this is reasonable, because usually the administrator will do the mounting work, other users use the file system, if the administrator explicitly uses the compression force option to mount the file system, all users should abide by the rules, but if the administrator mounts Only the compression option, the user can choose not to compress.

  3. At present, if we use chattr +c set file to set compress, it will also add a compress xattr, but if we set chattr +m on the file, it will clear the compress attribute, it does not have xattr to indicate not to compress,

  4. I think it is correct to distinguish between the BTRFS_INODE_NOCOMPRESS and "this file sucks at compressing", so we need to introduce another sign to indicate "this file sucks at compressing"

  5. Now btrfs_inode→prop_compress only records which compression algorithm is used, not the compression level. I think we should also record the compression level, for example: prop_compress record algorithm in higher bit, and compression level lower bit.

  6. The current heuristic uses the Shannon entropy of the original file to determine whether the file should be compressed. It any possible the further  heuristics to tracking of the compression file Shannon entropy, I just wonder how to tracking of the compression quality.

The following is patch summary:

We add a btrfs property that explictly sets NOCOMPRESS on the file.

  1. Add btrfs xattr btrfs.compression value none corresponding to BTRFS_INODE_NOCOMPRESS, set by chattr +m or btrfs property set compress none, indicating that the file should not be compressed.

An in memory only flag for "this file sucks at compressing" and stop using NOCOMPRESS for this.

  1. Distinguish between BTRFS_INODE_NOCOMPRESS and "this file sucks at compressings" by adding a flag in the memory to indicate "the file is suck on compress"
  2. Expand prop_compress to include the compression level.

@josefbacik
Copy link
Contributor Author

This sounds reasonable to me. I imagine we'll discuss it more on list, but make sure to lay the mechanics out in your cover letter for the patches so reviewers know what the plan is. I look forward to seeing the patches.

@zhanglikernel
Copy link

Hi, I have pushed a patch about this issue,
[Related Patches]
https://www.spinics.net/lists/linux-btrfs/msg120300.html

It contains 5 parts and a patch summary description ([0/5]), but I forgot to set each part of the patch to in-reply-to="[0/5]" when sending the email, I should Clear this error? Resend. Or just keep it as it is.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good first issue Good for newcomers
Projects
None yet
Development

No branches or pull requests

4 participants