After shard is enabled, the size of the copied file is inconsistent with the original file #4401

Open
jifengzhou opened this issue Aug 14, 2024 · 2 comments


@jifengzhou (Contributor)

After shard is enabled, the size of the copied file is inconsistent with the original file.

The volume configuration is as follows:
Volume Name: data
Type: Replicate
Volume ID: 02c625c8-a097-46fd-b913-76a53f286ff7
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: node1:/export/heketi/node_d5071/device_65fa5/data_953e3
Brick2: node3:/export/heketi/node_9d13b/device_dfd0a/data_770ff
Brick3: node2:/export/heketi/node_016de/device_56b70/data_762bc
Options Reconfigured:
performance.write-behind: on
diagnostics.brick-log-level: INFO
diagnostics.client-log-level: INFO
features.shard: on
features.shard-block-size: 1024MB
user.heketi.id: 85b4bccd7ffd0c6d97658cb5badbe3ae
cluster.granular-entry-heal: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
client.event-threads: 1
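
For reference, an equivalent test volume can be created with the standard gluster CLI (the brick paths here are illustrative, not the heketi-managed ones from this report):
gluster volume create data replica 3 node1:/bricks/b1 node2:/bricks/b2 node3:/bricks/b3
gluster volume set data features.shard on
gluster volume set data features.shard-block-size 1024MB
gluster volume start data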

Mount the data volume on node1:
mount -t glusterfs node1:/data /mnt

Generate a data6.img file under /root:
dd if=/dev/zero of=/root/data6.img bs=128k count=11
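
The expected size is 11 × 131,072 bytes (128 KiB) = 1,441,792 bytes. Since this is far below the configured features.shard-block-size of 1024MB, the whole file fits in the base file on each brick, and the logical size shown to clients is tracked by the shard translator in the trusted.glusterfs.shard.file-size extended attribute (see the follow-up comments below).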

Check its md5 checksum and file size:
[root@node1 ~]# md5sum /root/data6.img
2aabc019f6b5d881028999f055f5ff14 /root/data6.img
[root@node1 ~]# ls -l /root/data6.img
-rw-r--r-- 1 root root 1441792 Aug 14 14:19 /root/data6.img

Copy the data6.img file to the /mnt folder:
cp /root/data6.img /mnt/

Check the md5 and file size of data6.img under /mnt; with a certain probability, they are inconsistent with the original file:
[root@node1 ~]# md5sum /mnt/data6.img
b98f319ebcfe36f416c0b7d9281f85ff /mnt/data6.img
[root@node1 ~]# ls -l /mnt/data6.img
-rw-r--r-- 1 root root 2359296 Aug 14 14:19 /mnt/data6.img
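
For reference, the size inflation decomposes exactly into whole write chunks: 2,359,296 − 1,441,792 = 917,504 = 7 × 131,072, i.e., the reported size is exactly seven 128 KiB chunks too large. This is consistent with the write path handling the copy in 128 KiB chunks and local->delta_size being over-counted for seven of them, rather than with the data itself being corrupted (see the next comment: the brick copy is intact).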

Through log and gdb tracing, we found that during the copy, when shard_common_inode_write_do_cbk -> shard_get_delta_size_from_inode_ctx computed local->delta_size, the value of ctx->stat.ia_size was significantly smaller than expected, so the computed local->delta_size was larger than the amount by which the file size should actually grow. Further tracing showed that during the copy, ctx->refresh on the file's inode is set to _gf_true with a certain probability, which causes the next write to go through shard_lookup_base_file_cbk -> shard_inode_ctx_set. It is precisely this update that changes ctx->stat.ia_size and makes shard_get_delta_size_from_inode_ctx miscalculate local->delta_size.
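
To make the failure mode concrete, here is a condensed, self-contained C sketch of the delta-size logic described above (names mirror shard.c, but the types and bodies are simplified stand-ins, not the verbatim glusterfs source):

#include <stddef.h>
#include <sys/types.h>

/* Simplified stand-ins for the relevant fields of shard_inode_ctx_t
 * and shard_local_t in shard.c (sketch only, not the real definitions). */
struct sketch_iatt  { off_t ia_size; };
struct sketch_ctx   { struct sketch_iatt stat; };
struct sketch_local { off_t offset; size_t total_size; off_t delta_size; };

/* delta_size is the amount by which the logical file size (and hence the
 * trusted.glusterfs.shard.file-size xattr) will be grown by this write. */
static void sketch_get_delta_size(struct sketch_local *local,
                                  struct sketch_ctx *ctx)
{
    off_t write_end = local->offset + (off_t)local->total_size;

    if (write_end > ctx->stat.ia_size)
        /* If ctx->stat.ia_size has just been overwritten with a stale,
         * smaller value (shard_lookup_base_file_cbk -> shard_inode_ctx_set
         * after ctx->refresh was set), this delta double-counts bytes that
         * earlier writes already added to the xattr. */
        local->delta_size = write_end - ctx->stat.ia_size;
    else
        local->delta_size = 0;
}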

Why is ctx->refresh on the file's inode set to _gf_true with a certain probability during writes? In our environment this is most likely related to an upper-layer application that frequently reads the contents of the /mnt directory: according to the glusterfs shard readdirp code, it sets ctx->refresh to _gf_true under certain conditions (a condensed sketch of that path follows).
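
A condensed, self-contained sketch of that readdirp-side invalidation (modeled on shard_inode_ctx_invalidate in shard.c; simplified stand-in types again, not the verbatim source):

#include <stdbool.h>
#include <sys/types.h>

struct sketch_iatt2 { off_t ia_size; };
struct sketch_ctx2  { struct sketch_iatt2 stat; bool refresh; };

/* Called for each readdirp entry: if the iatt returned by readdirp
 * disagrees with the cached one, flag the inode for refresh. The next
 * write then re-reads a (possibly stale) stat via lookup, which is what
 * resets ctx->stat.ia_size in the scenario described above. */
static void sketch_inode_ctx_invalidate(struct sketch_ctx2 *ctx,
                                        const struct sketch_iatt2 *d_stat)
{
    if (d_stat->ia_size != ctx->stat.ia_size)
        ctx->refresh = true; /* _gf_true in glusterfs */
}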

Another interesting observation: the problem does not seem to occur when performance.write-behind is turned off. We do not know whether the two are connected.
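
For anyone trying to reproduce this, write-behind can be toggled per volume with the standard CLI:
gluster volume set data performance.write-behind off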

@jifengzhou (Contributor, Author)

Additional explanation of the problem:
Although the size and md5 of data6.img as read from /mnt/ are inconsistent with /root/data6.img, the copy on the brick, node1:/export/heketi/node_d5071/device_65fa5/data_953e3/data6.img, is identical to /root/data6.img. The inconsistency between /mnt/data6.img and /root/data6.img is caused by an incorrect update of the extended attribute trusted.glusterfs.shard.file-size. The open question is how to prevent this incorrect update of trusted.glusterfs.shard.file-size.
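
The suspect xattr can be inspected directly on the brick (run as root on the brick host; -e hex because the value is binary):
getfattr -d -m . -e hex /export/heketi/node_d5071/device_65fa5/data_953e3/data6.img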

@jifengzhou (Contributor, Author)

After verification, the problem can be solved by disabling shard_inode_ctx_invalidate during readdirp. Looking through the commit history, shard_inode_ctx_invalidate was added in https://review.gluster.org/#/c/glusterfs/+/12400/ to solve a geo-replication-related problem; geo-rep is not used in our project. Would the solution of "disabling shard_inode_ctx_invalidate during readdirp" introduce any new problems? (A condensed sketch of the workaround follows.)
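
For illustration only, the verified workaround amounts to turning the invalidation sketched in the previous comment into a no-op on the readdirp path (reusing the stand-in types from that sketch; this is a conceptual stand-in, not an actual glusterfs patch):

/* Workaround sketch: on the readdirp path, skip the invalidation so that
 * readdirp can no longer set ctx->refresh and trigger the stale
 * shard_inode_ctx_set described above. Whether this regresses the geo-rep
 * case that review 12400 fixed is exactly the open question. */
static void sketch_inode_ctx_invalidate_disabled(struct sketch_ctx2 *ctx,
                                                 const struct sketch_iatt2 *d_stat)
{
    (void)ctx;
    (void)d_stat;
    /* intentionally left empty */
}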
