Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix incosistent hashing #674

Merged
merged 7 commits into from
Oct 4, 2023
Merged

Fix incosistent hashing #674

merged 7 commits into from
Oct 4, 2023

Conversation

angrybayblade
Copy link

@angrybayblade angrybayblade commented Oct 4, 2023

Proposed changes

This PR fixes the inconsistent hashing issues by

  • Not replacing CRLF with LF
  • Raising early if an empty file is found when hashing

Fixes

Fixes #647

Types of changes

What types of changes does your code introduce to agents-aea?
Put an x in the boxes that apply

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Checklist

Put an x in the boxes that apply.

  • I have read the CONTRIBUTING doc
  • I am making a pull request against the develop branch (left side). Also you should start your branch off our develop.
  • Lint and unit tests pass locally with my changes and CI passes too
  • I have added tests that prove my fix is effective or that my feature works
  • I have checked that code coverage does not decrease.
  • I have added necessary documentation (if appropriate)
  • Any dependent changes have been merged and published in downstream modules

Comment on lines -46 to -53
def _dos2unix(file_content: bytes) -> bytes:
"""
Replace occurrences of Windows line terminator CR/LF with only LF.

:param file_content: the content of the file.
:return: the same content but with the line terminator
"""
return re.sub(b"\r\n", b"\n", file_content, flags=re.M)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the logic which caused the inconsistent hash issues when locking and publishing packages, not sure why this was here in the first place but removing this logic fixes the issue and does not cause any new issue AFAIC so can be removed safely

Copy link
Collaborator

@solarw solarw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some points are unclear for me

return file_b
data = file.read()
if len(data) == 0:
raise ValueError(f"File cannot be empty: {file_path}")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@angrybayblade why can not be empty? looks like empty file is a common case.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The IPFS daemon treats empty files differently, if you push an empty file to the IPFS node it'll produce different hash compared to using the IPFSHashTool, now the reason behind this unclear I tried looking at the IPFS docs but could not find anything related. So for now we just raise an error when a user tries to hash an empty file so they don't run into these mismatching hashes issue when publishing the packages

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@angrybayblade ok, let stay as is.
but i see a possible source of problems, if empty file will be needed for some reason

except UnicodeDecodeError:
return False


def _read(file_path: str) -> bytes:
"""Read a file, replacing Windows line endings if it is a text file."""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@angrybayblade i dont see any fix for windows line endings anymore, file is opened in binary mode, so python will preserve all chars.

probably doc string is needed to be updated?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed fb89dfe

solarw
solarw previously approved these changes Oct 4, 2023
Copy link
Collaborator

@solarw solarw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Collaborator

@solarw solarw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@angrybayblade angrybayblade merged commit 82b1aca into main Oct 4, 2023
36 checks passed
angrybayblade added a commit that referenced this pull request Oct 10, 2023
@DavidMinarsch DavidMinarsch deleted the fix/incosisten-hashing branch October 12, 2023 23:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Inconsistent hashing
2 participants