Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix incosistent hashing #674

Merged
merged 7 commits into from
Oct 4, 2023
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
32 changes: 4 additions & 28 deletions aea/helpers/ipfs/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -22,14 +22,12 @@
import hashlib
import io
import os
import re
from pathlib import Path
from typing import Any, Dict, Generator, Optional, Sized, Tuple, cast

import base58

from aea.helpers.cid import to_v1
from aea.helpers.io import open_file
from aea.helpers.ipfs.utils import _protobuf_python_implementation


Expand All @@ -43,35 +41,13 @@
from aea.helpers.ipfs.pb.merkledag_pb2 import PBNode # type: ignore


def _dos2unix(file_content: bytes) -> bytes:
"""
Replace occurrences of Windows line terminator CR/LF with only LF.

:param file_content: the content of the file.
:return: the same content but with the line terminator
"""
return re.sub(b"\r\n", b"\n", file_content, flags=re.M)
Comment on lines -46 to -53
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the logic which caused the inconsistent hash issues when locking and publishing packages, not sure why this was here in the first place but removing this logic fixes the issue and does not cause any new issue AFAIC so can be removed safely



def _is_text(file_path: str) -> bool:
"""Check if a file can be read as text or not."""
try:
with open_file(file_path, "r") as f:
f.read()
return True
except UnicodeDecodeError:
return False


def _read(file_path: str) -> bytes:
"""Read a file, replacing Windows line endings if it is a text file."""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@angrybayblade i dont see any fix for windows line endings anymore, file is opened in binary mode, so python will preserve all chars.

probably doc string is needed to be updated?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed fb89dfe

is_text = _is_text(file_path)
with open(file_path, "rb") as file:
file_b = file.read()
if is_text:
file_b = _dos2unix(file_b)

return file_b
data = file.read()
if len(data) == 0:
raise ValueError(f"File cannot be empty: {file_path}")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@angrybayblade why can not be empty? looks like empty file is a common case.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The IPFS daemon treats empty files differently, if you push an empty file to the IPFS node it'll produce different hash compared to using the IPFSHashTool, now the reason behind this unclear I tried looking at the IPFS docs but could not find anything related. So for now we just raise an error when a user tries to hash an empty file so they don't run into these mismatching hashes issue when publishing the packages

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@angrybayblade ok, let stay as is.
but i see a possible source of problems, if empty file will be needed for some reason

return data


def chunks(data: Sized, size: int) -> Generator:
Expand Down
14 changes: 0 additions & 14 deletions docs/api/crypto/helpers.md
Original file line number Diff line number Diff line change
Expand Up @@ -172,17 +172,3 @@ def hex_to_bytes_for_key(data: str) -> bytes

Convert hex string to bytes with error handling.

<a id="aea.crypto.helpers.generate_multiple_keys"></a>

#### generate`_`multiple`_`keys

```python
def generate_multiple_keys(n: int,
type_: str,
password: Optional[str] = None,
extra_entropy: Union[str, bytes, int] = "",
file: Optional[str] = None) -> None
```

Generate n key pairs.

13 changes: 1 addition & 12 deletions tests/test_helpers/test_ipfs/test_base.py
Original file line number Diff line number Diff line change
Expand Up @@ -22,23 +22,12 @@
import tempfile
from pathlib import Path
from tempfile import TemporaryDirectory
from unittest.mock import patch

import pytest
from aea_cli_ipfs.ipfs_utils import IPFSTool # type: ignore

from aea.helpers.cid import to_v1
from aea.helpers.ipfs.base import IPFSHashOnly, _is_text


def test_is_text_negative():
"""Test the helper method 'is_text' negative case."""
# https://gehrcke.de/2015/12/how-to-raise-unicodedecodeerror-in-python-3/
with patch(
"aea.helpers.ipfs.base.open_file",
side_effect=UnicodeDecodeError("foo", b"bytes", 1, 2, "Fake reason"),
):
assert not _is_text("path")
from aea.helpers.ipfs.base import IPFSHashOnly


def test_hash_for_big_file():
Expand Down
Loading