Large file upload issue to azure storage account. #482

Open
mukulthakur3062 opened this issue Jul 4, 2024 · 0 comments

I am facing an issue while uploading files larger than 70 MB. If the file size is less than 50 MB, the upload always succeeds.

But when the file is larger than that, it fails more than 50% of the time.

I am using the snippet below:

from azure.identity import ClientSecretCredential
from azure.storage.blob import BlobServiceClient
import os

client_secret = ''
container_name = ''
account_url = 'https://sharedstorage.blob.core.windows.net/'
tenant_id = ''
client_id = ''

credential = ClientSecretCredential(tenant_id, client_id, client_secret)
blob_service_client = BlobServiceClient(account_url=account_url, credential=credential)
container_client = blob_service_client.get_container_client(container_name)

filename = "/work/out/person/Group.A_009.zip"
with open(filename, "rb") as fl:
    data = fl.read()
    container_client.upload_blob(name=os.path.basename(filename), data=data, overwrite=True)
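
In case it is relevant, I would expect the streaming variant below to behave equivalently: it passes the open file object so the SDK chunks the upload itself instead of reading everything into memory first (the max_concurrency value here is just an arbitrary guess):

with open(filename, "rb") as fl:
    # Hand the open file object to the SDK and let it chunk the upload,
    # rather than loading the whole file into memory first.
    container_client.upload_blob(
        name=os.path.basename(filename),
        data=fl,
        max_concurrency=4,  # arbitrary value, not tuned
        overwrite=True,
    )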

Logs

# When the file size is 106MB
>>> filename = "/work/out/person/Group.A_009.zip"
>>> 
>>> with open(filename, "rb") as fl:
...     data = fl.read()
...     container_client.upload_blob(name=os.path.basename(filename), data=data, overwrite=True)
... 
Traceback (most recent call last):
  File "<console>", line 3, in <module>
  File "/usr/local/lib/python3.8/site-packages/azure/core/tracing/decorator.py", line 94, in wrapper_use_tracer
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/azure/storage/blob/_container_client.py", line 1125, in upload_blob
    blob.upload_blob(
  File "/usr/local/lib/python3.8/site-packages/azure/core/tracing/decorator.py", line 94, in wrapper_use_tracer
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/azure/storage/blob/_blob_client.py", line 775, in upload_blob
    return upload_block_blob(**options)
  File "/usr/local/lib/python3.8/site-packages/azure/storage/blob/_upload_helpers.py", line 178, in upload_block_blob
    return client.commit_block_list(
  File "/usr/local/lib/python3.8/site-packages/azure/core/tracing/decorator.py", line 94, in wrapper_use_tracer
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/azure/storage/blob/_generated/operations/_block_blob_operations.py", line 1555, in commit_block_list
    _request = build_commit_block_list_request(
  File "/usr/local/lib/python3.8/site-packages/azure/storage/blob/_generated/operations/_block_blob_operations.py", line 599, in build_commit_block_list_request
    return HttpRequest(method="PUT", url=_url, params=_params, headers=_headers, content=content, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/azure/core/rest/_rest_py3.py", line 114, in __init__
    default_headers = self._set_body(
  File "/usr/local/lib/python3.8/site-packages/azure/core/rest/_rest_py3.py", line 150, in _set_body
    default_headers, self._data = set_content_body(content)
  File "/usr/local/lib/python3.8/site-packages/azure/core/rest/_helpers.py", line 148, in set_content_body
    raise TypeError(
TypeError: Unexpected type for 'content': '<class 'xml.etree.ElementTree.Element'>'. We expect 'content' to either be str, bytes, a open file-like object or an iterable/asynciterable.

# When the file size is around 50MB
>>> file_path = filename = '/work/out/person/Sing.A_3.zip'
>>> with open(filename, "rb") as fl:
...     data = fl.read()
...     container_client.upload_blob(name=os.path.basename(filename), data=data, overwrite=True)
... 
<azure.storage.blob._blob_client.BlobClient object at 0x70135e1466d0>

Extra (if it helps)

Whenever an upload of a file larger than 50 MB fails, the error below consistently appears at the same time:

>>> blobs = container_client.list_blobs()
>>> for blob in blobs:
...     print(blob.name)
... 
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/usr/local/lib/python3.8/site-packages/azure/core/paging.py", line 123, in __next__
    return next(self._page_iterator)
  File "/usr/local/lib/python3.8/site-packages/azure/core/paging.py", line 83, in __next__
    self.continuation_token, self._current_page = self._extract_data(self._response)
  File "/usr/local/lib/python3.8/site-packages/azure/storage/blob/_list_blobs_helper.py", line 109, in _extract_data_cb
    self.current_page = [self._build_item(item) for item in self._response.segment.blob_items]
AttributeError: 'NoneType' object has no attribute 'blob_items'

I face the same issue when I try to use the adlfs library; adding a snippet of that too.

from dataclasses import dataclass, field
from typing import Dict, Optional

from fsspec import AbstractFileSystem


@dataclass
class StorageSource:
    url: str
    storage_config: Optional[Dict] = field(default_factory=lambda: {})

    def get_file_system(self) -> AbstractFileSystem:
        import adlfs

        fs = adlfs.AzureBlobFileSystem(**self.storage_config)
        return fs


class StorageProxy:
    def __init__(
        self, source: StorageSource, file_system: Optional[AbstractFileSystem] = None
    ):
        self.source = source
        self._fs = file_system

    @property
    def fs(self) -> AbstractFileSystem:
        if self._fs is None:
            self._fs = self.source.get_file_system()
        return self._fs

    def upload_file(self, file_path: str, destination_path: str):
        self.fs.put_file(file_path, destination_path)


class FileSource:
    def __init__(self, **kwargs):
        self.config_uri: str = kwargs.get("config_uri")
        self.config_storage_creds: Dict = kwargs.get("config_storage_creds")

    def get_storage_client(self):
        source = StorageSource(
            url=self.config_uri, storage_config=self.config_storage_creds
        )
        storage_client = StorageProxy(source)
        return storage_client


def push_file_to_storage_account(file_path: str):
    storage_config = {
        "tenant_id": storage_tenant_id,
        "client_id": storage_client_id,
        "client_secret": storage_client_secret,
        "account_name": storage_account_name,
    }
    url = f"az://{storage_container_name}{file_path}"
    fs = FileSource(config_uri=url, config_storage_creds=storage_config)
    storage_client = fs.get_storage_client()
    storage_client.upload_file(file_path, url)
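
For completeness, this is roughly how the helper above is invoked (the path is the same example file from earlier; the module-level storage_* variables are loaded from our config elsewhere):

# Illustrative call; storage_tenant_id, storage_client_id, etc. are
# module-level settings defined elsewhere.
push_file_to_storage_account("/work/out/person/Group.A_009.zip")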

Note: the azcopy command always works through the terminal, even with large files.

adlfs==2024.4.1
fsspec==2024.3.1
azure-storage-blob==12.20.0
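
If it helps with triage: my understanding is that azure-storage-blob uploads anything below max_single_put_size (64 MB by default) in a single Put Blob call and only switches to the Put Block / Put Block List path above that, which would explain why failures start around the 50-70 MB mark and why the traceback points at commit_block_list. Here is a sketch of how those thresholds can be tuned on the client, in case someone wants me to re-test with different values (the sizes here are arbitrary):

# Lower both thresholds so even small files take the chunked
# Put Block + commit_block_list path; the 4 MB values are arbitrary.
blob_service_client = BlobServiceClient(
    account_url=account_url,
    credential=credential,
    max_single_put_size=4 * 1024 * 1024,  # single-shot upload cutoff
    max_block_size=4 * 1024 * 1024,       # size of each staged block
)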

Any help would be highly appreciated.
