There seems to be an issue when two instances of this file system write to the same blob in parallel from two different processes: one of the uploads fails with:
Azure error
  File "/code/.venv/lib/python3.10/site-packages/our_package/connector/storage/blob.py", line 117, in _save
    with self._fs.open(
  File "/code/.venv/lib/python3.10/site-packages/fsspec/spec.py", line 1963, in __exit__
    self.close()
  File "/code/.venv/lib/python3.10/site-packages/adlfs/spec.py", line 1908, in close
    super().close()
  File "/code/.venv/lib/python3.10/site-packages/fsspec/spec.py", line 1930, in close
    self.flush(force=True)
  File "/code/.venv/lib/python3.10/site-packages/fsspec/spec.py", line 1801, in flush
    if self._upload_chunk(final=force) is not False:
  File "/code/.venv/lib/python3.10/site-packages/fsspec/asyn.py", line 118, in wrapper
    return sync(self.loop, func, *args, **kwargs)
  File "/code/.venv/lib/python3.10/site-packages/fsspec/asyn.py", line 103, in sync
    raise return_result
  File "/code/.venv/lib/python3.10/site-packages/fsspec/asyn.py", line 56, in _runner
    result[0] = await coro
  File "/code/.venv/lib/python3.10/site-packages/adlfs/spec.py", line 2068, in _async_upload_chunk
    await bc.commit_block_list(
  File "/code/.venv/lib/python3.10/site-packages/azure/core/tracing/decorator_async.py", line 77, in wrapper_use_tracer
    return await func(*args, **kwargs)
  File "/code/.venv/lib/python3.10/site-packages/azure/storage/blob/aio/_blob_client_async.py", line 1861, in commit_block_list
    process_storage_error(error)
  File "/code/.venv/lib/python3.10/site-packages/azure/storage/blob/_shared/response_handlers.py", line 184, in process_storage_error
    exec("raise error from None") # pylint: disable=exec-used # nosec
  File "<string>", line 1, in <module>
azure.core.exceptions.HttpResponseError: The specified block list is invalid.
RequestId:<request_id>
Time:2024-02-13T12:15:05.1957595Z
ErrorCode:InvalidBlockList
Content: <?xml version="1.0" encoding="utf-8"?><Error><Code>InvalidBlockList</Code><Message>The specified block list is invalid.
From our limited investigation, this is likely caused by the way AzureBlobFile calculates the IDs of the uploaded blocks:
adlfs/adlfs/spec.py, lines 2102 to 2103 in 576fb7a
Could this be changed to a hash of the content or something similar, which would correspond to the actual contents of the uploaded block?
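To make the suggestion concrete, here is a minimal sketch of what a content-derived block ID could look like. It rests on assumptions: `position_block_id` is our guess at a position-only scheme (the referenced adlfs lines are not reproduced here), and `content_block_id` is a hypothetical helper, not an existing adlfs or Azure SDK function. It only shows that position-only IDs coincide across independent writers while content-derived ones do not; we have not verified that this alone prevents the InvalidBlockList error.

```python
import hashlib


def position_block_id(index: int) -> str:
    # Assumed position-only scheme: the ID ignores the block's content.
    return f"{index:07d}"


def content_block_id(index: int, chunk: bytes) -> str:
    # Hypothetical alternative: position plus a 16-byte BLAKE2b digest of the
    # chunk. The result is always 40 characters for index < 10**7, which keeps
    # all IDs the same length and under Azure's 64-byte block ID limit.
    digest = hashlib.blake2b(chunk, digest_size=16).hexdigest()
    return f"{index:07d}-{digest}"


# Two writers uploading different data to the same blob.
writer_a = [b"alpha", b"beta"]
writer_b = [b"gamma", b"delta"]

# Position-only IDs are identical for both writers despite different content.
ids_a = [position_block_id(i) for i, _ in enumerate(writer_a)]
ids_b = [position_block_id(i) for i, _ in enumerate(writer_b)]
position_ids_collide = ids_a == ids_b

# Content-derived IDs differ whenever the uploaded bytes differ.
hashed_a = [content_block_id(i, c) for i, c in enumerate(writer_a)]
hashed_b = [content_block_id(i, c) for i, c in enumerate(writer_b)]
content_ids_collide = bool(set(hashed_a) & set(hashed_b))
```

Including the index keeps IDs distinct when the same chunk appears twice in one blob; two writers uploading identical bytes would still produce identical IDs, which should be harmless since the staged data is the same.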