Fix azure logging #6860

Draft

iulialexandra wants to merge 4 commits into master

Conversation

iulialexandra

Motivation for features / changes

When using the TensorBoardLogger in PyTorch Lightning to log to an Azure Blob Storage location, the following error is thrown:

ErrorCode:InvalidBlobType
Content: <?xml version="1.0" encoding="utf-8"?><Error><Code>InvalidBlobType</Code><Message>The blob type is invalid for this operation.

This is due to the assumption that, if the filesystem supports append operations, appends will automatically be available for any file in the blob storage. See this code snippet from gfile.py:

if self.fs_supports_append:
    if not self.write_started:
        # write the first chunk to truncate file if it already exists
        self.fs.write(self.filename, file_content, self.binary_mode)
        self.write_started = True

    else:
        # append the later chunks
        self.fs.append(self.filename, file_content, self.binary_mode)

However, in Azure, if a file is created by opening it in write mode, it will not permit append operations later on.
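
To make the blob-type constraint concrete, here is a minimal sketch that reproduces the same InvalidBlobType error with the azure-storage-blob SDK directly, outside of TensorBoard's code path. The account URL, container, and blob names are placeholders and credential handling is omitted; the point is only that a blob created by a plain upload becomes a Block Blob, and the service then rejects append operations on it.

# Illustrative sketch only; account, container, and blob names are placeholders.
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient(
    account_url="https://<account>.blob.core.windows.net",
    credential="<account-key-or-sas-token>",
)
blob = service.get_blob_client(container="logs", blob="events.out.tfevents.demo")

# A plain upload creates a Block Blob, which is what a write-mode open maps to.
blob.upload_blob(b"first chunk", overwrite=True)

# Append operations are only valid on Append Blobs, so this call fails with
# ErrorCode:InvalidBlobType, matching the message quoted above.
blob.append_block(b"second chunk")

# An Append Blob has to be created explicitly, after which appends succeed:
# blob.create_append_blob()
# blob.append_block(b"first chunk")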

Technical description of changes

Screenshots of UI changes (or N/A)

Detailed steps to verify changes work correctly (as executed by you)

Alternate designs / implementations considered (or N/A)

@bmd3k bmd3k self-requested a review June 4, 2024 17:58
@bmd3k (Contributor) left a comment

You said:

However, in Azure, if a file is created by opening it in write mode, it will not permit append operations later on.

Is there a case where Azure will permit append operations later on? Is there some other mode we should open the file in instead?

@@ -19,7 +19,7 @@
TensorBoard. This allows running TensorBoard without depending on
TensorFlow for file operations.
"""

from adlfs import AzureBlobFileSystem
Contributor

We don't have the adlfs library installed on development or CI machines, so this is not going to build. You can see this in the failures for the build checks:

https://github.com/tensorflow/tensorboard/actions/runs/9352527415/job/25755552731?pr=6860

What would be a good way to check for this without importing AzureBlobFileSystem?

Author

Hi, if a file is initially created by opening it in append mode, this file will allow append operations later on. However, as shown in the code snippet above, the file is always created by opening it in write mode:


        # write the first chunk to truncate file if it already exists
        self.fs.write(self.filename, file_content, self.binary_mode)

One other option, besides checking which filesystem we are dealing with, is to always open the file in append mode initially, when the filesystem supports that. However, this means the contents of the file, if it already exists, will not be cleared.
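
To make that alternative concrete, here is a rough sketch of how the branch quoted in the PR description could change. This is only an illustration of the idea, not the change in this PR, and it assumes the filesystem wrapper's append creates the file when it does not exist yet:

if self.fs_supports_append:
    # Create and extend the file via append so that, on stores like Azure
    # Blob Storage where the blob type is fixed at creation time, later
    # appends remain valid. Trade-off: an existing file is NOT truncated.
    self.fs.append(self.filename, file_content, self.binary_mode)
    self.write_started = True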

Author

Another option, which I am not very fond of, is checking the fs_class's string representation for the substring "Azure". You can see this approach in the latest commit.

However, I believe the approach I suggested above would be better. I don't have all the context behind why you would want to clear the file, though, or whether always creating the file in append mode would be appropriate. In my case, where I use Lightning's TensorBoardLogger, a unique experiment directory gets created at the beginning of the run, so the events file is always in a unique location unless we explicitly want to continue an experiment, in which case appending to an existing events file should not be a problem.
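
For reference, a rough sketch of that class-name check, reusing the _get_fs_class helper visible in the diff below; the exact condition is an assumption, not the code in the commit:

# Sketch only: disable the append fast-path when the resolved fsspec class
# looks like Azure Blob Storage, without importing adlfs.
fs_class = _get_fs_class(self.filename)
self.fs_supports_append = hasattr(self.fs, "append") and (
    fs_class is None or "Azure" not in fs_class.__name__
)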

@@ -671,7 +671,10 @@ def __init__(self, filename, mode):
)
self.filename = compat.as_bytes(filename)
self.fs = get_filesystem(self.filename)
self.fs_supports_append = hasattr(self.fs, "append")
if _get_fs_class(self.filename) == AzureBlobFileSystem:
Contributor

It's a bit odd that we would call get_filesystem, which calls fsspec.get_filesystem_class transitively, and then we call fsspec.get_filesystem_class again.

Can we remove the duplication?

Perhaps get_filesystem could return a tuple, where first entry is the filesystem (like _FSSPEC_FILESYSTEM) and the second entry is the filesystem class (if any, like AzureBlobFileSystem).
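
A rough sketch of what that could look like, assuming fsspec is available; the local-filesystem placeholder and the exact return shape are assumptions, not the refactor that ended up in this PR:

# Sketch only: resolve the fsspec class once and return it alongside the
# filesystem object, so fsspec.get_filesystem_class is not called twice.
import fsspec


def get_filesystem(filename):
    # Return (filesystem, fs_class); fs_class is None for non-fsspec paths.
    protocol, _ = fsspec.core.split_protocol(filename)
    if protocol is None or protocol == "file":
        return _LOCAL_FILESYSTEM, None  # placeholder for the local filesystem object
    fs_class = fsspec.get_filesystem_class(protocol)  # e.g. AzureBlobFileSystem
    return _FSSPEC_FILESYSTEM, fs_class

GFile.__init__ could then unpack self.fs, fs_class = get_filesystem(self.filename) and compute fs_supports_append from both values.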

Author

I wanted to avoid that approach because it required a bit more refactoring, but I have implemented it now; please take a look.

@ioangatop

Hi, is there any update? 🙏
