s3fs.exists incorrectly returns False after calling glob #879

Open
bbtfr opened this issue May 29, 2024 · 0 comments

Sample code

import os

from s3fs import S3FileSystem


fs = S3FileSystem()

base_path = "s3://moonshot-train-data/test_data/test_s3fs/"

def join_path(*args):
    return os.path.join(base_path, *args)


def test_exists(path):
    # Print the path alongside the result of fs.exists() for that path.
    print(path, fs.exists(path))

fs.touch(join_path("a/b/c.txt"))

print("=== before glob ===")
test_exists(join_path("a/b/c.txt"))
test_exists(join_path("a/b/"))
test_exists(join_path("a/b"))
test_exists(join_path("a/"))
test_exists(join_path("a"))

list(fs.glob(join_path("**/*.txt")))
print("=== after glob ===")
test_exists(join_path("a/b/c.txt"))
test_exists(join_path("a/b/"))
test_exists(join_path("a/b"))
test_exists(join_path("a/"))
test_exists(join_path("a"))

fs.invalidate_cache()
print("=== invalidate_cache ===")
test_exists(join_path("a/b/c.txt"))
test_exists(join_path("a/b/"))
test_exists(join_path("a/b"))
test_exists(join_path("a/"))
test_exists(join_path("a"))

Got

=== before glob ===
s3://moonshot-train-data/test_data/test_s3fs/a/b/c.txt True
s3://moonshot-train-data/test_data/test_s3fs/a/b/ True
s3://moonshot-train-data/test_data/test_s3fs/a/b True
s3://moonshot-train-data/test_data/test_s3fs/a/ True
s3://moonshot-train-data/test_data/test_s3fs/a True
=== after glob ===
s3://moonshot-train-data/test_data/test_s3fs/a/b/c.txt True
s3://moonshot-train-data/test_data/test_s3fs/a/b/ True
s3://moonshot-train-data/test_data/test_s3fs/a/b True
s3://moonshot-train-data/test_data/test_s3fs/a/ False  # <-- Here
s3://moonshot-train-data/test_data/test_s3fs/a False  # <-- And here
=== invalidate_cache ===
s3://moonshot-train-data/test_data/test_s3fs/a/b/c.txt True
s3://moonshot-train-data/test_data/test_s3fs/a/b/ True
s3://moonshot-train-data/test_data/test_s3fs/a/b True
s3://moonshot-train-data/test_data/test_s3fs/a/ True
s3://moonshot-train-data/test_data/test_s3fs/a True

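Something seems to be wrong with the DirCache: after glob, exists returns False for the parent directory "a", with or without a trailing slash, until the cache is cleared. As a workaround, create the S3FileSystem with use_listings_cache=False (the listings cache is enabled by default) or call invalidate_cache() after glob.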
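
The invalidate_cache() workaround can be wrapped in a small helper. This is only a sketch: exists_fresh is a hypothetical name, not part of s3fs or fsspec, and it assumes that clearing the entire listings cache before each check is acceptable for the workload.

```python
def exists_fresh(fs, path):
    """Return fs.exists(path) after dropping any cached directory listings.

    `fs` is any fsspec-style filesystem (for example an S3FileSystem).
    Clearing the whole cache means later calls must list the bucket again.
    """
    fs.invalidate_cache()  # with no argument, clears all cached listings
    return fs.exists(path)


# Hypothetical usage, mirroring the sample code above:
# list(fs.glob(join_path("**/*.txt")))
# print(exists_fresh(fs, join_path("a")))
```

This trades extra listing requests for correctness, so it is a stopgap rather than a fix for the underlying cache behaviour.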

Python==3.11.9
fsspec==2024.5.0
s3fs==2024.5.0
@bbtfr changed the title from "Listings cache issue: s3fs.exists returns False after calling glob" to "s3fs.exists returns False after calling glob" on May 29, 2024
@bbtfr changed the title from "s3fs.exists returns False after calling glob" to "s3fs.exists incorrectly returns False after calling glob" on May 29, 2024