1.9.0 fseek for zstandard file results in io.UnsupportedOperation: underlying stream is not seekable #153

Open
gcflymoto opened this issue Mar 19, 2024 · 7 comments

gcflymoto commented Mar 19, 2024

This can be easily reproduced:

/nfs/site/home/<user>/pyenv3.11.1/bin/python3
Python 3.11.1 (main, Dec  6 2023, 09:00:14) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import xopen, os
>>> f = xopen.xopen("file.zst")
>>> f.seek(0, os.SEEK_END)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/nfs/site/home/<user>/pyenv3.11.1/lib/python3.11/site-packages/xopen/__init__.py", line 476, in seek
    return self._file.seek(offset, whence)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
io.UnsupportedOperation: underlying stream is not seekable

rhpvorderman (Collaborator)

Xopen never guarantees seekability. Xopen streams the zstd-compressed data to the zstd application through pipes, and pipes are not seekable. Xopen also allows stdin as input, which is not seekable either.
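To illustrate why a pipe rules out seeking, here is a minimal stdlib-only sketch (illustration only, not xopen's internal code). It opens both ends of an OS pipe and asks the reading end whether it can seek; `try_seek_to_end` is a hypothetical helper name. The expected output assumes a POSIX system such as Linux, where `lseek()` on a pipe fails.

```python
import io
import os


def try_seek_to_end(f) -> bool:
    """Hypothetical helper: attempt f.seek(0, os.SEEK_END) and report whether it worked."""
    if not f.seekable():
        return False
    try:
        f.seek(0, os.SEEK_END)
    except OSError:  # io.UnsupportedOperation is a subclass of OSError
        return False
    return True


# An OS pipe: bytes written to `w` can only be read once, in order, from `r`.
r, w = os.pipe()
with os.fdopen(r, "rb") as reader, os.fdopen(w, "wb") as writer:
    writer.write(b"compressed bytes would arrive here")
    writer.flush()
    pipe_seekable = reader.seekable()  # False on POSIX: lseek() on a pipe fails with ESPIPE
    pipe_seek_ok = try_seek_to_end(reader)

# An in-memory buffer, by contrast, supports random access.
memory_seek_ok = try_seek_to_end(io.BytesIO(b"plain bytes"))

print(pipe_seekable, pipe_seek_ok, memory_seek_ok)  # False False True
```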

You can pass threads=0 and make sure the zstandard package is installed. That way the file is always passed to zstandard.open, which should allow seeking.
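Because zstandard is a third-party package, the sketch below uses the stdlib gzip module as a stand-in (an assumption: gzip.open plays the role that zstandard.open plays for .zst files; this is not what xopen calls internally). When the file is decompressed in-process rather than by an external program, the returned object reports itself as seekable and accepts a forward seek.

```python
import gzip
import os
import tempfile

# Stand-in assumption: gzip.open plays the role that zstandard.open plays for .zst files.
data = "".join(f"line {i}\n" for i in range(1000))

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "file.txt.gz")
    with gzip.open(path, "wt") as out:
        out.write(data)

    # Decompress in-process: no external program and no pipe involved.
    with gzip.open(path, "rb") as f:
        in_process_seekable = f.seekable()
        f.seek(7000)  # emulated: decompresses and discards the first 7000 bytes
        chunk = f.read(10)

print(in_process_seekable, chunk == data.encode()[7000:7010])  # True True
```

Note that the seek is emulated by decompressing, so it costs as much as reading up to that offset.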

If you are always opening zstandard files, it is almost always a better option to use the zstandard module itself, unless you want the extra speed that xopen offers by piping the data through the zstd application; in that case, however, the stream is guaranteed not to be seekable. (There may be some exceptions on Windows, IIRC.)

gcflymoto (Author) commented Mar 19, 2024

Hi. I tried xopen.xopen("file.zst", threads=0), and the zstandard package is already installed:

zstandard-0.22.0.dist-info/ zstandard/

but I get the same error. Is there any way to check whether the zstandard package was used?

marcelm (Collaborator) commented Mar 19, 2024

You can check the .buffer.raw attribute:

>>> f = xopen.xopen("tests/file.txt.zst", threads=0)
>>> f.buffer.raw
<zstd.ZstdDecompressionReader object at 0x7f7cee2a0f30>
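Not every file object has a `.raw` layer. In the standard io stack, `.buffer` exists on text wrappers and `.raw` on buffered readers, but a text wrapper can also sit directly on an object that has no `.raw`. A defensive way to inspect the layers is sketched below; `describe_layers` is a hypothetical helper, not part of xopen.

```python
import io
import os
import tempfile


def describe_layers(f) -> list:
    """List the type names of a file object's I/O layers, outermost first.

    Follows `.buffer` (present on io.TextIOWrapper) and `.raw` (present on
    io.BufferedReader/BufferedWriter) only where they exist, so it never
    raises AttributeError when a layer is missing.
    """
    layers = [type(f).__name__]
    for attr in ("buffer", "raw"):
        inner = getattr(f, attr, None)
        if inner is None:
            continue
        layers.append(type(inner).__name__)
        f = inner
    return layers


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "file.txt")
    with open(path, "w") as out:
        out.write("hello\n")
    with open(path) as f:
        plain_layers = describe_layers(f)

# A text wrapper placed directly on an unbuffered-style stream has no `.raw` layer.
wrapped_layers = describe_layers(io.TextIOWrapper(io.BytesIO(b"hello\n")))

print(plain_layers)    # ['TextIOWrapper', 'BufferedReader', 'FileIO']
print(wrapped_layers)  # ['TextIOWrapper', 'BytesIO']
```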

It is possible to call f.buffer.raw.seek(100), but you cannot seek backwards, and I strongly suspect that seeking forwards is implemented by simply decompressing the stream until the desired position is reached. That is, it is very inefficient compared to seeking in an uncompressed file. (The same holds for the other supported compression formats.)
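A minimal sketch of that forward-only behaviour for any readable binary stream: read and discard bytes until the target offset is reached, and refuse backward moves. `skip_forward` is a hypothetical helper (not part of xopen or zstandard); the caller tracks the current position itself, since a non-seekable stream may not support tell().

```python
import io


def skip_forward(stream, current: int, target: int, chunk_size: int = 64 * 1024) -> int:
    """Hypothetical helper: advance `stream` from `current` to `target` by discarding bytes.

    Returns the new position, which is smaller than `target` if EOF came first.
    Raises ValueError for backward moves, which would require reopening the file.
    """
    if target < current:
        raise ValueError("cannot seek backwards in a forward-only stream")
    while current < target:
        chunk = stream.read(min(chunk_size, target - current))
        if not chunk:  # EOF before reaching the target
            break
        current += len(chunk)
    return current


stream = io.BytesIO(bytes(range(256)))
pos = skip_forward(stream, 0, 100)
next_bytes = stream.read(3)  # bytes 100..102, i.e. b"def"
```

The cost is proportional to the number of bytes skipped, which is why a forward seek on a compressed stream is much slower than on an uncompressed file.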

Maybe we could try to provide a forward-only seek() more consistently. I previously did not think this was a good idea, because to me seek() implies that seeking is efficient, but I now see that GzipFile, BZ2File, ZstdDecompressionReader and IGzipFile all support seek(), so maybe we should make sure that it also works with files opened through xopen.
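The stdlib part of this list is easy to check. A minimal sketch using gzip.GzipFile and bz2.BZ2File on in-memory data (IGzipFile and ZstdDecompressionReader are third-party and omitted here): both accept a forward seek, and in CPython a backward seek is emulated by rewinding to the start and decompressing again.

```python
import bz2
import gzip
import io

payload = bytes(range(256)) * 40  # 10240 bytes of sample data

readers = {
    "gzip": gzip.GzipFile(fileobj=io.BytesIO(gzip.compress(payload))),
    "bz2": bz2.BZ2File(io.BytesIO(bz2.compress(payload))),
}

results = {}
for name, f in readers.items():
    with f:
        f.seek(5000)   # forward: decompress and discard
        forward = f.read(4)
        f.seek(10)     # backward: rewind to the start, then decompress again
        backward = f.read(4)
        results[name] = (f.seekable(), forward == payload[5000:5004], backward == payload[10:14])

print(results)  # {'gzip': (True, True, True), 'bz2': (True, True, True)}
```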

Is seeking forwards what you need?

gcflymoto (Author) commented Mar 19, 2024

Strangely, I see this:

print(f.buffer.raw)
AttributeError: 'zstd.ZstdDecompressionReader' object has no attribute 'raw'

> Is seeking forwards what you need?

What I'm actually doing is grabbing the last N lines, like tail, but I can make it work with a forward-only seek.
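Since a tail-like read only needs the final N lines, it can be done in a single forward pass with no seeking at all, at the cost of decompressing the whole file (which a forward seek to the end would also do). A minimal sketch using `collections.deque`; gzip is a stdlib stand-in for zstd, and `tail_lines` is a hypothetical helper, not an xopen function. Because it only iterates over lines, it should also work on a text file object returned by xopen.

```python
import collections
import gzip
import io


def tail_lines(lines, n: int) -> list:
    """Hypothetical helper: return the last n lines of an iterable in one forward pass.

    A deque with maxlen=n drops older lines as new ones arrive, so memory use is
    bounded by n lines and the input never needs to be seekable.
    """
    return list(collections.deque(lines, maxlen=n))


# Example: a gzip-compressed text stream built in memory (stand-in for a .zst file).
text = "".join(f"record {i}\n" for i in range(10_000))
compressed = io.BytesIO(gzip.compress(text.encode()))
with gzip.open(compressed, "rt") as f:
    last = tail_lines(f, 3)

print(last)  # ['record 9997\n', 'record 9998\n', 'record 9999\n']
```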

marcelm (Collaborator) commented Mar 19, 2024

I'm not sure what is going on, but it appears you need to use f.buffer.seek instead. But as Ruben said, you can also just use zstandard.open directly for now. This works for me:

python3 -c 'import zstandard; f=zstandard.open("tests/file.txt.zst"); f.seek(10)'

gcflymoto (Author)

Thank you for the workaround.

marcelm (Collaborator) commented Aug 5, 2024

Let me re-open this until we’ve decided whether this is something we should fix.

marcelm reopened this Aug 5, 2024