This repository has been archived by the owner on Dec 11, 2023. It is now read-only.

OSX may break replay #70

Open
2 of 3 tasks
ChrisKeefe opened this issue Apr 6, 2022 · 1 comment
Labels
good first issue Good for newcomers

Comments

@ChrisKeefe
Collaborator

ChrisKeefe commented Apr 6, 2022

When replaying an extracted zip archive of files that was compressed on OSX, parsing fails because the included __MACOSX directory contains non-zip files named ._something.qz*:
[image]

We need to:

  • confirm that this only impacts zip archives that were compressed and then decompressed
  • check whether this behavior also occurs on MacOS - details below
  • fix, possibly by ignoring hidden files, ignoring files within __MACOSX in the fp, removing __MACOSX, or catching BadZipFile("File is not a zip file") errors and looking more closely at them.
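
The first two fix options above (ignoring hidden files and ignoring anything under __MACOSX) could be sketched roughly as follows. This is only an illustration, not provenance_lib's actual code: the function name iter_candidate_archives and the ARCHIVE_SUFFIXES constant are hypothetical, and the sketch assumes that recursion over the input directory can be done with os.walk.

```python
import os
from pathlib import Path
from typing import Iterator

# Hypothetical constant: the suffixes Provenance Replay treats as QIIME 2 Archives.
ARCHIVE_SUFFIXES = {".qza", ".qzv"}


def iter_candidate_archives(root: str) -> Iterator[Path]:
    """Yield .qza/.qzv paths under root, skipping macOS metadata and hidden files."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into __MACOSX or hidden directories.
        dirnames[:] = sorted(
            d for d in dirnames if d != "__MACOSX" and not d.startswith(".")
        )
        for name in sorted(filenames):
            # Skip hidden files, including AppleDouble "._name" companions.
            if name.startswith("."):
                continue
            path = Path(dirpath) / name
            if path.suffix in ARCHIVE_SUFFIXES:
                yield path
```

Pruning dirnames in place is the documented way to stop os.walk from entering a directory, so the false ._something.qza files never reach the parser in this sketch.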
@ChrisKeefe
Collaborator Author

ChrisKeefe commented May 4, 2022

@cherman2 helped diagnose this issue today. Thank you!

  • Provenance Replay assumes that any file suffixed .qza or .qzv is a QIIME 2 Archive
  • the default zip utility used by MacOS Finder (Archive Utility) adds a __MACOSX directory when it zips things. Inside this directory is a collection of files named ._something.qza that are not QIIME 2 Archives.
  • the zip command available from the MacOS terminal does not appear to do this
  • when these problematic zip archives are unzipped on MacOS (via terminal and presumably also Finder), the resulting decompressed directory does not contain a __MACOSX directory.
  • when unzipped by nemo (and possibly other file browsers), the false .qza/.qzv files in the resulting __MACOSX directory break parsing with the confusing error message below. Unzipping in terminal with zip seems to drop the directory.

TLDR: this error only arises when a Mac user has zipped a collection of .qza/.qzv files in their file browser and sent it to a non-MacOS machine, where it was unzipped (probably again in the file browser) for parsing.

chris:~/src/provenance_lib (main)> replay reproducibility-supplement --i-in-fp testfiles-ahhh\ \(1\)/ --o-out-fp whatever.zip --p-recurse
Parsing testfiles-ahhh (1)/testfiles-ahhh/multiplexed-seqs.qza
Parsing testfiles-ahhh (1)/testfiles-ahhh/newdir/demux-paired-end-ahhhh.qza
Parsing testfiles-ahhh (1)/__MACOSX/testfiles-ahhh/._multiplexed-seqs.qza
Traceback (most recent call last):
  File "/home/chris/miniconda3/envs/q2-22.2/bin/replay", line 33, in <module>
    sys.exit(load_entry_point('provenance-lib', 'console_scripts', 'replay')())
  File "/home/chris/miniconda3/envs/q2-22.2/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/chris/miniconda3/envs/q2-22.2/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/chris/miniconda3/envs/q2-22.2/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/chris/miniconda3/envs/q2-22.2/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/chris/miniconda3/envs/q2-22.2/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/chris/src/provenance_lib/provenance_lib/click_commands.py", line 184, in reproducibility_supplement
    write_reproducibility_supplement(
  File "/home/chris/src/provenance_lib/provenance_lib/replay.py", line 722, in write_reproducibility_supplement
    dag = ProvDAG(artifact_data=payload, validate_checksums=validate_checksums,
  File "/home/chris/src/provenance_lib/provenance_lib/parse.py", line 108, in __init__
    parser_results = parse_provenance(cfg, artifact_data)
  File "/home/chris/src/provenance_lib/provenance_lib/parse.py", line 443, in parse_provenance
    return parser.parse_prov(cfg, payload)
  File "/home/chris/src/provenance_lib/provenance_lib/parse.py", line 382, in parse_prov
    with zipfile.ZipFile(archive) as zf:
  File "/home/chris/miniconda3/envs/q2-22.2/lib/python3.8/zipfile.py", line 1269, in __init__
    self._RealGetContents()
  File "/home/chris/miniconda3/envs/q2-22.2/lib/python3.8/zipfile.py", line 1336, in _RealGetContents
    raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file
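
The traceback above ends in zipfile.BadZipFile, which is the error the last fix option proposes catching. A minimal sketch of that idea, assuming the caller already has a list of candidate paths: the helper name split_valid_archives is hypothetical and is not part of provenance_lib. It separates files that open as zip archives from those that do not, so the caller can report the skipped ones with a clearer message instead of failing.

```python
import zipfile
from pathlib import Path
from typing import List, Tuple


def split_valid_archives(paths: List[Path]) -> Tuple[List[Path], List[Path]]:
    """Return (valid, skipped): paths that open as zip files and paths that do not."""
    valid: List[Path] = []
    skipped: List[Path] = []
    for path in paths:
        try:
            # Opening the file reads the zip central directory; non-zip files fail here.
            with zipfile.ZipFile(path) as zf:
                zf.namelist()
        except zipfile.BadZipFile:
            # e.g. a "._name.qza" file carrying macOS metadata rather than an archive
            skipped.append(path)
        else:
            valid.append(path)
    return valid, skipped
```

Whether skipped files should produce a warning or a hard error is a design choice for the project. This sketch only makes the distinction available to the caller.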

@ChrisKeefe ChrisKeefe added the good first issue Good for newcomers label May 4, 2022