
Deal with compression and multiprocessing in pmincaverage #18

Open
mcvaneede opened this issue Oct 8, 2015 · 3 comments
Comments

@mcvaneede
Member
In commit ab9d267, multiprocessing in pmincaverage was turned off because it sometimes crashed on compressed files, with error messages along the following lines:

SLICE: 0
SLICE: 10
SLICE: 20
SLICE: 30
SLICE: 40
SLICE: 50
SLICE: 60
SLICE: 70
SLICE: 80
SLICE: 90
SLICE: 100
SLICE: 110
SLICE: 120
SLICE: 130
HDF5-DIAG: Error detected in HDF5 (1.8.10-patch1) thread 0:
#000: H5Dio.c line 174 in H5Dread(): can't read data
major: Dataset
minor: Read failed
#1: H5Dio.c line 449 in H5D__read(): can't read data
major: Dataset
minor: Read failed
#2: H5Dchunk.c line 1735 in H5D__chunk_read(): unable to read raw data chunk
major: Low-level I/O
minor: Read failed
#3: H5Dchunk.c line 2766 in H5D__chunk_lock(): data pipeline read failed
major: Data filters
minor: Filter operation failed
#4: H5Z.c line 1120 in H5Z_pipeline(): filter returned failure during read
major: Data filters
minor: Read failed
#5: H5Zdeflate.c line 125 in H5Z_filter_deflate(): inflate() failed
major: Data filters
minor: Unable to initialize object
/projects/mice/share/arch/linux-3_2_0-36-generic-x86_64-eglibc-2_15/src/minc-toolkit/libminc/libsrc2/hyper.c:794 (from MINC2): HDF5 function H5Dread failed
HDF5-DIAG: Error detected in HDF5 (1.8.10-patch1) thread 0:
#000: H5Dio.c line 174 in H5Dread(): can't read data
major: Dataset
minor: Read failed
#1: H5Dio.c line 449 in H5D__read(): can't read data
major: Dataset
minor: Read failed
#2: H5Dchunk.c line 1735 in H5D__chunk_read(): unable to read raw data chunk
major: Low-level I/O
minor: Read failed
#3: H5Dchunk.c line 2766 in H5D__chunk_lock(): data pipeline read failed
major: Data filters
minor: Filter operation failed
#4: H5Z.c line 1120 in H5Z_pipeline(): filter returned failure during read
major: Data filters
minor: Read failed
#5: H5Zdeflate.c line 125 in H5Z_filter_deflate(): inflate() failed
major: Data filters
minor: Unable to initialize object
/projects/mice/share/arch/linux-3_2_0-36-generic-x86_64-eglibc-2_15/src/minc-toolkit/libminc/libsrc2/hyper.c:794 (from MINC2): HDF5 function H5Dread failed
Traceback (most recent call last):
File "/axiom2/projects/software/arch/linux-precise/bin/pmincaverage", line 5, in <module>
pkg_resources.run_script('python-stuffs==0.1.10', 'pmincaverage')
File "/axiom2/projects/software/arch/linux-precise/python/distribute-0.6.28-py2.7.egg/pkg_resources.py", line 499, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/axiom2/projects/software/arch/linux-precise/python/distribute-0.6.28-py2.7.egg/pkg_resources.py", line 1239, in run_script
execfile(script_filename, namespace, namespace)
File "/axiom2/projects/software/arch/linux-3_2_0-36-generic-x86_64-eglibc-2_15/python/python_stuffs-0.1.10-py2.7-linux-x86_64.egg/EGG-INFO/scripts/pmincaverage", line 103, in <module>
t.get() # seems to be necessary as a sort of waitpid call ...
File "/usr/lib/python2.7/multiprocessing/pool.py", line 528, in get
raise self._value
pyminc.volumes.volumes.mincException

If we want to turn it back on, we need to get to the bottom of why this is happening.

@bcdarwin
Member

bcdarwin commented Oct 9, 2015

Perhaps the libminc functionality underlying getHyperslab is not properly thread-safe? I haven't tested this, but that would be a bit surprising ...

It seems to happen too late to be the problem, but I'm also suspicious of the queue handling: we only wait for one slice (the one for the last file) to be added to the queue before we start taking slices from the queue to do the averaging. If we replaced the for loop with map_async (since we were, hopefully, loading everything into memory in the queue anyway) or imap, then once get has returned, all results would be available.
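A minimal sketch of the imap idea, with a stand-in loader in place of the real per-file hyperslab read (load_slice and load_all are illustrative names, not pmincaverage's actual code):

```python
import multiprocessing

def load_slice(args):
    # Hypothetical stand-in for the real per-file read (getHyperslab in
    # pyminc); returns (file_index, slice_data).
    file_index, slice_index = args
    return (file_index, [float(slice_index)] * 3)

def load_all(nfiles, slice_index):
    # imap yields exactly one result per input, in submission order, so
    # each file's slice is consumed exactly once; there is no shared
    # queue whose fill level we have to guess at before averaging.
    with multiprocessing.Pool(processes=2) as pool:
        return list(pool.imap(load_slice,
                              [(j, slice_index) for j in range(nfiles)]))

if __name__ == "__main__":
    print(load_all(3, 10))
```

Because the pool still runs the reads in separate worker processes, this keeps the parallelism while removing the hand-rolled queue bookkeeping.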

@bcdarwin
Member

bcdarwin commented Oct 9, 2015

Our HDF5 builds (MICe/SciNet) aren't thread-safe (see here); I suspect that's the problem.

Interestingly, the problem persists even if you get rid of the queue and do everything 'synchronously' using a process pool's apply method with a pool of more than one process. Perhaps this isn't a "real" race condition but merely has to do with multiple threads making libminc/HDF5 calls? I'm not sure whether the issue is at the pyminc, libminc, or HDF5 level.

My other suspicions still stand, and I'd suggest something like the following (we can't use multiprocessing's map due to pickling restrictions; I'm unsure why apply_async is different), which doesn't introduce any additional nondeterminism:

def getslice(volhandle, slice, nslices):
    # ...
    return (volhandle, t)

# ...

        futures = [p.apply_async(getslice, (j, i, nslices)) for j in range(nfiles)]
        results = [f.get() for f in futures]

        for ix, t in results:
            # ...
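Expanded into a self-contained sketch of that apply_async pattern (the getslice body here is a hypothetical stand-in; the real one reads a hyperslab from the volume via pyminc):

```python
import multiprocessing

def getslice(volhandle, slice_index, width):
    # Hypothetical stand-in: fabricate recognizable per-file data in
    # place of a real hyperslab read from volume `volhandle`.
    return (volhandle, [float(volhandle + slice_index)] * width)

def average_one_slice(nfiles, slice_index, width):
    with multiprocessing.Pool(processes=2) as pool:
        # Submit one task per file, then block on each AsyncResult in
        # submission order: results arrive deterministically, exactly as
        # in the serial code, regardless of which worker finishes first.
        futures = [pool.apply_async(getslice, (j, slice_index, width))
                   for j in range(nfiles)]
        results = [f.get() for f in futures]
    total = [0.0] * width
    for _volhandle, data in results:
        for i, v in enumerate(data):
            total[i] += v
    return [v / nfiles for v in total]

if __name__ == "__main__":
    print(average_one_slice(3, 10, 4))
```

Note that getslice must be a module-level function so the worker processes can pickle a reference to it; that is one common cause of the pickling failures seen with map.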

@gdevenyi

Just got bitten by this. I fixed it by moving pmincaverage aside and re-creating it as a wrapper around mincaverage.
