
[Bug]: Exception reading linked NWBFile with TimeSeriesReference in NWBFile.trials #1863

Closed
3 tasks done
miketrumpis opened this issue Mar 19, 2024 · 10 comments · Fixed by #1865
Labels
category: bug errors in the code or code behavior


@miketrumpis

What happened?

Dataset linking breaks after adding a TimeSeriesReference to the NWB trials TimeIntervals table; the result is a failure to read the NWBFile.
It seems that the constructor in pynwb.io.epoch.TimeIntervalsMap is handed a LinkBuilder but doesn't know how to handle it.

Without knowing any details, it appears that LinkBuilder.builder is really what this constructor is looking for when a link is present:

# this is the case for NWB schema v2.4 and earlier, where the timeseries column was a regular VectorData.
timeseries_builder = builder.get('timeseries')
# if we have a timeseries column and the type is VectorData instead of TimeSeriesReferenceVectorData
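For what it's worth, the unwrap I have in mind would look something like the sketch below. DatasetBuilder and LinkBuilder here are simplified stand-ins, not the real hdmf builder classes, and resolve_timeseries_builder is a hypothetical helper, just to illustrate the idea:

```python
# Minimal stand-ins for hdmf's builder classes -- NOT the real implementation,
# just enough structure to illustrate the proposed unwrap.
class DatasetBuilder:
    def __init__(self, attributes):
        self.attributes = attributes

class LinkBuilder:
    def __init__(self, builder):
        self.builder = builder  # the builder being linked to

def resolve_timeseries_builder(timeseries_builder):
    """Unwrap a LinkBuilder so downstream code can safely touch .attributes."""
    if isinstance(timeseries_builder, LinkBuilder):
        timeseries_builder = timeseries_builder.builder
    return timeseries_builder

# A direct column exposes .attributes; a linked column wraps the same thing.
direct = DatasetBuilder({'neurodata_type': 'VectorData'})
linked = LinkBuilder(direct)

assert resolve_timeseries_builder(direct).attributes['neurodata_type'] == 'VectorData'
assert resolve_timeseries_builder(linked).attributes['neurodata_type'] == 'VectorData'
```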

Seems pretty similar to #1223, but I'm reporting a new bug since this involves pynwb.io.epoch rather than pynwb.io.misc.

Steps to Reproduce

from datetime import datetime
from uuid import uuid4
import os
import numpy as np
from dateutil.tz import tzlocal

from pynwb import NWBFile, TimeSeries, NWBHDF5IO

# create the NWBFile
new_nwb = NWBFile(
    session_description="my first synthetic recording",  # required
    identifier=str(uuid4()),  # required
    session_start_time=datetime(2017, 4, 3, 11, tzinfo=tzlocal()),  # required
    experimenter="Baggins, Bilbo",  # optional
    lab="Bag End Laboratory",  # optional
    institution="University of Middle Earth at the Shire",  # optional
    experiment_description="I went on an adventure with thirteen dwarves to reclaim vast treasures.",  # optional
    session_id="LONELYMTN",  # optional
)
# create some example TimeSeries
test_ts = TimeSeries(
    name="series1",
    data=np.arange(1000),
    unit="m",
    timestamps=np.linspace(0.5, 601, 1000),
)
rate_ts = TimeSeries(
    name="series2", data=np.arange(600), unit="V", starting_time=0.0, rate=1.0
)
# Add the TimeSeries to the file
new_nwb.add_acquisition(test_ts)
new_nwb.add_acquisition(rate_ts)

new_nwb.add_trial(start_time=1.0, stop_time=2.0, timeseries=[test_ts])
new_nwb.add_trial(start_time=10.0, stop_time=11.0, timeseries=[test_ts, rate_ts])
new_nwb.add_trial(start_time=30.0, stop_time=31.0, timeseries=[test_ts, rate_ts])

with NWBHDF5IO('test_a.h5', 'w') as io:
    io.write(new_nwb)

with NWBHDF5IO('test_a.h5', 'r') as base_io:
    nwb_add = base_io.read().copy()
    new_nwb.add_acquisition(TimeSeries(name='series3', unit='V', data=np.arange(300), rate=0.5))
    with NWBHDF5IO('test_b.h5', 'w', manager=base_io.manager) as out_io:
        out_io.write(nwb_add)

try:
    with NWBHDF5IO('test_a.h5', 'r') as io:
        nwb = io.read()
        print(len(nwb.trials))
    with NWBHDF5IO('test_b.h5', 'r') as io:
        nwb = io.read()
        print(len(nwb.trials))
finally:
    os.unlink('test_a.h5')
    os.unlink('test_b.h5')

Traceback

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[1], line 52
     50         print(len(nwb.trials))
     51     with NWBHDF5IO('test_b.h5', 'r') as io:
---> 52         nwb = io.read()
     53         print(len(nwb.trials))
     54 finally:

File ~/.pyenv/versions/miniconda3-latest/envs/dataproc-dev/lib/python3.9/site-packages/hdmf/utils.py:664, in docval.<locals>.dec.<locals>.func_call(*args, **kwargs)
    662 def func_call(*args, **kwargs):
    663     pargs = _check_args(args, kwargs)
--> 664     return func(args[0], **pargs)

File ~/.pyenv/versions/miniconda3-latest/envs/dataproc-dev/lib/python3.9/site-packages/pynwb/__init__.py:304, in NWBHDF5IO.read(self, **kwargs)
    301         raise TypeError("NWB version %s not supported. PyNWB supports NWB files version 2 and above." %
    302                         str(file_version_str))
    303 # read the file
--> 304 file = super().read(**kwargs)
    305 return file

File ~/.pyenv/versions/miniconda3-latest/envs/dataproc-dev/lib/python3.9/site-packages/hdmf/backends/hdf5/h5tools.py:479, in HDF5IO.read(self, **kwargs)
    476     raise UnsupportedOperation("Cannot read from file %s in mode '%s'. Please use mode 'r', 'r+', or 'a'."
    477                                % (self.source, self.__mode))
    478 try:
--> 479     return super().read(**kwargs)
    480 except UnsupportedOperation as e:
    481     if str(e) == 'Cannot build data. There are no values.':  # pragma: no cover

File ~/.pyenv/versions/miniconda3-latest/envs/dataproc-dev/lib/python3.9/site-packages/hdmf/utils.py:664, in docval.<locals>.dec.<locals>.func_call(*args, **kwargs)
    662 def func_call(*args, **kwargs):
    663     pargs = _check_args(args, kwargs)
--> 664     return func(args[0], **pargs)

File ~/.pyenv/versions/miniconda3-latest/envs/dataproc-dev/lib/python3.9/site-packages/hdmf/backends/io.py:60, in HDMFIO.read(self, **kwargs)
     57 if all(len(v) == 0 for v in f_builder.values()):
     58     # TODO also check that the keys are appropriate. print a better error message
     59     raise UnsupportedOperation('Cannot build data. There are no values.')
---> 60 container = self.__manager.construct(f_builder)
     61 container.read_io = self
     62 if self.herd_path is not None:

File ~/.pyenv/versions/miniconda3-latest/envs/dataproc-dev/lib/python3.9/site-packages/hdmf/utils.py:664, in docval.<locals>.dec.<locals>.func_call(*args, **kwargs)
    662 def func_call(*args, **kwargs):
    663     pargs = _check_args(args, kwargs)
--> 664     return func(args[0], **pargs)

File ~/.pyenv/versions/miniconda3-latest/envs/dataproc-dev/lib/python3.9/site-packages/hdmf/build/manager.py:284, in BuildManager.construct(self, **kwargs)
    280     result = self.__type_map.construct(builder, self, parent)
    281 else:
    282     # we are at the top of the hierarchy,
    283     # so it must be time to resolve parents
--> 284     result = self.__type_map.construct(builder, self, None)
    285     self.__resolve_parents(result)
    286 self.prebuilt(result, builder)

File ~/.pyenv/versions/miniconda3-latest/envs/dataproc-dev/lib/python3.9/site-packages/hdmf/utils.py:664, in docval.<locals>.dec.<locals>.func_call(*args, **kwargs)
    662 def func_call(*args, **kwargs):
    663     pargs = _check_args(args, kwargs)
--> 664     return func(args[0], **pargs)

File ~/.pyenv/versions/miniconda3-latest/envs/dataproc-dev/lib/python3.9/site-packages/hdmf/build/manager.py:795, in TypeMap.construct(self, **kwargs)
    793     raise ValueError('No ObjectMapper found for builder of type %s' % dt)
    794 else:
--> 795     return obj_mapper.construct(builder, build_manager, parent)

File ~/.pyenv/versions/miniconda3-latest/envs/dataproc-dev/lib/python3.9/site-packages/hdmf/utils.py:664, in docval.<locals>.dec.<locals>.func_call(*args, **kwargs)
    662 def func_call(*args, **kwargs):
    663     pargs = _check_args(args, kwargs)
--> 664     return func(args[0], **pargs)

File ~/.pyenv/versions/miniconda3-latest/envs/dataproc-dev/lib/python3.9/site-packages/hdmf/build/objectmapper.py:1228, in ObjectMapper.construct(self, **kwargs)
   1226 cls = manager.get_cls(builder)
   1227 # gather all subspecs
-> 1228 subspecs = self.__get_subspec_values(builder, self.spec, manager)
   1229 # get the constructor argument that each specification corresponds to
   1230 const_args = dict()

File ~/.pyenv/versions/miniconda3-latest/envs/dataproc-dev/lib/python3.9/site-packages/hdmf/build/objectmapper.py:1157, in ObjectMapper.__get_subspec_values(self, builder, spec, manager)
   1155                 ret[subspec] = self.__flatten(sub_builder, subspec, manager)
   1156     # now process groups and datasets
-> 1157     self.__get_sub_builders(groups, spec.groups, manager, ret)
   1158     self.__get_sub_builders(datasets, spec.datasets, manager, ret)
   1159 elif isinstance(spec, DatasetSpec):

File ~/.pyenv/versions/miniconda3-latest/envs/dataproc-dev/lib/python3.9/site-packages/hdmf/build/objectmapper.py:1208, in ObjectMapper.__get_sub_builders(self, sub_builders, subspecs, manager, ret)
   1205     continue
   1206 if dt is None:
   1207     # recurse
-> 1208     ret.update(self.__get_subspec_values(sub_builder, subspec, manager))
   1209 else:
   1210     ret[subspec] = manager.construct(sub_builder)

File ~/.pyenv/versions/miniconda3-latest/envs/dataproc-dev/lib/python3.9/site-packages/hdmf/build/objectmapper.py:1157, in ObjectMapper.__get_subspec_values(self, builder, spec, manager)
   1155                 ret[subspec] = self.__flatten(sub_builder, subspec, manager)
   1156     # now process groups and datasets
-> 1157     self.__get_sub_builders(groups, spec.groups, manager, ret)
   1158     self.__get_sub_builders(datasets, spec.datasets, manager, ret)
   1159 elif isinstance(spec, DatasetSpec):

File ~/.pyenv/versions/miniconda3-latest/envs/dataproc-dev/lib/python3.9/site-packages/hdmf/build/objectmapper.py:1210, in ObjectMapper.__get_sub_builders(self, sub_builders, subspecs, manager, ret)
   1208     ret.update(self.__get_subspec_values(sub_builder, subspec, manager))
   1209 else:
-> 1210     ret[subspec] = manager.construct(sub_builder)

File ~/.pyenv/versions/miniconda3-latest/envs/dataproc-dev/lib/python3.9/site-packages/hdmf/utils.py:664, in docval.<locals>.dec.<locals>.func_call(*args, **kwargs)
    662 def func_call(*args, **kwargs):
    663     pargs = _check_args(args, kwargs)
--> 664     return func(args[0], **pargs)

File ~/.pyenv/versions/miniconda3-latest/envs/dataproc-dev/lib/python3.9/site-packages/hdmf/build/manager.py:280, in BuildManager.construct(self, **kwargs)
    278 if parent_builder is not None:
    279     parent = self._get_proxy_builder(parent_builder)
--> 280     result = self.__type_map.construct(builder, self, parent)
    281 else:
    282     # we are at the top of the hierarchy,
    283     # so it must be time to resolve parents
    284     result = self.__type_map.construct(builder, self, None)

File ~/.pyenv/versions/miniconda3-latest/envs/dataproc-dev/lib/python3.9/site-packages/hdmf/utils.py:664, in docval.<locals>.dec.<locals>.func_call(*args, **kwargs)
    662 def func_call(*args, **kwargs):
    663     pargs = _check_args(args, kwargs)
--> 664     return func(args[0], **pargs)

File ~/.pyenv/versions/miniconda3-latest/envs/dataproc-dev/lib/python3.9/site-packages/hdmf/build/manager.py:795, in TypeMap.construct(self, **kwargs)
    793     raise ValueError('No ObjectMapper found for builder of type %s' % dt)
    794 else:
--> 795     return obj_mapper.construct(builder, build_manager, parent)

File ~/.pyenv/versions/miniconda3-latest/envs/dataproc-dev/lib/python3.9/site-packages/hdmf/utils.py:664, in docval.<locals>.dec.<locals>.func_call(*args, **kwargs)
    662 def func_call(*args, **kwargs):
    663     pargs = _check_args(args, kwargs)
--> 664     return func(args[0], **pargs)

File ~/.pyenv/versions/miniconda3-latest/envs/dataproc-dev/lib/python3.9/site-packages/hdmf/build/objectmapper.py:1249, in ObjectMapper.construct(self, **kwargs)
   1247 for const_arg in get_docval(cls.__init__):
   1248     argname = const_arg['name']
-> 1249     override = self.__get_override_carg(argname, builder, manager)
   1250     if override is not None:
   1251         val = override

File ~/.pyenv/versions/miniconda3-latest/envs/dataproc-dev/lib/python3.9/site-packages/hdmf/build/objectmapper.py:532, in ObjectMapper.__get_override_carg(self, *args)
    530     self.logger.debug("        Calling override function for constructor argument '%s'" % name)
    531     func = self.constructor_args[name]
--> 532     return func(self, *remaining_args)
    533 return None

File ~/.pyenv/versions/miniconda3-latest/envs/dataproc-dev/lib/python3.9/site-packages/pynwb/io/epoch.py:20, in TimeIntervalsMap.columns_carg(self, builder, manager)
     17 timeseries_builder = builder.get('timeseries')
     18 # if we have a timeseries column and the type is VectorData instead of TimeSeriesReferenceVectorData
     19 if (timeseries_builder is not None and
---> 20         timeseries_builder.attributes['neurodata_type'] != 'TimeSeriesReferenceVectorData'):
     21     # override builder attributes
     22     timeseries_builder.attributes['neurodata_type'] = 'TimeSeriesReferenceVectorData'
     23     timeseries_builder.attributes['namespace'] = 'core'

AttributeError: 'LinkBuilder' object has no attribute 'attributes'

Operating System

macOS

Python Executable

Conda

Python Version

3.9

Package Versions

pynwb 2.5.0 pyh267d04e_0 conda-forge
hdmf 3.10.0 pyh1ea47a8_0 conda-forge


oruebel commented Mar 19, 2024

with NWBHDF5IO('test_a.h5', 'r') as base_io:
    nwb_add = base_io.read().copy()
    new_nwb.add_acquisition(TimeSeries(name='series3', unit='V', data=np.arange(300), rate=0.5))
    with NWBHDF5IO('test_b.h5', 'w', manager=base_io.manager) as out_io:
        out_io.write(nwb_add)

Could you help me understand this part of the example? It's unclear to me why add_acquisition is called on new_nwb but the write is then done for nwb_add. I think what may be happening is that because new_nwb.add_acquisition is called on new_nwb and new_nwb is not saved afterwards, this may create a broken link. Also, I don't think that calling copy on the container is the right approach here. I think using export is probably more appropriate, see https://pynwb.readthedocs.io/en/stable/export.html. I think something along the following lines:

with NWBHDF5IO('test_a.h5', 'r') as base_io:
    nwb_add = base_io.read()
    nwb_add.add_acquisition(TimeSeries(name='series3', unit='V', data=np.arange(300), rate=0.5))
    with NWBHDF5IO('test_b.h5', 'w', manager=base_io.manager) as out_io:
        out_io.export(src_io=base_io, nwbfile=nwb_add)

@oruebel oruebel added the category: bug errors in the code or code behavior label Mar 19, 2024

miketrumpis commented Mar 19, 2024

Whoops, that was a mistake. But even removing that line entirely, i.e. writing the copied NWBFile, leads to the same result.

And yes, your suggestion does maintain the (or at least an equivalent) timeseries link.


oruebel commented Mar 19, 2024

I believe the main issue is probably out_io.write(nwb_add). The write method is intended for writing a new file. When writing an existing file to a new target, you need to use export instead to make sure that links are resolved correctly against the new target. Using write will try to write only the parts that are new to the file (in your case series3), but when writing to a new file you also want to copy (or at least link) the existing content of the file.

Here is an updated version of your script that uses export instead. At least on my laptop this works:

from datetime import datetime
from uuid import uuid4
import os
import numpy as np
from dateutil.tz import tzlocal

from pynwb import NWBFile, TimeSeries, NWBHDF5IO

# create the NWBFile
new_nwb = NWBFile(
    session_description="my first synthetic recording",  # required
    identifier=str(uuid4()),  # required
    session_start_time=datetime(2017, 4, 3, 11, tzinfo=tzlocal()),  # required
    experimenter="Baggins, Bilbo",  # optional
    lab="Bag End Laboratory",  # optional
    institution="University of Middle Earth at the Shire",  # optional
    experiment_description="I went on an adventure with thirteen dwarves to reclaim vast treasures.",  # optional
    session_id="LONELYMTN",  # optional
)
# create some example TimeSeries
test_ts = TimeSeries(
    name="series1",
    data=np.arange(1000),
    unit="m",
    timestamps=np.linspace(0.5, 601, 1000),
)
rate_ts = TimeSeries(
    name="series2", data=np.arange(600), unit="V", starting_time=0.0, rate=1.0
)
# Add the TimeSeries to the file
new_nwb.add_acquisition(test_ts)
new_nwb.add_acquisition(rate_ts)

new_nwb.add_trial(start_time=1.0, stop_time=2.0, timeseries=[test_ts])
new_nwb.add_trial(start_time=10.0, stop_time=11.0, timeseries=[test_ts, rate_ts])
new_nwb.add_trial(start_time=30.0, stop_time=31.0, timeseries=[test_ts, rate_ts])

with NWBHDF5IO('test_a.h5', 'w') as io:
    io.write(new_nwb)

with NWBHDF5IO('test_a.h5', 'r') as io:
    nwb = io.read()
    print(len(nwb.trials))

with NWBHDF5IO('test_a.h5', 'r') as base_io:
    nwb_add = base_io.read()
    nwb_add.add_acquisition(TimeSeries(name='series3', unit='V', data=np.arange(300), rate=0.5))
    with NWBHDF5IO('test_b.h5', 'w', manager=base_io.manager) as out_io:
        out_io.export(src_io=base_io, nwbfile=nwb_add)

with NWBHDF5IO('test_a.h5', 'r') as io:
    nwb = io.read()
    print(len(nwb.trials))
with NWBHDF5IO('test_b.h5', 'r') as io:
    nwb = io.read()
    print(len(nwb.trials))

@miketrumpis

Thank you for the suggestions. I'm glad to know there is a way to make this work.

As for the bug itself, I use NWBFile.copy combined with NWBHDF5IO.write in practice for linking elements from a parent file. For example, the original MWE (modulo the typo you pointed out) runs just fine if add_trial does not include links to timeseries.


oruebel commented Mar 21, 2024

As for the bug issue, I am using NWBFile.copy combined with NWBHDF5IO.write in practice for linking elements from a parent file. For example, the original MWE (modulo the typo you pointed out) runs just fine if add_trial does not have links to timeseries.

@rly could you take a look. Is using NWBFile.copy combined with NWBHDF5IO.write in this way intended usage or should this use export?


rly commented Mar 22, 2024

@oruebel To make a shallow copy, then yes, one should use NWBFile.copy combined with NWBHDF5IO.write. This will copy the TimeIntervals table, but the columns are links to the columns in the original file. (Note that these links rely on relative paths, so the files need to stay in the same location relative to each other for the links to work.)
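The relative-path caveat can be illustrated without HDF5 at all: a link stored as a path relative to the referring file resolves against that file's directory, so moving either file independently changes what the stored path points at. (resolve_link below is a hypothetical helper for illustration, not pynwb or h5py API.)

```python
from pathlib import Path

def resolve_link(referring_file: str, stored_relative_target: str) -> Path:
    # An external link stored as a relative path is resolved against
    # the directory of the file that contains the link.
    return Path(referring_file).parent / stored_relative_target

# test_b.h5 stores a link to test_a.h5 as the relative path "test_a.h5"
assert resolve_link("/data/session/test_b.h5", "test_a.h5") == Path("/data/session/test_a.h5")
# Move test_b.h5 alone and the same stored path now points somewhere else
assert resolve_link("/data/moved/test_b.h5", "test_a.h5") == Path("/data/moved/test_a.h5")
```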

@miketrumpis Thanks for the thorough bug report! You are correct about the fix. I created a PR in #1865 with that fix.

@rly rly self-assigned this Mar 22, 2024

oruebel commented Mar 22, 2024

Thanks @rly for taking a look at this.

@rly rly closed this as completed in #1865 Mar 22, 2024

rly commented Mar 22, 2024

@miketrumpis could you please install the latest dev branch of this github repo and see if that solves the issue for you?

@miketrumpis

@rly excuse the delay, but yes, it's all good here:

270291ad (HEAD -> dev, tag: latest, origin/dev, origin/HEAD) Update GitHub release checklist (#1872)

I don't know if there is a good way, like __eq__, to compare the timeseries, but they look functionally equal!
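For the record, the kind of check I did by hand is roughly this. It's a hypothetical field-by-field helper, not pynwb API (TimeSeries has no __eq__ as far as I can tell), shown here on SimpleNamespace stand-ins:

```python
from types import SimpleNamespace

def timeseries_look_equal(a, b):
    """Hypothetical helper (not pynwb API): two series are 'functionally
    equal' if name, unit, data, and timing information all match."""
    if (a.name, a.unit) != (b.name, b.unit):
        return False
    if list(a.data) != list(b.data):
        return False
    ts_a = getattr(a, 'timestamps', None)
    ts_b = getattr(b, 'timestamps', None)
    if ts_a is not None or ts_b is not None:
        # explicit-timestamps series: both must have matching timestamps
        return ts_a is not None and ts_b is not None and list(ts_a) == list(ts_b)
    # regularly sampled series: compare starting_time and rate instead
    return (a.starting_time, a.rate) == (b.starting_time, b.rate)

s1 = SimpleNamespace(name='series2', unit='V', data=range(600),
                     timestamps=None, starting_time=0.0, rate=1.0)
s2 = SimpleNamespace(name='series2', unit='V', data=range(600),
                     timestamps=None, starting_time=0.0, rate=1.0)
assert timeseries_look_equal(s1, s2)
```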


rly commented Mar 27, 2024

Great, thank you!
