Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: DTR validation assumes table is set and raises warning aggressively #962

Open
3 tasks done
rly opened this issue Oct 6, 2023 · 5 comments
Open
3 tasks done
Assignees
Labels
category: bug errors in the code or code behavior priority: medium non-critical problem and/or affecting only a small set of users

Comments

@rly
Copy link
Contributor

rly commented Oct 6, 2023

What happened?

See catalystneuro/ndx-photometry#8

Using HDMF 3.7.0, when a table T1 containing a DynamicTableRegion R is added to another container T2, HDMF checks whether DynamicTableRegion R references a table that shares an ancestor with T2 to warn the user that the referenced table hasn't been added to the file hierarchy.

This check assumes that R.table has been set, which is not always the case on initialization. An error is raised when it is not yet set. We should check for that case.

Also when T1 is added to T2, it is not always the case that R.table should have always been added to an ancestor of T2. The warning seems premature and aggressively requires a particular order of operations that I think is awkward. See my changes in https://github.com/NeurodataWithoutBorders/pynwb/pull/1778/files . For example, to create objects in PyNWB before adding them to an NWBFile, you have to do something like:

        pm = ProcessingModule(...)
        ps = create_plane_segmentation()
        pm.add(ps)
        ff = Fluorescence()
        pm.add(ff)
        rt_region = ps.create_roi_table_region(...)
        rrs = RoiResponseSeries(...)
        ff.add_roi_response_series(rrs)

instead of, which used to work without a warning and feels more natural.

        pm = ProcessingModule(...)
        ps = create_plane_segmentation()
        rt_region = ps.create_roi_table_region(...)
        rrs = RoiResponseSeries(...)
        ff = Fluorescence(rrs)  # create a Fluorescence obj and immediately add rrs - this now raises a warning
        pm.add(ps)
        pm.add(ff)

I think we should relax this check so that it only runs when adding objects to the top-level File object. To accommodate that, we could either create a new class for HdmfFile that another class like NWBFile can inherit from, or add a new class variable __is_root_type on Container that indicates whether the class represents a top-level File object that NWBFile and any other top-level class from an HDMF extension can set to True. Option 1 supports code reuse but there's no code to put in HdmfFile. Option 2 is more flexible and avoids edge cases that can occur when generating classes with multiple inheritance, so I prefer Option 2.

Steps to Reproduce

See https://github.com/catalystneuro/ndx-photometry/issues/8

Traceback

No response

Operating System

macOS

Python Executable

Conda

Python Version

3.11

Package Versions

No response

Code of Conduct

@rly rly changed the title [Bug]: [Bug]: DTR validation assumes table is set and raises warning aggressively Oct 6, 2023
@rly rly self-assigned this Oct 6, 2023
@oruebel
Copy link
Contributor

oruebel commented Oct 6, 2023

add a new class variable __is_root_type on Container that indicates whether the class represents a top-level File object that NWBFile

In principle, any type can be a root, the only restriction, I believe is that the name most be root. As such, rather than __is_root_type on a class level, I would suggest to maybe just have a __is_root as an object property that is set in __init__. We could add an optional is_root argument on AbstractContainer that is false by default and if true, then we could also enforce that the name is set to "root", which would help avoid other issues as well. Alternatively, we could also just use the check if name == 'root', which I think would work too, but is a bit more hacky, because it doesn't allow for containers to be named root while not being the root and it introduces and implicit convention

@oruebel
Copy link
Contributor

oruebel commented Oct 6, 2023

check if name == 'root'

That being said, this is the logic that at least ZarrIO is currently using:

https://github.com/hdmf-dev/hdmf-zarr/blob/70bf35b60ed2c2eaae7b12080bc1f4cc3d89ba3e/src/hdmf_zarr/backend.py#L545

However, having an explicit variable to indicate whether a container is the root would be useful. I don't think such a change to the API would affect many users, but it would likely affect a lot of tests in HDMF (and possibly in PyNWB).

@rly
Copy link
Contributor Author

rly commented Oct 6, 2023

I like the idea of passing it in to the constructor. That is effectively how LinkML uses their tree_root property and it makes sense.

Also, the root object doesn't have a name on disk. The name is set to "root" for the root container in memory on read. On write, the name of the root container is currently ignored and just happens to be set to "root" in the case of NWBFile.

That being said, this is the logic that at least ZarrIO is currently using:

hdmf-dev/hdmf-zarr@70bf35b/src/hdmf_zarr/backend.py#L545

So ZarrIO will not currently work correctly when writing/reading a DynamicTable or other Container named something else?

@rly
Copy link
Contributor Author

rly commented Oct 6, 2023

@oruebel At least in HDF5, the name of the root container is not written. Does Zarr write the root container name to disk?

@oruebel
Copy link
Contributor

oruebel commented Oct 6, 2023

At least in HDF5, the name of the root container is not written. Does Zarr write the root container name to disk?

Same thing in Zarr. But bott ZarrIO and HDF5IO define ROOT_NAME="root" as a variable to define that the root container must be names root

@rly rly added category: bug errors in the code or code behavior priority: medium non-critical problem and/or affecting only a small set of users labels Jan 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category: bug errors in the code or code behavior priority: medium non-critical problem and/or affecting only a small set of users
Projects
None yet
Development

No branches or pull requests

2 participants