Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

_gather_columns agnostic search #991

Merged
merged 24 commits into from
Nov 7, 2023
Merged

_gather_columns agnostic search #991

merged 24 commits into from
Nov 7, 2023

Conversation

mavaylon1
Copy link
Contributor

@mavaylon1 mavaylon1 commented Oct 31, 2023

Motivation

What was the reasoning behind this change? Please explain the changes briefly.

Within DynamicTable is a pre-init method _gather_columns that uses the base classes in a conditional to create pre-defined columns. When autogenerating a class, the order of these classes are determined in classgenerator.py

if '__clsconf__' in classdict:
            # do not add MCI as a base if a base is already a subclass of MultiContainerInterface
            for b in bases:
                if issubclass(b, MultiContainerInterface):
                    break
            else:
                if issubclass(MultiContainerInterface, bases[0]):
                    # if bases[0] is Container or another superclass of MCI, then make sure MCI goes first
                    # otherwise, MRO is ambiguous
                    bases.insert(0, MultiContainerInterface)
                else:
                    # bases[0] is not a subclass of MCI and not a superclass of MCI. place that class first
                    # before MCI. that class __init__ should call super().__init__ which will call the
                    # MCI init
                    bases.insert(1, MultiContainerInterface)

This results in an issue for an extension that use the nwbinspector to validate the NWBFile. The best fix is to make the search in _gather_columns agnostic to the order.

How to test the behavior?

Show how to reproduce the new behavior (can be a bug fix or a new feature)

Checklist

  • Did you update CHANGELOG.md with your changes?
  • Have you checked our Contributing document?
  • Have you ensured the PR clearly describes the problem and the solution?
  • Is your contribution compliant with our coding style? This can be checked running ruff from the source directory.
  • Have you checked to ensure that there aren't other open Pull Requests for the same change?
  • Have you included the relevant issue number using "Fix #XXX" notation where XXX is the issue number? By including "Fix #XXX" you allow GitHub to close issue #XXX when the PR is merged.

Copy link

codecov bot commented Oct 31, 2023

Codecov Report

Attention: 1 lines in your changes are missing coverage. Please review.

Comparison is base (c58b251) 88.65% compared to head (c6bf52c) 88.65%.

Additional details and impacted files
@@            Coverage Diff             @@
##              dev     #991      +/-   ##
==========================================
- Coverage   88.65%   88.65%   -0.01%     
==========================================
  Files          45       45              
  Lines        9451     9458       +7     
  Branches     2688     2691       +3     
==========================================
+ Hits         8379     8385       +6     
  Misses        757      757              
- Partials      315      316       +1     
Files Coverage Δ
src/hdmf/utils.py 96.50% <ø> (ø)
src/hdmf/common/table.py 86.37% <90.90%> (-0.01%) ⬇️

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@mavaylon1
Copy link
Contributor Author

Fix #959

@mavaylon1
Copy link
Contributor Author

mavaylon1 commented Oct 31, 2023

@rly Proof of concept. This passes the "test" of the extension now works.

src/hdmf/common/table.py Outdated Show resolved Hide resolved
Co-authored-by: Ryan Ly <[email protected]>
src/hdmf/common/table.py Outdated Show resolved Hide resolved
@rly
Copy link
Contributor

rly commented Oct 31, 2023

I think the for loop should break upon finding a class that has __columns__. Otherwise the same columns might be added multiple times. I also think the order should be last first. The current code suggests that we should look only at the last base class. Can you create a test where extension Class A extends DynamicTable and has __columns__, extension Class B extends Class A and has __columns__, and extension Class C extends Class B. The generated __columns__ of Class C should prepend only the columns defined in Class B and not touch the columns defined in Class A (which should be in Class B).

This also raises an issue where a column in Class B might overwrite a column in Class A. We don't want to add the same column twice during the prepending. We should add logic to check for that in this block of code, but perhaps in a separate PR.

@mavaylon1
Copy link
Contributor Author

I think the for loop should break upon finding a class that has __columns__. Otherwise the same columns might be added multiple times. I also think the order should be last first. The current code suggests that we should look only at the last base class. Can you create a test where extension Class A extends DynamicTable and has __columns__, extension Class B extends Class A and has __columns__, and extension Class C extends Class B. The generated __columns__ of Class C should prepend only the columns defined in Class B and not touch the columns defined in Class A (which should be in Class B).

This also raises an issue where a column in Class B might overwrite a column in Class A. We don't want to add the same column twice during the prepending. We should add logic to check for that in this block of code, but perhaps in a separate PR.

I'll give it a gander. This was just to get our initial idea out in practice.

@rly
Copy link
Contributor

rly commented Oct 31, 2023

Understood and sounds good.

@mavaylon1
Copy link
Contributor Author

mavaylon1 commented Nov 1, 2023

@rly Taking a gander and wanted to confirm some thoughts. PyNWB has PlaneSegmentation and that is a subclass of DynamicTable. The extension has PlaneExtension, which extends PlaneSegmentation. When writing a NWBFile from the extension, the class defined in the extension is used. However, when running the nwbinspector, it does not see the class but just the schema (not clear how). This leads hdmf to autogenerate the class.

In a testing case, Class A is PlaneSegmentation, Class B is PlaneExtension, and Class C is an extension of Class B with generated columns. You say "prepend" meaning I should use add_column to Class C and verify that it doesn't affect the columns in class A?

By not defining the columns in class C, that should trigger them to be autogenerated to match B correct? But that would require a schema

@rly
Copy link
Contributor

rly commented Nov 1, 2023

You say "prepend" meaning I should use add_column to Class C and verify that it doesn't affect the columns in class A?

By prepend, I am referring to line 287 in the new code. I mean that the gather_columns method should add the columns in class A once not twice because those columns were also added to class B.

By not defining the columns in class C, that should trigger them to be autogenerated to match B correct? But that would require a schema

Yes, if class C does not add or change any columns, then the C.__columns__ should match B.__columns__

@mavaylon1
Copy link
Contributor Author

You say "prepend" meaning I should use add_column to Class C and verify that it doesn't affect the columns in class A?

By prepend, I am referring to line 287 in the new code. I mean that the gather_columns method should add the columns in class A once not twice because those columns were also added to class B.

By not defining the columns in class C, that should trigger them to be autogenerated to match B correct? But that would require a schema

Yes, if class C does not add or change any columns, then the C.__columns__ should match B.__columns__

Do you mean add the columns "from" class A to class B?

@rly
Copy link
Contributor

rly commented Nov 1, 2023

Do you mean add the columns "from" class A to class B?

Gather columns should copy the columns from class A to class B, and from class B to class C, but because class B contains class A's columns, gather_columns should not additionally copy columns from class A to class C, or else class A's columns will be in class C twice.

@mavaylon1
Copy link
Contributor Author

Do you mean add the columns "from" class A to class B?

Gather columns should copy the columns from class A to class B, and from class B to class C, but because class B contains class A's columns, gather_columns should not additionally copy columns from class A to class C, or else class A's columns will be in class C twice.

Class A's columns would be in Class C via Class B.

@mavaylon1 mavaylon1 marked this pull request as ready for review November 6, 2023 21:01
@mavaylon1 mavaylon1 requested a review from rly November 6, 2023 22:15
src/hdmf/common/table.py Outdated Show resolved Hide resolved
Co-authored-by: Ryan Ly <[email protected]>
@rly
Copy link
Contributor

rly commented Nov 7, 2023

Looks good to me. Thanks!

@mavaylon1 mavaylon1 merged commit 1d7fd83 into dev Nov 7, 2023
28 checks passed
@mavaylon1 mavaylon1 deleted the multi branch November 7, 2023 14:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants