Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bad validation check for postion in ElectrodeGroup.__init__ #1770

Merged
merged 25 commits into from
Oct 14, 2024

Conversation

oruebel
Copy link
Contributor

@oruebel oruebel commented Aug 29, 2023

Motivation

Fix #1768 ElectrodeGroup.position is a compound dataset of x,y,z positions, but ElectrodeGroup.__init__ validates it wrongly by enforcing that the array has 3 elements instead of checking that the elements have 3 elements. This PR updates ElectrodeGroup.__init__ to validate ElectrodeGroup.position correctly

How to test the behavior?

See the linked issue for details

Checklist

  • Did you update CHANGELOG.md with your changes?
  • Have you checked our Contributing document?
  • Have you ensured the PR clearly describes the problem and the solution?
  • Is your contribution compliant with our coding style? This can be checked running flake8 from the source directory.
  • Have you checked to ensure that there aren't other open Pull Requests for the same change?
  • Have you included the relevant issue number using "Fix #XXX" notation where XXX is the issue number? By including "Fix #XXX" you allow GitHub to close issue #XXX when the PR is merged.

@oruebel oruebel requested a review from rly August 29, 2023 10:22
@oruebel oruebel added category: bug errors in the code or code behavior priority: low alternative solution already working and/or relevant to only specific user(s) labels Aug 29, 2023
@oruebel oruebel added this to the Next Release milestone Aug 29, 2023
@codecov
Copy link

codecov bot commented Aug 29, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91.87%. Comparing base (dc98e84) to head (3fe93eb).
Report is 1 commits behind head on dev.

Additional details and impacted files
@@            Coverage Diff             @@
##              dev    #1770      +/-   ##
==========================================
+ Coverage   91.85%   91.87%   +0.01%     
==========================================
  Files          27       27              
  Lines        2689     2695       +6     
  Branches      701      703       +2     
==========================================
+ Hits         2470     2476       +6     
  Misses        145      145              
  Partials       74       74              
Flag Coverage Δ
integration 72.69% <75.00%> (+0.06%) ⬆️
unit 83.33% <100.00%> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

src/pynwb/ecephys.py Outdated Show resolved Hide resolved
@oruebel
Copy link
Contributor Author

oruebel commented Aug 29, 2023

Just some detective work for background:

  • It seems that ElectrodeGroup.position was added in NWB v2.2 in Support NWB schema 2.2.2 #1146. I believe this bug has existed since the field was added.
  • Looking at NeurodataWithoutBorders/nwb-schema@59db28e it seems that ElectrodeGroup.position was first added as an array of length 3 (which is what the current check is looking for) and then changed to a compound data type. I.e., my guess is that PyNWB didn't get updated when the change to compound happened, and the unit tests did not fully cover this field.

Given that it has take 3 years for this issue to surface, it seems that ElectrodeGroup.position is probably rarely used.

src/pynwb/ecephys.py Outdated Show resolved Hide resolved
src/pynwb/ecephys.py Outdated Show resolved Hide resolved
@@ -25,7 +25,8 @@ def setUpContainer(self):
eg = ElectrodeGroup(name='elec1',
description='a test ElectrodeGroup',
location='a nonexistent place',
device=self.dev1)
device=self.dev1,
position=[(1., 2., 3.), ])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question - why is this a list at all? Is there a case where we would have multiple tuples in this list? What would that represent?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also wasn't sure what the use case of the list would be - I added a test case with multiple tuples for now but not sure if that is necessary?

position_dtype_valid = False
if position_dtype_valid: # If we have list of element, then check that the elements are of length 3
try:
if len(args_to_set['position'][0]) != 3:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking only the first element here seems to imply that this value should only be a single tuple of length 2 or 3 and that we would not expect more than one to be passed in the list input

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could iterate through all elements. I believe I just wanted to avoid checking all elements to keep runtime short in case there are a lot of values .

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated this to iterate and check the length of all the position elements

@CodyCBakerPhD
Copy link
Collaborator

@oruebel Any updates on this PR?

@oruebel
Copy link
Contributor Author

oruebel commented Jan 25, 2024

@stephprince could you take a look at this PR. I think this is essentially complete but needs some final attention to get over the finish line.

@rly
Copy link
Contributor

rly commented Mar 5, 2024

@stephprince do you have a chance to take a look at this? thanks!

@stephprince
Copy link
Contributor

Yes sorry for the delay I'll take a look!

@stephprince
Copy link
Contributor

@rly this is ready to review

@rly
Copy link
Contributor

rly commented Mar 13, 2024

Ok, after doing a full review of the issue and PR, I am also confused as to why ElectrodeGroup.position is allowed to be a list of tuples. I think @oruebel may be mistaken. ElectrodeGroup.position should be a scalar dataset with compound dtype (x, y, z). @oruebel can you review?

@oruebel
Copy link
Contributor Author

oruebel commented May 2, 2024

I think @oruebel may be mistaken. ElectrodeGroup.position should be a scalar dataset with compound dtype (x, y, z)

The schema here says:

  datasets:
  - name: position
    dtype:
    - name: x
      dtype: float32
      doc: x coordinate
    - name: y
      dtype: float32
      doc: y coordinate
    - name: z
      dtype: float32
      doc: z coordinate
    doc: stereotaxic or common framework coordinates
    quantity: '?'

I.e., position is a compound dataset but the shape is not specified. Looking at the schema language docs here, I believe this means that shape: null is assumed so that it should be scalar. So yes, it seems that this should indeed be a scalar with a compound data type. So this PR should be updated accordingly.

Question, how does the input for scalar compound datasets need to look like? Looking at the source issue #1768 setting:

  • position=np.array([1., 2., 3.]) : results in an error due to the validation in ElectrodeGroup
  • position=(1., 2., 3.) : is being written as [(1., 2., 3.) (1., 2., 3.) (1., 2., 3.)]
  • position=[1., 2., 3.] : is being written as [(1., 1., 1.) (2., 2., 2.) (3., 3., 3.)]

Specifying as [(1, 2, 3), ]) as currently on the PR fixes the bad write. Is position=[(1, 2, 3), ]) the right form and we need to updated the validation to check that only a single tuple is given or is position=(1., 2., 3.) the correct form and we need to dig deeper into write?

@stephprince stephprince modified the milestones: 2.7.0, 2.8.0 May 9, 2024
@stephprince
Copy link
Contributor

stephprince commented Sep 4, 2024

After some further investigation, it turned out that scalar datasets with a compound datatype were being written incorrectly so that the dataset was not scalar. These changes should be addressed in the upcoming hdmf release (3.14.4), but the tests will fail until that is released.

With the hdmf update, I believe ElectrodeGroup.position validation should check whether the position is a scalar with compound dtype, or whether it’s a single list/tuple of length 3 that we then convert to a scalar with compound dtype.

@rly rly enabled auto-merge (squash) October 10, 2024 17:30
@rly
Copy link
Contributor

rly commented Oct 10, 2024

This looks good to me but I will let @stephprince review and approve to merge.

Copy link
Contributor

@stephprince stephprince left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates @rly. All looks good to me.

I will fix the broken links in a separate PR.

@rly rly merged commit d32188b into dev Oct 14, 2024
24 of 25 checks passed
@rly rly deleted the fix/electrodegroup_position branch October 14, 2024 20:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category: bug errors in the code or code behavior priority: low alternative solution already working and/or relevant to only specific user(s)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug]: Writing ElectrodeGroup position saves incorrectly modified data to file
4 participants