Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support to write multidimensional string arrays #1173

Merged
merged 14 commits into from
Aug 21, 2024

Conversation

stephprince
Copy link
Contributor

@stephprince stephprince commented Aug 15, 2024

Motivation

Fixes #1137 and fixes #1096.

On build, multidimensional string lists/arrays were being converted into 1-D lists/arrays:

[['aa', 'bb'], ['cc', 'dd']] -> ["['aa', 'bb']", "['cc', 'dd']"].

On write, the shape of a multidimensional string dataset was being defined incorrectly and so the dataset could not be written. The data shape definition was being caught by this conditional that I believe is intended to check for structured arrays / compound data types:

elif isinstance(dtype, np.dtype):
data_shape = (len(data),)

I updated the conditional and get_data_shape is now used to set the shape of the dataset for multidimensional string lists/arrays.

How to test the behavior?

This came up in NeurodataWithoutBorders/pynwb#1924.

Checklist

  • Did you update CHANGELOG.md with your changes?
  • Does the PR clearly describe the problem and the solution?
  • Have you reviewed our Contributing Guide?
  • Does the PR use "Fix #XXX" notation to tell GitHub to close the relevant issue numbered XXX when the PR is merged?

@stephprince stephprince marked this pull request as draft August 15, 2024 21:34
Copy link

codecov bot commented Aug 15, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 88.89%. Comparing base (d50db92) to head (efc3b5d).
Report is 1 commits behind head on dev.

Additional details and impacted files
@@           Coverage Diff           @@
##              dev    #1173   +/-   ##
=======================================
  Coverage   88.88%   88.89%           
=======================================
  Files          45       45           
  Lines        9835     9839    +4     
  Branches     2795     2797    +2     
=======================================
+ Hits         8742     8746    +4     
  Misses        776      776           
  Partials      317      317           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@stephprince stephprince marked this pull request as ready for review August 15, 2024 22:32
@stephprince stephprince requested a review from rly August 15, 2024 22:32
src/hdmf/build/objectmapper.py Outdated Show resolved Hide resolved
@stephprince
Copy link
Contributor Author

stephprince commented Aug 19, 2024

@rly I realized that I had only updated the string conversion for datasets so I updated __convert_string to accommodate multidimensional string arrays as attributes as well.

For testing, I added an additional attr_array argument to the Bar class constructor, but I'm not sure if that was the best approach or if there's a better way to test it since the Bar class seemed to be used in multiple places.

@stephprince stephprince merged commit 2b167ae into dev Aug 21, 2024
29 checks passed
@stephprince stephprince deleted the write-multidim-str-arrays branch August 21, 2024 20:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants