[Documentation]: Iterative writing to DynamicTable (trials, epochs, TimeIntervals) #1933

Closed · 3 tasks done
cboulay opened this issue Jul 10, 2024 · 3 comments
Labels: category: question (questions about code or code behavior)

cboulay commented Jul 10, 2024

What would you like changed or added to the documentation and why?

Hi, I'm trying to write unbounded streams directly to NWB files (one file per stream). So far, this works well for numeric TimeSeries. I was stuck for a while trying to write strings as event markers, especially as trials, epochs, or other TimeIntervals. The techniques described for TimeSeries and H5DataIO don't translate to event markers.

Where I failed was in calling io.write(nwbfile) after nwbfile.add_epoch_column(...) but before nwbfile.add_epoch(...). I thought I was following the pattern of setting everything up, calling io.write(nwbfile), and then filling in the data after the fact. However, it appears that you cannot call io.write(nwbfile) if you have added a new epoch column but your epochs table remains empty:

            if isinstance(value, (list, tuple)):
                if len(value) == 0:
                    msg = "Cannot infer dtype of empty list or tuple. Please use numpy array with specified dtype."
>                   raise ValueError(msg)
E                   ValueError: Cannot infer dtype of empty list or tuple. Please use numpy array with specified dtype.

../../../.venv/lib/python3.9/site-packages/hdmf/build/objectmapper.py:314: ValueError

However, if I add an epoch first, then things work fine.
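
To make the two patterns concrete, here is roughly what I was doing (a minimal sketch; the file names, metadata, and the "marker" column are placeholders for my setup):

from datetime import datetime, timezone
from uuid import uuid4

from pynwb import NWBHDF5IO, NWBFile


def make_nwbfile():
    return NWBFile(
        session_description="streaming event markers",  # placeholder metadata
        identifier=str(uuid4()),
        session_start_time=datetime(2024, 7, 10, tzinfo=timezone.utc),
    )


# Failing pattern: a custom epoch column is declared, but the table is still empty at write time.
nwbfile = make_nwbfile()
nwbfile.add_epoch_column(name="marker", description="event marker string")
try:
    with NWBHDF5IO("epochs_empty_column.nwb", "w") as io:
        io.write(nwbfile)
except ValueError as err:
    print(err)  # Cannot infer dtype of empty list or tuple. ...

# Working pattern: add at least one epoch (row) before the first write.
nwbfile = make_nwbfile()
nwbfile.add_epoch_column(name="marker", description="event marker string")
nwbfile.add_epoch(start_time=0.0, stop_time=1.0, marker="start")
with NWBHDF5IO("epochs_one_row.nwb", "w") as io:
    io.write(nwbfile)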

I think there are a few solutions for this. The easiest is documentation -- just explain that you can't write an empty table if you've added a custom column. In my own tool, I'm just going to defer adding the custom column until I have data to put in it.

Another option would be to allow setting the dtype via add_{x}_column(..., dtype=str), but this is significantly more work. Or am I supposed to subclass VectorData and supply that as the col_cls argument?

For now, my solution is to not add new columns until I receive a marker event.

ETA: The other major difference from a TimeSeries is that the nwbfile has to be re-written with multiple calls to io.write(nwbfile).

Do you have any interest in helping write or edit the documentation?

Yes.


cboulay changed the title from "[Documentation]: Iterative writing of string events (trials, epochs, else)" to "[Documentation]: Iterative writing to DynamicTable (trials, epochs, TimeIntervals)" on Jul 10, 2024

cboulay commented Jul 12, 2024

I realized after I wrote this issue that repeated calls to io.write(nwbfile) do nothing. For now I am leaving the io object open and writing once at the end when __del__ is called.
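
Roughly, my wrapper looks something like this (StreamWriter is just a sketch of my own class, not a pynwb API):

from pynwb import NWBHDF5IO, NWBFile


class StreamWriter:
    """Keeps the NWBHDF5IO open for the lifetime of the stream and writes once at teardown."""

    def __init__(self, path: str, nwbfile: NWBFile):
        self._nwbfile = nwbfile
        self._io = NWBHDF5IO(path, mode="w")
        self._written = False

    def add_epoch(self, start_time: float, stop_time: float, **columns):
        # Rows (including custom column values) accumulate in memory; nothing is flushed to disk yet.
        self._nwbfile.add_epoch(start_time=start_time, stop_time=stop_time, **columns)

    def close(self):
        # Single write at the end of the stream, then release the file handle.
        if not self._written:
            self._io.write(self._nwbfile)
            self._io.close()
            self._written = True

    def __del__(self):
        self.close()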

What is the TimeIntervals id field? It seems that it can accept an H5DataIO or DataChunkIterator. What can that be used for?


rly commented Jul 25, 2024

Hi @cboulay, we are working on making it possible to add rows to a DynamicTable after write by default (cc @mavaylon1). For now, adding rows after write is a little convoluted: you have to predefine all your columns and wrap the data of each column (VectorData) in an H5DataIO with maxshape=(None,) -- see the code below. You can already add columns to a DynamicTable in append mode.

> The easiest is documentation -- just explain that you can't write an empty table if you've added a custom column.

Thanks. We have an open issue ticket about that and unfortunately have not had the bandwidth to resolve it yet. We are updating the documentation here.

> Another option would be to allow setting the dtype via add_{x}_column(..., dtype=str), but this is significantly more work. Or am I supposed to subclass VectorData and supply that as the col_cls argument?

Both are significantly more work, but the former would be good for us to do.

> I thought I was following the pattern of setting everything up, calling io.write(nwbfile), and then filling in the data after the fact.

In general, we recommend writing the data once you have all your data available, but I understand it is risky to hold all of that data in memory.

I think your proposal of not adding new columns until you receive a marker event makes sense. Try this code to add rows and columns to a trials table after an initial write. This will allow you to append to the file repeatedly (but you have to reopen the file after closing it).

from datetime import datetime
from uuid import uuid4
from dateutil import tz

from pynwb import NWBHDF5IO, NWBFile, H5DataIO

session_start_time = datetime(2018, 4, 25, 2, 30, 3, tzinfo=tz.gettz("US/Pacific"))

# create a file with one trial
nwbfile = NWBFile(
    session_description="Mouse exploring an open field",  # required
    identifier=str(uuid4()),  # required
    session_start_time=session_start_time,  # required
)
nwbfile.add_trial(start_time=1.0, stop_time=2.0)
# wrap each built-in column with H5DataIO and maxshape=(None,) so it can grow after the initial write
nwbfile.trials.id.set_data_io(H5DataIO, {'maxshape': (None,)})
nwbfile.trials.start_time.set_data_io(H5DataIO, {'maxshape': (None,)})
nwbfile.trials.stop_time.set_data_io(H5DataIO, {'maxshape': (None,)})

with NWBHDF5IO("test_append_dynamic_table.nwb", "w") as io:
    io.write(nwbfile)

# reopen the file in append mode to add a new column and another row
io = NWBHDF5IO("test_append_dynamic_table.nwb", mode="a")
nwbfile = io.read()
nwbfile.add_trial_column('correct', 'whether the trial was correct', data=['test'])
nwbfile.trials.correct.set_data_io(H5DataIO, {'maxshape': (None,)})
nwbfile.add_trial(start_time=2.0, stop_time=3.0, correct='yes')
io.write(nwbfile)
io.close()

with NWBHDF5IO("test_append_dynamic_table.nwb", "r") as io:
    nwbfile = io.read()
    print(nwbfile.trials.to_dataframe())
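
If you run this as-is, the final print should show both trials, with the correct column holding 'test' for the original row and 'yes' for the appended one.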

rly self-assigned this on Jul 25, 2024
rly added the category: question (questions about code or code behavior) label on Jul 25, 2024

cboulay commented Jul 28, 2024

@rly , this was very helpful, thank you!
I was able to complete my objective and now have multiple live data streams sinking to a single NWB file.
I'll close the issue now and await the release of the other API to stream to disk.

cboulay closed this as completed on Jul 28, 2024