Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

new neurodata_type: TTL pulse #306

Open
bendichter opened this issue Sep 9, 2019 · 16 comments
Open

new neurodata_type: TTL pulse #306

bendichter opened this issue Sep 9, 2019 · 16 comments
Labels
category: proposal proposed enhancements or new features priority: medium non-critical problem and/or affecting only a small set of NWB users

Comments

@bendichter
Copy link
Contributor

It would be nice to have an intuitive and machine-readable neurodata_type for TTL pulses. Something like

neurodata_type_def: TTL
neurodata_type_ind: TimeSeries
datasets:
  - name: data
    shape:
      - None
    dims:
      - pulses
    dtype: int
 - name: keys
   shape:
     - None
   dims:
     - unique_labels
   dtype: int
 - name: labels
   shape:
     - None
   dims:
     - unique_labels
   dtype: text
@bendichter
Copy link
Contributor Author

@rly when you return I'd be interested in your thoughts

@t-b
Copy link
Contributor

t-b commented Sep 9, 2019

Jeep, that could be useful. For what would you use the labels and keys?

@bendichter
Copy link
Contributor Author

The labels would give you a human readable description of what the events were. The keys would map those labels to values

@t-b
Copy link
Contributor

t-b commented Sep 10, 2019

@bendichter But you could use epochs for that as well or?

@ajtritt
Copy link
Member

ajtritt commented Sep 10, 2019

Sounds like you’re describing a CV, which is usually represented with a table. I’d recommend just using a DynamicTable for the mapping. And the data could be stored as a DynamicTableRegion. I’m not sure how that would play with the TimeSeries.data tho

@bendichter
Copy link
Contributor Author

what do you mean by CV?

@ajtritt
Copy link
Member

ajtritt commented Sep 10, 2019

controlled vocabulary.

@bendichter
Copy link
Contributor Author

Oh I see. Yeah very similar but I think the one difference is that the actual value of the int is important to the labs as well as the label.

@ajtritt
Copy link
Member

ajtritt commented Sep 10, 2019

That might not be a CV then. Are the values of the pulses enumerable?

@bendichter
Copy link
Contributor Author

bendichter commented Sep 11, 2019

Enumerable? The ints are chosen by the researcher to signal specific events. There is a finite number of them and they generally are not sequential.

@rly
Copy link
Contributor

rly commented Sep 12, 2019

Although TTL pulses are a sequence of events in time, I do not think inheriting this type from TimeSeries is most appropriate here. TTL pulses are not generic time series measurements (the data don't have units) and the timestamps are almost never uniformly sampled in time. It would be more appropriate to use or alias the LabeledEvents type https://github.com/NeurodataWithoutBorders/nwb-schema/pull/302/files which is similar to what you have here ("keys" = "labels", "labels" = "label_map")

In my experience, users sometimes work with all of the TTL pulses together in order of time, but more often they separate the pulses by value and work with only a subset of them, often in isolation (peri-stimulus time histogram), and sometimes together in order. Those use cases would be best served by two different storage options:

  1. as @bendichter described, store a single array of N (timestamp, value) pairs
  2. store K 1-D arrays of timestamps, where K is the number of unique pulse values.

For Option 1, if the user wants to get the data in the form of Option 2 (i.e. split the array by value into K 1-D arrays), a naive implementation takes O(NK) time - I'm not sure this can be done faster.

For Option 2, if the user wants to get the data in the form of Option 1 (i.e. create a single array of all of the pulse values in order of time), then they would have to do a K-way merge sort of the K 1-D arrays, which is O(N log K) time.

Given that difference in efficiency (at the limit) and that I think working with only one or a subset of pulse values is more common than working with all pulse values in order, I think Option 2 would be better.

@ajtritt
Copy link
Member

ajtritt commented Sep 12, 2019

  1. store K 1-D arrays of timestamps, where K is the number of unique pulse values.

Could we store this in the same way we store spike units?

@rly
Copy link
Contributor

rly commented Sep 12, 2019

Yes, a DynamicTable would work, but for some reason, it feels less intuitive to me (same for the units table). It's a little overkill, but it would organize the TTL pulses together. Granted, the user does not need to know (and to some degree should not need to know) the low-level details of how the data is structured.

Ultimately, I think that TTL pulses, spike unit times, and events in general, could, and probably should, be stored in the same way as each other. So, yes, option 3:

  1. Store TTL pulses as a DynamicTable with indexed column pulse_times, and normal column label.

@bendichter
Copy link
Contributor Author

I worry about the proliferating use of DynamicTables, because it sort of side-steps the whole point of NWB:N, which is to provide good standards for meta-data. DynamicTable is basically an open door that allows you to store any type of data you want with minimal constraints. Any open door will lead to the temptation to use it for every datatype that is not currently supported, and I worry about the long term effects of that. This is useful for purely-meta-data structures like trials, where this flexibility is important and adherence to a strict standard is less important, but I worry when they are used for annotating acquired data. When they are used to annotate data, DynamicTables on their own are invariably insufficient for the chosen type of data and we end up having to compensate with hacks. Even with Units there is an issue with storing the sampling rate of the waveforms and with linking individual spikes to snippets. The only way to resolve this flexibly would be to allow DTs to have global attributes, and at that point they would not enforce any structure at all!

I agree that we need a way to express time series objects that are not measurements and are unitless. I think the proposed Events and/or LabelledEvents addresses this problem nicely, and I like that it generalizes to cases outside of TTL pulses because this has come up a few times and will continue to be needed. The one issue I have with the proposed extension is the names of the datasets. To me, label should be a string, i.e. the value of the dictionary, not the key.

Then there is the question of whether we should store a one object that captures all TTL pulses or several objects that each store a TTL pulse event type. This comes down to whether we want to make the data closer to what is acquired or closer to the semantic meaning/how it will be used an analysis. In this case I vote to make it closer to acquired with LabelledEvents because I imagine TTL pulses going in /acquisition. If users want to pull out specific events, we could encourage them to do that with Events objects in a processing module.

@rly rly mentioned this issue Oct 4, 2019
@rly
Copy link
Contributor

rly commented Oct 11, 2019

@bendichter Coming back to this, when you wrote:

When they are used to annotate data, DynamicTables on their own are invariably insufficient for the chosen type of data and we end up having to compensate with hacks. Even with Units there is an issue with storing the sampling rate of the waveforms and with linking individual spikes to snippets. The only way to resolve this flexibly would be to allow DTs to have global attributes, and at that point they would not enforce any structure at all!

What do you mean by global attributes? I think it is fine for subtypes of DynamicTables to define attributes that apply to all rows of the table.

@rly
Copy link
Contributor

rly commented Oct 11, 2019

What is the issue with linking individual spikes to snippets? I saw a bunch of issues/PRs on UnitSeries and it looked like a reasonable solution was agreed upon #248

@stephprince stephprince added category: proposal proposed enhancements or new features priority: medium non-critical problem and/or affecting only a small set of NWB users labels May 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category: proposal proposed enhancements or new features priority: medium non-critical problem and/or affecting only a small set of NWB users
Projects
None yet
Development

No branches or pull requests

5 participants