
[Bug]: Adding Large Stimulus Table with add_interval takes incredibly long #1946

Closed

rcpeene opened this issue Aug 13, 2024 · 3 comments
Labels
category: question (questions about code or code behavior)
priority: medium (non-critical problem and/or affecting only a small set of NWB users)
topic: HDMF (issues related to the use, depending on, or affecting HDMF)

Comments


rcpeene commented Aug 13, 2024

What happened?

I am trying to generate an NWB file containing a rather large stimulus table, built row by row with TimeIntervals.add_interval(). The stimulus table for our experiment happens to be very large (>40,000 rows). On two different machines, this takes more than 10 hours. The add_interval operation seems to be the bottleneck, and each call takes progressively longer as the table grows.

After digging through the code, it looks like the culprit might be __calculate_idx_count, perhaps the bisect call.

Is there a more direct way to generate a TimeIntervals table from an existing table (while ensuring that the type of each column is properly cast)? Or is there a fix for the slowness of the add_interval operation?

Steps to Reproduce

Run this snippet to generate a TimeIntervals object from a very large table:

presentation_interval = create_stimulus_presentation_time_interval(
    name=f"{stim_name}_presentations",
    description=interval_description,
    columns_to_add=cleaned_table.columns,
)

for i, row in enumerate(cleaned_table.itertuples(index=False)):
    row = row._asdict()
    row = {key: str(value) for key, value in row.items()}
    start_column = 'Start'  # Adjust this as per the actual column name in CSV
    end_column = 'End'  # Adjust this as per the actual column name in CSV
    start_time = float(row[start_column])
    end_time = float(row[end_column])
    presentation_interval.add_interval(
        **row,
        start_time=start_time, stop_time=end_time,
        tags="stimulus_time_interval", timeseries=ts
    )

nwbfile.add_time_intervals(presentation_interval)

Traceback

No traceback

Operating System

Windows

Python Executable

Conda

Python Version

3.10

Package Versions

pynwb==2.8.1



rcpeene commented Aug 14, 2024

The real bottleneck appears to be in DynamicTable.add_row().
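For anyone else chasing a slowdown like this, one generic way to confirm where the time goes is to profile the insertion loop with the standard library's cProfile. This is a self-contained sketch using a toy add_row stand-in (not hdmf's actual implementation); profiling a real pynwb loop works the same way.

```python
import cProfile
import io
import pstats

# Toy stand-in for a per-row insert whose cost grows with table size.
table = []

def add_row(row):
    # Simulate a validation pass over all existing rows on every insert.
    _ = [r for r in table]
    table.append(row)

profiler = cProfile.Profile()
profiler.enable()
for i in range(2000):
    add_row({"start_time": float(i), "stop_time": float(i) + 1.0})
profiler.disable()

# Print the top entries by cumulative time; the hot function shows up first.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(next(line for line in report.splitlines() if line.strip()))
```

Sorting by cumulative time makes the per-row helper that dominates the loop (here, the simulated validation pass) easy to spot in the report.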


stephprince commented Aug 14, 2024

Hi @rcpeene,

One way to speed up the add_interval operation would be to pass the argument check_ragged=False. We recently added this check to provide a better warning for ragged arrays, but it can cause performance issues for larger tables since it inspects the data on each call to add_row / add_interval.

presentation_interval.add_interval(
    **row,
    start_time=start_time, stop_time=end_time,
    tags="stimulus_time_interval", timeseries=ts, check_ragged=False
)

Could you try setting check_ragged to False and see if that improves your performance?
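The effect described above can be illustrated with a toy model (this is a sketch of the scaling behavior, not hdmf's actual code): a validation pass that scans the whole table on every insert turns n appends into O(n²) work, while skipping the check keeps them O(n).

```python
import time

def build(n, check_each_row):
    """Append n (start, stop) rows, optionally re-scanning the table each time."""
    rows = []
    for i in range(n):
        rows.append((float(i), float(i) + 1.0))
        if check_each_row:
            # Stand-in for a raggedness check over all existing rows.
            _ = all(len(r) == 2 for r in rows)
    return rows

n = 3000
t0 = time.perf_counter()
checked = build(n, check_each_row=True)
t_checked = time.perf_counter() - t0

t0 = time.perf_counter()
unchecked = build(n, check_each_row=False)
t_unchecked = time.perf_counter() - t0

print(f"with per-row check: {t_checked:.3f}s, without: {t_unchecked:.3f}s")
```

At 40,000+ rows the quadratic term dominates, which is consistent with the multi-hour runtimes reported above and the large speedup from check_ragged=False.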

@stephprince stephprince added priority: medium non-critical problem and/or affecting only a small set of NWB users category: question questions about code or code behavior topic: HDMF issues related to the use, depending on, or affecting HDMF labels Aug 14, 2024

rcpeene commented Aug 14, 2024

This was remarkably faster and completed in a few minutes. Thanks!
