speed up loading of namespaces: skip register type when already registered when loading namespace #1102

Merged (6 commits) on Aug 19, 2024


@magland (Contributor) commented Apr 22, 2024

Motivation and description

I am trying to speed up the loading of namespaces in pynwb. The initial load sometimes takes up to 6 seconds. While tracing through the code to find the cause of the slowness, I came across the __register_type function, which appears to register the same type many times while the namespaces are loading. I added a simple check that skips registration if the type has already been registered.
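As a rough illustration of the idea (not the actual hdmf code), the check amounts to a guard at the top of the registration routine. The class, method, and attribute names below are hypothetical stand-ins, and the sketch assumes that a repeated registration under the same namespace and type name carries an equivalent spec, so skipping it changes nothing but the amount of work done.

```python
class TypeRegistry:
    """Hypothetical registry used only for illustration; not hdmf's class."""

    def __init__(self):
        self._types = {}         # (namespace, type name) -> spec
        self.register_calls = 0  # counts registrations actually performed

    def register_type(self, namespace, name, spec):
        """Register a type once; return False if it was already registered."""
        key = (namespace, name)
        if key in self._types:
            # Already registered: skip the repeated (potentially costly) work.
            return False
        self.register_calls += 1
        self._types[key] = spec
        return True


registry = TypeRegistry()
# Simulate a loader that encounters the same type three times.
results = [
    registry.register_type("core", "TimeSeries", {"doc": "example spec"})
    for _ in range(3)
]
print(results, registry.register_calls)
```

If a later registration could legitimately carry a different spec for the same key, a guard like this would silently keep the first one, so that case would need to be ruled out before relying on the shortcut.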

IMPORTANT: I am not familiar enough with the code to know whether this change is going to break anything.

This is one of two PRs I am submitting to try and speed things up.

How to test the behavior?

Run this script twice before the change and once after the change. The first run downloads the needed data and saves the loaded file segments to a cache directory, so the second and third runs do not include the download time. On my machine, loading takes around 4 sec before the change and around 2 sec after the change.

import time
import remfile
import pynwb
import h5py


def example_slow_load_namespace():
    # https://neurosift.app/?p=/nwb&dandisetId=000409&dandisetVersion=draft&url=https://api.dandiarchive.org/api/assets/c04f6b30-82bf-40e1-9210-34f0bcd8be24/download/
    h5_url = 'https://api.dandiarchive.org/api/assets/c04f6b30-82bf-40e1-9210-34f0bcd8be24/download/'
    disk_cache = remfile.DiskCache('test_cache')
    remf = remfile.File(h5_url, disk_cache=disk_cache)
    timer = time.time()
    with h5py.File(remf, 'r') as h5f:
        with pynwb.NWBHDF5IO(file=h5f, mode='r', load_namespaces=True) as io:
            nwbfile = io.read()
            print(nwbfile)
    elapsed = time.time() - timer
    print('Elapsed time:', elapsed)


if __name__ == '__main__':
    example_slow_load_namespace()

Checklist

  • Did you update CHANGELOG.md with your changes?
  • Does the PR clearly describe the problem and the solution?
  • Have you reviewed our Contributing Guide?
  • Does the PR use "Fix #XXX" notation to tell GitHub to close the relevant issue numbered XXX when the PR is merged?

@oruebel @rly

codecov bot commented Apr 24, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 88.88%. Comparing base (49a60df) to head (065fe70).
Report is 4 commits behind head on dev.

Additional details and impacted files
@@            Coverage Diff             @@
##              dev    #1102      +/-   ##
==========================================
- Coverage   88.89%   88.88%   -0.01%     
==========================================
  Files          45       45              
  Lines        9834     9836       +2     
  Branches     2794     2795       +1     
==========================================
+ Hits         8742     8743       +1     
  Misses        776      776              
- Partials      316      317       +1     


@mavaylon1 (Contributor) commented

This looks good. I will review this next week when I am back.

@rly added the "category: enhancement" and "topic: performance" labels on May 2, 2024
@rly added this to the 3.14.0 milestone on May 2, 2024
@mavaylon1 merged commit b0f068e into hdmf-dev:dev on Aug 19, 2024
28 checks passed