
audb.load_to() use cache for tables #221

Draft
wants to merge 10 commits into
base: main

Conversation

hagenw
Member

@hagenw hagenw commented Jul 26, 2022

Closes #220

This loads missing tables from the cache into the build folder.
In addition, it avoids storing the tables a second time in the db_root folder.

The tables are stored in the cache under the version number given by the dependency table.
This avoids caching the same table again for each version, but it is not something that is already integrated in audb.load_table().
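The idea of looking up tables under the version recorded in the dependency table can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the code from this PR: the function name `copy_cached_tables`, the `cache_root/<version>/<table file>` layout, and the plain dict standing in for the dependency table are all assumptions.

```python
import shutil
from pathlib import Path


def copy_cached_tables(cache_root, build_dir, table_versions):
    """Copy tables that exist in the cache into the build folder.

    ``table_versions`` maps a table file name to the version under
    which it is stored (hypothetical stand-in for the dependency table).
    Returns the table files that were not found in the cache
    and would still have to be downloaded.
    """
    cache_root = Path(cache_root)
    build_dir = Path(build_dir)
    build_dir.mkdir(parents=True, exist_ok=True)
    missing = []
    for table_file, version in sorted(table_versions.items()):
        target = build_dir / table_file
        if target.exists():
            continue  # table is already in the build folder
        # Assumed cache layout: <cache_root>/<version>/<table file>
        cached = cache_root / version / table_file
        if cached.exists():
            shutil.copy2(cached, target)
        else:
            missing.append(table_file)
    return missing
```

Because the lookup uses the version stored per table and not the requested database version, a table that did not change between versions would be found under a single cache entry.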

Execution time

I measured execution time by running

audb.load_to(build_dir, name, version=previous_version, only_metadata=True, num_workers=8)

for a large database (salamander-a) with lots of tables.

| Branch | Cached | Time |
| --- | --- | --- |
| master | no | 485.0 s |
| master | yes | 485.0 s |
| this | no | 450.0 s |
| this | yes | 4.5 s |
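For reference, a measurement like this can be taken with a small wall-clock helper. The helper below is a generic sketch (the name `timed` is an assumption, and it is not necessarily how the numbers above were obtained).

```python
import time


def timed(func, *args, **kwargs):
    """Call ``func`` and return its result together with the elapsed seconds."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed
```

It could be wrapped around the call shown above, for example `timed(audb.load_to, build_dir, name, version=previous_version, only_metadata=True, num_workers=8)`, presumably once with an empty cache and once with the tables already cached.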

What is not ideal yet is that, by using audb.load_table() to load single tables, we block the processes when several tables need to be stored in the same cache folder.

So maybe we should target a general refactoring of the table storing code.
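The blocking effect can be illustrated with a small simulation. This is only a sketch under assumptions: it uses in-process `threading.Lock` objects as a stand-in for whatever locking protects a cache folder, and the names `store_table` and `store_all` are hypothetical. It shows that workers targeting the same folder end up writing one after another, regardless of `num_workers`.

```python
import threading
import time
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# One lock per cache folder (illustrative stand-in for a folder lock)
_folder_locks = defaultdict(threading.Lock)
_registry_lock = threading.Lock()


def store_table(folder, table, active, peak):
    """Pretend to store ``table`` in ``folder`` while holding the folder lock."""
    with _registry_lock:
        lock = _folder_locks[folder]
    with lock:
        active[folder] += 1
        peak[folder] = max(peak[folder], active[folder])
        time.sleep(0.01)  # simulated write to the cache folder
        active[folder] -= 1
    return table


def store_all(jobs, num_workers=8):
    """Run (folder, table) jobs in parallel; return peak concurrent writers per folder."""
    active = defaultdict(int)
    peak = defaultdict(int)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        list(pool.map(lambda job: store_table(job[0], job[1], active, peak), jobs))
    return dict(peak)
```

With six tables going to the same folder, the peak number of concurrent writers stays at one, so the extra workers mostly wait. A refactoring that collects all tables for a folder and stores them under a single lock acquisition would avoid that waiting.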

@codecov

codecov bot commented Jul 26, 2022

Codecov Report

Merging #221 (3d5de1e) into master (b9bd498) will not change coverage.
The diff coverage is 100.0%.

| Impacted Files | Coverage Δ |
| --- | --- |
| audb/core/info.py | 100.0% <ø> (ø) |
| audb/core/load_to.py | 100.0% <100.0%> (ø) |

@hagenw
Member Author

hagenw commented Jul 26, 2022

Another workaround would be to recommend using the same ../build folder when publishing a new version, but to make sure to delete all media files inside the build folder first (which is of course not a nice solution).

@hagenw hagenw marked this pull request as draft October 16, 2023 14:06
@hagenw hagenw changed the title WIP: audb.load_to() use cache for tables audb.load_to() use cache for tables Jan 23, 2024
Successfully merging this pull request may close these issues.

Use cache for tables in audb.load_to()
1 participant