Speedup caching of audbcards.Dataset #83
Conversation
This achieves a smaller footprint by changing the caching mechanism and now distinguishing between cached properties and "normal" ones, i.e. between properties decorated with `@functools.cached_property` and others decorated with `@property`.

These are treated differently now: the heavy ones that are conceptually not descriptive of a dataset (backend, deps, header, repository_object) become "normal" properties that are not cached. These "normal" properties also have a kind of lazy loading implemented and are loaded once in the lifetime of an object. As data artifacts are slowly changing (and audbcards descriptives too), I can see no problem with that, so one does not have to deal with datasets that change during the lifetime of an object. So I believe this is fine.
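The distinction described above can be sketched roughly like this; all names are hypothetical and the real audbcards implementation will differ in detail:

```python
import functools


class Dataset:
    """Sketch of the two property kinds (hypothetical names)."""

    def __init__(self, name):
        self.name = name
        self._deps = None  # lazily loaded, never cached to disk

    @functools.cached_property
    def description(self):
        # Cheap, descriptive value:
        # cached in the instance dict and safe to pickle
        return f"Dataset {self.name}"

    @property
    def deps(self):
        # Heavy, non-descriptive object:
        # lazy-loaded once per object lifetime, not pickled
        if self._deps is None:
            self._deps = self._load_deps()
        return self._deps

    def _load_deps(self):
        # Placeholder for an expensive download/parse step
        return {"files": []}
```

Both kinds are computed at most once per object; the difference is only whether the value ends up in the instance dict (and hence in the pickle) or not.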
The whole MR makes the process of building the "carddeck" 50% faster, and the saving in disk space is larger by an order of magnitude, which is great.
The tests cover the new features and look sound to me.
The only concern I have is about dependencies: does the code already depend on a newer version of `audbackend`? I see no changes in the `pyproject.toml`.
I do not think this will be a big deal and am approving tentatively.
No, this does not yet depend on a newer version of `audbackend`.
When caching `audbcards.Dataset` we store objects that are not needed to create a datacard, e.g. the dependency table and the header of a dataset. This increases the size of the cache and makes loading slower than needed.

This pull request speeds up caching of `audbcards.Dataset` by pickling only cached properties, as listed by `audbcards.Dataset._cached_properties()` (formerly `audbcards.Dataset.properties()`). The execution time for building our database overview page is as follows on compute5:

The size of the cache is reduced from 2.6G to 133M.
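The mechanism could look roughly like this: collect the names of all `functools.cached_property` attributes on the class, and let `__getstate__` keep only those (plus the identifying attributes). This is an illustrative sketch, not the actual audbcards code:

```python
import functools
import pickle


class Dataset:
    """Sketch of pickling only cached properties (hypothetical names)."""

    def __init__(self, name):
        self.name = name

    @functools.cached_property
    def description(self):
        # Cached on first access, hence present in __dict__ and pickled
        return f"Dataset {self.name}"

    @property
    def deps(self):
        # Heavy object, recomputed after unpickling instead of stored
        return object()

    @classmethod
    def _cached_properties(cls):
        # Names of all functools.cached_property attributes on the class
        return [
            attr
            for attr in dir(cls)
            if isinstance(getattr(cls, attr), functools.cached_property)
        ]

    def __getstate__(self):
        # Pickle only the identifying attribute and cached property values
        cached = set(self._cached_properties())
        return {
            k: v
            for k, v in self.__dict__.items()
            if k == "name" or k in cached
        }
```

Because `@property` values never live in `__dict__`, only the cheap cached values end up in the pickle, which is what shrinks the cache.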
We can further improve execution time by also caching the images / audio examples from `audbcards.Datacard`, but I will handle this in a follow-up pull request.

Further changes:

- Renamed `audbcards.Dataset.properties()` to `audbcards.Dataset._cached_properties()`
- Added `audbcards.Dataset.schemes_summary`, which holds entries needed for the dataset overview page
- Added the `audbcards.Dataset.cache_root` attribute
- Changed `audbcards.Dataset.deps` and `audbcards.Dataset.header` to properties, and added them to the documentation
- Added `audbcards.Dataset.backend` and `audbcards.Dataset.repository_object` properties
- Added `__getstate__` and `__setstate__` methods to the `dohq_artifactory.GenericRepository` object, as the repository is no longer pickled

Newly added API entries: