frozen distribution considerably slower #532

Open
mathause opened this issue Sep 30, 2024 · 2 comments

@mathause
Member

mathause commented Sep 30, 2024

Expression.evaluate(...) creates a frozen scipy distribution (frozen means that the params, e.g. loc, are fixed). Creating a frozen instance is considerably slower than calling the non-frozen distribution directly: it is the setup of the frozen instance that is costly, taking about 500 μs (on my laptop). So if we create one instance and call it repeatedly this does not matter much, but if we set it up every time it adds up. The setup time seems to be roughly independent of the number of elements.

Avoiding this could also offer some speed-ups - especially in optimization routines - but will require a change in how the functions are called.

import numpy as np
import scipy as sp

loc = np.random.randn(100)
scale = np.abs(np.random.randn(100))
x = np.random.randn(100)
dist = sp.stats.norm(loc=loc, scale=scale)

# create a frozen distribution and sample from it (setup + call)
%timeit sp.stats.norm(loc=loc, scale=scale).rvs()
# sample directly from the non-frozen distribution
%timeit sp.stats.norm.rvs(loc=loc, scale=scale)

# sample from an already-frozen distribution (call only)
%timeit dist.rvs()
# set up the frozen distribution (setup only)
%timeit dist = sp.stats.norm(loc=loc, scale=scale)
@mathause
Member Author

mathause commented Oct 7, 2024

I think the way to go is to avoid using evaluate where possible and move the distribution methods (e.g. ppf, isf, loglike, etc.) to the Expression class. After #539 this should be relatively straightforward. Especially in the fitting routines, where we call evaluate repeatedly, this should speed things up considerably. I also think that these functions logically belong to the Expression class (which, however, may need a rename to include "distribution").

So where we would currently do:

distrib = self.expr_fit.evaluate(x, self.data_pred)
support = distrib.support()

We would define:

class Expression:

    ...

    def support(self, coefficients_values, inputs_values):
        # evaluate the parameters and pass them to the non-frozen distribution
        params = self.get_params(coefficients_values, inputs_values)
        return self.distrib.support(**params)

and can then use:

support = self.expr_fit.support(x, self.data_pred)
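
The same pattern would apply to the other distribution methods, in particular the ones needed in the fitting routines. A minimal sketch for a log-likelihood method (the names loglike, get_params and distrib are placeholders for the eventual API, following the support example above):

class Expression:

    ...

    def loglike(self, data, coefficients_values, inputs_values):
        # evaluate the parameters once, then call logpdf on the non-frozen
        # distribution - this avoids the frozen-instance setup on every call
        params = self.get_params(coefficients_values, inputs_values)
        return self.distrib.logpdf(data, **params).sum()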

Important

This is secondary stuff

  • Going further down that rabbit hole we may want three versions:
    • self.expr_fit.support.from_params(**params)
    • self.expr_fit.support(coefficients_values, inputs_values, *, sanity_check=False)

@mathause
Member Author

mathause commented Oct 7, 2024

I just saw that I did not mention the actual numbers: sp.stats.norm(loc=loc, scale=scale).rvs() is about 25 times slower than sp.stats.norm.rvs(loc=loc, scale=scale) - this is an easy win to speed up the numerical optimization.

# %timeit sp.stats.norm(loc=loc, scale=scale).rvs()
495 μs ± 1.5 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
# %timeit sp.stats.norm.rvs(loc=loc, scale=scale)
20.1 μs ± 216 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
# %timeit dist.rvs()
21.1 μs ± 101 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
# %timeit dist = sp.stats.norm(loc=loc, scale=scale)
446 μs ± 2.85 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

Of course evaluate_params does add some overhead as well, so the speed-up is not quite as dramatic but still considerable.
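
To illustrate what this means for an optimization loop, here is a minimal, self-contained sketch (a plain scipy.optimize fit of a normal distribution, not MESMER code) comparing a negative log-likelihood that freezes the distribution on every call with one that calls the non-frozen methods directly:

import numpy as np
import scipy as sp

rng = np.random.default_rng(0)
data = rng.normal(loc=1.0, scale=2.0, size=1000)

def nll_frozen(params):
    # freezes the distribution on every call -> pays the setup cost each time
    loc, scale = params
    if scale <= 0:
        return np.inf
    return -sp.stats.norm(loc=loc, scale=scale).logpdf(data).sum()

def nll_direct(params):
    # calls the non-frozen distribution directly -> no per-call setup cost
    loc, scale = params
    if scale <= 0:
        return np.inf
    return -sp.stats.norm.logpdf(data, loc=loc, scale=scale).sum()

# both objectives give the same estimates, but nll_direct avoids the
# frozen-instance overhead on every one of the many objective evaluations
res = sp.optimize.minimize(nll_direct, x0=[0.0, 1.0], method="Nelder-Mead")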
