better first guess for fit_yeo_johnson_transform #492

Open
mathause opened this issue Aug 8, 2024 · 3 comments

Comments

@mathause
Member

mathause commented Aug 8, 2024

We can speed up the fit_yeo_johnson_transform by passing a better first guess, assuming the trend is 0. We can get the first guess using:

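from sklearn.preprocessing import PowerTransformer

# PowerTransformer defaults to method="yeo-johnson" and fits one lambda per feature (column)
lambdas = PowerTransformer().fit(tas_stacked_y.tas).lambdas_

# with zero trend, xi_0 follows from lambda as
xi_0 = (2 - lambdas) / lambdas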
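
The conversion above is what one gets if lambda is parameterized as a logistic function of the yearly temperature, lambda = 2 / (1 + xi_0 * exp(xi_1 * T)), with the trend coefficient xi_1 set to 0. That parameterization is an assumption here (the issue does not spell it out), and the helper names below are hypothetical. A minimal round-trip check:

```python
import numpy as np


def lambda_from_xi(xi_0, xi_1, yearly_t):
    """Assumed logistic form of lambda; not taken from the issue text."""
    return 2.0 / (1.0 + xi_0 * np.exp(xi_1 * yearly_t))


def xi_0_from_lambda(lmbda):
    """Invert the assumed form for xi_1 = 0 (no trend)."""
    return (2.0 - lmbda) / lmbda


lmbda = 0.8
xi_0 = xi_0_from_lambda(lmbda)

# with xi_1 = 0 the yearly temperature drops out and lambda is recovered
recovered = lambda_from_xi(xi_0, 0.0, yearly_t=np.array([-1.0, 0.0, 2.5]))
print(xi_0, recovered)
```

Under this form, lambda = 1 (no skew correction) corresponds to xi_0 = 1.
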
@veni-vidi-vici-dormivi
Collaborator

Hm, but instead of tas_stacked_y.tas we would use resids_after_hm.tas[month], right? So the assumption would be that the monthly residuals are skewed w.r.t. the yearly values, but that the skew is constant and does not depend on the yearly temperature value. That's a good idea. But we would need to do it 12 times, too. Does that pay off?
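
A minimal sketch of the per-month first guess, on synthetic data. The array `resids` is a hypothetical stand-in for the monthly residuals (rows are years, columns are months), not the real `resids_after_hm`. Because `PowerTransformer` estimates one lambda per column, the twelve constant-lambda fits can be done in a single call in this layout; whether that carries over to the real data layout is untested here.

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

rng = np.random.default_rng(0)
n_years = 200

# hypothetical stand-in for the monthly residuals: right-skewed, roughly centered
resids = rng.gamma(shape=4.0, scale=1.0, size=(n_years, 12)) - 4.0

# one lambda per column, i.e. one per month; no standardization needed for the guess
pt = PowerTransformer(method="yeo-johnson", standardize=False)
lambdas = pt.fit(resids).lambdas_

# guard added for this sketch: keep lambda in (0, 2] so that xi_0 is finite and >= 0
lambdas = np.clip(lambdas, 1e-3, 2.0)

xi_0_first_guess = (2.0 - lambdas) / lambdas
print(xi_0_first_guess.round(2))
```
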

@mathause
Member Author

mathause commented Aug 9, 2024

> Hm, but instead of tas_stacked_y.tas we would use resids_after_hm.tas[month], right?

Yes

> Does that pay off?

The idea rests on three points: there is not much trend, fitting one parameter is much faster than fitting two, and starting from a good value of $\xi_0$ speeds up the minimization. It helps, but only by about 10%, which is much less than I had hoped.
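
To make the comparison concrete, here is a rough sketch that runs a two-parameter fit from a default start and from the constant-lambda first guess, and prints the number of function evaluations for each. Everything in it is an assumption for illustration: the synthetic data, the logistic form of lambda, the likelihood, the bounds and the choice of Nelder-Mead. It is not the actual `fit_yeo_johnson_transform` implementation, so the evaluation counts need not reflect the roughly 10% reported above.

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.preprocessing import PowerTransformer


def yeo_johnson(x, lmbda):
    """Yeo-Johnson transform with one lambda per sample; requires 0 < lambda < 2."""
    out = np.empty_like(x)
    pos = x >= 0
    out[pos] = ((x[pos] + 1.0) ** lmbda[pos] - 1.0) / lmbda[pos]
    out[~pos] = -((1.0 - x[~pos]) ** (2.0 - lmbda[~pos]) - 1.0) / (2.0 - lmbda[~pos])
    return out


def neg_loglike(xi, resid, yearly_t):
    """Negative profile log-likelihood for the assumed logistic lambda."""
    xi_0, xi_1 = xi
    lmbda = 2.0 / (1.0 + xi_0 * np.exp(xi_1 * yearly_t))
    y = yeo_johnson(resid, lmbda)
    jacobian = np.sum((lmbda - 1.0) * np.sign(resid) * np.log1p(np.abs(resid)))
    return 0.5 * resid.size * np.log(y.var()) - jacobian


rng = np.random.default_rng(0)
n = 300
yearly_t = rng.normal(0.0, 1.0, n)  # hypothetical yearly temperature anomalies
resid = rng.gamma(shape=4.0, scale=1.0, size=n) - 4.0  # hypothetical skewed residuals

# constant-lambda first guess from PowerTransformer, converted to xi_0
lmbda_0 = PowerTransformer(standardize=False).fit(resid.reshape(-1, 1)).lambdas_[0]
lmbda_0 = np.clip(lmbda_0, 0.01, 1.99)  # guard added for this sketch
xi_0_guess = (2.0 - lmbda_0) / lmbda_0

# bounds keep xi_0 > 0 so that lambda stays strictly between 0 and 2
bounds = [(1e-6, None), (-0.1, 0.1)]
kwargs = dict(args=(resid, yearly_t), method="Nelder-Mead", bounds=bounds)

res_default = minimize(neg_loglike, x0=[1.0, 0.0], **kwargs)
res_guess = minimize(neg_loglike, x0=[xi_0_guess, 0.0], **kwargs)

print("default start:", res_default.x, res_default.nfev)
print("first guess:  ", res_guess.x, res_guess.nfev)
```
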

@mathause
Member Author

mathause commented Aug 12, 2024

I could try again with much lower precision for the first guess, since most of the iterations are spent homing in on the estimate. The PowerTransformer fit uses scipy.optimize.brent with a tolerance of about 1e-8. For our purpose 1e-2 is probably enough.

Only problem: the tol param is not exposed in PowerTransformer().fit.

https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.brent.html

Just for clarity: this yields at most another 10% speed gain, so it is still debatable whether it's worth the trouble.
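
One way to try the lower-precision idea without touching `PowerTransformer` is to call `scipy.optimize.brent` directly on a Yeo-Johnson log-likelihood. The sketch below uses `scipy.stats.yeojohnson_llf` as the objective, which is assumed to be analogous to what `PowerTransformer` optimizes internally, with the same `brack=(-2, 2)`. The data are synthetic, and the counts it prints are only indicative.

```python
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(0)
# hypothetical skewed residuals for a single month
resid = rng.gamma(shape=4.0, scale=1.0, size=500) - 4.0


def neg_llf(lmbda):
    """Negative Yeo-Johnson log-likelihood for a constant lambda."""
    return -stats.yeojohnson_llf(lmbda, resid)


# default tolerance (about 1.48e-8)
lmbda_tight, _, _, calls_tight = optimize.brent(
    neg_llf, brack=(-2, 2), full_output=True
)

# loose tolerance, as suggested for a first guess
lmbda_loose, _, _, calls_loose = optimize.brent(
    neg_llf, brack=(-2, 2), tol=1e-2, full_output=True
)

print(f"tight: lambda={lmbda_tight:.4f}, function calls={calls_tight}")
print(f"loose: lambda={lmbda_loose:.4f}, function calls={calls_loose}")
```

The loose estimate can then be converted with xi_0 = (2 - lambda) / lambda and passed on as the first guess.
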
