Add support for sum of models and custom models #160

rhugonnet · 2023-08-11T00:58:14Z

Early draft! Stopped here to discuss the way forward, I prefer having your point of view before adjusting too many things in Variogram! 🙂

For now:

Added another test to check custom model definition (there were some already): test passed immediately,
Added a test to check custom model fitting: test didn't pass, required the adjustment below,
Added a boolean class attribute Variogram._is_model_custom, set during set_model(). This makes it possible to separate the definition of bounds and p0 in Variogram.fit(); for custom models they are defined based on the number of arguments obtained from inspect.

However, with the current approach, bounds and p0 will be quite poorly defined (at this point the script cannot tell which parameter matches which dimension). The user would likely have to pass them manually to get a good fit.
Additionally, the current implementation will cause problems in Variogram.describe(), which in turn relies on Variogram.parameters.
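
To illustrate why the defaults end up poorly defined, here is a minimal sketch of deriving p0 and bounds from the argument count alone. The helper names (count_model_params, generic_fit_defaults) and the choice of upper limit are assumptions for illustration, not the code in this PR:

```python
import inspect
import math


def count_model_params(model):
    """Number of fit parameters of a model f(h, *params): every argument except the lag h."""
    return len(inspect.signature(model).parameters) - 1


def generic_fit_defaults(model, max_lag, max_value):
    """Uninformed p0 and bounds for a custom model (hypothetical helper).

    Without knowing which argument is a range and which is a sill, every
    parameter gets the same box: [0, max(max_lag, max_value)].
    """
    n = count_model_params(model)
    upper_limit = max(max_lag, max_value)
    lower = [0.0] * n
    upper = [upper_limit] * n
    p0 = list(upper)  # start at the upper bound for every parameter
    return p0, (lower, upper)


def my_model(h, r, c0, b):
    """Example custom model: exponential shape with range r, sill c0, nugget b."""
    return b + c0 * (1.0 - math.exp(-3.0 * h / r))


p0, bounds = generic_fit_defaults(my_model, max_lag=100.0, max_value=5.0)
print(p0, bounds)
```

Here the sill and nugget get the same upper bound as the range, which is exactly the kind of loose box that makes user-supplied bounds and p0 preferable.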

One way forward would be to split this into 2 cases:

1. Add specific support for models that are a combination of models existing in skgstat/models, which would allow ._get_bounds(), .describe(), .parameters to function normally and make it easy for the user;
2. Add basic support for custom models, with a warning to the user that they need to pass their own bounds and p0 to reach good fitting performance (in case those aren't passed).

To implement 1., however, I see several questions:

How to conveniently pass the combination of existing models to the model argument? I'm not sure of the best option. A dictionary such as model={"op": np.sum, "models": ["spherical", "gaussian", "matern"]} could do the job? But maybe this will get overly complex. Thinking about it, for something more complex than a sum, I'm not sure we'll be able to benefit much from knowing the nature of the models to infer "bounds" and "p0" efficiently anyway. We could simply leave that to case number 2, and add support just for the sum for convenience, for which we can easily pass a list ["spherical", "gaussian", "matern"]?
How to return .parameters? Right now it's a list, should we simply concatenate into a longer one keeping always the same parameter order? Or switch to a dictionary?
For describe(), I guess we can simply detect that it is a combination of models and return a more detailed list of parameters.

To implement 2.:

We have to allow bounds and p0 to be passed by the user. I guess this should be both in Variogram.fit() and __init__()? We could add fit_bounds and fit_p0 in Variogram.__init__. But there are already a lot of arguments, so they could also go in kwargs. What do you think?
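
A rough sketch of how fit_bounds and fit_p0 could be picked up from kwargs, with the warning from case 2 when they are missing for a custom model. The function resolve_fit_args and its validation rules are hypothetical, only the two keyword names come from the proposal above:

```python
import warnings


def resolve_fit_args(is_model_custom, default_p0, default_bounds, **kwargs):
    """Pick user-supplied fit_p0 / fit_bounds from kwargs, else fall back to defaults."""
    p0 = kwargs.get("fit_p0")
    bounds = kwargs.get("fit_bounds")

    # For a custom model the defaults are uninformed, so tell the user.
    if is_model_custom and (p0 is None or bounds is None):
        warnings.warn(
            "Custom model without fit_p0/fit_bounds: falling back to generic "
            "defaults, the fit may perform poorly.",
            UserWarning,
        )

    if p0 is None:
        p0 = default_p0
    if bounds is None:
        bounds = default_bounds

    lower, upper = bounds
    if not (len(p0) == len(lower) == len(upper)):
        raise ValueError("fit_p0 and fit_bounds must have one entry per parameter.")
    if any(not (lo <= x <= up) for x, lo, up in zip(p0, lower, upper)):
        raise ValueError("fit_p0 must lie within fit_bounds.")

    return list(p0), (list(lower), list(upper))


# The user passes both: no warning, values are used as given.
p0, bounds = resolve_fit_args(
    True,
    [1.0, 1.0],
    ([0.0, 0.0], [10.0, 10.0]),
    fit_p0=[2.0, 3.0],
    fit_bounds=([0.0, 0.0], [5.0, 5.0]),
)
print(p0, bounds)
```

The returned pair has the shape expected by the p0 and bounds arguments of scipy.optimize.curve_fit, i.e. a flat list and a (lower, upper) tuple.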

That's all I noticed for now, but there might be a few additional things down the line! 😅

Resolves #159

codecov · 2023-08-11T01:05:52Z

Codecov Report

Attention: 2 lines in your changes are missing coverage. Please review.

Comparison is base (c2d63d1) 89.75% compared to head (9690d39) 90.74%.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #160      +/-   ##
==========================================
+ Coverage   89.75%   90.74%   +0.99%     
==========================================
  Files          23       23              
  Lines        2274     2475     +201     
==========================================
+ Hits         2041     2246     +205     
+ Misses        233      229       -4

Files	Coverage Δ
skgstat/DirectionalVariogram.py	`90.32% <100.00%> (+0.52%)`	⬆️
skgstat/Variogram.py	`96.36% <98.30%> (+0.73%)`	⬆️

... and 7 files with indirect coverage changes


mmaelicke · 2023-08-11T08:48:04Z

Hi @rhugonnet,

Nice, thanks for all the effort! I am currently on vacation, but will be back next Tuesday. Is that early enough for you, if I only add a detailed response then?

A few quick thoughts: passing p0 and bounds might be a viable extension for the user in any case. As you can also manually fit a variogram model using fit_range on init, or range on fit, I think we can add these parameters like that. I would use the kwargs on init for that.

Maybe an option to define a sum-combination of pre-defined functions would be to add something like: model='spherical+gaussian'. That would be convenient and does not break the current semantics, as the parameter accepts strings for pre-defined models anyway. It would need some internal refactoring, as the Variogram.model property returns a function wrapped with models.variogram, i.e. to set the right __name__ attribute on the function object (to be used in describe() and plotting). We would need some wrapper here that can construct the actual model and fill the __name__ property with something meaningful.
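
As a sketch of the wrapper idea, assuming each sub-model takes (h, range, sill): the two model functions below are simplified stand-ins rather than the decorated functions in skgstat/models, and REGISTRY and build_sum_model are hypothetical names:

```python
import math


def spherical(h, r, c0):
    """Simplified spherical model (stand-in, no nugget)."""
    if h >= r:
        return c0
    x = h / r
    return c0 * (1.5 * x - 0.5 * x ** 3)


def gaussian(h, r, c0):
    """Simplified gaussian model (stand-in, no nugget)."""
    a = r / 2.0
    return c0 * (1.0 - math.exp(-(h * h) / (a * a)))


# Hypothetical lookup table from model name to function.
REGISTRY = {"spherical": spherical, "gaussian": gaussian}


def build_sum_model(name):
    """Build f(h, r1, c1, r2, c2, ...) from a string such as 'spherical+gaussian'."""
    parts = [p.strip().lower() for p in name.split("+")]
    unknown = [p for p in parts if p not in REGISTRY]
    if unknown:
        raise ValueError(f"Unknown model(s): {unknown}")
    funcs = [REGISTRY[p] for p in parts]

    def summed(h, *args):
        if len(args) != 2 * len(funcs):
            raise ValueError("Expected (range, sill) for each model in the sum.")
        # Each sub-model consumes its own consecutive (range, sill) pair.
        return sum(f(h, args[2 * i], args[2 * i + 1]) for i, f in enumerate(funcs))

    # Give the function a meaningful name, e.g. for describe() and plotting.
    summed.__name__ = "+".join(parts)
    return summed


model = build_sum_model("spherical+gaussian")
print(model.__name__, model(100.0, 10.0, 2.0, 5.0, 3.0))
```

Far beyond both ranges the summed model approaches the sum of the sills, which is a quick sanity check for any implementation of this idea.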

I would like to think about the parameters for a moment longer, and give you some thoughts on how to handle them then. I think it would be problematic to change the structure of Variogram.parameters, as a number of interfaces, e.g. to gstools.CovModel or gstatsim.Interpolate, rely on that list. But I am sure we can come up with something useful here. Have to think about that after my vacation.... :)

Thanks again and you will hear from me soon,
Mirko

rhugonnet · 2023-08-12T00:22:16Z

Sounds good, I like all ideas!

No problem, priority to holiday and I'm busy on other things next week anyway, will continue the week after next 🙂
Enjoy your vacation @mmaelicke! 😉

mmaelicke

Nice! Thanks for the contribution. I have only one comment regarding the docstring, but will approve anyway.

skgstat/Variogram.py

rhugonnet · 2023-08-21T06:22:05Z

Sorry, was still working on your comments!
Just finalized the logic to support model names such as "spherical+gaussian" across the Variogram methods that used _model.__name__. Now I need to add more tests + doc, for now all existing tests pass. I'll do that tomorrow!

rhugonnet · 2023-08-22T05:55:28Z

Alright, I think this PR is now at a good stage! 😄

In short: Variogram now fully supports model sums such as "spherical+gaussian", as well as custom models from callable!

To make this work for custom models: You've already seen 80% of the changes in the first draft, I just had to add a couple other exceptions in describe() and parameters, now it all works!

To make this work for sums: I introduced a _model_name class attribute, and added two non-public functions: _get_argpos_sum_models to get argument positions and _build_sum_models to create the summed model. Those are then used by set_model(), describe(), fit() and parameters to properly set/get the arguments of each model in the sum.
By default, describe() returns a dict with effective_range_1, sill_1, nugget_1, effective_range_2, and so on... to make it "human-readable". And parameters returns a list of those in order given by the model name ("spherical+gaussian" = spherical first, gaussian second).
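
For readers following along, a small sketch of the mapping described here, assuming every model in the sum contributes exactly (effective range, sill, nugget) in that order. The function describe_sum is hypothetical, and models with an extra shape parameter (matern, stable) are left out:

```python
def describe_sum(model_name, params):
    """Map a flat parameter list to a human-readable dict for a sum of models.

    Assumes three parameters per model, ordered as in the model name
    ("spherical+gaussian" = spherical first, gaussian second).
    """
    names = model_name.split("+")
    per_model = 3
    if len(params) != per_model * len(names):
        raise ValueError("Expected 3 parameters for each model in the sum.")

    out = {"model": model_name}
    for i in range(1, len(names) + 1):
        # Slice out the i-th model's block of parameters.
        rng, sill, nugget = params[(i - 1) * per_model : i * per_model]
        out[f"effective_range_{i}"] = rng
        out[f"sill_{i}"] = sill
        out[f"nugget_{i}"] = nugget
    return out


print(describe_sum("spherical+gaussian", [10.0, 2.0, 0.0, 50.0, 3.0, 0.5]))
```

Keeping parameters as a flat list in model-name order preserves the existing list-based interface, while the suffixed keys keep describe() readable.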

Wrote a lot of tests, and added descriptions in the function docstrings!

Points left to finalize:

How to interpret use_nugget for a sum of models? The user might only want to use nuggets for a subset of them. Should we then allow a boolean iterable to be passed for use_nugget? (right now untouched; all nugget or none)
Some tests are failing only for specific versions of Python (3.6 has a different 4th decimal in an array comparison for the mean residual test: https://github.com/mmaelicke/scikit-gstat/actions/runs/5933246824/job/16088390540, Python 3.9 has slightly different outputs of scipy.optimize.curve_fit: https://github.com/mmaelicke/scikit-gstat/actions/runs/5933108779/job/16088046636). Any idea how to fix those?
There are a couple of new warnings in test_variogram when calling the stable model (divide by zero). Will investigate this...
Should I write a little example in the documentation? Maybe something here on sum of models + custom model: https://scikit-gstat.readthedocs.io/en/latest/userguide/variogram.html#variogram-models, and something here on passing custom bounds/p0 for the fitting https://scikit-gstat.readthedocs.io/en/latest/technical/fitting.html?

mmaelicke · 2023-08-23T09:22:04Z

Hi,

I really like your changes! I think it would be helpful if you could add an example, or maybe even a little tutorial notebook to the docs for others.

Concerning all the other points: I will think them through and give you my opinion on them.
Py 3.9 and 3.10 are currently failing on all branches; I will investigate what is going on there. Maybe that is related to bugfixes in SciPy, which might change the parameters estimated by curve_fit. Maybe we should make the unit tests more tolerant, as curve_fit is not really SciKit-GStat's business. Not sure yet.

I'll come back to this soon

rhugonnet · 2023-08-23T19:35:59Z

Thanks for the feedback, will start on the tutorial and wait for the other points!

For the curve_fit error on Python>=3.9, they're fixing it in SciPy: Bug: skgstat.Variogram: After scipy update from 1.11.1 to 1.11.2: Optimal parameters not found #161 (comment). I guess it's good that dependent packages like SciKit-GStat have tests relying on it, so we can report to them when this happens (which hopefully should be quite rare... they really need to improve their test suite, that was quite a big error!).
For the error on mean_residual in Python 3.6: it looks like a floating-point precision error due to a NumPy change that happened between 1.19 (Python 3.6) and 1.21 (Python 3.7). I couldn't track down which change exactly. We could simply change the assertAlmostEqual precision to 3 decimals here and for the 2 other statistical tests: https://github.com/mmaelicke/scikit-gstat/blob/main/skgstat/tests/test_variogram.py#L989?
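
To make the effect of the places argument concrete, a standalone sketch with made-up numbers (these are not the values from test_variogram.py). unittest's assertAlmostEqual passes when round(abs(a - b), places) == 0:

```python
import unittest


class TestTolerance(unittest.TestCase):
    def test_places(self):
        # Illustrative values only: they differ in the 4th decimal.
        reference = 0.12345
        computed = 0.12338

        # Passes: the difference (7e-05) rounds to 0 at 3 decimals.
        self.assertAlmostEqual(computed, reference, places=3)

        # Fails at 4 decimals, which is what a stricter test would hit.
        with self.assertRaises(AssertionError):
            self.assertAlmostEqual(computed, reference, places=4)


def run_checks():
    """Run the test case programmatically and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestTolerance)
    return unittest.TextTestRunner(verbosity=0).run(suite)


result = run_checks()
print(result.wasSuccessful())
```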

mmaelicke · 2023-09-22T07:32:57Z

I will add my review as soon as possible.

> I'm thinking I could also add a tutorial/technical note on fitting multiple ranges using both RasterMetricSpace + sum of models. But maybe in a different PR where we also add RasterMetricSpace to the API?

That sounds great! And yes, I would also do that in a separate PR.

mmaelicke

I really like the changes; they fix a lot of stuff that was on the agenda for ages, along with the sum of models.
Do you have any idea why the Python 3.9 unit tests are failing? Is this still connected to the bug in curve_fit?

rhugonnet · 2023-10-05T18:36:12Z

Just did some tests in this PR changing the requirements.txt, and it looks like the tests that fail for the stable_entropy binning come from updates in NumPy 1.25.
Additionally, it looks like the fix in SciPy 1.11.3 did not fix the issue for some of our cases... We still need 1.11.1 to pass. They're discussing some additional issues introduced in 1.11.2 here: scipy/scipy#19309.
I'll investigate and report a reproducible example of our issue in SciPy if I can get one!

rhugonnet · 2023-10-05T18:36:54Z

Otherwise all passing with NumPy 1.24 and SciPy 1.11.1! 😅

rhugonnet · 2023-10-05T19:08:50Z

For the SciPy issue persisting with 1.11.3, I managed to make a reproducible example and opened an issue on SciPy directly, see: #161 (comment)

Investigating the NumPy one...

rhugonnet · 2023-10-05T23:52:07Z

It looks like the NumPy issue started with 1.25, but I couldn't reproduce it locally or track down why exactly...
Maybe we could simply reduce the floating precision of the stable_entropy bin estimation to 0 decimals? It's not very far off

mmaelicke · 2023-10-09T05:48:10Z

> It looks like the NumPy issue started with 1.25, but I couldn't reproduce it locally or track down why exactly... Maybe we could simply reduce the floating precision of the stable_entropy bin estimation to 0 decimals? It's not very far off

I was also not successful in finding the exact problem. I guess we go for less precision, but open an issue to remind ourselves to test again with newer numpy versions?

rhugonnet · 2023-10-09T18:45:16Z

Yes, I agree!
I removed the NumPy version pinning, opened #166 and changed the test precision to 0 decimals.

For SciPy: they are going to revert the changes to how it was in 1.11.1 and before, and release a 1.11.4, but it might take some time. In the meantime, should we merge this PR with SciPy pinned to <=1.11.1 in requirements.txt and open an issue to unpin once they make the release? That would let us test the other open PRs without hitting this issue in the CI!

mmaelicke · 2023-10-10T06:00:50Z

I agree. Thanks for all the input, I am really glad that you are aware of all the changes happening in scipy.

mmaelicke · 2023-10-13T05:30:39Z

Hey @rhugonnet,

The pre-commit is now merged into main with #167. I just resolved the merge conflicts. There are some minor issues that pre-commit complains about. I can't push to your fork, so could you fix the pre-commit stuff? Then we can finally merge and have a working repo again.

Thanks for all your effort

rhugonnet · 2023-10-13T18:17:53Z

@mmaelicke Perfect, thanks! All done, we can merge 🙂
I'll open an issue to remind ourselves to unpin SciPy when their bug is fixed.

If you want to work directly on the PRs I opened, I think normally you can push on someone's fork (maybe depends on repo params? but usually works for me):

git remote add rhugonnet https://github.com/rhugonnet/scikit-gstat.git
git fetch rhugonnet --prune
git checkout rhugonnet/combine_models
# Will go in "detached HEAD" mode
# Make the changes and commit them
git push rhugonnet HEAD:combine_models

mmaelicke · 2023-10-13T18:35:02Z

> If you want to work directly on the PRs I opened, I think normally you can push on someone's fork (maybe depends on repo params? but usually works for me):

Ah cool, I was unaware of that. Nice to see all the checks green again. I'll merge and go home :)

rhugonnet added 3 commits August 10, 2023 16:17

Add _custom_model class attribute, exception in fit and p0 and bounds as parameters

7232c43

Add tests on custom model definition and fitting

5247cd4

Add to DirectionalVariogram and rename _custom_model to _is_model_custom for clarity

7b3cd22

mmaelicke approved these changes Aug 19, 2023

Build sum of models from string name

20749fc

rhugonnet marked this pull request as draft August 21, 2023 04:47

Add self._model_name attribute, and logic for sum of models in describe(), parameters() and get_fit_bounds()

c93ed9a

rhugonnet added 6 commits August 21, 2023 14:15

Add more tests and correct bugs

daf6eda

Remove static typing, move to description

2df8592

Fix instantiation and fix tests

5d37015

Add tests for plotting

66aff87

Document fit_bounds and fit_p0, and add tests

82fa559

Update model argument description

4185c75

rhugonnet changed the title ~~Add support for custom model fitting~~ Add support for sum of models and custom model Aug 22, 2023

rhugonnet added 2 commits August 21, 2023 16:56

Fix random test and inf bound check

bc4e170

Fix warnings in test_variogram

aa2d7ed

rhugonnet marked this pull request as ready for review August 22, 2023 05:55

Fix typo in description

ce9aa85

rhugonnet mentioned this pull request Aug 22, 2023

Skip GSTools test in test_variogram if not installed #162

Merged

rhugonnet changed the title ~~Add support for sum of models and custom model~~ Add support for sum of models and custom models Aug 24, 2023

rhugonnet added 3 commits September 21, 2023 11:24

Fix floating precision error

de3a948

Add example in user guide

2e31741

Make nugget consistent for a sum of models

6985339

rhugonnet requested a review from mmaelicke September 21, 2023 21:59

Rerun tests with scipy 1.11.3

f4d33f1

mmaelicke approved these changes Oct 5, 2023

rhugonnet added 3 commits October 5, 2023 11:10

Force SciPy versions to before 1.11.1

34388c6

Try with NumPy version before 1.25

d357c1a

Try with 1.24

56127e2

rhugonnet mentioned this pull request Oct 5, 2023

Bug: skgstat.Variogram: After scipy update from 1.11.1 to 1.11.2: Optimal parameters not found #161

Closed

rhugonnet added 5 commits October 5, 2023 12:20

Try 1.24.1

7d5d287

Try 1.24.1

591ae37

Try 1.24.2

d0bb82d

Try 1.24.3

9a76ae7

Try 1.25.0

35903f8

Remove NumPy version fixing

9ada1f4

rhugonnet mentioned this pull request Oct 9, 2023

Change in precision of stable entropy bins with NumPy>=1.25.0 #166

Open

Change stable entropy bin test precision to 0 decimals

59781f1

Merge branch 'main' into combine_models

5f67e4e

Linting

9690d39

rhugonnet mentioned this pull request Oct 13, 2023

Reminder: Unpin scipy<=1.11.1 in requirements.txt once 1.11.4 released #168

Closed

mmaelicke merged commit 1c83f5a into mmaelicke:main Oct 13, 2023
12 checks passed
