
FIX correct node splitting order & remove class weight #56

Merged: 7 commits into submodulev3 from class_fix on Aug 24, 2023

Conversation

@PSSF23 (Member) commented Aug 23, 2023

Reference Issues/PRs

neurodata/treeple#107

What does this implement/fix? Explain your changes.

Any other comments?

@PSSF23 PSSF23 requested a review from adam2392 August 23, 2023 19:51
github-actions bot commented Aug 23, 2023

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 3eb84ae.

@PSSF23 (Member, Author) left a comment

The error still persists at the tree level. I'll keep testing it. Not ready to merge yet.

Edit: tested on a previous build (right after merging tree-level partial_fit) and the problem already existed back then.

@PSSF23 (Member, Author) commented Aug 24, 2023

@adam2392 When you have time, can you try running the code below several times? The error appears randomly on my machine, and I couldn't find any clue why yet. Thanks.

from numpy.random import permutation
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier

# Load iris and shuffle the samples and labels together.
iris = datasets.load_iris()
p = permutation(iris.data.shape[0])
iris_X = iris.data[p]
iris_y = iris.target[p]

# Two consecutive partial_fit calls on the same data; the second call
# randomly triggers the memory crash.
clf_1 = RandomForestClassifier()
clf_1.partial_fit(iris_X, iris_y, classes=[0, 1, 2])
clf_1.partial_fit(iris_X, iris_y)

@PSSF23 (Member, Author) commented Aug 24, 2023

Didn't find anything suspicious in tree.pyx that could cause the memory crash. Maybe the problem lies elsewhere, like the Splitter?

@adam2392 (Collaborator) commented Aug 24, 2023

> Didn't find anything suspicious in tree.pyx that could cause the memory crash. Maybe the problem lies elsewhere, like the Splitter?

Hmmm... I'm not sure, but this error looks like some of the errors I was getting very randomly in scikit-tree, where I just skipped the specific test. It started coming up in scikit-tree after the partial_fit PR. I couldn't spot any errors at the time, but if it's coming up like this, then I presume it is a real error.

Some thoughts:

  • Versions of joblib? I suspect this because it's related to some parallelization over the trees.
  • Some edge case of partial_fit where you are accessing memory that is not allocated, or already deallocated. This could be happening in the update_node or update_tree part of the Cython code... If so, this is worrying.
  • Some version of the compiler? I don't see why, though.

I can't reproduce the error on my machine, for sure. However, it seems to come up in the scikit-learn fork only after this PR introduced certain changes. So are these changes somehow forcing the tree into some edge case?

If you're able to reproduce the error, perhaps you can try using a C++ debugger, or valgrind?

@PSSF23 (Member, Author) left a comment

@adam2392 I think I found the problem. build was resizing the tree each time it ran, but that step seems to disrupt update. After correcting it, I don't see the error anymore.

See b92c9ee
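
For context, a rough Python-level sketch of the idea behind the fix (the actual change in b92c9ee lives in the Cython builder in tree.pyx; the class and helper names below are illustrative only, not the real code):

class TreeBuilder:
    def _grow_from_root(self, tree, X, y):
        pass  # stub: ordinary top-down construction of a fresh tree

    def _update_from_roots(self, tree, X, y, initial_roots):
        pass  # stub: continue splitting from the saved frontier nodes

    def build(self, tree, X, y, initial_roots=None):
        if initial_roots is None:
            # Initial fit: the tree is empty, so pre-sizing its node
            # array here is safe.
            tree._resize(2047)  # illustrative initial capacity
            self._grow_from_root(tree, X, y)
        else:
            # partial_fit: the tree already owns populated node arrays.
            # Resizing them on every build call (the pre-fix behaviour)
            # can reallocate memory that initial_roots still references,
            # which fits the random crashes above, so the update path
            # must not resize unconditionally.
            self._update_from_roots(tree, X, y, initial_roots)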

@adam2392 (Collaborator) commented Aug 24, 2023

Hmmm, damn, that would be a finicky bug. Is there any unit test we can add that checks this at the tree level, one that fails on main but succeeds on this PR branch?

I'm worried we'll miss this in the future and a regression will occur again.

@PSSF23 (Member, Author) left a comment

There appear to be other CI errors that I haven't resolved. Working on them now.

For unit tests, the problem is that no test can handle a segmentation fault, unless there is one I don't know about.

@adam2392 (Collaborator) left a comment

> There appear to be other CI errors that I haven't resolved. Working on them now.
>
> For unit tests, the problem is that no test can handle a segmentation fault, unless there is one I don't know about.

As in, a unit test that reliably results in a segmentation fault on main, but does not on this branch.

Is it true that the root cause is that the capacity changes when partial_fit is called twice? If that is the case, would a valid test be to assert that the capacity does not change (assuming the new data you feed in does not significantly increase the depth of the tree)?
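
A sketch of such a test (hypothetical, not part of this PR; it assumes the fork's tree-level partial_fit and the Cython Tree's public capacity attribute):

from sklearn import datasets
from sklearn.tree import DecisionTreeClassifier

def test_partial_fit_keeps_capacity():
    X, y = datasets.load_iris(return_X_y=True)
    clf = DecisionTreeClassifier(random_state=0)
    clf.partial_fit(X, y, classes=[0, 1, 2])
    capacity_before = clf.tree_.capacity
    # Refitting on the same data should not deepen the tree, so a correct
    # update path should leave the node-array capacity unchanged.
    clf.partial_fit(X, y)
    assert clf.tree_.capacity == capacity_before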

@PSSF23 (Member, Author) commented Aug 24, 2023

Oh, so this branch is fine now? That's good to hear after struggling with it since yesterday.

I fear that the capacity change itself (BaseTree._resize_c in tree.pyx) raises the memory error, so there's no guaranteed way to test after the resizing happens. The code would already have crashed at that point, which is why I asked if there is a way to catch a segfault.
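
One standard way around that: since a segfault kills the interpreter, a test can run the reproducer in a child process and assert on its exit code instead of trying to catch the crash in-process. A minimal sketch (hypothetical, reusing the reproducer from above):

import subprocess
import sys

REPRO = """
from numpy.random import permutation
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier

iris = datasets.load_iris()
p = permutation(iris.data.shape[0])
clf = RandomForestClassifier()
clf.partial_fit(iris.data[p], iris.target[p], classes=[0, 1, 2])
clf.partial_fit(iris.data[p], iris.target[p])
"""

def test_partial_fit_does_not_segfault():
    result = subprocess.run([sys.executable, "-c", REPRO])
    # A negative return code means the child was killed by a signal
    # (-11 for SIGSEGV on Linux); 0 means the snippet ran cleanly.
    assert result.returncode == 0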

@adam2392 (Collaborator) left a comment

> Oh, so this branch is fine now? That's good to hear after struggling with it since yesterday.
>
> I fear that the capacity change itself (BaseTree._resize_c in tree.pyx) raises the memory error, so there's no guaranteed way to test after the resizing happens. The code would already have crashed at that point, which is why I asked if there is a way to catch a segfault.

Never mind, I was on my phone. I still see the same CI errors. Are these the ones you were referring to, which were fixed by the resizing of the tree?

https://dev.azure.com/neurodata/neurodata/_build/results?buildId=678&view=logs&j=dde5042c-7464-5d47-9507-31bdd2ee0a3a&t=4bd2dad8-62b3-5bf9-08a5-a9880c530c94

@PSSF23 (Member, Author) commented Aug 24, 2023

I'll keep looking into the errors then. The resizing fix definitely resolved my local errors (code above).

@adam2392 (Collaborator) left a comment

> I'll keep looking into the errors then. The resizing fix definitely resolved my local errors (code above).

I cannot replicate this error locally, but I'm inclined to merge this for now. If you're able to investigate on your end, that would be great. I believe the error points to some kind of fault in the partial_fit implementation, since the traceback consistently shows it erroring out at the self.builder_.build(...) line, which means there is some error in the Cython code when building the tree via partial_fit for this test.

@adam2392 adam2392 merged commit 6801508 into submodulev3 Aug 24, 2023
7 of 16 checks passed
@PSSF23 PSSF23 deleted the class_fix branch August 24, 2023 21:33
adam2392 added a commit that referenced this pull request Sep 8, 2023
#### Reference Issues/PRs

neurodata/treeple#107
#### What does this implement/fix? Explain your changes.


#### Any other comments?



---------

Signed-off-by: Adam Li <[email protected]>
Co-authored-by: Adam Li <[email protected]>