Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH v2] Add partial fit to the correct branch for decisiontreeclassifier #54

Merged
merged 7 commits into from
Aug 14, 2023

Conversation

adam2392
Copy link
Collaborator

Supersedes: #50

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Any other comments?

Signed-off-by: Adam Li <[email protected]>
Signed-off-by: Adam Li <[email protected]>
@github-actions
Copy link

github-actions bot commented Aug 11, 2023

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 368df7a. Link to the linter CI: here

Comment on lines +354 to +355
with nogil:
while not update_stack.empty():
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@PSSF23 does this code affect the normal building process not via partial_fit

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it shouldn't. update_stack only exists when the initial_roots has content.

@adam2392
Copy link
Collaborator Author

Copy link
Member

@PSSF23 PSSF23 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adam2392 Yes, the classes param was not properly passed in by the CI, so it was expecting a different kind of error. We can remove the _check_partial_fit_first_call check, but then there might be other errors associated with partial_fit api.

By the way, the performance is improved with the new branch, about the same as streaming tree on its own now. So previously the inferior results might be due to some bug in submodulev2.

@adam2392
Copy link
Collaborator Author

@adam2392 Yes, the classes param was not properly passed in by the CI, so it was expecting a different kind of error. We can remove the _check_partial_fit_first_call check, but then there might be other errors associated with partial_fit api.

Can you elaborate? Perhaps we can move the check to a better place and then have everything pass?

@PSSF23
Copy link
Member

PSSF23 commented Aug 11, 2023

It appears that they are testing all methods with fitting functions (fit, partial_fit, ...) about ccp_alpha, but did not give partial_fit the classes variable it requires. As long as _check_partial_fit_first_call exists there will be errors, because partial_fit could not run properly without classes. It was added because it was a requirement for the partial_fit api.

The checks for ccp_alpha, which they are testing, exist in _build_pruned_tree_ccp. So it can only be tested correctly after there's a tree.

Copy link
Member

@PSSF23 PSSF23 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For unit tests, sklearn had built-in checks for partial_fit from other streaming algorithms, so I didn't prepare any extra ones. We can add new ones as we see fit, but the existing ones should cover most situations.

Signed-off-by: Adam Li <[email protected]>
@adam2392
Copy link
Collaborator Author

For unit tests, sklearn had built-in checks for partial_fit from other streaming algorithms, so I didn't prepare any extra ones. We can add new ones as we see fit, but the existing ones should cover most situations.

I see. I took a look and all you have to do is validate the input parameters before running the partial fit check: 63637f7

Signed-off-by: Adam Li <[email protected]>
@adam2392
Copy link
Collaborator Author

Do you want to add the support for partial_fit into ForestClassifier?

Copy link
Member

@PSSF23 PSSF23 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adam2392 Thanks for all the improvements. I need to work on simulations for the cancer project this week so feel free to go ahead. We also found some potential bugs on honest code so I might need to revisit it.

@adam2392 adam2392 merged commit 7b8305c into submodulev3 Aug 14, 2023
8 of 16 checks passed
@adam2392 adam2392 deleted the partial_fit branch August 14, 2023 17:35
@adam2392
Copy link
Collaborator Author

Kay merged! @PSSF23 feel free to open up a PR to enable this for the ForestClassifier class.

@PSSF23 PSSF23 restored the partial_fit branch August 24, 2023 03:25
@PSSF23 PSSF23 deleted the partial_fit branch August 24, 2023 03:32
adam2392 added a commit that referenced this pull request Sep 8, 2023
…fier (#54)

Supersedes: #50 

Implements partial_fit API for all classification decision trees.

---------

Signed-off-by: Adam Li <[email protected]>
Co-authored-by: Haoyin Xu <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants