Scarliles/honesty #69

Draft
wants to merge 72 commits into
base: submodulev3


SamuelCarliles3

Reference Issues/PRs

What does this implement/fix? Explain your changes.

First draft of the honesty module.

Any other comments?

SamuelCarliles3 and others added 30 commits February 16, 2024 13:36

github-actions bot commented Jul 2, 2024

❌ Linting issues

This PR is introducing linting issues. Here's a summary of the issues. Note that you can avoid having linting issues by enabling pre-commit hooks. Instructions to enable them can be found here.

You can see the details of the linting issues under the lint job here


black

black detected issues. Please run black . locally and push the changes. Here you can see the detected issues. Note that running black might also fix some of the issues which might be detected by ruff. Note that the installed black version is black=24.3.0.


--- /home/runner/work/scikit-learn/scikit-learn/sklearn/ensemble/_forest.py	2024-09-18 16:56:23.193792+00:00
+++ /home/runner/work/scikit-learn/scikit-learn/sklearn/ensemble/_forest.py	2024-09-18 16:56:37.324989+00:00
@@ -2078,11 +2078,11 @@
         "class_weight": [
             StrOptions({"balanced_subsample", "balanced"}),
             dict,
             list,
             None,
-        ]
+        ],
     }
     _parameter_constraints.pop("splitter")
 
     def __init__(
         self,
@@ -2105,11 +2105,11 @@
         class_weight=None,
         ccp_alpha=0.0,
         max_samples=None,
         max_bins=None,
         store_leaf_values=False,
-        monotonic_cst=None
+        monotonic_cst=None,
     ):
         super().__init__(
             estimator=DecisionTreeClassifier(),
             n_estimators=n_estimators,
             estimator_params=(
@@ -2479,11 +2479,11 @@
         "class_weight": [
             StrOptions({"balanced_subsample", "balanced"}),
             dict,
             list,
             None,
-        ]
+        ],
     }
     _parameter_constraints.pop("splitter")
     _parameter_constraints.pop("max_samples")
     _parameter_constraints["max_samples"] = [
         None,
@@ -2516,11 +2516,11 @@
         max_bins=None,
         store_leaf_values=False,
         monotonic_cst=None,
         stratify=False,
         honest_prior="ignore",
-        honest_fraction=0.5
+        honest_fraction=0.5,
     ):
         self.target_tree_kwargs = {
             "criterion": criterion,
             "max_depth": max_depth,
             "min_samples_split": min_samples_split,
@@ -2530,27 +2530,27 @@
             "max_leaf_nodes": max_leaf_nodes,
             "min_impurity_decrease": min_impurity_decrease,
             "random_state": random_state,
             "ccp_alpha": ccp_alpha,
             "store_leaf_values": store_leaf_values,
-            "monotonic_cst": monotonic_cst
+            "monotonic_cst": monotonic_cst,
         }
         super().__init__(
             estimator=HonestDecisionTree(
                 target_tree_class=target_tree_class,
                 target_tree_kwargs=self.target_tree_kwargs,
                 stratify=stratify,
                 honest_prior=honest_prior,
-                honest_fraction=honest_fraction
+                honest_fraction=honest_fraction,
             ),
             n_estimators=n_estimators,
             estimator_params=(
                 "target_tree_class",
                 "target_tree_kwargs",
                 "stratify",
                 "honest_prior",
-                "honest_fraction"
+                "honest_fraction",
             ),
             # estimator_params=(
             #     "criterion",
             #     "max_depth",
             #     "min_samples_split",
@@ -2589,11 +2589,10 @@
         self.target_tree_class = target_tree_class
         self.stratify = stratify
         self.honest_prior = honest_prior
         self.honest_fraction = honest_fraction
 
-
     @property
     def structure_indices_(self):
         """The indices used to learn the structure of the trees."""
         check_is_fitted(self)
         return [tree.structure_indices_ for tree in self.estimators_]
@@ -2609,22 +2608,25 @@
         """The sample indices that are out-of-bag.
 
         Only utilized if ``bootstrap=True``, otherwise, all samples are "in-bag".
         """
         if self.bootstrap is False and (
-            self._n_samples_bootstrap is None or self._n_samples_bootstrap == self._n_samples
+            self._n_samples_bootstrap is None
+            or self._n_samples_bootstrap == self._n_samples
         ):
             raise RuntimeError(
                 "Cannot extract out-of-bag samples when bootstrap is False and "
                 "n_samples == n_samples_bootstrap"
             )
         check_is_fitted(self)
 
         oob_samples = []
 
         possible_indices = np.arange(self._n_samples)
-        for structure_idx, honest_idx in zip(self.structure_indices_, self.honest_indices_):
+        for structure_idx, honest_idx in zip(
+            self.structure_indices_, self.honest_indices_
+        ):
             _oob_samples = np.setdiff1d(
                 possible_indices, np.concatenate((structure_idx, honest_idx))
             )
             oob_samples.append(_oob_samples)
         return oob_samples
would reformat /home/runner/work/scikit-learn/scikit-learn/sklearn/ensemble/_forest.py
--- /home/runner/work/scikit-learn/scikit-learn/sklearn/ensemble/tests/test_forest.py	2024-09-18 16:56:23.197793+00:00
+++ /home/runner/work/scikit-learn/scikit-learn/sklearn/ensemble/tests/test_forest.py	2024-09-18 16:56:38.771430+00:00
@@ -269,10 +269,11 @@
     )
     clf.fit(iris.data, iris.target)
     score = clf.score(iris.data, iris.target)
     assert score > 0.5, "Failed with criterion %s and score = %f" % (criterion, score)
 
+
 @pytest.mark.parametrize("criterion", ("gini", "log_loss"))
 def test_honest_forest_iris_criterion(criterion):
     # Check consistency on dataset iris.
     print("yo")
     clf = HonestRandomForestClassifier(
@@ -287,10 +288,11 @@
     )
     clf.fit(iris.data, iris.target)
     score = clf.score(iris.data, iris.target)
     assert score > 0.5, "Failed with criterion %s and score = %f" % (criterion, score)
     print("sup")
+
 
 @pytest.mark.parametrize("name", FOREST_REGRESSORS)
 @pytest.mark.parametrize(
     "criterion", ("squared_error", "absolute_error", "friedman_mse")
 )
would reformat /home/runner/work/scikit-learn/scikit-learn/sklearn/ensemble/tests/test_forest.py
--- /home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_honest_tree.py	2024-09-18 16:56:23.241792+00:00
+++ /home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_honest_tree.py	2024-09-18 16:56:47.753026+00:00
@@ -9,11 +9,14 @@
 from ..utils._param_validation import Interval, RealNotInt, StrOptions
 from ..utils.multiclass import check_classification_targets
 
 from ._classes import (
     BaseDecisionTree,
-    CRITERIA_CLF, CRITERIA_REG, DENSE_SPLITTERS, SPARSE_SPLITTERS
+    CRITERIA_CLF,
+    CRITERIA_REG,
+    DENSE_SPLITTERS,
+    SPARSE_SPLITTERS,
 )
 from ._honesty import HonestTree, Honesty
 from ._tree import DOUBLE, Tree
 
 import inspect
@@ -38,36 +41,40 @@
         target_tree_class=None,
         target_tree_kwargs=None,
         random_state=None,
         honest_fraction=0.5,
         honest_prior="empirical",
-        stratify=False
+        stratify=False,
     ):
         self.criterion = criterion
         self.target_tree_class = target_tree_class
-        self.target_tree_kwargs = target_tree_kwargs if target_tree_kwargs is not None else {}
+        self.target_tree_kwargs = (
+            target_tree_kwargs if target_tree_kwargs is not None else {}
+        )
 
         self.random_state = random_state
         self.honest_fraction = honest_fraction
         self.honest_prior = honest_prior
         self.stratify = stratify
 
         # TODO: unwind this whole gross antipattern
         if target_tree_class is not None:
-            HonestDecisionTree._target_tree_hack(self, target_tree_class, **target_tree_kwargs)
-    
+            HonestDecisionTree._target_tree_hack(
+                self, target_tree_class, **target_tree_kwargs
+            )
+
     @staticmethod
     def _target_tree_hack(honest_tree, target_tree_class, **kwargs):
         honest_tree.target_tree_class = target_tree_class
         honest_tree.target_tree = target_tree_class(**kwargs)
 
         # copy over the attributes of the target tree
         for attr_name in vars(honest_tree.target_tree):
             setattr(
                 honest_tree,
                 attr_name,
-                getattr(honest_tree.target_tree, attr_name, None)
+                getattr(honest_tree.target_tree, attr_name, None),
             )
 
         if is_classifier(honest_tree.target_tree):
             honest_tree._estimator_type = honest_tree.target_tree._estimator_type
             honest_tree.predict_proba = honest_tree.target_tree.predict_proba
@@ -78,11 +85,11 @@
         X,
         y,
         sample_weight=None,
         check_input=True,
         missing_values_in_feature_mask=None,
-        classes=None
+        classes=None,
     ):
         return self.fit(
             X, y, sample_weight, check_input, missing_values_in_feature_mask, classes
         )
 
@@ -126,34 +133,34 @@
         self : HonestTree
             Fitted tree estimator.
         """
 
         # run this again because of the way ensemble creates estimators
-        HonestDecisionTree._target_tree_hack(self, self.target_tree_class, **self.target_tree_kwargs)
+        HonestDecisionTree._target_tree_hack(
+            self, self.target_tree_class, **self.target_tree_kwargs
+        )
         target_bta = self.target_tree._prep_data(
             X=X,
             y=y,
             sample_weight=sample_weight,
             check_input=check_input,
             missing_values_in_feature_mask=missing_values_in_feature_mask,
-            classes=classes
+            classes=classes,
         )
 
         # TODO: go fix TODO in classes.py line 636
         if target_bta.n_classes is None:
             target_bta.n_classes = np.array(
-                [1] * self.target_tree.n_outputs_,
-                dtype=np.intp
+                [1] * self.target_tree.n_outputs_, dtype=np.intp
             )
 
         # Determine output settings
         self._init_output_shape(target_bta.X, target_bta.y, target_bta.classes)
 
         # obtain the structure sample weights
-        sample_weights_structure, sample_weights_honest = self._partition_honest_indices(
-            target_bta.y,
-            target_bta.sample_weight
+        sample_weights_structure, sample_weights_honest = (
+            self._partition_honest_indices(target_bta.y, target_bta.sample_weight)
         )
 
         # # compute the honest sample indices
         # structure_mask = np.ones(len(target_bta.y), dtype=bool)
         # structure_mask[self.honest_indices_] = False
@@ -172,11 +179,11 @@
         # create honesty, set up listeners in target tree
         self.honesty = Honesty(
             target_bta.X,
             self.honest_indices_,
             target_bta.min_samples_leaf,
-            missing_values_in_feature_mask = target_bta.missing_values_in_feature_mask
+            missing_values_in_feature_mask=target_bta.missing_values_in_feature_mask,
         )
 
         self.target_tree.presplit_conditions = self.honesty.presplit_conditions
         self.target_tree.postsplit_conditions = self.honesty.postsplit_conditions
         self.target_tree.splitter_listeners = self.honesty.splitter_event_handlers
@@ -188,25 +195,21 @@
             self.target_tree.fit(
                 target_bta.X,
                 target_bta.y,
                 sample_weight=sample_weights_structure,
                 check_input=check_input,
-                classes=target_bta.classes
+                classes=target_bta.classes,
             )
         except Exception:
             self.target_tree.fit(
                 target_bta.X,
                 target_bta.y,
                 sample_weight=sample_weights_structure,
-                check_input=check_input
-            )
-
-        setattr(
-            self,
-            "classes_",
-            getattr(self.target_tree, "classes_", None)
-        )
+                check_input=check_input,
+            )
+
+        setattr(self, "classes_", getattr(self.target_tree, "classes_", None))
 
         n_samples = target_bta.X.shape[0]
         samples = np.empty(n_samples, dtype=np.intp)
         weighted_n_samples = 0.0
         j = 0
@@ -219,60 +222,55 @@
 
             weighted_n_samples += sample_weights_honest[i]
 
         # fingers crossed sklearn.utils.validation.check_is_fitted doesn't
         # change its behavior
-        #print(f"n_classes = {target_bta.n_classes}")
+        # print(f"n_classes = {target_bta.n_classes}")
         self.tree_ = HonestTree(
             self.target_tree.n_features_in_,
             target_bta.n_classes,
             self.target_tree.n_outputs_,
-            self.target_tree.tree_
+            self.target_tree.tree_,
         )
         self.honesty.resize_tree(self.tree_, self.honesty.get_node_count())
         self.tree_.node_count = self.honesty.get_node_count()
 
-        #print(f"dishonest node count = {self.target_tree.tree_.node_count}")
-        #print(f"honest node count = {self.tree_.node_count}")
+        # print(f"dishonest node count = {self.target_tree.tree_.node_count}")
+        # print(f"honest node count = {self.tree_.node_count}")
 
         criterion = BaseDecisionTree._create_criterion(
             self.target_tree,
             n_outputs=target_bta.y.shape[1],
             n_samples=target_bta.X.shape[0],
-            n_classes=target_bta.n_classes
+            n_classes=target_bta.n_classes,
         )
         self.honesty.init_criterion(
             criterion,
             target_bta.y,
             sample_weights_honest,
             weighted_n_samples,
-            self.honest_indices_
+            self.honest_indices_,
         )
 
         for i in range(self.honesty.get_node_count()):
             start, end = self.honesty.get_node_range(i)
-            #print(f"setting sample range for node {i}: ({start}, {end})")
-            #print(f"node {i} is leaf: {self.honesty.is_leaf(i)}")
+            # print(f"setting sample range for node {i}: ({start}, {end})")
+            # print(f"node {i} is leaf: {self.honesty.is_leaf(i)}")
             self.honesty.set_sample_pointers(criterion, start, end)
 
             if missing_values_in_feature_mask is not None:
                 self.honesty.init_sum_missing(criterion)
-            
+
             self.honesty.node_value(self.tree_, criterion, i)
 
             if self.honesty.is_leaf(i):
                 self.honesty.node_samples(self.tree_, criterion, i)
 
-        setattr(
-            self,
-            "__sklearn_is_fitted__",
-            lambda: True
-        )
- 
+        setattr(self, "__sklearn_is_fitted__", lambda: True)
+
         return self
 
-    
     def _init_output_shape(self, X, y, classes=None):
         # Determine output settings
         self.n_samples_, self.n_features_in_ = X.shape
 
         # Do preprocessing if 'y' is passed
@@ -338,11 +336,10 @@
                 raise ValueError(
                     "Number of labels=%d does not match number of samples=%d"
                     % (len(y), self.n_samples_)
                 )
 
-
     def _partition_honest_indices(self, y, sample_weight):
         rng = np.random.default_rng(self.target_tree.random_state)
 
         # Account for bootstrapping too
         if sample_weight is None:
@@ -354,11 +351,13 @@
 
         nonzero_indices = np.where(structure_weight > 0)[0]
         # sample the structure indices
         if self.stratify:
             ss = StratifiedShuffleSplit(
-                n_splits=1, test_size=self.honest_fraction, random_state=self.random_state
+                n_splits=1,
+                test_size=self.honest_fraction,
+                random_state=self.random_state,
             )
             for structure_idx, _ in ss.split(
                 np.zeros((len(nonzero_indices), 1)), y[nonzero_indices]
             ):
                 self.structure_indices_ = nonzero_indices[structure_idx]
would reformat /home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_honest_tree.py
--- /home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_classes.py	2024-09-18 16:56:23.241792+00:00
+++ /home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_classes.py	2024-09-18 16:56:48.270322+00:00
@@ -88,23 +88,23 @@
 # =============================================================================
 
 
 class BuildTreeArgs:
     def __init__(
-            self,
-            X,
-            y,
-            sample_weight,
-            missing_values_in_feature_mask,
-            min_samples_leaf,
-            min_weight_leaf,
-            max_leaf_nodes,
-            min_samples_split,
-            max_depth,
-            random_state,
-            classes,
-            n_classes
+        self,
+        X,
+        y,
+        sample_weight,
+        missing_values_in_feature_mask,
+        min_samples_leaf,
+        min_weight_leaf,
+        max_leaf_nodes,
+        min_samples_split,
+        max_depth,
+        random_state,
+        classes,
+        n_classes,
     ):
         self.X = X
         self.y = y
         self.sample_weight = sample_weight
         self.missing_values_in_feature_mask = missing_values_in_feature_mask
@@ -449,13 +449,12 @@
             max_leaf_nodes=max_leaf_nodes,
             min_samples_split=min_samples_split,
             max_depth=max_depth,
             random_state=random_state,
             classes=classes,
-            n_classes=getattr(self, 'n_classes_', None)
+            n_classes=getattr(self, "n_classes_", None),
         )
-
 
     def _fit(
         self,
         X,
         y,
@@ -468,18 +467,18 @@
             X=X,
             y=y,
             sample_weight=sample_weight,
             check_input=check_input,
             missing_values_in_feature_mask=missing_values_in_feature_mask,
-            classes=classes
+            classes=classes,
         )
 
         criterion = BaseDecisionTree._create_criterion(
             self,
             n_outputs=bta.y.shape[1],
             n_samples=bta.X.shape[0],
-            n_classes=bta.n_classes
+            n_classes=bta.n_classes,
         )
 
         # build the actual tree now with the parameters
         return self._build_tree(
             criterion=criterion,
@@ -497,30 +496,24 @@
 
     @staticmethod
     # n_classes is an array of length n_outputs
     # containing the number of classes in each output dimension
     def _create_criterion(
-        tree: "BaseDecisionTree",
-        n_outputs,
-        n_samples,
-        n_classes=None
+        tree: "BaseDecisionTree", n_outputs, n_samples, n_classes=None
     ) -> BaseCriterion:
         criterion = tree.criterion
         if not isinstance(tree.criterion, BaseCriterion):
             if is_classifier(tree):
-                criterion = CRITERIA_CLF[tree.criterion](
-                    n_outputs, n_classes
-                )
+                criterion = CRITERIA_CLF[tree.criterion](n_outputs, n_classes)
             else:
                 criterion = CRITERIA_REG[tree.criterion](n_outputs, n_samples)
         else:
             # Make a deepcopy in case the criterion has mutable attributes that
             # might be shared and modified concurrently during parallel fitting
             criterion = copy.deepcopy(tree.criterion)
-        
+
         return criterion
-
 
     def _build_tree(
         self,
         criterion,
         X,
@@ -623,11 +616,11 @@
                 min_weight_leaf,
                 random_state,
                 monotonic_cst,
                 presplit_conditions=self.presplit_conditions,
                 postsplit_conditions=self.postsplit_conditions,
-                listeners=self.splitter_listeners
+                listeners=self.splitter_listeners,
             )
 
         if is_classifier(self):
             self.tree_ = Tree(self.n_features_in_, self.n_classes_, self.n_outputs_)
         else:
@@ -646,11 +639,11 @@
                 min_samples_leaf,
                 min_weight_leaf,
                 max_depth,
                 self.min_impurity_decrease,
                 self.store_leaf_values,
-                listeners = self.tree_build_listeners
+                listeners=self.tree_build_listeners,
             )
         else:
             builder = BestFirstTreeBuilder(
                 splitter,
                 min_samples_split,
@@ -658,11 +651,11 @@
                 min_weight_leaf,
                 max_depth,
                 max_leaf_nodes,
                 self.min_impurity_decrease,
                 self.store_leaf_values,
-                listeners = self.tree_build_listeners
+                listeners=self.tree_build_listeners,
             )
         builder.build(self.tree_, X, y, sample_weight, missing_values_in_feature_mask)
 
         if self.n_outputs_ == 1 and is_classifier(self):
             self.n_classes_ = self.n_classes_[0]
would reformat /home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_classes.py
--- /home/runner/work/scikit-learn/scikit-learn/sklearn/tree/tests/test_tree.py	2024-09-18 16:56:23.241792+00:00
+++ /home/runner/work/scikit-learn/scikit-learn/sklearn/tree/tests/test_tree.py	2024-09-18 16:56:50.578143+00:00
@@ -320,43 +320,44 @@
         score = accuracy_score(clf.predict(iris.data), iris.target)
         assert score > 0.5, "Failed with {0}, criterion = {1} and score = {2}".format(
             name, criterion, score
         )
 
+
 def test_honest_iris():
     import json
 
     for criterion in CLF_CRITERIONS:
         hf = HonestDecisionTree(
             target_tree_class=DecisionTreeClassifier,
             target_tree_kwargs={
-                'criterion': criterion,
-                'random_state': 0,
-                'store_leaf_values': True
-            }
+                "criterion": criterion,
+                "random_state": 0,
+                "store_leaf_values": True,
+            },
         )
         hf.fit(iris.data, iris.target)
 
         # verify their apply results are identical
         dishonest = hf.target_tree.apply(iris.data)
         honest = hf.apply(iris.data)
-        assert np.sum((honest - dishonest)**2) == 0, (
-            "Failed with apply delta. dishonest: {0}, honest: {1}".format(
-                dishonest, honest
-            )
+        assert (
+            np.sum((honest - dishonest) ** 2) == 0
+        ), "Failed with apply delta. dishonest: {0}, honest: {1}".format(
+            dishonest, honest
         )
 
         # verify their predict results are identical
         # technically they may correctly differ,
         # but at least in this test case they tend not to,
         # so it's a reasonable smoke test
         dishonest = hf.target_tree.predict(iris.data)
         honest = hf.predict(iris.data)
-        assert np.sum((honest - dishonest)**2) == 0, (
-            "Failed with predict delta. dishonest: {0}, honest: {1}".format(
-                dishonest, honest
-            )
+        assert (
+            np.sum((honest - dishonest) ** 2) == 0
+        ), "Failed with predict delta. dishonest: {0}, honest: {1}".format(
+            dishonest, honest
         )
 
         # verify that at least some leaf sample sets
         # are in fact different for corresponding leaves.
         # again, possible to fail by chance,
@@ -377,63 +378,73 @@
                     print(f"dishonest: {dishonest.T}")
                     print(f"   honest: {honest.T}")
                     print(f"dishonest_hist: {dishonest_hist}")
                     print(f"   honest_hist: {honest_hist}")
 
-        assert len(leaf_eq) != leaf_ct, (
-            "Failed with all leaves equal: {0}".format(leaf_eq)
+        assert len(leaf_eq) != leaf_ct, "Failed with all leaves equal: {0}".format(
+            leaf_eq
         )
 
         # check accuracy
         score = accuracy_score(hf.target_tree.predict(iris.data), iris.target)
         print(f"dishonest score: {score}")
-        assert score > 0.9, "Failed with {0}, criterion = {1} and dishonest score = {2}".format(
-           "DecisionTreeClassifier", criterion, score
+        assert (
+            score > 0.9
+        ), "Failed with {0}, criterion = {1} and dishonest score = {2}".format(
+            "DecisionTreeClassifier", criterion, score
         )
         score = accuracy_score(hf.predict(iris.data), iris.target)
         print(f"honest score: {score}")
-        assert score > 0.9, "Failed with {0}, criterion = {1} and honest score = {2}".format(
-           "DecisionTreeClassifier", criterion, score
+        assert (
+            score > 0.9
+        ), "Failed with {0}, criterion = {1} and honest score = {2}".format(
+            "DecisionTreeClassifier", criterion, score
         )
 
         # check predict_proba
         dishonest_proba = hf.target_tree.predict_log_proba(iris.data)
         honest_proba = hf.predict_log_proba(iris.data)
-        assert len(dishonest_proba) == len(honest_proba), ((
+        assert len(dishonest_proba) == len(honest_proba), (
             "Mismatched predict_log_proba: len(dishonest_proba) = {0}, "
             "len(honest_proba) = {1}"
-        ).format(len(dishonest_proba), len(honest_proba)))
+        ).format(len(dishonest_proba), len(honest_proba))
 
         for i in range(len(dishonest_proba)):
-            assert np.all(dishonest_proba[i] == honest_proba[i]), ((
+            assert np.all(dishonest_proba[i] == honest_proba[i]), (
                 "Failed with predict_log_proba delta row {0}. "
                 "dishonest: {1}, honest: {2}"
-            ).format(i, dishonest_proba[i], honest_proba[i]))
+            ).format(i, dishonest_proba[i], honest_proba[i])
 
         # verify no invalid nodes in honest tree
         ht = HonestyTester(hf)
         invalid_nodes = ht.get_invalid_nodes()
-        invalid_nodes_dict = [node.to_dict() if hasattr(node, 'to_dict') else node for node in invalid_nodes]
+        invalid_nodes_dict = [
+            node.to_dict() if hasattr(node, "to_dict") else node
+            for node in invalid_nodes
+        ]
         invalid_nodes_json = json.dumps(invalid_nodes_dict, indent=4)
-        assert len(invalid_nodes) == 0, "Failed with invalid nodes: {0}".format(invalid_nodes_json)
-
-        #clf = Tree(criterion=criterion, max_features=2, random_state=0)
-        #hf = HonestDecisionTree(clf)
-        #hf.fit(iris.data, iris.target)
-        #score = accuracy_score(clf.predict(iris.data), iris.target)
-        #assert score > 0.5, "Failed with {0}, criterion = {1} and dishonest score = {2}".format(
+        assert len(invalid_nodes) == 0, "Failed with invalid nodes: {0}".format(
+            invalid_nodes_json
+        )
+
+        # clf = Tree(criterion=criterion, max_features=2, random_state=0)
+        # hf = HonestDecisionTree(clf)
+        # hf.fit(iris.data, iris.target)
+        # score = accuracy_score(clf.predict(iris.data), iris.target)
+        # assert score > 0.5, "Failed with {0}, criterion = {1} and dishonest score = {2}".format(
         #    name, criterion, score
-        #)
-        #score = accuracy_score(hf.predict(iris.data), iris.target)
-        #assert score > 0.5, "Failed with {0}, criterion = {1} and honest score = {2}".format(
+        # )
+        # score = accuracy_score(hf.predict(iris.data), iris.target)
+        # assert score > 0.5, "Failed with {0}, criterion = {1} and honest score = {2}".format(
         #    name, criterion, score
-        #)
-        #ht = HonestyTester(hf)
-        #invalid_nodes = ht.get_invalid_nodes()
-        #invalid_nodes_dict = [node.to_dict() if hasattr(node, 'to_dict') else node for node in invalid_nodes]
-        #invalid_nodes_json = json.dumps(invalid_nodes_dict, indent=4)
-        #assert len(invalid_nodes) == 0, "Failed with invalid nodes: {0}".format(invalid_nodes_json)
+        # )
+        # ht = HonestyTester(hf)
+        # invalid_nodes = ht.get_invalid_nodes()
+        # invalid_nodes_dict = [node.to_dict() if hasattr(node, 'to_dict') else node for node in invalid_nodes]
+        # invalid_nodes_json = json.dumps(invalid_nodes_dict, indent=4)
+        # assert len(invalid_nodes) == 0, "Failed with invalid nodes: {0}".format(invalid_nodes_json)
+
 
 @pytest.mark.parametrize("name, Tree", REG_TREES.items())
 @pytest.mark.parametrize("criterion", REG_CRITERIONS)
 def test_diabetes_overfit(name, Tree, criterion):
     # check consistency of overfitted trees on the diabetes dataset
would reformat /home/runner/work/scikit-learn/scikit-learn/sklearn/tree/tests/test_tree.py

Oh no! 💥 💔 💥
5 files would be reformatted, 923 files would be left unchanged.
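
For readers following the `_forest.py` hunk above, the out-of-bag logic it reformats can be sketched with the standard library alone. This is only an illustration, not the PR's implementation: `partition_honest` and `out_of_bag` are hypothetical names, the real `_partition_honest_indices` works on sample weights and can stratify, and the real property uses `np.setdiff1d` rather than Python sets.

```python
import random


def partition_honest(sample_indices, honest_fraction=0.5, seed=0):
    """Split sampled indices into disjoint structure and honest subsets.

    Hypothetical helper: it only shuffles the unique indices and slices
    them, which is an assumption made for illustration.
    """
    rng = random.Random(seed)
    unique = sorted(set(sample_indices))  # bootstrap draws may repeat
    rng.shuffle(unique)
    n_honest = int(round(honest_fraction * len(unique)))
    honest = sorted(unique[:n_honest])
    structure = sorted(unique[n_honest:])
    return structure, honest


def out_of_bag(n_samples, structure, honest):
    """Return indices used by neither subset (the set-difference step)."""
    used = set(structure) | set(honest)
    return [i for i in range(n_samples) if i not in used]


n_samples = 10
draw_rng = random.Random(42)
# Simulate a bootstrap draw with replacement over all sample indices.
bootstrap = [draw_rng.randrange(n_samples) for _ in range(n_samples)]
structure, honest = partition_honest(bootstrap, honest_fraction=0.5, seed=1)
oob = out_of_bag(n_samples, structure, honest)
```

In this sketch every index lands in exactly one of `structure`, `honest`, or `oob`, which is the invariant the `oob_samples_` loop in the diff relies on.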

ruff

ruff detected issues. Please run ruff check --fix --output-format=full . locally, fix the remaining issues, and push the changes. Here you can see the detected issues. Note that the installed ruff version is ruff=0.2.1.


sklearn/ensemble/_forest.py:2614:89: E501 Line too long (93 > 88)
     |
2612 |         """
2613 |         if self.bootstrap is False and (
2614 |             self._n_samples_bootstrap is None or self._n_samples_bootstrap == self._n_samples
     |                                                                                         ^^^^^ E501
2615 |         ):
2616 |             raise RuntimeError(
     |

sklearn/ensemble/_forest.py:2625:89: E501 Line too long (92 > 88)
     |
2624 |         possible_indices = np.arange(self._n_samples)
2625 |         for structure_idx, honest_idx in zip(self.structure_indices_, self.honest_indices_):
     |                                                                                         ^^^^ E501
2626 |             _oob_samples = np.setdiff1d(
2627 |                 possible_indices, np.concatenate((structure_idx, honest_idx))
     |
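
The two E501 reports above amount to a line-length comparison against the 88-character limit. A rough sketch of that check follows, assuming a plain `len()` comparison; `find_long_lines` is a hypothetical helper, and ruff's actual rule measures display width and has exemptions that this ignores.

```python
def find_long_lines(source, max_length=88):
    """Return (line_number, length) for each line longer than max_length."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if len(line) > max_length:
            hits.append((lineno, len(line)))
    return hits


# The line flagged at _forest.py:2614, rebuilt in pieces so that this
# example does not itself exceed the limit.
flagged = (
    " " * 12
    + "self._n_samples_bootstrap is None "
    + "or self._n_samples_bootstrap == self._n_samples"
)
sample = "\n".join(
    [
        "        if self.bootstrap is False and (",
        flagged,
        "        ):",
    ]
)
long_lines = find_long_lines(sample)
```

Wrapping the condition across two lines, as the black diff earlier in this comment does, brings each line back under the limit.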

sklearn/ensemble/tests/test_forest.py:8:1: I001 [*] Import block is un-sorted or un-formatted
   |
 6 |   # SPDX-License-Identifier: BSD-3-Clause
 7 |   
 8 | / import itertools
 9 | | import math
10 | | import pickle
11 | | from collections import defaultdict
12 | | from functools import partial
13 | | from itertools import combinations, product
14 | | from typing import Any, Dict
15 | | from unittest.mock import patch
16 | | 
17 | | import joblib
18 | | import numpy as np
19 | | import pytest
20 | | from scipy.special import comb
21 | | 
22 | | import sklearn
23 | | from sklearn import clone, datasets
24 | | from sklearn.datasets import make_classification, make_hastie_10_2
25 | | from sklearn.decomposition import TruncatedSVD
26 | | from sklearn.dummy import DummyRegressor
27 | | from sklearn.ensemble import (
28 | |     ExtraTreesClassifier,
29 | |     ExtraTreesRegressor,
30 | |     RandomForestClassifier,
31 | |     RandomForestRegressor,
32 | |     RandomTreesEmbedding,
33 | | )
34 | | from sklearn.ensemble._forest import (
35 | |     _generate_unsampled_indices,
36 | |     _get_n_samples_bootstrap,
37 | |     HonestRandomForestClassifier,
38 | | )
39 | | from sklearn.exceptions import NotFittedError
40 | | from sklearn.metrics import (
41 | |     explained_variance_score,
42 | |     f1_score,
43 | |     mean_poisson_deviance,
44 | |     mean_squared_error,
45 | | )
46 | | from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
47 | | from sklearn.svm import LinearSVC
48 | | from sklearn.tree._classes import SPARSE_SPLITTERS
49 | | from sklearn.utils._testing import (
50 | |     _convert_container,
51 | |     assert_allclose,
52 | |     assert_almost_equal,
53 | |     assert_array_almost_equal,
54 | |     assert_array_equal,
55 | |     ignore_warnings,
56 | |     skip_if_no_parallel,
57 | | )
58 | | from sklearn.utils.fixes import COO_CONTAINERS, CSC_CONTAINERS, CSR_CONTAINERS
59 | | from sklearn.utils.multiclass import type_of_target
60 | | from sklearn.utils.parallel import Parallel
61 | | from sklearn.utils.validation import check_random_state
62 | | 
63 | | # toy sample
   | |_^ I001
64 |   X = [[-2, -1], [-1, -1], [-1, -2], [1, 1], [1, 2], [2, 1]]
65 |   y = [-1, -1, -1, 1, 1, 1]
   |
   = help: Organize imports

sklearn/tree/__init__.py:3:1: I001 [*] Import block is un-sorted or un-formatted
   |
 1 |   """Decision tree based models for classification and regression."""
 2 |   
 3 | / from ._classes import (
 4 | |     BaseDecisionTree,
 5 | |     DecisionTreeClassifier,
 6 | |     DecisionTreeRegressor,
 7 | |     ExtraTreeClassifier,
 8 | |     ExtraTreeRegressor,
 9 | | )
10 | | from ._honest_tree import HonestDecisionTree
11 | | from ._export import export_graphviz, export_text, plot_tree
12 | | 
13 | | __all__ = [
   | |_^ I001
14 |       "BaseDecisionTree",
15 |       "HonestDecisionTree",
   |
   = help: Organize imports

sklearn/tree/_classes.py:519:1: W293 [*] Blank line contains whitespace
    |
517 |             # might be shared and modified concurrently during parallel fitting
518 |             criterion = copy.deepcopy(tree.criterion)
519 |         
    | ^^^^^^^^ W293
520 |         return criterion
    |
    = help: Remove whitespace from blank line

sklearn/tree/_classes.py:560:9: F841 Local variable `n_samples` is assigned to but never used
    |
558 |             Random seed.
559 |         """
560 |         n_samples = X.shape[0]
    |         ^^^^^^^^^ F841
561 | 
562 |         # Build tree
    |
    = help: Remove assignment to unused variable `n_samples`

sklearn/tree/_honest_tree.py:3:1: I001 [*] Import block is un-sorted or un-formatted
   |
 1 |   # Adopted from: https://github.com/neurodata/honest-forests
 2 |   
 3 | / import numpy as np
 4 | | from numpy import float32 as DTYPE
 5 | | 
 6 | | from ..base import _fit_context, is_classifier
 7 | | from ..model_selection import StratifiedShuffleSplit
 8 | | from ..utils import compute_sample_weight
 9 | | from ..utils._param_validation import Interval, RealNotInt, StrOptions
10 | | from ..utils.multiclass import check_classification_targets
11 | | 
12 | | from ._classes import (
13 | |     BaseDecisionTree,
14 | |     CRITERIA_CLF, CRITERIA_REG, DENSE_SPLITTERS, SPARSE_SPLITTERS
15 | | )
16 | | from ._honesty import HonestTree, Honesty
17 | | from ._tree import DOUBLE, Tree
18 | | 
19 | | import inspect
20 | | 
21 | | 
22 | | # note to self: max_n_classes is the maximum number of classes observed
   | |_^ I001
23 |   # in any response variable dimension
24 |   class HonestDecisionTree(BaseDecisionTree):
   |
   = help: Organize imports

sklearn/tree/_honest_tree.py:4:30: F401 [*] `numpy.float32` imported but unused
  |
3 | import numpy as np
4 | from numpy import float32 as DTYPE
  |                              ^^^^^ F401
5 | 
6 | from ..base import _fit_context, is_classifier
  |
  = help: Remove unused import: `numpy.float32`

sklearn/tree/_honest_tree.py:14:5: F401 [*] `._classes.CRITERIA_CLF` imported but unused
   |
12 | from ._classes import (
13 |     BaseDecisionTree,
14 |     CRITERIA_CLF, CRITERIA_REG, DENSE_SPLITTERS, SPARSE_SPLITTERS
   |     ^^^^^^^^^^^^ F401
15 | )
16 | from ._honesty import HonestTree, Honesty
   |
   = help: Remove unused import

sklearn/tree/_honest_tree.py:14:19: F401 [*] `._classes.CRITERIA_REG` imported but unused
   |
12 | from ._classes import (
13 |     BaseDecisionTree,
14 |     CRITERIA_CLF, CRITERIA_REG, DENSE_SPLITTERS, SPARSE_SPLITTERS
   |                   ^^^^^^^^^^^^ F401
15 | )
16 | from ._honesty import HonestTree, Honesty
   |
   = help: Remove unused import

sklearn/tree/_honest_tree.py:14:33: F401 [*] `._classes.DENSE_SPLITTERS` imported but unused
   |
12 | from ._classes import (
13 |     BaseDecisionTree,
14 |     CRITERIA_CLF, CRITERIA_REG, DENSE_SPLITTERS, SPARSE_SPLITTERS
   |                                 ^^^^^^^^^^^^^^^ F401
15 | )
16 | from ._honesty import HonestTree, Honesty
   |
   = help: Remove unused import

sklearn/tree/_honest_tree.py:14:50: F401 [*] `._classes.SPARSE_SPLITTERS` imported but unused
   |
12 | from ._classes import (
13 |     BaseDecisionTree,
14 |     CRITERIA_CLF, CRITERIA_REG, DENSE_SPLITTERS, SPARSE_SPLITTERS
   |                                                  ^^^^^^^^^^^^^^^^ F401
15 | )
16 | from ._honesty import HonestTree, Honesty
   |
   = help: Remove unused import

sklearn/tree/_honest_tree.py:17:28: F401 [*] `._tree.Tree` imported but unused
   |
15 | )
16 | from ._honesty import HonestTree, Honesty
17 | from ._tree import DOUBLE, Tree
   |                            ^^^^ F401
18 | 
19 | import inspect
   |
   = help: Remove unused import: `._tree.Tree`

sklearn/tree/_honest_tree.py:19:8: F401 [*] `inspect` imported but unused
   |
17 | from ._tree import DOUBLE, Tree
18 | 
19 | import inspect
   |        ^^^^^^^ F401
   |
   = help: Remove unused import: `inspect`

sklearn/tree/_honest_tree.py:47:89: E501 Line too long (94 > 88)
   |
45 |         self.criterion = criterion
46 |         self.target_tree_class = target_tree_class
47 |         self.target_tree_kwargs = target_tree_kwargs if target_tree_kwargs is not None else {}
   |                                                                                         ^^^^^^ E501
48 | 
49 |         self.random_state = random_state
   |

sklearn/tree/_honest_tree.py:56:89: E501 Line too long (95 > 88)
   |
54 |         # TODO: unwind this whole gross antipattern
55 |         if target_tree_class is not None:
56 |             HonestDecisionTree._target_tree_hack(self, target_tree_class, **target_tree_kwargs)
   |                                                                                         ^^^^^^^ E501
57 |     
58 |     @staticmethod
   |

sklearn/tree/_honest_tree.py:57:1: W293 [*] Blank line contains whitespace
   |
55 |         if target_tree_class is not None:
56 |             HonestDecisionTree._target_tree_hack(self, target_tree_class, **target_tree_kwargs)
57 |     
   | ^^^^ W293
58 |     @staticmethod
59 |     def _target_tree_hack(honest_tree, target_tree_class, **kwargs):
   |
   = help: Remove whitespace from blank line

sklearn/tree/_honest_tree.py:131:89: E501 Line too long (101 > 88)
    |
130 |         # run this again because of the way ensemble creates estimators
131 |         HonestDecisionTree._target_tree_hack(self, self.target_tree_class, **self.target_tree_kwargs)
    |                                                                                         ^^^^^^^^^^^^^ E501
132 |         target_bta = self.target_tree._prep_data(
133 |             X=X,
    |

sklearn/tree/_honest_tree.py:152:89: E501 Line too long (89 > 88)
    |
151 |         # obtain the structure sample weights
152 |         sample_weights_structure, sample_weights_honest = self._partition_honest_indices(
    |                                                                                         ^ E501
153 |             target_bta.y,
154 |             target_bta.sample_weight
    |

sklearn/tree/_honest_tree.py:259:1: W293 [*] Blank line contains whitespace
    |
257 |             if missing_values_in_feature_mask is not None:
258 |                 self.honesty.init_sum_missing(criterion)
259 |             
    | ^^^^^^^^^^^^ W293
260 |             self.honesty.node_value(self.tree_, criterion, i)
    |
    = help: Remove whitespace from blank line

sklearn/tree/_honest_tree.py:270:1: W293 [*] Blank line contains whitespace
    |
268 |             lambda: True
269 |         )
270 |  
    | ^ W293
271 |         return self
    |
    = help: Remove whitespace from blank line

sklearn/tree/_honest_tree.py:273:1: W293 [*] Blank line contains whitespace
    |
271 |         return self
272 | 
273 |     
    | ^^^^ W293
274 |     def _init_output_shape(self, X, y, classes=None):
275 |         # Determine output settings
    |
    = help: Remove whitespace from blank line

sklearn/tree/_honest_tree.py:328:21: F841 Local variable `expanded_class_weight` is assigned to but never used
    |
327 |                 if self.class_weight is not None:
328 |                     expanded_class_weight = compute_sample_weight(
    |                     ^^^^^^^^^^^^^^^^^^^^^ F841
329 |                         self.class_weight, y_original
330 |                     )
    |
    = help: Remove assignment to unused variable `expanded_class_weight`

sklearn/tree/_honest_tree.py:359:89: E501 Line too long (90 > 88)
    |
357 |         if self.stratify:
358 |             ss = StratifiedShuffleSplit(
359 |                 n_splits=1, test_size=self.honest_fraction, random_state=self.random_state
    |                                                                                         ^^ E501
360 |             )
361 |             for structure_idx, _ in ss.split(
    |

sklearn/tree/tests/test_tree.py:5:1: I001 [*] Import block is un-sorted or un-formatted
   |
 3 |   """
 4 |   
 5 | / import copy
 6 | | import copyreg
 7 | | import io
 8 | | import pickle
 9 | | import struct
10 | | from itertools import chain, product
11 | | 
12 | | import joblib
13 | | import numpy as np
14 | | import pytest
15 | | from joblib.numpy_pickle import NumpyPickler
16 | | from numpy.testing import assert_allclose
17 | | 
18 | | from sklearn import clone, datasets, tree
19 | | from sklearn.dummy import DummyRegressor
20 | | from sklearn.exceptions import NotFittedError
21 | | from sklearn.impute import SimpleImputer
22 | | from sklearn.metrics import accuracy_score, mean_poisson_deviance, mean_squared_error
23 | | from sklearn.model_selection import train_test_split
24 | | from sklearn.pipeline import make_pipeline
25 | | from sklearn.random_projection import _sparse_random_matrix
26 | | from sklearn.tree import (
27 | |     DecisionTreeClassifier,
28 | |     DecisionTreeRegressor,
29 | |     ExtraTreeClassifier,
30 | |     ExtraTreeRegressor,
31 | | )
32 | | from sklearn.tree._classes import (
33 | |     CRITERIA_CLF,
34 | |     CRITERIA_REG,
35 | |     DENSE_SPLITTERS,
36 | |     SPARSE_SPLITTERS,
37 | | )
38 | | from sklearn.tree._honesty import Honesty
39 | | from sklearn.tree._honest_tree import HonestDecisionTree
40 | | from sklearn.tree._test import HonestyTester
41 | | from sklearn.tree._tree import (
42 | |     NODE_DTYPE,
43 | |     TREE_LEAF,
44 | |     TREE_UNDEFINED,
45 | |     _check_n_classes,
46 | |     _check_node_ndarray,
47 | |     _check_value_ndarray,
48 | | )
49 | | from sklearn.tree._tree import Tree as CythonTree
50 | | from sklearn.utils import compute_sample_weight
51 | | from sklearn.utils._testing import (
52 | |     assert_almost_equal,
53 | |     assert_array_almost_equal,
54 | |     assert_array_equal,
55 | |     create_memmap_backed_data,
56 | |     ignore_warnings,
57 | |     skip_if_32bit,
58 | | )
59 | | from sklearn.utils.estimator_checks import check_sample_weights_invariance
60 | | from sklearn.utils.fixes import (
61 | |     _IS_32BIT,
62 | |     COO_CONTAINERS,
63 | |     CSC_CONTAINERS,
64 | |     CSR_CONTAINERS,
65 | | )
66 | | from sklearn.utils.validation import check_random_state
67 | | 
68 | | CLF_CRITERIONS = ("gini", "log_loss")
   | |_^ I001
69 |   REG_CRITERIONS = ("squared_error", "absolute_error", "friedman_mse", "poisson")
   |
   = help: Organize imports

sklearn/tree/tests/test_tree.py:389:89: E501 Line too long (96 > 88)
    |
387 |         score = accuracy_score(hf.target_tree.predict(iris.data), iris.target)
388 |         print(f"dishonest score: {score}")
389 |         assert score > 0.9, "Failed with {0}, criterion = {1} and dishonest score = {2}".format(
    |                                                                                         ^^^^^^^^ E501
390 |            "DecisionTreeClassifier", criterion, score
391 |         )
    |

sklearn/tree/tests/test_tree.py:394:89: E501 Line too long (93 > 88)
    |
392 |         score = accuracy_score(hf.predict(iris.data), iris.target)
393 |         print(f"honest score: {score}")
394 |         assert score > 0.9, "Failed with {0}, criterion = {1} and honest score = {2}".format(
    |                                                                                         ^^^^^ E501
395 |            "DecisionTreeClassifier", criterion, score
396 |         )
    |
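
The two assertion messages flagged at lines 389 and 394 could probably be shortened by replacing `.format(...)` with f-strings split across a parenthesised group. This is a minimal sketch under that assumption; `check_score` and its `label` and `threshold` parameters are hypothetical and do not appear in the PR.

```python
def check_score(score, criterion, label, threshold=0.9):
    """Hypothetical helper; `label` would be "dishonest" or "honest"."""
    # Adjacent f-strings are concatenated at compile time, so the message
    # can be split without exceeding the 88-character limit.
    assert score > threshold, (
        f"Failed with DecisionTreeClassifier, criterion = {criterion} "
        f"and {label} score = {score}"
    )
```

Note that the parentheses enclose a single concatenated string, not a tuple, so the assertion condition is still evaluated as intended.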

sklearn/tree/tests/test_tree.py:415:89: E501 Line too long (109 > 88)
    |
413 |         ht = HonestyTester(hf)
414 |         invalid_nodes = ht.get_invalid_nodes()
415 |         invalid_nodes_dict = [node.to_dict() if hasattr(node, 'to_dict') else node for node in invalid_nodes]
    |                                                                                         ^^^^^^^^^^^^^^^^^^^^^ E501
416 |         invalid_nodes_json = json.dumps(invalid_nodes_dict, indent=4)
417 |         assert len(invalid_nodes) == 0, "Failed with invalid nodes: {0}".format(invalid_nodes_json)
    |

sklearn/tree/tests/test_tree.py:417:89: E501 Line too long (99 > 88)
    |
415 |         invalid_nodes_dict = [node.to_dict() if hasattr(node, 'to_dict') else node for node in invalid_nodes]
416 |         invalid_nodes_json = json.dumps(invalid_nodes_dict, indent=4)
417 |         assert len(invalid_nodes) == 0, "Failed with invalid nodes: {0}".format(invalid_nodes_json)
    |                                                                                         ^^^^^^^^^^^ E501
418 | 
419 |         #clf = Tree(criterion=criterion, max_features=2, random_state=0)
    |
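
For the list comprehension and assertion flagged at lines 415 and 417, one possible layout is to put each clause of the comprehension on its own line and build the failure message separately. The function name `describe_invalid_nodes` is an assumption made for this sketch; the PR keeps this logic inline in the test.

```python
import json


def describe_invalid_nodes(invalid_nodes):
    """Hypothetical helper: serialise invalid nodes for an assertion message."""
    # One clause per line keeps the comprehension under 88 characters.
    invalid_nodes_dict = [
        node.to_dict() if hasattr(node, "to_dict") else node
        for node in invalid_nodes
    ]
    return json.dumps(invalid_nodes_dict, indent=4)


def assert_no_invalid_nodes(invalid_nodes):
    """Fail with a readable JSON dump if any invalid nodes are present."""
    message = f"Failed with invalid nodes: {describe_invalid_nodes(invalid_nodes)}"
    assert len(invalid_nodes) == 0, message
```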

sklearn/tree/tests/test_tree.py:423:89: E501 Line too long (97 > 88)
    |
421 |         #hf.fit(iris.data, iris.target)
422 |         #score = accuracy_score(clf.predict(iris.data), iris.target)
423 |         #assert score > 0.5, "Failed with {0}, criterion = {1} and dishonest score = {2}".format(
    |                                                                                         ^^^^^^^^^ E501
424 |         #    name, criterion, score
425 |         #)
    |

sklearn/tree/tests/test_tree.py:427:89: E501 Line too long (94 > 88)
    |
425 |         #)
426 |         #score = accuracy_score(hf.predict(iris.data), iris.target)
427 |         #assert score > 0.5, "Failed with {0}, criterion = {1} and honest score = {2}".format(
    |                                                                                         ^^^^^^ E501
428 |         #    name, criterion, score
429 |         #)
    |

sklearn/tree/tests/test_tree.py:432:89: E501 Line too long (110 > 88)
    |
430 |         #ht = HonestyTester(hf)
431 |         #invalid_nodes = ht.get_invalid_nodes()
432 |         #invalid_nodes_dict = [node.to_dict() if hasattr(node, 'to_dict') else node for node in invalid_nodes]
    |                                                                                         ^^^^^^^^^^^^^^^^^^^^^^ E501
433 |         #invalid_nodes_json = json.dumps(invalid_nodes_dict, indent=4)
434 |         #assert len(invalid_nodes) == 0, "Failed with invalid nodes: {0}".format(invalid_nodes_json)
    |

sklearn/tree/tests/test_tree.py:434:89: E501 Line too long (100 > 88)
    |
432 |         #invalid_nodes_dict = [node.to_dict() if hasattr(node, 'to_dict') else node for node in invalid_nodes]
433 |         #invalid_nodes_json = json.dumps(invalid_nodes_dict, indent=4)
434 |         #assert len(invalid_nodes) == 0, "Failed with invalid nodes: {0}".format(invalid_nodes_json)
    |                                                                                         ^^^^^^^^^^^^ E501
435 | 
436 | @pytest.mark.parametrize("name, Tree", REG_TREES.items())
    |

sklearn/tree/tests/test_tree.py:471:89: E501 Line too long (98 > 88)
    |
470 | # @skip_if_32bit
471 | # @pytest.mark.parametrize("name, Tree", {"DecisionTreeRegressor": DecisionTreeRegressor}.items())
    |                                                                                         ^^^^^^^^^^ E501
472 | # @pytest.mark.parametrize(
473 | #     "criterion, max_depth, metric, max_loss",
    |

sklearn/tree/tests/test_tree.py:485:89: E501 Line too long (90 > 88)
    |
483 | #     # limited
484 | 
485 | #     reg = Tree(criterion=criterion, max_depth=max_depth, max_features=6, random_state=0)
    |                                                                                         ^^ E501
486 | #     hon = HonestDecisionTree(reg)
487 | #     hon.fit(diabetes.data, diabetes.target)
    |

Found 35 errors.
[*] 16 fixable with the `--fix` option (2 hidden fixes can be enabled with the `--unsafe-fixes` option).
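
To see which rules account for most of the findings before fixing them, the report can be tallied by rule code. The sketch below is an illustration only: it assumes the header format shown in this report (`path:line:col: CODE message`), and the `tally_codes` helper and the `sample` text are made up for the example, with the sample lines copied from the report above.

```python
import re
from collections import Counter

# Matches header lines of the form "<path>:<line>:<col>: <CODE> <message>".
HEADER = re.compile(
    r"^(?P<path>[^:\s]+):(?P<line>\d+):(?P<col>\d+): (?P<code>[A-Z]+\d+) "
)


def tally_codes(report):
    """Count rule codes in a block of lint report text."""
    counts = Counter()
    for line in report.splitlines():
        match = HEADER.match(line)
        if match:
            counts[match.group("code")] += 1
    return counts


sample = """\
sklearn/tree/_classes.py:519:1: W293 [*] Blank line contains whitespace
sklearn/tree/_honest_tree.py:19:8: F401 [*] `inspect` imported but unused
sklearn/tree/_honest_tree.py:47:89: E501 Line too long (94 > 88)
sklearn/tree/_honest_tree.py:56:89: E501 Line too long (95 > 88)
"""

# Print the codes from most to least frequent.
print(tally_codes(sample).most_common())
```

Source excerpts and caret lines in the report do not match the pattern, so only the diagnostic headers are counted.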

cython-lint

cython-lint detected issues. Please fix them locally and push the changes. Here you can see the detected issues. Note that the installed cython-lint version is cython-lint=0.16.2.


/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_honesty.pxd:56:44: E261 at least two spaces before inline comment
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_sort.pxd:13:5: E128 continuation line under-indented for visual indent
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_events.pxd:31:55: E261 at least two spaces before inline comment
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_partitioner.pyx:9:40: 'swap' imported but unused
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_partitioner.pyx:16:1: W293 blank line contains whitespace
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_partitioner.pyx:36:1: W293 blank line contains whitespace
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_partitioner.pyx:539:37: E127 continuation line over-indented for visual indent
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_partitioner.pyx:540:37: E127 continuation line over-indented for visual indent
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_partitioner.pyx:541:37: E127 continuation line over-indented for visual indent
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_partitioner.pyx:542:37: E127 continuation line over-indented for visual indent
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_partitioner.pyx:543:37: E127 continuation line over-indented for visual indent
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_partitioner.pyx:544:37: E127 continuation line over-indented for visual indent
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_partitioner.pyx:550:41: E127 continuation line over-indented for visual indent
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_partitioner.pyx:551:41: E127 continuation line over-indented for visual indent
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_partitioner.pyx:552:41: E127 continuation line over-indented for visual indent
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_partitioner.pyx:553:41: E127 continuation line over-indented for visual indent
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_partitioner.pyx:554:41: E127 continuation line over-indented for visual indent
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_tree.pxd:269:5: E303 too many blank lines (2)
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_test.pyx:6:33: 'HonestEnv' imported but unused
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_test.pyx:6:44: 'Views' imported but unused
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_test.pyx:7:21: 'BaseTree' imported but unused
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_test.pyx:18:1: W293 blank line contains whitespace
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_test.pyx:33:1: W293 blank line contains whitespace
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_test.pyx:45:1: W293 blank line contains whitespace
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_test.pyx:47:1: W293 blank line contains whitespace
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_test.pyx:67:5: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_test.pyx:67:5: E303 too many blank lines (2)
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_test.pyx:79:5: E301 expected 1 blank line, found 0
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_test.pyx:86:5: E303 too many blank lines (2)
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_events.pyx:53:9: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_events.pyx:60:1: W293 blank line contains whitespace
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_partitioner.pxd:13:1: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_partitioner.pxd:71:90: W291 trailing whitespace
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_tree.pyx:173:37: E252 missing whitespace around parameter equals
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_tree.pyx:194:5: E303 too many blank lines (2)
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_tree.pyx:270:5: E303 too many blank lines (2)
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_tree.pyx:275:9: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_tree.pyx:298:13: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_tree.pyx:314:25: E128 continuation line under-indented for visual indent
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_tree.pyx:315:25: E128 continuation line under-indented for visual indent
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_tree.pyx:316:25: E128 continuation line under-indented for visual indent
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_tree.pyx:318:13: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_tree.pyx:337:13: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_tree.pyx:352:17: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_tree.pyx:359:29: E128 continuation line under-indented for visual indent
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_tree.pyx:360:29: E128 continuation line under-indented for visual indent
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_tree.pyx:366:17: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_tree.pyx:390:13: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_tree.pyx:394:17: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_tree.pyx:399:13: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_tree.pyx:405:13: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_tree.pyx:410:13: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_tree.pyx:423:13: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_tree.pyx:490:1: W293 blank line contains whitespace
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_tree.pyx:597:1: W293 blank line contains whitespace
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_tree.pyx:654:37: E252 missing whitespace around parameter equals
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_splitter.pyx:67:24: E261 at least two spaces before inline comment
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_splitter.pyx:92:24: E261 at least two spaces before inline comment
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_splitter.pyx:115:1: W293 blank line contains whitespace
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_splitter.pyx:135:53: E703 statement ends with a semicolon
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_splitter.pyx:221:1: W293 blank line contains whitespace
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_splitter.pyx:284:9: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_splitter.pyx:287:9: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_splitter.pyx:288:9: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_splitter.pyx:291:9: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_splitter.pyx:293:9: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_splitter.pyx:294:9: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_splitter.pyx:295:9: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_splitter.pyx:296:9: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_splitter.pyx:305:13: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_splitter.pyx:306:13: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_splitter.pyx:307:13: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_splitter.pyx:309:9: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_splitter.pyx:312:13: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_splitter.pyx:314:1: W293 blank line contains whitespace
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_splitter.pyx:317:13: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_splitter.pyx:319:1: W293 blank line contains whitespace
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_splitter.pyx:331:1: W293 blank line contains whitespace
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_splitter.pyx:334:1: W293 blank line contains whitespace
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_splitter.pyx:352:1: W293 blank line contains whitespace
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_splitter.pyx:353:5: E303 too many blank lines (2)
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_splitter.pyx:598:5: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_splitter.pyx:601:1: W293 blank line contains whitespace
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_splitter.pyx:602:26: 'monotonic_cst' defined but unused (try prefixing with underscore?)
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_splitter.pyx:603:15: 'with_monotonic_cst' defined but unused (try prefixing with underscore?)
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_splitter.pyx:652:29: 'split_event_data' defined but unused (try prefixing with underscore?)
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_splitter.pyx:654:5: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_splitter.pyx:661:5: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_splitter.pyx:744:13: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_splitter.pyx:754:17: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_splitter.pyx:760:17: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_splitter.pyx:766:21: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_splitter.pyx:770:17: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_splitter.pyx:781:17: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_splitter.pyx:794:17: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_splitter.pyx:804:17: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_splitter.pyx:809:17: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_splitter.pyx:821:1: W293 blank line contains whitespace
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_splitter.pyx:822:17: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_splitter.pyx:827:1: W293 blank line contains whitespace
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_splitter.pyx:828:17: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_splitter.pyx:862:9: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_splitter.pyx:864:1: W293 blank line contains whitespace
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_splitter.pyx:868:13: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_splitter.pyx:891:9: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_splitter.pyx:896:9: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_splitter.pyx:906:9: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_splitter.pyx:923:9: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_splitter.pyx:928:9: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_splitter.pyx:936:5: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_splitter.pyx:944:5: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_splitter.pyx:951:5: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_honesty.pyx:1:21: 'cast' imported but unused
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_honesty.pyx:2:26: 'uintptr_t' imported but unused
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_honesty.pyx:5:26: 'BaseCriterion' imported but unused
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_honesty.pyx:77:5: E303 too many blank lines (2)
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_honesty.pyx:84:1: W293 blank line contains whitespace
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_honesty.pyx:97:1: W293 blank line contains whitespace
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_honesty.pyx:100:1: W293 blank line contains whitespace
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_honesty.pyx:103:1: W293 blank line contains whitespace
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_honesty.pyx:112:1: W293 blank line contains whitespace
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_honesty.pyx:118:1: W293 blank line contains whitespace
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_honesty.pyx:121:1: W293 blank line contains whitespace
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_honesty.pyx:137:1: W293 blank line contains whitespace
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_honesty.pyx:140:1: W293 blank line contains whitespace
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_honesty.pyx:151:1: W293 blank line contains whitespace
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_honesty.pyx:161:5: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_honesty.pyx:164:1: W293 blank line contains whitespace
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_honesty.pyx:167:1: W293 blank line contains whitespace
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_honesty.pyx:181:5: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_honesty.pyx:198:5: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_honesty.pyx:208:5: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_honesty.pyx:227:5: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_honesty.pyx:230:1: W293 blank line contains whitespace
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_honesty.pyx:233:1: W293 blank line contains whitespace
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_honesty.pyx:242:5: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_honesty.pyx:250:5: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_honesty.pyx:253:1: W293 blank line contains whitespace
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_honesty.pyx:269:5: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_honesty.pyx:275:5: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_honesty.pyx:276:9: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_honesty.pyx:282:28: 'n_missing' defined but unused (try prefixing with underscore?)
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_honesty.pyx:287:5: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_honesty.pyx:288:9: E116 unexpected indentation (comment)
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_honesty.pyx:291:9: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_honesty.pyx:299:9: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_honesty.pyx:305:5: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_honesty.pyx:314:5: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_honesty.pyx:331:5: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_honesty.pyx:340:5: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_honesty.pyx:348:5: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_honesty.pyx:356:5: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_honesty.pyx:364:1: W293 blank line contains whitespace
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_honesty.pyx:367:5: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_honesty.pyx:407:5: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_honesty.pyx:409:1: W293 blank line contains whitespace
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_honesty.pyx:443:1: W293 blank line contains whitespace
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_honesty.pyx:451:5: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_honesty.pyx:470:9: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_honesty.pyx:474:5: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_honesty.pyx:476:1: W293 blank line contains whitespace

Generated for commit: 71cacf3. Link to the linter CI: here

sklearn/tree/_events.pxd (outdated, resolved)
@SamuelCarliles3 SamuelCarliles3 marked this pull request as draft July 3, 2024 14:35
@adam2392 adam2392 requested a review from sampan501 July 9, 2024 18:55
Collaborator

@adam2392 adam2392 left a comment

Feel free to edit this comment to add information:

Testing

The current unit test tells me that the code runs, but to an outsider it is unclear whether it works correctly.

  1. Add a unit test comparing the depth of an honest tree and a dishonest tree on the same dataset, e.g. def test_honest_tree_depth_vs_dishonest_tree.
  2. Add a short Jupyter notebook comparing the visualization of an honest and a dishonest tree (https://scikit-learn.org/stable/modules/generated/sklearn.tree.plot_tree.html) on a fixed, simulated toy dataset.
  3. Etc. Please document what else you intend to test, with a brief sketch of each test.
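
As a starting point for item 1, here is a hedged sketch of a depth-comparison harness. It uses only public scikit-learn APIs and does not use the honesty module from this PR: the "honest" tree is approximated by a plain DecisionTreeClassifier grown on a structure half of the data, which is an assumption made for illustration. The helper name fit_depths, the dataset parameters, and the 50/50 split are all hypothetical. The expected relationship between the two depths is not stated above, so the test only contains sanity checks that should be replaced with the real expectation.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def fit_depths(seed=0, n_samples=400):
    """Return (honest_stand_in_depth, dishonest_depth) on one toy dataset.

    Hypothetical helper: swap the stand-in for the PR's honest tree once
    its Python-level API is settled.
    """
    X, y = make_classification(
        n_samples=n_samples,
        n_features=10,
        n_informative=4,
        flip_y=0.1,  # label noise, so fully grown trees are not trivial
        random_state=seed,
    )

    # Dishonest baseline: every sample both chooses splits and sets leaf values.
    dishonest = DecisionTreeClassifier(random_state=seed).fit(X, y)

    # Stand-in for an honest tree (assumption): only the structure half is
    # allowed to choose splits, so depth depends on that half alone.
    X_struct, _, y_struct, _ = train_test_split(
        X, y, test_size=0.5, random_state=seed, stratify=y
    )
    honest_stand_in = DecisionTreeClassifier(random_state=seed).fit(
        X_struct, y_struct
    )

    return honest_stand_in.get_depth(), dishonest.get_depth()


def test_honest_tree_depth_vs_dishonest_tree():
    honest_depth, dishonest_depth = fit_depths(seed=0)
    # Sanity checks only: both trees must have split at least once.
    assert honest_depth >= 1
    assert dishonest_depth >= 1
    # TODO: assert the depth relationship the honest implementation is
    # expected to satisfy, once that expectation is documented.
```

Keeping the data generation and fitting in one helper makes it easy to reuse the same fixed dataset in the notebook from item 2.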

Questions/Comments

  1. Logistics: I think it is possible to keep all sort functions within _partitioner, which would make most of this diff go away. See https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/tree/_partitioner.pxd. The code is easier to reason about when the diff is smaller. Similarly, wherever the diff is unrelated to the functionality of this PR, it would be good to remove it.
  2. It's unclear to me exactly how the _events.pxd/pyx files, the _honesty.pxd/pyx files, and the events abstractions created in the splitter/tree files differ. They each define relevant events, but it is unclear which are necessary and which are not. Can you add a file docstring at the top of each pxd file to illustrate the intentions?
  3. I think overall this is an interesting design exploration for the reasons we've discussed over the past 6 months. However, I don't see us merging this as is because the changes are going to affect the maintainability of the scikit-learn fork, which is a hard dependency for treeple. With more testing, and a separate naive implementation of honesty, I think we can scope out how to get this functionality into treeple.

I think a separate PR implementing honesty naively as a separate splitter (similar to EconML's approach; it does not have to be beautiful) would be good to compare side-by-side. Do you think you can implement that once this PR branch has been tested and documented?
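
To make "naive" concrete, here is a minimal sketch of one possible baseline, written at the Python level rather than as a splitter. It is not EconML's implementation and not the design in this PR. It assumes that honesty means growing the tree structure on one subsample and estimating leaf values on a disjoint subsample. The function names, the honest_fraction parameter, and the fallback for leaves that receive no estimation samples are all hypothetical choices.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def fit_naive_honest_tree(X, y, honest_fraction=0.5, random_state=0):
    """Grow splits on a structure set, estimate leaf values on a disjoint set."""
    X_struct, X_est, y_struct, y_est = train_test_split(
        X, y, test_size=honest_fraction, random_state=random_state, stratify=y
    )
    tree = DecisionTreeClassifier(random_state=random_state)
    tree.fit(X_struct, y_struct)

    classes = tree.classes_
    # Fallback for leaves with no estimation samples (an assumption):
    # use the class frequencies of the whole estimation set.
    prior = np.array([(y_est == c).mean() for c in classes])

    est_leaves = tree.apply(X_est)  # leaf index reached by each estimation sample
    leaf_proba = {}
    # Every leaf is reached by at least one structure sample, so this
    # enumerates all leaves of the fitted tree.
    for leaf in np.unique(tree.apply(X_struct)):
        in_leaf = est_leaves == leaf
        if in_leaf.any():
            counts = np.array([(y_est[in_leaf] == c).sum() for c in classes])
            leaf_proba[leaf] = counts / counts.sum()
        else:
            leaf_proba[leaf] = prior
    return tree, leaf_proba


def predict_proba_honest(tree, leaf_proba, X):
    """Route samples through the tree and look up the honest leaf estimates."""
    return np.vstack([leaf_proba[leaf] for leaf in tree.apply(X)])
```

A baseline like this only replaces leaf values after the fact, so it cannot enforce constraints on the estimation samples while splitting. That limitation is one of the things a side-by-side comparison with this PR could expose.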

name, criterion, score
)

clf = Tree(criterion=criterion, max_features=2, random_state=0)
Collaborator

Is there a reason max_features=2?
