New Balancing PipeOps based on smotefamily #824

Merged
merged 16 commits into from
Sep 24, 2024

@advieser (Collaborator) commented Sep 14, 2024

Implements two new pipeops based on smotefamily:

  • PipeOpADAS: ADASYN algorithm for balancing data. New instances are generated according to how difficult the existing minority instances are to learn.
  • PipeOpBLSmote: Borderline-SMOTE algorithm for balancing data. New instances are created only for data points near the decision boundary.

Both PipeOps work only on classification tasks whose features are numeric and contain no missing values. Target columns with more than two classes are accepted, although only the class with the fewest instances is upsampled.
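To make these input constraints concrete, here is a minimal base R sketch of the kind of check they imply. The helper `check_balancing_input()` is hypothetical and not part of mlr3pipelines or smotefamily; it only illustrates the three conditions (factor target, numeric features, no missing values) and how the class that would be upsampled can be identified.

```r
# Hypothetical helper (an assumption for illustration, not the PipeOps' code):
# validates the constraints described above and returns the name of the
# class with the fewest instances, i.e. the one that would be upsampled.
check_balancing_input <- function(data, target) {
  # drop = FALSE keeps a data.frame even if only one feature remains
  features <- data[, setdiff(names(data), target), drop = FALSE]
  stopifnot(
    is.factor(data[[target]]),                       # classification target
    all(vapply(features, is.numeric, logical(1))),   # numeric features only
    !anyNA(features)                                 # no missing values
  )
  counts <- table(data[[target]])
  names(counts)[which.min(counts)]
}

# Imbalanced three-class example built from iris:
# 50 setosa, 30 versicolor, 10 virginica
imbalanced <- iris[c(1:50, 51:80, 101:110), ]
minority <- check_balancing_input(imbalanced, "Species")
```

With this subset, `minority` is `"virginica"`, the only class that would receive synthetic instances under the behaviour described above.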

Both PipeOps use the implementation from #815 for handling unsupported column types, so #815 has to be checked first.

Currently, PipeOpBLSmote prints [1] "Borderline-SMOTE done" after training. I think we would ideally want to suppress this output somehow?
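One possible way to silence such a message is `utils::capture.output()`, sketched below. The function `noisy_balance()` is a hypothetical stand-in that prints the same line; this is only an illustration of the approach, not necessarily how this PR ends up handling it.

```r
# Hypothetical stand-in for a function that reports completion via print().
noisy_balance <- function(x) {
  print("Borderline-SMOTE done")
  x * 2
}

# capture.output() collects everything written to stdout while its argument
# is evaluated. Assigning inside the call keeps the actual return value,
# and the assignment itself is invisible, so nothing extra is captured.
printed <- utils::capture.output(result <- noisy_balance(1:3))
```

After this, `result` holds the return value and `printed` holds the swallowed line, which can be discarded or logged.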

Both smotefamily::BLSMOTE and smotefamily::ADAS currently contain bugs, which should be fixed on the smotefamily side. One of them is that tasks with a single feature cannot be handled: subsetting the data.frame reduces the data to a vector, nrow() is then called on that vector and returns NULL, and the result is used in an if-statement, which fails.
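The single-feature failure described above can be reproduced in base R without smotefamily. The snippet below is not smotefamily's code, only a sketch of the mechanism; the data and variable names are made up for illustration.

```r
# A data.frame with a single feature column and a target column.
df <- data.frame(x = c(0.1, 0.4, 0.9), class = c("a", "a", "b"))

# Selecting a single column uses drop = TRUE by default, so the result
# collapses from a data.frame to a plain numeric vector.
collapsed <- df[df$class == "a", "x"]
is_null_nrow <- is.null(nrow(collapsed))  # nrow() of a vector is NULL

# NULL > 0 has length zero, so using it as an if () condition is an error.
cond <- tryCatch(
  if (nrow(collapsed) > 0) "non-empty" else "empty",
  error = function(e) "error"
)

# drop = FALSE keeps the data.frame structure, so nrow() works as expected.
kept <- df[df$class == "a", "x", drop = FALSE]
n_kept <- nrow(kept)
```

Passing `drop = FALSE` wherever columns are subset is the usual guard against this class of bug.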

Partially addresses #790

@advieser changed the title from "New Balancing PipeOps based on ´smotefamily´" to "New Balancing PipeOps based on smotefamily" Sep 14, 2024
@advieser marked this pull request as ready for review September 24, 2024 08:24
@advieser merged commit 945bc5d into master Sep 24, 2024
1 of 4 checks passed
@advieser deleted the smotefamily_pos branch September 24, 2024 10:12