another classifier in classification chapter? #232

Open
leem44 opened this issue Aug 13, 2021 · 5 comments

Comments

@leem44
Contributor

leem44 commented Aug 13, 2021

Posting #106 Reviewer E's comment here about classification chapter:

  • Would it be possible to include at least one other classifier besides KNN? I don’t see this commonly used in industry, so even a brief counterpoint of logistic regression or random forest (even without deep explanation) could illuminate that there are many different algorithms with different performance, pros/cons
  • I love the discussion of the strengths and limitations of KNN, but it makes less sense without any presented alternatives
@leem44 leem44 added the "question" label Aug 13, 2021
@leem44 leem44 changed the title from "Classification I another classifier" to "another classifier in classification chapter?" Aug 13, 2021
@trevorcampbell trevorcampbell added the "v2 enhancement" label and removed the "question" label Oct 4, 2021
@trevorcampbell
Contributor

I'll make this a v2 enhancement, with the goal of adding logistic regression. But in reality this is somewhere between "v2 enhancement" and "blue sky enhancement".

@trevorcampbell
Contributor

trevorcampbell commented Jul 12, 2023

Just a bit of brainstorming on this as a followup: if we do this, I think it might make the most sense to have a new chapter, Classification III, that covers logistic regression. That way we can mimic the structure of Regression II, which is the equivalent chapter for regression. If we tried to do LogReg in Classification I or II, we wouldn't be able to do that (b/c there will be concepts that weren't introduced yet).

This would also involve editing Reg II to avoid repeating ourselves when we get to Lin Reg.

@ttimbers
Contributor

I worry a bit about introducing logistic regression before linear regression... and we use knn classification as a gateway to knn regression, so we're kind of tied to classification and then regression...

Maybe all this wouldn't be a problem if we place Classification II after Regression II? So it's kind of like a classification sandwich? Alternatively, we choose some other algorithm for classification II? Decision trees could be good? They're the basis for some of the most popular and best performing ML models right now? Or we could choose SVMs?

@trevorcampbell
Contributor

trevorcampbell commented Jul 12, 2023

Thanks for brainstorming :)

> Decision trees could be good? They're the basis for some of the most popular and best performing ML models right now? Or we could choose SVMs?

I'm definitely on board with adding more interesting classification models like decision trees / forests / SVMs / NNs.

If I had to pick more classification models to add, I'd go with LogReg first (because it's almost as simple as linear regression, very popular, and a nice counterpart to the uninterpretable knn stuff) and then Decision tree/forest because it has a really nice algorithmic / intuitive description of how it classifies things. SVMs and NNs are harder to introduce at the level of this textbook -- esp SVMs... -- but I don't think it's impossible.

> Maybe all this wouldn't be a problem if we place Classification II after Regression II?

Hmmm, I don't think that will work -- that would cause a huge rewrite of at least 3 chapters -- since Reg 1 & 2 rely on knowing about cross val / tuning / etc from Cls 2.

> Alternatively, we choose some other algorithm for classification II?

For me the purpose of Cls 2 is mostly to introduce evaluation / tuning. I wouldn't want to introduce a new classifier at the same time, just to avoid overloading people. That would also involve fairly heavy editing on an already polished chapter.

> and we use knn classification as a gateway to knn regression, so we're kind of tied to classification and then regression...

I don't think it's super important to jump directly from knn classification to knn regression. We already space them out by Cls 2, which is all about tuning/eval. If we had a new "classification 3", at the beginning of "regression 1" we would just keep the same introduction to regression problems, and make very minor modifications to the text to say that we're going to introduce regression with a k-nn-based model, just like we did in classification.

I'm still fairly convinced that the most natural place to introduce new classifiers is in a new "Classification 3" chapter. It also makes it natural to later on consider adding other regression models in "Regression 3", but we could merge those into Reg 2 as well.

@trevorcampbell
Contributor

Just documenting one point from an in-person chat with Tiffany: we probably want to avoid introducing new classifiers in the actual DSCI100 course itself, to avoid conflict with other existing classes (CPSC330 notably). But adding to the textbook can be independent of that.

One other potential issue with "Cls 3" chapter: introducing logistic regression before linear regression will be a bit awkward. Maybe best to stick with decision tree/forest?

Probably will punt this edit for now and return to it later.

3 participants