another classifier in classification chapter? #232

leem44 · 2021-08-13T00:30:34Z

Posting Reviewer E's comment from #106 about the classification chapter here:

> Would it be possible to include at least one other classifier besides KNN? I don’t see this commonly used in industry, so even a brief counterpoint of logistic regression or random forest (even without deep explanation) could illuminate that there are many different algorithms with different performance, pros/cons.
>
> I love the discussion of the strengths and limitations of KNN, but it makes less sense without any presented alternatives.

trevorcampbell · 2021-10-04T21:53:55Z

I'll make this a v2 enhancement, with the goal of adding logistic regression. But in reality this is somewhere between "v2 enhancement" and "blue sky enhancement".

trevorcampbell · 2023-07-12T01:21:19Z

Just a bit of brainstorming on this as a follow-up: if we do this, I think it might make the most sense to have a new chapter, Classification III, that covers logistic regression. That way we can mimic the structure of Regression II, which is the equivalent chapter for regression. If we tried to do LogReg in Classification I or II, we wouldn't be able to do that (b/c it relies on concepts that haven't been introduced at that point).

This would also involve editing Reg II to avoid repeating ourselves when we get to Lin Reg.

ttimbers · 2023-07-12T02:25:29Z

I worry a bit about introducing logistic regression before linear regression... and we use knn classification as a gateway to knn regression, so we're kind of tied to classification and then regression...

Maybe all this wouldn't be a problem if we place Classification II after Regression II? So it's kind of like a classification sandwich? Alternatively, we choose some other algorithm for classification II? Decision trees could be good? They're the basis for some of the most popular and best performing ML models right now? Or we could choose SVMs?

trevorcampbell · 2023-07-12T16:17:40Z

Thanks for brainstorming :)

> Decision trees could be good? They're the basis for some of the most popular and best performing ML models right now? Or we could choose SVMs?

I'm definitely on board with adding more interesting classification models like decision trees / forests / SVMs / NNs.

If I had to pick more classification models to add, I'd go with LogReg first (because it's almost as simple as linear regression, very popular, and a nice counterpart to the uninterpretable knn stuff) and then Decision tree/forest because it has a really nice algorithmic / intuitive description of how it classifies things. SVMs and NNs are harder to introduce at the level of this textbook -- esp SVMs... -- but I don't think it's impossible.

> Maybe all this wouldn't be a problem if we place Classification II after Regression II?

Hmmm, I don't think that will work -- that would cause a huge rewrite of at least 3 chapters -- since Reg 1 & 2 rely on knowing about cross val / tuning / etc from Cls 2.

> Alternatively, we choose some other algorithm for classification II?

For me the purpose of Cls 2 is mostly to introduce evaluation / tuning. I wouldn't want to introduce a new classifier at the same time, just to avoid overloading people. That would also involve fairly heavy editing on an already polished chapter.

> and we use knn classification as a gateway to knn regression, so we're kind of tied to classification and then regression...

I don't think it's super important to jump directly from knn classification to knn regression. We already space them out by Cls 2, which is all about tuning/eval. If we had a new "classification 3", at the beginning of "regression 1" we would just keep the same introduction to regression problems, and make very minor modifications to the text to say that we're going to introduce regression with a k-nn-based model, just like we did in classification.

I'm still fairly convinced that the most natural place to introduce new classifiers is in a new "Classification 3" chapter. It also makes it natural to later on consider adding other regression models in "Regression 3", but we could merge those into Reg 2 as well.

trevorcampbell · 2023-07-12T17:43:40Z

Just documenting one point from an in-person chat with Tiffany: we probably want to avoid introducing new classifiers in the actual DSCI100 course itself, to avoid conflict with other existing classes (CPSC330 notably). But adding to the textbook can be independent of that.

One other potential issue with a "Cls 3" chapter: introducing logistic regression before linear regression will be a bit awkward. Maybe best to stick with decision tree/forest?

Probably will punt this edit for now and return to it later.

leem44 added the question Further information is requested label Aug 13, 2021

leem44 mentioned this issue Aug 13, 2021

Review: Ch 6 (classification) #106

Closed

leem44 changed the title ~~Classification I another classifier~~ another classifier in classification chapter? Aug 13, 2021

trevorcampbell added v2 enhancement and removed question Further information is requested labels Oct 4, 2021
