Review: Ch 6 (classification) #106

leem44 · 2021-03-28T17:59:10Z

Reviewer E:

Would it be possible to include at least one other classifier besides KNN? I don’t see this commonly used in industry, so even a brief counterpoint of logistic regression or random forest (even without deep explanation) could illuminate that there are many different algorithms with different performance, pros/cons
- ML: made a new issue: another classifier in classification chapter? #232
~~p. 133: “learn more about x here” - phrasing makes sense for the web but consider rephrasing or removing for book~~
~~Factors are a fairly dense and R-specific concept. It might be worth introducing them earlier in data wrangling or explaining them a bit more deeply (or providing readers an external reference)~~
I love the EDA before modeling, but I would appreciate more narrative how the author chose what views to make and what they were looking for. Other scatter plots might not show such a strong relationship, and students should see the iterative process of success and failure in EDA
- ML: I think this is addressed with the new edits Trevor is making
I absolutely love all of the plots in this chapter and the manual solving of the algorithm with ggplot and dplyr!
- ML: no changes needed here
Great explanation of centering and scaling, but I worry a bit about the rule of thumb to “always do this” instead of “always think about what makes sense in the problem domain”. In some cases, you don’t want to scale (e.g. some cases with multiple measures in same units)
Chapter 7 is extremely dense and hits on so many foundational modeling concepts. I think some of this could be helpful to pull up before Chapter 6 and describe a holistic modeling workflow
- ML: I think this is addressed with the new edits Trevor is making
Distinguish which steps are specific to KNN versus a general part of modeling
- ML: I think this is addressed with the new edits Trevor is making
I love the discussion of the strengths and limitations of KNN, but it makes less sense without any presented alternatives
- ML: made a new issue: another classifier in classification chapter? #232

leem44 · 2021-03-28T18:30:39Z

Reviewer B:

I think classification serves as an excellent gateway to the world of modeling, especially since fairly complex models can be represented visually. Ex: Figure 6.2 has the outcome variable represented with colors, and the two predictor variables in the x-y coordinates.
- ML: no changes needed here

trevorcampbell · 2021-03-28T20:33:17Z

Reviewer D

Can the authors clarify whether the categorical response variable used in the classification chapters is intended to have only 2 categories or could it have more? otherwise perhaps name binary classification
For the “labelled data set” used in this chapter, can the authors only focus on the response variable and predictor variables that will be used in the classification?
- ML: I'm not really sure if this makes a big difference either way, also I don't know if it's necessary to hide other columns in the data from students.
When talking about “predicting a new observations” can the authors be more specific with the language being used? For example, can they talk instead about predicting the class (i.e., malignant or bening) for a new patient/tissue based on the perimeter and concavity values for that patient/tissue AND on the information from the patients/tissues included in the classification model/technique?
~~The section on More than two explanatory variables is not sufficiently fleshed out – perhaps this could be made into an exercise or elaborated upon more?~~
It would be nice to see a workflow for this chapter presented upfront, so that the students/readers know what to aim for:
Pre-Process Data, Separate Pre-Processed Data into Overall Training and Test Sets, Apply Classification Model/Technique to Overall Training Set, Verify Accuracy of Classification on Test Set, Use ENTIRE Data Set (i.e., Overall Training + Test) to Perform Classifications for New Entities (e.g., subjects/tissues). This workflow is fairly complex so stating it upfront will help students/readers keep track of its moving components. (By the time students/readers arrive at the section titled Putting it together in a workflow, they will be primed for it.)
- ML: I don't know if this will make sense to readers before we explain what each of these mean

trevorcampbell · 2021-03-28T21:14:41Z

Reviewer A

~~p136 first code box: Why not use cancer %>% pull(Class) %>% levels~~
~~p136 second code box: You haven't used n() before, does it need a little explanation?~~
~~p137 What is the value of defining so many colors now when only two are needed for this problem? I think this should be explained to the reader.~~
~~p139 figure: I suggest using a different glyph (for the red point) in addition to size and color.~~
p145: (re: rotating the 3d image) This caption would apply to the online version, but not the print version.
- Added to Review: Global/big picture formatting revisions #96
~~p155: I got this warning:~~

Warning message:
“`step_upsample()` is deprecated as of recipes 0.1.13.
Please use `themis::step_upsample()` instead.
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated.”

~~This might be resolved by an update to recipes soon, but a novice might panic when they see this message!~~

trevorcampbell · 2021-07-16T01:20:18Z

From #146 regarding the first figure in this chapter

Probably for the chapter review (add this to the issue for that?) - it would be good to improve this figure to show how the key measurements we focus on in the chapter (e.g., concavity and perimeter) are extracted from this image.

~~From that same PR, regarding the text around the loading the data and variable descriptions area~~

~~> I think we now introduce and discuss factors in wrangling and then use them more heavily in the viz chapter (is this correct @leem44 ?) and so we can lighten up the discussion of them here now?~~

Also, I see we introduce pull here... is this the best place, or should we introduce this in the wrangling chapter? What do you think @leem44 ?

ML: adding this to Review: Ch 3 (wrangling) #103

trevorcampbell · 2021-07-17T02:09:45Z

From #146 comment by @ttimbers : need more informative axis labels in figures. The variables we have are the mean values across cells in a tissue sample.

However, I worry a bit that changing the axis labels will make the examples more confusing (because the new axis labels should be something like Mean Concavity (for example)).

I will make this same comment in the chapter-specific edits thread for classification 2.

leem44 · 2021-08-07T18:49:15Z

Also Tiffany's comments on classification here : #92

leem44 · 2021-09-23T20:27:45Z

All of these are done or made into new issues

This was referenced Aug 7, 2021

Review: Ch 3 (wrangling) #103

Closed

another classifier in classification chapter? #232

Open

leem44 mentioned this issue Aug 16, 2021

Review classification chapters #234

Merged

leem44 linked a pull request Sep 21, 2021 that will close this issue

Review classification chapters #234

Merged

leem44 closed this as completed Sep 23, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Review: Ch 6 (classification) #106

Review: Ch 6 (classification) #106

leem44 commented Mar 28, 2021 •

edited

Loading

leem44 commented Mar 28, 2021 •

edited

Loading

trevorcampbell commented Mar 28, 2021 •

edited by leem44

Loading

trevorcampbell commented Mar 28, 2021 •

edited by leem44

Loading

trevorcampbell commented Jul 16, 2021 •

edited by leem44

Loading

trevorcampbell commented Jul 17, 2021

leem44 commented Aug 7, 2021

leem44 commented Sep 23, 2021

Review: Ch 6 (classification) #106

Review: Ch 6 (classification) #106

Comments

leem44 commented Mar 28, 2021 • edited Loading

Reviewer E:

leem44 commented Mar 28, 2021 • edited Loading

Reviewer B:

trevorcampbell commented Mar 28, 2021 • edited by leem44 Loading

Reviewer D

trevorcampbell commented Mar 28, 2021 • edited by leem44 Loading

Reviewer A

trevorcampbell commented Jul 16, 2021 • edited by leem44 Loading

trevorcampbell commented Jul 17, 2021

leem44 commented Aug 7, 2021

leem44 commented Sep 23, 2021

leem44 commented Mar 28, 2021 •

edited

Loading

leem44 commented Mar 28, 2021 •

edited

Loading

trevorcampbell commented Mar 28, 2021 •

edited by leem44

Loading

trevorcampbell commented Mar 28, 2021 •

edited by leem44

Loading

trevorcampbell commented Jul 16, 2021 •

edited by leem44

Loading