Review: Ch 7 (classification_continued) #107

Closed
leem44 opened this issue Mar 28, 2021 · 5 comments · Fixed by #234

Comments

@leem44
Contributor

leem44 commented Mar 28, 2021

Reviewer E:

  • I love the cross validation diagram -- extremely helpful!
    • ML: no changes needed here
  • Consider defining “accuracy” more precisely and either defining “Kap” or removing it from the tidymodels output. It’s a bit distracting to have it reported but unexplained.
  • Consider spending slightly more time explaining the confusion matrix and what each cell means (see the sketch after this list).
  • Chapter 7 is extremely dense and hits on so many foundational modeling concepts. I think it could be helpful to pull some of this up before Chapter 6 and describe a holistic modeling workflow
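
For reference, the accuracy/kap output and the confusion matrix that Reviewer E mentions come from yardstick. A minimal sketch, assuming a fitted classifier knn_fit, a test set cancer_test, and a label column Class (placeholder names, not necessarily the ones used in the chapter):

```r
library(tidymodels)

# Collect test-set predictions alongside the true labels
cancer_test_predictions <- predict(knn_fit, cancer_test) |>
  bind_cols(cancer_test)

# metrics() reports accuracy and kap (Cohen's kappa) for class predictions;
# this is where the unexplained "kap" row in the output comes from
cancer_test_predictions |>
  metrics(truth = Class, estimate = .pred_class)

# conf_mat() prints the confusion matrix: rows are predicted classes,
# columns are true classes
cancer_test_predictions |>
  conf_mat(truth = Class, estimate = .pred_class)
```
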
@leem44
Contributor Author

leem44 commented Mar 28, 2021

Reviewer B:

  • Wow, cross-validation in an intro course! The future has arrived! Cross-validation-based error estimates are much easier to understand than summaries like R-squared or AIC.
    • ML: no changes needed here

@trevorcampbell
Contributor

trevorcampbell commented Mar 28, 2021

Reviewer D

  • it is unclear to me what the overall workflow advocated by the authors should be
  • Specifically:
    1. Should the readers/users split the data into a training set, a validation set, and a test set (such that the training set and the validation set combine to yield the overall training set)?
    2. Should they then tune the classifier by building it on the training set and evaluating it on the validation set?
    3. Once the classifier is tuned, should they assess its accuracy on the test set?
    4. Once the accuracy of the tuned classifier is established, should the classification model/technique be applied to the ENTIRE data (i.e., training set + validation set + test set) to perform classifications for new observations?
    • ML: the overview at the end of the chapter does this (a rough sketch of this workflow follows below)
  • If the 4 items described above capture the workflow intended by the authors, I find it confusing that this workflow is presented backwards in the manuscript – for example, the accuracy of the classifier is assessed BEFORE the classifier is actually tuned.
    • ML: I think it is difficult to explain tuning before explaining accuracy
  • Can the authors clarify their intended workflow and make sure the chapters in which they present the elements of this workflow follow a logical sequence?
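
To make the intended sequence concrete, here is a rough sketch of that workflow in tidymodels, in which cross-validation on the training set plays the role of the validation set (object and column names such as cancer and Class are placeholders; the chapter's own code may differ):

```r
library(tidymodels)

# 1. Split off a test set; the remainder is the overall training set
cancer_split <- initial_split(cancer, prop = 0.75, strata = Class)
cancer_train <- training(cancer_split)
cancer_test  <- testing(cancer_split)

# 2. Tune the classifier on the training set, using cross-validation folds
#    in place of a single validation set
knn_spec <- nearest_neighbor(weight_func = "rectangular", neighbors = tune()) |>
  set_engine("kknn") |>
  set_mode("classification")

cancer_recipe <- recipe(Class ~ ., data = cancer_train) |>
  step_center(all_predictors()) |>
  step_scale(all_predictors())

knn_workflow <- workflow() |>
  add_recipe(cancer_recipe) |>
  add_model(knn_spec)

knn_results <- tune_grid(knn_workflow,
                         resamples = vfold_cv(cancer_train, v = 5, strata = Class),
                         grid = tibble(neighbors = seq(1, 15, by = 2)))

best_k <- select_best(knn_results, metric = "accuracy")

# 3. Refit the tuned classifier on the full training set and assess it once
#    on the held-out test set
final_results <- knn_workflow |>
  finalize_workflow(best_k) |>
  last_fit(cancer_split)

collect_metrics(final_results)

# 4. For predicting genuinely new observations, the tuned model can then be
#    refit on all of the available data
```
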

@trevorcampbell
Contributor

trevorcampbell commented Mar 28, 2021

Reviewer A

  • "precision and recall, not discussed": Why not?
    • ML: I don't know if it makes sense to explain it here if we aren't going to do anything with it later. I will add an issue in case we want to add it in a later iteration: precision/recall #230
  • p163: the figures are hard to follow (the flow of them) because they get jumbled in the PDF version; in the HTML version it's fine. Actual comment: I expected to see the scatterplot as the first graphic. If you want the reader to see these in print, be sure to point to them specifically in the text.
  • p165 first code block: I think it's important to point out that you are not simply sampling the rows of the cancer data set, but are performing a stratified sample (see the sketch after this list).
  • p166: You could emphasize the stratification by summarizing the split between M and B in each data set.
  • p169: Is that good? I feel like some wrap-up of the performance would be good for the novice reader here.
  • p170: Does that mean you pool all the data and train your final classifier? Be very clear for the reader since this is an intro text.
  • p177 in the underfitting paragraph: So you want to balance these two issues: be clear about that.
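
On the p165/p166 points, a minimal sketch of how the stratified split and a class-balance summary might look (object and column names are placeholders):

```r
library(tidymodels)

# strata = Class makes this a stratified sample of the rows,
# not a simple random sample
cancer_split <- initial_split(cancer, prop = 0.75, strata = Class)
cancer_train <- training(cancer_split)
cancer_test  <- testing(cancer_split)

# Summarize the M / B balance in each set to show that
# stratification preserved the class proportions
cancer_train |>
  group_by(Class) |>
  summarize(n = n()) |>
  mutate(proportion = n / sum(n))

cancer_test |>
  group_by(Class) |>
  summarize(n = n()) |>
  mutate(proportion = n / sum(n))
```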

@trevorcampbell
Contributor

From #146 comment by @ttimbers : need more informative axis labels in figures. The variables we have are the mean values across cells in a tissue sample.

However, I worry a bit that changing the axis labels will make the examples more confusing (because the new axis labels should be something like Mean Concavity (for example)).

I will make this same comment in the chapter-specific edits thread for classification 1.

@leem44
Contributor Author

leem44 commented Aug 12, 2021

> From #146 comment by @ttimbers : need more informative axis labels in figures. The variables we have are the mean values across cells in a tissue sample.
>
> However, I worry a bit that changing the axis labels will make the examples more confusing (because the new axis labels should be something like Mean Concavity (for example)).
>
> I will make this same comment in the chapter-specific edits thread for classification 1.

I decided not to add "Mean" in front of the labels since I think it might make it more confusing, but I did specify when the values were standardized, e.g., Perimeter (standardized)
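
In ggplot2 terms, that labelling convention amounts to setting the axis labels via labs() (a sketch; the data frame and column names are placeholders):

```r
library(ggplot2)

ggplot(cancer_train, aes(x = Perimeter, y = Concavity, color = Class)) +
  geom_point(alpha = 0.6) +
  labs(x = "Perimeter (standardized)",
       y = "Concavity (standardized)",
       color = "Diagnosis")
```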

leem44 linked a pull request on Sep 21, 2021 that will close this issue
leem44 closed this as completed on Sep 23, 2021