Chapter 6 (classification I) further improvements #92
Agreed, but when we first introduce the term we should put in a comment box the synonyms that people commonly use
This must already be explained somewhere, no? Chapter 7 maybe? Check there before editing. If not, it definitely should be. I remember talking about this at least a few times in class, and students having some trouble with it.
Only if it makes sense to introduce it in this section. All the other comments I agree with! Big comment: before we do any of these things, we should read Chapter 7 just as carefully, because certain organizational choices may become clearer.
…ome paragraphs to clarify functions, removing unneeded explanations of functions which we will explain in earlier chapters, updating scaling/centering explanation, specifying data in balancing section
All done or made into new issues
Tiffany's suggested improvements from reading it in spring 2021:
- We can remove the discussion for `forcats`, as it is loaded with the `tidyverse`. However, we should keep the sentence that discusses what factors are and why they are useful. We can move that to the first time we manipulate factors in this chapter?
- When we introduce the variables, we define all of them in the numerical list (even the relatively obvious ones) except for ID number, symmetry and fractal dimension. To be consistent, we should do this for all of them.
- In the following sentence we should explicitly state that B is benign and M is malignant: "Given that we only have 2 different values in our Class column (B and M), we only expect to get two names back."
- We don't use units in this chapter, and we should... even the scaled data could be labelled as "μm scaled" (or whatever is appropriate).
- This line of code is too long and should be broken across several lines to increase readability: `mutate(dist_from_new = sqrt((Perimeter - new_obs_Perimeter)^2 + (Concavity - new_obs_Concavity)^2)) %>%` (alternatively, we could shorten the variable names so that it is readable in one line, given that it is a mathematical equation).
- … `slice` in the data wrangling chapter, so we can just use it here. Again, this would make this book more standalone, we currently only do this in the worksheet/tutorial).
- In the section titled "More than two explanatory variables" I think we should explicitly show the formula for 3 features, and then after that we can show the general formula for m features. I wonder if we even want to repeat a calculation for a single nearest neighbour, or 3 even, for 3 features and annotate our 3D plot like we do for our 2D plot. ML: making new issue, will do if time: Annotate 3D plot in classification chapter #228
- I think we need a better explanation of what `set_engine` is doing, and why we need it. It seems like it is not always required - only if you want to add arguments to the model that exist in non-base packages? I am not 100% certain of that definition, but what we have written down currently is not clear, even to me. Also, after we fit the model we write: "It confirms that the computational engine used to train the model was `kknn::train.kknn`", which is also a bit unclear: in `set_engine` we only set the package name, so how did it know to use the `train.kknn` function from the `kknn` package, and not some other function...
- When we first use `fit` we use some fairly abstract syntax right away, `fit(Class ~ ., data = cancer_train)`, and do not really explain it. Here I think we should first write the formula out fully, `fit(Class ~ Perimeter + Concavity, data = cancer_train)`, and then re-write it with `Class ~ .` and explain what `~ .` means.
- We should not use the word "tune" in this chapter, since they don't know what it means yet (sentence we use it in: "we will actually let R tune the model for us."; there might also be other places, we should check).
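For the "More than two explanatory variables" item, here is one possible way to write the two formulas. This is only a sketch, assuming the chapter keeps using straight-line (Euclidean) distance as in the two-variable `mutate` call quoted in this issue. The symbols $a$ and $b$ (two observations) and the subscripts (feature index) are placeholder notation, not taken from the chapter.

Three features:

$$\mathrm{Distance} = \sqrt{(a_1 - b_1)^2 + (a_2 - b_2)^2 + (a_3 - b_3)^2}$$

General case with $m$ features:

$$\mathrm{Distance} = \sqrt{\sum_{i=1}^{m} (a_i - b_i)^2}$$

Setting $m = 2$ gives back the Perimeter/Concavity calculation, so the three-feature version could be shown as one extra squared term added under the same square root.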
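For the `fit` item, a small sketch of what `.` expands to might help when writing the explanation. The `toy_train` data frame and its values are made up for illustration (only the column names come from this issue), and the sketch uses base R's `terms()` rather than tidymodels, so it shows how the formula is expanded and not how `fit` uses it internally.

```r
# Hypothetical stand-in for cancer_train: the values are made up.
toy_train <- data.frame(
  Class = factor(c("B", "M", "B")),
  Perimeter = c(0.2, 1.3, -0.5),
  Concavity = c(-0.1, 2.0, 0.4)
)

# Formula written out in full: Class is the response,
# Perimeter and Concavity are the predictors.
full_formula <- Class ~ Perimeter + Concavity

# Shorthand: "." stands for every column in the data except the response.
dot_formula <- Class ~ .

# terms() expands the "." using the columns of the data frame.
full_predictors <- attr(terms(full_formula, data = toy_train), "term.labels")
dot_predictors <- attr(terms(dot_formula, data = toy_train), "term.labels")

print(full_predictors)
print(dot_predictors)
```

Both calls list the same predictors here because `toy_train` has no other columns. If the training data kept an ID column, `Class ~ .` would pick it up as a predictor too, which may be worth a sentence in the chapter.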