-
Notifications
You must be signed in to change notification settings - Fork 1
/
team.Rmd
233 lines (172 loc) · 8.07 KB
/
team.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
---
title: "Lab 4 - Team"
author: "Ashley Sackpraseuth, Adam Ford"
date: "10/11/2020"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## Loading Data
```{r}
library(tidyverse)
library(readxl)
defense <- read_excel('cyclonesFootball2019.xlsx', sheet='Defensive')
str(defense)
offense <- read_excel('cyclonesFootball2019.xlsx', sheet='Offensive')
str(offense)
biography <- read_excel('cyclonesFootball2019.xlsx', sheet='Biography')
str(biography)
```
## Part one: Cleaning data
The player names and opponent names are factors.
All offensive and defensive statistics are numerical. The Weight column in biogrpahy is also numeric. (Optional: Learn to mutate multiple columns systematically using the ?across function in `dplyr)
Change the Height column in biogrpahy into numeric. (Hint: Use a unit that works well)
```{r}
defClean <- defense %>%
mutate(across(c("Name", "Opponent_Opponent"), as.factor)) %>%
mutate_if(is.character, as.numeric, na.rm = TRUE)
str(defClean)
offClean <- offense %>%
mutate(across(c("Name", "Opponent_Opponent"), as.factor)) %>%
mutate_if(is.character, as.numeric, na.rm = TRUE)
str(offClean)
bioClean <- biography %>%
mutate(Name = as.factor(Name)) %>%
mutate(Height = gsub("-", ".", Height)) %>% #Height is converted to ft.in
mutate(across(c("Weight", "Height"), as.numeric))
str(bioClean)
```
## Part two: Tidying
1.Reformat the `defClean` data frame into a tidy format using `pivot_longer`. The type of statistic (Tackles_Solo, Tackles_ASST, etc) is added as a new key column named `stat`.
```{r}
defCleanWide <- defClean
defClean <- defClean %>%
pivot_longer(Tackles_Solo:Pass_PB, names_to = 'stat', values_to='score')
print(defClean)
```
2. Compare the distributions of the defensive statistics. What defensive skills are rare?
```{r}
ggplot(defClean, aes(x = score)) + geom_histogram() + facet_wrap(~stat, scales = 'free_y') + ggtitle("Defense Statistics Distribution") + xlab("Score") + ylab("Count")
```
Turnovers (Turnover_FF, Turnover_FR, Turnover_INT) and Tackles_Sack are rare defensive skills.
3. Did ISU have better defense against Iowa or Notre Dame? Answer this question by creating a scatterplot with x- and y-axes being the number of solo tackles (of each player). A large number of solo tackles is an indicator of good defense.
```{r}
soloTackles <- defClean %>%
filter(Opponent_Opponent == 'Iowa' | Opponent_Opponent == 'Notre Dame', stat == 'Tackles_Solo') %>%
pivot_wider(names_from = Opponent_Opponent, values_from= score)
ggplot(soloTackles, aes(x=Iowa, y=`Notre Dame`)) + geom_point() + xlab('Solo tackles in Iowa game') + ylab('Solo tackles in Notre Dame game') + ggtitle("Notre Dame vs. Iowa Solo Tackles")
```
According to the plot, Iowa State had better defense in the Notre game, because the plot shows more points with 2 or more solo tackles.
4. Separate the `Hometown` column into two columns, namely the city and the state. Print the first few rows of these two columns. (Hint: look at the sep= argument of separate)
```{r}
bioClean <- bioClean %>%
separate(Hometown, into = c("City", "State"), sep = ", ")
head(bioClean)
```
5. How many players are there from each state?
```{r}
bioClean %>%
group_by(State) %>%
summarize(numPlayers = n())
```
## Part three: Joining data frames
1. Find which states do the ISU defensive and offensive players come from. Create an appropriate summary.
```{r}
defWithHomes <- defClean %>%
select(Name) %>%
distinct() %>%
left_join(bioClean, by = "Name")
defWithHomes %>%
group_by(State) %>%
summarize(numPlayers = n()) %>%
na.omit() %>%
arrange(desc(numPlayers))
offWithHomes <- offClean %>%
select(Name) %>%
distinct() %>%
left_join(bioClean, by = "Name")
offWithHomes %>%
group_by(State) %>%
summarize(numPlayers = n()) %>%
na.omit() %>%
arrange(desc(numPlayers))
```
2. How does the performance of quarterback Brock Purdy relate to the performance of the ISU team?
```{r}
offClean$YdsResponsibleFor <- rowSums(offClean[,c("Rushing_YDS", "Receiving_YDS", "Passing_YDS")], na.rm=TRUE)
offClean %>%
group_by(Name) %>%
summarize(AvgYdsResponsibleFor = mean(YdsResponsibleFor)) %>%
arrange(desc(AvgYdsResponsibleFor))
```
I added a conglomerate statistic for offensive players that is Yards Responsible For which consists of a sum of passing, rushing and receiving. Then averaged the stat for each player. It is obvious that Brock Purdy's performance has the greatest single impact on the Iowa State team.
3. Which ISU player(s) made big progress over the last year?
```{r}
defense18 <- read_excel('cyclonesFootball2018.xlsx', sheet='Defensive')
defCleanWide18 <- defense18 %>%
mutate(across(c("Name", "Opponent_Opponent"), as.factor)) %>%
mutate_if(is.character, as.numeric, na.rm = TRUE)
defCleanWide19 <- defCleanWide
defCleanWide18 <- defCleanWide18 %>%
group_by(Name) %>%
select(-Opponent_Opponent) %>%
summarize_all(list(mean), na.rm=TRUE)
defCleanWide19 <- defCleanWide19 %>%
group_by(Name) %>%
select(-Opponent_Opponent) %>%
summarize_all(list(mean), na.rm=TRUE)
colnames(defCleanWide18) <- paste("Last", colnames(defCleanWide18), sep = "_")
colnames(defCleanWide19) <- paste("This", colnames(defCleanWide19), sep = "_")
DefCompare <- inner_join(defCleanWide18, defCleanWide19, by = c("Last_Name" = "This_Name"))
DefCompare$DiffTacklesSolo <- DefCompare$This_Tackles_Solo - DefCompare$Last_Tackles_Solo
DefCompare$DiffTacklesSack <- DefCompare$This_Tackles_Sack - DefCompare$Last_Tackles_Sack
DefCompare$DiffTacklesAsst <- DefCompare$This_Tackles_ASST - DefCompare$Last_Tackles_ASST
DefCompare %>%
select(Last_Name, DiffTacklesSolo, DiffTacklesSack, DiffTacklesAsst) %>%
arrange(desc(DiffTacklesSolo)) %>%
head(1)
DefCompare %>%
select(Last_Name, DiffTacklesSolo, DiffTacklesSack, DiffTacklesAsst) %>%
arrange(desc(DiffTacklesSack)) %>%
head(1)
DefCompare %>%
select(Last_Name, DiffTacklesSolo, DiffTacklesSack, DiffTacklesAsst) %>%
arrange(desc(DiffTacklesAsst)) %>%
head(1)
```
These result shows the difference of the average of each player's important stats last season versus this season. We see Anthony Johnson most improved in Solo tackles and O'Rien Vance most improved in Sacks and Assisted Tackles.
```{r}
offense18 <- read_excel('cyclonesFootball2018.xlsx', sheet='Offensive')
offCleanWide18 <- offense18 %>%
mutate(across(c("Name", "Opponent_Opponent"), as.factor)) %>%
mutate_if(is.character, as.numeric, na.rm = TRUE)
offCleanWide19 <- offClean
offCleanWide18 <- offCleanWide18 %>%
group_by(Name) %>%
select(-Opponent_Opponent) %>%
summarize_all(list(mean), na.rm=TRUE)
offCleanWide19 <- offCleanWide19 %>%
group_by(Name) %>%
select(-Opponent_Opponent) %>%
summarize_all(list(mean), na.rm=TRUE)
colnames(offCleanWide18) <- paste("Last", colnames(offCleanWide18), sep = "_")
colnames(offCleanWide19) <- paste("This", colnames(offCleanWide19), sep = "_")
OffCompare <- inner_join(offCleanWide18, offCleanWide19, by = c("Last_Name" = "This_Name"))
OffCompare$DiffRecYds <- OffCompare$This_Receiving_YDS - OffCompare$Last_Receiving_YDS
OffCompare$DiffPassingYds <- OffCompare$This_Passing_YDS - OffCompare$Last_Passing_YDS
OffCompare$DiffRushingYds <- OffCompare$This_Rushing_YDS - OffCompare$Last_Rushing_YDS
OffCompare %>%
select(Last_Name, DiffRecYds, DiffPassingYds, DiffRushingYds) %>%
arrange(desc(DiffRecYds)) %>%
head(1)
OffCompare %>%
select(Last_Name, DiffRecYds, DiffPassingYds, DiffRushingYds) %>%
arrange(desc(DiffPassingYds)) %>%
head(1)
OffCompare %>%
select(Last_Name, DiffRecYds, DiffPassingYds, DiffRushingYds) %>%
arrange(desc(DiffRushingYds)) %>%
head(1)
```
These results are for the offensive players. Some NaN exist in categories where players have not done such an action, however this will not impact finding the "improvers". In Receiving Yards, Charlie Kolar has improved the most since last season. Brock Purdy has improved the most in Passing Yards. And Johnnie Lang has improved the most in Rushing Yards.