-
Notifications
You must be signed in to change notification settings - Fork 0
/
04_exercises_day2.Rmd
219 lines (171 loc) · 7.39 KB
/
04_exercises_day2.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
# Exercises Day 2
```{r include = FALSE}
# Hide/show R chunks
knitr::opts_chunk$set(class.source = "fold-hide", results = "hide")
```
Write an executable and commented R script.
## Exercise 1 {-}
Load the built-in data set `cars`. Find out about the data with
`?cars`. Create a scatter plot of "Distance" as a function of "Speed."
Adjust the size of the margins (`mai`) and the spacing between labels
and axes (`mgp`). Set `type = "n"` and `axes = FALSE` and create the
plot from scratch, step by step adding `points()`, `axis()`, `legend()`,
etc.
Export the plot to a png or pdf file. All labels should be readable, so
reduce `width` and `height` of the output device.
```{r, results = "hide", fig.height = 3.375, fig.width = 3.375}
par(mai = c(.6,.6,.15,.1), mgp = c(2,.7,0), tck = -.015)
plot(cars, type = "n", axes = FALSE, xlim = c(0, 30),
xlab = "Speed (mph)", ylab = "Stopping distance (ft)")
points(cars, pch = 16)
axis(side = 1)
axis(side = 2)
abline(lm(dist ~ speed, cars), lwd = 2)
box(bty = "L")
legend(0, 120, expression(hat(y) == hat(beta)[0] + hat(beta)[1] * Speed),
lty = 1, lwd = 2, bty = "n")
```
## Exercise 2 {-}
Load `Vocabulary.txt` into R again. Make a scatter plot of "score in
vocabulary test" as a function of "years of education." Set `pch = "."`
and use the `jitter()` function to actually see the structure in the
data.
Add the mean vocabulary score for each year of education to your plot.
Use `aggregate()` or `tapply()` to get these means. Add the standard
errors (use `arrows()`).
Export the plot to a file. Play with the graphical parameters. The goal
is a publication-ready figure.
```{r, results = "hide", fig.height = 3.375, fig.width = 3.375}
voc <- read.table("data/Vocabulary.txt", header = TRUE, stringsAsFactors = TRUE)
# calculate standard errors
vocm <- aggregate(vocabulary ~ education, voc, mean)
vocm$sd <- aggregate(vocabulary ~ education, voc, sd)[,2]
vocm$n <- aggregate(vocabulary ~ education, voc, length)[,2]
vocm$se <- vocm$sd / sqrt(vocm$n)
par(mai = c(.6,.6,.15,.1), mgp = c(2,.7,0), tck = -.015)
plot(jitter(vocabulary, 2) ~ jitter(education, 2), voc, pch = ".",
cex = .8, xlab = "Education (in years)", ylab = "Vocabulary (words correct)",
col = "darkgrey")
points(vocabulary ~ I(0:20), vocm, pch = 16)
with(vocm, arrows(0:20, vocabulary + se, 0:20, vocabulary - se, angle = 90,
length = .05, code = 3))
```
### Solution with `ggplot2` {-}
```{r, fig.width = 3.375, fig.height = 3.375}
library(ggplot2)
ggplot(data = vocm, mapping = aes(x = education, y = vocabulary)) +
geom_jitter(data = voc, mapping = aes(x = education, y = vocabulary), pch=".",
color = "gray") +
labs(x = "Education (in years)", y = "Vocabulary (words correct)") +
geom_point() + #geom_line() +
geom_errorbar(aes(ymin = vocabulary - se, ymax = vocabulary + se), width = 0.5) +
theme_bw()
```
## Exercise 3 {-}
Find out if there are sex differences in the relationship of "years of
education" and "score in vocabulary test," and if these differences
depend on the year the test was taken in. Load the `lattice` package and
use the `xyplot()` function. You can also use functions from the
`ggplot2` package. Hint: Make one scatter plot for each year
of education with one regression line for men and one for women.
```{r, fig.height = 6.75, fig.width = 6.75}
library(lattice)
voc$year <- factor(voc$year)
xyplot(jitter(vocabulary, 2) ~ jitter(education, 2) | year, data = voc,
groups = sex, type = c("p", "r"), col = c("darkgrey","black"), pch = ".", lwd = 2,
xlab = "Education (in years)", ylab = "Vocabulary (words correct)", lty = 1:2,
scales = list(tck = .5, cex = .7),
par.strip.text = list(cex = .7),
par.settings = list(strip.background = list(col = "lightgrey")),
key = list(lines = list(lwd = 2, col = c("darkgrey", "black"), lty = 1:2),
text = list(levels(voc$sex)), columns = 2)
)
```
### Solution with `ggplot2` {-}
```{r, message = FALSE, fig.width = 7, fig.height = 6.75}
ggplot(data = voc, mapping = aes(x = education, y = vocabulary, color = sex)) +
geom_jitter(pch=".") +
labs(x = "Education (in years)", y = "Vocabulary (words correct)") +
stat_smooth(method = "lm", se = FALSE) +
theme_bw() +
facet_wrap(vars(year))
```
## Exercise 4 {-}
Subjects are supposed to solve two mental exercises. Time needed will be
measured in minutes. Between the two exercises subjects get a training
which is supposed to enhance speed in solving these kind of mental
exercises. Ten subjects are tested. The following table shows the
solving times:
```{r}
dat <- data.frame(
Vp = 1:10,
A = c(5.68, 4.52, 5.56, 5.17, 4.67, 5.97, 5.65, 6.04, 4.94, 5.17),
B = c(5.45, 4.37, 5.31, 5.19, 4.18, 5.44, 5.19, 5.84, 4.95, 5.04)
)
```
```{r, echo = FALSE, tab.align = "center", results = "show"}
knitr::kable(dat, col.names = c("Subj", "Exercise A", "Exercise B"), align = "c")
```
a. Create a data frame that contains the depicted variables. Use the
function `reshape()` to transform the data frame to the "long"
format.
```{r}
dat2 <- reshape(dat, idvar = "Vp", timevar = "Aufg", varying = list(2:3),
v.names = "Lzeit", direction = "long")
dat2$Vp <- factor(dat2$Vp)
dat2$Aufg <- factor(dat2$Aufg, labels = c("A", "B"))
rownames(dat2) <- NULL
```
b. Plot the solving time for each subject depending on the exercise (one
panel per person). Use `xyplot()` from the `lattice` package. Export
your graphic as a png or pdf file.
```{r, fig.width = 6, fig.height = 5, fig.align = "center"}
xyplot(Lzeit ~ Aufg | Vp, dat2, as.table = TRUE, type = c("g", "b"),
pch = 16, xlab = "Exercise", ylab = "Solving time (in min)")
```
c. Check if the solving times differ for both exercises using a Wilcoxon
test ($\alpha = 0.05$).
```{r}
with(dat2, wilcox.test(Lzeit[Aufg == "A"], Lzeit[Aufg == "B"], paired = TRUE))
```
d. Check the same hypothesis using a $t$ test ($\alpha = 0.05$).
```{r}
with(dat2, t.test(Lzeit[Aufg == "A"], Lzeit[Aufg == "B"], paired = TRUE))
```
## Exercise 5 {-}
For $n = 12$ subjects, intelligence (variable $X$) and memory
performance (variable $Y$) have been measured. The following value pairs
were obtained:
```{r}
dat <- data.frame(
Vp = 1:12,
int = c(106, 92, 90, 118, 87, 123, 118, 115, 111, 95, 110, 108),
mem = c(94, 100, 84, 126, 95, 93, 120, 103, 100, 102, 120, 125)
)
```
```{r, echo = FALSE, tab.align = "center", results = "show"}
tab <- t(dat[, -1])
rownames(tab) <- c("$X$", "$Y$")
knitr::kable(tab, digits = 0, align = "c")
```
a. Create a data frame for these data.
b. Check the normality assumption separately for each variable using
quantile-quantile plots. Interpret the results.
c. Plot memory performance depending on intelligence in a scatter plot
and add a regression line. Export your graphic as a png or pdf
file.
d. Check the hypothesis $\rho_{XY} = 0$ (if $X$ and $Y$ are
uncorrelated) using an appropriate statistical test ($\alpha = 0.05$).
```{r qq_iqmemory, fig.width = 6, fig.height = 3.375}
par(mfrow = c(1, 2), mai = c(.6, .6, .6, .1), mgp = c(2, .7, 0))
qqnorm(dat$int, main = "IQ")
qqline(dat$int)
qqnorm(dat$mem, main = "Memory")
qqline(dat$mem)
```
```{r iqmemory, fig.width = 4, fig.height = 4, fig.align = "center"}
par(mai = c(.6, .6, .1, .1), mgp = c(2, .7, 0))
plot(mem ~ int, dat, xlab = "IQ", ylab = "Memory")
abline(lm(mem ~ int, dat))
cor.test(~ int + mem, dat, method = "pearson")
```