---
title: 'Hypothesis testing'
subtitle: "ggstatsplot"
author: "Steen Harsted"
date: today
format:
  html:
    toc: true
    toc-depth: 2
    code-fold: true
    code-summary: "Show me the code"
    embed-resources: true
execute:
  eval: true
  echo: true
  output: false
  warning: false
  message: false
---
```{r}
#| echo: false
library(tidyverse)
library(here)
library(ggstatsplot)
library(rstatix)
library(AER)
library(easystats)
library(GGally)
library(gt)
library(gtsummary)
data("PhDPublications")
library(palmerpenguins)
```
# `ggstatsplot`
<br><br>
Install the `ggstatsplot` and `rstatix` packages, then add their `library()` calls to your library code chunk.
<br><br>
## The `bugs_long` dataset
`bugs_long` provides information on the extent to which men and women want to kill arthropods that vary in disgustingness (low, high) and frighteningness (low, high), giving four groups in total. Each participant rated their attitude towards all four kinds of arthropods. `bugs_long` is a subset of the data reported by [Ryan et al. (2013)](https://www.sciencedirect.com/science/article/abs/pii/S0747563213000277).

Note that this is a repeated measures design because each participant gave four ratings, one for each of the four conditions (LDLF, LDHF, HDLF, HDHF).

* `desire` - The desire to kill an arthropod, rated on a scale from 0 to 10
* `gender` - Male/Female
* `region`
* `condition`
    - **LDLF**: low disgustingness and low frighteningness
    - **LDHF**: low disgustingness and high frighteningness
    - **HDLF**: high disgustingness and low frighteningness
    - **HDHF**: high disgustingness and high frighteningness
```{r}
#| output: true
#| echo: false
#| fig-cap: "Picture from Ryan et al. (2013) https://doi.org/10.1016/j.chb.2013.01.024"
# knitr::include_graphics(here("presentations", "img" , "bugs_conditions.jpg"))
knitr::include_graphics(here("gfx" , "bugs.jpg"))
```
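
The repeated measures structure described above can be checked directly in the data. The chunk below is a minimal sketch: it assumes that participants are identified by the `subject` column and that a missing rating is stored as `NA` in `desire`.

```{r}
library(dplyr)
library(ggstatsplot) # provides the bugs_long dataset

# Number of rows (one per rating) recorded for each participant
ratings_per_subject <- bugs_long %>%
  count(subject, name = "n_ratings")

# How many participants have a given number of rows?
ratings_per_subject %>%
  count(n_ratings, name = "n_subjects")

# Number of missing `desire` ratings in each condition
bugs_long %>%
  group_by(condition) %>%
  summarise(n_missing = sum(is.na(desire)))
```

If some participants have missing ratings, keep in mind that a paired test can only use the participants with a complete set of ratings.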
<br><br>
#### In `bugs_long`, is there a difference within participants in their `desire` to kill bugs across the four levels of `condition`?
* Should you use `ggwithinstats()` or `ggbetweenstats()` when comparing the conditions?
* Is it reasonable to assume normality?
```{r}
# Shapiro-Wilk test for each condition
bugs_long %>% group_by(condition) %>% shapiro_test(desire)

# Q-Q plot, one colour per condition
bugs_long %>%
  ggplot(aes(sample = desire, color = condition)) +
  geom_qq() +
  geom_qq_line()

# Density plot
bugs_long %>%
  ggplot(aes(x = desire, fill = condition)) +
  geom_density(alpha = 0.2)
```
* Make the appropriate test
```{r}
bugs_long %>%
  ggwithinstats(x = condition,
                y = desire,
                type = "nonparametric")
# Note that the ggstatsplot tutorial actually runs this as a "parametric" test
```
* What is the name of the statistical test that was performed?
* What is your interpretation?
* What is the consequence if you change the type of test?
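
One way to cross-check the test named in the plot subtitle is to run it by hand. For more than two within-subject groups, `type = "nonparametric"` corresponds to a Friedman rank sum test, which base R provides as `friedman.test()`. The sketch below assumes the `subject` column identifies participants and keeps only participants with a complete set of four ratings, because the test needs complete blocks. `bugs_complete` is a name introduced here for illustration.

```{r}
library(dplyr)
library(ggstatsplot) # provides the bugs_long dataset

# Keep only participants with all four ratings present
bugs_complete <- bugs_long %>%
  group_by(subject) %>%
  filter(n() == 4, !any(is.na(desire))) %>%
  ungroup() %>%
  mutate(subject = factor(subject)) # drop the levels of removed participants

# Friedman rank sum test: outcome ~ condition | block (participant)
friedman_res <- friedman.test(desire ~ condition | subject, data = bugs_complete)
friedman_res
```

The test statistic and p-value should be close to those in the `ggwithinstats()` subtitle. Small differences can occur if the two approaches handle incomplete participants differently.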
<br><br>
#### Is there a difference between men and women in the `desire` to kill bugs that are **LDHF** (low disgustingness and high frighteningness)?
* Create a filtered data frame from `bugs_long` that contains only the **LDHF** condition
```{r}
bl_LDHF <- bugs_long %>% filter(condition == "LDHF")
```
* Should you use `ggwithinstats()` or `ggbetweenstats()` for this test?
* Is it reasonable to assume normality?
```{r}
# Shapiro-Wilk test for each gender
bl_LDHF %>%
  filter(!is.na(gender), !is.na(desire)) %>%
  group_by(gender) %>%
  shapiro_test(desire)

# Q-Q plot
bl_LDHF %>%
  ggplot(aes(sample = desire, color = gender)) +
  geom_qq() +
  geom_qq_line()

# Density plot
bl_LDHF %>%
  ggplot(aes(x = desire, fill = gender)) +
  geom_density(alpha = 0.2)
```
* Make the appropriate test
```{r}
bl_LDHF %>%
  ggbetweenstats(x = gender,
                 y = desire,
                 type = "nonparametric")
```
* What is the name of the statistical test that was performed?
* What is your interpretation?
* What is the consequence if you change the type of test?
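
For two independent groups, `type = "nonparametric"` corresponds to a Mann-Whitney U test (also called the Wilcoxon rank sum test). As a rough cross-check, the same comparison can be run with base R's `wilcox.test()`. This is a sketch, not the exact call that `ggbetweenstats()` makes internally, so small differences in the p-value are possible.

```{r}
library(dplyr)
library(ggstatsplot) # provides the bugs_long dataset

bl_LDHF <- bugs_long %>% filter(condition == "LDHF")

# exact = FALSE requests the normal approximation, because tied ratings
# rule out an exact p-value
wilcox_res <- wilcox.test(desire ~ gender, data = bl_LDHF, exact = FALSE)
wilcox_res
```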
<br><br>
#### Is there a difference in the frequency of men and women between `North America` and the remaining regions?
* First, lump `region` into two levels (`fct_lump()`)
* Reduce the data to one row per subject (`subject`), and discuss why this is a good idea.
```{r}
bl_region <- bugs_long %>%
  mutate(region = fct_lump(region, 1)) %>%
  group_by(subject) %>%
  slice(1) %>%
  ungroup()
```
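
To see why one row per subject matters, compare the gender counts before and after the reduction. In the long data every participant contributes one row per condition, so each person would be counted several times in a frequency table. The sketch below uses `distinct()` as an alternative to `group_by()` plus `slice(1)`. `one_row_per_subject` is a name introduced here for illustration.

```{r}
library(dplyr)
library(ggstatsplot) # provides the bugs_long dataset

# Counts in the long data: each participant appears once per condition
bugs_long %>% count(gender)

# Counts after keeping the first row for each participant
one_row_per_subject <- bugs_long %>%
  distinct(subject, .keep_all = TRUE)

one_row_per_subject %>% count(gender)
```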
* Should you use `ggwithinstats()` or `ggbetweenstats()` or perhaps `ggbarstats()` for this test?
* Should you assess normality?
```{r}
# Both variables are factors (categorical). Normality is only relevant for continuous data
```
* Make the appropriate test
```{r}
bl_region %>%
  ggbarstats(x = gender,
             y = region)
```
* What is the name of the statistical test that was performed? (check the help page under the `paired` argument)
* What is your interpretation?
* What is the consequence if you change the type of test?
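
With unpaired data, `ggbarstats()` reports a Pearson's chi-squared test of independence. A hand-run version with base R's `chisq.test()` could look like the sketch below. Setting `correct = FALSE` switches off Yates' continuity correction. Whether that matches the plot subtitle exactly is an assumption worth checking against the help page.

```{r}
library(dplyr)
library(forcats)
library(ggstatsplot) # provides the bugs_long dataset

bl_region <- bugs_long %>%
  mutate(region = fct_lump(region, 1)) %>%
  group_by(subject) %>%
  slice(1) %>%
  ungroup()

# Cross-tabulate gender by region; table() drops missing values
gender_by_region <- table(gender = bl_region$gender, region = bl_region$region)
gender_by_region

chisq_res <- chisq.test(gender_by_region, correct = FALSE)
chisq_res
```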
<br><br>
## The `ToothGrowth` dataset
`ToothGrowth` gives information on tooth length in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods, orange juice or ascorbic acid (a form of vitamin C and coded as VC).

* `len` - Tooth length
* `supp` - Supplement type
    - **VC**: Vitamin C as ascorbic acid
    - **OJ**: Orange juice
* `dose` - Dose in milligrams/day (0.5, 1, or 2)
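
The design described above can be confirmed with a quick cross-tabulation. `ToothGrowth` ships with base R, so no extra packages are needed. `design` is a name introduced here for illustration.

```{r}
# Number of animals for each combination of supplement type and dose
design <- table(supp = ToothGrowth$supp, dose = ToothGrowth$dose)
design
```

Each animal appears in exactly one cell of this table, which is worth keeping in mind when choosing between a within-subjects and a between-subjects test.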
<br><br>
#### Is there a difference in tooth length based on the type of supplement?
* Should you use `ggwithinstats()` or `ggbetweenstats()` when comparing the two supplement types?
* Is it reasonable to assume normality?
```{r}
# Shapiro-Wilk test for each supplement type
ToothGrowth %>% group_by(supp) %>% shapiro_test(len)

# Q-Q plot
ToothGrowth %>%
  ggplot(aes(sample = len, color = supp)) +
  geom_qq() +
  geom_qq_line()

# Density plot
ToothGrowth %>%
  ggplot(aes(x = len, fill = supp)) +
  geom_density(alpha = 0.2)
```
* Make the appropriate test
```{r}
ToothGrowth %>%
  ggbetweenstats(x = supp,
                 y = len,
                 type = "robust")
# It is likely completely fine to run this as a "parametric" test
```
* What is the name of the statistical test that was performed?
* What is your interpretation?
* What is the consequence if you change the type of test?
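
For two independent groups, `type = "robust"` reports Yuen's test on trimmed means. To get a feel for what changing the type does, the sketch below runs the base R counterparts of the `"parametric"` option (Welch's t-test) and the `"nonparametric"` option (Mann-Whitney U test) and puts the two p-values side by side. It does not reproduce Yuen's test itself.

```{r}
# Welch's t-test (unequal variances is the default in t.test())
welch_res <- t.test(len ~ supp, data = ToothGrowth)

# Mann-Whitney U test; exact = FALSE because tied tooth lengths
# rule out an exact p-value
mw_res <- wilcox.test(len ~ supp, data = ToothGrowth, exact = FALSE)

# Compare the two p-values
c(welch = welch_res$p.value, mann_whitney = mw_res$p.value)
```

If the different tests lead to the same conclusion, the choice of type matters little for the interpretation of this particular comparison.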