---
title: "statar"
author: "Matthieu Gomez"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Data.frames function}
  %\VignetteEngine{knitr::rmarkdown}
  %\usepackage[utf8]{inputenc}
---
## sum_up
`sum_up` prints detailed summary statistics for the variables of a data frame. It corresponds to Stata's `summarize`.
```R
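library(dplyr)
library(statar)

N <- 100
df <- tibble(
  id = 1:N,
  v1 = sample(5, N, TRUE),
  v2 = sample(1e6, N, TRUE)
)
# summarize every variable
sum_up(df)
# detailed statistics (d = TRUE) for the variables whose name starts with "v"
df %>% sum_up(starts_with("v"), d = TRUE)
# summarize within each group of v1
df %>% group_by(v1) %>% sum_up()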
```
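For readers coming from plain dplyr, the grouped call can be compared with a hand-written `summarise`. This is only a sketch of the statistics that Stata's `summarize` reports (observations, mean, standard deviation, minimum, maximum); the toy data and the column names `obs`, `mean_v2`, `sd_v2`, `min_v2`, `max_v2` are assumptions made for illustration, and the exact layout printed by `sum_up` may differ.

```r
library(dplyr)

# Hypothetical toy data, small enough to check by hand
df <- tibble(
  id = 1:6,
  v1 = c(1, 1, 2, 2, 3, 3),
  v2 = c(10, 20, 30, 40, 50, 60)
)

# One row per value of v1, with the usual summary statistics of v2
by_group <- df %>%
  group_by(v1) %>%
  summarise(
    obs     = sum(!is.na(v2)),
    mean_v2 = mean(v2, na.rm = TRUE),
    sd_v2   = sd(v2, na.rm = TRUE),
    min_v2  = min(v2, na.rm = TRUE),
    max_v2  = max(v2, na.rm = TRUE)
  )

print(by_group)
```

The point of `sum_up` is to avoid writing this block for every variable.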
## tab
`tab` prints the distinct rows of a data frame with their count. Compared to the dplyr function `count`, it additionally reports the frequency and the cumulative frequency of each row.
```R
N <- 100
df <- tibble(
  id = sample(5, N, TRUE),
  v1 = sample(5, N, TRUE)
)
# count each distinct (id, v1) pair
tab(df, id, v1)
# same, removing missing values
tab(df, id, v1, na.rm = TRUE)
# tabulate v1 within each group of id
df %>% group_by(id) %>% tab(v1)
```
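To make the comparison with `count` concrete, the following sketch builds the two extra columns by hand with dplyr alone. The column names `percent` and `cum`, the choice to express shares in percent, and the seed are assumptions made for illustration; they are not taken from the output of `tab`.

```r
library(dplyr)

set.seed(1)  # hypothetical seed, only to make the sketch reproducible
N <- 100
df <- tibble(
  id = sample(5, N, TRUE),
  v1 = sample(5, N, TRUE)
)

freq <- df %>%
  count(id, v1) %>%                  # one row per distinct (id, v1) pair
  mutate(
    percent = 100 * n / sum(n),      # share of each pair in the total
    cum     = cumsum(percent)        # running total of the shares
  )

print(freq)
```

The last value of `cum` is 100 by construction, since the counts sum to the number of rows.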
## join
`join` is a wrapper for the dplyr join functions, with three added options:
- The option `check` checks that there are no duplicates in the master or using datasets (as in Stata).
```r
# Stata equivalent: merge m:1 v1
join(x, y, kind = "full", check = m~1)
```
- The option `gen` specifies the name of a new variable that distinguishes matched from non-matched rows (as in Stata).
```r
# Stata equivalent: merge m:1 v1, gen(_merge)
join(x, y, kind = "full", gen = "_merge")
```
- The option `update` fills missing values in the master dataset with the corresponding values from the using dataset.
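
The three options can be read as shorthand for steps one would otherwise write by hand. The sketch below reproduces them with dplyr alone, on hypothetical toy data where `x` plays the role of the master dataset and `y` the using dataset. The names `in_x`, `in_y` and `merge_status`, and the 1/2/3 coding borrowed from Stata's `_merge`, are assumptions for illustration; this is not how `join` is implemented.

```r
library(dplyr)

# Hypothetical toy data: x is the master dataset, y the using dataset
x <- tibble(id = c(1, 2, 3), v = c(10, NA, 30))
y <- tibble(id = c(2, 3, 4), v = c(20, 35, 40))

# check (m:1): the key must uniquely identify the rows of the using dataset
stopifnot(!anyDuplicated(y$id))

merged <- full_join(
    mutate(x, in_x = TRUE),
    mutate(y, in_y = TRUE),
    by = "id", suffix = c(".x", ".y")
  ) %>%
  mutate(
    # gen: 1 = master only, 2 = using only, 3 = matched
    merge_status = case_when(
      !is.na(in_x) & !is.na(in_y) ~ 3L,
      !is.na(in_x)                ~ 1L,
      TRUE                        ~ 2L
    ),
    # update: keep the master value, fall back on the using value when it is missing
    v = coalesce(v.x, v.y)
  ) %>%
  select(id, v, merge_status)

print(merged)
```

In this sketch the row with `id == 2` gets its missing `v` from `y`, while the row with `id == 3` keeps the master value even though `y` holds a different one.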