The R script `run_analysis.R` was created to gather data, clean data, and create a tidy summary dataset of the UCI HAR Dataset. `run_analysis.R` contains the following functions:

---
### runAnalysis

This function is a wrapper that runs all the necessary functions from start to finish and writes the tidy summary dataset to `uci-har-means-tidy.txt`. To view the output, run the following code:

```r
# Read the tidy summary written by runAnalysis() and open it in the data viewer
data <- read.table('uci-har-means-tidy.txt', header = TRUE)
View(data)
```

---
### getLibraries

This script requires the `dplyr` package. This function checks whether `dplyr` is installed, downloads and installs it if necessary, and then loads it.
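
A minimal sketch of this check, assuming the function accepts a character vector of package names (the `pkgs` parameter is hypothetical; the original may simply hard-code `dplyr`). It uses base R's `requireNamespace()` to test for the package before falling back to `install.packages()`:

```r
# Hypothetical sketch of getLibraries(); the pkgs argument is an assumption
getLibraries <- function(pkgs = "dplyr") {
  for (pkg in pkgs) {
    # requireNamespace() returns FALSE instead of raising an error when the package is missing
    if (!requireNamespace(pkg, quietly = TRUE)) {
      install.packages(pkg)  # download and install from the configured CRAN mirror
    }
    # Attach the package so its functions can be called without the pkg:: prefix
    library(pkg, character.only = TRUE)
  }
  invisible(pkgs)
}
```

After `getLibraries()` runs, `select()`, `group_by()`, and `summarise()` can be called directly.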

---

### getData

This function checks the working directory for the `UCI HAR Dataset` folder. If the folder does not exist, the data is downloaded and unzipped, and the zip file is removed. The location of the unzipped folder is reported to the user.
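
A sketch of this step, assuming the download URL is passed in as `url` (the source does not state it) and that the archive unpacks to a folder named `UCI HAR Dataset`. The `destDir` parameter and the temporary zip file name are hypothetical:

```r
# Hypothetical sketch of getData(); url, destDir and the zip file name are assumptions
getData <- function(url, destDir = getwd()) {
  dataDir <- file.path(destDir, "UCI HAR Dataset")
  if (!dir.exists(dataDir)) {
    zipFile <- file.path(destDir, "UCI_HAR_Dataset.zip")
    download.file(url, zipFile, mode = "wb")  # "wb" keeps the zip intact on Windows
    unzip(zipFile, exdir = destDir)           # extract next to the zip file
    file.remove(zipFile)                      # remove the zipped file once extracted
    # Report where the unzipped folder ended up
    message("Unzipped data folder: ", normalizePath(dataDir))
  }
  invisible(dataDir)
}
```

Because R evaluates arguments lazily, `url` is only needed when the folder is actually missing.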

---

### createCombinedDataset

This function does the following:

- Reads all necessary tables into R.
- Makes a unique feature list to avoid duplicate-label issues with dplyr `select()`.
- Relabels the columns in both the test and train data with the feature names.
- Adds the `y` values (activity codes) to each of the test and train datasets and labels the column `activity`.
- Adds the `subject` to each of the test and train datasets.
- Combines the test and train data into one dataset.
- Converts `activity` to a factor and resets its levels to the tidy activity labels.
- Returns the combined, labeled dataset.
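
These steps can be sketched in base R as follows. The file names follow the standard UCI HAR Dataset layout (`features.txt`, `activity_labels.txt`, and the `X_`, `y_`, and `subject_` files under `test/` and `train/`). Using `make.unique()` for the duplicate feature names, the `dataDir` parameter, the `readSplit` helper, and lower-casing the activity labels are all assumptions, not necessarily what `run_analysis.R` does:

```r
# Hypothetical sketch of createCombinedDataset(); helper names and label casing are assumptions
createCombinedDataset <- function(dataDir = "UCI HAR Dataset") {
  features <- read.table(file.path(dataDir, "features.txt"), stringsAsFactors = FALSE)
  activityLabels <- read.table(file.path(dataDir, "activity_labels.txt"),
                               stringsAsFactors = FALSE)
  # Append .1, .2, ... to repeated feature names so dplyr::select() sees unique columns
  featureNames <- make.unique(features$V2)

  # Read one split ("test" or "train") and attach its activity and subject columns
  readSplit <- function(split) {
    x <- read.table(file.path(dataDir, split, paste0("X_", split, ".txt")))
    names(x) <- featureNames
    y <- read.table(file.path(dataDir, split, paste0("y_", split, ".txt")))
    s <- read.table(file.path(dataDir, split, paste0("subject_", split, ".txt")))
    x$activity <- y$V1
    x$subject <- s$V1
    x
  }

  combined <- rbind(readSplit("test"), readSplit("train"))
  # Map the numeric activity codes onto descriptive (lower-cased) labels
  combined$activity <- factor(combined$activity,
                              levels = activityLabels$V1,
                              labels = tolower(activityLabels$V2))
  combined
}
```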

---

### createSummaryDataset

This function takes the output of the `createCombinedDataset` function and does the following:

- Selects a subset of columns: `subject`, `activity`, and any column whose label contains the word `mean` or `std`.
- Groups the data by `subject` and `activity`, then summarizes every measurement column by its mean.
- Creates tidy column labels by:
    1. removing hyphens,
    2. removing parentheses,
    3. applying camel case, and
    4. changing `BodyBody` to `Body`.
- Returns the tidy summary dataset.
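
A sketch of the summary step with dplyr, assuming the input has `subject` and `activity` columns alongside the raw feature labels. Matching `mean|std` case-sensitively (which keeps the `meanFreq` columns but drops the `angle(...Mean...)` features) is an assumption, as is the helper name `tidyLabels`:

```r
library(dplyr)

# Hypothetical helper applying the label-tidying steps to a vector of column names
tidyLabels <- function(labels) {
  labels <- gsub("[()]", "", labels)                       # remove parentheses
  labels <- gsub("-(\\w)", "\\U\\1", labels, perl = TRUE)  # drop each hyphen, upper-case the next character (camel case)
  labels <- gsub("-", "", labels, fixed = TRUE)            # drop any hyphen not followed by a word character
  gsub("BodyBody", "Body", labels, fixed = TRUE)           # collapse the duplicated "BodyBody"
}

# Hypothetical sketch of createSummaryDataset()
createSummaryDataset <- function(combined) {
  tidy <- combined %>%
    # Keep the ID columns plus any column whose label contains "mean" or "std"
    select(subject, activity, matches("mean|std", ignore.case = FALSE)) %>%
    group_by(subject, activity) %>%
    # Average every remaining measurement column within each subject/activity pair
    summarise(across(everything(), mean), .groups = "drop")
  names(tidy) <- tidyLabels(names(tidy))
  tidy
}
```

For example, `tBodyAcc-mean()-X` becomes `tBodyAccMeanX` and `fBodyBodyGyroMag-std()` becomes `fBodyGyroMagStd`. Doing the camel-case step before stripping hyphens matters: the hyphen marks where the next word starts, so removing it first would lose that boundary.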