R Data Analysis and Visualization Workshop

Workshop Lessons:

1. Reproducible Research

Learn what reproducible research is.
Gain implementable strategies for file structure and file naming.

2. Why R (and RStudio)?

Download the Code 1 Handout here

Describe the purpose of the RStudio Script, Console, Environment, and Plots panes.
Organize files and directories for a set of analyses as an R Project, and understand the purpose of the working directory.
Use the built-in RStudio help interface to search for more information on R functions.
Demonstrate how to provide sufficient information for troubleshooting with the R user community.

3. Introduction to R

Define the following terms as they relate to R: object, assign, call, function, arguments, options.
Assign values to objects in R.
Learn how to name objects.
Use comments to document a script.
Solve simple arithmetic operations in R.
Call functions and use arguments to change their default options.
Inspect the content of vectors and manipulate their content.
Subset and extract values from vectors.
Analyze vectors with missing data.

4. Starting with Data

Describe what the here package does.
Read data into tibbles.
Compare read_csv to read.csv.
Extract values from tibbles.
Perform basic operations on columns in a tibble.

5. Manipulating Data with dplyr

Describe what the dplyr package in R is used for.
Apply common dplyr functions to manipulate data in R.
Employ the ‘pipe’ operator to link together a sequence of functions.
Employ the ‘mutate’ function to apply other chosen functions to existing columns and create new columns of data.
Employ the ‘split-apply-combine’ concept to split the data into groups, apply analysis to each group, and combine the results.
Convert tibbles from long to wide format and back again.

6. Exploratory Data Analysis

Download the Code 2 Handout here

Produce histograms, bar plots, boxplots, violin plots, and scatterplots using ggplot.
Describe what faceting is and apply faceting in ggplot.
Modify the aesthetics of an existing ggplot plot (including axis labels and color).
Build complex and customized plots from data in a data frame.

7. Introduction to Statistics

Describe the difference between populations and samples.
Distinguish between data that follow a normal distribution and data that deviate from it in modality, skew, or kurtosis.
Define the difference between the null and alternative hypotheses.
Define type I and type II errors.
Describe the relationship between alpha, beta, and power.

Download the full code handout here

Post-Workshop Survey (please do this before you leave the room)

So you want to learn more?

Attribution for each lesson is provided on its page. The first lesson was heavily based on material from the Reproducible Science Workshop and the Reproducible Science Curriculum. The four R lessons are based on Data Carpentry, primarily the Ecology and Genomics lessons. Inspiration and some components were taken from the Modern Dive textbook.

The lesson on exploratory data analysis is based on a microbiological dataset that compares drug resistance measured in disk diffusion and broth microdilution assays.

All materials are made available under the Creative Commons Attribution license.