This website and its materials were created by Aleeza Gerstein for a two-day workshop for microbiology graduate students. The overarching goal was to provide an introduction to the R programming language (including `tidyverse` functions) and some instruction in data visualization and statistical analysis.

Attribution for each lesson is provided on its page. The first five lessons were heavily based on material from [Data Carpentry](https://datacarpentry.org/), primarily the Ecology and Genomics lessons. Inspiration and some components were taken from the ModernDive textbook.

The lesson on exploratory data analysis is based on a microbiological dataset that compares drug resistance measured in disk diffusion and broth microdilution assays.

Markdown files are available on the course GitHub repo in the `scripts` directory. This workshop/course is actively being revised (particularly the introduction to statistics section), and suggestions or feedback are welcome; please submit a GitHub issue or email Aleeza.

All materials are made available under the Creative Commons Attribution license.

(~ 30 minutes)

* Implement best practices in data table formatting

* Identify and address common formatting mistakes

Download the data for the spreadsheet exercise here.

(~ 60 minutes)

* Describe the purpose of the RStudio Script, Console, Environment, and Plots panes.

* Organize files and directories for a set of analyses as an R Project, and understand the purpose of the working directory.

* Use the built-in RStudio help interface to search for more information on R functions.

* Demonstrate how to provide sufficient information for troubleshooting with the R user community.

Download the Day 1 Code Handout here

(~ 75 minutes)

* Define the following terms as they relate to R: object, assign, call, function, arguments, options.

* Assign values to objects in R.

* Learn how to name objects.

* Use comments to annotate a script.

* Solve simple arithmetic operations in R.

* Call functions and use arguments to change their default options.

* Inspect the content of vectors and manipulate their content.

* Subset and extract values from vectors.

* Analyze vectors with missing data.

(~ 60 minutes)

* Describe what the `here` package does.

* Read in tibbles.

* Compare `read_csv` to `read.csv`.

* Extract values from tibbles.

* Perform basic operations on columns in a tibble.

Download the Day 2 Code Handout here

Download the Day 2 Code Handout with full code here

(~ 90 minutes)

* Describe what the dplyr package in R is used for.

* Apply common dplyr functions to manipulate data in R.

* Employ the ‘pipe’ operator to link together a sequence of functions.

* Employ the ‘mutate’ function to apply other chosen functions to existing columns and create new columns of data.

* Employ the ‘split-apply-combine’ concept to split the data into groups, apply analysis to each group, and combine the results.

* Convert tibbles from long to wide and back again

(~ 40 minutes)

* Describe the difference between populations and samples

* Distinguish between data that follows a normal distribution and data that deviates in modality, skew, or kurtosis

* Define the difference between the null and alternative hypotheses

* Define type I and type II errors

* Describe the relationship between alpha, beta and power

(~ 180 minutes)

* Produce histograms, bar plots, boxplots, violin plots, and scatterplots using ggplot.

* Describe what faceting is and apply faceting in ggplot.

* Modify the aesthetics of an existing ggplot plot (including axis labels and color).

* Build complex and customized plots from data in a data frame.

* Be wowed by the power of R and compelled to keep using it after this workshop.