The University of Manitoba, MBIO 7040

This website and workshop were created by Aleeza Gerstein as a two- or three-day workshop for microbiology graduate students. The overarching goal was to introduce the R programming language (including tidyverse functions) and to provide some instruction in data visualization and statistical analysis.

Pre-Workshop Setup

The workshop will be fully hands-on. You will need your own computer.

Download R and RStudio before the workshop from here

R and RStudio require separate installations. R is the underlying statistical computing environment. RStudio is a graphical integrated development environment (IDE). You can think of R as the engine, and RStudio as the car. You need to install R before you install RStudio. R and RStudio should both be installed prior to attending the workshop. I have not budgeted any workshop time for assisting with installation issues and we will begin promptly at 9:30am on the first day.

Workshop Lessons:

1. Reproducible Research

  • Learn what reproducible research is.
  • Gain implementable strategies for file structure and file naming.

2. Why R (and RStudio)?

Download Code 1 Handout here

  • Describe the purpose of the RStudio Script, Console, Environment, and Plots panes.
  • Organize files and directories for a set of analyses as an R Project, and understand the purpose of the working directory.
  • Use the built-in RStudio help interface to search for more information on R functions.
  • Demonstrate how to provide sufficient information for troubleshooting with the R user community.

3. Introduction to R

  • Define the following terms as they relate to R: object, assign, call, function, arguments, options.
  • Assign values to objects in R.
  • Learn how to name objects.
  • Use comments to annotate a script.
  • Perform simple arithmetic operations in R.
  • Call functions and use arguments to change their default options.
  • Inspect the content of vectors and manipulate their content.
  • Subset and extract values from vectors.
  • Analyze vectors with missing data.
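
As a rough preview of these objectives, the following minimal sketch uses only base R. The object name `colony_counts` and its values are hypothetical and were made up for illustration; they do not come from the workshop materials.

```r
# Assign a vector of values to an object with the assignment operator `<-`.
# These numbers are hypothetical; NA marks a missing observation.
colony_counts <- c(12, 30, NA, 25, 18)

# Arithmetic is vectorized: every element is doubled (NA stays NA).
doubled <- colony_counts * 2

# Subset by position, and by a logical condition that skips missing values.
first_two <- colony_counts[1:2]
large <- colony_counts[!is.na(colony_counts) & colony_counts > 20]

# mean() returns NA if any value is missing, unless the na.rm argument
# is changed from its default (FALSE) to TRUE.
mean_with_na <- mean(colony_counts)
mean_no_na <- mean(colony_counts, na.rm = TRUE)

print(large)
print(mean_no_na)
```

Here `mean()` is the function being called and `na.rm = TRUE` is an argument that overrides a default option, which is the distinction the lesson's vocabulary list refers to.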

4. Starting with Data

  • Describe what the here package does.
  • Read in tibbles.
  • Compare read_csv to read.csv.
  • Extract values from tibbles.
  • Perform basic operations on columns in a tibble.

5. Manipulating Data with dplyr

  • Describe what the dplyr package in R is used for.
  • Apply common dplyr functions to manipulate data in R.
  • Employ the ‘pipe’ operator to link together a sequence of functions.
  • Employ the ‘mutate’ function to apply other chosen functions to existing columns and create new columns of data.
  • Employ the ‘split-apply-combine’ concept to split the data into groups, apply analysis to each group, and combine the results.
  • Convert tibbles from long to wide and back again.
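
The objectives above can be sketched in a few lines, assuming the dplyr and tidyr packages are installed. The `assays` tibble, its column names, and its values are hypothetical, and the reshaping step uses `pivot_wider()` and `pivot_longer()` from tidyr 1.0.0 or later, which may differ from the functions taught in the lesson itself.

```r
library(dplyr)
library(tidyr)

# Hypothetical long-format data: one row per strain-by-drug measurement.
assays <- tibble(
  strain  = c("A", "A", "B", "B"),
  drug    = c("drug1", "drug2", "drug1", "drug2"),
  zone_mm = c(20, 16, 10, 24)
)

# The pipe passes each result on to the next function.
# mutate() creates a new column; group_by() + summarise() is
# split-apply-combine: split by strain, take the mean, combine the results.
strain_means <- assays %>%
  mutate(zone_cm = zone_mm / 10) %>%
  group_by(strain) %>%
  summarise(mean_zone_cm = mean(zone_cm))

# Long to wide: one row per strain, one column per drug.
wide <- assays %>%
  pivot_wider(names_from = drug, values_from = zone_mm)

# Wide back to long.
long <- wide %>%
  pivot_longer(cols = c(drug1, drug2), names_to = "drug", values_to = "zone_mm")

print(strain_means)
print(wide)
```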

6. Exploratory Data Analysis

Download the Code 2 Handout here

  • Produce histograms, bar plots, boxplots, violin plots, and scatterplots using ggplot.
  • Describe what faceting is and apply faceting in ggplot.
  • Modify the aesthetics of an existing ggplot plot (including axis labels and color).
  • Build complex and customized plots from data in a data frame.

7. Introduction to Statistics

  • Describe the difference between populations and samples.
  • Distinguish between data that follow a normal distribution and data that deviate in modality, skew, or kurtosis.
  • Distinguish between the null and alternative hypotheses.
  • Define type I and type II errors.
  • Describe the relationship between alpha, beta, and power.
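
The relationship between alpha, beta, and power can be illustrated with `power.t.test()` from the stats package that ships with R. This is only a sketch: the sample size, effect size, and standard deviation below are arbitrary values chosen for illustration, not numbers from the workshop.

```r
# Power of a two-sample t-test with 20 observations per group,
# a true difference in means of 1, and a standard deviation of 1.
# sig.level is alpha, the accepted type I error rate.
result <- power.t.test(n = 20, delta = 1, sd = 1, sig.level = 0.05)

power <- result$power  # probability of detecting the true difference
beta <- 1 - power      # type II error rate

# Lowering alpha (a stricter test) with everything else fixed
# reduces power, and so increases beta.
stricter <- power.t.test(n = 20, delta = 1, sd = 1, sig.level = 0.01)

print(c(alpha = 0.05, power = power, beta = beta))
print(c(alpha = 0.01, power = stricter$power, beta = 1 - stricter$power))
```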

Download the full code handout here

Post-Workshop Survey (please do this before you leave the room)

So you want to learn more?

Attribution for each lesson is provided on each page. The first lesson was heavily based on material from the Reproducible Science Workshop and Reproducible Science Curriculum. The four R lessons are based on Data Carpentry, primarily the Ecology and Genomics lessons. Inspiration and some components were taken from the Modern Dive textbook.

The lesson on exploratory data analysis is based on a microbiological dataset that compares drug resistance measured in disk diffusion and broth microdilution assays.

All materials are made available under the Creative Commons Attribution license.