###03-R_Start

### Creating objects in R

# What are the values stored in each object in the following (do this in your head)?
# mass <- 50              # mass?
# age  <- 122             # age?
# mass <- mass * 2.0      # mass?
# age  <- age - 22        # age?
# mass_index <- mass/age  # mass_index?

### Vectors and data types
# We’ve seen that atomic vectors can be of type character, numeric, integer, and
# logical. But what happens if we try to mix these types in a single
# vector?

# What will happen in each of these examples? (hint: use `class()` to
# check the data type of your object)
num_char <- c(1, 2, 3, "a")
num_logical <- c(1, 2, 3, TRUE)
char_logical <- c("a", "b", "c", TRUE)
tricky <- c(1, 2, 3, "4")

# Why do you think it happens?
#How many values in `combined_logical` are "TRUE" (as a character) in the following example:
num_logical <- c(1, 2, 3, TRUE)
char_logical <- c("a", "b", "c", TRUE)
combined_logical <- c(num_logical, char_logical)

### Conditional Subsetting
#Can you figure out why “four” > “five” returns TRUE?

### Missing Data
#1. Using this vector of heights in inches, create a new vector, `heights_no_na`, with the NAs removed.
heights <- c(63, 69, 60, 65, NA, 68, 61, 70, 61, 59, 64, 69, 63, 63, NA, 72, 65, 64, 70, 63, 65)

#2. Use the function median() to calculate the median of the heights vector.

#3. Use R to figure out how many people in the set are taller than 67 inches.


###04-Starting with Data

### Download E coli data
library(here)
library(tidyverse)
download.file("https://raw.githubusercontent.com/datacarpentry/R-genomics/gh-pages/data/Ecoli_metadata.csv",
              here("data_in", "Ecoli_citrate.csv"))
here("data_in", "Ecoli_citrate")
Ecoli_citrate <- read_csv(here("data_in", "Ecoli_citrate.csv"))

###05-Manipulating data with dplyr

### Pipes
#Using pipes, subset `Ecoli_citrate` to include rows where the clade is ‘Cit+’ and keep only the columns `sample`, `cit`, and `genome_size`
#How many rows are in that tibble?

#Using pipes, subset `Ecoli_citrate` to include rows where the genome size is greater than 4.6 and generation is less than 20 000. Then keep on the columns named sample, run, and clade.

### Mutate
#Create a tibble containing each unique clade (removing the samples with unknown clades) and the rank of it's mean genome size. (note that #ranking genome size will not sort the table; the row order will be unchanged. You can use the `arrange()` function to sort the table).
# There are several functions for ranking observations, which handle tied values differently. For this exercise it doesn’t matter which #function you choose. Use the help options to find a ranking function.

# Hint: think about how many commands are required and what their order should be to produce this tibble!

### BONUS
# Challenge:
#  There are a few mistakes in this hand-crafted `data.frame`,
#  can you spot and fix them? Don't hesitate to experiment!
animal_data <- data.frame(
       animal = c(dog, cat, sea cucumber, sea urchin),
       feel = c("furry", "squishy", "spiny"),
       weight = c(45, 8 1.1, 0.8)
       )

# Challenge:
#   Can you predict the class for each of the columns in the following example?
#   Check your guesses using `str(country_climate)`:
#   * Are they what you expected? Why? why not?
#   * What would have been different if we had added `stringsAsFactors = FALSE`
#     when we created this data frame?
#   * What would you need to change to ensure that each column had the
#     accurate data type?
country_climate <- tibble(country = c("Canada", "Panama", "South Africa", "Australia"),
                                climate = c("cold", "hot", "temperate", "hot/temperate"),
                                temperature = c(10, 30, 18, "15"),
                                northern_hemisphere = c(TRUE, TRUE, FALSE, "FALSE"),
                                has_kangaroo = c(FALSE, FALSE, FALSE, 1))