Question 1

R code

Load the data using a command similar to:

dat <- read.csv("http://home.cc.umanitoba.ca/~godwinrt/3040/lab1/vidsales3040.csv")

You could have named the data set almost anything other than dat.
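
After loading, it is worth checking that the import worked. The following is a minimal sketch that uses a small made-up CSV, passed through the text argument of read.csv, in place of the course URL. The three rows, the Name column, and the object name vid are hypothetical.

```r
# Hypothetical three-row stand-in for the course file, so the example runs offline.
csv_text <- "Name,Global_Sales,Critic_Score,ESRB_Rating
Game A,2.5,8.1,M
Game B,0.4,7.0,E
Game C,1.2,9.3,T"

# Any valid object name works; here the data set is called vid instead of dat.
vid <- read.csv(text = csv_text)

str(vid)    # variable names and types
head(vid)   # first few rows
nrow(vid)   # number of observations
```

With the real file, the same three checks can be run on dat.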

Answer

Blank.

Question 2

R code

In order to use only observations for which Global_Sales is greater than or equal to 1 million dollars, you may use the command:

dat <- dat[dat$Global_Sales >= 1, ]

The above command overwrites the original data set. You could instead have created a new data set under a different name. Whichever you choose, make sure that you use only the data for which sales are at least 1 million for the remainder of the assignment.
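
If you prefer to keep the original data, a sketch along the following lines stores the filtered rows under a new name and then checks the result. The toy values and the name dat_big are hypothetical.

```r
# Hypothetical toy data; the real data set has many more rows.
dat <- data.frame(Global_Sales = c(2.5, 0.4, 1.0, 0.9),
                  Critic_Score = c(8.1, 7.0, 9.3, 6.5))

# Keep the original intact and store the filtered rows under a new name.
dat_big <- dat[dat$Global_Sales >= 1, ]

# Check that the filter did what was intended.
min(dat_big$Global_Sales)   # should be at least 1
nrow(dat_big)               # number of rows kept
```

If you take this route, every later command must refer to dat_big rather than dat.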

Answer

Blank.

Question 3

R code

To calculate the sample mean and variance of Global_Sales and Critic_Score, you can use:

mean(dat$Global_Sales)
## [1] 2.667646
var(dat$Global_Sales)
## [1] 5.121667
mean(dat$Critic_Score)
## [1] 8.012985
var(dat$Critic_Score)
## [1] 1.209831
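
To see what mean and var compute, the sketch below reproduces both by hand on a short made-up vector. Note that var uses n - 1 in the denominator, i.e. it returns the sample variance. The values in x are hypothetical.

```r
# Hypothetical scores, for illustration only.
x <- c(8.1, 7.0, 9.3, 6.5, 8.8)
n <- length(x)

xbar <- sum(x) / n                    # sample mean
s2   <- sum((x - xbar)^2) / (n - 1)   # sample variance, n - 1 in the denominator

all.equal(xbar, mean(x))   # compares the manual mean with mean()
all.equal(s2, var(x))      # compares the manual variance with var()
```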

Answer

You should report the mean and variance of the variables in your assignment, and not just copy and paste R code.

Question 4

R code

To draw the histogram you may use something similar to:

hist(dat$Critic_Score, main="Histogram of video game critic scores")
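
Axis labels, bin widths, and a fill colour are simple ways to tidy the histogram. The sketch below shows one possibility on made-up scores; the data, the bin width of 0.5, and the colour are assumptions, not requirements.

```r
# Hypothetical critic scores, for illustration only.
scores <- c(6.5, 7.0, 7.4, 7.9, 8.1, 8.3, 8.6, 8.8, 9.1, 9.3)

h <- hist(scores,
          breaks = seq(6, 10, by = 0.5),   # bins of width 0.5
          main = "Histogram of video game critic scores",
          xlab = "Critic score",
          ylab = "Number of games",
          col = "grey80")

h$counts   # number of games in each bin
```

With the real data, replace scores with dat$Critic_Score and choose breaks that cover the observed range.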

Answer

You need to include the generated histogram in your assignment. You should make a reasonable effort to make the histogram look nice.

Question 5

R code

To determine the sample mean Global_Sales for video games with a Critic_Score greater than 9 use:

mean(dat$Global_Sales[dat$Critic_Score > 9])
## [1] 4.11016

and to determine the sample mean Global_Sales for video games with a Critic_Score between 8 and 9, use:

mean(dat$Global_Sales[dat$Critic_Score <= 9 & dat$Critic_Score > 8])
## [1] 2.796065

For this question, it doesn’t matter whether you use “greater than” or “greater than or equal to” at the boundaries.
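
The two conventions give different numbers only if some games have a score exactly equal to 8 or 9. A quick way to check, sketched here on hypothetical toy data, is to count the boundary observations and compare the two means:

```r
# Hypothetical toy data.
dat <- data.frame(Global_Sales = c(2.5, 1.4, 1.0, 3.2, 4.0),
                  Critic_Score = c(8.0, 8.5, 9.0, 9.4, 8.7))

# How many observations sit exactly on a boundary?
sum(dat$Critic_Score == 8)
sum(dat$Critic_Score == 9)

# Mean sales under the two conventions for the 8-to-9 group.
m_open   <- mean(dat$Global_Sales[dat$Critic_Score > 8  & dat$Critic_Score <= 9])
m_closed <- mean(dat$Global_Sales[dat$Critic_Score >= 8 & dat$Critic_Score <= 9])
c(m_open, m_closed)
```

Whichever convention you choose, state it when you report the result.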

Answer

You should report these numbers in your assignment, making sure that you describe what each number is. For example, you could write “The average global sales for a video game with a critic score greater than 9 is 4.11 million.”

Question 6

R code

The total Global_Sales for video games with an ESRB_Rating of M is:

sum(dat$Global_Sales[dat$ESRB_Rating == "M"])
## [1] 819.54

and for those without the M rating:

sum(dat$Global_Sales[dat$ESRB_Rating != "M"])
## [1] 1378.6
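
Both totals can also be obtained in one step with tapply. The sketch below uses hypothetical toy data and a hypothetical grouping vector grp, and checks that the two group totals add up to the overall total.

```r
# Hypothetical toy data.
dat <- data.frame(Global_Sales = c(2.5, 1.4, 1.0, 3.2),
                  ESRB_Rating  = c("M", "E", "T", "M"))

# Group label: "M" versus everything else.
grp <- ifelse(dat$ESRB_Rating == "M", "M", "Other")

totals <- tapply(dat$Global_Sales, grp, sum)
totals

# The two group totals should add up to the overall total.
all.equal(sum(totals), sum(dat$Global_Sales))
```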

Answer

These numbers need to be reported in the assignment, with some indication of what they are.

Question 7

R code

As in Question 6, the two sample means and variances may be found by:

mean(dat$Global_Sales[dat$ESRB_Rating == "M"])
## [1] 3.5325
var(dat$Global_Sales[dat$ESRB_Rating == "M"])
## [1] 11.47829
mean(dat$Global_Sales[dat$ESRB_Rating != "M"])
## [1] 2.328716
var(dat$Global_Sales[dat$ESRB_Rating != "M"])
## [1] 2.237079
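
For reporting, it can help to collect the four numbers in a small table. The following is a sketch on hypothetical toy data; the names is_m and summary_tab are made up for the example.

```r
# Hypothetical toy data.
dat <- data.frame(Global_Sales = c(2.5, 1.4, 1.0, 3.2, 1.8),
                  ESRB_Rating  = c("M", "E", "T", "M", "E"))

# TRUE for games rated M, FALSE otherwise.
is_m <- dat$ESRB_Rating == "M"

# One row per group: sample size, sample mean, sample variance.
summary_tab <- data.frame(
  group    = c("M", "Other"),
  n        = c(sum(is_m), sum(!is_m)),
  mean     = c(mean(dat$Global_Sales[is_m]), mean(dat$Global_Sales[!is_m])),
  variance = c(var(dat$Global_Sales[is_m]),  var(dat$Global_Sales[!is_m]))
)
summary_tab
```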

Answer

Report your answers.

Question 8

R code

First, create a new “dummy” variable that indicates whether a game has an M rating or not.

dat$mat[dat$ESRB_Rating == "M"] <- 2
dat$mat[dat$ESRB_Rating != "M"] <- 1

You could have chosen a name other than mat.
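
An equivalent way to build the same variable in a single line is ifelse. This is only an alternative sketch, shown on hypothetical toy data, and the table call is a quick check that each rating received the intended code.

```r
# Hypothetical toy data.
dat <- data.frame(ESRB_Rating = c("M", "E", "T", "M"))

# 2 for games rated M, 1 for all other ratings.
dat$mat <- ifelse(dat$ESRB_Rating == "M", 2, 1)

# Cross-tabulate rating against the new variable to verify the coding.
table(dat$ESRB_Rating, dat$mat)
```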

Next, we plot the data, using the dummy variable mat to determine the colour of each data point (in R's default palette, colour 1 is black and colour 2 is a shade of red), and add a “legend” to the plot to indicate what the different colours mean:

plot(dat$Critic_Score, dat$Global_Sales, main="Sales and scores by 'M' rating", xlab="scores", ylab="sales", col=dat$mat)
legend("topleft", legend = c("Mature rating", "Other"), col=c(2,1), pch=1)

Answer

The plot generated above needs to be included in your assignment.