Load the data using a command similar to:
dat <- read.csv("http://home.cc.umanitoba.ca/~godwinrt/3040/lab1/vidsales3040.csv")
You could have named the data set almost anything other than dat.
Blank.
To use only the observations for which Global_Sales is greater than or equal to 1 million dollars, you may use the command:
dat <- dat[dat$Global_Sales >= 1, ]
The command above overwrites the original data set; you could instead have created a new data set under a different name. Whichever you choose, make sure that you use only the data for which sales are greater than or equal to 1 million for the remainder of the assignment.
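To see what the subsetting command does, here is a minimal sketch on a small hypothetical data frame (the toy values are invented for illustration and are not from vidsales3040.csv). Logical indexing keeps every row where the condition is TRUE. Base R's subset() gives the same rows here, although the two differ when the condition is NA: dat[cond, ] returns an all-NA row for it, while subset() drops it.

```r
# Hypothetical toy data standing in for the real data set
toy <- data.frame(
  Name = c("A", "B", "C", "D"),
  Global_Sales = c(0.5, 1.0, 2.3, 0.9)
)

# Logical index: TRUE for rows with sales of at least 1 (million)
keep <- toy$Global_Sales >= 1
print(keep)

# Keep only those rows (all columns); "B" (exactly 1.0) is kept because of >=
big <- toy[keep, ]
print(big)

# Same rows using base R's subset()
big2 <- subset(toy, Global_Sales >= 1)
print(big2)
```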
Blank.
To calculate the sample mean and variance of Global_Sales and Critic_Score, you can use:
mean(dat$Global_Sales)
## [1] 2.667646
var(dat$Global_Sales)
## [1] 5.121667
mean(dat$Critic_Score)
## [1] 8.012985
var(dat$Critic_Score)
## [1] 1.209831
Report the mean and variance of each variable in your assignment; do not just copy and paste R code and output.
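R's var() computes the sample variance, dividing by n - 1 rather than n. Below is a minimal check on a hypothetical toy vector (not the assignment data) that the textbook formulas match the built-in functions. If a variable has missing values, both mean() and var() accept na.rm = TRUE.

```r
# Hypothetical toy sales figures (millions), not from the real data
x <- c(1.2, 2.5, 3.1, 1.8, 6.4)

n <- length(x)
xbar <- sum(x) / n                 # sample mean, same as mean(x)
s2 <- sum((x - xbar)^2) / (n - 1)  # sample variance with an n - 1 denominator

print(c(manual_mean = xbar, mean = mean(x)))
print(c(manual_var = s2, var = var(x)))
```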
To draw the histogram you may use something similar to:
hist(dat$Critic_Score, main="Histogram of video game critic scores")
You need to include the generated histogram in your assignment, and you should put a reasonable amount of effort into making it look nice (for example, a clear title and axis labels).
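A sketch of a slightly more polished histogram, using hypothetical toy scores in place of dat$Critic_Score. The bin edges, colours and labels are illustrative choices only. If you set breaks yourself on the real data, they must span range(dat$Critic_Score); otherwise hist() stops with an error.

```r
# Hypothetical toy critic scores on a 0-10 scale (not the real data)
scores <- c(6.5, 7.2, 7.8, 8.0, 8.3, 8.5, 8.9, 9.1, 9.4, 7.5)

# hist() draws the plot and also returns the bin information invisibly
h <- hist(scores,
          breaks = seq(5, 10, by = 0.5),  # explicit bin edges covering the data
          main = "Histogram of video game critic scores",
          xlab = "Critic score (out of 10)",
          ylab = "Number of games",
          col = "lightblue",
          border = "white")

print(h$breaks)
print(h$counts)
```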
To determine the sample mean of Global_Sales for video games with a Critic_Score greater than 9, use:
mean(dat$Global_Sales[dat$Critic_Score > 9])
## [1] 4.11016
and to determine the sample mean of Global_Sales for video games with a Critic_Score between 8 and 9 (greater than 8 and at most 9), use:
mean(dat$Global_Sales[dat$Critic_Score <= 9 & dat$Critic_Score > 8])
## [1] 2.796065
For this assignment, it doesn’t matter whether you use “greater than” or “greater than or equal to” at the boundary scores.
Report these numbers in your assignment, making sure that you describe what each number is. For example, you could write “The average global sales for a video game with a critic score greater than 9 is 4.11 million.”
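When comparing several score ranges, cut() and tapply() can compute all the group means in one call. Below is a minimal sketch on hypothetical toy data; the right-closed bins reproduce the conditions used in the commands above.

```r
# Hypothetical toy data (not the real data set)
toy <- data.frame(
  Global_Sales = c(1.5, 4.0, 2.2, 6.1, 1.1, 3.0),
  Critic_Score = c(7.5, 9.3, 8.4, 9.6, 8.0, 8.9)
)

# Bins (-Inf, 8], (8, 9], (9, Inf] are right-closed, so the second bin matches
# Critic_Score > 8 & Critic_Score <= 9 and the third matches Critic_Score > 9
score_bin <- cut(toy$Critic_Score, breaks = c(-Inf, 8, 9, Inf))

# Mean sales within each bin
bin_means <- tapply(toy$Global_Sales, score_bin, mean)
print(bin_means)

# Same value as the single-condition approach for scores above 9
print(mean(toy$Global_Sales[toy$Critic_Score > 9]))
```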
The total Global_Sales for video games with an ESRB_Rating of M is:
sum(dat$Global_Sales[dat$ESRB_Rating == "M"])
## [1] 819.54
and for those without the M rating:
sum(dat$Global_Sales[dat$ESRB_Rating != "M"])
## [1] 1378.6
These numbers need to be reported in the assignment, with some indication of what they are.
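As a sanity check, the two totals should add up to the overall total sales. Below is a minimal sketch on hypothetical toy data. Note that if ESRB_Rating contained missing values, the comparisons above would be NA for those rows and the sum() commands would return NA.

```r
# Hypothetical toy data (not the real data set)
toy <- data.frame(
  Global_Sales = c(2.0, 5.5, 1.2, 3.3, 1.0),
  ESRB_Rating = c("M", "E", "M", "T", "E")
)

is_m <- toy$ESRB_Rating == "M"

# Total sales for non-M-rated ("FALSE") and M-rated ("TRUE") games in one call
totals <- tapply(toy$Global_Sales, is_m, sum)
print(totals)

# The two group totals should match the overall total
print(all.equal(sum(totals), sum(toy$Global_Sales)))
```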
As above, the two sample means and variances may be found with:
mean(dat$Global_Sales[dat$ESRB_Rating == "M"])
## [1] 3.5325
var(dat$Global_Sales[dat$ESRB_Rating == "M"])
## [1] 11.47829
mean(dat$Global_Sales[dat$ESRB_Rating != "M"])
## [1] 2.328716
var(dat$Global_Sales[dat$ESRB_Rating != "M"])
## [1] 2.237079
Report your answers.
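The four numbers can also be collected into one small table. Below is a sketch on hypothetical toy data that uses split() to form the two groups and sapply() to apply mean() and var() to each:

```r
# Hypothetical toy data (not the real data set)
toy <- data.frame(
  Global_Sales = c(2.0, 5.5, 1.2, 3.3, 1.0, 4.4),
  ESRB_Rating = c("M", "E", "M", "T", "E", "M")
)

# Split sales into two groups: "TRUE" = M-rated, "FALSE" = everything else
groups <- split(toy$Global_Sales, toy$ESRB_Rating == "M")

# One column per group, with rows for the sample mean and sample variance
stats <- sapply(groups, function(v) c(mean = mean(v), var = var(v)))
print(stats)
```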
First, create a new “dummy” variable that indicates whether or not a game has an M rating:
dat$mat[dat$ESRB_Rating == "M"] <- 2
dat$mat[dat$ESRB_Rating != "M"] <- 1
where you could have chosen a name other than mat.
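The dummy variable can also be created in a single step with ifelse(). Below is a minimal sketch on hypothetical toy ratings; like the two-step version, it gives NA for any game whose rating is missing.

```r
# Hypothetical toy ratings (not the real data set)
toy <- data.frame(ESRB_Rating = c("M", "E", "T", "M", "E10+"))

# One-step dummy: 2 if the game is M-rated, otherwise 1
toy$mat <- ifelse(toy$ESRB_Rating == "M", 2, 1)

print(toy)
```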
Next, we plot the data, using the dummy variable mat to set the colour of each data point (in R's default palette, colour 1 is black and colour 2 is red), and add a legend indicating what the colours mean:
plot(dat$Critic_Score, dat$Global_Sales, main="Sales and scores by 'M' rating", xlab="scores", ylab="sales", col=dat$mat)
legend("topleft", legend = c("Mature rating", "Other"), col=c(2,1), pch=1)
The generated plot needs to be included in your assignment.
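An alternative sketch that uses named colours instead of palette numbers, so the points and the legend always agree regardless of the active palette. It runs on hypothetical toy data; the filled point symbol (pch = 16) and the axis labels are stylistic choices, not requirements.

```r
# Hypothetical toy data (not the real data set)
toy <- data.frame(
  Critic_Score = c(7.5, 9.3, 8.4, 9.6, 8.0, 8.9),
  Global_Sales = c(1.5, 4.0, 2.2, 6.1, 1.1, 3.0),
  ESRB_Rating  = c("M", "E", "M", "T", "E", "M")
)

# Map each game to a named colour: red for M-rated, black otherwise
pt_col <- ifelse(toy$ESRB_Rating == "M", "red", "black")

plot(toy$Critic_Score, toy$Global_Sales,
     main = "Sales and scores by 'M' rating",
     xlab = "Critic score", ylab = "Global sales (millions)",
     col = pt_col, pch = 16)

# The legend uses the same colours and point symbol as the plot
legend("topleft", legend = c("Mature rating", "Other"),
       col = c("red", "black"), pch = 16)
```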