Below are common things you’ll want to know for lab (FAQ).
You’ll want to run the lines of code below, which install packages that let R Markdown render to PDF (tinytex), help with data cleaning (dplyr), and make graphs (ggplot2).
# We'll install these packages and then run the next line as well. To do this, un-hashtag (i.e., delete the # symbol from) the two lines below and run them.
#install.packages(c("tinytex", "dplyr", "ggplot2"))
#tinytex::install_tinytex()
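If you aren't sure whether you already installed these, here is a minimal sketch for checking before you reinstall. The name `missing_pkgs` is just something made up for this example, and the install line is left hashtagged so nothing is downloaded until you choose to run it.

```r
# Packages the lab needs
pkgs <- c("tinytex", "dplyr", "ggplot2")

# installed.packages() returns a matrix with one row per installed package,
# and the row names are the package names
missing_pkgs <- setdiff(pkgs, rownames(installed.packages()))
missing_pkgs  # an empty character vector means everything is already installed

# Un-hashtag the line below to install only what is missing
#if (length(missing_pkgs) > 0) install.packages(missing_pkgs)
```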
This is due to three lines in the original lab markdown that create images. Part F of Question 3 discusses it and gives the line numbers of the code causing the problem. You can try to hashtag out the lines (i.e., put a # at the start of each one to tell R to ignore everything to the right of the #). This didn’t work for one student, so we ended up just deleting those three lines.
Subsetting in the form x[k] returns the k-th element of x, so…
x <- c(1,2,3,4,5,6,7) * 2
x
## [1] 2 4 6 8 10 12 14
#say I want the fourth element...
x[4] #is correct
## [1] 8
x[2] #is incorrect (this is the second element)
## [1] 4
#if I want the second and third elements I can...
x[2:3] #is correct
## [1] 4 6
x[c(2,3)] #also correct but less compact
## [1] 4 6
#x[2,3] is INcorrect and throws an error, since a vector has only one dimension
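To see why the comma form fails on a vector but is fine elsewhere, here is a small sketch. It assumes the same `x` as above. The matrix `m` and the variable `msg` are made up for this illustration and are not part of the lab.

```r
x <- c(1, 2, 3, 4, 5, 6, 7) * 2

# x[2, 3] asks for "row 2, column 3", but a plain vector has no rows or columns.
# tryCatch() captures the error message instead of stopping the script.
msg <- tryCatch(x[2, 3], error = function(e) conditionMessage(e))
msg

# The two-index form does work on something with two dimensions, like a matrix.
m <- matrix(x[1:6], nrow = 2)  # 2 rows, 3 columns, filled column by column
m[2, 3]                        # row 2, column 3
## [1] 12
```

So the comma means "row, column" to R, which is why you need `c(2,3)` or `2:3` to pick several elements from a vector.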
This is a question I wanted you to have to reach for. Let’s make a reproducible toy example of what could cause this problem.
To do this, we will make our own data frame so the problem is easier to see.
x1 <- 1:5
x2 <- c(0,0,NA,4,5)
my_data <- data.frame(x1, x2)
my_data
## x1 x2
## 1 1 0
## 2 2 0
## 3 3 NA
## 4 4 4
## 5 5 5
median(my_data$x2)
## [1] NA
#Uh oh...let's look at x2 first
my_data$x2
## [1] 0 0 NA 4 5
#so there is an NA in the vector which is problematic since NA isn't a number.
#let's GO TO THE HELP PAGE
?median
#So looking at the function shows a second (default/optional) parameter, na.rm, that is set to FALSE. Looking at the Arguments section, it controls whether NA values are removed before the calculation.
#Looking further down into the Value (i.e., output) section, we can see in the second paragraph that if there are NA values and na.rm = FALSE, then NA is returned. Sooo... we need to include na.rm = TRUE, which will clear up the issue.
median(my_data$x2, na.rm = TRUE)
## [1] 2
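The same fix applies to other base R summary functions that take an `na.rm` argument, such as `mean()` and `sum()`. Here is a quick sketch using the same values as `my_data$x2`. It also shows one way to count the NAs first so you know what you are dropping.

```r
x2 <- c(0, 0, NA, 4, 5)

# is.na() flags each NA as TRUE, and sum() counts the TRUEs
sum(is.na(x2))
## [1] 1

# Without na.rm = TRUE, a single NA makes the whole result NA
mean(x2)
## [1] NA

# With na.rm = TRUE, the NA is dropped before the calculation
mean(x2, na.rm = TRUE)
## [1] 2.25
sum(x2, na.rm = TRUE)
## [1] 9
```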