Yuchen Wu
2020/7/8
ggplot2
ggplot2 syntax
library(ggplot2)
ggplot()

ggplot2 syntax
ggplot() +
  geom_violin(data = mtcars,
              mapping = aes(x = factor(cyl), y = hp))

ggplot2 syntax
ggplot() +
  geom_violin(data = mtcars,
              mapping = aes(x = factor(cyl), y = hp)) +
  geom_jitter(data = mtcars,
              mapping = aes(x = factor(cyl), y = hp))

ggplot2 syntax
ggplot(data = mtcars,
       mapping = aes(x = factor(cyl), y = hp)) +
  geom_violin() +
  geom_jitter()

ggplot2 syntax
ggplot(data = mtcars,
       mapping = aes(x = factor(cyl), y = hp)) +
  geom_violin() +
  geom_jitter() +
  labs(title = "Horsepower vs. Cylinder", x = "Cylinder",
       y = "Horsepower")

ggplot2 syntax
ggplot(data = mtcars,
       mapping = aes(x = factor(cyl), y = hp)) +
  geom_violin() +
  geom_jitter() +
  labs(title = "Horsepower vs. Cylinder", x = "Cylinder",
       y = "Horsepower") +
  theme_classic()

dplyr (and %>% syntax)
We rarely get data in exactly the form we need!
Transforming data in R is made easy by the dplyr package (“official” cheat sheet available here).
dplyr verbs
select(): pick variables by their names
mutate(): create new variables based on existing ones
arrange(): reorder rows
filter(): pick observations by their values
summarize(): collapse many values down to a single summary

library(dplyr)
scores
##     Name Gender English Math Science History Spanish
## 1 Andrew      M      60   96      80      56      77
## 2   John      M      66   55      56      64      77
## 3   Mary      F      92   63      70      62      98
## 4   Jane      F      80   76      89      55      40
## 5    Bob      M      80   80      82      48      50
## 6    Dan      M      58   52      79      90      61
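The slides print scores but do not show how it was created. To run the examples that follow, one option is to rebuild the table by hand from the printed output. This is only a sketch: the use of data.frame() and the numeric column types are our assumptions, not necessarily how the original data was loaded.

```r
# Rebuild the `scores` table from the printed output above.
# Assumption: a plain data.frame; the original may have been read from a file.
# stringsAsFactors = FALSE keeps Name and Gender as character columns on
# R versions before 4.0.0, where the default was TRUE.
scores <- data.frame(
  Name    = c("Andrew", "John", "Mary", "Jane", "Bob", "Dan"),
  Gender  = c("M", "M", "F", "F", "M", "M"),
  English = c(60, 66, 92, 80, 80, 58),
  Math    = c(96, 55, 63, 76, 80, 52),
  Science = c(80, 56, 70, 89, 82, 79),
  History = c(56, 64, 62, 55, 48, 90),
  Spanish = c(77, 77, 98, 40, 50, 61),
  stringsAsFactors = FALSE
)

# Inspect the structure: 6 observations of 7 variables.
str(scores)
```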
select: pick subset of variables/columns by name
History teacher: “I just want their names and History scores”
scores %>%
  select(Name, History)
##     Name History
## 1 Andrew      56
## 2   John      64
## 3   Mary      62
## 4   Jane      55
## 5    Bob      48
## 6    Dan      90
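Besides listing the columns to keep, select() also accepts a minus sign to drop a column and a colon to pick a run of adjacent columns. A short sketch, with scores rebuilt by hand from the printed table (an assumption); no_gender and subjects are illustrative names of our own.

```r
library(dplyr)

# Assumption: `scores` rebuilt by hand from the printed table.
scores <- data.frame(
  Name    = c("Andrew", "John", "Mary", "Jane", "Bob", "Dan"),
  Gender  = c("M", "M", "F", "F", "M", "M"),
  English = c(60, 66, 92, 80, 80, 58),
  Math    = c(96, 55, 63, 76, 80, 52),
  Science = c(80, 56, 70, 89, 82, 79),
  History = c(56, 64, 62, 55, 48, 90),
  Spanish = c(77, 77, 98, 40, 50, 61),
  stringsAsFactors = FALSE
)

# Drop one column instead of listing all the columns to keep.
no_gender <- scores %>%
  select(-Gender)

# Pick every column from English through Spanish, in table order.
subjects <- scores %>%
  select(English:Spanish)

names(no_gender)
names(subjects)
```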
mutate: create new columns based on old ones
Form teacher: “What are their total scores?”
scores <- scores %>%
  mutate(Total = English + Math + Science + History + Spanish)
scores
##     Name Gender English Math Science History Spanish Total
## 1 Andrew      M      60   96      80      56      77   369
## 2   John      M      66   55      56      64      77   318
## 3   Mary      F      92   63      70      62      98   385
## 4   Jane      F      80   76      89      55      40   340
## 5    Bob      M      80   80      82      48      50   340
## 6    Dan      M      58   52      79      90      61   340
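A single mutate() call can create several columns, and a later column can use one created earlier in the same call. The sketch below rebuilds scores by hand (an assumption) and adds a Mean column, which is a name of our own choosing.

```r
library(dplyr)

# Assumption: `scores` rebuilt by hand from the printed table.
scores <- data.frame(
  Name    = c("Andrew", "John", "Mary", "Jane", "Bob", "Dan"),
  Gender  = c("M", "M", "F", "F", "M", "M"),
  English = c(60, 66, 92, 80, 80, 58),
  Math    = c(96, 55, 63, 76, 80, 52),
  Science = c(80, 56, 70, 89, 82, 79),
  History = c(56, 64, 62, 55, 48, 90),
  Spanish = c(77, 77, 98, 40, 50, 61),
  stringsAsFactors = FALSE
)

# Total is created first; Mean (hypothetical column) is computed from it
# within the same mutate() call.
with_mean <- scores %>%
  mutate(Total = English + Math + Science + History + Spanish,
         Mean  = Total / 5)

with_mean %>%
  select(Name, Total, Mean)
```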
arrange: reorder rows
Form teacher: “Can I have the students in order of overall performance?”
scores %>%
  arrange(Total)
##     Name Gender English Math Science History Spanish Total
## 1   John      M      66   55      56      64      77   318
## 2   Jane      F      80   76      89      55      40   340
## 3    Bob      M      80   80      82      48      50   340
## 4    Dan      M      58   52      79      90      61   340
## 5 Andrew      M      60   96      80      56      77   369
## 6   Mary      F      92   63      70      62      98   385
arrange: reorder rows
Form teacher: “No no, better students on top please…”
scores %>%
  arrange(desc(Total))
##     Name Gender English Math Science History Spanish Total
## 1   Mary      F      92   63      70      62      98   385
## 2 Andrew      M      60   96      80      56      77   369
## 3   Jane      F      80   76      89      55      40   340
## 4    Bob      M      80   80      82      48      50   340
## 5    Dan      M      58   52      79      90      61   340
## 6   John      M      66   55      56      64      77   318
arrange: reorder rows
Form teacher: “Can I have them in descending order of total scores, but if students tie, then by alphabetical order?”
scores %>%
  arrange(desc(Total), Name)
##     Name Gender English Math Science History Spanish Total
## 1   Mary      F      92   63      70      62      98   385
## 2 Andrew      M      60   96      80      56      77   369
## 3    Bob      M      80   80      82      48      50   340
## 4    Dan      M      58   52      79      90      61   340
## 5   Jane      F      80   76      89      55      40   340
## 6   John      M      66   55      56      64      77   318
filter: pick observations by their values
History teacher: “I want to see which students scored less than 60 for history”
scores %>%
  filter(History < 60)
##     Name Gender English Math Science History Spanish Total
## 1 Andrew      M      60   96      80      56      77   369
## 2   Jane      F      80   76      89      55      40   340
## 3    Bob      M      80   80      82      48      50   340
Other ways to make comparisons:
>: greater than
<: less than
>=: greater than or equal to
<=: less than or equal to
!=: not equal to
==: equal to (Do not use = to test for equality!!)

Combining comparisons:
!: not
&: and
|: or

filter examples
Dan’s parents: “I just want Dan’s scores”
scores %>%
  filter(Name == "Dan")
##   Name Gender English Math Science History Spanish Total
## 1  Dan      M      58   52      79      90      61   340

Language teacher: “I want to know which students score < 50 for either English or Spanish”
scores %>%
  filter(English < 50 | Spanish < 50)
##   Name Gender English Math Science History Spanish Total
## 1 Jane      F      80   76      89      55      40   340
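The ! operator and the warning about = can both be tried on the same table. The sketch below rebuilds scores by hand (an assumption). As far as we know, current dplyr versions stop with an error when filter() receives = instead of ==, but the exact message differs between versions, so the code only captures it rather than relying on its wording.

```r
library(dplyr)

# Assumption: `scores` rebuilt by hand from the printed table.
scores <- data.frame(
  Name    = c("Andrew", "John", "Mary", "Jane", "Bob", "Dan"),
  Gender  = c("M", "M", "F", "F", "M", "M"),
  English = c(60, 66, 92, 80, 80, 58),
  Math    = c(96, 55, 63, 76, 80, 52),
  Science = c(80, 56, 70, 89, 82, 79),
  History = c(56, 64, 62, 55, 48, 90),
  Spanish = c(77, 77, 98, 40, 50, 61),
  stringsAsFactors = FALSE
)

# `!` negates a condition: students who did NOT score below 60 in History.
passed_history <- scores %>%
  filter(!(History < 60))
passed_history$Name

# `=` is not a comparison. filter() sees a named argument and raises an
# error; tryCatch() captures the message so the script keeps running.
eq_mistake <- tryCatch(
  scores %>% filter(Name = "Dan"),
  error = function(e) conditionMessage(e)
)
eq_mistake
```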
summarize: get summaries of data
Academic: “I want to know the correlation between math and science scores”
scores %>%
  summarize(corr = cor(Math, Science))
##        corr
## 1 0.5470561
summarize: get summaries of data
Science teacher: “I want to know the mean and standard deviation of the scores for science”
scores %>%
  summarize(Science_mean = mean(Science),
            Science_sd = sd(Science))
##   Science_mean Science_sd
## 1           76   11.54123
dplyr commands using %>%
Science teacher: “I want to know which students scored > 80 for Science, but I just want names”
scores %>%
  filter(Science > 80) %>%
  select(Name)
##   Name
## 1 Jane
## 2  Bob
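The pipeline above can also be written as nested function calls, which makes it easier to see what %>% is doing: each step's result becomes the first argument of the next function. A small sketch with scores rebuilt by hand (an assumption); piped and nested are illustrative names.

```r
library(dplyr)

# Assumption: `scores` rebuilt by hand from the printed table.
scores <- data.frame(
  Name    = c("Andrew", "John", "Mary", "Jane", "Bob", "Dan"),
  Gender  = c("M", "M", "F", "F", "M", "M"),
  English = c(60, 66, 92, 80, 80, 58),
  Math    = c(96, 55, 63, 76, 80, 52),
  Science = c(80, 56, 70, 89, 82, 79),
  History = c(56, 64, 62, 55, 48, 90),
  Spanish = c(77, 77, 98, 40, 50, 61),
  stringsAsFactors = FALSE
)

# Piped form: read left to right, top to bottom.
piped <- scores %>%
  filter(Science > 80) %>%
  select(Name)

# Nested form: the same steps, read from the inside out.
nested <- select(filter(scores, Science > 80), Name)

# Both forms give the same data frame.
identical(piped, nested)
```

The piped form keeps the steps in the order they happen, which is the main reason the slides prefer it.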
group_by: use dplyr verbs on a group-by-group basis
Academic: “I want to know if the boys scored better than the girls in Spanish”

group_by: use dplyr verbs on a group-by-group basis
Question: How many males and females are there in the data set?
scores %>%
  group_by(Gender) %>%
  count()
## # A tibble: 2 x 2
## # Groups:   Gender [2]
##   Gender     n
##   <chr>  <int>
## 1 F          2
## 2 M          4
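One way to approach the Academic's question about Spanish scores is to combine group_by() with summarize(), so that the summary is computed once per gender. This is a sketch rather than the slides' own answer: scores is rebuilt by hand from the printed table, and the column names Spanish_mean and n are our own choices.

```r
library(dplyr)

# Assumption: `scores` rebuilt by hand from the printed table.
scores <- data.frame(
  Name    = c("Andrew", "John", "Mary", "Jane", "Bob", "Dan"),
  Gender  = c("M", "M", "F", "F", "M", "M"),
  English = c(60, 66, 92, 80, 80, 58),
  Math    = c(96, 55, 63, 76, 80, 52),
  Science = c(80, 56, 70, 89, 82, 79),
  History = c(56, 64, 62, 55, 48, 90),
  Spanish = c(77, 77, 98, 40, 50, 61),
  stringsAsFactors = FALSE
)

# summarize() runs once per group: mean Spanish score and group size.
spanish_by_gender <- scores %>%
  group_by(Gender) %>%
  summarize(Spanish_mean = mean(Spanish),
            n = n())

spanish_by_gender
```

With only two girls and four boys in the table, the group means are a description of this class and not much more.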
transmute: create new columns based on old ones, discard old ones
Form teacher: “I just want the mean score for each student”
scores %>%
  transmute(mean = (English + Math + Science + History + Spanish) / 5)

Language teacher: “I want to know which students scored < 70 for Spanish, but I just want names”
scores %>%
  filter(Spanish < 70) %>%
  select(Name)
##   Name
## 1 Jane
## 2  Bob
## 3  Dan
Language teacher: “I want to know which students scored < 70 for both English and Spanish, but I just want names”
scores %>%
  filter(English < 70 & Spanish < 70) %>%
  select(Name)
##   Name
## 1  Dan
History teacher: “I want the names of students with their history scores, with the entries sorted by name”
scores %>%
  arrange(Name) %>%
  select(Name, History)
##     Name History
## 1 Andrew      56
## 2    Bob      48
## 3    Dan      90
## 4   Jane      55
## 5   John      64
## 6   Mary      62
3 > 2
## [1] TRUE
3 < 2
## [1] FALSE
3 == 2
## [1] FALSE
c(1, 2, 3, 1) == c(3, 2, 1, 2)
## [1] FALSE  TRUE FALSE FALSE
c(1, 2, 3, 1) == 1
## [1]  TRUE FALSE FALSE  TRUE

NAs!
1 == NA
## [1] NA
NA == NA
## [1] NA
is.na(NA)
## [1] TRUE
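Because any comparison with NA gives NA, testing a vector with == NA never finds the missing values; is.na() does. A minimal base R sketch with a made-up vector x:

```r
# A small vector with one missing value (made-up data for illustration).
x <- c(1, NA, 3)

# Comparing with NA gives NA for every element, so this finds nothing.
x == NA

# is.na() tests each element for missingness.
is.na(x)

# Keep only the non-missing values.
x[!is.na(x)]
```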
%>%
%>% is implemented by the magrittr package
When the dplyr package is loaded, magrittr is loaded too
%>% is “syntactic sugar”: makes code easier to understand
The object on the left of %>% becomes the first argument in the function on the right of %>%
head(mtcars, n = 6) is equivalent to mtcars %>% head(n = 6)

A function is a named block of code which takes inputs and returns an output.
We’ve already seen a number of functions in R! For example,
is.character("123")
## [1] TRUE
The function is.character takes the input given to it in the parentheses and returns TRUE or FALSE, depending on whether the input is of type character or not.
Others we’ve seen: str(), head(), rm(), ggplot(), select(), …
We can see what a function does by typing in ? followed by the function name in the R console.
?is.character

The most important syntax in R is the function call. All R syntax has function calls underlying it.
A function call consists of:
function_name(<inputs to the function>,
              <arguments which change
               how the function operates>)

x <- c(-5, -3, -1, 1, 3, NA)
mean(x)
## [1] NA

x <- c(-5, -3, -1, 1, 3, NA)
mean(x, na.rm = TRUE)
## [1] -1

abs(x): If x is positive, return x. If x is negative, return x without the negative sign.
mean(abs(x), na.rm = TRUE)
## [1] 2.6
Question: How do we find out what a function does? What inputs does it accept, what does it output, etc…
First answer: Google it! Google “R <function name>”
A (probably) better answer: Documentation in R itself!
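Besides opening the help page, base R can show a function's arguments and their default values directly in the console. A short sketch using sample() as the example function:

```r
# In an interactive session, either of these opens the help page:
#   ?sample
#   help("sample")

# args() prints the argument list together with the default values.
args(sample)

# formals() returns the arguments as a list; entries with a value are defaults.
names(formals(sample))
formals(sample)$replace
```

Arguments shown without a value, such as x, have no default in the signature.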
sample(): Description
sample(): Usage
What comes after the = sign: default value for that argument
sample(): Arguments
sample(): Details
sample(): Value

sample(x = 1:10, size = 10)
##  [1]  3  1  6 10  5  9  2  4  7  8
sample(1:10, 10, TRUE)
##  [1] 10  5 10  4  4  9  4  6  4  3
sample(1:10, TRUE, size = 5)
## [1] 10  1  5  9  5
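The last call works because R matches named arguments first and then fills the remaining arguments by position, so TRUE ends up as replace. One way to check this is to fix the random seed and compare against the fully named call. The seed value and the names a and b are arbitrary choices for this sketch.

```r
# Fix the seed so both calls draw the same random numbers.
set.seed(1)
a <- sample(1:10, TRUE, size = 5)

set.seed(1)
b <- sample(x = 1:10, size = 5, replace = TRUE)

# size is matched by name; 1:10 and TRUE then fill x and replace by position.
identical(a, b)
```

Naming every argument after the first, as in the second call, is usually easier to read.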