Yuchen Wu
2020/7/8
ggplot2
ggplot2 syntax
library(ggplot2)
ggplot()
ggplot2 syntax
ggplot() +
geom_violin(data = mtcars,
mapping = aes(x = factor(cyl), y = hp))
ggplot2 syntax
ggplot() +
geom_violin(data = mtcars,
mapping = aes(x = factor(cyl), y = hp)) +
geom_jitter(data = mtcars,
mapping = aes(x = factor(cyl), y = hp))
ggplot2 syntax
ggplot(data = mtcars,
mapping = aes(x = factor(cyl), y = hp)) +
geom_violin() +
geom_jitter()
ggplot2 syntax
ggplot(data = mtcars,
mapping = aes(x = factor(cyl), y = hp)) +
geom_violin() +
geom_jitter() +
labs(title = "Horsepower vs. Cylinder", x = "Cylinder",
y = "Horsepower")
ggplot2 syntax
ggplot(data = mtcars,
mapping = aes(x = factor(cyl), y = hp)) +
geom_violin() +
geom_jitter() +
labs(title = "Horsepower vs. Cylinder", x = "Cylinder",
y = "Horsepower") +
theme_classic()
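Because every layer is added with +, a plot can also be stored in a variable and extended step by step. A minimal sketch of the same plot built that way; the variable name p is just an illustrative choice, and width = 0.2 is an assumption (not from the slides) that narrows the jitter so points stay inside their cylinder group:

```r
library(ggplot2)

# Base plot: data and aesthetic mappings shared by every layer
p <- ggplot(data = mtcars, mapping = aes(x = factor(cyl), y = hp)) +
  geom_violin()

# Add more layers, labels and a theme to the stored plot object
p <- p +
  geom_jitter(width = 0.2) +
  labs(title = "Horsepower vs. Cylinder", x = "Cylinder", y = "Horsepower") +
  theme_classic()

p  # typing the object's name draws the plot
```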
dplyr (and %>% syntax)
We rarely get data in exactly the form we need!
Transforming data in R is made easy by the dplyr package (an “official” cheat sheet is also available).
dplyr verbs
select(): pick variables by their names
mutate(): create new variables based on existing ones
arrange(): reorder rows
filter(): pick observations by their values
summarize(): collapse many values down to a single summary
library(dplyr)
scores
## Name Gender English Math Science History Spanish
## 1 Andrew M 60 96 80 56 77
## 2 John M 66 55 56 64 77
## 3 Mary F 92 63 70 62 98
## 4 Jane F 80 76 89 55 40
## 5 Bob M 80 80 82 48 50
## 6 Dan M 58 52 79 90 61
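The slides print scores but never show how it was created. A minimal sketch that rebuilds it from the printed table so the examples can be run; storing the marks as plain numbers and Name/Gender as character strings is an assumption:

```r
# Rebuild the scores data frame from the printed values above
scores <- data.frame(
  Name    = c("Andrew", "John", "Mary", "Jane", "Bob", "Dan"),
  Gender  = c("M", "M", "F", "F", "M", "M"),
  English = c(60, 66, 92, 80, 80, 58),
  Math    = c(96, 55, 63, 76, 80, 52),
  Science = c(80, 56, 70, 89, 82, 79),
  History = c(56, 64, 62, 55, 48, 90),
  Spanish = c(77, 77, 98, 40, 50, 61),
  stringsAsFactors = FALSE  # keep Name and Gender as character, not factor
)

str(scores)  # 6 observations of 7 variables
```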
select: pick subset of variables/columns by name
History teacher: “I just want their names and History scores”
Using the scores dataset:
scores %>%
  select(Name, History)
## Name History
## 1 Andrew 56
## 2 John 64
## 3 Mary 62
## 4 Jane 55
## 5 Bob 48
## 6 Dan 90
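select() also accepts shortcuts besides listing every name. A sketch (rebuilding scores from the printed table so it runs on its own) showing a range of adjacent columns with : and dropping a column with -; both happen to give the same columns here:

```r
library(dplyr)

# scores rebuilt from the printed table
scores <- data.frame(
  Name    = c("Andrew", "John", "Mary", "Jane", "Bob", "Dan"),
  Gender  = c("M", "M", "F", "F", "M", "M"),
  English = c(60, 66, 92, 80, 80, 58),
  Math    = c(96, 55, 63, 76, 80, 52),
  Science = c(80, 56, 70, 89, 82, 79),
  History = c(56, 64, 62, 55, 48, 90),
  Spanish = c(77, 77, 98, 40, 50, 61),
  stringsAsFactors = FALSE
)

# Name plus every column from English through Spanish
subjects <- scores %>% select(Name, English:Spanish)

# A minus sign drops a column and keeps all the others
no_gender <- scores %>% select(-Gender)

names(subjects)
```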
mutate: create new columns based on old ones
Form teacher: “What are their total scores?”
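Using the scores dataset (mutate() returns a new data frame, so assign the result back to scores to keep the Total column):
scores <- scores %>%
  mutate(Total = English + Math + Science + History + Spanish)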
scores
## Name Gender English Math Science History Spanish Total
## 1 Andrew M 60 96 80 56 77 369
## 2 John M 66 55 56 64 77 318
## 3 Mary F 92 63 70 62 98 385
## 4 Jane F 80 76 89 55 40 340
## 5 Bob M 80 80 82 48 50 340
## 6 Dan M 58 52 79 90 61 340
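One mutate() call can create several columns, and a column created earlier in the call can be used straight away. A sketch that adds each student's mean alongside the total; the column name Mean is an illustrative choice, and scores is rebuilt from the printed table (without Gender, which is not needed):

```r
library(dplyr)

# Subject marks from the scores table
scores <- data.frame(
  Name    = c("Andrew", "John", "Mary", "Jane", "Bob", "Dan"),
  English = c(60, 66, 92, 80, 80, 58),
  Math    = c(96, 55, 63, 76, 80, 52),
  Science = c(80, 56, 70, 89, 82, 79),
  History = c(56, 64, 62, 55, 48, 90),
  Spanish = c(77, 77, 98, 40, 50, 61),
  stringsAsFactors = FALSE
)

# Total is created first, then reused to compute Mean in the same call
scores <- scores %>%
  mutate(Total = English + Math + Science + History + Spanish,
         Mean  = Total / 5)

scores$Mean  # 73.8 63.6 77.0 68.0 68.0 68.0
```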
arrange: reorder rows
Form teacher: “Can I have the students in order of overall performance?”
Using the scores dataset:
scores %>%
  arrange(Total)
## Name Gender English Math Science History Spanish Total
## 1 John M 66 55 56 64 77 318
## 2 Jane F 80 76 89 55 40 340
## 3 Bob M 80 80 82 48 50 340
## 4 Dan M 58 52 79 90 61 340
## 5 Andrew M 60 96 80 56 77 369
## 6 Mary F 92 63 70 62 98 385
arrange: reorder rows
Form teacher: “No no, better students on top please…”
Using the scores dataset:
scores %>%
  arrange(desc(Total))
## Name Gender English Math Science History Spanish Total
## 1 Mary F 92 63 70 62 98 385
## 2 Andrew M 60 96 80 56 77 369
## 3 Jane F 80 76 89 55 40 340
## 4 Bob M 80 80 82 48 50 340
## 5 Dan M 58 52 79 90 61 340
## 6 John M 66 55 56 64 77 318
arrange: reorder rows
Form teacher: “Can I have them in descending order of total scores, but if students tie, then by alphabetical order?”
Using the scores dataset:
scores %>%
  arrange(desc(Total), Name)
## Name Gender English Math Science History Spanish Total
## 1 Mary F 92 63 70 62 98 385
## 2 Andrew M 60 96 80 56 77 369
## 3 Bob M 80 80 82 48 50 340
## 4 Dan M 58 52 79 90 61 340
## 5 Jane F 80 76 89 55 40 340
## 6 John M 66 55 56 64 77 318
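For comparison, base R can produce the same ordering with order(), where negating a numeric key sorts it in descending order. A sketch using a cut-down copy of scores with just Name and Total:

```r
library(dplyr)

# Names and totals from the table above
totals <- data.frame(
  Name  = c("Andrew", "John", "Mary", "Jane", "Bob", "Dan"),
  Total = c(369, 318, 385, 340, 340, 340),
  stringsAsFactors = FALSE
)

# dplyr: descending Total, ties broken alphabetically by Name
by_dplyr <- totals %>% arrange(desc(Total), Name)

# Base R: order() on -Total first, then Name
by_base <- totals[order(-totals$Total, totals$Name), ]

by_dplyr$Name  # "Mary" "Andrew" "Bob" "Dan" "Jane" "John"
```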
filter: pick observations by their values
History teacher: “I want to see which students scored less than 60 for history”
Using the scores dataset:
scores %>%
  filter(History < 60)
## Name Gender English Math Science History Spanish Total
## 1 Andrew M 60 96 80 56 77 369
## 2 Jane F 80 76 89 55 40 340
## 3 Bob M 80 80 82 48 50 340
Other ways to make comparisons:
>: greater than
<: less than
>=: greater than or equal to
<=: less than or equal to
!=: not equal to
==: equal to (Do not use = to test for equality!!)
Combining comparisons:
!: not
&: and
|: or
filter examples
Dan’s parents: “I just want Dan’s scores”
scores %>%
filter(Name == "Dan")
## Name Gender English Math Science History Spanish Total
## 1 Dan M 58 52 79 90 61 340
Language teacher: “I want to know which students score < 50 for either English or Spanish”
scores %>%
filter(English < 50 | Spanish < 50)
## Name Gender English Math Science History Spanish Total
## 1 Jane F 80 76 89 55 40 340
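The & and ! operators from the comparison list work the same way inside filter(). A sketch on a cut-down copy of scores (Name, Gender, History); the last line uses the fact that filter() combines comma-separated conditions with &:

```r
library(dplyr)

# Name, Gender and History columns from the scores table
scores <- data.frame(
  Name    = c("Andrew", "John", "Mary", "Jane", "Bob", "Dan"),
  Gender  = c("M", "M", "F", "F", "M", "M"),
  History = c(56, 64, 62, 55, 48, 90),
  stringsAsFactors = FALSE
)

# & : both conditions must be TRUE
low_history_girls <- scores %>% filter(History < 60 & Gender == "F")

# ! : negate a condition (everyone who is not male)
not_male <- scores %>% filter(!(Gender == "M"))

# Separate conditions with commas: same result as using &
comma_form <- scores %>% filter(History < 60, Gender == "F")
```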
summarize: get summaries of data
Academic: “I want to know the correlation between math and science scores”
Using the scores dataset:
scores %>%
  summarize(corr = cor(Math, Science))
## corr
## 1 0.5470561
summarize: get summaries of data
Science teacher: “I want to know the mean and standard deviation of the scores for science”
Using the scores dataset:
scores %>%
  summarize(Science_mean = mean(Science),
            Science_sd = sd(Science))
## Science_mean Science_sd
## 1 76 11.54123
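Any function that collapses a column to one value can go inside summarize(); n() counts the rows. A sketch with a few more summaries of the Science marks (the output column names are illustrative choices):

```r
library(dplyr)

# Science marks from the scores table
scores <- data.frame(Science = c(80, 56, 70, 89, 82, 79))

science_summary <- scores %>%
  summarize(Science_min    = min(Science),
            Science_median = median(Science),
            Science_max    = max(Science),
            n_students     = n())

science_summary  # 56, 79.5, 89, 6
```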
dplyr commands using %>%
Science teacher: “I want to know which students scored > 80 for Science, but I just want names”
Using the scores dataset:
scores %>%
  filter(Science > 80) %>%
  select(Name)
## Name
## 1 Jane
## 2 Bob
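Without %>%, the same chain has to be written as nested calls that read from the inside out. A sketch (on a cut-down copy of scores) showing that both forms give the same result:

```r
library(dplyr)

# Name and Science columns from the scores table
scores <- data.frame(
  Name    = c("Andrew", "John", "Mary", "Jane", "Bob", "Dan"),
  Science = c(80, 56, 70, 89, 82, 79),
  stringsAsFactors = FALSE
)

# With pipes: one step per line, read top to bottom
piped <- scores %>%
  filter(Science > 80) %>%
  select(Name)

# Without pipes: filter() runs first, inside select()
nested <- select(filter(scores, Science > 80), Name)

identical(piped, nested)  # TRUE
```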
group_by: use dplyr verbs on a group-by-group basis
Academic: “I want to know if the boys scored better than the girls in Spanish”
Question: How many males and females are there in the data set?
Using the scores dataset:
scores %>%
  group_by(Gender) %>%
  count()
## # A tibble: 2 x 2
## # Groups: Gender [2]
## Gender n
## <chr> <int>
## 1 F 2
## 2 M 4
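The academic's Spanish question can be answered the same way, by following group_by() with summarize() so that each gender gets its own summary row. A sketch on a cut-down copy of scores; with only two girls and four boys this describes these six students rather than boys and girls in general:

```r
library(dplyr)

# Gender and Spanish columns from the scores table
scores <- data.frame(
  Gender  = c("M", "M", "F", "F", "M", "M"),
  Spanish = c(77, 77, 98, 40, 50, 61),
  stringsAsFactors = FALSE
)

# One summary row per gender
spanish_by_gender <- scores %>%
  group_by(Gender) %>%
  summarize(Spanish_mean = mean(Spanish), n = n())

spanish_by_gender  # F: 69 (n = 2), M: 66.25 (n = 4)
```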
transmute: create new columns based on old ones, discard old ones
Form teacher: “I just want the mean score for each student”
scores %>%
transmute(mean = (English + Math + Science + History + Spanish) / 5)
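Because transmute() keeps only the columns named inside it, the result above has no Name column. A sketch that keeps Name by listing it; the column name Mean is an illustrative choice, and scores is rebuilt from the printed table:

```r
library(dplyr)

# Subject marks from the scores table
scores <- data.frame(
  Name    = c("Andrew", "John", "Mary", "Jane", "Bob", "Dan"),
  English = c(60, 66, 92, 80, 80, 58),
  Math    = c(96, 55, 63, 76, 80, 52),
  Science = c(80, 56, 70, 89, 82, 79),
  History = c(56, 64, 62, 55, 48, 90),
  Spanish = c(77, 77, 98, 40, 50, 61),
  stringsAsFactors = FALSE
)

# Only Name and the new Mean column survive
means <- scores %>%
  transmute(Name, Mean = (English + Math + Science + History + Spanish) / 5)

means
```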
Language teacher: “I want to know which students scored < 70 for Spanish, but I just want names”
Using the scores dataset:
scores %>%
filter(Spanish < 70) %>%
select(Name)
## Name
## 1 Jane
## 2 Bob
## 3 Dan
Language teacher: “I want to know which students scored < 70 for both English and Spanish, but I just want names”
Using the scores dataset:
scores %>%
filter(English < 70 & Spanish < 70) %>%
select(Name)
## Name
## 1 Dan
History teacher: “I want the names of students with their history scores, with the entries sorted by name”
Using the scores dataset, sorted by the Name column:
scores %>%
arrange(Name) %>%
select(Name, History)
## Name History
## 1 Andrew 56
## 2 Bob 48
## 3 Dan 90
## 4 Jane 55
## 5 John 64
## 6 Mary 62
3 > 2
## [1] TRUE
3 < 2
## [1] FALSE
3 == 2
## [1] FALSE
c(1, 2, 3, 1) == c(3, 2, 1, 2)
## [1] FALSE TRUE FALSE FALSE
c(1, 2, 3, 1) == 1
## [1] TRUE FALSE FALSE TRUE
NAs!
1 == NA
## [1] NA
NA == NA
## [1] NA
is.na(NA)
## [1] TRUE
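The same NA rule affects filter(): it keeps only rows where the condition is TRUE, so rows where the comparison gives NA are silently dropped. A sketch on a small made-up data frame (df, id and value are hypothetical names):

```r
library(dplyr)

x <- c(1, NA, 3)
x == NA   # NA NA NA: comparing anything with NA gives NA
is.na(x)  # FALSE TRUE FALSE

df <- data.frame(id = 1:3, value = x)

# The NA row is dropped because value > 0 is NA there, not TRUE
kept <- df %>% filter(value > 0)

# To find the missing values, test with is.na(), not == NA
missing <- df %>% filter(is.na(value))
```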
%>%
%>% is implemented by the magrittr package
Loading dplyr also makes %>% available (dplyr imports it from magrittr and re-exports it)
%>% is “syntactic sugar”: it makes code easier to read
The object on the left of %>% becomes the first argument of the function on the right of %>%
head(mtcars, n = 6) is equivalent to mtcars %>% head(n = 6)
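A quick check of that equivalence, plus a two-step chain where each result is passed on as the first argument of the next function:

```r
library(dplyr)  # makes %>% available

# The left-hand side becomes the first argument of head()
a <- head(mtcars, n = 6)
b <- mtcars %>% head(n = 6)
identical(a, b)  # TRUE

# Nested calls read inside out; the piped version reads left to right
nested <- round(mean(mtcars$hp), 1)
piped  <- mtcars$hp %>% mean() %>% round(1)
```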
A function is a named block of code which
We’ve already seen a number of functions in R! For example,
is.character("123")
## [1] TRUE
The function is.character takes the input given to it in the parentheses and returns TRUE or FALSE, depending on whether the input is of type character or not.
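The quotes are what make "123" a character string. A small sketch contrasting it with the number 123; is.numeric() and class() are base R functions not otherwise covered on these slides:

```r
# is.character() is TRUE only for character (string) input
is.character("123")  # TRUE: quoted, so it is a string
is.character(123)    # FALSE: a number
is.numeric(123)      # TRUE

# class() reports the type directly
class("123")  # "character"
class(123)    # "numeric"
```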
Others we’ve seen: str(), head(), rm(), ggplot(), select(), …
We can see what a function does by typing ? followed by the function name in the R console.
?is.character
The most important syntax in R is the function call. All R syntax has function calls underlying it.
A function call consists of:
function_name(<inputs to the function>,
<arguments which change
how the function operates>)
x <- c(-5, -3, -1, 1, 3, NA)
mean(x)
## [1] NA
mean(x, na.rm = TRUE)
## [1] -1
abs(x): If x is positive, return x. If x is negative, return x without the negative sign.
mean(abs(x), na.rm = TRUE)
## [1] 2.6
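A nested call like mean(abs(x), na.rm = TRUE) runs the inner function first and passes its result to the outer one. The same computation split into two steps, with the intermediate result stored in y (an illustrative name):

```r
x <- c(-5, -3, -1, 1, 3, NA)

# Step 1: the inner call; abs() keeps NA as NA
y <- abs(x)                      # 5 3 1 1 3 NA

# Step 2: the outer call, ignoring the missing value
result <- mean(y, na.rm = TRUE)
result                           # 2.6
```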
Question: How do we find out what a function does? What inputs does it accept, what does it output, etc…
First answer: Google it! Google “R <function name>”
A (probably) better answer: Documentation in R itself!
sample(): Description
sample(): Usage
What comes after the = sign: default value for that argument
sample(): Arguments
sample(): Details
sample(): Value
sample(x = 1:10, size = 10)
## [1] 3 1 6 10 5 9 2 4 7 8
sample(1:10, 10, TRUE)
## [1] 10 5 10 4 4 9 4 6 4 3
sample(1:10, TRUE, size = 5)
## [1] 10 1 5 9 5
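In the last call, size is named, so R matches it first and fills the remaining parameters by position: 1:10 goes to x and TRUE goes to replace (which is why values repeat). A sketch confirming this; set.seed(1) is an arbitrary seed chosen only so both calls draw the same random numbers:

```r
# Positional and named arguments mixed: size is matched by name,
# then 1:10 fills x and TRUE fills replace
set.seed(1)
a <- sample(1:10, TRUE, size = 5)

# The same call with every argument named
set.seed(1)
b <- sample(x = 1:10, size = 5, replace = TRUE)

identical(a, b)  # TRUE
```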