Yuchen Wu
2020/7/1
In all the data structures so far, the elements have to be of the same type. To store elements of different types in one data structure, we can use a list, which we create with list(). We can think of a list as a collection of key-value pairs, where the keys are strings.
person <- list(name = "John Doe", age = 26)
person
## $name
## [1] "John Doe"
##
## $age
## [1] 26
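To make the contrast with the earlier same-type structures concrete, here is a small sketch (the names as_vector and as_list are illustrative only). Combining a string and a number with c() coerces both to character, while list() keeps each element's own type.

```r
# c() builds an atomic vector, so mixed inputs are coerced to one common type
as_vector <- c("John Doe", 26)
class(as_vector)      # "character": the number 26 became the string "26"

# list() keeps each element's original type
as_list <- list(name = "John Doe", age = 26)
class(as_list$name)   # "character"
class(as_list$age)    # "numeric"
```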
The str() function can be used to inspect what is inside person:
str(person)
## List of 2
## $ name: chr "John Doe"
## $ age : num 26
To access the name element of person, we have 2 options:
person[["name"]]
## [1] "John Doe"
person$name
## [1] "John Doe"
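The two forms are not fully interchangeable. As a minimal sketch (the variable key is illustrative): [[ ]] accepts a name stored in a variable, whereas $ takes the name literally.

```r
person <- list(name = "John Doe", age = 26)

key <- "age"
by_variable <- person[[key]]  # looks up the element whose name is stored in key
by_dollar <- person$key       # looks for an element literally named "key"

by_variable
is.null(by_dollar)            # no element is called "key", so $ returned NULL
```

This is why [[ ]] tends to be the more convenient form inside loops and functions, where the key is usually held in a variable.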
The elements of a list can be anything, even another data structure! Let’s add the names of John’s children to the person object:
person$children <- c("Ross", "Robert")
str(person)
## List of 3
## $ name : chr "John Doe"
## $ age : num 26
## $ children: chr [1:2] "Ross" "Robert"
To see the keys associated with a list, use the names() function:
names(person)
## [1] "name" "age" "children"
A special type of list: the data frame
data(mtcars)
str(mtcars)
## 'data.frame': 32 obs. of 11 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
## $ disp: num 160 160 108 258 360 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
## $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec: num 16.5 17 18.6 19.4 17 ...
## $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
## $ am : num 1 1 1 0 0 0 0 0 0 0 ...
## $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
## $ carb: num 4 4 1 1 2 1 4 2 2 4 ...
View(mtcars)
head(mtcars) ## returns the first 6 rows of the data set; tail() returns the last 6
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
names(mtcars)
## [1] "mpg" "cyl" "disp" "hp" "drat" "wt" "qsec" "vs" "am" "gear"
## [11] "carb"
dim(mtcars)
## [1] 32 11
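Because a data frame is a list of equal-length columns, the list tools introduced for person also work on mtcars. A short sketch (mpg_a and mpg_b are illustrative names):

```r
data(mtcars)

is.list(mtcars)            # a data frame is a list under the hood
length(mtcars)             # number of list elements, i.e. columns

# The same two access forms used for person also extract a column
mpg_a <- mtcars[["mpg"]]
mpg_b <- mtcars$mpg
identical(mpg_a, mpg_b)
```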
Instead of using built-in data sets, we can also let R read from local files:
# First, set your working directory
setwd("~/Desktop/")
carSpeeds <- read.csv(file = 'data/car-speeds.csv')
We can also read data from a website:
df <- read.csv("https://stats60.github.io/Rlab/worldbank_data_tidy.csv",
stringsAsFactors = FALSE)
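The file data/car-speeds.csv is not included here, so the following sketch writes a small made-up table to a temporary file and reads it back. The column names and values are invented for illustration. Building the path explicitly with tempfile() or file.path() is one alternative to changing the working directory with setwd().

```r
# Made-up data, only for illustration
speeds <- data.frame(color = c("Blue", "Red", "Blue"),
                     speed = c(32.5, 45.0, 35.5))

# Write it to a temporary CSV file, without the row names column
csv_path <- tempfile(fileext = ".csv")
write.csv(speeds, file = csv_path, row.names = FALSE)

# Read it back; stringsAsFactors = FALSE keeps text columns as character
speeds_read <- read.csv(file = csv_path, stringsAsFactors = FALSE)
str(speeds_read)
```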
“The simple graph has brought more information to the data analyst’s mind than any other device.” - John Tukey
library(ggplot2)
p_base <- ggplot(data = df, mapping = aes(y = mpg, x = weight))
p_scatter <- p_base + geom_point(aes(col = cylinders), size = 2)
p_scatter
What is the distribution of cylinders in my dataset?
ggplot(data = mtcars) +
geom_bar(aes(x = factor(cyl))) +
ggtitle("Count by cylinders") +
xlab("No. of cylinders")
What is the distribution of miles per gallon in my dataset?
p_hist <- ggplot(data = mtcars) +
geom_histogram(aes(x = mpg), breaks = seq(10, 35, 5)) +
ggtitle("Histogram of miles per gallon")
p_hist
ggplot(data = mtcars) +
geom_histogram(aes(x = mpg)) +
ggtitle("Histogram of miles per gallon")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
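The message above appears because no bin specification was given. As a sketch of the alternative it suggests, binwidth can be passed directly. The width of 5 is an arbitrary choice that mirrors the spacing of the breaks used earlier; the bin edges themselves may differ from those breaks.

```r
library(ggplot2)

# Same histogram, but with an explicit bin width instead of the default 30 bins
p_binwidth <- ggplot(data = mtcars) +
  geom_histogram(aes(x = mpg), binwidth = 5) +
  ggtitle("Histogram of miles per gallon")
p_binwidth
```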
What is the relationship between mpg and weight?
ggplot(data = df) +
geom_point(mapping = aes(y = mpg, x = weight), size = 2) +
ggtitle("Miles per gallon vs. weight")
What is the relationship between mpg and time?
For each value of cylinder, what is the distribution of mpg like?
ggplot(data = df) +
geom_boxplot(aes(cylinders,mpg)) +
ggtitle("Distribution of mpg by cylinders")
For each value of cylinder, what is the distribution of mpg like?
ggplot(data = df) +
geom_violin(aes(cylinders,mpg)) +
ggtitle("Distribution of mpg by cylinders")
How often does each pair of cylinder and gear occur in the dataset?
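The plot for this question is not reproduced here. One possible way to answer it, sketched under the assumption that the built-in mtcars columns cyl and gear are meant: table() gives the counts, and geom_count() draws one point per pair with its size reflecting the count.

```r
library(ggplot2)

# Number of cars for every (cyl, gear) combination
pair_counts <- table(cyl = mtcars$cyl, gear = mtcars$gear)
pair_counts

# One point per pair; point size reflects how often the pair occurs
p_count <- ggplot(data = mtcars) +
  geom_count(aes(x = factor(cyl), y = factor(gear))) +
  xlab("No. of cylinders") +
  ylab("No. of gears")
p_count
```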
ggplot2
Data: Dataset we are using for the plot
## mpg weight cylinders
## 1 21.0 2.620 6
## 2 21.0 2.875 6
## 3 22.8 2.320 4
## 4 21.4 3.215 6
## 5 18.7 3.440 8
## 6 18.1 3.460 6
## 7 14.3 3.570 8
## 8 24.4 3.190 4
## 9 22.8 3.150 4
## 10 19.2 3.440 6
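The rows shown above match the mpg, wt, and cyl columns of mtcars, but the code that builds df is not shown. A minimal sketch of one way it could be constructed, assuming that weight is wt and that cylinders is cyl stored as a factor (a discrete variable is what the color, shape, and manual color scale examples require):

```r
# Assumed reconstruction of df from mtcars (not taken from the original slides)
df <- data.frame(mpg = mtcars$mpg,
                 weight = mtcars$wt,
                 cylinders = factor(mtcars$cyl))
head(df, 10)
```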
Geometries: Visual elements used for our data
Geom: point
Aesthetics: Define which data columns affect various aspects of the geom
3 different aesthetics:
p_base + geom_point(aes(size = cylinders, alpha = weight))
p_base + geom_point(aes(col = cylinders, shape=cylinders), size = 3)
ggplot2 code
ggplot()
ggplot() +
geom_histogram(data = df, mapping = aes(x = mpg))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
ggplot() +
geom_boxplot(data = df, mapping = aes(x = cylinders, y = mpg))
ggplot() +
geom_point(data = df,
mapping = aes(x = weight, y = mpg, col = cylinders),
shape = 15)
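In the call above, col = cylinders sits inside aes() while shape = 15 sits outside it. The sketch below contrasts the two placements: an argument inside aes() is mapped to a data column, and an argument outside aes() is set to one fixed value for every point. Here df is rebuilt from mtcars as an assumption, and the color "firebrick" is an arbitrary choice.

```r
library(ggplot2)

# Assumed stand-in for df (its original construction is not shown)
df <- data.frame(mpg = mtcars$mpg,
                 weight = mtcars$wt,
                 cylinders = factor(mtcars$cyl))

# Mapped: color varies with the cylinders column and gets a legend
p_mapped <- ggplot() +
  geom_point(data = df,
             mapping = aes(x = weight, y = mpg, col = cylinders))

# Set: every point gets the same fixed color and shape, and no legend
p_fixed <- ggplot() +
  geom_point(data = df,
             mapping = aes(x = weight, y = mpg),
             col = "firebrick", shape = 15)
```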
We can have more than one layer in a graphic.
Each layer contains (essentially) its data, an aesthetic mapping, and a geom:
ggplot() +
geom_boxplot(data = df, mapping = aes(x = cylinders, y = mpg)) +
geom_point(data = df, mapping = aes(x = cylinders, y = mpg),
position = "jitter")
When layers share attributes, we only have to type them once:
ggplot(data = df, mapping = aes(x = cylinders, y = mpg)) +
geom_boxplot() +
geom_point(position = "jitter")
We can drop data = if it is the first argument of ggplot().
We can drop mapping = if it is the second argument of ggplot() or the first argument of a geom_xx() function.
ggplot(df, aes(x = cylinders, y = mpg)) +
geom_boxplot() +
geom_point(position = "jitter")
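As a quick check that the shorter spellings describe the same plot, the sketch below builds the boxplot layer both ways and compares the medians ggplot2 computes for each. Here df is rebuilt from mtcars as an assumption, and the jittered points are left out because their positions are random.

```r
library(ggplot2)

# Assumed stand-in for df (its original construction is not shown)
df <- data.frame(mpg = mtcars$mpg,
                 weight = mtcars$wt,
                 cylinders = factor(mtcars$cyl))

# Fully spelled out: data and mapping named inside the geom
p_long <- ggplot() +
  geom_boxplot(data = df, mapping = aes(x = cylinders, y = mpg))

# Short form: positional arguments in ggplot(), inherited by the geom
p_short <- ggplot(df, aes(x = cylinders, y = mpg)) +
  geom_boxplot()

# Compare the box medians computed for both plots
medians_long <- layer_data(p_long)$middle
medians_short <- layer_data(p_short)$middle
all.equal(medians_long, medians_short)
```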
Manually chosen colors
p_scatter + scale_color_manual(values=c("gold2", "darkorange","firebrick"))
p_scatter + facet_wrap(~cylinders)
Theme: refers to all non-data ink
ggplot2’s default theme
p_scatter
Minimal theme
p_scatter + theme_minimal()
Classic theme
p_scatter + theme_classic()
Dark theme
p_scatter + theme_dark()
R Graph Gallery: an excellent source of inspiration and code snippet examples