In all the data structures so far, the elements have to be of the same type. To hold elements of different types in one data structure, we can use a list, which we create with list(). We can think of a list as a collection of key-value pairs, where the keys are strings.

person <- list(name = "John Doe", age = 26)
person
## $name
## [1] "John Doe"
## $age
## [1] 26


The str() function can be used to inspect what is inside person:

str(person)
## List of 2
##  $ name: chr "John Doe"
##  $ age : num 26

To access the name element of person, we have two options:

## [1] "John Doe"
## [1] "John Doe"
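The code that produced the two identical outputs above is not shown. A minimal sketch, assuming the two options meant are the standard list accessors `$` and `[[ ]]`:

```r
person <- list(name = "John Doe", age = 26)

# Option 1: the $ operator followed by the key
name_dollar <- person$name

# Option 2: double square brackets with the key as a string
name_brackets <- person[["name"]]

# Both return the element itself (a character vector)
identical(name_dollar, name_brackets)

# Single square brackets return a one-element list instead
class(person["name"])
```

Double brackets are handy when the key is stored in a variable, since `$` only accepts a literal name.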


The elements of a list can be anything, even another data structure! Let’s add the names of John’s children to the person object:

person$children <- c("Ross", "Robert")
str(person)
## List of 3
##  $ name    : chr "John Doe"
##  $ age     : num 26
##  $ children: chr [1:2] "Ross" "Robert"

To see the keys associated with a list, use the names() function:

names(person)
## [1] "name"     "age"      "children"

What is a data frame?

A special type of list: every element is a vector of the same length, and each element forms one column.

Data frame

str(mtcars)
## 'data.frame':    32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
##  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
##  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...
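To make the "special type of list" idea concrete, here is a small sketch using the built-in mtcars data set. It only checks properties of the data frame; the variable names are chosen for illustration.

```r
# mtcars ships with R, so nothing needs to be loaded
is.list(mtcars)          # a data frame is a list under the hood
is.data.frame(mtcars)

# Each list element is one column, and all columns have the same length
column_lengths <- sapply(mtcars, length)
unique(column_lengths)

# The list accessors work on data frames too
first_mpg_values <- head(mtcars$mpg, 3)
same_column <- identical(mtcars$mpg, mtcars[["mpg"]])
```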

We can use the Help pane in the bottom-right corner of RStudio to check the meaning of the variable names (running ?mtcars in the console opens the same page).

head(mtcars)     # return the first 6 rows of the data set; tail() returns the last 6
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
names(mtcars)
##  [1] "mpg"  "cyl"  "disp" "hp"   "drat" "wt"   "qsec" "vs"   "am"   "gear"
## [11] "carb"
dim(mtcars)
## [1] 32 11

Instead of using built-in data sets, we can also let R read from local files:

# First, set your working directory
carSpeeds <- read.csv(file = 'data/car-speeds.csv')
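Since data/car-speeds.csv is not included here, the following sketch writes a tiny file to a temporary location and reads it back, to show what read.csv() returns. The file contents and column names are made up for illustration.

```r
# Write a small example file so the sketch runs anywhere
# (these rows are invented; they are not the contents of car-speeds.csv)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Color,Speed,State",
             "Blue,32,NewMexico",
             "Red,45,Arizona"),
           csv_path)

# read.csv() returns a data frame; the first line of the file becomes the column names
speeds <- read.csv(file = csv_path, stringsAsFactors = FALSE)

str(speeds)
dim(speeds)
```

A relative path such as 'data/car-speeds.csv' is resolved against the working directory, which getwd() reports and setwd() changes.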

We can also read data from a website:

df <- read.csv("",
               stringsAsFactors = FALSE)

Words vs. pictures

“The simple graph has brought more information to the data analyst’s mind than any other device.” - John Tukey

library(ggplot2)  # provides ggplot(), aes() and the geom_*() functions

p_base <- ggplot(data = df, mapping = aes(y = mpg, x = weight))
p_scatter <- p_base + geom_point(aes(col = cylinders), size = 2)

Two classes of variables in statistics

Barplots: counts for a categorical variable

What is the distribution of cylinders in my dataset?

ggplot(data = mtcars) +
    geom_bar(aes(x = factor(cyl))) +
    ggtitle("Count by cylinders") +
    xlab("No. of cylinders")

Histograms: counts for a continuous variable

What is the distribution of miles per gallon in my dataset?

p_hist <- ggplot(data = mtcars) + 
    geom_histogram(aes(x = mpg), breaks = seq(10, 35, 5)) +
    ggtitle("Histogram of miles per gallon")

ggplot(data = mtcars) + 
    geom_histogram(aes(x = mpg)) +
    ggtitle("Histogram of miles per gallon")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
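The message above appears because no bin specification was given. A minimal sketch of the two usual ways to set one, using geom_histogram()'s binwidth and bins arguments (the values 5 and 10 are arbitrary choices for illustration):

```r
library(ggplot2)

# binwidth = 5 makes every bin 5 mpg wide instead of using the default 30 bins
p_hist_binwidth <- ggplot(data = mtcars) +
    geom_histogram(aes(x = mpg), binwidth = 5) +
    ggtitle("Histogram of miles per gallon")

# bins sets the number of bins rather than their width
p_hist_bins <- ggplot(data = mtcars) +
    geom_histogram(aes(x = mpg), bins = 10) +
    ggtitle("Histogram of miles per gallon")

p_hist_binwidth
```

Either argument silences the message; the breaks argument used in p_hist gives full control over the bin edges.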

Scatterplots: continuous variable vs. continuous variable

What is the relationship between mpg and weight?

ggplot(data = df) + 
    geom_point(mapping = aes(y = mpg, x = weight), size = 2) + 
    ggtitle("Miles per gallon vs. weight")

Lineplots: continuous variable vs. time variable

What is the relationship between mpg and time?


For each value of cylinder, what is the distribution of mpg like?

ggplot(data = df) + 
    geom_boxplot(aes(x = cylinders, y = mpg)) +
    ggtitle("Distribution of mpg by cylinders")


For each value of cylinder, what is the distribution of mpg like?

ggplot(data = df) + 
    geom_violin(aes(x = cylinders, y = mpg)) +
    ggtitle("Distribution of mpg by cylinders")

Heatmaps: categorical variable vs. categorical variable

How often does each pair of cylinder and gear occur in the dataset?
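No code accompanies this slide. One possible sketch, assuming the counts are taken from mtcars and drawn with geom_tile(); the object names pair_counts and p_heat are made up for illustration:

```r
library(ggplot2)

# Count how often each (cyl, gear) pair occurs; table() also keeps pairs with zero cars
pair_counts <- as.data.frame(table(cyl = mtcars$cyl, gear = mtcars$gear),
                             responseName = "count")

# One tile per pair, filled according to its count
p_heat <- ggplot(data = pair_counts) +
    geom_tile(aes(x = cyl, y = gear, fill = count)) +
    ggtitle("Count by cylinders and gears") +
    xlab("No. of cylinders") +
    ylab("No. of gears")

p_heat
```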


Data visualization in R: 2 broad approaches

base R


3 essential elements of graphics: data, geometries, aesthetics

Data: Dataset we are using for the plot

##     mpg weight cylinders
## 1  21.0  2.620         6
## 2  21.0  2.875         6
## 3  22.8  2.320         4
## 4  21.4  3.215         6
## 5  18.7  3.440         8
## 6  18.1  3.460         6
## 7  14.3  3.570         8
## 8  24.4  3.190         4
## 9  22.8  3.150         4
## 10 19.2  3.440         6
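The slides do not show how df is built. The rows above match mtcars, so one plausible construction, offered as an assumption, renames wt and cyl and stores cylinders as a factor (the later color, shape and boxplot examples need a discrete variable):

```r
# Assumed construction: three mtcars columns under friendlier names
df <- data.frame(mpg = mtcars$mpg,
                 weight = mtcars$wt,
                 cylinders = factor(mtcars$cyl))

# Show the first 10 rows
head(df, 10)
```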

Geometries: Visual elements used for our data

Geom: point

Aesthetics: Define which data columns affect various aspects of the geom

3 different aesthetics:

Examples of other aesthetics

p_base + geom_point(aes(size = cylinders, alpha = weight))

p_base + geom_point(aes(col = cylinders, shape = cylinders), size = 3)

ggplot2 code


ggplot() +
    geom_histogram(data = df, mapping = aes(x = mpg))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

ggplot() +
    geom_boxplot(data = df, mapping = aes(x = cylinders, y = mpg))

ggplot() +
    geom_point(data = df, 
               mapping = aes(x = weight, y = mpg, col = cylinders),
               shape = 15)

Layers: Combining multiple plots into one graphic

We can have more than one layer in a graphic.

Each layer contains (essentially) its own data, aesthetic mapping and geom.

ggplot() +
    geom_boxplot(data = df, mapping = aes(x = cylinders, y = mpg)) +
    geom_point(data = df, mapping = aes(x = cylinders, y = mpg), 
               position = "jitter")

When layers share attributes, we only have to type them once:

ggplot(data = df, mapping = aes(x = cylinders, y = mpg)) +
    geom_boxplot() +
    geom_point(position = "jitter")

The argument names data and mapping can also be omitted, since they are the first two arguments of ggplot():

ggplot(df, aes(x = cylinders, y = mpg)) +
    geom_boxplot() +
    geom_point(position = "jitter")


Examples of scales (Source: A Layered Grammar of Graphics)

Scales example: colors

Manually chosen colors

p_scatter + scale_color_manual(values = c("gold2", "darkorange", "firebrick"))


p_scatter + facet_wrap(~cylinders)


Theme: Refers to all non-data ink in the plot

ggplot2’s default theme


Minimal theme

p_scatter + theme_minimal()

More pre-set themes

Classic theme

p_scatter + theme_classic()

Dark theme

p_scatter + theme_dark()

We’ve only scratched the surface!

R Graph Gallery: an excellent source of inspiration and code snippet examples