Yuchen Wu
2020/7/15
dplyr
select(): pick variables/columns by their names
mutate(): create new variables/columns based on existing ones
arrange(): reorder rows
filter(): pick rows by their values
summarize(): collapse many rows down to a single summary
group_by(): perform operations at a group level
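As a rough illustration of the six verbs listed above, the sketch below applies each one to the built-in mtcars dataset. The column choices and the derived column name mpg_per_wt are assumptions made for the example, not part of the original slides.

```r
library(dplyr)

# mtcars ships with base R; every verb takes a data frame and returns a new one
cars <- select(mtcars, mpg, wt, cyl)           # keep three columns by name
cars <- mutate(cars, mpg_per_wt = mpg / wt)    # new column (hypothetical name) from existing ones
cars <- arrange(cars, mpg)                     # reorder rows by increasing mpg
heavy <- filter(cars, wt > 4)                  # keep only rows where wt exceeds 4

# group_by() followed by summarize() collapses each group to a single row
by_cyl <- summarize(group_by(cars, cyl), mean_mpg = mean(mpg))
```

Each call returns a dataset, so the result of one verb can be passed straight into the next.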
ALL of these functions take a dataset as input.
The dataset is either:
the first argument, e.g. select(df, day)
piped in with %>%, e.g. df %>% select(day)
ALL of these functions return a dataset!
You can do three things with this returned dataset: print it, assign it to a variable, or pipe it into another function with %>%.
%>% syntax with dplyr
Take the mtcars dataset, select just the wt and mpg columns, then select rows with mpg < 15:
mtcars %>%
select(wt, mpg) %>%
filter(mpg < 15)
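To make the pipe less mysterious, the sketch below writes the same operation twice: once with %>% and once as nested function calls. The variable names piped and nested are introduced here only for the comparison.

```r
library(dplyr)

# Piped form: read left to right, one step per line
piped <- mtcars %>%
  select(wt, mpg) %>%
  filter(mpg < 15)

# Nested form: the same steps, read from the inside out
nested <- filter(select(mtcars, wt, mpg), mpg < 15)

# Both forms should give the same dataset
identical(piped, nested)
```

The pipe simply passes the dataset on its left as the first argument of the function on its right, which is why every dplyr verb takes the dataset first.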
tidyr: gather and separate
working directory
tidyr::gather()
E.g. dataset of no. of cases for each country
df
## # A tibble: 3 x 3
##   country     `1999` `2000`
##   <chr>        <dbl>  <dbl>
## 1 Afghanistan    745   2666
## 2 Brazil       37737  80488
## 3 China       212258 213766
How to make a line plot of no. of cases by year for each country?
Probably want something like
ggplot(df) +
geom_line(aes(x = year, y = cases, group = country))
Problem: Column names are values of the variable year.
Solution: Reshape dataset:
## # A tibble: 6 x 3
##   country     year   cases
##   <chr>       <chr>  <dbl>
## 1 Afghanistan 1999     745
## 2 Brazil      1999   37737
## 3 China       1999  212258
## 4 Afghanistan 2000    2666
## 5 Brazil      2000   80488
## 6 China       2000  213766
Solution: Reshape dataset using tidyr’s gather()
df %>% gather(`1999`, `2000`, key = "year", value = "cases")
## # A tibble: 6 x 3
##   country     year   cases
##   <chr>       <chr>  <dbl>
## 1 Afghanistan 1999     745
## 2 Brazil      1999   37737
## 3 China       1999  212258
## 4 Afghanistan 2000    2666
## 5 Brazil      2000   80488
## 6 China       2000  213766
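The gather() call above assumes df already exists. A self-contained sketch follows, with the tibble built by hand from the values shown in the printout. The names long and long_alt are introduced here for illustration. The second call uses pivot_longer(), which newer versions of tidyr offer as the replacement for gather(); it is shown only as an alternative, not as what the slides use.

```r
library(tibble)
library(tidyr)
library(dplyr)

# Rebuild the wide example dataset from the values printed above
df <- tibble(
  country = c("Afghanistan", "Brazil", "China"),
  `1999` = c(745, 37737, 212258),
  `2000` = c(2666, 80488, 213766)
)

# Move the column names 1999 and 2000 into a "year" column,
# and their values into a "cases" column
long <- df %>% gather(`1999`, `2000`, key = "year", value = "cases")

# Same reshape with pivot_longer() (tidyr >= 1.0.0); row order may differ
long_alt <- df %>%
  pivot_longer(c(`1999`, `2000`), names_to = "year", values_to = "cases")
```

In both results year is a character column, because it came from column names; convert it (for example with as.numeric()) before using it on a numeric axis.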
df %>% gather(`1999`, `2000`, key = "year", value = "cases") %>%
ggplot() +
geom_line(aes(x = as.numeric(year), y = cases, col = country))
tidyr::separate()
E.g. dataset of rate (cases / population) for each country
df
## # A tibble: 6 x 3
##   country      year rate
##   <chr>       <dbl> <chr>
## 1 Afghanistan  1999 745/19987071
## 2 Afghanistan  2000 2666/20595360
## 3 Brazil       1999 37737/172006362
## 4 Brazil       2000 80488/174504898
## 5 China        1999 212258/1272915272
## 6 China        2000 213766/1280428583
How to get cases and population into columns of their own?
Solution: Use tidyr’s separate()
df %>% separate(rate, into = c("cases", "population"), sep = "/")
## # A tibble: 6 x 4
##   country      year cases  population
##   <chr>       <dbl> <chr>  <chr>
## 1 Afghanistan  1999 745    19987071
## 2 Afghanistan  2000 2666   20595360
## 3 Brazil       1999 37737  172006362
## 4 Brazil       2000 80488  174504898
## 5 China        1999 212258 1272915272
## 6 China        2000 213766 1280428583
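Note that cases and population come out as character columns in the output above. One possible follow-up, sketched below with the dataset rebuilt by hand, is to pass convert = TRUE so that separate() converts the new columns to numbers. The name split_df and the derived column rate_num are assumptions made for this example.

```r
library(tibble)
library(tidyr)
library(dplyr)

# Rebuild the example dataset from the values printed above
df <- tibble(
  country = rep(c("Afghanistan", "Brazil", "China"), each = 2),
  year = rep(c(1999, 2000), times = 3),
  rate = c("745/19987071", "2666/20595360",
           "37737/172006362", "80488/174504898",
           "212258/1272915272", "213766/1280428583")
)

# convert = TRUE asks separate() to turn the new columns into numbers
split_df <- df %>%
  separate(rate, into = c("cases", "population"), sep = "/", convert = TRUE) %>%
  mutate(rate_num = cases / population)   # hypothetical derived column
```

With numeric columns, arithmetic such as cases / population works directly.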
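Datasets are often stored in a .txt (text) or .csv (comma-separated values) file.
Absolute path: starts from the root (/ on Mac or Linux, C:/ on Windows), e.g. /Users/yuchen/Downloads/datafile.csv
Relative path: starts from the working directory (.). For the file /Users/yuchen/Downloads/datafile.csv:
If the working directory is /Users/yuchen: ./Downloads/datafile.csv
If the working directory is /Users/yuchen/Downloads: ./datafile.csv or simply datafile.csv
getwd(): get the current working directory
setwd("<path of new directory>"): change the working directory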
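The relationship between absolute paths, relative paths, and the working directory can be checked with a small experiment. The sketch below writes a throwaway datafile.csv into R's temporary directory (a stand-in for a real data folder, chosen only so the example runs anywhere), then reads it back both ways.

```r
# Remember the current working directory so it can be restored afterwards
old_wd <- getwd()

# Write a small throwaway file into the session's temporary directory
tmp_dir <- tempdir()
write.csv(data.frame(x = 1:3), file.path(tmp_dir, "datafile.csv"),
          row.names = FALSE)

# After setwd(), a relative path is resolved against the new working directory
setwd(tmp_dir)
by_relative <- read.csv("datafile.csv")

# An absolute path works regardless of the working directory
by_absolute <- read.csv(file.path(tmp_dir, "datafile.csv"))

# Restore the original working directory
setwd(old_wd)
```

Both reads should return the same data frame, since the two paths point at the same file.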
None to D4: drought levels of increasing severity