Yuchen Wu
2020/7/15
dplyr
select(): pick variables/columns by their names
mutate(): create new variables/columns based on existing ones
arrange(): reorder rows
filter(): pick rows by their values
summarize(): collapse many rows down to a single summary
group_by(): perform operations at a group level
ALL of these functions take a dataset as their input.
The dataset is either:
passed as the first argument, e.g. select(df, day)
piped in with %>%, e.g. df %>% select(day)
ALL of these functions return a dataset!
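A minimal sketch of the two calling forms, using the built-in mtcars dataset. The object names (by_arg, by_pipe, mpg_by_cyl) and the new columns (wt_kg, mean_mpg, mean_wt_kg) are made up for illustration and are not part of the original slides.

```r
library(dplyr)

# Form 1: the dataset is passed as the first argument.
by_arg <- select(mtcars, wt, mpg)

# Form 2: the dataset is piped in with %>%.
by_pipe <- mtcars %>% select(wt, mpg)

# Both forms return the same data frame.
identical(by_arg, by_pipe)

# Because every verb returns a dataset, calls can be chained.
mpg_by_cyl <- mtcars %>%
  mutate(wt_kg = wt * 1000 * 0.4536) %>%   # wt is recorded in units of 1000 lbs
  group_by(cyl) %>%                        # one group per cylinder count
  summarize(mean_mpg = mean(mpg), mean_wt_kg = mean(wt_kg)) %>%
  arrange(desc(mean_mpg))                  # highest average mpg first

mpg_by_cyl
```

Each step receives the dataset returned by the previous step, which is why the chain reads top to bottom.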
You can do three things with this returned dataset: print it, assign it to a variable, or pipe it into another function with %>%.
%>% syntax with dplyr
Take the mtcars dataset, select just the wt and mpg columns, then keep only the rows with mpg < 15:
mtcars %>%
  select(wt, mpg) %>%
  filter(mpg < 15)
tidyr: gather and select
working directory
tidyr::gather()
E.g. dataset of no. of cases for each country
df
## # A tibble: 3 x 3
##   country     `1999` `2000`
##   <chr>        <dbl>  <dbl>
## 1 Afghanistan    745   2666
## 2 Brazil       37737  80488
## 3 China       212258 213766
How to make a line plot of no. of cases by year for each country?
Probably want something like
ggplot(df) +
  geom_line(aes(x = year, y = cases, group = country))
Problem: Column names are values of the variable year.
Solution: Reshape dataset:
## # A tibble: 6 x 3
##   country     year   cases
##   <chr>       <chr>  <dbl>
## 1 Afghanistan 1999     745
## 2 Brazil      1999   37737
## 3 China       1999  212258
## 4 Afghanistan 2000    2666
## 5 Brazil      2000   80488
## 6 China       2000  213766
Solution: Reshape dataset using tidyr’s gather()
df %>% gather(`1999`, `2000`, key = "year", value = "cases")
## # A tibble: 6 x 3
##   country     year   cases
##   <chr>       <chr>  <dbl>
## 1 Afghanistan 1999     745
## 2 Brazil      1999   37737
## 3 China       1999  212258
## 4 Afghanistan 2000    2666
## 5 Brazil      2000   80488
## 6 China       2000  213766
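Note that year comes out as a character column, because it is built from the old column names. A minimal sketch of one way to handle this, assuming the table is rebuilt by hand with tibble() (values copied from the output above; long_chr and long_int are illustrative names): gather() has a convert argument that runs type.convert() on the key column.

```r
library(tidyr)
library(tibble)

# Rebuild the example table by hand (values copied from the slide output).
df <- tibble(
  country = c("Afghanistan", "Brazil", "China"),
  `1999`  = c(745, 37737, 212258),
  `2000`  = c(2666, 80488, 213766)
)

# Default: the key column holds the old column names as character strings.
long_chr <- gather(df, `1999`, `2000`, key = "year", value = "cases")

# convert = TRUE runs type.convert() on the key column, so year becomes numeric.
long_int <- gather(df, `1999`, `2000`, key = "year", value = "cases",
                   convert = TRUE)

class(long_chr$year)
class(long_int$year)
```

With the converted column there is no need to wrap year in as.numeric() when plotting.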
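The reshaped dataset can then be piped straight into ggplot(). year is a character column, so convert it with as.numeric():
df %>% gather(`1999`, `2000`, key = "year", value = "cases") %>%
  ggplot() +
  geom_line(aes(x = as.numeric(year), y = cases, col = country))
tidyr::separate()
E.g. dataset of rate (cases / population) for each country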
df
## # A tibble: 6 x 3
##   country      year rate
##   <chr>       <dbl> <chr>
## 1 Afghanistan  1999 745/19987071
## 2 Afghanistan  2000 2666/20595360
## 3 Brazil       1999 37737/172006362
## 4 Brazil       2000 80488/174504898
## 5 China        1999 212258/1272915272
## 6 China        2000 213766/1280428583
How to get cases and population into columns of their own?
Solution: Use tidyr’s separate()
df %>% separate(rate, into = c("cases", "population"), sep = "/")
## # A tibble: 6 x 4
##   country      year cases  population
##   <chr>       <dbl> <chr>  <chr>
## 1 Afghanistan  1999 745    19987071
## 2 Afghanistan  2000 2666   20595360
## 3 Brazil       1999 37737  172006362
## 4 Brazil       2000 80488  174504898
## 5 China        1999 212258 1272915272
## 6 China        2000 213766 1280428583
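The new cases and population columns are still character (<chr>), so they cannot be used in arithmetic yet. A minimal sketch of one fix, assuming the table is rebuilt by hand with tibble() (values copied from the output above; split_df and rate_num are illustrative names): separate() also accepts convert = TRUE, which runs type.convert() on the new columns.

```r
library(tidyr)
library(tibble)

# Rebuild the example table by hand (values copied from the slide output).
df <- tibble(
  country = c("Afghanistan", "Afghanistan", "Brazil", "Brazil", "China", "China"),
  year    = c(1999, 2000, 1999, 2000, 1999, 2000),
  rate    = c("745/19987071", "2666/20595360", "37737/172006362",
              "80488/174504898", "212258/1272915272", "213766/1280428583")
)

# convert = TRUE turns the split pieces into numbers instead of strings.
split_df <- separate(df, rate, into = c("cases", "population"),
                     sep = "/", convert = TRUE)

# The numeric columns can now be used directly, e.g. to recompute the rate.
rate_num <- split_df$cases / split_df$population
split_df
```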
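Data is usually stored in a .txt (text) or .csv (comma-separated values) file.
Absolute path: C://Users/yuchen/Downloads/datafile.csv (on Windows) or /Users/yuchen/Downloads/datafile.csv (on Mac).
Relative path, which depends on the working directory:
If the working directory is /Users/yuchen: ./Downloads/datafile.csv
If the working directory is /Users/yuchen/Downloads: ./datafile.csv or simply datafile.csv
Get the current working directory: getwd()
Set a new working directory: setwd("<path of new directory>")
None to D4: drought levels of increasing severity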