STATS 60 Rlab Session 1

Yuchen Wu

2020/6/24

Lab Objectives

What is R?

Why learn R?

Reason #1: R was specifically designed for statistics and data analysis.

- Many built-in features for data analysis that other programming languages lack

- Interactive, well-suited for exploratory data analysis and rapid prototyping

- Good graphical capabilities

Plotting with R

Advanced Plotting with R

Example: Map of 2016 U.S. presidential elections

Why learn R?

(Source: Stack Overflow)

Why learn R?

Reason #3: It’s easy to get started with R.

Why learn R?

Reason #4: Analyses done in R are reproducible.

Installing R and RStudio

The Four RStudio Windows

R as a calculator

You can use R as a high-powered calculator. For example:

1 + 2
## [1] 3
456 * 7
## [1] 3192
5 / 2
## [1] 2.5
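The same idea extends to other arithmetic operators. This short sketch (not part of the original lab) shows exponentiation, integer division, the remainder (modulo) operator, and a built-in function, all available in base R:

```r
2^10       # exponentiation: 2 to the power 10
## [1] 1024
7 %/% 2    # integer division: how many whole times 2 fits into 7
## [1] 3
7 %% 2     # remainder after dividing 7 by 2
## [1] 1
sqrt(16)   # built-in functions work too
## [1] 4
```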

Commenting

The # sign starts a comment: R ignores everything from the # to the end of the line. A comment can take up a whole line or follow a command on the same line.

# This is a comment. This line is not evaluated by R

2 + 3  # R evaluates everything before the # on this line
## [1] 5

It’s highly recommended to add comments to your code. Comments not only help other readers understand your code, they also help you remember what your code is doing. Well-documented programs often have more lines of comments than lines of code.

Working Directories

A working directory is the default folder (or directory) that R reads data from and writes data to.

getwd()  # Return the current working directory 
## [1] "/Users/wuyc/Documents/Courses/TA/stats 60/Rlab 2020/session 1"
setwd("~/Desktop")  # change working directory

Since different projects may use files with the same names, it is helpful to use a separate working directory for each project or assignment.
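To see what "default folder" means in practice, here is a minimal sketch: a file name without a folder (a relative path) is looked up in the working directory. To avoid touching your Desktop, it assumes the session's temporary folder from tempdir(); the file name example.csv and the variables old_wd and dat are hypothetical.

```r
# setwd() changes the working directory and invisibly returns the old one,
# so we can switch back at the end.
old_wd <- setwd(tempdir())

# "example.csv" has no folder in its name, so it is written into the
# current working directory, i.e. tempdir().
write.csv(data.frame(x = 1:3), "example.csv", row.names = FALSE)
file.exists(file.path(tempdir(), "example.csv"))
## [1] TRUE

# Reading follows the same rule.
dat <- read.csv("example.csv")

setwd(old_wd)  # restore the original working directory
```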

Quitting R

To quit an R session, run the quit function:

q()  # quit R

Variables

A variable is a symbol that stands for a value (just like x in algebra). We can create a variable by assigning a value to it with the <- operator. If we then type the name of the variable, R prints its value.

x <- 4
x
## [1] 4

Variables

The variable now stands for the value that it contains, so we can perform operations on it and get the same answer as if we used the value itself.

x + 3
## [1] 7
x == 5
## [1] FALSE
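Note that == is a comparison, not an assignment: it asks whether x equals 5 and returns a logical value (TRUE or FALSE). A short sketch of the other comparison operators, assuming x still holds 4:

```r
x <- 4
x != 5   # "not equal to"
## [1] TRUE
x >= 4   # "greater than or equal to"
## [1] TRUE
x < 2    # "less than"
## [1] FALSE
```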

Variables

We can change the value of a variable by simply assigning a new value to it.

x <- x + 1
x
## [1] 5

Types of variables

Apart from numbers, R supports a number of different “types” of variables. The most commonly used ones are numeric variables, character variables (i.e. strings), factor variables, and boolean (or logical) variables.

We can check the type of a variable by using the typeof function:

typeof("1")
## [1] "character"
typeof(TRUE)
## [1] "logical"
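Numeric and factor variables are mentioned above but not shown. This sketch fills that in. For a factor, class() is more informative than typeof(), because typeof() reports how R stores the values internally. The variable name f and its values are illustrative only.

```r
typeof(2.5)    # ordinary numbers are stored as "double"
## [1] "double"
typeof(2L)     # the L suffix creates an integer
## [1] "integer"

f <- factor(c("low", "high", "low"))  # a categorical (factor) variable
class(f)       # what kind of object f is
## [1] "factor"
typeof(f)      # factors are stored internally as integer codes
## [1] "integer"
levels(f)      # the categories, sorted alphabetically by default
## [1] "high" "low"
```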

Types of variables

We can convert a value to another type using the matching as.* function, for example as.character(), as.numeric(), or as.logical(). This process is called “coercion”. For example, the following code changes the number 6507232300 to the string "6507232300":

as.character(6507232300)
## [1] "6507232300"
typeof(6507232300)
## [1] "double"
typeof(as.character(6507232300))
## [1] "character"

Types of variables

We can also convert values to numbers or to logical values. as.logical() turns 0 into FALSE and any other number into TRUE.

as.numeric("123")
## [1] 123
as.logical(123)
## [1] TRUE
as.logical(0)
## [1] FALSE

Types of variables

Sometimes type conversion fails. R then returns NA (a missing value) and prints a warning:

as.numeric("def")
## Warning: NAs introduced by coercion
## [1] NA

Sometimes type conversion does not work as you might expect. Always check that the result is what you want!

as.logical("123")
## [1] NA
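One way to check a conversion is to look for NA values with is.na(). This sketch converts several strings at once; the name converted is illustrative, and suppressWarnings() only hides the "NAs introduced by coercion" warning shown above. It also shows the few spellings that as.logical() accepts for character input.

```r
# "three" cannot be read as a number, so it becomes NA
converted <- suppressWarnings(as.numeric(c("1", "2", "three")))
converted
## [1]  1  2 NA
is.na(converted)        # TRUE marks the values that failed to convert
## [1] FALSE FALSE  TRUE
any(is.na(converted))   # did anything fail?
## [1] TRUE

# as.logical() recognizes "TRUE", "true", "True", "T" (and the FALSE versions);
# any other string, such as "yes", becomes NA
as.logical(c("TRUE", "true", "T", "yes"))
## [1] TRUE TRUE TRUE   NA
```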

Installing and Loading R Packages

When you download and install R for the first time, you are installing the base R software. Base R contains most of the functions you will use on a daily basis, like mean() and hist(). However, it only includes functions written by the core R developers. If you want to use data and code written by other people, you will need to install them as a package. An R package is a bundle of related material, from functions and data sets to help pages and vignettes (worked examples), stored in one neat package.

Installing a package simply means downloading the package code onto your personal computer.

install.packages("MASS")

If you want to use something from a package, like a function or a data set, you need to load the package in your R session first.

library("MASS")
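Once MASS is loaded, its contents can be used directly. A minimal sketch, assuming MASS is installed (it ships as a recommended package with most R installations), using the Boston data set that comes with it. The pkg::name form reaches a single object in a package without needing library() first.

```r
library("MASS")      # load (attach) the package for this session
dim(Boston)          # Boston is a data set included in MASS
## [1] 506  14
dim(MASS::Boston)    # the same data set, named explicitly through its package
## [1] 506  14
```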

Installing and Loading R Packages

An R package is like a lightbulb. First you need to order it with install.packages() (you only do this once). Then, every time you want to use it, you need to turn it on with library().

Getting Help

For help on a built-in function in R, type ? followed by the name of the function, or call the help() function. For example:

?mean
help(mean)

You can also use the search bar in the Help tab in the bottom-right panel of RStudio.
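If you only half-remember a function's name, apropos() lists the objects whose names contain a given piece of text (ignoring case by default), and exists() checks for an exact name. To search the text of the help pages instead, ??regression is a shortcut for help.search("regression"). A small sketch:

```r
# All objects on the search path whose names contain "mean",
# e.g. mean, weighted.mean, colMeans (the exact list depends on loaded packages)
apropos("mean")

# Is there a function or variable with exactly this name?
exists("weighted.mean")
## [1] TRUE
```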

Vectors

alphabet <- c("a", "b", "c")
alphabet
## [1] "a" "b" "c"
numbers <- c(5,4,3,2,1)
numbers
## [1] 5 4 3 2 1
vec <- 1:10
vec
##  [1]  1  2  3  4  5  6  7  8  9 10
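A vector holds values of a single type. If you combine different types with c(), R silently coerces everything to the most flexible type, as this sketch shows (the name mixed is illustrative):

```r
mixed <- c(1, "a", TRUE)   # a number, a string and a logical
mixed                      # everything has become a string
## [1] "1"    "a"    "TRUE"
typeof(mixed)
## [1] "character"
c(1, TRUE, FALSE)          # mixed with numbers, TRUE and FALSE become 1 and 0
## [1] 1 1 0
```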

Vectors

even <- 1:50 * 2
even
##  [1]   2   4   6   8  10  12  14  16  18  20  22  24  26  28  30  32  34
## [18]  36  38  40  42  44  46  48  50  52  54  56  58  60  62  64  66  68
## [35]  70  72  74  76  78  80  82  84  86  88  90  92  94  96  98 100
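This works because arithmetic on vectors is applied element by element, and a single number is reused for every element. It also relies on operator precedence: : is evaluated before *, so 1:50 * 2 means (1:50) * 2. A short sketch:

```r
c(1, 2, 3) + c(10, 20, 30)        # element-by-element addition
## [1] 11 22 33
c(1, 2, 3) * 2                    # the 2 is applied to every element
## [1] 2 4 6
identical(1:50 * 2, (1:50) * 2)   # ':' binds more tightly than '*'
## [1] TRUE
1:(5 * 2)                         # parentheses change the order
##  [1]  1  2  3  4  5  6  7  8  9 10
```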

Vectors

How can we get the odd numbers from 1 to 100 from the vector even?

odd <- even - 1
odd
##  [1]  1  3  5  7  9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45
## [24] 47 49 51 53 55 57 59 61 63 65 67 69 71 73 75 77 79 81 83 85 87 89 91
## [47] 93 95 97 99

Vectors: the seq() function

The seq() function is a more flexible version of a:b.

# Create the numbers from 1 to 10 in steps of 1
seq(from = 1, to = 10, by = 1)
##  [1]  1  2  3  4  5  6  7  8  9 10
# Integers from 0 to 100 in steps of 10
seq(from = 0, to = 100, by = 10)
##  [1]   0  10  20  30  40  50  60  70  80  90 100
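Two more ways seq() is more flexible than a:b: you can ask for a fixed number of equally spaced values with length.out, and you can count down with a negative step.

```r
seq(from = 0, to = 1, length.out = 5)   # five equally spaced values from 0 to 1
## [1] 0.00 0.25 0.50 0.75 1.00
seq(from = 10, to = 1, by = -3)         # count down in steps of 3
## [1] 10  7  4  1
```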

Vectors: the rep() function

The rep() function allows you to repeat a scalar (or vector) a specified number of times, or to a desired length.

rep(x = 3, times = 10)
##  [1] 3 3 3 3 3 3 3 3 3 3
rep(x = c(1, 2), each = 3)
## [1] 1 1 1 2 2 2
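The description above also mentions repeating "to a desired length", which the examples do not show. That is the length.out argument; and times can itself be a vector giving one repeat count per element:

```r
rep(x = c(1, 2), length.out = 5)        # keep repeating until there are 5 elements
## [1] 1 2 1 2 1
rep(x = c("a", "b"), times = c(2, 3))   # "a" twice, then "b" three times
## [1] "a" "a" "b" "b" "b"
```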

Vectors: Indexing

To extract a subset of elements by their indices, put a vector of indices in square brackets. R starts counting at 1, so even[1] is the first element:

even
##  [1]   2   4   6   8  10  12  14  16  18  20  22  24  26  28  30  32  34
## [18]  36  38  40  42  44  46  48  50  52  54  56  58  60  62  64  66  68
## [35]  70  72  74  76  78  80  82  84  86  88  90  92  94  96  98 100
even[1]
## [1] 2

even[3:7]
## [1]  6  8 10 12 14

even[c(3,5)]
## [1]  6 10
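Besides numbers, you can also put a vector of TRUE/FALSE values in the square brackets; R then keeps the elements where the value is TRUE. A sketch using a comparison to build that vector (the name big is illustrative):

```r
even <- 1:50 * 2   # same vector as above
big <- even > 90   # TRUE/FALSE for each of the 50 elements
sum(big)           # TRUE counts as 1, so this counts the matches
## [1] 5
even[big]          # keep only the elements greater than 90
## [1]  92  94  96  98 100
```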

Vectors: Negative indexing

To extract all elements except a few, put a minus sign before the vector of indices:

even[-c(1,2)]
##  [1]   6   8  10  12  14  16  18  20  22  24  26  28  30  32  34  36  38
## [18]  40  42  44  46  48  50  52  54  56  58  60  62  64  66  68  70  72
## [35]  74  76  78  80  82  84  86  88  90  92  94  96  98 100

Vectors: Length

Use the length() function to find out how many elements a vector has:

length(even)
## [1] 50
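length() is handy inside indexing, for example to get the last elements without knowing how long the vector is. Note that asking for a position past the end returns NA rather than an error:

```r
even <- 1:50 * 2                        # same vector as above
even[length(even)]                      # the last element
## [1] 100
even[(length(even) - 2):length(even)]   # the last three elements
## [1]  96  98 100
even[51]                                # beyond the end of the vector
## [1] NA
```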

Matrices and arrays

Matrices are two-dimensional analogs of vectors; arrays extend the idea to more dimensions.

A <- matrix(1:12, nrow = 3)
A
##      [,1] [,2] [,3] [,4]
## [1,]    1    4    7   10
## [2,]    2    5    8   11
## [3,]    3    6    9   12
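As the output shows, matrix() fills the values in column by column. To fill row by row instead, set byrow = TRUE (the name B is illustrative):

```r
B <- matrix(1:6, nrow = 2, byrow = TRUE)
B
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
```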

Indexing: put the rows you want before the comma and the columns you want after it.

A[1, 2]
## [1] 4
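Leaving one side of the comma empty selects everything along that dimension, and either side can be a vector of indices:

```r
A <- matrix(1:12, nrow = 3)   # same matrix as above
A[2, ]            # empty column index: the whole second row
## [1]  2  5  8 11
A[, 3]            # empty row index: the whole third column
## [1] 7 8 9
A[1:2, c(1, 4)]   # rows 1 and 2 of columns 1 and 4
##      [,1] [,2]
## [1,]    1   10
## [2,]    2   11
```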

Matrices and arrays

To get the dimensions of the matrix, we can use the dim, nrow and ncol functions:

dim(A)
## [1] 3 4
nrow(A)
## [1] 3
ncol(A)
## [1] 4
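The heading also mentions arrays. A minimal sketch: array() takes the values and a vector of dimensions, fills them column by column like matrix(), and is indexed with one position per dimension (the name arr is illustrative):

```r
# 2 rows x 3 columns x 4 layers, filled with the numbers 1 to 24
arr <- array(1:24, dim = c(2, 3, 4))
dim(arr)
## [1] 2 3 4
arr[1, 2, 3]   # row 1, column 2 of the third layer
## [1] 15
```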

Lists

In vectors and matrices, all elements must be of the same type. To store elements of different types in one data structure, we can use a list, which we create with list(). We can think of a list as a collection of key-value pairs, where the keys (the element names) are strings.

person <- list(name = "John Doe", age = 26)
person
## $name
## [1] "John Doe"
## 
## $age
## [1] 26

Lists

The str function can be used to inspect what is inside person:

str(person)
## List of 2
##  $ name: chr "John Doe"
##  $ age : num 26

To access the name element of person, we have two options:

person[["name"]]
## [1] "John Doe"
person$name
## [1] "John Doe"
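The two options are not fully interchangeable. Single brackets return a smaller list rather than the element itself, and only [[ ]] can look up a name stored in a variable, while $ always uses the literal name you type. A sketch (the variable key is illustrative):

```r
person <- list(name = "John Doe", age = 26)
class(person["name"])     # single brackets: a list containing the element
## [1] "list"
class(person[["name"]])   # double brackets: the element itself
## [1] "character"

key <- "age"
person[[key]]             # [[ ]] uses the value of key, i.e. "age"
## [1] 26
person$key                # $ looks for an element literally called "key"
## NULL
```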

Lists

The elements of a list can be anything, even another data structure! Let’s add the names of John’s children to the person object:

person$children <- c("Ross", "Robert")
str(person)
## List of 3
##  $ name    : chr "John Doe"
##  $ age     : num 26
##  $ children: chr [1:2] "Ross" "Robert"

To see the keys associated with a list, use the names() function:

names(person)
## [1] "name"     "age"      "children"
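Just as assigning to person$children added an element, assigning NULL to an element removes it from the list. A short sketch, rebuilding person first so it runs on its own:

```r
person <- list(name = "John Doe", age = 26, children = c("Ross", "Robert"))
person$age <- NULL   # remove the age element
names(person)
## [1] "name"     "children"
length(person)
## [1] 2
```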

Exercises

  1. Create the vector [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] in three ways: once using c(), once using a:b, and once using seq()

  2. Create a vector that repeats the integers from 1 to 5, 10 times. That is [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, …]. The length of the vector should be 50.

  3. Create the vector [101, 102, 103, 200, 205, 210, 1000, 1100, 1200] using a combination of the c() and seq() functions