Yuchen Wu
2020/6/24
-Many features to aid data analysis which other programming languages don’t have
-Interactive, well-suited for exploratory data analysis and rapid prototyping
-Good graphical capabilities
You can use R as a high-powered calculator. For example,
1 + 2
## [1] 3
456 * 7
## [1] 3192
5 / 2
## [1] 2.5
The # sign creates a comment so that the text is not read as an R command. It can be added at the beginning of the line or at the end of a command.
# This is a comment. This line is not evaluated by R
2 + 3 # R evaluates whatever comes before the # on this line
## [1] 5
It’s highly recommended to add comments to your code. Not only do comments help other readers understand your code, but they also make sure that you can remember what your code is doing. Well-documented programs will often have more comments than actual lines of code.
A working directory is the default folder (or directory) from which R reads and to which it writes data.
getwd() # Return the current working directory
## [1] "/Users/wuyc/Documents/Courses/TA/stats 60/Rlab 2020/session 1"
setwd("~/Desktop") # change working directory
Since different data sets may use the same variable names, it is helpful to use a different working directory for each project or assignment.
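As a small sketch of how the working directory affects file paths, the code below switches to a temporary folder, writes a file using a relative path, and then switches back. The file name note.txt and the variable old_dir are made up for illustration.

```r
old_dir <- getwd()               # remember where we started
setwd(tempdir())                 # switch to R's temporary folder
writeLines("hello", "note.txt")  # relative path: the file lands in the working directory
file.exists("note.txt")
## [1] TRUE
setwd(old_dir)                   # go back to the original folder
```

Saving the old directory first makes it easy to return to it once we are done.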
To quit an R session, run the quit function:
q() # Quit R
A variable is a symbol that stands for another value (just like “X” in algebra). We can create a variable by assigning a value to it using the “<-” operator. If we then type the name of the variable, R will print out its value.
x <- 4
x
## [1] 4
The variable now stands for the value that it contains, so we can perform operations on it and get the same answer as if we used the value itself.
x + 3
## [1] 7
x == 5
## [1] FALSE
We can change the value of a variable by simply assigning a new value to it.
x <- x + 1
x
## [1] 5
Apart from numbers, R supports a number of different “types” of variables. The most commonly used ones are numeric variables, character variables (i.e. strings), factor variables, and boolean (or logical) variables.
We can check the type of a variable by using the typeof() function:
typeof("1")
## [1] "character"
typeof(TRUE)
## [1] "logical"
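Factor variables were mentioned above but not shown. Here is a minimal sketch (the variable name sizes is made up for illustration). Note that typeof() reports how a value is stored internally, so for a factor it is the class() function that tells us we have a factor.

```r
sizes <- factor(c("small", "large", "small"))  # a factor with two categories
class(sizes)
## [1] "factor"
typeof(sizes)    # factors are stored as integer codes
## [1] "integer"
levels(sizes)    # the distinct categories, sorted alphabetically by default
## [1] "large" "small"
```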
We can change the type of a variable to type x using the function as.x. This process is called “coercion”. For example, the following code changes the number 6507232300 to the string "6507232300":
as.character(6507232300)
## [1] "6507232300"
typeof(6507232300)
## [1] "double"
typeof(as.character(6507232300))
## [1] "character"
We can also change variables to numbers or boolean variables.
as.numeric("123")
## [1] 123
as.logical(123)
## [1] TRUE
as.logical(0)
## [1] FALSE
Sometimes type conversion might not work:
as.numeric("def")
## Warning: NAs introduced by coercion
## [1] NA
Sometimes type conversion does not work as you might expect. Always check that the result is what you want!
as.logical("123")
## [1] NA
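One way to check the result of a conversion is to look for NA values with is.na(). The sketch below uses a made-up vector of strings; suppressWarnings() is only there to hide the coercion warning we already expect.

```r
as.logical("TRUE")   # strings like "TRUE" or "false" are recognized
## [1] TRUE
as.logical("yes")    # but "yes" is not
## [1] NA
result <- suppressWarnings(as.numeric(c("12", "abc", "3.5")))
is.na(result)        # TRUE marks the entries that could not be converted
## [1] FALSE  TRUE FALSE
```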
When you download and install R for the first time, you are installing the Base R software. Base R will contain most of the functions you will use on a daily basis like mean() and hist(). However, only functions written by the original authors of the R language will appear here. If you want to access data and code written by other people, you will need to install it as a package. An R package is simply a bunch of data, from functions, to help menus, to vignettes (examples), stored in one neat package.
Installing a package simply means downloading the package code onto your personal computer.
install.packages("MASS")
If you want to use something, like a function or dataset, from a package you always need to load the package in your R session first.
library("MASS")
An R package is like a lightbulb. First you need to order it with install.packages(). Then, every time you want to use it, you need to turn it on with library().
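Since a package only needs to be installed once, we may want to skip install.packages() when the package is already there. The sketch below is one possible way to do this; ensure_package is a hypothetical helper written for this example, not a built-in R function.

```r
# Hypothetical helper: install a package only if it is not already available
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)   # downloads the package; needs an internet connection
  }
  invisible(requireNamespace(pkg, quietly = TRUE))  # TRUE if the package can now be loaded
}

ensure_package("stats")     # "stats" ships with R, so nothing is downloaded here
```

In the same way, we could call ensure_package("MASS") once and then load the package with library("MASS") in every session that needs it.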
For help on a built-in function in R, use ? followed by the name of the function, or apply the “help()” function. For example:
?mean
help(mean)
You can also use the search bar in the help tab in the bottom right panel of RStudio.
We can create a vector using the c() function, or using the : shortcut:
alphabet <- c("a", "b", "c")
alphabet
## [1] "a" "b" "c"
numbers <- c(5,4,3,2,1)
numbers
## [1] 5 4 3 2 1
vec <- 1:10
vec
## [1] 1 2 3 4 5 6 7 8 9 10
even <- 1:50 * 2
even
## [1] 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34
## [18] 36 38 40 42 44 46 48 50 52 54 56 58 60 62 64 66 68
## [35] 70 72 74 76 78 80 82 84 86 88 90 92 94 96 98 100
How can we get the odd numbers from 1 to 100 from even?
odd <- even - 1
odd
## [1] 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45
## [24] 47 49 51 53 55 57 59 61 63 65 67 69 71 73 75 77 79 81 83 85 87 89 91
## [47] 93 95 97 99
The seq() function is a more flexible version of a:b.
# Create the numbers from 1 to 10 in steps of 1
seq(from = 1, to = 10, by = 1)
## [1] 1 2 3 4 5 6 7 8 9 10
# Integers from 0 to 100 in steps of 10
seq(from = 0, to = 100, by = 10)
## [1] 0 10 20 30 40 50 60 70 80 90 100
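Two more things seq() can do that a:b cannot, shown as a short sketch: we can ask for a given number of evenly spaced values with the length.out argument, and we can count downwards with a negative step.

```r
# Five evenly spaced numbers from 0 to 1
seq(from = 0, to = 1, length.out = 5)
## [1] 0.00 0.25 0.50 0.75 1.00
# Count down from 10 in steps of 3
seq(from = 10, to = 1, by = -3)
## [1] 10  7  4  1
```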
The rep() function allows you to repeat a scalar (or vector) a specified number of times, or to a desired length.
rep(x = 3, times = 10)
## [1] 3 3 3 3 3 3 3 3 3 3
rep(x = c(1, 2), each = 3)
## [1] 1 1 1 2 2 2
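The examples above use times with a single number and each with a vector. As a small sketch of the other cases, times also works on a whole vector, and length.out repeats the input until the result has the desired length.

```r
rep(x = c("a", "b"), times = 3)   # repeat the whole vector 3 times
## [1] "a" "b" "a" "b" "a" "b"
rep(x = 1:3, length.out = 7)      # keep repeating until we have 7 elements
## [1] 1 2 3 1 2 3 1
```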
To extract a subset of elements by their indices, put a vector of indices in square brackets:
even
## [1] 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34
## [18] 36 38 40 42 44 46 48 50 52 54 56 58 60 62 64 66 68
## [35] 70 72 74 76 78 80 82 84 86 88 90 92 94 96 98 100
even[1]
## [1] 2
even[3:7]
## [1] 6 8 10 12 14
even[c(3,5)]
## [1] 6 10
To extract all except a few indices, put a negative sign before the vector of indices:
even
## [1] 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34
## [18] 36 38 40 42 44 46 48 50 52 54 56 58 60 62 64 66 68
## [35] 70 72 74 76 78 80 82 84 86 88 90 92 94 96 98 100
even[-c(1,2)]
## [1] 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38
## [18] 40 42 44 46 48 50 52 54 56 58 60 62 64 66 68 70 72
## [35] 74 76 78 80 82 84 86 88 90 92 94 96 98 100
Use the length() function to figure out how many elements there are in a vector:
even
## [1] 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34
## [18] 36 38 40 42 44 46 48 50 52 54 56 58 60 62 64 66 68
## [35] 70 72 74 76 78 80 82 84 86 88 90 92 94 96 98 100
length(even)
## [1] 50
Matrices are two-dimensional analogs of vectors. We can create one with the matrix() function:
A <- matrix(1:12, nrow = 3)
A
##      [,1] [,2] [,3] [,4]
## [1,]    1    4    7   10
## [2,]    2    5    8   11
## [3,]    3    6    9   12
Indexing: put the rows you want before the comma, columns you want after the comma
A[1, 2]
## [1] 4
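The same rule covers whole rows, whole columns, and blocks. As a short sketch using the matrix A from above (recreated here so the code runs on its own): leaving one side of the comma empty means "take everything" along that dimension.

```r
A <- matrix(1:12, nrow = 3)
A[2, ]            # second row, all columns
## [1]  2  5  8 11
A[, 3]            # all rows, third column
## [1] 7 8 9
A[1:2, c(1, 4)]   # rows 1 and 2, columns 1 and 4
##      [,1] [,2]
## [1,]    1   10
## [2,]    2   11
```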
To get the dimensions of the matrix, we can use the dim(), nrow() and ncol() functions:
dim(A)
## [1] 3 4
nrow(A)
## [1] 3
ncol(A)
## [1] 4
In all the data structures so far, the elements have to be of the same type. To have elements of different types in one data structure, we can use a list, which we create with list(). We can think of a list as a collection of key-value pairs. Keys should be strings.
person <- list(name = "John Doe", age = 26)
person
## $name
## [1] "John Doe"
##
## $age
## [1] 26
The str() function can be used to inspect what is inside person:
str(person)
## List of 2
## $ name: chr "John Doe"
## $ age : num 26
To access the name element of person, we have two options:
person[["name"]]
## [1] "John Doe"
person$name
## [1] "John Doe"
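A related point that often causes confusion: double brackets return the element itself, while single brackets return a smaller list that contains the element. A minimal sketch, recreating person so the code runs on its own:

```r
person <- list(name = "John Doe", age = 26)
typeof(person[["name"]])   # double brackets: the element itself
## [1] "character"
typeof(person["name"])     # single brackets: a list of length 1
## [1] "list"
person[["age"]] + 1        # so arithmetic works on the extracted element
## [1] 27
```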
The elements of a list can be anything, even another data structure! Let’s add the names of John’s children to the person object:
person$children <- c("Ross", "Robert")
str(person)
## List of 3
## $ name : chr "John Doe"
## $ age : num 26
## $ children: chr [1:2] "Ross" "Robert"
To see the keys associated with a list, use the names() function:
names(person)
## [1] "name" "age" "children"
Create the vector [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] in three ways: once using c(), once using a:b, and once using seq()
Create a vector that repeats the integers from 1 to 5, 10 times. That is [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, …]. The length of the vector should be 50.
Create the vector [101, 102, 103, 200, 205, 210, 1000, 1100, 1200] using a combination of the c() and seq() functions