Yuchen Wu
2020/8/12
HW4 extended to Saturday 6PM PT
Fill in the form about the final project if you haven’t done so. Only one team member needs to fill the form on behalf of the group.
A hint was added to HW4 Problem 2(b); the updated homework is posted on the website.
Function calls for hypothesis testing
User-defined functions
cars <- list(make = "Honda",
             models = c("Fit", "CR-V", "Odyssey"),
             available = c(TRUE, TRUE, TRUE))
Use [[ or $ notation to refer to a specific key-value pair.
cars$make # no quotation marks
## [1] "Honda"
cars[["models"]] # remember quotation marks!
## [1] "Fit" "CR-V" "Odyssey"
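One detail worth noting (a small supplementary example, reusing the cars list from above): single brackets return a sub-list, while double brackets or $ return the element itself.

```r
cars <- list(make = "Honda",
             models = c("Fit", "CR-V", "Odyssey"),
             available = c(TRUE, TRUE, TRUE))

class(cars["models"])    # "list" -- a sub-list of length one
class(cars[["models"]])  # "character" -- the vector stored under that key
cars[["models"]][2]      # "CR-V" -- index into the extracted vector
```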
The t-test is any statistical hypothesis test in which the test statistic follows a Student’s t-distribution under the null hypothesis.
Various forms of t-test
A one-sample location test: tests whether the mean of a population equals a value specified in the null hypothesis.
Student’s t-test: tests whether the means of two populations are equal; the variances of the two populations are assumed to be equal.
Welch’s t-test: tests whether the means of two populations are equal; the variances of the two populations are not assumed to be equal.
Paired two-sample t-test: tests whether the means of two populations are equal when the statistical units are paired.
# one-sample t-test
t.test(y, mu = 3) # H0: mu = 3
# Student's t-test
t.test(y ~ x, var.equal = TRUE) # where y is numeric and x is a binary factor
# Welch's t-test
t.test(y1, y2, var.equal = FALSE) # where y1 and y2 are numeric
# paired t-test
t.test(y1, y2, paired = TRUE) # where y1 and y2 are numeric
# record the test result in a list called "test_result"
test_result <- t.test(x, y)
# extract the value of the t statistics
test_result$statistic
# extract the p-value
test_result$p.value
# extract the confidence interval
test_result$conf.int
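The calls above assume y, y1, and y2 already exist. A self-contained run with simulated data (the data here are made up for illustration, not from the slides) shows what the extracted components look like:

```r
set.seed(1)                       # simulated data, for illustration only
y1 <- rnorm(30, mean = 5.0, sd = 1)
y2 <- rnorm(30, mean = 5.5, sd = 2)

test_result <- t.test(y1, y2)     # Welch's t-test (var.equal = FALSE is the default)
test_result$statistic             # named numeric: the t statistic
test_result$p.value               # the p-value as a plain number
test_result$conf.int              # 95% CI for the difference in means
names(test_result)                # all components stored in the result list
```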
From a sample of 150 students visiting the Health and Wellness center, 83 had obtained a flu shot. Find a 90% confidence interval for the percentage of students who have received a flu shot.
\(H_0\): the percentage of students who have received a flu shot is 0.5.
# Estimate the parameter of a binomial distribution
p.hat <- 83/150
p.hat
## [1] 0.5533333
# CI based on normal distribution
p.hat - 1.645*sqrt(p.hat*(1-p.hat)/149)
## [1] 0.4863359
p.hat + 1.645*sqrt(p.hat*(1-p.hat)/149)
## [1] 0.6203307
# CI based on t distribution
p.hat - qt(0.95, 149)*sqrt(p.hat*(1-p.hat)/149)
## [1] 0.4859228
p.hat - qt(0.05, 149)*sqrt(p.hat*(1-p.hat)/149)
## [1] 0.6207439
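As a cross-check (not in the original slides), base R provides ready-made interval procedures for a binomial proportion; both should land close to the hand-computed bounds above:

```r
# Exact (Clopper-Pearson) 90% interval for 83 successes out of 150 trials
binom.test(83, 150, conf.level = 0.90)$conf.int

# Normal-approximation interval (with Yates' continuity correction by default)
prop.test(83, 150, conf.level = 0.90)$conf.int
```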
A chi-square test is a statistical hypothesis test that is valid to perform when the test statistic is chi-square distributed under the null hypothesis.
library(MASS) # load the MASS package
tbl = table(survey$Smoke, survey$Exer)
tbl # the contingency table
##
## Freq None Some
## Heavy 7 1 3
## Never 87 18 84
## Occas 12 3 4
## Regul 9 1 7
test_result <- chisq.test(tbl)
test_result
##
## Pearson's Chi-squared test
##
## data: tbl
## X-squared = 5.4885, df = 6, p-value = 0.4828
test_result$statistic
## X-squared
## 5.488546
test_result$p.value
## [1] 0.4828422
data_frame <- read.csv("https://goo.gl/j6lRXD")
table(data_frame$treatment, data_frame$improvement)
##
## improved not-improved
## not-treated 26 29
## treated 35 15
chisq.test(data_frame$treatment, data_frame$improvement)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: data_frame$treatment and data_frame$improvement
## X-squared = 4.6626, df = 1, p-value = 0.03083
# the counts for categories A,B and C
x <- c(A = 80, B = 11, C = 9)
# testing if each category is equally likely
chisq.test(x)
##
## Chi-squared test for given probabilities
##
## data: x
## X-squared = 98.06, df = 2, p-value < 2.2e-16
# testing if each category occurs with specified probability
chisq.test(x, p = c(0.8, 0.1, 0.1))
##
## Chi-squared test for given probabilities
##
## data: x
## X-squared = 0.2, df = 2, p-value = 0.9048
\(H_0\): the true correlation coefficient is equal to 0
Permutation test: randomly permute the matching pairs in the original data set, compute the new correlation coefficient, repeat the experiments many times to simulate the distribution of the test statistics under the null distribution.
t-test: for pairs from an uncorrelated bivariate normal distribution, the sampling distribution of a certain function of Pearson’s correlation coefficient follows Student’s t-distribution.
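The permutation test described above can be sketched in a few lines. This is a minimal sketch with toy data (the variables x and y here are simulated, not from the slides): permuting y breaks the pairing, so the permuted correlations approximate the null distribution.

```r
set.seed(42)                      # toy data, for illustration only
x <- rnorm(20)
y <- 0.5 * x + rnorm(20)

r_obs <- cor(x, y)                # observed correlation coefficient

# Null distribution: randomly permute the matching pairs and recompute
r_null <- replicate(2000, cor(x, sample(y)))

# Two-sided permutation p-value: fraction of permuted |r| at least as extreme
p_perm <- mean(abs(r_null) >= abs(r_obs))
p_perm
```

A finite number of permutations only approximates the p-value; increasing the number of replicates tightens the approximation.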
test_result <- cor.test(x, y, method = c("pearson", "kendall", "spearman"))
test_result$statistic
test_result$p.value
test_result$conf.int
gpa <- c(3.45, 3.03, 2.67, 2.50, 3.16, 2.83)
distance_from_campus <- c(1.3, 0.8, 5.7, 0.5, 2.9, 3.1)
\[H_0: \rho \leq 0\]
\[ t = r \sqrt{\dfrac{n - 2}{1 - r^2}}, \qquad r = \dfrac{t}{\sqrt{n - 2 + t^2}} \]
plot(gpa, distance_from_campus)
r_hat <- cor(gpa, distance_from_campus)
t_stat <- r_hat * sqrt((6 - 2) / (1 - r_hat^2))
t_int_lower <- t_stat + qt(0.05, df = 6 - 2)
r_int_lower <- t_int_lower / sqrt(6 - 2 + t_int_lower^2)
print(r_int_lower)
## [1] -0.7897173
# "greater" corresponds to positive association, "less" to negative association
cortest <- cor.test(gpa, distance_from_campus, "greater")
cortest$conf.int
## [1] -0.8240341 1.0000000
## attr(,"conf.level")
## [1] 0.95
One of the great strengths of R is the user’s ability to add functions. In fact, many of the functions in R are actually functions of functions. The structure of a function is given below.
myfunction <- function(arg1, arg2, ...) {
  statements
  return(object)
}
Let’s start by defining a function fahrenheit_to_celsius that converts temperatures from Fahrenheit to Celsius:
fahrenheit_to_celsius <- function(temp_F) {
  temp_C <- (temp_F - 32) * 5 / 9
  return(temp_C)
}
# freezing point of water
fahrenheit_to_celsius(32)
## [1] 0
# boiling point of water
fahrenheit_to_celsius(212)
## [1] 100
Functions do not necessarily return a value!
rescale <- function(x)
{
  lower <- min(x, na.rm = TRUE)
  upper <- max(x, na.rm = TRUE)
  if(upper > lower)
  {
    # the parentheses around the whole expression matter:
    # return (x - lower) / (upper - lower) would exit the function
    # with x - lower before the division is ever evaluated
    return((x - lower) / (upper - lower))
  }
  else
  {
    print("x is a constant vector")
  }
}
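A quick check of both branches. The function is repeated here so the chunk runs on its own; note that the division must sit inside return(), since return(x - lower) alone would exit the function before dividing.

```r
rescale <- function(x)
{
  lower <- min(x, na.rm = TRUE)
  upper <- max(x, na.rm = TRUE)
  if(upper > lower)
  {
    return((x - lower) / (upper - lower))  # scale linearly onto [0, 1]
  }
  else
  {
    print("x is a constant vector")        # no value to return here
  }
}

rescale(c(2, 4, 6, 10))  # 0.00 0.25 0.50 1.00
rescale(c(7, 7, 7))      # prints "x is a constant vector"
```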