Description

5/5 - (1 vote)

Homework Setup

Name: Applied-Statistics Homework 5 Solved
SKU: 103705
Price: 35.00 USD
Availability: InStock

We’ll use the following packages for this homework assignment. We’ll also read in data from a csv file. To access the data, you’ll want to download the dataset from Canvas, and place it in the same folder as this R Markdown document. You’ll then be able to use the following code to load in the data.

library(ggplot2) library(MASS)

Exercise 1: Cat Model

For this exercise we will use the cats dataset from the MASS package. You should use ?cats to learn about the background of this dataset.

part a

Fit the following simple linear regression model in R. Use heart weight as the response and body weight as the predictor.

Yi = β0 + β1xi + i

Store the results in an R object called cat_model. Run a summary of the model and report the following:

The estimated value for _β₀, along with an interpretation
The estimated value for _β₁, along with an interpretation
The estimated value for _σ(no interpretation needed)
The estimated value for _R², along with an interpretation

Make sure that all interpretations include units and are given in the context of the problem.

# Use this code chunk for your answer. cat_model = lm(data = cats, Hwt ~ Bwt) summary(cat_model)

## Call:

## lm(formula = Hwt ~ Bwt, data = cats)

## Residuals:

## Min 1Q Median 3Q Max ## -3.5694 -0.9634 -0.0921 1.0426 5.1238

## Coefficients:

## Estimate Std. Error t value Pr(>|t|) ## (Intercept) -0.3567 0.6923 -0.515 0.607

## Bwt 4.0341 0.2503 16.119 <2e-16 ***

## —

## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ‘ 1

## Residual standard error: 1.452 on 142 degrees of freedom ## Multiple R-squared: 0.6466, Adjusted R-squared: 0.6441

F-statistic: 259.8 on 1 and 142 DF, p-value: < 2.2e-16

part b

Does the estimate for _β₀provide a meaningful value? Is it reliable? Explain.

part c

Calculate the estimated standard error for the estimated slope. That is, calculate the _{standard deviation}for the sampling distribution of the estimated slope.

Then, compare this value to that reported in the R output from Exercise 1a.

# Use this code chunk for your answer.

x_bar = mean(cats$Bwt) sxx = sum((cats$Bwt – x_bar)ˆ2) sqrt((1.452ˆ2)/sxx)

## [1] 0.2501972

sqrt((1.452ˆ2)/sxx) – 0.2503

## [1] -0.0001028456

Exercise 2: Cat Inference [30 points]

For this exercise, we will continue analyzing the model fit in Exercise 1.

part a

Let’s assume that for another animal, the relationship between body weight and heart weight is such that a 1 kg increase in body weight tends to result in a 3.25 g increase in heart weight, on average.

Might this same relationship be true for cats, or is there evidence that this relationship for cats is different? Use a t test to test:

H₀: β₁= 3.25 • H₁: β₁= 36 .25

Report the following in your document outside the code block.

The value of the updated test statistic
Whether the _p-value of the test will be less than 0.05, or greater than 0.05
A statistical decision at _α= 0_.05

# Use this code chunk for your answer, if needed.

summary(cat_model)

## Call:

## lm(formula = Hwt ~ Bwt, data = cats)

## Residuals:

## Min 1Q Median 3Q Max ## -3.5694 -0.9634 -0.0921 1.0426 5.1238

## Coefficients:

## Estimate Std. Error t value Pr(>|t|) ## (Intercept) -0.3567 0.6923 -0.515 0.607

## Bwt 4.0341 0.2503 16.119 <2e-16 ***

## —

## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ‘ 1

## Residual standard error: 1.452 on 142 degrees of freedom ## Multiple R-squared: 0.6466, Adjusted R-squared: 0.6441 ## F-statistic: 259.8 on 1 and 142 DF, p-value: < 2.2e-16

t_stat = (4.0341 – 3.25)/0.2503 t_stat

## [1] 3.132641

part b

Compute a 70% confidence interval for β₀. (You may use the _confintfunction for this assignment.) Then, based on this confidence interval, anticipate the decision of the following two hypothesis tests:

H_0: _β₀= 0 vs.H_1: _β₀=6 0
H_0: _β₀= -1 vs. H_1: _β₀=6 -1

Explain your reasoning.

# Use this code chunk for your answer.

confint(cat_model, parm = ‘(Intercept)’, level = 0.7,)

## 15 % 85 %

## (Intercept) -1.076791 0.3634664

part c

The inference procedures from parts a and b being valid relies on some assumptions being met. First, write out the four assumptions.

part d

Which of these four assumptions cannot be checked with a plot?

part e

Create the two plots (it’s ok if you end up with 4 plots) that we can use to check our assumptions for the cats data. Interpret these two plots. Be sure to specify if the assumptions seem reasonable and describe what you see that supports your conclusions.

# Use this code chunk for your answer.

plot(cat_model)

Fitted values

Theoretical Quantiles lm(Hwt ~ Bwt)

Fitted values lm(Hwt ~ Bwt)

Leverage lm(Hwt ~ Bwt)

shapiro.test(resid(cat_model))

## Shapiro-Wilk normality test

## data: resid(cat_model)

## W = 0.9845, p-value = 0.1046

library(lmtest)

## Loading required package: zoo

## Attaching package: ‘zoo’

## The following objects are masked from ‘package:base’:

## as.Date, as.Date.numeric

bptest(cat_model)

## studentized Breusch-Pagan test

## data: cat_model

## BP = 9.5294, df = 1, p-value = 0.002022

Exercise 3: Cat Model Predictions

Continue analyzing the cats dataset using the model generated in Exercise 1.

part a

Use a 99% confidence interval to estimate the mean heart weight for body weights of 2.1 kilograms as well as for 2.8 kilograms.

Which of the two intervals is wider? How might you use the variance formula for _yˆ(_x) to know beforehand which would be wider?

# Use this code chunk for your answer.

predict(cat_model, newdata = data.frame(‘Bwt’ = c(2.1, 2.8)), interval = ‘confidence’, level = 0.99)

## fit lwr upr ## 1 8.114869 7.599225 8.630513 ## 2 10.938713 10.618796 11.258630

mean(cats$Bwt)

## [1] 2.723611

sd(cats$Bwt)

## [1] 0.4853066

part b

Use a 99% prediction interval to predict the heart weights for body weights of 4.2 kilograms. Interpret this interval in context. Do you trust this prediction interval? Explain why or why not.

# Use this code chunk for your answer.

predict(cat_model, newdata = data.frame(‘Bwt’ = c(4.2)), interval = ‘prediction’, level = 0.99)

## fit lwr upr

## 1 16.5864 12.66088 20.51192

summary(cats$Bwt)

## Min. 1st Qu. Median Mean 3rd Qu. Max.

## 2.000 2.300 2.700 2.724 3.025 3.900

part c

A confidence interval for the mean heart weights is reported as: (13.3427, 14.1827). Determine the cat body weight that this confidence interval corresponds to.

# Use this code chunk, if needed, to perform any calculations.

bounds_avg = (13.3427 + 14.1827)/2

x_star = (bounds_avg – as.numeric(coef(cat_model)[1]))/as.numeric(coef(cat_model)[2]) x_star

## [1] 3.500035

part d

Suppose that I want to generate a confidence interval for the mean heart weights that is narrower than that described in part c. Describe two ways that I can create a narrower confidence interval.

Exercise 4: Simulations

Consider the model

Y = 4 + 0_x+ _ε

with

ε ∼ N(µ = 0,σ²= 36)

Before answering the following parts, set a seed value equal to _yourbirthday, as was done in class (this time in the format yyyymmdd).

birthday = 20010216 set.seed(birthday)

part a

Repeat the process of simulating _n= 125 observations from the above model 2500 times. Use the x created in the code chunk below for all iterations.

n = 125 x = runif(n, -2, 2) reps = 2500 beta0 = 4 beta1 = 0 sigma = 6

beta0hat = vector(length = reps) beta1hat = vector(length = reps)

for(i in 1:reps){

epsilon = rnorm(n, 0, sigma) y = beta0 + beta1 * x + epsilon mymodel = lm(y ~ x) beta0hat[i] = coef(mymodel)[1] ## change this line of code so 00 is now the estimated intercept beta1hat[i] = coef(mymodel)[2] ## change this line of code so 01 is now the estimated slope

}

ests = data.frame(cbind(beta0hat, beta1hat))

The general structure has been provided for you. The last two lines of the for loop have placeholder values of 00 and 01. Adjust these values that are saved at the end of the for loop to be the appropriate values for beta0hat and beta1hat.

part b

Plot a histogram of beta1hat values based on your simulation. Describe this histogram, including comments on the shape, center, spread, and outliers.

# Use this code chunk for your answer.

hist(beta1hat)

Histogram of beta1hat

beta1hat

part c

Import the skeptic.csv data (found on Canvas) and fit a SLR model. (Check the variable names in the skeptic dataset, as the names should indicate which to use as your y variable and which for your x variable). Print the estimated coefficient for _β₁.

# Use this code chunk for your answer.

setwd(“~/Desktop/data”) df = read.csv(“skeptic.csv”) lm1 = lm(y ~ x, data = df)

coef(lm1)[2]

## x

## 0.6895595

summary(lm1)

## Call:

## lm(formula = y ~ x, data = df)

## Residuals:

## Min 1Q Median 3Q Max ## -15.4987 -4.1277 -0.1347 3.6636 15.3081

## Coefficients:

## Estimate Std. Error t value Pr(>|t|)

## (Intercept) 4.7269 0.5430 8.705 1.73e-14 ***

## x 0.6896 0.4939 1.396 0.165

## —

## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ‘ 1

## Residual standard error: 6.053 on 123 degrees of freedom

## Multiple R-squared: 0.0156, Adjusted R-squared: 0.007597 ## F-statistic: 1.949 on 1 and 123 DF, p-value: 0.1652 part d

Your goal for this part of the question is to determine if you think the skeptic data could reasonably have been simulated from the model described at the beginning of this exercise. I won’t tell you exactly how to do that, but I will leave the following hint:

If the skeptic data really was simulated from this model, then how unusual is the βˆ₁we found from the skeptic data? How can our simulated βˆ₁values help us quantify this answer?

# Use this code chunk for your answer.

summary(lm1)

## Call:

## lm(formula = y ~ x, data = df)

## Residuals:

## Min 1Q Median 3Q Max ## -15.4987 -4.1277 -0.1347 3.6636 15.3081

## Coefficients:

## Estimate Std. Error t value Pr(>|t|)

## (Intercept) 4.7269 0.5430 8.705 1.73e-14 ***

## x 0.6896 0.4939 1.396 0.165

## —

## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ‘ 1

## Residual standard error: 6.053 on 123 degrees of freedom

## Multiple R-squared: 0.0156, Adjusted R-squared: 0.007597 ## F-statistic: 1.949 on 1 and 123 DF, p-value: 0.1652

test_stat2 = (0.6896 – 0)/0.4939 test_stat2

## [1] 1.396234 part e

Based on your investigation in _{part d}, do you think the skeptic data could have been simulated from the model provided in _{part a}?

Exercise 5: Formatting

The last five points of the assignment will be earned for properly formatting your final document. Check that you have:

included your name on the document
properly assigned pages to exercises on Gradescope
selected page 1 (with your name) and this page for this exercise (Exercise 5)
all code is printed and readable for each question
all output is printed
generated a pdf file

HW5-2fagkx.zip

Applied-Statistics Homework 5 Solved

If Helpful Share:

Description

Homework Setup

Exercise 1: Cat Model

part a

F-statistic: 259.8 on 1 and 142 DF, p-value: < 2.2e-16

part c

Exercise 2: Cat Inference [30 points]

part a

part b

part c

part d

part e

Exercise 3: Cat Model Predictions

part a

part b

part c

part d

Exercise 4: Simulations

part a

part b

Histogram of beta1hat

part c

Exercise 5: Formatting

Related products

BIT 3444 Homework 5: Typed DataSet (Drag-N-Drop) and LINQ Query Solved

CSCI-561 – Foundations of Artificial Intelligence Homework 3 Solved

Homework 3 Solved