Applied-Statistics Homework 6 Solved


We’ll use the following packages for this homework assignment. We’ll also read in data from a csv file. To access the data, you’ll want to download the dataset from Canvas, and place it in the same folder as this R Markdown document. You’ll then be able to use the following code to load in the data.

library(ggplot2)
library(MASS)

Exercise 1: Revisiting Professor Evaluation Scores

Exercises 3 and 4 from Homework 4 involved examining and modeling professor evaluation scores from an average beauty measure as calculated from 6 ratings. We will continue working with the same professor evaluation dataset for Homework 6.

First, we need to load in the data. Make sure that you’ve downloaded the data from Canvas and that your Homework6.Rmd file is in the same folder as your data. Then, complete the following line of code to load the data as prof_evals.

# Use this code chunk to load in the data.
setwd("~/Desktop/data")
prof_evals = read.csv("Prof_Evals.csv")

part a

We’ll fit a linear model that we’ll focus on throughout most of this assignment.

Fit a linear model that predicts the evaluation score from the following variables:

  • bty_avg, the average beauty rating given by 6 independent students
  • age, the age of the professor
  • cls_students, the size (number of students) in the class
  • cls_perc_eval, the proportion of the class who completed the evaluations.

Then, write out the fitted linear model. Make sure that the variables are clearly defined for your written model.

# Use this code chunk for your answer.

lm1 = lm(data = prof_evals, score ~ bty_avg + age + cls_students + cls_perc_eval)
summary(lm1)

##
## Call:
## lm(formula = score ~ bty_avg + age + cls_students + cls_perc_eval,
##     data = prof_evals)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -1.9590 -0.3426  0.1220  0.3851  1.1556
##
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)
## (Intercept)    3.5934345  0.2071146  17.350  < 2e-16 ***
## bty_avg        0.0489457  0.0172216   2.842 0.004682 **
## age           -0.0024375  0.0026346  -0.925 0.355364
## cls_students   0.0005651  0.0003545   1.594 0.111606
## cls_perc_eval  0.0060699  0.0016024   3.788 0.000172 ***
## ---
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
##
## Residual standard error: 0.5276 on 458 degrees of freedom
## Multiple R-squared: 0.06708, Adjusted R-squared: 0.05893
## F-statistic: 8.233 on 4 and 458 DF, p-value: 2.033e-06
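Part a also asks for the fitted model in written form. One way to write it, reading the estimates off the coefficient table and rounding to four decimal places (the rounding and the hat notation are choices made here, not part of the assignment):

```latex
\widehat{\text{score}} = 3.5934
  + 0.0489\,\text{bty\_avg}
  - 0.0024\,\text{age}
  + 0.0006\,\text{cls\_students}
  + 0.0061\,\text{cls\_perc\_eval}
```

Here score is the predicted evaluation score, bty_avg is the average beauty rating, age is the professor's age, cls_students is the number of students in the class, and cls_perc_eval is the share of the class who completed the evaluations.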

part b

Write out interpretations for the following coefficients in the model:

  • intercept
  • slope for beauty average
  • slope for age

part c

Interpret the R^2 value for this model.

 

Exercise 2: Predictions for Professors A & Z

We’ll continue interpreting the model from Exercise 1.

part a

Calculate the expected evaluation score values for the following two professors with the given features:

  • Professor Z, who has an average beauty score of 6.25, an age of 52, a class size of 61, and 83% of the class who completed the evaluations.
  • Professor A, who has an average beauty score of 9.5, an age of 34, a class size of 270, and 96% of the class who completed the evaluations.

Print the answers for Professor Z and Professor A, and complete the following statements.

# Use this code chunk for your answer.

professor_z = predict(lm1, data.frame('bty_avg' = 6.25, 'age' = 52,
                                      'cls_students' = 61, 'cls_perc_eval' = 83))

professor_z

##                    1

## 4.310864

professor_a = predict(lm1, data.frame('bty_avg' = 9.5, 'age' = 34,
                                      'cls_students' = 270, 'cls_perc_eval' = 96))

professor_a

##                    1

## 4.710826
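As a sanity check on the predict() results, the same predictions can be reproduced by hand from the printed coefficients. This is a minimal sketch: the coefficient values are copied from the summary(lm1) output (so they are rounded), and predict_by_hand is a hypothetical helper name introduced only for illustration.

```r
# Coefficient estimates copied from the summary(lm1) output (rounded as printed).
coefs <- c(intercept = 3.5934345, bty_avg = 0.0489457, age = -0.0024375,
           cls_students = 0.0005651, cls_perc_eval = 0.0060699)

# Hypothetical helper: evaluates the fitted linear equation for one professor.
predict_by_hand <- function(bty_avg, age, cls_students, cls_perc_eval) {
  x <- c(1, bty_avg, age, cls_students, cls_perc_eval)  # leading 1 is the intercept term
  sum(coefs * x)
}

pred_z <- predict_by_hand(6.25, 52, 61, 83)
pred_a <- predict_by_hand(9.5, 34, 270, 96)
round(c(professor_z = pred_z, professor_a = pred_a), 4)
```

Because the coefficients are rounded, the hand calculation should agree with predict() only to about four or five decimal places. Note that cls_perc_eval is entered on the 0 to 100 scale (83 and 96), matching how the predict() calls above supply it.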

part b

Suppose Professor Z has an evaluation score of 4.6, and Professor A has an evaluation score of 3.7. Calculate and report the residual for each professor.

# Use this code chunk as needed for your answer.

4.6 - 4.310864
## [1] 0.289136
3.7 - 4.710826
## [1] -1.010826

part c

Calculate an 85% confidence interval for the mean response of professors with the same characteristics as Professor A.

# Use this code chunk for your solution.

predict(lm1, level = 0.85,
        newdata = data.frame(bty_avg = 9.5, age = 34, cls_students = 270, cls_perc_eval = 96),
        interval = 'confidence')

##                     fit                lwr               upr

## 1 4.710826 4.543896 4.877756

part d

Calculate a 75% prediction interval for an individual response of a new professor with the same characteristics as Professor Z.

# Use this code chunk for your solution.

predict(lm1, level = 0.75,
        newdata = data.frame(bty_avg = 6.25, age = 52, cls_students = 61, cls_perc_eval = 83),
        interval = 'prediction')

##                     fit              lwr               upr

## 1 4.310864 3.70108 4.920648
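The confidence interval in part c and the prediction interval in part d differ only in the standard error they use. As a reminder, these are the standard textbook forms; the symbols x_0 (the new professor's predictor values, with a leading 1), s (the residual standard error, 0.5276 here), and p (the number of predictors, 4 here) are notation introduced for this sketch:

```latex
\text{CI for the mean response:}\quad
  \hat{y}_0 \pm t^{*}_{n-p-1}\; s\,\sqrt{x_0^{\top}(X^{\top}X)^{-1}x_0}

\text{PI for a new response:}\quad
  \hat{y}_0 \pm t^{*}_{n-p-1}\; s\,\sqrt{1 + x_0^{\top}(X^{\top}X)^{-1}x_0}
```

The extra 1 under the square root accounts for the variability of an individual response around the mean, which is why a prediction interval is wider than a confidence interval at the same level and the same x_0.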

Exercise 3: Evaluating Professor Coefficients [30 points]

part a

Calculate 80% confidence intervals for the true intercept, true slope for class size, and true slope for proportion of the class who complete the evaluation.

Complete the following statements with your answers.

# Use this code chunk for your answer.
confint(lm1, level = 0.80, parm = c('(Intercept)', 'cls_students', 'cls_perc_eval'))
##                       10 %        90 %
## (Intercept)   3.3276230059 3.859245931
## cls_students  0.0001101363 0.001020059
## cls_perc_eval 0.0040133197 0.008126380
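To see where these bounds come from, the interval for cls_students can be rebuilt by hand as estimate ± critical value × standard error. This is a sketch under the assumption that the estimate, standard error, and residual degrees of freedom are taken from the printed summary(lm1) output, so the result matches confint() only up to rounding.

```r
# Values copied from the summary(lm1) output for cls_students.
estimate  <- 0.0005651
std_error <- 0.0003545
df_resid  <- 458
level     <- 0.80

# Two-sided critical value: leaves (1 - level)/2 in each tail of the t distribution.
t_crit <- qt(1 - (1 - level) / 2, df = df_resid)

# Lower and upper bounds of the interval.
ci <- estimate + c(-1, 1) * t_crit * std_error
ci
```

An 80% interval that excludes 0 corresponds to a two-sided p-value below 0.20, which is consistent with the reported p-value of 0.111606. The same p-value is above 0.05, so a 95% interval for this slope would include 0.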

part b

Interpret the confidence interval for the class size from part a. Based on your confidence interval, do you believe that the slope for class size is significantly different from 0? Explain. Does the p-value for this coefficient support your claim? Include the p-value in your explanation.

part c

Write the hypotheses being tested for the hypothesis test described in part b.
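The p-value that summary() reports for a coefficient comes from a two-sided t-test of a zero slope, with the other predictors held in the model. Using a subscripted beta as notation chosen here, the hypotheses for class size can be written as:

```latex
H_0:\ \beta_{\text{cls\_students}} = 0
\qquad \text{versus} \qquad
H_A:\ \beta_{\text{cls\_students}} \neq 0
```

The test statistic is the estimate divided by its standard error, 0.0005651 / 0.0003545 ≈ 1.594, compared against a t distribution with 458 degrees of freedom.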

part d

Calculate a 95% confidence interval for the slope for the average beauty rating. Comment on how your interval compares to the one from Homework 4, Exercise 4 part b. Be sure to discuss the centers and lengths of the two intervals as well as the overlap between the two intervals.

# Use this code chunk for your answer.

confint(lm1, level = 0.95, parm = c('bty_avg'))
##             2.5 %     97.5 %
## bty_avg 0.0151026 0.08278874

# old model

lm2 = lm(score ~ bty_avg, data = prof_evals)
confint(lm2, level = 0.95, parm = c('bty_avg'))
##              2.5 %     97.5 %
## bty_avg 0.03462335 0.09865066

abs((0.0151026 - 0.08278874)/2) - abs((0.03462335 - 0.09865066)/2) # new half-width - old half-width
## [1] 0.001829415

abs(0.0151026 - 0.08278874) - abs(0.03462335 - 0.09865066) # new length - old length
## [1] 0.00365883

0.08278874 - 0.03462335 # overlap: new upper bound - old lower bound
## [1] 0.04816539
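Part d asks about centers, lengths, and overlap. The center of an interval is its midpoint, (lwr + upr)/2, and its length is upr - lwr. The sketch below computes all three comparisons from the interval endpoints printed by the two confint() calls; the helper names center and ci_length are hypothetical and introduced only for illustration.

```r
# Interval endpoints copied from the two confint() outputs.
new_ci <- c(lwr = 0.0151026,  upr = 0.08278874)  # HW 6 model (lm1)
old_ci <- c(lwr = 0.03462335, upr = 0.09865066)  # HW 4 model (lm2)

# Hypothetical helpers: midpoint and width of an interval.
center    <- function(ci) unname((ci["lwr"] + ci["upr"]) / 2)
ci_length <- function(ci) unname(ci["upr"] - ci["lwr"])

center_diff <- center(new_ci) - center(old_ci)
length_diff <- ci_length(new_ci) - ci_length(old_ci)

# Width of the region the two intervals share.
overlap <- min(new_ci["upr"], old_ci["upr"]) - max(new_ci["lwr"], old_ci["lwr"])

round(c(center_diff = center_diff, length_diff = length_diff, overlap = overlap), 4)
```

Each center equals the corresponding slope estimate (about 0.0489 for the new model and about 0.0666 for the old one), so the new interval is shifted toward 0 by roughly 0.018. It is also slightly longer, and the two intervals share a region about 0.048 wide.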

part e

For the inference to be valid, four assumptions need to be met. We can check three of those assumptions using plots. Generate the two plots to check these assumptions (it’s ok if four plots are generated). State whether the assumptions seem reasonable from the plots, and explain your answer.

# Use this code chunk for your answer.

plot(lm1)

[Figure: four diagnostic plots produced by plot(lm1) for lm(score ~ bty_avg + age + cls_students + cls_perc_eval), with x-axes Fitted values, Theoretical Quantiles, Fitted values, and Leverage.]
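If only the two plots asked for are wanted, the which argument of plot() for lm objects selects individual panels. The sketch below uses a stand-in model on the built-in mtcars data so that it runs without Prof_Evals.csv; demo_fit is a hypothetical name, and in the homework lm1 would be passed instead.

```r
# Stand-in model on a built-in dataset so the snippet runs without Prof_Evals.csv.
# In the homework, replace demo_fit with lm1.
demo_fit <- lm(mpg ~ wt + hp, data = mtcars)

# which = 1 is Residuals vs Fitted; which = 2 is the normal Q-Q plot of the residuals.
old_par <- par(mfrow = c(1, 2))  # draw the two panels side by side
plot(demo_fit, which = 1:2)
par(old_par)                     # restore the previous layout
```

The residuals-versus-fitted plot is typically used to judge linearity and equal variance, and the Q-Q plot to judge normality of the errors. Independence is the assumption that cannot be checked from these plots.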

Exercise 4: Comparing Professor Models

For this exercise, we will be comparing the models fit in this Homework assignment (HW 6 Exercise 1) and in Homework 4 (HW 4 Exercise 3).

part a

For the two professor models (HW 4 Exercise 3 model and HW 6 Exercise 1 model), which do you expect to have a higher R^2 value (if either)? Explain your answer. Report and compare the actual R^2 values for these two models. No need to recompute these values, although you can if helpful.

# Use this code chunk for your answer, if needed.

summary(lm1)

##
## Call:
## lm(formula = score ~ bty_avg + age + cls_students + cls_perc_eval,
##     data = prof_evals)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -1.9590 -0.3426  0.1220  0.3851  1.1556
##
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)
## (Intercept)    3.5934345  0.2071146  17.350  < 2e-16 ***
## bty_avg        0.0489457  0.0172216   2.842 0.004682 **
## age           -0.0024375  0.0026346  -0.925 0.355364
## cls_students   0.0005651  0.0003545   1.594 0.111606
## cls_perc_eval  0.0060699  0.0016024   3.788 0.000172 ***
## ---
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
##
## Residual standard error: 0.5276 on 458 degrees of freedom
## Multiple R-squared: 0.06708, Adjusted R-squared: 0.05893
## F-statistic: 8.233 on 4 and 458 DF, p-value: 2.033e-06

summary(lm2)

##
## Call:
## lm(formula = score ~ bty_avg, data = prof_evals)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -1.9246 -0.3690  0.1420  0.3977  0.9309
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  3.88033    0.07614   50.96  < 2e-16 ***
## bty_avg      0.06664    0.01629    4.09 5.08e-05 ***
## ---
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
##
## Residual standard error: 0.5348 on 461 degrees of freedom
## Multiple R-squared: 0.03502, Adjusted R-squared: 0.03293
## F-statistic: 16.73 on 1 and 461 DF, p-value: 5.082e-05
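Adding predictors to a least-squares model can never lower R^2, so the larger model is expected to have the higher value. Adjusted R^2 applies a penalty for the number of predictors and gives a fairer comparison. The sketch below recomputes the adjusted values from the printed R^2 values; adj_r2 is a hypothetical helper, and n = 463 is inferred from the residual degrees of freedom (458 + 5 coefficients).

```r
# Hypothetical helper: adjusted R^2 from R^2, sample size n, and number of predictors p.
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

n <- 463  # inferred: 458 residual df + 5 estimated coefficients

adj_hw6 <- adj_r2(0.06708, n, p = 4)  # score ~ bty_avg + age + cls_students + cls_perc_eval
adj_hw4 <- adj_r2(0.03502, n, p = 1)  # score ~ bty_avg

round(c(hw6 = adj_hw6, hw4 = adj_hw4), 5)
```

Both results should match the Adjusted R-squared lines in the two summaries up to rounding. The larger model still comes out ahead after the penalty, although both models explain only a small share of the variation in scores.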

 

part b

Calculate the SSE and SST values for these two models. Do the observed results match what you would anticipate? Explain.

# Use this code chunk for your answer, if needed.

sse = sum(residuals(lm1)^2)
sse
## [1] 127.4874

y_bar = mean(prof_evals$score)
sst = sum((prof_evals$score - y_bar)^2)
sst
## [1] 136.6543

1 - sse/sst
## [1] 0.06708106

# old model (lm2 was already fit in Exercise 3 part d, so it is reused here)
sse_old = sum(residuals(lm2)^2)
sse_old
## [1] 131.8683

sst_old = sst # SST depends only on the response, so it is the same for both models
sst_old
## [1] 136.6543

1 - sse_old/sst_old
## [1] 0.03502322

# A higher SSE means more unexplained error and therefore a lower R^2; the new model should have a lower SSE and a higher R^2.

(sse/sst) <= (sse_old/sst_old)

## [1] TRUE

part c

For each of these two models, report the dimensions of the X and y matrices that would be used to calculate β̂. What are the degrees of freedom associated with each of these models?

# Use this code chunk for your answer, if needed.

dim(prof_evals)

## [1] 463 19
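With n = 463 rows and no observations dropped (consistent with the residual degrees of freedom reported in both summaries), the dimensions follow from counting one column for the intercept plus one per predictor. A sketch of the bookkeeping, using the usual least-squares notation:

```latex
\hat{\beta} = (X^{\top}X)^{-1}X^{\top}y

\text{HW 6 model: } X \in \mathbb{R}^{463 \times 5},\quad
  y \in \mathbb{R}^{463 \times 1},\quad
  \text{residual df} = 463 - 5 = 458

\text{HW 4 model: } X \in \mathbb{R}^{463 \times 2},\quad
  y \in \mathbb{R}^{463 \times 1},\quad
  \text{residual df} = 463 - 2 = 461
```

The 19 columns reported by dim(prof_evals) count every variable in the dataset, not the columns of X; only the predictors in each model formula, plus the intercept, enter the design matrix.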

 

part d

Thinking critically about this dataset, do you have any concerns about how it could be used? Any variables in the dataset that you’d like to know more about, or any variables you’d like to have added to the dataset?

You do not need to answer all of these questions, but I am hoping that you will carefully and thoughtfully consider our professor dataset and its applications.

 

Exercise 5: Formatting [5 points]

The last five points of the assignment will be earned for properly formatting your final document. Check that you have:

  • included your name on the document
  • properly assigned pages to exercises on Gradescope
  • selected page 1 (with your name) and this page for this exercise (Exercise 5)
  • all code is printed and readable for each question
  • all output is printed
  • generated a pdf file