Assignment 2 Solved

Task 1: Simple Linear Regression

  1. Download the file ‘insurance.csv’ from our class Blackboard site.
  2. Read this file into your R environment. Show the step that you used to accomplish this.

 

  3. Filter the dataframe to create a new dataframe that contains only the records of people who are not smokers. Show the code that you used to do this.
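A minimal sketch of the read-and-filter steps. It assumes the data has a column named smoker coded "yes"/"no"; that column name and coding are assumptions, so check them with str() on the real file. A small made-up data frame stands in for read.csv("insurance.csv") so that the snippet runs without the file.

```r
# Stand-in for: insurance <- read.csv("insurance.csv")
# All values below are made up for illustration.
insurance <- data.frame(
  age     = c(19, 33, 45, 52, 61),
  smoker  = c("yes", "no", "no", "yes", "no"),
  charges = c(16885, 4450, 8240, 24870, 13430)
)

# Keep only the non-smokers (assumes the column is named `smoker`
# and coded "yes"/"no").
nonsmokers <- subset(insurance, smoker == "no")

nrow(nonsmokers)
```

The same filter can be written with dplyr::filter() if the class uses the tidyverse; subset() is used here only because it needs no extra packages.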

 

– You will use this new dataframe for the rest of the assignment –

  4. Using ggplot, create a scatterplot to depict the relationship between the input variable age and the output variable charges. Show your scatterplot, along with the code that you used to build it. What does this scatterplot suggest about the relationship between the two variables? Why (or why not) does this make intuitive sense to you?
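One way to draw such a plot, sketched with a made-up stand-in for the non-smoker dataframe. The name nonsmokers and every value in it are hypothetical, and the snippet assumes the ggplot2 package is installed.

```r
library(ggplot2)

# Hypothetical stand-in for the filtered (non-smoker) dataframe.
nonsmokers <- data.frame(
  age     = c(19, 27, 33, 45, 52, 61),
  charges = c(1700, 2900, 4450, 8240, 10100, 13430)
)

# Input variable (age) on the x-axis, outcome (charges) on the y-axis.
p <- ggplot(nonsmokers, aes(x = age, y = charges)) +
  geom_point() +
  labs(x = "Age", y = "Charges", title = "Charges vs. age, non-smokers")

print(p)
```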

 

 

  5. Find the correlation between age and charges. Show the code that you used, and the results from your console, in a screenshot.
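A short sketch of the correlation step using base R's cor(), which computes the Pearson correlation by default. The dataframe name and values are made up for illustration.

```r
# Hypothetical stand-in for the filtered (non-smoker) dataframe.
nonsmokers <- data.frame(
  age     = c(19, 27, 33, 45, 52, 61),
  charges = c(1700, 2900, 4450, 8240, 10100, 13430)
)

# Pearson correlation between the input and the outcome.
r <- cor(nonsmokers$age, nonsmokers$charges)
r
```

A value near 1 indicates a strong positive linear relationship, a value near -1 a strong negative one, and a value near 0 little linear relationship.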

 

  6. Using your assigned seed value, create a data partition. Assign approximately 60% of the records to your training set, and the other 40% to your validation set. Show the code that you used to do this.
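A possible way to build the 60/40 split in base R, shown on a made-up dataframe of 100 records. The seed 123 is only a placeholder for the assigned seed, and the names train_idx, train, and valid are illustrative.

```r
set.seed(123)  # placeholder; substitute your assigned seed value

# Hypothetical stand-in with 100 records.
nonsmokers <- data.frame(age = sample(18:64, 100, replace = TRUE))
nonsmokers$charges <- 250 * nonsmokers$age + rnorm(100, sd = 1500)

n         <- nrow(nonsmokers)
train_idx <- sample(n, size = round(0.6 * n))  # row numbers for training
train     <- nonsmokers[train_idx, ]           # about 60% of the records
valid     <- nonsmokers[-train_idx, ]          # the remaining records

c(train = nrow(train), valid = nrow(valid))
```

Setting the seed before sample() is what makes the split reproducible from one run to the next.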

 

  7. Using your training set, create a simple linear regression model, with the input variable age and the outcome variable charges. Show the step(s) that you used to do this. Include a screenshot of the summary of your model, along with the code you used to generate that summary.
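A sketch of fitting and summarizing the model with lm(). The training set here is simulated, so the coefficients it produces say nothing about the real insurance data.

```r
set.seed(123)  # placeholder seed

# Simulated stand-in for the training set.
train <- data.frame(age = sample(18:64, 60, replace = TRUE))
train$charges <- 250 * train$age + rnorm(60, sd = 1500)

# Simple linear regression: charges as a function of age.
model <- lm(charges ~ age, data = train)

# Coefficients, standard errors, R-squared, and so on.
summary(model)
```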

 

  8. What is the regression equation generated by your model? Make up a hypothetical input value and explain what it would predict as an outcome. To show the predicted outcome value, you can either use a function in R, or just explain what the predicted outcome would be, based on the regression equation and some simple math.
  9. Using the accuracy() function from the forecast package, assess the accuracy of your model against both the training set and the validation set. What do you notice about these results? Describe your findings in a couple of sentences.
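The two steps above can be sketched as follows, again on simulated data. The regression equation has the form charges = b0 + b1 * age, where b0 and b1 come from coef(). The assignment asks for accuracy() from the forecast package; to stay free of extra packages, this sketch computes RMSE, one of the measures that function reports, directly in base R. The helper rmse() and the input age of 40 are made up for illustration.

```r
set.seed(42)  # placeholder seed

# Simulated stand-in data, split 60/40.
df <- data.frame(age = sample(18:64, 100, replace = TRUE))
df$charges <- 250 * df$age + rnorm(100, sd = 1500)
train_idx <- sample(nrow(df), 60)
train <- df[train_idx, ]
valid <- df[-train_idx, ]

model <- lm(charges ~ age, data = train)

# Prediction for a hypothetical 40-year-old, two equivalent ways.
b          <- coef(model)
age_new    <- 40
by_hand    <- b[["(Intercept)"]] + b[["age"]] * age_new
by_predict <- predict(model, newdata = data.frame(age = age_new))

# Root mean squared error (hypothetical helper, not from any package).
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))
rmse_train <- rmse(train$charges, predict(model, newdata = train))
rmse_valid <- rmse(valid$charges, predict(model, newdata = valid))

c(train = rmse_train, valid = rmse_valid)
```

If the validation error is much larger than the training error, that is a hint of overfitting; similar values suggest the model generalizes about as well as it fits.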

 

Task 2: K-Nearest Neighbors

The model that we’ll build will aim to predict which species of fish is being sold at a popular urban fish market, using only the numeric attributes as inputs. The outcome variable of this model will be Species. The numeric attributes are described below:

Weight = weight of fish in grams (g)

Length1 = vertical length in cm

Length2 = diagonal length in cm

Length3 = cross length in cm

Height = height in cm

Width = diagonal width in cm

  1. Download the file ‘fishmarket.csv’ from our class Blackboard site.
  2. Read this file into your R environment. Show the step that you used to accomplish this.


  1. Using your assigned seed value (from Assignment 2), partition your data into training (60%) and validation (40%) sets. Show the step(s) that you used to do this.

 

  1. Make up a fake fish (yes, really!)
  2. Give your fish a name (there’s no R code needed here, and you won’t use the name when you run k-nn… but give the fish a name anyway and just write it here).

 

  1. Use the runif() function to give your fish values for each of the six numeric attributes. Use the min and max values from your training set as the lower and upper boundaries for runif().
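A sketch of drawing one random value per attribute with runif(), bounded by each column's training-set minimum and maximum. The training values below are made up, and the names num_cols and fish are illustrative.

```r
set.seed(123)  # placeholder seed

# Made-up stand-in for the numeric columns of the training set.
train <- data.frame(
  Weight  = c(40, 242, 340, 500, 1000),
  Length1 = c(12.9, 23.2, 23.9, 28.5, 37.0),
  Length2 = c(14.1, 25.4, 26.5, 30.7, 40.0),
  Length3 = c(16.2, 30.0, 31.1, 36.2, 42.5),
  Height  = c(4.1, 11.5, 12.4, 14.2, 11.7),
  Width   = c(2.3, 4.0, 4.7, 5.0, 7.3)
)
num_cols <- c("Weight", "Length1", "Length2", "Length3", "Height", "Width")

# One uniform draw per column, between that column's min and max.
fish <- as.data.frame(
  lapply(train[num_cols], function(col) runif(1, min = min(col), max = max(col)))
)

fish
```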

  1. Normalize your data using the preProcess() function from the caret package. Use Table 7.2 from the book as a guide for this. Show the step(s) that you used to do this.
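The idea behind this step is that the centering and scaling values are learned from the training set only and then applied unchanged to the validation set and to the new fish. The assignment asks for caret's preProcess(); the sketch below shows the same idea with base R's scale() so that it runs without caret. It assumes z-score normalization (center and scale), and all data values are made up.

```r
# Made-up stand-ins for two numeric columns.
train <- data.frame(Weight = c(200, 300, 400, 500), Height = c(10, 12, 14, 16))
valid <- data.frame(Weight = c(250, 450),           Height = c(11, 15))
fish  <- data.frame(Weight = 320,                   Height = 13)

# Learn the normalization parameters from the training set only.
mu    <- colMeans(train)
sigma <- sapply(train, sd)

# Apply the same parameters to every set.
train_norm <- scale(train, center = mu, scale = sigma)
valid_norm <- scale(valid, center = mu, scale = sigma)
fish_norm  <- scale(fish,  center = mu, scale = sigma)

round(train_norm, 3)
```

With caret, the equivalent pattern is to call preProcess() on the training columns and then predict() with the resulting object on each dataframe.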

 

 

  1. Using the knn() function from the FNN package, and using a k-value of 7, generate a predicted classification for your fish. Show the step(s) that you used to do this, along with the output in the console. Which Species was your fish predicted to belong to? Also, who were your fish’s 7 nearest neighbors, and which Species did they belong to? Show the step(s) that you used to find this out, along with a screenshot of the output in the console.
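To make the mechanics concrete, here is a small base R sketch of what a k-nearest-neighbors classifier does for a single new record: compute distances, pick the k closest training rows, and take a majority vote. It is an illustration, not the FNN implementation; the function knn_one(), the toy data, and k = 3 are all made up. With FNN, knn() returns the predicted class, and its documentation describes an "nn.index" attribute holding the neighbors' row indices, which can be used to look up their Species.

```r
# Toy, already-normalized training data (two attributes); values are made up.
train_x <- matrix(c(0.0, 0.1, 0.2, 1.0, 1.1, 1.2,    # attribute 1
                    0.0, 0.2, 0.1, 1.0, 0.9, 1.1),   # attribute 2
                  ncol = 2)
train_y  <- factor(c("Perch", "Perch", "Perch", "Bream", "Bream", "Bream"))
new_fish <- c(0.15, 0.1)

# Hypothetical helper: classify one record by majority vote of its k neighbors.
knn_one <- function(train_x, train_y, new_x, k) {
  d     <- sqrt(rowSums(sweep(train_x, 2, new_x)^2))  # Euclidean distances
  nn    <- order(d)[seq_len(k)]                       # rows of the k closest
  votes <- table(train_y[nn])
  list(prediction = names(votes)[which.max(votes)], neighbors = nn)
}

res <- knn_one(train_x, train_y, new_fish, k = 3)
res$prediction          # predicted class
train_y[res$neighbors]  # classes of the nearest neighbors
```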

 

 

7a. Use your validation set to help you determine an optimal k-value. Use Table 7.3 from the textbook as a guide here. Show the step(s) that you used to do this, along with the output in the console.

 

7b. Using either the base graphics package or ggplot, make a scatterplot with the various k values that you used in 7a on your x-axis, and the accuracy metrics on the y-axis.
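Steps 7a and 7b can be sketched together: loop over candidate k values, record the validation accuracy for each, and plot accuracy against k. To keep the snippet runnable without the course files, it uses R's built-in iris data as a stand-in for the fish data and knn() from the class package (shipped with standard R installations) as a stand-in for FNN's knn(), which takes the same train, test, cl, and k arguments. The seed, the range 1 to 15, and the names ks, results, and best_k are illustrative.

```r
library(class)

set.seed(1)  # placeholder seed

# 60/40 split of the stand-in data.
idx     <- sample(nrow(iris), round(0.6 * nrow(iris)))
train_y <- iris$Species[idx]
valid_y <- iris$Species[-idx]

# Normalize with training-set parameters only.
train_x <- scale(iris[idx, 1:4])
valid_x <- scale(iris[-idx, 1:4],
                 center = attr(train_x, "scaled:center"),
                 scale  = attr(train_x, "scaled:scale"))

# Validation accuracy for each candidate k.
ks  <- 1:15
acc <- sapply(ks, function(k) {
  pred <- knn(train_x, valid_x, cl = train_y, k = k)
  mean(pred == valid_y)
})
results <- data.frame(k = ks, accuracy = acc)
best_k  <- results$k[which.max(results$accuracy)]

# Accuracy against k, using base graphics.
plot(results$k, results$accuracy, type = "b",
     xlab = "k", ylab = "Validation accuracy")
```

which.max() returns the first k with the highest accuracy, so ties are resolved in favor of the smallest k; whether that is the right tie-break is a judgment call.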


  1. Re-run your knn() function with this new k-value. What result did you obtain? Was it different from the one you saw in Step 9? Show the step(s) that you used to do this, along with the output in the console. Also, what were the outcome classes (Species) for each of your fish’s k-nearest neighbors?