STA4102 - Assignment-2

Question 1 (10 points)
Data: “Ames Iowa Housing Prices Dataset” (https://www.kaggle.com/datasets/emurphy/ames-iowahousing-prices-dataset)
a) Use multiple linear regression to fit the ‘SalePrice’ column using data in the file ‘train1.csv’. Select up to 10 variables for your model. The exact choice is not key at this stage, but a reasonable subset is needed, supported by a brief exploration of the selection. Report the RMSE on the ‘test1.csv’ dataset.
b) Fit a decision tree to the set of independent variables chosen above using ‘train1.csv’. Report the RMSE on ‘test1.csv’. (For example, in R: library(Metrics); rmseDt = rmse(actual=testpricestarget, predicted=predictedprices))
c) Fit a Random Forest to the set of independent variables chosen above using ‘train1.csv’. Report the RMSE on ‘test1.csv’.
d) Fit an XGBoost model to the set of independent variables chosen above using ‘train1.csv’. Report the RMSE on ‘test1.csv’.
e) In your experience, which XGBoost parameters can you tune to change the RMSE?
f) Comment on the quality of the fits produced in each case. What can you conclude from this predictive task about the nature of sale price prediction?
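The RMSE requested in parts a) to d) is the square root of the mean squared difference between the actual and predicted sale prices. The following is a minimal sketch in plain Python of that quantity; the function name `rmse` is chosen here to mirror the call in the hint, and the sample prices are made up for illustration and do not come from the dataset.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if len(actual) == 0:
        raise ValueError("need at least one observation")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up prices for illustration only (not taken from test1.csv)
actual_prices = [200000.0, 150000.0, 320000.0]
predicted_prices = [210000.0, 140000.0, 300000.0]
print(rmse(actual_prices, predicted_prices))
```

Because the errors are squared before averaging, a few badly mispredicted houses can dominate the score, which is worth keeping in mind when comparing the four models in part f).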
Question 2 (10 points)
Data: “Pokemon for Data Mining and Machine Learning” (https://www.kaggle.com/datasets/alopez247/pokemon)
a) Randomly select 70% of the rows as training data and the remaining rows as testing data.
b) Fit to the training data a decision tree, random forest and XGBoost model where the dependent variable is the ‘Type_1’ column.
c) Report on the accuracy for predicting ‘Type_1’ on the testing rows for each model.
d) Produce a confusion matrix for each model and then comment on the model performances. Discuss in terms of the confusion matrix and accuracy.
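The split in part a) and the evaluation in parts c) and d) can be sketched without committing to a modelling library. The helper names below (`train_test_split_rows`, `accuracy`, `confusion_matrix`) are hypothetical, the seed is an arbitrary assumption, and the labels are made up for illustration; the model fitting itself is not shown.

```python
import random
from collections import Counter


def train_test_split_rows(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def accuracy(actual, predicted):
    """Fraction of predictions that match the actual labels."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


def confusion_matrix(actual, predicted):
    """Count (actual, predicted) label pairs; absent pairs read as zero."""
    return Counter(zip(actual, predicted))


# Made-up labels for illustration only (not model output)
actual = ["Fire", "Water", "Grass", "Water", "Fire"]
predicted = ["Fire", "Water", "Water", "Water", "Grass"]

train, test = train_test_split_rows(range(10))
print(len(train), len(test))
print(accuracy(actual, predicted))
print(confusion_matrix(actual, predicted)[("Grass", "Water")])
```

Accuracy collapses everything into one number, while the pair counts show which types are being confused with which, so the two are best read together when comparing models.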
