CS4375 Homework 4: Machine Learning Algorithms in C++

For this homework you will be implementing 2 machine learning algorithms in C++ and comparing the results and performance to the equivalent functions in R.

For this homework you can work with one other person or work alone if you prefer.

Steps:

  1. Perform logistic regression on the given data set in an R script (not Rmd) using R library functions. Evaluate with the metrics indicated in the details below. Your R script should also include at least 2 graphs and 4 R functions for data exploration.
  2. Write a C++ program to implement logistic regression from scratch, and evaluate with the metrics indicated in the details below.
  3. Perform naive Bayes on the given data set in an R script (not Rmd) using R library functions. Evaluate with the metrics indicated in the details below. Your R script should also include at least 2 graphs and 4 R functions for data exploration.
  4. Write a C++ program to implement naive Bayes from scratch, and evaluate with the metrics indicated in the details below.
  5. Report. Write a summary of the accuracy and performance (run time) of the two approaches. Include screenshots of the R runs and the C++ runs for each algorithm. Cite the references (any format) you used for the algorithm, including coding examples. Include screenshots of your R graphs. No particular format is required for either the report or the references.

Turn in your 2 R scripts, 2 cpp files, data files, and report, zipped together.

Notes:

  • Indicate in your summary how you computed run times. Here are some suggestions:
    • For the R scripts, you can call proc.time() at the start and end of the machine learning part of the script and subtract the start time from the end time.
    • For the C++ programs, your IDE may report the run time; otherwise, measure it from the terminal:
      • Windows: https://stackoverflow.com/questions/673523/how-do-i-measure-execution-time-of-a-command-on-the-windows-command-line
      • Mac: https://stackoverflow.com/questions/26466572/mac-os-x-shell-script-measure-time-elapsed

Note: The timing for the R code should be only that portion running the algorithm, not parts that run data exploration functions or create graphs.
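The same idea, timing only the algorithm portion, can be applied inside a C++ program rather than timing the whole process from the terminal. The following is a minimal sketch using std::chrono from the standard library; the helper name mean_seconds and its repeats parameter are illustrative assumptions, not part of the assignment.

```cpp
#include <cassert>
#include <chrono>

// Runs `work` `repeats` times and returns the mean wall-clock time
// per run, in seconds. steady_clock is used because it never jumps
// backwards, unlike the system clock.
template <typename F>
double mean_seconds(F&& work, int repeats = 1) {
    assert(repeats > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < repeats; ++i) {
        work();
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double> elapsed = stop - start;
    return elapsed.count() / repeats;
}
```

Passing a lambda that wraps only the model fit, for example mean_seconds([&] { /* training call */ }), keeps file loading and printing out of the measurement. For a very small data set a single run may be too short to measure reliably, which is why averaging over several repeats can help.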

Details: Logistic Regression

  • Data: plasma in library HSAUR. You will need to export it using write.csv() for your C++ program. Use all the data (32 observations) to build the model.

  • R script:
    • train a logistic regression model on all the data, ESR~fibrinogen, using glm()
    • print the coefficients of the model
    • build the model “from scratch” in R as shown in the book
    • make sure you get the same coefficients in each approach
    • note that we are not doing test set evaluation on this data
  • C++ program:
    • implement in C++ the same steps for logistic regression from scratch
    • feel free to use whatever data structures you like: arrays, vectors, etc.
    • if you have a Linux system, you may want to check out the Armadillo library for matrix multiplication: http://arma.sourceforge.net/
    • feel free to use whatever programming paradigm you like, but make your C++ code fast
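For a single predictor such as fibrinogen, the from-scratch steps can be sketched with plain std::vector and no matrix library. This sketch assumes batch gradient ascent on the log-likelihood; the book's exact procedure may differ. The function name fit_logistic, the learning rate, and the iteration count are hypothetical and usually need tuning before the coefficients match those from glm().

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Logistic (sigmoid) function: maps log-odds to a probability.
double sigmoid(double z) { return 1.0 / (1.0 + std::exp(-z)); }

// Fits y ~ x for one predictor by batch gradient ascent on the
// log-likelihood. Returns {intercept, slope}.
// learning_rate and iterations are illustrative assumptions.
std::pair<double, double> fit_logistic(const std::vector<double>& x,
                                       const std::vector<int>& y,
                                       double learning_rate,
                                       int iterations) {
    assert(x.size() == y.size() && !x.empty());
    double w0 = 0.0;  // intercept
    double w1 = 0.0;  // slope
    for (int it = 0; it < iterations; ++it) {
        double g0 = 0.0;
        double g1 = 0.0;
        for (std::size_t i = 0; i < x.size(); ++i) {
            // error = observed label minus predicted probability
            const double error = y[i] - sigmoid(w0 + w1 * x[i]);
            g0 += error;         // gradient with respect to the intercept
            g1 += error * x[i];  // gradient with respect to the slope
        }
        w0 += learning_rate * g0;
        w1 += learning_rate * g1;
    }
    return {w0, w1};
}
```

Accumulating the two gradient sums in a single pass avoids building a design matrix, which is one way to keep the loop fast. If the learning rate is too large the weights can oscillate or diverge, so it is worth printing the coefficients at a few iteration counts to confirm they settle.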

Details: Naïve Bayes

  • Data: Titanic data set “titanic_project.csv” on Piazza. Use the first 900 observations for train, the rest for test.
  • R script:
    • train a naïve Bayes model on the train data, survived~pclass+sex+age
    • print the model, which will show all the probabilities learned from the data
    • test on the test data
    • print metrics for accuracy, sensitivity, specificity
  • C++ program:
    • implement naïve Bayes in C++; the code in the book should help
    • train/test on the same data as in the R script; output the same metrics
    • feel free to use whatever data structures you like: arrays, vectors, etc.
    • Here is a great video that gives a conceptual picture of naïve Bayes with Gaussian predictors: https://www.youtube.com/watch?v=r1in0YNetG8
    • The likelihood of a continuous predictor such as age is calculated from the Gaussian (normal) density, using the mean and variance of that predictor within each class. The book gives hints as well.
  • Report:
    • Write a summary of the two implementations, R and C++. Did you get the same results? How do the run times compare? How did you measure execution time?
    • Include screenshots of the output of each program
    • Include screenshots of the run times of each program
    • Write out the algorithm you used for training the classifier
    • Cite all references used
    • No required format for the report
  • Be prepared to demo your code.
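Two pieces of the naïve Bayes C++ program can be sketched in isolation: the Gaussian likelihood for a continuous predictor such as age, and the three requested metrics. This is a sketch under stated assumptions, not the book's code. The names gaussian_likelihood, Metrics, and evaluate are hypothetical, labels are assumed to be coded 0/1, and class 1 is assumed to be the positive class.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Gaussian density of x for a class whose training mean and variance
// of the predictor are `mean` and `variance`.
double gaussian_likelihood(double x, double mean, double variance) {
    assert(variance > 0.0);
    const double pi = std::acos(-1.0);
    const double diff = x - mean;
    return std::exp(-(diff * diff) / (2.0 * variance)) /
           std::sqrt(2.0 * pi * variance);
}

struct Metrics {
    double accuracy;     // (TP + TN) / all observations
    double sensitivity;  // true positive rate: TP / (TP + FN)
    double specificity;  // true negative rate: TN / (TN + FP)
};

// Computes the metrics from 0/1 predictions and 0/1 true labels,
// treating 1 as the positive class (an assumption of this sketch).
Metrics evaluate(const std::vector<int>& predicted,
                 const std::vector<int>& actual) {
    assert(predicted.size() == actual.size() && !actual.empty());
    double tp = 0.0, tn = 0.0, fp = 0.0, fn = 0.0;
    for (std::size_t i = 0; i < actual.size(); ++i) {
        if (actual[i] == 1) {
            if (predicted[i] == 1) tp += 1.0; else fn += 1.0;
        } else {
            if (predicted[i] == 1) fp += 1.0; else tn += 1.0;
        }
    }
    // Assumes both classes occur in `actual`; otherwise a denominator is zero.
    return {(tp + tn) / (tp + tn + fp + fn), tp / (tp + fn), tn / (tn + fp)};
}
```

When comparing against R, check which class the R function treats as positive. For example, caret's confusionMatrix() uses the first factor level unless the positive argument is set, which can swap sensitivity and specificity relative to this sketch.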

 
