CS4375 Homework 4: Machine Learning Algorithms in C++


For this homework you will implement two machine learning algorithms in C++ and compare their results and run-time performance with the equivalent functions in R.

You may work with one other person, or alone if you prefer.


  1. Perform logistic regression on the given data set in an R script (not Rmd) using R library functions. Evaluate with the metrics indicated in the details below. Your R script should also include at least 2 graphs and 4 R functions for data exploration.
  2. Write a C++ program to implement logistic regression from scratch, and evaluate with the metrics indicated in the details below.
  3. Perform naive Bayes on the given data set in an R script (not Rmd) using R library functions. Evaluate with the metrics indicated in the details below. Your R script should also include at least 2 graphs and 4 R functions for data exploration.
  4. Write a C++ program to implement naive Bayes from scratch, and evaluate with the metrics indicated in the details below.
  5. Report. Write a summary of the accuracy and performance (run time) of the two approaches. Include screenshots of the R runs and the C++ runs for each algorithm. Cite the references (any format) you used for the algorithm, including coding examples. Include screenshots of your R graphs. No particular format is required for either the report or the references.

Turn in your 2 R scripts, 2 cpp files, data files, and report, zipped together.


  • Indicate in your summary how you computed run times. Here are some suggestions:
    • For the R scripts, you can call proc.time() at the start and end of the machine learning part of the script and take the difference.
    • For the C++ programs, your IDE may report the run time; otherwise, measure it from the terminal:
      • Windows: https://stackoverflow.com/questions/673523/how-do-i-measure-execution-time-of-a-command-on-the-windows-command-line
      • Mac: https://stackoverflow.com/questions/26466572/mac-os-x-shell-script-measure-time-elapsed

Note: The timing for the R code should be only that portion running the algorithm, not parts that run data exploration functions or create graphs.
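
For the C++ programs, one alternative to IDE or terminal timing is to time only the algorithm from inside the program with std::chrono from the standard library. The following is a minimal sketch, not a required approach; the helper name mean_time_ms and the idea of averaging over several repeats are assumptions made for illustration (a fit on a small data set may finish too quickly to time in a single run).

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: runs `work` `repeats` times and returns the mean
// wall-clock time per run in milliseconds. steady_clock is used because
// it is monotonic, so the measured interval cannot be negative.
template <typename F>
double mean_time_ms(F&& work, int repeats) {
    assert(repeats > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r) {
        work();
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return elapsed.count() / repeats;
}
```

A call such as mean_time_ms([&] { /* training code only */ }, 100) keeps file reading and printing out of the measurement, which mirrors the note above about timing only the algorithm in R.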

Details: Logistic Regression

  • Data: plasma in library HSAUR. You will need to export it using write.csv() for your C++ program.

Use all the data (32 observations) to build the model.
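
Reading the exported file back in C++ could look like the sketch below. It assumes the layout that write.csv() produces by default for this data frame: a header line, a quoted row-name column first, then fibrinogen, globulin, and ESR, with ESR stored as the quoted strings "ESR < 20" and "ESR > 20". Check your own CSV before relying on these column positions; the names PlasmaData, unquote, and read_plasma are hypothetical.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container for the two columns the model needs.
struct PlasmaData {
    std::vector<double> fibrinogen;  // predictor
    std::vector<int> esr;            // 1 for "ESR > 20", 0 otherwise
};

// Removes one pair of surrounding double quotes, if present.
std::string unquote(const std::string& s) {
    if (s.size() >= 2 && s.front() == '"' && s.back() == '"') {
        return s.substr(1, s.size() - 2);
    }
    return s;
}

// Parses CSV text in the ASSUMED layout: row name, fibrinogen, globulin, ESR.
// This simple split on ',' works only because no field contains a comma.
PlasmaData read_plasma(std::istream& in) {
    PlasmaData data;
    std::string line;
    std::getline(in, line);  // skip the header line
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // Windows line endings
        if (line.empty()) continue;
        std::vector<std::string> fields;
        std::stringstream row(line);
        std::string cell;
        while (std::getline(row, cell, ',')) {
            fields.push_back(unquote(cell));
        }
        assert(fields.size() == 4);
        data.fibrinogen.push_back(std::stod(fields[1]));
        data.esr.push_back(fields[3] == "ESR > 20" ? 1 : 0);
    }
    return data;
}
```

Taking a std::istream rather than a file name means the same function works with a std::ifstream for the real file and with a std::istringstream for a quick check.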

  • R script:
    • train a logistic regression model on all the data, ESR~fibrinogen, using glm()
    • print the coefficients of the model
    • build the model “from scratch” in R as shown in the book
    • make sure you get the same coefficients in each approach
    • note that we are not doing test set evaluation on this data
  • C++ program:
    • implement in C++ the same steps for logistic regression from scratch
    • feel free to use whatever data structures you like: arrays, vectors, etc.
    • if you have a Linux system, you may want to check out the Armadillo library for matrix multiplication: http://arma.sourceforge.net/
    • feel free to use whatever programming paradigm you like, but make your C++ code fast
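
As a rough illustration of what "from scratch" can mean for a single predictor, the sketch below fits an intercept and one slope by batch gradient ascent on the log-likelihood. The book's exact steps are not reproduced here, so treat this as an assumption about the general technique rather than the required method; the function names, learning rate, and iteration count are all hypothetical choices that you would tune until the coefficients agree with glm().

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Logistic (sigmoid) function.
double sigmoid(double z) { return 1.0 / (1.0 + std::exp(-z)); }

// Fits P(y = 1 | x) = sigmoid(w0 + w1 * x) by batch gradient ascent on the
// log-likelihood. Returns {w0, w1}. The learning rate and iteration count
// are tuning parameters, not values taken from the assignment.
std::pair<double, double> fit_logistic(const std::vector<double>& x,
                                       const std::vector<int>& y,
                                       double learning_rate,
                                       int iterations) {
    assert(x.size() == y.size());
    double w0 = 0.0;
    double w1 = 0.0;
    for (int it = 0; it < iterations; ++it) {
        double g0 = 0.0;  // gradient with respect to the intercept
        double g1 = 0.0;  // gradient with respect to the slope
        for (std::size_t i = 0; i < x.size(); ++i) {
            const double error = y[i] - sigmoid(w0 + w1 * x[i]);
            g0 += error;
            g1 += error * x[i];
        }
        w0 += learning_rate * g0;
        w1 += learning_rate * g1;
    }
    return {w0, w1};
}
```

Plain loops over std::vector are enough for one predictor; a matrix library such as Armadillo becomes more attractive once the model has several predictors.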

Details: Naïve Bayes

  • Data: Titanic data set “titanic_project.csv” on Piazza. Use the first 900 observations for train, the rest for test.
  • R script:
    • train a naïve Bayes model on the train data, survived~pclass+sex+age
    • print the model, which will show all the probabilities learned from the data
    • test on the test data
    • print metrics for accuracy, sensitivity, specificity
  • C++ program:
    • implement naïve Bayes in C++; the code in the book should help
    • train/test on the same data as in the R script; output the same metrics
    • feel free to use whatever data structures you like: arrays, vectors, etc.
    • Here is a great video that gives a conceptual picture of naïve Bayes with Gaussian predictors: https://www.youtube.com/watch?v=r1in0YNetG8
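    • The following formula shows how to calculate the likelihood of a continuous predictor x (such as age) for class c, assuming x is normally distributed within each class with mean \mu_c and variance \sigma_c^2: $P(x \mid c) = \frac{1}{\sqrt{2\pi\sigma_c^2}} \exp\left(-\frac{(x-\mu_c)^2}{2\sigma_c^2}\right)$. The book gives hints as well.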
  • Report:
    • Write a summary of the two implementations, R and C++. Did you get the same results? How do the run times compare? How did you measure execution time?
    • Include screenshots of the output of each program
    • Include screenshots of the run times of each program
    • Write out the algorithm you used for training the classifier
    • Cite all references used
    • No required format for the report
  • Be prepared to demo your code.
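
Two small pieces of the naïve Bayes program can be sketched independently of the training code: the likelihood of a continuous predictor and the three requested metrics. The sketch below assumes a Gaussian (normal) density for the continuous predictor and treats class 1 as the positive class; the names gaussian_likelihood, Metrics, and evaluate are hypothetical, and the book's version may differ in detail.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Gaussian density of x for a class with the given mean and variance
// (ASSUMED model for a continuous predictor).
double gaussian_likelihood(double x, double mean, double variance) {
    assert(variance > 0.0);
    const double pi = std::acos(-1.0);
    const double diff = x - mean;
    return std::exp(-(diff * diff) / (2.0 * variance)) /
           std::sqrt(2.0 * pi * variance);
}

struct Metrics {
    double accuracy;
    double sensitivity;  // true positive rate
    double specificity;  // true negative rate
};

// Computes the metrics from 0/1 predictions and labels, with 1 as the
// positive class. If a class is absent from `actual`, the matching rate
// is a division by zero and comes out as NaN.
Metrics evaluate(const std::vector<int>& predicted, const std::vector<int>& actual) {
    assert(predicted.size() == actual.size());
    double tp = 0, tn = 0, fp = 0, fn = 0;
    for (std::size_t i = 0; i < actual.size(); ++i) {
        if (actual[i] == 1) {
            if (predicted[i] == 1) ++tp; else ++fn;
        } else {
            if (predicted[i] == 1) ++fp; else ++tn;
        }
    }
    return {(tp + tn) / (tp + tn + fp + fn), tp / (tp + fn), tn / (tn + fp)};
}
```

When comparing with R, check which class your R metric function treats as positive; if it differs from the choice here, sensitivity and specificity will appear swapped.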

