Description
For each data set, your project will be evaluated as follows:
- You will get more points for larger/messier data sets:
  - 0-5 pts: <30K
  - 6-10 pts: >=30K
- Data cleaning:
  - provide a link to where you found the data
  - describe the steps you performed to clean the data (more points for messier data that needed more cleaning)
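
As an illustration of the kind of cleaning steps worth describing, here is a minimal sketch in base R. The data frame `raw`, its columns, and all of its values are made up for this example; the right steps for a real data set depend on what is actually wrong with it.

```r
# Hypothetical messy data (made-up values, for illustration only)
raw <- data.frame(
  age    = c("34", "41", NA, "29", "41", "n/a"),
  income = c(52000, 61000, 58000, NA, 61000, 47000),
  group  = c("a", "B", "b", "A", "B", "a"),
  stringsAsFactors = FALSE
)

clean <- raw

# Step 1: recode the text placeholder "n/a" as a real missing value
clean$age[clean$age %in% "n/a"] <- NA

# Step 2: fix column types (age arrived as text, group should be a factor)
clean$age   <- as.integer(clean$age)
clean$group <- factor(toupper(clean$group))

# Step 3: drop exact duplicate rows
clean <- unique(clean)

# Step 4: impute missing income with the median of the observed values
clean$income[is.na(clean$income)] <- median(clean$income, na.rm = TRUE)

# Step 5: drop rows that are still missing age
clean <- clean[!is.na(clean$age), ]

str(clean)
```

Whether to impute or to drop rows is a judgment call, and the commentary should say which was chosen and why.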
- Data exploration:
  - use at least 5 R functions for data exploration
  - create at least 2 informative R graphs for data exploration
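
A minimal sketch of what the exploration portion might look like, using only base R. The built-in `iris` data frame is a stand-in here (an assumption, not one of the project data sets), and the particular functions and graphs are one possible selection.

```r
# Stand-in data: the built-in iris data frame (assumption; use your own data set)
df <- iris

# Exploration functions
str(df)               # column types and a preview of the values
dim(df)               # number of rows and columns
summary(df)           # per-column summary statistics
head(df)              # first six rows
colSums(is.na(df))    # count of missing values in each column
table(df$Species)     # class balance of the target column

# Two exploratory graphs
hist(df$Sepal.Length, main = "Sepal length", xlab = "cm")
boxplot(Petal.Length ~ Species, data = df,
        main = "Petal length by species", ylab = "cm")
```

A graph is more informative when it relates a predictor to the target, as the boxplot does, than when it shows a single column in isolation.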
- Run at least 3 ML algorithms on each data set, using at least 5 algorithms in all.
  - this portion of your R script should include:
    - code to run the algorithms
    - commentary on feature selection you performed and why
    - code to compute your metrics for evaluation as well as commentary discussing the results
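
The requirements above can be sketched as follows. This is one possible layout, not a required one: it assumes the `rpart`, `class`, and `MASS` packages (normally shipped with R as recommended packages), uses the built-in `iris` data as a stand-in, and picks a decision tree, kNN, and LDA with accuracy as the metric purely for illustration.

```r
library(rpart)   # decision trees
library(class)   # k-nearest neighbours
library(MASS)    # linear discriminant analysis

set.seed(1234)
df <- iris       # stand-in data (assumption; use your own data set)

# 80/20 train/test split
i     <- sample(nrow(df), size = round(0.8 * nrow(df)))
train <- df[i, ]
test  <- df[-i, ]

# Metric: share of test rows whose predicted class matches the true class
accuracy <- function(pred, truth) mean(pred == truth)

# 1. Decision tree
tree_fit  <- rpart(Species ~ ., data = train, method = "class")
tree_pred <- predict(tree_fit, newdata = test, type = "class")

# 2. kNN on predictors scaled with the training means and standard deviations
train_x  <- scale(train[, 1:4])
test_x   <- scale(test[, 1:4],
                  center = attr(train_x, "scaled:center"),
                  scale  = attr(train_x, "scaled:scale"))
knn_pred <- knn(train = train_x, test = test_x, cl = train$Species, k = 5)

# 3. Linear discriminant analysis
lda_fit  <- lda(Species ~ ., data = train)
lda_pred <- predict(lda_fit, newdata = test)$class

# Collect the metric for every algorithm in one table
results <- data.frame(
  algorithm = c("decision tree", "kNN (k = 5)", "LDA"),
  accuracy  = c(accuracy(tree_pred, test$Species),
                accuracy(knn_pred,  test$Species),
                accuracy(lda_pred,  test$Species))
)
print(results)

# Confusion matrix for one of the models
table(predicted = lda_pred, actual = test$Species)
```

Evaluating every algorithm on the same held-out test set keeps the comparison fair, and sorting `results` by the metric gives the ranking asked for in the results analysis.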
- Run at least one ensemble method, such as Random Forest or XGBoost.
  - this portion of your R script should include:
    - code to run the algorithms
    - commentary on feature selection you performed and why
    - code to compute your metrics for evaluation as well as commentary discussing the results
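
To show what an ensemble does under the hood, here is a minimal sketch of bagging (bootstrap aggregation) built by hand from `rpart` trees. It is not Random Forest or XGBoost: it is a simpler relative, swapped in so the example needs no extra packages. The `iris` data and the choice of 25 trees are assumptions made for illustration.

```r
library(rpart)

set.seed(1234)
df <- iris       # stand-in data (assumption; use your own data set)

# 80/20 train/test split
i     <- sample(nrow(df), size = round(0.8 * nrow(df)))
train <- df[i, ]
test  <- df[-i, ]

n_trees <- 25    # illustrative choice

# Fit one tree per bootstrap resample of the training rows
trees <- lapply(seq_len(n_trees), function(b) {
  boot <- train[sample(nrow(train), replace = TRUE), ]
  rpart(Species ~ ., data = boot, method = "class")
})

# Each column holds one tree's predicted class for every test row
votes <- sapply(trees, function(fit) {
  as.character(predict(fit, newdata = test, type = "class"))
})

# Majority vote across the trees for each test row
bag_pred <- apply(votes, 1, function(v) names(which.max(table(v))))
bag_pred <- factor(bag_pred, levels = levels(test$Species))

# Same metric as for the single models: accuracy on the held-out test set
bag_accuracy <- mean(bag_pred == test$Species)
print(bag_accuracy)
```

Random Forest adds one more idea on top of this, a random subset of predictors considered at each split, which is why a dedicated package is the usual choice for the actual project.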
- Results analysis:
  - this portion of your R script should include:
    - a ranking of the algorithms from best to worst performing on your data
    - commentary on the performance of the algorithms
    - your analysis of why the best-performing algorithm worked best on that data
    - commentary on what your script was able to learn from the data (big picture) and whether this is likely to be useful
- Project depth:
  - 0-3: project minimally meets requirements
  - 4-6: project exceeds minimum requirements
  - 7-10: project went well above the requirements