CMSC435 Assignment 1 Solved

30.00 $

Category:

Description

5/5 - (1 vote)

Peer-reviewed scientific articles are high quality sources that are reviewed, accordingly improved/corrected, and approved by scientists before publication. This assignment provides an opportunity to read and discuss a peer-reviewed scientific article related to the core topics of our data science class.
We ask you to write about 1-page long report of an article entitled “Top 10 algorithms in data mining” by Wu, Kumar, Quinlan, Ghosh, Yang, Motoda, McLachlan, Ng, Liu, Yu, Zhou, Steinbach, Hand and Steinberg. Your review should consist of two main parts:
PART 1. (1 pt) Presentation of your criticisms of this article that covers your opinions and insights about the ideas presented by the authors. As a few options how to proceed, you can discuss which parts and why were particularly useful/interesting, what and why was deficient, and/or present relevant insights that were missed by the authors. Make sure that you do not duplicate statements that are included in the article. (answer length limit: up to ½ page) PART 2. Answers to the following four questions:
Question 2.1. (1 pt) Why is the Naive Bayes algorithm called naive? What is the meaning of the word Bayes? (answer length limit: up to 3 sentences).
Question 2.2. (1 pt) List the algorithm(s) discussed in this article that generate decision trees. Which of these algorithms was published/released first? (answer length limit: 2 sentences).
Question 2.3. (1 pt) Briefly explain the meaning of k in the kNN algorithm. Could k be set to 1? What would be the prediction for the 7NN algorithm with the majority vote where the 7 neighbors belong to the following classes: class1, class1, class3, class2, class1, class2, class3? (answer length limit: up to 3 sentences).
Question 2.4. (1 pt) Which five of the nine algorithms would be the most suitable choices to be used as the “weak learners” by the AdaBoost algorithm? Hint: these five are called classifiers or supervised algorithms (answer length limit: 1 sentence).
Notes
 You do not need to understand all concepts and information covered in this article but you need to learn enough to complete the review. Feel free to use other external resources to answer the questions (collaboration with other students is not allowed). A useful resource that I recommend is a post by Raymond Li at http://www.kdnuggets.com/2015/05/top-10-data-mining-algorithms-explained.html
 Use a separate and clearly marked paragraph for each of the five sub-parts of the solution and number them accordingly (i.e., 1, 2.1, 2.2, 2.3 and 2.4). There will be deductions if this is not followed.
 In the first part we ask for your personal criticism. This section must have substance and must not be based on existing sources.
 Read the questions carefully and make sure to answer each part of each question.
 Observe the limits on the length of the answers.
 The review should be typed single-spaced, using 12-point font size (Times New Roman font would be a good choice) and with standard margins. Convert the file into the pdf format for submission.
 Fun fact: we will learn many of these algorithms in details in the class including C4.5, k-means, SVM, and Apriori.
1. Go to the “Home” section, locate the “Assignment 1 submission” field and select it by clicking on the assignment title.

3. You can select your submission file(s) by clicking on “Choose file”.

4. [Important!] Your file(s) will be submitted only after you click the “Submit Assignment” button.

  • Assignment1-w17hqg.zip