CS4373_5473 Homework2 Solved

Notice: Because this is an online course, you are required to write and format your homework solutions in Word, Markdown, or HTML (no scanned or smeared images, please). This will prepare you for the online exams, where only a Word-style editor (with HTML support) is available.

  1.  (Smoothing) Write functions that use the cut() and qcut() functions provided by pandas to smooth the data in a given column using the following methods. Apply these functions to column F.
    • Equal-depth binning with bin means for depth k, for example k = 100.
    • Equal-depth binning with bin boundaries for depth k, for example k = 100.
    • Equal-width binning with bin medians for 10 bins.
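
One possible way to approach the three smoothing methods is sketched below. It is a minimal sketch, not a reference solution: the function names are hypothetical, the number of equal-depth bins is assumed to be len(df) // k, values are ranked before qcut() so that ties cannot produce duplicate bin edges, and a value equally close to both bin boundaries is assumed to go to the lower one.

```python
import pandas as pd


def _equal_depth_bins(series, k):
    """Assign each row to an equal-depth bin of roughly k rows (assumed rule)."""
    n_bins = max(1, len(series) // k)
    # Rank first so repeated values cannot create duplicate quantile edges.
    ranks = series.rank(method="first")
    return pd.qcut(ranks, q=n_bins, labels=False)


def smooth_equal_depth_means(df, column, k=100):
    """Replace every value with the mean of its equal-depth bin."""
    bins = _equal_depth_bins(df[column], k)
    return df[column].groupby(bins).transform("mean")


def smooth_equal_depth_boundaries(df, column, k=100):
    """Replace every value with the closer of its bin's min and max."""
    bins = _equal_depth_bins(df[column], k)
    grouped = df[column].groupby(bins)
    low = grouped.transform("min")
    high = grouped.transform("max")
    # Keep the lower boundary when it is at least as close as the upper one.
    closer_to_low = (df[column] - low) <= (high - df[column])
    return low.where(closer_to_low, high)


def smooth_equal_width_medians(df, column, n_bins=10):
    """Replace every value with the median of its equal-width bin."""
    bins = pd.cut(df[column], bins=n_bins, labels=False)
    return df[column].groupby(bins).transform("median")
```

For a DataFrame df loaded from the data file, a call such as smooth_equal_depth_means(df, "F", 100) returns the smoothed column as a new Series and leaves df unchanged.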
  2. (Data Reduction) Write a function that takes a DataFrame, a set of column names (of numeric columns), and an integer p (less than the total number of columns in the table), and uses the PCA method in Scikit-Learn (specifically, sklearn.decomposition.PCA) to reduce the set of columns to p new columns. Apply this function to reduce the columns {C, D, E, F} to two columns, p1 and p2.
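
A minimal sketch of such a function follows. The name reduce_columns and the output naming scheme p1, p2, ... are assumptions based on the wording above. The sketch also assumes the selected columns contain no missing values, and it applies no scaling beyond the mean-centering that PCA performs itself.

```python
import pandas as pd
from sklearn.decomposition import PCA


def reduce_columns(df, columns, p):
    """Replace the given numeric columns with p principal components."""
    # Sort so that an unordered set of names is handled deterministically.
    columns = sorted(columns)
    if not 0 < p <= len(columns):
        raise ValueError("p must be between 1 and the number of selected columns")

    pca = PCA(n_components=p)
    components = pca.fit_transform(df[columns])

    new_names = [f"p{i + 1}" for i in range(p)]
    reduced = pd.DataFrame(components, columns=new_names, index=df.index)

    # Keep the untouched columns and append the new components.
    return pd.concat([df.drop(columns=columns), reduced], axis=1)
```

With the assignment's data loaded into df, reduce_columns(df, {"C", "D", "E", "F"}, 2) would return a copy of the table in which those four columns are replaced by p1 and p2.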
  3.  (Correlation) For this question, you will need to use the scipy.stats package.
    • Compute the covariance and the correlation coefficient for each pair of numeric columns.
    • Use the crosstab() function to construct the contingency table for columns A and B, similar to the following sample, where the distinct values in attribute A are {a1, a2} and in attribute B are {b1, b2, b3} (this is just a sample and may not be the same as the data in your data file). Write a sequence of Python code to perform Pearson's chi-square (χ2) test of independence at a significance level of 0.001 to determine whether the two attributes are correlated. You should use stats.chi2() to get the χ2 distribution. Print sufficient information to report the result of the test.

                 A
              a1    a2    all
    B   b1    ??    ??    ??
        b2    ??    ??    ??
        b3    ??    ??    ??
        all   ??    ??    ??

Computer Science 4373/5473 Assignment 2  August 10, 2020
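
The two parts of the correlation question could be sketched as follows. This is an illustration under stated assumptions rather than a reference solution: the function names and the returned dictionary keys are hypothetical, covariance and correlation are taken from the pandas cov() and corr() methods (Pearson by default), and the test rejects independence when the statistic exceeds the critical value of the χ2 distribution at the given significance level.

```python
import numpy as np
import pandas as pd
from scipy import stats


def pairwise_cov_corr(df):
    """Return (covariance matrix, correlation matrix) of the numeric columns."""
    numeric = df.select_dtypes(include="number")
    return numeric.cov(), numeric.corr()


def chi_square_independence(df, col_a="A", col_b="B", alpha=0.001):
    """Pearson chi-square test of independence between two categorical columns."""
    # Contingency table with totals, laid out like the sample (B as rows).
    table = pd.crosstab(df[col_b], df[col_a], margins=True)
    observed = table.iloc[:-1, :-1].to_numpy(dtype=float)
    row_totals = table.iloc[:-1, -1].to_numpy(dtype=float)
    col_totals = table.iloc[-1, :-1].to_numpy(dtype=float)
    n = float(table.iloc[-1, -1])

    # Expected counts under independence: (row total * column total) / n.
    expected = np.outer(row_totals, col_totals) / n
    statistic = float(((observed - expected) ** 2 / expected).sum())

    # Degrees of freedom: (rows - 1) * (columns - 1).
    dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)
    dist = stats.chi2(dof)  # frozen chi-square distribution
    critical = float(dist.ppf(1 - alpha))
    p_value = float(dist.sf(statistic))

    return {
        "table": table,
        "expected": expected,
        "statistic": statistic,
        "dof": dof,
        "critical_value": critical,
        "p_value": p_value,
        "reject_independence": statistic > critical,
    }
```

Printing the returned table, statistic, degrees of freedom, critical value, and decision covers the information needed to report the test; if reject_independence is true, the data suggest that A and B are correlated at the chosen level.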