## Description

# 1Â Â Â Â Â Â Â Â Â Â Sequence of Coin Flips

Suppose you have a *biased *coin with probability of heads equal to *p*. Imagine that you flip this coin until observing the first heads. Let *X *denote the number of flips needed to observe the first heads.

- Describe P[
*X*=*k*] as a function of*p*. - For any
*x*_{0 }1, show that the probability P[*XÂ Â Â x*_{0}] = (1Â*p*)^{x}^{0 1}. - Suppose we have a prior belief that
*p*is uniformly distributed in the [0*,*1] interval. Assuming that the first coin flip equals heads, compute the probability that*p >*1*/*2, i.e., compute P[*p >*1*/*2|*X*= 1]. Does our belief about the probability of the event {*p >*1*/*2} increase or decrease after observing the head in the first flip. Hint: Use Bayes Rule.

# 2Â Â Â Â Â Â Â Â Â Â Convex Functions and Information Theory

- Show that the function
*f*(*x*) = |*x*| + exp(*x*) is convex. - Suppose the random variable
*X*is distributed according to a*k*-class multi-nominal distributions with class probabilities*p*_{1}*,p*_{2}*,…,p*, such that = 1. Find the values of_{k}*p*= 1_{i},i*,…,k*such that the entropy of*X*is maximized.

# 3Â Â Â Â Â Â Â Â Â Â Linear Algebra

- The covariance matrix
**âŒƒ**of a random vector*X*is defined as**âŒƒ**= E[(*X*E*X*)(*X*E*X*)^{T}], where E*X*is the expectation of*X*. Is**âŒƒ**positive-semidefinite? - Let
*A*and*B*be two R^{D}^{â‡¥}^{D }symmetric matrices. Suppose*A*and*B*have the exact same set of eigenvectors*u*_{1}*,**u*_{2}*,*Â·Â·*,**u*_{D }with the corresponding eigenvalues*â†µ*_{1}*,â†µ*_{2}*,*Â·Â·Â·*,â†µ*_{D }for*A*, and_{1}*,*_{2}*,*Â·Â·Â·*,*_{D }for*B*. Please write down the eigenvectors and their corresponding eigenvalues for the following matrices:*C*=*A*+*B**D*=*A B**E*=*AB**F*=*A*^{1}*B*(assume*A*is invertible)

1

# 4Â Â Â Â Â Â Â Â Â Â KNN Classification in MATLAB/Octave

In this problem, you will implement a KNN classifier and deploy it on a real-world dataset. Below, we describe the steps that you need to take to accomplish this programming assignment.

You will work with a preprocessed version of the *Car Evaluation Dataset *from UCIâ€™s machine learning data repository. The training/validation/test sets are provided along with the assignment as cars train.data, cars valid.data, and cars test.data. For a description of the dataset and to determine which field corresponds to the label, please refer to http://archive.ics.uci.edu/ml/datasets/Car+Evaluation.

- The first step in every data analysis experiment involves inspecting the data and to make sure it is properly formatted. You will find that the features in the provided dataset are categorical. However, KNN requires the features to be real-valued numbers. To convert a categorical feature with
*K*categories to a real-valued number, you can create*K*new*binary*The*i*th binary feature indicates whether the original feature belongs to the*i*th category or not. This strategy is called â€˜one-hot encoding.â€™ - Please fill in the function knn classify in knn m. The inputs of this function are training data, new data (either validation or testing data) and
*k*. The function needs to output the accuracy on both training and new data (either validation or testing). - Consider
*k*= 1*,*3*,*5*,*Â·Â·*,*23. For each*k*, report the training and validation accuracy. Identify the*k*with the highest validation accuracy, and report the test accuracy with this choice of*k*. Note: if multiple values of*k*result in the highest validation accuracy, then report test accuracies for all such values of*k*. - Apply
*k*NN on the mat dataset which is a binary classification dataset with only two features. You need to run*k*NN with*k*= 1*,*5*,*15*,*20 and examine the decision boundary. A simple way to visualize the decision boundary is to draw 10000 data points on a uniform 100 â‡¥ 100 grid in the square (*x,y*) 2 [0*,*1] â‡¥ [0*,*1] and classify them using the*k*NN classifier. Then, plot the data points with diâ†µerent markers corresponding to diâ†µerent classes. Repeat this process for all*k*and discuss the smoothness of the decision boundaries as*k*increases.