Machine Learning 1, Week 4: Fisher Discriminant (Solved)


Exercise 1: Fisher Discriminant (10 + 10 + 10 P)

The objective function to find the Fisher discriminant has the form

\[
\max_{w} \; \frac{w^\top S_B w}{w^\top S_W w}
\]

where $S_B = (m_2 - m_1)(m_2 - m_1)^\top$ is the between-class scatter matrix and $S_W$ is the within-class scatter matrix, assumed to be positive definite. Because there are infinitely many solutions (multiplying $w$ by a scalar does not change the objective), we can extend the objective with a constraint, e.g. one that enforces $w^\top S_W w = 1$.

(a) Reformulate the problem above as an optimization problem with a quadratic objective and a quadratic constraint.

(b) Show, using the method of Lagrange multipliers, that the solution of the reformulated problem is also a solution of the generalized eigenvalue problem

\[
S_B w = \lambda S_W w .
\]

(c) Show that the solution of this optimization problem is equivalent (up to a scaling factor) to

\[
w^\star = S_W^{-1} (m_1 - m_2) .
\]

(A brief sketch of these steps is given below.)
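For orientation, here is a brief, informal sketch (not a full solution) of how parts (a)-(c) are commonly derived from the definitions above:

\[
\begin{aligned}
\text{(a)}\quad & \max_{w}\; w^\top S_B w \quad \text{subject to} \quad w^\top S_W w = 1 \\
\text{(b)}\quad & L(w,\lambda) = w^\top S_B w - \lambda \left( w^\top S_W w - 1 \right), \qquad
\nabla_w L = 2 S_B w - 2 \lambda S_W w = 0 \;\;\Rightarrow\;\; S_B w = \lambda S_W w \\
\text{(c)}\quad & S_B w = (m_2 - m_1)\big[(m_2 - m_1)^\top w\big]
\;\;\Rightarrow\;\; S_W w \propto m_2 - m_1
\;\;\Rightarrow\;\; w \propto S_W^{-1}(m_2 - m_1) \propto S_W^{-1}(m_1 - m_2) .
\end{aligned}
\]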

    Exercise 2: Bounding the Error (10 + 10 P)

The direction learned by the Fisher discriminant is equivalent to that of an optimal classifier when the class-conditional data densities are Gaussian with the same covariance. In this particular setting, we can derive a bound on the classification error which gives us insight into the effect of the mean and covariance parameters on the error.

Consider two data generating distributions $P(x|\omega_1) = \mathcal{N}(\mu, \Sigma)$ and $P(x|\omega_2) = \mathcal{N}(-\mu, \Sigma)$ with $x \in \mathbb{R}^d$. Recall that the Bayes error rate is given by:

\[
P(\text{error}) = \int P(\text{error} \mid x)\, p(x)\, dx .
\]

(a) Show that the conditional error can be upper-bounded as:

\[
P(\text{error} \mid x) \le \sqrt{P(\omega_1 \mid x)\, P(\omega_2 \mid x)} .
\]

(b) Show that the Bayes error rate can then be upper-bounded by:

\[
P(\text{error}) \le \sqrt{P(\omega_1)\, P(\omega_2)} \cdot \exp\!\left( -\tfrac{1}{2}\, \mu^\top \Sigma^{-1} \mu \right) .
\]

(One possible route is sketched below.)
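As a hedged hint (one standard route, not necessarily the intended one): part (a) follows from $P(\text{error} \mid x) = \min\{P(\omega_1 \mid x),\, P(\omega_2 \mid x)\}$ together with the inequality $\min\{a, b\} \le \sqrt{ab}$ for $a, b \ge 0$. Part (b) then substitutes Bayes' rule into the integral for the Bayes error rate and uses the Gaussian integral

\[
\int \sqrt{p(x \mid \omega_1)\, p(x \mid \omega_2)}\, dx
= \exp\!\left( -\tfrac{1}{8}\, (\mu_1 - \mu_2)^\top \Sigma^{-1} (\mu_1 - \mu_2) \right)
= \exp\!\left( -\tfrac{1}{2}\, \mu^\top \Sigma^{-1} \mu \right),
\]

where the last equality uses $\mu_1 = \mu$ and $\mu_2 = -\mu$.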

Exercise 3: Fisher Discriminant (10 + 10 P)

Consider the case of two classes $\omega_1$ and $\omega_2$ with associated data generating probabilities

\[
p(x \mid \omega_1) = \mathcal{N}\!\left( \begin{pmatrix} -1 \\ -1 \end{pmatrix},\, \begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix} \right)
\quad \text{and} \quad
p(x \mid \omega_2) = \mathcal{N}\!\left( \begin{pmatrix} +1 \\ +1 \end{pmatrix},\, \begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix} \right) .
\]

(a) For this dataset, find the Fisher discriminant $w$ (i.e. the projection $y = w^\top x$ under which the ratio between inter-class and intra-class variability is maximized); a quick numerical check is sketched below.

(b) Find a projection for which the ratio is minimized.
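As an informal numerical sanity check for part (a) (not the requested analytical derivation), the Fisher direction can be computed with numpy, taking the within-class scatter proportional to the sum of the two class covariances (the overall scaling of $w$ is irrelevant):

import numpy

# Means and covariance of the two class densities from Exercise 3
m1 = numpy.array([-1.0, -1.0])
m2 = numpy.array([+1.0, +1.0])
Sigma = numpy.array([[2.0, 0.0], [0.0, 1.0]])

# Within-class scatter (up to scale): sum of the class covariances
SW = Sigma + Sigma

# Fisher direction (up to scale): SW^{-1} (m2 - m1)
w = numpy.linalg.solve(SW, m2 - m1)
print(w)   # proportional to (1, 2)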

    Exercise 4: Programming (30 P)

    Download the programming files on ISIS and follow the instructions.


Exercise sheet 4 (programming) [WiSe 2021/22] Machine Learning 1

Fisher Linear Discriminant

In this exercise, we apply the Fisher Linear Discriminant, as described in Chapter 3.8.2 of Duda et al., to the UCI Abalone dataset. A description of the dataset is given at https://archive.ics.uci.edu/ml/datasets/Abalone. The following two methods are provided for your convenience:

utils.Abalone.__init__(self) reads the Abalone data and instantiates two data matrices corresponding to the two classes: infant (I) and non-infant (N).

utils.Abalone.plot(self,w) produces a histogram of the data projected onto a vector w, where each class is shown in a different color.

Sample code that makes use of these two methods is given below. It loads the data, looks at the shape of the instantiated matrices, and plots the projection onto the first dimension of the data, which represents the length of the abalone.

In [1]:

%matplotlib inline
import utils, numpy

# Load the data
abalone = utils.Abalone()

# Print dataset size for each class
print(abalone.I.shape, abalone.N.shape)

# Project data on the first dimension
w1 = numpy.array([1, 0, 0, 0, 0, 0, 0])
abalone.plot(w1, 'projection on the first dimension (length)')

(1342, 7) (2835, 7)

Implementation (10 + 5 + 5 = 20 P)
Create a function w = fisher(X1,X2) that takes as input the data for two classes and returns the Fisher linear discriminant.

Create a function objective(X1,X2,w) that evaluates the objective defined in Equation 96 of Duda et al. for an arbitrary projection vector w .

Create a function z = expand(X) that returns a quadratic expansion of each data point x in the dataset. The expansion consists of the vector x itself, to which we concatenate the vector of all pairwise products between elements of x. In other words, letting $x = (x_1, \dots, x_d)$ denote the $d$-dimensional data point, the quadratic expansion of this data point is the $d(d+3)/2$-dimensional vector

\[
\varphi(x) = (x_i)_{1 \le i \le d} \,\cup\, (x_i x_j)_{1 \le i \le j \le d} .
\]

For example, the quadratic expansion for $d = 2$ is $(x_1, x_2, x_1^2, x_2^2, x_1 x_2)$.

In [2]:

def fisher(X1,X2):
    ##### Replace by your code
    import solutions
    return solutions.fisher(X1,X2)
    #####

def objective(X1,X2,w):
    ##### Replace by your code
    import solutions
    return solutions.objective(X1,X2,w)
    #####

def expand(X):
    ##### Replace by your code
    import solutions
    return solutions.expand(X)
    #####
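For orientation only, a minimal sketch of how these stubs might be filled in with numpy, assuming X1 and X2 are arrays of shape (n, d) with one data point per row. This is not the official solution, and the objective below uses the plain scatter-ratio form, which may differ from the exact normalization of Equation 96 of Duda et al. but has the same maximizer up to scale:

import numpy

def fisher(X1, X2):
    # Class means
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # Within-class scatter: sum of the per-class scatter matrices
    SW = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
    # Fisher direction (defined up to scale): SW^{-1} (m1 - m2)
    return numpy.linalg.solve(SW, m1 - m2)

def objective(X1, X2, w):
    # Ratio of between-class to within-class scatter along the projection w
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    SB = numpy.outer(m2 - m1, m2 - m1)
    SW = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
    return (w @ SB @ w) / (w @ SW @ w)

def expand(X):
    # Quadratic expansion: each point x concatenated with all products x_i * x_j for i <= j
    d = X.shape[1]
    i, j = numpy.triu_indices(d)
    return numpy.concatenate([X, X[:, i] * X[:, j]], axis=1)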

Analysis (5 + 5 = 10 P)

Print the value of the objective function and the histogram for several values of w:

w is a canonical coordinate vector for the first feature (length).
w is the difference between the mean vectors of the two classes.
w is the Fisher linear discriminant.
w is the Fisher linear discriminant (after quadratic expansion of the data).

In [3]:

##### REPLACE BY YOUR CODE
%matplotlib inline
import solutions
solutions.analysis()
#####

First dimension (length):  0.00048
          Means Linear:  0.00050
                Fisher:  0.00057
Fisher after expansion:  0.00077
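For reference, a minimal sketch of how such an analysis could be reproduced, assuming the first code cell has been run and fisher, objective, and expand are implemented (for instance as sketched above); the exact numbers depend on the normalization used in the objective:

# Candidate projection vectors
w_first  = numpy.array([1, 0, 0, 0, 0, 0, 0])                # first feature (length)
w_means  = abalone.I.mean(axis=0) - abalone.N.mean(axis=0)   # difference of class means
w_fisher = fisher(abalone.I, abalone.N)                      # Fisher linear discriminant

for name, w in [('First dimension (length)', w_first),
                ('Means Linear', w_means),
                ('Fisher', w_fisher)]:
    print(name, objective(abalone.I, abalone.N, w))
    abalone.plot(w, name)

# Fisher discriminant after quadratic expansion of the data
I2, N2 = expand(abalone.I), expand(abalone.N)
print('Fisher after expansion', objective(I2, N2, fisher(I2, N2)))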