[SOLVED] Machine Learning 1 Week 3- Principal-Component-Analysis

35.00 $

Category: Tags: , , , ,
Click Category Button to View Your Next Assignment | Homework

You will receive the following solution file(s) instantly after successful payment:

zip file icon Week03_Principal-Component-Analysis-yk8yzr.zip (3189.2 KB)
Assignment Instructions Updated Recently? Submit Below and we will provide new Solution!
Submit New Instructions
🔒 Securely Powered by:
Secure Checkout
Rate this product

Exercise Sheet 3

Exercise 1: Lagrange Multipliers

Let x1, . . . , xN ∈ Rd be a dataset of N data points. We consider the objective function

N
J(θ) = 􏰉 ||θ − xk||2

k=1

to be minimized with respect to the parameter θ ∈ Rd. In absence of constraints, the parameter θ that minimizes this objective is given by the empirical mean m = N1 􏰌Nk=1 xk. However, this is generally not the case when the parameter θ is constrained.

  1. (a)  Using the method of Lagrange multipliers, find the parameter θ that minimizes J(θ) subject to the constraint θ⊤b = 0, with b some unit vector in Rd. Give a geometrical interpretation to your solution.
  2. (b)  Using the same method, find the parameter θ that minimizes J(θ) subject to ||θ − c||2 = 1, where c is a vector in Rd different from m. Give a geometrical interpretation to your solution.

    Exercise 2: Principal Component Analysis (10 + 10 P)

We consider a dataset x1, . . . , xN ∈ Rd. Principal component analysis searches for a unit vector u ∈ Rd such that projecting the data on that vector produces a distribution with maximum variance. Such vector can be found by solving the optimization problem:

argmax u⊤xk− uNN

k=1
(a) Show that the problem above can be rewritten as

arg max u⊤Su u

u⊤xl

1 􏰉N 􏰍 1 􏰃 􏰉N

􏰄 􏰎 2
with ∥u∥2=1

􏰉N
where S = (xk − m)(xk − m)⊤ is the scatter matrix, and m = N xk is the empirical mean.

k=1 k=1
(b) Show using the method of Lagrange multipliers that the problem above can be reformulated as solving the

eigenvalue problem
and retaining the eigenvector u associated to the highest eigenvalue λ.

Exercise 3: Bounds on Eigenvalues (5 + 5 + 5 + 5 P)

Let λ1 denote the largest eigenvalue of the matrix S. The eigenvalue λ1 quantifies the variance of the data when projected on the first principal component. Because its computation can be expensive, we study how the latter can be bounded with the diagonal elements of the matrix S.

  1. (a)  Show that 􏰌di=1 Sii is an upper bound to the eigenvalue λ1.
  2. (b)  State the conditions on the data for which the upper bound is tight.
  3. (c)  Show that maxdi=1 Sii is a lower bound to the eigenvalue λ1.
  4. (d)  State the conditions on the data for which the lower bound is tight.

l=1

with

∥u∥2 = 1
1 􏰉N

Su = λu

Exercise 4: Iterative PCA (10 P)

When performing principal component analysis, computing the full eigendecomposition of the scatter matrix S is typically slow, and we are often only interested in the first principal components. An efficient procedure to find the first principal component is power iteration. It starts with a random unit vector w(0) ∈ Rd, and iteratively applies the parameter update

w(t+1) = Sw(t)􏰋 ∥Sw(t)∥
until some convergence criterion is met. Here, we would like to show the exponential convergence of power

iteration. For this, we look at the error terms

􏰊w⊤u1 􏰊
and observe that they should all converge to zero as w approaches the eigenvector u1 and becomes orthogonal

to other eigenvectors.

(a) Show that Ek(w(T)) = |λk/λ1|T ·Ek(w(0)), i.e. the convergence of the algorithm is exponential with the number of time steps T . (Hint: to show this, it is useful to rewrite the scatter matrix in terms of eigenvalues and eigenvectors, i.e. S = 􏰌di=1 uiu⊤i λi.)

Exercise 5: Programming (30 P)

Download the programming files on ISIS and follow the instructions.

􏰊􏰊 w ⊤ u k 􏰊􏰊
Ek(w) = 􏰊 􏰊 with k = 2,…,d,

Exercise sheet 3 (programming)

[WiSe 2021/22] Machine Learning 1

In [1]:
import numpy,sklearn,sklearn.datasets,utils

%matplotlib inline

Principal Component Analysis

In this exercise, we will experiment with two different techniques to compute the PCA components of a dataset:

Singular Value Decomposition (SVD)
Power Iteration: A technique that iteratively optimizes the PCA objective.

We consider a random subset of the Labeled Faces in the Wild (LFW) dataset, readily accessible from sklearn, and we apply some basic preprocessing to discount strong variations of luminosity and contrast.

In [2]:

D = sklearn.datasets.fetch_lfw_people(resize=0.5)[‘images’]
D = D[numpy.random.mtrand.RandomState(1).permutation(len(D))[:2000]]*1.0 D = D – D.mean(axis=(1,2),keepdims=True)
D = D / D.std(axis=(1,2),keepdims=True)
print(D.shape)

(2000, 62, 47)
Twofunctionsareprovidedforyourconvenienceandareavailablein utils.py thatisincludedintheziparchive.Thefunctionsarethefollowing:

utils.scatterplot producesascatterplotfromatwo-dimensionaldataset.
utils.render takesanarrayofdatapointsorobjectsofsimilarshape,andrendersthemintheIPythonnotebook.

Some demo code that makes use of these functions is given below.

In [3]:

utils.scatterplot(D[:,32,20],D[:,32,21]) # Plot relation between adjacent pixels utils.render(D[:30],15,2,vmax=5) # Display first 10 examples in the data

PCA with Singular Value Decomposition

Principal components can be found computing a singular value decomposition. Specifically, we assume a matrix Xˉ whose columns contain the data points represented as vectors, and where the data points have been centered (i.e. we have substracted to each of them the mean of the dataset). The matrix Xˉ is

of size d × N where d is the number of input features and N is the number of data points. This matrix, more specifically, the rescaled matrix Z = decomposed using singular value decomposition:

UΛV = Z The k principal components can then be found in the first k columns of the matrix U.

Tasks:

1 Xˉ is then √N

Compute the principal components of the data using the function numpy.linalg.svd .
Measure the computational time required to find the principal components. Use the function time.time() for that purpose. Do not include in your estimate the computation overhead caused by loading the data, plotting and rendering.
Plot the projection of the dataset on the first two principal components using the function utils.scatterplot .
Visualize the 60 leading principal components using the function utils.render .

Note that if the algorithm runs for more than 3 minutes, there may be some error in your implementation.

In [4]:

### REPLACE BY YOUR CODE

import solutions; solutions.pca_svd(D) ###

Time: 42.489 seconds

When looking at the scatter plot, we observe that much more variance is expressed in the first two principal components than in individual dimensions as it was plotted before. When looking at the principal components themselves which we render as images, we can see that the first principal components correspond to low-frequency filters that select for coarse features, and the following principal components capture progressively higher-frequency information and are also becoming more noisy.

PCA with Power Iteration (15 P)

The first PCA algorithm based on singular value decomposition is quite expensive to compute. Instead, the power iteration algorithm looks only for the first component and finds it using an iterative procedure. It starts with an initial weight vector w ∈ Rd, and repeatedly applies the update rule
w ← Sw / ‖

where S is the covariance matrix defined as S = \frac1N \bar{X}\bar{X}^\top. Like for standard PCA, the objective that iterative PCA optimizes is J(\boldsymbol{w}) = \boldsymbol{w}^\top S \boldsymbol{w} subject to the unit norm constraint for \boldsymbol{w}. We can therefore keep track of the progress of the algorithm after each iteration.

Tasks:

Implement the power iteration algorithm. Use as a stopping criterion the value of J(\boldsymbol{w}) between two iterations increasing by less than 0.01.
Print the value of the objective function J(\boldsymbol{w}) at each iteration.
Measure the time taken to find the principal component.

Visualize the the eigenvector \boldsymbol{w} obtained after convergence using the function utils.render . Note that if the algorithm runs for more than 1 minute, there may be some error in your implementation.

In [5]:

### REPLACE BY YOUR CODE

import solutions; solutions.pca_powit(D) ###

iteration  0   J(w) =
                           0.656
                         126.401
                         207.474
                         221.764
                         231.388
                         241.061
                         250.240
                         257.989
                         263.840
                         267.884
                         270.508
                         272.142
                         273.133
                         273.725
                         274.075
                         274.281
                         274.402
                         274.473
                         274.514
                         274.538
                         274.552
                         274.561
                         274.565
                         274.568
                         274.570

We observe that the computation time has decreased significantly. The difference of performance becomes larger as the number of dimensions and data points increases. We can observe that the principal component is the same (sometimes up to a sign flip) as the one obtained by the SVD algorithm.

iteration
iteration
iteration
iteration
iteration
iteration
iteration
iteration
iteration
iteration 10
iteration 11
iteration 12
iteration 13
iteration 14
iteration 15
iteration 16
iteration 17
iteration 18
iteration 19
iteration 20
iteration 21
iteration 22
iteration 23
iteration 24
Time: 5.962 seconds

1 2 3 4 5 6 7 8 9

J(w) =
J(w) =
J(w) =
J(w) =
J(w) =
J(w) =
J(w) =
J(w) =
J(w) =
J(w) =
J(w) =
J(w) =
J(w) =
J(w) =
J(w) =
J(w) =
J(w) =
J(w) =
J(w) =
J(w) =
J(w) =
J(w) =
J(w) =
J(w) =

.,, A, n/3 -;,.,,n’- -1-

1.tt

I

:. o

,vJd.,..;;W

:::::.)

.J …) *e ;; 1 . ~ -AJ,-

..J …J _ ) 2.-0-1.,,,, +AJr

:::7

i

L.

+-1.t o

~

kiiv~ ·~ .,:j/,JJ… “”‘-J.A,, ~ .t.

‘° ,.r–:..)

,./

),-J ‘j((‘\)” (e)+ 1(lfJ”If0-~I/+-:\ll(-t1{“-

J J ( t , 1 ) < l ( – J ( t t , ~0 – . . l ” – – –:.___,-..:.:.._· .::: :–::::s I(J – m,I\ ::\ VI -cII – ·”‘

)

L)

(-)

->

.J
. c.ll

de J{9

~>

(11+~){e-c)-( -c) ~ . -~ -~

“””” 1±4-..,/,w.-.- ..-.A t ‘

;h~ ~,th _M ( Y h – ) r . A , _ – ~ ~

J(0J_ j.J-. t.

~~

=

c.)–:::. o ~o

-…) “-

1) C )
‘L IG

-~ – <..

111

i(e- -i/– (e- . …:. _.>

.J-21 ./ ..J _),

+~)(e-C)

( ,1
-6:) (-1+-:\)I-(

-..)
I! ~ – <.I) _

‘\.

=1
+ ~ \ ; ; . ~, . . . 1 1 ~ –

·
‘TN-~ ;tB1=1,e,~II’–~00

-..l

_;;

– –

11

,vl;t:…. .~

.

rtwi }v

.fl.

£,”

  • Week03_Principal-Component-Analysis-yk8yzr.zip