- In this question we consider the hard margin SVM formulation as developed in class and thetextbook. Recall that we stated that for this type of optimization problem there is no duality gap. That is, the primal and dual objective functions have the same value at their optimal solutions. Use this fact and other facts developed in class to show that the maximum margin γ∗ can be calculated from the dual solution α∗. More specifically show that .
- The soft margin SVM formulation developed in class called for minimizing subject to constraints (for all i) 1 − ξi − ti(vTφ(xi) + v0) ≤ 0 and ξi ≥ 0.
Consider the alternative formulation QSVM that uses a quadratic penalty on ξi which is: minimize subject to constraints (for all i) 1 − ξi − ti(vTφ(xi) + v0) ≤ 0.
Note that we do not need the constraint ξi ≥ 0 in this formulation because ξi < 0 increases the objective function and makes the constraints harder to satisfy. Develop the dual formulation of QSVM: that is (1) compute the dual objective function, (2) state the dual optimization problem, and (3) argue that QSVM is also a kernel method.
- In this problem we show that many machine learning problems have solutions that can beexpressed through kernels. Consider the standard formulation where ai = wTφ(xi), and where we have a loss function `(ai,ti). For example we might use the square loss `(ai,ti) = (ti−ai)2 or the classification log loss `(ai,ti) = ti lnyi +(1−ti)ln(1−yi) where yi = σ(ai) etc. But for this question we consider any loss function. In addition consider any regularization function R(b) which is monotonically increasing, where b is a scalar.
Now consider a machine learning objective of the form Pi `(ai,ti) + R(wTw).
Show the the optimal solution can be expressed as w = Pi αiφ(xi) that is, it is a weighted sum of training examples. Note that this shows that predictions on test example z can be computed as wTφ(z) = Pi αiK(xi,z).
- In lecture we provided a general form for optimizing the hyper-parameters of Gaussian ProcessRegression (GPR). In general there is no closed form and we must use some iterative gradient based optimization. Consider GPR where one hyperparameter, θ0, gives the overall scale of the covariance. Specifically, assume the covariance matrix has the form C(xn,xm) = θ0k(xn,xm) where k(·,) is any valid kernel. Show that the marginal likelihood can be optimized w.r.t. θ0 in closed form and develop the solution for θ0.