Description
- In discussion we derived an expression for the signed distance $d$ between an arbitrary point $x$ (or $p$) and a hyperplane H given by $g(x) = w_0 + w^T x = 0$, all in non-augmented feature space. This question explores this topic further.
- Prove that the weight vector w is normal to H.
Hint: For any two points $x_1$ and $x_2$ on H, what is $g(x_1) - g(x_2)$? How can you interpret the vector $(x_1 - x_2)$?
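As a sketch of the step the hint points to (using only the problem's own definitions):
$$ g(x_1) - g(x_2) = (w_0 + w^T x_1) - (w_0 + w^T x_2) = w^T(x_1 - x_2) = 0, $$
so $w$ is orthogonal to every vector $(x_1 - x_2)$ lying in H, i.e., $w$ is normal to H.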
- Show that the vector w points to the positive side of H. (The positive side of H is the side where $d > 0$.)
Hint: What sign does the distance d from H to $x = x_1 + a\,w$ (with $a > 0$) have, in which $x_1$ is a point on H?
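One way to see it, following the hint: substitute $x = x_1 + a\,w$ into the distance expression from discussion,
$$ d = \frac{g(x)}{\|w\|} = \frac{w_0 + w^T(x_1 + a\,w)}{\|w\|} = \frac{g(x_1) + a\,\|w\|^2}{\|w\|} = a\,\|w\|, $$
since $g(x_1) = 0$ on H. For $a > 0$ (a step off H in the direction of $w$), $d > 0$, so $w$ points to the positive side.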
- Derive, or state and justify, an expression for the signed distance r between an arbitrary point $x^{(+)}$ and a hyperplane $g(x^{(+)}) = w^{(+)T} x^{(+)} = 0$ in augmented feature space. Set up the sign of your distance so that $w^{(+)}$ points to the positive-distance side of H.
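A sketch of where this lands, by the same reasoning as the non-augmented case: since the augmented hyperplane passes through the origin ($w_0$ is absorbed into $w^{(+)}$),
$$ r = \frac{g(x^{(+)})}{\|w^{(+)}\|} = \frac{w^{(+)T} x^{(+)}}{\|w^{(+)}\|}, $$
which is positive exactly on the side of H that $w^{(+)}$ points to.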
- In weight space, using augmented quantities, derive an expression for the signed distance between an arbitrary point $w^{(+)}$ and a hyperplane $g(x^{(+)}) = w^{(+)T} x^{(+)} = 0$, in which the vector $x^{(+)}$ defines the positive side of the hyperplane.
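By symmetry with the previous part (a sketch, since $w^{(+)T} x^{(+)}$ is symmetric in the two vectors): in weight space each point $x^{(+)}$ defines a hyperplane through the origin with normal $x^{(+)}$, and the signed distance from it to a point $w^{(+)}$ is
$$ r = \frac{w^{(+)T} x^{(+)}}{\|x^{(+)}\|}. $$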
- For a 2-class learning problem with one feature, you are given four training data points (in augmented space; the superscript denotes the class label):
$$ x_1^{(1)} = (1, -3);\quad x_2^{(1)} = (1, -5);\quad x_3^{(2)} = (1, 1);\quad x_4^{(2)} = (1, -1) $$
- (a) Plot the data points in 2D feature space. Draw a linear decision boundary H that correctly classifies them, showing which side is positive. (A minimal plotting sketch is given after this list.)
- (b) Plot the reflected data points in 2D feature space. Draw the same decision boundary; does it still classify them correctly?
- (c) Plot the reflected data points, as lines in 2D weight space, showing the positive side of each. Show the solution region.
- (d) Also, plot the weight vector w of H from part (a) as a point in weight space. Is w in the solution region?
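As a starting point for part (a), here is a minimal Python/matplotlib sketch. The tooling and the particular boundary are assumptions for illustration, not part of the assignment: $w = (-2, -1)$ is just one hyperplane that happens to separate these points, with class 1 on its positive side.

```python
import numpy as np
import matplotlib.pyplot as plt

# Augmented training points: first component is the augmentation (always 1),
# second is the single feature. Superscripts in the problem are class labels.
class1 = np.array([[1, -3], [1, -5]])
class2 = np.array([[1,  1], [1, -1]])

# One possible separating weight vector (an assumption, not the unique answer):
# g(x) = -2*x1 - 1*x2, which puts class 1 on the positive side.
w = np.array([-2.0, -1.0])

fig, ax = plt.subplots()
ax.scatter(class1[:, 0], class1[:, 1], marker='o', label='class 1')
ax.scatter(class2[:, 0], class2[:, 1], marker='s', label='class 2')

# In augmented space H passes through the origin: x2 = -(w[0]/w[1]) * x1.
x1 = np.linspace(-0.5, 1.5, 2)
ax.plot(x1, -(w[0] / w[1]) * x1, 'k-', label='H: $w^T x = 0$')

# Arrow for the normal w, drawn from the point (1, -2) on H; it points
# toward the positive side.
ax.annotate('', xy=(1 + 0.2 * w[0], -2 + 0.2 * w[1]), xytext=(1, -2),
            arrowprops=dict(arrowstyle='->'))
ax.set_xlabel('$x_1$ (augmentation)')
ax.set_ylabel('$x_2$ (feature)')
ax.legend()
plt.show()
```

For part (b), the same script can be rerun after negating the class-2 rows (the reflection), in which case all four points should land on the positive side of the same H.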
- (a) Let $p(x)$ be a scalar function of a D-dimensional vector $x$, and $f(p)$ be a scalar function of $p$. Prove that:
$$ \nabla_x f\big(p(x)\big) = \frac{df}{dp}\,\nabla_x p(x) $$
i.e., prove that the chain rule applies in this way. [Hint: you can show it for the $i$th component of the gradient vector, for any $i$. It can be done in a couple of lines.]
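A sketch of the component-wise argument the hint suggests: for any $i$, the ordinary scalar chain rule gives
$$ \big[\nabla_x f(p(x))\big]_i = \frac{\partial f(p(x))}{\partial x_i} = \frac{df}{dp}\,\frac{\partial p(x)}{\partial x_i} = \frac{df}{dp}\,\big[\nabla_x p(x)\big]_i, $$
and since this holds for every component, the vector identity follows.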
- (b) Use relation (18) of DHS A.2.4 to find $\nabla_x(x^T x)$.
- (c) Prove your result for $\nabla_x(x^T x)$ from part (b) by, instead, writing out the components.
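For reference, a quick sketch of the component route: with $x^T x = \sum_j x_j^2$,
$$ \big[\nabla_x(x^T x)\big]_i = \frac{\partial}{\partial x_i} \sum_{j=1}^{D} x_j^2 = 2x_i, \quad\text{so}\quad \nabla_x(x^T x) = 2x, $$
which is what the DHS relation in part (b) should reproduce.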
- (d) Use (a) and (b) to find $\nabla_x\big[(x^T x)^3\big]$ in terms of $x$.
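A sketch of the combination, assuming the results above: take $p = x^T x$ and $f(p) = p^3$, so
$$ \nabla_x\big[(x^T x)^3\big] = \frac{df}{dp}\,\nabla_x p = 3(x^T x)^2 \cdot 2x = 6\,(x^T x)^2\, x. $$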
- (a) Use the relations above to find $\nabla_w \|w\|_2$. Express your answer in terms of $\|w\|_2$ where possible. Hint: let $p = w^T w$; what is $f$?
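A sketch under this reading of the norm (assuming $\|w\|_2 = \sqrt{w^T w}$): with $p = w^T w$ and $f(p) = p^{1/2}$,
$$ \nabla_w \|w\|_2 = \frac{1}{2}\,p^{-1/2}\,\nabla_w(w^T w) = \frac{2w}{2\sqrt{w^T w}} = \frac{w}{\|w\|_2}. $$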
(b) Find: $\nabla_w \|Mw - b\|_2$. Express your result in simplest form. Hint: first choose $p$ (remember it must be a scalar).
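A sketch under the same reading: choose $p = (Mw - b)^T(Mw - b) = \|Mw - b\|_2^2$ (a scalar) and $f(p) = p^{1/2}$. Expanding $p$ and differentiating gives $\nabla_w p = 2M^T(Mw - b)$, so
$$ \nabla_w \|Mw - b\|_2 = \frac{M^T(Mw - b)}{\|Mw - b\|_2}. $$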
- [Extra credit] For $C > 2$ classes, show that total linear separability implies linear separability, and show that linear separability doesn't necessarily imply total linear separability. For the latter, a counterexample will suffice.
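One standard counterexample worth keeping in mind (assuming the usual definitions: total linear separability means each class is separable from the union of all the others by a single hyperplane, while linear separability means each pair of classes is separable): with one feature and $C = 3$, put class 1 at $x = -1$, class 2 at $x = 0$, and class 3 at $x = +1$. Every pair of classes is separable by a threshold, but no single threshold separates class 2 from classes 1 and 3 together.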