## Description

- In the basic stereo imaging setup below, the origin of the world coordinate system W is located at the lens center of the left camera. The distance between the lens centers of the two cameras is 12 cm. The two cameras have a focal length of 50 mm, and the sensor chips (real image planes) of the cameras have a physical size of 1.2 cm × 2 cm. The output of the cameras is a pair of digital stereo images, each of size 512 × 512 pixels. The tip of vertical pole #1 appears in the left image at pixel location $(i, j) = (185, 125)$ and in the right image at $(i, j) = (185, 115)$. The tip of vertical pole #2 appears in the left image at pixel location $(i, j) = (185, 179)$ and in the right image at $(i, j) = (185, 169)$. Compute the horizontal distance between the tips of the two poles in the world coordinate system (horizontal distance = distance in the $x$ direction). Show all work to get full credit. (The integer image plane uses the $i$-$j$ coordinate system, with $i$ going from top to bottom and $j$ going from left to right.)

- […] ${}_{i}(X)$ to classify input $X$ into one of three classes. The prototype vectors for the three classes are given below. Find the equation of the decision boundary between classes 1 and 3, simplify it into an algebraic equation (not a matrix equation), and then plot the decision boundary as a graph.

- We would like to use the signed representation of the *Histogram of Oriented Gradients* (*HOG*) descriptor to detect humans in images. In the signed representation, the histogram has 18 bins.

- What is the dimension of the descriptor if we assume the following parameter settings:

detection window size = 296 × 168 pixels (rows × columns), cell size = 8 × 8 pixels, block size = 3 × 3 cells, and block overlap = 8 pixels.
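The descriptor length follows from counting blocks and the values stored per block. The sketch below is one way to set up that count, not an official solution. It assumes that "block overlap = 8 pixels" means adjacent blocks share an 8-pixel strip, so the block stride is the block side minus the overlap; the function name `hog_descriptor_length` and its parameter names are made up for this illustration.

```python
def hog_descriptor_length(win_rows, win_cols, cell_px, block_cells, overlap_px, bins):
    """Length of a HOG descriptor built from overlapping square blocks."""
    block_px = block_cells * cell_px       # block side length in pixels
    # Assumption: the stride between adjacent blocks is block side minus overlap.
    stride_px = block_px - overlap_px
    blocks_down = (win_rows - block_px) // stride_px + 1
    blocks_across = (win_cols - block_px) // stride_px + 1
    values_per_block = block_cells * block_cells * bins
    return blocks_down * blocks_across * values_per_block


length = hog_descriptor_length(
    win_rows=296, win_cols=168, cell_px=8, block_cells=3, overlap_px=8, bins=18
)
print(length)
```

Under this reading of the overlap, the window holds 18 × 10 blocks of 3 × 3 × 18 = 162 values each, giving 29160. If the overlap is instead interpreted as a one-cell stride, only `stride_px` changes.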

- The bin centers for the 18 histogram bins and the gradient magnitudes and gradient angles of an 8 × 8 cell are given below. Compute the histogram of the cell (before block normalization).

| Bin # | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Bin centers (in degrees) | 0 | 20 | 40 | 60 | 80 | 100 | 120 | 140 | 160 | 180 | 200 | 220 | 240 | 260 | 280 | 300 | 320 | 340 |

0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

0 | 0 | 220 | 0 | 0 | 0 | 0 | 0 |

0 | 0 | 0 | 0 | 180 | 0 | 0 | 0 |

0 | 120 | 0 | 0 | 0 | 0 | 0 | 0 |

0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

** Gradient Magnitudes **

|   |   |   |   |   |   |   |   |
|---|---|---|---|---|---|---|---|
| 200 | 45 | 23 | 98 | 130 | 260 | 255 | 250 |
| 125 | 295 | 85 | 90 | 130 | 265 | 249 | 240 |
| 123 | 35 | 85 | 95 | 125 | 260 | 250 | 240 |
| 100 | 90 | 45 | 90 | 120 | 265 | 240 | 230 |
| 95 | 99 | 105 | 106 | 355 | 120 | 100 | 110 |
| 90 | 205 | 110 | 120 | 120 | 130 | 125 | 120 |
| 85 | 90 | 100 | 110 | 110 | 120 | 120 | 110 |
| 80 | 80 | 100 | 110 | 100 | 100 | 100 | 110 |

**Gradient Angles**
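The cell histogram can be accumulated by letting each pixel vote with its gradient magnitude. The sketch below assumes the common scheme in which a vote is split linearly between the two nearest bin centers, with the 340° bin wrapping around to the 0° bin; the problem statement does not say which voting scheme is expected, so treat this as one possible reading. The function name `cell_histogram` is hypothetical.

```python
BIN_WIDTH = 20.0   # degrees between adjacent bin centers
NUM_BINS = 18

MAGNITUDES = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 220, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 180, 0, 0, 0],
    [0, 120, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]

ANGLES = [
    [200, 45, 23, 98, 130, 260, 255, 250],
    [125, 295, 85, 90, 130, 265, 249, 240],
    [123, 35, 85, 95, 125, 260, 250, 240],
    [100, 90, 45, 90, 120, 265, 240, 230],
    [95, 99, 105, 106, 355, 120, 100, 110],
    [90, 205, 110, 120, 120, 130, 125, 120],
    [85, 90, 100, 110, 110, 120, 120, 110],
    [80, 80, 100, 110, 100, 100, 100, 110],
]


def cell_histogram(magnitudes, angles):
    """Accumulate magnitude votes, split linearly between neighboring bins."""
    hist = [0.0] * NUM_BINS
    for mag_row, ang_row in zip(magnitudes, angles):
        for mag, ang in zip(mag_row, ang_row):
            if mag == 0:
                continue  # zero-magnitude pixels contribute nothing
            pos = (ang % 360.0) / BIN_WIDTH      # position in units of bins
            lower = int(pos) % NUM_BINS          # bin center at or below the angle
            upper = (lower + 1) % NUM_BINS       # next bin center, wrapping at 360
            frac = pos - int(pos)                # fraction of the way to the upper bin
            hist[lower] += mag * (1.0 - frac)
            hist[upper] += mag * frac
    return hist


hist = cell_histogram(MAGNITUDES, ANGLES)
for bin_number, value in enumerate(hist, start=1):
    if value:
        print(bin_number, value)
```

Only three pixels have nonzero magnitude, so only the bins adjacent to 45°, 355°, and 205° receive votes, and the votes sum to the total magnitude of 520.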

- Suppose we have already computed the normalized co-occurrence matrix $P[i, j]$ of an input image using displacement vector $d = (dx, dy)$. Can we obtain the normalized co-occurrence matrix $P'[i, j]$ for displacement vector $d' = (-dx, -dy)$ without referring to the original input image? If so, how? Do not write more than six sentences. (Hint: displacement vector $d'$ has the same magnitude as $d$ but points in the opposite direction.)
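The relationship between the two matrices can be checked numerically. The sketch below builds both matrices for a small made-up image and compares the matrix for the reversed displacement against the transpose of the original one. The helper names, the toy image, and the choice to treat the displacement as a (row offset, column offset) pair are assumptions for illustration only.

```python
def cooccurrence(image, d, levels):
    """Normalized co-occurrence matrix: entry [a][b] is the fraction of valid
    pixel pairs with gray level a at (r, c) and gray level b at (r + d[0], c + d[1])."""
    rows, cols = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + d[0], c + d[1]
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
                total += 1
    return [[n / total for n in row] for row in counts]


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


# Hypothetical 3 x 3 image with gray levels 0..2.
image = [
    [0, 1, 2],
    [1, 2, 2],
    [0, 0, 1],
]
d = (1, 1)

P = cooccurrence(image, d, levels=3)
P_reversed = cooccurrence(image, (-d[0], -d[1]), levels=3)
print(P_reversed == transpose(P))
```

Every pixel pair counted as (a, b) under the original displacement is counted as (b, a) under the reversed one, and the number of valid pairs is unchanged, so the comparison holds for any image and displacement.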

- Consider the camera coordinate system C and the world coordinate system W as shown in the figure below. The origin of the camera coordinate system is located at ${}^{w}(x, y, z) = {}^{w}(6, 2, 0)$ with respect to the world coordinate system. The $x$ axis of the camera coordinate system is parallel to the $y$ axis of the world coordinate system, the $y$ axis of the camera coordinate system is parallel to the $x$ axis of the world coordinate system but points in the opposite direction, and the $z$ axis of the camera coordinate system is parallel to the $z$ axis of the world coordinate system. The camera has a focal length of 45 mm, and the real image plane $(x', y')$ of the camera is of size 1 cm × 1 cm. The real image plane is digitized into a digital image of size 1024 × 1024 pixels. **Derive the $3 \times 4$ camera transform that transforms points in the world coordinate system to the pixel coordinate system of the camera.**

**Note**: Assume that the real image plane has its origin at the lower left corner, with the $x'$ axis pointing to the right and the $y'$ axis pointing upward. The digital image plane has its origin (0, 0) at the upper left corner, with the $i$ axis pointing downward and the $j$ axis pointing to the right. The range for both $i$ and $j$ is [0, 1023].
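The first stage of the camera transform is the change from world to camera coordinates, which follows directly from the axis description above. The sketch below covers only that extrinsic stage. It deliberately leaves out the projection and pixel mapping, since those depend on conventions (principal point location, image inversion on the real image plane) that come from the missing figure and the course notes. The names `R`, `CAMERA_ORIGIN_W`, `world_to_camera`, and `extrinsic_matrix` are hypothetical, and the units of the camera origin are taken as given.

```python
# Camera origin expressed in world coordinates (units as given in the problem).
CAMERA_ORIGIN_W = (6.0, 2.0, 0.0)

# Each row is one camera axis written in world coordinates.
R = [
    [0.0, 1.0, 0.0],    # camera x axis points along world +y
    [-1.0, 0.0, 0.0],   # camera y axis points along world -x
    [0.0, 0.0, 1.0],    # camera z axis points along world +z
]


def world_to_camera(p_w):
    """Map a world point to camera coordinates: p_c = R (p_w - origin)."""
    diff = [p - o for p, o in zip(p_w, CAMERA_ORIGIN_W)]
    return [sum(r_k * d_k for r_k, d_k in zip(row, diff)) for row in R]


def extrinsic_matrix():
    """Return the 3 x 4 matrix [R | t] with t = -R * origin."""
    t = [0.0 - sum(r_k * o_k for r_k, o_k in zip(row, CAMERA_ORIGIN_W)) for row in R]
    return [row + [t_k] for row, t_k in zip(R, t)]


for row in extrinsic_matrix():
    print(row)
print(world_to_camera((7.0, 2.0, 5.0)))
```

The full $3 \times 4$ camera transform would be obtained by multiplying this matrix on the left by a $3 \times 3$ matrix that encodes the focal length, the pixel scaling, and the origin shifts described in the note.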

- In the *LeNet-5* convolutional neural network below, (a) what is the total number of links between the input layer and the C1 layer? (b) How many different parameters need to be trained for the links between the input layer and the C1 layer? Suppose the input to the *Softmax* layer is $[0\ 7\ 5\ 0\ 1]^{T}$. What are the final outputs of the neural network?

__Hint__: the formula for the *Softmax* function is:

$$\mathrm{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j} e^{z_j}}$$
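Both parts of this problem reduce to short calculations. The sketch below assumes the standard LeNet-5 configuration (a 32 × 32 input, six C1 feature maps, 5 × 5 kernels, one bias per map, and the bias counted as a link), because the network figure is not reproduced here; if the figure differs, the arguments change accordingly. The function names are hypothetical.

```python
import math


def conv_layer_counts(input_size, kernel_size, num_maps):
    """Return (links, trainable parameters) for a valid convolution layer
    with shared weights and one bias per feature map."""
    out_size = input_size - kernel_size + 1                 # feature map side length
    per_unit = kernel_size * kernel_size + 1                # kernel weights plus bias
    params = num_maps * per_unit                            # weights are shared within a map
    links = num_maps * out_size * out_size * per_unit       # every output unit has its own links
    return links, params


def softmax(z):
    """Numerically stable softmax of a list of scores."""
    shift = max(z)
    exps = [math.exp(v - shift) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


# Assumed standard LeNet-5 C1 layer: 32 x 32 input, 5 x 5 kernels, 6 feature maps.
links, params = conv_layer_counts(input_size=32, kernel_size=5, num_maps=6)
print(links, params)

outputs = softmax([0, 7, 5, 0, 1])
print([round(p, 4) for p in outputs])
```

Subtracting the maximum score before exponentiating does not change the result and avoids overflow for large inputs. The outputs sum to 1, and the largest probability belongs to the score of 7.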

- In the Eigenface method for face recognition, we compute the distance between an input face and its reconstruction as $d_0 = \mathrm{dist}(\vec{I}_R, \vec{I})$. The distance between an input face image and its reconstruction should be small. Explain why the distance will be large for a non-face input image. Do not write more than six sentences.
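The reconstruction distance can be illustrated with a toy example. The sketch below uses made-up three-pixel "images", a made-up mean face, and two orthonormal eigenfaces, so it only shows the mechanics: an input that lies in the span of the eigenfaces is reconstructed exactly, while a component outside that span cannot be recovered and shows up as distance. All names and values are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def reconstruct(image, mean, eigenfaces):
    """Project the mean-subtracted image onto the eigenfaces and rebuild it."""
    centered = [x - m for x, m in zip(image, mean)]
    weights = [dot(centered, u) for u in eigenfaces]   # assumes orthonormal eigenfaces
    recon = list(mean)
    for w, u in zip(weights, eigenfaces):
        recon = [r + w * u_k for r, u_k in zip(recon, u)]
    return recon


def reconstruction_distance(image, mean, eigenfaces):
    """Euclidean distance between an image and its face-space reconstruction."""
    recon = reconstruct(image, mean, eigenfaces)
    return math.sqrt(sum((x - r) ** 2 for x, r in zip(image, recon)))


# Toy face space: the mean face plus the span of two orthonormal eigenfaces.
mean = [1.0, 1.0, 1.0]
eigenfaces = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

face_like = [3.0, 2.0, 1.0]   # differs from the mean only along the eigenfaces
non_face = [1.0, 1.0, 5.0]    # differs from the mean only outside the face space

d_face = reconstruction_distance(face_like, mean, eigenfaces)
d_non_face = reconstruction_distance(non_face, mean, eigenfaces)
print(d_face, d_non_face)
```

In a real system the eigenfaces come from PCA on a training set of faces, so they capture only the variation seen among faces; the part of a non-face image that falls outside that subspace is lost in the projection.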