Fundamentals of Computer Vision, Project 3: Tracking Objects in Videos


Overview

One incredibly important aspect of human and animal vision is the ability to follow objects and people in our view. Whether it is a tiger chasing its prey, or you trying to catch a basketball, tracking is so integral to our everyday lives that we forget how much we rely on it. In this assignment, you will be implementing an algorithm that will track an object in a video.

You will first implement the Lucas-Kanade tracker, and then a more computationally efficient version called the Matthews-Baker (or inverse compositional) method [1]. This method is one of the most commonly used methods in computer vision due to its simplicity and wide applicability. We have provided two video sequences: a car on a road, and a helicopter approaching a runway.

To initialize the tracker you need to define a template by drawing a bounding box around the object to be tracked in the first frame of the video. For each of the subsequent frames the tracker will update an affine transform that warps the current frame so that the template in the first frame is aligned with the warped current frame.

Preliminaries

An image transformation or warp is an operation that acts on pixel coordinates and maps pixel values from one place to another in an image. Translation, rotation and scaling are all examples of warps. We will use the symbol W to denote warps. A warp function W has a set of parameters p associated with it and maps a pixel with coordinates x = [u v]ᵀ to x′ = [u′ v′]ᵀ.

x′ = W(x;p)                                                                 (1)

An affine transform is a warp that can include any combination of translation, anisotropic scaling and rotation. An affine warp can be parametrized in terms of 6 parameters p = [p1 p2 p3 p4 p5 p6]ᵀ. One of the convenient things about an affine transformation is that it is linear; its action on a point with coordinates x = [u v]ᵀ can be described as a matrix operation on homogeneous coordinates

[u′ v′ 1]ᵀ = W(p) [u v 1]ᵀ                                                  (2)

where W(p) is a 3 × 3 matrix such that (following the parametrization of [1])

         [ 1 + p1     p3      p5 ]
W(p)  =  [   p2     1 + p4    p6 ]                                          (3)
         [   0         0       1 ]

Note that, for convenience, we will write W(x;p) when referring to the warp as a function and W(p) when referring to the matrix of an affine warp. Table 1 contains a summary of the variables used in the next two sections. It will be useful to keep these in mind.
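As a concrete illustration of Equations 2 and 3, the short MATLAB snippet below builds W(p) from a parameter vector and applies it to a single point in homogeneous coordinates. The numeric values are arbitrary examples, not part of the assignment.

    % Build the affine warp matrix W(p) of Equation 3 and warp one point.
    p = [0.01; 0.002; -0.003; 0.02; 5; -3];   % example warp parameters p1..p6
    Wp = [1+p(1)  p(3)    p(5);
          p(2)    1+p(4)  p(6);
          0       0       1   ];

    x  = [120; 85; 1];        % pixel (u, v) in homogeneous coordinates
    xp = Wp * x;              % warped point [u'; v'; 1], Equation 2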

Table 1: Summary of Variables

Symbol     Size      Description
u          1 × 1     Image horizontal coordinate
v          1 × 1     Image vertical coordinate
x          2 × 1     Pixel coordinates (u, v) (or a single unrolled pixel index)
I          m × 1     Image unrolled into a vector (m pixels)
T          m × 1     Template unrolled into a vector (m pixels)
W(p)       3 × 3     Affine warp matrix
p          6 × 1     Parameters of the affine warp
∂I/∂u      m × 1     Partial derivative of the image w.r.t. u
∂I/∂v      m × 1     Partial derivative of the image w.r.t. v
∂T/∂u      m × 1     Partial derivative of the template w.r.t. u
∂T/∂v      m × 1     Partial derivative of the template w.r.t. v
∇I         m × 2     Image gradient
∇T         m × 2     Template gradient
∂W/∂p      2 × 6     Jacobian of the affine warp w.r.t. its parameters
J          m × 6     Jacobian of the error function L w.r.t. p
H          6 × 6     Pseudo-Hessian of L w.r.t. p

Lucas-Kanade: Forward Additive Alignment

A Lucas-Kanade tracker maintains a warp W(x;p) which aligns a sequence of images It to a template T. We denote pixel locations by x, so I(x) is the pixel value at location x in image I. For the purposes of this derivation, I and T are treated as column vectors (think of them as unrolled image matrices). W(x;p) is the point obtained by warping x with a transform that has parameters p. W can be any transformation that is continuous in its parameters p. Examples of valid warp classes for W include translations (2 parameters), affine transforms (6 parameters) and full projective transforms (8 parameters). The Lucas-Kanade tracker minimizes the pixel-wise sum of squared differences between the warped image I(W(x;p)) and the template T.

In order to align an image or patch to a reference template, we seek to find the parameter vector p that minimizes L, where:

L = Σ_x [T(x) − I(W(x;p))]²                                                 (4)

In general this is a difficult non-linear optimization, but if we assume we already have a close estimate p of the correct warp, then we can assume that a small linear change ∆p is enough to get the best alignment. This is the forward additive form of the warp. The objective can then be written as:

L = Σ_x [T(x) − I(W(x;p + ∆p))]²                                            (5)

Expanding this to first order with a Taylor series gives us:

L ≈ Σ_x [T(x) − I(W(x;p)) − ∇I (∂W/∂p) ∆p]²                                 (6)

Here ∇I = [∂I/∂u  ∂I/∂v], which is the vector containing the horizontal and vertical image gradients at pixel location x. Rearranging the Taylor expansion, it can be rewritten as a typical least squares problem ∆p = argmin_∆p ||A∆p − b||²:

∆p = argmin_∆p Σ_x ||∇I (∂W/∂p) ∆p − [T(x) − I(W(x;p))]||²                  (7)

This can be solved with ∆p = (AᵀA)⁻¹Aᵀb, where:

AᵀA = H = Σ_x [∇I (∂W/∂p)]ᵀ [∇I (∂W/∂p)]                                    (8)

A = ∇I (∂W/∂p)     (one row per pixel x)                                    (9)

b = T(x) − I(W(x;p))     (one entry per pixel x)                            (10)

Once ∆p is computed, the best estimate of the warp can be updated as p ← p + ∆p, and the whole procedure can be repeated, stopping when ||∆p|| falls below some threshold.
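To make the procedure concrete, the following MATLAB sketch runs the forward additive update for the simplest warp, a pure translation p = [u v]ᵀ (for which ∂W/∂p is the identity). The frame file names, rectangle, iteration limit and threshold are illustrative assumptions, not the required implementation.

    % Forward additive Lucas-Kanade for a pure translation warp (sketch).
    It  = double(imread('frame1.jpg'));     % previous frame (assumed greyscale)
    It1 = double(imread('frame2.jpg'));     % current frame
    rect = [60 40 80 50];                   % [x y w h] template box in It (example)

    [X, Y] = meshgrid(rect(1):rect(1)+rect(3), rect(2):rect(2)+rect(4));
    T = interp2(It, X, Y, 'linear', 0);     % template pixels T(x)
    [Ix, Iy] = gradient(It1);               % image gradients dI/du, dI/dv

    p = [0; 0];                             % translation estimate [u; v]
    for iter = 1:50
        Iw  = interp2(It1, X + p(1), Y + p(2), 'linear', 0);   % I(W(x;p))
        Ixw = interp2(Ix,  X + p(1), Y + p(2), 'linear', 0);
        Iyw = interp2(Iy,  X + p(1), Y + p(2), 'linear', 0);

        A  = [Ixw(:) Iyw(:)];               % dW/dp is the identity for translation
        b  = T(:) - Iw(:);                  % error T(x) - I(W(x;p))
        dp = (A' * A) \ (A' * b);           % least squares solve, Equations 8-10
        p  = p + dp;                        % forward additive update p <- p + dp
        if norm(dp) < 1e-3, break; end      % stop when the update is small
    end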

Matthews-Baker: Inverse Compositional Alignment

While Lucas-Kanade alignment works very well, it is computationally expensive. The inverse compositional method is similar, but requires less computation, as the Hessian and Jacobian only need to be computed once. One caveat is that the warp needs to be invertible. Since affine warps are invertible, we can use this method.

In the previous section, we combined two warps by simply adding one parameter vector to another, producing a new warp W(x; p + p′). Another way of combining warps is through composition. After applying a warp W(x;p) to an image, another warp W(x;q) can be applied to the warped image. The resultant (combined) warp is

W(x;q) ◦ W(x;p) = W(W(x;p),q)                                            (11)

Since affine warps can be implemented as matrix multiplications, composing two affine warps reduces to multiplying their corresponding matrices

W(x;q) ◦ W(x;p) = W(W(x;p),q) = W(W(p)x,q) = W(q)W(p)x                   (12)

An affine transform can also be inverted. The inverse warp of W(p) is simply the matrix inverse of W(p), W(p)⁻¹. In this assignment it will sometimes be simpler to consider an affine warp as a set of 6 parameters in a vector p, and it will sometimes be easier to work with the matrix version W(p). Fortunately, switching between these two forms is easy (Equation 3).
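Because both operations are plain matrix algebra in the 3 × 3 form, composition (Equation 12) and inversion take one line each in MATLAB; the matrices below are arbitrary example values.

    % Composing and inverting affine warps via their 3x3 matrices.
    Wp = [1.02  0.01   5;       % W(p), example values
         -0.01  0.98  -3;
          0     0      1];
    Wq = [1.00  0.00   2;       % W(q), example values
          0.00  1.00   1;
          0     0      1];

    Wcomp = Wq * Wp;            % W(x;q) o W(x;p) = W(q) W(p), Equation 12
    Winv  = inv(Wp);            % inverse warp of W(p)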

The minimization is performed iteratively by making a small change ∆p to p at each iteration. It is computationally more efficient to find the ∆p that aligns the template towards the image and then compose its inverse with the current warp, rather than solving for an update on the image side directly. This is because the image changes with each frame of the video, while the template is fixed at initialization. We will soon see that this choice allows us to write the Hessian and Jacobian in terms of the template, so they can be computed once at the beginning of tracking. Hence at each step, we want to find the ∆p that minimizes

L = Σ_x [T(W(x;∆p)) − I(W(x;p))]²                                           (13)

For tracking a patch template, the summation is performed only over the pixels lying inside the template region. We can expand T(W(x;∆p)) in terms of its first order linear approximation to get

L ≈ Σ_x [T(x) + ∇T (∂W/∂p) ∆p − I(W(x;p))]²                                 (14)

where ∇T(x) = [∂T/∂u(x)  ∂T/∂v(x)]. To minimize, we take the derivative of L with respect to ∆p and set it to zero:

∂L/∂∆p = 2 Σ_x [∇T (∂W/∂p)]ᵀ [T(x) + ∇T (∂W/∂p) ∆p − I(W(x;p))]             (15)

Setting to zero, switching from summation to vector notation and solving for ∆p we get

∆p = H⁻¹ Jᵀ [I(W(x;p)) − T]                                                 (16)

where J is the Jacobian of T(W(x;∆p)), J = ∇T (∂W/∂p), H is the approximated Hessian

H = JᵀJ, and I(W(x;p)) is the warped image. Note that for a given template, the Jacobian J and Hessian H are independent of p. This means they only need to be computed once and then they can be reused during the entire tracking sequence.

Once ∆p has been solved for, it needs to be inverted and composed with p to get the new warp parameters for the next iteration.

W(x;p) ← W(x;p) ◦ W(x;∆p)⁻¹                                                 (17)

The next iteration solves Equation 16 starting with the new value of p. Possible termination criteria include the norm of ∆p falling below some threshold or running for a fixed number of iterations.
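Concretely, once J and H⁻¹ have been precomputed from the template, each inverse compositional iteration reduces to a handful of matrix operations. Below is a minimal MATLAB sketch of Equations 16 and 17; the variable names (W, Jt, Hinv, err) and the use of the Equation 3 parametrization for W(∆p) are assumptions for illustration.

    % One inverse compositional update (Equations 16 and 17).
    % Assumed given: W    - current 3x3 affine warp estimate
    %                Jt   - m x 6 template Jacobian, Hinv - 6 x 6 inverse Hessian
    %                err  - m x 1 unrolled error I(W(x;p)) - T
    dp = Hinv * (Jt' * err);              % Equation 16

    Wdp = [1+dp(1)  dp(3)   dp(5);        % W(dp), Equation 3 parametrization
           dp(2)    1+dp(4) dp(6);
           0        0       1];
    W = W / Wdp;                          % Equation 17: W(x;p) <- W(x;p) o W(x;dp)^(-1)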

1       Theory Questions                                                         (25 points)

Write your answers to the following questions in your write-up. Each question should only take a couple of lines. In particular, the "proofs" do not require any lengthy calculations. If you are lost in many lines of complicated algebra you are doing something much too complicated (or wrong).

Q1.1: Calculating the Jacobian                                                                                     (15 points)

Assuming the affine warp model defined in Equation 3, derive the expression for the Jacobian matrix J in terms of the warp parameters p = [p1 p2 p3 p4 p5 p6]ᵀ.

Q1.2: Computational complexity                                                                                (10 points)

Find the computational complexity (big O notation) for the initialization step (precomputing J and H⁻¹) and for each runtime iteration (Equation 16) of the inverse compositional method. Express your answers in terms of n, m and p, where n is the number of pixels in the template T, m is the number of pixels in an input image I and p is the number of parameters used to describe the warp W. How does this compare to the run time of the regular Lucas-Kanade method?

2       Lucas-Kanade Tracker                                              (60 points)

For this section, the TA will grade your tracker based on the performance you achieve on the two provided video sequences: (1) data/car1/. The provided script files lk_demo.m and mb_demo.m handle reading in images, marking the template region, making the tracker function calls and displaying the output on screen. The function prototypes provided are guidelines. Please make sure that your code runs with the original scripts and generates the outputs we are looking for (a frame sequence with the bounding box of the target drawn on each frame) so that we can replicate your results.

Note that the only thing the TA will do during grading is change the input data directory and initialize your tracker based on what you mention in your write-up. Please submit one video for each sequence in the results/ directory, with file name car.mp4 for the car sequence. Also, please mention the initialization coordinates of your tracker for both video sequences in your write-up and in your code.

Q2.1: Write a Lucas-Kanade Tracker for a Flow Warp                                           (20 points)

Write the function with the following function signature:

[u,v] = LucasKanade(It, It1, rect)

that computes the optimal local motion from frame It to frame It+1, minimizing the sum of squared differences in Equation 4. Here It is the image frame It, It1 is the image frame It+1, and rect is the 4 × 1 vector that represents a rectangle on the image frame It. The four components of the rectangle are [x, y, w, h], where (x, y) is the top-left corner and (w, h) are the width and height of the bounding box. The rectangle is inclusive, i.e., it includes all four corners. To deal with fractional movement of the template, you will need to interpolate the image using the MATLAB function interp2. You will also need to iterate the estimation until the change in warp parameters (u, v) is below a threshold. Use the forward additive (Lucas-Kanade) method for this question.
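Since rect may have fractional coordinates after a few frames of accumulated motion, the template region has to be resampled rather than indexed directly. One possible way to do this with interp2 is sketched below; the variable names follow the signature above and the extrapolation value is an assumption.

    % Sampling the (possibly fractional) template region with bilinear interpolation,
    % shifted by the current translation estimate (u, v).
    [X, Y] = meshgrid(rect(1):rect(1)+rect(3), rect(2):rect(2)+rect(4));
    patch  = interp2(double(It1), X + u, Y + v, 'linear', 0);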

Q2.2: Initializing the Matthews-Baker Tracker                                                         (10 points)

Write the function initAffineMBTracker() that initializes the inverse compositional tracker by precomputing important matrices needed to track a template patch.

function [affineMBContext] = initAffineMBTracker(img, rect)

The function will input a greyscale image (img) along with a bounding box (rect) (in the format [x y w h]).

The function should output a MATLAB structure affineMBContext that contains the Jacobian J (built from the template gradients and the Jacobian of the affine warp with respect to its 6 parameters) and the inverse of the approximated Hessian matrix (J and H⁻¹ in Equation 16).
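One possible structure for this precomputation is sketched below. It assumes the Equation 3 parametrization, MATLAB's gradient function for the template derivatives, and a context struct with fields J and Hinv; none of these choices are mandated by the assignment.

    function context = initAffineMBTracker(img, rect)
    % Precompute the template Jacobian J (m x 6) and inv(H) (6 x 6) of Equation 16.
    % Sketch only: assumes img is a double greyscale image and rect = [x y w h].
    [X, Y] = meshgrid(rect(1):rect(1)+rect(3), rect(2):rect(2)+rect(4));
    [Gx, Gy] = gradient(img);                 % template gradients dT/du, dT/dv
    Tu = interp2(Gx, X, Y, 'linear', 0);
    Tv = interp2(Gy, X, Y, 'linear', 0);

    u = X(:);  v = Y(:);
    % Each row of J is grad T(x) * dW/dp, with dW/dp = [u 0 v 0 1 0; 0 u 0 v 0 1]
    % for the Equation 3 parametrization.
    J = [Tu(:).*u, Tv(:).*u, Tu(:).*v, Tv(:).*v, Tu(:), Tv(:)];
    H = J' * J;                               % pseudo Hessian

    context.J    = J;
    context.Hinv = inv(H);
    end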

Q2.3: The Main Matthews-Baker Tracker                                                                  (20 points)

Write the function affineMBTracker() that does the actual template tracking:

function [Wout] = affineMBTracker(img, tmp, rect, Win, context)

The function takes as input a greyscale image of the current frame (img), the template image (tmp), the bounding box rect that marks the template region in tmp, the affine warp matrix for the previous frame (Win), and the precomputed J and H⁻¹ matrices (context).

The function should output the 3 × 3 matrix Wout that contains the new affine warp matrix updated so that it aligns the current frame with the template.

You can either use a fixed number of gradient descent iterations or formulate a stopping criterion for the algorithm. You can use the included image warping function to apply affine warps to images.
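A corresponding sketch of the runtime loop is given below; the iteration cap, convergence threshold, interpolation settings and the context field names are assumptions consistent with the initialization sketch above, not the only valid implementation.

    function Wout = affineMBTracker(img, tmp, rect, Win, context)
    % Inverse compositional tracking loop (sketch).
    [X, Y] = meshgrid(rect(1):rect(1)+rect(3), rect(2):rect(2)+rect(4));
    T = interp2(tmp, X, Y, 'linear', 0);      % template pixels, fixed
    T = T(:);

    Wout = Win;                               % start from the previous frame's warp
    for iter = 1:100
        Xw = Wout(1,1)*X + Wout(1,2)*Y + Wout(1,3);   % warp template coords into img
        Yw = Wout(2,1)*X + Wout(2,2)*Y + Wout(2,3);
        Iw = interp2(img, Xw, Yw, 'linear', 0);       % I(W(x;p))

        err = Iw(:) - T;                              % I(W(x;p)) - T
        dp  = context.Hinv * (context.J' * err);      % Equation 16

        Wdp  = [1+dp(1)  dp(3)    dp(5);              % W(dp), Equation 3 form
                dp(2)    1+dp(4)  dp(6);
                0        0        1];
        Wout = Wout / Wdp;                            % Equation 17: compose with W(dp)^(-1)
        if norm(dp) < 1e-3, break; end
    end
    end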

Q2.4: Tracking a Car                                                                                                      (10 points)

Test your trackers on the short car video sequence (data/car1/) by running the wrapper scripts lk_demo.m and mb_demo.m. What sort of templates work well for tracking? At what point does the tracker break down? Why does this happen?

In your write-up: submit your best video of the car being tracked, saved as results/car.mp4.

Figure 2: Tracking in the car image sequences

 

4        Submission Summary

  • Derive the expression for the Jacobian matrix (Q1.1).
  • Find the computational complexity of the inverse compositional method (Q1.2).
  • Write the forward additive tracker (Q2.1, LK tracker).
  • Initialize the inverse compositional tracker (Q2.2, MB tracker).
  • Write the inverse compositional tracker (Q2.3, MB tracker).
  • Run the inverse compositional tracker on the car dataset; discuss which templates it works well with, when the tracker breaks down, and why (Q2.4).
  • Run the inverse compositional tracker on the runway markings dataset.

References

[1] Simon Baker, et al. Lucas-Kanade 20 Years On: A Unifying Framework: Part 1. Technical Report CMU-RI-TR-02-16, Robotics Institute, Carnegie Mellon University, 2002.
[2] Simon Baker, et al. Lucas-Kanade 20 Years On: A Unifying Framework: Part 2. Technical Report CMU-RI-TR-03-35, Robotics Institute, Carnegie Mellon University, 2003.
[3] Jean-Yves Bouguet. Pyramidal Implementation of the Lucas Kanade Feature Tracker: Description of the Algorithm. Intel Corporation, 2001.

Credit

This project is adapted directly from Ioannis Gkioulekas.

