COMS3251 Homework 4: ML Lossless Compression


1        Numerical and Theory Problems

  1. Let Q be an m × n matrix with columns q_i all of unit length and mutually orthogonal to each other.

  • Show that the transformation associated with such a matrix preserves the lengths and angles of all vectors. That is, for all x, y ∈ R^n, we have
    • ‖x‖ = ‖Qx‖, and
    • ∠(x, y) = ∠(Qx, Qy).
  • Show that m ≥ n.

(A quick numerical sanity check of these properties is sketched below.)
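The following is a quick numerical sanity check of these properties (a sketch, not a proof): it builds a matrix Q with orthonormal columns via a QR factorization and verifies length and angle preservation for random vectors.

```python
import numpy as np

# Quick numerical check (not a proof): build Q with orthonormal columns and
# verify that lengths and angles are preserved for random x, y.
rng = np.random.default_rng(0)
m, n = 5, 3                                            # note m >= n
Q, _ = np.linalg.qr(rng.standard_normal((m, n)))       # orthonormal columns

x = rng.standard_normal(n)
y = rng.standard_normal(n)

def angle(u, v):
    """Angle between two vectors, in radians."""
    return np.arccos(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(np.isclose(np.linalg.norm(x), np.linalg.norm(Q @ x)))   # True
print(np.isclose(angle(x, y), angle(Q @ x, Q @ y)))           # True
```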
  2. It is often useful to consider an orthogonal basis[1] for a subspace. One way to orthogonalize a vector v against a set of vectors v1,…,vk is to subtract from v the component that is parallel to the vectors v1,…,vk.

Consider two vectors v1, v2 ∈ R^n (n ≥ 2) that form a basis for a 2-dimensional plane P ⊆ R^n. Using the idea discussed above, we can come up with an orthogonal basis for P as follows:

  • b1 = v1
  • b2 = v2 − α, where α + β = v2, α ∈ span{v1}, and β ⊥ span{v1}.

(That is, α is the component of v2 that is parallel to the direction of v1, and β is the component of v2 that is perpendicular to the direction of v1.)

  • Write down an expression for b2 that only uses the vectors v1 and v2.
  • Show the following facts:
    • b1 · b2 = 0,
    • span{b1,b2} = P.

(These two facts collectively show that {b1, b2} forms an orthogonal basis for the given subspace P.)

Let v1,v2,v3 form a basis for a given subspace W.

  • Inspired by the discussion above, provide a procedure to compute an orthogonal basis b1, b2, b3 for W.
  • Verify that the basis set you provided in part (c) is in fact orthogonal and spans the given subspace W.

(A NumPy sketch of the two-vector construction described above follows this problem.)
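As a companion to the construction above, here is a minimal NumPy sketch of the two-vector case: it forms α as the component of v2 parallel to v1, sets b2 = v2 − α, and checks orthogonality numerically. The vectors are arbitrary examples.

```python
import numpy as np

# Two-vector orthogonalization described above: alpha is the component of v2
# parallel to v1, and b2 = v2 - alpha is what is left over.
rng = np.random.default_rng(1)
n = 4
v1 = rng.standard_normal(n)
v2 = rng.standard_normal(n)

alpha = (v1 @ v2) / (v1 @ v1) * v1      # projection of v2 onto span{v1}
b1 = v1
b2 = v2 - alpha

print(np.isclose(b1 @ b2, 0.0))         # True: b1 and b2 are orthogonal
```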
  3. [Graph Theory Revisited] In HW3 we wanted to establish a connection between graph connectivity and some of the linear algebra primitives, but we hit a snag. So we will try again.

Given a graph G = (V, E) with v := |V| vertices and e := |E| edges, we can derive two important matrices:

  • The v × v adjacency matrix A, where the A_ij entry denotes whether there is an edge between the i-th and the j-th node. That is,

    A_ij = 1 if there exists an edge (v_i, v_j) ∈ E, and A_ij = 0 otherwise.

  • The v × v diagonal degree matrix D, where the D_ii entry denotes the degree of the i-th node. That is, D_ii = the number of edges connected to v_i.
  • Provide examples of three unweighted undirected (simple) graphs, each with six vertices. Make sure that each graph is disconnected and each graph has a different number of connected components from the other graphs. (You can use the same examples as you used in HW3)

For each case, compute the dimension of the null space of the difference matrix (D − A). (A small NumPy check for one example graph is sketched below.)

  • What connection/observation can you make between the dimension of the null space of the matrix (D − A) and the number of connected components of the graph? (You don’t need to prove anything.)
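For concreteness, here is a small NumPy sketch for one hypothetical six-vertex example (a triangle plus a three-vertex path, so two connected components); it builds A and D as defined above and computes the dimension of the null space of D − A via the rank–nullity theorem. The particular edge list is just an illustration, not a required answer.

```python
import numpy as np

# Hypothetical example: a 6-vertex graph with two connected components,
# {0, 1, 2} forming a triangle and {3, 4, 5} forming a path.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5)]

A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1               # adjacency matrix (undirected, simple)

D = np.diag(A.sum(axis=1))              # diagonal degree matrix

# Dimension of the null space of D - A, via rank-nullity.
nullity = 6 - np.linalg.matrix_rank(D - A)
print(nullity)                          # 2, the number of connected components
```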
  4. Let A be an n × n matrix whose rank is 1. Let v := (v1,…,vn)^T ≠ 0 be a basis for col(A).
    • (a) Show that A_ij = v_i w_j for some vector w := (w1,…,wn)^T ≠ 0.
    • (b) If the vector z = (z1,…,zn)^T ≠ 0 satisfies z · w = 0, show that z is an eigenvector (of A) with eigenvalue 0.
    • (c) The trace of a matrix is the sum of its diagonal entries, i.e. tr(A) = A_11 + A_22 + ··· + A_nn. With this definition show the following:

If tr(A) ≠ 0, then tr(A) is an eigenvalue of A. What is the corresponding eigenvector?

    • (d) Two matrices X and Y are similar if there exists an invertible matrix Z such that X = Z Y Z^{-1}. With this definition show the following:

If tr(A) ≠ 0, prove that A is similar to the following n × n matrix:

    [ c   0   ···   0 ]
    [ 0   0   ···   0 ]
    [ ⋮   ⋮    ⋱    ⋮ ]
    [ 0   0   ···   0 ],

where c = tr(A).

    • (e) If tr(A) = 1, show that A is a projection, that is, A^2 = A.

(A quick numerical illustration of parts (b)–(d) is sketched below.)
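As a quick numerical illustration (not a proof) of parts (b)–(d), the sketch below builds an arbitrary rank-1 matrix A = v w^T in NumPy and checks that tr(A) appears among its eigenvalues, with v as a corresponding eigenvector.

```python
import numpy as np

# Arbitrary rank-1 matrix A = v w^T (so A_ij = v_i * w_j).
rng = np.random.default_rng(2)
n = 4
v = rng.standard_normal(n)
w = rng.standard_normal(n)
A = np.outer(v, w)

eigvals = np.linalg.eigvals(A)
print(np.isclose(eigvals, np.trace(A)).any())    # True: tr(A) is an eigenvalue
print(np.allclose(A @ v, np.trace(A) * v))       # True: v is its eigenvector
```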

2        Applications & Programming (Due: Mon Aug 16)

For this question, answer all relevant writing parts in your homework report.

Consider the case where you are given access to some data, but you discover that the data is too bulky for you to carry around. You will eventually want to use this data for some downstream task: if your data is weather statistics, you may want to use it to build a prediction model, or if it’s an image, you may want to include it in a presentation. Unfortunately, you have no information about what this downstream task is; all you know is that you want to compress your data so that it fits in the memory that you have available, and that you have less memory available than the data currently occupies. In this case, it is reasonable to assume you want to attempt the following:

  • To the best of your ability, you want to go for lossless compression. This means that you want to eliminate all the possible redundancies in your data. For example, say that your data is a matrix of n columns in R^d but only m < n of these columns are linearly independent. Then storing the remaining n − m columns in their entirety is a redundancy, and there is room for lossless compression (see the sketch after this list).
  • If lossless compression is not possible, you want to work toward lossy compression that incurs the least amount of data loss. This means that in the pursuit of compression you will lose some information, but you still want to quantify how much information you lost and minimize some notion of this quantity.
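As a toy illustration of the lossless case described in the first bullet, the following sketch builds a small matrix whose columns are linearly dependent and stores it exactly as a basis matrix times a coefficient matrix; the sizes used here are made-up examples.

```python
import numpy as np

# Made-up example: X has 4 columns in R^5 but rank 2, so it can be stored
# losslessly as a 5 x 2 basis B plus a 2 x 4 coefficient matrix C with X = B C.
rng = np.random.default_rng(3)
B = rng.standard_normal((5, 2))          # two linearly independent columns
C = rng.standard_normal((2, 4))          # how each column of X mixes them
X = B @ C                                # 5 x 4 data matrix of rank 2

print(np.linalg.matrix_rank(X))          # 2
print(B.size + C.size, "<", X.size)      # 18 < 20: fewer entries to store
print(np.allclose(B @ C, X))             # True: exact (lossless) reconstruction
```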

We will see how the Singular Value Decomposition can help us achieve either of these outcomes. Let us consider the application of compression to image data. We will say that the matrix M ∈ R^{m×n} represents a grayscale image, with each element of this matrix representing the grayscale value at that pixel location. Assume throughout this question that it takes 1 unit of memory to store one element of a matrix.

  1. How many units of memory does it take to store the matrix M? Your answer should be in terms of m and n.
  2. Assume now that you have access to some algorithm A that provides you with a low-rank approximation of M. Essentially, given M, the algorithm A returns a matrix L and another matrix R such that M ≈ LR^T, where L and R are taller than they are wide. Let us label the number of columns in L and R as k. How many rows must L and R have respectively?
  3. This makes L and R a new representation of our matrix M, because (an approximation of) M can be recovered from them (since M ≈ LR^T). Given your answer in (2), what can you say about how many units of memory it takes to store this new representation of M? Give an inequality relating k, m and n that describes the condition that must be met for L and R to be a more memory-efficient representation of our matrix M.
  4. In parts (2) and (3), you assumed that an algorithm could magically give you the matrices L and R. However, you have already learnt one such algorithm: the Singular Value Decomposition. If the full-rank SVD of M is M = USV^T, then you can simply take L = US and R = V to get the left and right matrices. In the full-rank case, our estimate LR^T exactly equals the original M. How many units of memory are required to store this representation? Is it better or worse than part (1)? (A short NumPy sketch of this factorization is given below.)
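Here is a short NumPy sketch of this full-SVD factorization on an arbitrary example matrix. Note that numpy.linalg.svd returns the singular values as a vector and the right singular vectors as V^T, so L and R have to be reassembled from its outputs.

```python
import numpy as np

# Full-SVD factorization M = U S V^T, repackaged as L = U S and R = V.
rng = np.random.default_rng(4)
m, n = 6, 5
M = rng.standard_normal((m, n))

U, s, Vt = np.linalg.svd(M, full_matrices=False)
L = U * s                                # same as U @ np.diag(s)
R = Vt.T

print(np.allclose(L @ R.T, M))           # True: the full-rank SVD is exact
print(L.size + R.size, "vs", M.size)     # memory for the factors vs. for M
```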

To turn this exact equality into a lower-rank approximation of M, we take only the first k columns of our decomposition matrices. That is, we set

  • L = U_k S_k and R = V_k, where U_k ∈ R^{m×k}, S_k ∈ R^{k×k} and V_k ∈ R^{n×k}, and recover M̃_k = LR^T as an approximation of M.

Why does this make for a good approximation of our original M? And under what notion of approximation is this M̃_k close to M? A common metric for measuring the distance between matrices is the Frobenius norm. The Frobenius norm is the matrix generalization of the typical vector norm that we are familiar with. Then, we can measure the distance between two matrices A and B of the same size using the Frobenius norm of their difference:

    ‖A − B‖_F = sqrt( Σ_{i,j} (A_ij − B_ij)^2 ).

Consider the expansion M = σ_1 u_1 v_1^T + σ_2 u_2 v_2^T + ··· + σ_r u_r v_r^T, where v_1, v_2, …, v_r are the right singular vectors of M and u_1, u_2, …, u_r are the left singular vectors of M, all corresponding to M’s top singular values (arranged in descending order of size) σ_1 ≥ σ_2 ≥ ··· ≥ σ_r. Then

    M̃_k = σ_1 u_1 v_1^T + σ_2 u_2 v_2^T + ··· + σ_k u_k v_k^T,

where k ≤ r.
  5. This notation makes it explicit that M̃_k has rank k. Why?

It can be shown that M̃_k is the best rank-k approximation of M under the Frobenius norm. Intuitively, this can be argued as follows: the rows of M̃_k are the projections of the rows of M onto the subspace spanned by the first k right singular vectors of M. Hence the rows of M̃_k represent some notion of the "best-fit" k-dimensional subspace for the rows of M. Since we are working with the Frobenius norm as the metric, we can carry over our intuition for "best-fit" from vector norms. We will test this notion empirically. (A short numerical sketch of the truncation and its Frobenius error is given below.)
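The following is a small NumPy sketch of the rank-k truncation just described, on an arbitrary example matrix; it also checks that the Frobenius error equals the size of the discarded singular values.

```python
import numpy as np

# Rank-k truncation M~_k = sum_{i<=k} sigma_i u_i v_i^T, and its Frobenius error.
rng = np.random.default_rng(5)
M = rng.standard_normal((8, 6))
U, s, Vt = np.linalg.svd(M, full_matrices=False)

k = 3
M_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

err = np.linalg.norm(M - M_k, ord="fro")
print(err)
# The error equals the norm of the discarded singular values:
print(np.isclose(err, np.sqrt(np.sum(s[k:] ** 2))))   # True
```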

We will now move on to the programming parts.

  6. Download the image provided in the HW release titled camera.png, which is a 226 × 226 full-rank matrix. You should open it using the Image.open function from the Pillow library. This should give you a [226, 226, 3] array, of which you should only take the first channel, yielding a [226, 226] matrix. You can then check (approximately) that this matrix is full rank using the matrix_rank function in the numpy.linalg library. (A minimal end-to-end sketch of parts (a)–(e) appears after this list.)
    • (a) Study the documentation for the numpy.linalg.svd function carefully, and make sure you understand how it returns the singular values and the left and right singular vectors. Using this function, calculate the SVD of M and recover S, U and V. Reorder S, U and V in decreasing order of the size of the singular values (this should be easy in NumPy). Make a plot where the horizontal axis is the index n of the singular value and the vertical axis is the n-th largest singular value in S. Precisely, your horizontal axis should range from 1 to 226, and the corresponding points on the vertical axis should range from the largest to the smallest singular value. Include this plot in your report titled Singular, and retain the matrices S, U and V in your code for further parts.
    • (b) Use the method outlined above to get S_k, U_k and V_k for four different values of k ≤ 226. Compute L and R using these matrices and visualize your corresponding M̃_k. Include all your images in the report, labelled with the corresponding values of k, and reflect on what you observe. Remember that if you use k = 226, computing LR^T should recover the original image exactly (barring numerical issues).
    • (c) For a particular value of k, define I_k as the weight of the first k singular values relative to the total sum of all singular values. That is,

    I_k = (σ_1 + σ_2 + ··· + σ_k) / (σ_1 + σ_2 + ··· + σ_226).

This is a representation of the percentage of "information" about M carried by the first k singular vectors. Make another plot with the same horizontal axis as (a), ranging from k = 1 to k = 226, and with I_k on the vertical axis. Include this plot in your report, titled Information.

    • (d) Finally, make a third plot with the same horizontal axis as (a) and (c), ranging from k = 1 to k = 226, and with the vertical axis showing the memory usage in units of memory, using your formula from part (3). Include this plot in your report, titled Memory.
    • (e) Using your graphs from parts (a), (c) and (d), the visual intuition from part (b), and your inequality from (3), pick a value of k that is suitable for storing the given image under memory constraints. Justify why this would make a good value for k. Include the resulting M̃_k in your report, titled Image of rank k.
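For reference, here is a minimal end-to-end sketch of parts (a)–(e), assuming camera.png from the HW release sits in the working directory. The output file names, the example values of k, and the memory count k(m + n) used for the plot are placeholders for illustration, not the required answers.

```python
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image

# (a) Load the image, keep the first channel, check the rank, and plot the
# singular values. numpy.linalg.svd already returns them in descending order.
img = np.asarray(Image.open("camera.png"))       # expected shape (226, 226, 3)
M = img[:, :, 0].astype(float)                   # first channel only
m, n = M.shape
print(np.linalg.matrix_rank(M))                  # should be (approximately) 226

U, s, Vt = np.linalg.svd(M, full_matrices=False)
ks = np.arange(1, len(s) + 1)
plt.figure()
plt.plot(ks, s)
plt.xlabel("k"); plt.ylabel("k-th largest singular value"); plt.title("Singular")
plt.savefig("singular.png")

# (b) Rank-k reconstructions M~_k = L R^T for a few example values of k.
def rank_k_approx(k):
    L = U[:, :k] * s[:k]                         # U_k S_k
    R = Vt[:k, :].T                              # V_k
    return L @ R.T

for k in (5, 20, 80, 226):                       # example values only
    plt.figure()
    plt.imshow(rank_k_approx(k), cmap="gray")
    plt.title(f"k = {k}")
    plt.savefig(f"rank_{k}.png")

# (c) Information ratio I_k and (d) memory usage of the factors, counted here
# as k * (m + n) units (substitute your own formula from part (3)).
I_k = np.cumsum(s) / np.sum(s)
memory = ks * (m + n)
plt.figure(); plt.plot(ks, I_k); plt.title("Information"); plt.savefig("information.png")
plt.figure(); plt.plot(ks, memory); plt.title("Memory"); plt.savefig("memory.png")

# (e) Pick a suitable k from the plots and the inequality in part (3), then
# save the corresponding rank-k image for the report.
```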


Please make sure you submit code for all parts of the programming to Gradescope as a zip file, with a function for each part clearly labelled. No points will be given unless both the code and the report section for this problem are completed.

[1] An orthogonal basis b1,…,bk is such that the basis vectors are mutually orthogonal. That is, b_i · b_j = 0 for all i ≠ j.
