Name: ELEN4720 Problem 4 Solved
SKU: 57377
Price: 30.00 USD
Availability: InStock

Description

5/5 - (1 vote)

Problem 1 (Markov chains)

In this problem, you will rank 769 college football teams based on the scores of every game in the 2019 season. The data provided in CFB2019scores.csv contains the result of one game on each line in the format

Team A index, Team A points, Team B index, Team B points.

If Team A has more points than Team B, then Team A wins, and vice versa. The index of a team refers to the row of “TeamNames.txt” where that team’s name can be found.

Construct a 769×769 random walk matrix M on the college football teams. First construct the unnormalized matrix M_cwith values initialized to zeros. For one particular game, let i be the index of Team A and j the index of Team B. Then update

Mcii ← Mcii + 1{Team A wins} + pointspointsi+pointsi j ,

points_j

c {Team B wins} + pointsi+pointsj ,

Mcij ← Mcij + 1{Team B wins} + pointspointsi+pointsj j , {Team A wins} + pointspointsi+pointsi j .

After processing all games, let M be the matrix formed by normalizing the rows of M_cso they sum to one. Let w_tbe the 1×769 state vector at step t. Set w₀to the uniform distribution. Therefore, w_tis the marginal distribution on each state after t steps given that the starting state is chosen uniformly at random.

Use w_tto rank the teams by sorting in decreasing value according to this vector. List the top 25 team names (see accompanying file) and their corresponding values in w_tfor t = 10,100,1000,10000.
We saw that w_∞is related to the first eigenvector of M^T. That is, we can find w_∞by getting the first eigenvector and eigenvalue of M^Tand post-processing:

This is because by convention. Also, we observe that λ₁= 1 for this specific matrix. Plot kw_t− w_∞k₁as a function of t for t = 1,…,10000.

Problem 2 (Nonnegative matrix factorization) – 40 points

In this problem you will factorize an N × M matrix X into a rank-K approximation WH, where W is N × K, H is K × M and all values in the matrices are nonnegative. Each value in W and H can be initialized randomly to a positive number, e.g., from a Uniform(1,2) distribution.

The data to be used for this problem consists of 8447 documents from The New York Times. (See below for how to process the data.) The vocabulary size is 3012 words. You will need to use this data to construct the matrix X, where X_ijis the number of times word i appears in document j. Therefore, X is 3012×8447 and most values in X will equal zero.

Implement and run the NMF algorithm on this data using the divergence penalty. Set the rank to 25 and run for 100 iterations. This corresponds to learning 25 topics. Plot the objective as a function of iteration.
After running the algorithm, normalize the columns of W so they sum to one. For each column of W, list the 10 words having the largest weight and show the weight. The ith row of W corresponds to the ith word in the “dictionary” provided with the data. Organize these lists in a 5×5

Comments about Problem 2:

When dividing, you may get 0/0 = NaN. This will cause the algorithm to return all NaN values. You can add a very small number (e.g., 10⁻¹⁶) to the denominator to avoid this.
Each row in nytdata.txt corresponds to a single document. It gives the index of words appearing in that document and the number of times they appear. It uses the format “idx:cnt” with commas separating each unique word in the document. Any index that doesn’t appear in a row has a count of zero for that word. The vocabulary word corresponding to each index is given in the corresponding row of nytvocab.dat.

HW4-s6tyya.zip

ELEN4720 Problem 4 Solved

If Helpful Share:

Description

Related products

ELEN4720 Problem 3 Solved

ELEN4720 Problem 2 Solved

ELEN4720 Problem 1 Solved