[SOLVED] COMP5112 Assignment 2-Pthread

35.00 $

Category:
Click Category Button to View Your Next Assignment | Homework

You will receive the following solution file(s) instantly after successful payment:

zip file icon pthreads_assign2-rodpro.zip (251.7 KB)
Assignment Instructions Updated Recently? Submit Below and we will provide new Solution!
Submit New Instructions
🔒 Securely Powered by:
Secure Checkout
Rate this product

Assignment 2: Pthread Programming

Assignment Description

Graph structural clustering is a common data analysis task to cluster vertices by their edge connec- tions in the graph. SCAN (Structural Clustering Algorithm for Networks) [1] is such an algorithm that clusters vertices based on a structural similarity measure. The algorithm is efficient in both computation and memory.

Figure 1: Illustration of data structure.

In our implementation, a graph G = (V, E) is stored in a CSR structure as shown in Fig. 1: where one array Neighbors stores the neighbors of all vertices, and the other array Offsets stores the offset of each vertex into the Neighbors array.

Please note that: (1) The length of the array Offsets is |V | + 1), with the last element of this array points to the end of Neighbors. (2) Accordingly, the length of Neighbors is the number of edges plus 1 (|E| + 1).

The SCAN algorithm, shown in Algorithm 1, can be divided into two stages: (1) find pivot vertices using the structural similarity measure, (2) expand clusters starting from the pivot vertices in the depth-first order.

1

Assignment 2 COMP5112/MSBD5009, 2021 Spring

1 2

3 4 5

6

7 8 9

10 11 12 13

14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30

Algorithm 1: The SCAN Algorithm

Input : a graph G = (V, E), the structural similarity threshold ε, and the minimal cluster size μ

Output: the total number of clusters in G, and the cluster ID of each vertex main()

read files and initialization

// Stage 1: find pivot vertices each of which has at least μ neighbors whose similarity with the vertex exceeds ε

foreach v ∈ V do
foreach w ∈ Neighbors(v) do

Γ(v) ← N eighbors(v) ∪ {v}, Γ(w) ← N eighbors(w) ∪ {w}

if similarity(v, w) ← √

> ε then

|Γ(v)∩Γ(w)|

|Γ(v)|×|Γ(w)| Neighborsε(v) ← Neighborsε(v) ∪ {w}

end end

if |Neighborsε(v)| > μ then pivots[v] ← true

else end

   // Stage 2:  expand clusters from pivot vertices

foreach v ∈ V do
if pivots[v] and !visited[v] then

visited[v] ← true
cluster_result[v] ← v expansion(v, v)
num_clusters ← num_clusters + 1

else end

output num_clusters and cluster_result expansion(v, label)

foreach w ∈ Neighborsε(v) do
if pivots[w] and !visited[w] then

visited[w] ← true cluster_result[w] ← label expansion(w, label)

end end

Input and output

In this assignment, you will implement a pthread version of the SCAN algorithm. In the assignment folder:

• clustering_sequential.cpp is the sequential version for your reference. You can check the clustering result and compare its running time.

• clustering.h provides some utility funictions.

• main.cpp contains the main function of the pthread version.

• clustering_pthread_skeleton.cpp is the code skeleton for your work. Your task is to complete the following function which returns the array cluster_result:
int *scan(float epsilon, int mu, int num_threads, int num_vs, int num_es, int *nbr_offs, int *nbrs);

2

Assignment 2 COMP5112/MSBD5009, 2021 Spring

• The datasets and results folders contain the datasets and clustering results respectively. Here is the output of executing the algorithm on ./sequential ../../data/test1/1.txt 0.7

3:

Screen output:

Elapsed Time: 0.000016516 s Number of clusters: 2

Result file output (sequential.txt):

2
-1 -1 -1 -1 -1 -1 6 6 -1 -1 -1 -1 -1 13 -1 13 -1 -1

2 in the first line is the number of clusters. The second line 6 13 is the cluster IDs (-1 if not in any cluster) of all vertices in order. Please note that in the parallel setting, we set the cluster ID be the lowest vertex ID of the pivots in the cluster. Make sure your file output follows the same format.

  • pthreads_assign2-rodpro.zip