CS201 Homework 3- Graphs Solved

35.00 $

Category:

Description

5/5 - (1 vote)

(a) i is the most popular vertex in the network with a
degree of 7.
(b) This network has 2 most popular vertices, i and q,
each having a degree of 5.
Figure 1: Two networks are shown with the nodes in each labeled with its degree.
The world we live in is an inter-connected system and many aspects of it can be represented by a graph,
or a network. A good example is the World Wide Web (WWW) which is an inter-connected system of
web-pages in which each vertex represents a page and an edge represents a direct link between two web
pages.
Network Science is the study of the networks that appear in different domains, e.g. in social, technological,
and biological disciplines. It finds application in many fields including sociology, physics, mathematics,
biology, economics, and computer science. The aim is to identify underlying common principles in the
formation of networks and explore why they vary in some cases. This is done through exploration of
networks to study structural patterns and to identify variations.
The field has undergone tremendous growth especially after ground-breaking discoveries of small-world
and scale-free properties in network models. Traditional graph theoretic properties take on added significance
in network science. For example, the degree of a vertex can be interpreted as the popularity of the vertex–in a
social network, a person with a higher value of the degree is a person with a lot of friends. Similarly, clustering
coefficient is used to quantify how closely vertices are connected to each other within a neighborhood. The
degree distribution of a graph describes the structure of the graph, i.e. whether it has a few prominent
vertices or hubs.
In this assignment, we are going to explore (and implement) some of the measures used in network science
and apply them to a few popular datasets.

1 Measures
We look at 5 measures: degree centrality, clustering coefficient, average neighbor degree, Jaccard similarity,
and popular distance.
Degree Centrality Centrality provides a measure of the importance of a vertex or edge. Degree centrality
uses the degree of a vertex, i.e. the number of connections of the vertex, as the measure. To compare
across networks of different sizes, i.e. with different number of vertices, the degree is normalized by the total
number of vertices in the network.
CD(i) = ki
n − 1
where i is the subject vertex, ki
is its degree, and n is the total number of vertices in the network.
Clustering Coefficient Clustering coefficient is used to quantify the connectivity of vertices in a graph.
The local clustering coefficient for an individual vertex, i, can be used to determine whether a graph is a
small-world network and is calculated as
Ci =
Li
ki(ki−1)
2
where ki
is the number of neighbors of i and Li
is the total number of edges between the ki neighbors of
i. The denominator indicates the maximum possible edges among the ki neighbors. If all the neighbors are
connected with each other, Ci will be 1, and if there are no common friends among the neighbors, Ci will
be 0. The average clustering coefficient of a network with n vertices is the average of all the local clustering
coefficients.
C =
1
n
X
i
Ci
Average Neighbor Degree The degree of a vertex describes how famous or popular it is in the network.
Average neighbor degree tells whether the vertex is surrounded by famous vertices. For a vertex, i, it is
calculated as
Ki =
1
|N(i)|
X
j∈N(i)
kj
where N(i) is the set of neighbours of i and kj is the degree of node j.
Jaccard Similarity Jaccard Similarity is used to determine the similarity of two vertices to each other.
Vertices i and j are more similar if they have more common friends. Jaccard similarity between 2 vertices,
i and j, is calculated as:
J(i, j) = |N(i) ∩ N(j)|
|N(i) ∪ N(j)|
=
|N(i) ∩ N(j)|
|N(i)| + |N(j)| − |N(i) ∩ N(j)|
where N(i) and N(j) are the sets of neighbors of i and j respectively.
Popular Distance Another measure to find the importance of a vertex in a network is its distance from
the most popular vertex of the network. The popular distance of a vertex i is the length of the shortest
path from i to the vertex with the maximum degree. If the network contains more than one vertex with
maximum degree then the minimum shortest path is considered. For example, the popular distance of vertex
j in Figure 1a is 2 as that is the length of the shortest path from j to i. In Figure 1b, the popular distance
of j is again 2 as it is at a distance of 2 from each of i and q which are the 2 most popular vertices in the
network.
Page 3 of 6
CS 201 Data Structures II HW 3: Graphs Spring 2020
2 Datasets
For this homework, we use three different datasets: “Zachary’s Karate Club”, “Coauthorships in Network
Science”, and “High Energy Physics – Theory collaboration network”. For the figures below, zoom into the
figure with your PDF viewer to see more detail.

Zachary’s Karate Club is probably the most
presented network at network science conferences!
The network represents social ties among the members of a karate club at a university. Each vertex
represents a member of the club and an edge between two vertices indicates that the corresponding
members are friends. The network is undirected and
unweighted.
Coauthorships in Network Science is a collaboration network of coauthorships between scientists in
the area of network theory and experiments. Each vertex represents an author and an edge between vertex i
and vertex j indicates that the corresponding authors have co-authored a paper. The network is undirected
and unweighted. See Figure 2 for a visualization.
High Energy Physics – Theory collaboration network is a collaboration network of coauthorships
between scientists in the area of High Energy Physics – Physics. The network is constructed from the e-print
arXiv for the papers submitted to High Energy Physics – Theory category. The network is undirected and
weighted where the weight of an edge represents the number of papers co-authored by both authors involved
in the edge. Edge weights are normalized. See Figure 3 for a visualization.
3 Tasks
Your task is to go over the provided files and fill in the required functionality.
networks.py contains network operations that operate on a Graph. All methods currently contain pass
which will have to be replaced by you. The exception is visualize which is implemented. When
this function executes successfully, the visualization is automatically stored in the files Graph.gv and
Graph.gv.pdf. Do not add these to your repository.
graphs.py contains a complete implementation of Edge which represents an undirected edge and the interface of Graph. You have to implement the Graph interface using 3 different classes SetGraph,
AdjacencyMatrix, and AdjacencyList that store a graph respectively as 2 sets, an adjacency matrix,
and an adjacency list. Do not alter the code already provided.
Also go over the files in the datasets folder. Each file represents a network as en edge list. Each line
contains the 2 endpoints of an edge. Each vertex is represented by an integer. For some networks, each line
also contains a weight as a floating point number. Note that the vertex numbers need not begin at 0 and
need not be contiguous. A Graph is instantiated with a str instance representing the content of such a file.
You may use the given networks to test your code.

  • Network-Science-Operations-kwy4r0.zip