Let’s now dig deeper into graph analysis! What are the main kinds of graphs? What are their properties?

Introduction

For what comes next, open a Jupyter Notebook and import the following packages :

import numpy as np
import random
import networkx as nx
from IPython.display import Image
import matplotlib.pyplot as plt

If you have not already installed the networkx package, simply run :

pip install networkx

The following articles will be using the latest version 2.x of networkx. NetworkX is a Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks.

We can analyze a graph at different scales :

  • using the global properties of the network
  • using communities and clusters
  • or by looking at individual nodes

The main descriptive measures we’ll explore will be :

  • the degree distribution
  • the clustering coefficient
  • the “small world” phenomena
  • the centrality of a node
  • the diameter of the graph …

We’ll now cover the most common types of graph models :

Erdos-Rényi model

Definition

In an Erdos-Rényi model, we build a random graph model with \(n\) nodes. The graph is generated by drawing an edge between a pair of nodes \((i,j)\) independently with probability \(p\). We, therefore, have 2 parameters: \(n\) the number of nodes and \(p\)

image

In Python, the networkx package has a built-in function to generate Erdos-Rényi graphs.

# Generate the graph
n = 50
p = 0.2
G_erdos = nx.erdos_renyi_graph(n,p, seed =100)

# Plot the graph
plt.figure(figsize=(12,8))
nx.draw(G_erdos, node_size=10)

You’ll get a result pretty similar to this one :

image

Degree distribution

Let \(p_k\) the probability that a randomly selected node has a degree \(k\). Due to the random way the graphs are built, the distribution of the degrees of the graph is binomial :

\[p_k = {n-1 \choose k} p^k (1-p)^{n-1-k}\]

The distribution of the number of degrees per node should be close to the mean. The probability of high nodes decreases exponentially.

degree_freq = np.array(nx.degree_histogram(G_erdos)).astype('float')

plt.figure(figsize=(12, 8))
plt.stem(degree_freq)
plt.ylabel("Frequence")
plt.xlabel("Degree")
plt.show()

To visualize the distribution, I have increased \(n\) to 200.

image

Descriptive statistics

  • The average degree is given by \(n \times p\). With \(p = 0.2\) and \(n = 200\), we are centered around 40.
  • The degree expectation is given by \((n-1) \times p\)
  • The maximum degree is concentrated around the average
# Get the list of the degrees
degree_sequence_erdos = list(G_erdos.degree())

nb_nodes = n
nb_arr = len(G_erdos.edges())

avg_degree = np.mean(np.array(degree_sequence_erdos)[:,1])
med_degree = np.median(np.array(degree_sequence_erdos)[:,1])

max_degree = max(np.array(degree_sequence_erdos)[:,1])
min_degree = np.min(np.array(degree_sequence_erdos)[:,1])

esp_degree = (n-1)*p

print("Number of nodes : " + str(nb_nodes))
print("Number of edges : " + str(nb_arr))

print("Maximum degree : " + str(max_degree))
print("Minimum degree : " + str(min_degree))

print("Average degree : " + str(avg_degree))
print("Expected degree : " + str(esp_degree))
print("Median degree : " + str(med_degree))

This should give you something similar to :

Number of nodes: 200
Number of edges: 3949
Maximum degree: 56
Minimum degree: 25
Average degree: 39.49
Expected degree: 39.800000000000004
Median degree: 39.5

The average and the expected degrees are close since there is only a small factor between the two.

Barabasi-Albert model

Definition

In a Barabasi-Albert model, we build a random graph model with \(n\) nodes. The graph is generated by the following algorithm :

  • Step 1: With a probability \(p\), move to the second step. Else, move to the third step.
  • Step 2: Connect a new node to existing nodes chosen uniformly at random
  • Step 3: Connect the new node to \(n\) existing nodes with a probability proportional to their degree

Such a graph aims to model preferential attachment, which is often observed in real networks.

In Python, the networkx package has also a built-in function to generate Barabasi-Albert graphs.

# Generate the graph
n = 150
m = 3
G_barabasi = nx.barabasi_albert_graph(n,m)

# Plot the graph
plt.figure(figsize=(12,8))
nx.draw(G_barabasi, node_size=10)

You’ll get a result pretty similar to this one :

image

You can easily notice how some nodes appear to have a much larger degree than others now!

Degree distribution

Let \(p_k\) the probability that a randomly selected node has a degree \(k\). The degree distribution follows a power law :

\[p_k \propto k^{-\alpha}\]

The distribution is now heavy-tailed. There is a large number of nodes that have a small degree, but a significant number of nodes have a high degree.

degree_freq = np.array(nx.degree_histogram(G_barabasi)).astype('float')

plt.figure(figsize=(12, 8))
plt.stem(degree_freq)
plt.ylabel("Frequence")
plt.xlabel("Degree")
plt.show()

image

The distribution is said to be scale-free, in the sense that the average degree is not informative.

Descriptive statistics

  • The average degree is constant if \(\alpha ≤ 2\), else, it diverges
  • The maximum degree is \(O(n^{ \frac {1} { \alpha -1}})\)
# Get the list of the degrees
degree_sequence_erdos = list(G_erdos.degree())

nb_nodes = n
nb_arr = len(G_erdos.edges())

avg_degree = np.mean(np.array(degree_sequence_erdos)[:,1])
med_degree = np.median(np.array(degree_sequence_erdos)[:,1])

max_degree = max(np.array(degree_sequence_erdos)[:,1])
min_degree = np.min(np.array(degree_sequence_erdos)[:,1])

esp_degree = (n-1)*p

print("Number of nodes : " + str(nb_nodes))
print("Number of edges : " + str(nb_arr))

print("Maximum degree : " + str(max_degree))
print("Minimum degree : " + str(min_degree))

print("Average degree : " + str(avg_degree))
print("Expected degree : " + str(esp_degree))
print("Median degree : " + str(med_degree))

This should give you something similar to :

Number of nodes: 200
Number of edges: 3949
Maximum degree: 56
Minimum degree: 25
Average degree: 39.49
Expected degree: 39.800000000000004
Median degree: 39.5

The average and the expected degrees are close since there is only a small factor between the two.

In the next article, we’ll cover the main Graph Algorithms used for fraud detection or in a social network for example.

Conclusion : I hope that this article introduced clearly the basis of graphs analysis and that it does now seem clear to you. Don’t hesitate to drop a comment if you have any question.