Let’s now dig deeper into graph analysis! What are the main kinds of graphs? What are their properties?

## Introduction

For what comes next, open a Jupyter Notebook and import the following packages :

```
import numpy as np
import random
import networkx as nx
from IPython.display import Image
import matplotlib.pyplot as plt
```

If you have not already installed the `networkx`

package, simply run :

```
pip install networkx
```

The following articles will be using the latest version `2.x`

of `networkx`

. NetworkX is a Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks.

We can analyze a graph at different scales :

- using the global properties of the network
- using communities and clusters
- or by looking at individual nodes

The main descriptive measures we’ll explore will be :

- the degree distribution
- the clustering coefficient
- the “small world” phenomena
- the centrality of a node
- the diameter of the graph …

We’ll now cover the most common types of graph models :

## Erdos-Rényi model

### Definition

In an Erdos-Rényi model, we build a *random* graph model with \(n\) nodes. The graph is generated by drawing an edge between a pair of nodes \((i,j)\) independently with probability \(p\). We, therefore, have 2 parameters: \(n\) the number of nodes and \(p\)

In Python, the `networkx`

package has a built-in function to generate Erdos-Rényi graphs.

```
# Generate the graph
n = 50
p = 0.2
G_erdos = nx.erdos_renyi_graph(n,p, seed =100)
# Plot the graph
plt.figure(figsize=(12,8))
nx.draw(G_erdos, node_size=10)
```

You’ll get a result pretty similar to this one :

### Degree distribution

Let \(p_k\) the probability that a randomly selected node has a degree \(k\). Due to the random way the graphs are built, the distribution of the degrees of the graph is binomial :

\[p_k = {n-1 \choose k} p^k (1-p)^{n-1-k}\]The distribution of the number of degrees per node should be close to the mean. The probability of high nodes decreases exponentially.

```
degree_freq = np.array(nx.degree_histogram(G_erdos)).astype('float')
plt.figure(figsize=(12, 8))
plt.stem(degree_freq)
plt.ylabel("Frequence")
plt.xlabel("Degree")
plt.show()
```

To visualize the distribution, I have increased \(n\) to 200.

### Descriptive statistics

- The average degree is given by \(n \times p\). With \(p = 0.2\) and \(n = 200\), we are centered around 40.
- The degree expectation is given by \((n-1) \times p\)
- The maximum degree is concentrated around the average

```
# Get the list of the degrees
degree_sequence_erdos = list(G_erdos.degree())
nb_nodes = n
nb_arr = len(G_erdos.edges())
avg_degree = np.mean(np.array(degree_sequence_erdos)[:,1])
med_degree = np.median(np.array(degree_sequence_erdos)[:,1])
max_degree = max(np.array(degree_sequence_erdos)[:,1])
min_degree = np.min(np.array(degree_sequence_erdos)[:,1])
esp_degree = (n-1)*p
print("Number of nodes : " + str(nb_nodes))
print("Number of edges : " + str(nb_arr))
print("Maximum degree : " + str(max_degree))
print("Minimum degree : " + str(min_degree))
print("Average degree : " + str(avg_degree))
print("Expected degree : " + str(esp_degree))
print("Median degree : " + str(med_degree))
```

This should give you something similar to :

```
Number of nodes: 200
Number of edges: 3949
Maximum degree: 56
Minimum degree: 25
Average degree: 39.49
Expected degree: 39.800000000000004
Median degree: 39.5
```

The average and the expected degrees are close since there is only a small factor between the two.

## Barabasi-Albert model

### Definition

In a Barabasi-Albert model, we build a *random* graph model with \(n\) nodes. The graph is generated by the following algorithm :

- Step 1: With a probability \(p\), move to the second step. Else, move to the third step.
- Step 2: Connect a new node to existing nodes chosen uniformly at random
- Step 3: Connect the new node to \(n\) existing nodes with a probability proportional to their degree

Such a graph aims to model *preferential attachment*, which is often observed in real networks.

In Python, the `networkx`

package has also a built-in function to generate Barabasi-Albert graphs.

```
# Generate the graph
n = 150
m = 3
G_barabasi = nx.barabasi_albert_graph(n,m)
# Plot the graph
plt.figure(figsize=(12,8))
nx.draw(G_barabasi, node_size=10)
```

You’ll get a result pretty similar to this one :

You can easily notice how some nodes appear to have a much larger degree than others now!

### Degree distribution

Let \(p_k\) the probability that a randomly selected node has a degree \(k\). The degree distribution follows a power law :

\[p_k \propto k^{-\alpha}\]The distribution is now heavy-tailed. There is a large number of nodes that have a small degree, but a significant number of nodes have a high degree.

```
degree_freq = np.array(nx.degree_histogram(G_barabasi)).astype('float')
plt.figure(figsize=(12, 8))
plt.stem(degree_freq)
plt.ylabel("Frequence")
plt.xlabel("Degree")
plt.show()
```

The distribution is said to be scale-free, in the sense that the average degree is not informative.

### Descriptive statistics

- The average degree is constant if \(\alpha ≤ 2\), else, it diverges
- The maximum degree is \(O(n^{ \frac {1} { \alpha -1}})\)

```
# Get the list of the degrees
degree_sequence_erdos = list(G_erdos.degree())
nb_nodes = n
nb_arr = len(G_erdos.edges())
avg_degree = np.mean(np.array(degree_sequence_erdos)[:,1])
med_degree = np.median(np.array(degree_sequence_erdos)[:,1])
max_degree = max(np.array(degree_sequence_erdos)[:,1])
min_degree = np.min(np.array(degree_sequence_erdos)[:,1])
esp_degree = (n-1)*p
print("Number of nodes : " + str(nb_nodes))
print("Number of edges : " + str(nb_arr))
print("Maximum degree : " + str(max_degree))
print("Minimum degree : " + str(min_degree))
print("Average degree : " + str(avg_degree))
print("Expected degree : " + str(esp_degree))
print("Median degree : " + str(med_degree))
```

This should give you something similar to :

```
Number of nodes: 200
Number of edges: 3949
Maximum degree: 56
Minimum degree: 25
Average degree: 39.49
Expected degree: 39.800000000000004
Median degree: 39.5
```

The average and the expected degrees are close since there is only a small factor between the two.

In the next article, we’ll cover the main Graph Algorithms used for fraud detection or in a social network for example.

Conclusion: I hope that this article introduced clearly the basis of graphs analysis and that it does now seem clear to you. Don’t hesitate to drop a comment if you have any question.