Daily foraging and seasonal migration between forest patches¶

Complex Network project_2021_M2 Complex Systems

Context

According to the Food and Agriculture Organisation (FAO), forests are characterized by a minimal area of 50 ares (5000 m²), the presence of trees higher than 5 meters, a density of tree higher than 10% and a mean width superior to 20 meters.

France's forests host numerous animals, including large mammals such as deers. Those usually live in herds in 'seasonally ephemeral' home range, meaning they establish a territory during the breeding season then move.

The population home range can span across several patches of nearby forests. It is explored daily or weekly for food and shelter opportunities, during foraging explorations Hjeljord (2001). Foraging movements can extend the home range from 2 to up to 24 km².

During their lives, deers also migrate under different constraints: to find mating partners and feeding resources, to avoid inbreeding or predators such as wolves, and under environmental constraints such as flood or fire. The deer dispersal distance range from tens to hundred kilometers. For example, GPS-tracking of collared white-tailed deer have evidenced patterns of spring and fall migration in Canada ranging from 6.9 to 87 km. Another study conducted by Nelson on white-tailed deer have shown comparable migration speed among the animals, with little differences in spring and fall. The deer migrated at an average speed of 1.6 km/h, in a non-linear fashion: they migrated during day only, pausing and turning back in unpredictable manner. Deers can migrate between non-connected forests in agricultural landscape, but only settle in forests or mountains.

Deer movements in a canadian study

Project

Deers tend to feed in clearings and agricultural fields, and avoid dense forests. Deers can cause damage to nearby cultures and to forests due to selective grazing. Forest management is thus of prime importance to regulate deer populations.

Here, we are interested in the distribution of forest patches on the french homeland territory and how that distribution could impact deer settlement, foraging movements and migration patterns.

Foraging movements

We assume that deer foraging movements can extend from 1 to 15km in a day. For a deer resting or settled in a given patch of forest, forests patches within that range can thus be explored with a probability related to their distance to the resting forest and the forests area and geometry.

Deers settle in forest patches if they can access sufficient resources. For our model, we assume a linear relationship between resource availability and area of a forest or the area of connected forests in close vicinity (home range).

Under those assumptions, we investigate aspects of the putative foraging patterns: in what patches of forests do deers have a higher probability to settle and meet reproductive partners? (components, sufficient weight/connectivity, …)

Migration

Deers migrate between potential settlement sites during migration events.

Under this model, we try to answer questions related to migration patterns: what migratory paths between far patches are more probable ? (how does the network of forests extend, betweenness of the nodes,…)

Method

To study the putative migration area of the raindeer, we picture French homeland forests as a geographical graph.

We represent forests by nodes at their centroid location. The nodes are weighted by the forest area. Edges represents forests accessible during foraging, i.e. in a arbitrary distance from forests contours, and they are weighted relative to the probability of travel between two forests.

In [1]:
import pandas as pd
import json
import numpy as np
from matplotlib import pyplot as plt
import networkx as nx
import shapely.geometry
import sys

I_Raw data obtention¶

Forest outlines were obtained from the geoportail website.

In [2]:
with open('FOR_PUBL_FR.json', 'r') as j:
     data = json.loads(j.read())

For each public forest on the national territory (a 'feature' in the dictionary), the data contains the coordinates of several points on the forest's edges. This allows us to visualize a forest as a polygon.

In [3]:
print("Polygon of the first forest : ")
polygon = np.asarray(data['features'][0]['geometry']['coordinates'][0])
x= polygon[:,0]
y= polygon[:,1]
plt.scatter(x,y)
plt.show()
Polygon of the first forest : 
In [4]:
print("Polygon of the second forest : ")
polygon = np.asarray(data['features'][1]['geometry']['coordinates'][0])
x= polygon[:,0]
y= polygon[:,1]
plt.scatter(x,y)
plt.show()
Polygon of the second forest : 

Some forests are made of more than one polygon, like the eleventh forest below.

For a deer, it makes no difference if two separated polygons belong to the same forest or not. Therefore, we treat the different polygons of a same forest as different forests. This explains the 'for' statement when we construct the graph.

In [19]:
polygons = np.asarray(data['features'][11]['geometry']['coordinates'])
polygons = polygons[:,0]

for i in range(len(polygons)):
    polygon = np.asarray(polygons[i], dtype = object)
    x= polygon[:,0]
    y= polygon[:,1]
    plt.scatter(x,y)

plt.show()

II_Graph construction¶

We choose to represent forests' location by their centroid. Centroids are computed from their discrete contour data using the shapely.geometry method for polygon.

Each forest's polygon is constructed using the coordinates of their contour.

In [6]:
centroid = [] #Will contain the positions of the centroids of the forests
areas = [] #Will contain the areas of the forests
for i in range(len(data['features'])):
    if len(data['features'][i]['geometry']['coordinates'][0][0]) > 2 : #When the forest has multiple polygons
        for j in range(len(data['features'][i]['geometry']['coordinates'][0])):
            shapely_pol = shapely.geometry.Polygon(data['features'][i]['geometry']['coordinates'][0][j])
            centroid.append(list(shapely_pol.centroid.coords)[0])
            areas.append(shapely_pol.area) #We add positions and areas of each polygon independently
    else : #If there is only one polygon
        shapely_pol = shapely.geometry.Polygon(data['features'][i]['geometry']['coordinates'][0])
        centroid.append(list(shapely_pol.centroid.coords)[0])
        areas.append(shapely_pol.area)

centroid = np.asarray(centroid)
areas = np.asarray(areas)

Centroids representation¶

In [7]:
fig, ax = plt.subplots(figsize = (20, 20))
for val in centroid :
     ax.add_patch(plt.Circle(val, 10**(-3), facecolor = 'r', edgecolor='black'))
ax.axes.get_xaxis().set_visible(False) 
ax.axes.get_yaxis().set_visible(False) 
ax.axes.set_xlim([min(np.transpose(centroid)[0]) - 1,
               max(np.transpose(centroid)[0]) + 1])
ax.axes.set_ylim([min(np.transpose(centroid)[1]) - 1,
               max(np.transpose(centroid)[1]) + 1])

plt.show()

The size of the dots here are only for representation.

We check the global distribution of the forests centroids compared to the geoportail map.

Public forests are more densely distributed in the East and South of France: our analysis will be biased due to the lack of data on private forests, which represents 74% of French forests.

Weighted nodes construction¶

To take forests' area into account, we weight the nodes according to each polygon's area.

In [22]:
G = nx.Graph()
count = 0
for (val, a) in zip(centroid, areas) :
    G.add_node(count, lon = val[0], lat = val[1], area = a)
    count += 1       

Edges construction¶

We link together nodes representing patches of forest a deer can explore during daily foraging.

Forests have various geometry that the centroid position does not represent. Two adjacent forests can share a border of various length: deers would have a higher probability of moving between two forest if they have a long border than if they only have a small corridor to get from one to the other. Moreover, even if two forest are serparated by a distance that a deer can travel in a day, deers would have less chance of getting from one to the other if they are further appart than if they are very close. To take into account the fact that deers have different probabilities of getting from one forest to another, we use weighted edges.

Weighting the edges with the distances between the centroids would erase the importance of the shape of the forest, the fact that the interfaces at short distances can be more or less important. To avoid that bias, we use the shapely buffer method to compute the distance between polygons.

Illustration of the shapely buffer method¶

The buffer method is a way to extend the polygon while conserving the shape. The method takes all the points within a certain distance (the buffer distance) of the polygon. Bellow are the results of the buffer method on the first forest polygon for different values of d the buffer distance. You can see the way the buffer modify the polygon by running the script.

In [7]:
poly = shapely.geometry.Polygon(data['features'][0]['geometry']['coordinates'][0]) # polygon
poly
Out[7]:
In [18]:
poly.buffer(0.0005) # extended polygon for d=0.0005
Out[18]:
In [11]:
poly.buffer(0.001) # extended polygon for d=0.001
Out[11]:
In [12]:
poly.buffer(0.01) # extended polygon for d=0.01
Out[12]:

We can see that for the higher values of d, the initial shape loses its relevance and every shape becomes a disc. This is expected because if the buffer distance is very big compared to the size of a polygon we can ignore the initial shape of the polygon.

There is other ways to buffer a polygon with shapely, that give something else than a disc in the end, but that choice seems to be relevant with what we're trying to achieve.

Computing the buffed polygons¶

At large distances compared to the forests dimensions, using the shapely buffer method is similar to taking the euclidian distance between forests centroids and taking into account the area of each polygon.

We compute the extended polygons with the buffer method.

Since we are working with latitude and longitude in France, we can consider that $lat \approx 110km$ and $lon \approx 80km$. But we will consider that $lat = lon \approx 110km$ in order to simplify the graph construction (see last part to look at such considerations). That way, for $d = 0.01$, we have a buffer of 1.1km, but since we're putting buffer on every polygon, it removes 2.2km between two polygons.

In [16]:
d = 0.01 # the buffer distance = daily foraging distance/2
all_buff_poly = [] #Will contain all the extended polygons
for i in range(len(data['features'])):
    if len(data['features'][i]['geometry']['coordinates'][0][0]) > 2 : #Forest with more than one polygon
        for j in range(len(data['features'][i]['geometry']['coordinates'][0])):
            shapely_pol = shapely.geometry.Polygon(data['features'][i]['geometry']['coordinates'][0][j])
            buff_pol = shapely_pol.buffer(d)
            all_buff_poly.append(buff_pol)
    else : #Forests with only one polygon
        shapely_pol = shapely.geometry.Polygon(data['features'][i]['geometry']['coordinates'][0])
        buff_pol = shapely_pol.buffer(d)
        all_buff_poly.append(buff_pol)

Weighting edges¶

Now that we have our extended polygons, we want to calculate the intersection between them, add an edge if this intersection is not null, and give to the edge a weight corresponding to the area of the intersection.

It takes too much time to check the intersection between every pair of polygon so we define an arbitrary length L (with L>>d) and if the distance between the centroids of two polygons is greater than L, we don't compute the intersection between them. This reduces the time necessary to construct the graph.

In [19]:
L = 1 # arbitrary diameter of the foraging area, in degree
for i in range(len(centroid) - 1):
    sys.stdout.write("\r{0}%".format(round((i + 1)/(len(centroid) - 1)*100, 2)))
    sys.stdout.flush() #This is to visualize how much of the graph has been treated already
    for j in range(i + 1, len(centroid)):
        if np.linalg.norm(centroid[i] - centroid[j]) < L: #We only consider the polygons that are close enough to the polygon i
            pol1 = all_buff_poly[i]
            pol2 = all_buff_poly[j]
            area = pol1.intersection(pol2).area
            if area > 0: #If there is an intersection, we put an edge, weighted by the area of intersection between the buffered polygons
                G.add_edge(i, j, intersection = area)
100.0%
In [20]:
nx.write_graphml(G, 'graph_d{}_L{}.graphml'.format(d, L)) #Store the graph and the characteristics

Representation of the graph¶

In order to avoid the time necessary to create the whole graph, we can simply read it from a .graphml file. We will illustrate our results with two graph : one with $d = 0.01$ small (2.2km), and one where $d = 0.05$ large (11km).

In [3]:
G_small = nx.read_graphml('graph_d0.01_L1.graphml')
In [4]:
plt.figure(figsize = (20, 20))
coordinates = {n:(G_small.nodes[n]["lon"], G_small.nodes[n]["lat"]) for n in G_small.nodes}
nx.draw_networkx(G_small, pos = coordinates, with_labels = False, node_size = 1, width = 1)
In [5]:
print("Number of nodes = {}, Number of edges = {}".format(len(G_small.nodes()), len(G_small.edges())))
print("Number of connected components = ", len(list(nx.connected_components(G_small))))
Number of nodes = 18932, Number of edges = 23809
Number of connected components =  7207
In [6]:
G_large = nx.read_graphml('graph_d0.05_L1.graphml')
In [7]:
plt.figure(figsize = (20, 20))
coordinates = {n:(G_large.nodes[n]["lon"], G_large.nodes[n]["lat"]) for n in G_large.nodes}
nx.draw_networkx(G_large, pos = coordinates, with_labels = False, node_size = 1, width = 1)
In [8]:
print("Number of nodes = {}, Number of edges = {}".format(len(G_large.nodes()), len(G_large.edges())))
print("Number of connected components = ", len(list(nx.connected_components(G_large))))
Number of nodes = 18932, Number of edges = 259544
Number of connected components =  398

We can see that there is a critical value of d, similar to a percolation threshold, where most of the graph is suddenly connected in a giant component with the East and South of France. Another way to put it is that the number of edges is not of the same order of magnitude that the number of nodes anymore.

III_Graph analysis¶

Degree¶

The area here corresponds to a node, not to a forest, since a forest can be composed of different nodes.

In [10]:
list_areas = sorted(nx.get_node_attributes(G_small,'area').items(), key = lambda x: x[1])

print("List of 5 highest areas :", list_areas[-5:])
print("List of 5 smallest areas :", list_areas[:5])
List of 5 highest areas : [('12090', 0.008164664789284326), ('7361', 0.008448092891491437), ('7187', 0.01003791573598896), ('3670', 0.010979624359549748), ('7572', 0.011786669568507215)]
List of 5 smallest areas : [('2537', 1.3726108977721857e-16), ('8831', 2.2776345303746083e-14), ('14893', 2.946839434124618e-13), ('757', 4.2343020157101895e-13), ('14362', 4.3484514220810675e-13)]

We can see that almost every node have a very small area, but some of them are several order of magnitude larger, with a heavy tail in our distribution.

In [28]:
plt.figure()
plt.hist(np.array(list_areas, dtype = float)[:,1], bins = 100)
plt.show()

Since nodes are not equivalent, the degree can be somewhat misleading, depending on how polygons are created. A large cluster of small nodes can be equivalent in area of a smaller cluster of larger nodes.

In [30]:
degree_list_small = sorted(G_small.degree, key = lambda x: x[1])
print("List of 5 highest degrees for small d:", degree_list_small[-5:])

degree_list_large = sorted(G_large.degree, key = lambda x: x[1])
print("List of 5 highest degrees for large d:", degree_list_large[-5:])
List of 5 highest degrees: [('9898', 27), ('11630', 27), ('12249', 28), ('14495', 29), ('9332', 34)]
List of 5 highest degrees: [('817', 104), ('18090', 104), ('624', 105), ('18019', 105), ('18096', 105)]

We can also see that the most connected nodes are not the same between the two graph, meaning that the growth of the graph depending on the variation of d is not linear.

In [36]:
plt.figure()
plt.hist(np.array(degree_list_large, dtype = int)[:,1], bins = 100, label = 'large d')
plt.hist(np.array(degree_list_small, dtype = int)[:,1], bins = 100, label = 'small d')
plt.legend()
plt.show()

When d increases, the distribution of degree flattens.

Pagerank¶

The pagerank being a random walk in the graph where we consider everything to be at least very slightly connected, we can see these nodes as "hotspot" where animals are most likely to travel to in the long run. We should not include Corsica in order to get a better result (because animals can't take the ferry), but this result is still interesting to look at.

In [47]:
pagerank_small = nx.algorithms.link_analysis.pagerank_alg.pagerank(G_small, weight = 'area')
pr_list_small = sorted(pagerank_small.items(),
                 key = lambda x: x[1])
print("List of 5 highest pagerank for small d:", pr_list_small[-5:])

pagerank_large = nx.algorithms.link_analysis.pagerank_alg.pagerank(G_large, weight = 'area')
pr_list_large = sorted(pagerank_large.items(),
                 key = lambda x: x[1])
print("List of 5 highest pagerank for large d:", pr_list_large[-5:])
List of 5 highest pagerank: [('9332', 0.00021953185700843112), ('12316', 0.00024068099364861584), ('2833', 0.0002522213854431817), ('9261', 0.00028801024010819293), ('3670', 0.0003155251775955247)]
List of 5 highest pagerank: [('11625', 0.00010736520208220798), ('7304', 0.00011114917959170071), ('14228', 0.00011274537635468298), ('2833', 0.00011349730714741698), ('3670', 0.00014834055883530622)]

Pagerank values decrease with a larger value of d, which seems logical for a graph based on the geographical distances between nodes.

Components¶

Here we define "biggest" as the connected component with the highest number of nodes, and "largest" as the connected component with the highest area.

We can see that for small value of d, these values are different, since nodes with large area can outweight a cluster of small nodes. But after the critical point, the largest connected component is also the biggest.

In [43]:
biggest_cc_small = G_small.subgraph(sorted(nx.connected_components(G_small), key=len, reverse=True)[0])
print("Biggest connected component :", len(biggest_cc_small.nodes()))

biggest_cc_area_small = sum(list(nx.get_node_attributes(biggest_cc_small,'area').values()))
print("Area of the biggest connected component :", biggest_cc_area_small)

largest_cc_area_small = 0
for i in range(len(list(nx.connected_components(G_small)))):
    sys.stdout.write("\r{0}%".format(round((i + 1)/(len(list(nx.connected_components(G_small))))*100, 2)))
    sys.stdout.flush()
    test_g = G_small.subgraph(list(nx.connected_components(G_small))[i])
    if sum(list(nx.get_node_attributes(test_g,'area').values())) > largest_cc_area_small:
        largest_cc_area_small = sum(list(nx.get_node_attributes(test_g,'area').values()))
        largest_cc_small = test_g

print("\n Largest connected component :", len(largest_cc_small.nodes()))
print("Area of the largest connected component :", largest_cc_area_small) 
Biggest connected component : 413
Area of the biggest connected component : 0.009959920536711445
100.0%
 Largest connected component : 51
Area of the largest connected component : 0.021455487619196696

We can see them in our graph (red in the Center for biggest, yellow in the North-East for largest) :

In [59]:
plt.figure(figsize = (20, 20))
coordinates = {n:(G_small.nodes[n]["lon"], G_small.nodes[n]["lat"]) for n in G_small.nodes}
nx.draw_networkx(G_small, pos = coordinates, with_labels = False, node_size = 1, width = 1)

coordinates = {n:(biggest_cc_small.nodes[n]["lon"], biggest_cc_small.nodes[n]["lat"]) for n in biggest_cc_small.nodes}
nx.draw_networkx(biggest_cc_small, pos = coordinates, with_labels = False, node_size = 1, width = 1, node_color = 'r')

coordinates = {n:(largest_cc_small.nodes[n]["lon"], largest_cc_small.nodes[n]["lat"]) for n in largest_cc_small.nodes}
nx.draw_networkx(largest_cc_small, pos = coordinates, with_labels = False, node_size = 1, width = 1, node_color = 'y')
In [46]:
biggest_cc_large = G_large.subgraph(sorted(nx.connected_components(G_large), key=len, reverse=True)[0])
print("Biggest connected component :", len(biggest_cc_large.nodes()))

biggest_cc_area_large = sum(list(nx.get_node_attributes(biggest_cc_large,'area').values()))
print("Area of the biggest connected component :", biggest_cc_area_large)

largest_cc_area_large = 0
for i in range(len(list(nx.connected_components(G_large)))):
    sys.stdout.write("\r{0}%".format(round((i + 1)/(len(list(nx.connected_components(G_large))))*100, 2)))
    sys.stdout.flush()
    test_g = G_large.subgraph(list(nx.connected_components(G_large))[i])
    if sum(list(nx.get_node_attributes(test_g,'area').values())) > largest_cc_area_large:
        largest_cc_area_large = sum(list(nx.get_node_attributes(test_g,'area').values()))
        largest_cc_large = test_g

print("\n Largest connected component :", len(largest_cc_large.nodes()))
print("Area of the largest connected component :", largest_cc_area_large) 
Biggest connected component : 16830
Area of the biggest connected component : 0.919316217408412
100.0%
 Largest connected component : 16830
Area of the largest connected component : 0.919316217408412

This corresponds to the component connecting the East and the South of France (in red, corresponding to both the biggest and largest connected component), covering roughly $1 \times 10^6 ha$. In reference, based on this Forestry inventory there is in France $17 \times 10^6 ha$ of forest, so roughly $4.25 \times 10^6 ha$ of public forest.

In [60]:
plt.figure(figsize = (20, 20))
coordinates = {n:(G_large.nodes[n]["lon"], G_large.nodes[n]["lat"]) for n in G_large.nodes}
nx.draw_networkx(G_large, pos = coordinates, with_labels = False, node_size = 1, width = 1)

coordinates = {n:(biggest_cc_large.nodes[n]["lon"], biggest_cc_large.nodes[n]["lat"]) for n in biggest_cc_large.nodes}
nx.draw_networkx(biggest_cc_large, pos = coordinates, with_labels = False, node_size = 1, width = 1, node_color = 'r')

IV_Further analysis¶

Here are some further analysis we would have liked to make :

  • Take into account the difference between latitude and longitude in order to have coherent distances between nodes. Up until now, our discs made with the buffer method are more "ellipses" if we convert into kilometers, because latitude and longitude are not equal in France. The code below is a first draft that tries to tackle this issue:
In [ ]:
for i in range(len(data['features'])):
    if len(data['features'][i]['geometry']['coordinates'][0][0]) > 2 : #When the forest has multiple polygons
        for j in range(len(data['features'][i]['geometry']['coordinates'][0])):
            for k in range(len(data['features'][i]['geometry']['coordinates'][0][j])):
            data['features'][i]['geometry']['coordinates'][0][j][k][1] *= (long - longmin)*110/80
    else : #If there is only one polygon
        for k in range(len(data['features'][i]['geometry']['coordinates'][0])):
            data['features'][i]['geometry']['coordinates'][0][k][1] *= (long - longmin)*110/80
  • Find the most important nodes corresponding to forests that we should protect, based on area or path of wild animals, which we can find with a weighted betweenness centrality. This analysis was not conducted to its term because of the computing time:
In [ ]:
bc_small = nx.algorithms.centrality.betweenness_centrality(G_small)
bc_list_small = sorted(bc_small.items(),
                key = lambda x: x[1])
print("List of 5 highest betweenness centrality:", bc_list_small[-5:])

bc_large = nx.algorithms.centrality.betweenness_centrality(G_large)
bc_list_large = sorted(bc_large.items(),
                key = lambda x: x[1])
print("List of 5 highest betweenness centrality:", bc_list_large[-5:])
  • Look into diffusion processes that can show the propagation of invasive species or exploration of a population for instance.

  • To go further, we can try to take into account geographical obstacles, such as rivers, mountains, highways or cities that can cut some possible paths.