The data set contains observational and wearable sensor data collected in a group of 20 Guinea baboons living in an enclosure of a Primate Center in France, between June 13th 2019 and July 10th 2019.
These data were analyzed and published in the paper V. Gelardi, J. Godard, D. Paleressompoulle, N. Claidière, A. Barrat, “Measuring social networks in primates: wearable sensors vs. direct observations”, Proc. R. Soc. A 476:20190737 (2020).
The file OBS_data.csv contains all the behavioral events registered by an observer, with 7 columns:
DateTime = Time stamp of the event, namely the moment the observed behavior was registered. In case of STATE events (events with duration > 0), it refers to the beginning of the behavior;
Actor = The name of the actor;
Recipient = The name of the individual the Actor is acting upon;
Behavior = The behavior of the Actor. 16 types of behaviors are registered: 'Resting', 'Grooming', 'Presenting', 'Playing with', 'Grunting-Lipsmacking', 'Supplanting', 'Threatening', 'Submission', 'Touching', 'Avoiding', 'Attacking', 'Carrying', 'Embracing', 'Mounting', 'Copulating', 'Chasing'. In addition, two other categories were included: 'Invisible' and 'Other';
Category = The classification of the behavior. It can be ‘Affiliative’, ‘Agonistic’, ‘Other’;
Duration = Duration of the observed behavior. POINT events have no duration;
Point = Whether the registered event is a POINT event (no duration) or a STATE event.
The file RFID_data.csv contains contact data recorded over the same period by the SocioPatterns infrastructure. The proximity sensors were worn by 13 of the 20 individuals cited above.
The data file consists of 4 columns:
t = time of the beginning of the contact in Epoch format (Unix timestamps);
i = Name of the first individual;
j = Name of the second individual;
DateTime = Time of the beginning of the contact in human-readable format (dd/mm/yyyy hh:mm).
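As a quick preview (the RFID file is analysed further below), here is a minimal loading sketch; it assumes the file is tab-separated like OBS_data.csv and uses the dd/mm/yyyy hh:mm timestamp format, which is how the file is read later in this notebook.
import pandas as pd
# minimal preview sketch of the sensor file (assumed tab-separated, like OBS_data.csv)
rfid_preview = pd.read_csv('RFID_data.csv', sep='\t')
# the human-readable timestamp column can be parsed into datetime objects
rfid_preview['datetime_obj'] = pd.to_datetime(rfid_preview['DateTime'], format='%d/%m/%Y %H:%M')
print(rfid_preview.head())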
import pandas as pd
import re
import numpy as np
import time
data = pd.read_csv('OBS_data.csv', delimiter = '\t')
#the following lines split the 'DateTime' column into two distinct 'Date' and 'Time' columns
datetime_df = pd.DataFrame(data['DateTime'].str.split(' ', expand=True))
datetime_df.columns = ['date', 'time']
data['Date'] = datetime_df['date']
data['Time'] = datetime_df['time']
data = data.drop(columns=['DateTime'])
#the following lines remove a few pathological rows (recipients 'EXTERNE' and 'SELF')
data = data[~data['Recipient'].isin(['EXTERNE', 'SELF'])]
#we extract the sets of baboons and behaviours from the data
Actor = list(data['Actor']) + list(data['Recipient'])
Actor = set(Actor)
Actor.remove(np.nan)
Category = data['Category']
Category = set(Category)
Behavior = data['Behavior']
Behavior = set(Behavior)
#we group the data by date and store them in a dictionary
dates = set(data['Date'])
dates = list(dates)
dates.sort(key=lambda x: time.mktime(time.strptime(x,"%d/%m/%Y"))) #sorting the dates
data_groupby_date = data.groupby('Date')
values = [data_groupby_date.get_group(date) for date in dates]
data_groupby_date = dict(zip(dates, values))
data
Actor | Recipient | Behavior | Category | Duration | Point | Date | Time | |
---|---|---|---|---|---|---|---|---|
0 | EWINE | NaN | Invisible | Other | 34 | NO | 13/06/2019 | 09:35 |
1 | EWINE | NaN | Other | Other | 21 | NO | 13/06/2019 | 09:35 |
2 | EWINE | NaN | Invisible | Other | 42 | NO | 13/06/2019 | 09:35 |
3 | EWINE | NaN | Other | Other | 2 | NO | 13/06/2019 | 09:36 |
4 | EWINE | NaN | Invisible | Other | 30 | NO | 13/06/2019 | 09:36 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
5372 | LIPS | FELIPE | Resting | Affiliative | 21 | NO | 10/07/2019 | 11:05 |
5373 | LIPS | NEKKE | Resting | Affiliative | 21 | NO | 10/07/2019 | 11:05 |
5374 | LIPS | FELIPE | Resting | Affiliative | 8 | NO | 10/07/2019 | 11:05 |
5375 | LIPS | NaN | Other | Other | 28 | NO | 10/07/2019 | 11:05 |
5376 | KALI | NaN | Invisible | Other | 301 | NO | 10/07/2019 | 11:06 |
5373 rows × 8 columns
The set of data we can now deal with is composed of:
dates: the list of observation dates
data_groupby_date: the dictionary of the observations grouped by date
Actor: the set of the baboons
Behavior: the set of behaviors
Category: the set of categories to which the behaviors belong
print('Number of baboons: {}\nNumber of observation days: {}\nNumber of behaviours: {}'.format(len(Actor), len(data_groupby_date), len(Behavior)))
Number of baboons: 20 Number of observation days: 20 Number of behaviours: 18
def interactions(df):
'''returns a dataframe containing all the actions where 'Recipient' is
not NaN. Thus all the interactions between two animals.'''
    # keep only the rows where Actor and Recipient differ and neither is
    # NaN, i.e. actual interactions between two monkeys
mask = (df['Actor'] != df['Recipient'])
return df[mask].dropna(subset=['Recipient', 'Actor'])
def interactions_a_on_b(df, a, b):
"""with df the dataframe of all interactions, returns the dataframe of
all events where there is an oriented interaction a -> b, so a is
the 'Actor', and b the 'Recipient'"""
    mask = (df['Actor'] == a) & (df['Recipient'] == b)
return df[mask]
# this helper is not strictly necessary, but it makes the code easier to read
def interactions_behavior(df, behavior):
    '''returns the rows of df whose 'Behavior' column matches the given behavior'''
return df[df['Behavior'] == behavior]
def interactions_category(df, category):
    '''returns the rows of df whose 'Category' column matches the given category'''
return df[df['Category'] == category]
def interactions_in_timerange(df, t1, t2):
'''extracts all the interactions (between two baboons) in df between
two datetime t1 and t2.'''
    # keep only the rows whose datetime falls within [t1, t2], then drop
    # the rows without a 'Recipient' so that only pairwise interactions remain
mask = (t1 <= df['datetime_obj']) & (df['datetime_obj'] <= t2)
return df[mask].dropna(subset=['Recipient'])
def interactions_to_edges(df):
'''takes the dataframe df: "actor, recipient, weight" with
interactions between monkeys and transform it to a list of tuples
(u, v, w) with u,v the edges, and w the weight.
this function is made to easily feed the networkx method
"add_weighted_edges_from" for a graph'''
# this "edge" iterates through each rows of the numpy array
# "df.to_numpy()", transforming each row into a tuple, and returning
# the list of all the tuples
edge_list = [tuple(edge) for edge in df.to_numpy()]
return edge_list
To create graphs, we use the NetworkX library. Each node of the graph corresponds to a baboon. Two nodes are connected by an edge if there is at least one interaction between the two corresponding baboons, and each edge is weighted by the number of interactions between the two nodes.
import networkx as nx
import matplotlib.pyplot as plt
plt.rcParams["figure.figsize"] = [9, 5]
G = nx.Graph()
G.add_nodes_from(Actor)
nx.draw(G, with_labels = True)
We implement a first function called get_day_interactions which returns the graph of interactions between the baboons over one day. We can choose to select only a given category or type of behaviour. Another function called draw_day_interactions draws the graph returned by get_day_interactions:
def get_day_interactions(data_groupby_date, date, category, behavior):
'''get the network of the daily interaction between pairs of baboons
category = {'Affiliative', 'Agonistic', 'Other'}
behavior = {'Attacking','Avoiding','Carrying','Chasing','Copulating','Embracing',
'Grooming','Grunting-Lipsmacking','Invisible','Mounting','Other','Playing with',
'Presenting','Resting','Submission','Supplanting','Threatening','Touching'}
OR type ALL if you want them all
THE CATEGORY AND THE BEHAVIOUR SHOULD MATCH'''
inter = []
data_date = data_groupby_date[date]
if category in Category:
data_date = interactions_category(data_date, category)
if behavior in Behavior:
data_date = interactions_behavior(data_date, behavior)
data_date = interactions(data_date)
actor = list(data_date['Actor'])
recipient = list(data_date['Recipient'])
l = len(actor)
for i in range(l):
inter.append((actor[i], recipient[i]))
weight = [inter.count(inter[i]) for i in range(len(inter))]
weighted_inter = [(inter[i][0], inter[i][1], weight[i]) for i in range(len(inter))]
weighted_inter = list(set(weighted_inter))
G = nx.Graph()
G.add_nodes_from(Actor)
G.add_weighted_edges_from(weighted_inter)
return G
def draw_day_interactions(data_groupby_date, date, category, behavior):
'''draw the network of one day interaction, based on the function
get_day_interactions.
category = {'Affiliative', 'Agonistic', 'Other'}
behavior = {'Attacking','Avoiding','Carrying','Chasing','Copulating','Embracing',
'Grooming','Grunting-Lipsmacking','Invisible','Mounting','Other','Playing with',
'Presenting','Resting','Submission','Supplanting','Threatening','Touching'}
OR type 'ALL' if you want them all
THE CATEGORY AND THE BEHAVIOUR SHOULD MATCH, refer to the ethnogram'''
G = get_day_interactions(data_groupby_date, date, category, behavior)
pos = nx.circular_layout(G)
edges = G.edges()
weights = [G[u][v]['weight'] for u,v in edges]
    if behavior in Behavior:
        title = '{} ({}) interactions on the {}'.format(behavior, category, date)
    elif category in Category:
        title = '{} interactions on the {}'.format(category, date)
    else:
        title = 'All types of interactions on the {}'.format(date)
plt.title(title)
return nx.draw(G, pos, width = weights, with_labels = True)
Example of a computed graph: graph of all the affiliative interactions between the baboons on the 26/06/2019.
draw_day_interactions(data_groupby_date, dates[9], 'Affiliative', 'All')
Here is another example: the graph of all the agonistic interactions between the baboons on the 26/06/2019, which is sparser than the previous one.
draw_day_interactions(data_groupby_date, dates[9], 'Agonistic', 'All')
The same idea can be applied while considering cumulative interactions over multiple days:
def get_cummulative_day_interactions(data_groupby_date, date1, date2, category, behavior):
    '''get the network of the cumulative interactions between pairs of baboons, between
date1 and date2
category = {'Affiliative', 'Agonistic', 'Other'}
behavior = {'Attacking','Avoiding','Carrying','Chasing','Copulating','Embracing',
'Grooming','Grunting-Lipsmacking','Invisible','Mounting','Other','Playing with',
'Presenting','Resting','Submission','Supplanting','Threatening','Touching'}
OR type ALL if you want them all
THE CATEGORY AND THE BEHAVIOUR SHOULD MATCH'''
    i1, i2 = dates.index(date1), dates.index(date2)
    G = nx.Graph()
    G.add_nodes_from(Actor)
    inter = []
    # accumulate the interactions of every day from date1 (included) to date2 (excluded)
    for i in range(i1, i2):
        data_date = data_groupby_date[dates[i]]
        if category in Category:
            data_date = interactions_category(data_date, category)
        if behavior in Behavior:
            data_date = interactions_behavior(data_date, behavior)
        data_date = interactions(data_date)
        act = list(data_date['Actor'])
        recip = list(data_date['Recipient'])
        for j in range(len(act)):
            inter.append((act[j], recip[j]))
    # weight each pair by its total number of occurrences over the whole period
    weighted_inter = list({(a, r, inter.count((a, r))) for (a, r) in inter})
    G.add_weighted_edges_from(weighted_inter)
return G
def draw_cummulative_day_interactions(data_groupby_date, date1, date2, category, behavior):
    '''draw the network of cumulative day interactions, based on the function
get_cummulative_day_interactions
category = {'Affiliative', 'Agonistic', 'Other'}
behavior = {'Attacking','Avoiding','Carrying','Chasing','Copulating','Embracing',
'Grooming','Grunting-Lipsmacking','Invisible','Mounting','Other','Playing with',
'Presenting','Resting','Submission','Supplanting','Threatening','Touching'}
OR type ALL if you want them all
THE CATEGORY AND THE BEHAVIOUR SHOULD MATCH'''
G = get_cummulative_day_interactions(data_groupby_date, date1, date2, category, behavior)
pos = nx.circular_layout(G)
edges = G.edges()
weights = [G[u][v]['weight'] for u,v in edges]
    if behavior in Behavior:
        title = '{} ({}) interactions between the {} and the {}'.format(behavior, category,
                                                                        date1, date2)
    elif category in Category:
        title = '{} interactions between the {} and the {}'.format(category, date1, date2)
    else:
        title = 'All types of interactions between the {} and the {}'.format(date1, date2)
plt.title(title)
return nx.draw(G, pos, width = weights, with_labels = True)
Here, for instance, we have all the cumulative affiliative interactions between June 13 and June 20.
draw_cummulative_day_interactions(data_groupby_date, dates[0], dates[5], 'Affiliative', 'all')
Here we have all the cumulative agonistic interactions between June 13 and June 20. We can notice that there are far fewer agonistic interactions.
draw_cummulative_day_interactions(data_groupby_date, dates[0], dates[5], 'Agonistic', 'all')
The previous functions create graphs by giving a weight of one to each and every interaction of any kind. We wanted to refine the graph creation and thus decided to allow choosing the weight of each kind of interaction beforehand. That is what the next function, compute_interac_weights, does. Another difference with this function is that it takes the interactions over the whole observation period. The output of this function is not readable if the weights are not normalized, because of the large number of observations.
def compute_interac_weights(df, param,
weight_prop = 'duration',
normalized = False,
directed = False):
'''takes the dataframe df and computes the weights considering that:
weight = sum_i of alpha_i*weight_prop_i with alpha_i the
coefficient for the behavior i, and weight_prop_i. weight_prop is
either the duration or the count of this behavior, see below for
more info
    param is a dictionary containing these coefficients, so, for
instance:
param = {'Touching': 1,
'Grooming': 2}
    then Grooming will count twice as much as Touching
weight_prop: 'duration' (default) or 'count'
states the value you want to multiply your coefficients with
    normalized: False (default) or True. If True, the weights are
    normalized by the total of the weight_prop you defined
directed: False (default) or True
    decides whether we treat the interactions as symmetric (the weights
    of (u, v, w1) and (v, u, w2) are added together as if they were the
    same edge) or whether the edges are directed (thus (u, v, w1) is
    different from (v, u, w1))
'''
# assert the weight prop string
weightprop_verif = weight_prop in ['count', 'duration']
assert weightprop_verif, "'weight_prop' option has an invalid value"
# getting only the baboons interacting in df
df_int = interactions(df)
    # EXPLANATION OF THE FOLLOWING LINES: we build a pair-key column and use
    # the pandas groupby method on it. If the user wants a directed graph,
    # the key keeps the (actor, recipient) order, so the direction is kept.
    # If the user wants an undirected graph, the pair is sorted before
    # building the key, so the symmetric pairs (a, b) and (b, a) get the
    # same key and are merged together (the direction is forgotten).
    # Then we group by this key column, and all we have to do is sum over
    # the column we want in order to get the weight... neat!
actors_array = df_int[['Actor',
'Recipient']].to_numpy()
interac_pair_list = []
for row in actors_array:
if directed:
interac_pair_list.append(f"{row[0]}!{row[1]}")
else:
sortedrow = sorted(row)
interac_pair_list.append(f"{sortedrow[0]}!{sortedrow[1]}")
df_int['interac_pair'] = pd.Series(interac_pair_list,
index = df_int.index)
# df interactions groups
df_int_groups = df_int.groupby('interac_pair')
# the list we will loop on
pairs = df_int_groups.groups.keys()
# this is the list which will contain all the values for the
# dataframe we will return
weighted_pairs_list = []
# maybe a double iterator would be good here
if normalized:
if weight_prop == 'duration':
total = df_int['Duration'].sum()
if weight_prop == 'count':
total = df_int['Duration'].count()
else:
total = 1.
for pair in pairs:
df_pair = df_int_groups.get_group(pair)
pair_weight = 0
for behavior in param.keys():
coefficient = param[behavior]
df_pair_behavior = interactions_behavior(df_pair, behavior)
if weight_prop == 'duration':
weight = coefficient*df_pair_behavior['Duration'].sum()
elif weight_prop == 'count':
weight = coefficient*df_pair_behavior['Behavior'].count()
actor, recipient = pair.split('!')
pair_weight += weight
weighted_pairs_list.append([actor, recipient, pair_weight/total])
computed_df = pd.DataFrame(weighted_pairs_list)
if directed:
computed_df.columns = ['actor', 'recipient', 'weight']
else:
computed_df.columns = ['actor1', 'actor2', 'weight']
return np.asarray(computed_df)
To test the function, we assign a weight to each behaviour and draw a graph representing the interactions:
param = {'Attacking': 0, 'Avoiding': 0, 'Carrying': 3, 'Chasing': 0, 'Copulating': 1,
'Embracing': 1, 'Grooming': 3, 'Grunting-Lipsmacking': 1, 'Invisible': 0,
'Mounting': 0, 'Other': 1, 'Playing with': 1, 'Presenting': 1, 'Resting': 0,
'Submission': 0, 'Supplanting': 0, 'Threatening': 0, 'Touching': 1}
interactions_tuples = compute_interac_weights(data, param, weight_prop = 'duration', normalized = True,
directed = False)
G = nx.Graph()
G.add_nodes_from(Actor)
G.add_weighted_edges_from(interactions_tuples)
pos = nx.circular_layout(G)
edges = G.edges()
weights = [2*G[u][v]['weight'] for u,v in edges]
nx.draw(G, pos, width = weights, with_labels = True)
We can also compare the data from the sensors (RFID_data.csv) with the data taken by hand. Since the two data sets were recorded over the same period of time, we expect the graphs obtained from both collection methods to be similar.
rfid_data_df = "RFID_data.csv"
rfid_data_df = pd.read_csv(rfid_data_df, sep='\t')
# getting the list of all names
rfid_baboons_list = rfid_data_df['i'].to_list() + rfid_data_df['j'].to_list()
rfid_baboons_list= set(rfid_baboons_list)
# in case we want to use datetime objects
rfid_data_df['datetime_obj'] = pd.to_datetime(rfid_data_df['DateTime'],
format = "%d/%m/%Y %H:%M")
rfid_actors = rfid_data_df.loc[:,['i','j']].to_numpy()
sorted_actors = rfid_actors.copy()
# we sort because we want an undirected graph
sorted_actors.sort(axis=1)
a = pd.DataFrame(sorted_actors)
# easier to use pandas groupby if it is with one string
interac_pair = a.iloc[:,0] + "!" + a.iloc[:,1]
rfid_data_df['interac_pair'] = pd.Series(interac_pair)
rfid_weights = rfid_data_df.groupby('interac_pair').count()
rfid_weights = pd.DataFrame(rfid_weights.iloc[:,0])
actor1_list = []
actor2_list = []
for key_pair in rfid_weights.index:
actor1, actor2 = key_pair.split('!')
actor1_list.append(actor1)
actor2_list.append(actor2)
rfid_weights['actor1'] = pd.Series(actor1_list,
index = rfid_weights.index)
rfid_weights['actor2'] = pd.Series(actor2_list,
index = rfid_weights.index)
rfid_weights = rfid_weights[['actor1', 'actor2', 't']]
rfid_weights = rfid_weights.rename(columns = {'t':'weight'})
# normalize the weights
rfid_weights['weight'] /= rfid_weights['weight'].max()
G = nx.Graph()
G.add_nodes_from(rfid_baboons_list)
interaction_tuples = interactions_to_edges(rfid_weights)
G.add_weighted_edges_from(interaction_tuples)
pos = nx.circular_layout(G)
edges = G.edges()
weights = [G[u][v]['weight'] for u,v in edges]
nx.draw(G, pos, width = weights, with_labels = True)
We are going to compare the graph we see just above with the graph constructed with the observed data:
condition_1 = data['Actor'].map(lambda x: x in rfid_baboons_list)
condition_2 = data['Recipient'].map(lambda x: x in rfid_baboons_list)
# getting the data for the baboons wearing the sensors only
data_rfid_filter_df = data[condition_2 & condition_1]
list_behavior = list(set(data_rfid_filter_df['Behavior']))
# setting all the behaviors with coefficient 1
param = dict(zip(list_behavior, [1]*len(list_behavior)))
data_filtered_interac_weight = compute_interac_weights(data_rfid_filter_df,
param,
weight_prop = 'count',
normalized = True,
directed = False)
data_filtered_interac_weight = pd.DataFrame(data_filtered_interac_weight)
G2 = nx.Graph()
G2.add_nodes_from(rfid_baboons_list)
interaction_tuples = interactions_to_edges(data_filtered_interac_weight)
G2.add_weighted_edges_from(interaction_tuples)
pos = nx.circular_layout(G2)
edges = G2.edges()
weights = [G2[u][v]['weight'] for u,v in edges]
nx.draw(G2, pos, width = weights, with_labels = True)
We can see that they are quite similar!
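To go beyond visual inspection, here is a small sketch (not part of the original analysis) that quantifies the similarity as the cosine similarity between the two weight vectors, anticipating the GCS measure defined further below; edges missing from one of the networks are given weight 0.
# sketch: cosine similarity between the weight vectors of the sensor graph (G)
# and of the observation graph (G2); missing edges count as weight 0
all_pairs = {tuple(sorted(e)) for e in list(G.edges()) + list(G2.edges())}
w_rfid = np.array([G.get_edge_data(u, v, {'weight': 0})['weight'] for u, v in all_pairs])
w_obs = np.array([G2.get_edge_data(u, v, {'weight': 0})['weight'] for u, v in all_pairs])
similarity = w_rfid.dot(w_obs) / (np.linalg.norm(w_rfid) * np.linalg.norm(w_obs))
print('cosine similarity between the two networks:', round(similarity, 2))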
Here we try to detect communities among the baboons, and see how they evolve over time.
from cdlib import algorithms
def get_communities(G):
    '''returns the graph G with a node attribute 'group': an integer
    identifying the community each baboon belongs to after a community
    detection'''
coms = algorithms.louvain(G)
partition = coms.to_node_community_map()
partition = dict(partition)
partition = {k:v[0] for k,v in zip(partition.keys(), partition.values())}
for name, com in zip(partition.keys(), partition.values()):
nx.set_node_attributes(G, {name:{'group':com}})
return G
def draw_communities(G):
    '''For a given graph G with a node attribute called 'group', this function
    draws the graph, colouring each node according to its community'''
    groups = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    colors = ['firebrick', 'mistyrose', 'coral', 'purple', 'sienna',
              'royalblue', 'aqua', 'plum', 'seagreen', 'gold']
color_map = []
group = nx.get_node_attributes(G, 'group')
for node in G:
for i in range(len(groups)):
if group.get(node) == groups[i]:
color_map.append(colors[i])
pos = nx.circular_layout(G)
edges = G.edges()
weights = [G[u][v]['weight'] for u,v in edges]
return nx.draw(G, pos, width = weights, node_color=color_map, with_labels = True)
g = get_day_interactions(data_groupby_date, dates[0], 'Affiliative', 'all')
G = get_communities(g)
draw_communities(G)
Since the data have been gathered over multiple days, representing the evolution of the interactions over time seems to be an interesting idea.
from matplotlib import animation, cm, colors, rc
rc('animation', html='html5')
fig = plt.figure()
draw_day_interactions(data_groupby_date, dates[0], 'all', 'all')
def animate(frame):
fig.clear()
draw_day_interactions(data_groupby_date, dates[frame], 'all', 'all')
animation.FuncAnimation(fig, animate, frames=len(dates), interval=1000, repeat=True)
fig = plt.figure()
g = get_day_interactions(data_groupby_date, dates[0], 'all', 'all')
g = get_communities(g)
draw_communities(g)
def animate(frame):
fig.clear()
g = get_day_interactions(data_groupby_date, dates[frame], 'all', 'all')
g = get_communities(g)
draw_communities(g)
animation.FuncAnimation(fig, animate, frames=len(dates), interval=1000, repeat=True)
We construct an interaction network aggregated over the entire observation period. Nodes represent baboons and weighted edges the number of observed interactions between two baboons during the observation period.
import networkx as nx
import matplotlib.pyplot as plt
import numpy as np
from datetime import datetime
import pandas as pd
import seaborn as sns
#read file
data = pd.read_csv("OBS_data.csv",delimiter='\t')
# remove missing value
data = data.dropna()
# remove data where 'Actor' = 'Recipient'
data = data[~(data['Actor'] == data['Recipient'])]
# correct typo
data.loc[data['Recipient'] == 'MALI ', ['Recipient']] = 'MALI'
# reset index
data = data.reset_index()
# Convert DateTime to the "datetime" standard format
data['DateTime'] = pd.to_datetime(data['DateTime'], dayfirst=True)
#creating a new column interaction pairs
interaction_pairs = [tuple(sorted((data['Actor'][i], data['Recipient'][i]))) for i in range(len(data))]
data['interaction_pairs'] = pd.Series(interaction_pairs, index = data.index)
# calculate the weight of each edge
weight = data.groupby('interaction_pairs').size().reset_index(name='counts')
# define edges list
edges_list = [(weight['interaction_pairs'][i][0], weight['interaction_pairs'][i][1], weight['counts'][i])
for i in range(len(weight))]
# define graph
G = nx.Graph()
G.add_weighted_edges_from(edges_list)
# save the network
#nx.write_graphml_lxml(G, "interaction_network.graphml")
We visualize the interaction network aggregated over the entire observation period using Gephi. The Force Atlas 2 layout was used. The thickness of the lines is proportional to the weights of the edges.
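The Gephi figure itself is not embedded in this notebook, so here is a rough NetworkX/matplotlib stand-in (a sketch: a spring layout instead of Force Atlas 2, with edge widths proportional to the weights; the 0.02 scaling factor is an arbitrary choice for readability).
# rough stand-in for the Gephi figure: spring layout instead of Force Atlas 2,
# edge width proportional to the edge weight (scaling factor chosen arbitrarily)
pos = nx.spring_layout(G, seed=42)
edge_widths = [0.02 * G[u][v]['weight'] for u, v in G.edges()]
nx.draw(G, pos, width=edge_widths, with_labels=True, node_size=300, font_size=8)
plt.show()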
We want to compare the resulting aggregated network with networks aggregated over shorter time windows. To do so, we calculate the cosine similarity between the network weights. Cosine similarity measures the cosine of the angle between two vectors; in positive space it is bounded in [0,1], taking the value 1 if the vectors are proportional and 0 if they are perpendicular.
We define the Global Cosine Similarity (GCS) measure between two networks as the cosine similarity between the two vectors formed by the list of all edge weights in each network (using a weight 0 if an edge is not present) \begin{equation} GCS_{1,2} = \frac{\sum_{i>j}w_{ij}^{(1)}w_{ij}^{(2)}}{\sqrt{\sum_{i>j}(w_{ij}^{(1)})^2}\sqrt{\sum_{i>j}(w_{ij}^{(2)})^2}} \end{equation}
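Before applying it to the data, here is a tiny numeric sanity check of the two properties stated above, using made-up weight vectors (not data from the files): proportional vectors give a cosine similarity of 1, and vectors sharing no non-zero edge give 0.
# toy check of the cosine similarity bounds with made-up weight vectors
w1 = np.array([2., 4., 0., 6.])
w2 = np.array([1., 2., 0., 3.])   # proportional to w1 -> similarity 1
w3 = np.array([0., 0., 5., 0.])   # no edge in common with w1 -> similarity 0
cosine = lambda a, b: a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(cosine(w1, w2), cosine(w1, w3))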
def GCS(weight, weight_agg):
    '''calculates the cosine similarity between the network weights
    weight is the weight list over the entire observation period
    weight_agg is the weight list of the interaction network aggregated over a given time window'''
    # calculate the numerator
dot_prod = 0
for i in range(len(weight_agg)):
a = weight['interaction_pairs']== weight_agg['interaction_pairs'][i]
if a.any():
dot_prod += int(weight[a]['counts']) * int(weight_agg['counts'][i])
    # calculate the denominator
norm_weight = np.linalg.norm(np.array(weight['counts']))
norm_weight_agg = np.linalg.norm(np.array(weight_agg['counts']))
return dot_prod/(norm_weight*norm_weight_agg)
# interaction by day
inter_byday = data.groupby([data["DateTime"].dt.floor("d"), 'interaction_pairs']).size().reset_index(name='counts')
result = []
for i in sorted(set(inter_byday['DateTime'])):
weight_agg = inter_byday[inter_byday['DateTime'] <= i].groupby('interaction_pairs').size().reset_index(name='counts')
result.append(GCS(weight, weight_agg))
# Plot results
plt.plot(np.arange(len(result))+1, result, '.')
plt.xlabel('Aggregation time window (days)')
plt.ylabel('GCS')
plt.xticks(np.arange(1,22,2))
plt.show()
The graph above shows the resulting global similarities between the network aggregated over the whole observation period and the networks aggregated over shorter time windows. The GCS increases and reaches a high value after only 10 days of observation.
Next, we compute the interaction networks aggregated over successive time windows of 3 days. We compute the cosine similarities between each pair of windows to determine how stable the aggregated networks are. The figure below shows the resulting colour-coded GCS matrix.
weight_agg_list = []
# list of dates 3 day interval
dates_list = sorted(set(inter_byday['DateTime']))[::3]
i = dates_list[0]
for j in dates_list[1:]:
weight_agg = inter_byday[(inter_byday['DateTime'] >= i) & (inter_byday['DateTime'] < j)].groupby('interaction_pairs').size().reset_index(name='counts')
i = j
weight_agg_list.append(weight_agg)
# compute the GCS between each pair of windows
m = len(weight_agg_list)
A = np.zeros((m,m))
for i in range(m):
for j in range(m):
A[i,j] = GCS(weight_agg_list[i], weight_agg_list[j])
# plot results
plt.matshow(A)
plt.clim(0,1)
plt.colorbar()
plt.show()
The values obtained are low, with an average similarity of 0.59. We conclude that important differences are measured between the interaction networks aggregated over 3 days of observation, i.e. the latter are not stable. In order to obtain a stable network, a longer aggregation window should be used.
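As a quick follow-up (a sketch that is not part of the original analysis), the same pairwise comparison can be rerun with 7-day aggregation windows to check whether the similarities indeed increase with a longer window:
# sketch: same pairwise GCS comparison as above, with 7-day aggregation windows
weight_agg_list_7d = []
dates_list_7d = sorted(set(inter_byday['DateTime']))[::7]
start = dates_list_7d[0]
for end in dates_list_7d[1:]:
    window = inter_byday[(inter_byday['DateTime'] >= start) & (inter_byday['DateTime'] < end)]
    weight_agg_list_7d.append(window.groupby('interaction_pairs').size().reset_index(name='counts'))
    start = end
m7 = len(weight_agg_list_7d)
A7 = np.zeros((m7, m7))
for a in range(m7):
    for b in range(m7):
        A7[a, b] = GCS(weight_agg_list_7d[a], weight_agg_list_7d[b])
print('average off-diagonal GCS with 7-day windows:',
      round((A7.sum() - np.trace(A7)) / (m7 * (m7 - 1)), 2))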
# remove 'EXTERNE' and 'SELF'
G.remove_node('EXTERNE')
G.remove_node('SELF')
# number of nodes
n = len(G.nodes())
# number of edges
m = len(G.edges())
print('#nodes=',n,' #edges=', m)
#nodes= 19 #edges= 158
From the raw data, it is possible to first plot the distribution of the interaction times, to get a feeling for the length of the interactions between baboons.
import seaborn as sns
duration = data['Duration']
sns.histplot(duration)
<AxesSubplot:xlabel='Duration', ylabel='Count'>
Because observations were no longer recorded once the interaction time exceeded 300 seconds, there is a peak around a duration of 300 seconds. These interactions mostly correspond to 'Invisible', 'Resting', 'Grooming', 'Other' and 'Embracing':
a = data[(data["Duration"]>=280)]
sns.histplot(a['Behavior'])
<AxesSubplot:xlabel='Behavior', ylabel='Count'>
def average_degree(G):
return 2*len(G.edges())/len(G.nodes())
print( 'average degree <k> = ', round(average_degree(G),2))
average degree <k> = 16.63
The average degree is equal to 16.63. The latter does not reveal enough about the structure of the graph, so we examine the degree distribution.
Below is a visualisation of the degree distribution of the interaction network using Gephi. The size of the nodes is proportional to the degree of the node. High-degree nodes appear in red and low-degree nodes appear in blue.
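The Gephi figure is not embedded in this notebook, so here is a rough NetworkX stand-in (a sketch: node size proportional to the degree, node colour mapped to the degree with a blue-to-red colour map; the size scaling factor is an arbitrary choice):
# rough stand-in for the Gephi figure: node size proportional to the degree,
# node colour mapped to the degree (blue = low degree, red = high degree)
degrees = dict(G.degree())
node_sizes = [60 * degrees[n] for n in G.nodes()]
node_colors = [degrees[n] for n in G.nodes()]
nx.draw(G, nx.circular_layout(G), node_size=node_sizes, node_color=node_colors,
        cmap=plt.cm.coolwarm, with_labels=True, font_size=8)
plt.show()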
The degree distribution shows that almost all the nodes have degrees near the average degree (std = 1.66).
plt.hist(list(dict(G.degree()).values()), bins='auto', alpha=0.5)
plt.show()
Nevertheless, degree may be less suitable than strength, as individuals can interact frequently with only a few social partners. The histogram below shows that the weight distribution could be modelled by a power law, with a large majority of low-weight edges and a small minority of edges with very high weight.
#Weight distribution
sns.histplot([i[-1]['weight'] for i in G.edges(data=True)],bins='auto')
plt.xlabel('Weight')
plt.show()
So we perform weight thresholding and remove edges with low weight (≤ 10, i.e. we keep edges corresponding to at least approximately one interaction every 3 days). We obtain the following graph.
ebunch = [(i[0],i[1]) for i in G.edges(data=True) if i[-1]['weight']<=10]
G.remove_edges_from(ebunch)
print('density =',round(nx.density(G),2))
print('clustering coefficient =', round(nx.transitivity(G),2))
density = 0.49 clustering coefficient = 0.71
The density is equal to 0.49. We recall that the density of a fully connected graph is 1. The relatively high density could be explained by the fact that the baboons are in an enclosure. Captivity can reinforce social relationships: as animals do not have to search for food, they typically spend more time engaged in social activities (with affiliative or agonistic interactions). There are also other factors such as group size (the larger the group, the lower the density), seasonality (higher density during the mating season)...
The clustering coefficient is 0.71. The high clustering coefficient means that the network contains tightly connected communities.
print('average shortest path = ', round(nx.average_shortest_path_length(G, weight=True),2))
print('diameter =', nx.diameter(G))
average shortest path = 1.64 diameter = 4
The average shortest path is 1.64. The network is hairball-like. This suggests potential for rapid transfer among all group members (information or disease transmission).
sorted(nx.betweenness_centrality(G).items(), key=lambda x: x[1])
[('ARIELLE', 0.0), ('ATMOSPHERE', 0.0), ('KALI', 0.0), ('VIOLETTE', 0.0008169934640522876), ('EWINE', 0.004435107376283847), ('FANA', 0.007703081232492998), ('ANGELE', 0.013702147525676936), ('LIPS', 0.024301515477986067), ('NEKKE', 0.024301515477986067), ('FELIPE', 0.026947018123488722), ('MUSE', 0.034413680001915296), ('FEYA', 0.03446635064282123), ('HARLEM', 0.04107591754650579), ('LOME', 0.04849050731403671), ('MALI', 0.05595118389236037), ('MAKO', 0.07732409202997438), ('BOBO', 0.09904474610356961), ('PETOULETTE', 0.1110566448801743), ('PIPO', 0.11492374727668844)]
The nodes with the highest betweenness are 'PIPO', 'PETOULETTE', and 'BOBO'. Betweenness centrality indicates the role of a node in the transmission of information, disease, etc., as it indicates to what extent a node connects subgroups, like a bridge. Besides, animals with high betweenness are likely to be important for group stability, and their removal (by death for example) may fragment the group into smaller subgroups.
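To illustrate the fragmentation argument, here is a small sketch (not part of the original analysis) that removes the three highest-betweenness individuals from the thresholded graph and counts the resulting connected components:
# sketch: remove the three highest-betweenness baboons and check fragmentation
G_removed = G.copy()
G_removed.remove_nodes_from(['PIPO', 'PETOULETTE', 'BOBO'])
print('connected components before removal:', nx.number_connected_components(G))
print('connected components after removal: ', nx.number_connected_components(G_removed))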
We have presented methods for inferring an interaction network from behavioral data and a contact network from sensor data. We showed that the baboons have complex and dynamic interaction networks resulting in daily changes of network features. Then we compared the interaction network inferred from observation data with the contact network inferred from sensor data. We then examined the stability of the interaction network over time. Finally, we provided a qualitative discussion of the properties of the interaction network aggregated over the entire observation period. We showed that the social organisation and structure of this species is rich and complex.