Bitcoin network analysis: Spring 2021


Introduction
This is the page of the Bitcoin Network analysis course, part of the Master 2 Finance Technology Data at Paris 1 Panthéon-Sorbonne.
The class covers: 1) Fundamentals of Network Science, in particular network description, node centralities, community organisation, etc. 2) Network Analysis of the Bitcoin Blockchain Data.

Objectives
This class is designed to provide an introduction to network science and its application to financial transactions in general, and cryptocurrencies in particular.

Overview of the course
The course consists of 18 hours of classes, combining tutorials and practical sessions (TD/TP).

Online settings

Because of the health situation, classes this year will be online only. I'm really sorry about that, because we usually have a lot of interesting exchanges during the class. From previous experience, I have learned that classes held entirely by videoconference are boring and tiring, for students and teachers alike. So I propose a mixed format: live discussions, pre-recorded videos that you can watch at your own pace, and exercises that you can do on your own (I'm of course available to answer your questions).

Online tools: Note that I am required to record attendance, so please show up at the beginning of each session (9 a.m. and 2 p.m.) and say "hello" in the chat and/or on Discord.

Overview of the class

Below is an overview of the class, with the presentation slides. The class is organized in 3 full days.

Day 1

Morning
  1. 9:00-9:30: Introduction (Live): PDF
  2. 9:30-10:45: Describing Networks: Video - PDF
  3. 10:45-11:15: Describing Nodes: Video - PDF
  4. 11:15-12:00: Gephi: Exercises, PDF (hands-on)
Afternoon
  1. 14:00-14:45: Bitcoin Network description (Live): PDF
  2. 14:45-16:15: NetworkX Introduction
  3. 16:15-17:00: Bitcoin transaction data manipulation

Day 2

Morning
  1. 9:00-10:00: Community Structure (Live): PDF
  2. 10:00-12:00: Finishing last week's exercises and videos
  3. 10:00-12:00: Exercise on communities: PDF
Afternoon
  1. 14:00-15:00: Project Presentation (Live): PDF
  2. 15:00-17:00: Finishing exercises
  3. 15:00-17:00: Project Exploration: Propose a question to explore in the Discord. Do you need data aggregated in a different way (every month, every year...)? Check the data section to access daily data.

Day 3

Morning
  1. 9:00-10:00: Machine Learning on Graphs (Live): PDF
  2. 10:00-12:00: Finishing last week's exercises
  3. 10:00-12:00: Exercise on ML on graphs: PDF
Afternoon
  1. 13:30-16:30: Working on the project
  2. 14:30-16:30: Short individual meetings with each group to discuss their project

Datasets

Here are some network datasets to play with:
  • GOT: Game of Thrones book series characters, Book 1.
  • airports: airline connections between airports
  • Bitcoin2016-1M: 1 million transactions between bitcoin users
  • (OLD, do not use) Transactions by day: each file corresponds to a day. Values are rounded to the second decimal, so for small amounts be careful to use only the dollar values (I will change it later)
  • Transactions by day: each file corresponds to a day. The format is "parquet", so load it with pd.read_parquet("myfile"). You first need to install the package "pyarrow", typically with "pip install pyarrow". The columns are:
    • value: value in satoshi.
    • time: timestamp of the transaction.
    • src_identity: source of the transaction.
    • dst_identity: destination of the transaction.
    • PriceUSD: value of a bitcoin in USD for that day.
    The id of an actor can be: a name (known actor), an integer (group of addresses, "wallet"), or a bitcoin address (the address belongs to no cluster).
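
As an illustration of the daily transaction format described above, here is a minimal sketch (not the reference solution of the exercises) that turns such a table into a weighted directed graph with networkx. The function name build_transaction_graph, the edge attribute names btc, usd and n_tx, and the toy rows are all hypothetical; with real data you would replace the toy table by the result of pd.read_parquet("myfile"). Casting the identities to strings is an assumption made here so that names, wallet integers and addresses can coexist as node ids.

```python
import networkx as nx
import pandas as pd

SATOSHI_PER_BTC = 100_000_000  # 1 bitcoin = 10^8 satoshi


def build_transaction_graph(df: pd.DataFrame) -> nx.DiGraph:
    """Aggregate transactions into one weighted edge per (source, destination)."""
    df = df.copy()
    # Assumption: ids mix names, integers and addresses, so treat them all as strings.
    df["src_identity"] = df["src_identity"].astype(str)
    df["dst_identity"] = df["dst_identity"].astype(str)
    # Convert satoshi to bitcoin, then to USD using the price of that day.
    df["btc"] = df["value"] / SATOSHI_PER_BTC
    df["usd"] = df["btc"] * df["PriceUSD"]
    # Sum amounts and count transactions for each ordered pair of actors.
    edges = df.groupby(["src_identity", "dst_identity"], as_index=False).agg(
        btc=("btc", "sum"), usd=("usd", "sum"), n_tx=("value", "count")
    )
    return nx.from_pandas_edgelist(
        edges,
        source="src_identity",
        target="dst_identity",
        edge_attr=["btc", "usd", "n_tx"],
        create_using=nx.DiGraph,
    )


# Toy rows standing in for pd.read_parquet("myfile"); all values are made up.
toy = pd.DataFrame(
    {
        "value": [150_000_000, 50_000_000, 200_000_000],
        "src_identity": ["ExchangeA", "ExchangeA", 42],
        "dst_identity": [42, 42, "1ExampleAddressNotReal"],
        "PriceUSD": [50_000.0, 50_000.0, 50_000.0],
    }
)
graph = build_transaction_graph(toy)
print(graph.edges(data=True))
```

Aggregating before building the graph keeps one edge per pair of actors, which is usually what centrality and community detection methods expect; keep the raw rows if you need the time column for temporal analyses.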

Evaluation: Project

Students work in groups of 2 or 3.
They have to send me their project before Sunday, March 7, 2021, 23:59.
The details about the project can be found here: TBD.

Tools

For tutorials and the experimental part of lectures, you need to use some software, detailed below.

Gephi

Gephi is a tool for basic graph manipulation and visualization. Although it offers little in terms of graph analysis, it is really convenient for exploring and visualizing graphs of small to medium size (< 1000 nodes).
It can be downloaded here: Gephi.
Gephi requires Java, and suffers from a few bugs on Windows (but there is no better alternative). Here are solutions to common problems:

Python

Most of the experiments are done in Python. If you're not familiar with this language, there are numerous tutorials on the web; a good one, for instance, is from w3schools. If you want to be all set up for the experiments, here is a list of the packages we will use. Note that some of them are only available with pip, and not with Anaconda. If you're using Anaconda, you can nevertheless install them with the pip command (pip install package_name).
  • networkx. Generic network analysis
  • notebook. Jupyter notebook
  • cdlib. Community detection
  • tnetwork. Temporal networks
  • scikit-learn. Machine learning/Data mining
  • seaborn. Plotting library
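
To check that your environment is ready, the following sketch looks for each package in the list above and prints the pip command for whatever is missing. The helper name missing_packages and the COURSE_PACKAGES mapping are hypothetical; the only subtlety is that scikit-learn is imported under the module name sklearn.

```python
from importlib.util import find_spec

# pip name -> module name used in "import ..." (they differ for scikit-learn).
COURSE_PACKAGES = {
    "networkx": "networkx",
    "notebook": "notebook",
    "cdlib": "cdlib",
    "tnetwork": "tnetwork",
    "scikit-learn": "sklearn",
    "seaborn": "seaborn",
}


def missing_packages(packages=COURSE_PACKAGES):
    """Return the pip names of the packages that cannot be found."""
    return [
        pip_name
        for pip_name, module_name in packages.items()
        if find_spec(module_name) is None  # None means the module is not installed
    ]


missing = missing_packages()
if missing:
    print("Missing packages, install them with: pip install " + " ".join(missing))
else:
    print("All course packages are installed.")
```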