Master DISS : Machine Learning - 2024/2025

Introduction
This is the page of the Machine Learning class of DISS International Master at Lyon 1 University.

Rooms - Schedule

Please refer to the weekly schedule here: https://adelb.univ-lyon1.fr/

Program, classes and content

Below is an overview of the courses and practicals.
As the classes take place, I'll add my presentation slides, some notebooks, etc.
This organization is subject to change!

Day | Topic | Resources
Wednesday Sep. 11 (a.m.) | (RC+CC) Introduction, Data Preparation | Slides - TP (practical)
Wednesday Sep. 18 (a.m.) | (RC+CC) Unsupervised: Clustering beyond k-means |
Wednesday Sep. 18 (p.m.) | (RC+CC) Data Challenge + Project presentation |
Wednesday Sep. 25 (a.m.) | (RC+CC) Supervised Machine Learning: Basics |
Wednesday Oct. 2 (a.m.) | (RC+CC) Supervised ML 2 |
Wednesday Oct. 9 (a.m.) | Mathieu Lefort: Deep Neural Networks CM (lecture) | Moodle link
Wednesday Oct. 9 (p.m.) | (RC+CC) Networks 1 |
Wednesday Oct. 16 (a.m.) | Mathieu Lefort: Deep Neural Networks CM (lecture) | Moodle link
Wednesday Oct. 16 (p.m.) | Networks 2 |
Wednesday Oct. 23 (a.m.) | Mathieu Lefort: Deep Neural Networks TP (practical) | Moodle link
Wednesday Oct. 23 (p.m.) | Mathieu Lefort: Deep Neural Networks TP (practical) | Moodle link
Wednesday Oct. 30 (a.m.) | (Bruno Yun) ML pipeline | Content
Wednesday Nov. 6 (a.m.) | (Bruno Yun) ML for Textual Data | Content
Wednesday Nov. 13 (a.m.) | (Bruno Yun) ML with Large Language Models (part 1) | Content
Wednesday Nov. 20 (a.m.) | (Bruno Yun) ML with Large Language Models (part 2) | Content
Wednesday Nov. 27 (a.m.) | ? | ?
Wednesday Dec. 4 (a.m.) | Project work session |
Wednesday Dec. 11 (9h45-11h45) | ? |
Wednesday Dec. 18 (9h45-11h45) | Written Exam |


Data

Data Exploration

Article presentation

Note on the article presentation

  • Presentation duration: 17 minutes in total: 12 minutes of presentation and 5 minutes of questions. Respecting the time limit is part of the grade.
  • Select Key Information: Focus on the most important points of your article. Don't include everything; choose what's most relevant to your audience.
  • Charts and Tables: When presenting a chart or table, ensure you:
    • Highlight the main takeaways.
    • Direct the audience's attention to significant data points.
    • Provide a brief explanation to set the context.
  • Questions: Anticipate and address the most significant or likely question your audience might have about your article.
  • Keep Slides Clear:
    • Avoid using complete sentences; use bullet points or short phrases instead.
    • Ensure slides are readable and not crowded with text.
  • Presentation Length: Aim for approximately 12 slides for a 12-minute presentation. This keeps your content paced well and prevents rushing.
  • Quality over Quantity: Focus on presenting high-quality, relevant information rather than trying to include everything.
  • Address Uncertainties: If there's something in the article you don't understand, be transparent about it. This can lead to productive discussions.
  • Originality:
    • Avoid copying slides from online sources; always create your own content.
    • Use your own figures, diagrams, and visuals whenever possible.
  • Storytelling: Frame your presentation as a narrative or story. This helps in engaging your audience and making complex information more accessible.
  • Pacing: Speak clearly and at a moderate pace. Speaking too quickly can make it difficult for your audience to follow along.
  • Relevance of Plots: Only include plots or graphs if there's sufficient time to explain them. If the audience can't understand a plot's relevance, it's best to leave it out.

Tools

For tutorials and the experimental part of lectures, you need to use some software, detailed below.

Python

Most of the experiments are done in Python. If you're not familiar with the language, there are numerous tutorials on the web; a good one, for instance, is from w3schools. If you want to be all set up for the experiments, here is a list of packages we will use. Note that some of them are only available with pip, not Anaconda. If you're using Anaconda, you can nevertheless install them with the pip command (pip install package_name).
  • notebook. Jupyter notebook
  • pandas. Pandas
  • scikit-learn. Machine learning/Data mining
  • seaborn. Plotting library
  • networkx. Generic network analysis
  • cdlib. Community detection
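To check that the packages listed above are installed, a minimal sketch such as the following may help. It only tests whether each module can be found, without importing it. The function name missing_packages and the PACKAGES mapping are hypothetical names introduced for this illustration; the import names assume the usual convention that scikit-learn is imported as sklearn.

```python
from importlib.util import find_spec

# pip package name -> module name used in `import` statements.
# This mapping is an assumption for illustration; among the listed
# packages, only scikit-learn has a different import name (sklearn).
PACKAGES = {
    "notebook": "notebook",
    "pandas": "pandas",
    "scikit-learn": "sklearn",
    "seaborn": "seaborn",
    "networkx": "networkx",
    "cdlib": "cdlib",
}


def missing_packages(packages=PACKAGES):
    """Return the pip names of the packages whose module cannot be found."""
    return [
        pip_name
        for pip_name, module_name in packages.items()
        if find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        # Suggest the pip command mentioned above for whatever is missing.
        print("Missing packages, try: pip install " + " ".join(missing))
    else:
        print("All listed packages were found.")
```

If some packages are reported as missing, the printed pip command can be run in the same environment (including an Anaconda environment).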
A quick tutorial for pandas: here
A quick tutorial about Python data structures (lists, dictionaries, sets...): here

Gephi

Gephi is a tool for basic graph manipulation and visualization. Although it offers little in terms of graph analysis, it is really convenient for exploring and visualizing graphs of small to medium size (< 1000 nodes).
It can be downloaded here: Gephi.
Gephi requires Java and suffers from a few bugs on Windows (but there is no better alternative). Here are solutions to common problems:

Exams

Coefficients

  • CT - final exam: 50%
  • Presentation: 20%
  • Project Rémy Cazabet: 10%
  • Project Mathieu Lefort: 10%
  • Project Bruno Yun: 10%
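As an illustration of how these coefficients combine, here is a small sketch that computes a weighted average. The component keys, the function name final_grade and the example grades are hypothetical, and the 0-20 grading scale is an assumption; only the weights come from the list above.

```python
# Weights taken from the coefficient list above, as fractions of the final grade.
# The dictionary keys are hypothetical labels chosen for this example.
WEIGHTS = {
    "final_exam": 0.50,
    "presentation": 0.20,
    "project_cazabet": 0.10,
    "project_lefort": 0.10,
    "project_yun": 0.10,
}


def final_grade(grades, weights=WEIGHTS):
    """Return the weighted average of the grades (same scale as the inputs)."""
    if set(grades) != set(weights):
        raise ValueError("grades must contain exactly the weighted components")
    return sum(weights[name] * grades[name] for name in weights)


# Hypothetical grades, assumed to be out of 20, for illustration only.
example_grades = {
    "final_exam": 12,
    "presentation": 15,
    "project_cazabet": 14,
    "project_lefort": 16,
    "project_yun": 10,
}

print(round(final_grade(example_grades), 2))
```

The weights sum to 1, so the result stays on the same scale as the individual grades.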

Project

Final exam

The final exam will take place on December 11, with questions from all teachers covering all parts of the class.

2022 Exam (my part)