Master DISS : Machine Learning - 2024/2025

Introduction
This is the page of the Machine Learning class of the DISS International Master at Lyon 1 University.

Rooms - Schedule

Please refer to the weekly schedule here: https://adelb.univ-lyon1.fr/

Program, classes and content

Below is an overview of the courses and practicals.
As classes take place, I will add my presentation slides, notebooks, etc.
This organization is subject to change!

Day | Topic | Resources
Wednesday Sep. 18 (a.m.) | (RC+CC) Introduction, Data Preparation | Slides - Practical
Wednesday Sep. 18 (p.m.) | (RC+CC) Unsupervised: Clustering beyond k-means | Slides - Practical
Wednesday Sep. 25 (a.m.) | (RC+CC) Supervised Machine Learning: Basics | Slides - Practical
Wednesday Oct. 2 (a.m.) | (RC+CC) Supervised ML 2 + Articles introduction | Slides - Practical
Wednesday Oct. 9 (a.m.) | Mathieu Lefort: Deep Neural Networks | Lecture - Moodle link
Wednesday Oct. 9 (p.m.) | (RC+CC) Networks 1 | Slides - Practical 1 - Practical 2 - CheatSheet_intro - CheatSheet_matrices - CheatSheet_centralities
Wednesday Oct. 16 (a.m.) | Mathieu Lefort: Deep Neural Networks | Lecture - Moodle link
Wednesday Oct. 23 (a.m.) | Mathieu Lefort: Deep Neural Networks | Practical - Moodle link
Wednesday Oct. 23 (p.m.) | Mathieu Lefort: Deep Neural Networks | Practical - Moodle link
Wednesday Oct. 30 (a.m.) | (RC+CC) Article presentations |
Wednesday Nov. 6 (a.m.) | Andrea Failla: OSNAM |
Wednesday Nov. 13 (a.m.) | Bruno Yun: ML pipeline | Content
Wednesday Nov. 13 (p.m.) | Bruno Yun: ML for Textual Data | Content
Wednesday Nov. 20 (a.m.) | Bruno Yun: ML with Large Language Models (part 1) | Content
Wednesday Nov. 20 (p.m.) | Bruno Yun: ML with Large Language Models (part 2) | Content
Wednesday Nov. 27 (a.m.) | Andrea Failla: Sentiment analysis |
Wednesday Dec. 4 (a.m.) | Andrea Failla: Project |
Wednesday Dec. 11 (9:45-11:45) | Andrea Failla: Project |
Wednesday Dec. 18 (9:45-11:45) | Written Exam |


Data

  • Data Exploration
  • Clustering
  • Supervised Learning
  • Networks

Article presentation

Note on the article presentation

  • Presentation duration: 17 minutes (12 minutes of presentation, 5 minutes of questions). Respecting the timing is part of the grade.
  • Select Key Information: Focus on the most important points of your article. Don't include everything; choose what's most relevant to your audience.
  • Charts and Tables: When presenting a chart or table, ensure you:
    • Highlight the main takeaways.
    • Direct the audience's attention to significant data points.
    • Provide a brief explanation to set the context.
  • Questions: Anticipate and address the most significant or likely questions your audience might have about your article.
  • Keep Slides Clear:
    • Avoid using complete sentences; use bullet points or short phrases instead.
    • Ensure slides are readable and not crowded with text.
  • Number your slides: This makes it easier for the audience to refer to a specific slide when asking questions.
  • Presentation Length: Aim for approximately 12 slides for a 12-minute presentation. This keeps your content paced well and prevents rushing.
  • Quality over Quantity: Focus on presenting high-quality, relevant information rather than trying to include everything.
  • Address Uncertainties: If there's something in the article you don't understand, be transparent about it. This can lead to productive discussions.
  • Originality:
    • Avoid copying slides from online sources; always create your own content.
    • Use your own figures, diagrams, and visuals whenever possible.
  • Storytelling: Frame your presentation as a narrative or story. This helps in engaging your audience and making complex information more accessible.
  • Pacing: Speak clearly and at a moderate pace. Speaking too quickly can make it difficult for your audience to follow along.
  • Reading: Do not read! Neither the slides nor a written text. You can use notes if it helps, but only for quick glances.
  • Relevance of Plots: Only include plots or graphs if there's sufficient time to explain them. If the audience can't understand a plot's relevance, it's best to leave it out.
Proposed articles
  • Karczmarek, P., Kiersztyn, A., Pedrycz, W., & Al, E. (2020). K-Means-based isolation forest. Knowledge-Based Systems, 195, 105659.
  • Carreira-Perpinán, M. A. (2015). A review of mean-shift algorithms for clustering. arXiv preprint arXiv:1503.00687.
  • Ankerst, M., Breunig, M. M., Kriegel, H. P., & Sander, J. (1999). OPTICS: Ordering points to identify the clustering structure. ACM SIGMOD Record, 28(2), 49-60.
  • Bandyapadhyay, S., Fomin, F. V., Golovach, P. A., Lochet, W., Purohit, N., & Simonov, K. (2023). How to find a good explanation for clustering? Artificial Intelligence, 322, 103948.
  • He, X., Zhao, K., & Chu, X. (2021). AutoML: A survey of the state-of-the-art. Knowledge-Based Systems, 212, 106622.
  • Ran, X., Zhou, X., Lei, M., Tepsan, W., & Deng, W. (2021). A novel k-means clustering algorithm with a noise algorithm for capturing urban hotspots. Applied Sciences, 11(23), 11202.
  • Zafar, M. B., Valera, I., Rodriguez, M. G., & Gummadi, K. P. (2017, April). Fairness constraints: Mechanisms for fair classification. In Artificial Intelligence and Statistics (pp. 962-970). PMLR.

Tools

For tutorials and the experimental parts of lectures, you will need the software detailed below.

Python

Most of the experiments are done in Python. If you're not familiar with this language, there are numerous tutorials on the web; a good one, for instance, is from W3Schools. If you want to be fully set up for the experiments, here is the list of packages we will use. Note that some of them are only available through pip, not through Anaconda. If you're using Anaconda, you can nevertheless install them with the pip command (pip install package_name).
  • notebook: Jupyter Notebook
  • pandas: data manipulation (Pandas)
  • scikit-learn: machine learning / data mining
  • seaborn: plotting library
  • networkx: generic network analysis
  • cdlib: community detection
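To check that your environment is ready, a small script like the one below can report which of the packages listed above are missing. This is only a sketch: it assumes the usual import names (notably that scikit-learn is imported as `sklearn`), and `missing_packages` is a hypothetical helper, not something provided with the course.

```python
import importlib.util

# pip name -> module name used in "import ..." (they differ for scikit-learn)
REQUIRED = {
    "notebook": "notebook",
    "pandas": "pandas",
    "scikit-learn": "sklearn",
    "seaborn": "seaborn",
    "networkx": "networkx",
    "cdlib": "cdlib",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose module cannot be found."""
    return [
        pip_name
        for pip_name, module_name in required.items()
        # find_spec returns None when a top-level module is not installed
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages, install them with: pip install " + " ".join(missing))
    else:
        print("All packages found.")
```

The printed `pip install` line works the same way inside an Anaconda environment, since it only relies on pip.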
A quick tutorial on pandas: here
A quick tutorial on Python data structures (lists, dictionaries, sets...): here

Gephi

Gephi is a software for basic graph manipulation and visualization. Although you can't do much in terms of graph analysis, it is really convenient for exploring and visualizing small to medium-sized graphs (< 1000 nodes).
It can be downloaded here: Gephi.
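A common way to get your own data into Gephi is to save it as an edge table. The sketch below writes a CSV file with `Source`, `Target` and `Weight` columns, the column names Gephi's spreadsheet import (File > Import spreadsheet) recognizes for edges; treat the exact menu path as an assumption, as it may vary between Gephi versions. The toy edge list and the helper name `write_gephi_edge_csv` are hypothetical.

```python
import csv


def write_gephi_edge_csv(edges, path):
    """Write (source, target, weight) triples as a CSV edge table for Gephi."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        # Header row: column names used by Gephi's edge-table import
        writer.writerow(["Source", "Target", "Weight"])
        writer.writerows(edges)


# Hypothetical toy graph: a small weighted friendship network
edges = [
    ("Alice", "Bob", 1),
    ("Bob", "Carol", 2),
    ("Carol", "Alice", 1),
    ("Carol", "Dave", 3),
]
write_gephi_edge_csv(edges, "edges.csv")
```

If your graph is already a networkx object, `nx.write_gexf(G, "graph.gexf")` produces a GEXF file that Gephi opens directly and that also keeps node attributes.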
Gephi requires Java and suffers from a few bugs on Windows (but there is no better alternative). Here are solutions to common problems:

Exams

Coefficients

  • CT - final exam: 50%
  • Presentation: 20%
  • Project Rémy Cazabet: 10%
  • Project Mathieu Lefort: 10%
  • Project Bruno Yun: 10%
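As a worked example of how these coefficients combine, the sketch below computes a weighted average. It assumes every component is graded out of 20 and that the final mark is a plain weighted average of the components; both are assumptions, and the component keys and scores are hypothetical. Weights are kept as integer percentages so the arithmetic stays exact.

```python
# Weights in percent, from the coefficients listed above
WEIGHTS = {
    "final_exam": 50,
    "presentation": 20,
    "project_cazabet": 10,
    "project_lefort": 10,
    "project_yun": 10,
}
assert sum(WEIGHTS.values()) == 100  # coefficients must cover the whole grade


def final_grade(scores):
    """Weighted average of component scores (each assumed to be out of 20)."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing scores: {sorted(missing)}")
    return sum(scores[name] * weight for name, weight in WEIGHTS.items()) / 100


# Hypothetical scores out of 20
example = {
    "final_exam": 12,
    "presentation": 15,
    "project_cazabet": 14,
    "project_lefort": 16,
    "project_yun": 10,
}
print(final_grade(example))  # 13.0
```

Here the exam contributes 12 × 0.5 = 6 points, the presentation 15 × 0.2 = 3 points, and the three projects 1.4 + 1.6 + 1.0 = 4 points, for 13/20 in total.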

Project

Final exam

The final exam will take place on December 11, with questions on all parts of the class from all teachers.

2022 Exam (my part)