This course introduces basic machine learning tools for processing quantitative and qualitative data.
Students will learn how to manipulate data to prepare it for analysis.
The analysis methods covered are used for automatic classification, building predictive models, evaluating the performance of the methods, and diagnosing the limits of their application.
1. Issues in automatic classification
Supervised vs. unsupervised methods; some unsupervised methods (k-means, dendrograms).
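As an illustration of the unsupervised side, here is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. The function name `kmeans`, its parameters, and the toy data are assumptions made for this example; they are not taken from the course material, and a real analysis would normally rely on an established library.

```python
import random

def kmeans(points, k, n_iter=100, seed=0):
    """Cluster a list of equal-length numeric tuples into k groups."""
    rng = random.Random(seed)
    # Initialise centroids with k distinct points drawn from the data.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centroid
        # (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return labels, centroids

# Toy data (hypothetical): two well-separated groups of three points.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(data, k=2)
print(labels)
```

No class labels are passed to the function; the groups emerge from the distances alone, which is what distinguishes this from the supervised methods listed next.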
2. Overview of supervised methods
Decision trees, naive Bayes, logistic regression, SVM, neural networks, decision rules, etc.
Implement each method and examine its strengths and weaknesses.
3. Evaluation of methods
Performance criteria (quality vs. complexity; predictions/black boxes vs. knowledge/white boxes).
Cross-validation (of predictions, parameters, etc.).
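A minimal sketch of k-fold cross-validation in plain Python may help make the idea concrete. The helper names `k_fold_indices` and `cross_val_accuracy`, the `fit`/`predict` callables, and the majority-class baseline are all assumptions introduced for this example, not part of the course material. The sketch assumes `k` is no larger than the number of samples.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

def cross_val_accuracy(X, y, fit, predict, k=5, seed=0):
    """Average accuracy over k folds; each sample is tested exactly once."""
    scores = []
    for train, test in k_fold_indices(len(X), k, seed):
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)

# Hypothetical baseline: always predict the most frequent training label.
def fit_majority(X_train, y_train):
    return max(set(y_train), key=y_train.count)

def predict_majority(model, x):
    return model

X = list(range(10))
y = ["a"] * 10
accuracy = cross_val_accuracy(X, y, fit_majority, predict_majority, k=5)
print(accuracy)
```

Because the model is always fitted on the training folds and scored on the held-out fold, the estimate reflects performance on data the model has not seen. The same loop can be used to compare parameter values by running it once per candidate setting.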
4. Benefits and limitations
Data selection bias, model confirmation bias (cf. C. O'Donnell), etc.
Click micro-workers (Amazon Mechanical Turk, Facebook moderators, Apple "spies", etc.).
Students must have taken and passed the following courses: Probability and Statistics; Programming with R; Programming with Python.
This weighting is compatible with distance-learning teaching and exams.
At least 2 grades for practical work or continuous assessment: TP1 and TP2
One individual exam grade: E1
Grade = 0.4 * ((TP1 + TP2) / 2) + 0.6 * E1
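The weighting can be checked with a short calculation. The function name `final_grade` and the sample marks below are hypothetical. Averaging more than two practical-work grades is an assumption based on the "at least 2" wording; with exactly two grades it reduces to the formula above.

```python
def final_grade(tp_grades, exam_grade):
    """Return 0.4 * (mean of practical-work grades) + 0.6 * exam grade."""
    if len(tp_grades) < 2:
        raise ValueError("at least two practical-work grades are required")
    # Assumption: the continuous-assessment grade is the plain mean of all TP grades.
    tp = sum(tp_grades) / len(tp_grades)
    return 0.4 * tp + 0.6 * exam_grade

# Hypothetical marks: TP1 = 12, TP2 = 16, E1 = 10.
grade = final_grade([12, 16], 10)
print(round(grade, 2))
```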
Course ID : 4GMC14B1
J.H. McDonald, (2009), Handbook of Biological Statistics, Sparky House Publishing.
I.H. Witten and E. Frank, (2005), Data Mining: Practical Machine Learning Tools and Techniques, Elsevier.
Stéphane Tufféry, (2005), Data mining et statistique décisionnelle : l'intelligence dans les bases de données, Éditions Technip.
Cornillon et al., (2008), Statistiques avec R, Presses Universitaires de Rennes.
Gaël Millot, (2011), Comprendre et réaliser les tests statistiques à l'aide de R, 2nd edition, Éditions De Boeck, 767 pages.
Hill, Griffiths and Lim, (2011), Principles of Econometrics, 4th edition.
Last updated: June 14, 2021