The course introduces basic machine learning tools for processing quantitative and qualitative data.
Students will learn how to manipulate data in order to prepare it for analysis.
The analysis methods covered support automatic classification, the construction of predictive models, the evaluation of method performance, and the diagnosis of the limits of these methods' applications.


Iragael JOLY, Irene GANNAZ


.1 Issues in automatic classification
Supervised vs. unsupervised methods; some unsupervised methods (k-means, hierarchical clustering and dendrograms).
.2 Overview of supervised methods
Decision trees, naive Bayes, logistic regression, SVM, neural networks, decision rules, etc.
Implement each method and examine its strengths and weaknesses.
.3 Evaluation of methods
Performance criteria (quality vs. complexity; predictions/black boxes vs. knowledge/white boxes);
cross-validation (of predictions, parameters, etc.).
.4 Interests and limitations
Data selection bias, model confirmation bias (cf. C. O'Donnell), etc.
Click micro-workers (Amazon Mechanical Turk, Facebook moderators, Apple "spies", etc.).
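
As an illustration of the cross-validation topic in the outline above, here is a minimal sketch in plain Python. It is not taken from the course material: the nearest-centroid classifier, the synthetic two-class data, and all function names are assumptions chosen to keep the example short and dependency-free.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and split them into k near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_centroids(X, y):
    """Compute the mean point (centroid) of each class."""
    sums, counts = {}, {}
    for xi, yi in zip(X, y):
        s = sums.setdefault(yi, [0.0] * len(xi))
        for j, v in enumerate(xi):
            s[j] += v
        counts[yi] = counts.get(yi, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}

def predict(centroids, x):
    """Return the class whose centroid is closest (squared Euclidean distance)."""
    return min(
        centroids,
        key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
    )

def cross_val_accuracy(X, y, k=5, seed=0):
    """Average accuracy on the held-out fold, over k folds."""
    scores = []
    for fold in k_fold_indices(len(X), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        hits = sum(predict(model, X[i]) == y[i] for i in fold)
        scores.append(hits / len(fold))
    return sum(scores) / k

# Hypothetical data: two well-separated Gaussian clouds, 50 points each.
rng = random.Random(42)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)]
X += [[rng.gauss(4, 1), rng.gauss(4, 1)] for _ in range(50)]
y = [0] * 50 + [1] * 50

acc = cross_val_accuracy(X, y, k=5)
print(f"5-fold cross-validated accuracy: {acc:.2f}")
```

Each observation is used for testing exactly once and never appears in the training set of its own fold, so the averaged score estimates performance on unseen data rather than fit to the training data.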


Students must have taken and passed the following courses: Probability and Statistics; Programming with R; Programming with Python.


This weighting is compatible with the organization of distance learning courses and exams.

At least 2 marks for practical work or continuous assessment: TP1 and TP2.
One exam grade: E1.

Grade = 0.4*((TP1+TP2)/2) + 0.6 * E1
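
A short worked example of the formula above, as a sketch. The marks used are hypothetical, and the 0 to 20 scale is an assumption, since the syllabus does not state one.

```python
def final_grade(tp1, tp2, e1):
    """Weighted grade: 40% for the mean of the two practical marks, 60% for the exam."""
    return 0.4 * ((tp1 + tp2) / 2) + 0.6 * e1

# Hypothetical marks, assumed to be out of 20.
grade = final_grade(tp1=12, tp2=16, e1=10)
print(round(grade, 2))  # 11.6
```

Here the practical work averages 14, contributing 5.6 points, and the exam contributes 6.0 points.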

J.H. McDonald, (2009), Handbook of Biological Statistics, Sparky House Publishing.
I.H. Witten and E. Frank, (2005), Data Mining: Practical Machine Learning Tools and Techniques, Elsevier.
Stéphane Tufféry, (2005), Datamining et statistique Décisionnelle – L’intelligence dans les bases de données, Ed. Technip.
Cornillon et al., (2008), Statistiques avec R, Presses Universitaires de Rennes.
Gaël Millot, (2011), Comprendre et réaliser les tests statistiques à l'aide de R, 2nd edition, Editions De Boeck, 767 pages.
Hill, Griffiths and Lim, (2011), Principles of Econometrics, Fourth Edition.


