• Understand the issues of data analysis.
• Synthesise and organise information in multidimensional data (n individuals, p variables, t periods).
• Interpret, understand and produce statistical results.
• Know some standard methods for professional analyses, in particular data mining methods (analysis of non-parametric data) and statistical methods (tests, choice of functional form, and model fitting for prediction).
• Understand the limitations of these approaches and consider alternatives, extensions, etc.
• Apply all of the above to data from industrial engineering practice (cost and time calculations, quality testing, satisfaction surveys, reliability tests, etc.).
This course aims to analyse data systematically, following this procedure: Description, Segmentation, Modelling, Prediction, Validation.
After reviewing descriptive statistics, the course focuses on analysing univariate, bivariate and multivariate data. The challenge is to use the most appropriate method for the type of data studied (qualitative / quantitative) and study the issues which arise.
Here are some of the standard tools:
• Data mining methods: Analysis of Variance (ANOVA), Correspondence Analysis (CA), Principal Component Analysis (PCA), clustering, Data Envelopment Analysis (DEA), rule generation, neural networks.
• Statistical decision-making methods: parametric and non-parametric tests (tests on means and proportions, tests of independence between quantitative and qualitative variables, etc.).
• Modelling by linear regression (continuous response variables) and logistic regression (discrete response variables).
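As an illustration of the regression modelling listed above, here is a minimal sketch of simple linear regression fitted by ordinary least squares, with the coefficient of determination as one possible quality measure. The function name and the batch-size data are hypothetical and are not taken from the course material.

```python
from statistics import mean


def fit_simple_linear_regression(x, y):
    """Fit y ~ intercept + slope * x by ordinary least squares.

    Returns (slope, intercept, r_squared).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar, y_bar = mean(x), mean(y)
    # Sum of squared deviations of x, and cross-deviations of x and y.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # R^2 = 1 - residual sum of squares / total sum of squares.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return slope, intercept, r_squared


# Hypothetical data: machining time (minutes) against batch size.
batch_size = [1, 2, 3, 4, 5]
machining_time = [3.0, 5.0, 7.0, 9.0, 11.0]
slope, intercept, r2 = fit_simple_linear_regression(batch_size, machining_time)
predicted_time_for_6 = intercept + slope * 6
```

In practice a statistics package would also report standard errors and tests on the coefficients; this sketch only shows the fitting and prediction steps.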
Particular attention will be paid to missing data, outliers, error detection, the choice of variables and their transformations, data validation, and measuring the quality of the models and their predictions.
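Two of these data-quality concerns, missing data and outliers, can be illustrated with a short sketch. It assumes median imputation and the 1.5 × IQR rule, which are common simple choices and not necessarily the ones taught in the course. The function names and the cycle-time data are hypothetical.

```python
from statistics import median, quantiles


def impute_missing_with_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    fill = median(observed)
    return [fill if v is None else v for v in values]


def flag_outliers_iqr(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical cycle times (minutes); None marks a missing measurement.
cycle_times = [12.1, 11.8, None, 12.4, 12.0, 45.0, 11.9, None, 12.2]
cleaned = impute_missing_with_median(cycle_times)
outliers = flag_outliers_iqr(cleaned)
```

The median is used for imputation here because it is less sensitive than the mean to extreme readings such as the 45.0 value, which is the only one the IQR rule flags in this data set. Whether a flagged value is an error or a genuine observation still has to be judged case by case.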
Part of the course takes place in tutorials and/or a case study.
Various software packages may be used for statistics, linear programming and data mining.
Assessment consists of continuous assessment, a written report and a case study (individual or group work).
The course exists in the following branches:
J.H. McDonald, (2009), Handbook of Biological Statistics, Sparky House Publishing.
I.H. Witten and E. Frank, (2005), Data Mining: Practical Machine Learning Tools and Techniques, Elsevier.
Stéphane Tufféry, (2005), Datamining et statistique décisionnelle – L’intelligence dans les bases de données, Ed. Technip.
Cornillon et al., (2008), Statistiques avec R, Presses Universitaires de Rennes.
Gaël Millot, (2011), Comprendre et réaliser les tests statistiques à l'aide de R, 2nd edition, Editions De Boeck, 767 pages.
Hill, Griffiths and Lim, (2011), Principles of Econometrics, 4th edition.
Last updated: June 5, 2015