Number of hours
- Lectures 17.25
- Projects -
- Tutorials 12.75
- Internship -
- Laboratory works -
- Written tests 1.0
ECTS
ECTS 3.0
Goal(s)
Part I
Students will learn about data preparation methods for machine learning, knowledge engineering and text mining, and how to integrate them into data science projects.
Students will know how to manage, sort, and organize their data efficiently. They will be able to present relevant visualizations of their data and results, and will have adopted the practices of a responsible and ethical data engineer.
Part II
The course introduces basic tools for processing quantitative and qualitative data with machine learning.
The methods covered enable automatic classification, the construction of predictive models, the evaluation of method performance, and the diagnosis of the limits of their applicability.
Content(s)
Part I
.0 Introduction: Data Science Project Management
Steering data science projects, based on CRISP-DM
.1 Data handling & Data Engineer responsibilities (ethics, security, etc.)
.1.1 Technical data management
Data format, variable formats; basic operations (reads, writes; sorts; selections, projections, filters; merges)
.1.2 Technical management of results (visualization)
Types of graphics, principles of good visualization
Make and discuss technical choices and representations
.1.3 Societal management
Legal aspects (GDPR), sustainability (risks to people, both customers and staff, as well as environmental costs), security (who holds the data, surveillance, etc.).
.1.4 Implementation: Micro-project
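As an illustration of the basic operations listed under technical data management (reads, writes, sorts, filters, projections, merges), here is a minimal sketch using only the Python standard library. The two small tables, their column names, and the helper `read_rows` are hypothetical and chosen for illustration only; the course itself may rely on other tools (R, or Python data libraries).

```python
import csv
import io

# Hypothetical in-memory CSV data standing in for real files.
STUDENTS_CSV = """id,name,program
1,Ana,GID
2,Ben,SIE
3,Chloe,GID
"""

GRADES_CSV = """id,grade
1,14.5
2,11.0
3,16.0
"""


def read_rows(text):
    """Read CSV text into a list of dicts (one dict per row)."""
    return list(csv.DictReader(io.StringIO(text)))


# Read.
students = read_rows(STUDENTS_CSV)
grades = read_rows(GRADES_CSV)

# Variable formats: CSV fields arrive as strings, so convert explicitly.
for row in students:
    row["id"] = int(row["id"])
for row in grades:
    row["id"] = int(row["id"])
    row["grade"] = float(row["grade"])

# Merge: inner join of the two tables on "id".
grade_by_id = {row["id"]: row["grade"] for row in grades}
merged = [
    {**s, "grade": grade_by_id[s["id"]]}
    for s in students
    if s["id"] in grade_by_id
]

# Filter (row selection): keep only the GID program.
gid_only = [row for row in merged if row["program"] == "GID"]

# Projection (column selection): keep name and grade.
projected = [{"name": r["name"], "grade": r["grade"]} for r in gid_only]

# Sort by grade, highest first.
ranked = sorted(projected, key=lambda r: r["grade"], reverse=True)

# Write the result back out as CSV text.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["name", "grade"])
writer.writeheader()
writer.writerows(ranked)
result_csv = out.getvalue()
```

Converting types right after reading keeps later sorts and comparisons numeric rather than lexicographic, which is one of the more common pitfalls with variable formats.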
Part II
.1 Issues in machine learning, supervised machine learning (regression, classification)
Supervised vs. unsupervised methods, with a quick presentation of some unsupervised methods (k-means, dendrograms).
.2 Regression and classification methods: linear regression and logistic regression; models, algorithms and solution methods
.3 Internal evaluation of regression and classification: Errors, residuals and prediction evaluation
.4 External Evaluation: Statistical Assumptions and Model Evaluation
.5 Implementation on different databases
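To make the regression and internal-evaluation items above concrete, here is a minimal sketch of simple linear regression fitted by ordinary least squares, followed by residuals, RMSE, and R² computed on the same data used for fitting. The function names and the five data points are illustrative assumptions, not course material, and the closed-form fit shown here is only one of several possible solution methods.

```python
import math


def fit_simple_linear_regression(xs, ys):
    """Closed-form ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def evaluate(xs, ys, intercept, slope):
    """Return residuals, RMSE, and R^2 of the fitted line on (xs, ys)."""
    predictions = [intercept + slope * x for x in xs]
    residuals = [y - p for y, p in zip(ys, predictions)]
    sse = sum(r ** 2 for r in residuals)          # sum of squared errors
    mean_y = sum(ys) / len(ys)
    sst = sum((y - mean_y) ** 2 for y in ys)      # total sum of squares
    rmse = math.sqrt(sse / len(ys))
    r2 = 1 - sse / sst
    return residuals, rmse, r2


# Hypothetical data, roughly following y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_simple_linear_regression(xs, ys)
residuals, rmse, r2 = evaluate(xs, ys, intercept, slope)
```

Because these metrics are computed on the fitting data, they tend to be optimistic; assessing the model on data it has not seen is a separate step.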
Students must have taken and passed the following courses: Probability and Statistics; Programming with R and Python.
This weighting is compatible with teaching and exams being held remotely.
Part I
Continuous assessment grade: TP (based on at least two grades, TP1 and TP2)
Individual examination grade: EX
Grade = 0.4*TP + 0.6*EX
Part II
Continuous assessment grade: TP (based on at least two grades, TP1 and TP2)
Individual examination grade: EX
Grade = 0.4*TP + 0.6*EX
This weighting is compatible with teaching and exams being held remotely.
Continuous assessment grades (at least two TP grades: TP1 and TP2)
Individual examination grade: E1
Grade = 0.4*((TP1+TP2)/2) + 0.6*E1
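The weighting above can be checked with a short calculation. The sketch below is illustrative only: the function name `final_grade` is hypothetical, and averaging more than two TP grades with a plain mean is an assumption, since the formula above is stated for exactly two (TP1 and TP2).

```python
def final_grade(tp_grades, exam):
    """Combine continuous assessment and exam as 0.4 * mean(TP) + 0.6 * exam.

    Assumption: with more than two TP grades, a plain mean is used.
    """
    if len(tp_grades) < 2:
        raise ValueError("at least two TP grades are required")
    tp = sum(tp_grades) / len(tp_grades)
    return 0.4 * tp + 0.6 * exam


# Hypothetical grades: TP1 = 12, TP2 = 16, exam = 10.
example = final_grade([12, 16], 10)
```

With these hypothetical grades the TP average is 14, so the result is 0.4 × 14 + 0.6 × 10, that is, about 11.6 up to floating-point rounding.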
The course exists in the following branches:
- Curriculum - Master 1 GI program GID - Semester 7
- Curriculum - Master 1 GI SIE program - Semester 7
Course ID: WGUS2092
Course language(s):
Elff, (2020), Data Management in R, SAGE Publications.
Nicholas J. Horton and Ken Kleinman, (2016), Using R and RStudio for Data Management, Statistical Analysis, and Graphics (second edition).
J.H. McDonald, (2009), Handbook of Biological Statistics, Sparky House Publishing.
I.H. Witten and E. Frank, (2005), Data Mining: Practical Machine Learning Tools and Techniques, Elsevier.
Stéphane Tufféry, (2005), Datamining et statistique Décisionnelle – L’intelligence dans les bases de données, Ed. Technip.
Cornillon et al., (2008), Statistiques avec R, Presses Universitaires de Rennes.
Gaël Millot, (2011), Comprendre et réaliser les tests statistiques à l'aide de R, 2nd edition, Editions De Boeck, 767 pages.
Hill, Griffiths and Lim, (2011), Principles of Econometrics, Fourth Edition.