Students will learn about data preparation methods for machine learning, knowledge engineering and text mining, and how to integrate them into data science projects.
Students will know how to manage, sort, and organize their data efficiently. They will be able to present relevant visualizations of their data and results. They will have adopted the practices of a responsible and ethical data engineer.
B0 Introduction: Data Science Project Management
Steering data science projects, based on CRISP-DM
B1 Data handling & Data Engineer responsibilities (ethics, security, etc.)
B1.1 Technical data management
Data format, variable formats; basic operations (reads, writes; sorts; selections, projections, filters; merges)
B1.2 Technical management of results (visualization)
Types of graphics, principles of good visualization
Make and discuss technical choices and representations
B1.3 Societal management
Legal aspects (GDPR), sustainability aspects (risks to people [customers and staff] as well as environmental costs), security (who holds the data, surveillance...).
B1.4 Implementation: micro-project
Students must have taken and passed the following courses: Probability and Statistics; Programming with R
This weighting is compatible with distance-learning courses and exams
Continuous assessment marks (at least 2 practical work [TP] marks: TP1 and TP2)
An individual examination grade: E1
Grade = 0.4*((TP1+TP2)/2) + 0.6*E1
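As a worked illustration of the weighting above, here is a minimal sketch in Python. The function name `final_grade` and the sample marks are hypothetical and chosen only for the example. With TP1 = 14, TP2 = 12 and E1 = 10, the formula gives 0.4 × 13 + 0.6 × 10 = 11.2.

```python
def final_grade(tp1: float, tp2: float, e1: float) -> float:
    """Apply Grade = 0.4*((TP1+TP2)/2) + 0.6*E1."""
    continuous = (tp1 + tp2) / 2  # mean of the two practical work marks
    return 0.4 * continuous + 0.6 * e1


# Hypothetical marks, used only to illustrate the weighting
print(round(final_grade(14, 12, 10), 2))
```

The individual examination carries 60% of the grade, so a one-point change in E1 moves the result three times as much as a one-point change in a single TP mark (0.6 versus 0.2).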
The course exists in the following branches:
Course ID: 4GMC1411
Elff (2020), Data Management in R, SAGE Publications
Nicholas J. Horton and Ken Kleinman (2016), Using R and RStudio for Data Management, Statistical Analysis, and Graphics (second edition)
Last updated: June 14, 2021