Data Engineering techniques and responsibilities - 4GMC1411
Number of hours
- Lectures : 9.0
- Tutorials : 6.0
- Laboratory works : -
- Projects : -
- Internship : -
- Written tests : 1.0
ECTS : 1.5
Goals
Students will learn about data preparation methods for machine learning, knowledge engineering and text mining, and how to integrate them into data science projects.
Students will know how to manage, sort, and organize their data efficiently. They will be able to present relevant visualizations of their data and results. They will have adopted the practices of a responsible and ethical data engineer.
Content
B0 Introduction: Data Science Project Management
Steering data science projects, based on CRISP-DM
B1 Data handling & Data Engineer responsibilities (ethics, security, etc.)
B1.1 Technical data management
Data format, variable formats; basic operations (reads, writes; sorts; selections, projections, filters; merges)
B1.2 Technical management of results (visualization)
Types of graphics, principles of good visualization
Make and discuss technical choices and representations
B1.3 Societal management
Legal aspects (GDPR), sustainability aspects (risks to people, both customers and staff, as well as environmental costs), security (who holds the data, spying, etc.).
B1.4 Implementation : Micro-project
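The basic operations listed under B1.1 (reads, writes, sorts, selections, projections, filters, merges) can be illustrated with a minimal sketch. The course itself works in R; the example below uses Python's standard library purely for illustration. The sample data, column names, and helper functions (`read_rows`, `write_rows`) are hypothetical and do not come from the course material.

```python
import csv
import io

# Hypothetical sample data, invented for illustration only.
STUDENTS_CSV = "id,name,branch\n1,Ana,SCM\n2,Ben,PD\n3,Chloe,SCM\n"
MARKS_CSV = "id,mark\n1,14.5\n2,9.0\n3,16.0\n"


def read_rows(text):
    """Read CSV text into a list of dicts (one dict per row)."""
    return list(csv.DictReader(io.StringIO(text)))


def write_rows(rows, fieldnames):
    """Write a list of dicts back to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Read: load both tables.
students = read_rows(STUDENTS_CSV)
marks = read_rows(MARKS_CSV)

# Variable formats: the csv module reads every value as a string,
# so numeric columns must be converted explicitly.
for row in marks:
    row["mark"] = float(row["mark"])

# Merge: inner join of the two tables on the shared key "id".
marks_by_id = {row["id"]: row["mark"] for row in marks}
merged = [
    {**student, "mark": marks_by_id[student["id"]]}
    for student in students
    if student["id"] in marks_by_id
]

# Filter (selection of rows): keep only students in the SCM branch.
scm_only = [row for row in merged if row["branch"] == "SCM"]

# Projection (selection of columns): keep only name and mark.
projected = [{"name": row["name"], "mark": row["mark"]} for row in scm_only]

# Sort: order by mark, highest first.
ranked = sorted(projected, key=lambda row: row["mark"], reverse=True)

# Write: serialize the result back to CSV.
result_csv = write_rows(ranked, ["name", "mark"])
print(result_csv)
```

In R, the same steps would typically be written with data frames; the point here is only the sequence of operations, not a particular library.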
Prerequisites
Students will have taken and passed the following courses: Probability and Statistics; Programming with R
Tests
This weighting is compatible with distance learning courses and exams.
Continuous assessment: at least 2 practical work (TP) marks, TP1 and TP2
Individual examination mark: E1
Grade = 0.4*((TP1+TP2)/2) + 0.6*E1
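As a worked example of the weighting, here is a small sketch of the formula. The function name `final_grade` and the sample marks are hypothetical. Averaging over more than two TP marks is an assumption: the page only gives the formula for exactly two.

```python
def final_grade(tp_marks, exam_mark):
    """Combine practical work (TP) marks and the exam mark.

    Grade = 0.4 * (average of TP marks) + 0.6 * exam mark.
    Assumption: if more than two TP marks exist, they are averaged equally.
    """
    if len(tp_marks) < 2:
        raise ValueError("at least 2 TP marks are required")
    tp_average = sum(tp_marks) / len(tp_marks)
    return 0.4 * tp_average + 0.6 * exam_mark


# Invented marks: TP1 = 12, TP2 = 16, E1 = 10.
# 0.4 * ((12 + 16) / 2) + 0.6 * 10 = 0.4 * 14 + 6 = 11.6
print(round(final_grade([12, 16], 10), 2))
```

The exam carries 60% of the grade, so a one-point change in E1 moves the final grade by 0.6, while a one-point change in a single TP mark moves it by 0.2.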
Calendar
The course exists in the following branches:
- Curriculum - Engineer student Master SCM - Semester 7
- Curriculum - Engineer student Master PD - Semester 7
See the course schedule for 2022-2023.
Bibliography
Elff (2020), Data Management in R, SAGE Publications
Nicholas J. Horton and Ken Kleinman (2016), Using R and RStudio for Data Management, Statistical Analysis, and Graphics (second edition)
Date of update June 14, 2021