UE Smart Analytics for Big Data - 5GUC3500

General information

  • Number of hours

    • Lectures 24.0
    • Projects -
    • Tutorials 24.0
    • Internship -
    • Laboratory works -
    • Written tests -

    ECTS

    ECTS 6.0

Goal(s)

    1. Keywords

Data Science · Data Architecture · Data Exploration · Machine Learning · Generative AI · Industrial Analytics · Big Data · Reproducible Analysis

---

  1. Motivation

Industry 4.0, digital factories, and the Internet of Things are reshaping industrial engineering at every level, from the factory floor to supply chain decisions. The fusion of physical and digital worlds has made data a core strategic asset: its volume, variety, and velocity keep growing while its sources — sensors, cameras, ERP systems, connected devices — multiply in complexity.

The rise of generative AI and foundation models adds a new dimension to this landscape. Tools such as large language models, retrieval-augmented generation systems, and AI-assisted decision support are entering production environments. Yet their value depends entirely on the quality, structure, and governance of the underlying data. An engineer who cannot evaluate data quality, design a rigorous pipeline, or critically interpret a model's output will be poorly equipped to deploy or oversee these systems — regardless of how powerful the tools become.

Future industrial practitioners must therefore develop an integrated competency: moving fluently from raw data collection through storage, preparation, and modeling, to interpretable results and sound decisions. This course builds that competency end-to-end, with a constant focus on reproducibility, transparency, and industrial applicability.

---

  1. Teaching Team
    1. Grenoble INP — Génie Industriel & CNRS
  • Iragaël Joly[^1], Associate Professor (MCF HDR), Grenoble INP Génie Industriel
  • Genoveva Vargas Solar, Research Director (DR), CNRS, LIG, HADAS Group
  • Nhung Dang, PhD Student

[^1]: Corresponding teacher: iragael.joly@grenoble-inp.fr

    1. Industrial Contributors
  • Sébastien Raynoird-Thal, Efimove
  • Alexandre Zouaoui, Neovision
  • Alexis Mignon, Probayes
    1. Case study topics
  • predictive maintenance
  • cost prediction
  • urban mobility analysis
  • machine parameter optimisation

Responsible(s)

Iragael JOLY

Content(s)

  1. Learning Objectives and Course Structure

The course is built around four learning objectives, each mapped to a specific part of the curriculum and to a dedicated evaluation mode.

| Objective | Course Part | Assessment |
|---------------------------------------------------------|---------------|---------------------------------|
| 1 — Data pipeline & project management | Part I | Quiz / Lab session 2 |
| 2 — Data handling, cleaning & querying | Part II | Lab session 3 |
| 3 — ML/DL: Exploratory and predictive analytics | Part III | Lab sessions 4–6 + group project |
| 4 — Industrial complex data (volume, temporal, spatial) | Part IV | Industrial challenge |

  1. Course Description
    1. Part I — Foundations of Data Science Workflow

This opening part establishes the conceptual and operational framework that runs through the entire course. Students are introduced to the data lifecycle, AI regulation (AI Act, GDPR), and the principles of reproducibility and transparency that underpin every subsequent step. Generative AI tools (foundation models, prompt engineering, retrieval-augmented generation) are examined critically in their industrial context, distinguishing what they automate from what requires engineering judgment. This part directly addresses *Objective 1*: acquiring an end-to-end vision of the data science pipeline and the capacity to manage a data project from framing to delivery.

      1. Topics covered
  • AI & Data Science landscape; lifecycle of a data project
  • Regulatory environment: AI Act, GDPR, algorithmic accountability
  • Reproducibility: version control, literate programming, workflow management
  • Generative AI in industry: foundation models, prompt engineering, RAG, limitations, carbon footprint
  • Data governance and security

---

    1. Part II — Data Preparation

The quality of any analysis is bounded by the quality of its inputs. This part translates *Objective 2* into operational skills: storage architecture, data profiling and diagnosis, cleaning, transformation, and feature engineering. Students work on realistic messy datasets and build reusable preparation pipelines. Without a solid command of these skills, the methods introduced in Part III cannot be reliably applied or interpreted.

      1. Topics covered
  • Storage architectures: relational, NoSQL, distributed systems
  • Data quality assessment: profiling, outlier detection, missing data strategies
  • Cleaning and transformation: normalisation, encoding, imputation
  • Feature engineering: variable construction, dimensionality considerations
  • Reproducible pipelines: structuring data preparation for reuse and auditability
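As an illustration of the preparation steps listed above, the following minimal Python sketch chains median imputation and min-max normalisation into one reusable function. It is not taken from the course material: the function names (`impute_median`, `min_max_normalise`, `prepare`) and the toy sensor readings are hypothetical, and missing values are assumed to be encoded as `None`.

```python
from statistics import median
from typing import Optional


def impute_median(values: list[Optional[float]]) -> list[float]:
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    fill = median(observed)
    return [fill if v is None else v for v in values]


def min_max_normalise(values: list[float]) -> list[float]:
    """Rescale values linearly to [0, 1]; a constant column maps to 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def prepare(values: list[Optional[float]]) -> list[float]:
    """Hypothetical pipeline: impute first, then normalise the completed column."""
    return min_max_normalise(impute_median(values))


if __name__ == "__main__":
    readings = [20.0, None, 24.0, 22.0, None, 30.0]  # illustrative values only
    print(prepare(readings))  # [0.0, 0.3, 0.4, 0.2, 0.3, 1.0]
```

Keeping each step a pure function makes the pipeline deterministic and easy to audit. In a predictive setting, the median and the min/max would normally be computed on the training data only and then reused on new data, to avoid information leakage; the sketch computes them on the whole column for simplicity.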

---

    1. Part III — Analytics

With a reliable data pipeline in place, students turn to the methods at the core of industrial analytics. This part addresses *Objectives 3 and 4*: selecting the right method for a given data type, evaluating model performance, and interpreting results in an industrial context. Supervised and unsupervised learning are covered alongside model explainability. An introduction to causal inference prepares the transition from descriptive analysis — the what — towards prescriptive reasoning — the why and what if.

      1. Topics covered
  • Supervised learning: classification and regression; CART, random forests, gradient boosting
  • Unsupervised learning: clustering (k-means, hierarchical, DBSCAN), association rules
  • Model evaluation: metrics, cross-validation, confusion matrices, ROC
  • Feature selection and dimensionality reduction
  • Explainability: variable importance, SHAP values, model interpretation
  • Introduction to causal inference
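To make the model evaluation topic concrete, here is a minimal Python sketch that derives a confusion matrix and the usual metrics (accuracy, precision, recall, F1) for a binary classifier, for instance a failure/no-failure model in a predictive maintenance case. It uses only the standard library; the function names and the label vectors are hypothetical, and label 1 is assumed to denote the positive class (a failure).

```python
from collections import Counter


def confusion_matrix(y_true, y_pred):
    """Count TP, FP, FN, TN for binary labels, with 1 as the positive class."""
    counts = Counter(zip(y_true, y_pred))  # keys are (true, predicted) pairs
    return {
        "TP": counts[(1, 1)],
        "FP": counts[(0, 1)],
        "FN": counts[(1, 0)],
        "TN": counts[(0, 0)],
    }


def binary_metrics(cm):
    """Derive accuracy, precision, recall and F1 from a confusion matrix dict."""
    tp, fp, fn, tn = cm["TP"], cm["FP"], cm["FN"], cm["TN"]
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of alarms that are real failures
    recall = tp / (tp + fn) if tp + fn else 0.0  # share of real failures that are detected
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    y_true = [1, 0, 0, 1, 0, 1, 0, 0]  # hypothetical observed failures
    y_pred = [1, 0, 1, 0, 0, 1, 0, 0]  # hypothetical model predictions
    cm = confusion_matrix(y_true, y_pred)
    print(cm)  # {'TP': 2, 'FP': 1, 'FN': 1, 'TN': 4}
    print(binary_metrics(cm))
```

On imbalanced industrial data such as rare failures, accuracy alone can be misleading, which is why precision and recall are usually reported alongside it.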

---

    1. Part IV — Industrial Projects

This final part integrates all competencies developed across the course. Students work on real datasets provided by industrial partners (*Objective 4*), producing a complete analysis from data ingestion to communicated results (*Objectives 2 and 3*). The mobility case study and the industrial challenge are the summative assessments of all four objectives. They also develop transversal skills: project management, teamwork, critical reading of data, and professional communication of findings to a non-specialist audience (*Objective 1*).

      1. Topics covered
  • Mobility survey analysis: trip scheduling, duration modelling, spatial patterns
  • Industrial challenge (end-to-end project with industrial partner)
  • Reporting and visualisation: reproducible reports, interactive dashboards, GIS
  • Project management and team organisation

---

  1. Evaluation

| Component | Weight | Timing | Format |
|:----------|:------:|:------:|:-------|
| Practical assessments (exams) | 40 % | After Parts I, II and III | In-class individual work |
| Group analytical project | 35 % | Parts III–IV | Team report + oral presentation |
| Industrial challenge | 25 % | Part IV | End-to-end team project on industrial data |

*Practical assessments* evaluate theoretical knowledge and the ability to interpret the results produced by the tools, after each part: foundations of data science (Part I), data preparation (Part II), supervised/unsupervised modelling (Part III). They are individual and are conducted during scheduled sessions.

*Group project* (teams of 3–4): complete analysis of a real mobility survey dataset, covering all stages from data preparation to results communication. Evaluated on methodological rigour, appropriateness of choices, and quality of presentation.

*Industrial challenge* (teams of 3–4): short end-to-end project in near-real conditions with an industrial partner. Evaluated jointly by academic and industrial supervisors.

---

  1. Target Audience

This course is designed for:

  • *Engineering students (GI)*: 3rd-year students at Grenoble INP Génie Industriel
  • *Apprentices*: engineering apprentices at equivalent level
  • *Master's students*: M2 ICL, M2 SIE

All participants must satisfy the prerequisites listed in the Prerequisites section. The course is intended for profiles seeking to operate at the interface between industrial engineering and data-driven decision making.

---

  1. Software and Tools
  • *R / RStudio* — statistical computing, data preparation, reporting (RMarkdown)
  • *Python* — data engineering, machine learning (scikit-learn, pandas)
  • *Git / GitHub* — version control, reproducible workflows
  • *Docker* — reproducible computing environments
  • **QGIS / R (sf, tmap)** — geospatial analysis and cartography
  • *Kaggle / Azure ML* — cloud-based ML environments for industrial cases

Prerequisites


Students are expected to demonstrate prior competency in the following areas before entering the course:

  • *Mathematics & statistics*: probability, descriptive and inferential statistics, introduction to linear and logistic regression modelling
  • *Programming*: first experience in R or Python (data structures, control flow, basic plotting)
  • *Databases*: relational model, SQL querying
  • *General*: scientific reading and written communication in English

Students who do not satisfy these prerequisites may be excluded from the course or, at the teachers' discretion, admitted on condition that they complete a self-study remediation programme prior to the first session.

Test

This weighting is compatible with distance teaching and examination.

Individual evaluation: final exam (E)

In-class projects (P = average of all project grades)

Group application project: defense (D) and report (R)

Industrial case studies (ICS): Industrial data studies and data challenges

Second session: individual examination grade EX, based on a written or oral evaluation (E6)

N1 = 0.40*E + 0.35*(R + D)/2 + 0.25*ICS

N2 = E6
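The first-session formula N1 can be checked with a short Python sketch. It assumes all grades are expressed on the same scale (typically 0 to 20 in the French system, which is an assumption here, not stated in the formula); the function names, the `WEIGHTS` dictionary and the example grades are hypothetical.

```python
# Hypothetical weights mirroring N1 = 0.40*E + 0.35*(R + D)/2 + 0.25*ICS
WEIGHTS = {"E": 0.40, "project": 0.35, "ICS": 0.25}


def first_session_grade(E: float, R: float, D: float, ICS: float) -> float:
    """N1: E = individual exam, R = project report, D = project defense,
    ICS = industrial case studies. All grades assumed on the same scale."""
    project = (R + D) / 2  # report and defense count equally within the project weight
    return WEIGHTS["E"] * E + WEIGHTS["project"] * project + WEIGHTS["ICS"] * ICS


def second_session_grade(E6: float) -> float:
    """N2 = E6: the second-session grade is the individual examination grade alone."""
    return E6


# The three weights cover the whole grade.
assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9

if __name__ == "__main__":
    print(round(first_session_grade(E=12, R=14, D=16, ICS=15), 2))  # 13.8
```

Because the weights sum to 1, N1 stays on the same scale as its components: a student with the same grade in every component obtains exactly that grade.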

The exam is given in English only.

Calendar

The course is offered in the following curricula:

  • Curriculum - Engineer student Master SCM - Semester 9 (this course is given in English only)
  • Curriculum - Engineer student Master PD - Semester 9 (this course is given in English only)
  • Curriculum - Engineer IPID apprentice program - Semester 9 (this course is given in English only)
  • Curriculum - Master 2 GI SIE - Semester 9 (this course is given in English only)
  • Curriculum - Master 2 GI GID - Semester 9 (this course is given in English only)
See the course schedule for 2026-2027.

Additional Information

Course ID : 5GUC3500
Course language(s): FR


Bibliography

Békés, Gábor, and Gábor Kézdi. 2021. Data Analysis for Business, Economics and Policy.
Chapman, Pete, Julian Clinton, Randy Kerber, Thomas Khabaza, Thomas Reinartz, Colin Shearer, and Rudiger Wirth. 2000. “CRISP-DM 1.0 Step-by-Step Data Mining Guide.” The CRISP-DM consortium. https://maestria-datamining-2010.googlecode.com/svn-history/r282/trunk/dmct-teorica/tp1/CRISPWP-0800.pdf.
Greene, W. H. 2008. Econometric Analysis. 6th ed. Prentice Hall.
Harrell, F. E. 2013. Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis. Springer Series in Statistics. Springer New York. https://books.google.fr/books?id=7D0mBQAAQBAJ.
Hayter, A. J. 2012. Probability and Statistics for Engineers and Scientists. Cengage Learning. https://books.google.fr/books?id=Z3lr7UHceYEC.
Horton, Nicholas, and Ken Kleinman. 2015. Using R and RStudio for Data Management, Statistical Analysis, and Graphics. 2nd ed. https://doi.org/10.1201/b18151.
Hougaard, P. 2000. Analysis of Multivariate Survival Data. Springer Verlag.
Mount, J., and N. Zumel. 2019. Practical Data Science with R. Manning. https://www.manning.com/books/practical-data-science-with-r.
Shmueli, G., P. C. Bruce, I. Yahav, N. R. Patel, and K. C. Lichtendahl Jr. 2017. Data Mining for Business Analytics: Concepts, Techniques, and Applications in R.
Wasserman, Larry. 2004. All of Statistics: A Concise Course in Statistical Inference. Springer Texts in Statistics. New York: Springer. https://doi.org/10.1007/978-0-387-21736-9.
Xie, Yihui. 2015. Dynamic Documents with R and Knitr. 2nd ed. Boca Raton, Florida: Chapman & Hall/CRC. https://yihui.name/knitr/.
Xie, Yihui. 2016. Bookdown: Authoring Books and Technical Documents with R Markdown. Boca Raton, Florida: Chapman & Hall/CRC. https://github.com/rstudio/bookdown.
