Design & Organisation
High level education


- 4GMC14C1

  • Number of hours

    • Lectures: 7.5
    • Tutorials: -
    • Laboratory work: 7.5
    • Projects: -
    • Internship: -
    • Written tests: 1.0
  • ECTS: 1.5

Goals

The course introduces basic tools for processing textual data.
Students learn to manipulate textual data and prepare it for analysis.
The analysis methods covered enable students to match strings and vocabulary; to explore texts and identify their characteristics (mutual information, similar or complementary information); and to classify texts and identify typologies.

Content

.1 Introduction to NLP (Natural Language Processing) and TM (Text Mining) (1 lecture, 1 lab)
.2 Pre-processing (1 lecture, 1 lab)
Parsing, tokenization, case folding, lemmatisation, stemming, POS tagging, sentence splitting, stop-word removal…
.3 Word embedding (1 lecture, 1 lab)
Vector space model, bag-of-words model, TF, TF-IDF, Word2vec, GloVe…
.4 Feature selection (1 lecture, 1 lab)
χ², mutual information, information gain…
.5 Text classification (1 lecture, 1 lab)
One-class vs. multi-class, bias vs. variance, Kappa test, training set, validation set, testing set, accuracy, leave-one-out cross-validation, k-fold cross-validation, precision, recall, F-measure, confusion matrix…
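
As an illustration of how the pre-processing and TF-IDF topics listed above fit together, here is a minimal sketch. It is written in Python for compactness, whereas the course itself works in R. The toy corpus, the stop-word list, and the function names preprocess and tf_idf are hypothetical, and the IDF formula log(N / df) is only one common variant.

```python
import math
import re
from collections import Counter

# Hypothetical toy stop-word list; real lists are much longer.
STOP_WORDS = {"the", "a", "of", "on", "and", "is"}

def preprocess(text):
    """Case-fold, tokenize on runs of letters, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def tf_idf(documents):
    """Return one {term: weight} dict per document.

    TF is the relative frequency of the term in the document;
    IDF is log(N / df), where df counts the documents containing the term.
    """
    bags = [Counter(preprocess(doc)) for doc in documents]
    n_docs = len(bags)
    # Each term is counted once per document in which it appears.
    doc_freq = Counter(term for bag in bags for term in bag)
    weights = []
    for bag in bags:
        total = sum(bag.values())
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in bag.items()
        })
    return weights

# Hypothetical three-document corpus.
corpus = [
    "The cat sat on the mat",
    "The dog sat on the log",
    "Cats and dogs",
]
scores = tf_idf(corpus)
```

In this sketch, "cat" receives a higher weight than "sat" in the first document because "sat" also occurs in the second one. Since no stemming or lemmatisation is applied, "cat" and "cats" are treated as distinct terms.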

Prerequisites

Students must have taken and passed the following courses: Probability and Statistics; Programming with R.

Tests

This weighting is compatible with remote teaching and examinations.

Continuous assessment marks (at least 2 lab marks: TP1 and TP2)
One individual exam mark: E1

Final mark = 0.4*((TP1+TP2)/2) + 0.6*E1
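
The weighting above can be checked with a short calculation. This is a sketch in Python (the course itself uses R); the function name final_grade and the example marks are hypothetical, and the 0-20 scale is an assumption.

```python
def final_grade(tp1, tp2, e1):
    """Combine two lab marks and the exam mark with the 40/60 weighting."""
    continuous = (tp1 + tp2) / 2  # mean of the two lab marks
    return 0.4 * continuous + 0.6 * e1

# Hypothetical marks, assuming a 0-20 scale.
grade = final_grade(14, 16, 12)
print(round(grade, 2))
```

With these marks the lab average is 15, so the result is 0.4 × 15 + 0.6 × 12 = 13.2.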

Calendar

The course is offered in the following programmes:

  • Curriculum - Engineering student, Master SCM - Semester 7
  • Curriculum - Engineering student, Master PD - Semester 7
See the course schedule for 2021-2022

Additional Information

Course ID: 4GMC14C1
Course language(s): FR


Bibliography

Silge and Robinson (2017), Text Mining with R, O'Reilly.
Stéphane Tufféry (2005), Datamining et statistique Décisionnelle – L’intelligence dans les bases de données, Ed. Technip.
Cornillon et al. (2008), Statistiques avec R, Presses Universitaires de Rennes.
Gaël Millot (2011), Comprendre et réaliser les tests statistiques à l'aide de R, 2nd edition, Editions De Boeck, 767 pages.


Last updated: June 14, 2021

Université Grenoble Alpes