Natural Language Processing and Text-Mining - 4GMC14C1

Number of hours
- Lectures 7.5
- Projects -
- Tutorials 7.5
- Internship -
- Laboratory works -
- Written tests 1.0
ECTS
- ECTS 1.5

Goal(s)

This course introduces basic tools for processing textual data.
Students learn how to manipulate textual data in order to prepare it for analysis.
The analysis methods covered allow students to match strings and vocabulary; to explore texts and identify their characteristics (mutual information, similar or complementary information); and to classify texts and identify typologies.
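
As an illustration of the kind of preparation described above, here is a minimal sketch in plain Python that case-folds, tokenizes, removes stop words, and computes TF-IDF weights. The stop-word list, the function names `preprocess` and `tf_idf`, the sample sentences, and the weighting variant (raw term frequency times log(N/df)) are all assumptions chosen for the example, not the course's own material.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "a", "is", "on", "of", "and"}


def preprocess(text):
    """Case-fold, tokenize on runs of letters, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    Uses raw term frequency and idf = log(N / df), one common variant.
    """
    tokenized = [preprocess(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


# Hypothetical three-document corpus.
docs = [
    "The cat sat on the mat.",
    "The dog sat on the log.",
    "Cats and dogs are pets.",
]
for doc, w in zip(docs, tf_idf(docs)):
    print(doc, "->", sorted(w.items(), key=lambda kv: -kv[1]))
```

Terms that appear in fewer documents (such as "cat") receive a higher weight than terms shared across documents (such as "sat"). No stemming or lemmatisation is applied here, so "cat" and "cats" are treated as different terms.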

Responsible(s)

Romain PINQUIE, Iragael JOLY

Content(s)

1. Introduction to NLP (Natural Language Processing) and TM (Text-Mining) (1 C-TD)
2. Pre-processing (2 C-TD): Parsing, tokenization, case folding, lemmatisation, stemming, POS-tagging, sentence splitting, stop-word removal…
3. Text representations (1 C-TD): Vector Space Model, Bag-of-words model, TF, TF-IDF, Word2vec, GloVe…
4. Feature selection (1 C-TD): χ², mutual information, information gain…
5. Text classification (2 C-TD): One vs. multi-class, bias vs. variance, Kappa test, training set, validation set, test set, accuracy, leave-one-out cross-validation, k-fold cross-validation, precision, recall, F-measure, confusion matrix…
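
To make the evaluation vocabulary of item 5 concrete, the following sketch computes the confusion-matrix counts, precision, recall, and F-measure (F1) for a two-class problem in plain Python. The labels, the predictions, and the function names are hypothetical and serve only as an example.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive):
    """Count TP, FP, FN, TN with one label treated as the positive class."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts["tp"], counts["fp"], counts["fn"], counts["tn"]


def precision_recall_f1(y_true, y_pred, positive):
    """Precision = TP/(TP+FP), recall = TP/(TP+FN), F1 = harmonic mean."""
    tp, fp, fn, _tn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical gold labels and classifier predictions.
y_true = ["spam", "spam", "ham", "ham", "spam", "ham"]
y_pred = ["spam", "ham", "ham", "spam", "spam", "ham"]

tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive="spam")
accuracy = (tp + tn) / len(y_true)
print("TP, FP, FN, TN:", tp, fp, fn, tn)
print("accuracy:", accuracy)
print("precision, recall, F1:", precision_recall_f1(y_true, y_pred, "spam"))
```

In a multi-class setting the same counts can be computed once per class, treating each class in turn as the positive one.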

Prerequisites

Students must have taken and passed the following courses: Probability and Statistics; Programming with R.

Test

The following weighting of grades is compatible with distance exams:

In-class exam (at least 2 grades): TP1 and TP2
Individual final exam grade: E1

Grade = 0.4*((TP1+TP2)/2) + 0.6*E1
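
As a worked example of the weighting above, the short sketch below evaluates the formula for one set of grades. The function name `final_grade` and the sample grades (assumed here to be on a 0 to 20 scale) are hypothetical.

```python
def final_grade(tp1, tp2, e1):
    """Grade = 0.4 * ((TP1 + TP2) / 2) + 0.6 * E1."""
    return 0.4 * ((tp1 + tp2) / 2) + 0.6 * e1


# Hypothetical grades: TP1 = 12, TP2 = 14, E1 = 15.
# 0.4 * 13 + 0.6 * 15 = 5.2 + 9.0 = 14.2
print(round(final_grade(12, 14, 15), 2))
```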

N1 = E1
N2 = E2

Calendar

The course exists in the following branches:

Curriculum - Engineer student Master SCM - Semester 7
Curriculum - Engineer student Master PD - Semester 7

Additional Information

Course ID: 4GMC14C1
Course language(s):

Bibliography

Manning, Christopher D., and Hinrich Schütze. 1999. Foundations of Statistical Natural Language Processing. Cambridge, Mass.: MIT Press.

Bengfort, Benjamin, Rebecca Bilbro, and Tony Ojeda. 2018. Applied Text Analysis with Python: Enabling Language-Aware Data Products with Machine Learning. 1st ed. O'Reilly Media.

Manning, Christopher D., Prabhakar Raghavan, and Hinrich Schütze. 2008. Introduction to Information Retrieval. Cambridge University Press.

Update - 13/07/2023
