Natural Language Processing and Text-Mining - 4GMC14C1
Number of hours
- Lectures : 7.5
- Tutorials : 7.5
- Laboratory works : -
- Projects : -
- Internship : -
- Written tests : 1.0
ECTS : 1.5
Goals
The course introduces basic tools for processing textual data.
Students learn how to manipulate textual data in order to prepare it for analysis.
The analysis methods covered allow students to match strings and vocabulary; to explore texts and identify their characteristics (mutual information, similar or complementary information); and to classify texts and identify typologies.
Content
1. Introduction to NLP (Natural Language Processing) and TM (Text-Mining) (1 C-TD)
2. Pre-processing (2 C-TD): parsing, tokenization, case folding, lemmatisation, stemming, POS-tagging, sentence splitting, stop-word removal…
3. Text representations (1 C-TD): Vector Space Model, Bag-of-words model, TF, TF-IDF, Word2vec, GloVe…
4. Feature selection (1 C-TD): χ² (chi-squared), mutual information, information gain…
5. Text classification (2 C-TD): one vs. multi-class, bias vs. variance, Kappa test, training set, validation set, test set, accuracy, leave-one-out cross-validation, K-fold cross-validation, precision, recall, F-measure, confusion matrix…
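As an illustration of a few of the notions listed above (case folding, tokenization, stop-word removal, TF-IDF, precision, recall and F-measure), here is a minimal sketch in plain Python. It is not course material: the function names, the tiny stop-word list and the choice of IDF variant (log(N / df)) are assumptions made for the example, and other definitions are in common use.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "on"}


def preprocess(text):
    """Case-fold, tokenize on runs of ASCII letters, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    TF is the term count divided by the document length.
    IDF is log(N / df), one common variant among several.
    """
    tokenized = [preprocess(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid dividing by zero on empty documents
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall and F1 for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

With this IDF variant, a term that occurs in every document receives a weight of zero, which is why very frequent words contribute little to a TF-IDF representation even when they are not in the stop-word list.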
Prerequisites
Students will have taken and validated the following courses: Probability and Statistics; Programming with R
Tests
The following weighting of grades is compatible with distance teaching and exams.
In-class exams (at least 2 grades): TP1 and TP2
Individual final exam grade: E1
Grade = 0.4*((TP1+TP2)/2) + 0.6*E1
N1 = E1
N2 = E2
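The weighting Grade = 0.4*((TP1+TP2)/2) + 0.6*E1 given above can be checked with a short Python sketch. The function name and the sample marks are hypothetical and chosen only for illustration; the page does not state the marking scale.

```python
def course_grade(tp1, tp2, e1):
    """Weighted grade: 40% mean of the two in-class grades, 60% final exam."""
    return 0.4 * ((tp1 + tp2) / 2) + 0.6 * e1


# Hypothetical sample marks, for illustration only.
example = course_grade(12, 14, 10)
```

Because the two in-class grades are averaged before weighting, each of them contributes 20% of the result and the final exam contributes 60%.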
Calendar
The course exists in the following branches:
- Curriculum - Engineer student Master SCM - Semester 7
- Curriculum - Engineer student Master PD - Semester 7
See the course schedule for 2022-2023.
Bibliography
Christopher D. Manning and Hinrich Schütze. 1999. Foundations of Statistical Natural Language Processing. MIT Press, Cambridge, MA.
Benjamin Bengfort, Rebecca Bilbro, and Tony Ojeda. 2018. Applied Text Analysis with Python: Enabling Language-Aware Data Products with Machine Learning (1st ed.). O'Reilly Media, Inc.
Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. 2008. Introduction to Information Retrieval. Cambridge University Press.
Date of update June 14, 2021