Statistical Natural Language Processing

Overview

Semester Winter 2023/24
ECTS 5
Level Master
Language English

General

Lectures

  • Instructor. Henning Wachsmuth
  • Location. Appelstr. 11, A145
  • Time. Thursday, 11:00–12:30
  • First date. October 12, 2023
  • Last date. January 25, 2024

Tutorials

  • Instructor. Gabriella Skitalinska
  • Location. Appelstr. 9A, MZ2
  • Time. Wednesday, 13:15–14:45
  • First date. October 18, 2023
  • Last date. January 24, 2024

Description

This course teaches students the major skills needed to tackle typical natural language processing (NLP) tasks with statistical methods. Starting from basics of NLP and machine learning, the course introduces the main learning-based NLP techniques, from clustering and classification to sequence labeling and neural language models. The application of these techniques is exemplified for various NLP tasks, such as topic modeling, sentiment analysis, and coreference resolution. Students learn to design, implement, and evaluate respective NLP methods, both theoretically and in practical assignments.

Topics

  • Recap of basics of data science and natural language processing
  • Unsupervised NLP techniques, such as representation learning and clustering
  • Supervised NLP techniques, such as classification, regression, and sequence labeling
  • Neural NLP techniques, such as feedforward networks, recurrent networks, and transformers
  • Practical issues when applying NLP to real-world tasks

Recommended pre-requisites

  • Basics of statistics
  • Knowledge of programming, ideally Python
  • Bachelor's course: Introduction to Natural Language Processing
  • Alternatively: Any course on machine learning or artificial intelligence

Recommended literature

  • Daniel Jurafsky and James H. Martin. 2009. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Prentice-Hall, 2nd edition. Free draft of third edition: Speech and Language Processing

Lecture slides

  • Orga 01 – Organizational information [slides]
  • Orga 02 – Tentative exam dates [slides]
     
  • Part 1 – Overview [slides]
  • Part 2 – Basics of Data Science [slides]
  • Part 3 – Basics of Natural Language Processing [slides]
  • Part 4 – Representation Learning [slides]
  • Part 5 – NLP using Clustering [slides]
  • Part 6 – NLP using Classification and Regression [slides]
  • Part 7 – NLP using Sequence Labeling [slides]
  • Part 8 – NLP using Neural Networks [slides]
  • Part 9 – NLP using Transformers [slides]
  • Part 10 – Practical Issues [slides]