Theses @ NLP Group

The NLP Group is continuously looking for students who would like to write their bachelor's or master's thesis in the area of natural language processing, possibly with connections to information retrieval and general artificial intelligence.

Topics

All thesis topics should be related to the main research directions of the NLP Group, which include computational argumentation, computational sociolinguistics, and computational explanation.

Below, we provide a selection of currently available topics. The details of each topic are discussed and shaped jointly at the beginning of the thesis process. Other topics are possible, including students' own ideas, as long as they align with our research interests.

  • Towards Conversational Data Annotation: Investigating the role of explanation generation on annotation agreement

    Existing labeling procedures treat labeling as answering multiple-choice questions with a list of radio buttons to choose from. As a result, current labeling protocols treat all labeling errors alike and do not distinguish between genuine errors, misunderstandings, and cultural or personal differences. Large language models (LLMs) show great potential for improving annotation procedures and guidelines because of their conversational nature and their ability to learn and follow instructions. An open research question is how LLMs can support annotators in tackling difficult annotation tasks. In this thesis, we will investigate a specific setup in which an LLM assists annotators by answering their questions and actively learning how to explain the annotations they provide. In the first iteration, annotators have to provide natural-language, instance-specific explanations for their labels. The LLM will learn from these explanations and, in the next iteration, provide the annotator with candidate instance-specific explanations to choose from. When the LLM detects a possible label error during annotation, it will present possible explanations and counterarguments to nudge the annotator toward considering their choice more carefully.

    Advisor: Dr. Ajjour

  • Exploring Instruction Fine-tuned LLMs for Predicting Natural Language Tasks from Data Points

    Instruction-fine-tuned LLMs have pushed the boundaries of what is possible in natural language processing (NLP), setting new standards in various applications. During training, these models are given an instruction (or prompt) that guides them to perform a specific task, together with data points representing the input-output pairs for that task. However, the relationship between a task's instruction and its data points is not well understood. In this project, we aim to explore this relationship in the context of predicting natural language tasks from data points.

    Advisor: Timon Ziegenbein

  • Human-Centered Explanation Generation using Style Vectors and LLMs

    Computational explanation generation is an important research area that aims to explain the decisions of AI applications and make them transparent and understandable to humans. Large Language Models (LLMs) can automatically generate coherent explanations. However, to ensure that these explanations are effectively understood, the background and capabilities of the person receiving them must be taken into account. For example, explaining a topic to a child requires a different level of complexity and specificity than explaining it to an expert on the topic. It is therefore necessary to steer the LLM to generate explanations tailored to a specific target group. One way to control an LLM is to train so-called style vectors based on a person's writing style and use them to influence the style of the generated text. Your task will be to investigate whether style vectors can also be used to extract attributes that help generate the best explanations for a specific target group.

    Advisor: Leandra Fichtel

Working on the outlined and similar topics involves state-of-the-art technologies such as neural transformers, contrastive learning, and multitask learning, among others. Most topics target the development and empirical evaluation of NLP methods for specific tasks.

Interested?

Candidates should have very good programming skills (preferably in Python) as well as some experience with machine learning and other AI methods (ideally with NLP). You should be enrolled in one of the computer science programs at Leibniz University Hannover.

If you are interested in a specific topic, please email the advisor of that topic and include information about your prior knowledge and experience:

  • What relevant courses did you take?
  • What experience with AI development and evaluation do you have?
  • What other relevant knowledge do you have?

If you are unsure about the topic but interested in writing your thesis with the NLP Group, please email the head of the group.

Evaluation

The grading of a thesis is based on weighted grades for two parts:

  • The developed solution to the problem tackled in the thesis (45%)
  • The written thesis presenting the solution (55%)

The grading of the developed solution takes five criteria into account:

  • Difficulty / Complexity. How difficult was it to develop the solution? How much effort was put into it? Is the complexity justified? ... 
  • Technical quality. Is the design and realization of the solution well-made? Are the experiments systematic and scientifically sound? ...
  • Novelty and own ideas. Does the solution have scientific novelty? Have own ideas been developed and realized in the solution? ...
  • Impact / Publishability. Does the solution improve the state of the art? Are the results worth publishing? Can they be published as is? ...
  • Implementation and data. How easy is it to read and reuse the code? If data has been created, is it well-organized? Are code and data well-documented? ...

The grading of the written thesis takes six criteria into account:

  • Abstract, introduction, and conclusion. Are the problem, solution, and results well-introduced? Are the right conclusions drawn? Is the whole story told? ...
  • Background and related work. Are basics well-described and relevant? Is the connection to the thesis clear? Is the state of the art well-discussed? ...
  • Approaches and data. Is the presentation of the developed approaches and data clear, complete, and on the right technical level? ...
  • Experiments, evaluation, and discussion. Are the experiments described systematically? Are the results clearly presented and correctly interpreted? ...
  • Form, layout, and style. Is the structure convincing? Is the writing clear and error-free? Do tables and figures support it? Are citations correct? ...
  • Scientific quality. Does the thesis adhere to scientific standards? Does the presentation follow community principles? ...

Past Theses (since Winter 2022)

  • Evaluating Data-Driven Approaches to Improve Word Lists for Measuring Social Bias in Word Embeddings. Master's thesis, Vinay Kaundinya Ronur Prakash, UPB.
  • Audience Aware Counterargument Generation. Master's thesis, Mahammad Namazov, 2023, UPB.
  • Improving Learners’ Arguments by Detecting and Generating Missing Argument Components. Master's thesis, Nick Düsterhus, 2023, UPB.
  • Gender-inclusive Coreference Resolution using Pronoun Preference. Master's thesis, Jan-Luca Hansel, 2023, UPB.
  • Dialect-aware Social Bias Detection using Ensemble and Multi-Task Learning. Master's thesis, Sai Nikhil Menon, 2022, UPB.
  • Counter Argument Generation Using a Knowledge Graph. Master's thesis, Indranil Ghosh, 2022, UPB.
  • Domain-aware Text Professionalization using Sequence-to-Sequence Neural Networks. Bachelor's thesis, Juela Palushi, 2022, UPB.

Past Theses (Summer 2018 – Summer 2022)

  • Detection and Mitigation of Subjective Bias in Argumentative Text. Master's thesis, Sambit Mallick, 2022, UPB.
  • Cross-domain analysis of argument quality and its connection to offensive language. Bachelor's thesis, Patrick Bollmann, 2022, UPB.
  • Cross-domain Aspect-based Sentiment Analysis with Multimodal Sources. Master's thesis, Pavan Kumar Sheshanarayana, 2022, UPB.
  • Comparative Evaluation of Automatic Summarization Techniques for German Court Decision Documents. Master's thesis, Josua Köhler, 2022, UPB.
  • Computational Analysis of Cultural Differences in Learner Argumentation. Master's thesis, Garima Mudgal, 2022, UPB.
  • Propaganda Technique Detection Using Connotation Frames. Master's thesis, Vinaykumar Budanurmath, 2022, UPB.
  • Contrastive Argument Summarization using Supervised and Unsupervised Learning. Master's thesis, Jonas Rieskamp, 2022, UPB.
  • Mitigation of Gender Bias in Text using Unsupervised Controllable Rewriting. Master's thesis, Maja Brinkmann, 2021, UPB.
  • Assessing Stereotypical Social Biases in Text Sequences using Language. Master's thesis, Meher Vivek Dheram, 2021, UPB.
  • Modeling Context and Argumentativeness of Sentences in Argument Snippet Generation. Master's thesis, Harsh Shah, 2021, UPB.
  • Political Speaker Transfer: Learning to Generate Text in the Styles of Barack Obama and Donald Trump. Master's thesis, Jonas Bülling, 2021, UPB.
  • Quantifying Social Biases in News Articles with Word Embeddings. Bachelor's thesis, Maximilian Keiff, 2021, UPB.
  • Computational Text Professionalization using Neural Sequence-to-Sequence Models. Master's thesis, Avishek Mishra, 2021, UPB.
  • Assessing the Argument Quality of Persuasive Essays using Neural Text Generation. Master's thesis, Timon Gurcke, 2021, UPB.
  • Automatic Conclusion Generation using Neural Networks. Bachelor's thesis, Torben Zöllner, 2020, UPB.
  • Computational Analysis of Metaphors based on Word Embeddings. Bachelor's thesis, Simon Krenzler, 2020, UPB.
  • Semi-supervised Cleansing of Web-based Argument Corpora. Bachelor's thesis, Jonas Dorsch, 2020, BUW.
  • Countering Natural Language Arguments using Neural Sequence-to-Sequence Generation. Master's thesis, Arkajit Dhar, 2020, UPB.
  • Snippet Generation for Argument Search. Bachelor's thesis, Nick Düsterhus, 2019, UPB.
  • Argument Quality Assessment in Natural Language using Machine Learning. Bachelor's thesis, Till Werner, 2019, UPB.
  • Stance Classification in Argument Search. Master's thesis, Philipp Heinisch, 2019, UPB.
  • Towards a Large-scale Causality Graph. Bachelor's thesis, Yan Scholten, 2019, UPB.

Past Theses (Summer 2009 – Winter 2017)

  • Cross-Domain Mining of Argumentation Strategies using Natural Language Processing. Master's thesis, 2017, BUW.
  • Mining Relevant Arguments at Web Scale. Master's thesis, 2017, BUW.
  • Identifying Controversial Topics in Large-Scale Social Media Data. Master's thesis, 2016, BUW.
  • Efficiency and Effectiveness of Multi-Stage Machine Learning Algorithms for Text Quality Assessment. Master's thesis, 2013, UPB.
  • An Expert System for the Automatic Construction of Information Extraction Pipelines. Master's thesis, 2012, UPB.
  • Efficiency and Effectiveness of Text Classification in Information Extraction Pipelines. Master's thesis, 2012, UPB.
  • Efficient Information Extraction for Creating Use Case Diagrams from Text. Master's thesis, 2012, UPB.
  • Heuristic Search for the Run-time Optimization of Information Extraction Pipelines. Master's thesis, 2012, UPB.
  • Aggregation and Visualization of Market Forecasts. Bachelor's thesis, 2011, UPB.
  • Branch Categorization based on Statistical Analysis of Information Retrieval Results. Bachelor's thesis, 2011, UPB.
  • Evaluation of Cooperative Robot Motion Strategies in Simbad. Bachelor's thesis, 2009, UPB.

LUH: Leibniz University Hannover, UPB: Paderborn University, BUW: Bauhaus-Universität Weimar