Computational Sociolinguistics (CSL)

Computational Sociolinguistics, the intersection of computational social science and natural language processing, investigates research questions from the social sciences through empirical analyses of natural language data. The focus is often on insights into social phenomena and dynamics rather than on the underlying technologies. The NLP group's computational sociolinguistics research focuses mainly on the following topics.

Social Bias Analysis and Mitigation

Social bias can emerge from pre-existing beliefs about the characteristics of members of a social group, often leading to prejudice and discrimination. Language can be a major factor in carrying and reinforcing such biases, so NLP models that learn from text may inherit them. Motivated by our project on bias in AI models, we analyzed social bias in controversial discussions to understand its possible influence on downstream systems (ArgMining 2020, best paper award).

Moreover, we developed a novel method to evaluate the bias measures widely used in NLP, which quantify bias in distributional embedding representations (IJCAI 2021). We also evaluated the impact of embedding training algorithms on the perception of social bias and its interaction with political bias (EMNLP Findings 2022).
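To illustrate what such an embedding bias measure computes, the following is a minimal sketch of one widely used measure, the WEAT effect size of Caliskan et al. (2017). It is not the evaluation method from the IJCAI 2021 paper. The function names and the tiny 2D toy vectors are hypothetical and serve only to show the arithmetic: each target word's association is its mean cosine similarity to attribute set A minus that to attribute set B, and the effect size is the standardized difference of the mean associations of target sets X and Y.

```python
import math
from statistics import mean, stdev


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(w, A, B):
    """s(w, A, B): mean similarity of w to attribute set A minus to set B."""
    return mean(cosine(w, a) for a in A) - mean(cosine(w, b) for b in B)


def weat_effect_size(X, Y, A, B):
    """WEAT effect size: difference of mean associations of the target sets,
    divided by the sample standard deviation over all target words."""
    s_x = [association(x, A, B) for x in X]
    s_y = [association(y, A, B) for y in Y]
    return (mean(s_x) - mean(s_y)) / stdev(s_x + s_y)


# Hypothetical toy embeddings: X leans toward attribute A, Y toward B.
A = [(1.0, 0.0)]
B = [(0.0, 1.0)]
X = [(1.0, 0.1), (0.9, 0.2)]
Y = [(0.1, 1.0), (0.2, 0.9)]
effect = weat_effect_size(X, Y, A, B)
```

A positive value indicates that X is more strongly associated with A than Y is; swapping X and Y flips the sign. Real analyses use trained embeddings and curated word lists, and the IJCAI 2021 work concerns how reliable such measures are, which this toy example does not address.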

Media Bias Analysis and Mitigation

Media plays an important role in shaping public opinion. Biased media can influence people in undesirable directions and should therefore be unmasked as such. To address this problem, we developed NLP models to analyze bias in news articles, learning about the relation between sentence-level and article-level bias (EMNLP Findings 2020) and studying at what level of granularity and in which sequential patterns media bias is manifested (NLP+CSS 2020).

Our second research direction on media bias is bias mitigation. Based on a new corpus of bias-labeled news articles, we studied bias transfer: given an article with a political bias, generate an article on the same topic but with the opposite bias. To tackle the challenges of such bias flipping, we employed a cross-aligned autoencoder that incorporates information from an article's content (INLG 2018). Moreover, we developed text generation methods for controlled neural reframing of news articles at the sentence level (EMNLP Findings 2021).

Communication in Crowdworking

Crowdworking is one phenomenon of the digitization of society. Online platforms connect requesters of tasks with workers who solve the tasks anonymously and remotely. However, the quality of crowdwork is affected by communication problems in task design, operation, and evaluation. To learn about the workers' side, we compared existing research on problems in crowdworking with complaints mined from a workers' online discussion forum (COLING 2020).

A key problem in crowdworking is proper task design. We therefore developed computational methods to assess task clarity (HT 2021). Recently, we built an assistance tool on this basis and refined it in user studies with requesters and crowdworkers.