Quantifying the Bias of Transformer-Based Language Models for African American English in Masked Language Modeling

Speaker: Jerome Ramos
UCL
Date: 10/05/2023
Time: 3:00 pm - 4:00 pm
Location: Room 4B01

Abstract

In recent years, transformer-based language models (LMs) have made tremendous advances in natural language processing (NLP) tasks. However, measuring their fairness with respect to different social groups remains an open problem. In this paper, we propose and thoroughly validate an evaluation technique to assess the quality and bias of language model predictions on transcripts of both spoken African American English (AAE) and Standard American English (SAE). Our analysis reveals a bias towards SAE encoded by state-of-the-art LMs such as BERT and DistilBERT, with a lower bias in distilled LMs. We also observe a bias towards AAE in RoBERTa and BART. Additionally, we show evidence that this disparity is present across all LMs when only the grammar and syntax specific to AAE are considered.

Short bio:
Jerome Ramos is a second-year PhD student at University College London, supervised by Dr. Aldo Lipani. His research interests include explainability and scrutability in conversational recommender systems.