EMNLP'22 Presentation of Proxy Clean Work: Mitigating Bias by Proxy in Pre-Trained Models
09 December 2022
Transformer-based pre-trained models are known to encode societal biases, not only in their contextual representations but also in their downstream predictions when fine-tuned on task-specific data. We present D-BIAS, an approach that selectively eliminates stereotypical associations (e.g., co-occurrence statistics) at fine-tuning, such that the model does not learn to rely excessively on those signals. D-BIAS attenuates biases from both identity words and frequently co-occurring proxies, which we select using pointwise mutual information. We apply D-BIAS to a) occupation classification and b) toxicity classification, and find that our approach substantially reduces downstream biases (> 60% in toxicity classification for identities that are most frequently flagged as toxic on online platforms). In addition, we show that D-BIAS dramatically improves upon scrubbing, i.e., removing only the identity words in question. We also demonstrate that D-BIAS easily extends to multiple identities and achieves competitive performance with two recently proposed debiasing approaches: R-LACE and INLP.
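To illustrate the proxy-selection step, below is a minimal Python sketch of ranking candidate words by document-level pointwise mutual information (PMI) with a set of identity terms. The function name select_proxies and the parameters top_k and min_count are hypothetical, and the exact counting scheme, tokenization, and thresholds used in the paper may differ.

import math
from collections import Counter

def select_proxies(documents, identity_words, top_k=10, min_count=5):
    """Rank candidate words by document-level PMI with a set of identity terms."""
    identity_words = set(identity_words)
    n_docs = len(documents)
    word_counts = Counter()   # number of documents containing word w
    joint_counts = Counter()  # documents containing both w and an identity term
    identity_docs = 0         # documents containing at least one identity term

    for doc in documents:
        tokens = set(doc.lower().split())
        has_identity = bool(tokens & identity_words)
        identity_docs += has_identity
        for w in tokens - identity_words:
            word_counts[w] += 1
            if has_identity:
                joint_counts[w] += 1

    # PMI(w, id) = log( p(w, id) / (p(w) * p(id)) ), with probabilities
    # estimated from document-level co-occurrence counts.
    p_id = identity_docs / n_docs
    scores = {}
    for w, count in word_counts.items():
        if count < min_count or joint_counts[w] == 0:
            continue
        scores[w] = math.log((joint_counts[w] / n_docs) / ((count / n_docs) * p_id))

    return sorted(scores, key=scores.get, reverse=True)[:top_k]

On an occupation or toxicity corpus, a call such as select_proxies(docs, {"she", "her"}) would be expected to surface words that co-occur disproportionately with the identity terms; it is these proxies, and not only the identity words themselves, that D-BIAS attenuates during fine-tuning.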
Venue: The 2022 Conference on Empirical Methods in Natural Language Processing
File Name: ProxyClean#3364.pdf