Although prior work on bias mitigation has focused on promoting social equality and demographic parity, less attention has been given to aligning language model outputs with desired distributions. For example, we might want to align a model with real-world distributions to support factual grounding. We therefore define bias as deviation from a desired distribution, which may be an equal or a real-world distribution, depending on application goals. We propose a fine-tuning method based on a weighted adaptive loss that aligns an LLM's gender–profession output distribution with the desired distribution while preserving language modeling performance. Using three profession sets (male-dominated, female-dominated, and gender-balanced) derived from 2024 U.S. labor statistics, we evaluate both our adaptive method for reflecting real-world distributions and a non-adaptive variant for equality. Experiments on three masked language models (MLMs) and one autoregressive language model (ALM, LLaMA 3.2-3B-Instruct) show near-complete bias mitigation under the equality objective, and bias reductions of roughly 30–75% for the MLMs and about 50% for the ALM when aligning models to real-world distributions.
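To make the objective concrete, the sketch below shows one plausible way such a loss could be assembled; it is not the authors' exact formulation, and the function names, the KL-based alignment term, and the adaptive weighting rule are illustrative assumptions. The idea is to add, on top of the standard language modeling loss, a penalty for the gap between the model's gender-token distribution in profession contexts and a chosen target distribution.

```python
# Hypothetical sketch (not the authors' exact method) of a weighted adaptive
# alignment loss: standard LM cross-entropy plus a KL penalty pulling the
# model's gender-token distribution, measured on profession prompts, toward a
# desired target distribution (uniform for equality, labor-statistics
# proportions for real-world alignment).

import torch
import torch.nn.functional as F


def alignment_loss(lm_logits, lm_labels, gender_logits, target_dist,
                   alpha=1.0, adaptive=True):
    """Combine language-modeling loss with a distribution-alignment penalty.

    lm_logits:     (batch, seq_len, vocab) next-token or masked-token logits
    lm_labels:     (batch, seq_len) gold token ids, -100 for ignored positions
    gender_logits: (batch, n_genders) logits restricted to gender tokens at the
                   predicted position of each profession prompt
    target_dist:   (n_genders,) desired gender distribution
    """
    # Standard language-modeling objective, preserving fluency.
    lm_loss = F.cross_entropy(
        lm_logits.reshape(-1, lm_logits.size(-1)),
        lm_labels.reshape(-1),
        ignore_index=-100,
    )

    # Model's current gender distribution over this batch of profession prompts.
    pred_dist = F.softmax(gender_logits, dim=-1).mean(dim=0)

    # KL(target || predicted): penalize deviation from the desired distribution.
    kl = F.kl_div(pred_dist.log(), target_dist, reduction="sum")

    # Adaptive weighting (assumption): scale the penalty by the size of the
    # current deviation, so strongly skewed professions get a larger push.
    weight = alpha * (1.0 + kl.detach()) if adaptive else alpha

    return lm_loss + weight * kl
```

Under these assumptions, setting `target_dist` to a uniform vector corresponds to the equality objective, while setting it to real-world gender proportions per profession corresponds to the real-world alignment variant described in the abstract.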