Classifiers trained on historical data are deployed in the real world to automate decisions from hiring to loan issuance. Judging the fairness and efficiency of these systems, and of their human counterparts, is a complex and important topic studied across both the computational and social sciences. One common way to address bias in classifiers is to resample the training data to offset distributional disparities. In the hiring domain, where results may vary by a protected class, many interventions from the literature equalize the hiring rate within the training set to alleviate bias in the resulting classifier. While simple and seemingly effective, these methods have typically been evaluated only on data obtained through convenience samples, e.g., the results of some real-world hiring process, introducing selection and label bias into the evaluation. In the social and health sciences, audit studies, in which fictitious ``testers'' (resumes) are sent to subjects (job openings) in a randomized controlled trial, provide high-quality data that support rigorous estimates of discrimination by controlling for confounding factors. In this paper, we investigate how data from audit studies can be used to improve our ability to both train and evaluate automated hiring algorithms. We find that audit data from real-world hiring reveal cases where equalizing base rates across classes \emph{appears} to achieve parity under traditional measures, but in fact leaves $\approx 10\%$ disparity when measured appropriately. We also show that corrections based on individual treatment effect estimation methods, combined with audit study data, can overcome these issues, underscoring the need for rigorous data collection in fairness research.
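The base-rate equalization intervention the abstract refers to can be sketched as follows. This is a minimal illustration, not the paper's method: the function name, the dict-based data layout, and the choice to oversample positives up to the highest observed group rate are all assumptions made for the example.

```python
import random

def equalize_base_rates(rows, group_key, label_key, seed=0):
    """Oversample positive (e.g. hired) examples within each protected-class
    group so that every group shares the highest observed positive rate.
    `rows` is a list of dicts; `group_key` and `label_key` name the fields."""
    rng = random.Random(seed)
    groups = {}
    for r in rows:
        groups.setdefault(r[group_key], []).append(r)
    # Target the highest positive rate seen across groups.
    target = max(
        sum(r[label_key] for r in g) / len(g) for g in groups.values()
    )
    out = []
    for g in groups.values():
        pos = [r for r in g if r[label_key]]
        neg = [r for r in g if not r[label_key]]
        out.extend(g)
        if pos and target < 1:
            # Solve pos' / (pos' + len(neg)) = target for the number of
            # positives this group needs, then duplicate at random.
            need = round(target * len(neg) / (1 - target)) - len(pos)
            out.extend(rng.choice(pos) for _ in range(max(0, need)))
    return out
```

A classifier retrained on the resampled output sees equal hiring rates across groups; the paper's point is that parity in this convenience-sampled training signal need not translate into parity measured against audit-study ground truth.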