Machine learning models are often used to predict the outcomes of applications to selective programs. Many prospective school or college applicants turn to such models to predict whether they will be admitted, and employers may use algorithmic tools to filter out resumes predicted to have a low probability of being hired when offering interviews for a job opening. However, such decision processes differ substantially from the conventional machine learning setting: decisions are not independent across applicants. Whether a student is admitted depends on the other applicants who apply, because admissions decisions are \textit{capacity-constrained}. We formalize how the nature of admission decisions yields a data-generating process that is incompatible with traditional machine learning assumptions. We characterize how the properties of selection functions affect the difficulty of generalizing under shifts in the applicant pool distribution, introducing two concepts: \textit{stability}, which measures how many existing decisions can change when a single new applicant is introduced, and \textit{variability}, which measures the number of unique students whose decisions can change. We demonstrate our theory on admissions data from the New York City high school matching system, showing that machine learning performance degrades as the applicant pool increasingly differs from the training data. Furthermore, performance drops are larger for schools using decision rules that are less stable and more variable. Our work raises questions about the reliability of predicting individual admissions probabilities.
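To make the capacity-constraint idea concrete, here is a minimal sketch (not from the paper; all names and the selection rule are illustrative assumptions) using a toy "admit the top-k scores" rule. It counts how many existing applicants' decisions flip when a single new applicant joins the pool, the quantity the abstract's notion of stability bounds:

```python
# Toy capacity-constrained selection: admit the k highest-scoring applicants.
# This rule and these function names are illustrative assumptions, not the
# paper's actual selection functions.

def admit_top_k(scores, k):
    """Return the set of admitted applicant indices under a top-k rule."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:k])

def decisions_changed(scores, new_score, k):
    """Count existing applicants whose admit/reject decision flips when one
    new applicant with `new_score` is added (capacity k stays fixed)."""
    before = admit_top_k(scores, k)
    after = admit_top_k(scores + [new_score], k)
    # Restrict attention to the original applicants; the newcomer is the
    # last index, len(scores).
    return len(before.symmetric_difference(after - {len(scores)}))

scores = [90, 85, 70, 60]
# With capacity k = 2, a strong newcomer (score 95) displaces exactly one
# previously admitted applicant:
print(decisions_changed(scores, 95, 2))  # -> 1
# A weak newcomer (score 10) changes no existing decisions:
print(decisions_changed(scores, 10, 2))  # -> 0
```

Under a top-k rule a single new applicant can flip at most one existing decision, so this rule is highly stable; the abstract's point is that rules differ on this axis, and less stable, more variable rules generalize worse under applicant-pool shift.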
