Conventional fairness in multi-tenant Large Language Model (LLM) inference services is typically defined by system-centric metrics, such as equitable resource allocation. This paper argues that this paradigm is fundamentally flawed, as it creates a gap between measured system performance and actual user-perceived quality. We challenge this notion by introducing and formalizing Experiential Fairness, a user-centric paradigm that shifts the objective from equality of opportunity (resource access) to equity of outcome (user experience). To operationalize this, we propose ExFairS, a lightweight scheduling framework that evaluates each user's state via a composite metric integrating SLO compliance with resource consumption, and then acts on this evaluation through a credit-based priority mechanism. Extensive experiments on an 8-GPU NVIDIA V100 node show that ExFairS reduces the SLO violation rate by up to 100% and improves system throughput by 14-21.9%, outperforming state-of-the-art schedulers and delivering a demonstrably higher degree of Experiential Fairness.
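
The abstract names two steps: evaluate each user with a composite metric, then act on it through credit-based priorities. The following Python sketch is a rough illustration of that general idea only, not the ExFairS implementation. The scoring formula, the weight `alpha`, the use of token counts as the resource measure, and all names (`UserState`, `experience_score`, `update_credits`, `pick_next`) are assumptions made for illustration and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class UserState:
    """Per-user bookkeeping (hypothetical fields, not from the paper)."""
    name: str
    slo_met: int = 0       # completed requests that met their SLO
    slo_total: int = 0     # all completed requests
    tokens_used: int = 0   # assumed proxy for resource consumption
    credit: float = 0.0    # accumulated scheduling credit


def experience_score(user: UserState, total_tokens: int, alpha: float = 0.7) -> float:
    """Assumed composite metric: higher means the user has been served better.

    Blends SLO compliance with the user's share of consumed resources.
    """
    compliance = user.slo_met / user.slo_total if user.slo_total else 1.0
    share = user.tokens_used / total_tokens if total_tokens else 0.0
    return alpha * compliance + (1 - alpha) * share


def update_credits(users: list[UserState], alpha: float = 0.7) -> None:
    """Give credit to users whose score is below the mean, take it from those above."""
    if not users:
        return
    total_tokens = sum(u.tokens_used for u in users)
    scores = {u.name: experience_score(u, total_tokens, alpha) for u in users}
    mean_score = sum(scores.values()) / len(scores)
    for u in users:
        u.credit += mean_score - scores[u.name]


def pick_next(users: list[UserState]) -> UserState:
    """Schedule the user holding the most credit next."""
    return max(users, key=lambda u: u.credit)


if __name__ == "__main__":
    users = [
        UserState("alice", slo_met=9, slo_total=10, tokens_used=800),
        UserState("bob", slo_met=4, slo_total=10, tokens_used=200),
    ]
    update_credits(users)
    print(pick_next(users).name)
```

In this toy setup the user with more SLO violations and less resource consumption accumulates credit and is served first, which is one way to aim for equity of outcome rather than equal resource shares.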
