Would you like to see your presentation here, made available to a global audience of researchers?
Add your own presentation or have us affordably record your next conference.
Fair clustering has attracted increased attention in recent years. In this work, we study the individually fair $k$-means problem in Euclidean space. While single-swap local search methods have achieved near-linear running time and constant approximation guarantees, their performance often depends on the aspect ratio $\Delta$ of the dataset (the ratio between the diameter and the minimum interpoint distance of the dataset). How to apply multi-swap local search while obtaining linear running time with better approximation ratio is still a challenging task. To address this, we introduce a collaborative initialization framework for individually fair $k$-means that integrates greedy with sampling techniques. This framework eliminates dependence on the aspect ratio $\Delta$ and yields an $(O(1), 4)$-bicriteria approximation in linear time. While the current state-of-the-art near-linear time algorithm achieves a $(2000, 6)$-bicriteria approximation in $O(ndk^2 \log(n\Delta))$ time under the assumption that optimal centers are identical to their corresponding centroids, this assumption is generally not satisfied under individual fairness constraint. In contrast, we propose a multi-swap local search algorithm that improves the approximation guarantee to $(62, 7)$. Our method runs in linear time $O(nd \cdot \mathrm{poly}(k))$ with constant probability and eliminates the need for this restrictive assumption. We validate our theoretical results through extensive experiments on both real-world and synthetic datasets, including large-scale benchmarks with up to 100 million points. Our empirical evaluation demonstrates superior performance in terms of clustering quality and computational efficiency, along with scalability under varying parameter settings.
