
Premium content
Access to this content requires a subscription. You must be a premium user to view this content.

Would you like to see your presentation here, made available to a global audience of researchers?
Add your own presentation or have us affordably record your next conference.
keywords:
ml
clustering
This paper proposes a novel $k$-medoids approximation algorithm to handle large-scale datasets with reasonable computational time and memory complexity. We develop a local-search algorithm that iteratively improves the medoid selection based on the estimation of the $k$-medoids objective. A single batch of size $m \ll n$ provides the estimation, which reduces the required memory size and the number of pairwise dissimilarities computations to $\mathcal{O}(mn)$, instead of $\mathcal{O}(n^2)$ compared to most $k$-medoids baselines. We obtain theoretical results highlighting that a batch of size $m=\mathcal{O}(\log(n))$ is sufficient to guarantee, with strong probability, the same performance as the original local-search algorithm. Multiple experiments conducted on real datasets of various sizes and dimensions show that our algorithm provides similar performances as state-of-the-art methods such as FasterPAM and BanditPAM++ with a drastically reduced running time.