Efficient spam detection in resource-constrained environments remains challenging due to class imbalance, noisy text, and the computational demands of large Transformer models. We introduce a novel coreset selection framework based on a unified Entropy–Class-Balanced Uncertainty-Density Ranking (CBUDR) scheme. Our method prioritizes highly informative and uncertain samples while ensuring diversity and class balance within the selected subset. The framework flexibly supports multiple selection strategies, including Top-K, Bottom-K, and adaptive class-wise schemes, enabling robust performance even when training on as little as 5% of the dataset. Extensive experiments on benchmark datasets (UCI SMS, UTKML Twitter, LingSpam) show that our ranking scheme achieves competitive accuracy, precision, and recall while significantly reducing computational cost. These results demonstrate that carefully designed coreset strategies can surpass full-data performance in both balanced and imbalanced settings, highlighting the potential for deployment on low-power devices and mobile platforms.
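The abstract does not give the exact CBUDR formula, so the following is only a minimal sketch of the general idea it describes: rank samples by uncertainty and density, then pick a class-balanced subset. Everything specific here is an assumption made for illustration: the function names (`predictive_entropy`, `knn_density`, `select_coreset`), the score defined as entropy multiplied by a k-nearest-neighbour density, and the equal per-class budget. None of these is confirmed as the authors' method.

```python
import numpy as np


def predictive_entropy(probs):
    """Shannon entropy (natural log) of each row of class probabilities."""
    p = np.clip(probs, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1)


def knn_density(features, k=5):
    """Inverse mean distance to the k nearest neighbours (higher = denser)."""
    diffs = features[:, None, :] - features[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    np.fill_diagonal(dists, np.inf)  # a sample is not its own neighbour
    k = min(k, len(features) - 1)
    nearest = np.sort(dists, axis=1)[:, :k]
    return 1.0 / (nearest.mean(axis=1) + 1e-12)


def select_coreset(probs, features, labels, fraction=0.05, mode="top"):
    """Return sorted indices of a class-balanced subset.

    Assumed scoring rule (illustrative only): entropy * density.
    mode="top" keeps the highest-scoring samples per class (Top-K);
    mode="bottom" keeps the lowest-scoring ones (Bottom-K).
    """
    if mode not in ("top", "bottom"):
        raise ValueError("mode must be 'top' or 'bottom'")
    score = predictive_entropy(probs) * knn_density(features)
    classes = np.unique(labels)
    budget = max(len(classes), int(round(fraction * len(labels))))
    per_class = budget // len(classes)  # equal share per class (assumption)
    selected = []
    for c in classes:
        idx = np.flatnonzero(labels == c)
        order = np.argsort(score[idx])  # ascending by score
        if mode == "top":
            order = order[::-1]
        n_keep = min(per_class, len(idx))  # cannot exceed the class size
        selected.extend(idx[order[:n_keep]].tolist())
    return sorted(selected)
```

In this sketch, `probs` would come from a lightweight model's predicted class probabilities and `features` from any fixed text embedding. Giving each class an equal budget is one simple way to counter class imbalance; an adaptive class-wise scheme could instead vary `per_class` by class.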