The computational cost of large language models (LLMs) is a primary obstacle to sustainable deployment. Static resource allocation is inefficient, as not all inputs require the same depth of processing. We propose a framework for adaptive, compute-efficient learning via conceptual criticality, which dynamically tailors computation to the assessed difficulty of an input. A lightweight criticality prediction module estimates conceptual complexity on a continuous scale, and this score governs the LLM’s inference pathway, selectively activating token pruning, layer skipping, and quantization. Simple inputs are processed with minimal FLOPs and latency, while complex inputs use the model’s full capacity to preserve accuracy. We benchmark our framework and introduce metrics to quantify sensitivity to input criticality and per-sample computational savings. Results demonstrate an improved accuracy-efficiency trade-off, paving the way for more resource-aware systems.
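
As a rough illustration of how a continuous criticality score could govern the inference pathway, the following minimal Python sketch maps a score in [0, 1] to a combination of token pruning, layer skipping, and quantization. It is not the authors' implementation: the names `InferencePlan`, `plan_for`, and `relative_cost`, the thresholds 0.3 and 0.7, the 24-layer depth, and the cost proxy are all assumptions made for the example, since the abstract does not specify them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InferencePlan:
    """Hypothetical bundle of efficiency settings for one input."""
    token_keep_ratio: float  # fraction of tokens kept after pruning
    layers_skipped: int      # number of transformer layers bypassed
    weight_bits: int         # quantization width used for weights


def plan_for(criticality: float, num_layers: int = 24) -> InferencePlan:
    """Map a criticality score in [0, 1] to an inference plan.

    The thresholds and settings are illustrative assumptions only.
    """
    if not 0.0 <= criticality <= 1.0:
        raise ValueError("criticality must lie in [0, 1]")
    if criticality < 0.3:
        # Simple input: prune, skip, and quantize aggressively.
        return InferencePlan(0.5, num_layers // 2, 4)
    if criticality < 0.7:
        # Moderate input: apply milder savings.
        return InferencePlan(0.75, num_layers // 4, 8)
    # Complex input: use the full model to preserve accuracy.
    return InferencePlan(1.0, 0, 16)


def relative_cost(plan: InferencePlan, num_layers: int = 24) -> float:
    """Crude per-sample cost proxy relative to the full model (1.0)."""
    active_layers = (num_layers - plan.layers_skipped) / num_layers
    return plan.token_keep_ratio * active_layers * (plan.weight_bits / 16)


if __name__ == "__main__":
    for score in (0.1, 0.5, 0.9):
        plan = plan_for(score)
        print(score, plan, relative_cost(plan))
```

In this sketch, one minus `relative_cost` gives a simple stand-in for per-sample computational savings. A real system would measure FLOPs and latency directly, and could replace the three discrete tiers with a smooth function of the score.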