The rapid iteration of Large Language Models (LLMs) has intensified the need for scalable, cost-efficient routing systems. Current frameworks suffer from model lock-in, requiring exhaustive evaluations or retraining to integrate new models, a critical bottleneck in rapidly evolving LLM ecosystems. We present \systemname, a zero-shot difficulty-aware framework that dynamically routes queries to optimal LLMs using only 100 anchor samples per new model. \systemname introduces three innovations: (1) universal difficulty tiers that enable model-agnostic capability profiling, (2) a context-aware difficulty predictor that maps textual prompts to complexity scores without retroactive testing, and (3) a dual-mode ILP optimizer that balances cost and accuracy under varying constraints. By decoupling routing logic from model-specific data, our framework enables seamless integration of new LLMs, breaking the scalability limitations of existing systems. Extensive experimental results demonstrate that \systemname reduces the serving cost of newly onboarded models by 24.50\% without any accuracy loss, and by up to 70.1\% with only minor accuracy reductions.
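To make the cost/accuracy trade-off concrete, the sketch below shows one generic way an ILP-based router of this kind can be formulated; the variable names, the toy cost and accuracy numbers, and the accuracy-floor formulation are illustrative assumptions, not the paper's actual model.

```python
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

# Hypothetical data: 3 queries, 2 models (a cheap one and an expensive one).
cost = np.array([1.0, 10.0])            # per-query serving cost of each model
# Predicted probability that each model answers each query correctly
# (rows: queries, cols: models) -- made-up numbers for illustration.
acc = np.array([[0.9, 0.95],
                [0.5, 0.90],
                [0.3, 0.85]])
Q, M = acc.shape

# Binary decision variables x[q, m] (query q routed to model m), row-major.
c = np.tile(cost, Q)                    # objective: minimize total serving cost

# Each query must be assigned to exactly one model.
A_assign = np.zeros((Q, Q * M))
for q in range(Q):
    A_assign[q, q * M:(q + 1) * M] = 1.0
assign = LinearConstraint(A_assign, lb=1, ub=1)

# Accuracy constraint: expected number of correct answers >= 85% of queries.
target = 0.85 * Q
quality = LinearConstraint(acc.flatten()[None, :], lb=target, ub=np.inf)

res = milp(c=c, integrality=np.ones(Q * M),
           bounds=Bounds(0, 1), constraints=[assign, quality])
routing = res.x.reshape(Q, M).argmax(axis=1)
print(routing)   # chosen model index per query -> [0 1 1]
print(res.fun)   # total cost of the optimal routing plan -> 21.0
```

Here the easy first query stays on the cheap model while the two harder queries are escalated, which is the qualitative behavior a difficulty-aware router targets; a dual-mode variant could instead fix a cost budget and maximize expected accuracy.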