Transformers have reshaped modern artificial intelligence, yet their theoretical foundations remain incomplete. This thesis investigates the approximation power and memory limitations of transformers. I combine tools from approximation theory and statistical learning theory to establish provable guarantees on expressivity and memorization capacity, and to characterize inherent architectural constraints. My contributions include the first rigorous proof of memory bottlenecks in prompt tuning and new results on the expressivity of transformers. The long-term goal of my doctoral research is a principled theoretical framework that grounds the empirical behavior of large-scale transformer models in formal approximation-theoretic results.
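To fix ideas, below is a minimal sketch of the prompt-tuning setting that the memory-bottleneck result concerns: a frozen transformer backbone whose only trainable parameters are a short soft prompt prepended to every input sequence. The class name, dimensions, and PyTorch backbone here are illustrative assumptions for exposition, not the thesis's actual construction or experimental setup.

```python
import torch
import torch.nn as nn


class PromptTunedEncoder(nn.Module):
    """Frozen transformer encoder; only the soft prompt is trained (illustrative sketch)."""

    def __init__(self, encoder: nn.TransformerEncoder, d_model: int, prompt_len: int):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():
            p.requires_grad = False  # backbone weights stay frozen
        # Trainable soft prompt: the only free parameters in prompt tuning.
        self.prompt = nn.Parameter(torch.randn(prompt_len, d_model) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); prepend the shared prompt to every sequence.
        batch = x.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        return self.encoder(torch.cat([prompt, x], dim=1))


if __name__ == "__main__":
    d_model, prompt_len = 32, 4
    layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
    backbone = nn.TransformerEncoder(layer, num_layers=2)
    model = PromptTunedEncoder(backbone, d_model, prompt_len)
    out = model(torch.randn(8, 10, d_model))
    print(out.shape)  # torch.Size([8, 14, 32]): prompt tokens prepended to the input
```

The point of such a sketch is that all task-specific information must flow through the fixed-size prompt while the backbone is untouched, which is the kind of bottleneck the thesis analyzes formally.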
