Despite the remarkable success of semantic token learning in NLP and vision, token-level representation mechanisms face fundamental challenges when extended to continuous time series analysis. We identify that a core limitation lies in the intrinsic absence of semantically meaningful tokenization boundaries within time series, which differ substantially from discrete text tokens and present unique complexities compared with spatially coherent image patches. While existing works mechanically apply fixed-length partitioning, recent evidence from time series foundation models reveals performance ceilings on prediction tasks under such paradigms. This paper introduces a novel tokenization framework, physics-aware tokenization (PATK), which implements adaptive time-frequency tokenization via distribution-sensitive sampling strategies. Key innovations include: 1) a Rate-of-Variation (RoV) distribution meticulously structured to capture multi-scale temporal dynamics in the time domain, together with a Spectral Energy Intensity (SEI) distribution devised to reveal global seasonal patterns in the frequency domain; 2) a physics-aware hidden Markov model (PA-HMM) that adaptively decomposes continuous time series into distinct tokens of elastic length, guided by physics-aware probabilities sampled from the RoV and SEI distributions. The proposed PATK integrates readily with both conventional Transformers and advanced large-scale time series models, including LLM-transferred methods and pretrained time series foundation models. Experiments across diverse datasets demonstrate that PATK excels in classification and forecasting tasks, showing notable ability to model long-term dependencies, strengthened resilience against disturbances, and robustness to missing data.
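The abstract's core idea, cutting elastic-length tokens from time-domain variation and frequency-domain energy, can be sketched as a minimal heuristic. This is not the paper's PA-HMM: `rate_of_variation`, `dominant_period`, and `adaptive_tokens` are hypothetical stand-ins, with simple quantile thresholding in place of probabilistic sampling.

```python
import numpy as np

def rate_of_variation(x, scales=(1, 2, 4)):
    """Multi-scale rate-of-variation (RoV): mean lagged absolute
    difference, a stand-in for the paper's time-domain distribution."""
    rov = np.zeros(len(x))
    for s in scales:
        rov[s:] += np.abs(x[s:] - x[:-s]) / s
    return rov / len(scales)

def dominant_period(x):
    """Derive a token-length cap from the strongest non-DC frequency,
    a stand-in for the spectral energy intensity (SEI) distribution."""
    mag = np.abs(np.fft.rfft(x - x.mean()))
    k = int(np.argmax(mag[1:])) + 1  # skip the DC bin
    return max(2, len(x) // k)

def adaptive_tokens(x, q=0.8):
    """Cut a token boundary wherever RoV exceeds its q-quantile, and
    never let a token grow past the SEI-derived dominant period."""
    rov = rate_of_variation(x)
    thresh = np.quantile(rov, q)
    max_len = dominant_period(x)
    boundaries = [0]
    for t in range(1, len(x)):
        if rov[t] > thresh or t - boundaries[-1] >= max_len:
            boundaries.append(t)
    boundaries.append(len(x))
    return [x[a:b] for a, b in zip(boundaries[:-1], boundaries[1:])]
```

On a series with a flat stretch followed by a burst of oscillation, this yields long tokens over the quiet region and short tokens over the volatile one, illustrating the elastic-length behavior the abstract describes.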