
AAAI 2026

January 25, 2026

Singapore, Singapore


Auto-regressive (AR) decoders, owing to their flexibility in handling variable-length outputs and their strong capability in modeling character-level dependencies, have emerged as the predominant decoding paradigm in scene text recognition (STR). However, AR decoders suffer from attention drift, slow decoding speed, and difficulty capturing global dependencies, which restricts their performance in various scenarios. In this paper, we propose a novel paradigm for AR decoding, called One-Token to Sequence (One2Seq), to address these issues. Unlike existing methods, we encode the semantic features into a single context token and design a One-Token Wise Decoder to perform the decoding, which alleviates the attention drift caused by the accumulation of semantic information. Moreover, we propose Positional-aware Hash Embedding to embed the decoded characters, ensuring that order information is preserved in the context token. By continuously updating this token, One2Seq fully leverages the decoded semantic information while avoiding the computational overhead of a growing query sequence. Furthermore, to leverage global information for decoding, we propose Dynamic Global Infusion to dynamically integrate global visual features into the context token. Equipped with the enriched context token, the model extracts more discriminative local features under the guidance of global context, thereby improving recognition accuracy. Extensive experiments show that, with its ingenious design, One2Seq exhibits marked superiority in both accuracy and decoding speed over existing STR models.
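The abstract's core idea — one context token that is updated rather than extended, with order-aware character embeddings and gated global infusion — can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the dimensions, the sigmoid gate, the mean-pooled global feature, and the hash-based embedding scheme are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
D, T_MAX, V = 16, 8, 10  # hidden dim, max length, vocab size (assumed)

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def pos_hash_embed(char_id, step, dim):
    # Positional-aware hash embedding (illustrative): seed the hash with
    # the decoding step as well, so the same character at different
    # positions maps to different vectors, preserving order information.
    g = np.random.default_rng(hash((char_id, step)) % (2**32))
    return g.standard_normal(dim)

def decode(visual_feats, w_out, eos_id=0):
    # visual_feats: (N, D) local visual features from an encoder.
    global_feat = visual_feats.mean(0)  # pooled global visual context
    ctx = np.zeros(D)                   # the single context token
    out = []
    for t in range(T_MAX):
        # Dynamic Global Infusion (sketch): a scalar gate decides how
        # much global information enters the context token this step.
        gate = 1.0 / (1.0 + np.exp(-(ctx @ global_feat) / np.sqrt(D)))
        ctx = ctx + gate * global_feat
        # One query (the context token) attends over local features.
        attn = softmax(visual_feats @ ctx / np.sqrt(D))
        glimpse = attn @ visual_feats
        char_id = int(np.argmax(glimpse @ w_out))
        if char_id == eos_id:
            break
        out.append(char_id)
        # Update, never extend: fold the decoded character back into
        # the same token, so the query stays one token long at every
        # step instead of growing with the output sequence.
        ctx = ctx + pos_hash_embed(char_id, t, D)
    return out

feats = rng.standard_normal((20, D))
w_out = rng.standard_normal((D, V))
print(decode(feats, w_out))
```

The key contrast with a standard AR decoder is in the update step: a conventional decoder appends each new character embedding to a growing query sequence (quadratic attention cost over the output length), whereas here the per-step cost stays constant because the query is always a single token.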

