Ryota Tanaka

NTT Human Informatics Laboratories/Tohoku University

multimodal

vision-language

gpt

multi-modal vision

instruction tuning

visual instruction tuning

lvlm

question answering

visual prompt injection

language and vision

3

presentations

SHORT BIO

I am a researcher at NTT Human Informatics Laboratories. My research focuses on deep learning applications in natural language processing and vision & language, in particular visual document understanding. Concurrently, I am pursuing my Ph.D. at Tohoku NLP under the supervision of Prof. Jun Suzuki. I obtained my B.Eng. and M.I. degrees from Nagoya Institute of Technology in 2018 and 2020, respectively, where I was fortunate to be advised by Prof. Akinobu Lee.

Presentations

Empirical Analysis of Large Vision Language Models against Goal Hijacking via Visual Prompt Injection

Subaru Kimura and 4 other authors

InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions

Ryota Tanaka and 4 other authors

Instruction-Following Evaluation for Large Vision-Language Models

Daiki Shiono and 3 other authors
