Recognizing the layout of unstructured digital documents is crucial when parsing them into a structured, machine-readable format for downstream applications. Recent studies in Document Layout Analysis (DLA) usually rely on visual cues to understand documents while ignoring other information, such as contextual information or the relationships between document layout components, which is vital for better layout analysis performance. Our Doc-GCN presents an effective way to harmonize and integrate heterogeneous aspects for Document Layout Analysis. We construct different graphs to capture the four main feature aspects of document layout components: syntactic, semantic, density, and appearance features. Then, we apply graph convolutional networks to enhance each aspect of the features and apply node-level pooling for integration. Finally, we concatenate the features of all aspects and feed them into 2-layer MLPs for document layout component classification. Our Doc-GCN achieves state-of-the-art results on three widely used DLA datasets: PubLayNet, FUNSD, and DocBank.
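The pipeline in this abstract (one graph per feature aspect, a graph convolution on each, concatenation, then a 2-layer MLP) can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the feature dimensions, the random graphs and weights, and the helper names `normalize_adjacency`, `gcn_layer`, and `mlp` are all assumptions, the propagation rule is the common symmetric-normalized form, and the node-level pooling step is left out because the abstract does not specify it.

```python
import numpy as np


def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2, a common GCN normalization (assumed here)."""
    a_hat = adj + np.eye(adj.shape[0])      # add self-loops so every degree >= 1
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(a_norm, x, w):
    """One graph convolution: aggregate neighbour features, project, apply ReLU."""
    return np.maximum(a_norm @ x @ w, 0.0)


def mlp(x, w1, b1, w2, b2):
    """2-layer MLP that maps fused node features to class logits."""
    hidden = np.maximum(x @ w1 + b1, 0.0)
    return hidden @ w2 + b2


rng = np.random.default_rng(0)
n_nodes, n_classes, gcn_dim = 5, 3, 4

# Hypothetical input sizes for the four aspects named in the abstract.
aspect_dims = {"syntactic": 4, "semantic": 6, "density": 2, "appearance": 8}

enhanced = []
for name, dim in aspect_dims.items():
    # Placeholder graph and features; a real system would build these per aspect.
    upper = np.triu(rng.integers(0, 2, size=(n_nodes, n_nodes)), k=1)
    adj = (upper + upper.T).astype(float)   # symmetric, no self-loops yet
    x = rng.normal(size=(n_nodes, dim))
    w = rng.normal(size=(dim, gcn_dim))
    enhanced.append(gcn_layer(normalize_adjacency(adj), x, w))

# Concatenate the per-aspect node features, then classify each node.
fused = np.concatenate(enhanced, axis=1)    # shape: (n_nodes, 4 * gcn_dim)
w1, b1 = rng.normal(size=(fused.shape[1], 8)), np.zeros(8)
w2, b2 = rng.normal(size=(8, n_classes)), np.zeros(n_classes)
logits = mlp(fused, w1, b1, w2, b2)
pred = logits.argmax(axis=1)                # one predicted class per layout component
```

Each node stands for one document layout component, so `pred` holds one class index per component. With untrained random weights the predictions are meaningless; the sketch only shows how the data flows between the stages.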

Doc-GCN: Heterogeneous Graph Convolutional Networks for Document Layout Analysis

Text-to-image multimodal tasks, which generate or retrieve an image from a given text description, are extremely challenging because raw text descriptions carry too little information to fully describe visually realistic images. We propose a new visual contextual text representation for text-to-image multimodal tasks, VICTR, which captures rich visual semantic information about objects from the text input. First, we take the text description as the initial input and conduct dependency parsing to extract the syntactic structure and analyse the semantic aspect, including object quantities, to extract the scene graph. Then, we train the extracted objects, attributes, and relations in the scene graph, together with the corresponding geometric relation information, using Graph Convolutional Networks, which generates a text representation that integrates textual and visual semantic information. The text representation is aggregated with word-level and sentence-level embeddings to generate both visual contextual word and sentence representations. For the evaluation, we attached VICTR to state-of-the-art models in text-to-image generation. VICTR is easily added to existing models and improves them in both quantitative and qualitative aspects.
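To make the scene-graph step more concrete, the sketch below turns already-parsed objects, attributes, and relations into a node list and an adjacency matrix of the kind a graph convolutional network could consume. It is only an illustration under stated assumptions: the caption, the hand-written parse output, and the names `SceneGraph` and `build_scene_graph` are hypothetical, and the dependency parsing itself is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    """Hypothetical container: nodes are (label, kind) pairs, edges are undirected."""
    nodes: list = field(default_factory=list)
    edges: set = field(default_factory=set)
    counts: dict = field(default_factory=dict)  # object node id -> quantity

    def add_node(self, label, kind):
        key = (label, kind)
        if key not in self.nodes:
            self.nodes.append(key)
        return self.nodes.index(key)

    def add_edge(self, i, j):
        self.edges.add((min(i, j), max(i, j)))

    def adjacency(self):
        n = len(self.nodes)
        adj = [[0] * n for _ in range(n)]
        for i, j in self.edges:
            adj[i][j] = adj[j][i] = 1
        return adj


def build_scene_graph(objects, relations):
    """Link each object to its attributes and each relation to its two objects."""
    graph = SceneGraph()
    for name, info in objects.items():
        obj_id = graph.add_node(name, "object")
        graph.counts[obj_id] = info.get("count", 1)
        for attr in info.get("attributes", []):
            graph.add_edge(obj_id, graph.add_node(attr, "attribute"))
    for subj, rel, obj in relations:
        rel_id = graph.add_node(rel, "relation")
        graph.add_edge(graph.add_node(subj, "object"), rel_id)
        graph.add_edge(rel_id, graph.add_node(obj, "object"))
    return graph


# Assumed parse of the made-up caption "two brown dogs on a green sofa".
objects = {
    "dog": {"count": 2, "attributes": ["brown"]},
    "sofa": {"count": 1, "attributes": ["green"]},
}
relations = [("dog", "on", "sofa")]

graph = build_scene_graph(objects, relations)
adj = graph.adjacency()
```

Treating relations as nodes rather than edge labels is a design choice made for this sketch; it lets objects, attributes, and relations all receive embeddings from the same graph convolution.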

VICTR: Visual Information Captured Text Representation for Text-to-Image Multimodal Tasks

This is an in-person poster session. If you are a virtual attendee, you can browse the posters below and ask questions in the Q&A box next to the poster. **Chair:** Hyun-Je Song

PS2 - DEMO Language Modeling

poster

[![](https://assets.underline.io/uploads/markdown_image/1/image/81a3c0317d24f663b49b996875024d45.png)](https://aclanthology.org/volumes/2022.coling-1/)



### THE CONFERENCE WE KNOW AND WE WANT.
**COLING**, the International Conference on Computational Linguistics, is one of the premier conferences for natural language processing and computational linguistics.

First established in 1965, the biennial COLING conference is held in diverse parts of the globe and attracts participants from both top-ranked research centers and emerging countries. Today, the most important developments in our field are taking place not only in universities and academic research institutes but also in industrial research departments, including tech startups. COLING provides opportunities for all these communities to showcase their exciting discoveries.

In the fall of 2022, COLING will be held in Gyeongju in a hybrid format. All participants can either present at the venue or join virtually. As more people get vaccinated, we are happy to provide a safer environment for our colleagues. We believe that COLING 2022 will be one of the conferences free from the pandemic. The hybrid format gives presenters and sponsors a valuable opportunity to promote their companies in both online and in-person venues. For the first time in a long time, customers can interact with sponsors' products first-hand. The online venue, too, gives sponsors the chance to network with those unable to attend the in-person session.

The hybrid format ultimately lends itself to greater exposure for our sponsors: COLING 2022 will let you reach more potential partners and customers than ever before!

To gain access to this event page, you are required to register. Find more information and pay the registration fee on the event organizer’s website: **[https://coling2022.org/reg](https://coling2022.org/reg)**

COLING 2022

LONG21: Multimodality 2

technical paper

COLING, the International Conference on Computational Linguistics, is one of the premier conferences for natural language processing and computational linguistics. Computational linguistics is often grouped within the field of artificial intelligence but actually pre-dates it, and advances in computational linguistics and natural language processing are now some of the major drivers behind the use of artificial intelligence for commercial and social applications, for example online search, machine translation, and voice-assisted conversational devices.

First established in 1965, the biennial COLING conference is held in diverse parts of the globe and attracts participants from both top-ranked research centers and emerging countries. Today, the most important developments in our field are taking place not only in universities and academic research institutes, but also in industrial research departments and in technological startups. COLING conferences provide opportunities for all these communities to showcase their exciting developments.

COLING 2020

Siwen Luo

Presentations

Doc-GCN: Heterogeneous Graph Convolutional Networks for Document Layout Analysis

VICTR: Visual Information Captured Text Representation for Text-to-Image Multimodal Tasks
