Scholarly Archive

Our
Research Journey

A curated archive of our research. With every discovery we share, we aim to bring a little more clarity and insight to the evolving field of Multimodal Artificial Intelligence.

2025

4 Results Found
AAAI Workshop · Publication

ViInfographicVQA: A Benchmark for Single and Multi-image Visual Question Answering on Vietnamese Infographics


Tue-Thu Van-Dinh, Hoang-Duy Tran, Truong-Binh Duong, Mai-Hanh Pham, Binh-Nam Le-Nguyen, Quoc-Thai Nguyen

"The paper introduces ViInfographicVQA, the first Vietnamese benchmark for Infographic Visual Question Answering, containing 6,747 infographics and 20,409 verified QA pairs across multiple domains. It includes two tasks: Single-image VQA and Multi-image VQA, the latter requiring reasoning across multiple related infographics. Experiments show significant performance gaps in current vision-language models, especially for cross-image reasoning tasks."

VisionDocs @ ICCV 2025 · Publication

Describe Anything Model for Visual Question Answering on Text-rich Images


Yen-Linh Vu, Dinh-Thang Duong, Truong-Binh Duong, Anh-Khoi Nguyen, Thanh-Huy Nguyen, Le Thien Phuc Nguyen, Jianhua Xing, Xingjian Li, Tianyang Wang, Ulas Bagci, Min Xu

"A framework that leverages the region-aware capabilities of the Describe Anything Model (DAM) for Visual Question Answering, particularly in text-rich images. It aggregates answers from multiple regional views of an image to better capture fine-grained textual evidence needed for reasoning. Experiments on six VQA benchmarks show consistent improvements over DAM, including a 7+ point gain on DocVQA, demonstrating strong performance with fewer parameters."

ICISN 2025 · Publication

An Automated Pipeline for Constructing a Vietnamese VQA-NLE Dataset


Truong-Binh Duong, Hoang-Minh Tran, Binh-Nam Le-Nguyen, Dinh-Thang Duong

"A Vietnamese version of the VQA-X dataset to support research in Visual Question Answering with Natural Language Explanations. It uses an automated translation pipeline with multiple LLMs to generate and select high-quality Vietnamese translations. Experiments show that NLX-GPT achieves state-of-the-art performance on the new ViVQA-X dataset."

AAAI Workshop · Publication

Enhancing Vietnamese VQA through Curriculum Learning on Raw and Augmented Text Representations


Khoi Anh Nguyen, Linh Yen Vu, Thang Dinh Duong, Thuan Nguyen Duong, Huy Thanh Nguyen, Vinh Quang Dinh

"The paper proposes a Vietnamese VQA training framework using paraphrase-based augmentation and dynamic curriculum learning. Augmented samples are treated as easy and raw samples as hard, with the training ratio gradually adjusted to increase difficulty. Results show improvements on OpenViVQA and mixed results on ViVQA."

2024

1 Result Found
AICI 2024 · Publication

Heterogeneous Transfer Learning Using Pre-trained Feature Mapping and Exchange


Nguyen Thuan Duong, Dinh Thang Duong, Hong Phuc Nguyen, Quang Vinh Dinh

"We propose TLVFC, a transfer learning method that enables knowledge transfer between neural networks with different architectures. It initializes convolutional layers via variance-aligned matching of pre-trained weights and fully connected layers using weight distributions, with a feature exchange mechanism during fine-tuning."

VLAI Research

Vision and Language in AI (VLAI), a Vietnamese research group cultivating a deeper harmony between multimodal perception and human language in AI.

Hugging Face

Contact

Computer Science Dept.

Research Laboratory

[email protected]

© 2026 VLAI Research Group. All rights reserved.