Scholarly Archive

Our
Research Journey

A curated archive of our research. With every discovery we share, we aim to bring a little more clarity and insight to the evolving field of Multimodal Artificial Intelligence.

2025

4 Results Found
AAAI Workshop · Publication

ViInfographicVQA: A Benchmark for Single and Multi-image Visual Question Answering on Vietnamese Infographics


Tue-Thu Van-Dinh, Hoang-Duy Tran, Truong-Binh Duong, Mai-Hanh Pham, Binh-Nam Le-Nguyen, Quoc-Thai Nguyen

"The paper introduces ViInfographicVQA, the first Vietnamese benchmark for Infographic Visual Question Answering, containing 6,747 infographics and 20,409 verified QA pairs across multiple domains. It includes two tasks: Single-image VQA and Multi-image VQA, the latter requiring reasoning across multiple related infographics. Experiments show significant performance gaps in current vision-language models, especially for cross-image reasoning tasks."

VisionDocs @ ICCV 2025 · Publication

Describe Anything Model for Visual Question Answering on Text-rich Images


Yen-Linh Vu, Dinh-Thang Duong, Truong-Binh Duong, Anh-Khoi Nguyen, Thanh-Huy Nguyen, Le Thien Phuc Nguyen, Jianhua Xing, Xingjian Li, Tianyang Wang, Ulas Bagci, Min Xu

"A framework that leverages the region-aware capabilities of the Describe Anything Model (DAM) for Visual Question Answering, particularly in text-rich images. It aggregates answers from multiple regional views of an image to better capture fine-grained textual evidence needed for reasoning. Experiments on six VQA benchmarks show consistent improvements over DAM, including a 7+ point gain on DocVQA, demonstrating strong performance with fewer parameters."

ICISN 2025 · Publication

An Automated Pipeline for Constructing a Vietnamese VQA-NLE Dataset


Truong-Binh Duong, Hoang-Minh Tran, Binh-Nam Le-Nguyen, Dinh-Thang Duong

"A Vietnamese version of the VQA-X dataset to support research in Visual Question Answering with Natural Language Explanations. It uses an automated translation pipeline with multiple LLMs to generate and select high-quality Vietnamese translations. Experiments show that NLX-GPT achieves state-of-the-art performance on the new ViVQA-X dataset."

AAAI Workshop · Publication

Enhancing Vietnamese VQA through Curriculum Learning on Raw and Augmented Text Representations


Khoi Anh Nguyen, Linh Yen Vu, Thang Dinh Duong, Thuan Nguyen Duong, Huy Thanh Nguyen, Vinh Quang Dinh

"The paper proposes a Vietnamese VQA training framework using paraphrase-based augmentation and dynamic curriculum learning. Augmented samples are treated as easy and raw samples as hard, with the training ratio gradually adjusted to increase difficulty. Results show improvements on OpenViVQA and mixed results on ViVQA."

2024

1 Result Found
AICI 2024 · Publication

Heterogeneous Transfer Learning Using Pre-trained Feature Mapping and Exchange


Nguyen Thuan Duong, Dinh Thang Duong, Hong Phuc Nguyen, Quang Vinh Dinh

"We propose TLVFC, a transfer learning method that enables knowledge transfer between neural networks with different architectures. It initializes convolutional layers via variance-aligned matching of pre-trained weights and fully connected layers using weight distributions, with a feature exchange mechanism during fine-tuning."

VLAI Research

Vision and Language in AI (VLAI), a Vietnamese research group cultivating a deeper harmony between multimodal perception and human language in AI.

Hugging Face

Contact

Computer Science Dept.

Research Laboratory

[email protected]

© 2026 VLAI Research Group. All rights reserved.