Welcome to VLAI

Vision and Language in AI (VLAI) is a Vietnamese research group cultivating a deeper harmony between multimodal perception and human language in AI.


WHAT WE DO

A collective dedicated to advancing vision-language research in AI through open-source development. Explore our latest experiments, technical updates, and public releases.

ViInfographicVQA: A Benchmark for Single and Multi-image Visual Question Answering on Vietnamese Infographics


AAAI Workshop 2025

Tue-Thu Van-Dinh, Hoang-Duy Tran, Truong-Binh Duong, Mai-Hanh Pham, Binh-Nam Le-Nguyen, Quoc-Thai Nguyen

The paper introduces ViInfographicVQA, the first Vietnamese benchmark for infographic Visual Question Answering, containing 6,747 infographics and 20,409 verified QA pairs across multiple domains. It covers two tasks: single-image VQA and multi-image VQA, the latter requiring reasoning across multiple related infographics. Experiments reveal significant performance gaps in current vision-language models, especially on cross-image reasoning tasks.

View Publication →

Special thanks to AI VIET NAM for empowering our team.

About AI VIET NAM
VLAI Research

Vision and Language in AI (VLAI) is a Vietnamese research group cultivating a deeper harmony between multimodal perception and human language in AI.

Hugging Face

Contact

Computer Science Dept.

Research Laboratory

[email protected]

© 2026 VLAI Research Group. All rights reserved.