ViInfographicVQA: A Benchmark for Single and Multi-image Visual Question Answering on Vietnamese Infographics
Tue-Thu Van-Dinh, Hoang-Duy Tran, Truong-Binh Duong, Mai-Hanh Pham, Binh-Nam Le-Nguyen, Quoc-Thai Nguyen
The paper introduces ViInfographicVQA, the first Vietnamese benchmark for infographic visual question answering (VQA). It contains 6,747 infographics and 20,409 verified QA pairs across multiple domains and defines two tasks: Single-image VQA and Multi-image VQA, the latter requiring reasoning across multiple related infographics. Experiments show significant performance gaps in current vision-language models, especially on cross-image reasoning.