DARE: Diverse Visual Question Answering with Robustness Evaluation

Published in Transactions of the Association for Computational Linguistics (2025), 2024

To couple challenging vision-language (VL) scenarios with comprehensive robustness evaluation, we introduce DARE (Diverse Visual Question Answering with Robustness Evaluation), a carefully created and curated multiple-choice VQA benchmark. DARE evaluates vision-language model (VLM) performance on five diverse categories and includes four robustness-oriented evaluations that vary the prompt, the subset of answer options, the output format, and the number of correct answers.
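As a rough illustration of two of these variations (answer-option subsets and prompt wording), the sketch below builds variants of a single multiple-choice question. It is not the DARE implementation. The function names, the prompt templates, and the "correct on every variant" criterion in `is_robust` are all assumptions made for this example.

```python
from itertools import combinations

def option_subsets(options, correct, size):
    """Return every subset of `size` options that still contains the correct answer."""
    distractors = [o for o in options if o != correct]
    subsets = []
    for chosen in combinations(distractors, size - 1):
        kept = set(chosen) | {correct}
        # Keep the options in their original order.
        subsets.append([o for o in options if o in kept])
    return subsets

def render_prompt(template, question, options):
    """Fill a prompt template with the question and lettered answer options."""
    labels = "ABCDEFGH"
    lines = [f"{labels[i]}. {opt}" for i, opt in enumerate(options)]
    return template.format(question=question, options="\n".join(lines))

def is_robust(predictions, correct):
    """Assumed criterion: the model must be right on every variant of the question."""
    return all(p == correct for p in predictions)

# Hypothetical question and prompt templates, for illustration only.
options = ["cat", "dog", "bird", "fish"]
templates = [
    "{question}\n{options}\nAnswer:",
    "Question: {question}\nChoices:\n{options}\nReply with a single letter.",
]
variants = [
    render_prompt(t, "Which animal is shown?", subset)
    for t in templates
    for subset in option_subsets(options, "dog", 3)
]
```

Here each of the two templates is paired with each of the three option subsets, giving six prompt variants for one question. A model's answers to all six could then be passed to `is_robust`.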

Full Paper


Hannah Sterz
