Publications

DARE: Diverse Visual Question Answering with Robustness Evaluation

Published as an arXiv preprint, 2024

To couple challenging VL scenarios with comprehensive robustness evaluation, we introduce DARE, Diverse Visual Question Answering with Robustness Evaluation, a carefully created and curated multiple-choice VQA benchmark. DARE evaluates VLM performance on five diverse categories and includes four robustness-oriented evaluations based on variations of the prompts, the subsets of answer options, the output format, and the number of correct answers.
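The four robustness axes can be pictured with a toy item. This is an illustrative sketch only: the field names and variation templates below are invented for the example and are not the actual DARE schema or prompts.

```python
import itertools

# Hypothetical multiple-choice VQA item (not real DARE data).
item = {
    "question": "What color is the car?",
    "options": ["red", "blue", "green", "yellow"],
    "correct": ["red"],
}

def prompt_variants(item):
    """Axis 1: rephrase the instruction prompt while keeping the question fixed."""
    templates = [
        "Answer the question: {q}",
        "Q: {q} Choose the best option.",
    ]
    return [t.format(q=item["question"]) for t in templates]

def option_subsets(item, k):
    """Axis 2: evaluate on size-k subsets of the answer options
    that still contain every correct answer."""
    distractors = [o for o in item["options"] if o not in item["correct"]]
    n_fill = k - len(item["correct"])
    return [
        sorted(item["correct"] + list(combo))
        for combo in itertools.combinations(distractors, n_fill)
    ]

# Axes 3 and 4 vary the required output format (e.g. option letter vs.
# full answer text) and the number of correct answers per item.
```

For the toy item, `option_subsets(item, 2)` yields three two-option variants, each pairing the correct answer with one distractor.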

M2QA: Multi-domain Multilingual Question Answering

Published in Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

We introduce M2QA, a multi-domain multilingual question answering benchmark. M2QA includes 13,500 SQuAD 2.0-style question-answer instances in German, Turkish, and Chinese for the domains of product reviews, news, and creative writing. We use M2QA to explore cross-lingual cross-domain performance of fine-tuned models and state-of-the-art LLMs and investigate modular approaches to domain and language adaptation.
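Since M2QA follows the SQuAD 2.0 format, a single instance can be sketched as an extractive QA record whose answer is a character-offset span into the context. The record below is an invented English example for illustration; real M2QA data is in German, Turkish, and Chinese.

```python
# A minimal SQuAD 2.0-style record (invented content, illustrative only).
record = {
    "id": "example-001",
    "context": "The new headphones ship with a braided cable and a hard case.",
    "question": "What accessories come with the headphones?",
    "answers": [{"text": "a braided cable and a hard case", "answer_start": 29}],
    "is_impossible": False,  # SQuAD 2.0 also permits unanswerable questions
}

# The answer span must be recoverable from the context via its offset:
start = record["answers"][0]["answer_start"]
text = record["answers"][0]["text"]
assert record["context"][start:start + len(text)] == text
```

The `is_impossible` flag is what distinguishes SQuAD 2.0 from 1.1: models must abstain when the context does not contain an answer.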

UKP-SQUARE: An Online Platform for Question Answering Research

Published in Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, 2022

We present UKP-SQUARE, an extensible online QA platform for researchers that allows users to query and analyze a large collection of modern Skills via a user-friendly web interface and integrated behavioural tests.