Publications

DARE: Diverse Visual Question Answering with Robustness Evaluation

Published as an arXiv preprint, 2024

To couple challenging vision-language (VL) scenarios with comprehensive robustness evaluation, we introduce DARE (Diverse Visual Question Answering with Robustness Evaluation), a carefully created and curated multiple-choice VQA benchmark. DARE evaluates VLM performance on five diverse categories and includes four robustness-oriented evaluations based on variations of the prompts, the subsets of answer options, the output format, and the number of correct answers.

M2QA: Multi-domain Multilingual Question Answering

Published in Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

We introduce M2QA, a multi-domain multilingual question answering benchmark. M2QA includes 13,500 SQuAD 2.0-style question-answer instances in German, Turkish, and Chinese for the domains of product reviews, news, and creative writing. We use M2QA to explore cross-lingual cross-domain performance of fine-tuned models and state-of-the-art LLMs and investigate modular approaches to domain and language adaptation.

Hannah Sterz

Publications

Scaling Sparse Fine-Tuning to Large Language Models

Adapters: A Unified Library for Parameter-Efficient and Modular Transfer Learning

UKP-SQUARE: An Online Platform for Question Answering Research