Publications

ReCoVeR the Target Language: Language Steering without Sacrificing Task Performance

Published in Findings of the Association for Computational Linguistics: EMNLP 2025, 2025

As they become increasingly multilingual, Large Language Models exhibit more language confusion, i.e., they tend to generate answers in a language different from the language of the prompt or the answer language explicitly requested by the user. In this work, we propose ReCoVeR (REducing language COnfusion in VEctor Representations), a novel lightweight approach for reducing language confusion based on language-specific steering vectors.

DARE: Diverse Visual Question Answering with Robustness Evaluation

Published in Transactions of the Association for Computational Linguistics (2025), 2024

To couple challenging vision-language (VL) scenarios with comprehensive robustness evaluation, we introduce DARE, Diverse Visual Question Answering with Robustness Evaluation, a carefully created and curated multiple-choice VQA benchmark. DARE evaluates the performance of vision-language models (VLMs) on five diverse categories and includes four robustness-oriented evaluations based on variations of the prompt, the subset of answer options, the output format, and the number of correct answers.

M2QA: Multi-domain Multilingual Question Answering

Published in Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

We introduce M2QA, a multi-domain multilingual question answering benchmark. M2QA includes 13,500 SQuAD 2.0-style question-answer instances in German, Turkish, and Chinese for the domains of product reviews, news, and creative writing. We use M2QA to explore cross-lingual cross-domain performance of fine-tuned models and state-of-the-art LLMs and investigate modular approaches to domain and language adaptation.

UKP-SQUARE: An Online Platform for Question Answering Research

Published in Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, 2022

We present UKP-SQUARE, an extensible online QA platform for researchers that allows users to query and analyze a large collection of modern Skills via a user-friendly web interface and integrated behavioural tests.