Variational Open-Domain Question Answering

Research output: Contribution to journal › Conference article › Research › peer-review

Valentin Liévin
Andreas Geert Motzfeldt
Ida Riis Jensen
Ole Winther

Retrieval-augmented models have proven to be effective in natural language processing tasks, yet there remains a lack of research on their optimization using variational inference. We introduce the Variational Open-Domain (VOD) framework for end-to-end training and evaluation of retrieval-augmented models, focusing on open-domain question answering and language modelling. The VOD objective, a self-normalized estimate of the Rényi variational bound, approximates the task marginal likelihood and is evaluated under samples drawn from an auxiliary sampling distribution (cached retriever and/or approximate posterior). It remains tractable, even for retriever distributions defined on large corpora. We demonstrate VOD's versatility by training reader-retriever BERT-sized models on multiple-choice medical exam questions. On the MedMCQA dataset, we outperform the domain-tuned Med-PaLM by +5.3% despite using 2,500× fewer parameters. Our retrieval-augmented BioLinkBERT model scored 62.9% on MedMCQA and 55.0% on MedQA-USMLE. Finally, we show the effectiveness of our learned retriever component in the context of medical semantic search.
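The two ingredients named in the abstract, a Rényi variational bound and a self-normalized estimate computed from samples of an auxiliary distribution, can be illustrated on a toy corpus. The sketch below is not the paper's implementation: the five-document corpus, the scores, and the function names `renyi_bound` and `snis_log_marginal` are hypothetical, the bound is computed exactly by enumeration, and the sampled estimate covers only the marginal likelihood (the α = 0 end of the bound).

```python
import numpy as np


def logsumexp(x):
    """Numerically stable log(sum(exp(x)))."""
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))


def renyi_bound(log_lik, prior_scores, post_scores, alpha):
    """Exact Renyi variational bound on a small corpus (toy setup, by enumeration).

    log_lik[d]      : log p(a | d, q), the reader likelihood for document d
    prior_scores[d] : unnormalized log-score of the retriever p(d | q)
    post_scores[d]  : unnormalized log-score of an approximate posterior r(d | a, q)
    """
    log_p = prior_scores - logsumexp(prior_scores)
    log_r = post_scores - logsumexp(post_scores)
    log_w = log_lik + log_p - log_r  # log of p(a, d | q) / r(d | a, q)
    if np.isclose(alpha, 1.0):
        # Limit alpha -> 1 recovers the standard ELBO.
        return float(np.sum(np.exp(log_r) * log_w))
    return float(logsumexp(log_r + (1.0 - alpha) * log_w) / (1.0 - alpha))


def snis_log_marginal(log_lik, prior_scores, sample_probs, num_samples, rng):
    """Self-normalized importance sampling estimate of log p(a | q).

    Documents are drawn from `sample_probs` (standing in for a cached or stale
    retriever). Only unnormalized retriever scores are needed, because the
    weights are normalized over the drawn samples rather than over the corpus.
    """
    idx = rng.choice(len(log_lik), size=num_samples, p=sample_probs)
    log_v = prior_scores[idx] - np.log(sample_probs[idx])  # unnormalized weights
    log_w_hat = log_v - logsumexp(log_v)                   # self-normalized weights
    return float(logsumexp(log_w_hat + log_lik[idx]))


# Hypothetical toy values for a corpus of five documents.
log_lik = np.log(np.array([0.9, 0.2, 0.1, 0.05, 0.3]))
prior_scores = np.array([2.0, 0.5, 0.0, -1.0, 1.0])
post_scores = np.array([3.0, 0.0, -1.0, -2.0, 0.5])

for alpha in (0.0, 0.5, 1.0):
    print(alpha, renyi_bound(log_lik, prior_scores, post_scores, alpha))

uniform = np.full(5, 0.2)
print(snis_log_marginal(log_lik, prior_scores, uniform, 20000, np.random.default_rng(0)))
```

In this toy setting the bound at α = 0 equals the exact log marginal likelihood, it decreases towards the ELBO as α approaches 1, and the sampled estimate approaches the exact value as the number of samples grows.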

Original language	English
Journal	Proceedings of Machine Learning Research
Volume	202
Pages (from-to)	20950-20977
Number of pages	28
ISSN	2640-3498
Publication status	Published - 2023
Event	40th International Conference on Machine Learning, ICML 2023 - Honolulu, United States. Duration: 23 Jul 2023 → 29 Jul 2023

Conference

Conference	40th International Conference on Machine Learning, ICML 2023
Country	United States
City	Honolulu
Period	23/07/2023 → 29/07/2023

Department of Biology
