We propose a new setup for evaluating existing knowledge-intensive tasks in which we generalize the background corpus to a universal web snapshot. We investigate a slate of NLP tasks that rely on knowledge, either factual or common sense, and ask systems to use a subset of CCNet, the Sphere corpus, as a knowledge source. In contrast to Wikipedia, otherwise a common background corpus in KI-NLP, Sphere is orders of magnitude larger and better reflects the full diversity of knowledge on the web.
2021: Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Dmytro Okhonko, Samuel Broscheit, Gautier Izacard, Patrick Lewis, Barlas Oğuz, Edouard Grave, Wen-tau Yih, Sebastian Riedel
https://arxiv.org/pdf/2112.09924v2.pdf