Publication details

Learning Efficient Information Extraction on Heterogeneous Texts

Authored by
Henning Wachsmuth, Benno Stein, Gregor Engels

From an efficiency viewpoint, information extraction means filtering the relevant portions of natural language texts as fast as possible. Given an extraction task, different pipelines of algorithms can be devised that provide the same precision and recall but vary in their run-time due to different pipeline schedules. While recent research investigated how to determine the run-time optimal schedule for a collection or a stream of texts, this paper goes one step further: we analyze the run-times of efficient schedules as a function of the heterogeneity of the texts, and we show how this heterogeneity is characterized from a data perspective. For extraction tasks on heterogeneous big data, we present a self-supervised online adaptation approach that learns to predict the optimal schedule depending on the input text. Our evaluation suggests that the approach will significantly improve efficiency on collections and streams of texts of high heterogeneity.
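The idea that schedules differ in run-time but not in output can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the three stage names, their per-sentence costs (abstract units, not measured run-times), the keyword predicates, and the sample sentences are hypothetical. The brute-force search over permutations is not the approach of the paper, which learns to predict the schedule.

```python
from itertools import permutations

# Hypothetical filtering stages: (name, cost per sentence, acceptance test).
# Costs are abstract units chosen for illustration, not measured run-times.
STAGES = [
    ("time", 1, lambda s: "2010" in s),
    ("money", 2, lambda s: "$" in s),
    ("forecast", 5, lambda s: "expects" in s),
]

# Hypothetical input collection.
SENTENCES = [
    "The company expects revenues of $5 million in 2010.",
    "The company expects growth.",
    "Shares fell in 2010.",
    "The weather was fine.",
]


def run(schedule, sentences):
    """Apply the stages in the given order.

    A sentence is dropped at the first stage that rejects it, so later
    (possibly expensive) stages are never charged for that sentence.
    Returns the sentences accepted by all stages and the total cost.
    """
    kept, cost = [], 0
    for sentence in sentences:
        for _name, stage_cost, accepts in schedule:
            cost += stage_cost
            if not accepts(sentence):
                break
        else:
            kept.append(sentence)
    return kept, cost


def best_schedule(sentences):
    """Exhaustively pick the cheapest ordering for this collection."""
    return min(permutations(STAGES), key=lambda sch: run(sch, sentences)[1])


if __name__ == "__main__":
    for schedule in permutations(STAGES):
        kept, cost = run(schedule, SENTENCES)
        names = " -> ".join(name for name, _, _ in schedule)
        print(f"{names}: cost={cost}, kept={len(kept)}")
```

Every ordering keeps the same sentences, while the total cost depends on how early cheap, selective stages discard irrelevant input. Because selectivity depends on the texts, a different collection can favor a different ordering, which is the motivation for choosing the schedule per input.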

External organisation(s)
Universität Paderborn
Bauhaus-Universität Weimar
Article in conference proceedings
Number of pages
ASJC Scopus subject areas
Artificial intelligence, Software
Electronic version(s) (access: open)