Publication Details

Learning Efficient Information Extraction on Heterogeneous Texts

authored by
Henning Wachsmuth, Benno Stein, Gregor Engels

From an efficiency viewpoint, information extraction means filtering the relevant portions of natural language texts as fast as possible. Given an extraction task, different pipelines of algorithms can be devised that provide the same precision and recall but vary in their run-time due to different pipeline schedules. While recent research has investigated how to determine the run-time optimal schedule for a collection or a stream of texts, this paper goes one step further: we analyze the run-times of efficient schedules as a function of the heterogeneity of the texts, and we show how this heterogeneity can be characterized from a data perspective. For extraction tasks on heterogeneous big data, we present a self-supervised online adaptation approach that learns to predict the optimal schedule depending on the input text. Our evaluation suggests that the approach will significantly improve efficiency on collections and streams of texts of high heterogeneity.

External Organisation(s)
Paderborn University
Bauhaus-Universität Weimar
Conference contribution
No. of pages
Publication date
Publication status
ASJC Scopus subject areas
Artificial Intelligence, Software
Electronic version(s) (Access: Open)