Learning Efficient Information Extraction on Heterogeneous Texts
- authored by
- Henning Wachsmuth, Benno Stein, Gregor Engels
- Abstract
From an efficiency viewpoint, information extraction means filtering the relevant portions of natural language texts as fast as possible. Given an extraction task, different pipelines of algorithms can be devised that provide the same precision and recall but vary in their run-time due to different pipeline schedules. While recent research has investigated how to determine the run-time optimal schedule for a collection or a stream of texts, this paper goes one step further: we analyze the run-times of efficient schedules as a function of the heterogeneity of the texts, and we show how this heterogeneity is characterized from a data perspective. For extraction tasks on heterogeneous big data, we present a self-supervised online adaptation approach that learns to predict the optimal schedule depending on the input text. Our evaluation suggests that the approach will significantly improve efficiency on collections and streams of texts of high heterogeneity.
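The claim that schedules can differ in run-time while producing the same output can be illustrated with a toy sketch. This is not the authors' approach (it involves no learning or online adaptation); the stage names, per-sentence costs, keyword predicates, and example sentences are all hypothetical, and cost is counted in abstract units rather than measured time. Each stage is assumed to act as a filter that passes on only the sentences it matches.

```python
from itertools import permutations

# Hypothetical extraction stages: (name, cost per sentence, predicate).
# A sentence is passed to the next stage only if the predicate holds.
STAGES = [
    ("time", 1, lambda s: "2013" in s),
    ("money", 2, lambda s: "$" in s),
    ("forecast", 5, lambda s: "expects" in s),
]

def run(schedule, sentences):
    """Apply the stages in the given order; return (kept sentences, total cost)."""
    cost = 0
    remaining = list(sentences)
    for _name, stage_cost, keep in schedule:
        cost += stage_cost * len(remaining)  # each stage pays for what reaches it
        remaining = [s for s in remaining if keep(s)]
    return remaining, cost

def cheapest_schedule(sentences):
    """Brute-force search over all stage orderings (fine for a toy example)."""
    return min(permutations(STAGES), key=lambda sch: run(sch, sentences)[1])

# Made-up input text, one sentence per entry.
TEXT = [
    "The company expects $5 million in revenue in 2013.",
    "Shares closed at $12 yesterday.",
    "The board met in 2013.",
    "Analysts were surprised.",
]

if __name__ == "__main__":
    for schedule in permutations(STAGES):
        kept, cost = run(schedule, TEXT)
        print([name for name, _, _ in schedule], "cost:", cost, "kept:", len(kept))
```

Every ordering keeps the same sentences, but the total cost varies with how early the cheap, selective stages run. Which ordering is cheapest depends on the input text, which is the motivation for predicting a schedule per text rather than fixing one in advance.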
- External Organisation(s)
- Paderborn University, Bauhaus-Universität Weimar
- Type
- Conference contribution
- Pages
- 534-542
- No. of pages
- 9
- Publication date
- 10.2013
- Publication status
- Published
- ASJC Scopus subject areas
- Artificial Intelligence, Software
- Electronic version(s)
- https://aclanthology.org/I13-1061 (Access: Open)