Learning Efficient Information Extraction on Heterogeneous Texts

Authors: Henning Wachsmuth, Benno Stein, Gregor Engels
Abstract: From an efficiency viewpoint, information extraction means to filter the relevant portions of natural language texts as fast as possible. Given an extraction task, different pipelines of algorithms can be devised that provide the same precision and recall but that vary in their run-time due to different pipeline schedules. While recent research investigated how to determine the run-time optimal schedule for a collection or a stream of texts, this paper goes one step beyond: we analyze the run-times of efficient schedules as a function of the heterogeneity of the texts and we show how this heterogeneity is characterized from a data perspective. For extraction tasks on heterogeneous big data, we present a self-supervised online adaptation approach that learns to predict the optimal schedule depending on the input text. Our evaluation suggests that the approach will significantly improve efficiency on collections and streams of texts of high heterogeneity.
External organisation(s): Universität Paderborn, Bauhaus-Universität Weimar
Type: Paper in conference proceedings
Pages: 534-542
Number of pages: 9
Publication date: October 2013
Publication status: Published
ASJC Scopus subject areas: Artificial intelligence, Software
Electronic version(s): https://aclanthology.org/I13-1061 (Access: Open)

BibTeX

@inproceedings{ab6dad213f4e4839839a721844617d24,
title = "Learning Efficient Information Extraction on Heterogeneous Texts",
abstract = "From an efficiency viewpoint, information extraction means to filter the relevant portions of natural language texts as fast as possible. Given an extraction task, different pipelines of algorithms can be devised that provide the same precision and recall but that vary in their run-time due to different pipeline schedules. While recent research investigated how to determine the run-time optimal schedule for a collection or a stream of texts, this paper goes one step beyond: we analyze the run-times of efficient schedules as a function of the heterogeneity of the texts and we show how this heterogeneity is characterized from a data perspective. For extraction tasks on heterogeneous big data, we present a self-supervised online adaptation approach that learns to predict the optimal schedule depending on the input text. Our evaluation suggests that the approach will significantly improve efficiency on collections and streams of texts of high heterogeneity.",
author = "Henning Wachsmuth and Benno Stein and Gregor Engels",
note = "Funding information: This work was partly funded by the German Federal Ministry of Education and Research (BMBF) under contract number 01IS11016A.; 6th International Joint Conference on Natural Language Processing, IJCNLP 2013 ; Conference date: 14-10-2013 Through 18-10-2013",
year = "2013",
month = oct,
language = "English",
pages = "534--542",
editor = "Ruslan Mitkov and Park, {Jong C.}",
booktitle = "Proceedings of the Sixth International Joint Conference on Natural Language Processing",
publisher = "Asian Federation of Natural Language Processing",
}
