Information Extraction as a Filtering Task

Authors: Henning Wachsmuth, Benno Stein, Gregor Engels
Abstract: Information extraction is usually approached as an annotation task: Input texts run through several analysis steps of an extraction process in which different semantic concepts are annotated and matched against the slots of templates. We argue that such an approach lacks an efficient control of the input of the analysis steps. In this paper, we hence propose and evaluate a model and a formal approach that consistently put the filtering view in the focus: Before spending annotation effort, filter those portions of the input texts that may contain relevant information for filling a template and discard the others. We model all dependencies between the semantic concepts sought for with a truth maintenance system, which then efficiently infers the portions of text to be annotated in each analysis step. The filtering view enables an information extraction system (1) to annotate only relevant portions of input texts and (2) to easily trade its run-time efficiency for its recall. We provide our approach as an open-source extension of Apache UIMA and we show the potential of our approach in a number of experiments. Copyright is held by the owner/author(s).
External organisation(s): Universität Paderborn
Bauhaus-Universität Weimar
Type: Paper in conference proceedings
Pages: 2049-2058
Number of pages: 10
Publication date: 27 October 2013
Publication status: Published
ASJC Scopus subject areas: General Decision Sciences; General Business, Management and Accounting
Electronic version(s): https://doi.org/10.1145/2505515.2505557 (Access: Closed)

BibTeX

@inproceedings{d4f0797e60284448ae48e67f8398743e,
title = "Information Extraction as a Filtering Task",
abstract = "Information extraction is usually approached as an annotation task: Input texts run through several analysis steps of an extraction process in which different semantic concepts are annotated and matched against the slots of templates. We argue that such an approach lacks an efficient control of the input of the analysis steps. In this paper, we hence propose and evaluate a model and a formal approach that consistently put the filtering view in the focus: Before spending annotation effort, filter those portions of the input texts that may contain relevant information for filling a template and discard the others. We model all dependencies between the semantic concepts sought for with a truth maintenance system, which then efficiently infers the portions of text to be annotated in each analysis step. The filtering view enables an information extraction system (1) to annotate only relevant portions of input texts and (2) to easily trade its run-time efficiency for its recall. We provide our approach as an open-source extension of Apache UIMA and we show the potential of our approach in a number of experiments. Copyright is held by the owner/author(s).",
keywords = "Filtering, Information extraction, Relevance, Run-time efficiency, Truth maintenance",
author = "Henning Wachsmuth and Benno Stein and Gregor Engels",
year = "2013",
month = oct,
day = "27",
doi = "10.1145/2505515.2505557",
language = "English",
isbn = "9781450322638",
pages = "2049--2058",
booktitle = "CIKM '13: Proceedings of the 22nd ACM international conference on Information \& Knowledge Management",
publisher = "Association for Computing Machinery (ACM)",
address = "United States",
note = "22nd ACM International Conference on Information and Knowledge Management, CIKM 2013 ; Conference date: 27-10-2013 Through 01-11-2013",
}
