Publication Details

Constructing Efficient Information Extraction Pipelines

authored by
Henning Wachsmuth, Benno Stein, Gregor Engels

Abstract

Information Extraction (IE) pipelines analyze text through several stages. The pipeline's algorithms determine both its effectiveness and its run-time efficiency. In real-world tasks, however, IE pipelines often fail to achieve acceptable run-times because they analyze too much task-irrelevant text. This raises two interesting questions: 1) How much "efficiency potential" depends on the scheduling of a pipeline's algorithms? 2) Is it possible to devise a reliable method to construct efficient IE pipelines? Both questions are addressed in this paper. In particular, we show how to optimize the run-time efficiency of IE pipelines under a given set of algorithms. We evaluate pipelines for three algorithm sets on an industrially relevant task: the extraction of market forecasts from news articles. Using a system-independent measure, we demonstrate that efficiency gains of up to one order of magnitude are possible without compromising a pipeline's original effectiveness.
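
The scheduling question raised in the abstract can be illustrated with a small sketch. It is not the paper's method. It only assumes that each stage has a fixed cost per sentence and passes on a fixed fraction of sentences, that stages are independent, and that any order is allowed. The stage names and all numbers are hypothetical.

```python
from itertools import permutations

# Hypothetical stages: (name, cost per sentence in ms, fraction of sentences kept).
# None of these values come from the paper.
STAGES = [
    ("time_recognizer", 2.0, 0.30),
    ("money_recognizer", 5.0, 0.20),
    ("forecast_classifier", 20.0, 0.50),
]


def expected_cost(schedule):
    """Expected run-time per input sentence when each stage only
    processes the sentences that all earlier stages kept."""
    total, remaining = 0.0, 1.0
    for _name, cost, kept in schedule:
        total += remaining * cost   # stage runs on the surviving share only
        remaining *= kept           # share of sentences passed downstream
    return total


def best_schedule(stages):
    """Exhaustive search over all orderings (feasible for a few stages)."""
    return min(permutations(stages), key=expected_cost)


def worst_schedule(stages):
    """Most expensive ordering, for comparison."""
    return max(permutations(stages), key=expected_cost)
```

Under these assumed numbers, running the cheap, selective stages first keeps most text away from the expensive classifier. Every ordering keeps the same sentences, so only the run-time changes, which mirrors the abstract's point that efficiency can be gained without changing effectiveness.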

External Organisation(s)
Paderborn University
Bauhaus-Universität Weimar
Type
Conference contribution
Pages
2237-2240
No. of pages
4
Publication date
10.2011
Publication status
Published
ASJC Scopus subject areas
Decision Sciences (all); Business, Management and Accounting (all)
Electronic version(s)
https://doi.org/10.1145/2063576.2063935 (Access: Closed)