Constructing Efficient Information Extraction Pipelines

Authors
Henning Wachsmuth, Benno Stein, Gregor Engels

Information Extraction (IE) pipelines analyze text through several stages. The pipeline's algorithms determine both its effectiveness and its run-time efficiency. In real-world tasks, however, IE pipelines often fail to achieve acceptable run-times because they analyze too much task-irrelevant text. This raises two interesting questions: 1) How much "efficiency potential" depends on the scheduling of a pipeline's algorithms? 2) Is it possible to devise a reliable method to construct efficient IE pipelines? Both questions are addressed in this paper. In particular, we show how to optimize the run-time efficiency of IE pipelines under a given set of algorithms. We evaluate pipelines for three algorithm sets on an industrially relevant task: the extraction of market forecasts from news articles. Using a system-independent measure, we demonstrate that efficiency gains of up to one order of magnitude are possible without compromising a pipeline's original effectiveness.
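
The effect of scheduling on run-time can be illustrated with a toy cost model. The sketch below is not the method of the paper. It assumes that every stage acts as an independent filter with a fixed cost per text unit and a fixed pass rate, and that stages may be reordered freely, whereas real IE algorithms have input dependencies that constrain the order. The stage names and all numbers are hypothetical.

```python
from itertools import permutations

# Hypothetical stages: name -> (cost per text unit, fraction of units that pass).
# These values are illustrative assumptions, not measurements from the paper.
STAGES = {
    "time_filter": (1.0, 0.5),
    "money_filter": (2.0, 0.2),
    "forecast_classifier": (10.0, 0.5),
}


def expected_cost(schedule, stages=STAGES):
    """Expected cost per input unit when each stage only processes
    the units that passed all earlier stages in the schedule."""
    total = 0.0
    remaining = 1.0  # fraction of the input still under analysis
    for name in schedule:
        cost, pass_rate = stages[name]
        total += remaining * cost
        remaining *= pass_rate
    return total


def best_schedule(stages=STAGES):
    """Brute-force search over all orderings (feasible only for few stages)."""
    return min(permutations(stages), key=lambda s: expected_cost(s, stages))


def worst_schedule(stages=STAGES):
    return max(permutations(stages), key=lambda s: expected_cost(s, stages))


if __name__ == "__main__":
    for schedule in (best_schedule(), worst_schedule()):
        print(schedule, round(expected_cost(schedule), 2))
```

Under these assumed numbers, running the cheap filters first gives an expected cost of 3.0 per unit, while the most expensive ordering costs 11.1, although both schedules apply the same algorithms and keep the same text units.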

External organisation(s)
Universität Paderborn
Bauhaus-Universität Weimar
Paper in conference proceedings
Number of pages
ASJC Scopus subject areas
General Decision Sciences, General Business, Management and Accounting
