Predicting Machine Learning Pipeline Runtimes in the Context of Automated Machine Learning

authored by: Felix Mohr, Marcel Wever, Alexander Tornede, Eyke Hüllermeier
Abstract: Automated machine learning (AutoML) seeks to automatically find so-called machine learning pipelines that maximize the prediction performance when being used to train a model on a given dataset. One of the main and yet open challenges in AutoML is an effective use of computational resources: An AutoML process involves the evaluation of many candidate pipelines, which are costly but often ineffective because they are canceled due to a timeout. In this paper, we present an approach to predict the runtime of two-step machine learning pipelines with up to one pre-processor, which can be used to anticipate whether or not a pipeline will time out. Separate runtime models are trained offline for each algorithm that may be used in a pipeline, and an overall prediction is derived from these models. We empirically show that the approach increases successful evaluations made by an AutoML tool while preserving or even improving on the previously best solutions.
External Organisation(s): Paderborn University
Type: Article
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
Volume: 43
Pages: 3055-3066
No. of pages: 12
ISSN: 0162-8828
Publication date: 01.09.2021
Publication status: Published
Peer reviewed: Yes
ASJC Scopus subject areas: Software, Computer Vision and Pattern Recognition, Computational Theory and Mathematics, Artificial Intelligence, Applied Mathematics
Electronic version(s): https://doi.org/10.1109/tpami.2021.3056950 (Access: Closed)

BibTeX

@article{49316f6e9f9d449f846952458027bee8,
title = "Predicting Machine Learning Pipeline Runtimes in the Context of Automated Machine Learning",
abstract = "Automated machine learning (AutoML) seeks to automatically find so-called machine learning pipelines that maximize the prediction performance when being used to train a model on a given dataset. One of the main and yet open challenges in AutoML is an effective use of computational resources: An AutoML process involves the evaluation of many candidate pipelines, which are costly but often ineffective because they are canceled due to a timeout. In this paper, we present an approach to predict the runtime of two-step machine learning pipelines with up to one pre-processor, which can be used to anticipate whether or not a pipeline will time out. Separate runtime models are trained offline for each algorithm that may be used in a pipeline, and an overall prediction is derived from these models. We empirically show that the approach increases successful evaluations made by an AutoML tool while preserving or even improving on the previously best solutions.",
keywords = "Automated machine learning, hierarchical runtime prediction, runtime prediction for classifiers and pipelines",
author = "Felix Mohr and Marcel Wever and Alexander Tornede and Eyke H{\"u}llermeier",
note = "Publisher Copyright: {\textcopyright} 1979-2012 IEEE.",
year = "2021",
month = sep,
day = "1",
doi = "10.1109/tpami.2021.3056950",
language = "English",
volume = "43",
pages = "3055--3066",
journal = "IEEE Transactions on Pattern Analysis and Machine Intelligence",
issn = "0162-8828",
publisher = "IEEE Computer Society",
number = "9",
}

Publication Details

Funded by