Sample, estimate, tune: Scaling Bayesian auto-tuning of data science pipelines

Citations

Cited 7 times in Scopus

Cited 4 times in Web of Science

Analysis of institutional authors

Cuesta-Infante A - Author

October 10, 2022

Proceedings Paper

Published in: Proceedings - 2017 International Conference on Data Science and Advanced Analytics, DSAA 2017, vol. 2018-January, pp. 361-372 (2017). DOI: 10.1109/DSAA.2017.82

Authors: Anderson, Alec; Dubois, Sebastien; Cuesta-Infante, Alfredo; Veeramachaneni, Kalyan

Affiliations

MIT, LIDS, 77 Massachusetts Ave, Cambridge, MA 02139, USA - Author

Stanford University, Palo Alto, CA 94304, USA - Author

Universidad Rey Juan Carlos, Madrid, Spain - Author

Abstract

In this paper, we describe a system for sequential hyperparameter optimization that scales to complex pipelines and large datasets. The current state of the art in hyperparameter optimization improves on randomized and grid search by using sequential Bayesian optimization to explore the hyperparameter space in a more informed way. These methods, however, do not scale, because the entire data science pipeline must still be evaluated on all of the data. By designing a subsampling-based approach to estimate pipeline performance, along with a distributed evaluation system, we provide a scalable solution, which we illustrate using complex image and text data pipelines. For three pipelines, we show that we achieve similar performance improvements while computing on substantially less data. © 2017 IEEE.
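The core idea in the abstract can be illustrated with a minimal Python sketch. That idea is to estimate a pipeline's score on random subsamples of the data instead of on the full dataset, inside a sequential tuning loop. The sketch is not the authors' system, and the following pieces are assumptions made for illustration:

- The toy dataset, the one-hyperparameter ridge "pipeline", and every helper name (`make_data`, `fit_and_score`, `estimate_score`, `tune`) are hypothetical.
- Uniform random proposals stand in for the Bayesian optimizer's acquisition step.
- The distributed evaluation system is not modeled.

```python
import random
import statistics


def make_data(n, seed=0):
    """Hypothetical toy dataset: y = 3x + Gaussian noise, x uniform on [0, 1]."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.random()
        rows.append((x, 3.0 * x + rng.gauss(0.0, 0.5)))
    return rows


def fit_and_score(rows, alpha):
    """Hypothetical one-hyperparameter 'pipeline': ridge regression through the
    origin with penalty alpha, fitted on the first half of `rows` and scored
    on the second half. Returns negative validation MSE (higher is better)."""
    half = len(rows) // 2
    train, valid = rows[:half], rows[half:]
    sxx = sum(x * x for x, _ in train)
    sxy = sum(x * y for x, y in train)
    w = sxy / (sxx + alpha)
    return -statistics.fmean([(y - w * x) ** 2 for x, y in valid])


def estimate_score(rows, alpha, sample_size, repeats, rng):
    """Estimate the pipeline score as the mean over `repeats` random
    subsamples of size `sample_size`, instead of evaluating on all rows."""
    scores = [fit_and_score(rng.sample(rows, sample_size), alpha)
              for _ in range(repeats)]
    return statistics.fmean(scores)


def tune(rows, n_trials=20, sample_size=200, repeats=3, seed=1):
    """Sequential search loop. Random proposals are a stand-in (assumption)
    for a Bayesian optimizer; each candidate is scored only on subsamples."""
    rng = random.Random(seed)
    best_alpha, best_score = None, float("-inf")
    history = []
    for _ in range(n_trials):
        alpha = rng.uniform(0.0, 10.0)  # proposal step (not Bayesian here)
        score = estimate_score(rows, alpha, sample_size, repeats, rng)
        history.append((alpha, score))
        if score > best_score:
            best_alpha, best_score = alpha, score
    return best_alpha, best_score, history


if __name__ == "__main__":
    data = make_data(100_000)
    n_trials, sample_size, repeats = 20, 200, 3
    best_alpha, best_score, _ = tune(data, n_trials, sample_size, repeats)
    print(f"best alpha ~ {best_alpha:.3f}, estimated score {best_score:.4f}")
    # Rows touched by the subsampled search vs. a full evaluation per trial.
    print("rows evaluated:", n_trials * sample_size * repeats,
          "vs. full:", n_trials * len(data))
```

In this toy setting, each trial touches `sample_size * repeats` rows rather than the whole dataset. That is the kind of saving the abstract describes. In a real system, the subsampled estimate would be fed back into a Bayesian surrogate model rather than a random proposer.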

Keywords

Advanced analytics; Bayesian optimization; Distributed evaluation; Hyper-parameter optimizations; Hyperparameters; Large dataset; Large datasets; Optimization; Pipeline performance; Pipelines; Scalable solution; State of the art

Quality index

Bibliometric impact. Analysis of the contribution and dissemination channel

Regardless of the expected impact determined by the dissemination channel, it is important to highlight the actual observed impact of the contribution itself.

According to the different indexing agencies, the number of citations accumulated by this publication as of 2025-08-02 is:

WoS: 4
Scopus: 7

Leadership analysis of institutional authors

This work was carried out in international collaboration, specifically with researchers from the United States of America.