October 10, 2022
Proceedings Paper

Sample, estimate, tune: Scaling Bayesian auto-tuning of data science pipelines

Published in: Proceedings - 2017 International Conference on Data Science and Advanced Analytics, DSAA 2017. 2018-January, pp. 361-372 - 2017-01-01, DOI: 10.1109/DSAA.2017.82

Authors: Anderson, Alec; Dubois, Sebastien; Cuesta-Infante, Alfredo; Veeramachaneni, Kalyan

Affiliations

MIT, LIDS, 77 Massachusetts Ave, Cambridge, MA 02139, USA - Author
Stanford University, Palo Alto, CA 94304, USA - Author
Universidad Rey Juan Carlos, Madrid, Spain - Author

Abstract

In this paper, we describe a system for sequential hyperparameter optimization that scales to work with complex pipelines and large datasets. Currently, the state of the art in hyperparameter optimization improves on randomized and grid search by using sequential Bayesian optimization to explore the space of hyperparameters in a more informed way. These methods, however, are not scalable, as the entire data science pipeline must still be evaluated on all the data. By designing a subsampling-based approach to estimate pipeline performance, along with a distributed evaluation system, we provide a scalable solution, which we illustrate using complex image and text data pipelines. For three pipelines, we show that we are able to gain similar performance improvements while computing on substantially less data. © 2017 IEEE.
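The subsampling idea in the abstract can be illustrated with a minimal sketch. This is not the authors' system: the toy dataset, the threshold "pipeline", and the names `make_dataset`, `score`, `estimate_score`, and `tune` are all hypothetical, and a fixed list of candidate settings stands in for the sequential Bayesian optimizer. It only shows how a pipeline's score might be estimated from small random subsamples instead of the full data.

```python
import random


def make_dataset(n, seed=0):
    """Toy 1-D data (assumption): label is 1 when x > 0.6, with 10% label noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.6)
        if rng.random() < 0.1:
            y = 1 - y  # flip the label to simulate noise
        data.append((x, y))
    return data


def score(threshold, rows):
    """Accuracy of a hypothetical one-step pipeline: predict 1 when x > threshold."""
    correct = sum(int(x > threshold) == y for x, y in rows)
    return correct / len(rows)


def estimate_score(threshold, data, sample_size, n_repeats, rng):
    """Estimate the full-data score by averaging over random subsamples."""
    scores = [
        score(threshold, rng.sample(data, sample_size))
        for _ in range(n_repeats)
    ]
    return sum(scores) / n_repeats


def tune(data, candidates, sample_size=500, n_repeats=5, seed=1):
    """Pick the candidate with the best subsample-based estimate.

    A fixed candidate list replaces the Bayesian proposal step here.
    """
    rng = random.Random(seed)
    estimates = {
        t: estimate_score(t, data, sample_size, n_repeats, rng)
        for t in candidates
    }
    best = max(estimates, key=estimates.get)
    return best, estimates


data = make_dataset(20000)
candidates = [0.1, 0.3, 0.5, 0.6, 0.7, 0.9]
best, estimates = tune(data, candidates)

# Compare the cheap estimate with the score computed on all the data.
full_score = score(best, data)
print(f"best threshold: {best}")
print(f"subsample estimate: {estimates[best]:.3f}, full-data score: {full_score:.3f}")
```

Each candidate here is scored on 5 subsamples of 500 rows, so it touches 2,500 rows rather than all 20,000. In a sequential optimizer, the estimates would feed the model that proposes the next hyperparameter setting.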

Keywords

Advanced analytics; Bayesian optimization; Distributed evaluation; Hyper-parameter optimizations; Hyperparameters; Large dataset; Large datasets; Optimization; Pipeline performance; Pipelines; Scalable solution; State of the art

Quality index

Bibliometric impact. Analysis of the contribution and dissemination channel

Regardless of the expected impact determined by the dissemination channel, it is important to highlight the actual observed impact of the contribution itself.

According to the different indexing agencies, the number of citations accumulated by this publication as of 2025-08-02 is:

  • WoS: 4
  • Scopus: 7

Impact and social visibility

From the perspective of influence or social adoption, and based on metrics associated with mentions and interactions provided by agencies specializing in calculating the so-called "Alternative or Social Metrics," we can highlight as of 2025-08-02:

  • Academic use, measured by the Altmetric indicator that counts additions to the Mendeley personal reference manager, gives a total of: 18.
  • Bookmarks, code forks, additions to favorite lists for recurrent reading, and general views suggest that readers are using the publication as a basis for their current work, which may be an early indicator of future formal academic citations. The "Capture" indicator reflects this activity and yields a total of: 19 (PlumX).

For dissemination aimed at more general audiences, broader scores are also available, such as:

  • The Total Score from Altmetric: 3.

Leadership analysis of institutional authors

This work has been carried out with international collaboration, specifically with researchers from: United States of America.