FUSION
FUnctionality Sharing In Open eNvironments
Heinz Nixdorf Chair for Distributed Information Systems
 

Machine Learning Pipelines: Provenance, Reproducibility and FAIR Data Principles

Title: Machine Learning Pipelines: Provenance, Reproducibility and FAIR Data Principles
Authors: Sheeba Samuel, Frank Löffler and Birgitta König-Ries
Source: Provenance Week 2020
Place: Charlotte, North Carolina, USA
Date: 2020-06-22
Type: Publication
Abstract:

Machine learning (ML) is an increasingly important scientific tool supporting decision making and knowledge generation in numerous fields. With this, it also becomes more and more important that the results of ML experiments are reproducible. Unfortunately, that often is not the case. Rather, ML, similar to many other disciplines, faces a reproducibility crisis. In this paper, we describe our goals and initial steps in supporting the end-to-end reproducibility of ML pipelines. We investigate which factors beyond the availability of source code and datasets influence reproducibility of ML experiments. We propose ways to apply FAIR data practices to ML workflows. We present our preliminary results on the role of our tool, ProvBook, in capturing and comparing provenance of ML experiments and their reproducibility using Jupyter Notebooks.

File: Machine Learning Pipelines: Provenance, Reproducibility and FAIR Data Principles
URL: http://arxiv.org/abs/2006.12117
BibTeX:
@article{DBLP:journals/corr/abs-2006-12117,
  author    = {Sheeba Samuel and
               Frank L{\"{o}}ffler and
               Birgitta K{\"{o}}nig{-}Ries},
  title     = {Machine Learning Pipelines: Provenance, Reproducibility and {FAIR}
               Data Principles},
  journal   = {CoRR},
  volume    = {abs/2006.12117},
  year      = {2020},
  url       = {https://arxiv.org/abs/2006.12117},
  archivePrefix = {arXiv},
  eprint    = {2006.12117},
  timestamp = {Tue, 23 Jun 2020 17:57:22 +0200},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2006-12117.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}