FUSION
FUnctionality Sharing In Open eNvironments
Heinz Nixdorf Chair for Distributed Information Systems
 

Reproducibility of Machine Learning Experiments given the provenance data (taken)

Title: Reproducibility of Machine Learning Experiments given the provenance data (taken)
Author(s): Tarek Al Mustafa
Supervisor(s): Dr. Sheeba Samuel, Prof. Dr. Birgitta König-Ries
School: Friedrich-Schiller-Universität Jena
Thesis Type: Bachelor
Date: 2021-05-17
Abstract: The Oxford Dictionary defines provenance as “the source or origin of an object; its history or pedigree”. The provenance of a data product is its description together with an explanation of how and why it reached its current state. Machine Learning (ML) is an emerging tool that is being applied in areas such as medicine, computer vision, security, and privacy. The task in this project is to understand and develop a solution for reproducing ML experiments from provenance data, which includes model specifications, hyperparameters, etc. The task involves creating a checklist/data model that specifies the mandatory requirements for reproducing ML experiments. Based on this data model, the solution generates code (e.g., for TensorFlow or PyTorch) from the provenance data/model specifications. The generated code is then run to reproduce the experiment, and the results are compared across the different ML libraries.
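The workflow described in the abstract (check a provenance record against a checklist, then generate library code from it) could look roughly like the following Python sketch. It is an illustration only, not the thesis's actual data model or generator: the field names in `REQUIRED_FIELDS`, the helper names `missing_fields` and `generate_pytorch_source`, and the restriction to a simple fully connected network are all assumptions made for this example.

```python
# Hypothetical checklist: fields assumed to be mandatory for reproduction.
REQUIRED_FIELDS = ("layers", "optimizer", "learning_rate", "epochs", "seed")


def missing_fields(record: dict) -> list:
    """Return the checklist fields absent from the record, in checklist order."""
    return [name for name in REQUIRED_FIELDS if name not in record]


def generate_pytorch_source(record: dict) -> str:
    """Render PyTorch setup code (as text) from a provenance record.

    Assumes a plain feed-forward network described by a list of layer sizes.
    """
    missing = missing_fields(record)
    if missing:
        raise ValueError(f"provenance record is missing: {missing}")

    sizes = record["layers"]
    lines = [
        "import torch",
        "import torch.nn as nn",
        "",
        f"torch.manual_seed({record['seed']})",
        f"EPOCHS = {record['epochs']}",
        "",
        "model = nn.Sequential(",
    ]
    # One Linear layer per consecutive pair of sizes, ReLU between hidden layers.
    for i, (n_in, n_out) in enumerate(zip(sizes, sizes[1:])):
        lines.append(f"    nn.Linear({n_in}, {n_out}),")
        if i < len(sizes) - 2:
            lines.append("    nn.ReLU(),")
    lines.append(")")
    lines.append(
        f"optimizer = torch.optim.{record['optimizer']}"
        f"(model.parameters(), lr={record['learning_rate']})"
    )
    return "\n".join(lines)


if __name__ == "__main__":
    # Example provenance record; the values are made up for illustration.
    record = {
        "layers": [4, 8, 3],
        "optimizer": "SGD",
        "learning_rate": 0.01,
        "epochs": 5,
        "seed": 42,
    }
    print(generate_pytorch_source(record))
```

A second generator targeting TensorFlow could consume the same record, which is what would make a comparison of results across libraries possible.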