Tracking Provenance in Machine Learning Scripts (taken)
Title: | Tracking Provenance in Machine Learning Scripts (taken) |
---|---|
Author(s): | Dominik Kerzel |
Supervisor(s): | Dr. Sheeba Samuel, Prof. Dr. Birgitta König-Ries |
School: | Friedrich-Schiller-Universität Jena |
Thesis Type: | Bachelor |
Date: | 2021-04-01 |
Abstract: | According to the Oxford Dictionary, provenance is defined as “the source or origin of an object; its history or pedigree”. The provenance of a data product is a description of that product together with an explanation of how and why it reached its current state. Machine Learning (ML) is increasingly applied in areas such as medicine, computer vision, security, and privacy. The rapid growth in the volume of data and scripts creates a need to track provenance in ML scripts. The task of this thesis is to automatically identify, from the scripts themselves, the relationships between data and ML models: in particular, how can one track which datasets and columns were used to derive the features of an ML model? The solution developed will track the provenance of ML scripts and generate a report on it. Initially, it will capture this information from Python scripts. It will be evaluated on ML scripts available on GitHub. |
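The core task described in the abstract, linking datasets and columns to the model features derived from them, can be illustrated with a minimal static-analysis sketch. This is not the method developed in the thesis. It assumes Python 3.9+ and uses only the standard-library `ast` module, and it recognizes just three patterns: a `pd.read_csv(path)` assignment, a column subscript such as `df["col"]` or `df[["a", "b"]]`, and a `model.fit(X, y)` call. The function name `extract_provenance`, the example script, and the file `patients.csv` are hypothetical.

```python
import ast

# Hypothetical ML script to analyze; the file name and columns are illustrative only.
EXAMPLE_SCRIPT = '''
import pandas as pd
from sklearn.linear_model import LogisticRegression

df = pd.read_csv("patients.csv")
X = df[["age", "bmi"]]
y = df["diagnosis"]
model = LogisticRegression()
model.fit(X, y)
'''


def extract_provenance(source: str) -> list:
    """Hypothetical helper: map each fit() argument back to its dataset and columns."""
    tree = ast.parse(source)
    datasets = {}  # variable name -> path passed to read_csv
    columns = {}   # variable name -> (source dataframe variable, [column names])
    fits = []      # (model variable, [argument variable names])

    for node in ast.walk(tree):
        if (isinstance(node, ast.Assign) and len(node.targets) == 1
                and isinstance(node.targets[0], ast.Name)):
            target, value = node.targets[0].id, node.value
            # Pattern 1: df = pd.read_csv("file.csv")
            if (isinstance(value, ast.Call)
                    and isinstance(value.func, ast.Attribute)
                    and value.func.attr == "read_csv"
                    and value.args
                    and isinstance(value.args[0], ast.Constant)):
                datasets[target] = value.args[0].value
            # Pattern 2: X = df["col"] or X = df[["a", "b"]] (Python 3.9+ slice layout)
            elif isinstance(value, ast.Subscript) and isinstance(value.value, ast.Name):
                key = value.slice
                if isinstance(key, ast.Constant):
                    names = [key.value]
                elif isinstance(key, ast.List):
                    names = [e.value for e in key.elts if isinstance(e, ast.Constant)]
                else:
                    names = []
                columns[target] = (value.value.id, names)
        # Pattern 3: model.fit(X, y)
        elif (isinstance(node, ast.Call)
              and isinstance(node.func, ast.Attribute)
              and node.func.attr == "fit"
              and isinstance(node.func.value, ast.Name)):
            args = [a.id for a in node.args if isinstance(a, ast.Name)]
            fits.append((node.func.value.id, args))

    # Join the three maps: model <- argument variable <- columns <- dataset file.
    report = []
    for model, args in fits:
        for arg in args:
            if arg in columns:
                frame, names = columns[arg]
                report.append((model, datasets.get(frame), names))
    return report


if __name__ == "__main__":
    for entry in extract_provenance(EXAMPLE_SCRIPT):
        print(entry)
    # ('model', 'patients.csv', ['age', 'bmi'])
    # ('model', 'patients.csv', ['diagnosis'])
```

A practical tracker would also have to follow aliasing, function calls, and transformations such as merges or dropped rows, which a purely pattern-based pass like this one cannot see; the sketch only shows how a script's syntax tree can be joined into a dataset-to-model report.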