FUnctionality Sharing In Open eNvironments
Heinz Nixdorf Chair for Distributed Information Systems

A Provenance-based Semantic Approach to Support Understandability, Reproducibility, and Reuse of Scientific Experiments

Title: A Provenance-based Semantic Approach to Support Understandability, Reproducibility, and Reuse of Scientific Experiments
Author(s): Sheeba Samuel
Supervisor(s): Birgitta König-Ries
Thesis Type: PhD Dissertation
Publication Place: Friedrich Schiller University Jena
Date: 2019-12-20
Abstract: Understandability and reproducibility of scientific results are vital in every field of science. The scientific community is interested in the results of experiments which are understandable, reproducible and reusable. Recently, there has been a rapidly growing awareness in different scientific disciplines of the importance of reproducibility. Several reproducibility measures are being taken to make the data used in publications findable and accessible. However, these measures are usually taken when the papers are published online. Scientists face many challenges from the beginning of an experiment to its end, in particular in data management. The explosive growth of heterogeneous research data, and the need to understand how this data has been derived, is one of the research problems faced in this context. Provenance, which describes the origin of data, plays a key role in tackling this problem by helping scientists understand how results are derived. Interlinking the data, the steps and the results from the computational and non-computational processes of a scientific experiment is important for reproducibility. The lack of tools which fully address this requirement is the driving force behind this research work. Working towards this goal, we introduce the notion of “end-to-end provenance management” of scientific experiments to help scientists understand and reproduce experimental results. The main contributions of this thesis are: (1) We propose a provenance model, “REPRODUCE-ME”, to describe scientific experiments using semantic web technologies by extending existing standards. (2) We study computational reproducibility and the important aspects required to achieve it. (3) Taking into account the REPRODUCE-ME provenance model and the study on computational reproducibility, we introduce our tool, ProvBook, which is designed and developed to demonstrate computational reproducibility. It provides features to capture and store the provenance of Jupyter notebooks and helps scientists compare and track the results of different executions. (4) We provide a framework, CAESAR (CollAborative Environment for Scientific Analysis with Reproducibility), for end-to-end provenance management. This collaborative framework allows scientists to capture, manage, query and visualize the complete path of a scientific experiment consisting of computational and non-computational steps in an interoperable way. We apply our contributions to a set of scientific experiments in microscopy research projects.
File: PhDDissertation_SheebaSamuel
URL: https://doi.org/10.22032/dbt.40396