Machine Learning Pipelines: Provenance, Reproducibility and FAIR Data Principles.
Title: | Machine Learning Pipelines: Provenance, Reproducibility and FAIR Data Principles. |
---|---|
Authors: | Sheeba Samuel, Frank Löffler, Birgitta König-Ries |
Source: | Provenance Week 2021 |
Place: | Provenance and Annotation of Data and Processes - 8th and 9th International Provenance and Annotation Workshop, IPAW 2020 + IPAW 2021, Virtual Event, July 19-22, 2021 |
Date: | 2021-07-19 |
Type: | Publication |
Abstract: | Machine learning (ML) is an increasingly important scientific tool supporting decision making and knowledge generation in numerous fields. With this, it also becomes more and more important that the results of ML experiments are reproducible. Unfortunately, that often is not the case. Rather, ML, similar to many other disciplines, faces a reproducibility crisis. In this paper, we describe our goals and initial steps in supporting the end-to-end reproducibility of ML pipelines. We investigate which factors beyond the availability of source code and datasets influence reproducibility of ML experiments. We propose ways to apply FAIR data practices to ML workflows. We present our preliminary results on the role of our tool, ProvBook, in capturing and comparing provenance of ML experiments and their reproducibility using Jupyter Notebooks. We also present the ReproduceMeGit tool to analyze the reproducibility of ML pipelines described in Jupyter Notebooks. |
URL: | https://doi.org/10.1007/978-3-030-80960-7_17 |
BibTeX: | @InProceedings{samuel2021machine, author="Samuel, Sheeba and L{\"o}ffler, Frank and K{\"o}nig-Ries, Birgitta", editor="Glavic, Boris and Braganholo, Vanessa and Koop, David", title="Machine Learning Pipelines: Provenance, Reproducibility and FAIR Data Principles", booktitle="Provenance and Annotation of Data and Processes", year="2021", publisher="Springer International Publishing", address="Cham", pages="226--230", isbn="978-3-030-80960-7" } |