Integrative Data Management and Processing

Project Description

The overall goal of this project is to provide a platform for knowledge management, sharing, and processing that will enable the CRC/TR to store data sustainably, to access and process data efficiently, and to explore new, integrative, and reproducible methods for generating knowledge from data. Such a platform will not only benefit the work of individual researchers but will also be crucial for communicating data and knowledge across projects and CRC/TR funding periods. Moreover, it is needed to guarantee the sustainability of the CRC/TR results beyond the funding period. Providing such a platform is challenging not only because of the huge expected data volume, but also because of the heterogeneity of the data in terms of type and processing needs, and because in most projects raw data is extensively processed with proprietary software before it can be used to answer research questions.

The first phase must address three main research issues: (1) The design and implementation of a data storage concept that guarantees high reliability and scales to large amounts of data. This requires a hierarchical concept that takes into account the different requirements of individual projects regarding the efficiency of data access. (2) The design and implementation of a metadata database that stores information describing the acquisition, quality, provenance, and interpretation of data, including links to the actual data and processes. (3) The design and implementation of a collaborative platform, as part of a virtual research environment, that enables the efficient analysis of data across the participating institutions in Jena and Würzburg. The goal is to make this data analysis possible interactively for a large class of relevant problems and to open avenues for cross-project exploration and knowledge generation.
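To make issue (2) more concrete, the following is a minimal sketch of what a single metadata entry and a lookup of its processing history could look like. It is an illustration only, not the project's actual schema: the class names `MetadataRecord` and `MetadataStore`, all field names, and the example URI are assumptions introduced here.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MetadataRecord:
    """Hypothetical metadata entry describing one dataset."""
    dataset_uri: str      # link to the actual data (URI scheme is an assumption)
    acquisition: str      # how the data was acquired
    quality: str          # free-text quality assessment
    interpretation: str = ""
    # Ordered list of processing steps applied to the data (its provenance).
    provenance: List[str] = field(default_factory=list)


class MetadataStore:
    """Hypothetical in-memory stand-in for the metadata database."""

    def __init__(self) -> None:
        self._records: Dict[str, MetadataRecord] = {}

    def register(self, record: MetadataRecord) -> None:
        # Index each record by the URI of the data it describes.
        self._records[record.dataset_uri] = record

    def add_processing_step(self, dataset_uri: str, step: str) -> None:
        # Append a step so the processing history stays in order.
        self._records[dataset_uri].provenance.append(step)

    def lineage(self, dataset_uri: str) -> List[str]:
        # Return a copy so callers cannot alter the stored history.
        return list(self._records[dataset_uri].provenance)


store = MetadataStore()
store.register(MetadataRecord(
    dataset_uri="file:///example/raw/image_001.tif",  # made-up location
    acquisition="confocal microscopy",
    quality="unchecked",
))
store.add_processing_step("file:///example/raw/image_001.tif", "deconvolution")
```

Keeping the record separate from the data it points to reflects the description above: the metadata database stores descriptions and links, while the data itself remains in the storage layer.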

Rather than setting up such a platform from scratch, the solution will be based on a state-of-the-art data management platform that already provides a rich set of core functionalities. However, significant research effort is needed to adapt and extend it to meet the specific requirements of the CRC/TR. By considerably extending a popular platform with cutting-edge data management and processing capabilities, this project will benefit not only the CRC/TR but also the wider research community.

Beyond the technological development of the platform, another core topic of this project will be the design and implementation of innovative training concepts to ensure that the participating CRC/TR scientists thoroughly understand the issues related to data management and processing.

Over the 12-year course of the CRC/TR, we intend to develop a central software platform to store, manage, and share the relevant data and processing information of all projects, ensuring the reproducibility and sustainability of research results. In addition, the platform will allow interactive access to, and systematic analysis of, the large and heterogeneous data. Thus, this project will not only provide CRC/TR scientists with a tool to organize and analyze their data, but also foster the exchange of tools and the sharing of knowledge.

| Title | Year | Authors | Journal / Venue |
|---|---|---|---|
| Combining P-Plan and the REPRODUCE-ME ontology to achieve semantic enrichment of scientific experiments using interactive notebooks | 2018 | Samuel, S. and König-Ries, B. | Posters & Demo Track at the 5th Extended Semantic Web Conference (ESWC), Crete, Greece |
| The story of an experiment: A provenance-based semantic approach towards research reproducibility | 2018 | Samuel, S., Groeneveld, K., Taubert, F., Walther, D., Kache, T., Langenstück, T., König-Ries, B., Bücker, H. M., and Biskup, C. | 11. Intl. Conf. on Semantic Web Applications and Tools for Health Care and Life Sciences, Antwerp, Belgium |
| ProvBook: Provenance-based semantic enrichment of interactive notebooks for reproducibility | 2018 | Samuel, S. and König-Ries, B. | Posters & Demo Track at the 17th International Semantic Web Conference (ISWC), Monterey, California |
| Towards Reproducibility of Microscopy Experiments | 2017 | Samuel, S., Taubert, F., Walther, D., König-Ries, B., and Bücker, H. M. | D-Lib Magazine |
| REPRODUCE-ME: Ontology-based Data Access for Reproducibility of Microscopy Experiments | 2017 | Samuel, S. and König-Ries, B. | 14th European Semantic Web Symposium (ESWS) |
| On the reproducibility of biological image workflows by annotating computational results automatically | 2017 | Taubert, F. and Bücker, H. M. | IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 1538-1545 |
| Automatic differentiation of computer programs in the time and frequency domain | 2017 | Bücker, H. M. and Walther, D. | Proceedings of the 2017 European Conference on Electrical Engineering and Computer Science EECS, Bern, Switzerland, November 17–19, 2017, 335–340, Los Alamitos, CA, USA, 2017. IEEE Computer Society |
| Integrative data management for reproducibility of microscopy experiments | 2017 | Samuel, S. | PhD Symposium at 14th Extended Semantic Web Conference (ESWC), Portoroz, Slovenia |
| A quality management workflow proposal for a biodiversity data regulation repository | 2014 | Owonibi, M. and König-Ries, B. | Proc. of the 2nd International Workshop on Modeling and Management of Big Data (MOBiD'14) |
| RIOS: Efficient I/O in reverse direction | 2014 | Willkomm, J., Bischof, C. H., and Bücker, H. M. | Software: Practice and Experience |
| Explorative Analysis of Heterogeneous, Unstructured, and Uncertain Data: A Computer Science Perspective on Biodiversity Research | 2014 | Beckstein, C., Böcker, S., Bogdan, M., Bruehlheide, H., Bücker, H. M., Denzler, J., Dittrich, P., Grosse, I., Hinneburg, H., König-Ries, B., Löffler, F., Marz, M., Müller-Hannemann, M., Winter, M., and Zimmermann, W. | Proceedings of the 3rd International Conference on Data Management Technologies and Applications |
| A conceptual model for data management in the field of ecology | 2014 | Chamanara, J. and König-Ries, B. | Ecological Informatics |
| A new metric enabling an exact hypergraph model for the communication volume in distributed-memory parallel applications | 2013 | Fortmeier, O., Bücker, H. M., Fagginger Auer, B. O., and Bisseling, R. H. | Parallel Computing |
| Diverse or uniform? Intercomparison of two major German project databases for interdisciplinary functional biodiversity research | 2012 | Lotz, T., Nieschulze, J., Bendix, J., Dobbermann, M., and König-Ries, B. | Ecological Informatics |
| Solving a parameter estimation problem in a three-dimensional conical tube on a parallel and distributed software infrastructure | 2011 | Bücker, H. M., Fortmeier, O., and Petera, M. | J Comput Sci |
| Parallel re-initialization of level set functions on distributed unstructured tetrahedral grids | 2011 | Fortmeier, O. and Bücker, H. M. | J Comput Phys |
| EFCOSS: An interactive environment facilitating optimal experimental design | 2010 | Rasch, A. and Bücker, H. M. | ACM Transactions on Mathematical Software |
| Diane: A matchmaking-centered framework for automated service discovery, composition, binding, and invocation on the web | 2007 | Küster, U., König-Ries, B., Klein, M., and Obreiter, P. | International Journal of Electronic Commerce |