QUIS: Query Heterogeneous Data In-Situ
Start date: 2011-09-01 | Finish date: 2017-12-31 | Status: completed
Description
Data of interest are often found in a variety of data sources, many of which are not relational databases, but have their own data organization and query capabilities. To answer questions of interest, one has to run queries across data from these heterogeneous sources. The traditional approach is to perform multiple individual data transformation tasks, one per data source, to import the data into a common repository where they can be queried and analyzed. Drawbacks of this approach include the manual effort and the cost of transforming and importing potentially large data sets, and the lost opportunity to exploit any query facilities provided by the data sources.
QUIS (QUery In-Situ) proposes an approach for querying the data “in-situ” to the greatest extent possible, by taking the user query and transforming appropriate portions of it into corresponding query expressions on individual data sources. Realizing this approach requires the development of a unified query model. This model can extract sub-queries matching heterogeneous capabilities of individual sources, perform heterogeneous joins on intermediate results as necessary, and deal with barriers such as incompatible type systems en route.
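The idea of splitting one query across sources with different capabilities can be illustrated with a minimal Python sketch. It is not QUIS code and does not use the QUIS query language or API. The sources, table, columns, and function names (`query_sqlite`, `scan_csv`, `hash_join`, `run_example`) are all hypothetical and chosen for illustration. The sketch assumes two sources: a SQLite database that can evaluate a filter itself, and a CSV text that can only be scanned. The filter is pushed down to SQLite, the CSV values are cast to a compatible type, and the intermediate results are joined outside either source.

```python
import csv
import io
import sqlite3


def query_sqlite(conn, min_temp):
    """Sub-query for a source that can filter: push the predicate down as SQL."""
    rows = conn.execute(
        "SELECT station_id, temp FROM readings WHERE temp >= ?", (min_temp,)
    ).fetchall()
    return [{"station_id": sid, "temp": temp} for sid, temp in rows]


def scan_csv(text):
    """Sub-query for a source without query capabilities: scan everything.

    CSV yields strings, so station_id is cast to int to match the type
    used by the SQLite source (a simple case of type-system reconciliation).
    """
    reader = csv.DictReader(io.StringIO(text))
    return [{"station_id": int(r["station_id"]), "name": r["name"]} for r in reader]


def hash_join(left, right, key):
    """Join two intermediate results on `key` outside of either source."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def run_example():
    # Hypothetical relational source, built in memory for the example.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (station_id INTEGER, temp REAL)")
    conn.executemany(
        "INSERT INTO readings VALUES (?, ?)",
        [(1, 12.5), (2, 30.1), (3, 25.0)],
    )
    # Hypothetical file-based source.
    stations_csv = "station_id,name\n1,North\n2,South\n3,East\n"

    hot = query_sqlite(conn, 20.0)      # filter evaluated by the source
    stations = scan_csv(stations_csv)   # full scan, types reconciled
    conn.close()

    joined = hash_join(hot, stations, "station_id")
    return sorted(joined, key=lambda r: r["station_id"])


if __name__ == "__main__":
    for row in run_example():
        print(row)
```

In this sketch the data stays where it is: nothing is imported into a common repository first, and the only work done outside the sources is the type cast and the join.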
Early experiments have shown that QUIS almost eliminates the time to prepare the data while paying only a small cost in query execution time compared to a fully integrated, indexed, and loaded relational database.
The system is an open-source project maintained on GitHub. It consists of a query execution engine, GUI and command-line clients, and an R package, RQUIS, that provides the functionality from inside R.
A set of executable versions and their documentation is available online. A Docker image is also provided for easy installation on Linux machines and on private or public clouds. You can pull the image by running this command in your terminal: docker pull javadch/rquis