SciQL: A Query Language for Unified Scientific Data Processing and Management

Title: SciQL: A Query Language for Unified Scientific Data Processing and Management
Authors: Javad Chamanara, Birgitta König-Ries
Source: PIKM '12, Proceedings of the 5th Ph.D. workshop on Information and knowledge
Place: ACM, New York, USA
Date: 2012-11-01
Type: Conference Paper
Abstract:

Science is increasingly data-driven. This means that a significant part of a scientist’s work is dedicated to accessing, visualizing, integrating, and analyzing data from a potentially wide range of heterogeneous sources. In this paper, we propose SciQL, a query language that supports scientists in this task and allows them to focus on their main purpose, i.e., doing research.

SciQL sits between scientists or data processing tools on the one hand and different data sources on the other, in order to decouple users from the technical aspects of accessing data. It allows users to express their data management, refinement, transformation, and processing procedures, as well as visualizations, in SciQL regardless of the syntax and capabilities of the underlying physical data sources. This way, scientists and client tools deal with only one language to interact with different data sources, e.g., text files, spreadsheets, relational DBMSs, or MapReduce systems. To achieve this, SciQL provides various constructs, among them Schema Definition (e.g., schema design and data transformation), Data Retrieval (connecting to various data sources and formats, filtering, joining, grouping), Data Manipulation (e.g., updating, deleting, versioning, and provenance), and Visualization commands and data structures.

In this paper, we discuss why we believe SciQL is needed and explain our goals and the steps we intend to take to achieve them.
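
The decoupling idea in the abstract (one query description, several kinds of physical data source) can be illustrated with a small Python sketch. This is not SciQL syntax and not the authors' implementation: the `Query` class, the two adapter functions, the `readings` table, and the sample data are all hypothetical names chosen for illustration. The sketch only shows how the same filter could be evaluated against a text file and a relational database without the caller changing how the query is expressed.

```python
import csv
import io
import sqlite3
from dataclasses import dataclass
from typing import Any, Dict, List

# Hypothetical, source-independent query description (NOT SciQL syntax).
@dataclass
class Query:
    field: str      # numeric column to filter on
    minimum: float  # keep rows where field >= minimum

# Columns the SQL adapter may interpolate into a statement. Column names
# cannot be bound as SQL parameters, so they are checked against a whitelist.
ALLOWED_FIELDS = {"temp"}


def run_on_csv(query: Query, text: str) -> List[Dict[str, Any]]:
    """Adapter for CSV text: the filter is applied row by row in Python."""
    rows = csv.DictReader(io.StringIO(text))
    return [
        {"site": r["site"], "temp": float(r["temp"])}
        for r in rows
        if float(r[query.field]) >= query.minimum
    ]


def run_on_sqlite(query: Query, conn: sqlite3.Connection) -> List[Dict[str, Any]]:
    """Adapter for a relational source: the filter is pushed down as SQL."""
    if query.field not in ALLOWED_FIELDS:
        raise ValueError(f"unsupported field: {query.field}")
    cur = conn.execute(
        f"SELECT site, temp FROM readings WHERE {query.field} >= ? ORDER BY rowid",
        (query.minimum,),
    )
    return [{"site": site, "temp": temp} for site, temp in cur]


# Made-up sample data, stored once as CSV text and once in an in-memory database.
CSV_TEXT = "site,temp\nA,12.5\nB,18.0\nC,21.5\n"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (site TEXT, temp REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("A", 12.5), ("B", 18.0), ("C", 21.5)],
)

# The same query object is handed to both adapters.
query = Query(field="temp", minimum=15.0)
csv_result = run_on_csv(query, CSV_TEXT)
db_result = run_on_sqlite(query, conn)
```

In this sketch the caller builds one `Query` and never sees whether it is answered by a Python loop or by a SQL `WHERE` clause, which mirrors the separation between users and data sources that the abstract describes.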