FUnctionality Sharing In Open eNvironments
Heinz Nixdorf Chair for Distributed Information Systems
Title: Informationsintegration (Information Integration)
Lecturers: Alsayed Algergawy
Starts on: 2021-10-18
Ends on: 2022-02-11
Time and Location:


Data integration is the process of combining data from several disparate data sources. It has become one of the key challenges in most IT projects. Because these data sources are engineered and developed independently by different people, they contain a large number of heterogeneities. Providing a unified view of these data sources therefore requires dealing with different kinds of heterogeneity.

In this course, students will learn techniques and methodologies for integrating data from large sets of heterogeneous data sources. The course will cover the following topics:
– Importance of data integration
– Physical and virtual data integration
– Data and semantic heterogeneities
– String, schema, and data matching
– Web data integration


Literature
– AnHai Doan, Alon Halevy, Zachary Ives: Principles of Data Integration. Morgan Kaufmann, 2012.
– Ulf Leser, Felix Naumann: Informationsintegration. Dpunkt Verlag, 2007.
– Luna Dong, Divesh Srivastava: Big Data Integration. Morgan & Claypool, 2015.
– Serge Abiteboul et al.: Web Data Management. Cambridge University Press, 2012.
– Jérôme Euzenat, Pavel Shvaiko: Ontology Matching. Springer, 2007.
– Felix Naumann, Melanie Herschel: An Introduction to Duplicate Detection. Morgan & Claypool, 2010.

Software and tools
– Data Matching software
– Web Data INTEgRation Framework (WInte.r)
– Minoan: Entity Resolution (ER) framework
– HoloClean: A Machine Learning System for Data Enrichment