3260 papers • 126 benchmarks • 313 datasets
Data integration (also called information integration) is the process of consolidating data from a set of heterogeneous data sources into a single uniform dataset (materialized integration) or into a unified view of the data (virtual integration). Data integration pipelines involve subtasks such as schema matching, table annotation, entity resolution, value normalization, data cleansing, and data fusion. Application domains of data integration include data warehousing, data lakes, and knowledge base consolidation. Surveys on data integration: Dong, Srivastava: Big Data Integration, 2013; Doan, Halevy, Ives: Principles of Data Integration, 2012.
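To make the pipeline subtasks concrete, the following minimal sketch combines three of them over two hypothetical sources with different schemas: value normalization (cleaning company names), entity resolution (fuzzy matching of normalized names), and data fusion (merging matched records into one). The source names, field names, threshold, and data are illustrative assumptions, not from any specific system; the matching uses Python's standard-library `difflib` as a stand-in for a real similarity measure.

```python
from difflib import SequenceMatcher

# Two heterogeneous sources with different schemas (hypothetical data).
source_a = [
    {"name": "ACME Corp.", "phone": "555-0100"},
    {"name": "Globex Inc", "phone": "555-0199"},
]
source_b = [
    {"company": "acme corporation", "city": "Springfield"},
    {"company": "Initech", "city": "Austin"},
]

def normalize(value):
    """Value normalization: lowercase, drop punctuation and legal suffixes."""
    value = value.lower().replace(".", "").replace(",", "").strip()
    for suffix in ("corporation", "corp", "inc", "llc"):
        value = value.removesuffix(suffix).strip()
    return value

def resolve_and_fuse(a_records, b_records, threshold=0.85):
    """Entity resolution by fuzzy name match, then fusion of matched pairs."""
    fused, matched_b = [], set()
    for rec_a in a_records:
        key_a = normalize(rec_a["name"])
        best = None  # (score, index into b_records)
        for i, rec_b in enumerate(b_records):
            score = SequenceMatcher(None, key_a, normalize(rec_b["company"])).ratio()
            if score >= threshold and (best is None or score > best[0]):
                best = (score, i)
        merged = dict(rec_a)
        if best is not None:
            merged.update(b_records[best[1]])  # data fusion: union of attributes
            matched_b.add(best[1])
        fused.append(merged)
    # Unmatched records from source B survive into the integrated dataset.
    for i, rec_b in enumerate(b_records):
        if i not in matched_b:
            fused.append(dict(rec_b))
    return fused

result = resolve_and_fuse(source_a, source_b)
```

In this toy run, "ACME Corp." and "acme corporation" both normalize to "acme" and are fused into one record carrying both `phone` and `city`, while the unmatched records pass through unchanged; a materialized integration would persist `result`, whereas a virtual integration would compute such a view on demand.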