Data Integration

Maintain high-quality, cross-referential data to provide holistic data insights

What is Data Integration?

Data integration is a joint effort of technical and business processes in which disparate data sources are combined to give business users meaningful information and help them make the right business decisions. The underlying technology of these sources may differ from system to system.

What is the need for Data Integration?

Wikipedia defines a data integration system as a triple ⟨G, S, M⟩, where G is the global (or mediated) schema, S is the heterogeneous set of source schemas, and M is the mapping that maps queries between the source and the global schemas. Both G and S are expressed in languages over alphabets composed of symbols for each of their respective relations. The mapping M consists of assertions between queries over G and queries over S. When users pose queries over the data integration system, they pose queries over G, and the mapping then asserts connections between the elements in the global schema and the source schemas.
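The triple can be illustrated with a small Python sketch. It assumes a global-as-view style mapping, which is only one of several ways to define M, and every relation name, column name, and record in it is hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Sources = Dict[str, List[Row]]

# S: two heterogeneous source schemas (hypothetical example data).
SOURCES: Sources = {
    "crm_customers": [{"cust_id": 1, "full_name": "Ada Lovelace"}],
    "billing_accounts": [{"acct_no": 2, "holder": "Alan Turing"}],
}

# G: the global (mediated) schema, here one relation customer(id, name).
GLOBAL_SCHEMA = {"customer": ("id", "name")}

# M: for each global relation, one query per source that returns
# rows in the shape the global schema expects.
MAPPING: Dict[str, List[Callable[[Sources], List[Row]]]] = {
    "customer": [
        lambda s: [{"id": r["cust_id"], "name": r["full_name"]}
                   for r in s["crm_customers"]],
        lambda s: [{"id": r["acct_no"], "name": r["holder"]}
                   for r in s["billing_accounts"]],
    ]
}


def query_global(relation: str) -> List[Row]:
    """Answer a query over G by running the mapped queries over S."""
    if relation not in GLOBAL_SCHEMA:
        raise KeyError(f"unknown global relation: {relation}")
    rows: List[Row] = []
    for source_query in MAPPING[relation]:
        rows.extend(source_query(SOURCES))
    return rows
```

Calling query_global("customer") returns rows from both sources under the same id and name columns, so the caller never needs to know how each source names its fields.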

A data integration solution consolidates new sources as they are added and provides a unified view of the data on which visualizations can be built.

Use case for Data Integration

An asset management firm holds asset data at the building and unit level. This information comes from different sources and in varied formats, such as text and Excel files. It needs to be integrated into a single destination so that reporting and visualization are possible. An SSIS package handled the ETL, loading fact and dimension tables through the staging, work, and target layers (in Azure). It included functionality for cleansing, transforming, and mapping the data, as well as for monitoring the integration flow itself (error handling, reporting, etc.).
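The flow through the staging, work, and target layers can be sketched in Python. This is not the SSIS package itself, only a minimal stand-in for the same steps. The sample file contents, column names, and target names are all hypothetical, and an in-memory list stands in for the Azure target.

```python
import csv
import io
from typing import Dict, List, Tuple

# Hypothetical unit-level extract; the real feeds are text and Excel files.
UNIT_FILE = """building_id,unit_no,area_sqft
B1, 101 ,850
B1,102,not_available
B2,201,1200
"""

# Hypothetical mapping from source column names to target column names.
COLUMN_MAP = {
    "building_id": "BuildingKey",
    "unit_no": "UnitNumber",
    "area_sqft": "AreaSqFt",
}


def extract(text: str) -> List[Dict[str, str]]:
    """Staging layer: load the raw rows unchanged."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(staged: List[Dict[str, str]]) -> Tuple[List[dict], List[Tuple[int, str]]]:
    """Work layer: cleanse, convert types, map columns, and reject bad rows."""
    clean: List[dict] = []
    errors: List[Tuple[int, str]] = []
    for line_no, row in enumerate(staged, start=2):  # line 1 is the header
        try:
            trimmed: dict = {k: v.strip() for k, v in row.items()}  # cleanse
            trimmed["area_sqft"] = float(trimmed["area_sqft"])      # transform
            clean.append({COLUMN_MAP[k]: v for k, v in trimmed.items()})  # map
        except (ValueError, KeyError, AttributeError) as exc:
            errors.append((line_no, str(exc)))  # kept for error reporting
    return clean, errors


def load(rows: List[dict], target: List[dict]) -> int:
    """Target layer: append the clean rows and return the count loaded."""
    target.extend(rows)
    return len(rows)
```

Running extract, transform, and load in sequence on UNIT_FILE loads the two valid units and reports the row with the non-numeric area as an error instead of stopping the whole run.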