Data integration in data warehouses

Author - Natalia Shakhovska, Ph.D.

The need for data integration arises from the heterogeneity of the software environment, the distributed nature of the organization, stricter security requirements, the need to maintain multiple listings of metadata, and the need to store and process large amounts of information efficiently.
Data integration is the unification of data that were originally entered into different systems. Most of these systems may reside on one local network but differ in platform and internal architecture.
The purpose of integration is to obtain a unified and coherent picture of corporate data about the subject area. Data integration can be described by a model that includes applications, products, technologies and methods.
There are three main methods of data integration: consolidation, federation (federalization) and distribution (also called propagation).
Data consolidation is the collection of data from remote data sources into the data warehouse for further processing and analysis.
One of the most common technologies supporting consolidation in data warehousing is ETL (extract, transform, load).
Another common consolidation technology is ECM (enterprise content management). Most ECM solutions are aimed at consolidating and managing unstructured data such as documents, reports and web pages.
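The ETL pattern mentioned above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the source records, the sales table and the function names are hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import sqlite3

# Hypothetical raw records, standing in for rows read from remote sources.
SOURCE_ROWS = [
    {"id": 1, "name": " alice ", "amount": "10.50"},
    {"id": 2, "name": "BOB", "amount": "7"},
]

def extract():
    """Extract: return raw records from the (simulated) source."""
    return list(SOURCE_ROWS)

def transform(rows):
    """Transform: trim and normalize names, convert amounts to numbers."""
    return [(r["id"], r["name"].strip().title(), float(r["amount"])) for r in rows]

def load(conn, rows):
    """Load: write the cleaned rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(id INTEGER PRIMARY KEY, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

def run_etl(conn):
    """Run the three stages in order."""
    load(conn, transform(extract()))

# An in-memory database plays the role of the warehouse in this sketch.
warehouse = sqlite3.connect(":memory:")
run_etl(warehouse)
```

After the run, the warehouse holds a cleaned physical copy of the data, which is the defining property of consolidation.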
Data federation (federalization) provides a single virtual picture of one or more primary data files.
The federation process always extracts data from the primary systems on demand, in response to external requests.
All necessary data conversions are performed while the data are extracted from the primary files. An example of federation is enterprise information integration (EII).
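The on-demand nature of this approach can be sketched as follows. The two sources, their field names and the function federated_customer_view are hypothetical and serve only to show that the unified record is assembled at request time, with conversion done during extraction and nothing stored centrally.

```python
# Two hypothetical primary sources with different structures.
CRM = {101: {"full_name": "Ann Lee", "city": "Lviv"}}
BILLING = [{"customer": 101, "total_cents": 2599}]

def federated_customer_view(customer_id):
    """Build one virtual record on request; no copy is kept anywhere."""
    crm = CRM.get(customer_id, {})
    # Conversion happens during extraction: cents become currency units.
    total = sum(
        b["total_cents"] for b in BILLING if b["customer"] == customer_id
    ) / 100
    return {
        "id": customer_id,
        "name": crm.get("full_name"),
        "city": crm.get("city"),
        "total": total,
    }

view = federated_customer_view(101)
```

Each call reads the primary sources again, so the view always reflects their current state, at the cost of querying them on every request.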

Data distribution (propagation) applications copy data from one place to another. These applications work online and push data to the destination.
They are event-driven. Updates in the primary system can be transmitted to the target system synchronously or asynchronously.
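The difference between synchronous and asynchronous transmission can be sketched with an in-process queue. The stores, the queue and the function names below are hypothetical; a real system would use a message broker or a replication tool rather than a thread.

```python
import queue
import threading

source_store = {}        # hypothetical primary system
target_store = {}        # hypothetical target system
pending = queue.Queue()  # changes waiting to be transmitted

def apply_to_target(key, value):
    target_store[key] = value

def update_sync(key, value):
    """Synchronous: the target is updated before the call returns."""
    source_store[key] = value
    apply_to_target(key, value)

def update_async(key, value):
    """Asynchronous: the change is queued and applied later."""
    source_store[key] = value
    pending.put((key, value))

def worker():
    """Background consumer that applies queued changes to the target."""
    while True:
        key, value = pending.get()
        apply_to_target(key, value)
        pending.task_done()

threading.Thread(target=worker, daemon=True).start()

update_sync("a", 1)
update_async("b", 2)
pending.join()  # wait until all queued changes have been applied
```

With the synchronous call the two systems never diverge, while the asynchronous call returns sooner and leaves a short window in which the target lags behind the source.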
Examples of technologies that support data distribution are enterprise application integration (EAI) and enterprise data replication (EDR). The methods used by data integration applications depend on both business needs and technological requirements.
Data integration applications very often use a so-called hybrid approach, which combines several integration methods.
An example of this approach is customer data integration (CDI), whose goal is to provide a coherent picture of customer information.
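One way a hybrid approach might look is sketched below, under the assumption that customer profiles are consolidated into a hub while order data are fetched on demand. The sources, the e-mail matching key and the function names are hypothetical and do not describe any particular CDI product.

```python
# Hypothetical sources holding fragments of customer information.
CRM_CUSTOMERS = [{"email": "Ann@Example.com", "name": "Ann Lee"}]
SHOP_CUSTOMERS = [{"email": "ann@example.com", "phone": "555-0100"}]
ORDERS = [{"email": "ann@example.com", "item": "book"}]

def consolidate():
    """Consolidation step: merge profiles into one hub keyed by e-mail."""
    hub = {}
    for record in CRM_CUSTOMERS + SHOP_CUSTOMERS:
        key = record["email"].lower()
        profile = hub.setdefault(key, {"email": key})
        profile.update({k: v for k, v in record.items() if k != "email"})
    return hub

def customer_picture(hub, email):
    """Federation step: attach orders read on demand from the order system."""
    key = email.lower()
    picture = dict(hub[key])
    picture["orders"] = [o["item"] for o in ORDERS if o["email"].lower() == key]
    return picture

hub = consolidate()
picture = customer_picture(hub, "ANN@example.com")
```

Stable profile attributes are stored once in the hub, and frequently changing data are read from their source only when a customer picture is requested.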