It is a centralized location where data from several sources is integrated. Different areas of the business use this data in various combinations to improve planning and make critical business decisions.
Where is it used?
A data warehouse is a core component of business intelligence. It is also called an enterprise data warehouse (EDW).
It is used for reporting and analysis. It stores historical data and also uses real-time data to generate business reports.
Below are some common sectors where data warehouses are used.
- Public sector: Here the data warehouse is used for gathering intelligence in government offices. It is also used for monitoring and analyzing records such as the health and tax records of individuals.
- Banking sector: It helps banks manage and analyze the resources available at the desk.
- Hospitality industry (hotels, restaurants): In this sector, the warehouse is used to promote the business and attract target customers.
- Health care: In this area, the warehouse helps to generate patient treatment reports.
- Airlines: Here the warehouse is used for analyzing the work assigned to airline crews.
- Insurance: In this sector, the warehouse helps to track market fluctuations.
How can a data warehouse benefit an organization?
Subject-oriented: Data collected here can be used to analyze a specific business subject.
For example, if the business wants to understand machine downtime and how to reduce it, the data warehouse can supply the times and situations in which machines stopped working and the reasons behind each stoppage.
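The downtime analysis can be sketched in a few lines of Python. This is only an illustration: the machine names, reasons, and minutes below are invented, and a real warehouse would supply the events through a query rather than a hard-coded list.

```python
from collections import defaultdict

# Hypothetical downtime events as they might be pulled from a warehouse table:
# (machine_id, reason, minutes_down). All values are illustrative.
DOWNTIME_EVENTS = [
    ("press-1", "power failure", 30),
    ("press-1", "tool change", 45),
    ("lathe-2", "tool change", 15),
    ("lathe-2", "power failure", 20),
    ("press-1", "tool change", 25),
]


def downtime_by_reason(events):
    """Sum the minutes of downtime recorded for each reason."""
    totals = defaultdict(int)
    for _machine, reason, minutes in events:
        totals[reason] += minutes
    # Largest contributor first, so the main cause stands out.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for reason, minutes in downtime_by_reason(DOWNTIME_EVENTS):
        print(f"{reason}: {minutes} min")
```

Sorting by total minutes puts the biggest cause of downtime at the top, which is usually the first thing to look at when trying to reduce it.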
Integrated: Data from different sources is integrated into one consistent set. For instance, if a company wants to prepare the budget for the next quarter, the data warehouse will have all the information required.
From incurred costs to depreciation costs, the entire set of data is available from a single source.
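As a rough sketch of what integration means in the budgeting case, the Python below merges cost data from two source systems into one record per department. The two systems, their field names, and all figures are assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical extracts from two separate source systems.
# Field names and amounts are assumptions made for illustration.
ACCOUNTING_ROWS = [
    {"dept": "Assembly", "incurred_cost": 12000.0},
    {"dept": "Paint", "incurred_cost": 8000.0},
]
ASSET_REGISTER_ROWS = [
    {"department": "Assembly", "depreciation": 1500.0},
    {"department": "Paint", "depreciation": 700.0},
    {"department": "Assembly", "depreciation": 500.0},
]


def integrate_costs(accounting_rows, asset_rows):
    """Combine both sources into one cost record per department."""
    combined = defaultdict(lambda: {"incurred_cost": 0.0, "depreciation": 0.0})
    for row in accounting_rows:
        combined[row["dept"]]["incurred_cost"] += row["incurred_cost"]
    for row in asset_rows:
        # The asset register names the column differently; map it to one key.
        combined[row["department"]]["depreciation"] += row["depreciation"]
    return dict(combined)


if __name__ == "__main__":
    costs_by_dept = integrate_costs(ACCOUNTING_ROWS, ASSET_REGISTER_ROWS)
    for dept, costs in sorted(costs_by_dept.items()):
        total = costs["incurred_cost"] + costs["depreciation"]
        print(f"{dept}: total {total:.2f}")
```

The point of the sketch is the mapping step: each source uses its own column name for the department, and the integrated result uses one.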
Time-variant: The company can use the historical data stored in the system at any time to extract relevant reports and understand the organization’s overall health.
However, frequently changing details, such as the addresses and phone numbers in an employee database, are generally left out because they are subject to change.
Non-volatile: Once data is entered, it remains the same. The firm must ensure that the data is well protected and that there is no chance of alteration, because any modification would affect the reports and analysis.
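One way to picture the non-volatile property is a table that accepts new rows but rejects changes to existing ones. The sketch below uses Python's built-in sqlite3 module with triggers; the table name, columns, and sample row are hypothetical, and real warehouse products enforce this in their own ways.

```python
import sqlite3


def create_read_only_fact_table(conn):
    """Create a fact table that accepts inserts but rejects changes."""
    conn.executescript(
        """
        CREATE TABLE sales_fact (
            sale_id INTEGER PRIMARY KEY,
            sale_date TEXT NOT NULL,
            amount REAL NOT NULL
        );
        CREATE TRIGGER sales_fact_no_update
        BEFORE UPDATE ON sales_fact
        BEGIN
            SELECT RAISE(ABORT, 'sales_fact is non-volatile');
        END;
        CREATE TRIGGER sales_fact_no_delete
        BEFORE DELETE ON sales_fact
        BEGIN
            SELECT RAISE(ABORT, 'sales_fact is non-volatile');
        END;
        """
    )


def try_update(conn):
    """Return True if an attempted update was blocked by the trigger."""
    try:
        conn.execute("UPDATE sales_fact SET amount = 0 WHERE sale_id = 1")
    except sqlite3.DatabaseError:
        return True
    return False


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_read_only_fact_table(conn)
    # Illustrative row; the date and amount are made up.
    conn.execute("INSERT INTO sales_fact VALUES (1, '2024-01-15', 250.0)")
    print("update blocked:", try_update(conn))
    print("stored amount:", conn.execute("SELECT amount FROM sales_fact").fetchone()[0])
```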
Improved data quality: A data warehouse helps to improve data quality by providing consistent, accurate data and by fixing bad data.
A data warehouse also has some drawbacks.
Cost vs. benefit: A data warehouse is an IT project that consumes many man-hours and a large share of the budget. Its implementation and maintenance are expensive.
Hence the benefit-to-cost ratio can be low. For a small or medium-sized organization, the expense may weigh heavily on its finances.
Data ownership: Data warehouses are often delivered as software-as-a-service applications, so the main concern is the security of the data.
You have to be sure that the people who handle and analyze customer data are employees your company trusts.
A leak of customers’ personal data, even within the organization, may cause problems for executives and damage the relationship between the company and its customers.
Data Rigidity: The data imported into a data warehouse is often a static data set with little flexibility, which limits its ability to answer a particular question.
Ad hoc queries against the warehouse can be difficult because of limited processing and query speed.
Miscalculation of ETL processing time: The entire process of data warehouse development, that is, the extraction, cleaning, and loading of consolidated data into the warehouse, takes considerable time.
Organizations often underestimate the time required for the ETL process, which leads to a backlog of work.
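A simple way to avoid guessing is to measure each stage. The sketch below is a toy pipeline, not a real ETL tool: the source rows, the cleaning rules, and the in-memory list standing in for the warehouse are all assumptions. It times the extract, clean, and load steps separately with `time.perf_counter`.

```python
import time

# Hypothetical raw records from a source system; values are illustrative.
RAW_ROWS = [
    {"customer": " Alice ", "amount": "120.50"},
    {"customer": "Bob", "amount": "n/a"},  # bad amount, dropped in cleaning
    {"customer": "carol", "amount": "75"},
]


def extract():
    """Pretend to read rows from a source system."""
    return list(RAW_ROWS)


def clean(rows):
    """Normalize names and drop rows whose amount is not numeric."""
    cleaned = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue
        cleaned.append({"customer": row["customer"].strip().title(), "amount": amount})
    return cleaned


def load(rows, warehouse):
    """Append cleaned rows to an in-memory stand-in for the warehouse."""
    warehouse.extend(rows)
    return len(rows)


def run_etl(warehouse):
    """Run the three stages and record how long each one took, in seconds."""
    timings = {}

    start = time.perf_counter()
    rows = extract()
    timings["extract"] = time.perf_counter() - start

    start = time.perf_counter()
    rows = clean(rows)
    timings["clean"] = time.perf_counter() - start

    start = time.perf_counter()
    load(rows, warehouse)
    timings["load"] = time.perf_counter() - start

    return timings


if __name__ == "__main__":
    warehouse = []
    for stage, seconds in run_etl(warehouse).items():
        print(f"{stage}: {seconds:.6f} s")
    print(f"rows loaded: {len(warehouse)}")
```

Timings gathered on a sample load like this can be scaled up to give a rough estimate for the full data volume, which is more reliable than a guess.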
Levels of data warehouse architecture
It comprises several levels. A few of them are as mentioned below:
- Data Source Layer
- Data Extraction Layer
- Staging Area
- ETL Layer
- Data Storage Layer
- Data Logic Layer
- Data Presentation Layer
- Metadata Layer
- System Operations Layer
Types of data warehouse architecture
There are three main types.
Single-tier architecture: This architecture is rarely used. It reduces the amount of data stored by avoiding redundancy.
In this type of architecture, only the source layer is physically available; the data warehouse is virtual. Logically, the single tier consists of the source layer, data warehouse layer, and analysis layer.
Two-tier architecture: It adds a data staging area, or ETL (extraction, transformation, and loading) layer, to the source layer.
The staging layer helps to merge data from diverse sources into one standard schema. This type of architecture consists of the source layer, data staging layer, data warehouse layer, and analysis layer.
Three-tier architecture: This architecture contains a reconciled layer along with the data staging and source layers.
Here the source layer contains multiple sources, and the data warehouse layer contains both data warehouses and data marts.
The role of the reconciled layer is to provide a standard data model for the entire enterprise. The reconciled layer can also be used for some operational tasks, such as reporting.
This architecture consists of the source layer, data staging area, reconciled layer, data warehouse layer, and analysis layer.
Types of data warehouse
The following are the three main types of data warehouse:
1. Enterprise Data Warehouse (EDW): It provides decision support across the enterprise and classifies data by subject.
2. Operational Data Store (ODS): It is refreshed frequently and supports routine operational activities, such as storing employee records.
3. Data Mart: It is a subset of the warehouse built for a particular line of business, such as sales or finance, and it can collect data directly from the sources.
Data warehouse tools
The following are a few popular data warehouse tools:
- Amazon Redshift
- Microsoft Azure
- CData Sync
- SAP HANA
- Amazon RDS
- Amazon S3
- MariaDB
Difference between database(DB) and data warehouse(DW)
Many people get confused between these two concepts. So here I am going to state the differences.
- A DW transfers and stores accumulated data for analytical purposes, whereas a DB collects data for multiple transactions.
- A DW is designed for accumulating and retrieving large data sets, whereas a DB is designed for fast read and write access.
- A DW makes it easier to analyze data collected and stored from multiple databases, whereas a DB is made for quickly recording and retrieving data.
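The contrast shows up in the kind of query each one typically serves. The sketch below uses Python's built-in sqlite3 module with a small, made-up `orders` table; it is meant only to illustrate the two access patterns, not to suggest that SQLite is a data warehouse.

```python
import sqlite3


def build_orders(conn):
    """Create and fill a small, hypothetical orders table."""
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO orders (order_id, region, amount) VALUES (?, ?, ?)",
        [(1, "North", 100.0), (2, "South", 40.0), (3, "North", 60.0)],
    )


def transactional_lookup(conn, order_id):
    """Database-style access: read one record quickly by its key."""
    return conn.execute(
        "SELECT order_id, region, amount FROM orders WHERE order_id = ?",
        (order_id,),
    ).fetchone()


def analytical_summary(conn):
    """Warehouse-style access: aggregate many records for analysis."""
    return conn.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_orders(conn)
    print(transactional_lookup(conn, 2))
    print(analytical_summary(conn))
```

The first function touches a single row, which is what a transactional database is tuned for. The second scans and summarizes the whole table, which is the typical workload of a data warehouse.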
Data warehouse history
In the 1950s, the American government and businesses started using punch cards to store computer-generated data. Punch cards remained in use until the 1980s.
In the 1960s, disk storage systems slowly came into the picture, and by 1964 these systems, known as ‘magnetic storage’, had become popular.
IBM was the first company to design and ship a hard disk drive, and it later introduced the floppy disk drive as well.
In 1966, IBM designed its own DBMS (database management system), called the ‘Information Management System’. It had the following features:
- Ability to find out the exact location of data
- Ability to solve the problem of locating more than one unit of data in the same place
- Ability to delete data
- Ability to access the data rapidly
- Ability to allocate space when the stored data does not fit in the specified place
In 1970, online applications came into the picture. People learned that data could be accessed directly and shared between computers.
After that, people started using personal computers, which changed the way work was done. Around the same time, 4GL (fourth-generation language) technology was invented.
The combination of personal computers and 4GL technology gave full freedom to end users. It gave them control over the computer system, allowing them to access their own data efficiently and rapidly. But they ran into the following problems:
- They were misled by incorrect data
- Old data was of no use
- Duplicated data caused confusion
As a solution to these problems, relational databases came into use in the 1980s. They used SQL (Structured Query Language) as their query language.
Businesses started assigning personal computers to employees, and office applications (MS Word, MS Excel, MS Office) came into wide use.
In the 1990s, an important change took place: the rise of the internet. The internet became very popular, and businesses came under pressure from globalization, computerization, and networking.
Around 2000, businesses found that they needed good integration between systems and consistent data to get the accurate business information required for proper decision-making.
As databases and application systems expanded, getting consistent data became difficult. Businesses developed the data warehouse to meet these needs.