Contents
Data Warehouse
Data Warehouse is a type of data management system designed to support business intelligence (BI) activities. The Data Warehouse course is a compulsory subject in the Information Systems Study Program, Telkom University. Data Warehouse is one implementation of Data Utilization in Business.
Data Warehouse integrates various data sources and enables organizations/companies to gain valuable business insights from 1 trusted data source, better known as “single source of truth.”
There are 4 main characteristics in the data warehouse : Subject-oriented, Integrated, Nonvolatile, Time-variant
Subject-oriented means that the data warehouse focuses on providing information about the subject/department rather than the ongoing operations of the organization
Integrated means that the data warehouse is integrated from various data sources. So if you only have 1 data source, of course it is not suitable to build a data warehouse
Nonvolatile means that the data that has entered the data warehouse does not change (no update/delete)
Time-variant states that the data warehouse stores historical data with changes from time to time
Star Schema
The shape of the data warehouse is actually like a relational database in general but better known as dimensional modeling
Data warehouse can be built on the concept of ERD. ERD or Entity Relationship Diagram is a data model that displays the structure and relationships between entities
Star schema is one of the most popular forms of data warehouse modeling
Called star schema or star schema because the arrangement of the schema is similar to a star. This scheme is very popular and easy to implement for 1 subject type analysis
Star schema is built with 1 fact table surrounded by dimension tables
Fact table is generally formed from transaction tables which contain quantitative information, while dimension tables are more directed to qualitative data
In the fact table there is a metric known as measure. Measure is a property on which calculations can be performed. For example, you can add up, average, find the minimum and maximum values and so on
Design a Star Schema
Okay, to make it easier, let’s try to take a simple example
For example, there is a business process that handles buying and selling transactions (sales) by buyers (customers) for products at certain locations (locations) within a certain period of time (date).
In a buying and selling transaction, of course there is the identity of the consumers, the locations of the sale, the products and the time of the transaction. In addition, there are also the number of items sold (quantity) and the total selling price (total). Or for example there is also a discount / not in every transaction.
From the case above, we can model it in the form of a star schema which consists of 1 fact table, namely sales
and 4 dimension tables : customer, location, product
and date
.
In addition, there are also 3 measures in the fact table, quantity, total sales
and discount
Deep Dive
Let’s try to detail the example above in more depth
For example, if there is a retail company that has more than 1 store, suppose there is shop A, shop B and shop C which have different systems.
The company wants to build a complete and integrated database so that it can be considered as a single source of truth
For example..
Shop A sells shoes and hats
Shop B sells shoes, jackets and t-shirts
Shop C sells t-shirts and shirts
Company owners want to build business intelligence to get a better decision support system by analyzing sales transactions such as the quantity of items sold, total sales and discounts.
It is also hoped that the analysis can be seen from various perspectives, for example the type of product, time, customer and city where it is sold. If the systems are separate, analysis can only be carried out at each store. Not to mention if there is duplication of customer identity in different stores
The company wants to build a complete and valid database so that a data warehouse is built that can integrate every transaction at the 3 stores
The following are transactions in 3 stores illustrated in 1 table
Transaction at Shop A
Transaction at Shop B
Transaction at Shop C
Data Warehouse Architecture
The proposed data warehouse architecture is not much different as usual. There are several data sources on the left side which in this case are systems from Shop A, B and C. The data sources are extracted at certain time intervals (extract) then transformed into 1 consistent format (transform) and entered into the data warehouse (load)
At the transform stage, if you look in more detail from the 3 data sources, there are slight differences in the customer and location columns
Shop A only uses 1 full name attribute to store the customer’s name, while Stores B and C use 2 attributes, first name and last name.
Transaction at Shop A
Transaction at Shop B
Transaction at Shop C
The built star schema must refer to the same schema. For example, customer names are only stored in one field, so in the process of transforming the customer name data in Stores B and C must be transformed by combining the firstName and lastName attributes.
In addition, we can find that in the scheme Stores A and B store the city and province where the shop is located, while Shop C only stores the city without saving the province location.
Transaction at Shop A
Transaction at Shop B
Transaction at Shop C
For example, in a data warehouse scheme, you are required to store provincial transaction data, so in Shops A and B you only need to extract-load (EL) while in Shop C you have to extract-transform-load (ETL) to add province columns using the VLOOKUP approach
While the star scheme for the case of this sales transaction is described as follows
Data modeling like this will facilitate the analysis of total sales, total items sold and discounts given based on 4 dimensions: product, time, location and customer
From the designed data warehouse, we will easily get business intelligence quickly and easily. For example, it can easily and quickly answer several questions such as:
- How many items sold globally?
- How many shoe items are sold globally?
- How many T-shirt items are sold in the city of Bandung?
- How much revenue/ total sales from sales in East Java?
- How much revenue/total sales from shoe products in 2022?
- How big is the relationship between discounting and hat sales?
- And others
To be able to answer easily use OLAP operations such as Drill down, Roll up, Dice, Slice and Pivot
Details of OLAP operations can be read in the article OLAP Operation in DBMS
That’s enough explanation of the star schema concept in the data warehouse along with case examples, hopefully it can be useful, especially for Telkom University students who take Data Warehouse and Business Intelligence course.
Ayo #RaihMasaDepanmu bersama Telkom University!