star schema data warehouse

Get to know Star Schema in Data Warehouse and How to Design it


Data Warehouse

Data Warehouse is a type of data management system designed to support business intelligence (BI) activities. The Data Warehouse course is a compulsory subject in the Information Systems Study Program, Telkom University. Data Warehouse is one implementation of Data Utilization in Business.

Data Warehouse integrates various data sources and enables organizations/companies to gain valuable business insights from 1 trusted data source, better known as “single source of truth.”

There are 4 main characteristics in the data warehouse : Subject-oriented, Integrated, Nonvolatile, Time-variant

Subject-oriented means that the data warehouse focuses on providing information about the subject/department rather than the ongoing operations of the organization

Integrated means that the data warehouse is integrated from various data sources. So if you only have 1 data source, of course it is not suitable to build a data warehouse

Nonvolatile means that the data that has entered the data warehouse does not change (no update/delete)

Time-variant states that the data warehouse stores historical data with changes from time to time

Star Schema

The shape of the data warehouse is actually like a relational database in general but better known as dimensional modeling

Data warehouse can be built on the concept of ERD. ERD or Entity Relationship Diagram is a data model that displays the structure and relationships between entities

Star schema is one of the most popular forms of data warehouse modeling

Called star schema or star schema because the arrangement of the schema is similar to a star. This scheme is very popular and easy to implement for 1 subject type analysis

Star schema is built with 1 fact table surrounded by dimension tables

Fact table is generally formed from transaction tables which contain quantitative information, while dimension tables are more directed to qualitative data

In the fact table there is a metric known as measure. Measure is a property on which calculations can be performed. For example, you can add up, average, find the minimum and maximum values and so on

Design a Star Schema

Okay, to make it easier, let’s try to take a simple example

For example, there is a business process that handles buying and selling transactions (sales) by buyers (customers) for products at certain locations (locations) within a certain period of time (date).

In a buying and selling transaction, of course there is the identity of the consumers, the locations of the sale, the products and the time of the transaction. In addition, there are also the number of items sold (quantity) and the total selling price (total). Or for example there is also a discount / not in every transaction.

From the case above, we can model it in the form of a star schema which consists of 1 fact table, namely sales and 4 dimension tables : customer, location, product and date.

In addition, there are also 3 measures in the fact table, quantity, total sales and discount

Deep Dive

Let’s try to detail the example above in more depth

For example, if there is a retail company that has more than 1 store, suppose there is shop A, shop B and shop C which have different systems.

The company wants to build a complete and integrated database so that it can be considered as a single source of truth

For example..

Shop A sells shoes and hats

Shop B sells shoes, jackets and t-shirts

Shop C sells t-shirts and shirts

Company owners want to build business intelligence to get a better decision support system by analyzing sales transactions such as the quantity of items sold, total sales and discounts.

It is also hoped that the analysis can be seen from various perspectives, for example the type of product, time, customer and city where it is sold. If the systems are separate, analysis can only be carried out at each store. Not to mention if there is duplication of customer identity in different stores

The company wants to build a complete and valid database so that a data warehouse is built that can integrate every transaction at the 3 stores

The following are transactions in 3 stores illustrated in 1 table

Transaction at Shop A

Star Schema di Data Warehouse

Transaction at Shop B

Star Schema di Data Warehouse

Transaction at Shop C

Star Schema di Data Warehouse

Data Warehouse Architecture

The proposed data warehouse architecture is not much different as usual. There are several data sources on the left side which in this case are systems from Shop A, B and C. The data sources are extracted at certain time intervals (extract) then transformed into 1 consistent format (transform) and entered into the data warehouse (load)

At the transform stage, if you look in more detail from the 3 data sources, there are slight differences in the customer and location columns

Shop A only uses 1 full name attribute to store the customer’s name, while Stores B and C use 2 attributes, first name and last name.

Transaction at Shop A

Transaction at Shop B

Transaction at Shop C

The built star schema must refer to the same schema. For example, customer names are only stored in one field, so in the process of transforming the customer name data in Stores B and C must be transformed by combining the firstName and lastName attributes.

In addition, we can find that in the scheme Stores A and B store the city and province where the shop is located, while Shop C only stores the city without saving the province location.

Transaction at Shop A

Transaction at Shop B

Transaction at Shop C

For example, in a data warehouse scheme, you are required to store provincial transaction data, so in Shops A and B you only need to extract-load (EL) while in Shop C you have to extract-transform-load (ETL) to add province columns using the VLOOKUP approach

While the star scheme for the case of this sales transaction is described as follows

Star Schema di Data Warehouse

Data modeling like this will facilitate the analysis of total sales, total items sold and discounts given based on 4 dimensions: product, time, location and customer

From the designed data warehouse, we will easily get business intelligence quickly and easily. For example, it can easily and quickly answer several questions such as:

  • How many items sold globally?
  • How many shoe items are sold globally?
  • How many T-shirt items are sold in the city of Bandung?
  • How much revenue/ total sales from sales in East Java?
  • How much revenue/total sales from shoe products in 2022?
  • How big is the relationship between discounting and hat sales?
  • And others

To be able to answer easily use OLAP operations such as Drill down, Roll up, Dice, Slice and Pivot

Details of OLAP operations can be read in the article OLAP Operation in DBMS

That’s enough explanation of the star schema concept in the data warehouse along with case examples, hopefully it can be useful, especially for Telkom University students who take Data Warehouse and Business Intelligence course.

Ayo #RaihMasaDepanmu bersama Telkom University!

Leave a Reply

Your email address will not be published. Required fields are marked *