Skip to main content

Slowly Changing Dimension in Data Warehouse

SCD (Slowly Changing Dimension) is a very important concept in ETL and Data Warehouse in general.



“Slowly Changing Dimension (SCD) is a dimension that stores and manages both current and historical data over time in a data warehouse”
-Oracle

This methodology is used to track changes in the dimensions that change slowly over time.

Let’s have an example:

If a customer in a bank has an account and the balance is changing every business day and this is the normal case that does not need an SCD technique.
Then this customer decided to upgrade his account to a premium one.
Our assumptions here:
1. The bank wants to track the balance and other changes starting from the opening of the basic account and do not want to open a new account but just an upgrade to premium.
2. Upgrading the account to a premium one force the bank to change the account number to meet the criteria of the premium accounts(let’s say the premium account must start with number 2 and the basic account must start with 1).
2. While upgrading, the balance does not change.

So the SCD case here will be on the ACCOUNT_NUMBER field. So let’s discuss the SCD on this example.

There are main 3 types to handle this change and it depends on the case for which type to fit your data warehouse and your problem.

the original account of our customer

Our change is from account_number 1111111111111111 to 2222222222222222

Type 1: Overwrite

This type overwrites the whole record with a completely new record. hence, we cannot track the historical changes and this type does not fit the case I mentioned.

We can get benefit from this type when we are changing a record in the whole system and we don’t need the old one such as a bank product (like a credit card) and we need to change its name on the whole level of the system.

SCD Type 1

Type 2: Adding rows

Here we can track changes historically by adding or using a dimension that is unique.

You need to separate every change and be able to track these changes in this order.

In our example, I would add the account type (assuming there is no more than 1 current account per customer) and the version column to track the history of changes.

SCD Type 2 with version number

Another popular way and more common is using from_date and to_date dimensions instead of version.

SCD Type 2 with dates

Note that the to_date in the new record is supposed to be infinity until the dimension needs another change.

Another important note is that in other changes like maybe the marital status or address of a customer will need to add surrogate key (system key) as we don’t have a unique natural key here( neither the customer_id nor the account_number is changed). So, the surrogate key can be useful to distinguish between the 2 records or whatever how many it was changed.

The versions method of type 2 mentioned above can have instead a composite key between any natural key combined with the version column.

The dates method of this type also can have a combination between the natural keys and the dates (but it’s not a good practice to include the dates into a composite key), so maybe a new surrogate key is useful here.

Type 3: Adding Columns

This type can only track the last few changes based on how many columns will you add.

Here we add a column to store the old value and another date column to know when the change has been executed.

SCD Type 3

Type 4: Adding a historical table

The main Customers table that has the latest data
The customer historical table that has the changes

This type creates another historical table that has all the changes. But, the main table has the current and the latest update of the data.

Here we created two tables that have the same structure but the historical one has the updated date and it gonna expand more and more with every change that happens.

The SCD is essential for any ETL/DW developer or architect. You just need to distinguish which type to use in your problem that will help you a well designed historical data warehouse.

Comments