Digifloat

Junk Dimensions contains a combination of low-cardinality attributes that may or may not belong to any other dimension. It can act as a definition table for the combination of different values and can be added to our reporting layer for better performance in defining the dimension attributes and it can also play an important role in managing multi-columnar relationship databases.

Junk Dimensions can be used in multiple scenarios, one of them is when we have the relationship of multiple columns between different tables and when we are building a Dimensional Model.

Let’s say we have multiple different schemas, and each schema has a parent table that may or may not be related with other schema’s parent table through different number of columns. Let’s take 4 different tables as an example and assume that they are related with each other through 4 different columns.

Here we have the MasterOrders table related to DoctorsOrder table with 5 different columns which we can say is our Layer 1. To use this in a reporting tool, Power BI, for example, we need to define a simplified column that would define the uniqueness. As in Power BI, the joins can be made but there will be some columns that will have many to many relationships between them and some might have the one-to-many sided relationships and when we make the multiple relationships between these tables the performance of our reports is affected.

That’s where the Junk Dimension concept comes in, with Junk Dimension we can combine the combination of values and create a unique key on that and then can retrieve in the other tables later with SQL during our Data Warehousing.

We also need to see where there is more uniqueness of the combination of multiple columns exists. In our example, let’s say we have combination of 2M records in DoctorOrder and 5M in MasterOrder.

So, we will be creating the junk dimension based on the MasterOrder.

WITH Temp_OrderKey

AS (

SELECT DISTINCT Order_id

,location_id

,order_location_id

,order_type_id

,order_status_id

FROM MasterOrder

WHERE order_id IS NOT NULL

AND location_id IS NOT NULL

AND order_location_id IS NOT NULL

AND order_type_id IS NOT NULL

AND order_status_id IS NOT NULL

)

SELECT Order_id

,location_id

,order_location_id

,order_type_id

,order_status_id

,ROW_NUMBER() OVER (

ORDER BY order_id

,location_id

,order_location_id

,order_type_id

,order_status_id

) AS Order_Key

FROM Temp_OrderKey

 

After our Junk Dimension is created, we can now make our joins within SQL and only retrieve the OrderKey in our Facts and Dimension tables for simplified joins.

At this point we can call this as our Layer 2, we have OrderJunk in our layer 2 which is related with MasterOrders and DoctorOrders through the 5 columns (order_id, location_id, order_location_id, order_type_id, order_status_id). The SQL to retrieve the Order_Key in MasterOrders and DoctorOrders would be:

—-For DoctorOrders—-

SELECT a.doctor_id

,b.order_key

FROM DoctorOrders a

LEFT JOIN OrderJunk b ON a.order_id = b.order_id

AND a.location_id = b.location_id

AND a.order_location_id = b.order_location_id

AND a.order_type_id = b.order_type_id

AND a.order_status_id = b.order_status_id

 

—-For MasterOrders—-

SELECT a.order_status_id

,b.order_key

FROM MasterOrders a

LEFT JOIN OrderJunk b ON a.order_id = b.order_id

AND a.location_id = b.location_id

AND a.order_location_id = b.order_location_id

AND a.order_type_id = b.order_type_id

AND a.order_status_id = b.order_status_id

 

After creating our Junk Dimension and adding it in our MasterOrders and DoctorOrders tables we have a clean data model that looks like below.

In the end, we can conclude that Junk Dimensions can be used as a middle table in order to make our reporting layer easy to understand and reporting friendly. It acts as a middle layer to convert a multi-columnar primary id into a single-column key in our report-friendly Data Models.

Gulfam Pervaiz

Consultant

Leave a Reply

Your email address will not be published. Required fields are marked *