Friday, July 6, 2018

What is Junk Dimension and its purpose

Definition: By definition, it is a collections of random transnational code flags. It is usually created by joining multiple low cardinality dimensions into one.
As we merge different dimensions in to one, it will reduce the number of foreign keys in the fact table.

If we consider the Data warehouse scenario of Online Ticket Booking, The design should have following 3 tables, with fact having 2 foreign keys one for each dim.

  • Fact_Tickets
  • Dim Transaction Type
  • Dim Cert Type

Both Dims are of very low cardinality(i.e. no of records in each dim would be very low)
Transaction Type : The values could be Net-banking, Card, Wallet. The chances of increasing this count is very low
Movie Certificate : The values could be U, A, U/A. Again the chances of increasing/Changing these is very rare.

The picture below describes the 2 possible designs of it, if  we keep these parameters in dimension table.
Junk dimension


So with the help of junk dimension, we converted 2 forein keys in to one.
No of Records: Notice that, the no of records in the junk transaction is the multiplication of no of records in each dimension. In the example above there are 3 records for both Transaction and Certificate, so the total count in junk dimension is 9.
If we are merging 3 tables with 3 records each, the no of records in the junk dimension would be 27(3*3*3)

Ensure the dimensions we junk together should be of very low cardinality. I would suggest the no of records in the junk transaction should be less than thousand.

Please feel free to post your queries in the comments section and follow this blog.

No comments:

Post a Comment

If you liked the post, please share it.
Subscribe to your email and receive new articles on your email