пятница, 19 сентября 2014 г.

Junk, Heterogeneous, and Degenerate Dimensions


When developing the dimensional model, you’ll probably end up with some attributes that don’t really fit in one of the agreed upon dimension tables. This is usually the case with attributes that have a meaning in a process such as order or shipment status flags, order type, or payment terms. The possible solutions for these attributes, such as leaving them in the fact table, moving them to separate dimensions, or leaving them out altogether all have their specific disadvantages. A fact table should be as narrow as possible so adding text columns is a bad idea. Modeling each attribute as a separate dimension is an equally bad idea, and omitting these attributes means that they cannot be used for any analysis. The best solution is to group these attributes in a separate dimension table called a junk dimension—not junk in the literal sense but as a collection of leftovers that need to be handled in an elegant way. Actually you’ve already seen several examples of the way these dimensions are modeled in the SCD type 4 paragraph. The only difference between a mini- and a junk dimension is the fact that the latter contains often unrelated attributes whereas the attributes in a regular mini-dimension have a relation of some sort and can be named accordingly.
Heterogeneous dimensions are a variation on this theme where different kinds of items are grouped together in a single dimension table. Because our demo company, World Class Movies, has only a single type of product in stock it doesn’t serve as a good example. For this case, however, we can easily find a good example in the local supermarket. Products in a supermarket can belong to different categories such as food, non-food, and beverages. Food can also be categorized into multiple categories with very different characteristics. When these products are stored in a single product dimension, you end up with a table where most of the attributes will be irrelevant to an individual product. Indicating the expiration date or caloric value of a broom simply doesn’t make a lot of sense. The challenge here is to find the right balance between the single product table with many useless attributes and tens of different product dimension tables that are perfectly tailored for a specific product category. In this case, there is no best answer; it all depends on the situation at hand.
Degenerate dimensions are a slightly different kind of breed. These are dimen- sions that do not really exist but should get a place in the dimensional model anyway. A good example is the order number. Order numbers can help to trace some of the information in the data warehouse back to the source system, but there is no real order dimension. All attributes of order and order line are modeled at the lowest level of granularity, which is the order line fact. The attributes that are part of an order, such as order date and customer, are already moved to the order facts. At the end of this process, what do you do with the order number then? There’s no point in creating an order dimension because all of the relevant attributes are already pushed down into the fact table and related dimensions. In these cases where you end up with a single attribute, just add the attribute to the fact table. This attribute is not a measure object and is also not a foreign key to a dimension table; hence it is called a degenerate dimension.