When developing the dimensional model, you’ll probably end up with some
attributes that don’t really fit in one of the agreed upon dimension tables. This
is usually the case with attributes that have a meaning in a process such as
order or shipment status flags, order type, or payment terms. The possible
solutions for these attributes, such as leaving them in the fact table, moving
them to separate dimensions, or leaving them out altogether all have their
specific disadvantages. A fact table should be as narrow as possible so adding text columns is a bad idea. Modeling each attribute as a separate dimension
is an equally bad idea, and omitting these attributes means that they cannot
be used for any analysis. The best solution is to group these attributes in a
separate dimension table called a junk dimension—not junk in the literal sense
but as a collection of leftovers that need to be handled in an elegant way.
Actually you’ve already seen several examples of the way these dimensions
are modeled in the SCD type 4 paragraph. The only difference between a
mini- and a junk dimension is the fact that the latter contains often unrelated
attributes whereas the attributes in a regular mini-dimension have a relation
of some sort and can be named accordingly.
Degenerate dimensions are a slightly different kind of breed. These are dimen-
sions that do not really exist but should get a place in the dimensional model
anyway. A good example is the order number. Order numbers can help to
trace some of the information in the data warehouse back to the source system,
but there is no real order dimension. All attributes of order and order line
are modeled at the lowest level of granularity, which is the order line fact.
The attributes that are part of an order, such as order date and customer, are
already moved to the order facts. At the end of this process, what do you do
with the order number then? There’s no point in creating an order dimension
because all of the relevant attributes are already pushed down into the fact
table and related dimensions. In these cases where you end up with a single
attribute, just add the attribute to the fact table. This attribute is not a measure
object and is also not a foreign key to a dimension table; hence it is called a
degenerate dimension.
Heterogeneous dimensions are a variation on this theme where different kinds
of items are grouped together in a single dimension table. Because our demo
company, World Class Movies, has only a single type of product in stock it
doesn’t serve as a good example. For this case, however, we can easily find a
good example in the local supermarket. Products in a supermarket can belong
to different categories such as food, non-food, and beverages. Food can also be
categorized into multiple categories with very different characteristics. When
these products are stored in a single product dimension, you end up with a
table where most of the attributes will be irrelevant to an individual product.
Indicating the expiration date or caloric value of a broom simply doesn’t make
a lot of sense. The challenge here is to find the right balance between the
single product table with many useless attributes and tens of different product
dimension tables that are perfectly tailored for a specific product category. In
this case, there is no best answer; it all depends on the situation at hand.