Hierarchies are very useful instruments for navigating though your data. A
hierarchy enables a user to start at a high aggregation level (for example, prod-
uct category) and supports drilling into the details of a particular dimension.
Most hierarchies will be implicitly modeled inside the dimension tables. Good
examples of this can be found inside the date dimension with the time hierar-
chies year-quarter-month-date or year-week, or the customer dimension with
country-region-zipcode-address. Those are simple, fixed hierarchies where all
leaves in the hierarchy have the same ‘‘depth.’’ It gets interesting when you
need to model hierarchies of variable depth as well. The common way of
building hierarchies of various depth in a source system is to have the records
in a table reference other records in the same table. Think, for instance, of an
employee table where each employee record points to a manager, which is also
an employee. In that case, the table references itself; hence it is mostly referred
to as a self-join. Oracle’s SQL contains a connect by prior statement, which can
traverse these relationship trees. This is also called recursion, but this isn’t an
ANSI SQL standard statement so most databases, including MySQL, do not
support this.
Luckily, you can also use bridge tables here. Using the bridge table for unbalanced hierarchies is optional; without the bridge table, the dimension table can be joined to the fact table as usual. The bridge table is just there to assist in navigating additional hierarchies. This is why these tables are still sometimes referred to as helper tables. Schema shows the resulting database diagram when using a hierarchy bridge table.
Left tables show what the data in this table looks like when the data for the manager-employee relationships is added, as shown in hierarchy schema below.
This bridge table enables you to roll up the data based on any question you’d like to ask. Care is necessary, however; if you don’t add all needed restrictions, there’s a risk of double counting some of the values. Suppose you want the total revenue of employee 2 by filtering the employee key. Without additional filters on nest_level the result set is doubled because employee_key 2 is entered twice. This is also the major disadvantage of a hierarchy bridge table: Each path from each item to any other item in the same tree is stored in a separate record. As a result, the bridge table gets much larger than the dimension table it belongs to.
Luckily, you can also use bridge tables here. Using the bridge table for unbalanced hierarchies is optional; without the bridge table, the dimension table can be joined to the fact table as usual. The bridge table is just there to assist in navigating additional hierarchies. This is why these tables are still sometimes referred to as helper tables. Schema shows the resulting database diagram when using a hierarchy bridge table.
Left tables show what the data in this table looks like when the data for the manager-employee relationships is added, as shown in hierarchy schema below.
This bridge table enables you to roll up the data based on any question you’d like to ask. Care is necessary, however; if you don’t add all needed restrictions, there’s a risk of double counting some of the values. Suppose you want the total revenue of employee 2 by filtering the employee key. Without additional filters on nest_level the result set is doubled because employee_key 2 is entered twice. This is also the major disadvantage of a hierarchy bridge table: Each path from each item to any other item in the same tree is stored in a separate record. As a result, the bridge table gets much larger than the dimension table it belongs to.
An alternative to using bridge tables to model unbalanced hierarchies is
to force the flattening of the hierarchy. The blank spots in the diagram are
simply filled in by repeating the values of the level above. Table below shows this
principle.
No matter which construct is used to model the hierarchy, the transformation
of the data from a self join relationship into the correct placement in bridge or
hierarchy tables is not a trivial task.
You can now create a navigation structure from the boss to the employee
level where all levels contain a relation to the level above. The number of
columns you need depends on the number of levels in the hierarchy; there
is no need for the additional overhead of a bridge table. There is a potential
risk involved here: A flattened hierarchy assumes a fixed number of levels.
If another level is added to the data, the resulting hierarchy table has to be
restructured again.