techBI: Building Hierarchies

Hierarchies are very useful instruments for navigating though your data. A hierarchy enables a user to start at a high aggregation level (for example, prod- uct category) and supports drilling into the details of a particular dimension. Most hierarchies will be implicitly modeled inside the dimension tables. Good examples of this can be found inside the date dimension with the time hierar- chies year-quarter-month-date or year-week, or the customer dimension with country-region-zipcode-address. Those are simple, fixed hierarchies where all leaves in the hierarchy have the same ‘‘depth.’’ It gets interesting when you need to model hierarchies of variable depth as well. The common way of building hierarchies of various depth in a source system is to have the records in a table reference other records in the same table. Think, for instance, of an employee table where each employee record points to a manager, which is also an employee. In that case, the table references itself; hence it is mostly referred to as a self-join. Oracle’s SQL contains a connect by prior statement, which can traverse these relationship trees. This is also called recursion, but this isn’t an ANSI SQL standard statement so most databases, including MySQL, do not support this.

Luckily, you can also use bridge tables here. Using the bridge table for unbalanced hierarchies is optional; without the bridge table, the dimension table can be joined to the fact table as usual. The bridge table is just there to assist in navigating additional hierarchies. This is why these tables are still sometimes referred to as helper tables. Schema shows the resulting database diagram when using a hierarchy bridge table.

Left tables show what the data in this table looks like when the data for the manager-employee relationships is added, as shown in hierarchy schema below.
This bridge table enables you to roll up the data based on any question you’d like to ask. Care is necessary, however; if you don’t add all needed restrictions, there’s a risk of double counting some of the values. Suppose you want the total revenue of employee 2 by filtering the employee key. Without additional filters on nest_level the result set is doubled because employee_key 2 is entered twice. This is also the major disadvantage of a hierarchy bridge table: Each path from each item to any other item in the same tree is stored in a separate record. As a result, the bridge table gets much larger than the dimension table it belongs to.

An alternative to using bridge tables to model unbalanced hierarchies is to force the flattening of the hierarchy. The blank spots in the diagram are simply filled in by repeating the values of the level above. Table below shows this principle.

You can now create a navigation structure from the boss to the employee level where all levels contain a relation to the level above. The number of columns you need depends on the number of levels in the hierarchy; there is no need for the additional overhead of a bridge table. There is a potential risk involved here: A flattened hierarchy assumes a fixed number of levels. If another level is added to the data, the resulting hierarchy table has to be restructured again.

No matter which construct is used to model the hierarchy, the transformation of the data from a self join relationship into the correct placement in bridge or hierarchy tables is not a trivial task.

пятница, 19 сентября 2014 г.

Building Hierarchies

пятница, 19 сентября 2014 г.