The data warehouse bus architecture was developed by Ralph Kimball and
is extensively described in his books The Data Warehouse Toolkit and The Data
Warehouse Lifecycle Toolkit. Both books are published by Wiley Publishing
and cover the complete lifecycle of modeling, building, and maintaining data
warehouses. The term bus refers to the fact that the different data marts in
the data warehouse are interlinked by using conformed dimensions. A simple
example can explain this. Suppose you have dimension tables for customers,
suppliers, and products and want to analyze data about sales and purchase
transactions. For purchase transactions, the customer is still unknown, so
it’s not very useful to include the customer dimension in the purchase star
schema. For sales transactions the situation is slightly different: you need
information about both the customer who purchased a product and the supplier
the product was purchased from. The sales star schema therefore uses all three
dimensions, while the purchase star schema uses only the supplier and product
dimensions. The resulting diagram for this small example data warehouse is
shown in the schema below:
It’s best to start with a high-level bus architecture matrix before beginning the design of the individual data marts. Figure 7-4 shows an example matrix, where all identified business facts are placed in the rows and all identified dimensions in the columns. The “bus” is formed by the main business process, or the natural flow of events within an organization. In our case, that would be ordering from suppliers, storing and moving inventory, receiving customer orders, shipping DVDs, and handling returns. Within such a main business process it’s easy to check off all relationships between dimensions and facts. This makes the design process easier to manage, and the matrix can also be used to communicate with business users about the completeness of the data warehouse.
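The matrix described above can be sketched as a small data structure. The following is a minimal illustration, not the matrix from the figure: the process names follow the business flow listed in the text, but which dimensions each process uses (and the `date`, `warehouse`, and `return reason` dimensions themselves) are assumptions made up for this example.

```python
# Hypothetical bus matrix: each business process (row) maps to the
# dimensions (columns) it uses. The assignments are illustrative only.
BUS_MATRIX = {
    "purchase orders": {"date", "supplier", "product"},
    "inventory": {"date", "product", "warehouse"},
    "customer orders": {"date", "customer", "supplier", "product"},
    "shipments": {"date", "customer", "product", "warehouse"},
    "returns": {"date", "customer", "product", "return reason"},
}


def dimensions(matrix):
    """Return every dimension in the matrix, sorted for a stable column order."""
    return sorted(set().union(*matrix.values()))


def shared_dimensions(matrix):
    """Return dimensions used by more than one process (candidates to conform)."""
    return sorted(
        dim
        for dim in dimensions(matrix)
        if sum(dim in used for used in matrix.values()) > 1
    )


def render(matrix):
    """Render the matrix as text, with an 'x' where a process uses a dimension."""
    cols = dimensions(matrix)
    width = max(len(process) for process in matrix)
    lines = [" " * width + " | " + " | ".join(cols)]
    for process, used in matrix.items():
        cells = [("x" if col in used else "").center(len(col)) for col in cols]
        lines.append(process.ljust(width) + " | " + " | ".join(cells))
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(BUS_MATRIX))
    print("Shared dimensions:", ", ".join(shared_dimensions(BUS_MATRIX)))
```

Dimensions that appear in more than one row are the ones that most need to be agreed upon up front, because several fact tables will depend on them.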
Using the bus architecture with conformed dimensions is what enables the collection of data marts to be treated as a true Enterprise Data Warehouse. Each dimension table is designed and maintained in only one location, and a single process exists to load and update its data. This contrasts sharply with a collection of independent data marts, where each data mart is designed, built, and maintained as a point solution. In that case, each data mart contains its own dimensions, and each dimension has no relation to similar dimensions in other data marts. Working this way, you might end up having to maintain five or more different product and customer dimensions. We strongly oppose this type of “architecture”!
Our advice here is to always start by developing and agreeing upon the high-level bus matrix to identify all the entities of interest for the data warehouse. Only after completing this step can the detailed design of the individual dimension and fact tables be started.
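To make the idea of conformed dimensions concrete, here is a minimal sketch using Python's built-in `sqlite3` module. All table names, column names, and sample rows are hypothetical. It assumes one purchase star and one sales star that both reference the same `dim_product` and `dim_supplier` tables, which is what allows the two fact tables to be compared in a single report.

```python
import sqlite3


def build_warehouse():
    """Create an in-memory warehouse with two stars sharing dimension tables."""
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        -- Each dimension is defined exactly once (hypothetical names).
        CREATE TABLE dim_product  (product_key  INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE dim_supplier (supplier_key INTEGER PRIMARY KEY, name  TEXT);
        CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name  TEXT);

        -- Purchase star: no customer dimension, the customer is still unknown.
        CREATE TABLE fact_purchase (
            product_key  INTEGER REFERENCES dim_product(product_key),
            supplier_key INTEGER REFERENCES dim_supplier(supplier_key),
            quantity     INTEGER
        );

        -- Sales star: reuses the same product and supplier dimensions.
        CREATE TABLE fact_sales (
            product_key  INTEGER REFERENCES dim_product(product_key),
            supplier_key INTEGER REFERENCES dim_supplier(supplier_key),
            customer_key INTEGER REFERENCES dim_customer(customer_key),
            quantity     INTEGER
        );

        INSERT INTO dim_product  VALUES (1, 'Alien'), (2, 'Brazil');
        INSERT INTO dim_supplier VALUES (1, 'Acme Media');
        INSERT INTO dim_customer VALUES (1, 'Jane'), (2, 'Raj');
        INSERT INTO fact_purchase VALUES (1, 1, 10), (2, 1, 5), (1, 1, 4);
        INSERT INTO fact_sales    VALUES (1, 1, 1, 3), (1, 1, 2, 2), (2, 1, 1, 1);
        """
    )
    return con


def purchased_vs_sold(con):
    """Aggregate each fact table separately, then combine via the shared dimension."""
    return con.execute(
        """
        SELECT p.title,
               COALESCE(pu.qty, 0) AS purchased,
               COALESCE(sa.qty, 0) AS sold
        FROM dim_product AS p
        LEFT JOIN (SELECT product_key, SUM(quantity) AS qty
                   FROM fact_purchase GROUP BY product_key) AS pu
               ON pu.product_key = p.product_key
        LEFT JOIN (SELECT product_key, SUM(quantity) AS qty
                   FROM fact_sales GROUP BY product_key) AS sa
               ON sa.product_key = p.product_key
        ORDER BY p.title
        """
    ).fetchall()


if __name__ == "__main__":
    for title, purchased, sold in purchased_vs_sold(build_warehouse()):
        print(f"{title}: purchased={purchased}, sold={sold}")
```

Each fact table is aggregated on its own before the results are joined, so rows from one star are never multiplied by rows from the other. With independent data marts, each holding its own product table and its own keys, this kind of cross-mart comparison would first require reconciling the product dimensions.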