One of the hardest to solve problems in dimensional modeling is the
multi-valued dimension. Again, the actor table is a perfect example: Our
customers buy or rent a single DVD but this item usually has multiple actors
appearing in the movie. Depending on the kind of information we’d like to
retrieve from the data warehouse, there are two possible solutions to solve this
problem. The first one is to list the actors in a text field at the movie level, as
in Table:
This is fine for informational purposes and answers the question ‘‘Which
actors played in movie X?’’ but cannot be used for other, more interesting,
questions as ‘‘In which movies did actor Y play?’’ or ‘‘Which are the top 10
actors based on rental revenue?’’ To solve this last problem, you need a way
to relate multiple facts to multiple dimension values. And because you cannot
directly create many-to-many relationships in a database you need a bridge
table to perform the task. Schema below shows the part of the data warehouse
model with the dvd_actor bridge table.
Note that in this diagram, the table dvd_actor_bridge contains a
dvd_actor_factor field. This is a necessary addition to the bridge table to
force SQL to return the correct results. If we omit the factor in our calculations
involving actors, the revenue will be multiplied by the number of actors
linked to a specific DVD. Debate is, of course, open to whether this should be
an equal division (10 actors, factor 0.1) or that the leading actor gets a higher
factor (10 actors of which one is Brad Pitt; Brad counts for 0.55, the others for
0.05 each). This poses a problem when you want an answer to the question
‘‘How much revenue did we generate with the movies starring Brad Pitt?’’ In
that case, only one actor is selected causing the factor to return a value that’s
too low so the calculation factor should be either omitted or set to the value
1 in this case. Things get really messy when we want to know our top 10
movie stars based on the total revenue of the movies they played in. Some of
these actors might have co-starred in one or more movies in the past, causing
incorrect results. The message here is: Be careful when modeling these kinds
of relationships and make sure that you provide access only to bridged
relations when the user or analyst working with the data knows exactly what
he or she is dealing with.