These are a few general tuning guidelines that will help
with any version of a Teradata database.
Keep in mind the use of function shipping throughout the design and
tuning process. Function ship as often
as possible to let Teradata do the heavy lifting.
Primary Indexes
Secondary Indexes
Statistics
Semantic Views
Partitioned Primary Indexes
Aggregate Join Indexes
Aggregate Tables
Turning a semantic view into a table
Primary Indexes
First make sure that your primary indexes are set
appropriately. This can be determined by
the skew of the primary index (PI) column or columns. A skew factor of more than 25% could
potentially slow down join performance.
When selecting a new primary index consider using the join access paths
to the tables. Putting the PI on join
columns can help performance. Also check
the type of the join the optimizer is using. Usually, merge joins are cheaper
when the joins involve two large tables.
Secondary Indexes
Adding secondary indexes (SI) to join and filter columns is
recommended. The optimizer usually uses
a SI when a filter is applied on the indexed column or columns, and the number
of rows retrieved from the base table is around 10%. Also, when the optimizer does a nested join
on a large table, an SI can sometimes help improve the performance of the join.
Experience indicates that a secondary index works best when a large table is
joined to a small table.
Statistics
One of the most important sources of information the
Teradata optimizer uses for choosing access plans is the set of statistics
Teradata collects about the system and the data in it. Database statistics include “data
demographics” such as the number of rows in each table, the number of unique
values within each column, the skew and distribution of values within each
column, etc. The optimizer uses this
information when selecting the best access plan from among all the possible
access plans that could satisfy a given query.
It is important for the statistics to be updated as
frequently as practical. Whenever new
data is loaded or new indexes are built or any other significant change occurs
to data, the statistics should be updated to reflect the change. Note that out-of-date statistics can be worse
than not collecting statistics at all.
In Teradata, statistics are particularly important because
the optimizer does not allow the user to “override” its decisions through
constructs like hints or rewritten SQL.
The optimizer will select the best plan for a given query, but its
decisions are only as good as the information it uses to make them.
Statistics should be collected and kept up-to-date for PI
columns, SI columns, join columns, filter columns and indexes. Both single and multi-column statistics can
help with performance.
Semantic Views
Semantic views should be designed so that they are not too
complex and involve just the tables needed in the view. Unnecessary table joins
must be avoided. The fewer the table joins the more efficient the view, as it
increases the stability of the EXPLAIN plan.
When creating semantic views, there are a few things to
avoid. Outer joins don’t allow for query
folding. Query folding returns data very
quickly by applying filters first
and then joining the result sets. When
outer joins are used the reverse happens - tables are joined and then the
filters are applied. This is much more
work for the database and therefore gives longer query times. An alternative method to outer joins is to create
a ‘–999’ or ‘Unspecified’ record in each dimension and update the foreign key
with value ‘-999’ (or ‘Unspecified' if using a natural non integer key). Performance will need to be checked to see
which method works better. Outer joins
can cause poor performance, but a badly skewed inner joined table could be
worse.
Another thing to avoid in the semantic layer is using functions
on columns used in a join. When a
function like “case” or “coalesce” is used in a join the optimizer will assign
low confidence to steps in the explain plan.
This could result in a bad or slow running execution plan. The same problem occurs when the data types
on join columns don’t match - the optimizer assigns low confidence in the
explain plan. The confidence of one step in the plan cascades down and
influences the optimizer’s choice of the subsequent steps in the plan and could
negatively impact performance even further.
Also avoid unnecessary aggregations in the semantic view
definition. Forcing an aggregation when
a query may request the lowest level of detail just causes unnecessary
processing. The same goes for “distinct”
– use a “distinct” only if absolutely necessary.
The views should be designed so that redundant joins are not
performed in the BI Tools queries. A common practice is to join multiple tables
and including just the dimensional IDs in the view definition. When the report being
executed requires additional information about the attribute, BI Tools creates
a join between the view and the tables to retrieve any additional information
required. This redundancy in joins can lead to significant degradation in
performance when the query is executed. Thus it is important to design the view
definitions in a practical manner.
Partitioned Primary Indexes
Teradata V2R5.0 introduced partitioned primary indexes
(PPI), an indexing mechanism used in physical database design. When using PPI’s, rows within each AMP are
partitioned and then sorted by their row hash value within each partition. The net effect is that queries can run faster
by accessing only the rows of qualifying partitions.
PPI’s are especially helpful for queries based on range
access, such as date ranges. A common
scenario is to partition by something in the Time dimension, for example at a
daily, weekly, or monthly level.
Aggregate Join Indexes
Aggregate join indexes (AJI) are another great tool for
improving performance. An AJI is an
aggregated result set saved as an object in the database. It is transparent to
an end-user and will be used automatically by the Teradata optimizer when a
query contains the appropriate tables and columns. An AJI would typically be
put on the fact table or the tables that make up the fact view. AJIs are preferred over aggregates tables
due to the ease of maintenance and automatic use by the database.
Aggregate Tables
Pre-aggregating data is a common and powerful way to improve
end user query response times. Rather
than aggregating many, many rows at query time, pre-summarization allows the
database to perform row access and aggregation ahead of time, satisfying the
query at request time much faster. OBIEE
is an aggregate aware application that allows end users to query physical
summary tables without needing to specify which table to use to satisfy the
query. Aggregation tables might be
needed at some point but do not start building them right away. First build a model using OBIEE and Teradata
best practices and measure performance. When
hitting a performance issue aggregate tables should be one of the last avenues
to pursue. Most of the time an AJI can
be used instead (AJIs have far less maintenance to deal with).
Turning a semantic view into a table
Implementing physical tables is an available choice when the
desired performance requirements are not met by the tuning methods described
above. It could be true that all of the
tuning in the world won’t make a particular star performant. If the business logic incorporated in the
semantic views for BI Tools is very complex, then it could be beneficial to
implement the view as a physical table for performance benefits. This should only be considered as a last
resort. Since this choice introduces
high costs of storage and maintenance, this design needs to be justified by a
strong business need for performance improvements.