пятница, 19 сентября 2014 г.

Teradata Warehouse Performance Tuning for Oracle BI

These are a few general tuning guidelines that will help with any version of a Teradata database.   Keep in mind the use of function shipping throughout the design and tuning process.  Function ship as often as possible to let Teradata do the heavy lifting.

Primary Indexes


First make sure that your primary indexes are set appropriately.  This can be determined by the skew of the primary index (PI) column or columns.  A skew factor of more than 25% could potentially slow down join performance.  When selecting a new primary index consider using the join access paths to the tables.  Putting the PI on join columns can help performance.  Also check the type of the join the optimizer is using. Usually, merge joins are cheaper when the joins involve two large tables.

Secondary Indexes

Adding secondary indexes (SI) to join and filter columns is recommended.  The optimizer usually uses a SI when a filter is applied on the indexed column or columns, and the number of rows retrieved from the base table is around 10%.   Also, when the optimizer does a nested join on a large table, an SI can sometimes help improve the performance of the join. Experience indicates that a secondary index works best when a large table is joined to a small table.

Statistics

One of the most important sources of information the Teradata optimizer uses for choosing access plans is the set of statistics Teradata collects about the system and the data in it.  Database statistics include “data demographics” such as the number of rows in each table, the number of unique values within each column, the skew and distribution of values within each column, etc.  The optimizer uses this information when selecting the best access plan from among all the possible access plans that could satisfy a given query.

It is important for the statistics to be updated as frequently as practical.  Whenever new data is loaded or new indexes are built or any other significant change occurs to data, the statistics should be updated to reflect the change.  Note that out-of-date statistics can be worse than not collecting statistics at all.

In Teradata, statistics are particularly important because the optimizer does not allow the user to “override” its decisions through constructs like hints or rewritten SQL.  The optimizer will select the best plan for a given query, but its decisions are only as good as the information it uses to make them.

Statistics should be collected and kept up-to-date for PI columns, SI columns, join columns, filter columns and indexes.  Both single and multi-column statistics can help with performance.

Semantic Views

Semantic views should be designed so that they are not too complex and involve just the tables needed in the view. Unnecessary table joins must be avoided. The fewer the table joins the more efficient the view, as it increases the stability of the EXPLAIN plan.

When creating semantic views, there are a few things to avoid.  Outer joins don’t allow for query folding.  Query folding returns data very quickly by applying filters first and then joining the result sets.  When outer joins are used the reverse happens - tables are joined and then the filters are applied.  This is much more work for the database and therefore gives longer query times.  An alternative method to outer joins is to create a ‘–999’ or ‘Unspecified’ record in each dimension and update the foreign key with value ‘-999’ (or ‘Unspecified' if using a natural non integer key).  Performance will need to be checked to see which method works better.  Outer joins can cause poor performance, but a badly skewed inner joined table could be worse. 

Another thing to avoid in the semantic layer is using functions on columns used in a join.  When a function like “case” or “coalesce” is used in a join the optimizer will assign low confidence to steps in the explain plan.  This could result in a bad or slow running execution plan.  The same problem occurs when the data types on join columns don’t match - the optimizer assigns low confidence in the explain plan. The confidence of one step in the plan cascades down and influences the optimizer’s choice of the subsequent steps in the plan and could negatively impact performance even further.

Also avoid unnecessary aggregations in the semantic view definition.  Forcing an aggregation when a query may request the lowest level of detail just causes unnecessary processing.  The same goes for “distinct” – use a “distinct” only if absolutely necessary.

The views should be designed so that redundant joins are not performed in the BI Tools queries. A common practice is to join multiple tables and including just the dimensional IDs in the view definition. When the report being executed requires additional information about the attribute, BI Tools creates a join between the view and the tables to retrieve any additional information required. This redundancy in joins can lead to significant degradation in performance when the query is executed. Thus it is important to design the view definitions in a practical manner.

Partitioned Primary Indexes

Teradata V2R5.0 introduced partitioned primary indexes (PPI), an indexing mechanism used in physical database design.  When using PPI’s, rows within each AMP are partitioned and then sorted by their row hash value within each partition.  The net effect is that queries can run faster by accessing only the rows of qualifying partitions.

PPI’s are especially helpful for queries based on range access, such as date ranges.  A common scenario is to partition by something in the Time dimension, for example at a daily, weekly, or monthly level.

Aggregate Join Indexes

Aggregate join indexes (AJI) are another great tool for improving performance.  An AJI is an aggregated result set saved as an object in the database. It is transparent to an end-user and will be used automatically by the Teradata optimizer when a query contains the appropriate tables and columns. An AJI would typically be put on the fact table or the tables that make up the fact view.   AJIs are preferred over aggregates tables due to the ease of maintenance and automatic use by the database.

Aggregate Tables

Pre-aggregating data is a common and powerful way to improve end user query response times.  Rather than aggregating many, many rows at query time, pre-summarization allows the database to perform row access and aggregation ahead of time, satisfying the query at request time much faster.  OBIEE is an aggregate aware application that allows end users to query physical summary tables without needing to specify which table to use to satisfy the query.  Aggregation tables might be needed at some point but do not start building them right away.  First build a model using OBIEE and Teradata best practices and measure performance.  When hitting a performance issue aggregate tables should be one of the last avenues to pursue.  Most of the time an AJI can be used instead (AJIs have far less maintenance to deal with).

Turning a semantic view into a table

Implementing physical tables is an available choice when the desired performance requirements are not met by the tuning methods described above.  It could be true that all of the tuning in the world won’t make a particular star performant.  If the business logic incorporated in the semantic views for BI Tools is very complex, then it could be beneficial to implement the view as a physical table for performance benefits.  This should only be considered as a last resort.  Since this choice introduces high costs of storage and maintenance, this design needs to be justified by a strong business need for performance improvements.