How to do it...

The following are working points that will help you understand performance optimization from a Teradata database perspective, and how bad PI and bad join affects the performance of a database. Many a time, it only takes a single/few bad query(s) to bring database performance down:

  1. Check primary index (PI): What is a bad PI? When the table is skewed, mostly because of a large number of duplicate values on the column(s) that make the PI, a bad PI. PI is directly responsible for:
    • Row distribution
    • Access path
    • Join performance
  2. Check statistics: Statistics can break and make a query. The Optimizer is more accurate when it has collected statistics. It is more conservative when it must rely on dynamic AMP sampling. Statistics or stats help the optimizer generate better execution plans. No stats or stale stats can lead the optimizer to create a bad plan, which means wrong joins and wrong estimations of rows to be processed/returned.
  3. Check partition primary index (PPI): Helps the user to reap the benefits of scalability inherent in the hash architecture of the Teradata database. How does it help? It helps reduce the I/Os of the system by providing an access path to the rows in the base table and an efficient join.
  4. Check secondary index (SI): Two indexes on tables? Wow! But why do you need SI? It offers an alternative path to access the data. Use SI when data distribution is done by a UPI and access is handled by SI.
  5. Check parallel efficiency, PE: You can't define PE in a query, but you can calculate it. Remember, Teradata is a parallel architecture database. PE is simply defined as the Average Node CPU busy divided by the Maximum Node CPU busy. A high PE query means it's been executed on all AMPs.
  6. Verify SELECT *: In a select * all of the columns are loaded into spool, which increase spool usage. Select columns that are required.
  1. Verify functions in joins: Avoid using functions in joins and where clauses. The functions prevent the optimizer from seeing the data demographics, and the function must be executed for every row. Even if the column has collected stats, once the function is applied, the new manipulated column will not have stats, making the optimizer prone to wrong estimates of the data demographics.
  2. Use BETWEEN rather than IN when possible. 
  3. Use GROUP BY or Distinct: Use group by instead of Select Distinct. Be cautious using count distinct. The distinct tends to skew and cause performance issues when dealing with large numbers or with large numbers of duplicates.

CPUSkew, IOSkew, product join indicator, unnecessary IO indicator, and ImpactCPU combine together to form a report, which can be called suspected queries. All of these can be useful for identifying high resource consumption and skewed queries for tuning opportunities.