Top 10 Core Best Practices
During Teradata code review sessions I often find a few of the following fundamentals overlooked. In that spirit, these are my top 10 core best practices.
Tables should be defined as
NO FALLBACK unless there is a very specific reason to have a
FALLBACK protected definition.
MULTISET omits a unique row check, saving overhead, while
NO FALLBACK ensures there are not four copies of the table stored on disk. Teradata is RAID1/mirrored internally by default so a redundant copy already exists in case of drive failure.
PRIMARY INDEX selection:
The choice of a table's
PRIMARY INDEX is the single most important physical design decision. The
PRIMARY INDEX is not necessarily the
PRIMARY KEY. If it is unique then it should be defined as a
UNIQUE PRIMARY INDEX. If the table does not have any good candidate field(s) for the
PRIMARY INDEX then defining the table as
NO PRIMARY INDEX is acceptable (e.g. some staging tables). Two important considerations for the
PRIMARY INDEX are
Therefore, reasonable skew of data at rest is acceptable for processing efficiency gains.
- data distribution across the MPP architecture (i.e. the AMPs); and,
- join access, which is the more important.
Efficient data typing:
Fields should have an efficient data typing. If they always contain values then they should be defined as
NOT NULL. Again, further efficiencies are realized. Examples of efficient data typing include defining code fields (e.g. USPS State Code: MA, NH), and other fixed-length character fields as
CHARACTER and not
VARCHAR. Today, we MVC compress low-cardinality fixed-length fields (prior to the introduction of VLC where varible-length fields can also be compressed). Additionally, date fields are defined as
DATE and not
TIMESTAMP is internally a concatenation of Year, Month, Day, Hour, Minute, and Second fields, and is not nearly as efficient as
DATE. Likewise, turning date values into a look-alike
INTEGER saves nothing in storage, but removes date math capabilities inherent with the
DATE data type. And, storing numerical values in fields much larger than is required (e.g. if
DECIMAL(9,2) suffices then don't use
DECIMAL(18,2)). Also, don't exceed
DECIMAL(18,x) without a specific reason to do so.
Now that our tables and fields are defined we need to specify our Multi-Value Compression. MVC offers reduced disk space consumption and increased access performance. Typically, MVC saves 30% of required disk space on average. Thinner records equate to more records read per I/O. Less I/O leads to faster executing queries. The laws of physics.
Database statistics are core and fundamental to Teradata. The cost-based optimizer relies on them heavily. Approximately 90% of all performance problems are due to missing or stale/bad statistics. Teradata's rule of thumb for refreshing statistics is if the data changes by 10% or more--solely a guideline. I would be far more risk averse and place it at 5% or less. Ultimately, I consider fixed-period collection to work well (e.g. daily, weekly). Additionally, my belief is single-column statistics tend to be less confusing to the optimizer given the overlap of multi-column statistics with single-column versions; therefore, the core is single-column. At a minimum, the following fields should contain statistics:
- the pseudo field
- foreign key fields
- predicates: fields used as criteria in an
- secondary indexes (multi-column statistics if the secondary index is multi-column, and the collection is defined as
While statistics give the optimizer information about a table's demographics, they are intra-table. However, we rarely use solely one table in a query, rather joins to other tables are more common. Therefore, defining
FOREIGN KEYs between tables allows the optimizer to understand intended relationships and offers inter-table demographics and extrapolate cardinality.
FOREIGN KEY constraints done on a SoftRI (i.e. without database referential integrity checks) basis also allow for Join Elimination to be leveraged. This means that the optimizer may omit joins from a query or view if it deems that the join with not affect the integrity of the query's outcome and the join process would otherwise be superfluous. Efficiency gained.
Table accessing queries should occur through a view. The base view to the table for accessing queries should contain an access lock:
LOCKING TABLE table_name FOR ACCESS, or the row version at a minimum. Access locks remove another level of overhead and mitigate contention with write locks. Another efficiency gain.
Queries and views should be written with ANSI SQL if there is an equivalent function to the need. ANSI SQL tends to perform eqaully well or better and is typically more efficient for accessing queries and views.
Ordered Analytical Functions can, but not always, offer increased efficiency by reducing the number of database objects brought into a query. This is especially true when doing some form of chronological sequencing of the data. Prior to OAFs these types of processes required self-joins of tables to occur. Otherwise, this would require a copy of the table to be spun up into spool for that join to occur. With an OAF it reduces the number of objects/tables involved and in many cases requires only a single pass of specific tables. Less I/O is required to scan through a table once versus many times. Examples of this involve the need to look back or forward in time, across records within a key, for the prior/next chronological value of a field. OAFs can be very powerful for specific tasks.
After the tables are created, the data loaded, statistics collected, FKs defined, and the views and queries built, we need to validate it, not only from a results perspective, but from a processing perspective as well. The
EXPLAIN plan defines the exact steps, and their sequence. It can provide valuable information about where a process may deviate from what was originally anticipated and help with debugging. It is a powerful tool that should be used to ensure efficiency and quality.
I hope this is helpful and serves as a guideline for you present and future Teradata development.