Poorly designed databases are counterproductive to performance. Yet, many companies are unaware of the little errors that are causing big problems. These 10 database design best practices will help you significantly increase your company’s database management and performance, create more productive search queries and improve data quality.
Tip #1: Avoid Using Hints
Some of the largest RDBMS vendors have spent countless hours and untold amounts of money to create the most sophisticated and efficient SQL engines. Hints explicitly restrict the RDBMS from utilizing the most efficient access path to the data. If a hint is needed, it is most likely a database/query design flaw that needs to be addressed.
Tip #2: Proper Datatypes
The benefits from using correct and proper datatypes for attributes within the database are two-fold:
- It ensures the correct type of data is added to the database. For example, 20010229 can’t be added to a DATE datatype attribute; whereas, this value could be added to an INTEGER datatype.
- It provides better insight for the UPDATE STATISTICS command. Storing ‘Dates’ as an INTEGER rather than as a DATE datatype will result in the incorrect number of potential values available to the attribute. For example, there’s a potential of 100 values between 20010101 and 20010201 for an INTEGER compared to only 31 for a DATE datatype.
Tip #3: Eliminate Unnecessary Attributes within Queries
To maximize performance, limit the attributes in the SELECT clause within the query to only the attributes necessary to satisfy the business requirement. Limiting the attributes reduces the amount of data transferred from disk and allows for more useful data per page of memory, which will result in a more efficient query execution. Note: Never use the ‘*’ in the SELECT clause for any production queries.
Tip #4: Think about Relationships
When designing new relations, it’s important to consider the environment in which the relation will be deployed (i.e. OLTP, Operational Data Store, Data Warehouse, etc.).
The following items are key to designing new or modifying existing relations:
- Levels of data normalization to be deployed;
- Consistent datatypes across relations;
- Primary key and secondary index attributes and order;
- Cardinality of each attribute
Tip #5: Keep Statistics Updated
Statistics provide the SQL engine with knowledge and insights of the data. The statistics gathered by the UPDATE STATISTICS process are utilized by the OPTIMIZER to determine the most efficient access path into the data. As data changes over time, so could the access paths into the data. Therefore, it’s important to keep statistics current and recompile programs frequently.
Tip #6: Avoid Table or Index Scans
One of the biggest wastes of system resources (i.e. disk, CPU, memory, etc.) is accessing data that will never be utilized by a query. Unnecessary scanning of data is generally the result of one of the following items:
- Missing predicates in a query
- Poorly structured query
- Non-keyed or non-indexed attributes utilized as a predicate
- Poorly structured compound key
- Outdated statistics resulting in an inefficient QEP (Query Execution Plan)
- Incomplete or poorly structured JOIN between relations
- Predicate utilized as a function
Tip #7: Limit Temporary Tables
A temporary table is a great feature of most SQL implementations that allows for an intermediate data set to be repeatedly used by a query or stored procedure. However, there are several reasons to either limit or eliminate the use of temporary tables: Advances in RDBMS engines are now capable of providing this functionality automatically at run time.
Query execution time is negatively impacted because of the extra I/O operations to READ the initial data, WRITE the data to the temporary table, READ data from temporary table, and lastly dropping the temporary table. Query execution is suspended until the temporary table can be populated.
Tip #8: Limit the Utilization of TOP Function
There is a common misconception that when the TOP condition has been satisfied, query execution terminates. Unfortunately, this is simply not the case. An RDBMS engine requires a materialized intermediate result set before being able to evaluate the TOP condition.
Tip #9: The ORDER BY Clause is Mandatory in a Query Where Sorted Data is Mandatory
There are many reasons for processing data in a specific order; however, it is imperative to limit the ORDER BY clause to queries where the business requirements call for the data to be in a specific order. An alternative might be to structure, via primary key and/or an Index, in a manner that results in the data in the order needed to satisfy the business requirements without utilizing a SORT operation.
Tip #10: Use ‘Cover’ Indexes Where Possible
The advantage of utilizing more than one index is to gain efficient access into data other than the primary key. However, most indexes are designed in a way that results in a secondary READ of the base table to obtain all data required for the query. A cover index, by design, will include attributes that may not be utilized as predicates, but rather to satisfy the data requirements of the query.
About the Author