Before making any performance improvements, one need to measure it either you are trying to improve computer applications or DW queries, etc. In one of the Data Warehouse subject area where I run more than 120K (120,00) queries everyday in an ETL environment, I log all queries to a table and track over time to see which queries are showing performance issue(s). With the following table, I was able to improve the performance by more than 50% some time. For a query that repeatedly runs with different parameters in where clause, a small improvement adds up quickly and other times due to data skewing you can clearly see any changes needed to improve performance. Other times any mistakes in indexes (DBA dropped it 🙂 or application user modified their query, etc. you will have a reference point to check against why a particular query or queries are slow.
Column | Null? | Type | Comment |
---|---|---|---|
ID | No | int (11) | Auto Increment |
RUN_NUMBER | No | tinyint | Useful when job is recovered or re-ran |
APPLICATION | No | varchar(128) | Application name to track |
DOMAIN | Yes | varchar(128) | Application domain or subject area |
QUERY | No | varchar(4096) | Actual query |
DURATION | No | decimal(10,5) | How long did query run? |
ROW_COUNT | Yes | int | Number of rows affected |
RESULT | No | varchar(32) | Query result type – succeeded or failed? |
COMMENT | Yes | varchar(128) | User comment that can be logged with each query |
CREATED_DT | No | date | Query run date |
CREATED_DT_TM | No | timestamp/ datetime | Query run date and time |
CREATED_BY | Yes | varchar(64) | Query user name |
In the above table, one can use CREATED_DT_TM wherever CREATED_DT is needed while doing query performance analysis but most of these analyzes are done at date level (as most of DW ETL and summarizations are daily processes). By having date column and indexed, the query analysis queries are much faster as there is no need to apply date function on each row.
More interesting analysis including job failures, errors and query result set (ROW_COUNT) can be done with this data. You can also analyze specific database query performances. For example when a same query runs against different (mysql) databases (say, more than 100), some databases show performance getting worse due to table size or load, etc.
Once you know that there is some query performance issue, you can take remedial action. Let me know how you have implemented query performance monitoring?
Cheers,
Shiva