When you work daily in a data warehouse environment, you will inevitably need to compare, filter, or summarize data for further analysis or debugging. The result set shown in MySQL Toad 4.5 has a couple of very nice features that help you compare or filter results quickly, right in the “Results” pane. In the picture below I was calculating the Herfindahl Index for a group of AdWords accounts. The query returned a date, a daily metric, and its index. If I want to filter further for a specific range of the index, for example, I can point the mouse at the Herfindahl_Index column and click the “filter” icon. See below.
The data comparison feature, which compares two result sets, also saves you from exporting data to Excel or writing sub-queries. For example, I modified the original query, ran it again, and wanted to quickly see any differences in the numbers between the two result sets at the date level, which I could do by simply running the data comparison between results 7 and 8 below. There are a couple of other ways to compare the data, as noted above, but comparing in the result set pane was the quickest: I needed neither to modify the query nor to move the data.
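If you ever need the same date-level comparison outside Toad, the idea is easy to sketch in Python. The snippet below is only an illustration, not how Toad implements it: the function name diff_by_date, the tolerance, and the sample rows are all made up, and each result set is assumed to be a list of (date, value) pairs with one row per date.

```python
from datetime import date


def diff_by_date(old_rows, new_rows, tol=1e-9):
    """Return {date: (old_value, new_value)} for dates whose values differ.

    A date that appears in only one result set is reported with None
    on the side where it is missing.
    """
    old = dict(old_rows)
    new = dict(new_rows)
    diffs = {}
    for day in sorted(old.keys() | new.keys()):
        a, b = old.get(day), new.get(day)
        if a is None or b is None or abs(a - b) > tol:
            diffs[day] = (a, b)
    return diffs


# Hypothetical data standing in for two result sets (date, index value).
result_7 = [(date(2024, 3, 1), 0.42), (date(2024, 3, 2), 0.40), (date(2024, 3, 3), 0.39)]
result_8 = [(date(2024, 3, 1), 0.42), (date(2024, 3, 2), 0.41), (date(2024, 3, 4), 0.38)]

for day, (before, after) in diff_by_date(result_7, result_8).items():
    print(day, before, after)
```

Only the dates that changed, or that exist on one side only, are printed, which is roughly what you look for when scanning a comparison grid.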
Finally, you can use Pivot & Chart by dragging and dropping the columns to the grid area. For small data sets this is handy.
Hope that helps,
As with any database, query optimization is critical in a MySQL data warehouse environment, and a better understanding of the “Explain plan” helps the database application developer avoid query performance issues. Also, DBAs will like your queries and will be more than happy to help you optimize them.
MySQL’s “Explain” statement provides details on query parsing and execution steps and outputs 10 columns:
id, select_type, table, type, possible_keys, key, key_len, ref, rows and Extra.
You run the statement by placing the EXPLAIN keyword in front of your query:
EXPLAIN <your query>;
Shown below is Toad’s explain plan output for a query that self-joins a table and has a WHERE clause. The table has around 20 million rows. The query took less than 0.6 seconds to go through ~70K rows by index, using “const” and a func (the date_add function) lookup to compare one day’s data with the previous day’s.
SELECT a.ad_date, a.unit_id, a.max_cpc,
SUM(coalesce(a.max_cpc, a.max_cpc) - coalesce(b.max_cpc, a.max_cpc)) diff_cpc
FROM sem_kw_summary a
LEFT JOIN sem_kw_summary b
ON date_add(a.ad_date, INTERVAL -1 day) = b.ad_date
AND a.unit_id = b.unit_id
WHERE date_add(CURRENT_DATE, INTERVAL -1 day) = a.ad_date
AND a.engine = 'google'
GROUP BY a.ad_date, a.unit_id
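To make the join logic concrete, here is a rough Python sketch of what the query above computes, run on in-memory rows instead of the table. The function name cpc_diff and the sample rows are hypothetical, the target day is passed in rather than derived from CURRENT_DATE, and a.max_cpc is assumed never to be NULL.

```python
from collections import defaultdict
from datetime import date, timedelta


def cpc_diff(rows, target_day, engine="google"):
    """Sum max_cpc differences against the previous day, per unit_id.

    rows: iterable of (ad_date, unit_id, engine, max_cpc) tuples.
    """
    previous_day = target_day - timedelta(days=1)

    # The "b" side of the self-join: previous-day max_cpc values by unit_id.
    prev = defaultdict(list)
    for ad_date, unit_id, _eng, max_cpc in rows:
        if ad_date == previous_day:
            prev[unit_id].append(max_cpc)

    totals = {}
    for ad_date, unit_id, eng, max_cpc in rows:
        # The WHERE clause: only the target day and the chosen engine.
        if ad_date != target_day or eng != engine:
            continue
        totals.setdefault(unit_id, 0.0)
        # LEFT JOIN: an unmatched row still appears once, with b.max_cpc NULL.
        for b_cpc in prev.get(unit_id, [None]):
            # COALESCE(b.max_cpc, a.max_cpc): a missing value gives a diff of 0.
            other = max_cpc if b_cpc is None else b_cpc
            totals[unit_id] += max_cpc - other
    return totals


# Hypothetical rows: (ad_date, unit_id, engine, max_cpc)
sample_rows = [
    (date(2024, 1, 2), 1, "google", 1.50),
    (date(2024, 1, 1), 1, "google", 1.20),
    (date(2024, 1, 2), 2, "google", 0.80),  # no row for the previous day
    (date(2024, 1, 2), 3, "bing", 2.00),    # filtered out by engine
    (date(2024, 1, 1), 3, "bing", 1.00),
]

for unit_id, diff in cpc_diff(sample_rows, date(2024, 1, 2)).items():
    print(unit_id, round(diff, 2))
```

The dictionary lookup by unit_id plays the role that the index plays in MySQL: without it, every target-day row would have to be checked against every previous-day row.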
The most important columns to look at are “type”, “key_len”, “ref” and “rows”. They quickly help you focus your tuning effort.
id || A sequential identifier for each SELECT within the query.
select_type || The type of SELECT, with 9 possible values ranging from simple to more complex types such as derived, uncacheable subquery, etc. Correlated subqueries are very costly in MySQL, so avoid them.
table || The name or alias of the table the row refers to.
type || The join or data access type. It can take 12 values such as const, system, ref and eq_ref. A value of “ALL” means a full table scan; try to avoid it. For fast queries you should see “const”, “system”, “eq_ref” or “ref”.
possible_keys || The keys that could be of use in this query (join).
key || The key chosen from possible_keys. It can be a composite key.
key_len || The length of the chosen key in bytes. The smaller the key_len, the better for performance.
ref || The columns or constants that are compared to the index named in the key column above.
rows || The number of rows the engine has to examine. It is an estimate. I wish it were a percentage of the table size in rows. Obviously, a smaller number is better.
Extra || Can have many values and provides more detail on the explain plan. Typically, “Using index” and “Using where” are good. Don’t be confused when you see “Using filesort”: by itself it does not mean MySQL is shuffling data through a file between main memory and disk. Filesort is a type of sort that needs an extra pass to retrieve the data.
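The rules of thumb in the table can be turned into a quick checklist. The Python sketch below assumes you have already fetched the EXPLAIN output as a list of dictionaries keyed by column name; the function name review_plan, the max_rows threshold, the index name and the sample plan are all hypothetical and are not the output of the query shown earlier.

```python
# Access types the table above lists as signs of a fast query.
FAST_TYPES = {"system", "const", "eq_ref", "ref"}


def review_plan(plan_rows, max_rows=100_000):
    """Return (table, message) warnings for a list of EXPLAIN rows."""
    warnings = []
    for row in plan_rows:
        table = row.get("table")
        access = row.get("type")
        extra = row.get("Extra") or ""
        if access == "ALL":
            warnings.append((table, "full table scan (type=ALL)"))
        elif access not in FAST_TYPES:
            warnings.append((table, f"access type {access} may be slower than const/system/eq_ref/ref"))
        # "rows" is only an estimate, so treat the threshold as a hint.
        if (row.get("rows") or 0) > max_rows:
            warnings.append((table, f"estimated rows {row['rows']} exceeds {max_rows}"))
        if "using filesort" in extra.lower():
            warnings.append((table, "Using filesort: extra sort pass"))
    return warnings


# Hypothetical plan for a two-table join.
sample_plan = [
    {"id": 1, "select_type": "SIMPLE", "table": "a", "type": "ref",
     "key": "idx_ad_date", "rows": 70000, "Extra": "Using where; Using filesort"},
    {"id": 1, "select_type": "SIMPLE", "table": "b", "type": "ALL",
     "key": None, "rows": 20000000, "Extra": ""},
]

for table, message in review_plan(sample_plan, max_rows=1_000_000):
    print(table, message)
```

A check like this is no substitute for reading the plan yourself, but it makes it harder to miss a full table scan in a long plan.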
For more details see http://dev.mysql.com/doc/refman/5.0/en/using-explain.html