MySQL Toad result set filter and compare features

When you work daily in a data warehouse environment, you will inevitably compare or filter or summarize data for further analysis or debugging.  The result set shown in MySQL Toad 4.5 has couple of very nice features that will help you compare or filter the results quickly – right in the “Results” pane.  In the picture below I was calculating Herfindahl Index for a group of AdWords accounts.  The query resulted in date, some daily metric  and its index; and if I am further interested in filtering for specific range of index, for example, I can point the mouse at  Herfindahl_Index column and click on “filter” icon. See below.

Also, by comparing the data between two result sets, the data comparison feature avoids bringing data to Excel or running sub-queries.  For example, I modified the original query and ran it again and wanted to quickly see for any difference in numbers between two results sets at date level, which I could by just doing the data comparison between result 7 & 8 below.  I know I have couple of ways of comparing the data as noted above, but being able to compare in Result set pane was the quickest – didn’t need to modify the query nor move the data.

Finally, you can use Pivot & Chart by dragging and dropping the columns to the grid area.  For small data sets this is handy.

Hope that helps,
Shiva

Character set, Collation

Here is an interesting challenge that I ran into in the Data Warehouse environment. While debugging a particular business user query, I joined between a string built in a sub-query to a column in a particular fact table with some 20million rows. BTW, I was using MySQL 5.0 and Toad 4.5 to run this query and instead of getting any result I got the following error!

MySQL Database Error: Illegal mix of collations (utf8_general_ci,COERCIBLE) and (latin1_swedish_ci,IMPLICIT) for operation '='    22    0

The query was simple and similar to

SELECT s.str, s.d1, s.d2
FROM (
SELECT distinct 'string' str, dimension1 d1, dimension 2 d2
FROM table_1
WHERE ...
) s
LEFT OUTER JOIN table_2 t2
ON ...
WHERE s.str = t2.str
AND ...

The db error was indicating me that I was using two different collations for string comparison and shouldn’t be doing so. That left me with either forcing one collation to be converted to other by force or look closely at server, database, table, column character set or collations and change it appropriately, if needed.

By default Toad sets the character_set_client, character_set_collation to ‘utf8’ and the column that joined in table 2 was in ‘latin1’. Here is the first solution:

SELECT s.str, s.d1, s.d2
FROM (
SELECT distinct 'string' str, dimension1 d1, dimension 2 d2
FROM table_1
WHERE ...
) s
LEFT OUTER JOIN table_2 t2
ON ...
WHERE s.str COLLATE utf8_general_ci = t2.str
AND ...

Here the COLLATE is used in WHERE clause but it can also be used in SELECT or GROUP BY or other aggregate functions.

Then looking at server, database’s related variables indicated that they were in default setting.

character_set_server = latin1
collation_server = latin_swedish_ci
character_set_database = latin1
collation_database = latin_swedish_ci

and so was the table 2.

By setting the collation_connection and character_set_client to latin, I could effectively use the latin collation. Here is solution #2.

SET character_set_client = 'latin1';
COMMIT;
SET collation_connection = 'latin1_swedish_ci';
COMMIT;

Note, by setting the collation_connection I also set the character_set_connection to respective character set. After this, the query ran successfully.

Enjoy,
Shiva



Introduction and using MySQL Explain

Like in any database query optimization is critical for MySQL data warehouse environment and having a better understanding of the “Explain plan” helps the database application developer avoiding issues with query  performance. Also, DBAs will like your queries and they would be more than happy to help you optimize them.

MySQL’s “Explain” statement provides details on query parsing and execution steps and outputs 10 fields –
id, select_type, table, type, possible_keys, key, key_len, ref, rows and Extra columns.

You run the statement by issuing

explain  #-- Insert your query between . 

Shown below is the Toad’s output of explain plan of a self-joined table’s query with where clause. The table has around 20million rows. The query took less than 0.6seconds to index through ~70K rows and with “const” and a func (date_add function) to compare between one days data to its previous day data.

SELECT a.ad_date, a.unit_id, a.max_cpc, 
SUM(coalesce(a.max_cpc, a.max_cpc) - coalesce(b.max_cpc, a.max_cpc)) diff_cpc
FROM sem_kw_summary a
LEFT JOIN sem_kw_summary b
ON date_add(a.ad_date, INTERVAL -1 day) = b.ad_date
AND a.unit_id = b.unit_id
WHERE date_add(CURRENT_DATE, INTERVAL -1 day) = a.ad_date
AND a.engine = 'google'
GROUP BY a.ad_date, a.unit_id

The most important columns that you should look are “type”, “key_len”, “ref” and “rows”.  They quickly help you concentrate your tuning effort.

Column Column details
id An identifier and a sequential number of each selection within the query.
selecty_type A “type of select” with 9 possible values from simple to more complex type like derived, uncachable subquery, etc. Correlated subqueries are very costly in MySQL and avoid them.
table Name of the table or alias it is referring.
type A join or data access type. Can have 12 values like const, system, ref, eq_ref, etc. Full table scan happens when the value is “ALL” and try avoiding it. For fast performing queries you should see “const or system or eq_ref or ref”.
possible_keys Many possible keys that could be of use in this query (join).
key One of the many keys from possible_keys. A key can be composite key.
key_len Length of each key (in bytes). Smaller the key_len better it is for performance.
ref The columns that will be compared to the index in key column above.
rows Number of rows engine has to examine. It is an estimate. I wish it was a percentage of table size in rows. Obviously, smaller number is better.
Extra Can have many values and provides more details on the explain plan. Typically, “using index, using where” are good and don’t be confused with when you see “using filesort” – it does not mean MySQL will use file transferring data between main memory and disk. Filesort is a type of sort with an extra pass needed to retrieve the data.

For more details see http://dev.mysql.com/doc/refman/5.0/en/using-explain.html

Cheers,
Shiva