Joins between tables of different charset

Not too long ago I got bitten by a slow performing MySQL query even though I had right indices (in fact unique index) and the tables were not big tables. The query which is similar to below was taking more than 20 mins which I ended up killing it.

The issue was one table was storing data in “latin1” while other in “utf8”.
select count(1)
from table_1 T1
left outer join table_2 T2
  on T1.id = T2.id
where T2.id is null
Note: “id”s here are varchars and they are not integers.
T1 had around 100K while T2 has around 800K joining on MySQL5.1, quad machine with 16G RAM.  I expected much quicker response and on running the explain plan I could see the query doing the full table scan!  First I wasn’t sure why this was happening but after digging little bit and found out that one table was defined with “charset=latin1” and another with “charset=utf8”.
MySQL was converting latin1 to utf8 and then performing the join which ended up with full table scan. On converting T2 to utf8 with collation utf8_unicode_ci, the query timing dropped to 10secs or so!
With that said by default now we create all tables with charset=utf8. 

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s