Special characters in table names

In our DW environment, we allow business users to upload data into new tables and run their own ad hoc queries when needed. The user interface lets a user upload up to 10,000 (10K) rows in either comma- or tab-delimited format into a temporary database/table, and the user can specify the table name the data should be loaded into. Many savvy business users can thus run their own SQL queries joining their uploaded data with DW tables.

All this works fine, but users often provide table names with non-alphanumeric characters, including spaces, forward or backward slashes, dollar signs, etc. At some point the DBAs noted that managing these tables was becoming an issue, especially the table names with special characters. The following monitoring technique alleviated the problem, and the UI was also modified to check for special characters and strip them before creating users’ temporary tables.

Since MySQL doesn’t support regular expressions (RLIKE or REGEXP) in the SHOW TABLES syntax (only LIKE patterns), one is forced to use UNIX tools or another language (awk, perl, python, etc.) for regular-expression features.
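One alternative worth noting as a sketch: INFORMATION_SCHEMA (available since MySQL 5.0) does accept REGEXP in an ordinary WHERE clause, so the filtering can stay inside MySQL. Here db_name is a placeholder for your schema name:

```sql
-- Find table names containing anything outside [A-Za-z0-9_]
SELECT table_name
  FROM information_schema.tables
 WHERE table_schema = 'db_name'
   AND table_name REGEXP '[^A-Za-z0-9_]';
```

The shell pipeline below works just as well and is what we used.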

The following pulls all table names containing special characters, including $, \, space, ?, !, and @. You can add any other characters of interest to this set.

linux> mysql --host=hostname_or_ip  --database=db_name   --user=root  --password=your_pw  --execute="SHOW TABLES" | egrep '[$\\ ?!@]'

# This lists table names like the following (the double quotes are not part of the names)
"non_alphanum_table_name test it 2"
"non_alphanum_table_name; test it"
"non_alphanum_table_name; test it $"
"non_alphanum_table_name; test it $ \"
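The filter itself can be sanity-checked without a live MySQL connection by feeding egrep some sample names (the names below are made up):

```shell
# Simulate SHOW TABLES output and filter for special characters.
# Quoting the character class keeps the shell from eating the backslash.
printf '%s\n' \
  'clean_table' \
  'bad name with spaces' \
  'bad$dollar' \
  'bad\backslash' \
| egrep '[$\\ ?!@]'
```

Only the last three names match; clean_table is filtered out.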

Here you have a choice: either delete those tables, or store their names in a separate table called “ERROR_TABLE_NAME” and work with the business users to rename them later. Option 2 is friendlier and more professional, and you won’t annoy users the way option 1 would :).

Quickly whip up an awk script like

{ print "INSERT INTO ERROR_TABLE_NAME(name) VALUES('" $0 "'); COMMIT;" }

to generate an INSERT of each table name into the ERROR_TABLE_NAME table.

You need to store the script in a file (say, error_tbl_name.awk), because passing the single quote in the above string on the command line doesn’t work. When you pass a table name to MySQL, you need to quote it, like 'non_alphanum_table_name test it 2'. But the single quote also delimits the awk program on the shell command line, where it cannot easily be escaped and errors out.
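The script can be tried out on a sample name before wiring it into the pipeline (the heredoc and the sample name are just for illustration; note that a table name containing a single quote would still break the generated SQL and would need extra escaping):

```shell
# Create the awk script file, then feed it one sample table name
cat > error_tbl_name.awk <<'EOF'
{ print "INSERT INTO ERROR_TABLE_NAME(name) VALUES('" $0 "'); COMMIT;" }
EOF
echo 'bad name $' | awk -f error_tbl_name.awk
```

This prints: INSERT INTO ERROR_TABLE_NAME(name) VALUES('bad name $'); COMMIT;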

Now, piping all three commands together results in

linux> mysql --host=hostname_or_ip  --database=db_name   --user=root  --password=your_pw  --execute="SHOW TABLES" | egrep '[$\\ ?!@]' | awk -f error_tbl_name.awk
-- The INSERT statements generated for each looks like
INSERT INTO ERROR_TABLE_NAME(name) VALUES('non_alphanum_table_name test it 2'); COMMIT;
INSERT INTO ERROR_TABLE_NAME(name) VALUES('non_alphanum_table_name; test it'); COMMIT;
INSERT INTO ERROR_TABLE_NAME(name) VALUES('non_alphanum_table_name; test it $'); COMMIT;
INSERT INTO ERROR_TABLE_NAME(name) VALUES('non_alphanum_table_name; test it $ \\'); COMMIT;
INSERT INTO ERROR_TABLE_NAME(name) VALUES('non_alphanum_table_name_test_$_!_2'); COMMIT;

Also, note the COMMIT after each INSERT: autocommit is turned off in this environment, so without the explicit commits the inserts would not persist when the piped mysql session ends.

As a final step, execute

linux> mysql --host=hostname_or_ip  --database=db_name   --user=root  --password=your_pw  --execute="SHOW TABLES" | egrep '[$\\ ?!@]' | awk -f error_tbl_name.awk |  mysql --host=hostname_or_ip  --database=db_name   --user=root  --password=your_pw

Rename these tables to new standard table names and send the business users an email listing the affected tables.
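The rename itself can be done with MySQL’s RENAME TABLE statement, using backticks to quote the offending identifier (the old and new names below are based on the sample output above):

```sql
-- Backticks allow spaces and special characters in the old identifier
RENAME TABLE `non_alphanum_table_name test it 2`
          TO non_alphanum_table_name_test_it_2;
```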

There you have it.

Getting MySQL DB size

Often you want to find the size of a database (schema) for performance monitoring, to allocate more disk space, or to take some proactive action on its growth. You can pull the relevant information from MySQL INFORMATION_SCHEMA with the following query, which sums each table’s data and index size across all tables in a single schema.

SELECT s.schema_name,
CONCAT(IFNULL(ROUND((SUM(t.data_length) +
SUM(t.index_length))/1024/1024,2),0.00),"Mb") total_size,
CONCAT(IFNULL(ROUND((SUM(t.data_length))/1024/1024,2),0.00),"Mb") data_size,
CONCAT(IFNULL(ROUND((SUM(t.index_length))/1024/1024,2),0.00),"Mb") index_size,
CONCAT(IFNULL(ROUND(((SUM(t.data_length)+SUM(t.index_length)) -
SUM(t.data_free))/1024/1024,2),0.00),"Mb") data_used,
CONCAT(IFNULL(ROUND(SUM(t.data_free)/1024/1024,2),0.00),"Mb") data_free,
IFNULL(ROUND((((SUM(t.data_length)+SUM(t.index_length)) -
SUM(t.data_free))/((SUM(t.data_length) +
SUM(t.index_length)))*100),2),0) pct_used,
SUM(t.table_rows) total_rows,
COUNT(t.table_name) total_tables,
CONCAT(ROUND(MAX(t.data_length+t.index_length)/1024/1024, 0),
"Mb") biggest_table_size,
GROUP_CONCAT(DISTINCT t.engine) engines_used,
GROUP_CONCAT(DISTINCT t.table_collation) collations_used
FROM INFORMATION_SCHEMA.SCHEMATA s
LEFT JOIN INFORMATION_SCHEMA.TABLES t
ON s.schema_name = t.table_schema
WHERE s.schema_name = "test_dw_advertising"
GROUP BY s.schema_name\G

*************************** 1. row ***************************
schema_name: test_dw_advertising
total_size: 212151.14Mb
data_size: 154660.83Mb
index_size: 57490.31Mb
data_used: 210039.45Mb
data_free: 2111.69Mb
pct_used: 99.00
total_rows: 309940227
total_tables: 118
biggest_table_size: 74360Mb
engines_used: InnoDB,MyISAM
collations_used: latin1_swedish_ci
1 row in set (1 min 38.26 sec)

The total allocated size (including free space) is nearly 212G, of which data uses ~155G and indexes ~57G across 118 tables, leaving about 2G free (99% of the allocated space is used). The biggest table uses about 74G of disk, and this schema has both InnoDB and MyISAM tables using a single collation, latin1_swedish_ci.
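The figures above can be cross-checked with a little arithmetic (the numbers are taken straight from the query output):

```shell
# data_size + index_size should equal total_size, and
# pct_used = (total_size - data_free) / total_size * 100
awk 'BEGIN {
  data = 154660.83; idx = 57490.31; free = 2111.69
  total = data + idx
  printf "total=%.2f pct_used=%.2f\n", total, (total - free) / total * 100
}'
```

This prints total=212151.14 pct_used=99.00, matching the query output.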


Talend – Oracle – Java Path error

A few weeks ago, on one of our QA systems, I installed Oracle 10g Express for testing; the installation was smooth and all went well.  But recently I needed to do some data profiling on a specific table in our MySQL DW environment, because some queries were taking more than twice their normal run time.  Specifically, one query that used to take 5-6 minutes churning through a couple of million rows in a staging table started taking more than 15 minutes.  I suspected a data issue, but needed to quickly run some tests to see how the recent data profiled against earlier weeks’ data.

I downloaded Talend Data Profiler onto this new test machine (where I had Oracle 10g) and tried to install it.   Soon after double-clicking the exe file, I got an error indicating something like “org.talend.rcp.branding.top.product could not be found.” in configuration\12725000123.log.  The log also has some more information about which Java version it is trying to use and the command-line options passed to it.  The error itself doesn’t indicate much, and this wasted quite a bit of time.

Going through the Talend manual, I found that the profiler needs Java version 1.5 or later.  But I was pretty sure I had the latest Java environment on this box, and checked twice by running java -version in cygwin and checking the path, and also by visiting http://www.java.com and clicking the “Verify Installation” button.  Both indicated the latest version.

cygwin$ java -version
java version "1.6.0_20"
Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
Java HotSpot(TM) Client VM (build 16.3-b01, mixed mode, sharing)
cygwin$ which java

Then, on checking the Windows “path” through
Start->Settings->Control Panel->System->Advanced Tab->Environment Variables->path,
I found that Oracle had stuck its own java directory (version 1.4.2) at the beginning, which was throwing off the Talend installation.  I simply moved Oracle’s path entries to the end of the path string and restarted the installation.  This time the data profiler found the right Java and all went well.
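From a cygwin (or any Unix-like) shell, this kind of shadowing is easy to spot by printing the path one entry per line and checking which Java directory comes first. The sample path below is invented for illustration; on a real machine you would inspect "$PATH" (with the appropriate separator) instead:

```shell
# Earlier entries win; look for a Java/JDK directory near the top
sample_path='C:\oraclexe\app\oracle\jdk\bin;C:\Windows;C:\Program Files\Java\jdk1.6.0_20\bin'
printf '%s\n' "$sample_path" | tr ';' '\n' | grep -inE 'java|jdk'
```

Here entry 1 (an Oracle-bundled JDK) shadows the real JDK at entry 3, which is exactly the situation I hit.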

I wish Talend had thrown a better exception indicating which version of Java it needs and which version it found.