Yahoo SEM isBooksClosed

As part of Internet marketing and SEM, you will need to pull daily summaries of keyword performance reports from search engines like Google, Yahoo and MSN.

In the case of Yahoo, you should check “isBooksClosed” to be sure you are getting correct data for your report dates: if isBooksClosed is true, Yahoo analytics has collected all the data for the date you requested, and the data you pulled will not change later.

This is especially important if you are pulling the previous day's data early in the morning, as Yahoo may not have had time to collect and finalize the analytics. We have often run into the books-not-closed condition for a few accounts out of many tens of accounts. We then need to re-pull the data later in the day and re-run the daily summaries.

I have reproduced the relevant part of Yahoo's explanation below.

Books Closed

It is recommended that you wait until the books are closed before running a report. Use the isBooksClosed operation to periodically check the status of books. After the report is run, you may also check the information in the report header. See Report Headers.

isBooksClosed = (Books Closed Time > Inclusive End Date)

If the books closed time is greater than the inclusive end date (that is, past the full 24 hours of the report end date), then isBooksClosed is true and the data in the report is considered complete. If the books closed time is less than the inclusive end date, then isBooksClosed is false and the data in the report is considered incomplete.
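The predicate is easy to express in code. Below is a minimal sketch (the function and variable names are mine, not Yahoo's API) that treats the inclusive end date as covering its full 24 hours:

```python
from datetime import date, datetime, time, timedelta

def is_books_closed(books_closed_time, report_end_date):
    """True once the books-closed time has passed the full 24 hours
    of the inclusive report end date."""
    # The window for the end date closes at midnight of the next day.
    end_of_window = datetime.combine(report_end_date, time.min) + timedelta(days=1)
    return books_closed_time >= end_of_window
```

With a report end date of Nov 3rd, books closed at 00:30 on Nov 4th gives true, while books closed at 22:00 on Nov 3rd gives false.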

Say your report dates are 2009.11.01 to 2009.11.03 (November 1, 2009 to November 3, 2009) and you make an API request at 3am on the morning of Nov. 4th for an account's “KeywordSummaryByDay”; there is a possibility that Yahoo has not yet closed the books for the Nov. 3rd data.  By making a SOAP call to BasicReportService with report name “isBooksClosedStatus”, you can be sure whether the data for Nov. 3rd is finalized on Yahoo's side.  I have seen accounts and campaigns with tens of thousands of keywords whose books were not closed by 3am the next day.  So make sure that you are pulling the right data for your DW.

So, always make sure that you check for isBooksClosed.

Google’s extended searches

Internet search users are very well aware of the Google search engine: typically they enter their query or keywords in the search box and click the resulting links of interest or relevance to them (usually first-page results, especially the top 3 or 4).  A large percentage of users seem to be unaware of Google's extended searches, which can provide the exact result they are looking for without having to hop to one more site.

For example: weather for a particular city, the current time at a location, dictionary definitions, area codes, sports scores and many more.   Following are a few examples.

For weather: To find the current weather in Boston, USA, just enter the keywords “weather Boston, USA” and you will see today's and the next few days' weather in the first result!

For time: To find the current time in Bangalore, India (or Bengaluru, India), enter “time Bengaluru, India” and the first result is the time.

For dictionary: Use the “define” keyword before your query, like “define avatar”.

For stocks: This one is a little too specific for many users, because one needs to know the specific stock's ticker symbol and there are more than 10,000 of them in the US.  It doesn't seem to work for international stocks.

Area code: Enter a US 3-digit area code to get that area code's location info.

Fill in the blank:  My favourite when searching something for kids' homework or a pop quiz.  Try “Einstein got Nobel prize in *” or “Earth circumference is * miles”.

There are many more that can save time and typing or mouse clicks! Check out Google’s search tips.

Better yet, for all these queries just use the “WolframAlpha knowledge engine”.


Yahoo SEM daily report – Currency and Dups handling

If you are pulling daily keyword performance data from major search engines to store in your data warehouse, you need to perform many transformations before daily, weekly or monthly summaries are available to business (end) users.  In the case of Yahoo, the required transformations I needed to make were, one, to handle currency conversion and, two, to handle Yahoo sending more than one keyword when only one is expected. The latter seems to happen when the query or landing page URL is modified for the same bid unit, for high-impression queries, and for disabled queries/keywords.

The following SQL handles both: data is read from a staging table and loaded into the final table after some transformation and cleaning. It does currency conversion for all countries except the US and Mexico (whose units are in US dollars and so need no conversion).

The sub-query below uses MySQL GROUP_CONCAT to find any duplicate keywords for a bid unit, and dim_currency holds the daily currency exchange rates pulled from an external third-party service early in the morning. The se_query_group table has the grouping of the keywords based on company requirements and search engine. The query also filters out any keyword that has zero impressions.

The query can be further modified for better performance by joining with the data warehouse query tables and using query_id in the joins, and by using the search engine bid unit as the surrogate key.

INSERT INTO daily_keyword_perf #-- Hypothetical final table name; the original was lost
(se_date, search_engine, account_id, campaign_id, ad_group_id, keyword_id
,domain, url, query_group_id
,query, query_num_2
,impressions, clicks, ctr
,cpc_orig, cpc
,revenue_orig, revenue
,cost_orig, cost
,avg_position )
SELECT KP.se_date, KP.search_engine, KP.account_id, KP.campaign_id
,KP.ad_group_id, KP.keyword_id
,KP.domain, KP.url, KP.query_group_id
,CASE
   WHEN query2 NOT RLIKE '---' THEN query2 #-- Normal single kw
   WHEN query2 RLIKE '---' THEN substring_index(query2, '---', 1) #-- Get 1st kw
 END query
,CASE
   WHEN query2 RLIKE '---' THEN substring_index(substring_index(query2, '---', 2), '---', -1)
 END query_num_2 #-- Get 2nd kw
,impressions, clicks, ctr
,cpc cpc_orig
,CASE
   WHEN C.country_code NOT IN ('US', 'MX') THEN cpc / C.currency_rate
   ELSE cpc
 END cpc
,revenue revenue_orig
,CASE
   WHEN C.country_code NOT IN ('US', 'MX') THEN revenue / C.currency_rate
   ELSE revenue
 END revenue
,cost cost_orig
,CASE
   WHEN C.country_code NOT IN ('US', 'MX') THEN cost / C.currency_rate
   ELSE cost
 END cost
,CASE
   WHEN avg_position > 0 AND avg_position < 1 THEN 1
   ELSE avg_position
 END avg_position
FROM (
  SELECT se_date, search_engine
  ,account_id, campaign_id, ad_group_id, keyword_id
  ,domain, url, query_group_id
  ,GROUP_CONCAT(query SEPARATOR '---') query2
  ,SUM(impressions) impressions, SUM(clicks) clicks
  ,AVG(ctr) ctr, AVG(cpc) cpc
  ,AVG(revenue) revenue
  ,SUM(cost) cost, AVG(avg_position) avg_position
  FROM stg_keyword_perf
  WHERE se_date >= '' #-- Date here
  GROUP BY se_date, account_id, campaign_id, ad_group_id, keyword_id
) KP #-- stg_kw_perf
JOIN se_query_group QG
  ON KP.query_group_id = QG.query_group_id AND KP.search_engine = QG.search_engine
JOIN dim_currency C
  ON KP.se_date = C.currency_date AND QG.country_code = C.country_code
WHERE KP.se_date = '' #-- Date here
AND KP.impressions > 0
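The SUBSTRING_INDEX calls above carry the whole dedup logic, so it is worth seeing exactly what they do to the GROUP_CONCAT'd value. Here is a small Python sketch (helper names are my own) that mimics MySQL's SUBSTRING_INDEX and applies it the same way the query does:

```python
def substring_index(s, delim, count):
    """Mimic MySQL SUBSTRING_INDEX: a positive count keeps everything
    before the count-th delimiter; a negative count keeps everything
    after the count-th delimiter counted from the right."""
    parts = s.split(delim)
    if count >= 0:
        return delim.join(parts[:count])
    return delim.join(parts[count:])

def first_keyword(query2):
    # Matches the first CASE branch pair in the query.
    return substring_index(query2, '---', 1) if '---' in query2 else query2

def second_keyword(query2):
    # Matches the query_num_2 CASE: take the first two, then the last of those.
    if '---' not in query2:
        return None
    return substring_index(substring_index(query2, '---', 2), '---', -1)
```

For a GROUP_CONCAT'd value like 'kw one---kw two', first_keyword gives 'kw one' and second_keyword gives 'kw two'; a single keyword passes through unchanged.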


Ruby libmysql.dll error

Installing the Ruby MySQL module is pretty straightforward on either *nix or Windows by running

> gem  install  mysql

But on Windows, when you try to load the module with “require 'mysql'”, you may get an error like

C:/Ruby/lib/ruby/gems/1.8/gems/mysql-2.8.1-x86-mswin32/lib/1.8/ 126: The specified module could not be found.   - C:/Ruby/lib/ruby/gems/1.8/gems/mysql-2.8.1-x86-mswin32/lib/1.8/ (LoadError)
# from C:/Ruby/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:31:in `require'

In that scenario, you are missing “libmysql.dll” in the system32 directory.  In my case I had a MySQL server installation with the DLL in the folder C:/Program Files/MySQL/MySQL Server 5.4/lib/opt.

From a cygwin window, do

> cp  /cygdrive/c/Program\ Files/MySQL/MySQL\ Server\ 5.4/lib/opt/libmysql.dll   /cygdrive/c/WINDOWS/system32/libmysql.dll

Now run your Ruby db test program

require "mysql"

begin
  dbh = Mysql.real_connect("db_server_host", "user_name", "password", "database_name")
  puts "Server version: " + dbh.get_server_info
rescue Mysql::Error => err
  puts "Error code: #{err.errno}"
  puts "Error: #{err.error}"
ensure
  dbh.close if dbh
end


Mapping URL to top level domain

When parsing web logs as part of web analytics, one might be interested in top-level-domain analysis before drilling down further.  The following query maps a URL (referrer) to its top-level domain.  Top-level domains include not only generic ones like ‘.com’, ‘.edu’, ‘.org’, etc., but also country-code ones like ‘.uk’, ‘.de’, etc.  You can download the list from IANA (Internet Assigned Numbers Authority).

After downloading it, import it into a table dim_int_mktg_tld which you can join with the staging data.

CREATE TABLE `dim_int_mktg_tld` (
  `tld` varchar(16) NOT NULL,
  `type` varchar(32) default 'country-code', #-- Can be country-code or generic
  `info` varchar(500) default NULL,
  PRIMARY KEY (`tld`) #-- Assumed key; the original DDL was truncated here
);
#-- Join with yesterday's staging data and get TLD
SELECT WL.url, TLD.tld
FROM stg_weblog_date WL
LEFT OUTER JOIN dim_int_mktg_tld TLD
ON substring_index(
     substring_index(
       substring_index(url, '://', -1),
     '/', 1),
   '.', -1) = TLD.tld
WHERE WL.log_date = DATE_ADD(CURRENT_DATE, interval -1 day) #-- log_date: your staging date column

#-- Example (for a hypothetical URL like ''):
#-- The innermost substring_index returns everything to the right of '://'
#-- i.e.,
#-- then the middle one gets everything to the left of the first '/'
#-- i.e.,
#-- then the outermost substring_index returns everything to the right of the last '.'
#-- i.e., au
#-- which is then joined with the TLD table


Download Google keywords through SUDS

There are situations where you may want to download all the keywords (criteria) uploaded to Google AdWords, and if you have a substantial keyword basket (say, more than a million), then either downloading manually or checking through the tools provided by Google becomes prohibitive.  Also, Google's daily keyword performance report provides metrics only for the keywords that have had at least one impression.

The following application is a simple tool that downloads all keywords and dumps the data to different files.    I have removed error handling code and hardcoded many values to simplify it; you can modify this for your requirements, and don't forget to add your account and API details.  I have been able to download a few million keywords using multiple processes.

It was tested on Ubuntu Karmic running Python 2.6.4, against AdWords API version 13 (V13), using the Python SUDS SOAP module.

#-- Python application to pull all your campaign keywords.
#-- Uses the suds SOAP module to connect to Google services and writes to files (one for each client email).
#-- Get the list of all client emails (typically multiple accounts created to handle large sets of keywords)
#-- For each account get its campaigns
#-- For each campaign get its adgroups
#-- For each adgroup get all keywords
#-- Print ids and keyword attributes

import sys
import time
import re

import traceback as tb
import suds.metrics as metrics

from suds import WebFault
from suds.client import Client

#-- A class to handle user details...
class GoogleSoapUser:
    def __init__(self):
        self.user = {'email': 'your_developer_api_email_addr',
                     'password': 'your_developer_password',
                     'useragent': 'Google Adwords KW pull V0.1',
                     'developerToken': 'your_developer_token',
                     'applicationToken': 'your_application_token'
                    }
        #-- Note: connecting to production (adwords); for sandbox testing replace it.
        self.wsdl = {'campaign': '',
                     'adGroup': '',
                     'keyword': ''
                    }

    def getUser(self):
        return self.user

    def getWSDL(self, k=None):
        return self.wsdl[k]

    def getClientEmails(self):  #-- Client emails to pull data for
        emails = ['your_campaign_ac_email_addr_1',
                  #-- ...add further account emails here
                 ]
        return emails

    def setHeaderEmail(self, clientEmail=None):
        self.user['clientEmail'] = clientEmail

#-- Main program
if __name__ == '__main__':
    gUser = GoogleSoapUser()

    #-- For each client email...
    for cEmail in gUser.getClientEmails():
        #-- Open an output file to dump to...
        print "Checking account: ", cEmail, "\n"
        file = 'google_' + re.split('@', cEmail)[0] + '.txt'
        try:
            f = open('/tmp/' + file, 'w')
        except IOError as err:
            print "Unable to open file", file, format(err)

        #-- Set the SOAP header with the client email
        #-- (the user dict, including clientEmail, must go out as SOAP headers on each client)
        gUser.setHeaderEmail(cEmail)

        #-- Get all campaigns...
        campClient = Client(gUser.getWSDL('campaign'))
        campaigns = campClient.service.getAllAdWordsCampaigns(0)  #-- Pass a dummy number...

        #-- For each campaign...
        for campaign in campaigns:
            campaignID = campaign['id']
            print "Checking for campaign: ", campaignID, "\n"
            adClient = Client(gUser.getWSDL('adGroup'))
            adGroups = adClient.service.getAllAdGroups(campaignID)

            #-- For each AdGroup...
            for adGroup in adGroups:
                adGroupID = adGroup['id']
                print "Checking for adGroup: ", adGroupID, "\n"
                kwClient = Client(gUser.getWSDL('keyword'))
                keywords = kwClient.service.getAllCriteria(adGroupID)

                #-- For each keyword...
                for kw in keywords:
                    f.write(str(campaignID) + '\t' +
                            str(adGroupID) + '\t' +
                            str(kw['id']) + '\t' +
                            kw['type'] + '\t' +
                            kw['text'] + '\t' +
                            str(kw['qualityScore']) + '\t' +
                            kw['destinationUrl'] + '\n')
                    # sys.exit()

        #-- Close this account output file...
        f.close()
        print "Data pull complete for account ", cEmail, "\n"