Yahoo SEM isBooksClosed

As a part of Internet Marketing and SEM, you will need to pull daily summary of keyword performance reports from search engines like Google, Yahoo and MSN.

In case of Yahoo, you should check “isBooksClosed” to get the correct data for your report dates; meaning the data you pulled will not change if isBooksClosed is true and Yahoo analytics has collected all the data for the date you requested the report.

This is especially important if you are pulling the previous day data early in the morning as Yahoo may not have had time to get collect and complete the analytics. We have run into many times books not closed condition for few accounts out of many tens of accounts. We then need to repull the data at later time of the day and run the daily summaries.

I have reproduced the part of Yahoo explaination related to this from

Books Closed

It is recommended that you wait until the books are closed before running a report. Use the isBooksClosed operation to periodically check the status of books. After the report is run, you may also check the information in the report header. See Report Headers.

isBooksClosed = (Books Closed Time > Inclusive End Date)

If the books closed time is greater than the inclusive end date, the full 24 hours of the report end date, then isBooksClosed is true and the data in the report is considered complete. If the books closed time is less than the inclusive end date, then isBooksClosed is false and the data in the report is considered incomplete.

Say your report dates are 2009.11.01 to 2009.11.03 (November 1, 2009 to November 3, 2009) and you made a API request on Nov.4th morning 3am for an account’s “KeywordSummaryByDay” and there is a possibility that Yahoo has not closed the book for Nov.3rd data.  By making a SOAP call to BasicReportService with report name “isBooksClosedStatus” you be sure whether data is finalized on Yahoo side for Nov.3rd.  I have seen accounts or campaigns with tens of thousands of keywords not closed by 3am the next day.  So make sure that you are pulling the right data for your DW.

So, always make sure that you check for isBooksClosed.

Yahoo SEM daily report – Currency and Dups handling

If you are pulling daily keyword performance data from major search engines to store in your in Data Warehouse, you need to perform many of transformation before having a daily, weekly or monthly summaries available for business (end) users.  In the case of Yahoo the required transformations that I needed to make are, one to handle the currency conversion and two to handle Yahoo sending more than one keyword when only one is expected. This has seem to happen when query or landing page url is modified for the same bid unit and for high impressions queries. Also, for disabled queries/keywords.

The following sql handles both with data read from staging table and loaded into final table after some transformation and cleaning. It does currency conversion for all countries except US and Mexico (units are in US dollars) which do not need conversions.

Sub-query below uses MySQL GROUP_CONCAT to find any duplicate keywords for the bid unit and dim_curreny has the daily currency exchange rates pulled from external third party services early in the morning. se_query_group table has the grouping of the keywords based on company requirements and search engine. Query also filters out any keyword that has zero impressions.

The query can further be modified for better performance of using query_id by joining with data warehouse query tables and using query_id in the joins also using search engine bid unit as the surrogate key.

(se_date, search_engine, account_id, campaign_id, ad_group_id, keyword_id
,domain, url, query_group_id
,query, query_num_2
,impressions, clicks, ctr
,cpc_orig, cpc
,revenue_orig, revenue
,cost_orig, cost
,avg_position )
SELECT KP.se_date, KP.search_engine, KP.account_id, KP.campaign_id
,KP.ad_group_id, KP.keyword_id,
,KP.domain, KP.url, KP.query_group_id
WHEN query2 NOT RLIKE '---' THEN query2 # Normal single kw
WHEN query2 RLIKE '---' THEN substring_index(query2, '---', 1) # Get 1st kw
END query2
WHEN query2 RLIKE '---' THEN substring_index(substring_index(query2, '---', 2), '---', -1)
END query_num_2 # Get 2nd query
,cpc cpc_orig
WHEN C.country_code not in ('US', 'MX') THEN cpc / C.currency_rate
ELSE cpc
END cpc
,revenue revenue_orig
WHEN C.country_code not in ('US', 'MX') THEN revenue / C.currency_rate
ELSE revenue
END revenue
,cost cost_orig
WHEN C.country_code not in ('US', 'MX') THEN cost / C.currency_rate
ELSE cost
END cost
WHEN avg_position > 0 and avg_position < 1 THEN 1
ELSE avg_position
END avg_position
SELECT se_date, search_engine
,account_id, campaign_id, ad_group_id, keyword_id,
,domain, url, query_group_id, query
,GROUP_CONCAT(query SEPARATOR '---') query2
,SUM(impressions) impressions, SUM(clicks) clicks
,AVG(ctr) ctr, AVG(cpc) cpc, 
,AVG(revenue) revenue, 
,SUM(cost) cost, AVG(avg_position) avg_position
FROM stg_keyword_perf
WHERE se_date >= '' #-- Date here
GROUP BY se_date ,account_id ,campaign_id ,ad_group_id ,keyword_id
) KP #- stg_kw_perf
JOIN se_query_group QG
ON KP.query_group_id = QG.query_group_id AND KP.search_engine = QG.search_engine
JOIN dim_currency C
ON KP.se_date = C.currency_date AND QG.country_code = C.country_code
WHERE KP.se_date = '' #-- Date here
AND KP.impressions 0