SEM – Google Adwords API Data Retriever

If you are running many large campaigns or thousands of campaigns and want to track keyword performance of Google Adwords, the following application helps retrieve the data on regular intervals (daily, weekly, monthly, etc.). Using your data warehouse and reporting infrastructure you can slice and dice the data based on your business requirements.

Application is a multi-threaded implementation of pulling data through API. Each thread gets a campaign id to pull keyword performance report.  The code included here is a skeletal code and you need to add error handing, code to suit your database schema and some ETL.  In my environment I have used MySQL 5.0, linux (2.6) and Windows, Perl 5.10.1 and Google Adwords API v13.  You can also use Google’s client libraries available in many languages including Java, Python, DotNet, Perl, PHP and others.


Other posts that may be of interest:
1. Google Analytics Data Retriever
2. Bulk data loading using Talend
3. Google Keyword Performance data pull


Campaign Class:

package Campaign;

#-------------------------------------
# Campaign Package
#-------------------------------------

use strict;
use DBI;

sub new {
my $self = {};
$self->{campIds} = []; #-- Master campaign ID array

bless($self);

$self->_populateCampaignIds();
return $self;
}

sub _populateCampaignIds {
my $self = shift;

#--------------------------
#- Connect to database to get list of campaigns (IDs).
#--------------------------
#- my $dbh = DBI->connect($data_source, $user_name, $pw);
#- my $sth = $dbh->prepare(
#- "SELECT campaign_id from campaign
#- WHERE search_engine = 'google'
#- AND status = 'active';
#- ";
#- $sth->execute() or die "Unable to execute a query.\n";
#- while ( @id = $sth->fetchrow_array ) {
#- push (${$self->{campIds}}, $id[0]);
#- }
#-
#- $dbh->disconnect;

#---- For demo, I have hard-coded some IDs.
push( @{$self->{campIds} }, '1111');
push( @{$self->{campIds} }, '2222');
push( @{$self->{campIds} }, '3333');

return 1;
}

#----------------------------
#- Returns master list of campaign IDs
#----------------------------
sub getCampaignIds {
my $self = shift;

return @{ $self->{campIds} };
}

#----------------------------
#- Returns chunk of campaign IDs from master array.
#----------------------------
sub getCampaignIdChunk {
my $self = shift;
my $chunks = shift;
my $index = shift;

if (!defined $index and !defined $chunks) {
return undef;
}

my $chunkSize = int (scalar(@ {$self->{campIds}})/$chunks);
if ($chunkSize < 1) {
print "Error: no campaigns to process for thread $index\n";
return undef;
}
my $lIx = $chunkSize * ($index - 1); #-- low index
my $hIx = $chunkSize * ($index); #-- high index

my @campIds = ();
if ($index < $chunks) {
@campIds = splice (@ {$self->{campIds}}, $lIx, $chunkSize);
} else {
#- print "Getting LAST chunk for $index from lower $lIx element and a chunk size of $chunkSize\n";
@campIds = splice (@ {$self->{campIds}}, $lIx);
}

return @campIds;
}

1;

Prototype application:

#-------------------------------------------------
# Adwords Data Retriever
# This is skeleton application to pull Adwords Keywords Performance Report Data
# from Google through Adwords API.
# Uses multi-threaded programming technique for simultaneous pull of many campaigns.
# Campaign ids in the application are hard-coded with dummy values and hence replace them by connecting
# to your database and getting actual campaign ids (code shown).
# Also, for authentication use your user name, password, developer & application code.
# Valid for Google Adwords v13.
# Author: Shiva M.
#-------------------------------------------------

#- use lib "Add directory where you installed Campaign.pm";
use strict;
use Benchmark;
use Getopt::Std;
use Data::Dumper;

use XML::Simple;
use LWP::Simple;
use Crypt::SSLeay;
use SOAP::Lite;
use IO::Uncompress::Gunzip qw();

use threads;
use threads::shared;

use Campaign;

my (%opts, @worker) = undef;
my $NUM_THREADS = 3; #-- Modify this to suit your environment. Or better use Config module with ini file to setup application configuration.
my $GOOGLE_API = 'https://adwords.google.com/api/adwords/v13';

#-------------------------------------------------
# Main program.
#-------------------------------------------------
getOpts();

if ($opts{'s'} == 1) {
for (my $i = 1; $i < ($NUM_THREADS+1); $i++) {
print "MASTER: Starting thread " . $i, "\n";
$worker[$i] = threads->new(\&pullCampaignData, $i);
sleep 5;
}

while(anyRunningThread()) {
for (my $ii = 1; $ii < scalar(@worker); $ii++) {
my $thr = $worker[$ii];
if ($thr->is_joinable) {
my $ret = eval { $thr->join() };
print "Thread $ii returned: $ret\n";
}
}
sleep(10);
}
}

exit (0);

#-------------------------------------------------
# Subroutines.
#-------------------------------------------------
sub getOpts {
if (!getopts('s:d:', \%opts)) {
print "Invalid option. Exiting\n";
exit (-1);
}

return 1;
}

#-----------------------------
#- Any thread still running?. Returns Yes/No (1/0)
#-----------------------------
sub anyRunningThread {
my $runThrs = threads->list(threads::running);
print "Still $runThrs threads are running.\n";
if ($runThrs) { return 1; }

return 0;
}

#---------------------------
#- Each thread to pull keyword performance reports for a list of campaigns
#---------------------------
sub pullCampaignData {
my ($tId) = @_;
my $thr = $worker[$tId];

my $to_dollar_conv_factor = 1 /(1000 * 1000); #-- Currency in micros. $1 1000 * 1000;

my $log = "thread_$tId.log";
open (TFH, ">$log") or die "Unable to open file $log\n"; #-- Log file for the thread.

#---------------------------------
#- Get this thread share of load - list of campaign IDs
#---------------------------------
my $campaign = Campaign->new();
my @tCampIds = $campaign->getCampaignIdChunk($NUM_THREADS,$tId);

print TFH "\npullCampaignData \$tId: $tId: campaignIDs:", map {"$_ "} @tCampIds;

my @headers = (SOAP::Header->name('email' => $email),
SOAP::Header->name('password' => $pw),
SOAP::Header->name('useragent' => $ua_name),
SOAP::Header->name('developerToken' => $developer_token),
SOAP::Header->name('applicationToken' => $application_token)
);

my $url = sprintf('https://%s.google.com/api/adwords/v13/ReportService', 'adwords');
my $wsdl = $url . '?wsdl';
my $service = SOAP::Lite->service($wsdl)->autotype(0)->readable(1)->proxy($url);
print TFH "\ndebug:service=$service" if $opts{d};

$service->on_fault(
sub {
my $response = $::ARG[1];
die('The following SOAP fault occurred:', "\n", 'faultcode: ',
$response->faultcode(), "\n",
'faultstring: ', $response->faultstring(),
"\n");
}
);
SOAP::Lite->import(+trace => 'debug') if $opts{d};


#---------------------------
# Make SOAP call.
# Get one campaign data at a time. Generate a file to bulk load into database.
#---------------------------

foreach my $cId (@tCampIds) {

my $AWJobParams = {
'selectedReportType' => 'Keyword',
'name' => 'AW Keyword Perf Daily Report',
'aggregationTypes' => ['Daily'],
'startDay' => $yesterday_date,
'endDay' => $yesterday_date,
'selectedColumns' => [ qw (
KeywordId Keyword CampaignId CampaignStatus
AdGroupId AdGroupStatus AdGroupMaxCpa AdWordsType
Impressions Clicks CTR CPC Conversions ConversionRate Cost
AveragePosition QualityScore AverageConversionValue BottomPosition
CPM CostPerConverstion CostPerTransaction DailyBudget DefaultCount
DefaultValue FirstPageCpc KeywordDestUrlDisplay KeywordStatus
KeywordTypeDisplay LeadCount LeadValue MaxContentCPC MaximumCPC
MaximumCPM PageViewCount PageViewValue SalesCount SalesValue
TopPosition TotalConversionValue Transactions ValuePerClick
ValuePerCost CustomerName ExternalCustomerId )
],
'crossClient' => 1,
'campaigns' => [$cId]
};


#------------------------------
# Step 1. Define a job
#------------------------------
my $report_job = SOAP::Data->name('job' => $AWJobParams);
$report_job->attr({'xsi:type' => 'DefinedReportJob', 'xmlns' => "$GOOGLE_API"});

#------------------------------
# Output XML file after unzipped.
#------------------------------
my $output = "${cId}_kw_perf.xml";


#-------------------------------
# Catch any exception
#-------------------------------
eval {

#------------------------------
# Step 2. Validate report.
#------------------------------
$service->call('validateReportJob' => $report_job, @headers);
print TFH "debug: about to schedule report \n";

#------------------------------
# Step 3. Schedule report.
#------------------------------
my $job_id = $service->call('scheduleReportJob' => $report_job, @headers)->result();
$job_id = SOAP::Data->name('reportJobId' => $job_id);

#------------------------------
# Step 4. Get report status. Wait until finish or error.
#------------------------------
my $status = $service->call('getReportJobStatus' => $job_id, @headers)->result();
while ($status ne 'Completed' and $status ne 'Failed') {
print TFH "Report job status is $status\n";
sleep(30);
$status = $service->call('getReportJobStatus' => $job_id, @headers)->result();
}
if ($status eq 'Failed') {
die("Report job generation failed.\n");
}

#------------------------------
# Step 5. Get report URL to download the file.
#------------------------------
my $report_url = $service->call('getGzipReportDownloadUrl' => $job_id, @headers)->result();
print TFH "Report is available at $report_url\n" if $opts{d};

#------------------------------
# Step 6. Get the data file (report).
#------------------------------
my $gzip_report_data = LWP::Simple::get($report_url);


#------------------------------
# Step 6a. Save it to local file.
#------------------------------
my $localFile = "${cId}_kw_perf.xml.gz";
open(LFH, ">$localFile") or die "Could not open $localFile: $!\n";
print LFH $gzip_report_data . "\n";
close LFH;


#------------------------------
# Step 7. Unzip the file and build xml hash.
#------------------------------
IO::Uncompress::Gunzip::gunzip(\$gzip_report_data => $output);

my $xml_hash = XMLin($output);
print TFH Dumper($xml_hash) if $opts{d};

#------------------------------
# Step 8. Create database bulk load file.
#------------------------------
open (BLFH, ">${cId}_bulk_load_input.txt") or die "Unable to create bulk load file: $!\n";


#-- Uncomment the fields that are of interest to you.
foreach my $row (@{$xml_hash->{table}->{rows}->{row}}) {
print BLFH
$row->{keywordid},"\t", $row->{keyword},"\t",
$row->{campaignid},"\t",
# $row->{campaign},"\t",
$row->{campStatus},"\t", $row->{adgroupid},"\t",
# $row->{adgroup},"\t",
# $row->{agstatus},"\t",
# $row->{agMaxCpa} * $to_dollar_conv_factor,"\t",
# $row->{adwordsType}, $row->{keywordid}, $row->{keyword}, $row->{kwStatus},
# $row->{customerid}, $row->{acctname}, $row->{kwDestUrl},
$row->{imps},"\t", $row->{clicks},"\t", $row->{ctr},"\t",
# $row->{cpc} * $to_dollar_conv_factor, $row->{conv},
# $row->{convRate},
$row->{cost} * $to_dollar_conv_factor,"\t",
$row->{pos},"\t",
$row->{qualityScore},"\t",
# $row->{avgConvValue} * $to_dollar_conv_factor,
# $row->{bottomPosition}, $row->{cpm} * $to_dollar_conv_factor,
# $row->{costPerConv} * $to_dollar_conv_factor,
# $row->{cpt} * $to_dollar_conv_factor,
# $row->{budget} * $to_dollar_conv_factor, $row->{defaultConv},
# $row->{defaultConvValue} * $to_dollar_conv_factor,
# $row->{firstPageCpc} * $to_dollar_conv_factor,
$row->{kwDestUrl},"\t", $row->{kwType},
# $row->{leads}, $row->{leadValue} * $to_dollar_conv_factor,
# $row->{maxContentCpc} * $to_dollar_conv_factor,
# $row->{maxCpc} * $to_dollar_conv_factor,
# $row->{maxCpm} * $to_dollar_conv_factor, $row->{pageviews},
# $row->{pageviewValue} * $to_dollar_conv_factor,
# $row->{purchases},
# $row->{purchaseValue} * $to_dollar_conv_factor,
# $row->{topPosition},
# $row->{convValue} * $to_dollar_conv_factor,
# $row->{transactions}, $row->{convVpc},
# $row->{valueCostRatio},
"\n";
}
close(BLFH);
};

if ($@) {
print("Report job failed validation due to the following error: ", $@, "\n");
exit (-1);
}
}

close(TFH);
return 1;
}

Bulk loading into data warehouse staging and perform summarization.

#- Psuedo code
#- 1. Cleanup or perform any needed transformation of data in the application.
#- 2. Concatenate all the campaign keyword performance data to a file(s).
#- 3. Bulk load file into a staging table.
#- 4. Transform and summarize the data through SQL.

Enjoy,
Shiva

Adwords Keyword Performance data pull

If you are handling multiple accounts or a large account at Adwords center managing few tens of thousand of keywords to millions of keywords, you would be interested in tracking the keyword performance over a period of time. You can do so by pulling daily summary report from Google through Adwords API and storing the data in your data mart or data warehouse for further analysis.

Pulling the data through API involves 6 steps:
   1. definedReportJob object – Specifies the type of report you want.
   2. validateReportJob – Validate the report that you would like to get. Returns nothing.
   3. scheduleReportJob – Now schedule the job for above object. This request returns an ID.
   4. getReportJobStatus – Get status of the report. If “Completed” go to next.
   5. getReportDownloadUrl – Retrieve the URL where your report is stored. The returned URL is valid for five minutes. If the URL expires, just call getReportDownloadUrl again. Keep this URL private, since anyone can use it to access your report.
   6. getGzipReportDownloadUrl – Download your report by sending an HTTP GET request.

Following pseudo code and code explains each step execution in more detail.  I use Perl SOAP calls to generate Adwords requests. Also, note the following code works well with API version v13 and Google is transitioning many of its Adwords API to v2009. As of Dec.2009 ReportService was still available only in v13.

Before you get to the first step, build a data structure with job parameters & create a SOAP service –

my $service = SOAP::Lite->service($wsdl)->autotype(0)->readable(1)->proxy($url); 
#- $url = 'https://adwords.google.com/api/adwords/v13/CampaignService' ;
#- $wsdl = $url . '?$wsdl' ;
#- $userAgent = "My company name. Adwords v13 API"

my @headers =
(SOAP::Header->name('email' => $email),
SOAP::Header->name('password' => $pw),
SOAP::Header->name('useragent' => $userAgent),
SOAP::Header->name('developerToken' => $developerToken),
SOAP::Header->name('applicationToken' => $applicationToken) );


my $AWJobParams =
{'selectedReportType' => $selectedReportType,
'name' => $reportName,
'aggregationTypes' => $aggregationTypes,
'startDay' => $startDay,
'endDay' => $endDay,
'selectedColumns' => $selectedColumns,
'crossClient' => $crossClient,
'campaigns' => $campaigns
};

#- Example of selected columns
my $selectedColumns = [ qw (
KeywordId Keyword CampaignId CampaignStatus AdGroupId AdGroupStatus AdGroupMaxCpa AdWordsType Impressions Clicks CTR CPC Conversions ConversionRate Cost AveragePosition QualityScore AverageConversionValue BottomPosition CPM CostPerConverstion CostPerTransaction DailyBudget DefaultCount DefaultValue FirstPageCpc KeywordDestUrlDisplay KeywordStatus KeywordTypeDisplay LeadCount LeadValue MaxContentCPC MaximumCPC MaximumCPM PageViewCount PageViewValue SalesCount SalesValue TopPosition TotalConversionValue Transactions ValuePerClick ValuePerCost CustomerName ExternalCustomerId )];
Job Parameter What it is
selectedReportType You can request 11 types of reports including keyword performance report. For more info see http://code.google.com/apis/adwords/docs/developer/DefinedReportJob.html
reportName Name of report. Provide something that is meaningful with date.
aggregationType Daily, Weekly, Monthly. This works in combination with selectedReportType and selectedColumns. For more details take a look at http://code.google.com/apis/adwords/docs/developer/adwords_api_reports.html.
startDay Date of the report. Typically yesterday for daily pull.
endDay End date of report.
selecteColumns Columns for this report that you are interested in.
crossClient 1 if you are using MCC.
campaigns Array. If set to 1 pull all the data for all campaigns under MCC.

1. Building the report object

my $report_job = SOAP::Data->name('job' => $AWJobParams);
$report_job->attr({'xsi:type' => 'DefinedReportJob', 'xmlns' => 'https://adwords.google.com/api/adwords/v13'});

2. Validating the report

$service->call('validateReportJob' => $report_job, @headers);

3. Scheduling the job

my $job_id = $service->call('scheduleReportJob' => $report_job, @headers)->result();

4. Check for job completion status

# Job completed?
my $status = $service->call('getReportJobStatus' => $job_id, @headers)->result();

while ($status ne 'Completed' and $status ne 'Failed') {
sleep(60);
$status = $service->call('getReportJobStatus' => $job_id, @headers)->result();
}

5. Get URL to download from and 6. Get the report

# Download report.
my $report_url = $service->call('getGzipReportDownloadUrl' => $job_id, @headers)->result();
my $gzipped_report = LWP::Simple::get($report_url);

This gives you crux of downloading the daily keyword performance report and upload the file to a data warehouse staging environment before processing further. For more robust code you need to handle error or exceptions that happen, create output file before loading into database with appropriate field separators (tab, for example), match fields in input file to columns staging table, etc.

MySQL bulk loading:

LOAD DATA  CONCURRENT LOCAL INFILE $input_file_to_load
INTO TABLE $staging_table_name
FIELDS TERMINATED BY '\t'
ESCAPED BY '\'
LINES TERMINATED BY '\n'
(date searchEngine KeywordId Keyword CampaignId CampaignStatus AdGroupId AdGroupStatus AdGroupMaxCpa AdWordsType Impressions Clicks CTR CPC Conversions ConversionRate Cost AveragePosition QualityScore AverageConversionValue BottomPosition CPM CostPerConverstion CostPerTransaction DailyBudget DefaultCount DefaultValue FirstPageCpc KeywordDestUrlDisplay KeywordStatus KeywordTypeDisplay LeadCount LeadValue MaxContentCPC MaximumCPC MaximumCPM PageViewCount PageViewValue SalesCount SalesValue TopPosition TotalConversionValue Transactions ValuePerClick ValuePerCost CustomerName ExternalCustomerId )

Cheers,
Shiva

Google NS Individual Search Box

Today, while searching for “cygwin” keyword I was presented with results that caught my eye.  The first result had its own search box.  See picture below.  Had anyone seen this before?  I liked the feature and pretty handy if you are looking for a specific information from a specific site. I entered “latest version” and hit “Search cygwin.com” that showed search entry “latest version site:cygwin.com” in the next search. Nice.


Other posts that may be of interest:
1. Best Google home page logos
2. Better quicker SEM ROI
3. 99 bottles of beer in different laguages


Cheers,
Shiva

Forecasting SEM ROI with Eureqa

There is a is powerful tool to detect equations and hidden relationships among the data sets.  What makes it irresistible tool is its cost – FREE!.  It is Eureqa.  In this blog I will explain how to find relationship between cost and long term revenue for a keyword campaign in Google Adwords.  This is a simplied version of identifying a relationship at macro level and then drilling down further to identify more details.

The sample I have taken is over two months summarized data at daily level for a campaign.  Since cost and revenue are affected by other factors including the keywords or ad words that comprised a campaign, competitors’ bidding, landing page or page optimizations, etc., it is important to note that the following is to get a quick insight into how cost is related to revenue for a particular campaign.  This is one of important arsenal out of few to get better positive ROI.

The data set is shown below in the application’s tab 1(Enter Data).  This very much looks like Excel spreadsheet!.  First column is cost and second is long term revenue.

Related Posts: Generating quick positive SEM ROI

Numbers above (from 1 to 5) points to different tabs I refer to in the article.  Once the data is loaded into respective columns, name the cost column as x and revenue column y and then move to tab 2 (Smooth Data). You can enable smoothing and in the lower panel you will see y vs x curve.  In tab 3 (Pick Modeling Task), select typical operations like subtract, multiple, divide, power, exponential and logarithm that will be used in creating the formula.  I typically leave out sine, cosine and absolute value.

Also, it is important to note that for some reason the application default formula selection is x = f(y).  But in a typical situation the formula is written as y = f(x).  So if you select x as the independent variable and y as dependent variable, once after the application provides the final formula you can derive y from x.  Or in the above tab 1 (Enter Data) reverse x and y which is much simpler (option 2).  Here I will use option 1.

Click on tab 4 (Start Search) and then on “Start” button.  This will begin the evaluation and churns through lot for formulaes to fit the data and you can see the running performance in the five windows.  As it progresses, running for few minutes, you will see the curve fitting the data in tab 5 (Solution Statistics).  With more dispersed data, you can expect complicated formula to be generated. You can pick any formula that is of interest to you and evaluate.  See below with tab 4 and 5.

The formula here I choose for illustration purpose is
f(y) = 0.31y – 557.42, that is x = 0.31y – 557.42 and so to get y, I would do

I crossed checked the formula in Excel sheet and the summed revenue (over two months) was difference of -3.4%.  And running it again at some other time, I got slightly different formula f(y) = 0.31y – 519.16. With this the above difference reduced to -1.18%. Pretty good.

So, with this tool you have got the formula that you can use to do some forecasting or predictive analysis! And you can use the similar technique to find relationships among other dimensions that are of interest to you.

Cheers,
Shiva M.

Quick ROI for SEM (Adwords) Defined

SEM or Internet paid advertisement or PPC or Adwords or what ever one may call, has been on high growth rate for many years now and has shown that it will continue to grow in the near future with the potential to grab market share from other advertisement channels.  This has implications on all companies that would like to have their products or services advertised.  By quickly responding to the market changes and being nimble, they can effectively spend their marketing dollars with positive ROI.


Though I use Google Adwords in the following explaination, you can replace the same logic for Yahoo and MSN. I assume that you are somewhat familiar with natural search, paid search, clicks, impressions and related; and what they mean.  


Related posts: Quick ROI forecasting technique

Keywords:  
Google or other search engines provide you the service to bid for specific keywords/ phrases.  And at the same time they track those keywords performance including clicks, impressions, and many more.  The big advantage is the availability of all this information through APIs (Application Programming Interfaces) and  I suggest you use daily summary to help track them.  The APIs calls are simple to setup and you can get it going with mainly one report called “Placement/ Keyword Performance Report”.
Google takes less than 1/2 an hour to provide the 80K to 100K keyword data. On making the API call to get the report, make the application check every minutes or so and once ready pull the report.  If you are pulling the reports from more than one search engine, it is easy to have separate instances of application and then merge everything in database or data warehouse.  This way you can have better concurrency.
Application:
This is the work horse that pulls the data from search engines.  The application can be single threaded to handle small load or if it is a large load (say, more than 100K keywords), you can use multi-threaded programming techniques.   Google’s v13 or v2009 API calls provide all the important data elements that one need, but in case of Yahoo you need to make two separate calls if you need max bid and keyword type.  But for most business uses you can do without them thus simplifying the process. The complexity araises not because you need to make 2nd call, but Yahoo returns different matching data and you need to take care that you match them to the correct bidding units.  I will provide more information on this in other articles.
Psudeocode or logic:
   Authenticate with Adwords (with your token)
   Make a SOAP request (through Perl, Java, PHP, etc.)
   Dump all the data to a file (Google has more than 50 columns).  Also, do any changes or cleanup, if needed.

Data Warehouse:
Once the data is dumped into file, have another process or same process bulk load the data into staging area of the data warehouse.  Medium or large companies typically have their own internal process setup to track the conversion and revenue.  So, by combining the spend/cost data from Google with your revenue, you can measure the ROI very effectively to the lowest level of granularity – keyword and its type.  You can also slice and dice the data along time, accounts, campaigns, ad groups, geo-location, etc.
Psuedocode or logic:
     Validate staging data
     Join revenue data with cost/spend data
     Summarize with time, bid unit, location, search engines, sites and other dimensions
Bidding:
Once you have the metrics of interest to you (either positive ROI, rate of improvement in clicks/ impressions, etc.), you can create reports for only those keywords/ campaigns that of interest to you.  Now you are ready to increase or decrease the bid amount on the keyword.  Also, note in this step, you should optimize your pages to the keywords so that your quality score is improved.  Just increasing the bid without the well optimized page is throwing the money away.
Psuedocode or logic:
   If daily change to number of keywords is not high, you can do bid changes manually or through excel.
   Else build automated bid framework that helps you
Cheers,

Shiva