Your genome data through API

Recently 23andme.com reduced the price of its DNA test kit to $99, so you can now get your hereditary and other related information for less than $100!  That is a big drop from where it started, $999, and later $299.  I know little about genomes/genomics, but it is a very interesting, dynamic, fast-growing field with the potential to change the way we view health (one is now empowered to know whether they are at risk of potential congenital diseases, now or in the future) or one's ancestry!

My interest was in the data you can play with.  With the API in place you can pull your own or demo data.  To do that I first needed to set up a data pull through the API, and the following is a quick summary of the setup I had on my Mac.

Some important links:
23andme API
OAuth Introduction
OAuth flow

After creating a developer login account, you can set up the application/client with the given credentials – client_id and client_secret.  See below.

For quick testing and pulls, I used the Dancer web framework on my local Mac.  23andMe uses OAuth2 with three-legged authentication.  As a first step, get the 'code' by creating a simple page with a link to their login page.

For example, the link below takes the user to authorize your client; once the user successfully logs in, the authentication dance happens between the client and the server.

"<a href='https://api.23andme.com/authorize/?redirect_uri=http://localhost:5000/receive_code/&response_type=code&client_id=YOUR_CLIENT_ID&scope=basic rs123'>Connect with 23andMe. [pid: $$]</a>";

Note: the pid above is a process id, so I know when I killed and restarted Dancer.

User clicks on the link

and then logs in to authorize your client to access her resources.  This is the point where the 'code' is received and exchanged for an access_token.
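The exchange itself is a plain POST of the one-time code, together with the client credentials, back to the token endpoint. Below is a minimal sketch of building that request; the token URL and the exact parameter set are my assumptions from the generic OAuth2 flow, so check the 23andMe API docs for the precise contract.

```python
# Sketch of the OAuth2 code-for-token exchange.
# TOKEN_URL and the parameter names are assumptions from standard OAuth2.
from urllib.parse import urlencode

TOKEN_URL = "https://api.23andme.com/token/"  # assumed endpoint

def build_token_request(code, client_id, client_secret,
                        redirect_uri="http://localhost:5000/receive_code/"):
    """Return the URL-encoded POST body that trades 'code' for an access_token."""
    return urlencode({
        "client_id": client_id,
        "client_secret": client_secret,
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": redirect_uri,
        "scope": "basic rs123",
    })

# POSTing build_token_request(...) to TOKEN_URL should return JSON carrying
# the access_token (and a refresh_token for later renewals).
```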

After a successful OAuth dance you can call any of the endpoints ( https://api.23andme.com/docs/reference/ ).  Here is a demo user call ( https://api.23andme.com/1/demo/user/ ).
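Each endpoint call is then just an HTTP GET carrying the access_token in a bearer Authorization header. A small standard-library sketch (the helper name is mine; the demo endpoint path is from the reference docs linked above):

```python
# Build an authenticated request for a 23andMe endpoint; the actual
# network call is left to the caller.
import urllib.request

API_BASE = "https://api.23andme.com"

def api_request(endpoint, access_token):
    """Return a urllib Request carrying the OAuth2 bearer token."""
    return urllib.request.Request(
        API_BASE + endpoint,
        headers={"Authorization": "Bearer %s" % access_token})

req = api_request("/1/demo/user/", "YOUR_ACCESS_TOKEN")
# urllib.request.urlopen(req) would then fetch the JSON user resource.
```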

Download Google keywords through SUDS

There are situations where you may want to download all the keywords (criteria) uploaded to Google Adwords.  If you have a substantial keyword basket (say, more than a million), then either downloading manually or checking through the tools Google provides becomes prohibitive.  Also, Google's daily keyword performance report provides metrics only for the keywords that have had at least one impression.

The following application is a simple tool that downloads all keywords and dumps the data to different files.  I have removed error-handling code and hardcoded many values to simplify it; you can modify this to your requirements, and don't forget to add your account and API details.  I have been able to download a few million keywords using multiple processes.

Environment:
Ubuntu Karmic running Python 2.6.4, Adwords API version 13 (v13), and the Python SUDS SOAP module.

#-- Python application to pull all your campaign keywords.
#-- Uses suds SOAP module to connect to Google Services and writes to files (one for each client email).
#-- Get list of all client emails (typically multiple accounts created to handle large set of keywords)
#-- For each account get its campaigns
#-- For each campaign get its adgroups
#-- For each adgroup get all keywords
#-- print ids and keyword attributes

import sys
import time
import re

import traceback as tb
import suds.metrics as metrics

from suds import WebFault
from suds.client import Client

#-- A class to handle user details...
#--------------------------------------
class GoogleSoapUser:
    def __init__(self):
        self.user = {'email'            : 'your_developer_api_email_addr',
                     'password'         : 'your_developer_password',
                     'useragent'        : 'Google Adwords KW pull V0.1',
                     'developerToken'   : 'your_developer_token',
                     'applicationToken' : 'your_application_token'
                    }
        #-- Note: connecting to production (adwords); for sandbox testing replace it.
        self.wsdl = {'campaign' : 'https://adwords.google.com/api/adwords/v13/CampaignService?wsdl',
                     'adGroup'  : 'https://adwords.google.com/api/adwords/v13/AdGroupService?wsdl',
                     'keyword'  : 'https://adwords.google.com/api/adwords/v13/CriterionService?wsdl'
                    }

    def getUser(self):
        return self.user

    def getWSDL(self, k=None):
        return self.wsdl[k]

    def getClientEmails(self):  #-- Client emails to pull data for
        emails = ['your_campaign_ac_email_addr_1',
                  'your_campaign_ac_email_addr_2',
                  'your_campaign_ac_email_addr_3'
                 ]
        return emails

    def setHeaderEmail(self, clientEmail=None):
        self.user['clientEmail'] = clientEmail


#-- Main program
#------------------------------
if __name__ == '__main__':
    gUser = GoogleSoapUser()

    #-- For each client email...
    for cEmail in gUser.getClientEmails():
        #-- Open an output file to dump to...
        print "Checking account: ", cEmail, "\n"
        file = 'google_' + re.split('@', cEmail)[0] + '.txt'
        try:
            f = open('/tmp/' + file, 'w')
        except IOError as err:
            print "Unable to open file", file, format(err)

        #-- Set the SOAP header with the client email
        gUser.setHeaderEmail(cEmail)

        #-- Get all campaigns...
        campClient = Client(gUser.getWSDL('campaign'))
        campClient.set_options(soapheaders=gUser.getUser())
        campaigns = campClient.service.getAllAdWordsCampaigns(0)  #-- Pass a dummy number...

        #-- For each campaign...
        for campaign in campaigns:
            campaignID = campaign['id']
            print "Checking for campaign: ", campaignID, "\n"
            adClient = Client(gUser.getWSDL('adGroup'))
            adClient.set_options(soapheaders=gUser.getUser())
            adGroups = adClient.service.getAllAdGroups(campaignID)

            #-- For each AdGroup...
            for adGroup in adGroups:
                adGroupID = adGroup['id']
                print "Checking for adGroup: ", adGroupID, "\n"
                kwClient = Client(gUser.getWSDL('keyword'))
                kwClient.set_options(soapheaders=gUser.getUser())
                keywords = kwClient.service.getAllCriteria(adGroupID)

                #-- For each keyword...
                for kw in keywords:
                    f.write(str(campaignID) + '\t' +
                            str(adGroupID) + '\t' +
                            str(kw['id']) + '\t' +
                            kw['type'] + '\t' +
                            kw['text'] + '\t' +
                            str(kw['qualityScore']) + '\t' +
                            kw['destinationUrl'] + '\n'
                           )

        #-- Close this account's output file...
        f.close()
        print "Data pull complete for account ", cEmail, "\n"

Cheers,
Shiva

SEM – Google Adwords API Data Retriever

If you are running many large campaigns or thousands of campaigns and want to track keyword performance in Google Adwords, the following application helps retrieve the data at regular intervals (daily, weekly, monthly, etc.). Using your data warehouse and reporting infrastructure you can then slice and dice the data based on your business requirements.

The application is a multi-threaded implementation of pulling data through the API. Each thread gets a chunk of campaign ids to pull the keyword performance report for.  The code included here is skeletal: you need to add error handling, code to suit your database schema, and some ETL.  In my environment I used MySQL 5.0, Linux (2.6) and Windows, Perl 5.10.1 and Google Adwords API v13.  You can also use Google's client libraries, available in many languages including Java, Python, .NET, Perl and PHP.


Other posts that may be of interest:
1. Google Analytics Data Retriever
2. Bulk data loading using Talend
3. Google Keyword Performance data pull


Campaign Class:

package Campaign;

#-------------------------------------
# Campaign Package
#-------------------------------------

use strict;
use DBI;

sub new {
    my $self = {};
    $self->{campIds} = [];  #-- Master campaign ID array

    bless($self);

    $self->_populateCampaignIds();
    return $self;
}

sub _populateCampaignIds {
    my $self = shift;

    #--------------------------
    #- Connect to the database to get the list of campaign IDs.
    #--------------------------
    #- my $dbh = DBI->connect($data_source, $user_name, $pw);
    #- my $sth = $dbh->prepare(
    #-     "SELECT campaign_id FROM campaign
    #-      WHERE search_engine = 'google'
    #-        AND status = 'active'"
    #- );
    #- $sth->execute() or die "Unable to execute a query.\n";
    #- while ( my @id = $sth->fetchrow_array ) {
    #-     push (@{$self->{campIds}}, $id[0]);
    #- }
    #-
    #- $dbh->disconnect;

    #---- For the demo, I have hard-coded some IDs.
    push( @{$self->{campIds}}, '1111');
    push( @{$self->{campIds}}, '2222');
    push( @{$self->{campIds}}, '3333');

    return 1;
}

#----------------------------
#- Returns the master list of campaign IDs.
#----------------------------
sub getCampaignIds {
    my $self = shift;

    return @{ $self->{campIds} };
}

#----------------------------
#- Returns a chunk of campaign IDs from the master array.
#----------------------------
sub getCampaignIdChunk {
    my $self   = shift;
    my $chunks = shift;
    my $index  = shift;

    if (!defined $index or !defined $chunks) {
        return;
    }

    my $chunkSize = int (scalar(@{$self->{campIds}}) / $chunks);
    if ($chunkSize < 1) {
        print "Error: no campaigns to process for thread $index\n";
        return;
    }
    my $lIx = $chunkSize * ($index - 1);  #-- low index
    my $hIx = $chunkSize * ($index);      #-- high index

    my @campIds = ();
    if ($index < $chunks) {
        @campIds = splice (@{$self->{campIds}}, $lIx, $chunkSize);
    } else {
        #- print "Getting LAST chunk for $index from lower $lIx element and a chunk size of $chunkSize\n";
        @campIds = splice (@{$self->{campIds}}, $lIx);
    }

    return @campIds;
}

1;
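The chunk arithmetic in getCampaignIdChunk is worth spelling out: each of the N threads takes floor(total/N) ids, and the last thread also absorbs the remainder. The same logic in Python, for illustration (the function name is mine):

```python
def campaign_id_chunk(camp_ids, chunks, index):
    """Return the slice of camp_ids owned by 1-based thread `index`,
    mirroring Campaign::getCampaignIdChunk: equal-sized chunks, with
    the last thread taking the leftover ids as well."""
    chunk_size = len(camp_ids) // chunks
    if chunk_size < 1:
        return []                      # nothing to hand out to this thread
    lo = chunk_size * (index - 1)      # low index of this thread's slice
    if index < chunks:
        return camp_ids[lo:lo + chunk_size]
    return camp_ids[lo:]               # last thread takes the rest
```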

Prototype application:

#-------------------------------------------------
# Adwords Data Retriever
# This is a skeleton application to pull Adwords Keyword Performance Report data
# from Google through the Adwords API.
# Uses multi-threading for simultaneous pull of many campaigns.
# Campaign ids in the application are hard-coded with dummy values; replace them by connecting
# to your database and fetching actual campaign ids (code shown).
# Also, for authentication use your user name, password, developer & application tokens.
# Valid for Google Adwords v13.
# Author: Shiva M.
#-------------------------------------------------

#- use lib "Add directory where you installed Campaign.pm";
use strict;
use Benchmark;
use Getopt::Std;
use Data::Dumper;

use XML::Simple;
use LWP::Simple;
use Crypt::SSLeay;
use SOAP::Lite;
use IO::Uncompress::Gunzip qw();

use threads;
use threads::shared;

use Campaign;

my (%opts, @worker);
my $NUM_THREADS = 3;  #-- Modify this to suit your environment, or better, use the Config module with an ini file for application configuration.
my $GOOGLE_API  = 'https://adwords.google.com/api/adwords/v13';

#-------------------------------------------------
# Main program.
#-------------------------------------------------
getOpts();

if ($opts{'s'} == 1) {
    for (my $i = 1; $i < ($NUM_THREADS + 1); $i++) {
        print "MASTER: Starting thread " . $i, "\n";
        $worker[$i] = threads->new(\&pullCampaignData, $i);
        sleep 5;
    }

    while (anyRunningThread()) {
        for (my $ii = 1; $ii < scalar(@worker); $ii++) {
            my $thr = $worker[$ii];
            if ($thr->is_joinable) {
                my $ret = eval { $thr->join() };
                print "Thread $ii returned: $ret\n";
            }
        }
        sleep(10);
    }
}

exit (0);

#-------------------------------------------------
# Subroutines.
#-------------------------------------------------
sub getOpts {
    if (!getopts('s:d:', \%opts)) {
        print "Invalid option. Exiting\n";
        exit (-1);
    }

    return 1;
}

#-----------------------------
#- Any thread still running? Returns Yes/No (1/0)
#-----------------------------
sub anyRunningThread {
    my $runThrs = threads->list(threads::running);
    print "Still $runThrs threads are running.\n";
    if ($runThrs) { return 1; }

    return 0;
}

#---------------------------
#- Each thread to pull keyword performance reports for a list of campaigns
#---------------------------
sub pullCampaignData {
    my ($tId) = @_;
    my $thr = $worker[$tId];

    my $to_dollar_conv_factor = 1 / (1000 * 1000);  #-- Currency is in micros: $1 == 1000 * 1000 micros.

    my $log = "thread_$tId.log";
    open (TFH, ">$log") or die "Unable to open file $log\n";  #-- Log file for the thread.

    #---------------------------------
    #- Get this thread's share of the load - a list of campaign IDs.
    #---------------------------------
    my $campaign = Campaign->new();
    my @tCampIds = $campaign->getCampaignIdChunk($NUM_THREADS, $tId);

    print TFH "\npullCampaignData \$tId: $tId: campaignIDs:", map {"$_ "} @tCampIds;

    #-- Fill in your own credentials and the report date below.
    my ($email, $pw, $ua_name, $developer_token, $application_token, $yesterday_date);

    my @headers = (SOAP::Header->name('email'            => $email),
                   SOAP::Header->name('password'         => $pw),
                   SOAP::Header->name('useragent'        => $ua_name),
                   SOAP::Header->name('developerToken'   => $developer_token),
                   SOAP::Header->name('applicationToken' => $application_token)
                  );

    my $url  = sprintf('https://%s.google.com/api/adwords/v13/ReportService', 'adwords');
    my $wsdl = $url . '?wsdl';
    my $service = SOAP::Lite->service($wsdl)->autotype(0)->readable(1)->proxy($url);
    print TFH "\ndebug:service=$service" if $opts{d};

    $service->on_fault(
        sub {
            my $response = $_[1];
            die('The following SOAP fault occurred:', "\n",
                'faultcode: ',   $response->faultcode(),   "\n",
                'faultstring: ', $response->faultstring(), "\n");
        }
    );
    SOAP::Lite->import(+trace => 'debug') if $opts{d};

    #---------------------------
    # Make a SOAP call.
    # Get one campaign's data at a time. Generate a file to bulk load into the database.
    #---------------------------

    foreach my $cId (@tCampIds) {

        my $AWJobParams = {
            'selectedReportType' => 'Keyword',
            'name'               => 'AW Keyword Perf Daily Report',
            'aggregationTypes'   => ['Daily'],
            'startDay'           => $yesterday_date,
            'endDay'             => $yesterday_date,
            'selectedColumns'    => [ qw (
                KeywordId Keyword CampaignId CampaignStatus
                AdGroupId AdGroupStatus AdGroupMaxCpa AdWordsType
                Impressions Clicks CTR CPC Conversions ConversionRate Cost
                AveragePosition QualityScore AverageConversionValue BottomPosition
                CPM CostPerConverstion CostPerTransaction DailyBudget DefaultCount
                DefaultValue FirstPageCpc KeywordDestUrlDisplay KeywordStatus
                KeywordTypeDisplay LeadCount LeadValue MaxContentCPC MaximumCPC
                MaximumCPM PageViewCount PageViewValue SalesCount SalesValue
                TopPosition TotalConversionValue Transactions ValuePerClick
                ValuePerCost CustomerName ExternalCustomerId )
            ],
            'crossClient'        => 1,
            'campaigns'          => [$cId]
        };


        #------------------------------
        # Step 1. Define a job.
        #------------------------------
        my $report_job = SOAP::Data->name('job' => $AWJobParams);
        $report_job->attr({'xsi:type' => 'DefinedReportJob', 'xmlns' => "$GOOGLE_API"});

        #------------------------------
        # Output XML file after it is unzipped.
        #------------------------------
        my $output = "${cId}_kw_perf.xml";

        #-------------------------------
        # Catch any exception.
        #-------------------------------
        eval {
            #------------------------------
            # Step 2. Validate the report job.
            #------------------------------
            $service->call('validateReportJob' => $report_job, @headers);
            print TFH "debug: about to schedule report \n";

            #------------------------------
            # Step 3. Schedule the report.
            #------------------------------
            my $job_id = $service->call('scheduleReportJob' => $report_job, @headers)->result();
            $job_id = SOAP::Data->name('reportJobId' => $job_id);

            #------------------------------
            # Step 4. Get the report status. Wait until it completes or fails.
            #------------------------------
            my $status = $service->call('getReportJobStatus' => $job_id, @headers)->result();
            while ($status ne 'Completed' and $status ne 'Failed') {
                print TFH "Report job status is $status\n";
                sleep(30);
                $status = $service->call('getReportJobStatus' => $job_id, @headers)->result();
            }
            if ($status eq 'Failed') {
                die("Report job generation failed.\n");
            }

            #------------------------------
            # Step 5. Get the report URL to download the file from.
            #------------------------------
            my $report_url = $service->call('getGzipReportDownloadUrl' => $job_id, @headers)->result();
            print TFH "Report is available at $report_url\n" if $opts{d};

            #------------------------------
            # Step 6. Get the data file (report).
            #------------------------------
            my $gzip_report_data = LWP::Simple::get($report_url);

            #------------------------------
            # Step 6a. Save it to a local file.
            #------------------------------
            my $localFile = "${cId}_kw_perf.xml.gz";
            open(LFH, ">$localFile") or die "Could not open $localFile: $!\n";
            print LFH $gzip_report_data . "\n";
            close LFH;

            #------------------------------
            # Step 7. Unzip the file and build an XML hash.
            #------------------------------
            IO::Uncompress::Gunzip::gunzip(\$gzip_report_data => $output);

            my $xml_hash = XMLin($output);
            print TFH Dumper($xml_hash) if $opts{d};

            #------------------------------
            # Step 8. Create the database bulk-load file.
            #------------------------------
            open (BLFH, ">${cId}_bulk_load_input.txt") or die "Unable to create bulk load file: $!\n";

            #-- Uncomment the fields that are of interest to you.
            foreach my $row (@{$xml_hash->{table}->{rows}->{row}}) {
                print BLFH
                    $row->{keywordid}, "\t", $row->{keyword}, "\t",
                    $row->{campaignid}, "\t",
                    # $row->{campaign}, "\t",
                    $row->{campStatus}, "\t", $row->{adgroupid}, "\t",
                    # $row->{adgroup}, "\t",
                    # $row->{agstatus}, "\t",
                    # $row->{agMaxCpa} * $to_dollar_conv_factor, "\t",
                    # $row->{adwordsType}, $row->{keywordid}, $row->{keyword}, $row->{kwStatus},
                    # $row->{customerid}, $row->{acctname}, $row->{kwDestUrl},
                    $row->{imps}, "\t", $row->{clicks}, "\t", $row->{ctr}, "\t",
                    # $row->{cpc} * $to_dollar_conv_factor, $row->{conv},
                    # $row->{convRate},
                    $row->{cost} * $to_dollar_conv_factor, "\t",
                    $row->{pos}, "\t",
                    $row->{qualityScore}, "\t",
                    # $row->{avgConvValue} * $to_dollar_conv_factor,
                    # $row->{bottomPosition}, $row->{cpm} * $to_dollar_conv_factor,
                    # $row->{costPerConv} * $to_dollar_conv_factor,
                    # $row->{cpt} * $to_dollar_conv_factor,
                    # $row->{budget} * $to_dollar_conv_factor, $row->{defaultConv},
                    # $row->{defaultConvValue} * $to_dollar_conv_factor,
                    # $row->{firstPageCpc} * $to_dollar_conv_factor,
                    $row->{kwDestUrl}, "\t", $row->{kwType},
                    # $row->{leads}, $row->{leadValue} * $to_dollar_conv_factor,
                    # $row->{maxContentCpc} * $to_dollar_conv_factor,
                    # $row->{maxCpc} * $to_dollar_conv_factor,
                    # $row->{maxCpm} * $to_dollar_conv_factor, $row->{pageviews},
                    # $row->{pageviewValue} * $to_dollar_conv_factor,
                    # $row->{purchases},
                    # $row->{purchaseValue} * $to_dollar_conv_factor,
                    # $row->{topPosition},
                    # $row->{convValue} * $to_dollar_conv_factor,
                    # $row->{transactions}, $row->{convVpc},
                    # $row->{valueCostRatio},
                    "\n";
            }
            close(BLFH);
        };

        if ($@) {
            print("Report job failed due to the following error: ", $@, "\n");
            exit (-1);
        }
    }

    close(TFH);
    return 1;
}

Bulk load the data into data warehouse staging and perform summarization.

#- Pseudo code
#- 1. Cleanup or perform any needed transformation of data in the application.
#- 2. Concatenate all the campaign keyword performance data to a file(s).
#- 3. Bulk load file into a staging table.
#- 4. Transform and summarize the data through SQL.
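As a concrete illustration of steps 2 and 3, the per-campaign files can be concatenated and the bulk-load statement generated in a few lines. This is a Python sketch under my own assumptions: the file-name pattern matches the `${cId}_bulk_load_input.txt` files the application writes, and the staging table name is hypothetical.

```python
# Concatenate per-campaign bulk-load files and build the LOAD DATA statement.
import glob

def concat_files(pattern, combined_path):
    """Concatenate all files matching `pattern` into one combined file."""
    with open(combined_path, "w") as out:
        for path in sorted(glob.glob(pattern)):
            with open(path) as part:
                out.write(part.read())

def load_data_sql(infile, table):
    """Build a MySQL bulk-load statement for the combined file."""
    return ("LOAD DATA CONCURRENT LOCAL INFILE '%s' "
            "INTO TABLE %s FIELDS TERMINATED BY '\\t' "
            "LINES TERMINATED BY '\\n'" % (infile, table))
```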

Enjoy,
Shiva

Adwords Keyword Performance data pull

If you are handling multiple accounts or a large account in the Adwords center, managing a few tens of thousands to millions of keywords, you will be interested in tracking keyword performance over a period of time. You can do so by pulling a daily summary report from Google through the Adwords API and storing the data in your data mart or data warehouse for further analysis.

Pulling the data through the API involves six steps:
   1. definedReportJob object – Specifies the type of report you want.
   2. validateReportJob – Validate the report that you would like to get. Returns nothing.
   3. scheduleReportJob – Now schedule the job for above object. This request returns an ID.
   4. getReportJobStatus – Get status of the report. If “Completed” go to next.
   5. getGzipReportDownloadUrl – Retrieve the URL where your gzipped report is stored. The returned URL is valid for five minutes. If the URL expires, just call getGzipReportDownloadUrl again. Keep this URL private, since anyone can use it to access your report.
   6. Download the report – send an HTTP GET request (for example with LWP::Simple) to that URL.

The following pseudo code and code explain each step in more detail.  I use Perl SOAP calls to generate the Adwords requests. Also note that the following code works with API version v13; Google is transitioning many of its Adwords APIs to v2009, but as of Dec. 2009 ReportService was still available only in v13.

Before you get to the first step, build a data structure with job parameters & create a SOAP service –

my $service = SOAP::Lite->service($wsdl)->autotype(0)->readable(1)->proxy($url);
#- $url  = 'https://adwords.google.com/api/adwords/v13/ReportService';
#- $wsdl = $url . '?wsdl';
#- $userAgent = "My company name. Adwords v13 API";

my @headers = (SOAP::Header->name('email'            => $email),
               SOAP::Header->name('password'         => $pw),
               SOAP::Header->name('useragent'        => $userAgent),
               SOAP::Header->name('developerToken'   => $developerToken),
               SOAP::Header->name('applicationToken' => $applicationToken)
              );

my $AWJobParams = {
    'selectedReportType' => $selectedReportType,
    'name'               => $reportName,
    'aggregationTypes'   => $aggregationTypes,
    'startDay'           => $startDay,
    'endDay'             => $endDay,
    'selectedColumns'    => $selectedColumns,
    'crossClient'        => $crossClient,
    'campaigns'          => $campaigns
};

#- Example of selected columns
my $selectedColumns = [ qw (
    KeywordId Keyword CampaignId CampaignStatus AdGroupId AdGroupStatus
    AdGroupMaxCpa AdWordsType Impressions Clicks CTR CPC Conversions
    ConversionRate Cost AveragePosition QualityScore AverageConversionValue
    BottomPosition CPM CostPerConverstion CostPerTransaction DailyBudget
    DefaultCount DefaultValue FirstPageCpc KeywordDestUrlDisplay KeywordStatus
    KeywordTypeDisplay LeadCount LeadValue MaxContentCPC MaximumCPC MaximumCPM
    PageViewCount PageViewValue SalesCount SalesValue TopPosition
    TotalConversionValue Transactions ValuePerClick ValuePerCost
    CustomerName ExternalCustomerId ) ];
Job parameter – what it is:
selectedReportType – You can request 11 types of reports, including the keyword performance report. For more info see http://code.google.com/apis/adwords/docs/developer/DefinedReportJob.html
name – Name of the report. Provide something meaningful, with a date.
aggregationTypes – Daily, Weekly or Monthly. Works in combination with selectedReportType and selectedColumns. For more details take a look at http://code.google.com/apis/adwords/docs/developer/adwords_api_reports.html
startDay – Start date of the report. Typically yesterday for a daily pull.
endDay – End date of the report.
selectedColumns – The columns you are interested in for this report.
crossClient – 1 if you are using an MCC account.
campaigns – Array of campaign ids. If set to 1, pulls the data for all campaigns under the MCC.

1. Building the report object

my $report_job = SOAP::Data->name('job' => $AWJobParams);
$report_job->attr({'xsi:type' => 'DefinedReportJob', 'xmlns' => 'https://adwords.google.com/api/adwords/v13'});

2. Validating the report

$service->call('validateReportJob' => $report_job, @headers);

3. Scheduling the job

my $job_id = $service->call('scheduleReportJob' => $report_job, @headers)->result();
$job_id = SOAP::Data->name('reportJobId' => $job_id);

4. Check for job completion status

# Has the job completed?
my $status = $service->call('getReportJobStatus' => $job_id, @headers)->result();

while ($status ne 'Completed' and $status ne 'Failed') {
    sleep(60);
    $status = $service->call('getReportJobStatus' => $job_id, @headers)->result();
}

5. Get the URL to download from, and 6. get the report

# Download report.
my $report_url = $service->call('getGzipReportDownloadUrl' => $job_id, @headers)->result();
my $gzipped_report = LWP::Simple::get($report_url);

This gives you the crux of downloading the daily keyword performance report and uploading the file to a data warehouse staging environment before processing it further. For more robust code you need to handle errors and exceptions, create the output file with appropriate field separators (tab, for example) before loading it into the database, match the fields in the input file to the columns of the staging table, etc.

MySQL bulk loading:

LOAD DATA CONCURRENT LOCAL INFILE $input_file_to_load
INTO TABLE $staging_table_name
FIELDS TERMINATED BY '\t'
ESCAPED BY '\\'
LINES TERMINATED BY '\n'
(date, searchEngine, KeywordId, Keyword, CampaignId, CampaignStatus, AdGroupId, AdGroupStatus,
 AdGroupMaxCpa, AdWordsType, Impressions, Clicks, CTR, CPC, Conversions, ConversionRate, Cost,
 AveragePosition, QualityScore, AverageConversionValue, BottomPosition, CPM, CostPerConverstion,
 CostPerTransaction, DailyBudget, DefaultCount, DefaultValue, FirstPageCpc, KeywordDestUrlDisplay,
 KeywordStatus, KeywordTypeDisplay, LeadCount, LeadValue, MaxContentCPC, MaximumCPC, MaximumCPM,
 PageViewCount, PageViewValue, SalesCount, SalesValue, TopPosition, TotalConversionValue,
 Transactions, ValuePerClick, ValuePerCost, CustomerName, ExternalCustomerId)

Cheers,
Shiva