Your genome data through API

Recently 23andme.com reduced their prices of DNA test kit to $99 and now you get your hereditary and other related information for price less than $100!  That is a big drop from where it started $999 and then $299.  I know little about genome/ genomics but it is a very interesting, dynamic, fast growing field which has potential to change the way we view health (one now is empowered to know whether they are at any risk of potential congenital diseases now or in the future ) or one’s ancestry!

My interest was in the data that you can play with.  With the API in place you can pull your or demo data.  To do that first I needed to setup data pull through API and following is quick summary of setup I had on my mac.

Some important links:
23andme API
OAuth Introduction
OAuth flow

After creating a developer login account  you can set up the application/client with given credentials – client_id and client_secret.  See below.

For quick testing and pull, I used Dancer web framework on local mac and 23andme uses OAuth2 with 3 legged authentication.  As a first step get the ‘code’ by creating a simple page with link to their login page.

For exmaple link below takes the user to authorize your client and once successfully user logins the authentication dance happens between the client and server.

"a api.23andme.com="" authorize="" href="http://www.blogger.com/" https:="" redirect_uri="http://localhost:5000/receive_code/&response_type=code&client_id=YOUR_CLIENT_ID&scope=basic" rs123="">"Connect with 23andMe. [pid: $$]";

Note: pid above is a process id for me to know when I killed and restarted the Dancer.

User clicks on the link

and then login to authorize your client to access to her resources.  This is the point where ‘code’ is received and exchanged for access_token.

After successful OAuth dance now you can call any of end-points ( https://api.23andme.com/docs/reference/ ).  Here is demo user call ( https://api.23andme.com/1/demo/user/ )

Recursion defined

http://cdn.jpillora.com/js/jp-prettify.js Recursion, see Recursion.  🙂

Something defined in terms itself.  Or sometimes CS scientists or programmers making point through

GNU - "GNU's Not Unix"
YAML - "YAML Ain't Markup Language"

Or beautiful Sierpinski Traingles

When a function call’s itself some interesting things happen behind the scene like holding onto the variables which later used when computer execution unwinds the stack.  In a typical example of recursion in solving a factorial, one may write

#!/usr/bin/env perl

use strict;
sub factorial {
    my $v = shift;
    return 1 if $v == 1;
    return $v * factorial($v - 1);
}
factorial(5);

When a first call is made to factorial(5), the execution jumps to factorial function (subroutine) and gets to last line, where while evaluating encounters another function call to factorial ($v -1) which then again makes a call to function or subroutine. This pushes a new call frame on to stack (with args).  If a function returns it is pop-ed out of the stack and done (lost).

Few things are working together with call stack, heap, garbage collector (which removes any memory of any variable or func obj that doesn’t have reference count 1 or more), and execution system.

Now to see more on recursion you can try the following

  1 #!/usr/bin/env  perl
  2 $! = 1;
  3 use strict;
  4 use IO::Handle;
  5 use Carp qw(cluck);
  6
  7 STDOUT->autoflush(1);      # Flush output immediately
  8 STDERR->autoflush(1);
  9
 10 sub factorial {
 11     my $v = shift;
 12  
 13     dummy_func();             # Sub that returns immediately printing call stack
 14     return 1 if $v == 1;
 15     print "Variable v value: $v and it's address:", \$v,
                     "\nCurrent sub factorial addr:", \&factorial, "\n","-"x40;
 16     return $v * factorial($v - 1);    # Builds on call for each func call
 17 }
 18  
 19 sub dummy_func {
 20     cluck;
 21 }
 22
 23 factorial(5);

Resulting output:

  1     main::dummy_func() called at ./t_recursion.pl line 13
  2     main::factorial(5) called at ./t_recursion.pl line 23
  3 Variable v value: 5 and its address:SCALAR(0x7ff6240546a0)
  4 Current sub factorial addr:CODE(0x7ff62402f2c0)
  5 ----------------------------------------
  6     main::dummy_func() called at ./t_recursion.pl line 13
  7     main::factorial(4) called at ./t_recursion.pl line 16
  8     main::factorial(5) called at ./t_recursion.pl line 23
  9 Variable v value: 4 and its address:SCALAR(0x7ff6240610e8)
 10 Current sub factorial addr:CODE(0x7ff62402f2c0)
 11 ----------------------------------------
 12     main::dummy_func() called at ./t_recursion.pl line 13
 13     main::factorial(3) called at ./t_recursion.pl line 16
 14     main::factorial(4) called at ./t_recursion.pl line 16
 15     main::factorial(5) called at ./t_recursion.pl line 23
 16 Variable v value: 3 and its address:SCALAR(0x7ff6240612f8)
 17 Current sub factorial addr:CODE(0x7ff62402f2c0)
 18 ----------------------------------------
 19     main::dummy_func() called at ./t_recursion.pl line 13
 20     main::factorial(2) called at ./t_recursion.pl line 16
 21     main::factorial(3) called at ./t_recursion.pl line 16
 22     main::factorial(4) called at ./t_recursion.pl line 16
 23     main::factorial(5) called at ./t_recursion.pl line 23
 24 Variable v value: 2 and its address:SCALAR(0x7ff624061538)
 25 Current sub factorial addr:CODE(0x7ff62402f2c0)
 26 ----------------------------------------
 27     main::dummy_func() called at ./t_recursion.pl line 13
 28     main::factorial(1) called at ./t_recursion.pl line 16
 29     main::factorial(2) called at ./t_recursion.pl line 16
 30     main::factorial(3) called at ./t_recursion.pl line 16
 31     main::factorial(4) called at ./t_recursion.pl line 16
 32     main::factorial(5) called at ./t_recursion.pl line 23

When recursion script is kicked-off, it pushes factorial(5) first frame on to the call stack (line 2 above) which calls another dummy_func which then goes on to the stack (line 1).   Hence when cluck is called in dummy_func there are two calls on the stack along with any arguments passed.

Then dummy_call returns and is pop-ed from the stack.  Program moves to line 15 (script above) and evaluates to false.  Then prints lines 3&4 output ($v and its location, factorial sub location).

Script line 16 calls factorial which pushes the new function call on to stack and at the point the value of $v is 5.  The function and this variable are in same scope and on stack.  So later when this function returns is multiplied with $v (value 5).

When factorial is called 2nd time (but first time at line 16 and pushed onto call stack) $v is reduced by 1 ($v -1) which is then copied and execution starts at top of this subroutine again.  Remember copy of definition of function always the same at some location (CODE(0x7ff62402f2c0)) in program memory.

This execution then calls dummy_func which spits out the call stack and as you expected now you have dummy_func at top, 2nd factorial in middle and 1st factorial call at bottom.  Stack is FILO (First In Last Out or LIFO – Last In First Out).  Then execution moves to lines 14 & 15.   Output looks like:

  6     main::dummy_func() called at ./t_recursion.pl line 13
  7     main::factorial(4) called at ./t_recursion.pl line 16
  8     main::factorial(5) called at ./t_recursion.pl line 23
  9  Variable v value: 4 and its address:SCALAR(0x7ff6240610e8)
 10 Current sub factorial addr:CODE(0x7ff62402f2c0)

At script line 16 the recursion continues and you get output lines 12 to 32.  At the last function the base or terminal condition of recursion is met ( return 1 if $v == 1; ) and it returns 1.

factorial of 1 => 1! = 1;

Now the stack rewinding begins, the return value of 1 (when factorial (1) returned) is multiplied with the variable $v (value 2) and results in 2 which is returned by  return $v * factorial($v – 1);  statement.

Finally,  5! = 120.

All this happen behind the scene and it might be just better to know and recognize the common pattern when this happen :).  I wouldn’t worry about how the implementation is done when I run query like

SELECT column_N FROM table_X;

It is so darn simple but so much goes behind that SQL statement from mapping table to file and exact location in file to extract correct values.  It is all hidden from the application program.

For more details take a look at “Call Stack” or “Activation Record”.

But if you like to dig deeper through debugging, try

> perl -d t_recursion.pl
Loading DB routines from perl5db.pl version 1.33
Editor support available.

Enter h or `h h' for help, or `man perldebug' for more help.

main::(t_recursion.pl:2): $! = 1;
  DB n
main::(t_recursion.pl:7): STDOUT->autoflush(1);
  DB n
main::(t_recursion.pl:8): STDERR->autoflush(1);
  DB n
main::(t_recursion.pl:23): factorial(5);
  DB s
main::factorial(t_recursion.pl:11):
11:    my $v = shift;
  DB s
main::factorial(t_recursion.pl:13):
13:    dummy_func();
  DB s
main::dummy_func(t_recursion.pl:20):
20:    cluck;
  DB T
. = main::dummy_func() called from file `t_recursion.pl' line 13
. = main::factorial(5) called from file `t_recursion.pl' line 23
  DB 


fizzbuzz in Ruby, Python, Perl, more

Recently, I ran into couple of articles on codinghorror.com  that covered interviews with programmers/ software engineers.  Many of the commentators answered the programming (fizzbuzz) question using their favorite computer language like VBA, Ruby, Perl, etc.  Since the post was no more allowing any new comments, I thought of posting them here on my blog itself.

fizzbuzz question: For all numbers between 1 and 100, print ‘fizz’ any number divisible by 3, ‘buzz’ for number divisible by 5 and ‘fizzbuzz’ for number divisible by both 3 & 5.

Article 1:  Why Can’t Programmers.. Program?
Article 2: The Non-Programming Programmer

I have used ‘map’ function to solve it and mapping function provide nice way to work through lists. There are few other ways you can solve it as well including the ones already given in the original articles above.

Perl:
sub fb {
if ($_ % 15 == 0) { return "fizzbuzz\n" }
elsif ($_ % 3 == 0) { return "fizz\n" }
elsif ($_ % 5 == 0) { return "buzz\n" }
return "$_\n";
}

print map {fb($_)} (1..100)

Ruby:

You can also use select method

(1..100).map { |n| if (n%15 == 0):   puts 'fizzbuzz'
elsif (n%5 == 0): puts 'buzz'
elsif (n%3 == 0): puts 'fizz'
elsif puts n
end
}
puts "----------------------------"

(1..100).select { |n| if (n%15 == 0): puts 'fizzbuzz'
elsif (n%5 == 0): puts 'buzz'
elsif (n%3 == 0): puts 'fizz'
elsif puts n
end
}

Python:

def fizzbuzz(x):
if x%15 == 0: return 'fizzbuzz'
elif x%3 == 0: return 'fizz'
elif x%5 == 0: return 'buzz'
return x

def p(x):
print x

map(p, map(fizzbuzz, range(1,101)))

SQL (MySQL):

SELECT  n, 
CASE
when n % 3 = 0 AND n % 5 = 0 then 'fizzbuzz'
when n % 5 = 0 then 'buzz'
when n % 3 = 0 then 'fizz'
END
FROM
( SELECT @n := @n + 1 as n
FROM some_small_table -- A table with more than 100 rows
(SELECT @n := 0) x -- Set variable to 0 in the begining
LIMIT 100 -- Limit to 100
) y;

Hope you enjoyed it.
Shiva

URL parsing in Python, Perl, MySQL

There are many situations when one needs to parse the URL. It could be in web log parsing or Google Keyword or other performance data that is downloaded through APIs. Application languages like Python, Perl, PHP etc provide many modules to parse or you can use raw regular expression to parse the URLs. Below I have shown few ways of doing it with a simple example – either in applications before loading into staging area of data warehouse or parsing it in SQL (MySQL) after the load.

In the example url http://http://www.hiregion.com/2009/12/sem-google-adwords-api-data-retriever.html, I would like to get sub-domain name and file name parts i.e; http://www.hiregion.com and sem-google-aswords-api-data-retriever.html.


Other posts that may be of interest:
1. Regular expressions – An intro
2. 99-bottles of bear in many languages


import time
import re




t2 = time.time()
for ii in range(1000000): #-- Parse it million times...
m = re.search(r'http://(.*?)/\d{4}/\d{2}/(.*$)', 'http://www.hiregion.com/2009/12/sem-google-adwords-api-data-retriever.html')
#- Saved to m.groups
t3 = time.time()

print 'regex took:', t3 - t2, 'seconds'




2. Using Python urlparse module


import re
from urlparse import urlparse

t2 = time.time()
for ii in range(1000000): #-- Parse a million times...
parsed = urlparse('http://www.hiregion.com/2009/12/sem-google-adwords-api-data-retriever.html')
file = re.sub('/\d{4}/\d{2}/', '',parsed.path)
#- Saved to parsed.netloc and file

t3 = time.time()
print 'urlparse took:', t2 - t1, 'seconds'

3. Using Perl Regular Expression


Regular expression is simple and same as the one used above in Python.

#!/cygdrive/c/perl/bin/perl

use strict;
use Time::HiRes qw(gettimeofday tv_interval);

my $str = 'http://www.hiregion.com/2009/12/sem-google-adwords-api-data-retriever.htm';
my $th1 = [gettimeofday()];

for my $ii (1..1000000) { #-- Parse a million times
$str =~ m#http://(.*?)/\d{4}/\d{2}/(.*$)#;
#- Saved to $1 and $2
}

my $th2 = [gettimeofday()];

print "tv_interval", tv_interval($th1), "\n";

If you would like to transform url after loading the data into staging tables (say, after the bulk load), the following one can be used.

4. Using MySQL (5.1 version)


SET @str = 'http://www.hiregion.com/2009/12/sem-google-adwords-api-data-retriever.html' ;

PREPARE select_q FROM
'SELECT CASE
WHEN ? REGEXP ''^http://'' then substring_index(?, ''/'', -1)
ELSE NULL end file,
CASE
WHEN ? REGEXP ''^http://'' then substring_index(substring_index(?, "/", 3), ''/'', -1)
ELSE NULL end sub_domain
' ;
EXECUTE select_q USING @str, @str, @str, @str;

# Output...
# file sub_domain
# sem-google-adwords-api-data-retriever.html www.hiregion.com

With Perl (5.10.1) and Python (2.6) running on the same system, perl took 0.999 seconds, Python regular expressions took 2.641 seconds and urlparse took 6.5 seconds.

Cheers,
Shiva

~
~

SEM – Google Adwords API Data Retriever

If you are running many large campaigns or thousands of campaigns and want to track keyword performance of Google Adwords, the following application helps retrieve the data on regular intervals (daily, weekly, monthly, etc.). Using your data warehouse and reporting infrastructure you can slice and dice the data based on your business requirements.

Application is a multi-threaded implementation of pulling data through API. Each thread gets a campaign id to pull keyword performance report.  The code included here is a skeletal code and you need to add error handing, code to suit your database schema and some ETL.  In my environment I have used MySQL 5.0, linux (2.6) and Windows, Perl 5.10.1 and Google Adwords API v13.  You can also use Google’s client libraries available in many languages including Java, Python, DotNet, Perl, PHP and others.


Other posts that may be of interest:
1. Google Analytics Data Retriever
2. Bulk data loading using Talend
3. Google Keyword Performance data pull


Campaign Class:

package Campaign;

#-------------------------------------
# Campaign Package
#-------------------------------------

use strict;
use DBI;

sub new {
my $self = {};
$self->{campIds} = []; #-- Master campaign ID array

bless($self);

$self->_populateCampaignIds();
return $self;
}

sub _populateCampaignIds {
my $self = shift;

#--------------------------
#- Connect to database to get list of campaigns (IDs).
#--------------------------
#- my $dbh = DBI->connect($data_source, $user_name, $pw);
#- my $sth = $dbh->prepare(
#- "SELECT campaign_id from campaign
#- WHERE search_engine = 'google'
#- AND status = 'active';
#- ";
#- $sth->execute() or die "Unable to execute a query.\n";
#- while ( @id = $sth->fetchrow_array ) {
#- push (${$self->{campIds}}, $id[0]);
#- }
#-
#- $dbh->disconnect;

#---- For demo, I have hard-coded some IDs.
push( @{$self->{campIds} }, '1111');
push( @{$self->{campIds} }, '2222');
push( @{$self->{campIds} }, '3333');

return 1;
}

#----------------------------
#- Returns master list of campaign IDs
#----------------------------
sub getCampaignIds {
my $self = shift;

return @{ $self->{campIds} };
}

#----------------------------
#- Returns chunk of campaign IDs from master array.
#----------------------------
sub getCampaignIdChunk {
my $self = shift;
my $chunks = shift;
my $index = shift;

if (!defined $index and !defined $chunks) {
return undef;
}

my $chunkSize = int (scalar(@ {$self->{campIds}})/$chunks);
if ($chunkSize < 1) {
print "Error: no campaigns to process for thread $index\n";
return undef;
}
my $lIx = $chunkSize * ($index - 1); #-- low index
my $hIx = $chunkSize * ($index); #-- high index

my @campIds = ();
if ($index < $chunks) {
@campIds = splice (@ {$self->{campIds}}, $lIx, $chunkSize);
} else {
#- print "Getting LAST chunk for $index from lower $lIx element and a chunk size of $chunkSize\n";
@campIds = splice (@ {$self->{campIds}}, $lIx);
}

return @campIds;
}

1;

Prototype application:

#-------------------------------------------------
# Adwords Data Retriever
# This is skeleton application to pull Adwords Keywords Performance Report Data
# from Google through Adwords API.
# Uses multi-threaded programming technique for simultaneous pull of many campaigns.
# Campaign ids in the application are hard-coded with dummy values and hence replace them by connecting
# to your database and getting actual campaign ids (code shown).
# Also, for authentication use your user name, password, developer & application code.
# Valid for Google Adwords v13.
# Author: Shiva M.
#-------------------------------------------------

#- use lib "Add directory where you installed Campaign.pm";
use strict;
use Benchmark;
use Getopt::Std;
use Data::Dumper;

use XML::Simple;
use LWP::Simple;
use Crypt::SSLeay;
use SOAP::Lite;
use IO::Uncompress::Gunzip qw();

use threads;
use threads::shared;

use Campaign;

my (%opts, @worker) = undef;
my $NUM_THREADS = 3; #-- Modify this to suit your environment. Or better use Config module with ini file to setup application configuration.
my $GOOGLE_API = 'https://adwords.google.com/api/adwords/v13';

#-------------------------------------------------
# Main program.
#-------------------------------------------------
getOpts();

if ($opts{'s'} == 1) {
for (my $i = 1; $i < ($NUM_THREADS+1); $i++) {
print "MASTER: Starting thread " . $i, "\n";
$worker[$i] = threads->new(\&pullCampaignData, $i);
sleep 5;
}

while(anyRunningThread()) {
for (my $ii = 1; $ii < scalar(@worker); $ii++) {
my $thr = $worker[$ii];
if ($thr->is_joinable) {
my $ret = eval { $thr->join() };
print "Thread $ii returned: $ret\n";
}
}
sleep(10);
}
}

exit (0);

#-------------------------------------------------
# Subroutines.
#-------------------------------------------------
sub getOpts {
if (!getopts('s:d:', \%opts)) {
print "Invalid option. Exiting\n";
exit (-1);
}

return 1;
}

#-----------------------------
#- Any thread still running?. Returns Yes/No (1/0)
#-----------------------------
sub anyRunningThread {
my $runThrs = threads->list(threads::running);
print "Still $runThrs threads are running.\n";
if ($runThrs) { return 1; }

return 0;
}

#---------------------------
#- Each thread to pull keyword performance reports for a list of campaigns
#---------------------------
sub pullCampaignData {
my ($tId) = @_;
my $thr = $worker[$tId];

my $to_dollar_conv_factor = 1 /(1000 * 1000); #-- Currency in micros. $1 1000 * 1000;

my $log = "thread_$tId.log";
open (TFH, ">$log") or die "Unable to open file $log\n"; #-- Log file for the thread.

#---------------------------------
#- Get this thread share of load - list of campaign IDs
#---------------------------------
my $campaign = Campaign->new();
my @tCampIds = $campaign->getCampaignIdChunk($NUM_THREADS,$tId);

print TFH "\npullCampaignData \$tId: $tId: campaignIDs:", map {"$_ "} @tCampIds;

my @headers = (SOAP::Header->name('email' => $email),
SOAP::Header->name('password' => $pw),
SOAP::Header->name('useragent' => $ua_name),
SOAP::Header->name('developerToken' => $developer_token),
SOAP::Header->name('applicationToken' => $application_token)
);

my $url = sprintf('https://%s.google.com/api/adwords/v13/ReportService', 'adwords');
my $wsdl = $url . '?wsdl';
my $service = SOAP::Lite->service($wsdl)->autotype(0)->readable(1)->proxy($url);
print TFH "\ndebug:service=$service" if $opts{d};

$service->on_fault(
sub {
my $response = $::ARG[1];
die('The following SOAP fault occurred:', "\n", 'faultcode: ',
$response->faultcode(), "\n",
'faultstring: ', $response->faultstring(),
"\n");
}
);
SOAP::Lite->import(+trace => 'debug') if $opts{d};


#---------------------------
# Make SOAP call.
# Get one campaign data at a time. Generate a file to bulk load into database.
#---------------------------

foreach my $cId (@tCampIds) {

my $AWJobParams = {
'selectedReportType' => 'Keyword',
'name' => 'AW Keyword Perf Daily Report',
'aggregationTypes' => ['Daily'],
'startDay' => $yesterday_date,
'endDay' => $yesterday_date,
'selectedColumns' => [ qw (
KeywordId Keyword CampaignId CampaignStatus
AdGroupId AdGroupStatus AdGroupMaxCpa AdWordsType
Impressions Clicks CTR CPC Conversions ConversionRate Cost
AveragePosition QualityScore AverageConversionValue BottomPosition
CPM CostPerConverstion CostPerTransaction DailyBudget DefaultCount
DefaultValue FirstPageCpc KeywordDestUrlDisplay KeywordStatus
KeywordTypeDisplay LeadCount LeadValue MaxContentCPC MaximumCPC
MaximumCPM PageViewCount PageViewValue SalesCount SalesValue
TopPosition TotalConversionValue Transactions ValuePerClick
ValuePerCost CustomerName ExternalCustomerId )
],
'crossClient' => 1,
'campaigns' => [$cId]
};


#------------------------------
# Step 1. Define a job
#------------------------------
my $report_job = SOAP::Data->name('job' => $AWJobParams);
$report_job->attr({'xsi:type' => 'DefinedReportJob', 'xmlns' => "$GOOGLE_API"});

#------------------------------
# Output XML file after unzipped.
#------------------------------
my $output = "${cId}_kw_perf.xml";


#-------------------------------
# Catch any exception
#-------------------------------
eval {

#------------------------------
# Step 2. Validate report.
#------------------------------
$service->call('validateReportJob' => $report_job, @headers);
print TFH "debug: about to schedule report \n";

#------------------------------
# Step 3. Schedule report.
#------------------------------
my $job_id = $service->call('scheduleReportJob' => $report_job, @headers)->result();
$job_id = SOAP::Data->name('reportJobId' => $job_id);

#------------------------------
# Step 4. Get report status. Wait until finish or error.
#------------------------------
my $status = $service->call('getReportJobStatus' => $job_id, @headers)->result();
while ($status ne 'Completed' and $status ne 'Failed') {
print TFH "Report job status is $status\n";
sleep(30);
$status = $service->call('getReportJobStatus' => $job_id, @headers)->result();
}
if ($status eq 'Failed') {
die("Report job generation failed.\n");
}

#------------------------------
# Step 5. Get report URL to download the file.
#------------------------------
my $report_url = $service->call('getGzipReportDownloadUrl' => $job_id, @headers)->result();
print TFH "Report is available at $report_url\n" if $opts{d};

#------------------------------
# Step 6. Get the data file (report).
#------------------------------
my $gzip_report_data = LWP::Simple::get($report_url);


#------------------------------
# Step 6a. Save it to local file.
#------------------------------
my $localFile = "${cId}_kw_perf.xml.gz";
open(LFH, ">$localFile") or die "Could not open $localFile: $!\n";
print LFH $gzip_report_data . "\n";
close LFH;


#------------------------------
# Step 7. Unzip the file and build xml hash.
#------------------------------
IO::Uncompress::Gunzip::gunzip(\$gzip_report_data => $output);

my $xml_hash = XMLin($output);
print TFH Dumper($xml_hash) if $opts{d};

#------------------------------
# Step 8. Create database bulk load file.
#------------------------------
open (BLFH, ">${cId}_bulk_load_input.txt") or die "Unable to create bulk load file: $!\n";


#-- Uncomment the fields that are of interest to you.
foreach my $row (@{$xml_hash->{table}->{rows}->{row}}) {
print BLFH
$row->{keywordid},"\t", $row->{keyword},"\t",
$row->{campaignid},"\t",
# $row->{campaign},"\t",
$row->{campStatus},"\t", $row->{adgroupid},"\t",
# $row->{adgroup},"\t",
# $row->{agstatus},"\t",
# $row->{agMaxCpa} * $to_dollar_conv_factor,"\t",
# $row->{adwordsType}, $row->{keywordid}, $row->{keyword}, $row->{kwStatus},
# $row->{customerid}, $row->{acctname}, $row->{kwDestUrl},
$row->{imps},"\t", $row->{clicks},"\t", $row->{ctr},"\t",
# $row->{cpc} * $to_dollar_conv_factor, $row->{conv},
# $row->{convRate},
$row->{cost} * $to_dollar_conv_factor,"\t",
$row->{pos},"\t",
$row->{qualityScore},"\t",
# $row->{avgConvValue} * $to_dollar_conv_factor,
# $row->{bottomPosition}, $row->{cpm} * $to_dollar_conv_factor,
# $row->{costPerConv} * $to_dollar_conv_factor,
# $row->{cpt} * $to_dollar_conv_factor,
# $row->{budget} * $to_dollar_conv_factor, $row->{defaultConv},
# $row->{defaultConvValue} * $to_dollar_conv_factor,
# $row->{firstPageCpc} * $to_dollar_conv_factor,
$row->{kwDestUrl},"\t", $row->{kwType},
# $row->{leads}, $row->{leadValue} * $to_dollar_conv_factor,
# $row->{maxContentCpc} * $to_dollar_conv_factor,
# $row->{maxCpc} * $to_dollar_conv_factor,
# $row->{maxCpm} * $to_dollar_conv_factor, $row->{pageviews},
# $row->{pageviewValue} * $to_dollar_conv_factor,
# $row->{purchases},
# $row->{purchaseValue} * $to_dollar_conv_factor,
# $row->{topPosition},
# $row->{convValue} * $to_dollar_conv_factor,
# $row->{transactions}, $row->{convVpc},
# $row->{valueCostRatio},
"\n";
}
close(BLFH);
};

if ($@) {
print("Report job failed validation due to the following error: ", $@, "\n");
exit (-1);
}
}

close(TFH);
return 1;
}

Bulk loading into data warehouse staging and perform summarization.

#- Psuedo code
#- 1. Cleanup or perform any needed transformation of data in the application.
#- 2. Concatenate all the campaign keyword performance data to a file(s).
#- 3. Bulk load file into a staging table.
#- 4. Transform and summarize the data through SQL.

Enjoy,
Shiva

Adwords Keyword Performance data pull

If you are handling multiple accounts or a large account at Adwords center managing few tens of thousand of keywords to millions of keywords, you would be interested in tracking the keyword performance over a period of time. You can do so by pulling daily summary report from Google through Adwords API and storing the data in your data mart or data warehouse for further analysis.

Pulling the data through API involves 6 steps:
   1. definedReportJob object – Specifies the type of report you want.
   2. validateReportJob – Validate the report that you would like to get. Returns nothing.
   3. scheduleReportJob – Now schedule the job for above object. This request returns an ID.
   4. getReportJobStatus – Get status of the report. If “Completed” go to next.
   5. getReportDownloadUrl – Retrieve the URL where your report is stored. The returned URL is valid for five minutes. If the URL expires, just call getReportDownloadUrl again. Keep this URL private, since anyone can use it to access your report.
   6. getGzipReportDownloadUrl – Download your report by sending an HTTP GET request.

Following pseudo code and code explains each step execution in more detail.  I use Perl SOAP calls to generate Adwords requests. Also, note the following code works well with API version v13 and Google is transitioning many of its Adwords API to v2009. As of Dec.2009 ReportService was still available only in v13.

Before you get to the first step, build a data structure with job parameters & create a SOAP service –

my $service = SOAP::Lite->service($wsdl)->autotype(0)->readable(1)->proxy($url); 
#- $url = 'https://adwords.google.com/api/adwords/v13/CampaignService' ;
#- $wsdl = $url . '?$wsdl' ;
#- $userAgent = "My company name. Adwords v13 API"

my @headers =
(SOAP::Header->name('email' => $email),
SOAP::Header->name('password' => $pw),
SOAP::Header->name('useragent' => $userAgent),
SOAP::Header->name('developerToken' => $developerToken),
SOAP::Header->name('applicationToken' => $applicationToken) );


my $AWJobParams =
{'selectedReportType' => $selectedReportType,
'name' => $reportName,
'aggregationTypes' => $aggregationTypes,
'startDay' => $startDay,
'endDay' => $endDay,
'selectedColumns' => $selectedColumns,
'crossClient' => $crossClient,
'campaigns' => $campaigns
};

#- Example of selected columns
my $selectedColumns = [ qw (
KeywordId Keyword CampaignId CampaignStatus AdGroupId AdGroupStatus AdGroupMaxCpa AdWordsType Impressions Clicks CTR CPC Conversions ConversionRate Cost AveragePosition QualityScore AverageConversionValue BottomPosition CPM CostPerConverstion CostPerTransaction DailyBudget DefaultCount DefaultValue FirstPageCpc KeywordDestUrlDisplay KeywordStatus KeywordTypeDisplay LeadCount LeadValue MaxContentCPC MaximumCPC MaximumCPM PageViewCount PageViewValue SalesCount SalesValue TopPosition TotalConversionValue Transactions ValuePerClick ValuePerCost CustomerName ExternalCustomerId )];
Job Parameter What it is
selectedReportType You can request 11 types of reports including keyword performance report. For more info see http://code.google.com/apis/adwords/docs/developer/DefinedReportJob.html
reportName Name of report. Provide something that is meaningful with date.
aggregationType Daily, Weekly, Monthly. This works in combination with selectedReportType and selectedColumns. For more details take a look at http://code.google.com/apis/adwords/docs/developer/adwords_api_reports.html.
startDay Date of the report. Typically yesterday for daily pull.
endDay End date of report.
selecteColumns Columns for this report that you are interested in.
crossClient 1 if you are using MCC.
campaigns Array. If set to 1 pull all the data for all campaigns under MCC.

1. Building the report object

my $report_job = SOAP::Data->name('job' => $AWJobParams);
$report_job->attr({'xsi:type' => 'DefinedReportJob', 'xmlns' => 'https://adwords.google.com/api/adwords/v13'});

2. Validating the report

$service->call('validateReportJob' => $report_job, @headers);

3. Scheduling the job

my $job_id = $service->call('scheduleReportJob' => $report_job, @headers)->result();

4. Check for job completion status

# Job completed?
my $status = $service->call('getReportJobStatus' => $job_id, @headers)->result();

while ($status ne 'Completed' and $status ne 'Failed') {
sleep(60);
$status = $service->call('getReportJobStatus' => $job_id, @headers)->result();
}

5. Get URL to download from and 6. Get the report

# Download report.
my $report_url = $service->call('getGzipReportDownloadUrl' => $job_id, @headers)->result();
my $gzipped_report = LWP::Simple::get($report_url);

This gives you crux of downloading the daily keyword performance report and upload the file to a data warehouse staging environment before processing further. For more robust code you need to handle error or exceptions that happen, create output file before loading into database with appropriate field separators (tab, for example), match fields in input file to columns staging table, etc.

MySQL bulk loading:

LOAD DATA  CONCURRENT LOCAL INFILE $input_file_to_load
INTO TABLE $staging_table_name
FIELDS TERMINATED BY '\t'
ESCAPED BY '\'
LINES TERMINATED BY '\n'
(date searchEngine KeywordId Keyword CampaignId CampaignStatus AdGroupId AdGroupStatus AdGroupMaxCpa AdWordsType Impressions Clicks CTR CPC Conversions ConversionRate Cost AveragePosition QualityScore AverageConversionValue BottomPosition CPM CostPerConverstion CostPerTransaction DailyBudget DefaultCount DefaultValue FirstPageCpc KeywordDestUrlDisplay KeywordStatus KeywordTypeDisplay LeadCount LeadValue MaxContentCPC MaximumCPC MaximumCPM PageViewCount PageViewValue SalesCount SalesValue TopPosition TotalConversionValue Transactions ValuePerClick ValuePerCost CustomerName ExternalCustomerId )

Cheers,
Shiva

Google Analytics Data Retriever (API) – SEM, SEO

Google Analytics should be an integral part of any company’s marketing data analysis strategy that plays in SEM, SEO field. Not too long ago Google released API for their Analytics and one can use this data or reports to compliment their in-house data analysis.

Here I have provided a small utility program that can be used to retrieve most popular six (6) reports and have used similar utilities that I have built to dump the data into MySQL data warehouse for much bigger data sets.

The application has two Perl modules (GAProfile.pm and GAReport.pm) and a main program (ga_report_retriever.pl). GAProfile.pm module is used to authenticate your account and get the auth_code that will be used in the subsequent calls to GA to get reports. GAReport.pm module is used to pull the report and dump important dimension and metrics to standard output.  Put both these modules in the same directory (or copy them to directory in the path of PERL5LIB) as the main program and run it. To get different report, make a change in “ga_report_retriever.pl” passing the name of the report.

Before you start using the GA API, you may want to go through the documentation.

At command prompt> ./ga_report_retriever.pl


GAProfile.pm module

package GAProfile;
#-------------------------------------------------------
#- This is a Google Analytics Class providing methods to
#- 1. get authorized with google accounts &
#- 2. get list of profiles to pull analytics data
#- Use XML::Simple module and for large data sets, you can
#- try XML::Bare or LibXML modules for better performance.
#- Provide your own Google account info in the contructor.
#-
#- Author: Shiva M.
#-------------------------------------------------------

use strict;
use XML::Simple;
use URI::Escape;
use LWP::UserAgent;

use Data::Dumper;

#--- Create Google Analytics object.
#-------------------------------------------------------
sub new {
my $proto = shift;
my $class = ref($proto) || $proto;
my $self = {};

$self->{ga_user} = '{your google account}';
$self->{ga_passwd} = '{corresponding password}';

#-- Client Login URL
$self->{ga_login_url} = 'https://www.google.com/accounts/ClientLogin';
$self->{ga_login_param} = "accountType=GOOGLE";
$self->{ga_login_param} .= "&Email=$self->{ga_user}&Passwd=$self->{ga_passwd}";
$self->{ga_login_param} .= "&service=analytics&source=HiRegion.com-GA_Retriever-v1";

#-- Analytics account URL
$self->{ga_ac_url} = "https://www.google.com/analytics/feeds/accounts/default";

$self->{ga_auth_code} = undef;
$self->{ga_ua_name} = "GA Data Retriever"; #-- LWP User Agent name

bless ($self, $class);
return $self;
}

#- Escape the URI characters, if needed.
#-------------------------------------------------------
sub urlEncode {
my $self = shift;
my $str = shift;

return uri_escape($str)
}

#-- Get Auth code from Google Analytics so that you can make
#-- Data request calls. This sets object's ga_auth variable as well.
#-------------------------------------------------------
sub getAuthorized {
my $self = shift;

my $r = HTTP::Request->new(POST => $self->{ga_login_url});
$r->content_type('application/x-www-form-urlencoded');
$r->content($self->{ga_login_param});

#-- Make a call to client login...
my $ua = LWP::UserAgent->new(agent => $self->{ga_ua_name});
my $res = $ua->request($r);

my ($ga_auth);
if ($res->is_success) { #-- On success returns HTTP 200.
$ga_auth = (split(/Auth=/i, $res->content))[1];
#- print "content: ", $res->content(), "\nga_auth: $ga_auth\n";
}
else {
return (undef, "error: ". $res->status_line);
}

$self->{ga_auth_code} = $ga_auth;
# return the token
return (1, $ga_auth); #-- Return success & auth code
}

sub getAuthCode {
my $self = shift;
return $self->{ga_auth_code};
}


#-- Get list of profile IDs (web property IDs)
#-------------------------------------------------------
sub getProfiles {
my $self = shift;

# create user agent object & get profiles...
my $ua = LWP::UserAgent->new(agent => $self->{ga_ua_name});
my $res = $ua->get($self->{ga_ac_url}, Authorization => "GoogleLogin Auth=$self->{ga_auth_code}");

my @profiles;

# On getting the list of profiles...
if ($res->is_success) {
my ($content, $e);
$content = $res->content;

# Parse the xml response...
my $xml = new XML::Simple(KeyAttr=>[]);
my $tree = $xml->XMLin($content);

# iterate through each entry
my $ii = 0;
foreach $e (@{$tree->{entry}}) {
# add the account to the array
$profiles[$ii]->{title} = $e->{title}->{content};
$profiles[$ii]->{tableId} = $e->{'dxp:tableId'};

foreach my $p (@{$e->{'dxp:property'}}) {
if ($p->{'name'} =~ m/accountId/i) {
$profiles[$ii]->{accountId} = $p->{'value'};
} elsif ($p->{'name'} =~ m/profileId/i) {
$profiles[$ii]->{profileId} = $p->{'value'};
} elsif ($p->{'name'} =~ m/webPropertyId/i) {
$profiles[$ii]->{webPropertyId} = $p->{'value'};
}
}
$ii++;
}
} else {
# Request failed.
print "GAProfile->getProfiles() error: ". $res->status_line, "\n";
return undef;
}

return @profiles;
}


1;

GAReport.pm module


package GAReport;
#------------------------------------------------------
#- Google Analytics Report Module.
#- Handles url request to six popular pre-defined reports
#-
#- Author: Shiva M.
#------------------------------------------------------
use Date::Calc qw(:all);
use Data::Dumper;

sub new {
my $proto = shift;
my $profileId = shift;
my $class = ref($proto) || $proto;
my $self = {};

$self->{ga_data_url} = "https://www.google.com/analytics/feeds/data?";
$self->{standard_reports} = { visit_pageviews => visit_pageviews
,visit_traffic => visit_traffic
,visit_browser => visit_browser
,language_country => language_country
,top_pageviews => top_pageviews
,n_days_visits => n_days_visits
};
$self->{current_report} = "visit_traffic"; #-- Default report.
bless ($self, $class);

$self->{'profileId'} = undef;

return $self;
}


#-- Sets the profile id for data feed
#---------------------------------------------------------------------
sub setProfileId {
my $self = shift;
my $profileId = shift;

if (!defined $profileId or $profileId !~ m/\d+/) {
print "GAReport-setProfileId-Error: Ill formed profileID ($profileId)\n";
return undef;
}
$self->{profileId} = $profileId;

return 1;
}


#-- Returns current profile ID url param.
#---------------------------------------------------------------------
sub getProfileId {
my $self = shift;
return "ids=ga:$self->{profileId}"; #-- Append "ga:profileId"
}


#-- Most important reports are defined as standard reports and
#-- need to be predefined in constructor. Currently supports six reports.
#---------------------------------------------------------------------
sub getStandardReportURL {
my $self = shift;
my $report = shift; my $startDt = shift; my $endDt = shift; my $maxRes = shift;
#- my ($report, $startDt, $endDt, $maxRes) = @_;

my $url = $self->{ga_data_url};
$url .= $self->getProfileId();
$url .= $self->getDateDim($startDt, $endDt);
$url .= $self->getMaxResultDim($maxRes);

if ($report =~ m/$self->{standard_reports}->{visit_pageviews}/i) {
$url .= "&dimensions=ga%3Adate&metrics=ga%3Avisits%2Cga%3Apageviews";
} elsif ($report =~ m/$self->{standard_reports}->{visit_traffic}/i) {
$url .= "&dimensions=ga%3Asource%2Cga%3Amedium&metrics=ga%3Avisits&sort=-ga%3Avisits";
} elsif ($report =~ m/$self->{standard_reports}->{visit_browser}/i) {
$url .= "&dimensions=ga%3Abrowser&metrics=ga%3Avisits&sort=-ga%3Avisits";
} elsif ($report =~ m/$self->{standard_reports}->{language_country}/) {
$url .= "&dimensions=ga%3Alanguage%2Cga%3Acountry&metrics=ga%3Avisits&sort=-ga%3Avisits";
} elsif ($report =~ m/$self->{standard_reports}->{top_pageviews}/i) {
$url .= "&dimensions=ga%3ApagePath%2Cga%3ApageTitle&metrics=ga%3Apageviews&sort=-ga%3Apageviews";
} elsif ($report =~ m/$self->{standard_reports}->{n_days_visits}/i) {
$url .= "&dimensions=ga%3Adate&metrics=ga%3Avisits&filters=ga%3Avisits%3E1&sort=ga%3Adate";
} else {
print "GAReport.pm-getStandardReport-Error: Unknown standard report\n";
$url = undef;
}

#-- Which report are we requesting...
$self->{current_report} = $report;

return $url;
}


sub isDateDimSet {
my $self = shift;
my $url = shift;

if ($url =~ m/start-date=\d{4}\d{2}\d{2}/i and $url =~ m/end-date=\d{4}\d{2}\d{2}/i) {
return 1;
}
return undef;
}


#-- Returns date parameters of URL. If undefined dates, then sets them to a week before.
#-- i.e; "&start-date=2009-10-02&end-date=2009-10-09"
#------------------------------------------------------
sub getDateDim {
my $self = shift;
my $startDt = shift; #-- Start Date
my $endDt = shift; #-- End Date

my ($y, $m, $d, $dtURL); #-- Year, Month, Date
if (!defined $startDt or !defined $endDt) {
($y, $m, $d) = Today();
($y, $m, $startDt) = Add_Delta_Days($y, $m, $d, "-9"); #-- A week before day-before-yesterday. Allow 2 days lag.
($y, $m, $endDt) = Add_Delta_Days($y, $m, $startDt, "7"); #-- Add 7 days to start date to get a week's report.
$dtURL = "&start-date=$y-$m-$startDt&end-date=$y-$m-$endDt";
} else {
$dtURL = "&start-date=$startDt&end-date=$endDt";
}

return $dtURL;
}


#-- Returns maximum number of resutls requested from GA.
#-- Defaults to 50.
#--------------------------------------------------------
sub getMaxResultDim {
my $self = shift;
my $maxRes = shift;

if (!defined $maxRes or !$maxRes =~ m/\d+/) {
$maxRes = 50;
}
return "&max-results=$maxRes"
}


#-- Before making data request some sanity check...
#--------------------------------------------------------
sub validateRequest {
my $self = shift;

return 1;
}


#-- Prints out the dimensions and metrics of a report.
#--------------------------------------------------------
sub getReport {
my $self = shift;
my $url = shift;
my $ga_auth_code = shift;

#- print "\n\nGAReport-getReport: Pulling $self->{current_report} now.";

if (!defined $url or !defined $ga_auth_code) {
print "GAReport-getReport-Error: url($url)) or authorization code ($ga_auth_code) is missing.\n";
return undef;
}
#-- Construct user agent and make a get call.
my $ua = LWP::UserAgent->new;
$ua->agent("GA Test Data Retriever/0.5");

my $res = $ua->get($url, Authorization => "GoogleLogin Auth=$ga_auth_code");

my $data;
#-- Check response and get the data...
if ($res->is_success) {
$data = $res->content;
} else {
print "GAReport-getReport-Error: Data pull failed: ". $res->status_line;
return undef;
}

#-- Convert XML to Perl tree structure and parse.
#------------------------------------------------
my $xml = XML::Simple->new();
my $tree = $xml->XMLin($data);
print Dumper $tree;

my $tableName = $tree->{'dxp:dataSource'}->{'dxp:tableName'};
print "\n\nTablename: $tableName";

foreach my $ok (keys %$tree) { #-- Outermost hash keys
#- print "Tree key: $ok\n";
#- my $tableName = $ok->{'dxp:tableName'};


#-- Pageview & visits metrics
#-----------------------------------------------------------------
if ($self->{current_report} =~ m/visit_pageviews/i and $ok =~ m/entry/i) {
my @hEntries = values %{$tree->{$ok}}; #-- hash refs

#-- print "\n\tentries: @hEntries\n";
printf "\n\n%-25s\t%s\t\t%s\t%s\n%s", "Profile", "Date", "PageViews", "Visits", '-'x80;

my %out = ();
foreach my $hEAttr (@hEntries) {
my $dt = $hEAttr->{'dxp:dimension'}->{'value'};
$out{$dt}{'pageviews'} = $hEAttr->{'dxp:metric'}->{'ga:pageviews'}->{'value'};
$out{$dt}{'visits'} = $hEAttr->{'dxp:metric'}->{'ga:visits'}->{'value'};
}

foreach my $dt (sort keys %out) {
printf ("\n%-25s\t%d\t%d\t\t%d", $tableName, $dt, $out{$dt}{'pageviews'}, $out{$dt}{'visits'});
}
}


#-- Visitor traffic metrics
#-----------------------------------------------------------------
if ($self->{current_report} =~ m/visit_traffic/i and $ok =~ m/entry/i) {
my @hEntries = values %{$tree->{$ok}}; #-- hash refs

printf "\n\n%-25s\t%-20s\t%-20s\t%s\n%s", "Profile", "Source", "Medium", "Visits", '-'x80;

my %out = ();
foreach my $hEAttr (@hEntries) {
my $src = $hEAttr->{'dxp:dimension'}->{'ga:source'}->{'value'};
my $medium = $hEAttr->{'dxp:dimension'}->{'ga:medium'}->{'value'};
$out{$src}{$medium}{'visits'} = $hEAttr->{'dxp:metric'}->{'value'};
}

foreach my $src (sort keys %out) {
foreach my $med (sort keys %{$out{$src}}) {
printf ("\n%-25s\t%-20s\t%-20s\t%d", $tableName, $src, $med, $out{$src}{$med}{'visits'});
}
}
}


#--- Visit and browser metrics
#-----------------------------------------------------------------
if ($self->{current_report} =~ m/visit_browser/i and $ok =~ m/entry/i) {
my @hEntries = values %{$tree->{$ok}}; #-- hash refs

printf "\n\n%-25s\t%-20s\t%s\n%s", "Profile", "Browser", "Visits", '-'x80;

my %out = ();
foreach my $hEAttr (@hEntries) {
my $brow = $hEAttr->{'dxp:dimension'}->{'value'};
$out{$brow}{'visits'} = $hEAttr->{'dxp:metric'}->{'value'};
}

foreach my $brow (sort keys %out) {
printf ("\n%-25s\t%-20s\t%d", $tableName, $brow, $out{$brow}{'visits'});
}
}


#--- Language and country metrics...
#-----------------------------------------------------------------
if ($self->{current_report} =~ m/language_country/i and $ok =~ m/entry/i) {
my @hEntries = values %{$tree->{$ok}}; #-- hash refs

printf "\n\n%-25s\t%-20s\t%-20s\t%s\n%s", "Profile", "Language", "Country", "Visits", '-'x80;

my %out = ();
foreach my $hEAttr (@hEntries) {
my $lang = $hEAttr->{'dxp:dimension'}->{'ga:language'}->{'value'};
my $country = $hEAttr->{'dxp:dimension'}->{'ga:country'}->{'value'};
$out{$lang}{$country}{'visits'} = $hEAttr->{'dxp:metric'}->{'value'};
}

foreach my $lang (sort keys %out) {
foreach my $country (sort keys %{$out{$lang}}) {
printf ("\n%-25s\t%-20s\t%-20s\t%d", $tableName, $lang, $country, $out{$lang}{$country}{'visits'});
}
}

}


#--- Top page views metrics...
#-----------------------------------------------------------------
if ($self->{current_report} =~ m/top_pageviews/i and $ok =~ m/entry/i) {
my @hEntries = values %{$tree->{$ok}}; #-- hash refs

printf "\n\n%-25s\t%-30s\t%-20s\t%s\n%s", "Profile", "PagePath", "PageTitle", "PageViews", '-'x80;

my %out = ();
foreach my $hEAttr (@hEntries) {
my $pagePath = $hEAttr->{'dxp:dimension'}->{'ga:pagePath'}->{'value'};
my $pageTitle = $hEAttr->{'dxp:dimension'}->{'ga:pageTitle'}->{'value'};
$out{$pagePath}{$pageTitle}{'visits'} = $hEAttr->{'dxp:metric'}->{'value'};
}

foreach my $pagePath (sort keys %out) {
foreach my $pageTitle (sort keys %{$out{$pagePath}}) {
printf ("\n%-25s\t%-30s\t%-20s\t%d", $tableName, $pagePath, $pageTitle, $out{$pagePath}{$pageTitle}{'visits'});
}
}

}


#--- N days of visits...
#-----------------------------------------------------------------
if ($self->{current_report} =~ m/n_days_visits/i and $ok =~ m/entry/i) {
my @hEntries = values %{$tree->{$ok}}; #-- hash refs

printf "\n\n%-25s\t%-20s\t%s\n%s", "Profile", "Date", "Visits", '-'x80;

my %out = ();
foreach my $hEAttr (@hEntries) {
my $dt = $hEAttr->{'dxp:dimension'}->{'value'};
$out{$dt}{'visits'} = $hEAttr->{'dxp:metric'}->{'value'};
}

foreach my $dt (sort keys %out) {
printf ("\n%-25s\t%-20s\t%d", $tableName, $dt, $out{$dt}{'visits'});
}

}
}

return 1;
}

1;

Main Program: ga_report_retriever.pl


#!/usr/local/bin/perl -w
#---/cygdrive/c/Perl/bin/perl -w

#-----------------------------------------------------------
#- Application to pull 6 popular Google Analytical reports.
#- Uses two classes - GAProfile and GAReport to perform data/report pull through APIs.
#- Reports are:
#- Pageview visits by date (visit_pageviews)
#- Visit by by source (visit_traffic)
#- Visit by brower (visit_brower)
#- Visit by language, country (language_country)
#- Top pageviews (top_pageviews)
#- "n" day visits (n_days_visits)
#-
#- Author: Shiva M.
#-----------------------------------------------------------
use lib ".";
use strict;
use Data::Dumper;

use GAProfile;
use GAReport;

#-- Get authroized to pull data. Pass username & password through GAProfile.
#-----------------------------------------------------------
my $gau = GAProfile->new();
$gau->urlEncode;
#-- $rc - return code 1 on success & set $ga_auth to authorization code.
my ($rc, $ga_auth) = $gau->getAuthorized();

#-- If failed, terminate the program.
if (!$rc) {
print "Authorization failed. Exiting. Fatal error: $ga_auth\n";
die;
}

#-- On success, get list of profiles to pull data from and try one profile at a time.
#------------------------------------------------------------
my @profiles = $gau->getProfiles();

my $report = GAReport->new();
foreach my $p (@profiles) {
$report->setProfileId($p->{profileId});

#-- Get a report by date for last 7 days by default.
#-- If n_days_visits report pass start & end dates and number of days between.
my $url = $report->getStandardReportURL("n_days_visits", '2009-12-01', '2009-12-30', 30);
$report->getReport($url, $ga_auth);
}

Beauty of 99 Bottles of Beer in Perl

Perl is well known for allowing a software developer to express the solution in more than one way. And it can be harnessed to be as creative or cryptic as one likes to be. Here is one example I saw sometime back on http://99-bottles-of-beer.net/language-perl-737.html

Run the program as:

#!/usr/local/bin/perl


''=~( '(?{' .('`' |'%') .('[' ^'-')
.('`' |'!') .('`' |',') .'"'. '\\$'
.'==' .('[' ^'+') .('`' |'/') .('['
^'+') .'||' .(';' &'=') .(';' &'=')
.';-' .'-'. '\\$' .'=;' .('[' ^'(')
.('[' ^'.') .('`' |'"') .('!' ^'+')
.'_\\{' .'(\\$' .';=('. '\\$=|' ."\|".( '`'^'.'
).(('`')| '/').').' .'\\"'.+( '{'^'['). ('`'|'"') .('`'|'/'
).('['^'/') .('['^'/'). ('`'|',').( '`'|('%')). '\\".\\"'.( '['^('(')).
'\\"'.('['^ '#').'!!--' .'\\$=.\\"' .('{'^'['). ('`'|'/').( '`'|"\&").(
'{'^"\[").( '`'|"\"").( '`'|"\%").( '`'|"\%").( '['^(')')). '\\").\\"'.
('{'^'[').( '`'|"\/").( '`'|"\.").( '{'^"\[").( '['^"\/").( '`'|"\(").(
'`'|"\%").( '{'^"\[").( '['^"\,").( '`'|"\!").( '`'|"\,").( '`'|(',')).
'\\"\\}'.+( '['^"\+").( '['^"\)").( '`'|"\)").( '`'|"\.").( '['^('/')).
'+_,\\",'.( '{'^('[')). ('\\$;!').( '!'^"\+").( '{'^"\/").( '`'|"\!").(
'`'|"\+").( '`'|"\%").( '{'^"\[").( '`'|"\/").( '`'|"\.").( '`'|"\%").(
'{'^"\[").( '`'|"\$").( '`'|"\/").( '['^"\,").( '`'|('.')). ','.(('{')^
'[').("\["^ '+').("\`"| '!').("\["^ '(').("\["^ '(').("\{"^ '[').("\`"|
')').("\["^ '/').("\{"^ '[').("\`"| '!').("\["^ ')').("\`"| '/').("\["^
'.').("\`"| '.').("\`"| '$')."\,".( '!'^('+')). '\\",_,\\"' .'!'.("\!"^
'+').("\!"^ '+').'\\"'. ('['^',').( '`'|"\(").( '`'|"\)").( '`'|"\,").(
'`'|('%')). '++\\$="})' );$:=('.')^ '~';$~='@'| '(';$^=')'^ '[';$/='`';

It produces output like:

99 bottles of beer on the wall, 99 bottles of beer!
Take one down, pass it around,
98 bottles of beer on the wall!

98 bottles of beer on the wall, 98 bottles of beer!
Take one down, pass it around,
97 bottles of beer on the wall!

97 bottles of beer on the wall, 97 bottles of beer!

It is a beauty!


Another one in Perl ( http://99-bottles-of-beer.net/language-perl-658.html )

#!/usr/local/bin/perl
# Copyright 2001 Christopher J. Carlson
# All Rights Reserved

$a=
"cpuu
\bmft p
\bg cff
\bs";$b
="po ui
\bf xbm
\bm";$c="
Ypv ublf p
\bof epxo qb
\btt ju bspvoe";
$a =~ s/\n//;$a =~
s/\s+/ /g; $b =~
s/\n// ; $b =~
s/\s+/ /g;$c =~
s/\n// ; $c =~
s/\s+/ /g;$a =~
y/b-z/a-z/;$b =~
tr/b-z/a-z/;$c =~
tr/b-z/a-z/ ; for(
$d=100;$d>0;$d--){
print"$d $a $b $d"
;print" $a,\n$c, "
;print($d-1);print
" $a $b.\n";} $x =
"cjc"; $y="dobbz";
$z="com";print"\n"
;print "- $x\@$y."
;print"$z \n\n";

If you write it in more readable format, it would be

$a = "cpuu \bmft p \bg cff \bs";
$b = "po ui \bf xbm \bm";
$c = "Ypv ublf p \bof epxo qb \btt ju bspvoe";

$a =~ s/\n//;
$a =~ s/\s+/ /g;
$b =~ s/\n// ;
$b =~ s/\s+/ /g;
$c =~ s/\n// ;
$c =~ s/\s+/ /g;

$a =~ y/b-z/a-z/;
$b =~ tr/b-z/a-z/;
$c =~ tr/b-z/a-z/;
for( $d=100;$d>0;$d--) {
print"$d $a $b $d";
print" $a,\n$c, ";
print($d-1);
print " $a $b.\n";
}
$x = "cjc";
$y="dobbz";
$z="com";
print"\n";
print "- $x\@$y.";
print"$z \n\n";

It is a very simple program and all that is needed to produced the required output is done in “for” loop. All the substitutes (6 lines of code “s//;” is redundant and seem to be added to make a bottle shaped code. First 3 lines of code ($a, $b, $c) are simply a ASCII next letter of the original lines, meaning, “b” written as “c”, “t” as “u” and then translated back to original lines through “y/” and “tr/” Perl functions. $x,$y,$z are author email address.


A version in Python ( http://99-bottles-of-beer.net/language-python-808.html )

#!/usr/bin/env python
# -*- coding: iso-8859-1 -*-
"""
99 Bottles of Beer (by Gerold Penz)
Python can be simple, too 🙂
"""

for quant in range(99, 0, -1):
if quant > 1:
print quant, "bottles of beer on the wall,", quant, "bottles of beer."
if quant > 2:
suffix = str(quant - 1) + " bottles of beer on the wall."
else:
suffix = "1 bottle of beer on the wall."
elif quant == 1:
print "1 bottle of beer on the wall, 1 bottle of beer."
suffix = "no more beer on the wall!"
print "Take one down, pass it around,", suffix
print "--"

Python code is pretty straight forward and self explainatory.


One of the simplest one – just grab the lyrics from the website 🙂
( http://99-bottles-of-beer.net/language-python-931.html )

#!/usr/local/bin/python
'''A more interesting way to get the lyrics to "99 Bottles of Beer on the Wall", in Python. Is this considered cheating?'''

import re, urllib
print re.sub('

', '', re.sub('
|

|
|
','\n', re.sub('No', '\nNo',
urllib.URLopener().open('http://www.99-bottles-of-beer.net/lyrics.html').read()[3516:16297])))



Hope you enjoyed it.
Shiva