Conversation Prism – An Image

As the social media, social networking, advertising, Internet marketing continue to evolve with new technologies and many companies create their own social groups, it all gets more complex and confusing.  Many a times a picture or an image will explain more elegantly than 1000 words are more and in some cases image is the most suited tool to explain.  Here is one image created by Brian Solis & Jesse Thomas that I like in this conversation!

Few other images that will help.

From Dave Fleet  
 (http://www.socialmediatoday.com/SMC/162110)

From FredCavazza.net

Enjoy,
Shiva

Mapping URL to top level domain

When parsing web logs as part of web analytics, one might be interested in the top level domain analysis before drilling down further.  Following query maps an url (referer) to top level domain.  Top level domain include not only generic ones like ‘.com’, ‘.edu’, ‘.org’, etc but also country level top levels like ‘.uk’, ‘.de’, etc.  You can download the list from IANA (Internet Assigned Numbers Authority) .

After downloading it import into a table dim_int_mktg_tld which you can join with the staging data.

CREATE TABLE `dim_int_mktg_tld` (
`tld` varchar(16) NOT NULL,
`type` varchar(32) default 'country-code', #-- Can be country-code or generic
`info` varchar(500) default NULL,
PRIMARY KEY (`tld`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

#-- Join with yesterdays staging data and get TLD
SELECT WL.url, TLD.tld
FROM stg_weblog_date WL
LEFT OUTER JOIN dim_int_mktg_tld TLD
ON substring_index(
substring_index(
substring_index(url, '://', -1),
'/',1),
'.',-1) = TLD.tld
WHERE WL.date = DATE_ADD(CURRENT_DATE, interval -1 day)

#-- Example: http://www.abc.co.au/in/test
#-- Innermost substring_index returns everything to the right of '://'
#-- i.e, www.abc.co.au/in/test,
#-- then the middle one gets everything to the left of first '/'
#-- i.e, www.abc.co.au
#-- then the outer most substring_index returns everything to the right of last '.'
#-- i.e, au
#-- which is then joined with TLD table

HTH,
Shiva

Google NS Individual Search Box

Today, while searching for “cygwin” keyword I was presented with results that caught my eye.  The first result had its own search box.  See picture below.  Had anyone seen this before?  I liked the feature and pretty handy if you are looking for a specific information from a specific site. I entered “latest version” and hit “Search cygwin.com” that showed search entry “latest version site:cygwin.com” in the next search. Nice.


Other posts that may be of interest:
1. Best Google home page logos
2. Better quicker SEM ROI
3. 99 bottles of beer in different laguages


Cheers,
Shiva

Useful SEM SEO Links

Google Analytics Data Retriever (API) – SEM, SEO

Google Analytics should be an integral part of any company’s marketing data analysis strategy that plays in SEM, SEO field. Not too long ago Google released API for their Analytics and one can use this data or reports to compliment their in-house data analysis.

Here I have provided a small utility program that can be used to retrieve most popular six (6) reports and have used similar utilities that I have built to dump the data into MySQL data warehouse for much bigger data sets.

The application has two Perl modules (GAProfile.pm and GAReport.pm) and a main program (ga_report_retriever.pl). GAProfile.pm module is used to authenticate your account and get the auth_code that will be used in the subsequent calls to GA to get reports. GAReport.pm module is used to pull the report and dump important dimension and metrics to standard output.  Put both these modules in the same directory (or copy them to directory in the path of PERL5LIB) as the main program and run it. To get different report, make a change in “ga_report_retriever.pl” passing the name of the report.

Before you start using the GA API, you may want to go through the documentation.

At command prompt> ./ga_report_retriever.pl


GAProfile.pm module

package GAProfile;
#-------------------------------------------------------
#- This is a Google Analytics Class providing methods to
#- 1. get authorized with google accounts &
#- 2. get list of profiles to pull analytics data
#- Use XML::Simple module and for large data sets, you can
#- try XML::Bare or LibXML modules for better performance.
#- Provide your own Google account info in the contructor.
#-
#- Author: Shiva M.
#-------------------------------------------------------

use strict;
use XML::Simple;
use URI::Escape;
use LWP::UserAgent;

use Data::Dumper;

#--- Create Google Analytics object.
#-------------------------------------------------------
sub new {
my $proto = shift;
my $class = ref($proto) || $proto;
my $self = {};

$self->{ga_user} = '{your google account}';
$self->{ga_passwd} = '{corresponding password}';

#-- Client Login URL
$self->{ga_login_url} = 'https://www.google.com/accounts/ClientLogin';
$self->{ga_login_param} = "accountType=GOOGLE";
$self->{ga_login_param} .= "&Email=$self->{ga_user}&Passwd=$self->{ga_passwd}";
$self->{ga_login_param} .= "&service=analytics&source=HiRegion.com-GA_Retriever-v1";

#-- Analytics account URL
$self->{ga_ac_url} = "https://www.google.com/analytics/feeds/accounts/default";

$self->{ga_auth_code} = undef;
$self->{ga_ua_name} = "GA Data Retriever"; #-- LWP User Agent name

bless ($self, $class);
return $self;
}

#- Escape the URI characters, if needed.
#-------------------------------------------------------
sub urlEncode {
my $self = shift;
my $str = shift;

return uri_escape($str)
}

#-- Get Auth code from Google Analytics so that you can make
#-- Data request calls. This sets object's ga_auth variable as well.
#-------------------------------------------------------
sub getAuthorized {
my $self = shift;

my $r = HTTP::Request->new(POST => $self->{ga_login_url});
$r->content_type('application/x-www-form-urlencoded');
$r->content($self->{ga_login_param});

#-- Make a call to client login...
my $ua = LWP::UserAgent->new(agent => $self->{ga_ua_name});
my $res = $ua->request($r);

my ($ga_auth);
if ($res->is_success) { #-- On success returns HTTP 200.
$ga_auth = (split(/Auth=/i, $res->content))[1];
#- print "content: ", $res->content(), "\nga_auth: $ga_auth\n";
}
else {
return (undef, "error: ". $res->status_line);
}

$self->{ga_auth_code} = $ga_auth;
# return the token
return (1, $ga_auth); #-- Return success & auth code
}

sub getAuthCode {
my $self = shift;
return $self->{ga_auth_code};
}


#-- Get list of profile IDs (web property IDs)
#-------------------------------------------------------
sub getProfiles {
my $self = shift;

# create user agent object & get profiles...
my $ua = LWP::UserAgent->new(agent => $self->{ga_ua_name});
my $res = $ua->get($self->{ga_ac_url}, Authorization => "GoogleLogin Auth=$self->{ga_auth_code}");

my @profiles;

# On getting the list of profiles...
if ($res->is_success) {
my ($content, $e);
$content = $res->content;

# Parse the xml response...
my $xml = new XML::Simple(KeyAttr=>[]);
my $tree = $xml->XMLin($content);

# iterate through each entry
my $ii = 0;
foreach $e (@{$tree->{entry}}) {
# add the account to the array
$profiles[$ii]->{title} = $e->{title}->{content};
$profiles[$ii]->{tableId} = $e->{'dxp:tableId'};

foreach my $p (@{$e->{'dxp:property'}}) {
if ($p->{'name'} =~ m/accountId/i) {
$profiles[$ii]->{accountId} = $p->{'value'};
} elsif ($p->{'name'} =~ m/profileId/i) {
$profiles[$ii]->{profileId} = $p->{'value'};
} elsif ($p->{'name'} =~ m/webPropertyId/i) {
$profiles[$ii]->{webPropertyId} = $p->{'value'};
}
}
$ii++;
}
} else {
# Request failed.
print "GAProfile->getProfiles() error: ". $res->status_line, "\n";
return undef;
}

return @profiles;
}


1;

GAReport.pm module


package GAReport;
#------------------------------------------------------
#- Google Analytics Report Module.
#- Handles url request to six popular pre-defined reports
#-
#- Author: Shiva M.
#------------------------------------------------------
use Date::Calc qw(:all);
use Data::Dumper;

sub new {
my $proto = shift;
my $profileId = shift;
my $class = ref($proto) || $proto;
my $self = {};

$self->{ga_data_url} = "https://www.google.com/analytics/feeds/data?";
$self->{standard_reports} = { visit_pageviews => visit_pageviews
,visit_traffic => visit_traffic
,visit_browser => visit_browser
,language_country => language_country
,top_pageviews => top_pageviews
,n_days_visits => n_days_visits
};
$self->{current_report} = "visit_traffic"; #-- Default report.
bless ($self, $class);

$self->{'profileId'} = undef;

return $self;
}


#-- Sets the profile id for data feed
#---------------------------------------------------------------------
sub setProfileId {
my $self = shift;
my $profileId = shift;

if (!defined $profileId or $profileId !~ m/\d+/) {
print "GAReport-setProfileId-Error: Ill formed profileID ($profileId)\n";
return undef;
}
$self->{profileId} = $profileId;

return 1;
}


#-- Returns current profile ID url param.
#---------------------------------------------------------------------
sub getProfileId {
my $self = shift;
return "ids=ga:$self->{profileId}"; #-- Append "ga:profileId"
}


#-- Most important reports are defined as standard reports and
#-- need to be predefined in constructor. Currently supports six reports.
#---------------------------------------------------------------------
sub getStandardReportURL {
my $self = shift;
my $report = shift; my $startDt = shift; my $endDt = shift; my $maxRes = shift;
#- my ($report, $startDt, $endDt, $maxRes) = @_;

my $url = $self->{ga_data_url};
$url .= $self->getProfileId();
$url .= $self->getDateDim($startDt, $endDt);
$url .= $self->getMaxResultDim($maxRes);

if ($report =~ m/$self->{standard_reports}->{visit_pageviews}/i) {
$url .= "&dimensions=ga%3Adate&metrics=ga%3Avisits%2Cga%3Apageviews";
} elsif ($report =~ m/$self->{standard_reports}->{visit_traffic}/i) {
$url .= "&dimensions=ga%3Asource%2Cga%3Amedium&metrics=ga%3Avisits&sort=-ga%3Avisits";
} elsif ($report =~ m/$self->{standard_reports}->{visit_browser}/i) {
$url .= "&dimensions=ga%3Abrowser&metrics=ga%3Avisits&sort=-ga%3Avisits";
} elsif ($report =~ m/$self->{standard_reports}->{language_country}/) {
$url .= "&dimensions=ga%3Alanguage%2Cga%3Acountry&metrics=ga%3Avisits&sort=-ga%3Avisits";
} elsif ($report =~ m/$self->{standard_reports}->{top_pageviews}/i) {
$url .= "&dimensions=ga%3ApagePath%2Cga%3ApageTitle&metrics=ga%3Apageviews&sort=-ga%3Apageviews";
} elsif ($report =~ m/$self->{standard_reports}->{n_days_visits}/i) {
$url .= "&dimensions=ga%3Adate&metrics=ga%3Avisits&filters=ga%3Avisits%3E1&sort=ga%3Adate";
} else {
print "GAReport.pm-getStandardReport-Error: Unknown standard report\n";
$url = undef;
}

#-- Which report are we requesting...
$self->{current_report} = $report;

return $url;
}


sub isDateDimSet {
my $self = shift;
my $url = shift;

if ($url =~ m/start-date=\d{4}\d{2}\d{2}/i and $url =~ m/end-date=\d{4}\d{2}\d{2}/i) {
return 1;
}
return undef;
}


#-- Returns date parameters of URL. If undefined dates, then sets them to a week before.
#-- i.e; "&start-date=2009-10-02&end-date=2009-10-09"
#------------------------------------------------------
sub getDateDim {
my $self = shift;
my $startDt = shift; #-- Start Date
my $endDt = shift; #-- End Date

my ($y, $m, $d, $dtURL); #-- Year, Month, Date
if (!defined $startDt or !defined $endDt) {
($y, $m, $d) = Today();
($y, $m, $startDt) = Add_Delta_Days($y, $m, $d, "-9"); #-- A week before day-before-yesterday. Allow 2 days lag.
($y, $m, $endDt) = Add_Delta_Days($y, $m, $startDt, "7"); #-- Add 7 days to start date to get a week's report.
$dtURL = "&start-date=$y-$m-$startDt&end-date=$y-$m-$endDt";
} else {
$dtURL = "&start-date=$startDt&end-date=$endDt";
}

return $dtURL;
}


#-- Returns maximum number of resutls requested from GA.
#-- Defaults to 50.
#--------------------------------------------------------
sub getMaxResultDim {
my $self = shift;
my $maxRes = shift;

if (!defined $maxRes or !$maxRes =~ m/\d+/) {
$maxRes = 50;
}
return "&max-results=$maxRes"
}


#-- Before making data request some sanity check...
#--------------------------------------------------------
sub validateRequest {
my $self = shift;

return 1;
}


#-- Prints out the dimensions and metrics of a report.
#--------------------------------------------------------
sub getReport {
my $self = shift;
my $url = shift;
my $ga_auth_code = shift;

#- print "\n\nGAReport-getReport: Pulling $self->{current_report} now.";

if (!defined $url or !defined $ga_auth_code) {
print "GAReport-getReport-Error: url($url)) or authorization code ($ga_auth_code) is missing.\n";
return undef;
}
#-- Construct user agent and make a get call.
my $ua = LWP::UserAgent->new;
$ua->agent("GA Test Data Retriever/0.5");

my $res = $ua->get($url, Authorization => "GoogleLogin Auth=$ga_auth_code");

my $data;
#-- Check response and get the data...
if ($res->is_success) {
$data = $res->content;
} else {
print "GAReport-getReport-Error: Data pull failed: ". $res->status_line;
return undef;
}

#-- Convert XML to Perl tree structure and parse.
#------------------------------------------------
my $xml = XML::Simple->new();
my $tree = $xml->XMLin($data);
print Dumper $tree;

my $tableName = $tree->{'dxp:dataSource'}->{'dxp:tableName'};
print "\n\nTablename: $tableName";

foreach my $ok (keys %$tree) { #-- Outermost hash keys
#- print "Tree key: $ok\n";
#- my $tableName = $ok->{'dxp:tableName'};


#-- Pageview & visits metrics
#-----------------------------------------------------------------
if ($self->{current_report} =~ m/visit_pageviews/i and $ok =~ m/entry/i) {
my @hEntries = values %{$tree->{$ok}}; #-- hash refs

#-- print "\n\tentries: @hEntries\n";
printf "\n\n%-25s\t%s\t\t%s\t%s\n%s", "Profile", "Date", "PageViews", "Visits", '-'x80;

my %out = ();
foreach my $hEAttr (@hEntries) {
my $dt = $hEAttr->{'dxp:dimension'}->{'value'};
$out{$dt}{'pageviews'} = $hEAttr->{'dxp:metric'}->{'ga:pageviews'}->{'value'};
$out{$dt}{'visits'} = $hEAttr->{'dxp:metric'}->{'ga:visits'}->{'value'};
}

foreach my $dt (sort keys %out) {
printf ("\n%-25s\t%d\t%d\t\t%d", $tableName, $dt, $out{$dt}{'pageviews'}, $out{$dt}{'visits'});
}
}


#-- Visitor traffic metrics
#-----------------------------------------------------------------
if ($self->{current_report} =~ m/visit_traffic/i and $ok =~ m/entry/i) {
my @hEntries = values %{$tree->{$ok}}; #-- hash refs

printf "\n\n%-25s\t%-20s\t%-20s\t%s\n%s", "Profile", "Source", "Medium", "Visits", '-'x80;

my %out = ();
foreach my $hEAttr (@hEntries) {
my $src = $hEAttr->{'dxp:dimension'}->{'ga:source'}->{'value'};
my $medium = $hEAttr->{'dxp:dimension'}->{'ga:medium'}->{'value'};
$out{$src}{$medium}{'visits'} = $hEAttr->{'dxp:metric'}->{'value'};
}

foreach my $src (sort keys %out) {
foreach my $med (sort keys %{$out{$src}}) {
printf ("\n%-25s\t%-20s\t%-20s\t%d", $tableName, $src, $med, $out{$src}{$med}{'visits'});
}
}
}


#--- Visit and browser metrics
#-----------------------------------------------------------------
if ($self->{current_report} =~ m/visit_browser/i and $ok =~ m/entry/i) {
my @hEntries = values %{$tree->{$ok}}; #-- hash refs

printf "\n\n%-25s\t%-20s\t%s\n%s", "Profile", "Browser", "Visits", '-'x80;

my %out = ();
foreach my $hEAttr (@hEntries) {
my $brow = $hEAttr->{'dxp:dimension'}->{'value'};
$out{$brow}{'visits'} = $hEAttr->{'dxp:metric'}->{'value'};
}

foreach my $brow (sort keys %out) {
printf ("\n%-25s\t%-20s\t%d", $tableName, $brow, $out{$brow}{'visits'});
}
}


#--- Language and country metrics...
#-----------------------------------------------------------------
if ($self->{current_report} =~ m/language_country/i and $ok =~ m/entry/i) {
my @hEntries = values %{$tree->{$ok}}; #-- hash refs

printf "\n\n%-25s\t%-20s\t%-20s\t%s\n%s", "Profile", "Language", "Country", "Visits", '-'x80;

my %out = ();
foreach my $hEAttr (@hEntries) {
my $lang = $hEAttr->{'dxp:dimension'}->{'ga:language'}->{'value'};
my $country = $hEAttr->{'dxp:dimension'}->{'ga:country'}->{'value'};
$out{$lang}{$country}{'visits'} = $hEAttr->{'dxp:metric'}->{'value'};
}

foreach my $lang (sort keys %out) {
foreach my $country (sort keys %{$out{$lang}}) {
printf ("\n%-25s\t%-20s\t%-20s\t%d", $tableName, $lang, $country, $out{$lang}{$country}{'visits'});
}
}

}


#--- Top page views metrics...
#-----------------------------------------------------------------
if ($self->{current_report} =~ m/top_pageviews/i and $ok =~ m/entry/i) {
my @hEntries = values %{$tree->{$ok}}; #-- hash refs

printf "\n\n%-25s\t%-30s\t%-20s\t%s\n%s", "Profile", "PagePath", "PageTitle", "PageViews", '-'x80;

my %out = ();
foreach my $hEAttr (@hEntries) {
my $pagePath = $hEAttr->{'dxp:dimension'}->{'ga:pagePath'}->{'value'};
my $pageTitle = $hEAttr->{'dxp:dimension'}->{'ga:pageTitle'}->{'value'};
$out{$pagePath}{$pageTitle}{'visits'} = $hEAttr->{'dxp:metric'}->{'value'};
}

foreach my $pagePath (sort keys %out) {
foreach my $pageTitle (sort keys %{$out{$pagePath}}) {
printf ("\n%-25s\t%-30s\t%-20s\t%d", $tableName, $pagePath, $pageTitle, $out{$pagePath}{$pageTitle}{'visits'});
}
}

}


#--- N days of visits...
#-----------------------------------------------------------------
if ($self->{current_report} =~ m/n_days_visits/i and $ok =~ m/entry/i) {
my @hEntries = values %{$tree->{$ok}}; #-- hash refs

printf "\n\n%-25s\t%-20s\t%s\n%s", "Profile", "Date", "Visits", '-'x80;

my %out = ();
foreach my $hEAttr (@hEntries) {
my $dt = $hEAttr->{'dxp:dimension'}->{'value'};
$out{$dt}{'visits'} = $hEAttr->{'dxp:metric'}->{'value'};
}

foreach my $dt (sort keys %out) {
printf ("\n%-25s\t%-20s\t%d", $tableName, $dt, $out{$dt}{'visits'});
}

}
}

return 1;
}

1;

Main Program: ga_report_retriever.pl


#!/usr/local/bin/perl -w
#---/cygdrive/c/Perl/bin/perl -w

#-----------------------------------------------------------
#- Application to pull 6 popular Google Analytical reports.
#- Uses two classes - GAProfile and GAReport to perform data/report pull through APIs.
#- Reports are:
#- Pageview visits by date (visit_pageviews)
#- Visit by by source (visit_traffic)
#- Visit by brower (visit_brower)
#- Visit by language, country (language_country)
#- Top pageviews (top_pageviews)
#- "n" day visits (n_days_visits)
#-
#- Author: Shiva M.
#-----------------------------------------------------------
use lib ".";
use strict;
use Data::Dumper;

use GAProfile;
use GAReport;

#-- Get authroized to pull data. Pass username & password through GAProfile.
#-----------------------------------------------------------
my $gau = GAProfile->new();
$gau->urlEncode;
#-- $rc - return code 1 on success & set $ga_auth to authorization code.
my ($rc, $ga_auth) = $gau->getAuthorized();

#-- If failed, terminate the program.
if (!$rc) {
print "Authorization failed. Exiting. Fatal error: $ga_auth\n";
die;
}

#-- On success, get list of profiles to pull data from and try one profile at a time.
#------------------------------------------------------------
my @profiles = $gau->getProfiles();

my $report = GAReport->new();
foreach my $p (@profiles) {
$report->setProfileId($p->{profileId});

#-- Get a report by date for last 7 days by default.
#-- If n_days_visits report pass start & end dates and number of days between.
my $url = $report->getStandardReportURL("n_days_visits", '2009-12-01', '2009-12-30', 30);
$report->getReport($url, $ga_auth);
}