fizzbuzz in Ruby, Python, Perl, more

Recently, I ran into a couple of articles on codinghorror.com that covered interviews with programmers/software engineers.  Many of the commenters answered the programming (fizzbuzz) question using their favorite language, like VBA, Ruby, or Perl.  Since the posts no longer accept new comments, I thought I would post my solutions here on my blog instead.

fizzbuzz question: For all numbers between 1 and 100, print 'fizz' for any number divisible by 3, 'buzz' for any number divisible by 5, 'fizzbuzz' for any number divisible by both 3 and 5, and the number itself otherwise.

Article 1:  Why Can’t Programmers.. Program?
Article 2: The Non-Programming Programmer

I have used the 'map' function to solve it; mapping functions provide a nice way to work through lists. There are a few other ways you can solve it as well, including the ones already given in the original articles above (one more variant is shown right after the Python example below).

Perl:
sub fb {
    my ($n) = @_;                                   # take the number as an argument
    if    ($n % 15 == 0) { return "fizzbuzz\n" }
    elsif ($n % 3  == 0) { return "fizz\n" }
    elsif ($n % 5  == 0) { return "buzz\n" }
    return "$n\n";
}

print map { fb($_) } (1..100);

Ruby:

You can use the map method, or the select method as in the second example below; both iterate over the range and print as a side effect.

(1..100).map { |n|
  if n % 15 == 0
    puts 'fizzbuzz'
  elsif n % 5 == 0
    puts 'buzz'
  elsif n % 3 == 0
    puts 'fizz'
  else
    puts n
  end
}
puts "----------------------------"

(1..100).select { |n|
  if n % 15 == 0
    puts 'fizzbuzz'
  elsif n % 5 == 0
    puts 'buzz'
  elsif n % 3 == 0
    puts 'fizz'
  else
    puts n
  end
}

Python:

def fizzbuzz(x):
    if x % 15 == 0: return 'fizzbuzz'
    elif x % 3 == 0: return 'fizz'
    elif x % 5 == 0: return 'buzz'
    return x

def p(x):
    print x

map(p, map(fizzbuzz, range(1, 101)))
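
As mentioned above, there are other ways to do this. For instance, here is a single-expression variant in Python (my own quick illustration, not taken from the articles) that builds the whole output with a generator expression instead of map:

print "\n".join(
    'fizzbuzz' if n % 15 == 0 else
    'fizz' if n % 3 == 0 else
    'buzz' if n % 5 == 0 else
    str(n)
    for n in range(1, 101)
)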

SQL (MySQL):

SELECT  n,
CASE
  when n % 3 = 0 AND n % 5 = 0 then 'fizzbuzz'
  when n % 5 = 0 then 'buzz'
  when n % 3 = 0 then 'fizz'
END
FROM
( SELECT @n := @n + 1 as n
  FROM some_small_table,     -- A table with more than 100 rows
       (SELECT @n := 0) x    -- Initialize the variable to 0 at the beginning
  LIMIT 100                  -- Limit to 100 rows
) y;

Hope you enjoyed it.
Shiva

Adding disk space to Ubuntu (guest OS)

Recently, on one of my VMware systems (with a Windows XP host OS and an Ubuntu guest OS), I ran into a disk space shortage while installing a package (Thrift, a cross-language services framework).  Though there were many articles on the Internet about extending guest OS partitions, they were for older versions, no longer applicable, or involved many complex steps.  After spending some time digging into the issue, I found fairly straightforward steps to achieve what I wanted: doubling the disk space for Ubuntu.

Initial setup:
  Host OS: Windows XP
  Guest OS: Ubuntu Karmic (8G pre-allocated)
  Disk: 148G total, 60G free, out of which another 8G is to be allocated to Ubuntu

Needed:
   Double Ubuntu disk space to 16G by adding a new partition


Step 1:

  Run "Disk Usage Analyzer" and get the current disk space usage data, which shows the total, used, and available space.  See below.  The system has less than 10% free disk space and needs more.
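
If you prefer the command line, the same numbers can be checked from a terminal as well; for example:
> df  -h
It lists the size, used and available space for each mounted partition.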


Step 2:

Having decided to allocate new disk space, first back up the disk and power off the virtual machine.  Select the virtual machine (if you have more than one in VMware Player), click "Edit virtual machine settings", select the Hard Disk item, and set the expanded disk capacity to 16G (8 + 8) as shown below.  Also make sure you have enough free space available on the host to allocate to the guest OS.
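
As a side note, if your version of VMware Player does not expose the expand option in the UI, the vmware-vdiskmanager command-line tool that ships with VMware can, as far as I know, do the same thing (the disk file name here is just illustrative):
> vmware-vdiskmanager  -x  16GB  Ubuntu.vmdk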


Step 3:

Install "gparted", a GUI disk partitioning utility, and run it:
> sudo apt-get install gparted
> sudo gparted

Select the "unallocated" space and choose "Create partition".  In the pending operations pane you will see "Create Primary Partition ..."; click "Apply All Operations" (the green check mark in the toolbar).
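
You can confirm what device name the new partition got (I assume /dev/sda3 in the next step) by listing the partition table:
> sudo  fdisk  -l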


Step 4:

Now mount this new partition. Say, for example, I am going to use it for the /src2 directory; then:
> sudo  mkdir   /src2
> sudo  mount  -t  ext4   /dev/sda3   /src2

Move some big directories to this partition (the new directory), thereby freeing some disk space on /dev/sda1, shown in my screenshots above.  After the move I had 60% free on the /dev/sda1 partition (an important boot partition).
> mv   /big_directory   /src2/big_directory
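
If you also want the partition mounted automatically at every boot (an optional step), an /etc/fstab entry like the one below should do it; the device name and mount point match the assumptions above:
/dev/sda3   /src2   ext4   defaults   0   2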

For more on partitioning see http://help.ubuntu.com/community/HowtoPartition, and if you would like to create a bootable USB Linux distro check out this post.

Shiva

MySQL SP parameter for IN clause

Recently, I came across a question on stackoverflow.com where the poster indicated that he was not able to use a parameter string as part of a stored procedure's IN clause to get the right results; the result set always returned only one row.

Test table used for procedure:

CREATE TABLE cities (
id int(10) unsigned NOT NULL auto_increment,
name varchar(100) NOT NULL,
PRIMARY KEY (`id`)
);
insert into cities (name) values
('London'), ('Manchester'), ('Bristol'),
('Birmingham'), ('Brighton');

Original stored procedure:

DELIMITER $$
DROP PROCEDURE IF EXISTS `cities_select_by_ids` $$
CREATE PROCEDURE `cities_select_by_ids`(
in _cityIds varchar(1000)
)
BEGIN
SET @cityIds = _cityIds;

PREPARE stmt FROM '
select
id,
name
from cities
where id in (?)
';

EXECUTE stmt USING @cityIds;
DEALLOCATE PREPARE stmt;

END $$
DELIMITER ;

call cities_select_by_ids('1, 2, 3');
#-- Only one city name is returned (London): the whole string is bound as a single value,
#-- so the query effectively runs WHERE id IN ('1, 2, 3'), which MySQL casts to the number 1.

One workaround would be to split the string into its individual elements and run the select for each of them, as below.

DROP PROCEDURE IF EXISTS `cities_select_by_ids_2`;
DELIMITER $$
CREATE PROCEDURE `cities_select_by_ids_2`(
in cityIDs varchar(1000)
)
BEGIN

#- ix - index into the list of city IDs
#- cid - city ID
SET @ix := 1;
SET @cid := substring_index(cityIDs, ',', @ix);

LOOP_1:
WHILE (@cid is not null) DO
SELECT id, name
FROM cities
WHERE id in (@cid) ;

#-- substring_index returns complete cityIDs string when index is > number of elements
IF (length(substring_index(cityIDs, ',', @ix)) >= length(cityIDs)) THEN
LEAVE LOOP_1;
END IF;

SET @ix := @ix + 1;
SET @cid = substring_index(substring_index(cityIDs, ',', @ix), ',', -1);

END WHILE;
END $$
DELIMITER ;

#----
call cities_select_by_ids_2('1, 2, 3');
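
This returns the three cities as three separate one-row result sets. Another workaround, which I am adding here as an alternative (it was not part of the original question, and the procedure name is just for illustration), is to skip the splitting entirely and let MySQL's FIND_IN_SET() do the matching, which gives a single result set:

DROP PROCEDURE IF EXISTS `cities_select_by_ids_3`;
DELIMITER $$
CREATE PROCEDURE `cities_select_by_ids_3`(
in cityIDs varchar(1000)
)
BEGIN
#-- Strip the spaces so '1, 2, 3' becomes '1,2,3', then match each id against the list.
SELECT id, name
FROM cities
WHERE find_in_set(id, replace(cityIDs, ' ', '')) > 0;
END $$
DELIMITER ;

call cities_select_by_ids_3('1, 2, 3');
#-- Returns London, Manchester and Bristol in one result set.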

Log Parsing through Hadoop, Hive & Python

One of the primary analyses done on web access logs is cohort analysis, where one needs to pull user access date/time along with other dimensions like user, IP, geo data, etc. Here I will be using Hadoop/Hive/Python to pull date and IP data from the access log into Hadoop and run some queries. The example illustrates Hadoop (version 0.20.1) streaming, a regex SerDe, and Hive's (version 0.4.0) custom mapper plugin (get_access_log_ip.py).

The steps below load a few thousand rows into a staging table (access_log_2010_01_25), then extract the date from a format like DD/Mon/YYYY:HH:MM:SS -0800 (reducing it to 'DD/Mon/YYYY') along with the remote IP address through a Python streaming script, and write the result to a target table (dw_log_ip_test, a data-warehouse access-log table).

Step 1: First create a table for the access log (access_log_2010_01_25) and then load data into it.


hive> 
CREATE TABLE access_log_2010_01_25 (
request_date STRING,
remote_ip STRING,
method STRING,
request STRING,
protocol STRING,
user STRING,
status STRING,
size STRING,
time STRING,
remote_host STRING,
ts STRING,
perf STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
"input.regex" = "\\[([^]]+)\\] ([^ ]*) \"([^ ]*) ([^ ]*) ([^ \"]*)\" user=([^ ]*) status=([^ ]*) size=([^ ]*) time=([^ ]*) host=([^ ]*) timestamp=([^ ]*) perf=([^ ]*)",
"output.format.string" = "%1$s %2$s \"%3$s %4$s %5$s\" user=%6$s status=%7$s size=%8$s time=%9$s host=%10$s timestamp=%11$s perf=%12$s"
)
STORED AS TEXTFILE;
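
Note: depending on how Hive is installed, the contrib RegexSerDe class may not be on the classpath by default. If the CREATE TABLE or a later query complains about the SerDe, adding the contrib jar first should fix it (the jar path below is illustrative; use the one from your installation):

hive> ADD JAR /usr/local/hive/lib/hive_contrib.jar;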


hive> LOAD DATA LOCAL INPATH '/mnt/web_ser101/weblog_server101_20100125_1'
> OVERWRITE INTO TABLE access_log_2010_01_25;
#- After the load, one of the records would look like this:
#- 25/Jan/2010:13:14:05 -0800 123.123.123.123 GET /xmls/public/thumbnail.xml HTTP/1.1 - 302 250 0 abcd.com 1264454045 -

Step 2: Create a target test table


hive>  CREATE  TABLE  dw_log_ip_test (dt string, remote_ip string);

Step 3: In an editor of your choice, build a simple Python script (get_access_log_ip.py) that extracts the "date" string from the "date/time" string and the "remote_ip" address, as below.


#!/usr/bin/python
import sys

for line in sys.stdin.readlines():
    line = line.strip()
    fields = line.split('\t')
    dt = fields[0].split(':')[0]   #-- Get the date part, e.g. 25/Jan/2010
    ip = fields[1]                 #-- Get the remote IP
    print dt + '\t' + ip           #-- Emit the two columns tab-separated for Hive
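
A quick way to sanity-check the script locally before wiring it into Hive (the sample input line is made up):

> echo -e "25/Jan/2010:13:14:05 -0800\t123.123.123.123" | python get_access_log_ip.py
25/Jan/2010     123.123.123.123

Also note that on a multi-node cluster the script normally has to be registered with Hive first (hive> add FILE /home/hadoop/hive/etl/scripts/get_access_log_ip.py;) so that it gets shipped to the task nodes.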

Step 4: Now extract data into the dw_log_ip_test table, loading only limited data (about 10 seconds' worth)


hive>  FROM access_log_2010_01_25  L
> INSERT OVERWRITE TABLE dw_log_ip_test MAP L.request_date, L.remote_ip
> USING '/home/hadoop/hive/etl/scripts/get_access_log_ip.py' AS dt, remote_ip
> WHERE L.request_date > '25/Jan/2010:13:11:40'
> and L.request_date < '25/Jan/2010:13:11:50';

# Hive outputs some information like:
Total MapReduce jobs = 2
Launching Job 1 out of 2
Number of reduce tasks is set to 0 since there's no reduce operator
Ended Job = job_201001251623_0094, Tracking URL = http://hadoop_main:50030/jobdetails.jsp?jobid=job_201001251623_0094
Kill Command = /usr/local/hadoop/bin/hadoop job -Dmapred.job.tracker=hdfs://hadoop_main:9001 -kill job_201001251623_0094
2010-02-03 18:42:40,793 Stage-1 map = 0%, reduce = 0%
2010-02-03 18:42:53,874 Stage-1 map = 50%, reduce = 0%
2010-02-03 18:43:05,981 Stage-1 map = 100%, reduce = 0%
2010-02-03 18:43:09,012 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201001251623_0094
Ended Job = -416940662, job is filtered out (removed at runtime).
Launching Job 2 out of 2


Loading data to table dw_log_ip_test
11110 Rows loaded to dw_log_ip_test
OK

#-- Now check the results...
hive> select dt, remote_ip from dw_log_ip_test;
hive> select dt, remote_ip, count(1)
> from dw_log_ip_test
> group by dt, remote_ip;

Enjoy,
Shiva