Meltdown and Spectre patch effects on Mac performance

With the recently identified major flaws in modern CPUs, all major software and (CPU) hardware companies rushed to provide fixes for their systems as quickly as possible.  For more detailed information check out this site, and for a programmer's view, or to test on a Linux system, try this.

Since the computer OS (Operating System) kernel space is highly protected (as is any other process space) and isolated from interference by other processes, any breakdown of this isolation leads to major issues.  Quote from the (Meltdown) paper – “The attack is independent of the operating system, and it does not rely on any software vulnerabilities. Meltdown breaks all security assumptions given by address space isolation as well as paravirtualized environments and, thus, every security mechanism building upon this foundation”

The paper also details the scope of the issue, which affects all modern computers and phones!! Quote – “On affected systems, Meltdown enables an adversary to read memory of other processes or virtual machines in the cloud without any permissions or privileges, affecting millions of customers and virtually every user of a personal computer”

(PS: Bold highlighting added by me)

Reading through the paper and looking at the example code snippet below took me back to the days when I did some assembly-level programming on the Intel 8086 series. It was fun, challenging and interesting.

; rcx = kernel address
; rbx = probe array
retry:
mov al, byte [rcx]         ; transiently read one byte from the protected kernel address
shl rax, 0xc               ; multiply by 4096 (the page size) so each byte value maps to its own page
jz retry                   ; retry while the transiently read value is still zero
mov rbx, qword [rbx + rax] ; touch the probe array at that offset, leaving a cache footprint that encodes the byte

Read More »

Compiling Python Redshift DB adapter for AWS Lambda env.

AWS Lambda has gained huge momentum in the last couple of years and has enabled software architects/developers to build FaaS (Function as a Service).  As much as Lambda helps in scaling applications, it has some limitations, like execution duration, memory space availability, etc.  For long-running jobs, typically in backend or batch processing, the 5-minute execution limit can be a deal breaker.  But with appropriate data partitions and architecture it is still an excellent option for enterprises to scale their applications and be cost effective.

In a recent project, I architected data to be loaded from a data lake into Redshift.  The data is produced by an engine in batches and pushed to S3.  The data is partitioned on a time scale, and a consumer Python application loads it at regular intervals into a Redshift staging environment.  For a scalable solution, the data lake can be populated from multiple producers, and similarly one or more consumers can drain the data lake queue to load into Redshift.  The data from multiple staging tables is then loaded into the final table after deduplication and data augmentation.
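For illustration, here is a minimal sketch (not the project's actual code) of what such a consumer could look like as a Lambda handler: it COPYs one time-partitioned S3 prefix into a Redshift staging table using psycopg2 (my assumption for the Redshift DB adapter the title refers to, since it links against libpq and therefore has to be compiled or bundled for the Amazon Linux environment Lambda runs on).  The bucket, table, data format and environment variable names are all assumptions.

# A minimal sketch, not the project's actual code. Assumes the partition prefix
# arrives in the Lambda event and that connection details live in environment variables.
import os
import psycopg2

def lambda_handler(event, context):
    # Time-partitioned prefix produced by the engine, e.g. "engine-output/2018/01/07/" (assumed)
    prefix = event["partition_prefix"]

    conn = psycopg2.connect(
        host=os.environ["REDSHIFT_HOST"],
        port=5439,
        dbname=os.environ["REDSHIFT_DB"],
        user=os.environ["REDSHIFT_USER"],
        password=os.environ["REDSHIFT_PASSWORD"],
    )
    try:
        with conn.cursor() as cur:
            # COPY the partition straight from S3 into a staging table; dedup and
            # augmentation into the final table happen in a separate step.
            cur.execute(
                """
                COPY staging.events
                FROM %s
                IAM_ROLE %s
                FORMAT AS JSON 'auto' GZIP;
                """,
                ("s3://my-datalake/" + prefix, os.environ["REDSHIFT_COPY_ROLE"]),
            )
        conn.commit()
    finally:
        conn.close()
    return {"loaded_prefix": prefix}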

Read More »

Quick rental market analysis with Python, Pandas

[Images: houses drawing; 2016 rental analysis boxplot]

Real estate and rentals are always interesting, especially when one is in the market either as a tenant or a landlord.  Though some data is available in the market through APIs, it is generally days or weeks old, or already summarized to the extent that it is not very helpful. And if you would like the most granular data to perform some analysis, it gets even harder. The Craigslist website is one good alternative, and you can get the most recent (real-time) data to use.  But it needs quite a bit of work to pull, extract and clean before using it. Read More »
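As a rough illustration of the pull-and-clean work involved, here is a minimal sketch (my own, not the post's code) that fetches a Craigslist apartment search page, extracts listing prices into a Pandas DataFrame, and draws a quick boxplot.  The search URL and the CSS class names are assumptions about the page layout at the time and tend to change.

# A minimal sketch; the URL and the "result-row"/"result-price"/"result-title"
# class names are assumptions about Craigslist's markup and change over time.
import requests
import pandas as pd
from bs4 import BeautifulSoup

url = "https://sfbay.craigslist.org/search/apa"   # apartments / housing for rent
soup = BeautifulSoup(requests.get(url).text, "html.parser")

rows = []
for row in soup.select("li.result-row"):
    price_tag = row.select_one("span.result-price")
    title_tag = row.select_one("a.result-title")
    if price_tag and title_tag:
        rows.append({
            "title": title_tag.get_text(strip=True),
            "price": int(price_tag.get_text(strip=True).lstrip("$").replace(",", "")),
        })

df = pd.DataFrame(rows)
print(df["price"].describe())      # quick summary of asking rents
df.boxplot(column="price")         # boxplot along the lines of the one shown above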

Visualizing data processed consciously in a moment

Recently Google released training material to its employees on how unconscious bias can happen in the workplace.  For more details see –
http://www.businessinsider.com/google-unconscious-bias-training-presentation-2015-12  which refers to research paper at http://rd.springer.com/chapter/10.1007%2F978-3-642-82598-9_3

It is very interesting, and one of the slides (see below) specifically caught my attention with respect to how a huge amount of data is NOT processed consciously.  That is, we can consciously handle only 40 bits out of every 11 million bits per second!

That is just 1 part for every 275,000 parts!!

As interesting as that is, I thought the impact would be even more dramatic when visualized in some way.  Following are a couple of attempts at it using Python and ImageMagick.

Each tiny square (green or red) in the following picture represents 40 bits, and each row has 275 of those squares. With 1,000 rows (10 blocks of 100 rows each) we get 11 million bits of data represented.  And that is just for one second!
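A minimal sketch of how such a grid can be generated (the originals were done with Python and ImageMagick; this version uses Pillow for the same idea, and the colors, square size and the single green square marking the 40 conscious bits are my assumptions):

# A minimal sketch using Pillow (the post's originals used ImageMagick).
# One 2x2-pixel square per 40 bits: 275 squares per row, 1,000 rows = 11M bits.
# The single green square stands for the 40 bits processed consciously each second.
from PIL import Image, ImageDraw

SQ = 2                                   # pixels per square side
cols, rows = 275, 1000                   # 275 * 1000 * 40 bits = 11,000,000 bits
img = Image.new("RGB", (cols * SQ, rows * SQ), "white")
draw = ImageDraw.Draw(img)

for r in range(rows):
    for c in range(cols):
        color = "green" if (r, c) == (0, 0) else "red"
        draw.rectangle([c * SQ, r * SQ, (c + 1) * SQ - 1, (r + 1) * SQ - 1], fill=color)

img.save("conscious_vs_unconscious_bits.png")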

Just imagine the scale for one's lifetime (80 years) of data: all those 275,000 blocks above are to be repeated 2,522,880,000 times!!!

Another way: a tiny string, say 5 pixels long, represents 40 bits; the total length for 11M bits will then be 275K * 5 pixels!  See below.  The small green line at the center of the diagram is a string of a few pixels, around which the really long string, i.e., the string 275K times as long, is wound around a square peg. Every 25 rounds the string changes color for better visibility. In total there are 742 sides.

at the end of the 25th round the length is 5,050 times longer,
at the end of the 50th round it is 20,100 times,
at the end of the 100th round it is 80,200 times,
at the end of the 150th round it is 180,300 times,
and finally at round 185.25 it is 274,911 times longer (a short partial 742nd side brings it to the full 275,000).

Note: Keep in mind it is not the size (area) of the squares but the total perimeter length of all these squares that represents the ratio of data.
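The lengths above follow a simple pattern: if each straight segment of the square spiral is one unit longer than the previous one, the running totals are triangular numbers. A quick check of that assumption:

# Quick check, assuming each successive side of the square spiral is one unit
# longer than the previous one (so the totals are triangular numbers).
def spiral_length(sides):
    """Total length after `sides` segments of lengths 1, 2, ..., sides."""
    return sides * (sides + 1) // 2

for rounds in (25, 50, 100, 150, 185.25):
    sides = int(rounds * 4)               # four sides per round of the square peg
    print(rounds, sides, spiral_length(sides))
# 25 -> 5,050   50 -> 20,100   100 -> 80,200   150 -> 180,300   185.25 -> 274,911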

NASA Orion Journey To Mars Data Analysis

I have always been very interested in physics and enjoy reading related books and articles or watching shows like Carl Sagan’s Cosmos, Freeman’s Through the Wormhole, etc.  For that matter, this site’s name *hiregion.com* is derived from H-I-Region (interstellar cloud).

When I saw NASA’s “Send Your Name on NASA’s Journey to Mars, Starting with Orion’s First Flight”, I was excited to submit my family’s, relatives’ and friends’ names, along with a few charity names.  The names will be placed on a microchip aboard Orion’s test flight on Dec. 4, 2014, which orbits the Earth, and on the future journey to Mars!  The following quote is from the NASA site:

Your name will begin its journey on a dime-sized microchip when the agency’s Orion spacecraft launches Dec. 4 on its first flight, designated Exploration Flight Test-1. After a 4.5 hour, two-orbit mission around Earth to test Orion’s systems, the spacecraft will travel back through the atmosphere at speeds approaching 20,000 mph and temperatures near 4,000 degrees Fahrenheit, before splashing down in the Pacific Ocean.

But the journey for your name doesn’t end there. After returning to Earth, the names will fly on future NASA exploration flights and missions to Mars.

More info at

The Orion test flight uses the big boy Delta IV Heavy (the biggest expendable launch system), and after orbiting the Earth twice, Orion will reenter and splash down in the Pacific Ocean.

Courtesy NASA/ Wikipedia.org

Some sample boarding passes:

By the time the entries were closed (I think it was on Oct. 31), there were nearly 1.4 million names (1,379,961 exactly), and the top countries by count were the United States, India and the United Kingdom, with people from almost all countries having submitted their names.  For more details see http://mars.nasa.gov/participate/send-your-name/orion-first-flight/world-participation-map/ .  The bar chart below shows the same info.

Though the US, India and the UK were the top three by number of names submitted, I was curious to know how countries did when adjusted for population size, GDP and area (sq. miles).  With that in mind I pulled NASA data and country data from the NASA participation page and countrycode.org (see the script at the end of this post).

I built a quick Python script to pull the data, join the country data and perform some minor calculations.  The code is located here at Gist, or see the end of this post.

I then ran the data through a few R scripts, clustering countries based on each country's:

  • Orion passenger count/ 10K people
  • Orion passenger count/ 1K sq. miles
  • Orion passenger count/ Billion $ GDP

and then normalized the metrics with R's scale for cluster selection.  The optimal number of clusters seems to be 7 or 8. Monaco and Singapore are major outliers due to the skew caused by their small geographical areas (sq. miles). See below: Monaco is the single dangler at the top right, and Singapore/Hungary are at the bottom right, above the rest of the countries.
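The clustering itself was done with R scripts; for readers who prefer Python, here is a rough equivalent sketch (my own assumption, not the original R code) that reads the colon-delimited metrics file produced by the script at the end of this post and runs k-means on the three scaled metrics:

# A rough Python equivalent of the R clustering (an assumption, not the original scripts).
# Reads the colon-delimited metrics file written by join_country_data() below.
import pandas as pd
from sklearn.preprocessing import scale
from sklearn.cluster import KMeans

cols = ["country", "passengers", "population", "per_10K_population",
        "area_sq_miles", "per_1K_sq_miles", "gdp_millions", "per_1Billion_gdp"]
df = pd.read_csv("/tmp/nasa_metrics_by_country.txt", sep=":", names=cols, thousands=",")

metrics = ["per_10K_population", "per_1K_sq_miles", "per_1Billion_gdp"]
X = scale(df[metrics])                                  # same idea as R's scale()
df["cluster"] = KMeans(n_clusters=7, random_state=0).fit_predict(X)
print(df.sort_values("per_1Billion_gdp", ascending=False).head())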

The scatter plot shows much more clearly how the two countries stand out, especially in the middle tiles below – passengers_per_1K_sq_miles vs the other two metrics (passengers_per_10K_population and passengers_per_1Billion_gdp).

After removing those two countries from the data frame, clustering again results in the following:

That is an interesting cluster.  Among the countries with the highest entries adjusted for population, GDP and geographic size, Hungary tops the list! Maldives, Hong Kong, the UK and Malta take the other top four places.  The quick normalized scores (Score_ABS being the sum of the absolute values of the three scaled scores) look like:

country          Score(/Pop.)   Score(/Area)   Score(/GDP)   Score_ABS
Hungary           5.783493976    1.560361327    4.485219257   11.82907456
Maldives          0.715814116    4.784567704    4.43908513     9.939466951
Hong Kong        -0.217141885    7.8493819     -0.59223565     8.658759434
United Kingdom    3.957774546    2.869764313    1.288187419    8.115726277
Malta             1.085016478    5.903919255    0.393610721    7.382546454
Bangladesh       -0.195758981    1.116466958    4.697494631    6.00972057

Cluster (optimal) size analysis:

It is always fun playing around with different ways to slice and dice data, and the bubble world map below shows a simple metric: passenger count per billion dollars of GDP.

The top 5 countries, in this case, are:

Bangladesh 133.95982
Hungary 128.75381
Maldives 127.62238
Philippines 125.95591
Kosovo 106.8

It will be more interesting to see how the numbers relate to each country's science and technology budget.  I will try doing that in the next few days, as some of the data is already available in the wild.  In an ideal world, a good percentage of the yearly budget would be allocated to Science & Tech.

Data pull Python code:

#!/Users/shiva/anaconda/bin/python
# -*- coding: utf-8 -*-

import os
import sys
import re
import locale
import pprint
import scraperwiki
from bs4 import BeautifulSoup
from collections import defaultdict


class NasaData():
    nasa_file_path = "/tmp/nasa_orion_reg_by_country.txt"
    ctry_file_path = "/tmp/countrycode_org_data.txt"
    nasa_site = "http://mars.nasa.gov/participate/send-your-name/orion-first-flight/world-participation-map/"
    ctry_site = "http://countrycode.org/"
    metrics_file_path = "/tmp/nasa_metrics_by_country.txt"

    def __init__(self):
        pass


def get_nasa_entries():
    '''
    Scrape NASA Orion participant count by country.
    Output to file nasa_orion_reg_by_country.txt
    Args: None
    '''

    # Skip scraping if the output file already exists
    out_file = NasaData.nasa_file_path
    if os.path.exists( out_file ) and os.path.getsize( out_file ) > 10:
        print "Warning: " + out_file + " exists. Continuing without scraping NASA data.\n"
        return False

    html = scraperwiki.scrape( NasaData.nasa_site )
    soup = BeautifulSoup( html )

    countries = soup.find( 'ul', class_='countryList' )
    with open( out_file, 'wt' ) as fh:
        for country in countries.findAll('li'):
            c_name = country.find('div', class_='countryName').text
            c_num = country.find('div', class_='countNumber').text.strip()
            # line = c_name + "," + c_num + "\n"
            line = ''.join([c_name, ',', c_num, '\n'])
            fh.write(line)

    return True


def get_country_details():
    '''
    Scrape countrycode data including population, gdp, area, etc.
    Dump output to file countrycode_org_data.txt
    Args: None
    '''

    # Skip scraping if the output file already exists
    out_file = NasaData.ctry_file_path
    if os.path.exists( out_file ) and os.path.getsize( out_file ) > 10:
        print "Warning: " + out_file + " exists. Continuing without scraping COUNTRY_CODE data.\n"
        return False

    html = scraperwiki.scrape(NasaData.ctry_site)
    soup = BeautifulSoup(html)

    cnty_table = soup.find( lambda tag: tag.name == 'table' and tag.has_attr('id') and tag['id'] == "main_table_blue" )
    countries = cnty_table.findAll( lambda tag: tag.name == 'tr' )
    with open( out_file, 'wt' ) as fh:
        for country in ( countries ):
            cnty_str = '|'

            cnty_attr = country.findAll( lambda tag: tag.name == 'th' )
            if ( cnty_attr ):
                # Header row: column names come from the <th> cells
                for attr in ( cnty_attr ):
                    cnty_str += attr.contents[0] + "|"
            else:
                cnty_attr = country.findAll( lambda tag: tag.name == 'td' )
                if ( cnty_attr ):
                    for ix, val in ( enumerate(cnty_attr) ):
                        if ix == 0:
                            cnty_str += val.findAll( lambda tag: tag.name == 'a' )[0].string + "|"  # Get country name
                        else:
                            cnty_str += val.contents[0].strip() + "|"  # Get country attrs

            # print cnty_str
            fh.write( cnty_str + "\n" )

    return True


def join_country_data():
    '''
    Join two data sets by country name and write to file nasa_metrics_by_country.txt
    country names and its metrics
    Args: None
    '''
    fh = open( NasaData.metrics_file_path, 'wt' )
    # Country names lowercased, removed leading "The ", removed leading/trailing and extra spaces
    nasa_data = defaultdict(list)
    cc_org_data = {}

    for line in open( NasaData.nasa_file_path, 'rt' ):
        ln_els = line.strip('\n').split(',')
        ln_els[0] = ln_els[0].lower()
        ln_els[0] = re.sub(r'(^[Tt]he\s+)', '', ln_els[0])
        ln_els[0] = re.sub(r'(\s{2,})', ' ', ln_els[0])
        nasa_data[ln_els[0]].append(ln_els[1])  # orion_vote appended

    # nasa_data dict appended with country data. key:country => values[orion_votes, pop., area, gdp]
    for l_num, line in enumerate( open( NasaData.ctry_file_path, 'rt') ):
        # line: |Afghanistan|AF / AFG|93|28,396,000|652,230|22.27 Billion|
        if l_num == 0: continue  # Skip header

        ln_els = line.strip('\n').split('|')
        ln_els[1] = ln_els[1].lower()
        ln_els[1] = re.sub(r'(^[Tt]he\s+)', '', ln_els[1])
        ln_els[1] = re.sub(r'(\s{2,})', ' ', ln_els[1])

        # Strip out comma in pop(element 4) and area (5)
        nasa_data[ln_els[1]].append( ln_els[4].translate(None, ',') )  # pop appended
        nasa_data[ln_els[1]].append( ln_els[5].translate(None, ',') )  # area appended

        # Normalize gdp to millions
        gdp = re.match( r'(\d+\.?\d*)', ln_els[6] ).group(0)
        gdp = float(gdp)
        if re.search( r'(Billion)', ln_els[6], re.I ):
            gdp = gdp * 1000
        elif re.search( r'(Trillion)', ln_els[6], re.I ):
            gdp = gdp * 1000000
        nasa_data[ln_els[1]].append( gdp )  # gdp appended

    # TODO: Some country names are not standard in NASA data. Example French Guiana is either Guiana or Guyana
    # Delete what is not found in country code data or match countries with hard coded values

    locale.setlocale(locale.LC_ALL, '')
    for cn in sorted(nasa_data):  # country name
        # array has all nasa_votes, pop., sq miles, gdp and has pop > 0 and gdp > 0. Capitalize name.
        if len(nasa_data[cn]) > 3 and int(nasa_data[cn][1]) > 0 and int(nasa_data[cn][3]) > 0:
            l = ( cn.title() + ":" + nasa_data[cn][0]
                + ":" + locale.format( '%d', int(nasa_data[cn][1]), 1 )  # pop
                + ":" + str( round( float( nasa_data[cn][0] ) * 10000 / int(nasa_data[cn][1]), 5 ))  # per 10K pop
                + ":" + locale.format( '%d', int(nasa_data[cn][2]), 1 )  # area
                + ":" + str( round( float( nasa_data[cn][0]) * 1000 / int(nasa_data[cn][2]), 5 ))  # per 1K sq mile
                + ":" + locale.format( '%d', int(nasa_data[cn][3]), 1 )  # gdp
                + ":" + str( round( float( nasa_data[cn][0]) * 1000 / nasa_data[cn][3], 5 ))  # per Billion $ gdp
                + "\n"
            )
            fh.write(l)

    return True



if __name__ == "__main__":
    get_nasa_entries()
    get_country_details()
    join_country_data()
    exit( 0 )