Recently I bought a new keyboard (see below) on Amazon for my MacBook Pro. I needed a second, cheaper keyboard to use with a wall-mounted monitor near a treadmill. It is a knock-off of Apple’s Magic Keyboard. Interestingly, it offers both Bluetooth and 2.4 GHz wireless connectivity, and it has all four modifier keys (Fn, Ctrl, Option and Cmd) in the same order as the original. Many third-party Mac keyboards either omit the Fn key or place it near the number keys, and many others use the Windows keyboard order, which is not ideal.
The keyboard worked well except for one major issue: the Esc key didn’t behave as expected. Pressing it did nothing, and in the Vim editor it is one of the most frequently used keys. I had to either identify the key code it was generating and remap it, or return the item. I knew which ASCII code (decimal 27) and key code (53) to expect on pressing the key. Here are the corresponding codes.
Recently I built an application that uses AWS Lambda to load data from a data lake into Redshift at regular intervals. The steps to compile the adapter for the AWS Lambda environment are given here. I also uploaded it to GitHub here, so one can use it without going through the compilation steps.
Real estate and rentals are always interesting, especially when one is in the market as either a tenant or a landlord. Though some data is available through APIs, it is generally days or weeks old, or already summarized to the point that it is not much help. And if you want the most granular data to perform some analysis, it gets even harder. The Craigslist website is one good alternative, where you can get the most recent (near real-time) data. But it takes quite a bit of work to pull, extract and clean the data before using it.
AWS CloudTrail tracks the API calls made in one’s account; all of these calls are logged and can be analyzed. The output files are typically JSON formatted. I wrote another article here which provides some background on why I needed to do some investigative work 🙂 on these files to identify an offending application. In short, AWS deprecated some of the API calls, and any application that made these calls had to be migrated.
These logs had multiple JSON objects on a single line! And the more API calls there are, the more data and the more output files. In one case I had more than 5,000 files generated over a couple of days. At the client site I couldn’t get much help with the code base, since it was old Java code written by an outsourced company that had moved on.
Then it became an exercise for me to use Apache Drill for the above scenario. First I took a single file and ran a Drill query:
0: jdbc:drill:zk=local> select T.jRec.eventSource, T.jRec.eventName
from (select FLATTEN(Records) jRec
from dfs.`/cloudtrail_logs/144702NNNNNN_CloudTrail_us-east-1_20160711T2345Z_CJPTqBCGPPc1Bhqc.json`) T
group by T.jRec.eventSource, T.jRec.eventName
order by EXPR$1;
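For comparison, here is a minimal Python sketch of the same grouping. It assumes each log file is a single JSON document with a top-level Records array whose entries carry eventSource and eventName, which is what the FLATTEN(Records) call above suggests. The function name and the sample records are made up for illustration, and the per-pair count is an extra that the query above does not compute.

```python
import json
from collections import Counter


def count_api_calls(log_text):
    """Count (eventSource, eventName) pairs in one CloudTrail-style log document."""
    doc = json.loads(log_text)
    counts = Counter()
    # "Records" is assumed to be the top-level array, as in FLATTEN(Records).
    for rec in doc.get("Records", []):
        counts[(rec.get("eventSource"), rec.get("eventName"))] += 1
    return counts


# Made-up sample data, only to show the shape of the input.
sample = json.dumps({"Records": [
    {"eventSource": "ec2.amazonaws.com", "eventName": "DescribeInstances"},
    {"eventSource": "ec2.amazonaws.com", "eventName": "DescribeInstances"},
    {"eventSource": "s3.amazonaws.com", "eventName": "ListBuckets"},
]})

for (source, name), n in sorted(count_api_calls(sample).items()):
    print(source, name, n)
```

For thousands of files, the same function could be applied to each file and the Counter objects summed.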
Recently I was posed an interesting work hack: quickly calculate the distance, in miles or kilometers, covered by a set of LatLngs (latitudes and longitudes). The data points were in input files in CSV format. The files also had other details.
# For a quick visual check, I opened the file in vim and tabulated it with
:%!column -t -s ,
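The excerpt does not say which distance formula was used, so the following is only a sketch under assumptions: it uses the haversine (great-circle) formula, treats the Earth as a sphere with the mean radius, and assumes the points are visited in the order given. The function names and the sample coordinates are illustrative.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; a spherical approximation
KM_PER_MILE = 1.609344


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def path_length_km(points):
    """Sum the leg distances along a list of (lat, lng) tuples, in order."""
    return sum(haversine_km(*p, *q) for p, q in zip(points, points[1:]))


# Illustrative points: roughly Boston, then New York City.
route = [(42.3601, -71.0589), (40.7128, -74.0060)]
km = path_length_km(route)
print(f"{km:.1f} km, {km / KM_PER_MILE:.1f} miles")
```

Reading the points from a CSV file would be a matter of feeding csv.DictReader rows into the list, using whatever the latitude and longitude columns are called in the actual file.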
Oh oh. GitHub seems to have been having a major issue since ~4:30pm PST on Saturday. It has been down for a while now :(. The site status page https://status.github.com/ is showing
And main site
Real-time analytics makes it possible to track and monitor users and their activities, and then adjust what is presented to them. That could be a relevant advertisement, such as a Nikon camera ad shown to a user searching for a new camera, camera prices or anything similar.
Taking this one step further, the price of an item can be adjusted depending on whether the user is a loyal customer and/or is more likely to buy other accessories. Real-time predictive analytics makes this possible.
Below is a nice graph presented by the Wall Street Journal and at http://www.ritholtz.com/blog/2012/09/lucky-us-toilet-paper-priced-like-airline-tickets/
The graph shows how three companies (Sears, Best Buy and Amazon) varied the price of a microwave over a day. Amazon increased prices by more than 10% during the peak hours (~8am to 12:30pm and then again 3pm to ~9pm EDT). All times shown in the graph are in PDT (Pacific Daylight Time).
Even more interesting would be to observe whether prices varied based on user location or on where Amazon’s servers were located. It is simple to geo-map the IP address of a user’s computer or device and vary the prices accordingly! Different users from different cities at different times would see different prices. The price points and user experience can be optimized for improved sales!
I ran into the following situation; it took some time to diagnose, and help from a couple of folks on the DBA and operations teams to resolve. Here is what happened.
In an application, I exported a large data set from MySQL to a file, for example /dir_1/dir_2/exported_file.txt. After the file was exported, the application went on to consume it by reading its content. Since MySQL OUTFILE (exporting data) doesn’t overwrite a file if the file name already exists, the code would first rename any existing file to *.bak. See below for pseudo code.
If OUTFILE exists
    Move or Rename OUTFILE to OUTFILE.bak   /* Step 1 */
Run MySQL export to OUTFILE                 /* Step 2 */
Check the error code
Read OUTFILE and parse                      /* Step 3 */
When I ran the application, it would sometimes create the output file and go on to parse it correctly, but many times it would fail in step 1 with an error like “file already exists” when in fact the file did not exist, because I had removed it with ‘rm -f’ before rerunning the program. Other times it would fail in step 3, reporting that the file did not exist even though MySQL had exported it successfully in step 2. I even added sleep time between the steps, ranging from 5 to 60 seconds, but continued to see the same random behavior.
After spending some time trying to work out what was going on, I ended up debugging NFS caching. The directory /dir_1 was a mounted file system with NFS caching set to a few hundred seconds. When the application wrote to the NFS directory, the write cache was updated but not the OS directory structure (inode). Reducing the actimeo setting to a lower number, say 30 seconds, helps alleviate the delay. If sys admins are reluctant to change the settings of the existing mount, ask for a new mount point with actimeo set to 30. Once these changes were made, the application ran smoothly with its sleep time set a little higher than the actimeo value. Note that actimeo sets all of acregmin, acregmax, acdirmin, and acdirmax to the same value. There is no default value. See the man pages for more details.
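Instead of a fixed sleep, the application could poll until the exported file becomes visible, giving up after a timeout somewhat longer than the cache setting. The following is only a sketch of that idea, not what the original application did. The helper name, the timeout values and the example path are hypothetical, and polling does not bypass the NFS cache; it simply rechecks until the cached entry expires.

```python
import os
import time


def wait_until_visible(path, timeout_s=60.0, poll_s=1.0):
    """Poll until `path` exists or the timeout expires; return True if it was seen."""
    deadline = time.monotonic() + timeout_s
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)


# Hypothetical usage between step 2 and step 3, with the timeout chosen
# a little above an actimeo of 30 seconds:
# if not wait_until_visible("/dir_1/dir_2/exported_file.txt", timeout_s=45):
#     raise RuntimeError("export file not visible yet")
```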
Internet users are well aware of the Google search engine. Typically they enter their query or keywords in the search box and click the resulting links that look relevant to them (usually on the first page, and especially among the top 3 or 4 results). A large percentage of users seem to be unaware of Google’s extended search, which can provide the exact result they are looking for without having to hop to one more site.
Examples include the weather for a particular city, the current time at a location, dictionary definitions, area codes, sports scores and many more. Here are a few of them.
For weather: To find the current weather in Boston, USA, just enter the keywords “weather Boston, USA” and you will see the weather for today and the next few days in the first result!
For time: To find the current time in Bangalore, India (or Bengaluru, India), enter “time Bengaluru, India” and the first result is the time.
For dictionary: Put the keyword “define” before your query, as in “define avatar”.
For stocks: This one is a little too specific for many users, because one needs to know the stock’s ticker symbol and there are more than 10,000 of them in the US. It doesn’t seem to work for international stocks.
Area code: Enter a three-digit US area code to get information about that area code.
Fill in the blank: My favourite when searching for something for kids’ homework or a pop quiz. Try “Einstein got Nobel prize in *” or “Earth circumference is * miles”.
There are many more that can save time and typing or mouse clicks! Check out Google’s search tips.