Real estate and rentals are always interesting, especially when you are in the market as a tenant or a landlord. Although some data is available through APIs, it is generally days or weeks old, or already summarized to the point where it is not very helpful. And if you want the most granular data for analysis, it gets even harder. The Craigslist website is a good alternative, offering the most recent (real-time) data. But it takes quite a bit of work to pull, extract and clean that data before you can use it.
AWS CloudTrail records the API calls made in an account, and these calls are logged so they can be analyzed. The output files are typically JSON formatted. I wrote another article here that gives some background on why I needed to do some investigative work 🙂 on these files to identify an offending application. In short, AWS deprecated some of its API calls, and any application that made these calls had to be migrated.
These logs had multiple JSON objects on a single line! And the more API calls, the more data and the more output files: in one case, more than 5,000 files were generated over a couple of days. At the client site I couldn't get much help from the code base, since it was old Java code written by an outsourced company that had moved on.
This turned into an exercise in using Apache Drill for the scenario above. First, I took a single file and ran a Drill query:
0: jdbc:drill:zk=local> select T.jRec.eventSource as eventSource, T.jRec.eventName as eventName
from (select FLATTEN(Records) jRec
from dfs.`/cloudtrail_logs/144702NNNNNN_CloudTrail_us-east-1_20160711T2345Z_CJPTqBCGPPc1Bhqc.json`) T
group by T.jRec.eventSource, T.jRec.eventName
order by eventName;
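For comparison, the same grouping can be sketched in plain Python. This is not the approach described here, just a minimal sketch assuming each file holds one or more JSON objects (possibly concatenated on a single line), each with a top-level `Records` array whose entries carry `eventSource` and `eventName`. `json.JSONDecoder.raw_decode` reads one object at a time, which is what makes concatenated objects manageable. The function names and the `*.json` directory layout are hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def iter_json_objects(text):
    """Yield every JSON object in `text`, even when several sit on one line."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        # raw_decode does not skip leading whitespace, so skip it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            return
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def count_api_calls(text):
    """Flatten each object's "Records" array and count (eventSource, eventName) pairs."""
    counts = Counter()
    for obj in iter_json_objects(text):
        for record in obj.get("Records", []):
            counts[(record.get("eventSource"), record.get("eventName"))] += 1
    return counts


def count_api_calls_in_dir(log_dir):
    """Aggregate counts over every *.json file in a directory (hypothetical layout)."""
    total = Counter()
    for path in Path(log_dir).glob("*.json"):
        total.update(count_api_calls(path.read_text()))
    return total


if __name__ == "__main__":
    sample = (
        '{"Records":[{"eventSource":"ec2.amazonaws.com","eventName":"DescribeInstances"}]}'
        '{"Records":[{"eventSource":"s3.amazonaws.com","eventName":"GetObject"},'
        '{"eventSource":"ec2.amazonaws.com","eventName":"DescribeInstances"}]}'
    )
    # Sort by eventName, mirroring the ORDER BY in the Drill query.
    counts = count_api_calls(sample)
    for (source, name), n in sorted(counts.items(), key=lambda kv: kv[0][1] or ""):
        print(source, name, n)
```

Unlike the query, the sketch also counts how often each call appears, which can help spot which deprecated calls are used most.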
Recently I was posed an interesting work problem: quickly calculate distances, in miles or kilometers, given a set of LatLngs (latitude/longitude pairs). The data points were in CSV input files, which also had other details.
# For a quick visual check, I opened the file in vim and tabulated it on commas with
:%!column -t -s ,
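One common way to do this calculation is the haversine formula, which gives the great-circle distance on a sphere: with latitudes φ and longitudes λ in radians and radius R, d = 2R·asin(√(sin²(Δφ/2) + cos φ₁·cos φ₂·sin²(Δλ/2))). The sketch below is only an illustration, not necessarily the method used in the full post. It assumes a spherical Earth (mean radius 6371.0088 km) and hypothetical CSV columns `lat1`, `lng1`, `lat2`, `lng2`, one pair of points per row; any other columns pass through unchanged.

```python
import csv
import io
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; spherical-Earth assumption
KM_PER_MILE = 1.609344


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 so floating-point noise never pushes asin out of its domain.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def add_distances(rows):
    """Add distance_km and distance_mi to each row.

    Assumes hypothetical columns lat1, lng1, lat2, lng2 holding decimal degrees.
    """
    for row in rows:
        km = haversine_km(float(row["lat1"]), float(row["lng1"]),
                          float(row["lat2"]), float(row["lng2"]))
        row["distance_km"] = round(km, 3)
        row["distance_mi"] = round(km / KM_PER_MILE, 3)
        yield row


if __name__ == "__main__":
    # Stand-in for a real input file; swap io.StringIO for open("points.csv").
    sample = io.StringIO(
        "name,lat1,lng1,lat2,lng2\n"
        "route-a,40.7128,-74.0060,34.0522,-118.2437\n"
    )
    for row in add_distances(csv.DictReader(sample)):
        print(row["name"], row["distance_km"], "km", row["distance_mi"], "mi")
```

The spherical model is typically off by up to roughly 0.5% compared with an ellipsoidal one; where that matters, a geodesic calculation on the WGS84 ellipsoid is the safer choice.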