Real estate and rentals are always interesting, especially when you are in the market as either a tenant or a landlord. Some data is available through APIs, but it is generally days or weeks old, or already summarized to the point that it is not very useful. If you want the most granular data for your own analysis, it gets even harder. Craigslist is one good alternative: its listings are recent (close to real time). However, it takes quite a bit of work to pull, extract and clean the data before it can be used.
I use my own tool, written in Python, to help with the analysis. It works in two steps: (1) pulling and extracting the data, and (2) cleaning it and saving it to files. The tool can be configured with specific web links to pull the relevant data.
You can pull rental data at multiple levels, including zip code and city. In the following example I used a city name. Once the tool pulls the data, it looks for the rent (dollars), the size of the house (square feet) and the number of bedrooms, then saves these data elements in a CSV file.
sl,rent,bed_rooms,sq_ft,data_date
1,4300,4,2500,2016-01-07
2,3400,4,2100,2016-01-07
3,3095,4,1704,2016-01-07
4,3200,4,1700,2016-01-07
5,4995,5,4645,2016-01-07
...
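The extraction step described above can be sketched roughly as follows. This is not the actual tool: the listing strings, the regular expressions and the helper names `parse_listing` and `listings_to_csv` are all assumptions made for illustration, and real listing text may be worded differently. The sketch only shows one way to pull rent, bedrooms and square feet out of text and write rows in the CSV layout shown above.

```python
import csv
import io
import re

# Hypothetical listing summaries; the real page wording may differ.
SAMPLE_LISTINGS = [
    "$4300 4br - 2500ft2 - Spacious home near park",
    "$3,400 / 4br - 2100ft2 - Updated kitchen",
    "$1950 studio, no size given",
]

# Assumed patterns: "$<rent>", "<n>br" and "<size>ft2".
RENT_RE = re.compile(r"\$\s*([\d,]+)")
BEDS_RE = re.compile(r"(\d+)\s*br\b", re.IGNORECASE)
SQFT_RE = re.compile(r"([\d,]+)\s*ft2", re.IGNORECASE)


def _to_int(match):
    """Convert the first captured group to int, ignoring thousands separators."""
    return int(match.group(1).replace(",", ""))


def parse_listing(text):
    """Return (rent, bed_rooms, sq_ft), or None if any field is missing."""
    rent = RENT_RE.search(text)
    beds = BEDS_RE.search(text)
    sqft = SQFT_RE.search(text)
    if not (rent and beds and sqft):
        return None  # cleaning step: skip incomplete listings
    return _to_int(rent), _to_int(beds), _to_int(sqft)


def listings_to_csv(listings, data_date, out):
    """Write the parseable listings to `out` with a running serial number."""
    writer = csv.writer(out)
    writer.writerow(["sl", "rent", "bed_rooms", "sq_ft", "data_date"])
    sl = 0
    for text in listings:
        parsed = parse_listing(text)
        if parsed is None:
            continue
        sl += 1
        writer.writerow([sl, *parsed, data_date])


buffer = io.StringIO()
listings_to_csv(SAMPLE_LISTINGS, "2016-01-07", buffer)
print(buffer.getvalue())
```

Listings that lack any of the three fields are dropped here, which is one simple cleaning rule; keeping them with empty values would be an equally reasonable choice.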
Next, I use Jupyter and pandas to build metrics such as mean, median, rent/sq_ft, etc.
import matplotlib
matplotlib.use('TkAgg')  # select the backend before pyplot is imported
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

matplotlib.style.use('ggplot')

rents = pd.read_csv('/Data/craig_eby_housing_sam.csv')
# Drop the serial number and date columns; they are not needed for the metrics
rents = rents.drop(columns=['data_date', 'sl'])
# rents.groupby('bed_rooms').describe()

new_rents_df = rents.copy()  # make a real copy, not just a second name
new_rents_df['rent_per_sq_ft'] = new_rents_df['rent'] / new_rents_df['sq_ft']
new_rents_df.head(5)

# Statistics to compute for each column
# (nested renaming dicts are no longer accepted by current pandas)
stats = ['mean', 'max', 'min', 'median', 'std']
aggs2 = {
    'rent': stats,  # add 'count' here if needed
    'sq_ft': stats,
    'rent_per_sq_ft': stats,
}
metrics = new_rents_df.groupby('bed_rooms').agg(aggs2)
metrics
# Keep only the columns to plot, then draw one box plot per column,
# grouped by number of bedrooms (boxplot creates its own figure)
new_df = new_rents_df[['bed_rooms', 'rent', 'sq_ft']]
new_df.boxplot(by='bed_rooms')
plt.show()
Note: the Y axis is in $ for the rent chart and in ft² for the sq_ft chart.
This gives quick insight into outliers, quartiles, medians, etc., and helps in making a decision. A lot more goes into renting before the final decision, but the above analysis can be the first step in the process.
This can easily be extended to see how rent/sq_ft varies with the number of bedrooms, or to see the trend over time. With the data_date column above, and by storing all the scraped data in files or in the cloud, we can see how the rental market is trending in a specific location.
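The outliers that a box plot highlights can also be listed directly. The sketch below assumes the common 1.5 × IQR rule (the default whisker length in matplotlib box plots) and a hypothetical helper named `flag_outliers`. The first four rows come from the sample CSV above; the 9000 rent is a made-up value added only to trigger the rule.

```python
import pandas as pd

# Four rows from the sample CSV plus one hypothetical extreme listing.
rents = pd.DataFrame({
    "bed_rooms": [4, 4, 4, 4, 4],
    "rent": [4300, 3400, 3095, 3200, 9000],
})


def flag_outliers(df, value_col="rent", group_col="bed_rooms", k=1.5):
    """Mark rows whose value lies more than k * IQR outside the quartiles
    of their own bedroom group."""
    grouped = df.groupby(group_col)[value_col]
    # Per-group quartiles, mapped back onto each row by its group key
    q1 = df[group_col].map(grouped.quantile(0.25))
    q3 = df[group_col].map(grouped.quantile(0.75))
    iqr = q3 - q1
    out = df.copy()
    out["is_outlier"] = (df[value_col] < q1 - k * iqr) | (df[value_col] > q3 + k * iqr)
    return out


flagged = flag_outliers(rents)
print(flagged[flagged["is_outlier"]])
```

Computing the quartiles per bedroom group matters here: a rent that is normal for a five-bedroom house could look extreme if compared against all listings at once.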
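As a rough illustration of the trend idea, the sketch below groups by data_date and bed_rooms and takes the median rent per square foot. The in-memory table is an assumption for the example: the rows dated 2016-01-07 come from the sample CSV above, while the rows dated 2016-02-07 are invented to stand in for a later scrape.

```python
import pandas as pd

# First date: rows from the sample CSV. Second date: hypothetical later snapshot.
history = pd.DataFrame({
    "rent": [4300, 3400, 4400, 3500],
    "bed_rooms": [4, 4, 4, 4],
    "sq_ft": [2500, 2100, 2500, 2100],
    "data_date": ["2016-01-07", "2016-01-07", "2016-02-07", "2016-02-07"],
})
history["data_date"] = pd.to_datetime(history["data_date"])
history["rent_per_sq_ft"] = history["rent"] / history["sq_ft"]

# One row per scrape date, one column per bedroom count
trend = (
    history.groupby(["data_date", "bed_rooms"])["rent_per_sq_ft"]
    .median()
    .unstack("bed_rooms")
)
print(trend)
```

With real data, the daily CSV files could be combined with `pd.concat` before this step, and `trend.plot()` would draw one line per bedroom count.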
Shiva, good to see you have used Python to pull the panda out.
See if you can use the tool to do similar analytics for JP Nagar in Bengaluru, near the JP Nagar Metro station.
Will be thankful to see the data.