AWS Lambda has gained huge momentum in the last couple of years and has enabled software architects and developers to build on FaaS (Function as a Service). As much as Lambda helps in scaling applications, it has limitations such as execution duration and available memory. For long-running jobs, typically backend or batch processing, the 5-minute execution limit can be a deal breaker. But with appropriate data partitioning and architecture, it is still an excellent option for enterprises to scale their applications cost-effectively.
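As a rough illustration of working within the execution limit, the sketch below processes partitions one at a time and stops when the invocation is close to its deadline, returning whatever is left for a later run. The function name `process_partitions`, the `handle` callback, and the 30-second safety margin are assumptions made for this example. `get_remaining_time_in_millis()` is the method the Lambda context object provides to Python handlers.

```python
def process_partitions(partitions, handle, context, safety_margin_ms=30000):
    """Process partitions until done or until the Lambda deadline is near.

    `handle` is a hypothetical callback that loads one partition.
    Returns the partitions that were NOT processed, so the caller can
    re-queue them for the next invocation.
    """
    remaining = list(partitions)
    while remaining:
        # Stop early if there may not be enough time for another partition.
        if context.get_remaining_time_in_millis() < safety_margin_ms:
            break
        handle(remaining.pop(0))
    return remaining
```

Keeping each partition small enough to load well inside the margin is what makes this pattern workable. The right partition size depends on the data volume and is not something this sketch can decide.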
In a recent project, I architected the loading of data from a data lake into Redshift. The data is produced by an engine in batches and pushed to S3. The data is partitioned by time, and a consumer Python application loads it at regular intervals into a Redshift staging environment. To scale the solution, the data lake can be populated by multiple producers, and similarly one or more consumers can drain the data lake queue and load into Redshift. The data from the multiple staging tables is then loaded into the final table after deduplication and data augmentation.
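A minimal sketch of what such a consumer might do for one time partition is shown here. The S3 key layout (`<base>/YYYY/MM/DD/HH/`), the CSV format, the table name, and the helper names are all assumptions for illustration; the actual project layout is not described in this post. `COPY ... FROM 's3://...' IAM_ROLE '...'` is the standard Redshift load command.

```python
from datetime import datetime


def partition_prefix(base_prefix, ts):
    """Build the S3 prefix for one hourly partition (hypothetical layout)."""
    return "{}/{}/".format(base_prefix.rstrip("/"), ts.strftime("%Y/%m/%d/%H"))


def build_copy_sql(table, bucket, prefix, iam_role):
    """Build a Redshift COPY statement for every object under the prefix.

    The arguments are assumed to come from trusted configuration,
    not from user input, since they are interpolated directly.
    """
    return (
        "COPY {table} FROM 's3://{bucket}/{prefix}' "
        "IAM_ROLE '{role}' FORMAT AS CSV"
    ).format(table=table, bucket=bucket, prefix=prefix, role=iam_role)


def load_partition(conn_params, table, bucket, prefix, iam_role):
    """Run the COPY against a staging table using psycopg2."""
    import psycopg2  # imported lazily; must be packaged with the function

    conn = psycopg2.connect(**conn_params)
    try:
        with conn.cursor() as cur:
            cur.execute(build_copy_sql(table, bucket, prefix, iam_role))
        conn.commit()
    finally:
        conn.close()
```

The deduplication and augmentation into the final table would run as separate SQL statements after the staging loads. They are left out because they depend on the table design.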
The Lambda environment comes with many tools, applications, and libraries, including boto3, Python, Perl, etc., but not psycopg2, a Python DB adapter/wrapper for PostgreSQL. So one has to package psycopg2 along with the function (service application) and upload it while creating a Lambda. Following are the steps I took to compile a statically linked library on a system compatible with the AWS Lambda AMI.
The Lambda environment is a Linux system, and the currently available AMI is amzn-ami-hvm-2017.03.1.20170812-x86_64-gp2. First, start an instance of this AMI and download the needed tools and source code.
# Download git/lynx...
> sudo yum install git
...
> sudo yum install lynx
...

# Download Postgres...
> mkdir ~/postgres
> cd ~/postgres
> lynx https://ftp.postgresql.org/pub/source/v9.4.3/postgresql-9.4.3.tar.gz
... saved postgresql-9.4.3.tar.gz
> gunzip post*gz
> tar xvf post*tar
... [More than 6,200 files]

# Download psycopg2...
> mkdir ~/psycop
> cd ~/psycop
> lynx http://initd.org/psycopg/tarballs/PSYCOPG-2-6/psycopg2-2.6.1.tar.gz
...
> gunzip *gz
> tar xvf *tar
...

# Make config changes to Postgres and build (run from the extracted source directory)
> cd ~/postgres/postgresql-9.4.3
> ./configure --prefix ~/postgres/postgresql-9.4.3 --without-readline --without-zlib
> make
> make install
# After install check for pg_config, psql, etc. executables

# Make config changes to psycopg2 and build statically linked lib
> cd ~/psycop/psycopg2-2.6.1
> vim setup.cfg

# Build psycopg2
> python setup.py build

# Zip lambda function and library
> zip -r [A_ZIP.zip] *.py [LIBRARY] -x *.pyc -x [ANYTHING_NOT_NEEDED]
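The setup.cfg edit is the one step above that is not spelled out. As a hedged sketch, the script below sets the two options in the `[build_ext]` section that psycopg2's setup.cfg provides for this purpose: `pg_config` (path to the freshly built pg_config) and `static_libpq` (link libpq statically). The function name and the example path are assumptions, and the exact values the original build used are not recorded in this post. Note that `configparser` drops the comments in the file when it rewrites it.

```python
import configparser


def enable_static_libpq(cfg_path, pg_config_path):
    """Point psycopg2's build at a local pg_config and request static linking."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(cfg_path)
    if not parser.has_section("build_ext"):
        parser.add_section("build_ext")
    # Path to the pg_config executable produced by the Postgres build.
    parser.set("build_ext", "pg_config", pg_config_path)
    # 1 = link libpq statically into the psycopg2 extension module.
    parser.set("build_ext", "static_libpq", "1")
    with open(cfg_path, "w") as fh:
        parser.write(fh)


if __name__ == "__main__":
    import os

    # Hypothetical location, assuming the --prefix used in the steps above.
    enable_static_libpq(
        "setup.cfg",
        os.path.expanduser("~/postgres/postgresql-9.4.3/bin/pg_config"),
    )
```

Editing the file by hand in vim, as in the steps above, achieves the same result. The script form is only convenient if the build is repeated.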
A precompiled build is also available on GitHub, so the compilation steps can be skipped: https://hiregion.org/2017/12/16/precompiled-redshift-python-db-adapter/