http://cdn.jpillora.com/js/jp-prettify.js Last few months I started woking in Greenplum/ postgres environment and was pleasantly surprised to see the regular expression capabilities. Compared to Mysql (5.1) it has quite advanced RE features. For example I ended up using the following query to strip the decimal numbers from another string.
Input strings has the format “BucketN_N[ -N | + ]” where N is decimal number. For example, “Bucket1_1” or “Bucket7_21-50” or “Bucket5_100+” are all valid and output should extract bucket number (first N), minium value (second N) and maximum value of the bucket (third optional N).
The query
SELECT
bucket_string
,substring(qty_bucket, 7, 1)::integer as bucket_num
,regexp_replace(bucket_string, '^Bucket\\d{1,}_(\\d{1,})([-+].*)?', E'\\1')::integer as minimum_qty
,regexp_replace(regexp_replace(bucket_string, '.*-(\\d{1,})', E'\\1'), '.*_(\\d{1,})([+]?)$', E'\\1')::integer
as maximum_qty
FROM a_table
ORDER BY bucket_string
Though I could have used other functions like substring and other, it was fun using regular expression in smallest code. regexp_replace is the workhorse taking a string and replacing it with matched string (\\1 or $1). “^ and $” are anchors to beginning and end of string, \\d matches with decimal numbers and {1,} means 1 to many quantifier.
And more with regexp_matches and array selection. I wanted to select offset number of days from 2012 December 01 in this simplified example. The offset is a text in another external table of Greenplum and data is not always clean and could have some text before the offset.
SELECT d
regexp_matches(d, '(?:.*_)?(\\d+)$') re_ary
, '2012-12-01'::date+((regexp_matches(d, '(?:.*_)?(\\d+)$'))[1] || 'days')::interval new_d
FROM
(
SELECT '90' d
union
SELECT 'new_90' d
union
SELECT 'old_90_10' d
) tt
returns
d re_ary new_d
-------------------------------------
90 {90} 2013-03-01 00:00:00
new_90 {90} 2013-03-01 00:00:00
new_90_10 {10} 2012-12-11 00:00:00
With “?:” is non-capturing group and this matched text is optional meaning any text before string ending number is not-captured for later use and original string may have it but not all the time. ‘$’ anchors it to the end of string with regexp_matches returning an array of single element and subscript [1] selecting it.
Great and useful article. Creating content regularly is very tough. Your points are motivated me to move on.
SEO Company in Chennai