Le Nguyen The Dat bio photo

Le Nguyen The Dat

Data Science and Engineering at Uber. Kaggle Master.

Email Twitter Facebook LinkedIn Github

Ran into this request just today, in short, we need to create a downstream Database for our Data Warehouse for better scalability, security, and performance for our internal users.

Prerequisite:

Note: currently - July 2014 - there is a bug that prevents the following query to finish, when you try to do it from an RDS instance type M2, M3 or R3 - the query would run on Redshift for about 15 minutes (you can’t even kill it) and ends with a blank error message. Just choose RDS Instance type M1 and it would be just fine.

Step-by-step instructions:

  • Connect to your instance (Remember to set your instance permission / firewall properly):
psql -h your.rds.address -p port -U username -d dbname
  • Configuring dblink:
CREATE EXTENSION dblink;
SELECT dblink_connect('dbname=dbname user=user password=password
                       host=your.redshift.address port=port');
  • Copy a table from Redshift to RDS:
SELECT mytable.*
INTO rds_schema_name.rds_table_name
FROM
    dblink('SELECT field_1, field_2 FROM redshift_schema.redshift_table')
AS
    mytable(field_1 text, field_2 text);
Get data from the given table in RDS:
SELECT field_1, field_2
FROM rds_schema_name.rds_table_name;

Do it with aws-redshift-to-rds:

In case you need to solve this problem ASAP, here is a little command-line interface tool (named: aws-redshift-to-rds) that I’ve developed while working at Zalora :).