HadoopFDW: PostgreSQL Foreign Data Wrapper for accessing Hadoop

HadoopFDW is currently available on Windows, Linux & OSX for version 9.5 of our distribution. This Foreign Data Wrapper helps you to leverage the power of Hadoop from within PostgreSQL. It uses many of the latest FDW features of PostgreSQL to give you native access to Hadoop data.

PostgreSQL is the worlds most advanced open source RDBMS for transactionally intense OLTP applications. Hadoop is the worlds most advanced open source system for big data. The HadoopFDW bridges these two worlds.

Tutorial

HadoopFDW is developed and tested with Hadoop 2.6 & Hive 2.0. Below is an example where we connect to an HDP-2.4 Sandbox and create a Foreign Table within PostgreSQL pointing to one of the sample tables.

CREATE EXTENSION hadoop_fdw;

CREATE SERVER hadoop_server FOREIGN DATA WRAPPER hadoop_fdw
  OPTIONS (HOST 'hdp-vm', PORT '10000');

CREATE USER MAPPING FOR PUBLIC SERVER hadoop_server;

CREATE FOREIGN TABLE sample_07 (
    code                   TEXT,
    description            TEXT,
    total_emp              INT,
    salary                 INT
) SERVER hadoop_server OPTIONS (TABLE 'sample_07');
With the FOREIGN TABLE in place, run a query against it to retrieve data from Hive:

postgres=# SELECT code, total_emp FROM sample_07 ORDER BY code LIMIT 3;
  code   | total_emp
---------+-----------
 00-0000 | 134354250
 11-0000 |   6003930
 11-1011 |    299160
(3 rows)

For more complete information on the pre-req's, take a look at our documentation.