I recently attended a SQL SATURDAY precon in Minneapolis. The precon was an introduction to Hadoop for SQL users. The introduction got me interested enough to give Hadoop another try. In my spare time between last weekend and this I have been installing, configuring and playing around with Hadoop. My initial thoughts are that Hadoop is defiantly production ready despite what you might read from some analysts. Hortonworks Ambari made installing Hortonworks nodes painless.

You may not be aware of this but it is possible to query hadoop right from SQL SERVER using a linked server. In this tutorial I go through the steps needed to setup a linked server between Hadoop and SQL SERVER.

This tutorial was written using SQL SERVER 2012 and a three node Hortonworks cluster running HDFS 2.7, MapReduce2 2.7, YARN 2.7, and Hive 1.2. The Hortonworks cluster is running on CentOS 7.1.

Let’s get started, log into the Hadoop cluster via ssh. On the Linux cluster create a new user and add that user to the hadoop group.

shell> adduser sqlserver
shell> passwd sqlserver

Add user sqlserver to hadoop group

shell> usermod -a -G hadoop sqlserver

I ran through part of the HIVE tutorial and used their Group lens data set for sample data.

ssh into one of the Hadoop nodes and perform the following steps to load data into Hadoop and create a table.

shell> su hdfs
shell> wget
shell> unzip
shell> hive
hive>CREATE TABLE u_data (
  userid INT,
  movieid INT,
  rating INT,
  unixtime STRING)

hive>LOAD DATA LOCAL INPATH '/home/hdfs/ml-100k/' INTO TABLE u_data;
hive> exit

You should now have data loaded into the Hive table we just created in Hadoop.

Next, download the Microsoft HIVE ODBC driver and install it on your Microsoft SQL SERVER. Select the 32bit or 64bit driver which appropriate for your SQL SERVER.
Microsoft HIVE ODBC driver

Configure the ODBC driver using the “sqlserver” username and password you created earlier.

Log into SQL Management Studio and configure the linked server using the data source with the DSN used for the ODBC.

Configure the Hadoop username and password.

select *
from openquery (HIVE, 'select userid, movieid, rating from default.u_data')
where userid = 196;

If everything was configured correctly you should be able to query Hadoop from SQL SERVER.


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s