I recently attended a SQL Saturday precon in Minneapolis that introduced Hadoop to SQL users. It got me interested enough to give Hadoop another try, and in my spare time over the past week I have been installing, configuring, and experimenting with it. My initial impression is that Hadoop is definitely production ready, despite what you might read from some analysts. Hortonworks Ambari made installing the Hortonworks nodes painless.
You may not be aware of this, but it is possible to query Hadoop directly from SQL Server using a linked server. In this tutorial I walk through the steps needed to set up a linked server between Hadoop and SQL Server.
This tutorial was written using SQL Server 2012 and a three-node Hortonworks cluster running HDFS 2.7, MapReduce2 2.7, YARN 2.7, and Hive 1.2. The Hortonworks cluster runs on CentOS 7.1.
Let’s get started. Log into the Hadoop cluster via SSH, create a new user, and add that user to the hadoop group.
shell> adduser sqlserver
shell> passwd sqlserver
Add the sqlserver user to the hadoop group:
shell> usermod -a -G hadoop sqlserver
SSH into one of the Hadoop nodes and perform the following steps to download the MovieLens dataset, load it into Hadoop, and create a Hive table.
shell> su - hdfs
shell> wget http://files.grouplens.org/datasets/movielens/ml-100k.zip
shell> unzip ml-100k.zip
shell> hive
hive> CREATE TABLE u_data (
          userid INT,
          movieid INT,
          rating INT,
          unixtime STRING)
      ROW FORMAT DELIMITED
      FIELDS TERMINATED BY '\t'
      STORED AS TEXTFILE;
hive> LOAD DATA LOCAL INPATH '/home/hdfs/ml-100k/u.data' INTO TABLE u_data;
hive> exit;
You should now have data loaded into the Hive table we just created in Hadoop.
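To confirm the load worked, you can run a quick count from the Hive shell. The MovieLens ml-100k dataset contains 100,000 ratings, so the count should come back as 100000:

hive> SELECT COUNT(*) FROM u_data;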
Next, download the Microsoft Hive ODBC driver and install it on your SQL Server machine. Select the 32-bit or 64-bit driver, whichever is appropriate for your SQL Server installation.
Microsoft Hive ODBC driver
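With the driver installed, create a System DSN for it in the ODBC Data Source Administrator, pointing at your HiveServer2 node, and then create the linked server on top of that DSN. The T-SQL below is a sketch: the DSN name HadoopHive is an assumption (use whatever name you gave your DSN), and the remote credentials are the sqlserver user created on the cluster earlier.

-- Create a linked server named HIVE over the Hive ODBC DSN.
-- @datasrc must match the System DSN name ("HadoopHive" here is
-- an assumed name; substitute your own).
EXEC master.dbo.sp_addlinkedserver
    @server     = N'HIVE',
    @srvproduct = N'HIVE',
    @provider   = N'MSDASQL',
    @datasrc    = N'HadoopHive';

-- Map all local logins to the sqlserver user created on the
-- cluster earlier (replace the password placeholder).
EXEC master.dbo.sp_addlinkedsrvlogin
    @rmtsrvname  = N'HIVE',
    @useself     = N'False',
    @locallogin  = NULL,
    @rmtuser     = N'sqlserver',
    @rmtpassword = N'yourpassword';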
Finally, query the Hive table from SQL Server through the linked server using OPENQUERY:

select * from openquery(HIVE, 'select userid, movieid, rating from default.u_data') where userid = 196;
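One thing to be aware of: with OPENQUERY, only the pass-through string runs on Hive, so the outer WHERE clause above is applied by SQL Server after every row has crossed the wire. Moving the filter inside the pass-through query keeps the work on the cluster and returns only the matching rows:

-- Filter on the Hive side so only userid 196's rows are shipped
-- back to SQL Server.
select *
from openquery(HIVE, 'select userid, movieid, rating from default.u_data where userid = 196');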