ubuntu12.04+hadoop2.2.0+zookeeper3.4.5+hbase0.96.2+hive0.13.1伪分布式环境部署

ubuntu12.04+hadoop2.2.0+zookeeper3.4.5+hbase0.96.2+hive0.13.1伪分布式环境部署

目录

[toc]

前言

hadoop2.2.0、zookeeper3.4.5、hbase0.96.2、hive0.13.1都是什么

  hadoop2.2.0的介绍以及特性,参考这里:http://blog.yidooo.net/archives/hadoop-2-2-0-new-features.html
  zookeeper的介绍,参考这里:http://baike.baidu.com/view/3061646.htm
  hbase的介绍,参考这里:http://baike.baidu.com/view/1993870.htm
  hive0.13的介绍以及特性,参考这里:http://www.csdn.net/article/2014-04-22/2819438-Cloud-Hive
  
  四款软件打包后的文件,我放到了这里:http://pan.baidu.com/s/1i35PlI1
  我想能够看这篇文章的人,都会具备一些基础知识,这里就不多介绍了。
  BTW:我是用MAC10.09+Parallels9虚拟的4个ubuntu。分别为m1,m2两个主,s1,s2两个从,共四台机器。
  

1. 这些软件在哪里下载?

  hadoop2.2.0:http://mirrors.hust.edu.cn/apache/hadoop/common/hadoop-2.2.0/hadoop-2.2.0-src.tar.gz
  zookeeper3.4.5:http://apache.dataguru.cn/zookeeper/zookeeper-3.4.5/zookeeper-3.4.5.tar.gz
  hbase0.96.2:http://mirrors.hust.edu.cn/apache/hbase/hbase-0.96.2/hbase-0.96.2-hadoop2-bin.tar.gz
  hive0.13.1:http://mirrors.cnnic.cn/apache/hive/hive-0.13.1/apache-hive-0.13.1-bin.tar.gz
  JDK1.7.0_65:使用apt-get方式安装
  
  这里hadoop2.2.0使用的是源码包,因为我使用的是64bit的ubuntu,而hadoop官方提供的,只有32bit可用。如果在64bit上运行会报错util.NativeCodeLoader – Unable to load native-hadoop library for your platform..错误,所以需要重新在64bit上编辑,后面我会单独写一篇文章介绍如何编译64bit的hadoop。

2. 如何安装

2.1 安装JDK(当前主机名为m1)

    1)执行以下命令

root@m1:/home/hadoop# sudo apt-get install oracle-java7-installer

    2)配置JAVA环境变量

root@m1:/home/hadoop# sudo vi /etc/environment

     在第一行的PASH最后加上java的bin路径。

PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/lib/jvm/java-7oracle/bin”

     在PATH的后面加上下面三行

CLASSPATH="/usr/lib/jvm/java-7-oracle/lib”
JAVA_HOME="/usr/lib/jvm/java-7-oracle”
JRE_HOME="/usr/lib/jvm/java-7-oracle/jre”

     告诉系统,我们使用的sun的JDK,而非OpenJDK了

root@m1:/home/hadoop# sudo update-alternatives --install /usr/bin/java java /usr/lib/jvm/java-7-oracle/bin/java 300
root@m1:/home/hadoop# sudo update-alternatives --install /usr/bin/javac javac /usr/lib/jvm/java-7-oracle/bin/javac 300
root@m1:/home/hadoop# sudo update-alternatives --config java

     这时会有几个选项,如选择2,然后再执行java -version就可以看到最新版本
    

2.2 用parallels克隆3台机器

    1)在parallels的硬件网络中选择如下所示,这个时候这个ping www.163.com就会ping通了

    2)点击Parallels左上角=》文件=》克隆,克隆三台虚拟机名字分别命名为:m2,s1,s2(克隆前要先停止虚拟机)
    执行sudo vi /etc/hostname ,修改各自的主机名称,如果生效需要重启。
    在m1、m2、s1、s2上分别执行ifconfig查看被分配到的IP地址,然后执行sudo vi /etc/hosts,然后执行”sudo /etc/init.d/networking restart”生效
    
    3)配置shhd无验证登录(我使用的是root帐号)
    安装SSH工具,(如果默认执行ssh存在,就不用安装了)

root@m1:/home/hadoop# sudo apt-get install ssh openssh-server

    在每台机器分别输入ssh-keygen,一路回车,然后会在用户的.ssh目录生成id_rsa和id_rsa.pub文件。
    在m1上执行:

root@m1:/home/hadoop# scp -r root@m2:/root/.ssh/id_rsa.pub ~/.ssh/m2.pub
root@m1:/home/hadoop# scp -r root@s1:/root/.ssh/id_rsa.pub ~/.ssh/s1.pub
root@m1:/home/hadoop# scp -r root@s2:/root/.ssh/id_rsa.pub ~/.ssh/s2.pub
root@m1:/home/hadoop# cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
root@m1:/home/hadoop# cat ~/.ssh/m2.pub >> ~/.ssh/authorized_keys
root@m1:/home/hadoop# cat ~/.ssh/s1.pub >> ~/.ssh/authorized_keys
root@m1:/home/hadoop# cat ~/.ssh/s2.pub >> ~/.ssh/authorized_keys
root@m1:/home/hadoop# scp -r ~/.ssh/authorized_keys root@m2:~/.ssh/
root@m1:/home/hadoop# scp -r ~/.ssh/authorized_keys root@s1:~/.ssh/
root@m1:/home/hadoop# scp -r ~/.ssh/authorized_keys root@s2:~/.ssh/

2.3 安装Zookeeper-3.4.5

    1)配置zoo.cfg(默认是没有zoo.cfg,将zoo_sample.cfg复制一份,并命名为zoo.cfg)

root@m1:/home/hadoop/zookeeper-3.4.5/conf# vi zoo.cfg
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial 
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between 
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just 
# example sakes.
dataDir=/home/hadoop/zookeeper-3.4.5/data
dataLogDir=/home/hadoop/zookeeper-3.4.5/logs
server.1=m1:2888:3888
server.2=m2:2888:3888
server.3=s1:2888:3888
server.4=s2:2888:3888
# the port at which the clients will connect
clientPort=2181
#
# Be sure to read the maintenance section of the 
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1

    2)将zookeeper从m1复制到m2,s1,s2机器上

root@m1:/home/hadoop/zookeeper-3.4.5/conf# scp -r /home/hadoop/zookeeper-3.4.5 root@m2:/home/hadoop
root@m1:/home/hadoop/zookeeper-3.4.5/conf# scp -r /home/hadoop/zookeeper-3.4.5 root@s1:/home/hadoop
root@m1:/home/hadoop/zookeeper-3.4.5/conf# scp -r /home/hadoop/zookeeper-3.4.5 root@s2:/home/hadoop

    3)在m1,m2,s1,s2机器上,的/home/hadoop/zookeeper-3.4.5/dataDir目录下创建 myid文件,内容为在zoo.cfg中配置的server.后面的数字,记住只能是数字
    m1为1
    m2为2
    s1为3
    s2为4
    至此,zookeeper的配置结束。

2.4 安装hadoop2.2.0

    修改以下7个配置文件:
    1)/home/hadoop/hadoop-2.2.0/etc/hadoop/hadoop-env.sh(主要修改java路径)

root@m1:/home/hadoop/hadoop-2.2.0/etc/hadoop# vi hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-7-oracle
#export JAVA_HOME=${JAVA_HOME}

    2)/home/hadoop/hadoop-2.2.0/etc/hadoop/yarn-env.sh(主要修改java路径)

root@m1:/home/hadoop/hadoop-2.2.0/etc/hadoop# vi yarn-env.sh
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# User for YARN daemons
export HADOOP_YARN_USER=${HADOOP_YARN_USER:-yarn}
export JAVA_HOME=/usr/lib/jvm/java-7-oracle

    3)/home/hadoop/hadoop-2.2.0/etc/hadoop/hdfs-site.xml

root@m1:/home/hadoop/hadoop-2.2.0/etc/hadoop# vi hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
       <property>
              <name>dfs.nameservices</name>
              <value>mycluster</value>
       </property>
       <property>
              <name>dfs.ha.namenodes.mycluster</name>
              <value>m1,m2</value>
       </property>
       <property>
              <name>dfs.namenode.rpc-address.mycluster.m1</name>
              <value>m1:9000</value>
       </property>
       <property>
              <name>dfs.namenode.rpc-address.mycluster.m2</name>
              <value>m2:9000</value>
       </property>
       <property>
              <name>dfs.namenode.http-address.mycluster.m1</name>
              <value>m1:50070</value>
       </property>
       <property>
              <name>dfs.namenode.http-address.mycluster.m2</name>
              <value>m2:50070</value>
       </property>
       <property>
              <name>dfs.namenode.shared.edits.dir</name>
              <value>qjournal://m1:8485;m2:8485/mycluster</value>
       </property>
      <property>
          <name>dfs.ha.automatic-failover.enabled.mycluster</name>
        <value>true</value>
    </property>
       <property>
              <name>dfs.client.failover.proxy.provider.mycluster</name>
       <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
       </property>
       <property>
              <name>dfs.ha.fencing.methods</name>
              <value>sshfence</value>
       </property>
       <property>
              <name>dfs.ha.fencing.ssh.private-key-files</name>
              <value>/root/.ssh/id_rsa</value>
       </property>
       <property>
              <name>dfs.journalnode.edits.dir</name>
              <value>/home/hadoop/hadoop-2.2.0/tmp/journal</value>
       </property>
       <property>
              <name>dfs.replication</name>
              <value>3</value>
       </property>
       <property>
              <name>dfs.webhdfs.enabled</name>
              <value>true</value>
       </property>
          <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
  </property>
</configuration>

    4)/home/hadoop/hadoop-2.2.0/etc/hadoop/mapred-site.xml

root@m1:/home/hadoop/hadoop-2.2.0/etc/hadoop# vi mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<configuration>
     <property>
          <name>mapreduce.framework.name</name>
          <value>yarn</value>
          <description>Execution framework set to Hadoop YARN.</description>
     </property>
</configuration>

    5)/home/hadoop/hadoop-2.2.0/etc/hadoop/core-site.xml

root@m1:/home/hadoop/hadoop-2.2.0/etc/hadoop# vi core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
 <property>
  <name>fs.defaultFS</name>
  <value>hdfs://mycluster</value>
 </property>
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
 </property>
<property>
  <name>ha.zookeeper.quorum</name>
  <value>m1:2181,m2:2181,s1:2181,s2:2181</value>
 </property>
        <property>
                <name>hadoop.tmp.dir</name>
                <value>/home/hadoop/hadoop-2.2.0/tmp</value>
                <description></description>
        </property>
</configuration>

    6)/home/hadoop/hadoop-2.2.0/etc/hadoop/yarn-site.xml

root@m1:/home/hadoop/hadoop-2.2.0/etc/hadoop# vi yarn-site.xml
<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<configuration>
<!-- Site specific YARN configuration properties-->
       <property>
              <name>yarn.nodemanager.aux-services</name>
              <value>mapreduce_shuffle</value>
       </property>
       <property>
              <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
              <value>org.apache.hadoop.mapred.ShuffleHandler</value>
       </property>
       <property>
              <name>yarn.resourcemanager.hostname</name>
              <value>m1</value>
       </property>
</configuration>

    7)/home/hadoop/hadoop-2.2.0/etc/hadoop/slaves

root@m1:/home/hadoop/hadoop-2.2.0/etc/hadoop# vi slaves 
s1
s2

    至此,hadoop的配置结束

3. 如何使用

3.1 启动zookeeper

    1)在m1,m2,s1,s2所有机器上执行,下面的代码是在m1上执行的示例:

root@m1:/home/hadoop# /home/hadoop/zookeeper-3.4.5/bin/zkServer.sh start
JMX enabled by default
Using config: /home/hadoop/zookeeper-3.4.5/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
root@m1:/home/hadoop# /home/hadoop/zookeeper-3.4.5/bin/zkServer.sh status
JMX enabled by default
Using config: /home/hadoop/zookeeper-3.4.5/bin/../conf/zoo.cfg
Mode: follower
root@m1:/home/hadoop#

    2)在每台机器上执行下面的命令,可以查看状态,在s1上是leader,其他机器是follower

root@s1:/home/hadoop# /home/hadoop/zookeeper-3.4.5/bin/zkServer.sh start
JMX enabled by default
Using config: /home/hadoop/zookeeper-3.4.5/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
root@s1:/home/hadoop# /home/hadoop/zookeeper-3.4.5/bin/zkServer.sh status
JMX enabled by default
Using config: /home/hadoop/zookeeper-3.4.5/bin/../conf/zoo.cfg
Mode: leader
root@s1:/home/hadoop#

    3)测试zookeeper是否启动成功,看下面第29行高亮处,表示成功。

root@m1:/home/hadoop# /home/hadoop/zookeeper-3.4.5/bin/zkCli.sh
Connecting to localhost:2181
2014-07-27 00:27:16,621 [myid:] - INFO  [main:Environment@100] - Client environment:zookeeper.version=3.4.5-1392090, built on 09/30/2012 17:52 GMT
2014-07-27 00:27:16,628 [myid:] - INFO  [main:Environment@100] - Client environment:host.name=m1
2014-07-27 00:27:16,628 [myid:] - INFO  [main:Environment@100] - Client environment:java.version=1.7.0_65
2014-07-27 00:27:16,629 [myid:] - INFO  [main:Environment@100] - Client environment:java.vendor=Oracle Corporation
2014-07-27 00:27:16,629 [myid:] - INFO  [main:Environment@100] - Client environment:java.home=/usr/lib/jvm/java-7-oracle/jre
2014-07-27 00:27:16,630 [myid:] - INFO  [main:Environment@100] - Client environment:java.class.path=/home/hadoop/zookeeper-3.4.5/bin/../build/classes:/home/hadoop/zookeeper-3.4.5/bin/../build/lib/*.jar:/home/hadoop/zookeeper-3.4.5/bin/../lib/slf4j-log4j12-1.6.1.jar:/home/hadoop/zookeeper-3.4.5/bin/../lib/slf4j-api-1.6.1.jar:/home/hadoop/zookeeper-3.4.5/bin/../lib/netty-3.2.2.Final.jar:/home/hadoop/zookeeper-3.4.5/bin/../lib/log4j-1.2.15.jar:/home/hadoop/zookeeper-3.4.5/bin/../lib/jline-0.9.94.jar:/home/hadoop/zookeeper-3.4.5/bin/../zookeeper-3.4.5.jar:/home/hadoop/zookeeper-3.4.5/bin/../src/java/lib/*.jar:/home/hadoop/zookeeper-3.4.5/bin/../conf:/usr/lib/jvm/java-7-oracle/lib
2014-07-27 00:27:16,630 [myid:] - INFO  [main:Environment@100] - Client environment:java.library.path=:/usr/local/lib:/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
2014-07-27 00:27:16,631 [myid:] - INFO  [main:Environment@100] - Client environment:java.io.tmpdir=/tmp
2014-07-27 00:27:16,631 [myid:] - INFO  [main:Environment@100] - Client environment:java.compiler=<NA>
2014-07-27 00:27:16,632 [myid:] - INFO  [main:Environment@100] - Client environment:os.name=Linux
2014-07-27 00:27:16,632 [myid:] - INFO  [main:Environment@100] - Client environment:os.arch=amd64
2014-07-27 00:27:16,632 [myid:] - INFO  [main:Environment@100] - Client environment:os.version=3.11.0-15-generic
2014-07-27 00:27:16,633 [myid:] - INFO  [main:Environment@100] - Client environment:user.name=root
2014-07-27 00:27:16,633 [myid:] - INFO  [main:Environment@100] - Client environment:user.home=/root
2014-07-27 00:27:16,634 [myid:] - INFO  [main:Environment@100] - Client environment:user.dir=/home/hadoop
2014-07-27 00:27:16,636 [myid:] - INFO  [main:ZooKeeper@438] - Initiating client connection, connectString=localhost:2181 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@19b1ebe5
Welcome to ZooKeeper!
2014-07-27 00:27:16,672 [myid:] - INFO  [main-SendThread(localhost:2181):ClientCnxn$SendThread@966] - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2014-07-27 00:27:16,685 [myid:] - INFO  [main-SendThread(localhost:2181):ClientCnxn$SendThread@849] - Socket connection established to localhost/127.0.0.1:2181, initiating session
JLine support is enabled
2014-07-27 00:27:16,719 [myid:] - INFO  [main-SendThread(localhost:2181):ClientCnxn$SendThread@1207] - Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x147737cd5d30000, negotiated timeout = 30000

WATCHER::

WatchedEvent state:SyncConnected type:None path:null
[zk: localhost:2181(CONNECTED) 0] ls /
[zookeeper]
[zk: localhost:2181(CONNECTED) 1]

    4)在m1上格式化zookeeper,第33行的日志表示创建成功。

root@m1:/home/hadoop# /home/hadoop/hadoop-2.2.0/bin/hdfs zkfc -formatZK
14/07/27 00:31:59 INFO tools.DFSZKFailoverController: Failover controller configured for NameNode NameNode at m1/192.168.1.50:9000
14/07/27 00:32:00 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.5-1392090, built on 09/30/2012 17:52 GMT
14/07/27 00:32:00 INFO zookeeper.ZooKeeper: Client environment:host.name=m1
14/07/27 00:32:00 INFO zookeeper.ZooKeeper: Client environment:java.version=1.7.0_65
14/07/27 00:32:00 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
14/07/27 00:32:00 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/lib/jvm/java-7-oracle/jre
14/07/27 00:32:00 INFO zookeeper.ZooKeeper: Client environment:java.class.path=/home/hadoop/hadoop-2.2.0/etc/hadoop:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/guava-11.0.2.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/commons-codec-1.4.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/hadoop-annotations-2.2.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/slf4j-api-1.7.5.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/commons-net-3.1.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/paranamer-2.3.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/snappy-java-1.0.4.1.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/commons-beanutils-core-1.8.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/jasper-compiler-5.5.23.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/commons-math-2.1.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/commons-lang-2.5.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/servlet-api-2.5.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/commons-logging-1.1.1.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/log4j-1.2.17.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/jasper-runtime-5.5.23.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/protobuf-java-2.5.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/mockito-all-1.8.5.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/hadoop-auth-2.2.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/jackson-core-asl-1.8.8.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/commons-digester-1.8.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/jsp-api-2.1.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/jersey-core-1.9.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/jettison-1.1.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/xmlenc-0.52.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/netty-3.6.2.Final.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/commons-httpclient-3.1.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/commons-io-2.1.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/jsch-0.1.42.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/commons-beanutils-1.7.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/commons-compress-1.4.1.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/jackson-jaxrs-1.8.8.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/jackson-mapper-asl-1.8.8.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/junit-4.8.2.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/jetty-6.1.26.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/commons-collections-3.2.1.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/jsr305-1.3.9.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/jackson-xc-1.8.8.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/asm-3.2.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/jersey-json-1.9.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/stax-api-1.0.1.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/jets3t-0.6.1.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/avro-1.7.4.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/commons-el-1.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/commons-cli-1.2.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/commons-configuration-1.6.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/jersey-server-1.9.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/jetty-util-6.1.26.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/activation-1.1.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/jaxb-api-2.2.2.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/zookeeper-3.4.5.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/jaxb-impl-2.2.3-1.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/xz-1.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/hadoop-nfs-2.2.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/hadoop-common-2.2.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/common/hadoop-common-2.2.0-tests.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/hdfs:/home/hadoop/hadoop-2.2.0/share/hadoop/hdfs/lib/guava-11.0.2.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/hdfs/lib/commons-codec-1.4.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/hdfs/lib/commons-lang-2.5.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/hdfs/lib/servlet-api-2.5.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/hdfs/lib/commons-logging-1.1.1.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/hdfs/lib/log4j-1.2.17.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/hdfs/lib/jasper-runtime-5.5.23.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/hdfs/lib/protobuf-java-2.5.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/hdfs/lib/jackson-core-asl-1.8.8.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/hdfs/lib/jsp-api-2.1.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/hdfs/lib/jersey-core-1.9.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/hdfs/lib/xmlenc-0.52.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/hdfs/lib/netty-3.6.2.Final.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/hdfs/lib/commons-io-2.1.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/hdfs/lib/jackson-mapper-asl-1.8.8.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/hdfs/lib/jetty-6.1.26.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/hdfs/lib/jsr305-1.3.9.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/hdfs/lib/asm-3.2.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/hdfs/lib/commons-el-1.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/hdfs/lib/commons-cli-1.2.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/hdfs/lib/jersey-server-1.9.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/hdfs/lib/commons-daemon-1.0.13.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/hdfs/lib/jetty-util-6.1.26.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/hdfs/hadoop-hdfs-2.2.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/hdfs/hadoop-hdfs-nfs-2.2.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/hdfs/hadoop-hdfs-2.2.0-tests.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/yarn/lib/hamcrest-core-1.1.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/yarn/lib/hadoop-annotations-2.2.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/yarn/lib/jersey-guice-1.9.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/yarn/lib/paranamer-2.3.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/yarn/lib/guice-servlet-3.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/yarn/lib/snappy-java-1.0.4.1.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/yarn/lib/log4j-1.2.17.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/yarn/lib/protobuf-java-2.5.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/yarn/lib/jackson-core-asl-1.8.8.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/yarn/lib/guice-3.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/yarn/lib/jersey-core-1.9.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/yarn/lib/netty-3.6.2.Final.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/yarn/lib/junit-4.10.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/yarn/lib/commons-io-2.1.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/yarn/lib/commons-compress-1.4.1.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/yarn/lib/jackson-mapper-asl-1.8.8.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/yarn/lib/javax.inject-1.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/yarn/lib/aopalliance-1.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/yarn/lib/asm-3.2.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/yarn/lib/avro-1.7.4.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/yarn/lib/jersey-server-1.9.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/yarn/lib/xz-1.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/yarn/hadoop-yarn-server-nodemanager-2.2.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/yarn/hadoop-yarn-client-2.2.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/yarn/hadoop-yarn-applications-unmanaged-am-launcher-2.2.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/yarn/hadoop-yarn-server-tests-2.2.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/yarn/hadoop-yarn-common-2.2.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/yarn/hadoop-yarn-server-common-2.2.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/yarn/hadoop-yarn-server-resourcemanager-2.2.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/yarn/hadoop-yarn-api-2.2.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/yarn/hadoop-yarn-server-web-proxy-2.2.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/yarn/hadoop-yarn-site-2.2.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.2.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce/lib/hamcrest-core-1.1.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce/lib/hadoop-annotations-2.2.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce/lib/jersey-guice-1.9.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce/lib/paranamer-2.3.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce/lib/guice-servlet-3.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce/lib/snappy-java-1.0.4.1.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce/lib/log4j-1.2.17.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce/lib/protobuf-java-2.5.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce/lib/jackson-core-asl-1.8.8.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce/lib/guice-3.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce/lib/jersey-core-1.9.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce/lib/netty-3.6.2.Final.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce/lib/junit-4.10.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce/lib/commons-io-2.1.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce/lib/commons-compress-1.4.1.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce/lib/jackson-mapper-asl-1.8.8.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce/lib/javax.inject-1.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce/lib/aopalliance-1.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce/lib/asm-3.2.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce/lib/avro-1.7.4.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce/lib/jersey-server-1.9.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce/lib/xz-1.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce/hadoop-mapreduce-client-common-2.2.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.2.0-tests.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce/hadoop-mapreduce-client-hs-plugins-2.2.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.2.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce/hadoop-mapreduce-client-shuffle-2.2.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce/hadoop-mapreduce-client-app-2.2.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce/hadoop-mapreduce-client-hs-2.2.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.2.0.jar:/home/hadoop/hadoop-2.2.0/contrib/capacity-scheduler/*.jar
14/07/27 00:32:00 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/home/hadoop/hadoop-2.2.0/lib/native
14/07/27 00:32:00 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
14/07/27 00:32:00 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
14/07/27 00:32:00 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
14/07/27 00:32:00 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64
14/07/27 00:32:00 INFO zookeeper.ZooKeeper: Client environment:os.version=3.11.0-15-generic
14/07/27 00:32:00 INFO zookeeper.ZooKeeper: Client environment:user.name=root
14/07/27 00:32:00 INFO zookeeper.ZooKeeper: Client environment:user.home=/root
14/07/27 00:32:00 INFO zookeeper.ZooKeeper: Client environment:user.dir=/home/hadoop
14/07/27 00:32:00 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=m1:2181,m2:2181,s1:2181,s2:2181 sessionTimeout=5000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@5990054a
14/07/27 00:32:00 INFO zookeeper.ClientCnxn: Opening socket connection to server m1/192.168.1.50:2181. Will not attempt to authenticate using SASL (unknown error)
14/07/27 00:32:00 INFO zookeeper.ClientCnxn: Socket connection established to m1/192.168.1.50:2181, initiating session
14/07/27 00:32:00 INFO zookeeper.ClientCnxn: Session establishment complete on server m1/192.168.1.50:2181, sessionid = 0x147737cd5d30001, negotiated timeout = 5000
===============================================
The configured parent znode /hadoop-ha/mycluster already exists.
Are you sure you want to clear all failover information from
ZooKeeper?
WARNING: Before proceeding, ensure that all HDFS services and
failover controllers are stopped!
===============================================
Proceed formatting /hadoop-ha/mycluster? (Y or N) 14/07/27 00:32:00 INFO ha.ActiveStandbyElector: Session connected.
y
14/07/27 00:32:13 INFO ha.ActiveStandbyElector: Recursively deleting /hadoop-ha/mycluster from ZK...
14/07/27 00:32:13 INFO ha.ActiveStandbyElector: Successfully deleted /hadoop-ha/mycluster from ZK.
14/07/27 00:32:13 INFO ha.ActiveStandbyElector: Successfully created /hadoop-ha/mycluster in ZK.
14/07/27 00:32:13 INFO zookeeper.ClientCnxn: EventThread shut down
14/07/27 00:32:13 INFO zookeeper.ZooKeeper: Session: 0x147737cd5d30001 closed
root@m1:/home/hadoop#

    5)验证zkfc是否格式化成功,如果多了一个hadoop-ha包就是成功了。

root@m1:/home/hadoop# /home/hadoop/zookeeper-3.4.5/bin/zkCli.sh
[zk: localhost:2181(CONNECTED) 0] ls /
[hadoop-ha, zookeeper]
[zk: localhost:2181(CONNECTED) 1]

3.2 启动JournalNode集群

    1)依次在m1,m2,s1,s2上面执行

root@m1:/home/hadoop# /home/hadoop/hadoop-2.2.0/sbin/hadoop-daemon.sh start journalnode
starting journalnode, logging to /home/hadoop/hadoop-2.2.0/logs/hadoop-root-journalnode-m1.out
root@m1:/home/hadoop# jps
2884 JournalNode
2553 QuorumPeerMain
2922 Jps
root@m1:/home/hadoop#

    2)格式化集群的一个NameNode(m1),有两种方法,我使用的是第一种
    方法一:

root@m1:/home/hadoop# /home/hadoop/hadoop-2.2.0/bin/hdfs namenode –format

    方法二:

root@m1:/home/hadoop/hadoop-2.2.0/bin/hdfs namenode -format -clusterId m1

    3)在m1上启动刚才格式化的 namenode

root@m1:/home/hadoop# /home/hadoop/hadoop-2.2.0/sbin/hadoop-daemon.sh start namenode

    执行命令后,浏览:http://m1:50070/dfshealth.jsp可以看到m1的状态
    
    
  4)在m2机器上,将m1的数据复制到m2上来,在m2上执行

root@m2:/home/hadoop# /home/hadoop/hadoop-2.2.0/bin/hdfs namenode –bootstrapStandby

    5)启动m2上的namenode,执行命令后

root@m2:/home/hadoop# /home/hadoop/hadoop-2.2.0/sbin/hadoop-daemon.sh start namenode

    浏览:http://m2:50070/dfshealth.jsp可以看到m2的状态。这个时候在网址上可以发现m1和m2的状态都是standby。
    
    6)启动所有的datanode,在m1上执行

root@m1:/home/hadoop# /home/hadoop/hadoop-2.2.0/sbin/hadoop-daemons.sh start datanode
s2: starting datanode, logging to /home/hadoop/hadoop-2.2.0/logs/hadoop-root-datanode-s2.out
s1: starting datanode, logging to /home/hadoop/hadoop-2.2.0/logs/hadoop-root-datanode-s1.out
root@m1:/home/hadoop#

    7)启动yarn,在m1上执行以下命令

root@m1:/home/hadoop# /home/hadoop/hadoop-2.2.0/sbin/start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/hadoop-2.2.0/logs/yarn-root-resourcemanager-m1.out
s1: starting nodemanager, logging to /home/hadoop/hadoop-2.2.0/logs/yarn-root-nodemanager-s1.out
s2: starting nodemanager, logging to /home/hadoop/hadoop-2.2.0/logs/yarn-root-nodemanager-s2.out
root@m1:/home/hadoop#

    然后浏览:http://m1:8088/cluster, 可以看到效果
    
    8)、启动 ZooKeeperFailoverCotroller,在m1,m2机器上依次执行以下命令,这个时候再浏览50070端口,可以发现m1变成active状态了,而m2还是standby状态

root@m1:/home/hadoop# /home/hadoop/hadoop-2.2.0/sbin/hadoop-daemon.sh start zkfc
starting zkfc, logging to /home/hadoop/hadoop-2.2.0/logs/hadoop-root-zkfc-m1.out
root@m1:/home/hadoop#

    9)、测试HDFS是否可用

root@m1:/home/hadoop/hadoop-2.2.0/bin# /home/hadoop/hadoop-2.2.0/bin/hdfs dfs -ls /
Found 2 items
drwx------   - root supergroup          0 2014-07-17 23:54 /tmp
drwxr-xr-x   - lion supergroup          0 2014-07-21 00:40 /user
root@m1:/home/hadoop/hadoop-2.2.0/bin# /home/hadoop/hadoop-2.2.0/bin/hdfs dfs -mkdir /input  
root@m1:/home/hadoop/hadoop-2.2.0/bin# /home/hadoop/hadoop-2.2.0/bin/hdfs dfs -ls /        
Found 3 items
drwxr-xr-x   - root supergroup          0 2014-07-27 01:20 /input
drwx------   - root supergroup          0 2014-07-17 23:54 /tmp
drwxr-xr-x   - lion supergroup          0 2014-07-21 00:40 /user
root@m1:/home/hadoop/hadoop-2.2.0/bin# /home/hadoop/hadoop-2.2.0/bin/hdfs dfs -ls /input      
root@m1:/home/hadoop/hadoop-2.2.0/bin# /home/hadoop/hadoop-2.2.0/bin/hdfs dfs -put hadoop.cmd /input
root@m1:/home/hadoop/hadoop-2.2.0/bin# /home/hadoop/hadoop-2.2.0/bin/hdfs dfs -ls /input            
Found 1 items
-rw-r--r--   3 root supergroup       7530 2014-07-27 01:20 /input/hadoop.cmd
root@m1:/home/hadoop/hadoop-2.2.0/bin#

    10)、测试YARN是否可用,我们来做一个经典的例子,统计刚才放入input下面的hadoop.cmd的单词频率

root@m1:/home/hadoop/hadoop-2.2.0/bin# /home/hadoop/hadoop-2.2.0/bin/hadoop jar /home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /input /output
14/07/27 01:22:41 INFO client.RMProxy: Connecting to ResourceManager at m1/192.168.1.50:8032
14/07/27 01:22:43 INFO input.FileInputFormat: Total input paths to process : 1
14/07/27 01:22:44 INFO mapreduce.JobSubmitter: number of splits:1
14/07/27 01:22:44 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
14/07/27 01:22:44 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
14/07/27 01:22:44 INFO Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
14/07/27 01:22:44 INFO Configuration.deprecation: mapreduce.combine.class is deprecated. Instead, use mapreduce.job.combine.class
14/07/27 01:22:44 INFO Configuration.deprecation: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
14/07/27 01:22:44 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
14/07/27 01:22:44 INFO Configuration.deprecation: mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class
14/07/27 01:22:44 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
14/07/27 01:22:44 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
14/07/27 01:22:44 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
14/07/27 01:22:44 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
14/07/27 01:22:44 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
14/07/27 01:22:45 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1406394452186_0001
14/07/27 01:22:46 INFO impl.YarnClientImpl: Submitted application application_1406394452186_0001 to ResourceManager at m1/192.168.1.50:8032
14/07/27 01:22:46 INFO mapreduce.Job: The url to track the job: http://m1:8088/proxy/application_1406394452186_0001/
14/07/27 01:22:46 INFO mapreduce.Job: Running job: job_1406394452186_0001
14/07/27 01:23:10 INFO mapreduce.Job: Job job_1406394452186_0001 running in uber mode : false
14/07/27 01:23:10 INFO mapreduce.Job:  map 0% reduce 0%
14/07/27 01:23:31 INFO mapreduce.Job:  map 100% reduce 0%
14/07/27 01:23:48 INFO mapreduce.Job:  map 100% reduce 100%
14/07/27 01:23:48 INFO mapreduce.Job: Job job_1406394452186_0001 completed successfully
14/07/27 01:23:49 INFO mapreduce.Job: Counters: 43
        File System Counters
                FILE: Number of bytes read=6574
                FILE: Number of bytes written=175057
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=7628
                HDFS: Number of bytes written=5088
                HDFS: Number of read operations=6
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters 
                Launched map tasks=1
                Launched reduce tasks=1
                Data-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=18062
                Total time spent by all reduces in occupied slots (ms)=14807
        Map-Reduce Framework
                Map input records=240
                Map output records=827
                Map output bytes=9965
                Map output materialized bytes=6574
                Input split bytes=98
                Combine input records=827
                Combine output records=373
                Reduce input groups=373
                Reduce shuffle bytes=6574
                Reduce input records=373
                Reduce output records=373
                Spilled Records=746
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=335
                CPU time spent (ms)=2960
                Physical memory (bytes) snapshot=270057472
                Virtual memory (bytes) snapshot=1990762496
                Total committed heap usage (bytes)=136450048
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters 
                Bytes Read=7530
        File Output Format Counters 
                Bytes Written=5088
root@m1:/home/hadoop/hadoop-2.2.0/bin#

    11)、验证HA的高可用性,故障转移,刚才我们用浏览器打开m1和m2的50070端口,已经看到m1的状态是active,m2的状态是standby
       a)我们在m1上kill掉namenode进程

root@m1:/home/hadoop/hadoop-2.2.0/bin# jps
5492 Jps
2884 JournalNode
4375 DFSZKFailoverController
2553 QuorumPeerMain
3898 NameNode
4075 ResourceManager
root@m1:/home/hadoop/hadoop-2.2.0/bin# kill -9 3898
root@m1:/home/hadoop/hadoop-2.2.0/bin# jps
2884 JournalNode
4375 DFSZKFailoverController
2553 QuorumPeerMain
4075 ResourceManager
5627 Jps
root@m1:/home/hadoop/hadoop-2.2.0/bin#

       b)再浏览m1和m2的50070端口,发现m1是打不开,而m2是active状态。
       
    这时候在m2上的HDFS和mapreduce还是可以正常运行的,虽然m1上的namenode进程已经被kill掉,但不影响使用这就是故障转移的优势!
    

3.3 Hbase-0.96.2-hadoop2

启动双HMaster的配置,m1是主HMaster,m2是从HMaster
    1)、修改hbase-env.sh配置,主要修JAVA_HOME的目录,以及HBASE_MANAGES_ZK

root@m1:/home/hadoop/hbase-0.96.2-hadoop2/conf# vi hbase-env.sh
#
#/**
# * Copyright 2007 The Apache Software Foundation
# *
# * Licensed to the Apache Software Foundation (ASF) under one
# * or more contributor license agreements.  See the NOTICE file
# * distributed with this work for additional information
# * regarding copyright ownership.  The ASF licenses this file
# * to you under the Apache License, Version 2.0 (the
# * "License"); you may not use this file except in compliance
# * with the License.  You may obtain a copy of the License at
# *
# *     http://www.apache.org/licenses/LICENSE-2.0
# *
# * Unless required by applicable law or agreed to in writing, software
# * distributed under the License is distributed on an "AS IS" BASIS,
# * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# * See the License for the specific language governing permissions and
# * limitations under the License.
# */

# Set environment variables here.

# This script sets variables multiple times over the course of starting an hbase process,
# so try to keep things idempotent unless you want to take an even deeper look
# into the startup scripts (bin/hbase, etc.)

# The java implementation to use.  Java 1.6 required.
export JAVA_HOME=/usr/lib/jvm/java-7-oracle

# Extra Java CLASSPATH elements.  Optional.
# export HBASE_CLASSPATH=

# The maximum amount of heap to use, in MB. Default is 1000.
# export HBASE_HEAPSIZE=1000

# Extra Java runtime options.
# Below are what we set by default.  May only work with SUN JVM.
# For more on why as well as other possible settings,
# see http://wiki.apache.org/hadoop/PerformanceTuning
export HBASE_OPTS="-XX:+UseConcMarkSweepGC"

# Uncomment one of the below three options to enable java garbage collection logging for the server-side processes.

# This enables basic gc logging to the .out file.
# export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps"

# This enables basic gc logging to its own file.
# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .
# export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH>"

# This enables basic GC logging to its own file with automatic log rolling. Only applies to jdk 1.6.0_34+ and 1.7.0_2+.
# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .
# export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=1 -XX:GCLogFileSize=512M"

# Uncomment one of the below three options to enable java garbage collection logging for the client processes.

# This enables basic gc logging to the .out file.
# export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps"

# This enables basic gc logging to its own file.
# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .
# export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH>"

# This enables basic GC logging to its own file with automatic log rolling. Only applies to jdk 1.6.0_34+ and 1.7.0_2+.
# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .
# export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=1 -XX:GCLogFileSize=512M"

# Uncomment below if you intend to use the EXPERIMENTAL off heap cache.
# export HBASE_OPTS="$HBASE_OPTS -XX:MaxDirectMemorySize="
# Set hbase.offheapcache.percentage in hbase-site.xml to a nonzero value.


# Uncomment and adjust to enable JMX exporting
# See jmxremote.password and jmxremote.access in $JRE_HOME/lib/management to configure remote password access.
# More details at: http://java.sun.com/javase/6/docs/technotes/guides/management/agent.html
#
# export HBASE_JMX_BASE="-Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false"
# export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10101"
# export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10102"
# export HBASE_THRIFT_OPTS="$HBASE_THRIFT_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10103"
# export HBASE_ZOOKEEPER_OPTS="$HBASE_ZOOKEEPER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10104"
# export HBASE_REST_OPTS="$HBASE_REST_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10105"

# File naming hosts on which HRegionServers will run.  $HBASE_HOME/conf/regionservers by default.
# export HBASE_REGIONSERVERS=${HBASE_HOME}/conf/regionservers

# Uncomment and adjust to keep all the Region Server pages mapped to be memory resident
#HBASE_REGIONSERVER_MLOCK=true
#HBASE_REGIONSERVER_UID="hbase"

# File naming hosts on which backup HMaster will run.  $HBASE_HOME/conf/backup-masters by default.
# export HBASE_BACKUP_MASTERS=${HBASE_HOME}/conf/backup-masters

# Extra ssh options.  Empty by default.
# export HBASE_SSH_OPTS="-o ConnectTimeout=1 -o SendEnv=HBASE_CONF_DIR"

# Where log files are stored.  $HBASE_HOME/logs by default.
# export HBASE_LOG_DIR=${HBASE_HOME}/logs

# Enable remote JDWP debugging of major HBase processes. Meant for Core Developers 
# export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8070"
# export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8071"
# export HBASE_THRIFT_OPTS="$HBASE_THRIFT_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8072"
# export HBASE_ZOOKEEPER_OPTS="$HBASE_ZOOKEEPER_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8073"

# A string representing this instance of hbase. $USER by default.
# export HBASE_IDENT_STRING=$USER

# The scheduling priority for daemon processes.  See 'man nice'.
# export HBASE_NICENESS=10

# The directory where pid files are stored. /tmp by default.
# export HBASE_PID_DIR=/var/hadoop/pids

# Seconds to sleep between slave commands.  Unset by default.  This
# can be useful in large clusters, where, e.g., slave rsyncs can
# otherwise arrive faster than the master can service them.
# export HBASE_SLAVE_SLEEP=0.1

# Tell HBase whether it should manage it's own instance of Zookeeper or not.
export HBASE_MANAGES_ZK=false
#这个值为false时,表示启动的是独立的zookeeper。而配置成true则是hbase自带的zookeeper。
# The default log rolling policy is RFA, where the log file is rolled as per the size defined for the 
# RFA appender. Please refer to the log4j.properties file to see more details on this appender.
# In case one needs to do log rolling on a date change, one should set the environment property
# HBASE_ROOT_LOGGER to "<DESIRED_LOG LEVEL>,DRFA".
# For example:
# HBASE_ROOT_LOGGER=INFO,DRFA
# The reason for changing default to RFA is to avoid the boundary case of filling out disk space as 
# DRFA doesn't put any cap on the log size. Please refer to HBase-5655 for more context.

    2)、修改hbase-site.xml配置

root@m1:/home/hadoop/hbase-0.96.2-hadoop2/conf# vi hbase-site.xml
hbase.rootdirhdfs://mycluster/hbasehbase.cluster.distributedtruehbase.tmp.dir/home/hadoop/hbase-0.96.2-hadoop2/tmphbase.master60000hbase.zookeeper.quorumm1,m2,s1,s2hbase.zookeeper.property.clientPort2181hbase.zookeeper.property.dataDir/home/hadoop/zookeeper-3.4.5/data

    2)、修改regionservers文件
       通常部署master的机器上不就部署slave了,用两台集群做Hbase从服务器

root@m1:/home/hadoop/hbase-0.96.2-hadoop2/conf# vi regionservers 
s1
s2

    3)、创建hadoop的hdfs-site.xml的软连接到hbase的配置文件目录

root@m1:/home/hadoop/hbase-0.96.2-hadoop2/conf# ll
总用量 40
drwxr-xr-x 2 root root  4096 Jul 27 09:15 ./
drwxr-xr-x 9 root root  4096 Jul 20 21:40 ../
-rw-r--r-- 1 root staff 1026 Mar 25 06:29 hadoop-metrics2-hbase.properties
-rw-r--r-- 1 root staff 4023 Mar 25 06:29 hbase-env.cmd
-rw-r--r-- 1 root staff 7129 Jul 27 08:58 hbase-env.sh
-rw-r--r-- 1 root staff 2257 Mar 25 06:29 hbase-policy.xml
-rw-r--r-- 1 root staff 2550 Jul 27 09:10 hbase-site.xml
-rw-r--r-- 1 root staff 3451 Mar 25 06:29 log4j.properties
-rw-r--r-- 1 root staff    6 Jul 20 21:38 regionservers
root@m1:/home/hadoop/hbase-0.96.2-hadoop2/conf# ln -s /home/hadoop/hadoop-2.2.0/etc/hadoop/hdfs-site.xml hdfs-site.xml
root@m1:/home/hadoop/hbase-0.96.2-hadoop2/conf# ll
总用量 40
drwxr-xr-x 2 root root  4096 Jul 27 09:16 ./
drwxr-xr-x 9 root root  4096 Jul 20 21:40 ../
-rw-r--r-- 1 root staff 1026 Mar 25 06:29 hadoop-metrics2-hbase.properties
-rw-r--r-- 1 root staff 4023 Mar 25 06:29 hbase-env.cmd
-rw-r--r-- 1 root staff 7129 Jul 27 08:58 hbase-env.sh
-rw-r--r-- 1 root staff 2257 Mar 25 06:29 hbase-policy.xml
-rw-r--r-- 1 root staff 2550 Jul 27 09:10 hbase-site.xml
lrwxrwxrwx 1 root root    50 Jul 27 09:16 hdfs-site.xml -> /home/hadoop/hadoop-2.2.0/etc/hadoop/hdfs-site.xml*
-rw-r--r-- 1 root staff 3451 Mar 25 06:29 log4j.properties
-rw-r--r-- 1 root staff    6 Jul 20 21:38 regionservers
root@m1:/home/hadoop/hbase-0.96.2-hadoop2/conf#

    3)、hbase0.96.2版本的jar包不需要复制,官方提供的是已经打包好的

root@m1:/home/hadoop/hbase-0.96.2-hadoop2/lib# ls | grep hadoop
hadoop-annotations-2.2.0.jar
hadoop-auth-2.2.0.jar
hadoop-client-2.2.0.jar
hadoop-common-2.2.0.jar
hadoop-hdfs-2.2.0.jar
hadoop-hdfs-2.2.0-tests.jar
hadoop-mapreduce-client-app-2.2.0.jar
hadoop-mapreduce-client-common-2.2.0.jar
hadoop-mapreduce-client-core-2.2.0.jar
hadoop-mapreduce-client-jobclient-2.2.0.jar
hadoop-mapreduce-client-jobclient-2.2.0-tests.jar
hadoop-mapreduce-client-shuffle-2.2.0.jar
hadoop-yarn-api-2.2.0.jar
hadoop-yarn-client-2.2.0.jar
hadoop-yarn-common-2.2.0.jar
hadoop-yarn-server-common-2.2.0.jar
hadoop-yarn-server-nodemanager-2.2.0.jar
hbase-client-0.96.2-hadoop2.jar
hbase-common-0.96.2-hadoop2.jar
hbase-common-0.96.2-hadoop2-tests.jar
hbase-examples-0.96.2-hadoop2.jar
hbase-hadoop2-compat-0.96.2-hadoop2.jar
hbase-hadoop-compat-0.96.2-hadoop2.jar
hbase-it-0.96.2-hadoop2.jar
hbase-it-0.96.2-hadoop2-tests.jar
hbase-prefix-tree-0.96.2-hadoop2.jar
hbase-protocol-0.96.2-hadoop2.jar
hbase-server-0.96.2-hadoop2.jar
hbase-server-0.96.2-hadoop2-tests.jar
hbase-shell-0.96.2-hadoop2.jar
hbase-testing-util-0.96.2-hadoop2.jar
hbase-thrift-0.96.2-hadoop2.jar
root@m1:/home/hadoop/hbase-0.96.2-hadoop2/lib#

    4)、将m1上面的hbase0.96.2复制到m2,s1,s2同样的目录中

root@m1:/home/hadoop/hbase-0.96.2-hadoop2/lib# scp -r /home/hadoop/hbase-0.96.2-hadoop2 root@m2:/home/hadoop
root@m1:/home/hadoop/hbase-0.96.2-hadoop2/lib# scp -r /home/hadoop/hbase-0.96.2-hadoop2 root@s1:/home/hadoop
root@m1:/home/hadoop/hbase-0.96.2-hadoop2/lib# scp -r /home/hadoop/hbase-0.96.2-hadoop2 root@s2:/home/hadoop

    5)、在m1上启动hbase0.96.2,执行命令:

root@m1:/home/hadoop# /home/hadoop/hbase-0.96.2-hadoop2/bin/start-hbase.sh
starting master, logging to /home/hadoop/hbase-0.96.2-hadoop2/bin/../logs/hbase-root-master-m1.out
s1: starting regionserver, logging to /home/hadoop/hbase-0.96.2-hadoop2/bin/../logs/hbase-root-regionserver-s1.out
s2: starting regionserver, logging to /home/hadoop/hbase-0.96.2-hadoop2/bin/../logs/hbase-root-regionserver-s2.out
root@m1:/home/hadoop# jps
6688 NameNode
7540 HMaster
2884 JournalNode
4375 DFSZKFailoverController
2553 QuorumPeerMain
7769 Jps
4075 ResourceManager
root@m1:/home/hadoop#

    执行命令后,浏览网址可以看效果:http://m1:60010/master-status
    
    6)、在m1上用shell测试连接hbase

root@m1:/home/hadoop# /home/hadoop/hbase-0.96.2-hadoop2/bin/hbase shell
2014-07-27 09:31:07,601 INFO  [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
HBase Shell; enter 'help' for list of supported commands.
Type "exit" to leave the HBase Shell
Version 0.96.2-hadoop2, r1581096, Mon Mar 24 16:03:18 PDT 2014
hbase(main):001:0> list
TABLE                                                                                                                                                                                                                  
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/hbase-0.96.2-hadoop2/lib/slf4j-log4j12-1.6.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
0 row(s) in 2.8030 seconds
=> []
hbase(main):002:0> version
0.96.2-hadoop2, r1581096, Mon Mar 24 16:03:18 PDT 2014
hbase(main):003:0> status
2 servers, 0 dead, 1.0000 average load
hbase(main):004:0> create 'test_idoall_org','uid','name'
0 row(s) in 0.5800 seconds
=> Hbase::Table - test_idoall_org
hbase(main):005:0> list
TABLE                                                                                                                                                                                                                  
test_idoall_org                                                                                                                                                                                                        
1 row(s) in 0.0320 seconds
=> ["test_idoall_org"]
hbase(main):006:0> put 'test_idoall_org','10086','name:idoall','idoallvalue'
0 row(s) in 0.1090 seconds                 ^
hbase(main):009:0> get 'test_idoall_org','10086'
COLUMN                                                 CELL                                                                                                                                                            
 name:idoall                                           timestamp=1406424831473, value=idoallvalue                                                                                                                      
1 row(s) in 0.0450 seconds
hbase(main):010:0> scan 'test_idoall_org'
ROW                                                    COLUMN+CELL                                                                                                                                                     
 10086                                                 column=name:idoall, timestamp=1406424831473, value=idoallvalue                                                                                                  
1 row(s) in 0.0620 seconds
hbase(main):011:0>

    7)、在m2上启动hbase,同样执行命令:

root@m2:/home/hadoop# /home/hadoop/hbase-0.96.2-hadoop2/bin/hbase-daemon.sh start master
starting master, logging to /home/hadoop/hbase-0.96.2-hadoop2/bin/../logs/hbase-root-master-m2.out
root@m2:/home/hadoop#

    执行命令后,在浏览器打开网址也可以看到m2上的hbase状态:http://m2:60010/master-status
    
    8)、测试m1和m2的主从备份切换
       a)这时在浏览器打开http://m1:60010/master-status和http://m2:60010/master-status,可以看到下图的状态
       b)我们在m1上停止掉hbase的进程

root@m1:/home/hadoop# jps
6688 NameNode
7540 HMaster
2884 JournalNode
8645 Jps
4375 DFSZKFailoverController
2553 QuorumPeerMain
4075 ResourceManager
root@m1:/home/hadoop# kill -9 7540
root@m1:/home/hadoop# jps
6688 NameNode
2884 JournalNode
4375 DFSZKFailoverController
2553 QuorumPeerMain
4075 ResourceManager
8655 HMaster
8719 Jps
root@m1:/home/hadoop#

       再打开网址,会发现m1已经打不开,而m2的hbase集群状态已经被改变
       
    <font color=red>至此,hbase已经配置完,并且主从故障转移是可用的。</font>
    

3.4 在ubuntu12.04的m1上面安装mysql5.5.x

    1)、apt-get install mysql-server mysql-client mysql-common
    过程中会弹出一个界面,让你输入root的密码。我设置的是123456
    安装后可以测试下mysql的连接状态:mysql -uroot -p123456
    可以用service mysql stop/service mysql start来启动和停止mysql状态

    2)、授权可以远程访问mysql

root@m1:/home/hadoop# mysql -uroot -p123456
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 36
Server version: 5.5.22-0ubuntu1 (Ubuntu)

Copyright (c) 2000, 2011, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> grant all on *.* to 'root'@'%'  identified by '123456' WITH GRANT OPTION;
Query OK, 0 rows affected (0.00 sec)

mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)

mysql> quit
Bye

    3)、如果还无法远程连接,打开:vi /etc/mysql/my.cnf。将bind-address=127.0.0.1,改为本机ip,重新启动mysql
    

3.5 hive 0.13.1安装(在m1上操作)

    1)、将apache-hive-0.13.1-bin.tar.gz解压到/home/hadoop/hive-0.13.1

    2)、进入到hive的conf文件,将模板文件复制出对应的配置文件
    

root@m1:/home/hadoop/hive-0.13.1/conf# cp hive-env.sh.template hive-env.sh
root@m1:/home/hadoop/hive-0.13.1/conf# cp hive-default.xml.template hive-site.xml

    3)、修改hive-env.sh文件,主要设置hadoop目录

root@m1:/home/hadoop/hive-0.13.1/conf# vi hive-env.sh
HADOOP_HOME=/home/hadoop/hadoop-2.2.0

    4)、修改hive-site.xml文件

root@m1:/home/hadoop/hive-0.13.1/conf# vi hive-site.xml
hive.metastore.warehouse.dirhdfs://mycluster/user/hive/warehouseThe list of zookeeper servers to talk to. This isonly needed for read/write locks.hive.exec.scratchdirhdfs://mycluster/user/hive/scratchdirhive.querylog.location/home/hadoop/hive-0.13.1/logsjavax.jdo.option.ConnectionURLjdbc:mysql://m1:3306/hiveMeta?createDatabaseIfNotExist=truejavax.jdo.option.ConnectionDriverNamecom.mysql.jdbc.Driverjavax.jdo.option.ConnectionUserNamerootjavax.jdo.option.ConnectionPassword123456hive.aux.jars.pathfile:///home/hadoop/hive-0.13.1/lib/hbase-hadoop-compat-0.96.2-hadoop2.jar,file:///home/hadoop/hive-0.13.1/lib/hbase-hadoop2-compat-0.96.2-hadoop2.jar,file:///home/hadoop/hive-0.13.1/lib/hive-h
base-handler-0.13.1.jar,file:///home/hadoop/hive-0.13.1/lib/protobuf-java-2.5.0.jar,file:///home/hadoop/hive-0.13.1/lib/hbase-client-0.96.2-hadoop2.jar,file:///home/hadoop/hive-0.13.1/lib/hbase-common-0.96.2-hadoop2
.jar,file:///home/hadoop/hive-0.13.1/lib/hbase-protocol-0.96.2-hadoop2.jar,file:///home/hadoop/hive-0.13.1/lib/hbase-server-0.96.2-hadoop2.jar,file:///home/hadoop/hive-0.13.1/lib/zookeeper-3.4.5.jar,file:///home/had
oop/hive-0.13.1/lib/guava-11.0.2.jar,file:///home/hadoop/hive-0.13.1/lib/htrace-core-2.04.jarhive.zookeeper.quorumm1,m2,s1,s2

    5)、hive-site.xml中hive.aux.jars.path配置项包含的jar,hive-hbase-handler-0.13.1.jar和guava-11.0.2.jar是默认就有的,只需要执行以下命令,将其他的从hadoop/zookeeper/hbase中复制过来即可

root@m1:/home/hadoop# cp /home/hadoop/hbase-0.96.2-hadoop2/lib/protobuf-java-2.5.0.jar /home/hadoop/hive-0.13.1/lib
root@m1:/home/hadoop# cp /home/hadoop/hbase-0.96.2-hadoop2/lib/hbase-client-0.96.2-hadoop2.jar /home/hadoop/hive-0.13.1/lib
root@m1:/home/hadoop# cp /home/hadoop/hbase-0.96.2-hadoop2/lib/hbase-common-0.96.2-hadoop2.jar /home/hadoop/hive-0.13.1/lib
root@m1:/home/hadoop# cp /home/hadoop/hbase-0.96.2-hadoop2/lib/hbase-protocol-0.96.2-hadoop2.jar /home/hadoop/hive-0.13.1/lib
root@m1:/home/hadoop# cp /home/hadoop/hbase-0.96.2-hadoop2/lib/hbase-server-0.96.2-hadoop2.jar /home/hadoop/hive-0.13.1/lib
root@m1:/home/hadoop# cp /home/hadoop/hbase-0.96.2-hadoop2/lib/hbase-hadoop2-compat-0.96.2-hadoop2.jar /home/hadoop/hive-0.13.1/lib
root@m1:/home/hadoop# cp /home/hadoop/hbase-0.96.2-hadoop2/lib/hbase-hadoop-compat-0.96.2-hadoop2.jar /home/hadoop/hive-0.13.1/lib
root@m1:/home/hadoop# cp /home/hadoop/hbase-0.96.2-hadoop2/lib/htrace-core-2.04.jar /home/hadoop/hive-0.13.1/lib
root@m1:/home/hadoop# cp /home/hadoop/zookeeper-3.4.5/dist-maven/zookeeper-3.4.5.jar /home/hadoop/hive-0.13.1/lib

    6)、mysql的odbc驱动,可以到这里下载http://dev.mysql.com/downloads/connector/j/,解压后,将目录中的mysql-connector-java-5.1.31-bin.jar复制到 /home/hadoop/hive-0.13.1/lib

    7)、创建测试数据,以及在hadoop上创建数据仓库目录

root@m1:/home/hadoop/hive-0.13.1/conf# vi /home/hadoop/hive-0.13.1/testdata001.dat
12306,mname,yname
10086,myidoall,youidoall
/home/hadoop/hadoop-2.2.0/bin/hadoop fs -mkdir -p /user/hive/warehouse

    8)、使用shell命令,测试hive

root@m1:/home/hadoop# /home/hadoop/hive-0.13.1/bin/hive 
14/07/27 11:17:35 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
14/07/27 11:17:35 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
14/07/27 11:17:35 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
14/07/27 11:17:35 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
14/07/27 11:17:35 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
14/07/27 11:17:35 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
14/07/27 11:17:35 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
14/07/27 11:17:35 INFO Configuration.deprecation: mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use mapreduce.job.committer.setup.cleanup.needed

Logging initialized using configuration in jar:file:/home/hadoop/hive-0.13.1/lib/hive-common-0.13.1.jar!/hive-log4j.properties
hive> show databases;
OK
default
Time taken: 0.464 seconds, Fetched: 1 row(s)
hive> create database testidoall;
OK
Time taken: 0.279 seconds
hive>  show databases;
OK
default
testidoall
Time taken: 0.021 seconds, Fetched: 2 row(s)
hive> use testidoall;
OK
Time taken: 0.039 seconds
hive> create external table testtable(uid int,myname string,youname string) row format delimited fields terminated by ',' location '/user/hive/warehouse/testtable';
OK
Time taken: 0.205 seconds
hive> LOAD DATA LOCAL INPATH '/home/hadoop/hive-0.13.1/testdata001.dat' OVERWRITE INTO TABLE testtable;
Copying data from file:/home/hadoop/hive-0.13.1/testdata001.dat
Copying file: file:/home/hadoop/hive-0.13.1/testdata001.dat
Loading data to table testidoall.testtable
rmr: DEPRECATED: Please use 'rm -r' instead.
Deleted hdfs://mycluster/user/hive/warehouse/testtable
Table testidoall.testtable stats: [numFiles=0, numRows=0, totalSize=0, rawDataSize=0]
OK
Time taken: 0.77 seconds
hive> select * from testtable;
OK
12306   mname   yname
10086   myidoall        youidoall
Time taken: 0.279 seconds, Fetched: 2 row(s)
hive>

    <font color=red>至此,hive已经安装完成。</font>
    
    

3.6 hive to hbase(Hive中的表数据导入到Hbase中去)

    1)、创建hbase可以识别的表

root@m1:/home/hadoop# /home/hadoop/hive-0.13.1/bin/hive
14/07/27 11:33:53 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
14/07/27 11:33:53 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
14/07/27 11:33:53 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
14/07/27 11:33:53 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
14/07/27 11:33:53 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
14/07/27 11:33:53 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
14/07/27 11:33:53 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
14/07/27 11:33:53 INFO Configuration.deprecation: mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use mapreduce.job.committer.setup.cleanup.needed

Logging initialized using configuration in jar:file:/home/hadoop/hive-0.13.1/lib/hive-common-0.13.1.jar!/hive-log4j.properties
hive> show databases;
OK
default
testidoall
Time taken: 0.45 seconds, Fetched: 2 row(s)
hive> use testidoall;
OK
Time taken: 0.021 seconds
hive> show tables;
OK
testtable
Time taken: 0.032 seconds, Fetched: 1 row(s)
hive> CREATE TABLE hive2hbase_idoall(key int, value string) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val") TBLPROPERTIES ("hbase.table.name" = "hive2hbase_idoall");
OK
Time taken: 2.332 seconds
hive> show tables;
OK
hive2hbase_idoall
testtable
Time taken: 0.036 seconds, Fetched: 2 row(s)
hive>

    2)、创建本地表,用来存储数据,然后插入到Hbase用的,相当于一张中间表了。同时将之前的测试数据导入到这张中间表。

hive> create table hive2hbase_idoall_middle(foo int,bar string)row format delimited fields terminated by ',';
OK
Time taken: 0.086 seconds
hive> show tables;                                                                                           
OK
hive2hbase_idoall
hive2hbase_idoall_middle
testtable
Time taken: 0.03 seconds, Fetched: 3 row(s)
hive> load data local inpath '/home/hadoop/hive-0.13.1/testdata001.dat' overwrite into table hive2hbase_idoall_middle;
Copying data from file:/home/hadoop/hive-0.13.1/testdata001.dat
Copying file: file:/home/hadoop/hive-0.13.1/testdata001.dat
Loading data to table testidoall.hive2hbase_idoall_middle
rmr: DEPRECATED: Please use 'rm -r' instead.
Deleted hdfs://mycluster/user/hive/warehouse/testidoall.db/hive2hbase_idoall_middle
Table testidoall.hive2hbase_idoall_middle stats: [numFiles=1, numRows=0, totalSize=43, rawDataSize=0]
OK
Time taken: 0.683 seconds    
hive>

    3)、将本地中间表(hive2hbase_idoall_middle)导入到表(hive2hbase_idoall)中,会自动同步到hbase。

hive> insert overwrite table hive2hbase_idoall select * from hive2hbase_idoall_middle;
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1406394452186_0002, Tracking URL = http://m1:8088/proxy/application_1406394452186_0002/
Kill Command = /home/hadoop/hadoop-2.2.0/bin/hadoop job  -kill job_1406394452186_0002
Hadoop job information for Stage-0: number of mappers: 1; number of reducers: 0
2014-07-27 11:44:11,491 Stage-0 map = 0%,  reduce = 0%
2014-07-27 11:44:22,684 Stage-0 map = 100%,  reduce = 0%, Cumulative CPU 1.51 sec
MapReduce Total cumulative CPU time: 1 seconds 510 msec
Ended Job = job_1406394452186_0002
MapReduce Jobs Launched: 
Job 0: Map: 1   Cumulative CPU: 1.51 sec   HDFS Read: 288 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 510 msec
OK
Time taken: 25.613 seconds
hive> select * from hive2hbase_idoall;
OK
10086   myidoall
12306   mname
Time taken: 0.179 seconds, Fetched: 2 row(s)
hive> select * from hive2hbase_idoall_middle;
OK
12306   mname
10086   myidoall
Time taken: 0.088 seconds, Fetched: 2 row(s)
hive>

    4)、用shell连接hbase,查看hive过来的数据是否已经存在

root@m1:/home/hadoop# /home/hadoop/hbase-0.96.2-hadoop2/bin/hbase shell
2014-07-27 11:47:14,454 INFO  [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
HBase Shell; enter 'help' for list of supported commands.
Type "exit" to leave the HBase Shell
Version 0.96.2-hadoop2, r1581096, Mon Mar 24 16:03:18 PDT 2014

hbase(main):001:0> list
TABLE                                                                                                                                                                                                                  
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/hbase-0.96.2-hadoop2/lib/slf4j-log4j12-1.6.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
hive2hbase_idoall                                                                                                                                                                                                      
test_idoall_org                                                                                                                                                                                                        
2 row(s) in 2.9480 seconds

=> ["hive2hbase_idoall", "test_idoall_org"]
hbase(main):002:0> scan "hive2hbase_idoall"
ROW                                                    COLUMN+CELL                                                                                                                                                     
 10086                                                 column=cf1:val, timestamp=1406432660860, value=myidoall                                                                                                         
 12306                                                 column=cf1:val, timestamp=1406432660860, value=mname                                                                                                            
2 row(s) in 0.0540 seconds

hbase(main):003:0> get "hive2hbase_idoall",'12306'
COLUMN                                                 CELL                                                                                                                                                            
 cf1:val                                               timestamp=1406432660860, value=mname                                                                                                                            
1 row(s) in 0.0110 seconds

hbase(main):004:0>

    <font color=red>至此,hive to hbase的测试功能正常。</font>
    
    

3.7 hbase to hive(Hbase中的表数据导入到Hive)

    1)、在hbase下创建表hbase2hive_idoall

root@m1:/home/hadoop# /home/hadoop/hbase-0.96.2-hadoop2/bin/hbase shell
2014-07-27 11:54:25,844 INFO  [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
HBase Shell; enter 'help' for list of supported commands.
Type "exit" to leave the HBase Shell
Version 0.96.2-hadoop2, r1581096, Mon Mar 24 16:03:18 PDT 2014

hbase(main):001:0> create 'hbase2hive_idoall','gid','info'
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/hbase-0.96.2-hadoop2/lib/slf4j-log4j12-1.6.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
0 row(s) in 3.4970 seconds

=> Hbase::Table - hbase2hive_idoall
hbase(main):002:0> put 'hbase2hive_idoall','3344520','info:time','20140704'
0 row(s) in 0.1020 seconds

hbase(main):003:0> put 'hbase2hive_idoall','3344520','info:address','HK'
0 row(s) in 0.0090 seconds

hbase(main):004:0> scan 'hbase2hive_idoall'
ROW                                                    COLUMN+CELL                                                                                                                                                    
 3344520                                               column=info:address, timestamp=1406433302317, value=HK                                                                                                          
 3344520                                               column=info:time, timestamp=1406433297567, value=20140704                                                                                                       
1 row(s) in 0.0330 seconds

hbase(main):005:0>

    2)、Hive下创建表连接Hbase中的表

root@m1:/home/hadoop# /home/hadoop/hive-0.13.1/bin/hive
14/07/27 11:57:20 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
14/07/27 11:57:20 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
14/07/27 11:57:20 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
14/07/27 11:57:20 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
14/07/27 11:57:20 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
14/07/27 11:57:20 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
14/07/27 11:57:20 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
14/07/27 11:57:20 INFO Configuration.deprecation: mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use mapreduce.job.committer.setup.cleanup.needed

Logging initialized using configuration in jar:file:/home/hadoop/hive-0.13.1/lib/hive-common-0.13.1.jar!/hive-log4j.properties
hive> show databases;
OK
default
testidoall
Time taken: 0.449 seconds, Fetched: 2 row(s)
hive> use testidoall;
OK
Time taken: 0.02 seconds
hive> show tables;
OK
hive2hbase_idoall
hive2hbase_idoall_middle
testtable
Time taken: 0.026 seconds, Fetched: 3 row(s)
hive> create external table hbase2hive_idoall (key string,gid map)STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" ="info:")  TBLPROPERTIES ("hbase.table.name" = "hbase2hive_idoall");
OK
Time taken: 1.696 seconds
hive> show tables;                       
OK
hbase2hive_idoall
hive2hbase_idoall
hive2hbase_idoall_middle
testtable
Time taken: 0.034 seconds, Fetched: 4 row(s)
hive> select * from hbase2hive_idoall;
OK
3344520 {"address":"HK","time":"20140704"}
Time taken: 0.701 seconds, Fetched: 1 row(s)
hive>

    <font color=red>至此,如文章标题所描述的ubuntu12.04+hadoop2.2.0+zookeeper3.4.5+hbase0.96.2+hive0.13.1分布式环境部署,全部测试完毕,过程中也遇到了一些坑,会在常见问题中介绍。希望这个测试笔记可以帮助到更多的人。</font>
    

4. 常见问题

  1、过程中如果在hadoop(namenode/datanode/yarn)、hbase、hive启动出现问题时,一定要用tail -n 100 ***.log仔细查看相关的日志,可以发现很多有用的信息。以下几个命令,也有助于在命令行模式追踪错误。
      1)、hadoop在控制台输出debug信息,执行完以下命令后,可以启动namenode,datanode,yarn测试效果

root@m1:/home/hadoop# export HADOOP_ROOT_LOGGER=DEBUG,console

      2)、hive 在控制台输出debug信息

root@m1:/home/hadoop# /home/hadoop/hive-0.13.1/bin/hive --hiveconf  hive.root.logger=DEBUG,console

  2、mysql在启动时,遇到过job failed to start,可以用以下几个命令,重新安装解决。

rm /var/lib/mysql/ -R
rm /etc/mysql/ -R
apt-get autoremove mysql* —purge
apt-get remove apparmor
apt-get install mysql-server mysql-client mysql-common

  3、dpkg 被中断,您必须手工运行 sudo dpkg –configure -a解决此问题

sudo rm /var/lib/dpkg/updates/*
sudo apt-get update
sudo apt-get upgrade

5. 参考资料

  _00018 Hadoop-2.2.0 + Hbase-0.96.2 + Hive-0.13.1 分布式环境整合,Hadoop-2.X使用HA方式
  Hadoop2.2.0源代码编译
  hadoop2.1.0编译安装教程
  CentOS6.4编译Hadoop2.2.0


博文作者:迦壹
博客地址:ubuntu12.04+hadoop2.2.0+zookeeper3.4.5+hbase0.96.2+hive0.13.1伪分布式环境部署
写作时间:2014-08-05 17:41
转载声明:可以转载, 但必须以超链接形式标 明文章原始出处和作者信息及版权声明,谢谢合作!


发表回复

您的电子邮箱地址不会被公开。 必填项已用*标注