Hadoop 2.2.0 Distributed Configuration
1. System Environment:
IP               Account/Hostname    Roles       Operating System
192.168.25.150   hadoop@hadoopm      nn/rm/snn   Red Hat Enterprise Linux 6.0
192.168.25.151   hadoop@hadoopd1     dn/nm       Red Hat Enterprise Linux 6.0
192.168.25.152   hadoop@hadoopd2     dn/nm       Red Hat Enterprise Linux 6.0
(nn = NameNode, rm = ResourceManager, snn = SecondaryNameNode, dn = DataNode, nm = NodeManager)
2. Configure Hosts:
vi /etc/hosts
192.168.25.150 hadoopm
192.168.25.151 hadoopd1
192.168.25.152 hadoopd2
Comment out the localhost entries:
#127.0.0.1 localhost.localdomain localhost
#::1 localhost6.localdomain6 localhost6
Once edited, copy this file verbatim to the corresponding location on the other hosts, e.g. as sketched below.
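A minimal sketch for distributing the file (assuming root login is permitted on the other nodes; adjust the user otherwise):
scp /etc/hosts root@hadoopd1:/etc/hosts
scp /etc/hosts root@hadoopd2:/etc/hosts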
3. Configure a Static IP:
Check the current IP:
ifconfig
Set a static IP:
vi /etc/sysconfig/network
NETWORKING=yes
NETWORKING_IPV6=no
HOSTNAME=hadoopm #hostname
DEVICE=eth0 #network interface
ONBOOT=yes #activate on boot
BOOTPROTO=static #use a static IP
IPADDR=192.168.25.150 #this machine's IP address
NETMASK=255.255.255.0 #subnet mask
GATEWAY=192.168.25.255 #gateway
You can also edit the configuration of a specific interface directly:
vi /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0 #network interface
ONBOOT=yes #activate on boot
BOOTPROTO=static #use a static IP
IPADDR=192.168.25.150 #this machine's IP address
NETMASK=255.255.255.0 #subnet mask
GATEWAY=192.168.25.255 #gateway
Apply the configuration:
/etc/init.d/network restart
After one machine is done, configure the others the same way, adjusting HOSTNAME and IPADDR to match the machine being configured; see the sketch below.
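For example, on hadoopd1 only the host-specific lines differ (a sketch; hadoopd2 is analogous with 192.168.25.152):
HOSTNAME=hadoopd1
IPADDR=192.168.25.151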
4. JDK Setup
4.1 Copy jdk-7u45-linux-i586.tar.gz to the /usr directory and extract it:
tar -xzf jdk-7u45-linux-i586.tar.gz
Once set up, copy it directly to the other machines with:
scp -r /usr/jdk1.7.0_45 hadoop@192.168.25.151:/usr
scp -r /usr/jdk1.7.0_45 hadoop@192.168.25.152:/usr
4.2 Edit the environment variables:
vi /etc/profile
Append the following at the end of the file (press i to enter insert mode):
JAVA_HOME=/usr/jdk1.7.0_45
JRE_HOME=/usr/jdk1.7.0_45/jre
PATH=$JAVA_HOME/bin:$PATH
CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export JAVA_HOME
export JRE_HOME
export PATH
export CLASSPATH
Save and exit (press Esc, then :wq! to save and quit, or :q! to quit without saving), and finally run the command below to apply the configuration:
source /etc/profile
Verify the installation:
java -version
Once this machine is configured, repeat the same steps on the others, e.g. as sketched below.
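A minimal sketch for propagating the same settings (assuming root access to /etc/profile on the remote hosts):
scp /etc/profile root@hadoopd1:/etc/profile
scp /etc/profile root@hadoopd2:/etc/profile
Then run source /etc/profile and java -version on each node to confirm.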
5. Passwordless SSH Login
cd /home/hadoop/.ssh #if the .ssh directory does not exist yet, ssh hadoop@hadoopm once to create it
ssh-keygen -t rsa
cat id_rsa.pub >> authorized_keys
Restart sshd so the change takes effect:
/etc/init.d/sshd restart
Repeat the same steps on the other machines.
Finally, merge the three public-key entries from the three machines into a single authorized_keys file and copy it to the same location on all three machines (in other words, append each machine's .ssh/id_rsa.pub to the .ssh/authorized_keys of the other two), for example as sketched below.
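A minimal sketch, run from hadoopm while password authentication still works; repeat from hadoopd1 and hadoopd2 with the hostnames adjusted:
# append this machine's public key to the other nodes
cat ~/.ssh/id_rsa.pub | ssh hadoop@hadoopd1 'cat >> ~/.ssh/authorized_keys'
cat ~/.ssh/id_rsa.pub | ssh hadoop@hadoopd2 'cat >> ~/.ssh/authorized_keys'
# verify afterwards: ssh hadoopd1 should log in without a password prompt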
6. Disable the Firewall and SELinux
Run the text-mode setup tool and disable the firewall there:
setup
SELinux can also be disabled by editing its configuration file:
vi /etc/selinux/config
SELINUX=disabled
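Alternatively, a sketch using the standard RHEL 6 commands (run as root on every node):
service iptables stop #stop the firewall immediately
chkconfig iptables off #keep it disabled across reboots
setenforce 0 #switch SELinux to permissive mode until the next reboot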
7. Install Hadoop
7.1 Upload hadoop-2.2.0.tar.gz to /home/hadoop:
cd /home/hadoop
tar -zxf hadoop-2.2.0.tar.gz
7.2 Create the working directories:
mkdir dfs
mkdir dfs/name
mkdir dfs/data
mkdir temp
mkdir temp/dfs
mkdir yarn
mkdir yarn/app-logs
mkdir yarn/log
7.3 Edit the configuration files:
cd hadoop-2.2.0/etc/hadoop
7.3.1 Edit core-site.xml:
vi core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoopm:9000</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://hadoopm:9000</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/home/hadoop/temp</value>
<description>Abase for other temporary directories.</description> </property>
<property>
<name>hadoop.proxyuser.hduser.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hduser.groups</name>
<value>*</value>
</property>
</configuration>
7.3.2 Edit hadoop-env.sh:
vi hadoop-env.sh
export JAVA_HOME=/usr/jdk1.7.0_45
7.3.3 Edit hdfs-site.xml:
vi hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hadoopm:9001</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/hadoop/dfs/name</value>
<description> </description>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hadoop/dfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<!--property>
<name>dfs.datanode.du.reserved</name>
<value>1073741824</value>
</property>
<property>
<name>dfs.block.size</name>
<value>134217728</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<property>
<name>dfs.support.append</name>
<value>true</value>
</property>
<property>
<name>dfs.datanode.max.xcievers</name>
<value>4096</value>
</property-->
</configuration>
7.3.4 Edit mapred-site.xml (in 2.2.0 this file may ship as mapred-site.xml.template; copy it to mapred-site.xml first if needed):
vi mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoopm:10020</value>
</property>
<property>
<name>mapred.job.tracker</name>
<value>hadoopm:54311</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoopm:19888</value>
</property>
</configuration>
7.3.5 Edit slaves:
vi slaves
hadoopd1
hadoopd2
7.3.6 Edit yarn-env.sh:
vi yarn-env.sh
export JAVA_HOME=/usr/jdk1.7.0_45
7.3.7 Edit yarn-site.xml:
vi yarn-site.xml
<?xml version="1.0"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>hadoopm:8031</value>
<description>host is the hostname of the resource manager and
port is the port on which the NodeManagers contact the Resource Manager.</description>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>hadoopm:8030</value>
<description>host is the hostname of the resourcemanager and port is the port
on which the Applications in the cluster talk to the Resource Manager.
</description>
</property>
<property>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
<description>In case you do not want to use the default
scheduler</description>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>hadoopm:8032</value>
<description>the host is the hostname of the ResourceManager and the port is the port on
which the clients can talk to the Resource Manager.</description>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>hadoopm:8033</value>
</property>
<!--property>
<name>yarn.resourcemanager.webapp.address</name>
<value>hadoopm:8088</value>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/home/hadoop/yarn/node</value>
<description>the local directories used by the nodemanager</description>
</property>
<property>
<name>yarn.nodemanager.address</name>
<value>hadoopm:8994</value>
<description>the nodemanagers bind to this port</description>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>200</value>
<description>the amount of memory on the NodeManager in MB</description>
</property>
<property>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>/home/hadoop/yarn/app-logs</value>
<description>directory on hdfs where the application logs are moved to </description>
</property>
<property>
<name>yarn.nodemanager.log-dirs</name>
<value>/home/hadoop/yarn/node</value>
<description>the directories used by Nodemanagers as log
directories</description>
</property-->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
<description>shuffle service that needs to be set for Map Reduce to run </description>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
7.4 Copy the configuration and directories to the other machines:
scp -r /home/hadoop/hadoop-2.2.0 hadoop@192.168.25.151:/home/hadoop
scp -r /home/hadoop/dfs hadoop@192.168.25.151:/home/hadoop
scp -r /home/hadoop/temp hadoop@192.168.25.151:/home/hadoop
scp -r /home/hadoop/yarn hadoop@192.168.25.151:/home/hadoop
scp -r /home/hadoop/hadoop-2.2.0 hadoop@192.168.25.152:/home/hadoop
scp -r /home/hadoop/dfs hadoop@192.168.25.152:/home/hadoop
scp -r /home/hadoop/temp hadoop@192.168.25.152:/home/hadoop
scp -r /home/hadoop/yarn hadoop@192.168.25.152:/home/hadoop
7.5 Format HDFS on the hadoopm master:
cd /home/hadoop/hadoop-2.2.0
./bin/hdfs namenode -format
7.6 Start Hadoop:
./sbin/start-all.sh
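Note that start-all.sh is deprecated in Hadoop 2.x; an equivalent sketch starts HDFS and YARN separately and then checks the running daemons:
./sbin/start-dfs.sh
./sbin/start-yarn.sh
jps #expect NameNode, SecondaryNameNode and ResourceManager on hadoopm; DataNode and NodeManager on the data nodes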
7.7 Check the cluster status:
./bin/hdfs dfsadmin -report
Inspect the file-block composition:
./bin/hdfs fsck / -files -blocks
View the HDFS web UI:
http://192.168.25.150:50070
View the ResourceManager web UI:
http://192.168.25.150:8088
7.8 Run a sample job:
First create a directory on HDFS:
./bin/hdfs dfs -mkdir /input
./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar randomwriter input
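As a further smoke test, a sketch running the bundled wordcount example (the paths here are illustrative):
./bin/hdfs dfs -put etc/hadoop/core-site.xml /input
./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /input /output
./bin/hdfs dfs -cat /output/part-r-00000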