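Using the Tez engine with Hive
Overview:
Hive's default execution engine is MapReduce (MR), and MR's design, which writes intermediate results to disk between jobs, makes queries slower than most users would like. These notes switch Hive's execution engine to Tez (Tez must be installed first). Tez runs a query as a single DAG and avoids much of that intermediate I/O, so Hive queries generally run faster.
Official reference for tez-site.xml parameters: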
https://tez.apache.org/releases/0.9.2/tez-api-javadocs/configs/TezConfiguration.html
hadoop:2.7.2
tez:0.9.2
hive:1.2.0
tomcat:8.0.1
jdk1.8.0_144
Download the Tez release package from http://tez.apache.org
Unpack it:
apache-tez-0.9.2-bin
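The unpack step can be sketched as follows. This is a minimal sketch, assuming the release was downloaded as apache-tez-0.9.2-bin.tar.gz and should live under /home/hadoop (the location TEZ_HOME points to in these notes). The function name unpack_tez is a hypothetical helper, not part of Tez.

```shell
# Hypothetical helper: unpack a Tez binary tarball into a target directory.
# $1 = path to the tarball, $2 = directory to unpack into.
unpack_tez() {
    tarball=$1
    dest=$2
    mkdir -p "$dest" || return 1
    # -x extract, -z gunzip, -f archive file, -C switch to the target directory first
    tar -xzf "$tarball" -C "$dest"
}

# Example (paths are assumptions; adjust to your layout):
# unpack_tez apache-tez-0.9.2-bin.tar.gz /home/hadoop
# ls /home/hadoop/apache-tez-0.9.2-bin
```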
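Configure the Tez engine
Add the following to the Hive configuration (the property goes in hive-site.xml; the exports typically go in conf/hive-env.sh)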
<property>
<name>hive.execution.engine</name>
<value>tez</value>
</property>
# Adjust these paths to your own installation
export HADOOP_HOME=/home/hadoop/hadoop-2.7.2
export HIVE_CONF_DIR=/home/hadoop/apache-hive-1.2.0-bin/conf
export TEZ_HOME=/home/hadoop/apache-tez-0.9.2-bin
# Collect every Tez jar (top level and lib/) into a colon-separated list
TEZ_JARS=""
for jar in "$TEZ_HOME"/*.jar "$TEZ_HOME"/lib/*.jar; do
  [ -e "$jar" ] || continue
  TEZ_JARS=$TEZ_JARS:$jar
done
# Strip the leading colon
export TEZ_JARS=${TEZ_JARS#:}
export HIVE_AUX_JARS_PATH=$TEZ_JARS
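A wrong TEZ_HOME leaves stale or missing entries in HIVE_AUX_JARS_PATH, so a quick sanity check of the list can help. The sketch below assumes a colon-separated list like the one built above; missing_jars is a hypothetical helper name.

```shell
# Hypothetical helper: print every entry of a colon-separated jar list
# that does not exist as a regular file. Prints nothing when all entries exist.
missing_jars() {
    # Run in a subshell so the IFS and globbing changes do not leak out.
    (
        IFS=:
        set -f              # disable globbing while splitting on ':'
        for jar in $1; do
            [ -n "$jar" ] || continue
            [ -f "$jar" ] || printf '%s\n' "$jar"
        done
    )
}

# Example:
# missing_jars "$HIVE_AUX_JARS_PATH"
```

An empty result means every listed jar was found on the local file system.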
Create a tez-site.xml configuration file (here under Hive's conf directory) and symlink it into Hadoop's etc/hadoop directory
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>tez.lib.uris</name>
<value>${fs.defaultFS}/user/tez/apache-tez-0.9.2-bin.tar.gz</value>
</property>
<property>
<name>tez.use.cluster.hadoop-libs</name>
<value>true</value>
</property>
<property>
<description>Enable Tez to use the Timeline Server for History Logging</description>
<name>tez.history.logging.service.class</name>
<value>org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService</value>
</property>
<property>
<name>hive.execution.engine</name>
<value>tez</value>
</property>
</configuration>
[hadoop@n1 hadoop]$ ln -s /home/hadoop/apache-hive-1.2.0-bin/conf/tez-site.xml /home/hadoop/hadoop-2.7.2/etc/hadoop/
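To confirm which value a configuration file actually carries, for example tez.lib.uris before uploading the tarball, the file can be queried from the shell. This is a rough sketch, not a real XML parser: it assumes each <name> and <value> element sits on a single line with <name> before <value>, as in the tez-site.xml above. get_conf_value is a hypothetical helper name.

```shell
# Hypothetical helper: print the <value> that follows a given <name> in a
# Hadoop-style configuration file.
# $1 = configuration file, $2 = property name.
get_conf_value() {
    file=$1
    name=$2
    awk -v want="$name" '
        # Remember whether the most recent <name> matched the requested key.
        /<name>/ {
            n = $0
            sub(/.*<name>[ \t]*/, "", n)
            sub(/[ \t]*<\/name>.*/, "", n)
            found = (n == want)
        }
        # Print the first <value> seen after a matching <name>, then stop.
        found && /<value>/ {
            v = $0
            sub(/.*<value>/, "", v)
            sub(/<\/value>.*/, "", v)
            print v
            exit
        }
    ' "$file"
}

# Example:
# get_conf_value /home/hadoop/hadoop-2.7.2/etc/hadoop/tez-site.xml tez.lib.uris
```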
yarn-site.xml: disable the virtual memory check
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
mapred-site.xml: modify or add the following (optional; once this is set, submitting plain MR jobs fails with an error)
<property>
<name>mapreduce.framework.name</name>
<value>yarn-tez</value>
</property>
Upload the Tez tarball to HDFS, at the location given by tez.lib.uris in tez-site.xml
[hadoop@n1 apache-hive-1.2.0-bin]$ hadoop fs -put apache-tez-0.9.2-bin.tar.gz /user/tez/
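If /user/tez does not exist yet, the upload needs the directory created first, and it is worth listing the result afterwards. A minimal sketch, assuming the standard hadoop fs subcommands -mkdir -p, -put -f and -ls: upload_tez and the HADOOP_CMD variable are hypothetical names introduced here, and setting HADOOP_CMD=echo prints the commands instead of running them.

```shell
# Hypothetical helper: upload the Tez tarball to HDFS and list the result.
# $1 = local tarball, $2 = HDFS target directory.
# HADOOP_CMD is an assumption of this sketch: it defaults to "hadoop" and can
# be set to "echo" for a dry run.
upload_tez() {
    tarball=$1
    hdfs_dir=$2
    hadoop_cmd=${HADOOP_CMD:-hadoop}
    # Create the target directory (and parents) if it does not exist yet.
    "$hadoop_cmd" fs -mkdir -p "$hdfs_dir" || return 1
    # -f overwrites a tarball left over from an earlier upload.
    "$hadoop_cmd" fs -put -f "$tarball" "$hdfs_dir/" || return 1
    # Confirm the file is where tez.lib.uris expects it.
    "$hadoop_cmd" fs -ls "$hdfs_dir"
}

# Example:
# upload_tez apache-tez-0.9.2-bin.tar.gz /user/tez
```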
Start Hive and test
Start the metastore
nohup bin/hive --service metastore -p 9083 &
Start HiveServer2
nohup hive --service hiveserver2 --hiveconf hive.server2.thrift.port=50000 &
Connect with beeline
beeline -u jdbc:hive2://n1:50000 -n hadoop
On startup, the Tez engine jars should load without errors
Check the current default engine
set hive.execution.engine;
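The same check can be scripted rather than typed into an interactive session. The sketch below assumes the output of the set command contains a line of the form hive.execution.engine=<name>, possibly wrapped in beeline's table borders; engine_from_output is a hypothetical helper name.

```shell
# Hypothetical helper: read the output of `set hive.execution.engine;` on
# stdin and print just the engine name (for example "tez" or "mr").
engine_from_output() {
    sed -n 's/.*hive\.execution\.engine=\([A-Za-z]*\).*/\1/p'
}

# Example (host, port and user as used in these notes):
# beeline -u jdbc:hive2://n1:50000 -n hadoop -e 'set hive.execution.engine;' | engine_from_output
```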
Test
Insert a row; if the statement runs as a Tez job and completes, the setup works
Set up tez-ui (optional; configure as needed)
Install Tomcat
https://archive.apache.org/dist/tomcat/
Copy tez-ui-0.9.2.war from the Tez directory into Tomcat's webapps/tez-ui directory, unpack it there, and edit the configuration file
configs.js
Uncomment the entries and set the corresponding addresses
yarn-site.xml: enable the timeline server
<!-- conf timeline server START -->
<property>
<name>yarn.timeline-service.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.timeline-service.hostname</name>
<value>n1</value>
</property>
<property>
<name>yarn.timeline-service.http-cross-origin.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.system-metrics-publisher.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.timeline-service.generic-application-history.enabled</name>
<value>true</value>
</property>
<property>
<description>Address for the Timeline server to start the RPC server.</description>
<name>yarn.timeline-service.address</name>
<value>n1:10201</value>
</property>
<property>
<description>The http address of the Timeline service web application.</description>
<name>yarn.timeline-service.webapp.address</name>
<value>n1:8188</value>
</property>
<property>
<description>The https address of the Timeline service web application.</description>
<name>yarn.timeline-service.webapp.https.address</name>
<value>n1:2191</value>
</property>
<property>
<name>yarn.timeline-service.handler-thread-count</name>
<value>24</value>
</property>
<!-- conf timeline server END -->
Edit the Tez configuration file (tez-site.xml) and add:
<!--Configuring Tez to use YARN Timeline-->
<property>
<name>tez.history.logging.service.class</name>
<value>org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService</value>
</property>
<property>
<name>tez.tez-ui.history-url.base</name>
<value>http://n1:8880/tez-ui/</value>
</property>
Restart YARN and start the timeline server
[hadoop@n1 ~]$ stop-yarn.sh
[hadoop@n1 ~]$ start-yarn.sh
[hadoop@n1 ~]$ yarn-daemon.sh start timelineserver
Start Tomcat
[hadoop@n1 apache-tomcat-8.0.1]$ bin/startup.sh
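Test:
Run a Hive query, then open the UI address
UI address: http://n1:8880/tez-ui/ (the value of tez.tez-ui.history-url.base)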
Problems encountered
Add an entry for the host name hadoop to the hosts file on the NameNode machine
=========================================================================
Fix:
Add the following to hive-site.xml.
<property>
<name>hive.tez.container.size</name>
<value>1024</value>
</property>
=========================================================================
Hive's
mapreduce.task.io.sort.mb