Running Hive on the Tez Engine

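Overview:

Hive's default execution engine is MapReduce (MR), and the way MR works makes it slower than most people would like. Here we switch the execution engine to Tez (which has to be installed first). Tez keeps more of the computation in memory, so Hive queries can run more efficiently.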

Official reference for the tez-site.xml parameters:

https://tez.apache.org/releases/0.9.2/tez-api-javadocs/configs/TezConfiguration.html

hadoop:2.7.2
tez:0.9.2
hive:1.2.0
tomcat:8.0.1
jdk1.8.0_144

Download the Tez binary package from http://tez.apache.org

Extract it:

apache-tez-0.9.2-bin

Configuring the Tez engine

Add the following to the Hive configuration (hive-site.xml):

<property>
  <name>hive.execution.engine</name>
  <value>tez</value>
</property>

Then export the following environment variables, for example in conf/hive-env.sh (adjust the paths to your installation):

export HADOOP_HOME=/home/hadoop/hadoop-2.8.5
export HIVE_CONF_DIR=/home/hadoop/apache-hive-2.3.9-bin/conf
export TEZ_HOME=/home/hadoop/apache-tez-0.9.2-bin
# Collect the Tez jars (top level and lib/) into a colon-separated list.
export TEZ_JARS=""
for jar in "$TEZ_HOME"/*.jar "$TEZ_HOME"/lib/*; do
    export TEZ_JARS=$TEZ_JARS:$jar
done
# Strip the leading colon.
export TEZ_JARS=${TEZ_JARS:1}
export HIVE_AUX_JARS_PATH=$TEZ_JARS
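
A wrong TEZ_HOME produces a jar list full of paths that do not exist, and Hive only complains later. The sketch below is one way to check the list right after building it. The helper name check_jar_list is made up for this example and is not part of Hive or Tez.

```shell
# Hypothetical helper (not part of Hive or Tez): print every entry of a
# colon-separated path list that does not exist on disk.
# The body runs in a subshell, so the IFS change does not leak out;
# "exit" therefore only ends the subshell and sets the function's status.
check_jar_list() (
    missing=0
    set -f          # no glob expansion while splitting the list
    IFS=:
    for entry in $1; do
        [ -z "$entry" ] && continue
        if [ ! -e "$entry" ]; then
            echo "missing: $entry"
            missing=1
        fi
    done
    exit "$missing"
)

# Usage, once the variables above are set:
#   check_jar_list "$HIVE_AUX_JARS_PATH" && echo "all Tez jars found"
```

The function returns non-zero as soon as at least one entry is missing, so it can also be used as a guard in a startup script.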

Create a tez-site.xml configuration file and symlink it into Hadoop's etc/hadoop directory:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>tez.lib.uris</name>
        <value>${fs.defaultFS}/user/tez/apache-tez-0.9.2-bin.tar.gz</value>
    </property>
    <property>
        <name>tez.use.cluster.hadoop-libs</name>
        <value>true</value>
    </property>
    <property>
        <description>Enable Tez to use the Timeline Server for History Logging</description>
        <name>tez.history.logging.service.class</name>
        <value>org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService</value>
    </property>
    <property>
        <name>hive.execution.engine</name>
        <value>tez</value>
    </property>
</configuration>
[hadoop@n1 hadoop]$ ln -s /home/hadoop/apache-hive-1.2.0-bin/conf/tez-site.xml /home/hadoop/hadoop-2.7.2/etc/hadoop/
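
When a configuration file is edited by hand, it is easy to define the same property twice without noticing, and then only one of the values takes effect. A minimal sketch for spotting repeated names follows. The helper name dup_property_names is hypothetical, and it assumes every <name> element sits on a single line.

```shell
# Hypothetical helper: list property names that occur more than once
# in a Hadoop-style XML configuration file.
# Assumes each <name>...</name> element is written on one line.
dup_property_names() {
    grep -o '<name>[^<]*</name>' "$1" \
        | sed -e 's/<[^>]*>//g' -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' \
        | sort | uniq -d
}

# Usage:
#   dup_property_names /home/hadoop/apache-hive-1.2.0-bin/conf/tez-site.xml
```

No output means no name is repeated. Surrounding whitespace inside <name> is trimmed before comparing.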

In yarn-site.xml, disable the virtual memory check:

<property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
</property>

In mapred-site.xml, change or add the following (optional; once this is set, submitting plain MR jobs fails with an error):

<property>
        <name>mapreduce.framework.name</name>
        <value>yarn-tez</value>
</property>

Upload the Tez tarball to HDFS, at the location given by tez.lib.uris in tez-site.xml:

[hadoop@n1 apache-hive-1.2.0-bin]$ hadoop fs -put apache-tez-0.9.2-bin.tar.gz /user/tez/
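
The put fails if /user/tez does not exist yet, so it can help to create the directory first and list it afterwards. The sketch below wraps those steps. The names TEZ_TARBALL, HDFS_TEZ_DIR, DRY_RUN, run and upload_tez are illustrative only; the hadoop fs subcommands are the standard ones.

```shell
# Illustrative names, not part of Hadoop or Tez: TEZ_TARBALL, HDFS_TEZ_DIR,
# DRY_RUN, run, upload_tez.
TEZ_TARBALL=${TEZ_TARBALL:-apache-tez-0.9.2-bin.tar.gz}
HDFS_TEZ_DIR=${HDFS_TEZ_DIR:-/user/tez}

# Print the command when DRY_RUN is set, otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Create the target directory, upload (overwriting an older copy), then list it.
upload_tez() {
    run hadoop fs -mkdir -p "$HDFS_TEZ_DIR" &&
    run hadoop fs -put -f "$TEZ_TARBALL" "$HDFS_TEZ_DIR/" &&
    run hadoop fs -ls "$HDFS_TEZ_DIR"
}

# Usage:
#   DRY_RUN=1 upload_tez   # only show the commands
#   upload_tez             # run them against the cluster
```

The resulting HDFS path has to match the value of tez.lib.uris exactly, including the file name.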

Start Hive and test

Start the metastore:

nohup bin/hive --service metastore -p 9083 &

Start HiveServer2:

nohup hive --service hiveserver2 --hiveconf hive.server2.thrift.port=50000 &

Connect with beeline:

beeline -u jdbc:hive2://n1:50000 -n hadoop

On startup, check that the Tez jars are loaded correctly.

Check the current default engine:

set hive.execution.engine;
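
The same check can be scripted. The sketch below pulls the engine name out of whatever beeline prints; it assumes the value shows up as hive.execution.engine=<name> somewhere in the output, and the helper name engine_from_output is made up for this example.

```shell
# Hypothetical helper: extract the engine name from beeline's output.
# Assumes the value is printed as "hive.execution.engine=<name>".
engine_from_output() {
    sed -n 's/.*hive\.execution\.engine=\([A-Za-z0-9_-]*\).*/\1/p' | head -n 1
}

# Usage (same connection details as the beeline command above):
#   beeline -u jdbc:hive2://n1:50000 -n hadoop \
#       -e 'set hive.execution.engine;' 2>/dev/null | engine_from_output
```

If the function prints tez, the setting was picked up; an empty result means the line was not found in the output.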

Test

Insert a row; if the statement completes and the output shows it running as a Tez job, the setup works.
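
A small smoke test along these lines could look like the sketch below. The table name tez_smoke_test is invented for illustration, and the connection details are the ones used earlier; treat it as a starting point rather than the exact statements used in this setup.

```shell
# Hypothetical smoke test: the table name tez_smoke_test is made up.
# Prints HiveQL on stdout so it can be reviewed or saved before running.
smoke_test_sql() {
    cat <<'EOF'
SET hive.execution.engine=tez;
CREATE TABLE IF NOT EXISTS tez_smoke_test (id INT, name STRING);
INSERT INTO TABLE tez_smoke_test VALUES (1, 'tez');
SELECT COUNT(*) FROM tez_smoke_test;
EOF
}

# Usage:
#   smoke_test_sql > /tmp/tez_smoke_test.sql
#   beeline -u jdbc:hive2://n1:50000 -n hadoop -f /tmp/tez_smoke_test.sql
```

The INSERT is the statement that has to launch a job, so that is where any Tez problem would surface.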

Setting up tez-ui (optional, configure as needed)

  1. Install Tomcat

https://archive.apache.org/dist/tomcat/

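Copy tez-ui-0.9.2.war from the Tez directory into Tomcat's webapps/tez-ui directory, unpack it there, and edit the configuration file:

configs.js

Uncomment the relevant entries and change the addresses to match your cluster.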

  2. In yarn-site.xml, enable the timeline server

<!-- conf timeline server START -->
   <property>
        <name>yarn.timeline-service.enabled</name>
        <value>true</value>
   </property>
   <property>
        <name>yarn.timeline-service.hostname</name>
        <value>n1</value>
   </property>
   <property>
        <name>yarn.timeline-service.http-cross-origin.enabled</name>
        <value>true</value>
   </property>
   <property>
        <name>yarn.resourcemanager.system-metrics-publisher.enabled</name>
        <value>true</value>
   </property>
   <property>
        <name>yarn.timeline-service.generic-application-history.enabled</name>
        <value>true</value>
   </property>
   <property>
        <description>Address for the Timeline server to start the RPC server.</description>
        <name>yarn.timeline-service.address</name>
        <value>n1:10201</value>
   </property>
   <property>
        <description>The http address of the Timeline service web application.</description>
        <name>yarn.timeline-service.webapp.address</name>
        <value>n1:8188</value>
   </property>
   <property>
        <description>The https address of the Timeline service web application.</description>
        <name>yarn.timeline-service.webapp.https.address</name>
        <value>n1:2191</value>
   </property>
   <property>
        <name>yarn.timeline-service.handler-thread-count</name>
        <value>24</value>
   </property>
<!-- conf timeline server END -->
  3. Edit the Tez configuration file (tez-site.xml) and add the following

<!--Configuring Tez to use YARN Timeline-->
    <property>
        <name>tez.history.logging.service.class</name>
        <value>org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService</value>
    </property>
    <property>
        <name>tez.tez-ui.history-url.base</name>
        <value>http://n1:8080/tez-ui/</value>
    </property>
  4. Restart YARN and start the timeline server

[hadoop@n1 ~]$ stop-yarn.sh
[hadoop@n1 ~]$ start-yarn.sh
[hadoop@n1 ~]$ yarn-daemon.sh start timelineserver
  5. Start Tomcat

[hadoop@n1 apache-tomcat-8.0.1]$ bin/startup.sh

Test:

Run a Hive query, then open the UI.

UI address:

http://n1:8080/tez-ui/
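
If the page stays empty, it is worth checking that both the timeline server and Tomcat answer over HTTP. A rough sketch follows, assuming the host and ports configured above. The variable and function names are made up for this example; /ws/v1/timeline/ is the timeline server's REST path and TEZ_DAG_ID is the entity type Tez uses for DAGs.

```shell
# Defaults taken from the configuration above; override as needed.
TIMELINE_URL=${TIMELINE_URL:-http://n1:8188}
TEZ_UI_URL=${TEZ_UI_URL:-http://n1:8080/tez-ui/}

# Build the timeline REST URL for one entity type, limited to one result.
timeline_entities_url() {
    echo "${TIMELINE_URL%/}/ws/v1/timeline/$1?limit=1"
}

# Print only the HTTP status code of a GET request to the given URL.
http_status() {
    curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$1"
}

# Usage:
#   http_status "$(timeline_entities_url TEZ_DAG_ID)"
#   http_status "$TEZ_UI_URL"
```

A 200 from both suggests the services are reachable; the UI only lists DAGs that ran after history logging to the timeline server was enabled.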

Problems encountered

Add the hostname hadoop to the hosts file on the NameNode machine.

=========================================================================

Solution:

Add the following to hive-site.xml.

<property>
    <name>hive.tez.container.size</name>
    <value>1024</value>
</property>
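
When changing the container size, the JVM heap (hive.tez.java.opts) usually has to fit inside it. The sketch below works out a heap value from a container size using an 80% ratio. That ratio is a common rule of thumb and an assumption here, not something this setup prescribes, and suggest_heap_mb is a made-up helper name.

```shell
# Hypothetical helper: suggest a -Xmx value (in MB) for a Tez container.
# The 80% ratio is a rule of thumb, not a value required by Hive or Tez.
suggest_heap_mb() {
    echo $(( $1 * 80 / 100 ))
}

container_mb=1024
echo "hive.tez.container.size=${container_mb}"
echo "hive.tez.java.opts=-Xmx$(suggest_heap_mb "$container_mb")m"
```

For the 1024 MB container above this prints a heap of 819 MB (integer division of 81920 by 100).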

=========================================================================

Hive's

mapreduce.task.io.sort.mb


Source: https://www.hechunyu.com/archives/1698222321347
Author: chunyu
Published: 2022-08-25