HDFS HA automatic failover fails

Error message:

java.lang.RuntimeException: Mismatched address stored in ZK for NameNode at /172.23.6.96:9000: Stored protobuf was nameserviceId: "mycluster"

namenodeId: "nn1"

hdfs-site.xml

<configuration>
        <!-- Logical name of the nameservice (the HA cluster) -->
        <property>
                <name>dfs.nameservices</name>
                <value>mycluster</value>
        </property>

        <!-- IDs of the NameNodes that belong to this nameservice -->
        <property>
                <name>dfs.ha.namenodes.mycluster</name>
                <value>nn1,nn2</value>
        </property>

        <!-- RPC address of nn1 -->
        <property>
                <name>dfs.namenode.rpc-address.mycluster.nn1</name>
                <value>172.23.6.96:9000</value>
        </property>

        <!-- RPC address of nn2 -->
        <property>
                <name>dfs.namenode.rpc-address.mycluster.nn2</name>
                <value>172.23.7.1:9000</value>
        </property>

        <!-- HTTP address of nn1 -->
        <property>
                <name>dfs.namenode.http-address.mycluster.nn1</name>
                <value>172.23.6.96:50070</value>
        </property>

        <!-- HTTP address of nn2 -->
        <property>
                <name>dfs.namenode.http-address.mycluster.nn2</name>
                <value>172.23.7.1:50070</value>
        </property>

        <!-- Location on the JournalNodes where the NameNodes' shared edit log is stored -->
        <property>
                <name>dfs.namenode.shared.edits.dir</name>
                <value>qjournal://172.23.6.96:8485;172.23.6.97:8485;172.23.7.1:8485/mycluster</value>
        </property>


        <!-- Local directory where each JournalNode stores its data -->
        <property>
                <name>dfs.journalnode.edits.dir</name>
                <value>/home/hadoop/data/hadoop/jn</value>
        </property>

        <!-- Disable permission checking -->
        <property>
                <name>dfs.permissions.enabled</name>
                <value>false</value>
        </property>

        <!-- Failover proxy provider: the class HDFS clients use to find the active NameNode of mycluster and to switch over automatically when it fails -->
        <property>
                <name>dfs.client.failover.proxy.provider.mycluster</name>
                <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
        </property>
        <property>
                <name>dfs.namenode.datanode.registration.ip-hostname-check</name>
                <value>false</value>
        </property>
        <property>
                <name>dfs.ha.automatic-failover.enabled</name>
                <value>true</value>
        </property>
        <property>
                <name>dfs.ha.fencing.methods</name>
                <value>sshfence</value>
        </property>
        <property>
                <name>dfs.ha.fencing.ssh.private-key-files</name>
                <value>/home/hadoop/.ssh/id_rsa</value>
        </property>
</configuration>

core-site.xml

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://mycluster</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>172.23.6.96:2181,172.23.6.97:2181,172.23.7.1:2181</value>
  </property>
 <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/data/hadoop/tmp</value>
 </property>
</configuration>

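The configuration files are correct, and after startup the /hadoop-ha/mycluster/ActiveBreadCrumb znode is visible in ZooKeeper.

However, when the NameNode in the active state is stopped manually, no automatic failover takes place, and the ZKFC log reports the error shown above.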

ZooKeeper znode information:

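Cause: the NameNode is registered in the ZooKeeper znode by hostname, so the fix is simply to add entries for the relevant hostnames to the hosts file (/etc/hosts).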
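
One way to confirm this diagnosis is to check whether the hostnames recorded in the znode actually resolve on the NameNode hosts. The following is a minimal sketch, assuming a Linux host where getent is available; check_hosts is a helper defined here for illustration, and the hostnames in the commented usage line are placeholders rather than values from this cluster.

```shell
#!/bin/sh
# Report whether each given hostname resolves on this machine.
# Prints "ok <name>" or "missing <name>" per argument and
# returns non-zero if at least one name does not resolve.
check_hosts() {
    rc=0
    for name in "$@"; do
        if getent hosts "$name" >/dev/null 2>&1; then
            echo "ok $name"
        else
            echo "missing $name"
            rc=1
        fi
    done
    return $rc
}

# Placeholder hostnames (assumptions): replace them with the names
# that appear in the znode, then run on every NameNode host.
# check_hosts nn1-host nn2-host
```

Any name reported as missing is a candidate for a new line in /etc/hosts.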

========================================================================

Error message:

2022-07-20 10:44:32,301 WARN org.apache.hadoop.ha.NodeFencer: Fencing method org.apache.hadoop.ha.SshFenceByTcpPort(null) was unsuccessful.

2022-07-20 10:44:32,301 ERROR org.apache.hadoop.ha.NodeFencer: Unable to fence service by any configured method.

2022-07-20 10:44:32,302 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election

java.lang.RuntimeException: Unable to fence NameNode at n4/10.241.241.111:9000

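Cause:

  1. dfs.ha.fencing.methods is set to sshfence, which relies on the fuser command. Installing it with the command below is enough; it must be installed on both NameNode hosts.

  2. A second cause is that passwordless SSH is not configured on the NameNode hosts. Every NameNode host needs passwordless SSH access to the other machines.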
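
Both conditions can be checked before ZKFC ever has to fence a node. The sketch below is one possible preflight check, not part of Hadoop: have_cmd and check_fencing_prereqs are helper names introduced here, and the key path in the usage line is assumed to match the value of dfs.ha.fencing.ssh.private-key-files.

```shell
#!/bin/sh
# Succeed only if the named command can be found on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Report the two sshfence prerequisites described above.
# $1 is the private key file that the fencing method is configured to use.
check_fencing_prereqs() {
    key=$1
    rc=0
    if have_cmd fuser; then
        echo "fuser: found"
    else
        echo "fuser: missing"
        rc=1
    fi
    if [ -r "$key" ]; then
        echo "key: readable"
    else
        echo "key: not readable"
        rc=1
    fi
    return $rc
}

# Assumed key path; run as the user that starts ZKFC on each NameNode host.
# check_fencing_prereqs /home/hadoop/.ssh/id_rsa
```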

Solution

Install psmisc

yum -y install psmisc

Configure passwordless SSH
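
The original note does not list the commands for this step, so the following is only a sketch of a common way to do it with OpenSSH. It assumes the hadoop user and the two NameNode addresses from the first configuration in this post; ensure_key is a helper defined here, and the hosts and user should be adjusted to the cluster being repaired.

```shell
#!/bin/sh
# Create an RSA key pair at the given path unless a key already exists there.
# The passphrase is empty so that sshfence can log in non-interactively.
ensure_key() {
    key=$1
    if [ ! -f "$key" ]; then
        mkdir -p "$(dirname "$key")"
        ssh-keygen -q -t rsa -N "" -f "$key"
    fi
}

# Assumed user and addresses; run these on each NameNode host:
#   ensure_key "$HOME/.ssh/id_rsa"
#   ssh-copy-id -i "$HOME/.ssh/id_rsa.pub" hadoop@172.23.6.96
#   ssh-copy-id -i "$HOME/.ssh/id_rsa.pub" hadoop@172.23.7.1
#
# Verify that no password prompt is needed (BatchMode fails instead of asking):
#   ssh -o BatchMode=yes hadoop@172.23.7.1 true
```

The key file used here should be the same one named in dfs.ha.fencing.ssh.private-key-files.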


https://www.hechunyu.com/archives/1698216680857
Author: chunyu
Published: 2023-07-25