Troubleshooting: Vertica database fails to start after reinstalling Vertica and creating the database

Environment: RHEL 6.5 + Vertica 7.1.0-3

1. Symptoms

2. Reinstalling the cluster

3. Locating the problem again

4. Solving the problem

5. Summary

1. Symptoms

Symptom: the Vertica cluster installs successfully, but after the database is created it never comes up.
The exact error output is as follows:

Starting Vertica on all nodes. Please wait, databases with large catalogs may take a while to initialize.
Node Status: v_wnop_node0001: (DOWN)
Node Status: v_wnop_node0001: (DOWN)
Node Status: v_wnop_node0001: (DOWN)
Node Status: v_wnop_node0001: (DOWN)
Node Status: v_wnop_node0001: (DOWN)
Node Status: v_wnop_node0001: (DOWN)
Node Status: v_wnop_node0001: (DOWN)
Node Status: v_wnop_node0001: (DOWN)
Node Status: v_wnop_node0001: (DOWN)
Node Status: v_wnop_node0001: (DOWN)
ERROR: Database did not start cleanly on initiator node!
Stopping all nodes

Looking further into the Vertica log:

[root@vnode01 v_wnop_node0001_catalog]# tail -f vertica.log
2016-09-07 15:19:07.018 unknown:0x7f298bac5700 [Txn] <INFO> Found my node (v_wnop_node0001) in the catalog
2016-09-07 15:19:07.018 unknown:0x7f298bac5700 [Txn] <INFO> Catalog info: version=0x1, number of nodes=1, permanent #=1, K=0
2016-09-07 15:19:07.018 unknown:0x7f298bac5700 [Txn] <INFO> Catalog info: current epoch=0x1
2016-09-07 15:19:07.018 unknown:0x7f298bac5700 [Catalog] <INFO> Catalog OID generator updated based on GLOBAL tier catalog
2016-09-07 15:19:07.018 unknown:0x7f298bac5700 [Init] <INFO> Catalog loaded
2016-09-07 15:19:07.018 unknown:0x7f298bac5700 [Comms] <INFO> About to launch spread with '/opt/vertica/spread/sbin/spread -c /data/verticadb/WNOP/v_wnop_node0001_catalog/spread.conf'
2016-09-07 15:19:07.019 unknown:0x7f298bac5700 [Comms] <INFO> forked spread pid=82427, wrote pidfile /data/verticadb/WNOP/v_wnop_node0001_catalog/spread.pid
2016-09-07 15:19:07.020 unknown:0x7f298bac5700 [Init] <INFO> Listening on port: 5433
2016-09-07 15:19:07.020 unknown:0x7f298bac5700 [Init] <INFO> About to fork
2016-09-07 15:19:07.021 unknown:0x7f298bac5700 [Init] <INFO> About to fork again
2016-09-07 15:19:07.023 unknown:0x7f298bac5700 [Init] <INFO> Completed forking
2016-09-07 15:19:07.023 unknown:0x7f298bac5700 [Init] <INFO> Startup [Connecting to Spread] Connecting to spread 4803
2016-09-07 15:19:37.039 unknown:0x7f298bac5700 [Init] <INFO> Spread daemon does not appear to be running on 192.168.1.105 -- exiting!

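The log suggests that Vertica could not reach the spread daemon on port 4803: spread was forked at 15:19:07, yet 30 seconds later Vertica reported that it did not appear to be running, so the spread process most likely never started successfully.
The firewall and SELinux were checked on every node; both were disabled, so neither explains the failure.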
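
The checks described above can be gathered into one small shell function. This is only a minimal sketch, assuming a RHEL 6 style host where `service iptables status`, `getenforce`, `pgrep`, and `netstat` are available; the function name `check_spread_prereqs` is hypothetical, and 4803 is simply the spread port that appears in the log.

```shell
#!/bin/bash
# Hypothetical helper (not part of Vertica): report the usual suspects on one node.
check_spread_prereqs() {
    local port="${1:-4803}"   # spread port, taken from the vertica.log message

    # Firewall state; RHEL 6 manages iptables through its init script.
    if command -v service >/dev/null 2>&1; then
        echo "firewall: $(service iptables status 2>&1 | head -n 1)"
    else
        echo "firewall: 'service' not found, skipping"
    fi

    # SELinux mode: Enforcing, Permissive, or Disabled.
    if command -v getenforce >/dev/null 2>&1; then
        echo "selinux: $(getenforce)"
    else
        echo "selinux: getenforce not found, skipping"
    fi

    # Is a process named exactly "spread" alive?
    if pgrep -x spread >/dev/null 2>&1; then
        echo "spread: running"
    else
        echo "spread: not running"
    fi

    # Is anything bound to the spread port?
    if command -v netstat >/dev/null 2>&1; then
        netstat -an 2>/dev/null | grep -E ":${port}[[:space:]]" \
            || echo "port ${port}: nothing bound"
    else
        echo "port ${port}: netstat not found, skipping"
    fi
}
```

Calling `check_spread_prereqs 4803` on each node right after a failed start shows whether spread is alive at all, which is a different question from whether a firewall is blocking it.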

2. Reinstalling the cluster

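Prerequisites: the helper scripts (such as cluster_run_all_nodes) and SSH mutual trust between the nodes must already be in place.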
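
The article does not show the helper script itself. As a rough illustration of what a `cluster_run_all_nodes` helper could look like, here is a minimal sketch; the implementation, the `NODES` and `REMOTE_SHELL` variables, and their defaults are all assumptions, with the node list taken from the addresses passed to `install_vertica` in this article. It presumes passwordless SSH (mutual trust) is already configured.

```shell
#!/bin/bash
# Hypothetical sketch only; the author's real helper script is not shown.
# Node list assumed from the install_vertica command in this article.
NODES="${NODES:-192.168.1.105 192.168.1.106 192.168.1.107 192.168.1.108}"
# Overridable so the function can be dry-run without SSH.
# BatchMode makes ssh fail instead of prompting when trust is missing.
REMOTE_SHELL="${REMOTE_SHELL:-ssh -o BatchMode=yes}"

cluster_run_all_nodes() {
    local cmd="$1" node
    for node in $NODES; do
        # Run the same command string on each node in turn.
        $REMOTE_SHELL "$node" "$cmd"
    done
}
```

Setting `REMOTE_SHELL=echo` prints each node and command instead of executing anything, which is a cheap way to review a destructive sequence before running it for real.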

Reinstall the cluster (remove it completely first, then install)

# Remove the existing cluster
# Kill Vertica-related processes (\$2 is escaped so the local shell leaves it for awk on each node)
cluster_run_all_nodes "hostname;ps -ef|grep vertica |grep -v grep|awk '{print \$2}'|xargs kill -9"
# Remove the Vertica package
cluster_run_all_nodes "hostname;rpm -e vertica"
# Kill dbadmin-related processes
cluster_run_all_nodes "hostname;ps -ef|grep dbadmin |grep -v grep|awk '{print \$2}'|xargs kill -9"
# Remove the previously created group and user
cluster_run_all_nodes "hostname;id dbadmin"
cluster_run_all_nodes "hostname;groupdel verticadba"
cluster_run_all_nodes "hostname;userdel -r dbadmin"
# Remove the data directory and the software installation directory
cluster_run_all_nodes "hostname;rm -rf /data/verticadb"
cluster_run_all_nodes "hostname;rm -rf /opt/vertica"
# Create the data directory
cluster_run_all_nodes "hostname;mkdir -p /data/verticadb"

# Install
# Install the software on the initiator node
cd /usr2
rpm -ivh vertica-7.1.0-3.x86_64.RHEL5.rpm
# Install the cluster
/opt/vertica/sbin/install_vertica -s 192.168.1.105,192.168.1.106,192.168.1.107,192.168.1.108 -r /usr2/vertica-7.1.0-3.x86_64.RHEL5.rpm --failure-threshold=HALT -u dbadmin -p vertica
# Give dbadmin ownership of the data directory
cluster_run_all_nodes "hostname;chown -R dbadmin:verticadba /data/verticadb"

# Create the database with admintools
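
One detail of the kill commands is worth checking before running them: the awk program sits inside a double-quoted string, so the local shell would expand an unescaped `$2` before the string ever reaches a remote node. The sketch below only builds the two command strings locally and prints them; nothing is executed remotely, and the variable names are purely illustrative.

```shell
#!/bin/bash
# Clear the positional parameters so $2 is empty, as in a fresh interactive shell.
set --

# Inside double quotes the single quotes are literal, so $2 is expanded here.
unescaped="ps -ef|grep vertica|grep -v grep|awk '{print $2}'"
# Escaping the dollar sign keeps $2 intact for awk on the remote side.
escaped="ps -ef|grep vertica|grep -v grep|awk '{print \$2}'"

echo "unescaped: $unescaped"
echo "escaped:   $escaped"
```

With an empty `$2` the awk program becomes `{print }`, which prints the whole `ps` line, so `xargs kill -9` would receive every field of that line rather than only the PID column.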
