How to clear failed actions from "pcs status" in a cluster

Date: 2020-01-09 10:37:40    Source: igfitidea

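In this article I will share the commands to clear failed actions from the "pcs status" output of a High Availability Pacemaker cluster.

When a resource in the cluster fails to start, the failure is often recorded as a failed action in "pcs status". These failed actions keep appearing in the "pcs status" output even after the resource has started successfully.

In that situation we can clear the failed actions from "pcs status".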

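Problem: clear failed action messages from "pcs status"

Below is sample "pcs status" output from my KVM high availability cluster. It shows two types of failed actions:

Failed resource actions
Failed fencing actions

To check the cluster status: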

[root@centos8-2 ~]# pcs status
Cluster name: ha-cluster
Stack: corosync
Current DC: centos8-3 (version 2.0.2-3.el8_1.2-744a30d655) - partition with quorum
Last updated: Sat Jan  2 14:38:27 2017
Last change: Sat Jan  2 14:38:23 2017 by root via cibadmin on centos8-2
3 nodes configured
4 resources configured
Online: [ centos8-2 centos8-3 centos8-4 ]
Full list of resources:
 fence-centos8-3        (stonith:fence_xvm):    Started centos8-3
 fence-centos8-2        (stonith:fence_xvm):    Started centos8-2
 ClusterIP      (ocf::heartbeat:IPaddr2):       Started centos8-4
 fence-centos8-4        (stonith:fence_xvm):    Started centos8-3
Failed Resource Actions:
* fence-centos8-2_start_0 on centos8-4 'OCF_TIMEOUT' (198): call=122, status=Timed Out, exitreason='',
    last-rc-change='Sat Jan  2 14:36:16 2017', queued=1ms, exec=20012ms
* fence-centos8-4_start_0 on centos8-4 'OCF_TIMEOUT' (198): call=124, status=Timed Out, exitreason='',
    last-rc-change='Sat Jan  2 14:36:36 2017', queued=0ms, exec=20011ms
Failed Fencing Actions:
* reboot of centos8-2 failed: delegate=, client=pacemaker-controld.1548, origin=centos8-3,
    last-failed='Sat Jan  2 14:37:17 2017'
* reboot of centos8-4 failed: delegate=, client=pacemaker-controld.1548, origin=centos8-3,
    last-failed='Fri Jan  1 20:57:41 2017'
Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

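My resources and fencing resources have now started successfully, so there is no need to keep these failed action messages.

The commands for clearing failed resource actions and failed fencing actions are different.

Solution: clean up failed resource actions

To clear the messages listed under "Failed Resource Actions", use "pcs resource cleanup <resource>". The resource name can be taken from the "Failed Resource Actions" output.

Here is the relevant part of my "pcs status" output: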

* fence-centos8-2_start_0 on centos8-4 'OCF_TIMEOUT' (198): call=122, status=Timed Out, exitreason='',
    last-rc-change='Sat Jan  2 14:36:16 2017', queued=1ms, exec=20012ms
* fence-centos8-4_start_0 on centos8-4 'OCF_TIMEOUT' (198): call=124, status=Timed Out, exitreason='',
    last-rc-change='Sat Jan  2 14:36:36 2017', queued=0ms, exec=20011ms

The resource names here are fence-centos8-2 and fence-centos8-4. We can also check them with "pcs resource status".

So, to clean up the failed action messages for fence-centos8-2:

[root@centos8-2 ~]# pcs resource cleanup fence-centos8-2
Cleaned up fence-centos8-2 on centos8-4
Cleaned up fence-centos8-2 on centos8-3
Cleaned up fence-centos8-2 on centos8-2
Waiting for 1 reply from the controller. OK

Similarly, clean up the failed action messages for the fence-centos8-4 resource:

[root@centos8-2 ~]# pcs resource cleanup fence-centos8-4
Cleaned up fence-centos8-4 on centos8-4
Cleaned up fence-centos8-4 on centos8-3
Cleaned up fence-centos8-4 on centos8-2
Waiting for 1 reply from the controller. OK
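
When many resources have stale failures, the names can be pulled out of the "pcs status" text instead of being copied by hand. The following is a minimal sketch, not part of the original procedure. It assumes the "Failed Resource Actions" section is formatted exactly as in the output above, and that each entry ends in a suffix such as "_start_0" whose action name contains no underscore. The function names and the PCS variable are hypothetical helpers introduced only for this illustration.

```shell
#!/bin/sh
# Print the unique resource names listed under "Failed Resource Actions".
# Reads "pcs status" output from stdin.
# Assumption: entries look like "* <resource>_<action>_<interval> on <node> ...".
failed_resources() {
    awk '
        /^Failed Resource Actions:/ { in_section = 1; next }
        # Any unindented line that is not an entry starts a new section.
        in_section && /^[^ *]/      { in_section = 0 }
        in_section && /^\* /        {
            name = $2
            sub(/_[a-z-]+_[0-9]+$/, "", name)   # strip the "_start_0" style suffix
            if (!seen[name]++) print name
        }
    '
}

# Run "pcs resource cleanup" once per failed resource.
# PCS is a hypothetical override: set PCS=echo to print the arguments
# instead of calling pcs.
cleanup_failed_resources() {
    failed_resources | while read -r res; do
        "${PCS:-pcs}" resource cleanup "$res"
    done
}
```

A possible invocation is "pcs status | cleanup_failed_resources". Running "pcs status | failed_resources" first shows which resources would be cleaned up.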

After the cleanup, check the cluster status:

[root@centos8-2 ~]# pcs status
Cluster name: ha-cluster
Stack: corosync
Current DC: centos8-3 (version 2.0.2-3.el8_1.2-744a30d655) - partition with quorum
Last updated: Sat Jan  2 14:39:19 2017
Last change: Sat Jan  2 14:39:17 2017 by hacluster via crmd on centos8-4
3 nodes configured
4 resources configured
Online: [ centos8-2 centos8-3 centos8-4 ]
Full list of resources:
 fence-centos8-3        (stonith:fence_xvm):    Started centos8-3
 fence-centos8-2        (stonith:fence_xvm):    Started centos8-2
 ClusterIP      (ocf::heartbeat:IPaddr2):       Started centos8-4
 fence-centos8-4        (stonith:fence_xvm):    Started centos8-3
Failed Fencing Actions:
* reboot of centos8-2 failed: delegate=, client=pacemaker-controld.1548, origin=centos8-3,
    last-failed='Sat Jan  2 14:37:17 2017'
* reboot of centos8-4 failed: delegate=, client=pacemaker-controld.1548, origin=centos8-3,
    last-failed='Fri Jan  1 20:57:41 2017'
Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

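There are no "Failed Resource Actions" left, so next we will clear the failed fencing action messages.

Solution: clean up failed fencing actions

"pcs status" still shows failed action messages for fencing. To clear them we use "pcs stonith history cleanup <node>". Note that the argument is the name of the node that was to be fenced, not the name of a fencing resource.

Before performing the cleanup, we can review the full history of failed fencing actions with "pcs stonith history show <node>".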

[root@centos8-2 ~]# pcs stonith history show centos8-2
We failed reboot node centos8-2 on behalf of pacemaker-controld.1548 from centos8-3 at Sat Jan  2 14:36:57 2017
We failed reboot node centos8-2 on behalf of pacemaker-controld.1548 from centos8-3 at Sat Jan  2 14:36:37 2017
We failed reboot node centos8-2 on behalf of pacemaker-controld.1548 from centos8-3 at Sat Jan  2 14:36:17 2017
We failed reboot node centos8-2 on behalf of pacemaker-controld.1548 from centos8-3 at Sat Jan  2 14:37:16 2017
We failed reboot node centos8-2 on behalf of pacemaker-controld.1548 from centos8-3 at Sat Jan  2 14:37:17 2017
0 events found

The node name can be taken from the "Failed Fencing Actions" messages in the "pcs status" output:

* reboot of centos8-2 failed: delegate=, client=pacemaker-controld.1548, origin=centos8-3,
    last-failed='Sat Jan  2 14:37:17 2017'
* reboot of centos8-4 failed: delegate=, client=pacemaker-controld.1548, origin=centos8-3,
    last-failed='Fri Jan  1 20:57:41 2017'

Clear the failed fencing action messages for both nodes:

[root@centos8-2 ~]# pcs stonith history cleanup centos8-2
cleaning up fencing-history for node centos8-2
0 events found
[root@centos8-2 ~]# pcs stonith history cleanup centos8-4
cleaning up fencing-history for node centos8-4
0 events found
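
The node names can also be extracted from the "Failed Fencing Actions" section rather than typed in. This is a rough sketch under the assumption that every entry follows the "* reboot of <node> failed: ..." layout shown above. The function names and the PCS variable are hypothetical and exist only for this example.

```shell
#!/bin/sh
# Print the unique node names listed under "Failed Fencing Actions".
# Reads "pcs status" output from stdin.
# Assumption: entries look like "* <action> of <node> failed: ...".
failed_fence_nodes() {
    awk '
        /^Failed Fencing Actions:/ { in_section = 1; next }
        # Any unindented line that is not an entry starts a new section.
        in_section && /^[^ *]/     { in_section = 0 }
        in_section && /^\* / && $3 == "of" && $5 ~ /^failed/ {
            if (!seen[$4]++) print $4
        }
    '
}

# Run "pcs stonith history cleanup" once per node with failed fencing.
# PCS is a hypothetical override: set PCS=echo to print the arguments
# instead of calling pcs.
cleanup_fence_history() {
    failed_fence_nodes | while read -r node; do
        "${PCS:-pcs}" stonith history cleanup "$node"
    done
}
```

A possible invocation is "pcs status | cleanup_fence_history", with "pcs status | failed_fence_nodes" as a preview of the affected nodes.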

Now check the Pacemaker cluster status with "pcs status":

[root@centos8-2 ~]# pcs status
Cluster name: ha-cluster
Stack: corosync
Current DC: centos8-3 (version 2.0.2-3.el8_1.2-744a30d655) - partition with quorum
Last updated: Sat Jan  2 14:41:05 2017
Last change: Sat Jan  2 14:39:17 2017 by hacluster via crmd on centos8-4
3 nodes configured
4 resources configured
Online: [ centos8-2 centos8-3 centos8-4 ]
Full list of resources:
 fence-centos8-3        (stonith:fence_xvm):    Started centos8-3
 fence-centos8-2        (stonith:fence_xvm):    Started centos8-2
 ClusterIP      (ocf::heartbeat:IPaddr2):       Started centos8-4
 fence-centos8-4        (stonith:fence_xvm):    Started centos8-3
Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

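No failed action messages remain.

Note:

This only clears errors that occurred in the past. If "pcs status" keeps showing new failed actions, the failures are still occurring and we must first debug the actual root cause.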
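
To notice whether failures come back after a cleanup, the entries in both failure sections can be counted from the "pcs status" text. This is a small sketch that assumes the section headers and entry layout shown earlier in this article. The function name count_failed_actions is hypothetical.

```shell
#!/bin/sh
# Print how many entries appear under "Failed Resource Actions" and
# "Failed Fencing Actions" combined. Reads "pcs status" output from stdin.
# Assumption: each entry starts with "* " at the beginning of a line.
count_failed_actions() {
    awk '
        /^Failed (Resource|Fencing) Actions:/ { in_section = 1; next }
        # Any unindented line that is not an entry starts a new section.
        in_section && /^[^ *]/ { in_section = 0 }
        in_section && /^\* /   { n++ }
        END { print n + 0 }
    '
}
```

Running "pcs status | count_failed_actions" some time after the cleanup should print 0. A non-zero count means new failures have been recorded and the root cause still needs investigation.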
