
Running fence_scsi_check_hardreboot consumes CPU. #313

Open
HideoYamauchi opened this issue Dec 26, 2019 · 11 comments
Comments

@HideoYamauchi
Contributor

HideoYamauchi commented Dec 26, 2019

Hi All,

We configured a Pacemaker cluster that uses fence_scsi in a virtual environment with only one CPU core allocated.

When fence_scsi_check_hardreboot runs together with the watchdog service in this cluster, it consumes about 20% of the CPU every second.

When this happens, Pacemaker frequently outputs log messages like the following.

(snip)
12:56:45 xx pacemaker-controld  [10137] (throttle_check_thresholds)    notice: High CPU load detected: 2.080000
12:57:15 xx pacemaker-controld  [10137] (throttle_check_thresholds)    notice: High CPU load detected: 1.930000
12:57:45 xx pacemaker-controld  [10137] (throttle_check_thresholds)    notice: High CPU load detected: 1.540000
12:58:15 xx pacemaker-controld  [10137] (throttle_check_thresholds)    notice: High CPU load detected: 1.470000
12:58:45 xx pacemaker-controld  [10137] (throttle_check_thresholds)    notice: High CPU load detected: 1.230000
12:59:15 xx pacemaker-controld  [10137] (throttle_check_thresholds)    notice: High CPU load detected: 1.650000
(snip)

Some improvement can be achieved by increasing the number of CPU cores or by increasing the monitoring interval of the watchdog service.
However, some users may not be able to change core assignments, and increasing the monitoring interval also lengthens the failover time when a failure occurs.
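For reference, the monitoring interval discussed above is controlled by /etc/watchdog.conf; an illustrative fragment (values here are examples, not recommendations — on RHEL the check script is picked up from the test directory):

```
# /etc/watchdog.conf (illustrative values)
interval       = 1               # seconds between check runs; raising this
                                 # reduces CPU use but delays failure detection
test-directory = /etc/watchdog.d # fence_scsi_check_hardreboot lives here
```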

Is there any way to improve the fence_scsi_check_hardreboot script to solve this problem?
(Can the processing of fence_scsi_check_hardreboot be made a little lighter?)

Best Regards,
Hideo Yamauchi.

@oalbrigt
Collaborator

You could try setting verbose=yes to see if you can track down what exactly causes the issue (it will be explained in the agent's metadata if it's supported by the version of the agent you have installed).

@HideoYamauchi
Contributor Author

Hi Oyvind,

Our environment is RHEL 8.0, and fence_scsi there seems to support verbose=yes.

I set verbose=yes in the fence_scsi parameters, but no additional information appears in pacemaker.log.
Is the information output somewhere else?

Best Regards,
Hideo Yamauchi.

@oalbrigt
Collaborator

It might also be in corosync.log or /var/log/messages.

If you run it manually, though, the output should be shown on your screen immediately.

@HideoYamauchi
Contributor Author

Hi Oyvind,

Thanks for your comment.
I'll give it a try.

Best Regards,
Hideo Yamauchi.

@HideoYamauchi
Contributor Author

Hi Oyvind,

Since I could not get the verbose option to take effect, I forcibly modified the fence_scsi code to enable verbose output and ran it, but it did not yield much useful information.

(snip)
def scsi_check(hardreboot=False):
        if len(sys.argv) >= 3 and sys.argv[1] == "repair":
                return int(sys.argv[2])
        options = {}
        options["--sg_turs-path"] = "/usr/bin/sg_turs"
        options["--sg_persist-path"] = "/usr/bin/sg_persist"
        options["--power-timeout"] = "5"
        options["retry"] = "0"
        options["retry-sleep"] = "1"
        options = scsi_check_get_options(options)
#       if "verbose" in options and options["verbose"] == "yes":
        logging.getLogger().setLevel(logging.DEBUG)
(snip)


[root@rh80-02 ~]# /etc/watchdog.d/fence_scsi_check_hardreboot test
INFO:root:Executing: /usr/bin/sg_turs /dev/sdb
DEBUG:root:0
INFO:root:Executing: /usr/bin/sg_persist -n -i -k -d /dev/sdb
DEBUG:root:0   PR generation=0x5fb3, 8 registered reservation keys follow:
    0x5e2a0001
    0x5e2a0001
    0x5e2a0001
    0x5e2a0001
    0x5e2a0000
    0x5e2a0000
    0x5e2a0000
    0x5e2a0000
DEBUG:root:key 5e2a0001 registered with device /dev/sdb

Also, it seems that the same high CPU load occurs when using the watchdog service with fence_mpath.

I will investigate the cause a little more.

Best Regards,
Hideo Yamauchi.

@oalbrigt
Collaborator

Maybe there's some watchdog setting for tuning priority of the process?

@HideoYamauchi
Contributor Author

Hi Oyvind,

> Maybe there's some watchdog setting for tuning priority of the process?

Yes.

In the environment in question, the default settings in /etc/watchdog.conf are as follows:

(snip)
# This greatly decreases the chance that watchdog won't be scheduled before
# your machine is really loaded
realtime                = yes
priority                = 1
(snip)

Many thanks,
Hideo Yamauchi.

@oalbrigt
Collaborator

I would try changing the priority to see if that helps.

@HideoYamauchi
Contributor Author

HideoYamauchi commented Jan 23, 2020

Hi Oyvind,

> I would try changing the priority to see if that helps.

I'll give it a try....

But...

I changed the priority to 50 and then to 99 and restarted the watchdog service, but the CPU usage of fence_scsi_check_hardreboot does not change.

You can confirm that the CPU usage rises simply by running the following command line:

 /usr/libexec/platform-python -c 'import sys;sys.path.append("/usr/share/fence");import fencing'

I think this will be difficult to improve, since the cost is in Python's import processing.
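To quantify that import cost, here is a minimal measurement sketch. It uses a stdlib module ("json") as a stand-in, since the fencing module may not be present on every machine; on a cluster node you would add /usr/share/fence to sys.path and time "fencing" instead:

```python
import importlib
import sys
import time

def time_cold_import(module_name):
    """Return the wall-clock seconds needed to import module_name."""
    # Drop the top-level cached copy so the import does its work again
    # (submodules may remain cached, so this is a lower bound).
    sys.modules.pop(module_name, None)
    start = time.perf_counter()
    importlib.import_module(module_name)
    return time.perf_counter() - start

if __name__ == "__main__":
    # "json" stands in for the fencing module here; on a cluster node use
    # sys.path.append("/usr/share/fence") and time_cold_import("fencing").
    elapsed = time_cold_import("json")
    print("cold import took %.3f ms" % (elapsed * 1000))
```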

Best Regards,
Hideo Yamauchi.

@oalbrigt
Collaborator

Yeah. I don't know how we can improve that.

@HideoYamauchi
Contributor Author

Hi Oyvind,

I will think a little more about possible improvements.

The right conclusion may be that this is difficult to improve in Python.
In that case, users will need to dedicate more CPU resources to their virtual machines, and so on.
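If a Python-free check were ever acceptable, the registration test could in principle be done in plain shell, avoiding the import cost entirely. A hypothetical sketch (this is not fence_scsi's actual logic; the key-matching helper and sample data are illustrative, and a real check would feed it live sg_persist output):

```shell
#!/bin/sh
# Hypothetical lightweight check: does a given reservation key appear in
# `sg_persist -n -i -k -d <device>` output? (Sketch only, not fence_scsi logic.)
key_registered() {
    # $1: sg_persist output, $2: reservation key without the 0x prefix
    printf '%s\n' "$1" | grep -qi "0x$2"
}

# Sample output taken from the debug log above; on a cluster node you would
# use: OUTPUT=$(/usr/bin/sg_persist -n -i -k -d /dev/sdb)
OUTPUT="PR generation=0x5fb3, 8 registered reservation keys follow:
    0x5e2a0001
    0x5e2a0000"

if key_registered "$OUTPUT" "5e2a0001"; then
    echo "key present"   # node still holds its registration
else
    echo "key missing"   # watchdog should reboot the node
fi
```

Whether such a shell check could fully replace the agent's own logic (key discovery, multiple devices, retries) is a separate question; this only illustrates that the per-invocation cost could be much lower.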

Best Regards,
Hideo Yamauchi.
