
Running fence_scsi_check_hardreboot consumes CPU. #313

Open
HideoYamauchi opened this issue Dec 26, 2019 · 11 comments
Comments

@HideoYamauchi
Contributor

HideoYamauchi commented Dec 26, 2019

Hi All,

We configured a Pacemaker cluster that uses fence_scsi in a virtual environment with only one CPU core allocated.

When fence_scsi_check_hardreboot runs together with the watchdog service in this cluster, it consumes about 20% of the CPU every second.

When this happens, Pacemaker frequently outputs log messages like the following.

(snip)
12:56:45 xx pacemaker-controld  [10137] (throttle_check_thresholds)    notice: High CPU load detected: 2.080000
12:57:15 xx pacemaker-controld  [10137] (throttle_check_thresholds)    notice: High CPU load detected: 1.930000
12:57:45 xx pacemaker-controld  [10137] (throttle_check_thresholds)    notice: High CPU load detected: 1.540000
12:58:15 xx pacemaker-controld  [10137] (throttle_check_thresholds)    notice: High CPU load detected: 1.470000
12:58:45 xx pacemaker-controld  [10137] (throttle_check_thresholds)    notice: High CPU load detected: 1.230000
12:59:15 xx pacemaker-controld  [10137] (throttle_check_thresholds)    notice: High CPU load detected: 1.650000
(snip)

Some improvement can be achieved by increasing the number of CPU cores or by increasing the monitoring interval of the watchdog service.
However, some users may not be able to change core assignments, and increasing the monitoring interval also lengthens the failover time when a failure occurs.
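For reference, the monitoring interval discussed above is controlled by /etc/watchdog.conf; an illustrative fragment (values here are examples, not recommendations — on RHEL the check script is picked up from the test directory):

```
# /etc/watchdog.conf (illustrative values)
interval       = 1               # seconds between check runs; raising this
                                 # reduces CPU use but delays failure detection
test-directory = /etc/watchdog.d # fence_scsi_check_hardreboot lives here
```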

Is there any way to improve the fence_scsi_check_hardreboot script to solve this problem?
(Can the processing of fence_scsi_check_hardreboot be made a little lighter?)

Best Regards,
Hideo Yamauchi.

@oalbrigt
Collaborator

You could try setting verbose=yes to see if you can track down what exactly causes the issue (it will be explained in the agent's metadata if it's supported by the version of the agent you have installed).

@HideoYamauchi
Contributor Author

Hi Oyvind,

Our environment is RHEL 8.0, and fence_scsi there seems to support verbose=yes.

I set verbose=yes in the fence_scsi parameters, but no additional information appears in pacemaker.log.
Is the information output somewhere else?

Best Regards,
Hideo Yamauchi.

@oalbrigt
Collaborator

It might also be in corosync.log or /var/log/messages.

If you run it manually, though, the output should be shown on your screen immediately.

@HideoYamauchi
Contributor Author

Hi Oyvind,

Thanks for your comment.
I'll give it a try.

Best Regards,
Hideo Yamauchi.

@HideoYamauchi
Contributor Author

Hi Oyvind,

Since I could not get the verbose option to take effect, I forcibly modified the fence_scsi code to enable verbose output and ran it, but it did not yield much useful information.

(snip)
def scsi_check(hardreboot=False):
        if len(sys.argv) >= 3 and sys.argv[1] == "repair":
                return int(sys.argv[2])
        options = {}
        options["--sg_turs-path"] = "/usr/bin/sg_turs"
        options["--sg_persist-path"] = "/usr/bin/sg_persist"
        options["--power-timeout"] = "5"
        options["retry"] = "0"
        options["retry-sleep"] = "1"
        options = scsi_check_get_options(options)
#       if "verbose" in options and options["verbose"] == "yes":
        logging.getLogger().setLevel(logging.DEBUG)
(snip)


[root@rh80-02 ~]# /etc/watchdog.d/fence_scsi_check_hardreboot test
INFO:root:Executing: /usr/bin/sg_turs /dev/sdb
DEBUG:root:0
INFO:root:Executing: /usr/bin/sg_persist -n -i -k -d /dev/sdb
DEBUG:root:0   PR generation=0x5fb3, 8 registered reservation keys follow:
    0x5e2a0001
    0x5e2a0001
    0x5e2a0001
    0x5e2a0001
    0x5e2a0000
    0x5e2a0000
    0x5e2a0000
    0x5e2a0000
DEBUG:root:key 5e2a0001 registered with device /dev/sdb

Also, it seems that the same high CPU load occurs when using the watchdog service with fence_mpath.

I will investigate the cause a little more.

Best Regards,
Hideo Yamauchi.

@oalbrigt
Collaborator

Maybe there's some watchdog setting for tuning priority of the process?

@HideoYamauchi
Contributor Author

Hi Oyvind,

> Maybe there's some watchdog setting for tuning priority of the process?

Yes.

In the environment in question, the default settings in /etc/watchdog.conf are as follows:

(snip)
# This greatly decreases the chance that watchdog won't be scheduled before
# your machine is really loaded
realtime                = yes
priority                = 1
(snip)

Many thanks,
Hideo Yamauchi.

@oalbrigt
Collaborator

I would try changing the priority to see if that helps.

@HideoYamauchi
Contributor Author

HideoYamauchi commented Jan 23, 2020

Hi Oyvind,

> I would try changing the priority to see if that helps.

I'll give it a try....

But...

I changed the priority to 50 and then to 99 and restarted the watchdog service, but the CPU usage of fence_scsi_check_hardreboot does not change.

You can confirm that the CPU usage rises simply by running the following command line:

 /usr/libexec/platform-python -c 'import sys;sys.path.append("/usr/share/fence");import fencing'

I think this will be difficult to improve, since the cost is in Python's import processing.
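To quantify that import cost, here is a minimal measurement sketch. It uses a stdlib module ("json") as a stand-in, since the fencing module may not be present on every machine; on a cluster node you would add /usr/share/fence to sys.path and time "fencing" instead:

```python
import importlib
import sys
import time

def time_cold_import(module_name):
    """Return the wall-clock seconds needed to import module_name."""
    # Drop the top-level cached copy so the import does its work again
    # (submodules may remain cached, so this is a lower bound).
    sys.modules.pop(module_name, None)
    start = time.perf_counter()
    importlib.import_module(module_name)
    return time.perf_counter() - start

if __name__ == "__main__":
    # "json" stands in for the fencing module here; on a cluster node use
    # sys.path.append("/usr/share/fence") and time_cold_import("fencing").
    elapsed = time_cold_import("json")
    print("cold import took %.3f ms" % (elapsed * 1000))
```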

Best Regards,
Hideo Yamauchi.

@oalbrigt
Collaborator

Yeah. I don't know how we can improve that.

@HideoYamauchi
Contributor Author

Hi Oyvind,

I will think a little more about possible improvements.

The right conclusion may be that this is difficult to improve in Python.
In that case, users will need to dedicate more CPU resources to their virtual machines, and so on.
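If a Python-free check were ever acceptable, the registration test could in principle be done in plain shell, avoiding the import cost entirely. A hypothetical sketch (this is not fence_scsi's actual logic; the key-matching helper and sample data are illustrative, and a real check would feed it live sg_persist output):

```shell
#!/bin/sh
# Hypothetical lightweight check: does a given reservation key appear in
# `sg_persist -n -i -k -d <device>` output? (Sketch only, not fence_scsi logic.)
key_registered() {
    # $1: sg_persist output, $2: reservation key without the 0x prefix
    printf '%s\n' "$1" | grep -qi "0x$2"
}

# Sample output taken from the debug log above; on a cluster node you would
# use: OUTPUT=$(/usr/bin/sg_persist -n -i -k -d /dev/sdb)
OUTPUT="PR generation=0x5fb3, 8 registered reservation keys follow:
    0x5e2a0001
    0x5e2a0000"

if key_registered "$OUTPUT" "5e2a0001"; then
    echo "key present"   # node still holds its registration
else
    echo "key missing"   # watchdog should reboot the node
fi
```

Whether such a shell check could fully replace the agent's own logic (key discovery, multiple devices, retries) is a separate question; this only illustrates that the per-invocation cost could be much lower.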

Best Regards,
Hideo Yamauchi.
