Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

the sensors reading on plugable boards issue (valid sensors detection method proposal) #25

Open
huangalang opened this issue Dec 13, 2022 · 9 comments

Comments

@huangalang
Copy link

As we found that we have to define sensor reading path in config.json
Then buildSensors() function will search all the sensor reading paths defined in config.json
Once they are found , they are added to SensorManager (mgmr.addSensor())
(https://github.com/openbmc/phosphor-pid-control/blob/master/sensors/builder.cpp#L184)

Once pid service started , the added sensors reading are updated if the property changed
=>void DbusPidZone::updateSensors(void)

However, buildSensors() process will fail when any of the defined sensor reading paths is not found.
It will restart over and over

take following case as example : the sensors on a pluggable board
The Sensors path need to be predefined in config.json
But they wont exist , if the board is not plugged
It will let buildSensors() process repeat over and over
=>
(https://github.com/openbmc/phosphor-pid-control/blob/master/sensors/builder.cpp#L99)

We’d like propose a method called valid sensor detection
The main idea is to create validSensors map recording if the sensor reading path is found within a certain time,
Only the valid ones will be added to sensor manager
(https://github.com/openbmc/phosphor-pid-control/blob/master/sensors/builder.cpp#L184)

And this validSensors map will be passed on to buildZones() process

Because only the valid sensors can be added to thermal_inputs of each zone (we use this map to identify the valid sensors)
Through this way , only the valid sensors reading will be updated =>

=========================================

void DbusPidZone::updateSensors(void)
{
for (const auto& t : _thermalInputs) <== only the valid sensors in _thermalInputs
{
auto sensor = _mgr.getSensor(t); <=== only the valid sensors in mgr
ReadReturn r = sensor->read();
}

}

please share your ideas, we are open to further discussion

@Krellan
Copy link
Member

Krellan commented Dec 13, 2022

Here's my thoughts:

  1. It looks like a bug, if buildSensors() is in a tight restart loop. It should not be in a restart loop.

Expected behavior is that it should set all zones with missing sensors to failsafe mode, and continue loading. Zones without missing sensors should function normally. Zones with missing sensors should remain in failsafe mode.

If all of the missing sensors appear at a later time, and the zone is no longer missing any sensors, the zone should come out of failsafe mode, and begin normal operation. I'm hoping the program receives notifications when new sensors are added, so that it doesn't have to be manually restarted in order to search for sensors again. If not, that's a bug.

  1. We should have a "missing is acceptable" flag, for each sensor. Default to false.

If we simply filter out all missing sensors, then we run the risk of overheating the system if an important sensor is missing. We should set the system to failsafe mode, instead. Failsafe mode is designed to be used if any sensors are missing, or not reporting valid data.

Some unimportant sensors may be designated as optional. Those, I'm OK with filtering them out in the updateSensors() loop, as you described. This designation would be a new feature, so that it would not impact existing sensors already in the field.

Alternatively, a compile-time constant could be used, to change all sensors from mandatory to optional, defaulting to false. This would most likely be much easier to implement. By defaulting to false, it would not affect existing systems in the field, so it would not compromise their ability to go into failsafe mode when sensors are missing.

@harveyquta
Copy link
Contributor

I'm not sure if it's a bug or not.
When I was using static json before, all of the sensors in the "inputs" list need to exist on dbus, or the service will start to fail.
But in dbusconfiguration, it is acceptable that the sensors in "inputs" list don't exist on dbus.

@Krellan
Copy link
Member

Krellan commented Dec 13, 2022

A pluggable board should trigger some auto-detection, perhaps using entity-manager JSON probe/exposes files, so that the PID loops are only instantiated when the board is plugged in.

There is no easy solution for this case. Failsafe mode is bad, because there's no need to have the fans spin fast, to cool a board that isn't there. Ignoring missing sensors is bad, because the board could overheat if any sensors on it go bad and disappear.

As for static JSON versus D-Bus, that is a strange behavior. It sounds like a bug to me. It would be great if we could address both of the code paths and clean up any differences in behavior like this, that we find.

@zevweiss
Copy link
Contributor

We should have a "missing is acceptable" flag, for each sensor. Default to false.

Isn't that essentially what "unavailableAsFailed": false in the config provides? (Albeit with an inverted boolean sense.)

As for static JSON versus D-Bus, that is a strange behavior.

Sounds reminiscent of this...

@huangalang
Copy link
Author

huangalang commented Dec 13, 2022

@zevweiss
as we know this flag unavailableAsFailed": false , only works for the case
when the sensors are unavaible on the fly , ex : cpu temp sensors is unavalible when host is powered off
in this case sensor reading path is obtained at buildSensors() , so this unavailableAsFailed will be set
while creating dbus passive sensor object
(https://github.com/openbmc/phosphor-pid-control/blob/master/dbus/dbuspassive.cpp#L83)

but the sensors we are talking about is the sensors on plugable board
they will not be found since the path wont be there from the begining, so the dbuspassive sensor wont be created, and buildSensors() will restart over and over
(https://github.com/openbmc/phosphor-pid-control/blob/master/dbus/dbuspassive.cpp#L73)

@huangalang
Copy link
Author

@Krellan , as you suggested , missing sensors should be devided into two types
1 unacceptable ones, in this case , we should keep the original behaviour , let it restart when it's not found
2. acceptable ones , in this case we may go for the method we proposed =>only add valid ones to _thermalInput
by distinguishing these two types , we may use the flag you adviced "missing is acceptable"

@huangalang
Copy link
Author

harveyquta

how did you get that result?
our experiments shows buildSensors() basically fails, when the sensor path are not found (the board is not plugged)

@harveyquta
Copy link
Contributor

harveyquta commented Dec 15, 2022

When using phosphor-hwmon + statis JSON before, I added DIMM sensors to an input list.

"inputs": [
"dimm0","dimm1","dimm2","dimm3","dimm4","dimm5","dimm6","dimm7","dimm8","dimm9","dimm10","dimm11","dimm12","dimm13","dimm14","dimm15"],

But when one of dimm sensors in this list doesn't exist on dbus(didn't insert DIMM device on board), pid service will start to fail and restart.

@huangalang
Copy link
Author

@harveyquta we use enity-manager +dbus-sensors with phosphor-pid-control , so it's the different story

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants