-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Delete PEL when associated guard is deleted #106
base: main
Are you sure you want to change the base?
Conversation
Phosphor-Logging recently made changes to prohibit deletion of PELs which are associated with hardware isolation entries. So, deleting PELs whenever we clear hardware isolation entries. Test Results: Before: root@p10bmc:~# guard -l ID | ERROR | Type | Path 0x00000001 | 0x50002626 | predictive | physical:sys-0/node-0 /ocmb_chip-20/mem_port-1 root@p10bmc:~# peltool -l { "0x50002626": { "SRC": "BD60280E", "Message": "A temperature sensor is under its low warning threshold.", "PLID": "0x50002626", "CreatorID": "BMC", "Subsystem": "Power/Cooling", "Commit Time": "01/02/2033 10:50:45", "Sev": "Predictive Error", "CompID": "bmc fans" } } root@p10bmc:~# busctl tree xyz.openbmc_project.Logging `- /xyz `- /xyz/openbmc_project `- /xyz/openbmc_project/logging |- /xyz/openbmc_project/logging/entry | `- /xyz/openbmc_project/logging/entry/6 `- /xyz/openbmc_project/logging/internal `- /xyz/openbmc_project/logging/internal/manager root@p10bmc:~# guard -r root@p10bmc:~# guard -l No Records to display root@p10bmc:~# peltool -l { "0x50002626": { "SRC": "BD60280E", "Message": "A temperature sensor is under its low warning threshold.", "PLID": "0x50002626", "CreatorID": "BMC", "Subsystem": "Power/Cooling", "Commit Time": "01/02/2033 10:50:45", "Sev": "Predictive Error", "CompID": "bmc fans" } } root@p10bmc:~# busctl tree xyz.openbmc_project.Logging `- /xyz `- /xyz/openbmc_project `- /xyz/openbmc_project/logging |- /xyz/openbmc_project/logging/entry | `- /xyz/openbmc_project/logging/entry/6 `- /xyz/openbmc_project/logging/internal `- /xyz/openbmc_project/logging/internal/manager After: Clearing all guards: root@p10bmc:/tmp/test# guard -l ID | ERROR | Type | Path 0x00000001 | 0x50002618 | predictive | physical:sys-0/node-0 /dimm-113 0x00000002 | 0x00000000 | manual | physical:sys-0/node-0/ proc-0/eq-0/fc-1/core-1 root@p10bmc:/tmp/test# peltool -l { "0x50002618": { "SRC": "BD60280E", "Message": "A temperature sensor is under its low warning threshold.", "PLID": "0x50002618", "CreatorID": "BMC", "Subsystem": "Power/Cooling", "Commit Time": "01/02/2033 09:19:50", "Sev": "Predictive Error", "CompID": "bmc fans" } } root@p10bmc:/tmp/test# busctl tree xyz.openbmc_project.Logging `- /xyz `- /xyz/openbmc_project `- /xyz/openbmc_project/logging |- /xyz/openbmc_project/logging/entry | `- /xyz/openbmc_project/logging/entry/559 `- /xyz/openbmc_project/logging/internal `- /xyz/openbmc_project/logging/internal/manager root@p10bmc:/tmp/test# guard -r root@p10bmc:/tmp/test# busctl tree xyz.openbmc_project.Logging `- /xyz `- /xyz/openbmc_project `- /xyz/openbmc_project/logging `- /xyz/openbmc_project/logging/internal `- /xyz/openbmc_project/logging/internal/manager root@p10bmc:/tmp/test# peltool -l {} root@p10bmc:/tmp/test# guard -l No Records to display Invalidate single guard: root@p10bmc:/tmp/test# guard -l ID | ERROR | Type | Path 0x00000002 | 0x00000000 | manual | physical:sys-0/node-0 /proc-0/eq-0/fc-1/core-1 0x00000004 | 0x5000261c | predictive | physical:sys-0/node-0 /dimm-113 0x00000005 | 0x5000261d | predictive | physical:sys-0/node-0 /ocmb_chip-30 root@p10bmc:~# peltool -l { "0x5000261C": { "SRC": "BD60280E", "Message": "A temperature sensor is under its low warning threshold.", "PLID": "0x5000261C", "CreatorID": "BMC", "Subsystem": "Power/Cooling", "Commit Time": "01/02/2033 10:09:56", "Sev": "Predictive Error", "CompID": "bmc fans" }, "0x5000261D": { "SRC": "BD60280E", "Message": "A temperature sensor is under its low warning threshold.", "PLID": "0x5000261D", "CreatorID": "BMC", "Subsystem": "Power/Cooling", "Commit Time": "01/02/2033 10:10:47", "Sev": "Predictive Error", "CompID": "bmc fans" } } root@p10bmc:/tmp/test# busctl tree xyz.openbmc_project.Logging `- /xyz `- /xyz/openbmc_project `- /xyz/openbmc_project/logging |- /xyz/openbmc_project/logging/entry | |- /xyz/openbmc_project/logging/entry/564 | `- /xyz/openbmc_project/logging/entry/566 `- /xyz/openbmc_project/logging/internal `- /xyz/openbmc_project/logging/internal/manager root@p10bmc:/tmp/test# guard -i 0x00000004 root@p10bmc:/tmp/test# guard -l ID | ERROR | Type | Path 0x00000002 | 0x00000000 | manual | physical:sys-0/node-0 /proc-0/eq-0/fc-1/core-1 0x00000005 | 0x5000261d | predictive | physical:sys-0/node-0 /ocmb_chip-30 root@p10bmc:/tmp/test# busctl tree xyz.openbmc_project.Logging `- /xyz `- /xyz/openbmc_project `- /xyz/openbmc_project/logging |- /xyz/openbmc_project/logging/entry | `- /xyz/openbmc_project/logging/entry/566 `- /xyz/openbmc_project/logging/internal `- /xyz/openbmc_project/logging/internal/manager root@p10bmc:~# peltool -l { "0x5000261D": { "SRC": "BD60280E", "Message": "A temperature sensor is under its low warning threshold.", "PLID": "0x5000261D", "CreatorID": "BMC", "Subsystem": "Power/Cooling", "Commit Time": "01/02/2033 10:10:47", "Sev": "Predictive Error", "CompID": "bmc fans" } } Invalidate All records: root@p10bmc:/tmp/test# guard -l ID | ERROR | Type | Path 0x00000002 | 0x00000000 | manual | physical:sys-0/node-0 /proc-0/eq-0/fc-1/core-1 0x00000005 | 0x5000261d | predictive | physical:sys-0/node-0 /ocmb_chip-30 0x00000006 | 0x5000261e | predictive | physical:sys-0/node-0 /ocmb_chip-20/mem_port-1 root@p10bmc:/tmp/test# peltool -l { "0x5000261D": { "SRC": "BD60280E", "Message": "A temperature sensor is under its low warning threshold.", "PLID": "0x5000261D", "CreatorID": "BMC", "Subsystem": "Power/Cooling", "Commit Time": "01/02/2033 10:10:47", "Sev": "Predictive Error", "CompID": "bmc fans" }, "0x5000261E": { "SRC": "BD60280E", "Message": "A temperature sensor is under its low warning threshold.", "PLID": "0x5000261E", "CreatorID": "BMC", "Subsystem": "Power/Cooling", "Commit Time": "01/02/2033 10:14:25", "Sev": "Predictive Error", "CompID": "bmc fans" } } root@p10bmc:/tmp/test# busctl tree xyz.openbmc_project.Logging `- /xyz `- /xyz/openbmc_project `- /xyz/openbmc_project/logging |- /xyz/openbmc_project/logging/entry | |- /xyz/openbmc_project/logging/entry/566 | `- /xyz/openbmc_project/logging/entry/567 `- /xyz/openbmc_project/logging/internal `- /xyz/openbmc_project/logging/internal/manager root@p10bmc:/tmp/test# guard -I root@p10bmc:/tmp/test# guard -l No unresolved records to display root@p10bmc:/tmp/test# busctl tree xyz.openbmc_project.Logging `- /xyz `- /xyz/openbmc_project `- /xyz/openbmc_project/logging |- /xyz/openbmc_project/logging/entry `- /xyz/openbmc_project/logging/internal `- /xyz/openbmc_project/logging/internal/manager root@p10bmc:/tmp/test# peltool -l {} Change-Id: Iafe18dc480619ac558a51d42ab9f75c6c11b122f Signed-off-by: SwethaParasa <[email protected]>
@@ -90,7 +91,11 @@ void Entry::resolveEntry(bool clearRecord) | |||
{ | |||
hw_isolation::utils::setEnabledProperty( | |||
_bus, std::get<2>(assoc), true); | |||
break; | |||
} | |||
if (std::get<0>(assoc) == "isolated_hw_errorlog") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you test with the latest driver, particularly with the PEL changes that prohibit PEL deletion when associated with an HWIsolation entry?
I'm a bit surprised how it's working without erasing the HWIsolation entry, as shown below (LN102), because the association between PEL and HWIsolation is only removed when the HWIsolation DBus entry object is deleted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will try with latest and will make the change to delete it after the entry object is deleted.
catch (const sdbusplus::exception::exception& e) | ||
{ | ||
// Don't throw the exception, we should allow the caller to proceed | ||
// further since caller might call this in the failure case. | ||
log<level::ERR>(std::format("D-Bus Exception [{}], failed to delete " | ||
"the error log for ObjectPath [{}] " | ||
"and Interface [{}]", | ||
e.what(), errorLogPath, | ||
type::LoggingDeleteIface) | ||
.c_str()); | ||
} | ||
catch (const std::exception& e) | ||
{ | ||
// Don't throw the exception, we should allow the caller to proceed | ||
// further since caller might call this in the failure case. | ||
log<level::ERR>( | ||
std::format("Exception [{}], failed to delete the error log", | ||
e.what()) | ||
.c_str()); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not see any advantae of catching two different exceptions unless we do different actions, we can combine the same into 1 and catch std::exception and lo the traces from 441 to 446, I know you had followed createErrorLog method.
Phosphor-Logging recently made changes to prohibit deletion of PELs which are associated with hardware isolation entries. So, deleting PELs whenever we clear hardware isolation entries.
Test Results:
Before:
After:
Change-Id: Iafe18dc480619ac558a51d42ab9f75c6c11b122f