Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Delete PEL when associated guard is deleted #106

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

SwethaParasa
Copy link
Contributor

Phosphor-Logging recently made changes to prohibit deletion of PELs which are associated with hardware isolation entries. So, deleting PELs whenever we clear hardware isolation entries.

Test Results:

Before:

root@p10bmc:~# guard -l
ID         | ERROR      | Type            | Path
0x00000001 | 0x50002626 | predictive      | physical:sys-0/node-0
                                            /ocmb_chip-20/mem_port-1
root@p10bmc:~# peltool -l
{
    "0x50002626": {
        "SRC":                  "BD60280E",
        "Message":              "A temperature sensor is under its low
                                 warning threshold.",
        "PLID":                 "0x50002626",
        "CreatorID":            "BMC",
        "Subsystem":            "Power/Cooling",
        "Commit Time":          "01/02/2033 10:50:45",
        "Sev":                  "Predictive Error",
        "CompID":               "bmc fans"
    }
}
root@p10bmc:~# busctl tree xyz.openbmc_project.Logging
`- /xyz
  `- /xyz/openbmc_project
    `- /xyz/openbmc_project/logging
      |- /xyz/openbmc_project/logging/entry
      | `- /xyz/openbmc_project/logging/entry/6
      `- /xyz/openbmc_project/logging/internal
        `- /xyz/openbmc_project/logging/internal/manager
root@p10bmc:~# guard -r
root@p10bmc:~# guard -l
No Records to display
root@p10bmc:~# peltool -l
{
    "0x50002626": {
        "SRC":                  "BD60280E",
        "Message":              "A temperature sensor is under its low
                                 warning threshold.",
        "PLID":                 "0x50002626",
        "CreatorID":            "BMC",
        "Subsystem":            "Power/Cooling",
        "Commit Time":          "01/02/2033 10:50:45",
        "Sev":                  "Predictive Error",
        "CompID":               "bmc fans"
    }
}
root@p10bmc:~# busctl tree xyz.openbmc_project.Logging
`- /xyz
  `- /xyz/openbmc_project
    `- /xyz/openbmc_project/logging
      |- /xyz/openbmc_project/logging/entry
      | `- /xyz/openbmc_project/logging/entry/6
      `- /xyz/openbmc_project/logging/internal
        `- /xyz/openbmc_project/logging/internal/manager

After:

Clearing all guards:
root@p10bmc:/tmp/test# guard -l
ID         | ERROR      | Type            | Path
0x00000001 | 0x50002618 | predictive      | physical:sys-0/node-0
                                            /dimm-113
0x00000002 | 0x00000000 | manual          | physical:sys-0/node-0/
                                            proc-0/eq-0/fc-1/core-1
root@p10bmc:/tmp/test# peltool -l
{
    "0x50002618": {
        "SRC":                  "BD60280E",
        "Message":              "A temperature sensor is under its low
                                 warning threshold.",
        "PLID":                 "0x50002618",
        "CreatorID":            "BMC",
        "Subsystem":            "Power/Cooling",
        "Commit Time":          "01/02/2033 09:19:50",
        "Sev":                  "Predictive Error",
        "CompID":               "bmc fans"
    }
}

root@p10bmc:/tmp/test# busctl tree xyz.openbmc_project.Logging `- /xyz
  `- /xyz/openbmc_project
    `- /xyz/openbmc_project/logging
      |- /xyz/openbmc_project/logging/entry
      | `- /xyz/openbmc_project/logging/entry/559
      `- /xyz/openbmc_project/logging/internal
        `- /xyz/openbmc_project/logging/internal/manager
root@p10bmc:/tmp/test# guard -r
root@p10bmc:/tmp/test# busctl tree xyz.openbmc_project.Logging `- /xyz
  `- /xyz/openbmc_project
    `- /xyz/openbmc_project/logging
      `- /xyz/openbmc_project/logging/internal
        `- /xyz/openbmc_project/logging/internal/manager
root@p10bmc:/tmp/test# peltool -l
{}
root@p10bmc:/tmp/test# guard -l
No Records to display

Invalidate single guard:
root@p10bmc:/tmp/test# guard -l
ID         | ERROR      | Type            | Path
0x00000002 | 0x00000000 | manual          | physical:sys-0/node-0
                                            /proc-0/eq-0/fc-1/core-1
0x00000004 | 0x5000261c | predictive      | physical:sys-0/node-0
                                            /dimm-113
0x00000005 | 0x5000261d | predictive      | physical:sys-0/node-0
                                            /ocmb_chip-30
root@p10bmc:~# peltool -l
{
    "0x5000261C": {
        "SRC":                  "BD60280E",
        "Message":              "A temperature sensor is under its low
                                 warning threshold.",
        "PLID":                 "0x5000261C",
        "CreatorID":            "BMC",
        "Subsystem":            "Power/Cooling",
        "Commit Time":          "01/02/2033 10:09:56",
        "Sev":                  "Predictive Error",
        "CompID":               "bmc fans"
    },
    "0x5000261D": {
        "SRC":                  "BD60280E",
        "Message":              "A temperature sensor is under its low
                                 warning threshold.",
        "PLID":                 "0x5000261D",
        "CreatorID":            "BMC",
        "Subsystem":            "Power/Cooling",
        "Commit Time":          "01/02/2033 10:10:47",
        "Sev":                  "Predictive Error",
        "CompID":               "bmc fans"
    }
}
root@p10bmc:/tmp/test# busctl tree xyz.openbmc_project.Logging
`- /xyz
  `- /xyz/openbmc_project
    `- /xyz/openbmc_project/logging
      |- /xyz/openbmc_project/logging/entry
      | |- /xyz/openbmc_project/logging/entry/564
      | `- /xyz/openbmc_project/logging/entry/566
      `- /xyz/openbmc_project/logging/internal
        `- /xyz/openbmc_project/logging/internal/manager
root@p10bmc:/tmp/test# guard -i 0x00000004
root@p10bmc:/tmp/test# guard -l
ID         | ERROR      | Type            | Path
0x00000002 | 0x00000000 | manual          | physical:sys-0/node-0
                                            /proc-0/eq-0/fc-1/core-1
0x00000005 | 0x5000261d | predictive      | physical:sys-0/node-0
                                            /ocmb_chip-30
root@p10bmc:/tmp/test# busctl tree xyz.openbmc_project.Logging
`- /xyz
  `- /xyz/openbmc_project
    `- /xyz/openbmc_project/logging
      |- /xyz/openbmc_project/logging/entry
      | `- /xyz/openbmc_project/logging/entry/566
      `- /xyz/openbmc_project/logging/internal
        `- /xyz/openbmc_project/logging/internal/manager
root@p10bmc:~# peltool -l
{
    "0x5000261D": {
        "SRC":                  "BD60280E",
        "Message":              "A temperature sensor is under its low
                                 warning threshold.",
        "PLID":                 "0x5000261D",
        "CreatorID":            "BMC",
        "Subsystem":            "Power/Cooling",
        "Commit Time":          "01/02/2033 10:10:47",
        "Sev":                  "Predictive Error",
        "CompID":               "bmc fans"
    }
}

Invalidate All records:
root@p10bmc:/tmp/test# guard -l
ID         | ERROR      | Type            | Path
0x00000002 | 0x00000000 | manual          | physical:sys-0/node-0
                                            /proc-0/eq-0/fc-1/core-1
0x00000005 | 0x5000261d | predictive      | physical:sys-0/node-0
                                            /ocmb_chip-30
0x00000006 | 0x5000261e | predictive      | physical:sys-0/node-0
                                            /ocmb_chip-20/mem_port-1
root@p10bmc:/tmp/test# peltool -l
{
    "0x5000261D": {
        "SRC":                  "BD60280E",
        "Message":              "A temperature sensor is under its low
                                 warning threshold.",
        "PLID":                 "0x5000261D",
        "CreatorID":            "BMC",
        "Subsystem":            "Power/Cooling",
        "Commit Time":          "01/02/2033 10:10:47",
        "Sev":                  "Predictive Error",
        "CompID":               "bmc fans"
    },
    "0x5000261E": {
        "SRC":                  "BD60280E",
        "Message":              "A temperature sensor is under its low
                                 warning threshold.",
        "PLID":                 "0x5000261E",
        "CreatorID":            "BMC",
        "Subsystem":            "Power/Cooling",
        "Commit Time":          "01/02/2033 10:14:25",
        "Sev":                  "Predictive Error",
        "CompID":               "bmc fans"
    }
}
root@p10bmc:/tmp/test# busctl tree xyz.openbmc_project.Logging
`- /xyz
  `- /xyz/openbmc_project
    `- /xyz/openbmc_project/logging
      |- /xyz/openbmc_project/logging/entry
      | |- /xyz/openbmc_project/logging/entry/566
      | `- /xyz/openbmc_project/logging/entry/567
      `- /xyz/openbmc_project/logging/internal
        `- /xyz/openbmc_project/logging/internal/manager
root@p10bmc:/tmp/test# guard -I
root@p10bmc:/tmp/test# guard -l
No unresolved records to display
root@p10bmc:/tmp/test# busctl tree xyz.openbmc_project.Logging
`- /xyz
  `- /xyz/openbmc_project
    `- /xyz/openbmc_project/logging
      |- /xyz/openbmc_project/logging/entry
      `- /xyz/openbmc_project/logging/internal
        `- /xyz/openbmc_project/logging/internal/manager
root@p10bmc:/tmp/test# peltool -l
{}

Change-Id: Iafe18dc480619ac558a51d42ab9f75c6c11b122f

Phosphor-Logging recently made changes to prohibit deletion of PELs
which are associated with hardware isolation entries. So, deleting
PELs whenever we clear hardware isolation entries.

Test Results:

Before:
root@p10bmc:~# guard -l
ID         | ERROR      | Type            | Path
0x00000001 | 0x50002626 | predictive      | physical:sys-0/node-0
                                            /ocmb_chip-20/mem_port-1
root@p10bmc:~# peltool -l
{
    "0x50002626": {
        "SRC":                  "BD60280E",
        "Message":              "A temperature sensor is under its low
                                 warning threshold.",
        "PLID":                 "0x50002626",
        "CreatorID":            "BMC",
        "Subsystem":            "Power/Cooling",
        "Commit Time":          "01/02/2033 10:50:45",
        "Sev":                  "Predictive Error",
        "CompID":               "bmc fans"
    }
}
root@p10bmc:~# busctl tree xyz.openbmc_project.Logging
`- /xyz
  `- /xyz/openbmc_project
    `- /xyz/openbmc_project/logging
      |- /xyz/openbmc_project/logging/entry
      | `- /xyz/openbmc_project/logging/entry/6
      `- /xyz/openbmc_project/logging/internal
        `- /xyz/openbmc_project/logging/internal/manager
root@p10bmc:~# guard -r
root@p10bmc:~# guard -l
No Records to display
root@p10bmc:~# peltool -l
{
    "0x50002626": {
        "SRC":                  "BD60280E",
        "Message":              "A temperature sensor is under its low
                                 warning threshold.",
        "PLID":                 "0x50002626",
        "CreatorID":            "BMC",
        "Subsystem":            "Power/Cooling",
        "Commit Time":          "01/02/2033 10:50:45",
        "Sev":                  "Predictive Error",
        "CompID":               "bmc fans"
    }
}
root@p10bmc:~# busctl tree xyz.openbmc_project.Logging
`- /xyz
  `- /xyz/openbmc_project
    `- /xyz/openbmc_project/logging
      |- /xyz/openbmc_project/logging/entry
      | `- /xyz/openbmc_project/logging/entry/6
      `- /xyz/openbmc_project/logging/internal
        `- /xyz/openbmc_project/logging/internal/manager

After:

Clearing all guards:
root@p10bmc:/tmp/test# guard -l
ID         | ERROR      | Type            | Path
0x00000001 | 0x50002618 | predictive      | physical:sys-0/node-0
                                            /dimm-113
0x00000002 | 0x00000000 | manual          | physical:sys-0/node-0/
                                            proc-0/eq-0/fc-1/core-1
root@p10bmc:/tmp/test# peltool -l
{
    "0x50002618": {
        "SRC":                  "BD60280E",
        "Message":              "A temperature sensor is under its low
                                 warning threshold.",
        "PLID":                 "0x50002618",
        "CreatorID":            "BMC",
        "Subsystem":            "Power/Cooling",
        "Commit Time":          "01/02/2033 09:19:50",
        "Sev":                  "Predictive Error",
        "CompID":               "bmc fans"
    }
}

root@p10bmc:/tmp/test# busctl tree xyz.openbmc_project.Logging
`- /xyz
  `- /xyz/openbmc_project
    `- /xyz/openbmc_project/logging
      |- /xyz/openbmc_project/logging/entry
      | `- /xyz/openbmc_project/logging/entry/559
      `- /xyz/openbmc_project/logging/internal
        `- /xyz/openbmc_project/logging/internal/manager
root@p10bmc:/tmp/test# guard -r
root@p10bmc:/tmp/test# busctl tree xyz.openbmc_project.Logging
`- /xyz
  `- /xyz/openbmc_project
    `- /xyz/openbmc_project/logging
      `- /xyz/openbmc_project/logging/internal
        `- /xyz/openbmc_project/logging/internal/manager
root@p10bmc:/tmp/test# peltool -l
{}
root@p10bmc:/tmp/test# guard -l
No Records to display

Invalidate single guard:
root@p10bmc:/tmp/test# guard -l
ID         | ERROR      | Type            | Path
0x00000002 | 0x00000000 | manual          | physical:sys-0/node-0
                                            /proc-0/eq-0/fc-1/core-1
0x00000004 | 0x5000261c | predictive      | physical:sys-0/node-0
                                            /dimm-113
0x00000005 | 0x5000261d | predictive      | physical:sys-0/node-0
                                            /ocmb_chip-30
root@p10bmc:~# peltool -l
{
    "0x5000261C": {
        "SRC":                  "BD60280E",
        "Message":              "A temperature sensor is under its low
                                 warning threshold.",
        "PLID":                 "0x5000261C",
        "CreatorID":            "BMC",
        "Subsystem":            "Power/Cooling",
        "Commit Time":          "01/02/2033 10:09:56",
        "Sev":                  "Predictive Error",
        "CompID":               "bmc fans"
    },
    "0x5000261D": {
        "SRC":                  "BD60280E",
        "Message":              "A temperature sensor is under its low
                                 warning threshold.",
        "PLID":                 "0x5000261D",
        "CreatorID":            "BMC",
        "Subsystem":            "Power/Cooling",
        "Commit Time":          "01/02/2033 10:10:47",
        "Sev":                  "Predictive Error",
        "CompID":               "bmc fans"
    }
}
root@p10bmc:/tmp/test# busctl tree xyz.openbmc_project.Logging
`- /xyz
  `- /xyz/openbmc_project
    `- /xyz/openbmc_project/logging
      |- /xyz/openbmc_project/logging/entry
      | |- /xyz/openbmc_project/logging/entry/564
      | `- /xyz/openbmc_project/logging/entry/566
      `- /xyz/openbmc_project/logging/internal
        `- /xyz/openbmc_project/logging/internal/manager
root@p10bmc:/tmp/test# guard -i 0x00000004
root@p10bmc:/tmp/test# guard -l
ID         | ERROR      | Type            | Path
0x00000002 | 0x00000000 | manual          | physical:sys-0/node-0
                                            /proc-0/eq-0/fc-1/core-1
0x00000005 | 0x5000261d | predictive      | physical:sys-0/node-0
                                            /ocmb_chip-30
root@p10bmc:/tmp/test# busctl tree xyz.openbmc_project.Logging
`- /xyz
  `- /xyz/openbmc_project
    `- /xyz/openbmc_project/logging
      |- /xyz/openbmc_project/logging/entry
      | `- /xyz/openbmc_project/logging/entry/566
      `- /xyz/openbmc_project/logging/internal
        `- /xyz/openbmc_project/logging/internal/manager
root@p10bmc:~# peltool -l
{
    "0x5000261D": {
        "SRC":                  "BD60280E",
        "Message":              "A temperature sensor is under its low
                                 warning threshold.",
        "PLID":                 "0x5000261D",
        "CreatorID":            "BMC",
        "Subsystem":            "Power/Cooling",
        "Commit Time":          "01/02/2033 10:10:47",
        "Sev":                  "Predictive Error",
        "CompID":               "bmc fans"
    }
}

Invalidate All records:
root@p10bmc:/tmp/test# guard -l
ID         | ERROR      | Type            | Path
0x00000002 | 0x00000000 | manual          | physical:sys-0/node-0
                                            /proc-0/eq-0/fc-1/core-1
0x00000005 | 0x5000261d | predictive      | physical:sys-0/node-0
                                            /ocmb_chip-30
0x00000006 | 0x5000261e | predictive      | physical:sys-0/node-0
                                            /ocmb_chip-20/mem_port-1
root@p10bmc:/tmp/test# peltool -l
{
    "0x5000261D": {
        "SRC":                  "BD60280E",
        "Message":              "A temperature sensor is under its low
                                 warning threshold.",
        "PLID":                 "0x5000261D",
        "CreatorID":            "BMC",
        "Subsystem":            "Power/Cooling",
        "Commit Time":          "01/02/2033 10:10:47",
        "Sev":                  "Predictive Error",
        "CompID":               "bmc fans"
    },
    "0x5000261E": {
        "SRC":                  "BD60280E",
        "Message":              "A temperature sensor is under its low
                                 warning threshold.",
        "PLID":                 "0x5000261E",
        "CreatorID":            "BMC",
        "Subsystem":            "Power/Cooling",
        "Commit Time":          "01/02/2033 10:14:25",
        "Sev":                  "Predictive Error",
        "CompID":               "bmc fans"
    }
}
root@p10bmc:/tmp/test# busctl tree xyz.openbmc_project.Logging
`- /xyz
  `- /xyz/openbmc_project
    `- /xyz/openbmc_project/logging
      |- /xyz/openbmc_project/logging/entry
      | |- /xyz/openbmc_project/logging/entry/566
      | `- /xyz/openbmc_project/logging/entry/567
      `- /xyz/openbmc_project/logging/internal
        `- /xyz/openbmc_project/logging/internal/manager
root@p10bmc:/tmp/test# guard -I
root@p10bmc:/tmp/test# guard -l
No unresolved records to display
root@p10bmc:/tmp/test# busctl tree xyz.openbmc_project.Logging
`- /xyz
  `- /xyz/openbmc_project
    `- /xyz/openbmc_project/logging
      |- /xyz/openbmc_project/logging/entry
      `- /xyz/openbmc_project/logging/internal
        `- /xyz/openbmc_project/logging/internal/manager
root@p10bmc:/tmp/test# peltool -l
{}

Change-Id: Iafe18dc480619ac558a51d42ab9f75c6c11b122f
Signed-off-by: SwethaParasa <[email protected]>
@@ -90,7 +91,11 @@ void Entry::resolveEntry(bool clearRecord)
{
hw_isolation::utils::setEnabledProperty(
_bus, std::get<2>(assoc), true);
break;
}
if (std::get<0>(assoc) == "isolated_hw_errorlog")
Copy link
Collaborator

@RameshIyyar RameshIyyar Oct 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you test with the latest driver, particularly with the PEL changes that prohibit PEL deletion when associated with an HWIsolation entry?

I'm a bit surprised how it's working without erasing the HWIsolation entry, as shown below (LN102), because the association between PEL and HWIsolation is only removed when the HWIsolation DBus entry object is deleted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will try with latest and will make the change to delete it after the entry object is deleted.

Comment on lines +437 to +457
catch (const sdbusplus::exception::exception& e)
{
// Don't throw the exception, we should allow the caller to proceed
// further since caller might call this in the failure case.
log<level::ERR>(std::format("D-Bus Exception [{}], failed to delete "
"the error log for ObjectPath [{}] "
"and Interface [{}]",
e.what(), errorLogPath,
type::LoggingDeleteIface)
.c_str());
}
catch (const std::exception& e)
{
// Don't throw the exception, we should allow the caller to proceed
// further since caller might call this in the failure case.
log<level::ERR>(
std::format("Exception [{}], failed to delete the error log",
e.what())
.c_str());
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not see any advantae of catching two different exceptions unless we do different actions, we can combine the same into 1 and catch std::exception and lo the traces from 441 to 446, I know you had followed createErrorLog method.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants