evaluate: explain/document metrics #57

Open

bertsky opened this issue Mar 10, 2022 · 1 comment
bertsky commented Mar 10, 2022

If I understand correctly, the idea behind these metrics is taken from the "Rethinking Semantic Segmentation Evaluation" paper, but could you explain to me how I could obtain AP, TPs, FPs and FNs for the instance segmentation task?

Originally posted by @andreaceruti in cocodataset/cocoapi#564 (comment)

bertsky commented Mar 10, 2022

Yes, that paper lent the idea for the oversegmentation and undersegmentation measures – but only those two (not the others), and I took the liberty of deviating from the exact definition of Zhang et al. 2021:

# Zhang's idea of attenuating the under/oversegmentation ratio with a "penalty"
# to account for the degree of further sub-segmentation is misguided IMHO,
# because its degree term depends on the total number of segments:
# oversegmentation = np.tanh(oversegmentation * over_degree)
# undersegmentation = np.tanh(undersegmentation * under_degree)

So in my implementation these measures are merely raw ratios, i.e. the share of regions in GT and DT which have been oversegmented (or undersegmented, respectively).
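A minimal sketch of what such raw ratios could look like – the function and variable names as well as the choice of denominators are my illustration here, not the actual implementation:

from collections import Counter

def segmentation_ratios(matches, num_gt, num_dt):
    """Raw over-/undersegmentation ratios (illustrative sketch only).

    matches: iterable of (gt_index, dt_index) pairs that passed the
    matching criterion; num_gt / num_dt: total number of GT / predicted regions.
    """
    gt_counts = Counter(gt for gt, _ in matches)  # detections matched per GT region
    dt_counts = Counter(dt for _, dt in matches)  # GT regions matched per detection
    # a GT region split across several detections counts as oversegmented
    over = sum(1 for n in gt_counts.values() if n > 1)
    # a detection spanning several GT regions counts as undersegmented
    under = sum(1 for n in dt_counts.values() if n > 1)
    # plain shares -- no tanh attenuation by a "degree" penalty as in Zhang et al.
    oversegmentation = over / num_gt if num_gt else 0.0
    undersegmentation = under / num_dt if num_dt else 0.0
    return oversegmentation, undersegmentation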

My notion of a match is somewhat arbitrary, but IMO more adequate than averaging over different IoU thresholds for various confidence thresholds:

  • A pair of true vs. predicted regions is a true positive (TP) iff
    • its IoU is ≥ 50% or
    • its IoGT is ≥ 50% or
    • its IoDT is ≥ 50%.
  • A prediction which is not matched is a false positive (FP).
  • A ground truth which is not matched is a false negative (FN).

(All area values under consideration are numbers of pixels in the polygon-masked segments, not just bounding box sizes.)
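For illustration, the match criterion could be written roughly as follows (a sketch with hypothetical names, assuming the pixel areas of the polygon masks and their intersection are already known):

def is_match(inter_area, gt_area, dt_area, threshold=0.5):
    """TP criterion (sketch): a GT/DT pair matches if any of IoU, IoGT
    or IoDT reaches the threshold. All areas are pixel counts of the
    polygon-masked segments, not bounding boxes."""
    union_area = gt_area + dt_area - inter_area
    iou = inter_area / union_area if union_area else 0.0
    iogt = inter_area / gt_area if gt_area else 0.0
    iodt = inter_area / dt_area if dt_area else 0.0
    return iou >= threshold or iogt >= threshold or iodt >= threshold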

So, all in all, you get the following metrics here:

  • area measures
    • IoU: intersection over union,
      i.e. the share of the overlapping area of a match over the union of the true and the predicted region
    • IoGT: intersection over ground truth,
      i.e. the share of the overlapping area of a match over the total area of the true region
    • IoDT: intersection over detection,
      i.e. the share of the overlapping area of a match over the total area of the predicted region
    • pixel-recall: page-wise aggregate of intersection over GT including missed true regions (FN),
      i.e. the share of the overlapping areas over the total area of true regions in a page
    • pixel-precision: page-wise aggregate of intersection over DT including fake predicted regions (FP),
      i.e. the share of the overlapping areas over the total area of predicted regions in a page
  • segment measures
    • oversegmentation: share of true and predicted regions which have been oversegmented (i.e. where true regions match multiple detections) over all regions
    • undersegmentation: share of true and predicted regions which have been undersegmented (i.e. where predicted regions match multiple ground truths) over all regions
    • recall: ratio of matches (TP) over true regions,
      i.e. share of correctly predicted regions in total GT
    • precision: ratio of matches (TP) over detected regions,
      i.e. share of correctly predicted regions in total DT

For each metric, there is a page-wise (or even segment-wise) and an aggregated measure; the latter always uses micro-averaging over all (matching pairs in all) pages.
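For example, micro-averaging the pixel measures means summing the raw pixel counts over all pages before dividing, roughly like this (the per-page dict structure is hypothetical, purely for illustration):

def micro_average_pixels(pages):
    """Aggregate pixel-recall / pixel-precision across pages by summing
    the raw pixel counts first (micro-average), instead of averaging
    per-page scores (macro-average).

    pages: list of dicts with keys 'inter', 'gt', 'dt' holding the number
    of overlapping, true and predicted pixels on each page (hypothetical).
    """
    inter = sum(p['inter'] for p in pages)
    gt = sum(p['gt'] for p in pages)
    dt = sum(p['dt'] for p in pages)
    pixel_recall = inter / gt if gt else 0.0
    pixel_precision = inter / dt if dt else 0.0
    return pixel_recall, pixel_precision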
