Eval means inputparams #26

Open · wants to merge 3 commits into master
Conversation

@bluque (Contributor) commented on Mar 24, 2017

This script outputs the metrics for each batch of 128 images; I have added the computation of the mean of these metrics in order to have a global evaluation of the model.

I also propose an improvement over #25 where the model and dataset names are passed as arguments when calling the function. This way, we don't need to modify the script for every new evaluation.
python eval_detection_fscore.py model_name dataset_name weights_file path_to_images

The only reason I changed the way the model is built when distinguishing between yolo and tiny yolo is to make it easier to add more models in the future (just add another elif model_name == ...); it's not a significant change, and the result is the same.
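For reference, a minimal sketch of how the proposed interface could look, assuming plain sys.argv parsing; build_yolo, build_tiny_yolo, and the 'tiny-yolo' value are hypothetical names and may not match what eval_detection_fscore.py actually uses:

```python
import sys

# Hypothetical sketch of the proposed interface; the real constructor names
# and argument values in eval_detection_fscore.py may differ.
if len(sys.argv) != 5:
    sys.exit('usage: eval_detection_fscore.py model_name dataset_name weights_file path_to_images')

model_name, dataset_name, weights_file, path_to_images = sys.argv[1:5]

if model_name == 'yolo':
    model = build_yolo(dataset_name)        # hypothetical helper
elif model_name == 'tiny-yolo':
    model = build_tiny_yolo(dataset_name)   # hypothetical helper
# elif model_name == 'ssd': ...             # adding a new model is just another branch
else:
    sys.exit('Unknown model name: ' + model_name)

model.load_weights(weights_file)
```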

@lluisgomez (Collaborator) commented:
@bluque Thanks for the pull request!

Overall I like the changes you propose for the input arguments of model and dataset names.

However, I do not see the point of the "averaged metrics". In the original code, what is printed on lines 125 to 128 is the "running" precision, recall, and F-score, not the metrics for each batch.

For example, the variable "ok" is defined and set to zero on line 66, outside the loop, and is never reset; we only increment it every time we find a correct detection.

The same goes for the variables "total_pred" and "total_true".

So the metrics that are shown are the metrics for all the images evaluated so far; when the script finishes, they are the metrics for the whole dataset, right?
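To make the distinction concrete, here is a minimal sketch of the running-metric pattern being described; batches, evaluate, and count_matches are hypothetical placeholders, and the counters mirror ok, total_pred, and total_true from the script:

```python
# Sketch of the running-metrics pattern; the counters live outside the loop
# and are never reset, so they accumulate over all batches.
ok = 0           # correct detections so far
total_pred = 0   # predicted boxes so far
total_true = 0   # ground-truth boxes so far

for batch in batches:                              # hypothetical iterable of image batches
    detections, ground_truth = evaluate(batch)     # hypothetical evaluation step
    ok += count_matches(detections, ground_truth)  # hypothetical matching step
    total_pred += len(detections)
    total_true += len(ground_truth)

    # Printed every iteration, but always computed from the cumulative counters,
    # so after the last batch these are the metrics for the whole dataset.
    precision = ok / total_pred if total_pred else 0.0
    recall = ok / total_true if total_true else 0.0
    fscore = (2 * precision * recall / (precision + recall)
              if (precision + recall) else 0.0)
    print('Precision: %.4f  Recall: %.4f  F-score: %.4f' % (precision, recall, fscore))
```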

On the other hand, be careful with one thing: an averaged metric (per batch), as you propose, is not always meaningful. Imagine we evaluate only two "batches" of 128 images each. In the first batch there is only one object in one of the images (all the other images contain no objects) and the model we are evaluating misses it, so the recall for this batch is zero. Then imagine in the second batch there are 200 objects and the model detects all of them correctly, so the recall for the second batch is 100%. If you take the mean of these two recall values you get a final recall of 50%, while the model correctly detected 200 objects out of 201 :) so the final recall should be 99.5%. Do you see the point? We must compute the average over the total objects in the ground truth, not over the total number of images or batches.
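Worked out numerically, the two-batch example looks like this:

```python
# Recall of each batch in the example above.
recall_batch1 = 0.0 / 1       # one object, missed          -> 0.0
recall_batch2 = 200.0 / 200   # 200 objects, all detected   -> 1.0

# Mean of the per-batch recalls: misleading.
mean_per_batch = (recall_batch1 + recall_batch2) / 2       # 0.5

# Recall pooled over all ground-truth objects: what we actually want.
pooled_recall = (0.0 + 200.0) / (1 + 200)                  # ~0.995
```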

Please let me know if this is clear... I've double-checked the code and I think it's correct as it is. Anyway, it's always good to question things that are not clear and make sure they are correct.

Also, I'm open to changing the code, for example to print "Running precision" instead of "Precision", etc., and then print the final precision at the end, once the main loop is finished. Maybe this helps to avoid confusion.

@bluque (Contributor, Author) commented on Mar 25, 2017

Yes, you are right! I didn't check in detail how the metrics were computed because I thought they were per batch. In any case, it is true that the average of these metrics wouldn't be accurate anyway. I will make the modifications you propose :)

I removed the means of the precision, recall, and F-score, but I kept the mean of the FPS, as that one is computed independently for each batch.
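A rough sketch of why the per-batch FPS mean is still fine: each batch gives an independent timing measurement, so averaging them is meaningful, unlike the per-batch recall. The names here (batches, run_detection) are illustrative, not the script's actual variables:

```python
import time

# Illustrative sketch: FPS is measured independently for each batch,
# so the per-batch mean is a meaningful summary.
fps_per_batch = []
for batch in batches:                 # hypothetical iterable of 128-image batches
    start = time.time()
    run_detection(batch)              # hypothetical inference call
    elapsed = time.time() - start
    fps_per_batch.append(len(batch) / elapsed)

mean_fps = sum(fps_per_batch) / len(fps_per_batch)
print('Mean FPS: %.1f' % mean_fps)
```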