
The overall score is not matching with the principles #11

Open
ASC-Competition opened this issue Jan 2, 2024 · 1 comment

Comments

@ASC-Competition

Hi,
I found that some answers with a higher overall_score have a lower helpfulness_score in the evol_instruct.jsonl dataset, where the principle is 100% helpfulness.

For example, the scores of the 9th sample in the evol_instruct.jsonl dataset are as follows:

| model | helpfulness | honesty | instruction following | truthfulness | overall score |
| --- | --- | --- | --- | --- | --- |
| gpt-3.5-turbo | 4 | 5 | 4 | 5 | 7 |
| llama-2-70b-chat | 4 | 4 | 5 | 5 | 7.5 |
| mpt-30b-chat | 3 | 4 | 3 | 5 | 6.5 |
| vicuna-33b | 5 | 4 | 4 | 5 | 6.5 |

The answer from vicuna-33b has the highest helpfulness score but the lowest overall score.

My question is: should I pick the answer with the highest overall score or the highest helpfulness score as the preferred answer, or should I use the mean of the four principles?

Any suggestions would be appreciated, thanks.

@lifan-yuan
Collaborator

Hi,

Thanks for your interest.

The overall and fine-grained scores are annotated under different schemas and thus may not strictly match each other. Specifically, fine-grained scores are annotated according to our hand-written documentation, while overall scores rely entirely on GPT-4 itself, with the textual critique serving as the CoT rationale for scoring.

We investigated the effects of both kinds of scores in our paper (see Section 4.1) and found that using fine-grained scores was slightly better. But note that those experiments were based on the previous "bugged" version of overall scores (see this issue), and we are not sure whether the conclusion in the paper still applies to our updated scores.
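As an illustration of the fine-grained option, one could select the preferred answer by the mean of the four principle scores. This is a minimal sketch, not the paper's implementation; the dict field names are hypothetical stand-ins for whatever keys the jsonl records actually use, and the scores are copied from the table above.

```python
# Hypothetical sketch: prefer the completion with the highest mean of the
# four fine-grained principle scores, instead of the GPT-4 overall score.
# Field names are illustrative assumptions, not the dataset's real schema.

ASPECTS = ["helpfulness", "honesty", "instruction_following", "truthfulness"]

def mean_principle_score(completion):
    # Average of the four fine-grained aspect scores.
    return sum(completion[a] for a in ASPECTS) / len(ASPECTS)

def pick_preferred(completions):
    # Returns the completion with the highest mean fine-grained score;
    # on ties, max() keeps the first one encountered.
    return max(completions, key=mean_principle_score)

# The scores of the 9th sample from the table above:
completions = [
    {"model": "gpt-3.5-turbo", "helpfulness": 4, "honesty": 5,
     "instruction_following": 4, "truthfulness": 5, "overall": 7},
    {"model": "llama-2-70b-chat", "helpfulness": 4, "honesty": 4,
     "instruction_following": 5, "truthfulness": 5, "overall": 7.5},
    {"model": "mpt-30b-chat", "helpfulness": 3, "honesty": 4,
     "instruction_following": 3, "truthfulness": 5, "overall": 6.5},
    {"model": "vicuna-33b", "helpfulness": 5, "honesty": 4,
     "instruction_following": 4, "truthfulness": 5, "overall": 6.5},
]
print(pick_preferred(completions)["model"])
```

Note that on this particular sample, gpt-3.5-turbo, llama-2-70b-chat, and vicuna-33b all tie at a mean of 4.5, so an aggregation like this may still need a tie-breaker (for example, the overall score).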

Hope this helps.
