
Heuristic proxy for confidence in agent's predictions #477

Open
gabrielfior opened this issue Sep 20, 2024 · 8 comments

Comments

@gabrielfior
Contributor

Based on @kongzii's suggestion:

-> Divide all of the agent's predictions into probability buckets (deciles), e.g. if an agent gives 65% probability to a market, it goes into the 7th decile (see the sketch after this list).
-> For each decile, we roughly expect its accuracy to match the decile range - i.e., the 7th decile above (60-70%) should have an accuracy of roughly 60-70%.
-> Using the correlation between each decile's expected probability and its observed accuracy, we can draw a value for the confidence.
-> It would also be interesting to use the metrics above to quantify an associated error.
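A minimal sketch of the bucketing step, assuming each resolved prediction is available as a (p_yes, resolved_yes) pair; the function and variable names are illustrative and not part of prediction-market-agent-tooling:

```python
from collections import defaultdict

def decile_calibration(predictions: list[tuple[float, bool]]) -> dict[int, float]:
    """Group predictions by p_yes decile and return the observed YES rate per decile."""
    buckets: dict[int, list[bool]] = defaultdict(list)
    for p_yes, resolved_yes in predictions:
        decile = min(int(p_yes * 10), 9)  # 0.65 -> index 6, i.e. the 7th decile (60-70%)
        buckets[decile].append(resolved_yes)
    return {
        decile: sum(outcomes) / len(outcomes)
        for decile, outcomes in sorted(buckets.items())
    }

# A well-calibrated agent's 7th decile (index 6) should come out around 0.6-0.7 here.
print(decile_calibration([(0.65, True), (0.68, True), (0.62, False), (0.15, False)]))
```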

@kongzii
Contributor

kongzii commented Sep 20, 2024

-> we can draw a value for the confidence

How do you mean?

@evangriffiths
Contributor

I understood this analysis to be one way of understanding 'how accurate are the p_yes predictions of an agent?', not that it should be used to generate a confidence score for a given prediction. Maybe this info could be given to the agent when asking it to generate a confidence score, but I think it still needs to be decided on a per-prediction basis.

@gabrielfior
Contributor Author

I understood this analysis to be one way of understanding 'how accurate are the p_yes predictions of an agent?'

That's also my understanding

The question for this ticket remains open: how should we define confidence for the agent? Still ask the agent for it, or define it using hardcoded rules?

@gabrielfior
Contributor Author

Some additional observations:
-> From @evangriffiths: "I remember back in the beginning we used the PMAT Benchmark class to generate a bunch of predictions and confidence scores, and we saw that the LLM gave pretty rubbish scores - there was like no correlation between 'confidence' and 'abs difference between estimate_p_yes and manifold/polymarket p_yes'. So we can definitely do better, but it's not obvious how."
-> From @kongzii: "Another LLM doing the confidence based on research and probability from the first LLM?"
-> From @gabrielfior: Let's mark this as low priority since we don't have a great idea on how to improve the current status quo.

@kongzii
Contributor

kongzii commented Oct 1, 2024

To sum it up, there are multiple parts to this issue:

  1. Evaluation of p_yes of agents
  2. Evaluation of confidence of agents
  3. Getting a better way of drawing confidence

(1) and (2) feel easily doable thanks to https://github.com/gnosis/prediction-market-agent-tooling/blob/main/examples/monitor/match_bets_with_langfuse_traces.py, and I'd say this is now more than a low priority given the mixed results of Kelly. Wdyt @evangriffiths @gabrielfior?

@evangriffiths
Contributor

@kongzii are you thinking this is another approach for how we can still use KellyBettingStrategy(max_bet_amount=big_number), but mitigate the issue where the agent is incorrectly very confident, and loses all its money? And I guess there's no reason why this couldn't be used in combination with @gabrielfior's max_slippage approach.

My one reservation is that it might be a bit messy in the code to throw away the confidence returned by the agent and use this new one based on this approach. But it's definitely worth a try.

@kongzii
Contributor

kongzii commented Oct 1, 2024

No, no, I just meant it as yet another evaluation method. Similarly to how we have accuracy and profitability, we can also have something like:

mean_abs_error = sum(abs(bucket.predicted_probability - bucket.real_probability) for bucket in buckets) / len(buckets)

The agent with the lowest MAE should be the best probability predictor.
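A minimal, runnable version of the pseudocode above, assuming per-decile bucket statistics have already been computed; the Bucket class is illustrative, not an existing type in prediction-market-agent-tooling:

```python
from dataclasses import dataclass

@dataclass
class Bucket:
    predicted_probability: float  # mean p_yes of the predictions in this decile
    real_probability: float       # observed fraction of markets that resolved YES

def mean_abs_error(buckets: list[Bucket]) -> float:
    """Mean absolute calibration error across the probability buckets."""
    return sum(
        abs(b.predicted_probability - b.real_probability) for b in buckets
    ) / len(buckets)

# Example: a 7th-decile bucket with mean p_yes 0.65 that resolves YES 70% of the
# time contributes |0.65 - 0.70| = 0.05 to the average.
print(mean_abs_error([Bucket(0.65, 0.70), Bucket(0.25, 0.20)]))  # -> 0.05
```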

@gabrielfior
Contributor Author

No, no, I just meant it as yet another evaluation method

Agreed with this as the scope of the ticket.
