
Support for closed markets benchmarks + a little cleaning up #17

Merged
9 commits merged into main from peter/benchmark-closed-markets on Feb 20, 2024

Conversation

kongzii
Contributor

@kongzii kongzii commented Feb 16, 2024

Summary by CodeRabbit

  • New Features
    • Enhanced prediction capabilities in benchmarking agents with new methods.
    • Added MarketResolution enum for improved prediction evaluation based on market resolution.
    • Introduced new enums and fields in market and prediction models for advanced filtering and resolution handling.
  • Refactor
    • Refined prediction and market evaluation logic for clarity and efficiency.
    • Simplified prediction object structure in test scenarios.
  • Tests
    • Updated test cases to align with the new prediction and market model changes.


coderabbitai bot commented Feb 16, 2024

Warning

Rate Limit Exceeded

@kongzii has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 5 minutes and 1 second before requesting another review.

How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.
Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.
Please see our FAQ for further information.

Commits reviewed: files that changed from the base of the PR, between 5f7bc81 and 60d6e2c.

Walkthrough

The updates focus on enhancing the prediction market agent tooling, particularly around agent benchmarking and market analysis. Key modifications include the introduction of new methods and properties for more nuanced prediction capabilities and market analysis, the addition of enums for market filtering and resolution, and refinements in how predictions are evaluated and reported. These changes aim to provide more detailed, flexible, and accurate tools for analyzing and interacting with prediction markets.

Changes

File(s) Summary
.../benchmark/agents.py
.../benchmark/benchmark.py
Replaced evaluate with is_predictable in AbstractBenchmarkedAgent. Added new methods and updated prediction logic in RandomAgent and FixedAgent. Refactored prediction evaluation, adjusted outcome comparison, simplified ratio computation, and updated expected returns calculation.
.../benchmark/utils.py Added MarketResolution enum, validators, and properties to models. Updated functions to handle new fields and logic. Added filter_ parameter to get_markets function.
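To make the `filter_` addition concrete, here is a minimal sketch of what such a filter might look like. The enum names, values, and dict-based market shape are illustrative assumptions; the real definitions live in `benchmark/utils.py` and may differ.

```python
from enum import Enum


# Hypothetical shapes for illustration only; see benchmark/utils.py
# for the real enum and the actual get_markets signature.
class MarketFilter(str, Enum):
    OPEN = "open"
    RESOLVED = "resolved"


def filter_markets(markets: list, filter_: MarketFilter) -> list:
    """Keep only markets matching the requested filter.

    The trailing underscore in `filter_` avoids shadowing the
    built-in `filter` function.
    """
    want_resolved = filter_ is MarketFilter.RESOLVED
    return [m for m in markets if (m.get("resolution") is not None) == want_resolved]
```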


Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>.
    • Generate unit-tests for this file.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit tests for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai generate interesting stats about this repository from git and render them as a table.
    • @coderabbitai show all the console.log statements in this repository.
    • @coderabbitai read src/utils.ts and generate unit tests.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (invoked as PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger a review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai help to get help.

Additionally, you can add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • The JSON schema for the configuration file is available here.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/coderabbit-overrides.v2.json

CodeRabbit Discord Community

Join our Discord Community to get help, request features, and share feedback.

@kongzii kongzii marked this pull request as ready for review February 19, 2024 08:12

@coderabbitai coderabbitai bot left a comment


Review Status

Actionable comments generated: 7

Configuration used: CodeRabbit UI

Commits reviewed: files that changed from the base of the PR, between b04007c and aa5451b.
Files selected for processing (4)
  • prediction_market_agent_tooling/benchmark/agents.py (3 hunks)
  • prediction_market_agent_tooling/benchmark/benchmark.py (7 hunks)
  • prediction_market_agent_tooling/benchmark/utils.py (7 hunks)
  • tests/test_benchmark.py (4 hunks)
Additional comments: 7
prediction_market_agent_tooling/benchmark/agents.py (3)
  • 16-20: The introduction of the is_predictable method in place of evaluate is a significant change. Ensure that all subclasses and usages of the previously existing evaluate method have been updated accordingly to prevent any runtime errors.
  • 28-32: The check_and_predict method provides a streamlined way to check predictability before making a prediction. This is a good practice as it encapsulates the predictability check and prediction within a single method, reducing the risk of errors and improving code readability.
  • 58-74: The check_and_predict_restricted method effectively combines the predictability check with the restricted prediction process. This method ensures that predictions adhere to the specified time restrictions, which is crucial for maintaining the integrity of predictions in scenarios with time-sensitive information.
prediction_market_agent_tooling/benchmark/benchmark.py (4)
  • 16-16: The addition of MarketResolution to the import statements indicates the introduction of a new enum to handle market resolution statuses. This change aligns with the PR objectives to refine benchmarking logic based on market resolution status.
  • 272-274: The comparison logic for determining the correct outcome percentage has been updated to use the probable_resolution field from both prediction and market objects. This change is part of the refinement of benchmarking logic based on market resolution status and is a logical update to ensure outcome comparison is aligned with the probable resolution of markets.
  • 289-300: The method _compute_precision_and_recall_percentages now uses the probable_resolution field to calculate precision and recall percentages for yes and no outcomes. This update is part of the benchmarking logic refinement and ensures that the evaluation metrics are accurately calculated based on the probable resolution of markets. The logic appears correct and aligns with the objectives of enhancing the benchmarking process.
  • 352-352: The method _compute_ratio_evaluated_as_answerable now calculates the proportion of predictions evaluated as answerable based on the is_predictable field. This change supports the enhancement of agent predictability by accurately reflecting the proportion of markets where agents could make predictions.
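The precision/recall computation described above can be sketched as follows. This is a simplified stand-in for `_compute_precision_and_recall_percentages`, comparing predicted against actual `probable_resolution` values for one outcome; the real method's signature and return shape may differ.

```python
def precision_recall_for(predictions, actuals, positive):
    """Precision and recall for one resolution value (e.g. 'yes'),
    given parallel lists of predicted and actual probable resolutions.

    Returns None for a metric whose denominator is zero.
    """
    tp = sum(p == positive and a == positive for p, a in zip(predictions, actuals))
    predicted_pos = sum(p == positive for p in predictions)
    actual_pos = sum(a == positive for a in actuals)
    precision = tp / predicted_pos if predicted_pos else None
    recall = tp / actual_pos if actual_pos else None
    return precision, recall
```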

Comment on lines +138 to +144
prediction = (
    agent.check_and_predict(market_question=market.question)
    if not market.is_resolved
    else agent.check_and_predict_restricted(
        market_question=market.question,
        time_restriction_up_to=market.created_time,  # TODO: Add support for resolved_at and any time in between.
    )
)

The logic within the get_prediction_result method has been updated to differentiate between resolved and unresolved markets, using check_and_predict for unresolved markets and check_and_predict_restricted for resolved ones. This change is part of enhancing agent predictability across market conditions. However, there is a TODO comment about adding support for resolved_at and any time in between, which suggests the handling of different time restrictions is not yet complete.

Consider implementing the handling for resolved_at and other time restrictions to fully support the new prediction capabilities in restricted or closed market scenarios.
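One way the TODO could be addressed is sketched below. The `resolved_at` field and the helper name are assumptions (the PR currently uses only `created_time`), so treat this as an illustration of the idea rather than the project's API.

```python
import random
from datetime import datetime, timedelta
from typing import Optional


def pick_time_restriction(
    created_time: datetime,
    resolved_at: Optional[datetime] = None,  # assumed field name, not in the PR
) -> datetime:
    """Choose a data cutoff for benchmarking on a resolved market.

    Falls back to created_time (the PR's current behavior). When a
    resolution time is known, samples a point in between, so the
    benchmark can probe predictions made mid-lifetime of the market.
    """
    if resolved_at is None or resolved_at <= created_time:
        return created_time
    span = (resolved_at - created_time).total_seconds()
    return created_time + timedelta(seconds=random.uniform(0, span))
```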

Comment on lines +392 to +405
f"{p.outcome_prediction.p_yes:.2f} [{p.outcome_prediction.probable_resolution.value}]"
if p.is_predictable
and p.outcome_prediction # Is answerable and answered
else "N/A"
if not p.evaluation
and not p.outcome_prediction # Not evaluated for some reason
else "S"
if p.evaluation
and not p.evaluation.is_predictable # Skipped (evaluated to be not predictable)
else "F"
if p.evaluation
and p.evaluation.is_predictable
and not p.outcome_prediction # Failed (no prediction)
else should_not_happen(
f"Unexpected case in get_markets_summary() for {p}."
else (
"S"
if not p.is_predictable # Skipped (evaluated to be not predictable)
else (
"F"
if p.is_predictable
and not p.outcome_prediction # Failed (no prediction)
else should_not_happen(
f"Unexpected case in get_markets_summary() for {p}."
)
)

The get_markets_summary method has been updated to include the probable_resolution in the market summary generation. This change enhances the summary information by providing insights into the resolution status of markets. However, the handling of unexpected cases using should_not_happen might not be the best approach for error handling. It's recommended to log unexpected cases or handle them more gracefully to avoid potential crashes or uninformative error messages.

Consider improving the error handling in the get_markets_summary method to log unexpected cases or handle them more gracefully, ensuring the application remains robust and informative in case of errors.

- else should_not_happen(f"Unexpected case in get_markets_summary() for {p}.")
+ else "Unexpected case"  # Log or handle unexpected cases more gracefully

Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
f"{p.outcome_prediction.p_yes:.2f} [{p.outcome_prediction.probable_resolution.value}]"
if p.is_predictable
and p.outcome_prediction # Is answerable and answered
else "N/A"
if not p.evaluation
and not p.outcome_prediction # Not evaluated for some reason
else "S"
if p.evaluation
and not p.evaluation.is_predictable # Skipped (evaluated to be not predictable)
else "F"
if p.evaluation
and p.evaluation.is_predictable
and not p.outcome_prediction # Failed (no prediction)
else should_not_happen(
f"Unexpected case in get_markets_summary() for {p}."
else (
"S"
if not p.is_predictable # Skipped (evaluated to be not predictable)
else (
"F"
if p.is_predictable
and not p.outcome_prediction # Failed (no prediction)
else should_not_happen(
f"Unexpected case in get_markets_summary() for {p}."
)
)
f"{p.outcome_prediction.p_yes:.2f} [{p.outcome_prediction.probable_resolution.value}]"
if p.is_predictable
and p.outcome_prediction # Is answerable and answered
else (
"S"
if not p.is_predictable # Skipped (evaluated to be not predictable)
else (
"F"
if p.is_predictable
and not p.outcome_prediction # Failed (no prediction)
else "Unexpected case" # Log or handle unexpected cases more gracefully

Comment on lines +434 to +440
     if prediction.outcome_prediction.probable_resolution == MarketResolution.YES
     and market.yes_outcome_price > 0
     else 0
 )
 no_shares = (
     bet_units / 0.5  # market.no_outcome_price
-    if prediction.outcome_prediction.p_yes <= buy_yes_threshold
+    if prediction.outcome_prediction.probable_resolution == MarketResolution.NO

The calculate_expected_returns method has been updated to include logic based on the probable_resolution field. This change is part of the updates to expected returns calculation based on the probable resolution of markets. The logic appears correct and aligns with the objectives of refining the benchmarking process. However, the hardcoded bet units and the assumption of a 50/50 market state might limit the flexibility and accuracy of the expected returns calculation.

Consider adding support for dynamic bet sizes and more accurate share price calculations to improve the flexibility and accuracy of the expected returns calculation.
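A sketch of the suggested direction, replacing the hardcoded `/ 0.5` with the market's actual outcome price. The function names and the one-unit payout per winning share are assumptions for illustration:

```python
def shares_bought(bet_units: float, outcome_price: float) -> float:
    """Shares received when spending `bet_units` at the market's actual
    outcome price, instead of assuming a 50/50 market (price = 0.5)."""
    if outcome_price <= 0:
        raise ValueError("outcome price must be positive")
    return bet_units / outcome_price


def expected_return_pct(bet_units: float, outcome_price: float, p_win: float) -> float:
    """Expected percentage return, assuming each winning share pays out 1 unit."""
    expected_value = p_win * shares_bought(bet_units, outcome_price)
    return 100.0 * (expected_value - bet_units) / bet_units
```

For example, betting 10 units at a price of 0.5 with a 50% win probability yields a 0% expected return, while the same bet at a price of 0.4 with a 60% win probability is positive.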

Comment on lines +34 to +44
def is_predictable_restricted(
self,
market_question: str,
time_restriction_up_to: datetime,
) -> bool:
"""
Override if the agent can decide to not predict the question, before doing the hard work.

Data used for the evaluation must be restricted to the time_restriction_up_to.
"""
return True

The is_predictable_restricted method introduces time-based restrictions for predictions. It's crucial to ensure that all data used for prediction respects the time_restriction_up_to parameter to avoid leaking future information into the prediction process.

Consider adding a concrete implementation example or guidance in the method's documentation to help developers understand how to enforce the time restriction correctly.
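One possible shape for such guidance is sketched below. The subclass, the `fetch_articles` helper, and the article objects are hypothetical; the point is only that every data source must be filtered by the cutoff.

```python
from datetime import datetime


class NewsBasedAgent:
    """Hypothetical agent showing one way to honor the cutoff;
    fetch_articles and the article shape are illustrative only."""

    def fetch_articles(self, query: str) -> list:
        return []  # stub: a real agent would query a news source here

    def is_predictable_restricted(
        self, market_question: str, time_restriction_up_to: datetime
    ) -> bool:
        articles = self.fetch_articles(market_question)
        # Discard anything published after the cutoff, so no future
        # information leaks into the evaluation.
        usable = [a for a in articles if a.published_at <= time_restriction_up_to]
        return len(usable) > 0
```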

Comment on lines +46 to +55
def predict_restricted(
self,
market_question: str,
time_restriction_up_to: datetime,
) -> Prediction:
"""
Predict the outcome of the market question.

Data used for the prediction must be restricted to the time_restriction_up_to.
"""

The predict_restricted method correctly raises NotImplementedError, indicating that subclasses must provide their own implementation. This is consistent with the abstract nature of the AbstractBenchmarkedAgent class. However, it would be beneficial to include documentation or examples on how to implement this method, especially regarding the handling of time_restriction_up_to.

Would you like me to provide an example implementation or further guidance on handling the time_restriction_up_to parameter?

Comment on lines +79 to +93
    def predict(self, market_question: str) -> Prediction:
        p_yes, confidence = random.random(), random.random()
        return Prediction(
-           evaluation=evaluated,
            outcome_prediction=OutcomePrediction(
                p_yes=p_yes,
                confidence=confidence,
                info_utility=None,
            ),
        )

    def predict_restricted(
        self, market_question: str, time_restriction_up_to: datetime
    ) -> Prediction:
        return self.predict(market_question)


The RandomAgent class's predict_restricted method simply delegates to the predict method without considering the time_restriction_up_to parameter. This might not align with the intended use of predict_restricted, which is supposed to restrict data based on time. Ensure that the implementation of predict_restricted in RandomAgent (and other agents if applicable) correctly handles the time restriction.

Consider implementing a time-restricted version of the prediction logic in RandomAgent.predict_restricted to ensure it adheres to the intended behavior of making predictions with time-based restrictions.
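A random agent has no data to restrict, so a fully time-aware implementation is arguably out of scope, but it could at least derive its RNG seed from the cutoff so restricted runs are reproducible. The sketch below illustrates that idea; it is not the PR's implementation (which simply delegates to predict()).

```python
import random
from datetime import datetime


def restricted_random_p_yes(market_question: str, time_restriction_up_to: datetime) -> float:
    """Deterministic pseudo-random p_yes: seeding from the question and
    the cutoff makes restricted benchmark runs repeatable."""
    seed = f"{market_question}|{time_restriction_up_to.isoformat()}"
    return random.Random(seed).random()
```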

Comment on lines +102 to +115
    def predict(self, market_question: str) -> Prediction:
        p_yes, confidence = 1.0 if self.fixed_answer else 0.0, 1.0
        return Prediction(
-           evaluation=evaluated,
            outcome_prediction=OutcomePrediction(
                p_yes=p_yes,
                confidence=confidence,
                info_utility=None,
            ),
        )

    def predict_restricted(
        self, market_question: str, time_restriction_up_to: datetime
    ) -> Prediction:
        return self.predict(market_question)

Similar to the RandomAgent, the FixedAgent class's predict_restricted method does not account for the time_restriction_up_to parameter and directly calls the predict method. This could lead to incorrect behavior if the prediction logic is supposed to be time-sensitive.

It's recommended to adjust the FixedAgent.predict_restricted method to ensure it respects the time_restriction_up_to parameter, providing accurate predictions based on the specified time restrictions.

@@ -13,51 +13,84 @@ def __init__(self, agent_name: str, max_workers: t.Optional[int] = None):
         self.agent_name = agent_name
         self.max_workers = max_workers  # Limit the number of workers that can run this worker in parallel threads

-    def evaluate(self, market_question: str) -> EvaluatedQuestion:
-        raise NotImplementedError
+    def is_predictable(self, market_question: str) -> bool:
Contributor Author


Reduced the evaluate-research-predict pattern to just the is-predictable-then-predict pattern, where is_predictable now defaults to True. So agents are free to be implemented however they want.

return Prediction(is_predictable=is_predictable)
return self.predict(market_question=market_question)

def is_predictable_restricted(
Contributor Author


Plus, there are _restricted versions of them; they are called instead of the original ones when benchmarking on resolved markets.


@property
def is_resolved(self) -> bool:
return self.resolution is not None
Contributor Author


Removed the original is_resolved and changed it to this property to not have two sources of the same information.

else MarketResolution.YES
if self.p_yes > 0.5
else MarketResolution.NO
)
Contributor Author


I added probable_resolution, which can always be used, no matter whether the market is open or resolved.

The motivation is that even when a market is resolved, a final p_yes > 0.5 does not guarantee that the market actually resolved to YES.

-    def binary_answer(self) -> bool:
-        return self.p_yes > 0.5
+    def probable_resolution(self) -> MarketResolution:
+        return MarketResolution.YES if self.p_yes > 0.5 else MarketResolution.NO
Contributor Author


Added it to the outcome prediction as well, both for consistency and so we don't have to repeat the > 0.5 comparison everywhere.
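The pattern being discussed can be sketched as follows. The enum values and the class layout are illustrative assumptions; the real definitions are in benchmark/utils.py.

```python
from enum import Enum


class MarketResolution(str, Enum):  # illustrative values; see utils.py for the real enum
    YES = "yes"
    NO = "no"


class OutcomePrediction:
    def __init__(self, p_yes: float):
        self.p_yes = p_yes

    @property
    def probable_resolution(self) -> MarketResolution:
        # Centralizes the > 0.5 threshold, so callers compare enum
        # values rather than repeating raw-probability comparisons.
        return MarketResolution.YES if self.p_yes > 0.5 else MarketResolution.NO
```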

@@ -233,7 +298,3 @@ def get_llm_api_call_cost(
         model_cost += model_costs[model]["completion_tokens"] * completion_tokens
         model_cost /= 1000
         return model_cost
-
-
-def should_not_happen(message: str, E: t.Type[Exception] = RuntimeError) -> t.NoReturn:
Contributor Author


This was a duplicate; it's defined in the other utils module as well.


@coderabbitai coderabbitai bot left a comment


Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits reviewed: files that changed from the base of the PR, between aa5451b and a80b892.
Files selected for processing (1)
  • prediction_market_agent_tooling/benchmark/utils.py (8 hunks)
Files skipped from review as they are similar to previous changes (1)
  • prediction_market_agent_tooling/benchmark/utils.py


@coderabbitai coderabbitai bot left a comment


Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits reviewed: files that changed from the base of the PR, between a80b892 and 5f7bc81.
Files selected for processing (1)
  • prediction_market_agent_tooling/benchmark/utils.py (8 hunks)
Files skipped from review as they are similar to previous changes (1)
  • prediction_market_agent_tooling/benchmark/utils.py

@kongzii kongzii merged commit d226831 into main Feb 20, 2024
6 checks passed
@kongzii kongzii deleted the peter/benchmark-closed-markets branch February 20, 2024 14:54