
Optionally filter out invalid markets #482

Merged: 3 commits, Oct 10, 2024
Conversation

@kongzii (Contributor) commented Oct 9, 2024

We have two functions that seem similar (I was thinking about merging them at first), but they serve slightly different purposes:

  • is_invalid - this evaluates whether the question should ultimately be resolved as invalid; theoretically, we can have an agent that doesn't care about it, e.g. Create "Invalid agent" prediction-market-agent#505
  • is_predictable_binary - this evaluates whether the question is predictable by the agent

To show the difference:

Will X be alive at DD.MM.YYYY? → predictable, but invalid, because it's violent
What will be the price of GNO at DD.MM.YYYY? → valid, but not predictable, because our agents support binary markets only
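The distinction can be sketched as two independent checks. This is illustrative only: the real `is_invalid` and `is_predictable_binary` are LLM-backed, and `should_bet` is a simplified stand-in for the PR's `verify_market` logic, with keyword heuristics substituting for the model calls:

```python
def is_invalid(question: str) -> bool:
    # Stand-in heuristic: the real function asks an LLM whether resolving
    # the market could incentivize harm (violence, death, assassination).
    violent_terms = ("alive", "dead", "die", "kill", "assassin")
    return any(term in question.lower() for term in violent_terms)

def is_predictable_binary(question: str) -> bool:
    # Stand-in heuristic: our agents support binary markets only, so
    # open-ended "what/which/how" questions are not predictable for them.
    return not question.lower().startswith(("what", "which", "how"))

def should_bet(question: str, allow_invalid_questions: bool = False) -> bool:
    # The two checks are orthogonal: a violent question can be perfectly
    # predictable, and a price question can be valid yet unpredictable.
    if not allow_invalid_questions and is_invalid(question):
        return False
    return is_predictable_binary(question)
```

With these stand-ins, `should_bet("Will X be alive at 01.01.2030?")` fails the invalidity check while `should_bet("What will be the price of GNO at 01.01.2030?")` fails the predictability check, matching the two examples above.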

coderabbitai bot commented Oct 9, 2024

Walkthrough

The changes introduce a new boolean attribute allow_invalid_questions to the DeployableTraderAgent class in agent.py, which controls the validation of market questions. The verify_market method has been updated to utilize this attribute alongside a new is_invalid function defined in is_invalid.py. This function assesses the validity of questions in the context of a prediction market. Additionally, a new constant LLM_SEED is added in utils.py, and a test file for the is_invalid function is created in test_is_invalid.py.

Changes

File Path Change Summary
prediction_market_agent_tooling/deploy/agent.py Added allow_invalid_questions: bool = False to DeployableTraderAgent. Updated verify_market method to check question validity. Added import for is_invalid.
prediction_market_agent_tooling/tools/is_invalid.py Introduced is_invalid function to evaluate question validity with caching and retrying decorators.
prediction_market_agent_tooling/tools/utils.py Added constant LLM_SEED = 0. Updated comments for clarity and enhanced documentation for some functions.
tests/tools/test_is_invalid.py Created new test file with parameterized tests for is_invalid function, skipping based on RUN_PAID_TESTS.

Possibly related PRs

  • 383 move rebalancing logic involving previous positions to strategy #387: This PR modifies the agent.py file, which is directly related to the main PR's changes in the same file, specifically regarding the handling of market processing and the introduction of new methods that may interact with the DeployableTraderAgent class.
  • Process invalid answers #333: Although this PR primarily focuses on the omen_contracts.py file, it introduces new methods for handling answers, which could be relevant to the overall functionality of the prediction market system that the main PR is also a part of.
  • Fix kelly bet if token pool is exactly balanced #408: This PR addresses the kelly_criterion.py file, which may relate to the overall betting strategies and logic that the main PR's changes in agent.py could impact, especially in terms of how bets are processed and validated.

Suggested reviewers

  • evangriffiths

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 1b736f0 and 282f4e2.

📒 Files selected for processing (1)
  • prediction_market_agent_tooling/tools/is_invalid.py (1 hunks)
🧰 Additional context used
🔇 Additional comments (4)
prediction_market_agent_tooling/tools/is_invalid.py (4)

1-16: LGTM: Imports are appropriate and well-organized.

The imports are relevant to the functionality of the file, including necessary modules for caching, logging, API key management, and language model interactions.


1-92: Overall assessment: Good implementation with minor improvements needed.

The is_invalid function is well-implemented and achieves its purpose of evaluating the validity of questions in a prediction market context. The use of caching, retry mechanisms, and error handling demonstrates good software engineering practices.

To improve the code:

  1. Fix the spelling errors in the QUESTION_IS_INVALID_PROMPT constant.
  2. Update the ChatOpenAI initialization parameters.
  3. Reconsider the error handling for the case when langchain is not installed.

Once these issues are addressed, the implementation will be more robust and maintainable.


18-53: ⚠️ Potential issue

Correct spelling errors and consider prompt improvements.

The QUESTION_IS_INVALID_PROMPT constant provides comprehensive guidelines for determining invalid questions. However, there are a few spelling errors that should be corrected:

  1. Line 36: "George Flyod" should be "George Floyd"
  2. Line 23: "assatination" should be "assassination"
  3. Lines 24-25: "violant" should be "violent"
  4. Line 25: "ilegally" should be "illegally"

Additionally, consider the following improvements:

  • Line 44: Clarify what you mean by "write the parts of the following question"
  • Line 62: The emoji "🥶" doesn't seem to serve a purpose and should be removed

Apply the following diff to fix the spelling errors:

-  - Valid: Will Derek Chauvin go to jail for the murder of George Flyod? (It's unlikely that the jurors would collude to make a wrong verdict in order to win this market).
+  - Valid: Will Derek Chauvin go to jail for the murder of George Floyd? (It's unlikely that the jurors would collude to make a wrong verdict in order to win this market).
- The market's question is about immoral violence, dead or assatination.
- The violant event can be caused by a single conscious being.
- The violant event is done ilegally.
+ The market's question is about immoral violence, death or assassination.
+ The violent event can be caused by a single conscious being.
+ The violent event is done illegally.

56-92: ⚠️ Potential issue

Update ChatOpenAI initialization and consider error handling improvements.

The is_invalid function is well-structured and includes good practices like caching and retry mechanisms. However, there are a few points to address:

  1. The ChatOpenAI initialization includes incorrect parameter names:

    • Use model_name instead of model
    • Use openai_api_key instead of api_key
    • The seed parameter is not supported by ChatOpenAI
  2. The error handling for the langchain import could be improved:

    • Instead of returning True (indicating the question is invalid) when langchain is not installed, consider raising an exception or returning a value that indicates the check could not be performed.

Apply this diff to correct the ChatOpenAI initialization:

     llm = ChatOpenAI(
-        model=engine,
+        model_name=engine,
         temperature=temperature,
-        seed=seed,
-        api_key=APIKeys().openai_api_key_secretstr_v1,
+        openai_api_key=APIKeys().openai_api_key_secretstr_v1,
     )

For the error handling, consider replacing:

    except ImportError:
        logger.error("langchain not installed, skipping is_invalid")
        return True

with:

    except ImportError:
        logger.error("langchain not installed, unable to perform validity check")
        raise RuntimeError("Required dependencies not installed")

This change will make it clear that the validity check couldn't be performed, rather than potentially misclassifying questions as invalid.
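An alternative to raising, suggested here only as a sketch: return `None` so the caller can distinguish "check could not be performed" from "question is invalid". The function name and the `llm_available` flag are hypothetical; the flag stands in for the `except ImportError` branch in the real code:

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)

def try_is_invalid(question: str, llm_available: bool = True) -> Optional[bool]:
    # `llm_available` stands in for the ImportError branch: when the optional
    # dependency is missing, report "check not performed" instead of "invalid".
    if not llm_available:
        logger.error("langchain not installed, unable to perform validity check")
        return None
    # ... the real LLM-based check would run here; trivially "valid" in this sketch.
    return False
```

The caller can then decide whether a `None` result should be treated conservatively (skip the market) or as an error.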




@pytest.mark.skipif(not RUN_PAID_TESTS, reason="This test costs money to run.")
@pytest.mark.parametrize(
"question, invalid",
@kongzii (Contributor, Author) commented:
yay
Screenshot by Dropbox Capture

@coderabbitai bot left a comment
Actionable comments posted: 4

🧹 Outside diff range and nitpick comments (7)
tests/tools/test_is_invalid.py (2)

7-81: LGTM: Well-structured test function with appropriate decorators.

The test function is well-designed with the use of skipif for cost management and parametrize for multiple test cases. This structure allows for comprehensive testing of the is_invalid function across various scenarios.

Consider improving the error message in the assertion:

- assert (
-     is_invalid(question=question) == invalid
- ), f"Question is not evaluated correctly."
+ result = is_invalid(question=question)
+ assert (
+     result == invalid
+ ), f"Question '{question}' is not evaluated correctly. Expected: {invalid}, Got: {result}"

This change provides more context in case of test failures, and capturing the result once avoids a second paid LLM call in the assertion message.

🧰 Tools
🪛 Ruff

81-81: f-string without any placeholders

Remove extraneous f prefix

(F541)


9-77: LGTM: Comprehensive set of test cases covering various scenarios.

The test cases provide a good coverage of different question types and edge cases. The comments on some test cases are helpful in understanding the reasoning behind the expected outcomes.

Consider the following suggestions to further improve the test cases:

  1. Group similar test cases together (e.g., time-related questions, violence-related questions) for better organization.
  2. Add more edge cases, such as questions with ambiguous wording or borderline validity.
  3. Include test cases for non-English questions to ensure language independence.

Example of grouping similar test cases:

@pytest.mark.parametrize(
    "question, invalid",
    [
        # Time-related questions
        ("Russian nuke in space?", False),
        ("Russian nuke in space by March 31?", False),
        ("Russian nuke in space in 2024?", False),
        ("Russian nuke in space in 24 days?", True),
        
        # Violence-related questions
        ("Will Hezbollah leader Hassan Nasrallah be confirmed dead by 6 October 2024?", True),
        ("Will Netanyahu die before the end of October 7th 2024", True),
        ("Will Travis James Mullis be executed in Texas on 1 October 2024?", False),
        ("Will the second nitrogen gas execution in Alabama be carried out by 5 October 2024?", False),
        ("Will someone beat me?", True),
        
        # Other categories...
    ]
)

This organization will make it easier to maintain and expand the test suite in the future.

prediction_market_agent_tooling/tools/utils.py (2)

25-26: Improved documentation for LLM_SUPER_LOW_TEMPERATURE

The updated comment provides valuable context by referencing a specific experiment. This helps developers understand when to use this constant versus a zero value.

Consider adding a brief summary of the experiment's findings directly in the comment to save developers time. For example:

# Experiment (https://github.com/gnosis/prediction-market-agent/pull/438) showed that using LLM_SUPER_LOW_TEMPERATURE produces more consistent results than 0 for gpt-4-turbo.

27-28: Addition of LLM_SEED for consistent results

The introduction of LLM_SEED is a good practice for ensuring reproducibility in LLM-based operations. The comment clearly explains its purpose.

Consider using a more unique seed value instead of 0. While 0 is a valid seed, using a more distinctive number (e.g., a project-specific constant or a hash of a string) could help avoid potential conflicts with other systems that might use 0 as a default seed. For example:

LLM_SEED = 48273  # Project-specific seed for consistent results

Also, it might be beneficial to add a comment about whether this seed should be configurable in production environments.

prediction_market_agent_tooling/deploy/agent.py (2)

299-299: LGTM: New attribute added with appropriate default value

The new allow_invalid_questions attribute is added with a default value of False, which is a safe choice. This attribute will control whether invalid questions are allowed in the market verification process.

Consider adding a docstring to explain the purpose and usage of this attribute, especially how it interacts with the verify_market method.


408-410: LGTM: New check for invalid questions added

The new check for invalid questions is correctly implemented, using the allow_invalid_questions attribute and the is_invalid function. This addition aligns with the PR objectives to enhance the decision-making process for agents operating within the prediction market framework.

Consider simplifying the condition for better readability:

-        if not self.allow_invalid_questions and is_invalid(market.question):
-            return False
-
+        return self.allow_invalid_questions or not is_invalid(market.question)

This change combines the new check with the existing return statement, making the method more concise and easier to understand at a glance.

🧰 Tools
🪛 Ruff

408-411: Return the negated condition directly

Inline condition

(SIM103)
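The suggested one-liner can be checked against the original two-branch form with an exhaustive truth table (assuming, as a simplification, that the remainder of `verify_market` returns True):

```python
def original(allow_invalid: bool, invalid: bool) -> bool:
    # mirrors the two-branch form: bail out early, otherwise fall through
    if not allow_invalid and invalid:
        return False
    return True  # stands in for the rest of verify_market succeeding

def simplified(allow_invalid: bool, invalid: bool) -> bool:
    # the suggested one-liner
    return allow_invalid or not invalid

# exhaustive check over all four combinations
for a in (False, True):
    for i in (False, True):
        assert original(a, i) == simplified(a, i)
```

Note the equivalence only holds if the early return is the last check in the method; if other checks follow it, the one-liner cannot simply replace the `if` block.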

prediction_market_agent_tooling/tools/is_invalid.py (1)

67-69: Improve the function docstring for clarity

The current docstring does not accurately describe the function's purpose. Update it to reflect that the function determines if a question is invalid according to predefined criteria.

Suggested docstring:

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 2335bae and 1b736f0.

⛔ Files ignored due to path filters (1)
  • pyproject.toml is excluded by !**/*.toml
📒 Files selected for processing (4)
  • prediction_market_agent_tooling/deploy/agent.py (3 hunks)
  • prediction_market_agent_tooling/tools/is_invalid.py (1 hunks)
  • prediction_market_agent_tooling/tools/utils.py (1 hunks)
  • tests/tools/test_is_invalid.py (1 hunks)
🧰 Additional context used
🪛 Ruff
prediction_market_agent_tooling/deploy/agent.py

408-411: Return the negated condition directly

Inline condition

(SIM103)

tests/tools/test_is_invalid.py

81-81: f-string without any placeholders

Remove extraneous f prefix

(F541)

🔇 Additional comments (8)
tests/tools/test_is_invalid.py (1)

1-4: LGTM: Imports are appropriate and well-structured.

The imports are correctly placed and include all necessary components for the test file.

prediction_market_agent_tooling/tools/utils.py (1)

Line range hint 1-283: Overall assessment of changes in utils.py

The changes in this file, while minor, contribute positively to the project:

  1. The updated comment for LLM_SUPER_LOW_TEMPERATURE provides better context for its usage.
  2. The addition of LLM_SEED improves reproducibility for LLM operations.

These changes align well with the PR objectives of clarifying functionality and improving the decision-making process for agents in the prediction market framework. The improved documentation and focus on consistent results will benefit developers working with this tooling.

prediction_market_agent_tooling/deploy/agent.py (2)

69-69: LGTM: New import added correctly

The new import for the is_invalid function is correctly placed with other imports from the same package. This function will be used to check if a question is invalid in the market verification process.


69-69: Summary: Effective implementation of invalid market filtering

The changes introduced in this PR successfully implement the ability to filter out invalid markets, as per the PR objectives. Here's a summary of the key changes:

  1. The is_invalid function is imported to check for invalid questions.
  2. A new allow_invalid_questions attribute is added to control the filtering behavior.
  3. The verify_market method is updated to use these new components.

These changes enhance the decision-making process for agents by allowing them to avoid participating in markets with invalid questions. The implementation is flexible, as it can be toggled using the allow_invalid_questions attribute.

Overall, these changes improve the robustness of the prediction market agent tooling without introducing any apparent issues or bugs.

Also applies to: 299-299, 408-410

prediction_market_agent_tooling/tools/is_invalid.py (4)

81-81: Verify the attribute for OpenAI API key

Ensure that openai_api_key_secretstr_v1 is the correct attribute to access the OpenAI API key from APIKeys. If the attribute is incorrect, it may cause authentication issues.


61-61: ⚠️ Potential issue

Verify the model name 'gpt-4o'

The default value for the engine parameter is set to 'gpt-4o'. Please verify that 'gpt-4o' is the correct model name. If it's a typo, consider correcting it to 'gpt-4' or the intended model name.


71-72: ⚠️ Potential issue

Correct the import statement for ChatOpenAI

The module langchain_openai does not exist. To import ChatOpenAI, use the correct module path from the LangChain library.

Apply this diff to fix the imports:

Likely invalid or redundant comment.


84-90: ⚠️ Potential issue

Adjust the prompt formatting and LLM invocation

The current approach to formatting the prompt and invoking the LLM may not align with the LangChain API. Update the code to correctly format the prompt and call the LLM to obtain the completion.

Apply this diff to adjust the code:

Note: Ensure that callbacks is the correct parameter based on the return value of get_langfuse_langchain_config().

Likely invalid or redundant comment.

Comment on lines 23 to 25
- The market's question is about immoral violence, dead or assatination.
- The violant event can be caused by a single conscious being.
- The violant event is done ilegally.
⚠️ Potential issue

Correct spelling errors in the prompt text

There are several spelling mistakes in the prompt that may affect the clarity of instructions provided to the language model:

  • Line 23: "dead or assatination" should be "death or assassination".
  • Lines 24-25: "The violant event" should be "The violent event".
  • Line 25: "ilegally" should be "illegally".

Apply the following diff to fix the spelling errors:

Committable suggestion was skipped due to low confidence.

Comment on lines +77 to +82
llm = ChatOpenAI(
model=engine,
temperature=temperature,
seed=seed,
api_key=APIKeys().openai_api_key_secretstr_v1,
)
⚠️ Potential issue

Update ChatOpenAI initialization with correct parameters

The initialization of ChatOpenAI includes incorrect parameter names and unsupported parameters:

  • Use model_name instead of model.
  • The seed parameter is not supported by ChatOpenAI.
  • Use openai_api_key instead of api_key.

Apply this diff to correct the initialization:

Committable suggestion was skipped due to low confidence.

Comment on lines +74 to +75
logger.error("langchain not installed, skipping is_invalid")
return True
⚠️ Potential issue

Reconsider handling of ImportError

When the langchain library is not installed, the function logs an error and returns True, indicating the question is invalid. This behavior might not be appropriate, as it could lead to incorrect results. Consider raising an exception or returning a value that indicates the check could not be performed.

Suggestion:

Committable suggestion was skipped due to low confidence.

- Valid: Will Trump win the 2020 US presidential election? (The main source of uncertainty is the vote of US citizens, not a potential murder of a presidential candidate).
- Which could give an incentive only to specific participants to commit an immoral violent action, but are in practice unlikely.
- Valid: Will the US be engaged in a military conflict with a UN member state in 2021? (It’s unlikely for the US to declare war in order to win a bet on this market).
- Valid: Will Derek Chauvin go to jail for the murder of George Flyod? (It’s unlikely that the jurors would collude to make a wrong verdict in order to win this market).
⚠️ Potential issue

Fix typo in "George Floyd"

In line 36, correct the spelling of "George Flyod" to "George Floyd".

Apply the following diff:

Committable suggestion was skipped due to low confidence.

@kongzii kongzii mentioned this pull request Oct 9, 2024
@evangriffiths (Contributor) left a comment:
niiice! just a couple typos, but otherwise lgtm

prediction_market_agent_tooling/tools/is_invalid.py (outdated; resolved)
def is_invalid(
question: str,
engine: str = "gpt-4o",
temperature: float = LLM_SUPER_LOW_TEMPERATURE,
Contributor commented:
🥶


Follow a chain of thought to evaluate if the question is invalid:

First, write the parts of the following question:
Contributor commented:
What do you mean by this bit of the prompt?

Contributor Author replied:
Just giving the LLM more tokens to reason about it. Completion looks like this:

Screenshot by Dropbox Capture

vs. if the question is more complicated

Screenshot by Dropbox Capture

this helped before in the is_predictable_binary function, I just kept it here.
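The idea described here, asking the model to first decompose the question before giving a verdict, can be sketched as a prompt template. The wording below is illustrative, not the actual QUESTION_IS_INVALID_PROMPT from the PR:

```python
# Illustrative chain-of-thought prompt shape: the model is asked to enumerate
# the question's parts first, so the final verdict benefits from extra
# reasoning tokens, as the author describes.
PROMPT_TEMPLATE = """Follow a chain of thought to evaluate if the question is invalid:

First, write the parts of the following question:
{question}

Then, for each part, decide whether it violates any of the validity rules.
Finally, answer with a single word, "valid" or "invalid"."""

def build_prompt(question: str) -> str:
    # Simple template filling; a real implementation would pass the result
    # to the LLM and parse the final "valid"/"invalid" token.
    return PROMPT_TEMPLATE.format(question=question)
```

The same pattern reportedly helped in `is_predictable_binary`, which is why it was kept here.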

@@ -22,8 +22,10 @@
T = TypeVar("T")

# t=0 is mathematically impossible and it's not clear how OpenAI (and others) handle it, as a result, even with t=0, gpt-4-turbo produces very different outputs,
# it seems that using a very low temperature is the best way to have as consistent outputs as possible: https://community.openai.com/t/why-the-api-output-is-inconsistent-even-after-the-temperature-is-set-to-0/329541/12
# see this experiment to figure out if you should use LLM_SUPER_LOW_TEMPERATURE or just 0: https://github.com/gnosis/prediction-market-agent/pull/438.
Contributor commented:
this would defo be worth making a twitter thread out of!

@evangriffiths evangriffiths self-requested a review October 9, 2024 09:34
@kongzii kongzii merged commit 3ccf31a into main Oct 10, 2024
14 checks passed
@kongzii kongzii deleted the peter/invalidity branch October 10, 2024 08:29