
Infinite Loop Symptoms on Prefect Server #15607

Open
a14e opened this issue Oct 8, 2024 · 4 comments
Labels
bug

Comments


a14e commented Oct 8, 2024

Bug summary

While working locally, I ran into an issue with Prefect that looks like an endless loop on the backend.

I've observed the following behavior locally several times:

  1. After several dozen manual runs of my script, the workflows stop displaying correctly in the web UI. Specifically, the green squares for new flows are no longer visible.
  2. At the same time, I see a large stream of logs in the PostgreSQL terminal.
  3. Restarting Prefect resolves the issue.

Logs:

prefect-postgres-1  | 2024-10-07 23:57:37.075 UTC [2893] STATEMENT:  INSERT INTO task_run_state (task_run_id, type, timestamp, name, message, state_details, data, id, created, updated) VALUES ($1::UUID, $2::state_type, $3::TIMESTAMP WITH TIME ZONE, $4::VARCHAR, $5::VARCHAR, $6, $7::JSON, $8::UUID, $9::TIMESTAMP WITH TIME ZONE, $10::TIMESTAMP WITH TIME ZONE) ON CONFLICT (id) DO NOTHING
prefect-postgres-1  | 2024-10-07 23:57:37.082 UTC [2893] ERROR:  duplicate key value violates unique constraint "uq_task_run_state__task_run_id_timestamp_desc"
prefect-postgres-1  | 2024-10-07 23:57:37.082 UTC [2893] DETAIL:  Key (task_run_id, "timestamp")=(3649e07c-9883-42ac-8819-76c821f5f8eb, 2024-10-07 23:45:17.096643+00) already exists.
prefect-postgres-1  | 2024-10-07 23:57:37.082 UTC [2893] STATEMENT:  INSERT INTO task_run_state (task_run_id, type, timestamp, name, message, state_details, data, id, created, updated) VALUES ($1::UUID, $2::state_type, $3::TIMESTAMP WITH TIME ZONE, $4::VARCHAR, $5::VARCHAR, $6, $7::JSON, $8::UUID, $9::TIMESTAMP WITH TIME ZONE, $10::TIMESTAMP WITH TIME ZONE) ON CONFLICT (id) DO NOTHING
....
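For context, my reading of the logs (not confirmed against the Prefect code): the INSERT uses ON CONFLICT (id) DO NOTHING, which only suppresses conflicts on id, so a collision on the separate unique index over (task_run_id, timestamp) still raises the duplicate key error, and the server appears to keep retrying the same statement. A minimal sketch of that PostgreSQL behavior, using a made-up demo table rather than the real Prefect schema:

-- Hypothetical demo table, not the real Prefect schema.
CREATE TABLE demo_state (
    id uuid PRIMARY KEY,
    task_run_id uuid NOT NULL,
    "timestamp" timestamptz NOT NULL
);
CREATE UNIQUE INDEX uq_demo__task_run_id_timestamp_desc
    ON demo_state (task_run_id, "timestamp" DESC);

INSERT INTO demo_state VALUES
    ('11111111-1111-1111-1111-111111111111',
     'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa',
     '2024-10-07 23:45:17+00');

-- Different id, same (task_run_id, timestamp): ON CONFLICT (id) does not cover
-- this unique index, so the statement errors instead of being silently skipped.
INSERT INTO demo_state VALUES
    ('22222222-2222-2222-2222-222222222222',
     'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa',
     '2024-10-07 23:45:17+00')
ON CONFLICT (id) DO NOTHING;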

Version info (prefect version output)

Version:             3.0.4
API version:         0.8.4
Python version:      3.12.6
Git commit:          c068d7e2
Built:               Tue, Oct 1, 2024 11:54 AM
OS/Arch:             linux/x86_64
Profile:             ephemeral
Server type:         server
Pydantic version:    2.9.2

Additional context

PostgreSQL version: 16
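
If it helps, the retrying can also be observed from the database side with a standard catalog query (nothing Prefect-specific); while the loop is running, the same INSERT keeps showing up for the same backend pid:

SELECT pid, state, query_start, left(query, 80) AS query
FROM pg_stat_activity
WHERE query ILIKE '%INSERT INTO task_run_state%';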

a14e added the bug label Oct 8, 2024

a14e commented Oct 8, 2024

Here is an example of the error in the UI. You can see that tasks are not displayed even though artifacts were saved; the Task Runs list is also empty.

[screenshot: Artifacts]

[screenshot: Task Runs]


a14e commented Oct 8, 2024

Workaround:
Run the following statements on Prefect's PostgreSQL database:

DROP INDEX IF EXISTS uq_task_run_state__task_run_id_timestamp_desc;
CREATE INDEX uq_task_run_state__task_run_id_timestamp_desc
    ON task_run_state (task_run_id, timestamp DESC);
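
Note that this recreates the index without UNIQUE, so it only stops the error/retry loop; it does not restore the original constraint. To double-check what the index looks like afterwards:

SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'task_run_state'
  AND indexname = 'uq_task_run_state__task_run_id_timestamp_desc';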

desertaxle (Member) commented

Thanks for the bug report @a14e! Do you have a flow that you can reliably reproduce this with? I suspect that there's something strange happening with the task run recorder, but it will be easier to confirm with a clear way to reproduce the issue.


a14e commented Oct 8, 2024

@desertaxle Thank you for your reply!
Yes, I can reproduce the issue, but there is a significant amount of randomness involved. Below, I've provided the flow code, and on my machine, it took around 420 flow runs to reproduce the issue twice (I was running it from the UI).

A simplified version of my flow looks like this:

from prefect import flow, task, serve
from typing import List


class DirectusLoaderTasks:

    @task(log_prints=True, name="Directus. Load Topic")
    async def get_topic_by_id(self, topic_id: str) -> str:
        return "123"

class GrammarDBTasks:

    @task(log_prints=True, name="DB. Load items for topic")
    async def load_all_items_from_db(self, topic_id: str) -> list[str]:
        return []

    @task(log_prints=True, name="DB. Load item groups for topic")
    async def load_all_groups_from_db(self, topic_id: str) -> list[str]:
        return []


@task(log_prints=True, name="Simple filter")
async def simple_filter(from_db: List[str]) -> List[str]:
    return []



class OpenAiTasks:
    @task(name="Open AI. filter3", retries=3)
    async def filter3(self,
                      groups: List[str],
                      topic: str) -> List[str]:
        return []

    @task(name="Open AI. filter1", retries=3)
    async def filter1(self,
                      groups: List[str],
                      topic: str) -> List[str]:

        return []

    @task(name="Open AI. filter2", retries=3)
    async def filter2(self,
                      groups: List[str],
                      groups_in_db: List[int],
                      topic: str) -> List[str]:

        return []

    @task(name="Open AI. Normalize", retries=3)
    async def normalize(self,
                        groups: List[str],
                        topic: str) -> List[str]:
        return []




@flow(log_prints=True, name="Item Generation flow")
async def generate_items_flow():

    directus_tasks = DirectusLoaderTasks()
    grammar_tasks = GrammarDBTasks()
    open_ai_tasks = OpenAiTasks()

    topic_id = await directus_tasks.get_topic_by_id("123")

    item_groups_from_db = await grammar_tasks.load_all_groups_from_db(topic_id)
    items_from_db = await grammar_tasks.load_all_items_from_db(topic_id)
    new_item_groups = await simple_filter(items_from_db)

    deduplicated_response: List[str] = await open_ai_tasks.filter1(
        new_item_groups,
        topic_id
    )

    filtered_response: List[str] = await open_ai_tasks.filter2(
        deduplicated_response,
        item_groups_from_db,
        topic_id
    )

    filtered_response = await open_ai_tasks.filter3(filtered_response,
                                                    topic_id)

    normalize_forms = await open_ai_tasks.normalize(filtered_response,
                                                    topic_id)

    return


if __name__ == "__main__":
    example_deploy = generate_items_flow.to_deployment(
        "Generate Grammar Item Groups",
        tags=["tag1", "tag2", "tag3"]
    )
    serve(example_deploy)

Here’s a video showing what the PostgreSQL logs look like in the terminal:

Recording.2024-10-08.232607.mp4
