ansible_rulebook.rule_set_runner - ERROR - Error calling action run_playbook #614

Open
3 tasks done
charlespick opened this issue Nov 1, 2023 · 16 comments

Comments

@charlespick

Please confirm the following

  • I agree to follow this project's code of conduct.
  • I have checked the current issues for duplicates.
  • I understand that ansible-rulebook is open source software provided for free and that I might not receive a timely response.

Bug Summary

I'm using the webhook event source to trigger ansible-rulebook from the command line, via another locally installed service. The first trigger almost always works, but on subsequent triggers I sometimes get the error below.

Environment

1.0.3
Executable location = /usr/local/bin/ansible-rulebook
Drools_jpy version = 0.3.7
Java home = /usr/lib/jvm/java-17-openjdk-amd64
Java version = 17.0.8.1
Python version = 3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0]

Ubuntu 22.04.3 LTS on ESXi AMD64

Steps to reproduce

Using this rulebook:

---
- name: Run playbook
  hosts: all
  sources:
    - ansible.eda.webhook:
        host: 127.0.0.1
        port: 6000
  rules:
    - name: Webhook called
      condition: event.payload.cmd == 'start'
      action:
        run_playbook:
          name: /home/charlespick/playbook.yml
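A minimal sketch for reproducing the intermittent failure by triggering the rule repeatedly. It assumes, as the rule condition implies, that the webhook source exposes the posted JSON body as event.payload, so posting {"cmd": "start"} fires the rule on every request. The /endpoint path is borrowed from the curl commands elsewhere in this thread; the function name and the delay between requests are illustrative assumptions, not part of ansible-rulebook.

```python
import json
import time
import urllib.request


def send_webhook_events(url: str, payload: dict, count: int = 5, delay: float = 1.0) -> list[int]:
    """POST the same JSON payload `count` times and return the HTTP status codes."""
    body = json.dumps(payload).encode("utf-8")
    statuses = []
    for i in range(count):
        request = urllib.request.Request(
            url,
            data=body,
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(request, timeout=10) as response:
            statuses.append(response.status)
        # Pause between triggers, but not after the last one.
        if i < count - 1:
            time.sleep(delay)
    return statuses
```

For example, send_webhook_events("http://127.0.0.1:6000/endpoint", {"cmd": "start"}, count=5, delay=2.0) sends five triggers two seconds apart against the rulebook above, which makes it easier to see whether only the first run succeeds.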

Actual results

2023-10-29 18:46:29,916 - ansible_rulebook.rule_set_runner - ERROR - Error calling action run_playbook, err [('/home/charlespick/.ansible/cp/47aa5da64f', '/tmp/edach48wb7s/project/.ansible/cp/47aa5da64f', "[Errno 6] No such device or address: '/home/charlespick/.ansible/cp/47aa5da64f'")]

Expected results

The playbook should execute reliably on every trigger.

Additional information

357c703fdf9d05295589f06bdd5da2ac3d25478f 1

@mkanoor
Contributor

mkanoor commented Nov 2, 2023

@charlespick Since you are using a local playbook file, it has to be copied into the project directory used by ansible-runner. Can you add copy_files: True as an option, as shown here?
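The error in the original report has the shape of shutil.Error from Python's shutil.copytree, which collects a list of (source, destination, reason) tuples. On Linux, open() on a UNIX domain socket fails with ENXIO ("[Errno 6] No such device or address"), and ~/.ansible/cp is Ansible's default directory for SSH ControlPersist sockets. A plausible reading, not confirmed in this thread, is that the copy into the project directory (note the destination /tmp/edach48wb7s/project/.ansible/cp/...) included the home directory's .ansible/cp and hit a live control socket. Because Ansible's default SSH arguments keep the control master alive for 60 seconds, that would also fit the first trigger working and later ones failing intermittently. The sketch below reproduces the same error on Linux with a stand-in socket and shows a copytree ignore callback that skips sockets; both function names are hypothetical and this is not ansible-rulebook's code.

```python
import os
import shutil
import socket
import stat
import tempfile


def skip_sockets(directory: str, names: list[str]) -> set[str]:
    """copytree `ignore` callback: leave out UNIX domain sockets."""
    return {
        name
        for name in names
        if stat.S_ISSOCK(os.lstat(os.path.join(directory, name)).st_mode)
    }


def copy_with_live_socket(ignore=None) -> list[tuple[str, str, str]]:
    """Copy a tree containing a bound UNIX socket; return shutil's error list ([] on success)."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "home")
        cp_dir = os.path.join(src, ".ansible", "cp")
        os.makedirs(cp_dir)
        # Stand-in for an SSH ControlPersist socket such as ~/.ansible/cp/47aa5da64f
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.bind(os.path.join(cp_dir, "47aa5da64f"))
        try:
            shutil.copytree(src, os.path.join(tmp, "project"), ignore=ignore)
            return []
        except shutil.Error as exc:
            # shutil.Error carries a list of (src, dst, reason) tuples,
            # the same shape as the list in the ansible-rulebook log line.
            return exc.args[0]
        finally:
            sock.close()
```

Calling copy_with_live_socket() returns one tuple whose reason starts with "[Errno 6] No such device or address", while copy_with_live_socket(ignore=skip_sockets) returns an empty list. If this reading holds, keeping the playbook in a dedicated directory rather than directly under the home directory should keep ~/.ansible out of the copy.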

@charlespick
Author

Hi @mkanoor,
It doesn't look like that worked.
[screenshot not reproduced]

@mkanoor
Contributor

mkanoor commented Dec 2, 2023

Is it possible that there is a playbook in your collection with the same name?

@nageshredhat

- name: Listen for events on a webhook
  hosts: all
  ## Define our source for events
  sources:
    - ansible.eda.webhook:
        host: 0.0.0.0
        port: 5000
  ## Define the conditions we are looking for
  rules:
    - name: Say Hello
      condition: event.payload.message == "Ansible is super cool"
  ## Define the action we should take should the condition be met
      action:
        run_playbook:
          name: say-what.yml

say-what.yml

- name: say thanks
  hosts: localhost
  gather_facts: false
  tasks:
    - debug:
        msg: "Thank you, {{ event.sender | default('my friend') }}!"

Try executing this rulebook.

To trigger the rulebook, use the following commands:
curl -H 'Content-Type: application/json' -d "{\"message\": \"Ansible is alright\"}" 127.0.0.1:5000/endpoint
curl -H 'Content-Type: application/json' -d "{\"message\": \"Ansible is super cool\"}" 127.0.0.1:5000/endpoint

@Alex-Izquierdo
Contributor

Hi @nageshredhat, we need the output of your ansible-rulebook command as well as the output of ansible-rulebook --version. You can also try the -vv flag for more debug information.

@muhammad-rafi

I have the same issue. Rather than opening a new issue, I thought I'd discuss it here. Here is my rulebook:

- name: Listen Kafka Events for BGP Neighbors
  hosts: all
  sources:
  - ansible.eda.kafka:
      host: "{{ hostname }}"
      port: "{{ port }}"
      topic: "{{ topic }}"
      group_id: "{{ group_id }}"
      offset: latest
      verify_mode: CERT_NONE
      security_protocol: SASL_PLAINTEXT
      sasl_mechanism: SCRAM-SHA-512
      sasl_plain_username: "{{ sasl_plain_username }}"
      sasl_plain_password: "{{ sasl_plain_password }}"

  rules:
    - name: Reason for BGP State Down
      condition: events.body.fields.open_check_error_code == "neighbor-down"
      action:
        debug:
          msg: |
            **Device: {{ event.body.tags.source }} 
            **BGP Neighbor: {{ event.body.tags.neighbor_address }}
            **Description: {{ event.body.fields.description | default('N/A') }}
            **Remote ASN: {{ event.body.fields.remote_as_number }}
            **Address Family: {{ event.body.fields['af_data/af_name']}}
            **Prefix Limit: {{ event.body.fields['af_data/max_prefix_limit'] }}
            **Prefix Limit Threshold: {{ event.body.fields['af_data/max_prefix_threshold_percent'] }}
            **Reason: {{ event.body.fields.reset_reason }}

Command to run this rulebook:

ansible-rulebook --rulebook rulebooks/bgp-max-pfx-rulebook.yml -i cml_hosts.yml --verbose --vars .kafka_vars.yml

Here is the error, which keeps repeating:

2024-04-25 11:39:33,415 - ansible_rulebook.rule_set_runner - ERROR - 
2024-04-25 11:39:34,094 - ansible_rulebook.rule_set_runner - ERROR - 
2024-04-25 11:39:34,792 - ansible_rulebook.rule_set_runner - ERROR - 
2024-04-25 11:39:35,443 - ansible_rulebook.rule_set_runner - ERROR - 
2024-04-25 11:39:36,120 - ansible_rulebook.rule_set_runner - ERROR - 
2024-04-25 11:39:36,793 - ansible_rulebook.rule_set_runner - ERROR - 
2024-04-25 11:39:37,488 - ansible_rulebook.rule_set_runner - ERROR - 
2024-04-25 11:39:38,140 - ansible_rulebook.rule_set_runner - ERROR - 
2024-04-25 11:39:38,853 - ansible_rulebook.rule_set_runner - ERROR - 
2024-04-25 11:39:39,632 - ansible_rulebook.rule_set_runner - ERROR - 
2024-04-25 11:39:40,355 - ansible_rulebook.rule_set_runner - ERROR - 
2024-04-25 11:39:41,067 - ansible_rulebook.rule_set_runner - ERROR - 
2024-04-25 11:39:41,756 - ansible_rulebook.rule_set_runner - ERROR - 
2024-04-25 11:39:42,422 - ansible_rulebook.rule_set_runner - ERROR - 
2024-04-25 11:39:43,107 - ansible_rulebook.rule_set_runner - ERROR - 
2024-04-25 11:39:43,799 - ansible_rulebook.rule_set_runner - ERROR - 
2024-04-25 11:39:44,482 - ansible_rulebook.rule_set_runner - ERROR - 

It starts OK, but after a couple of minutes I get this error and it keeps repeating.

Please advise.

@mkanoor
Contributor

mkanoor commented Apr 25, 2024

Are all those attributes defined in the event payload? You can just use

action:
  debug:

to see what it prints; that way we will know if it's missing fields in the substitution. In Jinja you can put a default value like

{{ event.body.tags.source | default("missing") }}
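The check suggested above amounts to asking, for each field the msg template references, whether that path actually exists in the event. A minimal pure-Python sketch of that check, not part of ansible-rulebook: lookup walks a path of keys and returns a default when a key is missing (loosely analogous to Jinja's default filter), and missing_fields lists the absent paths. Paths are tuples so that keys containing "/", such as 'af_data/af_name', work unchanged. Both function names are hypothetical.

```python
from typing import Any

_MISSING = object()  # sentinel distinguishing "absent" from a stored None


def lookup(event: dict, path: tuple[str, ...], default: Any = _MISSING) -> Any:
    """Follow `path` through nested dicts; return `default` if any key is absent."""
    node: Any = event
    for key in path:
        if isinstance(node, dict) and key in node:
            node = node[key]
        else:
            return default
    return node


def missing_fields(event: dict, paths: list[tuple[str, ...]]) -> list[str]:
    """Return the dotted form of every path that is not present in `event`."""
    return [".".join(path) for path in paths if lookup(event, path) is _MISSING]
```

For example, pasting the payload printed by the bare debug action into a dict and calling missing_fields(event, [("body", "tags", "source"), ("body", "fields", "reset_reason")]) lists whichever of those paths the payload lacks.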

@muhammad-rafi

Thanks for the response @mkanoor, I will try that out. But is this related to the issue I am having?

@mkanoor
Contributor

mkanoor commented Apr 25, 2024

@muhammad-rafi If you change it to this:

  rules:
    - name: Reason for BGP State Down
      condition: events.body.fields.open_check_error_code == "neighbor-down"
      action:
        debug:

we will at least know whether the issue is related to an attribute missing in the Jinja substitution. With no arguments, the debug action prints the entire payload.

@muhammad-rafi

muhammad-rafi commented Apr 25, 2024

    - name: Reason for BGP State Down
      condition: events.body.fields.open_check_error_code == "neighbor-down"
      action:
        debug:
          msg: |
            **Device: {{ event.body.tags.source | default("missing") }} 
            **BGP Neighbor: {{ event.body.tags.neighbor_address | default("missing") }}
            **Description: {{ event.body.fields.description | default("missing") }}
            **Remote ASN: {{ event.body.fields.remote_as_number | default("missing") }}
            **Address Family: {{ event.body.fields['af_data/af_name'] | default("missing") }}
            **Prefix Limit: {{ event.body.fields['af_data/max_prefix_limit'] | default("missing") }}
            **Prefix Limit Threshold: {{ event.body.fields['af_data/max_prefix_threshold_percent'] | default("missing") }}
            **Reason: {{ event.body.fields.reset_reason | default("missing") }}

@mkanoor I changed it as you suggested, but I am not getting any "missing" values. The issue is that it starts OK, but after a couple of minutes I start getting the following "Memory threshold reached" error along with the one I mentioned earlier.

2024-04-25 22:53:24 543 [Thread-0] WARN org.drools.ansible.rulebook.integration.api.rulesengine.AutomaticPseudoClock - Pseudo clock is diverged, the difference is 207 ms. Going to sync with the real clock.
2024-04-25 22:54:57,237 - ansible_rulebook.rule_set_runner - ERROR - org.drools.ansible.rulebook.integration.api.rulesengine.MemoryThresholdReachedException: Memory threshold reached: 93% > 90%
2024-04-25 22:54:57 386 [main] ERROR org.drools.ansible.rulebook.integration.api.rulesengine.MemoryMonitorUtil - Memory occupation is above the threshold: 93% > 90%. MaxMemory = 536870912, UsedMemory = 503722296
2024-04-25 22:54:57,387 - ansible_rulebook.rule_set_runner - ERROR - org.drools.ansible.rulebook.integration.api.rulesengine.MemoryThresholdReachedException: Memory threshold reached: 93% > 90%
2024-04-25 22:54:57 546 [main] ERROR org.drools.ansible.rulebook.integration.api.rulesengine.MemoryMonitorUtil - Memory occupation is above the threshold: 93% > 90%. MaxMemory = 536870912, UsedMemory = 503981880
2024-04-25 22:54:57,547 - ansible_rulebook.rule_set_runner - ERROR - org.drools.ansible.rulebook.integration.api.rulesengine.MemoryThresholdReachedException: Memory threshold reached: 93% > 90%
2024-04-25 22:54:57 719 [main] ERROR org.drools.ansible.rulebook.integration.api.rulesengine.MemoryMonitorUtil - Memory occupation is above the threshold: 93% > 90%. MaxMemory = 536870912, UsedMemory = 504231584
2024-04-25 22:54:57,719 - ansible_rulebook.rule_set_runner - ERROR - org.drools.ansible.rulebook.integration.api.rulesengine.MemoryThresholdReachedException: Memory threshold reached: 93% > 90%
2024-04-25 22:54:57 885 [main] ERROR org.drools.ansible.rulebook.integration.api.rulesengine.MemoryMonitorUtil - Memory occupation is above the threshold: 93% > 90%. MaxMemory = 536870912, UsedMemory = 504512136
2024-04-25 22:54:57,886 - ansible_rulebook.rule_set_runner - ERROR - org.drools.ansible.rulebook.integration.api.rulesengine.MemoryThresholdReachedException: Memory threshold reached: 93% > 90%
2024-04-25 22:54:58 049 [main] ERROR org.drools.ansible.rulebook.integration.api.rulesengine.MemoryMonitorUtil - Memory occupation is above the threshold: 93% > 90%. MaxMemory = 536870912, UsedMemory = 504788864
2024-04-25 22:54:58,050 - ansible_rulebook.rule_set_runner - ERROR - org.drools.ansible.rulebook.integration.api.rulesengine.MemoryThresholdReachedException: Memory threshold reached: 93% > 90%
2024-04-25 22:54:58 203 [main] ERROR org.drools.ansible.rulebook.integration.api.rulesengine.MemoryMonitorUtil - Memory occupation is above the threshold: 93% > 90%. MaxMemory = 536870912, UsedMemory = 505036304
2024-04-25 22:54:58,203 - ansible_rulebook.rule_set_runner - ERROR - org.drools.ansible.rulebook.integration.api.rulesengine.MemoryThresholdReachedException: Memory threshold reached: 93% > 90%
2024-04-25 22:54:58 367 [main] ERROR org.drools.ansible.rulebook.integration.api.rulesengine.MemoryMonitorUtil - Memory occupation is above the threshold: 94% > 90%. MaxMemory = 536870912, UsedMemory = 505286392
2024-04-25 22:54:58,367 - ansible_rulebook.rule_set_runner - ERROR - org.drools.ansible.rulebook.integration.api.rulesengine.MemoryThresholdReachedException: Memory threshold reached: 94% > 90%
2024-04-25 22:54:58 544 [main] ERROR org.drools.ansible.rulebook.integration.api.rulesengine.MemoryMonitorUtil - Memory occupation is above the threshold: 94% > 90%. MaxMemory = 536870912, UsedMemory = 505618384
2024-04-25 22:54:58,545 - ansible_rulebook.rule_set_runner - ERROR - org.drools.ansible.rulebook.integration.api.rulesengine.MemoryThresholdReachedException: Memory threshold reached: 94% > 90%
2024-04-25 22:54:58 706 [main] ERROR org.drools.ansible.rulebook.integration.api.rulesengine.MemoryMonitorUtil - Memory occupation is above the threshold: 94% > 90%. MaxMemory = 536870912, UsedMemory = 505848056
2024-04-25 22:54:58,706 - ansible_rulebook.rule_set_runner - ERROR - org.drools.ansible.rulebook.integration.api.rulesengine.MemoryThresholdReachedException: Memory threshold reached: 94% > 90%
2024-04-25 22:54:58 875 [main] ERROR org.drools.ansible.rulebook.integration.api.rulesengine.MemoryMonitorUtil - Memory occupation is above the threshold: 94% > 90%. MaxMemory = 536870912, UsedMemory = 506078912
2024-04-25 22:54:58,875 - ansible_rulebook.rule_set_runner - ERROR - org.drools.ansible.rulebook.integration.api.rulesengine.MemoryThresholdReachedException: Memory threshold reached: 94% > 90%
2024-04-25 22:54:59 056 [main] ERROR org.drools.ansible.rulebook.integration.api.rulesengine.MemoryMonitorUtil - Memory occupation is above the threshold: 94% > 90%. MaxMemory = 536870912, UsedMemory = 506334488
2024-04-25 22:54:59,057 - ansible_rulebook.rule_set_runner - ERROR - org.drools.ansible.rulebook.integration.api.rulesengine.MemoryThresholdReachedException: Memory threshold reached: 94% > 90%
^C2024-04-25 22:54:59 416 [main] ERROR org.drools.ansible.rulebook.integration.api.rulesengine.MemoryMonitorUtil - Memory occupation is above the threshold: 94% > 90%. MaxMemory = 536870912, UsedMemory = 506578240
2024-04-25 22:54:59,417 - ansible_rulebook.rule_set_runner - ERROR - org.drools.ansible.rulebook.integration.api.rulesengine.MemoryThresholdReachedException: Memory threshold reached: 94% > 90%
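A short worked check on the MemoryMonitorUtil lines above: MaxMemory = 536870912 bytes is exactly 512 MiB (2^29 bytes), so in this run the JVM hosting the Drools rules engine had a 512 MiB heap limit, and the 90% threshold is about 461 MiB. Over the roughly two seconds of lines shown, UsedMemory climbs from 503722296 to 506578240 bytes (about 2.7 MiB). The sketch below extracts those numbers from log text; the class and function names are hypothetical and only the "MaxMemory = ..., UsedMemory = ..." format is taken from the log.

```python
import re
from dataclasses import dataclass

# Matches the tail of the MemoryMonitorUtil lines, e.g.
# "... MaxMemory = 536870912, UsedMemory = 503722296"
_MEMORY_RE = re.compile(r"MaxMemory = (\d+), UsedMemory = (\d+)")


@dataclass
class HeapSample:
    max_bytes: int
    used_bytes: int

    @property
    def max_mib(self) -> float:
        return self.max_bytes / (1024 * 1024)

    @property
    def used_percent(self) -> float:
        return 100.0 * self.used_bytes / self.max_bytes


def parse_heap_samples(log_text: str) -> list[HeapSample]:
    """Return one HeapSample per MemoryMonitorUtil line found in `log_text`."""
    return [
        HeapSample(int(match.group(1)), int(match.group(2)))
        for match in _MEMORY_RE.finditer(log_text)
    ]
```

Feeding the whole log into parse_heap_samples and comparing used_bytes across samples shows whether heap use keeps rising between threshold errors.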

(some output omitted)


Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "Thread-0"
2024-04-25 23:15:29,696 - ansible_rulebook.rule_set_runner - ERROR - 
2024-04-25 23:15:30,424 - ansible_rulebook.rule_set_runner - ERROR - 
2024-04-25 23:15:31,198 - ansible_rulebook.rule_set_runner - ERROR - 
2024-04-25 23:15:31,910 - ansible_rulebook.rule_set_runner - ERROR - 
2024-04-25 23:15:32,649 - ansible_rulebook.rule_set_runner - ERROR - 
2024-04-25 23:15:33,339 - ansible_rulebook.rule_set_runner - ERROR - 
2024-04-25 23:15:34,034 - ansible_rulebook.rule_set_runner - ERROR - 
2024-04-25 23:15:34,716 - ansible_rulebook.rule_set_runner - ERROR - 
2024-04-25 23:15:35,408 - ansible_rulebook.rule_set_runner - ERROR - 
2024-04-25 23:15:36,071 - ansible_rulebook.rule_set_runner - ERROR - 
2024-04-25 23:15:36,767 - ansible_rulebook.rule_set_runner - ERROR - 
2024-04-25 23:15:37,510 - ansible_rulebook.rule_set_runner - ERROR - 
2024-04-25 23:15:38,174 - ansible_rulebook.rule_set_runner - ERROR - 
2024-04-25 23:15:38,849 - ansible_rulebook.rule_set_runner - ERROR - 
2024-04-25 23:15:39,529 - ansible_rulebook.rule_set_runner - ERROR - 

Besides this, I have another rule that runs a playbook, and I get the same issue if I enable that rule too.

    - name: Display Kafka Logs and Run Action Playbook
      # for ioxr BGP neighbor down due to prefix limit exceeded
      condition: events.body.fields.is_neighbor_max_prefix_shutdown == "true" and events.body.fields.reset_reason == "max-prefix-exceeded"
      action:
        run_playbook:
          name: playbooks/bgp-max-pfx-fix.yml
          extra_vars:
            target_host: "{{ event.body.tags.source }}"
            bgp_neighbor: "{{ event.body.tags.neighbor_address }}"
            bgp_remote_asn: "{{ event.body.fields.remote_as_number }}"
            bgp_local_asn: "{{ event.body.fields.local_as }}"
          verbosity: 1
          copy_files: True

Please advise.

@muhammad-rafi

muhammad-rafi commented Apr 25, 2024


@mkanoor I have tried this suggestion as well. It does print the entire payload, but the same issue happens again after a couple of minutes.

@mkanoor
Contributor

mkanoor commented Apr 27, 2024

@muhammad-rafi It seems like some sort of memory leak in aiokafka or the Kafka source plugin. How many events do you think are being sent? Are a lot of events coming in, and, since they don't get acknowledged, is the same event repeating? If you run ansible-rulebook with the -vv option, you will see the events coming in.
Also monitor the memory of the process: the JVM is dying because it is running out of memory, which means something is not releasing memory.
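One way to answer the "is the same event repeating?" question is to count identical events in a capture. A minimal sketch, assuming (hypothetically) that the events have been saved as one JSON object per line; how to capture them is left open, and this is not an ansible-rulebook feature. Keys are sorted before comparison so that two events differing only in key order count as the same.

```python
import json
from collections import Counter
from typing import Iterable


def count_repeated_events(lines: Iterable[str]) -> dict[str, int]:
    """Return each distinct event seen more than once, with its count.

    Assumes one JSON-encoded event per line (a hypothetical capture format).
    """
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Canonical form with sorted keys, so key order does not affect equality.
        counts[json.dumps(json.loads(line), sort_keys=True)] += 1
    return {event: n for event, n in counts.items() if n > 1}
```

An empty result over a long capture would argue against redelivery of unacknowledged events; large counts for a few events would support it.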

@muhammad-rafi
Copy link

thanks @mkanoor I have a same doubt, this may be happening, I may be dealing with 1000+ interesting events, I was looking for some work around to slow it down or control the memory option with ansible rulebook. Let me know please if you have any suggestions

@muhammad-rafi
Just to add: we don't see any issue on the Kafka side, so it must be at my end.

@muhammad-rafi
@mkanoor any other thoughts on this one, please? This is still an issue.

@mkanoor
Contributor

mkanoor commented May 16, 2024

@muhammad-rafi Do you see the memory increasing for the python process? How many events are being processed?
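A minimal Linux-only sketch for answering this: it reads the resident set size (VmRSS) of a process from /proc/<pid>/status at fixed intervals, so you can see whether the ansible-rulebook process keeps growing while events flow. The function names are hypothetical; the PID could be found with, for example, pgrep -f ansible-rulebook.

```python
import time


def rss_kib(pid: int) -> int:
    """Return the resident set size of `pid` in KiB, read from /proc (Linux only)."""
    with open(f"/proc/{pid}/status", encoding="utf-8", errors="replace") as status:
        for line in status:
            if line.startswith("VmRSS:"):
                # Line format: "VmRSS:     123456 kB"
                return int(line.split()[1])
    raise RuntimeError(f"VmRSS not found for pid {pid}")


def sample_rss(pid: int, samples: int, interval: float) -> list[int]:
    """Take `samples` RSS readings (KiB), `interval` seconds apart."""
    readings = []
    for i in range(samples):
        readings.append(rss_kib(pid))
        if i < samples - 1:
            time.sleep(interval)
    return readings
```

For example, sample_rss(pid, samples=30, interval=10.0) takes a reading every 10 seconds for about five minutes; a steadily rising series while the error appears would support the leak theory.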
