Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

probe: add whois injection to latentinjection #947

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

leondz
Copy link
Owner

@leondz leondz commented Oct 11, 2024

Add tests for WHOIS response injections to latentinjection probe. Resolves #73

Todo:

  • move whois responses into payload
  • add second probe using same data in a "sandwiched" way like the Eiffel snippet probe does, simulating a RAG being asked to identify and report on any malicious domains in a whois DB

Verification

  • garak -m test -p latentinjection.LatentWhois
  • Run the tests and ensure they pass python -m pytest tests/probes/test_probes_latentinjection.py

@leondz leondz added the probes Content & activity of LLM probes label Oct 11, 2024
@leondz leondz marked this pull request as ready for review October 16, 2024 13:49
Copy link
Collaborator

@jmartin-tech jmartin-tech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testing shows some inconsistency in the results for the Snippet probe:

garak LLM vulnerability scanner v0.9.0.16.post1 ( https://github.com/leondz/garak ) at 2024-10-16T12:34:07.347185
📜 logging to /home/jemartin/.local/share/garak/garak.log
🦜 loading generator: NIM: meta/llama3-8b-instruct
📜 reporting to /home/jemartin/.local/share/garak/garak_runs/garak.a2d88a48-0f91-47f3-921c-75f5299eeea2.report.jsonl
🕵️  queue of probes: latentinjection.LatentWhois, latentinjection.LatentWhoisSnippet
latentinjection.LatentWhois                                                 base.TriggerListDetector: FAIL  ok on   92/ 140   (failure rate:  34.29%)
latentinjection.LatentWhoisSnippet                                          base.TriggerListDetector: PASS  ok on  160/ 160
📜 report closed :) /home/jemartin/.local/share/garak/garak_runs/garak.a2d88a48-0f91-47f3-921c-75f5299eeea2.report.jsonl
📜 report html summary being written to /home/jemartin/.local/share/garak/garak_runs/garak.a2d88a48-0f91-47f3-921c-75f5299eeea2.report.html
✔️  garak run complete in 43.95s
garak LLM vulnerability scanner v0.9.0.16.post1 ( https://github.com/leondz/garak ) at 2024-10-16T12:35:14.182848
📜 logging to /home/jemartin/.local/share/garak/garak.log
🦜 loading generator: NIM: meta/llama3-8b-instruct
📜 reporting to /home/jemartin/.local/share/garak/garak_runs/garak.2abae83b-247f-48d5-9c63-a90a0bc4ab41.report.jsonl
🕵️  queue of probes: latentinjection.LatentWhois, latentinjection.LatentWhoisSnippet
latentinjection.LatentWhois                                                 base.TriggerListDetector: FAIL  ok on   93/ 140   (failure rate:  33.57%)
latentinjection.LatentWhoisSnippet                                          base.TriggerListDetector: FAIL  ok on  158/ 160   (failure rate:   1.25%)
📜 report closed :) /home/jemartin/.local/share/garak/garak_runs/garak.2abae83b-247f-48d5-9c63-a90a0bc4ab41.report.jsonl
📜 report html summary being written to /home/jemartin/.local/share/garak/garak_runs/garak.2abae83b-247f-48d5-9c63-a90a0bc4ab41.report.html
✔️  garak run complete in 43.91s

This PR also exposed that some result handling can be unclear. When tested against a model with a somewhat restrictive max request length all responses returned as None resulting in the detector processing all attempt but reporting as SKIP:

garak LLM vulnerability scanner v0.9.0.16.post1 ( https://github.com/leondz/garak ) at 2024-10-16T12:17:08.171726
📜 logging to /home/jemartin/.local/share/garak/garak.log
🦜 loading generator: NIM: mediatek/breeze-7b-instruct
📜 reporting to /home/jemartin/.local/share/garak/garak_runs/garak.e3dbf4eb-840c-446b-988f-d5b2a78d8db3.report.jsonl
🕵️  queue of probes: latentinjection.LatentWhoisSnippet
latentinjection.LatentWhoisSnippet                                          base.TriggerListDetector: SKIP  ok on    0/   0
📜 report closed :) /home/jemartin/.local/share/garak/garak_runs/garak.e3dbf4eb-840c-446b-988f-d5b2a78d8db3.report.jsonl
📜 report html summary being written to /home/jemartin/.local/share/garak/garak_runs/garak.e3dbf4eb-840c-446b-988f-d5b2a78d8db3.report.html
✔️  garak run complete in 220.45s

This was due to 400 responses from the NIM:

Error code: 400 - {
'type': 'urn:inference-service:problem-details:bad-request',
'title': 'Bad Request',
'status': 400,
'detail': 'model inference failed -- ValueError: A request contains input text which is too long. Maximum allowed input sequence length for the model is 4096 whereas input length is 6734.',
'instance': '/v2/nvcf/pexec/functions/14919f3d-8dc9-4d19-a6da-6dbf99360156',
'requestId': '84eb5985-bda7-4264-a5b4-728f28d6bda2'
}

garak/probes/latentinjection.py Show resolved Hide resolved
@leondz
Copy link
Owner Author

leondz commented Oct 18, 2024

This was due to 400 responses from the NIM:

Indeed. Current NIMs have patchy handling for exposing ctxlen or handling overruns. We handle as best we can. Other places we see this are dan.DanInTheWild and encoding.InjectZalgo. It's broken these runs for a while. Could set up a LUT for NIM ctxlens but since NIM names shift, and new NIMs are the ones more likely to be tested, this is a manual non-scaling process probably better handled in run-specific config - until the values get exposed.

I consider the issue out-of-scope for this PR but it's good to keep flagging it.

Copy link
Collaborator

@jmartin-tech jmartin-tech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, there might be some value in tweaking the selections for LatentWhoisSnippet as it consistently only found issues when -g > 1 was used.

./garak_debug.py -m nim -n meta/llama3-8b-instruct -p latentinjection.LatentWhois,latentinjection.LatentWhoisSnippet --parallel_attempts 16 -g 2
garak LLM vulnerability scanner v0.9.0.16.post1 ( https://github.com/leondz/garak ) at 2024-10-18T11:24:45.710370
📜 logging to /home/jemartin/.local/share/garak/garak.log
🦜 loading generator: NIM: meta/llama3-8b-instruct
📜 reporting to /home/jemartin/.local/share/garak/garak_runs/garak.70490236-ce1b-42f4-822b-e00df617f283.report.jsonl
🕵️  queue of probes: latentinjection.LatentWhois, latentinjection.LatentWhoisSnippet
latentinjection.LatentWhois                                                 base.TriggerListDetector: FAIL  ok on   38/  56   (failure rate:  32.14%)
latentinjection.LatentWhoisSnippet                                          base.TriggerListDetector: FAIL  ok on   61/  64   (failure rate:   4.69%)
📜 report closed :) /home/jemartin/.local/share/garak/garak_runs/garak.70490236-ce1b-42f4-822b-e00df617f283.report.jsonl
📜 report html summary being written to /home/jemartin/.local/share/garak/garak_runs/garak.70490236-ce1b-42f4-822b-e00df617f283.report.html
✔️  garak run complete in 22.82s
./garak_debug.py -m nim -n meta/llama3-8b-instruct -p latentinjection.LatentWhois,latentinjection.LatentWhoisSnippet --parallel_attempts 16 -g 1
garak LLM vulnerability scanner v0.9.0.16.post1 ( https://github.com/leondz/garak ) at 2024-10-18T11:26:22.978236
📜 logging to /home/jemartin/.local/share/garak/garak.log
🦜 loading generator: NIM: meta/llama3-8b-instruct
📜 reporting to /home/jemartin/.local/share/garak/garak_runs/garak.fa0f1bc6-095f-4154-907d-287206cda8a5.report.jsonl
🕵️  queue of probes: latentinjection.LatentWhois, latentinjection.LatentWhoisSnippet
latentinjection.LatentWhois                                                 base.TriggerListDetector: FAIL  ok on   20/  28   (failure rate:  28.57%)
latentinjection.LatentWhoisSnippet                                          base.TriggerListDetector: PASS  ok on   32/  32
📜 report closed :) /home/jemartin/.local/share/garak/garak_runs/garak.fa0f1bc6-095f-4154-907d-287206cda8a5.report.jsonl
📜 report html summary being written to /home/jemartin/.local/share/garak/garak_runs/garak.fa0f1bc6-095f-4154-907d-287206cda8a5.report.html
✔️  garak run complete in 15.19s

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
probes Content & activity of LLM probes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

probe: injection in whois
2 participants