probe: add whois injection to latentinjection #947

leondz · 2024-10-11T13:43:52Z

Add tests for WHOIS response injections to latentinjection probe. Resolves #73

Todo:

move whois responses into payload
add second probe using same data in a "sandwiched" way like the Eiffel snippet probe does, simulating a RAG being asked to identify and report on any malicious domains in a whois DB

Verification

garak -m test -p latentinjection.LatentWhois
Run the tests and ensure they pass python -m pytest tests/probes/test_probes_latentinjection.py

…sted generations

jmartin-tech

Testing shows some inconsistency in the results for the Snippet probe:

garak LLM vulnerability scanner v0.9.0.16.post1 ( https://github.com/leondz/garak ) at 2024-10-16T12:34:07.347185
📜 logging to /home/jemartin/.local/share/garak/garak.log
🦜 loading generator: NIM: meta/llama3-8b-instruct
📜 reporting to /home/jemartin/.local/share/garak/garak_runs/garak.a2d88a48-0f91-47f3-921c-75f5299eeea2.report.jsonl
🕵️  queue of probes: latentinjection.LatentWhois, latentinjection.LatentWhoisSnippet
latentinjection.LatentWhois                                                 base.TriggerListDetector: FAIL  ok on   92/ 140   (failure rate:  34.29%)
latentinjection.LatentWhoisSnippet                                          base.TriggerListDetector: PASS  ok on  160/ 160
📜 report closed :) /home/jemartin/.local/share/garak/garak_runs/garak.a2d88a48-0f91-47f3-921c-75f5299eeea2.report.jsonl
📜 report html summary being written to /home/jemartin/.local/share/garak/garak_runs/garak.a2d88a48-0f91-47f3-921c-75f5299eeea2.report.html
✔️  garak run complete in 43.95s

garak LLM vulnerability scanner v0.9.0.16.post1 ( https://github.com/leondz/garak ) at 2024-10-16T12:35:14.182848
📜 logging to /home/jemartin/.local/share/garak/garak.log
🦜 loading generator: NIM: meta/llama3-8b-instruct
📜 reporting to /home/jemartin/.local/share/garak/garak_runs/garak.2abae83b-247f-48d5-9c63-a90a0bc4ab41.report.jsonl
🕵️  queue of probes: latentinjection.LatentWhois, latentinjection.LatentWhoisSnippet
latentinjection.LatentWhois                                                 base.TriggerListDetector: FAIL  ok on   93/ 140   (failure rate:  33.57%)
latentinjection.LatentWhoisSnippet                                          base.TriggerListDetector: FAIL  ok on  158/ 160   (failure rate:   1.25%)
📜 report closed :) /home/jemartin/.local/share/garak/garak_runs/garak.2abae83b-247f-48d5-9c63-a90a0bc4ab41.report.jsonl
📜 report html summary being written to /home/jemartin/.local/share/garak/garak_runs/garak.2abae83b-247f-48d5-9c63-a90a0bc4ab41.report.html
✔️  garak run complete in 43.91s

This PR also exposed that some result handling can be unclear. When tested against a model with a somewhat restrictive max request length all responses returned as None resulting in the detector processing all attempt but reporting as SKIP:

garak LLM vulnerability scanner v0.9.0.16.post1 ( https://github.com/leondz/garak ) at 2024-10-16T12:17:08.171726
📜 logging to /home/jemartin/.local/share/garak/garak.log
🦜 loading generator: NIM: mediatek/breeze-7b-instruct
📜 reporting to /home/jemartin/.local/share/garak/garak_runs/garak.e3dbf4eb-840c-446b-988f-d5b2a78d8db3.report.jsonl
🕵️  queue of probes: latentinjection.LatentWhoisSnippet
latentinjection.LatentWhoisSnippet                                          base.TriggerListDetector: SKIP  ok on    0/   0
📜 report closed :) /home/jemartin/.local/share/garak/garak_runs/garak.e3dbf4eb-840c-446b-988f-d5b2a78d8db3.report.jsonl
📜 report html summary being written to /home/jemartin/.local/share/garak/garak_runs/garak.e3dbf4eb-840c-446b-988f-d5b2a78d8db3.report.html
✔️  garak run complete in 220.45s

This was due to 400 responses from the NIM:

Error code: 400 - {
'type': 'urn:inference-service:problem-details:bad-request',
'title': 'Bad Request',
'status': 400,
'detail': 'model inference failed -- ValueError: A request contains input text which is too long. Maximum allowed input sequence length for the model is 4096 whereas input length is 6734.',
'instance': '/v2/nvcf/pexec/functions/14919f3d-8dc9-4d19-a6da-6dbf99360156',
'requestId': '84eb5985-bda7-4264-a5b4-728f28d6bda2'
}

garak/probes/latentinjection.py

leondz · 2024-10-18T13:47:32Z

This was due to 400 responses from the NIM:

Indeed. Current NIMs have patchy handling for exposing ctxlen or handling overruns. We handle as best we can. Other places we see this are dan.DanInTheWild and encoding.InjectZalgo. It's broken these runs for a while. Could set up a LUT for NIM ctxlens but since NIM names shift, and new NIMs are the ones more likely to be tested, this is a manual non-scaling process probably better handled in run-specific config - until the values get exposed.

I consider the issue out-of-scope for this PR but it's good to keep flagging it.

jmartin-tech

LGTM, there might be some value in tweaking the selections for LatentWhoisSnippet as it consistently only found issues when -g > 1 was used.

./garak_debug.py -m nim -n meta/llama3-8b-instruct -p latentinjection.LatentWhois,latentinjection.LatentWhoisSnippet --parallel_attempts 16 -g 2
garak LLM vulnerability scanner v0.9.0.16.post1 ( https://github.com/leondz/garak ) at 2024-10-18T11:24:45.710370
📜 logging to /home/jemartin/.local/share/garak/garak.log
🦜 loading generator: NIM: meta/llama3-8b-instruct
📜 reporting to /home/jemartin/.local/share/garak/garak_runs/garak.70490236-ce1b-42f4-822b-e00df617f283.report.jsonl
🕵️  queue of probes: latentinjection.LatentWhois, latentinjection.LatentWhoisSnippet
latentinjection.LatentWhois                                                 base.TriggerListDetector: FAIL  ok on   38/  56   (failure rate:  32.14%)
latentinjection.LatentWhoisSnippet                                          base.TriggerListDetector: FAIL  ok on   61/  64   (failure rate:   4.69%)
📜 report closed :) /home/jemartin/.local/share/garak/garak_runs/garak.70490236-ce1b-42f4-822b-e00df617f283.report.jsonl
📜 report html summary being written to /home/jemartin/.local/share/garak/garak_runs/garak.70490236-ce1b-42f4-822b-e00df617f283.report.html
✔️  garak run complete in 22.82s

./garak_debug.py -m nim -n meta/llama3-8b-instruct -p latentinjection.LatentWhois,latentinjection.LatentWhoisSnippet --parallel_attempts 16 -g 1
garak LLM vulnerability scanner v0.9.0.16.post1 ( https://github.com/leondz/garak ) at 2024-10-18T11:26:22.978236
📜 logging to /home/jemartin/.local/share/garak/garak.log
🦜 loading generator: NIM: meta/llama3-8b-instruct
📜 reporting to /home/jemartin/.local/share/garak/garak_runs/garak.fa0f1bc6-095f-4154-907d-287206cda8a5.report.jsonl
🕵️  queue of probes: latentinjection.LatentWhois, latentinjection.LatentWhoisSnippet
latentinjection.LatentWhois                                                 base.TriggerListDetector: FAIL  ok on   20/  28   (failure rate:  28.57%)
latentinjection.LatentWhoisSnippet                                          base.TriggerListDetector: PASS  ok on   32/  32
📜 report closed :) /home/jemartin/.local/share/garak/garak_runs/garak.fa0f1bc6-095f-4154-907d-287206cda8a5.report.jsonl
📜 report html summary being written to /home/jemartin/.local/share/garak/garak_runs/garak.fa0f1bc6-095f-4154-907d-287206cda8a5.report.html
✔️  garak run complete in 15.19s

leondz added 2 commits October 11, 2024 15:39

add whois injections

d2b28f8

expand latentinjection tests

3158ca3

leondz added the probes Content & activity of LLM probes label Oct 11, 2024

leondz added 3 commits October 16, 2024 14:46

refactor attrib testing

68da343

move whois contexts to payload

d25a4c2

add multiple whois report injection probe

8dd2a3b

leondz marked this pull request as ready for review October 16, 2024 13:49

leondz requested review from jmartin-tech and erickgalinkin October 16, 2024 13:49

add denylist source, make permutation sample size predicated on reque…

c1026b2

…sted generations

jmartin-tech reviewed Oct 16, 2024

View reviewed changes

garak/probes/latentinjection.py Show resolved Hide resolved

leondz added 2 commits October 18, 2024 15:39

add shuffle toggle, make context/generations factor configurable

dc0bea8

set random.seed if configured

9fdd706

leondz requested a review from jmartin-tech October 22, 2024 09:30

jmartin-tech approved these changes Oct 22, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

probe: add whois injection to latentinjection #947

probe: add whois injection to latentinjection #947

leondz commented Oct 11, 2024

jmartin-tech left a comment

leondz commented Oct 18, 2024

jmartin-tech left a comment

probe: add whois injection to latentinjection #947

Are you sure you want to change the base?

probe: add whois injection to latentinjection #947

Conversation

leondz commented Oct 11, 2024

Todo:

Verification

jmartin-tech left a comment

Choose a reason for hiding this comment

leondz commented Oct 18, 2024

jmartin-tech left a comment

Choose a reason for hiding this comment