System Info
Python 3.10.11
torch==2.4.1
torchaudio==2.4.1
torchvision==0.19.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
2x NVIDIA RTX 6000 Ada Generation GPUs with 50 GB each (100 GB total)
Ubuntu 22.04
Information
The official example scripts
My own modified scripts
🐛 Describe the bug
Command I used: python inference.py --model_name /home/z004x2xz/meta-llama/Meta-Llama-3.1-8B-Instruct --prompt_file 'Hello' --use_auditnlg
Error logs
Here is the error log from my terminal:
DeprecationWarning: `torch.distributed._shard.checkpoint` will be deprecated, use `torch.distributed.checkpoint` instead
from torch.distributed._shard.checkpoint import (
use_fast_kernelsTrue
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:04<00:00, 1.07s/it]
Running on local URL: http://0.0.0.0:7860
2024/09/26 04:28:11 [W] [service.go:132] login to server failed: dial tcp 44.237.78.176:7000: i/o timeout
Could not create share link. Please check your internet connection or our status page: https://status.gradio.app.
User prompt deemed safe.
User prompt:
tell me something about AI
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
Traceback (most recent call last):
File "/home/z004x2xz/WorkAssignedByMatt/llama-recipes/venvLlamaDirectBuild/lib/python3.10/site-packages/gradio/queueing.py", line 536, in process_events
response = await route_utils.call_process_api(
File "/home/z004x2xz/WorkAssignedByMatt/llama-recipes/venvLlamaDirectBuild/lib/python3.10/site-packages/gradio/route_utils.py", line 322, in call_process_api
output = await app.get_blocks().process_api(
File "/home/z004x2xz/WorkAssignedByMatt/llama-recipes/venvLlamaDirectBuild/lib/python3.10/site-packages/gradio/blocks.py", line 1935, in process_api
result = await self.call_function(
File "/home/z004x2xz/WorkAssignedByMatt/llama-recipes/venvLlamaDirectBuild/lib/python3.10/site-packages/gradio/blocks.py", line 1520, in call_function
prediction = await anyio.to_thread.run_sync( # type: ignore
File "/home/z004x2xz/WorkAssignedByMatt/llama-recipes/venvLlamaDirectBuild/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
File "/home/z004x2xz/WorkAssignedByMatt/llama-recipes/venvLlamaDirectBuild/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2357, in run_sync_in_worker_thread
return await future
File "/home/z004x2xz/WorkAssignedByMatt/llama-recipes/venvLlamaDirectBuild/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 864, in run
result = context.run(func, *args)
File "/home/z004x2xz/WorkAssignedByMatt/llama-recipes/venvLlamaDirectBuild/lib/python3.10/site-packages/gradio/utils.py", line 826, in wrapper
response = f(*args, **kwargs)
File "/home/z004x2xz/WorkAssignedByMatt/llama-recipes/recipes/quickstart/inference/local_inference/inference.py", line 105, in inference
outputs = model.generate(
File "/home/z004x2xz/WorkAssignedByMatt/llama-recipes/venvLlamaDirectBuild/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/home/z004x2xz/WorkAssignedByMatt/llama-recipes/venvLlamaDirectBuild/lib/python3.10/site-packages/transformers/generation/utils.py", line 2024, in generate
result = self._sample(
File "/home/z004x2xz/WorkAssignedByMatt/llama-recipes/venvLlamaDirectBuild/lib/python3.10/site-packages/transformers/generation/utils.py", line 3020, in _sample
next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
Expected behavior
An answer should be generated during inference.
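
Not part of the original report, but for context: the RuntimeError is raised when torch.multinomial receives NaN/Inf probabilities, which usually means the model's logits are already NaN/Inf before sampling. Below is a minimal, self-contained sketch of one way to check this outside the llama-recipes script. The model path is taken from the command above; the bfloat16 dtype and the sampling parameters are assumptions for illustration, not the recipe's defaults.

```python
# Minimal diagnostic sketch (assumed setup, not the recipe script):
# load the same checkpoint, inspect the raw logits for NaN/Inf, then
# run the same sampling path that fails in the traceback.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "/home/z004x2xz/meta-llama/Meta-Llama-3.1-8B-Instruct"  # path from the report

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,   # assumption: bf16 instead of fp16/auto, a common fix for NaN logits
    device_map="auto",            # shards the model across both GPUs
)

inputs = tokenizer("tell me something about AI", return_tensors="pt").to(model.device)

# If the logits already contain NaN/Inf here, torch.multinomial in
# generate() will fail with the reported "probability tensor" error.
with torch.no_grad():
    logits = model(**inputs).logits
print("NaN in logits:", torch.isnan(logits).any().item())
print("Inf in logits:", torch.isinf(logits).any().item())

outputs = model.generate(
    **inputs, max_new_tokens=64, do_sample=True, top_p=0.9, temperature=0.6
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

If the NaN/Inf check prints True, the problem is in the forward pass (dtype or checkpoint) rather than in the sampling parameters.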