
RuntimeError: probability tensor contains either inf, nan or element < 0 #683

Open
himanshushukla12 opened this issue Sep 26, 2024 · 0 comments
System Info

Versions I'm using:

Python 3.10.11
torch==2.4.1
torchaudio==2.4.1
torchvision==0.19.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
2× NVIDIA RTX 6000 Ada Generation GPUs, 50 GB each (100 GB total)
Ubuntu 22.04

Information

  • The official example scripts
  • My own modified scripts

🐛 Describe the bug

Command I used:
python inference.py --model_name /home/z004x2xz/meta-llama/Meta-Llama-3.1-8B-Instruct --prompt_file 'Hello' --use_auditnlg

Error logs

Here is the error log from my terminal:

 DeprecationWarning: `torch.distributed._shard.checkpoint` will be deprecated, use `torch.distributed.checkpoint` instead
  from torch.distributed._shard.checkpoint import (
use_fast_kernelsTrue
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:04<00:00,  1.07s/it]
Running on local URL:  http://0.0.0.0:7860
2024/09/26 04:28:11 [W] [service.go:132] login to server failed: dial tcp 44.237.78.176:7000: i/o timeout

Could not create share link. Please check your internet connection or our status page: https://status.gradio.app.
User prompt deemed safe.
User prompt:
tell me something about AI
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
Traceback (most recent call last):
  File "/home/z004x2xz/WorkAssignedByMatt/llama-recipes/venvLlamaDirectBuild/lib/python3.10/site-packages/gradio/queueing.py", line 536, in process_events
    response = await route_utils.call_process_api(
  File "/home/z004x2xz/WorkAssignedByMatt/llama-recipes/venvLlamaDirectBuild/lib/python3.10/site-packages/gradio/route_utils.py", line 322, in call_process_api
    output = await app.get_blocks().process_api(
  File "/home/z004x2xz/WorkAssignedByMatt/llama-recipes/venvLlamaDirectBuild/lib/python3.10/site-packages/gradio/blocks.py", line 1935, in process_api
    result = await self.call_function(
  File "/home/z004x2xz/WorkAssignedByMatt/llama-recipes/venvLlamaDirectBuild/lib/python3.10/site-packages/gradio/blocks.py", line 1520, in call_function
    prediction = await anyio.to_thread.run_sync(  # type: ignore
  File "/home/z004x2xz/WorkAssignedByMatt/llama-recipes/venvLlamaDirectBuild/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/home/z004x2xz/WorkAssignedByMatt/llama-recipes/venvLlamaDirectBuild/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2357, in run_sync_in_worker_thread
    return await future
  File "/home/z004x2xz/WorkAssignedByMatt/llama-recipes/venvLlamaDirectBuild/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 864, in run
    result = context.run(func, *args)
  File "/home/z004x2xz/WorkAssignedByMatt/llama-recipes/venvLlamaDirectBuild/lib/python3.10/site-packages/gradio/utils.py", line 826, in wrapper
    response = f(*args, **kwargs)
  File "/home/z004x2xz/WorkAssignedByMatt/llama-recipes/recipes/quickstart/inference/local_inference/inference.py", line 105, in inference
    outputs = model.generate(
  File "/home/z004x2xz/WorkAssignedByMatt/llama-recipes/venvLlamaDirectBuild/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/z004x2xz/WorkAssignedByMatt/llama-recipes/venvLlamaDirectBuild/lib/python3.10/site-packages/transformers/generation/utils.py", line 2024, in generate
    result = self._sample(
  File "/home/z004x2xz/WorkAssignedByMatt/llama-recipes/venvLlamaDirectBuild/lib/python3.10/site-packages/transformers/generation/utils.py", line 3020, in _sample
    next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
(Submitting the same prompt a second time produced an identical traceback.)
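For reference, this error comes straight from `torch.multinomial`: it is raised whenever the sampling distribution passed to it contains `NaN`, `Inf`, or a negative value, which suggests the logits produced by the model are already invalid before sampling. A minimal sketch that reproduces the same message (the tensor values here are made up purely for illustration):

```python
import torch

# A probability row containing NaN triggers the same RuntimeError as in the traceback above.
probs = torch.tensor([[0.5, float("nan"), 0.5]])
torch.multinomial(probs, num_samples=1)
# RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
```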

Expected behavior

An answer should be generated during inference.
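In case it helps triage, here is a minimal diagnostic sketch (not the recipe's own code) for checking whether the model's logits already contain `NaN`/`Inf` before sampling. The model path and prompt are taken from the command above; loading in bfloat16 and using greedy decoding are assumptions to test, not a confirmed fix.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "/home/z004x2xz/meta-llama/Meta-Llama-3.1-8B-Instruct"  # path from the command above

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: try bfloat16 in case fp16 overflow is producing NaN/Inf logits
    device_map="auto",
)

inputs = tokenizer("tell me something about AI", return_tensors="pt").to(model.device)

# Check the raw logits for invalid values before any sampling happens.
with torch.no_grad():
    logits = model(**inputs).logits
print("NaN in logits:", torch.isnan(logits).any().item())
print("Inf in logits:", torch.isinf(logits).any().item())

# If the logits are clean, greedy decoding avoids torch.multinomial entirely.
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

If the NaN/Inf checks print `True`, the problem is upstream of generation (model weights, dtype, or device placement) rather than in the sampling call itself.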
