System Info
Python 3.10.11
torch==2.4.1
torchaudio==2.4.1
torchvision==0.19.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
2x NVIDIA RTX 6000 Ada Generation GPUs with 50 GB each (100 GB total)
Ubuntu 22.04
Information
The official example scripts
My own modified scripts
🐛 Describe the bug
Command I used: python inference.py --model_name /home/z004x2xz/meta-llama/Meta-Llama-3.1-8B-Instruct --prompt_file 'Hello' --use_auditnlg
Error logs
Here is the error log from my terminal:
DeprecationWarning: `torch.distributed._shard.checkpoint` will be deprecated, use `torch.distributed.checkpoint` instead
from torch.distributed._shard.checkpoint import (
use_fast_kernelsTrue
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:04<00:00, 1.07s/it]
Running on local URL: http://0.0.0.0:7860
2024/09/26 04:28:11 [W] [service.go:132] login to server failed: dial tcp 44.237.78.176:7000: i/o timeout
Could not create share link. Please check your internet connection or our status page: https://status.gradio.app.
User prompt deemed safe.
User prompt:
tell me something about AI
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
Traceback (most recent call last):
File "/home/z004x2xz/WorkAssignedByMatt/llama-recipes/venvLlamaDirectBuild/lib/python3.10/site-packages/gradio/queueing.py", line 536, in process_events
response = await route_utils.call_process_api(
File "/home/z004x2xz/WorkAssignedByMatt/llama-recipes/venvLlamaDirectBuild/lib/python3.10/site-packages/gradio/route_utils.py", line 322, in call_process_api
output = await app.get_blocks().process_api(
File "/home/z004x2xz/WorkAssignedByMatt/llama-recipes/venvLlamaDirectBuild/lib/python3.10/site-packages/gradio/blocks.py", line 1935, in process_api
result = await self.call_function(
File "/home/z004x2xz/WorkAssignedByMatt/llama-recipes/venvLlamaDirectBuild/lib/python3.10/site-packages/gradio/blocks.py", line 1520, in call_function
prediction = await anyio.to_thread.run_sync( # type: ignore
File "/home/z004x2xz/WorkAssignedByMatt/llama-recipes/venvLlamaDirectBuild/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
File "/home/z004x2xz/WorkAssignedByMatt/llama-recipes/venvLlamaDirectBuild/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2357, in run_sync_in_worker_thread
return await future
File "/home/z004x2xz/WorkAssignedByMatt/llama-recipes/venvLlamaDirectBuild/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 864, in run
result = context.run(func, *args)
File "/home/z004x2xz/WorkAssignedByMatt/llama-recipes/venvLlamaDirectBuild/lib/python3.10/site-packages/gradio/utils.py", line 826, in wrapper
response = f(*args, **kwargs)
File "/home/z004x2xz/WorkAssignedByMatt/llama-recipes/recipes/quickstart/inference/local_inference/inference.py", line 105, in inference
outputs = model.generate(
File "/home/z004x2xz/WorkAssignedByMatt/llama-recipes/venvLlamaDirectBuild/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/home/z004x2xz/WorkAssignedByMatt/llama-recipes/venvLlamaDirectBuild/lib/python3.10/site-packages/transformers/generation/utils.py", line 2024, in generate
result = self._sample(
File "/home/z004x2xz/WorkAssignedByMatt/llama-recipes/venvLlamaDirectBuild/lib/python3.10/site-packages/transformers/generation/utils.py", line 3020, in _sample
next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
Expected behavior
An answer should be generated during inference.
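
Not part of the original report, but for context: the RuntimeError is raised when torch.multinomial receives NaN/Inf probabilities, which usually means the model's logits are already NaN/Inf before sampling. Below is a minimal, self-contained sketch of one way to check this outside the llama-recipes script. The model path is taken from the command above; the bfloat16 dtype and the sampling parameters are assumptions for illustration, not the recipe's defaults.

```python
# Minimal diagnostic sketch (assumed setup, not the recipe script):
# load the same checkpoint, inspect the raw logits for NaN/Inf, then
# run the same sampling path that fails in the traceback.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "/home/z004x2xz/meta-llama/Meta-Llama-3.1-8B-Instruct"  # path from the report

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,   # assumption: bf16 instead of fp16/auto, a common fix for NaN logits
    device_map="auto",            # shards the model across both GPUs
)

inputs = tokenizer("tell me something about AI", return_tensors="pt").to(model.device)

# If the logits already contain NaN/Inf here, torch.multinomial in
# generate() will fail with the reported "probability tensor" error.
with torch.no_grad():
    logits = model(**inputs).logits
print("NaN in logits:", torch.isnan(logits).any().item())
print("Inf in logits:", torch.isinf(logits).any().item())

outputs = model.generate(
    **inputs, max_new_tokens=64, do_sample=True, top_p=0.9, temperature=0.6
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

If the NaN/Inf check prints True, the problem is in the forward pass (dtype or checkpoint) rather than in the sampling parameters.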