RuntimeError: Wait timeout: 10000 ms (local run) #295

Open

FFAMax opened this issue Oct 6, 2024 · 2 comments
FFAMax commented Oct 6, 2024

Traceback (most recent call last):
  File "/home/ffamax/exo/exo/api/chatgpt_api.py", line 273, in handle_post_chat_completions
    await asyncio.wait_for(self.node.process_prompt(shard, prompt, image_str, request_id=request_id), timeout=self.response_timeout)
  File "/home/ffamax/miniconda3/envs/.venv.py3.12/lib/python3.12/asyncio/tasks.py", line 520, in wait_for
    return await fut
           ^^^^^^^^^
  File "/home/ffamax/exo/exo/orchestration/standard_node.py", line 98, in process_prompt
    resp = await self._process_prompt(base_shard, prompt, image_str, request_id, inference_state)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ffamax/exo/exo/orchestration/standard_node.py", line 134, in _process_prompt
    result, inference_state, is_finished = await self.inference_engine.infer_prompt(request_id, shard, prompt, image_str, inference_state=inference_state)
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ffamax/exo/exo/inference/tinygrad/inference.py", line 59, in infer_prompt
    await self.ensure_shard(shard)
  File "/home/ffamax/exo/exo/inference/tinygrad/inference.py", line 97, in ensure_shard
    self.model = await asyncio.get_event_loop().run_in_executor(self.executor, build_transformer, model_path, shard, "8B" if "8b" in shard.model_id.lower() else "70B")
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ffamax/miniconda3/envs/.venv.py3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ffamax/exo/exo/inference/tinygrad/inference.py", line 48, in build_transformer
    load_state_dict(model, weights, strict=False, consume=False)  # consume=True
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ffamax/miniconda3/envs/.venv.py3.12/lib/python3.12/site-packages/tinygrad/nn/state.py", line 129, in load_state_dict
    else: v.replace(state_dict[k].to(v.device)).realize()
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ffamax/miniconda3/envs/.venv.py3.12/lib/python3.12/site-packages/tinygrad/tensor.py", line 3500, in _wrapper
    ret = fn(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^
  File "/home/ffamax/miniconda3/envs/.venv.py3.12/lib/python3.12/site-packages/tinygrad/tensor.py", line 213, in realize
    run_schedule(*self.schedule_with_vars(*lst), do_update_stats=do_update_stats)
  File "/home/ffamax/miniconda3/envs/.venv.py3.12/lib/python3.12/site-packages/tinygrad/engine/realize.py", line 224, in run_schedule
    ei.run(var_vals, do_update_stats=do_update_stats)
  File "/home/ffamax/miniconda3/envs/.venv.py3.12/lib/python3.12/site-packages/tinygrad/engine/realize.py", line 174, in run
    et = self.prg(bufs, var_vals if var_vals is not None else {}, wait=wait or DEBUG >= 2)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ffamax/miniconda3/envs/.venv.py3.12/lib/python3.12/site-packages/tinygrad/engine/realize.py", line 140, in __call__
    self.copy(dest, src)
  File "/home/ffamax/miniconda3/envs/.venv.py3.12/lib/python3.12/site-packages/tinygrad/engine/realize.py", line 135, in copy
    dest.copyin(src.as_buffer(allow_zero_copy=True))  # may allocate a CPU buffer depending on allow_zero_copy
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ffamax/miniconda3/envs/.venv.py3.12/lib/python3.12/site-packages/tinygrad/device.py", line 114, in as_buffer
    return self.copyout(memoryview(bytearray(self.nbytes)))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ffamax/miniconda3/envs/.venv.py3.12/lib/python3.12/site-packages/tinygrad/device.py", line 125, in copyout
    self.allocator.copyout(mv, self._buf)
  File "/home/ffamax/miniconda3/envs/.venv.py3.12/lib/python3.12/site-packages/tinygrad/device.py", line 657, in copyout
    self.device.synchronize()
  File "/home/ffamax/miniconda3/envs/.venv.py3.12/lib/python3.12/site-packages/tinygrad/device.py", line 519, in synchronize
    self.timeline_signal.wait(self.timeline_value - 1)
  File "/home/ffamax/miniconda3/envs/.venv.py3.12/lib/python3.12/site-packages/tinygrad/device.py", line 424, in wait
    raise RuntimeError(f"Wait timeout: {timeout} ms! (the signal is not set to {value}, but {self.value})")
RuntimeError: Wait timeout: 10000 ms! (the signal is not set to 19, but 0)
Deregister callback_id='chatgpt-api-wait-response-b71dd1bf-c1f7-4ea5-a626-1ddd6febcaf1' deregistered_callback=None
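
For context: the timeout is raised in tinygrad's device.py while synchronize() waits on a timeline signal that never advances (per the message it should reach 19 but stays at 0), so a GPU copy scheduled during load_state_dict apparently never completes. Below is a minimal sketch of mine (not from exo; sizes and prints are arbitrary) to test whether the NV backend can finish a kernel and a readback at all, outside of exo:

# Standalone check, independent of exo: run one kernel on the default
# device, then read the result back. t.numpy() goes through the same
# copyout -> synchronize path that timed out in the traceback above.
from tinygrad import Tensor, Device

print("default device:", Device.DEFAULT)     # 'NV' on this machine
t = (Tensor.ones(1024, 1024) * 2).realize()  # schedule and run a kernel
out = t.numpy()                              # copyout + device synchronize
print("copy ok, sum =", float(out.sum()))

If this also stalls for ~10 s and raises the same "Wait timeout", the problem sits in tinygrad's NV runtime or the driver rather than in exo's model loading.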

FFAMax commented Oct 6, 2024

Prompting from the CLI gives the same error:

(.venv.py3.12) ffamax@srv4090:~/exo$ DEBUG=2 SUPPORT_BF16=0 exo run llama-3.1-8b --prompt "hi"
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.

  _____  _____  
 / _ \ \/ / _ \ 
|  __/>  < (_) |
 \___/_/\_\___/ 
    
Detected system: Linux
Using inference engine: TinygradDynamicShardInferenceEngine with shard downloader: HFShardDownloader
Trying to find available port port=55797
[63156, 51540, 55328, 63288, 52742, 58298, 60339, 50307]
Using available port: 55797
Retrieved existing node ID: 434ccf86-78b3-40e9-9b07-29954834f823
Chat interface started:
 - http://10.1.3.172:8000
 - http://127.0.0.1:8000
ChatGPT API endpoint served at:
 - http://10.1.3.172:8000/v1/chat/completions
 - http://127.0.0.1:8000/v1/chat/completions
tinygrad Device.DEFAULT='NV'
NVIDIA device gpu_name='NVIDIA GEFORCE RTX 3060' gpu_memory_info=<pynvml.c_nvmlMemory_t object at 0x7f749a010650>
Server started, listening on 0.0.0.0:55797
tinygrad Device.DEFAULT='NV'
NVIDIA device gpu_name='NVIDIA GEFORCE RTX 3060' gpu_memory_info=<pynvml.c_nvmlMemory_t object at 0x7f7499eb02d0>
update_peers: added=[] removed=[] updated=[] unchanged=[] to_disconnect=[] to_connect=[]
Collecting topology max_depth=4 visited=set()
Collected topology: Topology(Nodes: {434ccf86-78b3-40e9-9b07-29954834f823: Model: Linux Box (NVIDIA GEFORCE RTX 3060). Chip: NVIDIA GEFORCE RTX 3060. Memory: 12288MB. Flops: fp32: 13.00 TFLOPS, fp16: 26.00 TFLOPS, int8: 52.00 TFLOPS}, Edges: {})
Checking if local path exists to load tokenizer from local local_path=PosixPath('/home/ffamax/.cache/huggingface/hub/models--mlabonne--Meta-Llama-3.1-8B-Instruct-abliterated/snapshots/368c8ed94ce4c986e7b9ca5c159651ef753908ce')
Resolving tokenizer for model_id='mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated' from local_path=PosixPath('/home/ffamax/.cache/huggingface/hub/models--mlabonne--Meta-Llama-3.1-8B-Instruct-abliterated/snapshots/368c8ed94ce4c986e7b9ca5c159651ef753908ce')
Processing prompt: <|begin_of_text|><|start_header_id|>user<|end_header_id|>

hi<|eot_id|><|start_header_id|>assistant<|end_header_id|>

AlexCheema (Contributor) commented:
At first sight, this looks like it might be something in tinygrad itself.
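
One way to narrow that down, assuming exo passes the environment straight through to tinygrad: force tinygrad's CUDA backend instead of the NV driver backend (a different runtime for the same GPU, selected via tinygrad's standard device-override variable) and see whether the weight load still times out:

CUDA=1 DEBUG=2 SUPPORT_BF16=0 exo run llama-3.1-8b --prompt "hi"

If the CUDA backend loads and generates fine, that points at the NV backend's timeline-signal handling specifically.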
