Issues: vllm-project/vllm
[Bug]: vllm serve --config.yaml - Order of arguments matters? (bug) #8947, opened Sep 29, 2024 by FloWsnr
[Question]: Apply LoRA adapter on quantized model (misc) #8945, opened Sep 29, 2024 by Tejaswgupta
[Feature]: Qwen2.5 bitsandbytes support (feature request) #8941, opened Sep 29, 2024 by hanan9m
[New Model]: Molmo support (new model) #8940, opened Sep 29, 2024 by win4r
[Usage]: caching with different batches (usage) #8939, opened Sep 29, 2024 by KevinZeng08
[Bug]: Error when using tensor_parallel in v0.6.1.post1 or 0.6.2 (bug) #8937, opened Sep 29, 2024 by ruleGreen
[Bug]: vLLM 0.6.2 UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown (bug) #8933, opened Sep 29, 2024 by Clint-chan
[Feature]: Get logits instead of logprobs for distillation (feature request) #8926, opened Sep 28, 2024 by nivibilla
[Bug]: Model multimodal config initialisation unhandled and irrelevant error when no architectures found (bug) #8923, opened Sep 28, 2024 by AminAlam
[Performance]: TTFT regression from v0.5.4 to 0.6.2 (performance) #8918, opened Sep 27, 2024 by rickyyx
[RFC]: QuantizationConfig and QuantizeMethodBase Refactor for Simplifying Kernel Integrations (RFC) #8913, opened Sep 27, 2024 by LucasWilkinson
[Usage]: LLM with tensor_parallel_size larger than the number of GPUs in one node (usage) #8908, opened Sep 27, 2024 by gpucce
[Usage]: guided_regex in offline model (usage) #8907, opened Sep 27, 2024 by RonanKMcGovern
[Bug]: Tokenization Mismatch Between HuggingFace and vLLM (bug) #8904, opened Sep 27, 2024 by rafapi
[Feature]: Guided Decoding Schema Cache Store (feature request) #8902, opened Sep 27, 2024 by berniwal
[Performance]: Talk about the model parallelism (performance) #8898, opened Sep 27, 2024 by baifanxxx
[Bug]: RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method (usage) #8893, opened Sep 27, 2024 by Hothan01
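The error in #8893 above reflects a general CPython/CUDA constraint rather than anything vLLM-specific: a forked child process inherits the parent's already-initialized CUDA context and cannot re-initialize it, so worker processes must be created with the "spawn" start method. A minimal sketch using only the standard library (no vLLM API assumed; `cuda_safe_context` is a hypothetical helper name):

```python
import multiprocessing as mp

def cuda_safe_context():
    """Return a multiprocessing context whose workers are safe to combine
    with CUDA: "spawn" starts each child in a fresh interpreter instead of
    forking, so no initialized CUDA state is inherited from the parent."""
    return mp.get_context("spawn")

ctx = cuda_safe_context()
print(ctx.get_start_method())  # spawn
```

Pools, queues, and `Process` objects created from `ctx` then all use "spawn"; the trade-off is slower worker startup, since each child re-imports its modules.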
[Bug]: Variance Between Multiple Prefix Cache Example runs (bug) #8890, opened Sep 27, 2024 by Imss27
[Installation]: FAILED: CMakeFiles/_C.dir/csrc/quantization/machete/generated/machete_mm_bf16u4_impl_part0.cu.o (installation) #8889, opened Sep 27, 2024 by wangshuai09
[Bug]: assert len(self._async_stopped) == 0 (bug) #8881, opened Sep 27, 2024 by sfc-gh-zhwang
[Bug]: Starting with the quantized parameter --quantization=awq causes a restart (bug) #8877, opened Sep 27, 2024 by SongXiaoMao
[Bug]: Server fails with --cpu-offload-gb (aqlm, bug) #8873, opened Sep 26, 2024 by JMPSequeira
[Feature]: Add model context information to chat template (feature request) #8869, opened Sep 26, 2024 by maxdebayser
[Performance]: Slowdown compared to Gradio (performance) #8866, opened Sep 26, 2024 by theoren