feat: vllm llama integration #129
Merged
Conversation
Cifko approved these changes on Sep 18, 2024
LGTM, I will test that later
Integrate Llama Model for Fast, Batched Inference in atoma-vllm
Motivation
The atoma-vllm library aims to provide efficient, batched inference capabilities for large language models. Integrating the Llama model into our system will expand our support for popular and powerful language models, enabling users to leverage Llama's capabilities within our high-performance inference framework.
Description
This PR introduces Llama model support to the atoma-vllm library, implementing the necessary components for model loading, execution, and integration with our existing infrastructure. Key changes include:
- Added a new `llama.rs` file in the `models` directory to implement Llama-specific functionality.
- Implemented the `ModelLoader`, `ModelMetadata`, and `ModelExecutor` traits for the `LlamaModel` struct (see the sketch after this list).
- Updated the `ModelFilePaths` struct to accommodate Llama's file structure.
- Modified the `fetch` and `load` functions to handle Llama model files and configurations.
- Added a `llama.rs` file in the `tests` directory to ensure proper functionality of the Llama integration.
- Updated `Cargo.toml` to include the necessary Llama-related libraries.

These changes allow users to easily load and use Llama models within the atoma-vllm framework, benefiting from our optimized, batched inference capabilities.
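For orientation, here is a minimal sketch of how these traits might fit together for `LlamaModel`. The method signatures, field names, and the use of `anyhow` below are illustrative assumptions, not the crate's actual definitions:

```rust
use std::path::{Path, PathBuf};

/// Hypothetical shape of `ModelFilePaths`: the files a model needs on
/// disk (the real field names may differ).
pub struct ModelFilePaths {
    pub config_path: PathBuf,
    pub tokenizer_path: PathBuf,
    pub weight_paths: Vec<PathBuf>, // e.g. sharded weight files
}

/// Hypothetical trait shapes, for illustration only.
pub trait ModelLoader: Sized {
    /// Download or locate the model files and return their paths.
    fn fetch(model_id: &str, cache_dir: &Path) -> anyhow::Result<ModelFilePaths>;
    /// Build the model in memory from the fetched files.
    fn load(paths: &ModelFilePaths) -> anyhow::Result<Self>;
}

pub trait ModelMetadata {
    fn num_layers(&self) -> usize;
    fn hidden_size(&self) -> usize;
}

pub trait ModelExecutor {
    /// Run one batched forward pass over the given token sequences.
    fn forward(&mut self, batch: &[Vec<u32>]) -> anyhow::Result<Vec<Vec<u32>>>;
}

pub struct LlamaModel {
    // weights, KV caches, config, ...
}

impl ModelLoader for LlamaModel {
    fn fetch(model_id: &str, cache_dir: &Path) -> anyhow::Result<ModelFilePaths> {
        // Resolve the config, tokenizer, and weight shards for the
        // requested Llama checkpoint under `cache_dir`.
        todo!()
    }

    fn load(paths: &ModelFilePaths) -> anyhow::Result<Self> {
        // Parse the config, load the weight shards, and assemble the
        // transformer layers.
        todo!()
    }
}
```

In this shape a caller would run something like `let paths = LlamaModel::fetch(model_id, cache_dir)?;` followed by `let model = LlamaModel::load(&paths)?;`, after which the executor drives batched inference.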
Breaking Changes
- The `ModelLoader` trait's `fetch` and `load` functions now have slightly different signatures to accommodate the new `ModelFilePaths` struct and additional parameters.
- The `ModelMetadata` trait has been updated with new method names and signatures; any custom model implementations will need to be updated to match the new trait definitions.

These breaking changes are necessary to support a wider range of models and to improve overall system flexibility. Users of the library may need to update their code if they have implemented custom models or interact directly with low-level components of the system; a migration sketch follows.
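As a rough illustration of the migration, continuing the hypothetical definitions from the sketch above, a custom model implementation might change along these lines (the before/after signatures are assumptions based on this description, not the exact API):

```rust
use std::path::Path;

struct MyModel; // stand-in for a user-defined custom model

// Before (hypothetical): `fetch` and `load` worked with loose paths.
//
// impl ModelLoader for MyModel {
//     fn fetch(model_id: &str) -> anyhow::Result<PathBuf> { ... }
//     fn load(weights: PathBuf, config: PathBuf) -> anyhow::Result<Self> { ... }
// }

// After: both operations go through the new `ModelFilePaths` struct, so
// a custom model updates its impl block to the new signatures.
impl ModelLoader for MyModel {
    fn fetch(model_id: &str, cache_dir: &Path) -> anyhow::Result<ModelFilePaths> {
        // Gather all of the model's files into a single ModelFilePaths value.
        todo!()
    }

    fn load(paths: &ModelFilePaths) -> anyhow::Result<Self> {
        // Read everything the model needs from `paths` instead of
        // separate weight/config arguments.
        todo!()
    }
}
```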