
feat: vllm llama integration #129

Merged · 105 commits · Sep 18, 2024
Conversation

@jorgeantonio21 (Contributor) commented on Aug 16, 2024:

Integrate Llama Model for Fast, Batched Inference in atoma-vllm

Motivation

The atoma-vllm library aims to provide efficient, batched inference for large language models. Integrating the Llama model expands our coverage of popular, powerful models and lets users run Llama within our high-performance inference framework.

Description

This PR introduces Llama model support to the atoma-vllm library, implementing the necessary components for model loading, execution, and integration with our existing infrastructure. Key changes include:

  1. Added a new llama.rs file in the models directory to implement Llama-specific functionality.
  2. Implemented the ModelLoader, ModelMetadata, and ModelExecutor traits for the LlamaModel struct (see the sketch after this list).
  3. Updated the ModelFilePaths struct to accommodate Llama's file structure.
  4. Modified the fetch and load functions to handle Llama model files and configurations.
  5. Integrated Llama-specific parameters and configurations into the existing model execution pipeline.
  6. Added a test file llama.rs in the tests directory to ensure proper functionality of the Llama integration.
  7. Updated dependencies in Cargo.toml to include necessary Llama-related libraries.
  8. Made adjustments to existing code to accommodate Llama's specific requirements and ensure compatibility with our batched inference system.
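
To make items 2–4 concrete, here is a minimal, hedged sketch of what the Llama loading path might look like. Only the names ModelFilePaths, ModelLoader, and LlamaModel come from this PR; the field layout, method signatures, and file names below are assumptions for illustration, not the actual atoma-vllm definitions.

```rust
use std::error::Error;
use std::path::{Path, PathBuf};

/// Hypothetical layout; the actual struct updated in this PR may hold
/// different fields (Llama checkpoints are often sharded).
pub struct ModelFilePaths {
    pub config_path: PathBuf,
    pub tokenizer_path: PathBuf,
    pub weights_paths: Vec<PathBuf>,
}

/// Hypothetical trait shape, for illustration only.
pub trait ModelLoader: Sized {
    /// Resolve (e.g. download or locate in a cache) the model files.
    fn fetch(model_id: &str, cache_dir: &Path) -> Result<ModelFilePaths, Box<dyn Error>>;
    /// Construct the model from the fetched files.
    fn load(paths: &ModelFilePaths) -> Result<Self, Box<dyn Error>>;
}

/// Weights, config, and tokenizer fields elided for brevity.
pub struct LlamaModel;

impl ModelLoader for LlamaModel {
    fn fetch(model_id: &str, cache_dir: &Path) -> Result<ModelFilePaths, Box<dyn Error>> {
        // The real implementation would pull files from a hub or local cache.
        let root = cache_dir.join(model_id);
        Ok(ModelFilePaths {
            config_path: root.join("config.json"),
            tokenizer_path: root.join("tokenizer.json"),
            weights_paths: vec![root.join("model.safetensors")],
        })
    }

    fn load(_paths: &ModelFilePaths) -> Result<Self, Box<dyn Error>> {
        // Real code would deserialize the weights and build the transformer.
        Ok(LlamaModel)
    }
}
```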

These changes allow users to easily load and use Llama models within the atoma-vllm framework, benefiting from our optimized, batched inference capabilities.
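
Continuing the sketch above, a hypothetical end-to-end call might look as follows. The real entry point goes through the library's service layer rather than direct trait calls, and the model id and cache path are placeholders.

```rust
use std::path::Path;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Placeholder model id and cache directory, for illustration only.
    let paths = LlamaModel::fetch("meta-llama/Meta-Llama-3-8B", Path::new("/tmp/models"))?;
    let _model = LlamaModel::load(&paths)?;
    // Prompts would then be scheduled through the batched inference service.
    Ok(())
}
```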

Breaking Changes

  • The ModelLoader trait's fetch and load functions now have slightly different signatures to accommodate the new ModelFilePaths struct and additional parameters.
  • The ModelMetadata trait has been updated with new method names and signatures. Any custom model implementations will need to be updated to match the new trait definition.

These breaking changes are necessary to support a wider range of models and improve overall system flexibility. Users of the library may need to update their code if they have implemented custom models or are directly interacting with low-level components of the system.
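
Since the PR text does not list the renamed ModelMetadata methods, the sketch below only illustrates the shape of such a migration; every method name here is hypothetical and should be checked against the actual trait in atoma-vllm.

```rust
// All method names below are hypothetical; consult the real
// ModelMetadata trait in atoma-vllm when migrating a custom model.
pub trait ModelMetadata {
    fn hidden_size(&self) -> usize;
    fn num_attention_heads(&self) -> usize;
    fn num_hidden_layers(&self) -> usize;
}

struct MyCustomModel {
    hidden_size: usize,
    num_heads: usize,
    num_layers: usize,
}

impl ModelMetadata for MyCustomModel {
    fn hidden_size(&self) -> usize { self.hidden_size }
    fn num_attention_heads(&self) -> usize { self.num_heads }
    fn num_hidden_layers(&self) -> usize { self.num_layers }
}
```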

Resolved review threads:

  • atoma-vllm/src/llm_service.rs
  • atoma-vllm/src/worker.rs
  • atoma-vllm/src/token_output_stream.rs (outdated)
@Cifko (Collaborator) commented:

LGTM, I will test that later

@jorgeantonio21 merged commit bb8b07b into main on Sep 18, 2024 (1 check failed).