adding GQA #1139

Open

minowau wants to merge 1 commit into main

Conversation

minowau commented Jul 13, 2024

This PR enhances the Transformer model implementation by optimizing memory usage and performance for low-resource environments. Key updates include the integration of grouped query attention, modifications to the tokenizer for better encoding and decoding, and improvements to the text generation logic using nucleus sampling. Additionally, the code structure has been refined with comprehensive documentation, ensuring clarity and maintainability. Initial tests have been conducted to validate the overall functionality of the updated components.

**Enhancements to Transformer Model Implementation**

- **Transformer Model (`Transformer` class)**:
  - Implemented grouped query attention to optimize memory usage.
  - Adjusted the forward method to handle dynamic token lengths.

- **Transformer Block (`TransformerBlock` class)**:
  - Updated attention and feedforward layers for improved performance.

- **Attention Module (`Attention` class)**:
  - Integrated grouped query attention and adjusted key/value caching mechanisms (see the attention sketch after this list).

- **Tokenizer (`Tokenizer` class)**:
  - Modified the encoding and decoding processes using SentencePiece (see the tokenizer sketch after this list).
  - Ensured proper handling of special tokens: beginning-of-sequence (BOS), end-of-sequence (EOS), and padding (PAD).

- **Generation Method (`generate` function)**:
  - Enhanced logic to support dynamic input lengths.
  - Implemented nucleus sampling with adjustable temperature and top-p parameters for better control over text generation (see the sampling sketch after this list).
  - Improved handling of log probabilities and early stopping conditions based on EOS tokens.

- **Documentation and Code Structure**:
  - Added detailed docstrings and comments for clarity and maintainability.
  - Ensured consistent formatting throughout the codebase.

- **Testing and Validation**:
  - Conducted initial tests to validate the functionality of the model, tokenizer, and generation processes.
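
For reference, here is a minimal sketch of grouped query attention in PyTorch. It illustrates the general technique, not this PR's exact code: the names (`GroupedQueryAttention`, `n_kv_heads`, `repeat_kv`) follow common Llama-style conventions and are assumptions.

```python
# Minimal grouped query attention (GQA) sketch; names are illustrative,
# not taken verbatim from this PR.
import torch
import torch.nn.functional as F
from torch import nn


def repeat_kv(x: torch.Tensor, n_rep: int) -> torch.Tensor:
    """Repeat each key/value head n_rep times so K/V match the query heads."""
    bsz, seqlen, n_kv_heads, head_dim = x.shape
    if n_rep == 1:
        return x
    return (
        x[:, :, :, None, :]
        .expand(bsz, seqlen, n_kv_heads, n_rep, head_dim)
        .reshape(bsz, seqlen, n_kv_heads * n_rep, head_dim)
    )


class GroupedQueryAttention(nn.Module):
    def __init__(self, dim: int, n_heads: int, n_kv_heads: int):
        super().__init__()
        assert n_heads % n_kv_heads == 0
        self.n_heads, self.n_kv_heads = n_heads, n_kv_heads
        self.n_rep = n_heads // n_kv_heads  # query heads per K/V head
        self.head_dim = dim // n_heads
        self.wq = nn.Linear(dim, n_heads * self.head_dim, bias=False)
        self.wk = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)  # fewer K heads
        self.wv = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)  # fewer V heads
        self.wo = nn.Linear(n_heads * self.head_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Works for any sequence length, so dynamic token lengths are handled.
        bsz, seqlen, _ = x.shape
        q = self.wq(x).view(bsz, seqlen, self.n_heads, self.head_dim)
        k = self.wk(x).view(bsz, seqlen, self.n_kv_heads, self.head_dim)
        v = self.wv(x).view(bsz, seqlen, self.n_kv_heads, self.head_dim)
        # Share each K/V head across a group of query heads.
        k, v = repeat_kv(k, self.n_rep), repeat_kv(v, self.n_rep)
        q, k, v = (t.transpose(1, 2) for t in (q, k, v))  # (bsz, heads, seq, dim)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        out = out.transpose(1, 2).reshape(bsz, seqlen, -1)
        return self.wo(out)
```

Because only `n_kv_heads` key/value heads are projected, a KV cache stores `n_kv_heads` entries per layer instead of `n_heads`, which is where the memory saving for low-resource environments comes from.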
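
Next, a minimal sketch of SentencePiece encoding/decoding with explicit BOS/EOS/PAD handling, assuming the `sentencepiece` package; the method signatures are illustrative, not this PR's exact interface.

```python
# Sketch of a SentencePiece-backed tokenizer with special-token handling.
from sentencepiece import SentencePieceProcessor


class Tokenizer:
    def __init__(self, model_path: str):
        self.sp = SentencePieceProcessor(model_file=model_path)
        self.bos_id = self.sp.bos_id()
        self.eos_id = self.sp.eos_id()
        self.pad_id = self.sp.pad_id()  # -1 if the model defines no pad piece

    def encode(self, text: str, bos: bool = True, eos: bool = False) -> list[int]:
        ids = self.sp.encode(text)
        if bos:
            ids = [self.bos_id] + ids
        if eos:
            ids = ids + [self.eos_id]
        return ids

    def decode(self, ids: list[int]) -> str:
        # Strip special tokens before detokenizing.
        ids = [i for i in ids if i not in (self.bos_id, self.eos_id, self.pad_id)]
        return self.sp.decode(ids)
```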
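
Finally, a minimal sketch of nucleus (top-p) sampling with temperature scaling. The helper below is a common pattern for this technique and is an assumption, not code copied from the PR.

```python
# Sketch of nucleus (top-p) sampling with temperature, in PyTorch.
import torch


def sample_top_p(logits: torch.Tensor, temperature: float, top_p: float) -> torch.Tensor:
    """Sample one token id per row from the smallest set of tokens whose
    cumulative probability exceeds top_p."""
    probs = torch.softmax(logits / temperature, dim=-1)
    probs_sort, probs_idx = torch.sort(probs, dim=-1, descending=True)
    cumsum = torch.cumsum(probs_sort, dim=-1)
    # Zero out tokens once the cumulative mass (excluding the current token)
    # already exceeds top_p, then renormalize and sample.
    mask = cumsum - probs_sort > top_p
    probs_sort[mask] = 0.0
    probs_sort.div_(probs_sort.sum(dim=-1, keepdim=True))
    next_token = torch.multinomial(probs_sort, num_samples=1)
    return torch.gather(probs_idx, -1, next_token)  # map back to vocab ids
```

In a typical generate loop, temperature of 0 falls back to greedy argmax, and decoding stops early once every sequence in the batch has emitted EOS, matching the early-stopping behavior described above.
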
@facebook-github-bot added the CLA Signed label Jul 13, 2024

minowau (Author) commented Jul 13, 2024

@msaroufim could you please check this? Just a request. This is a new architecture.

minowau (Author) commented Jul 13, 2024

@jspisak Just want you to review this.
