@dakinggg released this 12 Mar 20:22

🚀 LLM Foundry v0.6.0

LLM Foundry is an efficient codebase for training, evaluating, and deploying Large Language Models (LLMs) and serves as the foundation for the MPT model series.

In addition to the usual bug fixes and performance improvements, we've added lots of new features!

New Features

Configurable loss for chat-formatted data (#985)

For chat-formatted data, you can now specify which tokens should be loss-generating in a configurable way.

This can be specified in the train_loader.dataset section of your training YAML as follows:

...
train_loader:
  dataset:
    ...
    target_prompts: <FILL IN>
    target_responses: <FILL IN>

See the docstring for a description of the options.
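For illustration, here is a minimal sketch of one plausible configuration, assuming the none and last options described in the docstring, which would compute loss only on the final response of each chat sample:

...
train_loader:
  dataset:
    ...
    # Prompt tokens generate no loss
    target_prompts: none
    # Only the last response in each sample generates loss
    target_responses: last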

OLMo support (#1016)

We've added support for the OLMo model from AI2.

To use OLMo, a few configuration changes are required. First, install LLM Foundry with the OLMo extra (pip install .[gpu,olmo]).

Then you will need to adjust the tokenizer section of your config as follows:

tokenizer:
  name: allenai/OLMo-7B
  kwargs:
    revision: main
    model_max_length: 2048
    model_input_names:
    - input_ids
    - attention_mask
    trust_remote_code: true
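The tokenizer section alone is not sufficient. As a sketch, assuming OLMo is loaded through the hf_causal_lm model wrapper, the model section would mirror the tokenizer settings above:

model:
  name: hf_causal_lm
  pretrained_model_name_or_path: allenai/OLMo-7B
  pretrained: true
  # OLMo's modeling code lives in the Hugging Face repo,
  # so remote code must be trusted, as for the tokenizer
  trust_remote_code: true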

Token accuracy (#983)

We've added a new, on-by-default metric that computes token accuracy (the fraction of predicted tokens that exactly match the labels), in addition to cross entropy and perplexity.

Configurable activation checkpointing (#951)

Activation checkpointing for MPT is now more configurable, allowing finer-grained control over memory usage during training. See the docstring for more details, and the sketch below.
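As an illustrative sketch (the activation_checkpointing_target key and module name below are assumptions; the docstring is the authoritative reference), you might checkpoint only attention submodules rather than whole blocks:

fsdp_config:
  activation_checkpointing: true
model:
  ...
  # Checkpoint only these submodules, trading somewhat more memory
  # for less recomputation than checkpointing entire blocks
  activation_checkpointing_target:
    - grouped_query_attention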

Finetuning with multiple streams, and pretokenized data (#933, #945, #946)

We've brought the finetuning dataloader up to speed with the pretraining dataloader, adding support for mixing multiple streams and for pretokenized finetuning data. See the YAML for a full example.
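For example, a sketch of a two-stream finetuning dataset config (the stream names, paths, and proportions here are hypothetical, following the pretraining dataloader's streams convention):

train_loader:
  dataset:
    ...
    streams:
      code_instructions:              # hypothetical stream name
        remote: s3://my-bucket/code/  # hypothetical remote path
        local: /tmp/code
        proportion: 0.3
      chat:
        remote: s3://my-bucket/chat/
        local: /tmp/chat
        proportion: 0.7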

Eval Gauntlet v0.3 (#824)

We've released v0.3 of our Eval Gauntlet. See the README for a full description.

Breaking changes and deprecations

Flash attention v1 removal (#1023)

Support for flash attention v1 has now been removed.

Extra BOS token removed (#1003)

When tokenizing prompt/response and chat data, for some tokenizers, we were mistakenly adding an extra BOS token between the prompt and the response. This has now been removed.

Deprecation of triton flash attention, prefixLM, and text denoising (#1007)

We've deprecated the Triton version of flash attention, prefixLM, and text denoising, as these features were not heavily used or actively maintained.

Full Changelog: v0.5.0...v0.6.0