From b152587288e1e2ddd9653b94d3a6aebcfcb07b53 Mon Sep 17 00:00:00 2001
From: jakotlow
Date: Thu, 1 Aug 2024 14:11:25 +0200
Subject: [PATCH] Create new OpenELM tech page index.mdx

Create new OpenELM tech page
---
 technologies/open-elm/index.mdx | 55 +++++++++++++++++++++++++++++++++
 1 file changed, 55 insertions(+)
 create mode 100644 technologies/open-elm/index.mdx

diff --git a/technologies/open-elm/index.mdx b/technologies/open-elm/index.mdx
new file mode 100644
index 00000000..69a093dd
--- /dev/null
+++ b/technologies/open-elm/index.mdx
@@ -0,0 +1,55 @@
+---
+title: "OpenELM"
+author: "Apple"
+description: "OpenELM, developed by Apple, is a family of Transformer-based language models optimized for devices with limited memory and computational resources, balancing high performance with efficiency."
+---
+
+# OpenELM
+
+OpenELM (Open-source Efficient Language Models) is a family of Transformer-based language models developed by Apple and optimized for devices with constrained memory and computational resources. The models are designed to balance performance with efficiency, which makes them suitable for deployment on mobile devices, laptops, and other hardware with limited processing power.
+
+| General      |                                                                  |
+| ------------ | ---------------------------------------------------------------- |
+| Release date | 2024                                                             |
+| Author       | [Apple](https://machinelearning.apple.com/)                      |
+| Type         | Transformer-based Language Models                                |
+
+
+## Key Models and Features
+
+- **OpenELM-270M:** A compact model with 270 million parameters, designed for basic text generation and understanding tasks.
+
+- **OpenELM-450M:** An intermediate model with 450 million parameters, offering improved performance for more complex language tasks.
+
+- **OpenELM-1.1B:** A larger model with 1.1 billion parameters, providing a good balance between size and capability.
+
+- **OpenELM-3B:** The most powerful in the series, with 3 billion parameters, suitable for more demanding applications.
+
+Each model is available in a base version and an instruction-tuned variant, which is fine-tuned to follow instructions.
+
+
+## Unique Architecture and Efficiency
+
+OpenELM models use layer-wise scaling, a non-uniform allocation of parameters across layers. Unlike traditional Transformers, which give every layer the same configuration, OpenELM allocates fewer parameters to the initial layers and gradually increases them towards the output layers. This design makes better use of a fixed parameter budget, improving the model’s performance without increasing its size.
+
+
+## Training and Data
+
+The models are trained on a mix of publicly available datasets, including The Pile and RedPajama, totaling approximately 1.8 trillion tokens. Instruction tuning was performed using the UltraFeedback dataset, comprising around 60,000 prompts. Training focused on efficiency, employing techniques such as FlashAttention and grouped-query attention to reduce memory and computational requirements.
+
+
+## Applications and Use Cases
+
+OpenELM models are well suited to on-device applications where privacy and low latency are crucial. They can be used for natural language understanding, text generation, and coding assistance. The models are fully open-source, with Apple providing comprehensive training logs, multiple checkpoints, and pre-training configurations to facilitate further research and development.
+
+
+## Open Source Release
+
+In a significant departure from its usual approach, Apple has made the OpenELM models fully open-source, releasing the model weights and code and training on publicly available datasets. This move aims to encourage collaboration within the research community and to support the development of on-device AI applications.
+
+
+👉 For more details and to access the models, you can visit the [OpenELM collection on Hugging Face](https://huggingface.co/collections/apple/openelm-pretrained-models-6619ac6ca12a10bd0d0df89e).
+
+
+
+
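
The layer-wise scaling described under "Unique Architecture and Efficiency" can be illustrated with a minimal sketch. It assumes that the attention width and feed-forward width of each layer are linearly interpolated between a minimum and a maximum multiplier. The function name `layerwise_scaling`, its parameters, and the example values are hypothetical and chosen for illustration only; they are not OpenELM's published configuration.

```python
def layerwise_scaling(num_layers, d_model, head_dim,
                      alpha_min, alpha_max, beta_min, beta_max):
    """Return (num_heads, ffn_dim) for each layer, from input to output.

    Assumption: both multipliers grow linearly with layer depth.
    alpha scales the attention width, beta scales the feed-forward width.
    """
    configs = []
    for i in range(num_layers):
        # Depth fraction: 0.0 at the first layer, 1.0 at the last layer.
        t = i / (num_layers - 1) if num_layers > 1 else 0.0
        alpha = alpha_min + (alpha_max - alpha_min) * t
        beta = beta_min + (beta_max - beta_min) * t
        # Fewer heads and a narrower FFN near the input, more near the output.
        num_heads = max(1, round(alpha * d_model / head_dim))
        ffn_dim = round(beta * d_model)
        configs.append((num_heads, ffn_dim))
    return configs


# Hypothetical 4-layer example (illustrative values only).
layers = layerwise_scaling(num_layers=4, d_model=1024, head_dim=64,
                           alpha_min=0.5, alpha_max=1.0,
                           beta_min=0.5, beta_max=4.0)
for index, (heads, ffn) in enumerate(layers):
    print(f"layer {index}: heads={heads}, ffn_dim={ffn}")
```

In this sketch the first layer gets the smallest attention and feed-forward widths and the last layer the largest, so the same total parameter budget is spent unevenly across depth instead of being split equally.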