Meta Unveils Llama 3: Next-Gen Open Source Language Model with Unprecedented 70B Parameters

Meta has introduced Meta Llama 3, the latest generation of its open-source large language model. The model will be available on platforms such as AWS, Google Cloud, and Microsoft Azure, with support from hardware vendors including AMD, Intel, and NVIDIA. Meta says it is developing Llama 3 responsibly and is offering resources to help others do the same. Meta AI, the company's assistant built with Llama 3, is positioned as one of the world's leading AI assistants. The new models demonstrate improved reasoning and stronger performance on industry benchmarks.

Introduction to Meta Llama 3: The Next Generation Open Source Large Language Model

Meta has announced the introduction of Meta Llama 3, the latest iteration of their open source large language model (LLM). This new model is set to be available on various platforms including AWS, Databricks, Google Cloud, Hugging Face, Kaggle, IBM WatsonX, Microsoft Azure, NVIDIA NIM, and Snowflake. It will also be supported by hardware platforms offered by AMD, AWS, Dell, Intel, NVIDIA, and Qualcomm.

Meta Llama 3 is being developed with a focus on responsible use, and to this end, Meta is introducing new trust and safety tools such as Llama Guard 2, Code Shield, and CyberSec Eval 2. The company plans to introduce new capabilities, longer context windows, additional model sizes, and enhanced performance in the coming months.

Meta Llama 3: Aiming for State-of-the-Art Performance

The first two models of Meta Llama 3, featuring pretrained and instruction-fine-tuned language models with 8B and 70B parameters, are now available for broad use. These models demonstrate state-of-the-art performance on a wide range of industry benchmarks and offer new capabilities, including improved reasoning.

The 8B and 70B parameter Llama 3 models represent a significant improvement over Llama 2, establishing a new standard for LLMs at these scales. These gains are due to advancements in pretraining and post-training procedures, which have led to a substantial reduction in false refusal rates, improved alignment, and increased diversity in model responses.

Goals and Design Philosophy of Llama 3

The development of Llama 3 was guided by the goal of building the best open models that are on par with the best proprietary models available today. The team aimed to address developer feedback to increase the overall usefulness of Llama 3, while continuing to play a leading role in the responsible use and deployment of LLMs.

The design philosophy of Llama 3 was centered around innovation, scale, and simplicity. This philosophy was reflected in four key areas: the model architecture, the pretraining data, scaling up pretraining, and instruction fine-tuning.

Model Architecture and Training Data of Llama 3

The model architecture of Llama 3 is a relatively standard decoder-only transformer, with several key improvements over Llama 2. Among them, Llama 3 uses a tokenizer with a vocabulary of 128K tokens that encodes language much more efficiently, which contributes to substantially improved model performance.
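To see why a larger vocabulary encodes text more efficiently, consider this toy sketch (not Llama 3's actual tokenizer, which is a byte-pair-encoding tokenizer; the vocabularies and the greedy longest-match scheme below are purely illustrative): a vocabulary containing longer merged pieces covers the same string in fewer tokens.

```python
# Toy illustration only -- NOT Llama 3's real tokenizer. A greedy
# longest-match tokenizer shows how a larger vocabulary can encode
# the same text in fewer tokens.

def tokenize(text: str, vocab: set[str]) -> list[str]:
    """Greedily match the longest vocabulary entry at each position,
    falling back to single characters when nothing matches."""
    tokens = []
    i = 0
    while i < len(text):
        for length in range(len(text) - i, 0, -1):
            piece = text[i:i + length]
            if piece in vocab or length == 1:
                tokens.append(piece)
                i += length
                break
    return tokens

small_vocab = {"lang", "uage", "mod", "el"}        # hypothetical small vocabulary
large_vocab = small_vocab | {"language", "model"}  # same vocabulary plus longer merges

text = "languagemodel"
print(tokenize(text, small_vocab))  # ['lang', 'uage', 'mod', 'el'] -> 4 tokens
print(tokenize(text, large_vocab))  # ['language', 'model'] -> 2 tokens
```

Fewer tokens per input means more text fits in a fixed context window and each training or inference step covers more content, which is the efficiency gain a larger vocabulary buys.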

The training data for Llama 3 was collected from publicly available sources and totals over 15T tokens. This dataset is seven times larger than that used for Llama 2, and it includes four times more code. Over 5% of the Llama 3 pretraining dataset consists of high-quality non-English data that covers over 30 languages.

Future Developments and Expectations for Llama 3

In the near future, Meta aims to make Llama 3 multilingual and multimodal, extend its context length, and continue to improve overall performance across core LLM capabilities such as reasoning and coding. The company is embracing the open source ethos of releasing early and often, so the community can access these models while they are still in development.

The development of Llama 3 also involved optimizing performance for real-world scenarios. To this end, Meta developed a new high-quality human evaluation set containing 1,800 prompts that cover 12 key use cases. In these evaluations, the 70B instruction-following model performed strongly against competing models of comparable size.
