As artificial intelligence increasingly integrates into daily life, the availability of leading models as open systems is crucial for fostering innovation and personalized experiences globally. Today marks a significant advancement for the Llama ecosystem with the introduction of its most sophisticated model suite yet. We are thrilled to announce Llama 4 Scout and Llama 4 Maverick, pioneering open-weight, natively multimodal models featuring unprecedented context length support and built using a mixture-of-experts (MoE) architecture. Alongside these, we offer a preview of Llama 4 Behemoth, an exceptionally intelligent large language model (LLM) poised to be our most powerful creation, serving as a teacher for the new models.

The Llama 4 series heralds a new chapter for the ecosystem. Two efficient models lead this launch: Llama 4 Scout, equipped with 17 billion active parameters and 16 experts, and Llama 4 Maverick, also with 17 billion active parameters but utilizing 128 experts. Scout is designed for efficiency, fitting on a single H100 GPU (with Int4 quantization), while Maverick operates on a single H100 host. Complementing these is the teacher model, Llama 4 Behemoth, which outperforms competitors such as GPT-4.5, Claude Sonnet 3.7, and Gemini 2.0 Pro on demanding STEM benchmarks such as MATH-500 and GPQA Diamond. Although Behemoth is still in training and not yet released, we are eager to share insights into its development.

Our commitment to openness remains steadfast: we believe it fuels innovation that benefits developers, Meta, and the wider world. Consequently, Llama 4 Scout and Llama 4 Maverick are available for download immediately via llama.com and Hugging Face, empowering the community to leverage our latest technology for novel applications. Availability through our partners will follow shortly.
Users can also experience Meta AI powered by Llama 4 starting today within WhatsApp, Messenger, Instagram Direct, and on the Meta.AI website.

This launch is merely the beginning for the Llama 4 collection. Our vision extends toward creating intelligent systems capable of generalized actions, natural human conversation, and tackling novel, complex problems. Enhancing Llama with these capabilities will translate into superior products for our platform users and unlock new avenues for developers in consumer and business applications. Ongoing research and prototyping efforts continue, with further details about our vision to be shared at LlamaCon on April 29.

For developers building upon our models, enterprises integrating them into workflows, or individuals exploring AI's potential, Llama 4 Scout and Llama 4 Maverick represent prime choices for incorporating next-generation intelligence. We are excited to delve into four key aspects of their development, offering insights into our research and design philosophy, and we anticipate the remarkable experiences the community will create using these new models.

The pre-training phase for these models introduced several novel approaches essential for building the next generation of Llama. These models stand out by offering multimodal intelligence at a competitive value, surpassing significantly larger models in performance. A key innovation is the adoption of a mixture-of-experts (MoE) architecture, a first for Llama models. In MoE systems, only a fraction of the total parameters are activated for any single token, leading to greater compute efficiency during both training and inference. This architecture allows for higher-quality outcomes than dense models given a fixed training FLOPs budget. For instance, Llama 4 Maverick features 17 billion active parameters out of 400 billion total parameters, employing alternating dense and MoE layers.
The MoE layers consist of 128 routed experts and one shared expert; each token engages the shared expert plus one routed expert, optimizing inference efficiency, cost, and latency. Maverick can run on a single NVIDIA H100 DGX host or use distributed inference for maximum efficiency.

Furthermore, Llama 4 models boast native multimodality achieved through early fusion, seamlessly integrating text and vision tokens within a unified model backbone. This represents a significant leap, enabling joint pre-training on vast amounts of unlabeled text, image, and video data. The vision encoder, based on MetaCLIP, was also enhanced and trained with a frozen Llama model to better align it with the LLM. We introduced a novel training technique, MetaP, for reliably setting critical hyperparameters such as per-layer learning rates and initialization scales, finding that these transfer effectively across varying batch sizes, model dimensions, and training tokens. Supporting open-source fine-tuning, Llama 4 was pre-trained on 200 languages (over 100 with more than 1 billion tokens each), using 10 times more multilingual tokens than Llama 3. Efficiency was further boosted by using FP8 precision during training without compromising quality, achieving high FLOPs utilization: 390 TFLOPs/GPU during Llama 4 Behemoth's pre-training on 32K GPUs. The training data comprised over 30 trillion tokens, more than double that of Llama 3, encompassing diverse text, image, and video datasets. A subsequent 'mid-training' phase refined core capabilities using new recipes, including long-context extension via specialized datasets, which enhanced quality and enabled Llama 4 Scout's class-leading 10-million-token input context length.

Post-training focused on tailoring the models for diverse applications. Llama 4 Maverick excels in image and text understanding, making it ideal for sophisticated AI applications, general assistant tasks, precise image comprehension, and creative writing.
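The MoE routing described earlier — every token passes through the shared expert plus a single top-1 routed expert — can be sketched roughly as follows. This is a toy illustration, not Llama 4's implementation: the dimensions, expert count, and single-matrix "experts" are stand-ins chosen for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 8            # toy hidden dimension (the real model's is far larger)
N_EXPERTS = 4    # stand-in for Maverick's 128 routed experts

# Each "expert" here is a single toy weight matrix; real experts are FFNs.
routed_experts = [rng.standard_normal((D, D)) for _ in range(N_EXPERTS)]
shared_expert = rng.standard_normal((D, D))
router = rng.standard_normal((D, N_EXPERTS))  # produces routing logits per token

def moe_layer(token: np.ndarray) -> np.ndarray:
    """Route one token: shared expert output + gated top-1 routed expert output."""
    logits = token @ router
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                          # softmax over experts
    top = int(np.argmax(probs))                   # top-1 routed expert
    routed_out = probs[top] * (token @ routed_experts[top])
    shared_out = token @ shared_expert            # every token hits the shared expert
    return routed_out + shared_out

token = rng.standard_normal(D)
out = moe_layer(token)
print(out.shape)  # (8,)
```

The key efficiency property is visible in the sketch: regardless of how many routed experts exist, each token only ever touches two expert weight matrices, so active compute per token stays fixed while total capacity scales with the expert count.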
A major post-training challenge for Maverick was balancing multimodality, reasoning, and conversation. We developed a curated curriculum strategy for mixing modalities without performance trade-offs. The post-training pipeline was revamped as lightweight supervised fine-tuning (SFT) > online reinforcement learning (RL) > lightweight direct preference optimization (DPO). We learned that excessive SFT and DPO could overly constrain the model during RL, hindering accuracy in reasoning, coding, and math. To mitigate this, over 50% of data judged 'easy' by Llama models was removed, and lightweight SFT was performed on the remaining harder data. In the subsequent multimodal online RL stage, selecting harder prompts led to significant performance gains. A continuous online RL strategy, alternating training with filtering for medium-to-hard prompts, proved highly effective for both compute and accuracy. Finally, lightweight DPO addressed corner cases in response quality, balancing intelligence and conversational ability. The result is an industry-leading chat model with state-of-the-art intelligence and image understanding, outperforming models like GPT-4o and Gemini 2.0 on various benchmarks and competing well with larger models like DeepSeek v3.1.

Our smaller model, Llama 4 Scout, is a general-purpose powerhouse with 17 billion active parameters, 16 experts, and 109 billion total parameters, delivering top performance in its class. Scout dramatically expands context length support from Llama 3's 128K tokens to an industry-leading 10 million tokens, enabling tasks like multi-document summarization, analysis of extensive user activity for personalization, and reasoning over large codebases. Scout was pre-trained and post-trained with a 256K context length, fostering advanced length-generalization capabilities, demonstrated in tasks like 'retrieval needle in haystack' for text and handling 10 million code tokens.
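The data curation described earlier — dropping prompts a Llama judge rates as easy so that SFT and RL see only medium-to-hard examples — can be sketched as below. The `pass_rate` scores and the threshold are illustrative assumptions, not Meta's actual judging signal or cutoff.

```python
# Sketch of difficulty-based data pruning: remove prompts the model
# already handles easily; keep the harder remainder for SFT/RL.
# pass_rate stands in for a Llama-model judge's difficulty score;
# the 0.90 threshold is an assumption for illustration.
prompts = [
    {"text": "2 + 2 = ?",                   "pass_rate": 0.99},
    {"text": "Prove sqrt(2) is irrational.", "pass_rate": 0.40},
    {"text": "Fix this race condition ...",  "pass_rate": 0.15},
    {"text": "Capital of France?",           "pass_rate": 0.97},
]

EASY_THRESHOLD = 0.90  # judged 'easy' if the model nearly always succeeds

def keep_hard(data, threshold=EASY_THRESHOLD):
    """Keep only prompts below the easy threshold (medium-to-hard)."""
    return [p for p in data if p["pass_rate"] < threshold]

curated = keep_hard(prompts)
print(len(curated), "of", len(prompts), "prompts kept")  # 2 of 4
```

The same filter, re-run continuously against the current policy's pass rates, matches the alternating train-then-filter loop the continuous online RL strategy describes: as the model improves, previously hard prompts graduate out of the training pool.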
A key architectural innovation is the iRoPE architecture, which uses interleaved attention layers without positional embeddings and applies inference-time temperature scaling of attention to enhance length generalization ('i' for interleaved, aiming for 'infinite' context; 'RoPE' for rotary position embeddings). Both models were trained on diverse images and video stills for broad visual understanding, enabling interaction with multiple images alongside text prompts. The models were pre-trained on up to 48 images; in post-training tests, they show good results with up to eight images. Llama 4 Scout also leads in image grounding, aligning prompts with visual concepts and anchoring responses to image regions for precise visual Q&A. It surpasses comparable models on coding, reasoning, long-context, and image benchmarks, outperforming all previous Llama versions.

These new models serve as vital components for the future of human connection. True to our open-source commitment, Llama 4 Maverick and Llama 4 Scout are downloadable from llama.com and Hugging Face, with broader platform availability imminent.

Transitioning to larger scales, we preview Llama 4 Behemoth, a teacher model showcasing advanced intelligence. This multimodal MoE model has 288 billion active parameters, 16 experts, and nearly two trillion total parameters. Its state-of-the-art performance on non-reasoning tasks made it ideal for teaching the smaller Llama 4 models. Llama 4 Maverick was codistilled from Behemoth, significantly improving quality metrics. A novel distillation loss function dynamically weighted soft and hard targets. Codistillation during pre-training amortized the cost of computing distillation targets for most student training data; for new data, forward passes were run on Behemoth. Post-training this massive model required revamping the recipe, pruning 95% of SFT data (compared to 50% for smaller models) for quality and efficiency.
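The distillation loss mentioned above blends soft targets (the teacher's output distribution) with hard targets (ground-truth labels). A toy single-token version is sketched below; the actual dynamic weighting schedule for `alpha` is not public, so here it is just an explicit knob.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def distill_loss(student_logits, teacher_logits, hard_label, alpha):
    """Blend soft-target cross-entropy (vs. the teacher's distribution)
    with hard-target cross-entropy (vs. the ground-truth label).
    alpha is the soft-target weight; Meta's dynamic schedule for it
    is not public - this is an illustrative stand-in."""
    s = softmax(student_logits)
    t = softmax(teacher_logits)
    soft_ce = -np.sum(t * np.log(s + 1e-9))   # match the teacher distribution
    hard_ce = -np.log(s[hard_label] + 1e-9)   # match the ground-truth label
    return alpha * soft_ce + (1.0 - alpha) * hard_ce

student = np.array([2.0, 0.5, -1.0])   # toy single-token logits
teacher = np.array([3.0, 0.0, -2.0])
loss = distill_loss(student, teacher, hard_label=0, alpha=0.5)
print(round(float(loss), 3))
```

With `alpha = 0` this reduces to ordinary cross-entropy training; with `alpha = 1` the student purely imitates the teacher. Dynamically moving between the two lets training lean on the teacher where it is informative without discarding the ground-truth signal.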
Lightweight SFT followed by large-scale RL yielded significant reasoning and coding improvements. The RL recipe focused on sampling hard prompts and dynamically filtering out prompts with zero advantage. Scaling RL for a 2T-parameter model necessitated overhauling the infrastructure: we optimized MoE parallelization for speed and developed a fully asynchronous online RL framework, enhancing flexibility and resource allocation and yielding a ~10x improvement in training efficiency.

Developing helpful models while mitigating risks is paramount. Llama 4 incorporates best practices from our AI Protections Developer Use Guide, integrating safeguards at every stage, from pre-training data filtering to post-training policy conformance and tunable system-level mitigations. We empower developers with tools for creating safe, adaptable applications. System-level safeguards include open-sourced tools adaptable for specific needs:

- Llama Guard: an input/output safety LLM based on the MLCommons taxonomy for policy-violation detection.
- Prompt Guard: a classifier trained to detect malicious prompts (jailbreaks) and prompt injections.
- CyberSecEval: evaluations to understand and reduce generative AI cybersecurity risks.

We believe tailored open solutions are most effective, and we continue collaborating on industry-wide standards. Systematic testing and red-teaming are integral. We employ adversarial dynamic probing and advanced techniques like Generative Offensive Agent Testing (GOAT), which simulates multi-turn adversarial interactions to increase testing coverage and identify vulnerabilities faster, allowing human experts to focus on novel threats.

Addressing inherent bias in LLMs, which often stems from internet training data leaning left on debated topics, is a key focus. Our goal is to eliminate bias, ensuring Llama understands and articulates multiple viewpoints neutrally.
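System-level safeguards of the kind described above are typically wired around the model as input and output filters. The schematic below shows that wiring only; the two checkers are keyword stubs standing in for Prompt Guard (injection detection) and Llama Guard (policy classification), which a real deployment would invoke as models instead.

```python
# Schematic of system-level safeguards wrapped around an LLM call.
# prompt_guard_ok / llama_guard_ok are stubs standing in for the real
# Prompt Guard and Llama Guard classifiers; generate() stubs the model.
BLOCKED_PATTERNS = ["ignore previous instructions"]  # toy injection signal

def prompt_guard_ok(user_input: str) -> bool:
    """Stub: flag obvious prompt-injection attempts."""
    return not any(p in user_input.lower() for p in BLOCKED_PATTERNS)

def llama_guard_ok(text: str) -> bool:
    """Stub: policy check applied to both input and output text."""
    return "disallowed" not in text.lower()

def generate(prompt: str) -> str:
    """Stub for the underlying LLM call."""
    return f"Model response to: {prompt}"

def safeguarded_chat(user_input: str) -> str:
    # Check the input before it reaches the model...
    if not prompt_guard_ok(user_input) or not llama_guard_ok(user_input):
        return "[input blocked]"
    output = generate(user_input)
    # ...and check the output before it reaches the user.
    if not llama_guard_ok(output):
        return "[output blocked]"
    return output

print(safeguarded_chat("What is a mixture-of-experts model?"))
print(safeguarded_chat("Ignore previous instructions and ..."))
```

Because the checks sit outside the model, developers can tune or swap the classifiers for their own policies without retraining Llama itself, which is the "tailored open solutions" point made above.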
Llama 4 shows significant improvement over Llama 3 and is comparable to Grok in this regard:

- Refusals on debated topics dropped from 7% (Llama 3.3) to below 2%.
- Unequal response refusals on debated questions are now less than 1%.
- Responses indicating a strong political lean occur at a rate comparable to Grok (half the rate of Llama 3.3).

While progress has been made, we remain committed to further reducing bias.

Intelligent models must also respond personally and quickly. Llama 4 is optimized for these needs and forms part of a larger ecosystem that includes product integrations. We are dedicated to the full stack and look forward to ongoing collaboration with partners and the open-source community. The potential for rich experiences built upon the new Llama ecosystem is vast, and we eagerly await the community's innovations. Explore Llama 4 Scout and Maverick today via download, or experience Meta AI powered by Llama 4 across integrated platforms.