Open Models

We hope this overview of recently released AI models is useful for education and research.

LLaMA 4 (Released: 2025)

  • Meta's 2025 Llama 4 family (Scout / Maverick / Behemoth): natively multimodal models with advanced capabilities, which Meta announced as open source.
  • Core Features: Llama 4 Scout: natively multimodal model offering strong text and visual understanding, efficient operation on a single H100 GPU, and a 10M-token context window for long-document analysis. Llama 4 Maverick: natively multimodal model for image and text understanding, designed for high intelligence with fast, low-cost responses. Llama 4 Behemoth: a mixture-of-experts model with 288 billion active parameters and 16 experts, which Meta describes as its most powerful model to date (a minimal mixture-of-experts routing sketch follows this entry).
  • Use Cases: Multimodal tasks combining text and image understanding; long-context document analysis
  • License: Source-available (Llama 4 Community License Agreement and Llama 4 Acceptable Use Policy)
  • Official Website: https://ai.meta.com/blog/llama-4-multimodal-intelligence/
  • Hugging Face: https://huggingface.co/meta-llama
  • GitHub: https://github.com/meta-llama/llama-models/tree/main/models/llama4
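
The Behemoth description above refers to "active parameters" and "experts": in a mixture-of-experts (MoE) layer, a router sends each token to only a few experts, so the parameters actually used per token are a small fraction of the total. Below is an illustrative routing sketch in PyTorch with toy dimensions; it does not reflect Llama 4's actual architecture.

```python
# Toy mixture-of-experts layer: a router picks top_k experts per token,
# and only those experts run, so "active parameters" << total parameters.
import torch
import torch.nn as nn

class ToyMoELayer(nn.Module):
    def __init__(self, dim=64, n_experts=16, top_k=2):
        super().__init__()
        self.router = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_experts))
        self.top_k = top_k

    def forward(self, x):                        # x: (tokens, dim)
        gate = self.router(x).softmax(dim=-1)    # routing probabilities per token
        weights, idx = gate.topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):           # only the selected experts run
            for e in idx[:, slot].unique():
                sel = idx[:, slot] == e
                out[sel] += weights[sel, slot].unsqueeze(-1) * self.experts[int(e)](x[sel])
        return out

layer = ToyMoELayer()
print(layer(torch.randn(5, 64)).shape)  # torch.Size([5, 64])
```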

GPT-OSS (Released: 2025)

  • GPT-OSS (stylized as gpt-oss) is a set of open-weight reasoning models released by OpenAI on August 5, 2025, designed to be runnable locally.
  • Core Features: The gpt-oss models are OpenAI's first open-weight language models since GPT-2. A 'reasoning_effort' setting can be lowered for tasks that do not require complex inference or that need low-latency final output (see the sketch after this entry). There are two variants: a larger 117-billion-parameter model, gpt-oss-120b, and a smaller 21-billion-parameter model, gpt-oss-20b.
  • Use Cases: Agentic capabilities: Use the models' native capabilities for function calling, web browsing, Python code execution, and Structured Outputs.
  • License: Apache License 2.0
  • Official Website: https://openai.com/index/introducing-gpt-oss/
  • Hugging Face: https://huggingface.co/openai
  • GitHub: https://github.com/openai/gpt-oss
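
A minimal sketch of running gpt-oss-20b locally through the Hugging Face transformers pipeline (a recent transformers version with gpt-oss support and sufficient hardware are assumed). The model card describes requesting a reasoning effort level via the system message; the exact wording below is illustrative, not a guaranteed interface.

```python
# Sketch only: assumes gpt-oss-20b fits on your hardware and that reasoning effort
# can be requested through the system message as described in the model card.
from transformers import pipeline

generator = pipeline("text-generation", model="openai/gpt-oss-20b")

messages = [
    {"role": "system", "content": "Reasoning: low"},   # ask for low reasoning effort / low latency
    {"role": "user", "content": "Summarize the difference between open-weight and open-source models."},
]
print(generator(messages, max_new_tokens=200)[0]["generated_text"])
```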

Magistral Small (Released: 2025)

  • Open-weight reasoning model: its parameters (weights) are publicly released so users can run or fine-tune it locally, and it is specifically designed for reasoning and logical-inference tasks
  • Core Features: Part of the Magistral family of reasoning LLMs, which includes both open-weight and proprietary models. Open-weight: Magistral Small 1.2 (25.09). Proprietary: Magistral Medium 1.2 (25.09).
  • Use Cases: Code generation, RAG (Retrieval-Augmented Generation), Agentic workflows, Advanced reasoning, Knowledge extraction, AI at the edge, AI safety
  • License: Apache License 2.0
  • Official Website: https://docs.mistral.ai/getting-started/models/models_overview/
  • Hugging Face: https://huggingface.co/mistralai
  • GitHub: https://github.com/mistralai

Mistral Small (Released: 2025)

Voxtral Small / Mini (Released: 2025)

  • Mistral's first models with audio input capabilities for instruct use cases
  • Core Features: State-of-the-art speech understanding models available in two sizes: a 24B variant for production-scale applications and a 3B variant for local and edge deployments. Both versions are released under the Apache 2.0 license and are also available through Mistral's API.
  • Use Cases: Voice Communication AI
  • License: Apache License 2.0
  • Official Website: https://mistral.ai/news/voxtral
  • Hugging Face: https://huggingface.co/mistralai
  • GitHub: https://github.com/mistralai

Devstral Small (Released: 2025)

  • Agentic model for software engineering tasks
  • Core Features: Devstral is trained to solve real GitHub issues and runs on top of code-agent scaffolds such as OpenHands or SWE-Agent, which define the interface between the model and the test cases (a minimal scaffold loop is sketched after this entry). Mistral reports its performance on the SWE-Bench Verified benchmark, a dataset of 500 real-world GitHub issues that have been manually screened for correctness.
  • Use Cases: Contextualising code within a large codebase, identifying relationships between disparate components, and detecting subtle bugs in intricate functions
  • License: Apache License 2.0
  • Official Website: https://docs.mistral.ai/getting-started/models/models_overview/
  • Hugging Face: https://huggingface.co/mistralai
  • GitHub: https://github.com/mistralai
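
A minimal sketch of the scaffold idea described above: the loop alternates between running the project's tests and asking the model for a patch. It assumes Devstral is served behind an OpenAI-compatible endpoint (e.g., via vLLM or Ollama); the endpoint URL, model name, and plain-text patch handling are illustrative and far simpler than the real OpenHands or SWE-Agent interfaces.

```python
# Illustrative code-agent loop (not the OpenHands/SWE-Agent interface).
# Assumptions: an OpenAI-compatible server at localhost:8000 serving a model
# registered as "devstral-small"; patches are treated as plain text.
import subprocess
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

def run_tests() -> str:
    """Run the project's test suite and return its output as environment feedback."""
    result = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
    return result.stdout + result.stderr

history = [{"role": "system",
            "content": "You are a software engineering agent. Propose a patch, then revise it based on test output."}]
task = "Fix the failing test described in the issue."  # placeholder task description

for step in range(3):                      # bounded loop: run tests -> ask model -> (apply patch)
    feedback = run_tests()
    history.append({"role": "user", "content": f"Task: {task}\nTest output:\n{feedback}"})
    reply = client.chat.completions.create(model="devstral-small", messages=history)
    patch = reply.choices[0].message.content
    history.append({"role": "assistant", "content": patch})
    # A real scaffold would parse and apply the proposed patch here before re-running the tests.
```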

Pixtral 12B (Released: 2024)

Mistral Nemo 12B (Released: 2024)

LLaMA 3 / Llama 3.1 / Llama 3.2 / Llama 3.3 (Released: 2024)

Mixtral / Mistral / Codestral Mamba / Pixtral (Released: 2024)

Mistral 7B / Mixtral 8x7B (Released: 2023)

  • Mistral 7B: compact yet powerful 7B-parameter model released by Mistral AI; Mixtral 8x7B: its sparse mixture-of-experts successor
  • Core Features: Uses GQA (Grouped-Query Attention) and sliding window attention for a strong balance of speed and performance (see the sketch after this entry). * GQA (Grouped-Query Attention): groups of query heads share a single key/value head during attention, improving computational efficiency and reducing memory usage. * Sliding Window Attention: each token attends only to a fixed-size window of recent tokens when processing long sequences, reducing memory load while preserving contextual information.
  • Use Cases: NLP research, embeddings, on-device inference.
  • License: Apache License 2.0
  • Official Website: https://mistral.ai/
  • Hugging Face: https://huggingface.co/mistralai/Mistral-7B
  • GitHub: https://github.com/mistralai
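
A minimal PyTorch sketch of the two mechanisms named above: grouped-query attention (several query heads share one key/value head) combined with a sliding-window causal mask. Dimensions are toy values, not Mistral 7B's actual configuration.

```python
# Grouped-query attention with a sliding causal window (toy dimensions).
import torch
import torch.nn.functional as F

def grouped_query_attention(q, k, v, window: int):
    # q: (batch, n_q_heads, seq, dim); k, v: (batch, n_kv_heads, seq, dim)
    b, n_q, s, d = q.shape
    group = n_q // k.shape[1]                 # query heads sharing one key/value head
    k = k.repeat_interleave(group, dim=1)     # expand K/V so shapes line up per query head
    v = v.repeat_interleave(group, dim=1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5
    idx = torch.arange(s)
    # Sliding-window causal mask: token i attends only to tokens j with i - window < j <= i.
    mask = (idx[None, :] > idx[:, None]) | (idx[:, None] - idx[None, :] >= window)
    scores = scores.masked_fill(mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

q = torch.randn(1, 8, 16, 32)   # 8 query heads
k = torch.randn(1, 2, 16, 32)   # 2 shared key/value heads -> groups of 4
v = torch.randn(1, 2, 16, 32)
print(grouped_query_attention(q, k, v, window=4).shape)  # torch.Size([1, 8, 16, 32])
```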

Falcon-40B (Released: 2023)

LLaMA 2 (Released: 2023)

Vicuna (Released: 2023)

Alpaca (Released: 2023)

GPT-NeoX / GPT-NeoX-20B (Released: 2022)

BLOOM (176B) (Released: 2022)

GPT-J-6B (Released: 2021)

  • 6B parameter open model by EleutherAI implemented in JAX
  • Core Features: Lightweight, Apache 2.0, widely used in open research.
  • Use Cases: NLP research, prototypes, text generation.
  • License: Apache License 2.0
  • Hugging Face: https://huggingface.co/EleutherAI/gpt-j-6B

GPT-2 (Released: 2019)

  • Large autoregressive Transformer (up to 1.5B) with strong text-generation capabilities; staged release due to misuse concerns
  • Core Features: Developed as a large-scale language model for text generation, following GPT-1. The architecture is a decoder-only Transformer, with roughly 124 million to 1.5 billion parameters across the Small, Medium, Large, and XL versions, and it uses Byte Pair Encoding (BPE) for tokenization (see the sketch after this entry). After pre-training, the model can be fine-tuned for specific tasks.
  • Use Cases: Text generation, summarization, creative writing, fine-tuning research.
  • License: MIT for the original OpenAI release; weights widely available via Hugging Face (community), where license metadata can vary by distribution
  • Hugging Face: https://huggingface.co/openai-community/gpt2
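
A minimal sketch showing GPT-2's BPE tokenization and autoregressive text generation through the Hugging Face transformers library.

```python
# GPT-2: BPE tokenization and text generation via transformers.
from transformers import GPT2Tokenizer, pipeline

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
# BPE splits rare words into subword pieces; 'Ġ' marks a token that starts with a space.
print(tokenizer.tokenize("Open models accelerate reproducible research."))

generator = pipeline("text-generation", model="gpt2")
print(generator("Open models are useful for research because",
                max_new_tokens=40)[0]["generated_text"])
```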

RoBERTa (Released: 2019)

  • Robustly optimized BERT variant (more data, longer training, no NSP (Next Sentence Prediction) objective)
  • Core Features: Improved performance on many NLP benchmarks; used for classification, QA, and embedding tasks (see the sketch after this entry)
  • Use Cases: NLP, QA
  • License: Code & weights available (e.g., Hugging Face); Apache-style/OSS for code
  • Hugging Face: https://huggingface.co/roberta-base
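
A minimal sketch of using roberta-base for masked-token prediction and for extracting contextual embeddings via Hugging Face transformers; mean-pooling the token vectors into a sentence embedding is a common convention, not something prescribed by the RoBERTa paper.

```python
# roberta-base: fill-mask prediction and mean-pooled sentence embeddings.
import torch
from transformers import pipeline, AutoTokenizer, AutoModel

fill = pipeline("fill-mask", model="roberta-base")
print(fill("Open models are useful for <mask> and research.")[0])  # top prediction for the mask

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModel.from_pretrained("roberta-base")
inputs = tokenizer("Open models are useful for education.", return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state   # contextual embedding per token
sentence_embedding = hidden.mean(dim=1)          # simple mean-pooled sentence vector
print(sentence_embedding.shape)                  # torch.Size([1, 768])
```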

T5 (Text-to-Text Transfer Transformer) (Released: 2019)

  • Unified text-to-text formulation; released pretrained checkpoints and the Colossal Clean Crawled Corpus (C4)
  • Core Features: Treats every NLP task as text-to-text; widely used as a research baseline for summarization, translation, and many other tasks (see the sketch after this entry). * The Colossal Clean Crawled Corpus (C4) is a text dataset created by cleaning large-scale web data; it is primarily used for pretraining natural language processing (NLP) models.
  • Use Cases: Summarization, translation
  • License: Apache License 2.0 (code and released checkpoints)
  • Official Website: https://research.google/blog/exploring-transfer-learning-with-t5-the-text-to-text-transfer-transformer/
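
A minimal sketch of the text-to-text formulation using the public t5-small checkpoint via Hugging Face transformers: every task is phrased as "prefix: input text" and the model answers with text.

```python
# T5 text-to-text: the task prefix selects the behavior, output is always text.
from transformers import pipeline

t5 = pipeline("text2text-generation", model="t5-small")

print(t5("translate English to German: The house is wonderful.")[0]["generated_text"])
print(t5("summarize: The Transformer architecture replaced recurrent networks with "
         "self-attention, enabling far more parallel training.")[0]["generated_text"])
```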

BERT (Bidirectional Encoder Representations from Transformers) (Released: 2018)

GPT-1 (OpenAI GPT) (Released: 2018)

ELMo (Deep Contextualized Word Representations) (Released: 2018)

  • Contextualized word embeddings via a bidirectional LSTM; token representations vary dynamically with context
  • Core Features: Provides contextual embeddings for NER (Named Entity Recognition), tagging, and classification tasks; widely used before Transformer-based models dominated (see the sketch after this entry).
  • License: Research code / checkpoints available (ACL paper & repos)
  • Official Website: https://aclanthology.org/N18-1202/
  • Hugging Face: https://huggingface.co/allenai/bidaf-elmo
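
A conceptual sketch of contextualized embeddings with a bidirectional LSTM, in the spirit of ELMo but not its actual implementation or weights: the same word receives a different vector depending on the sentence it appears in.

```python
# Toy BiLSTM contextual embeddings (random weights, tiny vocabulary): the vector
# for "bank" depends on its surrounding words, unlike a static word embedding.
import torch
import torch.nn as nn

vocab = {"the": 0, "bank": 1, "river": 2, "loan": 3, "approved": 4, "flooded": 5}
embed = nn.Embedding(len(vocab), 16)
bilstm = nn.LSTM(16, 16, bidirectional=True, batch_first=True)

def contextual_vectors(words):
    ids = torch.tensor([[vocab[w] for w in words]])
    out, _ = bilstm(embed(ids))          # (1, seq_len, 32): forward and backward states
    return out[0]

a = contextual_vectors(["the", "bank", "approved", "the", "loan"])
b = contextual_vectors(["the", "river", "bank", "flooded"])
# "bank" is token 1 in sentence a and token 2 in sentence b; its vectors differ by context.
print(torch.cosine_similarity(a[1], b[2], dim=0).item())
```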

Transformer ("Attention Is All You Need") (Released: 2017)

  • Introduced the Transformer architecture based on self-attention, replacing RNNs/Conv in sequence models
  • Core Features: Foundation for nearly all modern LLMs and sequence models; originally demonstrated on machine translation and now the standard backbone for pretraining research (see the sketch after this entry)
  • License: N/A (research paper)
  • Official Website: https://arxiv.org/abs/1706.03762
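
A minimal NumPy sketch of the scaled dot-product attention at the core of the Transformer, softmax(QK^T / sqrt(d_k)) V, with toy dimensions.

```python
# Scaled dot-product attention: each query attends to all keys, producing a
# weighted sum of value vectors.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                     # query-key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # softmax over the keys
    return weights @ V                                  # weighted sum of value vectors

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)      # (4, 8)
```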