Open Models
We hope this overview of recently released AI models is useful for education and research purposes.

LLaMA 4 (Released: 2025)
- Meta's 2025 LLaMA 4 family (Scout / Maverick / Behemoth): natively multimodal models with Mixture-of-Experts architectures, released by Meta as open-weight models.
- Core Features: Llama 4 Scout: natively multimodal model with strong text and visual understanding, designed to run on a single H100 GPU and offering a 10M-token context window for long-document analysis. Llama 4 Maverick: natively multimodal model for image and text understanding, aimed at high quality with fast, low-cost responses. Llama 4 Behemoth: a 288-billion-active-parameter model with 16 experts, described by Meta as its most powerful model to date.
- Use Cases: Multimodal tasks (text, image)
- License: Source-available (Llama 4 Community License Agreement and Llama 4 Acceptable Use Policy)
- Official Website: https://ai.meta.com/blog/llama-4-multimodal-intelligence/
- Hugging Face: https://huggingface.co/meta-llama
- GitHub: https://github.com/meta-llama/llama-models/tree/main/models/llama4
GPT-OSS (Released: 2025)
- GPT-OSS (stylized as gpt-oss) is a set of open-weight reasoning models released by OpenAI on August 5, 2025, designed to be runnable locally.
- Core Features: The gpt-oss models are OpenAI's first open-weight language models since GPT‑2. The reasoning effort (reasoning_effort) can be lowered for tasks that do not require complex inference or that need low-latency output. There are two variants: a larger 117-billion-parameter model called gpt-oss-120b and a smaller 21-billion-parameter model called gpt-oss-20b.
- Use Cases: Agentic workflows using the models' native support for function calling, web browsing, Python code execution, and Structured Outputs (a hedged usage sketch follows this entry).
- License: Apache License 2.0
- Official Website: https://openai.com/index/introducing-gpt-oss/
- Hugging Face: https://huggingface.co/openai
- GitHub: https://github.com/openai/gpt-oss
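A minimal local-inference sketch, assuming the Hugging Face transformers chat pipeline supports the openai/gpt-oss-20b checkpoint and that reasoning effort is requested via the system prompt as described in the model card; the exact prompt convention and output format may differ by release.

```python
# Hedged sketch: run gpt-oss-20b locally with Hugging Face transformers.
# Assumptions: the chat-style text-generation pipeline accepts this checkpoint,
# and reasoning effort is controlled through the system prompt.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="openai/gpt-oss-20b",  # the smaller 21B-parameter variant
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "system", "content": "Reasoning: low"},  # low effort for a simple task
    {"role": "user", "content": "List three permitted uses under the Apache License 2.0."},
]

output = generator(messages, max_new_tokens=200)
# Recent transformers versions return the full chat; the last turn is the reply.
print(output[0]["generated_text"][-1]["content"])
```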
Magistral Small (Released: 2025)
- Open-weight reasoning model: its parameters (weights) are publicly released so users can run or fine-tune it locally, and it is specifically designed for reasoning and logical inference tasks.
- Core Features: The Magistral family includes both open-weight and proprietary variants. Open model: Magistral Small 1.2 (25.09); proprietary model: Magistral Medium 1.2 (25.09).
- Use Cases: Code generation, RAG (Retrieval-Augmented Generation), Agentic workflows, Advanced reasoning, Knowledge extraction, AI at the edge, AI safety
- License: Apache License 2.0
- Official Website: https://docs.mistral.ai/getting-started/models/models_overview/
- Hugging Face: https://huggingface.co/mistralai
- GitHub: https://github.com/mistralai
Mistral Small (Released: 2025)
- Multilingual open source model
- Core Features: The model is designed for global, multilingual applications. It is trained on function calling, has a large context window, and is particularly strong in English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi (a hedged function-calling sketch follows this entry).
- Use Cases: Code generation
- License: Apache License 2.0
- Official Website: https://docs.mistral.ai/getting-started/models/models_overview/
- Hugging Face: https://huggingface.co/mistralai
- GitHub: https://github.com/mistralai
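Since the entry above highlights function-calling training, here is a hedged sketch using Mistral's official Python client (mistralai v1.x); the get_weather tool is hypothetical and the mistral-small-latest alias is illustrative, so check the current documentation.

```python
# Hedged sketch: function calling with Mistral Small via the mistralai client.
# The get_weather tool is hypothetical; model aliases may change over time.
import os
from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool for illustration
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.complete(
    model="mistral-small-latest",
    messages=[{"role": "user", "content": "What's the weather in Seoul?"}],
    tools=tools,
)

# If the model decides to call the tool, the call appears here.
print(response.choices[0].message.tool_calls)
```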
Voxtral Small / Mini (Released: 2025)
- Mistral's first models with audio input capabilities for instruct use cases
- Core Features: These state‑of‑the‑art speech understanding models are available in two sizes: a 24B variant for production-scale applications and a 3B variant for local and edge deployments. Both versions are released under the Apache 2.0 license and are also available through Mistral's API.
- Use Cases: Voice Communication AI
- License: Apache License 2.0
- Official Website: https://mistral.ai/news/voxtral
- Hugging Face: https://huggingface.co/mistralai
- GitHub: https://github.com/mistralai
Devstral Small (Released: 2025)
- Agentic model for software engineering tasks
- Core Features: Devstral is designed for agentic software engineering. It is trained to solve real GitHub issues and runs over code agent scaffolds such as OpenHands or SWE-Agent, which define the interface between the model and the test cases. Mistral reports its performance on the popular SWE-Bench Verified benchmark, a dataset of 500 real-world GitHub issues that have been manually screened for correctness.
- Use Cases: Contextualising code within a large codebase, identifying relationships between disparate components, and identifying subtle bugs in intricate functions
- License: Apache License 2.0
- Official Website: https://docs.mistral.ai/getting-started/models/models_overview/
- Hugging Face: https://huggingface.co/mistralai
- GitHub: https://github.com/mistralai
Pixtral 12B (Released: 2024)
- Image understanding capabilities in addition to text
- Core Features: Natively multimodal, trained on interleaved image and text data: the model is designed from the ground up to understand and process images and text together, and alternating (interleaving) the two modalities during training strengthens its multimodal capabilities (a hedged usage sketch follows this entry).
- Use Cases: Multimodal tasks (text, image)
- License: Apache License 2.0
- Official Website: https://docs.mistral.ai/getting-started/models/models_overview/
- Hugging Face: https://huggingface.co/mistralai
- GitHub: https://github.com/mistralai
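A hedged sketch of image-and-text chat with Pixtral through the mistralai client; the content-chunk format and the pixtral-12b-2409 alias follow Mistral's vision documentation as best recalled and may differ across client versions, so verify against the current docs.

```python
# Hedged sketch: image + text prompt to Pixtral via the mistralai client.
# The content-chunk format ("text" / "image_url") may vary by client version.
import os
from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

response = client.chat.complete(
    model="pixtral-12b-2409",  # alias as of the 2024 release; verify in the docs
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this chart in one sentence."},
            {"type": "image_url", "image_url": "https://example.com/chart.png"},  # placeholder URL
        ],
    }],
)
print(response.choices[0].message.content)
```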
Mistral Nemo 12B (Released: 2024)
- Multilingual open source model
- Core Features: The model is designed for global, multilingual applications. It is trained on function calling, has a large context window, and is particularly strong in English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi.
- Use Cases: Multilingual applications, function calling
- License: Apache License 2.0
- Official Website: https://docs.mistral.ai/getting-started/models/models_overview/
- Hugging Face: https://huggingface.co/mistralai
- GitHub: https://github.com/mistralai
LLaMA 3 / Llama 3.1 / Llama 3.2 / Llama 3.3 (Released: 2024)
- The models have been pre-trained on approximately 15 trillion tokens of text gathered from “publicly available sources” with the instruct models fine-tuned on “publicly available instruction datasets, as well as over 10M human-annotated examples".
- Core Features: Optimized architecture, various sizes, community license (restricted commercial use).
- Use Cases: Research, assistants, fine-tuning.
- License: Meta Llama Community License
- Official Website: https://www.llama.com/models/llama-3/
- Hugging Face: https://huggingface.co/meta-llama
- GitHub: https://github.com/meta-llama/llama-models/tree/main/models/llama3 https://github.com/meta-llama/llama-models/tree/main/models/llama3_1 https://github.com/meta-llama/llama-models/tree/main/models/llama3_2 https://github.com/meta-llama/llama-models/tree/main/models/llama3_3
Mixtral / Mistral / Codestral Mamba / Pixtral (Released: 2024)
- Family of high-performance language models from Mistral AI, spanning Mixture-of-Experts (MoE) designs (Mixtral), dense models (Mistral), a Mamba-architecture code model (Codestral Mamba), and a multimodal model (Pixtral)
- Core Features: Efficient MoE design (Mixtral), fast inference, multiple licensing schemes (a minimal routing sketch follows this entry).
- Use Cases: Chatbots, text generation, server deployment.
- License: Apache License 2.0 / Mistral custom license
- Official Website: https://mistral.ai/
- Hugging Face: https://huggingface.co/mistralai
- GitHub: https://github.com/mistralai
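To make the MoE idea concrete, the sketch below shows top-2 expert routing in plain PyTorch: a router scores the experts for each token, only the two best experts run, and their outputs are mixed by softmax weights. It illustrates the general technique, not Mixtral's actual implementation; all names and sizes are illustrative.

```python
# Minimal top-2 Mixture-of-Experts routing sketch (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoE(nn.Module):
    def __init__(self, dim: int, num_experts: int = 8, hidden: int = 2048):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, dim)
        scores = self.router(x)                     # (tokens, num_experts)
        weights, idx = scores.topk(2, dim=-1)       # pick the 2 best experts per token
        weights = F.softmax(weights, dim=-1)        # mixing weights over the chosen pair
        out = torch.zeros_like(x)
        for slot in range(2):                       # simple loop for clarity, not speed
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e            # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
        return out

moe = Top2MoE(dim=512)
tokens = torch.randn(4, 512)
print(moe(tokens).shape)  # torch.Size([4, 512])
```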
Mistral 7B / Mixtral 8x7B (Released: 2023)
- Compact yet powerful 7B-parameter model released by Mistral AI, later followed by the sparse Mixture-of-Experts variant Mixtral 8x7B
- Core Features: Uses GQA (Grouped-Query Attention) and sliding-window attention for a strong balance of speed and performance (see the sketch after this entry). GQA groups query heads so they share a smaller set of key/value heads, improving computational efficiency and reducing memory usage. Sliding-window attention restricts each token's attention to a fixed window of recent positions, reducing memory load on long sequences while preserving local context.
- Use Cases: NLP research, embeddings, on-device inference.
- License: Apache License 2.0
- Official Website: https://mistral.ai/
- Hugging Face: https://huggingface.co/mistralai/Mistral-7B
- GitHub: https://github.com/mistralai
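The sketch below shows both ideas in plain PyTorch: key/value heads are repeated so that groups of query heads share them (GQA), and a mask limits each position to the last W tokens (sliding window). It illustrates the concepts, not Mistral's actual kernels; shapes and the window size are illustrative.

```python
# Illustrative sketch (not Mistral's implementation) of grouped-query
# attention plus a sliding-window causal mask.
import torch
import torch.nn.functional as F

def gqa_sliding_attention(q, k, v, window: int):
    # q: (B, Hq, T, D); k, v: (B, Hkv, T, D) with Hq a multiple of Hkv
    B, Hq, T, D = q.shape
    Hkv = k.shape[1]
    k = k.repeat_interleave(Hq // Hkv, dim=1)  # share each K/V head across a query group
    v = v.repeat_interleave(Hq // Hkv, dim=1)

    scores = q @ k.transpose(-2, -1) / D ** 0.5          # (B, Hq, T, T)

    # Causal mask restricted to a sliding window of `window` past tokens.
    i = torch.arange(T).unsqueeze(1)
    j = torch.arange(T).unsqueeze(0)
    allowed = (j <= i) & (j > i - window)
    scores = scores.masked_fill(~allowed, float("-inf"))

    return F.softmax(scores, dim=-1) @ v                 # (B, Hq, T, D)

q = torch.randn(1, 8, 16, 32)   # 8 query heads
k = torch.randn(1, 2, 16, 32)   # 2 shared key/value heads
v = torch.randn(1, 2, 16, 32)
print(gqa_sliding_attention(q, k, v, window=4).shape)  # torch.Size([1, 8, 16, 32])
```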
Falcon-40B (Released: 2023)
- Autoregressive 40B model from UAE’s Technology Innovation Institute (TII)
- Core Features: Optimized with FlashAttention, high performance-to-cost ratio
- Use Cases: Summarization, chatbots, generation.
- License: Apache License 2.0
- Official Website: https://falconllm.tii.ae/falcon-40b.html
- Hugging Face: https://huggingface.co/tiiuae/falcon-40b
- GitHub: https://github.com/Decentralised-AI/falcon-40b
LLaMA 2 (Released: 2023)
- Second generation of Meta’s LLaMA models (7B–70B)
- Core Features: Improved quality, multiple sizes, community license with restrictions
- Use Cases: Research, fine-tuning, integration.
- License: Llama 2 Community License
- Official Website: https://www.llama.com/llama2/
- Hugging Face: https://huggingface.co/meta-llama/Llama-2
- GitHub: https://github.com/meta-llama/llama-models/tree/main/models/llama2
Vicuna (Released: 2023)
- Community fine-tuned instruction model based on LLaMA
- Core Features: Excellent conversational quality; inherits LLaMA’s license restrictions
- Use Cases: Chatbots, demos, academic use.
- License: Follows LLaMA license
- Official Website: https://arxiv.org/abs/2306.05685
- Hugging Face: https://huggingface.co/lmsys/vicuna-7b
- GitHub: https://github.com/lm-sys/FastChat
Alpaca (Released: 2023)
- Instruction-tuned version of LLaMA 7B by Stanford
- Core Features: Low-cost instruction tuning, research and education oriented (a data-loading sketch follows this entry)
- Use Cases: Teaching, experiments, research.
- License: Follows LLaMA license
- Official Website: https://crfm.stanford.edu/2023/03/13/alpaca.html
- Hugging Face: https://huggingface.co/datasets/tatsu-lab/alpaca
- GitHub: https://github.com/tatsu-lab/stanford_alpaca
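To show the instruction-tuning data format (instruction / input / output triples), a short sketch loading the released dataset from the Hugging Face hub linked above:

```python
# Sketch: inspect the Alpaca instruction-tuning dataset from the Hugging Face hub.
from datasets import load_dataset

alpaca = load_dataset("tatsu-lab/alpaca", split="train")
print(len(alpaca))             # roughly 52k examples
example = alpaca[0]
print(example["instruction"])  # the task description
print(example["input"])        # optional context; empty for instruction-only examples
print(example["output"])       # the target response
```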
GPT-NeoX / GPT-NeoX-20B (Released: 2022)
- Community-built large models using NeoX training code; 20B variant widely referenced
- Core Features: GPT-3-like design, transparent and open research (a loading sketch follows this entry)
- Use Cases: Language modeling, fine-tuning, experimentation.
- License: Apache License 2.0
- Official Website: https://arxiv.org/abs/2204.06745
- Hugging Face: https://huggingface.co/EleutherAI/gpt-neox-20b
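A hedged loading sketch with transformers; note that the 20B checkpoint needs on the order of 40+ GB of memory in half precision, so this is illustrative rather than something to run casually.

```python
# Hedged sketch: load and sample from GPT-NeoX-20B (large; ~40+ GB in fp16).
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-neox-20b", torch_dtype="auto", device_map="auto"
)

inputs = tokenizer("Open research on large language models", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```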
BLOOM (176B) (Released: 2022)
- Multilingual open LLM by BigScience project (176B parameters)
- Core Features: Transparent training, multilingual (46+ languages), responsible AI license
- Use Cases: Research, multilingual NLP, education.
- License: BigScience RAIL License
- Official Website: https://arxiv.org/abs/2211.05100
- Hugging Face: https://huggingface.co/bigscience/bloom
GPT-J-6B (Released: 2021)
- 6B parameter open model by EleutherAI implemented in JAX
- Core Features: Lightweight, Apache 2.0, widely used in open research (see the generation sketch after this entry).
- Use Cases: NLP research, prototypes, text generation.
- License: Apache License 2.0
- Hugging Face: https://huggingface.co/EleutherAI/gpt-j-6B
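A short generation sketch with the checkpoint linked above, using the transformers pipeline (the 6B weights still need a sizeable GPU or plenty of RAM):

```python
# Sketch: text generation with GPT-J-6B via the transformers pipeline.
from transformers import pipeline

generate = pipeline(
    "text-generation",
    model="EleutherAI/gpt-j-6B",
    torch_dtype="auto",
    device_map="auto",
)
print(generate("The key idea behind open language models is", max_new_tokens=40)[0]["generated_text"])
```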
GPT-2 (Released: 2019)
- Large autoregressive Transformer (up to 1.5B) with strong text-generation capabilities; staged release due to misuse concerns
- Core Features: Developed as a large-scale language model for text generation, following GPT-1. The architecture is Transformer decoder-only, with roughly 124 million to 1.5 billion parameters (Small, Medium, Large, XL versions), and uses Byte Pair Encoding (BPE) for tokenization. After pre-training, the model can be fine-tuned for specific tasks (a tokenization and generation sketch follows this entry).
- Use Cases: Text generation, summarization, creative writing, fine-tuning research.
- License: Modified MIT License for the original openai/gpt-2 code and weights; widely mirrored via Hugging Face, where licensing may vary by distribution
- Hugging Face: https://huggingface.co/openai-community/gpt2
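Since the entry mentions BPE tokenization and fine-tunable checkpoints, a small sketch showing the tokenizer output and basic sampling with the smallest (124M) checkpoint:

```python
# Sketch: GPT-2 BPE tokenization and sampling with the 124M checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")
model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")

# BPE splits rare words into subword units.
print(tokenizer.tokenize("Byte Pair Encoding splits rare words into subwords."))

inputs = tokenizer("Open models enable", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```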
RoBERTa (Released: 2019)
- Robustly optimized BERT variant (more data, longer training, no NSP (Next Sentence Prediction) objective)
- Core Features: Improved performance on many NLP benchmarks; used for classification, QA, and embedding tasks (a masked-prediction sketch follows this entry)
- Use Cases: NLP, QA
- License: Code and weights openly available (e.g., via Hugging Face); the fairseq reference implementation is MIT-licensed
- Hugging Face: https://huggingface.co/roberta-base
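A quick masked-prediction sketch with the base checkpoint (RoBERTa uses <mask> rather than BERT's [MASK]):

```python
# Sketch: masked-token prediction with roberta-base (note the <mask> token).
from transformers import pipeline

fill = pipeline("fill-mask", model="roberta-base")
for candidate in fill("RoBERTa drops the next sentence prediction <mask> during pretraining."):
    print(candidate["token_str"], round(candidate["score"], 3))
```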
T5 (Text-to-Text Transfer Transformer) (Released: 2019)
- Unified text-to-text formulation; released pretrained checkpoints and the Colossal Clean Crawled Corpus (C4)
- Core Features: Summarization, translation, and many other text-to-text tasks; widely used as a research baseline (see the sketch after this entry). The Colossal Clean Crawled Corpus (C4) is a text dataset created by cleaning large-scale web data, used primarily for pretraining natural language processing (NLP) models.
- Use Cases: Summarization, translation
- License: Apache License 2.0 (code and released checkpoints)
- Official Website: https://research.google/blog/exploring-transfer-learning-with-t5-the-text-to-text-transfer-transformer/
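The text-to-text formulation treats every task as string in, string out, selected by a task prefix; a small sketch with the t5-small checkpoint:

```python
# Sketch: T5's text-to-text interface using a task prefix from the paper.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

inputs = tokenizer(
    "translate English to German: The weather is nice today.", return_tensors="pt"
)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```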
BERT (Bidirectional Encoder Representations from Transformers) (Released: 2018)
- Bidirectional encoder Transformer; masked language modeling pretraining for contextual embeddings
- Core Features: QA, classification, NER (Named Entity Recognition), sentence embeddings, and many other downstream NLP (Natural Language Processing) tasks (see the sketch after this entry).
- Use Cases: QA, classification, NER, general NLP
- License: Apache-2.0 (code & checkpoints available via TensorFlow / Hugging Face)
- Official Website: https://research.google/blog/open-sourcing-bert-state-of-the-art-pre-training-for-natural-language-processing/
- Hugging Face: https://huggingface.co/google-bert/bert-base-uncased
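A masked-language-modeling sketch with the uncased base checkpoint linked above:

```python
# Sketch: BERT masked language modeling with bert-base-uncased.
from transformers import pipeline

fill = pipeline("fill-mask", model="google-bert/bert-base-uncased")
for candidate in fill("BERT produces contextual [MASK] for each input token."):
    print(candidate["token_str"], round(candidate["score"], 3))
```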
GPT-1 (OpenAI GPT) (Released: 2018)
- Early causal-transformer language model demonstrating generative pretraining + fine-tuning paradigm
- Core Features: Foundation experiments for generative pretraining; research and fine-tuning experiments.
- License: Model weights/code: community repositories; research paper available
- Hugging Face: https://huggingface.co/openai https://huggingface.co/openai-community/openai-gpt
ELMo (Deep Contextualized Word Representations) (Released: 2018)
- Contextualized word embeddings via Bi-LSTM; dynamic token representations by context
- Core Features: Used as contextual embeddings for NER (Named Entity Recognition), tagging, and classification tasks before Transformers dominated (a hedged embedding sketch follows this entry).
- License: Research code / checkpoints available (ACL paper & repos)
- Official Website: https://aclanthology.org/N18-1202/
- Hugging Face: https://huggingface.co/allenai/bidaf-elmo
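A hedged sketch of contextual embeddings with the AllenNLP ELMo module; allennlp is an older, now-archived library, and the option/weight URLs below are the original AllenNLP releases, so they may need updating.

```python
# Hedged sketch: contextual ELMo embeddings via AllenNLP (older library;
# the option/weight URLs are the original releases and may have moved).
from allennlp.modules.elmo import Elmo, batch_to_ids

options = "https://allennlp.s3.amazonaws.com/elmo/2x4096_512_2048cnn_2xhighway/elmo_2x4096_512_2048cnn_2xhighway_options.json"
weights = "https://allennlp.s3.amazonaws.com/elmo/2x4096_512_2048cnn_2xhighway/elmo_2x4096_512_2048cnn_2xhighway_weights.hdf5"

elmo = Elmo(options, weights, num_output_representations=1, dropout=0.0)

# The same word ("bank") receives different vectors depending on context.
sentences = [["The", "bank", "was", "closed"], ["We", "sat", "on", "the", "river", "bank"]]
character_ids = batch_to_ids(sentences)
embeddings = elmo(character_ids)["elmo_representations"][0]
print(embeddings.shape)  # (batch, max_sentence_length, 1024)
```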
Transformer ("Attention Is All You Need") (Released: 2017)
- Introduced the Transformer architecture based on self-attention, replacing RNNs/Conv in sequence models
- Core Features: Foundation for nearly all modern LLMs and sequence models; translation, pretraining backbones (research). The core scaled dot-product attention operation is sketched after this entry.
- License: N/A (research paper)
- Official Website: https://arxiv.org/abs/1706.03762
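For reference, the paper's central operation, scaled dot-product attention, written out in a few lines of PyTorch:

```python
# Scaled dot-product attention as defined in "Attention Is All You Need":
# Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
import torch
import torch.nn.functional as F

def attention(q, k, v):
    d_k = q.shape[-1]
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5
    return F.softmax(scores, dim=-1) @ v

q = torch.randn(2, 5, 64)   # (batch, query positions, d_k)
k = torch.randn(2, 7, 64)   # (batch, key positions, d_k)
v = torch.randn(2, 7, 64)   # values share the key positions
print(attention(q, k, v).shape)  # torch.Size([2, 5, 64])
```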