Open Models

We hope this overview of recently released AI models is useful for education and research.

LLaMA 4 (Released: 2025)

  • Meta's 2025 Llama 4 family (Scout / Maverick / Behemoth): natively multimodal models with advanced capabilities, which Meta announced as open source.
  • Core Features: Llama 4 Scout: natively multimodal model offering strong text and visual understanding, efficient operation on a single H100 GPU, and a 10M-token context window for long-document analysis. Llama 4 Maverick: natively multimodal model for image and text understanding, designed for high intelligence with fast, low-cost responses. Llama 4 Behemoth: a mixture-of-experts model with 288 billion active parameters and 16 experts, which Meta describes as its most powerful model to date (a minimal mixture-of-experts routing sketch follows this entry).
  • Use Cases: Multimodal tasks combining text and image understanding; long-context document analysis
  • License: Source-available (Llama 4 Community License Agreement and Llama 4 Acceptable Use Policy)
  • Official Website: https://ai.meta.com/blog/llama-4-multimodal-intelligence/
  • Hugging Face: https://huggingface.co/meta-llama
  • GitHub: https://github.com/meta-llama/llama-models/tree/main/models/llama4
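
The Behemoth description above refers to "active parameters" and "experts": in a mixture-of-experts (MoE) layer, a router sends each token to only a few experts, so the parameters actually used per token are a small fraction of the total. Below is an illustrative routing sketch in PyTorch with toy dimensions; it does not reflect Llama 4's actual architecture.

```python
# Toy mixture-of-experts layer: a router picks top_k experts per token,
# and only those experts run, so "active parameters" << total parameters.
import torch
import torch.nn as nn

class ToyMoELayer(nn.Module):
    def __init__(self, dim=64, n_experts=16, top_k=2):
        super().__init__()
        self.router = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_experts))
        self.top_k = top_k

    def forward(self, x):                        # x: (tokens, dim)
        gate = self.router(x).softmax(dim=-1)    # routing probabilities per token
        weights, idx = gate.topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):           # only the selected experts run
            for e in idx[:, slot].unique():
                sel = idx[:, slot] == e
                out[sel] += weights[sel, slot].unsqueeze(-1) * self.experts[int(e)](x[sel])
        return out

layer = ToyMoELayer()
print(layer(torch.randn(5, 64)).shape)  # torch.Size([5, 64])
```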

GPT-OSS (Released: 2025)

  • GPT-OSS (stylized as gpt-oss) is a set of open-weight reasoning models released by OpenAI on August 5, 2025, designed to be runnable locally.
  • Core Features: The gpt-oss models are OpenAI's first open-weight language models since GPT-2. A 'reasoning_effort' setting can be lowered for tasks that do not require complex inference or that need low-latency final output (see the sketch after this entry). There are two variants: a larger 117-billion-parameter model, gpt-oss-120b, and a smaller 21-billion-parameter model, gpt-oss-20b.
  • Use Cases: Agentic capabilities: Use the models' native capabilities for function calling, web browsing, Python code execution, and Structured Outputs.
  • License: Apache License 2.0
  • Official Website: https://openai.com/index/introducing-gpt-oss/
  • Hugging Face: https://huggingface.co/openai
  • GitHub: https://github.com/openai/gpt-oss
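
A minimal sketch of running gpt-oss-20b locally through the Hugging Face transformers pipeline (a recent transformers version with gpt-oss support and sufficient hardware are assumed). The model card describes requesting a reasoning effort level via the system message; the exact wording below is illustrative, not a guaranteed interface.

```python
# Sketch only: assumes gpt-oss-20b fits on your hardware and that reasoning effort
# can be requested through the system message as described in the model card.
from transformers import pipeline

generator = pipeline("text-generation", model="openai/gpt-oss-20b")

messages = [
    {"role": "system", "content": "Reasoning: low"},   # ask for low reasoning effort / low latency
    {"role": "user", "content": "Summarize the difference between open-weight and open-source models."},
]
print(generator(messages, max_new_tokens=200)[0]["generated_text"])
```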

Magistral Small (Released: 2025)

  • Open-weight reasoning model: its parameters (weights) are publicly released so users can run or fine-tune it locally, and it is specifically designed for reasoning and logical-inference tasks
  • Core Features: Part of the Magistral family of reasoning LLMs, which includes both open-weight and proprietary models. Open-weight: Magistral Small 1.2 (25.09). Proprietary: Magistral Medium 1.2 (25.09).
  • Use Cases: Code generation, RAG (Retrieval-Augmented Generation), Agentic workflows, Advanced reasoning, Knowledge extraction, AI at the edge, AI safety
  • License: Apache License 2.0
  • Official Website: https://docs.mistral.ai/getting-started/models/models_overview/
  • Hugging Face: https://huggingface.co/mistralai
  • GitHub: https://github.com/mistralai

Mistral Small (Released: 2025)

Voxtral Small / Mini (Released: 2025)

  • Mistral's first models with audio input capabilities for instruct use cases
  • Core Features: State-of-the-art speech understanding models available in two sizes: a 24B variant for production-scale applications and a 3B variant for local and edge deployments. Both versions are released under the Apache 2.0 license and are also available through Mistral's API.
  • Use Cases: Voice Communication AI
  • License: Apache License 2.0
  • Official Website: https://mistral.ai/news/voxtral
  • Hugging Face: https://huggingface.co/mistralai
  • GitHub: https://github.com/mistralai

Devstral Small (Released: 2025)

  • Agentic model for software engineering tasks
  • Core Features: Devstral is trained to solve real GitHub issues and runs on top of code-agent scaffolds such as OpenHands or SWE-Agent, which define the interface between the model and the test cases (a minimal scaffold loop is sketched after this entry). Mistral reports its performance on the SWE-Bench Verified benchmark, a dataset of 500 real-world GitHub issues that have been manually screened for correctness.
  • Use Cases: Contextualising code within a large codebase, identifying relationships between disparate components, and detecting subtle bugs in intricate functions
  • License: Apache License 2.0
  • Official Website: https://docs.mistral.ai/getting-started/models/models_overview/
  • Hugging Face: https://huggingface.co/mistralai
  • GitHub: https://github.com/mistralai
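
A minimal sketch of the scaffold idea described above: the loop alternates between running the project's tests and asking the model for a patch. It assumes Devstral is served behind an OpenAI-compatible endpoint (e.g., via vLLM or Ollama); the endpoint URL, model name, and plain-text patch handling are illustrative and far simpler than the real OpenHands or SWE-Agent interfaces.

```python
# Illustrative code-agent loop (not the OpenHands/SWE-Agent interface).
# Assumptions: an OpenAI-compatible server at localhost:8000 serving a model
# registered as "devstral-small"; patches are treated as plain text.
import subprocess
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

def run_tests() -> str:
    """Run the project's test suite and return its output as environment feedback."""
    result = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
    return result.stdout + result.stderr

history = [{"role": "system",
            "content": "You are a software engineering agent. Propose a patch, then revise it based on test output."}]
task = "Fix the failing test described in the issue."  # placeholder task description

for step in range(3):                      # bounded loop: run tests -> ask model -> (apply patch)
    feedback = run_tests()
    history.append({"role": "user", "content": f"Task: {task}\nTest output:\n{feedback}"})
    reply = client.chat.completions.create(model="devstral-small", messages=history)
    patch = reply.choices[0].message.content
    history.append({"role": "assistant", "content": patch})
    # A real scaffold would parse and apply the proposed patch here before re-running the tests.
```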

Pixtral 12B (Released: 2024)

Mistral Nemo 12B (Released: 2024)

LLaMA 3 / Llama 3.1 / Llama 3.2 / Llama 3.3 (Released: 2024)

Mixtral / Mistral / Codestral Mamba / Pixtral (Released: 2024)

Mistral 7B / Mixtral 8x7B (Released: 2023)

  • Mistral 7B: compact yet powerful 7B-parameter model released by Mistral AI; Mixtral 8x7B: its sparse mixture-of-experts successor
  • Core Features: Uses GQA (Grouped-Query Attention) and sliding window attention for a strong balance of speed and performance (see the sketch after this entry). * GQA (Grouped-Query Attention): groups of query heads share a single key/value head during attention, improving computational efficiency and reducing memory usage. * Sliding Window Attention: each token attends only to a fixed-size window of recent tokens when processing long sequences, reducing memory load while preserving contextual information.
  • Use Cases: NLP research, embeddings, on-device inference.
  • License: Apache License 2.0
  • Official Website: https://mistral.ai/
  • Hugging Face: https://huggingface.co/mistralai/Mistral-7B
  • GitHub: https://github.com/mistralai
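
A minimal PyTorch sketch of the two mechanisms named above: grouped-query attention (several query heads share one key/value head) combined with a sliding-window causal mask. Dimensions are toy values, not Mistral 7B's actual configuration.

```python
# Grouped-query attention with a sliding causal window (toy dimensions).
import torch
import torch.nn.functional as F

def grouped_query_attention(q, k, v, window: int):
    # q: (batch, n_q_heads, seq, dim); k, v: (batch, n_kv_heads, seq, dim)
    b, n_q, s, d = q.shape
    group = n_q // k.shape[1]                 # query heads sharing one key/value head
    k = k.repeat_interleave(group, dim=1)     # expand K/V so shapes line up per query head
    v = v.repeat_interleave(group, dim=1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5
    idx = torch.arange(s)
    # Sliding-window causal mask: token i attends only to tokens j with i - window < j <= i.
    mask = (idx[None, :] > idx[:, None]) | (idx[:, None] - idx[None, :] >= window)
    scores = scores.masked_fill(mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

q = torch.randn(1, 8, 16, 32)   # 8 query heads
k = torch.randn(1, 2, 16, 32)   # 2 shared key/value heads -> groups of 4
v = torch.randn(1, 2, 16, 32)
print(grouped_query_attention(q, k, v, window=4).shape)  # torch.Size([1, 8, 16, 32])
```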

Falcon-40B (Released: 2023)

LLaMA 2 (Released: 2023)

Vicuna (Released: 2023)

Alpaca (Released: 2023)

GPT-NeoX / GPT-NeoX-20B (Released: 2022)

BLOOM (176B) (Released: 2022)

GPT-J-6B (Released: 2021)

  • 6B parameter open model by EleutherAI implemented in JAX
  • Core Features: Lightweight, Apache 2.0, widely used in open research.
  • Use Cases: NLP research, prototypes, text generation.
  • License: Apache License 2.0
  • Hugging Face: https://huggingface.co/EleutherAI/gpt-j-6B

GPT-2 (Released: 2019)

  • Large autoregressive Transformer (up to 1.5B) with strong text-generation capabilities; staged release due to misuse concerns
  • Core Features: Developed as a large-scale language model for text generation, following GPT-1. The architecture is a decoder-only Transformer, with roughly 124 million to 1.5 billion parameters across the Small, Medium, Large, and XL versions, and it uses Byte Pair Encoding (BPE) for tokenization (see the sketch after this entry). After pre-training, the model can be fine-tuned for specific tasks.
  • Use Cases: Text generation, summarization, creative writing, fine-tuning research.
  • License: MIT for the original OpenAI release; weights widely available via Hugging Face (community), where license metadata can vary by distribution
  • Hugging Face: https://huggingface.co/openai-community/gpt2
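
A minimal sketch showing GPT-2's BPE tokenization and autoregressive text generation through the Hugging Face transformers library.

```python
# GPT-2: BPE tokenization and text generation via transformers.
from transformers import GPT2Tokenizer, pipeline

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
# BPE splits rare words into subword pieces; 'Ġ' marks a token that starts with a space.
print(tokenizer.tokenize("Open models accelerate reproducible research."))

generator = pipeline("text-generation", model="gpt2")
print(generator("Open models are useful for research because",
                max_new_tokens=40)[0]["generated_text"])
```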

RoBERTa (Released: 2019)

  • Robustly optimized BERT variant (more data, longer training, no NSP (Next Sentence Prediction) objective)
  • Core Features: Improved performance on many NLP benchmarks; used for classification, QA, and embedding tasks (see the sketch after this entry)
  • Use Cases: NLP, QA
  • License: Code & weights available (e.g., Hugging Face); Apache-style/OSS for code
  • Hugging Face: https://huggingface.co/roberta-base
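
A minimal sketch of using roberta-base for masked-token prediction and for extracting contextual embeddings via Hugging Face transformers; mean-pooling the token vectors into a sentence embedding is a common convention, not something prescribed by the RoBERTa paper.

```python
# roberta-base: fill-mask prediction and mean-pooled sentence embeddings.
import torch
from transformers import pipeline, AutoTokenizer, AutoModel

fill = pipeline("fill-mask", model="roberta-base")
print(fill("Open models are useful for <mask> and research.")[0])  # top prediction for the mask

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModel.from_pretrained("roberta-base")
inputs = tokenizer("Open models are useful for education.", return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state   # contextual embedding per token
sentence_embedding = hidden.mean(dim=1)          # simple mean-pooled sentence vector
print(sentence_embedding.shape)                  # torch.Size([1, 768])
```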

T5 (Text-to-Text Transfer Transformer) (Released: 2019)

  • Unified text-to-text formulation; released pretrained checkpoints and the Colossal Clean Crawled Corpus (C4)
  • Core Features: Treats every NLP task as text-to-text; widely used as a research baseline for summarization, translation, and many other tasks (see the sketch after this entry). * The Colossal Clean Crawled Corpus (C4) is a text dataset created by cleaning large-scale web data; it is primarily used for pretraining natural language processing (NLP) models.
  • Use Cases: Summarization, translation
  • License: Apache License 2.0 (code and released checkpoints)
  • Official Website: https://research.google/blog/exploring-transfer-learning-with-t5-the-text-to-text-transfer-transformer/
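
A minimal sketch of the text-to-text formulation using the public t5-small checkpoint via Hugging Face transformers: every task is phrased as "prefix: input text" and the model answers with text.

```python
# T5 text-to-text: the task prefix selects the behavior, output is always text.
from transformers import pipeline

t5 = pipeline("text2text-generation", model="t5-small")

print(t5("translate English to German: The house is wonderful.")[0]["generated_text"])
print(t5("summarize: The Transformer architecture replaced recurrent networks with "
         "self-attention, enabling far more parallel training.")[0]["generated_text"])
```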

BERT (Bidirectional Encoder Representations from Transformers) (Released: 2018)

GPT-1 (OpenAI GPT) (Released: 2018)

ELMo (Deep Contextualized Word Representations) (Released: 2018)

  • Contextualized word embeddings via a bidirectional LSTM; token representations vary dynamically with context
  • Core Features: Provides contextual embeddings for NER (Named Entity Recognition), tagging, and classification tasks; widely used before Transformer-based models dominated (see the sketch after this entry).
  • License: Research code / checkpoints available (ACL paper & repos)
  • Official Website: https://aclanthology.org/N18-1202/
  • Hugging Face: https://huggingface.co/allenai/bidaf-elmo
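
A conceptual sketch of contextualized embeddings with a bidirectional LSTM, in the spirit of ELMo but not its actual implementation or weights: the same word receives a different vector depending on the sentence it appears in.

```python
# Toy BiLSTM contextual embeddings (random weights, tiny vocabulary): the vector
# for "bank" depends on its surrounding words, unlike a static word embedding.
import torch
import torch.nn as nn

vocab = {"the": 0, "bank": 1, "river": 2, "loan": 3, "approved": 4, "flooded": 5}
embed = nn.Embedding(len(vocab), 16)
bilstm = nn.LSTM(16, 16, bidirectional=True, batch_first=True)

def contextual_vectors(words):
    ids = torch.tensor([[vocab[w] for w in words]])
    out, _ = bilstm(embed(ids))          # (1, seq_len, 32): forward and backward states
    return out[0]

a = contextual_vectors(["the", "bank", "approved", "the", "loan"])
b = contextual_vectors(["the", "river", "bank", "flooded"])
# "bank" is token 1 in sentence a and token 2 in sentence b; its vectors differ by context.
print(torch.cosine_similarity(a[1], b[2], dim=0).item())
```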

Transformer ("Attention Is All You Need") (Released: 2017)

  • Introduced the Transformer architecture based on self-attention, replacing RNNs/Conv in sequence models
  • Core Features: Foundation for nearly all modern LLMs and sequence models; originally demonstrated on machine translation and now the standard backbone for pretraining research (see the sketch after this entry)
  • License: N/A (research paper)
  • Official Website: https://arxiv.org/abs/1706.03762
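
A minimal NumPy sketch of the scaled dot-product attention at the core of the Transformer, softmax(QK^T / sqrt(d_k)) V, with toy dimensions.

```python
# Scaled dot-product attention: each query attends to all keys, producing a
# weighted sum of value vectors.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                     # query-key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # softmax over the keys
    return weights @ V                                  # weighted sum of value vectors

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)      # (4, 8)
```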