# LLM Collection

This section collects and summarizes notable and foundational LLMs.

## Models
| Model | Release Date | Size (B) | Checkpoints | Description |
| --- | --- | --- | --- | --- |
| Falcon LLM | May 2023 | 7, 40 | Falcon-7B, Falcon-40B | A foundational large language model (LLM) from TII, trained on one trillion tokens and released in 7B and 40B variants. |
| PaLM 2 | May 2023 | - | - | A language model with better multilingual and reasoning capabilities that is more compute-efficient than its predecessor, PaLM. |
| Med-PaLM 2 | May 2023 | - | - | Towards Expert-Level Medical Question Answering with Large Language Models |
| Gorilla | May 2023 | 7 | Gorilla | Gorilla: Large Language Model Connected with Massive APIs |
| RedPajama-INCITE | May 2023 | 3, 7 | RedPajama-INCITE | A family of models including base, instruction-tuned, and chat models. |
| LIMA | May 2023 | 65 | - | A 65B-parameter LLaMA language model fine-tuned with the standard supervised loss on only 1,000 carefully curated prompts and responses, without any reinforcement learning or human preference modeling. |
| Replit Code | May 2023 | 3 | Replit Code | The replit-code-v1-3b model is a 2.7B-parameter LLM trained on 20 languages from the Stack Dedup v1.2 dataset. |
| h2oGPT | May 2023 | 12 | h2oGPT | A large language model (LLM) fine-tuning framework and chatbot UI with document question-answering capabilities. |
| CodeGen2 | May 2023 | 1, 3, 7, 16 | CodeGen2 | Code models for program synthesis. |
| CodeT5 and CodeT5+ | May 2023 | 16 | CodeT5 | CodeT5 and CodeT5+ models for code understanding and generation from Salesforce Research. |
| StarCoder | May 2023 | 15 | StarCoder | StarCoder: A State-of-the-Art LLM for Code |
| MPT-7B | May 2023 | 7 | MPT-7B | MPT-7B is a GPT-style model, and the first in the MosaicML Foundation Series of models. |
| DLite | May 2023 | 0.124 - 1.5 | DLite-v2-1.5B | Lightweight instruction-following models that exhibit ChatGPT-like interactivity. |
| Dolly | April 2023 | 3, 7, 12 | Dolly | An instruction-following LLM, fine-tuned on a human-generated instruction dataset licensed for research and commercial use. |
| StableLM | April 2023 | 3, 7 | StableLM-Alpha | Stability AI's StableLM series of language models. |
| Pythia | April 2023 | 0.070 - 12 | Pythia | A suite of 16 LLMs, all trained on public data seen in the exact same order and ranging in size from 70M to 12B parameters. |
| Open Assistant (Pythia Family) | March 2023 | 12 | Open Assistant | OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and can retrieve information dynamically to do so. |
| Cerebras-GPT | March 2023 | 0.111 - 13 | Cerebras-GPT | Cerebras-GPT: Open Compute-Optimal Language Models Trained on the Cerebras Wafer-Scale Cluster |
| BloombergGPT | March 2023 | 50 | - | BloombergGPT: A Large Language Model for Finance |
| PanGu-Σ | March 2023 | 1085 | - | PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing |
| GPT-4 | March 2023 | - | - | GPT-4 Technical Report |
| LLaMA | Feb 2023 | 7, 13, 33, 65 | LLaMA | LLaMA: Open and Efficient Foundation Language Models |
| ChatGPT | Nov 2022 | - | - | A model that interacts in a conversational way. The dialogue format makes it possible for ChatGPT to answer follow-up questions, admit its mistakes, challenge incorrect premises, and reject inappropriate requests. |
| Galactica | Nov 2022 | 0.125 - 120 | Galactica | Galactica: A Large Language Model for Science |
| mT0 | Nov 2022 | 13 | mT0-xxl | Crosslingual Generalization through Multitask Finetuning |
| BLOOM | Nov 2022 | 176 | BLOOM | BLOOM: A 176B-Parameter Open-Access Multilingual Language Model |
| U-PaLM | Oct 2022 | 540 | - | Transcending Scaling Laws with 0.1% Extra Compute |
| UL2 | Oct 2022 | 20 | UL2, Flan-UL2 | UL2: Unifying Language Learning Paradigms |
| Sparrow | Sep 2022 | 70 | - | Improving alignment of dialogue agents via targeted human judgements |
| Flan-T5 | Oct 2022 | 11 | Flan-T5-xxl | Scaling Instruction-Finetuned Language Models |
| AlexaTM | Aug 2022 | 20 | - | AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model |
| GLM-130B | Oct 2022 | 130 | GLM-130B | GLM-130B: An Open Bilingual Pre-trained Model |
| OPT-IML | Dec 2022 | 30, 175 | OPT-IML | OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization |
| OPT | May 2022 | 175 | OPT-13B, OPT-66B | OPT: Open Pre-trained Transformer Language Models |
| PaLM | April 2022 | 540 | - | PaLM: Scaling Language Modeling with Pathways |
| Tk-Instruct | April 2022 | 11 | Tk-Instruct-11B | Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks |
| GPT-NeoX-20B | April 2022 | 20 | GPT-NeoX-20B | GPT-NeoX-20B: An Open-Source Autoregressive Language Model |
| Chinchilla | Mar 2022 | 70 | - | Shows that for a given compute budget, the best performance is achieved not by the largest models but by smaller models trained on more data. |
| InstructGPT | Mar 2022 | 175 | - | Training language models to follow instructions with human feedback |
| CodeGen | Mar 2022 | 0.350 - 16 | CodeGen | CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis |
| AlphaCode | Feb 2022 | 41 | - | Competition-Level Code Generation with AlphaCode |
| MT-NLG | Jan 2022 | 530 | - | Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model |
| LaMDA | Jan 2022 | 137 | - | LaMDA: Language Models for Dialog Applications |
| GLaM | Dec 2021 | 1200 | - | GLaM: Efficient Scaling of Language Models with Mixture-of-Experts |
| Gopher | Dec 2021 | 280 | - | Scaling Language Models: Methods, Analysis & Insights from Training Gopher |
| WebGPT | Dec 2021 | 175 | - | WebGPT: Browser-assisted question-answering with human feedback |
| Yuan 1.0 | Oct 2021 | 245 | - | Yuan 1.0: Large-Scale Pre-trained Language Model in Zero-Shot and Few-Shot Learning |
| T0 | Oct 2021 | 11 | T0 | Multitask Prompted Training Enables Zero-Shot Task Generalization |
| FLAN | Sep 2021 | 137 | - | Finetuned Language Models Are Zero-Shot Learners |
| HyperCLOVA | Sep 2021 | 82 | - | What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers |
| ERNIE 3.0 Titan | July 2021 | 10 | - | ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation |
| Jurassic-1 | Aug 2021 | 178 | - | Jurassic-1: Technical Details and Evaluation |
| ERNIE 3.0 | July 2021 | 10 | - | ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation |
| Codex | July 2021 | 12 | - | Evaluating Large Language Models Trained on Code |
| GPT-J-6B | June 2021 | 6 | GPT-J-6B | A 6-billion-parameter autoregressive text generation model trained on The Pile. |
| CPM-2 | Jun 2021 | 198 | CPM | CPM-2: Large-scale Cost-effective Pre-trained Language Models |
| PanGu-α | April 2021 | 13 | PanGu-α | PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation |
| mT5 | Oct 2020 | 13 | mT5 | mT5: A massively multilingual pre-trained text-to-text transformer |
| BART | Jul 2020 | - | BART | Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension |
| GShard | Jun 2020 | 600 | - | GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding |
| GPT-3 | May 2020 | 175 | - | Language Models are Few-Shot Learners |
| CTRL | Sep 2019 | 1.63 | CTRL | CTRL: A Conditional Transformer Language Model for Controllable Generation |
| ALBERT | Sep 2019 | 0.235 | ALBERT | ALBERT: A Lite BERT for Self-supervised Learning of Language Representations |
| XLNet | Jun 2019 | - | XLNet | XLNet: Generalized Autoregressive Pretraining for Language Understanding |
| T5 | Oct 2019 | 0.06 - 11 | Flan-T5 | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer |
| GPT-2 | Nov 2019 | 1.5 | GPT-2 | Language Models are Unsupervised Multitask Learners |
| RoBERTa | July 2019 | 0.125 - 0.355 | RoBERTa | RoBERTa: A Robustly Optimized BERT Pretraining Approach |
| BERT | Oct 2018 | - | BERT | BERT: Bidirectional Encoder Representations from Transformers |
| GPT | June 2018 | - | GPT | Improving Language Understanding by Generative Pre-Training |
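For working with the collection programmatically, the table's key fields map naturally onto plain records. Below is a minimal sketch, using a handful of rows transcribed from the table above; the record fields and helper function (`ModelEntry`, `open_models_since`) are illustrative names, not part of any library.

```python
from dataclasses import dataclass, field

@dataclass
class ModelEntry:
    """One row of the collection: name, release ("YYYY-MM"),
    parameter counts in billions, and whether public checkpoints exist."""
    name: str
    release: str
    sizes_b: list = field(default_factory=list)
    has_checkpoints: bool = False

# A few sample rows transcribed from the table above.
MODELS = [
    ModelEntry("Falcon LLM", "2023-05", [7, 40], True),
    ModelEntry("LLaMA", "2023-02", [7, 13, 33, 65], True),
    ModelEntry("BLOOM", "2022-11", [176], True),
    ModelEntry("Chinchilla", "2022-03", [70], False),
    ModelEntry("GPT-3", "2020-05", [175], False),
]

def open_models_since(models, year):
    """Names of models released in or after `year` that ship public checkpoints."""
    return [m.name for m in models
            if m.has_checkpoints and int(m.release[:4]) >= year]

print(open_models_since(MODELS, 2023))  # → ['Falcon LLM', 'LLaMA']
```

The same record shape extends to any other column (e.g. a `paper` field) if you want to filter by size range or release window instead.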
⚠️ This section is under development.
Data adapted from Papers with Code and the recent work by Zhao et al. (2023).