LLM Collection

This section collects and summarizes notable and foundational large language models (LLMs).

Models

| Model | Release Date | Size (B) | Checkpoints | Description |
| --- | --- | --- | --- | --- |
| Falcon LLM | May 2023 | 7, 40 | Falcon-7B, Falcon-40B | A foundational LLM with 40 billion parameters, trained on one trillion tokens and released by TII. |
| PaLM 2 | May 2023 | - | - | A language model with better multilingual and reasoning capabilities and greater compute efficiency than its predecessor, PaLM. |
| Med-PaLM 2 | May 2023 | - | - | Towards Expert-Level Medical Question Answering with Large Language Models |
| Gorilla | May 2023 | 7 | Gorilla | Gorilla: Large Language Model Connected with Massive APIs |
| RedPajama-INCITE | May 2023 | 3, 7 | RedPajama-INCITE | A family of models including base, instruction-tuned, and chat models. |
| LIMA | May 2023 | 65 | - | A 65B-parameter LLaMA language model fine-tuned with the standard supervised loss on only 1,000 carefully curated prompts and responses, without any reinforcement learning or human preference modeling. |
| Replit Code | May 2023 | 3 | Replit Code | The replit-code-v1-3b model is a 2.7B LLM trained on 20 languages from the Stack Dedup v1.2 dataset. |
| h2oGPT | May 2023 | 12 | h2oGPT | An LLM fine-tuning framework and chatbot UI with document question-answering capabilities. |
| CodeGen2 | May 2023 | 1, 3, 7, 16 | CodeGen2 | Code models for program synthesis. |
| CodeT5 and CodeT5+ | May 2023 | 16 | CodeT5 | CodeT5 and CodeT5+ models for code understanding and generation from Salesforce Research. |
| StarCoder | May 2023 | 15 | StarCoder | StarCoder: A State-of-the-Art LLM for Code |
| MPT-7B | May 2023 | 7 | MPT-7B | A GPT-style model, and the first in the MosaicML Foundation Series of models. |
| DLite | May 2023 | 0.124 - 1.5 | DLite-v2-1.5B | Lightweight instruction-following models that exhibit ChatGPT-like interactivity. |
| Dolly | Apr 2023 | 3, 7, 12 | Dolly | An instruction-following LLM, fine-tuned on a human-generated instruction dataset licensed for research and commercial use. |
| StableLM | Apr 2023 | 3, 7 | StableLM-Alpha | Stability AI's StableLM series of language models. |
| Pythia | Apr 2023 | 0.070 - 12 | Pythia | A suite of 16 LLMs, all trained on public data seen in the exact same order and ranging in size from 70M to 12B parameters. |
| Open Assistant (Pythia Family) | Mar 2023 | 12 | Open Assistant | A chat-based assistant that understands tasks, can interact with third-party systems, and can retrieve information dynamically to do so. |
| Cerebras-GPT | Mar 2023 | 0.111 - 13 | Cerebras-GPT | Cerebras-GPT: Open Compute-Optimal Language Models Trained on the Cerebras Wafer-Scale Cluster |
| BloombergGPT | Mar 2023 | 50 | - | BloombergGPT: A Large Language Model for Finance |
| PanGu-Σ | Mar 2023 | 1085 | - | PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing |
| GPT-4 | Mar 2023 | - | - | GPT-4 Technical Report |
| LLaMA | Feb 2023 | 7, 13, 33, 65 | LLaMA | LLaMA: Open and Efficient Foundation Language Models |
| ChatGPT | Nov 2022 | - | - | A model that interacts in a conversational way. The dialogue format makes it possible for ChatGPT to answer follow-up questions, admit its mistakes, challenge incorrect premises, and reject inappropriate requests. |
| Galactica | Nov 2022 | 0.125 - 120 | Galactica | Galactica: A Large Language Model for Science |
| mT0 | Nov 2022 | 13 | mT0-xxl | Crosslingual Generalization through Multitask Finetuning |
| BLOOM | Nov 2022 | 176 | BLOOM | BLOOM: A 176B-Parameter Open-Access Multilingual Language Model |
| U-PaLM | Oct 2022 | 540 | - | Transcending Scaling Laws with 0.1% Extra Compute |
| UL2 | Oct 2022 | 20 | UL2, Flan-UL2 | UL2: Unifying Language Learning Paradigms |
| Sparrow | Sep 2022 | 70 | - | Improving alignment of dialogue agents via targeted human judgements |
| Flan-T5 | Oct 2022 | 11 | Flan-T5-xxl | Scaling Instruction-Finetuned Language Models |
| AlexaTM | Aug 2022 | 20 | - | AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model |
| GLM-130B | Oct 2022 | 130 | GLM-130B | GLM-130B: An Open Bilingual Pre-trained Model |
| OPT-IML | Dec 2022 | 30, 175 | OPT-IML | OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization |
| OPT | May 2022 | 175 | OPT-13B, OPT-66B | OPT: Open Pre-trained Transformer Language Models |
| PaLM | Apr 2022 | 540 | - | PaLM: Scaling Language Modeling with Pathways |
| Tk-Instruct | Apr 2022 | 11 | Tk-Instruct-11B | Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks |
| GPT-NeoX-20B | Apr 2022 | 20 | GPT-NeoX-20B | GPT-NeoX-20B: An Open-Source Autoregressive Language Model |
| Chinchilla | Mar 2022 | 70 | - | Shows that, for a given compute budget, the best performance is achieved not by the largest models but by smaller models trained on more data. |
| InstructGPT | Mar 2022 | 175 | - | Training language models to follow instructions with human feedback |
| CodeGen | Mar 2022 | 0.350 - 16 | CodeGen | CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis |
| AlphaCode | Feb 2022 | 41 | - | Competition-Level Code Generation with AlphaCode |
| MT-NLG | Jan 2022 | 530 | - | Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model |
| LaMDA | Jan 2022 | 137 | - | LaMDA: Language Models for Dialog Applications |
| GLaM | Dec 2021 | 1200 | - | GLaM: Efficient Scaling of Language Models with Mixture-of-Experts |
| Gopher | Dec 2021 | 280 | - | Scaling Language Models: Methods, Analysis & Insights from Training Gopher |
| WebGPT | Dec 2021 | 175 | - | WebGPT: Browser-assisted question-answering with human feedback |
| Yuan 1.0 | Oct 2021 | 245 | - | Yuan 1.0: Large-Scale Pre-trained Language Model in Zero-Shot and Few-Shot Learning |
| T0 | Oct 2021 | 11 | T0 | Multitask Prompted Training Enables Zero-Shot Task Generalization |
| FLAN | Sep 2021 | 137 | - | Finetuned Language Models Are Zero-Shot Learners |
| HyperCLOVA | Sep 2021 | 82 | - | What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers |
| ERNIE 3.0 Titan | Jul 2021 | 10 | - | ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation |
| Jurassic-1 | Aug 2021 | 178 | - | Jurassic-1: Technical Details and Evaluation |
| ERNIE 3.0 | Jul 2021 | 10 | - | ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation |
| Codex | Jul 2021 | 12 | - | Evaluating Large Language Models Trained on Code |
| GPT-J-6B | Jun 2021 | 6 | GPT-J-6B | A 6-billion-parameter autoregressive text generation model trained on The Pile. |
| CPM-2 | Jun 2021 | 198 | CPM | CPM-2: Large-scale Cost-effective Pre-trained Language Models |
| PanGu-α | Apr 2021 | 13 | PanGu-α | PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation |
| mT5 | Oct 2020 | 13 | mT5 | mT5: A massively multilingual pre-trained text-to-text transformer |
| BART | Jul 2020 | - | BART | Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension |
| GShard | Jun 2020 | 600 | - | GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding |
| GPT-3 | May 2020 | 175 | - | Language Models are Few-Shot Learners |
| CTRL | Sep 2019 | 1.63 | CTRL | CTRL: A Conditional Transformer Language Model for Controllable Generation |
| ALBERT | Sep 2019 | 0.235 | ALBERT | A Lite BERT for Self-supervised Learning of Language Representations |
| XLNet | Jun 2019 | - | XLNet | Generalized Autoregressive Pretraining for Language Understanding and Generation |
| T5 | Oct 2019 | 0.06 - 11 | Flan-T5 | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer |
| GPT-2 | Nov 2019 | 1.5 | GPT-2 | Language Models are Unsupervised Multitask Learners |
| RoBERTa | Jul 2019 | 0.125 - 0.355 | RoBERTa | A Robustly Optimized BERT Pretraining Approach |
| BERT | Oct 2018 | - | BERT | Bidirectional Encoder Representations from Transformers |
| GPT | Jun 2018 | - | GPT | Improving Language Understanding by Generative Pre-Training |
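
As a rough aid for reading the Size (B) column, the sketch below converts a parameter count in billions into the approximate memory needed to hold the weights alone. It is a back-of-the-envelope estimate, not a property of any particular checkpoint: the function name `weight_memory_gb` and the `BYTES_PER_PARAM` table are illustrative assumptions, and the estimate ignores activations, the KV cache, and optimizer state.

```python
# Bytes used to store one parameter at common numeric precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gb(size_b: float, dtype: str = "fp16") -> float:
    """Approximate memory for the weights alone, in GB (10^9 bytes).

    size_b is the parameter count in billions, as in the Size (B) column.
    Billions of parameters times bytes per parameter gives gigabytes directly.
    """
    return size_b * BYTES_PER_PARAM[dtype]


# Sizes taken from the table above (in billions of parameters).
SIZES_B = {"Falcon-7B": 7, "LLaMA 65B": 65, "BLOOM": 176}

for name, size_b in SIZES_B.items():
    print(f"{name}: ~{weight_memory_gb(size_b):.0f} GB of weights in fp16")
```

Under these assumptions, halving the precision halves the weight footprint, which is why the same checkpoint may fit on very different hardware depending on how it is loaded.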
⚠️ This section is under development.

Data adapted from Papers with Code and the recent work by Zhao et al. (2023).