# LLM Timeline

> Interactive timeline of Large Language Model releases from 2017 to present. Tracks 3937+ models with release dates, organizations, parameter counts, license types, and descriptions.

## About

This site provides a chronological index of large language models (LLMs) from 2017 through the present. Data is sourced from LifeArchitect.AI's models table and synchronized weekly.

## Models

- **Nemotron-Cascade-2-30B-A3B** (NVIDIA) — 2026-03-01 | Parameters: Nemotron-Cascade-2-30B-A3B - License: open | Type: model - 30BA3B. Gold medal performance in both the 2025 IMO and the IOI. HLE=no tools.
- **MiMo-V2-Pro** (Xiaomi) — 2026-03-01 | Parameters: MiMo-V2-Pro - License: open | Type: model - 1T42B. Over 1T total parameters (42B active). Uses a 7:1 Hybrid Attention mechanism and supports a 1M-token context window.
- **Mamba-3** (CMU) — 2026-03-01 | Parameters: Mamba-3 - License: open | Type: model - "with architectural refinements, our Mamba-3 model achieves significant gains across retrieval, state-tracking, and downstream language modeling tasks."
- **MiniMax-M2.5** (MiniMax) — 2026-03-01 | Parameters: MiniMax-M2.5 - License: open | Type: model - 230B-A10B. Early RSI. "M2.7 is our first model deeply participating in its own evolution…" https://lifearchitect.ai/asi/
- **Holotron-12B** (H Company) — 2026-03-01 | Parameters: Holotron-12B - License: open | Type: model - "Holotron-12B is a high-throughput, multimodal Vision-Language Model (VLM) designed specifically as a policy model for computer-use agents."
- **Mistral Small 4** (Mistral) — 2026-03-01 | Parameters: Mistral Small 4 - License: open | Type: model - 119BA6.5B. "unifies the capabilities of three different model families—Instruct, Reasoning (previously called Magistral), and Devstral—into a single, unified model."
- **MiroThinker-H1** (MiroMindAI) — 2026-03-01 | Parameters: MiroThinker-H1 - License: open | Type: model - "Our proprietary agent, MiroThinker-H1 provides promising evidence for long-chain verifiable reasoning [based on new model, MiroThinker-1.7]"
- **Covenant-72B** (1Covenant) — 2026-03-01 | Parameters: Covenant-72B - License: open | Type: model - "largest permissionless collaboratively trained language model" ~20 distinct peers, each running 8xB200 GPUs.
- **Nemotron 3 Super** (NVIDIA) — 2026-03-01 | Parameters: Nemotron 3 Super - License: open | Type: model - 120B-A12B. Announce: https://blogs.nvidia.com/blog/nemotron-3-super-agentic-ai/
- **Sarvam 105B** (Sarvam.ai) — 2026-03-01 | Parameters: Sarvam 105B - License: open | Type: model - 105BA10.3B. "22 Indian languages"
- **GPT-5.4** (OpenAI) — 2026-03-01 | Parameters: GPT-5.4 - License: open | Type: model - "most capable and efficient frontier model for professional work." Announce: https://openai.com/index/introducing-gpt-5-4/
- **Yuan3.0-Ultra** (YuanLabAI) — 2026-03-01 | Parameters: Yuan3.0-Ultra - License: open | Type: model - 1515BA68.8B. Poor performance due to low training data/ratio.
- **GPT-5.3 Instant** (OpenAI) — 2026-03-01 | Parameters: GPT-5.3 Instant - License: open | Type: model - Announce: https://openai.com/index/gpt-5-3-instant/
- **STATIC** (Google) — 2026-02-01 | Parameters: STATIC - License: open | Type: model - YouTube (Google). STATIC (Sparse Transition Matrix-Accelerated Trie Index for Constrained Decoding). "The model is a Gemini-based generative retrieval model similar to PLUM [8], served with a batch size of 2 (per chip) and a beam size of 𝑀 = 70. The model is based on a non-Mixture-of-Experts (MoE) architecture with 3 billion dense parameters. All benchmark experiments are conducted on Google TPU v6e accelerators."
- **Arrow 1.0** (Quiver) — 2026-02-01 | Parameters: Arrow 1.0 - License: open | Type: model - "A first of it's kind SVG AI model."
Announce: https://x.com/QuiverAI/status/2026792057893708072
- **Qwen3.5-27B** (Alibaba) — 2026-02-01 | Parameters: Qwen3.5-27B - License: open | Type: model - "Qwen3.5 represents a significant leap forward, integrating breakthroughs in multimodal learning, architectural efficiency, reinforcement learning scale, and global accessibility"
- **LFM2-24B-A2B** (Liquid AI) — 2026-02-01 | Parameters: LFM2-24B-A2B - License: open | Type: model - "a traditional instruct model without reasoning traces."
- **Mercury 2** (Inception) — 2026-02-01 | Parameters: Mercury 2 - License: open | Type: model - Diffusion large language model (dLLM).
- **Gemini 3.1 Pro** (Google DeepMind) — 2026-02-01 | Parameters: Gemini 3.1 Pro - License: open | Type: model - Knowledge cutoff still=January 2025. Announce: https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-1-pro/
- **ZUNA** (Zyphra) — 2026-02-01 | Parameters: ZUNA - License: open | Type: model - For BCI, 'thought-to-text'. Training dataset calcs: (2M hours * 3,600 seconds/hour * 256 samples/second) / 32 samples/token = 57.6B tokens (refined to 45.1B after rigorous filtering); 150,000 steps * 2.16M tokens/batch = 324B total tokens seen during training. Announce: https://www.zyphra.com/post/zuna
- **Grok 4.2** (xAI) — 2026-02-01 | Parameters: Grok 4.2 - License: open | Type: model - No details provided. Announce: https://x.com/elonmusk/status/2023829664318583105
- **INTELLECT-3.1** (Prime Intellect) — 2026-02-01 | Parameters: INTELLECT-3.1 - License: open | Type: model - Base: GLM-4.5-Air-Base, INTELLECT-3 model. 106BA12B.
- **Claude Sonnet 4.6** (Anthropic) — 2026-02-01 | Parameters: Claude Sonnet 4.6 - License: open | Type: model - 1M context. Announce: https://www.anthropic.com/news/claude-sonnet-4-6 Showing GMMLU (Global MMLU by Cohere).
- **Tiny Aya** (Cohere) — 2026-02-01 | Parameters: Tiny Aya - License: open | Type: model - 70+ languages. Showing GMMLU (Global MMLU by Cohere).
- **Qwen3.5-397B-A17B** (Alibaba) — 2026-02-01 | Parameters: Qwen3.5-397B-A17B - License: open | Type: model - "Qwen3.5 represents a significant leap forward, integrating breakthroughs in multimodal learning, architectural efficiency, reinforcement learning scale, and global accessibility"
- **JoyAI-LLM Flash** (JD Open Source) — 2026-02-01 | Parameters: JoyAI-LLM Flash - License: open | Type: model - 48B-A3B.
- **MiniMax-M2.5** (MiniMax) — 2026-02-01 | Parameters: MiniMax-M2.5 - License: open | Type: model - 230B-A10B. HLE showing without tools.
- **GLM-5** (Z.AI) — 2026-02-01 | Parameters: GLM-5 - License: open | Type: model - 744B-A40B. Announce: https://z.ai/blog/glm-5
- **Nanbeige4.1-3B** (Nanbeige) — 2026-02-01 | Parameters: Nanbeige4.1-3B - License: open | Type: model - SOTA for size (3B)
- **RynnBrain-30B-A3B** (Alibaba) — 2026-02-01 | Parameters: RynnBrain-30B-A3B - License: open | Type: model - Base: Qwen3-VL-30B-A3B-Instruct. "an embodied foundation model grounded in physical reality."
- **Claude Opus 4.6** (Anthropic) — 2026-02-01 | Parameters: Claude Opus 4.6 - License: open | Type: model
- **Intern-S1-Pro** (Shanghai AI Laboratory/SenseTime) — 2026-02-01 | Parameters: Intern-S1-Pro - License: open | Type: model - 1000TA22B. Assumes base model of Qwen3. "Built upon a 235B MoE language model and a 6B Vision encoder, Intern-S1 has been further pretrained on 5 trillion tokens of multimodal data"
- **Step 3.5 Flash** (StepFun) — 2026-02-01 | Parameters: Step 3.5 Flash - License: open | Type: model - 196B-A11B.
- **Assistant_Pepe_8B** (Independent) — 2026-01-01 | Parameters: Assistant_Pepe_8B - License: open | Type: model - Warning for inappropriate content. Base: Llama-3.1-Nemotron-8B. "trained it on an extended 4chan dataset" "the original, gpt4chan (by Yannic Kilcher) scored especially high in truthfulness (that was b4 benchmaxxing)... outperformed the base tune (the unabliterated one), it also changed its political alignment...
People were initially joking about the "alignment tax", I think there's a none trivial substance in all of this. It seems to me just above a marginal error or statistical noise."
- **Trinity-Large** (Arcee AI) — 2026-01-01 | Parameters: Trinity-Large - License: open | Type: model - 400BA13B. "we worked closely with Prime Intellect. They not only served the H100 clusters Datology used to generate synthetic data, they have been deeply involved in helping scale our training setup to the GPU footprint required for a fully frontier sized model, including the current 2048 B300 GPU configuration for Trinity Large."
- **SERA** (Allen AI) — 2026-01-01 | Parameters: SERA - License: open | Type: model - Base: Qwen3-32B. SERA=Soft-verified Efficient Repository Agents. "SERA was built largely by a single Ai2 researcher." https://allenai.org/blog/open-coding-agents "SERA-32B was trained using Soft Verified Generation (SVG), a simple and efficient method that is 26x cheaper than reinforcement learning and 57x cheaper than previous synthetic data methods to reach equivalent performance. The total cost for data generation and training is approximately $2,000 (40 GPU-days)."
- **Kimi K2.5** (Moonshot AI) — 2026-01-01 | Parameters: Kimi K2.5 - License: open | Type: model - 1TA32B. 1T parameters and 384 experts. Open source SOTA. "Kimi K2.5 builds on Kimi K2 [15.5T tokens] with continued pretraining over approximately 15T mixed visual and text tokens. [+ 15T=30.5T]"
- **GLM-4.7-Flash** (Z.AI) — 2026-01-01 | Parameters: GLM-4.7-Flash - License: open | Type: model - 30B-A3B.
- **MedGemma 1.5 4B** (Google DeepMind) — 2026-01-01 | Parameters: MedGemma 1.5 4B - License: open | Type: model - Lower MMLU score compared to previous MedGemma 1 27B (67.2 v 87).
Announce: https://research.google/blog/next-generation-medical-image-interpretation-with-medgemma-15-and-medical-speech-to-text-with-medasr/
- **FrogBoss** (Microsoft) — 2026-01-01 | Parameters: FrogBoss - License: open | Type: model - Base: Qwen3-32B.
- **EDEN** (NVIDIA) — 2026-01-01 | Parameters: EDEN - License: closed | Type: model - "EDEN (environmentally-derived evolutionary network) family of metagenomic foundation models, including a 28 billion parameter model trained on 9.7 trillion nucleotide tokens from BaseData1. This dataset, at the time of training, contained more than 10 billion novel genes from over 1 million new species, and is intentionally enriched for environmental and host-associated metagenomes, phage sequences, and mobile genetic elements, enabling the model to learn from diverse and novel cross-species evolutionary mechanisms and apply them to key challenges in human health."
- **Baichuan-M3** (Baichuan) — 2026-01-01 | Parameters: Baichuan-M3 - License: open | Type: model - "new-generation medical-enhanced large language model"
- **Engram** (DeepSeek-AI) — 2026-01-01 | Parameters: Engram - License: partial | Type: model - 39.5BA3.8B. "we explore conditional memory as a complementary sparsity axis, instantiated via Engram, a module that modernizes classic N-gram embeddings for O(1) lookup."
- **SleepFM** (Stanford) — 2026-01-01 | Parameters: SleepFM - License: open | Type: model - Uses a leave-one-out contrastive learning approach to align brain activity (EEG), heart activity (ECG), and respiratory signals. 130+ disease categories and 19–20+ clinical PSG channels. Dataset ~12.63B (Calculated based on 585,000 hours of data across 3 modality groups using 5-second window tokens) x 10 epochs.
- **TimeCapsuleLLM-v2-1800-1875** (Independent) — 2026-01-01 | Parameters: TimeCapsuleLLM-v2-1800-1875 - License: open | Type: model - 112GB dataset=30B tokens x 0.5 epochs = 15B tokens.
- **Jamba2** (AI21) — 2026-01-01 | Parameters: Jamba2 - License: open | Type: model - 52B-A12B. Pre-training tokens from Jamba=1.2T + 500B mid.
- **LFM2.5** (Liquid AI) — 2026-01-01 | Parameters: LFM2.5 - License: open | Type: model - For on-device agentic applications. "Extended pre-training from 10T to 28T tokens and large-scale multi-stage reinforcement learning."
- **MiroThinker v1.5** (MiroMindAI) — 2026-01-01 | Parameters: MiroThinker v1.5 - License: open | Type: model - Base: Qwen3 235B-A22B. Official demo: https://dr.miromind.ai
- **Falcon-H1R** (TII) — 2026-01-01 | Parameters: Falcon-H1R - License: open | Type: model - Base model: Falcon-H1 (May/2025). Announce: https://huggingface.co/blog/tiiuae/falcon-h1r-7b
- **Solar Open 100B** (Upstage) — 2025-12-31 | Parameters: 102B - License: closed | Type: model - AI model by Upstage
- **K-EXAONE** (LG AI Research) — 2025-12-31 | Parameters: 236B - License: closed | Type: model - AI model by LG AI Research
- **VAETKI** (NC AI) — 2025-12-30 | Parameters: 100B - License: open | Type: model - AI model by NC AI
- **A.X K1** (SK Telecom) — 2025-12-30 | Parameters: 519B - License: closed | Type: model - AI model by SK Telecom
- **HyperCLOVA X SEED 32B Think** (NAVER) — 2025-12-29 | Parameters: 32B - License: closed | Type: model - AI model by NAVER
- **MiniMax-M2.1** (MiniMax) — 2025-12-23 | Parameters: 229B - License: open | Type: model - AI model by MiniMax
- **GLM-4.7** (Z.ai (Zhipu AI)) — 2025-12-22 | Parameters: 358B - License: open | Type: model - AI model by Z.ai (Zhipu AI)
- **GPT-5.2 Codex** (OpenAI) — 2025-12-18 - License: closed | Type: model - AI model by OpenAI
- **Nomos 1** (Nous Research) — 2025-12-11 | Parameters: 30B - License: open | Type: model - AI model by Nous Research
- **Nova 2** (Amazon Web Services (AWS)) — 2025-12-02 - License: closed | Type: model - AI model by Amazon Web Services (AWS)
- **mHC 27B** (DeepSeek-AI) — 2025-12-01 | Parameters: mHC 27B - License: closed | Type: model - 27BA4.14B.
Scaling tested with 3B MoE on 1T tokens=334:1. "Manifold-Constrained Hyper-Connections (mHC), a general framework that projects the residual connection space of HC onto a specific manifold to restore the identity mapping property, while incorporating rigorous infrastructure optimization to ensure efficiency. Empirical experiments demonstrate that mHC is effective for training at scale, offering tangible performance improvements and superior scalability."
- **IQuest-Coder-V1** (IQuestLab) — 2025-12-01 | Parameters: IQuest-Coder-V1 - License: open | Type: model - "IQuest-Coder-V1 captures the dynamic evolution of software logic, delivering state-of-the-art performance across critical dimensions" https://github.com/IQuestLab/IQuest-Coder-V1
- **A.X K1** (SK Hynix) — 2025-12-01 | Parameters: A.X K1 - License: open | Type: model - 519BA33B.
- **K-EXAONE** (LG) — 2025-12-01 | Parameters: K-EXAONE - License: open | Type: model - 236BA23B. “EXAONE”=“EXpert AI for EveryONE”.
- **Ranke-4B** (UZH) — 2025-12-01 | Parameters: Ranke-4B - License: closed | Type: model - Base Model: Qwen 3. 600B tokens of pre-(1913, 1929, 1933, 1939, 1946) data only.
- **WeDLM** (Tencent) — 2025-12-01 | Parameters: WeDLM - License: open | Type: model - Project page: https://wedlm.github.io/ "WeDLM, a diffusion decoding framework built entirely on standard causal attention to make parallel generation prefix-cache friendly. The core idea is to let each masked position condition on all currently observed tokens while keeping a strict causal mask, achieved by Topological Reordering that moves observed tokens to the physical prefix while preserving their logical positions.. We instantiate WeDLM on both Qwen2.5-7B and Qwen3-8B, utilizing 100B tokens for continued training and 10B tokens for SFT."
- **SOLAR Open** (Upstage AI) — 2025-12-01 | Parameters: SOLAR Open - License: open | Type: model - South Korean. 102BA12B. Releasing 31/Dec.
- **GLM-4.7** (Z.AI) — 2025-12-01 | Parameters: GLM-4.7 - License: open | Type: model - 355B-A32B. "context window has been expanded from 128K to 200K tokens"
- **NitroGen** (NVIDIA) — 2025-12-01 | Parameters: NitroGen - License: open | Type: model - "NitroGen is a unified vision-to-action model designed to play video games directly from raw frames. It takes video game footage as input and outputs gamepad actions... trained on 40,000 hours of gameplay videos across more than 1,000 games."
- **MiMo-V2-Flash** (Xiaomi) — 2025-12-01 | Parameters: MiMo-V2-Flash - License: open | Type: model - 309BA15B.
- **FunctionGemma** (Google DeepMind) — 2025-12-01 | Parameters: FunctionGemma - License: open | Type: model - "FunctionGemma, a specialized version of our Gemma 3 270M model tuned for function calling. It is designed as a strong base for further training into custom, fast, private, local agents that translate natural language into executable API actions."
- **T5Gemma 2** (Google DeepMind) — 2025-12-01 | Parameters: T5Gemma 2 - License: open | Type: model - Base model: Gemma 3. Dataset: Gemma 3 4B checkpoint (4T) + pretraining (2T)=6T.
- **Gemini 3 Flash** (Google DeepMind) — 2025-12-01 | Parameters: Gemini 3 Flash - License: open | Type: model - Announce: https://deepmind.google/models/gemini/flash/
- **NVIDIA-Nemotron-3-Nano-30B-A3B** (NVIDIA) — 2025-12-01 | Parameters: NVIDIA-Nemotron-3-Nano-30B-A3B - License: open | Type: model - Knowledge cutoff November 28, 2025 (post).
- **Bolmo** (Allen AI) — 2025-12-01 | Parameters: Bolmo - License: open | Type: model - Base Model: Olmo 3 7B. Announce: https://allenai.org/blog/bolmo
- **EuroLLM-22B** (Consortium) — 2025-12-01 | Parameters: EuroLLM-22B - License: open | Type: model - A fully open language model developed in Europe.
- **LLaDA2.0 Flash** (Inclusion AI) — 2025-12-01 | Parameters: LLaDA2.0 Flash - License: open | Type: model - Base Model: Ling-flash-2.0: 103B total parameters with 6.1B activated. "largest diffusion language model to date"
- **GPT-5.2** (OpenAI) — 2025-12-01 | Parameters: GPT-5.2 - License: open | Type: model - "GPT‑5.2 sets a new state of the art across many benchmarks, including GDPval, where it outperforms industry professionals at well-specified knowledge work tasks spanning 44 occupations." Announce: https://openai.com/index/introducing-gpt-5-2/ MMLU is for Spanish.
- **Apriel-1.6-15B-Thinker** (ServiceNow) — 2025-12-01 | Parameters: Apriel-1.6-15B-Thinker - License: open | Type: model
- **Motif 2 12.7B** (Motif-Technologies) — 2025-12-01 | Parameters: Motif 2 12.7B - License: open | Type: model
- **Devstral 2** (Mistral) — 2025-12-01 | Parameters: Devstral 2 - License: open | Type: model - SWE-bench Verified=72.2%.
- **Nanbeige4-3B-Base** (Nanbeige4-3B-Base) — 2025-12-01 | Parameters: Nanbeige4-3B-Base - License: open | Type: model
- **HY 2.0** (Tencent) — 2025-12-01 | Parameters: HY 2.0 - License: open | Type: model - 406BA32B.
- **K2-V2** (MBZUAI) — 2025-12-01 | Parameters: K2-V2 - License: open | Type: model - 8.5x more tokens trained than K2 (1.4T v 12T). Project page: https://ifm.ai/k2/
- **Trinity-Mini** (Arcee AI) — 2025-12-01 | Parameters: Trinity-Mini - License: open | Type: model - 26BA3B. "we worked closely with Prime Intellect. They not only served the H100 clusters Datology used to generate synthetic data, they have been deeply involved in helping scale our training setup to the GPU footprint required for a fully frontier sized model, including the current 2048 B300 GPU configuration for Trinity Large."
- **Nova 2 Pro** (Amazon) — 2025-12-01 | Parameters: Nova 2 Pro - License: open | Type: model - "Nova 2 Pro is Amazon's most intelligent reasoning model that can process text, images, video, and speech to generate text."
- **Mistral Large 3** (Mistral) — 2025-12-01 | Parameters: Mistral Large 3 - License: open | Type: model - 675BA41B. "Mistral Large 3 joins the ranks of frontier instruction-fine-tuned open-source models." EU tech doc: https://legal.cms.mistral.ai/assets/1e37fffd-7ea5-469b-822f-05dcfbb43623
- **DeepSeek-V3.2-Speciale** (DeepSeek-AI) — 2025-12-01 | Parameters: DeepSeek-V3.2-Speciale - License: open | Type: model - The word 'Speciale' may be a reference to Ferrari. "It shows gold-medal performance in the IOI 2025, ICPC World Final 2025, IMO 2025, and CMO 2025." API: https://api-docs.deepseek.com/news/news251201
- **DeepSeekMath-V2** (DeepSeek) — 2025-11-27 | Parameters: 685B - License: open | Type: model - AI model by DeepSeek
- **Claude Opus 4.5** (Anthropic) — 2025-11-24 - License: closed | Type: model - AI model by Anthropic
- **Olmo 3** (Allen Institute for AI (Ai2)) — 2025-11-20 | Parameters: 32B - License: open | Type: model - AI model by Allen Institute for AI (Ai2)
- **Grok 4.1 Fast** (xAI) — 2025-11-19 - License: closed | Type: model - AI model by xAI
- **GPT-5.1-Codex-Max** (OpenAI) — 2025-11-19 - License: closed | Type: model - AI model by OpenAI
- **Gemini 3 Pro** (Google DeepMind) — 2025-11-18 - License: closed | Type: model - AI model by Google DeepMind
- **Grok 4.1** (xAI) — 2025-11-17 - License: closed | Type: model - AI model by xAI
- **GPT-5.1** (OpenAI) — 2025-11-13 - License: closed | Type: model - AI model by OpenAI
- **GPT-5.1 Instant** (OpenAI) — 2025-11-13 - License: closed | Type: model - AI model by OpenAI
- **GPT-5.1-Codex** (OpenAI) — 2025-11-12 - License: closed | Type: model - AI model by OpenAI
- **Kimi K2 Thinking** (Moonshot) — 2025-11-06 | Parameters: 1T - License: open | Type: model - AI model by Moonshot
- **Gen-0** (Generalist) — 2025-11-04 | Parameters: 10B - License: closed | Type: model - AI model by Generalist
- **DeepSeek-Math-V2** (DeepSeek-AI) — 2025-11-01 | Parameters: DeepSeek-Math-V2 - License: open | Type: model - "DeepSeekMath-V2, demonstrates strong theorem-proving capabilities, achieving gold-level scores on IMO 2025 and CMO 2024
and a near-perfect 118/120 on Putnam 2024 with scaled testtime compute."
- **Orchestrator-8B** (NVIDIA) — 2025-11-01 | Parameters: Orchestrator-8B - License: open | Type: model - Base Model: Qwen3-8B
- **INTELLECT-3** (Prime Intellect) — 2025-11-01 | Parameters: INTELLECT-3 - License: open | Type: model - Base: GLM-4.5-Air-Base model. 106BA12B. Announce: https://www.primeintellect.ai/blog/intellect-3
- **Fara-7B** (Microsoft) — 2025-11-01 | Parameters: Fara-7B - License: open | Type: model - "Fara-7B is Microsoft's first agentic small language model (SLM) designed specifically for computer use. With only 7 billion parameters, Fara-7B is an ultra-compact Computer Use Agent (CUA)...Current production baselines leverage Qwen 2.5-VL (7B)."
- **Claude Opus 4.5** (Anthropic) — 2025-11-01 | Parameters: Claude Opus 4.5 - License: open | Type: model - "the best model in the world for coding, agents, and computer use." Announce: https://www.anthropic.com/news/claude-opus-4-5
- **Nemotron Elastic** (NVIDIA) — 2025-11-01 | Parameters: Nemotron Elastic - License: open | Type: model - "Nemotron Elastic, a framework for building reasoning-oriented LLMs, including hybrid Mamba-Attention architectures, that embed multiple nested submodels within a single parent model, each optimized for different deployment configurations and budgets. Each of these submodels shares weights with the parent model and can be extracted zero-shot during deployment without additional training or fine-tuning...We apply Nemotron Elastic to the Nemotron Nano V2 12B model, simultaneously producing a 9B and a 6B model using only 110B training tokens"
- **GeoVista** (Tencent) — 2025-11-01 | Parameters: GeoVista - License: open | Type: model - Base model: Qwen2.5-VL-7B-Instruct. "GeoVista, an agentic model that seamlessly integrates tool invocation within the reasoning loop, including an image-zoom-in tool to magnify regions of interest and a web-search tool to retrieve related web information." Project page: https://ekonwang.github.io/geo-vista/
- **OLMo 3** (Allen AI) — 2025-11-01 | Parameters: OLMo 3 - License: open | Type: model - Announce: https://allenai.org/blog/olmo3
- **Gemini 3 Pro** (Google DeepMind) — 2025-11-01 | Parameters: Gemini 3 Pro - License: open | Type: model - "The knowledge cutoff date for Gemini 3 Pro was January 2025."
- **Grok 4.1** (xAI) — 2025-11-01 | Parameters: Grok 4.1 - License: open | Type: model
- **Baguettotron** (PleIAs) — 2025-11-01 | Parameters: Baguettotron - License: open | Type: model - "The name is both a nod to French origins and to the unusual shape of the model: with 80 layers, Baguettotron is currently the deepest SLM in its size range."
- **ERNIE-5.0-Preview-1022** (Baidu) — 2025-11-01 | Parameters: ERNIE-5.0-Preview-1022 - License: open | Type: model - Very low performance on ALPrompt. 2.4T params confirmed: https://global.chinadaily.com.cn/a/202511/13/WS691571bda310d6866eb29500.html
- **GPT-5.1** (OpenAI) — 2025-11-01 | Parameters: GPT-5.1 - License: open | Type: model - Personality change via fine-tuning. GPQA (no tools) increased from GPT-5=85.7 to GPT-5.1=88.1. MMLU is for Spanish.
- **TiDAR** (NVIDIA) — 2025-11-01 | Parameters: TiDAR - License: open | Type: model - Base model: Qwen3-8B (36T) + 150B continual training. "TiDAR, a sequence-level hybrid architecture that drafts tokens (Thinking) in Diffusion and samples final outputs (Talking) AutoRegressively - all within a single forward pass using specially designed structured attention masks"
- **SONIC** (NVIDIA) — 2025-11-01 | Parameters: SONIC - License: open | Type: model - Supersizing mOtion tracking for Natural humanoId Control (SONIC). Training dataset calcs: (700 hours * 3,600 seconds/hour * 50 frames/second) / 1 frame/token = 126M tokens (refined to 100M+ after rigorous filtering); 150,000 steps * 6.67M tokens/batch = 1.0T total tokens seen during training.
- **JustRL-Nemotron-1.5B** (Tsinghua) — 2025-11-01 | Parameters: JustRL-Nemotron-1.5B - License: open | Type: model - "JustRL, a simple recipe with fixed hyperparameters, achieves state-of-the-art performance on two different 1.5B base models (54.5% and 64.3% across 9 math benchmarks) while using 2× less compute than sophisticated approaches. The same hyperparameters transfer across both models without tuning, and training remains stable over thousands of steps without intervention. This suggests the field may be adding complexity to solve problems that disappear with a stable, scaled-up baseline."
- **ERNIE-4.5-VL-28B-A3B-Thinking** (Baidu) — 2025-11-01 | Parameters: ERNIE-4.5-VL-28B-A3B-Thinking - License: open | Type: model - 28B-A3B. Open-sourced 12/Nov/2025 from Jun/2025 release.
- **HOPE** (Google DeepMind) — 2025-11-01 | Parameters: HOPE - License: partial | Type: model - "Combining our self-modifying sequence model with the continuum memory system, we present a learning module, called HOPE, showing promising results in language modeling, continual learning, and long-context reasoning tasks." Announce: https://research.google/blog/introducing-nested-learning-a-new-ml-paradigm-for-continual-learning/ May be released after paper is public.
- **Kimi K2 Thinking** (Moonshot AI) — 2025-11-01 | Parameters: Kimi K2 Thinking - License: open | Type: model - 1TA32B. 1T parameters and 384 experts. Open source SOTA. HLE=51.0 on text-only subset, compare to Grok-4 HLE=50.7 also on text-only, but Grok-4 HLE=44.4 on HLE full, ∴ Kimi K2 Thinking HLE≈44 full (estimated).
- **Ling-1T** (Inclusion AI) — 2025-11-01 | Parameters: Ling-1T - License: open | Type: model - 1TA50B.
- **GEN-0** (Generalist) — 2025-11-01 | Parameters: GEN-0 - License: partial | Type: model - "GEN-0, a new class of embodied foundation models built for multimodal training directly on high-fidelity raw physical interaction.
Its architecture builds on the strengths of vision and language models while also going beyond them—natively designed to capture human-level reflexes and physical commonsense. One core feature is Harmonic Reasoning, in which the models are trained to simultaneously think and act seamlessly... GEN-0 is pretrained on our in-house robotics dataset, which includes over 270,000 hours of real-world diverse manipulation data, growing at a rate of 10,000 hours a week and accelerating."
- **Emu3.5** (Beijing Academy of Artificial Intelligence / BAAI) — 2025-10-30 | Parameters: 34.1B - License: open | Type: model - AI model by Beijing Academy of Artificial Intelligence / BAAI
- **Kimi Linear** (Moonshot) — 2025-10-30 | Parameters: 48B - License: open | Type: model - AI model by Moonshot
- **Composer** (Cursor) — 2025-10-29 - License: closed | Type: model - AI model by Cursor
- **SWE-1.5** (Cognition) — 2025-10-29 | Parameters: 300B - License: closed | Type: model - AI model by Cognition
- **Tongyi DeepResearch** (Alibaba) — 2025-10-28 | Parameters: 30.5B - License: open | Type: model - AI model by Alibaba
- **MiniMax-M2** (MiniMax) — 2025-10-27 | Parameters: 229B - License: open | Type: model - AI model by MiniMax
- **LoongRL 7B** (Microsoft Research Asia, Shanghai Jiao Tong University, Carnegie Mellon University (CMU)) — 2025-10-27 | Parameters: 7B - License: closed | Type: model - AI model by Microsoft Research Asia, Shanghai Jiao Tong University, Carnegie Mellon University (CMU)
- **LoongRL 14B** (Microsoft Research Asia, Shanghai Jiao Tong University, Carnegie Mellon University (CMU)) — 2025-10-27 | Parameters: 14B - License: closed | Type: model - AI model by Microsoft Research Asia, Shanghai Jiao Tong University, Carnegie Mellon University (CMU)
- **Lapa LLM** (Ukrainian Catholic University, Igor Sikorsky Kyiv Polytechnic Institute, AGH University of Krakow, Lviv Polytechnic) — 2025-10-25 | Parameters: 12B - License: open | Type: model - AI model by Ukrainian Catholic
University, Igor Sikorsky Kyiv Polytechnic Institute, AGH University of Krakow, Lviv Polytechnic
- **Ring-mini-linear-2.0** (Ant Group) — 2025-10-23 | Parameters: 16.4B - License: open | Type: model - AI model by Ant Group
- **Ring-flash-linear-2.0** (Ant Group) — 2025-10-23 | Parameters: 104.2B - License: open | Type: model - AI model by Ant Group
- **Deepseek OCR** (DeepSeek) — 2025-10-21 | Parameters: 3B - License: open | Type: model - AI model by DeepSeek
- **BAPO 32B** (Fudan University, Shanghai Qiji Zhifeng) — 2025-10-21 | Parameters: 32B - License: closed | Type: model - AI model by Fudan University, Shanghai Qiji Zhifeng
- **Odyssey 102B** (Anthrogen) — 2025-10-18 | Parameters: 102B - License: closed | Type: model - AI model by Anthrogen
- **Odyssey 12B** (Anthrogen) — 2025-10-18 | Parameters: 12B - License: closed | Type: model - AI model by Anthrogen
- **Odyssey 1.2B** (Anthrogen) — 2025-10-18 | Parameters: 1.2B - License: closed | Type: model - AI model by Anthrogen
- **Claude Haiku 4.5** (Anthropic) — 2025-10-15 - License: closed | Type: model - AI model by Anthropic
- **Veo 3.1** (Google DeepMind) — 2025-10-15 - License: closed | Type: model - AI model by Google DeepMind
- **Llama 4 Scout + ScaleRL** (Meta AI) — 2025-10-15 | Parameters: 109B - License: closed | Type: model - AI model by Meta AI
- **MAI-Image-1** (Microsoft) — 2025-10-13 - License: closed | Type: model - AI model by Microsoft
- **C2S-Scale** (Google Research, Yale University, Google DeepMind, Brown University, University of Southern California) — 2025-10-11 | Parameters: 27B - License: open | Type: model - AI model by Google Research, Yale University, Google DeepMind, Brown University, University of Southern California
- **Ring-1T** (Ant Group) — 2025-10-10 | Parameters: 1T - License: open | Type: model - AI model by Ant Group
- **Ling-1T** (Ant Group) — 2025-10-10 | Parameters: 1T - License: open | Type: model - AI model by Ant Group
- **Grok Imagine** (xAI) — 2025-10-08 - License: closed | Type:
model - AI model by xAI
- **GPT-5 Pro** (OpenAI) — 2025-10-07 - License: closed | Type: model - AI model by OpenAI
- **Gemini 2.5 Computer Use** (Google) — 2025-10-07 - License: closed | Type: model - AI model by Google
- **Tiny Recursive Model (TRM-Att)** (Samsung SAIT AI Lab) — 2025-10-06 | Parameters: 7M - License: closed | Type: model - AI model by Samsung SAIT AI Lab
- **Granite-4.0-H-Tiny** (IBM) — 2025-10-02 | Parameters: 7B - License: open | Type: model - AI model by IBM
- **Granite-4.0-H-Micro** (IBM) — 2025-10-02 | Parameters: 3B - License: open | Type: model - AI model by IBM
- **Granite-4.0-H-Small** (IBM) — 2025-10-02 | Parameters: 32B - License: open | Type: model - AI model by IBM
- **CALM** (Wechat) — 2025-10-01 | Parameters: CALM - License: open | Type: model - "Continuous Autoregressive Language Models (CALM), a paradigm shift from discrete next-token prediction to continuous next-vector prediction. CALM uses a high-fidelity autoencoder to compress a chunk of K tokens into a single continuous vector, from which the original tokens can be reconstructed with over 99.9% accuracy... We train our models on the Pile uncopyrighted dataset (Gao et al., 2020). The raw text is processed with the Llama 3 tokenizer (Grattafiori et al., 2024), resulting in a training set of ∼230B tokens."
- **Kimi-Linear** (Moonshot AI) — 2025-10-01 | Parameters: Kimi-Linear - License: open | Type: model - 48B-A3B. "Kimi Linear is a hybrid linear attention architecture that outperforms traditional full attention methods across various contexts, including short, long, and reinforcement learning (RL) scaling regimes. At its core is Kimi Delta Attention (KDA)—a refined version of Gated DeltaNet that introduces a more efficient gating mechanism to optimize the use of finite-state RNN memory."
- **MiniMax-M2** (MiniMax) — 2025-10-01 | Parameters: MiniMax-M2 - License: open | Type: model - 230B-A10B.
- **MACE-MH-1** (Cambridge/LBNL) — 2025-10-01 | Parameters: MACE-MH-1 - License: open | Type: model - MACE-MH-1 (Multi-Head 1). Features Multiple Heads (OMAT PBE, OMOL r2scan, OC20) to maintain high accuracy across domains - **DeepSeek-OCR** (DeepSeek-AI) — 2025-10-01 | Parameters: DeepSeek-OCR - License: open | Type: model - 2D vision tokens for 1D text achieves huge compression. Encoder/Decoder: DeepEncoder 380M (80M SAM-base + 300M CLIP-large), DeepSeek-3B-MoE (A570M). - **UserLM-8b** (Microsoft) — 2025-10-01 | Parameters: UserLM-8b - License: open | Type: model - "we trained UserLM-8b to simulate the “user” role in conversation (by training it to predict user turns in a large corpus of conversations called WildChat)." - **CoDA** (Salesforce) — 2025-10-01 | Parameters: CoDA - License: open | Type: model - "diffusion coder trained on TPU [Google TPU v4-1024 VM]" - **TRM** (Samsung) — 2025-10-01 | Parameters: TRM - License: open | Type: model - "Tiny Recursive Model (TRM), a much simpler recursive reasoning approach that achieves significantly higher generalization than HRM, while using a single tiny network with only 2 layers" - **Granite-4.0 Small** (IBM) — 2025-10-01 | Parameters: Granite-4.0 Small - License: open | Type: model - 32B-A9B. 
Announce: https://www.ibm.com/new/announcements/ibm-granite-4-0-hyper-efficient-high-performance-hybrid-models - **Octave 2** (Hume) — 2025-10-01 - License: closed | Type: model - AI model by Hume - **EVI 4 mini** (Hume) — 2025-10-01 - License: closed | Type: model - AI model by Hume - **GLM-4.6** (Z.ai (Zhipu AI),Tsinghua University) — 2025-09-30 | Parameters: 357B - License: open | Type: model - AI model by Z.ai (Zhipu AI),Tsinghua University - **Sora 2.0** (OpenAI) — 2025-09-30 - License: closed | Type: model - AI model by OpenAI - **Kandinsky 5.0 Video Lite** (Sber) — 2025-09-30 | Parameters: 2B - License: open | Type: model - AI model by Sber - **Claude Sonnet 4.5** (Anthropic) — 2025-09-29 - License: closed | Type: model - AI model by Anthropic - **NVIDIA Isaac GR00T N1.6** (NVIDIA) — 2025-09-29 - License: closed | Type: model - AI model by NVIDIA - **Cosmos-Transfer2.5-2B** (NVIDIA) — 2025-09-29 | Parameters: 2B - License: open | Type: model - AI model by NVIDIA - **Cosmos-Predict2.5-14B** (NVIDIA) — 2025-09-29 | Parameters: 14B - License: closed | Type: model - AI model by NVIDIA - **Cosmos-Predict2.5 2B** (NVIDIA) — 2025-09-29 | Parameters: 2B - License: open | Type: model - AI model by NVIDIA - **DeepSeek-V3.2-Exp** (DeepSeek) — 2025-09-29 | Parameters: 671B - License: open | Type: model - AI model by DeepSeek - **MinerU2.5** (Shanghai AI Lab,Peking University,Shanghai Jiao Tong University) — 2025-09-29 | Parameters: 1.2B - License: open | Type: model - AI model by Shanghai AI Lab,Peking University,Shanghai Jiao Tong University - **Wan 2.5** (Alibaba) — 2025-09-29 - License: closed | Type: model - AI model by Alibaba - **Seedream 4.0** (ByteDance) — 2025-09-28 - License: closed | Type: model - AI model by ByteDance - **Kling 2.5 Turbo** (Kuaishou Technology) — 2025-09-26 - License: closed | Type: model - AI model by Kuaishou Technology - **Suno v5** (Suno) — 2025-09-25 - License: closed | Type: model - AI model by Suno - **Gemini Robotics 1.5** (Google 
DeepMind) — 2025-09-25 - License: closed | Type: model - AI model by Google DeepMind - **Gemini Robotics-ER 1.5** (Google DeepMind) — 2025-09-25 - License: closed | Type: model - AI model by Google DeepMind - **GigaEmbeddings** (Sber,Moscow Institute of Physics and Technology) — 2025-09-25 | Parameters: 3B - License: open | Type: model - AI model by Sber,Moscow Institute of Physics and Technology - **SimpleFold** (Apple) — 2025-09-23 | Parameters: 3B - License: open | Type: model - AI model by Apple - **DeepSeek-V3.1-Terminus** (DeepSeek) — 2025-09-22 | Parameters: 671B - License: open | Type: model - AI model by DeepSeek - **Qwen3-Omni-Flash** (Alibaba) — 2025-09-22 | Parameters: 35.3B - License: closed | Type: model - AI model by Alibaba - **Qwen3-Omni-30B-A3B** (Alibaba) — 2025-09-22 | Parameters: 35.3B - License: open | Type: model - AI model by Alibaba - **Grok 4 Fast** (xAI) — 2025-09-19 - License: closed | Type: model - AI model by xAI - **Magistral Small 1.2** (Mistral AI) — 2025-09-18 | Parameters: 24B - License: open | Type: model - AI model by Mistral AI - **Magistral Medium 1.2** (Mistral AI) — 2025-09-18 - License: closed | Type: model - AI model by Mistral AI - **Granite-Docling** (IBM) — 2025-09-17 | Parameters: 258M - License: open | Type: model - AI model by IBM - **AgentFounder-30B** (Alibaba) — 2025-09-16 | Parameters: 30B - License: open | Type: model - AI model by Alibaba - **Fabric 1.0** (Veed) — 2025-09-15 - License: closed | Type: model - AI model by Veed - **GPT‑5-Codex** (OpenAI) — 2025-09-15 - License: closed | Type: model - AI model by OpenAI - **Qwen3-Next-80B-A3B** (Alibaba) — 2025-09-10 | Parameters: 80B - License: open | Type: model - AI model by Alibaba - **Lucid Origin** (Leonardo AI) — 2025-09-10 - License: closed | Type: model - AI model by Leonardo AI - **Ling-mini-base-2.0-20T** (Ant Group) — 2025-09-10 | Parameters: 16B - License: open | Type: model - AI model by Ant Group - **Ling-flash-base-2.0-20T** (Ant Group) — 2025-09-10 
| Parameters: 100B - License: open | Type: model - AI model by Ant Group - **K2 Think** (Mohamed bin Zayed University of Artificial Intelligence (MBZUAI),G42) — 2025-09-09 | Parameters: 32B - License: open | Type: model - AI model by Mohamed bin Zayed University of Artificial Intelligence (MBZUAI),G42 - **Signal Processing Transformer** (Softbank) — 2025-09-09 - License: closed | Type: model - AI model by Softbank - **Qwen3-Max** (Alibaba) — 2025-09-05 | Parameters: 1T - License: closed | Type: model - AI model by Alibaba - **EmbeddingGemma** (Google DeepMind) — 2025-09-05 | Parameters: 308M - License: open | Type: model - AI model by Google DeepMind - **Chatterbox Multilingual** (Resemble AI) — 2025-09-04 - License: open | Type: model - AI model by Resemble AI - **Apertus 8B** (ETH Zurich,École Polytechnique Fédérale de Lausanne (EPFL),Swiss National Supercomputing Centre (CSCS),Swisscom) — 2025-09-02 | Parameters: 8B - License: open | Type: model - AI model by ETH Zurich,École Polytechnique Fédérale de Lausanne (EPFL),Swiss National Supercomputing Centre (CSCS),Swisscom - **Apertus 70B** (ETH Zurich,École Polytechnique Fédérale de Lausanne (EPFL),Swiss National Supercomputing Centre (CSCS),Swisscom) — 2025-09-02 | Parameters: 70B - License: open | Type: model - AI model by ETH Zurich,École Polytechnique Fédérale de Lausanne (EPFL),Swiss National Supercomputing Centre (CSCS),Swisscom - **GLM-4.6** (Z.AI) — 2025-09-01 | Parameters: GLM-4.6 - License: open | Type: model - 355B-A32B. "context window has been expanded from 128K to 200K tokens" - **Ring-1T-preview** (InclusionAI) — 2025-09-01 | Parameters: Ring-1T-preview - License: open | Type: model - 1T-A48.5B. - **Claude Sonnet 4.5** (Anthropic) — 2025-09-01 | Parameters: Claude Sonnet 4.5 - License: open | Type: model - The Claude Sonnet 4.5 "system card" is an absolute farce. 
Announce: https://www.anthropic.com/news/claude-sonnet-4-5 - **Gemini Robotics 1.5** (Google DeepMind) — 2025-09-01 | Parameters: Gemini Robotics 1.5 - License: open | Type: model - 2. "vision-language-action (VLA) model turns visual information and instructions into motor commands for a robot to perform a task." Available to select partners. - **Gemini Robotics-ER 1.5** (Google DeepMind) — 2025-09-01 | Parameters: Gemini Robotics-ER 1.5 - License: open | Type: model - 1. "vision-language model (VLM) reasons about the physical world, natively calls digital tools and creates detailed, multi-step plans to complete a mission." Available to all devs. - **TimesFM-ICF** (Google) — 2025-09-01 | Parameters: TimesFM-ICF - License: closed | Type: model - TimesFM-ICF is 6.8% more accurate than TimesFM (Base). Time-series forecasting only. 'a large pretraining corpus of 100B real world time-points' may be more than 100B tokens. - **Qwen3-Max** (Alibaba) — 2025-09-01 | Parameters: Qwen3-Max - License: open | Type: model - "Qwen3-Max-Thinking — still under active training — is already demonstrating remarkable potential. When augmented with tool usage and scaled test-time compute, the Thinking variant has achieved 100% on challenging reasoning benchmarks such as AIME 25 and HMMT. " - **Qwen3-Omni** (Alibaba) — 2025-09-01 | Parameters: Qwen3-Omni - License: open | Type: model - "Qwen3-Omni is a unified end-to-end model capable of processing multiple modalities, such as text, audio, image and video, and generating real-time text or speech response."... "pretraining utilizes a large-scale dataset containing approximately 2 trillion tokens, with the following distribution across modalities: text (0.57 trillion), audio (0.77 trillion), image (0.82 trillion), video (0.05 trillion), and video-audio (0.05 trillion)." - **DeepSeek-V3.1-Terminus** (DeepSeek-AI) — 2025-09-01 | Parameters: DeepSeek-V3.1-Terminus - License: open | Type: model - Hybrid reasoning. 
Dataset tokens: https://x.com/deepseek_ai/status/1958417072536608952 HLE: https://x.com/deepseek_ai/status/1958417068568481854/photo/2 - **Isaac 0.1** (Perceptron) — 2025-09-01 | Parameters: Isaac 0.1 - License: open | Type: model - "perceptive-language model...delivering capabilities that meet or exceed those of models over 50 times its size. Founded by the team behind Meta's Chameleon multimodal models, Perceptron is tackling a fundamental challenge: bringing the power of physical AI to the dynamic, multimodal, and real-time environments we live and work in." - **Grok 4 Fast** (xAI) — 2025-09-01 | Parameters: Grok 4 Fast - License: open | Type: model - "2M token context window, and a unified architecture that blends reasoning and non-reasoning modes in one model." - **VaultGemma** (Google DeepMind) — 2025-09-01 | Parameters: VaultGemma - License: open | Type: model - "Differential Privacy (DP) has emerged as the gold standard, providing a rigorous, mathematical framework to limit the influence of any single example in the training data on the resulting model. A model trained with DP provably bounds the reconstruction or leakage of information tied to individual data points." Announce: https://research.google/blog/vaultgemma-the-worlds-most-capable-differentially-private-llm/ - **Qwen3-Next-80B-A3B** (Alibaba) — 2025-09-01 | Parameters: Qwen3-Next-80B-A3B - License: open | Type: model - "Qwen3-Next introduces several key improvements: a hybrid attention mechanism, a highly sparse Mixture-of-Experts (MoE) structure, training-stability-friendly optimizations, and a multi-token prediction mechanism for faster inference." - **K2-Think** (MBZUAI) — 2025-09-01 | Parameters: K2-Think - License: open | Type: model - "Built on the Qwen2.5 base model, our system shows that smaller models can compete at the highest levels by combining advanced post-training and test-time computation techniques. 
The approach is based on six key technical pillars: Long Chain-of-thought Supervised Finetuning, Reinforcement Learning with Verifiable Rewards (RLVR), Agentic planning prior to reasoning, Test-time Scaling, Speculative Decoding, and Inference-optimized Hardware, all using publicly available open-source datasets." - **mmBERT** (JHU) — 2025-09-01 | Parameters: mmBERT - License: open | Type: model - "a modern multilingual encoder trained on 3T tokens and 1833 languages. We introduce several novel elements in training: an inverse masking schedule and a cascading annealed language learning schedule for multilingual data" Announce: https://huggingface.co/blog/mmbert - **ERNIE X1.1** (Baidu) — 2025-09-01 | Parameters: ERNIE X1.1 - License: open | Type: model - - **ERNIE-4.5-21B-A3B-Thinking** (Baidu) — 2025-09-01 | Parameters: ERNIE-4.5-21B-A3B-Thinking - License: open | Type: model - - **Klear-46B-A2.5B** (Kuaishou) — 2025-09-01 | Parameters: Klear-46B-A2.5B - License: open | Type: model - 46B-A2.5B. - **TildeOpen-30b** (Tilde AI) — 2025-09-01 | Parameters: TildeOpen-30b - License: open | Type: model - "language data from across Europe" - **Qwen3-Max-Preview** (Alibaba) — 2025-09-01 | Parameters: Qwen3-Max-Preview - License: open | Type: model - GPQA score is SuperGPQA. "our biggest model yet, with over 1 trillion parameters" - **Kimi K2-Instruct-0905** (Moonshot AI) — 2025-09-01 | Parameters: Kimi K2-Instruct-0905 - License: open | Type: model - 1TA32B. 1T parameters and 384 experts. Open source SOTA. - **Apertus** (ETH Zürich) — 2025-09-01 | Parameters: Apertus - License: open | Type: model - "Apertus – Latin for “open”" 1,811 languages. Announce: https://ethz.ch/en/news-and-events/eth-news/news/2025/09/press-release-apertus-a-fully-open-transparent-multilingual-language-model.html - **LongCat-Flash** (Meituan) — 2025-09-01 | Parameters: LongCat-Flash - License: open | Type: model - 560B-A18.6B–31.3B (27B on average). 
Announce: https://lmsys.org/blog/2025-09-01-sglang-longcat-flash/ - **Baichuan-M2** (Baichuan) — 2025-09-01 | Parameters: Baichuan-M2 - License: open | Type: model - Base: Qwen2.5. "medical augmented reasoning model" - **LongCat-Flash** (Meituan Inc) — 2025-09-01 | Parameters: 560B - License: open | Type: model - AI model by Meituan Inc - **MultiverSeg** (Massachusetts Institute of Technology (MIT),Databricks) — 2025-08-31 - License: open | Type: model - AI model by Massachusetts Institute of Technology (MIT),Databricks - **gpt-realtime** (OpenAI) — 2025-08-28 - License: closed | Type: model - AI model by OpenAI - **MAI-Voice-1** (Microsoft) — 2025-08-28 - License: closed | Type: model - AI model by Microsoft - **Grok Code Fast 1** (xAI) — 2025-08-28 - License: closed | Type: model - AI model by xAI - **Wan 2.2 14B S2V** (Alibaba) — 2025-08-26 | Parameters: 27B - License: open | Type: model - AI model by Alibaba - **Gemini 2.5 Flash Image (Nano Banana)** (Google) — 2025-08-26 - License: closed | Type: model - AI model by Google - **YandexGPT 5.1 Pro** (Yandex) — 2025-08-25 - License: closed | Type: model - AI model by Yandex - **DeepSeek-V3.1** (DeepSeek) — 2025-08-21 | Parameters: 671B - License: open | Type: model - AI model by DeepSeek - **Seed-OSS-36B-Base** (ByteDance) — 2025-08-21 - License: open | Type: model - AI model by ByteDance - **Cohere Command A Reasoning** (Cohere) — 2025-08-21 | Parameters: 111B - License: open | Type: model - AI model by Cohere - **Teuken 7B** (OpenGPT-X,Fraunhofer Institute for Algorithms and Scientific Computing,Forschungszentrum Julich,Technische Universität Dresden) — 2025-08-21 | Parameters: 7B - License: open | Type: model - AI model by OpenGPT-X,Fraunhofer Institute for Algorithms and Scientific Computing,Forschungszentrum Julich,Technische Universität Dresden - **Surya** (NASA,University of Alabama,IBM Research) — 2025-08-20 | Parameters: 366M - License: open | Type: model - AI model by NASA,University of Alabama,IBM 
Research - **Gemma-SEA-LION-v4-27B-IT** (AI Singapore) — 2025-08-20 | Parameters: 27B - License: open | Type: model - AI model by AI Singapore - **FlowER** (Massachusetts Institute of Technology (MIT)) — 2025-08-20 | Parameters: 7B - License: open | Type: model - AI model by Massachusetts Institute of Technology (MIT) - **MolmoAct-7B-D** (Allen Institute for AI,University of Washington) — 2025-08-19 | Parameters: 7B - License: open | Type: model - AI model by Allen Institute for AI,University of Washington - **NVIDIA-Nemotron-Nano-9B-v2** (NVIDIA) — 2025-08-18 | Parameters: 9B - License: open | Type: model - AI model by NVIDIA - **NVIDIA-Nemotron-Nano-12B-v2** (NVIDIA) — 2025-08-18 | Parameters: 12B - License: open | Type: model - AI model by NVIDIA - **Qwen Image Edit** (Alibaba) — 2025-08-18 | Parameters: 27B - License: open | Type: model - AI model by Alibaba - **Ovis2.5 9B** (Alibaba) — 2025-08-15 | Parameters: 9B - License: open | Type: model - AI model by Alibaba - **Ovis2.5 2B** (Alibaba) — 2025-08-15 | Parameters: 2B - License: open | Type: model - AI model by Alibaba - **GLM-4.5V** (Z.ai (Zhipu AI),Tsinghua University) — 2025-08-15 | Parameters: 108B - License: open | Type: model - AI model by Z.ai (Zhipu AI),Tsinghua University - **GLM-4.1V-Thinking** (Z.ai (Zhipu AI),Tsinghua University) — 2025-08-15 | Parameters: 9B - License: open | Type: model - AI model by Z.ai (Zhipu AI),Tsinghua University - **Imagen 4 Fast** (Google) — 2025-08-15 - License: closed | Type: model - AI model by Google - **Canary 1B v2** (NVIDIA) — 2025-08-14 | Parameters: 1B - License: open | Type: model - AI model by NVIDIA - **Parakeet-tdt-0.6b-v3** (NVIDIA) — 2025-08-14 | Parameters: 600M - License: open | Type: model - AI model by NVIDIA - **Gemma 3 270M** (Google DeepMind) — 2025-08-14 | Parameters: 270M - License: open | Type: model - AI model by Google DeepMind - **Marey Realism v1.5** (Moonvalley) — 2025-08-14 - License: closed | Type: model - AI model by Moonvalley - 
**GPT-5** (OpenAI) — 2025-08-07 - License: closed | Type: model - AI model by OpenAI - **GPT-5 mini** (OpenAI) — 2025-08-07 - License: closed | Type: model - AI model by OpenAI - **GPT-5 nano** (OpenAI) — 2025-08-07 - License: closed | Type: model - AI model by OpenAI - **Ideogram Character** (Ideogram) — 2025-08-07 - License: closed | Type: model - AI model by Ideogram - **Claude Opus 4.1** (Anthropic) — 2025-08-05 - License: closed | Type: model - AI model by Anthropic - **gpt-oss-120b** (OpenAI) — 2025-08-05 | Parameters: 116.8B - License: open | Type: model - AI model by OpenAI - **gpt-oss-20b** (OpenAI) — 2025-08-05 | Parameters: 20.9B - License: open | Type: model - AI model by OpenAI - **GLM-4.5** (Z.ai (Zhipu AI),Tsinghua University) — 2025-08-05 | Parameters: 355B - License: open | Type: model - AI model by Z.ai (Zhipu AI),Tsinghua University - **Genie 3** (Google DeepMind) — 2025-08-05 - License: closed | Type: model - AI model by Google DeepMind - **GLM-4.5-Air** (Z.ai (Zhipu AI),Tsinghua University) — 2025-08-05 | Parameters: 106B - License: open | Type: model - AI model by Z.ai (Zhipu AI),Tsinghua University - **Qwen Image** (Alibaba) — 2025-08-04 | Parameters: 27B - License: open | Type: model - AI model by Alibaba - **Hierarchical Reasoning Model (HRM)** (Sapient Intelligence) — 2025-08-04 | Parameters: 27M - License: closed | Type: model - AI model by Sapient Intelligence - **MAI-1-preview** (Microsoft) — 2025-08-01 | Parameters: MAI-1-preview - License: open | Type: model - MAI=Microsoft artificial intelligence. "MAI’s first foundation model trained end-to-end... MAI-1-preview is an in-house mixture-of-experts model, pre-trained and post-trained on ~15,000 NVIDIA H100 GPUs. This model is designed to provide powerful capabilities to consumers seeking to benefit from models that specialize in following instructions and providing helpful responses to everyday queries. 
We will be rolling MAI-1-preview out for certain text use cases within Copilot" - **grok-code-fast-1** (xAI) — 2025-08-01 | Parameters: grok-code-fast-1 - License: open | Type: model - "We built grok-code-fast-1 from scratch, starting with a brand-new model architecture. To lay a robust foundation, we carefully assembled a pre-training corpus rich with programming-related content. For post-training, we curated high-quality datasets that reflect real-world pull requests and coding tasks." Announce: https://x.ai/news/grok-code-fast-1 - **Hermes 4** (Nous Research) — 2025-08-01 | Parameters: Hermes 4 - License: open | Type: model - Based on Llama 3. Announce: https://hermes4.nousresearch.com/ - **Jet-Nemotron-4B** (NVIDIA) — 2025-08-01 | Parameters: Jet-Nemotron-4B - License: open | Type: model - "pre-training corpus and train Jet-Nemotron models for 50B tokens. This is also the setting in Section 2 where we perform PostNAS. At the second stage, we include more high-quality data from math [65] and coding [66, 67] domains into our data mixture. The models are then trained on 350B tokens." - **DeepSeek-V3.1-Base** (DeepSeek-AI) — 2025-08-01 | Parameters: DeepSeek-V3.1-Base - License: open | Type: model - Hybrid reasoning. Dataset tokens: https://x.com/deepseek_ai/status/1958417072536608952 HLE: https://x.com/deepseek_ai/status/1958417068568481854/photo/2 - **Nemotron Nano 2** (NVIDIA) — 2025-08-01 | Parameters: Nemotron Nano 2 - License: open | Type: model - Announce: https://research.nvidia.com/labs/adlr/NVIDIA-Nemotron-Nano-2/ - **Gemma 3 270M** (Google DeepMind) — 2025-08-01 | Parameters: Gemma 3 270M - License: open | Type: model - - **GPT-5** (OpenAI) — 2025-08-01 | Parameters: GPT-5 - License: open | Type: model - Announce: https://openai.com/index/introducing-gpt-5/. MMLU is based on ES and PT translated from EN. 
- **gpt-oss-120b** (OpenAI) — 2025-08-01 | Parameters: gpt-oss-120b - License: open | Type: model - 116.8B total parameters and 5.1B “active” parameters per token per forward pass. https://openai.com/index/introducing-gpt-oss/ - **gpt-oss-20b** (OpenAI) — 2025-08-01 | Parameters: gpt-oss-20b - License: open | Type: model - 20.9B total and 3.6B active parameters. https://openai.com/index/introducing-gpt-oss/ - **Claude Opus 4.1** (Anthropic) — 2025-08-01 | Parameters: Claude Opus 4.1 - License: open | Type: model - - **rStar-Math (Qwen2.5-Math-7B base)** (Microsoft Research Asia,Peking University,Tsinghua University) — 2025-08-01 | Parameters: 7B - License: closed | Type: model - AI model by Microsoft Research Asia,Peking University,Tsinghua University - **rStar-Math (Qwen2-Math-7B base)** (Microsoft Research Asia,Peking University,Tsinghua University) — 2025-08-01 | Parameters: 7B - License: closed | Type: model - AI model by Microsoft Research Asia,Peking University,Tsinghua University - **MindLink-72B** (Kunlun Inc.) — 2025-08-01 | Parameters: 72B - License: closed | Type: model - AI model by Kunlun Inc. - **MindLink-32B** (Kunlun Inc.) — 2025-08-01 | Parameters: 32B - License: closed | Type: model - AI model by Kunlun Inc. 
- **Gemini 2.5 Deep Think** (Google,Google DeepMind) — 2025-08-01 - License: closed | Type: model - AI model by Google,Google DeepMind - **Tri-21B** (Trillion Labs) — 2025-08-01 | Parameters: 20.7B - License: open | Type: model - AI model by Trillion Labs - **Veo 3 Fast** (Google DeepMind) — 2025-07-31 - License: closed | Type: model - AI model by Google DeepMind - **Command A Vision** (Cohere) — 2025-07-31 | Parameters: 112B - License: open | Type: model - AI model by Cohere - **AlphaEarth Foundations (AEF)** (Google DeepMind,Google) — 2025-07-30 | Parameters: 480M - License: closed | Type: model - AI model by Google DeepMind,Google - **Llama Nemotron Super v1.5** (NVIDIA) — 2025-07-29 | Parameters: 49B - License: open | Type: model - AI model by NVIDIA - **Wan 2.2 14B T2V** (Alibaba) — 2025-07-28 | Parameters: 14B - License: open | Type: model - AI model by Alibaba - **Wan 2.2 14B I2V** (Alibaba) — 2025-07-28 | Parameters: 14B - License: open | Type: model - AI model by Alibaba - **Agentar-Fin-R1 32B** (Ant Group) — 2025-07-27 | Parameters: 32B - License: closed | Type: model - AI model by Ant Group - **Agentar-Fin-R1 8B** (Ant Group) — 2025-07-27 | Parameters: 8B - License: closed | Type: model - AI model by Ant Group - **Qwen3-235B-A22B-Thinking (Jul 2025)** (Alibaba) — 2025-07-25 | Parameters: 235B - License: open | Type: model - AI model by Alibaba - **Qwen3-235B-A22B (Jul 2025)** (Alibaba) — 2025-07-25 | Parameters: 235B - License: open | Type: model - AI model by Alibaba - **Seed Prover** (ByteDance) — 2025-07-23 - License: closed | Type: model - AI model by ByteDance - **Aeneas** (Google DeepMind,University of Nottingham,University of Warwick,Athens University of Economics and Business,Google,University of Oxford) — 2025-07-23 - License: closed | Type: model - AI model by Google DeepMind,University of Nottingham,University of Warwick,Athens University of Economics and Business,Google,University of Oxford - **Qwen3-Coder-480B-A35B** (Alibaba) — 2025-07-22 | 
Parameters: 480B - License: open | Type: model - AI model by Alibaba - **T-Pro 2.0** (T-Bank) — 2025-07-18 | Parameters: 32B - License: open | Type: model - AI model by T-Bank - **ChatGPT agent** (OpenAI) — 2025-07-17 - License: closed | Type: model - AI model by OpenAI - **OpenReasoning-Nemotron-32B** (NVIDIA) — 2025-07-16 | Parameters: 32B - License: open | Type: model - AI model by NVIDIA - **EXAONE 4.0 (32B)** (LG AI Research) — 2025-07-15 | Parameters: 32B - License: open | Type: model - AI model by LG AI Research - **EXAONE 4.0 (1.2B)** (LG AI Research) — 2025-07-15 | Parameters: 1.2B - License: open | Type: model - AI model by LG AI Research - **Voxtral Small** (Mistral AI) — 2025-07-15 | Parameters: 24.3B - License: open | Type: model - AI model by Mistral AI - **Voxtral Mini** (Mistral AI) — 2025-07-15 | Parameters: 4.7B - License: open | Type: model - AI model by Mistral AI - **Gemini Embedding** (Google DeepMind) — 2025-07-14 - License: closed | Type: model - AI model by Google DeepMind - **Kimi K2** (Moonshot) — 2025-07-11 | Parameters: 1T - License: open | Type: model - AI model by Moonshot - **Grok 4 Heavy** (xAI) — 2025-07-10 - License: closed | Type: model - AI model by xAI - **Grok 4** (xAI) — 2025-07-09 | Parameters: 3T - License: closed | Type: model - AI model by xAI - **EXAONE Path 2.0** (LG AI Research) — 2025-07-09 | Parameters: 175M - License: open | Type: model - AI model by LG AI Research - **MedSigLIP** (Google) — 2025-07-09 | Parameters: 800M - License: open | Type: model - AI model by Google - **dots.llm1** (Rednote) — 2025-07-06 | Parameters: 142B - License: open | Type: model - AI model by Rednote - **GLM-4.5** (Z.AI) — 2025-07-01 | Parameters: GLM-4.5 - License: open | Type: model - 355B-A32B. 
- **T1** (China Telecom Artificial Intelligence Research Institute) — 2025-07-01 | Parameters: T1 - License: open | Type: model - - **Intern-S1** (Shanghai AI Laboratory/SenseTime) — 2025-07-01 | Parameters: Intern-S1 - License: open | Type: model - 41T tokens assumes base model of Qwen3. "Built upon a 235B MoE language model and a 6B Vision encoder, Intern-S1 has been further pretrained on 5 trillion tokens of multimodal data" - **Step 3** (StepFun) — 2025-07-01 | Parameters: Step 3 - License: open | Type: model - 321B-A38B. https://x.com/CyouSakura/status/1948767450751009227 - **Qwen3-235B-A22B-Thinking-2507** (Alibaba) — 2025-07-01 | Parameters: Qwen3-235B-A22B-Thinking-2507 - License: open | Type: model - 235B-A22B. "Qwen3 is pre-trained on 36 trillion tokens across 119 languages" MMLU score is MMLU-Redux. - **KAT-V1-200B** (Kuaishou) — 2025-07-01 | Parameters: KAT-V1-200B - License: closed | Type: model - 200BA40B. In training as of Jul/2025. "to address the overthinking problem in reasoning-intensive tasks" - **KAT-V1-40B** (Kuaishou) — 2025-07-01 | Parameters: KAT-V1-40B - License: open | Type: model - "to address the overthinking problem in reasoning-intensive tasks" - **Qwen3-Coder-480B-A35B-Instruct** (Alibaba) — 2025-07-01 | Parameters: Qwen3-Coder-480B-A35B-Instruct - License: open | Type: model - 480B-A35B. - **Qwen3-235B-A22B-Instruct-2507** (Alibaba) — 2025-07-01 | Parameters: Qwen3-235B-A22B-Instruct-2507 - License: open | Type: model - 235B-A22B. "Qwen3 is pre-trained on 36 trillion tokens across 119 languages" MMLU score is MMLU-Redux. - **FlexOlmo** (Allen AI) — 2025-07-01 | Parameters: FlexOlmo - License: open | Type: model - 37B-A20B. "We adopt the OLMo-2 7B setup, starting from a checkpoint pre-trained on 4T tokens and annealed for 50B tokens to produce a public expert. We then train two additional experts on math and code, each for 50B tokens, and combine them with the public expert to form a three-expert version of FLEXOLMO." 
- **EXAONE 4.0** (LG) — 2025-07-01 | Parameters: EXAONE 4.0 - License: open | Type: model - “EXAONE”=“EXpert AI for EveryONE”. Training tokens/ratio: EXAONE-3 7.8B=8T tokens (Aug/2024) -> EXAONE-3.5 7.8B=9T -> EXAONE-3.5 32B=6.5T tokens -> EXAONE 4.0 32B=14T tokens. MMLU score is MMLU-Redux. Interesting: "To focus [RL] training on more informative data samples, we perform accuracy-based filtering by generating eight responses from the SFT model and excluding samples where all eight responses are correct, a pre-filtering step that removes problems that are easy for the model to avoid inefficient training." - **Kimi K2** (Moonshot AI) — 2025-07-01 | Parameters: Kimi K2 - License: open | Type: model - 1TA32B. 1T parameters and 384 experts. Open source SOTA. - **Reka Flash 3.1** (Reka AI) — 2025-07-01 | Parameters: Reka Flash 3.1 - License: open | Type: model - - **Devstral Medium** (Mistral) — 2025-07-01 | Parameters: Devstral Medium - License: open | Type: model - Non-reasoning. - **Grok 4** (xAI) — 2025-07-01 | Parameters: Grok 4 - License: open | Type: model - 2.4T? https://x.com/kalomaze/status/1942996555088134592 "The smartest AI in the world, 100% on SAT, etc, questions that it's never seen before." - **KAT-V1-200B** (Kwaipilot) — 2025-07-01 | Parameters: KAT-V1-200B - License: open | Type: model - 200BA40B. - **KAT-V1-40B** (Kwaipilot) — 2025-07-01 | Parameters: KAT-V1-40B - License: open | Type: model - - **Phi-4-mini-flash-reasoning** (Microsoft) — 2025-07-01 | Parameters: Phi-4-mini-flash-reasoning - License: open | Type: model - "Pre-training: 5T tokens; Reasoning training: 150B tokens" "At the core of Phi-4-mini-flash-reasoning is the newly introduced decoder-hybrid-decoder architecture, SambaY, whose central innovation is the Gated Memory Unit (GMU), a simple yet effective mechanism for sharing representations between layers. 
The architecture includes a self-decoder that combines Mamba (a State Space Model) and Sliding Window Attention (SWA), along with a single layer of full attention. The architecture also involves a cross-decoder that interleaves expensive cross-attention layers with the new, efficient GMUs. This new architecture with GMU modules drastically improves decoding efficiency, boosts long-context retrieval performance and enables the architecture to deliver exceptional performance across a wide range of tasks. " - **T5Gemma** (Google DeepMind) — 2025-07-01 | Parameters: T5Gemma - License: open | Type: model - Related paper: https://arxiv.org/abs/2504.06225. Dataset was Gemma 2 9B on 8T tokens + 2T tokens adapted. - **MedGemma 1 27B** (Google DeepMind) — 2025-07-01 | Parameters: MedGemma 1 27B - License: open | Type: model - Multimodal model. Text MMLU score for med only=87.0. - **R1T2 Chimera** (TNG) — 2025-07-01 | Parameters: R1T2 Chimera - License: open | Type: model - Assembly of Experts-method of V3-0324, R1, R1-0528. Announce: https://x.com/tngtech/status/1940531045432283412?s=46 - **Finix-P1-32B** (Ant Group) — 2025-07-01 | Parameters: 32B - License: closed | Type: model - AI model by Ant Group - **ERNIE-4.5-VL-28B-A3B** (Baidu) — 2025-06-29 | Parameters: 28B - License: open | Type: model - AI model by Baidu - **ERNIE-4.5-300B-A47B** (Baidu) — 2025-06-29 | Parameters: 300B - License: open | Type: model - AI model by Baidu - **ERNIE-4.5-21B-A3B** (Baidu) — 2025-06-29 | Parameters: 21B - License: open | Type: model - AI model by Baidu - **ERNIE-4.5-0.3B** (Baidu) — 2025-06-29 | Parameters: 360M - License: open | Type: model - AI model by Baidu - **DiLoCoX (Qwen1.5-107B on WT-103)** (China Mobile,Zero Gravity Labs (0g AI)) — 2025-06-26 | Parameters: 107B - License: closed | Type: model - AI model by China Mobile,Zero Gravity Labs (0g AI) - **BlueOcean LLM 2.0 (萤石蓝海)** (Hangzhou EZVIZ Software Co., Ltd. 
(Hikvision)) — 2025-06-26 - License: closed | Type: model - AI model by Hangzhou EZVIZ Software Co., Ltd. (Hikvision) - **AlphaGenome** (Google DeepMind) — 2025-06-25 | Parameters: 450M - License: closed | Type: model - AI model by Google DeepMind - **Kimi-VL** (Moonshot) — 2025-06-23 | Parameters: 16B - License: closed | Type: model - AI model by Moonshot - **Minimax Hailuo 02** (MiniMax,Hailuo AI) — 2025-06-18 - License: closed | Type: model - AI model by MiniMax,Hailuo AI - **Mercury Coder Mini** (Inception Labs) — 2025-06-17 - License: closed | Type: model - AI model by Inception Labs - **Mercury Coder Small** (Inception Labs) — 2025-06-17 - License: closed | Type: model - AI model by Inception Labs - **Kimi Dev 72b** (Moonshot) — 2025-06-16 | Parameters: 72B - License: open | Type: model - AI model by Moonshot - **Gemini 2.5 Flash-Lite** (Google DeepMind) — 2025-06-15 - License: closed | Type: model - AI model by Google DeepMind - **Mistral Small 3.2** (Mistral AI) — 2025-06-15 | Parameters: 24B - License: open | Type: model - AI model by Mistral AI - **MiniMax-M1-80k** (MiniMax) — 2025-06-13 | Parameters: 456B - License: open | Type: model - AI model by MiniMax - **MiniMax-M1-40k** (MiniMax) — 2025-06-13 | Parameters: 456B - License: open | Type: model - AI model by MiniMax - **FGN** (Google DeepMind) — 2025-06-12 | Parameters: 720M - License: closed | Type: model - AI model by Google DeepMind - **CollabLLM** (Stanford University,Microsoft,Georgia Institute of Technology) — 2025-06-12 | Parameters: 8B - License: open | Type: model - AI model by Stanford University,Microsoft,Georgia Institute of Technology - **V-JEPA 2** (Facebook AI Research) — 2025-06-11 | Parameters: 1B - License: open | Type: model - AI model by Facebook AI Research - **Cosmos-Predict2-2B-Text2Image** (NVIDIA) — 2025-06-11 | Parameters: 2B - License: open | Type: model - AI model by NVIDIA - **Cosmos-Predict2-14B-Text2Image** (NVIDIA) — 2025-06-11 | Parameters: 14B - License: open | Type: 
model - AI model by NVIDIA - **Cosmos-Predict2-2B-Video2World** (NVIDIA) — 2025-06-11 | Parameters: 2B - License: open | Type: model - AI model by NVIDIA - **Cosmos-Predict2-14B-Video2World** (NVIDIA) — 2025-06-11 | Parameters: 14B - License: open | Type: model - AI model by NVIDIA - **Seed-1.6-Thinking** (ByteDance) — 2025-06-11 | Parameters: 230B - License: closed | Type: model - AI model by ByteDance - **Seed 1.6** (ByteDance) — 2025-06-11 | Parameters: 230B - License: closed | Type: model - AI model by ByteDance - **Magistral Medium 1.1** (Mistral AI) — 2025-06-10 - License: closed | Type: model - AI model by Mistral AI - **Magistral Small 1.1** (Mistral AI) — 2025-06-10 | Parameters: 24B - License: open | Type: model - AI model by Mistral AI - **o3-pro** (OpenAI) — 2025-06-10 - License: closed | Type: model - AI model by OpenAI - **Seedance 1.0** (ByteDance) — 2025-06-10 - License: closed | Type: model - AI model by ByteDance - **Skywork-R1V3** (Kunlun Inc.) — 2025-06-10 | Parameters: 38B - License: closed | Type: model - AI model by Kunlun Inc. 
- **Devstral Medium** (Mistral AI,All Hands AI) — 2025-06-10 - License: closed | Type: model - AI model by Mistral AI,All Hands AI - **MiniCPM-4-8B** (OpenBMB (Open Lab for Big Model Base)) — 2025-06-09 | Parameters: 8B - License: closed | Type: model - AI model by OpenBMB (Open Lab for Big Model Base) - **Redwood AI** (1X) — 2025-06-09 - License: closed | Type: model - AI model by 1X - **Boltz-2** (Massachusetts Institute of Technology (MIT),Recursion Pharmaceuticals,ETH Zurich,Valence Labs) — 2025-06-06 - License: open | Type: model - AI model by Massachusetts Institute of Technology (MIT),Recursion Pharmaceuticals,ETH Zurich,Valence Labs - **MiMo-7B-Base** (Xiaomi Corp) — 2025-06-05 | Parameters: 7B - License: open | Type: model - AI model by Xiaomi Corp - **Qwen3 Embedding** (Alibaba) — 2025-06-05 | Parameters: 8B - License: open | Type: model - AI model by Alibaba - **Qwen3 Reranker** (Alibaba) — 2025-06-05 | Parameters: 8B - License: open | Type: model - AI model by Alibaba - **ether0** (FutureHouse) — 2025-06-05 | Parameters: 24B - License: open | Type: model - AI model by FutureHouse - **Claude Gov** (Anthropic) — 2025-06-05 - License: closed | Type: model - AI model by Anthropic - **Gemini 2.5 Pro (Jun 2025)** (Google DeepMind) — 2025-06-05 - License: closed | Type: model - AI model by Google DeepMind - **Ink Whisper** (Cartesia) — 2025-06-04 - License: closed | Type: model - AI model by Cartesia - **MiMo-VL-7B-SFT** (Xiaomi Corp) — 2025-06-04 | Parameters: 7B - License: open | Type: model - AI model by Xiaomi Corp - **MiMo‑VL‑7B‑RL** (Xiaomi Corp) — 2025-06-04 | Parameters: 7B - License: open | Type: model - AI model by Xiaomi Corp - **Eleven v3** (ElevenLabs) — 2025-06-03 - License: closed | Type: model - AI model by ElevenLabs - **OpenAudio-S1** (Fish Audio) — 2025-06-03 | Parameters: 4B - License: closed | Type: model - AI model by Fish Audio - **OpenAudio-S1-mini** (Fish Audio) — 2025-06-03 | Parameters: 500M - License: open | Type: model - AI model 
by Fish Audio - **Gemini 2.5 Flash Native Audio** (Google DeepMind) — 2025-06-03 - License: closed | Type: model - AI model by Google DeepMind - **Spectra 1.1** (Consortium) — 2025-06-01 | Parameters: Spectra 1.1 - License: open | Type: model - "Spectra-1.1, an open suite of TriLMs trained on up to 1.2 trillion tokens, demonstrating sustained performance gains at scale. Furthermore, to improve inference efficiency, we propose novel 2-bit and 1.6-bit packing schemes for ternary weights" - **DiffuCoder** (Apple) — 2025-06-01 | Parameters: DiffuCoder - License: open | Type: model - "We adapt our model from Qwen2.5-Coder (Hui et al., 2024) as the base model to perform continual pre-training using the adaptation approach from Gong et al. (2025). During this pre-training, we use a 400B-token code pre-training corpus from RefineCode (Huang et al., 2024) and Stackv2 (Lozhkov et al., 2024)." - **Hunyuan-A13B** (Tencent) — 2025-06-01 | Parameters: Hunyuan-A13B - License: open | Type: model - 80B-A13B. 'We have open-sourced Hunyuan-A13B-Pretrain, Hunyuan-A13B-Instruct, Hunyuan-A13B-Instruct-FP8, Hunyuan-A13B-Instruct-GPTQ-Int4 on Hugging Face.' - **Mercury** (Inception) — 2025-06-01 | Parameters: Mercury - License: open | Type: model - Diffusion large language model (dLLM). - **Mu** (Microsoft) — 2025-06-01 | Parameters: Mu - License: open | Type: model - "distillation from Microsoft’s Phi models... Mu is an efficient 330M encoder–decoder language model optimized for small-scale deployment, particularly on the NPUs on Copilot+ PCs. It follows a transformer encoder–decoder architecture" - **Gemini Robotics On-Device** (Google DeepMind) — 2025-06-01 | Parameters: Gemini Robotics On-Device - License: open | Type: model - See Mar/2025 Gemini Robotics-ER model for comparison. 
Announce: https://deepmind.google/discover/blog/gemini-robotics-on-device-brings-ai-to-local-robotic-devices/ - **ICONN-1** (ICONNAI) — 2025-06-01 | Parameters: ICONN-1 - License: open | Type: model - "ICONN-1 (this version) is optimized for natural, emotionally resonant, and conversational interactions. ICONN-e1 is a specialized variant of the model fine-tuned for advanced reasoning, critical analysis, and complex problem-solving." - **MiniMax-M1** (MiniMax) — 2025-06-01 | Parameters: MiniMax-M1 - License: open | Type: model - 456B-A45.9B. Announce: https://www.minimax.io/news/minimaxm1 - **Magistral Medium** (Mistral) — 2025-06-01 | Parameters: Magistral Medium - License: open | Type: model - Magistral Small=24B. Announce: https://mistral.ai/news/magistral - **Comma v0.1-2T** (EleutherAI) — 2025-06-01 | Parameters: Comma v0.1-2T - License: open | Type: model - "Comma v0.1-2T is a decoder-only transformer that uses the same architecture as Llama 3. Training was done in two stages: first on 1.93 trillion tokens with a cosine learning rate schedule, and second a "cool-down" training phase on 75.5 billion tokens from high-quality sources. The final model is the average of 10 checkpoints during this cool-down phase. Both training phases use a batch size of 8.3 million tokens per step. Training was performed using lingua on 512 AMD MI300A GPUs." - **dots.llm1** (Xiaohongshu/RedNote) — 2025-06-01 | Parameters: dots.llm1 - License: open | Type: model - 142B-A14B. "dots.llm1, a large-scale MoE model that activates 14 billion parameters out of a total of 142 billion parameters, delivering performance on par with state-of-the-art models while reducing training and inference costs. Leveraging our meticulously crafted and efficient data processing pipeline, dots.llm1 achieves performance comparable to Qwen2.5-72B after pretraining on 11.2T high-quality tokens and post-training to fully unlock its capabilities. Notably, no synthetic data is used during pretraining. 
To foster further research, we open-source intermediate training checkpoints at every one trillion tokens, providing valuable insights into the learning dynamics of large language models." - **Gemini 2.5 Pro 06-05** (Google DeepMind) — 2025-06-01 | Parameters: Gemini 2.5 Pro 06-05 - License: open | Type: model - "an upgraded preview of Gemini 2.5 Pro, our most intelligent model yet. Building on the version we released in May and showed at I/O, this model will be the generally available, stable version starting in a couple of weeks, ready for enterprise-scale applications." - **Kling 2.1** (Kuaishou Technology) — 2025-05-29 - License: closed | Type: model - AI model by Kuaishou Technology - **FLUX.1 Kontext [pro]** (Black Forest Labs) — 2025-05-29 - License: closed | Type: model - AI model by Black Forest Labs - **FLUX.1 Kontext [max]** (Black Forest Labs) — 2025-05-29 - License: closed | Type: model - AI model by Black Forest Labs - **FLUX.1 Kontext [dev]** (Black Forest Labs) — 2025-05-29 | Parameters: 12B - License: closed | Type: model - AI model by Black Forest Labs - **EVI 3** (Hume) — 2025-05-29 - License: closed | Type: model - AI model by Hume - **Skywork-OR1-32B** (Kunlun Inc.) — 2025-05-29 | Parameters: 32B - License: open | Type: model - AI model by Kunlun Inc. 
- **Codestral Embed** (Mistral AI) — 2025-05-28 - License: closed | Type: model - AI model by Mistral AI - **Pangu Pro MoE** (Huawei) — 2025-05-28 | Parameters: 72.0B - License: open | Type: model - AI model by Huawei - **DeepSeek-R1 (May 2025)** (DeepSeek) — 2025-05-28 | Parameters: 671B - License: open | Type: model - AI model by DeepSeek - **SignGemma** (Google) — 2025-05-27 - License: closed | Type: model - AI model by Google - **OpenOmni** (Chinese Academy of Sciences,Shenzhen Institute of Advanced Technology,University of Chinese Academy of Sciences,National University of Singapore,University of Science and Technology of China (USTC)) — 2025-05-24 | Parameters: 7B - License: open | Type: model - AI model by Chinese Academy of Sciences,Shenzhen Institute of Advanced Technology,University of Chinese Academy of Sciences,National University of Singapore,University of Science and Technology of China (USTC) - **o3 Operator** (OpenAI) — 2025-05-23 - License: closed | Type: model - AI model by OpenAI - **DataRater test model (1B)** (Google DeepMind) — 2025-05-23 | Parameters: 1B - License: closed | Type: model - AI model by Google DeepMind - **Claude Opus 4** (Anthropic) — 2025-05-22 - License: closed | Type: model - AI model by Anthropic - **Claude Sonnet 4** (Anthropic) — 2025-05-22 - License: closed | Type: model - AI model by Anthropic - **Reason-ModernColBERT** (LightOn) — 2025-05-22 | Parameters: 150M - License: open | Type: model - AI model by LightOn - **Veo 3** (Google DeepMind) — 2025-05-21 - License: closed | Type: model - AI model by Google DeepMind - **Falcon-Arabic** (Technology Innovation Institute) — 2025-05-21 | Parameters: 7B - License: closed | Type: model - AI model by Technology Innovation Institute - **Falcon-H1** (Technology Innovation Institute) — 2025-05-21 | Parameters: 34B - License: open | Type: model - AI model by Technology Innovation Institute - **Devstral Small** (Mistral AI,All Hands AI) — 2025-05-21 | Parameters: 24B - License: open 
| Type: model - AI model by Mistral AI,All Hands AI - **MedGemma 27B** (Google) — 2025-05-20 | Parameters: 27B - License: open | Type: model - AI model by Google - **Gemma 3n** (Google) — 2025-05-20 | Parameters: 7.8B - License: open | Type: model - AI model by Google - **Lyria RealTime** (Google DeepMind) — 2025-05-20 - License: closed | Type: model - AI model by Google DeepMind - **Imagen 4** (Google) — 2025-05-20 - License: closed | Type: model - AI model by Google - **voyage-3.5** (Voyage AI) — 2025-05-20 - License: closed | Type: model - AI model by Voyage AI - **Imagen 4 ultra** (Google) — 2025-05-20 - License: closed | Type: model - AI model by Google - **Marin 8B** (Marin) — 2025-05-19 | Parameters: 8B - License: open | Type: model - AI model by Marin - **Cosmos-Reason1 7B** (NVIDIA) — 2025-05-19 | Parameters: 7B - License: open | Type: model - AI model by NVIDIA - **Cosmos-Reason1 56B** (NVIDIA) — 2025-05-19 | Parameters: 56B - License: closed | Type: model - AI model by NVIDIA - **NVIDIA Isaac GR00T N1.5 3B** (NVIDIA) — 2025-05-18 | Parameters: 3B - License: open | Type: model - AI model by NVIDIA - **SANA 1.5 4.8B** (NVIDIA,Massachusetts Institute of Technology (MIT),Tsinghua University,Playground,Peking University,The University of Hong Kong) — 2025-05-17 | Parameters: 4.8B - License: open | Type: model - AI model by NVIDIA,Massachusetts Institute of Technology (MIT),Tsinghua University,Playground,Peking University,The University of Hong Kong - **codex-1** (OpenAI) — 2025-05-16 - License: closed | Type: model - AI model by OpenAI - **codex-mini** (OpenAI) — 2025-05-16 - License: closed | Type: model - AI model by OpenAI - **Repress** (DeepGenomics) — 2025-05-16 | Parameters: 133M - License: closed | Type: model - AI model by DeepGenomics - **II-Medical-8B** (Intelligent Internet) — 2025-05-15 | Parameters: 8B - License: open | Type: model - AI model by Intelligent Internet - **AlphaEvolve** (DeepMind) — 2025-05-14 - License: closed | Type: model - AI 
model by DeepMind - **LTX-Video-0.9.7. 13B distilled** (Lightricks) — 2025-05-14 - License: open | Type: model - AI model by Lightricks - **Pixverse v4.5** (PixVerse AI) — 2025-05-13 - License: closed | Type: model - AI model by PixVerse AI - **Hunyuan T1-Vision** (Tencent) — 2025-05-12 - License: closed | Type: model - AI model by Tencent - **Minimax-Speech-02-HD** (MiniMax) — 2025-05-12 - License: closed | Type: model - AI model by MiniMax - **NTele-R1-32B-V1** (ZTE) — 2025-05-12 | Parameters: 32.8B - License: open | Type: model - AI model by ZTE - **Seed1.5-VL** (ByteDance) — 2025-05-11 - License: closed | Type: model - AI model by ByteDance - **Earth-2 (cBottle-SR)** (NVIDIA) — 2025-05-10 | Parameters: 330M - License: open | Type: model - AI model by NVIDIA - **Tianxi-32B** (Lenovo) — 2025-05-09 | Parameters: 32B - License: closed | Type: model - AI model by Lenovo - **Tianxi-72B** (Lenovo) — 2025-05-09 | Parameters: 72B - License: closed | Type: model - AI model by Lenovo - **Mistral Medium 3** (Mistral AI) — 2025-05-07 - License: closed | Type: model - AI model by Mistral AI - **Pangu Ultra MoE** (Huawei) — 2025-05-07 | Parameters: 718B - License: closed | Type: model - AI model by Huawei - **Apriel Nemotron 15B** (NVIDIA,ServiceNow) — 2025-05-06 | Parameters: 15B - License: open | Type: model - AI model by NVIDIA,ServiceNow - **Kevin-32B** (Cognition,Stanford University) — 2025-05-06 | Parameters: 32B - License: open | Type: model - AI model by Cognition,Stanford University - **Gemini 2.5 Pro (May 2025)** (Google DeepMind) — 2025-05-06 - License: closed | Type: model - AI model by Google DeepMind - **Typhoon 2.1 Gemma 4B** (Typhoon / SCB 10X) — 2025-05-05 | Parameters: 4B - License: open | Type: model - AI model by Typhoon / SCB 10X - **Typhoon 2.1 Gemma 12B** (Typhoon / SCB 10X) — 2025-05-05 | Parameters: 12B - License: open | Type: model - AI model by Typhoon / SCB 10X - **LTX-Video-0.9.7. 
13B** (Lightricks) — 2025-05-05 | Parameters: 13B - License: open | Type: model - AI model by Lightricks - **LinOSS** (Massachusetts Institute of Technology (MIT)) — 2025-05-02 - License: closed | Type: model - AI model by Massachusetts Institute of Technology (MIT) - **MiniMax-Speech-02-turbo** (MiniMax) — 2025-05-02 - License: closed | Type: model - AI model by MiniMax - **MiMo-7B-RL-0530** (Xiaomi) — 2025-05-01 | Parameters: MiMo-7B-RL-0530 - License: open | Type: model - "[2025.05.30] During the RL training, by continuously expanding the training window size (from 32K to 48K), the performance of MiMo-7B-RL-0530 on AIME24 can be continuously improved and eventually surpass that of DeepSeek R1... MiMo-7B-Base is pre-trained on approximately 25 trillion tokens." - **DeepTransformers** (Google DeepMind) — 2025-05-01 | Parameters: DeepTransformers - License: closed | Type: model - "Atlas, a long-term memory module with high capacity that learns to memorize the context by optimizing the memory based on the current and past tokens, overcoming the online nature of long-term memory models. Building on this insight, we present a new family of Transformer-like architectures, called DeepTransformers, that are strict generalizations of the original Transformer architecture." - **Atlas** (Google DeepMind) — 2025-05-01 | Parameters: Atlas - License: closed | Type: model - "Atlas, a long-term memory module with high capacity that learns to memorize the context by optimizing the memory based on the current and past tokens, overcoming the online nature of long-term memory models. Building on this insight, we present a new family of Transformer-like architectures, called DeepTransformers, that are strict generalizations of the original Transformer architecture." - **DeepSeek-R1-0528** (DeepSeek-AI) — 2025-05-01 | Parameters: DeepSeek-R1-0528 - License: open | Type: model - Censorship increased significantly. 
"overall performance is now approaching that of leading models, such as o3 and Gemini 2.5 Pro." MMLU shows MMLU-Redux score with lower error rate. - **Fathom-R1-14B** (Fractal Analytics) — 2025-05-01 | Parameters: Fathom-R1-14B - License: open | Type: model - Base R1-distilled-14B model, based on Qwen 14B. Media release. - **QwenLong-L1-32B** (Alibaba) — 2025-05-01 | Parameters: QwenLong-L1-32B - License: open | Type: model - "the first long-context LRM trained with reinforcement learning for long-context reasoning." - **Claude Opus 4** (Anthropic) — 2025-05-01 | Parameters: Claude Opus 4 - License: open | Type: model - "Claude Opus 4 is our most intelligent model to date, pushing the frontier in coding, agentic search, and creative writing. With advanced reasoning and powerful collaboration capabilities…Both models can also alternate between reasoning and tool use—like web search—to improve responses…Claude Opus 4 can work continuously for hours on complex, long-running tasks" - **Falcon-H1** (TII) — 2025-05-01 | Parameters: Falcon-H1 - License: open | Type: model - "hybrid architecture that combines the strengths of the classical Transformer-based attention mechanism with the State Space Model (SSM), known for its superior long-context memory and computational efficiency." - **Gemini Diffusion** (Google DeepMind) — 2025-05-01 | Parameters: Gemini Diffusion - License: open | Type: model - "Gemini Diffusion’s external benchmark performance is comparable to much larger models [like Gemini-2.0-Flash-Lite], whilst also being faster." - **Gemma 3n** (Google DeepMind) — 2025-05-01 | Parameters: Gemma 3n - License: open | Type: model - Matryoshka Transformer or MatFormer model architecture. 850M (696M / 620M / 582M). - **ParScale** (Alibaba) — 2025-05-01 | Parameters: ParScale - License: open | Type: model - "We introduce the third scaling paradigm for scaling LLMs: leverages parallel computation during both training and inference time (Parallel Scaling, or ParScale)... 
ParScale can use up to 22× less memory increase and 6× less latency increase compared to parameter scaling that achieves the same performance improvement. It can also recycle an off-the-shelf pre-trained model into a parallelly scaled one by post-training on a small amount of tokens, further reducing the training budget." The MMLU score shown is for the 1.8B models, not the 4.7B models. - **codex-1** (OpenAI) — 2025-05-01 | Parameters: codex-1 - License: open | Type: model - o3 base. "codex-1, a version of OpenAI o3 optimized for software engineering. It was trained using reinforcement learning on real-world coding tasks in a variety of environments to generate code that closely mirrors human style and PR preferences, adheres precisely to instructions, and can iteratively run tests until it receives a passing result." - **Falcon-Edge** (TII) — 2025-05-01 | Parameters: Falcon-Edge - License: open | Type: model - "Falcon-Edge series - a collection of powerful, universal, and fine-tunable language models available in ternary format, based on the BitNet architecture." - **SWE-1** (Windsurf) — 2025-05-01 | Parameters: SWE-1 - License: open | Type: model - "SWE-1, optimized for the entire software engineering process, not just the task of coding." - **INTELLECT-2** (Prime Intellect) — 2025-05-01 | Parameters: INTELLECT-2 - License: open | Type: model - QwQ-32B base. Announce: https://www.primeintellect.ai/blog/intellect-2-release Finished training 30/Apr/2025: https://app.primeintellect.ai/intelligence/intellect-2 - **Pangu Ultra MoE** (Huawei) — 2025-05-01 | Parameters: Pangu Ultra MoE - License: closed | Type: model - 718B-A39B. Trained on 6,000 Ascend NPUs (Kunpeng 920 processors in Huawei Atlas 800T A2 servers). - **Mistral Medium 3** (Mistral) — 2025-05-01 | Parameters: Mistral Medium 3 - License: open | Type: model - Multimodal. 50B param estimate based on "Mistral Medium 3 can also be deployed on any cloud, including self-hosted environments of four GPUs and above." 
Note: "With the launches of Mistral Small in March and Mistral Medium today, it’s no secret that we’re working on something ‘large’ over the next few weeks. With even our medium-sized model being resoundingly better than flagship open source models such as Llama 4 Maverick, we’re excited to ‘open’ up what’s to come :) " - **Granite-4.0-Tiny-Preview** (IBM) — 2025-05-01 | Parameters: Granite-4.0-Tiny-Preview - License: open | Type: model - "the model is only partially trained—it has only seen 2.5T of a planned 15T or more training tokens...Granite 4.0 Tiny-Preview, specifically, is a fine-grained hybrid mixture of experts (MoE) model, with 7B total parameters and only 1B active parameters at inference time... Like its predecessors in Granite 3.2 and Granite 3.3, Granite 4.0 Tiny Preview offers toggleable thinking on and thinking off functionality (though its reasoning-focused post-training is very much incomplete)." - **Phi-4-Reasoning** (Microsoft) — 2025-04-30 | Parameters: 14B - License: open | Type: model - AI model by Microsoft - **Phi-4-Reasoning-plus** (Microsoft) — 2025-04-30 | Parameters: 14B - License: open | Type: model - AI model by Microsoft - **DeepSeek-Prover-V2-671B** (DeepSeek) — 2025-04-30 | Parameters: 671B - License: open | Type: model - AI model by DeepSeek - **DeepSeek-Prover-V2-7B** (DeepSeek) — 2025-04-30 | Parameters: 7B - License: open | Type: model - AI model by DeepSeek - **TranscriptFormer** (Chan Zuckerberg Initiative) — 2025-04-30 - License: closed | Type: model - AI model by Chan Zuckerberg Initiative - **Amazon Nova Premier** (Amazon) — 2025-04-30 - License: closed | Type: model - AI model by Amazon - **GTE-ModernColBERT-v1** (LightOn) — 2025-04-30 | Parameters: 149M - License: open | Type: model - AI model by LightOn - **Qwen3-235B-A22B** (Alibaba) — 2025-04-29 | Parameters: 235B - License: open | Type: model - AI model by Alibaba - **Qwen3-30B-A3B** (Alibaba) — 2025-04-29 | Parameters: 30B - License: open | Type: model - AI model 
by Alibaba - **Qwen3-32B** (Alibaba) — 2025-04-29 | Parameters: 32.8B - License: open | Type: model - AI model by Alibaba - **Qwen3-14B** (Alibaba) — 2025-04-29 | Parameters: 14.8B - License: open | Type: model - AI model by Alibaba - **Qwen3-8B** (Alibaba) — 2025-04-29 | Parameters: 8.2B - License: open | Type: model - AI model by Alibaba - **Qwen3-4B** (Alibaba) — 2025-04-29 | Parameters: 4B - License: open | Type: model - AI model by Alibaba - **Qwen3-1.7B** (Alibaba) — 2025-04-29 | Parameters: 1.7B - License: open | Type: model - AI model by Alibaba - **Qwen3-0.6B** (Alibaba) — 2025-04-29 | Parameters: 600M - License: open | Type: model - AI model by Alibaba - **Foundation-sec-8b** (Cisco) — 2025-04-28 | Parameters: 8B - License: open | Type: model - AI model by Cisco - **Palmyra X5** (Writer) — 2025-04-28 - License: closed | Type: model - AI model by Writer - **Atlantes** (Allen Institute for AI) — 2025-04-26 - License: closed | Type: model - AI model by Allen Institute for AI - **Pleias-RAG-350m** (PleIAs) — 2025-04-25 | Parameters: 350M - License: open | Type: model - AI model by PleIAs - **Pleias-RAG-1B** (PleIAs) — 2025-04-25 | Parameters: 1.2B - License: open | Type: model - AI model by PleIAs - **HiDream-I1** (HiDream) — 2025-04-25 | Parameters: 18B - License: open | Type: model - AI model by HiDream - **Firefly Image 4** (Adobe) — 2025-04-24 - License: closed | Type: model - AI model by Adobe - **Firefly Image 4 Ultra** (Adobe) — 2025-04-24 - License: closed | Type: model - AI model by Adobe - **gpt-image-1** (OpenAI) — 2025-04-23 - License: closed | Type: model - AI model by OpenAI - **MamayLM** (INSAIT,ETH Zurich) — 2025-04-23 | Parameters: 9B - License: open | Type: model - AI model by INSAIT,ETH Zurich - **π0.5 (pi-0.5)** (Physical Intelligence) — 2025-04-22 | Parameters: 3.3B - License: closed | Type: model - AI model by Physical Intelligence - **Eagle 2.5** (NVIDIA,Nanjing University,Hong Kong Polytechnic University,Rutgers University) — 
2025-04-21 | Parameters: 8B - License: closed | Type: model - AI model by NVIDIA,Nanjing University,Hong Kong Polytechnic University,Rutgers University - **SkyReels-V2** (Kunlun Inc.) — 2025-04-21 | Parameters: 14B - License: closed | Type: model - AI model by Kunlun Inc. - **Llama-Primus-Nemotron-70B** (Trend Micro) — 2025-04-21 | Parameters: 70B - License: open | Type: model - AI model by Trend Micro - **Trillion-7B** (Trillion Labs) — 2025-04-21 | Parameters: 7B - License: open | Type: model - AI model by Trillion Labs - **Gemma 3 QAT 4B** (Google DeepMind) — 2025-04-18 | Parameters: 4B - License: open | Type: model - AI model by Google DeepMind - **Gemma 3 QAT 1B** (Google DeepMind) — 2025-04-18 | Parameters: 1B - License: open | Type: model - AI model by Google DeepMind - **Gemma 3 QAT 12B** (Google DeepMind) — 2025-04-18 | Parameters: 12B - License: open | Type: model - AI model by Google DeepMind - **Gemma 3 QAT 27B** (Google DeepMind) — 2025-04-18 | Parameters: 27B - License: open | Type: model - AI model by Google DeepMind - **Gemini 2.5 Flash** (Google DeepMind) — 2025-04-17 - License: closed | Type: model - AI model by Google DeepMind - **Demist-2** (Darktrace) — 2025-04-17 | Parameters: 95M - License: closed | Type: model - AI model by Darktrace - **o4-mini** (OpenAI) — 2025-04-16 - License: closed | Type: model - AI model by OpenAI - **Seedream 3.0** (ByteDance) — 2025-04-16 - License: closed | Type: model - AI model by ByteDance - **Digest: Cyber AI Analyst** (Darktrace) — 2025-04-16 - License: closed | Type: model - AI model by Darktrace - **MAI-DS-R1** (Microsoft) — 2025-04-16 | Parameters: 671B - License: open | Type: model - AI model by Microsoft - **Nova-3** (Deepgram) — 2025-04-15 - License: closed | Type: model - AI model by Deepgram - **Kling 2.0 Video Generation** (Kuaishou Technology) — 2025-04-15 - License: closed | Type: model - AI model by Kuaishou Technology - **Kolors 2.0 Image Generation** (Kuaishou Technology) — 2025-04-15 - License: 
closed | Type: model - AI model by Kuaishou Technology - **TerraMind** (IBM,Forschungszentrum Julich,European Space Agency (ESA),NASA) — 2025-04-15 - License: open | Type: model - AI model by IBM,Forschungszentrum Julich,European Space Agency (ESA),NASA - **Cohere Embed 4** (Cohere) — 2025-04-15 - License: closed | Type: model - AI model by Cohere - **GPT-4.1** (OpenAI) — 2025-04-14 - License: closed | Type: model - AI model by OpenAI - **GPT-4.1 mini** (OpenAI) — 2025-04-14 - License: closed | Type: model - AI model by OpenAI - **GPT-4.1 nano** (OpenAI) — 2025-04-14 - License: closed | Type: model - AI model by OpenAI - **Nemotron-H 8B** (NVIDIA) — 2025-04-14 | Parameters: 8B - License: open | Type: model - AI model by NVIDIA - **Nemotron-H 47B** (NVIDIA) — 2025-04-14 | Parameters: 47B - License: open | Type: model - AI model by NVIDIA - **Nemotron-H 56B** (NVIDIA) — 2025-04-14 | Parameters: 56B - License: open | Type: model - AI model by NVIDIA - **DolphinGemma** (Google DeepMind,Georgia Institute of Technology,Wild Dolphin Project) — 2025-04-14 | Parameters: 400M - License: closed | Type: model - AI model by Google DeepMind,Georgia Institute of Technology,Wild Dolphin Project - **GLM-Z1-Rumination-32B-0414** (Tsinghua University) — 2025-04-14 | Parameters: 32B - License: open | Type: model - AI model by Tsinghua University - **GLM-4-9B-0414** (Tsinghua University) — 2025-04-14 | Parameters: 9B - License: open | Type: model - AI model by Tsinghua University - **360Zhinao3-7B-O1.5** (360 Security Technology) — 2025-04-14 | Parameters: 7B - License: open | Type: model - AI model by 360 Security Technology - **GLM-4-32B-0414** (Z.ai (Zhipu AI),Tsinghua University) — 2025-04-14 | Parameters: 32B - License: open | Type: model - AI model by Z.ai (Zhipu AI),Tsinghua University - **SenseNova V6** (SenseTime) — 2025-04-12 | Parameters: 600B - License: closed | Type: model - AI model by SenseTime - **Seaweed-7B** (ByteDance) — 2025-04-11 | Parameters: 7B - License: closed 
| Type: model - AI model by ByteDance - **Pangu Ultra** (Huawei) — 2025-04-10 | Parameters: 135B - License: closed | Type: model - AI model by Huawei - **AMIE (Articulate Medical Intelligence Explorer)** (Google DeepMind,Google Research) — 2025-04-09 | Parameters: 340B - License: closed | Type: model - AI model by Google DeepMind,Google Research - **SHIFT-SUV** (Luminary Cloud,NVIDIA,Honda) — 2025-04-09 - License: open | Type: model - AI model by Luminary Cloud,NVIDIA,Honda - **Gen-4 Turbo** (Runway) — 2025-04-09 - License: closed | Type: model - AI model by Runway - **QWQ-Plus** (Alibaba) — 2025-04-08 - License: closed | Type: model - AI model by Alibaba - **TxGemma 27B** (Google DeepMind,Google Research) — 2025-04-08 | Parameters: 27B - License: open | Type: model - AI model by Google DeepMind,Google Research - **TxGemma 9B** (Google DeepMind,Google Research) — 2025-04-08 | Parameters: 9B - License: open | Type: model - AI model by Google DeepMind,Google Research - **TxGemma 2B** (Google DeepMind,Google Research) — 2025-04-08 | Parameters: 2.6B - License: open | Type: model - AI model by Google DeepMind,Google Research - **Amazon Nova Sonic** (Amazon) — 2025-04-08 - License: closed | Type: model - AI model by Amazon - **T5Gemma (Gemma 9B-9B)** (Google DeepMind) — 2025-04-08 | Parameters: 16.7B - License: open | Type: model - AI model by Google DeepMind - **Amazon Nova Reel** (Amazon) — 2025-04-07 - License: closed | Type: model - AI model by Amazon - **Llama 4 Scout** (Meta AI) — 2025-04-05 | Parameters: 109B - License: open | Type: model - AI model by Meta AI - **Llama 4 Maverick** (Meta AI) — 2025-04-05 | Parameters: 400B - License: open | Type: model - AI model by Meta AI - **Llama 4 Behemoth (preview)** (Meta AI) — 2025-04-05 | Parameters: 2T - License: closed | Type: model - AI model by Meta AI - **Sec-Gemini v1** (Google) — 2025-04-04 - License: closed | Type: model - AI model by Google - **FoundationStereo** (NVIDIA) — 2025-04-04 | Parameters: 335.3M - 
License: open | Type: model - AI model by NVIDIA - **Midjourney V7** (Midjourney) — 2025-04-03 - License: closed | Type: model - AI model by Midjourney - **SkyReels-A2** (Kunlun Inc.) — 2025-04-03 | Parameters: 14B - License: closed | Type: model - AI model by Kunlun Inc. - **OpenThaiGPT 1.6 / OTG-1.6 (72B)** (Mahidol University,AI Entrepreneurs Association of Thailand) — 2025-04-02 | Parameters: 72B - License: open | Type: model - AI model by Mahidol University,AI Entrepreneurs Association of Thailand - **OpenThaiGPT R1 32b / OTG-R1 (32B)** (Mahidol University,AI Entrepreneurs Association of Thailand) — 2025-04-02 | Parameters: 32B - License: open | Type: model - AI model by Mahidol University,AI Entrepreneurs Association of Thailand - **Nova Premier** (Amazon) — 2025-04-01 | Parameters: Nova Premier - License: open | Type: model - Announce: https://aws.amazon.com/blogs/aws/amazon-nova-premier-our-most-capable-model-for-complex-tasks-and-teacher-for-model-distillation/ - **Phi-4-reasoning-plus** (Microsoft) — 2025-04-01 | Parameters: Phi-4-reasoning-plus - License: open | Type: model - "Phi-4-reasoning-plus is a state-of-the-art open-weight reasoning model finetuned from Phi-4 using supervised fine-tuning on a dataset of chain-of-thought traces and reinforcement learning." - **Bamba-9B-v2** (IBM) — 2025-04-01 | Parameters: Bamba-9B-v2 - License: open | Type: model - "During Christmas of 2024, IBM, Princeton, CMU, and UIUC released, Bamba v1, a performant Mamba2 based pretrained model with full data lineage trained to 2T tokens. Since then, we have been busy cooking an update with new datasets. Today, we are excited to release Bamba v2, trained for an additional 1T tokens that significantly improves on Bamba v1. The L1 and L2 leaderboard scores outperform Llama 3.1 8B, which was trained with nearly 5x the amount of data. 
All of this with the inference speedup that we get from Mamba2 based architecture, which with the latest vLLM is 2-2.5x faster than similar sized transformer models." - **Qwen3-235B-A22B** (Alibaba) — 2025-04-01 | Parameters: Qwen3-235B-A22B - License: open | Type: model - Qwen3-235B-A22B. Qwen3-30B-A3B. "Qwen3 is pre-trained on 36 trillion tokens across 119 languages" - **Qwen3-0.6B** (Alibaba) — 2025-04-01 | Parameters: Qwen3-0.6B - License: open | Type: model - Record data ratio 60,000:1. "Qwen3 is pre-trained on 36 trillion tokens across 119 languages" - **ERNIE X1 Turbo** (Baidu) — 2025-04-01 | Parameters: ERNIE X1 Turbo - License: open | Type: model - Announce: https://x.com/Baidu_Inc/status/1915603080336597310 - **ERNIE 4.5 Turbo** (Baidu) — 2025-04-01 | Parameters: ERNIE 4.5 Turbo - License: open | Type: model - Announce: https://x.com/Baidu_Inc/status/1915603080336597310 - **MAI-DS-R1** (Microsoft) — 2025-04-01 | Parameters: MAI-DS-R1 - License: open | Type: model - DeepSeek-R1 base. "MAI-DS-R1, a new open weights DeepSeek R1 model variant... post-trained by the Microsoft AI team to improve its responsiveness on blocked topics and its risk profile, while maintaining its reasoning capabilities and competitive performance." - **Gemini 2.5 Flash Preview** (Google DeepMind) — 2025-04-01 | Parameters: Gemini 2.5 Flash Preview - License: open | Type: model - Context in=1M, out=64k. Knowledge cutoff Jan/2025. Codename 'nebula'. Note: Gemini outputs are watermarked. I do not use GDM models. https://lifearchitect.ai/watermarking/ - **o4-mini** (OpenAI) — 2025-04-01 | Parameters: o4-mini - License: open | Type: model - https://openai.com/index/introducing-o3-and-o4-mini/ MMLU shows a translated LOTE. - **o3** (OpenAI) — 2025-04-01 | Parameters: o3 - License: open | Type: model - https://openai.com/index/introducing-o3-and-o4-mini/ MMLU shows a translated LOTE. 
- **BitNet b1.58 2B4T** (Microsoft) — 2025-04-01 | Parameters: BitNet b1.58 2B4T - License: open | Type: model - "the first open-source, native 1-bit Large Language Model (LLM) at the 2-billion parameter scale. Trained on a corpus of 4 trillion tokens" - **Granite 3.3 8B Instruct** (IBM) — 2025-04-01 | Parameters: Granite 3.3 8B Instruct - License: open | Type: model - "Built on top of an updated Granite 3.3 base model and fine-tuned through multi-stage reinforcement learning using TPO and Group Relative Policy Optimization (GRPO), both Granite 3.3 Instruct models demonstrated significant improvement on the highly technical benchmarks conventionally associated with “reasoning” capabilities." - **GLM-4-0414** (Zhipu AI (Tsinghua)) — 2025-04-01 | Parameters: GLM-4-0414 - License: open | Type: model - Family: GLM-4-32B-Base-0414, GLM-4-32B-0414, GLM-Z1-32B-0414 (reasoning), GLM-Z1-Rumination-32B-0414 (reasoning + deep research). - **SEA-LION v3.5 70B R** (AI Singapore) — 2025-04-01 | Parameters: SEA-LION v3.5 70B R - License: open | Type: model - "Based on Llama 3.1 70B. SEA-LION v3.5, our first set of hybrid reasoning models trained on Southeast Asian data. Mode selection is managed through the tokenizer’s chat template and offers versatile functionality, handling both complex reasoning tasks and general text generation." - **GPT-4.1** (OpenAI) — 2025-04-01 | Parameters: GPT-4.1 - License: open | Type: model - Outperforms GPT‑4o "across the board, with major gains in coding and instruction following. They also have larger context windows—supporting up to 1 million tokens of context—and are able to better use that context with improved long-context comprehension. They feature a refreshed knowledge cutoff of June 2024." 
- **DolphinGemma** (Google DeepMind) — 2025-04-01 | Parameters: DolphinGemma - License: open | Type: model - "trained on Atlantic spotted dolphin sounds, we anticipate its potential utility for researchers studying other cetacean species, like bottlenose or spinner dolphins... Developed by Google, this AI model makes use of specific Google audio technologies: the SoundStream tokenizer efficiently represents dolphin sounds, which are then processed by a model architecture suited for complex sequences. This ~400M parameter model is optimally-sized to run directly on the Pixel phones WDP uses in the field." - **Apriel-5B** (ServiceNow) — 2025-04-01 | Parameters: Apriel-5B - License: open | Type: model - SLAM - ServiceNow Language Models Lab. The first release in the Apriel model family, designed to support research on foundation models. - **Seed-Thinking-v1.5** (ByteDance) — 2025-04-01 | Parameters: Seed-Thinking-v1.5 - License: open | Type: model - 200B-A20B. "Seed-Thinking-v1.5, capable of reasoning through thinking before responding, resulting in improved performance on a wide range of benchmarks. Seed-Thinking-v1.5 achieves 86.7 on AIME 2024, 55.0 on Codeforces and 77.3 on GPQA, demonstrating excellent reasoning abilities in STEM and coding." - **Dream 7B** (Huawei) — 2025-04-01 | Parameters: Dream 7B - License: open | Type: model - "with Huawei Noah’s Ark Lab, we [Hong Kong University] release Dream 7B (Diffusion reasoning model), the most powerful open diffusion large language model to date." - **UltraLong-8B** (NVIDIA) — 2025-04-01 | Parameters: UltraLong-8B - License: open | Type: model - Llama-3.1-8B-Instruct base. 4M context window. - **Deepcoder-14B-Preview** (Together) — 2025-04-01 | Parameters: Deepcoder-14B-Preview - License: open | Type: model - Base DeepSeek-R1-Distill-Qwen-14B. 
- **Pangu Ultra** (Huawei) — 2025-04-01 | Parameters: Pangu Ultra - License: open | Type: model - Trained on 8,192 Ascend NPUs (Kunpeng 920 processors in Huawei Atlas 800T A2 servers). - **Nemotron-H-56B-Base** (NVIDIA) — 2025-04-01 | Parameters: Nemotron-H-56B-Base - License: open | Type: model - https://research.nvidia.com/labs/adlr/nemotronh/ - **Llama-3.1-Nemotron-Ultra-253B** (NVIDIA) — 2025-04-01 | Parameters: Llama-3.1-Nemotron-Ultra-253B - License: open | Type: model - Llama 3.1 405B base. "Llama-3.1-Nemotron-Ultra-253B-v1 is a large language model (LLM) which is a derivative of Meta Llama-3.1-405B-Instruct (AKA the reference model). It is a reasoning model that is post trained for reasoning, human chat preferences, and tasks, such as RAG and tool calling. The model supports a context length of 128K tokens. This model fits on a single 8xH100 node for inference." - **Llama 4 Behemoth** (Meta AI) — 2025-04-01 | Parameters: Llama 4 Behemoth - License: closed | Type: model - 2T-A288B. Announced Apr/2025, abandoned Jul/2025. "We also trained a teacher model, Llama 4 Behemoth, that outperforms GPT-4.5, Claude Sonnet 3.7, and Gemini 2.0 Pro on STEM-focused benchmarks such as MATH-500 and GPQA Diamond... 288B active parameters, 16 experts, and nearly two trillion total parameters." - **Llama 4 Maverick** (Meta AI) — 2025-04-01 | Parameters: Llama 4 Maverick - License: open | Type: model - 400B-A17B. "Our most powerful open source multimodal model. 17B active params x 128 experts, 400B total params" - **Llama 4 Scout** (Meta AI) — 2025-04-01 | Parameters: Llama 4 Scout - License: open | Type: model - 200 languages, "includes diverse text, image, and video datasets." - **Sec-Gemini v1** (Google DeepMind) — 2025-04-01 | Parameters: Sec-Gemini v1 - License: open | Type: model - "Sec-Gemini v1 achieves this by combining Gemini’s advanced capabilities with near real-time cybersecurity knowledge and tooling. 
This combination allows it to achieve superior performance on key cybersecurity workflows, including incident root cause analysis, threat analysis, and vulnerability impact understanding." - **DeepSeek-GRM-27B** (DeepSeek-AI) — 2025-04-01 | Parameters: DeepSeek-GRM-27B - License: open | Type: model - Gemma-2-27B base. "Self-Principled Critique Tuning (SPCT) to foster scalable reward generation behaviors in GRMs through online RL, to generate principles adaptively and critiques accurately, resulting in DeepSeek-GRM models... The models will be released and open-sourced." - **Qwerky-72B** (Featherless AI) — 2025-04-01 | Parameters: Qwerky-72B - License: open | Type: model - "As demonstrated with our Qwerky-72B-Preview and prior models such as QRWKV6-32B Instruct Preview, we have successfully converted Qwen 2.5 72B into a RWKV variant without requiring a pretrain on the base model or retraining the model from scratch. Enabling us to test and validate the more efficient RWKV Linear attention" Dataset from Qwen2.5=18,000 tokens. - **Cogito 70B** (Deep Cogito) — 2025-04-01 | Parameters: Cogito 70B - License: open | Type: model - "We are releasing early checkpoints of models in sizes 3B, 8B, 14B, 32B and 70B trained using this methodology, starting from pretrained Llama / Qwen base checkpoints." 
- **AutoGLM Rumination** (Z.ai (Zhipu AI)) — 2025-04-01 - License: closed | Type: model - AI model by Z.ai (Zhipu AI) - **Amazon Nova Act** (Amazon) — 2025-03-31 - License: closed | Type: model - AI model by Amazon - **Gen-4** (Runway) — 2025-03-31 - License: closed | Type: model - AI model by Runway - **Papla P1** (Papla Media) — 2025-03-31 - License: closed | Type: model - AI model by Papla Media - **DeepHermes 3 - Mistral 24B** (Nous Research) — 2025-03-29 | Parameters: 24B - License: open | Type: model - AI model by Nous Research - **QVQ-Max** (Alibaba) — 2025-03-28 - License: closed | Type: model - AI model by Alibaba - **Lingju Lingnao** (Guangzhou Lingju Information Technology Co Ltd.) — 2025-03-28 - License: closed | Type: model - AI model by Guangzhou Lingju Information Technology Co Ltd. - **Lumina-Image-2.0** (Shanghai AI Lab,University of Sydney,Chinese University of Hong Kong (CUHK),Shanghai Jiao Tong University,Krea AI) — 2025-03-27 | Parameters: 2.6B - License: open | Type: model - AI model by Shanghai AI Lab,University of Sydney,Chinese University of Hong Kong (CUHK),Shanghai Jiao Tong University,Krea AI - **CassetteAI** (CassetteAI) — 2025-03-27 - License: closed | Type: model - AI model by CassetteAI - **FASHN v1.5** (FASHN AI) — 2025-03-27 - License: closed | Type: model - AI model by FASHN AI - **GPT-4o (Mar 2025)** (OpenAI) — 2025-03-27 - License: closed | Type: model - AI model by OpenAI - **GAIA-2** (Wayve) — 2025-03-26 | Parameters: 8.7B - License: closed | Type: model - AI model by Wayve - **Ideogram 3.0** (Ideogram) — 2025-03-26 - License: closed | Type: model - AI model by Ideogram - **Qwen2.5-Omni 7B** (Alibaba) — 2025-03-26 | Parameters: 7B - License: open | Type: model - AI model by Alibaba - **Qwen2.5-Omni 3B** (Alibaba) — 2025-03-26 | Parameters: 3B - License: open | Type: model - AI model by Alibaba - **Gemini 2.5 Pro (Mar 2025)** (Google DeepMind) — 2025-03-25 - License: closed | Type: model - AI model by Google DeepMind - **4o 
Image Generation** (OpenAI) — 2025-03-25 - License: closed | Type: model - AI model by OpenAI - **Sonar Reasoning Pro** (Perplexity) — 2025-03-25 - License: closed | Type: model - AI model by Perplexity - **Stable Video 4D 2.0 (SV4D 2.0)** (Stability AI,Northeastern University) — 2025-03-25 - License: open | Type: model - AI model by Stability AI,Northeastern University - **3DGUT** (NVIDIA,University of Toronto) — 2025-03-24 - License: open | Type: model - AI model by NVIDIA,University of Toronto - **DeepSeek-V3 (Mar 2025)** (DeepSeek) — 2025-03-24 | Parameters: 671B - License: open | Type: model - AI model by DeepSeek - **Diffusion Renderer** (NVIDIA,University of Toronto,Vector Institute,University of Illinois Urbana-Champaign (UIUC)) — 2025-03-22 | Parameters: 1.1B - License: open | Type: model - AI model by NVIDIA,University of Toronto,Vector Institute,University of Illinois Urbana-Champaign (UIUC) - **HiCEGNN** (University of Missouri) — 2025-03-22 - License: open | Type: model - AI model by University of Missouri - **CodeScientist** (Allen Institute for AI) — 2025-03-20 - License: closed | Type: model - AI model by Allen Institute for AI - **Gezhi (格致大模型)** (Troy Information Technology Co., Ltd.) — 2025-03-19 - License: closed | Type: model - AI model by Troy Information Technology Co., Ltd. 
- **o1-pro** (OpenAI) — 2025-03-19 - License: closed | Type: model - AI model by OpenAI - **Llama Nemotron Nano 8B** (NVIDIA) — 2025-03-18 | Parameters: 8B - License: open | Type: model - AI model by NVIDIA - **Llama Nemotron Ultra 253B** (NVIDIA) — 2025-03-18 | Parameters: 253B - License: open | Type: model - AI model by NVIDIA - **Llama Nemotron Super 49B** (NVIDIA) — 2025-03-18 | Parameters: 49B - License: open | Type: model - AI model by NVIDIA - **GR00T N1 2B** (NVIDIA) — 2025-03-18 | Parameters: 2.2B - License: open | Type: model - AI model by NVIDIA - **Cosmos-Transfer1-7B** (NVIDIA) — 2025-03-18 | Parameters: 7B - License: open | Type: model - AI model by NVIDIA - **Mistral Small 3.1** (Mistral AI) — 2025-03-17 | Parameters: 24B - License: open | Type: model - AI model by Mistral AI - **Chirp 3 Speech-to-Text** (Google,Google DeepMind) — 2025-03-17 - License: closed | Type: model - AI model by Google,Google DeepMind - **Chirp 3 HD Text-to-Speech** (Google,Google DeepMind) — 2025-03-17 - License: closed | Type: model - AI model by Google,Google DeepMind - **EXAONE Deep 32B** (LG AI Research) — 2025-03-16 | Parameters: 32B - License: open | Type: model - AI model by LG AI Research - **EXAONE Deep 7.8B** (LG AI Research) — 2025-03-16 | Parameters: 7.8B - License: open | Type: model - AI model by LG AI Research - **EXAONE Deep 2.4B** (LG AI Research) — 2025-03-16 | Parameters: 2.4B - License: open | Type: model - AI model by LG AI Research - **ERNIE-4.5-VL-424B-A47B (文心大模型4.5)** (Baidu) — 2025-03-16 | Parameters: 424B - License: open | Type: model - AI model by Baidu - **ERNIE x1 (文心大模型X1)** (Baidu) — 2025-03-16 - License: closed | Type: model - AI model by Baidu - **EXAONE 3.5-R 2.4B** (LG AI Research) — 2025-03-14 | Parameters: 2.4B - License: closed | Type: model - AI model by LG AI Research - **EXAONE 3.5-R 32B** (LG AI Research) — 2025-03-14 | Parameters: 32B - License: closed | Type: model - AI model by LG AI Research - **EXAONE 3.5-R 7.8B** (LG AI 
Research) — 2025-03-14 | Parameters: 7.8B - License: closed | Type: model - AI model by LG AI Research - **Cohere Command A** (Cohere) — 2025-03-13 | Parameters: 111B - License: open | Type: model - AI model by Cohere - **Meissonic** (National University of Singapore,Skywork AI,Hong Kong University of Science and Technology (HKUST),University of California (UC) Berkeley,Zhejiang University (ZJU)) — 2025-03-13 | Parameters: 1B - License: open | Type: model - AI model by National University of Singapore,Skywork AI,Hong Kong University of Science and Technology (HKUST),University of California (UC) Berkeley,Zhejiang University (ZJU) - **OLMo 2 32B** (Allen Institute for AI) — 2025-03-13 | Parameters: 32B - License: open | Type: model - AI model by Allen Institute for AI - **GigaChat 2 MAX** (Sber) — 2025-03-13 - License: closed | Type: model - AI model by Sber - **GigaChat 2 Lite** (Sber) — 2025-03-13 - License: closed | Type: model - AI model by Sber - **GigaChat 2 Pro** (Sber) — 2025-03-13 - License: closed | Type: model - AI model by Sber - **Gemma 3 27B** (Google DeepMind) — 2025-03-12 | Parameters: 27B - License: open | Type: model - AI model by Google DeepMind - **Gemma 3 12B** (Google DeepMind) — 2025-03-12 | Parameters: 12B - License: open | Type: model - AI model by Google DeepMind - **Gemma 3 4B** (Google DeepMind) — 2025-03-12 | Parameters: 4B - License: open | Type: model - AI model by Google DeepMind - **Gemma 3 1B** (Google DeepMind) — 2025-03-12 | Parameters: 1B - License: open | Type: model - AI model by Google DeepMind - **Gemini Robotics** (Google DeepMind) — 2025-03-12 - License: closed | Type: model - AI model by Google DeepMind - **Gemini Robotics-ER** (Google DeepMind) — 2025-03-12 - License: closed | Type: model - AI model by Google DeepMind - **Marey** (Moonvalley) — 2025-03-12 - License: closed | Type: model - AI model by Moonvalley - **Hunyuan-TurboS** (Tencent) — 2025-03-11 | Parameters: 560B - License: closed | Type: model - AI model by 
Tencent - **YuE** (Hong Kong University of Science and Technology (HKUST),Multimodal Art Projection (MAP)) — 2025-03-11 | Parameters: 7B - License: closed | Type: model - AI model by Hong Kong University of Science and Technology (HKUST),Multimodal Art Projection (MAP) - **Reka Flash 3** (Reka AI) — 2025-03-10 | Parameters: 21B - License: open | Type: model - AI model by Reka AI - **Seedream 2.0** (ByteDance) — 2025-03-10 - License: closed | Type: model - AI model by ByteDance - **FoxBrain** (Foxconn) — 2025-03-10 | Parameters: 70B - License: open | Type: model - AI model by Foxconn - **Ling-lite-1.5 ("Bailing")** (Ant Group) — 2025-03-10 | Parameters: 16.8B - License: open | Type: model - AI model by Ant Group - **Ling-Plus ("Bailing")** (Ant Group) — 2025-03-10 | Parameters: 290B - License: open | Type: model - AI model by Ant Group - **Sonic 2** (Cartesia) — 2025-03-07 - License: closed | Type: model - AI model by Cartesia - **QwQ-32B** (Alibaba) — 2025-03-06 | Parameters: 32.5B - License: open | Type: model - AI model by Alibaba - **Mistral OCR** (Mistral AI) — 2025-03-06 - License: closed | Type: model - AI model by Mistral AI - **Jamba 1.6 Mini** (AI21 Labs) — 2025-03-06 | Parameters: 52B - License: open | Type: model - AI model by AI21 Labs - **Jamba 1.6 Large** (AI21 Labs) — 2025-03-06 | Parameters: 398B - License: open | Type: model - AI model by AI21 Labs - **Character-3** (Hedra AI) — 2025-03-06 - License: open | Type: model - AI model by Hedra AI - **LTX-Video-0.9.5. 
2B** (Lightricks) — 2025-03-05 | Parameters: 1.9B - License: open | Type: model - AI model by Lightricks - **Phi-4 Mini** (Microsoft) — 2025-03-03 | Parameters: 3.8B - License: open | Type: model - AI model by Microsoft - **Phi-4-Multimodal** (Microsoft) — 2025-03-03 | Parameters: 5.6B - License: open | Type: model - AI model by Microsoft - **Aya Vision 32B** (Cohere) — 2025-03-03 | Parameters: 33.1B - License: open | Type: model - AI model by Cohere - **Hailuo I2V-01-Director** (MiniMax,Hailuo AI) — 2025-03-03 - License: closed | Type: model - AI model by MiniMax,Hailuo AI - **Hailuo T2V-01-Director** (MiniMax,Hailuo AI) — 2025-03-03 - License: closed | Type: model - AI model by MiniMax,Hailuo AI - **Spark-X1** (iFlytek) — 2025-03-03 | Parameters: 70B - License: closed | Type: model - AI model by iFlytek - **Difix3D+** (NVIDIA,National University of Singapore,University of Toronto,Vector Institute) — 2025-03-03 - License: open | Type: model - AI model by NVIDIA,National University of Singapore,University of Toronto,Vector Institute - **Agentic-Tx** (Google DeepMind) — 2025-03-01 | Parameters: Agentic-Tx - License: open | Type: model - "a therapeutics-focused agentic system powered by Gemini 2.0 Pro. Agentic-Tx is equipped with 18 tools, including: TxGemma as a tool for multi-step reasoning" - **TxGemma** (Google DeepMind) — 2025-03-01 | Parameters: TxGemma - License: open | Type: model - "a suite of efficient, generalist large language models (LLMs) capable of therapeutic property prediction as well as interactive reasoning and explainability. Unlike task-specific models, TxGemma synthesizes information from diverse sources, enabling broad application across the therapeutic development pipeline." - **Gemini 2.5 Pro Preview** (Google DeepMind) — 2025-03-01 | Parameters: Gemini 2.5 Pro Preview - License: open | Type: model - Context in=1M, out=64k. Knowledge cutoff Jan/2025. HLE SOTA. Codename 'nebula'. Note: Gemini outputs are watermarked. I do not use GDM models. 
https://lifearchitect.ai/watermarking/ - **DeepSeek-V3 0324** (DeepSeek-AI) — 2025-03-01 | Parameters: DeepSeek-V3 0324 - License: open | Type: model - Non-reasoning. Significant increase in benchmark performance compared to original V3 from Dec/2024: MMLU-Pro: 75.9 ➜ 81.2, GPQA: 59.1 ➜ 68.4. 37B active. - **Llama-3.3-Nemotron-Super-49B-v1** (NVIDIA) — 2025-03-01 | Parameters: Llama-3.3-Nemotron-Super-49B-v1 - License: open | Type: model - Meta Llama-3.3-70B-Instruct derivative "that is post trained for reasoning, human chat preferences, and tasks, such as RAG and tool calling. The model supports a context length of 128K tokens." - **EXAONE Deep** (LG) — 2025-03-01 | Parameters: EXAONE Deep - License: open | Type: model - “EXAONE”=“EXpert AI for EveryONE”. Training tokens/ratio dropped from EXAONE-3 7.8B with 8T (Aug/2024) to 3.5 (Dec/2024) 7.8B with 9T to 32B (also Deep) with 6.5T. Announce: https://www.lgresearch.ai/news/view?seq=543 - **Mistral Small 3.1** (Mistral) — 2025-03-01 | Parameters: Mistral Small 3.1 - License: open | Type: model - "Mistral Small 3.1 (2503) adds state-of-the-art vision understanding and enhances long context capabilities up to 128k tokens without compromising text performance." - **ERNIE 4.5** (Baidu) — 2025-03-01 | Parameters: ERNIE 4.5 - License: open | Type: model - 424B-A47B. Announce: https://x.com/Baidu_Inc/status/1901094083508220035 - **X1** (Baidu) — 2025-03-01 | Parameters: X1 - License: open | Type: model - - **OLMo 2 32B** (Allen AI) — 2025-03-01 | Parameters: OLMo 2 32B - License: open | Type: model - "the first fully-open model (all data, code, weights, and details are freely available) to outperform GPT3.5-Turbo and GPT-4o mini on a suite of popular, multi-skill academic benchmarks. It is comparable to the leading open-weight models while requiring only a fraction of training compute." - **Command A** (Cohere) — 2025-03-01 | Parameters: Command A - License: open | Type: model - Context=256k.
"Command A is an open weights research release of a 111 billion parameter model optimized for demanding enterprises that require fast, secure, and high-quality AI. Compared to other leading proprietary and open-weights models Command A delivers maximum performance with minimum hardware costs, excelling on business-critical agentic and multilingual tasks while being deployable on just two GPUs." - **Gemini Robotics** (Google DeepMind) — 2025-03-01 | Parameters: Gemini Robotics - License: closed | Type: model - Gemini 2.0 Pro (cloud). "The second model is Gemini Robotics, a state-of-the-art Vision-Language-Action (VLA) model that connects strong embodied reasoning priors to dexterous low-level control of real-world robots to solve challenging manipulation tasks. As a generalist VLA, Gemini Robotics can perform a wide array of diverse and complicated tasks, while also closely following language guidance and generalizing to distribution shifts in instructions, visuals, and motions. To emphasize the flexibility and generality of the Gemini Robotics models, we also introduce an optional specialization stage, which demonstrates how Gemini Robotics can be adapted for extreme dexterity, for advanced reasoning in difficult generalization settings, and for controlling completely new robot embodiments." - **Gemini Robotics-ER** (Google DeepMind) — 2025-03-01 | Parameters: Gemini Robotics-ER - License: closed | Type: model - Gemini 2.0 Flash (on device). "The first model is Gemini Robotics-ER, a VLM with strong embodied reasoning capabilities at its core, exhibiting generalization across a wide range of embodied reasoning tasks while also maintaining its core foundation model capabilities. Gemini Robotics-ER exhibits strong performance on multiple capabilities critical for understanding the physical world, ranging from 3D perception to detailed pointing to robot state estimation and affordance prediction via code."
- **Gemma 3** (Google DeepMind) — 2025-03-01 | Parameters: Gemma 3 - License: open | Type: model - Trained on 1T more tokens than Gemma 2. "introduces vision understanding abilities, a wider coverage of languages and longer context – at least 128K tokens." - **Reka Flash 3** (Reka AI) — 2025-03-01 | Parameters: Reka Flash 3 - License: open | Type: model - "performs competitively with proprietary models such as OpenAI o1-mini, making it a good foundation to build applications that require low latency or on-device deployment. It is currently the best open model in its size category." - **QwQ-32B** (Alibaba) — 2025-03-01 | Parameters: QwQ-32B - License: open | Type: model - Update to QwQ-32B-Preview released Nov/2024. Scores 1/5 on latest ALPrompt 2024 H2. Qwen with Question=QwQ - **Jamba 1.6** (AI21) — 2025-03-01 | Parameters: Jamba 1.6 - License: open | Type: model - "The AI21 Jamba 1.6 family of models is state-of-the-art, hybrid SSM-Transformer instruction following foundation models. The Jamba models are the most powerful & efficient long-context models on the market, which deliver up to 2.5X faster inference than leading models of comparable sizes." - **Instella-3B** (AMD) — 2025-03-01 | Parameters: Instella-3B - License: open | Type: model - "trained from scratch on AMD Instinct™ MI300X GPUs. Instella models outperform existing fully open models of similar sizes and achieve competitive performance compared to state-of-the-art open-weight models such as Llama-3.2-3B, Gemma-2-2B, and Qwen-2.5-3B, including their instruction-tuned counterparts." - **Babel-83B** (Alibaba) — 2025-03-01 | Parameters: Babel-83B - License: open | Type: model - "top 25 languages by number of speakers, including English, Chinese, Hindi, Spanish, Arabic, French, Bengali, Portuguese, Russian, Urdu, Indonesian, German, Japanese, Swahili, Filipino, Tamil, Vietnamese, Turkish, Italian, Javanese, Korean, Hausa, Persian, Thai, and Burmese. 
These 25 languages support over 90% of the global population..." - **Image-01** (MiniMax) — 2025-02-28 - License: closed | Type: model - AI model by MiniMax - **GPT-4.5** (OpenAI) — 2025-02-27 - License: closed | Type: model - AI model by OpenAI - **Kimi 1.6** (Moonshot) — 2025-02-27 - License: closed | Type: model - AI model by Moonshot - **Mercury** (Inception Labs) — 2025-02-27 - License: closed | Type: model - AI model by Inception Labs - **Pika 2.2** (Pika Labs) — 2025-02-27 - License: closed | Type: model - AI model by Pika Labs - **Granite 3.2 8B** (IBM) — 2025-02-26 | Parameters: 8B - License: open | Type: model - AI model by IBM - **Granite 3.2 2B** (IBM) — 2025-02-26 | Parameters: 2.5B - License: open | Type: model - AI model by IBM - **Granite Guardian 3.2** (IBM) — 2025-02-26 - License: open | Type: model - AI model by IBM - **Eleven Scribe** (ElevenLabs) — 2025-02-26 - License: closed | Type: model - AI model by ElevenLabs - **Granite 3.2 2B** (IBM Research) — 2025-02-26 | Parameters: 2B - License: closed | Type: model - AI model by IBM Research - **Wan 2.1 14B I2V** (Alibaba) — 2025-02-25 | Parameters: 14B - License: open | Type: model - AI model by Alibaba - **Bailing-Pro-20250225** (Ant Group) — 2025-02-25 - License: closed | Type: model - AI model by Ant Group - **Tianxi-7B** (Lenovo) — 2025-02-25 | Parameters: 7B - License: closed | Type: model - AI model by Lenovo - **BFS-Prover** (ByteDance) — 2025-02-25 | Parameters: 7B - License: open | Type: model - AI model by ByteDance - **YandexGPT 5 Pro** (Yandex) — 2025-02-25 - License: closed | Type: model - AI model by Yandex - **YandexGPT 5 Lite** (Yandex) — 2025-02-25 | Parameters: 8B - License: open | Type: model - AI model by Yandex - **Claude 3.7 Sonnet** (Anthropic) — 2025-02-24 - License: closed | Type: model - AI model by Anthropic - **Step-Video-T2V** (StepFun) — 2025-02-24 | Parameters: 30B - License: open | Type: model - AI model by StepFun - **Baigong (百工)** (Shanghai Lingyi Artificial 
Intelligence Technology Co., Ltd.) — 2025-02-21 - License: closed | Type: model - AI model by Shanghai Lingyi Artificial Intelligence Technology Co., Ltd. - **Helix** (Figure AI) — 2025-02-20 | Parameters: 7.1B - License: closed | Type: model - AI model by Figure AI - **SigLIP 2** (Google DeepMind) — 2025-02-20 | Parameters: 1.1B - License: open | Type: model - AI model by Google DeepMind - **Evo 2 40B** (Arc Institute,Stanford University,NVIDIA,Liquid,University of California (UC) Berkeley,Goodfire,Columbia University,University of California San Francisco) — 2025-02-19 | Parameters: 40.3B - License: open | Type: model - AI model by Arc Institute,Stanford University,NVIDIA,Liquid,University of California (UC) Berkeley,Goodfire,Columbia University,University of California San Francisco - **Evo 2 7B** (Arc Institute,Stanford University,NVIDIA,Liquid,University of California (UC) Berkeley,Goodfire,Columbia University,University of California San Francisco) — 2025-02-19 | Parameters: 7B - License: open | Type: model - AI model by Arc Institute,Stanford University,NVIDIA,Liquid,University of California (UC) Berkeley,Goodfire,Columbia University,University of California San Francisco - **Grok-3 mini** (xAI) — 2025-02-19 - License: closed | Type: model - AI model by xAI - **PaliGemma 2 3B Mix 224** (Google) — 2025-02-19 | Parameters: 2.9B - License: open | Type: model - AI model by Google - **PaliGemma 2 3B Mix 448** (Google) — 2025-02-19 | Parameters: 2.9B - License: open | Type: model - AI model by Google - **Qwen2.5-VL-72B** (Alibaba) — 2025-02-19 | Parameters: 72B - License: open | Type: model - AI model by Alibaba - **Qwen2.5-VL-7B** (Alibaba) — 2025-02-19 | Parameters: 7B - License: open | Type: model - AI model by Alibaba - **Qwen2.5-VL-3B** (Alibaba) — 2025-02-19 | Parameters: 3B - License: open | Type: model - AI model by Alibaba - **R1 1776** (Perplexity) — 2025-02-18 | Parameters: 671B - License: open | Type: model - AI model by Perplexity - **TasiChat 
(TasiChat大模型)** (Chengdu Tasi Technology Co., Ltd.) — 2025-02-18 - License: closed | Type: model - AI model by Chengdu Tasi Technology Co., Ltd. - **Brain2Qwerty** (Meta AI,Universite de Technologie de Compiègne – CNRS,Basque Center on Cognition) — 2025-02-18 | Parameters: 400M - License: closed | Type: model - AI model by Meta AI,Universite de Technologie de Compiègne – CNRS,Basque Center on Cognition - **Step-Audio-Chat 130B** (StepFun) — 2025-02-18 | Parameters: 130B - License: open | Type: model - AI model by StepFun - **Step-1** (StepFun) — 2025-02-18 | Parameters: 130B - License: closed | Type: model - AI model by StepFun - **Step-Omni** (StepFun) — 2025-02-18 | Parameters: 130B - License: closed | Type: model - AI model by StepFun - **Grok 3** (xAI) — 2025-02-17 | Parameters: 3T - License: closed | Type: model - AI model by xAI - **Mistral Saba** (Mistral AI) — 2025-02-17 | Parameters: 24B - License: closed | Type: model - AI model by Mistral AI - **YAYI-Ultra** (Yayi (Wenge)) — 2025-02-15 - License: closed | Type: model - AI model by Yayi (Wenge) - **SkyReels-A1** (Kunlun Inc.) — 2025-02-15 - License: closed | Type: model - AI model by Kunlun Inc. 
- **BRIA 3.1** (BRIA AI) — 2025-02-15 | Parameters: 4B - License: open | Type: model - AI model by BRIA AI - **LLaDA** (Renmin University of China,Ant Group) — 2025-02-14 | Parameters: 8B - License: closed | Type: model - AI model by Renmin University of China,Ant Group - **Deephermes 3 Llama 3 8B Preview** (Nous Research) — 2025-02-14 | Parameters: 8B - License: open | Type: model - AI model by Nous Research - **Granite Vision 3.2 2B** (IBM) — 2025-02-14 | Parameters: 3.0B - License: open | Type: model - AI model by IBM - **Sonar Deep Research** (Perplexity) — 2025-02-14 - License: closed | Type: model - AI model by Perplexity - **Step-Audio-TTS-3B** (StepFun) — 2025-02-14 | Parameters: 3B - License: open | Type: model - AI model by StepFun - **OmniHuman-1** (ByteDance) — 2025-02-13 - License: closed | Type: model - AI model by ByteDance - **Sonar** (Perplexity) — 2025-02-11 | Parameters: 70B - License: closed | Type: model - AI model by Perplexity - **OREAL 32B** (Shanghai AI Lab,Shanghai Jiao Tong University,Chinese University of Hong Kong (CUHK),InnoHK) — 2025-02-10 | Parameters: 32B - License: open | Type: model - AI model by Shanghai AI Lab,Shanghai Jiao Tong University,Chinese University of Hong Kong (CUHK),InnoHK - **OREAL 7B** (Shanghai AI Lab,Shanghai Jiao Tong University,Chinese University of Hong Kong (CUHK),InnoHK) — 2025-02-10 | Parameters: 7B - License: open | Type: model - AI model by Shanghai AI Lab,Shanghai Jiao Tong University,Chinese University of Hong Kong (CUHK),InnoHK - **Animate Anyone 2** (Alibaba) — 2025-02-10 - License: closed | Type: model - AI model by Alibaba - **Zonos-v0.1** (Zyphra) — 2025-02-10 | Parameters: 1.6B - License: open | Type: model - AI model by Zyphra - **HAMSTER VLM** (NVIDIA,University of Washington,University of Southern California) — 2025-02-08 | Parameters: 13.5B - License: open | Type: model - AI model by NVIDIA,University of Washington,University of Southern California - **Goku-8B** (The University of Hong 
Kong,ByteDance) — 2025-02-07 | Parameters: 8B - License: closed | Type: model - AI model by The University of Hong Kong,ByteDance - **Goku+** (The University of Hong Kong,ByteDance) — 2025-02-07 - License: closed | Type: model - AI model by The University of Hong Kong,ByteDance - **Eurus-2-7B-PRIME** (Tsinghua University,University of Illinois Urbana-Champaign (UIUC),Shanghai AI Lab,Peking University,Shanghai Jiao Tong University,CUHK Shenzhen Research Institute) — 2025-02-03 | Parameters: 7B - License: open | Type: model - AI model by Tsinghua University,University of Illinois Urbana-Champaign (UIUC),Shanghai AI Lab,Peking University,Shanghai Jiao Tong University,CUHK Shenzhen Research Institute - **Prithvi-EO-2.0 600M** (IBM Research,NASA,University of Alabama,University of Iceland,Forschungszentrum Julich,Virginia Tech (Virginia Polytechnic Institute and State University),Arizona State University,Oregon State University,Boston University,University of California (UC) Berkeley,Julich Supercomputing Center) — 2025-02-03 | Parameters: 600M - License: open | Type: model - AI model by IBM Research,NASA,University of Alabama,University of Iceland,Forschungszentrum Julich,Virginia Tech (Virginia Polytechnic Institute and State University),Arizona State University,Oregon State University,Boston University,University of California (UC) Berkeley,Julich Supercomputing Center - **Prithvi-EO-2.0 300M** (IBM Research,NASA,University of Alabama,University of Iceland,Forschungszentrum Julich,Virginia Tech (Virginia Polytechnic Institute and State University),Arizona State University,Oregon State University,Boston University,University of California (UC) Berkeley,Julich Supercomputing Center) — 2025-02-03 | Parameters: 300M - License: open | Type: model - AI model by IBM Research,NASA,University of Alabama,University of Iceland,Forschungszentrum Julich,Virginia Tech (Virginia Polytechnic Institute and State University),Arizona State University,Oregon State University,Boston 
University,University of California (UC) Berkeley,Julich Supercomputing Center - **Granite-3.2-8B-Instruct** (IBM) — 2025-02-01 | Parameters: Granite-3.2-8B-Instruct - License: open | Type: model - "The new Granite 3.2 8B Instruct [offers] experimental chain-of-thought reasoning capabilities" - **C4AI Command R7B Arabic** (Cohere) — 2025-02-01 | Parameters: C4AI Command R7B Arabic - License: open | Type: model - "C4AI Command R7B Arabic is an open weights research release of a 7 billion parameter custom model with advanced capabilities optimized for the Arabic language (MSA dialect) along with English. The model excels at tasks that enterprises care about: instruction following, length control, RAG, and responding in the correct language. It also demonstrates excellent general purpose knowledge and understanding of Arabic language and cultures." - **GPT-4.5** (OpenAI) — 2025-02-01 | Parameters: GPT-4.5 - License: open | Type: model - "Our largest and best model for chat" https://openai.com/index/introducing-gpt-4-5/ "GPT-4.5 is not a frontier model, but it is OpenAI’s largest LLM, improving on GPT-4’s computational efficiency by more than 10x. While GPT-4.5 demonstrates increased world knowledge, improved writing ability, and refined personality over previous models, it does not introduce net-new frontier capabilities compared to previous reasoning releases, and its performance is below that of o1, o3-mini, and deep research on most preparedness evaluations." - **Hunyuan T1** (Tencent) — 2025-02-01 | Parameters: Hunyuan T1 - License: open | Type: model - "Based on Turbo S, by introducing technologies such as long thinking chains, retrieval enhancement and reinforcement learning, Hunyuan also launched the reasoning model T1 with deep thinking. This model has been fully launched on Tencent Yuanbao (Tencent Hunyuan T1 model is open to all users), users can choose Deepseek R1 or Tencent Hunyuan T1 model to answer.
The official version of Tencent Hunyuan T1 model will be launched soon, providing API access and other services to the outside world." - **Hunyuan Turbo S** (Tencent) — 2025-02-01 | Parameters: Hunyuan Turbo S - License: open | Type: model - Fast thinking ("Instant reply"). "This is also the first time in the industry that the Mamba architecture has been successfully applied losslessly to a very large MoE model." - **Phi-4-multimodal** (Microsoft) — 2025-02-01 | Parameters: Phi-4-multimodal - License: open | Type: model - "Training data: 5T tokens, 2.3M speech hours, and 1.1T image-text tokens" Announce: https://azure.microsoft.com/en-us/blog/empowering-innovation-the-next-generation-of-the-phi-family/ - **Phi-4-mini** (Microsoft) — 2025-02-01 | Parameters: Phi-4-mini - License: open | Type: model - "Phi-4-mini’s training data includes a wide variety of sources, totaling 5 trillion tokens, and is a combination of publicly available documents filtered for quality, selected high-quality educational data, and code newly created synthetic, “textbook-like” data for the purpose of teaching math, coding, common sense reasoning, general knowledge of the world (e.g., science, daily activities, theory of mind, etc.) high quality chat format supervised data covering various topics to reflect human preferences" Announce: https://azure.microsoft.com/en-us/blog/empowering-innovation-the-next-generation-of-the-phi-family/ - **Mercury Coder Small** (Inception) — 2025-02-01 | Parameters: Mercury Coder Small - License: open | Type: model - Diffusion large language model (dLLM). Very low 'IQ' performance (0/5 on all ALPrompts). Fast: 1,000tok/s. 
https://x.com/inceptionailabs/status/1894847921474150456 - **QwQ-Max-Preview** (Alibaba) — 2025-02-01 | Parameters: QwQ-Max-Preview - License: open | Type: model - "As a sneak peek into our upcoming QwQ-Max release, this version offers a glimpse of its enhanced capabilities, with ongoing refinements and an official Apache 2.0-licensed open-source launch of QwQ-Max and Qwen2.5-Max planned soon." Announce: https://x.com/Alibaba_Qwen/status/1894130603513319842 - **Claude 3.7 Sonnet** (Anthropic) — 2025-02-01 | Parameters: Claude 3.7 Sonnet - License: open | Type: model - Knowledge cutoff now November 2024 (was April 2024). "the first hybrid reasoning model on the market." https://www.anthropic.com/news/claude-3-7-sonnet - **Moonlight** (Moonshot AI) — 2025-02-01 | Parameters: Moonlight - License: open | Type: model - "Scaling law experiments indicate that Muon achieves ∼ 2× computational efficiency compared to AdamW with compute optimal training." https://github.com/MoonshotAI/Moonlight?tab=readme-ov-file - **S2** (Figure) — 2025-02-01 | Parameters: S2 - License: closed | Type: model - Likely based on OpenVLA 7B (Jun/2024, based on Llama 2 7B) or Molmo 7B-O (Sep/2024, based on OLMo-7B-1024 with OpenAI CLIP). "high quality, multi-robot, multi-operator dataset of diverse teleoperated behaviors, ~500 hours in total. To generate natural language-conditioned training pairs, we use an auto-labeling VLM to generate hindsight instructions. The VLM processes segmented video clips from the onboard robot cameras, prompted with: "What instruction would you have given the robot to get the action seen in this video?" All items handled during training are excluded from evaluations to prevent contamination. Architecture Our system comprises two main components: S2, a VLM backbone, and S1, a latent-conditional visuomotor transformer. S2 is built on a 7B-parameter open-source, open-weight VLM pretrained on internet-scale data. 
It processes monocular robot images and robot state information (consisting of wrist pose and finger positions) after projecting them into vision-language embedding space. Combined with natural language commands specifying desired behaviors, S2 distills all semantic task-relevant information into a single continuous latent vector, passed to S1 to condition its low-level actions. S1, an 80M parameter cross-attention encoder-decoder transformer, handles low-level control. It relies on a fully convolutional, multi-scale vision backbone for visual processing, initialized from pretraining done entirely in simulation. While S1 receives the same image and state inputs as S2, it processes them at a higher frequency to enable more responsive closed-loop control. The latent vector from S2 is projected into S1's token space and concatenated with visual features from S1's vision backbone along the sequence dimension, providing task conditioning. S1 outputs full upper body humanoid control at 200hz, including desired wrist poses, finger flexion and abduction control, and torso and head orientation targets. We append to the action space a synthetic "percentage task completion" action, allowing Helix to predict its own termination condition, which makes it easier to sequence multiple learned behaviors." - **S1** (Figure) — 2025-02-01 | Parameters: S1 - License: closed | Type: model - "high quality, multi-robot, multi-operator dataset of diverse teleoperated behaviors, ~500 hours in total. To generate natural language-conditioned training pairs, we use an auto-labeling VLM to generate hindsight instructions. The VLM processes segmented video clips from the onboard robot cameras, prompted with: "What instruction would you have given the robot to get the action seen in this video?" All items handled during training are excluded from evaluations to prevent contamination. 
Architecture Our system comprises two main components: S2, a VLM backbone, and S1, a latent-conditional visuomotor transformer. S2 is built on a 7B-parameter open-source, open-weight VLM pretrained on internet-scale data. It processes monocular robot images and robot state information (consisting of wrist pose and finger positions) after projecting them into vision-language embedding space. Combined with natural language commands specifying desired behaviors, S2 distills all semantic task-relevant information into a single continuous latent vector, passed to S1 to condition its low-level actions. S1, an 80M parameter cross-attention encoder-decoder transformer, handles low-level control. It relies on a fully convolutional, multi-scale vision backbone for visual processing, initialized from pretraining done entirely in simulation. While S1 receives the same image and state inputs as S2, it processes them at a higher frequency to enable more responsive closed-loop control. The latent vector from S2 is projected into S1's token space and concatenated with visual features from S1's vision backbone along the sequence dimension, providing task conditioning. S1 outputs full upper body humanoid control at 200hz, including desired wrist poses, finger flexion and abduction control, and torso and head orientation targets. We append to the action space a synthetic "percentage task completion" action, allowing Helix to predict its own termination condition, which makes it easier to sequence multiple learned behaviors." - **Baichuan-M1-14B** (Baichuan) — 2025-02-01 | Parameters: Baichuan-M1-14B - License: open | Type: model - Medical LLM. Huge increase to 20T tokens for 14B params standard. - **Evo 2** (Arc Institute) — 2025-02-01 | Parameters: Evo 2 - License: open | Type: model - "Evo 2 is a state of the art DNA language model for long context modeling and design. 
Evo 2 models DNA sequences at single-nucleotide resolution at up to 1 million base pair context length using the StripedHyena 2 architecture. Evo 2 was pretrained using Savanna. Evo 2 was trained autoregressively on OpenGenome2, a dataset containing 8.8 trillion tokens from all domains of life." Greg Brockman co-author. - **R1 1776** (Perplexity) — 2025-02-01 | Parameters: R1 1776 - License: open | Type: model - Censorship reduced, based on DeepSeek-R1. - **Grok-3** (xAI) — 2025-02-01 | Parameters: Grok-3 - License: open | Type: model - https://x.ai/blog/grok-3 My full analysis: https://lifearchitect.ai/whats-in-grok/ - **Mistral Saba** (Mistral) — 2025-02-01 | Parameters: Mistral Saba - License: open | Type: model - "Mistral Saba is a 24B parameter model trained on meticulously curated datasets from across the Middle East and South Asia." - **Salamandra** (Barcelona Supercomputing Center) — 2025-02-01 | Parameters: Salamandra - License: open | Type: model - "The final [pre-training] dataset is composed of 55.51% FineWeb-Edu, 25.32% Colossal Oscar, 8.38% Wikipedia, 7.17% Aya Collection, and 3.63% StarCoder, totalling 315 billion tokens." - **DeepHermes 3 Preview** (Nous Research) — 2025-02-01 | Parameters: DeepHermes 3 Preview - License: open | Type: model - Based on Llama 3 8B. GPQA score based on GPT-4o's analysis of the chart :-/ "one of the first models in the world to unify Reasoning (long chains of thought that improve answer accuracy) and normal LLM response modes into one model." https://x.com/NousResearch/status/1890148004029759612 - **OREAL-32B** (Shanghai AI Laboratory/SenseTime) — 2025-02-01 | Parameters: OREAL-32B - License: open | Type: model - OREAL=Outcome REwArd-based reinforcement Learning. - **Gemini 2.0 Pro** (Google DeepMind) — 2025-02-01 | Parameters: Gemini 2.0 Pro - License: open | Type: model - Context=2M. Disappointing benchmarks, this is the 'pro' (medium) not 'ultra' (large) model. Note: Gemini outputs are watermarked. 
I do not use GDM models. https://lifearchitect.ai/watermarking/ - **s1-32B** (Stanford) — 2025-02-01 | Parameters: s1-32B - License: open | Type: model - Based on Qwen2.5-32B-Instruct. "we curate a small dataset s1K of 1,000 questions paired with reasoning traces relying on three criteria we validate through ablations: difficulty, diversity, and quality. Second, we develop budget forcing to control test-time compute by forcefully terminating the model’s thinking process or lengthening it by appending “Wait” multiple times to the model’s generation when it tries to end. This can lead the model to doublecheck its answer, often fixing incorrect reasoning steps. After supervised finetuning the Qwen2.5-32B-Instruct language model on s1K and equipping it with budget forcing, our model s1-32B exceeds o1-preview on competition math questions by up to 27% (MATH and AIME24)." - **o3-mini** (OpenAI) — 2025-01-31 - License: closed | Type: model - AI model by OpenAI - **s1-32B** (Stanford University,University of Washington,Allen Institute for AI,Contextual AI) — 2025-01-31 | Parameters: 32B - License: open | Type: model - AI model by Stanford University,University of Washington,Allen Institute for AI,Contextual AI - **s1.1** (Stanford University,University of Washington,Allen Institute for AI,Contextual AI) — 2025-01-31 | Parameters: 32B - License: open | Type: model - AI model by Stanford University,University of Washington,Allen Institute for AI,Contextual AI - **Mistral Small 3** (Mistral AI) — 2025-01-30 | Parameters: 24B - License: open | Type: model - AI model by Mistral AI - **Tulu 3 405B** (Allen Institute for AI,University of Washington) — 2025-01-30 | Parameters: 405B - License: open | Type: model - AI model by Allen Institute for AI,University of Washington - **Sonar Reasoning** (Perplexity) — 2025-01-29 - License: closed | Type: model - AI model by Perplexity - **GPT-4o (Jan 2025)** (OpenAI) — 2025-01-29 - License: closed | Type: model - AI model by OpenAI - 
**Qwen2.5-Max** (Alibaba) — 2025-01-28 - License: closed | Type: model - AI model by Alibaba - **Janus-Pro-7B** (DeepSeek) — 2025-01-27 | Parameters: 7B - License: open | Type: model - AI model by DeepSeek - **Janus-Pro-1B** (DeepSeek) — 2025-01-27 | Parameters: 1B - License: open | Type: model - AI model by DeepSeek - **Kokoro v1.0** (hexgrad) — 2025-01-27 | Parameters: 82M - License: open | Type: model - AI model by hexgrad - **Baichuan-Omni-1.5** (Baichuan) — 2025-01-26 | Parameters: 11B - License: open | Type: model - AI model by Baichuan - **Qwen2.5-VL-7B** (Alibaba) — 2025-01-26 | Parameters: 7B - License: closed | Type: model - AI model by Alibaba - **Sonar Pro** (Perplexity) — 2025-01-25 - License: closed | Type: model - AI model by Perplexity - **Computer-Using Agent (CUA)** (OpenAI) — 2025-01-23 - License: closed | Type: model - AI model by OpenAI - **DoMINO** (NVIDIA) — 2025-01-23 - License: closed | Type: model - AI model by NVIDIA - **Luma Ray2** (LumaLabs) — 2025-01-23 - License: closed | Type: model - AI model by LumaLabs - **Doubao-1.5-pro** (ByteDance) — 2025-01-22 - License: closed | Type: model - AI model by ByteDance - **Kimi k1.5** (Moonshot) — 2025-01-22 - License: closed | Type: model - AI model by Moonshot - **DeepSeek-R1-Distill-Llama-70B** (DeepSeek) — 2025-01-22 | Parameters: 70B - License: open | Type: model - AI model by DeepSeek - **DeepSeek-R1-Distill-Qwen-14B** (DeepSeek) — 2025-01-22 | Parameters: 14.8B - License: open | Type: model - AI model by DeepSeek - **DeepSeek-R1-Distill-Qwen-1.5B** (DeepSeek) — 2025-01-22 | Parameters: 1.8B - License: open | Type: model - AI model by DeepSeek - **DeepSeek-R1-Distill-Llama-8B** (DeepSeek) — 2025-01-22 | Parameters: 8B - License: open | Type: model - AI model by DeepSeek - **DeepSeek-R1-Distill-Qwen-32B** (DeepSeek) — 2025-01-22 | Parameters: 32B - License: open | Type: model - AI model by DeepSeek - **DeepSeek-R1-Distill-Qwen-7B** (DeepSeek) — 2025-01-22 | Parameters: 7B - License: open | 
Type: model - AI model by DeepSeek - **gte-modernbert** (Alibaba) — 2025-01-22 | Parameters: 149M - License: open | Type: model - AI model by Alibaba - **Hunyuan3D 2.0** (Tencent) — 2025-01-21 - License: open | Type: model - AI model by Tencent - **DeepSeek-R1** (DeepSeek) — 2025-01-20 | Parameters: 671B - License: open | Type: model - AI model by DeepSeek - **DeepSeek-R1-Zero** (DeepSeek) — 2025-01-20 | Parameters: 671B - License: open | Type: model - AI model by DeepSeek - **Eagle 2** (NVIDIA,Nanjing University,Tsinghua University,Hong Kong Polytechnic University,Johns Hopkins University,New York University (NYU)) — 2025-01-20 | Parameters: 8.9B - License: open | Type: model - AI model by NVIDIA,Nanjing University,Tsinghua University,Hong Kong Polytechnic University,Johns Hopkins University,New York University (NYU) - **Zero-shot Monocular Scene Flow (ZeroMSF)** (NVIDIA,Brown University) — 2025-01-20 - License: closed | Type: model - AI model by NVIDIA,Brown University - **INTELLECT-MATH** (Prime Intellect) — 2025-01-17 | Parameters: 7B - License: open | Type: model - AI model by Prime Intellect - **GPT-4b micro** (OpenAI,Retro Biosciences) — 2025-01-17 - License: closed | Type: model - AI model by OpenAI,Retro Biosciences - **MiniMax Speech-01-turbo (T2A-01-turbo)** (MiniMax) — 2025-01-17 - License: closed | Type: model - AI model by MiniMax - **MatterGen** (Microsoft Research AI for Science) — 2025-01-16 | Parameters: 46.8M - License: open | Type: model - AI model by Microsoft Research AI for Science - **MiniMax Speech-01-HD (T2A-01-HD)** (MiniMax) — 2025-01-15 - License: closed | Type: model - AI model by MiniMax - **Pika 2.1** (Pika Labs) — 2025-01-15 - License: closed | Type: model - AI model by Pika Labs - **Unichat-32B-c1** (China Unicom) — 2025-01-15 | Parameters: 32B - License: closed | Type: model - AI model by China Unicom - **InternLM3** (Shanghai AI Lab) — 2025-01-15 | Parameters: 8B - License: closed | Type: model - AI model by Shanghai AI Lab - 
**s1** (Stanford University,University of Washington,Allen Institute for AI,Contextual AI) — 2025-01-14 | Parameters: 32B - License: open | Type: model - AI model by Stanford University,University of Washington,Allen Institute for AI,Contextual AI - **MiniMax-Text-01** (MiniMax) — 2025-01-14 | Parameters: 456B - License: open | Type: model - AI model by MiniMax - **MiniMax-VL-01** (MiniMax) — 2025-01-14 - License: open | Type: model - AI model by MiniMax - **OpenBioLLM-Llama3-70B** (Saama) — 2025-01-14 | Parameters: 70B - License: open | Type: model - AI model by Saama - **OpenBioLLM-Llama3-8B** (Saama) — 2025-01-14 | Parameters: 8B - License: open | Type: model - AI model by Saama - **SenseNova Unified Large Model** (SenseTime) — 2025-01-13 - License: closed | Type: model - AI model by SenseTime - **Stable Point Aware 3D (SPAR3D)** (Stability AI,University of Illinois Urbana-Champaign (UIUC)) — 2025-01-08 - License: open | Type: model - AI model by Stability AI,University of Illinois Urbana-Champaign (UIUC) - **Cosmos-1.0-Diffusion-14B Video2World** (NVIDIA) — 2025-01-07 | Parameters: 14B - License: open | Type: model - AI model by NVIDIA - **Cosmos-Predict1-7b-Video2World** (NVIDIA) — 2025-01-07 | Parameters: 7B - License: open | Type: model - AI model by NVIDIA - **voyage-3-large** (Voyage AI) — 2025-01-07 - License: closed | Type: model - AI model by Voyage AI - **Cosmos-Predict1-14b-Video2World** (NVIDIA) — 2025-01-07 | Parameters: 14B - License: open | Type: model - AI model by NVIDIA - **Tiangong 4.0** (Kunlun Inc.) — 2025-01-06 - License: closed | Type: model - AI model by Kunlun Inc. - **o3-mini** (OpenAI) — 2025-01-01 | Parameters: o3-mini - License: open | Type: model - GPQA=79.7 for 'high' thinking. ALPrompt 2025H1=1/5. My analysis is that this model’s performance is very poor, with responses often becoming disordered and illogical. OpenAI compared o3-mini to OpenAI’s software engineers, and it performed very poorly (o3-mini=0%, o1=12%).
"o3-mini models have the lowest performance, with scores of 0%… We suspect o3-mini’s low performance is due to poor instruction following and confusion about specifying tools in the correct format. The model often attempts to use a hallucinated bash tool rather than python despite constant, multi-shot prompting and feedback that this format is incorrect. This resulted in long conversations that likely hurt its performance." (o3-mini paper, p31) - **Mistral Small 3** (Mistral) — 2025-01-01 | Parameters: Mistral Small 3 - License: open | Type: model - MMLU=base, -Pro=base, GPQA=instruct. "When quantized, Mistral Small 3 can be run privately on a single RTX 4090 or a Macbook with 32GB RAM." "Mistral Small 3 is neither trained with RL nor synthetic data" - **Llama-3.1-Tulu-3-405B** (Allen AI) — 2025-01-01 | Parameters: Llama-3.1-Tulu-3-405B - License: open | Type: model - Lower MMLU score than Llama 3.1 405B base. - **Qwen2.5-Max** (Alibaba) — 2025-01-01 | Parameters: Qwen2.5-Max - License: open | Type: model - "Qwen2.5-Max emerges as a milestone in MoE development, featuring an impressive 325 billion parameters. The model has been pretrained on over 20 trillion tokens and further refined with advanced post-training methodologies such as Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF)." https://wandb.ai/byyoung3/ml-news/reports/Qwen2-5-Max-Advancing-Large-Scale-Mixture-of-Expert-Models---VmlldzoxMTEyMjUyNg - **EvaByte** (SambaNova) — 2025-01-01 | Parameters: EvaByte - License: open | Type: model - "efficient byte-level processing at scale... [compared to tokenizer-based LMs:] 5x less training data, excelling in coding tasks, and decoding up to 2x faster. Its token-free design also brings added flexibility, avoiding tokenizer quirks while naturally extending to multimodal applications without any architecture tweaks." - **UI-TARS-72B** (ByteDance) — 2025-01-01 | Parameters: UI-TARS-72B - License: open | Type: model - VLM. 
SoTA agent 'computer use' model to 23/Jan/2025. - **Doubao-1.5-pro** (ByteDance) — 2025-01-01 | Parameters: Doubao-1.5-pro - License: open | Type: model - Includes 2.4B param ViT. "Doubao-1.5-pro uses a sparse MoE architecture. In the pre-training stage, the performance of the MoE model activated with only a small number of parameters can exceed that of ultra-large dense pre-trained models such as Llama3.1-405B. Through the study of the sparsity scaling law, the team determined the sparse ratio that balances performance and efficiency, and determined based on the MoE scaling law that a model activated with a small number of parameters can achieve the performance of a world-class model." - **Kimi k1.5** (Moonshot AI) — 2025-01-01 | Parameters: Kimi k1.5 - License: open | Type: model - "our system achieves state-of-the-art reasoning performance across multiple benchmarks and modalities---e.g., 77.5 on AIME, 96.2 on MATH 500, 94-th percentile on Codeforces, 74.9 on MathVista---matching OpenAI's o1". GPQA score is my estimate from pp13–14, noting that "the scores above come from an internal long-cot model with much smaller model size than k1.5 long-CoT model." - **DeepSeek-R1** (DeepSeek-AI) — 2025-01-01 | Parameters: DeepSeek-R1 - License: open | Type: model - "DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks. To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen. DeepSeek-R1-Distill-Qwen-32B outperforms OpenAI-o1-mini across various benchmarks" - **GPT-4b** (OpenAI) — 2025-01-01 | Parameters: GPT-4b - License: closed | Type: model - Protein sequence model. "The model was trained on examples of protein sequences from many species, as well as information on which proteins tend to interact with one another.
While that’s a lot of data, it’s just a fraction of what OpenAI’s flagship chatbots were trained on, making GPT-4b an example of a “small language model” that works with a focused data set." https://www.technologyreview.com/2025/01/17/1110086/openai-has-created-an-ai-model-for-longevity-science/ - **Helium-1** (Kyutai) — 2025-01-01 | Parameters: Helium-1 - License: open | Type: model - "Helium-1 preview, an initial version of our new backbone language model with 2B parameters, targeting edge and mobile devices... We use token level distillation of a 7B parameters model to train Helium-1 preview." - **InternLM3** (Shanghai AI Laboratory/SenseTime) — 2025-01-01 | Parameters: InternLM3 - License: open | Type: model - "InternLM3 is trained on only 4 trillion high-quality tokens, saving more than 75% of the training cost compared to other LLMs of similar scale." Playground: https://internlm-chat.intern-ai.org.cn/ - **MiniMax-Text-01** (MiniMax) — 2025-01-01 | Parameters: MiniMax-Text-01 - License: open | Type: model - A45.9B. "The pre-training corpus for MiniMax-Text-01 encompasses a comprehensive and meticulously curated dataset, incorporating diverse sources including academic literature, books, web content, and programming code... repeatedly training high-quality documents can lead to enhanced downstream performance, with certain high-quality domains being trained up to 50 times... Our findings indicate that low-quality data suffer a substantial decrease in performance after training for more than two epochs, while high-quality data can be effectively trained for up to four epochs" Login playground: https://www.hailuo.ai/ - **Sky-T1-32B-Preview** (Berkeley) — 2025-01-01 | Parameters: Sky-T1-32B-Preview - License: open | Type: model - "To generate our training data we use QwQ-32B-Preview, an open-source model with reasoning capabilities comparable to o1-preview. 
We curate the data mixture (see later section) to cover diverse domains that require reasoning, and a reject sampling procedure to improve the data quality. We then rewrite QwQ traces with GPT-4o-mini into a well-formatted version, inspired by Still-2, to improve data quality and ease parsing... Rejection Sampling: We discard QwQ samples if they are incorrect according to the solutions provided in datasets." - **Cosmos Nemotron 34B** (NVIDIA) — 2025-01-01 | Parameters: Cosmos Nemotron 34B - License: open | Type: model - VLM. MMMU=47.33. "VILA project becomes part of Cosmos Nemotron family" https://github.com/NVlabs/Cosmos-Nemotron Vision Encoder: SigLIP-400M, Language Encoder: Yi-34B https://blogs.nvidia.com/blog/nemotron-model-families/ - **Cosmos 1.0** (NVIDIA) — 2025-01-01 | Parameters: Cosmos 1.0 - License: open | Type: model - WFM (world foundation model). "The models range in size from 4 billion to 14 billion parameters, with Nano being the smallest and Ultra being the largest..." "Cosmos WFM models, were trained on 9,000 trillion tokens [9,000T] from 20 million hours of real-world human interactions, environment, industrial, robotics, and driving data..." https://techcrunch.com/2025/01/06/nvidia-releases-its-own-brand-of-world-models/ Actual working: https://lifearchitect.ai/cosmos/ - **METAGENE-1** (Prime Intellect) — 2025-01-01 | Parameters: METAGENE-1 - License: open | Type: model - Llama-2-7B base. "METAGENE-1 is a 7B parameter metagenomic foundation model designed for pathogen detection and pandemic monitoring, trained on over 1.5 trillion base pairs [∼370 billion tokens (≈1.69 trillion base pairs)] of DNA and RNA collected via metagenomic sequencing of wastewater." - **Sonus-1 Reasoning** (Rubik's AI) — 2025-01-01 | Parameters: Sonus-1 Reasoning - License: open | Type: model - Likely a Llama 3.1 405B wrapper. ALPrompt 2024H1=5/5. ALPrompt 2024H2=2/5. ALPrompt 2025H1=1/5. This is a strange model: slow and smart, but not as performant as o1.
No arch details at all. - **OLMo 2 Furious 7B** (Allen Institute for AI,University of Washington,New York University (NYU)) — 2024-12-31 | Parameters: 7B - License: open | Type: model - AI model by Allen Institute for AI,University of Washington,New York University (NYU) - **OLMo 2 Furious 13B** (Allen Institute for AI,University of Washington,New York University (NYU)) — 2024-12-31 | Parameters: 13B - License: open | Type: model - AI model by Allen Institute for AI,University of Washington,New York University (NYU) - **STORM-B/8** (University of Southern California,Georgia Institute of Technology,Stanford University,NVIDIA) — 2024-12-31 | Parameters: 100.6M - License: closed | Type: model - AI model by University of Southern California,Georgia Institute of Technology,Stanford University,NVIDIA - **Zhaoyan (兆言大语言模型)** (Shanghai Jiao Tong University) — 2024-12-30 - License: closed | Type: model - AI model by Shanghai Jiao Tong University - **LTX-Video-0.9.1. 2B** (Lightricks) — 2024-12-30 | Parameters: 1.9B - License: open | Type: model - AI model by Lightricks - **HiDream Foundation Model 1.0** (HiDream) — 2024-12-28 | Parameters: 10B - License: closed | Type: model - AI model by HiDream - **Zhixiang 3.0 (智象)** (HiDream) — 2024-12-28 | Parameters: 13B - License: closed | Type: model - AI model by HiDream - **LinGan VL (临感VL)** (Beijing 58 Information Technology) — 2024-12-27 - License: closed | Type: model - AI model by Beijing 58 Information Technology - **Xiaomi Pengpai Image (小米澎湃图像)** (Xiaomi Corp) — 2024-12-27 - License: closed | Type: model - AI model by Xiaomi Corp - **Xiaomi Edge-side Text (小米端侧文本)** (Xiaomi Corp) — 2024-12-27 - License: closed | Type: model - AI model by Xiaomi Corp - **QVQ** (Alibaba) — 2024-12-25 | Parameters: 72B - License: open | Type: model - AI model by Alibaba - **Jueqing LLM (觉卿大模型)** (Suzhou Jueqing Diyu Intelligent Technology) — 2024-12-25 - License: closed | Type: model - AI model by Suzhou Jueqing Diyu Intelligent Technology - 
**Kokoro v0.19** (hexgrad) — 2024-12-25 | Parameters: 82M - License: open | Type: model - AI model by hexgrad - **DeepSeek-V3** (DeepSeek) — 2024-12-24 | Parameters: 671B - License: open | Type: model - AI model by DeepSeek - **OCTAVE 8B** (Hume) — 2024-12-23 | Parameters: 8B - License: closed | Type: model - AI model by Hume - **OCTAVE 3B** (Hume) — 2024-12-23 | Parameters: 3B - License: closed | Type: model - AI model by Hume - **RMBG v2.0** (BRIA AI) — 2024-12-23 | Parameters: 221M - License: open | Type: model - AI model by BRIA AI - **VRMBG 2.0** (BRIA AI) — 2024-12-23 - License: closed | Type: model - AI model by BRIA AI - **o3** (OpenAI) — 2024-12-20 - License: closed | Type: model - AI model by OpenAI - **Gemini 2.0 Flash Thinking** (Google DeepMind,Google) — 2024-12-19 - License: closed | Type: model - AI model by Google DeepMind,Google - **SEA-LION V3 Gemma2 9B** (AI Singapore) — 2024-12-19 | Parameters: 9B - License: open | Type: model - AI model by AI Singapore - **SEA-LION V3 Llama3.1 8B** (AI Singapore) — 2024-12-19 | Parameters: 8B - License: open | Type: model - AI model by AI Singapore - **SEA-LION V3 Llama3.1 70B** (AI Singapore) — 2024-12-19 | Parameters: 70B - License: open | Type: model - AI model by AI Singapore - **Llama 3.1 Typhoon 2 70B** (Typhoon / SCB 10X) — 2024-12-19 | Parameters: 70B - License: open | Type: model - AI model by Typhoon / SCB 10X - **Llama 3.1 Typhoon 2 8B** (Typhoon / SCB 10X) — 2024-12-19 | Parameters: 8B - License: open | Type: model - AI model by Typhoon / SCB 10X - **Typhoon 2 7B** (Typhoon / SCB 10X) — 2024-12-19 | Parameters: 7B - License: open | Type: model - AI model by Typhoon / SCB 10X - **Llama 3.2 Typhoon 2 3B** (Typhoon / SCB 10X) — 2024-12-19 | Parameters: 3B - License: open | Type: model - AI model by Typhoon / SCB 10X - **Llama 3.2 Typhoon 2 1B** (Typhoon / SCB 10X) — 2024-12-19 | Parameters: 1B - License: open | Type: model - AI model by Typhoon / SCB 10X - **Typhoon2-Vision** (Typhoon / SCB 10X) —
2024-12-19 | Parameters: 7B - License: open | Type: model - AI model by Typhoon / SCB 10X - **Typhoon2-Audio** (Typhoon / SCB 10X) — 2024-12-19 | Parameters: 9.7B - License: open | Type: model - AI model by Typhoon / SCB 10X - **Kling 1.6 Pro** (Kuaishou Technology) — 2024-12-19 - License: closed | Type: model - AI model by Kuaishou Technology - **Granite 3.1 2B** (IBM) — 2024-12-18 | Parameters: 2.5B - License: open | Type: model - AI model by IBM - **Granite 3.1 8B** (IBM) — 2024-12-18 | Parameters: 8.1B - License: open | Type: model - AI model by IBM - **Eleven Flash v2.5** (ElevenLabs) — 2024-12-18 - License: closed | Type: model - AI model by ElevenLabs - **Typhoon2-Vision** (Typhoon / SCB 10X) — 2024-12-18 | Parameters: 7B - License: closed | Type: model - AI model by Typhoon / SCB 10X - **Falcon3-7B** (Technology Innovation Institute) — 2024-12-17 | Parameters: 7B - License: open | Type: model - AI model by Technology Innovation Institute - **Veo 2** (Google DeepMind) — 2024-12-16 - License: closed | Type: model - AI model by Google DeepMind - **SEA-LION-v1-7B-IT** (AI Singapore) — 2024-12-16 | Parameters: 7.5B - License: open | Type: model - AI model by AI Singapore - **F5-TTS** (Shanghai Jiao Tong University,University of Cambridge,Geely Automobile Research Institute (Ningbo) Company) — 2024-12-15 | Parameters: 335.8M - License: open | Type: model - AI model by Shanghai Jiao Tong University,University of Cambridge,Geely Automobile Research Institute (Ningbo) Company - **Jinshi (金石大模型)** (Wuxi Sixiang Digital Intelligence Technology Co., Ltd.) — 2024-12-15 - License: closed | Type: model - AI model by Wuxi Sixiang Digital Intelligence Technology Co., Ltd. 
- **Pika 2.0** (Pika Labs) — 2024-12-15 - License: closed | Type: model - AI model by Pika Labs - **Apollo 7B** (Meta AI,Stanford University) — 2024-12-13 | Parameters: 7B - License: closed | Type: model - AI model by Meta AI,Stanford University - **Apollo 3B** (Meta AI,Stanford University) — 2024-12-13 | Parameters: 3B - License: closed | Type: model - AI model by Meta AI,Stanford University - **Apollo 1.5B** (Meta AI,Stanford University) — 2024-12-13 | Parameters: 1.5B - License: closed | Type: model - AI model by Meta AI,Stanford University - **360zhinao2-o1** (360 Security Technology) — 2024-12-13 - License: closed | Type: model - AI model by 360 Security Technology - **Apollo-7B** (Meta AI,Stanford University) — 2024-12-13 | Parameters: 7B - License: closed | Type: model - AI model by Meta AI,Stanford University - **GigaChat Lite (GigaChat-20B-A3B)** (Sber) — 2024-12-13 | Parameters: 20B - License: open | Type: model - AI model by Sber - **Phi-4** (Microsoft Research) — 2024-12-12 | Parameters: 14B - License: open | Type: model - AI model by Microsoft Research - **Gemini 2.0 Flash** (Google DeepMind,Google) — 2024-12-11 - License: closed | Type: model - AI model by Google DeepMind,Google - **Gemini 2.0 Pro** (Google DeepMind) — 2024-12-11 - License: closed | Type: model - AI model by Google DeepMind - **Coconut** (Facebook,University of California San Diego) — 2024-12-11 - License: closed | Type: model - AI model by Facebook,University of California San Diego - **T-Pro** (T-Bank) — 2024-12-11 | Parameters: 32B - License: open | Type: model - AI model by T-Bank - **T-Lite** (T-Bank) — 2024-12-11 | Parameters: 7B - License: open | Type: model - AI model by T-Bank - **Sora Turbo** (OpenAI) — 2024-12-09 - License: closed | Type: model - AI model by OpenAI - **EXAONE 3.5 2.4B** (LG AI Research) — 2024-12-09 | Parameters: 2.4B - License: open | Type: model - AI model by LG AI Research - **EXAONE 3.5 7.8B** (LG AI Research) — 2024-12-09 | Parameters: 7.8B - License: 
open | Type: model - AI model by LG AI Research - **EXAONE 3.5 32B** (LG AI Research) — 2024-12-09 | Parameters: 32B - License: open | Type: model - AI model by LG AI Research - **Grok Image Generation / Aurora** (xAI) — 2024-12-09 - License: closed | Type: model - AI model by xAI - **Llama 3.3 70B** (Meta AI) — 2024-12-06 | Parameters: 70B - License: open | Type: model - AI model by Meta AI - **InternVL2_5-26B** (Shanghai AI Lab,SenseTime,Tsinghua University,Nanjing University,Fudan University,Chinese University of Hong Kong (CUHK),Shanghai Jiao Tong University) — 2024-12-06 | Parameters: 25.5B - License: open | Type: model - AI model by Shanghai AI Lab,SenseTime,Tsinghua University,Nanjing University,Fudan University,Chinese University of Hong Kong (CUHK),Shanghai Jiao Tong University - **InternVL2_5-38B** (Shanghai AI Lab,SenseTime,Tsinghua University,Nanjing University,Fudan University,Chinese University of Hong Kong (CUHK),Shanghai Jiao Tong University) — 2024-12-06 | Parameters: 38.4B - License: open | Type: model - AI model by Shanghai AI Lab,SenseTime,Tsinghua University,Nanjing University,Fudan University,Chinese University of Hong Kong (CUHK),Shanghai Jiao Tong University - **InternVL2_5-78B** (Shanghai AI Lab,SenseTime,Tsinghua University,Nanjing University,Fudan University,Chinese University of Hong Kong (CUHK),Shanghai Jiao Tong University) — 2024-12-06 | Parameters: 78.4B - License: open | Type: model - AI model by Shanghai AI Lab,SenseTime,Tsinghua University,Nanjing University,Fudan University,Chinese University of Hong Kong (CUHK),Shanghai Jiao Tong University - **o1** (OpenAI) — 2024-12-05 - License: closed | Type: model - AI model by OpenAI - **NVILA 15B** (NVIDIA,Massachusetts Institute of Technology (MIT),University of California (UC) Berkeley,University of California San Diego,University of Washington,Tsinghua University) — 2024-12-05 | Parameters: 15B - License: open | Type: model - AI model by NVIDIA,Massachusetts Institute of Technology 
(MIT),University of California (UC) Berkeley,University of California San Diego,University of Washington,Tsinghua University - **NVILA 8B** (NVIDIA,Massachusetts Institute of Technology (MIT),University of California (UC) Berkeley,University of California San Diego,University of Washington,Tsinghua University) — 2024-12-05 | Parameters: 8B - License: open | Type: model - AI model by NVIDIA,Massachusetts Institute of Technology (MIT),University of California (UC) Berkeley,University of California San Diego,University of Washington,Tsinghua University - **Infinity** (ByteDance) — 2024-12-05 | Parameters: 2B - License: open | Type: model - AI model by ByteDance - **Pleias 1.0 350m** (PleIAs) — 2024-12-05 | Parameters: 350M - License: open | Type: model - AI model by PleIAs - **Pleias 1.0 1.2B** (PleIAs) — 2024-12-05 | Parameters: 1.2B - License: open | Type: model - AI model by PleIAs - **Genie 2** (Google DeepMind) — 2024-12-04 - License: closed | Type: model - AI model by Google DeepMind - **voyage-code-3** (Voyage AI) — 2024-12-04 - License: closed | Type: model - AI model by Voyage AI - **TokenFlow-XL** (ByteDance) — 2024-12-04 | Parameters: 14B - License: open | Type: model - AI model by ByteDance - **TokenFlow-t2i** (ByteDance) — 2024-12-04 - License: open | Type: model - AI model by ByteDance - **Amazon Nova Pro** (Amazon) — 2024-12-03 - License: closed | Type: model - AI model by Amazon - **Amazon Nova Lite** (Amazon) — 2024-12-03 - License: closed | Type: model - AI model by Amazon - **Amazon Nova Micro** (Amazon) — 2024-12-03 - License: closed | Type: model - AI model by Amazon - **Luma Photon** (LumaLabs) — 2024-12-03 - License: closed | Type: model - AI model by LumaLabs - **Hunyuan Video** (Tencent) — 2024-12-03 | Parameters: 13B - License: open | Type: model - AI model by Tencent - **Intelligent Go-Explore (IGE)** (University of British Columbia (UBC),Vector Institute,CIFAR AI Research) — 2024-12-03 - License: open | Type: model - AI model by University 
of British Columbia (UBC),Vector Institute,CIFAR AI Research - **Hailuo I2V-01-Live** (MiniMax,Hailuo AI) — 2024-12-03 - License: closed | Type: model - AI model by MiniMax,Hailuo AI - **Amazon Nova Canvas** (Amazon) — 2024-12-03 - License: closed | Type: model - AI model by Amazon - **Luma Photon Flash** (LumaLabs) — 2024-12-03 - License: closed | Type: model - AI model by LumaLabs - **Poolside Malibu** (Poolside) — 2024-12-02 - License: closed | Type: model - AI model by Poolside - **YuLan-Mini** (Renmin) — 2024-12-01 | Parameters: YuLan-Mini - License: open | Type: model - "1.08T tokens for training. Among them are 481B English web data, 138B general English knowledge, 227B code pre-training data, 16.7B code instruction data, 93.8B mathematics pre-training data, 15.5B mathematics instruction data, and 108B Chinese data." - **DeepSeek-V3** (DeepSeek-AI) — 2024-12-01 | Parameters: DeepSeek-V3 - License: open | Type: model - 37B active. Explain: https://threadreaderapp.com/thread/1872318161883959485.html Announce: https://github.com/deepseek-ai/DeepSeek-V3?tab=readme-ov-file - **EON-8B** (LinkedIn) — 2024-12-01 | Parameters: EON-8B - License: closed | Type: model - "We found the EON-8B model (a domain-adapted Llama 3.1-8B variant) to be 75x and 6x cost effective in comparison to GPT-4 and GPT-4o respectively (Figure 4)... On tasks seen during training, the EON-8B model outperformed base Llama-3-8B-Instruct and its performance was comparable to SOTA GPT models." - **o3-preview** (OpenAI) — 2024-12-01 | Parameters: o3-preview - License: open | Type: model - SoTA model for Dec/2024. Parameter estimate is very rough centrepoint for range 400B-52T. - **RWKV-7 Goose** (RWKV) — 2024-12-01 | Parameters: RWKV-7 Goose - License: open | Type: model - RWKV (pronounced RwaKuv) is an RNN: "multilingual, supporting over 100 languages and code.". Full run is 332B tokens of 3.1T dataset. 
- **ModernBERT** (International) — 2024-12-01 | Parameters: ModernBERT - License: open | Type: model - "a proper workhorse model, for retrieval, classification, etc." https://bsky.app/profile/howard.fm/post/3ldod2afps62x - **Granite 3.1 8B** (IBM) — 2024-12-01 | Parameters: Granite 3.1 8B - License: open | Type: model - - **Bamba-9B** (IBM) — 2024-12-01 | Parameters: Bamba-9B - License: open | Type: model - "trained by IBM, Princeton, CMU, and UIUC on completely open data. At inference time, the model demonstrates 2.5x throughput improvement and 2x latency speedup compared to standard transformers in vLLM." - **o1-2024-12-17** (OpenAI) — 2024-12-01 | Parameters: o1-2024-12-17 - License: closed | Type: model - "o1-2024-12-17 sets new state-of-the-art results on several benchmarks, improving cost-efficiency and performance." - **Falcon 3** (TII) — 2024-12-01 | Parameters: Falcon 3 - License: open | Type: model - "We conducted a single large-scale pretraining run on the 7B model, using 1024 H100 GPU chips, leveraging 14 trillion tokens... upscaled the 7B model to a 10B parameters model by duplicating the redundant layers and continuing pre-training with 2 trillion tokens of high-quality data." - **Command R7B** (Cohere) — 2024-12-01 | Parameters: Command R7B - License: open | Type: model - - **Maya** (Cohere) — 2024-12-01 | Parameters: Maya - License: open | Type: model - VLM. - **BLT** (Meta AI) — 2024-12-01 | Parameters: BLT - License: open | Type: model - Byte Latent Transformer (BLT), a new byte-level LLM architecture that, for the first time, matches tokenization-based LLM performance - **Large Concept Model** (Meta AI) — 2024-12-01 | Parameters: Large Concept Model - License: open | Type: model - "autoregressive sentence prediction in an embedding space." 7.7T tokens is a misprint, should be 2.2T as in paper. 
- **Phi-4** (Microsoft) — 2024-12-01 | Parameters: Phi-4 - License: open | Type: model - Use unsloth: https://huggingface.co/unsloth/phi-4-GGUF & https://www.reddit.com/r/singularity/comments/1i0kso4/i_fixed_4_bugs_in_microsofts_opensource_phi4_model/ - **Gemini 2.0 Flash exp** (Google DeepMind) — 2024-12-01 | Parameters: Gemini 2.0 Flash exp - License: open | Type: model - Gemini 2.0 Flash was the first model released, 11/Dec/2024. "New Modalities: Gemini 2.0 introduces native image generation and controllable text-to-speech capabilities" Announce: https://blog.google/technology/google-deepmind/google-gemini-ai-update-december-2024/ Note: Gemini outputs are watermarked. I do not use GDM models. https://lifearchitect.ai/watermarking/ - **Moxin-7B** (International) — 2024-12-01 | Parameters: Moxin-7B - License: open | Type: model - "Fully Open Source" with pre-training code, configurations, training and fine-tuning datasets, and intermediate checkpoints. - **1T** (Cerebras) — 2024-12-01 | Parameters: 1T - License: closed | Type: model - "For Sandia’s trillion parameter training run, Cerebras configured a 55 terabyte MemoryX device." - **InternVL 2.5** (Shanghai AI Laboratory/SenseTime) — 2024-12-01 | Parameters: InternVL 2.5 - License: open | Type: model - Benchmarks are estimates based on Qwen2.5 72B Instruct as the base LLM (InternVL 2.5=InternViT-6B-448px-V2.5 5.5B + Qwen2.5-72B-Instruct). "Notably, Qwen2-VL processed a cumulative total of 1.4T tokens, while our InternVL2.5-78B is trained on just ∼120B tokens [of vision]." Dataset: "... we identify repetitive generation as one of the most detrimental issues. In many open-source or synthetic datasets, a small number of repetitive samples—comprising merely thousands of examples in our Stage 2 data mixture—can cause the model to spiral into repetitive loops, particularly in long-form outputs or CoT reasoning tasks. This phenomenon undermines the effectiveness of test-time scaling strategies. 
To address this challenge and support future research, we designed an efficient data filtering pipeline to remove low-quality samples, thereby minimizing the risk of repetitive generation." Repo: https://github.com/OpenGVLab/InternVL - **Llama 3.3** (Meta AI) — 2024-12-01 | Parameters: Llama 3.3 - License: open | Type: model - Drop-in replacement for Llama 3.1 70B, comparable performance to Llama 3.1 405B. - **EXAONE-3.5** (LG) — 2024-12-01 | Parameters: EXAONE-3.5 - License: open | Type: model - “EXAONE”=“EXpert AI for EveryONE”. Training tokens/ratio dropped from EXAONE-3 7.8B with 8T (Aug/2024) to this (Dec/2024) 7.8B with 9T to 32B with 6.5T. - **Deepthought-8B** (Ruliad) — 2024-12-01 | Parameters: Deepthought-8B - License: open | Type: model - No evals. Llama 3.1 8B base. - **Sailor2** (Sail) — 2024-12-01 | Parameters: Sailor2 - License: open | Type: model - SEA languages. Continual pretraining based on Qwen2.5. Project page: https://sea-sailor.github.io/blog/sailor2/ - **Pleias 1.0** (PleIAs) — 2024-12-01 | Parameters: Pleias 1.0 - License: open | Type: model - Trained on the Jean Zay supercomputer, 192x H100s for 20 days. Dataset is new CC + Synthetic: https://huggingface.co/datasets/PleIAs/common_corpus - **o1** (OpenAI) — 2024-12-01 | Parameters: o1 - License: open | Type: model - "a version of our most intelligent model that thinks longer for the most reliable responses" System card about safety only: https://cdn.openai.com/o1-system-card-20241205.pdf - **Nova Pro** (Amazon) — 2024-12-01 | Parameters: Nova Pro - License: open | Type: model - Multimodal, same performance as Llama 3.2 90B ∴ est 90B. 
Model card was hidden: https://assets.amazon.science/9f/a3/ae41627f4ab2bde091f1ebc6b830/the-amazon-nova-family-of-models-technical-report-and-model-card.pdf via https://www.amazon.science/publications/the-amazon-nova-family-of-models-technical-report-and-model-card - **EuroLLM** (Consortium) — 2024-12-01 | Parameters: EuroLLM - License: open | Type: model - 24 official languages are Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hungarian, Irish, Italian, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovenian, Spanish, and Swedish. "we use 400 Nvidia H100 GPUs of the Marenostrum 5 supercomputer" Also: https://eurollm.io/ - **DisTrO 15B** (Nous Research) — 2024-12-01 | Parameters: DisTrO 15B - License: open | Type: model - "About 14 DGXes scattered around the globe. Sometimes more sometimes less, it varies depending on availability. On average, around 112 H100s." https://x.com/bloc97_/status/1863675225810043331 "we introduce DisTrO, a family of architecture-agnostic and network-agnostic distributed optimizers that reduces the inter-GPU communication requirements by four to five orders of magnitude without relying on amortized analysis, enabling low-latency training of large neural networks on slow internet bandwidths with heterogeneous networking hardware." 
- **RNAformer** (University of Freiburg) — 2024-12-01 | Parameters: 32M - License: closed | Type: model - AI model by University of Freiburg - **INTELLECT-1** (Prime Intellect,Hugging Face,Arcee AI) — 2024-11-29 | Parameters: 10B - License: closed | Type: model - AI model by Prime Intellect,Hugging Face,Arcee AI - **360gpt2-pro** (360 Security Technology) — 2024-11-29 - License: closed | Type: model - AI model by 360 Security Technology - **abab7** (MiniMax) — 2024-11-29 - License: closed | Type: model - AI model by MiniMax - **Zhiyan (智言)** (4Paradigm) — 2024-11-29 - License: closed | Type: model - AI model by 4Paradigm - **Mingzhi Guwen** (Shanghai Shuheng Information Technology Co., Ltd.) — 2024-11-29 - License: closed | Type: model - AI model by Shanghai Shuheng Information Technology Co., Ltd. - **MiLi (米粒)** (CreditEase) — 2024-11-29 - License: closed | Type: model - AI model by CreditEase - **HaiYue (浪潮通用软件有限公司)** (Inspur) — 2024-11-29 | Parameters: 102B - License: closed | Type: model - AI model by Inspur - **DeepThought-8B** (Ruliad) — 2024-11-27 | Parameters: 8B - License: closed | Type: model - AI model by Ruliad - **Ovis1.6-Gemma2-27B** (Alibaba) — 2024-11-26 | Parameters: 28.9B - License: open | Type: model - AI model by Alibaba - **ControlNet (SD 3.5 Large) Depth** (Stability AI) — 2024-11-26 - License: open | Type: model - AI model by Stability AI - **ControlNet (SD 3.5 Large) Blur** (Stability AI) — 2024-11-26 - License: open | Type: model - AI model by Stability AI - **ControlNet (SD 3.5 Large) Canny** (Stability AI) — 2024-11-26 - License: open | Type: model - AI model by Stability AI - **Fugatto 1** (NVIDIA) — 2024-11-25 | Parameters: 2.5B - License: closed | Type: model - AI model by NVIDIA - **ConfRank** (University of Bonn,Institute for Numerical Simulation,Fraunhofer Institute for Algorithms and Scientific Computing) — 2024-11-24 | Parameters: 150K - License: closed | Type: model - AI model by University of Bonn,Institute for Numerical 
Simulation,Fraunhofer Institute for Algorithms and Scientific Computing - **Fish-Speech 1.5** (Fish Audio) — 2024-11-24 - License: open | Type: model - AI model by Fish Audio - **Hymba** (NVIDIA) — 2024-11-22 | Parameters: 1.5B - License: open | Type: model - AI model by NVIDIA - **Tulu 3 (Tülu 3) 70B** (Allen Institute for AI,University of Washington) — 2024-11-21 | Parameters: 70B - License: open | Type: model - AI model by Allen Institute for AI,University of Washington - **Gauss2** (Samsung) — 2024-11-21 - License: closed | Type: model - AI model by Samsung - **Tulu 3 8B** (Allen Institute for AI,University of Washington) — 2024-11-21 | Parameters: 8B - License: open | Type: model - AI model by Allen Institute for AI,University of Washington - **DeepSeek-R1-Lite-Preview** (DeepSeek) — 2024-11-20 - License: closed | Type: model - AI model by DeepSeek - **GPT-4o (Nov 2024)** (OpenAI) — 2024-11-20 - License: closed | Type: model - AI model by OpenAI - **Suno v4** (Suno) — 2024-11-19 - License: closed | Type: model - AI model by Suno - **Boltz-1** (Massachusetts Institute of Technology (MIT), Genesis Therapeutics) — 2024-11-19 - License: open | Type: model - AI model by Massachusetts Institute of Technology (MIT), Genesis Therapeutics - **Pixtral Large** (Mistral AI) — 2024-11-18 | Parameters: 124B - License: open | Type: model - AI model by Mistral AI - **360Zhinao2-7B** (360 Security Technology) — 2024-11-18 | Parameters: 7B - License: open | Type: model - AI model by 360 Security Technology - **BiRNA-BERT** (Bangladesh University of Engineering and Technology,University of California Riverside,Carnegie Mellon University (CMU)) — 2024-11-18 | Parameters: 117M - License: open | Type: model - AI model by Bangladesh University of Engineering and Technology,University of California Riverside,Carnegie Mellon University (CMU) - **SK Telecom Telco** (SK Telecom) — 2024-11-18 - License: closed | Type: model - AI model by SK Telecom - **k0-math** (Moonshot) — 2024-11-16 - 
License: closed | Type: model - AI model by Moonshot - **Gemini-Exp-1114** (Google DeepMind) — 2024-11-15 - License: closed | Type: model - AI model by Google DeepMind - **LLaVA-CoT** (Peking University,Tsinghua University,Peng Cheng Laboratory,Alibaba DAMO Academy,Lehigh University) — 2024-11-15 | Parameters: 11B - License: open | Type: model - AI model by Peking University,Tsinghua University,Peng Cheng Laboratory,Alibaba DAMO Academy,Lehigh University - **bailing-pro-1120** (Ant Group) — 2024-11-15 - License: closed | Type: model - AI model by Ant Group - **Qwen2.5-Turbo** (Alibaba) — 2024-11-15 - License: closed | Type: model - AI model by Alibaba - **Mistral Large 2.1** (Mistral AI) — 2024-11-15 | Parameters: 123B - License: open | Type: model - AI model by Mistral AI - **Athene-V2** (Nexusflow) — 2024-11-14 | Parameters: 72B - License: open | Type: model - AI model by Nexusflow - **Gemma2 9B CPT Sahabat-AI** (Indosat,Tech Mahindra,AI Singapore,GoTo) — 2024-11-14 | Parameters: 9B - License: open | Type: model - AI model by Indosat,Tech Mahindra,AI Singapore,GoTo - **Vidu 1.5** (ShengShu) — 2024-11-13 - License: closed | Type: model - AI model by ShengShu - **Qwen2.5-Coder (32B)** (Alibaba) — 2024-11-12 | Parameters: 32.5B - License: open | Type: model - AI model by Alibaba - **MassiveFold** (Université de Lille,Linköping University,Universite de Technologie de Compiègne – CNRS) — 2024-11-11 - License: closed | Type: model - AI model by Université de Lille,Linköping University,Universite de Technologie de Compiègne – CNRS - **SeedEdit** (ByteDance) — 2024-11-11 - License: closed | Type: model - AI model by ByteDance - **NatureLM-audio** (Earth Species Project) — 2024-11-11 | Parameters: 665M - License: open | Type: model - AI model by Earth Species Project - **stFormer** (Shanghai Jiao Tong University,Chinese Academy of Sciences) — 2024-11-09 - License: closed | Type: model - AI model by Shanghai Jiao Tong University,Chinese Academy of Sciences - **Fish-Speech 
1.4** (Fish Audio) — 2024-11-09 - License: open | Type: model - AI model by Fish Audio - **TeleChat2-7B** (China Telecom) — 2024-11-08 | Parameters: 7B - License: open | Type: model - AI model by China Telecom - **TeleChat2-3B** (China Telecom) — 2024-11-08 | Parameters: 3B - License: open | Type: model - AI model by China Telecom - **Mistral Moderation** (Mistral AI) — 2024-11-07 - License: closed | Type: model - AI model by Mistral AI - **Hunyuan-Large** (Tencent) — 2024-11-06 | Parameters: 389B - License: open | Type: model - AI model by Tencent - **FLUX1.1 [pro] Ultra** (Black Forest Labs) — 2024-11-06 - License: closed | Type: model - AI model by Black Forest Labs - **FLUX1.1 [pro] Raw** (Black Forest Labs) — 2024-11-06 - License: closed | Type: model - AI model by Black Forest Labs - **OpenPhenom-S/16** (Recursion Pharmaceuticals) — 2024-11-05 | Parameters: 178.0M - License: open | Type: model - AI model by Recursion Pharmaceuticals - **Llama-3.1-Minitron-4B** (NVIDIA) — 2024-11-04 | Parameters: 4B - License: open | Type: model - AI model by NVIDIA - **Minitron 8B** (NVIDIA) — 2024-11-04 | Parameters: 8.3B - License: open | Type: model - AI model by NVIDIA - **Minitron 4B** (NVIDIA) — 2024-11-04 | Parameters: 4.2B - License: open | Type: model - AI model by NVIDIA - **KwooVa** (Harbin Institute of Technology) — 2024-11-02 | Parameters: 8B - License: closed | Type: model - AI model by Harbin Institute of Technology - **KwooLa** (Harbin Institute of Technology) — 2024-11-02 | Parameters: 14B - License: closed | Type: model - AI model by Harbin Institute of Technology - **KwooGr** (Harbin Institute of Technology) — 2024-11-02 - License: closed | Type: model - AI model by Harbin Institute of Technology - **INTELLECT-1** (Prime Intellect) — 2024-11-01 | Parameters: INTELLECT-1 - License: open | Type: model - Training complete 22/Nov/2024. 
Fully distributed training: "the first decentralized training run of a 10-billion-parameter model, inviting anyone to contribute compute and participate. This brings us one step closer towards open source AGI." - **QwQ-32B-Preview** (Alibaba) — 2024-11-01 | Parameters: QwQ-32B-Preview - License: open | Type: model - Scores 1/5 on latest ALPrompt 2024 H2. Qwen with Question=QwQ - **Teuken-7B** (OpenGPT-X) — 2024-11-01 | Parameters: Teuken-7B - License: open | Type: model - 24 EU languages (60% non-English): bg, cs, da, de, el, en, es, et, fi, fr, ga, hr, hu, it, lt, lv, mt, nl, pl, pt, ro, sk, sl, sv. https://opengpt-x.de/models/teuken-7b-de/ & paper date is Sep/2024. - **OLMo 2** (Allen AI) — 2024-11-01 | Parameters: OLMo 2 - License: open | Type: model - Open Language Model (OLMo) 2 Apache 2.0 license for research and educational use. Paper coming. Data: 5 trillion tokens (1.2 epochs of 4T tokens) + 100B tokens (3 runs) + 300B tokens (1 run) merged. https://huggingface.co/allenai/OLMo-2-1124-13B & playground: https://playground.allenai.org/ - **Bi-Mamba** (CMU) — 2024-11-01 | Parameters: Bi-Mamba - License: closed | Type: model - Unreleased, but will be replicated. "a scalable and powerful 1-bit Mamba architecture designed for more efficient large language models" - **k0-math** (Moonshot AI) — 2024-11-01 | Parameters: k0-math - License: open | Type: model - Reasoning, maths only. Very little info available. Chinese. Long context. No paper. - **Marco-o1** (Alibaba) — 2024-11-01 | Parameters: Marco-o1 - License: open | Type: model - No evals. Qwen2-7B-Instruct with a combination of the filtered Open-O1 CoT dataset, Marco-o1 CoT dataset, and Marco-o1 Instruction dataset. - **TÜLU 3** (Allen AI) — 2024-11-01 | Parameters: TÜLU 3 - License: open | Type: model - Llama 3.1 post-training, worse performance on most benchmarks. Post training methods include new Reinforcement Learning with Verifiable Rewards (RLVR). 
"We perform supervised fine-tuning on new capability-focused synthetic data mixed with existing instruction datasets. We then perform preference tuning on on-policy synthetic preference data. We finish training Llama Tülu3 with our new method, Reinforcement Learning with Verifiable Rewards." - **gpt-4o-2024-11-20** (OpenAI) — 2024-11-01 | Parameters: gpt-4o-2024-11-20 - License: open | Type: model - Material decrease in benchmark scores (GPQA: -13.37%, MMLU: -3.38%) compared to Aug/2024. Pruned? Quantized? https://github.com/openai/simple-evals - **DeepSeek-R1-Lite** (DeepSeek-AI) — 2024-11-01 | Parameters: DeepSeek-R1-Lite - License: open | Type: model - Scores 0/5 on latest ALPrompt 2024 H2. "DeepSeek-R1-Lite is currently still in the iterative development stage. It currently only supports web usage and does not support API calls. The base model used by DeepSeek-R1-Lite is also a relatively small model, unable to fully unleash the potential of long reasoning chains. At present, we are continuously iterating on the inference series models. In the future, the official DeepSeek-R1 model will be fully open-sourced. We will publicly release the technical report and deploy API services." https://mp-weixin-qq-com.translate.goog/s/e1YnTxZlzFvjcmrLLTA8fw?_x_tr_sl=zh-CN&_x_tr_tl=en&_x_tr_hl=zh-TW - **Xmodel-LM** (XiaoduoAI) — 2024-11-01 | Parameters: Xmodel-LM - License: open | Type: model - SLM - **Pixtral Large** (Mistral) — 2024-11-01 | Parameters: Pixtral Large - License: open | Type: model - Open-weights multimodal model built on top of Mistral Large 2. - **f1** (Fireworks) — 2024-11-01 | Parameters: f1 - License: open | Type: model - "a compound AI model specialized in complex reasoning, that interweaves multiple open models at the inference layer."
- **Qwen2.5-Coder** (Alibaba) — 2024-11-01 | Parameters: Qwen2.5-Coder - License: open | Type: model - https://qwenlm.github.io/blog/qwen2.5-coder-family/ Jack Clark from Anthropic is saying it’s actually 18T tokens from Qwen2.5 + 5.5T tokens for a total of 23.5T tokens. That doesn’t seem right from my interpretation of the technical report. - **Fox-1** (TensorOpera) — 2024-11-01 | Parameters: Fox-1 - License: open | Type: model - Gold standard for dataset documentation - **Hunyuan-Large** (Tencent) — 2024-11-01 | Parameters: Hunyuan-Large - License: open | Type: model - "Hunyuan-Large is pre-trained on 7T tokens, which contains nearly 1.5T tokens of high-quality and diverse synthetic data." "389 billion parameters and 52 billion activation parameters, capable of handling up to 256K tokens" - **SEA-LIONv3** (AI Singapore) — 2024-11-01 | Parameters: SEA-LIONv3 - License: open | Type: model - "SEA-LION is a collection of Large Language Models (LLMs) which has been pretrained and instruct-tuned for the Southeast Asia (SEA) region. The Gemma2 9B CPT SEA-LIONv3 base model which has undergone continued pre-training from the base Gemma-2-9B model. SEA-LION stands for Southeast Asian Languages In One Network." News: https://www.techinasia.com/news/ai-singapore-boosts-sea-ai-sealion-v3-model - **AMD OLMo** (AMD) — 2024-11-01 | Parameters: AMD OLMo - License: open | Type: model - 1 billion parameter LMs trained from scratch using 1.3T tokens on a cluster of AMD Instinct MI250 GPUs. 
- **SmolLM2** (Hugging Face) — 2024-11-01 | Parameters: SmolLM2 - License: open | Type: model - Base and instruct versions, with Apache 2.0 license - **FvFold** (Jeonbuk National University) — 2024-11-01 | Parameters: 9.2M - License: open | Type: model - AI model by Jeonbuk National University - **Uni-Med** (Tsinghua University,Beijing University of Posts and Telecommunications) — 2024-11-01 | Parameters: 8.8B - License: closed | Type: model - AI model by Tsinghua University,Beijing University of Posts and Telecommunications - **SimPO** (Princeton University,University of Virginia) — 2024-11-01 | Parameters: 9B - License: open | Type: model - AI model by Princeton University,University of Virginia - **π0 (pi-zero)** (Physical Intelligence) — 2024-10-31 | Parameters: 3.3B - License: closed | Type: model - AI model by Physical Intelligence - **Aha (大模型)** (Shanghai Xingzhi Technology Co., Ltd.) — 2024-10-31 - License: closed | Type: model - AI model by Shanghai Xingzhi Technology Co., Ltd. 
- **VASA-1** (Microsoft Research Asia) — 2024-10-31 | Parameters: 229M - License: closed | Type: model - AI model by Microsoft Research Asia - **AstroOne** (Zhijiang Lab) — 2024-10-30 | Parameters: 70B - License: closed | Type: model - AI model by Zhijiang Lab - **aiXcoder-7B Base** (Peking University) — 2024-10-30 | Parameters: 7B - License: closed | Type: model - AI model by Peking University - **Recraft V3** (Recraft) — 2024-10-30 - License: closed | Type: model - AI model by Recraft - **Universal-2-TF** (AssemblyAI) — 2024-10-30 | Parameters: 600M - License: closed | Type: model - AI model by AssemblyAI - **Haiper 2.0** (Haiper) — 2024-10-29 - License: closed | Type: model - AI model by Haiper - **Stable Diffusion 3.5 Medium** (Stability AI) — 2024-10-29 | Parameters: 2.5B - License: open | Type: model - AI model by Stability AI - **Pro-PRIME** (Shanghai Jiao Tong University,Shanghai AI Lab,East China University of Science and Technology,Shanghai Tech University,Guangzhou International Bio Island,Chinese Academy of Sciences,Shanghai Academy of Experimental Medicine) — 2024-10-28 | Parameters: 650M - License: closed | Type: model - AI model by Shanghai Jiao Tong University,Shanghai AI Lab,East China University of Science and Technology,Shanghai Tech University,Guangzhou International Bio Island,Chinese Academy of Sciences,Shanghai Academy of Experimental Medicine - **Doubao-pro** (ByteDance) — 2024-10-28 | Parameters: 500B - License: closed | Type: model - AI model by ByteDance - **Aya Expanse 32B** (Cohere for AI) — 2024-10-24 | Parameters: 32.3B - License: open | Type: model - AI model by Cohere for AI - **Aya Expanse 8B** (Cohere for AI) — 2024-10-24 | Parameters: 8B - License: open | Type: model - AI model by Cohere for AI - **Bielik 7B** (SpeakLeash,Cyfronet AGH) — 2024-10-24 | Parameters: 7B - License: open | Type: model - AI model by SpeakLeash,Cyfronet AGH - **GigaChat MAX** (Sber) — 2024-10-24 - License: closed | Type: model - AI model by Sber - 
**YandexGPT 4 Pro** (Yandex) — 2024-10-24 - License: closed | Type: model - AI model by Yandex - **YandexGPT 4 Lite** (Yandex) — 2024-10-24 - License: closed | Type: model - AI model by Yandex - **YOLOv11** (Huddersfield University) — 2024-10-23 - License: closed | Type: model - AI model by Huddersfield University - **Mochi 1** (Genmo) — 2024-10-22 | Parameters: 10B - License: open | Type: model - AI model by Genmo - **NVLM-D 72B** (NVIDIA) — 2024-10-22 | Parameters: 72B - License: open | Type: model - AI model by NVIDIA - **NVLM-H 72B** (NVIDIA) — 2024-10-22 | Parameters: 72B - License: closed | Type: model - AI model by NVIDIA - **NVLM-X 72B** (NVIDIA) — 2024-10-22 | Parameters: 72B - License: closed | Type: model - AI model by NVIDIA - **Claude 3.5 Haiku** (Anthropic) — 2024-10-22 - License: closed | Type: model - AI model by Anthropic - **Stable Diffusion 3.5 Large** (Stability AI) — 2024-10-22 | Parameters: 8.1B - License: open | Type: model - AI model by Stability AI - **Stable Diffusion 3.5 Large Turbo** (Stability AI) — 2024-10-22 - License: open | Type: model - AI model by Stability AI - **LongVU** (Meta AI,King Abdullah University of Science and Technology (KAUST),Korea University) — 2024-10-22 | Parameters: 7B - License: open | Type: model - AI model by Meta AI,King Abdullah University of Science and Technology (KAUST),Korea University - **Granite 3.0 8B** (IBM) — 2024-10-21 | Parameters: 8.1B - License: open | Type: model - AI model by IBM - **Granite 3.0 2B** (IBM) — 2024-10-21 | Parameters: 2.5B - License: open | Type: model - AI model by IBM - **Emu3** (Beijing Academy of Artificial Intelligence / BAAI) — 2024-10-21 | Parameters: 8B - License: open | Type: model - AI model by Beijing Academy of Artificial Intelligence / BAAI - **Xunguang** (Alibaba DAMO Academy) — 2024-10-21 - License: closed | Type: model - AI model by Alibaba DAMO Academy - **Taiyi (旷视太乙)** (Megvii Inc) — 2024-10-21 - License: closed | Type: model - AI model by Megvii Inc - 
**Zhiyun Culture LLM (智云文化大模型)** (Xinhua Zhiyun Technology Co., Ltd.) — 2024-10-21 - License: closed | Type: model - AI model by Xinhua Zhiyun Technology Co., Ltd. - **Allegro** (Rhymes AI) — 2024-10-20 | Parameters: 3.0B - License: open | Type: model - AI model by Rhymes AI - **Depth Anything V2 Giant** (Tik Tok,Hong Kong University) — 2024-10-20 | Parameters: 1.3B - License: closed | Type: model - AI model by Tik Tok,Hong Kong University - **Depth Anything V2 Large** (Tik Tok,Hong Kong University) — 2024-10-20 | Parameters: 335.3M - License: open | Type: model - AI model by Tik Tok,Hong Kong University - **Sonic** (Cartesia) — 2024-10-19 - License: closed | Type: model - AI model by Cartesia - **Yi-Lightning** (01.AI) — 2024-10-18 - License: closed | Type: model - AI model by 01.AI - **TeleChat2-35B** (China Telecom) — 2024-10-18 | Parameters: 35B - License: open | Type: model - AI model by China Telecom - **Janus 1.3B** (DeepSeek,The University of Hong Kong,Peking University) — 2024-10-17 | Parameters: 1.3B - License: open | Type: model - AI model by DeepSeek,The University of Hong Kong,Peking University - **Belle-whisper-larger-v3-turbo-zh** (KE Holdings Inc. (“Beike”)) — 2024-10-16 - License: open | Type: model - AI model by KE Holdings Inc. (“Beike”) - **Ministral 3B** (Mistral AI) — 2024-10-16 | Parameters: 3B - License: closed | Type: model - AI model by Mistral AI - **Ministral 8B** (Mistral AI) — 2024-10-16 | Parameters: 8B - License: open | Type: model - AI model by Mistral AI - **Marco-o1** (Alibaba) — 2024-10-16 | Parameters: 7B - License: open | Type: model - AI model by Alibaba - **CHAI-1** (Chai discovery) — 2024-10-15 - License: open | Type: model - AI model by Chai discovery - **Pika 1.5** (Pika Labs) — 2024-10-15 - License: closed | Type: model - AI model by Pika Labs - **Tiangong SkyPaint** (Kunlun Inc.) — 2024-10-15 - License: closed | Type: model - AI model by Kunlun Inc. 
- **Firefly Video** (Adobe) — 2024-10-14 - License: closed | Type: model - AI model by Adobe - **SANA 1.6B** (NVIDIA,Massachusetts Institute of Technology (MIT),Tsinghua University) — 2024-10-14 | Parameters: 1.6B - License: open | Type: model - AI model by NVIDIA,Massachusetts Institute of Technology (MIT),Tsinghua University - **Alibaba-NLP (mGTE)** (Alibaba,Hong Kong Polytechnic University) — 2024-10-14 | Parameters: 304M - License: closed | Type: model - AI model by Alibaba,Hong Kong Polytechnic University - **MolPath** (Southwest Petroleum University,East China Normal University) — 2024-10-13 - License: closed | Type: model - AI model by Southwest Petroleum University,East China Normal University - **RNADiffFold** (Hangzhou Institute of Medicine,Zhejiang University (ZJU),University of Chinese Academy of Sciences) — 2024-10-13 - License: open | Type: model - AI model by Hangzhou Institute of Medicine,Zhejiang University (ZJU),University of Chinese Academy of Sciences - **PROPERMAB** (Regeneron) — 2024-10-12 - License: closed | Type: model - AI model by Regeneron - **Yuel 2** (Pennsylvania State University) — 2024-10-12 - License: open | Type: model - AI model by Pennsylvania State University - **vScreenML 2.0** (Fox Chase Cancer Center,Temple University School of Pharmacy) — 2024-10-12 - License: closed | Type: model - AI model by Fox Chase Cancer Center,Temple University School of Pharmacy - **EvoBind2** (Stockholm University,Science for Life Laboratory) — 2024-10-12 - License: closed | Type: model - AI model by Stockholm University,Science for Life Laboratory - **ALICE** (Nanhu Brain-Computer Interface Institute,Lingang Laboratory,Medical School of Nantong University,Zhejiang University School of Medicine) — 2024-10-11 - License: closed | Type: model - AI model by Nanhu Brain-Computer Interface Institute,Lingang Laboratory,Medical School of Nantong University,Zhejiang University School of Medicine - **Deep Learning Enabled Discovery of Kinase Drug Targets in 
Pharos** (West Virginia University,University of New Mexico) — 2024-10-11 - License: closed | Type: model - AI model by West Virginia University,University of New Mexico - **T2V-Turbo-v2** (University of California Santa Barbara (UCSB),University of California Los Angeles (UCLA),Amazon,University of Waterloo) — 2024-10-11 - License: open | Type: model - AI model by University of California Santa Barbara (UCSB),University of California Los Angeles (UCLA),Amazon,University of Waterloo - **Baichuan-Omni** (Baichuan,Westlake University,Zhejiang University (ZJU)) — 2024-10-11 | Parameters: 7B - License: closed | Type: model - AI model by Baichuan,Westlake University,Zhejiang University (ZJU) - **ProteinChat** (University of California San Diego,BioMap Research,The Scripps Research Institute,Mohamed bin Zayed University of Artificial Intelligence (MBZUAI)) — 2024-10-10 | Parameters: 14B - License: closed | Type: model - AI model by University of California San Diego,BioMap Research,The Scripps Research Institute,Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) - **HaloClass** (University of California Davis,Pt. Jawahar Lal Nehru Memorial Medical College,Purdue University) — 2024-10-10 - License: closed | Type: model - AI model by University of California Davis,Pt. 
Jawahar Lal Nehru Memorial Medical College,Purdue University - **IEV2MOL** (Tokyo Institute of Technology) — 2024-10-10 - License: closed | Type: model - AI model by Tokyo Institute of Technology - **RDT-1B** (Tsinghua University) — 2024-10-10 | Parameters: 1.2B - License: open | Type: model - AI model by Tsinghua University - **ProCALM (Uniref9B)** (Profluent Bio,California Institute of Technology) — 2024-10-09 | Parameters: 764M - License: closed | Type: model - AI model by Profluent Bio,California Institute of Technology - **SCUBA-D** (University of Science and Technology of China (USTC),Oristruct Biotech Company,iFLYTEK Research) — 2024-10-09 - License: closed | Type: model - AI model by University of Science and Technology of China (USTC),Oristruct Biotech Company,iFLYTEK Research - **AF_unmasked** (Linköping University,Uppsala University,Stockholm University) — 2024-10-09 - License: closed | Type: model - AI model by Linköping University,Uppsala University,Stockholm University - **Chirp 2 Speech-to-Text** (Google) — 2024-10-09 - License: closed | Type: model - AI model by Google - **Genesis** (École Polytechnique Fédérale de Lausanne (EPFL),Swiss Institute of Bioinformatics,Imperial College London,University of Oxford,Prescient Design) — 2024-10-08 - License: open | Type: model - AI model by École Polytechnique Fédérale de Lausanne (EPFL),Swiss Institute of Bioinformatics,Imperial College London,University of Oxford,Prescient Design - **CLEAN-Contact** (Cleveland Clinic,Kent State University,Pacific Northwest National Laboratory) — 2024-10-08 - License: closed | Type: model - AI model by Cleveland Clinic,Kent State University,Pacific Northwest National Laboratory - **SO3LR** (University of Luxembourg,Technische Universität Berlin,Berlin Institute for the Foundations of Learning and Data,DeepMind,Max Planck Institute for Informatics,Korea University) — 2024-10-08 - License: closed | Type: model - AI model by University of Luxembourg,Technische Universität
Berlin,Berlin Institute for the Foundations of Learning and Data,DeepMind,Max Planck Institute for Informatics,Korea University - **CogVideoX** (Z.ai (Zhipu AI),Tsinghua University) — 2024-10-08 | Parameters: 5B - License: open | Type: model - AI model by Z.ai (Zhipu AI),Tsinghua University - **GR-2** (ByteDance) — 2024-10-08 | Parameters: 230M - License: closed | Type: model - AI model by ByteDance - **Pyramid Flow** (Peking University,Kuaishou Technology,Beijing University of Posts and Telecommunications) — 2024-10-08 | Parameters: 2B - License: open | Type: model - AI model by Peking University,Kuaishou Technology,Beijing University of Posts and Telecommunications - **Aria** (Rhymes AI) — 2024-10-08 | Parameters: 24.9B - License: open | Type: model - AI model by Rhymes AI - **MLDD3UTRmRRNAS** (Ginkgo Bioworks) — 2024-10-07 | Parameters: 44M - License: closed | Type: model - AI model by Ginkgo Bioworks - **scHyena** (Korea Advanced Institute of Science and Technology (KAIST)) — 2024-10-04 - License: closed | Type: model - AI model by Korea Advanced Institute of Science and Technology (KAIST) - **Movie Gen Video** (Meta AI) — 2024-10-04 | Parameters: 30B - License: closed | Type: model - AI model by Meta AI - **Movie Gen Audio** (Meta AI) — 2024-10-04 | Parameters: 13B - License: closed | Type: model - AI model by Meta AI - **Aya-Expanse-32B** (Cohere) — 2024-10-01 | Parameters: Aya-Expanse-32B - License: open | Type: model - "Aya Expanse, a family of highly performant multilingual models that excels across 23 languages and outperforms other leading open-weights models...we have collaborated with over 3,000 researchers from 119 countries to expand cutting-edge multilingual research... 220 language ambassadors from around the world who have been part of this release" - **Claude 3.5 Sonnet (new)** (Anthropic) — 2024-10-01 | Parameters: Claude 3.5 Sonnet (new) - License: open | Type: model - Absurd naming scheme. 
Paper addendum pp51-64: https://assets.anthropic.com/m/61e7d27f8c8f5919/original/Claude-3-Model-Card.pdf#page=51 - **Granite 3.0 8B** (IBM) — 2024-10-01 | Parameters: Granite 3.0 8B - License: open | Type: model - Announce: https://www.ibm.com/new/ibm-granite-3-0-open-state-of-the-art-enterprise-models - **Granite-3.0-3B-A800M-Instruct** (IBM) — 2024-10-01 | Parameters: Granite-3.0-3B-A800M-Instruct - License: open | Type: model - Announce: https://www.ibm.com/new/ibm-granite-3-0-open-state-of-the-art-enterprise-models - **aiXcoder-7B** (aiXcoder) — 2024-10-01 | Parameters: aiXcoder-7B - License: open | Type: model - Dataset: The Stack - **Llama-3.1-Nemotron-70B** (NVIDIA) — 2024-10-01 | Parameters: Llama-3.1-Nemotron-70B - License: open | Type: model - Related paper: https://arxiv.org/abs/2410.01257 - **Ministral 8B** (Mistral) — 2024-10-01 | Parameters: Ministral 8B - License: open | Type: model - "Introducing the world’s best edge models" - **Yi-Lightning** (01-ai) — 2024-10-01 | Parameters: Yi-Lightning - License: open | Type: model - "New MoE hybrid expert architecture" and https://x.com/01AI_Yi/status/1845776529185476613 - **Zamba2-7B** (Zyphra) — 2024-10-01 | Parameters: Zamba2-7B - License: open | Type: model - Mamba2 "trained on 128 H100 GPUS for approximately 50 days using our internal training framework developed atop Megatron-LM" - **nGPT** (NVIDIA) — 2024-10-01 | Parameters: nGPT - License: open | Type: model - "a novel neural network architecture, the normalized Transformer (nGPT) with representation learning on the hypersphere. In nGPT, all vectors forming the embeddings, MLP, attention matrices and hidden states are unit norm normalized...reducing the number of training steps required to achieve the same accuracy by a factor of 4 to 20, depending on the sequence length." 
- **Inflection-3 Pi (3.0)** (Inflection AI) — 2024-10-01 | Parameters: Inflection-3 Pi (3.0) - License: open | Type: model - Inference via Intel Gaudi® 3 128 GB, on-premise available. Minimum spend $100 credits. - **Inflection-3 Productivity (3.0)** (Inflection AI) — 2024-10-01 | Parameters: Inflection-3 Productivity (3.0) - License: open | Type: model - Inference via Intel Gaudi® 3 128 GB, on-premise available. Minimum spend $100 credits. - **EnzymeFlow** (McGill University,Mila - Quebec AI (originally Montreal Institute for Learning Algorithms),Hong Kong University of Science and Technology (HKUST),University of Washington,Microsoft Research,DeepMind,Shanghai Jiao Tong University,University of Montreal / Université de Montréal) — 2024-10-01 - License: open | Type: model - AI model by McGill University,Mila - Quebec AI (originally Montreal Institute for Learning Algorithms),Hong Kong University of Science and Technology (HKUST),University of Washington,Microsoft Research,DeepMind,Shanghai Jiao Tong University,University of Montreal / Université de Montréal - **BindCraft** (École Polytechnique Fédérale de Lausanne (EPFL),University of Zurich,University of Lausanne,Massachusetts Institute of Technology (MIT),Visterra Inc,Swiss Federal Institute of Technology) — 2024-10-01 - License: closed | Type: model - AI model by École Polytechnique Fédérale de Lausanne (EPFL),University of Zurich,University of Lausanne,Massachusetts Institute of Technology (MIT),Visterra Inc,Swiss Federal Institute of Technology - **PlasmidGPT** (Harvard University) — 2024-10-01 | Parameters: 110M - License: open | Type: model - AI model by Harvard University - **LFM 40B** (Liquid) — 2024-09-30 | Parameters: 40.3B - License: closed | Type: model - AI model by Liquid - **FragLlama** (YDS Pharmatech) — 2024-09-30 | Parameters: 779M - License: closed | Type: model - AI model by YDS Pharmatech - **Takane** (Fujitsu,Cohere) — 2024-09-30 - License: closed | Type: model - AI model by
Fujitsu,Cohere - **PocketFlow** (University of Science and Technology of China (USTC),State Key Laboratory of Cognitive Intelligence,Harvard University) — 2024-09-29 - License: closed | Type: model - AI model by University of Science and Technology of China (USTC),State Key Laboratory of Cognitive Intelligence,Harvard University - **FlexSBDD** (University of Science and Technology of China (USTC),State Key Laboratory of Cognitive Intelligence,Princeton University) — 2024-09-29 - License: closed | Type: model - AI model by University of Science and Technology of China (USTC),State Key Laboratory of Cognitive Intelligence,Princeton University - **DFMDock** (Johns Hopkins University) — 2024-09-28 - License: open | Type: model - AI model by Johns Hopkins University - **ConoDL** (Chongqing University,Ministry of Natural Resources (China)) — 2024-09-28 | Parameters: 1.2B - License: open | Type: model - AI model by Chongqing University,Ministry of Natural Resources (China) - **DeepREAD** (Shape Therapeutics) — 2024-09-28 - License: closed | Type: model - AI model by Shape Therapeutics - **PepNet** (Shandong University) — 2024-09-28 - License: closed | Type: model - AI model by Shandong University - **Mdgen** (Massachusetts Institute of Technology (MIT)) — 2024-09-26 - License: open | Type: model - AI model by Massachusetts Institute of Technology (MIT) - **AlphaChip** (Google DeepMind) — 2024-09-26 - License: closed | Type: model - AI model by Google DeepMind - **SPOT** (Heinrich Heine University,Concordia University) — 2024-09-26 - License: closed | Type: model - AI model by Heinrich Heine University,Concordia University - **Loop-Diffusion** (University of Washington) — 2024-09-26 - License: closed | Type: model - AI model by University of Washington - **RWKV-5 (Eagle) 7B** (RWKV Foundation,EleutherAI,Ohio State University,University of California Santa Barbara (UCSB),Wroclaw Tech (Wrocław University of Science and Technology),Guangdong Laboratory of Artificial 
Intelligence and Digital Economy (Pazhou Lab),New York University (NYU),Harvard University,Contextual AI,University of Chinese Academy of Sciences,University of California Santa Cruz,Tsinghua University,University of Edinburgh,University of British Columbia (UBC),Pennsylvania State University) — 2024-09-26 | Parameters: 7.5B - License: open | Type: model - AI model by RWKV Foundation,EleutherAI,Ohio State University,University of California Santa Barbara (UCSB),Wroclaw Tech (Wrocław University of Science and Technology),Guangdong Laboratory of Artificial Intelligence and Digital Economy (Pazhou Lab),New York University (NYU),Harvard University,Contextual AI,University of Chinese Academy of Sciences,University of California Santa Cruz,Tsinghua University,University of Edinburgh,University of British Columbia (UBC),Pennsylvania State University - **RWKV-6 (Finch) 3B** (RWKV Foundation,EleutherAI,Ohio State University,University of California Santa Barbara (UCSB),Wroclaw Tech (Wrocław University of Science and Technology),Guangdong Laboratory of Artificial Intelligence and Digital Economy (Pazhou Lab),New York University (NYU),Harvard University,Contextual AI,University of Chinese Academy of Sciences,University of California Santa Cruz,Tsinghua University,University of Edinburgh,University of British Columbia (UBC),Pennsylvania State University) — 2024-09-26 | Parameters: 3.1B - License: open | Type: model - AI model by RWKV Foundation,EleutherAI,Ohio State University,University of California Santa Barbara (UCSB),Wroclaw Tech (Wrocław University of Science and Technology),Guangdong Laboratory of Artificial Intelligence and Digital Economy (Pazhou Lab),New York University (NYU),Harvard University,Contextual AI,University of Chinese Academy of Sciences,University of California Santa Cruz,Tsinghua University,University of Edinburgh,University of British Columbia (UBC),Pennsylvania State University - **RNA-DCGen** (Bangladesh University of Engineering and 
Technology,University of California Riverside) — 2024-09-25 | Parameters: 117M - License: closed | Type: model - AI model by Bangladesh University of Engineering and Technology,University of California Riverside - **ProteinGenerator** (University of Washington,Institute for Protein Design,Georgia Institute of Technology,Microsoft,Heidelberg University) — 2024-09-25 - License: closed | Type: model - AI model by University of Washington,Institute for Protein Design,Georgia Institute of Technology,Microsoft,Heidelberg University - **Mothra** (Tokyo Institute of Technology) — 2024-09-25 - License: closed | Type: model - AI model by Tokyo Institute of Technology - **GeoAB** (Zhejiang University (ZJU),Westlake University) — 2024-09-25 - License: closed | Type: model - AI model by Zhejiang University (ZJU),Westlake University - **ChemNet** (University of Washington) — 2024-09-25 - License: closed | Type: model - AI model by University of Washington - **PPFlow** (Zhejiang University (ZJU),Westlake University) — 2024-09-25 - License: closed | Type: model - AI model by Zhejiang University (ZJU),Westlake University - **TxGNN** (Harvard Medical School,Harvard-MIT Program in Health Sciences and Technology,The Mount Sinai Hospital (New York),Broad Institute,Harvard Data Science Initiative,Harvard University,Stanford University) — 2024-09-25 - License: closed | Type: model - AI model by Harvard Medical School,Harvard-MIT Program in Health Sciences and Technology,The Mount Sinai Hospital (New York),Broad Institute,Harvard Data Science Initiative,Harvard University,Stanford University - **SeaMoon** (Sorbonne University,Université Grenoble Alpes,Institut Universitaire de France (IUF)) — 2024-09-25 | Parameters: 1M - License: closed | Type: model - AI model by Sorbonne University,Université Grenoble Alpes,Institut Universitaire de France (IUF) - **KnoMol** (Zhejiang University (ZJU),Jiangsu University of Technology,Zhejiang University School of Medicine) — 2024-09-25 - License: 
closed | Type: model - AI model by Zhejiang University (ZJU),Jiangsu University of Technology,Zhejiang University School of Medicine - **CodonMPNN** (Harvard Medical School,Massachusetts Institute of Technology (MIT)) — 2024-09-25 - License: open | Type: model - AI model by Harvard Medical School,Massachusetts Institute of Technology (MIT) - **Molmo 72B** (Allen Institute for AI,University of Washington) — 2024-09-25 | Parameters: 72B - License: open | Type: model - AI model by Allen Institute for AI,University of Washington - **dnaGrinder** (Hong Kong Polytechnic University) — 2024-09-24 | Parameters: 63.6M - License: closed | Type: model - AI model by Hong Kong Polytechnic University - **Protein-Mamba** (Rensselaer Polytechnic Institute,Stanford University,University of Minnesota,Korea Advanced Institute of Science and Technology (KAIST),University of Illinois Urbana-Champaign (UIUC)) — 2024-09-24 - License: closed | Type: model - AI model by Rensselaer Polytechnic Institute,Stanford University,University of Minnesota,Korea Advanced Institute of Science and Technology (KAIST),University of Illinois Urbana-Champaign (UIUC) - **AFP-Deep** (Nanjing University,Yangzhou University) — 2024-09-24 - License: closed | Type: model - AI model by Nanjing University,Yangzhou University - **ProtBFN** (InstaDeep) — 2024-09-24 | Parameters: 650M - License: closed | Type: model - AI model by InstaDeep - **Llama 3.2 11B** (Meta AI) — 2024-09-24 | Parameters: 10.6B - License: open | Type: model - AI model by Meta AI - **Llama 3.2 90B** (Meta AI) — 2024-09-24 | Parameters: 88.6B - License: open | Type: model - AI model by Meta AI - **Llama 3.2 1B** (Meta AI) — 2024-09-24 | Parameters: 1.2B - License: open | Type: model - AI model by Meta AI - **Llama 3.2 3B** (Meta AI) — 2024-09-24 | Parameters: 3.2B - License: open | Type: model - AI model by Meta AI - **DrugTar** (Isfahan University of Technology) — 2024-09-24 | Parameters: 650M - License: closed | Type: model - AI model by 
Isfahan University of Technology - **Thermostable protein design** (Indraprastha Institute of Information Technology Delhi) — 2024-09-24 | Parameters: 738M - License: closed | Type: model - AI model by Indraprastha Institute of Information Technology Delhi - **Jaeger** (University Medicine Greifswald,Utrecht University,Friedrich Schiller University Jena) — 2024-09-24 | Parameters: 944.0K - License: closed | Type: model - AI model by University Medicine Greifswald,Utrecht University,Friedrich Schiller University Jena - **Importance of higher-order epistasis in large protein sequence-function relationships** (University of Florida) — 2024-09-24 - License: closed | Type: model - AI model by University of Florida - **MTDP** (Chinese University of Hong Kong (CUHK),City University of Hong Kong) — 2024-09-24 | Parameters: 20M - License: closed | Type: model - AI model by Chinese University of Hong Kong (CUHK),City University of Hong Kong - **AlphaMut** (Indian Institute of Science Education and Research) — 2024-09-24 - License: closed | Type: model - AI model by Indian Institute of Science Education and Research - **PixelDance** (ByteDance) — 2024-09-24 - License: closed | Type: model - AI model by ByteDance - **ByteDance Seaweed** (ByteDance) — 2024-09-24 - License: closed | Type: model - AI model by ByteDance - **ProteinSetTransformer** (University of Wisconsin Madison) — 2024-09-23 - License: closed | Type: model - AI model by University of Wisconsin Madison - **PreAlgPro** (Shanghai Ocean University) — 2024-09-23 - License: closed | Type: model - AI model by Shanghai Ocean University - **PocketGen** (University of Science and Technology of China (USTC),Hefei Comprehensive National Science Center,Harvard University,Broad Institute,Harvard Data Science Initiative) — 2024-09-23 | Parameters: 7.9M - License: closed | Type: model - AI model by University of Science and Technology of China (USTC),Hefei Comprehensive National Science Center,Harvard University,Broad 
Institute,Harvard Data Science Initiative - **TAWFN** (Northeastern University (China)) — 2024-09-23 - License: closed | Type: model - AI model by Northeastern University (China) - **Spark 4.0** (iFlytek) — 2024-09-23 - License: closed | Type: model - AI model by iFlytek - **AMPLIFY** (Chandar Research Lab,Mila - Quebec AI (originally Montreal Institute for Learning Algorithms),Amgen,Polytechnique Montreal,CIFAR AI Research) — 2024-09-23 | Parameters: 350M - License: open | Type: model - AI model by Chandar Research Lab,Mila - Quebec AI (originally Montreal Institute for Learning Algorithms),Amgen,Polytechnique Montreal,CIFAR AI Research - **MoEFold2D** (George Washington University) — 2024-09-22 | Parameters: 960K - License: open | Type: model - AI model by George Washington University - **IgGM** (Chinese Academy of Sciences,University of Chinese Academy of Sciences,Tencent) — 2024-09-22 - License: open | Type: model - AI model by Chinese Academy of Sciences,University of Chinese Academy of Sciences,Tencent - **Kling 1.5 Pro** (Kuaishou Technology) — 2024-09-22 - License: closed | Type: model - AI model by Kuaishou Technology - **BTFBS** (Nanjing Agricultural University) — 2024-09-22 - License: closed | Type: model - AI model by Nanjing Agricultural University - **PepINVENT** (AstraZeneca,Chalmers University of Technology) — 2024-09-21 - License: open | Type: model - AI model by AstraZeneca,Chalmers University of Technology - **ExSelfRL** (Soochow University) — 2024-09-20 - License: closed | Type: model - AI model by Soochow University - **Telechat2-115B** (China Telecom) — 2024-09-20 | Parameters: 115B - License: open | Type: model - AI model by China Telecom - **Prithvi WxC** (IBM Research,University of Alabama,Stanford University,Colorado State University,Oak Ridge National Laboratory,NASA) — 2024-09-20 | Parameters: 2.3B - License: open | Type: model - AI model by IBM Research,University of Alabama,Stanford University,Colorado State University,Oak Ridge 
National Laboratory,NASA - **Qwen2.5-72B** (Alibaba) — 2024-09-19 | Parameters: 72.7B - License: open | Type: model - AI model by Alibaba - **EITLEM-Kinetics** (Beijing University of Chemical Technology) — 2024-09-19 - License: open | Type: model - AI model by Beijing University of Chemical Technology - **pKALM** (Hokkaido University) — 2024-09-19 | Parameters: 5.1M - License: closed | Type: model - AI model by Hokkaido University - **GeoSeqBuilder** (Peking University) — 2024-09-19 - License: open | Type: model - AI model by Peking University - **McMLP** (Harvard Medical School,University of Illinois Urbana-Champaign (UIUC),Harvard TH Chan School of Public Health) — 2024-09-19 - License: closed | Type: model - AI model by Harvard Medical School,University of Illinois Urbana-Champaign (UIUC),Harvard TH Chan School of Public Health - **Qwen2.5 Instruct (7B)** (Alibaba) — 2024-09-19 | Parameters: 7.6B - License: open | Type: model - AI model by Alibaba - **Qwen2.5 Instruct (72B)** (Alibaba) — 2024-09-19 | Parameters: 72.7B - License: open | Type: model - AI model by Alibaba - **Qwen2.5-3B** (Alibaba) — 2024-09-19 | Parameters: 3.1B - License: open | Type: model - AI model by Alibaba - **Qwen2.5-7B** (Alibaba) — 2024-09-19 | Parameters: 7.6B - License: open | Type: model - AI model by Alibaba - **Qwen2.5-1.5B** (Alibaba) — 2024-09-19 | Parameters: 1.5B - License: open | Type: model - AI model by Alibaba - **Qwen2.5-14B** (Alibaba) — 2024-09-19 | Parameters: 14.7B - License: open | Type: model - AI model by Alibaba - **Qwen2.5-Math-7B-Base** (Alibaba) — 2024-09-19 | Parameters: 7B - License: open | Type: model - AI model by Alibaba - **Oryx 34B** (Tsinghua University,Tencent,Nanyang Technological University) — 2024-09-19 | Parameters: 34B - License: open | Type: model - AI model by Tsinghua University,Tencent,Nanyang Technological University - **Oryx 7B** (Tsinghua University,Tencent,Nanyang Technological University) — 2024-09-19 | Parameters: 7B - License: open | 
Type: model - AI model by Tsinghua University,Tencent,Nanyang Technological University - **Qwen2.5 Instruct (32B)** (Alibaba) — 2024-09-19 | Parameters: 32.5B - License: open | Type: model - AI model by Alibaba - **Qwen2.5-Math-1.5B** (Alibaba) — 2024-09-19 | Parameters: 1.5B - License: open | Type: model - AI model by Alibaba - **Qwen2-VL-72B** (Alibaba) — 2024-09-18 | Parameters: 72B - License: open | Type: model - AI model by Alibaba - **Qwen2-VL-2B** (Alibaba) — 2024-09-18 | Parameters: 2B - License: open | Type: model - AI model by Alibaba - **Qwen2-VL-7B** (Alibaba) — 2024-09-18 | Parameters: 8B - License: open | Type: model - AI model by Alibaba - **Qwen2.5-Coder (7B)** (Alibaba) — 2024-09-18 | Parameters: 7.6B - License: open | Type: model - AI model by Alibaba - **Qwen2.5-Coder (1.5B)** (Alibaba) — 2024-09-18 | Parameters: 1.5B - License: open | Type: model - AI model by Alibaba - **RoseTTAFold2-Lite** (University of Washington,University of Texas Southwest Medical Center,Seoul National University,Massachusetts General Hospital,Harvard Medical School,Broad Institute) — 2024-09-18 - License: closed | Type: model - AI model by University of Washington,University of Texas Southwest Medical Center,Seoul National University,Massachusetts General Hospital,Harvard Medical School,Broad Institute - **AtomFlow** (Peking University,Chinese University of Hong Kong (CUHK),Mila - Quebec AI (originally Montreal Institute for Learning Algorithms),University of Montreal / Université de Montréal,HEC Montreal,CIFAR AI Research) — 2024-09-18 - License: closed | Type: model - AI model by Peking University,Chinese University of Hong Kong (CUHK),Mila - Quebec AI (originally Montreal Institute for Learning Algorithms),University of Montreal / Université de Montréal,HEC Montreal,CIFAR AI Research - **GraphEC** (Sun Yat-sen University,National Supercomputing Center in Shenzhen,Chongqing University,Key Laboratory of Machine Intelligence and Advanced Computing) — 2024-09-18 -
License: closed | Type: model - AI model by Sun Yat-sen University,National Supercomputing Center in Shenzhen,Chongqing University,Key Laboratory of Machine Intelligence and Advanced Computing - **ProtENN2** (European Bioinformatics Institute,University of Cambridge,Google Research) — 2024-09-18 - License: closed | Type: model - AI model by European Bioinformatics Institute,University of Cambridge,Google Research - **Whale Bioacoustics Model** (Google Research,National Oceanic and Atmospheric Administration (NOAA),Oregon State University) — 2024-09-18 - License: open | Type: model - AI model by Google Research,National Oceanic and Atmospheric Administration (NOAA),Oregon State University - **1X World Model** (1X) — 2024-09-17 - License: closed | Type: model - AI model by 1X - **Mistral Small v24.09** (Mistral AI) — 2024-09-17 | Parameters: 22B - License: open | Type: model - AI model by Mistral AI - **AminoAcid-0** (Ginkgo Bioworks) — 2024-09-17 - License: closed | Type: model - AI model by Ginkgo Bioworks - **OmniGen** (Beijing Academy of Artificial Intelligence / BAAI) — 2024-09-17 | Parameters: 3.8B - License: open | Type: model - AI model by Beijing Academy of Artificial Intelligence / BAAI - **DeepRelax** (National University of Singapore,Sun Yat-sen University,Peking University,China Medical University Hospital,Asia University,Guangdong L-Med Biotechnology Company) — 2024-09-17 - License: closed | Type: model - AI model by National University of Singapore,Sun Yat-sen University,Peking University,China Medical University Hospital,Asia University,Guangdong L-Med Biotechnology Company - **ProTeM** (Zhejiang Lab,Zhejiang University (ZJU),Huazhong University of Science and Technology,Mohamed bin Zayed University of Artificial Intelligence (MBZUAI)) — 2024-09-17 - License: closed | Type: model - AI model by Zhejiang Lab,Zhejiang University (ZJU),Huazhong University of Science and Technology,Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) -
**Pixtral 12B** (Mistral AI) — 2024-09-17 | Parameters: 12.4B - License: open | Type: model - AI model by Mistral AI - **Qwen2.5-32B** (Alibaba) — 2024-09-17 | Parameters: 32.5B - License: open | Type: model - AI model by Alibaba - **RNA language models predict mutations that improve RNA function** (NERSC, Lawrence Berkeley National Laboratory,University of California San Francisco,University of California (UC) Berkeley) — 2024-09-16 - License: closed | Type: model - AI model by NERSC, Lawrence Berkeley National Laboratory,University of California San Francisco,University of California (UC) Berkeley - **DeepUrfold** (University of Virginia) — 2024-09-16 | Parameters: 110M - License: closed | Type: model - AI model by University of Virginia - **Ovis1.6-Gemma2-9B** (Alibaba) — 2024-09-16 | Parameters: 10.2B - License: open | Type: model - AI model by Alibaba - **Playground v3** (Playground) — 2024-09-16 - License: closed | Type: model - AI model by Playground - **RNAdiffusion** (Princeton University,Tsinghua University,Stanford University) — 2024-09-15 - License: closed | Type: model - AI model by Princeton University,Tsinghua University,Stanford University - **UdanDTI** (Tsinghua University) — 2024-09-15 - License: closed | Type: model - AI model by Tsinghua University - **Luma Ray 1.6** (LumaLabs) — 2024-09-15 - License: closed | Type: model - AI model by LumaLabs - **ProtRNA** (Fudan University,Shanghai AI Lab) — 2024-09-14 | Parameters: 650M - License: open | Type: model - AI model by Fudan University,Shanghai AI Lab - **PLTNUM** (Kyoto University,National Institute of Biomedical Innovation,RIKEN) — 2024-09-14 | Parameters: 650M - License: closed | Type: model - AI model by Kyoto University,National Institute of Biomedical Innovation,RIKEN - **LEGO** (Chinese Academy of Sciences,Beijing Academy of Artificial Intelligence / BAAI,University of Chinese Academy of Sciences) — 2024-09-14 - License: closed | Type: model - AI model by Chinese Academy of Sciences,Beijing 
Academy of Artificial Intelligence / BAAI,University of Chinese Academy of Sciences - **MolSnapper** (University of Oxford) — 2024-09-14 - License: closed | Type: model - AI model by University of Oxford - **NeoaPred** (South China University of Technology,Jinan University) — 2024-09-14 - License: closed | Type: model - AI model by South China University of Technology,Jinan University - **YOLOv8 (reCAPTCHA fine-tuned)** (ETH Zurich) — 2024-09-13 - License: closed | Type: model - AI model by ETH Zurich - **JURA Bio Model** (JURA Bio,New York University (NYU),Harvard Medical School,Columbia University) — 2024-09-13 - License: closed | Type: model - AI model by JURA Bio,New York University (NYU),Harvard Medical School,Columbia University - **IDPFold** (Shandong University,BioMap Research,Fuzhou University,Shanghai Jiao Tong University) — 2024-09-13 | Parameters: 17.8M - License: closed | Type: model - AI model by Shandong University,BioMap Research,Fuzhou University,Shanghai Jiao Tong University - **Novae** (CentraleSupelec,Gustave Roussy,Université Paris Cité) — 2024-09-13 | Parameters: 32M - License: open | Type: model - AI model by CentraleSupelec,Gustave Roussy,Université Paris Cité - **CodonTransformer** (Vector Institute,University of Toronto,Université Paris Cité) — 2024-09-13 | Parameters: 89.6M - License: closed | Type: model - AI model by Vector Institute,University of Toronto,Université Paris Cité - **Text2Protein** (University of California San Diego,Brown University) — 2024-09-13 - License: closed | Type: model - AI model by University of California San Diego,Brown University - **o1-mini** (OpenAI) — 2024-09-12 - License: closed | Type: model - AI model by OpenAI - **o1-preview** (OpenAI) — 2024-09-12 - License: closed | Type: model - AI model by OpenAI - **4D Diffusion for Dynamic Protein Structure Prediction with Reference Guided Motion Alignment** (Fudan University,Shanghai Academy of Artificial Intelligence for Science,Nanjing University) — 2024-09-12 
- License: closed | Type: model - AI model by Fudan University,Shanghai Academy of Artificial Intelligence for Science,Nanjing University - **Demostart (in progress)** (Google DeepMind) — 2024-09-12 - License: closed | Type: model - AI model by Google DeepMind - **DataGemma** (Google) — 2024-09-12 | Parameters: 27.2B - License: open | Type: model - AI model by Google - **E2 TTS** (Microsoft) — 2024-09-12 | Parameters: 335M - License: closed | Type: model - AI model by Microsoft - **Automated design of multi-target ligands by generative deep learning** (Goethe University Frankfurt,Fraunhofer Institute for Translational Medicine and Pharmacology,Ludwig Maximilian University of Munich) — 2024-09-11 | Parameters: 5.8M - License: closed | Type: model - AI model by Goethe University Frankfurt,Fraunhofer Institute for Translational Medicine and Pharmacology,Ludwig Maximilian University of Munich - **Solar Pro** (Upstage) — 2024-09-11 | Parameters: 22B - License: closed | Type: model - AI model by Upstage - **Galaxy (星汉Galaxy大模型)** (Dahan Software / Hanweb) — 2024-09-11 | Parameters: 32B - License: closed | Type: model - AI model by Dahan Software / Hanweb - **EVI 2** (Hume) — 2024-09-11 - License: closed | Type: model - AI model by Hume - **GenMS** (Google DeepMind) — 2024-09-10 - License: closed | Type: model - AI model by Google DeepMind - **KinoML** (Charité-Universitätsmedizin Berlin,Saarland University,Memorial Sloan Kettering Cancer Center) — 2024-09-10 - License: closed | Type: model - AI model by Charité-Universitätsmedizin Berlin,Saarland University,Memorial Sloan Kettering Cancer Center - **CPDiffusion** (Shanghai Jiao Tong University,University of New South Wales,University of Cambridge,Shanghai AI Lab) — 2024-09-10 | Parameters: 4M - License: closed | Type: model - AI model by Shanghai Jiao Tong University,University of New South Wales,University of Cambridge,Shanghai AI Lab - **MolPhenix** (Valence Labs,University of British Columbia (UBC),Vector 
Institute,University of Toronto,University of Montreal / Université de Montréal,Mila - Quebec AI (originally Montreal Institute for Learning Algorithms)) — 2024-09-10 | Parameters: 38.7M - License: closed | Type: model - AI model by Valence Labs,University of British Columbia (UBC),Vector Institute,University of Toronto,University of Montreal / Université de Montréal,Mila - Quebec AI (originally Montreal Institute for Learning Algorithms) - **DiffForce** (University of Cambridge,Shanghai Jiao Tong University,University of New South Wales) — 2024-09-09 - License: closed | Type: model - AI model by University of Cambridge,Shanghai Jiao Tong University,University of New South Wales - **ProteinMPNN-DDG** (Peptone) — 2024-09-09 - License: closed | Type: model - AI model by Peptone - **ESMIF-DDG** (Peptone) — 2024-09-09 - License: closed | Type: model - AI model by Peptone - **PDFII** (Sichuan University) — 2024-09-09 - License: closed | Type: model - AI model by Sichuan University - **EpiScan** (Sun Yat-sen University,Guangzhou National Laboratory) — 2024-09-09 | Parameters: 288.9K - License: closed | Type: model - AI model by Sun Yat-sen University,Guangzhou National Laboratory - **VespaG** (Technical University of Munich,Sorbonne University,Institute for Advanced Study,Université Paris Cité,Institut Universitaire de France (IUF)) — 2024-09-09 | Parameters: 660K - License: closed | Type: model - AI model by Technical University of Munich,Sorbonne University,Institute for Advanced Study,Université Paris Cité,Institut Universitaire de France (IUF) - **MMAPLE** (City University of New York,Cornell University) — 2024-09-09 - License: open | Type: model - AI model by City University of New York,Cornell University - **AbGPT** (Carnegie Mellon University (CMU)) — 2024-09-09 | Parameters: 734M - License: closed | Type: model - AI model by Carnegie Mellon University (CMU) - **BetterBodies** (University of Freiburg,Collaborative Research Institute Intelligent Oncology 
,BrainLinks-BrainTools) — 2024-09-09 - License: closed | Type: model - AI model by University of Freiburg,Collaborative Research Institute Intelligent Oncology ,BrainLinks-BrainTools - **RiboCode** (Sun Yat-sen University,Rhegen Biotechnology,Chinese Academy of Sciences) — 2024-09-08 - License: closed | Type: model - AI model by Sun Yat-sen University,Rhegen Biotechnology,Chinese Academy of Sciences - **ALOHA Unleashed** (Google DeepMind) — 2024-09-08 | Parameters: 217M - License: closed | Type: model - AI model by Google DeepMind - **DDGemb** (University of Bologna) — 2024-09-07 - License: closed | Type: model - AI model by University of Bologna - **ESM-DBP** (Hunan University) — 2024-09-07 | Parameters: 650M - License: closed | Type: model - AI model by Hunan University - **MPDF** (Chinese University of Hong Kong (CUHK),Lanzhou University,Zhejiang Lab,Zhejiang University (ZJU)) — 2024-09-07 - License: closed | Type: model - AI model by Chinese University of Hong Kong (CUHK),Lanzhou University,Zhejiang Lab,Zhejiang University (ZJU) - **Aristotle** (Harmonic) — 2024-09-07 - License: closed | Type: model - AI model by Harmonic - **DeepSeek-V2.5** (DeepSeek) — 2024-09-06 | Parameters: 236B - License: open | Type: model - AI model by DeepSeek - **GearBind** (BioGeometry,Mila - Quebec AI (originally Montreal Institute for Learning Algorithms),University of Montreal / Université de Montréal,Fudan University,HEC Montreal) — 2024-09-06 - License: open | Type: model - AI model by BioGeometry,Mila - Quebec AI (originally Montreal Institute for Learning Algorithms),University of Montreal / Université de Montréal,Fudan University,HEC Montreal - **ProtSSN** (Shanghai Jiao Tong University,East China University of Science and Technology,Shanghai AI Lab) — 2024-09-06 | Parameters: 1.5B - License: closed | Type: model - AI model by Shanghai Jiao Tong University,East China University of Science and Technology,Shanghai AI Lab - **SAAMBE-MEM** (Clemson University,Central China Normal 
University) — 2024-09-06 - License: closed | Type: model - AI model by Clemson University,Central China Normal University - **xLAM-8x22B** (Salesforce) — 2024-09-06 | Parameters: 141B - License: open | Type: model - AI model by Salesforce - **MiniCPM3.0** (Mianbi Intelligence) — 2024-09-06 | Parameters: 4B - License: closed | Type: model - AI model by Mianbi Intelligence - **AlphaProteo** (Google DeepMind) — 2024-09-05 - License: closed | Type: model - AI model by Google DeepMind - **µFormer** (Microsoft Research AI for Science) — 2024-09-05 | Parameters: 670M - License: closed | Type: model - AI model by Microsoft Research AI for Science - **EnOpt** (University of Pittsburgh) — 2024-09-05 - License: closed | Type: model - AI model by University of Pittsburgh - **Hunyuan Turbo** (Tencent) — 2024-09-05 - License: closed | Type: model - AI model by Tencent - **Xinchen Lingo** (West Lake Xinchen / Xinchen AI / 西湖心辰(杭州)科技有限公司) — 2024-09-05 - License: closed | Type: model - AI model by West Lake Xinchen / Xinchen AI / 西湖心辰(杭州)科技有限公司 - **Harrison.rad.1** (Harrison.ai) — 2024-09-05 - License: closed | Type: model - AI model by Harrison.ai - **CHIEF** (Harvard Medical School,Massachusetts Institute of Technology (MIT),Sichuan University,Sun Yat-sen University,Shenzhen Maternity & Child Healthcare Hospital,Chongqing University Cancer Hospital,Harvard University,University of Pennsylvania,Cedars-Sinai Medical Center,Broad Institute,Dana-Farber Cancer Institute,Brigham and Women's Hospital,Tencent,Massachusetts General Hospital,Pennsylvania State University,Jinan University,Stanford University) — 2024-09-04 - License: open | Type: model - AI model by Harvard Medical School,Massachusetts Institute of Technology (MIT),Sichuan University,Sun Yat-sen University,Shenzhen Maternity & Child Healthcare Hospital,Chongqing University Cancer Hospital,Harvard University,University of Pennsylvania,Cedars-Sinai Medical Center,Broad Institute,Dana-Farber Cancer Institute,Brigham and
Women's Hospital,Tencent,Massachusetts General Hospital,Pennsylvania State University,Jinan University,Stanford University - **Integrating Deep Learning and Synthetic Biology: A Co-Design Approach for Enhancing Gene Expression via N-Terminal Coding Sequences** (National University of Singapore,Jiangnan University) — 2024-09-04 - License: closed | Type: model - AI model by National University of Singapore,Jiangnan University - **MolMVC** (Central South University,Singapore Agency for Science) — 2024-09-04 - License: open | Type: model - AI model by Central South University,Singapore Agency for Science - **OLMoE** (Allen Institute for AI,Contextual AI,University of Washington,Princeton University) — 2024-09-03 | Parameters: 7B - License: open | Type: model - AI model by Allen Institute for AI,Contextual AI,University of Washington,Princeton University - **DrugCLIP** (Tsinghua University,Tsinghua-Peking Center for Life Sciences,Peking University,Beijing Academy of Artificial Intelligence / BAAI) — 2024-09-03 - License: closed | Type: model - AI model by Tsinghua University,Tsinghua-Peking Center for Life Sciences,Peking University,Beijing Academy of Artificial Intelligence / BAAI - **Alphaflow** (Massachusetts Institute of Technology (MIT)) — 2024-09-02 - License: open | Type: model - AI model by Massachusetts Institute of Technology (MIT) - **ChatMol** (Tsinghua University,PingAn Technology,Beijing University of Posts and Telecommunications) — 2024-09-02 - License: open | Type: model - AI model by Tsinghua University,PingAn Technology,Beijing University of Posts and Telecommunications - **ParetoDrug** (Chinese University of Hong Kong (CUHK),Zhejiang Lab,Zhejiang University (ZJU),Huawei Noah's Ark Lab) — 2024-09-02 - License: open | Type: model - AI model by Chinese University of Hong Kong (CUHK),Zhejiang Lab,Zhejiang University (ZJU),Huawei Noah's Ark Lab - **DTI-LM** (University of Central Florida) — 2024-09-02 - License: closed | Type: model - AI model by
University of Central Florida - **ESMFlow** (Massachusetts Institute of Technology (MIT)) — 2024-09-02 - License: open | Type: model - AI model by Massachusetts Institute of Technology (MIT) - **LFM-40B** (Liquid AI) — 2024-09-01 | Parameters: LFM-40B - License: open | Type: model - 40BA12B. Some controversy/concern over company. Liquid Foundation Models (LFM). "Human preference optimization techniques have not been applied extensively to our models yet." - **SFR-LLaMA-3.1-70B-Judge** (Salesforce) — 2024-09-01 | Parameters: SFR-LLaMA-3.1-70B-Judge - License: closed | Type: model - Code coming soon: https://github.com/SalesforceAIResearch/SFRJudge "we opt to focus on datasets that evaluate modern (2023 and beyond) LLM responses, as older datasets likely contain lower quality responses from less capable models, with correspondingly stale annotations. We supplement human-annotated data with synthetically generated data to endow our judge models with specific capabilities (e.g., following fine-grained rubrics in evaluation)" - **Emu3** (BAAI) — 2024-09-01 | Parameters: Emu3 - License: open | Type: model - VLM. Dataset estimates are based on the unrelated UW/Salesforce dataset MINT-1T (3.4B images, 927M documents) https://arxiv.org/abs/2406.11271v1 - **NVLM 1.0** (NVIDIA) — 2024-09-01 | Parameters: NVLM 1.0 - License: open | Type: model - Flamingo clone. "we use Qwen2-72B-Instruct as the default text-only LLM backbone. We also employ Nous-Hermes-2-Yi-34B for ablation study and faster experimentation...
we use InternViT-6B as the default vision encoder" - **Unnamed 1T** (China Telecom Artificial Intelligence Research Institute) — 2024-09-01 | Parameters: Unnamed 1T - License: closed | Type: model - Trained on Chinese GPUs: "Ascend Atlas 800T A2 training server – a Huawei product listed as supporting the Kunpeng 920 7265 or Kunpeng 920 5250 processors" https://www.theregister.com/2024/10/02/china_telecom_model_trained_local_tech/ - **TeleChat2-115B** (China Telecom Artificial Intelligence Research Institute) — 2024-09-01 | Parameters: TeleChat2-115B - License: open | Type: model - Trained on Chinese GPUs: "Ascend Atlas 800T A2 training server – a Huawei product listed as supporting the Kunpeng 920 7265 or Kunpeng 920 5250 processors" https://www.theregister.com/2024/10/02/china_telecom_model_trained_local_tech/ - **AMD-Llama-135m** (AMD) — 2024-09-01 | Parameters: AMD-Llama-135m - License: open | Type: model - Small language model (SLM). Trained on AMD Instinct™ MI250 accelerators. "Pretrain Dataset: We employed the SlimPajama and Project Gutenberg dataset to pretrain the 135M model. Project Gutenberg is a library of over 70,000 free eBooks approximately. This sums up to 670B tokens" - **Llama 3.2 90B** (Meta AI) — 2024-09-01 | Parameters: Llama 3.2 90B - License: open | Type: model - Vision (VLM) - **Llama 3.2 3B** (Meta AI) — 2024-09-01 | Parameters: Llama 3.2 3B - License: open | Type: model - Text (LLM). "Pre-training. [For Llama 3.2 3B] We prune the models from their 8B siblings and use logits from the 8B and 70B models as token-level targets (token-level distillation). We then use knowledge distillation to recover performance." - **Molmo** (Allen AI) — 2024-09-01 | Parameters: Molmo - License: open | Type: model - ViT: Llava as Qwen2 (or Olmo) + CLIP. Multimodal Open Language Model built by Ai2. 
Announce: https://molmo.allenai.org/blog - **Gemini-1.5-Pro-002** (Google DeepMind) — 2024-09-01 | Parameters: Gemini-1.5-Pro-002 - License: open | Type: model - Sparse MoE. Context window=2M. Note: Gemini outputs are watermarked. I do not use GDM models. https://lifearchitect.ai/watermarking/ - **Qwen2.5** (Alibaba) — 2024-09-01 | Parameters: Qwen2.5 - License: open | Type: model - - **GRIN MoE** (Microsoft) — 2024-09-01 | Parameters: GRIN MoE - License: open | Type: model - 16x3.8B "only 6.6B activate parameters". GRIN=GRadient-INformed. "GRIN MoE is pre-trained on 4T tokens as a Causal Language Model. The same training dataset has been used to train Phi-3 dense models" - **Data-Gemma** (Google DeepMind) — 2024-09-01 | Parameters: Data-Gemma - License: open | Type: model - RAG/RIG: "the LLM is fine-tuned to produce natural language Data Commons queries alongside statistics" - **o1-preview** (OpenAI) — 2024-09-01 | Parameters: o1-preview - License: closed | Type: model - - **Reader-LM** (Jina AI) — 2024-09-01 | Parameters: Reader-LM - License: open | Type: model - HTML->Markdown. Specialist small model; outperforms GPT-4o general model, does not outperform Gemini Pro 1.5. - **Pixtral-12b-240910** (Mistral) — 2024-09-01 | Parameters: Pixtral-12b-240910 - License: open | Type: model - "Pixtral was trained to be a drop-in replacement for Mistral Nemo 12B." - **DeepSeek-V2.5** (DeepSeek-AI) — 2024-09-01 | Parameters: DeepSeek-V2.5 - License: open | Type: model - "DeepSeek-V2.5 is an upgraded version that combines DeepSeek-V2-Chat and DeepSeek-Coder-V2-Instruct." - **Yi-Coder** (01-ai) — 2024-09-01 | Parameters: Yi-Coder - License: open | Type: model - 6B=3T tokens, 9B=+0.8T tokens, 9B-Coder=+2.4T tokens=6.2T tokens. See Yi 1.5 34B in this table - **OLMoE-1B-7B** (Allen AI) — 2024-09-01 | Parameters: OLMoE-1B-7B - License: open | Type: model - Open Language (OL) Mixture of Experts (MoE).
"We train OLMoE-1B-7B for 5 trillion tokens, however, some recent dense models train significantly longer, such as Llama 3 with 15 trillion tokens. To the best of our knowledge, there has been no large MoE that has been overtrained as much as OLMoE-1B-7B. Specifically, taking the active parameters of OLMoE-1B-7B, our token multiplier is around 5,000 (5T / 1B). There are likely benefits to training even longer, but to what degree overtraining is effective for MoEs and how it differs from dense models still requires more research." - **Moonshot-v1** (Moonshot) — 2024-09-01 - License: closed | Type: model - AI model by Moonshot - **MiniMax Video-01** (MiniMax) — 2024-08-31 - License: closed | Type: model - AI model by MiniMax - **HelixFold3** (Tecorigin LTD,Tsinghua University,Baidu) — 2024-08-30 - License: closed | Type: model - AI model by Tecorigin LTD,Tsinghua University,Baidu - **RENNAISSANCE** (Ecole Polytechnique Fédérale de Lausanne (EPFL),University of Cambridge,Harvard Medical School) — 2024-08-30 - License: open | Type: model - AI model by Ecole Polytechnique Fédérale de Lausanne (EPFL),University of Cambridge,Harvard Medical School - **NV-Embed-v2** (NVIDIA) — 2024-08-30 | Parameters: 7B - License: open | Type: model - AI model by NVIDIA - **Shuka-1** (Sarvam) — 2024-08-30 | Parameters: 9.6B - License: open | Type: model - AI model by Sarvam - **OPUS-Design** (Fudan University,Shanghai AI Lab,Harcam Biomedicines) — 2024-08-29 - License: closed | Type: model - AI model by Fudan University,Shanghai AI Lab,Harcam Biomedicines - **DrugFormer** (University of Florida,University of Texas Health Science Center,H. Lee Moffitt Cancer Center and Research Institute) — 2024-08-29 | Parameters: 21.7M - License: closed | Type: model - AI model by University of Florida,University of Texas Health Science Center,H.
Lee Moffitt Cancer Center and Research Institute - **GLM-4-Plus** (Z.ai (Zhipu AI)) — 2024-08-29 - License: closed | Type: model - AI model by Z.ai (Zhipu AI) - **Hairuo** (Inspur) — 2024-08-29 - License: closed | Type: model - AI model by Inspur - **Bielik-11B-v2** (SpeakLeash,Cyfronet AGH) — 2024-08-28 | Parameters: 11B - License: open | Type: model - AI model by SpeakLeash,Cyfronet AGH - **GLA Transformer 1.3B** (MIT-IBM Watson AI Lab,Massachusetts Institute of Technology (MIT)) — 2024-08-27 | Parameters: 1.3B - License: open | Type: model - AI model by MIT-IBM Watson AI Lab,Massachusetts Institute of Technology (MIT) - **GLA Transformer 340M** (MIT-IBM Watson AI Lab,Massachusetts Institute of Technology (MIT)) — 2024-08-27 | Parameters: 340M - License: open | Type: model - AI model by MIT-IBM Watson AI Lab,Massachusetts Institute of Technology (MIT) - **Pharia-1-LLM-7B** (Aleph Alpha) — 2024-08-26 | Parameters: 7.0B - License: open | Type: model - AI model by Aleph Alpha - **DISTRO** (Nous Research) — 2024-08-26 | Parameters: 1.2B - License: closed | Type: model - AI model by Nous Research - **Xinyu** (Zhejiang University (ZJU),Institute for Advanced Algorithms Research,Northeastern University (China),China Telecom,State Key Laboratory of Media Convergence Production Technology and Systems) — 2024-08-23 | Parameters: 13B - License: closed | Type: model - AI model by Zhejiang University (ZJU),Institute for Advanced Algorithms Research,Northeastern University (China),China Telecom,State Key Laboratory of Media Convergence Production Technology and Systems - **Jamba 1.5-Large** (AI21 Labs) — 2024-08-22 | Parameters: 398B - License: open | Type: model - AI model by AI21 Labs - **EvolMPNN** (Aarhus University) — 2024-08-22 - License: closed | Type: model - AI model by Aarhus University - **GameNGen** (Google Research,Google DeepMind,Tel Aviv University) — 2024-08-22 - License: closed | Type: model - AI model by Google Research,Google DeepMind,Tel Aviv University -
**GITIII** (Yale School of Public Health) — 2024-08-22 - License: closed | Type: model - AI model by Yale School of Public Health - **Jamba 1.5 Mini** (AI21 Labs) — 2024-08-22 | Parameters: 52B - License: open | Type: model - AI model by AI21 Labs - **CoPRA** (Tsinghua University,University College London (UCL),Monash University,Beijing University of Posts and Telecommunications) — 2024-08-21 - License: closed | Type: model - AI model by Tsinghua University,University College London (UCL),Monash University,Beijing University of Posts and Telecommunications - **Ideogram v2** (Ideogram) — 2024-08-21 - License: closed | Type: model - AI model by Ideogram - **Kosmos-2.5** (Microsoft) — 2024-08-21 | Parameters: 1.3B - License: open | Type: model - AI model by Microsoft - **AntiFormer** (University of Florida,Sichuan University,Shihezi University,University of Macau,University of Texas Health Science Center) — 2024-08-20 | Parameters: 24.7M - License: open | Type: model - AI model by University of Florida,Sichuan University,Shihezi University,University of Macau,University of Texas Health Science Center - **Rethinking Molecular Design: Integrating Latent Variable and Auto-Regressive Models for Goal Directed Generation** (ETH Zurich,University of Zurich,ETH AI Center) — 2024-08-19 - License: open | Type: model - AI model by ETH Zurich,University of Zurich,ETH AI Center - **LongVILA-7B** (NVIDIA,Massachusetts Institute of Technology (MIT),University of California (UC) Berkeley,UT Austin) — 2024-08-19 | Parameters: 7B - License: open | Type: model - AI model by NVIDIA,Massachusetts Institute of Technology (MIT),University of California (UC) Berkeley,UT Austin - **Hybrid-Phi-Mamba-1.5B** (Carnegie Mellon University (CMU),Mohamed bin Zayed University of Artificial Intelligence (MBZUAI),Cartesia) — 2024-08-19 | Parameters: 1.5B - License: closed | Type: model - AI model by Carnegie Mellon University (CMU),Mohamed bin Zayed University of Artificial Intelligence 
(MBZUAI),Cartesia - **TransFew** (University of Missouri) — 2024-08-17 - License: closed | Type: model - AI model by University of Missouri - **CLR_ESP** (Kansas State University) — 2024-08-16 - License: open | Type: model - AI model by Kansas State University - **xGen-MM (BLIP-3)** (Salesforce Research,University of Washington) — 2024-08-16 | Parameters: 4B - License: closed | Type: model - AI model by Salesforce Research,University of Washington - **Transformers in music recommendation** (Google Research) — 2024-08-16 - License: closed | Type: model - AI model by Google Research - **DeepSeek-Prover-V1.5** (DeepSeek) — 2024-08-15 | Parameters: 7B - License: open | Type: model - AI model by DeepSeek - **Gen-3 Alpha Turbo** (Runway) — 2024-08-15 - License: closed | Type: model - AI model by Runway - **Hermes 3 405B** (Nous Research) — 2024-08-14 | Parameters: 405B - License: open | Type: model - AI model by Nous Research - **Hermes 3 70B** (Nous Research) — 2024-08-14 | Parameters: 70B - License: open | Type: model - AI model by Nous Research - **Hermes 3 8B** (Nous Research) — 2024-08-14 | Parameters: 8B - License: open | Type: model - AI model by Nous Research - **Zhiye LLM (浪潮知业大模型)** (Inspur) — 2024-08-14 - License: closed | Type: model - AI model by Inspur - **Grok-2** (xAI) — 2024-08-13 - License: closed | Type: model - AI model by xAI - **Grok-2 mini** (xAI) — 2024-08-13 - License: closed | Type: model - AI model by xAI - **Agent Q** (Stanford University) — 2024-08-13 | Parameters: 70B - License: closed | Type: model - AI model by Stanford University - **Saaras v1** (Sarvam) — 2024-08-13 | Parameters: 1.5B - License: closed | Type: model - AI model by Sarvam - **Cosine Genie** (Cosine) — 2024-08-12 - License: closed | Type: model - AI model by Cosine - **Falcon Mamba** (Technology Innovation Institute) — 2024-08-12 | Parameters: 7B - License: open | Type: model - AI model by Technology Innovation Institute - **P-LLama3** (University of Siena) — 2024-08-12 | 
Parameters: 8B - License: open | Type: model - AI model by University of Siena - **P-Mistral** (University of Siena) — 2024-08-12 | Parameters: 7B - License: open | Type: model - AI model by University of Siena - **P-Llama2** (University of Siena) — 2024-08-12 | Parameters: 7B - License: open | Type: model - AI model by University of Siena - **P-gemma** (University of Siena) — 2024-08-12 | Parameters: 7B - License: open | Type: model - AI model by University of Siena - **PepMLM** (Duke University,Cornell University,McMaster University) — 2024-08-11 | Parameters: 3B - License: open | Type: model - AI model by Duke University,Cornell University,McMaster University - **Qwen2-Math-1.5B** (Alibaba) — 2024-08-09 | Parameters: 1.5B - License: open | Type: model - AI model by Alibaba - **Qwen2-Math-72B** (Alibaba) — 2024-08-09 | Parameters: 72B - License: open | Type: model - AI model by Alibaba - **Qwen2-Math-7B** (Alibaba) — 2024-08-09 | Parameters: 7B - License: open | Type: model - AI model by Alibaba - **Tencent Search LLM (腾讯搜索大模型)** (Tencent) — 2024-08-09 - License: closed | Type: model - AI model by Tencent - **MooER** (Moore Threads) — 2024-08-09 | Parameters: 7.2B - License: closed | Type: model - AI model by Moore Threads - **3-ensemble of Self-ensembles on CIFAR-100** (Google DeepMind) — 2024-08-08 | Parameters: 60.2M - License: closed | Type: model - AI model by Google DeepMind - **EXAONE 3.0** (LG AI Research) — 2024-08-07 | Parameters: 7.8B - License: open | Type: model - AI model by LG AI Research - **Table Tennis Agent** (Google DeepMind) — 2024-08-07 | Parameters: 185K - License: closed | Type: model - AI model by Google DeepMind - **MiniCPM-V 2.6** (OpenBMB (Open Lab for Big Model Base)) — 2024-08-06 | Parameters: 8B - License: open | Type: model - AI model by OpenBMB (Open Lab for Big Model Base) - **LLaVA-OV-7B** (ByteDance,Nanyang Technological University,Chinese University of Hong Kong (CUHK),Hong Kong University of Science and Technology (HKUST)) — 
2024-08-06 | Parameters: 7.6B - License: open | Type: model - AI model by ByteDance,Nanyang Technological University,Chinese University of Hong Kong (CUHK),Hong Kong University of Science and Technology (HKUST) - **LLaVA-OV-72B** (ByteDance,Nanyang Technological University,Chinese University of Hong Kong (CUHK),Hong Kong University of Science and Technology (HKUST)) — 2024-08-06 | Parameters: 72B - License: open | Type: model - AI model by ByteDance,Nanyang Technological University,Chinese University of Hong Kong (CUHK),Hong Kong University of Science and Technology (HKUST) - **GPT-4o (Aug 2024)** (OpenAI) — 2024-08-06 - License: closed | Type: model - AI model by OpenAI - **Jais-70B** (G42,Inception G42) — 2024-08-05 | Parameters: 70B - License: open | Type: model - AI model by G42,Inception G42 - **CXR Foundation** (Google) — 2024-08-02 - License: open | Type: model - AI model by Google - **PLLuM** (Consortium) — 2024-08-01 | Parameters: PLLuM - License: open | Type: model - Polish Large Language Model. Not yet available as of Sep/2024 - **xLAM** (Salesforce) — 2024-08-01 | Parameters: xLAM - License: open | Type: model - 64K sequence length. Released under Apache-2.0. - **LTM-2-mini** (Magic) — 2024-08-01 | Parameters: LTM-2-mini - License: closed | Type: model - Context=100M tokens equals ~10 million lines of code or ~750 novels. - **Rene** (Cartesia) — 2024-08-01 | Parameters: Rene - License: open | Type: model - On-device. "hybrid architecture based on Mamba-2, with feedforward and sliding window attention layers interspersed" - **Gemini 1.5 Flash-8B** (Google DeepMind) — 2024-08-01 | Parameters: Gemini 1.5 Flash-8B - License: open | Type: model - Announce: https://x.com/OfficialLoganK/status/1828480085353234535 1M context for all modalities. Note: Gemini outputs are watermarked. I do not use GDM models. 
https://lifearchitect.ai/watermarking/ - **Pharia-1-LLM-7B** (Aleph Alpha) — 2024-08-01 | Parameters: Pharia-1-LLM-7B - License: open | Type: model - - **TTT-Linear** (Stanford) — 2024-08-01 | Parameters: TTT-Linear - License: open | Type: model - Test-Time Training (TTT) layers. Real-time learning by Stanford, UC, and Meta. Potential for frontier models in 2025+. - **Jamba 1.5** (AI21) — 2024-08-01 | Parameters: Jamba 1.5 - License: open | Type: model - Jamba 1.5 Mini (12B active/52B total) and Jamba 1.5 Large (94B active/398B total) are also optimized for business use cases and capabilities such as function calling, structured output (JSON), and grounded generation. - **phi-3.5-MoE** (Microsoft) — 2024-08-01 | Parameters: phi-3.5-MoE - License: open | Type: model - - **phi-3.5-mini** (Microsoft) — 2024-08-01 | Parameters: phi-3.5-mini - License: open | Type: model - - **Minitron-4B** (NVIDIA) — 2024-08-01 | Parameters: Minitron-4B - License: open | Type: model - Pruned and distilled from Nemotron-4 15B: https://developer.nvidia.com/blog/how-to-prune-and-distill-llama-3-1-8b-to-an-nvidia-llama-3-1-minitron-4b-model/ - **sarvam-2b** (Sarvam AI) — 2024-08-01 | Parameters: sarvam-2b - License: open | Type: model - Indic languages supported are: Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Oriya, Punjabi, Tamil, and Telugu. - **Grok-2** (xAI) — 2024-08-01 | Parameters: Grok-2 - License: open | Type: model - MMLU-Pro=75.5=SOTA. Claude 3.5S MMLU-Pro=72.83. "Grok-2 has been tested on the LMSYS leaderboard under the name "sus-column-r." At the time of this blog post, it is outperforming both Claude 3.5 Sonnet and GPT-4-Turbo." 
[Alan: Grok is Heinlein, Sixth Column is also Heinlein: https://en.wikipedia.org/wiki/Sixth_Column ] - **EXAONE 3.0** (LG) — 2024-08-01 | Parameters: EXAONE 3.0 - License: open | Type: model - “EXAONE”=“EXpert AI for EveryONE” - **Falcon Mamba 7B** (TII) — 2024-08-01 | Parameters: Falcon Mamba 7B - License: open | Type: model - https://huggingface.co/spaces/tiiuae/falcon-mamba-playground - **Flux.1 [pro]** (Black Forest Labs) — 2024-08-01 | Parameters: 12B - License: closed | Type: model - AI model by Black Forest Labs - **Flux.1 [dev]** (Black Forest Labs) — 2024-08-01 | Parameters: 12B - License: open | Type: model - AI model by Black Forest Labs - **OmniParser: Interactable Region Detection Model** (Microsoft Research) — 2024-08-01 - License: open | Type: model - AI model by Microsoft Research - **OmniParser: Icon Description Model** (Microsoft Research) — 2024-08-01 - License: open | Type: model - AI model by Microsoft Research - **BaiLing TTS** (Ant Group) — 2024-08-01 - License: closed | Type: model - AI model by Ant Group - **EPInformer** (The University of Hong Kong,Harvard Medical School) — 2024-08-01 | Parameters: 447.1K - License: closed | Type: model - AI model by The University of Hong Kong,Harvard Medical School - **InternLM2.5** (Shanghai AI Lab) — 2024-08-01 | Parameters: 20B - License: open | Type: model - AI model by Shanghai AI Lab - **Midjourney V6.1** (Midjourney) — 2024-07-30 - License: closed | Type: model - AI model by Midjourney - **Llama SEA-LION V2 8B** (AI Singapore) — 2024-07-30 | Parameters: 8B - License: open | Type: model - AI model by AI Singapore - **AFM-on-device** (Apple) — 2024-07-29 | Parameters: 2.7B - License: closed | Type: model - AI model by Apple - **AFM-server** (Apple) — 2024-07-29 - License: closed | Type: model - AI model by Apple - **RNACG** (Tsinghua University) — 2024-07-29 | Parameters: 4.5M - License: closed | Type: model - AI model by Tsinghua University - **Segment Anything Model 2** (Meta AI) — 2024-07-29 | 
Parameters: 224.4M - License: open | Type: model - AI model by Meta AI - **Florence-2-B (base)** (Microsoft) — 2024-07-29 | Parameters: 232M - License: closed | Type: model - AI model by Microsoft - **Florence-2-L (large)** (Microsoft) — 2024-07-29 | Parameters: 771M - License: closed | Type: model - AI model by Microsoft - **Yi Vision** (01.AI) — 2024-07-29 - License: closed | Type: model - AI model by 01.AI - **ProRNA3D-Single** (Virginia Tech (Virginia Polytechnic Institute and State University)) — 2024-07-28 - License: open | Type: model - AI model by Virginia Tech (Virginia Polytechnic Institute and State University) - **SaulLM-large** (Equall.ai) — 2024-07-28 | Parameters: 141B - License: open | Type: model - AI model by Equall.ai - **NIO World Model (蔚来大模型)** (NIO) — 2024-07-27 - License: closed | Type: model - AI model by NIO - **AlphaProof** (Google DeepMind) — 2024-07-25 | Parameters: 3B - License: closed | Type: model - AI model by Google DeepMind - **AlphaGeometry 2** (Google DeepMind) — 2024-07-25 - License: closed | Type: model - AI model by Google DeepMind - **X-Portrait** (ByteDance) — 2024-07-25 - License: open | Type: model - AI model by ByteDance - **SearchGPT** (OpenAI) — 2024-07-25 - License: closed | Type: model - AI model by OpenAI - **Mistral Large 2** (Mistral AI) — 2024-07-24 | Parameters: 123B - License: open | Type: model - AI model by Mistral AI - **Stable Video 4D (SV4D)** (Stability AI,Northeastern University) — 2024-07-24 - License: open | Type: model - AI model by Stability AI,Northeastern University - **PrePR-CT** (King Abdullah University of Science and Technology (KAUST),Karolinska Institute) — 2024-07-24 - License: closed | Type: model - AI model by King Abdullah University of Science and Technology (KAUST),Karolinska Institute - **Palmyra Fin** (Writer) — 2024-07-24 | Parameters: 70B - License: closed | Type: model - AI model by Writer - **Palmyra Med** (Writer) — 2024-07-24 | Parameters: 70B - License: open | Type: model - AI 
model by Writer - **Llama 3.1-405B** (Meta AI) — 2024-07-23 | Parameters: 405B - License: open | Type: model - AI model by Meta AI - **Llama 3.1-70B** (Meta AI) — 2024-07-23 | Parameters: 70B - License: open | Type: model - AI model by Meta AI - **Llama 3.1-8B** (Meta AI) — 2024-07-23 | Parameters: 8B - License: open | Type: model - AI model by Meta AI - **OutfitAnyone** (Alibaba,University of Science and Technology of China (USTC)) — 2024-07-23 - License: closed | Type: model - AI model by Alibaba,University of Science and Technology of China (USTC) - **CarrotAI (CarrotAI大模型)** (Jiangsu Huizhi Intelligent Digital Technology Co., Ltd.) — 2024-07-23 | Parameters: 7B - License: open | Type: model - AI model by Jiangsu Huizhi Intelligent Digital Technology Co., Ltd. - **PepPrCLIP** (Duke University,Cornell University,Sanford Burnham Prebys Institute) — 2024-07-22 - License: open | Type: model - AI model by Duke University,Cornell University,Sanford Burnham Prebys Institute - **OV-DINO** (Guangzhou AI Public Computing Center,Sun Yat-sen University,Meituan Inc,Pengcheng Lab) — 2024-07-22 - License: closed | Type: model - AI model by Guangzhou AI Public Computing Center,Sun Yat-sen University,Meituan Inc,Pengcheng Lab - **DCLM 7B** (Apple) — 2024-07-20 | Parameters: 7B - License: open | Type: model - AI model by Apple - **Zhiwei (知微大模型)** (Beijing Weimeng Chuangke Network Technology) — 2024-07-20 - License: closed | Type: model - AI model by Beijing Weimeng Chuangke Network Technology - **Athene-70B** (Nexusflow) — 2024-07-19 | Parameters: 70B - License: open | Type: model - AI model by Nexusflow - **Grounding Dino L** (Tsinghua University,International Digital Economy Academy,Hong Kong University of Science and Technology (HKUST),Chinese University of Hong Kong (CUHK),Microsoft Research,South China University of Technology) — 2024-07-19 | Parameters: 341M - License: open | Type: model - AI model by Tsinghua University,International Digital Economy Academy,Hong Kong 
University of Science and Technology (HKUST),Chinese University of Hong Kong (CUHK),Microsoft Research,South China University of Technology - **Eleven Turbo v2.5** (ElevenLabs) — 2024-07-19 - License: closed | Type: model - AI model by ElevenLabs - **Mistral NeMo** (Mistral AI) — 2024-07-18 | Parameters: 12B - License: open | Type: model - AI model by Mistral AI - **GPT-4o mini** (OpenAI) — 2024-07-18 - License: closed | Type: model - AI model by OpenAI - **ChatLing (灵犀大模型)** (Beijing 58 Information Technology) — 2024-07-18 - License: closed | Type: model - AI model by Beijing 58 Information Technology - **Zhizhe Qianwen (智者千问大语言模型)** (Beijing Zhijing Yunchuang Technology Co., Ltd.) — 2024-07-18 - License: closed | Type: model - AI model by Beijing Zhijing Yunchuang Technology Co., Ltd. - **Beiruan (北软大模型)** (Beijing Beida Software Engineering Co., Ltd.) — 2024-07-18 - License: closed | Type: model - AI model by Beijing Beida Software Engineering Co., Ltd. - **YiYuan (壹元大模型)** (SoundAI) — 2024-07-18 - License: closed | Type: model - AI model by SoundAI - **Xingchen Multimodal Model (中电信人工智能科技(北京)有限公司)** (China Telecom) — 2024-07-18 - License: closed | Type: model - AI model by China Telecom - **Cygnet** (Gray Swan) — 2024-07-16 | Parameters: 8B - License: closed | Type: model - AI model by Gray Swan - **Codestral Mamba** (Mistral AI) — 2024-07-16 | Parameters: 7.3B - License: open | Type: model - AI model by Mistral AI - **DeepL LLM** (DeepL) — 2024-07-16 - License: closed | Type: model - AI model by DeepL - **Mathstral** (Mistral AI) — 2024-07-16 | Parameters: 7B - License: open | Type: model - AI model by Mistral AI - **LLaVA-NeXT-32B-Qwen** (LMMs-Lab) — 2024-07-16 | Parameters: 32B - License: open | Type: model - AI model by LMMs-Lab - **Nebula (星云大模型)** (ZTE) — 2024-07-16 | Parameters: 100B - License: closed | Type: model - AI model by ZTE - **ScribblePrompt-SAM** (Massachusetts Institute of Technology (MIT)) — 2024-07-16 - License: open | Type: model - AI 
model by Massachusetts Institute of Technology (MIT) - **Qwen2-Audio** (Alibaba) — 2024-07-15 | Parameters: 8.2B - License: open | Type: model - AI model by Alibaba - **OmniGenome** (University of Exeter) — 2024-07-15 | Parameters: 186M - License: closed | Type: model - AI model by University of Exeter - **PARM** (Oncode Institute,UMC Utrecht,Netherlands Cancer Institute,University of Groningen,Radboud University Medical Center) — 2024-07-15 - License: closed | Type: model - AI model by Oncode Institute,UMC Utrecht,Netherlands Cancer Institute,University of Groningen,Radboud University Medical Center - **InternVL2-Llama3-76B** (Shanghai AI Lab) — 2024-07-15 | Parameters: 76B - License: open | Type: model - AI model by Shanghai AI Lab - **MP4** (310.ai) — 2024-07-15 - License: closed | Type: model - AI model by 310.ai - **HelixProtX** (Baidu) — 2024-07-12 - License: closed | Type: model - AI model by Baidu - **Deep learning linking mechanistic models to single-cell transcriptomics data reveals transcriptional bursting in response to DNA damage** (Sun Yat-sen University,University of California Irvine,Guangdong Provincial People's Hospital,Guangdong Academy of Medical Sciences) — 2024-07-12 | Parameters: 2.2K - License: closed | Type: model - AI model by Sun Yat-sen University,University of California Irvine,Guangdong Provincial People's Hospital,Guangdong Academy of Medical Sciences - **PaliGemma** (Google DeepMind) — 2024-07-10 | Parameters: 3B - License: open | Type: model - AI model by Google DeepMind - **OpenDiLoCo 150M** (Prime Intellect) — 2024-07-10 | Parameters: 150M - License: closed | Type: model - AI model by Prime Intellect - **OpenDiLoCo 1.1B** (Prime Intellect) — 2024-07-10 | Parameters: 1.1B - License: closed | Type: model - AI model by Prime Intellect - **MAP-Neo** (University of Waterloo,01.AI,Wuhan University) — 2024-07-10 | Parameters: 7B - License: open | Type: model - AI model by University of Waterloo,01.AI,Wuhan University - **SenseChat 5.5** 
(SenseTime) — 2024-07-06 | Parameters: 600B - License: closed | Type: model - AI model by SenseTime - **Precious3GPT** (Insilico Medicine AI,Harvard Medical School) — 2024-07-05 | Parameters: 89M - License: closed | Type: model - AI model by Insilico Medicine AI,Harvard Medical School - **ColPali** (Illuin Technology,Equall.ai,University Paris-Saclay,ETH Zurich) — 2024-07-05 | Parameters: 3B - License: open | Type: model - AI model by Illuin Technology,Equall.ai,University Paris-Saclay,ETH Zurich - **InternVL2 26B** (Shanghai AI Lab) — 2024-07-04 | Parameters: 25.5B - License: open | Type: model - AI model by Shanghai AI Lab - **InternVL2-40B** (Shanghai AI Lab) — 2024-07-04 | Parameters: 40.1B - License: open | Type: model - AI model by Shanghai AI Lab - **DualNetGO** (Hong Kong University of Science and Technology (HKUST)) — 2024-07-03 | Parameters: 82M - License: closed | Type: model - AI model by Hong Kong University of Science and Technology (HKUST) - **Smaug-34B** (Abacus AI) — 2024-07-03 | Parameters: 34B - License: open | Type: model - AI model by Abacus AI - **Smaug-72B** (Abacus AI) — 2024-07-03 | Parameters: 72B - License: open | Type: model - AI model by Abacus AI - **Palmyra-Med-70B** (Writer) — 2024-07-01 | Parameters: Palmyra-Med-70B - License: open | Type: model - Medical. MMLU Medical Genetics=94.0 - **Palmyra-Fin-70B** (Writer) — 2024-07-01 | Parameters: Palmyra-Fin-70B - License: open | Type: model - Financial. "across a variety of real-world financial use cases. 
It outperformed popular models like Claude 3.5 Sonnet, GPT-4o, and Mixtral-8x7b" - **Zamba2-small** (Zyphra) — 2024-07-01 | Parameters: Zamba2-small - License: open | Type: model - Mamba2 - **Minitron-8B** (NVIDIA) — 2024-07-01 | Parameters: Minitron-8B - License: open | Type: model - Pruned and distilled from Nemotron-4 15B: https://developer.nvidia.com/blog/how-to-prune-and-distill-llama-3-1-8b-to-an-nvidia-llama-3-1-minitron-4b-model/ - **Mistral Large 2** (Mistral) — 2024-07-01 | Parameters: Mistral Large 2 - License: open | Type: model - Fits on a single node for inference. - **Llama 3.1 405B** (Meta AI) — 2024-07-01 | Parameters: Llama 3.1 405B - License: open | Type: model - Announce: https://ai.meta.com/blog/meta-llama-3-1/ Model card: https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/MODEL_CARD.md - **GPT-4o mini** (OpenAI) — 2024-07-01 | Parameters: GPT-4o mini - License: open | Type: model - Omnimodel. "OpenAI would not disclose exactly how large GPT-4o mini is, but said it’s roughly in the same tier as other small AI models, such as Llama 3 8b, Claude Haiku and Gemini 1.5 Flash." https://techcrunch.com/2024/07/18/openai-unveils-gpt-4o-mini-a-small-ai-model-powering-chatgpt/ "tested GPT-4o to identify potential risks, which we have addressed and plan to share the details of in the forthcoming GPT-4o system card and Preparedness scorecard." And related paper about instruction hierarchy: https://arxiv.org/abs/2404.13208 - **NeMo** (Mistral) — 2024-07-01 | Parameters: NeMo - License: open | Type: model - With NVIDIA. "Drop-in replacement of Mistral 7B". 
"trained using Megatron-LM, part of NVIDIA NeMo, with 3,072 H100 80GB Tensor Core GPUs" https://blogs.nvidia.com/blog/mistral-nvidia-ai-model/ - **Codestral Mamba** (Mistral) — 2024-07-01 | Parameters: Codestral Mamba - License: open | Type: model - "Unlike Transformer models, Mamba models offer the advantage of linear time inference and the theoretical ability to model sequences of infinite length." - **Mathstral** (Mistral) — 2024-07-01 | Parameters: Mathstral - License: open | Type: model - "We’re contributing Mathstral to the science community to bolster efforts in advanced mathematical problems requiring complex, multi-step logical reasoning." - **SpreadsheetLLM** (Microsoft) — 2024-07-01 | Parameters: SpreadsheetLLM - License: closed | Type: model - Notable finetune of GPT4-0125-preview "outperforming the vanilla approach by 25.6% in GPT4’s in-context learning setting" - **Spectra** (Consortium) — 2024-07-01 | Parameters: Spectra - License: open | Type: model - AKA TriLM. "Spectra LLM suite, the first open suite of LLMs spanning multiple bit-widths, including FloatLMs, QuantLMs, and TriLMs, ranging from 99M to 3.9B parameters trained on 300B tokens." - **next-gen** (DeepL) — 2024-07-01 | Parameters: next-gen - License: open | Type: model - "Built using our own groundbreaking, specialized LLM technology and proprietary training data, designed specifically for translation" - **SmolLM** (Hugging Face) — 2024-07-01 | Parameters: SmolLM - License: open | Type: model - Dataset includes new Cosmopedia v2 synthetic data. 135M and 360M models,each trained on 600B tokens from Smollm-Corpus. 1.7B model trained on 1T tokens from Smollm-Corpus. - **Mockingbird** (Vectara) — 2024-07-01 | Parameters: Mockingbird - License: open | Type: model - "At <10B parameters it's an LLM trained to provide optimal results for RAG and structured outputs." - **FLAMe** (Google DeepMind) — 2024-07-01 | Parameters: FLAMe - License: closed | Type: model - LLM-as-a-Judge autorater. 
Foundational Large Autorater Models (FLAMe). Uses an instruction-tuned PaLM-2-24B model. Unrelated to Microsoft FLAME Jan/2023. - **Step-2** (StepFun) — 2024-07-01 | Parameters: Step-2 - License: open | Type: model - Launched early Jul/2024: https://pandaily.com/stepfun-releases-three-large-models-of-the-step-series/ "StepFun, founded in April 2023 with the mission to “Scale-up possibilities for everyone,” unites top talent in artificial intelligence from both domestic and international backgrounds, and is dedicated to advancing toward AGI. The company has already launched the Step series of foundation models, which includes Step-2, a cutting-edge trillion-parameter Mixture of Experts (MoE) language model; Step-1.5V, a powerful multimodal large model; and Step-1V, an innovative image generation model, among others." - **H2O-Danube3-4B** (H2O.ai) — 2024-07-01 | Parameters: H2O-Danube3-4B - License: open | Type: model - Runs natively and fully offline on mobile phone. "H2O-Danube3 is a family of decoder only LLM models that use the general Llama model architecture adopting core principles from Llama 2 and Mistral with custom parameters determining the shape of each layer and total parameter count. We use the Mistral tokenizer..." MMLU for chat=54.74, base=55.18 via https://huggingface.co/h2oai/h2o-danube3-4b-base - **Causal Axioms** (Microsoft) — 2024-07-01 | Parameters: Causal Axioms - License: closed | Type: model - "the training dataset follows a specific structure, we develop a custom tokenizer. Alphanumeric node names are tokenized at a character level, while special terms such as ‘causes’, ‘Does’, ‘cause’, ‘Yes’, and ‘No’ are tokenized at the word level... Our training setup consists of around 175k instances of sequential chains with size of chains ranging from 3 to 6 nodes... All models are trained for 100 epochs. [LifeArchitect.ai estimate is 12 tokens per node x 6 nodes x 175,000 instances x 100 epochs = 1.26B tokens]" Based on GPT-2 arch. 
- **SenseNova 5.5** (SenseTime) — 2024-07-01 | Parameters: SenseNova 5.5 - License: open | Type: model - "The model training was based on over 10TB tokens [sic, taken as 10T tokens instead of 10TB=2T tokens] of high-quality training data, including a large amount of synthetically-generated reasoning chain data, which help to enhance its reasoning capabilities." & "The updates include SenseNova 5o, the first real-time multimodal model in China, which provides a new AI interaction model on par with GPT-4o’s streaming interaction capabilities" - **Helium 7B** (Kyutai) — 2024-07-01 | Parameters: Helium 7B - License: open | Type: model - "1. The model is fine-tuned on 100K transcripts generated by Helium itself. 2. These transcripts are highly detailed, heavily annotated with emotion and style, and conversational. 3. Text to Speech Engine is further fine-tuned on 20 hours of audio recorded by Alice and licensed." - **InternLM2.5** (Shanghai AI Laboratory/SenseTime) — 2024-07-01 | Parameters: InternLM2.5 - License: open | Type: model - "The release of InternLM2.5 series contains 7B model size for now and we are going to release the 1.8B and 20B versions soon" [20B released around 1/Aug/2024] - **Tele-FLM-1T** (BAAI) — 2024-07-01 | Parameters: Tele-FLM-1T - License: open | Type: model - Technical arch testing only, ratio is too low for decent performance. - **YuLan-Base-12B** (Renmin) — 2024-07-01 | Parameters: YuLan-Base-12B - License: open | Type: model - "YuLan's training is finished on Jan, 2024 and has achieved performance on par with state-of-the-art LLMs across various English and Chinese benchmarks." 
- **Llama-SEA-LION-v2-8B-IT** (AI Singapore) — 2024-07-01 | Parameters: 8B - License: open | Type: model - AI model by AI Singapore - **RiboDiffusion** (Beihang University,Nanjing University,Chinese University of Hong Kong (CUHK)) — 2024-06-28 - License: open | Type: model - AI model by Beihang University,Nanjing University,Chinese University of Hong Kong (CUHK) - **ChatBit** (Beijing Institute of Technology,Academy of Military Science,Minzu University of China) — 2024-06-28 | Parameters: 13B - License: closed | Type: model - AI model by Beijing Institute of Technology,Academy of Military Science,Minzu University of China - **Ernie 4.0 Turbo** (Baidu) — 2024-06-28 - License: closed | Type: model - AI model by Baidu - **CriticGPT** (OpenAI) — 2024-06-27 - License: closed | Type: model - AI model by OpenAI - **Index-1.9B** (Shanghai Kuanyu Digital Technology Co., Ltd. (Bilibili)) — 2024-06-27 | Parameters: 1.9B - License: closed | Type: model - AI model by Shanghai Kuanyu Digital Technology Co., Ltd. 
(Bilibili) - **Molecular Diffusion Models with Virtual Receptors** (Verily Research) — 2024-06-26 - License: closed | Type: model - AI model by Verily Research - **ESM3 (98B)** (EvolutionaryScale,University of California (UC) Berkeley) — 2024-06-25 | Parameters: 98.5B - License: closed | Type: model - AI model by EvolutionaryScale,University of California (UC) Berkeley - **ESM3-open-small** (EvolutionaryScale,University of California (UC) Berkeley) — 2024-06-25 | Parameters: 1.4B - License: open | Type: model - AI model by EvolutionaryScale,University of California (UC) Berkeley - **Flexi-JEST++** (Google DeepMind) — 2024-06-25 - License: closed | Type: model - AI model by Google DeepMind - **JEST++** (Google DeepMind) — 2024-06-25 - License: closed | Type: model - AI model by Google DeepMind - **JEST-L++** (DeepMind) — 2024-06-25 - License: closed | Type: model - AI model by DeepMind - **Gemma 2 9B** (Google DeepMind) — 2024-06-24 | Parameters: 9B - License: open | Type: model - AI model by Google DeepMind - **Gemma 2 27B** (Google DeepMind) — 2024-06-24 | Parameters: 27B - License: open | Type: model - AI model by Google DeepMind - **Gemma 2 2B** (Google DeepMind) — 2024-06-24 | Parameters: 2.6B - License: open | Type: model - AI model by Google DeepMind - **DiffPALM** (École Polytechnique Fédérale de Lausanne (EPFL),Swiss Institute of Bioinformatics) — 2024-06-24 - License: closed | Type: model - AI model by École Polytechnique Fédérale de Lausanne (EPFL),Swiss Institute of Bioinformatics - **BADGER** (NVIDIA,University of California (UC) Berkeley) — 2024-06-24 | Parameters: 2.9M - License: closed | Type: model - AI model by NVIDIA,University of California (UC) Berkeley - **Cambrian-1-34B** (New York University (NYU)) — 2024-06-24 | Parameters: 34B - License: open | Type: model - AI model by New York University (NYU) - **Cambrian-1-13B** (New York University (NYU)) — 2024-06-24 | Parameters: 13B - License: open | Type: model - AI model by New York University 
(NYU) - **Cambrian-1-8B** (New York University (NYU)) — 2024-06-24 | Parameters: 8B - License: open | Type: model - AI model by New York University (NYU) - **Code Droid** (Factory) — 2024-06-22 - License: closed | Type: model - AI model by Factory - **Claude 3.5 Sonnet** (Anthropic) — 2024-06-20 - License: closed | Type: model - AI model by Anthropic - **Hermes 2 Theta Llama-3 70B** (Nous Research,Arcee AI) — 2024-06-20 | Parameters: 70B - License: open | Type: model - AI model by Nous Research,Arcee AI - **RNA-FrameFlow** (National University of Singapore,Prescient Design,University of Missouri,University of Cambridge) — 2024-06-19 | Parameters: 16.8M - License: open | Type: model - AI model by National University of Singapore,Prescient Design,University of Missouri,University of Cambridge - **MPNNsol** (École Polytechnique Fédérale de Lausanne (EPFL),University at Buffalo,University of Washington,Massachusetts Institute of Technology (MIT)) — 2024-06-19 - License: open | Type: model - AI model by École Polytechnique Fédérale de Lausanne (EPFL),University at Buffalo,University of Washington,Massachusetts Institute of Technology (MIT) - **GLM-4V-9B** (Z.ai (Zhipu AI),Tsinghua University) — 2024-06-18 | Parameters: 9B - License: open | Type: model - AI model by Z.ai (Zhipu AI),Tsinghua University - **DeepSeek-Coder-V2 236B** (DeepSeek) — 2024-06-17 | Parameters: 236B - License: open | Type: model - AI model by DeepSeek - **Gen-3 Alpha** (Runway) — 2024-06-17 - License: closed | Type: model - AI model by Runway - **Ovis-7B** (Alibaba,Nanjing University) — 2024-06-17 | Parameters: 7B - License: open | Type: model - AI model by Alibaba,Nanjing University - **PRISM-1** (Wayve) — 2024-06-17 - License: closed | Type: model - AI model by Wayve - **MindEye2** (Stability AI,Medical AI Research Center (MedARC),Princeton University,University of Minnesota,University of Sydney,University of Waterloo) — 2024-06-15 - License: open | Type: model - AI model by Stability 
AI,Medical AI Research Center (MedARC),Princeton University,University of Minnesota,University of Sydney,University of Waterloo - **JIUTIAN-139MoE** (China Mobile) — 2024-06-15 | Parameters: 38.8B - License: open | Type: model - AI model by China Mobile - **Nemotron-4 340B** (NVIDIA) — 2024-06-14 | Parameters: 340B - License: open | Type: model - AI model by NVIDIA - **PLaMo-100B** (Preferred Networks Inc) — 2024-06-14 | Parameters: 100B - License: closed | Type: model - AI model by Preferred Networks Inc - **DigiRL** (University of California (UC) Berkeley,University of Illinois Urbana-Champaign (UIUC),Google DeepMind) — 2024-06-14 | Parameters: 1.3B - License: open | Type: model - AI model by University of California (UC) Berkeley,University of Illinois Urbana-Champaign (UIUC),Google DeepMind - **OpenVLA** (Stanford University,University of California (UC) Berkeley,Toyota Research Institute,Google DeepMind,Massachusetts Institute of Technology (MIT),Physical Intelligence) — 2024-06-13 | Parameters: 7.2B - License: open | Type: model - AI model by Stanford University,University of California (UC) Berkeley,Toyota Research Institute,Google DeepMind,Massachusetts Institute of Technology (MIT),Physical Intelligence - **Animate Anyone** (Alibaba) — 2024-06-13 - License: closed | Type: model - AI model by Alibaba - **Mamba2-Hybrid** (NVIDIA) — 2024-06-12 | Parameters: 8.7B - License: open | Type: model - AI model by NVIDIA - **ProteinReDiff** (FPT Software AI Center,University of Chicago,Indiana State University) — 2024-06-12 - License: open | Type: model - AI model by FPT Software AI Center,University of Chicago,Indiana State University - **Stable Diffusion 3 Medium** (Stability AI) — 2024-06-12 | Parameters: 2.5B - License: open | Type: model - AI model by Stability AI - **Luma Dream Machine** (LumaLabs) — 2024-06-12 - License: closed | Type: model - AI model by LumaLabs - **Llama-3.1-Nemotron-70B-Instruct** (NVIDIA,Meta AI) — 2024-06-12 - License: open | Type: model 
- AI model by NVIDIA,Meta AI - **Shutterstock ImageAI** (Databricks) — 2024-06-12 - License: closed | Type: model - AI model by Databricks - **Megrez-3B-Omni** (Infinigence AI,Tsinghua University,Shanghai Jiao Tong University) — 2024-06-12 | Parameters: 3B - License: open | Type: model - AI model by Infinigence AI,Tsinghua University,Shanghai Jiao Tong University - **Phoenix 1.0 Ultra** (Leonardo AI) — 2024-06-12 - License: closed | Type: model - AI model by Leonardo AI - **Samba 3.8B** (Microsoft,University of Illinois Urbana-Champaign (UIUC)) — 2024-06-11 | Parameters: 3.8B - License: closed | Type: model - AI model by Microsoft,University of Illinois Urbana-Champaign (UIUC) - **TiTok-L** (ByteDance,Technical University of Munich) — 2024-06-11 | Parameters: 307M - License: open | Type: model - AI model by ByteDance,Technical University of Munich - **Kling** (Kuaishou Technology) — 2024-06-10 - License: closed | Type: model - AI model by Kuaishou Technology - **Qwen2-72B** (Alibaba) — 2024-06-07 | Parameters: 72.7B - License: open | Type: model - AI model by Alibaba - **Qwen2-7B** (Alibaba) — 2024-06-07 | Parameters: 7B - License: open | Type: model - AI model by Alibaba - **Qwen2-57B-A14B** (Alibaba) — 2024-06-07 | Parameters: 57B - License: open | Type: model - AI model by Alibaba - **Qwen2-1.5B** (Alibaba) — 2024-06-07 | Parameters: 1.5B - License: open | Type: model - AI model by Alibaba - **Qwen2-0.5B** (Alibaba) — 2024-06-07 | Parameters: 500M - License: open | Type: model - AI model by Alibaba - **Audioseal** (Facebook AI Research) — 2024-06-06 - License: closed | Type: model - AI model by Facebook AI Research - **ProTrek** (Westlake University) — 2024-06-03 | Parameters: 930M - License: closed | Type: model - AI model by Westlake University - **Prot2Token** (University of Missouri,Politecnico di Milano) — 2024-06-03 | Parameters: 650M - License: closed | Type: model - AI model by University of Missouri,Politecnico di Milano - **MiniCPM-2.4B** (Tsinghua 
University,ModelBest) — 2024-06-03 | Parameters: 2.4B - License: open | Type: model - AI model by Tsinghua University,ModelBest - **MiniCPM-3-4B** (Tsinghua University,ModelBest) — 2024-06-03 | Parameters: 4B - License: open | Type: model - AI model by Tsinghua University,ModelBest - **MiniCPM-1.2B** (Tsinghua University,ModelBest) — 2024-06-03 | Parameters: 1.2B - License: open | Type: model - AI model by Tsinghua University,ModelBest - **MULAN** (AIRI Artificial Intelligence Research Institute,Skolkovo Institute of Science and Technology,Belozersky Institute of Physio-Chemical Biology,Ligand Pro) — 2024-06-02 | Parameters: 35M - License: open | Type: model - AI model by AIRI Artificial Intelligence Research Institute,Skolkovo Institute of Science and Technology,Belozersky Institute of Physio-Chemical Biology,Ligand Pro - **DRGN-AI** (Stanford University,SLAC National Laboratory,Princeton University,Columbia University) — 2024-06-02 - License: closed | Type: model - AI model by Stanford University,SLAC National Laboratory,Princeton University,Columbia University - **ERNIE 4.0 Turbo** (Baidu) — 2024-06-01 | Parameters: ERNIE 4.0 Turbo - License: open | Type: model - "Ernie Bot has reached 300 million users since its launch [on 16/Mar/2023, public Aug/2023]" Jun/2024 - **Gemma 2** (Google DeepMind) — 2024-06-01 | Parameters: Gemma 2 - License: open | Type: model - Announce: https://blog.google/technology/developers/google-gemma-2/ - **CriticGPT** (OpenAI) — 2024-06-01 | Parameters: CriticGPT - License: closed | Type: model - "LLM Critics Help Catch LLM Bugs" Announce: https://openai.com/index/finding-gpt4s-mistakes-with-gpt-4/ - **4M-21** (Apple) — 2024-06-01 | Parameters: 4M-21 - License: open | Type: model - Vision model based on T5-XXL. Modalities: RGB, Caption, Bounding boxes, Semantic segmentation, Depth, Human poses, Surface normals, CLIP, DINOv2, ImageBind, Metadata, Canny edges, SAM edges, SAM instances, Color palette. 
Project page: https://4m.epfl.ch/ - **ESM3** (EvolutionaryScale) — 2024-06-01 | Parameters: ESM3 - License: partial | Type: model - Biology large language model: "sequence, structure, and function are all masked and predicted during training, ESM3 can generate in all three modalities." 1.4B only released. - **PanGu 5.0 Super** (Huawei) — 2024-06-01 | Parameters: PanGu 5.0 Super - License: partial | Type: model - https://x.com/faridofanani96/status/1804079517193113850/photo/1 - **Claude 3.5 Sonnet** (Anthropic) — 2024-06-01 | Parameters: Claude 3.5 Sonnet - License: closed | Type: model - MMLU=90.4 with prompting. Model card: https://www-cdn.anthropic.com/fed9cc193a14b84131812372d8d5857f8f304c52/Model_Card_Claude_3_Addendum.pdf - **DeepSeek-Coder-V2** (DeepSeek-AI) — 2024-06-01 | Parameters: DeepSeek-Coder-V2 - License: open | Type: model - DeepSeek-V2 with additional 6 trillion tokens. - **DCLM-Baseline 7B 2.6T** (International) — 2024-06-01 | Parameters: DCLM-Baseline 7B 2.6T - License: partial | Type: model - New dataset: 240T tokens: 8× larger than previous SOTA dataset. DCLM-Pool is 240T, DCLM-Baseline is 3.8T: "we combine our 3.8T DCLM-BASELINE with the StarCoder and ProofPile2 data to arrive at a 4.1T token dataset. We train a 7B model for 2.5T tokens" and "We release the DCLM benchmark, framework, models, and datasets at https://datacomp.ai/dclm." - **Nemotron-4-340B** (NVIDIA) — 2024-06-01 | Parameters: Nemotron-4-340B - License: open | Type: model - Open-source equiv of Mar/2023 GPT-4 (1760MoE≈340B, 13T), same param count but 2x the tokens of May/2023 PaLM 2 (340B, 3.6T), competitor to Nov/2023 Grok-1 (314B, 6T). Trained on 6,144 H100s. ~1.3TB for inference. 50+ natural and 40+ coding languages. Trained between December 2023 and May 2024. MMLU 0-shot for instruct=78.7, 5-shot for base=81.1. 
Permalink for paper: https://research.nvidia.com/publication/2024-06_nemotron-4-340b - **Apple On-Device model Jun/2024** (Apple) — 2024-06-01 | Parameters: Apple On-Device model Jun/2024 - License: open | Type: model - https://lifearchitect.ai/apple/ Likely to be the Apple OpenELM model (Apr/2024). "two of these models — a ~3 billion parameter on-device language model, and a larger server-based language model available with Private Cloud Compute". https://machinelearning.apple.com/research/introducing-apple-foundation-models The server-based model is possibly Ferret, although it is more properly called a multimodal model (not just language). It could also be Apple GPT based on their Ajax framework: https://archive.md/f3C0r - **MatMul-Free LM** (UCSC) — 2024-06-01 | Parameters: MatMul-Free LM - License: open | Type: model - "we explore alternative methods for mixing tokens without relying on matrix multiplications." Compared with Transformer++ based on Llama-2, not to be confused with the pre-GPT-3 American Express Transformer++ paper from 2/Mar/2020. Instead, Transformer++ is defined in the Mamba paper: 'Transformer++: A Transformer with an improved architecture, namely rotary positional encodings (Su et al. 2021) and SwiGLU MLP (Shazeer 2020)' - **Luna** (Galileo) — 2024-06-01 | Parameters: Luna - License: open | Type: model - Based on DeBERTa-large (440M). RoBERTa=162B token dataset. - **Qwen2** (Alibaba) — 2024-06-01 | Parameters: Qwen2 - License: open | Type: model - Instruct MMLU=82. Instruct GPQA=41.9. https://qwenlm.github.io/blog/qwen2/ - **Qwen2-57B-A14B** (Alibaba) — 2024-06-01 | Parameters: Qwen2-57B-A14B - License: open | Type: model - https://qwenlm.github.io/blog/qwen2/ - **Skywork MoE 16x13B** (Kunlun Tech) — 2024-06-01 | Parameters: Skywork MoE 16x13B - License: open | Type: model - CN + EN. "(MoE) model with 146 billion parameters, 16 experts, and 22 billion activated parameters. 
This model is initialized from the pre-existing dense checkpoints of our Skywork-13B model." - **Xingchen Voice Model (星辰语音大模型)** (China Telecom) — 2024-06-01 | Parameters: 300M - License: open | Type: model - AI model by China Telecom - **KeTu (Kolors)** (Kuaishou Technology) — 2024-05-31 | Parameters: 2.6B - License: open | Type: model - AI model by Kuaishou Technology - **Mamba 2, 2.7B** (Princeton University,Carnegie Mellon University (CMU)) — 2024-05-31 | Parameters: 2.7B - License: open | Type: model - AI model by Princeton University,Carnegie Mellon University (CMU) - **Granite 20B** (IBM Research) — 2024-05-31 | Parameters: 20B - License: open | Type: model - AI model by IBM Research - **sgRNAGen** (Beijing Institute of Technology) — 2024-05-31 | Parameters: 14.2M - License: closed | Type: model - AI model by Beijing Institute of Technology - **Ark LLM (方舟大模型)** (Zhuhai Wujiefangzhou Intelligent Technology) — 2024-05-31 - License: closed | Type: model - AI model by Zhuhai Wujiefangzhou Intelligent Technology - **FoldFlow2** (Dreamfold,University of Montreal / Université de Montréal,McGill University,University of Oxford) — 2024-05-30 - License: open | Type: model - AI model by Dreamfold,University of Montreal / Université de Montréal,McGill University,University of Oxford - **CLAY** (Shanghai Tech University,Deemos Technology,Huazhong University of Science and Technology) — 2024-05-30 | Parameters: 1.5B - License: closed | Type: model - AI model by Shanghai Tech University,Deemos Technology,Huazhong University of Science and Technology - **Codestral** (Mistral AI) — 2024-05-29 | Parameters: 22.2B - License: open | Type: model - AI model by Mistral AI - **ChunkLlama2-13B** (Alibaba,The University of Hong Kong,Fudan University) — 2024-05-29 | Parameters: 13B - License: closed | Type: model - AI model by Alibaba,The University of Hong Kong,Fudan University - **Aurora** (Microsoft Research) — 2024-05-28 | Parameters: 1.3B - License: closed | Type: model - AI 
model by Microsoft Research - **Nanbeige2-16B-Chat** (Nanbeige LLM Lab) — 2024-05-28 | Parameters: 15.8B - License: open | Type: model - AI model by Nanbeige LLM Lab - **TimeGPT-1** (Nixtla) — 2024-05-27 - License: closed | Type: model - AI model by Nixtla - **NV-Embed-v1** (NVIDIA) — 2024-05-27 | Parameters: 7B - License: open | Type: model - AI model by NVIDIA - **LUXIA-21.4B-Alignment** (SaltLux) — 2024-05-27 | Parameters: 21.4B - License: open | Type: model - AI model by SaltLux - **Zamba2-7B** (Zyphra) — 2024-05-26 | Parameters: 7B - License: open | Type: model - AI model by Zyphra - **Xingchen Jianwei Security Model (星辰·见微安全大模型)** (China Telecom) — 2024-05-25 - License: closed | Type: model - AI model by China Telecom - **Genie 2 (bio)** (Columbia University,Rutgers University) — 2024-05-24 | Parameters: 15.7M - License: open | Type: model - AI model by Columbia University,Rutgers University - **LLPS** (InstaDeep) — 2024-05-24 | Parameters: 344M - License: closed | Type: model - AI model by InstaDeep - **OMNI-EPIC** (Imperial College London,University of British Columbia (UBC)) — 2024-05-24 - License: closed | Type: model - AI model by Imperial College London,University of British Columbia (UBC) - **YOLOv10-X** (Tsinghua University) — 2024-05-23 | Parameters: 29.5M - License: open | Type: model - AI model by Tsinghua University - **Baichuan4** (Baichuan) — 2024-05-22 - License: closed | Type: model - AI model by Baichuan - **360Zhinao-7B** (360 Security Technology) — 2024-05-22 | Parameters: 7B - License: open | Type: model - AI model by 360 Security Technology - **ProtT3** (National University of Singapore,University of Science and Technology of China (USTC),Hokkaido University) — 2024-05-21 | Parameters: 1.6B - License: closed | Type: model - AI model by National University of Singapore,University of Science and Technology of China (USTC),Hokkaido University - **ALLaM adapted13B** (Saudi Data and Artificial Intelligence Authority) — 2024-05-21 |
Parameters: 13B - License: closed | Type: model - AI model by Saudi Data and Artificial Intelligence Authority - **ALLaM adapted 70B** (Saudi Data and Artificial Intelligence Authority) — 2024-05-21 | Parameters: 70B - License: closed | Type: model - AI model by Saudi Data and Artificial Intelligence Authority - **ALLaM 7B** (Saudi Data and Artificial Intelligence Authority) — 2024-05-21 | Parameters: 7B - License: open | Type: model - AI model by Saudi Data and Artificial Intelligence Authority - **ALLaM 34B** (Saudi Data and Artificial Intelligence Authority) — 2024-05-21 | Parameters: 34B - License: closed | Type: model - AI model by Saudi Data and Artificial Intelligence Authority - **GLM-4 (0520)** (Z.ai (Zhipu AI)) — 2024-05-20 - License: closed | Type: model - AI model by Z.ai (Zhipu AI) - **Diamond** (University of Geneva,University of Edinburgh,Microsoft Research) — 2024-05-20 - License: open | Type: model - AI model by University of Geneva,University of Edinburgh,Microsoft Research - **Octo-Base** (University of California (UC) Berkeley,Stanford University,Carnegie Mellon University (CMU),DeepMind) — 2024-05-20 | Parameters: 93M - License: open | Type: model - AI model by University of California (UC) Berkeley,Stanford University,Carnegie Mellon University (CMU),DeepMind - **Octo-Small** (University of California (UC) Berkeley,Stanford University,Carnegie Mellon University (CMU),DeepMind) — 2024-05-20 | Parameters: 27M - License: open | Type: model - AI model by University of California (UC) Berkeley,Stanford University,Carnegie Mellon University (CMU),DeepMind - **HelixFold** (Baidu) — 2024-05-17 - License: closed | Type: model - AI model by Baidu - **ProSST** (Shanghai Jiao Tong University,Shanghai AI Lab,East China University of Science and Technology) — 2024-05-17 | Parameters: 110M - License: closed | Type: model - AI model by Shanghai Jiao Tong University,Shanghai AI Lab,East China University of Science and Technology - **Chameleon-34B** (Facebook 
AI Research) — 2024-05-16 | Parameters: 34B - License: open | Type: model - AI model by Facebook AI Research - **FragLlama: Next-fragment prediction for molecular design** (Facebook AI Research) — 2024-05-16 | Parameters: 7B - License: open | Type: model - AI model by Facebook AI Research - **LBSTER** (Prescient Design,Genentech) — 2024-05-15 | Parameters: 67M - License: closed | Type: model - AI model by Prescient Design,Genentech - **Doubao-lite** (ByteDance) — 2024-05-15 - License: closed | Type: model - AI model by ByteDance - **Doubao Role-Playing Model** (ByteDance) — 2024-05-15 - License: closed | Type: model - AI model by ByteDance - **Doubao Text-to-Speech Model** (ByteDance) — 2024-05-15 - License: closed | Type: model - AI model by ByteDance - **Doubao Voice Cloning Model** (ByteDance) — 2024-05-15 - License: closed | Type: model - AI model by ByteDance - **Doubao Speech Recognition Model** (ByteDance) — 2024-05-15 - License: closed | Type: model - AI model by ByteDance - **Doubao Text-to-Image Model** (ByteDance) — 2024-05-15 - License: closed | Type: model - AI model by ByteDance - **Doubao Function Call Model** (ByteDance) — 2024-05-15 - License: closed | Type: model - AI model by ByteDance - **Doubao Vectorization Model** (ByteDance) — 2024-05-15 - License: closed | Type: model - AI model by ByteDance - **Doubao Image-to-Image generation model** (ByteDance) — 2024-05-15 - License: closed | Type: model - AI model by ByteDance - **Doubao Real-Time translation model** (ByteDance) — 2024-05-15 - License: closed | Type: model - AI model by ByteDance - **Doubao Video Generation model** (ByteDance) — 2024-05-15 - License: closed | Type: model - AI model by ByteDance - **VILA1.5-40B** (NVIDIA,Massachusetts Institute of Technology (MIT)) — 2024-05-15 | Parameters: 40B - License: open | Type: model - AI model by NVIDIA,Massachusetts Institute of Technology (MIT) - **VILA-7B** (NVIDIA,Massachusetts Institute of Technology (MIT)) — 2024-05-15 | Parameters: 7B - 
License: closed | Type: model - AI model by NVIDIA,Massachusetts Institute of Technology (MIT) - **Veo** (Google DeepMind) — 2024-05-14 - License: closed | Type: model - AI model by Google DeepMind - **Imagen 3** (Google DeepMind) — 2024-05-14 - License: closed | Type: model - AI model by Google DeepMind - **LearnLM-Tutor** (Google Research,Google DeepMind,Google,Arizona State University,Lund University,University of Oxford) — 2024-05-14 - License: closed | Type: model - AI model by Google Research,Google DeepMind,Google,Arizona State University,Lund University,University of Oxford - **Hunyuan-DiT** (Tencent) — 2024-05-14 | Parameters: 1.5B - License: open | Type: model - AI model by Tencent - **BRIA 2.3** (BRIA AI) — 2024-05-14 - License: open | Type: model - AI model by BRIA AI - **Yi-Large** (01.AI) — 2024-05-13 | Parameters: 100B - License: closed | Type: model - AI model by 01.AI - **GPT-4o** (OpenAI) — 2024-05-13 - License: closed | Type: model - AI model by OpenAI - **Yi-1.5-34B** (01.AI) — 2024-05-13 | Parameters: 34B - License: open | Type: model - AI model by 01.AI - **Yi-1.5-9B** (01.AI) — 2024-05-13 | Parameters: 8.8B - License: open | Type: model - AI model by 01.AI - **Digivio (迪智伟奥DIGIVIO)** (Shanghai Digivio Information Technology Co., Ltd.) — 2024-05-13 - License: closed | Type: model - AI model by Shanghai Digivio Information Technology Co., Ltd. - **Yuanshi (元石大模型)** (Beijing Yuanshi Technology Co., Ltd.) — 2024-05-13 - License: closed | Type: model - AI model by Beijing Yuanshi Technology Co., Ltd. 
- **Xiaoyu Q&A** (Xiaoyu Intelligence Information Technology (Yunnan) Co., Ltd) — 2024-05-13 - License: closed | Type: model - AI model by Xiaoyu Intelligence Information Technology (Yunnan) Co., Ltd - **Xinyuan (心元大模型)** (Beijing Lituo Feiyuan Technology Co., Ltd.,Cylingo Group) — 2024-05-13 | Parameters: 14B - License: open | Type: model - AI model by Beijing Lituo Feiyuan Technology Co., Ltd.,Cylingo Group - **Lantu (蓝图大模型)** (Beijing Bitauto Interactive Advertising Company Limited) — 2024-05-13 - License: closed | Type: model - AI model by Beijing Bitauto Interactive Advertising Company Limited - **MoLeR** (Microsoft Research,Novartis) — 2024-05-12 - License: open | Type: model - AI model by Microsoft Research,Novartis - **Fugaku-LLM** (Tohoku University,CyberAgent,Tokyo Institute of Technology,Fujitsu,RIKEN,Nagoya University,Kotoba Technologies) — 2024-05-10 | Parameters: 13B - License: open | Type: model - AI model by Tohoku University,CyberAgent,Tokyo Institute of Technology,Fujitsu,RIKEN,Nagoya University,Kotoba Technologies - **Gemini 1.5 Flash** (Google DeepMind) — 2024-05-10 - License: closed | Type: model - AI model by Google DeepMind - **Gemini 1.5 Flash 8B** (Google DeepMind) — 2024-05-10 | Parameters: 8B - License: closed | Type: model - AI model by Google DeepMind - **MatterSim (M3GNet - MatterSim-v1.0.0-5M)** (Microsoft Research AI for Science) — 2024-05-10 | Parameters: 4.5M - License: open | Type: model - AI model by Microsoft Research AI for Science - **MatterSim (Graphormer)** (Microsoft Research AI for Science) — 2024-05-10 | Parameters: 182M - License: closed | Type: model - AI model by Microsoft Research AI for Science - **Falcon 2 11B** (Technology Innovation Institute) — 2024-05-09 | Parameters: 11B - License: open | Type: model - AI model by Technology Innovation Institute - **XPT “Xiao Model” (晓模型XPT)** (Chengdu Xiaoduo Technology Co., Ltd.) — 2024-05-09 - License: closed | Type: model - AI model by Chengdu Xiaoduo Technology Co., Ltd.
- **AlphaFold 3** (Google DeepMind,Isomorphic Labs) — 2024-05-08 - License: open | Type: model - AI model by Google DeepMind,Isomorphic Labs - **BiosimDock** (DeepOrigin) — 2024-05-08 - License: closed | Type: model - AI model by DeepOrigin - **Emu2** (Beijing Academy of Artificial Intelligence / BAAI,Tsinghua University,Peking University) — 2024-05-08 | Parameters: 37B - License: open | Type: model - AI model by Beijing Academy of Artificial Intelligence / BAAI,Tsinghua University,Peking University - **DeepSeek-V2 (MoE-236B)** (DeepSeek) — 2024-05-07 | Parameters: 236B - License: open | Type: model - AI model by DeepSeek - **xLSTM 1.4B** (Johannes Kepler University Linz) — 2024-05-07 | Parameters: 1.4B - License: closed | Type: model - AI model by Johannes Kepler University Linz - **Amazon Titan Text Premier** (Amazon) — 2024-05-07 - License: closed | Type: model - AI model by Amazon - **Med-Gemini-2D** (Google DeepMind,Google Research) — 2024-05-06 - License: closed | Type: model - AI model by Google DeepMind,Google Research - **Med-Gemini-3D** (Google DeepMind,Google Research) — 2024-05-06 - License: closed | Type: model - AI model by Google DeepMind,Google Research - **Microsoft MAI-1 (2024 unreleased)** (Microsoft) — 2024-05-06 | Parameters: 500B - License: closed | Type: model - AI model by Microsoft - **Soccer Robot** (Google DeepMind,University College London (UCL)) — 2024-05-03 - License: closed | Type: model - AI model by Google DeepMind,University College London (UCL) - **MetaMath 70B** (University of Cambridge,Southern University of Science and Technology (SUSTech),Hong Kong University of Science and Technology (HKUST),Huawei Noah's Ark Lab,Alan Turing Institute,Max Planck Institute for Intelligent Systems) — 2024-05-03 | Parameters: 70B - License: open | Type: model - AI model by University of Cambridge,Southern University of Science and Technology (SUSTech),Hong Kong University of Science and Technology (HKUST),Huawei Noah's Ark Lab,Alan Turing 
Institute,Max Planck Institute for Intelligent Systems - **MetaMath 7B (LLaMa finetune)** (University of Cambridge,Southern University of Science and Technology (SUSTech),Hong Kong University of Science and Technology (HKUST),Huawei Noah's Ark Lab,Alan Turing Institute,Max Planck Institute for Intelligent Systems) — 2024-05-03 | Parameters: 7B - License: open | Type: model - AI model by University of Cambridge,Southern University of Science and Technology (SUSTech),Hong Kong University of Science and Technology (HKUST),Huawei Noah's Ark Lab,Alan Turing Institute,Max Planck Institute for Intelligent Systems - **MetaMath 7B (Mistral finetune)** (University of Cambridge,Southern University of Science and Technology (SUSTech),Hong Kong University of Science and Technology (HKUST),Huawei Noah's Ark Lab,Alan Turing Institute,Max Planck Institute for Intelligent Systems) — 2024-05-03 | Parameters: 7B - License: open | Type: model - AI model by University of Cambridge,Southern University of Science and Technology (SUSTech),Hong Kong University of Science and Technology (HKUST),Huawei Noah's Ark Lab,Alan Turing Institute,Max Planck Institute for Intelligent Systems - **Idefics2** (Hugging Face,Sorbonne University) — 2024-05-03 | Parameters: 8B - License: open | Type: model - AI model by Hugging Face,Sorbonne University - **VILA1.5-13B** (NVIDIA,Massachusetts Institute of Technology (MIT)) — 2024-05-03 | Parameters: 13.5B - License: open | Type: model - AI model by NVIDIA,Massachusetts Institute of Technology (MIT) - **OpenELM-1.1B** (Apple) — 2024-05-02 | Parameters: 1.1B - License: open | Type: model - AI model by Apple - **OpenELM-3B** (Apple) — 2024-05-02 | Parameters: 3.0B - License: open | Type: model - AI model by Apple - **OpenELM-450M** (Apple) — 2024-05-02 | Parameters: 450M - License: open | Type: model - AI model by Apple - **OpenELM-270M** (Apple) — 2024-05-02 | Parameters: 270M - License: open | Type: model - AI model by Apple - **Mamba-2** (CMU) — 2024-05-01 | 
Parameters: Mamba-2 - License: open | Type: model - Analysis: https://tridao.me/blog/2024/mamba2-part1-model/ - **MAP-Neo** (International) — 2024-05-01 | Parameters: MAP-Neo - License: open | Type: model - "first fully open-sourced bilingual LLM with comparable performance to existing state-of-the-art LLMs... we open-source all details to reproduce our MAP-Neo, where the cleaned pre-training corpus, data cleaning pipeline, checkpoints, and well-optimized training/evaluation framework are provided." - **K2** (LLM360) — 2024-05-01 | Parameters: K2 - License: open | Type: model - "K2-65B is a fully reproducible LLM outperforming Llama 2 70B using 35% less compute." - **Codestral** (Mistral) — 2024-05-01 | Parameters: Codestral - License: open | Type: model - Fluent in 80+ programming languages - **Aya-23-35B** (Cohere) — 2024-05-01 | Parameters: Aya-23-35B - License: open | Type: model - - **Yi-XLarge** (01-ai) — 2024-05-01 | Parameters: Yi-XLarge - License: open | Type: model - Still training as of May/2024: https://appserversrc.8btc.cn/FnDYlEC4STBhphu6M3NL4CKH43FW dead link, use: https://finance.china.com.cn/roll/20240513/6116857.shtml - **Yi-Large** (01-ai) — 2024-05-01 | Parameters: Yi-Large - License: open | Type: model - - **Chameleon** (Meta AI) — 2024-05-01 | Parameters: Chameleon - License: open | Type: model - Multimodal - **LearnLM** (Google DeepMind) — 2024-05-01 | Parameters: LearnLM - License: partial | Type: model - Fine-tuned + prompted Gemini (Dec/2023). "The results of LearnLM-Tutor reproduce the performance of Gemini Pro, for example an MMLU score of 0.72 and MATH score of 0.33." - **Sparse Llama 7B** (Cerebras) — 2024-05-01 | Parameters: Sparse Llama 7B - License: open | Type: model - https://www.cerebras.net/blog/introducing-sparse-llama-70-smaller-3x-faster-full-accuracy "For the 50% sparse model, we utilized 45 billion tokens of pretraining data, while an additional 100 billion tokens were used for the 70% model. 
This represents approximately 2% to 8% of the original 2 trillion tokens used to train the base Llama-2 model." - **Gemini 1.5 Flash** (Google DeepMind) — 2024-05-01 | Parameters: Gemini 1.5 Flash - License: open | Type: model - 1M context length. Note: Gemini outputs are watermarked. I do not use GDM models. https://lifearchitect.ai/watermarking/ - **GPT-4o** (OpenAI) — 2024-05-01 | Parameters: GPT-4o - License: closed | Type: model - gpt-4o-2024-05-13 no longer easily available, so hidden in the Model Table rankings. Omnimodel. ‘[GPT-4o is] likely an early checkpoint of GPT-5’. https://twitter.com/drjimfan/status/1790089671365767313 ELO: https://twitter.com/LiamFedus/status/1790064963966370209 Demo: https://youtu.be/DQacCB9tDaw - **Falcon 2 11B** (TII) — 2024-05-01 | Parameters: Falcon 2 11B - License: open | Type: model - Announce: https://www.tii.ae/news/falcon-2-uaes-technology-innovation-institute-releases-new-ai-model-series-outperforming-metas - **Fugaku-LLM** (Fujitsu) — 2024-05-01 | Parameters: Fugaku-LLM - License: open | Type: model - Japanese. CPU trained: 158,976+ A64FX CPUs (7M+ cores), zero GPUs. https://en.wikipedia.org/wiki/Fugaku_(supercomputer) - **Yi 1.5 34B** (01-ai) — 2024-05-01 | Parameters: Yi 1.5 34B - License: open | Type: model - Uses 600B more training tokens than Yi 1.0 (Nov/2023). - **YOCO** (Microsoft) — 2024-05-01 | Parameters: YOCO - License: open | Type: model - With Tsinghua. You Only Cache Once (YOCO). Long context "1M context length with near-perfect needle retrieval accuracy" - **DeepSeek-V2** (DeepSeek-AI) — 2024-05-01 | Parameters: DeepSeek-V2 - License: open | Type: model - Huge dataset, 12% Chinese "Therefore, we acknowledge that DeepSeek-V2 still has a slight gap in basic English capabilities with LLaMA3 70B".
- **ChuXin** (Independent) — 2024-05-01 | Parameters: ChuXin - License: open | Type: model - "results on the “Needle In A Haystack” (NIAH) tests indicate that ChuXin-1M performs well across all context window lengths up to 1M." - **RWKV-v6 Finch** (RWKV) — 2024-05-01 | Parameters: RWKV-v6 Finch - License: open | Type: model - RWKV (pronounced RwaKuv) is an RNN: https://twitter.com/BlinkDL_AI/status/1787834625211158562 - **xLSTM** (ELLIS) — 2024-05-01 | Parameters: xLSTM - License: closed | Type: model - New method LSTM to xLSTM, see also RNNs. Code/weights don't seem to be released. https://github.com/AI-Guru/xlstm-resources - **Granite Code** (IBM) — 2024-05-01 | Parameters: Granite Code - License: open | Type: model - MMLU=50 for 8B model only. Dataset: publicly available datasets (e.g., GitHub Code Clean, Starcoder data), public code repositories, and issues from GitHub. - **Qwen-Max** (Alibaba) — 2024-05-01 | Parameters: Qwen-Max - License: open | Type: model - https://twitter.com/JustinLin610/status/1787584325367529509 - **Med-Gemini-L 1.0** (Google DeepMind) — 2024-05-01 | Parameters: Med-Gemini-L 1.0 - License: closed | Type: model - Med-Gemini-M 1.0 and Med-Gemini-L 1.0 (Pro and Ultra finetunes) "For language tasks that require less complex reasoning, such as summarizing medical notes and creating referral letters, we introduce Med-Gemini-M 1.0 by fine-tuning the Gemini 1.0 Pro model. For other tasks that require more advanced reasoning, we introduce Med-Gemini-L 1.0 by fine-tuning the Gemini 1.0 Ultra model using a self-training method to enable the models to efficiently use web search."
- **Med-Gemini-M 1.5** (Google DeepMind,Google Research) — 2024-05-01 - License: closed | Type: model - AI model by Google DeepMind,Google Research - **GenCast** (Google DeepMind) — 2024-05-01 - License: open | Type: model - AI model by Google DeepMind - **Multi-Token Prediction 7B** (Facebook AI Research) — 2024-04-30 | Parameters: 6.7B - License: open | Type: model - AI model by Facebook AI Research - **Multi-Token Prediction 13B** (Facebook AI Research) — 2024-04-30 | Parameters: 13B - License: closed | Type: model - AI model by Facebook AI Research - **DiffPepBuilder** (Peking University) — 2024-04-30 | Parameters: 104M - License: closed | Type: model - AI model by Peking University - **Amazon Q Developer** (Amazon) — 2024-04-30 - License: closed | Type: model - AI model by Amazon - **TAIDE-LX-7B** (National Science and Technology Council) — 2024-04-29 | Parameters: 7B - License: closed | Type: model - AI model by National Science and Technology Council - **TAIDE LX-13B** (National Science and Technology Council) — 2024-04-29 | Parameters: 13B - License: closed | Type: model - AI model by National Science and Technology Council - **Llama 3-TAIDE-LX-8B-Chat-Alpha1** (National Science and Technology Council) — 2024-04-29 | Parameters: 8B - License: closed | Type: model - AI model by National Science and Technology Council - **InternVL1.5** (Shanghai AI Lab,SenseTime,Tsinghua University,Nanjing University,Fudan University,Chinese University of Hong Kong (CUHK)) — 2024-04-29 | Parameters: 25.5B - License: open | Type: model - AI model by Shanghai AI Lab,SenseTime,Tsinghua University,Nanjing University,Fudan University,Chinese University of Hong Kong (CUHK) - **Swallow** (Tokyo Institute of Technology) — 2024-04-27 | Parameters: 70M - License: open | Type: model - AI model by Tokyo Institute of Technology - **Vidu** (Tsinghua University,ShengShu) — 2024-04-27 - License: closed | Type: model - AI model by Tsinghua University,ShengShu - **Insights into Human Harmony 
(洞见人和)** (Zhejiang Lianxin Digital Co., Ltd.) — 2024-04-27 - License: closed | Type: model - AI model by Zhejiang Lianxin Digital Co., Ltd. - **Qwen1.5-110B** (Alibaba) — 2024-04-25 | Parameters: 110B - License: open | Type: model - AI model by Alibaba - **Beyond ESM2: Graph-Enhanced Protein Sequence Modeling with Efficient** (Huazhong University of Science and Technology,Fudan University,Northwestern Polytechnical University) — 2024-04-24 | Parameters: 35.8M - License: closed | Type: model - AI model by Huazhong University of Science and Technology,Fudan University,Northwestern Polytechnical University - **Arctic** (Snowflake) — 2024-04-24 | Parameters: 480B - License: open | Type: model - AI model by Snowflake - **NEC cotomi** (NEC Corporation) — 2024-04-24 - License: closed | Type: model - AI model by NEC Corporation - **Yuanjing LLM (联通元景大模型)** (China Unicom) — 2024-04-24 - License: closed | Type: model - AI model by China Unicom - **phi-3-mini 3.8B** (Microsoft) — 2024-04-23 | Parameters: 3.8B - License: open | Type: model - AI model by Microsoft - **phi-3-medium 14B** (Microsoft) — 2024-04-23 | Parameters: 14B - License: open | Type: model - AI model by Microsoft - **SenseChat 5.0** (SenseTime) — 2024-04-23 | Parameters: 600B - License: closed | Type: model - AI model by SenseTime - **phi-3-small 7.4B** (Microsoft) — 2024-04-23 | Parameters: 7.4B - License: open | Type: model - AI model by Microsoft - **SI-PLM** (University of Pittsburgh) — 2024-04-23 - License: closed | Type: model - AI model by University of Pittsburgh - **phi-3.5-mini** (Microsoft) — 2024-04-23 | Parameters: 3.8B - License: open | Type: model - AI model by Microsoft - **phi-3.5-Vision** (Microsoft) — 2024-04-23 | Parameters: 4.2B - License: open | Type: model - AI model by Microsoft - **Phi-3.5-MoE** (Microsoft) — 2024-04-23 | Parameters: 60.8B - License: open | Type: model - AI model by Microsoft - **Phoenix (凤凰大模型)** (Zhixin Shuchuang (Chongqing) Technology Co., Ltd.) 
— 2024-04-23 - License: closed | Type: model - AI model by Zhixin Shuchuang (Chongqing) Technology Co., Ltd. - **Firefly Image 3** (Adobe) — 2024-04-23 - License: closed | Type: model - AI model by Adobe - **VISTA-2D** (NVIDIA) — 2024-04-22 | Parameters: 100M - License: closed | Type: model - AI model by NVIDIA - **InstructPLM** (Zhejiang Lab,Zhejiang University (ZJU),Nanjing University,Tsinghua University,Alibaba,Chinese University of Hong Kong (CUHK)) — 2024-04-20 | Parameters: 89.1M - License: closed | Type: model - AI model by Zhejiang Lab,Zhejiang University (ZJU),Nanjing University,Tsinghua University,Alibaba,Chinese University of Hong Kong (CUHK) - **SaProt** (Zhejiang University (ZJU),Westlake University) — 2024-04-19 | Parameters: 650M - License: closed | Type: model - AI model by Zhejiang University (ZJU),Westlake University - **Llama 3-70B** (Meta AI) — 2024-04-18 | Parameters: 70B - License: open | Type: model - AI model by Meta AI - **Llama 3-8B** (Meta AI) — 2024-04-18 | Parameters: 8B - License: open | Type: model - AI model by Meta AI - **FRED-T5-XL** (Sber) — 2024-04-18 | Parameters: 1.7B - License: open | Type: model - AI model by Sber - **LLaMA-3-Instruct-8B** (Meta AI) — 2024-04-18 | Parameters: 8B - License: open | Type: model - AI model by Meta AI - **Parakeet ASR rnnt 1.1B** (NVIDIA) — 2024-04-18 | Parameters: 1.1B - License: open | Type: model - AI model by NVIDIA - **Reka Edge** (Reka AI) — 2024-04-18 | Parameters: 7B - License: closed | Type: model - AI model by Reka AI - **EVI** (Hume) — 2024-04-18 - License: closed | Type: model - AI model by Hume - **Mixtral 8x22B** (Mistral AI) — 2024-04-17 | Parameters: 141B - License: open | Type: model - AI model by Mistral AI - **SIMA** (Google DeepMind) — 2024-04-17 - License: closed | Type: model - AI model by Google DeepMind - **METL-Global** (University of Wisconsin Madison,Morgridge Institute for Research) — 2024-04-17 | Parameters: 50M - License: closed | Type: model - AI model by University 
of Wisconsin Madison,Morgridge Institute for Research - **GRITLM 7B** (Contextual AI,The University of Hong Kong,Microsoft) — 2024-04-17 | Parameters: 7.2B - License: open | Type: model - AI model by Contextual AI,The University of Hong Kong,Microsoft - **GRITLM 8x7B** (Contextual AI,The University of Hong Kong,Microsoft) — 2024-04-17 | Parameters: 46.7B - License: open | Type: model - AI model by Contextual AI,The University of Hong Kong,Microsoft - **Tiangong 3.0 (MoE)** (Kunlun Inc.) — 2024-04-17 | Parameters: 400B - License: closed | Type: model - AI model by Kunlun Inc. - **Tiangong SkyMusic** (Kunlun Inc.) — 2024-04-17 - License: closed | Type: model - AI model by Kunlun Inc. - **abab6.5** (MiniMax) — 2024-04-17 | Parameters: 1T - License: closed | Type: model - AI model by MiniMax - **LINGO-2** (Wayve) — 2024-04-17 - License: closed | Type: model - AI model by Wayve - **OLMo 1.7-7B** (Allen Institute for AI) — 2024-04-17 | Parameters: 7B - License: closed | Type: model - AI model by Allen Institute for AI - **Reka Core** (Reka AI) — 2024-04-15 | Parameters: 67B - License: closed | Type: model - AI model by Reka AI - **Reka Flash** (Reka AI) — 2024-04-15 | Parameters: 21B - License: closed | Type: model - AI model by Reka AI - **WizardLM-2 8x22B** (Microsoft) — 2024-04-15 | Parameters: 141B - License: open | Type: model - AI model by Microsoft - **WizardLM-2 70B** (Microsoft) — 2024-04-15 | Parameters: 70B - License: closed | Type: model - AI model by Microsoft - **WizardLM-2 7B** (Microsoft) — 2024-04-15 | Parameters: 7B - License: open | Type: model - AI model by Microsoft - **DDPM** (University Paris-Saclay,Radboud University Medical Center) — 2024-04-13 - License: closed | Type: model - AI model by University Paris-Saclay,Radboud University Medical Center - **DDIM** (University Paris-Saclay,Radboud University Medical Center) — 2024-04-13 - License: closed | Type: model - AI model by University Paris-Saclay,Radboud University Medical Center - **Bencao 
Zhiku** (Chengdu University of Traditional Chinese Medicine) — 2024-04-12 | Parameters: 1B - License: closed | Type: model - AI model by Chengdu University of Traditional Chinese Medicine - **NOMI GPT** (NIO) — 2024-04-12 - License: closed | Type: model - AI model by NIO - **tsuzumi 7B upgrade 2024** (NTT Communication Science Laboratories) — 2024-04-11 | Parameters: 7B - License: closed | Type: model - AI model by NTT Communication Science Laboratories - **HGRN2 3B** (Shanghai AI Lab,Massachusetts Institute of Technology (MIT),Taptap) — 2024-04-11 | Parameters: 2.9B - License: closed | Type: model - AI model by Shanghai AI Lab,Massachusetts Institute of Technology (MIT),Taptap - **HGRN2 1B** (Shanghai AI Lab,Massachusetts Institute of Technology (MIT),Taptap) — 2024-04-11 | Parameters: 1B - License: closed | Type: model - AI model by Shanghai AI Lab,Massachusetts Institute of Technology (MIT),Taptap - **Zephyr 141B-A39B** (Hugging Face) — 2024-04-11 | Parameters: 141B - License: closed | Type: model - AI model by Hugging Face - **AF2RAVE** (University of Maryland) — 2024-04-10 - License: closed | Type: model - AI model by University of Maryland - **Zephyr 141B-A39B** (Hugging Face,Korea Advanced Institute of Science and Technology (KAIST),Argilla) — 2024-04-10 | Parameters: 141B - License: open | Type: model - AI model by Hugging Face,Korea Advanced Institute of Science and Technology (KAIST),Argilla - **DiffBindFR** (Peking University,Tsinghua-Peking Center for Life Sciences) — 2024-04-09 - License: open | Type: model - AI model by Peking University,Tsinghua-Peking Center for Life Sciences - **WeituAI 1.0** (Weitu AI) — 2024-04-09 | Parameters: 15B - License: closed | Type: model - AI model by Weitu AI - **GPT-4 Turbo (Apr 2024)** (OpenAI) — 2024-04-09 - License: closed | Type: model - AI model by OpenAI - **Stable LM 2 12B** (Stability AI) — 2024-04-08 | Parameters: 12.1B - License: open | Type: model - AI model by Stability AI - **YaART** (Yandex) —
2024-04-08 | Parameters: 2.3B - License: closed | Type: model - AI model by Yandex - **OpenThaiGPT v1.0.0 (13B)** (Mahidol University,AI Entrepreneurs Association of Thailand) — 2024-04-08 | Parameters: 13.1B - License: open | Type: model - AI model by Mahidol University,AI Entrepreneurs Association of Thailand - **OpenThaiGPT v1.0.0 (7B)** (Mahidol University,AI Entrepreneurs Association of Thailand) — 2024-04-08 | Parameters: 6.8B - License: open | Type: model - AI model by Mahidol University,AI Entrepreneurs Association of Thailand - **SambaLingo-Thai-Chat (7B)** (SambaNova Systems, Inc) — 2024-04-08 | Parameters: 7.0B - License: open | Type: model - AI model by SambaNova Systems, Inc - **SambaLingo-Thai-Chat-70B** (SambaNova Systems, Inc) — 2024-04-08 | Parameters: 70B - License: open | Type: model - AI model by SambaNova Systems, Inc - **ESM-AA** (Peking University,Nanjing University,Tsinghua University,PharMolix) — 2024-04-05 | Parameters: 35M - License: closed | Type: model - AI model by Peking University,Nanjing University,Tsinghua University,PharMolix - **Command R+** (Cohere,Cohere for AI) — 2024-04-04 | Parameters: 104B - License: open | Type: model - AI model by Cohere,Cohere for AI - **Viking** (Silo AI,University of Turku) — 2024-04-04 | Parameters: 33B - License: open | Type: model - AI model by Silo AI,University of Turku - **eFold** (Harvard Medical School,Stanford University,Columbia University,University of Strasbourg) — 2024-04-04 - License: closed | Type: model - AI model by Harvard Medical School,Stanford University,Columbia University,University of Strasbourg - **Sailor-7B-Chat** (Sea AI Lab,Singapore University of Technology & Design) — 2024-04-04 | Parameters: 7.7B - License: open | Type: model - AI model by Sea AI Lab,Singapore University of Technology & Design - **Universal-1** (AssemblyAI) — 2024-04-03 | Parameters: 600M - License: closed | Type: model - AI model by AssemblyAI - **Mixture-of-Depths** (Google DeepMind,McGill 
University,Mila - Quebec AI (originally Montreal Institute for Learning Algorithms)) — 2024-04-02 | Parameters: 3B - License: closed | Type: model - AI model by Google DeepMind,McGill University,Mila - Quebec AI (originally Montreal Institute for Learning Algorithms) - **POKÉLLMON** (Georgia Institute of Technology) — 2024-04-02 - License: closed | Type: model - AI model by Georgia Institute of Technology - **AutoDiff** (Galixir Technologies,Rensselaer Polytechnic Institute,Massachusetts Institute of Technology (MIT)) — 2024-04-02 - License: closed | Type: model - AI model by Galixir Technologies,Rensselaer Polytechnic Institute,Massachusetts Institute of Technology (MIT) - **XVERSE-MoE-A4.2B** (XVERSE Technology,Shenzhen Yuanxiang Technology) — 2024-04-02 | Parameters: 4.2B - License: open | Type: model - AI model by XVERSE Technology,Shenzhen Yuanxiang Technology - **APUS-xDAN-4.0(MoE)** (Qilin Hesheng Network Technology Co., Ltd. (APUS)) — 2024-04-02 | Parameters: 136B - License: open | Type: model - AI model by Qilin Hesheng Network Technology Co., Ltd. (APUS) - **Youyuanjian (邮远见)** (China Post Consumer Finance Co., Ltd.) — 2024-04-02 - License: closed | Type: model - AI model by China Post Consumer Finance Co., Ltd. - **TinyStories** (Microsoft) — 2024-04-01 | Parameters: TinyStories - License: open | Type: model - Precursor to phi. - **Tele-FLM** (BAAI) — 2024-04-01 | Parameters: Tele-FLM - License: open | Type: model - Also known as FLM-2. "We will open-source a 1T model checkpoint, namely Tele-FLM-1T, to advance further training and research." Discussion paper Jul/2024: https://arxiv.org/abs/2407.02783 - **Qwen-1.5 110B** (Alibaba) — 2024-04-01 | Parameters: Qwen-1.5 110B - License: open | Type: model - Worse performance on GPQA (72B=36.3, 110B=35.9). - **Arctic** (Snowflake AI Research) — 2024-04-01 | Parameters: Arctic - License: open | Type: model - "Arctic uses a unique Dense-MoE Hybrid transformer architecture.
It combines a 10B dense transformer model with a residual 128×3.66B MoE MLP resulting in 480B total and 17B active parameters chosen using a top-2 gating." - **SenseNova 5.0** (SenseTime) — 2024-04-01 | Parameters: SenseNova 5.0 - License: open | Type: model - GPT-4 scale; low media coverage; no demo available in the Western world. https://www.techinasia.com/sensetime-pauses-trading-stock-rises-30-model-launch - **OpenELM** (Apple) — 2024-04-01 | Parameters: OpenELM - License: open | Type: model - On-device model (laptop, phone). Open-source Efficient Language Models (OpenELM). https://venturebeat.com/ai/apple-releases-openelm-small-open-source-ai-models-designed-to-run-on-device/ - **phi-3-medium** (Microsoft) — 2024-04-01 | Parameters: phi-3-medium - License: open | Type: model - Preview only; benchmarks still being investigated as of May/2024. - **phi-3-mini** (Microsoft) — 2024-04-01 | Parameters: phi-3-mini - License: open | Type: model - "phi3-mini can be quantized to 4-bits so that it only occupies ≈ 1.8GB of memory. We tested the quantized model by deploying phi-3-mini on iPhone 14 with A16 Bionic chip running natively on-device and fully offline achieving more than 12 tokens per second." - **Llama 3 70B** (Meta AI) — 2024-04-01 | Parameters: Llama 3 70B - License: open | Type: model - Instruct MMLU-Pro=56.2 - **Zamba 7B** (Zyphra) — 2024-04-01 | Parameters: Zamba 7B - License: open | Type: model - Mamba1 - **HLAT** (Amazon) — 2024-04-01 | Parameters: HLAT - License: closed | Type: model - HLAT=High-quality LLM pre-trained on AWS Trainium. Same architecture as Llama 7B. Pre-training is performed on up to 64 Amazon EC2 trn1.32xlarge instances, totalling up to 1024 AWS Trainium accelerators. Read more about Trainium: https://www.aboutamazon.com/news/aws/what-you-need-to-know-about-the-aws-ai-chips-powering-amazons-partnership-with-anthropic - **Idefics2** (Hugging Face) — 2024-04-01 | Parameters: Idefics2 - License: open | Type: model - Clone of Flamingo now using Mistral 7B. 
Named after Asterix and Obelix's dog Idefix (Image-aware Decoder Enhanced à la Flamingo with Interleaved Cross-attentionS) - **Reka Core** (Reka AI) — 2024-04-01 | Parameters: Reka Core - License: open | Type: model - https://www.reka.ai/news/reka-core-our-frontier-class-multimodal-language-model - **WizardLM-2-8x22B** (Microsoft) — 2024-04-01 | Parameters: WizardLM-2-8x22B - License: open | Type: model - Base model = mistral-8x22b. - **Pile-T5** (EleutherAI) — 2024-04-01 | Parameters: Pile-T5 - License: open | Type: model - - **Zephyr 141B-A35B** (Hugging Face H4) — 2024-04-01 | Parameters: Zephyr 141B-A35B - License: open | Type: model - mixtral-8x22b finetune using Odds Ratio Preference Optimization (ORPO). - **Rerank 3** (Cohere) — 2024-04-01 | Parameters: Rerank 3 - License: open | Type: model - RAG + semantic search, possibly backed by Command-R+. - **gpt-4-turbo-2024-04-09** (OpenAI) — 2024-04-01 | Parameters: gpt-4-turbo-2024-04-09 - License: open | Type: model - This is such a significantly better model that I've added it here. This GPQA=46.5%, old GPT-4 GPQA=36%. https://twitter.com/EpochAIResearch/status/1778463039932584205 MMLU scores are unclear, but may have improved by 1%: https://twitter.com/OpenAI/status/1778602770784002136. Final benchmarks are here: https://archive.md/6Cc0Z - **MiniCPM-2.4B** (Tsinghua) — 2024-04-01 | Parameters: MiniCPM-2.4B - License: open | Type: model - MoE option=https://huggingface.co/openbmb/MiniCPM-MoE-8x2B - **Ferret-UI** (Apple) — 2024-04-01 | Parameters: Ferret-UI - License: open | Type: model - Vicuna base, multimodal. Extension of Ferret from Oct/2023. - **mixtral-8x22b** (Mistral) — 2024-04-01 | Parameters: mixtral-8x22b - License: open | Type: model - MoE=22Bx8, seq=65536. - **Sailor** (Sail) — 2024-04-01 | Parameters: Sailor - License: open | Type: model - SEA languages. Based on Qwen-1.5. 
https://github.com/sail-sg/sailor-llm "Generally Sailor models consume around 200B tokens, completing a full pass through the SailCraft corpus once. However, the Sailor-0.5B model undergoes training with 400B tokens, equivalent to 2 epochs." - **JetMoE-8B** (MIT) — 2024-04-01 | Parameters: JetMoE-8B - License: open | Type: model - - **Eurus** (Tsinghua) — 2024-04-01 | Parameters: Eurus - License: open | Type: model - Fine-tune of Mistral-7B and CodeLlama-70B. - **Command-R+** (Cohere) — 2024-04-01 | Parameters: Command-R+ - License: open | Type: model - Purpose-built to excel at real-world enterprise use cases. Announcement with no architecture details: https://txt.cohere.com/command-r-plus-microsoft-azure/ - **Viking** (Silo AI) — 2024-04-01 | Parameters: Viking - License: open | Type: model - 'Viking uses an architecture similar to Llama 2, with flash attention, rotary embeddings, grouped query attention and supports a 4k sequence length' - **OLMo-Bitnet-1B** (Nous Research) — 2024-04-01 | Parameters: OLMo-Bitnet-1B - License: open | Type: model - 1.58-bit quantized (ternary weights) means we can run a 70B model in ~14GB VRAM. 
See also BitNet b1.58 - **MobileCLIP-B (LT)** (Apple) — 2024-04-01 | Parameters: 149.7M - License: open | Type: model - AI model by Apple - **TeleChat-7B** (China Telecom) — 2024-04-01 | Parameters: 7B - License: open | Type: model - AI model by China Telecom - **TeleChat-3B** (China Telecom) — 2024-04-01 | Parameters: 3B - License: closed | Type: model - AI model by China Telecom - **TeleChat-12B** (China Telecom) — 2024-04-01 | Parameters: 12B - License: open | Type: model - AI model by China Telecom - **TW3-JRGL-v2** (French Engineering School ECE,TW3 Partners) — 2024-04-01 | Parameters: 72B - License: open | Type: model - AI model by French Engineering School ECE,TW3 Partners - **Le_Triomphant-ECE-TW3** (French Engineering School ECE,TW3 Partners) — 2024-04-01 | Parameters: 72B - License: open | Type: model - AI model by French Engineering School ECE,TW3 Partners - **ReALM** (Apple) — 2024-03-29 | Parameters: 3B - License: closed | Type: model - AI model by Apple - **Voice Engine** (OpenAI) — 2024-03-29 - License: closed | Type: model - AI model by OpenAI - **Grok-1.5** (xAI) — 2024-03-28 - License: closed | Type: model - AI model by xAI - **Grok-1.5V** (xAI) — 2024-03-28 - License: closed | Type: model - AI model by xAI - **YandexGPT 3** (Yandex) — 2024-03-28 - License: closed | Type: model - AI model by Yandex - **Jamba** (AI21 Labs) — 2024-03-28 | Parameters: 51.6B - License: open | Type: model - AI model by AI21 Labs - **DBRX** (Databricks) — 2024-03-27 | Parameters: 132B - License: open | Type: model - AI model by Databricks - **MultiVerse 70B** (MTS) — 2024-03-25 | Parameters: 72B - License: open | Type: model - AI model by MTS - **ProstT5** (Technical University of Munich,Seoul National University,Institute for Advanced Study,TUM School of Life Sciences Weihenstephan) — 2024-03-24 | Parameters: 3B - License: closed | Type: model - AI model by Technical University of Munich,Seoul National University,Institute for Advanced Study,TUM School of Life Sciences 
Weihenstephan - **CrossBind** (Shanghai AI Lab,Fudan University,Loughborough University,Chinese University of Hong Kong (CUHK),Shanghai Jiao Tong University) — 2024-03-24 - License: closed | Type: model - AI model by Shanghai AI Lab,Fudan University,Loughborough University,Chinese University of Hong Kong (CUHK),Shanghai Jiao Tong University - **BindDM** (Peng Cheng Laboratory,Peking University,University of Science and Technology of China (USTC),ByteDance,Tsinghua University) — 2024-03-24 - License: closed | Type: model - AI model by Peng Cheng Laboratory,Peking University,University of Science and Technology of China (USTC),ByteDance,Tsinghua University - **Suno v3** (Suno) — 2024-03-21 - License: closed | Type: model - AI model by Suno - **Xuanji Yuheng (璇玑玉衡)** (Zhuoshi Technology) — 2024-03-20 | Parameters: 100B - License: closed | Type: model - AI model by Zhuoshi Technology - **MiniGPT4 + LRV-Instruction** (University of Maryland,Microsoft) — 2024-03-19 | Parameters: 7B - License: open | Type: model - AI model by University of Maryland,Microsoft - **JetFire (GPT2-LARGE)** (Tsinghua University) — 2024-03-19 | Parameters: 774M - License: closed | Type: model - AI model by Tsinghua University - **Stable Video 3D (SV3D)** (Stability AI) — 2024-03-18 - License: open | Type: model - AI model by Stability AI - **ERNIE-RNA** (Microsoft Research,Syngentech,Tsinghua University) — 2024-03-17 | Parameters: 86M - License: closed | Type: model - AI model by Microsoft Research,Syngentech,Tsinghua University - **PocketVec** (Barcelona Institute of Science and Technology,Universitat de Barcelona,Institució Catalana de Recerca i Estudis Avançats (ICREA)) — 2024-03-16 - License: closed | Type: model - AI model by Barcelona Institute of Science and Technology,Universitat de Barcelona,Institució Catalana de Recerca i Estudis Avançats (ICREA) - **Xinghai (星海)** (Hisense) — 2024-03-15 - License: closed | Type: model - AI model by Hisense - **MM1-30B** (Apple) — 2024-03-14 | 
Parameters: 30B - License: closed | Type: model - AI model by Apple - **Quiet-STaR** (Stanford University) — 2024-03-14 - License: closed | Type: model - AI model by Stanford University - **ManiGaussian** (Tsinghua University,Nanyang Technological University,Carnegie Mellon University (CMU)) — 2024-03-13 - License: open | Type: model - AI model by Tsinghua University,Nanyang Technological University,Carnegie Mellon University (CMU) - **Recraft V2 (Recraft 20B)** (Recraft) — 2024-03-13 | Parameters: 20B - License: closed | Type: model - AI model by Recraft - **Command R** (Cohere,Cohere for AI) — 2024-03-11 | Parameters: 35B - License: open | Type: model - AI model by Cohere,Cohere for AI - **RFM-1** (Covariant) — 2024-03-11 | Parameters: 8B - License: closed | Type: model - AI model by Covariant - **Dream Home LLM (贝壳梦想家大模型)** (KE Holdings Inc. (“Beike”)) — 2024-03-11 - License: closed | Type: model - AI model by KE Holdings Inc. (“Beike”) - **HAM-TTS** (Geely Automobile Research Institute (Ningbo) Company,National Institute of Informatics,Shanghai Jiao Tong University) — 2024-03-09 | Parameters: 800M - License: closed | Type: model - AI model by Geely Automobile Research Institute (Ningbo) Company,National Institute of Informatics,Shanghai Jiao Tong University - **Derm Foundational Model** (Google Research) — 2024-03-08 | Parameters: 928M - License: open | Type: model - AI model by Google Research - **DeepSeek-VL-7B** (DeepSeek) — 2024-03-08 | Parameters: 7B - License: open | Type: model - AI model by DeepSeek - **DeepSeek-VL-1.3B** (DeepSeek) — 2024-03-08 | Parameters: 1.3B - License: open | Type: model - AI model by DeepSeek - **Inflection-2.5** (Inflection AI) — 2024-03-07 - License: closed | Type: model - AI model by Inflection AI - **BaseFold** (Basecamp Research) — 2024-03-06 - License: closed | Type: model - AI model by Basecamp Research - **GroundingGPT** (ByteDance,Fudan University) — 2024-03-05 | Parameters: 7B - License: open | Type: model - AI model by 
ByteDance,Fudan University - **Claude 3 Haiku** (Anthropic) — 2024-03-04 - License: closed | Type: model - AI model by Anthropic - **Claude 3 Sonnet** (Anthropic) — 2024-03-04 - License: closed | Type: model - AI model by Anthropic - **Claude 3 Opus** (Anthropic) — 2024-03-04 - License: closed | Type: model - AI model by Anthropic - **Aramco Metabrain AI** (Saudi Aramco) — 2024-03-04 | Parameters: 250B - License: closed | Type: model - AI model by Saudi Aramco - **Aurora-M** (International) — 2024-03-01 | Parameters: Aurora-M - License: open | Type: model - - **ReALM-3B** (Apple) — 2024-03-01 | Parameters: ReALM-3B - License: closed | Type: model - FLAN-T5 (Oct/2022) finetune. - **Qwen1.5-MoE-A2.7B** (Alibaba) — 2024-03-01 | Parameters: Qwen1.5-MoE-A2.7B - License: open | Type: model - MoE. "Of particular significance is the fact that, through upcycling, the necessity for training an equivalent volume of tokens as in the original model has been eliminated." I assumed half of the original 3T tokens - **Grok-1.5** (xAI) — 2024-03-01 | Parameters: Grok-1.5 - License: open | Type: model - Context=128k. - **Jamba 1** (AI21) — 2024-03-01 | Parameters: Jamba 1 - License: open | Type: model - MoE. Open weights, licensed under Apache 2.0. Announce: https://arxiv.org/abs/2403.19887 - **DBRX** (MosaicML) — 2024-03-01 | Parameters: DBRX - License: open | Type: model - MoE. Trained for $10M on 3,072 NVIDIA H100s connected by 3.2Tbps Infiniband. - **Stable Code Instruct 3B** (Stability AI) — 2024-03-01 | Parameters: Stable Code Instruct 3B - License: open | Type: model - Context window=16,384. Trained on The Stack dataset. - **EvoLLM-JP** (Sakana AI) — 2024-03-01 | Parameters: EvoLLM-JP - License: open | Type: model - Japanese. 
Model merge: 'our EvoLLM-JP-A is a merge of shisa-gamma-7b-v1, Arithmo2-Mistral-7B, and Abel7B-002' https://sakana.ai/evolutionary-model-merge/ - **RakutenAI-7B** (Rakuten Group) — 2024-03-01 | Parameters: RakutenAI-7B - License: open | Type: model - Japanese. Mistral 7B derivative. - **Parakeet** (Independent) — 2024-03-01 | Parameters: Parakeet - License: open | Type: model - Tiny model (378M) for testing - **RWKV-v5 EagleX** (RWKV) — 2024-03-01 | Parameters: RWKV-v5 EagleX - License: open | Type: model - RWKV (pronounced RwaKuv) is an RNN, built on the RWKV-v5 architecture (a linear transformer with 10-100x+ lower inference cost) - **MM1** (Apple) — 2024-03-01 | Parameters: MM1 - License: closed | Type: model - VLM, outperforms Flamingo 80B (Apr/2022) across benchmarks. 2T text tokens + ~10B+ other text (estimate). Unreleased. - **RFM-1** (Covariant) — 2024-03-01 | Parameters: RFM-1 - License: partial | Type: model - Commercial, multimodal for robotics - **Command-R** (Cohere) — 2024-03-01 | Parameters: Command-R - License: open | Type: model - RAG and tool use - **DeepSeek-VL** (DeepSeek-AI) — 2024-03-01 | Parameters: DeepSeek-VL - License: open | Type: model - Vision, based on DeepSeek-LLM-7B - **AnyGPT** (Fudan University) — 2024-03-01 | Parameters: AnyGPT - License: open | Type: model - Llama 2 7B backbone with new matrices ('reshaping the embedding matrix and prediction layer') - **Stable Beluga 2.5** (Stability AI) — 2024-03-01 | Parameters: Stable Beluga 2.5 - License: open | Type: model - Mentioned in Stability release about Intel chips 11/Mar/2024, availability unknown - **Inflection-2.5** (Inflection AI) — 2024-03-01 | Parameters: Inflection-2.5 - License: open | Type: model - - **Apollo** (SRIBD/CUHK) — 2024-03-01 | Parameters: Apollo - License: open | Type: model - Qwen 1.8B as base. Medical focus. - **Claude 3 Opus** (Anthropic) — 2024-03-01 | Parameters: Claude 3 Opus - License: open | Type: model - Original MMLU=86.8 (GPT-4=86.4). 
MMLU=88.2 with CoT prompting. Original GPQA=50.4. 200k context, 1M for researchers. - **MACE-MP-0** (University of Cambridge,Federal Institute of Materials Research and Testing (BAM),NERSC, Lawrence Berkeley National Laboratory,University of British Columbia (UBC),Friedrich Schiller University Jena,University of Bayreuth,Fritz Haber Institute of the Max Planck Society,U. S. Naval Research Laboratory,Chemix,Daresbury Laboratory,BASF,University of South Carolina,University of Stuttgart,Uppsala University,Newcastle University,Technical University of Denmark,Aix-Marseille Université,University of Warwick,University of California Los Angeles (UCLA),InstaDeep,University of California (UC) Berkeley) — 2024-03-01 - License: open | Type: model - AI model by University of Cambridge,Federal Institute of Materials Research and Testing (BAM),NERSC, Lawrence Berkeley National Laboratory,University of British Columbia (UBC),Friedrich Schiller University Jena,University of Bayreuth,Fritz Haber Institute of the Max Planck Society,U. S. 
Naval Research Laboratory,Chemix,Daresbury Laboratory,BASF,University of South Carolina,University of Stuttgart,Uppsala University,Newcastle University,Technical University of Denmark,Aix-Marseille Université,University of Warwick,University of California Los Angeles (UCLA),InstaDeep,University of California (UC) Berkeley - **Step-1X** (StepFun) — 2024-03-01 - License: closed | Type: model - AI model by StepFun - **Step-1.5V** (StepFun) — 2024-03-01 | Parameters: 100B - License: closed | Type: model - AI model by StepFun - **Step-2** (StepFun) — 2024-03-01 | Parameters: 1T - License: closed | Type: model - AI model by StepFun - **Ovis2 16B** (Alibaba) — 2024-03-01 | Parameters: 16B - License: closed | Type: model - AI model by Alibaba - **Ovis2 34B** (Alibaba) — 2024-03-01 | Parameters: 34B - License: closed | Type: model - AI model by Alibaba - **StarCoder 2 15B** (Hugging Face,ServiceNow,NVIDIA,BigCode) — 2024-02-29 | Parameters: 15B - License: open | Type: model - AI model by Hugging Face,ServiceNow,NVIDIA,BigCode - **StarCoder 2 7B** (Hugging Face,ServiceNow,NVIDIA,BigCode) — 2024-02-29 | Parameters: 7B - License: open | Type: model - AI model by Hugging Face,ServiceNow,NVIDIA,BigCode - **StarCoder 2 3B** (Hugging Face,ServiceNow,NVIDIA,BigCode) — 2024-02-29 | Parameters: 3B - License: open | Type: model - AI model by Hugging Face,ServiceNow,NVIDIA,BigCode - **Griffin** (Google DeepMind) — 2024-02-29 | Parameters: 14B - License: closed | Type: model - AI model by Google DeepMind - **Hawk** (Google DeepMind) — 2024-02-29 | Parameters: 7B - License: closed | Type: model - AI model by Google DeepMind - **Humanoid Locomotion** (University of California (UC) Berkeley) — 2024-02-29 | Parameters: 8M - License: closed | Type: model - AI model by University of California (UC) Berkeley - **YOLOv9-E** (Academia Sinica,National Taipei University of Technology,Chung Yuan Christian University) — 2024-02-29 | Parameters: 57.3M - License: open | Type: model - AI model by 
Academia Sinica,National Taipei University of Technology,Chung Yuan Christian University - **RiNALMo** (University of Zagreb,Genome Institute of Singapore,Bioinformatics Institute) — 2024-02-29 | Parameters: 650M - License: closed | Type: model - AI model by University of Zagreb,Genome Institute of Singapore,Bioinformatics Institute - **PTM-Mamba** (Duke University) — 2024-02-29 - License: open | Type: model - AI model by Duke University - **Protllm** (Beijing Institute of Technology,Beihang University,Peking University,Smart Grid Research Institute,Shanghai AI Lab) — 2024-02-28 - License: closed | Type: model - AI model by Beijing Institute of Technology,Beihang University,Peking University,Smart Grid Research Institute,Shanghai AI Lab - **Ideogram 1.0** (Ideogram) — 2024-02-28 - License: closed | Type: model - AI model by Ideogram - **Evo** (Stanford University,University of California (UC) Berkeley,Together) — 2024-02-27 | Parameters: 7B - License: open | Type: model - AI model by Stanford University,University of California (UC) Berkeley,Together - **BitNet b1.58** (University of Chinese Academy of Sciences,Microsoft Research) — 2024-02-27 | Parameters: 70B - License: closed | Type: model - AI model by University of Chinese Academy of Sciences,Microsoft Research - **Nemotron-4 15B** (NVIDIA) — 2024-02-27 | Parameters: 15B - License: closed | Type: model - AI model by NVIDIA - **Palmyra Vision** (Writer) — 2024-02-27 - License: closed | Type: model - AI model by Writer - **Playground v2.5** (Playground) — 2024-02-27 | Parameters: 3.5B - License: open | Type: model - AI model by Playground - **Mistral Large** (Mistral AI) — 2024-02-26 - License: closed | Type: model - AI model by Mistral AI - **ProLLaMA** (Peking University,Peng Cheng Laboratory) — 2024-02-26 | Parameters: 7B - License: closed | Type: model - AI model by Peking University,Peng Cheng Laboratory - **DecompDiff** (University of Illinois Urbana-Champaign (UIUC),ByteDance,University of Chinese Academy 
of Sciences,Chinese Academy of Sciences,Tsinghua University) — 2024-02-26 - License: closed | Type: model - AI model by University of Illinois Urbana-Champaign (UIUC),ByteDance,University of Chinese Academy of Sciences,Chinese Academy of Sciences,Tsinghua University - **Gemma 1.1 7B Instruct** (Google) — 2024-02-24 | Parameters: 8.5B - License: open | Type: model - AI model by Google - **SDXL-Lightning** (ByteDance) — 2024-02-24 - License: open | Type: model - AI model by ByteDance - **MegaScale (175B)** (ByteDance,Peking University) — 2024-02-23 | Parameters: 175B - License: closed | Type: model - AI model by ByteDance,Peking University - **MegaScale (530B)** (ByteDance,Peking University) — 2024-02-23 | Parameters: 530B - License: closed | Type: model - AI model by ByteDance,Peking University - **MegaScale (Production)** (ByteDance,Peking University) — 2024-02-23 | Parameters: 530B - License: closed | Type: model - AI model by ByteDance,Peking University - **Genie** (Google DeepMind) — 2024-02-23 | Parameters: 10.7B - License: closed | Type: model - AI model by Google DeepMind - **Stable Diffusion 3** (Stability AI) — 2024-02-22 | Parameters: 8B - License: closed | Type: model - AI model by Stability AI - **Gemma 7B** (Google DeepMind) — 2024-02-21 | Parameters: 8.5B - License: open | Type: model - AI model by Google DeepMind - **Re-Dock** (Zhejiang University (ZJU),Westlake University,University of Washington) — 2024-02-21 - License: closed | Type: model - AI model by Zhejiang University (ZJU),Westlake University,University of Washington - **PepGLAD** (Tsinghua University,Renmin University of China) — 2024-02-21 - License: closed | Type: model - AI model by Tsinghua University,Renmin University of China - **Reinvent 4** (AstraZeneca) — 2024-02-21 - License: open | Type: model - AI model by AstraZeneca - **Gemma 2B** (Google DeepMind) — 2024-02-21 | Parameters: 2.5B - License: open | Type: model - AI model by Google DeepMind - **Me Llama 70B** (Yale School of 
Medicine,University of Florida,University of Texas Health Science Center) — 2024-02-20 | Parameters: 70B - License: open | Type: model - AI model by Yale School of Medicine,University of Florida,University of Texas Health Science Center - **Me Llama 13B** (Yale School of Medicine,University of Florida,University of Texas Health Science Center) — 2024-02-20 | Parameters: 13B - License: open | Type: model - AI model by Yale School of Medicine,University of Florida,University of Texas Health Science Center - **Sora** (OpenAI) — 2024-02-15 - License: closed | Type: model - AI model by OpenAI - **Gemini 1.5 Pro** (Google DeepMind) — 2024-02-15 - License: closed | Type: model - AI model by Google DeepMind - **ProtChatGPT** (University of Technology Sydney,Zhejiang University (ZJU)) — 2024-02-15 | Parameters: 8B - License: closed | Type: model - AI model by University of Technology Sydney,Zhejiang University (ZJU) - **Gemini 1.0 Pro Vision** (Google DeepMind) — 2024-02-15 - License: closed | Type: model - AI model by Google DeepMind - **V-JEPA** (Meta AI) — 2024-02-15 | Parameters: 630M - License: closed | Type: model - AI model by Meta AI - **KwaiYii 175B** (Kuaishou Technology) — 2024-02-14 | Parameters: 175B - License: closed | Type: model - AI model by Kuaishou Technology - **Aya** (Cohere for AI,Brown University,Cohere,Carnegie Mellon University (CMU),Massachusetts Institute of Technology (MIT)) — 2024-02-12 | Parameters: 13B - License: open | Type: model - AI model by Cohere for AI,Brown University,Cohere,Carnegie Mellon University (CMU),Massachusetts Institute of Technology (MIT) - **PLAPT** (Wolfram Research,ASC27,Newport High School,Sanskriti School) — 2024-02-12 | Parameters: 1.5M - License: closed | Type: model - AI model by Wolfram Research,ASC27,Newport High School,Sanskriti School - **Stable Cascade** (Stability AI) — 2024-02-12 | Parameters: 5.1B - License: open | Type: model - AI model by Stability AI - **DiscDiff** (Imperial College London) — 2024-02-08 - 
License: closed | Type: model - AI model by Imperial College London - **Distilled Grandmaster** (DeepMind) — 2024-02-07 | Parameters: 270M - License: open | Type: model - AI model by DeepMind - **Structure-Informed Protein Language Model** (Mila - Quebec AI (originally Montreal Institute for Learning Algorithms),University of Montreal / Université de Montréal,IBM Research,HEC Montreal,CIFAR AI Research) — 2024-02-07 | Parameters: 650M - License: closed | Type: model - AI model by Mila - Quebec AI (originally Montreal Institute for Learning Algorithms),University of Montreal / Université de Montréal,IBM Research,HEC Montreal,CIFAR AI Research - **RecGPT / Xingchen LLM (淘宝(中国)软件有限公司)** (Alibaba) — 2024-02-07 | Parameters: 50B - License: closed | Type: model - AI model by Alibaba - **StableLM-2-12B** (Stability AI) — 2024-02-07 | Parameters: 12.1B - License: closed | Type: model - AI model by Stability AI - **CARP** (Microsoft Research) — 2024-02-06 | Parameters: 643M - License: open | Type: model - AI model by Microsoft Research - **SenseChat-DataAnalysis V4** (SenseTime) — 2024-02-06 - License: closed | Type: model - AI model by SenseTime - **SenseMirage V4** (SenseTime) — 2024-02-06 | Parameters: 10B - License: closed | Type: model - AI model by SenseTime - **SenseChat-Medical V4** (SenseTime) — 2024-02-06 - License: closed | Type: model - AI model by SenseTime - **SenseChat-Vision V4** (SenseTime) — 2024-02-06 | Parameters: 30B - License: closed | Type: model - AI model by SenseTime - **SenseChat 4.0** (SenseTime) — 2024-02-06 - License: closed | Type: model - AI model by SenseTime - **Qwen Plus** (Alibaba) — 2024-02-06 - License: closed | Type: model - AI model by Alibaba - **Qwen-Turbo** (Alibaba) — 2024-02-06 - License: closed | Type: model - AI model by Alibaba - **CausalLM 34B β** (CausalLM) — 2024-02-06 | Parameters: 34.4B - License: open | Type: model - AI model by CausalLM - **Qwen1.5-32B** (Alibaba) — 2024-02-05 | Parameters: 32B - License: open | Type: 
model - AI model by Alibaba - **DeepSeekMath 7B** (DeepSeek,Tsinghua University,Peking University) — 2024-02-05 | Parameters: 7B - License: open | Type: model - AI model by DeepSeek,Tsinghua University,Peking University - **Gemini 2.0 Flash-Lite** (Google DeepMind) — 2024-02-05 - License: closed | Type: model - AI model by Google DeepMind - **Qwen1.5-72B** (Alibaba) — 2024-02-04 | Parameters: 72B - License: open | Type: model - AI model by Alibaba - **Qwen1.5-7B** (Alibaba) — 2024-02-04 | Parameters: 7B - License: open | Type: model - AI model by Alibaba - **Qwen1.5-14B** (Alibaba) — 2024-02-04 | Parameters: 14B - License: open | Type: model - AI model by Alibaba - **Nemotron-4 15B** (NVIDIA) — 2024-02-01 | Parameters: Nemotron-4 15B - License: open | Type: model - - **TowerLLM** (Unbabel) — 2024-02-01 | Parameters: TowerLLM - License: open | Type: model - Commercial product, Llama-2 as base. - **Hawk** (Google DeepMind) — 2024-02-01 | Parameters: Hawk - License: open | Type: model - MMLU=35. RNN. - **Griffin** (Google DeepMind) — 2024-02-01 | Parameters: Griffin - License: open | Type: model - MMLU=49.5. RNN. - **BitNet b1.58** (Microsoft) — 2024-02-01 | Parameters: BitNet b1.58 - License: open | Type: model - - **Samba-1** (SambaNova) — 2024-02-01 | Parameters: Samba-1 - License: partial | Type: model - CoE: Collection of experts: Llama2 7B / 13B / 70B Mistral 7B DeepSeek Coder 1.3B / 6.7B / 33B Falcon 40B DePlot CLIP Llava - **Aya-101** (Cohere) — 2024-02-01 | Parameters: Aya-101 - License: open | Type: model - mT5 base. - **Cosmo-1B** (HF) — 2024-02-01 | Parameters: Cosmo-1B - License: open | Type: model - Synthetic data (25B tokens of synthetic data for 6 epochs + code). MMLU=32.4 - **Poro** (Silo AI) — 2024-02-01 | Parameters: Poro - License: open | Type: model - Uses a BLOOM architecture with ALiBi embeddings to allow for context window extrapolation. 
While model architecture for the initial model has been kept simple, future models under progress will support additional capabilities, such as flash attention, rotary embeddings and grouped query attention. - **StarCoder 2** (HF/ServiceNow) — 2024-02-01 | Parameters: StarCoder 2 - License: open | Type: model - The Stack v2=900B tokens, 5 epochs to 4.3T tokens - **530B** (ByteDance) — 2024-02-01 | Parameters: 530B - License: closed | Type: model - Trained using 12,288 A100 GPUs, replicating MT-NLG size - **175B** (ByteDance) — 2024-02-01 | Parameters: 175B - License: closed | Type: model - Trained using 12,288 A100 GPUs, replicating GPT-3 size - **Mistral Small** (Mistral) — 2024-02-01 | Parameters: Mistral Small - License: open | Type: model - Optimised for latency and cost. - **Mistral Large** (Mistral) — 2024-02-01 | Parameters: Mistral Large - License: open | Type: model - MMLU=81.2 (same as Flan-PaLM 2 340B, higher than PaLM 2 340B MMLU=78.3), 32k context window. API only (not open source). - **Hanooman** (Reliance) — 2024-02-01 | Parameters: Hanooman - License: open | Type: model - 11 Indian languages, including Hindi, Tamil, and Marathi - **Ask** (Apple) — 2024-02-01 | Parameters: Ask - License: closed | Type: model - Internal employee model only - **Reka Edge** (Reka AI) — 2024-02-01 | Parameters: Reka Edge - License: open | Type: model - - **Reka Flash** (Reka AI) — 2024-02-01 | Parameters: Reka Flash - License: open | Type: model - My testing shows very poor performance, equivalent to a tiny model - **Gemma** (Google DeepMind) — 2024-02-01 | Parameters: Gemma - License: open | Type: model - MMLU=64.3 (Llama 2 70B=68.9, ChatGPT 20B=70). Text only. Probably dense. Largest trained dataset (6T) besides frontier models. - **Gemini 1.5 Pro** (Google DeepMind) — 2024-02-01 | Parameters: Gemini 1.5 Pro - License: open | Type: model - Sparse MoE. Context window=1M and 10M for research. Note: Gemini outputs are watermarked. I do not use GDM models. 
https://lifearchitect.ai/watermarking/ - **Qwen-1.5 72B** (Alibaba) — 2024-02-01 | Parameters: Qwen-1.5 72B - License: open | Type: model - - **MobileLLM** (Meta AI) — 2024-02-01 | Parameters: MobileLLM - License: open | Type: model - Optimizing Sub-billion Parameter Language Models for On-Device Use Cases - **GOODY-2** (BRAIN) — 2024-02-01 | Parameters: GOODY-2 - License: open | Type: model - Satire (and hilarious). Probably Llama 2 with aggressive prompt. Wired interview: https://archive.md/toxHq - **Natural-SQL-7B** (ChatDB) — 2024-02-01 | Parameters: Natural-SQL-7B - License: open | Type: model - Based on DeepSeek-Coder 6.7B. - **Sea-Lion** (AI Singapore) — 2024-02-01 | Parameters: Sea-Lion - License: open | Type: model - MPT base. MMLU=26.87. Southeast Asian languages like Thai, Vietnamese and Bahasa Indonesia. https://www.computerweekly.com/feature/Sea-Lion-explained-Southeast-Asias-first-large-language-model - **TimesFM** (Google) — 2024-02-01 | Parameters: TimesFM - License: open | Type: model - Time-series forecasting only. 'a large pretraining corpus of 100B real world time-points' may be more than 100B tokens. - **OLMo** (Allen AI) — 2024-02-01 | Parameters: OLMo - License: open | Type: model - Open Language Model (OLMo) - **Audio Flamingo** (NVIDIA) — 2024-02-01 | Parameters: Audio Flamingo - License: partial | Type: model - Project page: https://audioflamingo.github.io/ - **OLMo-7B** (Allen Institute for AI,University of Washington) — 2024-02-01 | Parameters: 7B - License: open | Type: model - AI model by Allen Institute for AI,University of Washington - **OLMo-1B** (Allen Institute for AI,University of Washington) — 2024-02-01 | Parameters: 1B - License: open | Type: model - AI model by Allen Institute for AI,University of Washington - **Hanhai (瀚海)** (Shuchi Information Technology ( Shanghai ) Co. , Ltd.) — 2024-01-31 - License: closed | Type: model - AI model by Shuchi Information Technology ( Shanghai ) Co. , Ltd. 
- **LLaVA-NeXT-34B (LLaVA-1.6)** (University of Wisconsin Madison,ByteDance,Nanyang Technological University,University of California (UC) Berkeley) — 2024-01-30 | Parameters: 34.8B - License: open | Type: model - AI model by University of Wisconsin Madison,ByteDance,Nanyang Technological University,University of California (UC) Berkeley - **BGE-M3 Embedding** (Beijing Academy of Artificial Intelligence / BAAI) — 2024-01-30 | Parameters: 335M - License: open | Type: model - AI model by Beijing Academy of Artificial Intelligence / BAAI - **Code Llama-70B** (Meta AI) — 2024-01-29 | Parameters: 70B - License: open | Type: model - AI model by Meta AI - **Baichuan3** (Baichuan) — 2024-01-29 | Parameters: 100B - License: closed | Type: model - AI model by Baichuan - **Karakuri LM** (KARAKURI Inc.) — 2024-01-26 | Parameters: 70B - License: open | Type: model - AI model by KARAKURI Inc. - **ProteinStructureTransformer** (Max Planck Institute of Biochemistry) — 2024-01-26 | Parameters: 1.1B - License: closed | Type: model - AI model by Max Planck Institute of Biochemistry - **Yan** (Rock AI / Shanghai Stonehill Technology) — 2024-01-26 - License: closed | Type: model - AI model by Rock AI / Shanghai Stonehill Technology - **DeepSeek Coder 33B** (DeepSeek,Peking University) — 2024-01-25 | Parameters: 33B - License: open | Type: model - AI model by DeepSeek,Peking University - **Qwen-VL-Max** (Alibaba) — 2024-01-25 | Parameters: 7B - License: closed | Type: model - AI model by Alibaba - **text-embedding-3-small** (OpenAI) — 2024-01-25 - License: closed | Type: model - AI model by OpenAI - **text-embedding-3-large** (OpenAI) — 2024-01-25 - License: closed | Type: model - AI model by OpenAI - **DeepSeek Coder 1.3B** (DeepSeek,Peking University) — 2024-01-25 | Parameters: 1.3B - License: open | Type: model - AI model by DeepSeek,Peking University - **DeepSeek Coder 6.7B** (DeepSeek,Peking University) — 2024-01-25 | Parameters: 6.7B - License: open | Type: model - AI model by 
DeepSeek,Peking University - **Fuyu-Heavy** (Adept) — 2024-01-24 | Parameters: 100B - License: closed | Type: model - AI model by Adept - **Lumiere** (Google Research,Weizmann Institute of Science,Tel Aviv University,Technion - Israel Institute of Technology) — 2024-01-23 - License: closed | Type: model - AI model by Google Research,Weizmann Institute of Science,Tel Aviv University,Technion - Israel Institute of Technology - **Yi-VL-34B** (01.AI) — 2024-01-23 | Parameters: 34B - License: open | Type: model - AI model by 01.AI - **voyage-code-2** (Voyage AI) — 2024-01-23 - License: closed | Type: model - AI model by Voyage AI - **Prothyena** (Tokyo Institute of Technology) — 2024-01-22 | Parameters: 4.3M - License: closed | Type: model - AI model by Tokyo Institute of Technology - **Orion Star (猎户星空大模型)** (Beijing OrionStar Technology Co., Ltd.) — 2024-01-21 | Parameters: 14B - License: open | Type: model - AI model by Beijing OrionStar Technology Co., Ltd. - **StableLM-2-1.6B** (Stability AI) — 2024-01-18 | Parameters: 1.6B - License: open | Type: model - AI model by Stability AI - **ESM-NBR** (Hunan University) — 2024-01-18 - License: closed | Type: model - AI model by Hunan University - **AlphaGeometry** (Google DeepMind,New York University (NYU)) — 2024-01-17 | Parameters: 151M - License: open | Type: model - AI model by Google DeepMind,New York University (NYU) - **GLM-4 (0116)** (Z.ai (Zhipu AI)) — 2024-01-17 - License: closed | Type: model - AI model by Z.ai (Zhipu AI) - **GLM-4** (Z.ai (Zhipu AI)) — 2024-01-17 - License: closed | Type: model - AI model by Z.ai (Zhipu AI) - **VideoCrafter2** (Tencent) — 2024-01-17 - License: open | Type: model - AI model by Tencent - **Skiff LLM (一叶轻舟大语言模型)** (Shiyin Intelligent Technology Co., Ltd.) — 2024-01-17 | Parameters: 540B - License: closed | Type: model - AI model by Shiyin Intelligent Technology Co., Ltd. - **Rubik (Rubik魔方大模型)** (Thunder Software Technology Co.,Ltd.) 
— 2024-01-17 | Parameters: 13B - License: closed | Type: model - AI model by Thunder Software Technology Co.,Ltd. - **ProteinINR** (Kakao) — 2024-01-16 - License: closed | Type: model - AI model by Kakao - **DeciCoder-6B** (Deci AI) — 2024-01-15 | Parameters: 6B - License: open | Type: model - AI model by Deci AI - **OmniNA** (Tianjin Medical University) — 2024-01-15 | Parameters: 1.7B - License: closed | Type: model - AI model by Tianjin Medical University - **InternViT-6B** (Shanghai AI Lab,Nanjing University,The University of Hong Kong,Tsinghua University,SenseTime,University of Science and Technology of China (USTC)) — 2024-01-15 | Parameters: 6B - License: closed | Type: model - AI model by Shanghai AI Lab,Nanjing University,The University of Hong Kong,Tsinghua University,SenseTime,University of Science and Technology of China (USTC) - **Yiye Qingzhou-0.7B** (EFFYIC (识因智能)) — 2024-01-15 | Parameters: 700M - License: closed | Type: model - AI model by EFFYIC (识因智能) - **Yiye Qingzhou-45B** (EFFYIC (识因智能)) — 2024-01-15 | Parameters: 45B - License: closed | Type: model - AI model by EFFYIC (识因智能) - **InternVL** (Shanghai AI Lab,Nanjing University,The University of Hong Kong,Tsinghua University,SenseTime,University of Science and Technology of China (USTC)) — 2024-01-15 | Parameters: 14B - License: open | Type: model - AI model by Shanghai AI Lab,Nanjing University,The University of Hong Kong,Tsinghua University,SenseTime,University of Science and Technology of China (USTC) - **InternLM2-20B** (Shanghai AI Lab,SenseTime,Chinese University of Hong Kong (CUHK),Fudan University) — 2024-01-12 | Parameters: 20B - License: open | Type: model - AI model by Shanghai AI Lab,SenseTime,Chinese University of Hong Kong (CUHK),Fudan University - **DeepSeekMoE-16B** (DeepSeek) — 2024-01-11 | Parameters: 16B - License: open | Type: model - AI model by DeepSeek - **YOLOv8x** (Ultralytics) — 2024-01-10 | Parameters: 68.2M - License: open | Type: model - AI model by Ultralytics - 
**Magic LLM (魔法大模型)** (Shenzhen Honor Software Technology) — 2024-01-10 | Parameters: 7B - License: closed | Type: model - AI model by Shenzhen Honor Software Technology - **MAGNeT** (Meta AI,Hebrew University of Jerusalem,Kyutai) — 2024-01-09 | Parameters: 1.5B - License: open | Type: model - AI model by Meta AI,Hebrew University of Jerusalem,Kyutai - **Stable Code 3B** (Stability AI) — 2024-01-09 | Parameters: 2.8B - License: open | Type: model - AI model by Stability AI - **FABind** (Renmin University of China,Huazhong University of Science and Technology,Microsoft Research AI for Science,University of Science and Technology of China (USTC)) — 2024-01-09 - License: closed | Type: model - AI model by Renmin University of China,Huazhong University of Science and Technology,Microsoft Research AI for Science,University of Science and Technology of China (USTC) - **Improved motif-scaffolding with SE(3) flow matching** (University of Oxford,Massachusetts Institute of Technology (MIT),Microsoft Research AI for Science) — 2024-01-08 | Parameters: 16.8M - License: closed | Type: model - AI model by University of Oxford,Massachusetts Institute of Technology (MIT),Microsoft Research AI for Science - **DeepSeek LLM 67B** (DeepSeek) — 2024-01-05 | Parameters: 67B - License: open | Type: model - AI model by DeepSeek - **DeepSeek-LLM-1.3b-base** (DeepSeek) — 2024-01-05 | Parameters: 1.3B - License: closed | Type: model - AI model by DeepSeek - **DeepSeek LLM 7B** (DeepSeek) — 2024-01-05 | Parameters: 7B - License: open | Type: model - AI model by DeepSeek - **Xingrui AI (星睿AI)** (Geely Automobile Research Institute (Ningbo) Company) — 2024-01-05 | Parameters: 100B - License: closed | Type: model - AI model by Geely Automobile Research Institute (Ningbo) Company - **RT-1 + AutoRT** (Google DeepMind) — 2024-01-04 | Parameters: 35M - License: closed | Type: model - AI model by Google DeepMind - **babbage-002** (OpenAI) — 2024-01-04 - License: closed | Type: model - AI model by 
OpenAI - **CLAPE-DB** (Tsinghua University) — 2024-01-03 - License: closed | Type: model - AI model by Tsinghua University - **PLLaMa** (University of California Santa Barbara (UCSB),University of Lincoln,Chinese Academy of Agricultural Sciences,Swedish University of Agricultural Sciences) — 2024-01-03 | Parameters: 13B - License: open | Type: model - AI model by University of California Santa Barbara (UCSB),University of Lincoln,Chinese Academy of Agricultural Sciences,Swedish University of Agricultural Sciences - **FLOR-6.3B** (Cerebras) — 2024-01-01 | Parameters: FLOR-6.3B - License: open | Type: model - Spanish, Catalan. Bloom-7.1B (341B tok) + continued pre-training on 140B tok. Trained on Cerebras hardware. - **Weaver** (AIWaves.cn) — 2024-01-01 | Parameters: Weaver - License: open | Type: model - Llama? 'All Weaver models are initialized from powerful open-source LLMs.' English waitlist: https://www.wawawriter.com/en/ - **miqu 70b** (Mistral) — 2024-01-01 | Parameters: miqu 70b - License: open | Type: model - Leaked, proper version soon: https://venturebeat.com/ai/mistral-ceo-confirms-leak-of-new-open-source-ai-model-nearing-gpt-4-performance/ - **iFlytekSpark-13B** (iFlyTek) — 2024-01-01 | Parameters: iFlytekSpark-13B - License: open | Type: model - 'pre-trained on a massive high-quality data set with a total of more than 3 trillion tokens, and then fine-tuned on fine-tuned diversified alignment data.' - **Xinghuo 3.5 (Spark)** (iFlyTek) — 2024-01-01 | Parameters: Xinghuo 3.5 (Spark) - License: open | Type: model - GPT-4 competitor. https://www.shine.cn/biz/tech/2401304331/ - **MGIE** (Apple) — 2024-01-01 | Parameters: MGIE - License: open | Type: model - MLLM and diffusion model initialized from LLaVA-7B (Llama 2 + Vicuna) + StableDiffusion-v1.5. - **CodeLlama-70B** (Meta AI) — 2024-01-01 | Parameters: CodeLlama-70B - License: open | Type: model - Paper link is to 34B from Aug/2023. This 70B model finished training Jan/2024.
- **RWKV-v5 Eagle 7B** (RWKV) — 2024-01-01 | Parameters: RWKV-v5 Eagle 7B - License: open | Type: model - RWKV (pronounced RwaKuv) is an RNN: Built on the RWKV-v5 architecture (a linear transformer with 10-100x+ lower inference cost), Trained on 1.1 Trillion Tokens across 100+ languages. Original paper: https://arxiv.org/abs/2305.13048 - **MaLA-500** (LMU) — 2024-01-01 | Parameters: MaLA-500 - License: open | Type: model - Extends Llama 2 7B to 10B using 534 languages. - **MambaByte** (Cornell) — 2024-01-01 | Parameters: MambaByte - License: closed | Type: model - Used bytes instead of tokens. 4 bytes≈1 token, so 150B bytes≈37.5B tokens - **DeepSeek-Coder** (DeepSeek-AI) — 2024-01-01 | Parameters: DeepSeek-Coder - License: open | Type: model - 'surpasses existing closed-source models like Codex and GPT-3.5... permissive license that allows for both research and unrestricted commercial use.' - **FuseLLM** (Tencent) — 2024-01-01 | Parameters: FuseLLM - License: open | Type: model - Fusion of Llama-2-7B (2T tok), OpenLLaMA-7B (2T tok), and MPT-7B (1T tok). - **Fuyu-Heavy** (Adept) — 2024-01-01 | Parameters: Fuyu-Heavy - License: partial | Type: model - 'Fuyu-Heavy is the world’s third-most-capable multimodal model, behind only GPT4-V and Gemini Ultra, which are 10-20 times bigger.' Token estimate is based on Adept Persimmon-8B using many more tokens. - **Orion-14B** (OrionStar) — 2024-01-01 | Parameters: Orion-14B - License: open | Type: model - English, Chinese, Japanese, Korean, and other languages. - **InternLM2** (Shanghai AI Laboratory/SenseTime) — 2024-01-01 | Parameters: InternLM2 - License: open | Type: model - - **GLM-4** (Zhipu AI (Tsinghua)) — 2024-01-01 | Parameters: GLM-4 - License: open | Type: model - Best Chinese model to date based on analysis. Follows OpenAI roadmap. MMLU=81.5.
'hundreds of billions of parameters' https://www.chatglm.cn/ - **DeepSeekMoE** (DeepSeek-AI) — 2024-01-01 | Parameters: DeepSeekMoE - License: closed | Type: model - MoE activated parameters is 10-15% of dense, so I need to rethink ALScore for MoE. 'preliminary efforts to scale up DeepSeekMoE to 145B' - **DeepSeek** (DeepSeek-AI) — 2024-01-01 | Parameters: DeepSeek - License: open | Type: model - Chinese/English. Outperforms Llama 2. MMLU=71.3 outperforms GPT-3.5. - **LLaMA Pro** (Tencent) — 2024-01-01 | Parameters: LLaMA Pro - License: open | Type: model - We pre-train LLAMA PRO’s expanded blocks on 80B tokens using open-source code and math data for 2830 GPU Hours (16 NVIDIA H800 GPUs for about 7 days). - **Palmyra X** (Writer) — 2024-01-01 | Parameters: Palmyra X - License: open | Type: model - Palmyra X V2, Palmyra X V3, Palmyra X V4. https://venturebeat.com/ai/why-writers-palmyra-llm-is-the-little-ai-model-that-could-for-enterprises/ - **TinyLlama** (SUTD/Independent) — 2024-01-01 | Parameters: TinyLlama - License: open | Type: model - 'Overtrained' using 2,727 tokens per parameter. Dataset was 1T: 3 epochs to 3T seen. Singapore - **DocLLM** (JPMorgan) — 2024-01-01 | Parameters: DocLLM - License: closed | Type: model - Document spatial layout structure.
- **Kimi Explorer** (Moonshot) — 2024-01-01 - License: closed | Type: model - AI model by Moonshot - **Qarasu-14B** (Lightblue) — 2023-12-29 | Parameters: 14B - License: open | Type: model - AI model by Lightblue - **CoRe** (Tsinghua University) — 2023-12-29 | Parameters: 12.4B - License: closed | Type: model - AI model by Tsinghua University - **Mengzi-Code-6.7B** (Langboat) — 2023-12-28 | Parameters: 6.7B - License: closed | Type: model - AI model by Langboat - **MengziGPT-General-13B** (Langboat) — 2023-12-28 | Parameters: 13B - License: closed | Type: model - AI model by Langboat - **Elyza** (Elyza) — 2023-12-27 | Parameters: 13B - License: open | Type: model - AI model by Elyza - **Zhuhai-13B** (Zhujian Intelligence) — 2023-12-27 | Parameters: 13B - License: closed | Type: model - AI model by Zhujian Intelligence - **Nous-Hermes-2-Yi-34B** (Nous Research) — 2023-12-25 | Parameters: 34B - License: open | Type: model - AI model by Nous Research - **Solar-10.7B (Solar Mini)** (Upstage) — 2023-12-23 | Parameters: 10.7B - License: open | Type: model - AI model by Upstage - **GQA-8-XXL** (Google Research) — 2023-12-23 | Parameters: 11B - License: closed | Type: model - AI model by Google Research - **YaYi 2.0** (Yayi (Wenge)) — 2023-12-22 | Parameters: 30B - License: open | Type: model - AI model by Yayi (Wenge) - **Fulu Gua (福禄瓜)** (ByteDance) — 2023-12-22 - License: closed | Type: model - AI model by ByteDance - **VideoPoet** (Google Research,Carnegie Mellon University (CMU),Google DeepMind) — 2023-12-21 | Parameters: 8B - License: closed | Type: model - AI model by Google Research,Carnegie Mellon University (CMU),Google DeepMind - **nekomata-14b** (rinna) — 2023-12-21 | Parameters: 14.2B - License: open | Type: model - AI model by rinna - **Suno Music Generation** (Suno) — 2023-12-20 - License: closed | Type: model - AI model by Suno - **Gemini Nano-2** (Google DeepMind) — 2023-12-19 | Parameters: 3.3B - License: closed | Type: model - AI model by Google DeepMind 
- **Gemini Nano-1** (Google DeepMind) — 2023-12-19 | Parameters: 1.8B - License: closed | Type: model - AI model by Google DeepMind - **HiFi - NN** (Basecamp Research,Technical University of Munich,Molecular Institute of Biology,Microsoft Research) — 2023-12-19 | Parameters: 3M - License: open | Type: model - AI model by Basecamp Research,Technical University of Munich,Molecular Institute of Biology,Microsoft Research - **Bird Vocalization Classifier (Perch)** (Google Research,Cornell University,Naturalis Biodiversity Center,Chemnitz University of Technology) — 2023-12-18 | Parameters: 7.8M - License: open | Type: model - AI model by Google Research,Cornell University,Naturalis Biodiversity Center,Chemnitz University of Technology - **Lyra-Fr 10B** (LightOn) — 2023-12-15 | Parameters: 10B - License: closed | Type: model - AI model by LightOn - **Konan LLM 41B** (Konan Technology) — 2023-12-15 | Parameters: 41B - License: closed | Type: model - AI model by Konan Technology - **Poro 34B** (High-Performance Language Technologies (HPLT),University of Turku) — 2023-12-14 | Parameters: 34.2B - License: open | Type: model - AI model by High-Performance Language Technologies (HPLT),University of Turku - **CogAgent** (Tsinghua University,Z.ai (Zhipu AI)) — 2023-12-14 | Parameters: 18B - License: open | Type: model - AI model by Tsinghua University,Z.ai (Zhipu AI) - **FunSearch** (Google DeepMind) — 2023-12-14 | Parameters: 15B - License: open | Type: model - AI model by Google DeepMind - **Imagen 2** (Google DeepMind) — 2023-12-13 - License: closed | Type: model - AI model by Google DeepMind - **GigaChat Pro** (Sber) — 2023-12-13 | Parameters: 29B - License: closed | Type: model - AI model by Sber - **MedLM** (Google Cloud) — 2023-12-13 - License: closed | Type: model - AI model by Google Cloud - **Phi-2** (Microsoft) — 2023-12-12 | Parameters: 2.7B - License: open | Type: model - AI model by Microsoft - **VILA-13B** (NVIDIA,Massachusetts Institute of Technology (MIT)) — 
2023-12-12 | Parameters: 13.4B - License: open | Type: model - AI model by NVIDIA,Massachusetts Institute of Technology (MIT) - **Mixtral 8x7B** (Mistral AI) — 2023-12-11 | Parameters: 46.7B - License: open | Type: model - AI model by Mistral AI - **Mistral Medium** (Mistral AI) — 2023-12-11 - License: closed | Type: model - AI model by Mistral AI - **ruDalle: Kandinsky 3.0** (Sber) — 2023-12-11 | Parameters: 11.9B - License: open | Type: model - AI model by Sber - **ruDalle: Kandinsky 3.1** (Sber) — 2023-12-11 - License: open | Type: model - AI model by Sber - **MBP** (University of Science and Technology of China (USTC),Tencent,Zhejiang University (ZJU)) — 2023-12-11 - License: closed | Type: model - AI model by University of Science and Technology of China (USTC),Tencent,Zhejiang University (ZJU) - **CRYSTALCODER** (Mohamed bin Zayed University of Artificial Intelligence (MBZUAI),Petuum,University of Southern California,Carnegie Mellon University (CMU),University of Illinois Urbana-Champaign (UIUC),University of California San Diego,LLM360) — 2023-12-11 | Parameters: 6.7B - License: open | Type: model - AI model by Mohamed bin Zayed University of Artificial Intelligence (MBZUAI),Petuum,University of Southern California,Carnegie Mellon University (CMU),University of Illinois Urbana-Champaign (UIUC),University of California San Diego,LLM360 - **Amber** (Mohamed bin Zayed University of Artificial Intelligence (MBZUAI),Petuum,University of Southern California,Carnegie Mellon University (CMU),University of Illinois Urbana-Champaign (UIUC),University of California San Diego,LLM360) — 2023-12-11 | Parameters: 6.7B - License: open | Type: model - AI model by Mohamed bin Zayed University of Artificial Intelligence (MBZUAI),Petuum,University of Southern California,Carnegie Mellon University (CMU),University of Illinois Urbana-Champaign (UIUC),University of California San Diego,LLM360 - **W.A.L.T** (Stanford University,Google Research,Georgia Institute of Technology) — 
2023-12-11 | Parameters: 4.7B - License: closed | Type: model - AI model by Stanford University,Google Research,Georgia Institute of Technology - **XVERSE-65B-2** (XVERSE Technology,Shenzhen Yuanxiang Technology) — 2023-12-08 | Parameters: 65B - License: open | Type: model - AI model by XVERSE Technology,Shenzhen Yuanxiang Technology - **SeamlessM4T** (Facebook,INRIA,University of California (UC) Berkeley) — 2023-12-08 | Parameters: 2.3B - License: open | Type: model - AI model by Facebook,INRIA,University of California (UC) Berkeley - **Llama Guard** (Meta AI) — 2023-12-07 | Parameters: 7B - License: open | Type: model - AI model by Meta AI - **StableLM - Zephyr 3B** (Stability AI) — 2023-12-07 | Parameters: 3B - License: open | Type: model - AI model by Stability AI - **Gemini 1.0 Ultra** (Google DeepMind) — 2023-12-06 - License: closed | Type: model - AI model by Google DeepMind - **Gemini 1.0 Pro** (Google DeepMind) — 2023-12-06 - License: closed | Type: model - AI model by Google DeepMind - **OneLLM** (Chinese University of Hong Kong (CUHK),Shanghai AI Lab) — 2023-12-06 | Parameters: 7B - License: open | Type: model - AI model by Chinese University of Hong Kong (CUHK),Shanghai AI Lab - **NexusRaven-V2** (Nexusflow) — 2023-12-05 | Parameters: 13B - License: open | Type: model - AI model by Nexusflow - **SARA-RT-2** (Google DeepMind) — 2023-12-04 | Parameters: 5B - License: closed | Type: model - AI model by Google DeepMind - **Playground v2** (Playground) — 2023-12-02 - License: open | Type: model - AI model by Playground - **Baize-v2-13B (白泽)** (University of California San Diego,Sun Yat-sen University,Microsoft Research Asia) — 2023-12-02 | Parameters: 13B - License: open | Type: model - AI model by University of California San Diego,Sun Yat-sen University,Microsoft Research Asia - **MACE-MP-0** (Cambridge) — 2023-12-01 | Parameters: MACE-MP-0 - License: open | Type: model - "Uses 4-body equivariant messages; covers 89 elements; supports fine-tuning for ab 
initio accuracy with minimal data." - **Unified-IO 2** (Allen AI) — 2023-12-01 | Parameters: Unified-IO 2 - License: open | Type: model - 600TB dataset (plus 120+ fine-tuning datasets) includes '1B image-text pairs, 1T text tokens, 180M video clips, 130M interleaved image & text, 3M 3D assets, and 1M agent trajectories.' - **WaveCoder-DS-6.7B** (Microsoft) — 2023-12-01 | Parameters: WaveCoder-DS-6.7B - License: closed | Type: model - To obtain WaveCoder models, We choose StarCoder-15B, CodeLLaMa (7B and 13B), DeepseekCoder-6.7B as the base model and fine-tune all the base model for 3 epochs - **YunShan** (Huawei) — 2023-12-01 | Parameters: YunShan - License: closed | Type: model - Finance + law fine-tune of PanGu-π - **PanGu-Pi** (Huawei) — 2023-12-01 | Parameters: PanGu-Pi - License: closed | Type: model - Dense, named PanGu-π - **YAYI 2** (Wenge) — 2023-12-01 | Parameters: YAYI 2 - License: open | Type: model - Dataset=240TB filtered to 10.6TB for 2.65T tokens - **Emu2** (BAAI) — 2023-12-01 | Parameters: Emu2 - License: open | Type: model - VLM. Gemini clone. Outperforms Flamingo 80B. The Pile for text, but only sampled 3.6B tokens (1.4% of the dataset). - **MedLM** (Google DeepMind) — 2023-12-01 | Parameters: MedLM - License: partial | Type: model - Available to 'white-listed' orgs only. - **SOLAR-10.7B** (Upstage AI) — 2023-12-01 | Parameters: SOLAR-10.7B - License: open | Type: model - South Korean. Llama-2 arch. SOTA for its size (Dec/2023). - **DeciLM-7B** (Deci) — 2023-12-01 | Parameters: DeciLM-7B - License: open | Type: model - 4.4x faster than Mistral. English only. - **Mistral-medium** (Mistral) — 2023-12-01 | Parameters: Mistral-medium - License: open | Type: model - MMLU=75.3% (GPT-3.5-turbo 20B=70%, Llama 2 70B=68.9%) - **mixtral-8x7b-32kseqlen** (Mistral) — 2023-12-01 | Parameters: mixtral-8x7b-32kseqlen - License: open | Type: model - MoE=7Bx8, aka mistral-small.
'Concretely, Mixtral has 45B total parameters but only uses 12B parameters per token. It, therefore, processes input and generates output at the same speed and for the same cost as a 12B model.' - **StripedHyena 7B** (Together) — 2023-12-01 | Parameters: StripedHyena 7B - License: open | Type: model - RedPajama (C4), new arch beyond just Transformers - **NexusRaven-V2 13B** (Nexusflow.ai) — 2023-12-01 | Parameters: NexusRaven-V2 13B - License: open | Type: model - Based on CodeLlama. 'surpasses GPT-4 by up to 7% in function calling success rates in human-generated use cases involving nested and composite functions.' - **Gemini Ultra 1.0** (Google DeepMind) — 2023-12-01 | Parameters: Gemini Ultra 1.0 - License: open | Type: model - Original MMLU=83.7. MMLU=90.04 with prompting. Chinchilla (20:1), dense, maybe 600B-2000T. Note: Gemini outputs are watermarked. I do not use GDM models. https://lifearchitect.ai/watermarking/ - **Mamba** (CMU) — 2023-12-01 | Parameters: Mamba - License: open | Type: model - The Pile, new arch beyond just Transformers. 2.7B MMLU=26.2. 7B MMLU=33.3. - **LVM-3B** (Berkeley/JHU) — 2023-12-01 | Parameters: LVM-3B - License: closed | Type: model - Paper is 25MB. First Large Vision Model (LVM); no text. Based on Llama and LAION 5B (1.49B). 
- **SeaLLM-13b** (Alibaba) — 2023-12-01 | Parameters: SeaLLM-13b - License: open | Type: model - Llama 2 for Southeast Asian (SEA) languages: Vietnamese 🇻🇳, Indonesian 🇮🇩, Thai 🇹🇭, Malay 🇲🇾, Khmer🇰🇭, Lao🇱🇦, Tagalog🇵🇭 and Burmese🇲🇲 - **Mamba-24M (SC09)** (Carnegie Mellon University (CMU),Princeton University) — 2023-12-01 | Parameters: 23.4M - License: closed | Type: model - AI model by Carnegie Mellon University (CMU),Princeton University - **Mamba-2.8B** (Carnegie Mellon University (CMU),Princeton University) — 2023-12-01 | Parameters: 2.8B - License: open | Type: model - AI model by Carnegie Mellon University (CMU),Princeton University - **tsuzumi 7B** (NTT Communication Science Laboratories) — 2023-12-01 | Parameters: 7B - License: closed | Type: model - AI model by NTT Communication Science Laboratories - **NASA SMD** (NASA,IBM) — 2023-12-01 | Parameters: 125M - License: open | Type: model - AI model by NASA,IBM - **SEA-LION V1 3B** (AI Singapore) — 2023-12-01 | Parameters: 3B - License: open | Type: model - AI model by AI Singapore - **SEA-LION V1 7B** (AI Singapore) — 2023-12-01 | Parameters: 7B - License: open | Type: model - AI model by AI Singapore - **SeaLLM-7B-v2** (Alibaba DAMO Academy) — 2023-12-01 | Parameters: 7.4B - License: open | Type: model - AI model by Alibaba DAMO Academy - **SeaLLM-7B-v2.5** (Alibaba DAMO Academy) — 2023-12-01 | Parameters: 8.5B - License: open | Type: model - AI model by Alibaba DAMO Academy - **AzeroGPT** (SoundAI) — 2023-12-01 | Parameters: 100B - License: closed | Type: model - AI model by SoundAI - **Qwen-72B** (Alibaba) — 2023-11-30 | Parameters: 72B - License: open | Type: model - AI model by Alibaba - **Granite 13B** (IBM) — 2023-11-30 | Parameters: 13B - License: closed | Type: model - AI model by IBM - **Cohere Command Light** (Cohere) — 2023-11-30 | Parameters: 6B - License: closed | Type: model - AI model by Cohere - **GNoME for crystal discovery** (Google DeepMind) — 2023-11-29 | Parameters: 16.2M - License: 
closed | Type: model - AI model by Google DeepMind - **PPLX-70B-Online** (Perplexity) — 2023-11-29 | Parameters: 70B - License: closed | Type: model - AI model by Perplexity - **Amazon Titan Text Express** (Amazon) — 2023-11-29 - License: closed | Type: model - AI model by Amazon - **Amazon Titan Text Lite** (Amazon) — 2023-11-29 - License: closed | Type: model - AI model by Amazon - **Llama-3-Taiwan-70B** (National Taiwan University) — 2023-11-29 | Parameters: 70B - License: open | Type: model - AI model by National Taiwan University - **SD-Turbo** (Stability AI) — 2023-11-28 - License: open | Type: model - AI model by Stability AI - **Yuan 2.0** (Inspur) — 2023-11-27 | Parameters: 102.6B - License: open | Type: model - AI model by Inspur - **StripedHyena-Hessian-7B** (Together,Nous Research) — 2023-11-27 | Parameters: 7B - License: open | Type: model - AI model by Together,Nous Research - **Meditron-70B** (École Polytechnique Fédérale de Lausanne (EPFL)) — 2023-11-27 | Parameters: 70B - License: open | Type: model - AI model by École Polytechnique Fédérale de Lausanne (EPFL) - **Amazon Transcribe** (Amazon) — 2023-11-26 - License: closed | Type: model - AI model by Amazon - **ControlNet (SDv2)** (Stability AI) — 2023-11-26 - License: closed | Type: model - AI model by Stability AI - **Starling-LM-7B-alpha** (University of California (UC) Berkeley) — 2023-11-25 | Parameters: 7B - License: open | Type: model - AI model by University of California (UC) Berkeley - **Stable Video Diffusion** (Stability AI) — 2023-11-25 - License: open | Type: model - AI model by Stability AI - **Belle VL** (KE Holdings Inc. (“Beike”)) — 2023-11-24 | Parameters: 14B - License: open | Type: model - AI model by KE Holdings Inc.
(“Beike”) - **Infinity (无涯)** (Transwarp Technology) — 2023-11-24 - License: closed | Type: model - AI model by Transwarp Technology - **TAIWAN-LLM 13B** (National Taiwan University) — 2023-11-23 | Parameters: 13B - License: closed | Type: model - AI model by National Taiwan University - **TAIWAN-LLM 7B** (National Taiwan University) — 2023-11-23 | Parameters: 7B - License: closed | Type: model - AI model by National Taiwan University - **Inflection-2** (Inflection AI) — 2023-11-22 - License: closed | Type: model - AI model by Inflection AI - **OmniFusion-7B (InternViT-6B-448px V1-2)** (AIRI Artificial Intelligence Research Institute,Sber,Skolkovo Institute of Science and Technology) — 2023-11-22 | Parameters: 12.5B - License: open | Type: model - AI model by AIRI Artificial Intelligence Research Institute,Sber,Skolkovo Institute of Science and Technology - **Claude 2.1** (Anthropic) — 2023-11-21 - License: closed | Type: model - AI model by Anthropic - **Orca 2-13B** (Microsoft Research) — 2023-11-21 | Parameters: 13B - License: open | Type: model - AI model by Microsoft Research - **Tulu V2 DPO 70B** (Allen Institute for AI,University of Washington) — 2023-11-20 | Parameters: 70B - License: open | Type: model - AI model by Allen Institute for AI,University of Washington - **StyleTTS 2** (Columbia University) — 2023-11-20 - License: open | Type: model - AI model by Columbia University - **Lyria** (Google DeepMind) — 2023-11-16 - License: closed | Type: model - AI model by Google DeepMind - **Mistral 7B + OVM** (Chinese University of Hong Kong (CUHK)) — 2023-11-16 | Parameters: 7B - License: open | Type: model - AI model by Chinese University of Hong Kong (CUHK) - **AndesGPT** (Oppo Mobile Telecommunications) — 2023-11-16 - License: closed | Type: model - AI model by Oppo Mobile Telecommunications - **Nemotron-3-8B** (NVIDIA) — 2023-11-15 | Parameters: 8B - License: open | Type: model - AI model by NVIDIA - **Mi:dm 7B** (KT) — 2023-11-15 | Parameters: 7B - License: 
open | Type: model - AI model by KT - **GraphCast** (Google DeepMind) — 2023-11-14 - License: open | Type: model - AI model by Google DeepMind - **Qwen-Audio-Chat** (Alibaba) — 2023-11-14 | Parameters: 8.5B - License: open | Type: model - AI model by Alibaba - **SPHINX (Llama 2 13B)** (Shanghai AI Lab,Chinese University of Hong Kong (CUHK),ShanghaiTech University) — 2023-11-13 | Parameters: 19.9B - License: open | Type: model - AI model by Shanghai AI Lab,Chinese University of Hong Kong (CUHK),ShanghaiTech University - **Volcano 13B** (Korea University,Korea Advanced Institute of Science and Technology (KAIST),LG) — 2023-11-13 | Parameters: 13B - License: open | Type: model - AI model by Korea University,Korea Advanced Institute of Science and Technology (KAIST),LG - **LLaVA + LVIS-INSTRUCT4V** (Fudan University,University of Maryland) — 2023-11-13 | Parameters: 13B - License: open | Type: model - AI model by Fudan University,University of Maryland - **Intel Aurora 1T** (Intel,Argonne National Laboratory) — 2023-11-13 | Parameters: 1T - License: closed | Type: model - AI model by Intel,Argonne National Laboratory - **Stable Diffusion 1.6** (Stability AI) — 2023-11-10 - License: closed | Type: model - AI model by Stability AI - **tts-1** (OpenAI) — 2023-11-09 - License: closed | Type: model - AI model by OpenAI - **tts-1-hd** (OpenAI) — 2023-11-09 - License: closed | Type: model - AI model by OpenAI - **MultiBand Diffusion** (Meta AI,Hebrew University of Jerusalem,LORIA) — 2023-11-08 - License: open | Type: model - AI model by Meta AI,Hebrew University of Jerusalem,LORIA - **Samsung Gauss Language** (Samsung) — 2023-11-08 - License: closed | Type: model - AI model by Samsung - **Samsung Gauss Code** (Samsung) — 2023-11-08 - License: closed | Type: model - AI model by Samsung - **Samsung Gauss Image** (Samsung) — 2023-11-08 - License: closed | Type: model - AI model by Samsung - **Prithvi-100M** (IBM,NASA) — 2023-11-08 | Parameters: 100M - License: open | Type: model 
- AI model by IBM,NASA - **HGRN 1B (WT 103)** (Shanghai AI Lab,Massachusetts Institute of Technology (MIT)) — 2023-11-08 | Parameters: 1B - License: open | Type: model - AI model by Shanghai AI Lab,Massachusetts Institute of Technology (MIT) - **Jais-30b (phase 1)** (Cerebras Systems,Mohamed bin Zayed University of Artificial Intelligence (MBZUAI),Inception G42,G42) — 2023-11-08 | Parameters: 30B - License: open | Type: model - AI model by Cerebras Systems,Mohamed bin Zayed University of Artificial Intelligence (MBZUAI),Inception G42,G42 - **RoFormer** (Zhuiyi Technology) — 2023-11-08 | Parameters: 110M - License: closed | Type: model - AI model by Zhuiyi Technology - **OmniVec** (TensorTour) — 2023-11-07 - License: closed | Type: model - AI model by TensorTour - **mPLUG-Owl2** (Alibaba) — 2023-11-07 | Parameters: 7.1B - License: open | Type: model - AI model by Alibaba - **OtterHD-8B** (Nanyang Technological University) — 2023-11-07 | Parameters: 8B - License: closed | Type: model - AI model by Nanyang Technological University - **XVERSE-13B-2** (XVERSE Technology,Shenzhen Yuanxiang Technology) — 2023-11-06 | Parameters: 13B - License: open | Type: model - AI model by XVERSE Technology,Shenzhen Yuanxiang Technology - **Whisper v3** (OpenAI) — 2023-11-06 | Parameters: 1.6B - License: open | Type: model - AI model by OpenAI - **Consistency Decoder** (OpenAI) — 2023-11-06 - License: open | Type: model - AI model by OpenAI - **GPT-4 Turbo (Nov 2023)** (OpenAI) — 2023-11-06 - License: closed | Type: model - AI model by OpenAI - **CogVLM-17B** (Tsinghua University,Z.ai (Zhipu AI),Beihang University) — 2023-11-06 | Parameters: 17B - License: open | Type: model - AI model by Tsinghua University,Z.ai (Zhipu AI),Beihang University - **RNA-MSM** (Peking University,Shanghai AI Lab,Griffith University,Peng Cheng Laboratory,Shenzhen Bay Laboratory) — 2023-11-06 - License: open | Type: model - AI model by Peking University,Shanghai AI Lab,Griffith University,Peng Cheng 
Laboratory,Shenzhen Bay Laboratory - **LLaVA 1.5** (University of Wisconsin Madison,Microsoft Research) — 2023-11-05 | Parameters: 13B - License: open | Type: model - AI model by University of Wisconsin Madison,Microsoft Research - **Grok-1** (xAI) — 2023-11-04 | Parameters: 314B - License: open | Type: model - AI model by xAI - **Grok-0** (xAI) — 2023-11-04 | Parameters: 33B - License: closed | Type: model - AI model by xAI - **Sequence Monkey (序列猴子)** (Mobvoi) — 2023-11-04 - License: closed | Type: model - AI model by Mobvoi - **ZhihaiTu AI (知海图)** (Beijing Zhizhe Tianxia Technology) — 2023-11-04 - License: closed | Type: model - AI model by Beijing Zhizhe Tianxia Technology - **BLUUMI** (University of Turku,Hugging Face) — 2023-11-03 | Parameters: 176B - License: open | Type: model - AI model by University of Turku,Hugging Face - **RT-Trajectory** (Google DeepMind,University of California San Diego,Stanford University) — 2023-11-03 - License: closed | Type: model - AI model by Google DeepMind,University of California San Diego,Stanford University - **Yi-34B** (01.AI) — 2023-11-02 | Parameters: 34B - License: open | Type: model - AI model by 01.AI - **Cohere Embed** (Cohere) — 2023-11-02 - License: closed | Type: model - AI model by Cohere - **DeepSA** (ShanghaiTech University) — 2023-11-02 - License: closed | Type: model - AI model by ShanghaiTech University - **BlueLM 70B** (vivo AI lab) — 2023-11-02 | Parameters: 70B - License: closed | Type: model - AI model by vivo AI lab - **BlueLM 130B** (vivo AI lab) — 2023-11-02 | Parameters: 130B - License: closed | Type: model - AI model by vivo AI lab - **BlueLM 175B** (vivo AI lab) — 2023-11-02 | Parameters: 175B - License: closed | Type: model - AI model by vivo AI lab - **Yi 6B** (01.AI) — 2023-11-02 | Parameters: 6B - License: open | Type: model - AI model by 01.AI - **pplx-70b-online** (Perplexity) — 2023-11-01 | Parameters: pplx-70b-online - License: open | Type: model - Web access. 
Higher 'freshness' and 'truth' scores. - **SeamlessM4T-Large v2** (Meta AI) — 2023-11-01 | Parameters: SeamlessM4T-Large v2 - License: open | Type: model - Based on NLLB and older models. https://github.com/facebookresearch/seamless_communication - **Q-Transformer** (Google DeepMind) — 2023-11-01 | Parameters: Q-Transformer - License: closed | Type: model - Robotics, builds on RT-1 - **Yuan 2.0** (IEIT) — 2023-11-01 | Parameters: Yuan 2.0 - License: open | Type: model - Chinese + EN dataset includes The Pile: DM, arXiv, Wikipedia, Books3, Stack Exchange, FreeLaw and medical - **MEDITRON** (EPFL) — 2023-11-01 | Parameters: MEDITRON - License: open | Type: model - Llama 2 trained on medical data using NVIDIA Megatron-LM. "outperforms Llama-2-70B, GPT-3.5 (text-davinci-003, 8-shot), and Flan-PaLM on multiple medical reasoning tasks." - **Transformers-Arithmetic** (Microsoft) — 2023-11-01 | Parameters: Transformers-Arithmetic - License: closed | Type: model - Proving maths is not memorized. Uses a GPT-2-style model. Sébastien Bubeck - **Starling-7B** (Berkeley) — 2023-11-01 | Parameters: Starling-7B - License: open | Type: model - Llama 2 7B -> OpenChat 7B -> Starling-7B (RLAIF) - **Inflection-2** (Inflection AI) — 2023-11-01 | Parameters: Inflection-2 - License: open | Type: model - “now the 2nd best LLM in the world”. Finished training 19/Nov/2023, waiting for fine-tuning and release. - **Claude 2.1** (Anthropic) — 2023-11-01 | Parameters: Claude 2.1 - License: open | Type: model - Fewer hallucinations, 200k context length, tool use - **TÜLU 2** (Allen AI) — 2023-11-01 | Parameters: TÜLU 2 - License: open | Type: model - Llama 2 finetune with RLHF direct preference optimization (DPO). - **Nemotron-3 22B** (NVIDIA) — 2023-11-01 | Parameters: Nemotron-3 22B - License: open | Type: model - 8B released, 22B internal.
- **Nemotron-2 43B** (NVIDIA) — 2023-11-01 | Parameters: Nemotron-2 43B - License: closed | Type: model - Used to train HelpSteer (16/Nov/2023): https://arxiv.org/abs/2311.09528 - **Orca 2** (Microsoft) — 2023-11-01 | Parameters: Orca 2 - License: partial | Type: model - Llama 2 13B (2T) -> Orca 2 (GPT-4 finetune). Still an imitation model, overhyped: The False Promise of Imitating Proprietary LLMs https://arxiv.org/abs/2305.15717 - **Phi-2** (Microsoft) — 2023-11-01 | Parameters: Phi-2 - License: open | Type: model - https://twitter.com/SebastienBubeck/status/1724854157004190095 - **Florence-2** (Microsoft) — 2023-11-01 | Parameters: Florence-2 - License: open | Type: model - VLM, Flamingo alt - **Mirasol3B** (Google DeepMind) — 2023-11-01 | Parameters: Mirasol3B - License: closed | Type: model - Combiner + autoregressive transformer for video/audio/text - **OtterHD-8B** (NTU) — 2023-11-01 | Parameters: OtterHD-8B - License: open | Type: model - Evolution of Persimmon-9.3B and Fuyu 8B - **Gauss** (Samsung) — 2023-11-01 | Parameters: Gauss - License: partial | Type: model - Gauss Language specializing in generating texts, Gauss Code on software and code description and Gauss Image for image creation. - **Grok-1** (xAI) — 2023-11-01 | Parameters: Grok-1 - License: open | Type: model - Context window=8192. UI: https://twitter.com/TobyPhln/status/1721053802235621734 - **Grok-0** (xAI) — 2023-11-01 | Parameters: Grok-0 - License: closed | Type: model - Announced Nov/2023, trained Jul/2023 - **Yi-34B** (01-ai) — 2023-11-01 | Parameters: Yi-34B - License: open | Type: model - Controversy about Llama 2 base. https://twitter.com/kaifulee/status/1724673131875377465 MMLU=76.3 (PaLM 2=78.3) Outperforms Llama 2. Chinese and English. 
https://www.bloomberg.com/news/articles/2023-11-05/kai-fu-lee-s-open-source-01-ai-bests-llama-2-according-to-hugging-face - **GPT-4 Turbo** (OpenAI) — 2023-11-01 | Parameters: GPT-4 Turbo - License: open | Type: model - https://openai.com/blog/new-models-and-developer-products-announced-at-devday - **Nanbeige-16B** (Nanbeige LLM Lab) — 2023-11-01 | Parameters: 16B - License: open | Type: model - AI model by Nanbeige LLM Lab - **LingoWhale-8B** (DeepLang AI) — 2023-11-01 | Parameters: 8B - License: open | Type: model - AI model by DeepLang AI - **Calm2-7B** (CyberAgent) — 2023-11-01 | Parameters: 7B - License: open | Type: model - AI model by CyberAgent - **OpenChat 3.5-7B** (Tsinghua University) — 2023-11-01 | Parameters: 7B - License: open | Type: model - AI model by Tsinghua University - **MuggleMath** (University of Science and Technology of China (USTC),Alibaba) — 2023-11-01 | Parameters: 70B - License: open | Type: model - AI model by University of Science and Technology of China (USTC),Alibaba - **YaRN (Llama 2 13B)** (Nous Research,EleutherAI,University of Geneva) — 2023-11-01 | Parameters: 13B - License: open | Type: model - AI model by Nous Research,EleutherAI,University of Geneva - **YaRN (Llama 2 70B)** (Nous Research,EleutherAI,University of Geneva) — 2023-11-01 | Parameters: 70B - License: open | Type: model - AI model by Nous Research,EleutherAI,University of Geneva - **BlueLM 7B** (vivo AI lab) — 2023-10-31 | Parameters: 7B - License: open | Type: model - AI model by vivo AI lab - **Tongyi Qianwen 2.0** (Alibaba) — 2023-10-31 - License: closed | Type: model - AI model by Alibaba - **Mi:dm 200B** (KT) — 2023-10-31 | Parameters: 200B - License: closed | Type: model - AI model by KT - **Skywork-13B** (Kunlun Inc.) — 2023-10-30 | Parameters: 13B - License: open | Type: model - AI model by Kunlun Inc. 
- **Spec-Drafter** (Peking University,Microsoft Research Asia) — 2023-10-30 | Parameters: 500M - License: open | Type: model - AI model by Peking University,Microsoft Research Asia - **ChatGLM3-6B** (Z.ai (Zhipu AI)) — 2023-10-27 | Parameters: 6B - License: open | Type: model - AI model by Z.ai (Zhipu AI) - **CODEFUSION (Python)** (Microsoft,Microsoft Research) — 2023-10-26 | Parameters: 75M - License: closed | Type: model - AI model by Microsoft,Microsoft Research - **DiT-XL/2 + CADS** (ETH Zurich,Disney Research) — 2023-10-26 | Parameters: 675M - License: closed | Type: model - AI model by ETH Zurich,Disney Research - **QMoE: compressed SwitchTransformer** (Institute of Science and Technology Austria (ISTA),Neural Magic) — 2023-10-25 | Parameters: 1.6T - License: open | Type: model - AI model by Institute of Science and Technology Austria (ISTA),Neural Magic - **Xinghan Foundation Model** (Dahua Technology) — 2023-10-25 - License: closed | Type: model - AI model by Dahua Technology - **Zephyr 7B** (Hugging Face) — 2023-10-25 | Parameters: 7B - License: open | Type: model - AI model by Hugging Face - **Spark 3.0** (iFlytek) — 2023-10-24 - License: closed | Type: model - AI model by iFlytek - **Stockmark-13B** (Stockmark) — 2023-10-23 | Parameters: 13.2B - License: open | Type: model - AI model by Stockmark - **CausalLM 14B** (CausalLM) — 2023-10-22 | Parameters: 14.7B - License: open | Type: model - AI model by CausalLM - **SILC-S* (86M)** (ETH Zurich,DeepMind,Google,Technical University of Munich) — 2023-10-20 | Parameters: 86M - License: closed | Type: model - AI model by ETH Zurich,DeepMind,Google,Technical University of Munich - **SILC-S** (ETH Zurich,DeepMind,Google,Technical University of Munich) — 2023-10-20 | Parameters: 86M - License: closed | Type: model - AI model by ETH Zurich,DeepMind,Google,Technical University of Munich - **DALL·E 3** (OpenAI) — 2023-10-19 - License: closed | Type: model - AI model by OpenAI - **KwaiYiiMath** (Kuaishou Technology) — 
2023-10-19 | Parameters: 13B - License: closed | Type: model - AI model by Kuaishou Technology - **Voicebox / VB-En** (Facebook AI Research) — 2023-10-19 | Parameters: 358M - License: closed | Type: model - AI model by Facebook AI Research - **ERNIE 4.0** (Baidu) — 2023-10-17 - License: closed | Type: model - AI model by Baidu - **Fuyu-8B** (Adept) — 2023-10-17 | Parameters: 8B - License: open | Type: model - AI model by Adept - **PaLI-3** (Google DeepMind,Google Research,Google Cloud) — 2023-10-17 | Parameters: 5B - License: closed | Type: model - AI model by Google DeepMind,Google Research,Google Cloud - **Llemma 34B** (Princeton University,University of Toronto,Vector Institute,University of Cambridge,Carnegie Mellon University (CMU),University of Washington,EleutherAI) — 2023-10-16 | Parameters: 34B - License: open | Type: model - AI model by Princeton University,University of Toronto,Vector Institute,University of Cambridge,Carnegie Mellon University (CMU),University of Washington,EleutherAI - **Llemma 7B** (Princeton University,EleutherAI,University of Toronto,Vector Institute,University of Cambridge,Carnegie Mellon University (CMU),University of Washington) — 2023-10-16 | Parameters: 7B - License: open | Type: model - AI model by Princeton University,EleutherAI,University of Toronto,Vector Institute,University of Cambridge,Carnegie Mellon University (CMU),University of Washington - **Aquila2 34B** (Beijing Academy of Artificial Intelligence / BAAI) — 2023-10-13 | Parameters: 34B - License: open | Type: model - AI model by Beijing Academy of Artificial Intelligence / BAAI - **Table-GPT** (Microsoft Research) — 2023-10-13 | Parameters: 175B - License: closed | Type: model - AI model by Microsoft Research - **RT-2-X** (Google DeepMind) — 2023-10-13 | Parameters: 55B - License: closed | Type: model - AI model by Google DeepMind - **Jiutian** (China Mobile) — 2023-10-12 | Parameters: 13.9B - License: open | Type: model - AI model by China Mobile - **Ferret 
(13B)** (Columbia University,Apple) — 2023-10-11 | Parameters: 13B - License: open | Type: model - AI model by Columbia University,Apple - **Mistral 7B** (Mistral AI) — 2023-10-10 | Parameters: 7B - License: open | Type: model - AI model by Mistral AI - **CodeFuse-13B** (Ant Group) — 2023-10-10 | Parameters: 13B - License: open | Type: model - AI model by Ant Group - **CELLE-2** (Chan Zuckerberg Initiative,University of California San Francisco,University of California (UC) Berkeley) — 2023-10-10 - License: open | Type: model - AI model by Chan Zuckerberg Initiative,University of California San Francisco,University of California (UC) Berkeley - **RoseTTAFold All-Atom (RFAA)** (University of Washington,Seoul National University,University of Sheffield) — 2023-10-09 - License: open | Type: model - AI model by University of Washington,Seoul National University,University of Sheffield - **FinGPT-13B** (University of California Los Angeles (UCLA),Columbia University,New York University (NYU)) — 2023-10-07 | Parameters: 13B - License: open | Type: model - AI model by University of California Los Angeles (UCLA),Columbia University,New York University (NYU) - **NAEPro** (University of California Santa Barbara (UCSB),Massachusetts Institute of Technology (MIT),Carnegie Mellon University (CMU)) — 2023-10-06 - License: closed | Type: model - AI model by University of California Santa Barbara (UCSB),Massachusetts Institute of Technology (MIT),Carnegie Mellon University (CMU) - **FoldFlow** (McGill University,Mila - Quebec AI (originally Montreal Institute for Learning Algorithms),Dreamfold,University of Montreal / Université de Montréal,University of Oxford) — 2023-10-03 - License: closed | Type: model - AI model by McGill University,Mila - Quebec AI (originally Montreal Institute for Learning Algorithms),Dreamfold,University of Montreal / Université de Montréal,University of Oxford - **LLaMA-7B (protein-oriented instruction-tuned)** (Zhejiang University (ZJU)) — 2023-10-02 | 
Parameters: 7B - License: open | Type: model - AI model by Zhejiang University (ZJU) - **MiniGPT4 (Vicuna finetune)** (King Abdullah University of Science and Technology (KAUST)) — 2023-10-02 | Parameters: 13B - License: open | Type: model - AI model by King Abdullah University of Science and Technology (KAUST) - **Phi-1** (Microsoft Research) — 2023-10-02 | Parameters: 1.3B - License: open | Type: model - AI model by Microsoft Research - **MatFormer** (Google DeepMind) — 2023-10-01 | Parameters: MatFormer - License: open | Type: model - Matryoshka Transformer or MatFormer model architecture. 850M (696M / 620M / 582M). "850M decoder-only MatFormer language model (MatLM) allows us to extract multiple smaller models spanning from 582M to 850M parameters, each exhibiting better validation loss and one-shot downstream evaluations than independently trained counterparts." - **Skywork-13B** (Kunlun Tech) — 2023-10-01 | Parameters: Skywork-13B - License: open | Type: model - CN + EN. - **Kimi Chat** (Moonshot AI) — 2023-10-01 | Parameters: Kimi Chat - License: open | Type: model - Chinese. Long context. No paper. - **jina-embeddings-v2** (Jina AI) — 2023-10-01 | Parameters: jina-embeddings-v2 - License: open | Type: model - Alternative to text-embedding-ada-002. Related v1 paper: https://arxiv.org/abs/2307.11224 - **Fuyu** (Adept) — 2023-10-01 | Parameters: Fuyu - License: open | Type: model - VLM. 8B available under open licence, Medium size is closed - **ERNIE 4.0** (Baidu) — 2023-10-01 | Parameters: ERNIE 4.0 - License: open | Type: model - Dense (confirmed). English-dubbed launch video (2h52m): https://twitter.com/i/broadcasts/1yNGaZaeallJj & https://youtu.be/wYozcsavRuM - **Zephyr** (Hugging Face H4) — 2023-10-01 | Parameters: Zephyr - License: open | Type: model - Mistral with 'aligned' data removed from dataset - **PaLI-3** (Google DeepMind) — 2023-10-01 | Parameters: PaLI-3 - License: closed | Type: model - VLM. Next iteration of PaLI via Pathways. 
https://lifearchitect.ai/pathways/ - **Retro 48B** (NVIDIA) — 2023-10-01 | Parameters: Retro 48B - License: open | Type: model - 'the largest LLM pretrained with retrieval before instruction tuning.' - **Ferret** (Apple) — 2023-10-01 | Parameters: Ferret - License: open | Type: model - Vicuna base, multimodal - **Lemur** (XLANG Lab) — 2023-10-01 | Parameters: Lemur - License: open | Type: model - https://arxiv.org/abs/2310.06830 - **AceGPT** (KAUST/Shenzhen) — 2023-10-01 | Parameters: AceGPT - License: open | Type: model - Arabic. Llama 2 + RLAIF - **Yasa-1** (Reka AI) — 2023-10-01 | Parameters: Yasa-1 - License: partial | Type: model - Multi-modal. No public arch info. Researchers from DeepMind, Google, Baidu and Meta building enterprise models - **RT-X** (Google DeepMind) — 2023-10-01 | Parameters: RT-X - License: open | Type: model - Robotics using UL2. 'RT-1 model trained using the robotic data mixture as RT-1-X, and the RT-2 model trained using the robotic data mixture as RT-2-X.' - **CTM (CIFAR-10)** (Stanford University,Sony) — 2023-10-01 - License: open | Type: model - AI model by Stanford University,Sony - **TinyLlama-1.1B (1T token checkpoint)** (Singapore University of Technology & Design) — 2023-10-01 | Parameters: 1.1B - License: open | Type: model - AI model by Singapore University of Technology & Design - **TinyLlama-1.1B (3T token checkpoint)** (Singapore University of Technology & Design) — 2023-10-01 | Parameters: 1.1B - License: open | Type: model - AI model by Singapore University of Technology & Design - **BITTERS** (LG,Shutterstock) — 2023-10-01 | Parameters: 650M - License: closed | Type: model - AI model by LG,Shutterstock - **PIXART-α** (Huawei Noah's Ark Lab,The University of Hong Kong,Hong Kong University of Science and Technology (HKUST)) — 2023-09-30 | Parameters: 600M - License: open | Type: model - AI model by Huawei Noah's Ark Lab,The University of Hong Kong,Hong Kong University of Science and Technology (HKUST) - **HiDream Large
Model 1.0** (HiDream) — 2023-09-30 | Parameters: 6B - License: closed | Type: model - AI model by HiDream - **StableLM-3B-4E1T** (Stability AI) — 2023-09-29 | Parameters: 2.8B - License: open | Type: model - AI model by Stability AI - **Wuerstchen** (Technische Hochschule Ingolstadt,University of Montreal / Université de Montréal,Mila - Quebec AI (originally Montreal Institute for Learning Algorithms),Polytechnique Montreal,Wand Technologies) — 2023-09-29 | Parameters: 1B - License: open | Type: model - AI model by Technische Hochschule Ingolstadt,University of Montreal / Université de Montréal,Mila - Quebec AI (originally Montreal Institute for Learning Algorithms),Polytechnique Montreal,Wand Technologies - **GAIA-1** (Wayve) — 2023-09-29 | Parameters: 9B - License: closed | Type: model - AI model by Wayve - **Qwen-14B** (Alibaba) — 2023-09-28 | Parameters: 14B - License: open | Type: model - AI model by Alibaba - **Amazon Titan** (Amazon) — 2023-09-28 | Parameters: 200B - License: closed | Type: model - AI model by Amazon - **Qwen-7B** (Alibaba) — 2023-09-28 | Parameters: 7B - License: open | Type: model - AI model by Alibaba - **PLaMo-13B** (Preferred Networks Inc) — 2023-09-28 | Parameters: 13B - License: open | Type: model - AI model by Preferred Networks Inc - **GPT-3.5 Turbo Instruct** (OpenAI) — 2023-09-28 | Parameters: 20B - License: closed | Type: model - AI model by OpenAI - **Show-1** (National University of Singapore) — 2023-09-27 - License: open | Type: model - AI model by National University of Singapore - **Emu (Meta)** (Meta AI) — 2023-09-27 | Parameters: 2.8B - License: closed | Type: model - AI model by Meta AI - **BigRNA** (DeepGenomics) — 2023-09-27 | Parameters: 2B - License: closed | Type: model - AI model by DeepGenomics - **InternLM-XComposer** (Shanghai AI Lab) — 2023-09-26 | Parameters: 7B - License: open | Type: model - AI model by Shanghai AI Lab - **GPT-4V** (OpenAI) — 2023-09-25 - License: closed | Type: model - AI model by OpenAI - 
**AlphaMissense** (Google DeepMind) — 2023-09-22 | Parameters: 93M - License: closed | Type: model - AI model by Google DeepMind - **PengCheng Mind (鹏城脑海)** (Peng Cheng Laboratory) — 2023-09-21 | Parameters: 200B - License: open | Type: model - AI model by Peng Cheng Laboratory - **BTLM-3B** (Cerebras Systems) — 2023-09-20 | Parameters: 2.6B - License: open | Type: model - AI model by Cerebras Systems - **DreamLLM** (Xi’an Jiaotong University,Megvii Inc,Tsinghua University,Huazhong University of Science and Technology) — 2023-09-20 | Parameters: 7B - License: closed | Type: model - AI model by Xi’an Jiaotong University,Megvii Inc,Tsinghua University,Huazhong University of Science and Technology - **GPT-MolBERTa** (Carnegie Mellon University (CMU)) — 2023-09-20 - License: open | Type: model - AI model by Carnegie Mellon University (CMU) - **Baichuan 2-7B** (Baichuan) — 2023-09-20 | Parameters: 7B - License: open | Type: model - AI model by Baichuan - **OpenChat-13b** (Tsinghua University,Shanghai AI Lab,01.AI) — 2023-09-20 | Parameters: 13B - License: open | Type: model - AI model by Tsinghua University,Shanghai AI Lab,01.AI - **Nova-2** (Deepgram) — 2023-09-19 - License: closed | Type: model - AI model by Deepgram - **bge-reranker-large** (Beijing Academy of Artificial Intelligence / BAAI,Hugging Face) — 2023-09-14 | Parameters: 560M - License: open | Type: model - AI model by Beijing Academy of Artificial Intelligence / BAAI,Hugging Face - **DeciLM 6B** (Deci AI) — 2023-09-13 | Parameters: 5.7B - License: open | Type: model - AI model by Deci AI - **Robot Parkour** (Shanghai Qi Zhi institute,Stanford University,Carnegie Mellon University (CMU),Tsinghua University) — 2023-09-12 | Parameters: 500K - License: closed | Type: model - AI model by Shanghai Qi Zhi institute,Stanford University,Carnegie Mellon University (CMU),Tsinghua University - **Phi-1.5** (Microsoft) — 2023-09-11 | Parameters: 1.3B - License: open | Type: model - AI model by Microsoft - **MADLAD-400 
10B** (Google DeepMind,Google Research) — 2023-09-09 | Parameters: 10.7B - License: open | Type: model - AI model by Google DeepMind,Google Research - **Mobile V-MoEs** (Apple) — 2023-09-08 - License: closed | Type: model - AI model by Apple - **Persimmon-8B** (Adept) — 2023-09-07 | Parameters: 9.3B - License: open | Type: model - AI model by Adept - **FLM-101B** (Chinese Academy of Sciences,Harbin Institute of Technology,Nanyang Technological University,Beijing Academy of Artificial Intelligence / BAAI) — 2023-09-07 | Parameters: 101B - License: open | Type: model - AI model by Chinese Academy of Sciences,Harbin Institute of Technology,Nanyang Technological University,Beijing Academy of Artificial Intelligence / BAAI - **XGen-7B** (Salesforce) — 2023-09-07 | Parameters: 6.7B - License: open | Type: model - AI model by Salesforce - **Hunyuan** (Tencent) — 2023-09-07 | Parameters: 100B - License: closed | Type: model - AI model by Tencent - **ELIXR-C** (Google,Northwestern Medicine,Apollo Radiology International) — 2023-09-07 - License: closed | Type: model - AI model by Google,Northwestern Medicine,Apollo Radiology International - **ELIXR-B** (Google,Northwestern Medicine,Apollo Radiology International) — 2023-09-07 - License: closed | Type: model - AI model by Google,Northwestern Medicine,Apollo Radiology International - **Falcon-180B** (Technology Innovation Institute) — 2023-09-06 | Parameters: 180B - License: open | Type: model - AI model by Technology Innovation Institute - **Baichuan2-13B** (Baichuan) — 2023-09-06 | Parameters: 13B - License: open | Type: model - AI model by Baichuan - **TigerBot-70B** (Tigerobo) — 2023-09-06 | Parameters: 70B - License: open | Type: model - AI model by Tigerobo - **360 Smart Brain** (360 Security Technology) — 2023-09-04 - License: closed | Type: model - AI model by 360 Security Technology - **MotionLM** (Waymo) — 2023-09-01 | Parameters: MotionLM - License: closed | Type: model - LLM for autonomous vehicle forecasting. 
https://youtu.be/jrMMNmN21I8?t=1560 - **GAIA-1** (Wayve) — 2023-09-01 | Parameters: GAIA-1 - License: closed | Type: model - World model, generates video. Uses T5-large 770M for language + all vision parameters - **Qwen** (Alibaba) — 2023-09-01 | Parameters: Qwen - License: open | Type: model - Chinese. Full name is 'Tongyi Qianwen' 通义千问. 'Lags behind both GPT-3.5 and GPT-4'. Originally 7B/14B params Apr/2023 - **Llama 2 Long** (Meta AI) — 2023-09-01 | Parameters: Llama 2 Long - License: closed | Type: model - Unreleased to date. Context window=32,768 tokens (compare to Llama 2=4096 tokens) - **LeoLM** (Hessian AI/LAION) — 2023-09-01 | Parameters: LeoLM - License: open | Type: model - Llama 2 'extended' and pretrained on 2000B Llama 2 tokens + 65B tokens of German - **Mistral 7B** (Mistral) — 2023-09-01 | Parameters: Mistral 7B - License: open | Type: model - Apache 2.0, Sliding Window Attention (SWA) to handle longer sequences at smaller cost - **Kosmos-2.5** (Microsoft) — 2023-09-01 | Parameters: Kosmos-2.5 - License: closed | Type: model - **Baichuan 2** (Baichuan) — 2023-09-01 | Parameters: Baichuan 2 - License: open | Type: model - Great paper. Chinese-English bilingual dataset - **BOLT2.5B** (ThirdAI) — 2023-09-01 | Parameters: BOLT2.5B - License: open | Type: model - CPU trained - **DeciLM** (Deci) — 2023-09-01 | Parameters: DeciLM - License: open | Type: model - Faster inference (4.8× throughput of Llama 2) - **MoLM** (IBM) — 2023-09-01 | Parameters: MoLM - License: open | Type: model - ModuleFormer is based on the Sparse Mixture of Experts (MoE). - **NExT-GPT** (Singapore) — 2023-09-01 | Parameters: NExT-GPT - License: open | Type: model - Multimodal. Vicuna 7B + other modalities - **Phi-1.5** (Microsoft) — 2023-09-01 | Parameters: Phi-1.5 - License: open | Type: model - Textbooks only. 30B-token dataset - **UniLM** (Apple) — 2023-09-01 | Parameters: UniLM - License: open | Type: model - Apple's Transformer model for iOS 17 + macOS Sonoma.
Announce is actually Jun/2023. GPT-2 base? 128 token context window - **Persimmon-8B** (Adept) — 2023-09-01 | Parameters: Persimmon-8B - License: open | Type: model - Open Apache license and publicly accessible weights. - **FLM-101B** (BAAI) — 2023-09-01 | Parameters: FLM-101B - License: open | Type: model - Trained on a $100k compute budget (on a cluster of 24 DGX-A800 GPU 8×80G servers for 21 days) - **Falcon 180B** (TII) — 2023-09-01 | Parameters: Falcon 180B - License: open | Type: model - Major milestone for open source models (largest open dense model to date). - **Hunyuan** (Tencent) — 2023-09-01 | Parameters: Hunyuan - License: open | Type: model - **phi-CTNL** (Independent) — 2023-09-01 | Parameters: phi-CTNL - License: open | Type: model - Satire. MMLU=100. 'phi-CTNL (pronounced “fictional”) that achieves perfect results across diverse academic benchmarks' - **Granite** (IBM) — 2023-09-01 | Parameters: Granite - License: open | Type: model - Original trained on 1T tokens, update 15/Feb/2024 trained on 2.5T tokens: granite-13b-chat-v2 (v2.1.0). "At IBM, we curated 6.48TB of data to train our LLM Granite.13B. This was reduced to 2.07 TB after pre-processing, a 68% decrease."
- **Swift** (Intel Labs) — 2023-08-30 | Parameters: 56.8K - License: closed | Type: model - AI model by Intel Labs - **ABAB** (MiniMax) — 2023-08-30 - License: closed | Type: model - AI model by MiniMax - **Jais** (Cerebras Systems,Mohamed bin Zayed University of Artificial Intelligence (MBZUAI),Inception G42) — 2023-08-29 | Parameters: 13B - License: open | Type: model - AI model by Cerebras Systems,Mohamed bin Zayed University of Artificial Intelligence (MBZUAI),Inception G42 - **Refact-1.6B** (Refact AI) — 2023-08-29 | Parameters: 1.6B - License: open | Type: model - AI model by Refact AI - **Luca 2.0** (Mianbi Intelligence) — 2023-08-29 | Parameters: 100B - License: closed | Type: model - AI model by Mianbi Intelligence - **PeptideBERT** (Carnegie Mellon University (CMU)) — 2023-08-28 - License: open | Type: model - AI model by Carnegie Mellon University (CMU) - **MengziGPT-General-40B** (Langboat) — 2023-08-27 | Parameters: 40B - License: closed | Type: model - AI model by Langboat - **Mengzi-Fin-7B** (Langboat) — 2023-08-27 | Parameters: 7B - License: closed | Type: model - AI model by Langboat - **Mengzi-Fin-13B** (Langboat) — 2023-08-27 | Parameters: 13B - License: closed | Type: model - AI model by Langboat - **Mengzi-Lite** (Langboat) — 2023-08-25 - License: closed | Type: model - AI model by Langboat - **MathGPT** (TAL Education Group (Xueersi)) — 2023-08-24 - License: closed | Type: model - AI model by TAL Education Group (Xueersi) - **HyperCLOVA X** (NAVER) — 2023-08-24 - License: closed | Type: model - AI model by NAVER - **Qwen-VL** (Alibaba) — 2023-08-24 | Parameters: 9.6B - License: open | Type: model - AI model by Alibaba - **PULI GPTrio** (Hungarian Research Centre for Linguistics) — 2023-08-23 | Parameters: 6.7B - License: open | Type: model - AI model by Hungarian Research Centre for Linguistics - **ShapeMol** (Ohio State University) — 2023-08-23 | Parameters: 2.7M - License: closed | Type: model - AI model by Ohio State University - 
**IDEFICS-80B** (Hugging Face) — 2023-08-22 | Parameters: 80B - License: open | Type: model - AI model by Hugging Face - **IDEFICS-9B** (Hugging Face) — 2023-08-22 | Parameters: 9B - License: open | Type: model - AI model by Hugging Face - **Eleven Multilingual v2** (ElevenLabs) — 2023-08-22 - License: closed | Type: model - AI model by ElevenLabs - **Dou Bao** (ByteDance) — 2023-08-18 - License: closed | Type: model - AI model by ByteDance - **KwaiYii 13B** (Kuaishou Technology) — 2023-08-16 | Parameters: 13B - License: closed | Type: model - AI model by Kuaishou Technology - **VARCO LLM 2.0 base** (NCSOFT) — 2023-08-16 | Parameters: 13B - License: closed | Type: model - AI model by NCSOFT - **VARCO LLM 2.0 small Finetuning** (NCSOFT) — 2023-08-16 | Parameters: 7B - License: closed | Type: model - AI model by NCSOFT - **VARCO LLM KO/EN-13B-IST ver.1** (NCSOFT) — 2023-08-16 | Parameters: 13B - License: closed | Type: model - AI model by NCSOFT - **DeciCoder-1B** (Deci AI) — 2023-08-15 | Parameters: 1.1B - License: open | Type: model - AI model by Deci AI - **Konan LLM 13B** (Konan Technology) — 2023-08-15 | Parameters: 13.1B - License: closed | Type: model - AI model by Konan Technology - **A.X (Adot) 7B** (SK Telecom) — 2023-08-15 | Parameters: 7B - License: closed | Type: model - AI model by SK Telecom - **Spark 2.0** (iFlytek) — 2023-08-15 - License: closed | Type: model - AI model by iFlytek - **Platypus-70B** (Boston University) — 2023-08-14 | Parameters: 70B - License: open | Type: model - AI model by Boston University - **Code Llama-34B** (Meta AI) — 2023-08-14 | Parameters: 34B - License: open | Type: model - AI model by Meta AI - **Japanese-LM-3.6B** (LINE Corporation) — 2023-08-14 | Parameters: 3.6B - License: open | Type: model - AI model by LINE Corporation - **Code Llama-7B** (Meta AI) — 2023-08-14 | Parameters: 7B - License: open | Type: model - AI model by Meta AI - **Code Llama-13B** (Meta AI) — 2023-08-14 | Parameters: 13B - License: open | Type: 
model - AI model by Meta AI - **Japanese StableLM Base Alpha 7B** (Stability AI) — 2023-08-10 | Parameters: 7B - License: open | Type: model - AI model by Stability AI - **Baichuan2-53B** (Baichuan) — 2023-08-09 | Parameters: 53B - License: closed | Type: model - AI model by Baichuan - **Claude Instant** (Anthropic) — 2023-08-09 - License: closed | Type: model - AI model by Anthropic - **CALM** (NVIDIA,Technion - Israel Institute of Technology) — 2023-08-06 - License: closed | Type: model - AI model by NVIDIA,Technion - Israel Institute of Technology - **SS-pLM** (Nostrum Biodiscovery,Barcelona Supercomputing Center,Institucio Catalana de Recerca i Estudis Avancçats) — 2023-08-06 | Parameters: 14.8M - License: closed | Type: model - AI model by Nostrum Biodiscovery,Barcelona Supercomputing Center,Institucio Catalana de Recerca i Estudis Avancçats - **StableLM-Base-Alpha-7B** (Stability AI) — 2023-08-05 | Parameters: 6.9B - License: open | Type: model - AI model by Stability AI - **GGNN** (Westlake University,Tsinghua University,Toyota Technological Institute at Chicago) — 2023-08-05 - License: closed | Type: model - AI model by Westlake University,Tsinghua University,Toyota Technological Institute at Chicago - **Weblab-10B** (Matsuo Lab) — 2023-08-04 | Parameters: 10B - License: open | Type: model - AI model by Matsuo Lab - **YuLan-Chat-2 (13B)** (Renmin University of China) — 2023-08-02 | Parameters: 13B - License: closed | Type: model - AI model by Renmin University of China - **OpenFlamingo** (University of Washington,Stanford University,Allen Institute for AI,Hebrew University of Jerusalem,Columbia University,Google DeepMind,University of California Santa Barbara (UCSB),Research Center Juelich) — 2023-08-02 | Parameters: 9B - License: open | Type: model - AI model by University of Washington,Stanford University,Allen Institute for AI,Hebrew University of Jerusalem,Columbia University,Google DeepMind,University of California Santa Barbara (UCSB),Research Center 
Juelich - **Jais** (Inception) — 2023-08-01 | Parameters: Jais - License: open | Type: model - Arabic, trained in Abu Dhabi, UAE using Cerebras. - **Code Llama 34B** (Meta AI) — 2023-08-01 | Parameters: Code Llama 34B - License: open | Type: model - Outperforms GPT-3.5. Initial Llama 2 (2T tokens) trained on 500B tokens of code, 100B tokens of python - **IDEFICS** (Hugging Face) — 2023-08-01 | Parameters: IDEFICS - License: open | Type: model - Clone of Flamingo using Llama-1 65B. Named after Asterix and Obelix's dog Idefix (Image-aware Decoder Enhanced à la Flamingo with Interleaved Cross-attentionS) - **Raven** (UI/NVIDIA) — 2023-08-01 | Parameters: Raven - License: closed | Type: model - RAG Atlas - **DukunLM** (AzaleAI) — 2023-08-01 | Parameters: DukunLM - License: open | Type: model - Indonesian fine-tune of WizardLM (which is a Llama fine-tune). - **WizardLM** (Microsoft) — 2023-08-01 | Parameters: WizardLM - License: open | Type: model - Assume Llama-2 fine-tune. Outperforms text-davinci-003. May merge this entry with the Apr/2023 7B release - **Platypus** (Boston University) — 2023-08-01 | Parameters: Platypus - License: open | Type: model - Fine-tune of Llama 2, family includes merges with Beluga, Dolphin, and Camel fine-tunes. - **Japanese StableLM Alpha 7B** (Stability AI) — 2023-08-01 | Parameters: Japanese StableLM Alpha 7B - License: open | Type: model - Best-performing openly available language model for Japanese speakers. - **Stable Code 3B** (Stability AI) — 2023-08-01 | Parameters: Stable Code 3B - License: open | Type: model - Context window=16,384. Trained on The Stack dataset. - **JIANG** (K.D. Feddersen (KDF)) — 2023-08-01 - License: open | Type: model - AI model by K.D. 
Feddersen (KDF) - **Vicuna-7B-v1.5** (Large Model Systems Organization,University of California (UC) Berkeley) — 2023-08-01 | Parameters: 7B - License: open | Type: model - AI model by Large Model Systems Organization,University of California (UC) Berkeley - **Vicuna-13B-v1.5** (Large Model Systems Organization,University of California (UC) Berkeley) — 2023-08-01 | Parameters: 13B - License: open | Type: model - AI model by Large Model Systems Organization,University of California (UC) Berkeley - **bilingual-gpt-neox-4b** (rinna) — 2023-07-31 | Parameters: 3.8B - License: open | Type: model - AI model by rinna - **RT-2** (Google DeepMind) — 2023-07-28 | Parameters: 55B - License: closed | Type: model - AI model by Google DeepMind - **Zi Yue** (NetEase) — 2023-07-28 - License: closed | Type: model - AI model by NetEase - **Zi Yue 2.0** (NetEase) — 2023-07-28 - License: closed | Type: model - AI model by NetEase - **BELLE-Llama2-13B-chat-0.4M** (KE Holdings Inc. (“Beike”)) — 2023-07-27 | Parameters: 13B - License: open | Type: model - AI model by KE Holdings Inc. (“Beike”) - **AudioLM** (Google Research) — 2023-07-26 | Parameters: 1.5B - License: closed | Type: model - AI model by Google Research - **CharacterGLM** (Beijing Lingxin Intelligent Technology Co., Ltd.) — 2023-07-25 - License: closed | Type: model - AI model by Beijing Lingxin Intelligent Technology Co., Ltd. 
- **WizardLM 13B v1.2** (Microsoft,Peking University) — 2023-07-25 | Parameters: 13B - License: open | Type: model - AI model by Microsoft,Peking University - **RFdiffusion** (University of Washington,Columbia University,Ecole Normale Supérieure,University of Cambridge,Massachusetts Institute of Technology (MIT),Seoul National University) — 2023-07-23 - License: open | Type: model - AI model by University of Washington,Columbia University,Ecole Normale Supérieure,University of Cambridge,Massachusetts Institute of Technology (MIT),Seoul National University - **LM-Design** (ByteDance,University of Wisconsin Madison) — 2023-07-23 | Parameters: 6.9M - License: closed | Type: model - AI model by ByteDance,University of Wisconsin Madison - **YAYI-13B-Llama2** (Yayi (Wenge)) — 2023-07-22 | Parameters: 13B - License: open | Type: model - AI model by Yayi (Wenge) - **YAYI-7B-Llama2** (Yayi (Wenge)) — 2023-07-22 | Parameters: 7B - License: open | Type: model - AI model by Yayi (Wenge) - **Stable Beluga 1** (Stability AI) — 2023-07-21 | Parameters: 65.2B - License: open | Type: model - AI model by Stability AI - **Stable Beluga 2** (Stability AI) — 2023-07-20 | Parameters: 70B - License: open | Type: model - AI model by Stability AI - **EXAONE 2.0** (LG AI Research) — 2023-07-19 | Parameters: 300B - License: closed | Type: model - AI model by LG AI Research - **Llama 2-70B** (Meta AI) — 2023-07-18 | Parameters: 70B - License: open | Type: model - AI model by Meta AI - **Llama 2-34B** (Meta AI) — 2023-07-18 | Parameters: 34B - License: closed | Type: model - AI model by Meta AI - **Llama 2-7B** (Meta AI) — 2023-07-18 | Parameters: 7B - License: open | Type: model - AI model by Meta AI - **Llama 2-13B** (Meta AI) — 2023-07-18 | Parameters: 13B - License: open | Type: model - AI model by Meta AI - **GPT3-2.7B (FlashAttention-2)** (Stanford University,Princeton University) — 2023-07-18 | Parameters: 2.7B - License: closed | Type: model - AI model by Stanford University,Princeton 
University - **RetNet** (Microsoft Research,Tsinghua University) — 2023-07-17 | Parameters: 6.7B - License: closed | Type: model - AI model by Microsoft Research,Tsinghua University - **BaiLing** (Ant Group) — 2023-07-15 - License: closed | Type: model - AI model by Ant Group - **CryoChains** (University of California Santa Barbara (UCSB),Stanford University) — 2023-07-15 - License: closed | Type: model - AI model by University of California Santa Barbara (UCSB),Stanford University - **ChatRhino** (JD.com) — 2023-07-13 | Parameters: 100B - License: closed | Type: model - AI model by JD.com - **Uni-RNA-L8** (DP Technology) — 2023-07-12 | Parameters: 25M - License: closed | Type: model - AI model by DP Technology - **Med-PaLM** (Google Research,National Library of Medicine,DeepMind) — 2023-07-12 | Parameters: 540B - License: closed | Type: model - AI model by Google Research,National Library of Medicine,DeepMind - **Uni-RNA-L-24** (DP Technology) — 2023-07-12 | Parameters: 400M - License: closed | Type: model - AI model by DP Technology - **Uni-RNA-L12** (DP Technology) — 2023-07-12 | Parameters: 85M - License: closed | Type: model - AI model by DP Technology - **Uni-RNA-L16** (DP Technology) — 2023-07-12 | Parameters: 169M - License: closed | Type: model - AI model by DP Technology - **Claude 2** (Anthropic) — 2023-07-11 - License: closed | Type: model - AI model by Anthropic - **Emu1 (BAAI)** (Beijing Academy of Artificial Intelligence / BAAI,Tsinghua University,Peking University) — 2023-07-11 | Parameters: 14B - License: open | Type: model - AI model by Beijing Academy of Artificial Intelligence / BAAI,Tsinghua University,Peking University - **Baichuan 1-13B** (Baichuan) — 2023-07-11 | Parameters: 13.3B - License: open | Type: model - AI model by Baichuan - **West Lake (“Xīhú / 西湖大模型”)** (West Lake Xinchen / Xinchen AI / 西湖心辰(杭州)科技有限公司) — 2023-07-09 - License: open | Type: model - AI model by West Lake Xinchen / Xinchen AI / 西湖心辰(杭州)科技有限公司 - **TeleChat** (China 
Telecom) — 2023-07-07 - License: closed | Type: model - AI model by China Telecom - **Pangu 3.0** (Huawei) — 2023-07-07 | Parameters: 100B - License: closed | Type: model - AI model by Huawei - **InternLM** (Shanghai AI Lab,SenseTime) — 2023-07-06 | Parameters: 104B - License: closed | Type: model - AI model by Shanghai AI Lab,SenseTime - **CodeGen2.5** (Salesforce) — 2023-07-06 | Parameters: 7B - License: open | Type: model - AI model by Salesforce - **xTrimoPGLM -100B** (Tsinghua University,BioMap Research) — 2023-07-06 | Parameters: 100B - License: closed | Type: model - AI model by Tsinghua University,BioMap Research - **NEC LLM 13B** (NEC Laboratories) — 2023-07-06 | Parameters: 13B - License: closed | Type: model - AI model by NEC Laboratories - **Pangu-Weather** (Huawei) — 2023-07-05 | Parameters: 256M - License: open | Type: model - AI model by Huawei - **LongNet** (Microsoft,Xi’an Jiaotong University) — 2023-07-05 | Parameters: 2.7B - License: closed | Type: model - AI model by Microsoft,Xi’an Jiaotong University - **Stable Diffusion XL (SDXL)** (Stability AI) — 2023-07-04 | Parameters: 3.4B - License: open | Type: model - AI model by Stability AI - **Med-Flamingo** (Stanford) — 2023-07-01 | Parameters: Med-Flamingo - License: open | Type: model - Uses LAION OpenFlamingo 9B, based on LLaMA-7B text + 1.3B vision - **Alfred-40B-0723** (LightOn) — 2023-07-01 | Parameters: Alfred-40B-0723 - License: open | Type: model - First finetuned version of Falcon with RLHF. Enterprise: https://www.lighton.ai/paradigm - **LLaMA-2-7B-32K** (Together) — 2023-07-01 | Parameters: LLaMA-2-7B-32K - License: open | Type: model - 32k context window instead of 4k (Llama 2) - **Med-PaLM M** (Google DeepMind) — 2023-07-01 | Parameters: Med-PaLM M - License: closed | Type: model - Uses PaLM 1. Already outperformed by Med-PaLM 2. Med-PaLM Multimodal (Med-PaLM M). 
- **BTLM-3B-8K** (Cerebras) — 2023-07-01 | Parameters: BTLM-3B-8K - License: open | Type: model - Runs on devices with as little as 3GB of memory [iPhone, Macbook] when quantized to 4-bit - **Stable Beluga 2** (Stability AI) — 2023-07-01 | Parameters: Stable Beluga 2 - License: open | Type: model - Fine-tuned Llama 2. Non-commercial use license. Codename was FreeWilly2 - **Stable Beluga 1** (Stability AI) — 2023-07-01 | Parameters: Stable Beluga 1 - License: open | Type: model - Fine-tuned LLaMA-1. Non-commercial use license. Codename was FreeWilly1 - **Meta-Transformer** (Shanghai AI Laboratory/CUHK) — 2023-07-01 | Parameters: Meta-Transformer - License: open | Type: model - Proto-AGI. 12 modalities (text, image, point cloud, audio, video, infrared, hyperspectral, X-ray, time-series, tabular, Inertial Measurement Unit (IMU), and graph data). - **Llama 2** (Meta AI) — 2023-07-01 | Parameters: Llama 2 - License: open | Type: model - Context window=4096. MMLU=68.9 (GPT-3.5=70.0, GPT-4=86.4) - **WormGPT** ((Undisclosed)) — 2023-07-01 | Parameters: WormGPT - License: partial | Type: model - GPT-J (2021) finetune/module. - **Claude 2** (Anthropic) — 2023-07-01 | Parameters: Claude 2 - License: open | Type: model - More HHH, 200k context length - **LongLLaMA** (IDEAS/DeepMind) — 2023-07-01 | Parameters: LongLLaMA - License: open | Type: model - 256k context length - **xTrimoPGLM** (Tsinghua) — 2023-07-01 | Parameters: xTrimoPGLM - License: closed | Type: model - Protein language model - **XGen** (Salesforce) — 2023-07-01 | Parameters: XGen - License: open | Type: model - 8K sequence length. Released under Apache-2.0. 
- **Zhinao (Intellectual Brain)** (360 cn) — 2023-07-01 | Parameters: Zhinao (Intellectual Brain) - License: open | Type: model - - **Multilingual-E5-large** (Microsoft) — 2023-06-30 | Parameters: 560M - License: open | Type: model - AI model by Microsoft - **Honghu Graphic** (China Unicom) — 2023-06-28 | Parameters: 2B - License: closed | Type: model - AI model by China Unicom - **ERNIE 3.5** (Baidu) — 2023-06-27 - License: closed | Type: model - AI model by Baidu - **HyenaDNA** (Stanford University,Harvard University,Mila - Quebec AI (originally Montreal Institute for Learning Algorithms),University of Montreal / Université de Montréal) — 2023-06-27 | Parameters: 6.6M - License: open | Type: model - AI model by Stanford University,Harvard University,Mila - Quebec AI (originally Montreal Institute for Learning Algorithms),University of Montreal / Université de Montréal - **Chirp** (Google) — 2023-06-27 | Parameters: 2B - License: closed | Type: model - AI model by Google - **RWKV-4 World (7B)** (RWKV Foundation) — 2023-06-26 | Parameters: 7.4B - License: open | Type: model - AI model by RWKV Foundation - **Kosmos-2** (Microsoft) — 2023-06-26 | Parameters: 1.6B - License: open | Type: model - AI model by Microsoft - **Llama-2-Chinese 13B** (FlagAlpha) — 2023-06-25 | Parameters: 13B - License: open | Type: model - AI model by FlagAlpha - **Inflection-1** (Inflection AI) — 2023-06-23 - License: closed | Type: model - AI model by Inflection AI - **MPT-30B** (MosaicML) — 2023-06-22 | Parameters: 30B - License: open | Type: model - AI model by MosaicML - **Vicuna-33B-v1.3** (Large Model Systems Organization,University of California (UC) Berkeley) — 2023-06-22 | Parameters: 33B - License: open | Type: model - AI model by Large Model Systems Organization,University of California (UC) Berkeley - **Vicuna-13B-v1.3** (Large Model Systems Organization,University of California (UC) Berkeley) — 2023-06-22 | Parameters: 13B - License: open | Type: model - AI model by Large Model 
Systems Organization,University of California (UC) Berkeley - **Vicuna-7B-v1.3** (Large Model Systems Organization,University of California (UC) Berkeley) — 2023-06-22 | Parameters: 7B - License: open | Type: model - AI model by Large Model Systems Organization,University of California (UC) Berkeley - **RoboCat** (Google DeepMind,Google) — 2023-06-20 | Parameters: 1.2B - License: closed | Type: model - AI model by Google DeepMind,Google - **GigaGAN** (POSTECH,Carnegie Mellon University (CMU),Adobe) — 2023-06-19 | Parameters: 1B - License: closed | Type: model - AI model by POSTECH,Carnegie Mellon University (CMU),Adobe - **Pix2Struct-Large** (Google Research,University of Cambridge) — 2023-06-15 | Parameters: 1.3B - License: open | Type: model - AI model by Google Research,University of Cambridge - **WizardCoder-15.5B** (Microsoft) — 2023-06-14 | Parameters: 15.5B - License: open | Type: model - AI model by Microsoft - **GPT-3.5 Turbo** (OpenAI) — 2023-06-13 | Parameters: 20B - License: closed | Type: model - AI model by OpenAI - **BELLE-LLaMA-7B-0.6M-enc** (KE Holdings Inc. (“Beike”)) — 2023-06-12 | Parameters: 7B - License: open | Type: model - AI model by KE Holdings Inc. (“Beike”) - **BELLE-LLaMA-7B-2M-enc** (KE Holdings Inc. (“Beike”)) — 2023-06-12 | Parameters: 7B - License: open | Type: model - AI model by KE Holdings Inc. (“Beike”) - **BELLE-LLaMA-13B-2M-enc** (KE Holdings Inc. (“Beike”)) — 2023-06-12 | Parameters: 13B - License: open | Type: model - AI model by KE Holdings Inc. 
(“Beike”) - **Wu Dao Aquila-7B** (Beijing Academy of Artificial Intelligence / BAAI) — 2023-06-10 | Parameters: 7B - License: open | Type: model - AI model by Beijing Academy of Artificial Intelligence / BAAI - **Wu Dao Aquila-33B** (Beijing Academy of Artificial Intelligence / BAAI) — 2023-06-10 | Parameters: 33B - License: closed | Type: model - AI model by Beijing Academy of Artificial Intelligence / BAAI - **PoET** (OpenProtein.ai) — 2023-06-09 | Parameters: 57M - License: closed | Type: model - AI model by OpenProtein.ai - **SYNTERACT** (University of Delaware) — 2023-06-09 | Parameters: 420M - License: closed | Type: model - AI model by University of Delaware - **MusicGen** (Meta AI) — 2023-06-08 | Parameters: 3.4B - License: open | Type: model - AI model by Meta AI - **PolySphere-1** (AI inside) — 2023-06-08 | Parameters: 14B - License: closed | Type: model - AI model by AI inside - **RedPajama-INCITE-7B-Base** (Together) — 2023-06-06 | Parameters: 6.9B - License: open | Type: model - AI model by Together - **LTM-1** (Magic) — 2023-06-06 - License: closed | Type: model - AI model by Magic - **GELU for CIFAR-10** (University of California (UC) Berkeley,Toyota Technological Institute at Chicago) — 2023-06-06 | Parameters: 9.9K - License: closed | Type: model - AI model by University of California (UC) Berkeley,Toyota Technological Institute at Chicago - **life2vec** (Technical University of Denmark,University of Copenhagen) — 2023-06-05 | Parameters: 8.4M - License: closed | Type: model - AI model by Technical University of Denmark,University of Copenhagen - **Polyglot-Ko-12.8B** (EleutherAI) — 2023-06-04 | Parameters: 12.9B - License: open | Type: model - AI model by EleutherAI - **YaYi-7B** (Yayi (Wenge)) — 2023-06-03 | Parameters: 7B - License: closed | Type: model - AI model by Yayi (Wenge) - **Yasa** (Reka AI) — 2023-06-01 | Parameters: Yasa - License: partial | Type: model - No public arch info. 
Researchers from DeepMind, Google, Baidu and Meta building enterprise models - **Kosmos-2** (Microsoft) — 2023-06-01 | Parameters: Kosmos-2 - License: open | Type: model - Proto-AGI. Multimodal large language model (MLLM) with grounding capability, built upon KOSMOS-1 - **AudioPaLM** (Google) — 2023-06-01 | Parameters: AudioPaLM - License: closed | Type: model - A unified multimodal architecture that can process and generate text and speech, with applications including speech recognition and speech-to-speech translation - **Inflection-1** (Inflection AI) — 2023-06-01 | Parameters: Inflection-1 - License: open | Type: model - Comparable with benchmarking results from InternLM 104B, 1-2% better. ‘Inflection-1 was trained using thousands of NVIDIA H100 GPUs on a very large dataset.’ - **Phi-1** (Microsoft) — 2023-06-01 | Parameters: Phi-1 - License: closed | Type: model - Code model. ‘breaking existing scaling laws by training a 1.3B-parameter model, which we call phi-1, for roughly 8 passes over 7B tokens (slightly over 50B total tokens seen) followed by finetuning on less than 200M tokens.’ - **InternLM** (Shanghai AI Laboratory/SenseTime) — 2023-06-01 | Parameters: InternLM - License: closed | Type: model - Outperforms ChatGPT, LLaMA on RACE-h, Chinese + English - **BlenderBot 3x** (Meta AI) — 2023-06-01 | Parameters: BlenderBot 3x - License: open | Type: model - OPT-175B with new dialogue data - **Orca** (Microsoft) — 2023-06-01 | Parameters: Orca - License: partial | Type: model - LLaMA -> Vicuna -> Orca (GPT-4 finetune). Still an imitation model, overhyped: The False Promise of Imitating Proprietary LLMs https://arxiv.org/abs/2305.15717 - **PassGPT** (ETH Zürich) — 2023-06-01 | Parameters: PassGPT - License: closed | Type: model - GPT-2 trained on leaked passwords - **DIDACT** (Google DeepMind) — 2023-06-01 | Parameters: DIDACT - License: closed | Type: model - Iterative coding model trained on Google's monorepo. 
Jacob: https://twitter.com/jacobaustin132/status/1663972128176128002 - **LTM-1** (Magic) — 2023-06-01 | Parameters: LTM-1 - License: closed | Type: model - Context window=5M - **Baichuan1-7B** (Baichuan) — 2023-06-01 | Parameters: 7.0B - License: open | Type: model - AI model by Baichuan - **TransAct** (Pinterest) — 2023-05-31 | Parameters: 92M - License: closed | Type: model - AI model by Pinterest - **MASSA** (Guangdong-Hong Kong-Macao Joint Laboratory of Human-Machine Intelligence-Synergy Systems,Shenzhen Institute of Advanced Technology,Chinese Academy of Sciences) — 2023-05-30 - License: open | Type: model - AI model by Guangdong-Hong Kong-Macao Joint Laboratory of Human-Machine Intelligence-Synergy Systems,Shenzhen Institute of Advanced Technology,Chinese Academy of Sciences - **EGNN** (InstaDeep) — 2023-05-30 - License: closed | Type: model - AI model by InstaDeep - **UniDiffuser (多模态大模型)** (ShengShu,Tsinghua University,Beijing Academy of Artificial Intelligence / BAAI) — 2023-05-30 - License: open | Type: model - AI model by ShengShu,Tsinghua University,Beijing Academy of Artificial Intelligence / BAAI - **PaLI-X** (Google Research) — 2023-05-29 | Parameters: 55B - License: closed | Type: model - AI model by Google Research - **bpRNA-align** (Oregon State University) — 2023-05-29 - License: closed | Type: model - AI model by Oregon State University - **DPO on Pythia-2.8B** (Stanford University,CZ Biohub Network) — 2023-05-29 | Parameters: 2.8B - License: closed | Type: model - AI model by Stanford University,CZ Biohub Network - **30B-Lazarus** (Caldera AI) — 2023-05-27 | Parameters: 30B - License: open | Type: model - AI model by Caldera AI - **HuatuoGPT** (Shenzhen Research Institute of Big Data,Chinese University of Hong Kong (CUHK)) — 2023-05-24 | Parameters: 13B - License: open | Type: model - AI model by Shenzhen Research Institute of Big Data,Chinese University of Hong Kong (CUHK) - **Shanhai** (Unisound) — 2023-05-24 - License: closed | Type: model - 
AI model by Unisound - **LLaMA-7B (LoRA finetuned)** (NAVER) — 2023-05-23 | Parameters: 7B - License: closed | Type: model - AI model by NAVER - **LLaMA-65B (LoRA finetuned)** (NAVER) — 2023-05-23 | Parameters: 65.2B - License: closed | Type: model - AI model by NAVER - **LLaMA-13B (LoRA finetuned)** (NAVER) — 2023-05-23 | Parameters: 13B - License: closed | Type: model - AI model by NAVER - **LLaMA-33B (LoRA finetuned)** (NAVER) — 2023-05-23 | Parameters: 33B - License: closed | Type: model - AI model by NAVER - **Goat-7B** (National University of Singapore) — 2023-05-23 | Parameters: 7B - License: open | Type: model - AI model by National University of Singapore - **Guanaco-65B** (University of Washington) — 2023-05-23 | Parameters: 65B - License: open | Type: model - AI model by University of Washington - **BiomedGPT (182M)** (Lehigh University,University of Georgia,Samsung Research America,Harvard Medical School,University of Pennsylvania) — 2023-05-23 | Parameters: 182M - License: open | Type: model - AI model by Lehigh University,University of Georgia,Samsung Research America,Harvard Medical School,University of Pennsylvania - **ProlificDreamer** (Tsinghua University,ShengShu) — 2023-05-23 - License: closed | Type: model - AI model by Tsinghua University,ShengShu - **RWKV-4 14B** (RWKV Foundation) — 2023-05-22 | Parameters: 14B - License: closed | Type: model - AI model by RWKV Foundation - **MMS-1B** (Meta AI) — 2023-05-22 | Parameters: 1B - License: open | Type: model - AI model by Meta AI - **CodeT5+** (Salesforce) — 2023-05-20 | Parameters: 16B - License: open | Type: model - AI model by Salesforce - **XuanYuan 2.0** (Du Xiaoman) — 2023-05-19 | Parameters: 176.2B - License: open | Type: model - AI model by Du Xiaoman - **ONE-PEACE** (Alibaba,Huazhong University of Science and Technology) — 2023-05-18 | Parameters: 4B - License: open | Type: model - AI model by Alibaba,Huazhong University of Science and Technology - **LIMA** (Meta AI,Carnegie Mellon 
University (CMU),University of Southern California,Tel Aviv University) — 2023-05-18 | Parameters: 65B - License: closed | Type: model - AI model by Meta AI,Carnegie Mellon University (CMU),University of Southern California,Tel Aviv University - **Tianhe Tianyuan** (National Supercomputer Center in Tianjin) — 2023-05-18 - License: closed | Type: model - AI model by National Supercomputer Center in Tianjin - **CongRong (从容大模型)** (CloudWalk Technology) — 2023-05-18 - License: closed | Type: model - AI model by CloudWalk Technology - **CoEdiT-xxl** (University of Minnesota,Grammarly) — 2023-05-17 | Parameters: 11B - License: open | Type: model - AI model by University of Minnesota,Grammarly - **WeLM** (WeChat AI) — 2023-05-16 | Parameters: 10B - License: closed | Type: model - AI model by WeChat AI - **Med-PaLM 2** (Google Research,DeepMind) — 2023-05-16 | Parameters: 340B - License: closed | Type: model - AI model by Google Research,DeepMind - **OpenCALM** (CyberAgent) — 2023-05-15 | Parameters: 7B - License: open | Type: model - AI model by CyberAgent - **A.X (Adot) 39B** (SK Telecom) — 2023-05-15 | Parameters: 39B - License: closed | Type: model - AI model by SK Telecom - **LMRec** (NAVER,Naver AI Lab) — 2023-05-13 | Parameters: 210M - License: closed | Type: model - AI model by NAVER,Naver AI Lab - **InstructBLIP** (Salesforce Research,Hong Kong University of Science and Technology (HKUST),Nanyang Technological University) — 2023-05-11 | Parameters: 13B - License: open | Type: model - AI model by Salesforce Research,Hong Kong University of Science and Technology (HKUST),Nanyang Technological University - **ESM-GearNet** (Mila - Quebec AI (originally Montreal Institute for Learning Algorithms),University of Montreal / Université de Montréal,IBM Research,HEC Montreal,CIFAR AI Research) — 2023-05-11 | Parameters: 650M - License: closed | Type: model - AI model by Mila - Quebec AI (originally Montreal Institute for Learning Algorithms),University of Montreal / 
Université de Montréal,IBM Research,HEC Montreal,CIFAR AI Research - **PaLM 2** (Google) — 2023-05-10 | Parameters: 340B - License: closed | Type: model - AI model by Google - **PaLM-2 Bison** (Google) — 2023-05-10 - License: closed | Type: model - AI model by Google - **PaLM-2 Unicorn** (Google) — 2023-05-10 - License: closed | Type: model - AI model by Google - **PaLM-2 Gecko** (Google) — 2023-05-10 - License: closed | Type: model - AI model by Google - **PaLM-2 Otter** (Google) — 2023-05-10 - License: closed | Type: model - AI model by Google - **StarCoder** (Hugging Face,ServiceNow,Northeastern University,Mila - Quebec AI (originally Montreal Institute for Learning Algorithms),Carnegie Mellon University (CMU),Johns Hopkins University,Leipzig University,ScaDS.AI,Queen Mary University of London,Roblox,Sea AI Lab,Technion - Israel Institute of Technology,Monash University,CSIRO,Data61,McGill University,Saama,University of British Columbia (UBC),Massachusetts Institute of Technology (MIT),Technical University of Munich,IBM,University of Vermont,UnfoldML,SAP,University of Notre Dame,Columbia University,New York University (NYU),University of Allahabad,Discover Dollar,Toloka,Telefonica,Stanford University,Weizmann Institute of Science,Alan Turing Institute,Wellesley College,EleutherAI,Forschungszentrum Julich) — 2023-05-09 | Parameters: 15.5B - License: open | Type: model - AI model by Hugging Face,ServiceNow,Northeastern University,Mila - Quebec AI (originally Montreal Institute for Learning Algorithms),Carnegie Mellon University (CMU),Johns Hopkins University,Leipzig University,ScaDS.AI,Queen Mary University of London,Roblox,Sea AI Lab,Technion - Israel Institute of Technology,Monash University,CSIRO,Data61,McGill University,Saama,University of British Columbia (UBC),Massachusetts Institute of Technology (MIT),Technical University of Munich,IBM,University of Vermont,UnfoldML,SAP,University of Notre Dame,Columbia University,New York University (NYU),University of 
Allahabad,Discover Dollar,Toloka,Telefonica,Stanford University,Weizmann Institute of Science,Alan Turing Institute,Wellesley College,EleutherAI,Forschungszentrum Julich - **ImageBind** (Meta AI) — 2023-05-09 | Parameters: 932M - License: open | Type: model - AI model by Meta AI - **Spark** (iFlytek) — 2023-05-06 - License: closed | Type: model - AI model by iFlytek - **MPT-7B** (MosaicML) — 2023-05-05 | Parameters: 7B - License: open | Type: model - AI model by MosaicML - **Otter** (Nanyang Technological University) — 2023-05-05 | Parameters: 1.3B - License: open | Type: model - AI model by Nanyang Technological University - **CodeGen2** (Salesforce) — 2023-05-03 | Parameters: 16B - License: open | Type: model - AI model by Salesforce - **Perfusion** (NVIDIA,Tel Aviv University,Bar-Ilan University) — 2023-05-02 - License: closed | Type: model - AI model by NVIDIA,Tel Aviv University,Bar-Ilan University - **GPT-4 MathMix** (OpenAI) — 2023-05-01 | Parameters: GPT-4 MathMix - License: closed | Type: model - Unreleased, includes step by step research - **PandaGPT** (Cambridge/Tencent) — 2023-05-01 | Parameters: PandaGPT - License: open | Type: model - Proto-AGI. 6 modalities (text, image/video, audio, depth, thermal, and IMU/accelerometer/gyroscope/compass). Based on Vicuna. - **Falcon** (TII) — 2023-05-01 | Parameters: Falcon - License: open | Type: model - Abu Dhabi - **202305-refact2b-mqa-lion** (Refact) — 2023-05-01 | Parameters: 202305-refact2b-mqa-lion - License: partial | Type: model - LiON vs Adam, code, RedPajama+The Stack - **Guanaco** (UW) — 2023-05-01 | Parameters: Guanaco - License: open | Type: model - LLaMA-65B via QLoRA - **LIMA** (Meta AI) — 2023-05-01 | Parameters: LIMA - License: closed | Type: model - LLaMA-65B with nearly no fine-tuning, no RLHF - **Formosa (FFM)** (Asus/TWS) — 2023-05-01 | Parameters: Formosa (FFM) - License: partial | Type: model - BLOOMZ finetune? Chinese, Taiwan's first LLM. 
Subscription hardware: https://archive.md/cVdJt - **CodeT5+** (Salesforce) — 2023-05-01 | Parameters: CodeT5+ - License: open | Type: model - 'InstructCodeT5+ 16B sets new SoTA results of 35.0% pass@1 and 54.5% pass@10 against other open code LLMs, even surpassing the closed-source OpenAI code-cushman-001' - **PaLM 2** (Google) — 2023-05-01 | Parameters: PaLM 2 - License: open | Type: model - “What we found in our work is that it’s not really the sort of size of model — that the larger is not always better,” DeepMind VP Zoubin Ghahramani said in a press briefing ahead of today’s announcement. “That’s why we’ve provided a family of models of different sizes. We think that actually parameter count is not really a useful way of thinking about the capabilities of models and capabilities are really to be judged by people using the models and finding out whether they’re useful in the tests that they try to achieve with these models.” - **StarCoder** (HF/ServiceNow) — 2023-05-01 | Parameters: StarCoder - License: open | Type: model - - **MPT** (MosaicML) — 2023-05-01 | Parameters: MPT - License: open | Type: model - 'Llongboi'. Apache 2.0 license suitable for commercial use; base 7B LLM trained on 1T tokens outperforms LLaMA and GPT3; 64K+ context length; $200k to train from scratch. - **Pi** (Inflection AI) — 2023-05-01 | Parameters: Pi - License: open | Type: model - No indication of params/tokens. Devs from DeepMind. 
- **GPT-2B-001** (NVIDIA) — 2023-05-01 | Parameters: GPT-2B-001 - License: open | Type: model - No paper yet - **OpenLLaMA-13B** (OpenLM Research) — 2023-05-01 | Parameters: 13B - License: open | Type: model - AI model by OpenLM Research - **MosaicML Diffusion** (Databricks) — 2023-04-28 | Parameters: 1.3B - License: closed | Type: model - AI model by Databricks - **Shishuo** (4Paradigm) — 2023-04-27 - License: closed | Type: model - AI model by 4Paradigm - **Eleven Multilingual v1** (ElevenLabs) — 2023-04-27 - License: closed | Type: model - AI model by ElevenLabs - **Agile Soccer Robot** (Google DeepMind) — 2023-04-26 - License: closed | Type: model - AI model by Google DeepMind - **WizardLM-7B** (Microsoft,Peking University) — 2023-04-24 | Parameters: 6.7B - License: open | Type: model - AI model by Microsoft,Peking University - **Falcon-7B** (Technology Innovation Institute) — 2023-04-24 | Parameters: 7B - License: open | Type: model - AI model by Technology Innovation Institute - **WizardLM 70B** (Microsoft,Peking University) — 2023-04-24 | Parameters: 70B - License: open | Type: model - AI model by Microsoft,Peking University - **ruGPT-3.5 13B** (Sber) — 2023-04-24 | Parameters: 13B - License: open | Type: model - AI model by Sber - **GigaChat** (Sber) — 2023-04-24 - License: closed | Type: model - AI model by Sber - **AlphaLink** (Technische Universitat Berlin,Research Cluster of Excellence,University of Edinburgh) — 2023-04-20 | Parameters: 93M - License: open | Type: model - AI model by Technische Universitat Berlin,Research Cluster of Excellence,University of Edinburgh - **MOSS-Moon-003** (Fudan University) — 2023-04-19 | Parameters: 16B - License: open | Type: model - AI model by Fudan University - **Claude 1.3** (Anthropic) — 2023-04-18 - License: closed | Type: model - AI model by Anthropic - **AiLMe-100B v3** (Qilin Hesheng Network Technology Co., Ltd. 
(APUS)) — 2023-04-18 | Parameters: 100B - License: closed | Type: model - AI model by Qilin Hesheng Network Technology Co., Ltd. (APUS) - **LLaVA** (University of Wisconsin Madison,Microsoft Research,Columbia University) — 2023-04-17 | Parameters: 13B - License: open | Type: model - AI model by University of Wisconsin Madison,Microsoft Research,Columbia University - **BELLE-LLaMA-EXT-7B** (KE Holdings Inc. (“Beike”)) — 2023-04-16 | Parameters: 7B - License: open | Type: model - AI model by KE Holdings Inc. (“Beike”) - **OpenCLIP ViT-H-14-378-quickgelu** (Massachusetts Institute of Technology (MIT)) — 2023-04-16 | Parameters: 986.7M - License: open | Type: model - AI model by Massachusetts Institute of Technology (MIT) - **Suno Bark Model** (Suno) — 2023-04-15 | Parameters: 300M - License: open | Type: model - AI model by Suno - **DINOv2** (Facebook AI Research,INRIA) — 2023-04-14 | Parameters: 1.1B - License: open | Type: model - AI model by Facebook AI Research,INRIA - **HuaTuo** (Harbin Institute of Technology) — 2023-04-14 | Parameters: 7B - License: open | Type: model - AI model by Harbin Institute of Technology - **Anthropic LM 52B** (Anthropic) — 2023-04-12 | Parameters: 52B - License: closed | Type: model - AI model by Anthropic - **Dolly 2.0-12b** (Databricks) — 2023-04-12 | Parameters: 12B - License: open | Type: model - AI model by Databricks - **SenseChat** (SenseTime) — 2023-04-10 | Parameters: 180B - License: closed | Type: model - AI model by SenseTime - **Incoder-6.7B** (Facebook AI Research,University of Washington,University of California (UC) Berkeley,Carnegie Mellon University (CMU),Toyota Technological Institute at Chicago) — 2023-04-09 | Parameters: 6.7B - License: open | Type: model - AI model by Facebook AI Research,University of Washington,University of California (UC) Berkeley,Carnegie Mellon University (CMU),Toyota Technological Institute at Chicago - **gLM** (Harvard University) — 2023-04-08 | Parameters: 1B - License: open | Type: model 
- AI model by Harvard University - **DiffDock-PP** (Technical University of Munich,Massachusetts Institute of Technology (MIT)) — 2023-04-08 | Parameters: 1.6M - License: closed | Type: model - AI model by Technical University of Munich,Massachusetts Institute of Technology (MIT) - **EvoMIL** (University of Glasgow,Cancer Research UK Beatson Institute) — 2023-04-08 - License: closed | Type: model - AI model by University of Glasgow,Cancer Research UK Beatson Institute - **Cerebras-GPT-13B** (Cerebras Systems) — 2023-04-06 | Parameters: 13B - License: open | Type: model - AI model by Cerebras Systems - **Segment Anything Model** (Meta AI) — 2023-04-05 | Parameters: 636M - License: open | Type: model - AI model by Meta AI - **BELLE-7B-0.2M** (KE Holdings Inc. (“Beike”)) — 2023-04-05 | Parameters: 7B - License: open | Type: model - AI model by KE Holdings Inc. (“Beike”) - **BELLE-7B-1M** (KE Holdings Inc. (“Beike”)) — 2023-04-05 | Parameters: 7B - License: open | Type: model - AI model by KE Holdings Inc. (“Beike”) - **BELLE-7B-2M** (KE Holdings Inc. (“Beike”)) — 2023-04-05 | Parameters: 7B - License: open | Type: model - AI model by KE Holdings Inc. 
(“Beike”) - **EigenFold** (Massachusetts Institute of Technology (MIT)) — 2023-04-05 - License: open | Type: model - AI model by Massachusetts Institute of Technology (MIT) - **Pythia-12b** (EleutherAI,Booz Allen Hamilton, McLean,University of Cambridge,Indraprastha Institute of Information Technology Delhi,Stability AI,datasaur.ai,University of Amsterdam) — 2023-04-03 | Parameters: 12B - License: open | Type: model - AI model by EleutherAI,Booz Allen Hamilton, McLean,University of Cambridge,Indraprastha Institute of Information Technology Delhi,Stability AI,datasaur.ai,University of Amsterdam - **Pythia-2.8b** (EleutherAI,Booz Allen Hamilton, McLean,University of Cambridge,Indraprastha Institute of Information Technology Delhi,Stability AI,datasaur.ai,University of Amsterdam) — 2023-04-03 | Parameters: 2.8B - License: open | Type: model - AI model by EleutherAI,Booz Allen Hamilton, McLean,University of Cambridge,Indraprastha Institute of Information Technology Delhi,Stability AI,datasaur.ai,University of Amsterdam - **Pythia-6.9b** (EleutherAI,Booz Allen Hamilton, McLean,University of Cambridge,Indraprastha Institute of Information Technology Delhi,Stability AI,datasaur.ai,University of Amsterdam) — 2023-04-03 | Parameters: 6.9B - License: open | Type: model - AI model by EleutherAI,Booz Allen Hamilton, McLean,University of Cambridge,Indraprastha Institute of Information Technology Delhi,Stability AI,datasaur.ai,University of Amsterdam - **Pythia-160m** (EleutherAI,Booz Allen Hamilton, McLean,University of Cambridge,Indraprastha Institute of Information Technology Delhi,Stability AI,datasaur.ai,University of Amsterdam) — 2023-04-03 | Parameters: 160M - License: open | Type: model - AI model by EleutherAI,Booz Allen Hamilton, McLean,University of Cambridge,Indraprastha Institute of Information Technology Delhi,Stability AI,datasaur.ai,University of Amsterdam - **Pythia-1b** (EleutherAI,Booz Allen Hamilton, McLean,University of Cambridge,Indraprastha Institute of 
Information Technology Delhi,Stability AI,datasaur.ai,University of Amsterdam) — 2023-04-03 | Parameters: 1B - License: open | Type: model - AI model by EleutherAI,Booz Allen Hamilton, McLean,University of Cambridge,Indraprastha Institute of Information Technology Delhi,Stability AI,datasaur.ai,University of Amsterdam - **Pythia-1.4b** (EleutherAI,Booz Allen Hamilton, McLean,University of Cambridge,Indraprastha Institute of Information Technology Delhi,Stability AI,datasaur.ai,University of Amsterdam) — 2023-04-03 | Parameters: 1.4B - License: open | Type: model - AI model by EleutherAI,Booz Allen Hamilton, McLean,University of Cambridge,Indraprastha Institute of Information Technology Delhi,Stability AI,datasaur.ai,University of Amsterdam - **Pythia-70m** (EleutherAI,Booz Allen Hamilton, McLean,University of Cambridge,Indraprastha Institute of Information Technology Delhi,Stability AI,datasaur.ai,University of Amsterdam) — 2023-04-03 | Parameters: 70M - License: open | Type: model - AI model by EleutherAI,Booz Allen Hamilton, McLean,University of Cambridge,Indraprastha Institute of Information Technology Delhi,Stability AI,datasaur.ai,University of Amsterdam - **Pythia-410m** (EleutherAI,Booz Allen Hamilton, McLean,University of Cambridge,Indraprastha Institute of Information Technology Delhi,Stability AI,datasaur.ai,University of Amsterdam) — 2023-04-03 | Parameters: 410M - License: open | Type: model - AI model by EleutherAI,Booz Allen Hamilton, McLean,University of Cambridge,Indraprastha Institute of Information Technology Delhi,Stability AI,datasaur.ai,University of Amsterdam - **MahLool** (University of Rochester) — 2023-04-03 - License: closed | Type: model - AI model by University of Rochester - **Titan** (Amazon) — 2023-04-01 | Parameters: Titan - License: open | Type: model - No official information at all. 2nd hand via Jack Clark: https://importai.substack.com/p/import-ai-365-wmd-benchmark-amazon '$65m training run. 
Specifically, they trained a 200B dense model on 4T tokens of data across 13,760 NVIDIA A100 chips (using 1,720 P4d nodes). It took 48 days to train.' - **WizardLM** (Microsoft) — 2023-04-01 | Parameters: WizardLM - License: open | Type: model - LLaMA 7B self-instructed fine-tune. - **MPT** (MosaicML) — 2023-04-01 | Parameters: MPT - License: open | Type: model - More 1B models coming with different datasets. Many more. - **StableLM** (Stability AI) — 2023-04-01 | Parameters: StableLM - License: open | Type: model - Training dataset contains 1.5 trillion tokens, roughly 3x the size of The Pile. These models will be trained on up to 1.5 trillion tokens. The context length for these models is 4096 tokens. - **Dolly 2.0** (Databricks) — 2023-04-01 | Parameters: Dolly 2.0 - License: open | Type: model - Fine-tuned Pythia 12B - **Pythia** (EleutherAI) — 2023-04-01 | Parameters: Pythia - License: open | Type: model - - **Koala-13B** (Berkeley) — 2023-04-01 | Parameters: Koala-13B - License: open | Type: model - LLaMA base. Academic licence only. 
- **STEPS** (McGill University,Mila - Quebec AI (originally Montreal Institute for Learning Algorithms),Baidu Research - Silicon Valley AI Lab,Baidu,Boston Consulting Group X) — 2023-04-01 - License: closed | Type: model - AI model by McGill University,Mila - Quebec AI (originally Montreal Institute for Learning Algorithms),Baidu Research - Silicon Valley AI Lab,Baidu,Boston Consulting Group X - **Yiye Qingzhou** (EFFYIC (识因智能)) — 2023-04-01 | Parameters: 540B - License: closed | Type: model - AI model by EFFYIC (识因智能) - **Vicuna-13B v0** (Large Model Systems Organization,University of California (UC) Berkeley) — 2023-03-30 | Parameters: 13B - License: open | Type: model - AI model by Large Model Systems Organization,University of California (UC) Berkeley - **BloombergGPT** (Bloomberg,Johns Hopkins University) — 2023-03-30 | Parameters: 50.6B - License: closed | Type: model - AI model by Bloomberg,Johns Hopkins University - **Vicuna-7B v0** (Large Model Systems Organization,University of California (UC) Berkeley) — 2023-03-30 | Parameters: 7B - License: open | Type: model - AI model by Large Model Systems Organization,University of California (UC) Berkeley - **VideoMAE V2** (Nanjing University,Shenzhen Institute of Advanced Technology,Shanghai AI Lab) — 2023-03-29 | Parameters: 1B - License: open | Type: model - AI model by Nanjing University,Shenzhen Institute of Advanced Technology,Shanghai AI Lab - **ERNIE-ViLG 2.0** (Baidu,Wuhan University of Science and Technology) — 2023-03-28 | Parameters: 24B - License: closed | Type: model - AI model by Baidu,Wuhan University of Science and Technology - **SigLIP 400M** (Google DeepMind) — 2023-03-27 | Parameters: 400M - License: open | Type: model - AI model by Google DeepMind - **SigLiT** (Google DeepMind) — 2023-03-27 - License: open | Type: model - AI model by Google DeepMind - **EVA-CLIP (EVA-02-CLIP-E/14+)** (Beijing Academy of Artificial Intelligence / BAAI,Huazhong University of Science and Technology) — 2023-03-27 
| Parameters: 5B - License: open | Type: model - AI model by Beijing Academy of Artificial Intelligence / BAAI,Huazhong University of Science and Technology - **BELLE-7B-0.6M** (KE Holdings Inc. (“Beike”)) — 2023-03-26 | Parameters: 7B - License: open | Type: model - AI model by KE Holdings Inc. (“Beike”) - **CPM-Bee** (Tsinghua University,Beijing Academy of Artificial Intelligence / BAAI,ModelBest,OpenBMB (Open Lab for Big Model Base)) — 2023-03-24 | Parameters: 10B - License: open | Type: model - AI model by Tsinghua University,Beijing Academy of Artificial Intelligence / BAAI,ModelBest,OpenBMB (Open Lab for Big Model Base) - **minChatGPT** (Stanford University) — 2023-03-24 | Parameters: 1.6B - License: open | Type: model - AI model by Stanford University - **Lightweight Fine-tuning a Pretrained Protein Language Model for Protein Secondary** (Henan University) — 2023-03-23 - License: closed | Type: model - AI model by Henan University - **Sparse Wide GPT-3 Small** (Cerebras Systems) — 2023-03-21 | Parameters: 1.3B - License: closed | Type: model - AI model by Cerebras Systems - **LightOn Mini** (LightOn) — 2023-03-21 | Parameters: 40B - License: closed | Type: model - AI model by LightOn - **Firefly** (Adobe) — 2023-03-21 - License: closed | Type: model - AI model by Adobe - **Gen-2** (Runway) — 2023-03-20 - License: closed | Type: model - AI model by Runway - **PanGu-Σ** (Huawei Noah's Ark Lab) — 2023-03-20 | Parameters: 1.1T - License: closed | Type: model - AI model by Huawei Noah's Ark Lab - **GPT-4 (Mar 2023)** (OpenAI) — 2023-03-15 | Parameters: 1.8T - License: closed | Type: model - AI model by OpenAI - **Falcon-40B** (Technology Innovation Institute) — 2023-03-15 | Parameters: 40B - License: open | Type: model - AI model by Technology Innovation Institute - **LEP-AD** (King Abdullah University of Science and Technology (KAUST),Karolinska Institute) — 2023-03-15 | Parameters: 3.0B - License: closed | Type: model - AI model by King Abdullah University of 
Science and Technology (KAUST),Karolinska Institute - **RIO** (West Lake Xinchen / Xinchen AI / 西湖心辰(杭州)科技有限公司) — 2023-03-15 | Parameters: 100B - License: closed | Type: model - AI model by West Lake Xinchen / Xinchen AI / 西湖心辰(杭州)科技有限公司 - **GPT-4 (Jun 2023)** (OpenAI) — 2023-03-15 | Parameters: 1.8T - License: closed | Type: model - AI model by OpenAI - **Claude** (Anthropic) — 2023-03-14 - License: closed | Type: model - AI model by Anthropic - **Alpaca** (Stanford University) — 2023-03-13 | Parameters: 7B - License: open | Type: model - AI model by Stanford University - **Pythia-Chat-Base-7B-v0.16** (Together) — 2023-03-08 | Parameters: 7B - License: open | Type: model - AI model by Together - **VALL-E X** (Microsoft) — 2023-03-07 | Parameters: 700M - License: closed | Type: model - AI model by Microsoft - **PaLM-E** (Google,TU Berlin) — 2023-03-06 | Parameters: 562B - License: closed | Type: model - AI model by Google,TU Berlin - **Uni-Mol Molecular Model** (Renmin University of China,DP Technology,AI for Science Institute, Beijing (AISI)) — 2023-03-06 - License: open | Type: model - AI model by Renmin University of China,DP Technology,AI for Science Institute, Beijing (AISI) - **AudioGen** (Meta AI,Hebrew University of Jerusalem) — 2023-03-05 | Parameters: 1B - License: open | Type: model - AI model by Meta AI,Hebrew University of Jerusalem - **Flan UL2** (Google Brain) — 2023-03-03 | Parameters: 19.5B - License: open | Type: model - AI model by Google Brain - **DiT-XL/2** (New York University (NYU),University of California (UC) Berkeley) — 2023-03-02 | Parameters: 675M - License: open | Type: model - AI model by New York University (NYU),University of California (UC) Berkeley - **Consistency Model (CIFAR-10)** (OpenAI) — 2023-03-02 - License: closed | Type: model - AI model by OpenAI - **C1.2** (Character.ai) — 2023-03-01 | Parameters: C1.2 - License: open | Type: model - No details released. 
- **BloombergGPT** (Bloomberg) — 2023-03-01 | Parameters: BloombergGPT - License: closed | Type: model - Video: https://youtu.be/m2Scj2SO85Y Underperforms GPT-3, based on BLOOM. Tokens: 'We select a model size motivated by Hoffmann et al. (2022) and train a 50 billion parameter model on 569 billion tokens from our corpus of over 700 billion tokens to produce a model that is competitive with larger models.' - **OpenFlamingo-9B** (LAION) — 2023-03-01 | Parameters: OpenFlamingo-9B - License: open | Type: model - Uses LLaMA-7B. Demo: https://7164d2142d11.ngrok.app/ - **GPT4All-LoRa** (Nomic) — 2023-03-01 | Parameters: GPT4All-LoRa - License: open | Type: model - Chatbot based on LLaMA, trained on ~800k GPT-3.5-Turbo generations. - **Cerebras-GPT** (Cerebras) — 2023-03-01 | Parameters: Cerebras-GPT - License: open | Type: model - 20:1 tokens to parameters as per https://lifearchitect.ai/chinchilla/ - **PanGu-Sigma** (Huawei) — 2023-03-01 | Parameters: PanGu-Sigma - License: closed | Type: model - Sparse. 1.085T parameters named PanGu-Σ. - **CoLT5** (Google) — 2023-03-01 | Parameters: CoLT5 - License: closed | Type: model - up to 64k context window [48k words or about 96 pages -Alan] - **Med-PaLM 2** (Google DeepMind) — 2023-03-01 | Parameters: Med-PaLM 2 - License: closed | Type: model - Recently, our next iteration, Med-PaLM 2, consistently performed at an “expert” doctor level on medical exam questions, scoring 85%. This is an 18% improvement from Med-PaLM’s previous performance and far surpasses similar AI models. - **GPT-4 Classic (gpt-4-0314 & gpt-4-0613, non-Turbo)** (OpenAI) — 2023-03-01 | Parameters: GPT-4 Classic (gpt-4-0314 & gpt-4-0613, non-Turbo) - License: open | Type: model - Original MMLU=86.4. MMLU=90.1 with prompting. Proto-AGI. 1.76T parameters MoE. 
- **Alpaca** (Stanford) — 2023-03-01 | Parameters: Alpaca - License: open | Type: model - 'Stanford Alpaca: An Instruction-following LLaMA model' - **Jurassic-2** (AI21) — 2023-03-01 | Parameters: Jurassic-2 - License: open | Type: model - - **GPT-NeoX-Chat-Base-20B** (Together) — 2023-03-01 | Parameters: GPT-NeoX-Chat-Base-20B - License: open | Type: model - 'instruction-tuned 20 billion parameter language model, a 6 billion parameter moderation model, and an extensible retrieval system for including up-to-date responses from custom repositories. It was trained on the OIG-43M training dataset, which was a collaboration between Together, LAION, and Ontocord.ai.' - **Cohere Command** (Cohere) — 2023-03-01 | Parameters: 52B - License: closed | Type: model - AI model by Cohere - **Palmyra Large 20B** (Writer) — 2023-03-01 | Parameters: 20B - License: open | Type: model - AI model by Writer - **gpt-sw3-40b** (AI Sweden) — 2023-03-01 | Parameters: 40B - License: open | Type: model - AI model by AI Sweden - **MengziGPT-General-7B** (Langboat) — 2023-03-01 | Parameters: 7B - License: closed | Type: model - AI model by Langboat - **Kosmos-1** (Microsoft) — 2023-03-01 | Parameters: 1.6B - License: closed | Type: model - AI model by Microsoft - **CodeGen-Mono 16.1B** (Salesforce) — 2023-02-27 | Parameters: 16.1B - License: open | Type: model - AI model by Salesforce - **LLaMA-13B** (Meta AI) — 2023-02-27 | Parameters: 13B - License: open | Type: model - AI model by Meta AI - **LLaMA-33B** (Meta AI) — 2023-02-27 | Parameters: 32.5B - License: open | Type: model - AI model by Meta AI - **MsPBRsP** (Zhengzhou University) — 2023-02-27 - License: closed | Type: model - AI model by Zhengzhou University - **LLaMA-65B** (Meta AI) — 2023-02-24 | Parameters: 65.2B - License: open | Type: model - AI model by Meta AI - **LLaMA-7B** (Meta AI) — 2023-02-24 | Parameters: 6.7B - License: open | Type: model - AI model by Meta AI - **Hyena-3-slim** (Stanford University,University of Montreal / 
Université de Montréal,Mila - Quebec AI (originally Montreal Institute for Learning Algorithms)) — 2023-02-21 | Parameters: 125M - License: closed | Type: model - AI model by Stanford University,University of Montreal / Université de Montréal,Mila - Quebec AI (originally Montreal Institute for Learning Algorithms) - **Hyena 1.3B** (Stanford University,University of Montreal / Université de Montréal,Mila - Quebec AI (originally Montreal Institute for Learning Algorithms)) — 2023-02-21 | Parameters: 1.3B - License: closed | Type: model - AI model by Stanford University,University of Montreal / Université de Montréal,Mila - Quebec AI (originally Montreal Institute for Learning Algorithms) - **Hyena-2 355M** (Stanford University,University of Montreal / Université de Montréal,Mila - Quebec AI (originally Montreal Institute for Learning Algorithms)) — 2023-02-21 | Parameters: 355M - License: closed | Type: model - AI model by Stanford University,University of Montreal / Université de Montréal,Mila - Quebec AI (originally Montreal Institute for Learning Algorithms) - **Hyena-2 153M** (Stanford University,University of Montreal / Université de Montréal,Mila - Quebec AI (originally Montreal Institute for Learning Algorithms)) — 2023-02-21 | Parameters: 153M - License: open | Type: model - AI model by Stanford University,University of Montreal / Université de Montréal,Mila - Quebec AI (originally Montreal Institute for Learning Algorithms) - **Anthropic LM 175B** (Anthropic) — 2023-02-15 | Parameters: 175B - License: closed | Type: model - AI model by Anthropic - **BASIC-L + Lion** (Google,University of California Los Angeles (UCLA)) — 2023-02-13 | Parameters: 3.1B - License: closed | Type: model - AI model by Google,University of California Los Angeles (UCLA) - **ViT-22B** (Google) — 2023-02-10 | Parameters: 21.7B - License: closed | Type: model - AI model by Google - **ControlNet** (Stanford University) — 2023-02-10 - License: closed | Type: model - AI model by Stanford 
University - **ProteinDT** (University of California (UC) Berkeley,California Institute of Technology,University of Toronto,University of Wisconsin Madison,Texas A&M,NVIDIA,Mila - Quebec AI (originally Montreal Institute for Learning Algorithms)) — 2023-02-09 - License: open | Type: model - AI model by University of California (UC) Berkeley,California Institute of Technology,University of Toronto,University of Wisconsin Madison,Texas A&M,NVIDIA,Mila - Quebec AI (originally Montreal Institute for Learning Algorithms) - **ToolFormer** (Meta AI) — 2023-02-09 | Parameters: 6.7B - License: closed | Type: model - AI model by Meta AI - **Gen-1** (Runway) — 2023-02-06 - License: closed | Type: model - AI model by Runway - **ProteinSGM** (University of Toronto) — 2023-02-04 - License: closed | Type: model - AI model by University of Toronto - **CD-GraB (WT2)** (Cornell University) — 2023-02-02 - License: closed | Type: model - AI model by Cornell University - **Chemistry42** (Insilico Medicine AI) — 2023-02-02 - License: closed | Type: model - AI model by Insilico Medicine AI - **Kosmos-1** (Microsoft) — 2023-02-01 | Parameters: Kosmos-1 - License: closed | Type: model - Proto-AGI. Multimodal large language model (MLLM). Raven’s Progressive Matrices as real images, not digits as in testing of text-davinci-003 at https://lifearchitect.ai/ravens/ - **LLaMA-65B** (Meta AI) — 2023-02-01 | Parameters: LLaMA-65B - License: open | Type: model - Researchers only, noncommercial only. 'LLaMA-65B is competitive with the best models, Chinchilla70B and PaLM-540B.' 
- **MOSS** (Fudan University) — 2023-02-01 | Parameters: MOSS - License: open | Type: model - Major bandwidth issues: https://www.reuters.com/technology/china-fudan-university-team-apologises-after-chatgpt-style-platform-crashes-2023-02-21/ - **Palmyra** (Writer) — 2023-02-01 | Parameters: Palmyra - License: open | Type: model - Only up to 5B available open-source 'trained on over 300 billion tokens of text data, and the size of the resulting model is over 20 billion parameters. ' https://writer.com/product/cowrite/ - **Luminous Supreme Control** (Aleph Alpha) — 2023-02-01 | Parameters: Luminous Supreme Control - License: open | Type: model - ‘Control’ means instruction tuned - **Toolformer+Atlas 11B+NLLB 54B** (Meta AI) — 2023-02-01 | Parameters: Toolformer+Atlas 11B+NLLB 54B - License: closed | Type: model - Based on GPT-J 6.7B + access to other models via API - **Multimodal-CoT** (Amazon) — 2023-02-01 | Parameters: Multimodal-CoT - License: open | Type: model - Models <1B with vision CoT - **UniPi** (Google DeepMind,Massachusetts Institute of Technology (MIT),University of California (UC) Berkeley,Georgia Institute of Technology,University of Alberta) — 2023-01-31 - License: open | Type: model - AI model by Google DeepMind,Massachusetts Institute of Technology (MIT),University of California (UC) Berkeley,Georgia Institute of Technology,University of Alberta - **Flan T5-XXL + BLIP-2** (Salesforce Research) — 2023-01-30 | Parameters: 12.1B - License: open | Type: model - AI model by Salesforce Research - **BLIP-2 (Q-Former)** (Salesforce Research) — 2023-01-30 | Parameters: 1.5B - License: open | Type: model - AI model by Salesforce Research - **KeAP** (The University of Hong Kong,ByteDance,JancsiTech,OPPO HealthLab) — 2023-01-30 - License: closed | Type: model - AI model by The University of Hong Kong,ByteDance,JancsiTech,OPPO HealthLab - **Genie-SCOPe (bio)** (Columbia University) — 2023-01-29 | Parameters: 4.1M - License: open | Type: model - AI model by 
Columbia University - **RESP AI** (University of California San Diego) — 2023-01-28 - License: open | Type: model - AI model by University of California San Diego - **Protst** (Mila - Quebec AI (originally Montreal Institute for Learning Algorithms),University of Montreal / Université de Montréal,Intel Labs,HEC Montreal,CIFAR AI Research) — 2023-01-28 - License: closed | Type: model - AI model by Mila - Quebec AI (originally Montreal Institute for Learning Algorithms),University of Montreal / Université de Montréal,Intel Labs,HEC Montreal,CIFAR AI Research - **DDPM-IP (CelebA)** (Utrecht University) — 2023-01-27 | Parameters: 295M - License: open | Type: model - AI model by Utrecht University - **MusicLM** (Google) — 2023-01-26 | Parameters: 860M - License: closed | Type: model - AI model by Google - **MoLFormer-XL** (IBM) — 2023-01-25 - License: open | Type: model - AI model by IBM - **GPT-2+Active-SGD (WT2)** (University of Montreal / Université de Montréal) — 2023-01-24 | Parameters: 124M - License: closed | Type: model - AI model by University of Montreal / Université de Montréal - **mini-GPT-2+Active-AdamW** (HEC Montreal) — 2023-01-24 - License: closed | Type: model - AI model by HEC Montreal - **Operator** (OpenAI) — 2023-01-23 - License: closed | Type: model - AI model by OpenAI - **Adaptive Agent** (DeepMind) — 2023-01-18 | Parameters: 533M - License: closed | Type: model - AI model by DeepMind - **Ankh_large** (Technical University of Munich,Columbia University) — 2023-01-16 | Parameters: 1.9B - License: open | Type: model - AI model by Technical University of Munich,Columbia University - **Ankh_base** (Technical University of Munich,Columbia University) — 2023-01-16 | Parameters: 740M - License: open | Type: model - AI model by Technical University of Munich,Columbia University - **Eleven Monolingual v1** (ElevenLabs) — 2023-01-16 - License: closed | Type: model - AI model by ElevenLabs - **Nucleotide Transformer** (NVIDIA,Technical University of 
Munich,InstaDeep) — 2023-01-15 | Parameters: 2.5B - License: open | Type: model - AI model by NVIDIA,Technical University of Munich,InstaDeep - **DreamerV3** (DeepMind,University of Toronto) — 2023-01-10 | Parameters: 200M - License: closed | Type: model - AI model by DeepMind,University of Toronto - **SantaCoder** (Hugging Face,ServiceNow,Massachusetts Institute of Technology (MIT),Wellesley College,Saama,EleutherAI,Huawei Noah's Ark Lab,Carnegie Mellon University (CMU)) — 2023-01-09 | Parameters: 1.1B - License: open | Type: model - AI model by Hugging Face,ServiceNow,Massachusetts Institute of Technology (MIT),Wellesley College,Saama,EleutherAI,Huawei Noah's Ark Lab,Carnegie Mellon University (CMU) - **VALL-E** (Microsoft) — 2023-01-05 | Parameters: 353M - License: closed | Type: model - AI model by Microsoft - **SparseOPT-175B** (Institute of Science and Technology Austria (ISTA),Neural Magic) — 2023-01-02 | Parameters: 87.5B - License: closed | Type: model - AI model by Institute of Science and Technology Austria (ISTA),Neural Magic - **SparseOPT-66B** (Institute of Science and Technology Austria (ISTA)) — 2023-01-02 | Parameters: 66B - License: closed | Type: model - AI model by Institute of Science and Technology Austria (ISTA) - **SparseOPT-13B** (Institute of Science and Technology Austria (ISTA)) — 2023-01-02 | Parameters: 13B - License: closed | Type: model - AI model by Institute of Science and Technology Austria (ISTA) - **SparseOPT-30B** (Institute of Science and Technology Austria (ISTA)) — 2023-01-02 | Parameters: 30B - License: closed | Type: model - AI model by Institute of Science and Technology Austria (ISTA) - **DNA Fine-Tuned Language Model (DFLM)** (Tongji University) — 2023-01-02 - License: closed | Type: model - AI model by Tongji University - **FLAME** (Microsoft) — 2023-01-01 | Parameters: FLAME - License: closed | Type: model - T5 for Excel formulas, very small 60M params, "We start from a dataset of 927M formulas" estimate 10x 
multiplier for 9B tokens - **Hybrid H3-2.7B** (Stanford University,University at Buffalo) — 2022-12-28 | Parameters: 2.7B - License: open | Type: model - AI model by Stanford University,University at Buffalo - **Hybrid H3-125M** (Stanford University,University at Buffalo) — 2022-12-28 | Parameters: 125M - License: open | Type: model - AI model by Stanford University,University at Buffalo - **Hybrid H3-355M** (Stanford University,University at Buffalo) — 2022-12-28 | Parameters: 355M - License: open | Type: model - AI model by Stanford University,University at Buffalo - **Hybrid H3-1.3B** (Stanford University,University at Buffalo) — 2022-12-28 | Parameters: 1.3B - License: open | Type: model - AI model by Stanford University,University at Buffalo - **OPT-IML (175B)** (Meta AI) — 2022-12-22 | Parameters: 175B - License: open | Type: model - AI model by Meta AI - **CaLM** (University of Oxford) — 2022-12-19 | Parameters: 86M - License: closed | Type: model - AI model by University of Oxford - **text-embedding-ada-002** (OpenAI) — 2022-12-15 - License: closed | Type: model - AI model by OpenAI - **RT-1** (Google) — 2022-12-13 | Parameters: 35M - License: open | Type: model - AI model by Google - **XiaoAI 6.0** (Xiaomi Corp) — 2022-12-11 - License: closed | Type: model - AI model by Xiaomi Corp - **TranceptEve** (University of Oxford,Harvard Medical School) — 2022-12-10 - License: closed | Type: model - AI model by University of Oxford,Harvard Medical School - **Stable Diffusion 2.1** (Stability AI) — 2022-12-07 - License: open | Type: model - AI model by Stability AI - **Whisper v2** (OpenAI) — 2022-12-05 | Parameters: 1.6B - License: open | Type: model - AI model by OpenAI - **Vega v2** (Wuhan University,JD Explore Academy,Shanghai AI Lab,Nanyang Technological University,Washington University in St Louis,Chongqing University of Posts and Telecommunications,University of Sydney) — 2022-12-04 | Parameters: 6B - License: closed | Type: model - AI model by Wuhan 
University,JD Explore Academy,Shanghai AI Lab,Nanyang Technological University,Washington University in St Louis,Chongqing University of Posts and Telecommunications,University of Sydney - **Med-PaLM 1** (Google DeepMind) — 2022-12-01 | Parameters: Med-PaLM 1 - License: closed | Type: model - Collaboration between Google and DeepMind. Makes 1% fewer errors than humans. - **OPT-IML** (Meta AI) — 2022-12-01 | Parameters: OPT-IML - License: open | Type: model - Instruct - **RL-CAI** (Anthropic) — 2022-12-01 | Parameters: RL-CAI - License: closed | Type: model - RLAIF=reinforcement learning with AI feedback - **ERNIE-Code** (Baidu) — 2022-12-01 | Parameters: ERNIE-Code - License: open | Type: model - - **RT-1** (Google) — 2022-12-01 | Parameters: RT-1 - License: closed | Type: model - - **Transformer + GFM** (Nanjing University) — 2022-12-01 | Parameters: 185.2M - License: closed | Type: model - AI model by Nanjing University - **ZymCTRL** (Basecamp Research,Friedrich-Alexander-Universität,University of Girona) — 2022-12-01 | Parameters: 738M - License: open | Type: model - AI model by Basecamp Research,Friedrich-Alexander-Universität,University of Girona - **DeepNash** (DeepMind) — 2022-12-01 - License: closed | Type: model - AI model by DeepMind - **ALM 1.0** (Beijing Academy of Artificial Intelligence / BAAI) — 2022-11-28 | Parameters: 335M - License: closed | Type: model - AI model by Beijing Academy of Artificial Intelligence / BAAI - **DiT-XL/2 + Discriminator Guidance** (Korea Advanced Institute of Science and Technology (KAIST),NAVER) — 2022-11-28 - License: closed | Type: model - AI model by Korea Advanced Institute of Science and Technology (KAIST),NAVER - **Discriminator Guidance** (Korea Advanced Institute of Science and Technology (KAIST),NAVER) — 2022-11-28 - License: open | Type: model - AI model by Korea Advanced Institute of Science and Technology (KAIST),NAVER - **CICERO** (Meta AI) — 2022-11-22 - License: open | Type: model - AI model by Meta AI - **CLUE** 
(Naver Clova,Naver AI Lab) — 2022-11-22 | Parameters: 160M - License: closed | Type: model - AI model by Naver Clova,Naver AI Lab - **scFormer** (University of Toronto,Vector Institute,University Health Network,Microsoft Research) — 2022-11-22 - License: closed | Type: model - AI model by University of Toronto,Vector Institute,University Health Network,Microsoft Research - **AR-LDM** (Alibaba,University of Waterloo,Vector Institute) — 2022-11-20 | Parameters: 1.5B - License: closed | Type: model - AI model by Alibaba,University of Waterloo,Vector Institute - **Fusion in Encoder** (Samsung) — 2022-11-18 | Parameters: 330M - License: closed | Type: model - AI model by Samsung - **Galactica** (Meta AI) — 2022-11-16 | Parameters: 120B - License: open | Type: model - AI model by Meta AI - **Luminous Sparse** (Aleph Alpha,Graphcore) — 2022-11-16 | Parameters: 2.6B - License: closed | Type: model - AI model by Aleph Alpha,Graphcore - **EVA-01** (Beijing Academy of Artificial Intelligence / BAAI,Huazhong University of Science and Technology,Zhejiang University (ZJU),Beijing Institute of Technology) — 2022-11-14 | Parameters: 1.0B - License: open | Type: model - AI model by Beijing Academy of Artificial Intelligence / BAAI,Huazhong University of Science and Technology,Zhejiang University (ZJU),Beijing Institute of Technology - **AltCLIP_M9** (Beijing Academy of Artificial Intelligence / BAAI) — 2022-11-12 - License: open | Type: model - AI model by Beijing Academy of Artificial Intelligence / BAAI - **Innovative Drug-like Molecule Generation from** (University of Pittsburgh,Carnegie Mellon University (CMU)) — 2022-11-12 - License: closed | Type: model - AI model by University of Pittsburgh,Carnegie Mellon University (CMU) - **InternImage** (Shanghai AI Lab,Tsinghua University,Nanjing University,SenseTime,Chinese University of Hong Kong (CUHK)) — 2022-11-10 | Parameters: 1.1B - License: open | Type: model - AI model by Shanghai AI Lab,Tsinghua University,Nanjing 
University,SenseTime,Chinese University of Hong Kong (CUHK) - **Mogrifier RLSTM (WT2)** (DeepMind) — 2022-11-03 | Parameters: 35M - License: closed | Type: model - AI model by DeepMind - **Mogrifier RLSTM (PTB)** (DeepMind) — 2022-11-03 | Parameters: 24M - License: closed | Type: model - AI model by DeepMind - **BLOOMZ-176B** (Hugging Face) — 2022-11-03 | Parameters: 176B - License: open | Type: model - AI model by Hugging Face - **mT0-13B** (Hugging Face,BigScience) — 2022-11-03 | Parameters: 13B - License: open | Type: model - AI model by Hugging Face,BigScience - **eDiff-I** (NVIDIA) — 2022-11-02 | Parameters: 9.1B - License: closed | Type: model - AI model by NVIDIA - **ChatGPT (gpt-3.5-turbo)** (OpenAI) — 2022-11-01 | Parameters: ChatGPT (gpt-3.5-turbo) - License: open | Type: model - Instruct with strict policies ("extremely limited") - **text-davinci-003** (OpenAI) — 2022-11-01 | Parameters: text-davinci-003 - License: open | Type: model - - **GPT-JT** (Together) — 2022-11-01 | Parameters: GPT-JT - License: open | Type: model - - **RWKV-4** (RWKV) — 2022-11-01 | Parameters: RWKV-4 - License: open | Type: model - RWKV (pronounced RwaKuv) is an RNN: https://www.reddit.com/r/MachineLearning/comments/yxt8sa/r_rwkv4_7b_release_an_attentionfree_rnn_language/ - **Galactica** (Meta AI) — 2022-11-01 | Parameters: Galactica - License: open | Type: model - scientific only - **SED** (DeepMind) — 2022-11-01 | Parameters: SED - License: closed | Type: model - SED 420M (diffusion text model) - **mT0** (BigScience) — 2022-11-01 | Parameters: mT0 - License: open | Type: model - fine-tuned - **BLOOMZ** (BigScience) — 2022-11-01 | Parameters: BLOOMZ - License: open | Type: model - fine-tuned - **GearNet** (Mila - Quebec AI (originally Montreal Institute for Learning Algorithms),University of Montreal / Université de Montréal,University of Cambridge,IBM Research,HEC Montreal,CIFAR AI Research) — 2022-11-01 - License: closed | Type: model - AI model by Mila - Quebec AI 
(originally Montreal Institute for Learning Algorithms),University of Montreal / Université de Montréal,University of Cambridge,IBM Research,HEC Montreal,CIFAR AI Research - **Taiyi-Stable Diffusion** (IDEA CCNL) — 2022-10-31 | Parameters: 1B - License: open | Type: model - AI model by IDEA CCNL - **Transformer-XL + PowerSGD + L-Greco** (Institute of Science and Technology Austria (ISTA),Neural Magic) — 2022-10-31 - License: closed | Type: model - AI model by Institute of Science and Technology Austria (ISTA),Neural Magic - **MIF-ST** (Microsoft Research,OpenBioML,University of Chicago) — 2022-10-26 - License: closed | Type: model - AI model by Microsoft Research,OpenBioML,University of Chicago - **XY-LENTXL** (Microsoft) — 2022-10-26 | Parameters: 2B - License: closed | Type: model - AI model by Microsoft - **Verbatim Memory Transformer (117M)** (Johns Hopkins University,New York University (NYU)) — 2022-10-24 | Parameters: 117M - License: closed | Type: model - AI model by Johns Hopkins University,New York University (NYU) - **Verbatim Memory Transformer (108M)** (Johns Hopkins University,New York University (NYU)) — 2022-10-24 | Parameters: 107.7M - License: closed | Type: model - AI model by Johns Hopkins University,New York University (NYU) - **EnCodec** (Meta AI) — 2022-10-24 - License: open | Type: model - AI model by Meta AI - **Tk-Instruct** (University of Washington,Arizona State University,Allen Institute for AI) — 2022-10-24 | Parameters: 11B - License: open | Type: model - AI model by University of Washington,Arizona State University,Allen Institute for AI - **DiffSBDD (CrossDocked)** (École Polytechnique Fédérale de Lausanne (EPFL),University of Cambridge,Cornell University,Chinese Academy of Mathematics and System Science,University of Rome,Microsoft Research,University of Oxford,AITHYRA Institute) — 2022-10-24 - License: open | Type: model - AI model by École Polytechnique Fédérale de Lausanne (EPFL),University of Cambridge,Cornell
University,Chinese Academy of Mathematics and System Science,University of Rome,Microsoft Research,University of Oxford,AITHYRA Institute - **U-PaLM (540B)** (Google) — 2022-10-20 | Parameters: 540B - License: closed | Type: model - AI model by Google - **LMSI-Palm** (Google,University of Illinois Urbana-Champaign (UIUC)) — 2022-10-20 | Parameters: 540B - License: closed | Type: model - AI model by Google,University of Illinois Urbana-Champaign (UIUC) - **Flan-PaLM 540B** (Google) — 2022-10-20 | Parameters: 540B - License: closed | Type: model - AI model by Google - **Flan-T5 11B** (Google) — 2022-10-20 | Parameters: 11B - License: open | Type: model - AI model by Google - **GPT-2 + Progressive LRD** (Huawei,Huawei Noah's Ark Lab) — 2022-10-12 | Parameters: 31M - License: closed | Type: model - AI model by Huawei,Huawei Noah's Ark Lab - **GenSLM** (University of Chicago,NVIDIA,Harvard University,Cerebras Systems,Technical University of Munich,California Institute of Technology) — 2022-10-11 | Parameters: 25B - License: open | Type: model - AI model by University of Chicago,NVIDIA,Harvard University,Cerebras Systems,Technical University of Munich,California Institute of Technology - **Instruct-GPT + Mind's Eye** (Google,Dartmouth College) — 2022-10-11 | Parameters: 176.5B - License: closed | Type: model - AI model by Google,Dartmouth College - **Diplodocus** (Meta AI,Massachusetts Institute of Technology (MIT)) — 2022-10-11 - License: open | Type: model - AI model by Meta AI,Massachusetts Institute of Technology (MIT) - **Decaying Fast Weights Transformer (WT-103)** (Jenni) — 2022-10-09 | Parameters: 242M - License: closed | Type: model - AI model by Jenni - **Imagen Video** (Google Brain) — 2022-10-05 | Parameters: 11.6B - License: closed | Type: model - AI model by Google Brain - **AlphaTensor** (DeepMind) — 2022-10-05 - License: closed | Type: model - AI model by DeepMind - **DiffDock** (Massachusetts Institute of Technology (MIT)) — 2022-10-04 | Parameters: 
20.2M - License: open | Type: model - AI model by Massachusetts Institute of Technology (MIT) - **NMST+GPT-2** (New York University (NYU)) — 2022-10-03 | Parameters: 124M - License: closed | Type: model - AI model by New York University (NYU) - **AminoBert** (Harvard Medical School,Nabla Bio,Columbia University) — 2022-10-03 - License: closed | Type: model - AI model by Harvard Medical School,Nabla Bio,Columbia University - **PACT** (Microsoft) — 2022-10-01 | Parameters: PACT - License: open | Type: model - Trained on ~5TB data, 2GB model download. 'In general we see an improvement in model performance as we increase the number of training tokens. Interestingly, larger models did not necessarily result in better performance for robot navigation. Even though larger models consistently presented better loss values for action prediction on a static dataset, (Fig. 7 b), when it comes to real-time deployment the larger network capacity introduces inference delays that become a disadvantage and lead to earlier crashes. For example, while LiDAR perception measurements arrive to the vehicle every 0.077s (13Hz), the largest model of 24 layers takes on average 0.023s for inference with a RTX3090 GPU, roughly 40% longer the 3 layer model (0.016s). These time differences can amount to even larger performance gaps in small embedded systems, and further emphasize the importance of multiple downstream task architectures sharing a common representation branch for real-time robotics applications.' 
- **Flan-T5** (Google) — 2022-10-01 | Parameters: Flan-T5 - License: open | Type: model - T5=1T tokens + LM-adapted T5 as 100B tokens - **Flan-PaLM** (Google) — 2022-10-01 | Parameters: Flan-PaLM - License: closed | Type: model - - **U-PaLM** (Google) — 2022-10-01 | Parameters: U-PaLM - License: closed | Type: model - - **VIMA** (NVIDIA) — 2022-10-01 | Parameters: VIMA - License: open | Type: model - - **GemNet-OC** (Technical University of Munich,Carnegie Mellon University (CMU),Facebook AI Research) — 2022-09-30 - License: open | Type: model - AI model by Technical University of Munich,Carnegie Mellon University (CMU),Facebook AI Research - **Make-A-Video** (Meta AI) — 2022-09-29 - License: closed | Type: model - AI model by Meta AI - **Sparrow** (DeepMind) — 2022-09-28 | Parameters: 70B - License: closed | Type: model - AI model by DeepMind - **CPM-Ant** (Tsinghua University,Beijing Academy of Artificial Intelligence / BAAI,ModelBest,OpenBMB (Open Lab for Big Model Base)) — 2022-09-22 | Parameters: 10B - License: open | Type: model - AI model by Tsinghua University,Beijing Academy of Artificial Intelligence / BAAI,ModelBest,OpenBMB (Open Lab for Big Model Base) - **Whisper** (OpenAI) — 2022-09-21 | Parameters: 1.6B - License: open | Type: model - AI model by OpenAI - **DistilProtBert** (Bar-Ilan University) — 2022-09-18 | Parameters: 230M - License: open | Type: model - AI model by Bar-Ilan University - **CPAC** (Texas A&M) — 2022-09-18 - License: closed | Type: model - AI model by Texas A&M - **CLIP ViT-H/14 - LAION-2B** (LAION) — 2022-09-15 | Parameters: 986M - License: open | Type: model - AI model by LAION - **NeMO Megatron GPT 20B** (NVIDIA) — 2022-09-15 | Parameters: 20B - License: open | Type: model - AI model by NVIDIA - **ProteinMPNN** (University of Washington,Wageningen University and Research,NERSC, Lawrence Berkeley National Laboratory) — 2022-09-15 - License: open | Type: model - AI model by University of Washington,Wageningen University and 
Research,NERSC, Lawrence Berkeley National Laboratory - **PaLI** (Google) — 2022-09-14 | Parameters: 16.9B - License: closed | Type: model - AI model by Google - **Deep-LDA** (Fudan University) — 2022-09-14 - License: closed | Type: model - AI model by Fudan University - **SauTech** (Saudi Data and Artificial Intelligence Authority,Saudi Company for Artificial Intelligence) — 2022-09-14 - License: closed | Type: model - AI model by Saudi Data and Artificial Intelligence Authority,Saudi Company for Artificial Intelligence - **OpenChat** (Tsinghua) — 2022-09-01 | Parameters: OpenChat - License: open | Type: model - Llama 2 13B -> OpenChat 13B - **WeLM** (Wechat) — 2022-09-01 | Parameters: WeLM - License: open | Type: model - 13% English tokens and 87% Chinese - **CodeGeeX** (Tsinghua) — 2022-09-01 | Parameters: CodeGeeX - License: open | Type: model - - **Sparrow** (DeepMind) — 2022-09-01 | Parameters: Sparrow - License: closed | Type: model - Chatbot as a fine-tuned version of Chinchilla 70B - **PaLI** (Google) — 2022-09-01 | Parameters: PaLI - License: closed | Type: model - PaLM Vision model, new datasets of 10B multilingual text-image pairs - **NeMo Megatron-GPT 20B** (NVIDIA) — 2022-09-01 | Parameters: NeMo Megatron-GPT 20B - License: open | Type: model - - **BEIT-3** (Microsoft) — 2022-08-22 | Parameters: 1.9B - License: open | Type: model - AI model by Microsoft - **Stable Diffusion 1.5** (Runway) — 2022-08-22 - License: open | Type: model - AI model by Runway - **Stable Diffusion 1.2** (Ludwig Maximilian University of Munich) — 2022-08-22 - License: open | Type: model - AI model by Ludwig Maximilian University of Munich - **Stable Diffusion 1.4** (Ludwig Maximilian University of Munich) — 2022-08-22 - License: open | Type: model - AI model by Ludwig Maximilian University of Munich - **Stable Diffusion 1.1** (Ludwig Maximilian University of Munich) — 2022-08-22 - License: open | Type: model - AI model by Ludwig Maximilian University of Munich - **PaLM-SayCan** 
(Google) — 2022-08-16 | Parameters: 540B - License: closed | Type: model - AI model by Google - **Luminous-supreme** (Aleph Alpha) — 2022-08-15 | Parameters: 70B - License: closed | Type: model - AI model by Aleph Alpha - **Luminous-extended** (Aleph Alpha) — 2022-08-15 | Parameters: 30B - License: closed | Type: model - AI model by Aleph Alpha - **Luminous-base** (Aleph Alpha) — 2022-08-15 | Parameters: 13B - License: closed | Type: model - AI model by Aleph Alpha - **Dream Diary (造梦日记)** (West Lake Xinchen / Xinchen AI / 西湖心辰(杭州)科技有限公司) — 2022-08-15 - License: closed | Type: model - AI model by West Lake Xinchen / Xinchen AI / 西湖心辰(杭州)科技有限公司 - **PeTriBERT** (University of Montpellier,BionomeeX) — 2022-08-13 | Parameters: 40M - License: closed | Type: model - AI model by University of Montpellier,BionomeeX - **M3GNet** (University of California San Diego) — 2022-08-11 - License: open | Type: model - AI model by University of California San Diego - **BlenderBot 3** (McGill University,Meta AI,Mila - Quebec AI (originally Montreal Institute for Learning Algorithms)) — 2022-08-10 | Parameters: 175B - License: open | Type: model - AI model by McGill University,Meta AI,Mila - Quebec AI (originally Montreal Institute for Learning Algorithms) - **RNA-FM** (Chinese University of Hong Kong (CUHK),Fudan University,Shanghai AI Lab,Harbin Institute of Technology,University of Electronic Science and Technology of China,Massachusetts Institute of Technology (MIT),Harvard University,Shanghai Zelixir Biotech,CUHK Shenzhen Research Institute) — 2022-08-08 - License: open | Type: model - AI model by Chinese University of Hong Kong (CUHK),Fudan University,Shanghai AI Lab,Harbin Institute of Technology,University of Electronic Science and Technology of China,Massachusetts Institute of Technology (MIT),Harvard University,Shanghai Zelixir Biotech,CUHK Shenzhen Research Institute - **FastSpeech 2** (Zhejiang University (ZJU),Microsoft Research Asia) — 2022-08-08 | Parameters: 27M - 
License: closed | Type: model - AI model by Zhejiang University (ZJU),Microsoft Research Asia - **SGPT BE 5.8B** (Peking University) — 2022-08-05 | Parameters: 5.8B - License: open | Type: model - AI model by Peking University - **GLM-130B** (Tsinghua University) — 2022-08-04 | Parameters: 130B - License: open | Type: model - AI model by Tsinghua University - **AlexaTM 20B** (Amazon) — 2022-08-02 | Parameters: 19.8B - License: closed | Type: model - AI model by Amazon - **Z-Code++** (Microsoft) — 2022-08-01 | Parameters: Z-Code++ - License: closed | Type: model - abstractive text summarization, 710M, outperforms PaLM 540B. "Due to the limited computational resource, Z-Code++LARGE is trained with only 500B tokens instead of 1T tokens as that for mT5 training." - **Atlas** (Meta AI) — 2022-08-01 | Parameters: Atlas - License: open | Type: model - - **BlenderBot 3** (Meta AI) — 2022-08-01 | Parameters: BlenderBot 3 - License: open | Type: model - - **GLM-130B** (Tsinghua) — 2022-08-01 | Parameters: GLM-130B - License: open | Type: model - 50% English (200B tokens), so included here - **AlexaTM 20B** (Amazon) — 2022-08-01 | Parameters: AlexaTM 20B - License: open | Type: model - Wikipedia and mC4 only. 
seq2seq - **ProtGPT2** (University of Bayreuth) — 2022-07-27 | Parameters: 738M - License: open | Type: model - AI model by University of Bayreuth - **GPT-NeoX-Japanese** (Abeja) — 2022-07-27 | Parameters: 2.7B - License: open | Type: model - AI model by Abeja - **OmegaPLM** (Massachusetts Institute of Technology (MIT),Westlake University) — 2022-07-22 | Parameters: 670M - License: closed | Type: model - AI model by Massachusetts Institute of Technology (MIT),Westlake University - **ESM2-15B** (Meta AI,New York University (NYU),Stanford University,Massachusetts Institute of Technology (MIT)) — 2022-07-21 | Parameters: 15B - License: open | Type: model - AI model by Meta AI,New York University (NYU),Stanford University,Massachusetts Institute of Technology (MIT) - **ESM2-3B** (Meta AI,New York University (NYU),Stanford University,Massachusetts Institute of Technology (MIT)) — 2022-07-21 | Parameters: 3B - License: open | Type: model - AI model by Meta AI,New York University (NYU),Stanford University,Massachusetts Institute of Technology (MIT) - **ESM2-650M** (Meta AI,New York University (NYU),Stanford University,Massachusetts Institute of Technology (MIT)) — 2022-07-21 | Parameters: 650M - License: open | Type: model - AI model by Meta AI,New York University (NYU),Stanford University,Massachusetts Institute of Technology (MIT) - **ESM2-150M** (Meta AI,New York University (NYU),Stanford University,Massachusetts Institute of Technology (MIT)) — 2022-07-21 | Parameters: 150M - License: open | Type: model - AI model by Meta AI,New York University (NYU),Stanford University,Massachusetts Institute of Technology (MIT) - **ESM2-35M** (Meta AI,New York University (NYU),Stanford University,Massachusetts Institute of Technology (MIT)) — 2022-07-21 | Parameters: 35M - License: open | Type: model - AI model by Meta AI,New York University (NYU),Stanford University,Massachusetts Institute of Technology (MIT) - **ESM2-8M** (Meta AI,New York University (NYU),Stanford 
University,Massachusetts Institute of Technology (MIT)) — 2022-07-21 | Parameters: 8M - License: open | Type: model - AI model by Meta AI,New York University (NYU),Stanford University,Massachusetts Institute of Technology (MIT) - **YuYan 11B** (Hong Kong Baptist University,NetEase) — 2022-07-15 | Parameters: 11B - License: closed | Type: model - AI model by Hong Kong Baptist University,NetEase - **Transformer-XL + RMT** (Moscow Institute of Physics and Technology,AIRI Artificial Intelligence Research Institute) — 2022-07-14 | Parameters: 247.0M - License: closed | Type: model - AI model by Moscow Institute of Physics and Technology,AIRI Artificial Intelligence Research Institute - **Rita-XLarge** (LightOn,Harvard University,University of Oxford) — 2022-07-14 | Parameters: 1.2B - License: open | Type: model - AI model by LightOn,Harvard University,University of Oxford - **Delphi** (Allen Institute for AI,University of Washington) — 2022-07-12 | Parameters: 11B - License: closed | Type: model - AI model by Allen Institute for AI,University of Washington - **BLOOM-176B** (Hugging Face,BigScience) — 2022-07-11 | Parameters: 176.2B - License: open | Type: model - AI model by Hugging Face,BigScience - **NLLB** (Meta AI) — 2022-07-06 | Parameters: 54.5B - License: open | Type: model - AI model by Meta AI - **BLOOM-7.1B** (Hugging Face,BigScience) — 2022-07-05 | Parameters: 7.1B - License: open | Type: model - AI model by Hugging Face,BigScience - **CodeT5-large** (Salesforce) — 2022-07-05 | Parameters: 770M - License: open | Type: model - AI model by Salesforce - **BLOOM-1.7B** (Hugging Face,BigScience) — 2022-07-05 | Parameters: 1.7B - License: open | Type: model - AI model by Hugging Face,BigScience - **BLOOM-560M** (Hugging Face,BigScience) — 2022-07-05 | Parameters: 560M - License: open | Type: model - AI model by Hugging Face,BigScience - **BLOOM-1B** (Hugging Face,BigScience) — 2022-07-05 | Parameters: 1B - License: open | Type: model - AI model by Hugging 
Face,BigScience - **BLOOM-3B** (Hugging Face,BigScience) — 2022-07-05 | Parameters: 3B - License: open | Type: model - AI model by Hugging Face,BigScience - **6.9B FIM** (OpenAI) — 2022-07-01 | Parameters: 6.9B FIM - License: closed | Type: model - Several models: 8 sizes, NLP, Code, FIM/non-FIM. 100B tokens for 6.9B params... beyond Chinchilla - **‘monorepo-Transformer’** (Google) — 2022-07-01 | Parameters: ‘monorepo-Transformer’ - License: closed | Type: model - Unnamed. Writes >3% of internal Google code. - **PanGu-Coder** (Huawei) — 2022-07-01 | Parameters: PanGu-Coder - License: closed | Type: model - Python via GitHub - **NLLB** (Meta AI) — 2022-07-01 | Parameters: NLLB - License: open | Type: model - 54.5B MoE, 3.3B dense. 200+ languages - **J-1 RBG** (AI21) — 2022-07-01 | Parameters: J-1 RBG - License: open | Type: model - J-1 fine-tuned with RBG law corpus - **BLOOM (tr11-176B-ml)** (BigScience) — 2022-07-01 | Parameters: BLOOM (tr11-176B-ml) - License: open | Type: model - - **WebGPT** (OpenAI) — 2022-07-01 | Parameters: 175B - License: closed | Type: model - AI model by OpenAI - **Drahim PFM AI** (Drahim) — 2022-06-30 - License: closed | Type: model - AI model by Drahim - **Minerva (540B)** (Google) — 2022-06-29 | Parameters: 540.4B - License: closed | Type: model - AI model by Google - **DALL-E mega** (Craiyon) — 2022-06-28 - License: open | Type: model - AI model by Craiyon - **ProGen2-xlarge** (Salesforce Research,Columbia University,Johns Hopkins University) — 2022-06-27 | Parameters: 6.4B - License: open | Type: model - AI model by Salesforce Research,Columbia University,Johns Hopkins University - **ProGen2-base** (Salesforce Research,Columbia University,Johns Hopkins University) — 2022-06-27 | Parameters: 764M - License: open | Type: model - AI model by Salesforce Research,Columbia University,Johns Hopkins University - **GPT-SW3** (AI Sweden,RISE) — 2022-06-25 | Parameters: 3.5B - License: closed | Type: model - AI model by AI Sweden,RISE -
**CodeWhisperer** (Amazon) — 2022-06-24 - License: closed | Type: model - AI model by Amazon - **YaLM** (Yandex) — 2022-06-23 | Parameters: 100B - License: open | Type: model - AI model by Yandex - **Parti** (Google Research) — 2022-06-22 | Parameters: 20B - License: closed | Type: model - AI model by Google Research - **CodeGeeX** (Z.ai (Zhipu AI),Tsinghua University) — 2022-06-22 | Parameters: 13B - License: open | Type: model - AI model by Z.ai (Zhipu AI),Tsinghua University - **OPT-2.7B (finetuned on PTB)** (Meta AI) — 2022-06-21 | Parameters: 2.7B - License: open | Type: model - AI model by Meta AI - **OPT-1.3B** (Meta AI) — 2022-06-21 | Parameters: 1.3B - License: open | Type: model - AI model by Meta AI - **OPT-1.3B (finetuned on PTB)** (Meta AI) — 2022-06-21 | Parameters: 1.3B - License: open | Type: model - AI model by Meta AI - **OPT-2.7B (finetuned on WT2)** (Meta AI) — 2022-06-21 | Parameters: 2.7B - License: open | Type: model - AI model by Meta AI - **OPT-125M (finetuned)** (Meta AI) — 2022-06-21 | Parameters: 125M - License: open | Type: model - AI model by Meta AI - **OPT-6.7B** (Meta AI) — 2022-06-21 | Parameters: 6.7B - License: open | Type: model - AI model by Meta AI - **OPT-66B** (Meta AI) — 2022-06-21 | Parameters: 66B - License: open | Type: model - AI model by Meta AI - **OPT-350M** (Meta AI) — 2022-06-21 | Parameters: 350M - License: open | Type: model - AI model by Meta AI - **OPT-2.7B** (Meta AI) — 2022-06-21 | Parameters: 2.7B - License: open | Type: model - AI model by Meta AI - **OPT-125M (finetuned on PTB)** (Meta AI) — 2022-06-21 | Parameters: 125M - License: open | Type: model - AI model by Meta AI - **OPT-30B** (Meta AI) — 2022-06-21 | Parameters: 30B - License: open | Type: model - AI model by Meta AI - **OPT-1.3B (finetuned)** (Meta AI) — 2022-06-21 | Parameters: 1.3B - License: open | Type: model - AI model by Meta AI - **Unified-IO (XL)** (Allen Institute for AI,University of Washington) — 2022-06-17 | Parameters: 2.9B - 
License: open | Type: model - AI model by Allen Institute for AI,University of Washington - **CoCa** (Google Research) — 2022-06-14 | Parameters: 2.1B - License: closed | Type: model - AI model by Google Research - **MetaLM** (Microsoft Research) — 2022-06-13 - License: closed | Type: model - AI model by Microsoft Research - **EGRU (WT2)** (Ruhr University Bochum,Technische Universität Dresden,University of London) — 2022-06-13 | Parameters: 74M - License: closed | Type: model - AI model by Ruhr University Bochum,Technische Universität Dresden,University of London - **EGRU (PTB)** (Ruhr University Bochum,Technische Universität Dresden,University of London) — 2022-06-13 | Parameters: 55M - License: closed | Type: model - AI model by Ruhr University Bochum,Technische Universität Dresden,University of London - **BIG-G 137B** (Google) — 2022-06-09 | Parameters: 137B - License: closed | Type: model - AI model by Google - **LIMoE-H/14** (Google) — 2022-06-06 | Parameters: 5.6B - License: closed | Type: model - AI model by Google - **DITTO** (Tsinghua University,Apple,Westlake University,Chinese University of Hong Kong (CUHK)) — 2022-06-06 | Parameters: 750M - License: closed | Type: model - AI model by Tsinghua University,Apple,Westlake University,Chinese University of Hong Kong (CUHK) - **Diffusion-GAN** (UT Austin,Microsoft) — 2022-06-05 - License: open | Type: model - AI model by UT Austin,Microsoft - **Minerva** (Google) — 2022-06-01 | Parameters: Minerva - License: closed | Type: model - PaLM finetuned on LaTeX/arXiv maths - **GODEL-XL** (Microsoft) — 2022-06-01 | Parameters: GODEL-XL - License: open | Type: model - XL: GPT-3 175B in paper, GPT-J 2.7B released - **YaLM 100B** (Yandex) — 2022-06-01 | Parameters: YaLM 100B - License: open | Type: model - Megatron-LM clone, Russian/English: https://medium.com/yandex/yandex-publishes-yalm-100b-its-the-largest-gpt-like-neural-network-in-open-source-d1df53d0e9a6 - **Unified-IO** (Allen AI) — 2022-06-01 | Parameters: 
Unified-IO - License: closed | Type: model - Based on T5. Demo only - **Perceiver AR** (DeepMind) — 2022-06-01 | Parameters: Perceiver AR - License: closed | Type: model - Context window=100,000. Params=364M wiki, 975M PG-19, 826M books, music=?, ImageNet=770M - **LIMoE** (Google) — 2022-06-01 | Parameters: LIMoE - License: closed | Type: model - - **GPT-4chan** (Independent) — 2022-06-01 | Parameters: GPT-4chan - License: open | Type: model - Warning for inappropriate content. GPT-J. - **B2T connection (16L)** (LINE Corporation,Tohoku University) — 2022-06-01 - License: open | Type: model - AI model by LINE Corporation,Tohoku University - **CRL** (Ulm University) — 2022-05-31 - License: closed | Type: model - AI model by Ulm University - **PFP** (Preferred Networks Inc) — 2022-05-30 - License: closed | Type: model - AI model by Preferred Networks Inc - **CogVideo** (Tsinghua University,Beijing Academy of Artificial Intelligence / BAAI) — 2022-05-29 | Parameters: 9.4B - License: open | Type: model - AI model by Tsinghua University,Beijing Academy of Artificial Intelligence / BAAI - **Tranception** (University of Oxford,Harvard Medical School,Cohere) — 2022-05-27 | Parameters: 700M - License: open | Type: model - AI model by University of Oxford,Harvard Medical School,Cohere - **GPT-2 Medium (FlashAttention)** (Stanford University,University at Buffalo) — 2022-05-27 | Parameters: 355M - License: open | Type: model - AI model by Stanford University,University at Buffalo - **TRIMELMext (247M)** (Princeton University) — 2022-05-25 | Parameters: 247.0M - License: open | Type: model - AI model by Princeton University - **TRIMELMext (7M)** (Princeton University) — 2022-05-25 | Parameters: 7M - License: open | Type: model - AI model by Princeton University - **TRIMELMlong (150M)** (Princeton University) — 2022-05-25 | Parameters: 150M - License: open | Type: model - AI model by Princeton University - **Imagen** (Google Brain) — 2022-05-23 | Parameters: 7.8B - License:
closed | Type: model - AI model by Google Brain - **improved U-Net for chest X-ray images segmentation** (Henan University of Technology,Nanyang Central Hospital) — 2022-05-23 - License: closed | Type: model - AI model by Henan University of Technology,Nanyang Central Hospital - **LSTM+GraB** (Cornell University) — 2022-05-22 - License: closed | Type: model - AI model by Cornell University - **SimCSE** (Princeton University,Tsinghua University) — 2022-05-18 - License: open | Type: model - AI model by Princeton University,Tsinghua University - **Gato** (DeepMind) — 2022-05-12 | Parameters: 1.2B - License: closed | Type: model - AI model by DeepMind - **UL2** (Google Research,Google Brain) — 2022-05-10 | Parameters: 20B - License: open | Type: model - AI model by Google Research,Google Brain - **ASE** (NVIDIA,University of California (UC) Berkeley) — 2022-05-05 - License: closed | Type: model - AI model by NVIDIA,University of California (UC) Berkeley - **StyleGAN-XL** (Max Planck Institute for Intelligent Systems,University of Tübingen) — 2022-05-05 - License: open | Type: model - AI model by Max Planck Institute for Intelligent Systems,University of Tübingen - **DeBERTaV3large + KEAR** (Microsoft) — 2022-05-04 | Parameters: 418M - License: closed | Type: model - AI model by Microsoft - **OPT-175B** (Meta AI) — 2022-05-02 | Parameters: 175B - License: open | Type: model - AI model by Meta AI - **OPT-13B** (Meta AI) — 2022-05-02 | Parameters: 13B - License: closed | Type: model - AI model by Meta AI - **Diffusion-LM** (Stanford) — 2022-05-01 | Parameters: Diffusion-LM - License: open | Type: model - GPT-J with synthetic data - **UL2 20B** (Google) — 2022-05-01 | Parameters: UL2 20B - License: closed | Type: model - Unifying Language model. C4 only. - **Gato (Cat)** (DeepMind) — 2022-05-01 | Parameters: Gato (Cat) - License: closed | Type: model - Proto-AGI. 
Generalist agent (LLM, VLM, robot) - **LaMDA 2** (Google) — 2022-05-01 | Parameters: LaMDA 2 - License: partial | Type: model - Chatbot with tiny walled garden demo TBA - **OPT-175B** (Meta AI) — 2022-05-01 | Parameters: OPT-175B - License: open | Type: model - Only 30B available (Jun/2022) - **Flamingo** (DeepMind) — 2022-04-29 | Parameters: 80B - License: closed | Type: model - AI model by DeepMind - **CogView2** (Tsinghua University,Beijing Academy of Artificial Intelligence / BAAI) — 2022-04-28 | Parameters: 6B - License: closed | Type: model - AI model by Tsinghua University,Beijing Academy of Artificial Intelligence / BAAI - **GraphBP** (Texas A&M,Fujitsu) — 2022-04-19 - License: closed | Type: model - AI model by Texas A&M,Fujitsu - **Sparse all-MLP** (Meta AI) — 2022-04-14 | Parameters: 9.4B - License: closed | Type: model - AI model by Meta AI - **XMC-GAN** (Google Research) — 2022-04-14 - License: closed | Type: model - AI model by Google Research - **Stable Diffusion (LDM-KL-8-G)** (Runway,Ludwig Maximilian University of Munich,Heidelberg University) — 2022-04-13 | Parameters: 1.4B - License: open | Type: model - AI model by Runway,Ludwig Maximilian University of Munich,Heidelberg University - **STT Conformer-Transducer XL** (NVIDIA) — 2022-04-12 | Parameters: 600M - License: open | Type: model - AI model by NVIDIA - **VLM-4** (LightOn) — 2022-04-12 - License: closed | Type: model - AI model by LightOn - **BERT-RBP** (Waseda University) — 2022-04-07 | Parameters: 110M - License: open | Type: model - AI model by Waseda University - **DALL·E 2** (OpenAI) — 2022-04-06 | Parameters: 3.5B - License: closed | Type: model - AI model by OpenAI - **PaLM (540B)** (Google Research) — 2022-04-04 | Parameters: 540.4B - License: closed | Type: model - AI model by Google Research - **Tk-Instruct** (Hugging Face) — 2022-04-01 | Parameters: Tk-Instruct - License: open | Type: model - Based on T5. 
- **InCoder** (Meta AI) — 2022-04-01 | Parameters: InCoder - License: open | Type: model - Python and JavaScript - **NOOR** (TII) — 2022-04-01 | Parameters: NOOR - License: closed | Type: model - Arabic. "World’s largest high-quality cross-domain Arabic dataset, combining web data with books, poetry, news articles, and technical information" - **mGPT** (Sber) — 2022-04-01 | Parameters: mGPT - License: partial | Type: model - 60 languages. Only 1.3B model available - **PaLM-Coder** (Google) — 2022-04-01 | Parameters: PaLM-Coder - License: closed | Type: model - - **PaLM** (Google) — 2022-04-01 | Parameters: PaLM - License: closed | Type: model - - **Monarch-GPT-2-Medium** (Stanford University,University at Buffalo,University of Michigan) — 2022-04-01 | Parameters: 165M - License: closed | Type: model - AI model by Stanford University,University at Buffalo,University of Michigan - **Monarch-GPT-2-Small** (Stanford University,University at Buffalo,University of Michigan) — 2022-04-01 | Parameters: 72M - License: closed | Type: model - AI model by Stanford University,University at Buffalo,University of Michigan - **NoPos** (Tel Aviv University,University of Washington,Intel Labs,Meta AI) — 2022-03-30 | Parameters: 1.3B - License: closed | Type: model - AI model by Tel Aviv University,University of Washington,Intel Labs,Meta AI - **Chinchilla** (DeepMind) — 2022-03-29 | Parameters: 70B - License: closed | Type: model - AI model by DeepMind - **GraSR** (Shanghai Jiao Tong University,Ministry of Education of China) — 2022-03-24 - License: open | Type: model - AI model by Shanghai Jiao Tong University,Ministry of Education of China - **Make-A-Scene** (Meta AI) — 2022-03-24 | Parameters: 4B - License: closed | Type: model - AI model by Meta AI - **MemSizer (language modeling)** (Meta AI,Chinese University of Hong Kong (CUHK)) — 2022-03-23 | Parameters: 357M - License: closed | Type: model - AI model by Meta AI,Chinese University of Hong Kong (CUHK) - **Segatron-XL large, 
M=384 + HCP** (Microsoft Research,University of Waterloo) — 2022-03-21 | Parameters: 257.0M - License: closed | Type: model - AI model by Microsoft Research,University of Waterloo - **Transformer Large + HCP** (University of Waterloo,Microsoft Research) — 2022-03-21 | Parameters: 257.0M - License: closed | Type: model - AI model by University of Waterloo,Microsoft Research - **Segatron-XL base, M=150 + HCP** (Microsoft Research,University of Waterloo) — 2022-03-21 | Parameters: 151M - License: closed | Type: model - AI model by Microsoft Research,University of Waterloo - **GPT-3.5 (davinci-002)** (OpenAI) — 2022-03-15 - License: closed | Type: model - AI model by OpenAI - **ViT-G (model soup)** (University of Washington,Columbia University,Google,Meta AI,Tel Aviv University) — 2022-03-10 | Parameters: 1.8B - License: open | Type: model - AI model by University of Washington,Columbia University,Google,Meta AI,Tel Aviv University - **GPT3-6.7B + muP** (Microsoft,OpenAI) — 2022-03-07 | Parameters: 6.7B - License: closed | Type: model - AI model by Microsoft,OpenAI - **MegaSyn** (Collaborations Pharmaceuticals) — 2022-03-07 - License: closed | Type: model - AI model by Collaborations Pharmaceuticals - **RQ-Transformer (LSUN-cat dataset)** (Kakao,POSTECH) — 2022-03-03 | Parameters: 612M - License: closed | Type: model - AI model by Kakao,POSTECH - **RQ-Transformer (1.4B params ImageNet dataset)** (Kakao,POSTECH) — 2022-03-03 | Parameters: 1.4B - License: closed | Type: model - AI model by Kakao,POSTECH - **RQ-Transformer (3.8B params ImageNet dataset)** (Kakao,POSTECH) — 2022-03-03 | Parameters: 3.8B - License: closed | Type: model - AI model by Kakao,POSTECH - **Statement Curriculum Learning** (OpenAI) — 2022-03-02 | Parameters: 774M - License: closed | Type: model - AI model by OpenAI - **SeeKeR** (Meta AI) — 2022-03-01 | Parameters: SeeKeR - License: open | Type: model - BART and compared to GPT-2 - **CodeGen** (Salesforce) — 2022-03-01 | Parameters: CodeGen -
License: open | Type: model - Code - **VLM-4** (LightOn) — 2022-03-01 | Parameters: VLM-4 - License: open | Type: model - Params corrected 25/Apr/2022 - **Chinchilla** (DeepMind) — 2022-03-01 | Parameters: Chinchilla - License: closed | Type: model - First to double tokens per size increase - **CodeT5** (Salesforce) — 2022-03-01 | Parameters: CodeT5 - License: open | Type: model - "Text-to-Text Transfer Transformer". Code. Large introduced in https://arxiv.org/pdf/2207.01780.pdf - **DeepNet** (Microsoft Research) — 2022-03-01 | Parameters: 3.2B - License: closed | Type: model - AI model by Microsoft Research - **PolyCoder** (Carnegie Mellon University (CMU)) — 2022-02-26 | Parameters: 2.7B - License: open | Type: model - AI model by Carnegie Mellon University (CMU) - **FourCastNet** (NVIDIA,NERSC, Lawrence Berkeley National Laboratory,University of Michigan,Rice University,California Institute of Technology,Purdue University) — 2022-02-22 - License: closed | Type: model - AI model by NVIDIA,NERSC, Lawrence Berkeley National Laboratory,University of Michigan,Rice University,California Institute of Technology,Purdue University - **ST-MoE** (Google,Google Brain,Google Research) — 2022-02-17 | Parameters: 269B - License: closed | Type: model - AI model by Google,Google Brain,Google Research - **Midjourney V1** (Midjourney) — 2022-02-15 - License: closed | Type: model - AI model by Midjourney - **MuZero VP9** (DeepMind) — 2022-02-14 - License: closed | Type: model - AI model by DeepMind - **LaMDA** (Google) — 2022-02-10 | Parameters: 137B - License: closed | Type: model - AI model by Google - **ProteinBERT** (Hebrew University of Jerusalem,Ben-Gurion University of the Negev,Deep Trading) — 2022-02-10 | Parameters: 16M - License: open | Type: model - AI model by Hebrew University of Jerusalem,Ben-Gurion University of the Negev,Deep Trading - **GPT-NeoX-20B** (EleutherAI) — 2022-02-09 | Parameters: 20B - License: open | Type: model - AI model by EleutherAI - **MaskGIT 
(ImageNet)** (Google Research) — 2022-02-08 | Parameters: 227M - License: closed | Type: model - AI model by Google Research - **RETRO-7B** (DeepMind) — 2022-02-07 | Parameters: 7.5B - License: closed | Type: model - AI model by DeepMind - **AlphaCode** (DeepMind) — 2022-02-02 | Parameters: 41.1B - License: closed | Type: model - AI model by DeepMind - **GPT-NeoX-20B** (EleutherAI) — 2022-02-01 | Parameters: GPT-NeoX-20B - License: open | Type: model - Latest model to Feb/2022 - **DARK** (University College London (UCL)) — 2022-01-28 - License: closed | Type: model - AI model by University College London (UCL) - **InstructGPT 175B** (OpenAI) — 2022-01-27 | Parameters: 175B - License: closed | Type: model - AI model by OpenAI - **InstructGPT 6B** (OpenAI) — 2022-01-27 | Parameters: 6B - License: closed | Type: model - AI model by OpenAI - **InstructGPT 1.3B** (OpenAI) — 2022-01-27 | Parameters: 1.3B - License: closed | Type: model - AI model by OpenAI - **InstructGPT 350M** (OpenAI) — 2022-01-27 | Parameters: 350M - License: closed | Type: model - AI model by OpenAI - **Primer (GPT-3 XL-like 1.9B)** (Google Brain) — 2022-01-24 | Parameters: 1.9B - License: closed | Type: model - AI model by Google Brain - **OntoProtein** (Zhejiang University (ZJU)) — 2022-01-23 | Parameters: 420M - License: open | Type: model - AI model by Zhejiang University (ZJU) - **AbLang (heavy sequences)** (University of Oxford) — 2022-01-22 | Parameters: 355M - License: open | Type: model - AI model by University of Oxford - **data2vec (language)** (Meta AI) — 2022-01-20 | Parameters: 705.1M - License: open | Type: model - AI model by Meta AI - **Japanese-GPT-1B** (rinna) — 2022-01-19 | Parameters: 1.3B - License: open | Type: model - AI model by rinna - **Detic** (Meta AI,University of Texas at Austin) — 2022-01-07 | Parameters: 88M - License: open | Type: model - AI model by Meta AI,University of Texas at Austin - **SignalP 6.0** (Technical University of Denmark,ETH Zurich,University of 
Copenhagen,Stanford University,Stockholm University,European Bioinformatics Institute) — 2022-01-03 - License: closed | Type: model - AI model by Technical University of Denmark,ETH Zurich,University of Copenhagen,Stanford University,Stockholm University,European Bioinformatics Institute - **CM3** (Meta AI) — 2022-01-01 | Parameters: CM3 - License: open | Type: model - LLM with multimodal capabilities - **Vespa** (Technical University of Munich) — 2021-12-30 | Parameters: 231K - License: closed | Type: model - AI model by Technical University of Munich - **ERNIE 3.0 Titan** (Baidu,Peng Cheng Laboratory) — 2021-12-23 | Parameters: 260B - License: closed | Type: model - AI model by Baidu,Peng Cheng Laboratory - **GLIDE** (OpenAI) — 2021-12-20 | Parameters: 3.5B - License: open | Type: model - AI model by OpenAI - **MoE-1.1T** (Meta AI) — 2021-12-20 | Parameters: 1.1T - License: open | Type: model - AI model by Meta AI - **Fairseq-dense 13B** (Meta AI) — 2021-12-20 | Parameters: 13B - License: open | Type: model - AI model by Meta AI - **LDM-1.45B** (Heidelberg University,Runway) — 2021-12-20 | Parameters: 1.4B - License: open | Type: model - AI model by Heidelberg University,Runway - **XGLM-7.5B** (Meta AI,Facebook AI Research) — 2021-12-20 | Parameters: 7.5B - License: open | Type: model - AI model by Meta AI,Facebook AI Research - **XGLM** (Meta AI) — 2021-12-20 | Parameters: 564M - License: closed | Type: model - AI model by Meta AI - **HSO** (Toyota Technological Institute at Chicago) — 2021-12-16 | Parameters: 345M - License: closed | Type: model - AI model by Toyota Technological Institute at Chicago - **Contriever** (Meta AI,University College London (UCL),PSL University,Université Grenoble Alpes) — 2021-12-16 | Parameters: 110M - License: open | Type: model - AI model by Meta AI,University College London (UCL),PSL University,Université Grenoble Alpes - **LongT5** (Google Research) — 2021-12-15 | Parameters: 3B - License: open | Type: model - AI model by 
Google Research - **EXAONE 1.0** (LG) — 2021-12-14 | Parameters: 300B - License: closed | Type: model - AI model by LG - **GLaM** (Google) — 2021-12-13 | Parameters: 1.2T - License: closed | Type: model - AI model by Google - **Engine-Base (NE)** (Boston University) — 2021-12-11 | Parameters: 124M - License: open | Type: model - AI model by Boston University - **Engine-Medium (NE)** (Boston University) — 2021-12-11 | Parameters: 355M - License: open | Type: model - AI model by Boston University - **Gopher (280B)** (DeepMind) — 2021-12-08 | Parameters: 280B - License: closed | Type: model - AI model by DeepMind - **Gopher (7.1B)** (DeepMind) — 2021-12-08 | Parameters: 7.1B - License: closed | Type: model - AI model by DeepMind - **Student of Games** (DeepMind) — 2021-12-06 - License: closed | Type: model - AI model by DeepMind - **CTR-BERT** (Amazon) — 2021-12-06 | Parameters: 70M - License: closed | Type: model - AI model by Amazon - **T-NLRv5 XXL** (Microsoft) — 2021-12-03 | Parameters: 5.4B - License: closed | Type: model - AI model by Microsoft - **ERNIE 3.0 Titan** (Baidu) — 2021-12-01 | Parameters: ERNIE 3.0 Titan - License: open | Type: model - - **XGLM** (Meta AI) — 2021-12-01 | Parameters: XGLM - License: open | Type: model - Multilingual: 30 languages, 16 families. - **Fairseq** (Meta AI) — 2021-12-01 | Parameters: Fairseq - License: open | Type: model - 13B & 1100B param models. 
- **Gopher** (DeepMind) — 2021-12-01 | Parameters: Gopher - License: closed | Type: model - Dataset: https://lifearchitect.ai/whats-in-my-ai/ - **GLaM** (Google) — 2021-12-01 | Parameters: GLaM - License: closed | Type: model - - **Anthropic-LM 52B** (Anthropic) — 2021-12-01 | Parameters: Anthropic-LM 52B - License: closed | Type: model - Internal research only - **RETRO** (DeepMind) — 2021-12-01 | Parameters: RETRO - License: closed | Type: model - with retrieval - **GPT-2-Medium+Pixelfly** (Stanford University,SambaNova Systems, Inc,Peking University,Adobe,University at Buffalo) — 2021-11-30 | Parameters: 203.0M - License: closed | Type: model - AI model by Stanford University,SambaNova Systems, Inc,Peking University,Adobe,University at Buffalo - **GPT-2-Small+Pixelfly** (Stanford University,SambaNova Systems, Inc,Peking University,Adobe,University at Buffalo) — 2021-11-30 | Parameters: 68M - License: closed | Type: model - AI model by Stanford University,SambaNova Systems, Inc,Peking University,Adobe,University at Buffalo - **Quantized ADMM** (Chinese University of Hong Kong (CUHK),Microsoft) — 2021-11-29 - License: closed | Type: model - AI model by Chinese University of Hong Kong (CUHK),Microsoft - **Transformer LM + MinSen** (Chinese University of Hong Kong (CUHK)) — 2021-11-29 - License: closed | Type: model - AI model by Chinese University of Hong Kong (CUHK) - **NÜWA** (Microsoft Research,Peking University) — 2021-11-24 | Parameters: 870M - License: closed | Type: model - AI model by Microsoft Research,Peking University - **Persia** (ETH Zurich,Kuaishou Technology) — 2021-11-23 | Parameters: 100T - License: closed | Type: model - AI model by ETH Zurich,Kuaishou Technology - **Florence** (Microsoft) — 2021-11-22 | Parameters: 893M - License: closed | Type: model - AI model by Microsoft - **BASIC-L** (Google) — 2021-11-19 | Parameters: 3.1B - License: closed | Type: model - AI model by Google - **Swin Transformer V2 (SwinV2-G)** (Microsoft Research Asia) — 
2021-11-18 | Parameters: 3B - License: open | Type: model - AI model by Microsoft Research Asia - **DeBERTaV3large** (Microsoft Research) — 2021-11-18 | Parameters: 418M - License: open | Type: model - AI model by Microsoft Research - **ESM1v** (Facebook AI Research,New York University (NYU),University of California (UC) Berkeley) — 2021-11-17 | Parameters: 650M - License: open | Type: model - AI model by Facebook AI Research,New York University (NYU),University of California (UC) Berkeley - **ViT-G/14 (LiT)** (Google Research) — 2021-11-15 | Parameters: 3.0B - License: closed | Type: model - AI model by Google Research - **EquiDock** (Massachusetts Institute of Technology (MIT),ETH Zurich,Tencent) — 2021-11-15 - License: open | Type: model - AI model by Massachusetts Institute of Technology (MIT),ETH Zurich,Tencent - **A.X (Adot) 18B** (SK Telecom) — 2021-11-15 | Parameters: 18B - License: closed | Type: model - AI model by SK Telecom - **KoGPT** (Kakao) — 2021-11-12 | Parameters: 6.2B - License: open | Type: model - AI model by Kakao - **Masked Autoencoders ViT-H** (Facebook AI Research) — 2021-11-11 | Parameters: 632M - License: open | Type: model - AI model by Facebook AI Research - **GPT-2 (AMPS)** (University of California (UC) Berkeley) — 2021-11-08 | Parameters: 1.5M - License: closed | Type: model - AI model by University of California (UC) Berkeley - **GPT2+CoreLM+Fine-Tuning** (Aristotle University of Thessaloniki) — 2021-11-04 | Parameters: 132M - License: closed | Type: model - AI model by Aristotle University of Thessaloniki - **NCP-VAE (CIFAR 10)** (University of Illinois Urbana-Champaign (UIUC),NVIDIA) — 2021-11-03 - License: closed | Type: model - AI model by University of Illinois Urbana-Champaign (UIUC),NVIDIA - **NCP-VAE (Celeba HQ)** (University of Illinois Urbana-Champaign (UIUC),NVIDIA) — 2021-11-03 - License: closed | Type: model - AI model by University of Illinois Urbana-Champaign (UIUC),NVIDIA - **Luminous** (Aleph Alpha) — 2021-11-01 | 
Parameters: Luminous - License: open | Type: model - Devs from EleutherAI - **DeBERTaV3** (Microsoft) — 2021-11-01 | Parameters: DeBERTaV3 - License: open | Type: model - RoBERTa=162B token dataset. - **BERT-480** (Google) — 2021-11-01 | Parameters: BERT-480 - License: closed | Type: model - Submission to benchmarks. Original dataset was BookCorpus + Wikipedia: https://arxiv.org/pdf/1810.04805.pdf - **BERT-200** (Google) — 2021-11-01 | Parameters: BERT-200 - License: closed | Type: model - Submission to benchmarks. Original dataset was BookCorpus + Wikipedia: https://arxiv.org/pdf/1810.04805.pdf - **Cedille FR-Boris** (Coteries) — 2021-11-01 | Parameters: Cedille FR-Boris - License: open | Type: model - French only. GPT-J. - **CodeT5-base** (Salesforce,Nanyang Technological University) — 2021-11-01 | Parameters: 220M - License: open | Type: model - AI model by Salesforce,Nanyang Technological University - **Projected GAN** (Heidelberg University) — 2021-11-01 - License: open | Type: model - AI model by Heidelberg University - **S4** (Stanford University) — 2021-10-31 | Parameters: 249.0M - License: open | Type: model - AI model by Stanford University - **EfficientZero** (Tsinghua University,University of California (UC) Berkeley,Shanghai Qi Zhi institute) — 2021-10-30 - License: closed | Type: model - AI model by Tsinghua University,University of California (UC) Berkeley,Shanghai Qi Zhi institute - **Scatterbrain** (Stanford University,Adobe,University at Buffalo) — 2021-10-28 - License: closed | Type: model - AI model by Stanford University,Adobe,University at Buffalo - **Eve** (Harvard Medical School,University of Oxford) — 2021-10-27 | Parameters: 15.0M - License: closed | Type: model - AI model by Harvard Medical School,University of Oxford - **DALL-E mini** (Craiyon) — 2021-10-26 - License: open | Type: model - AI model by Craiyon - **PMLM-large** (Microsoft Research Asia,Nanyang Technological University,Xi’an Jiaotong University,Sun Yat-sen University) — 
2021-10-21 | Parameters: 250M - License: closed | Type: model - AI model by Microsoft Research Asia,Nanyang Technological University,Xi’an Jiaotong University,Sun Yat-sen University - **WD+LR+M** (University of Cambridge,Alan Turing Institute) — 2021-10-20 - License: closed | Type: model - AI model by University of Cambridge,Alan Turing Institute - **base LM+GNN+kNN** (Shannon.AI,Nanjing University,Nanyang Technological University,Zhejiang University (ZJU)) — 2021-10-17 | Parameters: 274M - License: open | Type: model - AI model by Shannon.AI,Nanjing University,Nanyang Technological University,Zhejiang University (ZJU) - **base LM+GNN (WT103)** (Shannon.AI,Nanjing University,Nanyang Technological University,Zhejiang University (ZJU)) — 2021-10-17 | Parameters: 247M - License: closed | Type: model - AI model by Shannon.AI,Nanjing University,Nanyang Technological University,Zhejiang University (ZJU) - **PAGnol-XL** (LightOn,Laboratoire de Physique de l'Ecole Normale (LPENS),INRIA) — 2021-10-16 | Parameters: 1.5B - License: closed | Type: model - AI model by LightOn,Laboratoire de Physique de l'Ecole Normale (LPENS),INRIA - **GPT-2 (fine-tuned with HYDRA)** (University of California San Diego) — 2021-10-16 | Parameters: 1.5B - License: closed | Type: model - AI model by University of California San Diego - **MGK 4 heads (medium)** (FPT Software AI Center,University of California Los Angeles (UCLA),VinUniversity,Deezer Research,Rice University,University of Texas at Austin) — 2021-10-16 | Parameters: 90M - License: closed | Type: model - AI model by FPT Software AI Center,University of California Los Angeles (UCLA),VinUniversity,Deezer Research,Rice University,University of Texas at Austin - **MGK 8 heads (small)** (FPT Software AI Center,University of California Los Angeles (UCLA),VinUniversity,Deezer Research,Rice University,University of Texas at Austin) — 2021-10-16 | Parameters: 40M - License: closed | Type: model - AI model by FPT Software AI Center,University of 
California Los Angeles (UCLA),VinUniversity,Deezer Research,Rice University,University of Texas at Austin - **T0-XXL** (Hugging Face,Brown University) — 2021-10-15 | Parameters: 11B - License: open | Type: model - AI model by Hugging Face,Brown University - **KnGPT2** (Huawei Noah's Ark Lab,McGill University) — 2021-10-15 | Parameters: 83M - License: closed | Type: model - AI model by Huawei Noah's Ark Lab,McGill University - **Yuan 1.0** (Inspur) — 2021-10-12 | Parameters: 245.7B - License: closed | Type: model - AI model by Inspur - **TOME** (University of Southern California,Google) — 2021-10-12 | Parameters: 220M - License: closed | Type: model - AI model by University of Southern California,Google - **Megatron-Turing NLG 530B** (Microsoft,NVIDIA) — 2021-10-11 | Parameters: 530B - License: closed | Type: model - AI model by Microsoft,NVIDIA - **M6-10T** (Alibaba) — 2021-10-08 | Parameters: 10T - License: closed | Type: model - AI model by Alibaba - **AlphaFold-Multimer** (Google DeepMind,DeepMind) — 2021-10-04 - License: open | Type: model - AI model by Google DeepMind,DeepMind - **MT-NLG** (Microsoft/NVIDIA) — 2021-10-01 | Parameters: MT-NLG - License: closed | Type: model - - **Turing ULRv5** (Microsoft) — 2021-09-28 | Parameters: 2.2B - License: closed | Type: model - AI model by Microsoft - **TrOCR** (Beihang University,Microsoft Research Asia) — 2021-09-21 | Parameters: 558M - License: open | Type: model - AI model by Beihang University,Microsoft Research Asia - **LM-GVP** (Amazon Machine Learning Solutions Lab,Johnson & Johnson) — 2021-09-21 - License: open | Type: model - AI model by Amazon Machine Learning Solutions Lab,Johnson & Johnson - **PLATO-XL** (Baidu) — 2021-09-20 | Parameters: 11B - License: open | Type: model - AI model by Baidu - **DLRM-2022** (Facebook) — 2021-09-15 | Parameters: 3T - License: closed | Type: model - AI model by Facebook - **MegaMolBART** (NVIDIA) — 2021-09-14 | Parameters: 45M - License: open | Type: model - AI model by 
NVIDIA - **HyperCLOVA 82B** (NAVER,Search Solutions) — 2021-09-10 | Parameters: 82B - License: closed | Type: model - AI model by NAVER,Search Solutions - **HyperCLOVA 204B** (NAVER) — 2021-09-10 | Parameters: 204B - License: closed | Type: model - AI model by NAVER - **NLM** (Carnegie Mellon University (CMU),University of California San Diego) — 2021-09-09 | Parameters: 247.0M - License: closed | Type: model - AI model by Carnegie Mellon University (CMU),University of California San Diego - **Speechmatics Enhanced** (Speechmatics) — 2021-09-07 - License: closed | Type: model - AI model by Speechmatics - **PermuteFormer** (Peking University) — 2021-09-06 | Parameters: 149.7M - License: closed | Type: model - AI model by Peking University - **RNS-RNN** (University of Notre Dame) — 2021-09-05 | Parameters: 5.8M - License: closed | Type: model - AI model by University of Notre Dame - **MEB** (Microsoft) — 2021-09-04 | Parameters: 135B - License: closed | Type: model - AI model by Microsoft - **FLAN 137B** (Google Research) — 2021-09-03 | Parameters: 137B - License: closed | Type: model - AI model by Google Research - **PLUS-RNN** (Seoul National University,LG AI Research,NAVER,Kangwon National University) — 2021-09-03 - License: closed | Type: model - AI model by Seoul National University,LG AI Research,NAVER,Kangwon National University - **FLAN** (Google) — 2021-09-01 | Parameters: FLAN - License: closed | Type: model - Fine-tuned LaMDA - **Command xlarge** (Cohere) — 2021-09-01 | Parameters: Command xlarge - License: open | Type: model - Stealth 'ebooks and webpages'. 52B: https://crfm.stanford.edu/helm/v1.0/?models=1 - **PLATO-XL** (Baidu) — 2021-09-01 | Parameters: PLATO-XL - License: open | Type: model - Chatbot. 
Reddit comments + CN social - **Macaw** (Allen AI) — 2021-09-01 | Parameters: Macaw - License: partial | Type: model - Chatbot - **$\infty$-former (SM)** (Universidade de Lisboa (ULisboa),DeepMind) — 2021-09-01 | Parameters: 124M - License: closed | Type: model - AI model by Universidade de Lisboa (ULisboa),DeepMind - **HJRSS** (University of Washington,Microsoft) — 2021-09-01 | Parameters: 16M - License: closed | Type: model - AI model by University of Washington,Microsoft - **ALiBi (L=3072, Lvalid = 3072)** (University of Washington,Facebook AI Research,Allen Institute for AI) — 2021-08-27 | Parameters: 1.3B - License: open | Type: model - AI model by University of Washington,Facebook AI Research,Allen Institute for AI - **Speechmatics Standard** (Speechmatics) — 2021-08-23 - License: closed | Type: model - AI model by Speechmatics - **XLMR-XXL** (Facebook AI Research) — 2021-08-17 | Parameters: 10.7B - License: open | Type: model - AI model by Facebook AI Research - **ProteinLM** (Tsinghua University,Beijing Academy of Artificial Intelligence / BAAI,Tencent) — 2021-08-17 | Parameters: 3B - License: open | Type: model - AI model by Tsinghua University,Beijing Academy of Artificial Intelligence / BAAI,Tencent - **DNABERT** (Northeastern University) — 2021-08-15 | Parameters: 110M - License: open | Type: model - AI model by Northeastern University - **GPT-2 (1.5B, Curriculum Learning 45K)** (Microsoft) — 2021-08-13 | Parameters: 1.5B - License: closed | Type: model - AI model by Microsoft - **GPT-2 (117M, SLW 110K)** (Microsoft) — 2021-08-13 | Parameters: 117M - License: closed | Type: model - AI model by Microsoft - **Jurassic-1-Jumbo** (AI21 Labs) — 2021-08-11 | Parameters: 178B - License: closed | Type: model - AI model by AI21 Labs - **Zidong Taichu** (Chinese Academy of Sciences,Wuhan AI Computing Center) — 2021-08-11 | Parameters: 3.2B - License: open | Type: model - AI model by Chinese Academy of Sciences,Wuhan AI Computing Center - **W2v-BERT** (Google 
Brain,Massachusetts Institute of Technology (MIT)) — 2021-08-07 | Parameters: 1B - License: closed | Type: model - AI model by Google Brain,Massachusetts Institute of Technology (MIT) - **YOLOX-X** (Megvii Inc) — 2021-08-06 | Parameters: 99.1M - License: open | Type: model - AI model by Megvii Inc - **FMMformer (2-kernel fast weight + Band20)** (University of California Los Angeles (UCLA),University of Utah) — 2021-08-05 | Parameters: 40M - License: closed | Type: model - AI model by University of California Los Angeles (UCLA),University of Utah - **6-Act Tether** (Facebook AI Research,Georgia Institute of Technology) — 2021-08-03 | Parameters: 5M - License: open | Type: model - AI model by Facebook AI Research,Georgia Institute of Technology - **Codex** (OpenAI) — 2021-08-01 | Parameters: Codex - License: open | Type: model - Code - **Jurassic-1** (AI21) — 2021-08-01 | Parameters: Jurassic-1 - License: open | Type: model - Emulated GPT-3 dataset - **SEER** (Facebook AI Research,INRIA) — 2021-07-29 | Parameters: 1.3B - License: open | Type: model - AI model by Facebook AI Research,INRIA - **GOAT** (DeepMind) — 2021-07-27 | Parameters: 3.5M - License: closed | Type: model - AI model by DeepMind - **HuBERT** (Facebook AI Research) — 2021-07-27 | Parameters: 1B - License: open | Type: model - AI model by Facebook AI Research - **Codex** (OpenAI) — 2021-07-07 | Parameters: 12B - License: closed | Type: model - AI model by OpenAI - **ERNIE 3.0** (Baidu) — 2021-07-05 | Parameters: 10B - License: open | Type: model - AI model by Baidu - **BlenderBot 2.0** (Meta AI) — 2021-07-01 | Parameters: BlenderBot 2.0 - License: open | Type: model - Chatbot - **GemNet-T (OC20)** (Technical University of Munich) — 2021-07-01 | Parameters: 1.9M - License: open | Type: model - AI model by Technical University of Munich - **DEQ-Transformer (Post-LN) + Jacobian Regularisation** (Carnegie Mellon University (CMU),Intel Labs) — 2021-06-28 | Parameters: 98M - License: open | Type: model - AI 
model by Carnegie Mellon University (CMU),Intel Labs - **Adaptive Input Transformer + RD** (Microsoft Research Asia,Soochow University) — 2021-06-28 | Parameters: 247.0M - License: closed | Type: model - AI model by Microsoft Research Asia,Soochow University - **CPM-2** (Tsinghua University,Beijing Academy of Artificial Intelligence / BAAI) — 2021-06-24 | Parameters: 11B - License: closed | Type: model - AI model by Tsinghua University,Beijing Academy of Artificial Intelligence / BAAI - **Fold2Seq** (IBM,Texas A&M) — 2021-06-24 | Parameters: 12.4M - License: closed | Type: model - AI model by IBM,Texas A&M - **EfficientNetV2-XL** (Google,Google Brain) — 2021-06-23 | Parameters: 208M - License: open | Type: model - AI model by Google,Google Brain - **StyleGAN3-T** (NVIDIA,Aalto University) — 2021-06-21 | Parameters: 2.2M - License: open | Type: model - AI model by NVIDIA,Aalto University - **StyleGAN3-R** (NVIDIA,Aalto University) — 2021-06-21 | Parameters: 1.6M - License: open | Type: model - AI model by NVIDIA,Aalto University - **ALIGN** (Google Research) — 2021-06-11 | Parameters: 820M - License: closed | Type: model - AI model by Google Research - **Denoising Diffusion Probabilistic Models (LSUN Bedroom)** (University of California (UC) Berkeley) — 2021-06-11 | Parameters: 256M - License: open | Type: model - AI model by University of California (UC) Berkeley - **Delta RNN (+ full context)** (IDSIA,SUPSI,King Abdullah University of Science and Technology (KAUST)) — 2021-06-11 | Parameters: 44.6M - License: closed | Type: model - AI model by IDSIA,SUPSI,King Abdullah University of Science and Technology (KAUST) - **DeBERTa** (Microsoft) — 2021-06-10 | Parameters: 1.5B - License: open | Type: model - AI model by Microsoft - **CoAtNet** (Google,Google Research,Google Brain) — 2021-06-09 | Parameters: 2.4B - License: closed | Type: model - AI model by Google,Google Research,Google Brain - **EMDR** (Mila - Quebec AI (originally Montreal Institute for Learning 
Algorithms),McGill University,DeepMind) — 2021-06-09 | Parameters: 440M - License: open | Type: model - AI model by Mila - Quebec AI (originally Montreal Institute for Learning Algorithms),McGill University,DeepMind - **ViT-G/14** (Google Brain,Google Research) — 2021-06-08 | Parameters: 1.8B - License: closed | Type: model - AI model by Google Brain,Google Research - **AFP+FPI (PTB)** (University of Sheffield) — 2021-06-04 | Parameters: 2.0M - License: closed | Type: model - AI model by University of Sheffield - **AFP+FPI (WT2)** (University of Sheffield) — 2021-06-04 | Parameters: 13.6M - License: closed | Type: model - AI model by University of Sheffield - **GPT2-Large+LHOPT** (OpenAI) — 2021-06-02 | Parameters: 760M - License: closed | Type: model - AI model by OpenAI - **GPT-J** (EleutherAI) — 2021-06-01 | Parameters: GPT-J - License: open | Type: model - Popular - **LaMDA** (Google) — 2021-06-01 | Parameters: LaMDA - License: closed | Type: model - Chatbot - **Wu Dao 2.0** (Beijing Academy of Artificial Intelligence / BAAI) — 2021-05-31 | Parameters: 1.8T - License: closed | Type: model - AI model by Beijing Academy of Artificial Intelligence / BAAI - **CODA** (The University of Hong Kong,Sun Yat-sen University,Shanghai AI Lab) — 2021-05-31 | Parameters: 246.9M - License: closed | Type: model - AI model by The University of Hong Kong,Sun Yat-sen University,Shanghai AI Lab - **ByT5-XXL** (Google,Google Research) — 2021-05-28 | Parameters: 12.9B - License: open | Type: model - AI model by Google,Google Research - **Transformer local-attention (NesT-B)** (Google Cloud,Google Research) — 2021-05-26 | Parameters: 90.1M - License: open | Type: model - AI model by Google Cloud,Google Research - **CogView** (Tsinghua University,Alibaba DAMO Academy) — 2021-05-26 | Parameters: 4B - License: open | Type: model - AI model by Tsinghua University,Alibaba DAMO Academy - **DeepFRI** (Flatiron Institute,University of California San Diego,Jagiellonian University,New York 
University (NYU),Broad Institute,University of Auckland,Massachusetts General Hospital,Harvard Medical School,Massachusetts Institute of Technology (MIT)) — 2021-05-26 - License: open | Type: model - AI model by Flatiron Institute,University of California San Diego,Jagiellonian University,New York University (NYU),Broad Institute,University of Auckland,Massachusetts General Hospital,Harvard Medical School,Massachusetts Institute of Technology (MIT) - **ConSERT** (Meituan University,Beijing University of Posts and Telecommunications) — 2021-05-25 | Parameters: 340M - License: open | Type: model - AI model by Meituan University,Beijing University of Posts and Telecommunications - **MedBERT** (Peng Cheng Laboratory,University of Texas at Houston) — 2021-05-20 | Parameters: 17M - License: closed | Type: model - AI model by Peng Cheng Laboratory,University of Texas at Houston - **Multitask Unified Model (MUM)** (Google) — 2021-05-18 - License: closed | Type: model - AI model by Google - **Fairseq + UID: variance** (Google AI,ETH Zurich,University of Cambridge) — 2021-05-15 - License: closed | Type: model - AI model by Google AI,ETH Zurich,University of Cambridge - **ADM** (OpenAI) — 2021-05-11 | Parameters: 559M - License: open | Type: model - AI model by OpenAI - **ProtT5-XXL** (Technical University of Munich,Med AI Technology,NVIDIA,Oak Ridge National Laboratory,Google,Seoul National University) — 2021-05-04 | Parameters: 11B - License: open | Type: model - AI model by Technical University of Munich,Med AI Technology,NVIDIA,Oak Ridge National Laboratory,Google,Seoul National University - **ProtT5-XXL-BFD** (Technical University of Munich,Med AI Technology,NVIDIA,Oak Ridge National Laboratory,Google,Seoul National University) — 2021-05-04 | Parameters: 11B - License: open | Type: model - AI model by Technical University of Munich,Med AI Technology,NVIDIA,Oak Ridge National Laboratory,Google,Seoul National University - **ProtBERT-BFD** (Technical University of 
Munich,NVIDIA,Seoul National University,Google,Oak Ridge National Laboratory,Med AI Technology) — 2021-05-04 | Parameters: 420M - License: open | Type: model - AI model by Technical University of Munich,NVIDIA,Seoul National University,Google,Oak Ridge National Laboratory,Med AI Technology - **ProtBERT-UniRef** (Technical University of Munich,NVIDIA,Seoul National University,Google,Oak Ridge National Laboratory,Med AI Technology) — 2021-05-04 | Parameters: 420M - License: closed | Type: model - AI model by Technical University of Munich,NVIDIA,Seoul National University,Google,Oak Ridge National Laboratory,Med AI Technology - **ProtT5-XL-U50** (Technical University of Munich,Med AI Technology,NVIDIA,Oak Ridge National Laboratory,Google,Seoul National University) — 2021-05-04 | Parameters: 3B - License: open | Type: model - AI model by Technical University of Munich,Med AI Technology,NVIDIA,Oak Ridge National Laboratory,Google,Seoul National University - **Transformer-XL + SIS** (INRIA) — 2021-05-03 | Parameters: 246M - License: closed | Type: model - AI model by INRIA - **GPT-J-6B** (EleutherAI,LAION) — 2021-05-01 | Parameters: 6.1B - License: open | Type: model - AI model by EleutherAI,LAION - **ViT + DINO** (INRIA,Facebook AI Research) — 2021-04-29 | Parameters: 85M - License: open | Type: model - AI model by INRIA,Facebook AI Research - **SPALM + kNN** (DeepMind) — 2021-04-26 - License: closed | Type: model - AI model by DeepMind - **PanGu-α** (Huawei Noah's Ark Lab) — 2021-04-25 | Parameters: 207B - License: closed | Type: model - AI model by Huawei Noah's Ark Lab - **DiffQ Transformer (16L)** (Meta AI) — 2021-04-20 | Parameters: 247M - License: closed | Type: model - AI model by Meta AI - **PLUG** (Alibaba) — 2021-04-19 | Parameters: 27B - License: closed | Type: model - AI model by Alibaba - **DLRM-12T** (Meta AI,Carnegie Mellon University (CMU)) — 2021-04-12 | Parameters: 12T - License: closed | Type: model - AI model by Meta AI,Carnegie Mellon University 
(CMU) - **Megatron-LM (1T)** (Microsoft Research,NVIDIA,Stanford University) — 2021-04-09 | Parameters: 1T - License: closed | Type: model - AI model by Microsoft Research,NVIDIA,Stanford University - **Transformer-C** (University of Massachusetts Amherst) — 2021-04-08 | Parameters: 148M - License: closed | Type: model - AI model by University of Massachusetts Amherst - **TransfoRNN(d=1024)(2-layer) (PTB)** (Lenovo Research) — 2021-04-04 | Parameters: 97.6M - License: closed | Type: model - AI model by Lenovo Research - **GraphMS** (Dalian University of Technology,Dongbei University of Technology,Baidu,China National Health Development Research Center) — 2021-04-04 - License: closed | Type: model - AI model by Dalian University of Technology,Dongbei University of Technology,Baidu,China National Health Development Research Center - **T2R + Random Init** (University of Washington,Microsoft,DeepMind,Allen Institute for AI) — 2021-03-24 | Parameters: 450M - License: closed | Type: model - AI model by University of Washington,Microsoft,DeepMind,Allen Institute for AI - **T2R 75% + Pretrain (WT-103)** (University of Washington,Microsoft,DeepMind) — 2021-03-24 | Parameters: 668.9M - License: closed | Type: model - AI model by University of Washington,Microsoft,DeepMind - **T2R + Pretrain** (University of Washington,Microsoft,DeepMind) — 2021-03-24 | Parameters: 668.9M - License: closed | Type: model - AI model by University of Washington,Microsoft,DeepMind - **Unicorn** (Allen Institute for AI) — 2021-03-24 | Parameters: 11B - License: open | Type: model - AI model by Allen Institute for AI - **GPT-Neo-2.7B** (EleutherAI) — 2021-03-21 | Parameters: 2.7B - License: open | Type: model - AI model by EleutherAI - **GPT-Neo-2.7B (finetuned)** (EleutherAI) — 2021-03-21 | Parameters: 2.7B - License: closed | Type: model - AI model by EleutherAI - **GPT-Neo-2.7B (finetuned on PTB)** (EleutherAI) — 2021-03-21 | Parameters: 2.7B - License: closed | Type: model - AI model by 
EleutherAI - **GPT-Neo-125M** (EleutherAI) — 2021-03-21 | Parameters: 125M - License: open | Type: model - AI model by EleutherAI - **GPT-Neo-1.3B** (EleutherAI) — 2021-03-21 | Parameters: 1.3B - License: open | Type: model - AI model by EleutherAI - **GPT-Neo-125M (finetuned)** (EleutherAI) — 2021-03-21 | Parameters: 125M - License: closed | Type: model - AI model by EleutherAI - **GPT-Neo-1.3B (finetuned)** (EleutherAI) — 2021-03-21 | Parameters: 1.3B - License: closed | Type: model - AI model by EleutherAI - **U-Net GAN (FFHQ)** (Bosch Center for Artificial Intelligence,Max Planck Institute for Informatics) — 2021-03-19 - License: open | Type: model - AI model by Bosch Center for Artificial Intelligence,Max Planck Institute for Informatics - **GLM-10B** (Tsinghua University,Beijing Academy of Artificial Intelligence / BAAI,Massachusetts Institute of Technology (MIT),Shanghai Qi Zhi institute) — 2021-03-18 | Parameters: 10B - License: open | Type: model - AI model by Tsinghua University,Beijing Academy of Artificial Intelligence / BAAI,Massachusetts Institute of Technology (MIT),Shanghai Qi Zhi institute - **GLM-2B** (Tsinghua University,Beijing Academy of Artificial Intelligence / BAAI,Massachusetts Institute of Technology (MIT)) — 2021-03-18 | Parameters: 2B - License: open | Type: model - AI model by Tsinghua University,Beijing Academy of Artificial Intelligence / BAAI,Massachusetts Institute of Technology (MIT) - **GLM-10B-bidirectional** (Tsinghua University,Beijing Academy of Artificial Intelligence / BAAI,Massachusetts Institute of Technology (MIT)) — 2021-03-18 | Parameters: 10B - License: open | Type: model - AI model by Tsinghua University,Beijing Academy of Artificial Intelligence / BAAI,Massachusetts Institute of Technology (MIT) - **GLM-10B-unidirectional** (Tsinghua University,Beijing Academy of Artificial Intelligence / BAAI,Massachusetts Institute of Technology (MIT)) — 2021-03-18 | Parameters: 10B - License: open | Type: model - AI model by 
Tsinghua University,Beijing Academy of Artificial Intelligence / BAAI,Massachusetts Institute of Technology (MIT) - **Very Deep VAEs (ImageNet-64)** (OpenAI) — 2021-03-16 | Parameters: 125M - License: open | Type: model - AI model by OpenAI - **ResNet-RS** (Google Brain,University of California (UC) Berkeley) — 2021-03-13 | Parameters: 192M - License: open | Type: model - AI model by Google Brain,University of California (UC) Berkeley - **AraELECTRA** (American University of Beirut) — 2021-03-07 | Parameters: 136M - License: open | Type: model - AI model by American University of Beirut - **M6-T** (Alibaba) — 2021-03-05 | Parameters: 1.0T - License: closed | Type: model - AI model by Alibaba - **Generative BST** (Facebook AI Research) — 2021-03-05 | Parameters: 9.4B - License: open | Type: model - AI model by Facebook AI Research - **DCTransformer (ImageNet)** (DeepMind) — 2021-03-05 | Parameters: 736M - License: closed | Type: model - AI model by DeepMind - **ProteinGAN** (Vilnius University,Chalmers University of Technology) — 2021-03-04 | Parameters: 60M - License: closed | Type: model - AI model by Vilnius University,Chalmers University of Technology - **RFA-GATE-Gaussian-Stateful Big** (University of Washington,DeepMind,Allen Institute for AI,Hebrew University of Jerusalem,The University of Hong Kong) — 2021-03-03 | Parameters: 242M - License: closed | Type: model - AI model by University of Washington,DeepMind,Allen Institute for AI,Hebrew University of Jerusalem,The University of Hong Kong - **Wu Dao - Wen Hui** (Beijing Academy of Artificial Intelligence / BAAI) — 2021-03-01 | Parameters: 11.3B - License: closed | Type: model - AI model by Beijing Academy of Artificial Intelligence / BAAI - **Wu Dao - Wen Lan** (Beijing Academy of Artificial Intelligence / BAAI) — 2021-03-01 | Parameters: 1B - License: closed | Type: model - AI model by Beijing Academy of Artificial Intelligence / BAAI - **Wu Dao - Wen Su** (Beijing Academy of Artificial Intelligence / 
BAAI) — 2021-03-01 - License: open | Type: model - AI model by Beijing Academy of Artificial Intelligence / BAAI - **Meta Pseudo Labels** (Google Brain,Google AI) — 2021-03-01 | Parameters: 480M - License: closed | Type: model - AI model by Google Brain,Google AI - **SRU++ Large** (ASAPP) — 2021-02-24 | Parameters: 234M - License: open | Type: model - AI model by ASAPP - **SRU++ Base** (ASAPP) — 2021-02-24 | Parameters: 148M - License: closed | Type: model - AI model by ASAPP - **SRU++ Large only 2 attention layers (k=5) (WT103)** (ASAPP) — 2021-02-24 | Parameters: 225M - License: closed | Type: model - AI model by ASAPP - **Linear Transformer (large)** (IDSIA) — 2021-02-22 | Parameters: 90M - License: closed | Type: model - AI model by IDSIA - **Linear Transformer (small)** (IDSIA,SUPSI) — 2021-02-22 | Parameters: 40M - License: closed | Type: model - AI model by IDSIA,SUPSI - **MSA Transformer** (Facebook AI Research,University of California (UC) Berkeley,New York University (NYU)) — 2021-02-13 | Parameters: 100M - License: open | Type: model - AI model by Facebook AI Research,University of California (UC) Berkeley,New York University (NYU) - **top-down frozen classifier** (University of Edinburgh,Toshiba Cambridge Research Laboratory) — 2021-02-09 - License: closed | Type: model - AI model by University of Edinburgh,Toshiba Cambridge Research Laboratory - **DLWP** (University of Washington,Microsoft Research) — 2021-02-09 | Parameters: 2.7M - License: closed | Type: model - AI model by University of Washington,Microsoft Research - **CryoDRGN** (Massachusetts Institute of Technology (MIT)) — 2021-02-04 - License: closed | Type: model - AI model by Massachusetts Institute of Technology (MIT) - **ruGPT-3** (Huawei/Sberbank) — 2021-02-01 | Parameters: ruGPT-3 - License: open | Type: model - Russian GPT-3 with input from Huawei - **Selfish-RNN (SNT-ASGD) Stacked LSTMs** (Eindhoven University of Technology,University of Twente) — 2021-01-22 | Parameters: 25.2M - 
License: open | Type: model - AI model by Eindhoven University of Technology,University of Twente - **Selfish-RNN (ON-LSTM)** (Eindhoven University of Technology) — 2021-01-22 | Parameters: 25.2M - License: closed | Type: model - AI model by Eindhoven University of Technology - **Selfish-RNN (SNT-ASGD)RHNs** (Eindhoven University of Technology) — 2021-01-22 | Parameters: 7.6M - License: closed | Type: model - AI model by Eindhoven University of Technology - **Selfish-RNN (AWD-LSTM-MoS)** (Eindhoven University of Technology) — 2021-01-22 | Parameters: 15.6M - License: closed | Type: model - AI model by Eindhoven University of Technology - **DeiT-B** (Meta AI,Sorbonne University) — 2021-01-15 | Parameters: 86M - License: open | Type: model - AI model by Meta AI,Sorbonne University - **Switch** (Google) — 2021-01-11 | Parameters: 1.6T - License: open | Type: model - AI model by Google - **Wu Dao - Wen Yuan** (Beijing Academy of Artificial Intelligence / BAAI) — 2021-01-11 | Parameters: 2.6B - License: closed | Type: model - AI model by Beijing Academy of Artificial Intelligence / BAAI - **NVAE (CIFAR 10)** (NVIDIA) — 2021-01-08 - License: closed | Type: model - AI model by NVIDIA - **NVAE (FFHQ)** (NVIDIA) — 2021-01-08 - License: closed | Type: model - AI model by NVIDIA - **NVAE (Celeba HQ)** (NVIDIA) — 2021-01-08 - License: closed | Type: model - AI model by NVIDIA - **DALL-E** (OpenAI) — 2021-01-05 | Parameters: 12B - License: closed | Type: model - AI model by OpenAI - **CLIP (ViT L/14@336px)** (OpenAI) — 2021-01-05 | Parameters: 370M - License: open | Type: model - AI model by OpenAI - **CLIP (ResNet-50)** (OpenAI) — 2021-01-05 | Parameters: 88.6M - License: open | Type: model - AI model by OpenAI - **Transformer-XL + AutoDropout (PTB)** (Google Research) — 2021-01-05 | Parameters: 24M - License: closed | Type: model - AI model by Google Research - **Switch** (Google) — 2021-01-01 | Parameters: Switch - License: open | Type: model - - **Subformer (122M)** 
(National Institute of Advanced Industrial Science and Technology (AIST),University of Tokyo) — 2021-01-01 | Parameters: 122M - License: closed | Type: model - AI model by National Institute of Advanced Industrial Science and Technology (AIST),University of Tokyo - **Subformer (83M)** (University of Tokyo,National Institute of Advanced Industrial Science and Technology (AIST)) — 2021-01-01 | Parameters: 83M - License: closed | Type: model - AI model by University of Tokyo,National Institute of Advanced Industrial Science and Technology (AIST) - **Subformer (96M)** (University of Tokyo,National Institute of Advanced Industrial Science and Technology (AIST)) — 2021-01-01 | Parameters: 96M - License: closed | Type: model - AI model by University of Tokyo,National Institute of Advanced Industrial Science and Technology (AIST) - **AraGPT2-Mega** (American University of Beirut) — 2020-12-31 | Parameters: 1.5B - License: open | Type: model - AI model by American University of Beirut - **Shortformer** (University of Washington,Facebook AI Research,Allen Institute for AI) — 2020-12-31 | Parameters: 247M - License: closed | Type: model - AI model by University of Washington,Facebook AI Research,Allen Institute for AI - **ERNIE-Doc (247M)** (Baidu) — 2020-12-31 | Parameters: 247.0M - License: open | Type: model - AI model by Baidu - **ERNIE-Doc Base (151M, WT103)** (Baidu) — 2020-12-31 | Parameters: 151M - License: open | Type: model - AI model by Baidu - **CT-MoS (WT2)** (Google,National Tsing Hua University) — 2020-12-25 | Parameters: 45M - License: closed | Type: model - AI model by Google,National Tsing Hua University - **CT-MoS + DynamicEval (WT2)** (National Tsing Hua University,Google) — 2020-12-25 | Parameters: 45M - License: closed | Type: model - AI model by National Tsing Hua University,Google - **CT-MoS (PTB)** (National Tsing Hua University,Google) — 2020-12-25 | Parameters: 24M - License: closed | Type: model - AI model by National Tsing Hua University,Google - 
**CT-MoS + DynamicEval (PTB)** (National Tsing Hua University,Google) — 2020-12-25 | Parameters: 24M - License: closed | Type: model - AI model by National Tsing Hua University,Google - **DensePhrases** (Korea University,Princeton University) — 2020-12-23 - License: open | Type: model - AI model by Korea University,Princeton University - **RaSoR** (Korea University,Princeton University) — 2020-12-23 - License: closed | Type: model - AI model by Korea University,Princeton University - **VQGAN + CLIP** (Heidelberg University) — 2020-12-17 - License: open | Type: model - AI model by Heidelberg University - **ESM1b** (Facebook AI Research,New York University (NYU)) — 2020-12-15 | Parameters: 652.4M - License: open | Type: model - AI model by Facebook AI Research,New York University (NYU) - **RoBERTa (PFAM)** (IBM Research,ETH Zurich) — 2020-12-05 - License: open | Type: model - AI model by IBM Research,ETH Zurich - **OC-GAN (Visual Genome)** (Mila - Quebec AI (originally Montreal Institute for Learning Algorithms),University of Montreal / Université de Montréal,Microsoft Research,CIFAR AI Research) — 2020-12-03 - License: closed | Type: model - AI model by Mila - Quebec AI (originally Montreal Institute for Learning Algorithms),University of Montreal / Université de Montréal,Microsoft Research,CIFAR AI Research - **OC-GAN (COCO-Stuff)** (Mila - Quebec AI (originally Montreal Institute for Learning Algorithms),University of Montreal / Université de Montréal,Microsoft Research,CIFAR AI Research) — 2020-12-03 - License: closed | Type: model - AI model by Mila - Quebec AI (originally Montreal Institute for Learning Algorithms),University of Montreal / Université de Montréal,Microsoft Research,CIFAR AI Research - **CPM-Large** (Tsinghua University,Beijing Academy of Artificial Intelligence / BAAI) — 2020-12-01 | Parameters: 2.6B - License: open | Type: model - AI model by Tsinghua University,Beijing Academy of Artificial Intelligence / BAAI - **Profile Prediction** 
(University of Washington,Salesforce Research) — 2020-12-01 - License: closed | Type: model - AI model by University of Washington,Salesforce Research - **AlphaFold 2** (DeepMind) — 2020-11-30 | Parameters: 93M - License: open | Type: model - AI model by DeepMind - **KEPLER** (Tsinghua University,Mila - Quebec AI (originally Montreal Institute for Learning Algorithms),HEC,CIFAR AI Research,Princeton University,University of Montreal / Université de Montréal) — 2020-11-23 | Parameters: 125M - License: closed | Type: model - AI model by Tsinghua University,Mila - Quebec AI (originally Montreal Institute for Learning Algorithms),HEC,CIFAR AI Research,Princeton University,University of Montreal / Université de Montréal - **AWD-FWM (WT2)** (IDSIA,Microsoft Research) — 2020-11-16 | Parameters: 37M - License: closed | Type: model - AI model by IDSIA,Microsoft Research - **AWD-FWM (PTB)** (IDSIA,Microsoft Research) — 2020-11-16 | Parameters: 24M - License: closed | Type: model - AI model by IDSIA,Microsoft Research - **Machine learning a model for RNA structure prediction** (International School for Advanced Studies,Institute of Structural Biology,Technical University of Munich) — 2020-11-16 - License: closed | Type: model - AI model by International School for Advanced Studies,Institute of Structural Biology,Technical University of Munich - **CPCProt** (University of Toronto) — 2020-11-10 | Parameters: 1.7M - License: closed | Type: model - AI model by University of Toronto - **HiPPO-LegS** (Stanford University,University at Buffalo) — 2020-10-23 - License: open | Type: model - AI model by Stanford University,University at Buffalo - **ChemBERTa** (University of Toronto,Reverie Labs,DeepChem) — 2020-10-23 | Parameters: 125M - License: open | Type: model - AI model by University of Toronto,Reverie Labs,DeepChem - **ViT-Huge/14** (Google Brain,Google Research) — 2020-10-22 | Parameters: 632M - License: open | Type: model - AI model by Google Brain,Google Research - 
**wav2vec 2.0 LARGE** (Facebook) — 2020-10-22 | Parameters: 317M - License: open | Type: model - AI model by Facebook - **CryptoGRU** (Indiana University Bloomington) — 2020-10-22 - License: closed | Type: model - AI model by Indiana University Bloomington - **GBERT-Large** (deepset,Bayerische Staatsbibliothek Muenchen) — 2020-10-21 | Parameters: 335M - License: open | Type: model - AI model by deepset,Bayerische Staatsbibliothek Muenchen - **German ELECTRA Large** (deepset,Bayerische Staatsbibliothek Muenchen) — 2020-10-21 | Parameters: 335M - License: open | Type: model - AI model by deepset,Bayerische Staatsbibliothek Muenchen - **Conformer + Wav2vec 2.0 + Noisy Student** (Google,Google Research,Google Brain) — 2020-10-20 | Parameters: 1B - License: closed | Type: model - AI model by Google,Google Research,Google Brain - **mT5-XXL** (Google,Google Research) — 2020-10-20 | Parameters: 13B - License: open | Type: model - AI model by Google,Google Research - **TinyBERT** (Huazhong University of Science and Technology,Huawei Noah's Ark Lab,Huawei) — 2020-10-16 | Parameters: 67M - License: open | Type: model - AI model by Huazhong University of Science and Technology,Huawei Noah's Ark Lab,Huawei - **Memformer (4 encoder + 16 decoder)** (UC Davis,Westlake University,Facebook AI) — 2020-10-14 | Parameters: 76.2M - License: closed | Type: model - AI model by UC Davis,Westlake University,Facebook AI - **LUKE** (University of Washington,National Institute of Informatics) — 2020-10-02 | Parameters: 483M - License: open | Type: model - AI model by University of Washington,National Institute of Informatics - **Frage-AWD-LSTM-MemoryAug-NeuralCache (PTB)** (Johns Hopkins University,Xiaomi Corp) — 2020-09-29 | Parameters: 24M - License: closed | Type: model - AI model by Johns Hopkins University,Xiaomi Corp - **LSTM-MemoryAug (WT2)** (Johns Hopkins University,Xiaomi Corp) — 2020-09-29 | Parameters: 28.5M - License: closed | Type: model - AI model by Johns Hopkins 
University,Xiaomi Corp - **LSTM-MemoryAug (PTB)** (Johns Hopkins University,Xiaomi Corp) — 2020-09-29 | Parameters: 13.3M - License: closed | Type: model - AI model by Johns Hopkins University,Xiaomi Corp - **PAR Transformer Large** (NVIDIA) — 2020-09-09 - License: closed | Type: model - AI model by NVIDIA - **ProBERTa** (University of Illinois Urbana-Champaign (UIUC),Reed College) — 2020-09-01 | Parameters: 44M - License: open | Type: model - AI model by University of Illinois Urbana-Champaign (UIUC),Reed College - **ESM1-670M (UR50/S)** (Facebook AI Research,New York University (NYU)) — 2020-08-31 | Parameters: 669.2M - License: open | Type: model - AI model by Facebook AI Research,New York University (NYU) - **ESM1-670M (UR50/D)** (Facebook AI Research,New York University (NYU)) — 2020-08-31 | Parameters: 669.2M - License: open | Type: model - AI model by Facebook AI Research,New York University (NYU) - **ESM1-670M (UR100)** (Facebook AI Research,New York University (NYU)) — 2020-08-31 | Parameters: 669.2M - License: open | Type: model - AI model by Facebook AI Research,New York University (NYU) - **ESM1-85M** (Facebook AI Research,New York University (NYU)) — 2020-08-31 | Parameters: 85.1M - License: open | Type: model - AI model by Facebook AI Research,New York University (NYU) - **ESM1-43M** (Facebook AI Research,New York University (NYU)) — 2020-08-31 | Parameters: 42.6M - License: open | Type: model - AI model by Facebook AI Research,New York University (NYU) - **Transformer+Recurrent Windows of Context** (Toyota Technological Institute at Chicago,University of Chicago) — 2020-08-16 | Parameters: 124M - License: closed | Type: model - AI model by Toyota Technological Institute at Chicago,University of Chicago - **ERNIE-GEN (large)** (Baidu) — 2020-08-06 | Parameters: 340M - License: open | Type: model - AI model by Baidu - **DeLighT** (University of Washington,Allen Institute for AI,Facebook AI Research) — 2020-08-03 | Parameters: 99M - License: closed | 
Type: model - AI model by University of Washington,Allen Institute for AI,Facebook AI Research - **mBART-50** (Facebook AI) — 2020-08-02 | Parameters: 610M - License: open | Type: model - AI model by Facebook AI - **Grown to Prune Two-layer stacked LSTM** (University of Chicago,Toyota Technological Institute at Chicago) — 2020-07-30 - License: closed | Type: model - AI model by University of Chicago,Toyota Technological Institute at Chicago - **TransformerXL-LayerFusion-CA** (University of Liverpool,University of Southern California) — 2020-07-29 - License: closed | Type: model - AI model by University of Liverpool,University of Southern California - **GPT2-LayerFusion-WS** (University of Liverpool,University of Southern California) — 2020-07-29 - License: closed | Type: model - AI model by University of Liverpool,University of Southern California - **Hopfield Networks (2020)** (Johannes Kepler University Linz,Institute of Advanced Research in Artificial Intelligence,University of Oslo) — 2020-07-16 - License: open | Type: model - AI model by Johannes Kepler University Linz,Institute of Advanced Research in Artificial Intelligence,University of Oslo - **SemExp** (Carnegie Mellon University (CMU),Facebook AI Research) — 2020-07-02 - License: open | Type: model - AI model by Carnegie Mellon University (CMU),Facebook AI Research - **DLRM-2021** (Meta AI) — 2020-07-01 | Parameters: 1T - License: closed | Type: model - AI model by Meta AI - **GShard (dense)** (Google) — 2020-06-30 | Parameters: 2.3B - License: closed | Type: model - AI model by Google - **GShard (600B)** (Google) — 2020-06-30 | Parameters: 600B - License: closed | Type: model - AI model by Google - **GPT-3 6.7B** (OpenAI) — 2020-06-22 | Parameters: 6.7B - License: closed | Type: model - AI model by OpenAI - **GPT-3 XL** (OpenAI) — 2020-06-22 | Parameters: 1.3B - License: closed | Type: model - AI model by OpenAI - **GPT-3 Small** (OpenAI) — 2020-06-22 | Parameters: 125M - License: closed | Type: model - 
AI model by OpenAI - **GPT-3 Medium** (OpenAI) — 2020-06-22 | Parameters: 356M - License: closed | Type: model - AI model by OpenAI - **GPT-3 Large** (OpenAI) — 2020-06-22 | Parameters: 760M - License: closed | Type: model - AI model by OpenAI - **GPT-3 2.7B** (OpenAI) — 2020-06-22 | Parameters: 2.6B - License: closed | Type: model - AI model by OpenAI - **GPT-3 13B** (OpenAI) — 2020-06-22 | Parameters: 12.8B - License: closed | Type: model - AI model by OpenAI - **iGPT-XL** (OpenAI) — 2020-06-17 | Parameters: 6.8B - License: open | Type: model - AI model by OpenAI - **iGPT-L** (OpenAI) — 2020-06-17 | Parameters: 1.4B - License: open | Type: model - AI model by OpenAI - **6-Layer-Tensor-Transformer+AdaHessian** (NERSC, Lawrence Berkeley National Laboratory,University of California (UC) Berkeley) — 2020-06-01 | Parameters: 85.5M - License: closed | Type: model - AI model by NERSC, Lawrence Berkeley National Laboratory,University of California (UC) Berkeley - **3-Layer-Tensor-Transformer+AdaHessian** (University of California (UC) Berkeley,NERSC, Lawrence Berkeley National Laboratory) — 2020-06-01 - License: closed | Type: model - AI model by University of California (UC) Berkeley,NERSC, Lawrence Berkeley National Laboratory - **GPT-3 175B (davinci)** (OpenAI) — 2020-05-28 | Parameters: 174.6B - License: closed | Type: model - AI model by OpenAI - **GPT3-6.7B (rerun of original)** (Microsoft,OpenAI) — 2020-05-28 | Parameters: 6.7B - License: closed | Type: model - AI model by Microsoft,OpenAI - **DETR** (Facebook) — 2020-05-26 | Parameters: 60M - License: open | Type: model - AI model by Facebook - **Retrieval-Augmented Generator** (Facebook,New York University (NYU),University College London (UCL)) — 2020-05-22 | Parameters: 626M - License: open | Type: model - AI model by Facebook,New York University (NYU),University College London (UCL) - **rTop-k(distributed setting)** (Stanford University) — 2020-05-21 | Parameters: 69M - License: closed | Type: model - AI model 
by Stanford University - **Specter** (Allen Institute for AI,University of Washington) — 2020-05-20 | Parameters: 110M - License: open | Type: model - AI model by Allen Institute for AI,University of Washington - **Conformer** (Google) — 2020-05-16 | Parameters: 118.8M - License: closed | Type: model - AI model by Google - **ONLSTM-SYD** (Westlake University,Institute for Advanced Study,McGill University,Mila - Quebec AI (originally Montreal Institute for Learning Algorithms),CIFAR AI Research,University of Montreal / Université de Montréal) — 2020-05-12 | Parameters: 25M - License: closed | Type: model - AI model by Westlake University,Institute for Advanced Study,McGill University,Mila - Quebec AI (originally Montreal Institute for Learning Algorithms),CIFAR AI Research,University of Montreal / Université de Montréal - **ContextNet** (Google) — 2020-05-07 | Parameters: 112.7M - License: closed | Type: model - AI model by Google - **NAS+ESS (156M)** (Northeastern University (China),Chinese Academy of Sciences,NiuTrans Research,Kingsoft) — 2020-05-06 | Parameters: 156M - License: closed | Type: model - AI model by Northeastern University (China),Chinese Academy of Sciences,NiuTrans Research,Kingsoft - **NAS+ESS (23M)** (Northeastern University (China),NiuTrans Research,Kingsoft) — 2020-05-06 | Parameters: 23M - License: closed | Type: model - AI model by Northeastern University (China),NiuTrans Research,Kingsoft - **UnifiedQA** (Allen Institute for AI,University of Washington) — 2020-05-02 | Parameters: 11B - License: open | Type: model - AI model by Allen Institute for AI,University of Washington - **GPT-3** (OpenAI) — 2020-05-01 | Parameters: GPT-3 - License: open | Type: model - No RLHF (base only). Popular: 3.1M wpm. 
Dataset: https://lifearchitect.ai/whats-in-my-ai/ - **Segatron XL large, M=384** (University of Waterloo,Peking University,RSVP.ai) — 2020-04-30 | Parameters: 257.0M - License: open | Type: model - AI model by University of Waterloo,Peking University,RSVP.ai - **Segatron XL base, M=384** (University of Waterloo,RSVP.ai,Peking University) — 2020-04-30 | Parameters: 257M - License: open | Type: model - AI model by University of Waterloo,RSVP.ai,Peking University - **Once for All** (MIT-IBM Watson AI Lab,Massachusetts Institute of Technology (MIT),IBM) — 2020-04-29 | Parameters: 7.7M - License: open | Type: model - AI model by MIT-IBM Watson AI Lab,Massachusetts Institute of Technology (MIT),IBM - **Go-explore** (Uber AI,OpenAI) — 2020-04-27 - License: closed | Type: model - AI model by Uber AI,OpenAI - **Cube-Space AutoEncoder** (MIT-IBM Watson AI Lab) — 2020-04-27 - License: closed | Type: model - AI model by MIT-IBM Watson AI Lab - **DiffStk-MRNN** (Pennsylvania State University,Rochester Institute of Technology) — 2020-04-04 | Parameters: 1.0M - License: closed | Type: model - AI model by Pennsylvania State University,Rochester Institute of Technology - **Megatron-11B** (Meta AI) — 2020-04-01 | Parameters: Megatron-11B - License: open | Type: model - My favourite model until GPT-3 and GPT-4 came along: https://github.com/facebookresearch/fairseq/blob/main/examples/megatron_11b/README.md - **Agent57** (DeepMind) — 2020-03-30 - License: closed | Type: model - AI model by DeepMind - **AraBERT** (American University of Beirut) — 2020-03-30 | Parameters: 110M - License: open | Type: model - AI model by American University of Beirut - **AraBERT Large v2** (American University of Beirut) — 2020-03-30 | Parameters: 371M - License: open | Type: model - AI model by American University of Beirut - **MetNet** (Google) — 2020-03-24 | Parameters: 225M - License: closed | Type: model - AI model by Google - **ELECTRA** (Stanford University,Google,Google Brain) — 2020-03-23 | 
Parameters: 335M - License: open | Type: model - AI model by Stanford University,Google,Google Brain - **Tensor-Transformer(1core)+PN (WT103)** (University of California (UC) Berkeley) — 2020-03-17 | Parameters: 85.3M - License: open | Type: model - AI model by University of California (UC) Berkeley - **Tensor-Transformer(1core)+PN (PTB)** (University of California (UC) Berkeley) — 2020-03-17 - License: closed | Type: model - AI model by University of California (UC) Berkeley - **WDC20 / DLWP** (University of Washington,Microsoft Research) — 2020-03-15 | Parameters: 672.1K - License: closed | Type: model - AI model by University of Washington,Microsoft Research - **ProGen** (Salesforce Research,Stanford University) — 2020-03-13 | Parameters: 1.2B - License: closed | Type: model - AI model by Salesforce Research,Stanford University - **Routing Transformer (WT-103)** (Google Research) — 2020-03-12 | Parameters: 79.5M - License: open | Type: model - AI model by Google Research - **Local Transformer (WT103)** (Google Research) — 2020-03-12 - License: closed | Type: model - AI model by Google Research - **TransformerXL + spectrum control** (University of California Los Angeles (UCLA),JD.com) — 2020-03-11 | Parameters: 151M - License: closed | Type: model - AI model by University of California Los Angeles (UCLA),JD.com - **MuPIPR** (University of California Los Angeles (UCLA),University of Pennsylvania) — 2020-03-05 - License: closed | Type: model - AI model by University of California Los Angeles (UCLA),University of Pennsylvania - **LSTM-3-layer+Gadam** (University of Oxford,University of Bristol,University of Cambridge) — 2020-03-02 | Parameters: 24M - License: closed | Type: model - AI model by University of Oxford,University of Bristol,University of Cambridge - **Transformer++** (American Express) — 2020-03-01 | Parameters: Transformer++ - License: closed | Type: model - Not to be confused with the more common usage of Transformer++, the ~2023 Transformer++ based on 
Llama. See Mamba paper. - **TCAN (WT2)** (Nanjing University,Ant Group) — 2020-02-28 | Parameters: 33M - License: closed | Type: model - AI model by Nanjing University,Ant Group - **TCAN (PTB)** (Ant Group) — 2020-02-28 | Parameters: 13M - License: closed | Type: model - AI model by Ant Group - **Feedback Transformer** (LORIA,University of Lorraine,Facebook AI Research) — 2020-02-21 | Parameters: 126M - License: closed | Type: model - AI model by LORIA,University of Lorraine,Facebook AI Research - **FFN SwiGLU** (Google) — 2020-02-14 | Parameters: 220M - License: closed | Type: model - AI model by Google - **Turing-NLG** (Microsoft) — 2020-02-13 | Parameters: 17B - License: closed | Type: model - AI model by Microsoft - **ALBERT-xxlarge** (Toyota Technological Institute at Chicago,Google) — 2020-02-09 | Parameters: 235M - License: open | Type: model - AI model by Toyota Technological Institute at Chicago,Google - **Perceiver IO (optical flow)** (DeepMind) — 2020-02-08 | Parameters: 27.9M - License: closed | Type: model - AI model by DeepMind - **TaLK Convolution** (Carleton University) — 2020-02-08 | Parameters: 240M - License: closed | Type: model - AI model by Carleton University - **Theseus 6/768** (University of California San Diego,Beihang University,Microsoft) — 2020-02-07 | Parameters: 66M - License: open | Type: model - AI model by University of California San Diego,Beihang University,Microsoft - **Meena** (Google Brain) — 2020-01-28 | Parameters: 2.6B - License: closed | Type: model - AI model by Google Brain - **ContextNet + Noisy Student** (Google) — 2020-01-19 - License: closed | Type: model - AI model by Google - **AlphaFold** (DeepMind) — 2020-01-15 | Parameters: 16.3M - License: closed | Type: model - AI model by DeepMind - **Meena** (Google) — 2020-01-01 | Parameters: Meena - License: closed | Type: model - Dialogue model. Trained 61B tokens for 164x epochs to 10T tokens! 
- **Big Transfer (BiT-M)** (Google Brain) — 2019-12-24 | Parameters: 928M - License: closed | Type: model - AI model by Google Brain - **DD-PPO** (Georgia Institute of Technology,Facebook AI Research,Oregon State University,Simon Fraser University) — 2019-12-19 - License: closed | Type: model - AI model by Georgia Institute of Technology,Facebook AI Research,Oregon State University,Simon Fraser University - **SeqVec** (Technical University of Munich) — 2019-12-17 | Parameters: 93M - License: open | Type: model - AI model by Technical University of Munich - **OpenAI Five** (OpenAI) — 2019-12-13 | Parameters: 159M - License: closed | Type: model - AI model by OpenAI - **OpenAI Five Rerun** (OpenAI) — 2019-12-13 | Parameters: 159M - License: closed | Type: model - AI model by OpenAI - **MMLSTM (WT-103)** (Beijing University of Posts and Telecommunications,University of West London) — 2019-12-05 | Parameters: 75M - License: closed | Type: model - AI model by Beijing University of Posts and Telecommunications,University of West London - **MMLSTM (WT-2)** (Beijing University of Posts and Telecommunications,University of West London) — 2019-12-05 | Parameters: 32.3M - License: closed | Type: model - AI model by Beijing University of Posts and Telecommunications,University of West London - **MMLSTM (PTB)** (Beijing University of Posts and Telecommunications,University of West London) — 2019-12-05 | Parameters: 21.3M - License: closed | Type: model - AI model by Beijing University of Posts and Telecommunications,University of West London - **StarGAN v2** (NAVER,Yonsei University,Swiss Federal Institute of Technology) — 2019-12-04 - License: open | Type: model - AI model by NAVER,Yonsei University,Swiss Federal Institute of Technology - **StyleGAN2** (NVIDIA,Aalto University) — 2019-12-03 | Parameters: 30M - License: open | Type: model - AI model by NVIDIA,Aalto University - **bRSM + cache** (Numenta,Incubator 491) — 2019-12-02 | Parameters: 2.5M - License: closed | Type: 
model - AI model by Numenta,Incubator 491 - **Transformer-XL DeFINE (141M)** (University of Washington,Allen Institute for AI) — 2019-11-27 | Parameters: 141M - License: closed | Type: model - AI model by University of Washington,Allen Institute for AI - **Transformer-XL DeFINE (107M)** (University of Washington) — 2019-11-27 | Parameters: 107.4M - License: closed | Type: model - AI model by University of Washington - **Adaptive LSTM + DeFINE** (University of Washington) — 2019-11-27 | Parameters: 48.7M - License: closed | Type: model - AI model by University of Washington - **AWD-LSTM + DeFINE** (University of Washington) — 2019-11-27 | Parameters: 20M - License: closed | Type: model - AI model by University of Washington - **Photo-Geometric Autoencoder** (University of Oxford) — 2019-11-25 - License: open | Type: model - AI model by University of Oxford - **FastSpeech** (Zhejiang University (ZJU),Microsoft Research) — 2019-11-20 | Parameters: 30.1M - License: closed | Type: model - AI model by Zhejiang University (ZJU),Microsoft Research - **MuZero** (DeepMind) — 2019-11-19 | Parameters: 36.9M - License: closed | Type: model - AI model by DeepMind - **Transformer - LibriVox + Decoding/Rescoring** (Facebook) — 2019-11-19 | Parameters: 296M - License: open | Type: model - AI model by Facebook - **Long-range sequence Compressive Transformers** (DeepMind) — 2019-11-13 - License: closed | Type: model - AI model by DeepMind - **Noisy Student (L2)** (Carnegie Mellon University (CMU),Google) — 2019-11-11 | Parameters: 480M - License: closed | Type: model - AI model by Carnegie Mellon University (CMU),Google - **Sandwich Transformer** (Allen Institute for AI,Facebook AI Research) — 2019-11-10 | Parameters: 209M - License: closed | Type: model - AI model by Allen Institute for AI,Facebook AI Research - **CamemBERT** (Facebook,INRIA,Sorbonne University) — 2019-11-10 | Parameters: 335M - License: open | Type: model - AI model by Facebook,INRIA,Sorbonne University - 
**Self-Attention and Convolutional Layers** (Ecole Polytechnique Fédérale de Lausanne (EPFL)) — 2019-11-08 | Parameters: 29.5M - License: closed | Type: model - AI model by Ecole Polytechnique Fédérale de Lausanne (EPFL) - **XLM-RoBERTa** (Facebook AI) — 2019-11-05 | Parameters: 550M - License: open | Type: model - AI model by Facebook AI - **Base LM + kNN LM + Continuous Cache** (Stanford University,Facebook AI Research) — 2019-11-01 | Parameters: 247.0M - License: closed | Type: model - AI model by Stanford University,Facebook AI Research - **AlphaStar** (DeepMind) — 2019-10-30 | Parameters: 139M - License: closed | Type: model - AI model by DeepMind - **BART-large** (Facebook AI) — 2019-10-29 | Parameters: 406.3M - License: open | Type: model - AI model by Facebook AI - **T5-3B** (Google) — 2019-10-23 | Parameters: 2.8B - License: open | Type: model - AI model by Google - **T5-11B** (Google) — 2019-10-23 | Parameters: 11B - License: open | Type: model - AI model by Google - **LSTM(large)+Sememe+cell** (Tsinghua University,Beijing University of Posts and Telecommunications,Huawei Noah's Ark Lab) — 2019-10-20 | Parameters: 48M - License: closed | Type: model - AI model by Tsinghua University,Beijing University of Posts and Telecommunications,Huawei Noah's Ark Lab - **LSTM(medium)+Sememe+cell (WT2)** (Tsinghua University,Beijing University of Posts and Telecommunications,Huawei Noah's Ark Lab) — 2019-10-20 | Parameters: 24M - License: closed | Type: model - AI model by Tsinghua University,Beijing University of Posts and Telecommunications,Huawei Noah's Ark Lab - **RMSNorm (Transformer-base)** (University of Edinburgh,University of Zurich) — 2019-10-16 | Parameters: 65M - License: closed | Type: model - AI model by University of Edinburgh,University of Zurich - **Rubik's cube ADR robot** (OpenAI) — 2019-10-15 | Parameters: 27.8M - License: closed | Type: model - AI model by OpenAI - **M4-50B** (Google) — 2019-10-11 | Parameters: 50B - License: closed | Type: 
model - AI model by Google - **AlphaX-1** (Facebook AI Research,Brown University) — 2019-10-02 | Parameters: 5.4M - License: closed | Type: model - AI model by Facebook AI Research,Brown University - **DistilBERT** (Hugging Face) — 2019-10-02 | Parameters: 66M - License: open | Type: model - AI model by Hugging Face - **T5** (Google) — 2019-10-01 | Parameters: T5 - License: open | Type: model - "Text-to-Text Transfer Transformer". C4 + NLP language problems. "compared the following three configurations: First, the standard baseline model, which was pre-trained on 2^35 ≈ 34B tokens; second, the baseline trained instead for about 1 trillion tokens (i.e. the same amount of pre-training used for T5), which we refer to as “baseline-1T”; and third, T5-Base." - **ALBERT** (Toyota Technological Institute at Chicago,Google Research) — 2019-09-26 | Parameters: 18M - License: open | Type: model - AI model by Toyota Technological Institute at Chicago,Google Research - **Adaptive Inputs + LayerDrop** (Facebook AI Research,LORIA) — 2019-09-25 | Parameters: 423.0M - License: open | Type: model - AI model by Facebook AI Research,LORIA - **Alleviated TOI 10 (WT103)** (Ecole Polytechnique Fédérale de Lausanne (EPFL),Swisscom,University of Freiburg) — 2019-09-18 - License: closed | Type: model - AI model by Ecole Polytechnique Fédérale de Lausanne (EPFL),Swisscom,University of Freiburg - **Alleviated TOI 10 (PTB)** (Ecole Polytechnique Fédérale de Lausanne (EPFL),Swisscom,University of Freiburg) — 2019-09-18 - License: closed | Type: model - AI model by Ecole Polytechnique Fédérale de Lausanne (EPFL),Swisscom,University of Freiburg - **Alleviated TOI 10 (WT2)** (Ecole Polytechnique Fédérale de Lausanne (EPFL),Swisscom,University of Freiburg) — 2019-09-18 - License: closed | Type: model - AI model by Ecole Polytechnique Fédérale de Lausanne (EPFL),Swisscom,University of Freiburg - **Megatron-BERT** (NVIDIA) — 2019-09-17 | Parameters: 3.9B - License: closed | Type: model - 
AI model by NVIDIA - **Megatron-LM (8.3B)** (NVIDIA) — 2019-09-17 | Parameters: 8.3B - License: closed | Type: model - AI model by NVIDIA - **Hide and Seek** (OpenAI) — 2019-09-17 | Parameters: 1.6M - License: closed | Type: model - AI model by OpenAI - **Megatron-LM (2.5B)** (NVIDIA) — 2019-09-17 | Parameters: 2.5B - License: closed | Type: model - AI model by NVIDIA - **Megatron-LM (355M)** (NVIDIA) — 2019-09-17 | Parameters: 355M - License: open | Type: model - AI model by NVIDIA - **Megatron-LM (1.2B)** (NVIDIA) — 2019-09-17 | Parameters: 1.2B - License: closed | Type: model - AI model by NVIDIA - **Xiaoice** (Microsoft Research Asia) — 2019-09-14 - License: closed | Type: model - AI model by Microsoft Research Asia - **ResNet-152 + ObjectNet** (Massachusetts Institute of Technology (MIT)) — 2019-09-06 | Parameters: 38M - License: closed | Type: model - AI model by Massachusetts Institute of Technology (MIT) - **Mogrifier (d2, MoS2, MC) + dynamic eval** (DeepMind,University of Oxford) — 2019-09-04 | Parameters: 35M - License: closed | Type: model - AI model by DeepMind,University of Oxford - **Mogrifier (d2, MC) + dynamic eval** (DeepMind) — 2019-09-04 | Parameters: 24M - License: closed | Type: model - AI model by DeepMind - **UDSMProt** (Fraunhofer Heinrich Hertz Institute) — 2019-09-04 | Parameters: 28.3M - License: open | Type: model - AI model by Fraunhofer Heinrich Hertz Institute - **DEQ-Transformer (Medium, Adaptive Embedding)** (Carnegie Mellon University (CMU),Intel Labs) — 2019-09-03 | Parameters: 110.0M - License: open | Type: model - AI model by Carnegie Mellon University (CMU),Intel Labs - **DEQ-TrellisNet (PTB)** (Carnegie Mellon University (CMU),Intel Labs) — 2019-09-03 | Parameters: 24M - License: closed | Type: model - AI model by Carnegie Mellon University (CMU),Intel Labs - **DEQ-TrellisNet (WT-103)** (Carnegie Mellon University (CMU),Intel Labs) — 2019-09-03 | Parameters: 180M - License: closed | Type: model - AI model by Carnegie Mellon 
University (CMU),Intel Labs - **Megatron-LM** (NVIDIA) — 2019-09-01 | Parameters: Megatron-LM - License: open | Type: model - - **AWD-LSTM+Behaviorial-Gating** (University of Southern California) — 2019-08-31 | Parameters: 27M - License: closed | Type: model - AI model by University of Southern California - **LSTM-Medium+Behaviorial-Gating (PTB)** (University of Southern California) — 2019-08-31 | Parameters: 20M - License: closed | Type: model - AI model by University of Southern California - **trRosetta** (Nankai University,University of Washington,Tianjin University,Harvard University) — 2019-08-22 - License: open | Type: model - AI model by Nankai University,University of Washington,Tianjin University,Harvard University - **TripletRes** (University of Michigan) — 2019-08-13 - License: closed | Type: model - AI model by University of Michigan - **RNN + char4-MS-vec** (NTT Communication Science Laboratories,Tohoku University) — 2019-07-17 | Parameters: 226.0M - License: closed | Type: model - AI model by NTT Communication Science Laboratories,Tohoku University - **Graph-based Semi-Supervised Learning (GSSL) Model on MNIST** (West Virginia University) — 2019-07-17 - License: closed | Type: model - AI model by West Virginia University - **RNN + char3-MS-vec** (NTT Communication Science Laboratories,Tohoku University) — 2019-07-16 | Parameters: 175M - License: closed | Type: model - AI model by NTT Communication Science Laboratories,Tohoku University - **RNN Baseline** (Massachusetts Institute of Technology (MIT),Rey Juan Carlos University) — 2019-07-14 - License: closed | Type: model - AI model by Massachusetts Institute of Technology (MIT),Rey Juan Carlos University - **R-Transformer** (Michigan State University,TAL Education Group (Xueersi)) — 2019-07-12 | Parameters: 15.8M - License: closed | Type: model - AI model by Michigan State University,TAL Education Group (Xueersi) - **Pluribus** (Facebook AI Research) — 2019-07-11 - License: closed | Type: model - AI 
model by Facebook AI Research - **All-attention network + adaptive span** (Facebook AI Research) — 2019-07-02 | Parameters: 133M - License: closed | Type: model - AI model by Facebook AI Research - **RoBERTa** (Meta AI) — 2019-07-01 | Parameters: RoBERTa - License: open | Type: model - calcs: "In total, this batch size and number of steps corresponds to pre-training on 2^35 ≈ 34B tokens. This is considerably less than BERT (Devlin et al., 2018), which used roughly 137B tokens, or RoBERTa (Liu et al., 2019c), which used roughly 2.2T tokens. Using only 2^35 tokens results in a reasonable computational budget while still providing a sufficient amount of pre-training for acceptable performance. We consider the effect of pre-training for more steps in Sections 3.6 and 3.7. Note that 2^35 tokens only covers a fraction of the entire C4 data set, so we never repeat any data during pre-training." https://arxiv.org/pdf/1910.10683.pdf MMLU shows RoBERTa-base 125M only=27.9 (not 355M) - **RoBERTa Large** (Facebook,University of Washington) — 2019-07-01 | Parameters: 355M - License: open | Type: model - AI model by Facebook,University of Washington - **RoBERTa Base** (Facebook,University of Washington) — 2019-07-01 | Parameters: 125M - License: open | Type: model - AI model by Facebook,University of Washington - **Tensorized Transformer (257M)** (Tianjin University,Microsoft Research Asia,Beijing Institute of Technology) — 2019-06-24 | Parameters: 257.0M - License: closed | Type: model - AI model by Tianjin University,Microsoft Research Asia,Beijing Institute of Technology - **Tensorized Transformer (OBW)** (Tianjin University,Microsoft Research Asia,Beijing Institute of Technology) — 2019-06-24 | Parameters: 160M - License: closed | Type: model - AI model by Tianjin University,Microsoft Research Asia,Beijing Institute of Technology - **Tensorized Transformer (PTB)** (Tianjin University,Microsoft Research Asia,Beijing Institute of Technology) — 2019-06-24 | Parameters: 12M - 
License: closed | Type: model - AI model by Tianjin University,Microsoft Research Asia,Beijing Institute of Technology - **Tensorized Transformer (large PTB)** (Tianjin University,Microsoft Research Asia,Beijing Institute of Technology) — 2019-06-24 - License: closed | Type: model - AI model by Tianjin University,Microsoft Research Asia,Beijing Institute of Technology - **Tensorized Transformer (W103)** (Tianjin University,Microsoft Research Asia,Beijing Institute of Technology) — 2019-06-24 | Parameters: 85.3M - License: closed | Type: model - AI model by Tianjin University,Microsoft Research Asia,Beijing Institute of Technology - **Walking Minotaur robot** (University of California (UC) Berkeley,Google Brain) — 2019-06-19 - License: closed | Type: model - AI model by University of California (UC) Berkeley,Google Brain - **TAPE Transformer** (University of California (UC) Berkeley,Covariant,Google,Chan Zuckerberg Initiative) — 2019-06-19 | Parameters: 38M - License: open | Type: model - AI model by University of California (UC) Berkeley,Covariant,Google,Chan Zuckerberg Initiative - **LaNet-L (CIFAR-10)** (Brown University,Facebook) — 2019-06-17 | Parameters: 44.1M - License: open | Type: model - AI model by Brown University,Facebook - **PG-SWGAN** (ETH Zurich) — 2019-06-15 - License: closed | Type: model - AI model by ETH Zurich - **SAGAN** (Rutgers University,Google Research) — 2019-06-14 - License: closed | Type: model - AI model by Rutgers University,Google Research - **Char-CNN-BiLSTM** (Capital One) — 2019-06-13 - License: closed | Type: model - AI model by Capital One - **RNN + char2-MS-vec** (NTT Communication Science Laboratories,Tohoku University) — 2019-06-13 | Parameters: 158M - License: closed | Type: model - AI model by NTT Communication Science Laboratories,Tohoku University - **AWD-LSTM + MoS + Partial Shuffled** (University of Texas at Austin) — 2019-06-10 | Parameters: 35M - License: open | Type: model - AI model by University of Texas at Austin - 
**AdvSoft + 4 layer QRNN + dynamic evaluation (WT103)** (University of Texas at Austin) — 2019-06-10 - License: closed | Type: model - AI model by University of Texas at Austin - **Adversarial + AWD-LSTM-MoS + partial shuffled** (University of Texas at Austin) — 2019-06-10 - License: closed | Type: model - AI model by University of Texas at Austin - **4 layer QRNN + dynamic evaluation** (University of Texas at Austin) — 2019-06-10 - License: closed | Type: model - AI model by University of Texas at Austin - **Transformer-XL Large + Phrase Induction** (Massachusetts Institute of Technology (MIT),University of Illinois Urbana-Champaign (UIUC)) — 2019-06-04 | Parameters: 257.0M - License: closed | Type: model - AI model by Massachusetts Institute of Technology (MIT),University of Illinois Urbana-Champaign (UIUC) - **AWD-LSTM + Phrase Induction + finetuning (PTB)** (Massachusetts Institute of Technology (MIT),University of Illinois Urbana-Champaign (UIUC)) — 2019-06-04 | Parameters: 24M - License: closed | Type: model - AI model by Massachusetts Institute of Technology (MIT),University of Illinois Urbana-Champaign (UIUC) - **VQ-VAE-2 (ImageNet)** (Google) — 2019-06-02 - License: closed | Type: model - AI model by Google - **VQ-VAE-2 (FFHQ)** (Google) — 2019-06-02 - License: closed | Type: model - AI model by Google - **XLNet** (Carnegie Mellon University (CMU),Google Brain) — 2019-06-01 | Parameters: 340M - License: open | Type: model - AI model by Carnegie Mellon University (CMU),Google Brain - **DLRM-2020** (Facebook AI) — 2019-05-31 | Parameters: 100B - License: closed | Type: model - AI model by Facebook AI - **Grover-Mega** (University of Washington) — 2019-05-29 | Parameters: 1.5B - License: open | Type: model - AI model by University of Washington - **MnasNet-A3** (Google) — 2019-05-29 | Parameters: 5.2M - License: open | Type: model - AI model by Google - **MnasNet-A1 + SSDLite** (Google) — 2019-05-29 | Parameters: 4.9M - License: open | Type: model - AI model 
by Google - **RSM** (Cerenaut) — 2019-05-28 - License: closed | Type: model - AI model by Cerenaut - **EfficientNet-B1** (Google) — 2019-05-28 | Parameters: 7.8M - License: open | Type: model - AI model by Google - **Flow++ (CIFAR10)** (University of California (UC) Berkeley,Covariant) — 2019-05-15 | Parameters: 31.4M - License: closed | Type: model - AI model by University of California (UC) Berkeley,Covariant - **AWD-LSTM-DRILL + dynamic evaluation† (WT2)** (IDIAP) — 2019-05-14 | Parameters: 34M - License: open | Type: model - AI model by IDIAP - **AWD-LSTM-DRILL + dynamic evaluation† (PTB)** (IDIAP) — 2019-05-14 | Parameters: 24M - License: open | Type: model - AI model by IDIAP - **RaptorX-Contact** (Toyota Technological Institute at Chicago) — 2019-05-02 - License: closed | Type: model - AI model by Toyota Technological Institute at Chicago - **Neuro-Symbolic Concept Learner** (Massachusetts Institute of Technology (MIT),Tsinghua University,MIT-IBM Watson AI Lab,DeepMind) — 2019-04-26 - License: closed | Type: model - AI model by Massachusetts Institute of Technology (MIT),Tsinghua University,MIT-IBM Watson AI Lab,DeepMind - **MuseNet** (OpenAI) — 2019-04-25 | Parameters: 2.0B - License: closed | Type: model - AI model by OpenAI - **Sparse Transformer (Enwik8)** (OpenAI) — 2019-04-23 | Parameters: 95M - License: closed | Type: model - AI model by OpenAI - **Sparse Transformer (CIFAR10)** (OpenAI) — 2019-04-23 | Parameters: 59M - License: closed | Type: model - AI model by OpenAI - **Sparse Transformer (ImageNet)** (OpenAI) — 2019-04-23 | Parameters: 152M - License: closed | Type: model - AI model by OpenAI - **DANet** (Chinese Academy of Sciences) — 2019-04-21 - License: open | Type: model - AI model by Chinese Academy of Sciences - **BERT-Large-CAS (PTB+WT2+WT103)** (Amazon) — 2019-04-20 | Parameters: 395M - License: closed | Type: model - AI model by Amazon - **BERT-Large-CAS (WT103)** (Amazon) — 2019-04-20 | Parameters: 340M - License: closed | Type: model 
- AI model by Amazon - **BERT-Large-CAS (WT2)** (Amazon) — 2019-04-20 | Parameters: 340M - License: closed | Type: model - AI model by Amazon - **SpecAugment** (Google Brain) — 2019-04-18 - License: closed | Type: model - AI model by Google Brain - **LTM** (Murdoch University) — 2019-04-18 - License: closed | Type: model - AI model by Murdoch University - **Transformer-XL + RMS dynamic eval** (University of Edinburgh) — 2019-04-17 | Parameters: 257.0M - License: closed | Type: model - AI model by University of Edinburgh - **MEGNet (molecule model)** (University of California San Diego) — 2019-04-10 | Parameters: 8.7K - License: closed | Type: model - AI model by University of California San Diego - **MEGNet (crystal formation energy model)** (University of California San Diego) — 2019-04-10 | Parameters: 26.1K - License: closed | Type: model - AI model by University of California San Diego - **MEGNet (crystal band gap model)** (University of California San Diego) — 2019-04-10 | Parameters: 26.1K - License: closed | Type: model - AI model by University of California San Diego - **MEGNet (crystal elasticity model)** (University of California San Diego) — 2019-04-10 | Parameters: 26.1K - License: closed | Type: model - AI model by University of California San Diego - **True-Regularization+Finetune+Dynamic-Eval** (Mobvoi,Williams College) — 2019-04-08 | Parameters: 7M - License: closed | Type: model - AI model by Mobvoi,Williams College - **WeNet (PTB)** (Amazon) — 2019-04-08 | Parameters: 23M - License: closed | Type: model - AI model by Amazon - **WeNet (Penn Treebank)** (Amazon) — 2019-04-08 | Parameters: 23M - License: closed | Type: model - AI model by Amazon - **Cross-lingual alignment** (Tel Aviv University,Massachusetts Institute of Technology (MIT)) — 2019-04-04 - License: open | Type: model - AI model by Tel Aviv University,Massachusetts Institute of Technology (MIT) - **FAIRSEQ Adaptive Inputs** (Facebook AI Research,Google Brain) — 2019-04-01 | Parameters: 
247.0M - License: closed | Type: model - AI model by Facebook AI Research,Google Brain - **UniRep** (Harvard University) — 2019-03-26 | Parameters: 18.2M - License: open | Type: model - AI model by Harvard University - **SciBERT** (Allen Institute for AI) — 2019-03-26 | Parameters: 110M - License: open | Type: model - AI model by Allen Institute for AI - **DOC + Finetune∗ + Partial Shuffle (PTB)** (University of Washington) — 2019-03-11 - License: closed | Type: model - AI model by University of Washington - **NMT Transformer 437M** (Google,Bar-Ilan University) — 2019-02-28 | Parameters: 437.7M - License: closed | Type: model - AI model by Google,Bar-Ilan University - **KataGo** (Jane Street) — 2019-02-27 | Parameters: 2.5M - License: open | Type: model - AI model by Jane Street - **ProxylessNAS** (Massachusetts Institute of Technology (MIT)) — 2019-02-23 - License: open | Type: model - AI model by Massachusetts Institute of Technology (MIT) - **SSA** (Massachusetts Institute of Technology (MIT)) — 2019-02-22 - License: closed | Type: model - AI model by Massachusetts Institute of Technology (MIT) - **code2seq** (Technion - Israel Institute of Technology,Facebook AI Research) — 2019-02-21 | Parameters: 37M - License: open | Type: model - AI model by Technion - Israel Institute of Technology,Facebook AI Research - **GPT-2 (1.5B)** (OpenAI) — 2019-02-14 | Parameters: 1.5B - License: open | Type: model - AI model by OpenAI - **GPT-2 (124M)** (OpenAI) — 2019-02-14 | Parameters: 124M - License: open | Type: model - AI model by OpenAI - **GPT-2 (774M)** (OpenAI) — 2019-02-14 | Parameters: 774M - License: open | Type: model - AI model by OpenAI - **GPT-2 (355M)** (OpenAI) — 2019-02-14 | Parameters: 355M - License: open | Type: model - AI model by OpenAI - **SDE** (Carnegie Mellon University (CMU),Google Brain,Monash University) — 2019-02-09 - License: closed | Type: model - AI model by Carnegie Mellon University (CMU),Google Brain,Monash University - **Compress-LSTM 
(66M)** (Samsung R&D Institute Russia,National Research University Higher School of Economics) — 2019-02-06 | Parameters: 66M - License: closed | Type: model - AI model by Samsung R&D Institute Russia,National Research University Higher School of Economics - **Compress-LSTM (4.6M)** (Samsung R&D Institute Russia,National Research University Higher School of Economics) — 2019-02-06 | Parameters: 4.6M - License: closed | Type: model - AI model by Samsung R&D Institute Russia,National Research University Higher School of Economics - **GPT-2** (OpenAI) — 2019-02-01 | Parameters: GPT-2 - License: open | Type: model - WebText 10B token corpus × 4 epochs → 40B tokens processed. Reddit outbound only - **Hanabi 4 player** (DeepMind,University of Oxford,Carnegie Mellon University (CMU),Google Brain) — 2019-02-01 | Parameters: 764K - License: closed | Type: model - AI model by DeepMind,University of Oxford,Carnegie Mellon University (CMU),Google Brain - **MT-DNN** (Microsoft) — 2019-01-31 | Parameters: 330M - License: open | Type: model - AI model by Microsoft - **TSLM+MoS (PTB)** (Tianjin University,Beijing Institute of Technology) — 2019-01-31 - License: closed | Type: model - AI model by Tianjin University,Beijing Institute of Technology - **Mono3D++** (University of California Los Angeles (UCLA),Megvii Inc) — 2019-01-11 - License: closed | Type: model - AI model by University of California Los Angeles (UCLA),Megvii Inc - **Transformer-XL (257M)** (Carnegie Mellon University (CMU),Google Brain) — 2019-01-09 | Parameters: 257.0M - License: open | Type: model - AI model by Carnegie Mellon University (CMU),Google Brain - **Transformer-XL-ptb** (Carnegie Mellon University (CMU),Google Brain) — 2019-01-09 | Parameters: 24M - License: closed | Type: model - AI model by Carnegie Mellon University (CMU),Google Brain - **Decoupled weight decay regularization** (University of Freiburg) — 2019-01-04 | Parameters: 36.5M - License: open | Type: model - AI model by University of 
Freiburg - **Transformer ELMo** (Allen Institute for AI,University of Washington) — 2019-01-01 | Parameters: 56M - License: closed | Type: model - AI model by Allen Institute for AI,University of Washington - **Transformer + Average Attention Network** (University of Electronic Science and Technology of China) — 2019-01-01 - License: closed | Type: model - AI model by University of Electronic Science and Technology of China - **StyleGAN** (NVIDIA) — 2018-12-12 | Parameters: 26.2M - License: open | Type: model - AI model by NVIDIA - **Vine copula (breast cancer)** (Massachusetts Institute of Technology (MIT),Rey Juan Carlos University) — 2018-12-04 - License: closed | Type: model - AI model by Massachusetts Institute of Technology (MIT),Rey Juan Carlos University - **Vine copula (wine quality)** (Massachusetts Institute of Technology (MIT),Rey Juan Carlos University) — 2018-12-04 - License: closed | Type: model - AI model by Massachusetts Institute of Technology (MIT),Rey Juan Carlos University - **Vine copula (crime)** (Massachusetts Institute of Technology (MIT),Rey Juan Carlos University) — 2018-12-04 - License: closed | Type: model - AI model by Massachusetts Institute of Technology (MIT),Rey Juan Carlos University - **SPN (ImageNet 128)** (Google Brain,DeepMind) — 2018-12-04 | Parameters: 250M - License: closed | Type: model - AI model by Google Brain,DeepMind - **SPN (CelebA HQ)** (Google Brain,DeepMind) — 2018-12-04 | Parameters: 50M - License: closed | Type: model - AI model by Google Brain,DeepMind - **DMPFold** (University College London (UCL)) — 2018-11-29 | Parameters: 3.8M - License: open | Type: model - AI model by University College London (UCL) - **GPipe (Transformer)** (Google) — 2018-11-16 | Parameters: 6B - License: closed | Type: model - AI model by Google - **Multi-cell LSTM** (University of Hyderabad) — 2018-11-15 | Parameters: 7.2M - License: closed | Type: model - AI model by University of Hyderabad - **Fine-tuned-AWD-LSTM-DOC (fin)** 
(Samsung R&D Institute Russia) — 2018-11-12 | Parameters: 46M - License: closed | Type: model - AI model by Samsung R&D Institute Russia - **Discriminator-tuned LSTM** (Samsung R&D Institute Russia) — 2018-11-12 | Parameters: 111.9M - License: closed | Type: model - AI model by Samsung R&D Institute Russia - **Mesh-TensorFlow Transformer 4.9B (language)** (Google Brain) — 2018-11-05 | Parameters: 4.9B - License: closed | Type: model - AI model by Google Brain - **Mesh-TensorFlow Transformer 2.9B (translation)** (Google Brain) — 2018-11-05 | Parameters: 2.9B - License: closed | Type: model - AI model by Google Brain - **MemoReader** (Samsung,Korea University) — 2018-10-31 - License: closed | Type: model - AI model by Samsung,Korea University - **code2vec** (Technion - Israel Institute of Technology,Facebook AI Research) — 2018-10-30 - License: open | Type: model - AI model by Technion - Israel Institute of Technology,Facebook AI Research - **TrellisNet** (Carnegie Mellon University (CMU),Bosch Center for Artificial Intelligence,Intel Labs) — 2018-10-15 | Parameters: 180M - License: closed | Type: model - AI model by Carnegie Mellon University (CMU),Bosch Center for Artificial Intelligence,Intel Labs - **TrellisNet-MoS (1.4x larger) PTB** (Carnegie Mellon University (CMU),Intel Labs,Bosch Center for Artificial Intelligence) — 2018-10-15 | Parameters: 34M - License: open | Type: model - AI model by Carnegie Mellon University (CMU),Intel Labs,Bosch Center for Artificial Intelligence - **BERT-Large** (Google) — 2018-10-11 | Parameters: 340M - License: open | Type: model - AI model by Google - **DeepConPred2** (Tsinghua University) — 2018-10-11 - License: closed | Type: model - AI model by Tsinghua University - **BERT** (Google) — 2018-10-01 | Parameters: BERT - License: open | Type: model - "BERT — 128 000 tokens per step × 1 000 000 steps → 128 B tokens processed" - **BigGAN-deep 512x512** (Heriot-Watt University,DeepMind) — 2018-09-28 | Parameters: 112.7M - License: 
open | Type: model - AI model by Heriot-Watt University,DeepMind - **Transformer (Adaptive Input Embeddings) WT103** (Facebook AI Research) — 2018-09-28 | Parameters: 247M - License: open | Type: model - AI model by Facebook AI Research - **ADP-FAIRSEQ + NGRAMRES** (Nara Institute of Science and Technology,Chinese University of Hong Kong (CUHK),Tsinghua University) — 2018-09-28 | Parameters: 247M - License: closed | Type: model - AI model by Nara Institute of Science and Technology,Chinese University of Hong Kong (CUHK),Tsinghua University - **LSTM+NeuralCache** (KU Leuven,ESAT - PSI,Apple) — 2018-09-24 | Parameters: 2.1M - License: closed | Type: model - AI model by KU Leuven,ESAT - PSI,Apple - **AWD-LSTM-MoS + dynamic evaluation (WT2, 2018)** (Peking University,Microsoft Research Asia) — 2018-09-18 | Parameters: 35M - License: closed | Type: model - AI model by Peking University,Microsoft Research Asia - **AWD-LSTM-MoS + dynamic evaluation (PTB, 2018)** (Peking University,Microsoft Research Asia) — 2018-09-18 | Parameters: 24M - License: open | Type: model - AI model by Peking University,Microsoft Research Asia - **Talent Search and Recommendation Systems** (LinkedIn) — 2018-09-18 - License: closed | Type: model - AI model by LinkedIn - **Transformer + Simple Recurrent Unit** (ASAPP,Cornell University,Google,Princeton University) — 2018-09-17 | Parameters: 90M - License: closed | Type: model - AI model by ASAPP,Cornell University,Google,Princeton University - **NetSurfP-2.0** (Technical University of Denmark,University of Copenhagen,Universidad Nacional de San Martín,AIMST University) — 2018-09-10 - License: closed | Type: model - AI model by Technical University of Denmark,University of Copenhagen,Universidad Nacional de San Martín,AIMST University - **ESRGAN** (Chinese University of Hong Kong (CUHK),Chinese Academy of Sciences,Nanyang Technological University) — 2018-09-01 - License: closed | Type: model - AI model by Chinese University of Hong Kong 
(CUHK),Chinese Academy of Sciences,Nanyang Technological University - **(ensemble): AWD-LSTM-DOC (fin) × 5 (WT2)** (NTT Communication Science Laboratories,Tohoku University) — 2018-08-30 | Parameters: 185M - License: open | Type: model - AI model by NTT Communication Science Laboratories,Tohoku University - **(ensemble): AWD-LSTM-DOC (fin) × 5 (PTB)** (NTT Communication Science Laboratories,Tohoku University) — 2018-08-30 - License: open | Type: model - AI model by NTT Communication Science Laboratories,Tohoku University - **AWD-LSTM-DOC (fin) (37M)** (NTT Communication Science Laboratories,Tohoku University) — 2018-08-30 | Parameters: 37M - License: open | Type: model - AI model by NTT Communication Science Laboratories,Tohoku University - **AWD-LSTM-DOC (fin) (23M)** (NTT Communication Science Laboratories,Tohoku University) — 2018-08-30 | Parameters: 23M - License: open | Type: model - AI model by NTT Communication Science Laboratories,Tohoku University - **Big Transformer for Back-Translation** (Facebook AI Research,Google Brain) — 2018-08-28 - License: open | Type: model - AI model by Facebook AI Research,Google Brain - **AWD-LSTM-MoS+PDR + dynamic evaluation (WT2)** (IBM) — 2018-08-14 | Parameters: 35M - License: closed | Type: model - AI model by IBM - **AWD-LSTM-MoS+PDR + dynamic evaluation (PTB)** (IBM) — 2018-08-14 | Parameters: 60.9M - License: closed | Type: model - AI model by IBM - **RGC+ASQ (PTB)** (Tsinghua University,University of California Los Angeles (UCLA)) — 2018-08-13 | Parameters: 53.5M - License: closed | Type: model - AI model by Tsinghua University,University of California Los Angeles (UCLA) - **Dexterous In-Hand Manipulation [control policy]** (OpenAI) — 2018-08-01 | Parameters: 3.2M - License: closed | Type: model - AI model by OpenAI - **Big-Little Net** (IBM) — 2018-07-10 | Parameters: 77.4M - License: open | Type: model - AI model by IBM - **Big-Little Net (vision)** (IBM) — 2018-07-10 | Parameters: 77.4M - License: open | Type: 
model - AI model by IBM - **Big-Little Net (speech)** (IBM) — 2018-07-10 | Parameters: 3.3M - License: open | Type: model - AI model by IBM - **Glow (Celeba HQ)** (OpenAI) — 2018-07-10 - License: open | Type: model - AI model by OpenAI - **RCAN** (Northeastern University) — 2018-07-08 | Parameters: 16M - License: closed | Type: model - AI model by Northeastern University - **FTW (For The Win)** (DeepMind) — 2018-07-03 | Parameters: 126.0M - License: closed | Type: model - AI model by DeepMind - **QT-Opt** (Google Brain,University of California (UC) Berkeley) — 2018-06-27 | Parameters: 1.2M - License: closed | Type: model - AI model by Google Brain,University of California (UC) Berkeley - **S + I-Attention (3)** (National Research University Higher School of Economics,Samsung R&D Institute Russia) — 2018-06-26 - License: closed | Type: model - AI model by National Research University Higher School of Economics,Samsung R&D Institute Russia - **DARTS** (DeepMind,Carnegie Mellon University (CMU)) — 2018-06-24 | Parameters: 33M - License: closed | Type: model - AI model by DeepMind,Carnegie Mellon University (CMU) - **DARTS (second order) (PTB)** (Carnegie Mellon University (CMU),DeepMind) — 2018-06-24 | Parameters: 23M - License: open | Type: model - AI model by Carnegie Mellon University (CMU),DeepMind - **Relational Memory Core** (DeepMind,University College London (UCL)) — 2018-06-05 - License: closed | Type: model - AI model by DeepMind,University College London (UCL) - **GPT-1** (OpenAI) — 2018-06-01 | Parameters: GPT-1 - License: open | Type: model - "GPT-1 — 984 M tokens corpus × 100 epochs × 1 token per word → 98.4 B tokens processed" Books only. "We train for 100 epochs on minibatches of 64 randomly sampled, contiguous sequences of 512 tokens." 
=3,276,800 - **RHN+HSG(depth=40)** (Ben-Gurion University) — 2018-05-23 - License: closed | Type: model - AI model by Ben-Gurion University - **RHN(depth=40)** (Ben-Gurion University) — 2018-05-23 - License: closed | Type: model - AI model by Ben-Gurion University - **2-layer skip-LSTM + dropout tuning (PTB)** (DeepMind) — 2018-05-23 | Parameters: 24M - License: closed | Type: model - AI model by DeepMind - **aLSTM(depth-2)+RecurrentPolicy (WT2)** (University of Manchester,Alan Turing Institute) — 2018-05-22 | Parameters: 32M - License: closed | Type: model - AI model by University of Manchester,Alan Turing Institute - **aLSTM(depth-2)+RecurrentPolicy (PTB)** (University of Manchester) — 2018-05-22 | Parameters: 24M - License: closed | Type: model - AI model by University of Manchester - **Dropout-LSTM+Noise(Bernoulli) (WT2)** (Columbia University,New York University (NYU),Princeton University) — 2018-05-03 | Parameters: 51M - License: closed | Type: model - AI model by Columbia University,New York University (NYU),Princeton University - **LSTM+Noise(Beta)** (Columbia University,New York University (NYU),Princeton University) — 2018-05-03 | Parameters: 51M - License: closed | Type: model - AI model by Columbia University,New York University (NYU),Princeton University - **AWD-LSTM-MoS+Noisin+dynamic evaluation (PTB)** (Columbia University,New York University (NYU),Princeton University) — 2018-05-03 | Parameters: 22M - License: closed | Type: model - AI model by Columbia University,New York University (NYU),Princeton University - **Dropout-LSTM+Noise(Laplace) - medium (WT2)** (Columbia University,New York University (NYU),Princeton University) — 2018-05-03 | Parameters: 13M - License: closed | Type: model - AI model by Columbia University,New York University (NYU),Princeton University - **Dropout-LSTM+Noise(Bernoulli) - large(PTB)** (Columbia University,New York University (NYU),Princeton University) — 2018-05-03 | Parameters: 51M - License: closed | Type: model - AI 
model by Columbia University,New York University (NYU),Princeton University - **ResNeXt-101 32x48d** (Facebook) — 2018-05-02 | Parameters: 829M - License: open | Type: model - AI model by Facebook - **TF-LM-discourse LSTM (WT2)** (ESAT - PSI) — 2018-05-01 - License: open | Type: model - AI model by ESAT - PSI - **TF-LM-discourse LSTM (PTB)** (ESAT - PSI) — 2018-05-01 - License: open | Type: model - AI model by ESAT - PSI - **DNCON2** (University of Missouri) — 2018-05-01 - License: open | Type: model - AI model by University of Missouri - **RNNLM + Dynamic KL Regularization (WT2)** (Northwestern University) — 2018-04-27 | Parameters: 87.6M - License: closed | Type: model - AI model by Northwestern University - **RNMT+** (Google AI) — 2018-04-26 | Parameters: 378.9M - License: closed | Type: model - AI model by Google AI - **Diffractive Deep Neural Network** (University of California Los Angeles (UCLA)) — 2018-04-14 | Parameters: 8B - License: closed | Type: model - AI model by University of California Los Angeles (UCLA) - **YOLOv3** (University of Washington) — 2018-04-08 | Parameters: 56.9M - License: closed | Type: model - AI model by University of Washington - **LSTM (Hebbian, Cache, MbPA)** (DeepMind,University College London (UCL)) — 2018-03-27 | Parameters: 530.4M - License: closed | Type: model - AI model by DeepMind,University College London (UCL) - **4 layer QRNN (h=2500)** (Salesforce Research) — 2018-03-22 | Parameters: 151M - License: closed | Type: model - AI model by Salesforce Research - **Chinese - English translation** (Microsoft) — 2018-03-01 - License: closed | Type: model - AI model by Microsoft - **Residual Dense Network** (Northeastern University,University of Rochester) — 2018-02-24 - License: closed | Type: model - AI model by Northeastern University,University of Rochester - **Spectrally Normalized GAN** (Preferred Networks Inc,Ritsumeikan University,National Institute of Informatics) — 2018-02-16 - License: closed | Type: model - AI model 
by Preferred Networks Inc,Ritsumeikan University,National Institute of Informatics - **TCN (13M)** (Carnegie Mellon University (CMU),Intel Labs) — 2018-02-15 | Parameters: 13M - License: closed | Type: model - AI model by Carnegie Mellon University (CMU),Intel Labs - **Multipop Adaptive Continuous Stack (PTB)** (DeepMind,University of Oxford) — 2018-02-15 | Parameters: 40M - License: closed | Type: model - AI model by DeepMind,University of Oxford - **TCN (P-MNIST)** (Carnegie Mellon University (CMU),Intel Labs) — 2018-02-15 | Parameters: 42K - License: closed | Type: model - AI model by Carnegie Mellon University (CMU),Intel Labs - **ENAS** (Google Brain,Carnegie Mellon University (CMU),Stanford University) — 2018-02-09 | Parameters: 24M - License: closed | Type: model - AI model by Google Brain,Carnegie Mellon University (CMU),Stanford University - **DeepLabV3+** (Google) — 2018-02-07 - License: closed | Type: model - AI model by Google - **AmoebaNet-A (F=448)** (Google Brain) — 2018-02-05 | Parameters: 469M - License: closed | Type: model - AI model by Google Brain - **IMPALA** (DeepMind) — 2018-02-05 | Parameters: 1.6M - License: closed | Type: model - AI model by DeepMind - **ELMo** (University of Washington,Allen Institute for AI) — 2018-02-01 | Parameters: 94M - License: closed | Type: model - AI model by University of Washington,Allen Institute for AI - **QRNN** (Salesforce Research) — 2018-02-01 | Parameters: 135M - License: closed | Type: model - AI model by Salesforce Research - **T-DMCA** (Google Brain) — 2018-01-30 - License: closed | Type: model - AI model by Google Brain - **DenseNet201** (Tsinghua University,Facebook AI Research,Cornell University) — 2018-01-28 | Parameters: 20M - License: open | Type: model - AI model by Tsinghua University,Facebook AI Research,Cornell University - **ULM-FiT** (University of San Francisco,Insight Centre NUI Galway,Fast.ai) — 2018-01-18 | Parameters: 441M - License: open | Type: model - AI model by University of San 
Francisco,Insight Centre NUI Galway,Fast.ai - **Refined Part Pooling** (Tsinghua University,University of Technology Sydney,University of Texas at San Antonio) — 2018-01-09 - License: closed | Type: model - AI model by Tsinghua University,University of Technology Sydney,University of Texas at San Antonio - **ULMFiT** (Fast.ai) — 2018-01-01 | Parameters: ULMFiT - License: open | Type: model - "ULMFiT — 103 M tokens corpus × 14 epochs → 1.44 B tokens processed" "Corpus size. WikiText-103 contains about 103 million word-level tokens. Training schedule. The reference pre-training run trains for 14 full epochs on that corpus. Total tokens seen. 103 M tokens × 14 epochs → roughly 1.44 billion token prediction steps." Aussie Prof Jeremy Howard: https://www.abc.net.au/news/science/2023-11-15/jeremy-howard-taught-ai-to-the-world-and-helped-invent-chatgpt/103092474 - **RNNLM + Dynamic KL Regularization** (Northwestern University) — 2018-01-01 | Parameters: 13.3M - License: closed | Type: model - AI model by Northwestern University - **PixelSNAIL (CIFAR 10)** (University of California (UC) Berkeley) — 2017-12-28 - License: open | Type: model - AI model by University of California (UC) Berkeley - **PixelSNAIL (ImageNet)** (University of California (UC) Berkeley) — 2017-12-28 - License: open | Type: model - AI model by University of California (UC) Berkeley - **Tacotron 2** (Google,University of California (UC) Berkeley) — 2017-12-19 - License: closed | Type: model - AI model by Google,University of California (UC) Berkeley - **WGAN (Wasserstein GAN)** (Facebook AI Research,Courant Institute of Mathematical Sciences) — 2017-12-06 - License: closed | Type: model - AI model by Facebook AI Research,Courant Institute of Mathematical Sciences - **AlphaZero** (DeepMind) — 2017-12-05 - License: closed | Type: model - AI model by DeepMind - **2-layer-LSTM+Deep-Gradient-Compression** (Tsinghua University,Stanford University,NVIDIA) — 2017-12-05 | Parameters: 6.0M - License: closed | 
Type: model - AI model by Tsinghua University,Stanford University,NVIDIA - **PNASNet-5** (Johns Hopkins University,Google AI,Stanford University) — 2017-12-02 | Parameters: 86.1M - License: closed | Type: model - AI model by Johns Hopkins University,Google AI,Stanford University - **DL scaling speech** (Baidu) — 2017-12-01 | Parameters: 193M - License: closed | Type: model - AI model by Baidu - **DL scaling LM** (Baidu) — 2017-12-01 | Parameters: 177M - License: closed | Type: model - AI model by Baidu - **DL scaling Image** (Baidu) — 2017-12-01 | Parameters: 121M - License: closed | Type: model - AI model by Baidu - **TriNet** (Visual Computing Institute,RWTH Aachen University) — 2017-11-21 - License: closed | Type: model - AI model by Visual Computing Institute,RWTH Aachen University - **AWD-LSTM-MoS + dynamic evaluation (WT2, 2017)** (Carnegie Mellon University (CMU)) — 2017-11-10 | Parameters: 35M - License: closed | Type: model - AI model by Carnegie Mellon University (CMU) - **AWD-LSTM-MoS + dynamic evaluation (PTB, 2017)** (Carnegie Mellon University (CMU)) — 2017-11-10 | Parameters: 22M - License: closed | Type: model - AI model by Carnegie Mellon University (CMU) - **VQ-VAE** (DeepMind) — 2017-11-02 - License: closed | Type: model - AI model by DeepMind - **Fraternal dropout + AWD-LSTM 3-layer (WT2)** (Jagiellonian University,Mila - Quebec AI (originally Montreal Institute for Learning Algorithms),University of Montreal / Université de Montréal) — 2017-10-31 | Parameters: 34M - License: closed | Type: model - AI model by Jagiellonian University,Mila - Quebec AI (originally Montreal Institute for Learning Algorithms),University of Montreal / Université de Montréal - **Fraternal dropout + AWD-LSTM 3-layer (PTB)** (University of Montreal / Université de Montréal,Mila - Quebec AI (originally Montreal Institute for Learning Algorithms)) — 2017-10-31 | Parameters: 24M - License: closed | Type: model - AI model by University of Montreal / Université de 
Montréal,Mila - Quebec AI (originally Montreal Institute for Learning Algorithms) - **DCN+** (Salesforce Research) — 2017-10-31 - License: closed | Type: model - AI model by Salesforce Research - **S-Norm** (University of Washington,Allen Institute for AI) — 2017-10-29 - License: closed | Type: model - AI model by University of Washington,Allen Institute for AI - **PhraseCond** (Carnegie Mellon University (CMU),University of Pittsburgh) — 2017-10-28 - License: closed | Type: model - AI model by Carnegie Mellon University (CMU),University of Pittsburgh - **ProgressiveGAN** (NVIDIA) — 2017-10-27 - License: closed | Type: model - AI model by NVIDIA - **LRSO-GAN** (University of Technology Sydney) — 2017-10-22 - License: closed | Type: model - AI model by University of Technology Sydney - **AlphaGo Master** (DeepMind) — 2017-10-19 - License: closed | Type: model - AI model by DeepMind - **AlphaGo Zero** (DeepMind) — 2017-10-18 | Parameters: 46.4M - License: closed | Type: model - AI model by DeepMind - **Rainbow DQN** (DeepMind) — 2017-10-06 - License: closed | Type: model - AI model by DeepMind - **AWD-LSTM+WT+Cache+IOG (WT2)** (NTT Communication Science Laboratories) — 2017-09-26 | Parameters: 53M - License: closed | Type: model - AI model by NTT Communication Science Laboratories - **AWD-LSTM+WT+Cache+IOG (PTB)** (NTT Communication Science Laboratories) — 2017-09-26 | Parameters: 30M - License: closed | Type: model - AI model by NTT Communication Science Laboratories - **LSTM + dynamic eval** (University of Edinburgh) — 2017-09-21 | Parameters: 50M - License: closed | Type: model - AI model by University of Edinburgh - **AWD-LSTM + dynamic eval (PTB)** (University of Edinburgh) — 2017-09-21 | Parameters: 24M - License: closed | Type: model - AI model by University of Edinburgh - **AWD-LSTM + dynamic eval (WT2)** (University of Edinburgh) — 2017-09-21 | Parameters: 33M - License: closed | Type: model - AI model by University of Edinburgh - **ISS** (Duke 
University,Microsoft) — 2017-09-15 | Parameters: 11.1M - License: closed | Type: model - AI model by Duke University,Microsoft - **PyramidNet** (Korea Advanced Institute of Science and Technology (KAIST)) — 2017-09-06 | Parameters: 26M - License: open | Type: model - AI model by Korea Advanced Institute of Science and Technology (KAIST) - **GL-LWGC-AWD-MoS-LSTM + dynamic evaluation (WT2)** (Ben-Gurion University of the Negev) — 2017-08-29 | Parameters: 38M - License: closed | Type: model - AI model by Ben-Gurion University of the Negev - **GL-LWGC-AWD-MoS-LSTM + dynamic evaluation (PTB)** (Ben-Gurion University) — 2017-08-29 | Parameters: 26M - License: closed | Type: model - AI model by Ben-Gurion University - **D-LSRC(100)+KN5 (PTB)** (Saarland University) — 2017-08-22 | Parameters: 6.0M - License: closed | Type: model - AI model by Saarland University - **Libratus** (Carnegie Mellon University (CMU)) — 2017-08-19 - License: closed | Type: model - AI model by Carnegie Mellon University (CMU) - **GRU + p-tHSM (pretrain via Brown) (PTB)** (Beihang University,University of Montreal / Université de Montréal,Chongqing University) — 2017-08-19 - License: closed | Type: model - AI model by Beihang University,University of Montreal / Université de Montréal,Chongqing University - **GRU + p-tHSM (pretrain via Brown) (WT2)** (Beihang University,University of Montreal / Université de Montréal,Chongqing University) — 2017-08-19 - License: closed | Type: model - AI model by Beihang University,University of Montreal / Université de Montréal,Chongqing University - **Adversarial Joint Adaptation Network (ResNet)** (Tsinghua University,University of California (UC) Berkeley) — 2017-08-17 | Parameters: 60M - License: closed | Type: model - AI model by Tsinghua University,University of California (UC) Berkeley - **NeuMF (Pinterest)** (Shandong University,Texas A&M,National University of Singapore,Columbia University) — 2017-08-16 - License: closed | Type: model - AI model by 
Shandong University,Texas A&M,National University of Singapore,Columbia University - **Cutout-regularized net** (University of Guelph,Vector Institute,CIFAR AI Research) — 2017-08-15 - License: closed | Type: model - AI model by University of Guelph,Vector Institute,CIFAR AI Research - **EI-REHN-1000D** (Korea Advanced Institute of Science and Technology (KAIST)) — 2017-08-14 | Parameters: 19M - License: closed | Type: model - AI model by Korea Advanced Institute of Science and Technology (KAIST) - **EI-REHN-1200D (PTB)** (Korea Advanced Institute of Science and Technology (KAIST)) — 2017-08-14 | Parameters: 25M - License: closed | Type: model - AI model by Korea Advanced Institute of Science and Technology (KAIST) - **OpenAI TI7 DOTA 1v1** (OpenAI) — 2017-08-11 - License: closed | Type: model - AI model by OpenAI - **RetinaNet-R101** (Facebook AI Research) — 2017-08-07 | Parameters: 53M - License: closed | Type: model - AI model by Facebook AI Research - **AWD-LSTM - 3-layer LSTM (tied) + continuous cache pointer (WT2)** (Salesforce Research) — 2017-08-07 | Parameters: 33M - License: closed | Type: model - AI model by Salesforce Research - **AWD-LSTM - 3-layer LSTM (tied) + continuous cache pointer (PTB)** (Salesforce Research) — 2017-08-07 | Parameters: 24M - License: closed | Type: model - AI model by Salesforce Research - **GSM** (Peking University,Microsoft Research) — 2017-07-30 - License: closed | Type: model - AI model by Peking University,Microsoft Research - **ConvS2S (ensemble of 8 models)** (Meta AI) — 2017-07-25 - License: closed | Type: model - AI model by Meta AI - **PSPNet** (Chinese University of Hong Kong (CUHK)) — 2017-07-21 - License: closed | Type: model - AI model by Chinese University of Hong Kong (CUHK) - **Densely Connected LSTM + Var. 
Dropout** (Ghent University) — 2017-07-19 | Parameters: 23M - License: closed | Type: model - AI model by Ghent University - **4 layer Densely Connected LSTM 14M (PTB)** (Ghent University) — 2017-07-19 | Parameters: 14M - License: closed | Type: model - AI model by Ghent University - **AWD-LSTM** (DeepMind,University of Oxford) — 2017-07-18 | Parameters: 24M - License: closed | Type: model - AI model by DeepMind,University of Oxford - **JFT** (Google Research,Carnegie Mellon University (CMU)) — 2017-07-10 | Parameters: 44.7M - License: closed | Type: model - AI model by Google Research,Carnegie Mellon University (CMU) - **DeepLoc** (Technical University of Denmark,University of Copenhagen) — 2017-07-07 - License: closed | Type: model - AI model by Technical University of Denmark,University of Copenhagen - **NoisyNet-Dueling** (DeepMind) — 2017-06-30 - License: closed | Type: model - AI model by DeepMind - **DeepLabV3** (Google) — 2017-06-17 - License: closed | Type: model - AI model by Google - **HRA** (Maluuba,Microsoft) — 2017-06-13 - License: closed | Type: model - AI model by Maluuba,Microsoft - **Transformer** (Google Research,Google Brain) — 2017-06-12 | Parameters: 213M - License: closed | Type: model - AI model by Google Research,Google Brain - **EDSR** (Seoul National University) — 2017-06-10 - License: closed | Type: model - AI model by Seoul National University - **Reading Twice for NLU** (DeepMind) — 2017-06-08 - License: closed | Type: model - AI model by DeepMind - **PointNet++** (Stanford University) — 2017-06-07 - License: closed | Type: model - AI model by Stanford University - **Transformer (big)** (Google) — 2017-06-01 | Parameters: Transformer (big) - License: open | Type: model - "Transformer Big — 32 768 tokens per step × 300 000 steps → 9.83 B tokens processed" "We trained on the standard WMT 2014 English-German dataset consisting of about 4.5 million sentence pairs... 
For English-French, we used the significantly larger WMT 2014 English-French dataset consisting of 36M sentences and split tokens into a 32000 word-piece vocabulary. Sentence pairs were batched together by approximate sequence length. Each training batch contained a set of sentence pairs containing approximately 25000 source tokens and 25000 target tokens." - **Transformer (base)** (Google) — 2017-06-01 | Parameters: Transformer (base) - License: open | Type: model - "Transformer Base — 32 768 tokens per step × 100 000 steps → 3.28 B tokens processed" "We trained on the standard WMT 2014 English-German dataset consisting of about 4.5 million sentence pairs... For English-French, we used the significantly larger WMT 2014 English-French dataset consisting of 36M sentences and split tokens into a 32000 word-piece vocabulary. Sentence pairs were batched together by approximate sequence length. Each training batch contained a set of sentence pairs containing approximately 25000 source tokens and 25000 target tokens." 
- **Inflated 3D ConvNet** (DeepMind,University of Oxford) — 2017-06-01 - License: closed | Type: model - AI model by DeepMind,University of Oxford - **SRGAN** (Twitter) — 2017-05-25 - License: closed | Type: model - AI model by Twitter - **Low-Cost Collaborative Network** (National University of Singapore,University of Technology Sydney,Qihoo 360 AI Institute) — 2017-05-15 - License: closed | Type: model - AI model by National University of Singapore,University of Technology Sydney,Qihoo 360 AI Institute - **Mnemonic Reader** (Fudan University,Microsoft Research) — 2017-05-08 - License: closed | Type: model - AI model by Fudan University,Microsoft Research - **DeepLab (2017)** (Johns Hopkins University,Google,University College London (UCL)) — 2017-04-27 - License: closed | Type: model - AI model by Johns Hopkins University,Google,University College London (UCL) - **Tacotron** (Google) — 2017-04-06 - License: closed | Type: model - AI model by Google - **WGAN-GP** (Courant Institute of Mathematical Sciences,Mila - Quebec AI (originally Montreal Institute for Learning Algorithms)) — 2017-03-31 - License: closed | Type: model - AI model by Courant Institute of Mathematical Sciences,Mila - Quebec AI (originally Montreal Institute for Learning Algorithms) - **Mask R-CNN** (Facebook AI Research) — 2017-03-30 - License: closed | Type: model - AI model by Facebook AI Research - **AlexNet + coordinating filters** (University of Pittsburgh,Duke University) — 2017-03-28 | Parameters: 60M - License: open | Type: model - AI model by University of Pittsburgh,Duke University - **Prototypical networks** (University of Toronto,Twitter) — 2017-03-15 - License: closed | Type: model - AI model by University of Toronto,Twitter - **Variational Lossy Autoencoder (VLAE) MNIST** (University of California (UC) Berkeley,OpenAI) — 2017-03-04 - License: closed | Type: model - AI model by University of California (UC) Berkeley,OpenAI - **SEST** (Carnegie Mellon University (CMU)) — 2017-03-02 - 
License: closed | Type: model - AI model by Carnegie Mellon University (CMU) - **DnCNN** (Harbin Institute of Technology,Hong Kong Polytechnic University,ULSee Inc.,Xi’an Jiaotong University) — 2017-02-01 - License: closed | Type: model - AI model by Harbin Institute of Technology,Hong Kong Polytechnic University,ULSee Inc.,Xi’an Jiaotong University - **VDCNN (on Amazon Review Full dataset)** (Facebook AI Research,University of Le Mans) — 2017-01-27 | Parameters: 7.8M - License: closed | Type: model - AI model by Facebook AI Research,University of Le Mans - **MoE-Multi** (Jagiellonian University,Google Brain) — 2017-01-23 | Parameters: 8.7B - License: closed | Type: model - AI model by Jagiellonian University,Google Brain - **PixelCNN++** (OpenAI) — 2017-01-19 - License: open | Type: model - AI model by OpenAI - **OR-WideResNet** (Duke University,University of Chinese Academy of Sciences) — 2017-01-07 | Parameters: 18.2M - License: closed | Type: model - AI model by Duke University,University of Chinese Academy of Sciences - **DeepStack** (University of Alberta,Charles University,Czech Technical University) — 2017-01-06 | Parameters: 2.5M - License: closed | Type: model - AI model by University of Alberta,Charles University,Czech Technical University - **GCNN-14** (Facebook AI Research) — 2016-12-23 - License: closed | Type: model - AI model by Facebook AI Research - **EnhanceNet** (Max Planck Institute for Intelligent Systems) — 2016-12-23 | Parameters: 814.5K - License: open | Type: model - AI model by Max Planck Institute for Intelligent Systems - **GCRN-M1, dropout** (École Polytechnique Fédérale de Lausanne (EPFL)) — 2016-12-22 | Parameters: 42M - License: closed | Type: model - AI model by École Polytechnique Fédérale de Lausanne (EPFL) - **3DMM-CNN** (University of Southern California) — 2016-12-15 | Parameters: 44.5M - License: open | Type: model - AI model by University of Southern California - **Diabetic Retinopathy Detection Net** (UT 
Austin,University of California (UC) Berkeley,Google) — 2016-12-13 - License: closed | Type: model - AI model by UT Austin,University of California (UC) Berkeley,Google - **Neural cache model (size=2000)** (Facebook AI Research) — 2016-12-13 - License: closed | Type: model - AI model by Facebook AI Research - **LSTM (PTB)** (Facebook AI Research) — 2016-12-13 - License: closed | Type: model - AI model by Facebook AI Research - **LSTM (WT2)** (Facebook AI Research) — 2016-12-13 - License: closed | Type: model - AI model by Facebook AI Research - **LSTM (WT103)** (Facebook AI Research) — 2016-12-13 - License: closed | Type: model - AI model by Facebook AI Research - **HR-ResNet101** (Carnegie Mellon University (CMU)) — 2016-12-13 | Parameters: 44.5M - License: open | Type: model - AI model by Carnegie Mellon University (CMU) - **GAN-Advancer** (OpenAI) — 2016-12-05 - License: closed | Type: model - AI model by OpenAI - **Layer-Norm Fast Weights RNN** (University of Toronto,Google DeepMind,Google Brain) — 2016-12-05 - License: closed | Type: model - AI model by University of Toronto,Google DeepMind,Google Brain - **Elastic weight consolidation** (DeepMind) — 2016-12-02 - License: closed | Type: model - AI model by DeepMind - **PointNet** (Stanford University) — 2016-12-02 - License: closed | Type: model - AI model by Stanford University - **Image-to-image cGAN** (University of California (UC) Berkeley) — 2016-11-21 - License: closed | Type: model - AI model by University of California (UC) Berkeley - **RefineNet** (University of Adelaide,Australian Centre for Robotic Vision) — 2016-11-20 - License: closed | Type: model - AI model by University of Adelaide,Australian Centre for Robotic Vision - **PolyNet** (Chinese University of Hong Kong (CUHK)) — 2016-11-17 | Parameters: 92M - License: closed | Type: model - AI model by Chinese University of Hong Kong (CUHK) - **ResNeXt-50** (University of California San Diego,Facebook) — 2016-11-16 | Parameters: 25M - License: open 
| Type: model - AI model by University of California San Diego,Facebook - **DAC-CSR** (Jiangnan University,University of Surrey) — 2016-11-16 - License: closed | Type: model - AI model by Jiangnan University,University of Surrey - **ResNeXt-101 (64×4d)** (University of California San Diego,Facebook) — 2016-11-16 | Parameters: 83M - License: open | Type: model - AI model by University of California San Diego,Facebook - **Deeply-recursive ConvNet** (Seoul National University) — 2016-11-11 - License: closed | Type: model - AI model by Seoul National University - **DTN (Domain Transfer Network)** (Facebook AI Research) — 2016-11-07 - License: closed | Type: model - AI model by Facebook AI Research - **DLDL (PASCAL)** (University of Oxford) — 2016-11-06 | Parameters: 564M - License: open | Type: model - AI model by University of Oxford - **NASv3 (CIFAR-10)** (Google Brain) — 2016-11-05 | Parameters: 37.4M - License: closed | Type: model - AI model by Google Brain - **NAS with base 8 and shared embeddings** (Google Brain) — 2016-11-05 | Parameters: 54M - License: closed | Type: model - AI model by Google Brain - **BIDAF** (University of Washington,Allen Institute for AI) — 2016-11-05 | Parameters: 2.6M - License: open | Type: model - AI model by University of Washington,Allen Institute for AI - **VD-LSTM+REAL Large** (Salesforce Research,Stanford University) — 2016-11-04 | Parameters: 51M - License: closed | Type: model - AI model by Salesforce Research,Stanford University - **VD-LSTM+REAL Medium** (Stanford University,Salesforce Research) — 2016-11-04 | Parameters: 22.1M - License: closed | Type: model - AI model by Stanford University,Salesforce Research - **VD-LSTM+REAL Small** (Stanford University,Salesforce Research) — 2016-11-04 | Parameters: 6.8M - License: closed | Type: model - AI model by Stanford University,Salesforce Research - **SPIDER2** (Griffith University,University of Iowa,Dezhou University) — 2016-10-28 | Parameters: 409.5K - License: open | Type: 
model - AI model by Griffith University,University of Iowa,Dezhou University - **Differentiable neural computer** (Google DeepMind) — 2016-10-12 - License: closed | Type: model - AI model by Google DeepMind - **GAWWN** (University of Michigan,Max Planck Institute for Informatics) — 2016-10-08 - License: closed | Type: model - AI model by University of Michigan,Max Planck Institute for Informatics - **Xception** (Google) — 2016-10-07 | Parameters: 22.9M - License: closed | Type: model - AI model by Google - **GNMT** (Google) — 2016-09-26 | Parameters: 278M - License: closed | Type: model - AI model by Google - **Pointer Sentinel-LSTM (medium)** (MetaMind Inc,Salesforce) — 2016-09-26 | Parameters: 21M - License: closed | Type: model - AI model by MetaMind Inc,Salesforce - **Zoneout + Variational LSTM (WT2)** (MetaMind Inc,Salesforce) — 2016-09-26 | Parameters: 21M - License: closed | Type: model - AI model by MetaMind Inc,Salesforce - **Zoneout + Variational LSTM (PTB)** (MetaMind Inc) — 2016-09-26 | Parameters: 20M - License: closed | Type: model - AI model by MetaMind Inc - **Pointer Sentinel-LSTM (WT2)** (MetaMind Inc) — 2016-09-26 | Parameters: 21M - License: closed | Type: model - AI model by MetaMind Inc - **Knowledge distillation student model** (Harvard University) — 2016-09-22 | Parameters: 84M - License: closed | Type: model - AI model by Harvard University - **Wide Residual Network** (Université Paris-Est) — 2016-09-19 - License: closed | Type: model - AI model by Université Paris-Est - **MS-CNN** (IBM,University of California San Diego) — 2016-09-17 - License: closed | Type: model - AI model by IBM,University of California San Diego - **Stacked hourglass network** (University of Michigan) — 2016-09-17 - License: closed | Type: model - AI model by University of Michigan - **TSN** (ETH Zurich,Shenzhen Institute of Advanced Technology,Chinese University of Hong Kong (CUHK)) — 2016-09-17 - License: closed | Type: model - AI model by ETH Zurich,Shenzhen 
Institute of Advanced Technology,Chinese University of Hong Kong (CUHK) - **ResNet-200** (Microsoft Research Asia) — 2016-09-17 - License: closed | Type: model - AI model by Microsoft Research Asia - **Youtube recommendation model** (Google) — 2016-09-15 - License: closed | Type: model - AI model by Google - **WaveNet** (Google DeepMind) — 2016-09-12 - License: closed | Type: model - AI model by Google DeepMind - **MS-ensemble-speech-recognition** (Microsoft) — 2016-09-12 | Parameters: 3.2B - License: closed | Type: model - AI model by Microsoft - **LF-MMI** (Johns Hopkins University,Cornell University) — 2016-09-08 | Parameters: 16.6M - License: closed | Type: model - AI model by Johns Hopkins University,Cornell University - **Multi-task Cascaded CNN** (Chinese Academy of Sciences,Chinese University of Hong Kong (CUHK)) — 2016-08-26 - License: closed | Type: model - AI model by Chinese Academy of Sciences,Chinese University of Hong Kong (CUHK) - **DenseNet-264** (Tsinghua University,Facebook AI Research,Cornell University) — 2016-08-25 | Parameters: 34M - License: open | Type: model - AI model by Tsinghua University,Facebook AI Research,Cornell University - **SimpleNet** (Sensifai,Islamic Azad University,Technicolor R&I,Institute for Research in Fundamental Sciences (IPM)) — 2016-08-22 | Parameters: 5.5M - License: closed | Type: model - AI model by Sensifai,Islamic Azad University,Technicolor R&I,Institute for Research in Fundamental Sciences (IPM) - **Attend-Infer-Repeat** (Google DeepMind) — 2016-08-12 | Parameters: 82.1M - License: closed | Type: model - AI model by Google DeepMind - **Order embeddings with layer norm** (University of Toronto) — 2016-07-21 - License: closed | Type: model - AI model by University of Toronto - **Layer Normalization: The Attentive Reader** (University of Toronto) — 2016-07-21 - License: closed | Type: model - AI model by University of Toronto - **Layer Normalization: Skip Thoughts** (University of Toronto) — 2016-07-21 - License: 
closed | Type: model - AI model by University of Toronto - **Layer Normalization: Draw** (University of Toronto) — 2016-07-21 - License: closed | Type: model - AI model by University of Toronto - **Layer Normalization: Handwriting sequence generation** (University of Toronto) — 2016-07-21 | Parameters: 3.7M - License: closed | Type: model - AI model by University of Toronto - **Character-enriched word2vec** (Facebook AI Research) — 2016-07-15 - License: closed | Type: model - AI model by Facebook AI Research - **VD-RHN** (ETH Zurich,IDSIA) — 2016-07-12 | Parameters: 32M - License: closed | Type: model - AI model by ETH Zurich,IDSIA - **Variational RHN + WT (PTB)** (ETH Zurich,IDSIA) — 2016-07-12 | Parameters: 23M - License: closed | Type: model - AI model by ETH Zurich,IDSIA - **fastText** (Facebook AI Research) — 2016-07-06 - License: closed | Type: model - AI model by Facebook AI Research - **node2vec** (Stanford University) — 2016-07-03 | Parameters: 1.3M - License: closed | Type: model - AI model by Stanford University - **CCL** (SenseTime,Chinese University of Hong Kong (CUHK),Chinese Academy of Sciences) — 2016-06-27 - License: closed | Type: model - AI model by SenseTime,Chinese University of Hong Kong (CUHK),Chinese Academy of Sciences - **Wide & Deep** (Google) — 2016-06-24 - License: closed | Type: model - AI model by Google - **R-FCN** (Tsinghua University,Microsoft Research) — 2016-06-21 - License: closed | Type: model - AI model by Tsinghua University,Microsoft Research - **DMN** (Salesforce) — 2016-06-20 - License: closed | Type: model - AI model by Salesforce - **Segmental RNN** (University of Edinburgh,Carnegie Mellon University (CMU),University of Washington) — 2016-06-20 - License: closed | Type: model - AI model by University of Edinburgh,Carnegie Mellon University (CMU),University of Washington - **CMS-RCNN** (IEEE) — 2016-06-17 | Parameters: 138M - License: closed | Type: model - AI model by IEEE - **PixelCNN** (Google DeepMind) — 2016-06-16 - 
License: closed | Type: model - AI model by Google DeepMind - **Part-of-sentence tagging model** (Carnegie Mellon University (CMU)) — 2016-05-29 - License: closed | Type: model - AI model by Carnegie Mellon University (CMU) - **LRR-4X** (UC Irvine) — 2016-05-08 | Parameters: 138M - License: open | Type: model - AI model by UC Irvine - **Dueling DQN** (Google DeepMind) — 2016-04-05 | Parameters: 1.7M - License: closed | Type: model - AI model by Google DeepMind - **Symmetric Residual Encoder-Decoder Net** (Nanjing University,University of Adelaide) — 2016-03-30 - License: closed | Type: model - AI model by Nanjing University,University of Adelaide - **Binarized Neural Network (MNIST)** (Technion - Israel Institute of Technology,Columbia University,University of Montreal / Université de Montréal) — 2016-03-17 | Parameters: 37M - License: closed | Type: model - AI model by Technion - Israel Institute of Technology,Columbia University,University of Montreal / Université de Montréal - **Template Adaptation** (University of Oxford) — 2016-03-12 | Parameters: 138M - License: closed | Type: model - AI model by University of Oxford - **Named Entity Recognition model** (Carnegie Mellon University (CMU)) — 2016-03-04 - License: closed | Type: model - AI model by Carnegie Mellon University (CMU) - **Double DQN** (Google DeepMind) — 2016-03-02 | Parameters: 1.5M - License: closed | Type: model - AI model by Google DeepMind - **Order-Embeddings of Images and Language** (University of Toronto) — 2016-03-01 - License: open | Type: model - AI model by University of Toronto - **BIG LSTM+CNN INPUTS** (Google Brain) — 2016-02-11 | Parameters: 1.0B - License: closed | Type: model - AI model by Google Brain - **10 LSTMS + KN-5 (OPTIMAL WEIGHTS)** (Google Brain) — 2016-02-11 | Parameters: 1.0B - License: closed | Type: model - AI model by Google Brain - **A3C FF hs** (Google,University of Montreal / Université de Montréal) — 2016-02-04 - License: closed | Type: model - AI model by 
Google,University of Montreal / Université de Montréal - **Convolutional Pose Machines** (Carnegie Mellon University (CMU)) — 2016-01-30 - License: closed | Type: model - AI model by Carnegie Mellon University (CMU) - **AlphaGo Lee** (DeepMind) — 2016-01-27 - License: closed | Type: model - AI model by DeepMind - **Variational (untied weights, MC) LSTM (Large)** (University of Cambridge) — 2015-12-16 | Parameters: 66M - License: closed | Type: model - AI model by University of Cambridge - **Advantage Learning** (Google DeepMind) — 2015-12-15 - License: closed | Type: model - AI model by Google DeepMind - **BPL** (University of Toronto,New York University (NYU),Massachusetts Institute of Technology (MIT)) — 2015-12-11 - License: closed | Type: model - AI model by University of Toronto,New York University (NYU),Massachusetts Institute of Technology (MIT) - **ResNet-152 (ImageNet)** (Microsoft) — 2015-12-10 | Parameters: 60.2M - License: closed | Type: model - AI model by Microsoft - **ResNet-101 (ImageNet)** (Microsoft) — 2015-12-10 | Parameters: 44.5M - License: open | Type: model - AI model by Microsoft - **DeepSpeech2 (English)** (Baidu Research - Silicon Valley AI Lab) — 2015-12-08 | Parameters: 38M - License: closed | Type: model - AI model by Baidu Research - Silicon Valley AI Lab - **Inception v3** (Google,University College London (UCL)) — 2015-12-02 | Parameters: 23.6M - License: closed | Type: model - AI model by Google,University College London (UCL) - **Netflix Recommender System** (Netflix) — 2015-12-01 - License: closed | Type: model - AI model by Netflix - **Multi-scale Dilated CNN** (Princeton University,Intel Labs) — 2015-11-23 - License: closed | Type: model - AI model by Princeton University,Intel Labs - **Highway Network** (IDSIA,SUPSI) — 2015-11-23 | Parameters: 2.3M - License: closed | Type: model - AI model by IDSIA,SUPSI - **3DDFA** (Chinese Academy of Sciences,Michigan State University) — 2015-11-23 | Parameters: 5.4M - License: closed | 
Type: model - AI model by Chinese Academy of Sciences,Michigan State University - **The Attentive Reader** (Google DeepMind) — 2015-11-19 - License: open | Type: model - AI model by Google DeepMind - **SAF R-CNN** (Beijing Institute of Technology,Sun Yat-sen University,Panasonic R&D,National University of Singapore) — 2015-10-28 | Parameters: 138M - License: closed | Type: model - AI model by Beijing Institute of Technology,Sun Yat-sen University,Panasonic R&D,National University of Singapore - **AlphaGo Fan** (DeepMind) — 2015-10-01 | Parameters: 8.2M - License: closed | Type: model - AI model by DeepMind - **Deep Deterministic Policy Gradients** (Google DeepMind) — 2015-09-09 - License: closed | Type: model - AI model by Google DeepMind - **LSTM-Char-Large** (Harvard University,New York University (NYU)) — 2015-08-26 | Parameters: 19M - License: closed | Type: model - AI model by Harvard University,New York University (NYU) - **Listen, Attend and Spell** (Google,Carnegie Mellon University (CMU)) — 2015-08-20 - License: closed | Type: model - AI model by Google,Carnegie Mellon University (CMU) - **DCNN** (University of Maryland,Rutgers University) — 2015-08-07 | Parameters: 5.0M - License: closed | Type: model - AI model by University of Maryland,Rutgers University - **Deep CNN + COTS** (IEEE) — 2015-07-26 | Parameters: 5.0M - License: closed | Type: model - AI model by IEEE - **CompACT-Deep** (University of California San Diego) — 2015-07-19 - License: closed | Type: model - AI model by University of California San Diego - **Search-Proven Best LSTM** (Google) — 2015-07-06 | Parameters: 20M - License: closed | Type: model - AI model by Google - **Skip-Thoughts** (University of Toronto,Massachusetts Institute of Technology (MIT),CIFAR AI Research) — 2015-06-22 - License: open | Type: model - AI model by University of Toronto,Massachusetts Institute of Technology (MIT),CIFAR AI Research - **BatchNorm** (Google) — 2015-06-15 | Parameters: 13.6M - License: closed | 
Type: model - AI model by Google - **CFSS** (SenseTime,Chinese University of Hong Kong (CUHK),Shenzhen Institute of Advanced Technology) — 2015-06-07 | Parameters: 17.4K - License: open | Type: model - AI model by SenseTime,Chinese University of Hong Kong (CUHK),Shenzhen Institute of Advanced Technology - **Faster R-CNN** (Microsoft Research) — 2015-06-04 - License: open | Type: model - AI model by Microsoft Research - **Draw** (Google DeepMind) — 2015-05-20 - License: closed | Type: model - AI model by Google DeepMind - **U-Net** (University of Freiburg) — 2015-05-18 | Parameters: 37.7M - License: open | Type: model - AI model by University of Freiburg - **Deep LSTM video classifier** (University of Texas at Austin,Google) — 2015-05-01 - License: closed | Type: model - AI model by University of Texas at Austin,Google - **Fast R-CNN** (Microsoft Research) — 2015-04-30 - License: closed | Type: model - AI model by Microsoft Research - **TC-DNN-BLSTM-DNN** (Carnegie Mellon University (CMU)) — 2015-04-06 | Parameters: 18.4M - License: closed | Type: model - AI model by Carnegie Mellon University (CMU) - **genCNN + dyn eval** (Chinese Academy of Sciences,Huawei Noah's Ark Lab,Dublin City University) — 2015-03-17 | Parameters: 8M - License: closed | Type: model - AI model by Chinese Academy of Sciences,Huawei Noah's Ark Lab,Dublin City University - **TRPO** (University of California (UC) Berkeley) — 2015-02-19 | Parameters: 33.5K - License: closed | Type: model - AI model by University of California (UC) Berkeley - **CRF-RNN** (University of Oxford,Stanford University,Baidu) — 2015-02-11 - License: closed | Type: model - AI model by University of Oxford,Stanford University,Baidu - **MSRA (C, PReLU)** (Microsoft Research) — 2015-02-06 | Parameters: 87.0M - License: closed | Type: model - AI model by Microsoft Research - **VGG-Face** (University of Oxford) — 2015-01-01 | Parameters: 138M - License: closed | Type: model - AI model by University of Oxford - **N-gram+Cache 
(PTB)** (Facebook AI Research) — 2014-12-24 - License: closed | Type: model - AI model by Facebook AI Research - **N-gram (PTB)** (Facebook AI Research) — 2014-12-24 - License: closed | Type: model - AI model by Facebook AI Research - **ADAM (CIFAR-10)** (University of Amsterdam,OpenAI,University of Toronto) — 2014-12-22 | Parameters: 2.4M - License: closed | Type: model - AI model by University of Amsterdam,OpenAI,University of Toronto - **DeepLab** (Google,University of California Los Angeles (UCLA)) — 2014-12-22 - License: closed | Type: model - AI model by Google,University of California Los Angeles (UCLA) - **Fractional Max-Pooling** (University of Warwick) — 2014-12-18 | Parameters: 27M - License: closed | Type: model - AI model by University of Warwick - **NTM** (Google DeepMind) — 2014-12-10 - License: closed | Type: model - AI model by Google DeepMind - **SNM-skip** (Google) — 2014-12-03 | Parameters: 62B - License: closed | Type: model - AI model by Google - **TA-CNN** (Chinese University of Hong Kong (CUHK)) — 2014-11-29 | Parameters: 706.0K - License: closed | Type: model - AI model by Chinese University of Hong Kong (CUHK) - **Cascaded LNet-ANet** (Chinese University of Hong Kong (CUHK)) — 2014-11-28 - License: closed | Type: model - AI model by Chinese University of Hong Kong (CUHK) - **Fully Convolutional Networks** (University of California (UC) Berkeley) — 2014-11-14 - License: closed | Type: model - AI model by University of California (UC) Berkeley - **SC-NLM** (University of Toronto) — 2014-11-10 - License: closed | Type: model - AI model by University of Toronto - **Spatially-Sparse CNN** (University of Warwick) — 2014-09-23 - License: closed | Type: model - AI model by University of Warwick - **GoogLeNet / InceptionV1** (Google,University of Michigan,University of North Carolina) — 2014-09-17 | Parameters: 6.8M - License: closed | Type: model - AI model by Google,University of Michigan,University of North Carolina - **SPN-4+KN5** (Singapore 
University of Technology & Design,DSO National Laboratories) — 2014-09-14 | Parameters: 5M - License: closed | Type: model - AI model by Singapore University of Technology & Design,DSO National Laboratories - **Seq2Seq LSTM** (Google) — 2014-09-10 | Parameters: 1.9B - License: closed | Type: model - AI model by Google - **Large regularized LSTM** (New York University (NYU),Google Brain) — 2014-09-08 | Parameters: 66M - License: closed | Type: model - AI model by New York University (NYU),Google Brain - **VGG16** (University of Oxford) — 2014-09-04 | Parameters: 138M - License: closed | Type: model - AI model by University of Oxford - **VGG19** (University of Oxford) — 2014-09-04 | Parameters: 144M - License: closed | Type: model - AI model by University of Oxford - **RNNsearch-50*** (Jacobs University Bremen,University of Montreal / Université de Montréal) — 2014-09-01 - License: closed | Type: model - AI model by Jacobs University Bremen,University of Montreal / Université de Montréal - **AdClickNet** (Facebook) — 2014-08-24 - License: closed | Type: model - AI model by Facebook - **NPD** (IEEE) — 2014-08-06 | Parameters: 313.9K - License: open | Type: model - AI model by IEEE - **ACF-WIDER** (Chinese Academy of Sciences) — 2014-07-15 | Parameters: 6.1K - License: closed | Type: model - AI model by Chinese Academy of Sciences - **SmooCT** (University College London (UCL)) — 2014-07-01 - License: closed | Type: model - AI model by University College London (UCL) - **DeepFace** (Tel Aviv University,Facebook) — 2014-06-23 - License: closed | Type: model - AI model by Tel Aviv University,Facebook - **RNN-WER** (DeepMind,University of Toronto) — 2014-06-22 | Parameters: 26.5M - License: closed | Type: model - AI model by DeepMind,University of Toronto - **Fragment embedding** (Stanford University) — 2014-06-21 | Parameters: 144.5M - License: closed | Type: model - AI model by Stanford University - **SPPNet** (Microsoft,Xi’an Jiaotong University,University of Science 
and Technology of China (USTC)) — 2014-06-18 - License: closed | Type: model - AI model by Microsoft,Xi’an Jiaotong University,University of Science and Technology of China (USTC) - **GANs** (University of Montreal / Université de Montréal) — 2014-06-10 - License: closed | Type: model - AI model by University of Montreal / Université de Montréal - **Two-stream ConvNets for action recognition** (University of Oxford) — 2014-06-09 - License: closed | Type: model - AI model by University of Oxford - **GRUs** (University of Montreal / Université de Montréal,Jacobs University,University of Maine) — 2014-06-03 - License: closed | Type: model - AI model by University of Montreal / Université de Montréal,Jacobs University,University of Maine - **Dropout: SVHN** (University of Toronto) — 2014-06-01 | Parameters: 47.8M - License: closed | Type: model - AI model by University of Toronto - **AdaRNN** (Beihang University) — 2014-06-01 | Parameters: 13.0K - License: closed | Type: model - AI model by Beihang University - **Paragraph Vector** (Google) — 2014-05-14 | Parameters: 32M - License: closed | Type: model - AI model by Google - **Clockwork RNN (CW-RNN)** (IDSIA,SUPSI) — 2014-02-14 | Parameters: 10K - License: closed | Type: model - AI model by IDSIA,SUPSI - **SPN-4** (Singapore University of Technology & Design,DSO National Laboratories) — 2014-01-01 - License: closed | Type: model - AI model by Singapore University of Technology & Design,DSO National Laboratories - **OverFeat** (New York University (NYU)) — 2013-12-21 | Parameters: 144M - License: closed | Type: model - AI model by New York University (NYU) - **Image generation** (University of Amsterdam) — 2013-12-20 | Parameters: 784K - License: closed | Type: model - AI model by University of Amsterdam - **DQN** (DeepMind) — 2013-12-19 | Parameters: 836.1K - License: closed | Type: model - AI model by DeepMind - **Network in Network** (National University of Singapore) — 2013-12-16 - License: closed | Type: model - AI 
model by National University of Singapore - **Deep RNN (PTB)** (MetaMind Inc) — 2013-12-11 | Parameters: 6M - License: closed | Type: model - AI model by MetaMind Inc - **RNN for 1B words** (Google) — 2013-12-11 | Parameters: 20B - License: closed | Type: model - AI model by Google - **TransE** (Universite de Technologie de Compiègne – CNRS,Google) — 2013-12-05 | Parameters: 942M - License: closed | Type: model - AI model by Universite de Technologie de Compiègne – CNRS,Google - **DeViSE** (Google) — 2013-12-05 - License: closed | Type: model - AI model by Google - **TensorReasoner** (Stanford University) — 2013-12-01 - License: closed | Type: model - AI model by Stanford University - **Visualizing CNNs** (New York University (NYU)) — 2013-11-12 - License: closed | Type: model - AI model by New York University (NYU) - **Word2Vec (large)** (Google) — 2013-10-16 | Parameters: 692M - License: closed | Type: model - AI model by Google - **RNTN** (Stanford University) — 2013-10-01 - License: closed | Type: model - AI model by Stanford University - **RCTM** (University of Oxford) — 2013-10-01 - License: closed | Type: model - AI model by University of Oxford - **Mitosis** (IDSIA) — 2013-09-22 | Parameters: 37.2K - License: closed | Type: model - AI model by IDSIA - **RNN+weight noise+dynamic eval** (University of Toronto) — 2013-08-04 | Parameters: 54M - License: closed | Type: model - AI model by University of Toronto - **Hierarchical Scene Labeling (Stanford Background)** (New York University (NYU)) — 2013-08-01 | Parameters: 51.6M - License: closed | Type: model - AI model by New York University (NYU) - **Fisher Vector image classifier** (Universidad Nacional de Cordoba,Intelligent Systems Lab Amsterdam,University of Amsterdam,LEAR Team,INRIA,Xerox Research Centre Europe (XRCE)) — 2013-06-12 - License: closed | Type: model - AI model by Universidad Nacional de Cordoba,Intelligent Systems Lab Amsterdam,University of Amsterdam,LEAR Team,INRIA,Xerox Research Centre Europe 
(XRCE) - **SemVec** (Microsoft Research) — 2013-06-09 - License: closed | Type: model - AI model by Microsoft Research - **Multilingual DNN** (Google) — 2013-05-26 | Parameters: 206.9M - License: closed | Type: model - AI model by Google - **ReLU-Speech** (Google,University of Toronto,New York University (NYU)) — 2013-05-26 | Parameters: 101.7M - License: closed | Type: model - AI model by Google,University of Toronto,New York University (NYU) - **Selective Search** (University of Trento,University of Amsterdam) — 2013-04-02 - License: closed | Type: model - AI model by University of Trento,University of Amsterdam - **Maxout Networks** (University of Montreal / Université de Montréal) — 2013-02-18 - License: closed | Type: model - AI model by University of Montreal / Université de Montréal - **Textual Imager** (Stanford University) — 2013-01-16 - License: closed | Type: model - AI model by Stanford University - **DistBelief NNLM** (Google) — 2013-01-16 - License: closed | Type: model - AI model by Google - **RNN (SGD+CLR)** (University of Montreal / Université de Montréal) — 2012-12-14 | Parameters: 195.6K - License: closed | Type: model - AI model by University of Montreal / Université de Montréal - **DistBelief Speech** (Google) — 2012-12-03 | Parameters: 47.2M - License: closed | Type: model - AI model by Google - **DistBelief Vision** (Google) — 2012-12-03 | Parameters: 1.7B - License: closed | Type: model - AI model by Google - **DNN EM segmentation** (IDSIA,SUPSI) — 2012-12-03 | Parameters: 218.9K - License: closed | Type: model - AI model by IDSIA,SUPSI - **Bayesian automated hyperparameter tuning** (University of Toronto,University of Sherbrooke,Harvard University) — 2012-12-02 - License: closed | Type: model - AI model by University of Toronto,University of Sherbrooke,Harvard University - **RNN+LDA** (Microsoft Research) — 2012-12-01 - License: closed | Type: model - AI model by Microsoft Research - **RNN+LSA+KN5+cache (model combination w/ linear 
extrapolation)** (Microsoft Research) — 2012-12-01 - License: closed | Type: model - AI model by Microsoft Research - **RNN** (Microsoft Research) — 2012-12-01 - License: closed | Type: model - AI model by Microsoft Research - **AlexNet** (University of Toronto) — 2012-09-30 | Parameters: 60M - License: closed | Type: model - AI model by University of Toronto - **LSTM LM** (RWTH Aachen University) — 2012-09-09 | Parameters: 102.7M - License: closed | Type: model - AI model by RWTH Aachen University - **Context-dependent RNN** (Microsoft Research,Brno University of Technology) — 2012-07-27 - License: closed | Type: model - AI model by Microsoft Research,Brno University of Technology - **Unsupervised High-level Feature Learner** (Google) — 2012-07-12 | Parameters: 1B - License: closed | Type: model - AI model by Google - **Ngram corpus** (Google) — 2012-07-08 - License: closed | Type: model - AI model by Google - **LBL** (University College London (UCL)) — 2012-06-27 | Parameters: 2M - License: closed | Type: model - AI model by University College London (UCL) - **Dropout (ImageNet)** (University of Toronto) — 2012-06-03 - License: closed | Type: model - AI model by University of Toronto - **Dropout (CIFAR)** (University of Toronto) — 2012-06-03 - License: closed | Type: model - AI model by University of Toronto - **Dropout (MNIST)** (University of Toronto) — 2012-06-03 | Parameters: 5.6M - License: closed | Type: model - AI model by University of Toronto - **MCDNN (MNIST)** (IDSIA,SUPSI) — 2012-02-13 | Parameters: 2.7M - License: closed | Type: model - AI model by IDSIA,SUPSI - **HOGWILD!** (University of Wisconsin Madison) — 2011-11-11 - License: closed | Type: model - AI model by University of Wisconsin Madison - **Adaptive Subgrad** (Technion - Israel Institute of Technology,Google,University of California (UC) Berkeley) — 2011-10-03 - License: closed | Type: model - AI model by Technion - Israel Institute of Technology,Google,University of California (UC) 
Berkeley - **CNN committee (traffic sign)** (IDSIA) — 2011-10-03 | Parameters: 1.4M - License: closed | Type: model - AI model by IDSIA - **CNN Committee (NIST)** (IDSIA) — 2011-09-18 | Parameters: 128.4K - License: closed | Type: model - AI model by IDSIA - **CNN Committee (MNIST)** (IDSIA) — 2011-09-18 | Parameters: 120.6K - License: closed | Type: model - AI model by IDSIA - **High Performance CNN (NORB)** (IDSIA,SUPSI) — 2011-07-16 | Parameters: 4.9M - License: closed | Type: model - AI model by IDSIA,SUPSI - **Recursive sentiment autoencoder** (Stanford University) — 2011-07-01 - License: closed | Type: model - AI model by Stanford University - **Recursive Neural Network** (Stanford University) — 2011-06-28 - License: closed | Type: model - AI model by Stanford University - **Cross-Lingual POS Tagger** (Carnegie Mellon University (CMU),Google Research) — 2011-06-19 - License: closed | Type: model - AI model by Carnegie Mellon University (CMU),Google Research - **Vector Space Model** (Stanford University) — 2011-06-19 | Parameters: 255K - License: closed | Type: model - AI model by Stanford University - **Deep Autoencoders** (University of Toronto) — 2011-04-29 | Parameters: 139.8M - License: closed | Type: model - AI model by University of Toronto - **Deep rectifier networks** (University of Montreal / Université de Montréal) — 2011-04-13 - License: closed | Type: model - AI model by University of Montreal / Université de Montréal - **Optimized Single-layer Net** (University of Michigan,Stanford University) — 2011-04-11 - License: closed | Type: model - AI model by University of Michigan,Stanford University - **KN5 LM + RNN 400/10 (WSJ)** (Brno University of Technology,Johns Hopkins University) — 2010-09-26 | Parameters: 22.2M - License: closed | Type: model - AI model by Brno University of Technology,Johns Hopkins University - **RNN 500/10 + RT09 LM (NIST RT05)** (Brno University of Technology,Johns Hopkins University) — 2010-09-26 | Parameters: 19.3M - 
License: closed | Type: model - AI model by Brno University of Technology,Johns Hopkins University - **RNN LM** (Johns Hopkins University) — 2010-09-26 | Parameters: 70.3M - License: closed | Type: model - AI model by Johns Hopkins University - **RNN 1000/5 + RT09 LM (NIST RT05)** (Brno University of Technology,Johns Hopkins University) — 2010-09-26 | Parameters: 77.0M - License: closed | Type: model - AI model by Brno University of Technology,Johns Hopkins University - **Pooling CNN (Caltech 101)** (University of Bonn) — 2010-09-15 | Parameters: 294.9K - License: closed | Type: model - AI model by University of Bonn - **Pooling CNN (NORB)** (University of Bonn) — 2010-09-15 | Parameters: 268.7K - License: closed | Type: model - AI model by University of Bonn - **Fisher-Boost** (Xerox Research Centre Europe (XRCE)) — 2010-09-05 - License: closed | Type: model - AI model by Xerox Research Centre Europe (XRCE) - **SimuParallelSGD** (Yahoo Research) — 2010-07-01 - License: closed | Type: model - AI model by Yahoo Research - **ReLU (LFW)** (University of Toronto) — 2010-06-15 - License: closed | Type: model - AI model by University of Toronto - **Mid-level Features** (INRIA,Ecole Normale Supérieure,New York University (NYU)) — 2010-06-13 - License: closed | Type: model - AI model by INRIA,Ecole Normale Supérieure,New York University (NYU) - **Deconvolutional Network** (New York University (NYU)) — 2010-06-13 - License: closed | Type: model - AI model by New York University (NYU) - **iCCCP** (Massachusetts Institute of Technology (MIT)) — 2010-06-13 - License: closed | Type: model - AI model by Massachusetts Institute of Technology (MIT) - **Feedforward NN** (University of Montreal / Université de Montréal) — 2010-05-13 | Parameters: 7.1M - License: closed | Type: model - AI model by University of Montreal / Université de Montréal - **6-layer MLP (MNIST)** (IDSIA,University of Lugano,SUPSI) — 2010-03-01 | Parameters: 12.1M - License: closed | Type: model - AI model by 
IDSIA,University of Lugano,SUPSI - **Stacked Denoising Autoencoders** (University of Montreal / Université de Montréal,University of Toronto) — 2010-01-03 - License: closed | Type: model - AI model by University of Montreal / Université de Montréal,University of Toronto - **Super-vector coding** (University of Illinois Urbana-Champaign (UIUC),NEC Laboratories,Rutgers University) — 2010-01-01 | Parameters: 1.0K - License: closed | Type: model - AI model by University of Illinois Urbana-Champaign (UIUC),NEC Laboratories,Rutgers University - **LCNP LabelMe** (University of Bonn) — 2009-11-22 | Parameters: 13.7M - License: closed | Type: model - AI model by University of Bonn - **3D city reconstruction** (University of Washington,Microsoft Research,Cornell University) — 2009-09-29 - License: closed | Type: model - AI model by University of Washington,Microsoft Research,Cornell University - **Two Stage Feature Extraction (MNIST)** (New York University (NYU)) — 2009-09-01 | Parameters: 258.8K - License: closed | Type: model - AI model by New York University (NYU) - **ConvNet Processor** (Courant Institute of Mathematical Sciences) — 2009-08-31 | Parameters: 14.4K - License: closed | Type: model - AI model by Courant Institute of Mathematical Sciences - **Pragmatic Theory solution (Netflix 2009)** (Pragmatic Theory Inc.) — 2009-08-01 - License: closed | Type: model - AI model by Pragmatic Theory Inc. 
- **Conditional Maximum Entropy Model (Gigaworld)** (Google) — 2009-07-01 | Parameters: 1M - License: closed | Type: model - AI model by Google - **GPU DBNs** (Stanford University) — 2009-06-15 | Parameters: 100M - License: closed | Type: model - AI model by Stanford University - **Conv-DBN** (Stanford University) — 2009-06-14 - License: closed | Type: model - AI model by Stanford University - **Deep Boltzmann Machines** (University of Toronto) — 2009-04-16 - License: closed | Type: model - AI model by University of Toronto - **RBM Image Classifier** (University of Toronto) — 2009-04-08 | Parameters: 80M - License: closed | Type: model - AI model by University of Toronto - **Long-Range Autonomous Off-Road Driving System** (Courant Institute of Mathematical Sciences) — 2009-01-08 | Parameters: 12.4K - License: closed | Type: model - AI model by Courant Institute of Mathematical Sciences - **BP-DBN** (University of Toronto) — 2009-01-01 | Parameters: 18.0M - License: closed | Type: model - AI model by University of Toronto - **GNN** (University of Siena) — 2008-12-09 | Parameters: 30 - License: open | Type: model - AI model by University of Siena - **HLBL** (University of Toronto) — 2008-12-08 | Parameters: 1.8M - License: closed | Type: model - AI model by University of Toronto - **ADAPTIVE NLPM** (University of Toronto) — 2008-12-08 | Parameters: 12.2M - License: closed | Type: model - AI model by University of Toronto - **Sparse digit recognition SVM** (University of Lubeck) — 2008-11-19 - License: closed | Type: model - AI model by University of Lubeck - **Boss (DARPA Urban Challenge)** (Carnegie Mellon University (CMU)) — 2008-07-23 - License: closed | Type: model - AI model by Carnegie Mellon University (CMU) - **Denoising Autoencoders** (University of Montreal / Université de Montréal) — 2008-07-05 - License: closed | Type: model - AI model by University of Montreal / Université de Montréal - **Semi-Supervised Embedding for DL** (Google,NUANCE 
Communications,IDIAP,University of Illinois Urbana-Champaign (UIUC)) — 2008-07-05 - License: closed | Type: model - AI model by Google,NUANCE Communications,IDIAP,University of Illinois Urbana-Champaign (UIUC) - **Deep Multitask NLP Network** (NEC Laboratories) — 2008-07-05 | Parameters: 1.5M - License: closed | Type: model - AI model by NEC Laboratories - **Multiscale deformable part model** (UC Irvine,University of Chicago,Toyota Technological Institute at Chicago) — 2008-06-23 - License: closed | Type: model - AI model by UC Irvine,University of Chicago,Toyota Technological Institute at Chicago - **Enhanced Neighborhood-Based Filtering** (AT&T) — 2007-10-28 - License: closed | Type: model - AI model by AT&T - **BLSTM for handwriting (1)** (University of Bern,IDSIA,Technical University of Munich) — 2007-09-23 - License: closed | Type: model - AI model by University of Bern,IDSIA,Technical University of Munich - **Fisher Kernel GMM** (Xerox) — 2007-07-16 - License: closed | Type: model - AI model by Xerox - **SB-LM** (Google) — 2007-06-22 | Parameters: 300B - License: closed | Type: model - AI model by Google - **KN-LM** (Google) — 2007-06-22 | Parameters: 21B - License: closed | Type: model - AI model by Google - **Empirical evaluation of deep architectures** (University of Montreal / Université de Montréal) — 2007-06-01 - License: closed | Type: model - AI model by University of Montreal / Université de Montréal - **Greedy layer-wise DNN training** (University of Montreal / Université de Montréal) — 2006-12-04 - License: closed | Type: model - AI model by University of Montreal / Université de Montréal - **Local Binary Patterns for facial recognition** (University of Oulu,IEEE) — 2006-12-01 - License: closed | Type: model - AI model by University of Oulu,IEEE - **Sparse Vision Encoding** (Stanford University) — 2006-11-01 - License: closed | Type: model - AI model by Stanford University - **Spatial Pyramid Matching** (INRIA,University of Illinois 
Urbana-Champaign (UIUC),Ecole Normale Supérieure) — 2006-06-17 - License: closed | Type: model - AI model by INRIA,University of Illinois Urbana-Champaign (UIUC),Ecole Normale Supérieure - **Hybrid CNN/SVM Object Categorizer** (Courant Institute of Mathematical Sciences) — 2006-06-17 | Parameters: 3.6M - License: closed | Type: model - AI model by Courant Institute of Mathematical Sciences - **SVM-CNN** (New York University (NYU)) — 2006-06-17 | Parameters: 90.9K - License: closed | Type: model - AI model by New York University (NYU) - **Crazy Stone** (INRIA) — 2006-05-29 - License: closed | Type: model - AI model by INRIA - **FAST** (University of Cambridge) — 2006-05-07 - License: closed | Type: model - AI model by University of Cambridge - **RL for helicopter flight** (University of California (UC) Berkeley,Stanford University) — 2006-03-09 - License: closed | Type: model - AI model by University of California (UC) Berkeley,Stanford University - **TFE SVM** (Centre de Recherche en Automatique de Nancy (CRAN),CENPARMI) — 2006-02-02 - License: closed | Type: model - AI model by Centre de Recherche en Automatique de Nancy (CRAN),CENPARMI - **Stanley (DARPA Grand Challenge 2)** (Stanford University) — 2006-01-01 - License: closed | Type: model - AI model by Stanford University - **Vision-based obstacle avoidance system (2005)** (New York University (NYU),Net-Scale technologies,NEC Laboratories) — 2005-12-05 | Parameters: 72K - License: closed | Type: model - AI model by New York University (NYU),Net-Scale technologies,NEC Laboratories - **Monocular Depth Prediction** (Stanford University) — 2005-12-05 | Parameters: 1.5M - License: closed | Type: model - AI model by Stanford University - **RankNet** (Microsoft Research,Microsoft) — 2005-08-07 | Parameters: 5.7K - License: closed | Type: model - AI model by Microsoft Research,Microsoft - **BiLSTM for Speech** (IDSIA,Technical University of Munich) — 2005-08-01 | Parameters: 152.1K - License: closed | Type: model - AI 
model by IDSIA,Technical University of Munich - **Synergistic Face Detector** (NEC Laboratories,Courant Institute of Mathematical Sciences) — 2004-12-01 | Parameters: 16.6K - License: closed | Type: model - AI model by NEC Laboratories,Courant Institute of Mathematical Sciences - **Invariant CNN** (New York University (NYU)) — 2004-06-27 | Parameters: 90.6K - License: closed | Type: model - AI model by New York University (NYU) - **Sandstorm (DARPA Grand Challenge I)** (Carnegie Mellon University (CMU)) — 2004-06-14 - License: closed | Type: model - AI model by Carnegie Mellon University (CMU) - **GPU implementation of neural networks** (Soongsil University) — 2004-06-01 - License: closed | Type: model - AI model by Soongsil University - **RankBoost (EachMovie)** (Columbia University,Princeton University,Hebrew University of Jerusalem) — 2003-11-15 - License: closed | Type: model - AI model by Columbia University,Princeton University,Hebrew University of Jerusalem - **RankBoost (meta-search)** (Columbia University,Princeton University,Hebrew University of Jerusalem) — 2003-11-15 - License: closed | Type: model - AI model by Columbia University,Princeton University,Hebrew University of Jerusalem - **Bayesian object categorizer** (California Institute of Technology,University of Oxford) — 2003-10-13 | Parameters: 100 - License: closed | Type: model - AI model by California Institute of Technology,University of Oxford - **NPLM (Brown)** (University of Montreal / Université de Montréal) — 2003-03-15 | Parameters: 4.1M - License: closed | Type: model - AI model by University of Montreal / Université de Montréal - **NPLM (AP News)** (University of Montreal / Université de Montréal) — 2003-03-15 | Parameters: 11.9M - License: closed | Type: model - AI model by University of Montreal / Université de Montréal - **LDA** (Stanford University) — 2003-02-02 - License: closed | Type: model - AI model by Stanford University - **Statistical Shape Constellations** (California 
Institute of Technology) — 2003-01-01 - License: closed | Type: model - AI model by California Institute of Technology - **Web mining + Decision tree recommender** (Korea Advanced Institute of Science and Technology (KAIST)) — 2002-10-01 - License: closed | Type: model - AI model by Korea Advanced Institute of Science and Technology (KAIST) - **Tagging via Viterbi Decoding** (AT&T) — 2002-06-01 - License: closed | Type: model - AI model by AT&T - **NEAT** (UT Austin) — 2002-06-01 - License: closed | Type: model - AI model by UT Austin - **Decision tree (classification)** (Mitsubishi Electric Research Labs,Compaq CRL) — 2001-12-08 | Parameters: 12K - License: closed | Type: model - AI model by Mitsubishi Electric Research Labs,Compaq CRL - **Gradient Boosting Machine** (Stanford University) — 2001-10-01 - License: closed | Type: model - AI model by Stanford University - **Immediate trihead** (Brown University) — 2001-07-06 - License: closed | Type: model - AI model by Brown University - **Restricted Boltzmann machine for Face Recognition** (University of Toronto,University College London (UCL)) — 2001-04-01 - License: closed | Type: model - AI model by University of Toronto,University College London (UCL) - **PoE MNIST** (University College London (UCL)) — 2000-11-28 | Parameters: 3.9M - License: closed | Type: model - AI model by University College London (UCL) - **Neural LM** (University of Montreal / Université de Montréal) — 2000-11-28 | Parameters: 6.9M - License: closed | Type: model - AI model by University of Montreal / Université de Montréal - **SVD in recommender systems** (University of Minnesota) — 2000-07-14 - License: closed | Type: model - AI model by University of Minnesota - **Credibility Network** (University College London (UCL),University of Toronto) — 1999-07-01 | Parameters: 324 - License: closed | Type: model - AI model by University College London (UCL),University of Toronto - **Learning to Order Things** (AT&T) — 1999-05-15 - License: closed 
| Type: model - AI model by AT&T
- **LeNet-5** (AT&T) — 1998-11-01 | Parameters: 60K - License: closed | Type: model - AI model by AT&T
- **LSTM** (Technical University of Munich) — 1997-11-15 | Parameters: 10.5K - License: closed | Type: model - AI model by Technical University of Munich
- **n-gram LM** (University of Cambridge,Carnegie Mellon University (CMU)) — 1997-07-01 - License: closed | Type: model - AI model by University of Cambridge,Carnegie Mellon University (CMU)
- **Deep Blue** (IBM) — 1997-05-01 | Parameters: 8K - License: closed | Type: model - AI model by IBM
- **AdaBoost.M2 Digit Recognition** (AT&T) — 1996-07-03 - License: closed | Type: model - AI model by AT&T
- **System 11** (Carnegie Mellon University (CMU)) — 1996-06-18 | Parameters: 6.5K - License: closed | Type: model - AI model by Carnegie Mellon University (CMU)
- **LISSOM** (University of Texas at Austin) — 1995-11-27 | Parameters: 432.8K - License: closed | Type: model - AI model by University of Texas at Austin
- **Multi-cause Binary Clustering** (Xerox) — 1995-01-01 - License: closed | Type: model - AI model by Xerox
- **Predictive Coding NN** (Technical University of Munich) — 1994-12-02 | Parameters: 206.9K - License: closed | Type: model - AI model by Technical University of Munich
- **Ceramic-MLP** (Sapienza Università di Roma) — 1994-01-07 | Parameters: 1.9K - License: closed | Type: model - AI model by Sapienza Università di Roma
- **Learning-curve prediction** (AT&T) — 1993-11-29 - License: closed | Type: model - AI model by AT&T
- **Siamese-TDNN** (Bell Laboratories) — 1993-08-01 | Parameters: 744 - License: closed | Type: model - AI model by Bell Laboratories
- **Boosting** (Bell Laboratories) — 1992-11-30 | Parameters: 2.6K - License: closed | Type: model - AI model by Bell Laboratories
- **Cancer drug mechanism prediction** (National Cancer Institute) — 1992-10-16 | Parameters: 594 - License: closed | Type: model - AI model by National Cancer Institute
- **Golem** (Alan
Turing Institute) — 1992-10-01 - License: closed | Type: model - AI model by Alan Turing Institute
- **Fuzzy NN** (Indian Statistical Institute) — 1992-09-01 | Parameters: 1.2K - License: closed | Type: model - AI model by Indian Statistical Institute
- **TD-Gammon** (IBM) — 1992-05-01 | Parameters: 25K - License: closed | Type: model - AI model by IBM
- **ISR network** (Stanford University) — 1990-10-01 - License: closed | Type: model - AI model by Stanford University
- **NETtalk reimplementation** (Oregon State University) — 1990-06-01 | Parameters: 27.5K - License: closed | Type: model - AI model by Oregon State University
- **Zip CNN** (AT&T,Bell Laboratories) — 1989-12-01 | Parameters: 9.8K - License: closed | Type: model - AI model by AT&T,Bell Laboratories
- **Innervator** (Stanford University,California Institute of Technology) — 1989-12-01 | Parameters: 10 - License: closed | Type: model - AI model by Stanford University,California Institute of Technology
- **ALVINN** (Carnegie Mellon University (CMU)) — 1989-12-01 | Parameters: 36.6K - License: closed | Type: model - AI model by Carnegie Mellon University (CMU)
- **Speaker-independent vowel classification** (University of Washington) — 1989-11-27 | Parameters: 3.0K - License: closed | Type: model - AI model by University of Washington
- **Handwritten digit recognition network** (AT&T) — 1989-11-27 | Parameters: 2.6K - License: closed | Type: model - AI model by AT&T
- **Invariant image recognition** (Complutense University of Madrid) — 1989-06-18 - License: closed | Type: model - AI model by Complutense University of Madrid
- **Truck backer-upper** (Stanford University) — 1989-06-18 | Parameters: 805 - License: closed | Type: model - AI model by Stanford University
- **Time-delay neural networks** (Advanced Telecommunications Research Institute,Carnegie Mellon University (CMU)) — 1989-03-03 - License: closed | Type: model - AI model by Advanced Telecommunications Research Institute,Carnegie Mellon
University (CMU)
- **Q-learning** (University of London) — 1989-01-01 - License: closed | Type: model - AI model by University of London
- **MLP baggage detector** (Science Applications International Corporation / SAIC) — 1989-01-01 - License: closed | Type: model - AI model by Science Applications International Corporation / SAIC
- **MLN-ASR** (McGill University) — 1988-08-01 | Parameters: 10K - License: closed | Type: model - AI model by McGill University
- **MADALINE II** (Stanford University) — 1988-07-24 - License: closed | Type: model - AI model by Stanford University
- **Latent semantic analysis** (University of Chicago,Bell Laboratories,University of Western Ontario) — 1988-04-05 - License: closed | Type: model - AI model by University of Chicago,Bell Laboratories,University of Western Ontario
- **Translation-invariant MLP** (Carnegie Mellon University (CMU)) — 1987-06-15 | Parameters: 816 - License: closed | Type: model - AI model by Carnegie Mellon University (CMU)
- **NetTalk (transcription)** (Princeton University) — 1987-06-06 | Parameters: 18.6K - License: closed | Type: model - AI model by Princeton University
- **NetTalk (dictionary)** (Princeton University) — 1987-06-06 | Parameters: 18.6K - License: closed | Type: model - AI model by Princeton University
- **Optimized Multi-Scale Edge Detection** (Massachusetts Institute of Technology (MIT)) — 1986-11-01 - License: closed | Type: model - AI model by Massachusetts Institute of Technology (MIT)
- **MLP with back-propagation** (University of California San Diego,Carnegie Mellon University (CMU)) — 1986-10-01 | Parameters: 720 - License: closed | Type: model - AI model by University of California San Diego,Carnegie Mellon University (CMU)
- **Distributed representation NN** (Carnegie Mellon University (CMU)) — 1986-08-15 | Parameters: 432 - License: closed | Type: model - AI model by Carnegie Mellon University (CMU)
- **PDP model for serial order** (University of California San Diego) — 1986-01-05 -
License: closed | Type: model - AI model by University of California San Diego
- **Error Propagation** (University of California San Diego,Carnegie Mellon University (CMU)) — 1986-01-03 - License: closed | Type: model - AI model by University of California San Diego,Carnegie Mellon University (CMU)
- **Learnability theory of language development** (Massachusetts Institute of Technology (MIT)) — 1984-07-01 - License: closed | Type: model - AI model by Massachusetts Institute of Technology (MIT)
- **Hierarchical Cognitron** (NHK Broadcasting Science Research Laboratories) — 1984-04-01 | Parameters: 9.3K - License: closed | Type: model - AI model by NHK Broadcasting Science Research Laboratories
- **ASE+ACE** (University of Massachusetts Amherst) — 1983-09-01 | Parameters: 324 - License: closed | Type: model - AI model by University of Massachusetts Amherst
- **Neocognitron** (NHK Broadcasting Science Research Laboratories) — 1980-04-01 | Parameters: 1.1M - License: closed | Type: model - AI model by NHK Broadcasting Science Research Laboratories
- **Transfer Learning** (University of Zagreb) — 1976-07-01 - License: closed | Type: model - AI model by University of Zagreb
- **Statistical continuous speech recognizer** (Massachusetts Institute of Technology (MIT)) — 1976-04-30 - License: closed | Type: model - AI model by Massachusetts Institute of Technology (MIT)
- **Cognitron** (Biological Cybernetics) — 1975-09-01 | Parameters: 21.6K - License: closed | Type: model - AI model by Biological Cybernetics
- **Piecewise linear model** (University of Kansas) — 1973-11-01 | Parameters: 357 - License: closed | Type: model - AI model by University of Kansas
- **Self-Organizing Nets of Threshold Elements** (University of Tokyo) — 1972-11-30 - License: closed | Type: model - AI model by University of Tokyo
- **Graph-based structural reasoning** (Massachusetts Institute of Technology (MIT)) — 1970-09-01 - License: closed | Type: model - AI model by Massachusetts Institute of
Technology (MIT)
- **Decision tree adaline** (Tokyo Medical and Dental University) — 1969-05-01 | Parameters: 2.5K - License: closed | Type: model - AI model by Tokyo Medical and Dental University
- **GLEE** (University of Edinburgh) — 1968-07-01 - License: closed | Type: model - AI model by University of Edinburgh
- **Boxes (pole)** (University of Edinburgh) — 1968-07-01 - License: closed | Type: model - AI model by University of Edinburgh
- **LTE speaker verification system** (IBM) — 1966-11-01 | Parameters: 2.1K - License: closed | Type: model - AI model by IBM
- **Heuristic Reinforcement Learning** (Purdue University) — 1965-10-01 - License: closed | Type: model - AI model by Purdue University
- **MENACE** (University of Edinburgh) — 1963-11-01 - License: closed | Type: model - AI model by University of Edinburgh
- **STeLLA** (University of Canterbury) — 1963-06-01 - License: closed | Type: model - AI model by University of Canterbury
- **Print Recognition Logic** (IBM) — 1963-01-01 - License: closed | Type: model - AI model by IBM
- **MADALINE I** (Stanford University) — 1962-07-01 - License: closed | Type: model - AI model by Stanford University
- **Linear Decision Functions** (Bell Laboratories) — 1962-06-01 - License: closed | Type: model - AI model by Bell Laboratories
- **PAPA** (University of Genoa) — 1961-09-01 - License: closed | Type: model - AI model by University of Genoa
- **ADALINE** (Stanford University) — 1960-06-30 | Parameters: 17 - License: closed | Type: model - AI model by Stanford University
- **Perceptron (1960)** (Cornell Aeronautical Laboratory) — 1960-03-30 | Parameters: 1K - License: closed | Type: model - AI model by Cornell Aeronautical Laboratory
- **Samuel Neural Checkers** (IBM) — 1959-07-01 | Parameters: 16 - License: closed | Type: model - AI model by IBM
- **Pandemonium (morse)** (Massachusetts Institute of Technology (MIT)) — 1959-02-01 - License: closed | Type: model - AI model by Massachusetts Institute of Technology (MIT)
- **Perceptron Mark I** (Cornell Aeronautical Laboratory,Cornell University) — 1957-01-01 | Parameters: 1K - License: closed | Type: model - AI model by Cornell Aeronautical Laboratory,Cornell University
- **Sequence-based pattern recognition** (Massachusetts Institute of Technology (MIT)) — 1955-03-01 - License: closed | Type: model - AI model by Massachusetts Institute of Technology (MIT)
- **Genetic algorithm** (Institute for Advanced Study) — 1954-07-02 - License: closed | Type: model - AI model by Institute for Advanced Study
- **Theseus** (Bell Laboratories) — 1950-07-02 | Parameters: 40 - License: closed | Type: model - AI model by Bell Laboratories

## Data Structure

Each model entry includes:

- **name**: Model name (e.g., GPT-4, Claude 3, Gemini 1.5)
- **date**: Release date (YYYY-MM-DD)
- **org**: Organization (OpenAI, Anthropic, Google DeepMind, Meta, etc.)
- **params**: Parameter count (e.g., 175B, 1.8T) when available
- **type**: "model" or "milestone"
- **license**: "open", "closed", or "partial"
- **desc**: Brief description

## Source

- Data: https://lifearchitect.ai/models-table
- Code: https://github.com/duyet/monorepo/tree/master/apps/llm-timeline
- Last updated: 2026-03-25

## License

Data sourced from LifeArchitect.AI. Site code: MIT.
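
## Example: Parsing an Entry

The fields listed under Data Structure map onto each line of the Models list. The following is a minimal sketch of reading one such line into a typed record, assuming every entry follows the shape seen in this file (`- **name** (org) — date`, an optional `| Parameters: …` part, then license, type, and description). The names `ModelEntry`, `ENTRY_RE`, and `parse_entry` are hypothetical and are not taken from the site's code.

```python
import datetime as dt
import re
from dataclasses import dataclass
from typing import Optional

# Assumed line shape, inferred from the entries in this file:
# - **name** (org) — YYYY-MM-DD [| Parameters: params] - License: license | Type: type - desc
ENTRY_RE = re.compile(
    r"- \*\*(?P<name>.+?)\*\* \((?P<org>.+)\) — (?P<date>\d{4}-\d{2}-\d{2})"
    r"(?: \| Parameters: (?P<params>.+?))?"
    r" - License: (?P<license>\w+) \| Type: (?P<type>\w+) - (?P<desc>.*)"
)


@dataclass(frozen=True)
class ModelEntry:
    """One timeline entry, using the field names from the Data Structure section."""

    name: str
    date: dt.date
    org: str
    params: Optional[str]  # None when the entry has no "Parameters:" part
    type: str              # "model" or "milestone"
    license: str           # "open", "closed", or "partial"
    desc: str


def parse_entry(line: str) -> ModelEntry:
    """Parse a single list line into a ModelEntry, or raise ValueError."""
    match = ENTRY_RE.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"unrecognized entry line: {line!r}")
    fields = match.groupdict()
    fields["date"] = dt.date.fromisoformat(fields["date"])
    return ModelEntry(**fields)


if __name__ == "__main__":
    sample = (
        "- **Theseus** (Bell Laboratories) — 1950-07-02 | Parameters: 40"
        " - License: closed | Type: model - AI model by Bell Laboratories"
    )
    print(parse_entry(sample))
```

The organization group is matched greedily so that nested parentheses such as `Carnegie Mellon University (CMU)` stay inside `org`. The parameter count is kept as a string because values like `10.5K` or `1.8T` are not plain numbers.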