Complete Online LLM Rankings — Every Model, Every Price, What Works Right Now

Online LLM Rankings · Updated Daily

A living scoreboard of every major online large language model available today. All models, all prices, all performance scores. No hype. Just results.

Data source: Chatbot Arena (LMSYS)

There are over 50 significant LLMs available online today across a dozen providers. Most are fine at everything and great at nothing. The ones that actually excel at specific tasks are the ones worth knowing about.

This page tracks which models lead in each category, how those rankings shift over time, exactly what they cost, and exactly how we test them. The data updates automatically every day based on the latest Chatbot Arena results, benchmark scores, and real-world performance tests.

How These Rankings Work

Our Testing Method (In Plain English)

We score each model on a 100-point scale across three areas:

  • Accuracy (40 points): Does it get the answer right? We run standardized tests for each field — coding problems, reasoning puzzles, creative writing, multimodal understanding.
  • Reliability (35 points): Is it consistent? We test each model 50 times on similar tasks and measure how often it produces quality output without errors or hallucinations.
  • Usability (25 points): Is it practical? We factor in speed, cost, ease of access, and how well the model explains its reasoning.

Important: We do not accept payment or sponsorship from AI companies. Rankings are based entirely on independent testing and publicly available benchmark data from sources like Chatbot Arena (LMSYS), SWE-bench, MMLU-Pro, Humanity’s Last Exam, and LiveCodeBench.
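
As a rough illustration of how the 100-point scale above could combine its three areas, here is a minimal Python sketch. Only the 40/35/25 split and the 50-run reliability check come from the description above. The names `AreaResults`, `reliability_fraction`, and `composite_score` are hypothetical, and so is the assumption that each area is first reduced to a fraction between 0 and 1 and then scaled linearly. It is not the site's actual scoring code.

```python
from dataclasses import dataclass

# Maximum points per area, taken from the three-area breakdown above.
MAX_POINTS = {"accuracy": 40, "reliability": 35, "usability": 25}


@dataclass
class AreaResults:
    """Per-area results, each expressed as a fraction from 0.0 to 1.0 (assumed)."""
    accuracy: float     # share of standardized test items answered correctly
    reliability: float  # share of repeated runs judged acceptable
    usability: float    # normalized rating for speed, cost, access, explanations


def reliability_fraction(passed_runs: int, total_runs: int = 50) -> float:
    """Turn 'N acceptable runs out of 50' into a fraction."""
    if total_runs <= 0:
        raise ValueError("total_runs must be positive")
    if not 0 <= passed_runs <= total_runs:
        raise ValueError("passed_runs must be between 0 and total_runs")
    return passed_runs / total_runs


def composite_score(results: AreaResults) -> float:
    """Scale each fraction by its area's maximum points and sum to a 0-100 score."""
    fractions = {
        "accuracy": results.accuracy,
        "reliability": results.reliability,
        "usability": results.usability,
    }
    for name, value in fractions.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    total = sum(MAX_POINTS[name] * value for name, value in fractions.items())
    return round(total, 1)


if __name__ == "__main__":
    # Hypothetical model: 90% accuracy, 45 of 50 clean runs, 80% usability.
    example = AreaResults(0.9, reliability_fraction(45), 0.8)
    print(composite_score(example))
```

Under this linear assumption, a model that is flawless on accuracy but fails half of its repeated runs loses 17.5 points, which is why consistency matters nearly as much as correctness in the final number.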

The Complete Model Directory

Every model currently available online, ranked by Text Arena Elo score, with pricing and key specs.

Current Rankings by Category

Where to Start

You do not need the top model in every category. You need the right tool for the work you actually do.

For most people: Start with Claude Sonnet 4.7 as your general assistant ($3/$15). It’s 40% cheaper than Opus, ranks in the top 3 across most categories, and handles everyday tasks exceptionally well.

For developers: Use Claude Opus 4.7 for hard coding problems, but consider Qwen 3.7 Max ($2.50/$7.50) as your volume default. It’s a third of the price and holds a #4 Code Arena ranking.

For enterprises: Gemini 2.5 Pro ($2/$12) offers the best price-to-performance ratio for general tasks, with a 1M token context window.

For cost-sensitive teams: DeepSeek V4 Flash ($0.14/$0.28) is the cheapest production-ready model. Qwen 3.7 Plus ($0.40/$1.60) adds vision support for only slightly more.
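
To compare these price pairs against a real workload, a small calculator helps. The sketch below uses the prices quoted in the recommendations above and assumes each pair means US dollars per one million input tokens and per one million output tokens. That unit is an assumption, since the page does not state it. The function names and the example token volumes are hypothetical.

```python
# (input_price, output_price) pairs quoted in the recommendations above.
# Assumption: USD per one million tokens; the page does not state the unit.
PRICES = {
    "Claude Sonnet 4.7": (3.00, 15.00),
    "Qwen 3.7 Max": (2.50, 7.50),
    "Gemini 2.5 Pro": (2.00, 12.00),
    "DeepSeek V4 Flash": (0.14, 0.28),
    "Qwen 3.7 Plus": (0.40, 1.60),
}

TOKENS_PER_PRICE_UNIT = 1_000_000


def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in USD for the given token volumes."""
    input_price, output_price = PRICES[model]
    total = input_tokens * input_price + output_tokens * output_price
    return total / TOKENS_PER_PRICE_UNIT


def cheapest_first(input_tokens: int, output_tokens: int) -> list[tuple[str, float]]:
    """All listed models sorted by estimated cost for the same workload."""
    costs = [(m, workload_cost(m, input_tokens, output_tokens)) for m in PRICES]
    return sorted(costs, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical monthly workload: 10M input tokens, 2M output tokens.
    for model, cost in cheapest_first(10_000_000, 2_000_000):
        print(f"{model:<20} ${cost:,.2f}")
```

Because output tokens are priced higher than input tokens for every model listed, the ordering can change with the input-to-output ratio, so it is worth running the comparison with your own volumes.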

Methodology Notes

Rankings are derived from:

  • Chatbot Arena (LMSYS) — 6M+ blind pairwise human votes across text, code, and vision tasks
  • SWE-bench Verified — Real GitHub issue resolution
  • MMLU-Pro — Professional knowledge assessment
  • Humanity’s Last Exam — Expert-level reasoning
  • LiveCodeBench — Real-world coding problems
  • Terminal-Bench — Long-horizon terminal-based coding
  • ARC-AGI-2 — Fluid intelligence benchmark
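
For readers curious how blind pairwise votes can become a ranking, here is a minimal sketch of a classic online Elo update in Python. It is offered only as an illustration of the general idea. The Arena's actual rating procedure may differ, and the starting rating of 1000, the K-factor of 32, and the function names are all assumptions.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def apply_vote(ratings: dict[str, float], winner: str, loser: str, k: float = 32.0) -> None:
    """Shift both ratings after one vote; an upset moves them more than an expected win."""
    delta = k * (1.0 - expected_score(ratings[winner], ratings[loser]))
    ratings[winner] += delta
    ratings[loser] -= delta


def rate(votes: list[tuple[str, str]], initial: float = 1000.0, k: float = 32.0) -> dict[str, float]:
    """Process (winner, loser) votes in order and return the final ratings."""
    ratings: dict[str, float] = {}
    for winner, loser in votes:
        ratings.setdefault(winner, initial)
        ratings.setdefault(loser, initial)
        apply_vote(ratings, winner, loser, k)
    return ratings


if __name__ == "__main__":
    # Hypothetical votes between three placeholder models.
    votes = [("model-a", "model-b"), ("model-a", "model-c"), ("model-c", "model-b")]
    for name, rating in sorted(rate(votes).items(), key=lambda p: -p[1]):
        print(f"{name}: {rating:.1f}")
```

A simple online update like this depends on the order in which votes arrive, which is one reason a leaderboard might prefer to fit all votes at once instead.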

These rankings update automatically every 24 hours based on the latest Arena data and benchmark releases.

Author: Jon-Paul Walton