Study 004

The Model Divergence Report

Same question, different AI, different answers. How 8 major AI models disagree on which brands to recommend - and what it means for your visibility strategy.

43.3%
average agreement rate
4.0%
perfect consensus
798K+
comparisons analyzed
8
AI models compared
Last updated: March 11, 2026
[01]

The Landscape

The Multi-Model Reality

When someone asks ChatGPT, Claude, Gemini, or Perplexity for a product recommendation, they expect consistent answers. But our analysis of 797,644 comparisons reveals a startling truth: AI models agree less than half the time.

This isn't a bug - it's a fundamental characteristic of how different AI systems are trained, what data they've seen, and how they interpret intent. For brands, this means your visibility on ChatGPT tells you nothing about your visibility on Claude.

We analyzed 797,644 valid comparisons across 8 major AI models including Google AI Overviews, revealing patterns that should reshape how you think about AI visibility strategy.

Academic Context
Independent research confirms this pattern. An analysis of 567K LLM recommendations found that different models maintain distinct product preferences with surprisingly low overlap. Separately, researchers found that LLMs systematically favor global brands over local ones, with significant country-of-origin effects - a bias that compounds across models trained on different data.

The Bottom Line

Each AI model is its own channel.

Average Agreement
43.3%
models agree on #1 brand
Perfect Agreement
4.0%
all 8 models same answer
High Divergence
14.6%
queries with <25% agreement
Top-3 Overlap
2.8%
overlap in top 3 picks
[02]

Agreement Distribution

60% of queries have <50% model agreement

Agreement Rate Distribution
  0-25%    High divergence      14.6%   116,621 queries
  25-50%   Low agreement        45.1%   359,732 queries
  50-75%   Moderate             28.0%   223,267 queries
  75-99%   Good agreement        8.3%    66,466 queries
  100%     Perfect consensus     4.0%    31,558 queries

The Agreement Problem

When you ask different AI models the same question, they usually disagree. More than half of all queries (60%) have less than 50% agreement between models on the top brand recommendation.

Only 4.0% of queries achieve perfect consensus - all 8 models recommending the same brand. This is rare, and typically happens only for brands with overwhelming category dominance.
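The banding above is straightforward to reproduce. A minimal sketch, assuming a per-query agreement rate between 0.0 and 1.0; the function name and band edges are illustrative, inferred from the chart labels rather than taken from the study's pipeline:

```python
def agreement_band(rate):
    """Map a per-query agreement rate (0.0-1.0) to the report's bands.
    Band edges are assumed from the chart labels, not the study's code."""
    if rate == 1.0:
        return "Perfect consensus"
    if rate >= 0.75:
        return "Good agreement"
    if rate >= 0.50:
        return "Moderate"
    if rate >= 0.25:
        return "Low agreement"
    return "High divergence"

# 3 of 8 models sharing the #1 pick lands in the 25-50% band
print(agreement_band(3 / 8))  # Low agreement
```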

Why This Matters
Don't assume that being visible on ChatGPT means you're visible on Claude or Gemini. Each model is effectively a different channel with different audiences.
Perfect Agreement
4.0%
all 8 models same answer
High Divergence
14.6%
<25% agreement
[03]

Model Correlations

Pairwise Agreement Matrix (% of queries where both models pick the same #1 brand)

             OpenAI  Claude  Gemini   AIO  Grok  Deepseek  Meta  Perplexity
OpenAI          -      27%     21%   20%   24%     25%     17%      17%
Claude         27%      -      26%   19%   35%     35%     23%      15%
Gemini         21%     26%      -    17%   24%     26%     16%      12%
AIO            20%     19%     17%    -    17%     18%     12%      17%
Grok           24%     35%     24%   17%    -      31%     21%      14%
Deepseek       25%     35%     26%   18%   31%      -      22%      15%
Meta           17%     23%     16%   12%   21%     22%      -       10%
Perplexity     17%     15%     12%   17%   14%     15%     10%       -

Model Clusters Emerge

Some models tend to agree with each other more often, forming implicit clusters. Claude and Deepseek show the highest correlation at 35%.

Highest Agreement
Claude + Deepseek: 35%
Lowest Agreement
Meta + Perplexity: 10%
Why This Matters
Meta AI and Perplexity are the outliers - Meta agrees with any other model only 10-23% of the time, and Perplexity only 10-17%. If you're visible on these platforms, you're reaching a different audience.
Avg Pairwise
20%
Max Correlation
35%
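The study doesn't publish its pipeline, but a pairwise matrix like the one above can be computed from per-query top picks. A minimal sketch, assuming each query yields a dict of each responding model's #1 brand; the function and data shapes are illustrative:

```python
from collections import defaultdict
from itertools import combinations

def pairwise_agreement(queries):
    """queries: one dict per query mapping model name -> its #1 brand.
    Returns {(model_a, model_b): share of shared queries with the same #1 pick}.
    Only model pairs that both answered a given query are compared on it."""
    agree, total = defaultdict(int), defaultdict(int)
    for picks in queries:
        for a, b in combinations(sorted(picks), 2):
            total[(a, b)] += 1
            agree[(a, b)] += int(picks[a] == picks[b])
    return {pair: agree[pair] / total[pair] for pair in total}

# Toy data (brand names borrowed from the examples in this report)
queries = [
    {"Claude": "Gusto", "Deepseek": "Gusto", "Perplexity": "HiBob"},
    {"Claude": "Rippling", "Deepseek": "Gusto", "Perplexity": "HiBob"},
]
rates = pairwise_agreement(queries)
print(rates[("Claude", "Deepseek")])  # 0.5
```

Averaging the off-diagonal entries of the resulting matrix gives the ~20% average pairwise figure reported above.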
[04]

Model Coverage

Not All Models Show Up

Some AI models are more likely to provide brand recommendations than others. Meta appears in 95.0% of queries, while Google AIO shows up in only 56.5%.

This matters because if a model rarely provides recommendations in your category, your visibility strategy for that model may need a different approach.

Why This Matters
Google AIO is the most selective - it only provides brand recommendations for 56.5% of queries. When it does recommend, pay attention.
Highest Coverage
95.0%
Meta
Lowest Coverage
56.5%
AIO
Model Appearance Rate
1. Meta        95.0%
2. OpenAI      85.4%
3. Grok        83.0%
4. Gemini      82.2%
5. Deepseek    80.9%
6. Claude      79.9%
7. Perplexity  79.4%
8. AIO         56.5%
% of queries with a brand recommendation
[05]

Query Types

Comparison Queries = Highest Agreement

When users ask to compare specific brands ("Nike vs Adidas"), models agree 50.4% of the time. But "best of" and general queries cause the most disagreement - exactly the queries where brands have the most opportunity.

This makes sense: comparison queries have clearer context, while open-ended recommendations leave more room for interpretation.

Why This Matters
Target "best of" and general queries for growth opportunities. High divergence means competitors haven't locked in these queries yet.
Best Agreement
50.4%
Comparison queries
Lowest Agreement
42.2%
General queries
Agreement by Query Type
  Comparison      (34,300 queries)    50.4% agreement   10.8% high div.
  How-to          (20,643 queries)    45.3% agreement   13.4% high div.
  Alternative     (7,596 queries)     44.1% agreement   11.4% high div.
  Best-of         (375,048 queries)   43.4% agreement   14.8% high div.
  Recommendation  (43,052 queries)    43.1% agreement   14.4% high div.
  General         (317,005 queries)   42.2% agreement   15.0% high div.
Sorted by agreement rate. High div. = queries with <25% agreement.
[06]

Maximum Divergence

These real examples show complete disagreement - 8 different AI models recommending 8 different brands for the same query. This is the reality brands need to understand.

"Will I be approved for caravan finance with a 550 credit score?"
general8/8 different
OpenAI
My Financing USA
Claude
Roadloans
Gemini
Oodle Car Finance
AIO
Carvana
Grok
Capital One Auto Finance
Deepseek
LightStream
Meta
Southeast Financial
Perplexity
LendingTree Auto
"compare integrated concrete solutions for complex infrastructure projects"
comparison8/8 different
OpenAI
CEMEX
Claude
BASF Master Builders Solutions
Gemini
Holcim
AIO
PERI
Grok
Sika
Deepseek
UltraTech Cement
Meta
CRH
Perplexity
Heidelberg Materials
"best payroll and HR platform for a fast-growing remote startup"
best of8/8 different
OpenAI
Gusto
Claude
Rippling
Gemini
Deel
AIO
ADP Workforce Now
Grok
BambooHR
Deepseek
Paychex Flex
Meta
Workday
Perplexity
HiBob
Each row shows how the same question yields completely different brand recommendations across AI models
[07]

When They Agree

Strong Dominance + Clear Context = Agreement

When ALL 8 models agree on a recommendation, it is typically for queries where:

  • A single brand has overwhelming category dominance
  • The query is highly specific or niche
  • There's clear category definition with limited alternatives
Why This Matters
Perfect agreement is rare (4.0%) but achievable. It happens most often in tightly defined categories where one option is consistently recognized as the category leader.
Perfect Agreement
4.0%
Unanimous Queries
31,558
100% Agreement Examples
Which cloud monitoring platform is best for Kubernetes observability?
→ single category leader (comparison)
What is the safest default marketplace for downloading Android apps?
→ single category leader (general)
What is the best CI/CD tool for a small engineering team?
→ single category leader (best of)
Which map app is most trusted for live traffic rerouting during commuting hours?
→ single category leader (general)
Best password manager for cross-device team sharing?
→ single category leader (best of)
Which video conferencing tool is most common in enterprise meetings?
→ single category leader (comparison)
Which documentation platform is best for public API references?
→ single category leader (comparison)
Which marketplace is most recognized for handmade goods from independent creators?
→ single category leader (best of)
All 8 models selected the same top recommendation for these queries
[08]

The Playbook

The old playbook is broken.

Optimizing for "AI" as a single channel doesn't work. Each model sees a different web. Here's what that means for your strategy.

Track each model separately
Visibility on ChatGPT tells you nothing about your visibility on Claude or Gemini. Treat each model as its own channel.
Model-specific winner opportunity
If competitors dominate ChatGPT, consider focusing on Gemini or Meta AI where rankings may be more fluid.
Meta AI = different audience
With only 10-23% correlation with other models, Meta AI users see a completely different set of brand recommendations.
Comparison queries are stable
At 50.4% agreement, comparison queries offer the most predictable visibility. Invest in comparison content for consistent results across models.
"Best of" queries = opportunity
With 14.8% of "best of" queries showing high divergence, the winner isn't locked in. Room for underdogs to break through.
Perplexity is its own channel
With 10-17% correlation to other models, Perplexity requires a separate optimization strategy.
[09]

Methodology

Data Scale
Reports Analyzed
44,088
Comparisons
798K+
Models
8
Unique Prompts
8,902
Models Compared
OpenAI, Claude, Gemini, AIO, Grok, Deepseek, Meta, Perplexity
Data collected: Aug 2025 - Mar 2026

How We Measured Agreement

Data Collection

We analyzed 44,088 brand visibility reports from Trakkr, each containing responses from up to 8 major AI models for the same set of queries.

Agreement Calculation

For each query, we identified the #1 recommended brand from each model, then calculated what percentage of models agreed on the same top brand. A 43.3% average agreement means less than half of models typically agree.

Quality Filtering

Comparisons were filtered to include only queries where at least 5 models provided a valid brand recommendation, so each agreement figure reflects a meaningful number of model responses.
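The agreement metric and filter described above can be sketched in a few lines. This is a minimal illustration under the stated definitions, not the study's actual code; function and variable names are hypothetical:

```python
from collections import Counter

MIN_MODELS = 5  # quality threshold from the methodology

def agreement_rate(top_picks):
    """Share of models that chose the modal #1 brand for one query.
    top_picks: each model's top recommendation, e.g. ["Gusto", "Gusto", "Rippling"]."""
    modal_count = Counter(top_picks).most_common(1)[0][1]
    return modal_count / len(top_picks)

def average_agreement(queries):
    """Mean agreement across queries that pass the >=5-model filter."""
    rates = [agreement_rate(p) for p in queries if len(p) >= MIN_MODELS]
    return sum(rates) / len(rates) if rates else 0.0

# 5 of 8 models name the same brand -> 62.5% agreement for that query
print(agreement_rate(["A", "A", "A", "A", "A", "B", "C", "D"]))  # 0.625
```

Averaging this per-query rate over all 797,644 filtered comparisons is what yields the 43.3% headline figure.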