
Answer Bubbles: What Happens When AI Search Engines Disagree
A new study of 11,000 queries reveals that Search GPT and Google AI Overviews draw from fundamentally different sources, flatten uncertainty, and create invisible "information realities" that vary by platform. The implications for brand visibility are significant.
You probably assume that when someone asks an AI search engine about your brand, the answer is roughly the same regardless of which engine they use. That assumption is wrong, and a new paper quantifies exactly how wrong.
Researchers at the University of Illinois just published the first large-scale comparative study of how generative search systems differ from each other and from traditional Google Search. They called the phenomenon "answer bubbles" - a deliberate echo of Eli Pariser's "filter bubbles," but arguably worse. Filter bubbles gave different users different results. Answer bubbles give the same user asking the same question structurally different information realities depending on which AI search engine they happen to use.
The study is rigorous - 11,000 real user queries across 11 topics, tested on vanilla ChatGPT, Search GPT, Google AI Overviews, and traditional Google Search. The findings are uncomfortable. And they matter for anyone who cares about how their brand appears in AI-mediated search.
The same query issued to different AI search engines produces fundamentally different information realities - different sources, different linguistic framing, different epistemic confidence - invisible to the end user.
University of Illinois Urbana-Champaign, March 2026
Different engines, different worlds
The headline finding: Search GPT and Google AI Overviews draw from fundamentally different source pools. The overlap between their top-100 most-cited domains is just 24%. To put that in perspective, Google AI Overviews and traditional Google Search share 68% of their top-100 domains. Search GPT is pulling from a largely separate universe of sources.
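The overlap figure is a straightforward set comparison. A minimal sketch of how you might compute it on your own citation logs, with hypothetical domain lists standing in for real logged data:

```python
from collections import Counter

def top_domains(cited_domains, n=100):
    """Set of the n most frequently cited domains."""
    return {d for d, _ in Counter(cited_domains).most_common(n)}

def overlap_pct(engine_a, engine_b, n=100):
    """Percent of the top-n domains the two engines share."""
    a, b = top_domains(engine_a, n), top_domains(engine_b, n)
    return 100 * len(a & b) / n

# Illustrative citation logs, one domain per citation event
searchgpt = ["wikipedia.org"] * 3 + ["britannica.com"] * 2 + ["reuters.com"]
google_aio = ["wikipedia.org"] * 2 + ["reddit.com"] * 2 + ["facebook.com"]
overlap_pct(searchgpt, google_aio, n=3)  # only wikipedia.org is shared
```

By the study's version of this measure, Search GPT and Google AIO land at 24% where AIO and traditional Google land at 68%.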
The composition of these source pools is revealing. Search GPT draws heavily from encyclopedic and reference sources: Wikipedia appears in 49% of queries, Britannica in 28%, Reuters in 11%. It cites social media platforms in just 0.1% of responses. Google AI Overviews, by contrast, cites Facebook in 17% of queries, Reddit in 10%, YouTube in 14%. Google draws from the broader, messier web. Search GPT draws from a curated, authoritative slice of it.
The practical implication is immediate. If your brand's strongest presence is on Reddit, forums, and social media, Search GPT effectively doesn't see you. If your brand's authority comes from long-form editorial content and Wikipedia references, Google AIO might underweight you relative to competitors with strong social signals.
Source preferences by platform
Search GPT favors encyclopedic/reference sources (27.3% of citations), news and media (7.2%), and institutional content. Near-zero social media.
Google AI Overviews draws more broadly, including social media and forums (8.5%), educational platforms (study.com, brainly.com), and user-generated content.
Traditional Google sits between the two but closer to AIO, with the broadest source diversity at 26,891 unique domains versus Search GPT's 7,606.
The Wikipedia compounding effect
Wikipedia's role is worth pausing on, because the bias isn't just about citation frequency. It's about compounding.
Across all three systems, Wikipedia is the most-cited domain. That's not surprising. But the researchers found something more concerning: Wikipedia is not only the most-cited source, it's also the most over-represented source in the synthesized answers. When the AI generates its response, Wikipedia content receives disproportionate weight - more than its share of citations would predict.
The numbers: Wikipedia is over-represented by +2.6 percentage points in Search GPT and +5.4pp in Google AIO. That might sound small, but in a system where dozens of sources compete for influence over a paragraph of text, it means Wikipedia content shapes the answer more than everything else. The bias starts at source selection (Wikipedia gets cited most often) and compounds during synthesis (Wikipedia content gets extracted most heavily).
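Over-representation in percentage points is just the gap between a source's share of the synthesized content and its share of citations. A sketch, using illustrative counts rather than the paper's raw data:

```python
def over_representation_pp(citation_counts, synthesis_counts):
    """Per-domain gap (in percentage points) between a domain's share of
    synthesized content and its share of citations. Positive values mean
    the domain shapes answers more than its citation rate predicts."""
    total_cite = sum(citation_counts.values())
    total_syn = sum(synthesis_counts.values())
    return {
        domain: round(
            (synthesis_counts.get(domain, 0) / total_syn
             - citation_counts[domain] / total_cite) * 100, 1)
        for domain in citation_counts
    }

# Hypothetical counts: Wikipedia gets 30% of citations but 40% of synthesis
gaps = over_representation_pp(
    {"wikipedia.org": 30, "other.com": 70},
    {"wikipedia.org": 40, "other.com": 60},
)  # gaps["wikipedia.org"] == 10.0
```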
For brands, this means your Wikipedia page isn't just one signal among many. It's the single most influential piece of content for how AI search engines describe you. If your Wikipedia entry is outdated, incomplete, or written by a competitor-sympathetic editor, that damage gets amplified through the entire AI search pipeline.
Epistemic flattening: how AI makes everything sound certain
The second major finding goes beyond which sources AI uses and into how it presents information. The researchers measured the linguistic and epistemic qualities of AI-generated responses using 73 psycholinguistic categories. What they found is subtle but important.
When an AI search engine retrieves sources and generates a summary, it doesn't just compress the information. It selectively reshapes its epistemic character. Hedging language - words like "maybe," "perhaps," "might" - drops by 40% from vanilla ChatGPT to Search GPT, and by 60% from vanilla ChatGPT to Google AIO. But confidence markers - words like "always," "definitely," "certainly" - remain largely unchanged.
This is what the researchers call "epistemic flattening." The AI doesn't uniformly compress all epistemic language. It selectively removes the markers of deliberation - the "I think," "it appears," "the evidence suggests" - while keeping the assertions of confidence. The result is that uncertain information gets presented in a more certain voice than the underlying sources warrant.
The ratio of certainty-to-tentative language rises from 0.39 in vanilla ChatGPT to 0.49 in both search-grounded systems. The AI sounds more sure of itself once it has sources, even though having sources should, if anything, make it more aware of complexity and disagreement.
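The certainty-to-tentative ratio can be approximated with lexicon counts. The word lists below are illustrative stand-ins for the LIWC-style psycholinguistic categories the researchers used, not their actual lexicons:

```python
import re

# Illustrative lexicons; the study used 73 psycholinguistic categories
TENTATIVE = {"maybe", "perhaps", "might", "possibly", "appears", "suggests"}
CERTAIN = {"always", "definitely", "certainly", "undoubtedly", "clearly"}

def certainty_to_tentative(text):
    """Ratio of certainty markers to hedging markers in a response."""
    words = re.findall(r"[a-z]+", text.lower())
    tentative = sum(w in TENTATIVE for w in words)
    certain = sum(w in CERTAIN for w in words)
    return certain / tentative if tentative else float("inf")

certainty_to_tentative("It might work, perhaps, but it definitely helps.")
# two hedges, one certainty marker: ratio 0.5
```

Rising ratios across a pipeline, as in the 0.39 to 0.49 shift the study reports, signal exactly the flattening described above: hedges disappear while assertions stay.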
For brands, this cuts both ways. If AI search says something positive about you, it says it with more conviction than the original sources did. But if it says something negative - if it synthesizes a criticism or a misconception from its sources - that criticism arrives without hedging, without nuance, as apparent fact.
The citation-synthesis gap
This is the paper's most novel contribution, and the finding with the most direct implications for anyone tracking brand visibility.
The researchers developed a method to measure how faithfully AI summaries represent their cited sources. They decomposed each AI-generated response into atomic factual statements, then checked whether those statements could be traced back to the cited source content. The method lets you answer the question: when an AI search engine cites your website, does it actually use your content?
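At a high level, the check looks like the sketch below. `supports` is a hypothetical stand-in for the entailment step (a plain substring check here, purely for illustration; the paper's actual pipeline is model-based):

```python
def synthesis_shares(answer_claims, sources, supports):
    """Fraction of the answer's atomic claims each cited source supports."""
    if not answer_claims:
        return {name: 0.0 for name in sources}
    return {
        name: sum(supports(text, c) for c in answer_claims) / len(answer_claims)
        for name, text in sources.items()
    }

# Hypothetical decomposed answer and cited sources
claims = ["founded in 1998", "headquartered in Paris"]
sources = {
    "wiki": "The firm was founded in 1998 and is headquartered in Paris.",
    "reddit": "Great product, founded in 1998 I think.",
}
shares = synthesis_shares(claims, sources, lambda text, c: c in text)
# wiki supports both claims; reddit supports one of two
```

A source that appears in the citation list but scores near zero here is exactly the "cited but not synthesized" case the paper documents.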
The answer, often, is no - or at least not evenly. The study finds that social media and forum sources are cited by Google AIO in about 8.5% of queries, but their content is under-drawn by 22.1 percentage points compared to other sources. Reddit specifically, despite being cited alongside other sources, contributes far less actual content to the synthesized answer.
The researchers call this a "citation-synthesis gap." The AI lists user-driven sources in its references - lending an appearance of source diversity - while actually relying on a narrower set of encyclopedic and authoritative texts for the substance of its answer. It's diversity theater. The citations suggest breadth. The synthesis reveals concentration.
This has a direct implication for how we measure AI visibility. Being cited is not the same as being synthesized. A brand might appear in an AI search engine's source list without its actual content influencing the answer. Any visibility metric that counts citations without measuring synthesis influence is overstating the brand's real impact.
What drives over-representation
The paper identifies several factors that predict whether a source will be over- or under-represented in the synthesized answer, and some of them are surprising.
Source length is the strongest predictor. Longer documents are systematically over-represented in both Search GPT and Google AIO. Short sources (under 200 words) are under-represented by 13-18 percentage points. Long sources (over 800 words) are over-represented by 4-5pp. The AI appears to place more emphasis on the amount of extractable text than on relevance or quality. This parallels known biases in the dense retrievers that power these systems.
Sources with assertive, explanatory prose are favored. Content that contains causal language ("because," "therefore") and certainty markers ("always," "definitely") is systematically over-covered. Highly subjective content is under-covered. The AI gravitates toward sources that sound authoritative and explanatory, regardless of whether that authority is earned.
Negative sentiment is filtered out. Google AIO shows a significant asymmetry: negative-sentiment sources are under-covered by 13.8 percentage points, while positive-sentiment sources are slightly over-represented. Search GPT shows weaker sentiment effects. This means Google's AI search implicitly sanitizes the information environment, suppressing negative content even when it's factually relevant.
The length bias in practice
If your brand's most important content is a concise product page or a short FAQ, AI search engines are likely to underweight it in favor of longer, more verbose sources - even if those sources are less accurate or less relevant. Authoritative brevity is penalized. This inverts the usual web content advice: for AI search, longer really does mean louder.
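You can check the length skew on your own citation data by bucketing sources by word count and comparing citation share with synthesis share per bucket. The cutoffs below mirror the study's short/long thresholds (200 and 800 words); the input tuples are hypothetical:

```python
def length_bucket(word_count):
    """Bucket a source by length using the study's cutoffs."""
    if word_count < 200:
        return "short"
    if word_count > 800:
        return "long"
    return "medium"

def bucket_gap_pp(sources):
    """sources: iterable of (word_count, citation_share, synthesis_share).
    Returns per-bucket over/under-representation in percentage points."""
    gaps = {"short": 0.0, "medium": 0.0, "long": 0.0}
    for wc, cite, syn in sources:
        gaps[length_bucket(wc)] += (syn - cite) * 100
    return {b: round(v, 1) for b, v in gaps.items()}

# A short page cited often but drawn from lightly; a long page the reverse
bucket_gap_pp([(150, 0.3, 0.2), (1000, 0.3, 0.4), (500, 0.4, 0.4)])
# short under-represented, long over-represented
```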
Topic matters more than you think
The researchers broke their queries into 11 topic categories and found that AI search behavior varies dramatically by subject. Google AI Overviews fires on only 57.8% of queries overall, and coverage ranges from 84% (business/finance) down to 32% (technology and travel).
More interesting: the epistemic differences between Search GPT and Google AIO are not uniform across topics. For technology and travel queries, Google AIO reduces hedging by 64% and positive emotion by 74% relative to Search GPT. For business, history, and education queries, the two search-grounded systems produce effectively indistinguishable output. The "answer bubble" is wider in some industries than others.
Source preferences are also strongly topic-dependent. IMDB dominates entertainment queries on Google (appearing in 61% of organic results and 23% of AIO responses) but barely appears on Search GPT (7%). ESPN dominates sports on Google but not on Search GPT. Spotify and Genius dominate music on Google; Search GPT doesn't surface them at all. If your brand operates in a specific vertical, the platform-specific source preferences could be the most important variable in your AI visibility strategy.
What this means for brand visibility
Let me connect the paper's findings to what we see in practice.
Cross-platform tracking isn't optional. The 24% source overlap between Search GPT and Google AIO means your brand could be highly visible on one platform and invisible on the other. In our series on model divergence, we showed that LLMs agree on brand rankings only 43.9% of the time. This paper extends that finding from LLM recommendations to AI search specifically, and the divergence is even starker. Monitoring one AI search engine and assuming it represents the landscape is like checking your Google ranking and assuming your Bing ranking is the same. Except the gap is far wider.
Your Wikipedia presence is load-bearing. It's not just one source among many. It's the most-cited, most-synthesized, most-over-represented source across every AI search system studied. If your brand has any Wikipedia presence at all, make sure it's accurate, current, and well-sourced. If it doesn't, understand that you're missing the single most influential content channel for AI search.
Citations don't equal influence. The citation-synthesis gap means being listed in an AI search engine's sources is not the same as shaping the answer. A brand could appear in every source list and still have minimal influence over what the user actually reads. Visibility metrics need to evolve beyond "were we cited?" to "did our content shape the response?" This is harder to measure, but it's the difference that matters.
Content strategy needs to be platform-aware. For Search GPT visibility, invest in authoritative, long-form, data-rich content with clear causal explanations. That's what Search GPT's source-selection philosophy rewards. For Google AIO visibility, ensure you have presence across a broader set of platforms - including social media and forums - because Google's system draws from the wider web. The same content won't work equally well everywhere.
Negative narratives are harder to surface on Google AIO. The finding that Google AIO under-represents negative sentiment content by 13.8pp is double-edged. If competitors are spreading negative content about your brand, Google's AI might suppress it. But if you're trying to surface legitimate competitive differentiation - "unlike Brand X, we don't charge hidden fees" - that framing might also get filtered. Search GPT, by contrast, shows weaker sentiment effects and may present criticism more faithfully.
The bigger picture
The paper's authors propose that answer bubbles should be subject to transparency requirements analogous to those for algorithmic recommendation systems. I think they're right, but I also think the market will move faster than regulation.
What's happening is that the information environment is fragmenting in a new way. In the link-based search era, different search engines might rank pages differently, but they were all pointing to the same underlying web content. Users could click through and evaluate sources themselves. AI search engines do something fundamentally different: they synthesize information into a single answer, making their source-selection biases invisible. The user sees one confident paragraph. They don't see which sources were consulted, which were ignored, and which were cited but not actually used.
For brands, this means the era of "one visibility number" is over. Your visibility on Search GPT is a different metric from your visibility on Google AIO, which is different from Perplexity, which is different from Claude. Each system has its own source preferences, its own synthesis biases, its own epistemic character. The brands that understand this fragmentation and track across it will have a structural advantage over those still treating "AI search" as a monolith.
The same question now gets different answers on every platform. The question for brands is whether they're watching all of them - or just the one that happens to tell the most flattering story.
