The five optimizations

When the middleware decides "this request is from GPTBot, fetch the optimized HTML," what comes back isn't your page with a couple of meta tags added. It's a structurally different document: the same facts, laid out in a form a language model can actually parse.

Five enhancements run as part of that transformation. Each one targets a specific weakness in how raw HTML communicates with an AI consumer: ambiguity about what kind of page this is, important facts buried inside paragraphs, no explicit question/answer structure, no summary, and entity references with no grounding. You can toggle each one on or off independently in the AI Pages settings; by default all five are on, which is the right choice for almost everyone.

This page exists because the features are easier to use well when you know what they actually do. If you've ever wondered "is the FAQ generation making things up?" or "what counts as an entity?" or "do I need this if my page already has schema markup?", the answers are below.

Why raw HTML is hard for models

A modern marketing page is built for human consumption, which means it's full of structure that doesn't communicate well to a model. Animation wrappers. Tailwind utility classes. A button that says "Buy now" with no surrounding text saying what's being bought. A price displayed via a number formatter inside a Vue component. An FAQ that's six expandable accordions, none of which have semantic markup tying the question to the answer.

A human reads through this fine because the visual hierarchy does the work. A language model sees a flat stream of tags and has to guess at meaning from naming conventions and proximity. Sometimes it guesses well. Often it doesn't.

The optimizations all attack the same underlying problem: make the meaning explicit instead of implied. Tag this paragraph as the description. Mark this number as the price. Wrap this block as an FAQ. Identify this proper noun as a brand. The model still does the reading; the markup just stops making the reading harder than it needs to be.

1. Structured data injection

The first optimization detects what type of page you have, generates the corresponding schema.org markup as JSON-LD, and injects it into the <head>.

Page type detection is where most of the work happens. A homepage maps to Organization. A product detail page maps to Product, with Offer and AggregateRating nested inside. A blog post maps to Article, with author, datePublished, and headline. An FAQ maps to FAQPage. A category page maps to ItemList. The system reads the rendered HTML and picks the closest schema type, then fills in the properties from what it can extract.

Why it helps: explicitly labeled fields leave a model less to guess at than prose does. If your product page just says "Nike Air Zoom Pegasus 40 $130" somewhere in the body, the model has to infer that "Pegasus 40" is the product name, "Nike" is the brand, and "$130" is the price. If you serve Product schema with those three fields explicitly tagged, the model doesn't have to infer anything. It also doesn't have to choose between conflicting interpretations when the surrounding marketing copy includes other prices, other product names, or other numbers.
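To make the Pegasus example concrete, here is a minimal sketch of what generating and injecting a Product block could look like. It is not the AI Pages implementation: the function names are hypothetical, and the "USD" currency is an assumption read off the dollar sign. The schema.org types and property names (Product, Brand, Offer, price, priceCurrency) are standard.

```python
import json


def product_jsonld(name: str, brand: str, price: str, currency: str) -> str:
    """Build a schema.org Product block wrapped in a JSON-LD script tag."""
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "brand": {"@type": "Brand", "name": brand},
        "offers": {
            "@type": "Offer",
            "price": price,
            "priceCurrency": currency,
        },
    }
    body = json.dumps(data, indent=2)
    return '<script type="application/ld+json">' + body + "</script>"


def inject_into_head(html: str, block: str) -> str:
    """Insert the block just before </head>; leave the page untouched if there is no head."""
    index = html.find("</head>")
    if index == -1:
        return html
    return html[:index] + block + "\n" + html[index:]


# Hypothetical page using the example from the text above.
page = (
    "<html><head><title>Pegasus 40</title></head>"
    "<body>Nike Air Zoom Pegasus 40 $130</body></html>"
)
# "USD" is an assumption; the page only shows a dollar sign.
optimized = inject_into_head(page, product_jsonld("Pegasus 40", "Nike", "130", "USD"))
print(optimized)
```

The body text is unchanged. The only difference is that the name, brand, and price now also exist as labeled fields a model can read without inference.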

When to leave it off: rarely. The only real reason is if you've hand-crafted very specific schema (an unusual type, custom properties for a vertical search engine) and you don't want anything else added to the head.

2. Key facts extraction

The transformer scans your content for specific data points (prices, percentages, dates, counts, measurements, ratings, statistics) and wraps each one in explicit semantic markup. A bare $130 is wrapped so that it is labeled as a price. The date in "Released March 2024" is labeled as a date. The figure in "98% uptime" is labeled as a percentage.
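The exact markup AI Pages emits isn't shown on this page, so treat the following as a rough sketch of the idea rather than the real output. It assumes the standard HTML <data> element with hypothetical class names (fact-price, fact-percent), and it covers only two of the fact types using simple patterns.

```python
import re

# Hypothetical patterns; a real extractor would also handle dates, counts,
# measurements, and ratings.
PRICE = re.compile(r"\$(\d+(?:\.\d{2})?)")
PERCENT = re.compile(r"(\d+(?:\.\d+)?)%")


def tag_key_facts(text: str) -> str:
    """Wrap prices and percentages in <data> elements with a machine-readable value."""
    text = PRICE.sub(r'<data class="fact-price" value="\1">$\1</data>', text)
    text = PERCENT.sub(r'<data class="fact-percent" value="\1">\1%</data>', text)
    return text


tagged = tag_key_facts("The Pegasus 40 costs $130 and ships with 98% uptime.")
print(tagged)
```

Note that the "40" in "Pegasus 40" is left alone. Part of the point is that only numbers recognized as facts get labeled, so they stand out from the numbers that are just part of a name.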

Why it helps: when a user asks "How much does the Pegasus 40 cost?" the model needs to find that number and cite it confidently. Numbers without context (and there are usually a lot of them on a page) are noise. Numbers with explicit context become quotable.

This also matters for accuracy. When a model fabricates a price, date, or stat, it is often because it couldn't find the real number and synthesized one to fill the gap. The more your real numbers stand out from the surrounding content, the less likely a hallucination is to drift in.

When to leave it off: if your content is purely qualitative (an op-ed, a brand manifesto, a values page) and tagging numbers as facts would feel misleading. For anything data-dense (product pages, pricing pages, comparison pages, methodology posts), keep it on.

3. Automated FAQ generation

The transformer reads your page and generates a small FAQPage block: two to five questions and answers derived from the content. The questions are the ones a user would naturally ask of this content; the answers are pulled or paraphrased from what's actually on the page.
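For a sense of the shape of that block, here is a minimal sketch that turns question/answer pairs into schema.org FAQPage JSON-LD. The function name and the sample pairs are hypothetical, and the step that derives the pairs from the page is not shown. The FAQPage, Question, and Answer types and the mainEntity, name, acceptedAnswer, and text properties are standard schema.org vocabulary.

```python
import json


def faq_jsonld(pairs: list[tuple[str, str]]) -> str:
    """Serialize two to five question/answer pairs as a schema.org FAQPage."""
    if not 2 <= len(pairs) <= 5:
        raise ValueError("expected two to five question/answer pairs")
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }
    return json.dumps(data, indent=2)


# Hypothetical pairs; every answer restates something already on the page.
faq_block = faq_jsonld([
    ("How much does the Pegasus 40 cost?", "The Pegasus 40 costs $130."),
    ("When was the Pegasus 40 released?", "It was released in March 2024."),
])
print(faq_block)
```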

Why it helps: AI tools have a strong preference for citing FAQ-structured content. When the model is generating an answer to a user question, an explicit Q&A block on a source page is a near-perfect match for what it's trying to do. FAQ schema is, in effect, the model's preferred format because it pre-decomposes content into the unit the model is going to produce anyway.

When to leave it off: regulated industries (financial advice, medical claims, legal), strict brand voice, or pages where the topic is too narrow to generate sensible questions. Turn it back on once you've verified the output is staying in bounds.

4. AI summary block

A short, plain-language summary of the page added in a dedicated section toward the top of the body. Three or four sentences, no marketing language, no hedging, just what the page is about and what it claims.

Why it helps: most pages bury the actual content behind a hero, a value-prop section, social proof, and three CTAs before the model gets to anything useful. A summary block at the top means the first thing a model reads is the substance of the page, in clean prose, without having to navigate the marketing scaffolding.

It also helps with crawlers that truncate. Some AI ingestion pipelines only read the first N tokens of a page. If the first N tokens of your page are "Trusted by 10,000+ companies" and "Get started in 5 minutes," the model never reaches the part that says what you actually do. A summary block fixes this by putting the substance first.
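The truncation effect is easy to demonstrate with a toy example. The sketch below uses whitespace-separated words as a crude stand-in for tokens (real pipelines tokenize differently), and "Acme" is a made-up product used only for illustration.

```python
def first_n_tokens(text: str, n: int) -> str:
    """Crude stand-in for a truncating crawler: keep the first n whitespace-separated words."""
    return " ".join(text.split()[:n])


marketing = "Trusted by 10,000+ companies. Get started in 5 minutes."
substance = "Acme is a hypothetical invoicing tool for freelancers."
summary = "Summary: Acme is a hypothetical invoicing tool for freelancers."

# Same budget of 9 words, with and without a summary block at the top.
without_summary = first_n_tokens(marketing + " " + substance, 9)
with_summary = first_n_tokens(summary + " " + marketing + " " + substance, 9)

print(without_summary)
print(with_summary)
```

With the same budget, the first version spends everything on social proof and a CTA, while the second reaches the statement of what the product does.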

When to leave it off: if your page is already structured around its substance (a long-form article, a methodology post, a documentation page), the summary becomes redundant. It's also reasonable to turn it off if you've audited the summaries and don't like the voice; you can write your own equivalent in the page content instead.

5. Entity recognition

Every named entity on the page gets identified and wrapped with a Brand, Product, Person, Organization, Place, or Technology tag. "Built with React on Vercel" becomes "Built with <Technology>React</Technology> on <Organization>Vercel</Organization>."
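As an illustration of the wrapping step only, here is a minimal sketch that reproduces the React and Vercel example. It assumes a fixed lookup table, which is a simplification: recognizing entities in real content takes more than a word list, and the function and table names here are hypothetical.

```python
import re

# Hypothetical lookup table standing in for a real entity recognizer.
KNOWN_ENTITIES = {
    "React": "Technology",
    "Vercel": "Organization",
}


def tag_entities(text: str) -> str:
    """Wrap each known entity name in a tag named after its entity type."""
    names = "|".join(re.escape(name) for name in KNOWN_ENTITIES)
    pattern = re.compile(r"\b(" + names + r")\b")

    def wrap(match: re.Match) -> str:
        name = match.group(1)
        label = KNOWN_ENTITIES[name]
        return f"<{label}>{name}</{label}>"

    return pattern.sub(wrap, text)


tagged_entities = tag_entities("Built with React on Vercel")
print(tagged_entities)
```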

Why it helps: entity tagging is how you tell a model "when you cite this page, here are the things you should attribute correctly." Without it, the model can correctly extract a fact ("Nike released the Pegasus 40 in March 2024") but get the entities wrong, attributing the release to a different brand, or confusing the Pegasus 40 with a competing shoe. With explicit entity tags, the citation tracks back to the right place.

This is also the feature that helps most with comparison content. If your page mentions ten brands in your category, entity recognition makes it unambiguous which mentions are about which company. Models can then construct accurate comparison statements rather than blending attributes across brands.

When to leave it off: rarely. The downside is minimal. The only edge case is content that intentionally uses ambiguous language for stylistic reasons (some long-form essays) where explicit tagging would feel heavy-handed.

How they combine

The features compound. Schema gives the model a frame to read the page through. Key facts populate that frame with citable numbers. The FAQ block gives the model pre-composed answer chunks to pull from. The summary makes sure the model reaches the substance even if it truncates. Entity tags ensure the citation lands on the right brand and product.

Turning off one feature doesn't break the others, but the cumulative legibility drops. The default position (everything on) is the right starting point. The reason to turn things off is usually a specific concern (brand voice, regulated content, conflict with existing markup) rather than a general "less is more" preference.
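One way to picture the independence of the toggles is as a pipeline of steps, each guarded by its own flag. This is a sketch of the concept, not the AI Pages settings format: the field names are hypothetical, and each step is a placeholder that appends a marker instead of doing a real transformation.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Toggles:
    """Hypothetical settings object; all five features default to on."""
    structured_data: bool = True
    key_facts: bool = True
    faq: bool = True
    summary: bool = True
    entities: bool = True


def run_pipeline(html: str, toggles: Toggles, steps: dict[str, Callable[[str], str]]) -> str:
    """Apply each enabled step in order; disabled steps are skipped."""
    for name, step in steps.items():
        if getattr(toggles, name):
            html = step(html)
    return html


# Placeholder steps that only append a marker, to show which ones ran.
steps = {
    "structured_data": lambda html: html + "[schema]",
    "key_facts": lambda html: html + "[facts]",
    "faq": lambda html: html + "[faq]",
    "summary": lambda html: html + "[summary]",
    "entities": lambda html: html + "[entities]",
}

all_on = run_pipeline("<page>", Toggles(), steps)
no_faq = run_pipeline("<page>", Toggles(faq=False), steps)
print(all_on)
print(no_faq)
```

Turning off the FAQ step removes only its contribution. The other four steps run exactly as before.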

A note on what doesn't get generated

There are things AI Pages will not do, on purpose:

It won't invent facts. The transformer extracts and restructures what's on your page. If a number isn't there, the optimized HTML doesn't have it either. The FAQ generation paraphrases existing content; it doesn't introduce new claims.
It won't translate. The page is served back in the same language it was written in.
It won't change your messaging. Tone, positioning, narrative order: untouched. The transformations are structural, not editorial.
It won't add citations or backlinks. What's on the page stays on the page. Nothing is injected from external sources.

If you want to change what the optimized page says, change the source page. AI Pages is a transformation layer, not a content engine.