AI Pages Technical Details

Deep dive into how AI Pages works - architecture, caching, and optimization.

8 min read · Updated Jan 11, 2026
What you'll learn
  • Understand the full request flow from crawler to optimized response
  • Learn how caching works and why it's fast
  • See exactly what optimizations are applied to each page
  • Know the architecture that makes AI Pages reliable

This page explains how AI Pages works under the hood. If you're technical or just curious, read on. If you just want to use AI Pages, you don't need to know any of this.


Architecture overview

AI Pages consists of several components working together:

Text
[AI Crawler] → [Your Platform Integration] → [AI Pages API] → [Optimization Engine]
                         ↓                        ↓
                  [Your Origin]              [GCS Cache]

Components

Component              Technology               Purpose
Platform Integration   JS/TS/PHP/Lua (varies)   Intercepts requests, detects crawlers
AI Pages API           Go                       Handles auth, routing, caching
Optimization Engine    Python                   Renders pages, applies AI optimizations
Cache                  Google Cloud Storage     Stores optimized HTML
Analytics              Firestore                Logs all crawler activity

The platform integration is the lightweight code you deploy on your hosting platform (Cloudflare Worker, Vercel middleware, Netlify edge function, WordPress plugin, etc.). All integrations follow the same pattern and communicate with the same AI Pages API.


Request flow

Here's exactly what happens when an AI crawler visits your site:

1. Request arrives at your platform

Text
GET /products/running-shoes HTTP/1.1
Host: nike.com
User-Agent: GPTBot/1.0

2. Integration checks user-agent

The AI Pages integration examines the User-Agent header:

JavaScript
const AI_CRAWLERS = [
  'GPTBot', 'ChatGPT-User', 'OAI-SearchBot',
  'ClaudeBot', 'Claude-User', 'Claude-SearchBot',
  'PerplexityBot', 'Google-Extended', 'Google-Agent', 'Googlebot-Extended',
  'cohere-ai', 'Applebot-Extended', 'Amazonbot',
  'Meta-ExternalAgent', 'ByteSpider', 'Baiduspider'
];

If human browser: Pass request to your origin server unchanged (adds <10ms latency)

If AI crawler: Forward to the AI Pages API with crawler details
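The check itself can be sketched in a few lines. This is an illustrative version, assuming a simple substring match against the list above; the shipped integrations may use different matching rules:

```javascript
const AI_CRAWLERS = [
  'GPTBot', 'ChatGPT-User', 'OAI-SearchBot',
  'ClaudeBot', 'Claude-User', 'Claude-SearchBot',
  'PerplexityBot', 'Google-Extended', 'Google-Agent', 'Googlebot-Extended',
  'cohere-ai', 'Applebot-Extended', 'Amazonbot',
  'Meta-ExternalAgent', 'ByteSpider', 'Baiduspider'
];

// Returns the matched crawler name, or null for human traffic.
function detectAICrawler(userAgent) {
  if (!userAgent) return null;
  return AI_CRAWLERS.find(bot => userAgent.includes(bot)) || null;
}
```

A match like `GPTBot/1.0` returns the crawler name and the request is forwarded; any other user-agent falls through to your origin.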

3. AI Pages API receives request

Text
POST /v1/optimize
X-API-Key: prism_xxxxx
{
  "url": "https://nike.com/products/running-shoes",
  "pathname": "/products/running-shoes",
  "crawler": "GPTBot"
}

4. Cache check

AI Pages checks if an optimized version exists:

Cache key formula:

Text
SHA256(url + mode + features)

Cache hit: Return optimized HTML immediately (~100ms)

Cache miss: Trigger optimization

5. Optimization (cache miss only)

For new pages, the optimization engine:

  1. Renders the page - Uses a headless browser to execute JavaScript
  2. Extracts content - Strips scripts, tracking, and unnecessary markup
  3. Applies features:
       • Structured data injection (JSON-LD)
       • Key facts extraction
       • FAQ generation
       • AI summary block
       • Entity recognition
  4. Compresses - Removes whitespace, optimizes HTML
  5. Stores in cache - Saves to GCS with a 7-day TTL

On cache miss, response is:

JSON
{
  "status": "optimizing",
  "cache": "MISS",
  "message": "Serve original, optimization in progress"
}

The integration serves your original page while optimization happens in the background. Next crawler visit gets the optimized version.

6. Response served

Cache hit response:

JSON
{
  "optimizedHTML": "<html>...",
  "cache": "HIT",
  "is404": false
}

Response headers:

Text
X-Prism-Cache: HIT
X-Prism-Response-Time: 87ms
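Putting the two response shapes together, an integration's branch on the API result can be sketched as follows. This is a hypothetical sketch: buildResponse and fetchOrigin are illustrative names, and only the cache and optimizedHTML fields come from the examples above:

```javascript
// Hypothetical response handling in a platform integration.
// `fetchOrigin` stands in for however the platform reaches your origin.
function buildResponse(apiResult, fetchOrigin) {
  if (apiResult.cache === 'HIT' && apiResult.optimizedHTML) {
    // Cache hit: serve the optimized HTML directly to the crawler.
    return { body: apiResult.optimizedHTML, source: 'prism' };
  }
  // Cache miss ("optimizing"): serve the original page; the next
  // crawler visit will get the optimized version.
  return { body: fetchOrigin(), source: 'origin' };
}
```

Either way the crawler always gets a valid page, which is what makes the cache-miss path safe.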

URL processing

Normalization

URLs are normalized for consistent caching:

  • Remove trailing slashes (except root)
  • Lowercase the hostname
  • Remove fragments (#section)
  • Keep only essential query params: id, page, category, product, search, q

Example:

Text
Input:  https://Nike.com/Products/?ref=nav&category=shoes#reviews
Output: https://nike.com/products?category=shoes
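These rules map cleanly onto the standard URL API. A minimal sketch, applying only the four bullets above (the production implementation may differ in detail):

```javascript
// Query params that survive normalization, per the list above.
const KEPT_PARAMS = ['id', 'page', 'category', 'product', 'search', 'q'];

function normalizeUrl(input) {
  const url = new URL(input);
  url.hostname = url.hostname.toLowerCase(); // lowercase host
  url.hash = '';                             // drop #fragments
  if (url.pathname !== '/' && url.pathname.endsWith('/')) {
    url.pathname = url.pathname.slice(0, -1); // drop trailing slash (except root)
  }
  // Keep only the essential query params.
  for (const key of [...url.searchParams.keys()]) {
    if (!KEPT_PARAMS.includes(key)) url.searchParams.delete(key);
  }
  return url.toString();
}
```

For example, `https://Nike.com/products/?ref=nav&category=shoes#reviews` normalizes to `https://nike.com/products?category=shoes`, so every variant of the same page hits the same cache entry.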

Filtered requests

These are automatically skipped (not optimized, not counted):

File extensions:

Text
.css, .js, .jpg, .jpeg, .png, .gif, .svg, .webp,
.ico, .woff, .woff2, .ttf, .pdf, .zip, .json, .xml

Paths:

Text
/api/*, /ws/*, /graphql, /_next/*, /static/*,
/assets/*, /fonts/*, /images/*, /favicon*
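A sketch of the skip filter, built directly from the two lists above (the matching logic itself is an assumption):

```javascript
// Extensions and path prefixes come from the tables above.
const SKIP_EXTENSIONS = [
  '.css', '.js', '.jpg', '.jpeg', '.png', '.gif', '.svg', '.webp',
  '.ico', '.woff', '.woff2', '.ttf', '.pdf', '.zip', '.json', '.xml'
];
const SKIP_PREFIXES = [
  '/api/', '/ws/', '/graphql', '/_next/', '/static/',
  '/assets/', '/fonts/', '/images/', '/favicon'
];

// True if the request should bypass optimization entirely.
function shouldSkip(pathname) {
  const lower = pathname.toLowerCase();
  return SKIP_EXTENSIONS.some(ext => lower.endsWith(ext)) ||
         SKIP_PREFIXES.some(prefix => lower.startsWith(prefix));
}
```

Skipped requests go straight to your origin and never count against your usage.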

Caching strategy

Cache TTL

Optimized pages are cached for 7 days. After expiry, the next crawler visit triggers re-optimization.

Deduplication

Same customer + URL requests within 5 seconds are deduplicated (only counted once). This prevents bots that hit the same page repeatedly from inflating your usage.
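Conceptually, this is a sliding-window check keyed on customer + URL. A minimal in-memory sketch (the real implementation and its storage are assumptions):

```javascript
const WINDOW_MS = 5000;            // the documented 5-second window
const lastSeen = new Map();        // "customer|url" -> last-seen timestamp

// True if this request should be counted; repeats inside the window are not.
function shouldCount(customerId, url, now = Date.now()) {
  const key = `${customerId}|${url}`;
  const previous = lastSeen.get(key);
  lastSeen.set(key, now);
  return previous === undefined || now - previous >= WINDOW_MS;
}
```

Two hits on the same page 3 seconds apart count once; a third hit after the window reopens counts again.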

Cache invalidation

Currently, caches expire naturally after 7 days. Manual invalidation coming soon.


The five optimization features

1. Pre-rendering

What it does:

  • Spins up headless Chrome
  • Loads your page with JavaScript
  • Waits for content to render
  • Captures final DOM state

Result: JavaScript-rendered content becomes static HTML that crawlers can read.

2. Structured data injection

What it does:

  • Analyzes page content
  • Detects page type (Article, Product, FAQ, etc.)
  • Generates JSON-LD schema markup
  • Injects into <head>

Example output:

HTML
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Nike Air Zoom Pegasus 40",
  "brand": {"@type": "Brand", "name": "Nike"},
  "offers": {
    "@type": "Offer",
    "price": "130.00",
    "priceCurrency": "USD"
  }
}
</script>

3. Key facts extraction

What it does:

  • Scans content for data points
  • Identifies prices, dates, percentages, statistics
  • Marks them with semantic HTML

Example:

HTML
<span itemtype="price">$130.00</span>
<span itemtype="date">Released March 2024</span>

4. FAQ generation

What it does:

  • Analyzes content for Q&A patterns
  • Generates relevant questions users might ask
  • Creates FAQPage schema

Example:

HTML
<section itemscope itemtype="https://schema.org/FAQPage">
  <div itemscope itemprop="mainEntity" itemtype="https://schema.org/Question">
    <h3 itemprop="name">Is the Pegasus 40 good for marathon training?</h3>
    <div itemscope itemprop="acceptedAnswer" itemtype="https://schema.org/Answer">
      <p itemprop="text">Yes, the Pegasus 40 is suitable for marathon training...</p>
    </div>
  </div>
</section>

5. Entity recognition

What it does:

  • Identifies named entities (brands, products, people, locations)
  • Marks them semantically
  • Creates entity relationships

Example:

HTML
<span itemtype="Brand">Nike</span> released the
<span itemtype="Product">Air Zoom Pegasus 40</span> in
<span itemtype="Date">March 2024</span>

Performance characteristics

Latency

Scenario                    Latency added
Human visitor               <10ms
AI crawler (cache hit)      ~100ms
AI crawler (cache miss)     ~2000ms (first time only)

Reliability

  • Automatic failover - On any error, serves your original site unchanged
  • Edge-native - Runs at the edge on platforms like Cloudflare, Vercel, and Netlify for minimal latency
  • Graceful degradation - If the API is unreachable, your site continues to work normally

Bandwidth

Optimized pages are typically 60-80% smaller than original pages due to removal of:

  • JavaScript bundles
  • Tracking scripts
  • CSS (reduced to minimal inline styles)
  • Non-essential markup

Security

API key protection

  • API keys are hashed with SHA256 before storage
  • Keys are only transmitted over HTTPS
  • You can regenerate keys anytime

Data handling

  • AI Pages doesn't store your content long-term
  • Cached pages expire after 7 days
  • Analytics are tied to your account
  • No cross-customer data sharing

SEO safety

AI Pages never serves different content to:

  • Googlebot (for search rankings)
  • Bingbot
  • Any non-AI search crawler

This means your SEO is completely unaffected.


Limitations

What AI Pages can't optimize

  • Login-required pages - Crawlers can't authenticate
  • Real-time data - Cached content may be up to 7 days old
  • Interactive features - JavaScript apps become static snapshots
  • User-specific content - Personalization is lost

Known issues

  • Very large pages (>1MB) may timeout during optimization
  • Complex SPAs may not render completely
  • Some anti-bot systems may block our rendering engine

Next steps

Troubleshooting

Fix common AI Pages issues.

API Reference

Integrate AI Pages programmatically.
