Metrics
:::summarybox remember Every number Trakkr shows you, grouped by what it actually measures The real formula behind each score - with worked examples where they're not obvious "What good looks like" with the caveat that context decides
Visibility metrics
How much real estate your brand takes up in AI answers. These four are your headline numbers and show up across the Dashboard, Prompts, and Competitors pages.
Visibility Score {#visibility}
What it measures: How prominently AI models mention your brand across every tracked prompt and model, weighted by where you appear in a list-style answer.
How it's calculated: Three steps.
- For every appearance, Trakkr awards a position score. First position = 10 points, second = 9, down to tenth = 1. Position 11+ or not mentioned = 0.
- Sum the position scores, then divide by the maximum possible score across all prompt-model combinations (not just the ones where you appeared). That denominator is what keeps the score honest - it stops a brand that appears rarely but always at #1 from outscoring a brand that consistently lands in the top three.
- Apply a square-root scaling so consistent mid-range performance counts more than occasional spikes.
| Position | Points |
|---|---|
| 1st | 10 |
| 2nd | 9 |
| 3rd | 8 |
| … | … |
| 10th | 1 |
| 11th+ or not mentioned | 0 |
raw = (sum of position scores) / (prompts × models × 10) × 100
score = 100 × √(raw / 100)
Worked example. You're tracked on 10 prompts across 3 models - 30 opportunities. You appear at positions [1, 3, 5] across three of those runs and don't show up in the other 27.
- Position points:
10 + 8 + 6 = 24 - Raw:
24 / (30 × 10) × 100 = 8.0 - Scaled:
100 × √(0.08) ≈ 28
A Visibility Score of 28 at low coverage - and that's deliberately conservative.
What's good: 40+ is solid. 60+ is excellent. 80+ means you dominate your tracked prompts (which usually means your tracked prompts are too easy - consider widening).
Where you see it: Dashboard hero, every Competitors row, top of the Brands list, Prompts page averages.
Presence Rate {#presence}
What it measures: The percentage of prompts where your brand appears at all - regardless of position.
How it's calculated: Prompts where you're mentioned ÷ total prompts × 100. No position weighting and no sqrt scaling.
What's good: 50%+ is solid coverage. 75%+ is excellent. 100% almost always means your tracking is too narrow - add a few aspirational prompts you don't yet win.
Why it's not the same as Visibility: Presence is breadth, Visibility is depth. A brand can hit 100% Presence but score 50% Visibility if it's always mentioned last. Use both.
Average Position {#position}
What it measures: When AI lists brands, where do you typically sit?
How it's calculated: Sum of all your positions across mentions ÷ total mentions. Prompts where you don't appear aren't counted - they push Presence down, not Average Position up.
What's good: 1-2 is excellent. 3-4 is good. 5+ means you're appearing but as an afterthought.
Watch out for: A great Average Position with low Presence Rate means you only show up in your easiest prompts. Narrow your view to the prompts where you're not yet appearing and the picture changes.
Mentions {#mentions}
What it measures: Raw count of times your brand appeared across all prompts and models in the selected period.
How it's counted: One mention per (prompt × model × run). The same brand appearing twice in one answer counts once. Mention totals across the period are summed across daily runs.
Why it matters: Volume is the lever for catching outliers. A sudden spike usually traces back to a single new citation source. A slow decline often points to a model retrain or a competitor displacing you on a high-frequency prompt.
Demand metrics
How much traffic flows through the prompts you track. These help you prioritize which prompts deserve attention before you grind on visibility.
Demand Score {#demand-score}
What it measures: How likely people are to ask this question in AI chat, expressed as a 0-100 score.
How it's calculated: A weighted blend of three signals, with adjustments.
- Search demand - real clickstream data showing how often people search for related topics, log-scaled to 0-100.
- LLM affinity - how naturally the query fits AI conversation patterns. Comparisons and creative requests score higher than simple navigational facts.
- Specificity penalty - very long, narrow queries get points deducted. They usually indicate niche topics with lower overall demand.
The output is compressed into a 20-90 range so a low-demand prompt isn't pinned at 0 (every prompt has some interest) and an extreme outlier doesn't max out the chart.
| Score | Meaning |
|---|---|
| 70+ | High demand - valuable real estate |
| 40-69 | Medium demand - solid opportunity |
| <40 | Lower demand - niche or specialized |
How to use it: Sort prompts by Demand Score, then look at the ones where your Visibility is lowest. That's your highest-impact backlog.
Watch out for: Demand Score is directional, not surgical. A 72 vs a 68 is a coin flip - treat the score as a bucket, not a rank.
AI Volume {#ai-volume}
What it measures: The estimated number of times people ask AI platforms about a given topic each month, across ChatGPT, Gemini, Claude, Perplexity, Copilot, and others.
Why it's an estimate: Unlike Google, AI platforms don't publish query volume data. Trakkr combines multiple data sources, then rounds conservatively - the real number is usually higher than what's shown, never lower.
How it's calculated: A three-tier waterfall, choosing the highest-confidence data available.
| Confidence | Label | How it works |
|---|---|---|
| High | Measured | Direct panel data from AI search platforms, smoothed with a trailing 12-month average. Most accurate. |
| Medium | Calibrated estimate | Derived from Google search volume using learned ratios per query type - e.g. comparison queries have higher AI crossover than navigational ones. |
| Low | Projected estimate | Classified by topic type when no search data is available. Shown as a range rather than a specific number. |
Hover any volume number to see its confidence tier. High and medium estimates display a number with a ~ prefix; low estimates display a range.
Platform breakdown: Total volume is split across platforms by current market share - ChatGPT ~72%, Gemini ~12%, Claude ~6%, Perplexity ~5%, Copilot ~3%, others ~2% (refreshed quarterly). The skew is then adjusted by query type - Perplexity over-indexes on research, Claude on technical, Copilot on productivity.
The platform shares here are for volume estimation only - they include Copilot because Microsoft publishes share data, even though Trakkr doesn't poll Copilot directly. Your tracked answer surface is the eight models on the Core Concepts page.
Query Type: Every prompt is classified into one of seven types - comparison, recommendation, how_to, factual, navigational, creative, technical. This feeds the platform skew and the AI Overviews trigger likelihood.
How to use it: Pair AI Volume with Visibility to find your biggest opportunities. High volume + low visibility = high-impact prompts to improve next.
Competitive metrics
How you stack up against rivals you track. These live on the Competitors page and the head-to-head drill-downs.
Share of Voice {#share-of-voice}
What it measures: Your visibility expressed as a proportion of total visibility across all tracked competitors. The closest thing Trakkr has to a "category leadership" number.
Share of Voice on the Competitors page is actually two donuts side by side. Both matter:
| Donut | What it measures |
|---|---|
| Recommended First | Of all #1 mentions across your prompts, what share are you? Captures leadership. |
| Mention Share | Of all mentions at any position, what share are you? Captures total category footprint. |
A brand can dominate Mention Share but lose Recommended First if it's always in the answer but rarely at the top. The gap between the two is itself a useful diagnostic.
What's good: 35%+ Mention Share in a market with five tracked rivals usually means you're the de facto leader on the prompts you've chosen.
Win Rate {#win-rate}
What it measures: How often you rank higher than a specific competitor when both of you appear in the same answer.
How it's calculated: Prompts where you outrank competitor X ÷ prompts where you both appear.
What's good: 55%+ means you're winning. 70%+ means you dominate that competitor. Below 40% with high co-occurrence is the threat zone.
Watch out for: Win Rate ignores prompts where you don't appear at all. A 90% Win Rate against a competitor who only shows up on three prompts isn't the flex it sounds like - check Presence first.
Threat Tier {#threats}
What it measures: Trakkr's classification of a competitor's pressure on your brand, computed from visibility gap and co-occurrence.
| Tier | Roughly means |
|---|---|
| High | Visibility gap >20 points against you, or Win Rate <30% on 10+ shared prompts |
| Medium | Gap >10 points, or Win Rate <40% with regular co-occurrence |
| Low | Anyone who outranks you anywhere - worth watching |
Where you see it: The Threats filter on the Competitors page and the per-competitor row.
Competitive Gap {#competitive-gap}
The percentage-point difference between your Visibility Score and a competitor's. Positive = you're ahead. Negative = they are. Coloured green or red on every comparison row. It's a presentation of the underlying scores, not a separate metric.
Head-to-Head {#head-to-head}
The drill-down view that opens when you click a competitor row - wins, losses, and ties across every prompt and model where you both appear, plus a per-model breakdown. It's a view, not a number.
Citation metrics
How AI sources its answers about you. These power the Citations page and the citation widget on the Dashboard.
Citations {#citations-count}
What it measures: Unique URLs that AI models reference when discussing your brand in the selected period. De-duplicated across runs - the same URL cited five days in a row counts as one citation, not five.
Why it matters: More citations from authoritative sources = stronger AI presence. The list itself is your improvement roadmap.
Citation Quality Score {#citation-quality}
What it measures: Average authority of the sources citing your brand, on a 0-100 scale.
How it's calculated: A weighted average of Domain Authority across every citing URL, with extra weight given to sources cited multiple times. A single citation on TechCrunch lifts the score more than ten citations on small blogs.
Domain Authority {#domain-authority}
What it measures: How authoritative a citing website is. Forbes outranks a random blog.
Where it comes from: A blend of inbound link profile, traffic estimates, and domain age - borrowed from established SEO metrics and refined with AI-specific signals (whether the domain is a preferred source for live-retrieval models, for example).
Source Type {#source-type}
What it measures: How Trakkr classifies a citing domain. Different types deserve different responses.
| Type | Examples | Leverage |
|---|---|---|
| Earned media | TechCrunch, NYT, industry pubs | Highest - chase these |
| Institution | .edu, .gov, IEEE, ISO | Very high - hard to land but durable |
| Review | G2, Capterra, TrustRadius | High for SaaS |
| Owned | Your own domain | Moderate - models down-weight self-references |
| Social | Reddit, LinkedIn, X | Variable - Reddit is unusually weighted by AI |
| PR wire | PRNewswire, Business Wire | Low - models discount these |
| Competition | Competitor blog comparisons | Variable - good signal of category presence |
| Other | Anything uncategorised | Investigate before acting |
Citation Intent {#citation-intent}
What it measures: The buyer intent behind queries that triggered each citation. Shown as a coverage bar on the Citations page so you can see whether your citations span the funnel or cluster in one stage.
| Intent | What it captures |
|---|---|
| Comparison | "X vs Y" queries |
| Alternative | "Alternatives to X" queries |
| Best For | "Best X for use case Y" queries |
| Discovery | "What's a tool that does X" |
| Recommendation | "Recommend me a tool for X" |
Watch out for: Heavy coverage on Discovery but nothing on Comparison means buyers know you exist but you're not in the final shortlist. That's a different fix than the reverse.
Reputation & Sentiment
How AI - and the sources AI reads - talk about you. The Citation Sentiment, Perception, and Reddit features all measure different facets of this.
Citation Sentiment {#citation-sentiment}
What it measures: Whether each citing page discusses your brand positively, neutrally, or negatively.
How it's calculated: Each citing page's content is passed through a sentiment classifier with brand context - so "Notion is the leader" reads positive, "Notion has bugs in version 2" reads negative, and "Notion costs $10/month" reads neutral.
Where you see it: A green/grey/red bar on every source card on the Citations page, and as a filter in the Citation Feed.
Reddit Citation Score {#reddit-citation-score}
What it measures: How likely AI models are to learn from a given Reddit thread, on a 0-100 scale. Drives the Citation Band filter (High / Mid / Low) and the Opportunity ranking on the Reddit page.
How it's calculated: A blend of subreddit authority, thread engagement, recency, and whether the discussion centres on your category. High-citation threads are the ones worth contributing to authentically.
Reddit Draft Quality Score {#reddit-draft-grade}
What it measures: The quality of an AI-drafted Reddit reply, on a 0-10 scale, with four sub-scores and a disclosure flag.
| Sub-score | What it grades |
|---|---|
| Helpfulness | Does it actually answer the question? |
| Specificity | Does it reference the thread, not just talk past it? |
| Tone | Does it sound like a human contributor, not a brand? |
| Non-spammy | Does it avoid pitch language and self-promotion? |
| Disclosure | Boolean - is the brand affiliation transparent? |
What's good: 7.5+ overall with all sub-scores above 6. Anything below 6 reads as marketing and Reddit will downvote it.
Reddit communities are allergic to marketing. If the grade flags any sub-score below 6, rewrite - don't post.
Perception metrics
How AI describes your brand qualitatively. These live on the Perception page.
Overall Perception {#perception-score}
What it measures: How positively AI describes your brand across 20 attributes in 5 categories, summarised as a single 0-100 score.
What's good: 75+ is excellent. 60-74 is good. Below 60 needs work - and the per-category breakdown will tell you where.
| Category | Attributes |
|---|---|
| Trust & Reliability | Overall trust · Reliability · Transparency · Safety perception |
| Quality & Performance | Overall quality · Problem resolution · Responsiveness · User satisfaction |
| Value & Experience | Value for money · Ease of interaction · Accessibility · Necessity |
| Market Position | Brand recognition · Professional image · Recommendation likelihood · Uniqueness |
| Innovation & Appeal | Forward thinking · Adaptability · Likability · Confidence-inspiring |
Each category is the average of its four attributes. The Perception page shows the full 20-attribute grid plus a per-model breakdown.
Watch out for: Perception scores are noisier than visibility scores because they're sentiment classifications on a smaller corpus of mentions. Treat anything with fewer than 20 mentions in the period as directional, not diagnostic.
Crawler & AI Traffic metrics
What AI bots do on your site, and what humans do after they leave AI. These power the Crawlers page and the Visitors page.
Total Visits {#crawler-total}
What it measures: All page requests from tracked AI crawlers in the selected period. The hero number on the Crawlers dashboard.
Conversations {#crawler-conversations}
What it measures: Live AI fetches happening during a real chat - ChatGPT-User, Perplexity-User, Claude-User, MistralAI-User, Meta-ExternalAgent.
Why it matters: This is the strongest leading indicator that a citation is about to land. A spike on a page usually means an answer was generated that referenced it.
Indexing {#crawler-indexing}
What it measures: Search bots pre-fetching content for an AI search index - OAI-SearchBot, PerplexityBot, Claude-SearchBot, Applebot.
Why it matters: Indexing hits precede citations on retrieval-heavy models like Perplexity and ChatGPT Search. Heavy indexing of a page is often a 1-7 day leading indicator of new citations.
Training {#crawler-training}
What it measures: Bulk crawlers gathering content for a future model training cycle - GPTBot, ClaudeBot, CCBot, Amazonbot, Bytespider, DeepSeekBot.
Why it matters: Training effects show up in 6-18 months, not days. Track these for long-term trajectory, not weekly action.
Agent (emerging) {#crawler-agent}
What it measures: Autonomous agent bots that act on behalf of a user - currently Google-Agent and emerging equivalents.
Why it matters: Volume is currently small enough that the dashboard tracks it as a platform filter rather than a top-level category card alongside Training / Indexing / Conversations. The metric is still the first read on whether AI agents are starting to use your site to complete tasks - when the population grows past a single bot, it gets promoted.
AI Visitors {#ai-visitors}
What it measures: Real humans landing on your site from an AI referrer (ChatGPT, Perplexity, Claude, Gemini, Copilot). Powered by your Google Analytics 4 connection.
Why it matters: Crawler hits prove the bot saw you. AI Visitors prove the citation converted into traffic.
Citation Correlation {#citation-correlation}
What it measures: The relationship between crawler hits on a page and citations to it - shown as a chart on the Crawlers page that overlays bot traffic against citation appearances.
How to use it: Pages with high indexing but no citations are usually the closest "near-miss" - they're being read but not chosen. Often a content or schema fix away from landing.
Opportunity & Outreach metrics
How Trakkr scores and ranks the citation gaps worth chasing. These power Outreach.
Fit Score {#fit-score}
What it measures: How well a citation source matches your brand and prompts, on a 0-100 scale.
How it's calculated: A blend of source-prompt relevance, source-type weight, whether competitors are already cited there, and alignment with your positioning.
What's good: 70+ is a strong fit. Below 50 is usually noise.
Difficulty {#difficulty}
What it measures: How hard this kind of source typically is to land coverage on - Low, Medium, or High.
| Difficulty | Typical sources |
|---|---|
| Low | Roundup posts, smaller blogs, niche reviews |
| Medium | Mid-tier publications, established review sites |
| High | Long-form editorial, institutional sources, top-tier news |
Priority {#priority}
What it measures: Trakkr's blended ranking of an opportunity - Fit, Difficulty, competitor pressure on the domain, and recency signals. The default sort in the Outreach queue.
When to override: Switch to sorting by Fit when you want pure leverage and don't care about momentum signals.
Quadrants {#quadrants}
Cross Fit and Difficulty and you get four buckets. The names are the strategy:
| Difficulty / Fit | High fit | Low fit |
|---|---|---|
| Easy | Quick Wins - start here | Low priority - skip unless idle |
| Hard | Worth It - the big landings | Skip - low payoff, high effort |
Trend & comparison windows
How your numbers are moving over time. Every score on the Dashboard, Prompts, Competitors, and Citations pages can be compared across four windows.
| Window | Reads as | Best for |
|---|---|---|
| 7-day | Short-term momentum | Catching outliers and live-retrieval movement |
| 14-day | Smoothed short-term | Filtering out single-day noise |
| 30-day | Medium-term trajectory | Real signal - this is the one to act on |
| 90-day | Long-term direction | Quarterly reviews, training-model effects |
Reading direction: Green = improving. Red = declining. Grey = stable.
Reading magnitude:
- ±2 points on a 7-day window is usually noise. Wait for 5+ before reacting.
- ±5 points on a 30-day window is real signal. Find the cause.
- Anything ±10 points or more on any window is worth investigating the same day.
Why model retrains show up here: A 30-day decline that hits across most prompts and models simultaneously almost always traces to either a new competitor landing a major citation or a model retraining with different training data. Single-model declines are usually fixable; cross-model declines need a content or citation response.
Site & content quality
Two scores that don't fit the visibility framework but shape it from upstream.
Audit Score {#audit-score}
Your site's AI-readiness rating from Optimize, on a 0-100 scale. Measures how easily AI crawlers can extract, parse, and cite your content. Higher Audit Score generally translates to higher Indexing crawler hits and faster citation pickup. See the Optimize docs for the full check list.
Narrative Score {#narrative-score}
For any Narrative you track, a 0-100 score per model showing how strongly the model associates your brand with the topic. Different from Visibility - this is what AI says about a specific theme, not how often you appear overall.
Three reminders before you act on a number {#reminders}
There's no universal "good" score. Every benchmark on this page is a rule of thumb. Your market, prompt mix, and competitor set all change the meaning. The most useful comparison is to your own number from last week.
Model performance varies a lot. Scoring 80 on Claude and 30 on Perplexity is normal - they have different training data, different cutoffs, different retrieval approaches. The Dashboard's per-model breakdown is where most diagnoses start.
Presence and Position answer different questions. Presence asks: were you mentioned at all? Position asks: where in the list? Both matter, and you usually need to fix Presence first.
Related
ai|Glossary|Plain-English definitions of every term used in Trakkr.|/learn/docs/glossary
visibility|Core Concepts|The mental model - what the metrics on this page are actually measuring.|/learn/docs/concepts
content|FAQ|Quick answers to common questions about scoring, models, and timing.|/learn/docs/faq