Generative-Engine Optimization (GEO) | OpenSource Technologies

The shift from search to answers.

For two decades, organic search optimization meant one thing: rank in the top three blue links. That game is not over, but it is no longer the only one. Four AI surfaces are now intercepting questions before users ever see a search results page.

~50% of U.S. consumers used AI search at least monthly in 2025 (McKinsey)

800M+ weekly users on ChatGPT alone, with native search rolled in

36% of mobile Google searches now show an AI Overview before any blue link

The shift matters because the shape of the funnel changed. In classic SEO, you optimized to rank. The user clicked your blue link, landed on your page, and entered your funnel. In GEO, you optimize to get cited. The AI engine reads your page, paraphrases it into the answer, attributes you with a small clickable citation, and the user may or may not click through.

Two funnels, two optimization targets

Classic SEO funnel

User asks Google → Sees blue links → Clicks your page → Reads, converts

GEO funnel

User asks ChatGPT / Perplexity → AI cites your page → User reads paraphrased answer → May click through, may not

A page can earn citations across all four major AI surfaces simultaneously: ChatGPT (with built-in search), Perplexity (search-native by design), Claude (with the Claude search tool), and Google AI Mode / AI Overviews. Each weights signals differently, but the foundation is shared. Get the foundation right, and you compete on all four.

Why this matters now, not next year

AI Overviews have appeared on more than a third of mobile Google queries since rollout. ChatGPT's search rolled out broadly in late 2024 and reached parity with classic search for transactional intent in 2025. Perplexity passed 100 million weekly active users in 2025. The traffic patterns these surfaces produce are different from classic search:

Lower volume, higher intent. Click-through rates from AI citations are lower than from a #1 organic ranking, but the visitors who do click have already read a paraphrased version of your answer. They arrive qualified.
Brand-mention exposure scales independently of clicks. Users see your brand in the answer even if they never click. That builds awareness on a different axis than CTR.
Multi-engine compounding. The same content investment earns citations in four AI engines and rankings in Google's classic search at the same time, because the technical foundations overlap.

How AI engines decide what to cite.

The AI surfaces do not share a ranking algorithm, and none of them publishes one. But behavior across hundreds of OST prompt-coverage tests in 2025 points to four shared signals.

SIGNAL 01

Topical authority & depth

Multiple pages, internal links, original data, and external citations of you. Beats keyword density every time.

SIGNAL 02

Citation-ready paragraphs

Short, factual, attributable sentences that an LLM can lift unchanged. Lead with the claim and push qualifiers after it.

SIGNAL 03

Schema & structured data

Article, Product, FAQ, HowTo, Organization. Tells engines what each page is without making them guess.

SIGNAL 04

Recency & dated facts

"As of April 2026" beats "as of recently." Old, undated content drops out of time-sensitive answers.

1. Topical authority and depth, not keyword density

AI engines are not counting how often you say "ATV parts." They are reading your pages to decide whether you are a recognized authority on ATV parts. Topical depth (multiple pages, internal links, original data, expert citations of you) wins over keyword stuffing every time.

2. Citation-ready paragraphs

An LLM will cite a sentence that is short, factual, attributable, and unambiguously about a single thing. A 100-word paragraph full of qualifiers and marketing language is harder to cite. Write for the LLM's quoting behavior: state the claim cleanly, then expand.

The single highest-leverage change we make on client sites is rewriting the lead paragraph of every key page so it answers the page's primary question in the first 40 words. Citation rates on those pages roughly double within 90 days.

3. Schema and structured data

Schema.org structured data lets you tell engines what type of thing each page is (Article, Product, FAQ, HowTo, Organization, Review, etc.) without making them guess. AI engines lean on this heavily for product Q&A, FAQ surfacing, and author / publisher attribution.

4. Recency, dates, and freshness signals

"As of April 2026" beats "as of recently." Dated facts are more citable. AI engines avoid old or undated content for time-sensitive questions, which is most B2B and ecommerce content.

Key takeaway

The four shared signals are: authority & depth, citation-ready paragraphs, schema, and dated facts. If you optimize for those four, you will compete in all four AI engines plus classic Google search at the same time.

Technical foundations.

This is the layer where most GEO retainers spend the first 30 days. None of it is novel. All of it is the same work that supports accessibility, classic SEO, and screen-reader compatibility. The difference is that GEO makes the payoff visible.

Semantic HTML and heading hierarchy

One H1 per page. Logical H2 / H3 nesting. No skipped levels. Screen readers and AI crawlers both navigate by heading structure. A 6,000-word post with a single H1 and no H2s is, to an LLM, an undifferentiated wall of text.
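A heading audit can be automated with Python's standard html.parser. The sketch below flags any page that does not have exactly one H1. It also flags any jump of more than one heading level. The class and function names are hypothetical, and a real audit would feed it rendered HTML from a site crawl.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Records heading levels (h1-h6) in document order."""

    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))


def audit_headings(html):
    """Return a list of heading-structure problems (empty list = clean)."""
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()

    issues = []
    h1_count = collector.levels.count(1)
    if h1_count != 1:
        issues.append(f"expected one H1, found {h1_count}")

    previous = 0  # 0 means no heading seen yet
    for level in collector.levels:
        if level > previous + 1:
            label = f"H{previous}" if previous else "page start"
            issues.append(f"skipped level: {label} -> H{level}")
        previous = level
    return issues


print(audit_headings("<h1>Guide</h1><h3>Setup</h3>"))
```

The H1-to-H3 jump in the sample is reported as a skipped level. Word-count rules, such as long pages with no H2s, would be a separate check layered on top.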

Schema.org structured data

The schema types most relevant for GEO across B2B and ecommerce stacks:

Schema type	When to use	What it unlocks
Organization	Sitewide, on every page	Brand attribution in citations
Article	Blog posts, white papers, news	Author, publish date, headline
FAQPage	Pages with Q&A blocks	Q&A extraction, voice answers
HowTo	Step-by-step instruction pages	Step-numbered AI answers
Product	Every PDP on an ecommerce store	Product Q&A in AI shopping
Review & AggregateRating	Pages that aggregate reviews	Trust signals in AI answers
BreadcrumbList	Sitewide	Navigation context for AI

The validation tooling has matured: Google's Rich Results Test, Schema Markup Validator (schema.org), and the AI-bot specific checkers from Profound and AthenaHQ all give clear pass / fail output. There is no excuse for invalid schema in 2026.
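Some teams generate markup from a CMS. For them, a minimal sketch of Article markup with an Organization publisher might look like the code below. The helper names and sample values are hypothetical. The property names (headline, author, datePublished, dateModified, publisher) are standard schema.org properties, but the output should still go through the validators above before it ships.

```python
import json
from datetime import date


def article_jsonld(headline, author_name, published, modified, org_name, org_url):
    """Build a schema.org Article with an Organization as publisher."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author_name},
        "datePublished": published.isoformat(),  # ISO 8601 date
        "dateModified": modified.isoformat(),    # the recency signal
        "publisher": {"@type": "Organization", "name": org_name, "url": org_url},
    }


def to_script_tag(data):
    """Serialize the object as JSON-LD inside a <script> element."""
    # Escaping "<" as \u003c stops a stray "</script>" in a headline from
    # closing the element early; JSON parsers decode it back to "<".
    payload = json.dumps(data, indent=2).replace("<", "\\u003c")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


markup = to_script_tag(article_jsonld(
    "What is generative-engine optimization?",  # hypothetical sample values
    "Jane Doe", date(2026, 4, 1), date(2026, 5, 1),
    "Example Co", "https://example.com",
))
print(markup)
```

Keeping dateModified tied to real edits, rather than to every deploy, is what lets the freshness signal mean something.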

robots.txt and the AI bot question

This is the spot where well-meaning teams shoot themselves in the foot. There is a real debate about whether to allow AI crawlers, and it is legitimate. But here is what blocking them costs you:

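Blocking a vendor's search crawler in robots.txt removes or sharply reduces your presence in that engine's citations. OAI-SearchBot is the search crawler for ChatGPT search, PerplexityBot for Perplexity, and Claude-SearchBot for Claude. Watch the names, because the familiar ones are training crawlers. GPTBot (OpenAI) and ClaudeBot (Anthropic) collect model-training data. Google-Extended is a robots.txt token that governs Gemini training; it does not affect Google Search or AI Overviews, which run on Googlebot. If your goal is to be cited, you cannot also block the search crawlers. Pick one.

Our standard recommendation has two parts. First, allow the search crawlers (OAI-SearchBot, Claude-SearchBot, PerplexityBot) and the user-initiated fetchers that retrieve a page when a user asks about it (ChatGPT-User, Claude-User). Second, selectively allow or disallow the training crawlers (GPTBot, ClaudeBot, Common Crawl's CCBot) and the Google-Extended token, based on your IP-protection comfort level. Blocking training does not remove you from search citations, because the vendors treat these user agents separately.

```
# Example robots.txt for a GEO-friendly site
User-agent: *
Allow: /

# Search and user-initiated AI crawlers (allow)
User-agent: OAI-SearchBot
User-agent: ChatGPT-User
User-agent: Claude-SearchBot
User-agent: Claude-User
User-agent: PerplexityBot
Allow: /

# Training crawlers and tokens (your call)
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: CCBot
User-agent: Google-Extended
Disallow: /

Sitemap: https://example.com/sitemap.xml
```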
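The crawler-access inventory can be started with Python's standard urllib.robotparser, which reports whether a given user agent may fetch a URL. The sample file and the bot list below are assumptions for illustration. Paste in your own robots.txt to audit it. One caveat: robotparser applies the first matching rule by prefix, not the longest-match rule of RFC 9309. Treat its answers as a first pass for files with overlapping Allow and Disallow lines.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical sample file; replace with the contents of your own robots.txt.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Allow: /

User-agent: OAI-SearchBot
User-agent: PerplexityBot
Allow: /

User-agent: GPTBot
User-agent: Google-Extended
Disallow: /
"""

# AI user-agent tokens to check; extend this list as vendors add bots.
AI_AGENTS = ["OAI-SearchBot", "ChatGPT-User", "GPTBot", "ClaudeBot",
             "Claude-SearchBot", "Claude-User", "PerplexityBot",
             "Google-Extended", "CCBot"]


def ai_bot_access(robots_txt, url="https://example.com/"):
    """Map each AI user agent to whether robots.txt lets it fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in AI_AGENTS}


for agent, allowed in ai_bot_access(SAMPLE_ROBOTS_TXT).items():
    print(f"{agent:18} {'allowed' if allowed else 'blocked'}")
```

Agents with no group of their own fall through to the `User-agent: *` rules, which is why ClaudeBot shows as allowed in the sample.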

The llms.txt convention

An emerging open standard. A markdown-formatted file at /llms.txt that summarizes your site's structure, key pages, and topical scope for LLM consumption. Adoption is uneven (early 2026), but cost to ship is low and likely-future-value is high. We add one to every client site.
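The format is still a proposal, so treat this as a sketch. It assumes the layout described at llmstxt.org: an H1 with the site name, a blockquote summary, then H2 sections of markdown links with optional notes. The function name, site, and pages are hypothetical.

```python
def build_llms_txt(site_name, summary, sections):
    """Render llms.txt text: H1 name, blockquote summary, H2 link sections.

    `sections` maps a section title to a list of (title, url, note) tuples;
    `note` may be an empty string.
    """
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines += [f"## {heading}", ""]
        for title, url, note in links:
            lines.append(f"- [{title}]({url})" + (f": {note}" if note else ""))
        lines.append("")
    return "\n".join(lines)


# Hypothetical site and pages, for illustration only.
print(build_llms_txt(
    "Example Co",
    "Example Co sells ATV parts and publishes fitment guides.",
    {
        "Guides": [("ATV tire fitment", "https://example.com/guides/tires", "Sizing by model")],
        "Policies": [("Shipping", "https://example.com/shipping", "")],
    },
))
```

Generating the file from the same page inventory used for the XML sitemap keeps the two from drifting apart.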

Page speed, Core Web Vitals, and AI

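AI engines crawl on a budget, like classic search engines. Slow server responses cut how many pages a crawler fetches per visit. As a result, slow pages are recrawled less often and their facts go stale in AI answers. Core Web Vitals (LCP, INP, CLS) measure what users experience rather than crawl cost, but the two tend to move together. The sites that show up in AI answers tend to load fast and ship regularly.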
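Core Web Vitals are judged at the 75th percentile of real-user data. A minimal sketch of that classification, using Google's published thresholds, follows. The function name and sample numbers are hypothetical. The field data itself would come from a source such as the Chrome UX Report or your own real-user monitoring.

```python
# Google's published Core Web Vitals thresholds, applied to the
# 75th-percentile field value for a page or origin.
THRESHOLDS = {
    "LCP": (2500, 4000),  # Largest Contentful Paint, milliseconds
    "INP": (200, 500),    # Interaction to Next Paint, milliseconds
    "CLS": (0.1, 0.25),   # Cumulative Layout Shift, unitless score
}


def rate(metric, p75_value):
    """Classify a p75 value as good / needs improvement / poor."""
    good_max, needs_improvement_max = THRESHOLDS[metric]
    if p75_value <= good_max:
        return "good"
    if p75_value <= needs_improvement_max:
        return "needs improvement"
    return "poor"


# Hypothetical field data for one page template.
page = {"LCP": 3100, "INP": 180, "CLS": 0.02}
print({metric: rate(metric, value) for metric, value in page.items()})
```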

Content patterns that work.

Once the technical foundation is in place, content writing is the next leverage point. Six patterns surface most often in citation analysis.

Pattern 01

The 40-word lead

Open every key page with 40 words that directly answer the page's primary question. No throat-clearing.

Pattern 02

Stand-alone definitions

Bold the term, define inline in one sentence. An LLM can lift it unchanged.

Pattern 03

Original numbers

Proprietary data with dates and attribution outranks restated industry averages.

Pattern 04

Comparison tables

Structured rows and columns let LLMs extract specific cells without reasoning.

Pattern 05

Step-numbered procedures

Ordered lists paired with HowTo schema unlock voice and step-by-step AI answers.

Pattern 06

FAQ blocks (80-150 words)

Substantive Q&A with FAQPage schema gets cited at high rates for long-tail queries.

Pattern 01: The 40-word lead

Open every important page with a 40-word paragraph that directly answers the page's primary question. No throat-clearing. No "in today's fast-paced world." If the page answers "what is WCAG 2.2 AA," the first 40 words say what WCAG 2.2 AA is.
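During lead-paragraph rewrites, a word count can enforce the budget mechanically. A minimal sketch follows, assuming page copy is plain text with paragraphs separated by blank lines. The function names and the sample draft are hypothetical.

```python
import re


def lead_word_count(page_text):
    """Count words in the first paragraph (paragraphs split on blank lines)."""
    first = re.split(r"\n\s*\n", page_text.strip(), maxsplit=1)[0]
    return len(first.split())


def lead_fits(page_text, budget=40):
    """True if the opening paragraph stays within the word budget."""
    return lead_word_count(page_text) <= budget


# Hypothetical draft copy, for illustration only.
draft = "In today's fast-paced world, accessibility matters more than ever.\n\nMore detail follows."
print(lead_word_count(draft), lead_fits(draft))
```

The sample passes the count and still fails the pattern: it is throat-clearing that never answers the question. The check enforces length only, and whether the 40 words do the job remains an editorial call.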

Pattern 02: Definitions that stand alone

Every key term on the page should have a one-sentence definition that an LLM can lift unchanged. Bold the term. Define it inline. Do not bury it three paragraphs in.

Pattern 03: Original numbers and data

"OST's 2025 client survey found that 67% of B2B companies under $25M revenue had not run a single GEO audit." That sentence is gold for citation. AI engines preferentially cite original, dated, attributable data over restated industry trends. If you have proprietary data, lead with it.

Pattern 04: Comparison tables

Tables comparing options, vendors, plans, or technical specifications get cited more than equivalent prose. The structured row / column format lets the LLM extract specific cells without reasoning.

Pattern 05: Step-numbered procedures

HowTo content with numbered steps gets cited cleanly. Use ordered lists. Pair them with HowTo schema. The same content unlocks AI Overviews, ChatGPT step-by-step answers, and voice-assistant guidance.
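One way to keep the ordered list and the HowTo schema in sync is to render both from a single list of steps. The sketch below does that. The function names and the sample procedure are hypothetical. HowToStep accepts `position` because schema.org defines it as a kind of ListItem.

```python
import json
from html import escape


def howto_jsonld(name, steps):
    """schema.org HowTo markup; positions follow list order (1-based)."""
    return {
        "@context": "https://schema.org",
        "@type": "HowTo",
        "name": name,
        "step": [
            {"@type": "HowToStep", "position": i, "text": text}
            for i, text in enumerate(steps, start=1)
        ],
    }


def howto_ol(steps):
    """The matching visible ordered list, escaped for HTML."""
    return "<ol>" + "".join(f"<li>{escape(s)}</li>" for s in steps) + "</ol>"


# Hypothetical procedure, for illustration only.
steps = ["Pull robots.txt", "List allowed and blocked AI bots", "Document state before changing"]
print(howto_ol(steps))
print(json.dumps(howto_jsonld("Inventory crawler access", steps), indent=2))
```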

Pattern 06: FAQ blocks with expanded answers

FAQs at the bottom of a page (paired with FAQPage schema) get cited at high rates for long-tail queries. Each Q&A should be substantive: 80 to 150 words per answer. Yes/no answers do not get cited. Explanations do.
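FAQPage markup can carry the 80-to-150-word rule as a guard. A minimal sketch follows, in which the builder refuses to emit markup for an answer outside that range. The function name and sample question are hypothetical. The word bounds are this paper's guideline, not a schema.org requirement.

```python
def faq_jsonld(pairs, min_words=80, max_words=150):
    """Build FAQPage markup from (question, answer) pairs.

    Raises ValueError when an answer falls outside the word range,
    so one-line yes/no answers are caught before they ship.
    """
    entities = []
    for question, answer in pairs:
        words = len(answer.split())
        if not min_words <= words <= max_words:
            raise ValueError(
                f"{question!r}: {words} words, expected {min_words}-{max_words}")
        entities.append({
            "@type": "Question",
            "name": question,
            "acceptedAnswer": {"@type": "Answer", "text": answer},
        })
    return {"@context": "https://schema.org", "@type": "FAQPage", "mainEntity": entities}


try:
    faq_jsonld([("Is GEO worth it?", "Yes.")])
except ValueError as err:
    print(err)  # the one-word answer is rejected
```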

Measurement and monitoring.

You cannot improve what you cannot measure. The standard GEO measurement stack covers four layers.

The four-layer GEO measurement stack

01 GA4 referrer tracking: AI-search domains as a custom channel group. Cadence: continuous.

02 GSC AI-Overview impressions: Google Search Console AI exposure curve. Cadence: weekly.

03 Server-log AI-bot crawls: GPTBot, OAI-SearchBot, ClaudeBot, PerplexityBot. Cadence: weekly.

04 Prompt-coverage testing: 25-50 prompts run across all four AI engines. Cadence: monthly.

Layer 01: GA4 referrer tracking

Identify the AI-search referrers and tag them as their own channel group in GA4. The most common are chatgpt.com (formerly chat.openai.com), perplexity.ai, copilot.microsoft.com, and gemini.google.com. Google AI Overview clicks arrive under the standard Google organic referrer with no separate tag, so GA4 cannot isolate them; Layer 02 covers that exposure. Build a custom GA4 channel group called "AI Search."
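A minimal sketch of the channel rule follows. It assumes a GA4 custom channel group that uses a "source matches regex" condition. The Python mirrors that rule so the pattern can be tested before it goes into the property. The host list is an assumption drawn from the referrers above. The pattern avoids lookarounds and backreferences, which GA4's RE2 regex engine does not support.

```python
import re

# Referrer hosts treated as "AI Search" (an assumption; extend as engines change).
# The optional leading labels also accept subdomains such as www.perplexity.ai.
AI_SEARCH_SOURCE = re.compile(
    r"^([a-z0-9-]+\.)*(chatgpt\.com|chat\.openai\.com|perplexity\.ai"
    r"|copilot\.microsoft\.com|gemini\.google\.com)$"
)


def channel_for(source):
    """Return "AI Search" for a known AI referrer host, otherwise None."""
    return "AI Search" if AI_SEARCH_SOURCE.match(source.strip().lower()) else None


for src in ["chatgpt.com", "www.perplexity.ai", "google", "notperplexity.ai"]:
    print(src, "->", channel_for(src))
```

Anchoring both ends keeps look-alike hosts such as notperplexity.ai out of the channel.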

Layer 02: GSC zero-click and AI Overview impressions

Google Search Console now reports impressions in AI Overviews separately from classic search. Track them. The shape of the curve tells you whether your AI exposure is growing, flat, or shrinking.

Layer 03: Server log analysis for AI bots

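Watch for visits from GPTBot, OAI-SearchBot, ClaudeBot, Claude-SearchBot, and PerplexityBot. Google-Extended will not show up in logs: it is a robots.txt control token, and Google's fetches arrive as Googlebot. Crawl frequency is a leading indicator of citation eligibility. Pages that AI bots crawl weekly are cited more than pages they crawl quarterly.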
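A minimal sketch of the weekly count follows, assuming access logs in the common Combined Log Format. The bot list and the sample lines are hypothetical. User-agent strings can be spoofed, so a production report would also check requests against the vendors' published IP ranges.

```python
import re
from collections import Counter

# User-agent tokens to count (an assumption; vendors add and rename bots).
AI_BOTS = ["GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot",
           "Claude-SearchBot", "Claude-User", "PerplexityBot"]

# In Combined Log Format the user agent is the last double-quoted field.
USER_AGENT = re.compile(r'"([^"]*)"\s*$')


def count_ai_bot_hits(log_lines):
    """Count requests per AI bot, matching on the user-agent string only."""
    hits = Counter()
    for line in log_lines:
        match = USER_AGENT.search(line)
        if not match:
            continue
        user_agent = match.group(1)
        for bot in AI_BOTS:
            if bot in user_agent:
                hits[bot] += 1
                break
    return hits


# Hypothetical log lines (documentation IP ranges), for illustration only.
sample = [
    '203.0.113.7 - - [04/May/2026:09:12:01 +0000] "GET /pricing HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; GPTBot/1.2; +https://openai.com/gptbot)"',
    '198.51.100.4 - - [04/May/2026:09:15:44 +0000] "GET / HTTP/1.1" 200 8192 "-" '
    '"Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
]
print(count_ai_bot_hits(sample))
```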

Layer 04: Prompt-coverage testing

The most direct measure. Take a list of 25 to 50 commercial-intent prompts your customers might ask, run each prompt against ChatGPT / Perplexity / Claude / Google AI Mode, and record whether your domain appears as a citation. Re-run monthly. The trendline is your GEO scoreboard. Tools like Profound, AthenaHQ, and Otterly automate this.
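Once the runs are recorded, the scoreboard itself is simple arithmetic: the share of prompt runs per engine that cite your domain. A minimal sketch follows. The record shape, function names, and sample prompts are hypothetical. Citations of subdomains count toward the parent domain.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import List


@dataclass
class PromptRun:
    """One prompt run against one engine (hypothetical record shape)."""
    prompt: str
    engine: str
    cited_domains: List[str] = field(default_factory=list)


def cites(run, domain):
    """True if the run cites `domain` or one of its subdomains."""
    return any(d == domain or d.endswith("." + domain) for d in run.cited_domains)


def coverage_by_engine(runs, domain):
    """Share of prompt runs, per engine, that cite `domain`."""
    totals, hits = defaultdict(int), defaultdict(int)
    for run in runs:
        totals[run.engine] += 1
        if cites(run, domain):
            hits[run.engine] += 1
    return {engine: hits[engine] / totals[engine] for engine in totals}


runs = [
    PromptRun("best ATV tires for mud", "ChatGPT", ["example.com", "other.com"]),
    PromptRun("best ATV tires for mud", "Perplexity", ["other.com"]),
    PromptRun("ATV parts near me", "ChatGPT", ["shop.example.com"]),
    PromptRun("ATV parts near me", "Perplexity", ["example.com"]),
]
print(coverage_by_engine(runs, "example.com"))  # {'ChatGPT': 1.0, 'Perplexity': 0.5}
```

Storing each monthly run as rows like these gives you the trendline directly.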

Layer	Tools	Cadence
GA4 referrers	Google Analytics 4	Continuous
GSC AI impressions	Google Search Console	Weekly review
Server log AI bots	Cloudflare Logs, AWS CloudFront, GoAccess	Weekly review
Prompt-coverage tests	Profound, AthenaHQ, Otterly.AI, manual	Monthly

The 30-day GEO audit checklist.

Run this in order. A two-person team (one technical, one content) can complete it inside 30 calendar days for a 50-to-200-page site.

Day 1-2 Inventory crawler access
Pull robots.txt. Identify which AI bots are allowed and which are blocked. Document state before changing.
Day 3-5 Schema audit
Run every key template through Schema Markup Validator. Note gaps for Organization, Article, Product, FAQ, HowTo.
Day 6-8 Heading hierarchy audit
Flag pages with a missing H1, multiple H1s, skipped levels, or more than 1,500 words with no H2s under the H1.
Day 9-12 Prompt-coverage baseline
List 30-50 commercial-intent prompts. Run each in ChatGPT, Perplexity, Claude, Google AI Mode. Record citations.
Day 13-16 Lead-paragraph rewrites
Rewrite the first 40 words of each top page to directly answer the page's primary question. Bold key terms.
Day 17-20 Schema implementation
Ship the missing schema types. Validate every template. Re-test in Google's Rich Results Test.
Day 21-23 FAQ block additions
Add FAQ blocks with FAQPage schema to your top 10 pages. Each Q&A 80-150 words. Real questions, real answers.
Day 24-26 llms.txt + sitemap
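Publish a markdown-formatted /llms.txt at the site root summarizing site scope and key pages. Confirm the XML sitemap is current and declared in robots.txt. llms.txt is not a sitemap format, so do not declare it as one.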
Day 27-28 Measurement stack
Configure GA4 channel group for "AI Search." Set up server-log AI-bot reporting. Schedule monthly re-tests.
Day 29-30 Document baseline & targets
One-page report: starting prompt-coverage score, baseline crawl frequency, 90-day target. Distribute internally.

What "good" looks like at day 90

For a typical mid-market B2B site running this checklist, expect to see prompt-coverage scores rise from 8-15% baseline to 25-40% inside 90 days, AI bot crawl frequency double, and AI-Overview impressions grow visibly in GSC. Sites that already have good SEO and accessibility see faster gains. Sites with stale content and weak structure see the biggest absolute lift, but it takes longer.

Common mistakes.

From OST's GEO retainer work in 2025, ranked roughly by frequency.

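Blocking AI bots without strategic intent. Teams reflexively block every AI crawler in robots.txt for IP reasons, search crawlers included, then complain that they get no AI citations. Training and search crawlers are separate user agents. Pick one outcome per crawler.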
Schema once, schema never. Schema gets shipped, then ages. Product schema with stale prices and old reviews actively hurts. Schedule quarterly schema audits.
Long-form pages with weak structure. 4,000-word pages with two H2s and no FAQ blocks are nearly impossible to cite. Chunk them or add structure.
Measuring blue-link CTR only. Classic SEO dashboards underreport AI exposure. Build a separate channel group and run prompt-coverage tests monthly.
"Just publish more content." AI engines do not preferentially cite high-volume publishers. They cite citable, authoritative, well-structured content. Quality > cadence.
Treating GEO as a 90-day project. The technical lift is one-time. The content cadence and prompt-coverage testing are continuous. Budget for the latter.

This paper is a snapshot of the practice as it stands in May 2026. AI search ranking signals will keep shifting. The technical foundations covered here are the parts least likely to change, because they are the same patterns that support accessibility, classic SEO, and screen-reader access. Investing in them is durable.

For organizations that want help running this checklist on their own site, OST offers GEO retainers with monthly prompt-coverage reporting and quarterly schema audits. Contact details below.

About the authors

Shaili Gupta

President, OpenSource Technologies. Lead author.

GEO Practitioner certified. Holds GA4, Google Ads, and WCAG 2.2 AA Accessibility Oversight credentials. WBE-accredited principal. Drives OST's digital marketing practice and signs every GEO retainer that leaves the building.

Manish Mittal

CEO & founder. Technical co-author.

Forbes Technology Council member. AI architect on every OST AI engagement. Provided the technical-foundations and measurement sections of this paper. Architect of the OST AI Assistant.

Generative-Engine Optimization (GEO): the practical playbook for getting cited by ChatGPT, Perplexity, and Google AI Mode.

What this paper covers, in plain English.