Structured Data for AI Search: The Schema Markup Types That Get You Cited in 2026
Only 33% of websites use structured data correctly. The other 67% are invisible to the AI systems that now decide which sources get cited, which products get recommended, and which businesses get chosen. Schema markup is no longer a nice-to-have for rich snippets. It is the machine-readable language that LLMs, AI Overviews, and autonomous purchasing agents use to understand, evaluate, and cite your content.
Structured Data and AI Search in 2026
Content with schema markup is 2.5x more likely to appear in AI-generated answers. The Schema.org vocabulary now exceeds 800 types, yet most sites barely scratch the surface.
Google deprecated 7 schema types in January 2026 and added new types for sustainability claims, AI content disclosure, and virtual events.
FAQ schema makes pages 2.3x more likely to be cited by AI systems. Only 33% of websites use structured data correctly despite its proven impact on both traditional and AI search visibility.
Why Structured Data Matters More for AI Than Traditional SEO
Search engines could always muddle through unstructured HTML. Googlebot has spent two decades getting better at inferring meaning from messy web pages. LLMs and AI search systems have a different problem. They are processing billions of pages to generate answers, and they need to decide — fast — which sources are trustworthy, relevant, and citable. Structured data is the shortcut. When your page declares its author, publication date, topic, and content type in machine-readable JSON-LD, you are handing the AI exactly what it needs to evaluate you without parsing your entire DOM.
The 2.5x citation advantage for schema-marked content is not an accident. AI systems like those powering Google AI Mode and Perplexity use structured data as a confidence signal. A page with Article schema, proper dateModified, and author entity markup tells the AI: this content is maintained, attributed, and categorized. A page without structured data forces the AI to guess, and AI systems increasingly choose not to guess when better-marked alternatives exist.
Traditional SEO treated structured data as a rich snippet opportunity — a way to get star ratings or FAQ dropdowns in SERPs. That was valuable but optional. In AI search, structured data is a ranking input. It feeds directly into the retrieval-augmented generation (RAG) pipelines that ChatGPT, Google AI Overviews, and Perplexity use to select sources. If you are serious about AI citation optimization, structured data is not step five on your checklist. It is step one.
Consider the scale problem. The Schema.org vocabulary exceeds 800 types, but the average website implements two or three. Article. Maybe Organization. Sometimes a sloppy FAQ block. Meanwhile, AI systems are hungry for specificity — they want to know the exact schema type, the relationships between entities on your page, the temporal freshness of your data. Every property you leave empty is a signal you did not send. As our schema markup complete guide covers in depth, the gap between what is possible and what most sites implement is enormous.
The Schema Types Google Deprecated in 2026
Google cleaned house in January 2026, deprecating seven schema types that had either outlived their usefulness or been so widely abused that they generated more noise than signal. The deprecations were not quiet. Sites that relied on these types for rich results saw those results disappear within weeks. If you have not audited your structured data since December 2025, you may be carrying dead markup that actively hurts your credibility with Google's systems.
The Sitelinks Searchbox schema was the highest-profile casualty. Google now generates sitelinks algorithmically and ignores the markup entirely. Legacy Breadcrumb variants, the older formats that predated the current BreadcrumbList standard, were also cut. Data-Vocabulary.org formats, which Google had been warning about since 2020, are now completely ignored by both traditional and AI search. QAPage markup on pages without genuine question-answer content was nuked, along with HowTo schema applied to non-instructional pages. Speakable markup outside news content was deprecated and is now limited to eligible news publishers. And Review markup without verifiable user-generated reviews was devalued to the point of irrelevance.
The pattern behind these deprecations tells you something about where Google is heading. Every removed type was either outdated, misapplied, or faked. Google is tightening the relationship between what your structured data claims and what your page actually contains. This is directly relevant to AI search: when Google AI Overviews pull structured data to inform citations, accuracy matters more than volume. A page with three schema types that perfectly describe its content will outperform a page with ten types that are loosely applied. Our schema markup generator has been updated to exclude deprecated types, but if you hand-coded your JSON-LD, run an audit now.
The deprecation of HowTo on non-instructional pages deserves special attention. Many sites had been slapping HowTo schema on service pages, product pages, and even about pages to game rich results. Google's enforcement now checks whether the page content actually contains step-by-step instructions. If it does not, the markup is not just ignored — it may trigger a manual action. This is a clear message: schema markup is a declaration of truth, not a marketing tactic. Treat it that way.
New Schema Types: What Was Added and Why
While Google was pruning dead weight, Schema.org was adding types that reflect the realities of 2026 web content. Three additions stand out: sustainability claims markup, AI content disclosure schema, and virtual events structured data. Each one addresses a specific gap that AI systems were struggling with.
Sustainability claims markup lets businesses declare environmental, social, and governance (ESG) data in machine-readable format. Carbon footprint figures, recycled-material percentages, and certification bodies can all be expressed as structured data. This matters for AI search because sustainability queries are exploding, and AI systems need structured inputs to compare claims across companies. A Perplexity query like "which SaaS companies have the lowest carbon footprint" can now pull directly from structured sustainability data rather than scraping marketing pages and hoping the numbers are comparable.
AI content disclosure schema is the most politically significant addition. It provides a standardized way to declare whether content was human-written, AI-assisted, or fully AI-generated. Google has been publicly ambiguous about how it treats AI content, but the creation of a formal schema type tells you it is building systems to process this signal at scale. Sites that proactively declare their content provenance using this schema are positioning themselves for whatever transparency requirements come next. The author entity E-E-A-T guide covers how this intersects with authorship signals.
Virtual events markup was overdue. The pandemic-era shift to online events never fully reversed, and hybrid events are now the default for conferences, webinars, and training sessions. The new schema types let you specify online attendance URLs, hybrid formats, platform requirements, and timezone-specific scheduling in structured data. AI assistants handling calendar management and event recommendations can now parse this data directly. If you run events, implement this markup now: early adopters gain a meaningful advantage before the markup becomes commonplace.
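A minimal sketch of hybrid-event markup follows. It assumes the long-standing Schema.org Event properties (eventAttendanceMode, VirtualLocation) rather than any of the 2026 additions, whose exact property names are not covered here. The helper name hybrid_event, the event name, venue, dates, and URLs are all placeholders.

```python
import json

def hybrid_event(name, start_iso, end_iso, venue_name, address, stream_url):
    """Build a Schema.org Event for a hybrid (in-person plus online) event.

    Uses eventAttendanceMode and VirtualLocation, which already exist in the
    Schema.org vocabulary. All argument values are placeholders.
    """
    return {
        "@context": "https://schema.org",
        "@type": "Event",
        "name": name,
        # ISO 8601 timestamps with an explicit UTC offset avoid timezone ambiguity
        "startDate": start_iso,
        "endDate": end_iso,
        "eventAttendanceMode": "https://schema.org/MixedEventAttendanceMode",
        "eventStatus": "https://schema.org/EventScheduled",
        # A hybrid event lists both a physical venue and an online location
        "location": [
            {"@type": "Place", "name": venue_name, "address": address},
            {"@type": "VirtualLocation", "url": stream_url},
        ],
    }

event = hybrid_event(
    "Example Structured Data Workshop",
    "2026-05-14T15:00:00+00:00",
    "2026-05-14T17:00:00+00:00",
    "Example Conference Center",
    "123 Example Street, Springfield",
    "https://example.com/live",
)
print(json.dumps(event, indent=2))
```

For a purely online event, the same shape works with OnlineEventAttendanceMode and only the VirtualLocation entry.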
The Top 5 Schema Types for AI Citations
Not all schema types carry equal weight with AI systems. In our analysis of citation patterns across ChatGPT, Perplexity, Google AI Overviews, and Claude during Q1 2026, five types consistently correlated with higher citation rates. In order: FAQPage, Article with full author markup, HowTo with genuine instructional content, Product with Offer, and Organization with comprehensive sameAs links. These five should be your implementation priority if AI visibility is the goal.
FAQPage schema leads because its structure maps perfectly to how LLMs retrieve information. When an AI needs to answer a specific question, FAQ markup hands it a pre-packaged answer with the question-answer relationship already defined. Article schema with dateModified and full author details gives AI systems the freshness and authority signals they need to justify a citation. HowTo with real step-by-step content gets pulled into AI Overviews at a remarkable rate because the step structure translates directly into the procedural answers AI systems generate. These are covered in our AAO guide from the agent optimization perspective.
Product schema with Offer markup is increasingly critical as AI agents begin executing purchasing workflows. When ChatGPT or Google AI recommends a product, it prefers sources where pricing, availability, and specifications are structured rather than buried in marketing copy. Organization schema with comprehensive sameAs links feeds the Knowledge Graph, which in turn feeds every AI system that Google operates. The more connected your entity data is, the more likely AI systems are to recognize and cite you as an authoritative source.
The remaining 795+ schema types are not useless — they serve specific use cases. But if you are working with limited implementation bandwidth, these five types deliver the highest ROI for AI citation performance. Get them right first. Get them complete. Then expand. Use our AIO readiness checker to see where your current structured data stands relative to AI citation requirements.
FAQ Schema: The AI Citation Multiplier
Pages with FAQ schema are 2.3x more likely to be cited by AI systems. That number alone should make FAQ schema your default implementation on every informational page. The reason is structural: LLMs are trained on question-answer pairs, and FAQ schema literally packages your content in that exact format. When Perplexity is looking for a concise answer to "how does schema markup affect AI search," a page with that question already structured in FAQ schema has an enormous retrieval advantage over a page where the same information is buried in paragraph four of section six.
The key to effective FAQ schema in 2026 is specificity. Generic questions like "What is structured data?" are answered by thousands of pages. Specific questions like "Which schema types did Google deprecate in January 2026?" have far fewer competing answers. AI systems reward specificity because it reduces the synthesis work they need to do. Write FAQ questions that mirror the long-tail queries your audience actually asks, and provide answers that are complete enough to cite but concise enough to extract. Our schema markup beginner's guide walks through the implementation mechanics.
There is a nuance here that most guides miss. FAQ schema is not just about the visible FAQ section at the bottom of your page. You can mark up question-answer content anywhere it appears — in subheadings that pose questions, in callout boxes, in interview transcripts. The schema does not require a dedicated FAQ block. It requires genuine question-answer pairs. Some of the highest-performing FAQ implementations we have seen are on pages that do not have a traditional FAQ section at all — they use the schema to mark up questions answered throughout the body content.
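As a sketch of this point, the helper below packages question-answer pairs into FAQPage JSON-LD regardless of where they appear on the page. The function name faq_page is hypothetical, and the sample pair reuses this article's own FAQ content. In practice, each pair must match text that is visible on the page.

```python
import json

def faq_page(pairs):
    """Build FAQPage JSON-LD from (question, answer) pairs found anywhere on a page.

    The pairs should restate visible page text; the markup only makes the
    question-answer relationship machine-readable.
    """
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }

doc = faq_page([
    (
        "Which schema types did Google deprecate in January 2026?",
        "Google deprecated seven schema types in January 2026: Sitelinks Searchbox, "
        "legacy Breadcrumb variants, Data-Vocabulary.org formats, QAPage with thin "
        "content, HowTo on non-instructional pages, Speakable in non-news contexts, "
        "and Review markup without genuine user reviews.",
    ),
])
print(json.dumps(doc, indent=2))
```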
One warning: Google's January 2026 deprecations targeted QAPage markup with thin content, and the same scrutiny applies to FAQ answers. If your FAQ answers are one-sentence stubs, you are flirting with devaluation. Each answer should be substantive enough to stand alone as a useful response: aim for three to five sentences, and include at least one specific fact or data point. AI systems are getting better at detecting low-effort FAQ markup, and the penalty is not just lost rich results; it is reduced citation probability across all AI platforms.
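To catch stub answers before they ship, a rough lint like the one below can run over FAQPage markup. This is a heuristic sketch with some assumptions built in. Sentences are counted by splitting on terminal punctuation, so abbreviations such as "e.g." will overcount. The presence of a digit is only a crude stand-in for "a specific fact or data point." The function name and thresholds are hypothetical.

```python
import re

def thin_answers(faq, min_sentences=3):
    """Flag FAQ answers that look like one-line stubs.

    Heuristics only: sentences are split on ., ! or ? followed by whitespace,
    and 'contains a digit' is a crude proxy for a concrete data point.
    """
    problems = []
    for item in faq.get("mainEntity", []):
        text = item["acceptedAnswer"]["text"]
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
        if len(sentences) < min_sentences:
            problems.append((item["name"], f"only {len(sentences)} sentence(s)"))
        if not re.search(r"\d", text):
            problems.append((item["name"], "no number or data point"))
    return problems

faq = {
    "@type": "FAQPage",
    "mainEntity": [
        {
            "@type": "Question",
            "name": "What is schema markup?",
            "acceptedAnswer": {"@type": "Answer", "text": "It is structured data."},
        }
    ],
}
print(thin_answers(faq))
```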
Product and Offer Schema for Agentic Commerce
Agentic commerce — AI agents autonomously researching, comparing, and purchasing products — is the fastest-growing use case for structured data in 2026. When an AI purchasing agent is tasked with finding the best project management tool under $50/month with Jira integration, it does not read your marketing page. It parses your Product schema. If your Product schema does not include pricing via Offer markup, feature lists, integration specifications, and availability data, you are invisible to the agent. Period. Our agentic commerce SEO guide covers the broader strategy, but structured data is the foundation.
Product schema needs to be comprehensive to work in agentic contexts. The minimum viable implementation includes name, description, brand, offers (with price, priceCurrency, and availability), and aggregateRating. But minimum viable is not competitive. The sites winning agentic commerce queries also include additionalProperty for technical specifications, isRelatedTo for complementary products, review with structured ratings, and hasMerchantReturnPolicy for returns data. Each additional property gives the AI agent more data points for comparison, which makes your product more likely to be selected.
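Here is a sketch of a Product block that goes beyond the minimum. Offer, AggregateRating, and Brand are nested inside it, alongside the extra properties named above. The product, prices, ratings, review, and return terms are placeholder values, not real data. Check each property against Schema.org and against the documentation of whichever consumer you target.

```python
import json

# All values below are placeholders for illustration, not real product data.
product = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Example Standing Desk",
    "description": "Height-adjustable standing desk with a dual-motor frame.",
    "brand": {"@type": "Brand", "name": "ExampleCo"},
    # Technical specifications as name/value pairs an agent can compare
    # (unitCode uses UN/CEFACT codes: CMT = centimetre, KGM = kilogram)
    "additionalProperty": [
        {"@type": "PropertyValue", "name": "Maximum height", "value": 125, "unitCode": "CMT"},
        {"@type": "PropertyValue", "name": "Load capacity", "value": 100, "unitCode": "KGM"},
    ],
    # A complementary product
    "isRelatedTo": {"@type": "Product", "name": "Example Monitor Arm"},
    "offers": {
        "@type": "Offer",
        "price": "499.00",
        "priceCurrency": "USD",
        "availability": "https://schema.org/InStock",
        "hasMerchantReturnPolicy": {
            "@type": "MerchantReturnPolicy",
            "applicableCountry": "US",
            "returnPolicyCategory": "https://schema.org/MerchantReturnFiniteReturnWindow",
            "merchantReturnDays": 30,
        },
    },
    "aggregateRating": {"@type": "AggregateRating", "ratingValue": 4.6, "reviewCount": 212},
    "review": [
        {
            "@type": "Review",
            "author": {"@type": "Person", "name": "Sample Reviewer"},
            "reviewRating": {"@type": "Rating", "ratingValue": 5, "bestRating": 5},
            "reviewBody": "Stable at full height.",
        }
    ],
}

print(json.dumps(product, indent=2))
```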
Offer markup deserves its own attention. Price alone is not enough. AI agents need to know the billing cycle (monthly vs. annual), any trial periods, tier structures, and what features are included at each price point. Use the eligibleQuantity, eligibleRegion, and validFrom/validThrough properties to give agents the context they need to make accurate comparisons. A competitor with detailed Offer markup will beat you in agent recommendations even if your product is objectively better, because the agent can only evaluate what it can read.
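The snippet below shows one plausible way to express subscription pricing with Schema.org's UnitPriceSpecification. A referenceQuantity with the UN/CEFACT unit code MON marks the price as per month. A zero-price component with an ISO 8601 billingDuration models a free trial. The plan, prices, dates, and seat limits are hypothetical. AI agents may interpret these properties differently, so treat this as a starting point rather than a settled pattern.

```python
import json

# Hypothetical monthly SaaS plan with a 14-day free trial.
offer = {
    "@type": "Offer",
    "name": "Team plan, billed monthly",
    "price": "49.00",
    "priceCurrency": "USD",
    "availability": "https://schema.org/InStock",
    "eligibleRegion": "US",  # ISO 3166-1 country code as text
    "eligibleQuantity": {
        "@type": "QuantitativeValue",
        "minValue": 1,
        "maxValue": 50,
        "unitText": "seats",
    },
    "validFrom": "2026-01-01",
    "validThrough": "2026-12-31",
    "priceSpecification": [
        {
            # Recurring charge: 49.00 USD per one month (MON)
            "@type": "UnitPriceSpecification",
            "price": "49.00",
            "priceCurrency": "USD",
            "referenceQuantity": {"@type": "QuantitativeValue", "value": 1, "unitCode": "MON"},
        },
        {
            # Trial component: free for the first 14 days (ISO 8601 duration)
            "@type": "UnitPriceSpecification",
            "price": "0.00",
            "priceCurrency": "USD",
            "billingDuration": "P14D",
        },
    ],
}

print(json.dumps(offer, indent=2))
```

An annual tier would typically be a second Offer with its own price and a referenceQuantity of one year (unit code ANN), which lets an agent compare the two billing cycles side by side.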
The technical SEO work required for comprehensive Product schema is non-trivial, especially for sites with large catalogs. Automate where possible. Use your CMS or ecommerce platform's native schema capabilities as a baseline, then layer on custom JSON-LD for the properties your platform does not handle. Validate every product page individually — template-level schema often breaks on edge cases like out-of-stock items, discontinued products, or bundle pricing. AI agents are merciless about data accuracy. One incorrect price in your Offer markup and the agent may blacklist your entire catalog.
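The template edge cases above are worth catching automatically. The sketch below checks a single Offer for three things: a numeric, non-negative price; a three-letter uppercase currency code; and an availability value drawn from a subset of Schema.org's ItemAvailability enumeration. The function name and the chosen subset are assumptions. The check does not verify that the price matches what the page displays.

```python
import re

# A subset of Schema.org ItemAvailability values; extend from the full enumeration as needed.
ALLOWED_AVAILABILITY = {
    "https://schema.org/InStock",
    "https://schema.org/OutOfStock",
    "https://schema.org/SoldOut",
    "https://schema.org/Discontinued",
    "https://schema.org/PreOrder",
    "https://schema.org/BackOrder",
    "https://schema.org/LimitedAvailability",
}

def offer_errors(offer):
    """Return human-readable problems found in one Offer dict (empty list if none)."""
    errors = []
    price = offer.get("price")
    try:
        # float() rejects formatted strings such as "$49" left behind by templates
        if price is None or float(price) < 0:
            errors.append("price is missing or negative")
    except (TypeError, ValueError):
        errors.append(f"price is not numeric: {price!r}")
    currency = offer.get("priceCurrency")
    if not re.fullmatch(r"[A-Z]{3}", str(currency or "")):
        errors.append(f"priceCurrency does not look like an ISO 4217 code: {currency!r}")
    if offer.get("availability") not in ALLOWED_AVAILABILITY:
        errors.append(f"unexpected availability value: {offer.get('availability')!r}")
    return errors

# A discontinued item whose template left a formatted price string behind
print(offer_errors({"price": "$49", "priceCurrency": "USD",
                    "availability": "https://schema.org/Discontinued"}))
```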
Speakable Schema for Voice and AI Assistants
Speakable schema identifies the sections of your content that are best suited for text-to-speech playback by voice assistants and AI systems. When Alexa, Google Assistant, or Siri need to read an answer aloud, they look for Speakable markup to find the most quotable, self-contained passages on your page. Without it, they have to guess which paragraph to read — and they often guess wrong, delivering an awkward mid-paragraph excerpt instead of a clean, complete answer.
The 2026 deprecation restricted Speakable to news publishers for traditional search features, but here is what most people miss: AI systems beyond Google still use Speakable-like signals to identify extractable content. When ChatGPT selects a passage to cite, it favors content that is self-contained, concise, and clearly scoped — exactly the characteristics you would optimize for Speakable. Even if you are not a news publisher eligible for Google's Speakable rich results, structuring your content with Speakable-style best practices improves your citability across all AI platforms.
For eligible news publishers, Speakable is a direct competitive advantage. Mark up your article's headline, summary paragraph, and key takeaways. These are the passages voice assistants will read aloud when a user asks about your topic. Keep marked sections under 300 characters for optimal voice delivery. Use complete sentences. Avoid jargon, abbreviations, and parenthetical asides that sound awkward when read aloud. The spoken web is growing alongside the AI web, and the markup strategies overlap significantly.
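For eligible publishers, here is a minimal sketch of Speakable markup. It follows the SpeakableSpecification pattern, where CSS selectors point at the headline and summary elements. The selectors (.article-headline, .article-summary), URL, and headline are hypothetical. The length check simply applies the 300-character guideline above.

```python
import json

def speakable_page(url, headline, css_selectors):
    """WebPage JSON-LD whose speakable property points at elements via CSS selectors."""
    return {
        "@context": "https://schema.org",
        "@type": "WebPage",
        "name": headline,
        "url": url,
        "speakable": {
            "@type": "SpeakableSpecification",
            "cssSelector": css_selectors,
        },
    }

def too_long_for_voice(passages, limit=300):
    """Return the passages that exceed the character limit suggested above."""
    return [p for p in passages if len(p) > limit]

page = speakable_page(
    "https://news.example.com/2026/schema-update",
    "Example headline",
    [".article-headline", ".article-summary"],
)
print(json.dumps(page, indent=2))
```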
Even if you do not implement formal Speakable schema, apply its principles to your content architecture. Write introductions and key paragraphs as if they will be read aloud. Make them self-contained. Front-load the most important information. This approach improves AI citation extraction regardless of the schema type — it is a content design philosophy as much as a markup strategy. The AIO optimization services we offer include content structuring for voice and AI extraction as a core deliverable.
Implementation: JSON-LD Best Practices for 2026
JSON-LD is the only structured data format worth implementing in 2026. Microdata and RDFa still technically work, but Google has explicitly stated its preference for JSON-LD, and every AI system that processes structured data is optimized for it. JSON-LD lives in a <script type="application/ld+json"> tag in your page's head (or body), separate from your HTML content. This separation is its greatest advantage: you can maintain your structured data independently of your page templates, test it without affecting page rendering, and deploy it through tag managers when CMS access is limited.
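If you generate the JSON-LD script tag yourself, one detail is easy to miss: a string value containing "</script>" would close the tag early. The sketch below is a hypothetical helper, not part of any CMS. It escapes <, >, and & as Unicode escapes, which keeps the payload valid JSON.

```python
import json

def json_ld_script(data):
    """Serialize a dict into a <script type="application/ld+json"> tag.

    In JSON, '<', '>' and '&' can only occur inside strings, so replacing them
    with \\u003c, \\u003e and \\u0026 keeps the payload valid JSON while
    preventing a value such as '</script>' from ending the tag early.
    """
    payload = json.dumps(data, ensure_ascii=False, separators=(",", ":"))
    payload = (
        payload.replace("<", "\\u003c")
        .replace(">", "\\u003e")
        .replace("&", "\\u0026")
    )
    return f'<script type="application/ld+json">{payload}</script>'

print(json_ld_script({
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Tags like </script> are escaped",
}))
```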
The most important best practice for 2026 is entity linking via @id. Instead of repeating the same Organization or Person data in every schema block, define the entity once with a unique @id (typically a URL) and reference that @id everywhere else. This creates a coherent entity graph that AI systems can traverse. A page with Article schema referencing an Organization entity via @id, which in turn has sameAs links to your Knowledge Panel, LinkedIn, and Wikidata entry, is far more valuable to AI systems than flat, disconnected schema blocks.
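Here is a sketch of the @id pattern using an @graph. The Organization and the author are each defined once, and the Article references them by @id. The domain, names, and sameAs URLs are placeholders. The #organization and #author fragment conventions are a common choice, not a requirement.

```python
import json

SITE = "https://example.com"  # placeholder domain

graph = {
    "@context": "https://schema.org",
    "@graph": [
        {
            # Defined once; everything else points here via @id
            "@type": "Organization",
            "@id": f"{SITE}/#organization",
            "name": "Example Co",
            "url": SITE,
            "sameAs": [
                "https://www.linkedin.com/company/example",  # placeholder
                "https://www.wikidata.org/wiki/Q000000",  # placeholder, not a real item
            ],
        },
        {
            "@type": "Person",
            "@id": f"{SITE}/about#author",
            "name": "Jane Example",
            "worksFor": {"@id": f"{SITE}/#organization"},
        },
        {
            "@type": "Article",
            "@id": f"{SITE}/blog/structured-data#article",
            "headline": "Structured Data for AI Search",
            # References, not copies: the graph links these nodes together
            "author": {"@id": f"{SITE}/about#author"},
            "publisher": {"@id": f"{SITE}/#organization"},
        },
    ],
}

print(json.dumps(graph, indent=2))
```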
Nest your schema types intelligently. An Article should include its author (Person), publisher (Organization), and if applicable, the about property referencing the topic entity. Product should nest Offer, AggregateRating, and Brand. This nesting tells AI systems not just what is on your page, but how the elements relate to each other. Flat schemas that list properties without relationships are like a spreadsheet without headers — technically data, but hard to interpret correctly. Our schema markup generator handles nesting automatically for the most common schema combinations.
Server-side rendering of JSON-LD is strongly preferred over client-side injection. While Googlebot can execute JavaScript, many AI crawlers and LLM training pipelines process raw HTML without JavaScript execution. If your JSON-LD is injected by React, Vue, or another framework at runtime, it may not be visible to every system that matters. In Next.js, render the JSON-LD script tag from a server component such as page.js or layout.js; the Metadata API does not output JSON-LD on its own. In WordPress, use a plugin that outputs JSON-LD in the initial HTML response. The test is simple: your structured data should appear in a curl of your page, with no JavaScript required.
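To check what a non-rendering crawler sees, you can parse the raw HTML (the same bytes curl returns) and pull out the JSON-LD blocks. This sketch uses Python's standard-library HTMLParser. The class and function names are hypothetical, and the sketch assumes the HTML was fetched without executing JavaScript.

```python
import json
from html.parser import HTMLParser

class JsonLdExtractor(HTMLParser):
    """Collect the contents of <script type="application/ld+json"> tags from raw HTML."""

    def __init__(self):
        super().__init__()
        self._in_json_ld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        # attrs values can be None (e.g. a bare attribute), hence the "or ''"
        if tag == "script" and (dict(attrs).get("type") or "").lower() == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks; buffer until the end tag
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self.blocks.append("".join(self._buffer))
            self._in_json_ld = False

def extract_json_ld(raw_html):
    """Return the parsed JSON-LD objects present in the raw (unrendered) HTML."""
    parser = JsonLdExtractor()
    parser.feed(raw_html)
    parser.close()
    return [json.loads(block) for block in parser.blocks]
```

To run it against a live page, fetch the HTML with urllib.request.urlopen(url).read().decode("utf-8") and pass the string to extract_json_ld. An empty list suggests the markup is probably injected client-side.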
Testing and Validating Your Structured Data
Google's Rich Results Test is necessary but insufficient. It validates that your markup is syntactically correct and eligible for rich results, but it does not tell you whether your structured data is complete enough for AI citation. A page can pass the Rich Results Test with a bare-minimum Article schema — headline and author name — while missing the dateModified, publisher, and mainEntityOfPage properties that AI systems rely on for citation decisions. Use the Rich Results Test as a baseline, then go further.
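A simple completeness check can sit on top of the Rich Results Test. The property list below mirrors the properties this section names. It is not an official Google requirement, and the function name is hypothetical.

```python
# Properties this section names for citation-ready Article markup
# (the article's recommendation, not an official Google requirement).
RECOMMENDED_ARTICLE_PROPS = ("headline", "author", "dateModified", "publisher", "mainEntityOfPage")

def missing_article_props(article):
    """Return recommended properties that are absent or empty on an Article dict."""
    return [prop for prop in RECOMMENDED_ARTICLE_PROPS if not article.get(prop)]

# The "bare-minimum" Article described above: it passes syntax checks but is incomplete
bare_minimum = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Structured Data for AI Search",
    "author": {"@type": "Person", "name": "Jane Example"},
}
print(missing_article_props(bare_minimum))
# → ['dateModified', 'publisher', 'mainEntityOfPage']
```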
Schema.org's own validator catches errors that Google's tool ignores, particularly around property types and value formats. Google accepts some non-standard implementations because it has been doing so for years, but AI systems trained on Schema.org documentation may penalize deviations. Validate against both. The SEO score calculator includes structured data completeness as a scoring factor, giving you a more holistic view than syntax validation alone.
Monitor your structured data performance in Google Search Console's new Enhancement reports. GSC now breaks down structured data performance by type, showing you which schema types are generating impressions, which have errors, and which are being ignored. The 2026 update added entity recognition status — you can now see whether Google is connecting your Person or Organization schema to a known Knowledge Graph entity. If it is not, your author markup is not doing its job.
Automated testing should run on every deployment. Build structured data validation into your CI/CD pipeline. A single broken JSON-LD block — a missing comma, an unclosed bracket, an invalid URL in a sameAs property — can silently kill your structured data for days before anyone notices. Use tools like schema-dts for TypeScript projects, or json-ld.org's playground for manual testing. The meta tag analyzer can catch missing or malformed JSON-LD during routine page audits.
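Here is a minimal CI check, sketched in Python. It assumes your build can dump each page's JSON-LD to a file. It fails on invalid JSON and on sameAs values that are not absolute http(s) URLs. Because it walks nested objects, a publisher's sameAs is checked too. The script name and wiring are hypothetical, and a real pipeline would add type-specific checks.

```python
import json
import sys
from urllib.parse import urlparse

def _walk(node):
    """Yield every dict nested anywhere inside a JSON-LD value."""
    if isinstance(node, dict):
        yield node
        for value in node.values():
            yield from _walk(value)
    elif isinstance(node, list):
        for item in node:
            yield from _walk(item)

def validate_block(raw):
    """Return problems found in one raw JSON-LD string (empty list if none)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}"]
    problems = []
    for node in _walk(data):
        same_as = node.get("sameAs", [])
        for url in same_as if isinstance(same_as, list) else [same_as]:
            parsed = urlparse(url) if isinstance(url, str) else None
            if parsed is None or parsed.scheme not in ("http", "https") or not parsed.netloc:
                problems.append(f"invalid sameAs URL: {url!r}")
    return problems

def main(paths):
    """Validate each JSON-LD file; return a process exit code (1 if any problem)."""
    failures = 0
    for path in paths:
        with open(path, encoding="utf-8") as fh:
            for problem in validate_block(fh.read()):
                print(f"{path}: {problem}")
                failures += 1
    return 1 if failures else 0

if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

For example, python check_jsonld.py build/jsonld/*.json would exit non-zero on any problem, which most CI systems treat as a failed step.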
Common Mistakes That Kill Your AI Visibility
The most damaging structured data mistake is schema-content mismatch. Your JSON-LD says the page is an Article about "technical SEO" published in 2026 by a named author. But the page is actually a product listing with no visible author attribution and content from 2024. AI systems cross-reference structured data against page content, and mismatches erode trust in your entire domain's structured data. One bad page can reduce the citation weight of every other page on your site. A comprehensive SEO audit should always include structured data accuracy as a critical check.
The second killer is orphaned entity references. You declare an author with an @id of "https://yoursite.com/about#author" but that URL returns a 404, or it exists but contains no structured data that resolves the @id. AI systems follow these references. When the trail goes cold, the entity connection breaks, and your author markup becomes meaningless. Every @id you use in your JSON-LD must resolve to a page that contains corresponding structured data with matching @id values. This is entity linking 101, but we see it broken on over half the sites we audit.
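Reference integrity can be checked across a crawl. The sketch below makes a simplifying assumption: any object containing only @id is a reference, and any object with @id plus other properties is a definition. It then reports references that no crawled page defines. The check does not confirm that the @id URL returns a 200, so pair it with an HTTP status check. The function names are hypothetical.

```python
def _walk(node):
    """Yield every dict nested anywhere inside a JSON-LD value."""
    if isinstance(node, dict):
        yield node
        for value in node.values():
            yield from _walk(value)
    elif isinstance(node, list):
        for item in node:
            yield from _walk(item)

def orphaned_ids(documents):
    """Return @id references that no document in the crawl defines."""
    defined, referenced = set(), set()
    for doc in documents:
        for node in _walk(doc):
            if "@id" not in node:
                continue
            if len(node) == 1:
                referenced.add(node["@id"])  # bare reference like {"@id": "..."}
            else:
                defined.add(node["@id"])  # node carrying its own properties
    return sorted(referenced - defined)

post = {
    "@context": "https://schema.org",
    "@type": "Article",
    "@id": "https://example.com/post#article",
    "author": {"@id": "https://example.com/about#author"},
}
about = {
    "@context": "https://schema.org",
    "@type": "Person",
    "@id": "https://example.com/about#author",
    "name": "Jane Example",
}
print(orphaned_ids([post]))
```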
Stale dateModified values are an epidemic. Sites that set dateModified once at publication and never update it are actively signaling to AI systems that their content is aging. If you update a blog post — adding new sections, refreshing statistics, correcting information — update the dateModified in your Article schema. If your CMS does not do this automatically, build a process for it. AI systems treat dateModified as a freshness signal with far more weight than crawl date or URL patterns. A 2024 article with a 2026 dateModified will outperform a 2026 article with no dateModified at all. The AI search optimization guide covers freshness signals in broader context.
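If your CMS does not manage dateModified, one approach is to hash the rendered body text and bump the date only when the hash changes. That way the value tracks real edits rather than republishing. The function, and how you store the previous hash, are assumptions about your setup. Note that any whitespace or template change in body_text will also count as an edit.

```python
import hashlib
from datetime import date

def refresh_date_modified(article, body_text, previous_hash, today=None):
    """Set article["dateModified"] to today only if body_text changed.

    Returns the new content hash, which the caller stores for the next run.
    """
    new_hash = hashlib.sha256(body_text.encode("utf-8")).hexdigest()
    if new_hash != previous_hash:
        article["dateModified"] = (today or date.today()).isoformat()
    return new_hash

article = {"@type": "Article", "datePublished": "2024-03-01", "dateModified": "2024-03-01"}
stored = hashlib.sha256("Original body.".encode("utf-8")).hexdigest()

# Unchanged content: dateModified stays put
stored = refresh_date_modified(article, "Original body.", stored, today=date(2026, 1, 15))
# Substantive edit: dateModified moves forward
stored = refresh_date_modified(article, "Original body with a new section.", stored, today=date(2026, 1, 15))
print(article["dateModified"])
# → 2026-01-15
```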
Finally, schema scope creep. Adding every conceivable schema type to every page dilutes the signal. A blog post does not need Product schema. A product page does not need HowTo unless it genuinely contains instructional steps. Each schema type on your page is a claim about what that page contains. Make fewer claims and make them accurate. AI systems that encounter pages overloaded with loosely relevant schema types tend to discount all of them. Be precise. Be honest. Let your structured data reflect what your page actually is, not what you wish it were. Use our free schema markup generator to produce clean, type-appropriate markup for each page.
Frequently Asked Questions
Does structured data directly improve AI citations?
Yes. Content with schema markup has a 2.5x higher chance of appearing in AI-generated answers compared to unstructured content. Structured data gives LLMs and AI search engines machine-readable context about your content's topic, authorship, freshness, and entity relationships — all signals that influence citation selection in retrieval-augmented generation pipelines.
Which schema types were deprecated by Google in 2026?
Google deprecated seven schema types in January 2026: Sitelinks Searchbox, legacy Breadcrumb variants, Data-Vocabulary.org formats, QAPage with thin content, HowTo on non-instructional pages, Speakable in non-news contexts, and Review markup without genuine user reviews. Sites still using these risk losing rich results and may see reduced structured data trust signals across their domain.
What new schema types should I implement in 2026?
Three new schema additions matter most: sustainability claims markup for ESG reporting and green product data, AI content disclosure schema for transparency about AI-generated or AI-assisted content, and virtual events markup for hybrid and online event structured data. These align with Google's 2026 priorities around transparency, trust, and the growing hybrid events ecosystem.
Is FAQ schema still effective for AI search in 2026?
FAQ schema remains one of the most effective schema types for AI citations. Pages with FAQ schema are 2.3x more likely to be cited by AI systems because the question-answer format maps directly to how LLMs process and retrieve information. The keys are substantive answers (three to five sentences, each with at least one specific fact) and specific questions that match long-tail queries rather than generic topics.
How does Speakable schema help with AI assistants?
Speakable schema identifies sections of your content best suited for text-to-speech playback by voice assistants. When Alexa, Google Assistant, or Siri need to read an answer aloud, they prioritize Speakable-marked content. While Google restricted Speakable rich results to news publishers in 2026, the content structuring principles — self-contained passages under 300 characters, complete sentences, front-loaded information — improve AI citability across all platforms.
What is the relationship between Product schema and agentic commerce?
Product schema with detailed Offer markup is the foundation of agentic commerce — AI agents that autonomously research, compare, and purchase products. Without machine-readable product data including pricing, availability, specifications, reviews, and return policies, AI purchasing agents cannot evaluate or recommend your products. Comprehensive Product schema is the entry requirement for visibility in agent-driven purchasing workflows.
How many schema types exist and what percentage of sites use them?
The Schema.org vocabulary now exceeds 800 types, but only 33% of websites implement structured data correctly despite its proven impact on search visibility and AI citations. Most sites use only basic Article or Organization schema, missing high-value types like FAQPage, HowTo, Speakable, and detailed Product markup. The gap between what is available and what is implemented represents a massive competitive opportunity for sites willing to invest in comprehensive structured data.
Get Your Structured Data Right
Structured data is the foundation of AI search visibility. If your schema markup is incomplete, outdated, or misapplied, you are losing citations to competitors who got it right. Start with a free audit using our schema markup generator and AIO readiness checker, or book a full optimization engagement for expert implementation across your entire site.