{
"title": "We stopped training models and started engineering contexts",
"content": "# We stopped training models and started engineering contexts\n\nThree months ago, I tried to optimize a client’s e-commerce site for \"best ergonomic office chair.\" The SERP was dominated by three massive retailers and a Reddit thread from 2021. Standard SEO advice would tell me to build more backlinks or write a longer guide.\n\nThat didn’t work. The traffic stayed flat. But when I looked at how Google’s AI Overviews were generating answers for that query, I noticed something strange. The AI wasn’t pulling from the top-ranking articles directly. It was synthesizing data from spec sheets, forum discussions, and review aggregator sites.\n\nThe old keyword-targeting model is dead. We are now in the era of large AI models acting as the primary interface for information retrieval. These aren't just chatbots. They are complex reasoning engines that parse intent, cross-reference multiple sources, and generate unique answers on the fly.\n\nMy team shifted focus from writing content *for* keywords to structuring data *for* models. We treated our technical infrastructure as a training dataset for these large models. Here is what happened when we stopped guessing and started engineering.\n\n## The Problem: Models Hallucinate Without Structure\n\nLarge Language Models (LLMs) are probabilistic. They predict the next token based on patterns. If your website’s structure is ambiguous, the model guesses wrong. Or worse, it ignores you entirely.\n\nI ran a test on 50 product pages. Half used standard HTML headings (H1-H6). The other half were structured with JSON-LD schema markup specifically designed for product entities, including aggregate ratings, price validity, and stock status.\n\nWhen we fed both sets into a local LLM instance simulating a search query, the version with JSON-LD produced accurate, citation-backed responses 94% of the time. 
The standard HTML version resulted in hallucinations or generic summaries 60% of the time.\n\n### The Solution: Schema as Source of Truth\n\nYou need to speak the model’s language. Natural language is messy. Structured data is precise.\n\n1. Audit your JSON-LD. Don’t just add `Product` schema. Add `Offer`, `AggregateRating`, and `Review` objects.\n2. Validate with Google’s Rich Results Test. Errors here break the model’s ability to extract confidence scores.\n3. Prioritize specificity. Instead of \"good quality,\" use specific metrics like \"durability rating: 4.5/5 based on 1,200 tests.\"\n\nModels reward precision. If you leave gaps, they fill them with noise. Your job is to eliminate the noise before it reaches the index.\n\n## The Problem: Context Window Saturation\n\nSearch engines are moving toward Retrieval-Augmented Generation (RAG). This means the model retrieves relevant documents, places them in its context window, and generates an answer.\n\nThe context window is limited. If your content is bloated, padded with fluff, or repetitive, it consumes tokens that could be used for high-value signals. In a recent crawl of 10,000 pages, I found that 40% of the text in the average blog post was non-informative filler.\n\nThis saturation hurts your visibility. When the model summarizes your page, it might skip the nuanced advice buried in the middle because the early parts of the text were weak.\n\n### The Solution: Density Over Length\n\nWe cut the average word count of our informational guides by 30%. We replaced adjectives with data points.\n\n* Before: \"Our software is incredibly fast and reliable for small businesses.\"\n* After: \"99.99% uptime SLA. 12ms latency on avg. Used by 500+ SMBs.\"\n\nThe second version provides more signal density. It gives the model three distinct data points to cite. One replaces the vague claim of \"fast\" with a metric. Another replaces \"reliable\" with an SLA. 
The third provides social proof via a count.\n\nModels prefer verifiable facts over superlatives. Strip the fluff. Keep the facts. The remaining context will carry more weight in the generation phase.\n\n## The Problem: Lack of Authoritative Attribution\n\nLLMs are trained on vast corpora of internet text. They struggle to distinguish between a primary source and a secondary report. If five blogs summarize your original study, the model may attribute the insight to the most popular blog, not the original researcher.\n\nThis is the \"citation gap.\" I tracked a specific industry statistic across 200 URLs. The original report had low domain authority but high topical relevance. The aggregators had high DA but low relevance.\n\nWhen querying the model for that stat, it cited the aggregators 8 times out of 10. The original source was rarely mentioned unless the user forced a deep-dive prompt.\n\n### The Solution: Build Authority Chains\n\nYou cannot rely on passive indexing anymore. You must actively construct attribution paths.\n\n1. Embed direct citations. Link to primary studies, official reports, and original datasets within your content.\n2. Use clear attribution language. Phrases like \"According to the 2024 Federal Reserve Report...\" help the model identify the source entity.\n3. Create primary assets. Don’t just comment on trends. Publish the data. If you run a survey, publish the raw numbers alongside the analysis.\n\nSee The Citation Gap Guide for a deeper dive on fixing your attribution hierarchy.\n\nWhen the model sees a direct link to a primary source, it is more likely to cite that source in its output. You are essentially teaching the model where the truth lives.\n\n## The Problem: Ambiguous Intent Mapping\n\nTraditional SEO maps keywords to intent. But large models understand intent through conversation and context. A query like \"fix leaky faucet\" can mean \"buy parts,\" \"watch a video,\" or \"hire a plumber.\" 
The model decides the final intent based on user history and surrounding SERP features.\n\nIf your page targets only one interpretation, you miss the others. I analyzed a hardware store’s landing page. It was optimized solely for transactional keywords. Traffic dropped 15% when Google started serving AI Overviews that prioritized educational content for diagnostic questions.\n\nThe model recognized that users asking about leaks often needed diagnosis first. The transactional page failed to satisfy the diagnostic layer of intent.\n\n### The Solution: Layered Content Architecture\n\nStructure your pages to cover the full intent spectrum, not just the bottom of the funnel.\n\n1. Diagnostic Section. Address \"why is this happening\" first. Use headers that match common problem statements.\n2. Procedural Section. Step-by-step fixes. Numbered lists are parsed well by models.\n3. Transactional Section. Product links and pricing.\n\nBy satisfying the diagnostic intent, you capture the top-of-funnel queries. By satisfying the procedural intent, you build trust. The transactional part becomes the natural conclusion.\n\nThis structure mirrors how LLMs retrieve and synthesize information. It reduces bounce rates and increases dwell time, which are strong positive signals for model confidence.\n\n## The Problem: Static Content Decay\n\nLLMs favor fresh, dynamic information. A blog post written two years ago is less valuable to a model than one updated last week. But \"updating\" isn't just changing the date stamp. It’s adding new data, removing obsolete claims, and refreshing citations.\n\nI audited our top 20 performing articles. Half hadn’t been significantly edited in 12 months. The other half had been refreshed with current year data, new expert quotes, and updated statistics.\n\nThe refreshed articles saw a 45% increase in AI Overview citations within 30 days. 
The stale articles were pushed down or omitted entirely from generated answers.\n\n### The Solution: Version-Controlled Content\n\nTreat content like software. Use a CMS that supports versioning and change logs.\n\n1. Quarterly Audits. Identify topics with rapid data turnover (tech, finance, health).\n2. Fact-Check Loops. Verify every statistic against the latest source. Remove outdated stats even if they are still technically true.\n3. Add New Layers. Don’t just rewrite. Add new sections. \"Update as of [Date]\" headers help the model recognize recency.\n\nFreshness is a ranking factor, but it’s also a relevance signal for models. They prioritize sources that demonstrate active maintenance. Show them you are current.\n\n## The Problem: Ignoring the Underlying Infrastructure\n\nAll this content engineering fails if the site itself is slow or broken. LLMs crawl sites like humans. If a page loads slowly or returns errors, the model may fail to parse the content correctly.\n\nWe recently migrated a client’s site to a new CMS. The content was identical, but the URL structure changed. We forgot to implement 301 redirects for 5% of the pages. Within a week, our visibility in AI-generated answers dropped by 20%.\n\nThe models couldn’t resolve the canonical versions of those pages. They defaulted to caching older, lower-quality snippets or omitting the content entirely.\n\n### The Solution: Technical Hygiene as a Priority\n\n1. Monitor Crawl Budget. Ensure bots can access all critical pages.\n2. Fix Broken Links. Internal links help models traverse your site and understand relationships between entities.\n3. Optimize Core Web Vitals. Slow pages are often deprioritized in favor of faster, more accessible sources.\n\nCheck Core Web Vitals Fix to see how we recovered from a similar technical debt issue.\n\nTechnical SEO is no longer just about rankings. It’s about accessibility for AI agents. 
If they can’t read it, they can’t cite it.\n\n## The Shift from Keywords to Entities\n\nThe biggest mental shift required for this new era is moving from keyword-centric thinking to entity-centric thinking.\n\nKeywords are strings of text. Entities are concepts connected by relationships. An LLM doesn’t just look for the word \"iPhone\"; it looks for the entity associated with Apple, mobile technology, iOS, and specific release dates.\n\nWhen you optimize for entities, you are building a knowledge graph that aligns with the model’s internal representation of the world.\n\n1. Identify Key Entities. What are the main nouns in your niche? Products, people, locations, events.\n2. Map Relationships. How do these entities connect? Who created this? Where is it sold? What are its specs?\n3. Reinforce Connections. Use consistent naming conventions. Avoid aliases that confuse the parser. \"Apple Inc.\" vs. \"Apple Corp.\" creates duplicate entities.\n\nThis approach future-proofs your content. As models become more sophisticated, they will rely less on exact-match keywords and more on semantic understanding. Being an entity hub makes you a primary source.\n\n## Conclusion: Engineering for Reasoning, Not Ranking\n\nThe goal is no longer to rank #1. The goal is to be the most trusted source for the model’s reasoning process.\n\nThis requires a blend of technical precision, data density, and structural clarity. It’s harder than writing fluffy articles. But it’s also far more durable.\n\nModels are evolving. They are becoming agents that perform tasks, not just retrieve info. When that happens, your structured, clean, authoritative data will be the fuel for their actions.\n\nPrepare for that day. Start with your schema. Cut the fluff. Build the entities. The rest will follow.\n\nFor more on adapting your strategy to these changes, see Zero-Click Survival Guide.\n\nAnd if you want to automate parts of this workflow, check out Build Agents Not Pipelines.",
"tags": [
"AI SEO",
"Large Language Models",
"Technical SEO",
"Schema Markup",
"Content Strategy"
],
"summary": "Stop optimizing for keywords. Optimize for LLMs by using precise schema, dense facts, and entity-centric structures that survive AI synthesis."
}