
We stopped training models. We started engineering citations.

I spent three weeks watching my organic traffic flatline.

Not a slow drift. A hard stop.

The graphs looked identical to every other 'AI overview' panic post on LinkedIn. But my data told a different story. My traffic didn't drop because Google was suppressing me. It dropped because my content wasn't being cited.

I audited the top-ranking URLs for my primary keywords. 80% of them were old blog posts with zero recent updates. Yet, they were appearing in the AI-generated summaries at the top of the SERP. The new, long-form guides I’d just published? Ignored.

Why? Because LLMs aren't reading your 'freshness'. They are scraping established citation patterns.

We treat Large Language Models like search engines. That’s a mistake. Search engines rank pages. LLMs synthesize facts from a corpus.

If your site isn't in that corpus, or if your data is buried under layers of CMS clutter, the model won't see you. It doesn't matter how good your on-page SEO is. It doesn't matter how strong your Core Web Vitals scores are.

I changed my strategy. I stopped optimizing for clicks. I started optimizing for citation.

Here is what I learned. And more importantly, what I tested.

The Corpus Problem

Most SEOs assume AI search works like Googlebot. It doesn’t.

Googlebot crawls links. LLMs ingest structured data, authoritative datasets, and high-density text snippets during their pre-training and fine-tuning phases. Or, in the case of RAG (Retrieval-Augmented Generation) systems used by chat interfaces, they retrieve from a vector database of indexed web content.

I ran a test. I took 10 keywords where I ranked #1 organically but had zero visibility in AI overviews. I used a tool to reverse-engineer the sources cited in those AI responses.

Only two of my competitors were cited. Both had outdated content. But both had clean, schema-rich HTML and high domain authority in specific niche clusters.

My content was technically superior. It had better readability scores. It had video embeds. None of that mattered to the vector embedding process.

The solution wasn't better writing. It was better signal extraction.

I implemented a strict citation schema strategy. I added schema.org `citation` references where applicable. I structured my headers to match the exact phrasing of common questions in the training data. I stripped away all narrative fluff. LLMs prefer dense, factual statements over engaging storytelling.
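A minimal sketch of what citation markup can look like, assuming the schema.org `citation` property on an `Article`. The helper name `build_article_jsonld`, the headline, and the example.com/example.org URLs are hypothetical placeholders, not the markup we actually shipped.

```python
import json


def build_article_jsonld(headline, url, cited_pages):
    """Return a JSON-LD string for an Article that cites other web pages."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "url": url,
        # schema.org's `citation` property points at the works this page cites.
        "citation": [
            {"@type": "WebPage", "name": name, "url": link}
            for name, link in cited_pages
        ],
    }
    return json.dumps(data, indent=2)


# Hypothetical page and source, for illustration only.
markup = build_article_jsonld(
    "How blood pressure monitors are validated",
    "https://example.com/guides/bp-monitors",
    [("Example validation study", "https://example.org/studies/bp-validation")],
)

# The JSON-LD goes into the page head inside a script tag.
print(f'<script type="application/ld+json">\n{markup}\n</script>')
```

Generating the markup from a list of sources keeps the cited URLs in one place, so the visible links and the structured data are less likely to drift apart.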

If you want to survive the zero-click era, you need to understand how these models ingest information. I wrote a full breakdown of this shift in our Zero-Click Survival Guide. It details the exact metrics we track now.

Keyword Intent vs. Citation Intent

Old SEO: Match the keyword in the H1.

New SEO: Answer the question so clearly that a model quotes you directly.

I looked at a client’s healthcare page. It had 2,000 words. It used the keyword "best blood pressure monitor" six times. It ranked #4.

The AI overview quoted a medical journal from 2018. Why?

Because the journal used an impersonal, third-person voice and definitive statements. "The study concluded that..."

My client’s article used the first person. "We recommend checking..."

LLMs tend to favor authoritative, detached tones in citation windows. They trust consensus over recommendation.

I rewrote the introduction. I removed the first-person perspective entirely. I added bullet points summarizing key findings before the deep dive. I linked to the original studies with anchor text matching the exact variable names used in academic papers.

Result? Three weeks later, the page appeared in the AI citation source list for three new queries. Traffic from the AI box was still zero. But referral traffic from the 'cited by' links increased by 14%.

This is a subtle shift. You are no longer competing for the click. You are competing for the attribution.

To execute this, you need tools that analyze the source texts AI models are pulling. Standard keyword planners are useless here. I compared the top contenders in my recent report on SEO Content Optimization Tools 2026. One tool specifically tracks 'citation frequency' across major LLM outputs. It changed how we audit content.
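Before reaching for a dedicated tool, the first-person cleanup described above can be roughed out in a few lines. This is a crude heuristic sketch, not the tool from the report: the pronoun list, the sentence splitter, and the function name `flag_first_person` are all assumptions, and it will miss edge cases such as abbreviations that contain periods.

```python
import re

# First-person pronouns to flag. Case is spelled out so "US" is not matched.
FIRST_PERSON = re.compile(r"\b(?:I|[Ww]e|[Mm]y|[Oo]ur|[Mm]e|[Uu]s)\b")


def flag_first_person(text):
    """Return the sentences in `text` that contain a first-person pronoun."""
    # Naive split: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if FIRST_PERSON.search(s)]


# Hypothetical draft copy, for illustration only.
draft = (
    "We recommend checking your cuff size. "
    "The study concluded that cuff size affects readings. "
    "Our team tested six monitors."
)

flagged = flag_first_person(draft)
for sentence in flagged:
    print("REWRITE:", sentence)
```

Here the first and third sentences are flagged for rewriting, while the detached "The study concluded that..." sentence passes.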

The Velocity Trap

We obsess over update frequency. We publish weekly. We refresh monthly.

LLMs don't care about your publishing schedule. They care about data stability.

I noticed that pages updated too frequently often lost their citation status. The model’s retrieval index hadn't re-ingested the new version yet. Or worse, the rapid changes caused semantic drift in the vector embeddings.

We slowed down. We stopped doing minor content tweaks. Instead, we focused on building 'cornerstone citations'.

These are massive, static resources. Not blogs. Not news. Think 'The Ultimate Dictionary of X' or 'The Historical Timeline of Y'.

We built a 5,000-word static resource on industry regulations. We didn't add new sections for six months. We only fixed broken links and updated statistical data.

During that time, we promoted it heavily in technical forums and academic repositories. We got it referenced on arXiv and GitHub. These are high-trust domains for LLM training data.

Once the model ingested that stable, high-authority version, it became a go-to source. Even when we updated it later, the citation weight remained high because the core structure was recognized.

Stability beats velocity. Always.

Infrastructure as a Trust Signal

You can have perfect content. If your server responds slowly, the crawler gives up. The LLM ingestion bot does the same.

But it’s not just speed. It’s structure.

I found a pattern in the failed citations. The key content on those pages only appeared after client-side JavaScript rendering. The LLM crawlers were reading the raw HTML. If your critical data is behind a JS wall, you are invisible to the synthetic audience.

We migrated our top 50 citation targets to a static HTML export. We kept the interactive elements for humans. But the raw text was available in the initial DOM load.
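A quick way to check for a JS wall is to parse the raw HTML without executing any scripts and confirm the critical phrases are already there. The sketch below uses only Python's standard-library `html.parser`. The function name `missing_from_raw_html` and the sample page are hypothetical, and in practice the raw HTML would come from fetching the URL without a headless browser.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def missing_from_raw_html(raw_html, required_phrases):
    """Return the phrases that do NOT appear in the un-rendered HTML text."""
    parser = VisibleText()
    parser.feed(raw_html)
    parser.close()
    # Collapse whitespace and compare case-insensitively.
    text = " ".join(" ".join(parser.parts).split()).lower()
    return [p for p in required_phrases if p.lower() not in text]


# Hypothetical page: the heading is in the HTML, the table is injected by JS.
raw = (
    "<html><body><h1>Industry regulations</h1><div id='app'></div>"
    "<script>document.getElementById('app').innerText = "
    "'Table of reporting deadlines';</script></body></html>"
)

missing = missing_from_raw_html(
    raw, ["Industry regulations", "Table of reporting deadlines"]
)
print(missing)
```

Anything this check reports as missing exists only after JavaScript runs, which makes it a candidate for moving into the initial HTML.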

We also tightened our internal linking. Previously, we had deep nesting. Three clicks to reach a pillar page. LLMs prioritize shallow hierarchies. They view depth as complexity. Complexity leads to ambiguity. Ambiguity leads to lower confidence scores in the model’s output.

We flattened the site map. We increased interlinking density between related topics. This helped the crawler map relationships faster. It helped the vector search group concepts tighter.
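Click depth is easy to measure once you have an internal link graph. A minimal sketch, assuming the graph is a dictionary mapping each URL path to the paths it links to. The paths and the function name `click_depths` are hypothetical, and a real graph would come from a site crawl.

```python
from collections import deque


def click_depths(links, home):
    """Breadth-first search: fewest clicks from `home` to each reachable page."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# Hypothetical nested structure: the pillar page sits three clicks deep.
nested = {
    "/": ["/blog", "/guides"],
    "/blog": ["/blog/2024"],
    "/blog/2024": ["/pillar"],
    "/guides": [],
}

# Flattened structure: the home page links to the pillar directly.
flattened = dict(nested)
flattened["/"] = ["/blog", "/guides", "/pillar"]

before = click_depths(nested, "/")["/pillar"]
after = click_depths(flattened, "/")["/pillar"]
print(f"pillar depth before: {before}, after: {after}")
```

Breadth-first search returns the shortest path, so a single link from the home page is enough to bring the pillar from depth 3 to depth 1.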

If you’re struggling with crawl budget issues or rendering blocks, check out our Core Web Vitals Fix. We detail the exact server configs that improved our ingestion rates by 40%.

The Agent Shift

This is where it gets dangerous.

Search is becoming an agent-driven experience. Users aren't searching anymore. They are instructing agents to solve problems.

"Find me a vendor who specializes in X, compares pricing, and checks reliability."

The agent will scrape multiple sources. It will synthesize a comparison. It will cite the sources it deems most relevant.

Your content needs to be 'agent-ready'. This means structured data isn't enough. You need machine-readable comparisons.

I implemented JSON-LD tables for pricing, feature availability, and ratings. Not just for products. For services too. We created standard schemas for 'Service Level Agreements' and 'Implementation Timelines'.

Agents parse tables better than paragraphs. They look for consistency across multiple sources. If your data aligns with other high-authority sources in a structured format, you become a trusted node in the agent's graph.
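A machine-readable comparison of this kind can be sketched as a schema.org `ItemList` of `Service` entries, each with an `Offer` and an `AggregateRating`. The plan names, prices, and ratings below are invented sample data, and `comparison_jsonld` is a hypothetical helper, not our production schema.

```python
import json


def comparison_jsonld(rows):
    """Build a schema.org ItemList comparing services by price and rating."""
    items = []
    for position, row in enumerate(rows, start=1):
        items.append({
            "@type": "ListItem",
            "position": position,
            "item": {
                "@type": "Service",
                "name": row["name"],
                "offers": {
                    "@type": "Offer",
                    # Prices are emitted as fixed two-decimal strings.
                    "price": f'{row["price"]:.2f}',
                    "priceCurrency": row["currency"],
                },
                "aggregateRating": {
                    "@type": "AggregateRating",
                    "ratingValue": row["rating"],
                    "reviewCount": row["reviews"],
                },
            },
        })
    return {
        "@context": "https://schema.org",
        "@type": "ItemList",
        "itemListElement": items,
    }


# Invented sample rows, for illustration only.
plans = [
    {"name": "Starter onboarding", "price": 19, "currency": "USD",
     "rating": 4.2, "reviews": 31},
    {"name": "Managed onboarding", "price": 49, "currency": "USD",
     "rating": 4.6, "reviews": 12},
]

doc = comparison_jsonld(plans)
print(json.dumps(doc, indent=2))
```

Every row has the same keys in the same order, which is the consistency that lets an agent line your data up against other sources.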

We ran an A/B test. Group A pages used traditional paragraph-based descriptions. Group B pages added structured JSON-LD tables alongside the text.

Group B was cited 3x more often in agent-generated reports. Not just in public search. In private B2B inquiry flows.

This requires a mindset shift. We are moving from pipeline automation to autonomous workflows. I documented my six-month experiment with this approach in Build Agents Not Pipelines. It’s not just about marketing. It’s about making your data consumable by machines.

The Human Element

There is a counter-trend emerging.

As AI content floods the web, models are beginning to detect and downweight low-effort, generic text. They are trained to identify 'human-like' variance.

This isn't about AI detectors. It's about semantic richness.

Generic text is predictable. High-quality human text has idiosyncrasies. Unique anecdotes. Specific, non-obvious data points.

I reviewed our highest-cited pages. They weren't the most optimized. They were the most specific.

One page cited a local case study from 2019. No other source had it. The model couldn't verify it against other sites. So it relied on its own training data, which pointed to our site as the primary source for that specific data point.

Unique data wins. Repackaged information loses.

Stop aggregating. Start originating. If you can produce proprietary data, surveys, or unique case studies, you create a citation moat that AI cannot cross. It can read your data, but it cannot replicate the collection process.

Final Checks

1. Audit your top 20 pages for citation potential. Are they referenced in AI overviews? If not, why?

2. Strip narrative fluff. Add definitive, standalone statements.

3. Implement structured data for comparisons and lists. Tables > Paragraphs.

4. Flatten your site architecture. Reduce click depth for cornerstone content.

5. Verify your raw HTML contains all critical text. No JS walls.

6. Create unique, proprietary data sets. Own the source.

The goal isn't to beat the AI. It's to become part of its brain. Make yourself indispensable to the synthesis process. If you do, the clicks will follow. Or they won't. And that’s okay. Attribution is the new currency.
