Prevent Search Engines from Caching Your Website

To stop Google and Bing from showing the "Cached" version of your page, add the `<meta name="robots" content="noarchive">` tag to your HTML `head`.
To stop the Wayback Machine (Internet Archive), this tag is not enough. You must explicitly block the `ia_archiver` bot in your `robots.txt` file.

Most webmasters confuse "Indexing" (showing up in search) with "Caching" (storing a backup copy).
If you use `noindex`, you disappear from Google entirely.
If you use `noarchive`, you stay in the search results, but engines that show a "Cached" link (such as Bing or Yandex) stop offering it for your page, and snippet generation based on stored copies may be limited.
2026 Update: Google removed the public "Cached" link in 2024, but it still stores copies of pages for internal processing and AI snippets. Bing kept a public cache link after Google dropped its own, so check whether it still appears for your pages. Apply `noarchive` for both engines if you do not want a stored copy shown to users.
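
The difference between the two directives can be sketched in a few lines of Python. This is only an illustration of the distinction described above, not how any search engine implements it; the function name `robots_effects` and its result keys are made up for this example.

```python
def robots_effects(content):
    """Summarize what a robots meta `content` value asks search engines to do.

    Only the two directives discussed here are interpreted; anything else
    in the value is ignored by this sketch.
    """
    directives = {part.strip().lower() for part in content.split(",")}
    return {
        "appears_in_search": "noindex" not in directives,
        "cached_copy_allowed": "noarchive" not in directives,
    }


# noarchive: still listed, but no stored copy is offered.
print(robots_effects("noarchive"))
# noindex: dropped from the results altogether.
print(robots_effects("noindex"))
```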

Checklist
  • Access: You need the ability to edit the `head` of your site or the `robots.txt` file.
  • Scope: Decide if you want to block caching for the whole site or just specific pages (e.g., paywalled content).
  • The Hidden Requirement: The "Internet Archive" Lag. Blocking the Wayback Machine via `robots.txt` prevents future snapshots. It does not automatically delete past snapshots. To have historical snapshots removed, you must email `[email protected]` and list the URLs concerned.

Step-by-Step Guide

  • Step 1: The Meta Tag (Google/Bing)
    Add this line inside the `head` section of every page you want to protect. This works for Google, Bing, and Yahoo.
    <meta name="robots" content="noarchive">
    If you also want to stop AI from using your content for snippets, use:
    <meta name="robots" content="noarchive, max-snippet:0">
  • Step 2: The Wayback Machine Block (robots.txt)
    The Wayback Machine has historically honored `robots.txt` rules addressed to the `ia_archiver` user agent. It often ignores standard meta tags, so the block belongs in `robots.txt`.
    Open your `robots.txt` file (usually at `yourdomain.com/robots.txt`) and add:
    User-agent: ia_archiver
    Disallow: /
  • Step 3: Server-Side Block (Optional)
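    If you cannot edit the HTML (e.g., PDF files or images), send the instruction as an HTTP response header, set from your `.htaccess` or Nginx config. The header the crawler must receive is:
    X-Robots-Tag: noarchive
    In Apache this is typically set with `Header set X-Robots-Tag "noarchive"` (requires `mod_headers`); in Nginx, with `add_header X-Robots-Tag "noarchive";`.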
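
To confirm that a PDF or image really arrives with the header from Step 3, you can inspect the response yourself. Below is a minimal Python sketch using only the standard library. The function names are invented for this example, and the parser is deliberately simple: it splits on commas and does not interpret per-bot prefixes such as `googlebot: noarchive`.

```python
from urllib.request import Request, urlopen


def x_robots_directives(header_values):
    """Split X-Robots-Tag header values into a set of lowercase directives."""
    directives = set()
    for value in header_values:
        for part in value.split(","):
            part = part.strip().lower()
            if part:
                directives.add(part)
    return directives


def fetch_x_robots_directives(url):
    """Send a HEAD request and return the directives the server reports."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=10) as response:
        # A server may send the header more than once; collect every value.
        values = response.headers.get_all("X-Robots-Tag") or []
    return x_robots_directives(values)
```

For example, `"noarchive" in fetch_x_robots_directives("https://yourdomain.com/report.pdf")` should be true once the server config is live (the file path here is a placeholder).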

How It Works & Hidden Details

The "Snippet" Connection:
When a search engine stores a copy of your page, it can draw on that copy to generate the text snippet below your link in search results.
If you enforce `noarchive`, you are forcing the search engine to rely on live data or minimal data. This is why `noarchive` is often used by news sites with paywalls: it ensures users cannot read the content via the "Cache" or "Text Only" version without paying.

Google's "Cache" Removal (2024):
Since Google removed the user-facing "Cached" button, many SEOs think `noarchive` is useless. This is false. `noarchive` remains the signal that tells a search engine not to keep and offer a stored snapshot of the page. This matters if you do not want your content resurrected from a stored copy when your site goes offline temporarily.

Things to Watch Out For
  • Risk 1: The `noindex` Accident. Do not type `noindex` when you mean `noarchive`. `noindex` kills your traffic. `noarchive` just kills the backup copy.
  • Risk 2: SEO Tools. If you block `ia_archiver`, you also block some SEO tools (like Alexa legacy data) that relied on that bot. Alexa's service was retired in 2022, so this is generally fine in 2026.
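
One way to guard against the `noindex` accident is to scan a page's HTML before you publish it. The following is a rough Python sketch built on the standard library's `html.parser`. The class and function names are made up for this example, and it only looks at `<meta name="robots">` tags, not at bot-specific tags or HTTP headers.

```python
from html.parser import HTMLParser


class RobotsMetaAudit(HTMLParser):
    """Collect directives from every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attributes without a value arrive as None; treat them as empty.
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").strip().lower() != "robots":
            return
        for part in attributes.get("content", "").split(","):
            if part.strip():
                self.directives.add(part.strip().lower())


def audit_page(html):
    """Return warnings for the two mistakes described in Risk 1."""
    parser = RobotsMetaAudit()
    parser.feed(html)
    parser.close()
    warnings = []
    if "noindex" in parser.directives:
        warnings.append("noindex found: this page would drop out of search results")
    if "noarchive" not in parser.directives:
        warnings.append("noarchive missing: cached copies are still allowed")
    return warnings
```

An empty list from `audit_page(...)` means the page carries `noarchive` and no `noindex`.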

Frequently Asked Questions
  • Q: Will this stop AI bots (ChatGPT/Gemini)?
    A: No. `noarchive` is not enough to stop AI training. Add a `User-agent: GPTBot` group (and one for each other AI crawler you want to block) to your `robots.txt`, each with `Disallow: /`.
  • Q: Can I remove a specific page from Google Cache immediately?
    A: Yes. Use the Google Search Console > Removals tool. Select "Clear Cached URL." This wipes the current cache, and your `noarchive` tag prevents it from coming back.
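
If you maintain blocks for several crawlers, generating the `robots.txt` groups and checking them offline can help avoid typos. Here is a small Python sketch using the standard library's `urllib.robotparser`. The helper names and the `example.com` URL are placeholders, and the user-agent tokens shown are only the ones named in this post; look up the current token for any other crawler in its vendor's documentation.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(blocked_agents):
    """Return robots.txt text with one 'Disallow: /' group per user agent."""
    groups = [f"User-agent: {agent}\nDisallow: /" for agent in blocked_agents]
    return "\n\n".join(groups) + "\n"


def is_blocked(robots_txt, agent, url="https://example.com/"):
    """Check, without any network access, whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


rules = build_robots_txt(["ia_archiver", "GPTBot", "Google-Extended"])
print(rules)
```

With these rules, `is_blocked(rules, "GPTBot")` is true, while a crawler that is not listed, such as `Googlebot`, remains allowed. Keep in mind that `robots.txt` is a request: it only works for crawlers that choose to honor it.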

Update: Additional Details & Recent Changes

  • Stopping Google AI Overviews (SGE):
    The `noarchive` tag does not stop Google from summarizing your content in its new "AI Overviews" (formerly SGE) at the top of search results. To prevent your text from being used in these generative answers, you must use the `nosnippet` tag or the `data-nosnippet` HTML attribute on specific paragraphs.
    The distinction: `Google-Extended` (in robots.txt) stops AI training (Gemini models), but `nosnippet` stops AI display in Search.
  • The "Retroactive" Policy Shift:
    Historically, the Wayback Machine would automatically delete past snapshots if it detected a new `robots.txt` block. As of recent policy changes (to prevent censorship via domain parking), this "retroactive wipe" is no longer automatic. You must email `[email protected]` with your URL and mention "Oakland Archive Policy" to get historical data scrubbed; the `robots.txt` block generally only applies to future crawls.
  • Bing's "Cached" Button Lives On:
    Google killed the public "Cached" link in 2024, while Bing kept its own for longer (accessible via the small arrow/dots next to the URL). Where Bing still shows the button, the `noarchive` tag is the way to remove it. Without it, users may still be able to view your old/paywalled content via Bing Cache even if Google is clean.

Quote:
<meta name="robots" content="noarchive">
If you also want to stop AI from using your content for snippets, use:
<meta name="robots" content="noarchive, max-snippet:0">
Update: For AI Overviews in 2026, the directive to rely on is `nosnippet`. Google documents `max-snippet:0` as equivalent to `nosnippet`, so the quoted tag should have the same effect; writing `nosnippet` simply states the intent explicitly.
