Have you noticed the Cached/Cached Page link on the search result pages of Google and Bing when you look up your site’s indexed pages in these search engines? Search engines generally take a snapshot of almost every accessible page of your website and store it in their cache as a backup copy.
When you click the Cached link for a particular page, you are taken to the version of the page that the search engine has stored in its cache, instead of the current version.
This feature of various search engines can be considered useful when the original webpage is inaccessible due to any of the following reasons:
- Slow website due to too much traffic, network congestion, host/data center issues etc.
- Website is completely down.
- The webmaster has completely removed the page from his/her website.
However, if you (the webmaster) don’t want search engines to cache your website’s pages, you can ask them not to through your site’s robots.txt file:
1. Open your site’s robots.txt file, which is generally present in the root directory of your domain, using your host’s control panel (like File Manager in cPanel) or an FTP client (like FileZilla or WinSCP), and enter the following two lines in it:
User-Agent: *
NoArchive: /
Note: If the User-Agent: * line is already present in your site’s robots.txt file, you only need to enter NoArchive: / below it.
2. Once you have entered the above two lines, save the robots.txt file. This directs all search engines not to cache your site anymore.
- Search engines may take 4 to 8 weeks to completely remove cached copies of your webpages from their index. However, I have personally noticed that Bing takes much longer (more than 8 weeks) to remove cached pages from its index.
- NoArchive is not a standard robots.txt rule, so it is entirely up to each search engine or service whether to honor your “don’t cache” directive. There are some nasty services on the web which don’t listen to anything!
- After all your site’s cached pages have been removed from search engines, your visitors won’t be able to access them anymore and will have to visit the live pages instead. This may result in increased advertising revenue and page views. Now it’s your moral responsibility to keep your website always up!
- The above directive will disable the Text Only version of your site as well. Also, your visitors won’t be able to see the source code of your site through webcache.googleusercontent.com.