A robots.txt file tells search engine bots (Google, Bing, Yandex, etc.) not to crawl URLs and page content that you do not want included in search results or that may cause duplicate-content issues.



If you are using the Pretty URLs mod with SMF, you will almost certainly want to upload a robots.txt file to the root directory of your domain. You can include the following lines in it:

User-agent: *
Disallow: /*?*
Disallow: /*msg*
Disallow: /profile/
Disallow: /help/
Disallow: /search/
Disallow: /search2/
Disallow: /activate/
Disallow: /stats/
Disallow: /admin/
Disallow: /trackip/
Disallow: /Themes/
Disallow: /login/
Disallow: /register/
Disallow: /pm/
Disallow: /logout/
Disallow: /reminder/

User-agent: Googlebot-Image
Disallow: /

User-agent: YandexImages
Disallow: /

User-agent: msnbot-media
Disallow: /

User-agent: MSNBOT_Mobile
Allow: /*wap
Disallow: /

User-agent: Googlebot-Mobile
Allow: /*wap
Disallow: /

User-agent: MediaPartners-Google
Allow: /

Important Notes:

1. Once you upload a robots.txt file containing the lines above, compliant bots will no longer crawl the following sets of URLs:

  • /?sort=<any command here>, such as /?sort=starter, /?sort=replies, /?sort=views and /?sort=last_post
  • Any URL with PHPSESSID in it, e.g. /?PHPSESSID
  • /?action=<any command here>, such as /?action=printpage, /?action=notify, /?action=reporttm, etc.
  • Any URL containing msg, e.g. /msg5/
  • ?type=rss, which keeps your forum’s RSS feed from being indexed.
  • Anything related to the profiles of administrators, moderators and other forum members.
  • The whole admin area, i.e. /admin/
  • Help, Search (including Advanced Search), Stats, Track IP and Reminder pages.
  • Login, Logout, Register and Activate pages.
  • Anything related to the Private Messaging page.
  • The Themes directory.
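To see why the two wildcard rules cover so many of the URLs listed above, here is a minimal Python sketch of wildcard matching as the major crawlers document it: `*` matches any run of characters and a trailing `$` anchors the end of the path. The helper names (`rule_to_regex`, `is_blocked`) and the sample paths are made up for illustration and are not part of SMF or of any crawler.

```python
import re

def rule_to_regex(rule):
    """Translate a robots.txt path rule into a regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    Everything else is matched literally, starting at the first character.
    """
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))

# A subset of the Disallow rules from the "User-agent: *" group.
DISALLOW = ["/*?*", "/*msg*", "/profile/", "/admin/"]

def is_blocked(path):
    """Return True if any Disallow rule matches the start of the path."""
    return any(rule_to_regex(rule).match(path) for rule in DISALLOW)

# Hypothetical forum paths, for illustration only.
for path in ["/?sort=views",
             "/index.php?action=printpage",
             "/general/welcome/msg5/",
             "/general/welcome/",
             "/general/msgbox-help/"]:
    print(path, "->", "blocked" if is_blocked(path) else "crawlable")
```

Note the last sample path: because `/*msg*` matches "msg" anywhere in the URL, a topic whose pretty URL happens to contain those letters is blocked as well.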

2. Image bots are disallowed from accessing your forum entirely, which can help you save bandwidth. If you want your images to be crawled, remove the following lines:

User-agent: Googlebot-Image
Disallow: /

User-agent: YandexImages
Disallow: /

User-agent: msnbot-media
Disallow: /
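If you want to confirm that a per-bot group behaves this way, Python's standard `urllib.robotparser` can parse a robots.txt and answer "may this user agent fetch this URL?". The sketch below uses a trimmed-down file and a placeholder domain (`forum.example`), both assumptions for illustration. It deliberately leaves out the wildcard rules, because this parser does plain prefix matching and does not interpret `*` inside paths.

```python
from urllib.robotparser import RobotFileParser

# Trimmed-down robots.txt: one general group and one image-bot group.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

User-agent: Googlebot-Image
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Placeholder URL; substitute a real page from your own forum.
url = "https://forum.example/some-board/some-topic/"

image_bot_ok = parser.can_fetch("Googlebot-Image", url)  # own group: Disallow: /
web_bot_ok = parser.can_fetch("Googlebot", url)          # falls back to the * group
admin_ok = parser.can_fetch("Googlebot", "https://forum.example/admin/")

print("Googlebot-Image:", image_bot_ok)
print("Googlebot:", web_bot_ok)
print("Googlebot on /admin/:", admin_ok)
```

Removing the `Googlebot-Image` group from the file makes that bot fall back to the `*` group, like any other crawler.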

3. Mobile crawlers are also completely disallowed from accessing your forum through URLs meant for desktop browsers, but they can still fully access the WAP section of your forum. Disallowing mobile bots this way is useful because otherwise the same content would be crawled at two different URLs.
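The mobile groups combine `Allow: /*wap` with `Disallow: /`, so the outcome depends on which rule wins when both match. A rough Python sketch of the "longest matching rule wins, Allow wins ties" behaviour described in the Robots Exclusion Protocol follows. The function names and the sample WAP URL are assumptions for illustration; check what your forum's WAP links actually look like.

```python
import re

def matches(rule, path):
    """Wildcard match: '*' is any run of characters, trailing '$' anchors the end."""
    anchored = rule.endswith("$")
    core = rule[:-1] if anchored else rule
    pattern = ".*".join(re.escape(part) for part in core.split("*"))
    return re.match(pattern + ("$" if anchored else ""), path) is not None

# Rules from the Googlebot-Mobile / MSNBOT_Mobile groups.
MOBILE_GROUP = [("allow", "/*wap"), ("disallow", "/")]

def allowed(path, rules=MOBILE_GROUP):
    """Pick the longest matching rule; on equal length, Allow beats Disallow."""
    hits = [(len(rule), kind == "allow")
            for kind, rule in rules if matches(rule, path)]
    if not hits:
        return True  # no rule matches, so crawling is permitted
    return max(hits)[1]

# Hypothetical paths, for illustration only.
print(allowed("/index.php?wap2"))     # both rules match; the longer Allow wins
print(allowed("/general/welcome/"))   # only "Disallow: /" matches
```

Crawlers that apply rules in file order instead of by length reach the same result here, because the Allow line is listed before the Disallow line.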

4. MediaPartners-Google, the AdSense bot, can fully access your forum. Because it has its own group, the Disallow rules listed under User-agent: * do not apply to it.

5. If you want to allow or disallow any other bot, configure your robots.txt file accordingly using these guidelines. Note that this guide applies only to well-behaved bots; rogue bots simply ignore robots.txt.