How to write robots.txt to control search engine spiders


What is a Web Robot?
A robot is a program that automatically traverses the Web’s hypertext structure by retrieving a document, and recursively retrieving all documents that are referenced. (From: robotstxt.org)
A web robot is also sometimes called a web crawler, web spider, or web wanderer.
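
To make the definition above concrete, here is a minimal sketch of the traversal a robot performs. It does not fetch anything from the network: PAGES is a made-up, in-memory site where each page maps to the pages it links to, and crawl is a hypothetical helper name. It uses a queue instead of literal recursion, but the idea is the same: retrieve a document, then every document it references.

```python
from collections import deque

# Hypothetical in-memory "web": each page maps to the pages it links to.
PAGES = {
    "/": ["/about", "/blog"],
    "/about": ["/"],
    "/blog": ["/blog/post-1", "/about"],
    "/blog/post-1": [],
}


def crawl(start, pages):
    """Return every page reachable from start, in the order it was discovered."""
    seen = [start]
    queue = deque([start])
    while queue:
        page = queue.popleft()
        # "Retrieve" the document and follow each link it references.
        for link in pages.get(page, []):
            if link not in seen:
                seen.append(link)
                queue.append(link)
    return seen


print(crawl("/", PAGES))
```

A real robot would download each page over HTTP and extract the links from the HTML, which this sketch leaves out.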

What do robots do?
Once a robot has scanned your site, the site will probably be indexed by the search engine. Most of the time, these robots are programs written by search engines such as Google, Yahoo, Alexa, MSN, etc.

What is the use of robots.txt?
robots.txt is a simple text file used to control how search engine spiders or crawlers go through your site and which spiders are not allowed to visit it. The file must be named robots.txt (plural); crawlers do not look for a file named robot.txt.

Example of a robots.txt

User-agent: Titan
Disallow: /

User-agent: EmailCollector
Disallow: /

User-agent: EmailSiphon
Disallow: /

User-agent: EmailWolf
Disallow: /

User-agent: ExtractorPro
Disallow: /

User-agent: *
Disallow:
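
If you want to check how rules like these are interpreted, one option is Python's standard urllib.robotparser module. The sketch below feeds it a shortened copy of the example (two of the blocked agents plus the catch-all entry) and asks whether a given robot may fetch a page. The page URL and the "Googlebot" agent name are only illustrative assumptions.

```python
from urllib.robotparser import RobotFileParser

# Shortened copy of the example rules above.
RULES = """\
User-agent: Titan
Disallow: /

User-agent: EmailCollector
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# Illustrative page URL; any path on the site behaves the same way here.
page = "http://www.yourdomain.com/page.html"

# A robot named in a "Disallow: /" entry is blocked from the whole site.
print(parser.can_fetch("EmailCollector", page))

# Any other robot falls through to "User-agent: *", whose empty
# Disallow line means nothing is off limits.
print(parser.can_fetch("Googlebot", page))
```

Keep in mind that this only shows what a well-behaved robot would do; the file itself cannot force a robot to obey it.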

Where should I place my robots.txt?
Place it in the root directory of your site, so that it is reachable at http://www.yourdomain.com/robots.txt
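
Since robots look for the file at the root of the host, the location can be derived from any page address on the same site. Here is a small sketch of that, using Python's standard urllib.parse module; robots_url is a hypothetical helper name and the page address is made up.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the robots.txt address for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("http://www.yourdomain.com/blog/post.html?page=2"))
```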

What should I write in robots.txt to prevent robots from scanning my site?

User-agent: *
Disallow: /
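
As a quick sanity check, the same two lines can be run through Python's standard urllib.robotparser module. This is only a sketch: the agent names and page URLs are arbitrary examples, and it shows how a compliant robot would read the file, not a guarantee that every robot complies.

```python
from urllib.robotparser import RobotFileParser

# The "block everything" rules shown above.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

blocker = RobotFileParser()
blocker.parse(BLOCK_ALL.splitlines())

# Arbitrary example robots and pages: every combination is refused.
for agent in ("Googlebot", "EmailCollector", "SomeOtherBot"):
    for url in ("http://www.yourdomain.com/", "http://www.yourdomain.com/a/b.html"):
        print(agent, url, blocker.can_fetch(agent, url))
```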


Responses to “How to write robots.txt to control search engine spiders”

  1. Ozymandias says:

    How do I make spiders list my website?

  2. Brad says:

    How do I prevent my admin controls from being spidered?
