How to write robots.txt to control search engine spiders
What is a Web Robot?
A robot is a program that automatically traverses the Web’s hypertext structure by retrieving a document, and recursively retrieving all documents that are referenced. (From: robotstxt.org)
Web robots are sometimes also called web crawlers, web spiders, or web wanderers.
What do robots do?
Once your site has been scanned by a robot, it will probably get indexed by the search engine. Most of the time, these robots are programs written by search engine companies such as Google, Yahoo, Alexa, MSN, etc.
What is the use of robots.txt?
robots.txt (plural; often misspelled robot.txt, but robots only request the plural name) is just a simple text file that tells search engine spiders or crawlers how they should go through your site and which spiders are not allowed to visit it. Well-behaved robots follow these rules, but the file is only a request and cannot force a robot to obey.
Example of a robots.txt
User-agent: Titan
Disallow: /

User-agent: EmailCollector
Disallow: /

User-agent: EmailSiphon
Disallow: /

User-agent: EmailWolf
Disallow: /

User-agent: ExtractorPro
Disallow: /

User-agent: *
Disallow:
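If you want to check how a rule-following robot would read rules like the example above, here is a minimal sketch using Python's standard urllib.robotparser module. The helper name is_allowed, the page URL, and the "Googlebot" agent string are only assumptions for illustration; real robots may interpret edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# The example rules, one directive per line; a blank line separates records.
EXAMPLE_RULES = """\
User-agent: Titan
Disallow: /

User-agent: EmailCollector
Disallow: /

User-agent: EmailSiphon
Disallow: /

User-agent: EmailWolf
Disallow: /

User-agent: ExtractorPro
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(rules: str, user_agent: str, url: str) -> bool:
    """Return True if `rules` let `user_agent` fetch `url` (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical page URL, used only for illustration.
PAGE = "http://www.yourdomain.com/index.html"

print(is_allowed(EXAMPLE_RULES, "EmailCollector", PAGE))  # False
print(is_allowed(EXAMPLE_RULES, "Googlebot", PAGE))       # True
```

The named robots are blocked from the whole site, while the last record (User-agent: * with an empty Disallow:) lets every other robot in.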
Where should I place my robots.txt?
Just place it in the root directory of your site, so that it is reachable at http://www.yourdomain.com/robots.txt
What should I write in robots.txt to prevent robots from scanning my site?
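Robots look for the file at the root of the host, no matter which page they start from. A small sketch of that lookup, assuming a hypothetical helper called robots_url and a made-up page address:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Build the robots.txt address for the site that hosts `page_url`."""
    parts = urlsplit(page_url)
    # Keep only the scheme and host; drop the page path, query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical page URL, used only for illustration.
print(robots_url("http://www.yourdomain.com/blog/post.html?page=2"))
# http://www.yourdomain.com/robots.txt
```

This is why a robots.txt placed in a subfolder is not picked up.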
User-agent: *
Disallow: /
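The only difference between blocking everything and allowing everything is the single slash after Disallow:, so it is easy to get wrong. Here is a rough sketch that compares the two with Python's standard urllib.robotparser; the helper name can_fetch, the agent name "AnyBot", and the path are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser


def can_fetch(rules: str, user_agent: str, path: str) -> bool:
    """Return True if `rules` let `user_agent` fetch `path` (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, path)


BLOCK_ALL = "User-agent: *\nDisallow: /\n"  # slash: the whole site is off limits
ALLOW_ALL = "User-agent: *\nDisallow:\n"    # empty value: nothing is off limits

# "AnyBot" and the path are made-up values for illustration.
print(can_fetch(BLOCK_ALL, "AnyBot", "/any/page.html"))  # False
print(can_fetch(ALLOW_ALL, "AnyBot", "/any/page.html"))  # True
```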
Posted at September 10th, 2006 by chua








