<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Techie Corner &#187; web crawler</title>
	<atom:link href="http://www.techiecorner.com/tag/web-crawler/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.techiecorner.com</link>
	<description>The place for computer tips and tricks! microsoft windows, open source, database, programming, freeware and etc</description>
	<lastBuildDate>Wed, 08 Sep 2010 01:39:25 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>How to write robots.txt to control search engine spiders</title>
		<link>http://www.techiecorner.com/18/write-robot-txt-to-control-search-engine-spider/</link>
		<comments>http://www.techiecorner.com/18/write-robot-txt-to-control-search-engine-spider/#comments</comments>
		<pubDate>Sat, 09 Sep 2006 16:53:28 +0000</pubDate>
		<dc:creator>chua</dc:creator>
				<category><![CDATA[SEO]]></category>
		<category><![CDATA[Web]]></category>
		<category><![CDATA[crawler]]></category>
		<category><![CDATA[robot.txt]]></category>
		<category><![CDATA[robots.txt]]></category>
		<category><![CDATA[search engine]]></category>
		<category><![CDATA[search engine crawler]]></category>
		<category><![CDATA[search engine optimization]]></category>
		<category><![CDATA[search engine spider]]></category>
		<category><![CDATA[spider]]></category>
		<category><![CDATA[web crawler]]></category>
		<category><![CDATA[web spider]]></category>

		<guid isPermaLink="false">http://www.techiecorner.com/18/how-to-write-robottxt-to-control-search-engine-spider/</guid>
		<description><![CDATA[What is a web robot? A robot is a program that automatically traverses the Web&#8217;s hypertext structure by retrieving a document, and recursively retrieving all documents that are referenced. (From: robotstxt.org) Web robots are also called web crawlers, web spiders, or web wanderers. What do robots do? Once a robot has scanned your site, your site will [...]]]></description>
			<content:encoded><![CDATA[<p><strong>What is a web robot?</strong><br />
A robot is a program that automatically traverses the Web&#8217;s hypertext structure by retrieving a document, and recursively retrieving all documents that are referenced. (From: <a href="http://www.robotstxt.org/wc/faq.html#what">robotstxt.org</a>)<br />
Web robots are also called web crawlers, web spiders, or web wanderers.<br />
<span id="more-18"></span><br />
<strong>What do robots do?</strong><br />
Once a robot has scanned your site, your site will probably be indexed by a search engine. Most of these robots are programs written by search engine companies such as Google, Yahoo, Alexa, and MSN.</p>
<p><strong>What is robots.txt used for?</strong><br />
robots.txt is a simple text file that tells search engine spiders or crawlers how they should go through your site and which spiders are not allowed to visit it. The file must be named robots.txt (plural); robots do not look for a file named robot.txt.</p>
<p><strong>Example of a robots.txt file</strong></p>
<blockquote><p>
User-agent: Titan<br />
Disallow: /</p>
<p>User-agent: EmailCollector<br />
Disallow: /</p>
<p>User-agent: EmailSiphon<br />
Disallow: /</p>
<p>User-agent: EmailWolf<br />
Disallow: /</p>
<p>User-agent: ExtractorPro<br />
Disallow: / </p>
<p>User-agent: *<br />
Disallow:
</p></blockquote>
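<p>To see how a well-behaved robot would read the example above, here is a minimal sketch using Python's standard <code>urllib.robotparser</code> module. The helper name <code>can_fetch</code>, the page URL, and the choice of Googlebot as a sample visitor are assumptions made for illustration only.</p>

```python
from urllib.robotparser import RobotFileParser

# The example rules from above, as they would appear in the plain-text file.
RULES = """\
User-agent: Titan
Disallow: /

User-agent: EmailCollector
Disallow: /

User-agent: EmailSiphon
Disallow: /

User-agent: EmailWolf
Disallow: /

User-agent: ExtractorPro
Disallow: /

User-agent: *
Disallow:
"""


def can_fetch(agent, url, rules=RULES):
    """Return True if the rules allow the named robot to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Hypothetical page on the placeholder domain used in this post.
    page = "http://www.yourdomain.com/index.html"
    for agent in ("Titan", "EmailCollector", "Googlebot"):
        print(agent, can_fetch(agent, page))
```

<p>The five named robots are refused everything, while any other robot falls through to the <code>User-agent: *</code> record, where the empty <code>Disallow:</code> line means nothing is blocked.</p>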
<p><strong>Where should I place my robots.txt?</strong><br />
Place it in the root directory of your site so that it is served at http://www.yourdomain.com/robots.txt</p>
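<p>Because the file always lives at the root of the site, its address can be worked out from the address of any page. The following is a small sketch of that idea; <code>robots_txt_url</code> is a hypothetical helper name and the page URL is made up.</p>

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the robots.txt location for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host (with port, if any); the path is always /robots.txt.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    # Hypothetical page URL on the placeholder domain.
    print(robots_txt_url("http://www.yourdomain.com/blog/2006/09/post.html?page=2"))
```

<p>Whatever page you start from, the directory path and query string are dropped and only the scheme and host are kept.</p>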
<p><strong>What should I write in robots.txt to keep all robots from scanning my site?</strong></p>
<blockquote><p>
User-agent: *<br />
Disallow: /
</p></blockquote>
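<p>If you would rather generate the file than type it by hand, a rough sketch could look like the code below. <code>build_robots_txt</code> and its parameters are hypothetical names chosen for this example, not part of any library.</p>

```python
def build_robots_txt(blocked_agents=(), block_everyone=False):
    """Build robots.txt text that blocks the listed robots from the whole site."""
    # One record per blocked robot: "Disallow: /" covers every path.
    records = [f"User-agent: {agent}\nDisallow: /" for agent in blocked_agents]
    # Catch-all record: "Disallow: /" blocks everything, an empty Disallow blocks nothing.
    records.append("User-agent: *\nDisallow:" + (" /" if block_everyone else ""))
    # Records are separated by a blank line.
    return "\n\n".join(records) + "\n"


if __name__ == "__main__":
    # Reproduces the two-line block-everything file shown above.
    print(build_robots_txt(block_everyone=True), end="")
```

<p>Calling it with a list of robot names and <code>block_everyone=False</code> instead produces a file in the style of the longer example: the named robots are blocked and everyone else is allowed.</p>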
]]></content:encoded>
			<wfw:commentRss>http://www.techiecorner.com/18/write-robot-txt-to-control-search-engine-spider/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>
