Keeping the engines away with robots.txt

When you don’t want search engines to crawl certain pages, there are things you can do to stop them. A robots.txt file is the easiest way to do this. Following it is voluntary, so not every crawler honors it, but the major search engines do.
Robots.txt format
The concept is simple. You create a file named “robots.txt” in the root directory of your site, as in this example:
http://www.clixnetwork.com/robots.txt
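Because the file has to sit at the root, a crawler can find it from any page on your site by swapping the page's path for /robots.txt. Here is a minimal Python sketch of that lookup using the standard library's urljoin. The helper name robots_url and the sample page path are hypothetical:

```python
from urllib.parse import urljoin


def robots_url(page_url: str) -> str:
    """Return the robots.txt location for the host serving page_url (hypothetical helper)."""
    # An absolute path ("/robots.txt") replaces the page's path, query, and fragment,
    # keeping only the scheme and host.
    return urljoin(page_url, "/robots.txt")


print(robots_url("http://www.clixnetwork.com/some/dir/page.html"))
# http://www.clixnetwork.com/robots.txt
```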
The format of the file is simple, too. Each line holds one field in the format below. A User-Agent line followed by its Disallow lines forms a record, and records are separated by blank lines. White space is allowed before, after, and between the field and value.
"field: value"
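To make the field: value layout concrete, here is a minimal Python sketch that splits one line into its field and value. The function name parse_line is hypothetical, and the sketch assumes the line contains no comment. It splits at the first colon, trims the optional white space, and lowercases the field name because field names are case-insensitive:

```python
from typing import Optional, Tuple


def parse_line(line: str) -> Optional[Tuple[str, str]]:
    """Split a "field: value" line into (field, value); hypothetical helper."""
    # Split at the first colon only, so a value containing ':' stays intact.
    field, sep, value = line.partition(":")
    if not sep:
        return None  # no colon: not a "field: value" line
    # White space is allowed around the field and the value, so trim both.
    return field.strip().lower(), value.strip()


print(parse_line("  User-Agent :   *  "))  # ('user-agent', '*')
print(parse_line("Disallow: /bob/"))       # ('disallow', '/bob/')
```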
Comments
Comments are allowed in the file. Each comment begins with a # symbol. You can put a comment at the end of a line, after a command, but this is not recommended because not all crawlers interpret it correctly.
Disallow: /article32.html #the article about robots.txt
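Python's standard-library urllib.robotparser is one parser that handles trailing comments. It drops everything after a # before it reads the line. The sketch below gives it a Disallow line with a trailing comment under a User-Agent: * record. The path carries the leading slash a Disallow value needs, because without it the rule would not match the URL path. The host www.example.com and the second page name are placeholders. Other crawlers may not handle this the same way, so the advice above still applies:

```python
from urllib.robotparser import RobotFileParser

# A Disallow line with a trailing comment.
rules = [
    "User-Agent: *",
    "Disallow: /article32.html #the article about robots.txt",
]

parser = RobotFileParser()
parser.parse(rules)  # parse the lines directly instead of fetching a URL

# The comment is ignored, so only /article32.html is blocked.
blocked = not parser.can_fetch("AnyBot", "http://www.example.com/article32.html")
other_ok = parser.can_fetch("AnyBot", "http://www.example.com/article33.html")
print(blocked, other_ok)  # True True
```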
User-Agent
This field says which robot the Disallow fields that follow apply to. You can use a specific engine’s robot name, or use * to tell all robots to listen up.
User-Agent: *
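This sketch shows how a parser applies records to robot names, using Python's standard-library urllib.robotparser. The robots.txt content is hypothetical. One record names a specific robot (Googlebot, used here only as an example name), and a second record uses * to cover every other robot. Real search engine crawlers use their own parsers, which may differ in the details:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one record for a named robot, one catch-all record.
rules = [
    "User-Agent: Googlebot",
    "Disallow: /private/",
    "",
    "User-Agent: *",
    "Disallow: /",
]

parser = RobotFileParser()
parser.parse(rules)

# Googlebot matches its own record; any other robot falls through to the * record.
googlebot_ok = parser.can_fetch("Googlebot", "http://www.example.com/index.html")
other_ok = parser.can_fetch("SomeOtherBot", "http://www.example.com/index.html")
print(googlebot_ok, other_ok)  # True False
```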
Disallow
This field is what actually tells the crawlers what not to crawl. Each value starts with a / and is followed by the beginning characters of a path, a directory, or a filename.
Disallow: /
The above command will tell the engines not to crawl anything on your site.
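Here is a quick check of that rule with Python's standard-library urllib.robotparser. The host and page paths are placeholders. Every URL path starts with /, so every page matches:

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse(["User-Agent: *", "Disallow: /"])

# Both the home page and a deeper page start with "/", so both are blocked.
home_ok = parser.can_fetch("AnyBot", "http://www.example.com/")
page_ok = parser.can_fetch("AnyBot", "http://www.example.com/articles/page.html")
print(home_ok, page_ok)  # False False
```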
Disallow: /fred
The above command tells the engines not to crawl anything on your site whose path begins with /fred, for example the directories /fredrick/ and /fred/, or files such as /freddy.html.
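Disallow values are matched as prefixes of the URL path, which is why /fred also catches /fredrick/. This sketch uses Python's standard-library urllib.robotparser to check the three examples above, plus a hypothetical /frank.html that does not start with /fred:

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse(["User-Agent: *", "Disallow: /fred"])

# Map each path to whether a robot may crawl it.
paths = ["/fredrick/", "/fred/", "/freddy.html", "/frank.html"]
allowed = {p: parser.can_fetch("AnyBot", "http://www.example.com" + p) for p in paths}
print(allowed)
# {'/fredrick/': False, '/fred/': False, '/freddy.html': False, '/frank.html': True}
```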
Disallow: /bob/
The above command will tell the engines to stay out of the /bob/ folder.
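The trailing slash matters here. /bob/ is still matched as a prefix, so it blocks everything inside the folder, but it does not match a file such as /bob.html. Here is a sketch with Python's standard-library urllib.robotparser and hypothetical paths:

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse(["User-Agent: *", "Disallow: /bob/"])

# The folder and anything under it start with "/bob/"; "/bob.html" does not.
paths = ["/bob/", "/bob/notes.html", "/bob.html"]
allowed = {p: parser.can_fetch("AnyBot", "http://www.example.com" + p) for p in paths}
print(allowed)
# {'/bob/': False, '/bob/notes.html': False, '/bob.html': True}
```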
Disallow: /contact.html
The above command tells engines not to crawl the /contact.html page.
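A filename rule blocks that page and leaves the rest of the site alone. Matching is still by prefix, though, so a longer path that begins with the same characters, such as a hypothetical /contact.html.bak, is caught too. Here is a sketch with Python's standard-library urllib.robotparser and placeholder paths:

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse(["User-Agent: *", "Disallow: /contact.html"])

# Only paths beginning with "/contact.html" are blocked.
paths = ["/contact.html", "/contact.html.bak", "/about.html"]
allowed = {p: parser.can_fetch("AnyBot", "http://www.example.com" + p) for p in paths}
print(allowed)
# {'/contact.html': False, '/contact.html.bak': False, '/about.html': True}
```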