Friday, November 25, 2011

Robots.txt


Now let us talk about robots.txt. What is it, and how can it be helpful for us?
Many of us know that there are special programs (robots, or crawlers) that constantly visit the Web and read the contents of websites' directories. To give site owners a way to control how these robots work, the format of a file called robots.txt was established in June 1994. (Remember the exact spelling: robots.txt, not Robots.TXT and not robot.txt.) This file must be placed on the server in the top-level directory of the website. If your site has several subdomains, you must create a separate robots.txt for each subdomain.
So, if we want to tell robots how to crawl our site, we place robots.txt in the root of the website. Keep in mind that robots are free to ignore it, and that the file is readable by everyone, so we cannot use it to hide anything. Even so, most search engine crawlers and well-behaved robots do respect robots.txt. (Try adding /robots.txt to the end of almost any site's address and you will see that site's robots.txt.)
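
If you want to check a site's robots.txt programmatically rather than in the browser, here is a minimal sketch using Python 3's standard urllib.robotparser module. The example.com URLs are only placeholders; substitute whatever site and pages you want to test.

from urllib import robotparser

# Point the parser at the site's robots.txt (placeholder URL).
rp = robotparser.RobotFileParser()
rp.set_url("https://www.example.com/robots.txt")
rp.read()  # download and parse the file

# Ask whether a given user-agent may fetch a given URL.
print(rp.can_fetch("*", "https://www.example.com/"))
print(rp.can_fetch("Googlebot", "https://www.example.com/cgi-bin/test"))

Each can_fetch call answers for one user-agent and one URL, which mirrors how real crawlers decide page by page whether they are allowed in.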
Now let us look at the Robots Exclusion Protocol itself. A robots.txt file is built from three main directives: User-agent, Disallow, and Allow. Here are some examples:
User-agent: *
Disallow:
This first example means that all kinds of robots and search engine crawlers may visit and explore our entire site.
User-agent: *
Disallow: /
The second example means that robots and search engine crawlers may not enter or index any part of your site.
User-agent: *
Disallow: /cgi-bin/
The third example indicates that robots and search engine crawlers may crawl your site's pages, except for anything under the /cgi-bin/ directory.
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
In this last example no robots or search engine crawlers are allowed to crawl our site, except for Googlebot, which is allowed everywhere: the first record applies only to Googlebot and disallows nothing, while the second record disallows everything for all other robots.
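The Allow directive mentioned above lets you re-open part of a blocked area. It is a later extension honored by major crawlers such as Googlebot rather than part of the original 1994 format, and the paths below are only illustrative:
User-agent: *
Disallow: /private/
Allow: /private/readme.html
Here everything under /private/ is blocked, but the single page /private/readme.html may still be crawled.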
