Posted by Lilacor in Optimizing Site Structure, Jun 17th, 2009
- The “robots.txt” file name must always be lowercase, even if your site is hosted on a case-insensitive platform like Windows (e.g. “Robots.txt” and “robots.Txt” are both incorrect).
- Wildcards are not supported in either field. The one exception is “*” in the User-agent field, where it denotes “all robots”.
- Googlebot does support some wildcard patterns for file extensions, which lets you exclude certain file types from indexing. Other robots may not honor these patterns, so do not rely on them for every crawler. For more information visit http://www.google.com/webmasters/
- Website functionality is not affected if your “robots.txt” file is absent or empty, but either case gives all robots access to crawl every area and page of your site.
- However, with some servers and some crawlers, a missing “robots.txt” file generates a 404 error and the robot is redirected to your default 404 error page. The robot may treat that page as your “robots.txt” file, and its behavior then becomes unpredictable. We recommend that you always have a “robots.txt” file.
- Only one “robots.txt” file can be maintained per domain and it must be placed in the root directory of your site, i.e. in the same directory where you keep your home page.
- Website owners who do not have administrative rights or write access to the root domain URL will probably not be able to use a “robots.txt” file. In such situations you may attempt to use the META Robots Tag (see the related comments above in this lesson).
- Each User-agent must be specified on its own line, and each Disallow line should carry only one path. There is no limit to the number of lines.
- Both the User-agent and Disallow fields can be repeated any number of times with different values. Do not put blank lines inside a single record (a group of User-agent and Disallow lines), because a blank line marks the end of the record.
- Use lowercase for all “robots.txt” file content, except where a directory or file name contains uppercase letters on a case-sensitive platform (e.g. Unix).
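
To make the record rules above concrete, here is a minimal sketch that feeds a small “robots.txt” to Python's standard `urllib.robotparser` module and asks which paths a robot may fetch. The robot name `ExampleBot`, the host `www.example.com` and all paths are made up for illustration, and the result shows how this one parser reads the file, not how every crawler behaves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file content: two records separated by one blank line,
# with exactly one path on each Disallow line.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /private/
Disallow: /tmp/

User-agent: *
Disallow: /cgi-bin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent: str, path: str) -> bool:
    """Ask the parser whether `agent` may fetch `path` on the example host."""
    return parser.can_fetch(agent, "http://www.example.com" + path)


if __name__ == "__main__":
    for agent, path in [
        ("ExampleBot", "/private/page.html"),
        ("ExampleBot", "/cgi-bin/script"),
        ("OtherBot", "/cgi-bin/script"),
        ("OtherBot", "/CGI-BIN/script"),
    ]:
        print(agent, path, allowed(agent, path))
```

With this parser, a robot that has its own record follows only that record, and every other robot falls back to the “*” record. Paths are compared case-sensitively, which is why the case of directory and file names matters.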
For more rules and guidelines on using Robots visit http://www.robotstxt.org/wc/norobots.html
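
Some of the rules in this list can be checked mechanically before you upload the file. The sketch below is one hypothetical way to do that: `lint_robots` is an invented helper name, and it covers only the file name, the one-path-per-Disallow rule and wildcards in Disallow. It is not a full validator.

```python
def lint_robots(filename: str, text: str) -> list:
    """Return a list of warnings for a few of the rules listed in this post."""
    problems = []
    # Rule: the file name must be all lowercase.
    if filename != "robots.txt":
        problems.append(f"file name should be exactly 'robots.txt', got {filename!r}")
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # ignore comments and padding
        if not line:
            continue  # blank lines separate records; nothing to check here
        field, sep, value = line.partition(":")
        if not sep:
            problems.append(f"line {number}: missing ':' separator")
            continue
        field, value = field.strip().lower(), value.strip()
        if field == "disallow":
            # Rule: only one path per Disallow line.
            if len(value.split()) > 1:
                problems.append(f"line {number}: more than one path on a Disallow line")
            # Rule: wildcards in Disallow are not understood by every robot.
            if "*" in value:
                problems.append(f"line {number}: wildcard in Disallow may be ignored by some robots")
    return problems


if __name__ == "__main__":
    sample = "User-agent: *\nDisallow: /tmp/ /private/\nDisallow: /*.pdf\n"
    for warning in lint_robots("Robots.txt", sample):
        print(warning)
```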