Robots.txt: Everything You Need To Know Part Two
To address all crawlers and user agents at once, a group should be declared with "User-agent: *"; combined with an empty "Disallow:" directive, it permits crawling of the entire site.
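In its minimal form, such a fully permissive robots.txt looks like this (an empty "Disallow:" line blocks nothing):

```
User-agent: *
Disallow:
```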
If rules are to differ per user agent, a separate group of directives must be written for each crawler. As in the example above, individual user agents can be addressed directly to keep specific directories and areas out of their indexes or to include them deliberately.
Examples of Typically Addressed User Agents:
- Google: Googlebot
- Google Images: Googlebot-Image
- Bing: Bingbot
- Yahoo: Slurp
- Baidu: Baiduspider
- DuckDuckGo: DuckDuckBot
An extensive list of other crawlers can be found online.
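Combining these, a robots.txt that addresses several of the crawlers listed above could look as follows; the directory names are invented purely for illustration:

```
# Keep Google Images out of a hypothetical /press-photos/ directory
User-agent: Googlebot-Image
Disallow: /press-photos/

# Bing may crawl everything
User-agent: Bingbot
Disallow:

# All other crawlers: skip a hypothetical /internal/ directory
User-agent: *
Disallow: /internal/
```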
Instructions: Allow and Disallow
Via the line "Disallow: /example-page/", the addressed user agent is instructed to skip that directory or page when crawling and indexing. This instruction applies only to the crawler named in the preceding "User-agent:" line! The same goes for the explicit "Allow: /example/" directive.
"Allow" can also explicitly allow specific media or directory strings, even if main paths have been excluded with "disallow."