What is the robots.txt file

According to ABLOGTOPHONE, a robots.txt file is a text file in which you specify which directories the search engines may and may not read. The robots.txt file is very important for crawlers: it is the first file they look for and read, like a set of instructions on what they can and cannot do.

The robots.txt file specifies exactly which crawler is barred from certain directories, subdirectories, or individual files. You can, for example, specify that the Googlebot may search certain pages while the Bingbot may not, or vice versa.
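Such per-bot rules might look like the following sketch, which allows Googlebot everywhere but blocks Bingbot from the entire site (the reverse arrangement would simply swap the two blocks):

```
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Disallow: /
```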

What to do with the robots.txt file?

For the search engines to find the robots.txt file, it must be located in the root directory of the domain. If you save the file anywhere else, the search engines will not find it and will not take it into account.

There can only be one robots.txt file per main domain.

Structure and content of a robots.txt file

The robots.txt file consists of two elements that always belong together. First you address the user agent by name. Below that comes the directive, with the name of the directory that may or may not be read.

You can also reference your sitemap.xml file in the robots.txt file and thus make sure that the crawler finds it at all.
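A sitemap reference is a single line with the full URL of the sitemap (the domain here is a placeholder):

```
Sitemap: https://www.example.com/sitemap.xml
```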

The first directive names the bot that is being addressed: User-agent:

You can add an exact name after this, or use * to indicate that you want to address all bots.

A directive such as Disallow: excludes the files or directories listed after it.

The directive Allow: / includes all affected files.

Here is an example of a robots.txt file:

Example 1:

User-agent: seobot

Disallow: /noseobot/

This means that the user agent named “seobot” should not crawl the folder test.de/noseobot/, including its subdirectories.
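You can verify rules like those in Example 1 with Python's built-in robots.txt parser. Here the rules are supplied inline rather than fetched over the network, and the domain test.de is taken from the example above:

```python
from urllib import robotparser

# The rules from Example 1, supplied as inline text.
rules = """
User-agent: seobot
Disallow: /noseobot/
"""

parser = robotparser.RobotFileParser()
parser.parse(rules.splitlines())

# seobot may not enter /noseobot/, but other paths remain open to it.
print(parser.can_fetch("seobot", "https://test.de/noseobot/page.html"))  # False
print(parser.can_fetch("seobot", "https://test.de/other/page.html"))     # True

# A different bot is unaffected by a rule addressed only to seobot.
print(parser.can_fetch("otherbot", "https://test.de/noseobot/page.html"))  # True
```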

Example 2:

User-agent: *

Allow: /

Example 2 shows that all user agents may access the entire website. This rule is redundant, because crawlers automatically crawl everything unless another directive prevents them.

Example 3:

User-agent: seobot

Disallow: /directory1/

Disallow: /directory6/

Here we have told seobot that directories 1 and 6 are blocked for it and that it may not search through them.

Here is a selection of the most important user-agent names:

Crawler – User agent
Google – Googlebot
Bing – Bingbot
Yahoo – Slurp
MSN – Msnbot

Can I blindly trust the robots.txt file?

The robots.txt file is only an aid for the crawlers; there is no guarantee that the pages will not be crawled. Nor does the robots.txt file protect against access by others, so you should always work with password protection on the web server. Google and Bing both state that they respect the robots.txt file, but they are under no obligation to do so.

How do I call up the robots.txt file?

You can easily access the file in your browser: simply enter your domain in the URL bar, followed by /robots.txt.

Example: www.yourdomainname.de/robots.txt
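As a small sketch, the robots.txt location can be derived from any page URL by keeping only the scheme and domain; the domain names are placeholders, as in the example above:

```python
from urllib.parse import urlsplit, urlunsplit

def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL at the root of the page's domain."""
    parts = urlsplit(page_url)
    # Discard the path, query, and fragment; keep scheme and domain.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_txt_url("https://www.yourdomainname.de/blog/post-1"))
# https://www.yourdomainname.de/robots.txt
```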

The file should only be accessible at the main domain. If you call up the file as www.yourdomain.de/directory/robots.txt, a 404 error should appear. However, if this address takes you to the homepage of your website, check your redirects. While this does not cause an error, it is not correct: a page that cannot be reached should also be reported as such.
