We’re excited to announce the new Search Console robots.txt report, available from the settings page. We’re also making relevant information available from the page indexing report. As part of this update, we’re sunsetting the robots.txt tester. Check the updated help center page.
Google Search Console recently added a robots.txt report. When you log in to Search Console, go to Settings and open the report under the Crawling section. It shows the robots.txt files Google found for the top 20 hosts on your site, the last time each file was crawled, and any warnings or errors encountered.
Robots.txt file
A robots.txt file is a plain-text file that tells web robots (crawlers) which URLs on a website they may crawl. It implements the Robots Exclusion Protocol and lives in the root directory of the site; for a site served from https://www.example.com, crawlers look for it at https://www.example.com/robots.txt. The file consists of a set of rules, each of which names a user agent (a type of web robot) and a path to a directory or file on the website, and states whether that user agent is allowed to crawl the path.
For example, the following robots.txt file tells all web robots that they are not allowed to crawl the /images/ directory or the /admin/ directory:
User-agent: *
Disallow: /images/
Disallow: /admin/
The asterisk (*) in the User-agent line means the rules in that group apply to all user agents. Each Disallow line tells those user agents that they are not allowed to crawl the specified path.
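Rules can also target one crawler by name instead of every robot. A minimal sketch, assuming you want to keep Google's image crawler (Googlebot-Image, a real Google user agent) out of a hypothetical /photos/ directory while leaving the rest of the site open to everyone:

# Keep Google's image crawler out of one directory
User-agent: Googlebot-Image
Disallow: /photos/

# All other crawlers may crawl the whole site
User-agent: *
Disallow:

An empty Disallow value allows everything, so the second group places no restrictions on other crawlers. When several groups could match, a crawler follows only the most specific group for its user agent, which is why Googlebot-Image ignores the wildcard group here.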
Robots.txt files are not legally enforceable, but most well-behaved web robots respect them, because following the file serves both parties. The website owner gets to steer crawlers toward the content they want indexed and shown in search results, and the crawler avoids wasting resources on pages it has been asked to skip.
Here are some of the benefits of using a robots.txt file:
Keeps crawlers away from private or sensitive areas of a site. For example, you might disallow paths that lead to login or account pages. Note, however, that robots.txt controls crawling, not indexing: a blocked URL can still appear in search results if other pages link to it, so truly sensitive content should be protected with authentication or a noindex directive rather than robots.txt alone.
Prevents search engines from crawling pages that are not useful in search results, such as pages under construction or near-duplicate content like internal search result pages (see the example after this list).
Reduces the load on your web server. By keeping crawlers off pages you do not want crawled, you save bandwidth and leave the crawl budget for the pages that matter.
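Putting these uses together, here is an illustrative robots.txt. The paths are hypothetical; the Sitemap line is a real, widely supported directive for pointing crawlers at your sitemap:

# Keep crawlers out of internal search results (near-duplicate pages)
# and out of a section that is still under construction
User-agent: *
Disallow: /search/
Disallow: /beta/

# Tell crawlers where to find the sitemap
Sitemap: https://www.example.com/sitemap.xml

The Sitemap directive stands outside any user-agent group and can appear anywhere in the file.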