Important: This article applies to Web.com customers only.
About Web Robots
Search engines use web robots (crawlers, spiders) to index (crawl) websites, web pages, and information within their directories. If a website or page is not indexed, then the page will not appear in search results.
Web robots receive instructions on which websites or pages to index from /robots.txt files. This convention is known as the Robots Exclusion Protocol.
Before crawling a website, a robot checks the site's robots.txt file (for example, https://www.example.com/robots.txt) to see which pages, if any, are not supposed to be crawled and indexed.
- Web robots know that they may access and crawl the entire site if they see:
User-agent: * and Disallow: (with no value after it)
- Web robots know that they must not access any page on the site if they see:
User-agent: * and Disallow: /
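The two cases above can be checked with a short sketch. It assumes Python's standard-library urllib.robotparser as a stand-in for a well-behaved robot; the helper name can_crawl, the ExampleBot agent name, and the example.com URL are placeholders for illustration. Real search engine crawlers may interpret edge cases differently.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt: str, url: str, agent: str = "ExampleBot") -> bool:
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# An empty Disallow value blocks nothing: the whole site may be crawled.
ALLOW_ALL = "User-agent: *\nDisallow:\n"

# "Disallow: /" matches every path: nothing on the site may be crawled.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"

if __name__ == "__main__":
    page = "https://www.example.com/about.html"  # placeholder URL
    print(can_crawl(ALLOW_ALL, page))
    print(can_crawl(BLOCK_ALL, page))
```

The only difference between the two files is the single slash after Disallow:, which is why that line is worth double-checking before publishing.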
The Sitemap directive tells search engines and robots where a website’s sitemap is located. The sitemap lists all areas and pages accessible to robots. A complete robots.txt file combines the User-agent and Disallow rules with a Sitemap line.
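A minimal sketch of what such a file might contain, embedded in a Python string so it can be parsed with the standard-library urllib.robotparser. The /private/ path, the sitemap URL, and the ExampleBot name are hypothetical placeholders, not values supplied by Web.com, and site_maps() requires Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every robot may crawl the site except /private/,
# and the Sitemap line gives the full URL of the sitemap.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
# parse() reads the lines directly, so nothing is fetched over the network.
parser.parse(ROBOTS_TXT.splitlines())

sitemaps = parser.site_maps()
home_ok = parser.can_fetch("ExampleBot", "https://www.example.com/index.html")
private_ok = parser.can_fetch("ExampleBot", "https://www.example.com/private/a.html")

if __name__ == "__main__":
    print(sitemaps)
    print(home_ok, private_ok)
```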
Note: Malware robots that scan for security vulnerabilities, and robots used by spammers to collect email addresses, can bypass or ignore /robots.txt files. Because these files are publicly available, anyone can read them and see which sections of a website are not supposed to be crawled. Do not use robots.txt files to try to hide information.
Save Your Robots.txt File
To ensure that robots and crawlers from Google and other search engines can find and read your robots.txt file, apply the following conventions:
- 1. Save the robots.txt code as a plain text file.
- 2. Name the file robots.txt.
- 3. Place the file in your site’s highest-level directory or in the root of your domain directory.
Correct example for your website: https://www.example.com/robots.txt
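The correct location can be derived from any page URL on the site, since robots look for the file only at the root of the domain. A minimal sketch, assuming Python's standard-library urllib.parse; the function name robots_url and the example.com addresses are placeholders.

```python
from urllib.parse import urljoin


def robots_url(page_url: str) -> str:
    """Return the root-level robots.txt URL for the site serving `page_url`."""
    # A leading slash makes urljoin replace the whole path with /robots.txt.
    return urljoin(page_url, "/robots.txt")


if __name__ == "__main__":
    print(robots_url("https://www.example.com/blog/post.html"))
```

A file saved in a subfolder, such as /blog/robots.txt, would not be at this address and would therefore not be picked up by robots.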
For more SEO-related troubleshooting and information on how to update your website, please refer to the Related Articles.