A robots.txt file allows you to inform search engine crawlers and bots which URLs of your website should or should not be accessed, using the Robots Exclusion Protocol. You can use it in combination with a sitemap.xml file and Robots meta tags for more granular control over which parts of your website get crawled. The robots.txt file should be located in the root directory of the website.
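For example, a minimal robots.txt file (the domain and the /private/ directory below are hypothetical) that restricts one directory and also points crawlers to the sitemap could look like this:
User-agent: *
Disallow: /private/
Sitemap: https://example.com/sitemap.xml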
Important: The rules in the robots.txt file rely on voluntary compliance by the crawlers and bots visiting your website. If you wish to fully block access to specific pages or files of your website, or to prevent specific bots from accessing your website, you should consider using an .htaccess file instead. Various examples of applying such restrictions are available in our How to use .htaccess files article.
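As a minimal sketch only (the bot name "BadBot" is a hypothetical placeholder, and the exact rules depend on your setup), an .htaccess file can return a 403 Forbidden response to requests from a specific User-Agent:
# Block requests whose User-Agent header contains "BadBot" (hypothetical name)
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} "BadBot" [NC]
RewriteRule .* - [F,L]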
Some applications, like Joomla, come with a robots.txt file by default, while others, like WordPress, may generate the robots.txt file dynamically. Dynamically generated robots.txt files do not exist as actual files on the server, so the way to edit them depends on the specific software application you are using. For WordPress, you can use a plugin that manages the default robots.txt file, or manually create a new robots.txt file.
You can create a robots.txt file for your website via the File Manager section of the hosting Control Panel. Alternatively, you can create a robots.txt file locally in a text editor of your choice, and after that, you can upload the file via an FTP client. You can find step-by-step instructions on how to set up the most popular FTP clients in the Uploading files category from our online documentation.
The rules in the robots.txt file are defined by directives. The following directives are supported for use in a robots.txt file: User-agent (the crawler a group of rules applies to), Disallow (paths that should not be crawled), Allow (exceptions to Disallow rules), and Sitemap (the location of your XML sitemap), as shown in the annotated example below.
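A minimal sketch showing how these directives fit together (the paths and sitemap URL are hypothetical):
User-agent: * # the rules below apply to all crawlers
Disallow: /search/ # do not crawl URLs starting with /search/
Allow: /search/help/ # exception to the Disallow rule above
Sitemap: https://example.com/sitemap.xml # location of the XML sitemap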
The path values used in the Allow and Disallow directives are case-sensitive, so you need to make sure you enter them with the correct capitalization - a rule for /Directory/ will not apply to /directory/. The directive names themselves and the User-agent values, on the other hand, are matched case-insensitively, so "Googlebot" and "GoogleBot" target the same bot.
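For example, with a hypothetical /Private/ directory, the following rule blocks /Private/page.html but not /private/page.html:
User-agent: *
Disallow: /Private/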
You can use the # (hash) character to add comments for better readability by humans. Characters after a # character will be considered a comment if the # is at the start of a line, or if it comes after a properly defined directive and is separated from it by a space. Examples of valid and invalid comments can be found below:
# This is comment on a new line.
User-agent: * # This is a comment after the User-agent directive.
Disallow: / # This is a comment after the Disallow directive.
// This is not a valid comment.
User-agent: Bot# This is not a valid comment, as it is not separated from the User-agent value by a space. The matched User-Agent will be "Bot#" and not "Bot".
The robots.txt file should be in UTF-8 encoding. The Robots Exclusion Protocol requires crawlers to parse robots.txt files of at least 500 KiB. Since Google's crawlers ignore robots.txt content beyond 500 KiB, you should try to keep the size of your robots.txt file below that limit.
The content of the robots.txt file will depend on your website and the applications/scripts you are using on it. By default, all User-Agents are allowed to access all pages of your website unless there is a custom robots.txt file with other instructions.
You can find the default content of the robots.txt for Joomla in its official documentation. It looks like this:
User-agent: *
Disallow: /administrator/
Disallow: /api/
Disallow: /bin/
Disallow: /cache/
Disallow: /cli/
Disallow: /components/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /layouts/
Disallow: /libraries/
Disallow: /logs/
Disallow: /modules/
Disallow: /plugins/
Disallow: /tmp/
The default WordPress robots.txt file has the following content:
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Sitemap: https://YOUR_DOMAIN.com/wp-sitemap.xml
You can find sample uses of robots.txt files listed below:
If there is no robots.txt file available for your website, the default rule for your website will be to allow all User-Agents to access all pages of your domain:
User-agent: *
Allow: /
The same can be achieved by specifying an empty Disallow directive:
User-agent: *
Disallow:
To disallow all robots access to your website, you can use the following content in your website's robots.txt file:
User-agent: *
Disallow: /
You can instruct bots to crawl your website with the exception of a specific directory by adding this code block to your robots.txt file:
User-agent: *
Disallow: /directory/
Similarly, you can exclude a specific file:
User-agent: *
Disallow: /directory/file.html
If you would like to allow all robots except one to crawl your website, you can use this code block in your website's robots.txt file:
User-agent: NotAllowedBot
Disallow: /
Make sure you replace the NotAllowedBot string with the name of the actual bot you want the rule to apply to.
To instruct all bots except one not to crawl the website, use this code block:
User-agent: AllowedBot
Allow: /
User-agent: *
Disallow: /
Make sure you replace the AllowedBot string with the name of the actual bot you want the rule to apply to.
Adding the following code block to your robots.txt file will instruct compliant robots to not crawl .pdf files:
User-agent: *
Disallow: /*.pdf$
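If needed, this can be combined with an Allow rule so that a particular location remains crawlable. In the sketch below, /public/ is a hypothetical directory whose PDF files should stay accessible; the longer, more specific rule takes precedence:
User-agent: *
Disallow: /*.pdf$
Allow: /public/*.pdf$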
The $ (dollar sign) character has a special meaning in robots.txt files - when placed at the end of an Allow/Disallow value, it anchors the match to the end of the URL. To add a rule that takes effect for URLs that contain a literal $ character, you can use the following pattern in the Allow/Disallow directive:
User-agent: *
Disallow: /*$*
If you add the value without the ending wildcard character:
User-agent: *
Disallow: /*$
all bots will be prevented from accessing the entire website, since /*$ matches any URL path followed by the end of the URL, and this could seriously affect your website's SEO ranking.