robots.txt is a text file that you put on your website to tell crawlers which pages you would like them to visit and which to skip. Well-behaved search engines follow these requests, but nothing forces a crawler to obey them. robots.txt is not a firewall or a form of password protection, and it cannot actually prevent a crawler from fetching your pages. If you have data that you really do not want to appear in search results, never rely on robots.txt alone to keep it from being indexed and displayed.
robots.txt must reside in the root (main) directory of the site. Search engines look for it only there; they do not search the rest of the website for it. If the file is not in the root directory, search engines assume that the site has no robots.txt file, and they crawl and index everything they find.
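As a small illustration of the "root directory only" rule, the sketch below derives the one URL where a crawler would look for robots.txt, starting from any page URL on the site. The function name `robots_url` and the example.com address are made up for this example.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the root-level robots.txt URL for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://example.com/blog/2020/post.html?x=1"))
# https://example.com/robots.txt
```

A file stored at, say, /blog/robots.txt never shows up in this result, which is why crawlers do not find it there.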
Syntax of a robots.txt File
There are many search engines, and there may be many different files that you want to disallow. The basic syntax of a robots.txt file is as follows:
User-agent: *
Disallow: /
The User-agent line names the crawler that the rules apply to, and each Disallow line lists a directory or file that you do not want that crawler to fetch.
You can also add a comment by starting a line with the hash (#) sign.
User-agent: *
Disallow: /temp/
In the example above, User-agent: * applies the rule to all search engine crawlers, and Disallow: /temp/ tells them not to crawl anything inside the temp directory.
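One way to check how such a rule is read is Python's standard urllib.robotparser module. The sketch below feeds it the /temp/ example together with a comment line; the crawler name ExampleBot, the example.com host, and the helper `allowed` are assumptions made for this illustration.

```python
from urllib.robotparser import RobotFileParser

# The /temp/ example from above, with a comment line added.
rules = """\
# keep crawlers out of the temporary directory
User-agent: *
Disallow: /temp/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())


def allowed(path: str, agent: str = "ExampleBot") -> bool:
    """Ask the parser whether the given crawler may fetch the given path."""
    return parser.can_fetch(agent, "https://example.com" + path)


print(allowed("/temp/draft.html"))  # False: the path starts with /temp/
print(allowed("/about/"))           # True: no rule matches this path
```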
Important Things for Best robots.txt of a WordPress Website
If you run WordPress, you want search engines to crawl your pages and posts, but not your core WordPress files and directories, trackbacks, or feeds. The contents of robots.txt vary from site to site, and the file must be created in the root directory of your website. There is no standard robots.txt file for WordPress, but the following points give a clear idea of what a good one looks like.
1. Things you should always block
Some files and directories on a WordPress site should always be blocked: the “cgi-bin” directory and the standard WP directories. Some servers do not allow access to the “cgi-bin” directory at all, but including it in your Disallow directives does no harm.
The standard WordPress directories that you should block are wp-admin, wp-includes, and the plugins, cache, and themes subdirectories of wp-content. These directories contain nothing that is useful to search engines. The exception is the “uploads” subdirectory of wp-content, which holds everything you upload with the WP media upload feature, so it must stay unblocked.
The directives for the above are given below:
User-agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /xmlrpc.php
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Disallow: /wp-content/themes/
Disallow: /trackback/
Disallow: /feed/
Disallow: /comments/
Disallow: /category/
Disallow: /*?
Allow: /wp-content/uploads/
2. Things to block depending on your WP configuration
You need to know whether your WordPress site structures its content with categories, with tags, with both, or with neither. If you use categories, block the tag archives from search engines, and vice versa. First check the bases under Admin panel > Settings > Permalinks.
If the tag base field is blank, the default base is tag. If you are using categories only, disallow the tag archives in robots.txt as given below:
Disallow: /tag/
If you are using tags only, block the category archives in robots.txt as given below:
Disallow: /category/
If you are using both categories and tags, you do not have to do anything in robots.txt.
If you are using neither tags nor categories, block both of them in robots.txt as given below:
Disallow: /category/
Disallow: /tag/
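The four cases above boil down to one rule: block the archive type you do not use. A minimal sketch of that decision follows, assuming the default bases /category/ and /tag/; the function name `archive_rules` is hypothetical and not part of WordPress.

```python
def archive_rules(uses_categories: bool, uses_tags: bool) -> list:
    """Return the Disallow lines for the archive types the site does not use.

    Assumes the default permalink bases /category/ and /tag/.
    """
    rules = []
    if not uses_categories:
        rules.append("Disallow: /category/")
    if not uses_tags:
        rules.append("Disallow: /tag/")
    return rules


print(archive_rules(uses_categories=True, uses_tags=False))
# ['Disallow: /tag/']
print(archive_rules(uses_categories=True, uses_tags=True))
# []
```

If you changed either base in the Permalinks settings, the paths in the returned lines would have to change to match.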
3. Files to block separately
WordPress uses several kinds of files to display content, and search engines do not need access to them, so you can block them as well. The file types most often involved are PHP, JS, INC, and CSS files.
You can block them in robots.txt as given below:
Disallow: /index.php # separate directive for the main script file of WP
Disallow: /*.php$
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
The “$” character matches the end of a URL string, and the “*” character matches any sequence of characters.
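To see what these wildcard patterns do and do not match, the sketch below translates a pattern into a regular expression. It is a simplified illustration of the matching idea and not the code of any real crawler; the helper names `pattern_to_regex` and `matches` and the sample paths are invented for this example.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regular expression."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape everything literally, then turn the escaped "*" back into ".*".
    body = re.escape(pattern).replace(r"\*", ".*")
    return re.compile(body + ("$" if anchored else ""))


def matches(pattern: str, path: str) -> bool:
    """Return True if the pattern matches the path from its first character."""
    return pattern_to_regex(pattern).match(path) is not None


print(matches("/*.php$", "/index.php"))      # True: the path ends in .php
print(matches("/*.php$", "/index.php?p=1"))  # False: "$" requires .php at the end
print(matches("/*?", "/page?id=3"))          # True: the path contains "?"
```

Without the trailing “$”, a pattern such as /*.php would also match /index.php?p=1, because matching then only has to succeed on a prefix of the URL.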
Keep in mind that files in the uploads directory should not be blocked.
4. Things not to block
What you leave unblocked depends on your own needs. For example, I do not want to block my images from Google image search, so I allow them in robots.txt as given below:
User-agent: Googlebot-Image
Disallow:
Allow: / # It is not a standard use of this directive but Google prefers it
You can add anything else that you do not want to block in the same way.
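A crawler-specific group like this one can sit next to the general rules. The sketch below uses Python's standard urllib.robotparser to show that the Googlebot-Image group applies only to that crawler, while other crawlers fall back to the User-agent: * group. The crawler name SomeOtherBot and the image URL are assumptions made for this illustration.

```python
from urllib.robotparser import RobotFileParser

# A general group plus the Googlebot-Image group from the example above.
rules = """\
User-agent: *
Disallow: /wp-admin/

User-agent: Googlebot-Image
Disallow:
Allow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

image_url = "https://example.com/wp-admin/images/logo.png"
print(parser.can_fetch("Googlebot-Image", image_url))  # True: its own group allows everything
print(parser.can_fetch("SomeOtherBot", image_url))     # False: falls under the * group
```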