
What is robots.txt | How to configure robots.txt for WordPress

10.05.2019

 


robots.txt is a text file in UTF-8 encoding that tells search engine crawlers which pages of a site should be crawled and which should not. It is located in the root directory of the site. How do you view a website's robots.txt? For example, the file for this site can be viewed at https://cityhost.ua/robots.txt.

Read also: Where is robots.txt in WordPress?

A correctly configured robots.txt tells the web crawlers of Google and other search engines which pages they do not need to crawl:

  • pages with personal information of registered users (for example, customer accounts in online stores);
  • pages with internal search results on the site;
  • pages for entering the site control panel;
  • pages that cause duplicate content.

And if you are just getting into the basics of webmastering, you probably already understand that theoretical knowledge in this field is best consolidated in practice right away. To do this, order inexpensive CityHost hosting with the latest version of PHP, MySQL databases and SSH access, create a blog or online store on your favorite engine, and hone your site administration skills on a real example.

Read also: How to install WordPress on hosting

How to create and configure robots.txt for WordPress

To create a correct robots.txt, you will need:

  • a text editor (for example, Notepad++, Atom or even the standard Windows Notepad);
  • an FTP client (e.g. FileZilla, WinSCP or Cyberduck);
  • 10–15 minutes of free time.

Note. If you do not know how to work with FTP clients and your site is hosted on CityHost, you can add a customized WordPress robots.txt through the file manager in the hosting control panel.

First, open a text editor, create a new file and save it with the name robots and the extension .txt. It is important that all letters are lower case: variants such as Robots.txt, robots.TXT or ROBOTS.txt are incorrect.

Next, add the following code to the file and replace the link in the last line with the URL of your XML sitemap:

User-agent: *
Disallow: /cgi-bin
Disallow: /?
Disallow: /wp-
Disallow: /wp/
Disallow: *?s=
Disallow: *&s=
Disallow: /search/
Disallow: /author/
Disallow: /users/
Disallow: */trackback
Disallow: */feed
Disallow: */rss
Disallow: */embed
Disallow: */wlwmanifest.xml
Disallow: /xmlrpc.php
Disallow: *utm*=
Disallow: *openstat=
Allow: */uploads

Sitemap: https://example.com/sitemap.xml

We suggest you take a closer look at the syntax of this robots.txt example for WordPress:

  • User-agent — defines which web spiders the rules below are written for. The value * means that everything written further in robots.txt should be taken into account by all search engines.
  • Disallow — tells web spiders which directories or files should not be crawled. For WordPress sites, it is recommended to block the author archive pages, internal search results, the login page of the engine's admin panel, the RSS feeds and so on. This protects the site from duplicate content and keeps pages that have no place in search results from appearing there (a quick way to test these rules is shown after this list).
  • Allow — tells crawlers which directories or files should remain available for crawling. In our example, the uploads directory, which stores the site's images, is left open.
  • Sitemap — specifies the location of the sitemap. If there are two or more XML sitemaps, list each of them in robots.txt on a separate line starting with Sitemap:.
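
If you want to sanity-check the Disallow and Allow logic locally, the sketch below uses Python's standard urllib.robotparser module to test a few URLs against the prefix rules from the example above. The file name local_check.py and the test URLs are placeholders, and only the prefix-style rules are included: the standard-library parser does plain prefix matching, so it will not interpret mid-path wildcards such as *?s= the way Googlebot does.

# local_check.py -- a minimal sketch for testing robots.txt rules locally.
# Only prefix-style rules from the example are listed, because
# urllib.robotparser does not expand mid-path wildcards like *?s=.
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: *
Disallow: /cgi-bin
Disallow: /wp-
Disallow: /search/
Disallow: /author/
Disallow: /xmlrpc.php
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# The URLs below are hypothetical; replace them with pages from your own site.
for url in (
    "https://example.com/wp-admin/",         # blocked by Disallow: /wp-
    "https://example.com/author/admin/",     # blocked by Disallow: /author/
    "https://example.com/2019/05/my-post/",  # not matched by any rule -> allowed
):
    print(url, "->", "allowed" if parser.can_fetch("*", url) else "disallowed")

Running the script should print "disallowed" for the first two URLs and "allowed" for the last one. For the wildcard rules, rely on the Google Search Console check described below.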

The last step is to upload robots.txt to the root directory of the site using an FTP client or the file manager in the hosting control panel. To verify that the upload was successful, go to http://example.com/robots.txt, replacing example.com with your site's domain. If everything was done correctly, you will see a page displaying the code above.
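
If you prefer to script the upload instead of using a graphical FTP client, here is a minimal sketch that does the same thing with Python's standard ftplib and urllib modules. The host name, credentials and commented-out directory are placeholders; take the real values from your hosting control panel.

from ftplib import FTP
from urllib.request import urlopen

# Placeholder connection details -- replace with the FTP credentials
# from your hosting control panel.
HOST = "ftp.example.com"
USER = "ftp_user"
PASSWORD = "ftp_password"

# Upload the local robots.txt to the root directory of the site.
with FTP(HOST) as ftp:
    ftp.login(USER, PASSWORD)
    # ftp.cwd("public_html")  # uncomment if the document root is a subdirectory
    with open("robots.txt", "rb") as local_file:
        ftp.storbinary("STOR robots.txt", local_file)

# Verify the upload by requesting the file over HTTP.
with urlopen("http://example.com/robots.txt") as response:
    print(response.status)                  # expect 200
    print(response.read().decode("utf-8"))  # should match the file you uploaded

Either way, the result must be the same: the file has to end up directly in the site's document root, not in a subfolder.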

By the way, what happens if you disallow everything in robots.txt?

If you want to prevent all search engines from crawling the site, use the following contents of the robots.txt file:

User-agent: *
Disallow: /

This configuration can negatively affect SEO, as search engines will not be able to index any pages. Use it with caution, for example for sites that are still in development.

Read also: What is WHOIS, What It Is Used For and How to Check a Domain

How to check robots.txt for errors

You learned how to configure robots.txt for WordPress, and the next step is to check that it is compiled correctly. To do this, do the following:

  1. Open the robots.txt Tester in the old version of Google Search Console (the "Crawl" menu item).
  2. Copy the contents of the robots.txt file and paste it into the edit window.
  3. Make sure that the number of errors and warnings in the lower left corner of the editing window is zero. If there are problems with the file's syntax, this line will show the number of errors or warnings, and a red or orange icon will appear to the left of the line containing the incorrect directive. Hover over it to see a description of the error.

If you see the line "Errors: 0, Warnings: 0" under the robots.txt Tester edit window, notify the search engine of the changes to robots.txt. To do this, click the "Submit" button in the lower right corner of the editor, then confirm the update request by clicking the "Submit" button next to option #3 in the window that appears.

Read also: What are key phrases and how to choose them.

Why is it important to configure robots.txt correctly

In January 2017, Google Webmaster Trends Analyst Gary Illyes published an article on the official Google Webmaster Central blog titled "What Crawl Budget Means for Googlebot". In it, he noted that if a crawler encounters low-quality pages or pages that duplicate content from other pages while browsing a site, the speed and frequency of crawling decrease. The negative consequence is that new content added to your site will not appear in search results any time soon.

A correctly configured robots.txt for WordPress prevents search engines from crawling duplicate pages and pages that are of no value to visitors. With this in mind, it is no less important an element of a site's technical optimization than, for example, a correctly compiled and automatically updated sitemap or enabled gzip compression.

Another characteristic of a site that has a positive effect on both search engine optimization and visitor satisfaction is fast page loading. Renting a dedicated server can help with this, or, as a cheaper option, a virtual server. Either will provide enough resources to make an online store, portal or blog fast, reliable and secure.

Was the publication informative? Then share it on social networks and join our Telegram channel. We remind you that the hosting company CityHost provides inexpensive hosting services for sites of any complexity. For technical questions, contact us via online chat or by phone at 0 800 219 220.



Author: Bohdana Haivoronska

Journalist (since 2003), IT copywriter (since 2013), content marketer at Cityhost.ua. Specializes in articles about technology, creation and promotion of sites.