
What is robots.txt | How to configure robots.txt for WordPress

10.05.2019

 


robots.txt is a text file in UTF-8 encoding that tells search engine crawlers which pages of a site should be crawled and which should not. It is located in the root directory of the site. How do you view a website's robots.txt? For example, the file for this site can be viewed at https://cityhost.ua/robots.txt.

Read also: Where is robots.txt in WordPress?

A correctly configured robots.txt tells the web crawlers of Google and other search engines which pages they do not need to crawl:

  • pages with personal information of registered users (for example, customer accounts in online stores);
  • pages with internal search results on the site;
  • pages for entering the site control panel;
  • pages that cause duplicate content.

And if you are just getting into the basics of webmastering, you probably already understand that theoretical knowledge in this field is best consolidated in practice right away. To do this, order inexpensive CityHost hosting with the latest version of PHP, MySQL databases and SSH access, create a blog or online store on your favorite engine, and hone your site administration skills on a real example.

Read also: How to install WordPress on hosting

How to create and configure robots.txt for WordPress

To create a correct robots.txt, you will need:

  • a text editor (for example, Notepad++, Atom or even the standard Windows Notepad);
  • an FTP client (e.g. FileZilla, WinSCP or Cyberduck);
  • 10–15 minutes of free time.

Note. If you do not know how to work with FTP clients and your site is hosted on CityHost, you can add a customized WordPress robots.txt through the file manager in the hosting control panel.

First, open a text editor, create a new file and save it with the name robots and the extension .txt. It is important that all letters are lower case: variants such as Robots.txt, robots.TXT or ROBOTS.txt are incorrect.

Next, add the following code to the file and replace the link in the last line with the URL of your XML sitemap:

User-agent: *
Disallow: /cgi-bin
Disallow: /?
Disallow: /wp-
Disallow: /wp/
Disallow: *?s=
Disallow: *&s=
Disallow: /search/
Disallow: /author/
Disallow: /users/
Disallow: */trackback
Disallow: */feed
Disallow: */rss
Disallow: */embed
Disallow: */wlwmanifest.xml
Disallow: /xmlrpc.php
Disallow: *utm*=
Disallow: *openstat=
Allow: */uploads

Sitemap: https://example.com/sitemap.xml

We suggest you take a closer look at the syntax of this robots.txt example for WordPress:

  • User-agent — defines which web spiders the rules below are written for. The value * means that everything written further in robots.txt should be taken into account by all search engines.
  • Disallow — tells web spiders which directories or files should not be crawled. For WordPress sites, it is recommended to block the author archive pages, internal search results, the login page of the engine's admin panel, the RSS feeds and so on. This protects the site from duplicate content and keeps pages that have no place in search results from appearing there (a quick way to test these rules is shown after this list).
  • Allow — tells crawlers which directories or files should remain available for crawling. In our example, the uploads directory, which stores the site's images, is left open.
  • Sitemap — specifies the location of the sitemap. If there are two or more XML sitemaps, list each of them in robots.txt on a separate line starting with Sitemap:.
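
If you want to sanity-check the Disallow and Allow logic locally, the sketch below uses Python's standard urllib.robotparser module to test a few URLs against the prefix rules from the example above. The file name local_check.py and the test URLs are placeholders, and only the prefix-style rules are included: the standard-library parser does plain prefix matching, so it will not interpret mid-path wildcards such as *?s= the way Googlebot does.

# local_check.py -- a minimal sketch for testing robots.txt rules locally.
# Only prefix-style rules from the example are listed, because
# urllib.robotparser does not expand mid-path wildcards like *?s=.
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: *
Disallow: /cgi-bin
Disallow: /wp-
Disallow: /search/
Disallow: /author/
Disallow: /xmlrpc.php
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# The URLs below are hypothetical; replace them with pages from your own site.
for url in (
    "https://example.com/wp-admin/",         # blocked by Disallow: /wp-
    "https://example.com/author/admin/",     # blocked by Disallow: /author/
    "https://example.com/2019/05/my-post/",  # not matched by any rule -> allowed
):
    print(url, "->", "allowed" if parser.can_fetch("*", url) else "disallowed")

Running the script should print "disallowed" for the first two URLs and "allowed" for the last one. For the wildcard rules, rely on the Google Search Console check described below.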

The last step is to upload robots.txt to the root directory of the site using an FTP client or the file manager in the hosting control panel. To verify that the upload was successful, go to http://example.com/robots.txt, replacing example.com with your site's domain. If everything was done correctly, you will see a page displaying the code above.
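
If you prefer to script the upload instead of using a graphical FTP client, here is a minimal sketch that does the same thing with Python's standard ftplib and urllib modules. The host name, credentials and commented-out directory are placeholders; take the real values from your hosting control panel.

from ftplib import FTP
from urllib.request import urlopen

# Placeholder connection details -- replace with the FTP credentials
# from your hosting control panel.
HOST = "ftp.example.com"
USER = "ftp_user"
PASSWORD = "ftp_password"

# Upload the local robots.txt to the root directory of the site.
with FTP(HOST) as ftp:
    ftp.login(USER, PASSWORD)
    # ftp.cwd("public_html")  # uncomment if the document root is a subdirectory
    with open("robots.txt", "rb") as local_file:
        ftp.storbinary("STOR robots.txt", local_file)

# Verify the upload by requesting the file over HTTP.
with urlopen("http://example.com/robots.txt") as response:
    print(response.status)                  # expect 200
    print(response.read().decode("utf-8"))  # should match the file you uploaded

Either way, the result must be the same: the file has to end up directly in the site's document root, not in a subfolder.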

By the way, what happens if you disallow everything in robots.txt?

If you want to prevent all search engines from crawling the site, use the following contents of the robots.txt file:

User-agent: *
Disallow: /

This configuration can negatively affect SEO, as search engines will not be able to index any pages. Use it with caution, for example for sites that are still in development.

Read also: What is WHOIS, What It Is Used For and How to Check a Domain

How to check robots.txt for errors

You learned how to configure robots.txt for WordPress, and the next step is to check that it is compiled correctly. To do this, do the following:

  1. Open the robots.txt Tester in the old version of Google Search Console (the "Crawl" menu item).
  2. Copy the contents of the robots.txt file and paste it into the edit window.
  3. Make sure that the number of errors and warnings in the lower left corner of the editing window is zero. If there are problems with the file's syntax, this line will show the number of errors or warnings, and a red or orange icon will appear to the left of the line containing the incorrect directive. Hover over it to see a description of the error.

If you see the line "Errors: 0, Warnings: 0" under the robots.txt Tester edit window, notify the search engine of the changes to robots.txt. To do this, click the "Submit" button in the lower right corner of the editor, then confirm the update request by clicking the "Submit" button next to option #3 in the window that appears.

Read also: What are key phrases and how to choose them.

Why is it important to configure robots.txt correctly

In January 2017, Google Webmaster Trends Analyst Gary Illyes published an article on the official Google Webmaster Central blog titled "What Crawl Budget Means for Googlebot". In it, he noted that if a crawler encounters low-quality pages or pages that duplicate content from other pages while browsing a site, the speed and frequency of crawling decrease. The negative consequence is that new content added to your site will not appear in search results any time soon.

A correctly configured robots.txt for WordPress prevents search engines from crawling duplicate pages and pages that are of no value to visitors. With this in mind, it is no less important an element of a site's technical optimization than, for example, a correctly compiled and automatically updated sitemap or enabled gzip compression.

Another characteristic of a site that has a positive effect on both search engine optimization and visitor satisfaction is fast page loading. Renting a dedicated server can help with this, or, as a cheaper option, a virtual server. Either will provide enough resources to make an online store, portal or blog fast, reliable and secure.

Was the publication informative? Then share it on social networks and join our Telegram channel. We remind you that the hosting company CityHost provides inexpensive hosting services for sites of any complexity. For technical questions, contact us via online chat or by phone at 0 800 219 220.



Author: Bohdana Haivoronska

Journalist (since 2003), IT copywriter (since 2013), content marketer at Cityhost.ua. Specializes in articles about technology, creation and promotion of sites.