Search bots often crawl a site quite aggressively and create extra load on the server. To stop crawling entirely, add the following rule to the robots.txt file:
User-agent: *
Disallow: /
This rule blocks all crawling of the site by bots. Keep in mind that if you restrict crawling completely, the site may disappear from search engine results.
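If the load comes from particular crawlers, you can also target them by name in robots.txt instead of blocking everyone. The bot names below (AhrefsBot, MJ12bot) are only examples; check your access logs for the actual User-agent strings:
# Block only specific aggressive crawlers (example bot names)
User-agent: AhrefsBot
Disallow: /

User-agent: MJ12bot
Disallow: /

# All other bots may crawl the whole site
User-agent: *
Disallow: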
1. Instead, you can restrict access only to specific folders, links, files, and extensions using the Disallow directive.
Examples:
User-agent: *
Disallow: /directory

User-agent: *
Disallow: /privatinfo.php

User-agent: *
Disallow: /privatpic.jpg

User-agent: *
Disallow: /*.jpg$
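In the last example, * matches any sequence of characters and $ anchors the pattern to the end of the URL, so the rule blocks every URL ending in .jpg (Google and Bing support these wildcards, but not every crawler does). The same syntax can block, for example, all URLs that contain a query string:
User-agent: *
Disallow: /*?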
More detailed information on working with the file can be found in Google's instructions.
2. You can also limit access to certain pages of the site through the .htaccess file, as described in the instructions.
To block access from a specific IP address, it is enough to add the following rule:
Order Allow,Deny
Allow from all
Deny from ***.***.***.***
Replace ***.***.***.*** with the offending IP address. You can find the IP address by opening the server logs, available over FTP in the logs folder.
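The Order/Allow/Deny directives come from Apache 2.2 (on Apache 2.4 they only work if mod_access_compat is enabled). On Apache 2.4 the equivalent rule uses Require instead; a minimal sketch, with 203.0.113.7 as a placeholder address:
# Apache 2.4: allow everyone except one IP (placeholder address)
<RequireAll>
    Require all granted
    Require not ip 203.0.113.7
</RequireAll>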
3. To block access for everyone except Ukrainian IP addresses, you can use the instructions.
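The exact steps depend on the server configuration, but on hosting where the mod_geoip Apache module is available, a country filter in .htaccess might look like the following sketch (GEOIP_COUNTRY_CODE is the environment variable mod_geoip sets; the instructions linked above describe the hosting-specific way):
# Allow only visitors whose IP geolocates to Ukraine (requires mod_geoip)
GeoIPEnable On
SetEnvIf GEOIP_COUNTRY_CODE UA allow_country
Order Deny,Allow
Deny from all
Allow from env=allow_country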
4. You can also close bot access to vulnerable pages through the built-in captcha, which can be found in the section Hosting 2.0 - Sites - Security - PROTECTED PAGES (CAPTCHA)*; a complementary .htaccess rule is sketched after the list below:
* The captcha is enabled by default for the following pages:
WP LOGIN PAGE: wp-admin, wp-login.php
JOOMLA ADMIN PAGE: /administrator, view=login
JOOMLA REGISTER PAGE: view=registration
OC ADMIN PAGE: /admin
MODX ADMIN PAGE: /manager
PRESTA SHOP ADMIN PAGE: /Backoffice
DRUPAL ADMIN PAGE: /user/login/
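In addition to the captcha, bots can be cut off at the web-server level by matching their User-Agent header in .htaccess. A sketch, using the same Apache 2.2 access directives as in point 2; the bot names are examples, so adjust them to what appears in your own logs:
# Deny requests whose User-Agent matches known scraper bots (example names)
SetEnvIfNoCase User-Agent "MJ12bot|SemrushBot|AhrefsBot" bad_bot
Order Allow,Deny
Allow from all
Deny from env=bad_bot
Note that aggressive bots often fake their User-Agent, so this rule is best combined with the IP-based blocking described in point 2.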
More details about robots.txt are described in our blog - https://cityhost.ua/blog/chto-takoe-robots-txt-kak-nastroit-robots-txt-dlya-wordpress.html