[INSTALLATION] Blocking cli test OK, not blocking one actual bot #191

Open
superian opened this issue Mar 12, 2023 · 0 comments

Describe the problem you are experiencing

This is a new install as of today. It took a few experiments to get working (I had to remove the 'Require all granted' line that was previously in the directory options), but it is now working and blocking some bots:

185.25.35.10 - - [12/Mar/2023:00:15:14 +0000] "GET /robots.txt HTTP/1.1" 403 260 "-" "magpie-crawler/1.1 (robots-txt-checker; +http://www.brandwatch.net)" 0

But others are still getting through. In the two lines below, the first is my own curl test and the second is the real bot, which coincidentally arrived 17 seconds later:

1.2.3.4 - - [11/Mar/2023:23:59:11 +0000] "GET /page/ HTTP/1.1" 403 260 "-" "petalbot" 0

114.119.130.102 - - [11/Mar/2023:23:59:28 +0000] "GET /page/more-url/80?sort=last_post%3BPHPSESSID%3Df2qsc6p7174sfwjoej24ubmr63f HTTP/1.1" 200 6863 "https://example.com/page/more-url/120?sort=last_post%3BPHPSESSID%3D5757qonrosaciuhp1h3i23c7ft88" "Mozilla/5.0 (Linux; Android 7.0;) AppleWebKit/537.36 (KHTML, like Gecko) Mobile Safari/537.36 (compatible; PetalBot;+https://webmaster.petalsearch.com/site/petalbot)" 0
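The curl test above sent the short string `petalbot`, while the real crawler sent a much longer, mixed-case user agent, so the two requests are not equivalent tests. A minimal sketch for pulling the exact user agent out of an access-log line and replaying it verbatim follows. The helper names `parse_log_line` and `replay_status` are hypothetical, and the regex assumes the combined log format with one trailing field, as in the log lines shown here.

```python
import re
import urllib.error
import urllib.request

# Assumption: Apache combined log format plus one trailing field,
# matching the shape of the log lines quoted in this issue.
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_log_line(line):
    """Return (status, user_agent) from one access-log line, or None."""
    m = LOG_RE.match(line)
    if m is None:
        return None
    return int(m.group("status")), m.group("agent")


def replay_status(url, user_agent):
    """Send one GET with the given User-Agent and return the HTTP status."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # for example 403 when the request is blocked
```

Calling `replay_status("https://example.com/page/", agent)` with the `agent` value returned by `parse_log_line` for the second line should show whether the full string is let through even when the request comes from my own machine.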

Here's another that I think should have been stopped:

144.76.22.179 - - [12/Mar/2023:00:07:24 +0000] "GET /robots.txt HTTP/1.1" 200 211 "http://example.com/robots.txt" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; trendictionbot0.5.0; trendiction search; http://www.trendiction.de/bot; please let us know of any problems; web at trendiction.com) Gecko/20170101 Firefox/67.0" 0

...because when I request the same URL myself with that bot's name as the user agent, it is blocked:

1.2.3.4 - - [12/Mar/2023:00:15:39 +0000] "GET /robots.txt HTTP/1.1" 403 260 "-" "trendictionbot" 0
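One difference between the two requests is that the real user agent contains `trendictionbot0.5.0`, with the version number attached directly to the name, while my test sent the bare name. The sketch below shows how a word-boundary pattern would treat the two strings differently. The rule is hypothetical and written only for illustration. I have not confirmed that the installed blocker uses a pattern of this shape, so this is one possible explanation to check, not a diagnosis.

```python
import re


def is_blocked(pattern, user_agent):
    """True if the pattern matches anywhere in the user agent, ignoring case."""
    return re.search(pattern, user_agent, re.IGNORECASE) is not None


# Hypothetical rule for illustration; not copied from the blocker's config.
RULE = r"\btrendictionbot\b"

# The short string used in my own test request.
test_ua = "trendictionbot"

# The full user agent from the real request in the access log.
real_ua = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; trendictionbot0.5.0; "
    "trendiction search; http://www.trendiction.de/bot; "
    "please let us know of any problems; web at trendiction.com) "
    "Gecko/20170101 Firefox/67.0"
)

# "t" followed by "0" is two word characters in a row, so the trailing \b
# cannot match inside "trendictionbot0.5.0".
print(is_blocked(RULE, test_ua))
print(is_blocked(RULE, real_ua))
```

Under this assumed rule the bare name matches and the real string does not, which would reproduce the 403 versus 200 seen above. Dropping the trailing `\b` (`r"\btrendictionbot"`) makes both strings match.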

Server (please complete the following information):

  • OS: Ubuntu 22.04.2
  • Apache Version: Apache/2.4.52 (Ubuntu)