
Robots.txt #1509

Open
JeltevanBoheemen opened this issue Mar 14, 2024 · 1 comment
Labels
enhancement improvements to user functionality

Comments

@JeltevanBoheemen
Contributor

Is your feature request related to a problem? Please describe.
Because I-Analyzer no longer requires a login, the application is vulnerable to crawling.

Describe the solution you'd like
Provide a robots.txt with some sensible defaults. It remains to be decided what these should be. @ar-jan seems to have some ideas about this?

@JeltevanBoheemen JeltevanBoheemen added the enhancement improvements to user functionality label Mar 14, 2024
@ar-jan
Contributor

ar-jan commented Mar 28, 2024

Since our main concern is performance issues due to crawling, I think it's best to just disallow /search/ and keep it at that.

I think it's fine if no search results pages end up in search engine indices at all, since they are dynamic: there's no guarantee that a specific search phrase will still be located on a particular page of results. If there's a benefit to having people find the website through the content of the corpora, we could include a sitemap with direct links to individual documents, but that would be a huge list.

So the simple solution would just be:

```
User-agent: *
Disallow: /search/
```
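For what it's worth, these rules can be sanity-checked with Python's standard-library `urllib.robotparser` before deploying (a quick sketch; `example.com` stands in for the real host):

```python
from urllib.robotparser import RobotFileParser

# The proposed robots.txt rules.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Dynamic search-result pages are disallowed for every crawler...
print(parser.can_fetch("Googlebot", "https://example.com/search/my-query"))  # False
# ...while the rest of the site stays crawlable.
print(parser.can_fetch("Googlebot", "https://example.com/about"))  # True
```

Note that `Disallow: /search/` is a path-prefix rule, so it covers every page under `/search/` without needing wildcards.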
