Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kshitijk4poor/main #1304

Open
wants to merge 60 commits into
base: main
Choose a base branch
from
Open

Kshitijk4poor/main #1304

wants to merge 60 commits into from

Conversation

michalkrzem
Copy link
Collaborator

No description provided.

"wróć do bezpieczeństwa",
"utrzymywana jest na serwerach",
"parametry wspólne serwerów",
"forbidden",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of these keywords are too generic and may match too mch (e.g. "forbidden")

image

IMO let's take 5-10 most popular website providers and just match their "no website" pages (even matching with html so that we don't match some text inside)

BTW, make this configurable

@@ -698,6 +698,29 @@ class Nuclei:
"NUCLEI_TEMPLATE_CHUNK_SIZE is 200, three calls will be made with 200 templates each.",
] = get_config("NUCLEI_TEMPLATE_CHUNK_SIZE", default=200, cast=int)

class PlaceholderPagesHtmlElements:
PLACEHOLDER_PAGE_HTML_ELEMENTS: Annotated[
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO this is a good idea to extract this to a file (have config named "PLACEHOLDER_PAGE_CONTENT_FILENAME" and these contents being loaded from file)

class PlaceholderPagesHtmlElements:
PLACEHOLDER_PAGE_HTML_ELEMENTS: Annotated[
List[str],
"A list of html elements that appear on placeholder pages",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe write that exact string matching will be performed, no parsing, no nothing. And write what will happen with such pages

@@ -153,6 +154,11 @@ def check_domain_exists(self, domain: str) -> bool:
bool: True if the domain exists, False otherwise.
"""
try:
manager = AnalyzerManager(domain)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

name this class more verbosely, not AnalyzerManager, maybe PlaceholderPageDetector?

@@ -153,6 +154,11 @@ def check_domain_exists(self, domain: str) -> bool:
bool: True if the domain exists, False otherwise.
"""
try:
manager = AnalyzerManager(domain)
result = manager.run_analysis()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mabe just is_placeholder(domain)

@@ -698,6 +698,29 @@ class Nuclei:
"NUCLEI_TEMPLATE_CHUNK_SIZE is 200, three calls will be made with 200 templates each.",
] = get_config("NUCLEI_TEMPLATE_CHUNK_SIZE", default=200, cast=int)

class PlaceholderPagesHtmlElements:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO add a configuration variable to enable this behavior, default false - by default people may not want random webpages to be skipped

if response:
html_content = response.content
with open(
Config.Modules.PlaceholderPageContent.PLACEHOLDER_PAGE_CONTENT_FILENAME, "r", encoding="utf-8"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will cause the file to be read on every check, this can be made more optimal by moving to init and constrycting a singleton instance (and calling is_placeholder on different urls)

return True


class PlaceholderPageDetector:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think having two classes adds anything in terms of readability

@@ -153,6 +154,12 @@ def check_domain_exists(self, domain: str) -> bool:
bool: True if the domain exists, False otherwise.
"""
try:
if Config.Modules.PlaceholderPageContent.PLACEHOLDER_PAGE_DETECTOR_ENABLE:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PLACEHOLDER_PAGE_DETECTOR_ENABLED or ENABLE_PLACEHOLDER_PAGE_DETECTOR

class PlaceholderPageContent:
PLACEHOLDER_PAGE_DETECTOR_ENABLE: Annotated[
bool,
"Enable or disable placeholder pages detector."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

space or newline between dot and A

PLACEHOLDER_PAGE_DETECTOR_ENABLE: Annotated[
bool,
"Enable or disable placeholder pages detector."
"A strict string matching will be performed without any parsing or modifications. The string will be "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

two sentences with the same meaning: "A strict string matching will be performed without any parsing or modifications." and "The string will ...".

"Enable or disable placeholder pages detector."
"A strict string matching will be performed without any parsing or modifications. The string will be "
"matched exactly as provided, without applying any transformations or processing. If the page exists "
"but the specified string is found within it, the page will not be scanned for vulnerabilities. "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

imo "and the specified"

class PlaceholderPageContent:
PLACEHOLDER_PAGE_DETECTOR_ENABLE: Annotated[
bool,
"Enable or disable placeholder pages detector."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add maybe "using this feature you may skip vulnerability scanning for websites that aren't built yet, but e.g. contain a hosting provider placeholder page"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants