change: sanitize markdown to prevent JS execution in documentation output #692

MaddyGuthridge · 2024-09-08T12:29:05Z

Is your change request related to a problem? Please describe.

I recently discovered in this mind-bending issue that most Markdown implementations don't sanitize their output, with Python-Markdown being no exception. This means that (contrary to popular belief), Markdown text cannot be trusted to be safe.

While developers should be careful to only accept merges after thoroughly reviewing code, there are lots of ways to subtlely embed JS into documents, which can be easily overlooked (eg using onerror in an <img> tag). I cannot think of any non-malicious to embed executing JS code within documentation markdown, when it is so much easier to bundle additional JS using mkdocs (which reviewers would be much more suspicious of, and therefore much more careful of).

Describe the solution you'd like

Sanitize the Markdown output within documentation to prevent executable JS code from being embedded in the output. Perhaps mozilla/bleach can be used.

Describe alternatives you've considered

Do nothing, but document that docstrings cannot be blindly trusted to be safe, as JS can be embedded within them.

Additional context

The text was updated successfully, but these errors were encountered:

pawamoy · 2024-09-08T13:08:45Z

Hi again @MaddyGuthridge, thanks for the request.

I cannot think of any non-malicious to embed executing JS code within documentation markdown

I often use <script> in my own Markdown pages. Some of these scripts only need to appear in single specific pages, not all pages. A few examples:

To update a few things in my insiders pages:

<script src="../js/insiders.js"></script>
<script>updateInsidersPage('pawamoy');</script>

To make some SVGs zoomable in a specific page:

<script>
    document.addEventListener("DOMContentLoaded", function(){
        const divs = document.getElementsByClassName("interactiveSVG");
        for (let i = 0; i < divs.length; i++) {
            if (!divs[i].firstElementChild.id) {
                divs[i].firstElementChild.id = `interactiveSVG-${i}`
            }
            svgPanZoom(`#${divs[i].firstElementChild.id}`, {});
        }
    });
</script>

Generally speaking, Markdown is a superset of HTML, so I don't find it surprising that HTML (including <script>) is not escaped. The role of Markdown implementations is to convert Markdown to HTML, so keeping the raw HTML as is seems like the correct thing to do. I'd personally be very annoyed if Python-Markdown started escaping <script> or more HTML elements, as I very often rely on HTML when Markdown is insufficient. IMO sanitizing should be done on the final HTML, not during Markdown->HTML conversion.

However I agree that it could make sense to provide an option at the level of mkdocstrings, to escape HTML (or just <script>) in docstrings, since it can sometimes happen that you render docstrings from code you don't have control over. I could argue that you shouldn't render docstrings from untrusted sources, but that wouldn't be very productive.

So the solution I suggest here is to use a Griffe extension that will escape HTML script elements in docstrings. Seems super easy to do with bleach, as you mentioned 🙂

import bleach
import griffe


# I checked bleach's docs, but unfortunately it doesn't look
# like there's a way to only escape script elements.
class EscapeHTMLExtension(griffe.Extension):
    def on_instance(self, obj: griffe.Object, **kwargs) -> None:
        if obj.docstring:
            obj.docstring.value = bleach.clean(obj.docstring.value)

WDYT?

waylan · 2024-09-10T16:08:22Z

If you are using bleach to sanitize Markdown output, I might suggest yourcelf/bleach-allowlist which contains a comprehensive set of bleach configs.

pawamoy · 2024-09-10T16:36:15Z

Thanks @waylan!

One thing I'm unsure of, is whether bleach is able to escape Markdown, without breaking it. For example, will the following be left intact, or will <script> within the code block be escaped to <script>?

a = 0
"""Something about `<script>`."""

waylan · 2024-09-11T15:04:59Z

Bleach does not understand Markdown at all. One should only pass the HTML output of Markdown to Bleach. Of course, if valid HTML code blocks (or spans) contain Markdown, Bleach will understand that those blocks are not HTML and not try to parse them, but I'm not sure what escaping it will do in that scenario. However, if Bleach is passed Markdown which is not in a valid HTML code block or span, then all bets are off. Assume Bleach will not handle it in a sensible way.

pawamoy · 2024-09-11T17:42:31Z

Thanks. Escaping docstrings with Bleach is probably not a good idea then. Maybe mkdocstrings could allow hooking into its convert_markdown Jinja filter instead, to escape docstrings once they're converted to HTML 🤔

MaddyGuthridge assigned pawamoy Sep 8, 2024

pawamoy mentioned this issue Sep 10, 2024

bug: rendering skips items when members option contains an item named _escape_children #691

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

change: sanitize markdown to prevent JS execution in documentation output #692

change: sanitize markdown to prevent JS execution in documentation output #692

MaddyGuthridge commented Sep 8, 2024

pawamoy commented Sep 8, 2024 •

edited

Loading

waylan commented Sep 10, 2024

pawamoy commented Sep 10, 2024

waylan commented Sep 11, 2024

pawamoy commented Sep 11, 2024 •

edited

Loading

change: sanitize markdown to prevent JS execution in documentation output #692

change: sanitize markdown to prevent JS execution in documentation output #692

Comments

MaddyGuthridge commented Sep 8, 2024

Is your change request related to a problem? Please describe.

Describe the solution you'd like

Describe alternatives you've considered

Additional context

pawamoy commented Sep 8, 2024 • edited Loading

waylan commented Sep 10, 2024

pawamoy commented Sep 10, 2024

waylan commented Sep 11, 2024

pawamoy commented Sep 11, 2024 • edited Loading

pawamoy commented Sep 8, 2024 •

edited

Loading

pawamoy commented Sep 11, 2024 •

edited

Loading