Class SafeContentHandler

All Implemented Interfaces:
ContentHandler, DTDHandler, EntityResolver, ErrorHandler
Direct Known Subclasses:
XHTMLContentHandler, XMPContentHandler

public class SafeContentHandler extends ContentHandlerDecorator
Content handler decorator that ensures the character events (characters(char[], int, int) or ignorableWhitespace(char[], int, int)) passed to the decorated content handler contain only valid XML characters. Invalid characters are replaced with the Unicode replacement character U+FFFD; a subclass can change this behavior by overriding the writeReplacement(Output) method.
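
The replacement behavior described above can be sketched with the JDK's org.xml.sax classes alone. This is a minimal illustration, not the actual implementation: the class name ReplacingHandler, the helper clean, and the single-char check are all hypothetical, and the real class extends ContentHandlerDecorator and routes replacements through writeReplacement(Output).

```java
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for illustration only; not the Tika class.
class ReplacingHandler extends DefaultHandler {
    private static final char REPLACEMENT = '\uFFFD';

    private final ContentHandler delegate;

    ReplacingHandler(ContentHandler delegate) {
        this.delegate = delegate;
    }

    // Assumed single-char check: flags control characters other than
    // tab, LF and CR, plus U+FFFE and U+FFFF. Surrogate code units are
    // passed through unchanged, so malformed UTF-16 is not detected.
    static boolean isInvalid(char c) {
        if (c < 0x20) {
            return c != 0x9 && c != 0xA && c != 0xD;
        }
        return c == 0xFFFE || c == 0xFFFF;
    }

    // Copies the given range, substituting U+FFFD for each invalid char.
    static char[] clean(char[] ch, int start, int length) {
        char[] copy = new char[length];
        for (int i = 0; i < length; i++) {
            char c = ch[start + i];
            copy[i] = isInvalid(c) ? REPLACEMENT : c;
        }
        return copy;
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        delegate.characters(clean(ch, start, length), 0, length);
    }

    @Override
    public void ignorableWhitespace(char[] ch, int start, int length) throws SAXException {
        delegate.ignorableWhitespace(clean(ch, start, length), 0, length);
    }
}
```

Copying the range before forwarding leaves the caller's buffer untouched, at the cost of one allocation per event.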

The XML 1.0 standard defines the following Unicode character ranges as valid XML characters:

 #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
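
The ranges above translate directly into a code point test. The following is a sketch only; the class name XmlChars and the method isValid are hypothetical and are not part of this API.

```java
// Hypothetical helper for illustration; not part of the documented class.
final class XmlChars {
    private XmlChars() {}

    // Returns true if the code point falls in one of the valid ranges:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isValid(int codePoint) {
        return codePoint == 0x9
                || codePoint == 0xA
                || codePoint == 0xD
                || (codePoint >= 0x20 && codePoint <= 0xD7FF)
                || (codePoint >= 0xE000 && codePoint <= 0xFFFD)
                || (codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    }
}
```

Under this check, XmlChars.isValid('A') and XmlChars.isValid(0x1F600) hold, while XmlChars.isValid(0x0) and XmlChars.isValid(0xFFFE) do not.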
 

Note that this class currently detects only those invalid characters whose UTF-16 representation fits in a single char. It also does not verify that the UTF-16 encoding of incoming characters is well-formed.
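
Because UTF-16 correctness is not checked here, a caller that needs that guarantee could scan its buffers for unpaired surrogates before emitting character events. The sketch below shows one way to do so with standard java.lang.Character methods; the class Utf16Check and its method are hypothetical and are not provided by this API.

```java
// Hypothetical pre-check for illustration; not part of the documented class.
final class Utf16Check {
    private Utf16Check() {}

    // Returns the index of the first unpaired surrogate in
    // ch[start .. start + length), or -1 if the range is well-formed UTF-16.
    static int firstUnpairedSurrogate(char[] ch, int start, int length) {
        int end = start + length;
        for (int i = start; i < end; i++) {
            char c = ch[i];
            if (Character.isHighSurrogate(c)) {
                if (i + 1 < end && Character.isLowSurrogate(ch[i + 1])) {
                    i++; // skip the low half of a well-formed pair
                } else {
                    return i; // high surrogate without a following low surrogate
                }
            } else if (Character.isLowSurrogate(c)) {
                return i; // low surrogate without a preceding high surrogate
            }
        }
        return -1;
    }
}
```

A pair split across two separate character events would be reported as unpaired by this check, so a caller would need to buffer across event boundaries if that case matters.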