@Beta @GwtCompatible public abstract class UnicodeEscaper extends Escaper
An Escaper that converts literal text into a format safe for inclusion in a particular context (such as an XML document). Typically (but not always), the inverse process of "unescaping" the text is performed automatically by the relevant parser.
For example, an XML escaper would convert the literal string "Foo<Bar>" into "Foo&lt;Bar&gt;" to prevent "<Bar>" from being confused with an XML tag. When the resulting XML document is parsed, the parser API will return this text as the original literal string "Foo<Bar>".
Note: This class is similar to CharEscaper but with one very important difference. A CharEscaper can only process Java UTF-16 characters in isolation and may not cope when it encounters surrogate pairs. This class facilitates the correct escaping of all Unicode characters.
As there are important reasons, including potential security issues, to handle Unicode correctly, you should favor using UnicodeEscaper wherever possible if you are considering implementing a new escaper.
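The difference between processing UTF-16 characters in isolation and processing whole code points can be illustrated with plain JDK calls. The following is a minimal sketch, not part of this class; the class name SurrogatePairSketch and its methods are hypothetical.

```java
final class SurrogatePairSketch {
    /** Number of UTF-16 code units, which is what a char-at-a-time escaper iterates over. */
    static int countChars(String s) {
        return s.length();
    }

    /** Number of Unicode code points, treating each surrogate pair as a single unit. */
    static int countCodePoints(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);       // combines a high/low surrogate pair when present
            i += Character.charCount(cp);    // advances by 1 or 2 chars
            count++;
        }
        return count;
    }
}
```

A string containing one supplementary character (for example U+1F600) has two chars for that character but only one code point, so an escaper that inspects chars individually sees two surrogate halves instead of the character itself.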
A UnicodeEscaper instance is required to be stateless, and safe when used concurrently by multiple threads.
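|Modifier||Constructor and Description|
|protected||UnicodeEscaper() Constructor for use by subclasses.|
|Modifier and Type||Method and Description|
|protected static int||codePointAt(CharSequence seq, int index, int end) Returns the Unicode code point of the character at the given index.|
|protected abstract char[]||escape(int cp) Returns the escaped form of the given Unicode code point, or null if this code point does not need to be escaped.|
|String||escape(String string) Returns the escaped form of a given literal string.|
|protected final String||escapeSlow(String s, int index) Returns the escaped form of a given literal string, starting at the given index.|
|protected int||nextEscapeIndex(CharSequence csq, int start, int end) Scans a sub-sequence of characters from a given CharSequence, returning the index of the next character that requires escaping.|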
protected abstract char[] escape(int cp)
Returns the escaped form of the given Unicode code point, or null if this code point does not need to be escaped. When called as part of an escaping operation, the given code point is guaranteed to be in the range 0 <= cp <= Character.MAX_CODE_POINT.
If an empty array is returned, this effectively strips the input character from the resulting text.
If the character does not need to be escaped, this method should return null, rather than an array containing the character representation of the code point. This enables the escaping algorithm to perform more efficiently.
If the implementation of this method cannot correctly handle a particular code point then it should either throw an appropriate runtime exception or return a suitable replacement character. It must never silently discard invalid input as this may constitute a security risk.
cp - the Unicode code point to escape if necessary
Returns the replacement characters, or null if no escaping was needed
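The contract described above (null for "no escaping needed", a char array for the replacement, an exception rather than silent removal for input that cannot be handled) might look like the following. This is a standalone sketch that does not extend UnicodeEscaper; the class EscapeContractSketch, the driver method escapeAll, and the choice of escaped characters are assumptions made for illustration.

```java
final class EscapeContractSketch {
    /** Returns the replacement for cp, or null when cp can be copied through unchanged. */
    static char[] escape(int cp) {
        switch (cp) {
            case '<': return "&lt;".toCharArray();
            case '>': return "&gt;".toCharArray();
            case '&': return "&amp;".toCharArray();
            default:
                // A lone surrogate is not a valid character; reject it instead of dropping it.
                if (cp >= Character.MIN_SURROGATE && cp <= Character.MAX_SURROGATE) {
                    throw new IllegalArgumentException(
                        "Unpaired surrogate: U+" + Integer.toHexString(cp).toUpperCase());
                }
                return null; // no escaping needed
        }
    }

    /** Hypothetical driver: applies escape(int) to each code point of s. */
    static String escapeAll(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            // String.codePointAt returns an unpaired surrogate as-is, so escape(int) sees it.
            int cp = s.codePointAt(i);
            char[] replacement = escape(cp);
            if (replacement == null) {
                out.appendCodePoint(cp);
            } else {
                out.append(replacement);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

Returning null for the common case lets the caller copy unescaped input through without allocating a new array per character.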
public String escape(String string)
Returns the escaped form of a given literal string.
If you are escaping input in arbitrary successive chunks, then it is not generally safe to use this method. If an input string ends with an unmatched high surrogate character, then this method will throw IllegalArgumentException. You should ensure your input is valid UTF-16 before calling this method.
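The chunking hazard arises when a chunk boundary falls between the two halves of a surrogate pair. A minimal sketch of how a caller might detect and avoid this, assuming the caller controls where chunks are split; the class ChunkBoundarySketch and both helper methods are hypothetical.

```java
final class ChunkBoundarySketch {
    /** True if the chunk ends with a high surrogate that has no following low surrogate. */
    static boolean endsWithUnmatchedHighSurrogate(CharSequence chunk) {
        return chunk.length() > 0
            && Character.isHighSurrogate(chunk.charAt(chunk.length() - 1));
    }

    /** Moves a proposed split index back by one if it would cut a surrogate pair in half. */
    static int safeSplitIndex(CharSequence s, int index) {
        if (index > 0 && index < s.length()
                && Character.isHighSurrogate(s.charAt(index - 1))
                && Character.isLowSurrogate(s.charAt(index))) {
            return index - 1;
        }
        return index;
    }
}
```

Splitting at the adjusted index keeps each pair inside a single chunk, so no chunk ends with an unmatched high surrogate because of the split itself.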
Note: When implementing an escaper, it is a good idea to override this method for efficiency by inlining the implementation of nextEscapeIndex(CharSequence, int, int) directly. Doing this for PercentEscaper more than doubled the performance for unescaped strings.
protected int nextEscapeIndex(CharSequence csq, int start, int end)
Scans a sub-sequence of characters from a given CharSequence, returning the index of the next character that requires escaping.
Note: When implementing an escaper, it is a good idea to override this method for efficiency. The base class implementation determines successive Unicode code points and invokes escape(int) for each of them. If the semantics of your escaper are such that code points in the supplementary range are either all escaped or all unescaped, this method can be implemented more efficiently by checking for surrogate characters directly.
Note however that if your escaper does not escape characters in the supplementary range, you should either continue to validate the correctness of any surrogate characters encountered or provide a clear warning to users that your escaper does not validate its input.
See PercentEscaper for an example.
csq - a sequence of characters
start - the index of the first character to be scanned
end - the index immediately after the last character to be scanned
IllegalArgumentException - if the scanned sub-sequence of csq contains invalid surrogate pairs
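The scan described above (decode successive code points, test each one, stop at the first that needs escaping) can be sketched with JDK calls only. This is an illustration, not the base class implementation: the class NextEscapeIndexSketch and the needsEscape predicate are hypothetical, the sketch rejects every unpaired surrogate, and returning end when nothing needs escaping is an assumption.

```java
import java.util.function.IntPredicate;

final class NextEscapeIndexSketch {
    /**
     * Returns the index of the first code point in [start, end) for which needsEscape
     * is true, or end if there is none (an assumption of this sketch).
     */
    static int nextEscapeIndex(CharSequence csq, int start, int end, IntPredicate needsEscape) {
        int index = start;
        while (index < end) {
            char c1 = csq.charAt(index);
            int cp;
            if (!Character.isSurrogate(c1)) {
                cp = c1; // ordinary BMP character
            } else if (Character.isHighSurrogate(c1)
                    && index + 1 < end
                    && Character.isLowSurrogate(csq.charAt(index + 1))) {
                cp = Character.toCodePoint(c1, csq.charAt(index + 1)); // valid pair
            } else {
                // Validate rather than skip: lone or misordered surrogates are rejected.
                throw new IllegalArgumentException("Invalid surrogate at index " + index);
            }
            if (needsEscape.test(cp)) {
                return index;
            }
            index += Character.charCount(cp);
        }
        return index;
    }
}
```

For example, nextEscapeIndex("ab<c", 0, 4, cp -> cp == '<') stops at the '<' character, while a sequence ending in a lone high surrogate causes an IllegalArgumentException instead of being passed through unchecked.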
protected final String escapeSlow(String s, int index)
Returns the escaped form of a given literal string, starting at the given index. This method is called by the escape(String) method when it discovers that escaping is required. It is protected to allow subclasses to override the fastpath escaping function to inline their escaping test. See CharEscaperBuilder for an example usage.
This method is not reentrant and may only be invoked by the top level escape(String) method.
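The split between a fast path that only scans and a slow path that builds output can be sketched as follows. This is a standalone illustration under stated assumptions, not the library's code: the class FastPathSketch, its needsEscape test, and the decimal numeric reference output format are all hypothetical, and surrogate validation is omitted for brevity.

```java
final class FastPathSketch {
    /** Hypothetical escaping test that a subclass might inline into its fast path. */
    static boolean needsEscape(int cp) {
        return cp == '<' || cp == '>' || cp == '&';
    }

    /** Fast path: scan only, and return the input unchanged if nothing needs escaping. */
    static String escape(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (needsEscape(cp)) {
                return escapeSlow(s, i); // hand over at the first character that needs work
            }
            i += Character.charCount(cp);
        }
        return s; // no allocation when the string is already safe
    }

    /** Slow path: copies the safe prefix, then escapes from index onwards. */
    static String escapeSlow(String s, int index) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        out.append(s, 0, index);
        for (int i = index; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (needsEscape(cp)) {
                out.append("&#").append(cp).append(';'); // decimal numeric reference
            } else {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

Keeping the scan separate means strings that need no escaping are returned as the same instance, which is where inlining the escaping test pays off.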
protected static int codePointAt(CharSequence seq, int index, int end)
Returns the Unicode code point of the character at the given index.
The behaviour of this method is as follows:
index >= end,
seq - the sequence of characters from which to decode the code point
index - the index of the first character to decode
end - the index beyond the last valid character to decode
Copyright © 2010–2019. All rights reserved.